
The Evolution of Chinese Large Language Models (LLMs)


Pre-trained language model development has advanced significantly in recent years, especially with the advent of large-scale models. For languages such as English, there is no shortage of open-source chat models, but the Chinese language has not seen equivalent progress. To bridge this gap, several Chinese models have been introduced, showcasing innovative approaches and achieving remarkable results. This article discusses some of the most prominent Chinese Large Language Models (LLMs).

Yi

The Yi model family is known for its multidimensional capabilities, spanning base language models and multimodal applications. The Yi models, released in 6B and 34B parameter versions, perform well on benchmarks such as MMLU. The vision-language models in the family combine the base models' semantic language space with visual representations, supported by careful data engineering and scalable supercomputing infrastructure. Pre-training on a corpus of 3.1 trillion tokens gives the models reliable, strong performance across a wide range of tasks.

HF Page: https://huggingface.co/01-ai

GitHub Page: https://github.com/01-ai/Yi
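
As a minimal usage sketch (not taken from the Yi documentation), a chat variant can be loaded with Hugging Face Transformers; the 01-ai/Yi-6B-Chat checkpoint name and the chat-template call below are assumptions based on standard Transformers conventions.

# Minimal sketch: loading a Yi chat model with Hugging Face Transformers.
# The checkpoint name and chat-template usage are assumptions, not official Yi sample code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "01-ai/Yi-6B-Chat"  # assumed chat checkpoint from the 01-ai Hugging Face page
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

messages = [{"role": "user", "content": "用一句话介绍你自己。"}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))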

QWEN

QWEN is a comprehensive series of language models comprising base pre-trained models and fine-tuned chat models. The series performs exceptionally well across a variety of downstream tasks. The chat models stand out in particular for their use of Reinforcement Learning from Human Feedback (RLHF), and they exhibit sophisticated tool-use and planning skills that keep them competitive even against larger models. Specialized variants such as CODE-QWEN and MATH-QWEN-CHAT, which excel at coding- and mathematics-focused tasks, demonstrate the series' versatility.

HF Page: https://huggingface.co/Qwen/Qwen-14B

GitHub Page: https://github.com/QwenLM/Qwen
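
As a rough sketch, the first-generation Qwen chat checkpoints expose a chat() helper through their custom modeling code; the snippet below assumes the Qwen/Qwen-14B-Chat checkpoint (the chat counterpart of the base model linked above) and mirrors the pattern shown in the repository README.

# Sketch: chatting with a first-generation Qwen model via its chat() helper.
# Assumes the Qwen/Qwen-14B-Chat checkpoint; trust_remote_code is required, and the
# tokenizer additionally needs the tiktoken package installed.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen-14B-Chat"  # assumed chat variant of the linked base model
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", trust_remote_code=True).eval()

# chat() returns the reply and the updated history, which can be passed back for the next turn.
response, history = model.chat(tokenizer, "写一首关于秋天的短诗。", history=None)
print(response)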

DeepSeek-V2

DeepSeek-V2 is a Mixture-of-Experts (MoE) model that balances strong performance with cost-effective operation. The model has 236B total parameters, of which only 21B are activated for each token, and supports a context length of 128K tokens. Through its DeepSeekMoE and Multi-head Latent Attention (MLA) architectures, the model achieves notable efficiency gains, cutting training costs by 42.5% and increasing generation throughput.

GitHub Page: https://github.com/deepseek-ai/DeepSeek-V2
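
To make the sparse-activation idea concrete, here is a toy top-k MoE feed-forward layer in PyTorch. It illustrates the general technique rather than DeepSeek-V2's actual implementation: each token is routed to only k of the experts, so only a fraction of the layer's parameters participate in any single forward pass, analogous to 21B of 236B parameters being active per token.

# Toy top-k Mixture-of-Experts layer (illustrative only, not DeepSeek-V2's code).
# Each token is routed to its top-k experts, so only a fraction of the layer's
# parameters are used for any given token.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoELayer(nn.Module):
    def __init__(self, d_model=64, d_ff=256, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                          # x: (tokens, d_model)
        scores = self.router(x)                    # (tokens, n_experts)
        topk_scores, topk_idx = scores.topk(self.k, dim=-1)
        weights = F.softmax(topk_scores, dim=-1)   # mixing weights over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = topk_idx[:, slot] == e      # tokens whose slot-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

layer = ToyMoELayer()
tokens = torch.randn(10, 64)
print(layer(tokens).shape)  # torch.Size([10, 64]); only 2 of the 8 experts ran per token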

WizardLM

WizardLM uses LLMs rather than manual human effort to overcome the difficulty of creating high-complexity instruction data. A technique called Evol-Instruct iteratively rewrites instructions to increase their complexity. Fine-tuning LLaMA on this AI-generated data produces WizardLM, which outperforms models tuned on human-created instructions in human evaluations. The model also compares favorably with OpenAI's ChatGPT.

GitHub Page: https://github.com/nlpxucan/WizardLM
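
To make the idea concrete, here is a heavily simplified sketch of an Evol-Instruct-style loop. The rewriting prompts are paraphrased rather than the paper's exact templates, and complete_fn is a placeholder for any LLM completion call.

# Simplified Evol-Instruct-style loop (paraphrased prompts, not the paper's templates).
# complete_fn stands in for any LLM completion call, e.g. an API client or a local model.
import random

IN_DEPTH_PROMPTS = [
    "Add one more constraint or requirement to the instruction below.",
    "Rewrite the instruction below so that it requires multi-step reasoning.",
    "Replace a general concept in the instruction below with a more specific one.",
]
IN_BREADTH_PROMPT = "Create a brand-new instruction in the same domain as the one below, but rarer and equally difficult."

def evolve(seed_instructions, complete_fn, rounds=3):
    pool = list(seed_instructions)
    for _ in range(rounds):
        new_batch = []
        for instruction in pool:
            template = random.choice(IN_DEPTH_PROMPTS + [IN_BREADTH_PROMPT])
            evolved = complete_fn(f"{template}\n\nInstruction: {instruction}")
            # The real pipeline also filters out failed evolutions (copies, degenerate outputs).
            if evolved and evolved.strip() != instruction:
                new_batch.append(evolved.strip())
        pool.extend(new_batch)
    return pool

# Dummy completion function just to show the control flow.
demo = evolve(["Explain what recursion is."], lambda p: "evolved: " + p[-40:], rounds=2)
print(len(demo))  # 4 instructions after two rounds of evolution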

GLM-130B

With 130 billion parameters, the bilingual (English and Chinese) GLM-130B model rivals GPT-3 (davinci) in performance. GLM-130B beats ERNIE TITAN 3.0 on Chinese benchmarks and outperforms several key models on English benchmarks, having overcome various engineering obstacles during training. A special scaling property allows INT4 quantization after training without performance loss, which makes it a highly effective option for large-scale model deployment.

GitHub Page: https://github.com/THUDM/GLM-130B
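
For readers unfamiliar with INT4 weight quantization, the toy NumPy sketch below shows a basic per-row absmax round-to-nearest scheme; it illustrates the general idea only and is not GLM-130B's exact recipe.

# Toy absmax INT4 weight quantization (illustrative, not GLM-130B's exact method).
# Weights are scaled into the signed 4-bit range [-7, 7], rounded, and stored with one
# floating-point scale per row; dequantization multiplies back by that scale.
import numpy as np

def quantize_int4(w):
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0
    q = np.clip(np.round(w / scale), -7, 7).astype(np.int8)
    return q, scale

def dequantize_int4(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 1024)).astype(np.float32)   # stand-in for a weight matrix
q, scale = quantize_int4(w)
w_hat = dequantize_int4(q, scale)
print("mean absolute error:", np.abs(w - w_hat).mean())  # small relative to the weight magnitudes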

CogVLM

CogVLM is a sophisticated visual language model that deeply fuses vision and language features within its architecture. In contrast to shallow alignment techniques, CogVLM uses a trainable visual expert module and achieves state-of-the-art performance across several cross-modal benchmarks. The variety of applications it supports, including image captioning and visual grounding, demonstrates the model's strong performance and versatility.

HF Page: https://huggingface.co/THUDM/CogVLM

GitHub Page: https://github.com/THUDM/CogVLM

Baichuan-7B

With 4-bit weights and 16-bit activations, the Baichuan-7B models are optimized for on-device deployment while reaching state-of-the-art performance on Chinese and English benchmarks. This quantization makes Baichuan-7B suitable for a wide range of uses, ensuring efficient and effective operation in practical settings.

HF Page: https://huggingface.co/baichuan-inc/Baichuan-7B
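
As an illustration of running a 7B checkpoint with 4-bit weights and 16-bit compute, the snippet below uses a generic bitsandbytes configuration through Transformers; it is not an official Baichuan quantization recipe, and it assumes the bitsandbytes and accelerate packages are installed.

# Generic 4-bit-weight / 16-bit-compute loading via bitsandbytes (not an official
# Baichuan recipe). Baichuan-7B ships custom modeling code, hence trust_remote_code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "baichuan-inc/Baichuan-7B"
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)

inputs = tokenizer("登鹳雀楼->王之涣\n夜雨寄北->", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))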

InternLM

InternLM, a 100B-parameter multilingual model trained on over a trillion tokens, excels at Chinese, English, and coding tasks. Refined with high-quality human-annotated dialogue data and RLHF, InternLM produces responses consistent with human values and ethics, making it a strong option for complex, multi-turn exchanges.

HF Page: https://huggingface.co/internlm

GitHub Page: https://github.com/InternLM/InternLM

Skywork-13B

Trained on 3.2 trillion tokens, Skywork-13B is among the most extensively trained bilingual models of its size. A two-stage training methodology helps it perform well on both general-purpose and domain-specific tasks. The work also addresses data contamination concerns with a novel leakage-detection method, all with the goal of democratizing access to high-quality LLMs.

GitHub Page: https://github.com/SkyworkAI/Skywork

ChatTTS

ChatTTS is a generative text-to-speech model that supports both Chinese and English dialogue scenarios. Trained on more than 100,000 hours of speech data, it produces highly accurate and natural-sounding speech output.

GitHub Page: https://github.com/cronrpc/ChatTTS-webui

Hunyuan-DiT

Hunyuan-DiT is a text-to-image diffusion transformer with fine-grained understanding of both Chinese and English. Its architecture, including the positional encoding, text encoder, and transformer structure, is carefully designed to maximize performance. Hunyuan-DiT benefits from an extensive data pipeline that supports iterative model optimization through continuous evaluation and adjustment. Image captions are refined with a Multimodal Large Language Model to improve language comprehension, which also allows Hunyuan-DiT to take part in multi-turn multimodal conversations. Human evaluations indicate that the model sets a new state of the art in Chinese-to-image generation.
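
As a rough usage sketch, text-to-image generation looks like a standard diffusion pipeline call; the HunyuanDiTPipeline class in diffusers and the Tencent-Hunyuan/HunyuanDiT-Diffusers checkpoint name are assumptions here, so consult the project's documentation for the exact integration.

# Rough sketch of text-to-image generation through a diffusers pipeline.
# The pipeline class and checkpoint name are assumptions; check the official docs.
import torch
from diffusers import HunyuanDiTPipeline

pipe = HunyuanDiTPipeline.from_pretrained(
    "Tencent-Hunyuan/HunyuanDiT-Diffusers",  # assumed checkpoint id
    torch_dtype=torch.float16,
).to("cuda")

# The model accepts Chinese prompts directly.
image = pipe(prompt="一只穿着宇航服的柴犬，水彩风格").images[0]
image.save("hunyuan_dit_sample.png")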

ERNIE 3.0

ERNIE 3.0 addresses the limitation of conventional pre-trained models that rely on plain text alone without incorporating external knowledge. Its unified architecture of auto-regressive and auto-encoding networks makes the model effective at both natural language understanding and generation tasks. Trained on a 4TB plaintext corpus and a large-scale knowledge graph, the 10-billion-parameter model beats state-of-the-art models on 54 Chinese natural language processing tasks, and its English counterpart has achieved top results on the SuperGLUE benchmark, even surpassing human performance.

HF Page: https://huggingface.co/nghuyong/ernie-3.0-base-zh
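
The linked Chinese base checkpoint can be used as a text encoder through Transformers; the following minimal feature-extraction sketch assumes a Transformers release that includes ERNIE support.

# Minimal feature-extraction sketch for the linked Chinese ERNIE 3.0 base checkpoint.
# Assumes a Transformers release with ERNIE support.
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "nghuyong/ernie-3.0-base-zh"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

inputs = tokenizer("百度开发了文心大模型。", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, sequence_length, hidden_size)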

And many more…


Tanya Malhotra is a final-year undergraduate at the University of Petroleum & Energy Studies, Dehradun, pursuing a BTech in Computer Science Engineering with a specialization in Artificial Intelligence and Machine Learning.
She is a Data Science enthusiast with strong analytical and critical thinking skills, along with an ardent interest in acquiring new skills, leading groups, and managing work in an organized manner.


