
The Evolution of Chinese Large Language Models (LLMs)


Pre-trained language model development has advanced significantly in recent years, especially with the advent of large-scale models. For languages such as English, there is no shortage of open-source chat models, but the Chinese language has not seen equivalent progress. To bridge this gap, several Chinese models have been introduced, showcasing innovative approaches and achieving remarkable results. This article discusses some of the most prominent Chinese Large Language Models (LLMs).

Yi

The Yi model family is known for its multidimensional capabilities, spanning base language models and multimodal applications. The Yi models, released in 6B and 34B parameter versions, perform well on benchmarks such as MMLU. The vision-language models in the family combine the base models' semantic language space with visual representations, supported by careful data engineering and scalable supercomputing infrastructure. Pre-training on a corpus of 3.1 trillion tokens gives the models reliable, strong performance across a wide range of tasks.

HF Page: https://huggingface.co/01-ai

GitHub Page: https://github.com/01-ai/Yi
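
As a minimal usage sketch (not taken from the Yi documentation), a chat variant can be loaded with Hugging Face Transformers; the 01-ai/Yi-6B-Chat checkpoint name and the chat-template call below are assumptions based on standard Transformers conventions.

# Minimal sketch: loading a Yi chat model with Hugging Face Transformers.
# The checkpoint name and chat-template usage are assumptions, not official Yi sample code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "01-ai/Yi-6B-Chat"  # assumed chat checkpoint from the 01-ai Hugging Face page
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

messages = [{"role": "user", "content": "用一句话介绍你自己。"}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))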

QWEN

QWEN is a comprehensive series of language models comprising base pre-trained models and fine-tuned chat models. The series performs exceptionally well across a variety of downstream tasks. The chat models stand out in particular for their use of Reinforcement Learning from Human Feedback (RLHF), and they exhibit sophisticated tool-use and planning skills that keep them competitive even against larger models. Specialized variants such as CODE-QWEN and MATH-QWEN-CHAT, which excel at coding- and mathematics-focused tasks, demonstrate the series' versatility.

HF Page: https://huggingface.co/Qwen/Qwen-14B

GitHub Page: https://github.com/QwenLM/Qwen
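
As a rough sketch, the first-generation Qwen chat checkpoints expose a chat() helper through their custom modeling code; the snippet below assumes the Qwen/Qwen-14B-Chat checkpoint (the chat counterpart of the base model linked above) and mirrors the pattern shown in the repository README.

# Sketch: chatting with a first-generation Qwen model via its chat() helper.
# Assumes the Qwen/Qwen-14B-Chat checkpoint; trust_remote_code is required, and the
# tokenizer additionally needs the tiktoken package installed.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen-14B-Chat"  # assumed chat variant of the linked base model
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", trust_remote_code=True).eval()

# chat() returns the reply and the updated history, which can be passed back for the next turn.
response, history = model.chat(tokenizer, "写一首关于秋天的短诗。", history=None)
print(response)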

DeepSeek-V2

DeepSeek-V2 is a Mixture-of-Experts (MoE) model that balances strong performance with cost-effective operation. The model has 236B total parameters, of which only 21B are activated for each token, and supports a context length of 128K tokens. Through its DeepSeekMoE and Multi-head Latent Attention (MLA) architectures, the model achieves notable efficiency gains, cutting training costs by 42.5% and increasing generation throughput.

GitHub Page: https://github.com/deepseek-ai/DeepSeek-V2
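
To make the sparse-activation idea concrete, here is a toy top-k MoE feed-forward layer in PyTorch. It illustrates the general technique rather than DeepSeek-V2's actual implementation: each token is routed to only k of the experts, so only a fraction of the layer's parameters participate in any single forward pass, analogous to 21B of 236B parameters being active per token.

# Toy top-k Mixture-of-Experts layer (illustrative only, not DeepSeek-V2's code).
# Each token is routed to its top-k experts, so only a fraction of the layer's
# parameters are used for any given token.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoELayer(nn.Module):
    def __init__(self, d_model=64, d_ff=256, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                          # x: (tokens, d_model)
        scores = self.router(x)                    # (tokens, n_experts)
        topk_scores, topk_idx = scores.topk(self.k, dim=-1)
        weights = F.softmax(topk_scores, dim=-1)   # mixing weights over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = topk_idx[:, slot] == e      # tokens whose slot-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

layer = ToyMoELayer()
tokens = torch.randn(10, 64)
print(layer(tokens).shape)  # torch.Size([10, 64]); only 2 of the 8 experts ran per token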

WizardLM

WizardLM uses LLMs rather than manual human effort to overcome the difficulty of creating high-complexity instruction data. A technique called Evol-Instruct iteratively rewrites instructions to increase their complexity. Fine-tuning LLaMA on this AI-generated data produces WizardLM, which outperforms models tuned on human-created instructions in human evaluations. The model also compares favorably with OpenAI's ChatGPT.

GitHub Page: https://github.com/nlpxucan/WizardLM
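
To make the idea concrete, here is a heavily simplified sketch of an Evol-Instruct-style loop. The rewriting prompts are paraphrased rather than the paper's exact templates, and complete_fn is a placeholder for any LLM completion call.

# Simplified Evol-Instruct-style loop (paraphrased prompts, not the paper's templates).
# complete_fn stands in for any LLM completion call, e.g. an API client or a local model.
import random

IN_DEPTH_PROMPTS = [
    "Add one more constraint or requirement to the instruction below.",
    "Rewrite the instruction below so that it requires multi-step reasoning.",
    "Replace a general concept in the instruction below with a more specific one.",
]
IN_BREADTH_PROMPT = "Create a brand-new instruction in the same domain as the one below, but rarer and equally difficult."

def evolve(seed_instructions, complete_fn, rounds=3):
    pool = list(seed_instructions)
    for _ in range(rounds):
        new_batch = []
        for instruction in pool:
            template = random.choice(IN_DEPTH_PROMPTS + [IN_BREADTH_PROMPT])
            evolved = complete_fn(f"{template}\n\nInstruction: {instruction}")
            # The real pipeline also filters out failed evolutions (copies, degenerate outputs).
            if evolved and evolved.strip() != instruction:
                new_batch.append(evolved.strip())
        pool.extend(new_batch)
    return pool

# Dummy completion function just to show the control flow.
demo = evolve(["Explain what recursion is."], lambda p: "evolved: " + p[-40:], rounds=2)
print(len(demo))  # 4 instructions after two rounds of evolution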

GLM-130B

With 130 billion parameters, the bilingual (English and Chinese) GLM-130B model rivals GPT-3 (davinci) in performance. GLM-130B beats ERNIE TITAN 3.0 on Chinese benchmarks and outperforms several key models on English benchmarks, having overcome various engineering obstacles during training. A special scaling property allows INT4 quantization after training without performance loss, which makes it a highly effective option for large-scale model deployment.

GitHub Page: https://github.com/THUDM/GLM-130B
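
For readers unfamiliar with INT4 weight quantization, the toy NumPy sketch below shows a basic per-row absmax round-to-nearest scheme; it illustrates the general idea only and is not GLM-130B's exact recipe.

# Toy absmax INT4 weight quantization (illustrative, not GLM-130B's exact method).
# Weights are scaled into the signed 4-bit range [-7, 7], rounded, and stored with one
# floating-point scale per row; dequantization multiplies back by that scale.
import numpy as np

def quantize_int4(w):
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0
    q = np.clip(np.round(w / scale), -7, 7).astype(np.int8)
    return q, scale

def dequantize_int4(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 1024)).astype(np.float32)   # stand-in for a weight matrix
q, scale = quantize_int4(w)
w_hat = dequantize_int4(q, scale)
print("mean absolute error:", np.abs(w - w_hat).mean())  # small relative to the weight magnitudes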

CogVLM

CogVLM is a sophisticated visual language model that deeply fuses vision and language features within its architecture. In contrast to shallow alignment techniques, CogVLM uses a trainable visual expert module and achieves state-of-the-art performance across several cross-modal benchmarks. The variety of applications it supports, including image captioning and visual grounding, demonstrates the model's strong performance and versatility.

HF Page: https://huggingface.co/THUDM/CogVLM

GitHub Page: https://github.com/THUDM/CogVLM

Baichuan-7B

With 4-bit weights and 16-bit activations, the Baichuan-7B models are optimized for on-device deployment while reaching state-of-the-art performance on Chinese and English benchmarks. This quantization makes Baichuan-7B suitable for a wide range of uses, ensuring efficient and effective operation in practical settings.

HF Page: https://huggingface.co/baichuan-inc/Baichuan-7B
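
As an illustration of running a 7B checkpoint with 4-bit weights and 16-bit compute, the snippet below uses a generic bitsandbytes configuration through Transformers; it is not an official Baichuan quantization recipe, and it assumes the bitsandbytes and accelerate packages are installed.

# Generic 4-bit-weight / 16-bit-compute loading via bitsandbytes (not an official
# Baichuan recipe). Baichuan-7B ships custom modeling code, hence trust_remote_code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "baichuan-inc/Baichuan-7B"
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)

inputs = tokenizer("登鹳雀楼->王之涣\n夜雨寄北->", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))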

InternLM

InternLM, a 100B-parameter multilingual model trained on over a trillion tokens, excels at Chinese, English, and coding tasks. Refined with high-quality human-annotated dialogue data and RLHF, InternLM produces responses consistent with human values and ethics, making it a strong option for complex, multi-turn exchanges.

HF Page: https://huggingface.co/internlm

GitHub Page: https://github.com/InternLM/InternLM

Skywork-13B

Trained on 3.2 trillion tokens, Skywork-13B is among the most extensively trained bilingual models of its size. A two-stage training methodology helps it perform well on both general-purpose and domain-specific tasks. The work also addresses data contamination concerns with a novel leakage-detection method, all with the goal of democratizing access to high-quality LLMs.

GitHub Page: https://github.com/SkyworkAI/Skywork

ChatTTS

ChatTTS is a generative text-to-speech model that supports both Chinese and English dialogue scenarios. Trained on more than 100,000 hours of speech data, it produces highly accurate and natural-sounding speech output.

GitHub Page: https://github.com/cronrpc/ChatTTS-webui

Hunyuan-DiT

Hunyuan-DiT is a text-to-image diffusion transformer with fine-grained understanding of both Chinese and English. Its architecture, including the positional encoding, text encoder, and transformer structure, is carefully designed to maximize performance. Hunyuan-DiT benefits from an extensive data pipeline that supports iterative model optimization through continuous evaluation and adjustment. Image captions are refined with a Multimodal Large Language Model to improve language comprehension, which also allows Hunyuan-DiT to take part in multi-turn multimodal conversations. Human evaluations indicate that the model sets a new state of the art in Chinese-to-image generation.
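
As a rough usage sketch, text-to-image generation looks like a standard diffusion pipeline call; the HunyuanDiTPipeline class in diffusers and the Tencent-Hunyuan/HunyuanDiT-Diffusers checkpoint name are assumptions here, so consult the project's documentation for the exact integration.

# Rough sketch of text-to-image generation through a diffusers pipeline.
# The pipeline class and checkpoint name are assumptions; check the official docs.
import torch
from diffusers import HunyuanDiTPipeline

pipe = HunyuanDiTPipeline.from_pretrained(
    "Tencent-Hunyuan/HunyuanDiT-Diffusers",  # assumed checkpoint id
    torch_dtype=torch.float16,
).to("cuda")

# The model accepts Chinese prompts directly.
image = pipe(prompt="一只穿着宇航服的柴犬，水彩风格").images[0]
image.save("hunyuan_dit_sample.png")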

ERNIE 3.0

ERNIE 3.0 addresses the limitation of conventional pre-trained models that rely on plain text alone without incorporating external knowledge. Its unified architecture of auto-regressive and auto-encoding networks makes the model effective at both natural language understanding and generation tasks. Trained on a 4TB plaintext corpus and a large-scale knowledge graph, the 10-billion-parameter model beats state-of-the-art models on 54 Chinese natural language processing tasks, and its English counterpart has achieved top results on the SuperGLUE benchmark, even surpassing human performance.

HF Page: https://huggingface.co/nghuyong/ernie-3.0-base-zh
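
The linked Chinese base checkpoint can be used as a text encoder through Transformers; the following minimal feature-extraction sketch assumes a Transformers release that includes ERNIE support.

# Minimal feature-extraction sketch for the linked Chinese ERNIE 3.0 base checkpoint.
# Assumes a Transformers release with ERNIE support.
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "nghuyong/ernie-3.0-base-zh"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

inputs = tokenizer("百度开发了文心大模型。", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, sequence_length, hidden_size)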

And many more…


Tanya Malhotra is a final-year undergraduate at the University of Petroleum & Energy Studies, Dehradun, pursuing a BTech in Computer Science Engineering with a specialization in Artificial Intelligence and Machine Learning.
She is a Data Science enthusiast with strong analytical and critical thinking skills, along with an ardent interest in acquiring new skills, leading groups, and managing work in an organized manner.


