An In-depth Guide to the Meta LLaMa Language Model & LLaMa 2

TheCryptocurrencyPost

2 years ago


As AI advances, the research community’s access to generative AI tools such as language models is essential for innovation. However, many of today’s AI models sit behind proprietary walls, which limits what outside researchers can build on them. Meta’s release of LLaMA 2 is set to democratize this space, empowering researchers and commercial users worldwide to explore and push the boundaries of what AI can achieve.

In this article, we explain the Meta LLaMa model and its latest version LLaMa 2.

What is LLaMa?

In February 2023, Meta announced LLaMA, which stands for Large Language Model Meta Artificial Intelligence. This large language model (LLM) was released in several sizes, ranging from 7 billion to 65 billion parameters. The LLaMa models differ in parameter count and in the amount of training data¹:

7B parameters (trained on 1 trillion tokens)
13B parameters (trained on 1 trillion tokens)
33B parameters (trained on 1.4 trillion tokens)
65B parameters (trained on 1.4 trillion tokens)

Meta AI states that LLaMa is a smaller language model, which makes it better suited to retraining and fine-tuning. This is a benefit because fine-tuned models fit commercial organizations and specific use cases better than general-purpose ones.
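To give a rough sense of why model size matters for retraining and fine-tuning, the sketch below estimates the memory needed just to hold the weights of each LLaMa size. It assumes nominal parameter counts (7B taken as exactly 7 billion, and so on) and counts weights only; activations, gradients and optimizer state would add to these figures. The function and dictionary names are our own, chosen for illustration.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# Assumption: weights only; activations, gradients, optimizer state
# and any inference cache are ignored.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}

# Nominal sizes from the list above (assumed to be exact for simplicity).
MODEL_SIZES = {"7B": 7e9, "13B": 13e9, "33B": 33e9, "65B": 65e9}


def weight_memory_gb(num_params: float, dtype: str = "fp16") -> float:
    """Return the approximate weight memory in gigabytes (10^9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for name, params in MODEL_SIZES.items():
        print(f"LLaMa {name}: ~{weight_memory_gb(params):.0f} GB in fp16")
```

Under these assumptions the 7B model needs about 14 GB for its weights in 16-bit precision, while the 65B model needs about 130 GB, which is why the smaller variants are far easier to load and adapt on modest hardware.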

For fine tuning of LLMs for enterprise purposes, take a look at our guide.

Unlike many powerful large language models that are typically only available via restricted APIs, Meta AI has chosen to make LLaMA’s model weights accessible to the AI research community under a noncommercial license. Access was initially granted on a case-by-case basis to academic researchers and to people affiliated with government, civil society, and academic organizations worldwide.

How was LLaMa trained?

Similar to other large language models, LLaMA operates by receiving a string of words as input and anticipating the next word to iteratively produce text.
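The predict-append-repeat loop described above can be illustrated with a toy model. The sketch below is not how LLaMa works internally: it replaces the neural network with simple word-pair counts, and the corpus and function names are made up for the example. What it does share with a real LLM is the generation loop, where each predicted word is appended to the input before the next prediction.

```python
from collections import Counter, defaultdict


def train_bigram(text: str) -> dict:
    """Count which word follows which word in the training text."""
    counts = defaultdict(Counter)
    words = text.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts


def generate(counts: dict, prompt: str, max_new_words: int = 5) -> str:
    """Greedy autoregressive loop: predict a word, append it, repeat."""
    words = prompt.split()
    if not words:
        return prompt
    for _ in range(max_new_words):
        followers = counts.get(words[-1])
        if not followers:
            break  # the toy model has never seen anything after this word
        next_word = followers.most_common(1)[0][0]
        words.append(next_word)
    return " ".join(words)


# Hypothetical miniature "training set" for illustration only.
corpus = "the model reads the prompt and the model predicts the next word"
model = train_bigram(corpus)
print(generate(model, "prompt", max_new_words=2))  # prints "prompt and the"
```

A real LLM conditions on the whole preceding text rather than only the last word, and usually samples from a probability distribution instead of always taking the most frequent continuation.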

The training data prioritized text from the 20 most widely spoken languages, particularly those written in the Latin and Cyrillic scripts.

The training data of LLaMa is mostly from large public websites and forums such as²:

Webpages scraped by CommonCrawl
Open source repositories of source code from GitHub
Wikipedia in 20 different languages
Public domain books from Project Gutenberg
The LaTeX source code for scientific papers uploaded to ArXiv
Questions and answers from Stack Exchange websites

How does LLaMa perform compared to other large language models?

According to the creators of LLaMA, the model with 13 billion parameters outperforms GPT-3 (which has 175 billion parameters) on most Natural Language Processing (NLP) benchmarks.³ Furthermore, the largest model, with 65 billion parameters, is competitive with top-tier models like PaLM and Chinchilla.

Figure 1. LLaMa vs other LLMs on a reasoning task (Source: LLaMa research paper)

Truthfulness & bias

LLaMa scores higher than GPT-3 on the truthfulness benchmark (TruthfulQA) that the paper uses to compare the two models. However, as the results show, LLMs still need improvement in terms of truthfulness.

Figure 2. LLaMa vs GPT-3 on a truthfulness test (Source: LLaMa research paper)

LLaMa with 65B parameters produces less biased responses than other large LLMs such as GPT-3 and OPT.

Figure 3. LLaMa vs GPT-3 and OPT on response bias (Source: LLaMa research paper)

What is LLaMa 2?

On 18 July 2023, Meta and Microsoft jointly announced their support for the LLaMa 2 family of large language models on the Azure and Windows platforms.⁴ Both Meta and Microsoft are united in their commitment to democratizing AI and making AI models widely accessible, and Meta is adopting an open stance with LLaMa 2. For the first time, the model is available for both research and commercial use.

LLaMa 2 is designed to help developers and organizations create generative AI tools and experiences. Meta and Microsoft say they want to give developers the freedom to choose the kinds of models they build on, and they endorse both open and frontier models.

Who can use LLaMa 2?

Customers of Microsoft’s Azure platform can fine-tune and use the 7B, 13B, and 70B-parameter LLaMa 2 models.
LLaMa 2 is also available through Amazon Web Services, Hugging Face, and other providers.⁵
LLaMa 2 is being optimized to run locally on Windows. Developers working on Windows can use LLaMa 2 by targeting the DirectML execution provider through the ONNX Runtime.
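As a minimal sketch of what targeting the DirectML execution provider can look like in Python, the snippet below asks ONNX Runtime which providers it offers and prefers DirectML when present, falling back to the CPU otherwise. It assumes the `onnxruntime-directml` package is installed and that a model has already been exported to ONNX; the file name `llama2.onnx` and the helper names are hypothetical, and this is not an official Meta or Microsoft recipe.

```python
from pathlib import Path

# Provider names as used by ONNX Runtime; DirectML first, CPU as fallback.
PREFERRED = ["DmlExecutionProvider", "CPUExecutionProvider"]


def choose_providers(available: list) -> list:
    """Keep the preferred providers that this ONNX Runtime build offers."""
    return [p for p in PREFERRED if p in available]


def open_session(model_path: str):
    """Create an inference session on the best available provider."""
    # Imported lazily so choose_providers() works without onnxruntime installed.
    import onnxruntime as ort

    providers = choose_providers(ort.get_available_providers())
    return ort.InferenceSession(model_path, providers=providers)


if __name__ == "__main__":
    model_path = "llama2.onnx"  # hypothetical path to an exported ONNX model
    if Path(model_path).exists():
        session = open_session(model_path)
        print("Running on:", session.get_providers())
    else:
        print(f"{model_path} not found; export or download an ONNX model first")
```

Listing the CPU provider after DirectML means the same script still runs on machines without a DirectML-capable GPU, only more slowly.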


1. “Introducing LLaMA: A foundational, 65-billion-parameter language model.” Meta AI, 24 February 2023, https://ai.facebook.com/blog/large-language-model-llama-meta-ai/. Accessed 24 July 2023.
2. “LLaMA.” Wikipedia, https://en.wikipedia.org/wiki/LLaMA. Accessed 24 July 2023.
3. “LLaMA: Open and Efficient Foundation Language Models.” arXiv, 13 June 2023, https://arxiv.org/pdf/2302.13971.pdf. Accessed 24 July 2023.
4. “Microsoft and Meta expand their AI partnership with LLama 2 on Azure and Windows – The Official Microsoft Blog.” The Official Microsoft Blog, 18 July 2023, https://blogs.microsoft.com/blog/2023/07/18/microsoft-and-meta-expand-their-ai-partnership-with-llama-2-on-azure-and-windows/. Accessed 24 July 2023.
5. “Meta and Microsoft Introduce the Next Generation of Llama.” Meta AI, 18 July 2023, https://ai.meta.com/blog/llama-2/. Accessed 24 July 2023.


Cem has been the principal analyst at AIMultiple since 2017. AIMultiple informs hundreds of thousands of businesses (as per similarWeb) including 55% of Fortune 500 every month.

Cem’s work has been cited by leading global publications including Business Insider, Forbes, Washington Post, global firms like Deloitte, HPE and NGOs like World Economic Forum and supranational organizations like European Commission. You can see more reputable companies and resources that referenced AIMultiple.

Throughout his career, Cem served as a tech consultant, tech buyer and tech entrepreneur. He advised enterprises on their technology decisions at McKinsey & Company and Altman Solon for more than a decade. He also published a McKinsey report on digitalization.

He led the technology strategy and procurement of a telco while reporting to the CEO. He also led the commercial growth of the deep tech company Hypatos, which reached a 7-digit annual recurring revenue and a 9-digit valuation from 0 within 2 years. Cem’s work at Hypatos was covered by leading technology publications like TechCrunch and Business Insider.

Cem regularly speaks at international technology conferences. He graduated from Bogazici University as a computer engineer and holds an MBA from Columbia Business School.
