Meta's LLaMA: The Smaller Yet Powerful Large Language Model
tl;dr, in case you decide not to read the paper
Meta AI's research paper: "LLaMA: Open and Efficient Foundation Language Models"
LLMs, and the GPT family in particular, have garnered significant attention in the AI community, with many exploring their potential to revolutionize our world.
Even as someone who has worked in machine learning for a while, I found it took substantial time and effort to understand the papers that describe how various LLM technologies work. For those looking to learn more, I highly recommend the Transformer Architecture blog as an excellent introduction to the topic. Google's ML blogs on neural networks, embeddings, and collaborative filtering also provide practical, hands-on examples. Lastly, I recommend two well-written foundational papers: "Attention Is All You Need" and "Training Compute-Optimal Large Language Models." Both delve into the technical details behind LLMs and provide valuable insight for anyone trying to understand the field.
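To make the "Attention Is All You Need" recommendation a bit more concrete, here is a minimal NumPy sketch of scaled dot-product attention, the core operation of the transformer. The function name and toy shapes are my own illustration, not taken from any particular implementation.

```python
# Minimal sketch of scaled dot-product attention (illustrative only).
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q, K, V: arrays of shape (sequence_length, d_k)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V                              # weighted sum of value vectors

# Toy example: 4 tokens with 8-dimensional query/key/value vectors.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)
```

Every token attends to every other token this way, which is what lets transformers model long-range context without recurrence.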
So, what is this LLaMA up to?
Inference performance >> training performance: LLaMA is built on the principle that the best inference performance is not necessarily achieved by the largest models, but by smaller models trained on more data. The LLaMA models are therefore smaller but trained on more tokens (pieces of words), which also makes them easier to fine-tune and adapt to specific use cases. Meta trained LLaMA-65B and LLaMA-33B on 1.4 trillion tokens, while the smallest model, LLaMA-7B, was trained on one trillion tokens.
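To see why this trade-off matters, here is a back-of-the-envelope sketch of my own (not from the paper): training cost scales roughly with parameters × tokens, while the cost of generating each token at inference time scales with parameters alone. The ~6ND and ~2N approximations are the standard estimates from the scaling-law literature, and the GPT-3 figures (~175B parameters, ~300B training tokens) come from the GPT-3 paper.

```python
# Rough FLOP estimates, using the common 6*N*D (training) and 2*N (per-token
# inference) approximations. These are illustrative figures, not from the LLaMA paper.
def train_flops(params, tokens):
    return 6 * params * tokens        # ~6 FLOPs per parameter per training token

def inference_flops_per_token(params):
    return 2 * params                 # ~2 FLOPs per parameter per generated token

models = {
    "LLaMA-7B":   (7e9,   1.0e12),
    "LLaMA-13B":  (13e9,  1.0e12),
    "LLaMA-65B":  (65e9,  1.4e12),
    "GPT-3 175B": (175e9, 0.3e12),
}

for name, (n_params, n_tokens) in models.items():
    print(f"{name:11s} train ~{train_flops(n_params, n_tokens):.1e} FLOPs, "
          f"inference ~{inference_flops_per_token(n_params):.1e} FLOPs/token")
```

The point of the sketch: a 13B model trained on three times as many tokens as GPT-3 costs more to train per parameter, but it is paid back many times over because every single inference call is more than an order of magnitude cheaper.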
Smaller yet powerful: The resulting LLaMA models range from 7B to 65B parameters and deliver competitive performance compared to the best existing LLMs. Notably, LLaMA-13B outperforms GPT-3 on most benchmarks despite being more than 10× smaller, and LLaMA-65B is competitive with Chinchilla-70B and PaLM-540B. The models were evaluated on a variety of task types, including Common Sense Reasoning, Closed-book Question Answering, Reading Comprehension, Mathematical Reasoning, and Code Generation.
Trained on publicly available data, compatible with open sourcing: Unlike previous work that relied on proprietary datasets, LLaMA is trained exclusively on publicly available data, demonstrating that state-of-the-art performance can be achieved without proprietary data. See below for details on the datasets used to train LLaMA.
LLaMA can be run on a single GPU: This makes it a more accessible and efficient option for testing new approaches, validating others' work, and exploring new use cases, because LLaMA requires far less computing power and memory than the largest language models. By leveraging LLaMA, researchers and developers can save time and resources while still achieving high-quality results.
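As a rough illustration of what "runs on a single GPU" looks like in practice, here is a minimal sketch of loading a 7B-parameter checkpoint in half precision with the Hugging Face transformers and accelerate libraries. The checkpoint path is a placeholder; you need access to converted LLaMA weights, and this is one common setup rather than the procedure described in the paper.

```python
# Minimal sketch: load a ~7B-parameter causal LM on one GPU in fp16.
# "path/to/llama-7b" is a hypothetical placeholder for converted weights.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path/to/llama-7b"

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,  # ~2 bytes per parameter, roughly 13 GB for 7B params
    device_map="auto",          # let accelerate place the weights on the available GPU
)

prompt = "The LLaMA paper shows that"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

With half precision (or 8-bit quantization on top of it), the 7B and 13B models fit comfortably on a single consumer or workstation GPU, which is exactly what makes them attractive for experimentation.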
Available to the research community: Meta AI is releasing LLaMA in several sizes (7B, 13B, 33B, and 65B parameters), which will help democratize access to and study of LLMs, since the smaller models can run on a single GPU. This stands in contrast to OpenAI's recent plans not to open source GPT-4.5. I believe that releasing these models to the research community will only accelerate the development of large language models and help address issues such as toxicity and bias.