Highlights:
- Microsoft described in a research study how it used two post-training optimization strategies to improve the output quality of Phi-4.
- Phi-4 is the latest in a series of small language models that large tech companies have made openly available over the past year.
Microsoft Corp. has released the code for Phi-4, a small language model that can solve mathematical problems and generate text.
The company first detailed the model last month. Initially, Phi-4 was accessible only through Azure AI Foundry, Microsoft’s artificial intelligence development service. It is now available for download on Hugging Face, a popular platform for hosting open-source AI projects.
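For readers who want to try the model, a minimal sketch of loading it from Hugging Face with the transformers library might look like the following. The repository id "microsoft/phi-4", the prompt and the generation settings are assumptions based on the standard Hugging Face workflow, not details from the article, so check the model card before running it.

```python
# Minimal sketch: downloading and running Phi-4 from Hugging Face with transformers.
# The repository id "microsoft/phi-4" is assumed; consult the model card for the
# exact id and recommended generation settings.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/phi-4"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Solve: what is the derivative of x**3 + 2*x?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```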
Phi-4 is the fourth iteration of a small language model series that Microsoft introduced in 2023. It has 14 billion parameters, the configuration settings that determine how a neural network processes information. Microsoft researchers trained it over the course of 21 days on a cluster of 1,920 H100 graphics processing units from Nvidia Corp.
The model is built on the industry-standard Transformer architecture, which underpins most large language models. Transformer models interpret user prompts by breaking the input into individual words and determining each word’s meaning from the surrounding text. They also give priority to the parts of that surrounding text deemed most relevant.
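As a rough illustration of that weighting step, the sketch below computes scaled dot-product attention over a handful of toy token vectors with NumPy. The tiny dimensions and random vectors are purely illustrative and are not taken from Phi-4.

```python
# Illustrative sketch of the attention weighting a Transformer applies:
# each token scores every other token and attends most to the highest-scoring ones.
import numpy as np

rng = np.random.default_rng(0)
seq_len, dim = 5, 8                      # 5 toy tokens, 8-dimensional vectors (illustrative only)
queries = rng.standard_normal((seq_len, dim))
keys = rng.standard_normal((seq_len, dim))
values = rng.standard_normal((seq_len, dim))

scores = queries @ keys.T / np.sqrt(dim)                           # similarity of each token to the others
weights = np.exp(scores) / np.exp(scores).sum(-1, keepdims=True)   # softmax: each row sums to 1
context = weights @ values                                         # weighted mix of the surrounding tokens
print(weights.round(2))                                            # rows show how strongly each token attends to the rest
```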
Phi-4 implements a so-called decoder-only version of the Transformer architecture. To determine a word’s meaning, a standard Transformer model examines the text both before and after that word. Decoder-only models consider only the text that precedes the word, so they process less data, which lowers inference costs.
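To show what “only the text that precedes the word” means in practice, the snippet below applies a causal mask so each position can attend only to itself and earlier positions. This is a generic sketch of decoder-only attention, not Phi-4’s actual implementation.

```python
# Sketch of decoder-only (causal) attention: positions after the current token are masked out.
import numpy as np

rng = np.random.default_rng(1)
seq_len, dim = 5, 8
queries = rng.standard_normal((seq_len, dim))
keys = rng.standard_normal((seq_len, dim))

scores = queries @ keys.T / np.sqrt(dim)
mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)       # True above the diagonal = future tokens
scores[mask] = -np.inf                                             # future tokens receive zero attention weight
weights = np.exp(scores) / np.exp(scores).sum(-1, keepdims=True)
print(weights.round(2))   # lower-triangular: each row attends only to itself and earlier tokens
```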
In a research paper, Microsoft described how it used two post-training optimization techniques, supervised fine-tuning and direct preference optimization, to improve Phi-4’s output quality. Both involve supplying a language model with examples of how it should respond to prompts.
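The sketch below conveys the spirit of those two steps rather than Microsoft’s actual training code: supervised fine-tuning maximizes the likelihood of curated reference answers, while direct preference optimization nudges the model toward a “chosen” answer over a “rejected” one relative to a frozen reference model. The function names, beta value and tensor shapes are illustrative assumptions that follow the published DPO formulation.

```python
# Simplified sketch of the two post-training objectives (not Microsoft's implementation).
import torch
import torch.nn.functional as F

def sft_loss(logits, target_ids):
    """Supervised fine-tuning: standard next-token cross-entropy on curated prompt/answer pairs."""
    return F.cross_entropy(logits.view(-1, logits.size(-1)), target_ids.view(-1))

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Direct preference optimization: reward the policy for preferring the chosen answer
    more strongly than the frozen reference model does. beta scales the preference margin."""
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    return -F.logsigmoid(chosen_reward - rejected_reward).mean()
```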
In an internal evaluation, Microsoft compared Phi-4 with Llama 3.3 70B, an LLM with five times as many parameters. The company says Phi-4 scored higher on the well-known GPQA and MATH benchmarks, two test datasets that contain science questions and math problems, respectively.
Phi-4 is the latest in a series of small language models that large tech companies have made openly available over the past year.
Google LLC unveiled its Gemma line of small language models last February. The algorithms in the series range from 2 billion to 27 billion parameters, and Google claims the 27-billion-parameter version can outperform models more than twice its size.
Meta Platforms Inc., in turn, recently released two Llama 3.2 models with fewer than 5 billion parameters. The company later open-sourced even more efficient versions of those models built with a machine learning technique called quantization, which compresses the data a neural network ingests and thereby reduces the hardware needed to process it.
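As a rough idea of what quantization does, the snippet below converts floating-point weights to 8-bit integers plus a single scale factor and then reconstructs them. Real schemes, including the 4-bit variants Meta shipped, are more sophisticated, so treat this as a simplified sketch.

```python
# Simplified sketch of weight quantization: store int8 values plus one scale factor
# instead of 32-bit floats, roughly quartering the memory the weights occupy.
import numpy as np

weights = np.random.default_rng(0).standard_normal(1000).astype(np.float32)

scale = np.abs(weights).max() / 127.0                 # map the float range onto the int8 range
quantized = np.round(weights / scale).astype(np.int8)
dequantized = quantized.astype(np.float32) * scale    # approximate reconstruction at inference time

print(f"memory: {weights.nbytes} bytes -> {quantized.nbytes} bytes")
print(f"max reconstruction error: {np.abs(weights - dequantized).max():.4f}")
```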