
How Small Language Models Are Making a Big Impact



Artificial Intelligence (AI) has seen remarkable growth in recent years, driving innovation and efficiency across nearly every industry. A key part of this revolution is the development of language models—AI systems designed to understand and generate human language.

Language models, which power applications like chatbots, virtual assistants, and automated content creation, have typically been large, complex systems. Although these large models have immense capabilities, they cost a great deal in computing power, time, and resources, and for many companies, particularly small and medium-sized businesses, they can be difficult to implement and maintain. That challenge has led to the rise of smaller, more efficient language models that perform many of the same tasks but require far less computational power. They can also be deployed more quickly and at a fraction of the cost, making AI solutions accessible to more organizations. Small language models (SLMs) are also more scalable, which means businesses can adopt AI in increments rather than investing heavily upfront.

Bigger Isn’t Always Better

Large language models (LLMs) became instantly popular thanks to their novelty (and ChatGPT’s success), as well as their ability to process and generate coherent, contextually relevant text with human-like fluency.

LLMs are trained on massive amounts of data and contain billions upon billions of parameters, which help them learn patterns and semantic relationships and generate intelligent responses.

However, there are drawbacks. For one, large language models require massive computational resources, and the expense of training and operating them is substantial. The hardware infrastructure needed to train these models includes high-performance GPUs and extensive data storage. The training itself can take weeks or even months, and the demand on resources limits who can develop and use such models.

On the other hand, smaller language models drastically cut the costs of training and inference. But won’t that come at the expense of performance? After all, you get what you pay for, right? Maybe not, as Ronen Eldan, a renowned mathematician at Microsoft Research, discovered in a somewhat serendipitous way.

In 2022, Ronen joined Microsoft Research to study generative language models and develop a cheaper and faster way to use them. Initially, he planned to train the models on specific tasks using smaller datasets, but the breakthrough came one afternoon while reading a story to his 5-year-old daughter.

As he read aloud, he noticed how a children’s story uses a limited vocabulary to convey meaning, leaving out irrelevant details. Despite its simplicity, it conveyed complex ideas and kept his daughter engaged. It made him wonder: if a simple, focused narrative could captivate and communicate effectively, why couldn’t a language model trained on a smaller, more targeted dataset perform as well? Until then, the standard way to train large language models had been to use massive amounts of data from all over the internet, much of it irrelevant to the business task at hand.

Inspired by this insight, Eldan and other Microsoft researchers began creating a compact dataset, starting with a vocabulary of about 3,000 words that included a roughly equal number of nouns, verbs, and adjectives. They then asked a large language model to create a children’s story using one noun, one verb, and one adjective from the list, a prompt they repeated millions of times over several days, generating millions of tiny children’s stories.
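
In code, that data-generation loop might look something like the following minimal sketch. It assumes an OpenAI-style chat completion API; the word lists, model name, and prompt wording are illustrative placeholders, not the researchers’ actual setup.

```python
# Minimal sketch of a TinyStories-style data-generation loop.
# Word lists, model name, and prompt wording are illustrative placeholders.
import random
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

nouns = ["dog", "garden", "boat"]        # in practice, roughly 1,000 nouns
verbs = ["jump", "find", "share"]        # roughly 1,000 verbs
adjectives = ["happy", "tiny", "brave"]  # roughly 1,000 adjectives

def generate_story() -> str:
    noun = random.choice(nouns)
    verb = random.choice(verbs)
    adjective = random.choice(adjectives)
    prompt = (
        f"Write a short children's story, using only words a 4-year-old would "
        f"understand, that includes the noun '{noun}', the verb '{verb}', "
        f"and the adjective '{adjective}'."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Repeating this call millions of times yields a large synthetic training set.
sample_stories = [generate_story() for _ in range(3)]
```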

They called the resulting dataset “TinyStories” and used it to train very small language models of around 10 million parameters. To their surprise, when prompted to create its own stories, the small language model trained on TinyStories generated fluent narratives with perfect grammar.

Next, they experimented with carefully selected, publicly available data filtered for educational value and content quality to train Microsoft’s small language model, Phi-1. After collecting publicly available information into an initial dataset, they used a prompting and seeding formula inspired by the one used for TinyStories, made more sophisticated so it could capture a wider range of data. To ensure high quality, they repeatedly filtered the resulting content before feeding it back into a language model for further synthesis. In this way, over several weeks, they built up a corpus large enough to train a more capable SLM.
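
A filter-and-resynthesize loop of that general shape could be sketched as follows. The quality heuristic, model name, and prompt below are stand-ins for illustration only, not the pipeline Microsoft actually used.

```python
# Illustrative filter-and-resynthesize loop for building a curated training corpus.
# The quality heuristic, model name, and prompt are placeholders, not the actual
# pipeline used to build the Phi-1 training data.
from openai import OpenAI

client = OpenAI()

def passes_quality_filter(text: str) -> bool:
    # Stand-in heuristic: keep passages that are reasonably long and look like prose.
    return len(text.split()) > 50 and not text.isupper()

def resynthesize(seed_passage: str) -> str:
    # Ask a larger model to turn a filtered passage into new, textbook-quality text.
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{
            "role": "user",
            "content": "Write a short, textbook-quality explanation inspired by "
                       "the following passage:\n\n" + seed_passage,
        }],
    )
    return response.choices[0].message.content

raw_passages = ["...", "..."]  # publicly available text collected earlier
curated_dataset = [resynthesize(p) for p in raw_passages if passes_quality_filter(p)]
```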

This innovative training approach has produced a new class of more capable small language models that make AI more accessible. These models range from roughly 1 million to 33 million parameters, which is still only about 2% of the scale of GPT-2.

No doubt, large language models have created exciting new opportunities to be more productive and creative with AI. But their size means they require significant computing resources to operate. While LLMs will remain the gold standard for many types of complex tasks, SLMs are a better fit for simpler tasks: they are more accessible and easier to use for organizations with limited resources. Plus, they can be easily fine-tuned for specific needs.

Small But Mighty: The Rise of Small Language Models

Microsoft and many others have been developing small language models (SLMs) that offer many of the same capabilities found in LLMs but are trained on smaller datasets.

Small language models, the pocket-sized versions of large language models, use machine learning to recognize patterns and relationships so they can produce realistic, natural-language responses. But while LLMs are enormous and need a hefty dose of computational power and memory, SLMs such as Microsoft’s Phi-3 are trained on smaller, curated datasets with fewer parameters. They are more compact and can even be used offline, without an internet connection. That makes them great for apps on devices like a laptop or smartphone, where you can ask basic questions without getting into the weeds. Keeping the computation on the device can also save costs, since data doesn’t have to be sent to the cloud for processing. Further, you can ground the model in the data that’s on your device and personalize it to your needs.
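
For a sense of what local use looks like in practice, here is a minimal sketch that loads a small instruction-tuned model with the Hugging Face transformers library and asks it a basic question on-device. The model ID and generation settings are illustrative, and a recent transformers release is assumed (older versions may need trust_remote_code=True for Phi-3 models).

```python
# Minimal sketch: running a small language model locally with Hugging Face transformers.
# Model ID and generation settings are illustrative; a recent transformers release
# is assumed (older versions may require trust_remote_code=True for Phi-3 models).
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "microsoft/Phi-3-mini-4k-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)  # runs on CPU if no GPU is available

generator = pipeline("text-generation", model=model, tokenizer=tokenizer)

prompt = "In two sentences, what are the main benefits of small language models?"
result = generator(prompt, max_new_tokens=120, do_sample=False)
print(result[0]["generated_text"])
```

Once the model files are downloaded, a loop like this can run entirely on the device, with no data leaving it.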

Phi-3, Microsoft’s Small Language Model, Makes Its Debut

Microsoft recently unveiled the Phi-3 family of open models, the most capable and cost-effective small language models available. Thanks to training innovations, Phi-3 models outperform similarly sized models on language, coding, and math benchmarks. The first publicly available model in this family, Phi-3-mini, has 3.8 billion parameters and was trained on 3.3 trillion tokens of carefully curated data; despite its relatively small size, it’s powerful.

Why You Should Care

For one, the reduced size of this language model makes it suitable to run locally, such as an app on a smartphone. Unlike larger models like ChatGPT that live in the cloud, Phi-3 does not require an internet connection.

Microsoft has confirmed that these SLMs will be more affordable than the LLMs, which could democratize generative AI, making it accessible to small businesses.

For instance, a business could use Phi-3 to summarize long documents or extract relevant insights and industry trends from market research reports. Another organization might use Phi-3 to generate content for marketing or sales teams, such as product descriptions or social media posts. A company might also use Phi-3 to power a support chatbot to answer customers’ basic questions about their plans or service upgrades.
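
As a concrete illustration of the first use case, the sketch below asks a locally hosted SLM to summarize a report. The model ID, input file, prompt wording, and generation settings are assumptions for the example, not a prescribed setup.

```python
# Illustrative document-summarization example with a locally hosted small language model.
# Model ID, input file, prompt wording, and generation settings are assumptions.
from transformers import pipeline

summarizer = pipeline("text-generation", model="microsoft/Phi-3-mini-4k-instruct")

report_text = open("market_research_report.txt").read()  # hypothetical input file

prompt = (
    "Summarize the following market research report in five bullet points, "
    "highlighting the most important industry trends:\n\n" + report_text
)

summary = summarizer(prompt, max_new_tokens=300, do_sample=False)
print(summary[0]["generated_text"])
```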

“In the near term, SLMs won’t replace large language models,” says Jason Milgram, SVP and Azure Leader at OZ Digital Consulting. “While there’s still a gap between small language models and the intelligence of large cloud-based models, SLMs are well-suited for tasks like writing marketing newsletters, generating email subject lines, or drafting social media posts that don’t require the power of an LLM,” he adds.

Large or Small Language Models—What’s Right for You?

The number of parameters in a model usually dictates its size and complexity. Larger models with more parameters are generally more capable but require more computational resources. The choice of size often depends on the specific problem being addressed.

When choosing a language model, consider the tasks you want to accomplish. For example, if your primary goal is a task that requires a deep understanding of natural language, such as sentiment analysis, question answering, or text summarization, then a large language model is right for you. In contrast, for tasks such as text classification or simple language generation, a small language model might be a better choice.

Data will also influence your choice. Large language models require vast amounts of training data to achieve high quality. If you have limited data, a small language model might be preferable.

The choice between a large and small language model depends on an organization’s specific needs, the complexity of the task, and the available resources. SLMs are well suited for organizations looking to build applications that can run locally on a device, and for tasks that don’t require extensive reasoning or that need a quick response.

Partner with a Trusted Microsoft Partner

As a certified Microsoft partner with over a quarter century of experience in delivering transformative technology solutions, we can guide you wherever you are in your AI journey. Learn more here or reach out to schedule a free consultation today.