These companies include ecommerce firm Flipkart, which has developed an SLM to help it understand the interests of shoppers. Another model was developed jointly by Hyderabad-based non-profit Swecha (formerly Free Software Foundation Andhra Pradesh) and call centre software provider Ozonetel.
Several large language models already exist in Indian languages. Flipkart, after working on innovations with generative AI tools such as ChatGPT, a large language model, and Stable Diffusion, has lately started experimenting with small language models.
“We are using some open-source models and finetuning them for our use cases,” said Mayur Datar, Flipkart’s chief data scientist.
The online retail platform is training the model on curated datasets, such as customer search queries and product reviews, to understand a user's intent: whether they are shopping or comparing products.
Language models are also useful for summarising customer reviews in multiple languages, which would otherwise be difficult to sift through manually.
Flipkart’s SLMs are finetuned from open-source foundational language models such as Meta’s Llama 2, and range from 3 billion to 13 billion parameters in size. Using finetuned SLMs brings useful trade-offs, enhancing the specificity of the model and improving characteristics like cost and latency, Datar said.
“There are very large language models like ChatGPT, Gemini, etc., with hundreds of billions of parameters, trained to answer pretty much any query under the sun. However, this comes at the cost of high compute requirements, which translate into higher costs, higher latency, etc. But sometimes what you need for your specific task can be accomplished by a much smaller model built on a smaller curated dataset and finetuned accordingly,” he added.
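The cost gap Datar describes can be roughed out with a common rule of thumb: decoding one token costs roughly two floating-point operations per model parameter. The parameter counts below are illustrative (a 7-billion-parameter SLM versus a hypothetical 175-billion-parameter LLM), and this is a back-of-envelope estimate, not a vendor benchmark:

```python
def decode_flops(n_params: float, n_tokens: int) -> float:
    """Rough inference cost: ~2 FLOPs per parameter per generated token."""
    return 2.0 * n_params * n_tokens

# Illustrative sizes: a 7B-parameter SLM vs a 175B-parameter LLM,
# each generating a 500-token answer.
slm = decode_flops(7e9, 500)
llm = decode_flops(175e9, 500)
print(f"SLM: {slm:.1e} FLOPs, LLM: {llm:.1e} FLOPs, ratio: {llm / slm:.0f}x")
```

Under this estimate the smaller model needs 25 times fewer operations per answer, which is where the cost and latency savings come from.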
For Chaitanya Chokkareddy, chief technology officer of Ozonetel, the idea of creating a Telugu SLM came after he chanced upon a paper by Microsoft researchers titled ‘TinyStories: How Small Can Language Models Be and Still Speak Coherent English?’. It introduced a synthetic dataset of short stories, generated by GPT-3.5 and GPT-4, containing only words that typical three- to four-year-olds understand.
He collaborated with Swecha and the International Institute of Information Technology, Hyderabad, to build an SLM from a dataset of 40,000 pages of Telugu stories, authored by 8,000 students from 20 colleges who participated in a ‘datathon’ led by the two organisations.
SLMs are built using the same methodology as any larger model, but with a smaller neural network, fewer parameters and less training data.
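To make "fewer parameters" concrete, the bulk of a decoder-only transformer's weights can be counted directly from its configuration. The sketch below uses Llama 2 7B's published shape (vocabulary 32,000, hidden size 4,096, 32 layers, FFN size 11,008) and deliberately ignores the small contribution of layer norms and biases:

```python
def approx_transformer_params(vocab: int, d_model: int,
                              n_layers: int, d_ffn: int) -> int:
    """Approximate weight count for a decoder-only transformer.

    Counts input/output embeddings, attention projections (Q, K, V, output)
    and a SwiGLU-style FFN (three matrices per layer); layer norms and
    biases are negligible at this scale.
    """
    embeddings = 2 * vocab * d_model          # untied input + output embeddings
    attention = 4 * d_model ** 2              # Q, K, V and output projections
    ffn = 3 * d_model * d_ffn                 # gate, up and down projections
    return embeddings + n_layers * (attention + ffn)

# Llama 2 7B's published configuration
n = approx_transformer_params(vocab=32000, d_model=4096,
                              n_layers=32, d_ffn=11008)
print(f"{n / 1e9:.2f}B parameters")           # ≈ 6.74B, matching the "7B" label
```

Shrinking the layer count, hidden size or vocabulary in this formula shows how quickly an SLM's memory and compute footprint falls relative to a frontier model.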
Some of the popular SLMs currently include Meta’s Llama 2 with 7 billion parameters, Microsoft’s Phi-2 with 2.7 billion parameters and Orca 2 with 13 billion parameters, as well as Stability AI’s Stable Beluga with 7 billion parameters.
Like Flipkart’s SLM, Ozonetel’s model is also retrained and finetuned from Meta’s Llama 2.
Ozonetel’s Chokkareddy said they will also finetune the open-source Mistral model, released by Paris-based startup Mistral AI, to compare benchmarks.
Experts believe SLMs have the potential to significantly drive down data storage requirements and the cost of adopting genAI technologies.
“SLMs offer greater personalisation, are easier to deploy, can be trained for domain-specific use cases and can better manage data security,” said Krish Ramineni, CEO of Fireflies.ai.
“Eventually, models that can be hosted on your phone or browser will match the quality of the best large models today, which will be a big boost for security and privacy. This will open up lots of new business applications, especially for organisations that are not comfortable sending their private and sensitive data to other vendors,” he added.