Refuel AI, a company using large language models (LLMs) to generate high-quality training data for AI models, has emerged from stealth with $5.2 million in seed funding. The company said it will use the money to expand its team and strengthen its platform’s capabilities ahead of a launch in July.
Founded by Stanford graduates Nihit Desai and Rishabh Bhargava, Refuel has also opened access to AutoLabel, an open-source library that lets any AI team label data in its own environment with any LLM it chooses.
The offerings come as an answer to the data challenges that slow down AI development, keeping enterprises from embedding the next-gen technology into their products and business functions.
Every AI company needs AI-ready data
Every company is racing to become an AI company, bringing together in-house experts and third-party vendors to create models for different business use cases. It’s a tough task, and it starts with clean, labeled data. Get the data right, and the project can come to life.
Now, while companies have a lot of data at their disposal, not all of it is training-ready by default. The information has to be cleaned and annotated for training the model — a task that is typically handled by human teams and takes weeks to months. This just doesn’t scale for the demands of AI today.
“We spoke to lots of teams with amazing ideas for models and products – if only they had the right data to train them. It was then we realised that making clean, labelled data available super quickly was what we wanted to focus on,” the CEO said.
So, in 2021, the duo started Refuel and went on to build a dedicated platform that uses specialized LLMs to automate the creation and labeling of datasets (with quality on par with or better than humans) for every business and every use case.
According to the company, enterprise users will be able to use the platform by uploading their datasets and instructing the LLMs to label the data. They can also provide guidelines and a few examples to ensure that only high-quality, training-ready data comes out.
“Within an hour, they (users) will have enough data to start training their AI models, which they can then seamlessly connect into their model training infrastructure. As these teams collect more data (especially from production), they can re-route it into Refuel for labeling, measuring performance and improving their datasets for model re-training,” the CEO added.
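To make that workflow concrete, here is a minimal Python sketch of LLM-based labeling with guidelines and a few examples, followed by a simple agreement check against human labels. It is an illustration only, not Refuel's platform or the AutoLabel API: every function name is hypothetical, and `keyword_llm` is a toy stand-in for a real model call.

```python
from typing import Callable, List, Tuple


def build_prompt(guidelines: str, examples: List[Tuple[str, str]], text: str) -> str:
    """Assemble a few-shot labeling prompt: guidelines, labeled examples, then the new input."""
    lines = [guidelines.strip(), ""]
    for example_text, example_label in examples:
        lines.append(f"Text: {example_text}")
        lines.append(f"Label: {example_label}")
        lines.append("")
    lines.append(f"Text: {text}")
    lines.append("Label:")
    return "\n".join(lines)


def label_dataset(
    texts: List[str],
    guidelines: str,
    examples: List[Tuple[str, str]],
    llm: Callable[[str], str],
) -> List[str]:
    """Label every text by sending one prompt per item to the supplied LLM callable."""
    return [llm(build_prompt(guidelines, examples, t)).strip() for t in texts]


def agreement(predicted: List[str], reference: List[str]) -> float:
    """Fraction of items where the LLM label matches the human reference label."""
    if not reference:
        return 0.0
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


def keyword_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call; it only inspects the final input text."""
    last_text = prompt.rsplit("Text: ", 1)[1].lower()
    if "refund" in last_text or "broken" in last_text:
        return "complaint"
    return "other"


if __name__ == "__main__":
    guidelines = "Label each support message as 'complaint' or 'other'."
    examples = [
        ("My order arrived broken.", "complaint"),
        ("What are your opening hours?", "other"),
    ]
    texts = ["I want a refund for this.", "Do you ship to Canada?"]
    labels = label_dataset(texts, guidelines, examples, keyword_llm)
    print(labels)
    # Compare against a small human-labeled sample to track quality over time.
    print(agreement(labels, ["complaint", "other"]))
```

In practice, a team would swap the stand-in for a call to whichever LLM it uses and run the agreement check on a small human-labeled sample each time new production data is labeled.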
In private beta tests by select enterprises, the offering was found to speed up the process of data creation and labeling by up to 100 times. Bhargava didn’t share the names of these companies but noted that Refuel AI is seeing interest from multiple verticals, from social media and fintech to healthcare, HR and ecommerce.
The road ahead
With this round, which was co-led by General Catalyst and XYZ Ventures, Refuel plans to grow its engineering team from six to 12 members and further invest in the platform and its LLM infrastructure to prepare for a commercial launch by the end of July. The company will also invest the capital in its open-source library and community.
“We’re hosting a competition to push the limits of LLM-powered data labeling, with prizes up to ten grand!” Bhargava noted.
Currently, in the data labeling space, the company competes with players like Tasq AI, Snorkel AI and SuperAnnotate.