What Is Snorkel.AI and Why Is It Worth Over 1 Billion Dollars?
A Billion Dollar Industry That Can't Be Ignored - Community Update #18
CTO’s Focus of the Week
A few community updates back, we discussed Greylock Partners and its recent investments in data and analytics ventures. Of the five we examined in that article, one has recently gotten a fresh round of funding. Snorkel AI has raised additional capital, bringing the total to $135 million and placing its value at $1 billion. In this article, we explore what Snorkel.AI is and why it is being funded, as well as the problem it is trying to solve amidst a competitive market.
What Is Snorkel.AI?
Snorkel AI, Inc. (www.snorkel.ai), established in 2019, is a technology start-up based in Palo Alto, California. The company’s aim is to “unlock a better, faster way to build AI applications.” Its executive team consists of nine members, including the five co-founders with expertise in technology and engineering. The company currently has over 50 employees, called “snorkelers,” and is hiring for multiple open positions as it continues to expand.
A Brief History of Snorkel AI
As a highlight in the history of artificial intelligence (AI), 2015 marked the first year machine learning (ML) beat humans at ImageNet Challenge run by a Stanford University professor. That same year, a team of Stanford AI lab researchers had something to prove: algorithms, models, or infrastructure are not the key to the success of ML projects—it is training data.
Their research, called Snorkel Project, began as part of a Defense Advanced Research Projects Agency (DARPA) program. DARPA’s Memex program involved researching the use of dark web data to fight human trafficking. In 2019, after successfully proving their point, the five-member team set out on their own to apply their findings to real-world applications. This was the start of Snorkel AI.
Snorkel Lands Multiple Rounds of VC-Backed Funding
In 2019, after launching out of stealth mode, Snorkel AI announced it had received a few million dollars in seed money. At that point, the start-up was just entering the revenue-generating stage. By 2020, it had raised Series A capital from Greylock, GV, and several other investors at an initial valuation of $135 million. Through subsequent rounds in 2021, it acquired Series B capital led by Lightspeed Venture Partners, at which time it announced the launch of its new Applications Studio tool.
Within only a few months since its Series B round, Snorkel AI attracted the attention of more investors. Led by Blackrock and Addition, it secured Series C funding, bringing its total to $85 million. Following is a breakdown of the fundraising history:
Seed Round – $3M (seed) at stealth stage – January 2019
Early Stage VC – $12M (Series A) at $135M valuation – July 2020
Early Stage VC – $35M (Series B) at [unknown] valuation – April 2021
Later Stage VC – $85M (Series C) at $1B valuation – August 2021
In less than three years, Snorkel AI had received backing from 10 top investors, which led to a valuation of $1 billion. This start-up's success lies in its ability to provide solutions for a challenging, yet important, data problem.
What Problem Is It Trying to Solve and Why Is It Important?
Snorkel AI specializes in image data labeling. According to its website, Snorkel is “an ML approach that utilizes programmatic labeling and statistical modeling to build and iteratively improve training datasets.” To achieve their goals, the team developed a technology platform called Snorkel Flow. This data-centric platform is the first of its kind, designed to remove the bottleneck of hand-labeled training data by training machine learning models in a fraction of the time.
Snorkel AI’s award-winning approach to training data has been used in technology products, medical applications and other real-world programs. It achieved this because its team recognized that the success of an AI system lies in its training data. To maintain a competitive advantage, companies are turning to AI to automate processes and help them with decision-making. Without large sets of labeled training data, neural networks cannot learn. Manual, hand-labeling has its limitations because it’s a slow process requiring a lot of man-power and is prone to errors.
What started out as a Stanford AI Lab spinout dating back to 2015 has turned into a full-fledged, VC-backed AI business. Snorkel AI has partnered with some of the world’s largest companies—Apple, Intel, IBM, Uber, Stanford Medicine and more—giving its data-labeling competitors a run for their money.
Who Is Snorkel.AI's Competition?
As the trend of businesses turning to AI increases at a rapid rate, more technology companies are tapping into opportunities to provide training data services. For tech start-ups, this has led to stiff competition against giants like Amazon. Besides large companies, Snorkel AI’s major competitors in the data labeling market include Appen, Labelbox, and CloudFactory. Scale AI is another competitor, which recently raised $325 million at a valuation of $7.3 billion.
These latest funding success stories show that venture capitalists consider data labeling service offerings to be crucial. To remain at a competitive advantage, Snorkel AI must leverage the fact that large companies and consulting firms hire contractors to perform data labeling. Such businesses, like Accenture, will have to contend with privacy risks when dealing in finance and other regulated fields.
Conclusion
The AI industry has grown by leaps and bounds as all types of organizations from start-ups and big tech to government and academia have joined the AI bandwagon. Like Greylock, venture capitalists are tapping into opportunities to support AI start-ups that offer data solutions to the challenges facing many businesses. Snorkel AI is a great example of how a research project grew into a billion-dollar business in a matter of a few years.
Ask A Data Consultant - Office Hours
Every newsletter I open up a day or two with a few slots for open office hours where my readers can sign up and you can ask me questions. I got to answer a lot of great questions so far and hopefully, they helped provide a lot of insights for those who signed up.
Sign Up Below:
Next Open Office Hours
Sign Up For My Next Office Hours on September the 8th at 8 AM - 10 AM PT or between 5 PM - 7 PM PT
Thanks To The SDG Community
I started writing this weekly update more seriously about 8-9 weeks ago. Since then I have gained hundreds of new subscribers as well as 13 supporters!
And all I can say is, Thank You!
You guys are keeping me motivated.
Every read, comment, like and financial subscription is amazing and I really appreciate all you.
If you want to help support this community consider clicking the link below.
Video Of The Week:We Don't Need Data Engineers, Just Better Tools - A Data Engineer Responds To A Data Scientist
Why do we need data engineers?
We just need better data science tools?
At least according to this one data scientist.
Articles Worth Reading
There are 20,000 new articles posted on Medium daily and that’s just Medium! I have spent a lot of time sifting through some of these articles as well as TechCrunch and companies tech blog and wanted to share some of my favorites!
Dataiku gets $400M at a $4.6B valuation, led by Tiger Global
Data science platform Dataiku announced today it has raised a $400 million Series E, bringing its valuation to $4.6 billion. The round was led by Tiger Global, with participation from returning investors like ICONIQ Growth, CapitalG, FirstMark Capital, Battery Ventures, Snowflake Ventures and Dawn Capital.
New investors include Insight Partners, Eurazeo, Lightrock and Olivier Pomel, the chief executive of Datadog.
Dataiku’s last round of funding was a $100 million Series D in 2020.
Founded in 2013, Dataiku is used by data scientists, but also designed for business analysts and other people with less technical backgrounds. The platform lets companies design and deploy AI and analytics apps, turn raw data into advanced analytics and design machine learning models. It’s been used for a wide array of use cases, including fraud detection, customer churn prevention and supply chain optimization.
20 ideas for better data visualization
Applications we design are becoming increasingly data-driven. The need for quality data visualization is high as ever. Confusing and misleading graphics are all around us, but we can change this by following these simple rules.
1. Choose the right chart type
Choosing the wrong chart type, or defaulting to the most common type of data visualization could confuse users or lead to data misinterpretation. The same data set can be represented in many ways, depending on what users would like to see. Always start with a review of your data set and user interview.
What Is Snowflake And Why You Should Use It For Your Cloud Data Warehouse
Snowflake is a cloud data platform. To be more specific it’s the first cloud built data platform. Its architecture allows data specialists to not only create data warehouses but also cloud data lake-houses because it can manage both structured and unstructured data easily.
Being that Snowflake is based in the cloud, this means that it allows developers to take advantage of elasticity and scalability without worrying about things like high up front costs, performance, or complexity of managing the system.
So Why Use Snowflake?
There are a lot of options when it comes to cloud data warehouses and everyone has specific use cases. However, our team is finding more and more users are either selecting or switching over to Snowflake for its many useful features and overall ease of use.
Best practices for monitoring a cloud migration
When you migrate workloads from on-premise infrastructure into a public cloud, you can improve the performance, reliability, and security of your application, and you might also lower your costs. To execute a successful cloud migration, you need a detailed inventory of your current deployments, visibility into your application’s performance as you shift traffic to the cloud, and confirmation that—once you’ve landed in the cloud—you’re still providing a high-quality user experience.
End Of Day Eighteen
Whether it be data engineering or AI, there are billions of dollars currently flooding the market to solve problems such as data labeling.
It will be interesting to see how the next five to ten years play out as I don’t foresee this being sustainable from an investment standpoint and I assume there will need to be a consolidation phase where many of these companies either merge or disappear.
But what are your thoughts.
What companies do you think will win in the data race?
Thanks for reading!