- Data science startup Hugging Face has a huge community, with users at Meta, Amazon and Google.
- The company’s valuation could hit $2 billion in a new funding round, sources told Insider.
- Machine learning experts deploy ready-to-use Hugging Face models for text analysis and other uses.
Hugging Face, a beloved platform in the machine learning community that offers quick access to a wide variety of popular tools like language analysis, is quietly raising a new round of funding that could value the startup at up to $2 billion, multiple sources familiar with the deal tell Insider.
Existing investors Lux Capital and Addition are participating in the round, according to people with knowledge of the matter. The size of the funding round could not be learned and details may change before the round closes.
Hugging Face, Addition and Lux Capital declined to comment for this story.
The round is expected to bring Hugging Face into a realm of emerging big data and machine learning startups that have passed the billion-dollar milestone, including DataRobot, Dataiku, dbt Labs, Collibra and Alation.
Founded in 2016 and named after the emoji, Hugging Face counts more than 5,000 companies among its users, including the AI division of Meta, Amazon Web Services, Microsoft and Google AI. Hugging Face also offers a tool called Gradio that is similar to Streamlit, a machine learning platform acquired by Snowflake for $800 million.
Hugging Face raised $40 million last year from investors including Betaworks, Lux Capital and NBA star Kevin Durant.
Beyond its tools, Hugging Face has been able to command such a high valuation thanks to its huge community. Hugging Face’s flagship library, Transformers, has 61,300 stars on GitHub, a figure often used as a measure of success for developer tools. For comparison, PyTorch, Meta’s popular machine learning framework, has 55,500 stars; Google’s TensorFlow has 164,000. Snowflake’s Streamlit has 18,700 stars compared to Gradio’s 6,000.
Hugging Face is also considered by many industry experts to be an essential part of the next-generation machine learning stack, an area in which Snowflake and Databricks have invested.
One of the most resource-intensive aspects of machine learning is the “learning” part: training a model on data relevant to the business. This can include teaching it to identify people in photos or to determine whether a sentence has a positive or negative sentiment. It involves writing the underlying code, collecting and cleaning the data, and then processing it, all of which can take a lot of time and computing resources.
Hugging Face popularized a new class of machine learning tools, transformers: machine learning models that have already been trained on large datasets. Data scientists and machine learning engineers can download a variety of these models and datasets.
By using pre-trained models, users don’t have to perform that huge lift to get started. Instead, experts and enthusiasts can immediately begin analyzing text to determine sentiment or generate summaries. Rather than having to train their own models, developers can determine whether a sentence has positive or negative sentiment with just a few lines of code.
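As a rough illustration of what those “few lines of code” look like in practice (this sketch uses the open-source `transformers` library’s `pipeline` helper and its default sentiment checkpoint, and is not code from Hugging Face itself):

```python
# Requires: pip install transformers torch
from transformers import pipeline

# Download a pre-trained sentiment model from the Hugging Face hub;
# no training is needed on the user's side.
classifier = pipeline("sentiment-analysis")

# Run inference immediately on raw text.
result = classifier("I love this library!")
# result is a list of dicts, e.g. [{"label": "POSITIVE", "score": 0.99...}]
print(result[0]["label"], round(result[0]["score"], 3))
```

The first call downloads and caches the model weights; subsequent calls run locally, which is the “skip the huge lift” trade-off the article describes.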
In that way, it’s similar to how businesses started to widely adopt cloud computing. Rather than setting up their own infrastructure, which is expensive and time-consuming, companies and engineers can immediately start renting servers from Amazon Web Services, Azure or Google Cloud Platform.
However, these same machine learning experts can then retrain the models with their own data, making them more accurate and useful for their own custom products. This elevates Hugging Face models beyond just a hobbyist tool into the realm of powering more advanced machine learning models that many companies use to improve their services.
Hugging Face offers a variety of professional services and tools, including ways to automatically train a machine learning model without ever writing any code. A Hugging Face customer can upload a dataset and indicate the type of task they want to perform, such as sentiment analysis.
Hugging Face also hosts some of the most popular and widely used machine learning models that are already trained on existing large datasets. These include language analysis models like Google’s BERT and OpenAI’s GPT-2. There are also models for image recognition and audio speech recognition, among a number of others available. There are a total of 108 pre-trained models on Hugging Face.
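To sketch how one of these hosted models is pulled down by its hub name (the BERT checkpoint identifier below is the standard public one; treat the snippet as an illustration rather than official documentation):

```python
# Requires: pip install transformers torch
from transformers import AutoModel, AutoTokenizer

# Fetch Google's BERT weights and the matching tokenizer from the hub.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

# Tokenize a sentence and run it through the model to get embeddings.
inputs = tokenizer("Hugging Face hosts pre-trained models.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, num_tokens, hidden_size)
```

The same `from_pretrained` call works for the other hosted architectures the article mentions, such as GPT-2, by swapping in that model’s hub name.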