Highlights:

  • Hugging Face will incorporate XetHub’s technology into its AI hosting platform, aiming primarily to improve the platform’s storage system.
  • Hugging Face’s Git implementation requires developers to re-upload entire files to update AI models or datasets, which can take hours for large files.

Hugging Face Inc. has acquired XetHub, a startup that assists developers in managing files generated during artificial intelligence projects.

The companies announced the deal recently, with Hugging Face describing it as its largest acquisition so far, which implies a price higher than the USD 10 million it spent to acquire Argilla Inc. in June. Argilla had developed a tool for creating AI training datasets.

Hugging Face runs a widely used platform for hosting open-source machine learning projects, housing over 1.3 million AI models, 450,000 training datasets, and other technical assets. The company has attracted investments from several major tech firms, including Nvidia Corp., which participated in its latest USD 235 million funding round.

XetHub, officially known as XetData Inc., is a Seattle-based software company backed by USD 7.5 million in investor funding. The company offers a platform that software teams can use to store code files and other technical assets generated during AI projects. Additionally, the platform includes productivity tools designed to simplify working with these files.

Hugging Face plans to incorporate XetHub’s technology into its AI hosting platform. The company stated that the primary aim of this initiative is to improve the platform’s storage system.

Hugging Face stores users’ AI models and datasets using Git, an open-source tool originally designed for managing code files. To accommodate larger files, the company also uses Git LFS (Large File Storage), an open-source extension that lets Git handle files much larger than it was designed for. That is a crucial feature, since AI models and datasets can occupy gigabytes of space.

Hugging Face’s Git implementation has some limitations. When developers need to update an AI model or dataset hosted on the platform, they must re-upload the entire file, which can take hours for files that run to gigabytes of data.

XetHub’s platform accelerates the process by dividing AI models and datasets into smaller chunks. When developers need to release an update, they only have to update the modified chunks rather than the entire file, leading to a substantial reduction in upload times.
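
The chunk-level idea can be illustrated with a short Python sketch. This is not XetHub's actual implementation, whose details the announcement does not describe. It assumes fixed-size chunks and SHA-256 fingerprints purely for simplicity, and the function names and the tiny chunk size are hypothetical. Production systems often pick chunk boundaries based on content rather than fixed offsets, so that inserting bytes near the start of a file does not shift every later chunk.

```python
import hashlib

# Hypothetical chunk size: tiny on purpose so the example is easy to follow.
CHUNK_SIZE = 4  # bytes


def chunk_hashes(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Split data into fixed-size chunks and return each chunk's SHA-256 hex digest."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def chunks_to_upload(old: bytes, new: bytes, chunk_size: int = CHUNK_SIZE) -> list[int]:
    """Return the indices of chunks in `new` whose content is not already stored from `old`."""
    known = set(chunk_hashes(old, chunk_size))
    return [
        index
        for index, digest in enumerate(chunk_hashes(new, chunk_size))
        if digest not in known
    ]


# Illustrative data: only the third 4-byte chunk differs between the two versions.
old_file = b"AAAABBBBCCCCDDDD"
new_file = b"AAAABBBBXXXXDDDD"
changed = chunks_to_upload(old_file, new_file)
print(changed)
```

In this toy case only one of the four chunks has to be sent. The same comparison applied to a multi-gigabyte model file with a small change is what would produce the shorter upload times described above.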

The acquisition brings additional capabilities to Hugging Face. XetHub’s platform offers a feature that visualizes neural network architectures, making them more understandable for developers. It also includes collaboration tools that simplify tasks like editing training datasets.

XetHub CEO Yucheng Low and Hugging Face CTO Julien Chaumond stated in a blog post, “XetHub has developed technologies to enable Git to scale to TB repositories and enable teams to explore, understand and work together on large evolving datasets and models.”

The acquisition could also boost Hugging Face’s commercialization efforts. The company offers a paid version of its platform, Enterprise Hub, which organizations use to host their internal machine-learning projects. XetHub’s capability to accelerate file updates could enhance the user experience for Enterprise Hub customers.