Physical Address
304 North Cardinal St.
Dorchester Center, MA 02124


Thomas Wolf, co-founder and chief scientist at Hugging Face, thinks we may need new ways to measure AI models.
Wolf told the audience at Brainstorm AI in London that as AI models become more advanced, it gets harder to say which one performs best.
“It’s hard to say what the best model is,” he said, pointing to the nominal differences between recent releases from OpenAI and Google. “They are all, really, very close.”
“The world of benchmarks has evolved a lot. We used to measure model knowledge with MMLU. These benchmarks are mostly saturated today.”
Over the past year, a growing chorus of voices from academia, industry, and policy has argued that common AI benchmarks, such as MMLU and HellaSwag, do not reflect real-world use.
In February, researchers at the European Commission’s Joint Research Centre published a paper called “Can We Trust AI Benchmarks? An Interdisciplinary Review of Current Issues in AI Evaluation.” It found “systemic flaws in current benchmarking practices,” such as misaligned incentives, unreliable results, and data contamination.
Going forward, Wolf said the AI industry must rely on two main types of benchmarks in 2025: one for evaluating models, and another tailored to each specific use case.
Hugging Face has been working on the latter.
The company’s new program, “YourBench,” aims to help users determine which model to use for a specific task. Users feed the program a few documents, and it automatically generates a custom benchmark for that type of work, which users can then run against different models to see which is best for their use case.
“Just because these models all perform similarly on this academic benchmark doesn’t mean they are the same,” Wolf said.
Founded by Wolf, Clément Delangue, and Julien Chaumond in 2016, Hugging Face has long been a champion of open-source AI.
Often referred to as the GitHub of machine learning, the company provides an open-source platform that allows developers and researchers to build and share models, datasets, and applications. Users can also browse models and datasets uploaded by others.
Wolf told the Brainstorm audience that Hugging Face’s “business model is aligned with open source” and that the company’s aim is to have as many people as possible taking part in these open communities and sharing models.
Wolf predicted that open-source AI will continue to thrive, especially after DeepSeek’s success earlier this year.
After its launch earlier this year, the Chinese-made DeepSeek R1 sent shockwaves through the AI world when it proved competitive with leading closed-source AI models.
Wolf said DeepSeek was a “ChatGPT moment” for open-source AI.
“Just as ChatGPT was the moment the whole world got to know AI, DeepSeek was the moment the whole world got to know this open community,” he said.
This story originally appeared on Fortune.com.