Snorkel AI Commits $3M in Grants to Close the Agentic AI Evaluation Gap
February 28, 2026 · 2 min read
Claire Cummings
Snorkel AI has launched a $3 million Open Benchmarks Grants program to fund research teams building next-generation evaluation frameworks for agentic AI — the category of AI systems that operate autonomously across multi-step tasks.
The program addresses what Snorkel calls the "biggest blind spot" in AI development: models that ace standardized tests but fail unpredictably in production environments. Backed by partners including Hugging Face, Together AI, Prime Intellect, Factory HQ, Harbor, and PyTorch, the initiative targets the gap between how AI agents are benchmarked and how they actually perform.
What the Grants Fund
The program supports open-source datasets, benchmarks, and evaluation artifacts — the infrastructure that determines how frontier AI systems are measured and compared. Snorkel is specifically seeking proposals that move beyond static leaderboard metrics toward benchmarks reflecting real-world agentic behavior: tool use, multi-step reasoning, error recovery, and task completion across complex workflows.
Applications are reviewed on a rolling basis starting March 1, 2026, with no fixed closing date announced.
Why This Matters Now
As enterprises deploy AI agents for everything from customer service to code generation, the lack of reliable evaluation standards creates risk. A model's score on existing benchmarks tells you remarkably little about whether it will handle a novel customer query or correctly orchestrate a five-step API workflow. Snorkel's bet is that funding open evaluation infrastructure now will shape how the entire industry builds and buys AI systems.
The grant structure, with its open-source requirements, eligibility for academic and independent teams, and rolling review, is designed to attract precisely the researchers who might otherwise lack the industry resources to pursue this work.
Who Should Apply
Academic research groups, open-source maintainers, and independent teams working on AI evaluation methodology are the program's primary audience. Its emphasis on production-realistic benchmarks makes it particularly relevant for teams with experience deploying or auditing AI systems in enterprise settings.
For researchers exploring the intersection of AI safety, evaluation, and applied machine learning, Granted can help identify complementary government and foundation funding for this rapidly growing field.
More AI funding coverage is available on the Granted blog.
