English is spoken by only about 20% of the world’s population, yet existing AI benchmarks for multilingual models are falling short. For example, MMMLU has become saturated to the point that top models cluster near the top scores, which OpenAI says makes it a poor indicator of real progress. Additionally, the existing multilingual benchmarks … continue reading
Testlio, a company that offers crowdsourced software testing, has announced a new end-to-end testing solution designed specifically for AI-powered applications. Leveraging Testlio’s community of over 80,000 testers, this new solution provides human-in-the-loop validation for each stage of AI development. “Trust, quality, and reliability of AI-powered applications rely on both technology and people,” said Summer … continue reading
Kong today unveiled the latest release of its open source platform for designing, mocking, debugging, and testing APIs. Insomnia 12 adds new features that let developers build and test APIs and MCP servers more rapidly. According to Kong, developers who are building MCP servers are facing challenges similar to those API developers faced years … continue reading
AWS and OpenAI today announced a new partnership that will have OpenAI’s workloads running on AWS’s infrastructure. AWS will build compute infrastructure for OpenAI that is optimized for AI processing efficiency and performance. Specifically, the company will cluster NVIDIA GPUs (GB200s and GB300s) on Amazon EC2 UltraServers. OpenAI will commit $38 billion to Amazon over … continue reading
OpenAI has released a private beta for a new AI agent called Aardvark that acts as a security researcher, finding vulnerabilities and applying fixes, at scale. “Software security is one of the most critical—and challenging—frontiers in technology. Each year, tens of thousands of new vulnerabilities are discovered across enterprise and open-source codebases. Defenders face the … continue reading
Elastic has introduced a new disk-friendly vector search algorithm, called DiskBBQ, to Elasticsearch. According to the company, this new algorithm is more efficient than traditional search techniques in vector databases, like Hierarchical Navigable Small Worlds (HNSW), which is currently the most commonly used technique. With HNSW, all vectors are required to reside in memory, which … continue reading
The AI coding editor Cursor announced the launch of Cursor 2.0, the next iteration of the platform, featuring a new interface for working with multiple agents and its first-ever coding model. The new multi-agent interface centers on agents instead of files. With this new interface, up to eight agents can work in parallel, using … continue reading
JetBrains has released a new tool designed to enable developers to measure their actual productivity gains from AI tools. The company’s Developer Productivity AI Arena (DPAI Arena) is an open benchmarking platform that measures how well AI development tools complete real-world software engineering tasks. According to the company, current benchmarks that LLMs are run against rely … continue reading
During its annual conference, GitHub Universe, GitHub shared its plans for Agent HQ, its vision for the future of the platform where AI agents are natively integrated across all of GitHub. As part of this Agent HQ initiative, over the next several months, paid GitHub Copilot users will gain direct access to popular coding agents … continue reading
The Eclipse Foundation today introduced the Agent Definition Language (ADL), an open language and visual toolkit for defining agent behavior. It was introduced as a part of the Eclipse Language Models Operating System (LMOS) project, an open source platform for building and running multi-agent systems. “Agentic AI is redefining enterprise software, yet until now there … continue reading
OpenAI today announced that it has completed the restructuring of its business. When the company was founded in 2015, it was launched as a non-profit organization, and that non-profit has controlled the for-profit arm of the business ever since. Today’s restructuring turns the for-profit arm into a public benefit corporation called OpenAI PBC. The OpenAI Foundation, the new … continue reading