OpenAI starts creating new benchmarks that more accurately evaluate AI models across different languages and cultures

English is spoken by only about 20% of the world’s population, yet existing AI benchmarks for multilingual models are falling short. For example, MMMLU has become saturated to the point that top models are clustering near high scores, and OpenAI says this makes it a poor indicator of real progress. Additionally, the existing multilingual benchmarks … continue reading

Testlio expands its crowdsourced testing platform to provide human-in-the-loop testing for AI solutions

Testlio, a company that offers crowdsourced software testing, has announced a new end-to-end solution designed specifically for testing AI solutions. Leveraging Testlio’s community of over 80,000 testers, this new solution provides human-in-the-loop validation for each stage of AI development. “Trust, quality, and reliability of AI-powered applications rely on both technology and people,” said Summer … continue reading

Kong’s Insomnia 12 release adds capabilities to help with MCP server development

Kong today unveiled the latest release of its open source platform for designing, mocking, debugging, and testing APIs. Insomnia 12 adds new features to enable developers to more rapidly build and test APIs and MCP servers. According to Kong, developers who are building MCP servers are facing challenges similar to those API developers faced years … continue reading

OpenAI and AWS announce $38 billion deal for compute infrastructure

AWS and OpenAI today announced a new partnership that will have OpenAI’s workloads running on AWS’s infrastructure. AWS will build compute infrastructure for OpenAI that is optimized for AI processing efficiency and performance. Specifically, the company will cluster NVIDIA GPUs (GB200s and GB300s) on Amazon EC2 UltraServers. OpenAI will commit $38 billion to Amazon over … continue reading

October 2025: AI updates from the past month

OpenAI announces agentic security researcher that can find and fix vulnerabilities OpenAI has released a private beta for a new AI agent called Aardvark that acts as a security researcher, finding vulnerabilities and applying fixes, at scale. “Software security is one of the most critical—and challenging—frontiers in technology. Each year, tens of thousands of new … continue reading

OpenAI announces agentic security researcher that can find and fix vulnerabilities

OpenAI has released a private beta for a new AI agent called Aardvark that acts as a security researcher, finding vulnerabilities and applying fixes, at scale. “Software security is one of the most critical—and challenging—frontiers in technology. Each year, tens of thousands of new vulnerabilities are discovered across enterprise and open-source codebases. Defenders face the … continue reading

Elastic adds new vector search algorithm that cuts down on memory requirements, improves speed

Elastic has introduced a new disk-friendly vector search algorithm, called DiskBBQ, to Elasticsearch. According to the company, this new algorithm is more efficient than traditional search techniques in vector databases, like Hierarchical Navigable Small Worlds (HNSW), which is currently the most commonly used technique. With HNSW, all vectors are required to reside in memory, which … continue reading

Cursor 2.0 enables eight agents to work in parallel without interfering with each other

The AI coding editor Cursor announced the launch of Cursor 2.0, the next iteration of the platform, featuring a new interface for working with multiple agents and its first-ever coding model. The new multi-agent interface centers around agents instead of files. With this new interface, up to eight agents can work in parallel, using … continue reading

JetBrains launches open benchmarking platform for measuring AI productivity

JetBrains has released a new tool designed to enable developers to measure their actual productivity gains from AI tools. The company’s Developer Productivity AI Arena (DPAI Arena) is an open benchmarking platform for how well AI development tools complete real-world software engineering tasks. According to the company, current benchmarks that LLMs are run against rely … continue reading

GitHub unveils Agent HQ, the next evolution of its platform that focuses on agent-based development

During its annual conference, GitHub Universe, GitHub shared its plans for Agent HQ, its vision for the future of the platform where AI agents are natively integrated across all of GitHub. As part of this Agent HQ initiative, over the next several months, paid GitHub Copilot users will gain direct access to popular coding agents … continue reading

Eclipse Foundation launches ADL, an open language for defining agent behavior

The Eclipse Foundation today introduced the Agent Definition Language (ADL), an open language and visual toolkit for defining agent behavior. It was introduced as a part of the Eclipse Language Models Operating System (LMOS) project, an open source platform for building and running multi-agent systems. “Agentic AI is redefining enterprise software, yet until now there … continue reading

OpenAI completes restructuring, strikes new deal with Microsoft

OpenAI today announced that it has completed the restructuring of its business. When the company was founded in 2015, it was launched as a non-profit organization, and that non-profit has since controlled the for-profit arm of the business. Today’s restructuring turns the for-profit arm into a public benefit corporation called OpenAI PBC. The OpenAI Foundation, the new … continue reading
