
Claude Opus 4 and Claude Sonnet 4 can handle long-running tasks and work continuously for several hours. Claude Opus 4 excels at coding and complex problem-solving, whereas Claude Sonnet 4 improves on Sonnet 3.7 and balances performance and efficiency.
In addition to releasing these new models, Anthropic announced a beta for extended thinking with tool use, the ability to use tools in parallel, and the general availability of Claude Code.
The Anthropic API also added four new capabilities: the code execution tool, MCP connector, Files API, and the ability to cache prompts for up to one hour.
OpenAI adds new tools and features to the Responses API
New additions include remote MCP server support, support for the latest image generation model, the ability to use the Code Interpreter tool, and the ability to use the file search tool in OpenAI’s reasoning models.
The company has also added background mode, which allows the model to execute complex reasoning tasks asynchronously; reasoning summaries; and the ability to reuse reasoning items across different API requests.
Mistral launches LLM for coding agents
Devstral is a lightweight open source model designed specifically for agentic coding tasks. On the SWE-Bench Verified benchmark, Devstral outperforms GPT-4.1-mini and Claude 3.5 Haiku. It is small enough to run on a single RTX 4090 or a Mac with 32GB of RAM, making it suitable for local, on-device use.
“While typical LLMs are excellent at atomic coding tasks such as writing standalone functions or code completion, they currently struggle to solve real-world software engineering problems. Real-world development requires contextualising code within a large codebase, identifying relationships between disparate components, and identifying subtle bugs in intricate functions. Devstral is designed to tackle this problem. Devstral is trained to solve real GitHub issues,” Mistral wrote in its announcement.
AI updates from Google I/O
Google I/O was full of AI updates, including two new models: Gemini Diffusion, a text model, and Gemma 3n, a multimodal model that handles audio, text, images, and video and is designed to run on phones, laptops, and tablets.
Google also revealed two new Gemma model variants: MedGemma for health applications and SignGemma for translating sign language into spoken language text.
Gemini Code Assist for individuals and Gemini Code Assist for GitHub are both now generally available as well, and are powered by Gemini 2.5. This tool was first introduced as a preview back in February, and today’s GA release includes several new updates, including chat history and threads, the ability to specify rules to apply to every AI generation in the chat, custom commands, and the ability to review and accept code suggestions in parts, across files, or all together.
The company also announced a reimagined version of Colab; Stitch, a new tool that generates UI components from wireframes or text prompts; and new features in Firebase Studio, such as the ability to translate Figma designs into applications.
AI updates from Microsoft Build
A new coding agent has been added to GitHub Copilot. It is activated when a developer assigns it a GitHub issue or calls it via a prompt in VS Code, and it can assist with a number of tasks, including adding features, fixing bugs, extending tests, refactoring code, and improving documentation. All of the agent’s pull requests require human approval before any CI/CD workflows run, GitHub confirmed.
Microsoft also announced Windows AI Foundry, a platform that supports the AI developer life cycle across training and inference. Developers will be able to manage and run open-source LLMs through Foundry Local or bring proprietary models and convert, fine-tune, and deploy them across clients and cloud.
Support for the Model Context Protocol (MCP) was also added across Microsoft’s platforms and services, including GitHub, Copilot Studio, Dynamics 365, Azure AI Foundry, Semantic Kernel, and Windows 11.
Microsoft also announced a new open source project called NLWeb to help developers create conversational AI interfaces for their websites using the model and data source of their choice. NLWeb endpoints also act as MCP servers, so developers can opt to make their content discoverable to AI agents.
Shopify releases new developer tools
Shopify is launching a new unified developer platform that integrates the Dev Dashboard and CLI and offers AI-powered code generation. All developers can also now create “dev stores” where they can preview apps in test environments, a feature that was previously available only on Plus plans.
Other new features announced today include declarative custom data definitions, a unified Polaris UI toolkit, and Storefront MCP, which allows developers to build AI agents that will act as shopping assistants for stores.
HeyMarvin launches AI Moderated Interviewer
The AI Moderated Interviewer conducts moderated user interviews with potentially thousands of participants without a human facilitator. It can also analyze the interview responses to surface insights and trends.
“What makes it so powerful is that it enables free-flowing, qualitative, engaging conversations — but on demand and at scale,” said Prayag Narula, CEO and co-founder of HeyMarvin. “We’re talking hundreds, even thousands of people, something that was previously only seen at large scale using a small army of volunteers in moments like presidential elections. Now, even a small team can have that same in-depth dialogue with their customers. It’s not just a better survey, and it’s not replacing traditional user interviews. It’s a whole new way of doing research that simply didn’t exist a few months ago.”
Zencoder announces Autonomous Zen Agents for CI/CD
These agents run directly in CI/CD pipelines and can be triggered by webhooks from issue trackers or code events. They can resolve issues, implement fixes, improve code quality, generate and run tests, and create documentation.
“The next evolution in AI-powered development isn’t just about coding faster – it’s about accelerating the whole software development lifecycle, where coding is just one step,” said Andrew Filev, CEO and founder of Zencoder. “By bringing autonomous agents into CI/CD pipelines, we’re enabling teams to eliminate routine work and accelerate hand-offs, maintaining momentum 24/7, while keeping humans in control of what ultimately ships.”