
Anthropic starts testing a Claude extension for Chrome
The extension will enable Claude to take action on websites on behalf of the user. “We’ve spent recent months connecting Claude to your calendar, documents, and many other pieces of software. The next logical step is letting Claude work directly in your browser,” the company says.
The company is starting off with a small pilot of 1,000 Max plan users, and will gradually expand the program out to more people if the pilot goes well.
According to Anthropic, one of the big safety challenges with agents that use the browser is prompt injection attacks, and some of the steps the company has taken to defend against them are providing site-level permissions and requiring action confirmations. This pilot will test how well those defenses hold up in real-world scenarios.
Google integrates Gemini CLI into Zed code editor
Google announced that it has brought the Gemini CLI to the open source code editor, Zed. The new integration will enable Zed users to generate and refactor code in the editor, get instant answers on code or error messages, and chat naturally in the terminal.
Developers will be able to follow along live with the Gemini agent as it makes changes. Once the agent is done working, Zed will display the changes in a review interface that shows a clear diff for each edit that can be reviewed, accepted, or modified, providing the same level of control as a code review.
Users will also be able to provide context beyond the codebase by pointing the agent to external sources like a URL with documentation or an API spec.
Microsoft packs Visual Studio August update with smarter AI features
Microsoft has released the August update for Visual Studio 2022, adding several features related to AI-assisted development.
The company announced that GPT-5 is now integrated into the IDE, and support for MCP is generally available as well. MCP support enables developers to authenticate with any OAuth provider directly from the IDE, perform one-click installation of MCP servers, and manage MCP access from GitHub policy settings.
Copilot Chat was updated with the ability to surface relevant code snippets more reliably using improved semantic code search to determine when queries should trigger a code lookup. Developers can now connect models from OpenAI, Google, and Anthropic to Visual Studio Chat, as well.
Agent Mode in Gemini Code Assist now available in VS Code and IntelliJ
This mode was introduced last month to the Insiders Channel for VS Code to expand the capabilities of Code Assist beyond prompts and responses to support actions like multiple file edits, full project context, and built-in tools and integration with ecosystem tools.
Since being added to the Insiders Channel, several new features have been added, including the ability to edit code changes using Gemini’s Inline diff, user-friendly quota updates, real-time shell command output, and state preservation between IDE restarts.
Separately, the company also announced new agentic capabilities in its AI Mode in Search, such as the ability to set dinner reservations based on factors like party size, date, time, location, and preferred type of food. U.S. users opted into the AI Mode experiment in Labs will also now see results that are more specific to their own preferences and interests. Google also announced that AI Mode is now available in over 180 new countries.
GitHub’s coding agent can now be launched from anywhere on platform using new Agents panel
GitHub has added a new panel to its UI that enables developers to invoke the Copilot coding agent from anywhere on the site.
From the panel, developers can assign background tasks, monitor running tasks, or review pull requests. The panel is a lightweight overlay on GitHub.com, but developers can also open the panel in full-screen mode by clicking “View all tasks.”
The agent can be launched from a single prompt, like “Add integration tests for LoginController” or “Fix #877 using pull request #855 as an example.” It can also run multiple tasks simultaneously, such as “Add unit test coverage for utils.go” and “Add unit test coverage for helpers.go.”
Anthropic adds Claude Code to Enterprise, Team plans
With this change, both Claude and Claude Code will be available under a single subscription. Admins will be able to assign standard or premium seats to users based on their individual roles. By default, seats include enough usage for a typical workday, but additional usage can be added during periods of heavy use. Admins can also create a maximum limit for extra usage.
Other new admin settings include a usage analytics dashboard and the ability to deploy and enforce settings, such as tool permissions, file access restrictions, and MCP server configurations.
Microsoft adds Copilot-powered debugging features for .NET in Visual Studio
Copilot can now suggest appropriate locations for breakpoints and tracepoints based on current context. Similarly, it can troubleshoot non-binding breakpoints and walk developers through the potential cause, such as mismatched symbols or incorrect build configurations.
Another new feature is the ability to generate LINQ queries on massive collections in the IEnumerable Visualizer, which renders data into a sortable, filterable tabular view. For example, a developer could ask for a LINQ query that will surface problematic rows causing a filter issue. Additionally, developers can hover over any LINQ statement and get an explanation from Copilot on what it’s doing, evaluate it in context, and highlight potential inefficiencies.
Copilot can also now help developers deal with exceptions by summarizing the error, identifying potential causes, and offering targeted code fix suggestions.
Groundcover launches observability solution for LLMs and agents
The eBPF-based observability provider groundcover announced an observability solution specifically for monitoring LLMs and agents.
It captures every interaction with LLM providers like OpenAI and Anthropic, including prompts, completions, latency, token usage, errors, and reasoning paths.
Because groundcover uses eBPF, it is operating at the infrastructure layer and can achieve full visibility into every request. This allows it to do things like follow the reasoning path of failed outputs, investigate prompt drift, or pinpoint when a tool call introduces latency.
IBM and NASA release open-source AI model for predicting solar weather
The model, Surya, analyzes high resolution solar observation data to predict how solar activity impacts Earth. According to IBM, solar storms can damage satellites, impact airline travel, and disrupt GPS navigation, which can negatively impact industries like agriculture and disrupt food production.
The solar images that Surya was trained on are 10x larger than typically AI training data, so the team has to create a multi-architecture system to handle it.
The model was released on Hugging Face.
Preview of NuGet MCP Server now available
Last month, Microsoft announced support for building MCP servers with .NET and then publishing them to NuGet. Now, the company is announcing an official NuGet MCP Server to integrate NuGet package information and management tools into AI development workflows.
“Since the NuGet package ecosystem is always evolving, large language models (LLMs) get out-of-date over time and there is a need for something that assists them in getting information in realtime. The NuGet MCP server provides LLMs with information about new and updated packages that have been published after the models as well as tools to complete package management tasks,” Jeff Kluge, principal software engineer at Microsoft, wrote in a blog post.
Opsera’s Codeglide.ai lets developers easily turn legacy APIs into MCP servers
Codeglide.ai, a subsidiary of the DevOps company Opsera, is launching its MCP server lifecycle platform that will enable developers to turn APIs into MCP servers.
The solution constantly monitors API changes and updates the MCP servers accordingly. It also provides context-aware, secure, and stateful AI access without the developer needing to write custom code.
According to Opsera, large enterprises may maintain 2,000 to 8,000 APIs — 60% of which are legacy APIs — and MCP provides a way for AI to efficiently interact with those APIs. The company says that this new offering can reduce AI integration time by 97% and costs by 90%.
Confluent announces Streaming Agents
Streaming Agents is a new feature in Confluent Cloud for Apache Flink that brings agentic AI into data stream processing pipelines. It enables users to build, deploy, and orchestrate agents that can act on real-time data.
Key features include tool calling via MCP, the ability to connect to models or databases using Flink, and the ability to enrich streaming data with non-Kafka data sources, like relational databases and REST APIs.
“Even your smartest AI agents are flying blind if they don’t have fresh business context,” said Shaun Clowes, chief product officer at Confluent. “Streaming Agents simplifies the messy work of integrating the tools and data that create real intelligence, giving organizations a solid foundation to deploy AI agents that drive meaningful change across the business.”
Anthropic expands Claude Sonnet 4’s context window to 1M tokens
With this larger context window, Claude can process codebases with 75,000+ lines of code in a single request. This allows it to better understand project architecture, cross-file dependencies, and make suggestions that fit with the complete system design.
Longer context windows are now in beta on the Anthropic API and Amazon Bedrock, and will soon be available in Google Cloud’s Vertex AI.
For prompts over 200K tokens, pricing will increase to $6 / million tokens (MTok) for input and $22.50 / MTok for output. The pricing for requests under 200K tokens will be $3 / MTok for input and $15 / MTok for output.
The company also extended its learning mode designed for students into Claude.ai and Claude Code. Learning mode asks users questions to guide then through concepts instead of providing immediate answers, to promote critical thinking of problems.
OpenAI adds GPT-4o as a legacy model in ChatGPT
With this update, paid users will now be able to select GPT-4o when using ChatGPT, along with other models like o3, GPT-4.1, and GPT-5 Thinking mini.
The model picker for GPT-5 also now includes Auto, Fast, and Thinking mode. Fast prioritizes giving the fastest answers, thinking prioritizes giving deeper answers that take longer to think through, and auto chooses between the two.
The company also increased the message limit for Plus and Team users to 3,000 per week on GPT-5 Thinking.
Google releases Gemma 3 270M
This new model is “designed from the ground up for task-specific fine-tuning with strong instruction-following and text structuring capabilities already trained in,” according to Google.
It is ideal in situations where there is a high-volume, well-defined task; speed and cost matters; user privacy needs to be protected; or there is a desire for a fleet of specialized task models.
Both pretrained and instruction tuned versions of the model are available for download from Hugging Face, Ollama, Kaggle, LM Studio, and Docker. Alternatively, the models can be tried out in Vertex AI.
NVIDIA releases latest models in Llama Nemotron family
Llama Nemotron are a family of reasoning models, and the latest updates include a new hybrid model architecture, compact quantized models, and a configurable thinking budget to give developers more control over token generation.
This combination lets the models reason more deeply and respond faster, without needing more time or computing power. This means better results at a lower cost,” the company wrote in an announcement.
Google’s coding agent Jules gets critique functionality
Google is enhancing its AI coding agent, Jules, with new functionality that reviews and critiques code while Jules is still working on it.
“In a world of rapid iteration, the critic moves the review to earlier in the process and into the act of generation itself. This means the code you review has already been interrogated, refined, and stress-tested … Great developers don’t just write code, they question it. And now, so does Jules,” Google wrote in a blog post.
According to the company, the coding critic is like a peer reviewer who is familiar with code quality principles and is “unafraid to point out when you’ve reinvented a risky wheel.”
GitHub to be folded into Microsoft’s CoreAI org
GitHub’s CEO Thomas Dohmke has announced his plans to leave the company at the end of the year.
In a memo to employees, he said that Microsoft doesn’t plan to replace him; rather, GitHub and its leadership team will now operate under Microsoft’s CoreAI organization, a group within the company focused on developing AI-powered tools, including GitHub Copilot.
“Today, GitHub Copilot is the leader of the most successful and thriving market in the age of AI, with over 20 million users and counting,” he wrote. “We did this by innovating ahead of the curve and showing grit and determination when challenged by the disruptors in our space. In just the last year, GitHub Copilot became the first multi-model solution at Microsoft, in partnership with Anthropic, Google, and OpenAI. We enabled Copilot Free for millions and introduced the synchronous agent mode in VS Code as well as the asynchronous coding agent native to GitHub.”
Sentry launches MCP monitoring tool
Application monitoring company Sentry is making it easier to gain visibility into MCP servers with the launch of a new monitoring tool.
With MCP monitoring, developers can understand things like which clients are experiencing errors, which tools are most used, or which tools are running slow. They can also correlate errors with events like traffic spikes or new release deployments, or figure out if errors are only happening on one type of transport.
According to Cody De Arkland, head of developer experience at Sentry, when Sentry launched its own MCP server, it was getting over 30 million requests per month. He said that at that scale, it’s inevitable that errors will occur, and existing monitoring tools were struggling with MCP servers.
bitHuman launches SDK for creating AI avatars
AI company bitHuman has announced a visual SDK for creating avatars for use as chat agents, instructors, virtual coaches, companions, and experts in different fields.
According to the company, the SDK allows avatars to be created on Arm-based and x86 systems without a GPU. The avatars have a small footprint and can be run online or offline on devices like Chromebooks, Mac Minis, and Raspberry Pis.
Because of their small footprint, these characters can be brought to a wide range of environments, including classrooms, kiosks, mobile apps, or edge devices.
OpenAI launches GPT-5
OpenAI announced the availability of GPT-5, which it says is “smarter across the board” compared to previous models.
Specifically for coding, GPT-5 achieved significant improvement in complex front-end generation and debugging larger repositories. Early testers said that it made better design choices in terms of spacing, typography, and white space, according to the company.
“We think you will love using GPT-5 much more than any previous AI,” CEO Sam Altman said during the livestream. “It is useful. It is smart. It is fast. It’s intuitive.”
Anthropic releases Claude Opus 4.1
This latest update improves the model’s research and data analysis skills, and achieves 74.5% on SWE-bench Verified (compared to 72.5% on Opus 4).
It is available to paid Claude users, in Claude Code, and on Anthropic’s API, Amazon Bedrock, and Google Cloud’s Vertex AI.
The company plans to release larger improvements across its models in the coming weeks as well.
AWS introduces Automated Reasoning checks to reduce AI hallucinations
Automated Reasoning checks are part of Amazon Bedrock Guardrails, and validate the accuracy of AI generated content against domain knowledge. According to AWS, this feature provides 99% verification accuracy.
This was first introduced as a preview at AWS re:Invent, and with this general availability release, several new features are being added, including support for large documents in a single build, simplified policy validation, automated scenario generation, enhanced policy feedback, and customizable validation settings.
Google adds Gemini CLI to GitHub Actions
This new offering is designed to act as an agent for routine coding tasks. At launch, it includes three workflows: intelligent issue triage, pull request reviews, and the ability to mention @gemini-cli in any issue or pull request to delegate tasks.
It is available in beta, and Google is offering free-of-charge quotas for Google AI Studio. It is also supported in Vertex AI and Standard and Enterprise tiers of Gemini Code Assist.
OpenAI announces two open weight reasoning models
OpenAI is joining the open weight model game with the launch of gpt-oss-120b and gpt-oss-20b.
Gpt-oss-120b is optimized for production, high reasoning use cases, and gpt-oss-20b is designed for lower latency or local use cases.
According to the company, these open models are comparable to its closed models in terms of performance and capability, but at a much lower cost. For example, gpt-oss-120b running on an 80 GB GPU achieved similar performance to o4-mini on core reasoning benchmarks, while gpt-oss-20b running on an edge device with 16 GB of memory was comparable to o3-mini on several common benchmarks.
Google DeepMind launches Genie 3
Genie 3 is a frontier model for generating real world environments. It can model physical properties of the world, like water, lighting, and environmental actions.
Users can also use prompts to change the generated world to add new objects and characters or change weather conditions, for example.
According to DeepMind, this research is important because it can enable AI agents to be trained in a variety of simulated environments.