August 2025: AI updates from the past month

Anthropic starts testing a Claude extension for Chrome

The extension will enable Claude to take action on websites on behalf of the user. “We’ve spent recent months connecting Claude to your calendar, documents, and many other pieces of software. The next logical step is letting Claude work directly in your browser,” the company says.

The company is starting off with a small pilot of 1,000 Max plan users, and will gradually expand the program out to more people if the pilot goes well.

According to Anthropic, one of the big safety challenges with agents that use the browser is prompt injection attacks, and some of the steps the company has taken to defend against them are providing site-level permissions and requiring action confirmations. This pilot will test how well those defenses hold up in real-world scenarios.
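To make the two defenses named above concrete, here is a minimal sketch of how a site-level permission check and an action confirmation could be combined in front of an agent's browser actions. It is not Anthropic's implementation: the function name `authorize`, the `allowed_sites` set, and the `confirm` callback are all hypothetical and exist only for illustration.

```python
from typing import Callable, Set
from urllib.parse import urlparse


def authorize(
    url: str,
    action: str,
    allowed_sites: Set[str],
    confirm: Callable[[str], bool],
) -> bool:
    """Return True only if the site is permitted AND the user confirms.

    Hypothetical gate, for illustration only:
    - allowed_sites: hostnames the user has granted the agent access to
    - confirm: callback that asks the user to approve a single action
    """
    host = urlparse(url).hostname or ""

    # Site-level permission: refuse outright on hosts the user has not
    # approved, without ever prompting for confirmation.
    if host not in allowed_sites:
        return False

    # Action confirmation: even on an approved site, the user gets the
    # final say before the action runs.
    return confirm(f"Allow '{action}' on {host}?")
```

In this sketch the permission check runs first, so text injected into a page on an unapproved site can never reach the confirmation step, and a confirmation prompt always names the host the action would run on.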

Google integrates Gemini CLI into Zed code editor

Google announced that it has brought the Gemini CLI to the open source code editor, Zed. The new integration will enable Zed users to generate and refactor code in the editor, get instant answers on code or error messages, and chat naturally in the terminal.

Developers will be able to follow along live with the Gemini agent as it makes changes. Once the agent is done working, Zed will display the changes in a review interface that shows a clear diff for each edit that can be reviewed, accepted, or modified, providing the same level of control as a code review.

Users will also be able to provide context beyond the codebase by pointing the agent to external sources like a URL with documentation or an API spec.

Microsoft packs Visual Studio August update with smarter AI features

Microsoft has released the August update for Visual Studio 2022, adding several features related to AI-assisted development.

The company announced that GPT-5 is now integrated into the IDE, and support for MCP is generally available as well. MCP support enables developers to authenticate with any OAuth provider directly from the IDE, perform one-click installation of MCP servers, and manage MCP access from GitHub policy settings.

Copilot Chat was updated with the ability to surface relevant code snippets more reliably using improved semantic code search to determine when queries should trigger a code lookup. Developers can now connect models from OpenAI, Google, and Anthropic to Visual Studio Chat, as well.

Agent Mode in Gemini Code Assist now available in VS Code and IntelliJ

This mode was introduced last month to the Insiders Channel for VS Code to expand the capabilities of Code Assist beyond prompts and responses to support multiple file edits, full project context, built-in tools, and integration with ecosystem tools.

Since the mode was added to the Insiders Channel, it has gained several new features, including the ability to edit code changes using Gemini’s Inline diff, user-friendly quota updates, real-time shell command output, and state preservation between IDE restarts.

Separately, Google also announced new agentic capabilities in its AI Mode in Search, such as the ability to make dinner reservations based on factors like party size, date, time, location, and preferred type of food. U.S. users who have opted into the AI Mode experiment in Labs will also now see results that are more specific to their own preferences and interests. The company also announced that AI Mode is now available in over 180 new countries.

GitHub’s coding agent can now be launched from anywhere on platform using new Agents panel

GitHub has added a new panel to its UI that enables developers to invoke the Copilot coding agent from anywhere on the site.

From the panel, developers can assign background tasks, monitor running tasks, or review pull requests. The panel is a lightweight overlay on GitHub.com, but developers can also open the panel in full-screen mode by clicking “View all tasks.”

The agent can be launched from a single prompt, like “Add integration tests for LoginController” or “Fix #877 using pull request #855 as an example.” It can also run multiple tasks simultaneously, such as “Add unit test coverage for utils.go” and “Add unit test coverage for helpers.go.”

Anthropic adds Claude Code to Enterprise, Team plans

With this change, both Claude and Claude Code will be available under a single subscription. Admins will be able to assign standard or premium seats to users based on their individual roles. By default, seats include enough usage for a typical workday, but additional usage can be added during periods of heavy use. Admins can also create a maximum limit for extra usage.

Other new admin settings include a usage analytics dashboard and the ability to deploy and enforce settings, such as tool permissions, file access restrictions, and MCP server configurations.

Microsoft adds Copilot-powered debugging features for .NET in Visual Studio

Copilot can now suggest appropriate locations for breakpoints and tracepoints based on current context. Similarly, it can troubleshoot non-binding breakpoints and walk developers through the potential cause, such as mismatched symbols or incorrect build configurations.

Another new feature is the ability to generate LINQ queries on massive collections in the IEnumerable Visualizer, which renders data into a sortable, filterable tabular view. For example, a developer could ask for a LINQ query that will surface problematic rows causing a filter issue. Additionally, developers can hover over any LINQ statement and get an explanation from Copilot on what it’s doing, evaluate it in context, and highlight potential inefficiencies.

Copilot can also now help developers deal with exceptions by summarizing the error, identifying potential causes, and offering targeted code fix suggestions.

Groundcover launches observability solution for LLMs and agents

The eBPF-based observability provider groundcover announced an observability solution specifically for monitoring LLMs and agents.

It captures every interaction with LLM providers like OpenAI and Anthropic, including prompts, completions, latency, token usage, errors, and reasoning paths.

Because groundcover uses eBPF, it operates at the infrastructure layer and can achieve full visibility into every request. This allows it to do things like follow the reasoning path of failed outputs, investigate prompt drift, or pinpoint when a tool call introduces latency.

IBM and NASA release open-source AI model for predicting solar weather

The model, Surya, analyzes high resolution solar observation data to predict how solar activity impacts Earth. According to IBM, solar storms can damage satellites, impact airline travel, and disrupt GPS navigation, which can negatively impact industries like agriculture and disrupt food production.

The solar images that Surya was trained on are 10x larger than typical AI training data, so the team had to create a multi-architecture system to handle them.

The model was released on Hugging Face.

Preview of NuGet MCP Server now available

Last month, Microsoft announced support for building MCP servers with .NET and then publishing them to NuGet. Now, the company is announcing an official NuGet MCP Server to integrate NuGet package information and management tools into AI development workflows.

“Since the NuGet package ecosystem is always evolving, large language models (LLMs) get out-of-date over time and there is a need for something that assists them in getting information in realtime. The NuGet MCP server provides LLMs with information about new and updated packages that have been published after the models as well as tools to complete package management tasks,” Jeff Kluge, principal software engineer at Microsoft, wrote in a blog post.

Opsera’s Codeglide.ai lets developers easily turn legacy APIs into MCP servers

Codeglide.ai, a subsidiary of the DevOps company Opsera, is launching its MCP server lifecycle platform that will enable developers to turn APIs into MCP servers.

The solution constantly monitors API changes and updates the MCP servers accordingly. It also provides context-aware, secure, and stateful AI access without the developer needing to write custom code.

According to Opsera, large enterprises may maintain 2,000 to 8,000 APIs — 60% of which are legacy APIs — and MCP provides a way for AI to efficiently interact with those APIs. The company says that this new offering can reduce AI integration time by 97% and costs by 90%.

Confluent announces Streaming Agents

Streaming Agents is a new feature in Confluent Cloud for Apache Flink that brings agentic AI into data stream processing pipelines. It enables users to build, deploy, and orchestrate agents that can act on real-time data.

Key features include tool calling via MCP, the ability to connect to models or databases using Flink, and the ability to enrich streaming data with non-Kafka data sources, like relational databases and REST APIs.

“Even your smartest AI agents are flying blind if they don’t have fresh business context,” said Shaun Clowes, chief product officer at Confluent. “Streaming Agents simplifies the messy work of integrating the tools and data that create real intelligence, giving organizations a solid foundation to deploy AI agents that drive meaningful change across the business.”

Anthropic expands Claude Sonnet 4’s context window to 1M tokens

With this larger context window, Claude can process codebases with 75,000+ lines of code in a single request. This allows it to better understand project architecture and cross-file dependencies, and to make suggestions that fit with the complete system design.

Longer context windows are now in beta on the Anthropic API and Amazon Bedrock, and will soon be available in Google Cloud’s Vertex AI.

For prompts over 200K tokens, pricing will increase to $6 / million tokens (MTok) for input and $22.50 / MTok for output. The pricing for requests under 200K tokens will be $3 / MTok for input and $15 / MTok for output.
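As a rough illustration of the two pricing tiers above, the sketch below estimates the cost of a single request. The rates and the 200K threshold come from the figures quoted here; the function name `estimate_cost` is hypothetical. It also makes three assumptions the text does not spell out: that the tier is decided by the input token count, that the higher rates apply to the whole request once the threshold is crossed, and that a prompt of exactly 200K tokens is billed at the standard rate.

```python
from decimal import Decimal

# Threshold and rates as quoted in the text (USD per million tokens, MTok).
LONG_CONTEXT_THRESHOLD = 200_000
STANDARD_RATES = {"input": Decimal("3.00"), "output": Decimal("15.00")}
LONG_CONTEXT_RATES = {"input": Decimal("6.00"), "output": Decimal("22.50")}

MTOK = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of one request under the two-tier pricing.

    Assumption (not confirmed by the text): the tier is chosen by the
    input token count, and the chosen rates apply to the entire request.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")

    # Prompts strictly over the threshold use the long-context rates.
    if input_tokens > LONG_CONTEXT_THRESHOLD:
        rates = LONG_CONTEXT_RATES
    else:
        rates = STANDARD_RATES

    return (
        Decimal(input_tokens) * rates["input"]
        + Decimal(output_tokens) * rates["output"]
    ) / MTOK


if __name__ == "__main__":
    print(estimate_cost(100_000, 10_000))  # standard tier
    print(estimate_cost(500_000, 10_000))  # long-context tier
```

Under these assumptions, a 100K-token prompt with 10K output tokens comes to $0.45, while a 500K-token prompt with the same output comes to $3.225.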

The company also extended its learning mode designed for students into Claude.ai and Claude Code. Learning mode asks users questions to guide them through concepts instead of providing immediate answers, to promote critical thinking about problems.

OpenAI adds GPT-4o as a legacy model in ChatGPT

With this update, paid users will now be able to select GPT-4o when using ChatGPT, along with other models like o3, GPT-4.1, and GPT-5 Thinking mini.

The model picker for GPT-5 also now includes Auto, Fast, and Thinking modes. Fast prioritizes giving the fastest answers, Thinking prioritizes giving deeper answers that take longer to think through, and Auto chooses between the two.

The company also increased the message limit for Plus and Team users to 3,000 per week on GPT-5 Thinking.

Google releases Gemma 3 270M

This new model is “designed from the ground up for task-specific fine-tuning with strong instruction-following and text structuring capabilities already trained in,” according to Google.

It is ideal in situations where there is a high-volume, well-defined task; speed and cost matter; user privacy needs to be protected; or there is a desire for a fleet of specialized task models.

Both pretrained and instruction-tuned versions of the model are available for download from Hugging Face, Ollama, Kaggle, LM Studio, and Docker. Alternatively, the models can be tried out in Vertex AI.

NVIDIA releases latest models in Llama Nemotron family

Llama Nemotron is a family of reasoning models, and the latest updates include a new hybrid model architecture, compact quantized models, and a configurable thinking budget to give developers more control over token generation.

“This combination lets the models reason more deeply and respond faster, without needing more time or computing power. This means better results at a lower cost,” the company wrote in an announcement.

Google’s coding agent Jules gets critique functionality

Google is enhancing its AI coding agent, Jules, with new functionality that reviews and critiques code while Jules is still working on it.

“In a world of rapid iteration, the critic moves the review to earlier in the process and into the act of generation itself. This means the code you review has already been interrogated, refined, and stress-tested … Great developers don’t just write code, they question it. And now, so does Jules,” Google wrote in a blog post.

According to the company, the coding critic is like a peer reviewer who is familiar with code quality principles and is “unafraid to point out when you’ve reinvented a risky wheel.”

GitHub to be folded into Microsoft’s CoreAI org

GitHub’s CEO Thomas Dohmke has announced his plans to leave the company at the end of the year.

In a memo to employees, he said that Microsoft doesn’t plan to replace him; rather, GitHub and its leadership team will now operate under Microsoft’s CoreAI organization, a group within the company focused on developing AI-powered tools, including GitHub Copilot.

“Today, GitHub Copilot is the leader of the most successful and thriving market in the age of AI, with over 20 million users and counting,” he wrote. “We did this by innovating ahead of the curve and showing grit and determination when challenged by the disruptors in our space. In just the last year, GitHub Copilot became the first multi-model solution at Microsoft, in partnership with Anthropic, Google, and OpenAI. We enabled Copilot Free for millions and introduced the synchronous agent mode in VS Code as well as the asynchronous coding agent native to GitHub.”

Sentry launches MCP monitoring tool

Application monitoring company Sentry is making it easier to gain visibility into MCP servers with the launch of a new monitoring tool.

With MCP monitoring, developers can understand things like which clients are experiencing errors, which tools are most used, or which tools are running slow. They can also correlate errors with events like traffic spikes or new release deployments, or figure out if errors are only happening on one type of transport.

According to Cody De Arkland, head of developer experience at Sentry, when Sentry launched its own MCP server, it was getting over 30 million requests per month. He said that at that scale, it’s inevitable that errors will occur, and existing monitoring tools were struggling with MCP servers.

bitHuman launches SDK for creating AI avatars

AI company bitHuman has announced a visual SDK for creating avatars for use as chat agents, instructors, virtual coaches, companions, and experts in different fields.

According to the company, the SDK allows avatars to be created on Arm-based and x86 systems without a GPU. The avatars have a small footprint and can be run online or offline on devices like Chromebooks, Mac Minis, and Raspberry Pis.

Because of their small footprint, these characters can be brought to a wide range of environments, including classrooms, kiosks, mobile apps, or edge devices.

OpenAI launches GPT-5

OpenAI announced the availability of GPT-5, which it says is “smarter across the board” compared to previous models.

Specifically for coding, GPT-5 achieved significant improvement in complex front-end generation and debugging larger repositories. Early testers said that it made better design choices in terms of spacing, typography, and white space, according to the company.

“We think you will love using GPT-5 much more than any previous AI,” CEO Sam Altman said during the livestream. “It is useful. It is smart. It is fast. It’s intuitive.”

Anthropic releases Claude Opus 4.1

This latest update improves the model’s research and data analysis skills, and achieves 74.5% on SWE-bench Verified (compared to 72.5% on Opus 4).

It is available to paid Claude users, in Claude Code, and on Anthropic’s API, Amazon Bedrock, and Google Cloud’s Vertex AI.

The company plans to release larger improvements across its models in the coming weeks as well.

AWS introduces Automated Reasoning checks to reduce AI hallucinations

Automated Reasoning checks are part of Amazon Bedrock Guardrails and validate the accuracy of AI-generated content against domain knowledge. According to AWS, this feature provides 99% verification accuracy.

This was first introduced as a preview at AWS re:Invent, and with this general availability release, several new features are being added, including support for large documents in a single build, simplified policy validation, automated scenario generation, enhanced policy feedback, and customizable validation settings.

Google adds Gemini CLI to GitHub Actions

This new offering is designed to act as an agent for routine coding tasks. At launch, it includes three workflows: intelligent issue triage, pull request reviews, and the ability to mention @gemini-cli in any issue or pull request to delegate tasks.

It is available in beta, and Google is offering free-of-charge quotas for Google AI Studio. It is also supported in Vertex AI and Standard and Enterprise tiers of Gemini Code Assist.

OpenAI announces two open weight reasoning models

OpenAI is joining the open weight model game with the launch of gpt-oss-120b and gpt-oss-20b.

The gpt-oss-120b model is optimized for production and high-reasoning use cases, and gpt-oss-20b is designed for lower latency or local use cases.

According to the company, these open models are comparable to its closed models in terms of performance and capability, but at a much lower cost. For example, gpt-oss-120b running on an 80 GB GPU achieved similar performance to o4-mini on core reasoning benchmarks, while gpt-oss-20b running on an edge device with 16 GB of memory was comparable to o3-mini on several common benchmarks.

Google DeepMind launches Genie 3

Genie 3 is a frontier model for generating real world environments. It can model physical properties of the world, like water, lighting, and environmental actions.

Users can also use prompts to change the generated world to add new objects and characters or change weather conditions, for example.

According to DeepMind, this research is important because it can enable AI agents to be trained in a variety of simulated environments.

About Jenna Barron

Jenna Barron is News Editor of SD Times.
