Amid the cacophony surrounding generative AI and software development, we haven’t seen much thoughtful discussion about software testing specifically. We’ve been experimenting with ChatGPT’s test-writing capabilities and wanted to share our findings. In short: we conclude that ChatGPT is only somewhat useful for writing tests today, but we expect that to change dramatically in the next few years, and developers should be thinking now about how to future-proof their careers.

We’re the cofounders of Codecov, a company acquired by Sentry that specializes in code coverage, so we’re no strangers to testing. For the past two months, we’ve been exploring the ability of ChatGPT and other generative AI tools to write unit tests. Our exploration primarily involved providing ChatGPT with coverage information for a particular function or class, along with the code itself. We then prompted ChatGPT to write unit tests for any part of the provided code that was uncovered, and checked whether the generated tests successfully exercised the uncovered lines of code.
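
For the curious, here’s a minimal sketch of that workflow in Python. It assumes a coverage.json report produced by coverage.py’s `coverage json` command; the module path and prompt wording are illustrative rather than the exact ones we used.

```python
# Minimal sketch of the prompting approach (illustrative, not our exact
# harness). Assumes you have already run:
#   coverage run -m pytest && coverage json
# which writes coverage.json with a "missing_lines" list per source file.
import json
from pathlib import Path


def build_prompt(coverage_json: str, source_file: str) -> str:
    report = json.loads(Path(coverage_json).read_text())
    missing = report["files"][source_file]["missing_lines"]  # uncovered line numbers
    source = Path(source_file).read_text()
    return (
        f"The following Python module has uncovered lines {missing}.\n\n"
        f"{source}\n\n"
        "Write pytest unit tests that exercise those uncovered lines."
    )


if __name__ == "__main__":
    # src/calculator.py is a hypothetical module; send the resulting prompt
    # to ChatGPT (or another model) and save its reply as a new test file.
    print(build_prompt("coverage.json", "src/calculator.py"))
```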

We’ve found that ChatGPT can reliably handle 30-50% of test writing currently, though the tests it handles well are primarily the easier ones: those that cover trivial functions and relatively straightforward code paths. This suggests that ChatGPT is of limited use for test writing today, since organizations with any amount of testing culture will typically have written their most straightforward tests already. Where generative AI will be most helpful in the future is in correctly testing more complex code paths, freeing developer time and attention for more challenging problems.
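
To make that “reliably handle” judgment concrete: a simple way to score a generated test is to diff the coverage report before and after adding it and see which previously uncovered lines it actually hits. A rough sketch, again assuming coverage.py’s JSON reports and a hypothetical module path:

```python
# Rough sketch of how one might score a generated test (not our exact
# harness). Compares coverage.py JSON reports captured before and after
# adding the generated test, e.g.:
#   coverage run -m pytest && coverage json -o coverage_before.json
#   ... add the generated test file ...
#   coverage run -m pytest && coverage json -o coverage_after.json
import json
from pathlib import Path


def missing_lines(coverage_json: str, source_file: str) -> set[int]:
    """Line numbers in source_file not covered by the test suite."""
    report = json.loads(Path(coverage_json).read_text())
    return set(report["files"][source_file]["missing_lines"])


def newly_covered(before: str, after: str, source_file: str) -> set[int]:
    """Lines uncovered before the generated test but covered after it."""
    return missing_lines(before, source_file) - missing_lines(after, source_file)


if __name__ == "__main__":
    # src/calculator.py is a hypothetical module path.
    hit = newly_covered("coverage_before.json", "coverage_after.json", "src/calculator.py")
    print(f"Lines newly exercised by the generated test: {sorted(hit)}")
```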

However, we have already seen improvements in the quality of test generation, and we expect this trend to continue in the coming years. First, very large, tech-forward organizations like Netflix, Google, and Microsoft are likely to build models for internal use trained on their own systems and libraries. This should allow them to achieve substantially better results, and the economics are too compelling for them not to do so. Given the rapid rates of improvement that we’re seeing from generative AI programs, a well-trained LLM could be writing a large portion of these companies’ software tests in the near future.

Further out, in the next three to five years, we anticipate that all organizations will be impacted. The companies developing generative AI tools – whether Scale AI, Google, Microsoft, or someone else – will train models to better understand code. Once AI is smart enough to understand the structure of code and how it executes, there is no reason that future generations of AI tools won’t be able to handle all unit testing. (Google had an announcement along these lines just last month.) In addition, Microsoft’s ownership of GitHub gives it an enormous platform to easily distribute AI coding tools to millions of software developers, meaning large-scale adoption can happen very quickly.

Whether the world will be ready for fully automated testing is another question. Much like self-driving cars, we expect that AI will be able to write 100% of code before humans are 100% ready to trust it. In other words, even when AI can handle all unit testing, organizations will still want humans as a backstop to review any code that AI has written, and may still prefer human-authored tests for the most critical code paths. Additionally, developers will still want metrics like code coverage to validate the AI’s work. Trust may take a long time to build.

Looking further out, AI may redefine how we approach software testing entirely. Rather than generating and executing automated tests, the testing framework may be the AI itself. It’s not out of the question that a sufficiently advanced and well-trained AI with access to enough computing resources could simply exercise all code paths for us, report any executions that fail, recommend fixes for those failing paths, or even correct them automatically in the course of analyzing and executing the code. This could obviate the need for software testing in the traditional sense altogether.

In any event, it’s likely that in the coming years AI will be able to do much of the work that developers do today, testing included. This could be bad news for junior engineers, but it remains to be seen how this will play out. We can also imagine a scenario in which “AI + junior engineers” could do the work of a mid-level engineer at lower cost, so it’s unclear who will be most affected.

Whatever the case, it’s important to experiment with these tools now if you’re not doing so already. Ideally, your organization is already providing opportunities to test generative AI tools and determine how they can make teams more productive and efficient, now or in the near future. Every company should be doing this. If that’s not the case where you work, you should still be experimenting with your own code on your own time.

One way to think about the role AI will fill is to think of it as a junior developer. If you want to stay “above the algorithm” and have a continuing role alongside AI, pay attention to where junior developers tend to fail today, because that’s where humans will be needed. 

The ability to review code will always be important. Instead of writing code, think of your role as a reviewer or mentor, the person who supervises the AI and helps it to improve. But whatever you do, don’t ignore it, because it’s clear to us that change is coming and our roles are all going to shift.