In April 2024, the National Institute of Standards and Technology released a draft publication aimed to provide guidance around secure software development practices for generative AI systems. In light of these requirements, software development teams should begin implementing a robust testing strategy to ensure they adhere to these new guidelines.

Testing is a cornerstone of AI-driven development as it validates the integrity, reliability, and soundness of AI-based tools. It also safeguards against security risks and ensures high-quality and optimal performance.

Testing is particularly important within AI because the system under test is far less transparent than a coded or constructed algorithm. AI has new failure modes and failure types, such as tone of voice, implicit biases, inaccurate or misleading responses, regulatory failures, and more. Even after completing development, dev teams may not be able to confidently assess the reliability of the system under different conditions. Because of this uncertainty, quality assurance (QA) professionals must step up and become true quality advocates. This designation means not simply adhering to a strict set of requirements, but exploring to determine edge cases, participating in red teaming to try to force the app to provide improper responses, and exposing undetected biases and failure modes in the system. Thorough and inquisitive testing is the caretaker of well-implemented AI initiatives.

Some AI providers, such as Microsoft, require test reports to provide legal protections against copyright infringement. The regulation of safe and confident AI uses these reports as core assets, and they make frequent appearances in both the October 2023 Executive Order by U.S. President Joe Biden on safe and trustworthy AI  and the EU AI Act. Thorough testing of AI systems is no longer only a recommendation to ensure a smooth and consistent user experience, it is a responsibility.

What Makes a Good Testing Strategy?

There are several key elements that should be included in any testing strategy: 

Risk assessment – Software development teams must first assess any potential risks associated with their AI system. This process includes considering how users interact with a system’s functionality, and the severity and likelihood of failures. AI introduces a new set of risks that need to be addressed. These risks include legal risks (agents making erroneous recommendations on behalf of the company), complex-quality risks (dealing with nondeterministic systems, implicit biases, pseudorandom results, etc.), performance risks (AI is computationally intense and cloud AI endpoints have limitations), operational and cost risks (measuring the cost of running your AI system), novel security risks (prompt hijacking, context extraction, prompt injection, adversarial data attacks) and reputational risks.

An understanding of limitations – AI is only as good as the information it is given. Software development teams need to be aware of the boundaries of its learning capacity and novel failure modes unique to their AI, such as lack of logical reasoning, hallucinations, and information synthesis issues.

Education and training – As AI usage grows, ensuring teams are educated on its intricacies – including training methods, data science basics, generative AI, and classical AI – is essential for identifying potential issues, understanding the system’s behavior, and to gain the most value from using AI.

Red team testing – Red team AI testing (red teaming) provides a structured effort that identifies vulnerabilities and flaws in an AI system. This style of testing often involves simulating real-world attacks and exercising techniques that persistent threat actors might use to uncover specific vulnerabilities and identify priorities for risk mitigation. This deliberate probing of an AI model is critical to testing the limits of its capabilities and ensuring an AI system is safe, secure, and ready to anticipate real-world scenarios. Red teaming reports are also becoming a mandatory standard of customers, similar to SOC 2 for AI.

Continuous reviews – AI systems evolve and so should testing strategies. Organizations must regularly review and update their testing approaches to adapt to new developments and requirements in AI technology as well as emerging threats.

Documentation and compliance – Software development teams must ensure that all testing procedures and results are well documented for compliance and auditing purposes, such as aligning with the new Executive Order requirements. 

Transparency and communication – It is important to be transparent about AI’s capabilities, its reliability, and its limitations with stakeholders and users. 

While these considerations are key in developing robust AI testing strategies that align with evolving regulatory standards, it’s important to remember that as AI technology evolves, our approaches to testing and QA must evolve as well.

Improved Testing, Improved AI

AI will only become bigger, better, and more widely adopted across software development in the coming years. As a result, more rigorous testing will be needed to address the changing risks and challenges that will come along with more advanced systems and data sets. Testing will continue to serve as a critical safeguard to ensure that AI tools are reliable, accurate and responsible for public use. 

Software development teams must develop robust testing strategies that not only meet regulatory standards, but also ensure AI technologies are responsible, trustworthy, and accessible.

With AI’s increased use across industries and technologies, and its role at the forefront of relevant federal standards and guidelines, in the U.S. and globally, this is the opportune time to develop transformative software solutions. The developer community should see itself as a central player in this effort, by developing efficient testing strategies and providing safe and secure user experience rooted in trust and reliability.


You may also like…

The impact of AI regulation on R&D

EU passes AI Act, a comprehensive risk-based approach to AI regulation