Highlights:
- TruEra today announced the launch of version 2.0 of its diagnostics platform, TruEra Diagnostics, which gives data scientists an automated test harness that brings continuous testing into the AI/ML development workflow.
- To help detect degradation, TruEra is also introducing the capacity to run comparative tests across model iterations.
It is challenging to build responsible artificial intelligence (AI): models that reduce bias and perform consistently and reliably.
Two components of responsible AI are testing and explainability. One of the vendors offering AI testing tools is TruEra, which recently joined the Intel Disruptor program to help advance explainable AI. Until now, however, testing with tools like TruEra’s has typically been a one-off activity rather than a continuous, automated process.
Continuous integration/continuous deployment (CI/CD) pipelines have enabled continuous testing in software development and DevOps, but that approach has been largely unavailable for AI and machine learning (ML) work.
Will Uppington, cofounder and CEO of TruEra, said in an interview, “If you find a bug, how do you test, debug, and identify the problem and then write a test so that the bug never comes back? Machine learning developers do not have those tools today; they don’t have systematic ways of evaluating and testing their models and then systemically debugging them.”
To close that gap, TruEra has launched version 2.0 of its diagnostics platform, TruEra Diagnostics, which gives data scientists an automated test harness that builds continuous testing into the AI/ML development workflow.
How does an automated test harness affect ML development?
Data scientists typically build models in Jupyter notebooks.
According to Uppington, TruEra Diagnostics 2.0 can be installed directly into notebooks. By adding a few lines of code, a data scientist can create a test that runs automatically every time a model is trained. They can also set up policies that test a system automatically when specific thresholds or limits are reached.
For instance, if a model in development or production fails a test or produces an error, TruEra Diagnostics 2.0 provides a link the developer can follow to debug the results.
The TruEra user interface also suggests which tests data scientists should run. Although the right tests often depend on the model, Uppington said a few categories apply universally, and bias testing is one of them.
Users can test a model against bias measures and check whether the result is above or below the organization’s predetermined acceptable level. When bias exceeds that threshold, a link opens a deeper analysis that shows which parts of the model are actually responsible.
TruEra is also adding the ability to run comparative tests across model iterations to help detect degradation. Model summary dashboards place the test results of different versions of a model side by side in a comparative table.
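The article does not describe TruEra’s actual API, so the following is only a generic sketch of the pattern, not TruEra’s interface. `ThresholdTest`, `train_with_tests`, and the toy threshold classifier are hypothetical names invented for illustration. The idea shown is a test suite that runs automatically after every training call, with each test enforcing a policy threshold on a validation metric.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Model = Callable[[float], int]
Dataset = Tuple[List[float], List[int]]


@dataclass
class ThresholdTest:
    """Hypothetical test: compute a metric on validation data and compare it to a floor."""
    name: str
    metric: Callable[[Model, List[float], List[int]], float]
    minimum: float  # the test fails when the metric drops below this value

    def run(self, model: Model, xs: List[float], ys: List[int]) -> Dict[str, object]:
        value = self.metric(model, xs, ys)
        return {"test": self.name, "value": value, "passed": value >= self.minimum}


def accuracy(model: Model, xs: List[float], ys: List[int]) -> float:
    return sum(model(x) == y for x, y in zip(xs, ys)) / len(ys)


def make_classifier(cut: float) -> Model:
    return lambda x: int(x >= cut)


def train_threshold_classifier(train: Dataset) -> Model:
    """Toy 'training': choose the cut point with the best training accuracy."""
    xs, ys = train
    best_cut = max(sorted(xs), key=lambda c: accuracy(make_classifier(c), xs, ys))
    return make_classifier(best_cut)


def train_with_tests(train_fn, tests: List[ThresholdTest],
                     train: Dataset, validation: Dataset):
    """Every training run is followed by the full test suite, so no run goes untested."""
    model = train_fn(train)
    xs, ys = validation
    return model, [t.run(model, xs, ys) for t in tests]


# Usage: a single accuracy policy that must hold on every retrain.
suite = [ThresholdTest("validation_accuracy", accuracy, minimum=0.8)]
train_data = ([0.1, 0.4, 0.35, 0.8, 0.9, 0.7], [0, 0, 0, 1, 1, 1])
val_data = ([0.2, 0.75, 0.65, 0.95], [0, 1, 1, 1])
model, results = train_with_tests(train_threshold_classifier, suite, train_data, val_data)
for r in results:
    print(r)  # validation accuracy is 0.75, below the 0.8 floor, so the test fails
```

Wrapping training and testing in one call is the design choice that matters here: because the tests cannot be skipped, every retrain is checked against the same policies.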
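A bias test with a drill-down can be sketched as follows. This is an illustrative example, not TruEra’s method. It assumes a linear scoring model, uses the disparate impact ratio (the protected group’s approval rate divided by the reference group’s) as the bias measure, and uses a 0.8 threshold, a common rule of thumb sometimes called the four-fifths rule. The drill-down splits the gap in average scores between groups into per-feature contributions. That split is exact only for linear scores, so it is a simple stand-in for the deeper analysis the article describes. All function names and data are hypothetical.

```python
from statistics import mean
from typing import Dict, List

Row = Dict[str, float]


def selection_rate(preds: List[int]) -> float:
    return sum(preds) / len(preds)


def disparate_impact(preds: List[int], groups: List[str], protected: str, reference: str) -> float:
    """Ratio of the protected group's approval rate to the reference group's."""
    prot = [p for p, g in zip(preds, groups) if g == protected]
    ref = [p for p, g in zip(preds, groups) if g == reference]
    return selection_rate(prot) / selection_rate(ref)


def score_gap_by_feature(weights: Row, rows: List[Row], groups: List[str],
                         protected: str, reference: str) -> Row:
    """For a linear score, split the mean score gap between groups into per-feature parts."""
    gap = {}
    for name, w in weights.items():
        prot_mean = mean(r[name] for r, g in zip(rows, groups) if g == protected)
        ref_mean = mean(r[name] for r, g in zip(rows, groups) if g == reference)
        gap[name] = w * (prot_mean - ref_mean)
    return gap


def bias_test(weights: Row, rows: List[Row], groups: List[str],
              protected: str, reference: str, threshold: float = 0.8) -> Dict[str, object]:
    """Approve when the linear score is >= 0; fail if disparate impact is below threshold."""
    preds = [int(sum(weights[k] * r[k] for k in weights) >= 0) for r in rows]
    ratio = disparate_impact(preds, groups, protected, reference)
    result: Dict[str, object] = {"disparate_impact": ratio, "passed": ratio >= threshold}
    if not result["passed"]:
        # Drill-down: which features push the protected group's scores down?
        result["drilldown"] = score_gap_by_feature(weights, rows, groups, protected, reference)
    return result


weights = {"income": 1.0, "zip_risk": -2.0}
rows = [
    {"income": 1.0, "zip_risk": 0.8}, {"income": 1.2, "zip_risk": 0.3},
    {"income": 0.9, "zip_risk": 0.7}, {"income": 1.1, "zip_risk": 0.2},
    {"income": 1.0, "zip_risk": 0.1}, {"income": 1.1, "zip_risk": 0.2},
    {"income": 0.9, "zip_risk": 0.3}, {"income": 1.0, "zip_risk": 0.6},
]
groups = ["A"] * 4 + ["B"] * 4
report = bias_test(weights, rows, groups, protected="A", reference="B")
print(report)
```

In this made-up data, group A is approved half the time and group B three quarters of the time, giving a ratio of about 0.67. The test therefore fails, and the drill-down attributes most of the score gap to `zip_risk`.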
According to Uppington, “One of the key things that you do with machine learning is you retrain your model. And one of the things that you want to do when you retrain the models is you want to make sure that you haven’t seen any performance degradation in your key quality metrics.”
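A comparative regression check of the kind Uppington describes can be sketched as a function that compares the key metrics of a retrained model against the previous version and flags any drop larger than a tolerance. This is a hypothetical illustration, not TruEra’s dashboard logic. It assumes every metric is “higher is better”, and both the 0.01 tolerance and the metric values are made up.

```python
from typing import Dict, List, Tuple

Comparison = Tuple[str, float, float, bool]


def compare_versions(baseline: Dict[str, float], candidate: Dict[str, float],
                     tolerance: float = 0.01) -> List[Comparison]:
    """Flag any metric that dropped by more than `tolerance` (assumes higher is better)."""
    rows = []
    for metric in sorted(baseline):
        old, new = baseline[metric], candidate[metric]
        rows.append((metric, old, new, (old - new) > tolerance))
    return rows


def format_table(rows: List[Comparison]) -> str:
    """Render the comparison as a small side-by-side table."""
    lines = [f"{'metric':<10}{'v1':>8}{'v2':>8}  status"]
    for metric, old, new, degraded in rows:
        lines.append(f"{metric:<10}{old:>8.3f}{new:>8.3f}  {'DEGRADED' if degraded else 'ok'}")
    return "\n".join(lines)


v1 = {"accuracy": 0.91, "auc": 0.95, "recall": 0.80}
v2 = {"accuracy": 0.92, "auc": 0.945, "recall": 0.74}
table = compare_versions(v1, v2)
print(format_table(table))
degraded = [m for m, _, _, bad in table if bad]  # metrics that should block the retrain
```

The tolerance absorbs small run-to-run noise, so only the large drop in recall is flagged. Metrics where lower is better, such as error rates, would need their comparison reversed.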
TruEra aims to facilitate regulatory compliance for AI
Enterprises do not currently need to test models continuously and automatically before deploying them into production, but they may have to in the future.
Because AI regulations are emerging in the U.S. and Europe, organizations that use AI may face new compliance requirements. For instance, Uppington cited a New York City law that will soon require businesses using ML in human resources (HR) systems to have their AI independently assessed for its impact and any bias. He argued that firms will need to test and validate models continuously to be able to comply.
Continuous testing during development should also speed up deployment into production. Uppington said he has spoken with financial services firms where developing a model has taken upwards of nine months, partly because each model must undergo extensive validation before going into production. Continuous testing aims to shorten that cycle.
According to Uppington, TruEra also makes it easier for organizations to build templates for testing tasks that can be reused throughout the development process.
Uppington said, “You don’t wait until the end to run all your tests and then find out there is a big issue. You do that continuously throughout the process, and that’s what this enables.”