AI Testing AI? AI Is Everywhere

AI is everywhere and everybody has heard about it, one way or another. The image that comes to mind on hearing the phrase ‘artificial intelligence’ differs hugely from person to person, especially now, shortly after the release of The Matrix Resurrections, but rest assured: we are still far from machines and/or robots taking over the world and enslaving humanity. To make sure it stays that way, join the community of AI developers and testers and discover with us whether it is possible to have an AI test an AI.


The History of AI

If we look at the history of AI, one might say it all started in the 17th century (not a typo), with Descartes making his controversial statement that animals are only complex machines, claiming they have no thoughts or mind because they have no souls (NB: in 2014 Pope Francis was widely reported to have said that pets have souls and do indeed go to heaven, so rest assured). The real foundations of AI were laid in the 1950s and 1960s, with John McCarthy coining the term “artificial intelligence”, Turing creating his famous test and Weizenbaum trying to make a joke with the chatbot ELIZA, which eventually became quite serious.

When Deep Blue finally beat Kasparov 25 years ago, in the second of two six-game matches played more than a year apart, people thought that was a real “breakthrough”. Then in 2017 came DeepMind, whose AlphaZero reached champion-beating strength after only about 4 hours of training!
Now in 2022, these once-so-famous chess bots are often not even considered AI. That is because AI has evolved and become something more: not just guessing answers in a chat app or applying brute force to win a chess game. AI today means acquiring knowledge by self-learning, understanding the environment and gaining its own experience, which a psychologist would call human cognition.


AI In Your Everyday Life

You have probably already met AI in your everyday life, and you might not even know about it. Though it is said that today’s AI systems have not yet reached their full capacity, there is widespread interest in embedding them within classical (conventional) systems to carry out smaller, more specific tasks. One of our test managers recently became ISTQB Certified in AI Testing, and if you look up the syllabus on this topic, you will see how widely AI is implemented in our lives. From telecommunications through the automotive industry to healthcare, everyone is using AI. Have you heard of AWS? Pretty popular nowadays, and also a proud provider of AI services.
If you want to build your next app with AI, you have plenty of development frameworks and specialized hardware to choose from. The favourite frameworks of 2020 were:

1. Caffe, which is not a cup of coffee but is equally delightful, and was created at Berkeley
2. Torch, based on the Lua language and also used at Facebook, Google and Twitter
3. Scikit-learn, your Spotify best friend who also speaks Python

This ranking comes from an article on Towards Data Science. IT4nextgen also mentions TensorFlow, Microsoft CNTK and Theano on its list of the best AI-ML frameworks of 2021. Most of these tools are free and available to everyone, so stop napping and get to work.
Oh yeah, wait, you might also need some AI-specific hardware; the keyword here is multiprocessing. If you are more of a down-to-earth type, focus on GPUs: they handle parallel processing of simple tasks better than CPUs, which have fewer cores and are better at computing complex tasks. If you are really into the 21st century, you can also look for AI-specific hardware or get AI as a Service (AIaaS) in the cloud.


Challenges of Testing AI Systems

Once your AI system is built, you will have to face the challenges of testing it. When a tester first meets AI, it might seem to be a kind of Easter egg. We start creating test cases just as usual, checking for functionality, performance, regression, and so on. Then we start to bump into walls, and when all those beautiful ideas are squelched, we realize that our journey started in the wrong direction. This is not a hardcoded application, there are no “typical developer errors”, there is no solid ground. You, as a tester, usually only have the source data that was used for the training, validation and testing of the tool; however, there is rarely a concrete outcome or expectation that you can use as the basis for your expected results.

If you are lucky, there is a legacy system that you can use as a test oracle. Also, as long as your AI’s task is only to distinguish between cats and dogs, you are OK: that result can easily be evaluated by humans. But as soon as the AI takes on a more complex task, e.g., calculating the risk of a co-worker having a heart attack while going through the morning email routine, where the calculations are based on thousands of attributes and sample data processed by the AI, the outcome is a bit harder to predict. Not to mention non-deterministic systems. Rather than following fixed source code or requirements, the AI learns as the project proceeds and develops its predictions based on the feedback it receives from the outside world. But how can you make sure you give the right feedback? One answer is: you can ask human experts! But what if they cannot agree on what the outcome should have been either? Yeah, not so easy.
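To make the test-oracle idea a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the two attributes, the rule inside `legacy_oracle` and the stand-in `model_under_test` are hypothetical, and a real project would call the actual legacy system and the trained model instead.

```python
from typing import Callable, List, Tuple

Sample = Tuple[int, int]  # (age, resting heart rate): hypothetical attributes


def legacy_oracle(sample: Sample) -> str:
    """Hypothetical rule-based legacy system that serves as the test oracle."""
    age, heart_rate = sample
    return "high" if age >= 60 or heart_rate >= 100 else "low"


def model_under_test(sample: Sample) -> str:
    """Stand-in for the AI system; a real test would call the trained model."""
    age, heart_rate = sample
    return "high" if age + heart_rate >= 150 else "low"


def compare_with_oracle(
    predict: Callable[[Sample], str],
    oracle: Callable[[Sample], str],
    samples: List[Sample],
) -> Tuple[float, List[Sample]]:
    """Return the agreement rate and the samples where the two systems differ."""
    disagreements = [s for s in samples if predict(s) != oracle(s)]
    agreement = 1 - len(disagreements) / len(samples)
    return agreement, disagreements


samples = [(35, 70), (62, 80), (45, 110), (58, 95), (29, 60)]
agreement, disagreements = compare_with_oracle(model_under_test, legacy_oracle, samples)

print(f"agreement with legacy system: {agreement:.0%}")
for sample in disagreements:
    # A disagreement is not automatically a bug: the legacy system may be the wrong one.
    print("needs human review:", sample)
```

The disagreements are the interesting part: they are the cases to hand over to those human experts, since either system could be the one that is wrong.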

Anyway, once everyone is aligned on the expected result, there is another big question: what is the expected accuracy and the level of tolerance? This, too, is something that is continuously fine-tuned during the project lifecycle. Should the AI provide accurate results for 50% of the examined entities? 75%? 90%? Even 100%? It depends on the task and the concept.
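One way to turn that question into something testable is to write the agreed target and tolerance down as an explicit check. The sketch below is only an illustration: the cat/dog labels, the targets and the tolerance are made-up values, and the function names `accuracy` and `meets_target` are hypothetical helpers, not part of any framework.

```python
from typing import Sequence


def accuracy(predictions: Sequence[str], expected: Sequence[str]) -> float:
    """Share of predictions that match the expected label."""
    if len(predictions) != len(expected):
        raise ValueError("predictions and expected must have the same length")
    correct = sum(p == e for p, e in zip(predictions, expected))
    return correct / len(expected)


def meets_target(measured: float, target: float, tolerance: float) -> bool:
    """Pass when the measured accuracy is at most `tolerance` below the target."""
    return measured >= target - tolerance


# Hypothetical labelled test data and model output.
expected = ["cat", "dog", "dog", "cat", "dog", "cat", "cat", "dog", "dog", "cat"]
predictions = ["cat", "dog", "cat", "cat", "dog", "cat", "dog", "dog", "dog", "cat"]

measured = accuracy(predictions, expected)
strict_ok = meets_target(measured, target=0.90, tolerance=0.05)
relaxed_ok = meets_target(measured, target=0.75, tolerance=0.05)

print(f"measured accuracy: {measured:.0%}")
print("passes 90% target:", strict_ok)
print("passes 75% target:", relaxed_ok)
```

The same model passes or fails depending purely on the numbers the team agreed on, which is why they need to be agreed on, and revisited, explicitly.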

The ISTQB syllabus highlights many more challenges a tester has to overcome when testing AI systems. In many cases they come down to the fact that AI systems don’t have souls: they are easily “biased” against minorities, or favour one group over another if that is more “logical”. AI can also work as an autonomous system, up to the point where it must hand control back to a human.
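As a rough illustration of how such a bias could be spotted, the sketch below compares how often each group receives the favourable outcome. This is just one simple check chosen for this example, not a method prescribed by the syllabus; the groups "A" and "B", the records and the helper `positive_rate_by_group` are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def positive_rate_by_group(records: Iterable[Tuple[str, int]]) -> Dict[str, float]:
    """For (group, prediction) pairs, return the share of favourable (1) predictions per group."""
    totals: Dict[str, int] = defaultdict(int)
    positives: Dict[str, int] = defaultdict(int)
    for group, prediction in records:
        totals[group] += 1
        positives[group] += prediction
    return {group: positives[group] / totals[group] for group in totals}


# Hypothetical model decisions: 1 = favourable outcome, 0 = unfavourable.
records = [
    ("A", 1), ("A", 1), ("A", 0), ("A", 1),
    ("B", 1), ("B", 0), ("B", 0), ("B", 0),
]

rates = positive_rate_by_group(records)
gap = max(rates.values()) - min(rates.values())

for group, rate in sorted(rates.items()):
    print(f"group {group}: favourable outcome in {rate:.0%} of cases")
print(f"gap between groups: {gap:.0%}")
```

A large gap does not prove the system is unfair, but it is a clear signal that the training data and the decisions deserve a closer look.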

Imagine yourself in the situation of an end user who is facing an AI system. For them, TIE (Transparency, Interpretability and Explainability) covers the most crucial quality characteristics.

Now, as you might have guessed from the title of this article, we not only can test the AI, but we can also use AI for testing.

We plan to write about the specific AI testing methods, test techniques and metrics in upcoming articles.

Still crazy about Artificial Intelligence? No spoilers, but stay tuned for our next article, where we will explore the no-longer-super-secret world of AI more deeply, uncover the myths of using AI for testing, see some unique testing metrics and look behind the scenes of our favorite topic, testing techniques: all this and more, using AI.



Author: Laura Albert, Test Manager, DACHS