Using Metamorphic Testing for AI-based Applications

Hai Pham Hoang

Image of autonomous car created by Copilot

Introduction

Testing AI-based applications is always challenging. AI-based applications can be self-learning, autonomous, probabilistic, and non-deterministic systems. There can be many valid outputs for the same input, so testers find it difficult to define the expected result for a test scenario, and they cannot rely on an exact value to check the result of a test case. If the system is self-learning, its output changes over time, so it is hard to design a test case that stays valid; otherwise the tester must create new tests for each change. For some complex AI-based systems, the tester cannot know exactly what to expect. In conventional testing, one technique is to test against the requirement specification, but this is also challenging here: AI is meant to produce generalized behavior, while a testable specification is meant to pin down specific, well-defined behavior. So we need a different approach for testing AI-based applications.

Introduction to metamorphic testing

Metamorphic testing (MT) is based on the idea that it is often easier to reason about relations between a program's outputs than to understand its input-output behavior in full. MT is a property-based testing technique: it describes the system's functionality through generic relations between inputs and their outputs rather than through a single input-output pair. In other words, it describes how a change in the input is reflected in the output. Compared with the traditional way of testing, correctness is not determined by checking a concrete output, but by applying a transformation to the input and examining whether the output for the transformed input satisfies the relation. If the relation is violated, there is a failure in the system. MT therefore addresses the test oracle problem in software testing, where it is impossible or difficult to know the exact output of a test case.

Terms used in MT:

Seed input: the initial input used for the test, which is then transformed. If we know the seed output, we can define stricter relations; if not, we can still define a relation to check, without being 100% sure that the seed output itself is correct.
Transform: a modification of the seed input that produces a new input, chosen so that the relationship between the two inputs makes the change in the output predictable.
Metamorphic relation: the transform applied to the input must have a known effect on the output, and this effect is called the metamorphic relation. The test checks that this relation holds after the transformation.
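
The three terms above can be put together in a few lines of code. The following is only a minimal sketch: the helper check_metamorphic_relation and the toy systems mean and first_element are hypothetical names made up for this illustration, not part of any testing library.

```python
from typing import Callable, TypeVar

I = TypeVar("I")  # type of the input
O = TypeVar("O")  # type of the output


def check_metamorphic_relation(
    system: Callable[[I], O],
    seed_input: I,
    transform: Callable[[I], I],
    relation: Callable[[O, O], bool],
) -> bool:
    """Run the system on the seed input and on the transformed input,
    then report whether the metamorphic relation holds for both outputs."""
    seed_output = system(seed_input)
    follow_up_output = system(transform(seed_input))
    return relation(seed_output, follow_up_output)


def mean(values: list[float]) -> float:
    """Toy system under test: the arithmetic mean."""
    return sum(values) / len(values)


def first_element(values: list[float]) -> float:
    """Deliberately wrong 'mean' used to show a violated relation."""
    return values[0]


def reverse(values: list[float]) -> list[float]:
    """Transform: reorder the input without changing its elements."""
    return list(reversed(values))


def unchanged(seed_output: float, follow_up_output: float) -> bool:
    """Relation: reordering must not change the result."""
    return abs(seed_output - follow_up_output) < 1e-9


seed = [3.0, 1.0, 2.0]
holds = check_metamorphic_relation(mean, seed, reverse, unchanged)
violated = not check_metamorphic_relation(first_element, seed, reverse, unchanged)
print("relation holds for mean:", holds)
print("relation violated for first_element:", violated)
```

Note that the check never needs the expected value of the mean itself; it only compares the two outputs with each other.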

A typical example of MT is testing a search engine, since search engines nowadays are often natural language processing or ML-based systems. Taking Google search queries from February 2024 as an example: when a restrictive keyword is added to a query, the number of results should decrease after this perturbation.
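
As a rough illustration of this relation, the sketch below uses a tiny in-memory search function instead of a real search engine. The DOCUMENTS list and the search function are assumptions made up for this example; a real test would call the search engine under test instead. The relation is checked in its non-strict form (the result count must not grow), which is the safer assertion when the extra keyword may happen to match every result.

```python
# Hypothetical corpus standing in for the indexed pages of a search engine.
DOCUMENTS = [
    "metamorphic testing for machine learning",
    "testing search engines",
    "machine learning for search ranking",
    "cooking pasta at home",
]


def search(query: str) -> list[str]:
    """Toy search: return the documents that contain every query term."""
    terms = query.lower().split()
    return [doc for doc in DOCUMENTS if all(t in doc.split() for t in terms)]


seed_query = "testing"
follow_up_query = seed_query + " metamorphic"  # transform: add a restrictive keyword

seed_hits = search(seed_query)
follow_up_hits = search(follow_up_query)

# Metamorphic relation: a more restrictive query must not return more
# results, and every follow-up result must already be a seed result.
relation_holds = (
    len(follow_up_hits) <= len(seed_hits)
    and set(follow_up_hits) <= set(seed_hits)
)
print(len(seed_hits), len(follow_up_hits), relation_holds)
```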

The most commonly used metamorphic relations are invariance, increase, and decrease. Examples could be:

Invariance: the output should remain unchanged after the transformation. For example, replacing a word in the input with a synonym should not change the output, and permuting the order of the elements should not affect a calculation.
Increase: the output should increase after the transformation. For example, the bankruptcy probability that a prediction application gives for a person should increase when the person owns no house, and the sum of two numbers should double when each number is multiplied by 2.
Decrease: the output should decrease after the transformation. For example, the bankruptcy probability that a prediction application gives for a person should decrease when the person owns many houses.
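
The three relations can be sketched in code as follows. The bankruptcy_risk function is a hypothetical scoring formula invented purely for this illustration (a real predictor would be a trained model), and the income, debt, and house figures are made-up sample values.

```python
def total(numbers: list[int]) -> int:
    """Toy calculation under test."""
    return sum(numbers)


def bankruptcy_risk(income: float, debt: float, houses: int) -> float:
    """Hypothetical risk score between 0 and 1; NOT a real model."""
    raw = 0.5 + 0.4 * (debt - income) / max(debt + income, 1) - 0.1 * houses
    return min(1.0, max(0.0, raw))


# Invariance: permuting the elements must not change the sum.
numbers = [1, 2, 3]
invariance_holds = total(numbers) == total([3, 1, 2])

# Doubling: multiplying every element by 2 must double the sum.
doubling_holds = total([2 * n for n in numbers]) == 2 * total(numbers)

# Seed input for the risk predictor: a person who owns one house.
seed_risk = bankruptcy_risk(income=50_000, debt=40_000, houses=1)

# Increase: the same person without a house should score higher.
increase_holds = bankruptcy_risk(50_000, 40_000, houses=0) > seed_risk

# Decrease: the same person with many houses should score lower.
decrease_holds = bankruptcy_risk(50_000, 40_000, houses=5) < seed_risk

print(invariance_holds, doubling_holds, increase_holds, decrease_holds)
```

None of these checks requires knowing the "correct" risk score for the seed person; only the direction of the change is asserted.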

Some examples of applying MT in practice

Object recognition

Two images of the same place can differ in many ways. For example, the photo may be taken by day vs. at night, and the noise in the image may differ, such as a few people vs. many people, neon light vs. natural light, or no decoration vs. a lot of decoration, … The system is still expected to categorize the object correctly.

Here are some variations that can be added to test this type of task, with the effect each one has on the image:

Weather:
– Sunny: high brightness, shadows, …
– Rainy: reduced visibility
– Thunderstorm: added noise
– Cloudy: reduced visibility or added noise
People:
– Pedestrian, motorcyclist, rider, hawker, …: added noise
Decoration:
– Tree, flower, banner, flags, …: added noise
Other:
– Light: day, night, sunrise, sunset, artificial light, direct light, …
– Rotation: photo taken straight on, slightly off straight, …
– Scale
– Nearby objects: objects next to, behind, to the left or to the right of the target, such as buildings, houses, …

Of course, we can combine these variations to create more test scenarios, such as rain with night light, or different camera angles at noon, …. Once we have these transformations, we can check the behavior of the system against the expected metamorphic relation; for example, the system must still recognize the right object in rain and thunderstorm. In this way we obtain a very large set of test cases and can achieve high coverage with confidence.
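
The idea of transforming a seed image and expecting the same label can be sketched as below. Everything here is a simplified assumption for illustration: images are small grayscale grids, classify is a toy rule-based classifier standing in for a real object recognition model, and the transforms only imitate brightness, darkness, noise, and camera changes very roughly.

```python
import random

Image = list[list[int]]  # grayscale pixels, 0 (black) to 255 (white)


def classify(image: Image) -> str:
    """Toy classifier: label the bright shape by its bounding box."""
    pixels = [p for row in image for p in row]
    threshold = sum(pixels) / len(pixels)
    rows = [r for r, row in enumerate(image) if any(p > threshold for p in row)]
    cols = [c for c in range(len(image[0]))
            if any(row[c] > threshold for row in image)]
    if not rows:
        return "unknown"
    height = rows[-1] - rows[0] + 1
    width = cols[-1] - cols[0] + 1
    return "vertical" if height > width else "horizontal"


def clip(value: float) -> int:
    return max(0, min(255, int(value)))


def brighten(image: Image, delta: int = 40) -> Image:
    """Rough stand-in for a sunny scene."""
    return [[clip(p + delta) for p in row] for row in image]


def darken(image: Image, factor: float = 0.5) -> Image:
    """Rough stand-in for night or reduced visibility."""
    return [[clip(p * factor) for p in row] for row in image]


def add_noise(image: Image, amplitude: int = 5, seed: int = 0) -> Image:
    """Rough stand-in for rain, people, or decoration in the scene."""
    rng = random.Random(seed)
    return [[clip(p + rng.randint(-amplitude, amplitude)) for p in row]
            for row in image]


def flip_horizontal(image: Image) -> Image:
    """Rough stand-in for a different camera position."""
    return [list(reversed(row)) for row in image]


# Seed image: a bright vertical bar on a dark background.
seed_image = [[200 if col == 2 else 20 for col in range(5)] for _ in range(5)]
seed_label = classify(seed_image)

transforms = {
    "sunny": brighten,
    "night": darken,
    "noise": add_noise,
    "flipped": flip_horizontal,
    "noisy night": lambda img: add_noise(darken(img)),  # combined scenario
}

# Metamorphic relation (invariance): the label must not change.
results = {name: classify(t(seed_image)) for name, t in transforms.items()}
violations = [name for name, label in results.items() if label != seed_label]
print(seed_label, violations)
```

Each new entry in the transforms dictionary, including combinations of existing ones, adds a test scenario without requiring a new hand-labeled image.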

Sentiment analysis

The second example is an AI-based automation tool that analyzes customer feedback for an airline service. For example, “This was a good flight” is Positive, “I didn’t love the food” is Negative, and “This is an international flight” is Neutral.

Based on this understanding, we can create test cases with the MT technique for the seed input “This was a good flight”.

So from just one seed input, we can generate a large number of test cases.
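
A minimal sketch of generating follow-up cases from this one seed is shown below. The word lists and the sentiment_score function form a toy lexicon-based classifier assumed only for illustration; it stands in for the real AI tool. The follow-up sentences are likewise examples made up for this sketch.

```python
POSITIVE = {"good", "great", "nice", "love"}
NEGATIVE = {"bad", "terrible", "awful"}
NEGATORS = {"not", "didn't", "never"}
INTENSIFIERS = {"very", "really"}


def sentiment_score(text: str) -> float:
    """Toy scorer: sum word polarities, honoring negators and intensifiers."""
    score, sign, weight = 0.0, 1.0, 1.0
    words = text.lower().replace("’", "'").replace(".", "").split()
    for word in words:
        if word in NEGATORS:
            sign = -1.0
        elif word in INTENSIFIERS:
            weight = 2.0
        elif word in POSITIVE or word in NEGATIVE:
            polarity = 1.0 if word in POSITIVE else -1.0
            score += sign * weight * polarity
            sign, weight = 1.0, 1.0  # modifiers apply to one sentiment word
    return score


def label(text: str) -> str:
    score = sentiment_score(text)
    return "Positive" if score > 0 else "Negative" if score < 0 else "Neutral"


SEED = "This was a good flight"

# Each case: (transform name, follow-up input, relation on the two inputs).
cases = [
    ("synonym", "This was a great flight",
     lambda s, f: label(f) == label(s)),                      # invariance
    ("neutral sentence added",
     "This was a good flight. This is an international flight",
     lambda s, f: label(f) == label(s)),                      # invariance
    ("intensifier", "This was a very good flight",
     lambda s, f: sentiment_score(f) >= sentiment_score(s)),  # increase
    ("negation", "This was not a good flight",
     lambda s, f: sentiment_score(f) < sentiment_score(s)),   # decrease
]

failures = [name for name, follow_up, relation in cases
            if not relation(SEED, follow_up)]
print(label(SEED), failures)
```

Adding more synonyms, intensifiers, or neutral sentences to the cases list multiplies the number of follow-up tests derived from the same seed.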

Conclusion

From the examples above, we can see that MT is very useful when the correct output cannot be verified: regardless of the input values, if a metamorphic relation is violated, there is likely a defect in the implementation. MT has therefore been used in practice in areas where a test oracle is hard to obtain, such as compilers, image recognition, search engines, machine learning, and autonomous cars.


Hai Pham Hoang

Hai is a Senior Test Team Manager at NashTech with 20+ years of expertise in software testing. With a particular passion for software testing, Hai's specialization lies in Accessibility Testing. Her extensive knowledge encompasses international standards and guidelines, allowing her to ensure the highest levels of accessibility in software products. She is also a Certified Trusted Tester.
