A sign that should NOT be ignored
Almost every development team has experienced this at least once: a test fails in CI, nobody touched the code, and a subsequent rerun suddenly passes.
What happened?
It’s test flakiness, and it usually points to deeper problems in how tests and systems are built.
When this happens, the test suite stops being a reliable source of truth. Developers and testers no longer know whether a failure indicates a real problem or just another false alarm.
What Is Test Flakiness?
A test is considered flaky when it produces different results across multiple runs, even though the code hasn’t changed at all.
If a test sometimes passes and sometimes fails, it becomes unreliable, and unreliable tests can't be fully trusted.
Flaky tests don't just slow teams down; they slowly destroy trust in the testing process.
Real-World Examples
1. Timing Issues in Asynchronous Code
Example:
A test checks whether a background task finishes within two seconds.
- Passes on a fast machine
- Fails on a CI server
The feature works, but the test assumes timing will always match ideal conditions.
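The difference between the fragile and robust versions of such a test can be sketched in Python. The `run_background_task` helper below is hypothetical; the point is that waiting on an explicit completion signal is far more stable than asserting a fixed time budget:

```python
import threading
import time

def run_background_task(on_done):
    # Hypothetical background task; its duration varies with machine load.
    def work():
        time.sleep(0.05)
        on_done()
    threading.Thread(target=work).start()

# Flaky style: sleep for a fixed window, then assert the task finished.
#   time.sleep(2); assert task_finished   # fails on a slow CI machine

# More robust style: wait on an explicit completion signal,
# with a timeout generous enough for the slowest environment.
done = threading.Event()
run_background_task(done.set)
assert done.wait(timeout=10), "background task did not signal completion"
```

The robust version still fails if the task genuinely hangs, but it no longer fails just because the CI machine is slower than a developer laptop.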
2. Shared Database State
Example: Two tests use the same database.
- One test inserts data and doesn’t clean up
- Another test expects the database to be empty
Depending on test order, failures appear randomly.
3. Random Data in Tests
Example:
A test uses random input without fixing a seed.
Most runs pass, but occasional failures are hard to reproduce and debug.
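The standard fix is to give the test its own seeded random generator, so any failure can be replayed exactly. A minimal sketch (the `shuffle_deck` function is a stand-in for code under test that consumes randomness):

```python
import random

def shuffle_deck(rng):
    # Code under test takes an injected generator instead of using
    # the global `random` module directly.
    deck = list(range(52))
    rng.shuffle(deck)
    return deck

# Flaky style: random.shuffle(deck) with no seed -> unreproducible failures.

# Deterministic style: a fixed seed (1234 is arbitrary) makes every run,
# and therefore every failure, reproducible.
assert shuffle_deck(random.Random(1234)) == shuffle_deck(random.Random(1234))
```

If randomized inputs are genuinely wanted, log the seed on failure so the failing case can be reconstructed.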
Flaky Test vs Weak Test
It’s important to distinguish flaky tests from weak tests.
- Weak tests consistently pass while missing real defects.
- Flaky tests fail inconsistently, even when the code under test is correct.
Both are problematic, but flaky tests are especially harmful because they actively decrease confidence in test results.
Rerunning Tests Is Not a Fix
Many teams try to solve flakiness by rerunning failed tests or adding automatic retries.
Reruns don't fix the root cause; they just hide it. Over time, developers learn that failures don't matter.
Rerunning a failed test until it passes doesn’t prove the code is correct. It proves only one thing: the test result was inconvenient.
When teams rely on reruns, they silently agree that test failures don't really mean anything.
Flaky Tests Are a Leadership Problem?
Flaky tests persist not because engineers don’t know how to fix them, but because leadership allows them to exist.
Deadlines, delivery pressure, and a "we'll fix it later" mindset all send the same message: reliability is optional.
Flaky Tests Reflect The Quality Of Your Code Reviews?
Here’s an uncomfortable truth: flaky tests often expose problems that code reviews miss.
They reveal:
- Hidden race conditions
- Tight coupling between components
- Overreliance on real infrastructure
- Code that neither testers nor developers can observe or control
If your tests are unstable, it's usually because your system's behavior is unstable. The tests are simply where that instability becomes visible.
How to Reduce Test Flakiness
Fixing flaky tests requires more than quick workarounds.
Effective strategies include:
- Designing tests so they do not depend on shared state
- Mocking or simulating external services
- Controlling time and randomness explicitly
- Designing asynchronous code with clear completion signals
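Controlling time explicitly usually means injecting a clock instead of calling the system time directly. A minimal sketch, assuming a hypothetical `is_token_expired` function and a hand-rolled `FakeClock` (real projects often use a library such as freezegun for the same purpose):

```python
class FakeClock:
    # Test-controlled clock: time advances only when the test says so.
    def __init__(self):
        self.now = 0.0
    def time(self):
        return self.now
    def advance(self, seconds):
        self.now += seconds

def is_token_expired(issued_at, ttl, clock):
    # Production code would receive a real clock wrapping time.time();
    # tests pass a FakeClock and control time deterministically.
    return clock.time() - issued_at > ttl

clock = FakeClock()
issued = clock.time()
assert not is_token_expired(issued, ttl=60, clock=clock)
clock.advance(61)  # deterministic: no sleeping, no real elapsed time
assert is_token_expired(issued, ttl=60, clock=clock)
```

The same injection pattern applies to randomness (pass a seeded generator) and to external services (pass a fake client).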
Treating flaky tests as bugs that must be fixed improves test reliability and often leads to better system design overall.
Conclusion
Test flakiness is not just a testing inconvenience; it is a warning that something in the system or test design is unstable or poorly controlled.
Teams that take flakiness seriously build more reliable tests, more dependable systems, and higher confidence in every test cycle.
Strong teams adopt a higher standard: Zero Tolerance for Flakiness.
Flaky tests should be treated as bugs, not inconveniences.