Before I start, I want to point out that I am not referring to one particular project or any particular individual. Having spoken to others, I believe these problems are industry-wide. Nearly all automation testers I have worked with have busted a gut to make this faulty machine work. I am hating the game, not the player.
If I am not mistaken, I appear to have awoken in an alternate reality where vast sums of money, time, and resources are allocated to both the writing and the continual maintenance of end-to-end tests. We have a new breed of developer known as the automation tester, whose primary reason for being is not only to find bugs but also to write a regression test that negates the need to re-run the initial manual testing.
Automated regression tests sound great in theory, and anybody starting a new job could not fail to be impressed on finding out that every story in every sprint would have an accompanying end-to-end test written in Selenium WebDriver.
I have heard numerous tales of end-to-end tests, usually written in Selenium WebDriver, getting deleted due to their brittle nature. Test automation seems only to result in CI build sabotage, with non-deterministic tests making change and progression next to impossible. We have test automation engineers too busy or unwilling to carry out manual tests, instead stoking the flames of hell with these underperforming, time- and resource-grasping, non-deterministic tests.
Tests that re-run on failure are standard and even provided by some test runners. Some of the most challenging code to write is being written and maintained by the least experienced developers. Test code does not have the same spotlight of scrutiny shone on it. We never stop to ask ourselves whether this insane effort is worth it. We don’t track metrics, and we only ever add more tests.
It is like a bizarre version of Groundhog Day, only it is a broken build and not a new day that starts the same series of events. I am now going to list the repeating problems that I see on a project laden with the burden of a massive end-to-end test suite.
At the time of writing, nearly all tests assert their expectations against a fixed set of inputs. Below is a simple login feature file:
Feature: Login Action

  Scenario: Successful Login with Valid Credentials
    Given User is on Home Page
    When User Navigate to LogIn Page
    And User enters UserName and Password
    Then Message displayed Login Successfully
The feature file executes the following Java code in what is known as a step definition:
@When("^User enters UserName and Password$") public void user_enters_UserName_and_Password() throws Throwable { driver.findElement(By.id("log")).sendKeys("testuser_1"); driver.findElement(By.id("pwd")).sendKeys("Test@123"); driver.findElement(By.id("login")).click(); }
This test will only ever find bugs if this finite set of inputs triggers the bug. A bug triggered by a new user entering anything other than testuser_1 and Test@123 won't be caught by this end-to-end test. We can increase the number of inputs by using a Cucumber table:
Given I open Facebook URL
And fill up the new account form with the following data
  | First Name | Last Name | Phone No   | Password | DOB Day | DOB Month | DOB Year | Gender |
  | Test FN    | Test LN   | 0123123123 | Pass1234 | 01      | Jan       | 1990     | Male   |
The most likely time that these tests will find bugs is the first time they run. Yet while these tests still exist, we will have to maintain them. If they use Selenium WebDriver, then we might run into latency problems in our continuous integration pipeline.
These tests can be pushed down the test pyramid into unit tests or integration tests.
I am not saying we should do away with end-to-end tests, but if we want to avoid the maintenance of these often brittle tests, then we should only test the happy path. I want a smoke test that lets me know the most crucial functionality is working. Exceptional paths should be handled at a more granular level in the developer unit tests or integration tests.
The most common source of bugs in the login example is user input. We should not be spinning up Selenium to test user input. We can write inexpensive unit tests to check user input that do not require the maintenance overhead of an end-to-end test. We still need one end-to-end test for the happy path, just to check it all hangs together, but we don't need end-to-end tests for the exceptional paths.
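To make that concrete, here is a minimal sketch of what that pushed-down coverage could look like: a plain JUnit test exercising the input rules directly. The validation rule below is invented for illustration and is not from any codebase mentioned in this article:

import org.junit.Test;
import static org.junit.Assert.assertTrue;
import static org.junit.Assert.assertFalse;

public class LoginInputTest {

    // Hypothetical validation rule: 3-20 characters, letters, digits, or underscores
    private boolean isValidUsername(String username) {
        return username != null && username.matches("[A-Za-z0-9_]{3,20}");
    }

    @Test
    public void acceptsATypicalUsername() {
        assertTrue(isValidUsername("testuser_1"));
    }

    @Test
    public void rejectsMalformedInput() {
        // Each of these cases would need its own slow end-to-end run if tested through the browser
        assertFalse(isValidUsername(null));
        assertFalse(isValidUsername(""));
        assertFalse(isValidUsername("user name"));
        assertFalse(isValidUsername("<script>alert(1)</script>"));
    }
}

Hundreds of tests like these run in milliseconds, with no browser, no Selenium session, and no CI latency.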
Testing can be and should be broken up with most of the burden carried by unit tests and integration tests.
Has everyone forgotten the test pyramid?
I have blogged about this previously in my post Cypress.io: the Selenium killer. It is nearly impossible not to write non-deterministic Selenium tests because you have to wait for the DOM and the four corners of the cosmos to be perfectly aligned before running your tests.
If you are testing a static webpage with no dynamic content, then Selenium is excellent. If, however, your website has one or more of the following, then you are going to have to contend with flaky or non-deterministic tests:

- content loaded asynchronously via Ajax calls
- JavaScript that must finish loading before the page is usable
- animations or transitions that move elements around
An automation tester faced with any of the above conditions will litter their tests with a series of waits, polling waits, checks for Ajax calls to have finished, checks for JavaScript to have loaded, checks for animations to have completed, etc.
The tests turn into an absolute mess and a complete maintenance nightmare. Before you know it, you have test code like this:
click(selector) {
  const el = this.$(selector)

  // make sure element is displayed first
  waitFor(el.waitForDisplayed(2000))

  // this bit waits for the element to stop moving (i.e. x/y position is the same).
  // Note: I'd probably check width/height in WebdriverIO but not necessary in my use case
  waitFor(
    this.client.executeAsync(function(selector, done) {
      const el = document.querySelector(selector)
      if (!el) {
        throw new Error(`Couldn't find element even though we .waitForDisplayed it`)
      }

      let prevRect
      function checkFinishedAnimating() {
        const nextRect = el.getBoundingClientRect()
        // if it's not the first run (i.e. no prevRect yet) and the position is the same,
        // the animation is finished, so call done()
        if (
          prevRect != null &&
          prevRect.x === nextRect.x &&
          prevRect.y === nextRect.y
        ) {
          done()
        } else {
          // otherwise, store the rect and wait 100ms before checking again.
          // We can play with what amount of wait works best. Probably less than 100ms.
          prevRect = nextRect
          setTimeout(checkFinishedAnimating, 100)
        }
      }
      checkFinishedAnimating()
    }, selector)
  )

  // then click
  waitFor(el.click())
  return this
}
My eyes water looking at this code. How can this be anything but one big massive flake that takes time and effort to keep alive?
Cypress.io gets around this by embedding itself in the browser and executing in the same event loop as the browser, so code executes synchronously. Taking away the asynchronicity and not having to resort to polling, sleeping, and waiting helpers is hugely empowering.
Test automation engineers are very possessive about their tests, and in my experience, we don’t do any work to identify whether a test is paying its way.
We need tooling that monitors the flakiness of tests, and if the flakiness is too high, it automatically quarantines the test. Quarantining removes the test from the critical path and files a bug for developers to reduce the flakiness.
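No off-the-shelf tool is named here, so below is a minimal sketch of the idea; the class name, the 5% threshold, and the wiring are all assumptions, not an existing library:

import java.util.HashMap;
import java.util.Map;

// Hypothetical flakiness monitor: records pass/fail history per test and
// flags for quarantine any test whose failure rate crosses a threshold.
public class FlakinessMonitor {

    private static final double QUARANTINE_THRESHOLD = 0.05; // 5% failure rate, arbitrary

    // per-test stats: index 0 = total runs, index 1 = failures
    private final Map<String, int[]> history = new HashMap<>();

    public void record(String testName, boolean passed) {
        int[] stats = history.computeIfAbsent(testName, k -> new int[2]);
        stats[0]++;
        if (!passed) {
            stats[1]++;
        }
    }

    // A test is quarantined once its observed failure rate exceeds the threshold
    public boolean shouldQuarantine(String testName) {
        int[] stats = history.get(testName);
        if (stats == null || stats[0] == 0) {
            return false;
        }
        return (double) stats[1] / stats[0] > QUARANTINE_THRESHOLD;
    }
}

A CI job could feed every result into record() and move anything flagged by shouldQuarantine() out of the blocking suite, filing a bug at the same time.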
If re-running the build is the solution to fixing a test, then that test needs to be deleted. Once developers get into the mindset of pressing the "build again" button, all faith in the test suite has gone.
The test runner Courgette can, disgracefully, be configured to re-run on failure:
@RunWith(Courgette.class)
@CourgetteOptions(
    threads = 1,
    runLevel = CourgetteRunLevel.FEATURE,
    rerunFailedScenarios = true,
    showTestOutput = true
)
public class TestRunner {
}
What rerunFailedScenarios = true says is that our tests are non-deterministic, but we don't care; we are just going to re-run them because hopefully next time they will work. I take this as an admission of guilt. Current test automation thinking has deemed this acceptable behavior.
If your test is non-deterministic, i.e., it behaves differently when run with the same inputs, then delete it. Non-deterministic tests drain the confidence of your project. If your developers are pressing the magic button without thinking, then you have reached this point. Delete these tests and start again.
Test maintenance has been the death of many test automation initiatives. When it takes more effort to update the tests than it would take to re-run them manually, test automation will be abandoned. Your test automation initiative should not fall victim to high maintenance costs.
There's a lot more to testing than simply executing and reporting. Environment setup, test design, strategy, and test data are often forgotten. You can watch your monthly invoice from your cloud provider of choice skyrocket as the resources required to run this ever-expanding test suite grow.
Automation testers are often new to development and are suddenly tasked with writing complicated end-to-end tests in Selenium WebDriver. Out of their depth, they reach for Thread.sleep and other hacks. A puppy dies in heaven every time an automation tester uses Thread.sleep with some arbitrary number in the futile hope that after x milliseconds the world will be as they expect. Failure is the only result of using Thread.sleep.
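To be clear about the alternative: Selenium already ships an explicit-wait API that polls for a condition instead of sleeping blindly. A sketch of the contrast, written Selenium 3 style and reusing the login button locator from the earlier example:

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;

public class WaitStyles {

    // The hack: block for five seconds whether the button takes 50ms or never appears
    public void clickWithSleep(WebDriver driver) throws InterruptedException {
        Thread.sleep(5000);
        driver.findElement(By.id("login")).click();
    }

    // The fix: poll until the button is actually clickable, failing fast after 10 seconds
    public void clickWithExplicitWait(WebDriver driver) {
        WebDriverWait wait = new WebDriverWait(driver, 10);
        WebElement login = wait.until(ExpectedConditions.elementToBeClickable(By.id("login")));
        login.click();
    }
}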
Automation test code needs to come under the same scrutiny as real code. These difficult-to-write test scenarios should not be a sea of copy-and-paste hacks to reach the finish line.
I have some sympathy with this point: manual testing is not as compelling as writing code, so it is perceived as outdated and boring. But automation tests should be written after the manual testing, to catch regressions. A lot of automation testers that I have worked with no longer like manual testing, and it is falling by the wayside. Manual testing will catch many more bugs than writing one test with one fixed set of inputs.
It is now commonplace to write Gherkin syntax on a brand-new ticket or story and go straight into writing the feature file and step definition. When this happens, manual testing is bypassed, and a regression test is written before any actual regression has happened. We are writing a test for a bug that will probably never happen.
In my estimation, we are spending vast sums of money and resources on something that's just not working. The only results that I have seen from automated testing are an insanely long build and change made exceptionally difficult.
We are not sensible about automated testing. It sounds great in principle, but there are so many bear traps that we can quickly end up in a dead end where change is excruciating and difficult-to-maintain tests are kept alive for no good reason.
I will leave you with these questions that I think need to be answered:

- Why is nobody questioning if the payback is worth the effort?
- Why are we allowing flakey tests to be the norm, not the exception?
- Why is Selenium the norm when it is not fit for purpose?
- Why is re-running a test with the same inputs and getting a different result excusable to the point where we have runners such as Courgette that do this automatically?
- Why are automation testers still reaching for Thread.sleep code in their rush to complete the task? This is the root of the flake.
15 Replies to "Automated testing is not working"
I mostly agree… though browser derived integration tests can be very useful. Personally, my preference is Jest + Puppeteer.
Puppeteer does not allow access to the dev console for investigation. A better solution is headless Chrome, which can be turned off to allow developers to see the display and access the dev console for investigation.
Completely agree. I also hate end to end tests. Targeted tests for the win.
Wow just wow. The amount of wrong info in this is amazing. I don’t even know where to start. I guess I would venture a guess that the person who wrote this lives in an area that’s pretty far behind. Number one, not everything should have an end-to-end test. Google Martin Fowler’s. Next, the way in which the examples are written is extremely brittle, which is why your test automation is useless. I honestly thought people were starting to get over this false perception that automation is a waste, but it is possible to go full CI/CD straight to prod on a merge, and the only way to do it is test automation.
Alleluia! From your lips to CIO’s ears.
I laughed and cried at this…
A bunch of common-sense bad practices that any good test automation engineer just doesn’t do; not sure what kind of testers you are used to working with??? Ofc UI test automation is only for the happy path, ofc the test pyramid covers other edge cases, ofc tests need to be stable and not flaky, ofc for long-term project scalability manually testing everything is not the way to go….
I totally agree with Lorenzo and Austin. Programming 101, garbage in garbage out. Follow best practices and common sense and you can’t go wrong with automation.
By the logic put forth in this article, surely one can make the case that writing code to make software is inherently bad because some applications are buggy and poorly written. There are both bad developers and bad testers.
I completely agree with you. While test automation has its flaws, it’s necessary for a CI/CD process to be successful. Most organizations with a fully implemented test automation framework in place actually have their developers write the e2e tests. I doubt any dev would write brittle code like what’s in the article. Even I, an automated tester, wouldn’t write code that bad. Now who’s the least experienced developer?
Visual and API testing could close the gaps left by Selenium and will provide A/B testing out of the box.
It’s rare to come across such an opinion about automated testing. Manual is still alive, and it’s cool!
“Why is nobody questioning if the payback is worth the effort?”
I will answer this through personal experience. I am working on an application which is critical to the business and a central point for the company. Two people have been writing end-to-end tests with an appropriate mix of API, web, and app-based tests. We have automated much of the regression suite in about four months and already caught more than 20 bugs which were pure regressions. Now, why only 20 bugs, you may ask. Then we need to be reminded that these were the bugs which were missed by dev unit tests as well as the manual guys. It also needs to be said that they were high-severity bugs. With heavy changes performed each sprint, especially in a large-scale application, there is a huge risk of existing functionality breaking, something which we need to test before every release. Can we place the burden of running these tests on manual QAs each and every sprint? Can we expect the same level of reliability and accuracy from manual testing as we have repeatedly seen from automation? And that too each and every time? Do we know the exact impact area of each and every change we make?
This attitude of expecting too much payback from each and every effort is something big companies don’t want to be involved in. Automation solves the problem of repeated checks. It provides peace of mind to all the business people whenever we make business flow changes or even technical optimizations. It frees up time for manual testers to perform more and more exploratory tests. That is what automation is for. Setting wrong expectations and not following a correct process does not mean there is something wrong with the job itself.
‘Why are we allowing flakey tests to be the norm, not the exception?”
We definitely are not allowing flaky tests to be the norm. We invest a good amount of time in test stability. If automation testers are not able to make at least 90% of their tests stable, they are not doing their job correctly.
“Why is selenium the norm when it is not fit for purpose?”
You have mentioned Cypress; while it’s an amazing tool, it’s not fit for large-scale testing projects. Refer to https://dzone.com/articles/cypress-vs-selenium-webdriver-which-is-better-for
Selenium has been the de facto tool for years. There must be, and is, a reason for it.
“Why is re-running a test with the same inputs and getting a different result excusable to the point where we have runners such as courgette that do this automatically?”
OK, this is a tough question. Agreed, when we have too many non-deterministic tests, there is some problem. But on any complex project, even many bugs found through manual testing fail to be reproduced again. Many times something goes wrong with the system itself. And this is expected. Try working on large-scale systems and you will see these problems every now and then. Many times scripts are at fault too, but once properly designed, the need to retry tests becomes less and less.
This is a really helpful guide for QA professionals to overcome some common challenges while using automation testing tools like QARA Enterprise, Ranorex, Katalon Studio, etc.
I read the blog post “Automated Testing is Not Working” with great interest, and I must say, the author raises some valid points. As a software developer who has dealt with automated testing extensively, I can relate to the challenges mentioned.