AI - what role does it have to play in QA?
Today I want to talk about the biggest current development out there - AI. We’ve all heard the “all your jobs will be lost to AI” scaremongering and maybe we should be a little concerned. But, from my point of view, that concern is better directed at not being left behind in learning to use this wonderful new set of tools than at Skynet stealing your job.
Where can we leverage this emerging technology?
Since ChatGPT blazed onto the scene in late 2022 people have been asking it many, many different sorts of questions and tasking it with absurd challenges such as designing a tiny language for ants and creating a murder mystery plot that takes place entirely in a fridge (I asked it for these examples!).
It didn’t take long for these requests to spread into people’s work lives and for AI to start supporting job workflows. Last year was the first year I actively started to look for places I could use AI in my role as a test engineer, to improve efficiencies and be more responsive to changes and challenges in my workplace.
In today’s blog I am going to talk about a few examples where I have absolutely found AI support to be beneficial to me and discuss other situations where I can see the potential of it to help me.
Context: I currently work in a Microsoft ecosystem so the enterprise AI tool available to me is MS Copilot. ChatGPT is blocked on my corporate network but is my favoured AI tool in my personal time. I have also used Lumo from Proton as it respects privacy better, but I have found its results not as strong as ChatGPT’s.
Further context: as I state on the About page of this blog, I do not agree with the use of generative AI to create content for me. Every word on this page has been written by a human (me). One of my greatest fears with AI is people delegating their thinking to a machine. We have already seen the decline of reading in the population and its associated negative impact upon language skills (I once worked with a younger man who was apparently proud to declare he had never read a book other than those he had to for school. I frequently used vocabulary he clearly didn’t understand. These two facts are not unrelated…) - I worry about the impact of AI upon people’s writing skills.
I actually quite like the name MS use - Copilot. That is the correct image in my opinion - AI as an assistant, helping you out, fetching info, crunching numbers, doing monotonous tasks etc.
Some concrete examples
In no particular order I will talk about some real life scenarios where I have found AI to be of great use.
- Interrogating and parsing logs:
- I had to test an API upgrade a few months back for a mobile banking app. The upgrade made no visible changes to the functionality of the app but it did introduce a few subtle differences in back end behaviour. The challenge I had was that these were buried in amongst huge logs of very detailed but largely irrelevant information.
- I obtained the logs before the API upgrade was deployed and again after - for the same operations in the mobile app. I then explained the context to Copilot and asked it to compare the two log files and present to me its findings together with the rationale for its conclusion.
- I *could* have done similar with a side by side comparison tool and judicious use of CTRL + F but Copilot did it far quicker and presented the results in a nice, easily readable and understandable fashion.
- Interpreting CLI errors:
- I’m an old Linux man and the one saving grace of having to use a Mac for work is that it too has a Command Line Interface (CLI) or terminal. Given the option I will always prefer to work in the terminal rather than a GUI. A while back I was setting up automated regression tests to run locally on my Mac, so I used a Maven terminal command to kick things off. I was getting a lot of build errors initially but I wasn’t sure what the cause was.
- I again explained succinctly to Copilot what I was attempting to do and copied in chunks of the terminal output containing the errors.
- It quickly identified potential issues and suggested possible fixes. We didn’t get everything working immediately but we did after some work.
- Again, I *could* have done the above by Googling error messages and trawling Stack Overflow etc. but using AI helped me reach the solution much more quickly and easily. Or, we could say, more efficiently. Spotting a pattern yet?
- Creating test cases:
- I have mentioned my caution towards using generative AI to actually create content and this one is veering close to that but with a major caveat - I’ve asked AI for suggestions for scenarios and then used my human judgement to choose and *actually* create the test cases.
- In my experience so far, AI creates far more test scenarios than are necessary. That makes it a good tool for generating a starting list, from which I then filter out inappropriate (and potentially impossible) test cases. For example, a recent suggested test case was to test in dark mode - all very well except our app currently doesn’t support dark mode!
- Copilot produces broad bullet points which I translate into actual test scripts, either manual or automated. I could ask it to do all the work but, as I’ve discussed, for me that is a step too far. I would have to check all its working anyway so I wouldn’t save *that* much time.
- As ever, I clearly outlined the context of my request then fed Copilot the acceptance criteria from the original ticket. I asked it to put its test case scenarios into an Excel spreadsheet and I then manually created the test case scripts in qTest.
- Obviously this could be done by a human being - I’m an experienced test engineer, I am a subject matter expert, I know what I need to test. Once again leveraging an AI tool just makes this process a lot more efficient.
- Test data generation:
- This is one I have had mixed results with but there is definitely potential there depending on your system under test. The majority of the data I need to test my banking apps needs to be created by hand using bespoke (and frequently legacy) banking back end apps. I have yet to find a way to utilise AI for this sort of task.
- Non banking specific data though is ideal for tasking AI to create. Mock addresses, fake National Insurance numbers, names, dates etc etc are all bread and butter stuff for Copilot or ChatGPT.
- Yet again, of course this is possible using traditional web tools but using AI is the smarter way to approach such mundane tasks.
- Sounding board:
- Of course this particular use case is not unique to testing. AI is widely used to “bounce ideas off” and I find this pretty useful when thinking about upcoming tasks. With this though we do need to be extremely aware of AI’s desire to please its user. On many occasions I have had AI take me down a dead end, trying to make something work which just won’t because it doesn’t want to say no to me. As long as we bear this in mind, along with its nasty habit of hallucinating, chatting to an AI bot to get inspiration can definitely be a rewarding experience.
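Going back to the log comparison example above, the traditional "side by side" route can be sketched in a few lines of Python, which is also a handy way to sanity check what an AI tool reports. This is only a minimal sketch: the timestamp format, the sample log lines and the function names `normalise` and `changed_lines` are my own assumptions for illustration, not the logs or tooling from the real project.

```python
import difflib
import re

# Assumed log format: each line starts with an ISO-style timestamp, e.g.
# "2024-05-01T10:15:30.123Z INFO ...". Adjust the pattern to suit your own logs.
TIMESTAMP = re.compile(
    r"^\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:[.,]\d+)?Z?\s*"
)


def normalise(line: str) -> str:
    """Strip the leading timestamp so identical events compare as equal."""
    return TIMESTAMP.sub("", line.rstrip("\n"))


def changed_lines(before: list[str], after: list[str]) -> list[str]:
    """Return only the lines that differ between two logs.

    Lines only in the "before" log are prefixed with "- ",
    lines only in the "after" log with "+ ".
    """
    a = [normalise(line) for line in before]
    b = [normalise(line) for line in after]
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    result: list[str] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # unchanged block, nothing to report
        result += [f"- {line}" for line in a[i1:i2]]
        result += [f"+ {line}" for line in b[j1:j2]]
    return result


if __name__ == "__main__":
    # Made-up sample data standing in for the pre- and post-upgrade logs.
    before = [
        "2024-05-01T10:15:30.123Z INFO login ok",
        "2024-05-01T10:15:31.000Z INFO balance api=v1",
        "2024-05-01T10:15:32.000Z INFO logout",
    ]
    after = [
        "2024-06-01T09:00:00.500Z INFO login ok",
        "2024-06-01T09:00:01.000Z INFO balance api=v2",
        "2024-06-01T09:00:02.000Z INFO logout",
    ]
    for line in changed_lines(before, after):
        print(line)
```

With the sample data, only the two `balance` lines are printed, because the timestamps are stripped before comparing. It won't explain *why* something changed the way Copilot did, but it gives a deterministic baseline to check the AI's findings against.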
Possible future uses
I have some more ideas about other situations where AI could be a time / effort saver for me - again in no specific order.
- Exploratory testing support:
- I’ve talked about the value of exploratory testing already in this blog and I don’t think it can be overstated. I’ve been working on the same mobile apps for over three years now so it perhaps isn’t of such value to me in my current role but if for example I changed job and started working on applications I wasn’t familiar with this could really help.
- Idea generation - ask AI for suggested customer journeys, possible edge case scenarios, input field validations to help guide exploratory tests.
- Simulating user behaviour. We can ask an AI tool to simulate likely journeys through an app, given particular user personas (which we could in turn use AI to create).
- Assistance with risk assessment and test prioritisation:
- I asked Lumo to suggest possible ways in which it could support me in my job and this was one of the ideas it returned. It intrigued me somewhat. I’m not sure how much data it would need but Lumo claimed that, if I fed it info such as defect trends and requirements, it could predict the likelihood of a range of test suites failing and the potential business impact.
- Accessibility:
- This is an area which my current organisation (like many others) doesn’t pay enough attention to. However, we are aware of this and are making improvements. AI could support us here too.
- Measuring colour contrast. We could paste images of screens into Copilot and ask it to measure the contrast ratio between text and its background to confirm how accessible the palette choice is.
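For the colour contrast idea above, the ratio itself is a fixed formula, so once you know the two colours you can check an AI tool's answer yourself. The sketch below follows the WCAG 2.x definitions of relative luminance and contrast ratio as I understand them. It assumes you already have the text and background colours as six digit hex values (from a design spec, say); it does not read them from a screenshot, and the function names are just mine.

```python
def _linearise(value: int) -> float:
    """Convert one 0-255 sRGB channel to its linear value (WCAG 2.x formula)."""
    c = value / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_colour: str) -> float:
    """Relative luminance of a colour given as '#RRGGBB' (or 'RRGGBB')."""
    h = hex_colour.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearise(r) + 0.7152 * _linearise(g) + 0.0722 * _linearise(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """Contrast ratio between two colours, from 1 (identical) up to 21."""
    lum_fg = relative_luminance(foreground)
    lum_bg = relative_luminance(background)
    lighter, darker = max(lum_fg, lum_bg), min(lum_fg, lum_bg)
    return (lighter + 0.05) / (darker + 0.05)


if __name__ == "__main__":
    # Black text on a white background is the maximum possible contrast.
    print(round(contrast_ratio("#000000", "#FFFFFF"), 2))
    # Mid grey on white sits very close to the 4.5:1 AA threshold for normal text.
    print(round(contrast_ratio("#767676", "#FFFFFF"), 2))
```

WCAG level AA asks for at least 4.5:1 for normal sized text and 3:1 for large text, so a check like `contrast_ratio(text, background) >= 4.5` could sit alongside whatever figure Copilot gives you.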
Points to remember
- Context:
- One thing which has consistently come up for me when working with an AI tool is providing context. This really helps the bot understand what exactly you want from it. For example:
I need to test an API upgrade for an Android mobile banking app.
I will provide you with the acceptance criteria from the original Jira ticket.
Create a list of potential test cases and put these into a spreadsheet for me to download.
etc.
(This example may be slightly misleading as I have broken the instructions down into somewhat short and simple sentences. In my experience AI is just as capable of understanding longer, more complex language too.)
- Repeatability:
- I have found that instructions such as
Do the same operation again with this new data.
often gave inconsistent results. I’ve had better results writing an initial (set of) prompt(s), supplying the data in question then restarting with the exact same prompt(s) with the new set of data.
- Hallucinations!
- Always be aware that (at this moment in time), AI has a terrible habit of making stuff up! If you are at all sceptical of what it is telling you, trust your instinct and ask it to “show its working”. I’ve been extremely doubtful of AI ever since ChatGPT told me, quite confidently, that Frank Sauzee was a legendary Dundee Utd player! (He never played for the club…)
- One strategy I’ve had some success with is to use more than one tool. Eg, I’ve had Copilot tell me something I didn’t fully trust so I fed its output into ChatGPT and asked it to verify. Of course you can do it the old fashioned way and Google things too!
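The repeatability point above (restart with the exact same prompt, swapping only the data) is easy to keep honest with a tiny template. This is a hedged sketch rather than anything I run at work: the context wording is lifted from the example prompt earlier in this post, while `build_prompt` and the sample acceptance criteria are made up for illustration. It only builds the text to paste into a fresh chat and does not call any AI service.

```python
from textwrap import dedent

# Fixed context, reused word for word on every run.
CONTEXT = dedent("""\
    I need to test an API upgrade for an Android mobile banking app.
    I will provide you with the acceptance criteria from the original Jira ticket.
    Create a list of potential test cases and put these into a spreadsheet for me to download.
""")


def build_prompt(acceptance_criteria: list[str]) -> str:
    """Combine the unchanged context with one specific set of data."""
    criteria = "\n".join(f"- {item.strip()}" for item in acceptance_criteria)
    return f"{CONTEXT}\nAcceptance criteria:\n{criteria}\n"


if __name__ == "__main__":
    # Hypothetical acceptance criteria, purely for demonstration.
    first_run = build_prompt(["User can log in with valid credentials"])
    second_run = build_prompt(["User sees an error for an expired session"])
    print(first_run)
    print(second_run)
```

Each run starts a new conversation with one of these prompts, so the only thing that changes between runs is the data, not my wording.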
Conclusion
AI seems like it’s everywhere these days (yes Microsoft I’m looking at you) and I would question the need for an AI button in the likes of WhatsApp but it would be remiss to dismiss it as a passing fad. It absolutely can bring value to our work as software testing professionals but we should always exercise caution and remember our brains are the most important tools we have.
AI should be an enhancement, another tool in our arsenal but it should not replace our own critical thinking capacity. At this moment in time it is very fallible: it makes things up and can provide misleading or inaccurate analyses.
However, as long as we always remind ourselves of its weaknesses and exercise quality control of its output, it can be a very useful tool. I’ve talked about some real examples from my working life and some areas where I could see it also working. That is 100% the tip of the iceberg though! The AI scene is moving extremely quickly; who knows what uses we will find for it in a year or two?