AI Failure Story: Paying a Heavy Price for Test Strategy Errors

As Artificial Intelligence (AI) gains wide acceptance in the software development world, the question of how to apply software testing practices to this type of software remains open. Based on a true story, Anna Royzman explains that you need to design your test strategy carefully… and not only for AI software.

Author: Anna Royzman, Test Masters Academy, http://testmastersacademy.org/

One of the important tasks of testers in a company is to shorten the software delivery process and save development time. There is never enough time to develop. Time is a very limited resource in any field, but especially in the development of self-learning artificial intelligence (AI).

Naturally, self-learning happens “by itself” only in part. AI is like a very young child that needs to be looked after and guided. AI is taught through carefully designed test sets. And it is the testers who need to oversee the “education” of the AI, since programmers, with their tendency toward positive thinking, are largely unable to cope with this task, especially on very complex projects.

One example of such a complex and ambitious project comes from OpenAI, the research organization co-founded by Elon Musk. Its latest endeavor was to create AI bots that can defeat a team of professional gamers in a 5v5 match in Valve’s famous video game Dota 2, and once again demonstrate the superiority of AI over human intelligence.

Dota 2 is a team game in the MOBA (multiplayer online battle arena) genre in which two teams of 5 players each compete to destroy the opponent’s main building. Each player picks and controls one of 115 characters (heroes) and must coordinate with the other team members to achieve victory together. The game is among the most complex: to succeed in a fast-paced environment, a team must demonstrate strategy, improvise in unfamiliar situations, and account for the influence of allied units.

The OpenAI team, inspired by its bot’s repeated victories in a show match against a single professional player in a 1v1 game on August 11th, 2017, promised to create bots that could outplay professional players as a full team in a 5v5 match. The big contest was scheduled for a year later, at the next Dota 2 world championship, The International 8.

One of the main advantages of bots over humans is speed. Bots react much faster; they are always attentive and focused on what is happening on the field, and they send their actions directly to the server through the game API, while people must move the cursor to the desired area to communicate their intent to the computer. In addition, bots do not experience stress.

The main advantages of humans: they learn fast and can filter out unnecessary information.

OpenAI used its Proximal Policy Optimization (PPO) algorithm, as well as 128,000 preemptible CPU cores and 256 P100 GPUs on GCP, to train its bots. This setup gave the bots game experience equivalent to 1,576,800 hours (180 years) of continuous play for every day of training. That is a fantastic figure considering that professional Dota 2 players have about 15,000 hours of game experience on average. In other words, in a single day the bots play two orders of magnitude more than a professional player does in a lifetime, and over a year the difference grows to tens of thousands of times.
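The comparison above can be checked with a few lines of arithmetic. This is only a sketch: it assumes the 1,576,800-hour figure describes one day of training (which is what makes the "over a year" comparison work), and the variable names are made up for illustration.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years

bot_hours_per_day = 1_576_800  # figure quoted in the article (assumed to be per day of training)
pro_career_hours = 15_000      # average professional player, as quoted in the article

# How many years of nonstop play fit into one day of bot training
years_per_day = bot_hours_per_day / HOURS_PER_YEAR

# How many professional careers the bots play through in one day
daily_ratio = bot_hours_per_day / pro_career_hours

# The same ratio accumulated over a full year of training
yearly_ratio = daily_ratio * 365

print(f"{years_per_day:.0f} years of play per training day")
print(f"about {daily_ratio:.0f}x a pro career per day")
print(f"about {yearly_ratio:,.0f}x a pro career per year")
```

A daily ratio of roughly one hundred is the "two orders of magnitude", and multiplying it by 365 days gives the "tens of thousands of times".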

It seems that OpenAI had a lot of time. But this impression is deceptive. AI learns by trial and error, and Dota 2 is distinguished by an incredibly large variety of choices. The main task of the AI development team was to narrow down these choices for the bots: discard patterns that clearly lead to failure, and pre-set priorities for patterns that clearly look much more efficient than random actions.

On June 25th, 2018, 10 months after the start of “team bot” development, the bot creators staged a 5v5 show match against Dota 2 commentators (former pro players, now more like showmen) in a heavily stripped-down game: the same 5 heroes were pre-selected for each team, and several game mechanics were restricted. The bots won this match confidently, but even then there were doubts that the OpenAI team would be able to prepare the bots in time for an honest match without restrictions on either the choice of heroes or the game mechanics.

On August 5th, 2018, a year after the start of development, the second show match of bots against commentators took place. Again, it was a stripped-down game with a number of restrictions, and each team picked its 5 heroes from a pool of 18 pre-selected characters, not 115. It’s like playing chess with only pawns and the king on the board, i.e. hardly a full game. The bots defeated the commentators once again, but it became clear to everyone that the OpenAI team could not complete the goal of defeating humans in a full-fledged game within one year.
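To get a rough sense of how much the restricted hero pool shrinks the game, one can count the possible 5-hero line-ups for a single team. This is a simplified sketch: it ignores the opposing team's picks, bans, and draft order, so the real numbers differ, but the ratio illustrates the scale of the reduction.

```python
from math import comb

FULL_POOL = 115       # heroes in the full game, as stated in the article
RESTRICTED_POOL = 18  # heroes allowed in the show matches
TEAM_SIZE = 5

# Number of distinct unordered 5-hero line-ups for one team
full_lineups = comb(FULL_POOL, TEAM_SIZE)
restricted_lineups = comb(RESTRICTED_POOL, TEAM_SIZE)

print(f"full pool:       {full_lineups:,} line-ups")
print(f"restricted pool: {restricted_lineups:,} line-ups")
print(f"reduction:       about {full_lineups / restricted_lineups:,.0f}x fewer")
```

Even under this simplification, the restricted pool covers only a tiny fraction of the line-ups the bots would face in a full game.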

And then it was time for OpenAI to show the results of its year-long work in front of the audience at Rogers Arena in Vancouver on August 23rd, 2018, during The International 8, the official Dota 2 world championship for 2018. As expected, on that day it was still necessary to play a cut-down version of the game with a limited choice of 18 characters. The first team to stand up against the bots was paiN Gaming, the Brazilian team that had finished last at that championship just before its match with the AI bots. The bots started well and led for the first 25 minutes of the game, but then the Brazilian team seized the initiative and easily took the victory.

The bots’ second rival was a team of esports stars from China who had recently left the professional Dota 2 scene. The Chinese team seized the initiative early in the game and beat the bots with no difficulty.

Developers from OpenAI admitted that 99% of the bots’ training games lasted less than 30 minutes, and when a game dragged on, the bots did not know what to do. As I see it, an incorrect testing strategy kept the developers from achieving their goals. Since Dota 2 games can range from minutes to hours, well-designed test sets should have included games of various lengths. Development time should be spent wisely, even when a single day of computing power is enough to play millions of gaming hours. AI should be guided. Its progress should be tested in a variety of conditions, and adjustments should then be made again.
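One way a tester might catch this kind of gap early is to check how the games in a test set are distributed across duration ranges. The sketch below is purely illustrative: the bucket boundaries, the 10% threshold, the function names, and the sample data are all assumptions, not anything OpenAI is known to have used.

```python
from collections import Counter

# Hypothetical duration buckets in minutes: (label, lower bound inclusive, upper bound exclusive)
BUCKETS = [("short", 0, 30), ("medium", 30, 60), ("long", 60, float("inf"))]


def bucket_of(minutes):
    """Return the label of the bucket that a game of the given length falls into."""
    for label, low, high in BUCKETS:
        if low <= minutes < high:
            return label
    raise ValueError(f"negative duration: {minutes}")


def underrepresented(durations, min_share=0.10):
    """Return {bucket: share} for every bucket holding less than min_share of the games."""
    if not durations:
        raise ValueError("no games to analyze")
    counts = Counter(bucket_of(m) for m in durations)
    total = len(durations)
    return {
        label: counts[label] / total
        for label, _, _ in BUCKETS
        if counts[label] / total < min_share
    }


# Made-up sample mirroring the 99% figure: 990 games under 30 minutes, 10 longer ones
sample = [20] * 990 + [45] * 8 + [75] * 2
gaps = underrepresented(sample)
print(gaps)  # buckets that need more coverage before the test set can be trusted
```

With the made-up sample, both the "medium" and "long" buckets are flagged, which is the signal to add longer games to the test set before drawing conclusions about the bots' readiness.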

Conclusion

The OpenAI example teaches us that you need to save development time through the careful design of a testing strategy. And it is we, the testers, who bear a lot of responsibility for how the development team spends its time and how the AI learning process takes place.

Effective test strategy and the design of test experiments are essential skills for modern testers. This is the topic of my full-day tutorial at the leading software testing conference ConTEST NYC, on November 14th, 2018 in New York. Join me in learning how to create a winning testing strategy every time, no matter what your project and product are. Competent testing saves the team months and even years and makes the learning process of AI more meaningful and humane.

About the author

Anna Royzman is an international conference speaker and the founder of the Global Quality Leadership Institute, a non-profit organization that aims to become the leading advocate for quality in technology through innovative educational programs. The Test Masters Academy, an initiative envisioned and managed directly by Anna, organizes conferences and peer workshops. Follow Anna on Twitter at @QA_nna and @TestMastersAcad.

2 Comments on AI Failure Story: Paying a Heavy Price for Test Strategy Errors

Samy Ingals September 20, 2018 at 11:38

Unfortunately, with all this Agile crap going on in software development and software testing currently, people believe that it is not worth thinking about and developing a test strategy before delivering code!!!!

Durga October 11, 2018 at 14:40

Agile when done correctly requires a lot of planning. You need a plan to reflect which features will be developed, you need a plan to reflect when unit testing, system testing, integration testing will be performed etc. For example – Integration testing can be done once every sprint or one for a release or every other sprint. You definitely need a strategy. We used to write test cases that ultimately helped developers verify their code and fill any gaps.