
AI is better than 99.8 percent of the players in the computer game StarCraft II

An artificial intelligence (AI) developed by DeepMind, called AlphaStar, performs better than 99.8 percent of active human players in the computer game StarCraft II. The authors note that the methods used could, in principle, also be applied to other complex domains. This emerges from a study published in the journal "Nature" (see primary source).

The authors emphasize that StarCraft II poses a major challenge for artificial intelligence research because of its complexity, the need to process incomplete information, its competitive nature, and the game's iconic status. According to the authors, previous computer programs were nowhere near capable of beating professional StarCraft players. AI programs had previously beaten professional players in other popular competitive computer games such as Dota 2, but only in restricted versions of those games.

Back in January, DeepMind had published a version of AlphaStar that performed well against professional players in the strategy game. However, this version was criticized for "inhuman" advantages: for example, the program could see the entire map at once, whereas humans can only see part of it on the screen. It was also able to perform far more actions per minute (APM) than humans can, resulting in almost perfect control of individual units.

AlphaStar was therefore restricted so that, in newer versions, the AI plays under restrictions comparable to those of human players. From July onwards, versions of AlphaStar competed anonymously against human players in public ranked games on the European server under standard conditions. There, the AI reached the highest ranking tier (Grandmaster) with all three factions playable in StarCraft II and ultimately achieved a higher rank than 99.8 percent of the players active in the previous month. Since there are still large differences in playing strength, especially within the highest percentile of players, the fact that the AI scores better than 99.8 percent of players does not mean that it can consistently beat even the best professional players.

This version of AlphaStar was trained with reinforcement learning; a version trained with supervised learning alone only performed better than 84 percent of active human players. (See the SMC fact sheet on machine learning methods.)


Overview

  • Prof. Dr. Marcus Liwicki, Chair of the Machine Learning Group, Luleå University of Technology

  • Prof. Jan Peters, Ph.D., Professor of Intelligent Autonomous Systems, Technical University of Darmstadt

  • Prof. Dr. Kristian Kersting, Head of the Machine Learning Department, Computer Science Department and Center for Cognitive Science, Technische Universität Darmstadt

Statements

Prof. Dr. Marcus Liwicki

Chair of the Machine Learning Group, Luleå University of Technology

“In the Nature article 'Grandmaster level in StarCraft II using multi-agent reinforcement learning', authors from DeepMind and from the Netherlands present the results of an experiment in which a reinforcement-learning-based AI achieved very good results in the online real-time strategy game StarCraft II: over 30 to 60 rated games, a rating of about 6,000, which corresponds to a top player (top 1.5 percent of rated players).”

“The experiment is generally very interesting and shows that machine learning can learn meaningful moves. It should be noted, however, that this is not the first AI to win against humans. The interesting thing here is that the AI 'learned' to play well. However, it is doubtful whether the given limits are really 'fair'. The click rate is around 270 per minute. This corresponds to a value that professional gamers also achieve, but most of their clicks are empty clicks made in order to stay active. It is hard to imagine that people would continuously perform at least four meaningful actions per second over the entire length of several games.”
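
To make Liwicki's arithmetic concrete: 270 actions per minute averages 270 / 60 = 4.5 actions per second. The following minimal Python sketch (an editorial illustration with made-up parameters, not code from the study, whose interface restrictions are more fine-grained) shows how such a cap could be enforced with a sliding one-minute window:

```python
from collections import deque

class APMLimiter:
    """Caps an agent's actions per minute with a sliding one-minute window.

    Hypothetical sketch: the study's actual interface restrictions are
    more involved than a single per-minute cap.
    """

    def __init__(self, max_apm: int = 270, window_s: float = 60.0):
        self.max_apm = max_apm
        self.window_s = window_s
        self.timestamps = deque()

    def try_act(self, now_s: float) -> bool:
        # Forget actions that have fallen out of the sliding window.
        while self.timestamps and now_s - self.timestamps[0] > self.window_s:
            self.timestamps.popleft()
        if len(self.timestamps) < self.max_apm:
            self.timestamps.append(now_s)
            return True
        return False  # over the cap: the agent must no-op this frame

# 270 APM over a full minute averages 270 / 60 = 4.5 actions per second.
limiter = APMLimiter(max_apm=270)
allowed = sum(limiter.try_act(frame / 22.4) for frame in range(22 * 60))
print(allowed)  # at most 270 actions in roughly one minute of game time
```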

“As with previous articles, we have to keep in mind that this is still an incremental step - we are still in a controlled, simulated environment that can be repeated a thousand times. It will be a long time before AI reacts intelligently in real environments with unpredictable twists.”

“As I wrote last time (for the Research in Context piece 'Poker AI Pluribus beats human pros in Texas Hold'em with six players'; editor's note): in general, the term 'superhuman' is currently being overused, especially in articles from companies or marketing-oriented research institutes that also want to use such articles for branding. In scientific practice, readers should take care not to be dazzled by such terms, and authors should avoid them. Many machines around us are superhuman: the calculator calculates better, the car drives faster, the airplane can fly ... and in some games the AI is better. It becomes interesting when the AI actually learns to make better rational decisions more quickly in previously unknown and unrestricted situations.”

When asked to what extent it is surprising that the authors achieve much better results with reinforcement learning than with supervised learning:
“That's not surprising. Basically, reinforcement learning is also a kind of supervised learning. In my first AI courses (early 2000s) it was still treated as supervised learning with very little supervision (only at the end of the game). When the AI learns supervised (from humans), it mimics human players; with reinforcement learning it can break away from this and, if necessary, develop new strategies. Note (section 'Infrastructure', page 9 of the study; editor's note) that several learning agents develop strategies in parallel (16,000 games each) and in the end only the best strategies survive. Human players could not possibly learn from that many games.”
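
The 'survival of the best strategies' that Liwicki points to can be illustrated with a toy population loop. The sketch below is an editorial illustration under strong simplifications (each 'policy' is a single number rather than a neural network, and the study's league additionally uses specialized exploiter agents), not the study's algorithm:

```python
import random

POP, ROUNDS, GAMES = 8, 20, 50

# Each entry stands in for a whole policy; here it is just a "strength".
population = [random.random() for _ in range(POP)]

def play(a: float, b: float) -> bool:
    """True if agent a beats agent b (noisy comparison of strengths)."""
    return a + random.gauss(0, 0.3) > b + random.gauss(0, 0.3)

for _ in range(ROUNDS):
    wins = [0] * POP
    for i in range(POP):            # every agent plays every other agent
        for j in range(POP):
            if i != j:
                for _ in range(GAMES):
                    if play(population[i], population[j]):
                        wins[i] += 1
    order = sorted(range(POP), key=lambda k: wins[k], reverse=True)
    # The bottom half is replaced by perturbed copies of the top half,
    # so only the best strategies "survive" into the next round.
    for loser, winner in zip(order[POP // 2:], order[: POP // 2]):
        population[loser] = population[winner] + random.gauss(0, 0.05)

print(max(population))  # strength of the best surviving strategy
```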

When asked whether models learned with reinforcement learning can be transferred to other problems with comparatively little computing effort:
“No. The main purpose of these games is to gain knowledge. Transferring learned models to completely different problems is rather difficult, since the inputs and outputs are very different (in image or speech processing, by contrast, the input remains an image or a spoken/written text). On the other hand, one can imagine that once the knowledge and the best architectures have been learned, the general ideas can be transferred to other problems with much less computational effort. Trained models can also be implemented very energy-efficiently in hardware.”

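As a rough illustration of the kind of transfer Liwicki considers plausible, reusing learned knowledge and architectures rather than whole models, here is a minimal PyTorch sketch (an editorial illustration; all shapes, names and the file path are made up): a trained network trunk is frozen and only a small task-specific head is retrained, which takes far less compute than training from scratch.

```python
import torch
import torch.nn as nn

# Hypothetical trunk; assume it was trained on the source task, e.g. via
# trunk.load_state_dict(torch.load("source_task_trunk.pt")).
trunk = nn.Sequential(
    nn.Linear(128, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
)

for p in trunk.parameters():       # freeze the transferred representation
    p.requires_grad = False

head = nn.Linear(256, 10)          # new head for a 10-action target task
opt = torch.optim.Adam(head.parameters(), lr=1e-3)

obs = torch.randn(32, 128)               # a batch of target-task inputs
target = torch.randint(0, 10, (32,))     # dummy labels for illustration
opt.zero_grad()
loss = nn.functional.cross_entropy(head(trunk(obs)), target)
loss.backward()                          # gradients flow only into the head
opt.step()
```
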
Prof. Jan Peters, Ph.D.

Professor of Intelligent Autonomous Systems, Technical University of Darmstadt

“Beating top players in the computer game StarCraft II is a very impressive achievement. This game has a very high-dimensional action space - much higher than the Atari games, possibly even higher than the at most 19 × 19 actions of Go. Unfortunately, the manually developed simplifications have not (yet?) been published, so the performance relative to Go cannot yet be assessed.”

“But the biggest challenge in the computer game StarCraft II is the partial observability of the problem. Such so-called POMDPs (partially observable Markov decision processes: decision processes in which not all information is available to the actors; editor's note) are among the most difficult problems facing AI. You have to actively collect information about your opponent and the map, which in turn comes at a cost. This problem has not been addressed. The AI won most of its games with superhuman responsiveness and control - the human sensorimotor system is neither as fast nor as accurate as a computer. The system, however, showed no signs of 'intelligence'. The overall strategy of the AI seems to be planned in advance, with little adaptation to the opponent. A top player noticed in the last game he played against the AI that the AI was not 'scouting', that is, not sending units to gather information about the opponent. The top player took advantage of this by building up his army in the AI's blind spot, and the AI lost in foolish ways, making mistakes that even intermediate players would not make.”
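
The core difficulty Peters describes, acting under partial observability, can be illustrated with a toy belief update: the opponent's strategy is a hidden state, and noisy scouting observations are folded into a probability distribution via Bayes' rule. The states, observations and numbers below are editorial inventions for illustration, not from the study:

```python
STATES = ["rush", "economy", "air"]              # hidden opponent strategies
belief = {s: 1.0 / len(STATES) for s in STATES}  # uniform prior

# P(observation | hidden state): what a scout is likely to see.
OBS_MODEL = {
    "many_barracks": {"rush": 0.7, "economy": 0.2, "air": 0.1},
    "many_workers":  {"rush": 0.1, "economy": 0.7, "air": 0.2},
    "starport":      {"rush": 0.1, "economy": 0.2, "air": 0.7},
}

def update(belief: dict, obs: str) -> dict:
    """Bayes' rule: posterior proportional to prior times likelihood."""
    posterior = {s: belief[s] * OBS_MODEL[obs][s] for s in belief}
    z = sum(posterior.values())                  # normalization constant
    return {s: p / z for s, p in posterior.items()}

belief = update(belief, "many_barracks")
print(belief)  # probability mass shifts toward "rush"
```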

“Unfortunately, far too little has been published about the methodology itself. The only things that can be said for sure are that LSTM networks (long short-term memory: a technique from the field of AI that uses neural networks to create a kind of longer-lasting short-term memory, which can increase the efficiency of neural networks; editor's note) were used to enable the internal state representation required for POMDPs, and that a mix of existing reinforcement learning algorithms was used. In my opinion, it is therefore far too early to say that StarCraft is solved. More can only be said once DeepMind has published its AI and the community has been allowed to analyze it.”
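
To illustrate the role Peters attributes to the LSTM networks, here is a minimal recurrent policy sketch in PyTorch (an editorial illustration; the sizes and the surrounding architecture are assumptions, not the study's): the LSTM carries a hidden state across time steps, so decisions can depend on everything observed so far rather than only on the current frame.

```python
import torch
import torch.nn as nn

class RecurrentPolicy(nn.Module):
    def __init__(self, obs_dim=64, hidden=128, n_actions=10):
        super().__init__()
        self.core = nn.LSTM(obs_dim, hidden, batch_first=True)
        self.logits = nn.Linear(hidden, n_actions)

    def forward(self, obs_seq, state=None):
        out, state = self.core(obs_seq, state)  # state = (h, c), the memory
        return self.logits(out), state

policy = RecurrentPolicy()
state = None
for t in range(5):                          # one partial observation per step
    obs = torch.randn(1, 1, 64)             # (batch, time, features)
    action_logits, state = policy(obs, state)  # memory persists across steps
```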

Prof. Dr. Kristian Kersting

Head of the Machine Learning Department, Computer Science Department and Center for Cognitive Science, Technische Universität Darmstadt

“It's fascinating that an AI system can play such a complex real-time strategy game as StarCraft II at such a high level. In a real-time strategy game, all players carry out their actions interactively, simultaneously and in real time. At every moment, you have to evaluate the current situation and alternative courses of action and decide on the right one. That is not easy for a machine - for an AI system. Real-time strategy games therefore were, and remain, a popular test bed in AI research. AlphaStar follows this tradition but takes it to a new level. Like AlphaGo, AlphaZero, CrazyAra and many other systems, AlphaStar shows that hybrid AI systems - systems that combine various AI techniques such as symbolic search, reinforcement learning and deep learning in a single system - can learn isolated skills very well. The AlphaStar study is methodologically very well designed and carried out. It shows that hybrid AI systems can master complex real-time strategy games. Maybe not all of them, but at least StarCraft II.”

“The dream that humans will build machines that show intelligent behavior in one way or another is not new and defines the goal of AI research. The question, however, is always the same: what is the benchmark for intelligence, whether human or machine? One of the answers: games like chess and StarCraft II. Initially, 'simpler' games were considered and solved. Now, with AlphaStar, we see that machines can also teach themselves to play a complex real-time strategy game at a high level. This is a milestone in AI research. The use of reinforcement learning is not surprising; it is a common approach. What is exciting about AlphaStar is that it shows the potential of hybrid AI systems. However, much remains to be done. Future AI systems will be able to adapt to new situations. They will learn, think, see and plan, and use natural language. They will understand us and adapt to us and our problems. They will become partners to humans. This is the 'third wave' of artificial intelligence - after the first wave of AI, the programming of all eventualities (around 1980), and the current wave, machine learning (around 2010). AlphaStar encourages this development. It will fuel the creativity of AI researchers and developers. AlphaStar still lacks an understanding that can be communicated to us humans. Here, humans are clearly superior to machines. It has also been shown time and again that bringing AI systems out of the laboratory into the real world is more difficult than one might think. But yes, AlphaStar is an important step, the first step of a marathon that lies ahead of us.”

When asked to what extent it is surprising that the authors achieve much better results with reinforcement learning than with supervised learning:
“It's not surprising. Reinforcement learning is a classic topic in AI research. If we were to use supervised learning for StarCraft II, we would need a teacher who rated each of our StarCraft II actions at every moment: this action was good, that one was bad. That is not how we would learn to play StarCraft II at home. Instead, we play the game without a teacher. Whether we have won or lost in the end is a reward for us, which can be positive or negative. However, we do not know which of the many actions led to success and which did not. Reinforcement learning solves this dilemma. Over time, it learns which actions lead to success and when - that is, how to maximize the overall reward. It has also been used for other games such as backgammon and chess. AlphaStar now shows that StarCraft II, too, can be learned at a very high level this way.”
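
Kersting's point, that a single win/loss signal at the end of a game must be distributed over thousands of earlier decisions, can be made concrete with the discounted-return bookkeeping used in policy-gradient methods. This is an editorial illustration of the general technique, not AlphaStar's actual update:

```python
GAMMA = 0.99  # discount factor (illustrative choice)

def returns_from_terminal_reward(episode_len: int, final_reward: float):
    """Discounted return for each step when the only reward is at the end."""
    return [final_reward * GAMMA ** (episode_len - 1 - t)
            for t in range(episode_len)]

# A won 100-step game: earlier actions receive slightly discounted credit.
rets = returns_from_terminal_reward(100, final_reward=+1.0)
print(rets[0], rets[-1])  # ~0.37 for the first action, 1.0 for the last

# A policy-gradient update would then weight each action's log-probability
# by its return: loss = -sum(log_prob[t] * rets[t] for t in range(100)).
```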

When asked about the high energy costs involved in training such models and whether models learned with reinforcement learning can be transferred to other problems with comparatively little computational effort:
“Yes, unfortunately, training some, though not all, deep models is energy-intensive. Many research groups are therefore working to make deep learning less energy-hungry. Different hardware, new models and new learning methods: all of this is being investigated. If we have a model learned with reinforcement learning, we can indeed try to transfer it to other problems with comparatively little computational effort. This is called transfer learning and is known in reinforcement learning as well. However, it is not quite as easy as in image recognition. Figuratively speaking, on the one hand we have to explore the world of StarCraft II (situations, places and actions); on the other hand we have to exploit experiences already gained in past situations. It is the same for us humans: to know that pasta really is our favorite dish, we would have to try every dish in the world. But we cannot do that. Approaches based only on exploration or only on exploitation often lead to suboptimal solutions. A transfer is therefore somewhat more difficult, but possible. And, in case of doubt, also more energy-efficient.”
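
The exploration/exploitation trade-off in Kersting's pasta analogy corresponds to a classic bandit problem. The following epsilon-greedy sketch (an editorial illustration with invented payoffs) mostly 'orders' the dish currently believed to be best, but occasionally tries another one:

```python
import random

DISHES = ["pasta", "curry", "stew"]
TRUE_TASTE = {"pasta": 0.9, "curry": 0.6, "stew": 0.4}  # unknown to the agent

estimates = {d: 0.0 for d in DISHES}   # agent's running taste estimates
counts = {d: 0 for d in DISHES}
EPSILON = 0.1                          # fraction of meals spent exploring

for _ in range(1000):
    if random.random() < EPSILON:                    # explore: try anything
        dish = random.choice(DISHES)
    else:                                            # exploit: current favorite
        dish = max(estimates, key=estimates.get)
    reward = random.gauss(TRUE_TASTE[dish], 0.1)     # noisy enjoyment
    counts[dish] += 1
    estimates[dish] += (reward - estimates[dish]) / counts[dish]  # running mean

print(max(estimates, key=estimates.get))  # almost always "pasta"
```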

Information on possible conflicts of interest

All: None specified.

Primary source

Vinyals O et al. (2019): Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature. DOI: 10.1038/s41586-019-1724-z.