As with so many things in the world, the key to cracking Ms. Pac-Man is teamwork and a bit of positive reinforcement. That… and access to funding from Microsoft and 150-plus artificial intelligence agents, as Maluuba can now attest.
Last month, the Canadian deep learning company (a subsidiary of Microsoft as of January) became the first team of AI programmers to beat the 36-year-old classic.
It was a fairly anticlimactic finish: the score hit 999,990 before the counter rolled back over to zero. But it was an impressive victory nonetheless, marking the first time anyone, human or machine, has achieved the feat. Beating the game has been a white whale for the AI community for a while now.
Google’s DeepMind was able to beat nearly 50 Atari games back in 2015, but the complexity of Ms. Pac-Man, with its many boards and moving parts, has made the classic title an especially difficult target. Maluuba describes its approach as “divide and conquer,” taking on the Atari 2600 title by breaking it up into various smaller tasks and assigning each to individual AI agents.
“When we decomposed the game, there were over 150 agents working on different problems,” Maluuba program manager Rahul Mehrotra told TechCrunch. For example, the Maluuba team created an agent for each fruit pellet. For ghosts, the team created four agents. For edible ghosts, four more. All of these agents work in parallel, each sending its reward to a high-level agent, which then decides on the best move to make at that point.
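To make the "divide and conquer" idea concrete, here is a minimal Python sketch of several small agents each scoring the possible moves and a top-level agent picking the move with the best combined score. The article does not say how the high-level agent combines the signals it receives, so the simple sum used here is an assumption, and the agent functions, state fields, and score values are all hypothetical illustrations rather than Maluuba's code.

```python
from typing import Callable, Dict, List

ACTIONS = ["up", "down", "left", "right"]

# A low-level agent maps a game state to one score per action.
Agent = Callable[[dict], Dict[str, float]]


def pellet_agent(state: dict) -> Dict[str, float]:
    # Hypothetical agent: mildly prefers moving toward a pellet.
    return {a: (1.0 if a == state["pellet_dir"] else 0.0) for a in ACTIONS}


def ghost_agent(state: dict) -> Dict[str, float]:
    # Hypothetical agent: strongly penalizes moving toward a ghost.
    return {a: (-5.0 if a == state["ghost_dir"] else 0.0) for a in ACTIONS}


def choose_action(agents: List[Agent], state: dict) -> str:
    # Assumed aggregation rule: sum every agent's score per action,
    # then take the action with the highest total.
    totals = {a: 0.0 for a in ACTIONS}
    for agent in agents:
        for action, score in agent(state).items():
            totals[action] += score
    return max(ACTIONS, key=lambda a: totals[a])


agents = [pellet_agent, ghost_agent]
safe_move = choose_action(agents, {"pellet_dir": "right", "ghost_dir": "left"})
risky_move = choose_action(agents, {"pellet_dir": "left", "ghost_dir": "left"})
```

In the first state the pellet lies away from the ghost, so the combined score favors heading right. In the second, the pellet and ghost are in the same direction, and the ghost agent's larger penalty steers the top-level choice away from it.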
Mehrotra likens the process to running a company. Larger goals are achieved by breaking employees up into individual teams. Each team has its own specific goals, but all are working toward the same overall achievement.
“This idea of breaking things down into smaller problems is the basis of how humans solve problems,” explains CTO Kaheer Suleman. “A company doing product development is a good example. The goal of the whole organization is to develop a product, but individually, there are groups that have their own reward and goal for the process.”
The system also uses reinforcement learning, where each action is associated with either a positive or negative response. The agents then learn through trial and error. In all, the system was trained using more than 800 million frames of the game, according to a paper.
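The trial-and-error loop described above can be illustrated with tabular Q-learning, a standard textbook reinforcement learning method. This is a sketch under stated assumptions, not the algorithm from Maluuba's paper: the five-cell corridor, the reward values, and the learning parameters are all invented for illustration.

```python
import random


def train(episodes: int = 500, seed: int = 0) -> list:
    """Learn action values on a toy corridor with positions 0..4.

    Assumed setup: reaching position 4 pays +1 (a pellet),
    reaching position 0 pays -1 (a ghost). Both ends finish the episode.
    """
    rng = random.Random(seed)
    n_states = 5
    moves = [-1, +1]  # action 0 = step left, action 1 = step right
    q = [[0.0, 0.0] for _ in range(n_states)]
    alpha, gamma, epsilon = 0.5, 0.9, 0.2  # illustrative values

    for _ in range(episodes):
        s = 2  # every episode starts in the middle
        while 0 < s < n_states - 1:
            # Trial and error: sometimes explore at random,
            # otherwise take the action that currently looks best.
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            s2 = s + moves[a]
            if s2 == n_states - 1:
                r = 1.0
            elif s2 == 0:
                r = -1.0
            else:
                r = 0.0
            terminal = s2 in (0, n_states - 1)
            target = r if terminal else r + gamma * max(q[s2])
            # Nudge the estimate toward the observed outcome.
            q[s][a] += alpha * (target - q[s][a])
            s = s2
    return q


q_values = train()
```

After training, the learned values for stepping right exceed those for stepping left in the middle cells, meaning the agent has learned from positive and negative feedback alone to head for the pellet and away from the ghost.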
Mehrotra suggests the possibility of using a similar system in retail, with an AI helping human sales reps determine which customers to assist first in order to maximize their own revenue. Actually translating all of this into a useful real-world experience will prove another challenge in and of itself.