Uncertainty and Knowledge in AI: Looking back at AI Challenge 2011

I’m almost half a decade late, but I recently read xathis’ postmortem for his entry into the Google-sponsored AI Challenge 2011, where contestants pitted their AI ants against other AI ants in a bid to dominate a small, grid-based map with static obstacles (water tiles). He had a very clear approach to the whole thing, and I enjoyed reading the postmortem, but as I watched the replay, I noticed he underutilized his ants, even though he eventually won hands down. In some early skirmishes especially, he loses ants or gets stuck in stalemates because of poor tactics (or rather, a failure to prioritize engagements in his strategy): some ants wait for food spawns around the hill instead of pushing outward or securing a frontline.

Here’s an example of what I mean:

Red spends close to 200 turns holding a line when it could have pushed with backups.

White sacrifices ants over and over when it could have gained an upper hand by joining forces.

Anyway, let’s analyze this situation a little.

For an exercise like this, terrain knowledge, or map awareness, is extremely important. Typically, for strategy games, this means that as a player you need to do some fog-busting: a term you might be familiar with from the Civilization series that describes keeping a unit at a strategic location so you can remove as much of the fog of war as possible. Fog-busting mitigates the risks of being surrounded by the unknown. In this particular example, unknown terrain means the risk of ambush and lost opportunities in the form of missed food (which means fewer ants spawned at your hill) and unrazed hills (which means enemy ants continue to spawn). xathis attempts to solve this problem (and another one I’ll mention later) by making sure that as much of the map as possible is visible to him every turn, using a clever algorithm:

Every tile on the map has an exploreValue that is increased by one at the beginning of each turn, unless it is reachable within 10 steps by one of my ants (determined via bfs), in that case the exploreValue is resetted to zero. So the exploreValue is an indication of how long the tile has been out of my reach.

The ant then moves to a tile that has the highest exploreValue (>10) reachable in one or two moves. Now, this behavior is only invoked if the ant has nothing else to do that turn (like collecting food, defending, or attacking), so it has low enough priority that it doesn’t interfere with any immediate fighting behavior, from what I can tell. Still, many ants end up chasing tiles with high exploreValues as a result. The emergent behavior is very similar to the aforementioned fog-busting a human player might do, with ants ending up spread throughout the player’s “territory.” This creates a blanket of ping-ponging ants across the whole map. As xathis says:

Often single ants keep moving back and fourth for a long time. [An ant in a dead-end cave] has to move one step into the cave because it has to explore a tile … but every other second turn the ant has nothing to explore thus … [it moves] one step in direction of the cave exit only to discard this mission the following turn moving back in again.
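To make that concrete, here’s a minimal sketch of how I imagine the exploreValue bookkeeping works. The exploreValue name, the 10-step cutoff, and the BFS come straight from the quote above; the function shape and its arguments are my own guesses, not xathis’ actual code:

```python
from collections import deque

def update_explore_values(explore_value, my_ants, passable, neighbors):
    """Age every tile's exploreValue by one, resetting tiles that any
    of my ants can reach within 10 steps (multi-source BFS)."""
    frontier = deque((ant, 0) for ant in my_ants)
    reachable = set(my_ants)
    while frontier:
        tile, dist = frontier.popleft()
        if dist == 10:
            continue  # don't expand past the 10-step cutoff
        for nxt in neighbors(tile):
            if nxt not in reachable and passable(nxt):
                reachable.add(nxt)
                frontier.append((nxt, dist + 1))
    for tile in explore_value:
        explore_value[tile] = 0 if tile in reachable else explore_value[tile] + 1
```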

This approach gives the player maximal, but not optimal, vision of the map. The reason I distinguish between the two is simple: if your only goal is to prevent ambushes, there is no use in keeping track of “dead tiles” such as dead ends or tunnels. The rules of the game are clear and constant; thus, there is a guaranteed limit to the enemy’s movements, i.e., no enemy unit can move into a tunnel (or out of it) without going through the mouth of the tunnel. It is therefore a waste of an ant to keep the tunnel clear of fog. This might not seem like a big deal initially, but imagine a very long tunnel that has been cleared by a group of 6 ants. With xathis’ naive algorithm, the army is suddenly down 6 ants: they get stuck in a loop of stepping toward the mouth of the cave and back toward the action, only to move back in a turn later and fade into a marginally useful, greedy fog-busting mission.
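One cheap way to avoid losing ants to such loops (my own sketch, nothing from xathis’ bot) is to mark the map’s dead tiles up front and exclude them from exploration targets. A simple pruning pass catches one-tile-wide dead ends; wider pockets would need something smarter, and this ignores the map’s wrap-around edges for brevity:

```python
def find_dead_tiles(land):
    """Repeatedly prune land tiles with at most one open neighbor.
    Whatever survives is terrain an enemy could actually move through;
    everything pruned is a dead-end pocket whose mouth is the only
    tile worth watching. `land` is a set of (row, col) passable tiles."""
    alive = set(land)
    changed = True
    while changed:
        changed = False
        for row, col in list(alive):
            open_neighbors = sum(
                (r, c) in alive
                for r, c in ((row - 1, col), (row + 1, col),
                             (row, col - 1), (row, col + 1))
            )
            if open_neighbors <= 1:
                alive.discard((row, col))  # dead-end tip: prune and repeat
                changed = True
    return land - alive  # tiles not worth fog-busting
```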

Which brings me to the second problem xathis solved by blanketing the map with ants. I think he wanted to keep this fog-busting behavior as is because the Challenge required him to collect food to spawn units. According to his postmortem, seeing as much of the map as possible was apparently the best strategy and allowed for faster army growth. He’s got a point: one thing maximal vision of the map does is trivialize food collection. If you keep the entirety of your territory free from fog, any food that appears within it will be seen, gathered, and turned into another ant as quickly as possible… And quick is good.

So why do I think this fog-busting strategy is sub-optimal? It’s sub-optimal because it doesn’t make use of our knowledge of the game’s rules. The game specifications have a few notes on food spawning:

  • Each map is supposed to have some hidden rate for spawning food.
  • Food spawning rotates through sets of symmetric tiles, such that a set isn’t visited again until food has spawned on every other set.
  • Some specific tiles in mirror-symmetric maps are hardcoded to never spawn food.

The Challenge page even has a nice hint: “The best way to gather the most food is to explore the map and cover the most area.” But I think this hint is unintentionally misleading. From what I can tell, depending on the rate of food generation, blanketing the map with ants might not be optimal. At very high rates, you’d indeed want to keep an eye on every tile inside your territory so you can pick up any food before the next spawn wave. However, if food spawns slowly (and from observation, in some games/maps it only spawns a few times in hundreds of turns inside a 10×10 area), an ant has time to patrol a larger area for food, at the cost of picking each item up less promptly. I made a quick animation to illustrate my point:

Depending on the food spawn rate, it might be suboptimal to keep ants on fog-busting missions.

As you can see, patrolling frees up more ants than blanketing for other tasks, such as fighting, but comes with the tradeoff of delayed ant spawns. So how important is this tradeoff? Well, let’s look at the source code for the Challenge.

Here’s the function that handles what happens at the end of a turn, finish_turn(). What follows is my paraphrase of its food-related lines rather than the verbatim source; only the food_extra line is quoted exactly from ants.py:
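```python
from fractions import Fraction

def finish_turn(self):
    # ... combat resolution, ant spawning, etc. omitted ...
    # accumulate this turn's (possibly fractional) food allowance;
    # this line is the one quoted verbatim from ants.py
    self.food_extra += Fraction(self.food_rate * self.num_players,
                                self.food_turn)
    food_now = int(self.food_extra)     # whole units we can hand out now
    left_over = self.do_food(food_now)  # queue food; returns the unused rest
    self.food_extra -= food_now - left_over
```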

The default function that handles food spawns is do_food_symmetric(self, amount), called from line 1467 of ants.py through do_food (which selects the right do_food_* function based on the game settings). Paraphrased, it looks something like this:
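```python
import random
from collections import deque

def do_food_symmetric(self, amount):
    # Reconstruction from reading ants.py, not the verbatim source.
    # pending_food is assumed to be a location -> queued-count mapping.
    if getattr(self, 'food_sets', None) is None:
        # lazily build the symmetric spawn sets, with a None sentinel
        # marking the end of a full rotation
        self.food_sets = deque(self.get_symmetric_food_sets())
        self.food_sets.append(None)

    while amount > 0:
        food_set = self.food_sets[0]
        if food_set is None:
            # every set has been used once: reshuffle and start over
            self.food_sets.popleft()
            remaining = list(self.food_sets)
            random.shuffle(remaining)
            self.food_sets = deque(remaining)
            self.food_sets.append(None)
            continue
        if len(food_set) > amount:
            break  # not enough food owed to cover a whole symmetric set
        self.food_sets.rotate(-1)  # move the used set to the back
        for loc in food_set:
            self.pending_food[loc] += 1  # queue one food item per location
        amount -= len(food_set)

    self.place_food()  # try to place everything queued so far
    return amount      # leftover, folded back into food_extra
```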

The function pops a set of food spawn locations from the front of a deque and then places it at the back when done. When it hits a null element instead of an actual set of tiles, it shuffles the spawn locations and pushes the null element back in at the end. If there are no spawn locations to pull from, it calls a function to generate them. Then, as long as the food budget accumulated in finish_turn() via self.food_extra += Fraction(self.food_rate * self.num_players, self.food_turn) covers a whole set, it adds pending_food at each location and decreases amount by the number of locations in the set. Eventually, either amount food is queued and then immediately placed through place_food(), or no food is placed at all until the game passes in a large enough amount. That’s how food_rate regulates food generation.

For reference, here’s place_food, which goes through all the pending food items and places each one if its tile is free (again, paraphrased rather than verbatim):
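```python
def place_food(self):
    # pending_food maps location -> queued count (my reading of the code);
    # LAND is the ants.py constant for an empty passable tile
    for loc in list(self.pending_food):
        row, col = loc
        if self.map[row][col] == LAND:  # tile is free right now
            self.add_food(loc)          # actually put the food down
            self.pending_food[loc] -= 1
            if self.pending_food[loc] == 0:
                del self.pending_food[loc]
        # blocked locations keep their queued food and are retried next turn
```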

The generation method for the food_sets used in the default case is get_symmetric_food_sets(), as found on line 1272 of ants.py (and called on line 1117, in do_food_symmetric). The real function is fairly involved, so here’s a heavily simplified sketch of the idea; the offsets argument – one translation per player – is my own simplification, since the real code derives the map’s symmetry itself and applies more pruning rules than shown:
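```python
def get_symmetric_food_sets(self, offsets):
    # generate sets of symmetric food spawn locations
    # `offsets` is assumed to hold one (drow, dcol) translation per
    # player, including (0, 0) for the reference tile
    assigned = set()
    food_sets = []
    for row in range(self.height):
        for col in range(self.width):
            if (row, col) in assigned:
                continue
            # one location per player, all related by the map's symmetry
            food_set = tuple(((row + dr) % self.height,
                              (col + dc) % self.width)
                             for dr, dc in offsets)
            assigned.update(food_set)
            if all(self.map[r][c] == LAND for r, c in food_set):
                food_sets.append(food_set)
    return food_sets
```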

As the comment at the start of the snippet states, this function generates sets of symmetric food spawn locations. These sets cannot contain locations that touch; that is, the game will not spawn food at locations that are a distance of 1 apart. They cannot have locations on hills or water, cannot be equidistant from ants, and so on. As far as I can tell, the sets cover all land that isn’t a hill, so food will spawn on the same set of tiles twice only after food has already spawned on every other set. This means that on a large map, you probably won’t see food spawn at the same location twice, given that the game ends at 1000 turns. You could use this information to min-max your movement, but that’s a bit overkill in my opinion, especially given the cost of storing and processing it.

The main question I asked above was whether the trade-off between blanketing and patrolling is worth it. Well, there are two things about the code worth mentioning. First, as you might have seen in the code above, any food that can’t be placed due to a blocked tile (i.e., one not marked as LAND) gets queued in pending_food and is spawned the next time the location is free. Second, a similar system exists for ant spawning: ants are queued to spawn when food is collected and only spawn once the hill is free. Effectively, this means the only penalty for not collecting food is a delayed spawn. There is no loss of food. There is no loss of ants. The only thing lost is the opportunity to attack and defend with a greater force while you let the food sit. So as long as you can keep up with the number of enemies approaching the frontlines, there is no penalty to patrolling instead of blanketing. And given that holding a frontline is easier than forming one, having spare ants to form frontlines early in the game would greatly improve a bot’s defensive position, and possibly allow it to rush an attack.

Again, all of this depends on the map’s food spawn rate, which is also affected by the number of players still in the game (vide finish_turn()). Accordingly, it might be nice to keep track of a general spawn rate within a given (somewhat sizable) box and keep just enough ants patrolling your territory to – on average – keep up with the rate at which food is generated, while sending whatever ants you have left to fight and expand your territory. This probably means you’d have more ants patrolling toward the start of the match than at the end, as the number of players still in the game is sure to decrease, effectively slowing food spawns. You’d slowly shift from blanketing to patrolling as the game goes on, which might be better in the long run than blanketing alone (and might let you win matches that would otherwise end with multiple survivors).
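As a back-of-envelope sketch of that bookkeeping (every name and the per-ant pickup model below are hypothetical, not taken from any real bot):

```python
import math

class SpawnRateTracker:
    """Tracks how much food has appeared per turn inside a watched box."""

    def __init__(self):
        self.turns = 0
        self.spawns_seen = 0

    def observe(self, new_food_this_turn):
        self.turns += 1
        self.spawns_seen += new_food_this_turn

    @property
    def spawns_per_turn(self):
        return self.spawns_seen / self.turns if self.turns else 0.0

def patrollers_needed(tracker, territory_area, box_area,
                      pickups_per_ant_per_turn):
    """Ants needed so patrols, on average, sweep up food as fast as it
    appears: scale the observed box rate up to the whole territory, then
    divide by a (guessed) per-ant pickup capacity."""
    territory_rate = tracker.spawns_per_turn * territory_area / box_area
    return math.ceil(territory_rate / pickups_per_ant_per_turn)
```

Every ant beyond that count is free to fight or expand, and since the tracked rate falls as players are eliminated, the patrol shrinks on its own as the game goes on.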

All of this is, of course, rather pedantic, and xathis proved that a simple, solid approach is sometimes all you need, but it got me thinking. For games like this, uncertainty is pretty much a given: Will I have enough resources at a given time to counter something that may or may not happen? What’s my opponent’s build or unit composition? And so on. Random number generators add intrinsic uncertainty on top of that, making player choices anything from calculated risks to pure gambles. Perhaps that shines a light on something a lot of AIs lack, namely the willingness to take stupid risks. And I’m not talking about just the “I might be able to win a 3 vs. 5 engagement” kind of risk-taking. I’m talking about planning for the unknown; I’m talking about meta-gaming; I’m talking about human vs. human behavior.

A not-so-insignificant portion of human behavior is filling in gaps: guessing and going with a gut feeling, mostly using knowledge already obtained. Some AIs already do this to some extent and switch tactics when they see you countering their setup; SC2’s AI does this, for example. By and large, though, game AI is dumb. Not dumb in the sense that it executes bad strategies or tactics, but dumb in the sense that it’s some form of state machine that can only look at what it has been programmed to look at and do only what it has been programmed to do. It doesn’t create new tactics; it doesn’t improvise out of desperation. An AI might imitate human behavior to some extent if trained to do so (e.g., through machine learning), but a) most games don’t do this, and b) the AI still has similar bounds on its knowledge and actions. Perhaps machine learning isn’t incorporated into games much because preparing a good training set is hard (really hard), or because it turns out it’s not really fun that way. I don’t know. There is also the fact that we’re so far away from strong AI that perhaps it is naive of me to want some semblance of it in my games…

I’m again somewhat falling back to talking about RTS games, of course (as I’m wont to do) – keep in mind there are some great FPS, racing, and TCG AIs out there. Still, in every RTS/TBS I’ve played, there has always seemed to be some way to trick the AI consistently. Some shortcut that only a human player can utilize, some knowledge that lets you see a few steps ahead: “cheesing.” It might be as simple as knowing the enemy’s timing. Not to say that I can beat AIs consistently (I can’t), but most AIs behave consistently. I know Gandhi’s going to nuke me eventually in Civ. I know the Wolf’s going to bring lots of swordsmen in SH:Crusader. I know that Castile/Spain won’t be able to pick up Quest for the New World or Reconquista until they reach a certain tech level in EUIV. Perhaps the rulesets are so complicated and the movesets so varied that the clarity of creating an FPS AI doesn’t translate to what is necessary to create an RTS AI. There always seems to be something that gets missed or doesn’t feel natural – a symptom of the rigidity of the code – and you can capitalize on that as a player. It seems to me that all of that detracts from the “fun” in the game. Well, maybe not detracts – it changes the mode of fun. Kind of like the difference between playing a game for the first time and playing it for the 100th. The thrill of the unknown becomes the gratification of out-classing the AI. It’s still fun, but in a different way.

Is it even possible to program an AI so that it behaves like a human? Would that even be fun? You tell me.
