Uncertainty and Knowledge in AI: Looking back at AI Challenge 2011

I’m almost half a decade late, but I recently read xathis’ postmortem for his entry into the Google-sponsored AI Challenge 2011, where contestants pitted their AI ants against other AI ants in a bid to dominate a small, grid-based map with static obstacles (water tiles). He had a very clear approach to the whole thing, and I enjoyed reading the postmortem, but as I watched the replay, I noticed he underutilized his ants, even though he eventually won hands down. In some early skirmishes especially, he loses ants or gets stuck in stalemates because of poor tactics (or rather, a failure to prioritize engagements in his strategy): some ants wait for food spawns around the hill instead of pushing outward or securing a frontline.

Here’s an example of what I mean:

Red spends close to 200 turns holding a line when it could have pushed with backups.

White sacrifices ants over and over when it could have gained an upper hand by joining forces.

Anyway, let’s analyze this situation a little.

For an exercise like this, terrain knowledge, or map awareness, is extremely important. Typically, for strategy games, this means that as a player you need to do some fog-busting: a term you might be familiar with from the Civilization series that describes keeping a unit at a strategic location so you can remove as much of the fog of war as possible. Fog-busting mitigates the risks of being surrounded by the unknown. In this particular example, unknown terrain means the risk of ambush and lost opportunities in the form of missed food (which means fewer ants spawned at your hill) and unrazed hills (which means enemy ants continue to spawn). xathis attempts to solve this problem (and another one I’ll mention later) by making sure that as much of the map as possible is visible to him every turn, using a clever algorithm:

Every tile on the map has an exploreValue that is increased by one at the beginning of each turn, unless it is reachable within 10 steps by one of my ants (determined via bfs), in that case the exploreValue is resetted to zero. So the exploreValue is an indication of how long the tile has been out of my reach.

The ant then moves to a tile that has the highest exploreValue (>10) reachable in one or two moves. Now, this behavior is only invoked if the ant has nothing else to do that turn (like collecting food, defending, or attacking), so it has low enough priority that it doesn’t interfere with any immediate fighting behavior, from what I can tell. Still, many ants end up chasing tiles with high exploreValues as a result. The emergent behavior is very similar to the aforementioned fog-busting a human player might do, with ants ending up spread throughout the player’s “territory.” This creates a blanket of ping-ponging ants across the whole map. As xathis says:

Often single ants keep moving back and fourth for a long time. [An ant in a dead-end cave] has to move one step into the cave because it has to explore a tile … but every other second turn the ant has nothing to explore thus … [it moves] one step in direction of the cave exit only to discard this mission the following turn moving back in again.
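To make that concrete, here’s a minimal sketch of how I imagine the exploreValue bookkeeping works. The exploreValue name, the 10-step cutoff, and the BFS come straight from the quote above; the function shape and its arguments are my own guesses, not xathis’ actual code:

```python
from collections import deque

def update_explore_values(explore_value, my_ants, passable, neighbors):
    """Age every tile's exploreValue by one, resetting tiles that any
    of my ants can reach within 10 steps (multi-source BFS)."""
    frontier = deque((ant, 0) for ant in my_ants)
    reachable = set(my_ants)
    while frontier:
        tile, dist = frontier.popleft()
        if dist == 10:
            continue  # don't expand past the 10-step cutoff
        for nxt in neighbors(tile):
            if nxt not in reachable and passable(nxt):
                reachable.add(nxt)
                frontier.append((nxt, dist + 1))
    for tile in explore_value:
        explore_value[tile] = 0 if tile in reachable else explore_value[tile] + 1
```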

This approach gives the player maximal, but not optimal, vision of the map. The reason I distinguish between the two is simple: if your only goal is to prevent ambushes, there is no use in keeping track of “dead tiles” such as dead ends or tunnels. The rules of the game are clear and constant; thus, there is a guaranteed limit to the enemy’s movements, i.e., no enemy unit can move into a tunnel (or out of it) without going through the mouth of the tunnel. It is therefore a waste of an ant to keep the tunnel clear of fog. This might not seem like a big deal initially, but imagine a very long tunnel that has been cleared by a group of 6 ants. With xathis’ naive algorithm, the army is suddenly down 6 ants: they get stuck in a loop of stepping toward the mouth of the cave and back toward the action, only to move back in a turn later and fade into a marginally useful, greedy fog-busting mission.
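One cheap way to avoid losing ants to such loops (my own sketch, nothing from xathis’ bot) is to mark the map’s dead tiles up front and exclude them from exploration targets. A simple pruning pass catches one-tile-wide dead ends; wider pockets would need something smarter, and this ignores the map’s wrap-around edges for brevity:

```python
def find_dead_tiles(land):
    """Repeatedly prune land tiles with at most one open neighbor.
    Whatever survives is terrain an enemy could actually move through;
    everything pruned is a dead-end pocket whose mouth is the only
    tile worth watching. `land` is a set of (row, col) passable tiles."""
    alive = set(land)
    changed = True
    while changed:
        changed = False
        for row, col in list(alive):
            open_neighbors = sum(
                (r, c) in alive
                for r, c in ((row - 1, col), (row + 1, col),
                             (row, col - 1), (row, col + 1))
            )
            if open_neighbors <= 1:
                alive.discard((row, col))  # dead-end tip: prune and repeat
                changed = True
    return land - alive  # tiles not worth fog-busting
```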

Which brings me to the second problem xathis solved by blanketing the map with ants. I think he wanted to keep this fog-busting behavior as is because the Challenge required him to collect food to spawn units. According to his postmortem, seeing as much of the map as possible was apparently the best strategy and allowed for faster army growth. He’s got a point: one thing maximal vision of the map does is trivialize food collection. If you keep the entirety of your territory free from fog, any food that appears within it will be seen, gathered, and turned into another ant as quickly as possible… And quick is good.

So why do I think this fog-busting strategy is sub-optimal? It’s sub-optimal because it doesn’t make use of our knowledge of the game’s rules. The game specifications have a few notes on food spawning:

  • Each map is supposed to have some hidden rate for spawning food.
  • Food spawning rotates through sets of symmetric tiles, such that a set isn’t visited again until food has spawned on every other set.
  • Some specific tiles in mirror-symmetric maps are hardcoded to never spawn food.

The Challenge page even has a nice hint: “The best way to gather the most food is to explore the map and cover the most area.” But I think this hint is unintentionally misleading. From what I can tell, depending on the rate of food generation, blanketing the map with ants might not be optimal. At very high rates, you’d indeed want to keep an eye on every tile inside your territory so you can pick up any food before the next spawn wave. However, if food spawns slowly (and from observation, in some games/maps it only spawns a few times in hundreds of turns inside a 10×10 area), an ant has time to patrol a larger area for food, at the cost of picking each item up less promptly. I made a quick animation to illustrate my point:

Depending on the food spawn rate, it might be suboptimal to keep ants on fog-busting missions.

As you can see, patrolling frees up more ants than blanketing for other tasks, such as fighting, but comes with the tradeoff of delayed ant spawns. So how important is this tradeoff? Well, let’s look at the source code for the Challenge.

Here’s the function that handles what happens at the end of a turn, finish_turn(). What follows is my paraphrase of its food-related lines rather than the verbatim source; only the food_extra line is quoted exactly from ants.py:
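```python
from fractions import Fraction

def finish_turn(self):
    # ... combat resolution, ant spawning, etc. omitted ...
    # accumulate this turn's (possibly fractional) food allowance;
    # this line is the one quoted verbatim from ants.py
    self.food_extra += Fraction(self.food_rate * self.num_players,
                                self.food_turn)
    food_now = int(self.food_extra)     # whole units we can hand out now
    left_over = self.do_food(food_now)  # queue food; returns the unused rest
    self.food_extra -= food_now - left_over
```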

The default function that handles food spawns is do_food_symmetric(self, amount), called from line 1467 of ants.py through do_food (which selects the right do_food_* function based on the game settings). Paraphrased, it looks something like this:
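```python
import random
from collections import deque

def do_food_symmetric(self, amount):
    # Reconstruction from reading ants.py, not the verbatim source.
    # pending_food is assumed to be a location -> queued-count mapping.
    if getattr(self, 'food_sets', None) is None:
        # lazily build the symmetric spawn sets, with a None sentinel
        # marking the end of a full rotation
        self.food_sets = deque(self.get_symmetric_food_sets())
        self.food_sets.append(None)

    while amount > 0:
        food_set = self.food_sets[0]
        if food_set is None:
            # every set has been used once: reshuffle and start over
            self.food_sets.popleft()
            remaining = list(self.food_sets)
            random.shuffle(remaining)
            self.food_sets = deque(remaining)
            self.food_sets.append(None)
            continue
        if len(food_set) > amount:
            break  # not enough food owed to cover a whole symmetric set
        self.food_sets.rotate(-1)  # move the used set to the back
        for loc in food_set:
            self.pending_food[loc] += 1  # queue one food item per location
        amount -= len(food_set)

    self.place_food()  # try to place everything queued so far
    return amount      # leftover, folded back into food_extra
```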

The function pops a set of food spawn locations from the front of a deque and then places it at the back when done. When it hits a null element instead of an actual set of tiles, it shuffles the spawn locations and pushes the null element back in at the end. If there are no spawn locations to pull from, it calls a function to generate them. Then, as long as the food budget accumulated in finish_turn() via self.food_extra += Fraction(self.food_rate * self.num_players, self.food_turn) covers a whole set, it adds pending_food at each location and decreases amount by the number of locations in the set. Eventually, either amount food is queued and then immediately placed through place_food(), or no food is placed at all until the game passes in a large enough amount. That’s how food_rate regulates food generation.

For reference, here’s place_food, which goes through all the pending food items and places each one if its tile is free (again, paraphrased rather than verbatim):
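```python
def place_food(self):
    # pending_food maps location -> queued count (my reading of the code);
    # LAND is the ants.py constant for an empty passable tile
    for loc in list(self.pending_food):
        row, col = loc
        if self.map[row][col] == LAND:  # tile is free right now
            self.add_food(loc)          # actually put the food down
            self.pending_food[loc] -= 1
            if self.pending_food[loc] == 0:
                del self.pending_food[loc]
        # blocked locations keep their queued food and are retried next turn
```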

The generation method for the food_sets used in the default case is get_symmetric_food_sets(), as found on line 1272 of ants.py (and called on line 1117, in do_food_symmetric). The real function is fairly involved, so here’s a heavily simplified sketch of the idea; the offsets argument – one translation per player – is my own simplification, since the real code derives the map’s symmetry itself and applies more pruning rules than shown:
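```python
def get_symmetric_food_sets(self, offsets):
    # generate sets of symmetric food spawn locations
    # `offsets` is assumed to hold one (drow, dcol) translation per
    # player, including (0, 0) for the reference tile
    assigned = set()
    food_sets = []
    for row in range(self.height):
        for col in range(self.width):
            if (row, col) in assigned:
                continue
            # one location per player, all related by the map's symmetry
            food_set = tuple(((row + dr) % self.height,
                              (col + dc) % self.width)
                             for dr, dc in offsets)
            assigned.update(food_set)
            if all(self.map[r][c] == LAND for r, c in food_set):
                food_sets.append(food_set)
    return food_sets
```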

As the comment at the start of the snippet states, this function generates sets of symmetric food spawn locations. These sets cannot contain locations that touch; that is, the game will not spawn food at locations that are a distance of 1 apart. They cannot have locations on hills or water, cannot be equidistant from ants, and so on. As far as I can tell, the sets cover all land that isn’t a hill, so food will spawn on the same set of tiles twice only after food has already spawned on every other set. This means that on a large map, you probably won’t see food spawn at the same location twice, given that the game ends at 1000 turns. You could use this information to min-max your movement, but that’s a bit overkill in my opinion, especially given the cost of storing and processing it.

The main question I asked above was whether the trade-off between blanketing and patrolling is worth it. Well, there are two things about the code worth mentioning. First, as you might have seen in the code above, any food that can’t be placed due to a blocked tile (i.e., one not marked as LAND) gets queued in pending_food and is spawned the next time the location is free. Second, a similar system exists for ant spawning: ants are queued to spawn when food is collected and only spawn once the hill is free. Effectively, this means the only penalty for not collecting food is a delayed spawn. There is no loss of food. There is no loss of ants. The only thing lost is the opportunity to attack and defend with a greater force while you let the food sit. So as long as you can keep up with the number of enemies approaching the frontlines, there is no penalty to patrolling instead of blanketing. And given that holding a frontline is easier than forming one, having spare ants to form frontlines early in the game would greatly improve a bot’s defensive position, and possibly allow it to rush an attack.

Again, all of this depends on the map’s food spawn rate, which is also affected by the number of players still in the game (vide finish_turn()). Accordingly, it might be nice to keep track of a general spawn rate within a given (somewhat sizable) box and keep just enough ants patrolling your territory to – on average – keep up with the rate at which food is generated, while sending whatever ants you have left to fight and expand your territory. This probably means you’d have more ants patrolling toward the start of the match than at the end, as the number of players still in the game is sure to decrease, effectively slowing food spawns. You’d slowly shift from blanketing to patrolling as the game goes on, which might be better in the long run than blanketing alone (and might let you win matches that would otherwise end with multiple survivors).
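As a back-of-envelope sketch of that bookkeeping (every name and the per-ant pickup model below are hypothetical, not taken from any real bot):

```python
import math

class SpawnRateTracker:
    """Tracks how much food has appeared per turn inside a watched box."""

    def __init__(self):
        self.turns = 0
        self.spawns_seen = 0

    def observe(self, new_food_this_turn):
        self.turns += 1
        self.spawns_seen += new_food_this_turn

    @property
    def spawns_per_turn(self):
        return self.spawns_seen / self.turns if self.turns else 0.0

def patrollers_needed(tracker, territory_area, box_area,
                      pickups_per_ant_per_turn):
    """Ants needed so patrols, on average, sweep up food as fast as it
    appears: scale the observed box rate up to the whole territory, then
    divide by a (guessed) per-ant pickup capacity."""
    territory_rate = tracker.spawns_per_turn * territory_area / box_area
    return math.ceil(territory_rate / pickups_per_ant_per_turn)
```

Every ant beyond that count is free to fight or expand, and since the tracked rate falls as players are eliminated, the patrol shrinks on its own as the game goes on.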

All of this is, of course, rather pedantic, and xathis proved that a simple, solid approach is sometimes all you need, but it got me thinking. For games like this, uncertainty is pretty much a given: Will I have enough resources at a given time to counter something that may or may not happen? What’s my opponent’s build or unit composition? And so on. Random number generators add intrinsic uncertainty on top of that, making player choices anything from calculated risks to pure gambles. Perhaps that shines a light on something a lot of AIs lack, namely the willingness to take stupid risks. And I’m not talking about just the “I might be able to win a 3 vs. 5 engagement” kind of risk-taking. I’m talking about planning for the unknown; I’m talking about meta-gaming; I’m talking about human vs. human behavior.

A not-so-insignificant portion of human behavior is filling in gaps: guessing and going with a gut feeling, mostly using knowledge already obtained. Some AIs already do this to some extent and switch tactics when they see you countering their setup; SC2’s AI does this, for example. By and large, though, game AI is dumb. Not dumb in the sense that it executes bad strategies or tactics, but dumb in the sense that it’s some form of state machine that can only look at what it has been programmed to look at and do only what it has been programmed to do. It doesn’t create new tactics; it doesn’t improvise out of desperation. An AI might imitate human behavior to some extent if trained to do so (e.g., through machine learning), but a) most games don’t do this, and b) the AI still has similar bounds on its knowledge and actions. Perhaps machine learning isn’t incorporated into games much because preparing a good training set is hard (really hard), or because it turns out it’s not really fun that way. I don’t know. There is also the fact that we’re so far away from strong AI that perhaps it is naive of me to want some semblance of it in my games…

I’m again somewhat falling back to talking about RTS games, of course (as I’m wont to do) – keep in mind there are some great FPS, racing, and TCG AIs out there. Still, in every RTS/TBS I’ve played, there has always seemed to be some way to trick the AI consistently. Some shortcut that only a human player can utilize, some knowledge that lets you see a few steps ahead: “cheesing.” It might be as simple as knowing the enemy’s timing. Not to say that I can beat AIs consistently (I can’t), but most AIs behave consistently. I know Gandhi’s going to nuke me eventually in Civ. I know the Wolf’s going to bring lots of swordsmen in SH:Crusader. I know that Castile/Spain won’t be able to pick up Quest for the New World or Reconquista until they reach a certain tech level in EUIV. Perhaps the rulesets are so complicated and the movesets so varied that the clarity of creating an FPS AI doesn’t translate to what is necessary to create an RTS AI. There always seems to be something that gets missed or doesn’t feel natural – a symptom of the rigidity of the code – and you can capitalize on that as a player. It seems to me that all of that detracts from the “fun” in the game. Well, maybe not detracts – it changes the mode of fun. Kind of like the difference between playing a game for the first time and playing it for the 100th. The thrill of the unknown becomes the gratification of out-classing the AI. It’s still fun, but in a different way.

Is it even possible to program an AI so that it behaves like a human? Would that even be fun? You tell me.
