Someone in my office decided to setup a table-tennis tournament recently. This got us all talking about various tournament types. I’ve been interested in ladder tournaments since I learned about them, but have never had the opportunity to actually take part in one. The basic idea is that you get a list of participants and you initially rank them in whatever way you like (random is fine). Once you have your ranked list, then the games begin. Players lower on the list can challenge players above them as frequently as they want (as long as they don’t challenge the same person two times in a row). If a challenger beats a defender, their ranks swap. That’s the gist.
Ultimately our office decided to go with a standard bracket that’s seeded by preliminary pseudo-random challenge groups. And that’s fine, but I was still curious about ladder tournaments. So this got me thinking it’d be a fun thing to model in python. So I did that.
I assumed that given enough time/games, players would sort out and the ‘best’ player would filter to the top. What I discovered was that this is not the case at all.
Full disclosure: I am not a statistician and I’m mostly talking out of my ass here, but I think the basic stuff I’m working with is sound (I welcome any and all corrections and comments at @samteebee on Twitter).
The basic rules by which the script operates are these:
The game is played over a specified number of rounds provided to the script.
A random challenger is chosen each round.
Challengers can only challenge players ranked higher than them (first place player can’t challenge anyone).
A challenger may not challenge the same defender two times in a row.
If a challenger defeats a defender, they swap ranks (ties go to the defender).
Both players roll a 10 sided die (0-9) then multiply that value by their current strength value to determine their fight value.
Challenger gains 10 strength for winning.
Both challenger and defender always gain 1 strength for match participation.
Player ranking is initialized in reverse-player order (Player1 == rank 1, Player2 == rank 2, etc…)
All players start with a Strength value of 1
Here’s an example of the output from a 100,000 match run:
player6 -- Final strength:86036 -- Final rank:1 -- # of times initiated challenges:12641 -- Total rounds played:25045 -- # of rounds ended in 1st place:2839
player4 -- Final strength:86920 -- Final rank:2 -- # of times initiated challenges:11911 -- Total rounds played:24979 -- # of rounds ended in 1st place:3387
player3 -- Final strength:86811 -- Final rank:3 -- # of times initiated challenges:12350 -- Total rounds played:24840 -- # of rounds ended in 1st place:2904
player7 -- Final strength:86490 -- Final rank:4 -- # of times initiated challenges:12246 -- Total rounds played:25039 -- # of rounds ended in 1st place:3031
player2 -- Final strength:84600 -- Final rank:5 -- # of times initiated challenges:13037 -- Total rounds played:24969 -- # of rounds ended in 1st place:2733
player8 -- Final strength:84929 -- Final rank:6 -- # of times initiated challenges:13163 -- Total rounds played:25038 -- # of rounds ended in 1st place:2614
player5 -- Final strength:88058 -- Final rank:7 -- # of times initiated challenges:11727 -- Total rounds played:24977 -- # of rounds ended in 1st place:3483
player1 -- Final strength:85334 -- Final rank:8 -- # of times initiated challenges:12925 -- Total rounds played:25113 -- # of rounds ended in 1st place:2770
What strikes me here is how uniform the numbers are. There are no significant spikes. The “strongest” player does not end in first place. The first place ‘winner’ doesn’t even have the highest tally of rounds ending in 1st place. The most potentially useful info here: the strongest player also has the largest 1st place tally (though they ended in next-to-last place!)
My wife, Steff, gave me the idea of having challengers default to challenging first position by default (and only challenging someone else if they’d just previously lost to that defender). The theory being that strong players might filter up to (and stay at) the top position more rapidly/consistently. Surprisingly, these are the results of that experiment:
player3 -- Final strength:78925 -- Final rank:1 -- # of times initiated challenges:12965 -- Total rounds played:23514 -- # of rounds ended in 1st place:7456
player4 -- Final strength:88070 -- Final rank:2 -- # of times initiated challenges:12584 -- Total rounds played:25239 -- # of rounds ended in 1st place:9781
player6 -- Final strength:83148 -- Final rank:3 -- # of times initiated challenges:12417 -- Total rounds played:24117 -- # of rounds ended in 1st place:8631
player8 -- Final strength:85669 -- Final rank:4 -- # of times initiated challenges:12761 -- Total rounds played:24748 -- # of rounds ended in 1st place:8919
player2 -- Final strength:94001 -- Final rank:5 -- # of times initiated challenges:12023 -- Total rounds played:26820 -- # of rounds ended in 1st place:11870
player7 -- Final strength:82322 -- Final rank:6 -- # of times initiated challenges:12875 -- Total rounds played:24171 -- # of rounds ended in 1st place:8254
player1 -- Final strength:90241 -- Final rank:7 -- # of times initiated challenges:12069 -- Total rounds played:25980 -- # of rounds ended in 1st place:11054
player5 -- Final strength:87912 -- Final rank:8 -- # of times initiated challenges:12306 -- Total rounds played:25411 -- # of rounds ended in 1st place:10137
the weakest player who’d played the fewest games and had the fewest turns in 1st place ended the tournament in first. And the strongest, arguably best player, who kept 1st place for the greatest number of rounds ended up in 5th place.
This has convinced me that a ladder tournament is probably not a very good way of sorting a group of competitors into anything meaningful (Wikipedia agrees with me here).
I’ve played around a bit more with the script, too, trying to force it to resolve to something that resembles a best-to-worst sorting device, but I’ve not made much headway.
If you’re interested in checking out the script, you can grab it here: