For three straight years, NFL Network has produced a list of the Top 100 Players of 20xx. Many people have criticized the results, and this summary from Bill Barnwell hits on some of the main issues. But my issue isn’t with the mistakes the players may be making in the voting booth, but the mistakes made in tabulating the votes. I want to suggest to the fine folks at NFL Network an alternative method for deriving a list of the top 100 players. This method has three big advantages over the current process:
(1) It will take players only a few minutes — or as long as they like — to participate.
(2) More players will be part of the judging, since the time commitment will be lessened.
(3) The results will be more accurate.
Instead of asking players to write down a bunch of names from memory, my suggested method would ask them a series of simple, straightforward questions. Imagine a player sitting in front of a computer screen and being asked to pick an answer to each of the following:
“Who should be ranked higher: Adrian Peterson or Max Unger?” [Clicks Peterson.]
“Who should be ranked higher: Tyvon Branch or Ben Roethlisberger?” [Clicks Roethlisberger.]
“Who should be ranked higher: Reggie Bush or Andy Dalton?” [Thinks for a second… picks Bush.]
“Who should be ranked higher: Patrick Willis or Joe Flacco?” [Thinks…. picks Willis.]
“Who should be ranked higher: Jimmy Graham or Jacoby Jones?” [Clicks Graham.]
That’s a lot better than the current system, described below by Mike Florio of Pro Football Talk:
“All players are given the opportunity to vote through ballots we send to all 32 teams around Thanksgiving,” NFL Network spokesman Alex Riethmiller told PFT via email. “For convenience sake, we try to time it with Pro Bowl balloting, so they can do them together. In addition to ballots collected that way, we also give ballots to many of the players that we interview for our shows. This year, in total, we received 481 votes.”
To vote, each player lists only his top 20 players in the league. The player listed at No. 1 gets 20 points, the player listed at No. 2 gets 19 points, and the process continues until the player listed at No. 20 gets one point.
So it’s really not a “top 100” list. It’s the 100 players who received the highest vote totals from players who attempted to list their personal top 20, presumably without the benefit of all 32 rosters or starting lineups or Pro Bowl qualifiers or anything else that would ensure they aren’t accidentally overlooking someone as they pull 20 names out of thin air.
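To make the current tabulation concrete, here is a minimal sketch in Python of the scoring Florio describes. The two ballots are made up for illustration, and the function name `tally_top20` is my own invention, not anything NFL Network uses.

```python
from collections import Counter


def tally_top20(ballots, list_size=20):
    """Score ballots as described above: No. 1 gets 20 points, No. 20 gets 1."""
    totals = Counter()
    for ballot in ballots:
        # Anything past the 20th name is ignored; position 0 is the No. 1 pick.
        for position, name in enumerate(ballot[:list_size]):
            totals[name] += list_size - position
    return totals


# Hypothetical (and very short) ballots, just to show the arithmetic.
ballots = [
    ["Adrian Peterson", "Peyton Manning", "Calvin Johnson"],
    ["Peyton Manning", "Adrian Peterson"],
]
totals = tally_top20(ballots)
print(totals.most_common())
```

Notice that a player left off a ballot scores zero on it, whether the voter thought he was the 21st best player in the league or simply forgot him.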
The NFL Network is in a bit of a bind here. You can’t expect players to sit down and rank 100 players; that’s way too time consuming. You also subject yourself to criticism if you give players a group of players and ask them to pick the best among that group. If you whittle the list down to 200 or so players, you’re introducing subjectivity into the equation.
Given the NFL Network’s two significant constraints (having to rank the best 100 out of 1600+ players, and having voters, i.e., players, who don’t want to spend more than a few minutes), I can understand why the network settled on the process it chose. Asking players to pick their top 20 players from memory and then having the Network total things up on the back end solves both of those problems.
But there is a way to serve both masters and produce a more accurate list. It can also get us to go much deeper than 100 players. It’s called an Elo Rater, named after Arpad Elo. The Elo Rater is a binary system that only looks at wins and losses and, implicitly, strength of schedule. It was originally used to rate chess players, because how else would you rate chess players besides wins and losses and quality of opponents?
Here’s what Wikipedia has to say about the Elo rating system:
A player’s Elo rating is represented by a number, which increases or decreases based upon the outcome of games between rated players. After every game, the winning player takes points from the losing one. The total number of points gained or lost after a game is determined by the difference between the ratings of the winner and loser. In a game between a high-rated player and a low-rated player, the high-rated player is expected to score more points. If the high-rated player wins, only a few rating points will be taken from the low-rated player. However, if the lower rated player scores an upset win, many rating points will be transferred…. This makes the rating system self-correcting. A player whose rating is too low should, in the long run, do better than the rating system predicts, and thus gain rating points until the rating reflects the true playing strength.
So here’s my suggestion, NFL Network. Use an Elo rating system to come up with the Top 100 list. Instead of having two people face off in a chess match, we use the Elo rating by simply asking an NFL player one question: Who is better, Player X or Player Y?
Baseball-Reference has used an Elo Rater, with the general public as judges, to come up with a list of the best players in MLB history. You can read the fine print here, but here is the CliffsNotes version for the NFL Network.
Every active player in the NFL is given an initial rating of 1500 points. These ratings are then updated by randomly selecting pairs of players and having them “play” each other. The judges, of course, are active NFL players. You can allow each player to answer as many questions as he likes: each question can be answered in just a few seconds.
Let’s say the process begins by asking Richard Sherman who is better: Calvin Johnson or Dwight Freeney. Since this is the first of many thousands of ratings, both Johnson and Freeney have ratings of 1500. We begin by calculating the probability of each player winning, according to the following equation:
Probability of Calvin Johnson winning = 1 / (1 + 10^((1500 – 1500) / 400))
Probability of Dwight Freeney winning = 1 / (1 + 10^((1500 – 1500) / 400))
Fortunately, no one needs to do any math here: a computer can do all the hard work instantly. Inside the parentheses, you see that part of the formula reads ‘1500 – 1500’; those are the ratings of the two players, with the opponent’s rating listed first. If you do the math, you’ll see that the probability of Johnson winning, just like the probability of Freeney winning, is 50%. That’s because each player has the same rating.
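For anyone who wants to see what the computer is doing, the win probability can be sketched in a few lines of Python. The function name `win_probability` is just my label for the formula above.

```python
def win_probability(rating, opponent_rating):
    """Chance that the player with `rating` is picked over his opponent."""
    # The opponent's rating comes first in the exponent, as in the formula above.
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400))


# Johnson and Freeney both start at 1500, so each is a coin flip.
print(win_probability(1500, 1500))  # 0.5
```

Whatever the two ratings are, the two players’ probabilities always add up to 100%.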
Note that there are no pre-existing biases here: the players will be 100% responsible for wherever we land in the ratings. Here is how: After the winner has been determined, the ratings of the two players are adjusted. If Sherman chooses Johnson, then the new ratings become:
Calvin Johnson’s new rating = 1500 (his old rating) + 20 * (.50)
Dwight Freeney’s new rating = 1500 (his old rating) – 20 * (.50)
This means Megatron will now be at 1510, while Freeney will drop to 1490.
Now, let’s say we’ve had several thousand “matchups” judged by NFL players. At that point, let’s say Peyton Manning has a rating of 2000, and Arian Foster has a rating of 1900. Now, we have Aldon Smith sitting in our player rater chair, and he sees this come across the screen: “Peyton Manning or Arian Foster?”
Again, let’s start with the win probability.
Probability Peyton Manning wins = 1 / (1 + 10^((1900 – 2000) / 400)) = 0.640
Probability Arian Foster wins = 1 / (1 + 10^((2000 – 1900) / 400)) = 0.360
If Aldon Smith picks Manning, then the new ratings are:
Peyton Manning’s new rating = 2000 + 20 * 0.360 = 2007 [5]
Arian Foster’s new rating = 1900 – 20 * 0.360 = 1893
If Smith instead picks Foster, then the new ratings become:
Peyton Manning’s new rating = 2000 – 20 * 0.640 = 1987
Arian Foster’s new rating = 1900 + 20 * 0.640 = 1913
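Here is a minimal Python sketch of that update step, checked against the numbers above. The names `update_ratings` and `K_FACTOR` are mine, and the 20 is the same multiplier used in the examples.

```python
K_FACTOR = 20  # the multiplier used in the examples above


def win_probability(rating, opponent_rating):
    """Chance that the player with `rating` is picked over his opponent."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400))


def update_ratings(winner_rating, loser_rating, k=K_FACTOR):
    """Return (new winner rating, new loser rating) after one vote."""
    # The winner gains K times the probability that he would have lost,
    # and the loser gives up exactly the same number of points.
    transfer = k * (1.0 - win_probability(winner_rating, loser_rating))
    return winner_rating + transfer, loser_rating - transfer


# Sherman picks Johnson over Freeney, both starting at 1500.
print(update_ratings(1500, 1500))  # (1510.0, 1490.0)

# Smith picks Manning (2000) over Foster (1900).
manning, foster = update_ratings(2000, 1900)
print(round(manning), round(foster))  # 2007 1893

# Smith picks Foster instead: the upset moves more points.
foster, manning = update_ratings(1900, 2000)
print(round(manning), round(foster))  # 1987 1913
```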
The ratings are self-correcting: if a player’s rating gets too high, a bunch of losses will bring him back to where he belongs. Similarly, if a player’s rating is too low, “beating” a bunch of higher-ranked players will shoot him up the rankings. [6]
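The fine print in that footnote, picking the first player at random and his opponent from those within 250 points, could be sketched like this in Python. The function name `pick_matchup` is hypothetical, and the fallback to the closest-rated player when nobody is within range is my assumption; the footnote doesn’t cover that case.

```python
import random


def pick_matchup(ratings, max_gap=250, rng=random):
    """Pick a random player, then a random opponent rated within max_gap points.

    Assumes `ratings` maps at least two player names to their ratings.
    """
    first = rng.choice(list(ratings))
    nearby = [
        name for name in ratings
        if name != first and abs(ratings[name] - ratings[first]) <= max_gap
    ]
    if not nearby:
        # Assumption: if nobody is within range, use the closest-rated player.
        others = [name for name in ratings if name != first]
        nearby = [min(others, key=lambda n: abs(ratings[n] - ratings[first]))]
    return first, rng.choice(nearby)


# Ratings taken from the examples above.
ratings = {
    "Peyton Manning": 2000,
    "Arian Foster": 1900,
    "Calvin Johnson": 1510,
    "Dwight Freeney": 1490,
}
print(pick_matchup(ratings))
```

With these four ratings, Manning can only be paired with Foster and Johnson only with Freeney, so no voter ever gets asked a lopsided question like Manning against Freeney.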
So how should this be run? I think the NFL Network should start by creating a list of the top 400 players. They can do this however they like. An Elo Rater opened up to fan voting would be a good way to get another measure of player ratings (and hits for NFL.com!). Because of the millions of votes, you could put all 1600+ players into the system and get a reasonable top 400. Another option is that NFL Network could simply choose its top 400 players. Once that’s done, you create a simple website, much like this one, and have NFL players log onto the website and “vote” for each matchup they see. I suspect that in 10 minutes, a player could rate at least 30 different matchups. Do that for 600 players, and you can probably get 20,000 ratings. At that point, the computer does the rest, and it will spit out a ranking of the top 100 (or 400) players in the NFL.
A player could probably rate 5 players in a minute if he wanted. And that gives us the real advantage of the Elo system over what NFL Network is using. Instead of having a player just rank his top 20 players, you will end up with over 20,000 ratings, and these are ratings that mean something. Now, the ratings won’t look silly because a bunch of players simply forgot a name: that won’t be an option, as the best 400 names will appear over the 20,000+ ratings. I suspect that with a user-friendly system (i.e., players clicking on a computer instead of sitting down with a pen and pad) you might end up with 30,000+ ratings. Then the NFL Network could truly produce a list of the top 100 players.
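The back-end step, where the computer turns a pile of votes into a ranking, might look something like the Python sketch below. This is only an illustration under my own assumptions: `rank_players` is a made-up name, every player starts at 1500 with a multiplier of 20 as in the examples, and the five votes are the sample clicks from the top of this post rather than real data.

```python
def win_probability(rating, opponent_rating):
    """Chance that the player with `rating` is picked over his opponent."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400))


def rank_players(players, votes, k=20, start=1500):
    """Apply each (winner, loser) vote in order; return (name, rating) best-first."""
    ratings = {name: float(start) for name in players}
    for winner, loser in votes:
        # Same update as in the worked examples: the winner takes points from the loser.
        transfer = k * (1.0 - win_probability(ratings[winner], ratings[loser]))
        ratings[winner] += transfer
        ratings[loser] -= transfer
    return sorted(ratings.items(), key=lambda item: item[1], reverse=True)


# The five sample clicks from earlier, written as (winner, loser).
votes = [
    ("Adrian Peterson", "Max Unger"),
    ("Ben Roethlisberger", "Tyvon Branch"),
    ("Reggie Bush", "Andy Dalton"),
    ("Patrick Willis", "Joe Flacco"),
    ("Jimmy Graham", "Jacoby Jones"),
]
players = sorted({name for vote in votes for name in vote})
ranking = rank_players(players, votes)
for name, rating in ranking:
    print(name, round(rating))
```

After only five votes the list is just five players at 1510 and five at 1490; it takes the thousands of votes described above before the ratings separate into a meaningful order.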
References
[5] As you can tell, the number “20” is the key variable here. That variable is known as the K-Factor, and the correct number is subject to much discussion. I think 20 works for our purposes, but so would 10, or 24, or some other reasonable number.
[6] One bit of fine print. Pairs should not be chosen completely at random. The first player should be randomly selected, but his “opponent” should be someone relatively close to him (say, within 250 points). This will prevent bizarre choices from distorting the ratings.