Cracking the BCS Egg

Summary: The author has proven that one of the BCS's computer systems would have given bizarre results for 1997, the last season before the BCS was implemented. No season since then has produced extremely controversial computer ratings, but it is likely to happen again. The author challenges the NCAA and its BCS arm to clarify what they know about computer ratings. Changes to the system are proposed.

Article:

College football's BCS system draws loads of criticism every year, as pundits and millions of fans speculate incessantly about possible disaster scenarios. What has amazed me is how insulated the BCS's decision-makers have remained through it all. The NCAA and university officials behind the BCS seem to have adopted a position of silence, preferring to let results do the talking. By the end of each season, the BCS's rankings usually settle into a form that most find reasonably acceptable, and so no major revolt has yet been incited.

My goal with this article is to stir the pot by demonstrating that the BCS's computer ranking systems can produce farcical rankings, the kind that would have the college football world up in arms if they arose in a future season.

I will cover only the computer rankings aspect of the BCS system. This is a field in which I have a modicum of expertise, having experimented with such things since 1993. Over one hundred mathematical ranking systems have appeared on the Internet, so there are plenty of dabblers in the field. As one of those dabblers, I have made it my specialty to devise many varieties of systems in order to understand their differences, flaws, and limitations. I have programmed about a dozen different models, and with several of them I have spent long hours tweaking parameters to see how the rankings change across many past seasons.

One of my recent projects was a program to emulate the Colley Matrix rankings, one of the BCS selectors. I found something rather interesting about the Colley rankings, but before getting to that, I want to give an overview of the BCS's problems, as I see them.

The biggest problem with the BCS is that little explication of its actions has ever been provided. Since we're talking about computer ratings, the Internet would seem a likely source of good information on the matter. However, the BCS overseers have never had a web site that would inspire confidence in them as a group devoted to a serious understanding of college football ratings. Their current web site (http://www.bcsfootball.org/bcsfootball/) leads with a banner that incorporates the FOX network's logo and reads "Bowl Championship Series in association with foxsports.com on msn." A prominent feature is a section of ads for DVDs (only $24.95!). Other than that, there are some football news headlines and conspicuous displays of the corporatized logos of the BCS bowl games. A small blurb states, "The BCS was implemented beginning with the 1998 season to determine the national champion for college football while maintaining and enhancing the bowl system," and not much more of substance.

If one searches (as of this writing) for the term "BCS" on the NCAA's web site (www.ncaa.org), the articles fetched are almost all dated 2003 or earlier. Not one of them addresses the mechanics of the system, who is in charge, or why we should trust them.

This dearth of information has been the rule over the lifetime of the BCS. This should not be surprising, though. The NCAA men's basketball tournament committee has long operated on a closed-door basis. We know theoretically who is in charge, but why we should trust their actions will always remain a mystery. Unfortunately, there is one big difference between football and basketball. The basketball committee is selecting 65 schools for a tournament. All the best teams are going to be selected, and any that are left out unfairly are really only marginally qualified. Few are going to be very outraged by oversights on marginal teams. Meanwhile, the BCS system selects only ten teams for the top bowl games, and only two teams for the ostensible "national championship" game. The stakes are much higher for football's BCS.

Thankfully, the mathematics of the BCS system are at least partially disclosed. We know in advance which opinion polls and ranking systems will be involved and how they will be weighted to establish the over-all BCS rankings. What we do not know is exactly how each computer system works. To my knowledge, only one of the systems, that of Dr. Wesley Colley, may be reproduced based on an outline of the theory provided by its designer. The web sites of the other five systems give either no descriptions of their methods, or not enough to reproduce them with confidence. All we know for sure is that the NCAA forbids the use of margins of victory in calculating computer ratings for use by the BCS. Whether a team won or lost is the only data to be considered by any computer selector.
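
As for how the pieces are weighted, here is a minimal Python sketch of the published averaging formula as I understand it for the 2008 season: the Harris poll percentage, the Coaches poll percentage, and a computer component are averaged in equal thirds, and the computer component awards 25 points for a #1 ranking down to 1 point for #25, discards each team's highest and lowest computer rankings, and divides the sum of the remaining four by 100. The function names and example figures below are mine, not the BCS's, and the sketch ignores tie-breaking details.

    # A sketch of the BCS averaging formula as I understand it for 2008.
    # Function names and the example figures are mine, not the BCS's.

    def computer_component(ranks):
        # ranks: a team's rank (1-25, or None if unranked) in each of
        # the six BCS computer systems.
        points = sorted((26 - r) if r and r <= 25 else 0 for r in ranks)
        # Discard the best and worst computer scores, keep the middle
        # four, and scale so a unanimous #1 ranking yields 1.00.
        return sum(points[1:-1]) / 100.0

    def bcs_average(harris_pct, coaches_pct, computer_ranks):
        # Equal one-third weighting of the two polls and the computers.
        return (harris_pct + coaches_pct
                + computer_component(computer_ranks)) / 3.0

    # Hypothetical team: 95% of the Harris points, 93% of the Coaches
    # points, and computer rankings of 1, 2, 1, 3, 1, 2.
    print(round(bcs_average(0.95, 0.93, [1, 2, 1, 3, 1, 2]), 4))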

Now, the BCS's rule against using margins to rank teams is quite a bone of contention. Pollsters were questioned on just this point long before the inception of the BCS. If a team appeared to lose ground in a poll because of a lackluster win, there was usually an outcry from fans and pundits claiming that "a win is a win." A classic case came in 1994, when Penn State, favored by 26, beat Indiana by only six points and lost ground to Nebraska in the polls.

There is certainly some merit to the "a win is a win" philosophy. Personally, I did not think any less of Penn State after that game (and my power ratings list Penn State as #1 for 1994). On the other hand, it is logical to argue that if Team A beats Team C by 50, and Team B beats Team C by 1, then Team A is likely the better team and should be favored if the two meet.

Certainly two results are not sufficient for an accurate comparison of teams. However, many would argue that not even twelve results are enough to accurately rate teams. If the information contained in scores is disallowed, then so little comparative data remains that it's hard to trust power ratings based solely on who beat whom.
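
To put a rough number on that sparsity, consider the back-of-the-envelope count below. The team and schedule figures are illustrative assumptions (roughly 120 Division I-A teams playing about 12 regular-season games each), not an exact census of any particular season.

    # Rough illustration of how sparse win/loss-only data is. The team
    # and game counts are illustrative assumptions, not exact figures.

    teams = 120                # approximate number of Division I-A teams
    games_per_team = 12        # approximate regular-season schedule length

    possible_pairs = teams * (teams - 1) // 2      # 7,140 possible matchups
    played_pairs = teams * games_per_team // 2     # at most ~720 actual games

    print(f"{played_pairs} games cover at most "
          f"{100.0 * played_pairs / possible_pairs:.0f}% of "
          f"{possible_pairs} possible pairings, and each game yields "
          f"only a single win-or-loss bit.")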

Power ratings are highly problematic for a small sample of games, regardless of whether margins are considered. Ignoring margins can make them even more problematic. I will support these assertions by examining the Colley system, which I have coded based on the outline provided on Colley's web site. Colley publishes ratings back to the 1998 season, and I have verified that my program exactly duplicates his ratings for 1998 through 2007. The Colley ratings look generally acceptable for all those seasons, at least for the top few teams. I can find plenty of odd rankings for middle-of-the-pack teams, but since people care mostly about the top few teams, we will examine top tens.
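
For readers who want to see the shape of the method, below is a minimal sketch of the Colley linear system as I built it from Colley's outline: each team's diagonal entry is 2 plus its games played, each off-diagonal entry is minus the number of meetings between the two teams, and the right-hand side is 1 plus half the win-loss differential. My actual emulation handles bookkeeping details (such as games against lower-division opponents) that this sketch omits, so treat it as an illustration rather than a drop-in replica.

    # A minimal sketch of the Colley Matrix system, built from the outline
    # on Colley's web site. Margins of victory are deliberately ignored;
    # games are fed in as (winner, loser) pairs. My full emulation handles
    # additional bookkeeping that this illustration omits.

    import numpy as np

    def colley_ratings(games):
        teams = sorted({team for game in games for team in game})
        index = {team: i for i, team in enumerate(teams)}
        n = len(teams)

        # Colley system C r = b:
        #   C[i][i] = 2 + games played by team i
        #   C[i][j] = -(number of games between teams i and j)
        #   b[i]    = 1 + (wins_i - losses_i) / 2
        C = 2.0 * np.eye(n)
        b = np.ones(n)

        for winner, loser in games:
            w, l = index[winner], index[loser]
            C[w, w] += 1.0
            C[l, l] += 1.0
            C[w, l] -= 1.0
            C[l, w] -= 1.0
            b[w] += 0.5
            b[l] -= 0.5

        ratings = np.linalg.solve(C, b)   # ratings cluster around 0.5
        return dict(zip(teams, ratings))

    # Tiny fictional schedule, just to show the output format:
    if __name__ == "__main__":
        schedule = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
        for team, rating in sorted(colley_ratings(schedule).items(),
                                   key=lambda kv: -kv[1]):
            print(f"{team:4s} {rating:.6f}")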

I ran Colley's system on some seasons prior to 1998. It did not take long to find an objectionable ranking, as Colley's #1 team for 1997 was Tennessee. As most fans well remember, Tennessee lost two games that year, including a 42-17 whipping at the hands of the team that most systems considering margin of victory rated #1: Nebraska.

Colley's top ten teams before and after the 1997 bowl season (as calculated by my Colley Matrix emulation) are as follows:

    End of the regular season Colley ratings:
    
    team                     w  l       power
    -----------------------------------------
  1 Tennessee               11  1    1.008562
  2 Michigan                11  0    0.970567
  3 Nebraska                12  0    0.937788
  4 Florida                  9  2    0.916233
  5 Florida State           10  1    0.915298
  6 Auburn                   9  3    0.892810
  7 Georgia                  9  2    0.870004
  8 Washington State        10  1    0.865106
  9 Kansas State            10  1    0.847633
 10 North Carolina          10  1    0.846583

    *****************************************

    Post-bowl Colley ratings:

    team                     w  l       power
    -----------------------------------------
  1 Tennessee               11  2    0.989129
  2 Michigan                12  0    0.982527
  3 Nebraska                13  0    0.974416
  4 Florida                 10  2    0.952213
  5 Florida State           11  1    0.949675
  6 Auburn                  10  3    0.921386
  7 Georgia                 10  2    0.899771
  8 UCLA                    10  2    0.878634
  9 Kansas State            11  1    0.864743
 10 North Carolina          11  1    0.864365

Considering only who beat whom, it was almost inevitable that a Southeastern Conference team would top the 1997 Colley rankings. The SEC had an incredible 32-4 record against non-SEC teams during the regular season. The next best non-conference record belonged to the Pac-10, at 23-7. It is no surprise, then, that ratings based on wins alone would put the SEC team with the most wins on top. However, ranking Tennessee #1 for 1997 makes no sense. Nebraska and Michigan were undefeated, and Florida and Nebraska beat Tennessee by 13 and 25 points, respectively. Most would consider a #1 rating for Tennessee laughable.

The web site of the Anderson-Hester system (http://www.andersonsports.com/football/ACF_sugr.html), another BCS selector, specifically mentions 1997 in touting the success of the BCS. Anderson and Hester state that since the inception of the BCS, it has never failed to set up "true national championship games" whenever possible. By "true" championship games they mean games that pit two undefeated teams against each other. In other words, when two top teams have gone undefeated prior to the bowls, the BCS has always matched them, unlike the pre-BCS era, which often saw the only two undefeated teams go to different bowl games. The last time that happened was 1997, when Michigan went to the Rose Bowl and Nebraska went to the Orange Bowl.

Anderson and Hester are right that whenever it was possible to match undefeated teams, the BCS has. However, some undefeated teams have been left out of the championship game (Tulane in 1998, Auburn and Utah in 2004, Boise State in 2006, and Hawaii in 2007).

More importantly, here we have an example of BCS selectors talking up the BCS's strengths by pointing to the 1997 season. Yet we do not know which schools the BCS would have selected for the championship game in 1997. According to my version of Colley's program, Colley would have selected Tennessee and Michigan, not Nebraska and Michigan.

I suspect the BCS could have been in for a huge embarrassment had it been in place in 1997. There is a good chance that Tennessee, Florida, and/or Florida State would have been backed more strongly by the computer ratings than Nebraska and/or Michigan. Imagine if those teams had blocked both Nebraska and Michigan from the championship game. Is the BCS prepared for something like this to happen in the future?


The original version of this article was twice as long. What you have read to this point is all that really needed to be said: the BCS organizers clearly did not do much research on power ratings when they set the system up, and the 1997 example proves it. If you are keen on going deeper, continue reading! (click here)


Posted October 22, 2008
Copyright 2008. All rights reserved.
Jon Dokter

Note: Links cited in the article are not active to avoid the need to repair broken links in the future.


www.dokterentropy.com