March Madness Models
March Madness is upon us! I'm going to take a little break from covering NBA Possession Timing 101 and talk college hoops to help you place some bets in Vegas this week. Last year I built March Madness machine learning models to help me fill out my annual bracket. These models use some of the same concepts as my draft models. This year I have improved the models a bit and would like to share some insight for the 2015 NCAA tourney.
I'm going to provide little to no description of my methodology for two reasons:
- I don't have time to write up anything fancy
- Who knows, maybe someday I'll start up a sports betting hedge fund and if so, it wouldn't be wise to reveal all of my secrets
With that being said, I will give you a small peek into how they are built. My models are based upon a top-down approach using team-level stats from various data sources [1]. I use data from all D1 teams across all games since 2010 [2]. Some current limitations of the models are that they don't account for: recent injuries/suspensions (Louisville), historical underachieving (Kansas), historical overachieving (Izzo), recent hot/cold trends (Zona), travel distance (Dayton), and probably a lot more that I would try to incorporate if this were my real job.
Despite the limitations, I believe the models are good at predicting how different playing styles match up against each other and influence the outcome. My goal is to have an effective way to predict the outcome of a game between two teams who haven't necessarily played each other... which is perfect for March Madness. Enough methodology, let's get to the interesting part - what to bet on!
Who should you bet on in the round of 64?
One of my models outputs what it thinks the final score difference will be (similar to a Vegas point spread). This is essentially an against the spread (ATS) predictor. It is interesting to compare the model's suggestions with real Vegas spreads. I am always surprised by how closely the spreads my model produces align with actual Vegas spreads [3]. This suggests that casino models aren't all that different from my own. Across the round of 64 games, the correlation between my model's projections and actual Vegas spreads is an absurd 97%! Here is a table comparing the two:
The most interesting matchups to look at are the ones where Vegas and my model differ the most. These indicate either that the model is not considering something that Vegas is, or that the model has some sort of edge. Here is the same table sorted by difference, with the extremes likely to be the best opportunities for betting:
If you are a careful observer you will see that there are more positive than negative differences (meaning the model predicts a bigger win by the favorite compared to Vegas). My explanation for this (with no research to back it up) is that my model does not account for March being different from the rest of the season, i.e. games are lower scoring in the tourney, likely because defenses are trying harder and offenses are less comfortable playing away from home.
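For anyone who wants to run the same kind of comparison on their own numbers, here is a minimal sketch of the three checks described above: the correlation between model and Vegas spreads, the per-game difference sorted from one extreme to the other, and the overall tilt toward favorites. Everything in it is an assumption for illustration. The team names and spreads are made-up placeholders (not the real lines or my projections), and the sign convention (difference = Vegas spread minus model spread, so positive means the model likes the favorite by more) is just one way to match the description above.

```python
from math import sqrt

# Hypothetical placeholder rows, NOT real lines or real model output:
# (favorite, vegas_spread, model_spread); a negative spread means "favored by".
GAMES = [
    ("Team A", -8.0, -11.0),
    ("Team B", -3.0, 0.0),
    ("Team C", -13.0, -16.5),
    ("Team D", -6.0, -6.5),
    ("Team E", -24.0, -28.0),
]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (>= 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def difference(vegas, model):
    """Assumed convention: positive = model predicts a bigger win by the favorite."""
    return vegas - model


vegas = [g[1] for g in GAMES]
model = [g[2] for g in GAMES]

# 1) How closely do the two sets of spreads track each other?
r = pearson(vegas, model)

# 2) Sort by difference; both ends of the list are the candidate bets.
ranked = sorted(GAMES, key=lambda g: difference(g[1], g[2]), reverse=True)

# 3) Is there a systematic tilt toward favorites?
diffs = [difference(v, m) for _, v, m in GAMES]
mean_diff = sum(diffs) / len(diffs)
n_pos = sum(d > 0 for d in diffs)
n_neg = sum(d < 0 for d in diffs)

print(f"correlation: {r:.2f}")
for name, v, m in ranked:
    print(f"{name}: vegas {v:+.1f}, model {m:+.1f}, diff {difference(v, m):+.1f}")
print(f"mean diff: {mean_diff:+.2f} ({n_pos} positive, {n_neg} negative)")
```

A mean difference well above zero, like the one in this toy data, is the kind of pattern that would point to the model's numbers being scaled a bit high for favorites rather than to a real edge in any single game.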
Let me discuss a few of the bigger differences:
- Louisville(-8) - Vegas is down on Louisville compared to my model. This could be because they have been slumping a bit towards the end of the season or because of the semi-recent loss of Chris Jones. In any case I don't buy it.
- Arizona(-23.5)/Oklahoma(-13)/Virginia(-16.5) - With spreads > 12, the 3-4 point higher predictions by my model could be solely from the fact that my model's numbers are scaled a bit higher for favorites as mentioned above.
- Ohio St.(-3.5) - Even though Ohio St. is a 10 seed, both Vegas and the model think the Buckeyes are significant favorites. The Vegas line may be artificially low because of the perception that a 10 seed shouldn't be favored by more than 3.5. Looks like a good opportunity.
- Utah(-6.5) - Vegas thinks less of them, likely because of how they finished the season, dropping 4 of their last 7 in Pac 12 play (even losing to my Huskies, Go Dawgs!). Not to mention everyone thinks 5 seeds are jinxed, which makes S.F. Austin a hot play. Be the contrarian here.
- Oregon(-1.5) vs. Oklahoma St. - Oregon was surging to end the season and Oklahoma St. did the exact opposite which explains the difference with Vegas. My model favors the Cowboys but this comes down to how much you value momentum.
- Indiana(+5.5) - They struggled toward the end of the season which would explain some of the difference. I'm not sure about a bet straight up but it seems a solid ATS bet with a spread of 5.5.
- UCLA(+3.5) - Unlike the public, my numbers don't see UCLA as a surprise. Got to go Pac 12 ATS here.
- San Diego St.(-3) - My model sees this as a toss-up - I could see this being a real low scoring affair where the Aztecs win but don't cover. However I ran a model built only on NCAA Tourney games (rather than all regular season games) and those predictions said they were 7 point favorites - this is likely because suffocating defenses do well in neutral court games in March. I'm not touching this one.
- Texas(-1.5) - The only other game besides Oregon where the model predicted a different favorite than Vegas. I'm going to go with the numbers and write it off as the state of Texas thinking everything is bigger and better in Texas causing the line to be artificially pushed.
I'll let you do what you want with this information but if I were in Vegas right now (ahem Dan) I would place some ATS bets on: Louisville, Georgetown, Ohio St., Utah, Indiana, UCLA, Butler
Look out for more betting insight throughout the tourney as the madness continues.
[2] 2010 is the first year for which Sports Reference has all "advanced" team stats (at least as of this time last year).