NBA superstar Steph Curry is one of the best shooters the world has ever seen. But as great as he is, he still misses roughly 53% of his attempts. (The average shooter in the NBA misses about 55% of his shots.)
With so much of a player’s success depending on shot selection – including where on the court a shot is taken, among other factors – four students from the University of Virginia’s School of Data Science set out to build a model that could predict whether a field goal attempt will be a make or a miss.
Using a dataset from the 2016-17 season, which tracked more than 210,000 shots, the team of Kristy Bell, Abhi Dommalapati, Jack Peele and Spencer Bozsik created a model that won the School of Data Science Sports Analytics Club’s first-ever Hackathon last semester.
“I was proud of my team and how we worked together to effectively use the data science pipeline to answer our question of interest,” Bell said. “More than anything, winning the competition gave our group confidence that we were able to apply what we were learning in our program to a real sports dataset.”
Kristy Bell says working with “messy sports data” has been an invaluable experience. (Contributed photo)
UVA Today caught up with Bell, a Pennsylvania native who graduated last spring from UVA with an undergraduate degree in statistics and economics – and who is now pursuing her master’s degree here in data science – to learn more about the team’s model and methodology.
Q. Can you tell UVA Today readers a little more about the team’s objective?
A. We were provided with a dataset that listed several attributes for every shot in the 2016-17 NBA season, including player, home team, away team, shot type, shot location, time left in the game and the player’s last shot outcome. The data required a bit of data wrangling prior to model building (e.g. changing shot location to distance from the net, simplifying shot type).
Using a subset of the provided features, we sought to construct a model to predict whether a shot was a make or a miss. … Our ultimate goal was to have a model that was able to accurately predict whether or not a shot was made on a fresh dataset containing the same variables.
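For readers curious what the wrangling Bell describes might look like in practice, here is a minimal Python sketch. It is not the team's code: the field names (`x`, `y`, `shot_type`, `made`), the coordinate convention (basket at the origin, units in feet), the shot-type labels and the sample rows are all assumptions made up for illustration.

```python
import math

# Assumption: the basket sits at (0, 0) and coordinates are in feet.
HOOP_X, HOOP_Y = 0.0, 0.0


def shot_distance(x, y):
    """Straight-line distance from the shot location to the basket."""
    return math.hypot(x - HOOP_X, y - HOOP_Y)


def simplify_shot_type(raw_type):
    """Collapse a detailed shot description into a few broad categories.

    The category names here are hypothetical, not the team's scheme.
    """
    label = raw_type.lower()
    if "dunk" in label:
        return "dunk"
    if "layup" in label:
        return "layup"
    if "hook" in label:
        return "hook"
    return "jump_shot"


def wrangle(shot):
    """Turn one raw shot record into model-ready features."""
    return {
        "distance_ft": round(shot_distance(shot["x"], shot["y"]), 1),
        "shot_type": simplify_shot_type(shot["shot_type"]),
        "made": shot["made"],
    }


# Made-up example rows, not taken from the hackathon dataset.
raw_shots = [
    {"x": 3.0, "y": 4.0, "shot_type": "Driving Layup Shot", "made": 1},
    {"x": 0.0, "y": 24.0, "shot_type": "Pullup Jump Shot", "made": 0},
]

features = [wrangle(shot) for shot in raw_shots]
```

Each cleaned record then carries a single distance value and a coarse shot category, the kind of inputs a make-or-miss classifier could be trained on.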
Q. How did you, Abhi, Jack and Spencer work on this as a team? Did you divvy up specific tasks?
A. For the most part, we decided to code together. Whoever wasn’t sharing their screen was on their computer coming up with more ideas and/or parsing through class notes to help the teammate actively coding. Although we found group coding to be efficient, the allocated meeting time for the hackathon was insufficient for adequate data cleaning and model building. As a result, we met once together outside the allotted time to test our model and tune our parameters.
Q. Were there any things that surprised you during the course of the project?
A. We were surprised to find that the NBA shot log data was not as clean and intuitive as we initially believed. More time during the hackathon was spent understanding where the data comes from and creating new variables from old ones than we had anticipated.

