The First Detour
Once upon a time, in the realm of weather-related nerdiness, I embarked on a quest to decipher the secrets of changing weather patterns. Armed with my mighty keyboard and a burning hatred for sweltering summers, I planned to uncover the truth about my local area’s climate evolution. You see, summer and I have never been the best of friends. The scorching heat and suffocating humidity make me cringe harder than a cat stuck in a cucumber maze. So, I figured, why not dive into the delightful world of data and investigate if there’s any hope for an early arrival of autumn? I dubbed it my “Project Meteorological Marvel.”
My cunning plan involved sifting through decades of weather records, gathering juicy tidbits on how temperatures have tortured us poor mortals over the years. I wanted to spot trends, make dazzling graphs, and perhaps even predict when autumn would grace us with its blessed presence. Oh, how I yearned for a reliable sign that summer’s reign of terror would soon be over! Of course, this was no ordinary undertaking. I needed a trustworthy data source, and what better place to turn to than the National Oceanic and Atmospheric Administration (NOAA)? If you can’t trust the NOAA to provide accurate historical weather data, well, I guess we’re all doomed!
Now, I must confess, I had no intention of becoming a weather forecaster during this escapade. That’s a whole different level of sorcery reserved for the truly brave and slightly crazy souls. No, my friends, my mission was solely to unravel the mysteries of the past, not predict the future. So, off I went, armed with my web-scraping skills and a fervent desire to put an end to endless summers. And thus, my epic journey into the realm of weather data began… but did it?
Well, it seems that once people discover your supreme data-crunching powers, they start throwing project ideas at you like confetti at a parade. Take my poor, football-obsessed husband for example. He came up with the brilliant notion of analyzing if there’s any connection between a quarterback’s race and the number of times they get a sweet, sweet roughing the passer call in their favor. And as if that wasn’t enough, I thought, why not spice it up even more and explore if the defender’s race also plays a role in how many roughing the passer flags rain down upon them? Heck, let’s even toss in the officials’ race for good measure. Who knew the football field could have so many hidden layers of sociology and statistics? But hey, I’ll play along and start with quarterbacks for now. Let the mind-boggling journey begin! Just don’t blame me if we end up in a statistical black hole of absurdity.
At first, I was going to look at NCAA football statistics, given that the sample size would be much larger than for the NFL. However, I couldn’t find a good source to either download or extract the data from. It just doesn’t seem like the NCAA collects that data down to the player level. As luck would have it, I was able to find a source for NFL penalty data. The aptly named NFL Penalties is a site dedicated to capturing penalty data so that users can basically settle disputes “over a player and their frequent ability to get away with murder, or not.” The site’s author does a pretty good job of articulating problems with the data and any mitigation actions taken. Ultimately, the data on the site is provided by nflfastR [1].
Now that I’ve talked about the general concept and the search for a data source, here are my next steps:
- Collect Roughing the Passer (RTP) data for the quarterback from NFL Penalties.
- Collect the relevant biographical data on each of the quarterbacks.
- Use Python and relevant libraries such as pandas [2] and Matplotlib [3] to perform data cleaning, exploration, univariate and bivariate analysis, and visualizations.
- Publish the findings along with the documented methodology.
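For the curious, here is a rough sketch of how the collect, merge, and explore steps might look in pandas. Everything in it is an assumption for illustration: the column names (`player`, `rtp_calls`, `games`, `race`), the placeholder quarterbacks, and the numbers are all made up and are not taken from NFL Penalties or nflfastR. The real data will almost certainly be shaped differently.

```python
import pandas as pd

# Hypothetical RTP counts per quarterback (placeholder values, not real data).
rtp = pd.DataFrame({
    "player": ["QB A", "QB B", "QB C", "QB D"],
    "rtp_calls": [4, 2, 6, 0],
    "games": [16, 8, 16, 4],
})

# Hypothetical biographical table; "QB D" is deliberately missing
# to show how gaps in the second data source surface after the merge.
bio = pd.DataFrame({
    "player": ["QB A", "QB B", "QB C"],
    "race": ["group_1", "group_2", "group_1"],
})

# A left merge keeps every quarterback from the penalty data;
# indicator=True adds a "_merge" column flagging unmatched rows.
merged = rtp.merge(bio, on="player", how="left", indicator=True)
missing_bio = merged.loc[merged["_merge"] == "left_only", "player"].tolist()

# Keep only fully matched rows and normalize by games played,
# so quarterbacks with short seasons are comparable to full-time starters.
analysis = merged[merged["_merge"] == "both"].drop(columns="_merge").copy()
analysis["rtp_per_game"] = analysis["rtp_calls"] / analysis["games"]

# A first bivariate look: sample size and mean rate per group.
summary = analysis.groupby("race")["rtp_per_game"].agg(["count", "mean"])

print("Missing biographical data for:", missing_bio)
print(summary)
```

Normalizing by games played is just one possible choice here; pass attempts or dropbacks might turn out to be a fairer denominator once the actual data is in hand.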
I’m not sure how long this will take me; it could be a day, it could be weeks. Either way, check back often for updates on the project’s progress.
1. Carl S, Baldwin B (2023). nflfastR: Functions to Efficiently Access NFL Play by Play Data. https://www.nflfastr.com/, https://github.com/nflverse/nflfastR.
2. The pandas development team (2023). pandas-dev/pandas: Pandas (v2.1.0). Zenodo. https://doi.org/10.5281/zenodo.8301632
3. Hunter J D (2007). Matplotlib: A 2D Graphics Environment. Computing in Science & Engineering, vol. 9, no. 3, pp. 90-95.