Alright, let’s talk about this Keys vs. Navarro tennis thing I messed around with today. It was more of a “can I even get this to work” kind of deal, not some super serious analysis or anything.

First off, I grabbed some data. I’m not gonna lie, finding clean tennis data can be a pain. I ended up cobbling something together from a few different sources online – basically scraping match results and player stats. It was messy, lots of cleaning needed. Think manually fixing typos in player names, standardizing date formats… the usual data janitor stuff.
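
For a flavor of what that cleanup looks like, here’s a rough sketch. The rows, the column names, the `NAME_FIXES` lookup and both helper functions are all made up for illustration; my actual scraped data was messier and isn’t reproduced here.

```python
from datetime import datetime

import pandas as pd

# Hypothetical raw rows standing in for scraped data (not the real dataset).
raw = pd.DataFrame({
    "player": ["Madison Keys", "madison keys ", "Emma Navaro", "Emma Navarro"],
    "date": ["2024-01-15", "15/01/2024", "March 3, 2024", "2024-03-04"],
})

# Hypothetical lookup of known misspellings, built by hand while eyeballing the data.
NAME_FIXES = {"emma navaro": "Emma Navarro"}

# Date formats assumed to appear in the sources; extend as new ones turn up.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%B %d, %Y")


def clean_name(name):
    """Collapse whitespace, normalize case, then apply known typo fixes."""
    key = " ".join(name.split()).lower()
    # Note: title-casing is a simplification and would mangle names like "McNally".
    return NAME_FIXES.get(key, key.title())


def parse_date(text):
    """Try each known format in turn; return NaT if none of them match."""
    for fmt in DATE_FORMATS:
        try:
            return pd.Timestamp(datetime.strptime(text.strip(), fmt))
        except ValueError:
            continue
    return pd.NaT


clean = raw.assign(
    player=raw["player"].map(clean_name),
    date=raw["date"].map(parse_date),
)
print(clean)
```

Trying an explicit list of formats is slower than letting pandas guess, but it means an ambiguous date like 03/04/2024 never gets silently read the wrong way round.
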
Then, I wanted to see head-to-head records. I started by filtering the data down to matches where either Keys or Navarro played, then grouped by opponent and counted wins and losses. The head-to-head record is just the row where the opponent is the other player. This part was pretty straightforward, thankfully: some pandas dataframes, grouping, filtering, counting… basic stuff.
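
The filter-then-count step can be sketched roughly like this. The `winner`/`loser` column layout, the `record_by_opponent` helper and the placeholder players are assumptions for the example, and the results are invented, not real match data.

```python
import pandas as pd

# Made-up results, one row per match, with hypothetical column names.
matches = pd.DataFrame({
    "winner": ["Player A", "Player B", "Player A", "Player C", "Player B"],
    "loser":  ["Player B", "Player A", "Player C", "Player B", "Player D"],
})


def record_by_opponent(matches, player):
    """Return a wins/losses table for `player`, indexed by opponent."""
    # Matches the player won: the opponent is whoever is in the loser column.
    wins = matches.loc[matches["winner"] == player, "loser"].value_counts()
    # Matches the player lost: the opponent is whoever is in the winner column.
    losses = matches.loc[matches["loser"] == player, "winner"].value_counts()
    # Align on opponent; an opponent missing from one side gets a 0 there.
    table = pd.DataFrame({"wins": wins, "losses": losses}).fillna(0).astype(int)
    table.index.name = "opponent"
    return table


record_a = record_by_opponent(matches, "Player A")
print(record_a)
```

Looking up `record_a.loc["Player B"]` then gives the head-to-head line for that one matchup.
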
Next, I wanted to dig a little deeper. Just knowing they played each other isn’t that interesting. I wanted to look at things like: what surfaces did they play on (hard court, clay, grass), who served better, who was better at returning serves?
- I started pulling in more stats: aces, double faults, first serve percentage, break points won. Again, this data wasn’t all in one place, so more merging and cleaning.
- Then I calculated some simple stats, like average aces per match and first serve points won percentage. Nothing too fancy, just trying to get a feel for their strengths and weaknesses.
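
Here’s a minimal sketch of that kind of per-player summary. The column names and every number in it are invented for the example; only the shape of the calculation matters.

```python
import pandas as pd

# Made-up stat lines, one row per player per match (hypothetical columns).
stats = pd.DataFrame({
    "player": ["Player A", "Player A", "Player B", "Player B"],
    "aces": [6, 4, 2, 4],
    "first_serve_points_won": [30, 24, 20, 28],
    "first_serve_points_played": [40, 36, 32, 40],
})

grouped = stats.groupby("player")

summary = pd.DataFrame({
    # Plain per-match average.
    "avg_aces": grouped["aces"].mean(),
    # Pooled percentage: total points won over total points played.
    "first_serve_won_pct": 100
    * grouped["first_serve_points_won"].sum()
    / grouped["first_serve_points_played"].sum(),
})
print(summary.round(1))
```

One choice worth flagging: the percentage is pooled across matches instead of averaging each match’s percentage, so a long three-setter counts for more than a quick blowout. Averaging per-match percentages is just as defensible, it simply answers a slightly different question.
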
Visualization time! Numbers are boring, right? I wanted to make some charts to compare their stats. I used matplotlib, because it’s what I know. Simple bar charts comparing their averages for different stats. I tried a scatter plot of first serve percentage vs. second serve percentage, but it didn’t really tell me much, so I ditched it.
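
A grouped bar chart along those lines looks something like the sketch below. The stat labels and the averages are placeholders, not the numbers from my data, and the `Agg` backend line is only there so it runs on a machine without a display.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

# Placeholder averages for two players (invented values).
labels = ["Aces", "Double faults", "Break points won"]
player_a = [5.0, 3.0, 4.0]
player_b = [3.0, 2.0, 5.0]

x = np.arange(len(labels))  # one slot per stat
width = 0.38                # width of each bar within a slot

fig, ax = plt.subplots(figsize=(7, 4))
# Shift each player's bars half a width left/right so they sit side by side.
ax.bar(x - width / 2, player_a, width, label="Player A")
ax.bar(x + width / 2, player_b, width, label="Player B")

ax.set_xticks(x)
ax.set_xticklabels(labels)
ax.set_ylabel("Average per match")
ax.set_title("Stat comparison (placeholder data)")
ax.legend()
fig.tight_layout()
```

From there, `fig.savefig("comparison.png")` writes the chart to disk, or `plt.show()` pops it up if you’re on an interactive backend.
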
Finally, I tried a super basic predictive model. I’m talking very basic. I just wanted to see if I could throw their stats into a logistic regression and predict who would win a hypothetical match. I split the data into training and testing sets, trained the model, and… well, the accuracy wasn’t great. But hey, it was a proof of concept!
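
For the curious, the skeleton of that kind of model is only a few lines. This sketch assumes scikit-learn and runs on purely synthetic data: the three features stand in for stat differences between the two players, and the weights used to generate the labels are made up, so the accuracy it prints says nothing about real tennis.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_matches = 200

# Synthetic features: one row per match, each column a stat difference
# (player 1 minus player 2), e.g. aces, double faults, first serve %.
X = rng.normal(size=(n_matches, 3))

# Synthetic labels: 1 if player 1 won. Loosely tied to the features plus noise,
# using invented weights, so there is some signal for the model to find.
signal = X @ np.array([0.8, -0.5, 1.0])
y = (signal + rng.normal(scale=1.0, size=n_matches) > 0).astype(int)

# Hold out a quarter of the matches for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = LogisticRegression()
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"Held-out accuracy: {accuracy:.2f}")

# Estimated win probability for player 1 in a hypothetical matchup.
hypothetical = np.array([[0.5, -0.2, 0.3]])
win_prob = model.predict_proba(hypothetical)[0, 1]
print(f"P(player 1 wins): {win_prob:.2f}")
```

Framing each row as a difference between the two players is one way to feed a matchup into a single classifier; it’s a modeling choice for this sketch, not the only option.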

Lessons learned? Tennis data is a pain to work with. Data cleaning is always the biggest time sink. And my predictive modeling skills are rusty. But overall, it was a fun little project. I got to dust off my data wrangling skills and play around with some interesting data. Would I bet my life savings on my model’s predictions? Absolutely not. But it was a good way to spend an afternoon.