Okay, so today I decided to try and build a “home run predictor.” Sounds cool, right? I have no idea if this will actually work, but that’s part of the fun. I started by grabbing some coffee. Need that fuel!

Data Gathering
First things first, I needed data. Lots of it. I spent a good chunk of the morning just looking for where to find baseball stats. Finally, I got a hold of some game logs that had a ton of information: batter, pitcher, stadium, weather, and (of course) whether a home run was hit. It was a total mess of numbers and letters, but hey, that’s what I signed up for.
Cleaning the Mess
Next came the not-so-fun part: cleaning up the data. Let me tell you, this took forever. There were missing values, weird codes, and all sorts of inconsistencies. I felt like I was wrestling with a spreadsheet monster. I used a bunch of basic spreadsheet formulas to replace some of the weird stuff and just flat-out deleted rows that were totally incomplete. Definitely wasn’t perfect, but it was something I could work with.
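I did all of this by hand in a spreadsheet, but the same two moves (swap out the weird codes, drop the rows that are too incomplete) could be sketched in Python roughly like this. Everything specific here is an assumption for illustration: the column names, the list of "missing" codes, and the sample rows are made up and are not my actual game logs.

```python
# Hypothetical raw rows; column names and codes are invented for this sketch.
RAW_ROWS = [
    {"batter": "A", "pitcher": "X", "wind_mph": "12", "home_run": "Y"},
    {"batter": "B", "pitcher": "", "wind_mph": "N/A", "home_run": "n"},
    {"batter": "C", "pitcher": "Z", "wind_mph": "N/A", "home_run": "0"},
    {"batter": "", "pitcher": "", "wind_mph": "", "home_run": ""},
]

# Assumed placeholders that all mean "no value here".
MISSING = {"", "N/A", "NA", "-", "?"}
# Assumed spellings of the home run flag, mapped to 1 (yes) or 0 (no).
HOME_RUN_CODES = {"y": 1, "yes": 1, "1": 1, "n": 0, "no": 0, "0": 0}
# A row without these fields is treated as too incomplete to keep.
REQUIRED = ("batter", "pitcher", "home_run")


def clean_row(row):
    """Return a cleaned copy of the row, or None if it should be dropped."""
    cleaned = {}
    for key, value in row.items():
        value = value.strip()
        cleaned[key] = None if value in MISSING else value

    # Drop rows that are missing any required field.
    if any(cleaned.get(key) is None for key in REQUIRED):
        return None

    # Replace the inconsistent home run codes with a plain 0 or 1.
    code = cleaned["home_run"].lower()
    if code not in HOME_RUN_CODES:
        return None
    cleaned["home_run"] = HOME_RUN_CODES[code]

    # Optional numeric field: keep None when the value was missing.
    if cleaned.get("wind_mph") is not None:
        cleaned["wind_mph"] = float(cleaned["wind_mph"])
    return cleaned


def clean_rows(rows):
    """Clean every row and keep only the ones that survived."""
    return [c for c in (clean_row(r) for r in rows) if c is not None]


cleaned_rows = clean_rows(RAW_ROWS)
```

In this toy example, the rows with no pitcher get thrown out, while a row that is only missing the wind speed is kept with that field left empty.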
Picking Features
After cleaning, I had to decide what actually matters for predicting a home run. Like, does the batter’s shoe size matter? Probably not. I bolded the stuff that seemed important: things like the pitcher’s past performance, the batter’s slugging percentage, maybe even the wind speed at the stadium. This was mostly guesswork, to be honest. I just went with my gut.
Simple Model Building
- I decided not to get too fancy.
- I split the data into a “training” set and a “testing” set.
- I made a very basic model, more like a set of rules.
- For example, if the batter has a high slugging percentage AND the pitcher has allowed a lot of home runs recently, I predicted a home run.
- Otherwise, no home run. Super simple.
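The steps above can be sketched in a few lines of Python. This is only a rough illustration: the field names (`batter_slugging`, `pitcher_recent_hr_allowed`), the two cutoff values, and the 80/20 split are assumptions I picked for the example, not numbers from the actual project.

```python
import random

# Assumed cutoffs for "high slugging" and "a lot of home runs recently".
SLUGGING_THRESHOLD = 0.500
RECENT_HR_THRESHOLD = 3


def split_rows(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and split it into (training, testing)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def predict_home_run(row):
    """Rule: predict a home run only if BOTH conditions hold."""
    return (
        row["batter_slugging"] >= SLUGGING_THRESHOLD
        and row["pitcher_recent_hr_allowed"] >= RECENT_HR_THRESHOLD
    )
```

With a rule-based model like this, the training set is really just for eyeballing where the cutoffs should go; the testing set is what the rule gets judged on.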
Testing and (Lots of) Failing
Now for the moment of truth: testing the predictor. I ran my cleaned-up “testing” data through my super-basic model and… it didn’t work very well. Like, at all. It was barely better than just flipping a coin. Bummer.
But, that’s okay! This was just a first attempt. I’m sure there are tons of things I could improve. Maybe I need better data, maybe I need to consider more factors, or maybe my super-simple rules are just too… simple. Either way, I need to find a better way to predict a home run.
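For the "barely better than a coin flip" check, here is a minimal sketch of how that comparison could be computed in Python. The predictions and outcomes below are toy values invented for the example, not my real test results.

```python
COIN_FLIP_ACCURACY = 0.5  # what random guessing gets on a yes/no question


def accuracy(predictions, actuals):
    """Fraction of predictions that match what actually happened."""
    if len(predictions) != len(actuals):
        raise ValueError("predictions and actuals must be the same length")
    if not actuals:
        return 0.0
    correct = sum(p == a for p, a in zip(predictions, actuals))
    return correct / len(actuals)


# Toy data: True means "home run", False means "no home run".
predictions = [True, False, False, True, False, False, False, False]
actuals = [True, False, True, False, False, True, False, False]

score = accuracy(predictions, actuals)
print(f"model: {score:.3f}, coin flip: {COIN_FLIP_ACCURACY:.3f}")
```

Plain accuracy is the simplest thing to measure, which is why I used it here; it is just the share of test rows where the rule and reality agreed.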

So, my “home run predictor” isn’t ready to make me a millionaire betting on baseball games just yet. But, it was a fun project, and I learned a lot along the way. Time to go back to the drawing board (and maybe grab another coffee). Maybe tomorrow I’ll try a slightly more complicated model and see what happens. Who knows, I might even have a predictor ready by the end of the season!