What to Expect from Public MLB Analytics and Commentary in 2016

This is likely Ken "Hawk" Harrelson's final season calling White Sox games. (via billymax85 & Howell Media Solutions)

At its best, baseball fandom transcends simply enjoying the sport: it teaches us new things about ourselves and becomes the foundation of a community. For me (perhaps more than others), analytical baseball writing is the lifeblood of that community, and reminds me why I stay a fan.

I’m using the word “analytical” in a broad sense here, covering everything from thoughtful game recaps, to dives into why a particular player or team isn’t meeting expectations, to the eloquence of someone like Roger Angell. I’m also using the word “writing” in a broad sense — a rigorous statistical analysis, an effective piece of software, or a cogent data visualization count just as much in this formulation as some sparkling prose. Most of us grew up getting this writing (and with it, a piece of our connection to the game) from newspapers and baseball books, but these days, for better or for worse, it’s largely online.

In that spirit, and at the risk of getting meta, here are some things to expect from the baseball Internet in 2016.

More Data, More Stats

One baseball event of 2015 that will have a big long-term impact on how the game is discussed and analyzed (and played) is the rollout of Statcast, MLB’s fancy radar/camera setup that systematically quantifies many things that were previously not captured at all, or captured only by hand, privately and imprecisely.

Last year, the primary public manifestation of Statcast data was exit velocity, which provides the opportunity for a more process-oriented, less results-focused analysis of batted balls. However, in spite of the excitement generated by these data and the subsequent rush by many analysts to draw conclusions from them, two large issues remained. The first is that exit velocity was not accurately captured for a large number of batted balls, with data availability that was skewed toward more successful balls in play.

The second is that there exists, as yet, no compelling (public) model for what the relationship between exit velocity and performance (either past or future) looks like. Without having a well-specified theoretical or empirical framework for using exit velocity, last year’s analysis of exit velocity was necessarily limited to exploration, rather than inference.

However, there’s reason to believe that both of these issues will be partially ameliorated this coming year. The second one resolves itself naturally: as a data set grows over a longer period of time, our understanding of its predictive power and all-around utility increases. With respect to data quality, I was told by Daren Willman, proprietor of Baseball Savant and recently hired director of baseball research and development for MLB.com, that he expects the exit velocity data for 2016 to be more accurate and to be available for nearly every batted ball. In addition, the likely inclusion of launch angle (already being shared on a limited basis) could greatly increase the utility of the velocity data.

The portion of Statcast data that was of most interest to many analysts was on the fielding side, including granular positioning and reaction data that would allow for a more direct measurement of defensive aptitude. To that end, Willman says to expect visualizations of player ranges (like this one) at some time during the year.

A point of concern after Statcast’s announcement was that the data wouldn’t be shared with the public in a systematic manner; for those of us who believe in transparency and open science (and who do public analytics), the level of sharing during 2015 was a bit of a letdown. It’s not clear how this is likely to change during this coming year, though it should be noted that Statcast data was sometimes available by special request, as we saw with Neil Weinberg’s essay in The Hardball Times Baseball Annual 2016. On a more open front, Willman did mention that there are plans in the works for a public application programming interface (API), which is a good sign, but it’s hard to be too excited without knowing which data will be broadly available to the public.

Some other big advances in baseball statistics in 2015 came out of Baseball Prospectus, which introduced two big new pitcher metrics (Deserved Run Average and Contextual FIP) and a number of sophisticated metrics for catcher defense. Two things to look for in 2016 are how those metrics influence discussions now that they’re no longer so new, and how people think of them once they can view them over the course of a full season.

More exciting, though, is the prospect of bigger changes to other BP stats. Per Harry Pavlidis, BP’s director of technology, BP expects to roll out substantial changes this year to PECOTA, its oft-cited projection system, and to Fielding Runs Above Average, its fielding stat. Without knowing more detail, it’s hard to say how meaningful the changes will be, but the prospect of substantial changes at one of the preeminent sabermetric sites is something worth anticipating.

Of course, part of rolling out new stats is disseminating them in a helpful manner, and Pavlidis also mentioned that there will be changes to the currently quite old-school presentation of the numbers on BP’s site, with some likely in place by Opening Day and others by midseason. He also said that BP plans to incorporate confidence intervals in the presentation, which is (as far as I know) a first for a major stats site, and something I think is essential, especially as the statistical theory behind commonplace metrics grows more complex over time.

One final set of things to look out for is how previous years’ analyses will seep into the game. There’s been some suggestion that catcher framing is becoming less impactful, which is something to monitor over the course of the year. Even more important is the possibility that MLB will seek to shrink the strike zone (especially at the bottom edge), which would be an indirect response to the analysis by Brian Mills and Jon Roegele (among others) illuminating the strike zone’s growth and the corresponding decrease in scoring over the past few years.

Changes in Baseball Media

Every year there are big changes in online media, and last year those changes included the worrying shuttering of two sites featuring a substantial amount of baseball writing. The first was Grantland, shut down in late October, with its baseball writers (including Jonah Keri, Ben Lindbergh and Michael Baumann) scattering to various places around the Web.

The other was Just A Bit Outside, the Rob Neyer-led baseball vertical at Fox Sports that ran from July 2014 until this past December. While its composition differed from Grantland’s (much of its content came from ex-players like Gabe Kapler or cross-posting with BP and FanGraphs), it too represents a regular home for baseball writing that doesn’t exist anymore.

As a consumer, I’m not particularly worried that I won’t be able to read the writers I liked at those sites, or that there isn’t enough baseball writing out there. Instead, my concern is about a broader decline in quality, both from established names and from up-and-coming analysts. For new writers, it’s strictly a matter of opportunity; if there are fewer outlets for their work, then it’s harder for new voices to get the break they need, and there is less good writing to read. For established names, the worry has to do with resources and emphasis. Jonah Keri won a SABR award for his thoroughly researched Grantland feature on base stealing, and it’s hard to imagine that analysis being published on most other sites, even the large outlets he’s writing for this season. (These concerns are, of course, hardly original, and are touched on in this Jason McIntyre piece about CBS Sports that mentions Jon Heyman’s departure.)

On the topic of new voices, there’s at least one new site of some interest: the prospect-oriented 2080 Baseball, founded and staffed by a number of writers with prospect experience (largely at Baseball Prospectus). It will be interesting to see what sort of stories come out of there in the future and how much they contribute to the greater baseball discourse. Hopefully, there will also be quality baseball content at Bill Simmons’ new site, The Ringer, when it launches later this year.

A Change in Tone

Amid the doldrums of the offseason, two related and long-simmering topics concerning the baseball media were discussed a bit more in the open. Expressed generally, they are what’s going to be written, and who’s going to be writing.

The first topic was most notably discussed by Rian Watt at BP, who posited that baseball analysis is moving away from sabermetric and statistical analysis and in a direction that he calls “intersectional,” meaning one with a greater emphasis on the players as people and the game as a microcosm of our larger society. (It’s also worth reading Craig Calcaterra’s response to Watt’s article, in which he discusses how Watt’s prediction is as much a return to an older style of reporting as it is a paradigm shift.)

One small piece of evidence for this transition can be seen by looking at this year’s SABR Analytics Conference Research Awards. In past years, none of the contemporary commentary nominees were quite in the vein of what Watt’s talking about (though a number were focused on the economics of the game, and some of the historical commentary nominees fit into his characterization). This year, two very clearly are: Meg Rowley’s discussion of exclusion in major league front office hiring and Alexis Brudnicki’s essay on the challenges of working in baseball as a woman. Two nominations isn’t proof, but it is a suggestion. While Watt’s piece met with a fair bit of criticism from those who think he’s wrong, those who’d rather he be wrong, and those who think it’s all a bit overblown, it’s an interesting thought to have in mind as the season starts.

You may have noticed that the two pieces I just cited were both written by women. They were the only two of the 15 nominated for SABR awards this year that fit that description. That represented an increase in parity, as all 15 articles nominated the prior year were written by men. This gets to the point of who is doing the writing, which is just as important as the topics covered. With increased efforts at a few prominent baseball sites to hire from a wider pool of candidates, there’s room for a little bit of hope that the people who write about baseball will more closely resemble the people watching it going forward.

The focus of this piece is 2016 in baseball writing, but the most visible change with regard to diversity in commentary will be in another medium. Jessica Mendoza will be in the Sunday Night Baseball booth as ESPN’s first full-time female analyst. She acquitted herself well in her appearances last year, and so it’s easy to be optimistic about her laying the groundwork for a time when such an appointment is no longer such a big deal.

Furthermore, I’d be wrong not to mention that it’s likely the last season for two of the most prominent TV announcers in the sport, men who are a huge part of their teams’ national identities. The universally beloved Vin Scully is one (and we covered him here at THT on Friday); the less universally beloved Ken “Hawk” Harrelson is the other. Whether it’s counting the days until you don’t have to use the home feed for White Sox games on MLB.TV or making a point of staying up late on the East Coast to catch a few innings of Vin, their retirements represent milestones even for those who aren’t Dodgers or White Sox fans.

This preview’s been, for lack of a better term, a bit of inside baseball. Given the way that many of us engage with the game and the people around it, though, it’s important for a season preview to consider not just what will happen on the field, but how the games are processed, analyzed, quantified, and otherwise discussed. For some of us, this secondary portion of baseball is just as important as the games.

Frank Firke crunches numbers for a tech company. He writes about baseball at The Hardball Times and irregularly about other sports at his blog, Clown Hypothesis. Follow him on Twitter @ClownHypothesis.
87 Cards
8 years ago

I, like most bloggers, too have enjoyed Vin Scully for all my life in his various broadcast packages (starting with KTTV LA on the cable feed in the 1970s).

Conscience moves me to flag-the-Pale Hose flag for Hawk Harrelson as well. He’s a homer, yes, but consistent, open and enthusiastic about his “Good Guys Wear Black” biases, a sharp observer of the game on the field, and he has given proper respect to opponents when earned. I will miss the Hawk; it was Tim McCarver I was joyed to hear reduced to the occasional Cardinals game on the regional feed. Signed, Cards Fan of 40 years

Alan Nathan
8 years ago

You have said, “The second is that there exists, as yet, no compelling (public) model for what the relationship between exit velocity and performance (either past or future) looks like.” I disagree. For example, take a look at the recent article in this very journal by Glenn Healey, http://www.hardballtimes.com/the-intrinsic-value-of-a-batted-ball/. Glenn’s analysis was based on HITf/x rather than Statcast data, which may affect some details but most certainly not the qualitative relationship between exit velocity and performance (i.e., outcome). I have tweeted on several occasions heat maps showing how on-base probability depends on exit velocity and launch angle (e.g., the “donut hole”) and wrote about it in my own recent article: http://www.hardballtimes.com/optimizing-the-swing-part-deux-paying-homage-to-teddy-ballgame/. Then there is the series of articles written a few years ago by Mike Fast, again based on HITf/x data.

Bottom line: Although there is still a lot of work to be done, I think we already know a lot about the relationship between exit velocity (and launch angle) and performance.

Mike P
8 years ago
Reply to  Alan Nathan

Agreed — I also compared exit velo to wOBA and batting average, and while a simple x/y scatter plot is hardly rigorous scientific investigation, it did show some relationship between velo and success. (Not for batting average, though, as expected.)



Alan Nathan
8 years ago
Reply to  Mike P

Thanks, Mike. I was remiss in not referencing your articles in my comment.

Frank Firke
8 years ago
Reply to  Alan Nathan

I wrote this comment and then read Eli’s below; he basically wrote what I did more tersely, but I’ll leave this here anyhow:

My choice of words certainly could’ve been better, but I think we’re thinking about modeling in a different sense. You’re right that there’s been some good research that links batted ball data to performance, but it requires parameters beyond exit velocity that aren’t currently available to the public. Even with that additional data, more work needs to be done to decompose a complex set of relationships into inferences that will be straightforward to use on their own (and then for those inferences to be distributed widely and people to understand the limitations as well).

As it stands, if someone tells me at the end of May that some batter’s average exit velocity is down 10% compared to last year (or some other simple parameter), I don’t have any idea what to do with that information. Is he hurt? Should I be pessimistic about performance going forward? Is that just luck that’s going to regress? Whereas if someone tells me a pitcher’s fastball velo is up two ticks, it’s easier to see how that relates to performance both past and future.

It may well turn out that average exit velocity (or % of balls that are likely hits, or Glenn Healey’s I stat, or something else) will be a very useful tool for contextualizing performance the way that BABIP and FIP are, but I think the jury’s still out on any one tool until more research has been done on a better pool of data.

Peter Jensen
8 years ago
Reply to  Frank Firke

“but I think the jury’s still out on any one tool until more research has been done on a better pool of data.”

Well Frank, you got that half right anyway. There was plenty of research done at the time of the release of the May 2009 Hit Fx data to serve as models of what can be done with the data, and that data has been available to teams on almost every at bat since 2008. I am certain that most teams have developed sophisticated projection models that incorporate Hit Fx data. Similarly, there was much work done by the presenters at the 2010 and 2011 Pitch Fx Summits to serve as models of what could be done with Field Fx data that includes tracking of fielder, runner, and batter movements on hit balls. There are plenty of analysts working in the public sphere who are capable of using those models or similar models of their own creation to make improved projections if the full Statcast data were available to them, and if they were not constrained by nondisclosure agreements from sharing their research, and if the Statcast data had been vetted and proven accurate and reliable. This last if is a big if. We know that there are still errors in Hit Fx data that Sportvision has been aware of since 2009 and that have remained uncorrected. We know that the first releases of Field Fx had many errors as well. We know that the Trackman ball-tracking radars used for Statcast lose track of some fly balls and have to extrapolate the lost positions with software. What we don’t know is how these problems affect the overall accuracy of the data that gets used in projection models.

A significant first step that I would like to see in 2016 is a public comparison of the Hit Fx data with Statcast data for the same balls in play. There could be three results. The best would be if the two systems had only insignificant differences on almost all balls in play. That would give us confidence in both systems and allow us to merge the large database of existing Hit Fx data with the new data provided by Statcast. The second result would be if they differed in a consistent way so that one data set could be easily translated to correspond with the other. The third result would be if the two data sets differed significantly on a large number of hit balls in an inconsistent pattern. This would imply that one or both systems had major problems that would need fixing before their data could be used for serious analysis.
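The three possible results described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the 1 mph tolerance, the choice of a straight-line fit as the meaning of "differed in a consistent way," and every number in the sample lists are made up for the example and do not come from either system.

```python
from statistics import mean, pstdev


def compare_systems(hitfx, statcast, tol_mph=1.0):
    """Classify paired exit speeds (mph) for the same balls in play.

    Returns one of "agree", "translatable", or "inconsistent".
    tol_mph is an assumed tolerance, not a published figure.
    """
    diffs = [s - h for h, s in zip(hitfx, statcast)]

    # Result 1: every paired reading differs by no more than the tolerance.
    if max(abs(d) for d in diffs) <= tol_mph:
        return "agree"

    # Least-squares line: statcast ~ intercept + slope * hitfx
    mx, my = mean(hitfx), mean(statcast)
    sxx = sum((x - mx) ** 2 for x in hitfx)
    sxy = sum((x - mx) * (y - my) for x, y in zip(hitfx, statcast))
    slope = sxy / sxx
    intercept = my - slope * mx
    residuals = [y - (intercept + slope * x) for x, y in zip(hitfx, statcast)]

    # Result 2: the gap is large but a simple line explains it.
    if pstdev(residuals) <= tol_mph:
        return "translatable"

    # Result 3: the readings disagree with no simple pattern.
    return "inconsistent"


# Hypothetical paired readings, for illustration only.
hitfx_mph = [80.0, 90.0, 100.0, 110.0]
print(compare_systems(hitfx_mph, [80.2, 89.9, 100.1, 110.0]))
print(compare_systems(hitfx_mph, [84.5, 94.5, 104.5, 114.5]))
print(compare_systems(hitfx_mph, [95.0, 85.0, 118.0, 101.0]))
```

A real comparison would need the matched balls in play from both systems, and would probably look at launch angle and direction as well as speed.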

Eli Ben-Porat
8 years ago
Reply to  Alan Nathan

I don’t think he was implying that there is nothing interesting in the velocity data, rather that there isn’t a simple linear formula that describes the value of hitting the ball hard, especially when contrasted with fastball velocity, where we see a near-linear relationship between velocity and whiffs. Even in Mike P’s article above he points out that Harper was in the same class as Wilson Ramos, even while producing a generationally good season. I think this will evolve into a version of hard/medium/soft where we have something like HR Optimal/Medium/Donut Hole, which will encapsulate the three variables (velocity, launch angle and spray angle) into three easy-to-digest buckets. Those buckets can then create simple linear formulas that are easy to work with and easy to understand.

In other words, if Player A hit 20% of his batted balls into the “HR Optimal” bucket (which have a 35% HR prob), but only connected for 24% HRs in that bucket, he’d be on Mike Podhorzer’s “aka the upsiders” list.
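To make the arithmetic in that example concrete, here is a minimal Python sketch. The 20%, 35% and 24% figures are the ones in the example; the 400 batted balls and the function name are assumptions added purely for illustration.

```python
def bucket_hr_shortfall(batted_balls, bucket_share, bucket_hr_prob, observed_hr_rate):
    """Compare expected and actual home runs within one batted-ball bucket.

    Returns (expected_hr, actual_hr, shortfall), where shortfall is
    expected minus actual.
    """
    in_bucket = batted_balls * bucket_share       # balls landing in the bucket
    expected_hr = in_bucket * bucket_hr_prob      # what the bucket usually yields
    actual_hr = in_bucket * observed_hr_rate      # what the player actually got
    return expected_hr, actual_hr, expected_hr - actual_hr


# Assumed season of 400 batted balls (hypothetical; not from the example).
expected, actual, gap = bucket_hr_shortfall(
    batted_balls=400,
    bucket_share=0.20,       # 20% of batted balls in "HR Optimal"
    bucket_hr_prob=0.35,     # bucket-wide HR probability
    observed_hr_rate=0.24,   # Player A's HR rate inside the bucket
)
print(f"expected {expected:.1f} HR, actual {actual:.1f} HR, shortfall {gap:.1f}")
```

A positive shortfall is what would put a hitter on an "upside" list: he is reaching the valuable bucket but not yet getting the usual return from it.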

Alan Nathan
8 years ago

Some remarks on Peter’s comment: Peter knows, and perhaps many other readers also know, that there are well-known issues with HITf/x that have never been fixed, at least to my knowledge. The main one in the present context is the systematic *underestimation* of the exit speed. The reason is that the cameras pick up the batted ball some distance from the impact point, where the ball has already lost speed due to the combined effects of gravity and air drag. This problem was discussed publicly at the 2009 PITCHf/x summit, so there is no secret about it. It is not possible to fix the problem exactly, but there are steps one can take to mitigate the effect. I actually worked on this problem in the weeks after the summit back then (along with Rand Pendleton of Sportvision), but as far as I can tell, nothing was ever changed. I like to think that the reason the changes were not implemented was the departure of Marv White not long after the summit (but that is wild speculation on my part). By that I mean that if Marv had remained, I feel certain the problem would have been fixed.

Bottom line: Anyone who has access to both HITf/x and Statcast/Trackman data for exit speeds will immediately see that the former is less than the latter. In an article that should appear at The Hardball Times in the next week or so, I estimate that the mean discrepancy is 4-5 mph.

Peter Jensen
8 years ago
Reply to  Alan Nathan

Alan – The other problem, which is even more difficult to fix because there is no easy way to know when it occurs, is when a ball hits the ground before Hit Fx determines its Hit Fx characteristics. The last I knew, Hit Fx would assign the speed of the ball after hitting the ground as its speed off the bat and the mirror image of its rebound angle as its initial vertical angle. So a ball that rebounded from its contact with the ground at a speed of 60 MPH and an angle of 15 degrees was recorded as having a speed off the bat of 60 MPH and an initial vertical angle of -15 degrees. This is not only a very bad guess, but it is not identified as being a guess.

Frank Firke
8 years ago
Reply to  Peter Jensen

This reply is largely to your comment above, Peter:

I think we agree on all of this. There has been lots of (presumably fruitful) research on this stuff; I’m sure team analysts chuckled if they bothered to read the piece. The key word in the title of the article and the sentence Alan highlighted is “public.” It’s not really relevant to me if teams have this figured out; I care about fans and public researchers. As a random fan, I can’t use batted ball measurements to analyze players, and that’s the focus of the article (not an overall state of the industry commentary on batted ball data).

This is not meant to impugn the work of anybody that has released public stuff; it’s not their fault the data aren’t there. As I mentioned in the article, if I had my druthers any data collected by the league (as opposed to individual teams) would be public, and I think it’s disappointing (and short-sighted) that they have kept this stuff private. If they bother releasing these new data in a good format (and/or any archived data), then there’s reason to believe the public models will improve pretty rapidly.

JorgeFabregas
8 years ago

Is there something besides Harrelson’s age and reduced schedule that suggests this year will be his last? He signed a two-year contract extension this fall. http://chicago.cbslocal.com/2015/10/05/levine-hawk-harrelson-to-cut-back-to-81-game-broadcast-load/

He has said before that he wants to die in the booth. It seems like White Sox losing seasons take a lot out of him.

Frank Firke
8 years ago
Reply to  JorgeFabregas


Ken Harrelson has agreed to a multiyear extension to continue broadcasting White Sox games, but that does not mean the 2016 season won’t be his last.

The 74-year-old play-by-play voice said he loves his job more than ever. That said, if the Sox have another disappointing season like 2015, forget it.

“It’s a contract, that, as I told [Sox vice president for sales and marketing Brooks Boyer] and [chairman] Jerry [Reinsdorf], it might be at the end of the season where I say ‘Hey, I’ve had enough,’’ Harrelson told the Sun-Times Wednesday. “I hope that’s not the case because that means our team didn’t do well again. If I have to go through another season like we did last year, that would probably be enough — no, you can count me out.”