A New Classic in Sabermetric Literature

Renowned sabermetric author Jim Albert is back with a new book.

Jim Albert has a new book out. Released on Sept. 17 from CRC Press, it’s called Visualizing Baseball, and it’s a treat.

Albert is well known in the sabermetric world. He’s a math and statistics professor at Bowling Green, and his previous publications include Teaching Statistics Using Baseball, Curve Ball (co-written with Jay Bennett), and Analyzing Baseball with R (co-written with Hardball Times alum Max Marchi).

Curve Ball is a particular favorite of mine. Its subtitle is Baseball Statistics and the Role of Chance in the Game, and I often recommend it to people looking for a beginning sabermetric text. Not only does it lucidly describe probability and chance in the national pastime, but its chapters also comprise a basic history of sabermetrics and its key underlying concepts. Curve Ball was published in 2001, so it doesn’t include the latest advances in data and analysis.

Visualizing Baseball brings Curve Ball up to date. It is a pretty short book (just 135 pages), and many of those pages are filled with graphs rather than text. Jim is adept at combining words and visuals. His prose is pointed and precise, and he doesn’t waste time “sabersplaining.” Instead, he articulates a concept and then shows it on a graph. Both parts of your brain are engaged, your understanding deepens, and the lessons carry longer.

The graphs start out simple and grow in complexity as the content becomes more complex. Starting from simple scatterplots with fitted lines, Albert moves on to graphs with elaborate labels and dots sized by an underlying value. He inserts box plots, classic PITCHf/x graphs, density graphs, isobars and violin plots, to name a few techniques. It sounds overwhelming, but each graph builds on a previous one, and it all makes sense as you move along.

Most importantly, Albert doesn’t clog his graphs with lots of graph junk and needless color. The only color is blue—everything else is in black/gray scale. So many graphs these days seem to be built to impress other graphic artists instead of educating the reader. This book is a welcome antidote to that trend.

There are nine chapters in Visualizing Baseball. I’m going to briefly review each chapter and include a few graphs from his blog. By the way, I recommend his blog as part of your regular sabermetric reading list.

Chapter 1: Team Statistics is an overview of basic trends in baseball, such as the number of triples per year. This is where Albert emphasizes scatter points with fitted lines, illustrates simple trends over time, and shows how the game has evolved.

Chapter 2: Career Trajectories uses more scatter points and fitted lines to show how career trajectories can differ. There’s a nice use of side-by-side graphics of, for instance, Robin Yount and Roberto Alomar (the sign of a pro: Albert knows to keep the scales even on side-by-side graphs).

Chapter 3: Run Expectancy is where Albert starts to get creative. This chapter was one of my favorites. Jim uses a simple scatter graph to show how values change between base/out states, how the slope of each line differs by the number of outs, and how the runner-on-third, one-out state falls outside the overall pattern. He adds some arrows to the graph to illustrate how something like RE24 works and then finishes the short chapter with a graph of the average relative value of each type of plate outcome, along with how often each one occurs. This is one of the best introductory chapters to run expectancy I’ve read.

Here is the initial run expectancy graph.
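For readers who want the arithmetic behind RE24 as well as the picture, here is a minimal Python sketch. It relies only on the standard definition: the run value of a play is the run expectancy after the play, minus the run expectancy before it, plus any runs that scored. The table values, the `RUN_EXPECTANCY` name and the `re24` helper are illustrative assumptions, not numbers or code from the book.

```python
# Placeholder run expectancy values for three base/out states.
# These are made-up round numbers for illustration, NOT from the book
# or from any particular season.
# Key: (runners, outs); "---" means bases empty, "1--" a runner on first.
RUN_EXPECTANCY = {
    ("---", 0): 0.50,
    ("1--", 0): 0.85,
    ("---", 1): 0.25,
}


def re24(start, end, runs_scored, table=RUN_EXPECTANCY):
    """Run value of one play: change in run expectancy plus runs scored.

    Pass end=None when the play ends the inning (expectancy of zero).
    """
    end_value = 0.0 if end is None else table[end]
    return end_value - table[start] + runs_scored


# Leadoff single: bases empty, no outs -> runner on first, no outs.
single = re24(("---", 0), ("1--", 0), runs_scored=0)

# Leadoff out: bases empty, no outs -> bases empty, one out.
out = re24(("---", 0), ("---", 1), runs_scored=0)

# Leadoff solo home run: the base/out state is unchanged, one run scores.
homer = re24(("---", 0), ("---", 0), runs_scored=1)

print(f"single {single:+.2f}, out {out:+.2f}, homer {homer:+.2f}")
```

Summing these play-by-play values over a hitter's season gives his RE24 total, which is the kind of movement the arrows on Albert's graph trace between states.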

Chapter 4: The Count takes run expectancy further, down to the count. Once again, there are several terrific visual insights here, such as this graph of how run values change by count. (I’ve copied this graph from Jim’s blog.)


Albert takes this chart a step further and compares two players, Mike Trout and Bryce Harper, on side-by-side graphs for the 2015 season.

There are several fun facts here: Harper never goes negative. What’s more, his 0-2 run value is just as good as his 1-1 value. Both create much more value on 3-1, relative to 3-0, than the average batter. (Could this be the result of intentional walks?)

Chapter 5: PITCHf/x Data goes graph-crazy. Of course, Jim uses the classic horizontal/vertical break graph and shows how the breaks differ by pitch type. Plus, he adds a detailed graphical analysis of Clayton Kershaw’s stuff, with an extra graph or two to show variance of outcomes by pitch type and other cool things.

Chapter 6: Batted Balls delves into more data, starting with Greg Rybarczyk’s ESPN Home Run Tracker and then, as expected, moving into Statcast data. The section on home runs is a great overview of how home runs are hit, the importance of vertical and horizontal angles and how ballparks affect home runs.

In the second part of this chapter, Albert does the usual angle/exit velocity thing, but his graphs are slightly different from the ones you’ll find at Statcast.

Chapter 7: Plate Discipline explores the seven different swinging stats at FanGraphs (O-Swing, Z-Swing, and the like) and how they play out between swing and contact rates. Again, there is heavy use of scatterplots, with some strike zone graphs thrown into the mix.

Chapter 8: Probability and Modeling brings out the classic win probability game graph along with leverage index. It touches on the notion of WPA in the same simple way Chapter 3 illustrated how RE24 works. Then things get really fun as Albert explores postseason probabilities and batting average swings.

Jim Albert has written a lot about probabilities in baseball, and his expertise shows in this chapter. He touches on the concepts of true talent vs. actual performance and regression to the mean. These are subjects that lend themselves well to a graph, and your average reader will emerge from this chapter with a clearer understanding of the subject.
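As a rough illustration of regression to the mean, here is a short Python sketch of one common way to estimate true talent: blend the observed rate with the league rate by adding a fixed amount of league-average "ballast." This is not necessarily how the book does it. The `regress_to_mean` helper, the ballast size and the sample hitter are all hypothetical.

```python
def regress_to_mean(successes, trials, league_rate, ballast):
    """Shrink an observed rate toward the league rate.

    ballast is the number of league-average trials added to the sample.
    Its size is an assumption chosen for illustration; in practice it
    depends on the statistic being regressed.
    """
    return (successes + ballast * league_rate) / (trials + ballast)


# Hypothetical hitter: 40 hits in 100 at-bats (.400 observed),
# a .250 league average, and 300 at-bats of ballast.
observed = 40 / 100
estimate = regress_to_mean(40, 100, league_rate=0.250, ballast=300)

print(f"observed {observed:.3f}, estimated true talent {estimate:.4f}")
```

The smaller the sample, the harder the estimate is pulled toward the league average, and that gap between actual performance and estimated talent is what a graph can make visible at a glance.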

Chapter 9: Streakiness and Clutch Play is the final chapter, and Albert pulls out all the graphical stops. The reader gets rug plots, moving average plots, dot plots and geometric plots, as well as a return to the old standard scatter plot to illustrate how the distribution of outcomes can differ significantly for similar players. This is one of the best uses of graphs in the book.

His penultimate graph is a beauty, pulling together many of the graphical techniques he has built to illustrate Stephen Piscotty’s clutchiness in 2016.

Visualizing Baseball is a baseball stathead’s beginner’s book that isn’t just for beginners. As such, it will occupy a space on my bookshelf next to Curve Ball and Ken Ross’ A Mathematician at the Ballpark. It will fit right in.

References & Resources

  • Jim has published a blog accompanying Visualizing Baseball. You can find some graphs there, as well as the R code behind each chapter.
  • If you’re interested in making great graphs, you can start with the classic that Jim references, The Elements of Graphing Data by William S. Cleveland.
  • Then go on to read Edward Tufte, the guru of data visualization.
  • Don’t forget to regularly visit Jim’s baseball blog, Baseball With R.


Dave Studeman was called a "national treasure" by Rob Neyer. Seriously. Follow his sporadic tweets @dastudes.
7 Comments
Michael Bacon
7 years ago

Curve Ball was an interesting read. A hardback copy of Visualizing Baseball is $150, which is prohibitive. The paperback is “only” $30, which will be way out of reach for most fans, especially with so many graphs, no matter how good the book.

Kristopher
7 years ago
Reply to  Michael Bacon

Ouch. That’s a hefty price. Hopefully it’s not one of those “I wrote a book that also happens to be required reading in this course, and it costs $150.” With that said, that first graph bothers me quite a bit -or- it bothers me as much as a graph can bother a person. We use graphs to make tabular data more digestible, and I don’t think this one does that. Beyond repeatedly reading it as though two men were on second base, I’m stuck wondering why the axis goes from nobody on to one runner on third base. I guess that the two points are related in that the labelling system that I don’t like is then sorted in ascending order. Maybe I’m picky, but in a book on graphs, I’d kinda expect better.

Jetsy Extrano
7 years ago
Reply to  Kristopher

Yes, that x axis is bizarre! Why not increasing run order?

I think it’s an accidental fallout after choosing “123” order instead of “321”.

WARrior
7 years ago

The Harper-Trout comparison clearly indicates Harper is the better hitter, in almost all counts, and overall (0-0). That didn’t make sense to me, until I saw that it referred just to 2015, Harper’s best year. For the record, though, Trout’s has been higher in other years, and he has a major edge in career value, about .080 vs. .049.

Michael Bacon
7 years ago

“As for cost, that’s obviously up to each person’s preference, but there is no reason to buy the hardcover book. The paperback is fine.”

That is just a little less than a quarter a page. Four pages costs about one dollar. Ten pages costs a Starbucks coffee. Twenty pages a Crappy Frappy.

Easton8
7 years ago

Curve Ball is a particular favorite of mine!