Seeing is believing

Some baseball writers have suggested that some of the more, shall we say, devoted members of the sabermetric community should leave their mother’s basement and go watch a game. I disagree; instead of merely watching a game, they should observe one. I believe that the future of sabermetrics lies in Observational Analysis.

Observational Analysis involves numbers, but it is not about statistics in the traditional sense. It is about creating a complete and permanent record of what happens on the field, so that in-game events can be reviewed and analyzed at any time in the future.

Most of the time, analysts have only traditional stats to work with, spiced with imperfect anecdotal recollections of the game and its plays. That makes it very hard to figure out why the results that happened, happened, and understanding why is, of course, the key to figuring out how likely they are to happen again.

Observational Analysis is about scrutinizing, and in some cases measuring or timing, the important elements of a game, in order to decipher the inner workings of baseball in unprecedented depth and detail.

Current examples of Observational Analysis

One of the greatest innovations for baseball analysis in recent years is Sportvision’s PitchFx, which has opened up many new avenues of inquiry among analysts. PitchFx trains multiple cameras on the area between the mound and home plate in order to capture the precise trajectory of each pitch. When the data are extracted and examined, we have a perfect example of Observational Analysis: analysts are manipulating not discrete box-score counting stats such as plate appearances and hits, but more fundamental parameters like pitch velocity, location and movement.

Observational Analysis of PitchFx data is being published on a nearly daily basis, here at THT and elsewhere, by numerous talented writers, including Joe P. Sheehan, Mike Fast, Josh Kalk, John Walsh and many others.

Furthermore, Sportvision has indicated that we can expect a system (which I will refer to as “HitFx”) that provides the initial trajectory of the ball off the bat. Batted-ball trajectory data are among the most promising things that Observational Analysis will eventually deliver.

I gave one example of this promise in my 2008 Hardball Times Annual article, “Of Home Runs and Free Agents,” where I detailed the relationship between Speed Off Bat (SOB, i.e., how hard a player hits the ball) and the outcome of the hit, as measured by slugging average (see the plot below, which shows 2007 data for Andruw Jones and Torii Hunter). Knowing which players hit the ball the hardest (both overall and broken down by various factors), or which pitchers (if any) suppress SOB the most, would obviously be very valuable information.
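Once per-hit SOB values exist, tabulating slugging average by SOB is simple bookkeeping. The Python sketch below is illustrative only: the record layout, the 5-mph bins and the sample values are hypothetical (not data from the Annual article), and it treats every batted ball as an at-bat, so "slugging" here means total bases per batted ball.

```python
from collections import defaultdict

BIN_WIDTH_MPH = 5.0  # hypothetical bin width

# Hypothetical batted-ball records: (speed off bat in mph, total bases on the play).
# An out counts 0 total bases; single 1, double 2, triple 3, home run 4.
batted_balls = [
    (72.0, 0), (74.5, 1), (78.0, 0), (83.0, 1),
    (88.5, 2), (91.0, 0), (96.5, 4), (99.0, 4),
]

def slugging_by_sob(records, bin_width=BIN_WIDTH_MPH):
    """Return {bin_lower_edge_mph: slugging average} for each SOB bin.

    Simplification: every batted ball is treated as one at-bat.
    """
    bases = defaultdict(int)
    counts = defaultdict(int)
    for sob, tb in records:
        start = bin_width * (sob // bin_width)  # lower edge of this ball's bin
        bases[start] += tb
        counts[start] += 1
    return {start: bases[start] / counts[start] for start in sorted(counts)}

for start, slg in slugging_by_sob(batted_balls).items():
    print(f"{start:.0f}-{start + BIN_WIDTH_MPH:.0f} mph: SLG {slg:.3f}")
```

With real data, the same table split by batter (or by pitcher) is what would let one compare hitters like Jones and Hunter, or find pitchers who suppress SOB.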


There are several ways to obtain Speed Off Bat data, including a radar gun, an aerodynamic model such as my Hit Tracker, or a camera-based system such as HitFx. Each method has its advantages and drawbacks:

  • Radar guns are already available in every park, but they require a live operator, and they introduce error based on exactly when the gun is triggered.
  • Hit Tracker is video-based and thus can be used on any game at any level, and even games from the past. But instead of direct measurement of SOB, it employs an aerodynamic model, and it requires a lot of time.
  • HitFx will be automatic, and it will measure SOB directly. But it needs to overcome technical hurdles before it can even be said to exist.
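The radar-gun drawback in the first bullet can be illustrated with a back-of-the-envelope drag model. This sketch assumes the ball slows only under quadratic air drag, dv/dt = -k v², ignoring gravity, spin and the gun's viewing angle; the drag coefficient, air density and 100 mph starting speed are assumed, illustrative values rather than measurements. Under that assumption the speed t seconds after contact is v0 / (1 + k·v0·t), so a gun that locks on late reads low.

```python
import math

# Illustrative constants (assumptions, not measured values).
BALL_MASS_KG = 0.145       # regulation baseball, about 5.1 oz
BALL_DIAMETER_M = 0.0737   # about 2.9 in
DRAG_COEFF = 0.35          # assumed constant; real values vary with speed and spin
AIR_DENSITY = 1.2          # kg/m^3, roughly sea level at 20 C
MPH_PER_MPS = 2.23694

def drag_constant(rho=AIR_DENSITY, cd=DRAG_COEFF):
    """k in dv/dt = -k v^2, in 1/m."""
    area = math.pi * (BALL_DIAMETER_M / 2) ** 2
    return 0.5 * rho * cd * area / BALL_MASS_KG

def speed_after(v0_mph, t_s, k=None):
    """Speed (mph) t_s seconds after contact under pure quadratic drag."""
    k = drag_constant() if k is None else k
    v0 = v0_mph / MPH_PER_MPS
    return v0 / (1 + k * v0 * t_s) * MPH_PER_MPS

# How much a late trigger understates a 100 mph batted ball in this model.
for delay in (0.0, 0.05, 0.10, 0.20):
    print(f"gun reads at +{delay:.2f} s: {speed_after(100.0, delay):.1f} mph")
```

In this simplified model a reading taken a tenth of a second late comes out a few mph low, which is the kind of error the first bullet describes.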

Regardless of the method used to obtain this valuable information, there is no substitute for observing the ball as it is struck and propelled off the bat, which makes this a key part of Observational Analysis.

Limitations of Existing Systems

PitchFx is a great step forward, and HitFx most likely will be as well; however, they will not capture all of the important events that take place during a game. For example, unless HitFx were to cover the full trajectory of each hit, it would do nothing to advance the field of defensive analysis.

Here’s why: The landing point of a fly ball can vary enormously due to the effects of wind, temperature and altitude, so by itself, HitFx will never be able to predict the landing point of a fly ball with any greater precision than we get today with the conceptual “defensive zones.” Similarly, knowing the initial trajectory of a grounder off the bat will not allow us to know when and where the ball intersects with the infielders; there are too many variables in all those inelastic ball-ground collisions (also known as “bounces”).
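Even a crude model shows why launch conditions alone cannot pin down where a fly ball lands. The sketch below is a simplified point-mass trajectory with quadratic drag only: it ignores spin and lift, uses an assumed constant drag coefficient, treats wind as a steady tail- or headwind, and is not the aerodynamic model behind Hit Tracker. Because backspin lift is left out, its absolute distances run short of real fly balls; the point is only that an identical launch carries different distances as air density (temperature, altitude) and wind change.

```python
import math

G = 9.81                    # m/s^2
R_AIR = 287.05              # J/(kg K), specific gas constant for dry air
BALL_MASS_KG = 0.145
BALL_AREA_M2 = math.pi * (0.0737 / 2) ** 2
DRAG_COEFF = 0.35           # assumed constant; no spin or lift modeled
MPH_TO_MPS = 0.44704
M_TO_FT = 3.28084

def air_density(temp_c, altitude_m):
    """Dry-air density using the standard-atmosphere pressure formula."""
    pressure = 101325.0 * (1 - 2.25577e-5 * altitude_m) ** 5.25588
    return pressure / (R_AIR * (temp_c + 273.15))

def carry_ft(speed_mph, launch_deg, temp_c=20.0, altitude_m=0.0,
             tailwind_mph=0.0, contact_height_m=1.0, dt=0.001):
    """Horizontal distance (ft) at which the ball comes back down to ground level."""
    rho = air_density(temp_c, altitude_m)
    k = 0.5 * rho * DRAG_COEFF * BALL_AREA_M2 / BALL_MASS_KG
    wind = tailwind_mph * MPH_TO_MPS
    v = speed_mph * MPH_TO_MPS
    vx = v * math.cos(math.radians(launch_deg))
    vy = v * math.sin(math.radians(launch_deg))
    x, y = 0.0, contact_height_m
    while y >= 0.0:
        rel_vx = vx - wind               # drag depends on velocity relative to the air
        rel_speed = math.hypot(rel_vx, vy)
        vx += -k * rel_speed * rel_vx * dt
        vy += (-G - k * rel_speed * vy) * dt
        x += vx * dt
        y += vy * dt
    return x * M_TO_FT

# Same 100 mph, 30-degree launch under different (hypothetical) conditions.
conditions = {
    "sea level, 20 C, calm": {},
    "sea level, 5 C, calm": {"temp_c": 5.0},
    "1,600 m, 20 C, calm": {"altitude_m": 1600.0},
    "sea level, 20 C, 10 mph tailwind": {"tailwind_mph": 10.0},
}
for label, kw in conditions.items():
    print(f"{label:35s} {carry_ft(100.0, 30.0, **kw):6.1f} ft")
```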

Whether we are considering a fly ball or a grounder, if we don’t know more precisely where it goes, we can’t do anything new in defensive analysis. So we need better data. Fortunately, it is within our power to truly revolutionize defensive analysis, using only our eyes and a stopwatch. What follows is an example of how Observational Analysis can reveal the true reasons why certain outcomes transpire.

Example of Observational Analysis: batted ball comparison

On Aug. 25, 2007, in the 4th inning of a game against the St. Louis Cardinals, Atlanta’s Andruw Jones hit a leadoff single. On Sept. 20, 2007, in the 6th inning of a game against the Milwaukee Brewers, Andruw Jones grounded out, 4-3, to end the inning. At a glance, these two events don’t seem to have much in common besides the batter, but beneath the surface there is an interesting story.

A review of the Retrosheet box scores for the two games indicates that the first event was a “single to CF (ground ball)”, while the second event was a “groundout, 2B-1B.” Unfortunately, Retrosheet does not have zone data for either hit, so we can’t tell the direction of the grounders using this source. All we get are the outcomes: For the single, we can assume it went through the middle between Cardinals David Eckstein and Aaron Miles, while for the groundout, we know only that Milwaukee’s Rickie Weeks fielded it and threw out Jones at first.

The best way to dig deeper on these two hits is to actually watch the plays in the archived game video (subscription required). For the Aug. 25 single, you can find the play at time 1:41:44 of the 700K stream, or 1:43:53 of the 400K stream. For the Sept. 20 groundout, you can find it at 2:49:44 of the 700K stream, or 2:48:35 of the 400K stream. I strongly encourage you to go watch the two hits, for if you do, you will realize that, in terms of direction and speed, the balls were struck virtually identically.
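If you time the two grounders yourself, a stopwatch reading converts to an approximate speed with simple arithmetic. In this sketch the 110-foot distance and the timings are hypothetical, and the result is an average over the ball's path; a bouncing grounder slows down, so it understates the speed off the bat. The loop also shows how sensitive the estimate is to a tenth of a second of hand-timing error.

```python
FT_PER_S_PER_MPH = 5280 / 3600  # 1 mph is about 1.4667 ft/s

def average_speed_mph(distance_ft, elapsed_s):
    """Average speed over the ball's path, in mph.

    A bouncing grounder slows down, so this understates speed off the bat.
    """
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    return distance_ft / elapsed_s / FT_PER_S_PER_MPH

# Hypothetical: the ball reaches a fielder 110 ft from the plate in about 1.0 s,
# with a hand-timing error of roughly +/- 0.1 s.
for t in (0.9, 1.0, 1.1):
    print(f"{t:.1f} s -> {average_speed_mph(110.0, t):.1f} mph")
```

A 0.1 s timing error moves the estimate by several mph either way, which is one reason careful, repeated timing matters.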

Although the two balls skirted the mound and then second base to reach the same spot in the infield at the same time, the outcomes of the two hits were different because of the positioning of the infielders. In the Aug. 25 game, the Cardinals played Andruw Jones to pull, but only mildly so, with shortstop Eckstein towards the hole and second baseman Miles shaded up the middle. Jones’ grounder split the two fielders evenly and reached the outfield grass ahead of Eckstein’s belated dive.

In the Sept. 20 game, the Brewers applied a much stronger shift against Jones, precisely in accordance with the recommendation that I would make two months later in my THT Annual article. In their strong shift, second baseman Weeks was positioned a couple of steps to the shortstop side of second base, where he had only to bend over and drop his glove to scoop Jones’ medium-speed roller and throw him out.

It is interesting to consider how the two hits appear with different types or amounts of information. If you had only the Retrosheet box score, you would never recognize any relationship between the two hits, much less the crucial difference made by defensive positioning. If you also had zone data, you would learn that, on the groundout, Rickie Weeks fielded the ball on the other side of second base. This would be recorded as an “Out Of Zone” play, and, in the absence of any other information, analysts would ever after regard this as a great play by Weeks (though in fact it was as easy as a play can be).

Only via Observational Analysis, tracking each hit and noting the initial positions of the fielders, would you really understand what happened and why. You would recognize that the two hits were identical, and that the play outcomes were different only because of the initial positioning of the infielders. The decision to employ a strong shift made the difference, and it reflects great credit on the Milwaukee organization for acquiring and acting on the information that led them to make that decision.
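The gap between zone-based scoring and position-aware observation can be sketched in a few lines. Everything below is hypothetical: the spray-angle convention, the fixed "zone" assigned to the second baseman, the fielder's depth, and the angles chosen for the grounder and the fielder's starting spot are made-up values meant only to mirror the situation described above, not measurements from the video.

```python
import math
from dataclasses import dataclass

# Spray angle in degrees: 0 = straight over second base,
# positive toward the first-base side, negative toward the third-base side.
SECOND_BASE_ZONE = (2.0, 30.0)  # hypothetical fixed zone for the second baseman

@dataclass
class Observation:
    spray_deg: float          # direction of the grounder
    fielder_start_deg: float  # where the fielder stood before the pitch
    fielder_depth_ft: float   # fielder's distance from home plate

def out_of_zone(obs, zone=SECOND_BASE_ZONE):
    """Zone-based view: was the ball outside the fielder's fixed zone?"""
    low, high = zone
    return not (low <= obs.spray_deg <= high)

def lateral_distance_ft(obs):
    """Position-aware view: sideways distance the fielder had to cover,
    approximated as arc length at the fielder's depth."""
    delta = math.radians(obs.spray_deg - obs.fielder_start_deg)
    return abs(delta) * obs.fielder_depth_ft

# Hypothetical grounder just to the shortstop side of second base,
# fielded by a second baseman already shifted to that side.
play = Observation(spray_deg=-3.0, fielder_start_deg=-2.0, fielder_depth_ft=145.0)
print("Out of zone:", out_of_zone(play))
print(f"Lateral distance: {lateral_distance_ft(play):.1f} ft")
```

The zone-based check flags the play as out of zone, while recording the fielder's starting position shows he barely had to move, which is exactly the distinction drawn above.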

Observational Analysis: Something we can all get behind

Interestingly, Observational Analysis is an approach that should appeal to both kinds of baseball enthusiasts. You know who I’m referring to: the Stats Guys who argue with numbers, and the Non-stats Guys who argue with words. Neither group has been very effective at converting the other to its way of thinking; each side’s methods of persuasion reflect its own outlook rather than the other side’s, so they mostly talk past each other (particularly when it comes to Hall of Fame voting)!

Although some Stats Guys may cling to the idea that all the answers can be found in a box score, most will think that they’ve died and gone to heaven when they dig into a database that has the precise trajectory of every pitch and hit for an entire season, along with the locations of all the fielders, and the weather conditions for every minute of all 2,430 games. Just try getting them out of Mom’s basement once they have their hands on that!

The truly hardcore Luddites might resent being told how often the average shortstop fields balls hit 20 feet to their left at 75-80 mph, but most Non-stats Guys will appreciate the insights gained through Observational Analysis, all the more so because those insights will come from an increased focus on what is actually happening on the field.
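A question like "how often does the average shortstop field balls hit 20 feet to his left at 75-80 mph?" becomes a simple filter once each ground ball is recorded with the fielder's starting position and the ball's speed. The record layout, the 2.5-foot tolerance and the sample rows below are hypothetical, chosen only to show the shape of such a query; the rate is pooled across all shortstops in the sample.

```python
from typing import NamedTuple, Optional

class GroundBall(NamedTuple):
    fielder: str           # position of the fielder responsible for the ball
    left_offset_ft: float  # lateral distance to the fielder's left (negative = right)
    speed_mph: float       # ball speed
    fielded: bool          # whether the fielder fielded it

def fielding_rate(balls, position, offset_ft, tol_ft,
                  min_mph, max_mph) -> Optional[float]:
    """Fraction of matching balls that were fielded, or None if none match."""
    matches = [b for b in balls
               if b.fielder == position
               and abs(b.left_offset_ft - offset_ft) <= tol_ft
               and min_mph <= b.speed_mph <= max_mph]
    if not matches:
        return None
    return sum(b.fielded for b in matches) / len(matches)

sample = [
    GroundBall("SS", 19.0, 76.0, True),
    GroundBall("SS", 21.5, 79.0, False),
    GroundBall("SS", 20.0, 77.5, True),
    GroundBall("SS", 20.5, 84.0, False),  # hit too hard for this query
    GroundBall("SS", -20.0, 78.0, True),  # to his right, excluded
    GroundBall("2B", 20.0, 78.0, True),   # different fielder, excluded
]
rate = fielding_rate(sample, "SS", offset_ft=20.0, tol_ft=2.5,
                     min_mph=75.0, max_mph=80.0)
print(f"SS, ~20 ft to his left, 75-80 mph: {rate:.0%} fielded")
```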

Who will perform Observational Analysis? There are three possibilities:

  • Organizations such as Sportvision, BIS or others. These are the companies that can afford to hire observers, and that can survive while waiting for demand for the information to appear and grow. They will make the information available, for a price.
  • MLB teams. They have access to the labor needed, and a strong incentive to obtain the information ahead of their competition. They will keep their information private.
  • Independent analysts. There are a lot of us, and we’ve all already demonstrated a willingness to spend our free time analyzing baseball. A collaborative group along the lines of Retrosheet, sharing the workload and the fruits of their labor, might be the solution that makes the information available to the most people.

There is a fourth possibility, but I consider it unlikely: Perhaps no one will take the time to study what takes place on the field and share their insights with the rest of us. However, I feel that the potential of Observational Analysis is manifest, and I expect that it will steadily develop and eventually become as big a part of baseball as traditional stats are at present.

Here’s to the future: Observational Analysis!
