On the Nature of the Strike Zone in Two and Three Dimensions

by Matthew Mata
November 9, 2015

This is an example of the truncated strike zone.

The majority of strike zone analysis considers the zone to be a region of the vertical plane passing through the front of home plate. This can be limiting in that we are unable to identify strikes that pass through the top or sides of the strike zone, which the rule book defines in three dimensions. In addition, judging a pitch only at this front position may make a called pitch look worse than it was in terms of missing the strike zone.

To understand such effects that result from using a 2D analogue, and also how well the rule book strike zone is called, we will write an algorithm to find (1) whether a pitch passes through the strike zone in 3D and if not, (2) how close it came to doing so. This amounts to building a virtual umpire that uses the PITCHf/x data to assign balls and strikes.

For this algorithm, we take PITCHf/x data and compute the minimum distance of each pitch, throughout its entire trajectory, to the pentagonal prism that is the 3D strike zone. We can then find, for example, the percentage of pitches that pass through the zone that are called strikes by umpires and, for those that are incorrectly called, by how much they missed the zone. Comparisons can also be drawn between this and the 2D strike zone to see which better represents umpires’ calls.

While the rule book strike zone may not be the three-dimensional solid that best matches called pitches, we can still try to find a solid that does a better job, and gain some insight into what the called strike zone actually looks like.

Finally, we will delve into the well-documented disparity in 2D between the strike zones for left- and right-handed hitters, where the zone for lefties is shifted slightly to the left relative to the zone for righties, by treating each zone as a transformation of the other.

In terms of current analysis, Eric Lang recently examined the strike zone in three dimensions. In the article, Lang examined back-door strikes and how the length of a pitch’s path through the strike zone relates to its probability of being called a strike. Our approach is oriented toward the outside of the strike zone and a pitch’s distance to it, but the framework of our algorithm is such that the calculations Lang performed by stepping a pitch along its path in 0.025-foot intervals could instead be done exactly by solving equations related to distance.

Strike Zone Definitions

The rule book strike zone, for our purposes, consists of a volume above home plate that is defined above and below by the sz_top and sz_bot variables from the PITCHf/x data. This is a reasonable starting point and could easily be modified within the algorithm to either be fixed heights or some batter-specific set of values for each pitch.

The front strike zone is typically where pitches are considered for assessing balls and strikes, and consists of a portion of a vertical plane at the front of home plate. We will initially define it as the width of the plate with vertical bounds of sz_top and sz_bot, but will also explore changing these values to better match called pitches.

The process of determining whether a pitch passed through the part of the strike zone at the front of home plate is straightforward: one need only check whether the center of the pitch, when its y-coordinate is equal to 17/12 feet, is less than 1.5 inches from the zone. Designing an algorithm that works in three-space, however, requires a much more complicated approach. The goal is to find the closest each pitch comes to the strike zone between 50 feet from home plate and the back of the batter’s box (or the point where the ball hits the ground, if that happens first). If that distance is zero, the pitch is a strike; otherwise, it is a ball.
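To make the front-of-plate check concrete, here is a minimal Python sketch (not the R code used for this article). It assumes the nine PITCHf/x parameters are supplied as a tuple (x0, y0, z0, vx0, vy0, vz0, ax, ay, az) in feet and seconds, that the plate is 17 inches wide and centered at x = 0, and that a pitch counts as a strike when its center is within 1.5 inches of the rectangle. The function names are ours, chosen for illustration.

```python
import math

FRONT_Y = 17 / 12      # front of home plate, in feet
HALF_PLATE = 17 / 24   # half of the 17-inch plate width, in feet
TOLERANCE = 1.5 / 12   # 1.5 inches, in feet


def time_at_y(y0, vy0, ay, y_target=FRONT_Y):
    """Earliest time at which y0 + vy0*t + 0.5*ay*t^2 equals y_target."""
    c = y0 - y_target
    disc = vy0 * vy0 - 2.0 * ay * c
    if disc < 0:
        return None  # the pitch never reaches the target plane
    # Rationalized quadratic formula; also valid when ay == 0.
    return 2.0 * c / (-vy0 + math.sqrt(disc))


def front_zone_distance(pitch, sz_top, sz_bot):
    """Distance (feet) from the pitch center to the 2D front zone."""
    x0, y0, z0, vx0, vy0, vz0, ax, ay, az = pitch
    t = time_at_y(y0, vy0, ay)
    if t is None:
        raise ValueError("pitch does not reach the front of the plate")
    x = x0 + vx0 * t + 0.5 * ax * t * t
    z = z0 + vz0 * t + 0.5 * az * t * t
    dx = max(abs(x) - HALF_PLATE, 0.0)       # horizontal miss
    dz = max(sz_bot - z, z - sz_top, 0.0)    # vertical miss
    return math.hypot(dx, dz)


def is_front_strike(pitch, sz_top, sz_bot):
    return front_zone_distance(pitch, sz_top, sz_bot) < TOLERANCE
```

A pitch thrown straight down the middle, such as `(0, 50, 2.5, 0, -130, 0, 0, 0, 0)` with `sz_top=3.5` and `sz_bot=1.5`, has distance zero and is a strike under this check.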

The broad strokes of the algorithm are that the space around the plate is divided into several regions, based on whether a pitch in each region will be closest to a face, edge or corner of the strike zone. Then, while the pitch is in each region, we need to find the minimum distance from the parameterized location of the pitch, (x(t),y(t),z(t)), to a face/edge/corner of the zone. Every time the pitch switches regions, the minimization algorithm switches in kind. Once this is performed over each region the pitch passes through, the smallest of these distances is taken to get the minimum distance to the strike zone. Within the algorithm, several values are tracked, including the minimum distance and in which region it occurred. A more detailed explanation of the algorithm, along with a version of the algorithm in R, can be found at the end of this article.

Before running the algorithm for the aforementioned applications, we can first get some idea of how many potential strike calls we are losing from restricting to 2D by comparing the percentage of pitches that passed through the front of the strike zone in 2015 to those that passed through any part of its 3D representation. The 2D strike zone at the front of home plate had 48.54 percent of pitches passing through this portion of a plane in 2015. For the 3D strike zone, the percent goes up to 49.47 percent, resulting in slightly less than one percent of pitches passing through the 3D zone, but not the 2D one.

So with 702,594 pitches recorded by PITCHf/x in 2015, this affects approximately 6,500 pitches. We can also see how many pitches this affects in terms of finding the minimum distance to the strike zone. If we take all pitches that had not hit the ground before reaching the front of the plate in 2015, 23.31 percent achieved their minimum distance at some location other than at the front of the plate, meaning that nearly a quarter of all pitches pass closer to the strike zone than 2D analysis would indicate, even if only by a small amount.

Correctly-Called Strike Percentage and Minimum Distance of Incorrect Calls

Our first use of the algorithm will be to find the percentage of strikes called correctly and, for those called incorrectly, how far they missed by (at least in terms of the PITCHf/x data and vertical strike zone limits). We consider both of these cases for lefty and righty batters in 2015 and focus on the four regions that correspond to faces (front, top, left, right of the strike zone).

Both front planes of the strike zone produce a similar percentage of correct calls at around 83.5 percent. For right-handed batters, about two-thirds of pitches are called correctly for pitches passing through the side faces of the strike zone. The sides for lefties are less balanced with 78.5 percent of strikes called correctly on the outer edge of the plate while only 34 percent are called on the inner edge. For strikes entering from the top of the strike zone, neither had a high percentage. This may be due to the top and bottom of the strike zone from PITCHf/x differing from the umpires’ choice for them.

We can also find the distance that called strikes missed the strike zone by in each region. We will take every pitch that was called a strike but had a non-zero distance from the strike zone and find the average distance over the nine regions around home plate.

For the front, top and sides of the strike zone, the distance from the zone for called strikes ranges from approximately 1.25 to 1.8 inches. The left-handed zone performs worse on the front and left face while the right-handed zone has larger average distances for the top and right faces.

Since this is a very restrictive approach to the strike zone, based on the exact dimensions of home plate, we can be a bit more forgiving near the edges and observe its effect on the same metrics. We will pad the strike zone on the sides by 1.5 inches, which serves as a characteristic distance (based on the range of values for incorrectly called strikes from the above diagrams). This will be referred to as the lenient rule book strike zone.

The percentages for the lenient strike zone are similar to those from before, albeit slightly lower:

This drop in the percentages is due to the fact that, as more area is allotted to generate strikes, pitches that were correctly labeled balls now show up as strikes. The overall effect is that the percentage of correct strike calls drops across the four major regions corresponding to faces.

With this expansion of the strike zone, the average distances that pitches were incorrectly called strikes drop a bit in three of the four major areas, but again, at the cost of getting more calls incorrect.

Called Percentages for Various Strike Zones (2012-2015)

To see how well each of the strike zones considered matches with umpires’ calls, we will run the algorithm for the last four seasons (2012-2015) to get four percentages: correctly called strikes, correctly called balls, correctly called pitches, and the average of the strikes and balls called correctly (weighting each percentage equally). Depending on which is deemed most important or of interest, any of these four choices could be used to analyze the differences between the representations of the zone.

For example, the weighted percentage might work well if one wants to compensate for the large number of potential called strikes that are lost to balls in play and fouls. In 2015, 96,871 pitches were called strikes and 228,567 were called balls, so working with just correct-call percentage, getting calls right for balls will have a stronger influence than strikes. Recall from earlier that just below 50 percent of all pitches passed through the strike zone in 2015 (2D or 3D), so using a 50-50 weighting scheme seems reasonable to adjust for this. In this study, however, we will focus on the correctly called pitch percentage as a good general-purpose metric.
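The four percentages can be written out in a few lines of Python. This is a sketch under one assumption of ours: the strike and ball percentages are taken relative to the number of called strikes and called balls, respectively, which is how we read the discussion of the 2015 totals above. The counts in the usage note are hypothetical.

```python
def call_percentages(strikes_in_zone, called_strikes,
                     balls_outside_zone, called_balls):
    """Return the four metrics, in percent, for one zone definition.

    strikes_in_zone:     called strikes that the zone also labels strikes
    balls_outside_zone:  called balls that the zone also labels balls
    """
    strike_pct = 100.0 * strikes_in_zone / called_strikes
    ball_pct = 100.0 * balls_outside_zone / called_balls
    call_pct = (100.0 * (strikes_in_zone + balls_outside_zone)
                / (called_strikes + called_balls))
    weight_pct = 0.5 * (strike_pct + ball_pct)  # 50-50 weighting
    return {"strike": strike_pct, "ball": ball_pct,
            "call": call_pct, "weight": weight_pct}
```

With hypothetical counts of 800 matching calls out of 1,000 called strikes and 1,800 out of 2,000 called balls, the call percentage sits closer to the ball percentage than the weighted percentage does, because balls outnumber strikes two to one.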

Rule Book Strike Zone
	LHB				RHB
Year	Strike %	Ball %	Call %	Weight %	Strike %	Ball %	Call %	Weight %
2012	75.3	90	85.2	82.7	85.7	88.4	87.5	87.1
2013	77.7	91	86.7	84.4	86	89.4	88.2	87.7
2014	78.5	91.6	87.3	85	85.5	90.2	88.6	87.9
2015	79	91.8	87.7	85.4	85	90.9	88.9	87.9

From 2012 to 2015, the correct-call percentages for the rule book strike zone increase by a few tenths of a percent per year, with the disparity between left- and right-handed correct calls going from 2.3 percent in 2012 to 1.2 percent in 2015. For balls and strikes, only the percentage of strikes called correctly to RHB has not increased over the four years (it rose slightly in 2013 and then fell in each of the next two seasons).

For the front of the strike zone, the percentages are similar:

Front Strike Zone
	LHB				RHB
Year	Strike %	Ball %	Call %	Weight %	Strike %	Ball %	Call %	Weight %
2012	74.2	91	85.6	82.6	85	89.6	88	87.3
2013	76.5	92	87	84.2	85.2	90.4	88.7	87.8
2014	77.4	92.5	87.5	84.9	84.8	91.3	89.1	88.1
2015	78.1	92.7	88.1	85.4	84.4	91.9	89.5	88.2

Compared to the rule book strike zone, the percentages of correct calls for strikes are lower (due to less surface area for pitches to pass through for strikes) and, by the same token, the percentages of correctly-called balls are up. Using the call percentage, the front of the strike zone gives a better approximation of the zone an umpire calls than the actual strike zone itself.

Applying the algorithm for the lenient strike zone generates a decrease in correct-call percentages compared to the other two representations:

Lenient Rule Book Strike Zone
	LHB				RHB
Year	Strike %	Ball %	Call %	Weight %	Strike %	Ball %	Call %	Weight %
2012	85.4	82.7	83.6	84	93.6	79.9	84.5	86.8
2013	88	83.8	85.1	85.9	94.1	81	85.4	87.6
2014	88.9	84.5	85.9	86.7	94.3	82.1	86.2	88.2
2015	89.7	84.8	86.4	87.3	94	83.1	86.7	88.5

Adding the extra 1.5 inches around the strike zone yields about a 10 percent increase in correct strike calls relative to the front of the strike zone in all cases. However, this comes at the cost of misclassifying balls, with a similar drop in percentage. These changes lead to a decrease in the percentage of correct calls by a few percent.

To allow for more spatial freedom later on, we will assume that the strike zone can be approximated by a rectangular box. We will take only the rectangular part of home plate in the y-direction (between y = 17/12 feet and y = 17/24 feet) and refer to this as the truncated strike zone, as it removes the triangular part of the plate. This simplification will be of use when we start manipulating the shape of the strike zone to attempt to find an optimal one for different geometries.
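For a single point in space, the distance to such a box and the kind of feature it is nearest to (face, edge or corner) can be found by clamping each coordinate to the box. The Python sketch below illustrates this for the truncated zone; it is our own simplification, it treats the ball as a point (no ball radius), and it handles one location rather than a whole trajectory.

```python
import math

X_MIN, X_MAX = -17 / 24, 17 / 24   # sides of the plate, in feet
Y_MIN, Y_MAX = 17 / 24, 17 / 12    # back and front of the rectangular part


def box_distance(point, sz_top, sz_bot):
    """Return (distance, feature) for a point (x, y, z) in feet.

    feature is 'inside', 'face', 'edge' or 'corner', depending on how
    many coordinates had to be clamped to reach the box.
    """
    lo = (X_MIN, Y_MIN, sz_bot)
    hi = (X_MAX, Y_MAX, sz_top)
    nearest = tuple(min(max(p, l), h) for p, l, h in zip(point, lo, hi))
    clamped = sum(1 for p, n in zip(point, nearest) if p != n)
    feature = ("inside", "face", "edge", "corner")[clamped]
    return math.dist(point, nearest), feature
```

A point directly in front of the zone, such as `(0, 2, 2.5)`, is nearest the front face; moving it off the plate in x makes it nearest an edge, and raising it above `sz_top` as well makes it nearest a corner.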

Truncated Strike Zone
	LHB				RHB
Year	Strike %	Ball %	Call %	Weight %	Strike %	Ball %	Call %	Weight %
2012	75.2	90.3	85.4	82.7	85.6	88.8	87.7	87.2
2013	77.6	91.3	86.9	84.4	85.9	89.7	88.4	87.8
2014	78.4	91.8	87.4	85.1	85.4	90.5	88.8	87.9
2015	78.9	92	87.9	85.5	84.9	91.2	89.1	88

This version does almost as well as the front strike zone and falls in between the front and rule book ones in terms of best percentages of correct calls. This makes sense because it matches the shape of the rule book zone on the front and sides, but lacks the more complicated geometry that comes from using the entire plate as a boundary. With these results as baselines, we can manipulate these versions of the strike zone to try to get better percentages and, ultimately, try to get a handle on what volume may best approximate the strike zone that is actually called.

Optimal Strike Zone in Two Dimensions

Focusing on 2015, to attempt to find an optimal zone in 3D, we will first establish what creates the best strike zone in two dimensions. We will adjust the sides of the 2D zone as well as its height, relative to the sz_top and sz_bot PITCHf/x variables, and see what configuration produces the highest correct-call percentage for both types of batter. The adjustments done to the four sides are a half-inch at a time to try to maximize this percentage.

MLB Optimal Front Strike Zone (2015)
Stand	Top	Bottom	Left	Right	Strike %	Ball %	Call %	Weight %
LHB	-1.5 in.	+0.5 in.	-10.5 in.	7.5 in.	84.369	93.606	90.681	88.987
RHB	-1.5 in.	+1.5 in.	-9.5 in.	9.5 in.	85.678	93.406	90.861	89.542

The best width for the strike zone to right-handed batters is symmetric, extending in each direction by 9.5 inches from the center line of home plate. The left-handed batters’ strike zone is more skewed and an inch tighter horizontally in comparison. The tops of the optimal strike zones match, but the bottom of the LH zone is about an inch lower. These zones generate slightly less than 91 percent correct calls and around 85 percent correctly called strikes.

Optimal Strike Zone in Three Dimensions

Since the bounds for the strike zone in 3D are determined by home plate, we will leave these as-is for the rule book and lenient algorithms and adjust the height of the strike zone relative to the PITCHf/x-prescribed values. For the simpler version of the truncated zone, we can adjust horizontally and vertically, as in 2D. We are interested in beating the optimal strike zones in 2D mentioned above with one of these representations. Unfortunately, since the distance a pitch is from the strike zone is relative to the zone itself, every time the strike zone is adjusted, all distances need to be recalculated from scratch, leading to a slow process of guess-and-check to find the best values.

MLB Optimal Rule Book Strike Zone (2015)
Stand	Top	Bottom	Strike %	Ball %	Call %	Weight %
LHB	-3 in.	+1 in.	74.768	95.247	88.763	85.008
RHB	-3.5 in.	+1 in.	81.069	95.099	90.479	88.084

For the rule book strike zone to RHB, the call percentage is comparable to that for the 2D zone, differing by a fraction of a percent. The zone for LHB fares worse at about two percent lower in call percentage. This is due to the strike zone to lefties being slightly shifted to the outside part of home plate, so a symmetric zone is at a disadvantage in this regard. For both types of batter, the called-strike percentage is lower, by about 10 percent for lefties and 4 percent for righties. To balance this out, the called-ball percentage is up by approximately 1.5 percent in both cases.

MLB Optimal Lenient Rule Book Strike Zone (2015)
Stand	Top	Bottom	Strike %	Ball %	Call %	Weight %
LHB	-5 in.	+2.5 in.	82.09	93.224	89.699	87.657
RHB	-5.5 in.	+3 in.	86.183	93.511	91.098	89.847

For the lenient strike zone, the call percentage for righties bests that of the front of the plate by roughly a quarter of a percent. Again, the zone for lefties falls short of the 2D version since the skew is not accounted for, but it does outdo the rule book strike zone. Since both rule book-based zones perform comparably to the fully adjustable 2D one, a 3D zone that can be moved freely (at the cost of the triangular back end of the plate) should presumably allow for higher correct-call percentages, relative to the other options.

MLB Optimal Truncated Strike Zone (2015)
Stand	Top	Bottom	Left	Right	Strike %	Ball %	Call %	Weight %
LHB	-3 in.	+0.5 in.	-11 in.	7.5 in.	85.596	93.152	90.759	89.374
RHB	-3.5 in.	+1.5 in.	-10 in.	9.5 in.	85.244	93.818	90.995	89.531

Using a 3D box, we are able to outperform the optimal 2D cases and the rule book zones; only the RHB lenient zone does better, by 0.1 percent. This is likely due to the more curved corners and edges of the lenient zone better matching the data (as can be seen in the 2D heat maps of the next section).

Based on these results, the best version of the strike zone may be some combination of the truncated strike zone with a lenient condition on its surface, which rounds off the corners and edges. Also note that as the strike zone is lengthened in the y-direction, the top of the strike zone lowers, since pitches that are high at the front of the plate can fall and still land on the top of the zone, while the bottom stays roughly the same at around sz_bot + 1 inch (accounting for the extra 1.5 inches added to the top and bottom in the lenient version). Up to this point, we have restricted the edges of the strike zone to lining up with the axes, but we can get a better handle on whether a less restrictive condition may fare better by visually examining the strike zone in 2D.

2D Strike Zone Modeling and the LH versus RH Strike Zone Transformation

To get a good idea of the exact form of the strike zone at the front of the plate in two dimensions for 2015, we will start by using a heat map to plot called strikes and balls. Each pitch is represented by a three-inch circle and the number of intersecting called strikes at a given point is divided by the total number of intersecting called pitches to form the called-strike percentage. By doing this, there is no smoothing or interpolation occurring in the heat map that may damp out minor details. The red line indicates the strike zone in two dimensions with heights of 1.5 and 3.5 feet, and the green curve is the 50 percent-probability strike contour. Here, there is no adjustment vertically relative to the top and bottom of the zone for each batter.

With this as a guide, we can then construct a model for the strike zone, both for left- and right-handed batters. We start by defining a region for each where we will assume strikes will be called 100 percent of the time. The shape of this area will be taken to be a quadrilateral (a closed, four-sided object). From there, we will model the drop-off from this area to the region where strikes are called, effectively, zero percent of the time using an exponential function. Via experimentation and comparison with the data, the function exp(-4 x^4) works well to model this decay and the curvature of the interface. The variable x represents the minimum distance, in feet, from the 100 percent-strike region. Next, we need to adjust the 100 percent region so as to fit the model to the data from the heat map. This can be accomplished by matching contours from the heat map with contours from the model.
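As an illustration of this model, the Python sketch below evaluates exp(-4 x^4) for a point at the front of the plate, where x is the distance in feet to a convex quadrilateral. The corner coordinates are the RHB values from the table of 100 percent strike regions in this section; the helper names and the clockwise corner ordering are our own choices, not part of the original code.

```python
import math

# RHB 100 percent strike region, in feet: UL, UR, LR, LL (clockwise)
RHB_QUAD = [(-0.40, 2.83), (0.35, 2.80), (0.39, 2.10), (-0.47, 2.11)]


def _segment_distance(p, a, b):
    """Distance from point p to the line segment from a to b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / (dx * dx + dy * dy)
    t = min(1.0, max(0.0, t))  # stay on the segment
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def distance_to_quad(p, quad):
    """Distance from p to a convex quadrilateral (zero if p is inside)."""
    edges = list(zip(quad, quad[1:] + quad[:1]))
    crosses = [(b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
               for a, b in edges]
    # Inside a convex polygon, p lies on the same side of every edge.
    if all(c <= 0 for c in crosses) or all(c >= 0 for c in crosses):
        return 0.0
    return min(_segment_distance(p, a, b) for a, b in edges)


def strike_probability(p, quad):
    """Modeled called-strike probability at location p = (x, z)."""
    x = distance_to_quad(p, quad)
    return math.exp(-4.0 * x ** 4)
```

Any location inside the quadrilateral returns a probability of 1, and the value falls off smoothly with distance from it.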

Coordinates of the Quadrilaterals Representing the 100% Strike Region (2015)
Left-handed Batters				Right-handed Batters
Upper Left	Upper Right	Lower Left	Lower Right	Upper Left	Upper Right	Lower Left	Lower Right
(-0.48,2.8)	(0.22,2.87)	(-0.55,2.04)	(0.24,2.16)	(-0.4,2.83)	(0.35,2.8)	(-0.47,2.11)	(0.39,2.1)

For both the left- and right-handed strike zones, the fit appears to match the contours very well in most regions. This fitting could also be done by minimizing the difference between the model and the data, but requires a more advanced algorithm which is feasible to write, but likely would not produce markedly better results. The blue quadrilateral in each image encloses the 100 percent-probability strike region.

Both seem to match well visually with the data and have the advantage of straight edges for large portions of their contours, making it easier to observe the shape of the strike zone. Noticeably, the strike zone for righties is very balanced, left to right, and fairly level on top and bottom. However, for left-handed hitters, the strike zone is tilted up on the inside half of the plate and down on the outside, leading to more of the called strike area extending outside the rectangular strike zone. As noted earlier, this difference in zones is not a new observation but, using the model, we can try to discern why this is the case.

To this end, we will focus on the 100 percent strike region for each type of hitter and consider how the two relate to each other by plotting the four corners of each. If, for example, the strike zones are related by a translation, to go from the RH zone to the LH one, the four corners would all move in a similar direction and a similar distance.

The corners for righties (green) can be moved up and left on the right side and those on the left moved down and left to reach the associated locations for lefties (red). This kind of transformation is more characteristic of a rotation than a translation. To test this theory of a rotation, we can draw a straight line between each pair of corresponding corners and find the perpendicular line that bisects it (divides it into two equal pieces). If all four of these perpendicular bisectors intersect at a single point, the 100 percent regions are rotations of each other about that point.
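The perpendicular-bisector test can be reproduced numerically. The Python sketch below uses the corner coordinates from the table of 100 percent strike regions above and intersects the bisectors two at a time; the function names are ours and the code is an illustration, not the procedure used to draw the figures.

```python
from itertools import combinations

RHB = {"UL": (-0.40, 2.83), "UR": (0.35, 2.80),
       "LL": (-0.47, 2.11), "LR": (0.39, 2.10)}
LHB = {"UL": (-0.48, 2.80), "UR": (0.22, 2.87),
       "LL": (-0.55, 2.04), "LR": (0.24, 2.16)}


def bisector(p, q):
    """Coefficients (a, b, c) of the line a*x + b*y = c that is the
    perpendicular bisector of the segment from p to q."""
    a, b = q[0] - p[0], q[1] - p[1]
    mx, my = (p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0
    return a, b, a * mx + b * my


def intersect(l1, l2):
    """Intersection of two lines in (a, b, c) form, or None if parallel."""
    a1, b1, c1 = l1
    a2, b2, c2 = l2
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-12:
        return None
    return (c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det


lines = {k: bisector(RHB[k], LHB[k]) for k in RHB}

if __name__ == "__main__":
    for k1, k2 in combinations(lines, 2):
        print(k1, k2, intersect(lines[k1], lines[k2]))
```

If the two regions were exact rotations of each other, all six pairwise intersections would coincide; how tightly they cluster gives a sense of how close the relationship is to a rotation.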

It appears that three of the four lines nearly intersect, with the fourth line, coming from the upper-right corner, missing high. Comparing to previous years, a similar result is present for 2014 while 2012 and 2013 have nearly all four lines intersecting (2013 shown below).

We can now apply a rotation to each of the corners for RHB to try to get to the corners for LHB. Each rotation is a slightly different angle of approximately (starting in the upper left and working clockwise) 3.6, 6, 12, and 7.2 degrees. Note that since the angles are not the same, the rotation is not rigid and the quadrilateral will deform slightly in the process. The center of rotation is taken to be (0.06,1.425).
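The rotation can be checked directly from the numbers given. The Python sketch below rotates each RHB corner about (0.06, 1.425) by its listed angle and measures how far the result lands from the corresponding LHB corner. The counterclockwise sense of the rotation is our assumption, inferred from the corner coordinates rather than stated in the text.

```python
import math

CENTER = (0.06, 1.425)
RHB = {"UL": (-0.40, 2.83), "UR": (0.35, 2.80),
       "LR": (0.39, 2.10), "LL": (-0.47, 2.11)}
LHB = {"UL": (-0.48, 2.80), "UR": (0.22, 2.87),
       "LR": (0.24, 2.16), "LL": (-0.55, 2.04)}
# Degrees, starting in the upper left and working clockwise.
ANGLES = {"UL": 3.6, "UR": 6.0, "LR": 12.0, "LL": 7.2}


def rotate(point, center, degrees):
    """Rotate point counterclockwise about center by the given angle."""
    th = math.radians(degrees)
    dx, dy = point[0] - center[0], point[1] - center[1]
    return (center[0] + dx * math.cos(th) - dy * math.sin(th),
            center[1] + dx * math.sin(th) + dy * math.cos(th))


rotated = {k: rotate(RHB[k], CENTER, ANGLES[k]) for k in RHB}
miss = {k: math.dist(rotated[k], LHB[k]) for k in RHB}  # feet

if __name__ == "__main__":
    for k in RHB:
        print(k, rotated[k], round(12 * miss[k], 2), "in.")
```

The `miss` values give, for each corner, the remaining gap between the rotated RHB corner and its LHB counterpart.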

This places three of the four rotated points (blue) nearly on top of their left-handed counterparts. From this, we can conclude that the strike zone for lefties is, while not perfect, close to a rotation of the right-handed zone about a point beneath the bottom of the strike zone.

The process of rotating the zone for RHB to be the one for LHB is shown below as a GIF. The green curve is the 50 percent called-strike contour for right-handed batters from the data in 2015 and the red curve is the same for the left-handed batters. The blue curve is the model, starting with 0 percent rotation as the RHB strike zone model, and is rotated to try to match the LHB strike zone using the above center of rotation and angles. The blue quadrilateral demonstrates the deformation of the 100 percent called-strike region. Even with the upper-right corner not matching perfectly, we still get strong visual evidence that the two zones are related by a rotation.

While we can conclude that the two strike zones are roughly a rotation of each other, the harder question is to ascertain why this is true. It may be as simple as a matter of perspective based on the position of the umpire making the calls, but this would be difficult to tease out of the data, if even possible. One aspect of the results from the model that lends itself to this idea is that the top of each zone is slightly narrower than the bottom. This would make sense from a point of perspective since two lines would look the same length at different distances if the line that was farther away was longer. Therefore, it would seem reasonable that the lower front edge of the strike zone being longer than the top might appear to the umpire as being the same length. Again, this is something that is hard to verify purely based on data and the model.

3D Strike Zone Optimization Algorithm

At a given time in its trajectory, a pitch’s shortest distance to the strike zone will be to either a (1) face, (2) edge, or (3) corner. To figure out this minimum distance, we need to track the location of the pitch and determine which of these three options it is closest to over the entire time interval that the pitch is in flight. We will divide the area in and around the strike zone into three stages and, within each stage, several regions. The first stage is before the pitch reaches the front of the strike zone at 17 inches from the back of home plate. The second stage is made up of the region between the plane at the front of home plate and the back of the rectangular region of the plate at 8.5 inches from the tip of the plate. The third stage encompasses the back of the rectangular region to the back of the batter’s box, at -27.5 inches in y.

Within each stage, there will be several regions we need to consider and each region will require a different optimization technique. The first and second stages are divided along the top and bottom of the strike zone and the left and right sides of home plate to form nine regions (which can be seen near the beginning of this article). The third stage is divided along the top and bottom of the zone as well, but also along lines extending back along the diagonal edges of home plate. This, in total, forms 36 regions. Within each region, we can say which part of the strike zone the pitch will be nearest and find the minimum distance.

As an example, in a region where the pitch is closest to a corner, say at (a,b,c), we want to minimize the distance d(t) = sqrt( (x(t)-a)^2 + (y(t)-b)^2 + (z(t)-c)^2 ) between times t0 and t1 (the time interval during which the pitch is in this region). This can be done using standard optimization techniques from calculus, applied to a slightly modified function D(t) = 0.5[d(t)]^2, which has its minimum at the same time as d(t). By setting its derivative to zero, we arrive at a cubic equation of the form At^3 + Bt^2 + Ct + E = 0 that can be solved for t. Then, evaluating d(t0), d(t1), and d(t*) for any solutions t* of the cubic between t0 and t1, we can take the smallest value to be the minimum distance in that region. This can be done similarly for a face or an edge.
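The corner case can be sketched in Python as follows (this is not the posted R code). With a constant-acceleration trajectory p(t) = p0 + v t + 0.5 a t^2 and r0 = p0 - (a,b,c), expanding D'(t) = (p(t) - corner) . p'(t) gives the cubic coefficients A = 0.5 a.a, B = 1.5 v.a, C = v.v + r0.a and E = r0.v. Rather than a closed-form cubic solver, this sketch brackets the roots between the turning points of the cubic and refines them by bisection, which is our own choice for keeping the example short.

```python
import math


def roots_in_interval(A, B, C, E, t0, t1, iters=200):
    """Real roots of A t^3 + B t^2 + C t + E that lie in [t0, t1]."""
    f = lambda t: ((A * t + B) * t + C) * t + E
    # Turning points of the cubic split [t0, t1] into monotonic pieces.
    cuts = [t0, t1]
    qa, qb, qc = 3.0 * A, 2.0 * B, C
    if abs(qa) > 1e-15:
        disc = qb * qb - 4.0 * qa * qc
        if disc >= 0:
            s = math.sqrt(disc)
            cuts += [(-qb - s) / (2 * qa), (-qb + s) / (2 * qa)]
    elif abs(qb) > 1e-15:
        cuts.append(-qc / qb)
    cuts = sorted(t for t in cuts if t0 <= t <= t1)
    roots = []
    for lo, hi in zip(cuts, cuts[1:]):
        if f(lo) * f(hi) > 0:
            continue  # no sign change, so no root on this piece
        for _ in range(iters):
            mid = 0.5 * (lo + hi)
            if f(lo) * f(mid) <= 0:
                hi = mid
            else:
                lo = mid
        roots.append(0.5 * (lo + hi))
    return roots


def position(p0, v, a, t):
    return tuple(p + vi * t + 0.5 * ai * t * t for p, vi, ai in zip(p0, v, a))


def min_distance_to_point(p0, v, a, corner, t0, t1):
    """Return (minimum distance, time) to a fixed point over [t0, t1]."""
    dot = lambda u, w: sum(ui * wi for ui, wi in zip(u, w))
    r0 = [p - c for p, c in zip(p0, corner)]
    A, B = 0.5 * dot(a, a), 1.5 * dot(v, a)
    C, E = dot(v, v) + dot(r0, a), dot(r0, v)
    candidates = [t0, t1] + roots_in_interval(A, B, C, E, t0, t1)
    return min((math.dist(position(p0, v, a, t), corner), t)
               for t in candidates)
```

Because every critical point of D(t) in the interval is among the candidates, the result does not depend on a step size, unlike stepping along the trajectory in fixed increments.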

As a 2D example, we can eliminate the x variable (take x = 0) and only consider y and z. Then we would be interested in how close the pitch came to the rectangle representing the strike zone. In the below GIF, a pitch is shown passing through all three labeled stages and the distance from the strike zone at each frame is indicated in the upper-right corner. For this example, the minimum distance is 0.18 feet in Stage 1, which means the pitch comes closest to the front corner of the pictured 2D zone.

To track the stages/regions that a pitch passes through, we need to solve several equations related to (x(t),y(t),z(t)) for where it crosses the lines that separate the stages/regions. Then, for each region that the pitch encounters, the minimum distance is found over that region, and the smallest of those distances is the overall minimum over the pitch’s entire flight path. Since these calculations are done exactly, without approximation, the algorithm is very fast and as accurate as can be achieved with the PITCHf/x data.

Discussion

Based on the options presented above, the best representation of the strike zone, for matching umpires’ calls, would be a truncated strike zone that was appropriately shifted horizontally and vertically. Since the lenient zone outperforms this for RHB, which is likely due to the curved corners which will better match the data, a better choice may be a combination of the two. We can also experiment with different limits in y for the strike zone. For example, the strike zone can be stretched in front of the plate or even all the way back to the catcher, but adding this extra freedom means more configurations to consider to find an optimal one.

Even with this increased flexibility in the strike zone model, we are still limiting ourselves to edges aligned with the coordinate axes. It may be that working with a more general framework including, for example, slanted edges and curved surfaces may produce better results. We have seen that the LHB strike zone in 2D appears to be a rotation of the one to RHB, and both appear to be slightly narrower near the top of the zone.

Implementing these features, while not difficult, requires custom code for each scenario. So we may be able to improve the results by loosening some of the geometric restrictions, but it comes at the cost of forming multiple iterations of the algorithm, which needs to track the region a pitch is in, relative to which part of the strike zone is closest. Also, since we do not have a handle on what the maximum correct-call percentage achievable might be, it would be hard to say if a small percentage improvement from such modifications was meaningful.

For the 2D rotation, while we do not get an exact rotation of all four corners of the quadrilateral, we get a point of rotation for three and the fourth is not far off. This may be due to the transformation of the strike zone relative to the handedness of the batter being a 3D rather than 2D effect, so we are not capturing it in full. For example, the strike zone may be rotated for LHB, but that rotation may also modify it in the y-direction. However, we cannot begin to check for this until we have some idea of the shape of the called strike zone with this level of dimensionality, which is no simple task in itself. Based on what we have seen, a next step might be to build a model for the strike zone that incorporates the curved edges and corners seen in 2D and can use edges that do not align with the coordinate axes. We would then have to find the optimal choice of values for the model, which is harder since there is an increased number of degrees of freedom in the x-, y-, and z-directions.

R Code

Posted below is the R code for the strike zone algorithms used in this article. The algorithm takes in the nine PITCHf/x parameters, as well as a top and bottom of the strike zone per pitch, and outputs the stage, region (labeled “zone” in the code), and minimum distance. The first file contains all of the optimization algorithms and the second the algorithms for the rule book zone, lenient zone (for the rule book zone, set ball_radius=0.25), the truncated zone (labeled “box”), and front zone (labeled “2D”). There is also a “tunnel” algorithm, which takes the strike zone all the way to the back of the batter’s box that was not considered in the article. The remaining algorithms used in this article, while not posted here, can be posted in the comments upon request.

Strike Zone Optimization Algorithms
Strike Zone Minimum Distance Algorithms

8 Comments

Scott

9 years ago

Just want to say this was a really interesting and thorough piece. Great work.

Matthew Mata

Reply to Scott

Thanks! It took me about a month over the summer to design and write the algorithm and, the way I was able to put it together, it can be modified to work for any 3D strike zone. Hopefully, I can adjust it to test 3D analogues of the 2D zones mentioned near the end of the article.

francis

Imagine if umps got 16% of fair / foul calls wrong for a minute.

If the tracking software is accurate, it needs to be implemented. If the technology was available in 1860, I’m sure it would have been used.

Reply to francis

The calculations, relative to the strike zone used, done in the article are as accurate as can be performed with the PITCHf/x data. Also, it can be adjusted to use any values for the height of the strike zone on a given pitch just by changing two parameters in the code.

Reply to Matthew Mata

It’s awesome. More articles like these and the league may actually automate balls & strikes one day.

Keep up the good work, you could be part of history !

Peter B

Matthew, I looked at your R code and I think I am missing something… I didn’t see any adjustments for spin or drag in the trajectory calculations – just the final position extrapolated from the initial position/velocity/acceleration in each dimension. Am I misunderstanding your calculations?

Reply to Peter B

The trajectories are completely determined by the 9-parameter PITCHf/x model, so we’re not doing any adjustments in that regard. Using this data as a parameterized curve (x(t),y(t),z(t)) in three dimensions, we find the time at which the pitch is closest to the strike zone (and that produces the associated location in 3-space). Any spin or drag acting on the pitch is already built in to the PITCHf/x parameters. If we wanted to, for example, remove or adjust the spin/drag on a pitch, we could in theory, but here we’re just considering the pitch trajectories as-is.

Also, we’re not using the final position, but rather the position when the pitch is closest to the strike zone to find the distance. Hopefully, that clarifies and if not, let me know and I can try to give you a more detailed explanation.

I see, you’re using what Alan Nathan refers to as the “standard procedure” (http://baseball.physics.illinois.edu/Movement.pdf). Most of the work I’ve seen on pitch trajectories has used his drag-adjusted calculations and I think sub-consciously I just assumed that’s what I’d see in the R code (and by the way, kudos to you for posting it – it’s a habit all researchers should adopt). The differences in practice are probably small – an inch or two here or there.
