Pitching injuries: A PITCHf/x look

by Kyle Boddy
April 13, 2011

Manny Corpas has suffered multiple elbow injuries. Could PITCHf/x metrics help us determine why? (Icon/SMI)

My apologies for the long delay in articles. I’ve been hard at work on re-creating minorleaguesplits.com over at ML Splits using Jeff Sackmann’s freely-available data files. I have all the MLEs and park factors from Jeff’s data for batters and pitchers, and it’s finally at a stable point where the archive data can be easily accessed. I’ve also modified some scripts with the help of THT’s own resident genius – Brian Cartwright – and have started to spider all the original source data from all the minor leagues (Rookie to Triple-A) in hopes of creating a brand new data warehouse and continually-updating site – just like Minor League Splits used to be, with a few more upgrades!

The other project I alluded to in previous articles is the Open Biomechanics Project. This has been pushed back to mid-May 2011 at the earliest due to my work with minor league data, vacation trips, and running Driveline Baseball in Seattle.

PITCHf/x Variables and Pitcher Injuries: The Link

Our holy grail is to correlate easily-collectible data points with trips to the disabled list. This would help save teams millions of dollars by knowing which pitchers were more likely to break down. Josh Kalk wrote a bit about this using neural net analyses back in February 2009, and Jeff Zimmerman has done some work on this using BMI, which didn’t show a lot of promise (see comments).

Using the Advanced Baseball Injury Database, I separated pitchers into two groups:

1) Pitchers who suffered shoulder or upper arm injuries that kept them out for at least 15 days from 2008-2010
2) Pitchers who suffered elbow or forearm injuries that kept them out for at least 15 days from 2008-2010
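
As a rough sketch of this grouping step: the snippet below assumes injury records are available as simple (pitcher, body part, days lost) rows and that the 15-day cutoff applies to each injury separately. The field layout, body-part labels, and sample rows are hypothetical, not the Advanced Baseball Injury Database's actual schema.

```python
from collections import defaultdict

# Hypothetical body-part labels for each group; the real database's
# categories may be named differently.
SHOULDER_PARTS = {"shoulder", "upper arm"}
ELBOW_PARTS = {"elbow", "forearm"}
MIN_DAYS = 15  # assumed per-injury minimum for an injury to count


def group_days_lost(records, parts, min_days=MIN_DAYS):
    """Sum days lost per pitcher over qualifying injuries to the given body parts."""
    totals = defaultdict(int)
    for pitcher, body_part, days_lost in records:
        if body_part in parts and days_lost >= min_days:
            totals[pitcher] += days_lost
    return dict(totals)


# Made-up rows purely for illustration.
records = [
    ("Pitcher A", "elbow", 60),
    ("Pitcher A", "forearm", 20),
    ("Pitcher B", "shoulder", 15),
    ("Pitcher C", "elbow", 10),      # under 15 days: excluded
    ("Pitcher C", "hamstring", 30),  # not an arm injury: excluded
]

shoulder_group = group_days_lost(records, SHOULDER_PARTS)
elbow_group = group_days_lost(records, ELBOW_PARTS)
```

Each resulting dictionary maps a pitcher to total days lost, which is the quantity the regression tries to explain.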

I pulled all the days lost on the disabled list by pitchers that fit into these two groups and then calculated the following nine independent variables that went into my regression analysis:

-Body Mass Index (BMI)
-Adjusted vertical release point (Height – Average Z-release point)
-Average fastball velocity
-Variance of z-release point (weighted across all pitches)
-Variance of x-release point (weighted across all pitches – pitchers who clearly stood on different parts of the rubber were eliminated from the analysis)
-% of pitches that were fastballs
-% of pitches that were sliders and cutters (grouped)
-% of pitches that were curves or knuckle curves (grouped)
-% of pitches that were changeups or sinkers (grouped)
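
To make a few of these variables concrete, here is a minimal sketch of how they might be computed from per-pitch data. Everything in it is an assumption for illustration: the helper names are invented, the sample values are made up, and "weighted across all pitches" is read here as per-pitch-type variance averaged with pitch-count weights, which is only one possible interpretation.

```python
from statistics import mean, pvariance


def bmi(height_in, weight_lb):
    """Body Mass Index from height in inches and weight in pounds."""
    return 703.0 * weight_lb / height_in ** 2


def adjusted_release(height_in, z_release_ft):
    """Pitcher height minus average vertical release point, both in feet."""
    return height_in / 12.0 - mean(z_release_ft)


def weighted_release_variance(pitch_types, releases):
    """Per-pitch-type variance of a release coordinate, weighted by pitch count."""
    by_type = {}
    for ptype, value in zip(pitch_types, releases):
        by_type.setdefault(ptype, []).append(value)
    return sum(len(v) * pvariance(v) for v in by_type.values()) / len(releases)


def pitch_mix(pitch_types, group):
    """Fraction of pitches whose type code falls in the given group."""
    return sum(p in group for p in pitch_types) / len(pitch_types)


# Made-up sample: four pitches with type codes and vertical release points (feet).
types = ["FF", "FF", "SL", "CH"]
z_release = [6.0, 6.2, 5.8, 5.9]

slider_cutter_share = pitch_mix(types, {"SL", "FC"})
z_variance = weighted_release_variance(types, z_release)
```

Fastball velocity and the horizontal release variance would follow the same pattern, using the velocity and x-release columns instead.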

Some initial hypotheses that some people have about pitching mechanics and injuries are:

-A more consistent release point decreases the chance of injury
-The higher the release point in relation to the body, the more stress on the glenohumeral (shoulder) joint due to various stabilization issues
-The lower the release point in relation to the body, the more stress on the elbow joint due to pronation/hyperextension theories
-More sliders/cutters increase the chance of elbow injury (supinated release)
-More changeups/sinkers increase the chance of shoulder injury (pronated release)

Results: Shoulder Injuries

I identified 144 pitchers from 2008-2010 that suffered major shoulder injuries that fit the criteria that I set forth above. Using the nine-factor model above, no characteristic was statistically significant at the alpha = 0.05 level. None came close. This was surprising, as I personally thought that a higher release point (in relation to the body) would be weakly and positively correlated with increased risk for shoulder injury. This theory is shared by many educated rehabilitation specialists, and it wasn’t even close to being statistically significant in this model.

No other factors were close to a cut at alpha = 0.05 or even alpha = 0.10, so we cannot reject the null hypothesis based on this model.

Results: Elbow Injuries

I identified 114 pitchers from 2008-2010 who suffered major elbow injuries that fit the criteria set forth above. Using the nine-factor model above, BMI and Slider/Cutter % were statistically significant at the alpha = 0.10 level, while vertical release point variance was statistically significant at the alpha = 0.05 level! Refitting with just these three factors, the r-squared was 0.11; both Slider/Cutter % and vertical release point variance had p-values < 0.05, while the p-value for BMI was still just below 0.10. The theory that a more varied vertical release point can lead to more elbow injuries may have some validity, as may the theory that increased use of sliders/cutters has the same detrimental effect. Increased BMI was weakly and negatively correlated with elbow injury - meaning if the effect is real, the bigger you are, the less likely you are to suffer an elbow injury.
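
For readers who want to see what sits behind the r-squared and p-values quoted here, the following is a minimal ordinary least squares sketch in plain Python. It is not the software or script used for this article. The data are synthetic, the function names are invented, and the p-values use a normal approximation to the t-distribution, which is only reasonable for samples of roughly this size or larger.

```python
from statistics import NormalDist


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Fit y = b0 + b1*x1 + ...; return coefficients, R^2, approximate p-values."""
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    n, k = len(X), len(X[0])
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)]
           for a in range(k)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(k)]
    beta = solve(xtx, xty)
    resid = [y[i] - sum(bj * xj for bj, xj in zip(beta, X[i])) for i in range(n)]
    ss_res = sum(e * e for e in resid)
    y_bar = sum(y) / n
    ss_tot = sum((v - y_bar) ** 2 for v in y)
    r2 = 1.0 - ss_res / ss_tot
    sigma2 = ss_res / (n - k)  # residual variance estimate
    p_values = []
    for j in range(k):
        unit = [1.0 if i == j else 0.0 for i in range(k)]
        inv_jj = solve(xtx, unit)[j]  # j-th diagonal entry of (X'X)^-1
        z = abs(beta[j]) / (sigma2 * inv_jj) ** 0.5
        # Two-sided p-value, normal approximation (assumption, see lead-in)
        p_values.append(2.0 * (1.0 - NormalDist().cdf(z)))
    return beta, r2, p_values


# Synthetic data: y depends on the first predictor only, plus deterministic noise.
rows = [(float(i), float((i * 7) % 5)) for i in range(30)]
y = [2.0 + 3.0 * x1 + 0.5 * ((i * 13) % 7 - 3) for i, (x1, _) in enumerate(rows)]

beta, r2, p_values = ols(rows, y)
```

In this toy data the first predictor carries nearly all of the signal, so its p-value is tiny and the r-squared is close to 1. Real injury data look nothing like that, which is why an r-squared of 0.11 with a couple of sub-0.05 p-values counts as a tenuous link rather than a finding.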

Conclusion

While I know most people were hoping for a bit more exciting news, the truth is that proper use of regression analyses and statistics very rarely leads to these kinds of discoveries. (The joke in scientific research is that if you’ve made a conclusion, you’ve done something terribly wrong.) We’re simply not going to find a model of basic characteristics (height, weight, PITCHf/x values) that has an r-squared of 0.50 with p-values of each variable < 0.05, meaning that 50% of the variance is explained with a very low likelihood of the result being due to chance alone.

Generally an r-squared of 0.25 is desired to take action on the model, but there are some refinements to the data that may yet yield better (or at least different) results. A number of data-specific issues can be drilled into (reducing the sample size but perhaps increasing the specificity of the sample - ligament damage specifically rather than just bone chips, for example) or better clarified (days on the DL does not control for injuries suffered at the end of a season, where off-season days are used to recover from the injury, for example).

I plan on doing more research into this field and find it promising that we've discovered even a tenuous link between elbow injuries and some basic (though annoying to compile) variables. It certainly warrants further research, even if no conclusions can be made from the findings. It's possible that neural net analyses will turn up more interesting information on historical data, or that an in-game analysis tool can be built like the one Josh Kalk alluded to in his previously mentioned article.

8 Comments

George Purcell

14 years ago

(I’m Blue from the BMI thread on Fangraphs.)

Nice exploratory work; looks like you’ve found something interesting on elbow injuries and I wouldn’t discount an r-squared of .11 in something as complex as pitching injuries.

Kyle Boddy

George:

Thanks. It was your comments to Jeff’s FG article that made me think about running a larger regression analysis. I think with forward-looking data and as PITCHf/x data becomes more readily available (it’s only reasonable from 2007 and on from what I understand), we will get larger datasets to work with and more exploratory work can be done.

An r-squared of .11 is fairly interesting and encourages me to do further research; I agree. I just wanted to make sure people didn’t misinterpret my “findings” as is often done in this relatively complex field of regression!

Trip Somers

What data set did you use: all pitchers or just injured pitchers?

Only injured pitchers. The dependent variable was total days lost.

It seems like the data could benefit from the inclusion of a modest control group. I don’t know how many it would take, but if you threw in 25-50 uninjured pitchers with the dependent variable for days lost set at 0, you’d be using a more complete data set.

Seems like your study was designed to measure how long someone is going to be out once they get hurt rather than to identify which independent variables may contribute to injury.

I thought about that, but what if pitchers in the control group are injured later in their careers? They no longer represent “healthy” pitchers anymore. And if we select for pitchers who were “healthy,” then we have an age bias (assuming we use an IP or service time cutoff).

So generally yes, the study was designed to measure severity of injury rather than propensity of injury, I suppose.

This study ignores injury history either way, except during this 3-year period. It ignores previous injuries, so why not ignore future injuries, too?

It does not ignore previous injuries. It’s the total number of days lost for those pitchers up to that point – I only have corrected PITCHf/x data for that three-year period. Sorry for the confusion.
