Bayes’ Theorem and prospect valuation

by Victor Wang
August 14, 2008

Bayes’ Theorem, named after Thomas Bayes, is a way to determine posterior probabilities from a set of prior and conditional probabilities. It has been used in baseball analysis before. As shown in the linked article, Bayes’ Theorem is useful for “updating” a player’s projection: the prior information is the player’s preseason projection, and the conditional information is the set of plate appearances he has accumulated during the season.

Most projection systems take a weighted average of a player’s context-adjusted past stats, regress it to the mean, and then make an age adjustment. When we regress to the mean, we are assuming that player talent is normally distributed. If talent is not normally distributed, then regressing to the mean is incorrect and we would want to use Bayesian analysis instead. Coming up with the actual talent distribution for major leaguers is tricky, but some research has shown that major league talent approximates a normal distribution.

While major league talent may be close to a normal distribution, we know that minor league talent isn’t normally distributed when it comes to future major league production. My research has shown that most top 100 prospects eventually become either supporting players or busts, while fewer become everyday players and even fewer become stars.

We can combine this prior information about how top prospects perform with Bayes’ Theorem and minor league statistics to come up with individual player values. One of the complaints I have gotten about my prospect value rankings is that the system is too macro and needs to incorporate more individual prospect information. Well, we can do this using prior probabilities (prospect group distributions) and conditional information (a player’s minor league equivalency, or MLE). Here is how we can do this:

{exp:list_maker}Using Mitchel Lichtman’s hitting MLEs and Dan Fox’s SFR fielding stats, determine a minor leaguer’s WAR (Wins Above Replacement).
Convert that player’s WAR into an equivalent wOBA (http://www.insidethebook.com/woba.shtml) so we can use binomial probability.
Determine a prospect’s player ranking.
Using the group from that prospect’s ranking, convert each subsection’s (bust, contributor, everyday player, star) WAB into an equivalent wOBA. Prospect value rankings and prospect groups refer to the tables in my Johan Santana article (http://www.hardballtimes.com/main/article/the-bright-side-of-losing-santana/).
Using a binomial probability model, find the chances that a prospect would produce his WAR given that he had a true talent of each subsection. In other words, find the chances a prospect would perform the way he did if his true talent were that of a bust, a contributor, an everyday player, and a star.
Multiply those probabilities by the chance that a prospect from his group actually is a bust, a contributor, an everyday player and a star.
Use those probabilities to find a prospect’s true talent and calculate his surplus value. {/exp:list_maker}
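The last two steps amount to a standard Bayes update over four discrete talent tiers: multiply each tier's likelihood by its prior, then rescale so the results sum to 1. Here is a minimal Python sketch of that update. It is not the code behind this article; the function name `posterior_probs` and the toy numbers are assumptions chosen only for illustration.

```python
def posterior_probs(priors, likelihoods):
    """Bayes update over discrete talent tiers.

    priors:      dict of tier -> P(tier) for the prospect's ranking group
    likelihoods: dict of tier -> P(observed MLE performance | tier)
    Returns a dict of tier -> P(tier | observed MLE performance).
    """
    # Multiply each likelihood by the matching prior.
    joint = {tier: priors[tier] * likelihoods[tier] for tier in priors}
    total = sum(joint.values())
    if total <= 0:
        raise ValueError("priors and likelihoods leave no probability mass")
    # Rescale so the four probabilities add up to 1.
    return {tier: value / total for tier, value in joint.items()}


# Toy inputs (hypothetical, NOT taken from the prospect tables).
toy_priors = {"bust": 0.40, "contributor": 0.30, "everyday": 0.20, "star": 0.10}
toy_likelihoods = {"bust": 0.001, "contributor": 0.010, "everyday": 0.020, "star": 0.005}

toy_posterior = posterior_probs(toy_priors, toy_likelihoods)
for tier, prob in toy_posterior.items():
    print(f"{tier:12s} {prob:.3f}")
```

With these toy numbers the bust tier starts as the most likely prior outcome, but the performance data shifts most of the weight to the contributor and everyday tiers.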
Here is an example of this process using Andy LaRoche and his statistics coming into 2008:

{exp:list_maker}Lichtman’s MLEs (using his linear weights) and Fox’s SFR have LaRoche at 4.2 WAR over 1,056 plate appearances from 2005-2007. Using a .338 wOBA as major league average, LaRoche’s equivalent wOBA is .349.
Coming into 2008, LaRoche was rated the No. 31 prospect overall by Baseball America, the No. 14 prospect overall by Kevin Goldstein, and the No. 22 prospect overall by Deric McKamey. Averaging the three, we get LaRoche as the No. 22 prospect, which puts him in the 11-25 prospect ranking group.
Using the 11-25 prospect rankings, I assumed each group’s WAB/year was equivalent to WAR/625 PA. I then got these equivalent wOBAs: .307 for the bust group, .321 for the contributor group, .355 for the everyday player group, and .388 for the star group.
Using LaRoche’s wOBA, I found the chances that he would produce a .349 wOBA in 1,056 PA if he had the true talent of a bust, a contributor, an everyday player, and a star. There is a 0.033 percent chance of him producing a .349 wOBA in 1,056 plate appearances if his true talent were that of a bust, a 0.374 percent chance if he were a contributor, a 2.4 percent chance if he were an everyday player, and a 0.09 percent chance if he were a star.
Next, we multiply those probabilities by the probability of a prospect from the 11-25 group becoming a player in each respective group. So for the bust category, we take the 0.033 percent chance and multiply it by 21.4 percent; for the contributor, we multiply 0.374 percent by 50 percent; and we do the same for the next two groups.
The cool part of using Bayes is that we can find not only the mean of a player’s talent but also the range of outcomes. After multiplying the probabilities and rescaling them so they add up to 1, we find that there is a 1.1 percent chance LaRoche becomes a bust, a 27.4 percent chance he becomes a contributor, a 70.4 percent chance he becomes an everyday player, and a 1.1 percent chance he becomes a star. {/exp:list_maker}
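For readers who want to check the binomial step, here is a rough Python sketch using only numbers quoted in the example above. It makes two assumptions that the article does not spell out: wOBA is treated as a per-plate-appearance success rate, and the likelihood is the probability of exactly the rounded number of "successes" (369 in 1,056 PA). The helper `binom_pmf` is a hypothetical name, written in log space so the large factorials do not overflow.

```python
from math import exp, lgamma, log


def binom_pmf(k, n, p):
    """P(exactly k successes in n trials with success rate p), via logs."""
    log_coeff = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_coeff + k * log(p) + (n - k) * log(1 - p))


PA = 1056
OBSERVED_WOBA = 0.349
# Assumption: treat wOBA like a success rate and round to whole "successes".
successes = round(OBSERVED_WOBA * PA)

# Equivalent wOBAs for the 11-25 ranking group, as listed in the example.
tier_woba = {"bust": 0.307, "contributor": 0.321, "everyday": 0.355, "star": 0.388}

# Chance of the observed line under each assumed true talent level.
likelihoods = {tier: binom_pmf(successes, PA, w) for tier, w in tier_woba.items()}
for tier, lik in likelihoods.items():
    print(f"{tier:12s} {100 * lik:.3f} percent")

# Posterior probabilities as reported in the example, used here to get
# the posterior-weighted mean talent level.
reported_posterior = {"bust": 0.011, "contributor": 0.274, "everyday": 0.704, "star": 0.011}
mean_woba = sum(reported_posterior[t] * tier_woba[t] for t in tier_woba)
print(f"posterior-weighted wOBA: {mean_woba:.3f}")
```

Under these assumptions the four likelihoods should land close to the percentages quoted above, and the posterior-weighted wOBA works out to roughly .346, a little below LaRoche’s raw .349 equivalent. Turning the likelihoods into the posterior still requires all four prior probabilities for the 11-25 group, of which only the bust and contributor values are quoted here.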
So given LaRoche’s minor league track record, he has a high probability of becoming an everyday player but a low chance of being a star. While this type of prospect might not seem too valuable, an everyday player who is cost-controlled for six years is immensely valuable: LaRoche’s surplus value using this Bayesian analysis coming into 2008 was $40 million. Making some basic assumptions about what LaRoche could be expected to be paid, PECOTA had LaRoche worth around $50 million in surplus value coming into 2008. So it’s good to see that the Bayesian system produces a rating similar to PECOTA’s. Despite this, there are still some weaknesses with this model:

{exp:list_maker}Reliability of MLEs: Despite what Bill James may have said, research by Mitchel Lichtman has shown that MLEs are not as reliable as major league stats. One way to counter this would be to weight a player’s MLE plate appearances less. For example, maybe you could multiply Triple-A plate appearances by 0.8 and Double-A plate appearances by 0.7. This would give the MLEs less weight and the prior probabilities more.
Ranking a prospect: This is arguably the trickiest and most subjective part. However, for top prospects there is generally a decent consensus.
Selling short a player’s star and bust potential: Bayesian analysis as used here assumes that the better a player performs, the greater his chance of becoming a star. This may be true in some cases, but sometimes a 21-year-old outperformed by a 24-year-old may actually have the greater chance of being a star player. If the 24-year-old has superior MLEs, this model would show that he has a higher chance of becoming a star than the 21-year-old. In other words, the model ignores that prospects still have a lot of growth in them. One solution could be to break the prior probabilities down by age group. {/exp:list_maker}
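The down-weighting idea in the first weakness can be sketched in a few lines of Python. The 0.8 and 0.7 weights are the ones floated above; the function name `effective_pa` and the sample plate appearance split are hypothetical, and the whole thing is only one possible way to apply the discount.

```python
# Weights floated in the text for Triple-A and Double-A plate appearances.
LEVEL_WEIGHTS = {"AAA": 0.8, "AA": 0.7}


def effective_pa(pa_by_level, weights=LEVEL_WEIGHTS):
    """Sum plate appearances after discounting each level by its weight."""
    return sum(weights[level] * pa for level, pa in pa_by_level.items())


# Hypothetical split of a prospect's plate appearances, for illustration only.
sample_pa = {"AAA": 500, "AA": 400}
discounted = effective_pa(sample_pa)
print(f"raw PA: {sum(sample_pa.values())}, effective PA: {discounted:.0f}")
```

Here 900 raw plate appearances shrink to about 680 effective ones. Feeding the smaller total into the binomial model widens the range of talent levels that are consistent with the performance, so the prior probabilities end up carrying more of the weight.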
I hope to keep building on this type of model for prospect valuation. Right now, for prospects below Double-A, I would definitely recommend using the basic prospect valuations based on a prospect’s ranking. I would also love to hear people’s thoughts on this kind of model and suggestions for improvements.
