The Magic and Mysticism of Baseball’s Projections

by Jack Moore
December 23, 2014

Michael Brantley vastly out-performed his 2014 projections. (via Erik Drost)

PECOTA. ZiPS. Steamer. Marcel. Oliver. CHONE (RIP). CAIRO, MORPS, and BEANS. We turn to these projection systems every winter to paint the picture of the season to come. We use them to illuminate next year’s standings and All-Star teams, divine fantasy sleepers and busts, and reveal the hazy reasoning behind the hectic hot stove. The development of multiple freely available projection systems may be the greatest contribution of sabermetrics to the baseball-watching public.

Building a better projection system is a tried-and-true track to a job in baseball. Nate Silver’s PECOTA spreadsheets wedged open the door to Baseball Prospectus. Sean Smith’s CHONE was ripped from the internet after the 2010 season when he found a job in a front office. Many of the systems listed above are proprietary, and fans, fantasy players, stats websites and even major league teams shell out cash yearly for access. The baseball community values anything it believes can deliver an accurate prediction for the coming season, a commitment it backs up in dollars and jobs.

Projections elevate sabermetricians to sabermagicians. Although much of sabermetric thought draws from the logical tradition, projection demands creation. No matter how much time is spent poring over historical statistics and deducing trends, projecting a hypothetical baseball season requires another significant step. Logic is solely deductive; it lacks the ability to create. Going from principle, theory and conjecture to real win totals and player statistics requires the projectionist to step outside the realm of logic and into the realm of mysticism.

The role of the projectionist bears a stark resemblance to the Mesopotamian baru, a priest-like figure whose rituals were a hugely important part of the decision-making process in their society. The baru practiced what we now call “divination” and would read the movement of drops of oil along a bowl, the movement of smoke rising from incense, the patterns of hot candle wax dropping from the wick, or even the shapes and colors of the organs of sacrificed animals. The baru’s ritual was always preceded by a prayer, in which the baru asked a god to reveal his intentions through the chosen medium.

As absurd as it sounds, the baru’s rituals were treated with great seriousness and rigor. Historian J.J. Finkelstein wrote in a 1963 scholarly article titled “Mesopotamian Historiography,” “Sacrifices of animals were routine in the city temples, often being made on behalf of the king, who, on numerous occasions even offered the sacrificial animal on purpose. After slaughter, autopsies of the entrails were made as a matter of course, and a detailed record made of the findings against any contingency that might arise. Or, a clay model of some of the organs might be made in order to preserve the exact features of the original. The pathologists, so to speak, would consult their records in order to match up the current case with one in the past.”

Subtract the blood and guts and organs and you essentially have Nate Silver’s PECOTA development process. “Eventually, by stealing an hour or two at a time during slow periods during the workday, and a few more while at home at night,” Silver wrote in The Signal and the Noise, “I developed a database consisting of more than 10,000 player-seasons…as well as an algorithm to compare any one player with another. It used a different method for comparing a set of players–what is technically known as a nearest neighbor analysis. It also considered a wider variety of factors–including things like a player’s height and weight, that are traditionally more in the domain of scouting.”
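For readers curious what a nearest neighbor analysis looks like in practice, here is a minimal sketch. It is not PECOTA's actual algorithm, which Silver does not spell out in the passage above; the players, features, and numbers are all invented, and the distance measure (Euclidean distance on standardized features) is an assumption chosen for simplicity.

```python
import math

# Invented player-seasons, for illustration only:
# (age, height_in, weight_lb, OBP, SLG). None of this is real data.
HISTORICAL = {
    "Player A": (24, 69, 180, 0.360, 0.440),
    "Player B": (25, 70, 185, 0.355, 0.450),
    "Player C": (31, 76, 240, 0.320, 0.520),
    "Player D": (23, 68, 175, 0.370, 0.430),
    "Player E": (29, 74, 220, 0.330, 0.480),
}


def column_stats(rows):
    """Return the mean and (population) standard deviation of each feature."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0  # guard against zero spread
        for c, m in zip(cols, means)
    ]
    return means, stds


def nearest_neighbors(target, pool, k=3):
    """Return the k names in `pool` closest to `target` after standardizing each feature."""
    means, stds = column_stats(list(pool.values()))

    def z(row):
        return [(x - m) / s for x, m, s in zip(row, means, stds)]

    zt = z(target)
    dist = {name: math.dist(zt, z(row)) for name, row in pool.items()}
    return sorted(dist, key=dist.get)[:k]


# A hypothetical young, small, high-OBP hitter and his two closest comparables.
comps = nearest_neighbors((24, 69, 178, 0.362, 0.438), HISTORICAL, k=2)
print(comps)
```

A system built this way would then look at what the comparable players did in the following seasons and project something similar for the target player, which is the "match up the current case with one in the past" step Finkelstein describes.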

It may be odd to imagine Silver sifting through the digestive tract of an expired sheep to project Dustin Pedroia as a perennial All-Star, or to imagine Dan Szymborski emerging from his computing cave with the ZiPS projections in the form of a clay model of a cow’s brain. But if we step aside from the sensational gore, an analysis of how the Mesopotamians approached this ritual reveals a strikingly similar intellectual tradition.

“The Mesopotamian form of learning known as ‘divination’ was rooted in, and is most characteristic of, the fundamental cognitive mode of the Mesopotamian intellect,” Finkelstein writes. “There probably has never been another civilization so single-mindedly bent on the accumulation of information, and on eschewing any generalization or enunciation of principles.” As Finkelstein explains, the Mesopotamian tradition fixated not on the uniqueness of historical events, as the Western world does, but rather on what he calls their “exemplicative value.” Nothing occurred by chance. Anything that can be observed could be known and understood empirically. For the Mesopotamians, Finkelstein writes, “Ultimate understanding of the universe would, in theory, require nothing but the painstaking accumulation of as much detail as possible about literally everything.” The Mesopotamians would have loved StatCast.

The result of the unceasing Mesopotamian attention to detail was the omen texts. “If a town is set on a hill,” one reads, “it will not be good for the dweller within that town.” Another, “If black ants are seen on the foundations which have been laid, that house will get built; the owner of that house will live to grow old.” And one that may have been warning us about Nate Silver himself, “If a fox runs into the public square, that town will be devastated.”

These omens read like religious proverbs, designed to serve as overarching social responses to certain situations. But this reading is overly simplistic, as Finkelstein explains. “A moment of time was apprehended and defined as the sum total of the occurrences and events known to be in temporal conjunction.” The omen isn’t making a suggestion that building a town on a hill may be dangerous. It is, rather, a historical retelling of how, in every case the writer of the omen has witnessed, the construction of towns on hills has ended poorly for residents. This is a limited view of history, to be sure, but it is history nonetheless.

“The greater the number of events noted for a single moment,” like the moment of a town being built upon a hill, or a fox running into the public square, or a 37-year-old knuckleballer posting a Cy Young season, Finkelstein writes, “the more refined and precise the prediction that could be based on them.” This is the exact philosophy Silver carried into his PECOTA project, and one every other projectionist has been mimicking for the past decade or so.

For us, with this philosophy, there is always the danger of injecting the self into the project, forcing the divination to tell us what we want to see. This was not an issue for the Mesopotamians. According to Finkelstein, “The universe of Mesopotamia was assuredly not a man-centered one; his mode of awareness drew his attention first to the external world, and secondarily, at best, to himself.” It is this hyper-awareness of surroundings and lack of self-awareness that allowed the Mesopotamian to see a clear future in nothing but a drop of oil or a trail of intestines.

Similarly, our own self-awareness allows us to see through the flaws in the divination process. Hyper-awareness of our own selves, of the many differing paths our single lives can take, forces us to run our simulations thousands of times before we are satisfied the effects of “luck” have been wrung out. Self-awareness forces us to see the thin or non-existent lines of causality tying the divination process to the events it is attempting to foresee. Even if it has been right before, we need a compelling causal reason why the divination is able to foretell the future before we can accept it. The Mesopotamian, with such a deep focus on the external world, instead requires a counter-example before disengaging from the belief.

As superior as our way may seem, the self can get in the way, especially when there is a desire to see the projection come to fruition. In the case of black box projections, the only rationale giving weight to the projection is the authority of a single statistician, organization, or “expert.” In others, statistics an individual or organization holds dear or even may be trying to sell may be emphasized while others are held back. Sometimes the explanation is too loaded with jargon for a layman to be reasonably expected to understand.

While there should be no doubt that every projectionist has a process, if that process is not publicly available and accessible, we are left with nothing but their word that the projections are honest and not instructed by the personality or preferences of the projectionists. It is the explanation, the reason why the divination works and could be successfully repeated, that separates the successful projectionist from the baru digging for livers and gall bladders. And if we wouldn’t listen to the guts of a sheep we don’t understand, why would we listen to the guts of a system kept secret?

Thus far, there only seems to be one acceptable way for our projectionists to establish an authority for their method: lower the RMSE, or “root mean square error.” In an analysis of numerous projection systems at Baseball Prospectus this December, Rob Arthur found what I consider to be a shocking amount of agreement. Arthur points out that players who had unexpectedly explosive seasons in 2014, like Cleveland’s Michael Brantley or Detroit’s Victor Martinez, were not just missed by one or a few of the projection systems under the microscope. “The astonishing fact,” Arthur writes, “is that not one but all of the systems missed on these players.”
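For readers who have not met the statistic, RMSE is the square root of the average squared gap between what was projected and what happened. The sketch below is only an illustration: the OPS+ figures and the two "systems" are invented, not taken from Arthur's analysis or from any real projection.

```python
import math


def rmse(projected, actual):
    """Root mean square error between two equal-length lists of numbers."""
    if len(projected) != len(actual) or not projected:
        raise ValueError("need two non-empty lists of the same length")
    squared = [(p - a) ** 2 for p, a in zip(projected, actual)]
    return math.sqrt(sum(squared) / len(squared))


# Invented OPS+ figures for four hypothetical hitters; the first is a breakout.
actual = [154, 100, 90, 120]
system_a = [105, 102, 95, 115]  # hypothetical projection system A
system_b = [108, 98, 93, 118]   # hypothetical projection system B

rmse_a = rmse(system_a, actual)
rmse_b = rmse(system_b, actual)
print(f"System A RMSE: {rmse_a:.1f}, System B RMSE: {rmse_b:.1f}")
```

In this toy case both systems miss the breakout hitter by more than 45 points, and that single miss dominates both scores. System B comes out ahead only by being slightly less wrong on him and on the others.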

“While the details differ,” Arthur continues, “projections are by and large similar to each other.” If we were like the Mesopotamians, unaware of ourselves with unshakable external focus, perhaps this would be a point in our favor, an indication that we are on the right track, and as we continue to add information to the pile, our predictions will only improve. But we know better. We know there is a limit to what we can understand about the future. We know that it will be unique, and thus we know you can’t predict baseball (you just can’t!).

The value of an individual projection system, then, cannot be simply that it gets it right. Every form of divination is flawed. Any attempt to collect all information will inevitably fail–it will miss something that wasn’t recorded, something that wasn’t considered, perhaps even something that doesn’t exist yet. And no matter how logical the projection system is, this information hole at some point will reveal itself, and the world of projection and world of reality will from then on be fundamentally irreconcilable.

Projectionists, faced with a totally impossible task, should embrace the mystical side of the craft. A projectionist working towards increasing our knowledge of the game should feel less desire to slice decimal points off RMSEs and more desire to find something new within this beautifully massive set of data the projectionists work with to tell us about how the game is played and why teams and players succeed and fail. That Steamer says Steven Souza and Wil Myers will be equals in 2015 is far less interesting than the divination process that leads to this surprising, jarring conclusion, and whether it can tell us something about why our assumptions are so far removed from the world Steamer has created for 2015.

What’s more interesting, a projection system that gets it slightly less wrong than the rest of the pack, or the one that finds another Jose Bautista or R.A. Dickey? Such a system would have to be based on fundamentally different ideas than those currently seeding the projection processes used today. It would have to be built, essentially, from nothing. And any such successful system would be the result of a number of failures in the process.

It sounds daunting, to be sure. But our current methods do little but add an incremental amount of information to assist with the task each year. The Mesopotamians would be proud. We, on the other hand, should be fully aware of the flaws in the process. Without embracing this mysticism and admitting it is part of how we build and think about our projections, we are nothing but barus digging for insight in the digital equivalent of sheep entrails.

References & Resources

Arthur, Rob. “Moonshot: The Power of Projections,” Baseball Prospectus.
Jaynes, Julian. The Origin of Consciousness in the Breakdown of the Bicameral Mind. Houghton Mifflin Harcourt. Kindle Edition.
Russell, Bertrand. Mysticism and Logic and Other Essays. Kindle Edition.
Finkelstein, J.J. Mesopotamian Historiography. Proceedings of the American Philosophical Society, Vol. 107, No. 6, Cuneiform Studies and the History of Civilization (Dec. 20, 1963), pp. 461-472

Jack Moore's work can be seen at VICE Sports and anywhere else you're willing to pay him to write. Buy his e-book.

40 Comments

Dan Szymborski

10 years ago

To be honest, I find the tone of this whole piece to be quite insulting. It continually implies that there’s an unethical component to projecting. And it also, in a few places, shows a stunning ignorance of what projection systems actually are – that Michael Brantley had a 154 OPS+ in 2014 does not mean that his projection should have been considered the 50th percentile projection before the start of the season. That something happens does not mean that it was the likely event with the knowledge at the time. The idea that projection systems should have projected Brantley to be as likely to beat a 154 OPS+ as fall short is a ridiculous notion, some kind of creepy projection Calvinism.

Chris

10 years ago

Reply to Dan Szymborski

I have to agree. I’ve always thought that projections and scouting ought to (and practically must, in most front offices) go together hand in hand. Projections look at the numbers and tell you what you’re probably getting if everything stays the same. Scouting looks at a player and tells you what he might become. Prudence would dictate a proper consideration of both.

Jeff Long

10 years ago

Reply to Chris

Realistically teams (or particularly ambitious fantasy team owners) should look at projection models and then use scouting reports to identify if a player is likely to beat their 50th percentile projection or not. For example, Brantley’s 2014 season WAS in the results from ZiPS or PECOTA, you just had to move up into much higher percentiles to find it. Now if you have scouting info that says, “hey he changed his swing to generate more power” or whatever, then you start moving up the scale towards a higher percentile in the projection.

By combining the scouting data with projection models, you can (relatively easily) identify areas where the mean/median of the projection is too kind or too harsh.
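To make the percentile idea concrete, here is a minimal sketch that treats a projection as a bell curve around its median. The normal shape, the median of 105 OPS+, and the spread of 20 are all assumptions invented for illustration; they are not the actual ZiPS or PECOTA outputs for Brantley.

```python
from statistics import NormalDist


def percentile_of(outcome, median, spread):
    """Percentile (0-100) of `outcome` under a normal projection with the given median and spread."""
    return 100 * NormalDist(mu=median, sigma=spread).cdf(outcome)


def outcome_at(percentile, median, spread):
    """Projected value at a given percentile (strictly between 0 and 100)."""
    return NormalDist(mu=median, sigma=spread).inv_cdf(percentile / 100)


# Hypothetical projection: median 105 OPS+, standard deviation 20 (both invented).
MEDIAN, SPREAD = 105, 20

for pct in (10, 50, 90):
    print(f"{pct}th percentile: {outcome_at(pct, MEDIAN, SPREAD):.0f} OPS+")

# Where a 154 OPS+ season would sit in that hypothetical distribution.
print(f"154 OPS+ is roughly the {percentile_of(154, MEDIAN, SPREAD):.0f}th percentile")
```

Under these made-up numbers a 154 OPS+ season is inside the distribution but far out in its upper tail. Scouting information of the kind described above would be a reason to read the player's projection from a higher percentile than the 50th.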

owenpoin

10 years ago

Reply to Dan Szymborski

So, this post does have an old-school romance about our inability to predict the game, but I don’t get the torrent of criticism. Jack clearly thinks that ancient mysticism is cool (and so do I!). Jack sees a basic similarity in ancient divination and modern projection systems (sure, same basic mission). He grasps at some recent outliers (Dickey, Brantley, Bautista) to play up the place for magic in our modern, data-laden world. To this I would point him toward the seventh principle of Huna (Hawaiian mysticism): Pono, generally translated as “effectiveness is the measure of truth”. Ancient divination is super fun to think about, but effective? Probably not enough to win your fantasy league. And yes, Jack does play up something of a false equivalency between systems.

TL/DR: You can all chill out.

Matt

10 years ago

This is brutal. I can see The LA Times printing a piece that doesn’t understand how projections work or what purpose they serve, but THT?

Zach

10 years ago

This is a really, really brutal piece for THT to have published. Not really sure how this got past whatever editorial review is done here.

Jack seems to miss the entire point of what projection systems are for, and is instead writing some bizarre piece about divination and what HE WANTS projections to be. Qualitative scouting would seem to be a much better comparison to divination than projection systems.

I don’t WANT my projection system to find the next Jose Bautista or RA Dickey. If it projects seasons like that from guys out of nowhere, that’s a BUG, not a feature, and something that should be FIXED.

Vince

10 years ago

I’m not as surprised as some by this – disappointed, but not surprised – because I play in a fairly high-level sim league primarily populated with guys that understand how this all works, but every year as ZiPS comes out we hear stuff like “I don’t care what other 27-year old lefthanded hitting second basemen with a similar profile have done in the past, I want to know what Jason Kipnis is gonna do.” Or “How can my small-sample RP not project well? He had a 1.50 ERA in 24 innings last year! Don’t tell me about his lousy K and BB rates, or his unimpressive minor-league resume – he had a 1.50 ERA! WHAT DO YOU NOT UNDERSTAND?!?!?!?”

But yeah – this is THT. How the heck did this get here?

PS to the projection guys – Please keep whittling away at the RMS. We want accuracy, not Gotcha!

Rawson

10 years ago

This explains a lot: “Jack Moore’s work can be seen at VICE Sports and anywhere else you’re willing to pay him to write. Buy his e-book.”

Luis

10 years ago

This article is a complete disaster. Dan has every right to be offended by it.

This piece demonstrates a catastrophic lack of understanding about how projection systems work. The criticisms levied here either confirm what we already know about the deficiencies in the systems, or they just miss the mark entirely.

To cite mysticism on a website that is driven by objective, fact-based analysis is inexcusable. Furthermore, the Mesopotamia analogy was incredibly drawn out and flat-out bizarre. It served little purpose other than for Mr. Moore to show off his knowledge on the subject. How did this get past the editors?

Projections are not predictions. They are a measure of true talent level. Of course, players outperform and underperform their skills all the time. However, these systems are the most logical method in evaluating players and teams.

A projection system cannot find a Brantley or V-Mart, nor should it. To do so would be a bug, not a feature. The only method for accomplishing this is good old fashioned scouting. It’s a necessary component for fully evaluating a player.

I love THT, and will continue to return. That being said, Dan and any other projectionists are owed an apology.

Psy Jung

10 years ago

Reply to Luis

Why isn’t he allowed to cite mysticism? Is citing something equivalent to endorsing it, just as some people claim that depicting violence in art is equivalent to condoning it? I think it served a clearly defined purpose: the Mesopotamian methods and knowledge are fundamentally different and mistaken about the nature of causality, but they implicitly recognize the world as causal, and just as their methods were limited by their knowledge so are ours.

You say that projection systems by nature shouldn’t project breakouts like Bautista’s (using him because he’s an example of true true talent change, not just fluctuation) but if you came up with a method that let you predict variations from our regressionary model of projection… isn’t that just a better projection system? The limits of our knowledge aren’t the limits of what projection systems can be.

Also, and I think this is really important, scouting is just a different form of projection system.

Blue

10 years ago

Reply to Luis

The hell they aren’t. A projection IS a prediction and should be judged the way other predictions are: through the use of the adjusted R-squared statistic.

Jack Moore

10 years ago

I honestly have no idea where this idea that I think the projections are “unethical” is coming from. I think the Mesopotamian intellectual tradition is one of the most fascinating I’ve ever read about. They were voracious learners and data collectors, and I’m sure as hell not here to judge a 6,000 year old civilization for cutting open a few sheep. They were operating from a knowledge base of practically nothing and still managed to grow and build what was one of the early world’s largest and most brilliant civilizations. Seeing a comparison to these people as an insult shows, in my mind, a lack of curiosity. By all means, feel free not to consider the similarities. But I think it’s a missed opportunity to learn and grow.

Jack Moore

10 years ago

Reply to Jack Moore

For further reading on this I highly recommend the first essay in this Bertrand Russell collection. It’s free on Kindle http://www.amazon.com/Mysticism-Logic-Essays-Bertrand-Russell-ebook/dp/B004TRPS52/ref=sr_1_4?ie=UTF8&qid=1419363664&sr=8-4&keywords=bertrand+russell

Rick

10 years ago

Good read, don’t understand the criticism of it.

I tend to like the Gladwell style story telling that looks deeper into conventional wisdom

Paul Swydan

10 years ago

For my part, I will say this:

It is our goal to create thought-provoking baseball content. As such, sometimes articles will stray from what we consider our collective wisdom. I think that’s a good thing. However, in trying to achieve that goal, not all of those pieces are going to strike a chord with our readers, and we understand that, but we always want to leave the door open for new thoughts and ideas.

I also want to thank those who commented for keeping their comments civil.

Zach

10 years ago

Reply to Paul Swydan

Paul-

I don’t think this article is “straying from conventional wisdom.” I really, really like pieces/research that stray(s) from conventional wisdom. Those pieces are where you learn new and exciting things even if they seem offputting to you at first or even for a long time afterwards – see DIPS.

This piece doesn’t do that. Jack appears to be working from a totally uninformed or ill-conceived notion of what projections are for and what they are working toward. What he’s looking for,

A projectionist working towards increasing our knowledge of the game should feel less desire to slice decimal points off RMSEs and more desire to find something new within this beautifully massive set of data the projectionists work with to tell us about how the game is played and why teams and players succeed and fail.

is, I suppose, quite interesting. But I have no idea why he’s looking for this from PROJECTIONISTS. Those developing projections are working toward one thing. Jack’s asking for something totally different. Why is he asking those people for that thing? There’s no connection at all – he just talks for a while about the Mesopotamian intellectual tradition and then, for some reason, picks, seemingly haphazardly, projectionists as the people he’s tasking to take this on. This seems to be saying that projectionists are those who strive to understand everything that happens in baseball (???????).

Perhaps talking to a projectionist, seeing what they think their role or job or mission is, and grounding this piece in something like that would have helped a little.

Zach

10 years ago

Reply to Zach

Collective, not conventional.

Miguel

10 years ago

Reply to Zach

otm

Fundamental difference between understanding and challenging the status quo and just missing the entire thing.

ThePuck

10 years ago

Reply to Paul Swydan

‘we always want to leave the door open for new thoughts and ideas.’ That is a great goal to have, Mr. Swydan. Unfortunately, this article doesn’t do that. It does the exact opposite. It tries to close the door on new thoughts and ideas by trying to keep people in the past. There aren’t any new ideas or new thoughts in this. This article falls under the ‘contrary’ label. Contrary to the standard level of intelligent thinking this site and sites like this normally have.

J. Cross

10 years ago

The baru stuff makes this projecting business sound so much more exciting than it is. We basically try to estimate true talent and then stick in some aging and park adjustments.

I think the end goal of minimizing RMSE is actually a good way of keeping yourself grounded in your pursuit of new insights in the mess of data. It’s the only way to make sure that you’re not just looking at entrails and adding noise to Marcel.

Psy Jung

10 years ago

Reply to J. Cross

I think the criticisms of this piece are somewhat knee-jerkish, probably because the usual tenor of pieces that probe projection systems is reductive, which in combination with the references to superstitious rituals makes it easy to project onto it a Dan Shaughnessy-style interpretation.

BUT
if Mr. Moore will permit me to act as his interpreter, what I got from the piece was a look at the meta-process of projection. Which, as the above comments show, we seem to take for granted and yet which relies on certain assumptions about the physical universe, particularly the foundational idea of causality, or that the physical world contains within itself its own articulation, and that by observing the current state of its constituent elements we can create generalizations of its future states. Frankly, the idea that projection systems should give us this kind of median assumption and not project outliers is a false one because it ignores the implicit universality of this fundamental assumption (i.e. that given the correct data we should be able to project Michael Brantley’s breakout because he is a part of the physical world, not the manifestation of a divine whim) and takes the limits of our knowledge to be the absolute limits of the process of projection. As Szymborski notes above, it’s absolutely true that given our current knowledge it’s absurd to project Michael Brantley for a 154 OPS+. I don’t think the piece is criticizing current projection systems for not doing so. It’s more of a recognition of these same epistemological limits that Szymborski acknowledges, and the fact that beyond them lies a projection system that can account for deviations that our current projection systems fail to pick up on.

The references to Mesopotamia clearly aren’t equivalencies in either process or results – I think Moore makes this clear enough – but an equivalence in terms of the spirit of looking at the physical world as a self-contained, deterministic system. And as we take for granted our current methods that Mesopotamian society could not even conceive of, the future of projections could be such that we’re unable to conceive of them.

Psy Jung

10 years ago

Reply to Psy Jung

Sorry Jared Cross, that wasn’t supposed to be a reply to your comment!

Kenny

10 years ago

Reply to Psy Jung

So maybe somebody should come up with a projection system that isn’t premised on the universe being governed by causal laws? Is that the big idea?

Now I know I’ve been had. This is parody.

Kenny

10 years ago

I was sure this was a parody until I read the comments. Just to point out the obvious: Moore’s comments about the great gulf between logic and mysticism absurdly entail that *all* forms of prediction are tantamount to divination, whether they be about whether the sun will rise, wax will burn, baseballs will eventually fall to earth, or Cole Hamels will have 3.00 BB/9. The whole argument is an amateurish mangling of simple philosophical ideas. Also, just on an ostensibly non-existent field: http://plato.stanford.edu/entries/logic-inductive/

Phil Stewart

10 years ago

“Logic is solely deductive; it lacks the ability to create. ”
Logic encompasses deduction, induction, and abduction. Charles Peirce (pronounced like “purse”), the American pragmatist philosopher and founder of semiotics (or semiology, depending on what you like to call it — think of the kind of work Umberto Eco does), introduced abduction as “guessing.” I’m not convinced that deductive inference lacks the ability to create, but I’m in the minority on this, and I think your basic point is perfectly reasonable. Abductive inference is actually pretty interesting and has received attention from philosophers and more recently, artificial intelligence researchers (which is only philosophy in an archaic sense, but I’m not into treating these fields as radically distinct.)

There’s more to dig into in your article, and it will take some time to absorb. Predicting individual performance is, however, a way of predicting how a complex system (a human being) will perform in the context of an adversarial game (baseball) that is itself a complex system embedded into other complex systems (economic, environmental, etc.). The “information hole” you speak of seems like it could take a variety of forms.

I think back to games of baseball I’ve played in, and what was truly wonderful about playing in them? Was it the perfection of the field and the regularity of the fences and the standardization of competition, predictability of hops enabling only the best to compete successfully? Or was it the asymmetry and idiosyncrasy of the fields, of the land we played on, the buildings one could hit home runs over, the leaves the line drive bounding over the center fielder’s head could get lost in, the eddies of air currents that drew a ball in a trajectory that enabled one to get to it in time, the presence of rust on the fence dissuading outfielders from crashing into it, the parking lot to left field or to straightaway center that could take a long home run and bounce it onto a gas station beyond? Some of those affected the outcomes of plays, and the outcomes of games. They affected how players tailored their swings sometimes.

We play this game of statistics to try to predict outcomes, but isn’t a lot of the enjoyment in the game just not measurable, not assignable with a single number, and doesn’t it live exactly where that information hole is? Isn’t a lot of the fun of baseball not a matter of who wins and who loses, whether an out was recorded, etc., but right in the information hole about how a human being without time to think about it enters that zone of liminality where thought cannot happen, discovers some way to get to a ball and haul it in, or in an incomprehensibly short instant brings bat to ball perfectly, and like a chess player finding a place in game-space never recorded in “the book” before, somehow makes a play (think about some of those plays Jose Iglesias has made, think of Fred Lynn, think of Jackie Bradley).

Projections — they’re like throws, projectiles coursing through uncertainly composed air, on an uncertain path, and if projected just so, make one hop into an infielder’s glove, and sometimes that throw is perfect. So there’s skill (draw a throw cross-seam, around a runner, almost impossibly?) — what fun would it be if it were really predictable, what its result would be? Yeah, it feels like there has to be something mystical in it, there has to be an information hole, and the composition of forces determining any outcome are outside of anything knowable. Phenomena that feel like noumena.

Isn’t the game of projection a little bit like the game it describes?

Psy Jung

10 years ago

That’s not what I wrote – I meant that because projection systems rely on this causal assumption, given enough data you should be able to project things like breakouts. Those data wouldn’t necessarily be on-the-field stuff. Theoretically if you had infinite data you’d even be able to predict random fluctuations in performance – what we call randomness is really just causality that is impossible to account for. The article is a sorta clarion call to not settle on the current model of projections and to keep trying to find a way to account for things that our models don’t address, not the hackjob it’s been labelled as. I think it has a really clear view of the philosophical nature of projection, the sort of fundamental stuff that lets you see beyond the current limitations of knowledge and imagine something new – y’know, kinda like early sabermetrics?

Psy Jung

10 years ago

Meant as a reply to Kenny.

Phil Stewart

10 years ago

While we are in the realm of meta-theory, here’s a little more.

Psy Jung said, “Theoretically if you had infinite data you’d even be able to predict random fluctuations in performance – what we call randomness is really just causality that is impossible to account for.”

That is, theoretically according to one particular theory. It’s a version of “Laplace’s demon,” and the trick in it is the recourse to “infinite data.” Arbitrarily fine-grained pictures of a dynamical system still fall prey to “chaos,” kind of a mathematical sales word for nonlinear systems’ departure from predictability, which is exponential. (See Lyapunov Characteristic Exponent, LCE.) Wholly causal, deterministic systems fall prey to this.

So, there is a shelf life for predictions.
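
To make the “shelf life” idea concrete, here is a minimal sketch using the logistic map, a standard toy chaotic system. It is an assumption chosen for illustration, not a model of baseball and not something from the comment above. The function names (`lyapunov_estimate`, `prediction_horizon`) and all parameter values are hypothetical. The sketch estimates the Lyapunov exponent and counts how many steps it takes for two nearly identical starting points to drift apart.

```python
import math


def logistic(x, r=4.0):
    """One step of the logistic map x -> r * x * (1 - x)."""
    return r * x * (1.0 - x)


def lyapunov_estimate(x0=0.2, r=4.0, n=10_000, burn_in=100):
    """Estimate the Lyapunov exponent as the average of log|f'(x)| along an orbit."""
    x = x0
    for _ in range(burn_in):  # discard the transient
        x = logistic(x, r)
    total = 0.0
    for _ in range(n):
        # f'(x) = r * (1 - 2x); the floor avoids log(0) if x lands exactly on 0.5
        total += math.log(max(abs(r * (1.0 - 2.0 * x)), 1e-300))
        x = logistic(x, r)
    return total / n


def prediction_horizon(x0=0.2, eps=1e-10, tol=0.1, r=4.0, max_steps=1000):
    """Count steps until two orbits starting eps apart differ by more than tol."""
    a, b = x0, x0 + eps
    for step in range(1, max_steps + 1):
        a, b = logistic(a, r), logistic(b, r)
        if abs(a - b) > tol:
            return step
    return None


lce = lyapunov_estimate()
horizon = prediction_horizon()
print(f"estimated exponent: {lce:.3f} (theory for r=4: ln 2 = {math.log(2):.3f})")
print(f"steps until a 1e-10 measurement error exceeds 0.1: {horizon}")
```

A positive exponent means a small measurement error grows roughly like exp(LCE × steps), so shrinking the initial error a thousandfold only buys a handful of extra steps of useful prediction in this toy system.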

The theory of perfect prediction from perfect data arises in a physical system, and however conceptual, *as* a physical system, subject to the laws of the system it arose in (the peculiarly evolved human brain, its cultural context). It wasn’t such a bad theory, in the sense that it turned out that it could be tested and found in some instances to be false (so it fulfills Karl Popper’s “falsification criterion” of a scientific theory).

What’s interesting in Mr. Moore’s article is its twin appeal to unpredictability and call for new theories of prediction (or projection, whichever term you prefer). There’s some ontological tension in that! It seems to be saying two kinds of things: Regression-based prediction is only valuable in a general sense, so individual players and situations will still give us big surprises; and, given the similarity among these projection systems’ results for outliers or surprises (like Michael Brantley), there is room for theoretical innovation to take place. Presumably that innovation will come from outside the (linear? nonlinear?) regression framework.

This framework, to the extent that it is fundamentally correlative (regression is mathematically related to correlation) is without commitments to causation. (corrections welcome! 🙂 ) Bringing in personal attributes like height and weight, as you report for Nate Silver, brings in at least proxies for causation, so I’m going to hesitate to commit myself further than this, but you can already see in this the kernel of a move beyond a correlation-based method.

If you’re looking for player attributes that enter into causal models, then you can start to look at physiological measures of reflexes, eyesight, coordination, short-term memory span, etc., and personality indices like the Minnesota Multiphasic Personality Inventory (if you can validate its relevance to acquisition of high-level baseball skills). Validity is not absolute: it’s domain-specific, by definition. So by validity I mean, “does this work as a way to project baseball player performance?”

The big problem for these extensions of projection systems is the fundamentally proprietary (and private) nature of these data. It might be an interesting computer vision problem to see if you can find proxies for reflexes in gameplay video, but is that any better than a scout hearing a pitcher has caught X number of line drives passing through his position?

Whatever the provocative nature of Mr. Moore’s talk about the Mesopotamians and their entrails, there’s a restlessness for new theory in his piece here. He suggests, at least in their failure to anticipate breakout seasons, a hint of similar assumptions (or shared datasets) that present models share — short of a monoculture, but — open to diversification?

For more on the merits of diversification for science, there’s a great essay I can recommend to you. Fans of “neural” networks (who may be using versions of them to build their models) will recognize the computational theory Paul Churchland uses to develop a rich, quantitative metatheory of his own:

“A Deeper Unity: Some Feyerabendian Themes in Neurocomputational Form,” by Paul M. Churchland, anthologized in _On the Contrary: Critical Essays, 1987-1997_ by Paul M. Churchland and Patricia Smith Churchland. MIT Press (1998)

Calvin Liu

10 years ago

What I got from the article was that all of the present projection systems are based on the same fundamental methodology as the baru: the accumulation of previous facts to predict future behavior in similar situations.

Is this an incorrect understanding?

The second point I got from the article is that any system which purports to be able to identify outliers (breakouts, breakdowns) would have to use a fundamentally different underlying basis.

Again, is this incorrect?

Now, Mr. Szymborski complains that the projection systems aren’t intended to predict what happens, but what is “likely” to happen. While I understand that distinction – frankly this is a bit of sophistry. A system which “projects” likely performance while making all sorts of assumptions still intends to predict, only in a specific, limited context.

Not to be nasty – but how is that different than a diviner who after making a prediction based on feng shui or whatever – replies that the actual events that occur later were because of different circumstances? The feng shui diviner whose client ends up getting hit by a jet engine that detached from an overflying jet could legitimately say that such events are not covered by the I Ching, but the reality is still that there was a jet route overhead.

Again, not complaining about the projection products. Simply noting that defensive restrictions to environment in order to protect “accuracy” are pointless and equally not meaningful to most people anyway.

charlie

10 years ago

I like the discussion…
can we predict with any reliable means what will transpire over the course of 162 games?
Whatever methods we choose to Believe In/Put Faith In really only serve to justify our deep connection to this child’s game of ball.
I find it easy to read all the hard work of the folks who like number crunching (big thank you), but know if I want any edge over my league mates I need to look further in predicting the season. There is evidence to be discovered beyond what the numbers tell us. Most, not all, of these predictors are drawn from past performance. Something in the environment that gives a certain result an advantage over the course of the season.
For this ‘environmental’ influence as a predictor I look to a variety of lesser factors that will influence performance. For example: is the player in question being asked to adjust their role (see Joe Mauer)? Or is a ‘fresh start’ advantageous (see Edinson Volquez)? Certainly there are numerous lesser factors to consider, but as I wrote earlier… our biases are based on how we relate to this game. I prefer to look into areas that suggest the mental aspect, and don’t hesitate to base my final decisions on such subjective predictors.

charlie

10 years ago

Reply to charlie

I only feel qualified to submit an entry to this discussion because I selected Brantley in a keeper league draft (#259). I don’t believe I reached to make this selection, so I don’t pretend the year Brantley had proves my methods. Case in point… Carlos Quentin who I drafted #309.

So it goes.

ThePuck

10 years ago

This could very well be the worst article ever linked to the Fangraphs site. I’ve already seen people posting quotes from this rubbish other places and saying Fangraphs endorses this line of thinking, which is sad. This is something Harold Reynolds would write, if it wasn’t for all the big words.

Adrock

10 years ago

Reply to ThePuck

I liked the article, and am in the same boat as those who don’t quite understand the degree of vitriol in some of the comments. I think Calvin Liu’s interpretation is a fair one. The article accepts that the projection systems are good at what they do, but what they do is limited by the common assumptions shared by the projectors.

Dan’s complaint suggests that no projection systems will ever pick up on the big breakouts, and that it’s impossible. I think ZIPS and STEAMER et al are great for what they do, but this article hints at the models’ fundamental limitations.

Most writing about breakouts takes projections into account and builds on them; that seems to be the best that can be done right now. I think Mr. Moore is dreaming bigger dreams on the projectors’ behalf…

ThePuck

10 years ago

Reply to Adrock

The problem is that this is all based on the idea that projections are predictions. Today, Fangraphs did an article about the American League projections and asked fans how they felt about them. Not one AL team had a projection of more than 88 wins, and the one team that had 88 was Seattle. Does anyone actually believe the people who make the projections are predicting that no team will win more than 88 games? When was the last time the AL win leader only had 88 wins? The projectionists understand there’s more to the true outcome of wins and losses than what the projections say. They don’t try to make the projections into any more than what they are. They don’t try to make them out to be predictions.

Jack Moore wants the projections to somehow be predictions and, since they aren’t, he wants to slam them and he also questions the integrity of the people who make them, which is even worse.

Truth is, last year’s projections were very close to what happened overall. The breakdown can be read here: http://www.hardballtimes.com/evaluating-the-2014-projection-systems/

Mike S.

10 years ago

This is like getting mad at the dice when you roll a 12, when the most likely result would have been 7.

Any projection system that tries to predict low-probability results will get the occasional spectacular success, but still fail most of the time.
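
The dice analogy can be checked by brute-force enumeration. This is a small sketch, assuming two fair six-sided dice; the function name `two_dice_distribution` is made up for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def two_dice_distribution():
    """Exact probability of each total when rolling two fair six-sided dice."""
    counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
    return {total: Fraction(n, 36) for total, n in sorted(counts.items())}


dist = two_dice_distribution()
most_likely = max(dist, key=dist.get)

print(f"most likely total: {most_likely} with probability {dist[most_likely]}")
print(f"probability of a 12: {dist[12]}")
```

A forecaster who always calls 7 is right one roll in six, and one who always calls 12 is right one roll in thirty-six. The second forecaster gets the occasional spectacular hit but is wrong far more often, which is the trade-off described above.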

charlie

10 years ago

Reply to Mike S.

Now here is the most well-thought-out response. And it only took him two sentences.

charlie

10 years ago

Reply to Mike S.

AND

written by a veteran Strat-O-Matic player, I’m guessing.

Marty

10 years ago

“A projection system cannot find a Brantley or V-Mart, nor should it. To do so would be a bug, not a feature. The only method for accomplishing this is good old fashioned scouting. It’s a necessary component for fully evaluating a player.”

I agree. For a good example of this sort of divination-by-scouting, see last winter’s Fangraphs piece on J.D. Martinez.

http://www.fangraphs.com/blogs/rule-5-dark-horse-j-d-martinez/

10 years ago

Reply to Marty

Sweet example. I guess the items that can’t be distilled into projection model inputs should be called “entrails” in light of this article lol.

In all seriousness, if your goal is to guess better than the pack, whether as a GM or a fantasy player, as more information becomes readily available, the greater the marginal value of using these “entrails” as the basis for decision making (see Jeff Long’s comment near the top of the comment section)

Blue

10 years ago

RMSE is NOT the right metric to evaluate a projection–and that is exactly what these systems provide, projections. One MAJOR reason for this is that the R-squared statistic shows the proportion of variation explained by the model, which RMSE does not communicate.
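
For readers who want to see how the two metrics differ, here is a minimal sketch that computes both on the same data. The win totals are invented for illustration and do not come from any real projection system. R-squared is computed in its conventional form, 1 − SS_res / SS_tot, the share of variation in the actual results that the projections account for. The function names are hypothetical.

```python
import math


def rmse(actual, projected):
    """Root mean squared error, in the same units as the data (here, wins)."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, projected)) / len(actual))


def r_squared(actual, projected):
    """1 - SS_res / SS_tot: share of variation in `actual` accounted for by `projected`."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, projected))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Made-up win totals for five hypothetical teams (illustration only).
actual_wins = [98, 90, 83, 77, 66]
projected_wins = [88, 86, 82, 80, 74]

error = rmse(actual_wins, projected_wins)
fit = r_squared(actual_wins, projected_wins)
# Baseline: project every team at the league-average win total.
baseline_fit = r_squared(actual_wins, [sum(actual_wins) / len(actual_wins)] * len(actual_wins))

print(f"RMSE: {error:.2f} wins")
print(f"R-squared: {fit:.3f}")
print(f"R-squared of the everyone-is-average baseline: {baseline_fit:.3f}")
```

RMSE reports the typical miss in wins, while R-squared reports how much better the projections do than simply guessing the average for every team. The two answer different questions, so either one alone can give an incomplete picture.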
