How Teams Can Get the Most Out of Analytics

Even the most sabermetric-based teams could get even more from the available data. (via Scott Slingsby)


Statistics and numbers have always permeated baseball. From its earliest days, observers of the sport have attempted to catalog the performance of the game in various ways using numbers. Those efforts have evolved from merely trying to describe or recreate the game numerically to understanding, explaining and predicting performance on the field. We can safely say that the transition to an era where virtually every front office looks to data and advanced analytical methods for some kind of advantage (or, at the very least, to keep up with the competition) is complete. Whether it is a department of one or an army of analysts, every major league team has individuals focused on collecting and statistically analyzing data about player performance.

A decade ago, when only a handful of teams were serious about gaining an advantage through thoughtful analysis of data, it was easy to see how the few could identify and exploit market inefficiencies. But in a world where everyone is on the prowl for these inefficiencies—small and large market teams alike, leveraging largely the same data sources—is there really any advantage left for teams to gain from data? Or has it become mere table stakes: there may be no marginal gains from the practice, but opting out means putting your team at a structural disadvantage?

I would argue that there are still marginal gains to be made, and there are data that suggest the same. Ben Baumer and Andrew Zimbalist looked at whether the competitive balance across major league baseball had improved since the growth and spread of analytics. Interestingly, they found that whatever tightening of competitiveness had been achieved was likely more a function of changes to revenue sharing. So even if all teams are leveraging data today, they aren’t all getting the same return from it.

Data can be powerful, but it would be a mistake to assume that simply having data will bring about an advantage, or even the same advantage. Data alone can’t improve performance. Moving from data to insight to change is no easy task, and teams have to overcome plenty of barriers along the way.

The Four Foundations of a Data-driven Organization

So what does it take for a team to get the most out of data? In my experience, leveraging data effectively requires organizational alignment and optimization among four key areas: metrics/insights, data/systems, people and culture. This is true for any organization—a major league team, a government agency, a toy manufacturer, you name it. Each area is dependent on the others and each can act as a failure point or barrier. I know that sounds like a bunch of consultant speak, so let’s break it down more practically:

1. Right metrics and insights. Make no mistake, the quality of metrics, and of the insights generated from them, matters a great deal. You need metrics that are predictive, not just covariates of the things that matter (in the sense that when they change, outcomes such as revenue, profit, or run scoring change as well); reliable (meaning their correlational or predictive power will not fade over the short, medium, and ideally long term); and easily used by all sorts of actors and decision-makers (leaders in baseball operations, coaches, and even players).

Metrics come in three flavors, each valuable in its own way: descriptive metrics tell you what has happened, diagnostic metrics tell you why something happened and provide hints on how to fix and optimize, and predictive metrics tell you what will happen. Predictive metrics are generally also the most reliable over time and the most actionable for decision-makers and front-line employees.

2. Right data and systems. For organizations to fully apply their data-driven insights, they must have the right infrastructure in place. The right metrics are based on “good” data (i.e. high fidelity, low latency, relevant data) and analysis is aided by efficient systems that make it easy to collect, merge, analyze and share the data and resulting insights. Usefulness in the field is based on systems that allow for the efficient organization, processing and timely reporting of relevant metrics. How data and insights are consumed is an area ripe for failure: as in advertising, the best messaging is meaningless if people don’t see it or can’t easily consume it.

3. Right people. Data can’t change anything by itself. Individuals need to do something (ideally, the right thing) with the numbers. Organizations need people throughout the hierarchy who understand the right way to use the data at their disposal, as well as the inherent limitations of data.

This idea applies equally to an organization’s analytics team, decision-makers, and all other employees whose behavior determines how the organization performs. Another fundamental need is employees who can deliver on the actions and behaviors the data and models suggest are optimal. One of the most common failure points I have come across is a lack of the appropriate talent and skill required to execute.

4. Right Culture. Finally, it’s critical how an organization works, how it gets done what it needs to get done. If the organization does not encourage being data-driven, if it incentivizes the wrong behaviors or wrong metrics, it won’t get the most out of its data. This, then, is fundamentally about organizational culture and making sure that culture is aligned with how the organization wants to conduct itself. Often, I have found significant misalignment between what an organization’s analytics program says to focus on and what is actually incentivized (both formally through pay plans as well as informally through expectations and mentorship). The best modeling in the world can often be stifled by an unreceptive culture.

How the Four Foundations (Might) Play Out in Major League Baseball

Okay, so we have a framework to organize our thoughts around what it takes for any organization to get the most out of data and analytics. Let’s see how each (might) play out in a major league team setting. I’ll deal with the first two today, and the last two tomorrow.

Right Metrics and Insights

The kinds of metrics a team (or any organization) can focus on fall into three buckets:

  • Descriptive analytics: Data that describe what has happened or is happening — essentially, reporting. In baseball, think “the back of the baseball card.”
  • Diagnostic analytics: Data that show a potential relationship between two or more variables and help explain why something happened.
  • Predictive analytics: Data that are used to predict future events.
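As a toy illustration of the three buckets, a sketch in Python (the season lines are hypothetical and the projection is deliberately crude; this is not any team's actual model):

```python
# Hypothetical season lines for one hitter: H = hits, AB = at-bats,
# BB = walks, HBP = hit-by-pitch, SF = sacrifice flies. Illustration only.
seasons = [
    {"year": 2013, "H": 150, "AB": 520, "BB": 60, "HBP": 5, "SF": 4},
    {"year": 2014, "H": 140, "AB": 510, "BB": 70, "HBP": 3, "SF": 5},
]

def batting_average(s):
    # Descriptive: summarizes what happened.
    return s["H"] / s["AB"]

def obp(s):
    # Diagnostic input: folds in walks, hinting at *why* a hitter creates runs.
    return (s["H"] + s["BB"] + s["HBP"]) / (s["AB"] + s["BB"] + s["HBP"] + s["SF"])

def naive_projection(history, weight=0.7):
    # Predictive: a crude forecast that weights the most recent season.
    avgs = [batting_average(s) for s in history]
    return weight * avgs[-1] + (1 - weight) * avgs[0]

print(round(batting_average(seasons[-1]), 3))  # descriptive
print(round(obp(seasons[-1]), 3))              # diagnostic input
print(round(naive_projection(seasons), 3))     # predictive
```

Real projection systems are far more sophisticated, of course, but the division of labor is the same: describe, explain, forecast.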

These three categories also frame the kind of work an analytics staff can spend time on. Organizations outside of baseball tend to spend a great deal of time and effort creating reports and summaries that do not actually support their decision-making with data-driven insights. They stare at reams of descriptive reports telling them what has happened (e.g. Tommy had the highest sales this quarter, Susie worked the most hours last month, etc.). This isn’t to say that reporting isn’t useful, but there is a danger in relying too much on descriptive data that describe what has happened instead of suggesting what will happen in the future. It’s like trying to drive by primarily looking at your rear-view mirror.


Metrics and insights are probably the most developed of the four foundations across major league baseball. While there is still some variance in the quality of metrics that teams focus on, most teams have moved beyond “baseball card” numbers and have fixed their gaze on metrics that do a better job of diagnosing and predicting performance at both the individual player and team level.

It has been many years since teams relied on counting statistics, such as home runs and runs batted in, and simplistic rate statistics, such as batting average. Teams have moved on to focusing at least on a position player’s on-base percentage and slugging percentage, which taken together do a much better job of predicting team offensive performance and are more stable at the individual level, year over year. Similarly, many teams now pay more attention to secondary metrics for pitchers–such as their walk and strikeout rates–and less attention to their win-loss records.

An analytics department in a major league front office is asked to tackle many things. For example:

  • Build and maintain baseball information systems—databases where various types of quantitative and qualitative information live and are made accessible to others
  • Produce reports for front office decision-makers, scouts, managers, coaches and players
  • Create long-term projections, forecasts, and scenarios at the player, team and league levels
  • Conduct transactional analysis for decision makers—determine whether a player is worth trading for given his current and likely future performance
  • Conduct exploratory analysis/new research

There is a lot of potential work hidden within each bullet point, and potentially some overlap. Every team is going to distribute its time among these five activities a little differently. Partly, it depends on the size of the staff, which will determine what types of projects can get done given time limitations. Long-term projections certainly fall into the predictive analytics bucket, but there is arguably the same or greater demand for diagnostic analytics that help explain why players are performing a certain way and how to improve that performance. This represents probably the greatest area of opportunity from a metrics and insight standpoint—helping players and coaches diagnose their performance and prepare for their upcoming opponents.

For example, much work has been done in baseball to determine what metrics provide insight into a player’s future performance, not just what he’s done. In fact, analytically advanced front offices spend a great deal of time devising and running player projections—both for players under their control and the balance of players available in the major and minor league systems. In a recent profile, Keith Woolner, the Indians’ director of analytics, talked about his team’s focus on projecting player, team, and league performance:

“We’re very forward-looking,” Woolner said. “What is this player going to contribute to the club this year, next year, in three years? How are these prospects going to develop? What is the makeup of the team going to look like and how competitive is that going to be?”

In the same profile, Sky Andrecheck—a colleague of Woolner’s in Cleveland—said that about 70 percent of their time is focused on developing player projections over the long term, not just the current season.

I can’t say for sure that every other front office has its analytics team focused on projections 70 percent of the time, but I suspect that there is more than just pure projections happening in that 70 percent for Cleveland.

I checked in with analytics personnel at three other teams to get a sense of how they distribute their time among the five activities that I listed, and here are the results:

How Do Front Offices Spend Their Time?

Analytics Department Activities                Team A  Team B  Team C  Average
Building/maintaining systems                      30%     20%     30%      27%
Developing/producing reports                      10%     20%     25%      18%
Long-term projections                              5%     35%     10%      17%
Transactional analysis                            30%     15%     20%      22%
Exploratory analysis/new research                 25%     10%     15%      17%
Excluding exploratory analysis/new research       75%     90%     85%      83%
Projections & transactions                        35%     50%     30%      38%
Projections/transactions/reports                  45%     70%     55%      57%
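The derived rows at the bottom of the table are just per-team sums and rounded averages of the activity rows above them; a quick sketch to reproduce the arithmetic:

```python
# Time allocation by activity (Team A, Team B, Team C), from the table above.
allocation = {
    "systems":      (30, 20, 30),
    "reports":      (10, 20, 25),
    "projections":  (5, 35, 10),
    "transactions": (30, 15, 20),
    "exploratory":  (25, 10, 15),
}

def avg(vals):
    # Rounded mean across the three teams, matching the Average column.
    return round(sum(vals) / len(vals))

for activity, teams in allocation.items():
    print(activity, avg(teams))

# Derived row: each team's total time excluding exploratory work, then averaged.
non_exploratory = [sum(allocation[k][i] for k in allocation if k != "exploratory")
                   for i in range(3)]
print("excluding exploratory:", non_exploratory, avg(non_exploratory))
```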

None of these three other teams said they spent more than 35 percent of their time on long-term projections. However, they all said there is a large degree of overlap. For example, projections and transactional analysis typically go hand in hand–you can’t really conduct a transactional analysis unless you take into account the projected performance of the players involved in any potential trade or signing. What jumps out is that the single activity with the most time dedicated to it is building and maintaining data systems.

While important, the activity with the lowest average time dedicated to it is exploratory analysis and new research. What do I mean by this? Well, with the other four activities we are typically working with existing or known metrics. Baseball systems are populated with existing data or custom metrics that have already been derived. Reports will generally include data taken from existing databases (e.g., a hitter’s tendencies against certain types of pitches in certain counts), long-term projections and forecasts will typically leverage an established projection model or system (whether public or team-specific), and transactional analysis will rely on existing information and established modes of analysis.

Breaking new ground, however, takes quite a bit of time. Coming up with new ways of projecting player talent, discovering new inefficiencies in the free-agent market, gaining a greater understanding of how new types of data can be integrated into existing projection systems and methods of analysis aren’t always prioritized over the existing needs of the day by an analytics department’s internal customers.

Now, the public analytics community commits a fair amount of its time to exploratory analysis, and in some cases teams will rely on those studies or at least use them as inspiration for their own analysis. The problem is that there is quite a bit the public doesn’t know about the current state of knowledge—since teams keep their own research and findings confidential—and public researchers don’t have access to many data sources. This means that teams can’t simply rely on public research for exploratory analysis. This is where consultants often come into play.

Diagnostic versus predictive analytics—who benefits the most from each? There is obviously a big emphasis on prediction and projections from a roster construction perspective, but there are ample opportunities to provide insights to managers, coaches and players they can use to improve their own performance. There is naturally some ebb and flow to player performance; the key is understanding when that ebb and flow is due to randomness, what can be done to make improvements when it isn’t, or why a player has elevated his performance. Diagnostic analytics are all about why something happens, so it’s extremely valuable for personnel on the field to understand why they are struggling or why some opposing player is performing so well against them. One major league player told me that can be more valuable than the projections that are often a bigger focus of the front office. More on this later.

So let’s assume a team has worked out all the metrics and insight bugs; how do those metrics and insights go from the databases or reports to actually changing the way the team performs? This is where there is far less research and knowledge. But it’s fair to say that baseball isn’t the only industry where this is the case. I’ve worked with organizations in all sorts of industries where the question of how analytics actually makes its way through the organization is a huge challenge.

The next three foundations might represent what has come to be known as “the softer side of sabermetrics.” Even leading proponents of analytics have come to understand that it takes more than great data and models to change team performance.

Right Data and Systems

The first thing to remember is that the analytical process itself starts with data–historical, current, primary, third-party, etc. Analysis can’t really start unless we have a source of data to analyze. But as obvious as this might seem, it’s not a trivial point. Not all data are created equal, nor are all data or data sets complete or free of bias. Ensuring that a team has the highest-quality data, available with the least amount of latency, is an important task. It relies, in large part, on having the right technical infrastructure in place. This infrastructure needs to be reliable (experience as little downtime as possible), easily accessible by relevant users, and able to handle large quantities of data and perform extensive queries quickly.

As the information revolution has blossomed, so too has the need for teams to upgrade their data systems. The advent of PITCHf/x data exponentially increased the available data that teams have access to, with the volume growing daily. Combine this with data from various hit tracking and defensive tracking systems and you’ve easily tripled what was already a massive increase. Combining numerical data with the large amount of video now at the disposal of players, coaches and front office personnel adds another layer of complexity and strain on these systems.

The importance of infrastructure is one reason you see so many advertisements for interns and front office personnel where programming, developer and database skills are emphasized as much as baseball experience or statistical/research skills. But the basic “pipes” are just one part that needs to be considered. How users interface with the data, what they have access to, how they can digest the information—these key issues can absolutely impact how much positive impact data and analytics have. Here are two examples where this matters.

First, the degree to which there is automation or self-service options for more simple, mundane tasks will determine how much time an analytics team can dedicate to, well, analysis, as opposed to providing users with basic reports. A lot of time can be wasted if your analytic team is manually preparing reports that are essentially just the same report updated with new data. More time can be wasted by constantly responding to ad hoc requests from internal customers who don’t have access to the information themselves. (Of course, those customers also may not have the right skills or knowledge to use the data correctly, which we will get into in the next section.)

Related is the usability of the systems by people outside the analytical department. Putting data in the hands of other users can free an analytics team to do more diagnostic and predictive work, but that only works if users can easily digest the information they get from the system.

Take coaches and players, for example. Most do not have an analytical background, nor do they have time to break down and interpret complex data or visualizations. They need information that helps them answer their questions, presented succinctly in a way they can easily understand. The last thing you want to do to a non-analytical person is flood him with data and make him wade through it himself, looking for the nugget or insight. Why? You increase the chances that the user becomes frustrated and abandons the data for his gut feelings, or ends up chasing false leads. Our brains are great at finding patterns, whether those patterns are real/meaningful or not.

How information gets communicated—either directly by members of the analytics team, or indirectly through reporting systems—can make a huge difference in whether the metrics and insights are actually leveraged by managers, coaches and players or if they are used in the right way.

Ben Lindbergh wrote a great profile on the Pirates and how they have approached the use of data and analytics. The article focused on how Mike Fitzgerald and Dan Fox have worked to ensure that the insights they generate at the top of the house make their way down to the field. In one example:

Fitzgerald and Fox have discovered that coaches tend to be visual learners. “We both found that more often than not, if we can figure out a way to communicate something visually, we can show it to these guys, and then all of a sudden, the message that we were trying to get out in words in six to seven minutes, they pick up in 20 seconds,” Fitzgerald says. Even video clips can be compressed into an easily absorbed image, he adds. “Instead of saying, ‘We have 35 video clips for you to go through,’ we can say, ‘Here’s a quick heat map of what happened with all these 35.’”
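A heat map of the sort Fitzgerald describes is, at bottom, a 2-D histogram of pitch locations. A minimal sketch, using randomly generated (x, z) coordinates in place of real pitch-tracking data:

```python
import numpy as np

# Hypothetical (x, z) pitch locations in feet, catcher's view; in practice
# these would come from the pitch-tracking records behind the 35 video clips.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 0.8, size=35)   # horizontal location
z = rng.normal(2.5, 0.7, size=35)   # vertical location

# Bin the 35 pitches into a coarse grid; each cell's count is the "heat."
heat, x_edges, z_edges = np.histogram2d(
    x, z, bins=5, range=[[-1.5, 1.5], [1.0, 4.0]]
)

# Print the grid as rows of counts, top row = highest pitches.
for row in heat.T[::-1]:
    print(" ".join(f"{int(c):2d}" for c in row))
```

In practice the grid would be rendered graphically (e.g., with matplotlib’s imshow) rather than printed, but the compression is the same: 35 events collapse into one image a coach can absorb in seconds.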

The point here is that you must understand how users best absorb information and then use that to tailor how the information is presented. That’s something to consider when teams develop their systems; they inevitably serve a number of different audiences, each of whom will have their own needs and ways of digesting data and information. The trick is to build systems that are as flexible and responsive to those different needs as possible.

The example from Pittsburgh also illustrates another critical point—the importance of people to any analytics effort. We’ll deal with that and the importance of organization culture in part two tomorrow.


Bill leads Predictive Modeling and Data Science consulting at Gallup. In his free time, he writes for The Hardball Times, speaks about baseball research and analytics, has consulted for a Major League Baseball team, and has appeared on MLB Network's Clubhouse Confidential as well as several MLB-produced documentaries. He is also the creator of the baseballr package for the R programming language. Along with Jeff Zimmerman, he won the 2013 SABR Analytics Research Award for Contemporary Analysis. Follow him on Twitter @BillPetti.

Great article, really enjoy reading them, wherever you end up writing for. One thought that struck me, while reading through a number of the links (thanks, I greatly appreciate all of them, instead of having to search for them or not), is questioning how good a consultant Luhnow is/was, because change management is one of the key mantras of the past 20 years, and yet it took him a number of years to figure out that he needed to figure out a way to implement his changes in the Astros (i.e. change management). And your conclusion rings true with what…

Jeremy Chiasson

I’ll be real with you, I only have a high school diploma, so I’m in a little over my head here, and can’t add much of my own insight. That being said, I still found this very interesting, and thought you communicated it very clearly. Fascinating stuff!