Tuesday, August 22, 2017

Enby Distribution, pt. 4: Revisiting W%

In my first series about runs per game distributions, I wrote about how to use estimates of the probability of scoring k runs (however these probabilities were estimated, Enby distribution or an alternative approach) to estimate a team’s winning percentage. I’m going to circle back to that here, and most of the content is a repeat of the earlier post.

However, I think this is an important enough topic to rehash. In fact, a winning percentage estimator strikes me as the most logical application for a runs per game distribution, albeit one that is not particularly helpful to everyday sabermetric practice. After all, multiple formulas to estimate W% as a function of runs scored and runs allowed have been developed, and most of them work quite well when working with normal major league teams--well enough to make it difficult to imagine that there is any appreciable gain in accuracy to be had. Better yet, these W% estimators are fairly simple--even the most complex versions in common use, Pythagenport/pat, can be quickly tapped out on a calculator in about thirty seconds.

Given that there are powerful, relatively simple W% models already in use, why even bother to examine a model based on the estimated scoring distribution? There are three obvious reasons that come to my mind. The first is that such a model serves as a check on the others. Depending on how much confidence one has in the underlying run distribution model, it is possible that the resulting W% estimator will produce a batter estimate, at least at the extremes. We know of course that some of the easier models don’t hold up well in extreme situations--linear estimators will return negative or greater than one figures at some point, and fixed Pythagorean exponents will fray at some point. While we know that Pythagenpat works at the known point of 1 RPG and appears to work well at other extreme values, it doesn’t hurt to have another way of estimating W% in those extremes to see if Pythagenpat is corroborated, or whether the models disagree. This can also serve as a check on Enby--if the results vary too much from what we expect, it may imply that Enby does not hold up well at extremes itself.

A second reason is that it’s plain fun if you like esoteric sabermetrics (and if you’re reading this blog, it’s a good bet that you do). I’ve never needed an excuse to mess around with alternative methods, particularly when it comes to W% estimators, which along with run estimators are my own personal favorite sabermetric tools.

But the third reason is the one that I want to focus on here, which is that a W% estimator based on an underlying estimate of the run distribution is from one perspective the simplest possible estimator. This may seem to be an absurd statement given all of the steps that are necessary to compute Enby estimates, let alone plugging these into a W% formula. But from a first principles standpoint, the distribution-based W% estimator is the simplest to explain, because it is defined by the laws of the game itself.

If you score no runs, you don’t win. If you score one run, you win if you allow zero runs. If you score two runs, you win if you allow either zero or one run, and on it goes ad infinitum. If at the end of nine innings you have scored and allowed an equal number of runs, you play on until there is an inning in which an unequal, greater than zero number of runs are scored. This fundamental identity is what all of the other W% estimators attempt to approximate, the mechanics which they attempt to sweep under the rug by taking shortcuts to approximate. The distribution-based approach is computationally dense but conceptually easy (and correct). Of course, to bring points one and three together, the definition may be correct, but the resulting estimates are useless if the underlying model (Enby in this case) does not work.

In order to produce our W% estimate, we first need to use Enby to estimate the scoring distribution for the two teams. This is not as simple as using the Enby parameters we have already developed based on the Tango Distribution with c = .767. Tango has found that his method produces more accurate results for two teams when c is set equal to .852 instead.

In the previous post, I walked through the computations for the Enby distribution with any c value, so this is an easy substitution to make. But why is it necessary? I don’t have a truly satisfactory answer to that question--it's trite to just assert that it works better for head-to-head matchups because of the covariance between runs scored and runs allowed, even if that is in fact the right answer.

How will modifying the control value alter the Enby distribution? All of the parameters will be effected, because all depend on the control value in one way or another. First, B and r (the latter as it is initially figured before zero modification):

VAR = RG^2/9 + (2/c - 1)*RG
r = RG^2/(VAR - RG)
B = VAR/RG - 1

When c is larger, the variance of runs scored will be smaller. We can see this by examining the equations for variance with c = .767 and .852:

VAR (.767) = RG^2/9 + 1.608*RG
VAR (.852) = RG^2/9 + 1.347*RG

This results in a larger value for r and a smaller value for B, but these parameters don’t have an intuitive baseball explanation, unlike variance. It’s difficult to explain (for me at least) why variance of a single team’s runs scored should be lower when considering a head-to-head matchup, but that’s the way it works out.

It should be noted that if the sole purpose of this exercise is to estimate W%, we don’t have to care whether the actual probability of each team scoring k runs is correct. All we need to do is have an accurate estimate of how often Team A’s runs scored are greater than Team B’s.

By increasing c, we also reduce the probability of a shutout, as can be seen from the formula for z:

z =(RI/(RI + c*RI^2))^9

Originally, I had intended to display some graphs showing the behavior of the three parameters by RG with each choice of c, but these turned out to be not of any particular interest. I ran similar graphs earlier in the series with parameters based on the earlier variance model, and the shape of the resulting functions are quite similar. The only real visual difference when c varies is what appears to be linear shifts for r and B (the B shift is linear, the r not quite).

What might be more interesting is looking at how c shapes the estimated run distribution for a team with a given RG. I’ll look at three teams--one average (4.5 RG), one extremely low-scoring (2.25 RG), and one extremely high-scoring (9 RG). First, the 4.5 RG team:



As you may recall from earlier, Enby consistently overestimates the frequency with which a normal major league team will score 2-4 runs. Using the .852 c value exacerbates this issue; in fact, the main thing to take away from this set of graphs is that the higher c value clusters more probability around the mean, while the lower c value leaves more probability for the tails.

The 2.25 RG team:



And the 9 RG team:


No comments:

Post a Comment

I reserve the right to reject any comment for any reason.