Wednesday, February 28, 2018

Enby Distribution, pt. 5: W% Estimate

While an earlier post contained the full explanation of the methodology used to estimate W%, it’s an important enough topic to repeat in full here. The methodology is not unique to Enby; it could be implemented with any estimate of the frequency of runs scored per game (and in fact I first implemented it with the Tango Distribution). As I discussed last time, the math may look complicated and require a computer to implement, but the model itself is arguably the simplest conceptually because it is based on the simple logic of how games are decided.

Let p(k) be the probability of scoring k runs in a game and q(m) be the probability of allowing m runs in a game. If k is greater than m, then the team will win; if k is less than m, then the team will lose. If k and m are equal, then the game will go to extra innings. In setting it up this way, I am implicitly assuming that p(k) is the probability of scoring k runs in nine innings rather than in a game. This is not a horrible way to go about it, since the average major league game has about 27 outs once the influences that cause shorter games (not batting in the ninth, rain) are balanced with the longer games created by extra innings. Still, it should be noted that the count of runs scored in one game does not necessarily arise from the same opportunity context (as defined by innings or outs) as the count from another game.

Given this notation, we can express the probability of winning a game in the standard nine innings as:

P(win 9) = p(1)*q(0) + p(2)*[q(0) + q(1)] + p(3)*[q(0) + q(1) + q(2)] + p(4)*[q(0) + q(1) + q(2) + q(3)] + ...

Extra innings will occur whenever k and m are equal:

P(X) = p(0)*q(0) + p(1)*q(1) + p(2)*q(2) + p(3)*q(3) + p(4)*q(4) + ...
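As a minimal sketch of the two sums above, assuming p and q are supplied as equal-length Python lists indexed by runs (the function name and the toy probabilities are my own, not Enby output):

```python
from itertools import accumulate

def nine_inning_outcomes(p, q):
    """Return (P(win 9), P(X)) where p[k] = P(score k) and q[m] = P(allow m)."""
    if len(p) != len(q):
        raise ValueError("p and q must cover the same range of runs")
    # cum_q[k] = q(0) + q(1) + ... + q(k)
    cum_q = list(accumulate(q))
    # Win in nine: score k and allow anything from 0 through k - 1
    p_win = sum(p[k] * cum_q[k - 1] for k in range(1, len(p)))
    # Extra innings: score and allow the same number of runs
    p_tie = sum(pk * qk for pk, qk in zip(p, q))
    return p_win, p_tie

# Toy distributions, made up purely for illustration
p = [0.2, 0.3, 0.3, 0.2]
q = [0.3, 0.3, 0.2, 0.2]
p_win, p_tie = nine_inning_outcomes(p, q)
print(round(p_win, 4), round(p_tie, 4))
```

In practice the lists would need to run out far enough (fifteen or twenty runs, say) that the truncated tail is negligible.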

When the game goes to extra innings, it becomes an inning by inning contest. Let n(k) be the probability of scoring k runs in an inning and r(m) be the probability of allowing m runs in an inning. If k is greater than m, the team wins; if k is less than m, the team loses; and if k is equal to m, then the process will repeat until a winner is determined.

To find the probability of each of the three possible outcomes of an extra inning, we can follow the same logic as used above for P(win 9). The probability of winning the inning is:

P(win inning) = n(1)*r(0) + n(2)*[r(0) + r(1)] + n(3)*[r(0) + r(1) + r(2)] + n(4)*[r(0) + r(1) + r(2) + r(3)] + ...

The probability of the game continuing (equivalent to tying the inning) is similar to P(X) above:

P(tie inning) = n(0)*r(0) + n(1)*r(1) + n(2)*r(2) + n(3)*r(3) + n(4)*r(4) + ...

The probability of winning in extra innings [P(win X)] is:

P(win X) = P(win inning) + P(tie inning)*P(win inning) + P(tie inning)^2*P(win inning) + P(tie inning)^3*P(win inning) + ...

This is a geometric series that simplifies to:

P(win X) = P(win inning)*[1 + P(tie inning) + P(tie inning)^2 + P(tie inning)^3 + ...] = P(win inning)*1/[1 - P(tie inning)] = P(win inning)/[1 - P(tie inning)]
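A quick numerical check of the closed form is easy to write. This is only a sketch; the function name and the two input probabilities are made up for illustration:

```python
def extra_inning_win_prob(p_win_inning, p_tie_inning):
    """Closed form of the geometric series: P(win inning) / [1 - P(tie inning)]."""
    if not 0 <= p_tie_inning < 1:
        raise ValueError("P(tie inning) must be in [0, 1)")
    return p_win_inning / (1 - p_tie_inning)

# Made-up inputs: win the inning 25% of the time, tie it 50% of the time
p_win_inning, p_tie_inning = 0.25, 0.5
closed_form = extra_inning_win_prob(p_win_inning, p_tie_inning)

# Partial sum: win in the first extra inning, or tie once then win, and so on
partial_sum = sum(p_win_inning * p_tie_inning ** j for j in range(60))
print(closed_form, partial_sum)
```

The partial sum converges to the closed form quickly because each additional tied inning multiplies the term by P(tie inning).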

This could also be expressed in a very clever way using the Craps Principle if we had also computed P(lose inning); I did it that way last time, but it doesn’t really cut down on the amount of calculation necessary in this case.

Since I want these last few posts to serve as a comprehensive explanation of how to calculate the Enby run and win estimates, it is necessary to take a moment to review how to use the Tango Distribution to estimate the runs per inning distribution. The constant c is set at .852 when looking at a head-to-head matchup. RI is runs/inning, which I’ve defined as RG/9:

a = c*RI^2
n(0) = RI/(RI + a)
d = 1 - c*n(0)
n(1) = (1 - n(0))*(1 - d)
n(k) = n(k - 1)*d for k >= 2
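The recursion above translates directly into a few lines of Python. This is a sketch under my own naming; truncating at max_runs is an assumption made only so the list is finite:

```python
def tango_inning_distribution(rg, c=0.852, max_runs=20):
    """Per-inning run probabilities n(0)..n(max_runs) from the Tango Distribution."""
    if rg <= 0:
        raise ValueError("rg must be positive")
    ri = rg / 9                      # runs per inning
    a = c * ri ** 2
    n0 = ri / (ri + a)               # probability of a scoreless inning
    d = 1 - c * n0
    probs = [n0, (1 - n0) * (1 - d)]
    for _ in range(2, max_runs + 1):
        probs.append(probs[-1] * d)  # n(k) = n(k - 1)*d for k >= 2
    return probs

n = tango_inning_distribution(4.5)
mean_runs = sum(k * pk for k, pk in enumerate(n))
print(round(n[0], 4), round(sum(n), 6), round(mean_runs, 6))
```

Two sanity checks fall out of the algebra: the probabilities sum to one, and the mean of the distribution equals RI (0.5 runs/inning in this example), apart from the tiny truncated tail.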

Once we have these three key probabilities [P(win 9), P(X), and P(win X)], the formula for W% is obvious:

W% = P(win 9) + P(X)*P(win X)
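Putting the three pieces together, here is a hedged sketch of the whole calculation. It takes the four distributions as plain lists (p and q per game, n and r per inning); it does not compute the Enby or Tango probabilities itself, and the function names and toy inputs are mine:

```python
def outcome_probs(score, allow):
    """Return (P(win), P(tie)) for two equal-length run distributions."""
    if len(score) != len(allow):
        raise ValueError("distributions must cover the same range of runs")
    cum_allow = 0.0   # running total: P(allowing fewer runs than the current index)
    win = tie = 0.0
    for s, a in zip(score, allow):
        win += s * cum_allow
        tie += s * a
        cum_allow += a
    return win, tie

def win_pct(p, q, n, r):
    """W% = P(win 9) + P(X)*P(win X)."""
    p_win9, p_x = outcome_probs(p, q)
    p_win_inning, p_tie_inning = outcome_probs(n, r)
    p_win_x = p_win_inning / (1 - p_tie_inning)
    return p_win9 + p_x * p_win_x

# Toy check: a team facing an identical opponent should come out at .500
game = [0.2, 0.3, 0.3, 0.2]
inning = [0.7, 0.2, 0.1]
even_matchup = win_pct(game, game, inning, inning)
print(round(even_matchup, 6))
```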

We will use the Enby Distribution to determine p(k) and q(m), and the Tango Distribution to determine n(k) and r(m). In both cases, we’ll use the Tango Distribution constant c = .852 since this works best when looking at a head-to-head matchup, which certainly is the applicable context when discussing W%.

I have put together a spreadsheet that will handle all of the calculations for you. The yellow cells are the ones that you can edit, with the most important being R (cell B1) and RA (cell L1), which naturally are where you enter the average R/G and RA/G for the team whose W% you’d like to estimate. The other yellow cell is for the c value of Tango Distribution. Please note that editing this cell will do nothing to change the Enby Distribution parameters--those are fixed based on using c = .852. Editing c in this cell (B8) will only change the estimates of the per inning scoring probabilities estimated by the Tango Distribution. I don’t advise changing this value, since .852 has been found to work best for head-to-head matchups and leaving it there keeps the Tango Distribution estimates consistent with the Enby Distribution estimates. The sheet also calculates Pythagenpat W% for a given exponent (which you can change in cell B15).

The calculator supports the same range of values as the one for single team run distribution introduced in part 9--RG at intervals of .25 between 0-3 and 7-15 runs, and at intervals of .05 between 3-7 runs. The vlookup function will round down to the nearest supported R/G value on the parameter sheet. For example, the two highest values supported are 14.75 and 15.00; you can enter 14.93 if you want, but the Enby calculation will be based on 14.75 (the Pythagenpat calculation will still be based on 14.93). Have some fun playing around with it, and next time we’ll look at how accurate the Enby estimate is compared to other W% models.
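For anyone reproducing the lookup outside of a spreadsheet, the round-down behavior can be sketched with Python's bisect module. The grid below is my reading of the supported values (whether the endpoints 0 and 15 are actually on the parameter sheet is an assumption), and lookup_rg is a hypothetical name:

```python
from bisect import bisect_right

# Supported R/G values: .25 steps on 0-3 and 7-15, .05 steps on 3-7.
# Built from integer hundredths so that grid values match typed decimals exactly.
_hundredths = sorted(set(
    list(range(0, 300, 25)) + list(range(300, 700, 5)) + list(range(700, 1501, 25))
))
GRID = [h / 100 for h in _hundredths]

def lookup_rg(rg):
    """Round rg down to the nearest supported value, like an approximate-match vlookup."""
    if not GRID[0] <= rg <= GRID[-1]:
        raise ValueError("R/G outside the supported range")
    return GRID[bisect_right(GRID, rg) - 1]

print(lookup_rg(14.93), lookup_rg(4.62), lookup_rg(4.65))
```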

Tuesday, February 13, 2018

Doubles or Nothing

In previewing the season to come for any team, it is customary (for good reason) to start by taking a look back at the previous season. Sometimes this is a pleasant or at least unobjectionable experience. On some occasions, though, it forces one to review an absolute disaster of a season, as was turned in by the 2017 Ohio State Buckeyes.

OSU went 22-34, which was the lowest W% by a Buckeye club since 1974. Their 8-16 Big Ten record was the worst since 1987. The seven years in which Beals has been at the helm have produced a .564 W%, which, excepting the largely overlapping span of 2008-2014, is the worst since 1986-1992. Beals has taken the program built by Bob Todd, who inherited the late 80s malaise, and driven it right back into mediocrity.

Yet merrily he rolls along, untroubled by the pressures of coaching at a school that fired its all-time winningest basketball coach for having two straight NCAA tournament misses, despite compiling a .500 record in Big Ten play over those two seasons. Beals and his unenlightened brand of baseball may be too small fry to draw the ire of AD Gene Smith, but tell that to the track, gymnastics, and women’s hockey coaches who have been pushed out in recent years. Beals’ record of doing less with a historically strong program is unmatched at the University.

When one peruses the likely lineup for 2018, it’s hard to think that a turnaround is imminent. Stranger things have happened, of course, but eight years into his tenure in Columbus, enough time to have nearly turned over two whole recruiting classes with no overlap, he is still plugging roster holes with unproven JUCO transfers, failing to develop the high school recruits he’s brought in. It’s gotten to the point that if a player doesn’t find a role as a freshman, you can basically write him off as a future contributor.

Junior Jacob Barnwell is firmly ensconced at catcher; he was an average hitter last year and appears to have the coach seal of approval as a receiver, so he’s golden for playing time over the next two seasons. True freshman Dillon Dingler may be the heir apparent, with junior Andrew Fishel and redshirt freshman Scottie Seymour providing depth.

Seniors Bo Coolen and Noah McGowan, both JUCO transfers a year ago, will compete for first base; Coolen was bad offensively in 2017 with no power (.074 ISO), McGowan a little better but still below average. Junior Brady Cherry will move from the hot corner to the keystone, a curious move to this observer; Cherry flashed power as a freshman but was middling with the bat last year. That opens up third for sophomore Connor Pohl, who filled in admirably at second last year but does look more like a third baseman; on a rate basis he was the second most productive returning hitter, although it wasn’t a huge sample size (89 PA and it was very BA-heavy with a .325 BA/.225 SEC). JUCO transfer junior Kobie Foppe is penciled in at shortstop. The utility infielders are both sophomores; Noah West played more as a freshman, getting starts at second base (he didn’t hit at .213/.278/.303) and serving as a defensive replacement for Pohl, while Carpenter had 14 hitless (one walk) PAs. True freshman Aaron Hughes rounds out the roster.

Senior Tyler Cowles has the inside track at left field, coming off a first season as a JUCO transfer in which he hit .190/.309/.314 over 129 PA. McGowan could also contend for this spot, with backup outfielders Nate Romans and Ridge Winand, both redshirt juniors, also in the mix. JUCO transfer Malik Jones has been anointed as the centerfielder, with true freshman Jake Ruby as an understudy. Right field, along with catcher, is the only spot on the roster that features an established starter at the same position; sophomore Dominic Canzone is OSU’s best returning hitter, although his production was BA-heavy (.343 BA/.205 SEC). Some combination of Cowles, McGowan, and Fishel would appear to have the first crack at DH.

OSU’s pitching was an utter disaster last year, partly due to injury and partly because, well, Greg Beals. The only sure bet for the rotation appears to be senior Adam Niemeyer, with junior lefty Connor Curlis and senior Yianni Pavlopoulos (who closed as a sophomore) most likely to join him. Their RAs were 6.23, 5.03, and 7.65 respectively in 2017, although only Curlis had good health. Junior Ryan Feltner pitched poorly last year (7.32 RA over 62 IP despite 8.2 K/9), then went to the Cape Cod league and was named Reliever of the Year. Sophomore Jake Vance had a 6.92 RA over 26 innings, largely thanks to 20 walks, and is the fifth rotation candidate.

The perennial bright spot of the pitching staff is senior righty Seth Kinker, who easily led the team with 13 RAA over 58 innings, even getting 3 starts when everything fell to pieces. He figures to be the go-to reliever, with fifth-year senior righties Kyle Michalik, Austin Woody, and Curtiss Irving in middle relief. You’re not going to believe this, but their RAs ranged between 6.85 and 7.94 over a combined 66 innings. Sophomore Thomas Waning will follow Kinker and Michalik in one of Beals’ good traits, which is an affinity for sidearmers; Waning was effective (11 K, 4 W) in a 12 inning injury-shortened debut season. Junior Dustin Jourdan will be in the mix as well.

Beals also has an affinity for lefty specialists, which he will have to cultivate anew from sophomore Andrew Magno (4 appearances in 2016) and true freshmen Luke Duermit, Griffan Smith, and Alex Theis.

The schedule is fairly typical, with the opening weekend (starting Friday) featuring a pair of games with both Canisius and UW-Milwaukee in Florida. The following weekend will see the Bucks in Arizona for the Big Ten/Pac-12 Challenge, where they’ll play two each against Utah and Oregon State. Another trip to Florida to play low-level opponents (Nicholls State, Southern Miss, and Eastern Michigan) comes next, followed by a trip to the Carolinas that will feature two games each against High Point, Coastal Carolina, and UNC-Wilmington.

Bizarrely, the home schedule opens March 16 with a weekend series against Cal St-Northridge; usually any home dates with non-Northern opponents come later in the calendar. Another non-conference weekend series against Georgetown follows, and then Big Ten play: Nebraska, @ Iowa, @ Penn St, Indiana, Minnesota, Illinois, Purdue, @ Michigan St. Mixed in will be a typically home-heavy mid-week slate (Eastern Michigan, Toledo, Kent St, Ohio University, Miami, Campbell) with road games at Ball St and Cincinnati.

As I wrote the roster outlook (which relied on my own knowledge and guesses but also heavily on the season preview released by the athletic department), two things that I already thought I knew struck me even more plainly.

1) This team does not appear to be very good. One can construct a rosy scenario where the pitching woes of 2017 were due largely to injury, but we’re talking about pitcher injuries. It takes extra tint on those glasses. It has to be better than last year, when nine pitchers started at least three games, but this team was 22-34; “better” isn’t going to cut it.

2) The offense has a couple solid returnees, but in the eighth year of Beals’ tenure, major positions on the diamond are still being papered over with JUCO transfers. There is no pipeline of young players getting their feet wet in utility roles and transitioning into starting as you would expect in a healthy program. There are no freshman studs to come in and commandeer lineup positions as you would expect in a strong program. It is quite easy to imagine a scenario in which five of the nine lineup spots are held by first or second-year JUCO transfers.

Beals has failed in recruiting, he has failed in player development, and most importantly he has failed to win at the level to which an OSU program should aspire. I’ve devoted many words in previous season previews and recaps (and the hashtag #BealsBall) to his asinine tactics. I won’t rehash that here, but I will end with a quote from the Meet the Team Dinner that program icon Nick Swisher was roped into headlining, which makes one seriously question in what decade Mr. Beals thinks he coaches:

“Our goal in 2018 is to hit a lot of doubles,” said Beals on Saturday night.

Monday, January 08, 2018

Run Distribution and W%, 2017

I always start this post by looking at team records in blowout and non-blowout games. This year, after a Twitter discussion with Tom Tango, I’ve narrowed the definition of blowout to games in which the margin of victory is six runs or more (rather than five, the definition used by Baseball-Reference and that I had independently settled on). In 2017, the percentage of games decided by x runs and by >= x runs are:

If you draw the line at 5 runs, then 21.6% of games are classified as blowouts. At 6 runs, it drops to 15.3%. Tango asked his Twitter audience two different but related poll questions. The first was “what is the minimum margin that qualifies a game as a blowout?”, for which the plurality was six (39%, with 5, 7, and 8 the other options). The second was “what percentage of games do you consider to be blowouts?”, for which the plurality was 8-10% (43%, with the other choices being 4%-7%, 11%-15%, and 17-21%). Using the second criterion, one would have to set the bar at a margin of seven, at least in 2017.
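The two percentages for each margin (games decided by exactly x runs, and by x or more) are simple to tabulate from a list of final margins. Here is a small sketch; the function name and the ten sample margins are invented for illustration and are not 2017 data:

```python
from collections import Counter

def margin_shares(margins):
    """Return rows of (x, share decided by exactly x runs, share decided by >= x runs)."""
    counts = Counter(margins)
    total = len(margins)
    rows = []
    at_least = total  # every game is decided by at least one run
    for x in range(1, max(margins) + 1):
        rows.append((x, counts[x] / total, at_least / total))
        at_least -= counts[x]
    return rows

# Ten made-up victory margins
rows = margin_shares([1, 1, 2, 3, 6, 7, 1, 2, 5, 10])
for x, exact, cumulative in rows:
    print(x, exact, cumulative)
```

Reading down the last column is what tells you where to set the blowout threshold for a target percentage of games.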

As Tango pointed out, it is of interest that asking a similar question in different ways can produce different results. Of course this is well-known to anyone with a passing interest in public opinion polling. But here I want to focus more on some of the pros and cons of having a fixed standard for a blowout or one that varies depending on the actual empirical results from a given season.

A variable standard would recognize that as the run environment changes, the distribution of victory margin will surely change (independent of any concurrent changes in the distribution of team strength), expanding when runs are easier to come by. Of course, this point also means that what is a blowout in Coors Field may not be a blowout in Petco Park. The real determining factor of whether a game is a blowout is the probability that the trailing team can make a comeback (of course, picking one standard and applying it to all games ignores the flow of a game; if you want to make a win probability-based definition, go for it).

On the other hand, a fixed standard allows the percentage of blowouts to vary over time, and maybe it should. If the majority of games were 1-0, it would sure feel like there were very few blowouts even if the probability of the trailing team coming back was very low. Ideally, I would propose a mixed standard, in which the margin necessary for a blowout would not be a fixed % of games but rather somehow tied to the average runs scored/game. However, for this post, Tango’s audience answering the simpler question is sufficient for my purposes. I never had any strong rationale for using five, and it does seem like 22% of games as blowouts is excessive.

Given the criterion that a blowout is a game in which the margin of victory was six or more, here are team records in non-blowouts:

Records in blowouts:

The difference between blowout and non-blowout records (B - N), and the percentage of games for each team that fall into those categories:

Keeping in mind that I changed definitions this year (and in so doing increased random variation if for no reason other than the smaller percentage of games in the blowout bucket), it is an oddity to see two of the very best teams in the game (HOU and WAS) with worse records in blowouts. Still, the general pattern is for strong teams to be even better in blowouts, per usual. San Diego stands out as the most extreme team, with an outlier poor record in blowouts offsetting an above-.500 record in non-blowouts, although given that they play in the park with the lowest PF, their home/road discrepancy in blowout frequency should theoretically be higher than most teams’.

A more interesting way to consider game-level results is to look at how teams perform when scoring or allowing a given number of runs. For the majors as a whole, here are the counts of games in which teams scored X runs:

The “marg” column shows the marginal W% for each additional run scored. In 2017, three was the mode of runs scored, while the second run resulted in the largest marginal increase in W%.

The major league average was 4.65 runs/game; at that level, here is the estimated probability of scoring x runs using the Enby Distribution (stopping at fifteen):

In graph form (again stopping at fifteen):

This is a pretty typical visual for the Enby fit to the major league average. It’s not perfect, but it is a reasonable model.

In previous years I’ve used this observed relationship to calculate metrics of team offense and defense based on the percentage of games in which they scored or allowed x runs. But I’ve always wanted to switch to using theoretical values based on the Enby Distribution, for a number of reasons:

1. The empirical distribution is subject to sample size fluctuations. In 2016, all 58 times that a team scored twelve runs in a game, they won; meanwhile, teams that scored thirteen runs were 46-1. Does that mean that scoring 12 runs is preferable to scoring 13 runs? Of course not--it's a quirk in the data. Additionally, the marginal values don’t necessarily make sense even when W% increases from one runs scored level to another.

2. Using the empirical distribution forces one to use integer values for runs scored per game. Obviously the number of runs a team scores in a game is restricted to integer values, but not allowing theoretical fractional runs makes it very difficult to apply any sort of park adjustment to the team frequency of runs scored.

3. Related to #2 (really its root cause, although the park issue is important enough from the standpoint of using the results to evaluate teams that I wanted to single it out), when using the empirical data there is always a tradeoff that must be made between increasing the sample size and losing context. One could use multiple years of data to generate a smoother curve of marginal win probabilities, but in doing so one would lose centering at the season’s actual run scoring rate. On the other hand, one could split the data into AL and NL and more closely match context, but you would lose sample size and introduce more quirks into the data.

So this year I am able to use the Enby Distribution. I have Enby Distribution parameters at each interval of .05 runs/game. Since it takes a fair amount of manual work to calculate the Enby parameters, I have not done so at each .01 runs/game, and for this purpose it shouldn’t create too much distortion (more on this later). The first step is to take the major league average R/G (4.65) and park-adjust it. I could have park-adjusted home and road separately, and in theory you should be as granular as practical, but the data on teams scoring x runs or more is not readily available broken out between home and road. So each team’s standard PF which assumes a 50/50 split of home and road games is used. I then rounded this value to the nearest .05 and calculated the probability of scoring x runs using the Enby Distribution (with c = .852 since this exercise involves interactions between two teams).

For example, there were two teams that had PFs that produced a park-adjusted expected average of 4.90 R/G (ARI and TEX). In other words, an average offense playing half their games in Arizona's environment should have scored 4.90 runs/game; an average defense doing the same should have allowed 4.90 runs/game. The Enby distribution probabilities of scoring x runs for a team averaging 4.90 runs/game are:

For each x, it’s simple to estimate the probability of winning. If this team scores three runs in a particular game, then they will win if they allow 0 (4.3%), 1 (8.6%), or 2 runs (12.1%). As you can see, this construct assumes that their defense is league-average. If they allow three, then the game will go to extra innings, in which case they have a 50% chance of winning (this exercise doesn’t assume anything about inherent team quality), so in another 13.6% of games they win 50%. Thus, if the Diamondbacks score three runs, they should win 31.8% of those games. If they allow three runs, it’s just the complement; they should win 68.2% of those games.
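The three-run example can be reproduced in a couple of lines. The sketch below uses only the four probabilities quoted in the paragraph above; the function name is my own:

```python
def win_prob_given_runs(x, allow_probs, extra_inning_wp=0.5):
    """P(win | score x): allow fewer than x, or allow exactly x and win in extras."""
    return sum(allow_probs[:x]) + extra_inning_wp * allow_probs[x]

# Probabilities of allowing 0, 1, 2, and 3 runs at 4.90 R/G, as quoted in the text
allow = [0.043, 0.086, 0.121, 0.136]

w_score_3 = win_prob_given_runs(3, allow)
w_allow_3 = 1 - w_score_3  # the complement applies when allowing three
print(round(w_score_3, 3), round(w_allow_3, 3))
```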

Using these probabilities and each team’s actual frequency of scoring x runs in 2017, I calculate what I call Game Offensive W% (gOW%) and Game Defensive W% (gDW%). It is analogous to James’ original construct of OW% except looking at the empirical distribution of runs scored rather than the average runs scored per game. (To avoid any confusion, James in 1986 also proposed constructing an OW% in the manner in which I calculate gOW%, which is where I got the idea).
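In code, gOW% is just a weighted average of the per-run win probabilities, weighted by how often the team actually scored that many runs. A sketch with invented inputs (the game counts and win probabilities below are placeholders, not any team's 2017 figures):

```python
def game_offensive_wpct(games_by_runs, win_prob_by_runs):
    """Average the win probability at each run level, weighted by the team's games."""
    if len(games_by_runs) != len(win_prob_by_runs):
        raise ValueError("inputs must cover the same run levels")
    total_games = sum(games_by_runs)
    weighted = sum(g * w for g, w in zip(games_by_runs, win_prob_by_runs))
    return weighted / total_games

# Hypothetical team: games with 0, 1, 2, and 3 runs scored, and P(win) at each level
games = [10, 20, 30, 40]
win_prob = [0.0, 0.1, 0.2, 0.6]
gow = game_offensive_wpct(games, win_prob)
print(round(gow, 3))
```

gDW% works the same way, using the frequency of runs allowed and the complement probabilities.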

As such, it is natural to compare the game versions of OW% and DW%, which consider a team’s run distribution, to their OW% and DW% figured using Pythagenpat in a traditional manner. Since I’m now using park-adjusted gOW%/gDW%, I have park-adjusted the standard versions as well. As a sample calculation, Detroit averaged 4.54 R/G and had a 1.02 PF, so their adjusted R/G is 4.45 (4.54/1.02). The major league average was 4.65 R/G, and since they are assumed to have average defense we use that as their runs allowed. The Pythagenpat exponent is (4.45 + 4.65)^.29 = 1.90, and so their OW% is 4.45^1.90/(4.45^1.90 + 4.65^1.90) = .479, meaning that if the Tigers had average defense they would be estimated to win 47.9% of their games.
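The Detroit calculation can be checked with a short Python sketch. The function name is mine; the .29 exponent constant and the inputs come from the paragraph above:

```python
def pythagenpat_wpct(r, ra, z=0.29):
    """Pythagenpat W% with exponent x = (R/G + RA/G)^z."""
    x = (r + ra) ** z
    return r ** x / (r ** x + ra ** x)

adj_rg = round(4.54 / 1.02, 2)   # park-adjusted R/G, rounded as in the text
exponent = (adj_rg + 4.65) ** 0.29
ow_pct = pythagenpat_wpct(adj_rg, 4.65)
print(adj_rg, round(exponent, 2), round(ow_pct, 3))
```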

In previous years’ posts, the major league average gOW% and gDW% worked out to .500 by definition. Since this year I’m 1) using a theoretical run distribution from Enby 2) park-adjusting and 3) rounding teams’ park-adjusted average runs to the nearest .05, it doesn’t work out perfectly. I did not fudge anything to correct for the league-level variation from .500, and the difference is small, but as a technical note do consider that the league average gOW% is .497 and the gDW% is .503.

For most teams, gOW% and OW% are very similar. Teams whose gOW% is higher than OW% distributed their runs more efficiently (at least to the extent that the methodology captures reality); the reverse is true for teams with gOW% lower than OW%. The teams that had differences of +/- 2 wins between the two metrics were (all of these are the g-type less the regular estimate, with the teams in descending order of absolute value of the difference):

Positive: SD, TOR
Negative: CHN, WAS, HOU, NYA, CLE

The Cubs had a standard OW% of .542, but a gOW% of .516, a difference of 4.3 wins which is the largest such discrepancy for any team offense/defense in the majors this year. I always like to pick out this team and present a graph of their runs scored frequencies to offer a visual explanation of what is going on, which is that they distributed their runs less efficiently for the purpose of winning games than would have been expected. The Cubs averaged 5.07 R/G, which I’ll round to 5.05 to be able to use an estimated distribution I have readily available from Enby (using the parameters for c = .767 in this case since we are just looking at one team in isolation):

The Cubs were shut out or held to one run in eight more games than one would expect for a team that averages 5.05 R/G; of course, these are games that you almost certainly will not win. They scored three, four, or six runs significantly less often than would be expected; while three and four are run levels at which in 2017 you would expect to lose more often than you win, even scoring three runs makes a game winnable (.345 theoretical W% for a team in the Cubs’ run environment). The Cubs had 4.5 fewer games scoring between 9-12 runs than expected, which should be good from an efficiency perspective, since even at eight runs they should have had a .857 W%. But they more than offset that by scoring 13+ runs in a whopping 6.8% of their games, compared to an expectation of 2.0%--7.8 games more than expected where they gratuitously piled on runs. Chicago scored 13+ runs in 11 games, with Houston and Washington next with nine, and it’s no coincidence that they were also very inefficient offensively.

The preceding paragraph is an attempt to explain what happened; despite the choice of non-neutral wording, I’m not passing judgment. The question of whether run distribution is strongly predictive compared to average runs has not been studied in sufficient detail (by me at least), but I tend to think that the average is a reasonable indicator of quality going forward. Even if I’m wrong, it’s not “gratuitous” to continue to score runs after an arbitrary threshold with a higher probability of winning has been cleared. In some cases it may even be necessary, as the Cubs did have three games in which they allowed 13+ runs, although they weren’t the same games. As we saw earlier, major league teams were 111-0 when scoring 13+ runs, and 294-17 when scoring 10-12.

Teams with differences of +/- 2 wins between gDW% and standard DW%:

Positive: SD, CIN, NYN, MIN, TOR, DET
Negative: CLE, NYA

San Diego and Toronto had positive differences on both sides of the ball; the Yankees and Cleveland had negative differences on both. Thus it is no surprise that those teams show up on the list comparing gEW% to EW% (standard Pythagenpat). gEW% combines gOW% and gDW% indirectly by converting both to equivalent runs/game using Pythagenpat (see this post for the methodology):

Positive: SD, TOR, NYN, CIN, OAK
Negative: WAS, CHN, CLE, NYA, ARI, HOU

The Padres’ EW% was .362, but based on the manner in which they actually distributed their runs and runs allowed per game, one would have expected a .405 W%, a difference of 6.9 wins, which is an enormous gap between these two approaches. In reality, they had a .438 W%, so Pythagenpat’s error was 12.3 wins, which is enormous in its own right.

gEW% is usually (but not always!) a more accurate predictor of actual W% than EW%, which it should be since it has the benefit of additional information. However, gEW% assumes that runs scored and allowed are independent of each other on the game-level. Even if that were true theoretically (and given the existence of park factors alone it isn’t), gEW% would still be incapable of fully explaining discrepancies between actual records and Pythagenpat.

The various measures discussed are provided below for each team.

Finally, here are the Crude Team Ratings based on gEW% since I hadn’t yet calculated gEW% when that post was published:

Monday, December 18, 2017

Crude Team Ratings, 2017

For the last several years I have published a set of team ratings that I call "Crude Team Ratings". The name was chosen to reflect the nature of the ratings--they have a number of limitations, of which I documented several when I introduced the methodology.

I explain how CTR is figured in the linked post, but in short:

1) Start with a win ratio figure for each team. It could be actual win ratio, or an estimated win ratio.

2) Figure the average win ratio of the team’s opponents.

3) Adjust for strength of schedule, resulting in a new set of ratings.

4) Begin the process again. Repeat until the ratings stabilize.

The resulting rating, CTR, is an adjusted win/loss ratio rescaled so that the majors’ arithmetic average is 100. The ratings can be used to directly estimate W% against a given opponent (without home field advantage for either side); a team with a CTR of 120 should win 60% of games against a team with a CTR of 80 (120/(120 + 80)).
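The four steps can be sketched as a fixed-point iteration. To be clear, the exact strength-of-schedule adjustment is documented in the linked post and is not reproduced here; the version below assumes the simplest form I could write down (rating = win ratio times the games-weighted average opponent rating, rescaled to average 100), and the function names, the schedule matrix, and the three-team example are all hypothetical:

```python
def crude_ratings(win_ratio, schedule, tol=1e-9, max_iter=1000):
    """win_ratio[i]: team i's W/L ratio; schedule[i][j]: games team i played vs. team j."""
    n = len(win_ratio)

    def rescale(values):
        mean = sum(values) / len(values)
        return [100 * v / mean for v in values]

    ratings = rescale(win_ratio)              # step 1: start from win ratio
    for _ in range(max_iter):
        new = []
        for i in range(n):
            games = sum(schedule[i])
            # step 2: games-weighted average rating of team i's opponents
            sos = sum(schedule[i][j] * ratings[j] for j in range(n)) / games
            new.append(win_ratio[i] * sos)    # step 3: assumed SOS adjustment
        new = rescale(new)
        converged = max(abs(a - b) for a, b in zip(new, ratings)) < tol
        ratings = new
        if converged:                         # step 4: repeat until stable
            break
    return ratings

def head_to_head(ctr_a, ctr_b):
    """Estimated W% for team A against team B on a neutral field."""
    return ctr_a / (ctr_a + ctr_b)

# Three made-up teams playing a balanced schedule
schedule = [[0, 10, 10], [10, 0, 10], [10, 10, 0]]
ratings = crude_ratings([2.0, 1.0, 0.5], schedule)
print([round(r, 1) for r in ratings], head_to_head(120, 80))
```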

First, CTR based on actual wins and losses. In the table, “aW%” is the winning percentage equivalent implied by the CTR and “SOS” is the measure of strength of schedule--the average CTR of a team’s opponents. The rank columns provide each team’s rank in CTR and SOS:

The top ten teams were the playoff participants, with the two pennant winners coming from the group of three teams that formed a clear first-tier. The #9 and #10 teams lost the wildcard games. Were it not for the identity of the one of those three that did not win the pennant, it would have been about as close to perfect a playoff outcome as I could hope for. What stood out the most among the playoff teams to me is that Arizona ranked slightly ahead of Washington. As we’ll see in a moment, the NL East was bad, and as the best team in the worst division, the Nationals had the lowest SOS in the majors, with their average opponent roughly equivalent to the A’s, while the Diamondbacks’ average opponent was roughly equivalent to the Royals.

Next are the division averages. Originally I gave the arithmetic average CTR for each division, but that’s mathematically wrong--you can’t average ratios like that. Then I switched to geometric averages, but really what I should have done all along is just give the arithmetic average aW% for each division/league. aW% converts CTR back to an “equivalent” W-L record, such that the average across the major leagues will be .50000. I do this by taking CTR/(100 + CTR) for each team, then applying a small fudge factor to force the average to .500. In order to maintain some basis for comparison to prior years, I’ve provided the geometric average CTR alongside the arithmetic average aW%, and the equivalent CTR by solving for CTR in the equation:

aW% = CTR/(100 + CTR)*F, where F is the fudge factor (it was 1.0005 for 2017 lest you be concerned there is a massive behind-the-scenes adjustment taking place).
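Solving that equation for CTR gives CTR = 100*aW%/(F - aW%). A short sketch of both directions, with function names of my own choosing and F defaulting to the 2017 value quoted above:

```python
def aw_pct(ctr, fudge=1.0005):
    """aW% = CTR/(100 + CTR)*F."""
    return ctr / (100 + ctr) * fudge

def ctr_from_aw_pct(aw, fudge=1.0005):
    """Invert the equation above: CTR = 100*aW%/(F - aW%)."""
    return 100 * aw / (fudge - aw)

aw_120 = aw_pct(120)
round_trip = ctr_from_aw_pct(aw_120)
print(round(aw_120, 4), round(round_trip, 4))
```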

The league gap closed after expanding in 2016, but the AL maintained superiority, with only the NL West having a higher CTR than any AL division. It was a good bounceback for the NL West after being the worst division in 2016, especially when you consider that the team that had been second-best for several years wound up as the second-worst team in the majors. The NL East was bad, but not as bad as it was just two years ago.

I also figure CTRs based on various alternate W% estimates. The first is based on Expected W% (Pythagenpat based on actual runs scored and allowed):

The second is CTR based on Predicted W% (Pythagenpat based on runs created and allowed, actually Base Runs):

Usually I include a version based on Game Expected Winning %, but this year I’m finally switching to using the Enby distribution so it’s going to take a little bit more work, and I’d like to get one of these two posts up before the end of the year. So I will include the CTRs based on gEW% in the Run Distribution post.

A few seasons ago I started including a CTR version based on actual wins and losses, but including the postseason. I am not crazy about this set of ratings, the reasoning behind which I tried very poorly to explain last year. A shorter attempt follows: Baseball playoff series have different lengths depending on how the series go. This has a tendency to exaggerate the differences between the teams exhibited by the series, and thus have undue influence on the ratings. When the Dodgers sweep the Diamondbacks in the NLDS, this is certainly additional evidence that we did not previously have which suggests that the Dodgers are a stronger team than the Diamondbacks. But counting this as 3 wins to 0 losses exaggerates the evidence. I don’t mean this in the (equally true) sense that W% over a small sample size will tend to be more extreme than a W% estimate based on components (R/RA, RC/RCA, etc.). This we could easily account for by using EW% or PW%. What I’m getting at is that the number of games added to the sample is dependent on the outcomes of the games that are played. If series were played through in a non-farcical manner (i.e. ARI/LA goes five games regardless of the outcomes), then this would be a moot point.

I doubt that argument swayed even one person, so the ratings including playoff performance are:

With the Dodgers holding a 161 to 156 lead over the Astros before the playoffs, romping through the NL playoffs at 7-1 while the Astros went 7-4 in the AL playoffs, and taking the World Series to seven games, they actually managed to increase their position as the #1 ranked team. I’m not sure I’ve seen that before--certainly it is common for the World Series winner to not be ranked #1, but usually they get closer to it than further away.

And the differences between ratings including playoffs (pCTR) and regular season only (rCTR):

Monday, December 11, 2017

Hitting by Position, 2017

Of all the annual repeat posts I write, this is the one which most interests me--I have always been fascinated by patterns of offensive production by fielding position, particularly trends over baseball history and cases in which teams have unusual distributions of offense by position. I also contend that offensive positional adjustments, when carefully crafted and appropriately applied, remain a viable and somewhat more objective competitor to the defensive positional adjustments often in use, although this post does not really address those broad philosophical questions.

The first obvious thing to look at is the positional totals for 2016, with the data coming from "MLB” is the overall total for MLB, which is not the same as the sum of all the positions here, as pinch-hitters and runners are not included in those. “POS” is the MLB totals minus the pitcher totals, yielding the composite performance by non-pitchers. “PADJ” is the position adjustment, which is the position RG divided by the overall major league average (this is a departure from past posts; I’ll discuss this a little at the end). “LPADJ” is the long-term positional adjustment that I use, based on 2002-2011 data. The rows “79” and “3D” are the combined corner outfield and 1B/DH totals, respectively:

After their record-smashing performance in 2016, second basemen regressed to the mean, although they still outproduced the league average. The mid-defensive spectrum positions, third base and centerfield, were both similarly about 3% above their historical norms, but the real story of 2017 positional offense was DH. DHs were essentially as productive as shortstops. Looking at the two positions’ respective slash lines, DH had the better secondary average, SS the better batting average for the same run output. While DH has been down in recent years, they were at a much more respectable 109 last year. One year of this data tends to yield more blips than trends, although after a league average performance in 2016 left fielders only improved slightly to 102.

Moving on to looking at more granular levels of performance, I always start by looking at the NL pitching staffs and their RAA. I need to stress that the runs created method I’m using here does not take into account sacrifices, which usually is not a big deal but can be significant for pitchers. Note that all team figures from this point forward in the post are park-adjusted. The RAA figures for each position are baselined against the overall major league average RG for the position, except for left field and right field which are pooled.

While positions relative to the league bounce around each year, it seems that the most predictable thing about this post is that the difference between the best and worst NL pitching staffs will be about twenty runs at the plate. As a whole, pitchers were at 0.00 runs created/game, which is the first time I’ve estimated them at 0, although they dipped into the negative in 2014 then crept back into positive territory for two years.

I don’t run a full chart of the leading positions since you will very easily be able to go down the list and identify the individual primarily responsible for the team’s performance and you won’t be shocked by any of them, but the teams with the highest RAA at each spot were:


More interesting are the worst performing positions; the player listed is the one who started the most games at that position for the team:

Usually this list is more funny than sad, but almost every player that led one of these teams in starts was at one time considered (by some, maybe, in the case of Alcides Escobar) to be an outstanding player. Mercifully Mark Trumbo led Oriole DHs to a worse composite performance than the Angels or it would have been a veritable tragedy. Although the depressing nature of this list is offset significantly by the presence of the Kansas City shortstops and their Esky Magic, it is also not fair to Eduardo Nunez, who hit fine as a SF 3B (764 OPS in 199 PA). The real culprits for the Giants were, well, everyone else who played third base, with a max 622 OPS out of Christian Arroyo, Pablo Sandoval, Kelby Tomlinson, Jae-gyun Hwang, Conor Gillaspie, Ryder Jones, Aaron Hill, and Orlando Calixte. Giant third basemen other than Nunez hit a combined un-park adjusted 174/220/246. Props to Austin Slater who had a single in his only PA as a Giant third baseman, joining Nunez as the only non-horrible performer of the bunch.

I like to attempt to measure each team’s offensive profile by position relative to a typical profile. I’ve found it frustrating as a fan when my team’s offensive production has come disproportionately from “defensive” positions rather than offensive positions (“Why can’t we just find a corner outfielder who can hit?”) The best way I’ve yet been able to come up with to measure this is to look at the correlation between RG at each position and the long-term positional adjustment. A positive correlation indicates a “traditional” distribution of offense by position--more production from the positions on the right side of the defensive spectrum. (To calculate this, I use the long-term positional adjustments that pool 1B/DH as well as LF/RF, and because of the DH I split it out by league.) There is no value judgment here--runs are runs whether they are created by first basemen or shortstops:

The two teams with the most extreme correlations did so because of excellence (which we’ll see further evidence of in the next set of charts) from either a position group that is expected to provide offense (Miami’s outfielders) or from one that is not (Houston’s middle infielders). The signing of Edwin Encarnacion helped the Indians record a high correlation, as the rest of the positions didn’t strongly match expectations and the middle infielders hit very well.
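
For anyone who wants to replicate the correlation measure described above, here is a minimal sketch. The positional adjustments and the team RG figures are made-up placeholder values for illustration, not the long-term adjustments or any team's actual numbers, and the helper name is mine:

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Positions pooled as in the text: 1B/DH together, LF/RF together.
positions = ["C", "1B/DH", "2B", "3B", "SS", "LF/RF", "CF"]

# HYPOTHETICAL long-term positional adjustments (placeholders only).
lpadj = [0.89, 1.14, 0.99, 1.03, 0.86, 1.10, 1.02]

# HYPOTHETICAL team RG by position for a "traditional" offense.
team_rg = [3.8, 6.0, 4.4, 4.9, 3.9, 5.5, 4.6]

if __name__ == "__main__":
    # A positive value means production is tilted toward the offensive positions.
    print(round(pearson(team_rg, lpadj), 2))
```

A team whose best hitters are its shortstop and catcher would come out negative on the same calculation; again, there is no value judgment in the sign.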

The following tables, broken out by division, display RAA for each position, with teams sorted by the sum of positional RAA. Positions with negative RAA are in red, and positions that are +/-20 RAA are bolded:

In 2016, the Yankees were last in the division in RAA; this year they were the only above-average offense, led by the AL’s most productive outfield. The Red Sox nearly did the opposite, going from the best offense in the AL to a lot of red, highlighted by the AL’s worst corner infield production. They were the only AL team to have just one above average position. To the extent the Blue Jays hit, it was from the right side of the defensive spectrum; their catchers and middle infielders were the worst in MLB.

The Indians had very balanced offensive contributions relative to position, with the +36 for DH inflated by the fact that here DH are compared to the (historically-low) 2017 positional average rather than a longer-term benchmark. Seeing the Detroit first basemen at -20 is sad. Kansas City had the worst outfield in the AL, as it seems it takes more than Esky Magic and “timely hitting” and “putting the ball in play” (yes, I realize their frequency of doing the latter has tailed off) to score runs.

Houston led all of MLB in infield and middle infield RAA, and they were the only AL team to have just one below average position. Los Angeles had the worst infield in MLB, and shortstop was the only position that chipped in to help Mike Trout.

Miami led MLB in outfield RAA; of course Giancarlo Stanton was the driving force but all three spots were outstanding. Washington had the NL’s top infield, Philadelphia the worst. But what jumped out at me in the NL East was how good Atlanta’s catchers were. Only the Cubs had a higher RAA. Atlanta’s unlikely duo was Tyler Flowers (282/382/447 in 368 PA overall) and Kurt Suzuki (284/355/539 in 306 PA). I have to admit I watch a lot of Braves games this year, so I am floored to see that Suzuki pulled a .255 ISO out of a hat; non-park adjusted, it was his career high by 94 points, and the .160 came a full decade ago.

The Cubs had two positions that led the majors in RAA, a good showing from first base--and otherwise a lot of average and below average. Cincinnati led the majors in RAA from corner infielders; Joey Votto is obvious, but Eugenio Suarez led the third basemen to a fine showing as well. Pittsburgh was the only NL team to have just one position show up in black font, but there’s a reason I’m not constructing that to say anything about “below average”...

The Dodgers joined the Nationals as the only NL teams to have just one below-average position and led the NL in middle infield RAA. Arizona and San Diego tied for the worst middle infield RAA in the NL, while the Giants had the worst corner infielders and outfielders in the majors. The remarkably bad third basemen, the single worst position in the majors, were discussed in greater detail above. But the Padres might have the most dubious distinction on this list; they had not a single position that was above average. It doesn’t stand out here because zero is displayed in black font rather than red, and to be fair they had two positions at zero, as well as single positions at -1, -2, and -4; it’s not as if every position was making outs with no redeeming value. And their pitchers were +9, so they can hang their hat on that.

The full spreadsheet with data is available here.

Tuesday, November 28, 2017

Leadoff Hitters, 2017

I will try to make this as clear as possible: the statistics are based on the players that hit in the #1 slot in the batting order, whether they were actually leading off an inning or not. It includes the performance of all players who batted in that spot, including substitutes like pinch-hitters.

Listed in parentheses after a team are all players that started in twenty or more games in the leadoff slot--while you may see a listing like "COL (Blackmon)" this does not mean that the statistic is based solely on Blackmon's performance; it is the total of all Colorado batters in the #1 spot, of which Blackmon was the only one to start in that spot in twenty or more games. I will list the top and bottom three teams in each category (plus the top/bottom team from each league if they don't make the ML top/bottom three); complete data is available in a spreadsheet linked at the end of the article. There are also no park factors applied anywhere in this article.

That's as clear as I can make it, and I hope it will suffice. I always feel obligated to point out that as a sabermetrician, I think that the importance of the batting order is often overstated, and that the best leadoff hitters would generally be the best cleanup hitters, the best #9 hitters, etc. However, since the leadoff spot gets a lot of attention, and teams pay particular attention to the spot, it is instructive to look at how each team fared there.

The conventional wisdom is that the primary job of the leadoff hitter is to get on base, and most simply, score runs. It should go without saying on this blog that runs scored are heavily dependent on the performance of one’s teammates, but when writing on the internet it’s usually best to assume nothing. So let's start by looking at runs scored per 25.5 outs (AB - H + CS):

1. COL (Blackmon), 7.9
2. HOU (Springer), 7.0
3. STL (Carpenter/Fowler), 6.7
Leadoff average, 5.5
ML average, 4.6
28. CHA (Garcia/Anderson/Sanchez), 4.4
29. KC (Merrifield/Escobar), 4.3
30. SD (Margot/Pirela), 3.9
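
For reference, the runs scored per 25.5 outs figure above is simple enough to sketch in a few lines. The stat line in the example is invented purely to make the arithmetic easy to follow, and the function name is mine:

```python
def runs_per_game(r, ab, h, cs, outs_per_game=25.5):
    """Runs scored per 25.5 outs, with outs estimated as AB - H + CS."""
    outs = ab - h + cs
    return r / outs * outs_per_game


if __name__ == "__main__":
    # HYPOTHETICAL leadoff slot: 100 R, 700 AB, 200 H, 10 CS -> 510 outs.
    print(round(runs_per_game(r=100, ab=700, h=200, cs=10), 1))
```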

That’s Leury Garcia for the White Sox, in case you were wondering. One of my favorite little tidbits from the 2017 season was their all-Garcia outfield: Willy, Leury, and Avisail. Sadly, one of my favorite tidbits from the last few seasons was no more as Kansas City finally decided that Esky Magic had run its course. Alcides Escobar only got 25 starts leading off, with Whit Merrifield leading the way with 115. The Royals still made plenty of appearances in the trailer portions of these lists.

The most basic team independent category that we could look at is OBA (figured as (H + W + HB)/(AB + W + HB)):

1. COL (Blackmon), .399
2. STL (Carpenter/Fowler), .374
3. HOU (Springer), .374
Leadoff average, .333
ML average, .327
28. CIN (Hamilton), .295
29. TOR (Pillar/Bautista), .287
30. KC (Merrifield/Escobar), .282

Even if we were to park-adjust Colorado’s .399, they’d be at .370, so it was a fine performance by Blackmon and company (mostly Blackmon, with 156 starts), but not the best in the league. There’s no good reason I don’t park-adjust, although my excuse is that park adjustments don’t apply (or at least can’t be based off the runs park factor) for some of the metrics presented here. Of the categories mentioned in this post, R/G, OBA, 2OPS, RG, and LE could be if one was so inclined.

The next statistic is what I call Runners On Base Average. The genesis for ROBA is the A factor of Base Runs. It measures the number of times a batter reaches base per PA--excluding homers, since a batter that hits a home run never actually runs the bases. It also subtracts caught stealing here because the BsR version I often use does as well, but BsR versions based on initial baserunners rather than final baserunners do not. Here ROBA = (H + W + HB - HR - CS)/(AB + W + HB).

This metric has caused some confusion, so I’ll expound. ROBA, like several other methods that follow, is not really a quality metric, it is a descriptive metric. A high ROBA is a good thing, but it's not necessarily better than a slightly lower ROBA plus a higher home run rate (which would produce a higher OBA and more runs). Listing ROBA is not in any way, shape or form a statement that hitting home runs is bad for a leadoff hitter. It is simply a recognition of the fact that a batter that hits a home run is not a baserunner. Base Runs is an excellent model of offense and ROBA is one of its components, and thus it holds some interest in describing how a team scored its runs. As such it is more a measure of shape than of quality:

1. STL (Carpenter/Fowler), .341
2. COL (Blackmon), .336
3. PHI (Hernandez), .327
5. SEA (Segura/Gamel), .319
Leadoff average, .295
ML average, .288
28. CIN (Hamilton), .265
29. KC (Merrifield/Escobar), .248
30. TOR (Pillar/Bautista), .241

With the exception of Houston, the top and bottom three are the same as the OBA list, just in different order (HOU was eighth at .313, their 38 homers out of the leadoff spot tied with Colorado; Minnesota, the Mets, Cleveland, and Tampa Bay also got 30 homers out of the top spot).

I also include what I've called Literal OBA--this is just ROBA with HR subtracted from the denominator so that a homer does not lower LOBA, it simply has no effect. It “literally” (not really, thanks to errors, out stretching, caught stealing after subsequent plate appearances, etc.) is the proportion of plate appearances in which the batter becomes a baserunner able to be advanced by his teammates. You don't really need ROBA and LOBA (or either, for that matter), but this might save some poor message board out there twenty posts, by not implying that I think home runs are bad. LOBA = (H + W + HB - HR - CS)/(AB + W + HB - HR):

1. COL (Blackmon), .354
2. STL (Carpenter/Fowler), .352
3. PHI (Hernandez), .332
4. HOU (Springer), .330
Leadoff average, .303
ML average, .298
28. CIN (Hamilton), .268
29. KC (Merrifield/Escobar), .253
30. TOR (Pillar/Bautista), .251
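
Since OBA, ROBA, and LOBA differ only in how they treat home runs and caught stealing, it may help to see the three formulas given above side by side. This is a minimal sketch; the stat line is hypothetical and the function names are mine:

```python
def oba(h, w, hb, ab):
    """OBA = (H + W + HB)/(AB + W + HB)."""
    return (h + w + hb) / (ab + w + hb)


def roba(h, w, hb, hr, cs, ab):
    """ROBA = (H + W + HB - HR - CS)/(AB + W + HB)."""
    return (h + w + hb - hr - cs) / (ab + w + hb)


def loba(h, w, hb, hr, cs, ab):
    """LOBA = ROBA with HR also removed from the denominator."""
    return (h + w + hb - hr - cs) / (ab + w + hb - hr)


if __name__ == "__main__":
    # HYPOTHETICAL leadoff slot line, chosen only for illustration.
    line = dict(h=180, w=60, hb=5, hr=20, cs=5, ab=650)
    print(round(oba(line["h"], line["w"], line["hb"], line["ab"]), 3))
    print(round(roba(**line), 3))
    print(round(loba(**line), 3))
```

For any line with at least one homer, LOBA will sit above ROBA, since the same numerator is divided by a smaller denominator.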

The next two categories are most definitely categories of shape, not value. The first is the ratio of runs scored to RBI. Leadoff hitters as a group score many more runs than they drive in, partly due to their skills and partly due to lineup dynamics. Those with low ratios don’t fit the traditional leadoff profile as closely as those with high ratios (at least in the way their seasons played out, and of course using R and RBI incorporates the quality and style of the hitters in the adjacent lineup spots rather than attributes of the leadoff hitters’ performance in isolation):

1. MIA (Gordon), 2.5
2. TEX (DeShields/Choo/Gomez), 2.2
3. CIN (Hamilton), 2.2
Leadoff average, 1.5
ML average, 1.0
27. SD (Margot/Pirela), 1.3
28. KC (Merrifield/Escobar), 1.2
29. MIN (Dozier), 1.1
30. CLE (Kipnis/Lindor/Santana), 1.1

Cleveland only settled on a permanent leadoff fixture (Lindor) late in the season, but all three of their 20+ game leadoff men were of the same general type. They didn’t really have a player who saw regular time who fit anything like the leadoff profile. It worked OK; their .328 OBA was lower than the leadoff average, but as we’ll see later their 5.2 RG was above average.

A similar gauge, but one that doesn't rely on the teammate-dependent R and RBI totals, is Bill James' Run Element Ratio. RER was described by James as the ratio between those things that were especially helpful at the beginning of an inning (walks and stolen bases) to those that were especially helpful at the end of an inning (extra bases). It is a ratio of "setup" events to "cleanup" events. Singles aren't included because they often function in both roles.

Of course, there are RBI walks, and doubles are a great way to start an inning, but RER classifies events based on when they have the highest relative value, at least from a simple analysis:

1. CIN (Hamilton), 1.7
2. MIA (Gordon), 1.6
3. TEX (DeShields/Choo/Gomez), 1.3
Leadoff average, .8
ML average, .6
28. TB (Dickerson/Kiermaier/Smith/Souza), .5
29. BAL (Smith/Beckham/Jones/Rickard), .5
30. COL (Blackmon), .5
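
The post describes RER in words rather than giving the formula, so treat the following as a sketch under an assumption: I am taking the usual statement of James' ratio, (W + SB)/(TB - H), where TB - H counts the extra bases on hits. The stat line is hypothetical:

```python
def run_element_ratio(w, sb, tb, h):
    """RER, ASSUMED here to be (W + SB)/(TB - H): setup events over cleanup events."""
    extra_bases = tb - h  # bases gained on hits beyond first base
    return (w + sb) / extra_bases


if __name__ == "__main__":
    # HYPOTHETICAL line: 60 W, 20 SB, 280 TB on 180 H -> 80 setup / 100 extra bases.
    print(round(run_element_ratio(w=60, sb=20, tb=280, h=180), 1))
```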

Both Tampa Bay and Baltimore followed the Cleveland pattern of using multiple leadoff hitters, although one of the four for each (Mallex Smith and Joey Rickard) fit more of a traditional profile. The Rays got 5.1 RG out of this hodgepodge, which is above-average; the Orioles’ 4.3 was not. For the record, I’m basing my assessment of Joey Rickard’s traditional leadoff style bona fides on his career minor league line (.280/.388/.392), and not his major league line (.255/.298/.361) for which the only stylistic interpretation is “bad”.

Since stealing bases is part of the traditional skill set for a leadoff hitter, I've included the ranking for what some analysts call net steals, SB - 2*CS. I'm not going to worry about the precise breakeven rate, which is probably closer to 75% than 67%, but is also variable based on situation. The ML and leadoff averages in this case are per team lineup slot:

1. WAS (Turner/Goodwin), 37
2. CIN (Hamilton), 36
3. MIA (Gordon), 25
4. NYA (Gardner), 21
Leadoff average, 8
ML average, 2
28. CHN (Jay/Zobrist/Schwarber), -2
29. COL (Blackmon), -5
30. HOU (Springer), -7
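
To make the link between the CS weight and the breakeven rate explicit: if a caught stealing costs k times what a steal gains, the success rate p at which they cancel solves p = k*(1 - p), so p = k/(1 + k). A small sketch (function names are mine):

```python
def net_steals(sb, cs, cs_weight=2):
    """Net steals as used above: SB - 2*CS by default."""
    return sb - cs_weight * cs


def breakeven_rate(cs_weight):
    """Success rate at which SB - cs_weight*CS comes out to zero."""
    return cs_weight / (1 + cs_weight)


if __name__ == "__main__":
    print(net_steals(sb=30, cs=5))        # 30 - 2*5
    print(round(breakeven_rate(2), 3))    # the 67% implied by SB - 2*CS
    print(round(breakeven_rate(3), 3))    # a 75% breakeven would mean SB - 3*CS
```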

A lot of the leaders and trailers are flipped on this list from the overall quality measures.

Shifting back to said quality measures, first up is one that David Smyth proposed when I first wrote this annual leadoff review. Since the optimal weight for OBA in a x*OBA + SLG metric is generally something like 1.7, David suggested figuring 2*OBA + SLG for leadoff hitters, as a way to give a little extra boost to OBA while not distorting things too much, or even suffering an accuracy decline from standard OPS. Since this is a unitless measure anyway, I multiply it by .7 to approximate the standard OPS scale and call it 2OPS:

1. COL (Blackmon), 979
2. HOU (Springer), 887
3. MIN (Dozier), 865
Leadoff average, 763
ML average, 755
28. TOR (Pillar/Bautista), 678
29. KC (Merrifield/Escobar), 658
30. CIN (Hamilton), 651
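
As a quick check on the scaling, here is 2OPS as defined above. The .399 OBA is Colorado's figure from earlier in the post; the .601 SLG is an assumed input for illustration, since the leadoff slot's slugging average is not listed here:

```python
def two_ops(oba, slg):
    """2OPS = .7 * (2*OBA + SLG), scaled to sit near the standard OPS range."""
    return 0.7 * (2 * oba + slg)


if __name__ == "__main__":
    # OBA from the list above; SLG is an ASSUMED value, not taken from the post.
    print(round(1000 * two_ops(0.399, 0.601)))
    # A generic .330/.430 hitter: standard OPS would be 760.
    print(round(1000 * two_ops(0.330, 0.430)))
```

The second call shows why the .7 multiplier works: for a roughly average hitter the result lands within a few points of ordinary OPS.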

Along the same lines, one can also evaluate leadoff hitters in the same way I'd go about evaluating any hitter, and just use Runs Created per Game with standard weights (this will include SB and CS, which are ignored by 2OPS):

1. COL (Blackmon), 7.8
2. HOU (Springer), 6.4
3. MIN (Dozier), 6.1
Leadoff average, 4.8
ML average, 4.6
28. TOR (Pillar/Bautista), 3.7
29. CIN (Hamilton), 3.6
30. KC (Merrifield/Escobar), 3.5

Allow me to close with a crude theoretical measure of linear weights supposing that the player always led off an inning (that is, batted in the bases empty, no outs state). There are weights out there (see The Book) for the leadoff slot in its average situation, but this variation is much easier to calculate (although also based on a silly and impossible premise).

The weights I used were based on the 2010 run expectancy table from Baseball Prospectus. Ideally I would have used multiple seasons but this is a seat-of-the-pants metric. The 2010 post goes into the detail of how this measure is figured; this year, I’ll just tell you that the out coefficient was -.230, the CS coefficient was -.597, and for other details refer you to that post. I then restate it per the number of PA for an average leadoff spot (750 in 2017):

1. COL (Blackmon), 45
2. HOU (Springer), 26
3. MIN (Dozier), 25
Leadoff average, 2
ML average, 0
28. CIN (Hamilton), -17
29. TOR (Pillar/Bautista), -18
30. KC (Merrifield/Escobar), -24

Esky Magic has residual effects, apparently. I don’t recall seeing the same teams in the leaders and trailers list for 2OPS, RG, and LE before, but they are all very similar in terms of their construction, with 2OPS arbitrarily but logically tilted towards OBA and LE attempting to isolate run value that would be contributed if all plate appearances came in a leadoff situation. RG represents the approximate run value of a player’s performance in an “average” situation on an average team.

The spreadsheet with full data is available here.

Wednesday, November 15, 2017

Hypothetical Ballot: MVP

My heart’s just not in writing this, since it’s the first time in the brilliant career of Mike Trout that he will not top my AL MVP ballot. This is made a little better by noting that, prorated to 150 games, he contributed 90 RAR to my choice’s 76, but you add no value to your team when you’re sitting at home. I should note that there have been many less deserving MVP winners than Mike Trout would be in 2017.

Jose Altuve leads Aaron Judge by four RAR, not considering baserunning or fielding. Judge’s fielding metrics are better than one might expect from a man of his size--5 FRAA, 6 UZR, 9 DRS--but Altuve ranks as average, and per Fangraphs was worth another run on the basepaths, so I don’t think it’s enough to bump him. It’s well within any margin of error and Judge would certainly be a fine choice as MVP.

The other candidate for the top spot is Corey Kluber--with 81 RAR, he’d be my choice by default. But while Kluber’s RAR using his peripherals (78) and DIPS (71) are good, they are basically a match for Altuve’s 77 (after baserunning). Our analytical approach for evaluating hitters is much more like using pitcher’s peripherals than their actual runs allowed, and there should be some consideration that some of the value attributed to a pitcher is actually due to his fielders (if you’re not making an explicit adjustment for that). For me, if a pitcher doesn’t clearly rank ahead of a hitter, he doesn’t get the benefit of the doubt on an MVP ballot.

The rest of the ballot is pretty straightforward by RAR, mixing the top pitchers in, as the top-performing hitters were all solid in the field and didn’t change places:

1. 2B Jose Altuve, HOU
2. RF Aaron Judge, NYA
3. SP Corey Kluber, CLE
4. SP Chris Sale, BOS
5. CF Mike Trout, LAA
6. SP Carlos Carrasco, CLE
7. 3B Jose Ramirez, CLE
8. SP Luis Severino, NYA
9. SP Justin Verlander, DET/HOU
10. SS Carlos Correa, HOU

In the NL, Joey Votto has a two-run RAR lead over Giancarlo Stanton, but Fangraphs has him at a whopping -10 baserunning runs to Stanton’s -2. BP has the same margin, but -8 to 0. Their fielding numbers (FRAA, UZR, DRS) are almost identical--(10, 11, 7) for Votto and (9, 10, 7) for Stanton. I’m not sure I’ve ever determined the top spot on my ballot on the basis of baserunning value before, but even if you were to be extremely conservative and regress it by 50%, it makes the difference. Paul Goldschmidt will get a lot of consideration, but even as a good baserunner and fielder I don’t think his 52 RAR offensively gets him in the picture for the top of the ballot.
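
The baserunning arithmetic behind that call is short enough to spell out. This sketch uses only the figures quoted above and ignores fielding, since the two are nearly identical there; the function name is mine:

```python
def adjusted_margin(rar_lead, br_a, br_b, regression=0.5):
    """Player A's lead over B after adding baserunning runs regressed toward zero."""
    return rar_lead + (1 - regression) * (br_a - br_b)


if __name__ == "__main__":
    # Votto leads Stanton by 2 RAR before baserunning.
    print(adjusted_margin(2, br_a=-10, br_b=-2))  # Fangraphs: -10 vs -2
    print(adjusted_margin(2, br_a=-8, br_b=0))    # BP: -8 vs 0
```

Both sources give an eight-run baserunning gap, so even counting only half of it, the two-run lead turns into a two-run deficit.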

Max Scherzer’s case is similar to Kluber’s, except with an even more pronounced gap between his actual runs allowed-based RAR (77), his peripherals (71), and DIPS (61).

The rest of my ballot follows RAR, as there were no players who made a huge difference in the field. I might be more inclined to accept an argument that Buster Posey was more valuable than the statistics suggest in a season in which San Francisco didn’t have the second-worst record in the league. But one player missing from my ballot who will be high on many (although not in the top three of the BBWAA vote) is Nolan Arenado, and I feel that deserves a little explanation.

Rather than comparing Arenado to every player on my ballot, let’s look at my last choice, Marcell Ozuna. Ozuna starts with a ten run lead in RAR (56 to 46). Arenado is widely regarded as an excellent fielder, but the metrics aren’t in agreement--1 FRAA, 7 UZR, 20 DRS. Ozuna’s figures are (5, 11, 3). If you believe that Arenado is a +20 fielder, then he would rank about dead even with Kris Bryant at 66 RAR (bumping Bryant from 61 on the strength of his own (especially) baserunning and fielding). It’s certainly not out of the realm of possibility. But if you only give Arenado credit for 10 fielding runs, that only pulls him even with Ozuna, before giving Ozuna any credit for his fielding.

I was going to write a bit more about how it might be easy for writers to consider Coors Field but understate how good of a hitter’s park it is (116 PF). If you used 110 instead, then Arenado starts at 51. But given that he didn’t finish in the top three, I don’t think there’s any evidence of not taking park into account. As discussed, there are perfectly reasonable views on Arenado’s fielding value that justify fourth-place. I’m not sure Arenado would be 11th or 12th or 13th if I went further, as Tommy Pham, Corey Seager, Clayton Kershaw, Gio Gonzalez, and Zack Greinke are all worthy of consideration for the bottom of the ballot themselves:

1. RF Giancarlo Stanton, MIA
2. 1B Joey Votto, CIN
3. SP Max Scherzer, WAS
4. 3B Kris Bryant, CHN
5. CF Charlie Blackmon, COL
6. 3B Anthony Rendon, WAS
7. 1B Paul Goldschmidt, ARI
8. 3B Justin Turner, LA
9. SP Stephen Strasburg, WAS
10. LF Marcell Ozuna, MIA

Tuesday, November 14, 2017

Hypothetical Ballot: Cy Young

The starting pitchers (and they’re the only ones that can possibly accrue enough value to be serious Cy Young candidates if you subscribe to the school of thought that all innings are created equal and the only leveraging effect appropriate to credit to relief aces is that they are used in close games) did a very nice job of separating themselves by RAR into groups of five or six, with a five run gap to the next pitcher. This makes a very convenient cut point to define the ballot candidates:

AL: Kluber 81, Sale 71, Carrasco 62, Verlander 61, Severino 59, Santana 59, Stroman 52

NL: Scherzer 77, Gonzalez 66, Kershaw 64, Strasburg 62, Greinke 57, Ray 51

Corey Kluber has a clear edge over Chris Sale in RAR, but it’s closer (78 to 73) in RAR based on eRA, and in RAR based on DIPS theory (assuming an average rate of hits on balls in play), Sale flips the standard list almost perfectly (80 to 71). My philosophy has always been that the actual runs allowed takes precedence, and while DIPS can serve to narrow the difference, Kluber is still outstanding when viewed in that light (e.g. this is not a Joe Mays or Esteban Loaiza situation). I don’t think it comes close to making up the difference. FWIW, Baseball Prospectus’ WARP, which attempts to account for all manner of situational effects not captured in the conventional statistical record, sees Kluber’s performance as slightly more valuable (8.0 to 7.6).

The rest of my AL ballot goes in order except to flip Severino and Verlander. Severino had significantly better marks in both eRA (3.15 to 3.73) and dRA (3.40 to 4.12). Santana had an even more marked disparity between his actual runs allowed and the component measures (3.82 eRA, 4.75 dRA) which also triggers confirmation bias as his and Jason Vargas’ first-half performances were quite vexing to this Cleveland fan.

1. Corey Kluber, CLE
2. Chris Sale, BOS
3. Carlos Carrasco, CLE
4. Luis Severino, NYA
5. Justin Verlander, DET/HOU

In the NL, I was a little surprised to see that in some circles, Clayton Kershaw is the choice for the award and may well win it. Tom Tango pointed out that Kershaw’s edges over Scherzer in both W-L (18-4 to 16-6) and ERA (2.31 to 2.51) give him a clear edge in the normal thought process of voters. I have been more detached than normal this season from the award debates as you might hear on MLB Tonight, and so seeing a 13 run gap in RAR I didn’t even consider that there might be a groundswell of support for Kershaw. With respect to ERA, Scherzer has a lower RRA (based on runs allowed, adjusting for park, and crudely accounting for bullpen support) and Kershaw’s raw .20 ERA lead drops to just .08 (2.41 to 2.49) when park-adjusting.

What’s more is that Scherzer has a larger edge over Kershaw in eRA (2.71 to 3.26) and dRA (3.20 to 3.73) than he does in RRA (2.56 to 2.72)--leading in all three with a 25 inning advantage. Scherzer led the NL in RRA and eRA, was a narrow second to his teammate Strasburg (3.07 to 3.20) in dRA, and was just seven innings off the league lead (albeit in seventh place). For Cy Young races in non-historic pitcher seasons, I don’t think it gets much more clear than this.

As a final note on Kershaw v. Scherzer, perhaps some of the pro-Kershaw sentiment goes beyond W-L and ERA and into the notion that Kershaw is the best pitcher in baseball. I don’t think this is relevant to a single season award, and I think it would have a much more obvious application to the AL MVP race, where not only is Mike Trout the best player in baseball, but the best by a tremendous margin, and was easily the most valuable player on a rate basis in the league (NOTE: I am not advocating that Trout should be the MVP, only that he has a better case using this argument than Kershaw). But it may be time to re-evaluate Kershaw as the best pitcher in baseball as a fait accompli. Over the last three seasons, Scherzer has pitched 658 innings with a 2.84 RRA and 211 RAR; Kershaw has pitched 557 innings with a 2.37 RRA and 206 RAR. At some point, the fact that Scherzer has consistently been more durable than Kershaw should factor into the discussion of “best”.
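
One crude way to see the rate-versus-durability tradeoff in those three-year totals is to put both pitchers on a common innings basis. The 200-inning yardstick is my own arbitrary choice for illustration; the innings and RAR figures are the ones quoted above:

```python
def rar_per_200_ip(rar, innings):
    """RAR prorated to an (arbitrary) 200-inning workload."""
    return rar / innings * 200


if __name__ == "__main__":
    # Three-season totals quoted in the text.
    print(round(rar_per_200_ip(rar=211, innings=658), 1))  # Scherzer
    print(round(rar_per_200_ip(rar=206, innings=557), 1))  # Kershaw
```

Kershaw comes out ahead per inning, while the raw totals are close because Scherzer threw about a hundred more innings, which is exactly the durability point.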

Strasburg placed second to Scherzer in eRA, and as discussed bettered him in dRA, recording one more out than Kershaw did. That’s enough for me to move him into second over teammate Gonzalez as well, who had an even larger peripheral gap than Kershaw (basing RAR on eRA, Strasburg beats Gonzalez 62 to 56; on dRA, 56 to 37), so I see it as:

1. Max Scherzer, WAS
2. Stephen Strasburg, WAS
3. Clayton Kershaw, LA
4. Gio Gonzalez, WAS
5. Zack Greinke, ARI