Why all your career WAR leaderboards are wrong.

09.02.2022 22:34

Athletics Nation

To be a baseball fan is to be obsessed with numbers. 3000, 60, 500, 300, 3.00, .300: we all know what these numbers represent. (And what is it with the number three that makes it come up so much?) But now we have a whole new set of numbers that are less tied to arbitrary benchmarks: WAR (both the f and b versions), xwOBA, FIP, xFIP, xERA, WPA. The column names of baseball leaderboards look like spoonfuls of alphabet soup drawn at random. But, as fans of the sport, we've grown accustomed to these new names and their meanings.

One such statistic we've now embraced is WAR, Wins Above Replacement. It is the one reductive number that captures everything: fielding, hitting, pitching, replacement level, positional value, park factors. Everything from trade talks to MVP debates to Hall of Fame ballots now revolves around WAR. Unless the WAR numbers are close, often not much is made of the context behind them, even when the players being compared are generations apart. And that's wrong.

Major League Baseball is now about 150 years old. The game has been through so many changes and has so much history that Ken Burns could make a 10-part documentary about it and still barely scratch the surface. Gloves alone have gone from things resembling biker gloves and oven mitts, to thin leather wrappings for each finger, to what we see today. Despite all this evolution, it is common practice to rank players separated by two lifetimes using a single number: career WAR.

We know that hits, home runs, strikeouts, and pitcher wins are all subject to the context of the era in which they were compiled. Strikeouts and homers are easier to come by today, while hits and pitcher wins are harder to find than in years past. But what about WAR? The various league-wide adjustments and normalizations built into it should make it less prone to changes over time, but how true is that?

I dug into the historical fWAR (FanGraphs WAR) data for hitters and tackled this question by inspecting the per-season variance in fWAR among qualified players. The data cover 15,718 qualified player seasons across 151 seasons. The variance tells us how much spread there was in per-season WAR: the higher that number, the wider the distribution, meaning more players at the low or high end of the WAR spectrum. I've plotted this variance in the graph below.
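A minimal sketch of this first step, assuming the data have already been pulled into (season, fWAR) pairs for qualified hitters. The row format, the `per_season_variance` helper, and the toy numbers are hypothetical, and the original analysis doesn't say whether it used the sample or population variance; this sketch uses the sample variance from Python's `statistics` module.

```python
from collections import defaultdict
from statistics import variance


def per_season_variance(rows):
    """Return {season: sample variance of fWAR} for (season, fwar) pairs.

    `rows` is a hypothetical iterable of (season, fwar) tuples for qualified
    player seasons. Seasons with fewer than two players are skipped because
    a sample variance needs at least two values.
    """
    by_season = defaultdict(list)
    for season, fwar in rows:
        by_season[season].append(fwar)
    return {
        season: variance(values)
        for season, values in sorted(by_season.items())
        if len(values) >= 2
    }


# Toy data for illustration only, not real fWAR leaderboards
rows = [
    (1923, 15.0), (1923, 3.0), (1923, 1.0),
    (2004, 11.9), (2004, 4.0), (2004, 2.0),
]
for season, var in per_season_variance(rows).items():
    print(season, round(var, 2))
```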

The plot clearly shows that fWAR variance is not consistent across time, and the data also fail formal tests of equal variance such as Levene's test. This lack of equal variance makes our usual method of building an all-time ranking (adding up season-by-season WAR totals) invalid.
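For readers who want to run the equal-variance check themselves, here is a stdlib-only sketch of Levene's W statistic, where each group would be one season's fWAR values. The post doesn't say which variant was used: `center="mean"` gives Levene's original test and `center="median"` gives the Brown-Forsythe variant (the default here, matching SciPy's default). Under the null hypothesis of equal variances, W is compared against an F(k - 1, N - k) distribution to get a p-value, which this sketch does not compute.

```python
from statistics import mean, median


def levene_w(groups, center="median"):
    """Levene's W statistic for equal variances across k groups.

    center="mean" is Levene's original test; center="median" is the
    Brown-Forsythe variant. Larger W means stronger evidence of unequal spread.
    """
    if len(groups) < 2:
        raise ValueError("need at least two groups")
    loc = median if center == "median" else mean
    k = len(groups)
    n_total = sum(len(g) for g in groups)

    # Absolute deviation of every value from its own group's center
    centers = [loc(g) for g in groups]
    z = [[abs(x - c) for x in g] for g, c in zip(groups, centers)]

    z_group_means = [mean(zg) for zg in z]
    z_grand_mean = sum(sum(zg) for zg in z) / n_total

    # Between-group and within-group sums of squares of the deviations
    between = sum(len(zg) * (m - z_grand_mean) ** 2
                  for zg, m in zip(z, z_group_means))
    within = sum((v - m) ** 2 for zg, m in zip(z, z_group_means) for v in zg)
    return (n_total - k) / (k - 1) * between / within


print(levene_w([[1, 2, 3], [11, 12, 13]]))  # same spread, shifted
print(levene_w([[1, 2, 3], [0, 10, 20]]))   # very different spread
```

If SciPy is available, `scipy.stats.levene(*groups)` computes the same median-centered statistic along with its p-value.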

Here's a thought experiment for why variance matters, and not just the mean (which is pegged to replacement level and is equal across eras), when judging players across eras with unequal variance. Imagine two lakes stocked with fish, and you're going to hold a fishing contest judged by the total weight of fish each person catches over a day. Both lakes have been stocked with fish of the same average weight, say four pounds. But, for whatever reason, the variance in fish weight is higher in the second lake. Let's say one standard deviation (the square root of the variance) is just 0.5 pounds in lake A but 1.0 pounds in lake B. Below is a randomly simulated distribution of the fish weights in the two lakes.

Without thinking about this, we randomly assign fishermen to each lake and let them start the competition. Now imagine two highly skilled fishermen, one in each lake. Each easily catches a lot of fish and consistently catches fish above the lake's average weight. Because the fisherman in lake B draws from a wider weight distribution, his above-average fish tend to be heavier, which gives him an advantage in a competition decided by total fish weight. To illustrate this, I ran a pseudo-competition 100 times. In each competition, both fishermen catch exactly 20 fish, and only fish above the average weight. Over these 100 pseudo-competitions, the fisherman in lake A averages 88.4 pounds of fish, while the fisherman in lake B averages 95.7 pounds. In fact, the fisherman in lake A never wins a single one of the 100 competitions.
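A minimal re-creation of the pseudo-competition, assuming fish weights are normally distributed and that "only catching above-average fish" means each catch is drawn from the lake's distribution conditioned on being above the mean. The original simulation procedure isn't described, so the function names, the seed, and the sampling approach are illustrative assumptions.

```python
import math
import random


def above_average_catch(mean_lb, sd_lb, n_fish, rng):
    """Total weight of n_fish fish, each drawn from Normal(mean, sd) given X > mean.

    By symmetry of the normal distribution, a draw conditioned on landing
    above the mean is the mean plus the absolute value of a N(0, sd) draw.
    """
    return sum(mean_lb + abs(rng.gauss(0.0, sd_lb)) for _ in range(n_fish))


def simulate(n_competitions=100, n_fish=20, mean_lb=4.0, sd_a=0.5, sd_b=1.0, seed=0):
    """Run repeated competitions; return (avg total A, avg total B, wins for A)."""
    rng = random.Random(seed)
    total_a = total_b = 0.0
    a_wins = 0
    for _ in range(n_competitions):
        a = above_average_catch(mean_lb, sd_a, n_fish, rng)
        b = above_average_catch(mean_lb, sd_b, n_fish, rng)
        total_a += a
        total_b += b
        if a > b:
            a_wins += 1
    return total_a / n_competitions, total_b / n_competitions, a_wins


def expected_total(mean_lb, sd_lb, n_fish):
    """Analytic expectation: E[X | X > mu] = mu + sd * sqrt(2 / pi) for a normal X."""
    return n_fish * (mean_lb + sd_lb * math.sqrt(2 / math.pi))


avg_a, avg_b, a_wins = simulate()
print(f"lake A: {avg_a:.1f} lb, lake B: {avg_b:.1f} lb, A wins: {a_wins}/100")
print(f"analytic: {expected_total(4.0, 0.5, 20):.1f} lb vs {expected_total(4.0, 1.0, 20):.1f} lb")
```

The analytic expectations come out to about 88.0 pounds for lake A and 96.0 pounds for lake B, in line with the 88.4 and 95.7 pounds from the 100 simulated competitions above. The exact simulated averages and win count will vary with the random seed.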

So, what does this mean for baseball? It means that if you're just adding up season-by-season WAR, and season-by-season WAR has unequal variance across time, then above-average players from higher-variance eras will look better than their peers from lower-variance eras.
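One way to write this down, under the simplifying assumption (for illustration only, not a model of how WAR is actually computed) that in season t a player sits z_t standard deviations above the season mean mu_t, where sigma_t is that season's WAR standard deviation:

```latex
\mathrm{WAR}_t = \mu_t + z_t\,\sigma_t
\quad\Longrightarrow\quad
\sum_{t} \mathrm{WAR}_t = \sum_{t} \mu_t + \sum_{t} z_t\,\sigma_t
```

If the season means are equal across eras, two players with the same z_t in every season differ in career WAR only through the sigma_t term, so the one who played in higher-variance seasons comes out ahead.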

These situations aren't new to statisticians, who have developed models that equalize variance by essentially shrinking it into a standard range. This causes the very high (or very low) values in high-variance eras to get 'shrunk' toward the mean.

I'm not going to go through a full variance adjustment of that kind here. One quick-and-dirty way to normalize for unequal variance is to divide each player's single-season WAR by the standard deviation of that season's WAR distribution, a rescaling closely related to a Z-score (a textbook Z-score would also subtract the season mean first). What this gives you is each season's performance measured in units of that year's standard deviation. This generally works better than forcing distributions into a particular range or using a strict percentile system (i.e., capping the top player at 10 WAR every year, or just adding up percentile ranks). That's because occasionally a truly amazing player like Babe Ruth goes so far off the scale in a high-variance era that we don't want to punish him by limiting him to the same value as the best player in every other season.
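A minimal sketch of the zfWAR calculation as described (season fWAR divided by that season's standard deviation), assuming qualified hitter seasons are available as (name, season, fWAR) tuples. The `zfwar` function and the toy players are hypothetical, and whether the original used the sample or population standard deviation isn't stated; this uses the sample version.

```python
from collections import defaultdict
from statistics import stdev


def zfwar(rows):
    """Scale each player-season's fWAR by that season's fWAR standard deviation.

    `rows` is a hypothetical list of (name, season, fwar) tuples for qualified
    hitters. Following the description in the text, values are only divided
    by the season's standard deviation; no season mean is subtracted.
    Returns (name, season, fwar, zfwar) tuples.
    """
    by_season = defaultdict(list)
    for _, season, fwar in rows:
        by_season[season].append(fwar)
    season_sd = {season: stdev(vals) for season, vals in by_season.items()}
    return [(name, season, fwar, fwar / season_sd[season])
            for name, season, fwar in rows]


# Toy data: the 1920 season has twice the spread of the 2000 season
rows = [
    ("Player A", 2000, 6.0), ("Player B", 2000, 2.0), ("Player C", 2000, 4.0),
    ("Player D", 1920, 9.0), ("Player E", 1920, 1.0), ("Player F", 1920, 5.0),
]
scored = sorted(zfwar(rows), key=lambda r: r[3], reverse=True)
for name, season, fwar, z in scored:
    print(f"{name} {season}: fWAR {fwar}, zfWAR {z:.2f}")
```

In the toy data, Player D has the higher raw fWAR (9.0 vs 6.0) but the lower zfWAR (2.25 vs 3.0), because D's season has twice the spread of Player A's.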

Below is a table of the single best seasons by this new Z-score of fWAR, which I will of course call zfWAR!

Name	Season	fWAR	zfWAR
Babe Ruth	1923	15	5.9
Babe Ruth	1926	12	5.8
Barry Bonds	2004	11.9	5.6
Babe Ruth	1921	13.9	5.6
Barry Bonds	2002	12.7	5.6
Honus Wagner	1908	11.8	5.5
Willie Mays	1962	10.5	5.5
Barry Bonds	2001	12.5	5.4
Ty Cobb	1911	11	5.3
Ty Cobb	1915	9.8	5.3
Cal Ripken	1991	10.6	5.2
Babe Ruth	1920	13.3	5.2
Mickey Mantle	1956	11.5	5.2
Ty Cobb	1917	11.5	5.2
Rogers Hornsby	1925	10.8	5.2
Ted Williams	1946	11.8	5.1
Honus Wagner	1907	9.2	5.1
Ted Williams	1947	10.5	5.1
Willie Mays	1965	10.7	5.1

This is still very much a list of the well-known best seasons ever, which serves as a nice sanity check. We aren't seeing random seasons with relatively low fWAR totals get divided by a small standard deviation to yield a big zfWAR number. However, the list is a bit less dominated by Ruth and Bonds, and Mays shows up twice in the top 20, even though his best fWAR season ranks only 30th. We also get Cal Ripken's 1991 season on the list, which I think is great given the absolute dearth of players who came, chronologically, between Mays/Yaz and Bonds in the top WAR lists.

Finally, I've added up all these zfWAR totals to produce career zfWAR rankings since 1900. Below are the top 50.

Rank	Name	fWAR	zfWAR
1	Barry Bonds	150.8	70.4
2	Ty Cobb	138	65.6
3	Willie Mays	143.7	65.5
4	Babe Ruth	150.6	62.3
5	Tris Speaker	130.6	61.0
6	Hank Aaron	129	59.2
7	Honus Wagner	121.1	58.1
8	Rogers Hornsby	123.3	53.7
9	Eddie Collins	112.7	53.3
10	Mike Schmidt	102.9	52.1
11	Stan Musial	116.8	51.7
12	Alex Rodriguez	110.8	51.0
13	Ted Williams	113.1	49.0
14	Lou Gehrig	116.2	48.5
15	Rickey Henderson	95.5	47.5
16	Mel Ott	108.1	46.3
17	Frank Robinson	100.4	46.0
18	Joe Morgan	94	44.8
19	Carl Yastrzemski	93.7	44.5
20	Cal Ripken	88.9	43.9
21	Eddie Mathews	95.9	43.1
22	Mickey Mantle	95.5	42.1
23	Albert Pujols	87	41.4
24	Wade Boggs	81.4	40.9
25	Jimmie Foxx	95	40.6
26	George Brett	81.3	40.6
27	Mike Trout	74.9	38.3
28	Pete Rose	79.5	38.1
29	Nap Lajoie	78.5	37.3
30	Jeff Bagwell	80	37.3
31	Brooks Robinson	78.6	36.9
32	Adrian Beltre	77.7	36.9
33	Eddie Murray	73	36.6
34	Reggie Jackson	75.3	36.4
35	Joe DiMaggio	78.3	34.5
36	Derek Jeter	73.9	34.3
37	Ken Griffey Jr.	72.3	33.8
38	Miguel Cabrera	68.7	33.5
39	Sam Crawford	68.2	33.3
40	Ron Santo	71.6	33.1
41	Robin Yount	65.8	33.1
42	Rafael Palmeiro	69.8	32.7
43	Gary Carter	64.6	32.5
44	Charlie Gehringer	76.8	32.3
45	Frank Thomas	67.2	32.1
46	Roberto Clemente	68.6	32.0
47	Chipper Jones	70.6	31.9
48	Rod Carew	65.7	31.8
49	Ozzie Smith	62.7	31.6

There you have it, your variance-normalized career WAR leaderboard.
