
California 2, France 0. Maybe. March 17, 2009

Posted by Lee in Statistics, linkedin.

This past weekend, I pay-per-viewed a well-made little movie: Bottle Shock. It’s a dramatization of the well-known “Judgment of Paris,” a 1976 wine tasting, organized by British ex-pat Steven Spurrier. For the first time in history, California wines bested their French counterparts. In France. With French judges. Both reds and whites.

Alan Rickman puckishly plays Spurrier, and at the moment of revelation, I wondered: How did he score the ratings from the tasters? In the film, it looks like he did a tiny bit of arithmetic on the back of one of the scoring cards. Turns out he just averaged the judges’ scores for each wine, a statistically meaningless thing to do. (He later acknowledged this publicly.)

So my question is: If a more correct scoring method were used, would California still come out on top? I’ll use JMP to analyze this tasting survey to see which wines really rate. I’ll even examine the tasters themselves.

First, a quick example of why averaging is not allowed.

          Judge 1   Judge 2   Judge 3   Average
Wine 1    14        2         12        9
Wine 2    13        8         6         9
Wine 3    11        6         2         9

Apparently, the three wines are tied for popularity. Look closer: Judge 1 prefers wine 1. Judge 3 really prefers wine 1. Judge 2 prefers wine 2. With two first-place wins, we should crown wine 1 the winner.

Averaging doesn’t take into account the fact that each judge operates using his or her own internal scale. We don’t know if Judge 2’s 6-rating means the same thing as Judge 3’s. Further, we don’t know what a 1-point difference means for each judge. And on and on. This is why most events that are scored by multiple judges (figure skating, for example, or ratings from consumer magazines) use ranks (1st, 2nd, 3rd) rather than raw scores.
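For the curious, here is a minimal Python sketch of the difference between averaging raw scores and ranking within each judge. The scores are made up for illustration (they are not from the tasting) and are chosen so that the three averages tie exactly; the function names are my own, and the sketch assumes no judge gives two wines the same score.

```python
# Hypothetical scores: one list per wine, one entry per judge (Judge 1, 2, 3).
scores = {
    "Wine A": [14, 2, 11],
    "Wine B": [13, 8, 6],
    "Wine C": [11, 6, 10],
}


def averages(scores):
    """Plain mean of the raw scores for each wine."""
    return {wine: sum(vals) / len(vals) for wine, vals in scores.items()}


def ranks_per_judge(scores):
    """Rank the wines within each judge: 1 = that judge's favorite.

    Assumes no ties within a single judge's scores.
    """
    wines = list(scores)
    n_judges = len(next(iter(scores.values())))
    ranks = {wine: [] for wine in wines}
    for j in range(n_judges):
        ordered = sorted(wines, key=lambda w: scores[w][j], reverse=True)
        for position, wine in enumerate(ordered, start=1):
            ranks[wine].append(position)
    return ranks


avg = averages(scores)                # every wine averages 9.0
ranks = ranks_per_judge(scores)       # Wine A: [1, 3, 1]
rank_totals = {wine: sum(r) for wine, r in ranks.items()}
first_places = {wine: r.count(1) for wine, r in ranks.items()}

print(avg)
print(rank_totals)    # lower total = better
print(first_places)
```

The averages cannot separate the wines, but the within-judge ranks can: Wine A collects two first places and the lowest rank total.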

So I managed to find the scores for the two events [reds][whites].

Text-Scores-Reds

Text-Scores-White

They’re comparing California Cabernet Sauvignon against French Bordeaux, and California Chardonnay against French Burgundy. Not exactly the same, but close enough: two wines that are predominantly one (matching) grape.

Here you can see a graph of the scores from the two flights: red, then white.

Raw-Reds

Raw-Whites

A couple of things to note: Stag’s Leap and Chateau Montelena, represented by red circles on each of these graphs, bounce around clearly on top. Both of these are California wines (the Chardonnay being the subject of the aforementioned movie). Also note the dismal rating of David Bruce (Regular) Chardonnay, which is consistently rated poorly and in fact given a zero by Pierre Brejoux.

Now, to use JMP for a legitimate scoring. The Distribution platform has a command, Save > Ranks averaged, which converts each score to a rank from lowest to highest, with tied scores sharing the average of their ranks. Since there are ten wines, these will be numbers from 1 to 10. Note that this is backward for our purposes: the highest score gets rank 10, but we want the best wine to be #1. So, I simply subtract each number from 11 to convert it to a rating on a scale we can use.
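If you don’t have JMP handy, the same two steps (rank from lowest to highest with ties averaged, then subtract from 11) can be sketched in plain Python. This is my own approximation of the idea, not JMP’s implementation, and the ten scores are hypothetical values for a single judge.

```python
def average_ranks(values):
    """Rank values from lowest (1) to highest (n).

    Tied values share the mean of the positions they occupy.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the whole run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def to_placings(values):
    """Flip the ranks so the highest score becomes placing 1."""
    n = len(values)
    return [n + 1 - r for r in average_ranks(values)]


# Hypothetical scores from one judge for ten wines (not from the tasting).
judge_scores = [10, 15, 12, 15, 8, 17, 12, 9, 3, 12]
placings = to_placings(judge_scores)
print(placings)
# [7.0, 2.5, 5.0, 2.5, 9.0, 1.0, 5.0, 8.0, 10.0, 5.0]
```

The two wines tied at 15 split 2nd and 3rd place and each get 2.5, and the three wines tied at 12 each get 5. However the ties fall, one judge’s placings always add up to 55.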

Text-Ratings-Red

Text-Ratings-White

Now we’ve got something really useful: ranked scores that account for ties. The total of these ranked scores can now be interpreted. If a wine were so swimmingly, sportingly good (and may I please have some more) that every taster ranked it first, its total for eleven judges would be 11: eleven #1 ratings. If, on the other hand, the wine is pure plonk[*], vin aigre[**], it would be tenth on everyone’s list and thus its total would be 10 × 11 = 110. Because higher numbers mean worse grades on this inverted scale, the total is traditionally called Points Against. This is a more appropriate measure for the rankings, and far superior to Spurrier’s original.
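The arithmetic behind Points Against is simple enough to sketch in a few lines of Python. The function name and the "unanimous judges" scenario below are hypothetical, chosen only to show the best and worst possible totals for eleven judges and ten wines.

```python
N_JUDGES = 11
N_WINES = 10


def points_against(placings_by_judge):
    """Sum each wine's placing across all judges. Lower total = better wine."""
    n_wines = len(placings_by_judge[0])
    return [sum(judge[w] for judge in placings_by_judge) for w in range(n_wines)]


best_possible = N_JUDGES * 1          # every judge places the wine first
worst_possible = N_JUDGES * N_WINES   # every judge places the wine last

# Each judge hands out the placings 1..10 (ties share averages),
# so one judge's placings always sum to 55.
per_judge_total = N_WINES * (N_WINES + 1) // 2
grand_total = N_JUDGES * per_judge_total

# Hypothetical case: all eleven judges agree on the exact same order.
unanimous = [list(range(1, N_WINES + 1)) for _ in range(N_JUDGES)]
totals = points_against(unanimous)

print(best_possible, worst_possible)  # 11 110
print(totals)                         # 11, 22, ..., 110
print(sum(totals) == grand_total)     # True
```

Since every judge distributes the same 55 points, the Points Against for all ten wines in a flight should add up to 605, which makes a handy check on any table of totals.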

So how do the wines do on the new scale? First, the scores for the Reds:

1. Stag’s Leap Wine Cellars 1973 USA 41
2. Château Montrose 1970 France 41.5
3. Château Mouton-Rothschild 1970 France 43
4. Château Haut-Brion 1970 France 49
5. Ridge Vineyards Monte Bello 1971 USA 55
6. Heitz Wine Cellars Martha’s Vineyard 1970 USA 70
7. Château Leoville-Las-Cases 1971 France 72.5
8. Freemark Abbey Winery 1969 USA 76
9. Mayacamas Vineyards 1971 USA 77.5
10. Clos du Val Winery 1972 USA 77.5

And the whites:

1. Chateau Montelena 1973 USA 32.5
2. Meursault Charmes 1973 France 33.5
3. Chalone Vineyards 1974 USA 42
4. Spring Mountain 1973 USA 57
5. Freemark Abbey 1972 USA 60.5
6. Bâtard-Montrachet 1972 France 64
7. Puligny-Montrachet 1972 France 67.5
8. Beaune, Clos des Mouches 1973 France 68
9. Veedercrest 1972 USA 73.5
10. David Bruce (Regular) 1973 USA 106.5

JMP 8 provides a neat-o visualization of these scores, a Cell Plot. Here they are for the reds and the whites.

Ranking-Reds

Ranking-Whites

In the rightmost column, you see Points Against ranked in ascending order (so the best wine is on top). This graph lets you see how each taster rated each wine, and how that agrees with the Points Against score.

Finally, JMP lets me calculate a reliability measurement, Cronbach’s alpha. Essentially, this is a measure of internal consistency that asks whether these expert wine raters seem to be in agreement. The scale typically runs from zero to one, with one being perfect. I’d say anything around 0.7–0.8 is a reasonable score to say the raters are consistent. The sample size is a little small, but still, these numbers are encouraging.
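Here is a rough Python sketch of one common reliability measure, Cronbach’s alpha, which the figure names below suggest is what JMP reported. It uses the standard textbook formula with each judge treated as an "item" and each wine as an observation. I am assuming that setup matches the JMP run, and the ratings are made up for illustration.

```python
from statistics import variance


def cronbach_alpha(ratings_by_judge):
    """Cronbach's alpha for a list of judges, each a list of ratings per wine.

    alpha = k / (k - 1) * (1 - sum of per-judge variances / variance of totals)
    where k is the number of judges and the totals are summed per wine.
    """
    k = len(ratings_by_judge)
    judge_variances = sum(variance(judge) for judge in ratings_by_judge)
    wine_totals = [sum(col) for col in zip(*ratings_by_judge)]
    return k / (k - 1) * (1 - judge_variances / variance(wine_totals))


# Hypothetical ratings: three judges, four wines (not from the tasting).
ratings = [
    [1, 2, 3, 4],  # Judge 1
    [2, 1, 4, 3],  # Judge 2
    [1, 3, 2, 4],  # Judge 3
]
print(round(cronbach_alpha(ratings), 3))  # 0.724

# Judges in perfect agreement give the maximum value of 1.
identical = [[1, 2, 3, 4]] * 3
print(round(cronbach_alpha(identical), 3))  # 1.0
```

The made-up judges mostly agree but swap a few neighbors, and they land at about 0.72, right in the "reasonably consistent" range.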

Cronbachs-Reds

Cronbachs-White

Summation? The USA still has the highest-ranked wines. Repeated tastings over the past 30 years have confirmed these results.

If you’re interested in this subject, maybe wanting even more statistical analyses, have a look at the Wikipedia page, and any of the mountain of reports at liquidasset.com. Chateau Montelena has some press clippings from the time, and Amazon has the paperback edition of George Taber’s book.


  • [reds] Red Wines from Wikipedia
  • [whites] White Wines from Table 2 here
  • [*] Why do we call bad wine plonk? During World War I, much of France was still speaking in local dialects, despite the fact that they learned standard French in school. Regional pronunciations being what they were, vin blanc got reduced to blanc, which came to describe the soldier’s ration. The poor pronunciation migrated to England as plonk.
  • [**] Vin aigre literally means sour wine. Again, a migration north turned this into Vin-egar or vinegar.