California 2, France 0. Maybe.
March 17, 2009
Posted by Lee in Statistics, linkedin
This past weekend, I pay-per-viewed a well-made little movie: Bottle Shock. It’s a dramatization of the well-known “Judgment of Paris,” a 1976 wine tasting, organized by British ex-pat Steven Spurrier. For the first time in history, California wines bested their French counterparts. In France. With French judges. Both reds and whites.
Alan Rickman puckishly plays Spurrier, and at the moment of revelation, I wondered: How did he score the ratings from the tasters? In the film, it looks like he did a tiny bit of arithmetic on the back of one of the scoring cards. Turns out he just averaged the judges’ scores for each wine, a statistically meaningless thing to do. (He later acknowledged this publicly.)
So my question is: If a more correct scoring method were used, would California still come out on top? I’ll use JMP to analyze this tasting survey to see which wines really rate. I’ll even examine the tasters themselves.
First, a quick example of why averaging is not allowed.
| Wine | Judge 1 | Judge 2 | Judge 3 | Average |
| Wine 1 | 14 | 2 | 12 | 9.3 |
| Wine 2 | 13 | 8 | 6 | 9.0 |
| Wine 3 | 11 | 6 | 2 | 6.3 |
Going by the averages, wines 1 and 2 are in a near dead heat. Look closer: Judge 1 prefers wine 1. Judge 3 really prefers wine 1. Judge 2 prefers wine 2. With two first-place wins, we should crown wine 1 the winner.
Averaging doesn’t take into account the fact that each judge operates using his or her own internal scale. We don’t know if a 6 from Judge 2 means the same thing as a 6 from Judge 3. Further, we don’t know what a 1-point difference means for each judge. And on and on. This is why most events that are scored by multiple judges (figure skating, for example, or ratings from consumer magazines) use ranks (1st, 2nd, 3rd) rather than raw scores.
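Here's a minimal Python sketch of that argument, using the three-wine scores from the table above. It isn't part of the JMP analysis; the function names and the choice to double Judge 2's scores are made up purely for illustration. The idea is to stretch one judge's internal scale and see which summary survives.

```python
from statistics import mean

# Scores from the small example table: one row per wine, one column per judge.
scores = {
    "Wine 1": [14, 2, 12],
    "Wine 2": [13, 8, 6],
    "Wine 3": [11, 6, 2],
}

def average_scores(table):
    """Plain average of the raw scores for each wine."""
    return {wine: mean(vals) for wine, vals in table.items()}

def first_place_counts(table):
    """Count how many judges gave each wine their own highest score."""
    num_judges = len(next(iter(table.values())))
    wins = {wine: 0 for wine in table}
    for j in range(num_judges):
        favorite = max(table, key=lambda wine: table[wine][j])
        wins[favorite] += 1
    return wins

def stretch_judge(table, judge, factor):
    """Hypothetical what-if: one judge uses a scale `factor` times wider."""
    return {
        wine: [v * factor if j == judge else v for j, v in enumerate(vals)]
        for wine, vals in table.items()
    }

# Pretend Judge 2 (index 1) scored on a scale twice as wide.
stretched = stretch_judge(scores, judge=1, factor=2)

print("averages, original :", average_scores(scores))
print("averages, stretched:", average_scores(stretched))
print("firsts, original   :", first_place_counts(scores))
print("firsts, stretched  :", first_place_counts(stretched))
```

Stretching one judge's scale changes which wine has the highest average, while the first-place counts don't budge. No judge changed an opinion; only the scale moved.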
So I managed to find the scores for the two events, the reds and the whites.
They’re comparing California Cabernet Sauvignon against French Bordeaux, and California Chardonnay against French Burgundy. Not exactly the same, but close enough: two wines that are predominantly one (matching) grape.
Here you can see a graph of the scores from the two flights: red, then white.
A couple of things to note: Stag’s Leap and Chateau Montelena, represented by red circles on each of these graphs, bounce around clearly on top. Both of these are California wines (the Chardonnay being the subject of the aforementioned movie). Also note the dismal rating of David Bruce (Regular) Chardonnay, which is consistently rated poorly and in fact given a zero by Pierre Brejoux.
Now, to use JMP for a legitimate scoring. The Distribution platform has a command to Save > Ranks averaged, which converts each judge’s scores to ranks from lowest to highest, with tied scores sharing the average of their ranks. Since there are ten wines, these will be numbers from 1 to 10. Note that this is backward for our purposes: the highest score gets rank 10, but we want the best wine to be ranked 1. So, I simply subtract each rank from 11 to convert it to a ranking we can use.
Now we’ve got something really useful: ranked scores that account for ties. The total of these ranked scores can now be interpreted. If a wine were so swimmingly, sportingly good (and may I please have some more) that every taster rated it first, its total for eleven judges would be 11: eleven #1 ratings. If, on the other hand, the wine is pure plonk[*], vin aigre[**], it would be tenth on everyone’s list, and its total would be 10 × 11 = 110. Because higher numbers are worse grades on this inverted scale, the total is traditionally called Points Against. This is a more appropriate measure for the rankings, and far superior to Spurrier’s original.
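For anyone without JMP, the same steps can be sketched in plain Python. This is not JMP's implementation, just one way to do averaged ranks by hand, and the scores in `demo_scores` are hypothetical numbers invented to show a tie (they are not from the 1976 tasting). With ten wines the flip is 11 minus the rank; the sketch uses the number of wines plus one so it works for any size.

```python
def averaged_ranks(values):
    """Rank values from lowest (1) to highest (n).

    Tied values share the average of the positions they occupy.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        # Extend `end` across the run of values tied with the one at `pos`.
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of 1-based positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks

def points_against(score_table):
    """Sum each wine's flipped rank over all judges (lower total = better).

    score_table maps a judge's name to that judge's scores, one per wine.
    """
    num_wines = len(next(iter(score_table.values())))
    totals = [0.0] * num_wines
    for judge_scores in score_table.values():
        for wine, rank in enumerate(averaged_ranks(judge_scores)):
            totals[wine] += (num_wines + 1) - rank  # best score becomes rank 1
    return totals

# Hypothetical scores: three judges, four wines. Judge A ties wines 2 and 3.
demo_scores = {
    "Judge A": [14, 10, 10, 5],
    "Judge B": [16, 12, 9, 3],
    "Judge C": [8, 15, 11, 2],
}

print(averaged_ranks(demo_scores["Judge A"]))
print(points_against(demo_scores))
```

A handy sanity check: each judge hands out the same total number of rank points no matter how the ties fall, so the Points Against column always sums to (number of judges) × (1 + 2 + … + number of wines).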
So how do the wines do on the new scale? First, the scores for the Reds:
| Wine | Vintage | Country | Points Against |
| 1. Stag’s Leap Wine Cellars | 1973 | USA | 41 |
| 2. Château Montrose | 1970 | France | 41.5 |
| 3. Château Mouton-Rothschild | 1970 | France | 43 |
| 4. Château Haut-Brion | 1970 | France | 49 |
| 5. Ridge Vineyards Monte Bello | 1971 | USA | 55 |
| 6. Heitz Wine Cellars Martha’s Vineyard | 1970 | USA | 70 |
| 7. Château Leoville-Las-Cases | 1971 | France | 72.5 |
| 8. Freemark Abbey Winery | 1969 | USA | 76 |
| 9. Mayacamas Vineyards | 1971 | USA | 77.5 |
| 10. Clos du Val Winery | 1972 | USA | 77.5 |
And the whites:
| Wine | Vintage | Country | Points Against |
| 1. Chateau Montelena | 1973 | USA | 32.5 |
| 2. Meursault Charmes | 1973 | France | 33.5 |
| 3. Chalone Vineyards | 1974 | USA | 42 |
| 4. Spring Mountain | 1973 | USA | 57 |
| 5. Freemark Abbey | 1972 | USA | 60.5 |
| 6. Bâtard-Montrachet | 1972 | France | 64 |
| 7. Puligny-Montrachet | 1972 | France | 67.5 |
| 8. Beaune, Clos des Mouches | 1973 | France | 68 |
| 9. Veedercrest | 1972 | USA | 73.5 |
| 10. David Bruce (Regular) | 1973 | USA | 106.5 |
JMP 8 provides a neat-o visualization of these scores, a Cell Plot. Here they are for the reds and the whites.
In the rightmost column, you see Points Against ranked in ascending order (so the best wine is on top). This graph lets you see how each taster rated each wine, and how that agrees with the Points Against score.
Finally, JMP lets me calculate a reliability measurement. Essentially, this is a measure of internal consistency that asks if these expert wine raters seem to be in agreement. The scale is from zero to one, with one being perfect. I’d say anything around 0.7–0.8 is a reasonable score to say the raters are consistent. The sample size is a little small, but still, these numbers are encouraging.
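The post doesn't say which reliability statistic JMP reported. One common internal-consistency measure is Cronbach's alpha, so the sketch below assumes that one, treating each judge as an "item" and each wine as a "subject". Alpha tops out at 1 for perfect agreement (it can dip below zero when judges actively disagree). The ratings in `demo_ratings` are hypothetical, not the 1976 scores.

```python
from statistics import variance

def cronbach_alpha(ratings):
    """Cronbach's alpha for a wines-by-judges table.

    ratings: list of rows, one per wine; each row holds one score per judge.
    """
    k = len(ratings[0])  # number of judges
    # Sample variance of each judge's scores across the wines.
    judge_vars = [variance([row[j] for row in ratings]) for j in range(k)]
    # Sample variance of the per-wine totals across the wines.
    total_var = variance([sum(row) for row in ratings])
    return k / (k - 1) * (1 - sum(judge_vars) / total_var)

# Hypothetical ratings: four wines (rows) scored by three judges (columns).
demo_ratings = [
    [14, 16, 8],
    [10, 12, 15],
    [10, 9, 11],
    [5, 3, 2],
]

# Three judges who agree exactly, for comparison.
identical_ratings = [[s, s, s] for s in (14, 10, 9, 5)]

alpha = cronbach_alpha(demo_ratings)
print(round(alpha, 3))
print(round(cronbach_alpha(identical_ratings), 3))
```

When every judge gives identical scores, the variance of the totals is entirely shared variance and alpha comes out at 1. The more the judges pull in different directions, the more each judge's own variance dominates and the lower alpha falls.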
Summation? The USA still has the highest-ranked wines. Repeated tastings over the past 30 years have confirmed these results.
If you’re interested in this subject, and maybe want even more statistical analyses, have a look at the Wikipedia page and any of the mountain of reports at liquidasset.com. Chateau Montelena has some press clippings from the time, and Amazon has the paperback edition of George Taber’s book.









