
Diagnose my Regression (SAS edition) March 20, 2009

Posted by Lee in linkedin, Mathematics.

Frequently, statisticians have to act like doctors. We see statistical reports that try to describe something: how fast rumors spread based on how large a company is, or the relationship between nitrogen content and crop yield. Speed and gas usage. Almost anything you can think of.

So today, put on your diagnostician’s cap and look at the four relationships I show you here. To keep you from guessing, I’ve hidden the labels for the two variables, so you’ll be looking at Y1 and X1, Y2 and X2, Y3 and X3, and Y4 and X4. Here’s the DATA step and PROC REG code to generate the output.

	/* Each row holds one observation for each of the four X-Y pairs. */
	DATA Anscombe;
	INPUT X1 Y1 X2 Y2 X3 Y3 X4 Y4;
	DATALINES;
	10  8.04  10  9.14  10  7.46  8  6.58
	8  6.95  8  8.14  8  6.77  8  5.76
	13  7.58  13  8.74  13  12.74  8  7.71
	9  8.81  9  8.77  9  7.11  8  8.84
	11  8.33  11  9.26  11  7.81  8  8.47
	14  9.96  14  8.1  14  8.84  8  7.04
	6  7.24  6  6.13  6  6.08  8  5.25
	4  4.26  4  3.1  4  5.39  19  12.5
	12  10.84  12  9.13  12  8.15  8  5.56
	7  4.82  7  7.26  7  6.42  8  7.91
	5  5.68  5  4.74  5  5.73  8  6.89
	;
	RUN;
	/* Fit a separate least-squares line to each X-Y pair. */
	proc reg data=Anscombe;
	model Y1=X1;
	model Y2=X2;
	model Y3=X3;
	model Y4=X4;
	run; quit;

The PROC REG command fits the least-squares line to each set, giving me the equation of fit and all the statistics you could want. Click on any report or picture to see it in larger size.
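
For readers who want the arithmetic behind that fit, here is a sketch for the first pair (Y1 on X1), using the standard closed-form least-squares formulas. The symbols b_0, b_1, S_xx, and S_xy are notation introduced here, not labels from the PROC REG output, and the sums were worked by hand from the data lines above and rounded.

```latex
\begin{aligned}
\text{minimize}\quad & \sum_{i=1}^{11} \left( y_i - b_0 - b_1 x_i \right)^2 \\
b_1 &= \frac{S_{xy}}{S_{xx}}
     = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}
     = \frac{55.01}{110} \approx 0.500 \\
b_0 &= \bar{y} - b_1 \bar{x} \approx 7.501 - 0.500 \times 9 \approx 3.00
\end{aligned}
```

That is where a line of roughly y = 3 + ½x comes from for this pair.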

Here’s Y1 vs X1:


I highlighted some typical statistics that statisticians might use in discussing how well this line fits. Circles in the picture show the equation of the line (essentially y = 3 + ½x), the R² (≅ 0.666), and the p-value of the F-statistic (≅ 0.002). If you don’t know what these statistics are, bear with me. You’ll still get the joke.

Here’s Y2 by X2 (check the labels if you don’t believe me):


Here’s Y3 vs X3.


And Y4 by X4.


You should have noticed that all the statistics are identical. The line of best fit is pretty much y = 3 + ½x. And getting all those statistics to be the same, well, that’s something, right?
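
If you’d like to check how far the sameness goes, here is a minimal sketch. It assumes the Anscombe data set from the DATA step above already exists in your WORK library; the choice of statistics and options (N, MEAN, VAR, MAXDEC=, NOSIMPLE) is mine for illustration, not something taken from the reports shown here.

```sas
/* Assumption: the Anscombe data set created earlier is available. */

/* Sample size, mean, and variance of every column, to 3 decimals. */
proc means data=Anscombe n mean var maxdec=3;
   var X1-X4 Y1-Y4;
run;

/* Pearson correlations of each Y with each X.               */
/* The pairs of interest (Y1 with X1, Y2 with X2, and so on) */
/* sit on the diagonal of the resulting 4-by-4 table.        */
proc corr data=Anscombe nosimple;
   var X1-X4;
   with Y1-Y4;
run;
```

Every X column should show a mean of 9 and a variance of 11, and the Y columns should agree with one another to about two decimal places.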

Here’s the playing-doctor part. Consider the fact that you’ve got four patients (graphs) exhibiting identical symptoms. What can you tell me about the underlying causes?


Rock Band (Classic Edition) March 20, 2009

Posted by Lee in linkedin, Uncategorized.

Never count your lives. With my move to Publications, I’ve recently taken stock of the jobs I’ve had in the past. Not just at SAS, mind you, nor those throwaway jobs where you last only a couple of weeks. Real jobs, with real paychecks; jobs that make demarcations in your life. Here at SAS, I’ve mostly been a writer. Before that, I was a teacher, at the high school and college levels. I’ve also been paid as a movie projectionist (which I doubt is a real job anymore), a cooking instructor, a stand-up comedian, a stand-up comedian who got no laughs for several gigs in a row (unintentional), a close-up magician, and perhaps some others that I’m not willing to admit to just yet.

I spent some time this weekend with an old friend from one of those jobs playing Rock Band. For the uninitiated, the game involves staring at a TV screen with a plastic guitar “controller” around your neck, trying to push colored buttons in sync with the little colored chiclets that scroll down the screen, all 3-D-like, in tune with a rock-and-roll song that blasts in the background. After a brief eight to ten hours of this, pizza and adult beverages get involved as you contentedly congratulate yourself on your musical ability.

But I’ve been playing Rock Band guitar for many years, predating this game by more than a decade.


Why you should care about this graph March 20, 2009

Posted by Lee in linkedin, Statistics.

It’s boring. It’s black and white. It’s important.


Explaining Least Squares March 20, 2009

Posted by Lee in linkedin, Statistics, Technology.

If you head over to the JMP blog, you can see the following video that I made to explain the elusive concept of least squares. I try to explain what is meant by a line of best fit, and how the term “least squares” gives us one.

For those who are interested, I made this entry using a JSL script (also available at the JMP blog), using my Macintosh, JMP 8, and the Mac-only software ScreenFlow.

California 2, France 0. Maybe. March 17, 2009

Posted by Lee in linkedin, Statistics.

This past weekend, I pay-per-viewed a well-made little movie: Bottle Shock. It’s a dramatization of the well-known “Judgment of Paris,” a 1976 wine tasting, organized by British ex-pat Steven Spurrier. For the first time in history, California wines bested their French counterparts. In France. With French judges. Both reds and whites.

Alan Rickman puckishly plays Spurrier, and at the moment of revelation, I wondered: How did he score the ratings from the tasters? In the film, it looks like he did a tiny bit of arithmetic on the back of one of the scoring cards. Turns out he just averaged the judges’ scores for each wine, a statistically meaningless thing to do. (He later acknowledged this publicly.)

So my question is: If a more correct scoring method were used, would California still come out on top? I’ll use JMP to analyze this tasting survey to see which wines really rate. I’ll even examine the tasters themselves. (more…)

Obama’s Stimulus Package March 5, 2009

Posted by Lee in linkedin, Statistics.

Using this data, I constructed a graph that shows how the money in Obama’s Stimulus Package is allocated.



One thing to notice is the scale: it runs from the darkest blue at 0 up to bright red at the top ($150 billion). The Middle-Class Tax Credit is so large that little comes close to it. The Medicaid to States and State Fiscal Relief sections are in the middle, with almost everything else in blue.

I sort of overloaded this graph, with size and color showing the same thing ($US).

As a less-exciting-yet-more-readable graph, I present what is known as a Multi-Var graph or Variability Chart. It shows the details of the spending much better.


One Hit Wonder January 13, 2009

Posted by Lee in Trivia.

Here, a graphical representation of a One Hit Wonder. Witness the raw power of Commander Cody and the Lost Planet Airmen.


Far? January 12, 2009

Posted by Lee in Uncategorized.

I was recently reminded that “far away” is contextual, and has changed somewhat over time. Two stories, one in last week’s News and Observer, the other written in 1829 [FR (HTML) | EN(PDF)].


My manager made me eat one of these December 16, 2008

Posted by Lee in Uncategorized.

Here’s the wrapper:


Note that it’s not just a mystery-flavored DumDum, but that it is Artificial MYSTERY FLAVOR™.

I tasted it. I didn’t know “Blue Suede” was a flavor.

Obama and Daylight Savings Time November 23, 2008

Posted by Lee in Uncategorized.

It makes little sense to re-blog something from Boing-Boing, but I’m thrilled to see the following post from them.

Turns out, according to two academics on the NYT Op-Ed page, there is little scientific proof that Daylight Savings Time reduces energy consumption. It also turns out that the practice could be wasteful, a bit annoying, and a lot of people want to get rid of it.
A study in Indiana, a state that recently started DST, showed an overall increase of 1 percent in residential electricity use with occasional increases of 2 to 4 percent in late spring and early fall. So much for conserving energy.

Get rid of Daylight Savings Time!

