960.390 - Introduction to Computers for Statistics

960.390-01, Fall 1999, M 7,8 (6:10-9:00pm)

Meeting dates: 10/25, 11/1, 11/8, 11/15



 

T-TESTS:

We can use SAS procedures for hypothesis testing. PROC MEANS is used to do a one-sample t-test, and PROC TTEST is used to test whether two population means are equal or not.

 

One-Sample t-tests (PROC MEANS):

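By using options, PROC MEANS computes the t-statistic and p-value for H0: μ = 0 vs. H1: μ ≠ 0, where μ is the population mean of each variable listed in the VAR statement. The option T requests the t-statistic, and the option PRT requests the p-value for the two-sided test. To test H0: μ = μ0 for some other value μ0, first subtract μ0 from the variable in a DATA step, then test whether the new variable has mean 0. In the example below, CORN1 = CORN - 1900 tests H0: μ = 1900 for CORN, and RAIN1 = RAIN - 31 tests H0: μ = 31 for RAIN.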

 

Example:

 
OPTIONS LINESIZE=65 NODATE;
FILENAME TEST 'tmp.dat';
DATA ONE;
  INFILE TEST FIRSTOBS=2;  /* FIRSTOBS= is an INFILE option: start reading at line 2 of tmp.dat */
  INPUT CORN RAIN;
  RAIN1 = RAIN - 31;
  CORN1 = CORN - 1900;
RUN;
PROC MEANS N MEAN STD T PRT;
  VAR CORN1 RAIN1;
RUN;
 
                         The SAS System                         1
 
Variable   N          Mean       Std Dev             T  Prob>|T|
----------------------------------------------------------------
CORN1     38     8.5000000    11.1130554     4.7149517    0.0001
RAIN1     38     0.9157895     4.3637033     1.2936960    0.2038
----------------------------------------------------------------
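As a check on the output, the one-sample statistic is the sample mean minus the hypothesized mean, divided by the standard error s/√n, with n - 1 = 37 degrees of freedom here. Plugging in the CORN1 row reproduces the printed T. The last line shows why the shift works: CORN1 has mean x̄(CORN) - 1900 and the same standard deviation as CORN, so testing μ = 0 for CORN1 is the same as testing μ = 1900 for CORN.

```latex
t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}, \qquad \mathrm{df} = n - 1

t_{\mathrm{CORN1}} = \frac{8.5 - 0}{11.1130554/\sqrt{38}}
                   \approx \frac{8.5}{1.80278} \approx 4.715

\frac{\bar{x}_{\mathrm{CORN1}} - 0}{s/\sqrt{n}}
  = \frac{\bar{x}_{\mathrm{CORN}} - 1900}{s/\sqrt{n}}
```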

Two-Sample t-tests (PROC TTEST):

PROC TTEST tests whether two population means are the same or not (two-sided test). It also produces an F test of whether the variances of the two groups are the same or not.

The general form of PROC TTEST is,

 
PROC TTEST DATA=dataset;
CLASS variable; /* identifies the variable that divides the data set into 
               two groups. The variable must have exactly two values 
               (numeric or character) */ 
VAR variables; 
RUN;
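The CLASS variable does not have to be character. The sketch below is illustrative only: the data set name TWOGRP, the variable GROUP, and the sample size of 100 per group are assumptions, not part of the course example. It codes the two groups as 1 and 2, simulates normal data with RANNOR (standard deviation 10, means 50 and 51), and runs the same two-sample test.

```sas
/* Hypothetical example: a numeric CLASS variable coded 1 and 2 */
data twogrp;
   do i = 1 to 200;
      if (i <= 100) then do;
         group = 1;
         value = rannor(3024) * 10 + 50;   /* mean 50, SD 10 */
      end;
      else do;
         group = 2;
         value = rannor(3024) * 10 + 51;   /* mean 51, SD 10 */
      end;
      output;
   end;
run;

proc ttest data=twogrp;
   class group;   /* numeric, but still exactly two values: 1 and 2 */
   var value;
run;
```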

 

Example: Check out the following section of code:

 

 
options linesize=78 nodate formdlim="-";
data mydata;
   do i = 1 to 10000;
   if (i <= 5000) then sample = "first";
     else sample = "second";
   if (i <= 5000) then value = (rannor(3024) * 10) + 50;
     else value = (rannor(1777) * 10) + 51;
   output;
   end;
run;
proc ttest data=mydata;
   class sample;
   var value;
run;

 

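Note that the class variable is "sample" and it has two possible values, "first" and "second". The first sample is the collection of observations where "sample" equals "first"; the second sample is the collection of observations where "sample" equals "second". The class variable's values can be strings or numbers; either works, as long as there are exactly two distinct values. (Because "first" is the first value assigned to "sample", SAS gives the variable a length of 5 characters, so "second" is stored as "secon", which is what appears in the output. Putting the statement LENGTH SAMPLE $ 6; at the top of the DATA step avoids this truncation.)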

 

The program produces this short output:

 

 
------------------------------------------------------------------------------
 
                                The SAS System                               1
 
                               TTEST PROCEDURE
 
Variable: VALUE                                                
 
SAMPLE       N                 Mean              Std Dev            Std Error
-----------------------------------------------------------------------------
first     5000          49.98513268           9.95626583           0.14080286
secon     5000          51.19024365          10.02770417           0.14181315
 
Variances        T       DF    Prob>|T|
---------------------------------------
Unequal    -6.0303   9997.5      0.0001
Equal      -6.0303   9998.0      0.0000
 
For H0: Variances are equal, F' = 1.01    DF = (4999,4999)    Prob>F' = 0.6132

 

Discussion: The DATA step generates a data set with 10000 observations. Besides the loop index I, it contains the variables "sample" and "value". The "value" column contains 10000 random observations from two normal distributions: the first 5000 from a normal distribution with mean 50 and variance 100, and the second 5000 from a normal distribution with mean 51 and variance 100. The output shows that the sample mean of the first group is 49.985 and that of the second is 51.190. Both standard deviations (the square root of the variance) are close to 10, as intended. The "Prob>F'" value is 0.6132, which is above 0.05, so we do not reject the hypothesis that the variances are equal, and we use the "Equal" row of the output. Its "Prob>|T|" value is printed as 0.0000, meaning it is below 0.00005 after rounding. That is far less than 0.05, so we conclude that the means are DIFFERENT.
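The T and F' values can be reproduced by hand from the summary table. Because both groups have n = 5000, the pooled ("Equal") standard error s_p·√(1/n1 + 1/n2) works out to the same number as the unequal-variance standard error √(SE1² + SE2²). That is why both rows print T = -6.0303 and differ only in their degrees of freedom. F' is the larger sample variance divided by the smaller one.

```latex
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{SE_1^2 + SE_2^2}}
  = \frac{49.98513268 - 51.19024365}{\sqrt{0.14080286^2 + 0.14181315^2}}
  \approx \frac{-1.20511}{0.19984} \approx -6.030

F' = \frac{\max(s_1^2,\, s_2^2)}{\min(s_1^2,\, s_2^2)}
   = \left(\frac{10.02770417}{9.95626583}\right)^2 \approx 1.014
```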

 

Correlations and Plot between two random variables:

Correlation Coefficient (PROC CORR)

PROC CORR computes correlation coefficients between pairs of variables. By default it reports Pearson correlations, each with the p-value for testing that the population correlation is zero.

The general form of a PROC CORR procedure is

 
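PROC CORR DATA=dataset;
BY variables;   /* optional; the data must be sorted by these variables */
VAR variables;
WITH variables; /* optional; computes only the correlations between each WITH 
               variable and each VAR variable, instead of all pairs of VAR variables */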
RUN; 
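The example that follows does not use BY or WITH. As a sketch, assuming the MYDATA set from the t-test example above is still available in the session (the output name SORTED is illustrative), the program below sorts the data by SAMPLE, as BY requires, and then correlates VALUE with the loop index I separately within each group. The WITH statement limits the output to the I-by-VALUE coefficients.

```sas
/* BY-group processing requires data sorted by the BY variable */
proc sort data=mydata out=sorted;
   by sample;
run;

/* Within each SAMPLE group, correlate VALUE with I only;      */
/* WITH keeps PROC CORR from computing every pair of variables */
proc corr data=sorted;
   by sample;
   var value;
   with i;
run;
```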

 

Example:

 
OPTIONS LINESIZE=65 NODATE;
FILENAME TEST 'tmp.dat';
DATA ONE;
  INFILE TEST FIRSTOBS=2;  /* FIRSTOBS= is an INFILE option: start reading at line 2 of tmp.dat */
  INPUT CORN RAIN;
  RAIN1 = RAIN - 31;
  CORN1 = CORN - 1900;
RUN;
PROC CORR;
  VAR CORN CORN1 RAIN;
RUN;
 
 
                         The SAS System                         1
 
                      Correlation Analysis
 
           3 'VAR' Variables:  CORN     CORN1    RAIN
 
 
                       Simple Statistics
 
Variable             N          Mean       Std Dev           Sum
 
CORN                38   1908.500000     11.113055         72523
CORN1               38      8.500000     11.113055    323.000000
RAIN                38     31.915789      4.363703   1212.800000
 
           Simple Statistics
 
Variable       Minimum          Maximum
 
CORN       1890.000000      1927.000000
CORN1       -10.000000        27.000000
RAIN         19.400000        38.300000
 
 
  Pearson Correlation Coefficients / Prob > |R| under Ho: Rho=0
  / N = 38
 
                      CORN             CORN1              RAIN
 
   CORN            1.00000           1.00000           0.37971
                    0.0               0.0001            0.0187
 
   CORN1           1.00000           1.00000           0.37971
                    0.0001            0.0               0.0187
 
   RAIN            0.37971           0.37971           1.00000
                    0.0187            0.0187            0.0     
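Two features of this table can be checked by hand. First, CORN1 is CORN shifted by a constant, and a correlation coefficient does not change when a constant is added to either variable. Hence CORN and CORN1 are perfectly correlated (1.00000) and have identical correlations with RAIN. Second, Prob > |R| tests H0: ρ = 0 using a t statistic with n - 2 degrees of freedom. For CORN and RAIN this gives t ≈ 2.46 on 36 df, consistent with the printed p-value of 0.0187.

```latex
r(X + a,\, Y) = r(X,\, Y) \quad \text{for any constant } a,
\qquad r(\mathrm{CORN1}, \mathrm{RAIN}) = r(\mathrm{CORN}, \mathrm{RAIN}) = 0.37971

t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}
  = \frac{0.37971\sqrt{36}}{\sqrt{1 - 0.37971^2}}
  \approx \frac{2.27826}{0.92511} \approx 2.463, \qquad \mathrm{df} = 36
```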

 

X-Y Plot (PROC PLOT):

When data are collected in pairs, it is often a good idea to plot them on a graph to see whether a relationship exists. PROC PLOT produces such graphs as character-based plots in the output.

The general form of a PROC PLOT procedure is:

PROC PLOT DATA=dataset;
BY variables;   /* optional; the data must be sorted by these variables */
PLOT yvar * xvar;
RUN;

There are some variations on the PLOT statement:

PLOT yvar*xvar = '+'; /* observations are plotted using the '+' character. You can use other 
               characters as well */
 
PLOT yvar*xvar = variable; /* each observation is plotted using the first character 
               of that variable's value */
 
PLOT yvar*(xvar1 xvar2); /* two plots, yvar*xvar1 and yvar*xvar2, appear on separate 
               pages */ 
 
PLOT yvar*xvar1='+' yvar*xvar2='-' /OVERLAY; /* the two plots are overlaid on the 
               same graph */

Example:

 
options pagesize = 50 linesize = 64;
data mydata;
do x1 = 1 to 100;
      x2 = (1.1 ** x1);
      y = 50 * x1;
      y2 = (x1 ** 2);
      if (x1 < 50) then type = "old";
      else type = "new";
      output;
end;
run;
proc plot hpercent = 50 vpercent = 33;
  plot y*x1 y2*x1;
  plot y*x1 = '+';
  plot y*x1 = type;
  plot y*x1='l' y2*x1='s' / overlay;
  plot (y y2)*x1;
run;
 
                         The SAS System                        1
                                   13:28 Monday, October 5, 1998
 
A=1, B=2, etc.  Plot of Y*X1.     A=1, B=2, etc.  Plot of Y2*X1.
 
4800 +                   CEC      10000 +                    AC
     |                 EEB              |                   BD
   Y |              BEE              Y2 |                  EC
     |            DEC                   |                CE
2400 +         AEEA                5000 +              AEB
     |       CED                        |            BED
     |     EEB                          |          DEC
     |  BEE                             |      CEEEA
   0 + BC                             0 + BEEEEB
     --+---------+---------+--          --+---------+---------+-
       0        50        100             0        50        100
                 X1                                X1
 
 
      Plot of Y*X1='+'.                Plot of Y*X1=TYPE.
      (NOTE: 73 obs hidden.)            (NOTE: 73 obs hidden.)
4800 +                   +++      4800 +                   nnn
     |                 +++             |                 nnn
   Y |              +++              Y |              nnn
     |            +++                  |            nnn
2400 +         ++++               2400 +         ooon
     |       +++                       |       ooo
     |     +++                         |     ooo
     |  +++                            |  ooo
   0 + ++                            0 + oo
     --+---------+---------+--         --+---------+---------+--
       0        50        100            0        50        100
                 X1                                X1
 
 
      Plot of Y*X1='l'.           A=1, B=2, etc.  Plot of Y*X1.
      Plot of Y2*X1='s'.
       (NOTE: 149 obs hidden.)    4800 +                   CEC
    Y |                                |                 EEB
                           ss        Y |              BEE
                         sss           |            DEC
                       sss        2400 +         AEEA
                     sss llll          |       CED
                  ssssllll             |     EEB
              sssss                    |  BEE
        sssssss                      0 + BC
      --+---------+---------+-         --+---------+---------+--
        0        50        100           0        50        100
                 X1                                X1
 
                         The SAS System                        2
                                   13:28 Monday, October 5, 1998
 
A=1, B=2, etc.  Plot of Y2*X1.
 
10000 +                    AC
      |                   BD
   Y2 |                  EC
      |                CE
 5000 +              AEB
      |            BED
      |          DEC
      |      CEEEA
    0 + BEEEEB
      --+---------+---------+-
        0        50        100
                 X1
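The example above does not use the BY statement from the general form. A minimal sketch, assuming the MYDATA set created in the plotting example is the current one (the output name BYTYPE is illustrative): the data are sorted by TYPE first, as BY requires, and PROC PLOT then draws a separate Y*X1 plot for each value of TYPE.

```sas
/* BY processing requires the data to be sorted by the BY variable */
proc sort data=mydata out=bytype;
   by type;
run;

/* One Y*X1 plot for each value of TYPE ("new" and "old") */
proc plot data=bytype;
   by type;
   plot y*x1 = '*';
run;
```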