960.390 - Introduction to Computers for Statistics

960.390-01, Fall 1999, M 7,8 (6:10-9:00pm)

Meeting dates: 10/25, 11/1, 11/8, 11/15



 

SAS FUNCTIONS:

SAS has many built-in functions for working with data values. They include numeric functions, character functions, and probability and distribution functions, among others.

SAS Numeric Functions:

A numeric function performs a numerical computation on one or more numeric arguments. See page 35 of the textbook for a list and description of some of the numeric functions.

Example:

 
OPTIONS LINESIZE=65 NODATE; 
DATA ONE; 
  INPUT X @@; 
  Y = ABS(X); 
  Z = LOG(Y); 
CARDS; 
1.4 -1.5 0.35 0.66 -5 
; 
RUN; 
PROC PRINT; 
RUN; 
 
                         The SAS System                         1
 
                OBS      X        Y         Z
 
                 1      1.40    1.40     0.33647
                 2     -1.50    1.50     0.40547
                 3      0.35    0.35    -1.04982
                 4      0.66    0.66    -0.41552
                 5     -5.00    5.00     1.60944
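
Note that the example applies ABS before LOG: the logarithm is defined only for positive arguments, and SAS returns a missing value when LOG is given zero or a negative number. The following is a minimal sketch of a few other built-in numeric functions. The data set name MOREFUN and the new variable names are made up for illustration, and the selection is not necessarily the one listed on page 35 of the textbook.

```sas
DATA MOREFUN;
  INPUT X @@;
  ROOT  = SQRT(ABS(X));   /* square root of the absolute value   */
  WHOLE = INT(X);         /* integer part, truncated toward zero */
  NEAR  = ROUND(X, 0.1);  /* rounded to the nearest tenth        */
  EXPX  = EXP(X);         /* e raised to the power X             */
  BIG   = MAX(X, 0);      /* larger of X and 0                   */
CARDS;
1.4 -1.5 0.35 0.66 -5
;
RUN;
PROC PRINT DATA=MOREFUN;
RUN;
```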

SAS Character Functions:

A character function operates on one or more character strings. Page 36 of the textbook lists some of the character functions.

Example:

 
OPTIONS LINESIZE=65 NODATE; 
DATA ONE; 
   INPUT @1 NAMES $CHAR15.; 
   R_ALIGN = RIGHT(NAMES); 
   L_ALIGN = LEFT(NAMES); 
CARDS; 
 ABLERT 
   ALEX 
NACY 
      EMMANUAL 
; 
RUN; 
PROC PRINT; 
RUN; 
 
                         The SAS System                         1
 
          OBS        NAMES         R_ALIGN     L_ALIGN
 
           1      ABLERT             ABLERT    ABLERT  
           2        ALEX               ALEX    ALEX    
           3     NACY                  NACY    NACY    
           4           EMMANUAL    EMMANUAL    EMMANUAL
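
The $CHAR15. informat keeps the leading blanks in each data line, which is why RIGHT and LEFT have something to shift. As a rough sketch of how other character functions can be combined, the step below uses LENGTH, SUBSTR, LOWCASE, TRIM, and the concatenation operator ||. The data set name NAMES2 and the new variable names are hypothetical, and these functions are not necessarily the ones on page 36 of the textbook.

```sas
DATA NAMES2;
   INPUT @1 NAME $CHAR15.;
   CLEAN  = LEFT(NAME);           /* leading blanks moved to the end        */
   NCHAR  = LENGTH(CLEAN);        /* position of the last nonblank character */
   FIRST3 = SUBSTR(CLEAN, 1, 3);  /* first three characters                 */
   LOWER  = LOWCASE(CLEAN);       /* converted to lowercase                 */
   TAGGED = TRIM(CLEAN) || '*';   /* trailing blanks removed, then '*' added */
CARDS;
   ALEX
NACY
;
RUN;
PROC PRINT DATA=NAMES2;
RUN;
```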

SAS Probability and Distributional Functions:

·  For a given distribution and a specified value in the sample space, a probability function returns the corresponding cumulative distribution function (probability) value.

--- See page 36 of the textbook for a list and description of the probability functions.

·  For a given distribution (with specified parameter(s)) and a seed, a SAS distributional function generates a random observation from such a distribution.

--- See page 61 for a list of some distributional functions and a description of the random observation that each generates.

Example:

 
OPTIONS LINESIZE=65 NODATE; 
DATA ONE; 
   INPUT X @@; 
   N = 6; 
   P = 0.3; 
   CDF = PROBBNML(P,N,X); 
CARDS; 
0 1 2 3 4 5 6 
; 
RUN; 
PROC PRINT; 
RUN; 
DATA TWO; 
   Y = RANNOR(3333); 
   OUTPUT; 
RUN; 
PROC PRINT; 
   VAR Y; 
RUN; 
 
                         The SAS System                         1
 
                 OBS    X    N     P       CDF
 
                  1     0    6    0.3    0.11765
                  2     1    6    0.3    0.42018
                  3     2    6    0.3    0.74431
                  4     3    6    0.3    0.92953
                  5     4    6    0.3    0.98907
                  6     5    6    0.3    0.99927
                  7     6    6    0.3    1.00000
                         The SAS System                         2
 
                         OBS        Y
 
                          1     -0.41520
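
Because PROBBNML returns the cumulative probability P(X <= m), the probability of one particular value, P(X = m), can be recovered by differencing successive CDF values. The step below is a minimal sketch of that idea. The data set name BINOM and the variable PMF are made up for illustration. The case X = 0 is handled separately because the third argument of PROBBNML must lie between 0 and N.

```sas
DATA BINOM;
   INPUT X @@;
   N = 6;
   P = 0.3;
   CDF = PROBBNML(P, N, X);                 /* P(X <= x)            */
   IF X = 0 THEN PMF = CDF;                 /* P(X = 0) = P(X <= 0) */
   ELSE PMF = CDF - PROBBNML(P, N, X - 1);  /* P(X = x)             */
CARDS;
0 1 2 3 4 5 6
;
RUN;
PROC PRINT DATA=BINOM;
RUN;
```

The PMF column should sum to 1 over X = 0, ..., 6, which gives a quick check on the calculation.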

 

DO Loops and Generating Multiple Random Observations in SAS:

DO Loops:

DO loops are used when a statement or set of statements needs to be repeated many times.

The general form of a DO loop is

 
       DO index_variable = beginning_number TO ending_number;
          Statements;
          OUTPUT; /* Needed in a DATA step to write one observation per
                     iteration; without it, only a single observation is
                     written, when the DATA step ends */
       END;
 

Example:

 
OPTIONS LINESIZE=65 NODATE;
DATA MYDATA;
  DO BASE = 0 TO 12;
    DEG = BASE * 30;
    RAD = (DEG * 3.14159) / 180;
    COSX = COS(RAD);
    SINX = SIN(RAD);
    TANX = TAN(RAD);
    OUTPUT;
  END;
RUN;
PROC PRINT;
RUN;

 

 
                         The SAS System                         1
 
  OBS   BASE   DEG     RAD       COSX       SINX           TANX
 
    1     0      0   0.00000    1.00000    0.00000         0.00
    2     1     30   0.52360    0.86603    0.50000         0.58
    3     2     60   1.04720    0.50000    0.86602         1.73
    4     3     90   1.57080    0.00000    1.00000    753696.00
    5     4    120   2.09439   -0.50000    0.86603        -1.73
    6     5    150   2.61799   -0.86602    0.50000        -0.58
    7     6    180   3.14159   -1.00000    0.00000        -0.00
    8     7    210   3.66519   -0.86603   -0.50000         0.58
    9     8    240   4.18879   -0.50000   -0.86602         1.73
   10     9    270   4.71239   -0.00000   -1.00000    251232.00
   11    10    300   5.23598    0.50000   -0.86603        -1.73
   12    11    330   5.75958    0.86602   -0.50000        -0.58
   13    12    360   6.28318    1.00000   -0.00001        -0.00
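
The very large TANX values at 90 and 270 degrees appear because 3.14159 is only an approximation to pi, so RAD is not exactly pi/2 and the tangent is huge rather than undefined. The index of a DO loop also need not advance by 1: a BY clause sets the increment. The sketch below steps DEG directly in 30-degree increments, so BASE is no longer needed. It assumes that the CONSTANT function is available, which is true of current SAS releases but may not be true of older ones. The data set name MYDATA2 is made up for illustration.

```sas
DATA MYDATA2;
  PI = CONSTANT('PI');        /* assumption: CONSTANT is available   */
  DO DEG = 0 TO 360 BY 30;    /* BY sets the step size of the index  */
    RAD  = DEG * PI / 180;    /* degrees to radians                  */
    COSX = COS(RAD);
    SINX = SIN(RAD);
    OUTPUT;                   /* one observation per iteration       */
  END;
RUN;
PROC PRINT DATA=MYDATA2;
RUN;
```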

 

A DO group can also be written without an index variable, simply to bundle several statements so that they all execute under one IF-THEN or ELSE clause. Each DO must be closed by a matching END. For example:

 
DATA MYDATA;
DO AGES = 10 TO 18;
   IF (AGES < 13) THEN DO;
      TYPE = "PRE-TEEN";
      SCHOOL = "GRADE";
   END;
   ELSE DO;
      TYPE = "TEEN";
      SCHOOL = "MID/HIGH";
   END;
   OUTPUT;
END;
RUN;
PROC PRINT;
RUN;
 
 
                                       OBS    AGES    TYPE        SCHOOL
 
                                        1      10     PRE-TEEN    GRADE
                                        2      11     PRE-TEEN    GRADE
                                        3      12     PRE-TEEN    GRADE
                                        4      13     TEEN        MID/H
                                        5      14     TEEN        MID/H
                                        6      15     TEEN        MID/H
                                        7      16     TEEN        MID/H
                                        8      17     TEEN        MID/H
                                        9      18     TEEN        MID/H
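
In the output above, SCHOOL prints as MID/H rather than MID/HIGH. SAS fixes the length of a character variable the first time the variable appears in the DATA step, here from the 5-character value "GRADE", so the longer value is truncated. One common remedy, sketched below, is a LENGTH statement placed before the first assignment. The length of 8 is chosen here only because it fits the longest values in this example.

```sas
DATA MYDATA;
   LENGTH TYPE $ 8 SCHOOL $ 8;   /* declare lengths before first use */
   DO AGES = 10 TO 18;
      IF (AGES < 13) THEN DO;
         TYPE = "PRE-TEEN";
         SCHOOL = "GRADE";
      END;
      ELSE DO;
         TYPE = "TEEN";
         SCHOOL = "MID/HIGH";    /* no longer truncated to MID/H */
      END;
      OUTPUT;
   END;
RUN;
PROC PRINT;
RUN;
```

Because the LENGTH statement mentions TYPE and SCHOOL first, they will appear before AGES in the printed output.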

Use DO Loops and Distributional Functions to Generate Multiple Random Observations:

The following example generates 10 random observations from a normal distribution (mean = 3, sd = 1.5), 10 from an exponential distribution (mean = 7), and 10 from a uniform distribution on (1,5).

 
OPTIONS LINESIZE=65 NODATE;
DATA ONE;
   DO INDX = 1 TO 10;
      XNORMAL = (RANNOR(3333) * 1.5) + 3;
      YEXP = RANEXP(44444)*7;
      ZUNI = RANUNI(1234567) * (5 - 1) + 1;
     OUTPUT;
   END;
RUN;
PROC PRINT;
RUN;

 
                        The SAS System                         1
 
          OBS    INDX    XNORMAL      YEXP       ZUNI
 
            1      1     2.37720    3.40962    3.30274
            2      2     1.98452    2.01766    1.51405
            3      3     3.27308    1.66074    1.24179
            4      4     4.90929    1.91629    3.18993
            5      5     1.55773    4.09273    4.79824
            6      6     2.96389    2.89952    2.29184
            7      7     2.46344    0.98380    3.74177
            8      8     2.00966    3.24449    3.03825
            9      9     3.05269    7.59491    3.57499
           10     10     1.28174    1.06131    1.95390
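
The transformations in the DATA step work because RANNOR returns a standard normal value (mean 0, sd 1), RANEXP returns an exponential value with mean 1, and RANUNI returns a uniform value on (0,1). Multiplying by 1.5 and adding 3 gives mean 3 and sd 1.5, multiplying by 7 gives mean 7, and multiplying by (5 - 1) and adding 1 stretches (0,1) to (1,5). One way to check such a recipe is to generate a larger sample and compare the sample statistics with the theoretical ones. The sketch below does this with PROC MEANS. The data set name CHECK and the sample size of 1000 are arbitrary choices made for illustration.

```sas
DATA CHECK;
   DO INDX = 1 TO 1000;
      XNORMAL = (RANNOR(3333) * 1.5) + 3;    /* normal, mean 3, sd 1.5 */
      YEXP = RANEXP(44444) * 7;              /* exponential, mean 7    */
      ZUNI = RANUNI(1234567) * (5 - 1) + 1;  /* uniform on (1,5)       */
      OUTPUT;
   END;
RUN;
PROC MEANS DATA=CHECK N MEAN STD MIN MAX;
   VAR XNORMAL YEXP ZUNI;
RUN;
```

The sample means should be close to 3, 7, and 3, and the sample standard deviations close to 1.5, 7, and (5 - 1)/sqrt(12), about 1.155. ZUNI should never fall outside (1,5).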

DESCRIPTIVE STATISTICS II

The FREQ Procedure:

PROC FREQ generates a table or tables of frequency counts from categorical data. To generate a "one-way" table, use a single variable. A one-way table lists every value of the variable, the number of observations that have each value, and the percent and cumulative percent for each value. To generate a "two-way" table, use two variables joined by an asterisk. A two-way table contains, for each cell, the frequency, the percent of the overall total, the percent of the row total, and the percent of the column total.

The usage of PROC FREQ is:

 
  PROC FREQ DATA=DATASET;
    BY VARIABLE(S);
    TABLES V1 / options;
    TABLES V1*V2 / options;
  RUN;

Many options are available; they are listed on page 54 of the textbook.

 

Example:

 
OPTIONS LINESIZE=78 NODATE;
DATA MYDATA;
  DO I = 1 TO 500;
    VALP = RANPOI(3024,500);
    VALN = RANNOR(3024);
    VALU = (RANUNI(3024) * 100);
    IF (VALU > 80) THEN TYP = "GREAT";
    ELSE IF (VALU > 60) THEN TYP = "OK";
    ELSE TYP = "BAD";
    IF (VALN > 0) THEN SIGN = "POS";
    ELSE SIGN = "NEG"; 
    IF (VALP < 480) THEN PVAL = 480;
    ELSE IF (VALP < 520) THEN PVAL = 520;
    ELSE PVAL = 560; 
    OUTPUT;
  END;
RUN;
PROC FREQ DATA=MYDATA;
    TABLES PVAL;
    TABLES TYP*SIGN/CHISQ;
RUN;
 
                                The SAS System                               1
 
                                          Cumulative  Cumulative
              PVAL   Frequency   Percent   Frequency    Percent
              --------------------------------------------------
               480         96      19.2          96       19.2
               520        317      63.4         413       82.6
               560         87      17.4         500      100.0
 
 
 
 
                             TABLE OF TYP BY SIGN
 
                     TYP       SIGN
 
                     Frequency|
                     Percent  |
                     Row Pct  |
                     Col Pct  |NEG     |POS     |  Total
                     ---------+--------+--------+
                     BAD      |    141 |    164 |    305
                              |  28.20 |  32.80 |  61.00
                              |  46.23 |  53.77 |
                              |  59.49 |  62.36 |
                     ---------+--------+--------+
                     GREAT    |     43 |     49 |     92
                              |   8.60 |   9.80 |  18.40
                              |  46.74 |  53.26 |
                              |  18.14 |  18.63 |
                     ---------+--------+--------+
                     OK       |     53 |     50 |    103
                              |  10.60 |  10.00 |  20.60
                              |  51.46 |  48.54 |
                              |  22.36 |  19.01 |
                     ---------+--------+--------+
                     Total         237      263      500
                                 47.40    52.60   100.00
 
                                The SAS System                               2
 
                     STATISTICS FOR TABLE OF TYP BY SIGN
 
            Statistic                     DF     Value        Prob
            ------------------------------------------------------
            Chi-Square                     2     0.863       0.649
            Likelihood Ratio Chi-Square    2     0.862       0.650
            Mantel-Haenszel Chi-Square     1     0.736       0.391
            Phi Coefficient                      0.042
            Contingency Coefficient              0.042
            Cramer's V                           0.042
 
            Sample Size = 500
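
Two further features of the general form shown earlier are TABLES options that trim the output and the BY statement. The sketch below is one hedged illustration: NOROW, NOCOL, and NOPERCENT suppress the row, column, and overall percentages of a two-way table so that only counts remain, and NOCUM suppresses the cumulative columns of a one-way table. A BY statement requires the data to be sorted by the BY variable, hence the PROC SORT step. The data set SURVEY and its variables GRP and ANSWER are made up for illustration.

```sas
DATA SURVEY;
   INPUT GRP $ ANSWER $ @@;
CARDS;
A YES A NO A YES B NO B NO B YES A YES B NO
;
RUN;
PROC FREQ DATA=SURVEY;
   TABLES GRP*ANSWER / NOROW NOCOL NOPERCENT;  /* cell counts only */
RUN;
PROC SORT DATA=SURVEY;
   BY GRP;                  /* BY-group processing needs sorted data */
RUN;
PROC FREQ DATA=SURVEY;
   BY GRP;                  /* a separate table for each GRP value   */
   TABLES ANSWER / NOCUM;   /* omit the cumulative columns           */
RUN;
```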
 

Also study Example 7.1 on page 54 and Example 19.1 on page 141.