R version 3.6.3 (2020-02-29) -- "Holding the Windsock"
Copyright (C) 2020 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> pdf("g08.pdf");options(width=64)
> #install.packages("lme4")
> library(lme4)# For lme
> # Consider the balanced case, and convert to a data set with a constant
> # explanatory variable on each cluster.  Note that the least squares fit
> # of averages equals the ML fit.  It isn't the same if the explanatory
> # variable in the reduced case needs to be averaged over cluster.
> #**************************************************************/
> #* IQ and Brain size, from Tramo MJ et al, Neurology 1998;50: */
> #* 1246-1252. Description at statlib says that it is MSWord   */
> #* format, but fortunately that isn't true.  The data file has*/
> #* text descriptions in it, and tabs between columns.  Data   */
> #* lines (and only data lines) start with numeral.  expand re-*/
> #* moves tabs.                                                */
> #* wget http://lib.stat.cmu.edu/datasets/IQ_Brain_Size        */
> #* expand IQ_Brain_Size | grep '^[0-9]' > twinbrain.dat       */
> #* Variables are                                              */
> #* CCMIDSA: Corpus Collasum Surface Area (cm2)                */
> #* FIQ: Full-Scale IQ                                         */
> #* HC: Head Circumference (cm)                                */
> #* ORDER: Birth Order                                         */
> #* PAIR: Pair ID (Genotype)                                   */
> #* SEX: Sex (1=Male 2=Female)                                 */
> #* TOTSA: Total Surface Area (cm2)                            */
> #* TOTVOL: Total Brain Volume (cm3)                           */
> #* WEIGHT: Body Weight (kg)`                                  */
> #**************************************************************/
> twinbrain<-as.data.frame(scan("twinbrain.dat",
+    what=list(CCMIDSA=0,FIQ=0,HC=0,ORDER=0,PAIR=0,SEX=0,TOTSA=0,
+    TOTVOL=0,WEIGHT=0)))
> fir<-twinbrain[twinbrain$ORDER==1,]
> fir$v1<-fir$TOTVOL 
> sec<-twinbrain[twinbrain$ORDER==2,]
> sec$v2<-sec$TOTVOL 
> both<-merge(fir,sec,by="PAIR")[,c("v1","v2")]
> both$diff<-both$v2-both$v1
> # Create reduced data set by averaging over cluster.
> reduced<-NULL
> for(i in unique(twinbrain$PAIR)){
+    averaged<-apply(twinbrain[twinbrain$PAIR==i,],2,mean)
+    reduced<-rbind(reduced,averaged)
+ }
> reducedbrainfit<-lm(FIQ~TOTVOL,data=data.frame(reduced))
> summary(reducedbrainfit)

Call:
lm(formula = FIQ ~ TOTVOL, data = data.frame(reduced))

Residuals:
    Min      1Q  Median      3Q     Max 
-14.708  -9.546  -2.227   3.562  24.949 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)  
(Intercept) 109.391654  41.154400   2.658   0.0289 *
TOTVOL       -0.007453   0.036349  -0.205   0.8427  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 13.64 on 8 degrees of freedom
Multiple R-squared:  0.005228,	Adjusted R-squared:  -0.1191 
F-statistic: 0.04204 on 1 and 8 DF,  p-value: 0.8427

> fake<-merge(twinbrain[,c("PAIR","FIQ")],
+    reduced[,c("PAIR","TOTVOL")],by="PAIR")
> lmer(FIQ~TOTVOL+(1|PAIR),data=fake,REML=FALSE)
Linear mixed model fit by maximum likelihood  ['lmerMod']
Formula: FIQ ~ TOTVOL + (1 | PAIR)
   Data: fake
     AIC      BIC   logLik deviance df.resid 
156.4426 160.4255 -74.2213 148.4426       16 
Random effects:
 Groups   Name        Std.Dev.
 PAIR     (Intercept) 11.525  
 Residual              5.675  
Number of obs: 20, groups:  PAIR, 10
Fixed Effects:
(Intercept)       TOTVOL  
 109.391654    -0.007453  
> # REML estimates.
> summary(lmer(FIQ~TOTVOL+(1|PAIR),data=twinbrain))
Linear mixed model fit by REML ['lmerMod']
Formula: FIQ ~ TOTVOL + (1 | PAIR)
   Data: twinbrain

REML criterion at convergence: 149.2

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-1.4114 -0.6044  0.0212  0.4753  1.5530 

Random effects:
 Groups   Name        Variance Std.Dev.
 PAIR     (Intercept) 161.90   12.724  
 Residual              33.46    5.784  
Number of obs: 20, groups:  PAIR, 10

Fixed effects:
              Estimate Std. Error t value
(Intercept) 103.313280  32.360968   3.193
TOTVOL       -0.002055   0.028495  -0.072

Correlation of Fixed Effects:
       (Intr)
TOTVOL -0.991
> # ML estimates.
> summary(lmer(FIQ~TOTVOL+(1|PAIR),data=twinbrain,REML=FALSE))
Linear mixed model fit by maximum likelihood  ['lmerMod']
Formula: FIQ ~ TOTVOL + (1 | PAIR)
   Data: twinbrain

     AIC      BIC   logLik deviance df.resid 
   156.5    160.5    -74.2    148.5       16 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-1.4296 -0.6254  0.0263  0.4831  1.5949 

Random effects:
 Groups   Name        Variance Std.Dev.
 PAIR     (Intercept) 133.12   11.54   
 Residual              32.27    5.68   
Number of obs: 20, groups:  PAIR, 10

Fixed effects:
             Estimate Std. Error t value
(Intercept) 103.85975   30.30606   3.427
TOTVOL       -0.00254    0.02670  -0.095

Correlation of Fixed Effects:
       (Intr)
TOTVOL -0.992
> #Examine the parent-child height data.  Data are from
> #https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/T0HSJ1
> #Documentation claims that these are formatted for Stata; fortunately R reads 
> #them just fine.  Tell R that the first row contains column names.
> galton<-read.table("galton-stata11.tab",header=TRUE)
> #First examine daughters.
> daughters<-galton[galton$female==1,]
> #Families have multiple children.  Most of what we will do this semester will
> #require observations to be independent, and so keep only the first daughter.
> first<-c(TRUE,daughters$family[-1]!=daughters$family[-length(daughters$family)])
> fd<-daughters[first,c("mother","father","height")]
> summary(lmer(height~mother+father+(1|family),data=daughters))
Linear mixed model fit by REML ['lmerMod']
Formula: height ~ mother + father + (1 | family)
   Data: daughters

REML criterion at convergence: 1813.5

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-2.64907 -0.61107  0.01745  0.57211  2.48559 

Random effects:
 Groups   Name        Variance Std.Dev.
 family   (Intercept) 0.8421   0.9176  
 Residual             3.1248   1.7677  
Number of obs: 433, groups:  family, 168

Fixed effects:
            Estimate Std. Error t value
(Intercept) 17.22826    4.37573   3.937
mother       0.30452    0.05113   5.956
father       0.39451    0.04352   9.066

Correlation of Fixed Effects:
       (Intr) mother
mother -0.724       
father -0.662 -0.037
> library(carData)# For data set States
> summary(States[,c("dollars","pay")])
    dollars           pay       
 Min.   :2.993   Min.   :22.00  
 1st Qu.:4.354   1st Qu.:27.50  
 Median :5.045   Median :30.00  
 Mean   :5.175   Mean   :30.94  
 3rd Qu.:5.689   3rd Qu.:33.50  
 Max.   :9.159   Max.   :43.00  
> twodfit<-lm(percent~dollars+pay,data=States,x=TRUE)
> print(twodfit)

Call:
lm(formula = percent ~ dollars + pay, data = States, x = TRUE)

Coefficients:
(Intercept)      dollars          pay  
   -44.2372       9.2693       0.9699  

> twodfitplot<-function(fit){
+    X<-fit$x
+    plot(X[,2],X[,3])
+    xbar<-apply(X,2,mean)[-1]
+    M<-chol(solve(t(X)%*%X)[2:3,2:3])
+    theta<-(0:32)/5
+    out<-sqrt(0.05)*solve(M)%*%rbind(cos(theta),sin(theta))
+    lines(out[1,]+xbar[1],out[2,]+xbar[2],type="l")
+    points(xbar[1],xbar[2],pch=2)
+    hatdiag<-diag(X%*%solve(t(X)%*%X)%*%t(X))
+    xxx<-X[hatdiag==max(hatdiag),2:3]
+    text(xxx[1],xxx[2],round(max(hatdiag),2))
+ }
> markleverageplot<-function(fit){
+    X<-fit$x
+    hat<-diag(X%*%solve(t(X)%*%X)%*%t(X))
+ # Plot vs. leave one out residuals.
+    plot(predict(fit),rstandard(fit,type="predictive"),col=1+(rank(-hat)<4))
+ }
> twodfitplot(twodfit)
> markleverageplot(twodfit)
> twodfitplot(logfit<-lm(log(Volume)~log(Girth)+log(Height),data=trees,x=TRUE))
> markleverageplot(logfit)
> twodfitplot(dfit<-lm(height~mother+father,data=fd,x=TRUE))
> markleverageplot(dfit)
> #The r function influence.measures calculates a 
> #variety of influence measures for a fit.  These
> #measures are dfbetas for model terms incl. the
> #intercept, dffits, cov.r, Cook's distance, and h.
> #Observations are marked as influential if any of these conditions happens:
> #a dfbetas has absolute value greater than 1, 
> #dffits has absolute value greater than 3 sqrt(k/(n-k)), 
> #cov.r has absolute value greater than 1+3 k/(n-k), 
> #Cook's distance is greater than the F median for k, n-k df,
> #or the hat matrix diagonal exceeds 3 k/n.
> influence.measures(twodfit)
Influence measures of
	 lm(formula = percent ~ dollars + pay, data = States, x = TRUE) :

      dfb.1_ dfb.dllr  dfb.pay   dffit cov.r   cook.d    hat
AL -0.034941  0.06072 -0.02507 -0.1028 1.103 3.58e-03 0.0470
AK  0.529156 -0.02329 -0.31392 -0.6865 0.987 1.50e-01 0.1230
AZ -0.000292 -0.00798  0.00602  0.0113 1.110 4.31e-05 0.0407
AR -0.038881  0.00344  0.01780 -0.0470 1.137 7.50e-04 0.0647
CA -0.162788 -0.20842  0.23447  0.2457 1.374 2.05e-02 0.2344
CO  0.000771  0.01053 -0.00904 -0.0226 1.091 1.75e-04 0.0250
CN -0.056452  0.00360  0.03279  0.0738 1.212 1.85e-03 0.1231
DE -0.061424 -0.00753  0.04970  0.1338 1.062 6.02e-03 0.0314
DC  0.010088 -0.02744  0.01049 -0.0397 1.217 5.37e-04 0.1256
FL  0.039113  0.02432 -0.02963  0.0994 1.058 3.33e-03 0.0215
GA  0.132643  0.03696 -0.07844  0.2590 0.912 2.15e-02 0.0227
HI -0.035603 -0.08795  0.09153  0.1844 1.011 1.13e-02 0.0264
ID  0.055550 -0.05893  0.01181  0.1126 1.121 4.30e-03 0.0615
IL  0.130180  0.18543 -0.20986 -0.2708 1.025 2.42e-02 0.0494
IN -0.035066 -0.08528  0.09074  0.1956 0.996 1.26e-02 0.0252
IA -0.160245 -0.08336  0.12839 -0.2379 0.976 1.85e-02 0.0294
KS -0.107363 -0.06188  0.08616 -0.1935 0.996 1.24e-02 0.0248
KY -0.039621  0.06096 -0.02761 -0.1444 1.044 6.99e-03 0.0271
LA -0.076581  0.00833  0.03166 -0.1069 1.086 3.86e-03 0.0372
ME  0.275503  0.38876 -0.39067  0.4548 1.042 6.76e-02 0.0956
MD -0.106541 -0.05939  0.10670  0.1452 1.119 7.13e-03 0.0660
MA -0.135408  0.01705  0.08427  0.2689 0.987 2.37e-02 0.0379
MI  0.515305  0.58253 -0.69794 -0.7547 0.987 1.80e-01 0.1360
MN  0.073691  0.09769 -0.12269 -0.2281 0.977 1.70e-02 0.0277
MS -0.061239  0.02286  0.01592 -0.0865 1.122 2.54e-03 0.0578
MO -0.061360  0.01569  0.01620 -0.1149 1.060 4.45e-03 0.0262
MT -0.124919 -0.12143  0.14294 -0.1639 1.138 9.09e-03 0.0819
NE -0.115011 -0.03976  0.08289 -0.1411 1.077 6.71e-03 0.0401
NV  0.019193  0.05089 -0.04780 -0.0676 1.111 1.55e-03 0.0471
NH  0.077967  0.11340 -0.09458  0.2834 0.887 2.55e-02 0.0233
NJ -0.009840 -0.32324  0.20555 -0.3611 1.442 4.41e-02 0.2769
NM -0.103941 -0.04278  0.07954 -0.1265 1.086 5.41e-03 0.0417
NY  0.067206 -0.06162 -0.00337 -0.1314 1.225 5.86e-03 0.1364
NC  0.118876  0.01669 -0.05865  0.2427 0.929 1.90e-02 0.0224
ND -0.090984 -0.01914  0.05976 -0.1003 1.131 3.41e-03 0.0668
OH -0.011950 -0.04580  0.02350 -0.1533 1.021 7.83e-03 0.0224
OK -0.057715 -0.00517  0.03286 -0.0675 1.120 1.55e-03 0.0541
OR -0.009480 -0.01782  0.02685  0.1135 1.047 4.33e-03 0.0209
PA -0.066415  0.03710  0.02405  0.1553 1.070 8.11e-03 0.0401
RI -0.032209  0.03195  0.00222  0.0798 1.119 2.16e-03 0.0544
SC  0.154265 -0.07507 -0.01607  0.3167 0.879 3.17e-02 0.0273
SD -0.117762 -0.04275  0.08993 -0.1249 1.154 5.30e-03 0.0865
TN -0.012342  0.05150 -0.03021 -0.0731 1.115 1.82e-03 0.0511
TX  0.089334 -0.06772  0.00748  0.2042 1.004 1.38e-02 0.0289
UT -0.018663  0.02930 -0.01035 -0.0468 1.148 7.44e-04 0.0735
VT  0.081577  0.14994 -0.12593  0.2472 0.976 2.00e-02 0.0310
VA -0.009559 -0.01211  0.02968  0.1857 0.981 1.13e-02 0.0205
WA -0.039116 -0.06455  0.07135  0.1075 1.082 3.91e-03 0.0354
WV -0.167360 -0.14859  0.18197 -0.2141 1.104 1.54e-02 0.0713
WI  0.013770 -0.12176  0.04570 -0.3212 0.866 3.25e-02 0.0264
WY -0.124221 -0.11688  0.13179 -0.2129 1.009 1.50e-02 0.0319
   inf
AL    
AK    
AZ    
AR    
CA   *
CO    
CN   *
DE    
DC   *
FL    
GA    
HI    
ID    
IL    
IN    
IA    
KS    
KY    
LA    
ME    
MD    
MA    
MI   *
MN    
MS    
MO    
MT    
NE    
NV    
NH    
NJ   *
NM    
NY   *
NC    
ND    
OH    
OK    
OR    
PA    
RI    
SC    
SD    
TN    
TX    
UT    
VT    
VA    
WA    
WV    
WI    
WY    
> 
<\/pre>