R version 4.2.2 Patched (2022-11-10 r83330) -- "Innocent and Trusting"
Copyright (C) 2022 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> pdf("g11.pdf");options(width=64)
> #setwd("C:\\Users\\kolassa\\Class542")
> setwd("~/Taught1/960-542/Data")
> #**************************************************************/
> # Beetle data, from SAS documentation for complementary log   */
> # log link, after reformatting to prepare data for treatment  */
> # by cloglog analysis.  Beetles were exposed to insecticides  */
> # and observed for 13 days.  The day on which the beetle died */
> # is recorded, with beetles lasting all 13 days being recorded*/ 
> # as 14.  Sex of the beetle (m=1, f=2), and concentrations of*./
> # insecticide, were also recorded.                            */
> #**************************************************************/
> beetledays<-read.table("beetledays.dat",header=TRUE)
> beetledays$sex<-c("M","F")[beetledays$sex]
> # Starting model with -1 forces there to be no intercept, and 
> # hence effects for each day are estimable.  Effects are
> # discrete hazards, which equal conditional probability of
> # failure at that time given survival up until tht time.
> # Baseline concentration is .20, measired in mg/cm^2.
> summary(glm(y~-1+factor(day)+sex+factor(conc), data=beetledays, 
+    weights=freq, family=binomial(link="cloglog")))

Call:
glm(formula = y ~ -1 + factor(day) + sex + factor(conc), family = binomial(link = "cloglog"), 
    data = beetledays, weights = freq)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-4.4261  -0.6501   0.0000   0.0000   8.4576  

Coefficients:
                 Estimate Std. Error z value Pr(>|z|)    
factor(day)1      -4.9094     0.2565 -19.137  < 2e-16 ***
factor(day)2      -3.8503     0.1929 -19.964  < 2e-16 ***
factor(day)3      -3.3698     0.1779 -18.939  < 2e-16 ***
factor(day)4      -2.9518     0.1690 -17.464  < 2e-16 ***
factor(day)5      -3.4331     0.1994 -17.213  < 2e-16 ***
factor(day)6      -4.0470     0.2609 -15.512  < 2e-16 ***
factor(day)7      -4.9112     0.3928 -12.503  < 2e-16 ***
factor(day)8      -4.7281     0.3696 -12.792  < 2e-16 ***
factor(day)9      -6.0845     0.7154  -8.505  < 2e-16 ***
factor(day)10     -6.0651     0.7125  -8.512  < 2e-16 ***
factor(day)11     -6.0427     0.7141  -8.462  < 2e-16 ***
factor(day)12     -6.0298     0.7155  -8.427  < 2e-16 ***
factor(day)13     -6.0214     0.7155  -8.415  < 2e-16 ***
sexM               0.6185     0.1144   5.405 6.49e-08 ***
factor(conc)0.32   1.2599     0.1624   7.759 8.58e-15 ***
factor(conc)0.5    1.6341     0.1677   9.742  < 2e-16 ***
factor(conc)0.8    2.1101     0.1640  12.866  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 10509.4  on 462  degrees of freedom
Residual deviance:  1909.9  on 445  degrees of freedom
AIC: 1943.9

Number of Fisher Scoring iterations: 7

> #******************************/
> # Interval Censoring Directly */
> #******************************/
> #***************************************************************
> #* Aids data from Lindsey and Ryan (1998) Statistics in Med-   *
> #* icine pp 219ff.  Studies drug resistance to zidovudine, a   *
> #* drug for treating AIDS, in clinical trial subjects.  Drug   *
> #* resistance might be assessed by observing the effects that  *
> #* the drug is intended to treat occuring but a more efficient *
> #* study uses assays, performed on samples collected at peri-  *
> #* odic visits.  Visits were infrequent because of expense.    *
> #* Variables include L, the time of the visit preceeding the   *
> #* collection of the positive sample, and R the right limit,   *
> #* disease stage and dose, and  indicates for moderate and high*
> #* CD4 counts.                                                 *
> #***************************************************************
> drugresistance<-read.table("lindseyryan.dat",header=TRUE)
> # Paper says those with upper bound 26 are right censored.
> # Indicate right censored to ic_sp by Inf.
> drugresistance$R[drugresistance$R==26]<-Inf
> library(icenReg)#For ic_sp
> # Standard errors are calculated via bootstrapping.
> # ic_sp is interval cenoring- semiparametric.
> o1<-ic_sp(Surv(L,R,type="interval2")~D+CD41+CD42,data=drugresistance,
+    bs_samples=10)
> print(o1)

Model:  Cox PH
Dependency structure assumed: Independence
Baseline:  semi-parametric 
Call: ic_sp(formula = Surv(L, R, type = "interval2") ~ D + CD41 + CD42, 
    data = drugresistance, bs_samples = 10)

     Estimate Exp(Est) Std.Error  z-value      p
D       1.302   3.6770     9.747  0.13360 0.8937
CD41   -2.220   0.1086    30.610 -0.07253 0.9422
CD42   -1.490   0.2254    24.280 -0.06136 0.9511

final llk =  -12.47292 
Iterations =  25 
Bootstrap Samples =  10 
WARNING: only  10  bootstrap samples used for standard errors. 
Suggest using more bootstrap samples for inference
> #******************************************************/
> # Threshold models                                    */
> #******************************************************/
> #*************************************************************/
> # Data from http://lib.stat.cmu.edu/datasets/Arsenic         */
> # reformatted into ASCII on the course home page.  Data re-  */
> # flect arsenic levels in toenail clippings; covariates in-  */
> # clude age, sex (1=M), categorical measures of quantities   */
> # used for drinking and cooking, arsenic in the water, and   */
> # arsenic in the nails.  To make arsenic.dat from Arsenic, do*/
> #antiword Arsenic|awk '((NR>39)&&(NR<61)){print}'>arsenic.dat*/
> #*************************************************************/
> arsenic<-read.table("arsenic.dat")
> names(arsenic)<-c("age","sex","drink","cook","arswater",
+    "arsnails")
> # I don't believe that the zeros are really zero.  Data make
> # it look like the detection limit is approx 0.0001.
> arsenic$status<-(arsenic$arswater>=.0001)+0
> arsenic$arswater[arsenic$arswater<.0001]<-.0001
> arsenic$lnail<-log(arsenic$arsnails)
> arsenic$lwater<-log(arsenic$arswater)
> cat('Model for arsenic in nail that ignores censoring\n')
Model for arsenic in nail that ignores censoring
> lm(lnail~lwater,data=arsenic)

Call:
lm(formula = lnail ~ lwater, data = arsenic)

Coefficients:
(Intercept)       lwater  
     0.2938       0.2644  

> cat('\n Tobit model for arsenic in nail.\n')

 Tobit model for arsenic in nail.
> cat('\n Incorporates censoring, but without exponentiating.\n')

 Incorporates censoring, but without exponentiating.
> library(survival)#For survreg
> summary(survreg(Surv(lnail,status,type="left")~lwater,
+   dist="gaussian",data=arsenic))

Call:
survreg(formula = Surv(lnail, status, type = "left") ~ lwater, 
    data = arsenic, dist = "gaussian")
              Value Std. Error     z       p
(Intercept)  0.6965     0.3129  2.23   0.026
lwater       0.3624     0.0485  7.47 8.1e-14
Log(scale)  -0.7255     0.1802 -4.03 5.7e-05

Scale= 0.484 

Gaussian distribution
Loglik(model)= -12.1   Loglik(intercept only)= -27
	Chisq= 29.75 on 1 degrees of freedom, p= 4.9e-08 
Number of Newton-Raphson Iterations: 6 
n= 21 

> # An alternative way to specify the same model.  Express
> # model on the original, rather than on log, scale.  This
> # is the scale we usually use for times.  The log normal 
> # distribution, along with the Weibull, contains infromation
> # directing R to do calculations on the log scale.  The
> # Gaussian distribution lacks this direction, and so the
> # linear, rather than accelerated failure, model is applied.
> # Hence in all cases, nail arsenic is linear in water arsenic
> # on a scale that logs both, and so nail arsenic is a multiple
> # of a power of water arsenic.
> cat("Note the same fit on the log scale\n")
Note the same fit on the log scale
> summary(survreg(Surv(arsnails,status,type="left")~lwater,
+   dist="lognormal",data=arsenic))

Call:
survreg(formula = Surv(arsnails, status, type = "left") ~ lwater, 
    data = arsenic, dist = "lognormal")
              Value Std. Error     z       p
(Intercept)  0.6965     0.3129  2.23   0.026
lwater       0.3624     0.0485  7.47 8.1e-14
Log(scale)  -0.7255     0.1802 -4.03 5.7e-05

Scale= 0.484 

Log Normal distribution
Loglik(model)= 7.2   Loglik(intercept only)= -7.7
	Chisq= 29.75 on 1 degrees of freedom, p= 4.9e-08 
Number of Newton-Raphson Iterations: 6 
n= 21 

> #************************************************************#
> # Danger of injuries from lifting, from Peter Westfall (Texas#
> # Tech.)  Data are at                                        #
> # courses.ttu.edu/isqs5349-westfall/images/5349/alr_data.htm #
> # You'll have to convert to unix format if necessary, and re-#
> # move HTML markup tags.  I did                              #
> # dos2unix alr_data.htm; html2text alr_data.html>alr_data.txt#
> # You might have to play with this a bit, since different    #
> # versions of dos2unix alternatively convert in place or     #
> # write to standard output.                                  #
> # Model the number of days lost as a latent normal variable, #
> # rounded to zero if negative.  Important variables are alr  #
> # (an index of lifting) and work days lost.                  #
> #************************************************************#
> lifting<-read.table("alr_data.dat")
> names(lifting) <- c("subject","hours","status","alr","sex", 
+    "age","weight","armst","backst","dynamend","abdep",
+    "shoulht","knuckht","dayslost")
> cat('\n Tobit model for Injuries from lifting  \n')

 Tobit model for Injuries from lifting  
> lifting$status<-(lifting$dayslost>0)+0
> summary(survreg(Surv(dayslost,status,type="left")~alr,
+    data=lifting, dist="gaussian"))

Call:
survreg(formula = Surv(dayslost, status, type = "left") ~ alr, 
    data = lifting, dist = "gaussian")
               Value Std. Error     z       p
(Intercept) -112.254     30.655 -3.66 0.00025
alr           14.707      8.759  1.68 0.09313
Log(scale)     4.078      0.207 19.68 < 2e-16

Scale= 59 

Gaussian distribution
Loglik(model)= -121.1   Loglik(intercept only)= -122.8
	Chisq= 3.47 on 1 degrees of freedom, p= 0.063 
Number of Newton-Raphson Iterations: 5 
n= 206 

> #******************************************************/
> # Frailty models                                      */
> #******************************************************/
> #************************************************************/
> # Skin graft data from Klein and Moeschberger Exercise 13.1 */
> # from the KMsurv package data directory.  The process of   */
> # installing the packages places this data into your compu- */
> # ter.  R tells you the location with                       */
> # .libPaths()                                               */
> # Data files, compressed via .gz, are in the data directory.*/
> # The first line of the file is variable names.             */
> # Variables are patient, time, status, and quality of match.*/
> #************************************************************/
> # allograft<-read.table("allograft.txt",header=TRUE)
> library(KMsurv) ; data(allograft)
> #First fit the Weibull model without frailty for 
> #comparison:
> summary(survreg(Surv(time,rejection)~match,data=allograft))

Call:
survreg(formula = Surv(time, rejection) ~ match, data = allograft)
             Value Std. Error     z       p
(Intercept)  3.214      0.118 27.30 < 2e-16
match        0.794      0.190  4.18 2.9e-05
Log(scale)  -0.701      0.149 -4.70 2.6e-06

Scale= 0.496 

Weibull distribution
Loglik(model)= -122.2   Loglik(intercept only)= -128.8
	Chisq= 13.27 on 1 degrees of freedom, p= 0.00027 
Number of Newton-Raphson Iterations: 5 
n= 34 

> cat('\n Gamma frailty model applied to skin graft data  \n')

 Gamma frailty model applied to skin graft data  
> # The following used to run, but apparently gave unreliable
> # results
> #survreg(Surv(time,rejection)~match+frailty(patient),
> #   data=allograft)
> # Replaced by:
> library(parfm)#For parfm
> # Default frailty is gamma.  Parameters are theta for frailty
> # scale parameter, with shape determined to give constant
> # variance, lambda is rate for exponential and Weibull, and
> # rho is Weibull shape.
> # The Weibull distribution apparently gives a shape parameter 
> # estimate extreme enough to make the fitting algorithm fail.
> # parfm(Surv(time,rejection)~match,cluster="patient",
> #   dist="weibull", frailty="gamma",data=allograft)
> parfm(Surv(time,rejection)~match,cluster="patient",
+    dist="exponential", frailty="gamma",data=allograft)

Frailty distribution: gamma 
Baseline hazard distribution: Exponential 
Loglikelihood: -130.194 

       ESTIMATE SE    p-val  
theta   0.000   0.362        
lambda  0.041   0.012        
match  -0.797   0.394 0.043 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Kendall's Tau: 0 
> # Compare with Cox model with frailty
> summary(coxph(Surv(time,rejection)~match+cluster(patient),
+    data=allograft))
Call:
coxph(formula = Surv(time, rejection) ~ match, data = allograft, 
    cluster = patient)

  n= 34, number of events= 29 

         coef exp(coef) se(coef) robust se     z Pr(>|z|)   
match -1.0966    0.3340   0.4397    0.3931 -2.79  0.00527 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

      exp(coef) exp(-coef) lower .95 upper .95
match     0.334      2.994    0.1546    0.7217

Concordance= 0.629  (se = 0.046 )
Likelihood ratio test= 6.84  on 1 df,   p=0.009
Wald test            = 7.78  on 1 df,   p=0.005
Score (logrank) test = 6.76  on 1 df,   p=0.009,   Robust = 5.36  p=0.02

  (Note: the likelihood ratio and score tests assume independence of
     observations within a cluster, the Wald and robust score tests do not).
> #Penalized frailty models will be investigated in R using 
> #frailtypack.  If you need to, install using
> #install.packages("frailtypack")
> library(frailtypack)#For frailtyPenal giving frailty for PH;
> #Frailty model using spline for hazard function.
> frailtyPenal(Surv(time,rejection)~match+cluster(patient),
+    data=allograft,n.knots=5,kappa=0.1)

Be patient. The program is computing ... 
The program took 0 seconds 
Call:
frailtyPenal(formula = Surv(time, rejection) ~ match + cluster(patient), 
    data = allograft, n.knots = 5, kappa = 0.1)


  Shared Gamma Frailty model parameter estimates  
  using a Penalized Likelihood on the hazard function 

          coef exp(coef) SE coef (H) SE coef (HIH)        z
match -1.76536  0.171126    0.494893      0.494893 -3.56715
               p
match 0.00036089

    Frailty parameter, Theta: 1.65287 (SE (H): 0.666744 ) p = 0.0065874 
 
      penalized marginal log-likelihood = -113.85
      Convergence criteria: 
      parameters = 1.77e-05 likelihood = 1.19e-06 gradient = 3.09e-13 

      LCV = the approximate likelihood cross-validation criterion
            in the semi parametrical case     = 3.61273 

      n= 34
      n events= 29  n groups= 16
      number of iterations:  19 

      Exact number of knots used:  5 
      Value of the smoothing parameter:  0.1, DoF:  7.00
> #******************************************************/
> # Multiple observations of the same kind.             */
> #******************************************************/
> #********************************************************/
> # VA bladder tumor data from                            */
> # http://lib.stat.cmu.edu/datasets/Andrews/T45.1, after */
> # some reformatting.  Variables are treatment group,    */
> # followuptime, initial number of tumors, initial size  */
> # of tumors, and up to four recurrence times.           */
> #********************************************************/
> tumor<-read.table("tumor.dat",na.strings=".",
+    col.names=c("group","futime","number","size","rt1","rt2",
+    "rt3" ,"rt4" ))
> tumor$nt<-apply(!is.na(tumor[,5:8]),1,sum)
> longtumor<-data.frame(array(NA,c(sum(tumor$nt+1),7)))
> names(longtumor)<-c("group","Number","Size","recur","subject",
+   "Time","Status")
> count<-0
> for(j in seq(dim(tumor)[1])){
+   cv<-unlist(tumor[j,c(1,3,4)])
+   done<-FALSE
+   last<-0
+   for(k in 1:4){
+      if(!done){
+         done<-is.na(tumor[j,4+k])
+         count<-count+1
+         longtumor[count,]<-c(cv,k,j,tumor[j,4+k]-last,1)
+         if(!done) last<-tumor[j,4+k]
+      }
+   }
+   if(!done){
+      count=count+1
+      longtumor[count,]<-c(cv,k,j,NA,1)
+   }
+   if(is.na(longtumor[count,6])){
+      longtumor[count,6]<-tumor[j,2]-last
+      longtumor[count,7]<-0}
+ }
> #survreg handles missing values, but frailtyPenal does not.
> clean<-longtumor[(longtumor$Time>0)&(!is.na(longtumor$Time)),]
> cat('\n Gamma frailty model applied to tumor Recurrence  \n')

 Gamma frailty model applied to tumor Recurrence  
> # Again, this used to work:
> #survreg(Surv(Time,Status)~factor(group)+factor(recur)+
> # frailty(subject),data=clean)
> parfm(Surv(Time,Status)~factor(group)+factor(recur),
+   cluster="subject",frailty="gamma",data=clean)

Frailty distribution: gamma 
Baseline hazard distribution: Weibull 
Loglikelihood: -453.692 

               ESTIMATE SE    p-val  
theta           1.041   0.557        
rho             1.027   0.096        
lambda          0.052   0.017        
factor(group)2 -0.301   0.312 0.334  
factor(recur)2 -0.102   0.342 0.766  
factor(recur)3  0.384   0.389 0.324  
factor(recur)4 -0.970   0.475 0.041 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Kendall's Tau: 0.342 
> #Cox regression frailty model.
> frailtyPenal(Surv(Time,Status)~factor(group)+factor(recur)+
+  cluster(subject),data=clean,kappa=.1,n.knots=5)

Be patient. The program is computing ... 
The program took 0 seconds 
Call:
frailtyPenal(formula = Surv(Time, Status) ~ factor(group) + factor(recur) + 
    cluster(subject), data = clean, n.knots = 5, kappa = 0.1)


  Shared Gamma Frailty model parameter estimates  
  using a Penalized Likelihood on the hazard function 

              coef exp(coef) SE coef (H) SE coef (HIH)
group2 -0.17426779  0.840072    0.199884      0.199884
recur2  0.29613417  1.344651    0.238287      0.238287
recur3  0.97873812  2.661096    0.267199      0.267199
recur4 -0.00141507  0.998586    0.308091      0.308091
                 z          p
group2 -0.87184346 0.38329000
recur2  1.24276395 0.21395000
recur3  3.66296166 0.00024932
recur4 -0.00459304 0.99634000

        chisq df global p
recur 14.7299  3  0.00206

    Frailty parameter, Theta: 5.66983e-10 (SE (H): 1.19313e-05 ) p = 0.49998 
 
      penalized marginal log-likelihood = -443.48
      Convergence criteria: 
      parameters = 0.000108 likelihood = 0.000318 gradient = 1.93e-08 

      LCV = the approximate likelihood cross-validation criterion
            in the semi parametrical case     = 2.39728 

      n= 190
      n events= 112  n groups= 85
      number of iterations:  8 

      Exact number of knots used:  5 
      Value of the smoothing parameter:  0.1, DoF:  6.00
>