Zhiqiang Tan's Research
I'm looking for a postdoc working on causal inference and statistical learning. See here (new link as of Sept 2024) for more information and to submit an application. Please feel free to contact me if you are interested or have any questions.
Software
R package RCALsa: Regularized Calibrated Estimation for Sensitivity Analysis.
        Description: Model-assisted sensitivity analysis for population means and average treatment effects (ATEs) using regularized calibrated estimation under marginal sensitivity models.
        References: Tan (2022).
        Download: binary package (zip file). (Use "Install packages from local files")
        Manual: here.
R package HAMS: Hamiltonian Assisted Metropolis Sampling and Other MCMC Algorithms.
        Description: This package implements Hamiltonian assisted Metropolis sampling as well as various existing MCMC algorithms studied in Song & Tan (2022).
        References: Song & Tan (2022a), Song & Tan (2022b).
        Download: binary package (zip file). (Use "Install packages from local files")
        Manual: here.
        Vignette: here.
R package mCAL: Regularized Calibrated Estimation with Multi-valued Treatments.
        Description: Regularized calibrated estimation for causal inference and missing-data problems with multi-valued treatments and high-dimensional data.
        References: Xu & Tan (2022).
        Download: binary package (zip file). (Use "Install packages from local files")
        Manual: here.
        Vignette: here.
R package dSurvival: Discrete-time Survival Analysis.
        Description: Breslow-Peto and weighted Mantel-Haenszel estimation in hazard probability and odds models with discrete-time survival data.
        References: Tan (2019), Tan (2020).
        Download: binary package (zip file). (Use "Install packages from local files")
        Manual: here.
R package RCAL: Regularized Calibrated Estimation.
        Description: Regularized calibrated estimation for causal inference and missing-data problems with high-dimensional data.
        References: Tan (2020, Biometrika), Tan (2020, Annals of Statistics), Sun & Tan (2022).
        Vignette: here.
R package iWeigReg: Improved methods for causal inference and missing data problems.
        Description: Improved methods based on inverse probability weighting and outcome regression for causal inference and missing data problems.
        References: Tan (2006), Tan (2010), Tan (2013).
        Vignette: here.
R package UWHAM: Unbinned weighted histogram analysis method.
        Description: A method for estimating log-normalizing constants (or free energies) and expectations from multiple distributions (such as multiple generalized ensembles).
        References: Tan et al. (2012).
Working or archived papers
Liang, L. and Tan, Z. (2024) On ridge estimation in high-dimensional rotationally sparse linear regression, arXiv:2405.00974.
Fu, P. and Tan, Z. (2023) Understanding accelerated gradient methods: Lyapunov analyses and Hamiltonian assisted interpretations, arXiv:2304.10063.
Song, Z. and Tan, Z. (2022) Imputation maximization stochastic approximation with application to generalized linear mixed models, arXiv:2201.10096.
Shu, H. and Tan, Z. (2018) Improved estimation of average treatment effects on the treated: Local efficiency, double robustness, and beyond, arXiv:1808.01408. (based on Heng Shu's 2015 thesis)
Journal articles (including refereed conference papers)
Probability & Statistics (see below for Interdisciplinary fields)
Fu, P. and Tan, Z. (2024) Block-wise primal-dual algorithms for large-scale doubly penalized ANOVA modeling, Computational Statistics and Data Analysis, 194, 107932. (Supplement) (arXiv preprint:2210.10991)
Tan, Z. (2024) Sensitivity models and bounds under sequential unmeasured confounding in longitudinal studies, Biometrika, to appear. (Supplement) (arXiv preprint:2308.15725)
Tan, Z. (2024) Model-assisted sensitivity analysis for treatment effects under unmeasured confounding via regularized calibrated estimation, Journal of the Royal Statistical Society, Ser. B, to appear. (Supplement) (arXiv preprint:2209.11383)
Wu, P., Tan, Z., Hu, W., and Zhou, X.-H. (2024) Model-assisted inference for covariate-specific treatment effects with high-dimensional data, Statistica Sinica, 34, 459-479.
Song, Z. and Tan, Z. (2023) Hamiltonian assisted Metropolis sampling, Journal of the American Statistical Association, 118, 1176-1194. (Supplement) (arXiv preprint:2005.08159) (A companion paper Song & Tan 2022, SIAM Journal on Scientific Computing)
Tan, Z. (2023) Consistent and robust inference in hazard probability and odds models with discrete-time survival data, Lifetime Data Analysis, 29, 555-584. (Supplement) (arXiv preprint:2012.03451)
Zhang, X., Tan, Z., and Ou, Z. (2023) Persistently trained, diffusion-assisted energy-based models, Stat, 12, e625. (Supplement) (arXiv preprint:2304.10707)
Ghosh, S. and Tan, Z. (2022) Doubly robust semiparametric inference using regularized calibrated estimation with high-dimensional data, Bernoulli, 28, 1675-1703. (Supplement) (arXiv preprint:2009.12033)
Sun, B. and Tan, Z. (2022) High-dimensional model-assisted inference for local average treatment effects with instrumental variables, Journal of Business and Economic Statistics, 40, 1732-1744. (Supplement) (arXiv preprint:2009.09286)
Tan, Z. (2022) Analysis of odds, probability, and hazard ratios: From 2 by 2 tables to two-sample survival data, Journal of Statistical Planning and Inference, 221, 248-265. (Supplement) (arXiv preprint:1911.10682)
Yang, T. and Tan, Z. (2021) Hierarchical total variations and doubly penalized ANOVA modeling for multivariate nonparametric regression, Journal of Computational and Graphical Statistics, 30, 848-862. (Supplement) (arXiv preprint:1906.06729)
Shu, H. and Tan, Z. (2020) Improved methods for moment restriction models with data combination and an application to two-sample instrumental variable estimation, Canadian Journal of Statistics, 48, 259-284. (Supplement) (arXiv preprint:1808.03786) (based on Heng Shu's 2015 thesis)
Tan, Z. (2020) Model-assisted inference for treatment effects using regularized calibrated estimation with high-dimensional data, Annals of Statistics, 48, 811-837. (Supplement) (arXiv preprint:1801.09817)
Tan, Z. (2020) Regularized calibrated estimation of propensity scores with model misspecification and high-dimensional data, Biometrika, 107, 137-158. (Supplement) (arXiv preprint:1710.08074)
Zhang, X. and Tan, Z. (2020) Semi-supervised logistic learning based on exponential tilt mixture models, Stat, 9, e312. (Supplement) (arXiv preprint:1906.07882)
Tan, Z. (2019) On doubly robust estimation for logistic partially linear models, Statistics and Probability Letters, 155, 108577. (arXiv preprint:1901.09138)
Tan, Z., Song, Y., and Ou, Z. (2019) Calibrated adversarial algorithms for generative modeling, Stat, 8, e224. (Supplement)
Tan, Z. and Zhang, C.-H. (2019) Doubly penalized estimation in additive regression with high-dimensional data, Annals of Statistics, 47, 2567-2600. (Supplement) (arXiv preprint:1704.07229)
Yang, T. and Tan, Z. (2018) Backfitting algorithms for total-variation and empirical-norm penalized additive modeling with high-dimensional data, Stat, 7, e198. (Supplement)
Small, D., Tan, Z., Ramsahai, R.R., Lorch, S.A., and Brookhart, A. (2017) Instrumental variable estimation with a stochastic monotonicity assumption, Statistical Science, 32, 561-579.
Tan, Z. (2017) Optimally adjusted mixture sampling and locally weighted histogram analysis, Journal of Computational and Graphical Statistics, 26, 54-65. (Supplement) (A companion paper Tan et al. 2016, JCP)
Li, W., Chen, R., and Tan, Z. (2016) Efficient sequential Monte Carlo with multiple proposals and control variates, Journal of the American Statistical Association, 111, 298-313.
Tan, Z. (2016) Steinized empirical Bayes estimation for heteroscedastic data, Statistica Sinica, 26, 1219-1248. (Supplement)
Tan, Z. (2015) Resampling Markov chain Monte Carlo algorithms: Basic analysis and empirical comparisons, Journal of Computational and Graphical Statistics, 24, 328-356. (Supplement)
Tan, Z. (2015) Improved minimax estimation of a multivariate normal mean under heteroscedasticity, Bernoulli, 21, 574-603. (Supplement)
Tan, Z. and Wu, C. (2015) Generalized pseudo empirical likelihood inferences for complex surveys, Canadian Journal of Statistics, 43, 1-17.
Tan, Z. (2014) Second-order asymptotic theory for calibration estimators in sampling and missing-data problems, Journal of Multivariate Analysis, 131, 240-253. (Supplement)
Wang, C., Tan, Z., and Louis, T.A. (2014) An exponential tilt mixture model for time-to-event data to evaluate treatment effect heterogeneity in randomized clinical trials, Biometrics & Biostatistics International Journal, 1, 00006.
Wang, C., Tan, Z., and Louis, T.A. (2014) An exponential tilt model for quantitative trait loci mapping with time-to-event data, Journal of Bioinformatics Research Studies, 1, 2.
Li, W., Tan, Z., and Chen, R. (2013) Two-stage importance sampling with mixture proposals, Journal of the American Statistical Association, 108, 1350-1365.
Tan, Z. (2013) Simple design-efficient calibration estimators for rejective and high-entropy sampling, Biometrika, 100, 399-415. (Supplement) (Correction)
Tan, Z. (2013) Calibrated path sampling and stepwise bridge sampling, Journal of Statistical Planning and Inference, 143, 675-690.
Tan, Z. (2013) A cluster-sample approach for Monte Carlo integration using multiple samplers, Canadian Journal of Statistics, 41, 151-173. (Supplement)
Okui, R., Small, D., Tan, Z., and Robins, J.M. (2012) Doubly robust instrumental variables regression, Statistica Sinica, 22, 173-205.
VanderWeele, T.J. and Tan, Z. (2012) Directed acyclic graphs with edge-specific bounds, Biometrika, 99, 115-126.
Tan, Z. (2011) Efficient restricted estimators for conditional mean models with missing data, Biometrika, 98, 663-684.
Wang, C., Tan, Z., and Louis, T.A. (2011) Exponential tilt models for two-group comparison with censored data, Journal of Statistical Planning and Inference, 141, 1102-1117.
Tan, Z. (2010) On estimation of conditional density models with two-phase sampling, Journal of Statistical Planning and Inference, 140, 1986-2002.
Tan, Z. (2010) Marginal and nested structural models using instrumental variables, Journal of the American Statistical Association, 105, 157-169.
Tan, Z. (2010) Bounded, efficient, and doubly robust estimation with inverse weighting, Biometrika, 97, 661-682.
Tan, Z. (2010) Nonparametric likelihood and doubly robust estimating equations for marginal and nested structural models, Canadian Journal of Statistics, 38, 609-632. (Supplement)
Cheng, J., Small, D., Tan, Z., and TenHave, T.R. (2009) Efficient nonparametric estimation of causal effects in randomized trials with noncompliance, Biometrika, 96, 19-36.
Tan, Z. (2009) On profile likelihood for exponential tilt mixture models, Biometrika, 96, 229-236.
Wang, W., Scharfstein, D.O., Tan, Z., and MacKenzie, E.J. (2009) Causal inference in outcome-dependent two-phase sampling designs, Journal of the Royal Statistical Society, Ser. B, 71, 947-969.
Chi, Z. and Tan, Z. (2008) Positive false discovery proportions: Intrinsic bounds and adaptive control, Statistica Sinica, 18, 837-860. (Supplement)
Tan, Z. (2008) Monte Carlo integration with Markov chain, Journal of Statistical Planning and Inference, 138, 1967-1980.
Tan, Z. (2006) Regression and weighting methods for causal inference using instrumental variables, Journal of the American Statistical Association, 101, 1607-1618. (Correction)
Tan, Z. (2006) A distributional approach for causal inference using propensity scores, Journal of the American Statistical Association, 101, 1619-1637.
Tan, Z. (2006) Monte Carlo integration with acceptance-rejection, Journal of Computational and Graphical Statistics, 15, 735-752.
Tan, Z. (2004) On a likelihood approach for Monte Carlo integration, Journal of the American Statistical Association, 99, 1027-1036.
Kong, A., McCullagh, P., Meng, X.-L., Nicolae, D., and Tan, Z. (2003) A theory of statistical models for Monte Carlo integration (with discussion), Journal of the Royal Statistical Society, Ser. B, 65, 585-618.
Interdisciplinary fields (see above for Probability & Statistics)
Xu, W. and Tan, Z. (2024) High-dimensional model-assisted inference for treatment effects with multi-valued treatments, Journal of Econometrics, to appear. (Supplement) (arXiv preprint:2201.09192)
Wang, Z. and Tan, Z. (2023) Tractable and near-optimal adversarial algorithms for robust estimation in contaminated Gaussian models, Journal of Machine Learning Research, 24, 1-112. (arXiv preprint:2112.12919)
Song, Z. and Tan, Z. (2022) On irreversible Metropolis sampling related to Langevin dynamics, SIAM Journal on Scientific Computing, 44, A2089-A2120. (Supplement) (arXiv preprint:2106.03012) (A companion paper Song & Tan 2022, JASA)
Tan, Z. and Zhang, X. (2022) On loss functions and regret bounds for multi-category classification, IEEE Transactions on Information Theory, 68, 5295-5313. (Supplement) (arXiv preprint:2005.08155)
Cui, D., Zhang, B.W., Tan, Z., and Levy, R.M. (2020) Ligand binding thermodynamic cycles: Hysteresis, the locally weighted histogram analysis method, and the overlapping states matrix, Journal of Chemical Theory and Computation, 16, 67-79. (Supplement)
Gerhard, T., Stroup, T.S., Correll, C.U., Setoguchi, S., Strom, B.L., Huang, C., Tan, Z., Crystal, S., and Olfson, M. (2020) Mortality risk of antipsychotic augmentation for adult depression, PLoS One, 15, e0239206.
Stroup, T.S., Gerhard, T., Crystal, S., Huang, C., Tan, Z., Wall, M.M., Mathai, C., and Olfson, M. (2019) Comparative effectiveness of adjunctive psychotropic medications in patients with schizophrenia, JAMA Psychiatry, 76, 508-515. (Supplement)
Gerhard, T., Stroup, T.S., Correll, C.U., Huang, C., Tan, Z., Crystal, S., and Olfson, M. (2018) Antipsychotic medication treatment patterns in adult depression, Journal of Clinical Psychiatry, 79, 16m10971.
Stroup, T.S., Gerhard, T., Crystal, S., Huang, C., Tan, Z., Wall, M.M., Mathai, C., and Olfson, M. (2018) Psychotropic medication use in adults with schizophrenia and schizoaffective disorder in the United States, Psychiatric Services, 69, 605-608.
Wang, B., Ou, Z., and Tan, Z. (2018) Learning trans-dimensional random fields with applications to language modeling, IEEE Transactions on Pattern Analysis and Machine Intelligence, 40, 876-890. (Supplement)
Zhang, B.W., Deng, N., Tan, Z., and Levy, R.M. (2017) Stratified UWHAM and its stochastic approximation for multicanonical simulations which are far from equilibrium, Journal of Chemical Theory and Computation, 13, 4660-4674.
Tan, Z., Xia, J., Zhang, B.W., and Levy, R.M. (2016) Locally weighted histogram analysis and stochastic solution for large-scale multistate free energy estimation, Journal of Chemical Physics, 144, 034107. (A companion paper Tan 2017, JCGS)
Zhang, B.W., Gallicchio, E., Dai, W., He, P., Tan, Z., and Levy, R.M. (2016) Simulating replica exchange: Markov state models, proposal schemes, and the infinite swapping limit, Journal of Physical Chemistry B, 120, 8289-8301.
Flynn, W.F., Chang, M.W., Tan, Z., Oliveira, G., Yuan, J., Okulicz, J.F., Torbett, B.E., and Levy, R.M. (2015) Deep sequencing of protease inhibitor resistant HIV patient isolates reveals patterns of correlated mutations in Gag and Protease, PLoS Computational Biology, 11, e1004249.
Wang, B., Ou, Z., and Tan, Z. (2015) Trans-dimensional random fields for language modeling, Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics (ACL), 785-794. (See Zhijian Ou's webpage for code.)
Xia, J., Flynn, W.F., Gallicchio, E., Zhang, B.W., He, P., Tan, Z., and Levy, R.M. (2015) Large-scale asynchronous and distributed multidimensional replica exchange molecular simulations and efficiency analysis, Journal of Computational Chemistry, 36, 1772-1785.
Zhang, B.W., Xia, J., Tan, Z., and Levy, R.M. (2015) A stochastic solution to the unbinned WHAM equations, Journal of Physical Chemistry Letters, 6, 3834-3840.
Tan, Z., Gallicchio, E., Lapelosa, M., and Levy, R.M. (2012) Theory of binless multi-state free energy estimation with applications to protein-ligand binding, Journal of Chemical Physics, 136, 144102.
Pluzhnikov, A., Nolan, D.K., Tan, Z., McPeek, M.S., and Ober, C. (2007) Correlation of intergenerational family sizes suggests a genetic component to reproductive fitness, American Journal of Human Genetics, 81, 165-169.
Discussions
Tan, Z. (2023) On DID, IV, and combination, Discussion on "Instrumented difference-in-differences" by Ye, Ertefaie, Flory, Hennessy, and Small, Biometrics, 79, 587-591.
Tan, Z. (2008) Improved local efficiency and double robustness, Comment on "Empirical efficiency maximization: Improved locally efficient covariate adjustment in randomized experiments and survival analysis" by Rubin and van der Laan, International Journal of Biostatistics, 4, Article 10.
Tan, Z. (2007) Understanding OR, PS, and DR, Comment on "Demystifying double robustness: A comparison of alternative strategies for estimating a population mean from incomplete data" by Kang and Schafer, Statistical Science, 22, 560-568.
Unpublished manuscripts
Tan, Z. (2013) Variance estimation under misspecified models. Used for variance estimation in the R package iWeigReg.
Tan, Z., Gerhard, T., and Crystal, S. (2011) Exploring new statistical methods for causal inference in longitudinal studies, DEcIDE report, the AHRQ Effective Health Care Program.
Small, D. and Tan, Z. (2007) A stochastic monotonicity assumption for the instrumental variable method. Superseded by Small, Tan, Ramsahai, Lorch, and Brookhart (2017), Statistical Science.