My research is supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) through the NSERC Discovery Grants program and the NSERC Discovery Launch Supplements program.

Publications and Preprints

In the list below, an asterisk (*) indicates alphabetical author ordering.

  1. Anna Neufeld, Lucy L. Gao, Joshua Popp, Alexis Battle, and Daniela Witten (2022+). Inference after latent variable estimation for single-cell RNA-sequencing data. [pdf] [package website] [package on GitHub] [code]
  2. Anna Neufeld, Lucy L. Gao, and Daniela Witten (2022+). Tree-values: selective inference for regression trees. [pdf] [package website] [package on GitHub] [code]
  3. Lucy L. Gao, Jacob Bien, and Daniela Witten (2022+). Selective inference for hierarchical clustering. To appear in Journal of the American Statistical Association. [pdf] [package website] [package on GitHub] [code]
  4. Lucy L. Gao, Daniela Witten, and Jacob Bien (2022+). Testing for association in multi-view network data. To appear in Biometrics. [pdf] [package on CRAN] [code]
    [Received a 2020 ASA Statistical Learning and Data Science Section Student Paper Award.]
  5. Lucy L. Gao*, Jane J. Ye*, Haian Yin*, Shangzhi Zeng*, and Jin Zhang* (2022). Value function based difference-of-convex algorithm for bilevel hyperparameter selection problems. International Conference on Machine Learning (ICML) 2022. [pdf]
  6. Pengqi Liu, Lucy L. Gao, and Julie Zhou (2022). R-optimal designs for multi-response regression models with multi-factors. Communications in Statistics - Theory and Methods, 51(2): 340-355. [pdf]
  7. Lucy L. Gao, Jacob Bien, and Daniela Witten (2020). Are clusterings of multiple data views independent? Biostatistics, 21(4): 692-708. [pdf] [package on CRAN] [code]
    [Received a 2019 ASA Biometrics Section Student Travel Award.]
  8. Lucy L. Gao* and Julie Zhou* (2020). Minimax D-optimal designs for multivariate regression models with multi-factors. Journal of Statistical Planning and Inference. [pdf]
  9. Evelyn Hsu, Michele Shaffer, Lucy L. Gao, Christopher Sonnenday, Michael Volk, John Bucuvalas, and Jennifer Lai (2017). Analysis of liver offers to pediatric candidates on the transplant wait list. Gastroenterology, 153(4): 988-995.
  10. Lucy L. Gao and Julie Zhou (2017). D-optimal designs based on the second-order least squares estimator. Statistical Papers, 58(2): 77-94.
  11. Lucy L. Gao and Julie Zhou (2014). New optimal design criteria for regression models with asymmetric errors. Journal of Statistical Planning and Inference, 149: 140-151.

Software

  1. clusterpval is an R package that computes valid p-values for a difference in means between estimated clusters in a data set. [website] [paper] [GitHub]
  2. multiviewtest is an R package for testing whether, and characterizing how, clusters defined with respect to different data views are associated. [paper 1] [paper 2] [CRAN]