About
I’m an Assistant Professor in the Department of Statistics at the University of Manitoba.
I earned my Ph.D. at the University of Alberta (Canada), co-advised by Prof. Linglong Kong and Prof. Bei Jiang. I received my M.S. from the University of Minnesota (USA), advised by Prof. Yang Li and Prof. Haiyang Wang, and my B.S. from Beijing University of Posts and Telecommunications (China).
Research
My research sits at the intersection of modern machine learning, natural language processing, and statistics, with an emphasis on trustworthy, practical, and deployable AI.
Current themes
- LLMs & NLP: post-training (NeurIPS 2025 Workshop), agentic systems (AAMAS 2026 Oral), retrieval-augmented generation (RAG; AAAI 2026 Workshop, arXiv), reasoning, evaluation (e.g., LLM-as-judge), and domain applications (NeurIPS 2024).
- Algorithmic fairness & computational social science: measuring and mitigating social bias in language and AI systems (PNAS Nexus, NAACL 2024, NeurIPS 2022, AAAI 2022, Frontiers in Big Data, Cities).
- Statistics + AI for interdisciplinary problems: privacy (ICML 2023, NeurIPS 2023), health (BMC Medical Research Methodology, Substance Use & Misuse), medicine (Leukemia Research), anthropology (Archaeological and Anthropological Sciences), environmental science (Ecotoxicology and Environmental Safety), and related applied domains.
I’m always open to collaborating with curious and motivated students and researchers. If you’d like to work together, please reach out by email.
Publications
I don’t maintain a complete publication list here; please see my Google Scholar profile.
Selected work
- Probing Social Bias in Labor Market Text Generation by ChatGPT: A Masked Language Model Approach (NeurIPS 2024)
  - Experimental design for measuring bias in ChatGPT-generated job applications conditioned on real job ads.
  - Masked-language-model-based bias evaluation using validated social cue inventories.
- Language in Job Advertisements and the Reproduction of Labor Force Gender and Racial Segregation (PNAS Nexus 2024)
  - Gender/EDI language inventory for job ads; analysis of 28.6M UK job ads (2018–2023) linked to labor force statistics.
  - Evidence on how job-ad language can perpetuate or mitigate labor market segregation.
- From Physical Space to Cyberspace: Recessive Gender Biases in Social Media Mirror the Real World (Cities)
  - Framework to quantify subtle (“recessive”) gender bias in social media with temporal and spatial analysis (Hong Kong case study).
- Debiasing with Sufficient Projection: A General Theoretical Framework for Vector Representations (NAACL 2024)
  - General framework for debiasing vector representations via projection onto an unbiased subspace.
- Gaussian Differential Privacy on Riemannian Manifolds (NeurIPS 2023)
  - Extends Gaussian Differential Privacy (GDP) to general Riemannian manifolds.
- Local Differential Privacy for Population Quantile Estimation (ICML 2023)
  - Methods for population quantile estimation under local differential privacy.
- Quantile Fairness Regression with Conformalized Prediction Intervals (NeurIPS 2022)
  - Fairness-aware quantile regression with conformal prediction intervals for uncertainty quantification.
- Reducing Gender Bias in GloVe Word Embeddings Using Causal Inference (AAAI 2022)
  - Causal approach to reducing gender bias while preserving semantic utility.
Projects & Industry Collaboration
- AI-driven Combat Against Bias in Job Recruitment (Canada–UK AI Initiative)
  - International collaboration on measuring and mitigating gender/ethnic bias in recruitment and job-market text.
- Synthetic Health Data Generation with Deep Learning (Replica Analytics)
  - Industry internship on generating longitudinal synthetic structured health data; paper in BMC Medical Research Methodology (2023).
- Predicting Carbon Percentage in Alberta Soil (ABMI)
  - Interdisciplinary modeling for soil carbon prediction; feature screening with bootstrap group lasso.
- Reducing Selection Bias in Counterfactual Reasoning (NeurIPS 2019 Workshop)
  - Synthetic data mechanism for studying selection bias in counterfactual settings.
- Cloud User Interest Analysis in P2P Systems (IEEE INFOCOM 2020)
  - Measurement and modeling of peer interests; large-scale scraping and clustering-based analysis.
Invited Talks
- Ding, Lei (Author & Presenter). Health Data with Artificial Intelligence. Manitoba CLL Research Meeting, Winnipeg, MB, Canada. Oct 2025
- Ding, Lei (Author & Presenter). Probing Social Bias in Labor Market Text Generation by ChatGPT: A Masked Language Model Approach. DSA Seminar, The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, China. Jul 2025
- Ding, Lei (Author & Presenter). Social Bias Evaluation in Text and Large Language Models. Seminar, Concordia University, Canada. Feb 2025
- Ding, Lei (Author & Presenter). Social Bias Evaluation in Text and Large Language Models. Seminar, University of Manitoba, Canada. Dec 2024
- Ding, Lei (Author & Presenter). Word Embeddings via Causal Inference: Reducing Gender Bias While Preserving Semantic Information. SIAM Spring 2023, University of Texas at Arlington, Arlington, TX, USA. May 2023
Media
- University of Alberta Folio
- University of Alberta Gateway
