About Me

I’m an Assistant Professor in the Department of Statistics at the University of Manitoba.

I earned my Ph.D. at the University of Alberta (Canada), co-advised by Prof. Linglong Kong and Prof. Bei Jiang. I obtained my M.S. degree at the University of Minnesota (USA), working with Prof. Yang Li and Prof. Haiyang Wang, and my B.S. degree at Beijing University of Posts and Telecommunications (China).

My research is driven by a passion for natural language processing, machine learning, and deep learning, particularly their applications and implications for society. I work at the intersection of statistical machine learning and NLP, aiming to develop methods that not only advance the field but also address socially impactful challenges.

I am always open to collaborations with curious and motivated individuals from all backgrounds. If you’re interested in working together or learning more about my research, feel free to reach out via email—I’d love to connect.

Academic Research and Conference Papers:

We propose a novel experimental design to examine social biases within ChatGPT-generated job applications in response to real job advertisements.
By simulating the process of job application creation, we examine the language patterns and biases that emerge when the model is prompted with diverse job postings.
We also present a novel bias evaluation framework based on Masked Language Models to quantitatively assess social bias using validated inventories of social cues/words.

Language in job advertisements and the reproduction of labor force gender and racial segregation (PNAS Nexus 2024)

We develop a gender and EDI language inventory for job advertisements to examine the reciprocal influence between the language used in job ads and the gender/racial composition of the labor force. Using 28.6 million job ads from the United Kingdom (2018-2023) together with labor force statistics, we identify mechanisms through which this interaction either perpetuates or mitigates gender/racial segregation in the labor force. The findings highlight the potential impacts and challenges of modifying job ad language to foster labor market equity.

We propose a new framework to analyze and quantify recessive gender biases in social media, using Hong Kong’s Twitter data as a case study.
Our framework demonstrates the temporal trends and spatial distribution of these biases, revealing that the gender biases in Hong Kong’s virtual spaces closely mirror those in the physical world, highlighting the persistent influence of gender stereotypes across both realms.

Debiasing with Sufficient Projection: A General Theoretical Framework for Vector Representations (NAACL 2024)

We propose a novel framework to reduce bias by transforming vector representations to an unbiased subspace using sufficient projection.

Gaussian differential privacy on Riemannian manifolds (NeurIPS 2023)

We develop an advanced approach for extending Gaussian Differential Privacy (GDP) to general Riemannian manifolds.

Local Differential Privacy for Population Quantile Estimation (ICML 2023)

Developed a novel approach for estimating population quantiles with Local Differential Privacy.

Quantile Fairness Regression with Conformalized Prediction Intervals (NeurIPS 2022)

Presented a pioneering study on quantile fairness regression at NeurIPS 2022.
Proposed a novel conformalized prediction interval to assess fairness algorithm uncertainty and provide fair prediction intervals.

Reducing Gender Bias in GloVe Word Embeddings Using Causal Inference (AAAI 2022)

We propose a method for reducing gender bias in GloVe word embeddings while retaining semantic information.

AI-driven Combat Against Bias in Job Recruitment (Canada-UK AI Initiative)

Led an international project to identify and mitigate gender and ethnic bias in the job market using AI.
Managed a diverse team of postdocs and Ph.D. students from Canada and the UK.
Collaborated with sociology teams on interdisciplinary work.
Published outcomes in reputable journals and conferences.

Synthetic Health Data Generation with Deep Learning (Replica Analytics)

Internship at Replica Analytics, working on synthesizing structured health data.
Developed state-of-the-art deep learning models to generate longitudinal synthetic data.
Contributed to critical modeling discussions and utility evaluation methods.
Research paper accepted at BMC Medical Research Methodology 2023.

Predicting Carbon Percentage in Alberta Soil (Alberta Biodiversity Monitoring Institute)

Collaborated on an interdisciplinary project for predicting soil carbon percentage.
Developed a novel variable screening method using bootstrapped group lasso for analysis.
Submitted research paper to Science of the Total Environment in 2021.

Reducing Selection Bias in Counterfactual Reasoning (NeurIPS 2019 Workshop)

Contributed to a research project addressing selection bias in counterfactual reasoning.
Proposed a novel synthetic data-generating mechanism.
Paper presented at NeurIPS 2019 Workshop on “Do the right thing.”

Cloud User Interest Analysis in P2P Systems (IEEE INFOCOM 2020)

Conducted measurements and analysis of cloud peers’ interest in P2P systems.
Built web scrapers and performed data analysis and visualization.
Utilized machine learning (clustering algorithm) to gain insights.

Invited Talk:

Ding, L. (Author & Presenter), Society for Industrial and Applied Mathematics (SIAM) Spring 2023, “Word embeddings via causal inference: Gender bias reducing and semantic information preserving,” the University of Texas at Arlington, Arlington, TX (May 2nd, 2023).

Interviews:

University of Alberta Folio

University of Alberta Gateway

Internships and Personal Projects:

1. Augmented Decision-making for Innovation Management (Human Resources Company)

Led a project that utilized various language modeling methods to cluster ideas and comments efficiently.
Implemented NLP models including TF-IDF, averaging word vectors, doc2vec, and RNN with BERT embeddings.
Successfully improved innovation management and resource allocation.

2. Medical Insurance Cost Prediction Using Recurrent Neural Networks

Internship at Guangzhou Huazi Software Technology Co. in 2018.
Built RNN models to predict medical insurance costs for government authorities.
Worked on NLP word correction for OCR with Kneser-Ney smoothing and noisy channel model.

3. Named Entity Recognition with RNN and TensorFlow (2018)

Developed a bi-directional LSTM model with character embeddings for Named Entity Recognition.
Utilized GloVe word embeddings and a conditional random field for accurate predictions.

4. Neural-based Dependency Parser Implementation (2017)

Implemented a neural-based dependency parser using fully connected dense layers.
Incorporated word embeddings and POS tag features for improved accuracy.

5. Cooperation of Clustering and Differential Evolution Algorithm (2015)

Explored the combination of different clustering algorithms with Differential Evolution.
Improved searching ability and convergence speed on challenging optimization problems.

6. 3D Human Face Modeling at SAMSUNG Advanced Institute of Technology (2015)

Internship focused on 3D human face modeling.
Developed MATLAB and C++ programs for face point cloud data processing and ECG signal analysis.

Lei Ding (丁雷)