Research interests
Distributed Machine Learning, Efficient Machine Learning,
Signal Processing, Artificial General Intelligence
To develop ground-breaking ML methods and cutting-edge AI applications
I was born in Chengdu, Sichuan, China in 1994. I received my B.S. with highest honor in Electrical Engineering and Computer Science from the Georgia Institute of Technology in 2016, my M.S. in Electrical Engineering from Harvard University in 2018, and my Ph.D. in Electrical Engineering from Duke University in 2023.
May 8, 2023 | One paper is accepted at UAI 2023. |
Jan 23, 2023 | One paper is accepted at AISTATS 2023. |
Sep 15, 2022 | Two papers are accepted at NeurIPS 2022. |
2023
-
Score-based Quickest Change Detection for Unnormalized Models
Suya Wu, Enmao Diao, Taposh Banerjee, and 2 more authors
In International Conference on Artificial Intelligence and Statistics (AISTATS), 2023
Classical change detection algorithms typically require modeling pre-change and post-change distributions. These calculations may not be feasible for various machine learning models because of the complexity of computing the partition functions and normalized distributions. Additionally, these methods may suffer from a lack of robustness to model mismatch and noise. In this paper, we develop a new variant of the classical Cumulative Sum (CUSUM) change detection algorithm, namely Score-based CUSUM (SCUSUM), based on Fisher divergence and the Hyvärinen score. Our method enables the application of quickest change detection to unnormalized distributions. We provide a theoretical analysis of the detection delay given the constraints on false alarms. We prove the asymptotic optimality of the proposed method in some particular cases. We also provide numerical experiments to demonstrate our method’s computation, performance, and robustness advantages.
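As a rough illustration of the recursion behind a score-based CUSUM procedure, the sketch below runs a CUSUM-style update whose increment is the difference of Hyvärinen scores under the pre- and post-change models. It assumes simple 1-D Gaussian models so the score has a closed form; the scaling constant `lam` and the threshold are illustrative placeholders, not the paper's calibration for the false-alarm constraint.

```python
# Minimal sketch of a score-based CUSUM recursion (illustrative only).
import numpy as np

def hyvarinen_score(x, mu, sigma2):
    """Hyvarinen score of N(mu, sigma2) at x: 0.5*(d/dx log p)^2 + d^2/dx^2 log p."""
    grad_log_p = -(x - mu) / sigma2
    lap_log_p = -1.0 / sigma2
    return 0.5 * grad_log_p ** 2 + lap_log_p

def scusum_like_path(xs, pre=(0.0, 1.0), post=(1.0, 1.0), lam=1.0):
    """Run the CUSUM-style recursion Z_n = max(0, Z_{n-1} + lam * z(x_n))."""
    z, path = 0.0, []
    for x in xs:
        inst = hyvarinen_score(x, *pre) - hyvarinen_score(x, *post)
        z = max(0.0, z + lam * inst)
        path.append(z)
    return np.array(path)

rng = np.random.default_rng(0)
xs = np.concatenate([rng.normal(0.0, 1.0, 200), rng.normal(1.0, 1.0, 200)])  # change at t=200
path = scusum_like_path(xs)
threshold = 20.0  # placeholder; the paper ties this to the false-alarm constraint
alarm = int(np.argmax(path > threshold)) if np.any(path > threshold) else -1
print("first alarm index:", alarm)
```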
-
Robust Quickest Change Detection for Unnormalized Models
Suya Wu, Enmao Diao, Jie Ding, and 2 more authors
In Uncertainty in Artificial Intelligence (UAI), 2023
Detecting an abrupt and persistent change in the underlying distribution of online data streams is an important problem in many applications. This paper proposes a new robust score-based algorithm called RSCUSUM, which can be applied to unnormalized models and addresses the issue of unknown post-change distributions. RSCUSUM replaces the Kullback-Leibler divergence with the Fisher divergence between pre- and post-change distributions for computational efficiency in unnormalized statistical models and introduces a notion of the “least favorable” distribution for robust change detection. The algorithm and its theoretical analysis are demonstrated through simulation studies.
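For reference, the two standard quantities these score-based detectors build on, stated in their usual form (the paper's precise assumptions and the "least favorable" distribution construction are not reproduced here), are the Hyvärinen score and the Fisher divergence:

$$
S_H(x, q) = \tfrac{1}{2}\,\big\lVert \nabla_x \log q(x) \big\rVert_2^2 + \Delta_x \log q(x),
\qquad
D_F(p \,\Vert\, q) = \mathbb{E}_{x \sim p}\!\left[ \tfrac{1}{2}\,\big\lVert \nabla_x \log p(x) - \nabla_x \log q(x) \big\rVert_2^2 \right].
$$

Both depend on $q$ only through $\nabla_x \log q(x)$, so the normalizing constant cancels, which is why unnormalized models suffice.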
2022
-
GAL: Gradient Assisted Learning for Decentralized Multi-Organization Collaborations
Enmao Diao, Jie Ding, and Vahid Tarokh
Advances in Neural Information Processing Systems (NeurIPS), 2022
Collaborations among multiple organizations, such as financial institutions, medical centers, and retail markets, in decentralized settings are crucial to providing improved service and performance. However, the underlying organizations may have little interest in sharing their local data, models, and objective functions. These requirements have created new challenges for multi-organization collaboration. In this work, we propose Gradient Assisted Learning (GAL), a new method for multiple organizations to assist each other in supervised learning tasks without sharing local data, models, and objective functions. In this framework, all participants collaboratively optimize the aggregate of local loss functions, and each participant autonomously builds its own model by iteratively fitting the gradients of the overarching objective function. We also provide asymptotic convergence analysis and practical case studies of GAL. Experimental studies demonstrate that GAL can achieve performance close to centralized learning when all data, models, and objective functions are fully disclosed.
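The collaboration pattern resembles gradient boosting across organizations: a coordinator broadcasts pseudo-residuals (negative gradients of the shared loss with respect to the current prediction), and each organization fits them on its own feature block without exposing its data. The toy sketch below illustrates that pattern only; the model class, learning rate, and averaging rule are assumptions, not the authors' implementation.

```python
# Toy sketch of gradient-assisted collaboration between two "organizations".
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n = 500
X_org1 = rng.normal(size=(n, 3))   # features held by organization 1
X_org2 = rng.normal(size=(n, 3))   # features held by organization 2
y = X_org1[:, 0] + 2.0 * X_org2[:, 1] + 0.1 * rng.normal(size=n)

prediction = np.zeros(n)
learning_rate = 0.3
for round_ in range(20):
    residual = y - prediction            # negative gradient of 0.5 * (y - f)^2
    round_update = np.zeros(n)
    for X_local in (X_org1, X_org2):     # each org fits the broadcast residual locally
        model = DecisionTreeRegressor(max_depth=3).fit(X_local, residual)
        round_update += model.predict(X_local)
    prediction += learning_rate * round_update / 2.0  # simple average of the assistance
print("final MSE:", float(np.mean((y - prediction) ** 2)))
```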
-
SemiFL: Semi-Supervised Federated Learning for Unlabeled Clients with Alternate Training
Enmao Diao, Jie Ding, and Vahid Tarokh
Advances in Neural Information Processing Systems (NeurIPS), 2022
Federated Learning allows the training of machine learning models by using the computation and private data resources of many distributed clients. Most existing results on Federated Learning (FL) assume the clients have ground-truth labels. However, in many practical scenarios, clients may be unable to label task-specific data due to a lack of expertise or resources. We propose SemiFL to address the problem of combining communication-efficient FL such as FedAvg with Semi-Supervised Learning (SSL). In SemiFL, clients have completely unlabeled data and can train multiple local epochs to reduce communication costs, while the server has a small amount of labeled data. We provide a theoretical understanding of the success of data augmentation-based SSL methods to illustrate the bottleneck of a vanilla combination of communication-efficient FL with SSL. To address this issue, we propose alternate training to ‘fine-tune the global model with labeled data’ and ‘generate pseudo-labels with the global model.’ We conduct extensive experiments and demonstrate that our approach significantly improves the performance of a labeled server with unlabeled clients training with multiple local epochs. Moreover, our method outperforms many existing SSFL baselines and performs competitively with the state-of-the-art FL and SSL results.
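One ingredient of this alternate-training loop is having the global model pseudo-label the clients' unlabeled data with a confidence threshold before local training. A minimal sketch of that step is below; the threshold value and the stand-in for model outputs are assumptions for illustration.

```python
# Confidence-thresholded pseudo-labeling (illustrative sketch).
import numpy as np

def pseudo_label(probs, threshold=0.95):
    """Keep only the unlabeled examples the global model is confident about."""
    confidence = probs.max(axis=1)
    labels = probs.argmax(axis=1)
    mask = confidence >= threshold
    return labels[mask], mask

rng = np.random.default_rng(0)
probs = rng.dirichlet(alpha=[1.0] * 10, size=1000)   # stand-in for global-model outputs
labels, mask = pseudo_label(probs)
print(f"kept {mask.sum()} / {len(mask)} unlabeled examples for local training")
```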
-
Pruning Deep Neural Networks from a Sparsity Perspective
Enmao Diao, Ganghua Wang, Jiawei Zhang, and 3 more authors
In The Eleventh International Conference on Learning Representations (ICLR), 2023
In recent years, deep network pruning has attracted significant attention in order to enable the rapid deployment of AI into small devices with computation and memory constraints. Pruning is often achieved by dropping redundant weights, neurons, or layers of a deep network while attempting to retain a comparable test performance. Many deep pruning algorithms have been proposed with impressive empirical success. However, existing approaches lack a quantifiable measure to estimate the compressibility of a sub-network during each pruning iteration and thus may under-prune or over-prune the model. In this work, we propose PQ Index (PQI) to measure the potential compressibility of deep neural networks and use this to develop a Sparsity-informed Adaptive Pruning (SAP) algorithm. Our extensive experiments corroborate the hypothesis that for a generic pruning procedure, PQI decreases first when a large model is being effectively regularized and then increases when its compressibility reaches a limit that appears to correspond to the beginning of underfitting. Subsequently, PQI decreases again when model collapse occurs and the model’s performance starts to deteriorate significantly. Additionally, our experiments demonstrate that the proposed adaptive pruning algorithm with a proper choice of hyper-parameters is superior to iterative pruning algorithms, such as lottery ticket-based pruning methods, in terms of both compression efficiency and robustness.
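The sketch below computes a norm-ratio sparsity measure in the spirit of the PQ Index: it compares the p-norm and q-norm of the flattened weights (0 < p < q), so more concentrated weight vectors yield a larger value. The exact definition, the default p and q, and how SAP turns the index into a per-iteration pruning ratio are given in the paper; the constants here are assumptions.

```python
# Norm-ratio sparsity measure (illustrative sketch, not the paper's exact definition).
import numpy as np

def pq_style_index(w, p=0.5, q=1.0):
    w = np.abs(np.ravel(w)) + 1e-12
    d = w.size
    ratio = np.sum(w ** p) ** (1.0 / p) / np.sum(w ** q) ** (1.0 / q)  # ||w||_p / ||w||_q
    return 1.0 - d ** (1.0 / q - 1.0 / p) * ratio

dense = np.random.default_rng(0).normal(size=1000)
sparse = dense * (np.abs(dense) > 1.5)                 # zero out most entries
print(pq_style_index(dense), pq_style_index(sparse))   # sparser -> larger index
```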
2020
-
DRASIC: Distributed Recurrent Autoencoder for Scalable Image Compression
Enmao Diao, Jie Ding, and Vahid Tarokh
In 2020 Data Compression Conference (DCC), 2020
We propose a new architecture for distributed image compression from a group of distributed data sources. The work is motivated by practical needs of data-driven codec design, low power consumption, robustness, and data privacy. The proposed architecture, which we refer to as Distributed Recurrent Autoencoder for Scalable Image Compression (DRASIC), is able to train distributed encoders and one joint decoder on correlated data sources. Its compression capability is much better than that of training codecs separately. Meanwhile, the performance of our distributed system with 10 distributed sources is only within 2 dB peak signal-to-noise ratio (PSNR) of the performance of a single codec trained with all data sources. We experiment with distributed sources of different correlations and show how well our data-driven methodology matches the Slepian-Wolf Theorem in Distributed Source Coding (DSC). To the best of our knowledge, this is the first data-driven DSC framework for general distributed code design with deep learning.
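A structural sketch of the "distributed encoders, one joint decoder" layout is shown below in PyTorch. The feed-forward blocks and layer sizes are placeholders for readability; the paper's actual architecture is a convolutional recurrent autoencoder with binarized codes.

```python
# Structural sketch: one encoder per source, a single shared decoder.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, dim_in=784, dim_code=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim_in, 128), nn.ReLU(), nn.Linear(128, dim_code))
    def forward(self, x):
        return self.net(x)

class Decoder(nn.Module):
    def __init__(self, dim_code=32, dim_out=784):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim_code, 128), nn.ReLU(), nn.Linear(128, dim_out))
    def forward(self, z):
        return self.net(z)

encoders = [Encoder() for _ in range(3)]   # one encoder per distributed source
decoder = Decoder()                        # single decoder shared by all sources

x = torch.randn(8, 784)                    # a batch from, say, source 0
recon = decoder(encoders[0](x))
loss = nn.functional.mse_loss(recon, x)
loss.backward()                            # gradients reach encoder 0 and the shared decoder
```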
-
Speech Emotion Recognition with Dual-Sequence LSTM Architecture
Jianyou Wang, Michael Xue, Ryan Culhane, and 3 more authors
In ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020
Speech Emotion Recognition (SER) has emerged as a critical component of next-generation human-machine interfacing technologies. In this work, we propose a new dual-level model that predicts emotions based on both MFCC features and mel-spectrograms produced from raw audio signals. Each utterance is preprocessed into MFCC features and two mel-spectrograms at different time-frequency resolutions. A standard LSTM processes the MFCC features, while a novel LSTM architecture, denoted as Dual-Sequence LSTM (DS-LSTM), processes the two mel-spectrograms simultaneously. The outputs are later averaged to produce a final classification of the utterance. Our proposed model achieves, on average, a weighted accuracy of 72.7% and an unweighted accuracy of 73.3%—a 6% improvement over current state-of-the-art unimodal models—and is comparable with multimodal models that leverage textual information as well as audio signals.
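The preprocessing step can be sketched with librosa: MFCC features for the standard LSTM branch and two mel-spectrograms at different time-frequency resolutions for the DS-LSTM branch. The specific n_fft, hop_length, and n_mels values below are illustrative assumptions, not the paper's settings.

```python
# Sketch of per-utterance preprocessing: MFCCs plus two mel-spectrogram resolutions.
import numpy as np
import librosa

sr = 16000
y = np.random.default_rng(0).normal(size=sr * 2).astype(np.float32)  # stand-in for a 2 s utterance

mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40)                    # standard LSTM branch
mel_fine = librosa.power_to_db(
    librosa.feature.melspectrogram(y=y, sr=sr, n_fft=512, hop_length=128, n_mels=64))
mel_coarse = librosa.power_to_db(
    librosa.feature.melspectrogram(y=y, sr=sr, n_fft=2048, hop_length=512, n_mels=128))

print(mfcc.shape, mel_fine.shape, mel_coarse.shape)  # the two spectrograms feed the DS-LSTM
```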
-
HeteroFL: Computation and Communication Efficient Federated Learning for Heterogeneous Clients
Enmao Diao, Jie Ding, and Vahid Tarokh
In International Conference on Learning Representations (ICLR), 2020
Federated Learning (FL) is a method of training machine learning models on private data distributed over a large number of possibly heterogeneous clients such as mobile phones and IoT devices. In this work, we propose a new federated learning framework named HeteroFL to address heterogeneous clients equipped with very different computation and communication capabilities. Our solution can enable the training of heterogeneous local models with varying computation complexities and still produce a single global inference model. For the first time, our method challenges the underlying assumption of existing work that local models have to share the same architecture as the global model. We demonstrate several strategies to enhance FL training and conduct extensive empirical evaluations, including five computation complexity levels of three model architectures on three datasets. We show that adaptively distributing subnetworks according to clients’ capabilities is both computation and communication efficient.
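The core mechanism can be sketched as slicing the leading hidden units of each layer, so that a weaker client trains a narrower sub-model whose parameters are then written back into the corresponding slice of the global model. The two-layer MLP shapes and the averaging rule below are simplified assumptions.

```python
# Sub-model extraction by width slicing (simplified sketch).
import numpy as np

def extract_submodel(global_params, ratio):
    """Slice the leading hidden units so the sub-model has `ratio` of the global width."""
    W1, W2 = global_params            # W1: (hidden, in), W2: (out, hidden)
    h = int(W1.shape[0] * ratio)
    return W1[:h, :].copy(), W2[:, :h].copy()

rng = np.random.default_rng(0)
global_params = (rng.normal(size=(64, 10)), rng.normal(size=(5, 64)))

small_W1, small_W2 = extract_submodel(global_params, ratio=0.25)  # weak client: 16 hidden units
print(small_W1.shape, small_W2.shape)

# After local training, the client's slice is written back (here, simply averaged)
# into the matching rows/columns of the global parameters:
global_params[0][:16, :] = 0.5 * global_params[0][:16, :] + 0.5 * small_W1
```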
2019
-
Restricted Recurrent Neural Networks
Enmao Diao, Jie Ding, and Vahid Tarokh
In 2019 IEEE International Conference on Big Data (Big Data), 2019
Recurrent Neural Networks (RNNs) and their variations, such as Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRUs), have become standard building blocks for learning online data of a sequential nature in many research areas, including natural language processing and speech data analysis. In this paper, we present a new methodology to significantly reduce the number of parameters in RNNs while maintaining performance that is comparable or even better than classical RNNs. The new proposal, referred to as Restricted Recurrent Neural Network (RRNN), restricts the weight matrices corresponding to the input data and hidden states at each time step to share a large proportion of parameters. The new architecture can be regarded as a compression of its classical counterpart, but it does not require pre-training or sophisticated parameter fine-tuning, both of which are major issues in most existing compression techniques. Experiments on natural language modeling show that compared with its classical counterpart, the restricted recurrent architecture generally produces comparable results at about 50% compression rate. In particular, the Restricted LSTM can outperform the classical RNN with even fewer parameters.
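One simple way to realize this kind of restriction, sketched below for a vanilla RNN cell, is to let the input-to-hidden and hidden-to-hidden matrices share a large common block of rows while keeping small free blocks. This is an illustration of the weight-sharing idea, not the paper's exact construction; the dimensions and sharing ratio are assumptions.

```python
# Vanilla RNN cell with most rows shared between W_x and W_h (illustrative sketch).
import numpy as np

rng = np.random.default_rng(0)
d = 64                      # input and hidden size (kept equal so rows can be shared)
shared_rows = 48            # 75% of rows shared between W_x and W_h

S = rng.normal(scale=0.1, size=(shared_rows, d))          # shared parameters
A = rng.normal(scale=0.1, size=(d - shared_rows, d))      # free rows for W_x
B = rng.normal(scale=0.1, size=(d - shared_rows, d))      # free rows for W_h
W_x, W_h = np.vstack([S, A]), np.vstack([S, B])
b = np.zeros(d)

def rnn_step(x, h):
    return np.tanh(W_x @ x + W_h @ h + b)

h = np.zeros(d)
for _ in range(10):                      # run the cell over a random sequence
    h = rnn_step(rng.normal(size=d), h)
print("parameters:", S.size + A.size + B.size + b.size, "vs unrestricted:", 2 * d * d + d)
```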