publications | Enmao Diao

2025

ICLR

MAP: Multi-Human-Value Alignment Palette

Xinran Wang, Qi Le, Ammar Ahmed, and 5 more authors

In International Conference on Learning Representations, 2025

Oral Abs PDF

Oral

Ensuring that generative AI systems align with human values is essential but challenging, especially when considering multiple human values and their potential trade-offs. Since human values can be personalized and dynamically change over time, the desirable levels of value alignment vary across different ethnic groups, industry sectors, and user cohorts. Within existing frameworks, it is hard to define human values and align AI systems accordingly across different directions simultaneously, such as harmlessness, helpfulness, and positiveness. To address this, we develop a novel, first-principle approach called Multi-Human-Value Alignment Palette (MAP), which navigates the alignment across multiple human values in a structured and reliable way. MAP formulates the alignment problem as an optimization task with user-defined constraints, which define human value targets. It can be efficiently solved via a primal-dual approach, which determines whether a user-defined alignment target is achievable and how to achieve it. We conduct a detailed theoretical analysis of MAP by quantifying the trade-offs between values, the sensitivity to constraints, the fundamental connection between multi-value alignment and sequential alignment, and proving that linear weighted rewards are sufficient for multi-value alignment. Extensive experiments demonstrate MAP’s ability to align multiple values in a principled manner while delivering strong empirical performance across various tasks.
ICLR

Probe Pruning: Accelerating LLMs through Dynamic Pruning via Model-Probing

Qi Le, Enmao Diao, Ziyan Wang, and 4 more authors

In International Conference on Learning Representations, 2025

Abs PDF

We introduce Probe Pruning (PP), a novel framework for online, dynamic, structured pruning of Large Language Models (LLMs) applied in a batch-wise manner. PP leverages the insight that not all samples and tokens contribute equally to the model’s output, and probing a small portion of each batch effectively identifies crucial weights, enabling tailored dynamic pruning for different batches. It comprises three main stages: probing, history-informed pruning, and full inference. In the probing stage, PP selects a small yet crucial set of hidden states, based on residual importance, to run a few model layers ahead. During the history-informed pruning stage, PP strategically integrates the probing states with historical states. Subsequently, it structurally prunes weights based on the integrated states and the PP importance score, a metric developed specifically to assess the importance of each weight channel in maintaining performance. In the final stage, full inference is conducted on the remaining weights. A major advantage of PP is its compatibility with existing models, as it operates without requiring additional neural network modules or fine-tuning. Comprehensive evaluations of PP on LLaMA-2/3 and OPT models reveal that even minimal probing-using just 1.5% of FLOPs-can substantially enhance the efficiency of structured pruning of LLMs. For instance, when evaluated on LLaMA-2-7B with WikiText2, PP achieves a 2.56 times lower ratio of performance degradation per unit of runtime reduction compared to the state-of-the-art method at a 40% pruning ratio.
NAACL

AID: Adaptive Integration of Detectors for Safe AI with Language Models

Xinran Wang, Enmao Diao, Qi Le, and 2 more authors

In Nations of the Americas Chapter of the ACL, 2025

Abs PDF

As Large Language Models (LLMs) increasingly influence content generation across diverse platforms, there is a heightened urgency to regulate their outputs to ensure safe usage. However, defining “safety” is complex, given that entities across domains may interpret it through varied lenses and develop safety detectors—models trained to identify specific unsafe content based on predefined criteria. To address this complexity, we introduce the approach of Adaptive Integration of Detectors (AID) to orchestrate the strengths of multiple pretrained detectors to ensure comprehensive effectiveness in diverse scenarios. AID employs a Mixture-of-Experts (MoE) framework, wherein it dynamically assigns and learns data-adaptive weights for each detector using domain-specific annotated data and LLM-extracted features. We provide theoretical insights into why MoE can be effective by showing its optimality in a Neyman-Pearson setting. Our experimental studies using various detection tasks curated from benchmark datasets demonstrate AID’s ability to synergistically combine the unique capabilities of individual detectors. For example, it is observed that AID can improve the area under the curve (AUC) by an absolute value of 0.07 to 0.21, with a median of 0.12, compared to the best individual detectors developed for specific safety aspects. The improvement is particularly significant for complex detection tasks that mix different unsafe data sources.

2024

IEEE BigData

DynamicFL: Federated Learning with Dynamic Communication Resource Allocation

Qi Le, Enmao Diao, Xinran Wang, and 4 more authors

In 2024 IEEE International Conference on Big Data (BigData), 2024

Best Student Paper Abs PDF Code

Best Student Paper

Federated Learning (FL) is a collaborative machine learning framework that allows multiple users to train models utilizing their local data in a distributed manner. However, considerable statistical heterogeneity in local data across devices often leads to suboptimal model performance compared with independently and identically distributed (IID) data scenarios. In this paper, we introduce DynamicFL, a new FL framework that investigates the trade-offs between global model performance and communication costs for two widely adopted FL methods: Federated Stochastic Gradient Descent (FedSGD) and Federated Averaging (FedAvg). Our approach allocates diverse communication resources to clients based on their data statistical heterogeneity, considering communication resource constraints, and attains substantial performance enhancements compared to uniform communication resource allocation. Notably, our method bridges the gap between FedSGD and FedAvg, providing a flexible framework leveraging communication heterogeneity to address statistical heterogeneity in FL. Through extensive experiments, we demonstrate that DynamicFL surpasses current state-of-the-art methods with up to a 10% increase in model accuracy, demonstrating its adaptability and effectiveness in tackling data statistical heterogeneity challenges.
EMNLP

ESC: Efficient Speech Coding with Cross-Scale Residual Vector Quantized Transformers

Yuzhe Gu, and Enmao Diao

In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Abs PDF Code

Neural speech codecs aim to compress input signals into minimal bits while maintaining content quality in a low-latency manner. However, existing neural codecs often trade model complexity for reconstruction performance. These codecs primarily use convolutional blocks for feature transformation, which are not inherently suited for capturing the local redundancies in speech signals. To compensate, they require either adversarial discriminators or a large number of model parameters to enhance audio quality. In response to these challenges, we introduce the Efficient Speech Codec (ESC), a lightweight, parameter-efficient speech codec based on a cross-scale residual vector quantization scheme and transformers. Our model employs mirrored hierarchical window transformer blocks and performs step-wise decoding from coarse-to-fine feature representations. To enhance bitrate efficiency, we propose a novel combination of vector quantization techniques along with a pre-training paradigm. Extensive experiments demonstrate that ESC can achieve high-fidelity speech reconstruction with significantly lower model complexity, making it a promising alternative to existing convolutional audio codecs
IEEE Access

Large Deviation Analysis of Score-based Hypothesis Testing

Enmao Diao, Taposh Banerjee, and Vahid Tarokh

IEEE Access, 2024

Abs PDF Code

Score-based statistical models play an important role in modern machine learning, statistics, and signal processing. For hypothesis testing, a score-based hypothesis test is proposed in \citewu2022score. We analyze the performance of this score-based hypothesis testing procedure and derive upper bounds on the probabilities of its Type I and II errors. We prove that the exponents of our error bounds are asymptotically (in the number of samples) tight for the case of simple null and alternative hypotheses. We calculate these error exponents explicitly in specific cases and provide numerical studies for various other scenarios of interest.
arXiv

Robust Score-Based Quickest Change Detection

Sean Moushegian, Suya Wu, Enmao Diao, and 3 more authors

arXiv preprint arXiv:2407.11094, 2024

Abs PDF

Methods in the field of quickest change detection rapidly detect in real-time a change in the data-generating distribution of an online data stream. Existing methods have been able to detect this change point when the densities of the pre- and post-change distributions are known. Recent work has extended these results to the case where the pre- and post-change distributions are known only by their score functions. This work considers the case where the pre- and post-change score functions are known only to correspond to distributions in two disjoint sets. This work employs a pair of "least-favorable" distributions to robustify the existing score-based quickest change detection algorithm, the properties of which are studied. This paper calculates the least-favorable distributions for specific model classes and provides methods of estimating the least-favorable distributions for common constructions. Simulation results are provided demonstrating the performance of our robust change detection algorithm.
RadarConf

A PixelCNN Based Method for Rough Surface Clutter Reduction in GPR B-scan Images

Yan Zhang, Enmao Diao, Dryver Huston, and 1 more author

In 2024 IEEE Radar Conference (RadarConf24), 2024

Abs PDF

Reducing the rough surface clutter in a Ground Penetrating Radar (GPR) B-scan image is essential for detecting shallow buried targets and for improving the performance of subsequent image processing algorithms. Challenges involved in this problem lie in the difficulties with modeling the variations of the rough surface profile. In this paper, we propose a novel PixelCNN based method for reducing rough surface clutter in GPR B-scan images. In the proposed method, the rough surface region in a B-scan image is split into small patches which are used to train a PixelCNN model. Given an input patch, the trained PixelCNN model can output the probability that the input patch comes from the rough surface region. For reconstructing a clutter reduced B-scan, the entire B-scan image is split into patches and each patch is input into the trained PixelCNN model to get its corresponding probability. Negative log-probabilities are then utilized as scores to suppress the rough surface region and to enhance the target profile in the B-scan image. To demonstrate the effectiveness of the proposed method, we test it on four simulation B-scan images and two real-world B-scan images. The proposed method is compared with traditional subspace projection methods. The results indicate that the proposed method outperforms traditional subspace projection methods.
arXiv

ColA: Collaborative Adaptation with Gradient Learning

Enmao Diao, Qi Le, Suya Wu, and 4 more authors

2024

Abs PDF Code

A primary function of back-propagation is to compute both the gradient of hidden representations and parameters for optimization with gradient descent. Training large models requires high computational costs due to their vast parameter sizes. While Parameter-Efficient Fine-Tuning (PEFT) methods aim to train smaller auxiliary models to save computational space, they still present computational overheads, especially in Fine-Tuning as a Service (FTaaS) for numerous users. We introduce Collaborative Adaptation (ColA) with Gradient Learning (GL), a parameter-free, model-agnostic fine-tuning approach that decouples the computation of the gradient of hidden representations and parameters. In comparison to PEFT methods, ColA facilitates more cost-effective FTaaS by offloading the computation of the gradient to low-cost devices. We also provide a theoretical analysis of ColA and experimentally demonstrate that ColA can perform on par or better than existing PEFT methods on various benchmarks.
TGRS

A Data Efficient Deep Learning Method for Rough Surface Clutter Reduction in GPR Images

Yan Zhang, Enmao Diao, Dryver Huston, and 1 more author

IEEE Transactions on Geoscience and Remote Sensing, 2024

Abs PDF

In ground penetrating radar (GPR) B-scan images, ground surface clutter is the main source of interference, often obscuring or distorting subsurface target signals. We propose a deep autoencoder-based method to mitigate rough surface clutter in GPR images by treating it as an anomaly detection problem. First, the rough surface region in a B-scan image is partitioned into small patches, which act as the training dataset for the deep autoencoder. Through the training process, the autoencoder learns and captures the patterns associated with the rough surface patches. Following training, the entire B-scan image is divided into small patches of the same size as the training patches, and each of them is fed into the autoencoder to compute an anomaly score. To reconstruct a clutter-reduced B-scan image, we employ a weighted sum approach to aggregate all patches based on their anomaly scores. We evaluate our method against conventional subspace projection techniques using simulated and field-collected B-scans. The results clearly indicate that our approach surpasses these subspace methods. Furthermore, we employ t-distributed stochastic neighbor embedding (t-SNE) analysis to gain deeper insights into our method’s effectiveness in reducing rough surface clutter. The outcomes of this analysis reinforce the practical viability of our approach for GPR image processing.

2023

TIT

Quickest Change Detection for Unnormalized Statistical Models

Suya Wu, Enmao Diao, Taposh Banerjee, and 2 more authors

IEEE Transactions on Information Theory, 2023

Abs PDF Code

Classical quickest change detection algorithms require modeling pre-change and post-change distributions. Such an approach may not be feasible for various machine learning models because of the complexity of computing the explicit distributions. Additionally, these methods may suffer from a lack of robustness to model mismatch and noise. This paper develops a new variant of the classical Cumulative Sum (CUSUM) algorithm for the quickest change detection. This variant is based on Fisher divergence and the Hyvärinen score and is called the Hyvärinen score-based CUSUM (SCUSUM) algorithm. The SCUSUM algorithm allows the applications of change detection for unnormalized statistical models, i.e., models for which the probability density function contains an unknown normalization constant. The asymptotic optimality of the proposed algorithm is investigated by deriving expressions for average detection delay and the mean running time to a false alarm. Numerical results are provided to demonstrate the performance of the proposed algorithm.
Thesis

Efficient and Collaborative Methods for Distributed Machine Learning

Enmao Diao

Duke University, 2023

Abs PDF

In recent years, there has been a significant expansion in the scale and complexity of neural networks. This has resulted in significant demand for data, computation, and energy resources. In this light, it is crucial to enhance and optimize the efficiency of these ML models and algorithms. Additionally, the rise in computational capabilities of modern devices has prompted a shift towards distributed systems that enable localized data storage and model training. While this evolution promises substantial potential, it introduces a series of challenges. Such challenges encompass addressing the heterogeneity across systems, data, models, and supervision, balancing the trade-off among communication, computation, and performance, as well as building a community of shared interest to encourage collaboration in the emerging era of Artificial General Intelligence (AGI).In this dissertation, we contribute to the establishment of a theoretically justified, methodologically comprehensive, and universally applicable Efficient and Collaborative Distributed Machine Learning framework. Specifically, in Part I, we contribute to methodologies for Efficient Machine Learning including for both learning and inference. In this direction, we propose a parameter-efficient model, namely Restricted Recurrent Neural Networks (RRNN), that leverage the recurrent structures of RNNs using weight sharing in order to improve learning efficiency. We also introduce an optimal measure of vector sparsity named the PQ Index (PQI), and postulate a hypothesis connecting this sparsity measure and compressibility of neural networks. Based on this, we propose a Sparsity-informed Adaptive Pruning (SAP) algorithm. This algorithm adaptively determines the pruning ratio to enhance inference efficiency. In Part II, we address both efficiency and collaboration in Distributed Machine Learning. We introduce Distributed Recurrent Autoencoders for Scalable Image Compression (DRASIC), a data-driven Distributed Source Coding framework that can compress heterogeneous data in a scalable and distributed manner. We then propose Heterogeneous Federated Learning (HeteroFL), demonstrating the feasibility of training localized heterogeneous models to create a global inference model. Subsequently, we propose a new Federated Learning (FL) framework, namely SemiFL, to tackle Semi-Supervised Federated Learning (SSFL) for clients with unlabeled data. This method performs comparably with state-of-the-art centralized Semi-Supervised Learning (SSL), and fully supervised FL techniques. Finally, we propose Gradient Assisted Learning (GAL) in order to enable collaborations among multiple organizations without sharing data, models, and objective functions. This method significantly out-performs local learning baselines and achieves near-oracle performance. In Part III, we develop collaborative applications for building a community of shared interest. We apply SemiFL to Keyword Spotting (KWS), a technique widely used in virtual assistants. Numerical experiments demonstrate that one can train models from the scratch, or transfer from pre-trained models in order to leverage heterogeneous unlabeled on-device data, using only a small amount of labeled data from the server. Finally, we propose a Decentralized Multi-Target Cross-Domain Recommendation (DMTCDR) which enhances the recommendation performance of decentralized organizations without compromising data privacy or model confidentiality.
KDD

Once-for-All Federated Learning: Learning From and Deploying to Heterogeneous Clients

Kamala Varma, Enmao Diao, Tanya Roosta, and 2 more authors

2023

Abs PDF

Federated learning (FL) enables multiple client devices to train a single machine learning model collaboratively. As FL often involves various smart devices, it is important to adapt the FL pipeline to accommodate device resource constraints. This work addresses the problem of training and storing memory-intensive deep neural network architectures on resource-constrained devices. Existing solutions often involve computationally expensive methods. We propose Once-for-All Federated Learning (OFA-FL) to overcome this limitation by learning a model that concurrently optimizes sub-networks of various sizes. Clients can therefore receive the sub-network best suited for their device resources without extra computation. Our experiments show that each component of OFAFL contributes to well-performing FL-produced sub-networks while maintaining a global network design that supports the efficient deployment of device resource-specific sub-networks.
UAI

Robust Quickest Change Detection for Unnormalized Models

Suya Wu, Enmao Diao, Jie Ding, and 2 more authors

In Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence, 2023

Abs PDF

Detecting an abrupt and persistent change in the underlying distribution of online data streams is an important problem in many applications. This paper proposes a new robust score-based algorithm called RSCUSUM, which can be applied to unnormalized models and addresses the issue of unknown post-change distributions. RSCUSUM replaces the Kullback-Leibler divergence with the Fisher divergence between pre-and post-change distributions for computational efficiency in unnormalized statistical models and introduces a notion of the “least favorable” distribution for robust change detection. The algorithm and its theoretical analysis are demonstrated through simulation studies.
ICME

Semi-Supervised Federated Learning for Keyword Spotting

Enmao Diao, Eric W Tramel, Jie Ding, and 1 more author

In 2023 IEEE International Conference on Multimedia and Expo Workshops (ICMEW), 2023

Abs PDF Code

Keyword Spotting (KWS) is a critical aspect of audio-based applications on mobile devices and virtual assistants. Recent developments in Federated Learning (FL) have significantly expanded the ability to train machine learning models by utilizing the computational and private data resources of numerous distributed devices. However, existing FL methods typically require that devices possess accurate ground-truth labels, which can be both expensive and impractical when dealing with local audio data. In this study, we first demonstrate the effectiveness of Semi-Supervised Federated Learning (SSL) and FL for KWS. We then extend our investigation to Semi-Supervised Federated Learning (SSFL) for KWS, where devices possess completely unlabeled data, while the server has access to a small amount of labeled data. We perform numerical analyses using state-of-the-art SSL, FL, and SSFL techniques to demonstrate that the performance of KWS models can be significantly improved by leveraging the abundant unlabeled heterogeneous data available on devices.
CVMI

Multimodal Controller for Generative Models

Enmao Diao, Jie Ding, and Vahid Tarokh

In Computer Vision and Machine Intelligence, 2023

Abs PDF Code

Class-conditional generative models are crucial tools for data generation from user-specified class labels. Existing approaches for class-conditional generative models require nontrivial modifications of backbone generative architectures to model conditional information fed into the model. This paper introduces a plug-and-play module, named ‘multimodal controller’ to generate multimodal data without introducing additional learning parameters. In the absence of the controllers, our model reduces to non-conditional generative models. We test the efficacy of multimodal controllers on CIFAR10, COIL100, and Omniglot benchmark datasets. We demonstrate that multimodal controlled generative models (including VAE, PixelCNN, Glow, and GAN) can generate class-conditional images of significantly better quality when compared with conditional generative models. Moreover, we show that multimodal controlled models can also create novel modalities of images.
ICLR

Pruning Deep Neural Networks from a Sparsity Perspective

Enmao Diao, Ganghua Wang, Jiawei Zhang, and 3 more authors

In International Conference on Learning Representations, 2023

Abs PDF Code

In recent years, deep network pruning has attracted significant attention in order to enable the rapid deployment of AI into small devices with computation and memory constraints. Pruning is often achieved by dropping redundant weights, neurons, or layers of a deep network while attempting to retain a comparable test performance. Many deep pruning algorithms have been proposed with impressive empirical success. However, existing approaches lack a quantifiable measure to estimate the compressibility of a sub-network during each pruning iteration and thus may under-prune or over-prune the model. In this work, we propose PQ Index (PQI) to measure the potential compressibility of deep neural networks and use this to develop a Sparsity-informed Adaptive Pruning (SAP) algorithm. Our extensive experiments corroborate the hypothesis that for a generic pruning procedure, PQI decreases first when a large model is being effectively regularized and then increases when its compressibility reaches a limit that appears to correspond to the beginning of underfitting. Subsequently, PQI decreases again when the model collapse and significant deterioration in the performance of the model start to occur. Additionally, our experiments demonstrate that the proposed adaptive pruning algorithm with proper choice of hyper-parameters is superior to the iterative pruning algorithms such as the lottery ticket-based pruning methods, in terms of both compression efficiency and robustness.
AISTATS

Score-based Quickest Change Detection for Unnormalized Models

Suya Wu, Enmao Diao, Taposh Banerjee, and 2 more authors

In Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, 2023

Abs PDF Code

Classical change detection algorithms typically require modeling pre-change and post-change distributions. The calculations may not be feasible for various machine learning models because of the complexity of computing the partition functions and normalized distributions. Additionally, these methods may suffer from a lack of robustness to model mismatch and noise. In this paper, we develop a new variant of the classical Cumulative Sum (CUSUM) change detection, namely Score-based CUSUM (SCUSUM), based on Fisher divergence and the Hyvärinen score. Our method allows the applications of the quickest change detection for unnormalized distributions. We provide a theoretical analysis of the detection delay given the constraints on false alarms. We prove the asymptotic optimality of the proposed method in some particular cases. We also provide numerical experiments to demonstrate our method’s computation, performance, and robustness advantages.

2022

NeurIPS

GAL: Gradient Assisted Learning for Decentralized Multi-organization Collaborations

Enmao Diao, Jie Ding, and Vahid Tarokh

In Advances in Neural Information Processing Systems, 2022

Abs PDF Code

Collaborations among multiple organizations, such as financial institutions, medical centers, and retail markets in decentralized settings are crucial to providing improved service and performance. However, the underlying organizations may have little interest in sharing their local data, models, and objective functions. These requirements have created new challenges for multi-organization collaboration. In this work, we propose Gradient Assisted Learning (GAL), a new method for multiple organizations to assist each other in supervised learning tasks without sharing local data, models, and objective functions. In this framework, all participants collaboratively optimize the aggregate of local loss functions, and each participant autonomously builds its own model by iteratively fitting the gradients of the overarching objective function. We also provide asymptotic convergence analysis and practical case studies of GAL. Experimental studies demonstrate that GAL can achieve performance close to centralized learning when all data, models, and objective functions are fully disclosed.
NeurIPS

SemiFL: Semi-Supervised Federated Learning for Unlabeled Clients with Alternate Training

Enmao Diao, Jie Ding, and Vahid Tarokh

In Advances in Neural Information Processing Systems, 2022

Abs PDF Code

Federated Learning allows the training of machine learning models by using the computation and private data resources of many distributed clients. Most existing results on Federated Learning (FL) assume the clients have ground-truth labels. However, in many practical scenarios, clients may be unable to label task-specific data due to a lack of expertise or resource. We propose SemiFL to address the problem of combining communication-efficient FL such as FedAvg with Semi-Supervised Learning (SSL). In SemiFL, clients have completely unlabeled data and can train multiple local epochs to reduce communication costs, while the server has a small amount of labeled data. We provide a theoretical understanding of the success of data augmentation-based SSL methods to illustrate the bottleneck of a vanilla combination of communication-efficient FL with SSL. To address this issue, we propose alternate training to ’fine-tune global model with labeled data’ and ’generate pseudo-labels with the global model.’ We conduct extensive experiments and demonstrate that our approach significantly improves the performance of a labeled server with unlabeled clients training with multiple local epochs. Moreover, our method outperforms many existing SSFL baselines and performs competitively with the state-of-the-art FL and SSL results.
NeurIPS

PerFedSI: A Framework for Personalized Federated Learning with Side Information

Liam Collins, Enmao Diao, Tanya Roosta, and 2 more authors

2022

Abs PDF

With an ever-increasing number of smart edge devices with computation and communication constraints, Federated Learning (FL) is a promising paradigm for learning from distributed devices and their data. Typical approaches to FL aim to learn a single model that simultaneously performs well for all clients. But such an approach may be ineffective when the clients’ data distributions are heterogeneous. In these cases, we aim to learn personalized models for each client’s data yet still leverage shared information across clients. A critical avenue that may allow for such personalization is the presence of client-specific side information available to each client, such as client embeddings obtained from domain-specific knowledge, pre-trained models, or simply one-hot encodings. In this work, we propose a new FL framework for utilizing a general form of client-specific side information for personalized federated learning. We prove that incorporating side information can improve model performance for simplified multi-task linear regression and matrix completion problems. Further, we validate these results with image classification experiments on Omniglot, CIFAR-10, and CIFAR-100, revealing that proper use of side information can be beneficial for personalization.
Asilomar

Personalized Federated Recommender Systems with Private and Partially Federated AutoEncoders

Qi Le, Enmao Diao, Xinran Wang, and 3 more authors

In 2022 56th Asilomar Conference on Signals, Systems, and Computers, 2022

Abs PDF Code

Recommender Systems (RSs) have become increasingly important in many application domains, such as digital marketing. Conventional RSs often need to collect users’ data, centralize them on the server-side, and form a global model to generate reliable recommendations. However, they suffer from two critical limitations: the personalization problem that the RSs trained traditionally may not be customized for individual users, and the privacy problem that directly sharing user data is not encouraged. We propose Personalized Federated Recommender Systems (PersonalFR), which introduces a personalized autoencoder-based recommendation model with Federated Learning (FL) to address these challenges. PersonalFR guarantees that each user can learn a personal model from the local dataset and other participating users’ data without sharing local data, data embeddings, or models. PersonalFR consists of three main components, including AutoEncoder-based RSs (ARSs) that learn the user-item interactions, Partially Federated Learning (PFL) that updates the encoder locally and aggregates the decoder on the server-side, and Partial Compression (PC) that only computes and transmits active model parameters. Extensive experiments on two real-world datasets demonstrate that PersonalFR can achieve private and personalized performance comparable to that trained by centralizing all users’ data. Moreover, PersonalFR requires significantly less computation and communication overhead than standard FL baselines.
IEEE Access

Score-based Hypothesis Testing for Unnormalized Models

Suya Wu, Enmao Diao, Khalil Elkhalil, and 2 more authors

IEEE Access, 2022

Abs PDF Code

Unnormalized statistical models play an important role in machine learning, statistics, and signal processing. In this paper, we derive a new hypothesis testing procedure for unnormalized models. Our approach is motivated by the success of score matching techniques that avoid the intensive computational costs of normalization constants in many high-dimensional settings. Our proposed test statistic is the difference between Hyvärinen scores corresponding to the null and alternative hypotheses. Under some reasonable conditions, we prove that the asymptotic distribution of this statistic is Chi-squared. We outline a bootstrap approach to learn the test critical values, particularly when the distribution under the null hypothesis cannot be expressed in a closed form, and provide consistency guarantees. Finally, we conduct extensive numerical experiments and demonstrate that our proposed approach outperforms goodness-of-fit benchmarks in various settings.
JoT

Dimension Reduced Turbulent Flow Data from Deep Vector Quantizers

Mohammadreza Momenifar, Enmao Diao, Vahid Tarokh, and 1 more author

Journal of Turbulence, 2022

Abs PDF Code

Analysing large-scale data from simulations of turbulent flows is memory intensive, requiring significant resources. This major challenge highlights the need for data compression techniques. In this study, we apply a physics-informed Deep Learning technique based on vector quantisation to generate a discrete, low-dimensional representation of data from simulations of three-dimensional turbulent flows. The deep learning framework is composed of convolutional layers and incorporates physical constraints on the flow, such as preserving incompressibility and global statistical characteristics of the velocity gradients. The accuracy of the model is assessed using statistical, comparison-based similarity and physics-based metrics. The training data set is produced from Direct Numerical Simulation of an incompressible, statistically stationary, isotropic turbulent flow. The performance of this lossy data compression scheme is evaluated not only with unseen data from the stationary, isotropic turbulent flow, but also with data from decaying isotropic turbulence, a Taylor–Green vortex flow, and a turbulent channel flow. Defining the compression ratio (CR) as the ratio of original data size to the compressed one, the results show that our model based on vector quantisation can offer CR = 85 with a mean square error (MSE) of (10−3), and predictions that faithfully reproduce the statistics of the flow, except at the very smallest scales where there is some loss. Compared to the recent study of Glaws. et al. [Deep learning for in situ data compression of large turbulent flow simulations. Phys Rev Fluids. 2020;5(11):114602], which was based on a conventional autoencoder (where compression is performed in a continuous space), our model improves the CR by more than 30%, and reduces the MSE by an order of magnitude. Our compression model is an attractive solution for situations where fast, high quality and low-overhead encoding and decoding of large data are required.
DCC

A Physics-Informed Vector Quantized Autoencoder for Data Compression of Turbulent Flow

Mohammadreza Momenifar, Enmao Diao, Vahid Tarokh, and 1 more author

In 2022 Data Compression Conference (DCC), 2022

Abs PDF Code

Analyzing large-scale data from simulations of turbulent flows is memory intensive, requiring significant resources. This major challenge highlights the need for data compression techniques. In this study, we apply a physics-informed Deep Learning technique based on vector quantization to generate a discrete, low-dimensional representation of data from simulations of three-dimensional turbulent flows. The deep learning framework is composed of convolutional layers and incorporates physical constraints on the flow, such as preserving incompressibility and global statistical characteristics of the velocity gradients. The accuracy of the model is assessed using statistical, comparison-based similarity and physics-based metrics. The training data set is produced from Direct Numerical Simulation of an incompressible, statistically stationary, isotropic turbulent flow. The performance of this lossy data compression scheme is evaluated not only with unseen data from the stationary, isotropic turbulent flow, but also with data from decaying isotropic turbulence, and a Taylor-Green vortex flow. Defining the compression ratio (CR) as the ratio of original data size to the compressed one, the results show that our model based on vector quantization can offer CR = 85 with a mean square error (MSE) of O(10−3), and predictions that faithfully reproduce the statistics of the flow, except at the very smallest scales where there is some loss. Compared to the recent study based on a conventional autoencoder where compression is performed in a continuous space, our model improves the CR by more than 30 percent, and reduces the MSE by an order of magnitude. Our compression model is an attractive solution for situations where fast, high quality and low-overhead encoding and decoding of large data are required.
AAAI

Emulating Spatio-Temporal Realizations of Three-Dimensional Isotropic Turbulence via Deep Sequence Learning Models

Mohammadreza Momenifar, Enmao Diao, Vahid Tarokh, and 1 more author

2022 Workshop on AI to Accelerate Science and Engineering (AAAI), 2022

Abs PDF Code

We use a data-driven approach to model a three-dimensional turbulent flow using cutting-edge Deep Learning techniques. The deep learning framework incorporates physical constraints on the flow, such as preserving incompressibility and global statistical invariants of velocity gradient tensor. The accuracy of the model is assessed using statistical and physics-based metrics. The data set comes from Direct Numerical Simulation of an incompressible, statistically stationary, isotropic turbulent flow in a cubic box. Since the size of the dataset is memory intensive, we first generate a low-dimensional representation of the velocity data, and then pass it to a sequence prediction network that learns the spatial and temporal correlations of the underlying data. The dimensionality reduction is performed via extraction using Vector-Quantized Autoencoder (VQ-AE), which learns the discrete latent variables. For the sequence forecasting, the idea of Transformer architecture from natural language processing is used, and its performance compared against more standard Recurrent Networks (such as Convolutional LSTM). These architectures are designed and trained to perform a sequence to sequence multi-class classification task in which they take an input sequence with a fixed length (k) and predict a sequence with a fixed length (p), representing the future time instants of the flow. Our results for the short-term predictions show that the accuracy of results for both models deteriorates across predicted snapshots due to autoregressive nature of the predictions. Based on our diagnostics tests, the trained Conv-Transformer model outperforms the Conv-LSTM one and can accurately, both quantitatively and qualitatively, retain the large scales and capture well the inertial scales of flow but fails at recovering the small and intermittent fluid motions.

2021

arXiv

Decentralized Multi-Target Cross-Domain Recommendation for Multi-organization Collaborations

Enmao Diao, Vahid Tarokh, and Jie Ding

arXiv preprint arXiv:2110.13340, 2021

Abs PDF Code

Recommender Systems (RSs) are operated locally by different organizations in many realistic scenarios. If various organizations can fully share their data and perform computation in a centralized manner, they may significantly improve the accuracy of recommendations. However, collaborations among multiple organizations in enhancing the performance of recommendations are primarily limited due to the difficulty of sharing data and models. To address this challenge, we propose Decentralized Multi-Target Cross-Domain Recommendation (DMTCDR) with Multi-Target Assisted Learning (MTAL) and Assisted AutoEncoder (AAE). Our method can help multiple organizations collaboratively improve their recommendation performance in a decentralized manner without sharing sensitive assets. Consequently, it allows decentralized organizations to collaborate and form a community of shared interest. We conduct extensive experiments to demonstrate that the new method can significantly outperform locally trained RSs and mitigate the cold start problem.
ICLR

HeteroFL: Computation and Communication Efficient Federated Learning for Heterogeneous Clients

Enmao Diao, Jie Ding, and Vahid Tarokh

In International Conference on Learning Representations, 2021

Abs PDF Code

Federated Learning (FL) is a method of training machine learning models on private data distributed over a large number of possibly heterogeneous clients such as mobile phones and IoT devices. In this work, we propose a new federated learning framework named HeteroFL to address heterogeneous clients equipped with very different computation and communication capabilities. Our solution can enable the training of heterogeneous local models with varying computation complexities and still produce a single global inference model. For the first time, our method challenges the underlying assumption of existing work that local models have to share the same architecture as the global model. We demonstrate several strategies to enhance FL training and conduct extensive empirical evaluations, including five computation complexity levels of three model architecture on three datasets. We show that adaptively distributing subnetworks according to clients’ capabilities is both computation and communication efficient.

2020

TIT

On Statistical Efficiency in Learning

Jie Ding, Enmao Diao, Jiawei Zhou, and 1 more author

IEEE Transactions on Information Theory, 2020

Abs PDF Code

A central issue of many statistical learning problems is to select an appropriate model from a set of candidate models. Large models tend to inflate the variance (or overfitting), while small models tend to cause biases (or underfitting) for a given fixed dataset. In this work, we address the critical challenge of model selection to strike a balance between model fitting and model complexity, thus gaining reliable predictive power. We consider the task of approaching the theoretical limit of statistical learning, meaning that the selected model has the predictive performance that is as good as the best possible model given a class of potentially misspecified candidate models. We propose a generalized notion of Takeuchi’s information criterion and prove that the proposed method can asymptotically achieve the optimal out-sample prediction loss under reasonable assumptions. It is the first proof of the asymptotic property of Takeuchi’s information criterion to our best knowledge. Our proof applies to a wide variety of nonlinear models, loss functions, and high dimensionality (in the sense that the models’ complexity can grow with sample size). The proposed method can be used as a computationally efficient surrogate for leave-one-out cross-validation. Moreover, for modeling streaming data, we propose an online algorithm that sequentially expands the model complexity to enhance selection stability and reduce computation cost. Experimental studies show that the proposed method has desirable predictive power and significantly less computational cost than some popular methods.
ICASSP

Speech Emotion Recognition with Dual-Sequence LSTM Architecture

Jianyou Wang, Michael Xue, Ryan Culhane, and 3 more authors

In ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020

Abs PDF Code

Speech Emotion Recognition (SER) has emerged as a critical component of the next generation of human-machine interfacing technologies. In this work, we propose a new dual-level model that predicts emotions based on both MFCC features and mel-spectrograms produced from raw audio signals. Each utterance is preprocessed into MFCC features and two mel-spectrograms at different time-frequency resolutions. A standard LSTM processes the MFCC features, while a novel LSTM architecture, denoted as Dual-Sequence LSTM (DS-LSTM), processes the two mel-spectrograms simultaneously. The outputs are later averaged to produce a final classification of the utterance. Our proposed model achieves, on average, a weighted accuracy of 72.7% and an unweighted accuracy of 73.3% - a 6% improvement over current state-of-the-art unimodal models - and is comparable with multimodal models that leverage textual information as well as audio signals.
DCC

DRASIC: Distributed Recurrent Autoencoder for Scalable Image Compression

Enmao Diao, Jie Ding, and Vahid Tarokh

In 2020 Data Compression Conference (DCC), 2020

Abs PDF Code

We propose a new architecture for distributed image compression from a group of distributed data sources. The work is motivated by practical needs of data-driven codec design, low power consumption, robustness, and data privacy. The proposed architecture, which we refer to as Distributed Recurrent Autoencoder for Scalable Image Compression (DRASIC), is able to train distributed encoders and one joint decoder on correlated data sources. Its compression capability is much better than the method of training codecs separately. Meanwhile, the performance of our distributed system with 10 distributed sources is only within 2 dB peak signal-to-noise ratio (PSNR) of the performance of a single codec trained with all data sources. We experiment distributed sources with different correlations and show how our data-driven methodology well matches the Slepian-Wolf Theorem in Distributed Source Coding (DSC). To the best of our knowledge, this is the first data-driven DSC framework for general distributed code design with deep learning.
DCC

Deep Clustering of Compressed Variational Embeddings

Suya Wu, Enmao Diao, Jie Ding, and 1 more author

In 2020 Data Compression Conference (DCC), 2020

Abs PDF Code

Motivated by the ever-increasing demands for limited communication bandwidth and low-power consumption, we propose a new methodology, named joint Variational Autoencoders with Bernoulli mixture models (VAB), for performing clustering in the compressed data domain. The idea is to reduce the data dimension by Variational Autoencoders (VAEs) and group data representations by Bernoulli mixture models (BMMs). Once jointly trained for compression and clustering, the model can be decomposed into two parts: a data vendor that encodes the raw data into compressed data, and a data consumer that classifies the received (compressed) data. In this way, the data vendor benefits from data security and communication bandwidth, while the data consumer benefits from low computational complexity. To enable training using the gradient descent algorithm, we propose to use the Gumbel-Softmax distribution to resolve the infeasibility of the back-propagation algorithm when assessing categorical samples.

2019

IEEE Big Data

Restricted Recurrent Neural Networks

Enmao Diao, Jie Ding, and Vahid Tarokh

In 2019 IEEE International Conference on Big Data (Big Data), 2019

Student Travel Award Abs PDF Code

Student Travel Award

Recurrent Neural Network (RNN) and its variations such as Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU), have become standard building blocks for learning online data of sequential nature in many research areas, including natural language processing and speech data analysis. In this paper, we present a new methodology to significantly reduce the number of parameters in RNNs while maintaining performance that is comparable or even better than classical RNNs. The new proposal, referred to as Restricted Recurrent Neural Network (RRNN), restricts the weight matrices corresponding to the input data and hidden states at each time step to share a large proportion of parameters. The new architecture can be regarded as a compression of its classical counterpart, but it does not require pre-training or sophisticated parameter fine-tuning, both of which are major issues in most existing compression techniques. Experiments on natural language modeling show that compared with its classical counterpart, the restricted recurrent architecture generally produces comparable results at about 50% compression rate. In particular, the Restricted LSTM can outperform classical RNN with even less number of parameters.

2018

ICASSP

A Penalized Method for the Predictive Limit of Learning

Jie Ding, Enmao Diao, Jiawei Zhou, and 1 more author

In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018

Abs PDF Code

Machine learning systems learn from and make predictions by building models from observed data. Because large models tend to overfit while small models tend to underfit for a given fixed dataset, a critical challenge is to select an appropriate model (e.g. set of variables/features). Model selection aims to strike a balance between the goodness of fit and model complexity, and thus to gain reliable predictive power. In this paper, we study a penalized model selection technique that asymptotically achieves the optimal expected prediction loss (referred to as the limit of learning) offered by a set of candidate models. We prove that the proposed procedure is both statistically efficient in the sense that it asymptotically approaches the limit of learning, and computationally efficient in the sense that it can be much faster than cross validation methods. Our theory applies for a wide variety of model classes, loss functions, and high dimensions (in the sense that the models’ complexity can grow with data size). We released a python package with our proposed method for general usage like logistic regression and neural networks.