Dong Dai 代栋
Associate Professor
Department of Computer & Information Sciences
University of Delaware
I am an Associate Professor in the Department of Computer and Information Sciences at the University of Delaware, where I lead the Data Intelligence Research Lab (DIRLab). My research focuses on data-intensive and high-performance systems, spanning parallel file systems, metadata management, graph storage, resource scheduling, and machine learning for systems. Previously, I was an Assistant Professor at UNC Charlotte and held postdoctoral positions at Texas Tech University and Argonne National Laboratory. I received my Ph.D. in Computer Science from the University of Science and Technology of China (USTC).
Publications
* Ph.D. student mentored; † Master's/undergraduate student mentored
- IPDPS'26 QoSFlow: Ensuring QoS of Distributed Workflows Using Interpretable Sensitivity Models
- IPDPS'26 CARAT: Client-Side Adaptive RPC and Cache Co-Tuning for Parallel File Systems
- npj Comp. Mat. AI-assisted Rapid Crystal Structure Generation towards a Target Local Environment
- SC'25 STELLAR: Storage Tuning Engine Leveraging LLM Autonomous Reasoning for HPC Parallel File Systems
- SC'25 Improving SpGEMM Performance Through Matrix-Reordering and Cluster-wise Computation
- HPDC'25 TSUE: A Two-Stage Data Update Method for an Erasure Coded Cluster File System
- CUG'25 Towards Empirical Roofline Modeling of Distributed Data Services: Mapping the Boundaries of RPC Throughput
- CCGrid'25 DIAL: Decentralized I/O AutoTuning via Learned Client-side Local Metrics for Parallel File System
- IPDPS'25 IOAgent: Democratizing Trustworthy HPC I/O Performance Diagnosis Capability via LLMs
- IPDPS'25 Be Aware of Metadata Corruption in Parallel File Systems: It Can Be Silent and Catastrophic
- IPDPS'25 AdapTBF: Decentralized Bandwidth Control via Adaptive Token Borrowing for HPC Storage
- PDSW'25 LLMTailor: A Layer-wise Tailoring Tool for Efficient Checkpointing of Large Language Models
- PDSW'25 RL4Sys: A Lightweight System-driven RL Framework for Drop-in Integration in System Optimization
- IEEE TC'24 Hardware Accelerated Vision Transformer via Heterogeneous Architecture Design and Adaptive Dataflow Mapping
- BigData'24 QualityNet: Error-bounded Lossy Compression Quality Prediction via Deep Surrogate
- PDSW'24 Understanding and Predicting Cross-Application I/O Interference in HPC Storage Systems
- HotStorage'24 ION: Navigating HPC I/O Optimization Journey using Large Language Models
- JSSPP'24 An Empirical Study of Machine Learning-based Synthetic Job Trace Generation Methods
- IPDPS'24 Cross-System Analysis of Job Characterization and Scheduling in Large-Scale Computing Clusters
Teaching
University of Delaware
- F 2025 CISC 361 Operating Systems
- S 2025 CISC 360 Computer Architecture
- F 2024 CISC 361 Operating Systems
UNC Charlotte
- S 2023 ITCS-6050/8050 ML for Efficient Computing Systems
- S 2023 Undergraduate Research Initiative
- F 2023 Undergraduate Research Initiative
- 19–22 ITCS-5145 Parallel Computing
- 20–22 ITSC-3181 Intro to Computer Architecture
- 18–19 ITCS-6144/8144 Operating Systems Design
Doctoral Students
Current
- Md Hasanur Rashid — 2024–present, passed proposal defense
- Chris Egersdoerfer — 2024–present, passed preliminary exam
- Jiaxin Dong — 2024–present
- Minqiu Sun — 2024–present
- Yuan Liang — 2025–present
Graduated
- Abdullah Al Raqibul Islam — Ph.D. 2019–2025 → Research Associate @ OSU
- Di Zhang — Ph.D. 2019–2024 → Research Scientist @ Meta
Research Projects
- Active: Moving Machine Learning into the Next-Generation Cloud Flexibly, Agilely and Efficiently
- Active: Hybrid NVM based Computing Architecture for Machine Learning Applications
- Active: Parallel Graph-Based Paradigm for HPC Parallel File System Checkers
- Active: Empowering Data-driven Discovery with Provenance Infrastructure
- Past: Partitioning Large Graphs in Deep Storage Architecture
- Past: Tuning Extreme-scale Storage Stack through Deep Reinforcement Learning
- Past: Uncovering Vulnerabilities in Parallel File Systems