Seonjin Na
Senior High Performance AI Engineer at NVIDIA.
I’m a Senior High Performance AI Engineer at NVIDIA, working on GPU architecture and full-stack software optimization for AI workloads. I focus on accelerating distributed AI training and inference for LLMs and multimodal models through GPU-centric runtimes and system-level solutions.
Prior to joining NVIDIA, I was a Postdoctoral Fellow in the HPArch Group at Georgia Institute of Technology, supervised by Prof. Hyesoon Kim. I received my Ph.D. from the School of Computing at KAIST in 2023, advised by Prof. Jaehyuk Huh.
My research interests include GPU architecture, trusted computing, heterogeneous systems, distributed computing, and systems for machine learning. During my Ph.D., I focused on building secure architectures that provide trusted execution environments (TEEs) for accelerators such as GPUs and NPUs with minimal performance overhead. Currently, I am expanding my research toward challenges in multi-GPU architecture, hardware security, and large language model (LLM) acceleration.
Research Interests (Keywords): GPU/NPU Architecture, Systems for Machine Learning, Secure Architecture for GPUs/NPUs.
Work Experience
NVIDIA
11/2025 - Present
Senior High Performance AI Engineer
HW-SW Co-Design for Efficient and Scalable AI Training/Inference
Georgia Institute of Technology (Georgia Tech)
06/2023 - 10/2025
Postdoctoral Fellow
Worked with Hyesoon Kim
Microsoft Research
03/2019 - 06/2019
Research Intern
Mentors: Lintao Zhang and Yunxin Liu
KAIST
03/2018 - 02/2023
Graduate Research Assistant
Advisor: Jaehyuk Huh
News
| Date | News |
|---|---|
| Jun 04, 2026 | Our technical report Nemotron 3 Ultra: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning is now available. |
| May 02, 2026 | I will serve on the Program Committee for IISWC 2026. |
| Apr 29, 2026 | Our paper Accelerating RL Post-Training Rollouts via System-Integrated Speculative Decoding is now available on arXiv. |
| Mar 11, 2026 | Our technical report Nemotron 3 Super: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning is now available. |
| Feb 11, 2026 | I will serve on the Program Committee for MICRO 2026. |
| Jan 03, 2026 | Our paper CuFuzz: Hardening CUDA Programs through Transformation and Fuzzing is now available on arXiv. |
| Nov 18, 2025 | I will serve on the External Program Committee for MLSys 2026. |
| Nov 01, 2025 | I will serve on the External Program Committee for ISCA 2026. |
| Jul 25, 2025 | I will be joining NVIDIA as a Senior High Performance AI Engineer. |
| Jul 14, 2025 | Our paper Swift and Trustworthy Large-Scale GPU Simulation with Fine-Grained Error Modeling and Hierarchical Clustering has been accepted to MICRO 2025. |
| Jul 07, 2025 | Our paper Contention-Aware GPU Thread Block Scheduler for Efficient GPU-SSD has been accepted for publication in IEEE Computer Architecture Letters (CAL). |
| Apr 08, 2025 | I received the Outstanding Postdoctoral Research Award from the College of Computing at Georgia Tech. |
| Mar 22, 2025 | Our paper Unified Memory Protection with Multi-granular MAC and Integrity Tree for Heterogeneous Processors has been accepted to ISCA 2025. |
| Feb 11, 2025 | Our paper FlexInfer: Flexible LLM Inference with CPU Computations has been accepted to MLSys 2025. |
| Feb 04, 2025 | I will serve on the Program Committee for ASPLOS 2026. |
| Jan 14, 2025 | I will serve as the Workshops/Tutorials Chair for IISWC 2025. |
| Dec 03, 2024 | I will serve on the Program Committee for GPGPU 2025. |
| Nov 02, 2024 | Our paper Let-Me-In: (Still) Employing In-pointer Bounds metadata for Fine-grained GPU Memory Safety has been accepted to HPCA 2025. |
| Oct 01, 2024 | I have been selected as a presenter for the MICRO 2024 PhD Forum. |
| Sep 06, 2024 | I will serve on the Program Committee for IPDPS 2025. |
| Aug 16, 2024 | I will serve on the Artifact Evaluation Committee for MICRO 2024. |
| Jul 24, 2024 | I will serve on the Artifact Evaluation Committee for EuroSys 2025. |
| Jul 17, 2024 | Our paper Understanding Performance Implications of LLM Inference on CPUs has been accepted to IISWC 2024. |
| Jul 13, 2024 | I will serve as a Travel Grants Co-Chair for ASPLOS 2025. |
| Jul 02, 2024 | I will serve on the Artifact Evaluation Committee for ASPLOS 2025. |
| May 30, 2024 | Our paper Allegro: GPU Simulation Acceleration for Machine Learning Workloads has been accepted to MLArchsys. |
| Apr 30, 2024 | I will serve on the Artifact Evaluation Committee for OSDI 2024 / ATC 2024. |
| Apr 01, 2024 | I will serve on the Program Committee for SC 2024. |
| Mar 20, 2024 | Our paper Barre Chord: Efficient Virtual Memory Translation for Multi-Chip-Module GPUs has been accepted to ISCA 2024. |
| Feb 23, 2024 | I will serve on the Artifact Evaluation Committee for ISCA 2024. |
| Feb 22, 2024 | I will attend the GPGPU 2024 Workshop as a moderator. |
| Oct 24, 2023 | Our paper Supporting Secure Multi-GPU Computing with Dynamic and Batched Metadata Management has been accepted to HPCA 2024. |
| Jul 24, 2023 | Our paper Improving Data Reuse in NPU On-chip Memory with Interleaved Gradient Order for DNN Training has been accepted to MICRO 2023. |
| Dec 13, 2022 | I will be joining the HPArch group as a postdoctoral researcher. |
| Dec 09, 2022 | I successfully defended my Ph.D. thesis 🎓. |
| Aug 23, 2022 | Our paper Tunable Memory Protection for Secure NPUs has been accepted to ICCD 2022. |
| Oct 28, 2021 | Our paper TNPU: Supporting Trusted Execution with Tree-less Integrity Protection for Neural Processing Unit has been accepted to HPCA 2022. |
| Oct 28, 2020 | Our paper Common Counters: Compressed Encryption Counters for Secure GPU Memory has been accepted to HPCA 2021. |
Publications
- [TechReport] Nemotron 3 Ultra: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning. 2026
- arXiv
- [CAL] Contention-Aware GPU Thread Block Scheduler for Efficient GPU-SSD. In IEEE Computer Architecture Letters (CAL), 2025
Academic Services
Technical Program Committee
- 2026
  - IEEE/ACM International Symposium on Microarchitecture (MICRO)
  - IEEE International Symposium on Workload Characterization (IISWC)
  - International Symposium on Computer Architecture (ISCA)
  - Conference on Machine Learning and Systems (MLSys)
  - International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS)
- 2025
  - General Purpose Processing on Graphics Processing Units (GPGPU)
  - IEEE International Parallel & Distributed Processing Symposium (IPDPS)
- 2024
  - International Conference for High Performance Computing, Networking, Storage, and Analysis (SC)
Journal Reviewer
- ACM Transactions on Computer Systems (TOCS) 2024, 2025
- ACM Transactions on Architecture and Code Optimization (TACO) 2024, 2025
- IEEE Micro 2025, 2026
- IEEE Transactions on Dependable and Secure Computing (TDSC) 2023
- IEEE Computer Architecture Letters (CAL) 2023, 2025, 2026
Organizing Committee
- Workshop/Tutorial Chair: IEEE International Symposium on Workload Characterization (IISWC) 2025
- Travel Grant Chair: International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS) 2025
- Web Chair: IEEE Computer Society TCuARCH