Seonjin Na

I’m a Senior High Performance AI Engineer at NVIDIA, where I accelerate AI workloads on NVIDIA GPUs through GPU architecture and full-stack software optimization. I focus on speeding up distributed training and inference for LLMs and multi-modal models through GPU-centric runtimes and system-level solutions.

Prior to joining NVIDIA, I was a Postdoctoral Fellow in the HPArch Group at Georgia Institute of Technology, supervised by Prof. Hyesoon Kim. I received my Ph.D. from the School of Computing at KAIST in 2023, advised by Prof. Jaehyuk Huh.

My research interests lie in GPU architecture, trusted computing, heterogeneous systems, distributed computing, and systems for machine learning. During my Ph.D., I focused on building secure architectures that provide trusted execution environments (TEEs) on accelerators such as GPUs and NPUs with minimal performance overhead. Currently, I am expanding my research to address challenges in multi-GPU architecture, hardware security, and the acceleration of large language models (LLMs).

Research Interests (Keywords): GPU/NPU Architecture, Systems for Machine Learning, Secure Architecture for GPU/NPU.

Work Experience

NVIDIA
11/2025 - Present
Senior High Performance AI Engineer
HW-SW Co-Design for Efficient and Scalable AI Training/Inference

Georgia Institute of Technology (Georgia Tech)
06/2023 - 10/2025
Postdoctoral Fellow
Worked with Hyesoon Kim

Microsoft Research
03/2019 - 06/2019
Research Intern
Mentors: Lintao Zhang and Yunxin Liu

KAIST
03/2018 - 02/2023
Graduate Research Assistant
Advisor: Jaehyuk Huh

News

Mar 11, 2026	Our technical report, Nemotron 3 Super: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning, is now available.
Feb 11, 2026	I will serve on the Program Committee for MICRO 2026.
Jan 03, 2026	Our paper, CuFuzz: Hardening CUDA Programs through Transformation and Fuzzing, is now available on arXiv.
Nov 18, 2025	I will serve on the External Program Committee for MLSys 2026.
Nov 01, 2025	I will serve on the External Program Committee for ISCA 2026.
Jul 25, 2025	I will be joining NVIDIA as a Senior High Performance AI Engineer.
Jul 14, 2025	Our paper, Swift and Trustworthy Large-Scale GPU Simulation with Fine-Grained Error Modeling and Hierarchical Clustering, has been accepted to MICRO 2025.
Jul 07, 2025	Our paper, Contention-Aware GPU Thread Block Scheduler for Efficient GPU-SSD, has been accepted for publication in IEEE Computer Architecture Letters (CAL).
Apr 08, 2025	I have been awarded the Outstanding Post-doctoral Research Award by the College of Computing at Georgia Tech.
Mar 22, 2025	Our paper, Unified Memory Protection with Multi-granular MAC and Integrity Tree for Heterogeneous Processors, has been accepted to ISCA 2025.
Feb 11, 2025	Our paper, FlexInfer: Flexible LLM Inference with CPU Computations, has been accepted to MLSys 2025.
Feb 04, 2025	I will serve on the Program Committee for ASPLOS 2026.
Jan 14, 2025	I will serve as Workshops/Tutorials Chair for IISWC 2025.
Dec 03, 2024	I will serve on the Program Committee for GPGPU 2025.
Nov 02, 2024	Our paper, Let-Me-In: (Still) Employing In-pointer Bounds metadata for Fine-grained GPU Memory Safety, has been accepted to HPCA 2025.
Oct 01, 2024	I have been selected as one of the presenters for the MICRO 2024 PhD Forum.
Sep 06, 2024	I will serve on the Program Committee for IPDPS 2025.
Aug 16, 2024	I will serve on the Artifact Evaluation Committee for MICRO 2024.
Jul 24, 2024	I will serve on the Artifact Evaluation Committee for EuroSys 2025.
Jul 17, 2024	Our paper, Understanding Performance Implications of LLM Inference on CPUs, has been accepted to IISWC 2024.
Jul 13, 2024	I will serve as a Travel Grants Co-Chair for ASPLOS 2025.
Jul 02, 2024	I will serve on the Artifact Evaluation Committee for ASPLOS 2025.
May 30, 2024	Our paper, Allegro: GPU Simulation Acceleration for Machine Learning Workloads, has been accepted to MLArchSys.
Apr 30, 2024	I will serve on the Artifact Evaluation Committee for OSDI 2024 / ATC 2024.
Apr 01, 2024	I will serve on the Program Committee for SC 2024.
Mar 20, 2024	Our paper, Barre Chord: Efficient Virtual Memory Translation for Multi-Chip-Module GPUs, has been accepted to ISCA 2024.
Feb 23, 2024	I will serve on the Artifact Evaluation Committee for ISCA 2024.
Feb 22, 2024	I will attend the GPGPU 2024 workshop as a moderator.
Oct 24, 2023	Our paper, Supporting Secure Multi-GPU Computing with Dynamic and Batched Metadata Management, has been accepted to HPCA 2024.
Jul 24, 2023	Our paper, Improving Data Reuse in NPU On-chip Memory with Interleaved Gradient Order for DNN Training, has been accepted to MICRO 2023.
Dec 13, 2022	I will be joining the HPArch group as a postdoctoral researcher.
Dec 09, 2022	I successfully defended my Ph.D. Thesis 🎓.
Aug 23, 2022	Our paper, Tunable Memory Protection for Secure Neural Processing Units, has been accepted to ICCD 2022.
Oct 28, 2021	Our paper, TNPU: Supporting Trusted Execution with Tree-less Integrity Protection for Neural Processing Unit, has been accepted to HPCA 2022.
Oct 28, 2020	Our paper, Common Counters: Compressed Encryption Counters for Secure GPU Memory, has been accepted to HPCA 2021.

Publications

TechReport

Nemotron 3 Super: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

2026

PDF

arXiv

CuFuzz: Hardening CUDA Programs through Transformation and Fuzzing

Saurabh Singh, Ruobing Han, Jaewon Lee, Seonjin Na, Yonghae Kim, Taesoo Kim, and Hyesoon Kim

2026

arXiv

MICRO

Swift and Trustworthy Large-Scale GPU Simulation with Fine-Grained Error Modeling and Hierarchical Clustering

Euijun Chung, Seonjin Na, Sung Ha Kang, and Hyesoon Kim

In IEEE/ACM International Symposium on Microarchitecture (MICRO), 2025

PDF Slides

CAL

Contention-Aware GPU Thread Block Scheduler for Efficient GPU-SSD

Xueyang Liu, Seonjin Na, Euijun Chung, Jiashen Cao, Jing Yang, and Hyesoon Kim

In IEEE Computer Architecture Letters (CAL), 2025

PDF

ISCA

Unified Memory Protection with Multi-granular MAC and Integrity Tree for Heterogeneous Processors

Sunho Lee, Seonjin Na, Jeongwon Choi, Jinwon Pyo, and Jaehyuk Huh

In IEEE International Symposium on Computer Architecture (ISCA), 2025

PDF Slides

MLSys

FlexInfer: Flexible LLM Inference with CPU Computations

Seonjin Na, Geonhwa Jeong, Byunghoon Ahn, Aaron Jezghani, Jeffrey Young, Christopher J. Hughes, Tushar Krishna, and Hyesoon Kim

In Conference on Machine Learning and Systems (MLSys), 2025

PDF Slides

HPCA

Let-Me-In: (Still) Employing In-pointer Bounds metadata for Fine-grained GPU Memory Safety

Jaewon Lee, Euijun Chung, Saurabh Singh, Seonjin Na, Yonghae Kim, Jaekyu Lee, and Hyesoon Kim

In IEEE International Symposium on High-Performance Computer Architecture (HPCA), 2025

PDF Slides

IISWC

Understanding Performance Implications of LLM Inference on CPUs

Seonjin Na, Geonhwa Jeong, Byunghoon Ahn, Jeffrey Young, Tushar Krishna, and Hyesoon Kim

In IEEE International Symposium on Workload Characterization (IISWC), 2024

PDF Slides

MLArchSys

Allegro: GPU Simulation Acceleration for Machine Learning Workloads

Euijun Chung, Seonjin Na, and Hyesoon Kim

In MLArchSys (co-located with ISCA), 2024

PDF Slides

ISCA

Barre Chord: Efficient Virtual Memory Translation for Multi-Chip-Module GPUs

Yuan Feng, Seonjin Na, Hyesoon Kim, and Hyeran Jeon

In IEEE International Symposium on Computer Architecture (ISCA), 2024

PDF Slides

HPCA

Supporting Secure Multi-GPU Computing with Dynamic and Batched Metadata Management

Seonjin Na, Jungwoo Kim, Sunho Lee, and Jaehyuk Huh

In IEEE International Symposium on High-Performance Computer Architecture (HPCA), 2024

PDF Slides

MICRO

Improving Data Reuse in NPU On-chip Memory with Interleaved Gradient Order for DNN Training

Jungwoo Kim, Seonjin Na, Sanghyeon Lee, Sunho Lee, and Jaehyuk Huh

In IEEE/ACM International Symposium on Microarchitecture (MICRO), 2023

PDF Slides

ICCD

Tunable Memory Protection for Secure Neural Processing Units

Sunho Lee, Seonjin Na, Jungwoo Kim, Jongse Park, and Jaehyuk Huh

In IEEE International Conference on Computer Design (ICCD), 2022

PDF Slides

HPCA

TNPU: Supporting Trusted Execution with Tree-less Integrity Protection for Neural Processing Unit

Sunho Lee, Jungwoo Kim, Seonjin Na, Jongse Park, and Jaehyuk Huh

In IEEE International Symposium on High-Performance Computer Architecture (HPCA), 2022

PDF Slides

HPCA

Common Counters: Compressed Encryption Counters for Secure GPU Memory

Seonjin Na, Sunho Lee, Yeonjae Kim, Jongse Park, and Jaehyuk Huh

In IEEE International Symposium on High-Performance Computer Architecture (HPCA), 2021

PDF Slides

Academic Service

Technical Program Committee

2026
- IEEE/ACM International Symposium on Microarchitecture (MICRO)
- IEEE International Symposium on Workload Characterization (IISWC)
- International Symposium on Computer Architecture (ISCA)
- Conference on Machine Learning and Systems (MLSys)
- International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS)
2025
- General Purpose Processing on Graphics Processing Units (GPGPU)
- IEEE International Parallel & Distributed Processing Symposium (IPDPS)
2024
- International Conference for High Performance Computing, Networking, Storage, and Analysis (SC)

Journal Reviewer

ACM Transactions on Computer Systems (TOCS) 2024, 2025
ACM Transactions on Architecture and Code Optimization (TACO) 2024, 2025
IEEE Micro 2025, 2026
IEEE Transactions on Dependable and Secure Computing (TDSC) 2023
IEEE Computer Architecture Letters (CAL) 2023, 2025, 2026

Organizing Committee

Workshop/Tutorial Chair: IEEE International Symposium on Workload Characterization (IISWC) 2025
Travel Grant Chair: International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS) 2025
Web Chair: IEEE Computer Society TCuARCH