Seonjin Na

Senior High Performance AI Engineer at NVIDIA.


I’m a Senior High Performance AI Engineer at NVIDIA, working on GPU architecture and full-stack software optimization for AI workloads. I focus on accelerating distributed AI training and inference for LLMs and multimodal models through GPU-centric runtimes and system-level solutions.

Prior to joining NVIDIA, I was a Postdoctoral Fellow in the HPArch Group at Georgia Institute of Technology, supervised by Prof. Hyesoon Kim. I received my Ph.D. from the School of Computing at KAIST in 2023, advised by Prof. Jaehyuk Huh.

My research interests include GPU architecture, trusted computing, heterogeneous systems, distributed computing, and systems for machine learning. During my Ph.D., I focused on building secure architectures that provide trusted execution environments (TEEs) for accelerators such as GPUs and NPUs with minimal performance overhead. Currently, I am expanding my research toward challenges in multi-GPU architecture, hardware security, and large language model (LLM) acceleration.

Research Interests (Keywords): GPU/NPU Architecture, Systems for Machine Learning, Secure Architecture for GPUs/NPUs.

Work Experience

  1. NVIDIA
    11/2025 - Present
    Senior High Performance AI Engineer
    HW-SW Co-Design for Efficient and Scalable AI Training/Inference

  2. Georgia Institute of Technology (Georgia Tech)
    06/2023 - 10/2025
    Postdoctoral Fellow
    Worked with Hyesoon Kim

  3. Microsoft Research
    03/2019 - 06/2019
    Research Intern
    Mentors: Lintao Zhang and Yunxin Liu

  4. KAIST
    03/2018 - 02/2023
    Graduate Research Assistant
    Advisor: Jaehyuk Huh

News

Jun 04, 2026 Our technical report Nemotron 3 Ultra: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning is now available.
May 02, 2026 I will serve on the Program Committee for IISWC 2026.
Apr 29, 2026 Our paper Accelerating RL Post-Training Rollouts via System-Integrated Speculative Decoding is now available on arXiv.
Mar 11, 2026 Our technical report Nemotron 3 Super: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning is now available.
Feb 11, 2026 I will serve on the Program Committee for MICRO 2026.
Jan 03, 2026 Our paper CuFuzz: Hardening CUDA Programs through Transformation and Fuzzing is now available on arXiv.
Nov 18, 2025 I will serve on the External Program Committee for MLSys 2026.
Nov 01, 2025 I will serve on the External Program Committee for ISCA 2026.
Jul 25, 2025 I will be joining NVIDIA as a Senior High Performance AI Engineer.
Jul 14, 2025 Our paper Swift and Trustworthy Large-Scale GPU Simulation with Fine-Grained Error Modeling and Hierarchical Clustering has been accepted to MICRO 2025.
Jul 07, 2025 Our paper Contention-Aware GPU Thread Block Scheduler for Efficient GPU-SSD has been accepted for publication in IEEE Computer Architecture Letters (CAL).
Apr 08, 2025 I received the Outstanding Postdoctoral Research Award from the College of Computing at Georgia Tech.
Mar 22, 2025 Our paper Unified Memory Protection with Multi-granular MAC and Integrity Tree for Heterogeneous Processors has been accepted to ISCA 2025.
Feb 11, 2025 Our paper FlexInfer: Flexible LLM Inference with CPU Computations has been accepted to MLSys 2025.
Feb 04, 2025 I will serve on the Program Committee for ASPLOS 2026.
Jan 14, 2025 I will serve as the Workshops/Tutorials Chair for IISWC 2025.
Dec 03, 2024 I will serve on the Program Committee for GPGPU 2025.
Nov 02, 2024 Our paper Let-Me-In: (Still) Employing In-pointer Bounds metadata for Fine-grained GPU Memory Safety has been accepted to HPCA 2025.
Oct 01, 2024 I have been selected as a presenter for the MICRO 2024 PhD Forum.
Sep 06, 2024 I will serve on the Program Committee for IPDPS 2025.
Aug 16, 2024 I will serve on the Artifact Evaluation Committee for MICRO 2024.
Jul 24, 2024 I will serve on the Artifact Evaluation Committee for EuroSys 2025.
Jul 17, 2024 Our paper Understanding Performance Implications of LLM Inference on CPUs has been accepted to IISWC 2024.
Jul 13, 2024 I will serve as a Travel Grants Co-Chair for ASPLOS 2025.
Jul 02, 2024 I will serve on the Artifact Evaluation Committee for ASPLOS 2025.
May 30, 2024 Our paper Allegro: GPU Simulation Acceleration for Machine Learning Workloads has been accepted to MLArchSys.
Apr 30, 2024 I will serve on the Artifact Evaluation Committee for OSDI 2024 / ATC 2024.
Apr 01, 2024 I will serve on the Program Committee for SC 2024.
Mar 20, 2024 Our paper Barre Chord: Efficient Virtual Memory Translation for Multi-Chip-Module GPUs has been accepted to ISCA 2024.
Feb 23, 2024 I will serve on the Artifact Evaluation Committee for ISCA 2024.
Feb 22, 2024 I will attend the GPGPU 2024 Workshop as a moderator.
Oct 24, 2023 Our paper Supporting Secure Multi-GPU Computing with Dynamic and Batched Metadata Management has been accepted to HPCA 2024.
Jul 24, 2023 Our paper Improving Data Reuse in NPU On-chip Memory with Interleaved Gradient Order for DNN Training has been accepted to MICRO 2023.
Dec 13, 2022 I will be joining the HPArch group as a postdoctoral researcher.
Dec 09, 2022 I successfully defended my Ph.D. thesis 🎓.
Aug 23, 2022 Our paper Tunable Memory Protection for Secure Neural Processing Units has been accepted to ICCD 2022.
Oct 28, 2021 Our paper TNPU: Supporting Trusted Execution with Tree-less Integrity Protection for Neural Processing Unit has been accepted to HPCA 2022.
Oct 28, 2020 Our paper Common Counters: Compressed Encryption Counters for Secure GPU Memory has been accepted to HPCA 2021.

Publications

  1. TechReport
    Nemotron 3 Ultra: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning
    2026
  2. arXiv
    Accelerating RL Post-Training Rollouts via System-Integrated Speculative Decoding
    Hayate Iso, Tiyasa Mitra, Sudipta Mondal, Rasoul Shafipour, Venmugil Elango, Terry Kong, Yuki Huang, Seonjin Na, Izzy Putterman, Benjamin Chislett, and 8 more authors
    2026
  3. TechReport
    Nemotron 3 Super: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning
    2026
  4. arXiv
    CuFuzz: Hardening CUDA Programs through Transformation and Fuzzing
    Saurabh Singh, Ruobing Han, Jaewon Lee, Seonjin Na, Yonghae Kim, Taesoo Kim, and Hyesoon Kim
    2026
  5. MICRO
    Swift and Trustworthy Large-Scale GPU Simulation with Fine-Grained Error Modeling and Hierarchical Clustering
    Euijun Chung, Seonjin Na, Sung Ha Kang, and Hyesoon Kim
    In IEEE/ACM International Symposium on Microarchitecture (MICRO), 2025
  6. CAL
    Contention-Aware GPU Thread Block Scheduler for Efficient GPU-SSD
    Xueyang Liu, Seonjin Na, Euijun Chung, Jiashen Cao, Jing Yang, and Hyesoon Kim
    In IEEE Computer Architecture Letters (CAL), 2025
  7. ISCA
    Unified Memory Protection with Multi-granular MAC and Integrity Tree for Heterogeneous Processors
    Sunho Lee, Seonjin Na, Jeongwon Choi, Jinwon Pyo, and Jaehyuk Huh
    In IEEE International Symposium on Computer Architecture (ISCA), 2025
  8. MLSys
    FlexInfer: Flexible LLM Inference with CPU Computations
    Seonjin Na, Geonhwa Jeong, Byunghoon Ahn, Aaron Jezghani, Jeffrey Young, Christopher J. Hughes, Tushar Krishna, and Hyesoon Kim
    In Conference on Machine Learning and Systems (MLSys), 2025
  9. HPCA
    Let-Me-In: (Still) Employing In-pointer Bounds metadata for Fine-grained GPU Memory Safety
    Jaewon Lee, Euijun Chung, Saurabh Singh, Seonjin Na, Yonghae Kim, Jaekyu Lee, and Hyesoon Kim
    In IEEE International Symposium on High-Performance Computer Architecture (HPCA), 2025
  10. IISWC
    Understanding Performance Implications of LLM Inference on CPUs
    Seonjin Na, Geonhwa Jeong, Byunghoon Ahn, Jeffrey Young, Tushar Krishna, and Hyesoon Kim
    In IEEE International Symposium on Workload Characterization (IISWC), 2024
  11. MLArchSys
    Allegro: GPU Simulation Acceleration for Machine Learning Workloads
    Euijun Chung, Seonjin Na, and Hyesoon Kim
    In MLArchSys, co-located with ISCA, 2024
  12. ISCA
    Barre Chord: Efficient Virtual Memory Translation for Multi-Chip-Module GPUs
    Yuan Feng, Seonjin Na, Hyesoon Kim, and Hyeran Jeon
    In IEEE International Symposium on Computer Architecture (ISCA), 2024
  13. HPCA
    Supporting Secure Multi-GPU Computing with Dynamic and Batched Metadata Management
    Seonjin Na, Jungwoo Kim, Sunho Lee, and Jaehyuk Huh
    In IEEE International Symposium on High-Performance Computer Architecture (HPCA), 2024
  14. MICRO
    Improving Data Reuse in NPU On-chip Memory with Interleaved Gradient Order for DNN Training
    Jungwoo Kim, Seonjin Na, Sanghyeon Lee, Sunho Lee, and Jaehyuk Huh
    In IEEE/ACM International Symposium on Microarchitecture (MICRO), 2023
  15. ICCD
    Tunable Memory Protection for Secure Neural Processing Units
    Sunho Lee, Seonjin Na, Jungwoo Kim, Jongse Park, and Jaehyuk Huh
    In IEEE International Conference on Computer Design (ICCD), 2022
  16. HPCA
    TNPU: Supporting Trusted Execution with Tree-less Integrity Protection for Neural Processing Unit
    Sunho Lee, Jungwoo Kim, Seonjin Na, Jongse Park, and Jaehyuk Huh
    In IEEE International Symposium on High-Performance Computer Architecture (HPCA), 2022
  17. HPCA
    Common Counters: Compressed Encryption Counters for Secure GPU Memory
    Seonjin Na, Sunho Lee, Yeonjae Kim, Jongse Park, and Jaehyuk Huh
    In IEEE International Symposium on High-Performance Computer Architecture (HPCA), 2021

Academic Services

Technical Program Committee

  • 2026
    • IEEE/ACM International Symposium on Microarchitecture (MICRO)
    • IEEE International Symposium on Workload Characterization (IISWC)
    • International Symposium on Computer Architecture (ISCA)
    • Conference on Machine Learning and Systems (MLSys)
    • International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS)
  • 2025
    • General Purpose Processing on Graphics Processing Units (GPGPU)
    • IEEE International Parallel & Distributed Processing Symposium (IPDPS)
  • 2024
    • International Conference for High Performance Computing, Networking, Storage, and Analysis (SC)

Journal Reviewer

  • ACM Transactions on Computer Systems (TOCS) 2024, 2025
  • ACM Transactions on Architecture and Code Optimization (TACO) 2024, 2025
  • IEEE Micro 2025, 2026
  • IEEE Transactions on Dependable and Secure Computing (TDSC) 2023
  • IEEE Computer Architecture Letters (CAL) 2023, 2025, 2026

Organizing Committee

  • Workshop/Tutorial Chair: IEEE International Symposium on Workload Characterization (IISWC) 2025
  • Travel Grant Chair: International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS) 2025
  • Web Chair: IEEE Computer Society TCuARCH