Senior Researcher – Hardware Efficient AI Foundation Model Training

    • Markham, Ontario
  • 5a54h

Job description

Huawei Canada has an immediate permanent opening for a Principal Architect.

About the team: 

The Computing Data Application Acceleration Lab aims to create a leading global data analytics platform using innovative programming technologies, and is organized into three specialized teams. This team focuses on full-stack innovations, including software-hardware co-design and data efficiency optimization at both the storage and runtime layers. It also develops next-generation GPU architectures for gaming, cloud rendering, VR/AR, and Metaverse applications.

One of the goals of this lab is to enhance algorithm performance and training efficiency across industries, fostering long-term competitiveness.

About the job:

  • Collaborate with internal and external organizations to lead the design of foundation model architectures for the LLM, code, and multimodal subfields through breakthroughs in post-training and continual training. Develop a foundation model with state-of-the-art performance and hardware efficiency, and establish industry impact.

  • Propose the technical requirements for large-scale distributed training and inference infrastructure, covering techniques such as parallelization and operator fusion; analyze the computational characteristics of typical architectures; and ensure that the evolution of AI hardware and infrastructure is accurate and forward-looking.

Job requirements

About the ideal candidate:

  • Experience in training and optimizing cutting-edge AI models/applications, especially in training and deploying AI models at a scale of 10B+ parameters.

  • Proficiency in the latest AI architectures (such as long-sequence, reinforcement learning, multimodal, and agent-based models). Deep understanding of AI algorithm mechanisms.

  • Solid command of the underlying implementation of AI frameworks (such as PyTorch, vLLM, and SGLang), and mainstream distributed training and inference techniques.

  • Familiarity with AI chip architecture (such as GPU, NPU, and TPU). Understanding of memory hierarchy and interconnect technologies is an asset.

  • PhD preferred in AI architecture, computer architecture, or related fields.

  • A solid publication record in the field of AI systems or chip design is an asset.
