
Senior Engineer - Foundation Models

  • Kingston, Ontario
  • 54y5w

Job description

Huawei Canada has an immediate permanent opening for a Senior Engineer.


About the team:

The Centre for Software Excellence Lab conducts pioneering research in software engineering, focusing on next-generation technologies. This team integrates industry best practices with cutting-edge academic research to address lifecycle software engineering challenges, including foundation model applications, software performance engineering, hyper-cluster programming, next-gen mobile OS, and cloud-native computing. This lab uniquely allows researchers to apply innovations directly to products affecting billions of customers while promoting open-source contributions, publications, conference participation, and collaborations to create a broader impact.


About the job:

  • Research, prototype and build core infrastructure, tooling, and platforms to improve the productivity, quality, and efficiency of engineering and serving foundation model applications.
  • Design, implement, and assess APIs, frameworks, and runtime system software for heterogeneous architectures (e.g., GPU, NPU), drawing on familiarity with machine learning systems or AI infrastructure.
  • Support the integration of novel software frameworks on in-house hardware platforms (e.g., performance modeling, analysis of future computing architectures, resource allocation and management, scheduling, fault tolerance and resiliency, communication and shared memory).
  • Meet top industry and academic leaders and experts around the world, collaborate with top researchers and students, consult with Engineering teams across diverse domains, publish research papers in far-reaching and impactful areas, and submit patent applications for novel inventions.

Job requirements

About the ideal candidate:

  • Master's or PhD degree in Computer Science, Electrical & Computer Engineering, Machine Learning, or a related field.
  • Solid experience with one or more of the following programming languages: Python, C, C++, Go; familiarity with software development practices (version management, build management, CI/CD, debugging, and profiling).
  • Solid understanding of any of these areas: machine learning and/or deep learning, large-model training and fine-tuning (e.g., NLP/CV).
  • Experience with mainstream model training and inference frameworks and tools (e.g., PyTorch, TensorFlow, PaddlePaddle, OneFlow, MindSpore, Hugging Face Transformers and Accelerate, DeepSpeed, Megatron, FasterTransformer, Triton Inference Server).
  • Solid understanding of computer architecture, distributed computing, parallel computing, cloud-native computing, operating systems, and networks; experience using frameworks and tools from any of these areas (e.g., Spark, Flink, and Ray for distributed computing; Docker and Kubernetes for cloud-native application and framework development).
  • Ability to evaluate, apply, and mature published research to real-world problems on prototype systems.
  • An inquisitive mindset and proven research and communication skills; the ability to conduct investigations and experiments independently, interpret experimental data, and present results clearly and concisely.
  • Publications in related top-tier venues (e.g., ICSE, FSE, TSE, ICLR, ICML, NeurIPS, OSDI, SOSP) are an asset.
