Senior Engineer - Ray

  • Kingston, Ontario
  • 48fg7

Job description

Huawei Canada has an immediate permanent opening for a Senior Engineer.

About the team:

The Centre for Software Excellence Lab conducts pioneering research in software engineering, focusing on next-generation technologies. This team integrates industry best practices with cutting-edge academic research to address lifecycle software engineering challenges, including foundation model applications, software performance engineering, hyper-cluster programming, next-gen mobile OS, and cloud-native computing. This lab uniquely allows researchers to apply innovations directly to products affecting billions of customers while promoting open-source contributions, publications, conference participation, and collaborations to create a broader impact.

About the job:

  • Design and implement scalable infrastructure for a range of AI/LLM workloads, including model pre-training, post-training, reinforcement learning, multimodal data processing, and model serving.

  • Contribute to open-source projects and stay up to date with the latest developments in AI infrastructure (e.g., Ray, vLLM, veRL).

  • Develop and maintain data pipelines using tools like Ray Data to handle large-scale datasets efficiently.

  • Optimize system performance and resource utilization across heterogeneous computing environments.

  • Collaborate with cross-functional teams to integrate infrastructure solutions into existing ML pipelines.

  • Meet top industry and academic leaders and experts around the world, collaborate with top researchers and students, consult with Engineering teams across diverse domains, publish research papers in far-reaching and impactful areas, and submit patent applications for novel inventions.

Job requirements

About the ideal candidate:

  • Bachelor's, Master's, or Ph.D. degree in Computer Science, Electrical & Computer Engineering, Machine Learning, or a related field.

  • Experience with large language models (LLMs) and related infrastructure.

  • Solid experience with Python and/or C++, and familiarity with software development practices (version management, build management, CI/CD, debugging, and profiling).

  • Solid understanding of at least one of the following areas: machine learning and/or deep learning, large-model training and fine-tuning (e.g., NLP/CV).

  • Experience with mainstream model training and inference frameworks and tools (e.g., PyTorch, Hugging Face Transformers and Accelerate, DeepSpeed, Megatron, veRL).

  • Experience with distributed computing and cloud-native frameworks and tools (e.g., Spark, Flink, and Ray for distributed computing; Docker and Kubernetes for cloud-native application and framework development).

  • Ability to evaluate published research, apply it to real-world problems, and mature it on prototype systems, along with an inquisitive mindset and proven research and communication skills.
