Senior Engineer-Cloud AI Infrastructure

  • Markham, Ontario
  • u9pvg

Job description

Huawei Canada has an immediate permanent opening for a Senior Engineer.

About the team:

Established in 2014, the Distributed Scheduling and Data Engine Lab is Huawei Cloud's technical innovation center in Canada. The lab researches and develops advanced cloud technologies and supports the productization and iterative optimization of its technical achievements. Current research areas include cloud-native databases, intelligent SQL engines, AI/agent infrastructure, and LLM/agent evaluation technology. The lab fosters a robust technical environment, allowing collaboration with industry experts to create a highly competitive cloud platform.

About the job:

  • Join a cutting-edge team building next-generation infrastructure for AI and agentic workloads, sitting at the intersection of research, systems engineering, and product innovation

  • Track and analyze the latest trends in LLMs, agentic AI, and multi-step agent workflows to inform infrastructure direction

  • Investigate and address infrastructure bottlenecks across GPU/NPU utilization, data movement, memory hierarchy, and distributed execution

  • Design system-level architectures for agent execution frameworks, multi-model orchestration, and large-scale inference systems

  • Evaluate AI/agent workload requirements on cloud and hybrid infrastructure, balancing trade-offs across cost, performance, and scalability

  • Dive deep into the full infrastructure stack, from distributed schedulers and inference pipelines to caching and data access patterns

  • Collaborate closely with engineering and product teams to prototype and deliver production-ready solutions grounded in research

  • Translate emerging AI trends and workload patterns into scalable, impactful infrastructure designs

The targeted annual total compensation for this position ranges from $127,000 to $225,000 depending on education, experience, and demonstrated expertise.

Job requirements

About the Ideal Candidate:

  • Solid foundation in distributed systems, cloud infrastructure, or systems engineering, with hands-on experience building or operating large-scale systems

  • Proficiency with Kubernetes, cluster scheduling, or equivalent orchestration platforms in production environments

  • Practical experience with AI/ML systems — whether in training pipelines, inference infrastructure, or both

  • Strong programming skills in low-level or systems-oriented languages such as Go or C++

  • Familiarity with LLM serving frameworks (e.g., vLLM, SGLang, Triton, Ray) and an understanding of how they interact with underlying hardware

  • Experience with GPU or accelerator optimization, and a grasp of how hardware constraints shape system design decisions

  • A research-oriented mindset — comfortable reading papers, running experiments, and prototyping ideas to validate architectural decisions

  • Sharp analytical and problem-solving skills, with the ability to model workloads, identify performance bottlenecks, and propose principled solutions

Additional Information:

Huawei Canada is committed to a fair, inclusive, and accessible recruitment process. If you require accommodation during any stage of the hiring process, please let us know and we will work with you to meet your needs.

All applications for this position are reviewed directly by our hiring team; we do not use artificial intelligence tools to screen or select candidates.