
Job description
Huawei Canada has an immediate permanent opening for a Senior Engineer.
About the team:
Established in 2014, the Distributed Scheduling and Data Engine Lab is Huawei Cloud's technical innovation center in Canada. The lab researches and develops advanced cloud technologies and supports the productization and iterative optimization of its technical achievements. Current research areas include cloud-native databases, intelligent SQL engines, AI/agent infrastructure, and LLM/agent evaluation technology. The lab fosters a robust technical environment, allowing collaboration with industry experts to create a highly competitive cloud platform.
About the job:
Join a cutting-edge team building next-generation infrastructure for AI and agentic workloads, sitting at the intersection of research, systems engineering, and product innovation
Track and analyze the latest trends in LLMs, agentic AI, and multi-step agent workflows to inform infrastructure direction
Investigate and address infrastructure bottlenecks across GPU/NPU utilization, data movement, memory hierarchy, and distributed execution
Design system-level architectures for agent execution frameworks, multi-model orchestration, and large-scale inference systems
Evaluate AI/agent workload requirements on cloud and hybrid infrastructure, balancing trade-offs across cost, performance, and scalability
Dive deep into the full infrastructure stack, from distributed schedulers and inference pipelines to caching and data access patterns
Collaborate closely with engineering and product teams to prototype and deliver production-ready solutions grounded in research
Translate emerging AI trends and workload patterns into scalable, impactful infrastructure designs
The targeted annual total compensation for this position ranges from $127,000 to $225,000 depending on education, experience, and demonstrated expertise.
Job requirements
About the Ideal Candidate:
Solid foundation in distributed systems, cloud infrastructure, or systems engineering, with hands-on experience building or operating large-scale systems
Proficiency with Kubernetes, cluster scheduling, or equivalent orchestration platforms in production environments
Practical experience with AI/ML systems — whether in training pipelines, inference infrastructure, or both
Strong programming skills in low-level or systems-oriented languages such as Go or C++
Familiarity with LLM serving frameworks (e.g., vLLM, SGLang, Triton, Ray) and an understanding of how they interact with underlying hardware
Experience with GPU or accelerator optimization, and a grasp of how hardware constraints shape system design decisions
A research-oriented mindset — comfortable reading papers, running experiments, and prototyping ideas to validate architectural decisions
Sharp analytical and problem-solving skills, with the ability to model workloads, identify performance bottlenecks, and propose principled solutions
Additional Information:
Huawei Canada is committed to a fair, inclusive, and accessible recruitment process. If you require accommodation during any stage of the hiring process, please let us know and we will work with you to meet your needs.
All applications for this position are reviewed directly by our hiring team; we do not use artificial intelligence tools to screen or select candidates.
