
Job description
Huawei Canada has an immediate permanent opening for a Senior Researcher.
About the team:
The Emerging Storage Lab is a research group based at Huawei Canada's Toronto and Vancouver Research Centres, focused on next-generation storage technologies and innovations. Our team comprises graduate-trained computer engineers and computer scientists with diverse industry experience, ranging from entry level to over 20 years. The lab investigates a range of storage-related topics, including data management, data catalogs, data fabrics, file systems, storage networks, and AI storage, with the aim of advancing the field and driving progress in data storage technology in the new AI era.
About the job:
Conduct applied research to design and prototype novel AI data engine and storage architectures optimized for the acceleration of large-scale AI training and inference workloads.
Architect and implement high-performance software components, from the data and metadata layers to the AI pipeline layer, that provide data lineage, versioning, and automated lifecycle management for AI assets.
Develop and optimize the data plane for AI infrastructure, focusing on critical challenges such as distributed dataset shuffling, fast checkpointing, model loading and offloading, KV Cache loading and offloading, and low-latency metadata serving for billion-scale entities.
Develop high-performance system software to deeply integrate AI storage, AI data, and AI orchestration frameworks, eliminating bottlenecks across the ML pipeline.
Optimize the end-to-end data path—from persistent storage through to GPU memory—leveraging knowledge of hardware, networking, and AI workload semantics.
Advance the field by publishing research at top-tier conferences.
Collaborate with global teams of AI researchers and engineers to productionize innovations and define the strategic roadmap for AI infrastructure.
Job requirements
About the ideal candidate:
Ph.D. or Master's degree in Computer Science, Electrical Engineering, or a related field.
Experience with AI KV Cache, RAG, agentic memory, storage systems, filesystems, caching, data management, scalable data platforms, metadata catalogs, ML pipelines, workflow orchestration, low-level kernel optimization, or GPU memory management.
Publications in relevant top-tier conferences.
Programming proficiency in C/C++ or Python, with experience in relevant environments (CUDA, PyTorch/TensorFlow, or distributed storage APIs).
Knowledge of modern AI/ML development lifecycles and their data challenges (e.g., LLM training, distributed data loading) is an asset.
Able to conduct collaborative research and communicate complex ideas across hardware, software, and AI disciplines.
