Associate Engineer - Large Model Application Platforms

  • Kingston, Ontario

Job description

Our team has an immediate 12-month contract opening for an Associate Engineer.


  • Research, prototype and build core infrastructure, tooling, and platforms to improve the productivity, quality, and efficiency of engineering and serving foundation model applications.
  • Design, implement and assess application programming interfaces (APIs), frameworks and runtime systems software for heterogeneous architectures (e.g., GPU, NPU).
  • Support the integration of novel software frameworks on in-house hardware platforms (e.g., performance modeling, analysis of future computing architectures, resource allocation and management, scheduling, fault tolerance and resiliency, communication and shared memory).
  • Meet top industry and academic leaders and experts around the world, collaborate with top researchers and students, consult with Engineering teams across diverse domains, publish research papers in far-reaching and impactful areas, and submit patent applications for novel inventions.

Job requirements

What you’ll bring to the team:

  • Bachelor's, Master's or PhD degree in Computer Science, Electrical & Computer Engineering, Machine Learning, or a related field.
  • Solid experience with one or more of the following programming languages: Python, C, C++, Go; familiarity with software development practices (version management, build management, CI/CD, debugging and profiling).
  • Solid understanding of any of these areas: Machine Learning and/or Deep Learning, large model training and fine-tuning (e.g., NLP/CV).
  • Experience with mainstream model training and inference frameworks and tools (e.g., PyTorch, TensorFlow, PaddlePaddle, OneFlow, MindSpore, Hugging Face Transformers and Accelerate, DeepSpeed, Megatron, FasterTransformer, Triton Inference Server).
  • Solid understanding of Computer Architecture, Distributed Computing, Parallel Computing, Cloud Native, Operating Systems, and Networks.
  • Experience with frameworks and tools in any of the aforementioned areas (e.g., Spark, Flink and Ray for distributed computing; Docker and Kubernetes for cloud-native application and framework development).
  • Ability to evaluate, apply, and mature published research to solve real-world problems on prototype systems; an inquisitive mindset; proven research and communication skills; and the ability to conduct investigations and experiments independently, interpret experimental data, and present results clearly and concisely.
  • Publications in related top-tier venues (e.g., ICSE, FSE, TSE, ICLR, ICML, NeurIPS, OSDI, SOSP) are an asset.