
Research Engineer - Heterogeneous Computing Systems

  • Waterloo, Ontario
  • em26a

Job description

Our team has an immediate permanent opening for a Research Engineer.

Responsibilities:

  • Define and develop programming models that both hide and leverage heterogeneous system components, such as accelerators (GPUs, NPUs, FPGAs, and other modern accelerators), memories (HBM, FAM, etc.), and interconnects (CXL, Slingshot, InfiniBand, RDMA, etc.)

  • Design, adapt, and implement software frameworks and runtime system software to support applications at the confluence of HPC, AI, and data analytics

  • Combine advanced software engineering skills with a drive to explore novel approaches to solving important problems in heterogeneous computing at large scale

  • Co-design software and hardware to optimize AI/ML workload performance on heterogeneous AI accelerators and to increase accelerator utilization, including profiling, analysis, tuning, and optimization on GPUs

  • Design, implement, and assess application programming interfaces (APIs) and runtime system software for future architectures to support the creation of new products, open source software, and intellectual property. This includes performance evaluation, prediction, modeling, and simulation of future computing architectures, as well as resource allocation and management, scheduling, fault resilience, coordination, and other system services in heterogeneous computing environments

  • Write and submit patentable inventions

Job requirements

What you’ll bring to the team:

  • MS or PhD degree in Computer Science, Electrical Engineering, or a related field such as Machine Learning, or equivalent relevant experience

  • Solid understanding of computer systems and architecture, operating systems, parallel computing, and distributed computing

  • Experience with C/C++ and multi-threaded programming, and familiarity with at least one parallel programming language or API, such as CUDA, OpenCL, or Vulkan

  • Familiarity with workload characterization and with performance profiling tools and analysis methods for heterogeneous computing, at the AI model, framework, CUDA library, GPU kernel, and hardware levels

  • Experience with scheduling techniques for heterogeneous systems with different general-purpose processors and accelerators, e.g., kernel offloading, memory scheduling, etc.

  • Familiarity with AI/ML algorithms and at least one of the major frameworks (TensorFlow, PyTorch, PaddlePaddle, Caffe, MXNet, TVM, etc.)

  • Hands-on experience in deep learning (DL) workload programming and optimization on various hardware accelerators, such as GPUs, TPUs, Arm GPUs, FPGAs, and ASICs

  • A highly self-motivated learner and team player with excellent communication and interpersonal skills
