Job description
Our team has an immediate permanent opening for a Research Engineer.
Responsibilities:
Define and develop programming models that both hide and leverage heterogeneous system components, such as accelerators (GPUs, NPUs, FPGAs, and other modern accelerators), memories (HBM, FAM, etc.), and interconnects (CXL, Slingshot, InfiniBand, RDMA, etc.)
Design, adapt, and implement software frameworks and runtime systems software to support applications at the confluence of HPC, AI, and data analytics
Combine advanced software engineering skills with a drive to explore novel approaches to important problems in heterogeneous computing at large scale
Co-design software and hardware to optimize AI/ML workload performance on heterogeneous AI accelerators and to increase accelerator utilization, including profiling, analysis, tuning, and optimization on GPUs
Design, implement, and assess application programming interfaces (APIs) and runtime systems software for future architectures to support the creation of new products, open source software, and intellectual property. This includes performance evaluation, prediction, modeling, and simulation of future computing architectures, as well as resource allocation and management, scheduling, fault resilience, coordination, and other system services in heterogeneous computing environments
Write and submit patentable inventions
Job requirements
What you’ll bring to the team:
MS or PhD degree in Computer Science, Electrical Engineering, Machine Learning, or a related field, or equivalent relevant experience
Solid understanding of computer systems and architecture, operating systems, parallel computing, and distributed computing
Experience with C/C++ and multithreaded programming, and familiarity with at least one parallel programming language or API, such as CUDA, OpenCL, or Vulkan
Familiarity with workload characterization and with performance profiling tools and analysis methods for heterogeneous computing, spanning AI models, frameworks, CUDA libraries, GPU kernels, and hardware
Experience with scheduling techniques for heterogeneous systems with different general-purpose processors and accelerators, e.g., kernel offloading, memory scheduling, etc.
Familiarity with AI/ML algorithms and at least one of the major frameworks (TensorFlow, PyTorch, PaddlePaddle, Caffe, MXNet, TVM, etc.)
Hands-on experience in DL workload programming and optimization on various hardware accelerators, such as GPUs, TPUs, ARM GPUs, FPGAs, and ASICs
A highly self-motivated learner and team player with excellent communication and interpersonal skills