Job description
About the team:
The Centre for Software Excellence Lab conducts pioneering research in software engineering, focusing on next-generation technologies. The team combines industry best practices with cutting-edge academic research to address software engineering challenges across the entire lifecycle, including foundation model applications, software performance engineering, hyper-cluster programming, next-generation mobile operating systems, and cloud-native computing. The lab lets researchers apply their innovations directly to products used by billions of customers, while also encouraging open-source contributions, publications, conference participation, and collaborations for broader impact.
About the job:
- Research, prototype, and build core infrastructure, tooling, and platforms that improve the productivity, quality, and efficiency of developing and serving foundation model applications.
- Design, implement, and evaluate APIs, frameworks, and runtime systems for heterogeneous architectures (e.g., GPUs, NPUs).
- Support the integration of novel software frameworks with in-house hardware platforms (e.g., performance modeling, analysis of future computing architectures, resource allocation and management, scheduling, fault tolerance and resiliency, communication, and shared memory).
- Meet leading industry and academic experts around the world, collaborate with top researchers and students, consult with engineering teams across diverse domains, publish research papers in high-impact areas, and file patent applications for novel inventions.
Job requirements
About the ideal candidate:
- Bachelor's, Master's, or PhD degree in Computer Science, Electrical & Computer Engineering, Machine Learning, or a related field.
- Solid experience with one or more of the following programming languages: Python, C, C++, or Go; familiarity with software development practices (version control, build management, CI/CD, debugging, and profiling).
- Solid understanding of one or more of the following areas: machine learning and/or deep learning, or large model training and fine-tuning (e.g., NLP, CV).
- Experience with mainstream model training and inference frameworks and tools (e.g., PyTorch, TensorFlow, PaddlePaddle, OneFlow, MindSpore, Hugging Face Transformers and Accelerate, DeepSpeed, Megatron, FasterTransformer, Triton Inference Server).
- Solid understanding of computer architecture, distributed computing, parallel computing, cloud-native computing, operating systems, or networking.
- Experience using frameworks and tools from any of the areas above (e.g., Spark, Flink, or Ray for distributed computing; Docker or Kubernetes (K8s) for cloud-native application and framework development).
- Ability to evaluate, apply, and mature published research to solve real-world problems on prototype systems; an inquisitive mindset with proven research and communication skills; and the ability to conduct investigations and experiments independently, interpret experimental data, and present results clearly and concisely.
- Publications in related top-tier venues (e.g., ICSE, FSE, TSE, ICLR, ICML, NeurIPS, OSDI, SOSP) are an asset.