Software Engineer, Model Inference
Location
San Francisco, CA
Level
Senior
Department
Engineering
Type
Full-Time
Salary
Job Description
Posted on:
January 11, 2023
We are looking for an engineer who wants to take the world's largest and most capable AI models and optimize them for use in a high-volume, low-latency, and high-availability production environment.
Responsibilities
- Work alongside machine learning researchers, engineers, and product managers to bring our latest technologies into production.
- Introduce new techniques, tools, and architecture that improve the performance, latency, throughput, and efficiency of our deployed models.
- Build tools to give us visibility into our bottlenecks and sources of instability and then design and implement solutions to address the highest priority issues.
- Optimize our code and fleet of Azure VMs to utilize every FLOP and every GB of GPU RAM our hardware offers.
Job Requirements
- Have an understanding of modern ML architectures and an intuition for how to optimize their performance, particularly for inference.
- Own problems end-to-end and are willing to pick up whatever knowledge you're missing to get the job done.
- Have at least 3 years of professional software engineering experience.
- Have or can quickly gain familiarity with PyTorch, NVIDIA GPUs and the software stacks that optimize them (e.g., CUDA, NCCL), as well as HPC technologies such as InfiniBand and MPI.
- Have experience architecting, observing, and debugging production distributed systems.
- Have a humble attitude, an eagerness to help your colleagues, and a desire to do whatever it takes to make the team succeed.
- Have needed to rebuild or substantially refactor production systems several times over due to rapidly increasing scale.
- Are self-directed and enjoy figuring out the most important problem to work on.
- Have a good intuition for when off-the-shelf solutions will work, and quickly build tools to accelerate your own workflow when they won't.