NVIDIA

Software Engineer Intern, BCP Distributed Systems - Summer 2023

Job Description

Posted on: January 18, 2023

NVIDIA is looking for outstanding interns to work on breakthrough technologies that will enable next-generation AI, rendering, and simulation systems. Our team builds a scalable, reliable, high-performance platform that integrates NVIDIA software and hardware components to support accelerated workloads. You will have the opportunity to work on a hardware-accelerated compute platform in areas including GPU systems, distributed systems, high-performance networking and memory, concurrency, caching, scalable runtime systems, container runtimes, and fault-tolerant systems.

We welcome out-of-the-box problem solvers who can provide new insights, challenge the status quo, and are willing to push boundaries. You will work with others on this team to help advance NVIDIA's state-of-the-art technology and deliver groundbreaking systems and solutions for modern AI applications. NVIDIA has pioneered accelerated computing to tackle challenges that otherwise can't be solved. NVIDIA is a world leader in AI, and our work is redefining industries valued at more than $100 trillion, from gaming to healthcare to transportation, and is profoundly impacting society. We have some of the most forward-thinking and hardworking people on the planet working for us. If you're creative and autonomous, we want to hear from you!

Responsibilities

  • Join a core group of engineers with strong critical-thinking skills who are passionate about tackling some of the hardest problems in distributed systems and GPU- and network-accelerated high-performance computing in real-world production infrastructure.
  • Drawing on your technical foundation in distributed computing, networking, and storage, you will build new initiatives and think outside the box to design and implement new solutions.
  • Design and implement distributed, hardware-accelerated caching solutions.
  • Integrate new deep learning (DL) frameworks so that they are supported on the platform.
  • Improve distributed AI training orchestration, observability and performance.
  • Extend and improve cluster management and job scheduling systems such as Kubernetes (K8s) and Slurm.
  • Work with engineering teams across NVIDIA to ensure your software integrates seamlessly up and down the stack, collaborating with cross-functional teams, principals, and architects and coordinating across team boundaries and geographies.

Job Requirements

  • Pursuing a PhD, MS, or BS in Computer Science, Engineering, Physics, Mathematics, or a comparable field.
  • Understanding of data structures, concurrency, fault-tolerance, scalable runtime systems, operating systems and distributed systems design.
  • Strong programming skills and knowledge of a systems programming language (C/C++/Go/Rust).
  • Motivated to work across teams, layers of the stack, organizations, and geographies.
  • Some experience working on large-scale systems in engineering or research labs.
  • Reasonable understanding of performance, security, and reliability in complex distributed systems. Familiarity with system-level architecture, such as interconnects, memory hierarchy, interrupts, and memory-mapped I/O.