OpenAI

Software Engineer, Platform Reliability

Job Description

Posted on: February 22, 2023

At OpenAI, we are pushing the boundaries of the capabilities of artificial intelligence. Our success depends on the ability to quickly iterate on products while also ensuring that they are performant and reliable. We need problem-solving engineers with deep technical knowledge in software development processes, reliability, and performance.

Responsibilities

  • Design and build the development and production platforms that power all of OpenAI's products
  • Support your fellow engineers with best practices on deployment, distributed systems, security, and infrastructure
  • Identify potential problems in large complex distributed systems before they occur in production
  • Pick the solution that best fits a variety of technical and business constraints. That may mean writing a new system from scratch, modifying an existing system, or deploying something off the shelf.
  • Like all other teams, we are responsible for the reliability of the systems we build. This includes an on-call rotation to respond to critical incidents as needed.

Job Requirements

  • Likely have 3+ years of professional experience in software engineering or systems reliability
  • Have helped a team mature toward more efficient, more reliable systems with fewer incidents
  • Have rich experience in cloud networking and security
  • Enjoy deep problem solving that spans many different technologies and systems
  • Have great communication skills and empathy for other engineers
  • Always check the file descriptor limit
