OpenAI

About OpenAI

Empowering humanity through safe AI innovation

🏢 Tech👥 1001+ employees📅 Founded 2015📍 Mission District, San Francisco, CA💰 $68.9b4.2
B2CB2BArtificial IntelligenceEnterpriseSaaSAPIDevOps

Key Highlights

  • Headquartered in San Francisco, CA with 1,001+ employees
  • $68.9 billion raised in funding from top investors
  • Launched ChatGPT, gaining 1 million users in 5 days
  • 20-week paid parental leave and unlimited PTO policy

OpenAI is a leading AI research and development platform headquartered in the Mission District of San Francisco, CA. With over 1,001 employees, OpenAI has raised $68.9 billion in funding and is known for its groundbreaking products like ChatGPT, which gained over 1 million users within just five day...

🎁 Benefits

OpenAI offers flexible work hours and encourages unlimited paid time off, promoting at least 4 weeks of vacation per year. Employees enjoy comprehensi...

🌟 Culture

OpenAI's culture is centered around its mission to ensure that AGI benefits all of humanity. The company values transparency and ethical consideration...

Skills & Technologies

Overview

OpenAI is hiring a Senior Software Engineer for the Frontier Systems team to build critical infrastructure for supercomputers. You'll work with Python, SQL, and automation tools to ensure reliable model training. This position requires 7+ years of experience.

Job Description

Who you are

You have 7+ years of industry experience in software engineering, demonstrating a strong ability to own your work end-to-end and make a real impact on system reliability. Your proficiency with Python and shell scripting allows you to build automation that monitors and fixes issues across thousands of machines, ensuring that researchers can continue their work without interruption.

You thrive on root-causing system-level issues and have a high degree of comfort digging into noisy data with SQL, PromQL, and Pandas. Your analytical mindset helps you understand how things break at scale, and you are passionate about building critical infrastructure that supports cutting-edge AI research.

What you'll do

As a Senior Software Engineer on the Frontier Systems team, you will own and improve the system health checks that keep our hyperscale supercomputers stable during model training. You will lead deep dives into hardware failures and system-level bugs, working collaboratively with your team to understand and resolve issues effectively. Your role will involve building automation that monitors and fixes problems across thousands of machines, ensuring minimal disruptions during large-scale training runs.

You will be trusted to make significant contributions to the stability and efficiency of our supercomputers, which are essential for training frontier models. Your work will directly impact the success of AI research at OpenAI, and you will have the opportunity to collaborate with talented engineers who share your commitment to excellence.

What we offer

At OpenAI, we believe in the potential of artificial intelligence to solve immense global challenges. Join us in shaping the future of technology while working in a supportive and innovative environment. We encourage you to apply even if your experience doesn't match every requirement, as we value diverse perspectives and backgrounds.

You will be part of a team that is dedicated to building and maintaining the largest supercomputers in the world, contributing to groundbreaking advancements in AI. We offer a competitive salary and benefits package, along with opportunities for professional growth and development.

Interested in this role?

Apply now or save it for later. Get alerts for similar jobs at OpenAI.

Similar Jobs You Might Like

Based on your interests and this role

OpenAI

Software Engineering

OpenAI📍 San Francisco - On-Site

OpenAI is hiring a Software Engineer for the Frontier Systems team to optimize power management in large-scale supercomputers. You'll work with Python and automation tools to ensure efficient operations. This position requires a strong focus on system-level investigations and experience in power management.

🏛️ On-SiteMid-Level
1 year ago
OpenAI

Software Engineering

OpenAI📍 San Francisco - On-Site

OpenAI is hiring a Software Engineer for the Frontier Clusters Infrastructure team to operate next-generation compute clusters. You'll work with Kubernetes and automation to support large-scale model training. This position requires experience in distributed systems and infrastructure.

🏛️ On-SiteMid-Level
1 year ago
OpenAI

Site Reliability Engineer

OpenAI📍 San Francisco - On-Site

OpenAI is hiring a Site Reliability Engineer to operate and scale the next generation of compute clusters for frontier research. You'll work with Kubernetes and automation to ensure the reliability of large-scale supercomputers. This role requires experience in distributed systems and infrastructure management.

🏛️ On-Site
3 months ago
OpenAI

Full Stack Engineer

OpenAI📍 San Francisco - Hybrid

OpenAI is hiring a Full-Stack Software Engineer to create innovative ways for people to interact with AI. You'll work with technologies like JavaScript, Python, and React in San Francisco. This position requires a creative mindset and experience in software development.

🏢 HybridMid-Level
4 months ago
Amazon

Software Development Engineer

Amazon📍 San Francisco - On-Site

Amazon is hiring a Software Development Engineer for their Frontier AI & Robotics team to optimize large-scale transformer architectures for robotics applications. You'll work with CUDA and TensorRT to enhance inference efficiency. This position requires expertise in machine learning and robotics.

🏛️ On-SiteMid-Level
11 months ago