
About OpenAI
Empowering humanity through safe AI innovation
Key Highlights
- Headquartered in San Francisco, CA with 1,001+ employees
- $68.9 billion raised in funding from top investors
- Launched ChatGPT, gaining 1 million users in 5 days
- 20-week paid parental leave and unlimited PTO policy
OpenAI is a leading AI research and development platform headquartered in the Mission District of San Francisco, CA. With more than 1,000 employees, OpenAI has raised $68.9 billion in funding and is known for its groundbreaking products like ChatGPT, which gained over 1 million users within just five day...
🎁 Benefits
OpenAI offers flexible work hours and encourages unlimited paid time off, promoting at least 4 weeks of vacation per year. Employees enjoy comprehensi...
🌟 Culture
OpenAI's culture is centered around its mission to ensure that AGI benefits all of humanity. The company values transparency and ethical consideration...
Overview
OpenAI is hiring a Software Engineer for their GPU Infrastructure team to ensure the reliability and uptime of their compute fleet. You'll work with cutting-edge technologies in a high-performance computing environment. This position requires experience in system-level investigations and automation.
Job Description
Who you are
You have a strong background in software engineering with a focus on high-performance computing systems. Your experience includes ensuring the reliability and uptime of large-scale compute environments. You are adept at troubleshooting complex systems and have a keen understanding of hardware and software interactions.
Your expertise in Python and Linux allows you to develop automated solutions for system monitoring and maintenance, and you thrive in environments where you can apply your technical skills to solve real-world problems. You are familiar with containerization technologies like Docker and orchestration tools such as Kubernetes, which you use to manage and deploy applications efficiently.
You possess excellent problem-solving skills and a detail-oriented mindset. You enjoy diving deep into system-level investigations to identify and resolve issues proactively. Your ability to work autonomously and take ownership of projects makes you a valuable team member.
Desirable
Experience with GPU infrastructure and high-performance computing is a plus, as is an understanding of the unique challenges associated with managing supercomputers and large-scale systems. Familiarity with cloud services and networking concepts will further enhance your contributions to the team.
What you'll do
As a Software Engineer on the Fleet HPC team, you will be responsible for maintaining the health and efficiency of OpenAI's compute fleet. Your role will involve developing automated solutions for monitoring system performance and minimizing hardware failures. You will collaborate with cross-functional teams to ensure that the infrastructure supports both internal research and external products effectively.
You will engage in system-level investigations to troubleshoot and resolve issues that arise in the compute environment, and your work will directly impact the reliability of OpenAI's models and services. You will also have the opportunity to innovate and implement new technologies that enhance the performance and efficiency of the infrastructure.
Your contributions will help shape the future of AI deployment, ensuring that OpenAI's systems operate seamlessly at scale. You will be at the forefront of technology, working with cutting-edge tools and methodologies. You will also mentor junior engineers, sharing your knowledge and expertise to foster a collaborative team environment.
What we offer
OpenAI provides a dynamic work environment where you can grow your skills and make a significant impact on the future of technology. We prioritize safety, reliability, and responsible AI deployment in all our projects. You will have the opportunity to work with a talented team of engineers and researchers who are passionate about pushing the boundaries of AI.
We encourage you to apply even if your experience doesn't match every requirement. We value diverse perspectives and are committed to creating an inclusive workplace. Join us in shaping the future of technology and making a difference in the world.