
About OpenAI
Empowering humanity through safe AI innovation
Key Highlights
- Headquartered in San Francisco, CA, with 1,001+ employees
- $68.9 billion raised in funding from top investors
- Launched ChatGPT, gaining 1 million users in 5 days
- 20-week paid parental leave and unlimited PTO policy
OpenAI is a leading AI research and development company headquartered in the Mission District of San Francisco, CA. With more than 1,000 employees, OpenAI has raised $68.9 billion in funding and is known for groundbreaking products like ChatGPT, which gained over 1 million users within just five days.
🎁 Benefits
OpenAI offers flexible work hours and encourages unlimited paid time off, promoting at least 4 weeks of vacation per year. Employees enjoy comprehensi...
🌟 Culture
OpenAI's culture is centered around its mission to ensure that AGI benefits all of humanity. The company values transparency and ethical consideration...
Skills & Technologies
Overview
OpenAI is hiring a Software Engineer for their Fleet Infrastructure team to design and operate systems for model deployment and training on a large GPU fleet. You'll work with technologies like Kubernetes and Docker, and this position requires experience in infrastructure systems.
Job Description
Who you are
You have a strong background in software engineering, particularly in designing and operating infrastructure systems. Your experience includes working with Kubernetes and Docker, and you understand the complexities of managing large-scale GPU fleets. You are comfortable collaborating with researchers and product teams to gather workload requirements and translate them into effective solutions.
You possess excellent problem-solving skills and can think critically about system performance and reliability. Your ability to communicate effectively with cross-functional teams ensures that you can advocate for the needs of both technical and non-technical stakeholders. You thrive in fast-paced environments and are eager to contribute to OpenAI's mission of advancing AI capabilities responsibly.
Desirable
Experience with CI/CD systems is a plus, as is familiarity with job scheduling and cluster management. If you have worked in a research or AI-focused environment, that would be beneficial. You are adaptable and open to learning new technologies as needed to support the team's goals.
What you'll do
In this role, you will design, implement, and operate components of OpenAI's compute fleet, focusing on job scheduling, cluster management, and snapshot delivery. You will collaborate closely with hardware, infrastructure, and business teams to ensure high utilization and performance of the GPU fleet. Your work will involve building user-friendly scheduling and quota systems, as well as automating Kubernetes cluster provisioning and upgrades.
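The posting mentions building user-friendly scheduling and quota systems for the GPU fleet. As a purely illustrative sketch (the class, team names, and quota numbers below are invented for this example and are not OpenAI's actual system), a minimal quota-aware job scheduler might admit queued jobs in priority order while keeping each team within its GPU allocation:

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Job:
    priority: int                     # lower value = scheduled first
    name: str = field(compare=False)
    gpus: int = field(compare=False)
    team: str = field(compare=False)

class QuotaScheduler:
    """Toy scheduler: admits queued jobs in priority order as long as
    the owning team stays within its GPU quota."""

    def __init__(self, quotas):
        self.quotas = dict(quotas)            # team -> max GPUs
        self.in_use = {t: 0 for t in quotas}  # team -> GPUs currently allocated
        self.queue = []

    def submit(self, job):
        heapq.heappush(self.queue, job)

    def schedule(self):
        """Admit as many jobs as quotas allow; return admitted job names."""
        admitted, deferred = [], []
        while self.queue:
            job = heapq.heappop(self.queue)
            if self.in_use[job.team] + job.gpus <= self.quotas[job.team]:
                self.in_use[job.team] += job.gpus
                admitted.append(job.name)
            else:
                deferred.append(job)          # over quota: retry on a later pass
        for job in deferred:
            heapq.heappush(self.queue, job)
        return admitted

sched = QuotaScheduler({"research": 8, "product": 4})
sched.submit(Job(priority=0, name="train-a", gpus=6, team="research"))
sched.submit(Job(priority=1, name="train-b", gpus=4, team="research"))  # would exceed quota
sched.submit(Job(priority=2, name="serve-c", gpus=4, team="product"))
admitted = sched.schedule()
print(admitted)  # train-b stays queued until research capacity frees up
```

A production system would also handle preemption, fair-share weighting, and placement constraints; the sketch only shows the quota-admission core.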
You will interface with researchers to understand their workload requirements and support their research workflows with effective service frameworks and deployment systems. Your contributions will help ensure fast model startup times through high-performance snapshot delivery and hardware caching. You will also be responsible for maintaining a reliable and low-maintenance platform that meets the demands of OpenAI's ambitious projects.
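The role calls out fast model startup via high-performance snapshot delivery and hardware caching. As a hypothetical illustration of the caching idea only (the `SnapshotCache` class and snapshot IDs are invented, and real snapshot delivery involves disk and network tiers rather than an in-memory dict), an LRU cache keyed by snapshot ID avoids refetching recently used model weights:

```python
from collections import OrderedDict

class SnapshotCache:
    """Toy LRU cache for model snapshots: keeps recently used snapshots
    up to a fixed capacity, evicting the least recently used entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._entries = OrderedDict()   # snapshot_id -> payload
        self.hits = self.misses = 0

    def fetch(self, snapshot_id, load_fn):
        """Return a cached snapshot, loading (and caching) it on a miss."""
        if snapshot_id in self._entries:
            self._entries.move_to_end(snapshot_id)   # mark as recently used
            self.hits += 1
            return self._entries[snapshot_id]
        self.misses += 1
        payload = load_fn(snapshot_id)               # slow path, e.g. a remote fetch
        self._entries[snapshot_id] = payload
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)        # evict least recently used
        return payload

cache = SnapshotCache(capacity=2)
load = lambda sid: f"weights-for-{sid}"
cache.fetch("v1", load)   # miss
cache.fetch("v2", load)   # miss
cache.fetch("v1", load)   # hit
cache.fetch("v3", load)   # miss; evicts v2
print(cache.hits, cache.misses)
```

The design choice worth noting is keying the cache by immutable snapshot ID, which makes cached entries safe to share across jobs without invalidation logic.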
What we offer
OpenAI offers a hybrid work model, allowing you to work three days in the office per week while providing relocation assistance for new employees. You will be part of a dynamic team that is shaping the future of AI technology, with opportunities for professional growth and development. The work you do will have a significant impact on advancing AI capabilities responsibly, and you will be supported by a collaborative and innovative culture.
Similar Jobs You Might Like

Software Engineering
OpenAI is hiring a Software Engineer for their Data Infrastructure team to design and implement dataset infrastructure for next-generation training stacks. You'll work with technologies like Python, Docker, and AWS in San Francisco.

Software Engineering
Baseten is hiring a Software Engineer - Infrastructure to build and maintain components of their ML inference platform. You'll work with Python, Go, and Kubernetes to enable developers to deploy and monitor ML models. This position requires experience in infrastructure development.

Software Engineering
Baseten is hiring a Senior Software Engineer - Infrastructure to architect and lead the development of their ML inference platform. You'll work with technologies like Kubernetes and AWS to optimize model serving. This position requires significant experience in infrastructure and machine learning.

Backend Engineer
Doppel is hiring a Backend Engineer to build the infrastructure for their AI-native social engineering defense platform. You'll work with technologies like Elasticsearch and Kubernetes to design scalable systems. This position requires experience in backend engineering and infrastructure management.

Infrastructure Engineer
Middesk is hiring an Infrastructure Engineer to join their DevSecOps team. You'll build tooling and platform capabilities to enhance software delivery and developer experience. This position requires experience with infrastructure-as-code tools and high availability systems.