OpenAI

About OpenAI

Empowering humanity through safe AI innovation

🏢 Tech · 👥 1,001+ employees · 📅 Founded 2015 · 📍 Mission District, San Francisco, CA · 💰 $68.9B · ★ 4.2
B2C · B2B · Artificial Intelligence · Enterprise · SaaS · API · DevOps

Key Highlights

  • Headquartered in San Francisco, CA with 1,001+ employees
  • $68.9 billion raised in funding from top investors
  • Launched ChatGPT, gaining 1 million users in 5 days
  • 20-week paid parental leave and unlimited PTO policy

OpenAI is a leading AI research and development company headquartered in the Mission District of San Francisco, CA. With more than 1,000 employees, OpenAI has raised $68.9 billion in funding and is known for groundbreaking products like ChatGPT, which gained over 1 million users within just five days.

🎁 Benefits

OpenAI offers flexible work hours and encourages unlimited paid time off, promoting at least 4 weeks of vacation per year. Employees also receive 20 weeks of paid parental leave.

🌟 Culture

OpenAI's culture is centered on its mission to ensure that AGI benefits all of humanity. The company values transparency and ethical considerations in its research and products.

Skills & Technologies

Overview

OpenAI is hiring a Software Engineer for its Fleet Infrastructure team to design and operate systems for model deployment and training on a large GPU fleet. You'll work with technologies like Kubernetes and Docker; the position requires experience building and running infrastructure systems.

Job Description

Who you are

You have a strong background in software engineering, particularly in designing and operating infrastructure systems. Your experience includes working with Kubernetes and Docker, and you understand the complexities of managing large-scale GPU fleets. You are comfortable collaborating with researchers and product teams to gather workload requirements and translate them into effective solutions.

You possess excellent problem-solving skills and can think critically about system performance and reliability. Your ability to communicate effectively with cross-functional teams ensures that you can advocate for the needs of both technical and non-technical stakeholders. You thrive in fast-paced environments and are eager to contribute to OpenAI's mission of advancing AI capabilities responsibly.

Desirable

Experience with CI/CD systems is a plus, as is familiarity with job scheduling and cluster management. If you have worked in a research or AI-focused environment, that would be beneficial. You are adaptable and open to learning new technologies as needed to support the team's goals.

What you'll do

In this role, you will design, implement, and operate components of OpenAI's compute fleet, focusing on job scheduling, cluster management, and snapshot delivery. You will collaborate closely with hardware, infrastructure, and business teams to ensure high utilization and performance of the GPU fleet. Your work will involve building user-friendly scheduling and quota systems, as well as automating Kubernetes cluster provisioning and upgrades.
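To make the "scheduling and quota systems" responsibility concrete, here is a minimal sketch of per-team GPU quota accounting. All names and numbers are hypothetical illustrations, not OpenAI's actual design; a production system would also handle concurrency, persistence, and preemption.

```python
from dataclasses import dataclass

@dataclass
class TeamQuota:
    """Hypothetical per-team GPU quota tracker (illustrative only)."""
    limit: int        # maximum GPUs the team may hold at once
    in_use: int = 0   # GPUs currently allocated to the team

    def try_allocate(self, gpus: int) -> bool:
        """Grant the request only if it fits within the remaining quota."""
        if self.in_use + gpus > self.limit:
            return False
        self.in_use += gpus
        return True

    def release(self, gpus: int) -> None:
        """Return GPUs to the pool, never dropping below zero."""
        self.in_use = max(0, self.in_use - gpus)

# Usage: a team with a hypothetical 64-GPU quota
quota = TeamQuota(limit=64)
assert quota.try_allocate(48)       # 48 of 64 now in use
assert not quota.try_allocate(32)   # 80 would exceed the limit
quota.release(16)                   # back down to 32 in use
assert quota.try_allocate(32)       # exactly fills the quota
```

A real fleet scheduler layers much more on top (fair-share, priorities, bin-packing across clusters), but admission control of this shape is the core invariant a quota system enforces.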

You will interface with researchers to understand their workload requirements and support their research workflows with effective service frameworks and deployment systems. Your contributions will help ensure fast model startup times through high-performance snapshot delivery and hardware caching. You will also be responsible for maintaining a reliable and low-maintenance platform that meets the demands of OpenAI's ambitious projects.

What we offer

OpenAI offers a hybrid work model, allowing you to work three days in the office per week while providing relocation assistance for new employees. You will be part of a dynamic team that is shaping the future of AI technology, with opportunities for professional growth and development. The work you do will have a significant impact on advancing AI capabilities responsibly, and you will be supported by a collaborative and innovative culture.


Similar Jobs You Might Like


OpenAI

Software Engineering

OpenAI · 📍 San Francisco - On-Site

OpenAI is hiring a Software Engineer for their Data Infrastructure team to design and implement dataset infrastructure for next-generation training stacks. You'll work with technologies like Python, Docker, and AWS in San Francisco.

🏛️ On-Site · Mid-Level
5 months ago
Baseten

Software Engineering

Baseten · 📍 San Francisco - On-Site

Baseten is hiring a Software Engineer - Infrastructure to build and maintain components of their ML inference platform. You'll work with Python, Go, and Kubernetes to enable developers to deploy and monitor ML models. This position requires experience in infrastructure development.

🏛️ On-Site · Mid-Level
11 months ago
Baseten

Software Engineering

Baseten · 📍 San Francisco

Baseten is hiring a Senior Software Engineer - Infrastructure to architect and lead the development of their ML inference platform. You'll work with technologies like Kubernetes and AWS to optimize model serving. This position requires significant experience in infrastructure and machine learning.

Senior
11 months ago
Doppel

Backend Engineer

Doppel · 📍 San Francisco - On-Site

Doppel is hiring a Backend Engineer to build the infrastructure for their AI-native social engineering defense platform. You'll work with technologies like Elasticsearch and Kubernetes to design scalable systems. This position requires experience in backend engineering and infrastructure management.

🏛️ On-Site · Mid-Level
5 months ago
Middesk

Infrastructure Engineer

Middesk · 📍 San Francisco - On-Site

Middesk is hiring an Infrastructure Engineer to join their DevSecOps team. You'll build tooling and platform capabilities to enhance software delivery and developer experience. This position requires experience with infrastructure-as-code tools and high availability systems.

🏛️ On-Site · Mid-Level
8 months ago