
About OpenAI
Empowering humanity through safe AI innovation
Key Highlights
- Headquartered in San Francisco, CA with 1,001+ employees
- $68.9 billion raised in funding from top investors
- Launched ChatGPT, gaining 1 million users in 5 days
- 20-week paid parental leave and unlimited PTO policy
OpenAI is a leading AI research and development platform headquartered in the Mission District of San Francisco, CA. With over 1,001 employees, OpenAI has raised $68.9 billion in funding and is known for its groundbreaking products like ChatGPT, which gained over 1 million users within just five day...
🎁 Benefits
OpenAI offers flexible work hours and encourages unlimited paid time off, promoting at least 4 weeks of vacation per year. Employees enjoy comprehensi...
🌟 Culture
OpenAI's culture is centered around its mission to ensure that AGI benefits all of humanity. The company values transparency and ethical consideration...
Overview
OpenAI is hiring a Software Engineer for its Data Infrastructure team to design and implement dataset infrastructure for next-generation training stacks. You'll work with technologies like Python, Docker, and AWS in San Francisco.
Job Description
Who you are
You have a strong background in software engineering, particularly in designing and implementing scalable data infrastructure. With experience in Python and familiarity with containerization technologies like Docker and orchestration tools such as Kubernetes, you are well-equipped to handle the complexities of large-scale data systems. You understand the importance of performance and efficiency in data pipelines and have a proactive approach to identifying and resolving bottlenecks.
Your collaborative spirit shines through as you work closely with researchers and other infrastructure teams to ensure seamless integration of datasets into training and inference pipelines. You are detail-oriented, ensuring that dataset interfaces are standardized, discoverable, and easy for other teams to adopt. Your ability to document processes and maintain clear communication is key to fostering a productive team environment.
Desirable
Experience with cloud platforms like AWS is a plus, as is familiarity with data storage solutions such as Elasticsearch. You may also have exposure to machine learning frameworks, which will enhance your contributions to the team.
What you'll do
In this role, you will design and maintain standardized dataset APIs for multimodal data that is too large to fit in memory. You will build proactive testing and scale-validation pipelines for dataset loading at GPU scale, ensuring that the infrastructure can handle the demands of OpenAI's next-generation models. Collaborating with teammates, you will integrate datasets seamlessly into training and inference pipelines, ensuring a smooth user experience.
Your responsibilities will include documenting and maintaining dataset interfaces, establishing safeguards and validation systems to ensure datasets remain reproducible, and proactively testing for performance bottlenecks. You will play a crucial role in enabling researchers to focus on advancing model capabilities while you handle the scale, efficiency, and reliability required to bring those models to life.
What we offer
At OpenAI, you will be part of a mission-driven team that believes in the potential of artificial intelligence to solve global challenges. We offer a collaborative work environment where your contributions will directly impact the future of technology. Join us in shaping the future of AI and enjoy the opportunity to work with cutting-edge technologies in a supportive and innovative atmosphere.