
About Datadog
The cloud monitoring platform engineers love
Key Highlights
- Public company (NYSE: DDOG) - strong equity upside
- 26,000+ enterprise customers including Netflix & Samsung
- NYC headquarters with offices in Paris, Dublin, Sydney
- $1.5B raised from Sequoia, IVP, and Index Ventures
Datadog (NYSE: DDOG) is a leading cloud observability platform that provides monitoring and analytics for applications, infrastructure, and logs. Trusted by over 26,000 customers including major companies like Netflix, Samsung, and Airbnb, Datadog is headquartered in New York City. The company went ...
🎁 Benefits
Datadog offers competitive salaries, equity options, generous PTO policies, and a flexible remote work policy. Employees also benefit from a learning ...
🌟 Culture
Datadog fosters an engineering-first culture, with engineers making up 70% of its workforce. The company emphasizes a strong focus on solving complex...
Overview
Datadog is hiring a Senior MLOps Engineer to design and build robust backend systems for AI infrastructure. You'll work with technologies like Python, Docker, and Kubernetes to enhance ML workflows. This role requires significant experience in MLOps and distributed systems.
Job Description
Who you are
You have 5+ years of software engineering experience with a focus on MLOps, and you understand the intricacies of managing machine learning workflows at scale. Your expertise in Python allows you to build and optimize systems that support model training and deployment. You are familiar with containerization technologies like Docker and orchestration tools such as Kubernetes, and you use them to create scalable and reliable infrastructure. You have a solid understanding of cloud platforms, particularly AWS, and you use their services to support machine learning operations. Your experience with MLflow and TensorFlow equips you to manage model lifecycles effectively, ensuring that models are tracked and versioned properly. You thrive in collaborative environments, working closely with applied scientists and platform teams to drive innovation in AI infrastructure.
Desirable
Experience with PyTorch is a plus, as it complements your skill set in machine learning frameworks. Familiarity with distributed systems and job orchestration will help you tackle the challenges of managing training jobs across multiple data centers. You are proactive in seeking improvements in ML experimentation workflows, and you enjoy mentoring junior engineers to foster a culture of learning and growth within your team.
What you'll do
In this role, you will design and implement scalable systems for training orchestration, artifact tracking, and model registration across various cloud regions. You will improve and streamline ML experimentation workflows, ensuring that applied scientists can iterate rapidly and reliably. Your work will involve collaborating with cross-functional teams to shape the future of AI infrastructure at Datadog. You will be responsible for building deeply technical infrastructure that supports job orchestration and model lifecycle management. As part of a high-impact team, you will tackle critical problems that contribute to Datadog’s AI evolution. You will also engage in code reviews and contribute to the overall architecture of the systems you help build, ensuring they meet the highest standards of reliability and performance.
What we offer
Datadog values a collaborative office culture that fosters creativity and teamwork. As part of a hybrid workplace, you will have the flexibility to create the work-life harmony that suits your needs. You will be part of a team at the forefront of AI development, working on projects that have a significant impact on the company's future. We encourage you to apply even if your experience doesn't match every requirement, as we believe in the potential of diverse backgrounds to drive innovation. Join us and be part of a mission that is transforming the way AI is integrated into business processes.