
About Datadog
The cloud monitoring platform engineers love
Key Highlights
- Public company (NASDAQ: DDOG) - strong equity upside
- 26,000+ enterprise customers including Netflix & Samsung
- NYC headquarters with offices in Paris, Dublin, Sydney
- $1.5B raised from Sequoia, IVP, and Index Ventures
Datadog (NASDAQ: DDOG) is a leading cloud observability platform that provides monitoring and analytics for applications, infrastructure, and logs. Trusted by over 26,000 customers including major companies like Netflix, Samsung, and Airbnb, Datadog is headquartered in New York City. The company went ...
🎁 Benefits
Datadog offers competitive salaries, equity options, generous PTO policies, and a flexible remote work policy. Employees also benefit from a learning ...
🌟 Culture
Datadog fosters an engineering-first culture, with 70% of its workforce comprising engineers. The company emphasizes a strong focus on solving complex...
Overview
Datadog is seeking a Senior MLOps Engineer to lead the design and development of high-scale model serving systems. You'll work with Ray-based infrastructure and CI/CD pipelines to ensure reliable deployment of ML models. This role requires expertise in machine learning and Python.
Job Description
Who you are
You have 5+ years of experience in software engineering with a focus on machine learning operations — you've successfully designed and implemented systems that serve ML models at scale, ensuring high availability and performance. Your expertise in Python allows you to develop robust solutions that meet the demands of production environments.
You possess a deep understanding of CI/CD practices — you've built and maintained pipelines that facilitate the seamless deployment of machine learning models, enabling teams to iterate quickly and efficiently. Your experience with Ray demonstrates your ability to optimize inference infrastructure for both low- and high-throughput workloads.
You are familiar with observability and reliability principles — you understand the importance of monitoring and logging in production systems, and you have implemented strategies to ensure that deployed models perform as expected under various conditions. Your collaborative nature allows you to work effectively with applied scientists and product teams to deliver impactful solutions.
You are passionate about enabling self-service tools for model deployment — you believe in empowering teams to manage their own workflows, and you have experience creating user-friendly interfaces that simplify the deployment and testing of models. Your proactive approach to continuous improvement drives you to seek out opportunities for enhancing existing processes.
Desirable
Experience with A/B testing and rollback strategies is a plus — you understand the importance of validating model performance in production and have implemented techniques to ensure safe deployments. Familiarity with cloud platforms and distributed systems will further enhance your ability to contribute to our infrastructure.
What you'll do
As a Senior MLOps Engineer at Datadog, you will architect and build systems for serving machine learning and large language models across multiple data centers. Your work will ensure that our models are deployed reliably and efficiently and meet strict service level agreements. You will design and optimize Ray-based inference infrastructure, enabling high-performance model serving that can handle varying workloads.
You will collaborate closely with applied scientists to enable them to deploy and test their models using self-service tools — your expertise in CI/CD will be crucial in streamlining these processes, allowing for rapid iteration and deployment. You will also implement observability practices to monitor model performance and reliability, ensuring that any issues are quickly identified and addressed.
Your role will involve working across infrastructure, observability, and machine learning engineering domains — you will be a key player in delivering production-ready inference capabilities that are used by teams across Datadog. You will contribute to the continuous improvement of our model serving systems, exploring new technologies and methodologies to enhance our capabilities.
What we offer
At Datadog, we value our office culture and the relationships that it fosters — we operate as a hybrid workplace, allowing you to create a work-life harmony that best fits your needs. You will have the opportunity to work with a talented team of engineers and scientists who are passionate about building innovative solutions in the AI space.
We encourage you to apply even if your experience doesn't match every requirement — we believe that diverse teams build better products, and we are committed to creating an inclusive environment where everyone can thrive. Join us in shaping the future of observability and security products powered by machine learning.