About Baseten

Simplifying machine learning for every organization

🏢 Tech • 👥 21-100 employees • 📍 Union Square, San Francisco, CA • 💰 $285m

B2B • Analytics • Business Intelligence • Machine Learning

Key Highlights

Headquartered in Union Square, San Francisco, CA
$285 million raised in Series C funding
Team growth of 3x over the last five years
Unlimited PTO with a company-wide holiday break

Baseten is a machine learning application builder headquartered in Union Square, San Francisco, CA. With $285 million in funding from investors like Coatue Management and Founders Fund, Baseten simplifies AI integration for businesses, enabling data scientists to deploy ML models without needing spe...

🎁 Benefits

Baseten offers a remote-first work environment with a $1,000 stipend for home office setup, unlimited PTO with a company-wide break during the holiday...

🌟 Culture

Baseten's culture emphasizes simplifying complex AI technologies for businesses, fostering a collaborative environment where team members can connect ...

Site Reliability Engineer

Baseten • San Francisco - On-Site

Posted 4 months ago • 🏛️ On-Site • Site Reliability Engineer • 📍 San Francisco

Skills & Technologies

AWS, Docker, Kubernetes, Python

Overview

Baseten is hiring a Site Reliability Engineer to build and maintain scalable infrastructure for deploying machine learning models. You'll work with technologies like AWS, Docker, and Kubernetes. This position requires experience in managing CI/CD pipelines and optimizing performance.

Job Description

Who you are

You have a strong background in site reliability engineering, with experience in building and maintaining scalable infrastructure for deploying machine learning models. You understand the importance of reliability and performance, and you have a knack for automating processes to improve efficiency. Your experience with cloud platforms, particularly AWS, has equipped you with the skills to manage multi-cloud environments effectively.

You are proficient in using containerization technologies like Docker and orchestration tools such as Kubernetes. Your programming skills in Python allow you to write scripts and automate tasks, making you a valuable asset to any engineering team. You thrive in collaborative environments and enjoy learning from users to enhance operational processes.

What you'll do

As a Site Reliability Engineer at Baseten, you will be responsible for envisioning and building robust systems that ensure our infrastructure is scalable, reliable, and efficient. You will work on automating deployments and monitoring systems, optimizing performance, and managing incidents. Your role will involve establishing standards and best practices for reliability across the infrastructure, ensuring that our systems can handle the demands of machine learning operations.

You will collaborate closely with the infrastructure team on projects such as multi-cloud capacity management and optimizing GPU usage for model serving. Your contributions will directly impact the efficiency and reliability of our platform, enabling engineers to ship AI products seamlessly. You will also be involved in incident management, ensuring that any issues are resolved quickly and effectively to minimize downtime.

What we offer

At Baseten, we are committed to fostering a diverse and inclusive workplace. We provide equal employment opportunities to all employees and applicants. Join us in building a platform that empowers AI companies to bring cutting-edge models into production. We encourage you to apply even if your experience doesn't match every requirement, as we value diverse perspectives and backgrounds.

We offer competitive compensation and benefits, along with opportunities for professional growth and development. As part of a rapidly growing company, you will have the chance to work on innovative projects that shape the future of AI infrastructure. Our team is collaborative and forward-thinking, and we are excited to welcome new members who share our vision.

Similar Jobs You Might Like

Based on your interests and this role

Site Reliability Engineer

ConductorOne • 📍 Portland - On-Site

ConductorOne is hiring a Site Reliability Engineer to design and operate highly reliable infrastructure across cloud environments. You'll work with AWS, GCP, and Azure while building automation and tooling to enhance system reliability. This position requires 3+ years of experience in SRE or DevOps.

🏛️ On-Site • Mid-Level

4 months ago

Site Reliability Engineer

Alembic • 📍 San Francisco - On-Site

Alembic is seeking a Senior Site Reliability Engineer to enhance the reliability and performance of their platform. You'll work with technologies like Docker and Kubernetes to build and maintain scalable infrastructure. This role requires 8+ years of experience in SRE or DevOps.

🏛️ On-Site • Senior

2 months ago

Site Reliability Engineer

Together AI • 📍 San Francisco

Together AI is hiring a Site Reliability Engineer to ensure the reliability and performance of user-facing services and production systems. You'll work with Ansible, Terraform, and Kubernetes to build and manage infrastructure. This role requires 2+ years of experience in SRE or a related field.

Mid-Level

2 weeks ago

Site Reliability Engineer

Mercor • 📍 San Francisco - On-Site

Mercor is seeking a Site Reliability Engineer to own production reliability across critical systems. You'll work with AWS, Kubernetes, and Terraform to build and improve high-availability systems in San Francisco.

🏛️ On-Site • Mid-Level

1 month ago

Site Reliability Engineer

WorkOS • 📍 San Francisco - Remote

WorkOS is hiring a Site Reliability Engineer to ensure the platform remains fast, reliable, and resilient at scale. You'll work with AWS, Docker, and Kubernetes to build systems that handle hundreds of millions of requests. This role requires a strong understanding of complex systems and incident response.

🏠 Remote

8 months ago