About Anyscale

Effortless scalable computing for AI and Python

🏢 Tech • 👥 201-500 employees • 📅 Founded 2019 • 📍 Yerba Buena, San Francisco, CA • 💰 $259.6m • ⭐ 3.9

B2B • Artificial Intelligence • Enterprise • Machine Learning • SaaS

Key Highlights

Founded by the creators of Ray, powering companies like Netflix & OpenAI
$259.6 million raised in Series C funding
Headquartered in Yerba Buena, San Francisco, CA
Serves customers like Canva, Recursion, and RunwayML

Anyscale, headquartered in Yerba Buena, San Francisco, CA, is a leader in scalable computing for AI and Python, providing an AI-native platform that seamlessly scales from a single machine to thousands of GPUs. Founded by the creators of Ray, Anyscale has raised $259.6 million in Series C funding an...

🎁 Benefits

Anyscale offers a comprehensive benefits package including a monthly learning and wellness stipend, paid volunteer time off, and 12 weeks of paid pare...

🌟 Culture

Anyscale fosters a culture focused on solving the challenges of AI infrastructure, leveraging the open-source Ray framework to enhance distributed AI ...

Site Reliability Engineer

Anyscale • San Francisco

Posted 6 months ago • Site Reliability Engineer • 📍 San Francisco • 📍 Palo Alto

Skills & Technologies

AWS • Docker • Kubernetes • Linux

Overview

Anyscale is hiring a Site Reliability Engineer to ensure the smooth operation of user-facing services and production systems. You'll work with AWS, Docker, and Kubernetes in San Francisco or Palo Alto.

Job Description

Who you are

You have a strong background in site reliability engineering, with experience managing cloud infrastructure and keeping services highly available. Your expertise in AWS and in container tooling such as Docker and Kubernetes lets you manage and scale applications in production environments. You are proficient in Linux and have a solid understanding of networking principles, which helps you troubleshoot and optimize system performance. You value diversity and inclusion in the workplace and are eager to contribute to a collaborative team environment.

Desirable

Experience with monitoring and logging tools such as Prometheus and Grafana is a plus. Familiarity with CI/CD practices and tools will help you streamline deployments and improve operational efficiency. You are comfortable working in a fast-paced environment and can adapt to changing priorities while staying focused on quality and reliability.

What you'll do

As a Site Reliability Engineer at Anyscale, you will play a crucial role in ensuring the smooth operation of all user-facing services and other production systems. You will develop a unified view of how cloud components are used across the company, taking into account each team's needs and requirements. You will apply sound engineering principles and operational discipline to improve the reliability and performance of our systems. You will collaborate closely with development teams to ensure that deployment methodologies align with the company's goals and best practices. You will also identify opportunities for cost management and resource optimization, helping teams reduce waste and improve efficiency.

What we offer

At Anyscale, you will be part of a mission-driven company that is democratizing distributed computing. We offer a competitive salary and benefits package, along with opportunities for professional growth and development. You will work in a supportive environment that values innovation and encourages you to bring your ideas to the table. Join us in building the best place to run Ray and make a significant impact in the world of scalable machine learning.

Similar Jobs You Might Like

Based on your interests and this role

Site Reliability Engineer

Anyscale • 📍 Bengaluru

Anyscale is hiring a Site Reliability Engineer to ensure the smooth operation of user-facing services and production systems. You'll work with AWS, Docker, and Kubernetes in Bengaluru. This position requires experience in cloud infrastructure and automation.

Mid-Level

4 months ago

Site Reliability Engineer

Harvey • 📍 San Francisco - On-Site

Harvey is hiring a Senior Site Reliability Engineer to ensure the reliability and performance of their legal AI platform. You'll work with technologies like AWS, Docker, and Kubernetes to maintain system scalability. This position requires strong experience in infrastructure and reliability engineering.

🏛️ On-Site • Senior

2 months ago

Site Reliability Engineer

Apple • 📍 San Francisco - On-Site

Apple is seeking a Site Reliability Engineer to join their Services Engineering team. You'll be responsible for building secure, end-to-end solutions and managing the full infrastructure stack. This role requires expertise in solving complex problems at scale.

🏛️ On-Site

1 month ago

Site Reliability Engineer

Harvey • 📍 San Francisco - On-Site

Harvey is hiring a Staff Software Engineer for their Site Reliability team to ensure the reliability and performance of their legal AI platform. You'll work with technologies like AWS, Docker, and Kubernetes. This position requires strong experience in site reliability engineering.

🏛️ On-Site • Staff

2 months ago

Site Reliability Engineer

Together AI • 📍 San Francisco

Together AI is hiring a Site Reliability Engineer to ensure the reliability and performance of user-facing services and production systems. You'll work with Ansible, Terraform, and Kubernetes to build and manage infrastructure. This role requires 2+ years of experience in SRE or a related field.

Mid-Level

2 weeks ago
