
About Anyscale
Effortless scalable computing for AI and Python
Key Highlights
- Founded by the creators of Ray, powering companies like Netflix & OpenAI
- $259.6 million raised in Series C funding
- Headquartered in Yerba Buena, San Francisco, CA
- Serves customers like Canva, Recursion, and RunwayML
Anyscale, headquartered in Yerba Buena, San Francisco, CA, is a leader in scalable computing for AI and Python, providing an AI-native platform that seamlessly scales from a single machine to thousands of GPUs. Founded by the creators of Ray, Anyscale has raised $259.6 million in Series C funding an...
🎁 Benefits
Anyscale offers a comprehensive benefits package including a monthly learning and wellness stipend, paid volunteer time off, and 12 weeks of paid pare...
🌟 Culture
Anyscale fosters a culture focused on solving the challenges of AI infrastructure, leveraging the open-source Ray framework to enhance distributed AI ...
Overview
Anyscale is hiring a Site Reliability Engineer to ensure the smooth operation of user-facing services and production systems. The role is based in San Francisco or Palo Alto and involves working with AWS, Docker, and Kubernetes.
Job Description
Who you are
You have a strong background in site reliability engineering, with experience in managing cloud infrastructure and ensuring high availability of services. Your expertise in AWS and container orchestration tools like Docker and Kubernetes allows you to effectively manage and scale applications in production environments. You are proficient in Linux and have a solid understanding of networking principles, which helps you troubleshoot and optimize system performance. You value diversity and inclusion in the workplace and are eager to contribute to a collaborative team environment.
Desirable
Experience with monitoring and logging tools such as Prometheus and Grafana is a plus. Familiarity with CI/CD practices and tools will help you streamline deployment processes and improve operational efficiency. You are comfortable working in a fast-paced environment and can adapt to changing priorities while maintaining a focus on quality and reliability.
What you'll do
As a Site Reliability Engineer at Anyscale, you will play a crucial role in ensuring the smooth operation of all user-facing services and other production systems. You will develop a unified perspective on how cloud components are used across the company, taking into account diverse needs and requirements. Your responsibilities will include applying sound engineering principles and operational discipline to enhance the reliability and performance of our systems. You will collaborate closely with development teams to ensure that deployment methodologies align with the company's goals and best practices. Additionally, you will identify opportunities for cost management and resource optimization, helping teams reduce waste and improve efficiency.
What we offer
At Anyscale, you will be part of a mission-driven company that is democratizing distributed computing. We offer a competitive salary and benefits package, along with opportunities for professional growth and development. You will work in a supportive environment that values innovation and encourages you to bring your ideas to the table. Join us in building the best place to run Ray and make a significant impact in the world of scalable machine learning.