
About Anyscale
Effortless scalable computing for AI and Python
Key Highlights
- Founded by the creators of Ray, which powers companies like Netflix and OpenAI
- $259.6 million raised in Series C funding
- Headquartered in Yerba Buena, San Francisco, CA
- Serves customers like Canva, Recursion, and RunwayML
Anyscale, headquartered in Yerba Buena, San Francisco, CA, is a leader in scalable computing for AI and Python, providing an AI-native platform that seamlessly scales from a single machine to thousands of GPUs. Founded by the creators of Ray, Anyscale has raised $259.6 million in Series C funding an...
🎁 Benefits
Anyscale offers a comprehensive benefits package including a monthly learning and wellness stipend, paid volunteer time off, and 12 weeks of paid pare...
🌟 Culture
Anyscale fosters a culture focused on solving the challenges of AI infrastructure, leveraging the open-source Ray framework to enhance distributed AI ...
Overview
Anyscale is hiring a Site Reliability Engineer in Bengaluru to ensure the smooth operation of user-facing services and production systems. You'll work with AWS, Docker, and Kubernetes. This position requires experience in cloud infrastructure and automation.
Job Description
Who you are
You have a strong background in site reliability engineering or a related field, with a focus on ensuring the reliability and performance of production systems. Your experience with cloud infrastructure, particularly AWS, has equipped you with the skills to manage and optimize cloud resources effectively. You are proficient in automation tools and practices, using technologies like Docker and Kubernetes to streamline deployment processes and enhance system reliability.
You possess solid programming skills, particularly in Python, which you use to develop scripts and tools that improve operational efficiency. Your understanding of Linux systems allows you to troubleshoot and resolve issues quickly, ensuring minimal downtime for user-facing services. You are a proactive problem solver who enjoys identifying opportunities for improvement and implementing solutions that enhance system performance.
You value collaboration and communication, working closely with cross-functional teams to align deployment methodologies with company goals. Your ability to analyze system performance metrics helps you make data-driven decisions that contribute to the overall success of the organization. Anyscale is committed to fostering a culture of diversity and inclusion and encourages applications from individuals of all backgrounds.
Desirable
Experience with monitoring and alerting tools is a plus, as it helps you maintain high availability and performance standards. Familiarity with CI/CD pipelines will further help you deliver reliable software updates efficiently.
What you'll do
In this role, you will play a crucial part in ensuring the smooth operation of all user-facing services and other Anyscale production systems. You will develop a unified perspective on how cloud components are utilized across the company, taking into account diverse needs and requirements. Your responsibilities will include implementing sound engineering principles and operational discipline to enhance system reliability.
You will collaborate with engineering teams to design and implement scalable solutions that meet the demands of our growing user base. Your expertise in automation will be essential as you work to streamline processes and reduce operational overhead. You will also be responsible for monitoring system performance, identifying bottlenecks, and implementing solutions to improve efficiency.
As part of your role, you will engage in capacity planning and cost management, ensuring that resources are allocated effectively and efficiently. You will also participate in incident response activities, working to resolve issues quickly and minimize impact on users. Your contributions will help shape the future of Anyscale's infrastructure, enabling us to deliver exceptional service to our customers.
What we offer
At Anyscale, we are committed to creating an inclusive and supportive work environment. We offer competitive compensation and benefits, along with opportunities for professional growth and development. You will have the chance to work with cutting-edge technologies and contribute to a mission-driven company that is transforming the landscape of distributed computing. Join us in our journey to democratize access to scalable machine learning solutions.
Similar Jobs You Might Like
Based on your interests and this role

Site Reliability Engineer
Aerospike is seeking a Site Reliability Engineer to enhance the reliability and performance of their cloud platform. You'll work with technologies like AWS, Docker, and Kubernetes to optimize infrastructure and services. This role requires experience in automation and infrastructure management.

Site Reliability Engineer
Point72 is hiring a Site Reliability Engineer to enhance system reliability and automate operational workflows. You'll work with tools like Datadog and Jenkins, and support database platforms such as SQL Server and MongoDB. This role requires expertise in automation and observability solutions.

Site Reliability Engineer
Instabase is hiring a Senior Site Reliability Engineer to lead the development of scalable and fault-tolerant systems. You'll work at the intersection of software and systems engineering to ensure high availability and performance. This role requires a strong background in reliability engineering and system performance.

Site Reliability Engineer
66degrees is hiring a Site Reliability Engineer to help clients maintain, optimize, and scale their cloud implementations. You'll work with Google Cloud Platform and DevOps methodologies. This role requires expertise in cloud technologies.