About Together AI

Empowering corporate mentorship for effective learning

👥 21-100 employees • 📍 CityPlace, Toronto, ON • 💰 $1.7m

B2B • HR • Learning • SaaS • Community

Key Highlights

Founded in 2018, headquartered in Toronto, ON
Raised $1.7 million in seed funding
Partnerships with Heineken, Reddit, and 7-Eleven
4 weeks paid vacation and competitive equity packages

Together is a corporate mentorship management platform founded in 2018, headquartered in CityPlace, Toronto, ON. The platform streamlines the mentorship lifecycle, facilitating connections among employees at companies like Heineken, Reddit, and 7-Eleven. With $1.7 million in seed funding, Together a...

🎁 Benefits

Together offers competitive salaries and equity packages, 4 weeks of paid vacation, and a comprehensive health, dental, and vision plan through Honeyb...

🌟 Culture

Together fosters a culture of autonomy and impact, allowing employees to take on significant responsibilities without bureaucratic constraints. The fo...

Site Reliability Engineer • Mid-Level

Together AI • San Francisco

Posted 2w ago • Mid-Level Site Reliability Engineer • 📍 San Francisco • 💰 $150,000 - $200,000 / yearly

Skills & Technologies

Ansible • Terraform • Kubernetes • Linux

Overview

Together AI is hiring a Site Reliability Engineer to ensure the reliability and performance of user-facing services and production systems. You'll work with Ansible, Terraform, and Kubernetes to build and manage infrastructure. This role requires 2+ years of experience in SRE or a related field.

Job Description

Who you are

You have 2+ years of professional experience as a Site Reliability Engineer or in a related field, demonstrating a strong understanding of operational discipline and engineering principles. Your educational background includes a Bachelor's degree in Computer Science or a related field, or equivalent work experience. You possess knowledge of Ansible, including roles and playbooks, as well as Terraform and Kubernetes, which are essential for building and managing infrastructure. Your proficiency in programming and scripting languages allows you to automate processes effectively. You have direct experience in monitoring and observability practices, ensuring that systems are reliable and performant. Your familiarity with cloud services enhances your ability to manage scalable infrastructures. You thrive in collaborative environments, working well with cross-functional teams to achieve common goals.

Desirable

Experience with additional monitoring tools and practices would be a plus, as would familiarity with incident management systems like PagerDuty. A strong interest in algorithms and distributed systems will help you identify improvements in product architecture from reliability, performance, and availability perspectives.

What you'll do

As a Site Reliability Engineer at Together AI, you will be responsible for keeping all user-facing services and production systems running smoothly. You will participate in an on-call rotation to respond to production incidents, ensuring that any issues are addressed promptly. Your role will involve building and running infrastructure using tools like Ansible, Terraform, and Kubernetes, enabling the scaling of services to accommodate a massive number of concurrent users. You will also build monitoring systems to ensure the highest quality service for customers, designing and implementing operational processes such as deployments and upgrades. Debugging production issues across all services and levels of the stack will be a key part of your responsibilities, as will identifying improvements for the product architecture from a reliability, performance, and availability perspective. You will plan the growth of Together AI’s infrastructure, contributing to the overall success of the organization.

What we offer

Together AI offers a collaborative work environment where you can thrive as a Site Reliability Engineer. You will have the opportunity to work with cutting-edge technologies and contribute to the reliability of critical systems. The company values your input and encourages you to apply even if your experience doesn't match every requirement. We provide competitive compensation and benefits, fostering a culture of growth and development within the team. Join us in making a significant impact on the reliability and performance of our services.

Similar Jobs You Might Like

Based on your interests and this role

Site Reliability Engineer

Mercor • 📍 San Francisco - On-Site

Mercor is seeking a Site Reliability Engineer to own production reliability across critical systems. You'll work with AWS, Kubernetes, and Terraform to build and improve high-availability systems in San Francisco.

🏛️ On-Site • Mid-Level

1 month ago

Site Reliability Engineer

WorkOS • 📍 San Francisco - Remote

WorkOS is hiring a Site Reliability Engineer to ensure the platform remains fast, reliable, and resilient at scale. You'll work with AWS, Docker, and Kubernetes to build systems that handle hundreds of millions of requests. This role requires a strong understanding of complex systems and incident response.

🏠 Remote

8 months ago

Site Reliability Engineer

Apple • 📍 San Francisco - On-Site

Apple is seeking a Site Reliability Engineer to join their Services Engineering team. You'll be responsible for building secure, end-to-end solutions and managing the full infrastructure stack. This role requires expertise in solving complex problems at scale.

🏛️ On-Site

1 month ago

Site Reliability Engineer

Braze • 📍 San Francisco - On-Site

Braze is hiring a Senior Site Reliability Engineer to ensure the uptime of internal-facing services and platforms. You'll work with Linux, distributed systems, and automation to maintain high service availability. This position requires a strong background in system administration and software engineering.

🏛️ On-Site • Senior

1w ago

Site Reliability Engineer

Stellar Development Foundation • 📍 San Francisco - On-Site

Stellar Development Foundation is hiring a Senior Site Reliability Engineer to enhance the reliability and scalability of their systems. You'll work with AWS, GCP, and Kubernetes to support the Stellar blockchain ecosystem. This role requires strong experience in infrastructure management and automation.

🏛️ On-Site • Senior

3w ago
