
About Together AI
Empowering corporate mentorship for effective learning
Key Highlights
- Founded in 2018, headquartered in Toronto, ON
- Raised $1.7 million in seed funding
- Partnerships with Heineken, Reddit, and 7-Eleven
- 4 weeks of paid vacation and competitive equity packages
Together is a corporate mentorship management platform founded in 2018, headquartered in CityPlace, Toronto, ON. The platform streamlines the mentorship lifecycle, facilitating connections among employees at companies like Heineken, Reddit, and 7-Eleven. With $1.7 million in seed funding, Together a...
🎁 Benefits
Together offers competitive salaries and equity packages, 4 weeks of paid vacation, and a comprehensive health, dental, and vision plan through Honeyb...
🌟 Culture
Together fosters a culture of autonomy and impact, allowing employees to take on significant responsibilities without bureaucratic constraints. The fo...
Overview
Together AI is hiring a Site Reliability Engineer to ensure the reliability and performance of user-facing services and production systems. You'll work with Ansible, Terraform, and Kubernetes to build and manage infrastructure. This role requires 2+ years of experience in SRE or a related field.
Job Description
Who you are
You have 2+ years of professional experience as a Site Reliability Engineer or in a related field, demonstrating a strong understanding of operational discipline and engineering principles. Your educational background includes a Bachelor's degree in Computer Science or a related field, or equivalent work experience. You possess knowledge of Ansible, including roles and playbooks, as well as Terraform and Kubernetes, which are essential for building and managing infrastructure. Your proficiency in programming and scripting languages allows you to automate processes effectively. You have direct experience in monitoring and observability practices, ensuring that systems are reliable and performant. Your familiarity with cloud services enhances your ability to manage scalable infrastructures. You thrive in collaborative environments, working well with cross-functional teams to achieve common goals.
Desirable
Experience with additional monitoring tools and practices would be a plus, as would familiarity with incident management systems like PagerDuty. A strong interest in algorithms and distributed systems will help you identify improvements in product architecture from reliability, performance, and availability perspectives.
What you'll do
As a Site Reliability Engineer at Together AI, you will be responsible for keeping all user-facing services and production systems running smoothly. You will participate in an on-call rotation to respond to production incidents, ensuring that any issues are addressed promptly. Your role will involve building and running infrastructure using tools like Ansible, Terraform, and Kubernetes, enabling the scaling of services to accommodate a massive number of concurrent users. You will also build monitoring systems to ensure the highest quality service for customers, and you will design and implement operational processes such as deployments and upgrades. Debugging production issues across all services and levels of the stack will be a key part of your responsibilities, as will identifying improvements for the product architecture from a reliability, performance, and availability perspective. You will plan the growth of Together AI’s infrastructure, contributing to the overall success of the organization.
What we offer
Together AI offers a collaborative work environment where you can thrive as a Site Reliability Engineer. You will have the opportunity to work with cutting-edge technologies and contribute to the reliability of critical systems. The company values your input and encourages you to apply even if your experience doesn't match every requirement. We provide competitive compensation and benefits, fostering a culture of growth and development within the team. Join us in making a significant impact on the reliability and performance of our services.