
About Together AI
Empowering corporate mentorship for effective learning
Key Highlights
- Founded in 2018, headquartered in Toronto, ON
- Raised $1.7 million in seed funding
- Partnerships with Heineken, Reddit, and 7-Eleven
- 4 weeks paid vacation and competitive equity packages
Together is a corporate mentorship management platform founded in 2018, headquartered in CityPlace, Toronto, ON. The platform streamlines the mentorship lifecycle, facilitating connections among employees at companies like Heineken, Reddit, and 7-Eleven. With $1.7 million in seed funding, Together a...
🎁 Benefits
Together offers competitive salaries and equity packages, 4 weeks of paid vacation, and a comprehensive health, dental, and vision plan through Honeyb...
🌟 Culture
Together fosters a culture of autonomy and impact, allowing employees to take on significant responsibilities without bureaucratic constraints. The fo...
Overview
Together AI is seeking an LLM Inference Frameworks and Optimization Engineer to design and optimize distributed inference engines for large language models. You'll work with technologies like CUDA, TensorRT, and PyTorch to enhance performance and scalability. This role requires expertise in distributed systems and machine learning.
Job Description
Who you are
You have a strong background in AI engineering with a focus on inference frameworks and optimization. Your experience includes designing and developing distributed systems that support high-performance AI applications. You are proficient in CUDA and have worked with TensorRT and PyTorch to optimize model performance. You understand the intricacies of GPU and accelerator optimizations, and you are familiar with algorithms that enhance inference efficiency. You thrive in collaborative environments, working closely with hardware and software teams to ensure seamless integration and performance. You are passionate about pushing the boundaries of AI inference and are eager to contribute to innovative projects.
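As a hedged illustration of the kind of inference-efficiency algorithm referenced above, the sketch below reuses a key/value cache during greedy decoding with PyTorch and Hugging Face Transformers. The gpt2 checkpoint and the generation loop are illustrative placeholders only, not Together AI's actual stack.

```python
# Minimal sketch of KV-cache reuse during greedy decoding: after the first
# forward pass, only the newest token is fed on each step and the attention
# keys/values for earlier tokens are reused from `past`.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")          # illustrative checkpoint
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

ids = tok("Distributed inference engines", return_tensors="pt").input_ids
past = None
with torch.no_grad():
    for _ in range(20):
        out = model(ids if past is None else ids[:, -1:],
                    past_key_values=past, use_cache=True)
        past = out.past_key_values
        next_id = out.logits[:, -1].argmax(dim=-1, keepdim=True)
        ids = torch.cat([ids, next_id], dim=-1)

print(tok.decode(ids[0]))
```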
Desirable
Experience with multimodal models and techniques such as Mixture of Experts (MoE) parallelism is a plus. Familiarity with software-hardware co-design principles will set you apart. You have a keen interest in the latest advancements in AI and are always looking to learn and apply new technologies.
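For readers unfamiliar with Mixture of Experts, here is a minimal single-device sketch of top-1 expert routing in PyTorch. In real MoE parallelism the experts would be sharded across GPUs and the token dispatch would become an all-to-all; the expert count and dimensions below are illustrative assumptions, not details from this posting.

```python
# Minimal sketch of top-1 Mixture-of-Experts routing on a single device.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top1MoE(nn.Module):
    def __init__(self, d_model: int = 512, n_experts: int = 4):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts)            # router
        self.experts = nn.ModuleList(
            nn.Linear(d_model, d_model) for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model). Route each token to its highest-scoring expert.
        scores = F.softmax(self.gate(x), dim=-1)              # (tokens, n_experts)
        weight, idx = scores.max(dim=-1)                      # top-1 per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = idx == e
            if mask.any():
                out[mask] = expert(x[mask]) * weight[mask].unsqueeze(-1)
        return out

tokens = torch.randn(8, 512)
print(Top1MoE()(tokens).shape)  # torch.Size([8, 512])
```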
What you'll do
In this role, you will design and develop fault-tolerant, high-concurrency distributed inference engines for text, image, and multimodal generation models. You will implement and optimize distributed inference strategies, including tensor parallelism and pipeline parallelism, to ensure high-performance serving. Your work will involve applying CUDA graph and TensorRT/TRT-LLM graph optimizations to improve the efficiency and scalability of large language model serving. You will collaborate with hardware teams to ensure that software and hardware components work seamlessly together, contributing to the overall success of the AI infrastructure. You will also engage in research and development, exploring new algorithms and techniques that further improve inference performance.
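To make "CUDA graph optimizations" concrete, the following sketch captures a single decode step in a CUDA graph with PyTorch's torch.cuda.CUDAGraph API so repeated steps replay with minimal kernel-launch overhead. The tiny linear "model", the static shapes, and the decode_step helper are placeholders assumed for illustration, not the actual serving engine described in the role.

```python
# Minimal sketch: capture one decode step in a CUDA graph and replay it.
import torch

device = torch.device("cuda")
model = torch.nn.Linear(4096, 4096).half().to(device).eval()

# CUDA graphs require static input/output buffers that are reused on replay.
static_input = torch.zeros(1, 4096, dtype=torch.half, device=device)
static_output = torch.zeros(1, 4096, dtype=torch.half, device=device)

# Warm up on a side stream before capture (initializes cuBLAS/cuDNN state).
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s), torch.no_grad():
    for _ in range(3):
        static_output.copy_(model(static_input))
torch.cuda.current_stream().wait_stream(s)

# Capture a single decode step into the graph.
graph = torch.cuda.CUDAGraph()
with torch.cuda.graph(graph), torch.no_grad():
    static_output.copy_(model(static_input))

def decode_step(new_hidden: torch.Tensor) -> torch.Tensor:
    # Overwrite the static buffer in place, then replay the captured kernels.
    static_input.copy_(new_hidden)
    graph.replay()
    return static_output.clone()
```

The key design constraint is that every tensor touched during capture must keep a fixed address and shape, which is why serving engines pair graph capture with pre-allocated, static batch and cache buffers.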
What we offer
Together AI provides a dynamic work environment where innovation is encouraged. You will have the opportunity to work on cutting-edge AI technologies and contribute to projects that have a significant impact on the industry. We offer competitive compensation and benefits, along with opportunities for professional growth and development. Join us in shaping the future of AI inference infrastructure and be part of a team that values creativity and collaboration.
Similar Jobs You Might Like

AI Engineer
Coin Market Cap Ltd is hiring an LLM Algorithm Engineer to develop and optimize large language models. You'll work with advanced techniques like SFT and RLHF, utilizing frameworks such as PyTorch and TensorFlow. This position requires 3+ years of experience in the field.

AI Engineer
MongoDB is seeking an LLM Optimization Lead to drive growth via Large Language Models and AI platforms. You'll work on optimizing brand visibility and customer acquisition strategies. This role requires expertise in AI and SEO.

Backend Engineer
Together AI is seeking a Senior Backend Engineer to build and optimize their Inference Platform for advanced generative AI models. You'll work with technologies like Python, Docker, and AWS to enhance performance and scalability. This role requires strong experience in backend engineering and machine learning.

AI Engineer
SonarSource is hiring an LLM Engineer to work on pioneering AI and ML projects within software engineering. You'll develop novel algorithms and enhance system performance. This role requires expertise in machine learning and Python.

Machine Learning Engineer
OpenAI is hiring a Machine Learning Engineer to improve the training throughput of their internal training framework. You'll work with Python, TensorFlow, and PyTorch to enable researchers to experiment with new ideas. This position requires strong engineering skills and knowledge of supercomputer performance.