
About Together AI
Empowering corporate mentorship for effective learning
Key Highlights
- Founded in 2018, headquartered in Toronto, ON
- Raised $1.7 million in seed funding
- Partnerships with Heineken, Reddit, and 7-Eleven
- 4 weeks of paid vacation and competitive equity packages
Together is a corporate mentorship management platform founded in 2018, headquartered in CityPlace, Toronto, ON. The platform streamlines the mentorship lifecycle, facilitating connections among employees at companies like Heineken, Reddit, and 7-Eleven. With $1.7 million in seed funding, Together a...
🎁 Benefits
Together offers competitive salaries and equity packages, 4 weeks of paid vacation, and a comprehensive health, dental, and vision plan through Honeyb...
🌟 Culture
Together fosters a culture of autonomy and impact, allowing employees to take on significant responsibilities without bureaucratic constraints. The fo...
Overview
Together AI is hiring a Staff Engineer to design and deliver multi-petabyte storage systems for AI workloads. You'll work with technologies like WekaFS, Ceph, and Kubernetes to optimize high-performance storage solutions. This position requires extensive experience in distributed storage and HPC infrastructure.
Job Description
Who you are
You have a strong background in designing and delivering multi-petabyte storage systems, particularly for AI training and inference workloads. Your experience includes architecting high-performance parallel filesystems and object stores, and you are adept at integrating cutting-edge technologies such as WekaFS, Ceph, and Lustre. You have a proven track record of driving cost optimization, routinely achieving significant savings through intelligent tiering and lifecycle policies.
You are skilled in building Kubernetes-native storage operators and self-service platforms that enable automated provisioning and strict multi-tenancy. Your expertise extends to optimizing end-to-end data paths for high throughput, ensuring that systems can handle demanding workloads efficiently. You are familiar with designing multi-tier caching architectures and implementing intelligent prefetching strategies.
What you'll do
In this role, you will be responsible for designing multi-petabyte AI/ML storage systems and integrating technologies like WekaFS and Ceph. You will lead capacity planning and cost optimization efforts, achieving substantial savings through effective tiering and lifecycle policies. Your work will involve designing and optimizing RDMA and InfiniBand networks, tuning them for maximum throughput and minimum latency.
You will implement NVMe-oF and iSCSI protocols, troubleshoot bottlenecks, and optimize TCP/IP for storage solutions. You will also build Kubernetes storage operators and controllers, enabling automated provisioning and creating reusable Helm and Terraform patterns. Your goal will be to deliver high-performance storage solutions capable of supporting 10-50 GB/s per GPU node, with caching strategies tuned to push performance further.
What we offer
Together AI offers a collaborative work environment where innovation is encouraged. You will have the opportunity to work with a talented team dedicated to pushing the boundaries of AI infrastructure. We provide a hybrid working model, allowing you to balance your time between our Amsterdam office and remote work. We encourage you to apply even if your experience doesn't match every requirement, as we value diverse perspectives and backgrounds.
Similar Jobs You Might Like

Staff Engineer
Together AI is hiring a Staff Engineer to design and deliver multi-petabyte storage systems for AI workloads. You'll work with technologies like Kubernetes, Ceph, and Lustre to optimize high-performance storage solutions. This role requires expertise in distributed systems and storage architecture.

AI Engineer
Stacks is hiring a Staff AI Engineer to lead their AI and ML initiatives. You'll build the machine learning and data function from the ground up, focusing on transforming financial processes. This role requires expertise in AI and machine learning.

HPC Cluster Engineer
Nebius AI is seeking a Senior HPC Cluster Engineer to enhance and optimize their cutting-edge hyperscaler platform. You'll work with GPU computing and InfiniBand networks, focusing on performance tuning and automation. This role requires expertise in high-performance computing environments.

Systems Engineer
Nebius AI is seeking a Systems Engineer to support benchmarking of GPU platforms for machine learning and AI workloads. You'll work closely with hardware and development teams to evaluate GPU performance using technologies like CUDA. This position requires expertise in AI and deep learning frameworks.

IT Infrastructure Engineer
Nebius AI is seeking a Senior IT Infrastructure Engineer to support IT operations across multiple data centers globally. You'll work with cutting-edge technologies in cloud computing and infrastructure management. This role requires strong expertise in cloud environments and data operations.