
About Cohere
AI solutions built for enterprise trust and security
Key Highlights
- Headquartered in Grange Park, Toronto, ON
- $1.5 billion in funding from top investors
- Clients include Royal Bank of Canada, Fujitsu, and Oracle
- Focus on AI solutions for regulated industries
Cohere, headquartered in Grange Park, Toronto, ON, specializes in enterprise-grade AI solutions tailored for regulated industries such as banking and telecom. With $1.5 billion in funding, Cohere has secured contracts with major clients including Royal Bank of Canada, Fujitsu, and Oracle, providing ...
🎁 Benefits
Cohere offers comprehensive benefits including 100% coverage for health, dental, and vision insurance premiums, a $2,000 annual education benefit, six...
🌟 Culture
Cohere's culture emphasizes security and trust in AI adoption, focusing on enterprise needs rather than consumer trends. The company prioritizes a sup...
Overview
Cohere is hiring a Senior ML Systems Engineer in London to build and maintain the training framework for large-scale language models. You'll work with technologies such as Python, TensorFlow, and Kubernetes. This position requires significant experience in machine learning systems.
Job Description
Who you are
You have 5+ years of experience in machine learning systems engineering, with a strong focus on building and maintaining frameworks for large-scale model training. Your expertise in Python and familiarity with machine learning libraries like TensorFlow and PyTorch enable you to contribute effectively to complex projects. You understand distributed systems and have experience working with high-performance computing (HPC) infrastructure, which allows you to design scalable solutions that meet the demands of modern AI applications.
You are comfortable working across the full stack of ML systems, from data processing to model deployment. Your ability to collaborate with researchers and engineers ensures that you can translate innovative ideas into practical tools that enhance model training efficiency. You thrive in environments that require quick problem-solving and adaptability, and you are committed to delivering high-quality results that drive value for customers.
Desirable
Experience with Kubernetes and Docker is a plus, as these tools are used to manage containerized applications in a distributed environment. Familiarity with cloud platforms such as AWS or GCP is also beneficial, as you will work with cloud-based resources to scale training processes.
What you'll do
In this role, you will build and own the training framework responsible for large-scale language model training. You will design distributed training abstractions that optimize data flow and processing efficiency, ensuring that our models can be trained quickly and reliably. Your work will involve collaborating with cross-functional teams to connect research ideas to thousands of GPUs, enabling rapid experimentation and deployment of new features.
You will also be responsible for maintaining and evolving the core components of our training infrastructure, ensuring that they remain robust and scalable as our models grow in complexity. Your contributions will directly impact the capabilities of our models and the experiences they provide to users.
What we offer
Cohere provides a supportive and inclusive work environment where you can thrive. We offer a competitive salary and benefits package, including a generous vacation policy and parental leave top-up. Our commitment to mental health and personal enrichment ensures that you have the resources you need to maintain a healthy work-life balance. With offices in London, Toronto, New York, San Francisco, and Paris, we also offer flexible remote work options to accommodate your lifestyle.