
About Amazon
The everything store and cloud computing leader
Key Highlights
- Headquartered in South Lake Union, Seattle, WA
- Over 1.5 million employees worldwide
- Leading cloud services through Amazon Web Services (AWS)
- Acquired Whole Foods, Twitch, and Ring
Amazon, headquartered in South Lake Union, Seattle, WA, is the world's largest online retailer and a leader in cloud computing through Amazon Web Services (AWS). With over 1.5 million employees globally, Amazon operates in various sectors, including AI with its Alexa devices and a vast marketplace k...
🎁 Benefits
Amazon offers competitive salaries, stock options, generous PTO policies, and comprehensive health benefits. Employees also have access to a learning ...
🌟 Culture
Amazon's culture is driven by customer obsession and a focus on innovation. The company encourages employees to think big and move fast, fostering an ...
Overview
Amazon is hiring a Senior Machine Learning Engineer to develop and optimize distributed training solutions for large-scale ML models. You'll work with AWS Trainium and frameworks like Hugging Face and TensorFlow. This position requires expertise in machine learning and distributed systems.
Job Description
Who you are
You have 5+ years of software engineering experience with a strong focus on machine learning and distributed systems. You have designed and implemented large-scale ML models and understand how to optimize performance across diverse architectures. Your expertise in Python and familiarity with frameworks such as TensorFlow and PyTorch enable you to tackle complex challenges in ML training.
You have a deep understanding of distributed training frameworks and experience extending and optimizing libraries such as FSDP and Hugging Face for performance on cloud-based systems. You work closely with chip architects and runtime engineers to deliver efficient solutions that meet customer needs.
What you'll do
In this role, you will be responsible for designing, implementing, and optimizing distributed training solutions for large-scale ML models running on AWS Trainium instances. You will focus on enhancing the performance of ML model training, including pre-training and post-training of large language models and multimodal models. Your work will involve developing mixed-precision and low-precision training techniques to improve efficiency and reduce costs.
You will collaborate with cross-functional teams, including ML researchers and AWS solution architects, to ensure that the solutions you develop are not only performant but also cost-effective. Your contributions will directly impact the capabilities of AWS Trainium, enabling customers to leverage advanced machine learning technologies in their applications.
What we offer
Amazon provides a comprehensive benefits package that includes health insurance, retirement plans, and generous paid time off. You will have the opportunity to work in a dynamic environment where innovation is encouraged, and your contributions will help shape the future of machine learning in the cloud. Join us and be part of a team that is dedicated to pushing the boundaries of technology and delivering impactful solutions to customers around the world.