
About Amazon
The everything store and cloud computing leader
Key Highlights
- Headquartered in South Lake Union, Seattle, WA
- Over 1.5 million employees worldwide
- Leading cloud services through Amazon Web Services (AWS)
- Acquired Whole Foods, Twitch, and Ring
Amazon, headquartered in South Lake Union, Seattle, WA, is the world's largest online retailer and a leader in cloud computing through Amazon Web Services (AWS). With over 1.5 million employees globally, Amazon operates in various sectors, including AI with its Alexa devices and a vast marketplace k...
🎁 Benefits
Amazon offers competitive salaries, stock options, generous PTO policies, and comprehensive health benefits. Employees also have access to a learning ...
🌟 Culture
Amazon's culture is driven by customer obsession and a focus on innovation. The company encourages employees to think big and move fast, fostering an ...
Overview
Amazon is hiring a Senior Machine Learning Engineer for the AWS Neuron Distributed Training team. You'll develop and optimize distributed training solutions for large-scale ML models using Python and libraries such as DeepSpeed and NeMo. This role requires expertise in machine learning and cloud technologies.
Job Description
Who you are
You have 5+ years of experience in machine learning engineering, particularly with distributed training of large models. Your expertise includes working with frameworks like PyTorch and JAX, and you understand the intricacies of optimizing models for performance on custom silicon. You are proficient in Python and have experience with libraries such as DeepSpeed and NeMo, which are essential for building efficient distributed training solutions.
You thrive in collaborative environments, working alongside chip architects and compiler engineers to create innovative solutions. Your strong analytical skills allow you to tackle complex technical challenges, and you have a proven track record of delivering results that drive significant impact. You are passionate about advancing the field of machine learning and are eager to contribute to cutting-edge projects.
Desirable
Experience with AWS services and cloud-based solutions is a plus. Familiarity with large language models like GPT and Llama, as well as vision transformers, will help you excel in this role. You are also open to learning new technologies and methodologies that can enhance your work and the team's output.
What you'll do
In this role, you will lead efforts to integrate distributed training support into PyTorch and JAX using the Neuron compiler and runtime stacks. Your primary responsibility will be to optimize machine learning models for peak performance on AWS custom silicon. You will collaborate closely with cross-functional teams to develop and enable a wide variety of ML model families, including massive-scale models.
You will be responsible for performance tuning and enabling distributed training solutions that can handle the demands of large-scale machine learning tasks. Your work will directly contribute to the success of AWS Neuron, helping to deliver innovative cloud solutions that address complex challenges. You will also mentor junior engineers, sharing your knowledge and expertise to foster a culture of learning and growth within the team.
What we offer
Amazon provides a dynamic work environment where you can make a significant impact on the future of machine learning and cloud computing. You will have access to cutting-edge technologies and the opportunity to work on projects that push the boundaries of what is possible. We offer competitive compensation packages, including equity and comprehensive benefits, to ensure that you are well-supported in your role. Join us and be part of a team that is changing the world through technology.