
About Amazon
The everything store and cloud computing leader
Key Highlights
- Headquartered in South Lake Union, Seattle, WA
- Over 1.5 million employees worldwide
- Leading cloud services through Amazon Web Services (AWS)
- Acquired Whole Foods, Twitch, and Ring
Amazon, headquartered in South Lake Union, Seattle, WA, is the world's largest online retailer and a leader in cloud computing through Amazon Web Services (AWS). With over 1.5 million employees globally, Amazon operates in various sectors, including AI with its Alexa devices and a vast marketplace k...
🎁 Benefits
Amazon offers competitive salaries, stock options, generous PTO policies, and comprehensive health benefits. Employees also have access to a learning ...
🌟 Culture
Amazon's culture is driven by customer obsession and a focus on innovation. The company encourages employees to think big and move fast, fostering an ...
Overview
Amazon is hiring a Machine Learning Engineer for the AWS Neuron team to develop and optimize distributed training solutions for large-scale machine learning models. You'll work with technologies like Python, AWS, and PyTorch. This position requires experience in training large models and performance tuning.
Job Description
Who you are
You have a strong background in software engineering with a focus on machine learning applications. Your experience includes developing and tuning large-scale ML models, particularly in cloud environments. You are proficient in Python and have hands-on experience with distributed training libraries such as DeepSpeed and FSDP. You have a robust understanding of AWS services, especially as they relate to machine learning, and you are comfortable collaborating with cross-functional teams, including chip architects and compiler engineers.
You thrive in an inclusive team culture and appreciate the value of diverse perspectives in driving innovation. Your ability to communicate complex technical concepts clearly and effectively makes you a valuable team member. You are passionate about optimizing performance and efficiency in machine learning applications, and you are eager to contribute to the development of cutting-edge technologies.
What you'll do
In this role, you will develop, enable, and performance-tune a variety of machine learning model families, including large language models and vision transformers. You will collaborate closely with the distributed training team to integrate support for distributed training into frameworks like PyTorch and JAX, using the Neuron compiler and runtime stacks. Your work will directly affect how efficiently ML models run on AWS Trainium.
You will lead efforts to tune these models for maximum performance, working alongside chip architects and runtime engineers to design and build distributed training solutions. Your expertise will help shape the future of machine learning applications within AWS, and you will play a key role in driving the success of the AWS Neuron initiative.
What we offer
At Amazon, we offer a competitive salary range of $165,200.00 - $223,600.00 annually, along with comprehensive benefits including health insurance, retirement plans, and paid time off. You will be part of a dynamic team that values innovation and collaboration, and you will have opportunities for professional growth and development in a supportive environment. Join us in shaping the future of machine learning at AWS.
Similar Jobs You Might Like

Machine Learning Engineer
Amazon is hiring a Senior Machine Learning Engineer to work on the AWS Neuron SDK, focusing on accelerating deep learning and GenAI workloads. You'll utilize technologies like AWS, PyTorch, and JAX to optimize performance on custom ML accelerators. This position requires expertise in machine learning and deep learning frameworks.

Machine Learning Engineer
Amazon is hiring a Senior Machine Learning Engineer to work on the AWS Neuron SDK, which accelerates deep learning and GenAI workloads. You'll utilize AWS, PyTorch, and JAX to optimize performance for machine learning models. This position requires expertise in AI/ML and experience with custom hardware accelerators.

Machine Learning Engineer
Amazon is hiring a Machine Learning Engineer for the Annapurna Labs team to develop the AWS Neuron SDK, which accelerates deep learning and GenAI workloads. You'll work with technologies like PyTorch and AWS to optimize performance on custom ML accelerators.

Machine Learning Engineer
Amazon is hiring a Machine Learning Engineer II to work on AWS Neuron, focusing on AI/ML and model inference. You'll utilize technologies like PyTorch and JAX to accelerate deep learning workloads. This position requires experience in machine learning and AWS.

Software Development Engineer
Amazon is hiring a Software Development Engineer to work on AWS Neuron, focusing on accelerating deep learning and GenAI workloads. You'll utilize skills in AWS, Python, and PyTorch to enhance machine learning performance. This role requires experience in software development and machine learning.