
About Genmo
Your AI partner for creative content generation
Key Highlights
- Raised $15 million in funding from leading investors
- Headquartered in San Francisco, CA
- Focus on generative AI for creative industries
- Supports creators in entertainment and marketing sectors
Genmo is a pioneering AI company that provides a creative copilot tool enabling users to generate images, videos, and 3D models through advanced generative models. With a focus on Creative General Intelligence, Genmo empowers creators across various industries, including entertainment and marketing,...
🎁 Benefits
Genmo offers competitive salaries, equity options, flexible remote work policies, and generous PTO to support work-life balance....
🌟 Culture
Genmo fosters a culture of innovation and creativity, encouraging employees to experiment with AI technologies while promoting a collaborative environ...
Overview
Genmo is seeking a GPU Performance Engineer to optimize their H100 infrastructure for video generation. You'll leverage advanced profiling tools and write high-performance CUDA kernels to achieve significant speedups. This role requires 5+ years of systems programming experience.
Job Description
Who you are
You have a Bachelor's or Master's degree in Computer Science, Electrical Engineering, or a related field, and you bring over 5 years of systems programming experience. Your expertise lies in performance optimization, and you thrive on squeezing every last FLOP from GPU infrastructure. You are passionate about microsecond optimizations and enjoy pushing hardware to its theoretical limits.
You are proficient with advanced profiling tools such as Nsight Systems and nvprof, and you have experience writing high-performance CUDA and Triton kernels for critical model operations. Your understanding of GPU workloads allows you to reduce cold start latency and tune memory access patterns effectively.
You have a collaborative spirit, working closely with ML engineers to optimize model implementations and debug performance issues across the full stack from application to hardware. Your ability to implement custom memory pooling and allocation strategies showcases your innovative approach to performance challenges.
What you'll do
In this role, you will be the performance optimization expert at Genmo, focusing on maximizing the efficiency of our H100 infrastructure. You will profile and optimize GPU workloads, ensuring that our model serving stack operates at peak performance. Your responsibilities will include writing custom CUDA and Triton kernels, as well as reducing cold start latency from seconds to milliseconds.
You will collaborate with ML engineers to enhance model implementations, debug performance issues, and implement custom memory pooling strategies. Your work will directly contribute to achieving 5-10x speedups in our infrastructure, making a significant impact on our video generation capabilities.
You will also share optimization techniques and foster a performance culture across teams, ensuring that best practices are adopted throughout the organization. Your role will be pivotal in shaping the future of AI at Genmo, as you push the boundaries of what's possible in video generation.
What we offer
At Genmo, you will be part of a cutting-edge research lab dedicated to advancing AI technology. We offer a collaborative work environment where innovation is encouraged, and your contributions will be valued. You will have the opportunity to work with state-of-the-art models and infrastructure, making a real impact in the field of AI.
We believe in supporting our employees' growth and development, providing opportunities for continuous learning and professional advancement. Join us in shaping the future of AI and unlocking the potential of video generation.