
About Sesame
Affordable healthcare access without insurance hassles
Key Highlights
- Headquartered in New York, NY on Canal Street
- $76.1 million raised in Series B funding
- 101-200 employees, fostering a diverse workplace
- Marketplace model for both B2C and B2B healthcare
Sesame is a healthcare marketplace platform headquartered on Canal Street in New York, NY, that enables patients to access high-quality medical care at affordable self-pay prices. Backed by $76.1 million in Series B funding, the platform allows users to search, co...
🎁 Benefits
Employees enjoy a flexible vacation policy, comprehensive health care coverage options, and the opportunity to work in a fun, international environment...
🌟 Culture
Sesame fosters a unique culture focused on transparency and accessibility in healthcare, empowering patients to make informed decisions while simplifying...
Overview
Sesame is hiring an ML Model Serving Engineer to enhance their serving layer for LLM, speech, and vision models. You'll work with PyTorch and optimize machine learning models for high throughput and low latency. This position requires significant systems programming experience.
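To make "high throughput and low latency" concrete, here is a minimal micro-benchmark sketch of the kind a serving engineer might run before and after an optimization. The toy two-layer model, batch sizes, and iteration counts are illustrative assumptions, not anything from Sesame's stack.

```python
# Illustrative only: a toy latency/throughput micro-benchmark for a PyTorch
# model. The model, batch sizes, and timing loop are assumptions for the sketch.
import time
import torch

model = torch.nn.Sequential(
    torch.nn.Linear(512, 2048),
    torch.nn.ReLU(),
    torch.nn.Linear(2048, 512),
).eval()

@torch.inference_mode()
def benchmark(batch_size: int, iters: int = 50) -> tuple[float, float]:
    """Return (mean latency in ms, throughput in samples/sec) for one batch size."""
    x = torch.randn(batch_size, 512)
    # Warm up so one-time costs (allocator, autotuning) don't skew the numbers.
    for _ in range(5):
        model(x)
    start = time.perf_counter()
    for _ in range(iters):
        model(x)
    elapsed = time.perf_counter() - start
    latency_ms = elapsed / iters * 1000
    throughput = batch_size * iters / elapsed
    return latency_ms, throughput

for bs in (1, 8, 64):
    lat, tput = benchmark(bs)
    print(f"batch={bs:3d}  latency={lat:7.2f} ms  throughput={tput:9.1f} samples/s")
```

In practice this would run on the target hardware under realistic request mixes; larger batches typically trade per-request latency for aggregate throughput, which is exactly the balance this role is about.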
Job Description
Who you are
You are an expert in differentiable array computing frameworks, preferably PyTorch, with a strong background in optimizing machine learning models to serve reliably at high throughput and low latency. Your significant systems programming experience means you are comfortable with the internals of high-performance server systems, and you have a knack for modifying and extending LLM serving frameworks like vLLM and SGLang to leverage the latest techniques in model serving. You understand the importance of reducing model initialization times without sacrificing quality, and you are familiar with techniques like in-flight batching, caching, and custom kernels to speed up inference.
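As a rough illustration of the in-flight (continuous) batching idea mentioned above, here is a minimal, framework-free sketch: new requests join the running batch as soon as slots free up instead of waiting for the whole batch to drain. The Request class, step() function, and scheduling loop are hypothetical stand-ins; production frameworks such as vLLM and SGLang implement this far more efficiently, alongside KV caching and custom kernels.

```python
# Illustrative only: a toy "in-flight" (continuous) batching loop. The Request
# class and step() model are hypothetical stand-ins, not any real serving API.
import random
from dataclasses import dataclass, field

@dataclass
class Request:
    prompt: str
    max_new_tokens: int
    tokens: list[int] = field(default_factory=list)

    def done(self) -> bool:
        return len(self.tokens) >= self.max_new_tokens

def step(batch: list[Request]) -> None:
    """Pretend to run one forward pass that appends one token to each request."""
    for req in batch:
        req.tokens.append(random.randint(0, 50_000))

def serve(incoming: list[Request], max_batch: int = 4) -> list[Request]:
    active: list[Request] = []
    finished: list[Request] = []
    while incoming or active:
        # Admit new requests as soon as a slot frees up, rather than waiting
        # for the whole batch to finish (the key idea of in-flight batching).
        while incoming and len(active) < max_batch:
            active.append(incoming.pop(0))
        step(active)
        still_running = []
        for req in active:
            (finished if req.done() else still_running).append(req)
        active = still_running
    return finished

requests = [Request(f"prompt {i}", max_new_tokens=random.randint(2, 6)) for i in range(10)]
for req in serve(requests):
    print(req.prompt, "->", len(req.tokens), "tokens generated")
```

The payoff is utilization: short requests retire early and their slots are immediately refilled, so the accelerator stays busy instead of idling until the slowest request in a static batch completes.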
What you'll do
In this role, you will turbocharge Sesame's serving layer, which consists of a variety of LLM, speech, and vision models. You will partner closely with ML infrastructure and training engineers to build a fast, cost-effective, accurate, and reliable serving layer that powers a new consumer product category. Your responsibilities will include identifying opportunities to produce faster models without sacrificing quality and implementing optimizations that enhance the overall performance of the serving layer. You will also collaborate with the training team to ensure that the models are not only efficient but also maintain high standards of accuracy and reliability.
What we offer
Sesame provides a full-time position with competitive benefits, including 401k matching, 100% employer-paid health, vision, and dental benefits, unlimited PTO, and flexible spending account matching. You will be part of a team that is shaping the future of lifelike computers, working alongside founders from Oculus and Ubiquity6, and leaders from Meta, Google, and Apple. We encourage you to apply even if your experience doesn't match every requirement, as we value diverse perspectives and backgrounds.
Similar Jobs You Might Like

Software Engineering
Databricks is hiring a Senior Software Engineer for their Model Serving team to design and build systems for deploying AI/ML models. You'll work with technologies like Python and focus on scalability and reliability in San Francisco.

Software Engineering
Anyscale is seeking a Software Engineer for their Model Serving Infrastructure team to develop high-performance machine learning serving systems. You'll work with Python and distributed systems to democratize AI applications. This role requires expertise in machine learning and a passion for scalable computing.

Staff Engineer
Databricks is hiring a Staff Software Engineer for their Model Serving team to design and build systems for AI/ML model deployment. You'll work with Python and cloud technologies to ensure high-throughput, low-latency inference. This position requires significant experience in software engineering and machine learning.

Engineering Manager
Databricks is seeking a Senior Engineering Manager to lead the Model Serving team, focusing on product experience and foundational infrastructure. You'll work with technologies like Python, Java, and Kubernetes to enhance AI/ML model deployment. This role requires strong leadership and technical expertise.

Machine Learning Engineer
Uber is hiring a Senior ML Engineer to develop and productionize machine learning models for dynamic security systems. You'll work with Python, TensorFlow, and Kubernetes in San Francisco. This position requires 5+ years of experience in ML and security contexts.