Exa

About Exa

Transforming AI queries with precise search infrastructure

🏢 Tech  👥 21-100 employees  📍 Lower Haight, San Francisco, CA  💰 $112.2m
B2C, B2B, Artificial Intelligence

Key Highlights

  • Raised $112.2 million in Series A funding
  • Headquartered in Lower Haight, San Francisco, CA
  • 21-100 employees focused on AI search technology
  • Embeddings-based search engine designed to reduce AI 'hallucinations'

Exa is revolutionizing AI search infrastructure with its embeddings-based search engine, designed to enhance the accuracy of AI model responses by filtering the internet for precise knowledge. Headquartered in Lower Haight, San Francisco, Exa has raised $112.2 million in funding through several succ...

🎁 Benefits

Exa offers competitive salaries, equity options, and a flexible remote work policy. Employees enjoy generous PTO, parental leave, and a budget for pro...

🌟 Culture

Exa fosters a culture focused on innovation and precision in AI search technology. The team values collaboration and is committed to building a unique...

Overview

Exa is seeking a Data Engineer to architect and build the data infrastructure for its search engine. You'll work with technologies like Rust, Kafka, and Flink to develop large-scale data processing systems. This role requires a deep understanding of lakehouse architectures and distributed data systems.

Job Description

Who you are

You have a strong background in data engineering, with hands-on experience building and operating large-scale distributed data processing pipelines. You know lakehouse architectures such as Delta Lake, Iceberg, and Hudi well enough to judge when each is the right choice. You are familiar with streaming data systems like Kafka and Flink, and you have run Ray, Spark, or ClickHouse at production scale. Your obsessive focus on reliability means the systems you build are robust and do not require late-night interventions.

You thrive in environments where you can design systems that scale to hundreds of petabytes, and you are excited about the opportunity to build data pipelines at a scale that most companies only dream about. Your background in GPU-accelerated data processing, such as RAPIDS or cuDF, is a bonus that enhances your ability to contribute to Exa's ambitious projects.

Desirable

Experience with Lance or other vector-native storage formats is a plus, as is a background in GPU-accelerated data processing. You are eager to tackle complex challenges and are motivated by the impact your work will have on AI applications.

What you'll do

In this role, you will architect and build the data infrastructure that powers Exa's search engine, which serves every AI application. You will design a lakehouse architecture capable of handling over 100 petabytes of web crawl data, ensuring that the data is accessible and reliable for various applications. You will build streaming pipelines that process billions of documents per day for real-time indexing, contributing to the efficiency and effectiveness of the search engine.

You will also be responsible for architecting the data layer for the embedding training infrastructure on Ray, which is crucial for training state-of-the-art embedding models. Your work will involve scaling the ClickHouse deployment to handle analytical queries across petabytes of search logs, ensuring that the system remains performant and responsive.

Collaboration is key in this role, as you will work closely with other engineers to ensure that the data infrastructure meets the needs of the entire team. You will have the autonomy to design systems that not only meet current demands but also anticipate future growth and scalability.

What we offer

Exa provides a dynamic work environment where innovation is encouraged, and your contributions will directly impact the company's success. You will have access to a $5M H200 GPU cluster that regularly lights up tens of thousands of machines, allowing you to work with cutting-edge technology. We are happy to sponsor international candidates, including those on STEM OPT, OPT, H-1B, O-1, and E-3 visas. Join us in building a search engine from scratch and be part of a team that is shaping the future of AI applications.

