Reflection

About Reflection

Unlocking knowledge with AI for smarter organizations

🏢 Tech · 👥 11-50 · 📍 Brooklyn, New York, United States

Key Highlights

  • Headquartered in Brooklyn, New York
  • AI-powered platform using natural language processing
  • Focused on eliminating information silos
  • Team size of 11-50 employees

ReflectionAI, headquartered in Brooklyn, New York, provides an AI-driven knowledge management platform that leverages natural language processing to transform unstructured information from meetings, documents, and conversations into a searchable knowledge base. With a focus on enhancing productivity...

🎁 Benefits

Employees at ReflectionAI enjoy competitive salaries, equity options, flexible remote work policies, and generous PTO to maintain a healthy work-life ...

🌟 Culture

ReflectionAI fosters a culture of innovation and collaboration, encouraging employees to contribute ideas and solutions while prioritizing work-life b...

Skills & Technologies

Overview

Reflection is seeking a Member of Technical Staff - Data Ingestion Engineer to build and operate large-scale data ingestion systems. You'll work with technologies like Apache Airflow and AWS to improve the quality of data used to train AI models. This role requires experience in data engineering and distributed systems.

Job Description

Who you are

You have a strong background in data engineering, with experience building and operating large-scale data ingestion systems. Your expertise in Python and SQL allows you to manipulate and analyze data effectively, ensuring high-quality datasets for AI training. You are familiar with tools like Apache Airflow, which you have used to streamline data workflows and improve efficiency. You thrive in collaborative environments, working closely with researchers and engineers to understand data needs and optimize ingestion processes. You are comfortable running experiments to evaluate different data acquisition strategies and adept at analyzing the results to drive improvements. You have a keen eye for detail, identifying gaps and redundancies in ingested data to enhance overall data quality.
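
As a loose illustration of the kind of Airflow workflow this describes (not taken from Reflection's stack; the DAG name, task names, and callables below are hypothetical), a crawl-extract-load pipeline might be wired up as a simple DAG:

```python
# Illustrative only: a minimal Airflow DAG for a crawl -> extract -> load pipeline.
# All names here are hypothetical, not drawn from the job posting.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def crawl_sources(**context):
    # Placeholder: fetch raw pages from configured sources.
    return ["raw/page_001.html", "raw/page_002.html"]


def extract_records(**context):
    # Placeholder: parse raw pages into structured records.
    paths = context["ti"].xcom_pull(task_ids="crawl_sources")
    return [{"source": p, "text": "..."} for p in paths]


def load_dataset(**context):
    # Placeholder: write structured records to the training dataset store.
    records = context["ti"].xcom_pull(task_ids="extract_records")
    print(f"loaded {len(records)} records")


with DAG(
    dag_id="ingestion_example",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # use schedule_interval on Airflow < 2.4
    catchup=False,
) as dag:
    crawl = PythonOperator(task_id="crawl_sources", python_callable=crawl_sources)
    extract = PythonOperator(task_id="extract_records", python_callable=extract_records)
    load = PythonOperator(task_id="load_dataset", python_callable=load_dataset)

    crawl >> extract >> load
```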

Desirable

Experience with cloud platforms such as AWS is a plus, as it enables you to leverage scalable infrastructure for data processing. Familiarity with distributed systems and web crawling techniques will further enhance your ability to build robust ingestion pipelines. You are open to learning new technologies and methodologies, continuously seeking ways to improve your skills and contribute to the team's success.
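
To give a flavor of the web crawling side (purely illustrative; the user agent, URL, and helper functions below are invented and not part of this role), a polite fetcher might check robots.txt and rate-limit its requests:

```python
# Illustrative sketch of a polite fetcher: respects robots.txt and rate-limits.
# The seed URL and user agent below are hypothetical.
import time
import urllib.robotparser
from urllib.parse import urlparse

import requests

USER_AGENT = "example-ingestion-bot/0.1"


def allowed(url: str) -> bool:
    """Check robots.txt before fetching a URL."""
    parts = urlparse(url)
    rp = urllib.robotparser.RobotFileParser()
    rp.set_url(f"{parts.scheme}://{parts.netloc}/robots.txt")
    rp.read()
    return rp.can_fetch(USER_AGENT, url)


def fetch(url: str, delay: float = 1.0) -> str | None:
    """Fetch a page if allowed, pausing between requests."""
    if not allowed(url):
        return None
    resp = requests.get(url, headers={"User-Agent": USER_AGENT}, timeout=10)
    time.sleep(delay)  # crude rate limiting between requests
    return resp.text if resp.ok else None


if __name__ == "__main__":
    page = fetch("https://example.com/")
    print("fetched" if page else "skipped")
```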

What you'll do

In this role, you will be responsible for building and operating the ingestion systems that transform large-scale data sources into structured datasets for AI model training. You will work on web crawling, data extraction, and dataset delivery, ensuring that the data collected is reliable and well-structured. You will collaborate with the pre-training and data quality teams to close the loop between data collection and model performance, iterating quickly based on measurable impact. Your work will involve running experiments to evaluate different crawling strategies and extraction methods, analyzing the ingested data to identify areas for improvement. You will also be tasked with maintaining and optimizing existing ingestion systems, ensuring they operate efficiently and effectively.
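
To sketch what closing the loop between ingestion and data quality could look like in practice (the strategy names, threshold, and metrics below are invented for illustration), one might compare two extraction methods on record yield and duplicate rate:

```python
# Illustrative sketch: compare two hypothetical extraction strategies by
# record yield and duplicate rate. Strategy names and sample data are invented.
import hashlib
from collections import Counter


def extract_naive(doc: str) -> list[str]:
    # Hypothetical baseline: split on blank lines.
    return [chunk.strip() for chunk in doc.split("\n\n") if chunk.strip()]


def extract_filtered(doc: str) -> list[str]:
    # Hypothetical variant: drop very short chunks as likely boilerplate.
    return [c for c in extract_naive(doc) if len(c) > 20]


def evaluate(strategy, docs: list[str]) -> dict:
    records = [r for d in docs for r in strategy(d)]
    hashes = [hashlib.sha1(r.encode()).hexdigest() for r in records]
    dupes = sum(n - 1 for n in Counter(hashes).values() if n > 1)
    return {
        "records": len(records),
        "duplicate_rate": dupes / len(records) if records else 0.0,
    }


if __name__ == "__main__":
    sample_docs = ["Header\n\nSome body text long enough to keep.\n\nHeader"]
    for name, fn in [("naive", extract_naive), ("filtered", extract_filtered)]:
        print(name, evaluate(fn, sample_docs))
```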

What we offer

At Reflection, we provide a supportive and inclusive work environment where you can thrive. We offer competitive compensation and benefits, including fully paid parental leave and financial support for family planning. Our team enjoys a healthy work-life balance with generous paid time off and relocation support. You will have opportunities to connect with teammates through daily lunches and regular team celebrations, fostering a strong sense of community within the company. Join us in our mission to build open superintelligence and make it accessible to all.

Similar Jobs You Might Like

Mithrl

Data Engineer

Mithrl 📍 San Francisco - On-Site

Mithrl is hiring a Data Engineer to build and own an AI-powered ingestion and normalization pipeline for scientific data. You'll work with Python, SQL, and Airflow to transform messy biological data into clean datasets. This role requires experience in data engineering and cloud technologies.

🏛️ On-Site · Mid-Level
1 month ago
Reflection

Data Quality Engineer

Reflection 📍 San Francisco - On-Site

Reflection is hiring a Data Quality Engineer to ensure high standards for data used in AI model training and evaluation. You'll work closely with research teams to operationalize quality standards and improve model performance. This role requires a strong engineering background and a deep curiosity about data quality.

🏛️ On-Site · Mid-Level
1 month ago
Reflection

Software Engineering

Reflection 📍 San Francisco - On-Site

Reflection is hiring a Member of Technical Staff - Software Engineer to build core software systems and tools for AI research and production. You'll work with technologies like Python, Java, and C++ in San Francisco.

🏛️ On-Site · Mid-Level
4 months ago
Cohere

Data Engineer

Cohere 📍 Toronto - Hybrid

Cohere is hiring a Member of Technical Staff specializing in Data Engineering to develop data pipelines for advanced language models. You'll work with technologies like Airflow and Apache Spark, focusing on data ingestion and optimization. This role requires experience in data management and engineering.

🏢 Hybrid · Mid-Level
2 months ago