Protege

About Protege

Mentorship for the next generation of talent

🏢 Career Planning · 👥 201-500 · 📅 Founded 2021 · 📍 Woburn, Massachusetts, United States

Key Highlights

  • Headquartered in Woburn, Massachusetts
  • Team size of 201-500 employees
  • Focus on Music, Venture Capital, Content, Tech, and Entertainment
  • Offers mentorship from industry experts

Protege is a unique platform based in Woburn, Massachusetts, connecting aspiring talent with industry experts in Music, Venture Capital, Content, Tech, and Entertainment. With a team of over 300 professionals, Protege offers mentorship and guidance to help individuals navigate their careers in these...

🎁 Benefits

Protege provides competitive salaries, equity options, flexible PTO, and opportunities for remote work, ensuring a healthy work-life balance for all e...

🌟 Culture

At Protege, the culture emphasizes mentorship and collaboration, fostering an environment where employees can learn directly from industry leaders and...


Senior Machine Learning Researcher

Protege · Remote


Job Description

Company Overview:

We are building Protege to solve the biggest unmet need in AI: getting access to the right training data. Today the process is time-intensive, incredibly expensive, and often ends in failure. The Protege platform facilitates the secure, efficient, and privacy-centric exchange of AI training data.

Solving AI’s data problem is a generational opportunity. We’re backed by world-class investors and already power partnerships with some of the most ambitious teams in AI. The company that succeeds will be one of the largest in AI, and in tech.

We’re a lean, fast-moving, high-trust team of builders who are obsessed with velocity and impact. Our culture is built for people who thrive on ambiguity, own outcomes, and want to shape the future of data and AI.

Role Overview:

Data is the foundation of AI performance, and we believe model quality starts with data quality. You’ll be at the heart of shaping how we curate, assess, and prepare the training data that powers real-world AI systems.

We’re seeking a Senior Member of the Core Data Team / Principal Scientist to lead the evaluation and optimization of large-scale datasets used to train state-of-the-art AI models. In this role, you’ll help define what "high-quality data" means in practice, using statistical, computational, and ML-driven methods to ensure our data is diverse, representative, and high-impact. You’ll work closely with research and engineering teams to improve model performance through better data. This role is ideal for someone with a PhD in machine learning, CS, or a related applied field who is passionate about the role of data in AI training and excited to advance Protege’s mission to become the ubiquitous platform for AI training data.

Key Responsibilities:

  • Design and apply statistical and machine learning methods to curate, filter, and enrich large-scale unstructured datasets

  • Develop frameworks to assess data diversity, duplication, and informativeness, and design statistical approaches to de-risk training datasets

  • Collaborate with model training teams to identify data bottlenecks and optimize dataset performance, with an emphasis on working with both large foundation-model developers and smaller startups

  • Provide leadership on data quality strategy and shape internal best practices

  • Evaluate external datasets for integration, focusing on scalability, quality, and relevance to model performance, and help build data scorecards

  • Contribute to research and development of tools that automate data preprocessing and validation

About You:

  • PhD, or a Master's degree plus 4+ years of industry experience, in machine learning, economics, mathematics, engineering, computer science, statistics, or a related quantitative field

  • Strong understanding of AI model training pipelines, including preprocessing and evaluation

  • Experience working with large, unstructured datasets, especially text

  • Background in statistical analysis, bias detection, and data validation

  • Ability to identify high-impact problems and drive solutions independently

Bonus if you have these attributes:

  • Experience with synthetic data generation or augmentation strategies

  • Publications or open-source contributions in data-centric AI or related areas

  • Experience developing evaluation frameworks or performance metrics for training data

  • Cross-functional collaboration with product, infrastructure, or partnership teams
