[Remote] Senior Research Scientist, Model Evaluation

Remote Full-time

Note: This is a remote job open to candidates in the USA.

Cohere is a company dedicated to scaling intelligence to serve humanity by training and deploying frontier models for AI systems. The Senior Research Scientist, Model Evaluation will create next-generation evaluation methods and infrastructure to measure large language model (LLM) progress, working cross-functionally to improve evaluation techniques.

Responsibilities
• Create ambitious new evaluation benchmarks that push the limits of what our models can accomplish.
• Work on highly cross-functional teams to translate model feedback into trustworthy, repeatable evaluations.
• Conduct research to advance the state of the art in LLM evaluation methods, including training LLM judges, refining LLM-based data synthesis pipelines, and improving evaluation efficiency.
• Build scalable and reusable tools for digging into model performance.

Skills
• You enjoy rapidly building prototypes that demonstrate the boundaries of what LLMs are capable of, and you have developed resources to measure those capabilities.
• You have spent dozens of hours reviewing complex data and LLM outputs to ensure high data quality.
• You are obsessive about rigorously measuring AI capabilities, and about making sure your measurements actually align with the capabilities you care about.
• You have strong software engineering skills.

Benefits
• An open and inclusive culture and work environment
• Work closely with a team on the cutting edge of AI research
• Weekly lunch stipend, in-office lunches and snacks
• Full health and dental benefits, including a separate budget to take care of your mental health
• 100% Parental Leave top-up for up to 6 months
• Personal enrichment benefits towards arts and culture, fitness and well-being, quality time, and workspace improvement
• Remote-flexible, with offices in Toronto, New York, San Francisco, London and Paris, as well as a co-working stipend
• 6 weeks of vacation (30 working days!)

Company Overview
• Cohere is an enterprise AI firm developing secure and private AI technology to address real-world business challenges. It was founded in 2019 and is headquartered in Toronto, Ontario, Canada, with a workforce of 201-500 employees.

Company H1B Sponsorship
• Cohere has a track record of offering H1B sponsorships, with 9 in 2025, 14 in 2024, 13 in 2023, 5 in 2022, and 2 in 2021. Please note that this does not guarantee sponsorship for this specific role.

Apply to this job

Apply Now

Similar Opportunities

[Remote] (JRFP)-Fellow- Junior Researcher (Code-EU2076)

Research Analyst: Focus is Social Media/Online Safety(6411)

Remote Hospice Triage RN (weekday eve -no Fridays + every-other-weekend) in Tampa, FL

Registered Nurse Intake Specialist- Remote

Primary Care Nurse RN or LPN - Hybrid Remote, NW Expressway

NICU Case Manager, RN - Remote in WA

Virtual RN Hybrid- 1 Day Onsite VRN, 2 Days Olive Branch Med Surg

Registered Nurse (Minimum Data Set Facilitator) in Tacoma, WA

Patient Care Supervisor – Non RN (Virtual Care Team) – 1.0 FTE in Minneapolis, MN

Methodist Health System – Triage Nurse RN – Council Bluffs, IA

Software Engineer, Android Core Product - Almaty, Kazakhstan

Coordinator, Community Management

Cyber Security Defense Analyst – (Entry Level) Jobs – bolthires Store

Privacy Officer

Senior ASP.NET Developer – Amazon Store

[Remote] Sales Development Representative

Experienced Financial Planning and Analysis Senior Manager – Global Sourcing, Supply Chain, and Sustainability Leadership

Experienced Female Remote Customer Service Representative – Empowering Women in the Pet Industry

Financial Controller | Remote | Los Angeles - US Working Hours | Russian & English Speaker

Remote Crisis Chat/Text Licensed Supervisor (LPC, LCSW, LMFT)
