Job Details

Job Information

Job Title :

AIML - Sr Data Scientist, Evaluation

Job Code :

AWM-6946-AIML - Sr Data Scientist, Evaluation

Job Announced :

3/21/2026

Job Closed :

3/26/2026

Pay Rate:

Negotiable

Duration:

Permanent

Other Information

Organization Name:

Apple

Organization Url:

www.apple.com

Address :

Cupertino, CA, 95015, USA

City :

Cupertino

State :

California

Country :

United States

Zip Code :

95015

Job Description

Weekly Hours: 40

Role Number: 200646119-0836

Summary

Do you get excited by assessing LLM applications’ quality and driving the adoption of these applications?

Our Evaluation organization is responsible for providing principled assessments across a diverse range of Apple features, from Search and Siri to the latest Apple Intelligence capabilities. Our team specializes in building LLM-as-judge (i.e., autograder) systems and related tooling to improve both the quality and efficiency of these evaluations.

We are seeking a principal Data Scientist to own the end-to-end quality analysis of these autograders — from defining rigorous validation frameworks to driving adoption across feature teams. This is a high-impact, high-visibility role at the intersection of data science, AI evaluation, and product quality.

Description

Translate ambiguous quality concerns of the autograders into well-defined, measurable validation targets.
Partner closely with autograder developers and engineers to build scalable analytic frameworks that measure autograder quality, using both offline eval data and real-world user signals.
Extract meaningful insights from analysis and craft compelling, audience-tailored narratives to drive stakeholder alignment and autograder adoption.
Act as a bridge between the autograder team and feature development teams, leveraging deep domain knowledge to contextualize quality findings.

Minimum Qualifications

MS/PhD degree in Statistics, Data Science, Machine Learning, AI, or a related field.
8+ years of experience analyzing ML/LLM-based products.
Familiarity with image generation or image understanding models.
Proficiency in Python and a strong foundation in statistical analysis and quantitative modeling.
Proven ability to translate ambiguous business or product questions into well-scoped, actionable analysis goals and present complex findings clearly to both technical and non-technical audiences.

Preferred Qualifications

Experience in AI or ML model evaluation, quality measurement, or autograder development.
Experience working with post-ship user data and applying user behavioral signals to improve upstream model or feature quality.
Track record of designing scalable analysis frameworks that can be operationalized across multiple features or product lines.
Demonstrated ability to lead initiatives independently, with a strong sense of ownership and execution from ideation to delivery.

Apple is an equal opportunity employer that is committed to inclusion and diversity. We seek to promote equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or other legally protected characteristics. Learn more about your EEO rights as an applicant (https://www.eeoc.gov/sites/default/files/2023-06/22-088_EEOC_KnowYourRights6.12ScreenRdr.pdf).
