Job Details

Job Information

AIML - Sr Data Scientist, Evaluation
AWM-6946-AIML - Sr Data Scientist, Evaluation
3/21/2026
3/26/2026
Negotiable
Permanent

Other Information

www.apple.com
Cupertino, CA, 95015, USA
Cupertino
California
United States
95015

Job Description

Weekly Hours: 40

Role Number: 200646119-0836

Summary

Do you get excited about assessing the quality of LLM applications and driving their adoption?

Our Evaluation organization is responsible for providing principled assessments across a diverse range of Apple features, from Search and Siri to the latest Apple Intelligence capabilities. Our team specializes in building LLM-as-judge (i.e., autograder) systems and related tooling to improve both the quality and efficiency of these evaluations.

We are seeking a principal Data Scientist to own the end-to-end quality analysis of these autograders — from defining rigorous validation frameworks to driving adoption across feature teams. This is a high-impact, high-visibility role at the intersection of data science, AI evaluation, and product quality.

Description

  • Translate ambiguous quality concerns of the autograders into well-defined, measurable validation targets.

  • Partner closely with autograder developers and engineers to build scalable analytic frameworks to measure autograder quality, using both offline eval data and real-world user signals.

  • Extract meaningful insights from analysis and craft compelling, audience-tailored narratives to drive stakeholder alignment and autograder adoption.

  • Act as a bridge between the autograder team and feature development teams, leveraging deep domain knowledge to contextualize quality findings.

Minimum Qualifications

  • MS/PhD degree in Statistics, Data Science, Machine Learning, AI, or a related field.

  • 8+ years of experience analyzing ML/LLM-based products.

  • Familiarity with image generation or image understanding models.

  • Proficiency in Python and a strong foundation in statistical analysis and quantitative modeling.

  • Proven ability to translate ambiguous business or product questions into well-scoped, actionable analysis goals and to present complex findings clearly to both technical and non-technical audiences.

Preferred Qualifications

  • Experience in AI or ML model evaluation, quality measurement, or autograder development.

  • Experience working with post-ship user data and applying user behavioral signals to improve upstream model or feature quality.

  • Track record of designing scalable analysis frameworks that can be operationalized across multiple features or product lines.

  • Demonstrated ability to lead initiatives independently, with a strong sense of ownership and execution from ideation to delivery.

Apple is an equal opportunity employer that is committed to inclusion and diversity. We seek to promote equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or other legally protected characteristics. Learn more about your EEO rights as an applicant (https://www.eeoc.gov/sites/default/files/2023-06/22-088_EEOC_KnowYourRights6.12ScreenRdr.pdf).
