Job Details
Job Description
Weekly Hours: 40
Role Number: 200637211-3337
Summary
Do you get excited by building AI applications to enhance the evaluation of various Apple AI products? Our Evaluation organization is responsible for providing principled assessments across a diverse range of Apple features, from Search and Siri to the latest Apple Intelligence capabilities.
Within this critical function, our team specializes in leveraging advanced AI/ML techniques to enhance both the quality and efficiency of these comprehensive evaluations.
We are seeking a highly innovative and passionate Applied AI Scientist to develop cutting-edge AI/ML models for the automatic grading and quality assessment of our internal GenAI products.
Description
In this pivotal role, you will design and advance state-of-the-art autograder systems that evaluate the quality of various AI products at scale.
You will apply deep expertise in prompt engineering, foundation model adaptation, and evaluation methodology to build robust, trustworthy, and extensible autograders that assess product performance, user experience, and adherence to quality and safety standards.
You will then collaborate with product and annotation teams, evaluation data scientists, and autograder tooling engineers to deploy these autograders, directly impacting the quality and success of Apple’s next-generation AI-powered features.
Minimum Qualifications
Extensive experience with prompting techniques.
Deep understanding of GenAI models and 1+ year of industry experience in building or evaluating GenAI models.
Familiarity with LLMOps processes for deploying, monitoring and hillclimbing AI models in production environments.
Excellent analytical skills and judgement, capable of assessing data quality, diagnosing autograder limitations or biases, synthesizing findings into actionable insights, and communicating them clearly across teams.
Ownership mindset with the flexibility to take on whatever tasks—including annotation or operational work—are necessary to deliver results.
Preferred Qualifications
Experience in developing AI models specifically for quality assessment or automated feedback generation.
Familiarity with human annotation operations, sampling strategies, or subjective judgment evaluation.
Experience in designing human-in-the-loop evaluation workflows.
Familiarity with image quality evaluation.
Demonstrated passion for leveraging AI to improve work efficiency and scale.
Apple is an equal opportunity employer that is committed to inclusion and diversity. We seek to promote equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or other legally protected characteristics. Learn more about your EEO rights as an applicant (https://www.eeoc.gov/sites/default/files/2023-06/22-088_EEOC_KnowYourRights6.12ScreenRdr.pdf).