Job Details
Job Description
Weekly Hours: 40
Role Number: 200661658-3760
Summary
We are a group of engineers who support training foundation models at Apple! We build infrastructure for training foundation models with general capabilities, such as understanding and generating text, images, speech, video, and other modalities, and we apply these models to Apple products. We are looking for engineers who are passionate about building systems that push the frontier of deep learning in scaling, efficiency, and flexibility, and that delight millions of users of Apple products.
Description
We are looking for an ML Engineer to join our ML Compute team to help improve the efficiency, scalability, and reliability of model training and inference workloads in the cloud. In this role, you will lead the integration of large-scale ML workloads with cloud infrastructure, working cross-functionally with ML engineers, infrastructure engineers, and researchers to optimize performance, improve system efficiency, and drive high utilization of accelerator resources.
Minimum Qualifications
5+ years of experience in software engineering, ML infrastructure, or related domains.
Hands-on experience with machine learning workflows, including training, evaluation, and inference at scale.
Proficiency in Python and experience with at least one major ML framework (e.g., PyTorch or JAX).
Experience with cloud-based infrastructure and distributed systems (e.g., containers, orchestration, storage, and networking).
Bachelor’s degree in Computer Science, Engineering, or a related field.
Preferred Qualifications
Experience working with accelerator-based systems (e.g., GPUs/TPUs), including performance tuning and debugging of ML workloads.
Hands-on experience with distributed training or inference at scale (e.g., data, model, or pipeline parallelism).
Experience optimizing large-scale ML systems, including bottleneck analysis across compute, memory, and networking.
Familiarity with profiling, tracing, and benchmarking tools for ML workloads (e.g., PyTorch Profiler, NVIDIA Nsight).
Experience building or operating ML infrastructure using containerization and orchestration frameworks (e.g., Docker, Kubernetes).
Advanced degree in Computer Science, Engineering, or a related field.