at Google
Location
Sunnyvale, CA, USA
Compensation
$262k–$365k USD
Type
Full-time
Posted
4 days ago
Google Cloud’s mission is to make every business successful through AI by combining cutting-edge technology, infrastructure, and talent. AI/ML software engineers in Cloud bridge the gap between pioneering models and a massive product vehicle reaching billions. Our talent density and AI-powered tools drive rapid development, rooted in a culture of empowerment and a bias to action. In this role, you aren’t just building technology; you’re shaping the frontier of enterprise and driving the evolution of advanced models.
As a Staff Technical Lead, you will own and drive the end-to-end reliability, availability, and serviceability (RAS) for a groundbreaking, next-generation AI accelerator system. This is a unique opportunity for you to lead the reliability engineering efforts for a complex, large-scale hardware/software co-designed platform that will power future critical AI workloads across Google. You will be responsible for defining the reliability strategy, establishing best practices, and influencing a large cross-functional team of hardware, software, and silicon engineers to ensure this new system meets Google's stringent production standards. Your leadership will be instrumental in delivering a robust, resilient, and maintainable platform from concept through to full-scale deployment.
The AI and Infrastructure team is redefining what’s possible. We empower Google customers with breakthrough capabilities and insights by delivering AI and Infrastructure at unparalleled scale, efficiency, reliability and velocity. Our customers include Googlers, Google Cloud customers, and billions of Google users worldwide.
We're the driving team behind Google's groundbreaking innovations, empowering the development of our cutting-edge AI models, delivering unparalleled computing power to global services, and providing the essential platforms that enable developers to build the future. From software to hardware, our teams are shaping the future of world-leading hyperscale computing, with key teams working on the development of our TPUs, Vertex AI for Google Cloud, Google Global Networking, Data Center operations, systems research, and much more.
Individual pay is determined by factors including job-related skills, experience, and relevant education or training.