Technical Program Manager II, Infrastructure Incident Readiness, Google Cloud
Company: Google
Location: Atlanta
Posted on: April 1, 2026
Job Description:
Note: By applying to this position, you will have
an opportunity to share your preferred working location from the
following: Atlanta, GA, USA; Council Bluffs, IA, USA; Ridgeville,
SC, USA; Clarksville, TN, USA; Columbus, OH, USA; New Albany, OH,
USA; Midlothian, TX, USA; Red Oak, TX, USA; The Dalles, OR, USA;
Fort Wayne, IN, USA; Reston, VA, USA; Las Vegas, Nevada, USA;
Lincoln, NE, USA; Kansas City, MO, USA; Moncks Corner, SC 29461,
USA; Phoenix, AZ, USA; Pryor Creek, OK 74361, USA; Reno, NV, USA;
Bridgeport, AL, USA.

Minimum qualifications:
- Bachelor's degree in a technical field, or equivalent practical experience.
- 2 years of experience in program management.
- Experience in data center power and cooling infrastructure incident readiness.
- Experience with root cause analysis.
- Ability to travel up to 30% of the time.

Preferred qualifications:
- Bachelor's degree in Mechanical or Electrical Engineering.
- 2 years of experience managing cross-functional or cross-team projects.
- Experience in data center operations or similar mission-critical environments.
- Experience working cross-functionally with technical and non-technical teams.
- Ability to demonstrate enthusiasm for teaching or leading multi-day training events.
- Excellent public speaking skills.

About the job

A problem
isn’t truly solved until it’s solved for all. That’s why Googlers
build products that help create opportunities for everyone, whether
down the street or across the globe. As a Technical Program Manager
at Google, you’ll use your technical expertise to lead
multi-disciplinary projects from start to finish. You’ll work with
stakeholders to plan requirements, identify risks, manage project
schedules, and communicate clearly with cross-functional partners
across the company. You're equally comfortable explaining your
team's analyses and recommendations to executives as you are
discussing the technical tradeoffs in product development with
engineers.

As the Infrastructure Incident Readiness Program Lead, you will own
the intersection of high-density AI compute workloads with facility
operations and readiness programs. You will ensure that Data Center
Operations teams have the tools, processes, and training required to
achieve executive-grade incident readiness before, during, and after
critical events. You will drive the end-to-end incident life-cycle by
establishing an investigation framework that yields high-quality root
cause analysis and identifies realistic, impactful corrective actions.

In this role, you will need deep infrastructure knowledge and close
collaboration with mechanical, electrical, and controls engineers in
order to ask the right questions. Drawing on this technical expertise,
you will develop and administer programmatic playbooks for operations
and manage high-fidelity dashboards, as well as disruption tracking
and reporting. You will work with your own team and with partner teams
to create frictionless workflows, minimize incident recurrence, and
drive operational excellence without adding overhead to site staff.

Google Cloud accelerates every
organization’s ability to digitally transform its business and
industry. We deliver enterprise-grade solutions that leverage
Google’s technology, and tools that help developers build more
sustainably. Customers in more than 200 countries and territories
turn to Google Cloud as their trusted partner to enable growth and
solve their most critical business problems.

The US base salary range for this full-time position is
$138,000-$198,000 + bonus + equity + benefits. Our salary ranges are
determined by role, level, and location. Within the range, individual
pay is determined by work location and additional factors, including
job-related skills, experience, and relevant education or training.
Your recruiter can share more about the specific salary range for your
preferred location during the hiring process. Please note that the
compensation details listed in US role postings reflect the base
salary only, and do not include bonus, equity, or benefits. Learn
more about benefits at Google.

Responsibilities:
- Own the end-to-end incident management process, from real-time response and investigation to the execution of scalable root cause analysis findings and corrective actions.
- Track disruptions, shape monthly Key Performance Indicators (KPIs) for densified capacity, facilitate the development of automated dashboarding, and maintain high-fidelity uptime metrics.
- Develop and maintain standard operating procedures for high-density infrastructure operations (including critical work procedures and change management processes), ensuring cross-functional alignment with existing data center operations workflows.
- Serve as a primary data center operations liaison for densified functional systems, collaborating with regional leads and stakeholders to streamline processes and ensure that no additional burden is placed on local site personnel.
- Provide expert-level guidance to facility technicians and facility managers on system debugging and complex queries related to high-density infrastructure.