The Director of Cloud Operations leads SRE and CloudOps for PCS, focusing on automation, governance, and high reliability, while managing a high-performing team to meet SLAs.
Job Description Summary
The Director – Cloud Operations provides leadership, innovation, and oversight for SRE and CloudOps across PCS. The role establishes the operating foundations, metrics, and automation needed to run mission‑critical, greenfield applications with high reliability and security, and is accountable for meeting product SLAs while scaling Cloud Operations and institutionalizing modern SRE practices in close partnership with product, platform, and security teams.
Job Description
Essential Responsibilities
Serve as the functional leader for the PCS Digital Cloud Operations team. Define the operating model, governance, and KPIs; drive automation and observability; and ensure secure, reliable deployments across environments with continuous improvement and tight collaboration with security. This role reports to the VP of Engineering – PCS Apps & Platform. Key responsibilities include:
- Own Cloud Operations for PCS cloud applications; stand up and scale CloudOps capabilities to support multiple products while adhering to committed SLAs.
- Institutionalize SRE practices: implement SLI/SLO/SLA frameworks, error budgets, incident/post‑mortem processes, and reliability runbooks; champion automation to reduce toil and improve service health and monitoring.
- Build end‑to‑end observability (APM/RUM, logs, metrics, traces, health dashboards, proactive alerting) and evolve toward auto‑healing and AIOps for anomaly detection and closed‑loop remediation.
- Drive change, incident, and problem management with clear RACI and stakeholder communications; reduce MTTR through streamlined L1–L4 escalation.
- Establish and test DR/BCP posture; conduct AWS Well‑Architected and operational readiness reviews for services (AWS‑first, with multi‑cloud considerations as needed).
- Lead FinOps practices: cost allocation and accountability, right‑sizing, savings plans/reserved instances, spend governance, and unit‑economics optimization.
- Evolve the operating model in partnership with platform and application teams; standardize CI/CD templates and “everything‑as‑code” for speed and repeatability.
- Build and develop a high‑performing team: hire, coach, and grow CloudOps/SRE talent and the next set of leaders; uphold high standards for quality and customer satisfaction.
Core KPIs & outcome metrics:
- Service availability versus SLA/SLO and error‑budget burn rate.
- MTTD/MTTR and incident recurrence; % incidents with post‑mortems completed.
- Change failure rate and lead time for changes for production deployments.
- % automated runbooks/toil reduction; % services with complete SLI/SLO coverage.
Basic Qualifications
- Bachelor’s degree in computer science or a STEM field.
- A minimum of 10 years of experience leading technical teams in complex, fast‑paced environments, including 5+ years in CloudOps and SRE leadership roles.
- Proven expertise in the areas of DevSecOps, Day‑2 Ops, APM/RUM, and Cloud Operations.
- Proficiency building and operating services on public cloud (AWS‑first) with CI/CD and Infrastructure‑as‑Code (e.g., Terraform/CloudFormation).
- Track record establishing SLIs/SLOs/SLAs, observability, and incident/change management at scale.
- Strong leadership and team management skills, with the ability to inspire and motivate a team of engineers.
- Excellent project management skills, with the ability to manage multiple complex projects simultaneously.
- In-depth knowledge of SaaS technologies, cloud computing, and medical device development processes.
Desired Characteristics
Technical competencies:
- Experience scaling CloudOps/SRE for multiple products and customer deployments.
- Deep fluency in SLI/SLO/SLA design, error budgets, runbooks, and auto‑healing patterns.
- Strong AWS architecture and operations; Well‑Architected reviews; capacity and cost optimization (FinOps).
- Modern observability (APM/RUM/logs/metrics/traces) and AIOps for predictive analytics/anomaly detection.
- Security by design (DevSecOps, policy‑as‑code) and DR/BCP planning/testing.
Leadership competencies:
- Clear, decisive communicator able to influence across product, platform, and security stakeholders.
- Builder‑coach mindset: hire, mentor, and grow managers and ICs; create leaders of leaders.
- Change agent who challenges the status quo while maintaining high standards for quality and customer satisfaction.
- Operates with ownership, bias for action, and strong judgment in an ambiguous, high‑growth environment.
Top 5 Critical Competencies & Skills
- SRE & Reliability Leadership: SLI/SLO/SLA management, error budgets, disciplined post‑mortems.
- Cloud Operations at Scale (AWS‑first): operational readiness, DR/BCP, change/incident/problem management, and Well‑Architected operations.
- Observability & AIOps: end‑to‑end telemetry, APM/RUM, automated remediation to reduce MTTR and toil.
- DevSecOps & Policy‑as‑Code: secure‑by‑default pipelines and vulnerability management with measurable SLAs.
- FinOps & Cost Governance: cost allocation, right‑sizing, and spend optimization to improve unit economics while scaling.
Relocation Assistance Provided: No
Top Skills
APM
AWS
CI/CD
CloudFormation
Infrastructure-as-Code
RUM
Terraform

