DigiCert

Site Reliability Engineer - Embedded

Posted 15 Days Ago

Bangalore, Bengaluru, Karnataka

Mid level


The Site Reliability Engineer enhances software reliability by implementing best practices, automating deployments, and ensuring high system availability across distributed systems.

The summary above was generated by AI

Who we are

We're a leading, global security authority that's disrupting our own category. Our encryption is trusted by the major ecommerce brands, the world's largest companies, the major cloud providers, entire country financial systems, entire internets of things and even down to the little things like surgically embedded pacemakers. We help companies put trust - an abstract idea - to work. That's digital trust for the real world.

Job Summary

The Site Reliability Engineer (SRE) collaborates with development teams to embed reliability, scalability, and performance best practices throughout the software development lifecycle. This role bridges software engineering and cloud operations, ensuring mission-critical systems remain highly available and resilient. By integrating reliability early, the SRE fosters a culture of shared responsibility while enabling rapid and safe feature delivery.

What you will do

Design and build fault-tolerant, high-performing systems that meet Service Level Objectives (SLOs) and Service Level Agreements (SLAs).
Implement monitoring, alerting, distributed tracing, and logging to ensure real-time system health visibility and proactive issue resolution.
Act as a first responder for production incidents, conduct blameless postmortems, and drive root cause analysis (RCA) and corrective actions.
Develop self-healing systems, automated deployments, and scaling solutions to minimize toil and improve system efficiency.
Improve continuous integration and deployment pipelines to enable safe, rapid, and reliable feature rollouts.
Review code, debug issues, and perform quality assurance (QA) on software components to enhance system reliability and performance.
Work closely with development teams to ensure best practices in software architecture, coding standards, and operational readiness.
Forecast scalability needs and optimize cloud infrastructure costs while balancing performance and efficiency.
Ensure production environments meet security and compliance requirements, collaborating with teams to mitigate vulnerabilities and enforce best practices.
Work closely with development teams to embed reliability at every stage rather than treating it as an afterthought.
Use error budgets to balance feature velocity with system stability.
Implement observability and automation-first principles to measure system health and drive continuous improvement.
Leverage game days, chaos engineering, and resilience testing to validate system robustness and refine operational processes.

What you will have

3-5 years of extensive experience in distributed systems, cloud-native architectures (AWS, GCP, Azure), and DevOps practices.
Proficiency in Kubernetes, Terraform, CI/CD pipelines, and Infrastructure as Code (IaC).
Strong scripting and automation skills in Python, Go, Bash, or similar languages.
Expertise in observability tools such as Prometheus, Grafana, Datadog, Splunk, New Relic, and OpenTelemetry.
Ability to troubleshoot complex production issues and drive scalable, resilient solutions.
Experience reviewing code, debugging applications, and conducting software testing to ensure high reliability and quality.

Benefits

Generous time off policies
Top shelf benefits
Education, wellness and lifestyle support

Top Skills

AWS

Azure

Bash

CI/CD

Datadog

GCP

Grafana

Kubernetes

New Relic

OpenTelemetry

Prometheus

Python

Splunk

Terraform
