Staff Site Reliability Engineer (SRE) – Engineering Tools | Bengaluru, India | On-Site

REF Digital

Posted 3 Days Ago

In-Office

Bengaluru, Bengaluru Urban, Karnataka, IND

Senior level

Lead the reliability, scalability, and performance of engineering platforms; build automation, Infrastructure-as-Code (IaC), monitoring/observability, and self-healing systems; troubleshoot Linux systems; develop tooling in Python, Golang, or Bash; support CI/CD and developer platforms; mentor junior SREs; and participate in the on-call rotation.

The summary above was generated by AI

Location: Bengaluru, India (On-site mandatory)
Employment Type: Full-time
Industry: AI / Autonomous Systems / Advanced Engineering Infrastructure
Start Date: ASAP

About the Role

We are seeking a Staff Site Reliability Engineer (SRE) – Engineering Tools to support and scale critical engineering platforms that power advanced AI, machine learning, simulation, and autonomous technology development.

This is a senior-level role operating at the intersection of reliability engineering, internal developer platforms, tooling automation, and large-scale infrastructure performance. You will play a key role in ensuring that engineering teams have highly reliable, scalable, and secure systems to accelerate innovation.

Key Responsibilities

Own reliability, scalability, and performance of engineering tools and infrastructure platforms
Design and implement automation frameworks for system deployment and configuration management
Improve observability, monitoring, and self-healing capabilities across engineering environments
Troubleshoot complex Linux-based systems and optimize performance
Develop automation and internal tooling using Python, Golang, or Bash
Implement Infrastructure-as-Code best practices
Strengthen security posture across engineering systems
Partner with cross-functional teams to streamline development workflows
Participate in the on-call rotation for critical systems

Required Profile

Strong expertise in Linux systems administration and performance optimization
Experience with distributed systems and large-scale infrastructure environments
Proficiency in Python, Golang, and/or Bash scripting
Hands-on experience with configuration management tools (e.g., Ansible)
Experience with monitoring and observability platforms (Prometheus, Grafana, Splunk, etc.)
Familiarity with container orchestration technologies such as Kubernetes
Experience supporting developer platforms, CI/CD tooling, or internal engineering systems
Bachelor’s degree in Computer Science, Engineering, or related field (or equivalent practical experience)
Significant experience in Site Reliability Engineering, DevOps, or platform engineering (Staff-level seniority)

What Makes This Role Senior / Staff Level

Ownership of mission-critical engineering systems
Architectural input into reliability and scalability strategy
Mentorship of junior SREs and platform engineers
Direct impact on high-scale AI and autonomous development environments

What’s on Offer

Opportunity to work on cutting-edge engineering infrastructure
High-impact role supporting AI and advanced technology platforms
Collaborative, engineering-driven culture
Competitive compensation and long-term career growth
