REF Digital

Staff Site Reliability Engineer (SRE) – Engineering Tools | Bengaluru, India | On-Site

Posted 3 Days Ago
In-Office
Bengaluru, Bengaluru Urban, Karnataka, IND
Senior level
Lead reliability, scalability, and performance of engineering platforms; build automation, IaC, monitoring/observability, and self-healing systems; troubleshoot Linux systems; develop tooling in Python/Golang/Bash; support CI/CD and developer platforms; mentor junior SREs and participate in on-call rotation.

Location: Bengaluru, India (On-site mandatory)
Employment Type: Full-time
Industry: AI / Autonomous Systems / Advanced Engineering Infrastructure
Start Date: ASAP

About the Role

We are seeking a Staff Site Reliability Engineer (SRE) – Engineering Tools to support and scale critical engineering platforms that power advanced AI, machine learning, simulation, and autonomous technology development.

This is a senior-level role operating at the intersection of reliability engineering, internal developer platforms, tooling automation, and large-scale infrastructure performance. You will play a key role in ensuring that engineering teams have highly reliable, scalable, and secure systems to accelerate innovation.

Key Responsibilities
  • Own reliability, scalability, and performance of engineering tools and infrastructure platforms

  • Design and implement automation frameworks for system deployment and configuration management

  • Improve observability, monitoring, and self-healing capabilities across engineering environments

  • Troubleshoot complex Linux-based systems and optimize performance

  • Develop automation and internal tooling using Python, Golang, or Bash

  • Implement Infrastructure-as-Code best practices

  • Strengthen security posture across engineering systems

  • Partner with cross-functional teams to streamline development workflows

  • Participate in on-call rotation for critical systems

Required Profile
  • Strong expertise in Linux systems administration and performance optimization

  • Experience with distributed systems and large-scale infrastructure environments

  • Proficiency in Python, Golang, and/or Bash scripting

  • Hands-on experience with configuration management tools (e.g., Ansible)

  • Experience with monitoring and observability platforms (Prometheus, Grafana, Splunk, etc.)

  • Familiarity with container orchestration technologies such as Kubernetes

  • Experience supporting developer platforms, CI/CD tooling, or internal engineering systems

  • Bachelor’s degree in Computer Science, Engineering, or related field (or equivalent practical experience)

  • Significant experience in Site Reliability Engineering, DevOps, or platform engineering (Staff-level seniority)

What Makes This Role Senior / Staff Level
  • Ownership of mission-critical engineering systems

  • Architectural input into reliability and scalability strategy

  • Mentorship of junior SREs and platform engineers

  • Direct impact on high-scale AI and autonomous development environments

What’s on Offer
  • Opportunity to work on cutting-edge engineering infrastructure

  • High-impact role supporting AI and advanced technology platforms

  • Collaborative, engineering-driven culture

  • Competitive compensation and long-term career growth
