REF Digital
Staff Site Reliability Engineer (SRE) – Engineering Tools | Bengaluru, India | On-Site
Location: Bengaluru, India (On-site mandatory)
Employment Type: Full-time
Industry: AI / Autonomous Systems / Advanced Engineering Infrastructure
Start Date: ASAP
We are seeking a Staff Site Reliability Engineer (SRE) – Engineering Tools to support and scale critical engineering platforms that power advanced AI, machine learning, simulation, and autonomous technology development.
This is a senior-level role operating at the intersection of reliability engineering, internal developer platforms, tooling automation, and large-scale infrastructure performance. You will play a key role in ensuring that engineering teams have highly reliable, scalable, and secure systems to accelerate innovation.
Key Responsibilities
Own reliability, scalability, and performance of engineering tools and infrastructure platforms
Design and implement automation frameworks for system deployment and configuration management
Improve observability, monitoring, and self-healing capabilities across engineering environments
Troubleshoot complex Linux-based systems and optimize performance
Develop automation and internal tooling using Python, Golang, or Bash
Implement Infrastructure-as-Code best practices
Strengthen security posture across engineering systems
Partner with cross-functional teams to streamline development workflows
Participate in on-call rotation for critical systems
Strong expertise in Linux systems administration and performance optimization
Experience with distributed systems and large-scale infrastructure environments
Proficiency in Python, Golang, and/or Bash scripting
Hands-on experience with configuration management tools (e.g., Ansible)
Experience with monitoring and observability platforms (Prometheus, Grafana, Splunk, etc.)
Familiarity with container orchestration technologies such as Kubernetes
Experience supporting developer platforms, CI/CD tooling, or internal engineering systems
Bachelor’s degree in Computer Science, Engineering, or related field (or equivalent practical experience)
Significant experience in Site Reliability Engineering, DevOps, or platform engineering (Staff-level seniority)
Ownership of mission-critical engineering systems
Architectural input into reliability and scalability strategy
Mentorship of junior SREs and platform engineers
Direct impact on high-scale AI and autonomous development environments
Opportunity to work on cutting-edge engineering infrastructure
High-impact role supporting AI and advanced technology platforms
Collaborative, engineering-driven culture
Competitive compensation and long-term career growth



