Drivetrain

Site Reliability Engineer - SRE

Reposted 18 Days Ago
Remote
Hiring Remotely in India
Senior level
The Senior Site Reliability Engineer ensures the availability, performance, and security of Drivetrain's SaaS platform, managing multi-cloud infrastructure, optimizing CI/CD pipelines, and driving automation.
The summary above was generated by AI
Drivetrain is on a mission to empower businesses to make better decisions. Our financial planning & decision-making platform helps companies scale and achieve their targets predictably.

Drivetrain is a remote-first company headquartered in the San Francisco Bay Area. Founded in 2021 by a couple of ex-Googlers, Drivetrain is a fast-growing company on a trajectory for success with backing from leading venture capital firms.

Drivetrain provides a great culture for its employees to thrive in and be happy. 

💜 Remote-friendly: Drivetrain brings together the best and the brightest, no matter where they are, and gives them a great degree of autonomy. We trust our people.
🗣️ Open & transparent: We know that when our creators have access to all the information they need, their best work will emerge.
👏 Idea-friendly: We provide an environment to explore new ideas, to take risks, to make mistakes, and to learn, so you can succeed. Anyone in the company can come up with great ideas and become a catalyst for positive change. We let the best ideas win.
👥 Customer-centric: We follow a product-led growth strategy, continuously learning from our customers and collaborating with them to build the amazing software that is Drivetrain.

As a Senior Site Reliability Engineer at Drivetrain, you will be a cornerstone of our engineering organization, ensuring our fast-growing SaaS platform remains highly available, performant, and secure. At this stage of our growth, scaling infrastructure efficiently while maintaining the rigorous security and reliability standards required for financial data is paramount. You will take ownership of our multi-cloud infrastructure, drive automation, champion observability, and collaborate closely with development teams to build a culture of reliability from code commit to production.

Key Responsibilities

Cloud Infrastructure & Orchestration

  • Multi-Cloud Management: Architect, manage, and continuously optimize highly available cloud infrastructure across both AWS and GCP. Balance workload demands to ensure maximum cost-efficiency, scalability, and strict security compliance across both platforms.

  • Advanced Kubernetes Orchestration: Lead the design, deployment, and management of scalable Kubernetes clusters. Utilize configuration management tools like Kustomize to enforce standardized, repeatable, and automated deployment configurations across all environments.

  • Service Mesh & Security Integration: Implement and maintain service mesh technologies (e.g., Istio, Linkerd) to secure, control, and observe service-to-service communication. Drive container security best practices, including image scanning, runtime protection, and strict RBAC enforcement.

CI/CD & Automation

  • Pipeline Engineering: Architect, maintain, and optimize robust CI/CD pipelines using Git and Jenkins. Focus on reducing deployment friction, accelerating release velocity, and enforcing automated testing and security gates.

  • Infrastructure as Code (IaC): Treat infrastructure as software. Write, review, and maintain Terraform modules to provision and manage cloud resources predictably and safely.

  • Operational Automation: Aggressively reduce operational toil. Develop robust Python scripts and tooling to automate routine maintenance, data backups, scaling operations, and system recovery processes.

Observability & Reliability

  • Comprehensive Monitoring: Design and enhance our observability stack to provide deep, real-time insights into system health. Manage and scale tools including Prometheus, Grafana, ELK/EFK stack, AWS CloudWatch, and GCP Operations Suite.

  • Reliability Engineering: Spearhead reliability initiatives critical to a scaling SaaS platform. Drive rigorous capacity planning exercises to stay ahead of growth.

  • Incident Management & SLOs: Own the incident response lifecycle. Facilitate blameless postmortems to extract actionable learnings. Define, track, and enforce SLIs, SLOs, and SLAs, ensuring the platform consistently meets its reliability guarantees.

Collaboration & Leadership

  • DevOps Culture: Act as an embedded reliability advocate. Collaborate closely with software engineers early in the development lifecycle to ensure applications are designed for deployability, scalability, and resilience.

  • Continuous Improvement: Proactively identify system bottlenecks and architectural weaknesses. Contribute to process improvements, build internal developer tooling, and maintain comprehensive documentation to elevate team productivity and system understanding.

Required Proficiency & Qualifications

  • Experience: 5+ years of hands-on experience in Site Reliability Engineering, DevOps, or Cloud Infrastructure roles, preferably within a fast-paced SaaS environment.

  • Cloud Platforms: Deep, proven proficiency in AWS (EC2, EKS, RDS, VPC, IAM, S3) AND GCP (GKE, Compute Engine, Cloud SQL, IAM, Cloud Storage). Ability to navigate and optimize multi-cloud architectures.

  • Containerization: Expert-level knowledge of Docker and Kubernetes, including advanced deployment strategies and lifecycle management.

  • Automation/IaC: Strong programming skills in Python and extensive experience with Terraform.

  • Observability: Hands-on expertise building dashboards and alerting systems using Prometheus, Grafana, and log aggregation stacks (ELK/EFK).

  • Networking & Security: Solid understanding of cloud networking (VPC peering, load balancing, DNS) and zero-trust security principles in a containerized environment.

Sounds exciting? Apply at [email protected]. It may just be the next best decision you’ve ever made!
