StarTree

Site Reliability Engineer

Posted 13 Days Ago

Remote

Senior level

The Site Reliability Engineer will manage distributed systems, automate operations, monitor performance, and work closely with customers to resolve issues while ensuring system reliability.

The summary above was generated by AI

At StarTree, we're a group of passionate individuals who want to improve the lives of many by developing tools and technologies that support availability and speed in the world of real-time analytics.

Our aim is to make it simple for every company to delight their users - external and internal - and create new revenue streams from their data, by building the world's most comprehensive and accessible cloud analytics system.

About the role:

StarTree is seeking exceptional Site Reliability Engineers (SREs) to manage, tune, and debug large-scale, highly available distributed systems. You will work with a team of passionate and talented engineers on the automation, tuning, and troubleshooting of Apache Pinot and SQL databases. We are looking for motivated, hardworking, and focused individuals who have a real passion for operational excellence, data systems, and automation.

Responsibilities:

Leverage various monitoring and alerting services to solve intricate programming problems at scale.
Manage and tune multiple critical customer-facing Apache Pinot clusters
Monitor availability, read/write latencies, and other key telemetry to proactively identify SLO misses and help mitigate issues
Build a rapport with and work closely with customers to mitigate and resolve incidents
Execute disaster recovery strategies with minimal downtime
Collaborate with other engineers to understand and troubleshoot systems and use the experience gained to influence the roadmap of other teams

Requirements:

5+ years of experience as an engineer (SRE, SDET, or development)
Experience managing highly available, production-facing distributed systems and in-depth knowledge of Java are a plus
Experience with cloud platforms such as AWS, GCP, or Azure
Experience with Kubernetes and container orchestration
Familiarity with streaming systems, such as Kafka, Pulsar, Flume, Flink, Spark, or similar
Knowledge of standard methodologies related to security, performance, and disaster recovery
Strong troubleshooting and critical thinking skills

About StarTree:

StarTree is a cloud-based software company that enables business customers to derive advanced insights from real-time and historical data. StarTree was founded by the core software engineering team and inventors of Apache Pinot, which currently powers hundreds of user-facing applications at companies across industries, including LinkedIn, Uber, Target, 7Eleven, Etsy, Walmart, WePay, Factual, Weibo, and more. StarTree Cloud has enabled even more companies to deploy and operate real-time analytics at scale, including Stripe, Sovrn, Roadie, Just Eat Takeaway.com, Dialpad, Guitar Center, Blinkit, and more.

StarTree recently announced its Series B funding, with investment from GGV Capital, Sapphire Ventures, Bain Capital Ventures, and CRV. We have been named one of The Information's 50 Most Promising Startups and one of CRN's 10 Coolest Cloud Computing Startup Companies of 2022!

Top Skills

Apache Pinot

AWS

Azure

Flink

Flume

GCP

Java

Kafka

Kubernetes

Pulsar

Spark
