Zeta

Site Reliability Engineer I (Payzapp)

Posted 21 Days Ago

Bangalore, Bengaluru, Karnataka

Junior

The Site Reliability Engineer I will focus on maintaining and optimizing software systems' reliability through automation, incident response, monitoring, and implementing security best practices. The role includes participating in disaster recovery planning and continuous improvement of system performance.

All about Zeta Suite:

Zeta is the world’s first and only Omni Stack for banks and fintechs. We are rethinking payments from core to the edge, led by the vision to augment the purpose of money and banking with technology. Our single, modern software stack comprises processing, loans, customizable mobile and web apps, a fraud engine, and rewards for retail banking. We are a new-age, high-growth startup (and a unicorn!) founded in 2015 by two visionary leaders, Bhavin Turakhia and Ramki Gaddipati, whose entrepreneurial legacy and excellence have put us on top of the global fintech ecosystem. Zeta counts among its customers over 10 banks and 25 fintechs across 8 countries. Notable clients include Sodexo, a leading issuer of employee benefits and rewards with over 30 million global users, and HDFC Bank, the 14th largest bank in the world by market cap. Learn more about our manifesto & beyond.

Responsibilities

System Reliability: Ensuring the reliability of software systems by designing, implementing, and maintaining scalable and reliable infrastructure.
Automation: Developing automation tools and scripts to streamline operational tasks, reduce manual intervention, and improve overall system efficiency.
Incident Response and Resolution: Monitoring system performance and responding to incidents promptly to minimize downtime and ensure high availability.
Capacity Planning: Analyzing system usage patterns and forecasting future capacity needs to ensure that the infrastructure can handle current and future demands.
Performance Optimization: Identifying and addressing performance bottlenecks in software systems through optimization and tuning.
Infrastructure as Code (IaC): Implementing infrastructure as code practices, using tools like Terraform or Ansible, to define and manage infrastructure in a version-controlled and automated manner.
Monitoring and Logging: Implementing and maintaining monitoring and logging solutions to gain insights into system behavior, troubleshoot issues, and proactively address potential problems.
On-Call Support: Participating in an on-call rotation to respond to incidents outside of regular working hours and ensure 24/7 system availability.
Security: Collaborating with security teams to implement and maintain security best practices in infrastructure and applications.
Disaster Recovery Planning: Developing and maintaining disaster recovery plans to ensure that systems can quickly recover from major outages or failures.
Continuous Improvement: Continuously analyzing system performance, reliability, and incidents to identify areas for improvement and implementing changes to enhance overall system resilience.

Skills

Programming Languages: Proficiency in one or more programming or scripting languages, commonly Python, Go, or shell scripting (Bash).
Automation and Scripting: Strong automation skills using tools like Ansible, Puppet, Chef, or custom scripts. Knowledge of Infrastructure as Code (IaC) tools like Terraform.
Containerization and Orchestration: Experience with containerization technologies like Docker and container orchestration platforms like Kubernetes.
Cloud Computing: Proficiency in any of the cloud platforms such as AWS, Azure, or Google Cloud Platform, and knowledge of managing infrastructure in the cloud.
Monitoring and Logging: Familiarity with monitoring tools (e.g., Prometheus, Grafana, ELK stack) and logging frameworks to track system performance and troubleshoot issues.
Networking: Understanding of networking concepts, protocols, and troubleshooting skills.
Security: Knowledge of security best practices, including encryption, access controls, and vulnerability management.
Continuous Integration/Continuous Deployment (CI/CD): Understanding and implementation of CI/CD pipelines for automated testing and deployment.
Incident Management: Experience in incident response, troubleshooting, and resolution.
Version Control: Proficient use of version control systems like Git.

Experience and Qualifications

1-2 years of experience in site reliability engineering.
B.Tech/M.Tech in computer science, information technology or a related field.
Having experience working for a product organization is a plus.

Equal Opportunity

Zeta is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees. We encourage applicants from all backgrounds, cultures, and communities to apply and believe that a diverse workforce is key to our success.

Top Skills

Bash, Python, Shell
