Lead / Staff Data Engineer - Data Platform

Apna

Posted 25 Days Ago

In-Office

Bengaluru, Bengaluru Urban, Karnataka, IND

Senior level

You will design, build, and operate Apna's data platform, focusing on scalable data pipelines, lakehouse architecture, and data reliability systems for analytics and product intelligence.

The summary above was generated by AI

Company: Apna

Team: Data Platform / Engineering

Location: Bangalore

Experience: 5-7 years

Why Join Apna

At Apna, data is central to how we build products, understand users, improve employer outcomes, power recommendations, and scale decision-making. This role gives you the opportunity to build the backbone of Apna’s data platform and influence how data is used across the company.

You will work on real-world, high-scale problems across jobs, users, employers, communities, matching, growth, and AI-driven systems.

About the Role

Apna is looking for a Lead / Staff Data Engineer to build and scale our core data platform. This role will work on large-scale data pipelines, lakehouse architecture, query platforms, workflow orchestration, and data reliability systems that power analytics, product intelligence, machine learning, business dashboards, experimentation, and operational decision-making across Apna.

We are looking for someone who can think deeply about data architecture, design reliable pipelines, improve data quality, and help build a platform that can scale with Apna’s growth.

What You’ll Own

You will be responsible for designing, building, and operating critical parts of Apna’s data platform, including:

Building scalable batch and near-real-time data pipelines across product, business, growth, and ML use cases.
Designing and improving our lakehouse architecture using technologies like Apache Hudi.
Working with query engines such as Presto / Trino for large-scale analytical workloads.
Building and maintaining orchestration workflows using Apache Airflow.
Creating reusable data models, curated datasets, and reliable data marts for analytics and product teams.
Improving data platform reliability, observability, SLA tracking, lineage, and data quality checks.
Optimizing storage, compute, query performance, and pipeline costs.
Partnering with product, analytics, ML, and backend engineering teams to understand data needs and convert them into scalable platform solutions.
Driving engineering standards around data modeling, schema evolution, partitioning, deduplication, backfills, replayability, and pipeline ownership.
Mentoring data engineers and influencing architecture decisions across teams.

What We’re Looking For

Must Have

Strong experience in data engineering, preferably at scale.
Hands-on experience with Apache Airflow or similar orchestration systems.
Strong knowledge of Presto / Trino or other distributed query engines.
Good understanding of Apache Hudi concepts such as:

Copy-on-write vs merge-on-read
Upserts and deletes
Incremental reads
Compaction
Clustering
Timeline and commits
Schema evolution
Partitioning strategy

Strong knowledge of distributed data processing and storage systems.
Ability to design and build reliable ETL / ELT pipelines.
Strong SQL skills and ability to debug complex data issues.
Good understanding of different data architectures, including:

Data warehouse
Data lake
Lakehouse
Lambda architecture
Kappa architecture
Medallion architecture
Event-driven data architecture

Experience with data modeling for analytics and reporting.
Strong programming skills in at least one language such as Python, Java, or Scala.
Ability to reason about trade-offs between freshness, cost, reliability, latency, and complexity.
Strong debugging and production ownership mindset.

Good to Have

Experience with Kafka, Spark, Flink, Hive, Iceberg, Delta Lake, or BigQuery.
Experience building internal data platforms or self-serve data infrastructure.
Experience with data quality frameworks such as Great Expectations, Deequ, Soda, or custom validation systems.
Exposure to ML feature pipelines or feature stores.
Experience with metadata management, data catalogs, lineage, and governance.
Experience with cloud infrastructure such as AWS, GCP, or Azure.
Understanding of privacy, compliance, PII handling, and access control in data systems.

What Success Looks Like

In this role, success means:

Critical business and product datasets are reliable, discoverable, and trusted.
Pipelines are observable, recoverable, and have clear SLAs.
Query performance improves across major analytical workloads.
Data freshness and quality issues decrease significantly.
Teams can build on top of the data platform faster without reinventing pipelines.
The platform can scale with Apna’s user, job, employer, and engagement data.
