
Synechron

PySpark Data Engineer with Cloudera and Cloud Expertise

In-Office
Thanisandra Nagavara, Bangalore, Karnataka, IND
Senior level

Job Summary
Synechron is seeking a highly experienced PySpark Data Engineer to develop, optimize, and maintain scalable data pipelines within the Cloudera Data Platform (CDP). This role is essential in ensuring high data quality, availability, and performance across enterprise data ecosystems. The successful candidate will leverage extensive big data and cloud-native processing expertise to support business analytics, reporting, and data science initiatives, driving impactful insights and operational efficiency.

Software Requirements

  • Required:

    • Advanced proficiency in PySpark, including handling DataFrames, RDDs, and optimization techniques for large-scale data processing (see the illustrative sketch after this list)

    • Strong experience with Cloudera Data Platform components such as Cloudera Manager, Hive, Impala, HDFS, and HBase

    • In-depth knowledge of Hadoop ecosystem technologies (Hadoop, Kafka) and distributed computing frameworks

    • SQL expertise and experience with data warehousing concepts (Hive, Impala)

    • Linux scripting skills (Bash, Python) for automation and operational workflows

    • Experience with orchestration tools like Apache Oozie or Apache Airflow

  • Preferred:

    • Cloud data services (AWS EMR, Azure HDInsight, GCP Dataproc) for scalable data processing

    • Data modeling, metadata management, and data governance tools

    • Setting up CI/CD pipelines using Jenkins, GitLab, or similar tools
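
To illustrate the kind of work these requirements describe, below is a minimal PySpark sketch that reads a Hive table, applies basic cleansing and aggregation, and writes a partitioned result. The database, table, and column names (sales.transactions, txn_ts, amount, region, analytics.daily_sales) are hypothetical placeholders, not part of this role's actual environment.

```python
# Illustrative only: a minimal PySpark aggregation job of the kind described above.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("daily_sales_aggregation")
    .enableHiveSupport()  # read and write Hive tables on the cluster
    .getOrCreate()
)

# Ingest from a Hive table, apply basic cleansing, and aggregate for reporting.
txns = spark.table("sales.transactions")

daily = (
    txns.filter(F.col("amount").isNotNull())          # drop rows missing amounts
        .withColumn("txn_date", F.to_date("txn_ts"))  # derive a partition key
        .groupBy("txn_date", "region")
        .agg(
            F.sum("amount").alias("total_amount"),
            F.count("*").alias("txn_count"),
        )
)

# Repartition by the partition key before writing to avoid many small HDFS files.
(
    daily.repartition("txn_date")
    .write.mode("overwrite")
    .partitionBy("txn_date")
    .saveAsTable("analytics.daily_sales")
)
```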

Overall Responsibilities

  • Design, develop, and optimize highly scalable data pipelines using PySpark within the Cloudera Data Platform to support business intelligence and analytics.

  • Manage end-to-end data ingestion processes from various sources such as relational databases, APIs, and file systems.

  • Execute data transformation, cleansing, and aggregation processes on large datasets to facilitate reporting and data science activities.

  • Conduct performance tuning of PySpark jobs and optimize cluster resource utilization.

  • Implement data quality checks, validation routines, and monitoring to ensure data accuracy and consistency (a simple example follows this list).

  • Automate data workflows and pipeline orchestration to reduce manual intervention and improve efficiency.

  • Troubleshoot data pipeline issues and drive operational stability across data ecosystems.

  • Collaborate with data analysts, data scientists, and platform engineers to understand data requirements and improve system performance.

  • Maintain detailed documentation for data pipelines, workflows, configurations, and operational procedures.

  • Support data governance, security, and compliance initiatives aligned with enterprise standards.
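
As a concrete example of the data quality checks listed above, the following sketch counts missing and duplicate keys before allowing a pipeline to proceed. The staging table and key column (staging.customers, customer_id) are hypothetical; real checks would be driven by the pipeline's actual data contracts.

```python
# Illustrative sketch of a simple data quality gate.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("dq_checks")
    .enableHiveSupport()
    .getOrCreate()
)

df = spark.table("staging.customers")

total_rows = df.count()
null_keys = df.filter(F.col("customer_id").isNull()).count()
dup_keys = (
    df.groupBy("customer_id").count()
      .filter(F.col("count") > 1)
      .count()
)

# Fail fast if validation thresholds are breached so downstream reporting
# and data science consumers never see incomplete or duplicated records.
if total_rows == 0 or null_keys > 0 or dup_keys > 0:
    raise ValueError(
        f"Data quality check failed: rows={total_rows}, "
        f"null keys={null_keys}, duplicate keys={dup_keys}"
    )
```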

Technical Skills (By Category)

  • Programming & Data Processing (Essential):

    • PySpark (DataFrames, RDDs, optimization)

    • SQL (Hive, Impala, relational databases)

    • Linux scripting (Bash, Python) for automation

  • Data Ecosystem & Storage (Essential):

    • Hadoop ecosystem (HDFS, Hive, Impala, HBase)

    • Kafka or similar messaging systems for data streaming

  • Cloud & Orchestration (Preferred):

    • Cloud-native data processing (AWS EMR, Azure HDInsight, GCP Dataproc)

    • Orchestration tools (Apache Airflow, Oozie), illustrated after this list

  • Tools & Frameworks (Preferred):

    • CI/CD with Jenkins, GitLab CI

    • Data governance and metadata tools (e.g., Apache Atlas, Collibra)
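
The orchestration sketch below shows how a PySpark job might be scheduled with Apache Airflow, one of the preferred tools above. The DAG id, schedule, application path, and connection id are hypothetical placeholders, and the example assumes Airflow 2.4+ with the apache-airflow-providers-apache-spark package installed.

```python
# Illustrative orchestration sketch: schedule a nightly spark-submit run.
from datetime import datetime

from airflow import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

with DAG(
    dag_id="daily_sales_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="0 2 * * *",  # run nightly at 02:00
    catchup=False,
) as dag:
    # Submit the PySpark aggregation job to the cluster via spark-submit.
    aggregate_sales = SparkSubmitOperator(
        task_id="aggregate_sales",
        application="/opt/jobs/daily_sales_aggregation.py",  # hypothetical path
        conn_id="spark_default",
        conf={"spark.dynamicAllocation.enabled": "true"},
    )
```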

Experience Requirements

  • At least 5 years of experience in data engineering roles with significant PySpark expertise.

  • Proven experience building and managing large-scale data pipelines in enterprise environments.

  • Strong background in big data ecosystems, cloud data services, and data warehousing.

  • Demonstrated ability to optimize Spark jobs and troubleshoot distributed data processing issues.

  • Experience supporting financial or regulated industries is advantageous.

  • Equivalent experience pathways include extensive hands-on work in large data ecosystems supporting analytics and reporting.

Day-to-Day Activities

  • Develop, optimize, and monitor scalable data pipelines for ingestion, transformation, and redistribution of data.

  • Troubleshoot data processing issues proactively, perform root cause analysis, and implement fixes.

  • Collaborate with data analysts, data scientists, and platform teams to design data models and pipelines based on business needs.

  • Automate operational workflows using orchestration tools to enhance pipeline reliability.

  • Conduct performance tuning, cluster management, and resource optimization for Spark jobs (see the tuning sketch after this list).

  • Validate data quality, correctness, and completeness through routine reviews and monitoring.

  • Document architecture, workflows, and procedures for operational governance.

  • Support data privacy, security, and compliance measures within data ecosystems.
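
The following sketch illustrates the kind of Spark session tuning referred to above. Every configuration value shown is a placeholder that would be sized from real data volumes and cluster capacity rather than a recommendation.

```python
# Illustrative sketch of Spark session tuning; values are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("tuned_pipeline")
    # Let adaptive query execution coalesce shuffle partitions and handle
    # skewed joins at runtime.
    .config("spark.sql.adaptive.enabled", "true")
    .config("spark.sql.adaptive.skewJoin.enabled", "true")
    # Size shuffle parallelism to the data volume rather than the default 200.
    .config("spark.sql.shuffle.partitions", "400")
    # Broadcast small dimension tables to avoid shuffling large fact tables.
    .config("spark.sql.autoBroadcastJoinThreshold", str(64 * 1024 * 1024))
    .getOrCreate()
)
```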

Qualifications

  • Bachelor’s or Master’s degree in Computer Science, Data Engineering, or related field.

  • 5+ years of hands-on experience with PySpark, big data ecosystems, and distributed processing.

  • Proven expertise supporting large-scale data pipelines in enterprise or financial industry environments.

  • Experience with Cloudera Data Platform components (Hive, Impala, HDFS, HBase).

  • Strong SQL and data modeling skills.

  • Experience supporting cloud data processing environments (AWS, Azure, GCP) is advantageous.

  • Relevant certifications (e.g., AWS Big Data Specialty, Cloudera Certified Data Engineer) are preferred.

Professional Competencies

  • Strong analytical and troubleshooting skills for complex data pipeline issues.

  • Ability to work independently and collaboratively across teams.

  • Effective communication skills to convey technical details to non-technical stakeholders.

  • Adaptability to evolving technologies and data processing requirements.

  • Focus on operational excellence, data quality, and process automation.

  • Ownership mindset to ensure data integrity, performance, and reliability.

SYNECHRON'S DIVERSITY & INCLUSION STATEMENT

Diversity & Inclusion are fundamental to our culture, and Synechron is proud to be an equal opportunity workplace and an affirmative action employer. Our Diversity, Equity, and Inclusion (DEI) initiative ‘Same Difference’ is committed to fostering an inclusive culture, promoting equality, diversity, and an environment that is respectful to all. We strongly believe that a diverse workforce helps us build stronger, more successful businesses as a global company. We encourage applicants of all backgrounds, races, ethnicities, religions, ages, marital statuses, genders, sexual orientations, and disabilities to apply. We empower our global workforce by offering flexible workplace arrangements, mentoring, internal mobility, learning and development programs, and more.

All employment decisions at Synechron are based on business needs, job requirements and individual qualifications, without regard to the applicant’s gender, gender identity, sexual orientation, race, ethnicity, disabled or veteran status, or any other characteristic protected by law.


Top Skills

Apache Airflow
Apache Oozie
AWS
Azure
Cloudera Data Platform
GCP
Hadoop
Kafka
Linux
PySpark
SQL


