
Synechron

PySpark Data Engineer | Big Data & Analytics

Posted 10 Days Ago
In-Office
2 Locations
Senior level
Design, build, and lead production data and ML pipelines (streaming and batch) using Python and PySpark. Develop ETL, feature engineering, EDA, data quality, and anomaly detection. Use Hadoop/Hive, Pandas, SQL/NoSQL, Airflow/Jenkins and CI/CD tooling. Write tested, maintainable code, collaborate with stakeholders, and mentor teams while communicating technical concepts to technical and non-technical audiences.
The summary above was generated by AI

Job Summary

Synechron is seeking an experienced PySpark Data Engineer / Data Scientist to lead data pipeline development and advanced analytics initiatives within our financial data and index analytics division. This role plays a crucial part in building scalable data processing solutions, enabling data-driven insights, and supporting machine learning workflows in both batch and streaming environments. The ideal candidate will possess a strong technical foundation in big data processing, analytics, and software engineering, along with leadership capabilities to drive impactful data projects.

Software Requirements

Required Skills:

  • Proven expertise in Python programming, emphasizing clean, maintainable, and scalable code

  • Hands-on experience with PySpark in both batch and streaming workflows

  • Deep knowledge of data manipulation and feature engineering using Pandas and NumPy, along with visualization libraries (matplotlib, seaborn)

  • Experience with Spark components such as Spark SQL, DataFrames, and Spark MLlib

  • Familiarity with data storage solutions: SQL and NoSQL databases (e.g., Hive, Cassandra)

  • Knowledge of scheduling and automation tools for ETL workflows, such as Apache Airflow, Jenkins, or GitHub Actions

  • Experience working with cloud environments, especially Azure or AWS for big data processing

Preferred Skills:

  • Hands-on experience with containerization and orchestration (Docker, Kubernetes)

  • Exposure to distributed storage solutions like Hadoop HDFS or Azure Data Lake

Overall Responsibilities

  • 5 years of experience designing, developing, and optimizing large-scale data pipelines using PySpark for structured, semi-structured, and unstructured data

  • 5 years of experience leading the development of ML pipelines for model training, validation, and deployment in streaming and batch modes

  • Write high-quality, efficient code that supports data transformation, cleaning, and feature engineering

  • Collaborate with data scientists, analysts, and stakeholders to understand data requirements and deliver actionable insights

  • Build and maintain a reusable codebase and automation scripts for data processing and model validation

  • Monitor pipeline performance, troubleshoot issues, and implement improvements to ensure robustness and scalability

  • Stay up to date with the latest developments in big data processing, ML techniques, and analytics tools to improve system efficiency and analytics capabilities

Technical Skills (By Category)

Programming Languages:

  • Required: Python, PySpark

  • Preferred: Scala, Java

Databases & Data Management:

  • SQL (MySQL, SQL Server), NoSQL (Cassandra, MongoDB), Hive, Data Lakes

Cloud Technologies:

  • Azure Data Factory, Azure Synapse, AWS Glue, S3 (preferred)

Frameworks & Libraries:

  • Spark MLlib, Pandas, NumPy, seaborn, matplotlib, scikit-learn (preferred)

Development Tools & Methodologies:

  • Jupyter, PyCharm, VSCode, Git, CI/CD (Jenkins, GitHub Actions), Airflow

Security & Data Governance:

  • Data privacy principles, secure data ingestion and output, compliance

Experience Requirements

  • 7-12 years of experience in data engineering, analytics, or data science roles, with significant hands-on experience in big data processing and ML pipelines

  • Proven track record of building scalable data pipelines and supporting ML workflows in enterprise environments

  • Experience working with structured, semi-structured, and unstructured data across financial domains

  • Previous leadership or mentorship experience in a technical team is preferred

Day-to-Day Activities

  • Develop and optimize data pipelines for financial and index data using PySpark and related tools

  • Build ML workflows, feature engineering, and model deployment pipelines in both streaming and batch environments

  • Collaborate with business analysts and data scientists to refine data requirements and deliver insights

  • Automate data ingestion, transformation, and validation processes

  • Monitor system performance, troubleshoot issues, and implement tuning activities

  • Review code and pipeline health with peer teams, and uphold best practices in software development and data security

Qualifications

  • Bachelor’s or Master’s degree in Computer Science, Data Science, Mathematics, or a related field

  • Relevant certifications in big data, cloud platforms, or analytics (preferred)

  • Strong portfolio showcasing data pipeline projects, analytics solutions, and ML workflows

Professional Competencies

  • Critical thinking and analytical problem-solving skills

  • Excellent communication skills for technical and non-technical audiences

  • Leadership qualities to guide project execution and mentor junior team members

  • Adaptability to new tools, frameworks, and evolving project requirements

  • Ability to handle multiple priorities under pressure with a focus on quality and deadlines

SYNECHRON’S DIVERSITY & INCLUSION STATEMENT

Diversity & Inclusion are fundamental to our culture, and Synechron is proud to be an equal opportunity workplace and is an affirmative action employer. Our Diversity, Equity, and Inclusion (DEI) initiative ‘Same Difference’ is committed to fostering an inclusive culture – promoting equality, diversity and an environment that is respectful to all. We strongly believe that a diverse workforce helps build stronger, successful businesses as a global company. We encourage applicants from across diverse backgrounds, race, ethnicities, religion, age, marital status, gender, sexual orientations, or disabilities to apply. We empower our global workforce by offering flexible workplace arrangements, mentoring, internal mobility, learning and development programs, and more.

All employment decisions at Synechron are based on business needs, job requirements and individual qualifications, without regard to the applicant’s gender, gender identity, sexual orientation, race, ethnicity, disabled or veteran status, or any other characteristic protected by law.


Similar Jobs

An Hour Ago
In-Office or Remote
Bangalore, Bengaluru, Karnataka, IND
Senior level
Cloud • Information Technology • Security • Software
Own and scale a $1M territory through partner recruitment, enablement, and GTM execution. Build executive partner relationships, manage co-selling and pipeline, run performance analysis, and coordinate internal teams to drive channel revenue in India.
Top Skills: Active Directory, Gemini, Gong, Google Workspace (GWS), Impartner, JumpCloud, Salesforce (SFDC)
An Hour Ago
Hybrid
Bengaluru, Bengaluru Urban, Karnataka, IND
Senior level
Artificial Intelligence • Natural Language Processing • Professional Services • Analytics • Consulting • Conversational AI • Generative AI
Lead module functional consultant for end-to-end Oracle HCM Cloud (Fusion HCM) implementations. Must have functional/configuration experience in two or more modules such as Core HR, Absence, US/Canada Payroll, Benefits, Compensation, Talent Management, ORC, Learning, OTL, or HR Helpdesk.
Top Skills: Absence, Benefits, Canada Payroll, Compensation, Core HR, Fusion HCM, HR Helpdesk, Learning, Oracle Fusion HCM, Oracle HCM Cloud, ORC, OTL, Talent Management, US Payroll
An Hour Ago
Hybrid
Bengaluru, Bengaluru Urban, Karnataka, IND
Senior level
Artificial Intelligence • Natural Language Processing • Professional Services • Analytics • Consulting • Conversational AI • Generative AI
Lead design and architecture of modern data ecosystems (data lake/lakehouse), data modeling, ETL, and API-based architectures. Build RESTful/GraphQL and event-driven APIs, API gateway/auth patterns, and agentic AI agents/workflows. Provide solution leadership, stakeholder engagement, governance, and cloud data platform (preferably AWS) expertise.
Top Skills: Agentic AI, API Gateway, API Versioning, Authentication/Authorization, AWS, Data Lake, Data Modeling, ETL, Event-Driven Architecture, GraphQL, Lakehouse, Rate Limiting, RESTful API

What you need to know about the Bengaluru Tech Scene

Dubbed the "Silicon Valley of India," Bengaluru has emerged as the nation's leading hub for information technology and a go-to destination for startups. Home to tech giants like ISRO, Infosys, Wipro and HAL, the city attracts and cultivates a rich pool of tech talent, supported by numerous educational and research institutions including the Indian Institute of Science, Bangalore Institute of Technology, and the International Institute of Information Technology.
