Job Title: T&T - EAD - All levels - Azure DE

Job requisition ID: 71133
Date: Oct 28, 2024
Location: Bengaluru
Designation: Assistant Manager

As a Data Engineer with expertise in PySpark, Databricks, SQL, and Microsoft Azure, you will be responsible for designing, developing, and maintaining robust and scalable data pipelines and processing systems. You will work closely with data scientists, analysts, and other stakeholders to ensure our data solutions are efficient, reliable, and scalable.

Responsibilities:

• Design, develop, and optimize ETL pipelines using PySpark and Databricks to process large-scale data on the Azure cloud platform (a minimal sketch of such a pipeline follows this list).

• Implement data ingestion processes from various data sources into Azure Data Lake and Azure SQL Data Warehouse.

• Develop and maintain data models, data schemas, and data transformation logic tailored for Azure.

• Collaborate with data scientists and analysts to understand data requirements and deliver high-quality datasets.

• Ensure data quality and integrity through robust testing, validation, and monitoring procedures.

• Optimize and tune PySpark jobs for performance and scalability within the Azure and Databricks environments.

• Implement data governance and security best practices in Azure.

• Monitor and troubleshoot data pipelines to ensure timely and reliable data delivery.

• Document data engineering processes, workflows, and best practices specific to Azure and Databricks.
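
To make the day-to-day work concrete, the following is a minimal sketch of the kind of PySpark ETL pipeline described above, as it might look on Azure Databricks. The storage account, container, and column names are hypothetical placeholders, not part of this role's actual environment.

    # Minimal PySpark ETL sketch for Azure Databricks (illustrative only).
    # Storage account, container, and column names are hypothetical placeholders.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()  # provided automatically on Databricks

    # Extract: read raw CSV files landed in Azure Data Lake Storage Gen2
    raw = (
        spark.read
        .option("header", "true")
        .option("inferSchema", "true")
        .csv("abfss://raw@<storage-account>.dfs.core.windows.net/sales/")
    )

    # Transform: de-duplicate, type-cast, and filter out invalid rows
    clean = (
        raw.dropDuplicates(["order_id"])
        .withColumn("order_date", F.to_date("order_date"))
        .withColumn("amount", F.col("amount").cast("double"))
        .filter(F.col("amount") > 0)
    )

    # Load: write a curated Delta table for analysts and data scientists
    (
        clean.write
        .format("delta")
        .mode("overwrite")
        .save("abfss://curated@<storage-account>.dfs.core.windows.net/sales/")
    )

In practice, a pipeline like this would be scheduled and orchestrated from Azure Data Factory or a Databricks job, with monitoring and alerting around it.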

Requirements:

• Bachelor's or Master's degree in Computer Science, Engineering, or a related field.

• Proven experience as a Data Engineer with a strong focus on PySpark and Databricks.

• Proficiency in Python and PySpark for data processing and analysis.

• Strong experience with Azure data services, including Azure Data Lake, Azure Data Factory, Azure SQL Data Warehouse (now Azure Synapse Analytics), and Azure Databricks.

• Strong SQL skills and experience with relational databases (e.g., MySQL, PostgreSQL) and NoSQL databases (e.g., MongoDB, Cassandra).

• Experience with big data technologies such as Hadoop, Spark, Hive, and Kafka.

• Strong understanding of data architecture, data modeling, and data integration techniques.

• Familiarity with Azure DevOps, version control systems (e.g., Git), and CI/CD pipelines.

• Excellent problem-solving skills and attention to detail.

• Strong communication and collaboration skills.

Preferred Qualifications:

• Experience with Delta Lake on Azure Databricks.

• Knowledge of data visualization tools (e.g., Power BI, Tableau).

• Experience with containerization and orchestration tools (e.g., Docker, Kubernetes).

• Understanding of machine learning concepts and experience working with data scientists.

 Skills

-------------

• Azure Data Factory: Experience in creating and orchestrating data pipelines, understanding of triggers and data flows.

• Databricks: Knowledge of Apache Spark; programming in Python, Scala, or R; experience optimizing data processing and transformation jobs; experience querying databases and tables with SQL.

• Azure Data Lake Storage: Experience working with ADLS Gen 1 and Gen 2; knowledge of the hierarchical namespace, file systems, and security aspects.

• Azure DevOps: Experience working with repositories, pipelines, builds and releases, understanding CI/CD processes.

• Data integration: Knowledge of various data sources and data formats such as JSON, CSV, XML, Parquet, and Delta, as well as databases such as Azure SQL, MySQL, or PostgreSQL (see the sketch after this list).
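
As a rough illustration of the data-integration and Databricks skills above, the sketch below reads a few common source formats with PySpark and queries them through Spark SQL. All paths, table names, and columns are hypothetical placeholders.

    # Sketch: reading common source formats and querying them with Spark SQL.
    # All paths, table names, and columns are hypothetical placeholders.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Each read illustrates one of the formats listed above: JSON, CSV, Parquet, Delta
    orders = spark.read.json("abfss://raw@<storage-account>.dfs.core.windows.net/orders/json/")
    customers = (
        spark.read
        .option("header", "true")
        .csv("abfss://raw@<storage-account>.dfs.core.windows.net/customers/csv/")
    )
    products = spark.read.parquet("abfss://raw@<storage-account>.dfs.core.windows.net/products/")
    sales = spark.read.format("delta").load("abfss://curated@<storage-account>.dfs.core.windows.net/sales/")

    # Register temporary views so the same data can be queried in SQL
    orders.createOrReplaceTempView("orders")
    customers.createOrReplaceTempView("customers")

    orders_by_country = spark.sql("""
        SELECT c.country, COUNT(*) AS order_count
        FROM orders o
        JOIN customers c ON o.customer_id = c.customer_id
        GROUP BY c.country
    """)
    orders_by_country.show()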

 

Tasks

----------

• Data extraction: Identifying and extracting data from various sources such as databases, APIs, file systems and external services.

• Data transformation: Data cleaning, enrichment and normalization according to project requirements.

• Data loading: Loading the transformed data into target databases, data warehouses or data lakes.

• Data pipeline development: Implementing and automating ETL or ELT processes using Azure Data Factory and Databricks.

• Monitoring and troubleshooting: Monitoring data pipelines, identifying issues, and implementing fixes.

• Data integration: Developing interfaces and integration solutions for various data sources and platforms.

• Performance optimization: Analyzing and improving the performance of data pipelines and processing jobs (a short tuning sketch follows this list).
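
As a sketch of the performance-optimization task above, the snippet below shows a few common PySpark tuning techniques on Databricks: broadcasting a small dimension table, repartitioning before a wide aggregation, and selective caching. Table locations, sizes, and column names are hypothetical placeholders.

    # Sketch of common PySpark performance-tuning techniques on Databricks.
    # Table locations, sizes, and column names are hypothetical placeholders.
    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.functions import broadcast

    spark = SparkSession.builder.getOrCreate()

    facts = spark.read.format("delta").load("abfss://curated@<storage-account>.dfs.core.windows.net/sales/")
    dims = spark.read.format("delta").load("abfss://curated@<storage-account>.dfs.core.windows.net/products/")

    # Broadcast the small dimension table to avoid a shuffle join
    joined = facts.join(broadcast(dims), on="product_id", how="left")

    # Repartition on the grouping column before a wide aggregation
    revenue = (
        joined.repartition("order_date")
        .groupBy("order_date", "category")
        .agg(F.sum("amount").alias("revenue"))
    )

    # Cache only when the result is reused several times within the job
    revenue.cache()
    revenue.count()    # materializes the cache

    # Inspect the physical plan to confirm the broadcast join took effect
    revenue.explain()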