Job Title: Consulting_SAMA_GCP Data Engineer_Consultant-2
Overall skill requirements:
• GCP cloud services for scalable and flexible data storage and processing: Cloud Storage, Dataproc, BigQuery.
• Extensive experience with big data technologies, including PySpark and Hive, for efficient processing and querying of large datasets.
• Designing and implementing end-to-end data pipelines to facilitate smooth data flow and processing.
• Implementing workflow automation for data pipelines using tools like Apache Airflow or Apache NiFi.
• Experience with multiple big data storage formats such as Parquet and Avro, and table formats such as Delta Lake and Iceberg.
• Experienced in building data ingestion frameworks covering batch, file-based, CDC, and real-time ingestion using Kafka or Pub/Sub.
• Experienced in implementing automated processes for data validation and ensuring quality through unit testing.
• Proficient in continuous integration and continuous deployment (CI/CD), using Git for version control and Jenkins for automation.
• Experience working with NoSQL databases such as MongoDB for handling unstructured data and scalable storage.
• Experienced in Redis for efficient caching and improved data retrieval performance in applications.
• Experienced in dbt or PySpark for managing the transformation and modeling of data within data pipelines.
• Strong proficiency in Python for scripting, data manipulation, and general programming tasks.
• Experience with both Lambda and Kappa architectures for real-time and batch processing in big data systems.