Data Engineering Basics to Advanced: Phase III - Advanced
Advanced
Topics:
Big Data Ecosystem
Apache Spark (core, DataFrames, PySpark)
Hive/Presto/Athena
Hadoop (just architecture overview)
Data Lakes & Lakehouses
Concepts: data lake, warehouse, lakehouse
Table formats: Apache Iceberg, Delta Lake, Hudi
Glue, Athena, Iceberg setup
Streaming Systems
Kafka: pub/sub, brokers, partitions
Kafka Connect, schema registry
Apache Flink or Spark Structured Streaming (basics)
Cloud Data Warehouses
BigQuery, Redshift, Snowflake: architecture & querying
Partitioning, clustering, optimization
Monitoring & Observability
Logging (CloudWatch, Stackdriver)
Data quality with Great Expectations
Lineage tools: OpenMetadata, Amundsen
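To give a quick taste of the "Kafka: pub/sub, brokers, partitions" topic above, here is a minimal pure-Python sketch of key-based partitioning. It is only an illustration under a few assumptions: the function names `assign_partition` and `route` and the sample events are made up for this post, and CRC32 is used as a stand-in hash (Kafka's default partitioner uses murmur2). The idea it shows is that records with the same key land in the same partition, so their relative order is preserved.

```python
import zlib


def assign_partition(key: str, num_partitions: int) -> int:
    """Map a record key to a partition index (illustrative only)."""
    # CRC32 is a stand-in hash for this sketch; Kafka's default
    # partitioner uses murmur2, so real partition numbers will differ.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(records, num_partitions):
    """Group (key, value) records by partition, keeping arrival order."""
    partitions = {p: [] for p in range(num_partitions)}
    for key, value in records:
        partitions[assign_partition(key, num_partitions)].append((key, value))
    return partitions


# Hypothetical sample events: (key, value) pairs keyed by user id.
events = [
    ("user-1", "login"),
    ("user-2", "login"),
    ("user-1", "click"),
    ("user-1", "logout"),
]

routed = route(events, num_partitions=3)
for partition, items in routed.items():
    print(partition, items)
```

All events for `user-1` end up in one partition in the order they were produced, which is why choosing a good key matters when you need per-entity ordering.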
Let’s connect:
GitHub: https://github.com/ketankkeshri
LinkedIn: https://in.linkedin.com/in/ketankeshri
YouTube: https://www.youtube.com/@KetanKKeshri
Instagram: https://www.instagram.com/ketankkeshri
Medium: https://medium.com/@ketankkeshri