Posts

Showing posts from July, 2025

Bhakti Aarti - Android App Privacy Policy

Privacy Policy
Effective Date: October 4th, 2025
App Name: Bhakti Aarti
Developer: k3kventures
Contact: sultanmirzadev@gmail.com

1. Introduction
This Privacy Policy explains how we collect, use, and protect your personal information when you use our Android application "Bhakti Aarti" ("App"). By using the App, you agree to the terms of this Privacy Policy.

2. Information We Collect
a. Voice Input
Our App may request access to your device's microphone in order to enable voice-based search functionality. When you use this feature:
- The App listens only when you activate voice input manually.
- Your voice input is processed to understand your search query.
- We do not store or share your voice recordings or voice data.
Voice data may be processed by Google's speech recognition services (or similar services) if you are using a device with Google Play Services enabled. Please refer to Google's Privacy Policy at: https://policies.go...

Order of SQL query execution

In SQL, the order in which you write the clauses in a query is not the same as the order in which the database engine executes them. Here is the logical order of execution of SQL clauses:

1. FROM: identifies source tables and joins data.
2. WHERE: filters rows before grouping.
3. GROUP BY: groups rows based on specified columns.
4. HAVING: filters groups after aggregation.
5. SELECT: selects columns or expressions to return.
6. DISTINCT: removes duplicate rows from the result set.
7. ORDER BY: sorts the result set.
8. LIMIT / OFFSET: returns a subset of the result (pagination).

Example:

    SELECT department, COUNT(*) AS employee_count
    FROM employees
    WHERE status = 'active'
    GROUP BY department
    HAVING COUNT(*) > 10
    ORDER BY employee_count DESC
    LIMIT 5;

Execution order:

    FROM employees
    WHERE status = 'active'
    GROUP BY department
    HAVING COUNT(*) > 10
    SELECT department, COUNT(*) AS employee_count
    ORDER BY employee_count DE...
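The execution order can be observed with a quick sketch against an in-memory SQLite database. The `employees` rows below are hypothetical sample data invented for the demo, not from the post:

```python
import sqlite3

# Build a tiny hypothetical employees table in memory.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, status TEXT)")
rows = (
    [(f"emp{i}", "eng", "active") for i in range(12)]   # 12 active eng rows
    + [(f"emp{i}", "hr", "active") for i in range(3)]   # 3 active hr rows
    + [("emp_x", "eng", "inactive")]                    # filtered out by WHERE
)
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", rows)

# FROM -> WHERE -> GROUP BY -> HAVING -> SELECT -> ORDER BY -> LIMIT
result = conn.execute("""
    SELECT department, COUNT(*) AS employee_count
    FROM employees
    WHERE status = 'active'        -- runs before grouping: drops the inactive row
    GROUP BY department
    HAVING COUNT(*) > 10           -- runs after aggregation: drops 'hr' (3 rows)
    ORDER BY employee_count DESC
    LIMIT 5
""").fetchall()

print(result)  # [('eng', 12)]
```

Note that writing `WHERE employee_count > 10` would fail: WHERE executes before SELECT assigns the alias, which is exactly why the filter on the aggregate has to live in HAVING.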

AWS Lake Formation (AWS LF) - Governance, Security, Access Control

🧭 Overview: What is AWS Lake Formation?
AWS Lake Formation is a service that simplifies building a secure data lake by:
- Ingesting data from various sources
- Organizing it in Amazon S3
- Setting up data catalogs (via AWS Glue)
- Defining security and access policies
- Querying data with services like Athena, Redshift, and EMR

🛠️ Prerequisites
Before starting, ensure you have:
- An AWS account
- IAM permissions for Lake Formation, Glue, S3, and IAM
- An existing S3 bucket (or create a new one)

🧱 Step 1: Set Up a Data Lake Location
1. Go to the Lake Formation Console.
2. In the left pane, choose "Data lake locations".
3. Click "Register location".
4. Choose your S3 bucket or a folder (e.g., s3://your-bucket/data/).
5. Choose an IAM role that has permission to access this location.

📋 Step 2: Add a Data Catalog Table
1. From the Lake Formation Console, go to "Databases".
2. Click "Create database" (this...

Connect dbt to Athena from a local machine using AWS SSO

Use an AWS SSO login to access Athena from dbt. Example profile configuration:

    outputs:
      dev:
        type: athena
        s3_staging_dir: s3://test-bucket/athena/
        region_name: ap-south-1
        database: awsdatacatalog
        schema: <db_name>
        work_group: <Athena-workgroup>
        aws_profile_name: <profile-name-on-local-machine>

Get all the tables in an AWS Glue database using the AWS CLI

Names of all the tables in a database:

    $ aws glue get-tables --database-name <your-db-name> --output json --query 'TableList[*].Name' --profile <aws-profile-name>

Count of tables in the database:

    $ aws glue get-tables --database-name <your-db-name> --output json --query 'TableList[*].Name' --profile <aws-profile-name> | jq length

Connect to MySQL through a jump-server SSH tunnel

How to connect to MySQL through a jump server: when you don't have direct access to the MySQL server, you use a jump server. From your machine, you SSH to the jump server, and from there you connect to the MySQL server. This extra hop can be avoided with SSH tunneling.

Suppose your:
- jump server is `jump-ip`
- MySQL server is `mysql-ip`
- your machine is `machine-ip`

Open an SSH client (PuTTY on Windows, or a terminal on Linux/macOS) and type:

    ssh -L [local-port]:[mysql-ip]:[mysql-port] [jump-server-user]@[jump-ip]

After this, you can use localhost and the local port to reach the remote MySQL server directly. For example, your JDBC URL to access a MySQL database will then be:

    jdbc:mysql://localhost:[local-port]/[database-name]
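As a small sketch, the tunnel command and the resulting JDBC URL can be assembled programmatically. All host names, ports, and user names below are hypothetical placeholders:

```python
def tunnel_command(local_port, mysql_ip, mysql_port, user, jump_ip):
    """Build the ssh -L command that forwards local_port to mysql_ip:mysql_port via the jump server."""
    return f"ssh -L {local_port}:{mysql_ip}:{mysql_port} {user}@{jump_ip}"

def jdbc_url(local_port, database):
    """Once the tunnel is up, MySQL is reachable on localhost:local_port."""
    return f"jdbc:mysql://localhost:{local_port}/{database}"

cmd = tunnel_command(3307, "10.0.0.25", 3306, "deploy", "jump.example.com")
url = jdbc_url(3307, "sales")
print(cmd)  # ssh -L 3307:10.0.0.25:3306 deploy@jump.example.com
print(url)  # jdbc:mysql://localhost:3307/sales
```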

Benefits of the Apache Parquet format in big data

Benefits of the Parquet format:
- Columnar storage: efficient for analytics and read-heavy workloads; only the required columns are read into memory.
- Highly compressed: supports efficient compression algorithms (Snappy, GZIP, Brotli); smaller file size compared to row-based formats like CSV/JSON.
- Splittable & scalable: files can be split and read in parallel, improving speed in distributed systems like Hadoop/Spark.
- Schema evolution: supports adding new columns without breaking existing data pipelines.
- Efficient for queries: works well with SQL engines like Hive, Presto, Trino, Athena, BigQuery.
- Better IO performance: reduces disk and network IO by avoiding unnecessary data reads.
- Interoperable: supported across multiple languages and platforms (Python, Java, Spark, Hive, AWS, GCP, etc.).
- Self-describing format: stores the schema as metadata within the file itself, so no external schema definitions are needed.
- Great with partitioning: when used wi...

Unzip a zip file stored on S3 without downloading it

AWS S3 is an object store, not a file system, so unzipping is not supported directly. You need some compute to do the unzipping, which you can get by copying the file to an EC2 machine or by using a Lambda function.

Let's connect:
Github: https://github.com/ketankkeshri
LinkedIn: https://in.linkedin.com/in/ketankeshri
YouTube: https://www.youtube.com/@KetanKKeshri
Instagram: https://www.instagram.com/ketankkeshri
Medium: https://medium.com/@ketankkeshri
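Inside that EC2 job or Lambda, the unzip itself can happen entirely in memory with the standard library. A minimal sketch of just the in-memory part; a locally built zip stands in for the bytes you would fetch from S3 (e.g. via a boto3 `get_object` call on a hypothetical bucket/key):

```python
import io
import zipfile

# Stand-in for the zip bytes you would read from the S3 object body.
zip_bytes = io.BytesIO()
with zipfile.ZipFile(zip_bytes, "w") as zf:
    zf.writestr("data/report.csv", "id,value\n1,42\n")

# Unzip without touching the local filesystem.
zip_bytes.seek(0)
with zipfile.ZipFile(zip_bytes) as zf:
    names = zf.namelist()
    content = zf.read("data/report.csv").decode()

print(names)    # ['data/report.csv']
```

Each extracted member can then be written back to S3 as its own object, which keeps the Lambda's disk usage at zero.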

Resolve SSL certificate issues during pip install in Python

SSL certificate issue in Python pip. If you are facing SSL errors while installing pip packages, e.g.:

    $ pip install pandas

resolve it by setting the certificate path:
1. Find the cert path.
2. Set the cert path.
3. Run pip install.

On a MacBook, find the SSL path first by running this command in the terminal:

    $ python -c "import ssl; print(ssl.get_default_verify_paths())"

OR

    $ python3 -c "import ssl; print(ssl.get_default_verify_paths())"

It will give output something like this:

    DefaultVerifyPaths(cafile='/opt/homebrew/etc/openssl@3/cert.pem', capath='/opt/homebrew/etc/openssl@3/certs', openssl_cafile_env='SSL_CERT_FILE', openssl_cafile='/opt/homebrew/etc/openssl@3/cert.pem', openssl_capath_env='SSL_CERT_DIR', openssl_capath='/opt/homebrew/etc/openssl@3/certs')

Now set this path before running the pip command. Run this in your terminal:

    $ export REQUESTS_CA_BUNDLE="/opt/homebrew/etc/openssl@3/cert.pem"

You can set this path in...
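The same lookup can be done from inside Python, and the variable can be set for the current process instead of via `export`. A small sketch; the Homebrew path shown in the post is just one example, and your machine may report a different cafile:

```python
import os
import ssl

paths = ssl.get_default_verify_paths()
ca_bundle = paths.cafile or paths.openssl_cafile  # prefer the resolved cafile if present
print(ca_bundle)  # the CA bundle pip/requests should trust

# Equivalent of `export REQUESTS_CA_BUNDLE=...` for this process and its children.
os.environ["REQUESTS_CA_BUNDLE"] = ca_bundle
```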

Data Engineering Basics to Advanced: Phase IV - Capstone Projects

Capstone Projects

Project ideas:

Batch Pipeline
- Source: CSV on S3
- Process: PySpark/dbt
- Sink: Redshift/BigQuery
- Orchestrate: Airflow

Streaming Pipeline
- Source: Kafka (clickstream or logs)
- Process: Spark Streaming
- Sink: ElasticSearch or S3

Data Observability
- Implement Great Expectations
- Data profiling and alerting

Cloud-native Data Lakehouse
- Glue + Iceberg + Athena + dbt
- Use partitioning, schema evolution

Data Engineering Basics to Advanced: Phase III - Advanced

Advanced Topics:

Big Data Ecosystem
- Apache Spark (core, DataFrames, PySpark)
- Hive/Presto/Athena
- Hadoop (just an architecture overview)

Data Lakes & Lakehouses
- Concepts: data lake, warehouse, lakehouse
- Table formats: Apache Iceberg, Delta Lake, Hudi
- Glue, Athena, Iceberg setup

Streaming Systems
- Kafka: pub/sub, brokers, partitions
- Kafka Connect, schema registry
- Apache Flink or Spark Structured Streaming (basics)

Cloud Data Warehouses
- BigQuery, Redshift, Snowflake: architecture & querying
- Partitioning, clustering, optimization

Monitoring & Observability
- Logging (CloudWatch, Stackdriver)
- Data quality with Great Expectations
- Lineage tools: OpenMetadata, Amundsen

Data Engineering Basics to Advanced: Phase II - Intermediate

Intermediate Topics:

Databases
- Relational (PostgreSQL, MySQL)
- NoSQL (MongoDB, Redis, Cassandra: intro only)
- Data warehousing concepts (star, snowflake schema)

ETL/ELT & Data Pipelines
- Difference between ETL & ELT
- Hands-on with tools:
  - Airflow (or Prefect): DAGs, operators, scheduling
  - dbt: modeling, tests, macros, incremental models

Cloud Basics
- Intro to AWS/GCP/Azure
- S3, IAM, Lambda, CloudWatch (focus on AWS if unsure)
- Basic infra concepts: VPC, subnets, security

File Formats & Serialization
- CSV, JSON, Avro, Parquet
- Compression: gzip, snappy

Data Engineering Basics to Advanced: Phase I - Foundations

Foundations Topics:

Python Programming
- Basics: variables, loops, functions, OOP
- Data structures: lists, dicts, sets
- Working with files, error handling
- Libraries: pandas, requests, json

SQL (Structured Query Language)
- SELECT, WHERE, GROUP BY, JOINs
- Window functions, CTEs, subqueries
- Practice: LeetCode SQL, Mode Analytics SQL Tutorial

Data Fundamentals
- CSV, JSON, Parquet formats
- Data types, schemas, data quality
- Basics of data modeling (OLTP vs OLAP)

Version Control
- Git basics, GitHub flow
- Collaboration, pull requests