Introduction
- The dataset used throughout the course covers taxi rides in New York City (https://www1.nyc.gov/site/tlc/index.page)
- The goal of the bootcamp is to build a data pipeline on top of this dataset, as shown in the image below

- Once the tutorial is completed, we will work on an individual project covering the tech stack presented
- Tech stack:
  - Google Cloud Storage
  - Airflow
  - Kafka
  - Spark
  - dbt
  - Google Data Studio
  - BigQuery
  - Batch and Streaming
- DataTalks Club Github repo: https://github.com/DataTalksClub/data-engineering-zoomcamp
Docker
Docker Documentation
PostgreSQL
PostgreSQL is an open-source relational database that we will use to run local tests. To interact with the database from the command line, we can use pgcli, a command-line client for PostgreSQL written in Python. To install it, run: pip install pgcli
PostgreSQL
pgcli
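Once pgcli is installed, connecting to a local database comes down to passing the host, port, user and database name on the command line. The sketch below only assembles that command so it can be inspected or copied into a terminal. The connection values (localhost, 5432, root, ny_taxi) are placeholders assumed for illustration and are not taken from the course material; the helper name build_pgcli_command is also hypothetical.

```python
import shlex


def build_pgcli_command(host, port, user, dbname):
    """Return the argument list for launching pgcli against one database."""
    return [
        "pgcli",
        "--host", host,
        "--port", str(port),  # command-line arguments must be strings
        "--username", user,
        "--dbname", dbname,
    ]


if __name__ == "__main__":
    # Placeholder connection values; replace them with your own setup.
    command = build_pgcli_command("localhost", 5432, "root", "ny_taxi")
    # shlex.join quotes each argument so the line is safe to paste into a shell.
    print(shlex.join(command))
```

Running the printed command in a terminal should open an interactive pgcli session, provided a PostgreSQL server is listening at that address and the user exists. Keeping the command as a list also makes it easy to hand to subprocess.run later if the step needs to be scripted.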