Problem statement

Develop a dashboard with two tiles by (with my progress):

  • Selecting a dataset of interest
  • Creating a pipeline for processing this dataset and loading it into a data lake
  • Creating a pipeline for moving the data from the lake to a data warehouse
  • Transforming the data in the data warehouse: prepare it for the dashboard
  • Building a dashboard to visualize the data



Setting up VM

  1. In a terminal, generate an SSH key pair by running
ssh-keygen -t ed25519 -f ~/.ssh/covid_project_gcp -C cncPomper

NOTE: cncPomper (the key comment) becomes our username on the VM created later. The -b option is omitted because Ed25519 keys have a fixed length.

  1. Add the generated SSH key to GCP

    1. go to Settings > Metadata > Add SSH key
    2. add the generated public key (~/.ssh/covid_project_gcp.pub)
  2. Create VM instance

    1. Region <- europe
    2. Zone <- europe-b
    3. Machine type <- e2-standard-4 (4 vCPU, 16GB memory)
    4. Boot disk
      1. OS <- Ubuntu
      2. Version <- Ubuntu 20.04 LTS
      3. Size <- 50 GB
  3. Connect to the VM

    1. SSH into the VM
    ssh -i ~/.ssh/covid_project_gcp cncPomper@EXTERNAL_IP_ADDRESS_OF_VM

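NOTE: To make our lives easier, we can create a Host profile in ~/.ssh/config and then connect with ssh covid-project

Host covid-project
    HostName EXTERNAL_IP_ADDRESS_OF_VM
    User cncPomper
    IdentityFile ~/.ssh/covid_project_gcp

On Windows, use c:/Users/MS_USERNAME/.ssh/covid_project_gcp as the IdentityFile path instead.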
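The Host profile can also be appended from the shell. The following is a minimal sketch, not the author's method: add_ssh_host is a hypothetical helper name, 203.0.113.10 is a placeholder documentation address standing in for the VM's external IP, and the demo writes to a scratch file rather than the real ~/.ssh/config.

```shell
#!/usr/bin/env bash
# Sketch: append a Host entry for the VM to an SSH config file.
# add_ssh_host is a hypothetical helper; the IP used below is a placeholder.

add_ssh_host() {
    local config_file="$1" host_alias="$2" host_ip="$3" user="$4" key_file="$5"

    # Create the file (and its directory) if it does not exist yet
    mkdir -p "$(dirname "$config_file")"
    touch "$config_file"
    chmod 600 "$config_file"

    # Do nothing if the alias is already defined
    if grep -qE "^Host[[:space:]]+${host_alias}\$" "$config_file"; then
        echo "Host ${host_alias} already present in ${config_file}"
        return 0
    fi

    cat >> "$config_file" <<EOF

Host ${host_alias}
    HostName ${host_ip}
    User ${user}
    IdentityFile ${key_file}
EOF
}

# Try it on a scratch file first
scratch_config="$(mktemp)"
add_ssh_host "$scratch_config" covid-project 203.0.113.10 cncPomper "~/.ssh/covid_project_gcp"
cat "$scratch_config"
```

Once the output looks right, the same call can be pointed at "${HOME}/.ssh/config" with the VM's real external IP. Calling it again with the same alias leaves the file unchanged.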

Installing needed packages

  1. Install Anaconda

Reload .bashrc (if you chose to run conda init during installation)

source ~/.bashrc
  1. Install docker
sudo apt-get update
sudo apt-get install

Follow these instructions to run docker on the VM without sudo

Test whether docker was installed successfully

docker run hello-world

Now we need to set up docker compose

mkdir bin
cd bin
wget -O docker-compose

Now, in the ~/bin folder, we need to make the downloaded binary executable

chmod +x docker-compose

Add docker-compose to PATH:

  • add the following at the end of the ~/.bashrc file
export PATH="${HOME}/bin:${PATH}"
  • reload .bashrc by running (the full path is needed because we are still in ~/bin)
source ~/.bashrc
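If the setup is repeated, editing .bashrc by hand can leave duplicate PATH lines. A minimal sketch of an idempotent alternative follows; ensure_path_line is a hypothetical helper name, and the demo runs against a scratch file instead of the real .bashrc.

```shell
#!/usr/bin/env bash
# Sketch: add ~/bin to PATH via an rc file without duplicating the line.
# ensure_path_line is a hypothetical helper name.

ensure_path_line() {
    local rc_file="$1"
    # Single quotes keep ${HOME} and ${PATH} unexpanded, so they are
    # evaluated each time the rc file is sourced
    local line='export PATH="${HOME}/bin:${PATH}"'

    touch "$rc_file"
    # -F: fixed string, -x: match the whole line, -q: quiet
    if ! grep -Fxq "$line" "$rc_file"; then
        printf '%s\n' "$line" >> "$rc_file"
    fi
}

# Demo on a scratch file; the second call is a no-op
rc_scratch="$(mktemp)"
ensure_path_line "$rc_scratch"
ensure_path_line "$rc_scratch"
cat "$rc_scratch"
```

To apply it for real, call ensure_path_line "${HOME}/.bashrc" and then reload the file with source ~/.bashrc.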

To check that everything works, run

docker-compose version

To check running containers

docker ps
  1. Install terraform
wget -O- | sudo gpg --dearmor -o /usr/share/keyrings/hashicorp-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/hashicorp.list
sudo apt update && sudo apt install terraform

Configuring GCP for terraform

  1. Terraform setup
  • Go to IAM & Admin > Service Accounts

  • Click on the top button CREATE SERVICE ACCOUNT

    • name : covid-project
    • service account access :
      • Cloud Storage > Storage Admin
      • BigQuery > BigQuery Admin
      • Compute Engine > Compute Admin
  • Go to IAM & Admin > Service Accounts > Service account you just created > Manage keys

    • Add key > Create new key > JSON (this downloads the key to your machine)
  • Create directory for GCP keys

mkdir keys
cd keys
  • Put the downloaded key in the keys folder
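Terraform's Google provider and the Google client libraries can pick up the key through the GOOGLE_APPLICATION_CREDENTIALS environment variable. The sketch below shows one way to set it; use_gcp_key is a hypothetical helper, and the file name covid-project.json is an assumption (the downloaded key will have its own name). The demo uses a scratch directory and an empty stand-in file rather than a real key.

```shell
#!/usr/bin/env bash
# Sketch: restrict permissions on the service-account key and export its path.
# use_gcp_key is a hypothetical helper; the key file name is an assumption.

use_gcp_key() {
    local key_file="$1"

    if [ ! -f "$key_file" ]; then
        echo "Key file not found: $key_file" >&2
        return 1
    fi

    # Only the owner should be able to read or write the key
    chmod 600 "$key_file"

    # Read by Google client libraries and the Terraform Google provider
    export GOOGLE_APPLICATION_CREDENTIALS="$key_file"
}

# Demo with a scratch directory standing in for the keys folder
keys_dir="$(mktemp -d)"
echo '{}' > "${keys_dir}/covid-project.json"   # stand-in for the downloaded key
use_gcp_key "${keys_dir}/covid-project.json"
echo "$GOOGLE_APPLICATION_CREDENTIALS"
```

The export only lasts for the current shell session, so it has to be repeated (or added to .bashrc) after reconnecting to the VM.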


  1. Run
terraform init
terraform plan

Create the resources in the cloud

terraform apply

Destroy the resources created by terraform

terraform destroy

Download data

Probably the most convenient way to get this particular dataset is to download it manually from Kaggle and then:

  • unzip it in the /data directory

The rest of the content

Columns description


I have used data from this dataset