Final Project Maren Bermúdez: Analysis of the Chilean National Urban Citizen Security Survey

Description

  • This project uses data from Chile's National Urban Citizen Security Survey (Encuesta Nacional Urbana de Seguridad Ciudadana, ENUSC).
  • The data is first cleaned (data management part) and then analysed (analysis and final parts).

Objectives

The primary goal of this analysis is to study:

  1. The perception of insecurity among the Chilean population.
  2. How perception varies across municipalities and socioeconomic status.
  3. The increase in perception of insecurity at the neighborhood, country, and commune levels.

How to run the project

Downloading the Data

To run the project, you first have to get the data. The raw data file is too large to push to GitHub, so you have to download it yourself. There are two ways to do this.

  1. Download it from https://www.dropbox.com/scl/fo/0oe4pz0epdx9az31s43rt/ACFL6YD4UZk6tIym7caipMU?rlkey=ds6wtw5ehatssgrkuqq29coeu&st=yw25julf&dl=0

  2. Download it from the source webpage: https://cead.spd.gov.cl/estudios-y-encuestas/ . Then filter: under "Tipo Documentos" choose "Encuestas", under "Agrupacion" select "Encuesta Nacional Urbana de Seguridad", and under "Año" select 2023. Then click "Aplicar", search for "Base de datos ENUSC 2023", and download it.

After completing one of these two steps, put the file into the data folder in src/project_mbb.

(Make sure you do not download the data as a .zip file.)

Programs set-up

To set up this project, you first need to install Miniconda and Git. Once those are installed, you can proceed with creating and activating the environment.

Creating and Activating the Environment

Start by navigating to the project's root directory in your terminal, then execute the following commands:

$ mamba env create -f environment.yml
$ conda activate project_mbb

Building the Project

The src folder contains all the source code needed to run this project. Files that start with the prefix task_ are pytask scripts; they run when you execute the following command in the console, which builds the whole project:

$ pytask
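
For orientation, pytask collects every function whose name starts with task_ and executes it, writing its products to the build folder. The following is a minimal hypothetical sketch of such a script, not one of the project's actual tasks; the paths and names are assumptions.

# Minimal hypothetical pytask script (e.g. task_example.py); the real
# task_ scripts live in src/project_mbb.
from pathlib import Path

import pandas as pd

BLD = Path(__file__).resolve().parent / "bld"  # assumed build directory


def task_example(produces=BLD / "example.csv"):
    """pytask discovers this function via the task_ prefix and tracks its product."""
    produces.parent.mkdir(parents=True, exist_ok=True)
    pd.DataFrame({"answer": ["yes", "no"]}).to_csv(produces, index=False)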

The tests folder includes test scripts that check the functionality of the functions defined in the source code. To run them, type:

$ pytest
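
Roughly speaking, the plot tests check artifacts that pytask has written to the BLD folder, along the lines of this hypothetical sketch (paths and names are assumptions, not the project's actual tests):

# Hypothetical sketch of a plot test; the path is an assumption.
from pathlib import Path

BLD = Path(__file__).resolve().parent.parent / "bld"  # assumed build directory


def test_plots_were_built():
    # Can only pass after pytask has built the project.
    assert any(BLD.glob("**/*.png")), "run pytask before pytest"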

It is important to run pytask before pytest so that the tests for the plots work. (It is normal for the tests to take up to 3 minutes to run.)

If you encounter any issues, refer to the sections "Preparing your system" and "How to get started on a second machine" in the documentation of econ-project-templates, the template this project is based on.

Project Structure

The project is structured into three parts.

  1. Data Cleaning
  2. Data Analysis
  3. Final Plots

The results of these three parts appear in the BLD folder after running the project. This folder can be safely deleted before running the project again.

Troubleshooting

If you are on a Windows machine and pytask does not work, you can try:

$ conda activate project_mbb
$ pip install kaleido==0.1.0.post1

If pytest fails, it may be because you ran pytest before pytask: some tests build on the plots, so they can only pass after the project has been built. Testing in this order was the most reliable way to make sure that everything works correctly.

Cleaning Part Description (Extra, not mandatory to read)

The data cleaning process was designed to be flexible, so new survey variables can be added easily. This is achieved by modifying the dictionaries in the parameters.py file rather than hardcoding specific variables, which makes the approach different from the data management structure used in class.
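
As a rough illustration of this pattern, the configuration might look like the hypothetical sketch below; the variable names, codes, and labels are made up, not the contents of the real parameters.py.

# Illustrative sketch of the dictionaries in parameters.py; all names
# and codes are hypothetical.
map_category = {
    "insecurity_country": {1: "increased", 2: "stayed the same", 3: "decreased"},
}

# Non-mapped variables grouped by target data type (used in step 3 below).
floats = {"person_weight": "Float64"}
integers = {"age": "Int64"}
categories = {"municipality": "category"}
strings = {"respondent_id": "string"}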

To maintain flexibility and readability, the cleaning process was split into two sections.

Structure of the Cleaning Process

The cleaning process is divided into two files:

  1. Labels Dataset - Handles label-related data.
  2. Survey Data - Processes the actual survey responses.

For the survey data, the following steps were taken:

1. Filtering, Renaming, and Mapping

  • The data was filtered.
  • Column names were renamed for clarity.
  • Responses that were not simple "yes" or "no" were mapped to their actual values from the survey.
    • If additional variables need to be included, they should be added to the map_category dictionary (see the sketch after this list).
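
Under these assumptions, the filter/rename/map step could look roughly like this in pandas; RENAME, the column names, and the codes are illustrative stand-ins, not the project's actual dictionaries.

# Hypothetical filter/rename/map step; names and codes are illustrative.
import pandas as pd

RENAME = {"P1": "insecurity_country"}
map_category = {
    "insecurity_country": {1: "increased", 2: "stayed the same", 3: "decreased"},
}


def filter_rename_map(raw: pd.DataFrame) -> pd.DataFrame:
    # Keep only the columns of interest and give them readable names.
    df = raw[list(RENAME)].rename(columns=RENAME)
    # Replace numeric survey codes with their labelled values.
    for column, mapping in map_category.items():
        df[column] = df[column].map(mapping)
    return df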

2. Handling Missing Values

  • Missing values were replaced in a structured way to ensure consistency.
  • A distinction was made between mapped and non-mapped variables (see the sketch after this list) to:
    1. First convert them to the correct data type.
    2. Then replace missing values without losing observations.
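
A hedged sketch of this distinction, with made-up function and label names:

# Hypothetical missing-value step: mapped columns receive a label, while
# non-mapped columns get a nullable dtype first so no observations are lost.
import pandas as pd


def replace_missing(df: pd.DataFrame, mapped: list, not_mapped: list) -> pd.DataFrame:
    out = df.copy()
    for column in mapped:
        out[column] = out[column].fillna("no answer")
    for column in not_mapped:
        # Nullable Float64 keeps missing values as <NA> instead of dropping rows.
        out[column] = pd.to_numeric(out[column], errors="coerce").astype("Float64")
    return out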

3. Data Type Transformation

  • Variables that were not mapped were converted into appropriate data types.
  • If new variables need to be added without mapping, they can be assigned to the respective dictionaries (floats, integers, categories, or strings) in parameters.py, as sketched below.
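
Continuing the hypothetical sketch, the conversion could merge the four dictionaries and apply them in one pass:

# Hypothetical dtype step; the four dicts stand in for floats, integers,
# categories, and strings in parameters.py.
import pandas as pd

floats = {"person_weight": "Float64"}
integers = {"age": "Int64"}
categories = {"municipality": "category"}
strings = {"respondent_id": "string"}


def convert_dtypes(df: pd.DataFrame) -> pd.DataFrame:
    dtypes = {**floats, **integers, **categories, **strings}
    # Only convert columns that are actually present in the data.
    return df.astype({c: t for c, t in dtypes.items() if c in df.columns})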

4. Optimizing Storage

  • The cleaned data was first saved in CSV format for faster processing.

This structured approach ensures that the cleaning process remains adaptable, readable, and efficient.

Credits

The template for this project is from econ-project-templates.

Contributors

@MarenBermudezBoeckle
