[](https://classroom.github.com/a/RN_okVXh)

# Final Project Maren Bermúdez: Analysis of the Chilean National Urban Citizen Security Survey

## Description

- This project uses data from Chile's **National Urban Citizen Security Survey**
  (Encuesta Nacional Urbana de Seguridad Ciudadana).
- The data is first cleaned (data management part) and then analyzed (analysis and
  final part).

## Analysis Objectives

The primary goal of this analysis is to study:

1. The **perception of insecurity** among the population.
1. The perception of insecurity by municipality and socioeconomic status.
1. The increase in the perception of insecurity at the neighborhood, country, and
   commune levels.

## How to run the project

### Downloading the Data

To run the project you first have to get the data. Since the raw data is too large to
push to GitHub, it is not included in the repository. There are two ways to download
it:

1. Download it from
   https://www.dropbox.com/scl/fo/0oe4pz0epdx9az31s43rt/ACFL6YD4UZk6tIym7caipMU?rlkey=ds6wtw5ehatssgrkuqq29coeu&st=yw25julf&dl=0

1. Download it from the source webpage: https://cead.spd.gov.cl/estudios-y-encuestas/
   Then filter the results: in "Tipo Documentos" choose "Encuestas", in "Agrupacion"
   select "Encuesta Nacional Urbana de Seguridad", and in "Año" select 2023. Click
   "Aplicar", search for "Base de datos ENUSC 2023", and download it. Then place this
   file into the data folder in `src/project_mbb`.
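
After downloading, you can check that the file ended up where the project expects it.
The file name below is only an assumption; use the name of the file you actually
downloaded:

```python
from pathlib import Path

# Hypothetical file name; adjust it to whatever the downloaded file is called.
data_file = Path("src/project_mbb/data") / "Base de datos ENUSC 2023.sav"
print(data_file.exists())  # should print True once the file is in place
```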

### Programs set-up

To set up this project, you first need to install
[Miniconda](https://docs.conda.io/projects/miniconda/en/latest/) and
[Git](https://git-scm.com/downloads). Once those are installed, you can proceed with
creating and activating the environment.

### Creating and Activating the Environment

Start by navigating to the project's root directory in your terminal, then execute the
following commands:

```console
$ mamba env create -f environment.yml
$ conda activate project_mbb
```

### Building the Project

The `src` folder contains all the source code necessary to run this project. Files that
start with the prefix `task_` are `pytask` scripts, which execute when you run the
following command in the console:

```console
$ pytask
```

The `tests` folder includes test scripts that check the functionality of the functions
defined in the source code. To run them, type:

```console
$ pytest
```

It is important to run `pytask` before `pytest`; otherwise the tests for the plots will
not work.

If you encounter any issues, refer to the sections **"Preparing your system"** and
**"How to get started on a second machine"** on this
[website](https://econ-project-templates.readthedocs.io/en/stable/getting_started/index.html#preparing-your-system),
which documents the template this project is based on.

## Project Structure

The project is structured into three parts:

1. Data Cleaning
1. Data Analysis
1. Final Plots

The results of these three parts can be found in the `bld` folder after running the
project. This folder can safely be deleted before running the project again.
23 | 98 |
|
24 | | -## Potential Deliverables |
| 99 | +# Cleaning Part Description |
25 | 100 |
|
26 | | -- Statistical insights into public perception of security. |
27 | | -- Visualizations showing the distribution of violence and delinquency across |
28 | | - municipalities. |
29 | | -- Comparative analysis across different years (if time allows). ======= For my Final |
30 | | - Project, I plan to work with data from Chile's **National Urban Citizen Security |
31 | | - Survey** (Encuesta Nacional Urbana de Seguridad Ciudadana). |
| 101 | +The data cleaning process was designed to be flexible, allowing easy adaptation for |
| 102 | +adding new survey variables. This is achieved by modifying the dictionaries in the |
| 103 | +`parameters.py` file rather than directly hardcoding specific variables, making the |
| 104 | +approach different from the data management structure used in class. |
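
The exact content of `parameters.py` is project-specific; the following is only a
hypothetical sketch of how such dictionaries could look (all column names and codes are
made up):

```python
# Hypothetical sketch of parameters.py; the real variable names and codes differ.

# Columns whose numeric codes are mapped to the survey's answer labels.
# Adding a new survey variable only requires adding an entry here.
map_category = {
    "insecurity_country": {1: "increased", 2: "stayed the same", 3: "decreased"},
    "insecurity_commune": {1: "increased", 2: "stayed the same", 3: "decreased"},
}

# Dictionaries grouping the remaining (non-mapped) columns by target data type.
floats = {"household_weight": "float64"}
integers = {"interview_year": "Int64"}
categories = {"region": "category"}
strings = {"municipality_name": "string"}
```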

To maintain flexibility and readability, some parts of the cleaning process were split
into two sections.

### Structure of the Cleaning Process

The cleaning process is divided into two files:

1. **Labels Dataset** - Handles label-related data.
1. **Survey Data** - Processes the actual survey responses.

For the **survey data**, the following steps were taken:

#### 1. Filtering, Renaming, and Mapping

- The data was **filtered** to retain relevant responses.
- Column names were **renamed** for clarity.
- Responses that were not simple **"yes" or "no"** answers were **mapped** to their
  actual values from the survey (see the sketch below).
  - If additional variables need to be included, they should be added to the
    `map_category` dictionary.
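
The project's own cleaning functions are not reproduced here; the following is only a
minimal, self-contained sketch of the filter-rename-map idea, using made-up column
names:

```python
import pandas as pd

# Hypothetical mapping, normally taken from map_category in parameters.py.
map_category = {
    "insecurity_country": {1: "increased", 2: "stayed the same", 3: "decreased"},
}


def clean_survey(raw: pd.DataFrame) -> pd.DataFrame:
    """Filter, rename, and map a raw survey extract (illustrative only)."""
    df = raw[["P1"]].copy()                               # filter: keep relevant columns
    df = df.rename(columns={"P1": "insecurity_country"})  # rename: readable names
    for column, codes in map_category.items():            # map: codes -> answer labels
        df[column] = df[column].map(codes)
    return df


# Tiny fake extract to show the effect of each step.
example = pd.DataFrame({"P1": [1, 3, 2]})
print(clean_survey(example))
```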

#### 2. Handling Missing Values

- Missing values were replaced in a structured way to ensure consistency (see the
  sketch below).
- A distinction was made between **mapped** and **non-mapped** variables in order to:
  1. First convert them to the correct data type.
  1. Then replace missing values **without losing observations**.
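
Assuming the survey marks non-responses with special numeric codes (the codes below are
made up), the replacement step could look roughly like this:

```python
import numpy as np
import pandas as pd

# Hypothetical non-response codes; the real survey codes differ.
MISSING_CODES = [88, 96, 99]


def replace_missing(df: pd.DataFrame, non_mapped: list[str]) -> pd.DataFrame:
    """Replace the survey's missing codes consistently (illustrative only)."""
    out = df.copy()
    for column in non_mapped:
        # Cast to numeric first so the comparison with the codes is well defined,
        # then turn the codes into NaN instead of dropping the rows.
        out[column] = pd.to_numeric(out[column], errors="coerce")
        out[column] = out[column].replace(MISSING_CODES, np.nan)
    return out


example = pd.DataFrame({"income_decile": ["3", "99", "7"]})
print(replace_missing(example, non_mapped=["income_decile"]))
```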

#### 3. Data Type Transformation

- Variables that were not mapped were converted into appropriate data types (see the
  sketch below).
- If new variables need to be added without mapping, they can be assigned to the
  respective dictionaries (`floats`, `integers`, `categories`, or `strings`) in
  `parameters.py`.
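
A rough sketch of this casting step, again with hypothetical dictionaries and column
names:

```python
import pandas as pd

# Hypothetical dtype dictionaries, normally taken from parameters.py.
floats = {"household_weight": "float64"}
integers = {"interview_year": "Int64"}  # nullable integers keep rows with NaN
categories = {"region": "category"}
strings = {"municipality_name": "string"}


def set_dtypes(df: pd.DataFrame) -> pd.DataFrame:
    """Cast the non-mapped columns to their target data types (illustrative only)."""
    dtype_map = {**floats, **integers, **categories, **strings}
    present = {col: dtype for col, dtype in dtype_map.items() if col in df.columns}
    return df.astype(present)


example = pd.DataFrame({"interview_year": [2023.0, None], "region": [13, 5]})
print(set_dtypes(example).dtypes)
```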

#### 4. Optimizing Storage

- The cleaned data was first saved in **CSV format** for faster processing (see the
  sketch below).
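
For example (illustrative only; the real file names and paths are defined by the
pytask tasks):

```python
from pathlib import Path

import pandas as pd

# Write the cleaned data to CSV once; later tasks can then reload it quickly.
out = Path("enusc_clean_example.csv")  # hypothetical file name
clean = pd.DataFrame({"insecurity_country": ["increased", "decreased"]})
clean.to_csv(out, index=False)
print(pd.read_csv(out).head())
```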

This structured approach ensures that the cleaning process remains **adaptable,
readable, and efficient**.

## Credits

The template for this project is from
[econ-project-templates](https://github.com/OpenSourceEconomics/econ-project-templates).

## Contributors

@MarenBermudezBoeckle