Commit 027808d

Complete README file description.
1 parent 7eecffe commit 027808d

File tree

3 files changed: +123 -59 lines


README.md

Lines changed: 121 additions & 57 deletions
@@ -1,91 +1,155 @@
 [![Review Assignment Due Date](https://classroom.github.com/assets/deadline-readme-button-22041afd0340ce965d47ae6ef1cefeee28c7c493a6346c4f15d667ab976d596c.svg)](https://classroom.github.com/a/RN_okVXh)

-# Final Project Proposal
+# Final Project Maren Bermúdez: Analysis of the Chilean National Urban Citizen Security Survey

-## Overview
+## Description

-For my Final Project, I plan to work with data from Chile's **National Urban Citizen
-Security Survey** (Encuesta Nacional Urbana de Seguridad Ciudadana).
+- This project uses data from the **National Urban Citizen Security Survey**
+  (Encuesta Nacional Urbana de Seguridad Ciudadana).
+- The data is first cleaned (data management part) and then analysed (analysis and
+  final part).

 ## Objectives

 The primary goal of this analysis is to study:

 1. The **perception of insecurity** among the population.
-1. Actual cases of **violence** and **delinquency**, analyzed by municipalities.
+1. Perception by municipality and socioeconomic status.
+1. The increase in the perception of insecurity at the neighborhood, commune, and
+   country levels.

+## How to Run the Project

+### Downloading the Data

+To run the project, you first have to get the data. Since the raw data is very large,
+it cannot be pushed to GitHub. There are two ways to obtain it:

+1. Download it from
+   https://www.dropbox.com/scl/fo/0oe4pz0epdx9az31s43rt/ACFL6YD4UZk6tIym7caipMU?rlkey=ds6wtw5ehatssgrkuqq29coeu&st=yw25julf&dl=0

+1. Download it from the source webpage: https://cead.spd.gov.cl/estudios-y-encuestas/
+   Then filter: under "Tipo Documentos" choose "Encuestas", under "Agrupacion" select
+   "Encuesta Nacional Urbana de Seguridad", and under "Año" select 2023. Click
+   "Aplicar", search for "Base de datos ENUSC 2023", and download it. Then put this
+   file into the `data` folder in `src/project_mbb`.
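Before running anything, you can check that the file landed in the right place. A minimal sketch; the helper name and the exact path below are illustrative, not part of the project:

```python
from pathlib import Path

# Illustrative location based on the download instructions above; adjust it
# if your checkout uses a different layout.
RAW_FILE = Path("src") / "project_mbb" / "data" / "base-usuario-20-enusc-2023.sav"


def data_is_ready(raw_file: Path = RAW_FILE) -> bool:
    """Return True once the raw ENUSC file has been downloaded."""
    return raw_file.is_file()


if __name__ == "__main__":
    if not data_is_ready():
        print(f"Missing raw data: {RAW_FILE} (see 'Downloading the Data' above)")
```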
+### Programs Set-up

+To set up this project, you first need to install
+[Miniconda](https://docs.conda.io/projects/miniconda/en/latest/) and
+[Git](https://git-scm.com/downloads). Once those are installed, you can proceed with
+creating and activating the environment.

+### Creating and Activating the Environment

+Start by navigating to the project's root directory in your terminal, then execute the
+following commands:

+```console
+$ mamba env create -f environment.yml
+$ conda activate project_mbb
+```

+### Building the Project

+The `src` folder contains all the source code necessary to run this project. Files that
+start with the prefix `task_` are `pytask` scripts, which execute when you run the
+following command in the console:

+```console
+$ pytask
+```

+The `tests` folder includes test scripts that check the functionality of the functions
+defined in the source code. To run them, type:

+```console
+$ pytest
+```

+It is important to run `pytask` before `pytest`, so that the tests for the plots work.

+If you encounter any issues, refer to the sections **"Preparing your system"** and
+**"How to get started on a second machine"** in this
+[website](https://econ-project-templates.readthedocs.io/en/stable/getting_started/index.html#preparing-your-system),
+which documents the template this project is based on.

+## Project Structure

+The project is structured into three parts:

-## Scope
+1. Data Cleaning
+1. Data Analysis
+1. Final Plots

-Depending on the available time and workload, I aim to analyze data for one or more
-years to observe trends and patterns.
+The results of these three parts can be found in the BLD folder after running the
+project. This folder can safely be deleted before every run.

-## Potential Deliverables
+## Cleaning Part Description

-- Statistical insights into public perception of security.
-- Visualizations showing the distribution of violence and delinquency across
-  municipalities.
-- Comparative analysis across different years (if time allows). ======= For my Final
-  Project, I plan to work with data from Chile's **National Urban Citizen Security
-  Survey** (Encuesta Nacional Urbana de Seguridad Ciudadana).
+The data cleaning process was designed to be flexible, allowing new survey variables
+to be added easily. This is achieved by modifying the dictionaries in the
+`parameters.py` file rather than hardcoding specific variables, which is why the
+approach differs from the data management structure used in class.

-## Cleaning Part description
+To maintain flexibility and readability, some parts of the cleaning process were split
+into two sections.

-The cleaning part was done in a way in which one can easily adapt the code such that
-one can add other variables of the survey to the different dictionaries in the
-parameters.py file; this is why it does not follow exactly the same structure as in
-class, where we used specific variables.
+## Structure of the Cleaning Process

-Therefore some parts of the cleaning were split up in two parts such that the code
-remains flexible, since you told us in the assignment that this is fine for
-readability purposes.
+The cleaning process is divided into two files:

-The cleaning has one file for the labels dataset and one for the actual survey data.
-For the survey data the data was
+1. **Labels Dataset** - Handles label-related data.
+1. **Survey Data** - Processes the actual survey responses.

-- filtered,
+For the **survey data**, the following steps were taken:

-- renamed and
+### 1. Filtering, Renaming, and Mapping

-- mapped: for the columns with answers other than yes or no, the data was mapped to
-  the actual values of the survey answers. If other variables are to be added, they
-  can be added to the map_category dictionary.
+- The data was **filtered** to retain relevant responses.
+- Column names were **renamed** for clarity.
+- Responses that were not simple **"yes" or "no"** answers were **mapped** to their
+  actual values from the survey.
+- If additional variables need to be included, they should be added to the
+  `map_category` dictionary.
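The mapping step above can be sketched as follows. This is a minimal illustration with made-up column names and categories; the real mappings live in the `map_category` dictionary in `parameters.py`:

```python
import pandas as pd

# Hypothetical stand-in for the map_category dictionary in parameters.py:
# column name -> {survey code: answer label}.
MAP_CATEGORY = {
    "perception_insecurity": {1: "increased", 2: "stayed the same", 3: "decreased"},
}


def map_answers(df: pd.DataFrame, map_category: dict) -> pd.DataFrame:
    """Replace numeric survey codes with their answer labels."""
    out = df.copy()
    for column, mapping in map_category.items():
        out[column] = out[column].map(mapping)
    return out


raw = pd.DataFrame({"perception_insecurity": [1, 3, 2]})
clean = map_answers(raw, MAP_CATEGORY)
```

With this design, adding a new categorical variable only requires a new dictionary entry, without touching the cleaning code itself.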

-- In a later step the missing values were replaced for all variables so that the
-  replacement is consistent: for this, the distinction between the variables that were
-  already mapped and the ones that weren't is important, so that we can first convert
-  the values to a correct data type before replacing the missing values without losing
-  observations.
+### 2. Handling Missing Values

+- Missing values were replaced in a structured way to ensure consistency.
+- A distinction was made between **mapped** and **non-mapped** variables to:
+  1. First convert them to the correct data type.
+  1. Then replace missing values **without losing observations**.
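The two-step idea above can be sketched like this. The column names and the sentinel code are illustrative assumptions, not the project's actual ones; the sketch assumes missing answers are coded as 99 in the raw data:

```python
import pandas as pd

MISSING_CODE = 99  # assumed sentinel for "no answer" in this sketch


def replace_missing(df: pd.DataFrame, mapped_columns: list[str]) -> pd.DataFrame:
    """Convert non-mapped columns to a numeric dtype first, then replace the
    missing-value code with pd.NA, so no observations have to be dropped."""
    out = df.copy()
    for column in out.columns:
        if column not in mapped_columns:
            # Step 1: make sure the dtype is correct before touching values.
            out[column] = pd.to_numeric(out[column], errors="coerce")
        # Step 2: replace the sentinel consistently for every variable.
        out[column] = out[column].replace(MISSING_CODE, pd.NA)
    return out


raw = pd.DataFrame({"age": ["34", "99", "51"], "answer": ["yes", "no", 99]})
clean = replace_missing(raw, mapped_columns=["answer"])
```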

-- Note: it is important to run pytask and then pytest, so that the tests for the
-  plots work.
+### 3. Data Type Transformation

-- first data in CSV because it is faster
+- Variables that were not mapped were converted into appropriate data types.
+- If new variables need to be added without mapping, they can be assigned to the
+  respective dictionaries (`floats`, `integers`, `categories`, or `strings`) in
+  `parameters.py`.
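The dtype assignment described above can be sketched as follows. The dictionary contents here are hypothetical; the real ones are defined in `parameters.py`:

```python
import pandas as pd

# Illustrative stand-ins for the dictionaries in parameters.py.
FLOATS = ["household_weight"]
INTEGERS = ["age"]
CATEGORIES = ["region"]
STRINGS = ["comment"]


def assign_dtypes(df: pd.DataFrame) -> pd.DataFrame:
    """Cast each non-mapped column to the dtype its dictionary prescribes."""
    out = df.copy()
    dtype_by_group = [
        (FLOATS, "float64"),
        (INTEGERS, "int64"),
        (CATEGORIES, "category"),
        (STRINGS, "string"),
    ]
    for columns, dtype in dtype_by_group:
        for column in columns:
            if column in out.columns:
                out[column] = out[column].astype(dtype)
    return out


clean = assign_dtypes(pd.DataFrame({"age": ["34", "51"], "region": ["RM", "V"]}))
```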

-- explicitly explain how to run your project. This is crucial for the final project!
-  Do not refer to external websites. You should add a couple of lines explaining what
-  commands to execute to get the project to run (e.g. "mamba env create -f
-  environment.yml", "mamba activate ...", "pytask")
+### 4. Optimizing Storage

-# Important Points
+- The cleaned data was first saved in **CSV format** for faster processing.
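As a sketch of that storage step (the output file name is illustrative; the project writes its results into the BLD folder):

```python
from pathlib import Path

import pandas as pd


def save_clean_data(df: pd.DataFrame, bld_dir: Path) -> Path:
    """Write the cleaned survey data to CSV inside the build directory."""
    bld_dir.mkdir(parents=True, exist_ok=True)
    out_path = bld_dir / "enusc_clean.csv"  # illustrative file name
    df.to_csv(out_path, index=False)
    return out_path
```

CSV round-trips quickly here because the cleaned data is flat and tabular; the file is regenerated on every run, so it never needs to be committed.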

-- **Specify how to download data**
+This structured approach ensures that the cleaning process remains **adaptable,
+readable, and efficient**.

-Source data: Always start from the data in the way you obtained it. Add a detailed
-description of how you got it. If possible, include all datasets in a common format.
+## Credits

-Source code: Include any code that is needed to produce your results. Programs:
-document all programs that need to be installed to run your code. Automate the
-installation as much as possible with environments.
+The template for this project is from
+[econ-project-templates](https://github.com/OpenSourceEconomics/econ-project-templates).

-Raw data and source code are under version control. Published results are created
-from the main branch with no uncommitted changes. Use tags / releases to mark
-submissions, revisions, etc.
+## Contributors

-There is a README file that documents your directory structure, how to install
-packages, and how to run your code. Docstrings and comments explain the code where
-necessary.
+@MarenBermudezBoeckle
inst/WORDLIST

Lines changed: 1 addition & 0 deletions

@@ -1,4 +1,5 @@
 "base-usuario-20-enusc-2023.sav"
+fo
 aa
 aA
 AAt

to_dos.md

Lines changed: 1 addition & 2 deletions

@@ -2,5 +2,4 @@
 - eliminate pytoml that we want
 - create version and eliminate documents later for final version
 - complete README file
-- run again at the end to see whether the documents issue is a problem in my case
-- maybe analysis part with pkl put as pyarrow final type of data?
+- pin the version at the end, with the year of the data and that it is for EPP
