Commit 027808d

Complete README file description.
1 parent 7eecffe commit 027808d

File tree

3 files changed: +123 -59 lines


README.md

Lines changed: 121 additions & 57 deletions
@@ -1,91 +1,155 @@
 [![Review Assignment Due Date](https://classroom.github.com/assets/deadline-readme-button-22041afd0340ce965d47ae6ef1cefeee28c7c493a6346c4f15d667ab976d596c.svg)](https://classroom.github.com/a/RN_okVXh)

-# Final Project Proposal
+# Final Project Maren Bermúdez: Analysis of the Chilean National Urban Citizen Security Survey

-## Overview
+## Description

-For my Final Project, I plan to work with data from Chile's **National Urban Citizen
-Security Survey** (Encuesta Nacional Urbana de Seguridad Ciudadana).
+- This project uses data from the **National Urban Citizen Security Survey**
+  (Encuesta Nacional Urbana de Seguridad Ciudadana).
+- The data is first cleaned (data management part) and then analysed (analysis and
+  final part).

 ## Objectives

 The primary goal of this analysis is to study:

 1. The **perception of insecurity** among the population.
-1. Actual cases of **violence** and **delinquency**, analyzed by municipalities.
+1. Perception by municipality and socioeconomic status.
+1. The increase in the perception of insecurity at the neighborhood, commune, and
+   country levels.

+## How to Run the Project

+### Downloading the Data

+To run the project, you first have to get the data. Since the raw data is very large,
+it cannot be pushed to GitHub. There are two ways to obtain it:

+1. Download it from
+   https://www.dropbox.com/scl/fo/0oe4pz0epdx9az31s43rt/ACFL6YD4UZk6tIym7caipMU?rlkey=ds6wtw5ehatssgrkuqq29coeu&st=yw25julf&dl=0

+1. Download it from the source webpage: https://cead.spd.gov.cl/estudios-y-encuestas/
+   Then filter: under "Tipo Documentos" choose "Encuestas", under "Agrupacion" select
+   "Encuesta Nacional Urbana de Seguridad", and under "Año" select 2023. Click
+   "Aplicar", search for "Base de datos ENUSC 2023", and download it. Then put this
+   file into the `data` folder in `src/project_mbb`.
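Before running anything, you can check that the file landed in the right place. A minimal sketch; the helper name and the exact path below are illustrative, not part of the project:

```python
from pathlib import Path

# Illustrative location based on the download instructions above; adjust it
# if your checkout uses a different layout.
RAW_FILE = Path("src") / "project_mbb" / "data" / "base-usuario-20-enusc-2023.sav"


def data_is_ready(raw_file: Path = RAW_FILE) -> bool:
    """Return True once the raw ENUSC file has been downloaded."""
    return raw_file.is_file()


if __name__ == "__main__":
    if not data_is_ready():
        print(f"Missing raw data: {RAW_FILE} (see 'Downloading the Data' above)")
```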
+### Programs Set-up

+To set up this project, you first need to install
+[Miniconda](https://docs.conda.io/projects/miniconda/en/latest/) and
+[Git](https://git-scm.com/downloads). Once those are installed, you can proceed with
+creating and activating the environment.

+### Creating and Activating the Environment

+Start by navigating to the project's root directory in your terminal, then execute the
+following commands:

+```console
+$ mamba env create -f environment.yml
+$ conda activate project_mbb
+```

+### Building the Project

+The `src` folder contains all the source code necessary to run this project. Files that
+start with the prefix `task_` are `pytask` scripts, which execute when you run the
+following command in the console:

+```console
+$ pytask
+```

+The `tests` folder includes test scripts that check the functionality of the functions
+defined in the source code. To run them, type:

+```console
+$ pytest
+```

+It is important to run `pytask` before `pytest`, so that the tests for the plots work.

+If you encounter any issues, refer to the sections **"Preparing your system"** and
+**"How to get started on a second machine"** in this
+[website](https://econ-project-templates.readthedocs.io/en/stable/getting_started/index.html#preparing-your-system),
+which documents the template this project is based on.

+## Project Structure

+The project is structured into three parts:

-## Scope
+1. Data Cleaning
+1. Data Analysis
+1. Final Plots

-Depending on the available time and workload, I aim to analyze data for one or more
-years to observe trends and patterns.
+The results of these three parts can be found in the BLD folder after running the
+project. This folder can safely be deleted before every run.

-## Potential Deliverables
+## Cleaning Part Description

-- Statistical insights into public perception of security.
-- Visualizations showing the distribution of violence and delinquency across
-  municipalities.
-- Comparative analysis across different years (if time allows). ======= For my Final
-  Project, I plan to work with data from Chile's **National Urban Citizen Security
-  Survey** (Encuesta Nacional Urbana de Seguridad Ciudadana).
+The data cleaning process was designed to be flexible, allowing new survey variables
+to be added easily. This is achieved by modifying the dictionaries in the
+`parameters.py` file rather than hardcoding specific variables, which is why the
+approach differs from the data management structure used in class.

-## Cleaning Part description
+To maintain flexibility and readability, some parts of the cleaning process were split
+into two sections.

-The cleaning part was done in a way in which one can easily adapt the code such that
-one can add other variables of the survey to the different dictionaries in the
-parameters.py file; this is why it does not follow exactly the same structure as in
-class, where we used specific variables.
+## Structure of the Cleaning Process

-Therefore some parts of the cleaning were split up in two parts such that the code
-remains flexible, since you told us in the assignment that this is fine for
-readability purposes.
+The cleaning process is divided into two files:

-The cleaning has one file for the labels dataset and one for the actual survey data.
-For the survey data the data was
+1. **Labels Dataset** - Handles label-related data.
+1. **Survey Data** - Processes the actual survey responses.

-- filtered,
+For the **survey data**, the following steps were taken:

-- renamed and
+### 1. Filtering, Renaming, and Mapping

-- mapped: for the columns with answers other than yes or no, the data was mapped to
-  the actual values of the survey answers. If other variables are to be added, they
-  can be added to the map_category dictionary.
+- The data was **filtered** to retain relevant responses.
+- Column names were **renamed** for clarity.
+- Responses that were not simple **"yes" or "no"** answers were **mapped** to their
+  actual values from the survey.
+- If additional variables need to be included, they should be added to the
+  `map_category` dictionary.
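The mapping step above can be sketched as follows. This is a minimal illustration with made-up column names and categories; the real mappings live in the `map_category` dictionary in `parameters.py`:

```python
import pandas as pd

# Hypothetical stand-in for the map_category dictionary in parameters.py:
# column name -> {survey code: answer label}.
MAP_CATEGORY = {
    "perception_insecurity": {1: "increased", 2: "stayed the same", 3: "decreased"},
}


def map_answers(df: pd.DataFrame, map_category: dict) -> pd.DataFrame:
    """Replace numeric survey codes with their answer labels."""
    out = df.copy()
    for column, mapping in map_category.items():
        out[column] = out[column].map(mapping)
    return out


raw = pd.DataFrame({"perception_insecurity": [1, 3, 2]})
clean = map_answers(raw, MAP_CATEGORY)
```

With this design, adding a new categorical variable only requires a new dictionary entry, without touching the cleaning code itself.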

-- In a later step the missing values were replaced for all variables so that the
-  replacement is consistent: for this, the distinction between the variables that were
-  already mapped and the ones that weren't is important, so that we can first convert
-  the values to a correct data type before replacing the missing values without losing
-  observations.
+### 2. Handling Missing Values

+- Missing values were replaced in a structured way to ensure consistency.
+- A distinction was made between **mapped** and **non-mapped** variables to:
+  1. First convert them to the correct data type.
+  1. Then replace missing values **without losing observations**.
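The two-step idea above can be sketched like this. The column names and the sentinel code are illustrative assumptions, not the project's actual ones; the sketch assumes missing answers are coded as 99 in the raw data:

```python
import pandas as pd

MISSING_CODE = 99  # assumed sentinel for "no answer" in this sketch


def replace_missing(df: pd.DataFrame, mapped_columns: list[str]) -> pd.DataFrame:
    """Convert non-mapped columns to a numeric dtype first, then replace the
    missing-value code with pd.NA, so no observations have to be dropped."""
    out = df.copy()
    for column in out.columns:
        if column not in mapped_columns:
            # Step 1: make sure the dtype is correct before touching values.
            out[column] = pd.to_numeric(out[column], errors="coerce")
        # Step 2: replace the sentinel consistently for every variable.
        out[column] = out[column].replace(MISSING_CODE, pd.NA)
    return out


raw = pd.DataFrame({"age": ["34", "99", "51"], "answer": ["yes", "no", 99]})
clean = replace_missing(raw, mapped_columns=["answer"])
```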

-- Note: it is important to run pytask and then pytest, so that the tests for the
-  plots work.
+### 3. Data Type Transformation

-- first data in CSV because it is faster
+- Variables that were not mapped were converted into appropriate data types.
+- If new variables need to be added without mapping, they can be assigned to the
+  respective dictionaries (`floats`, `integers`, `categories`, or `strings`) in
+  `parameters.py`.
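The dtype assignment described above can be sketched as follows. The dictionary contents here are hypothetical; the real ones are defined in `parameters.py`:

```python
import pandas as pd

# Illustrative stand-ins for the dictionaries in parameters.py.
FLOATS = ["household_weight"]
INTEGERS = ["age"]
CATEGORIES = ["region"]
STRINGS = ["comment"]


def assign_dtypes(df: pd.DataFrame) -> pd.DataFrame:
    """Cast each non-mapped column to the dtype its dictionary prescribes."""
    out = df.copy()
    dtype_by_group = [
        (FLOATS, "float64"),
        (INTEGERS, "int64"),
        (CATEGORIES, "category"),
        (STRINGS, "string"),
    ]
    for columns, dtype in dtype_by_group:
        for column in columns:
            if column in out.columns:
                out[column] = out[column].astype(dtype)
    return out


clean = assign_dtypes(pd.DataFrame({"age": ["34", "51"], "region": ["RM", "V"]}))
```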

-- explicitly explain how to run your project. This is crucial for the final project!
-  Do not refer to external websites. You should add a couple of lines explaining what
-  commands to execute to get the project to run (e.g. "mamba env create -f
-  environment.yml", "mamba activate ...", "pytask")
+### 4. Optimizing Storage

-# Important Points
+- The cleaned data was first saved in **CSV format** for faster processing.
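As a sketch of that storage step (the output file name is illustrative; the project writes its results into the BLD folder):

```python
from pathlib import Path

import pandas as pd


def save_clean_data(df: pd.DataFrame, bld_dir: Path) -> Path:
    """Write the cleaned survey data to CSV inside the build directory."""
    bld_dir.mkdir(parents=True, exist_ok=True)
    out_path = bld_dir / "enusc_clean.csv"  # illustrative file name
    df.to_csv(out_path, index=False)
    return out_path
```

CSV round-trips quickly here because the cleaned data is flat and tabular; the file is regenerated on every run, so it never needs to be committed.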

-- **Specify how to download data**
+This structured approach ensures that the cleaning process remains **adaptable,
+readable, and efficient**.

-Source data: Always start from the data in the way you obtained it. Add a detailed
-description of how you got it. If possible, include all datasets in a common format.
+## Credits

-Source code: Include any code that is needed to produce your results. Programs:
-document all programs that need to be installed to run your code. Automate the
-installation as much as possible with environments.
+The template for this project is from
+[econ-project-templates](https://github.com/OpenSourceEconomics/econ-project-templates).

-Raw data and source code are under version control. Published results are created
-from the main branch with no uncommitted changes. Use tags / releases to mark
-submissions, revisions, etc.
+## Contributors

-There is a README file that documents your directory structure, how to install
-packages, and how to run your code. Docstrings and comments explain the code where
-necessary.
+@MarenBermudezBoeckle
inst/WORDLIST

Lines changed: 1 addition & 0 deletions

@@ -1,4 +1,5 @@
 "base-usuario-20-enusc-2023.sav"
+fo
 aa
 aA
 AAt

to_dos.md

Lines changed: 1 addition & 2 deletions

@@ -2,5 +2,4 @@
 - eliminate pytoml that we want
 - create version and eliminate documents later for final version
 - complete README file
-- run again at the end to see whether the documents issue is a problem in my case
-- maybe analysis part with pkl put as pyarrow final type of data?
+- pin the version at the end, with the year of the data and that it is for EPP
