Commit ff9aec6

Add docstrings.
1 parent 4c0d2a2 commit ff9aec6

16 files changed: +226 -130 lines changed

.DS_Store

-2 KB
Binary file not shown.

README.md

Lines changed: 12 additions & 15 deletions
@@ -4,16 +4,16 @@
 
 # Description
 
-- This project uses the data from the **National Urban Citizen Security Survey**
-  (Encuesta Nacional Urbana de Seguridad Ciudadana).
+- This project uses the data from the **National Urban Citizen Security Survey** from
+  Chile (Encuesta Nacional Urbana de Seguridad Ciudadana).
 - The data is cleaned (data management part) and then analysed (analysis and final
   part).
 
 # Objectives
 
 The primary goal of this analysis is to study:
 
-1. The **perception of insecurity** among the population.
+1. The **perception of insecurity** among the Chilean population.
 1. Perception based on municipalities and socioeconomic status.
 1. The increase in perception of insecurity at the neighborhood, country, and commune
    levels.
@@ -28,18 +28,15 @@ possible to push the raw data to github. There are two ways for doing this.
 1. Download it from
    https://www.dropbox.com/scl/fo/0oe4pz0epdx9az31s43rt/ACFL6YD4UZk6tIym7caipMU?rlkey=ds6wtw5ehatssgrkuqq29coeu&st=yw25julf&dl=0
 
-1. Download it from the source webpage: https://cead.spd.gov.cl/estudios-y-encuestas/
+1. Download it from the source webpage: https://cead.spd.gov.cl/estudios-y-encuestas/ .
    Then filter: in "Tipo Documentos" choose "Encuestas", in "Agrupacion" click "Encuesta
-   Nacional Urbana de Seguridad", and in "Año" click 2023. Then click Aplicar and search
-   for "Base de datos ENUSC 2023" and download it. Then put this file into the data
-   folder in src/project_mbb.
+   Nacional Urbana de Seguridad", and in "Año" click 2023. Then click "Aplicar" and
+   search for "Base de datos ENUSC 2023" and download it.
 
-## Programs set-up
+After completing one of these two ways, put the file into the data folder in
+src/project_mbb.
 
-To set up this project, you first need to install
-[Miniconda](https://docs.conda.io/projects/miniconda/en/latest/) and
-[Git](https://git-scm.com/downloads). Once those are installed, you can proceed with
-creating and activating the environment.
+## Programs set-up
 
 To set up this project, you first need to install
 [Miniconda](https://docs.conda.io/projects/miniconda/en/latest/) and
@@ -60,7 +57,7 @@ $ conda activate project_mbb
 
 The `src` folder contains all the source code necessary to run this project. Files that
 start with the prefix `task_` are `pytask` scripts, which execute when you run the
-following command in the console:
+following command in the console, building up the whole project:
 
 ```console
 $ pytask
@@ -90,7 +87,7 @@ The project is structured into three parts.
 1. Final Plots
 
 The results for these three parts will be found in the BLD folder after running the
-project. This folder can be safely deleted every time before running it again.
+project. This folder can be safely deleted every time before running the project again.
 
 # Cleaning Part Description
 
@@ -113,7 +110,7 @@ For the **survey data**, the following steps were taken:
 
 ### 1. Filtering, Renaming, and Mapping
 
-- The data was **filtered** to retain relevant responses.
+- The data was **filtered**.
 - Column names were **renamed** for clarity.
 - Responses that were not simple **"yes" or "no"** were **mapped** to their actual
   values from the survey.

pyproject.toml

Lines changed: 0 additions & 2 deletions
@@ -93,8 +93,6 @@ extend-ignore = [
     "RET504", # Don't force to calculate upon return
     "S101", # Use of `assert` detected.
     "S301", # pickle module is unsafe
-    "ARG001", # Unused function MB
-    "ERA001", # commented MB
     "TRY003", # Messages outside exception MB
     "D415", # First line should end with a period, question mark, or exclamation MB
 ]
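
Dropping `ARG001` and `ERA001` from `extend-ignore` turns those two Ruff checks back on. In Ruff's rule set, `ARG001` is unused-function-argument and `ERA001` is commented-out-code. A minimal sketch of what each one reports, using hypothetical function names that do not come from this repository:

```python
def scale_flagged(values, factor, mapping):
    """ARG001 would be reported here: `mapping` is never used in the body."""
    return [value * factor for value in values]


def scale_clean(values, factor):
    """Same behaviour with the unused argument removed, so ARG001 passes."""
    # ERA001 targets commented-out code, not explanatory comments like this one.
    return [value * factor for value in values]


print(scale_clean([1, 2, 3], 2))  # [2, 4, 6]
```

With the check active, an unused parameter has to be removed or used, and removing one also means updating every call that still passes it.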

src/project_mbb/analysis/desc_analysis.py

Lines changed: 31 additions & 0 deletions
@@ -8,6 +8,17 @@
 
 
 def calculate_perception_general(enusc_clean):
+    """Calculates the general perception of crime increase at different
+
+    geographic levels.
+
+    Args:
+        enusc_clean (pd.DataFrame): The cleaned ENUSC dataset.
+
+    Returns:
+        pd.DataFrame: A dataframe containing the percentage distribution of responses
+            for crime perception at the national, commune, and neighborhood levels.
+    """
     perception_columns = [
         "crime_increase_perception_nation",
         "crime_increase_perception_commune",
@@ -30,6 +41,16 @@ def calculate_perception_general(enusc_clean):
 
 
 def calculate_perception_by_commune(enusc_clean):
+    """Calculates crime perception percentages for each commune.
+
+    Args:
+        enusc_clean (pd.DataFrame): The cleaned ENUSC dataset, including a 'commune'
+            column.
+
+    Returns:
+        pd.DataFrame: A dataframe containing the percentage distribution of responses
+            for crime perception at different geographic levels, grouped by commune.
+    """
     _fail_if_no_total_communes(enusc_clean, commune_mapping)
 
     perception_columns = [
@@ -70,6 +91,16 @@ def calculate_perception_by_commune(enusc_clean):
 
 
 def calculate_perception_by_ses(enusc_clean):
+    """Calculates crime perception percentages by socioeconomic status.
+
+    Args:
+        enusc_clean (pd.DataFrame): The cleaned ENUSC dataset, including a
+            'socioecon_status' column.
+
+    Returns:
+        pd.DataFrame: A dataframe with the percentage distribution of crime perception
+            responses, grouped by socioeconomic status.
+    """
     _fail_if_ses_not_categorical(enusc_clean)
 
     perception_columns = [
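
The three docstrings above describe the same computation at different groupings: a percentage distribution of responses for each perception column. A minimal sketch, assuming pandas and using made-up response labels and commune names. `percentage_distribution` and `percentage_distribution_by_group` are hypothetical helpers, not the repository's functions:

```python
import pandas as pd

# Hypothetical miniature of the cleaned data; the real ENUSC categories differ.
enusc_clean = pd.DataFrame(
    {
        "commune": ["A", "A", "B", "B"],
        "crime_increase_perception_nation": [
            "Increased", "Increased", "Increased", "Decreased",
        ],
        "crime_increase_perception_commune": [
            "Increased", "Stayed the same", "Decreased", "Decreased",
        ],
    }
)
perception_columns = [
    "crime_increase_perception_nation",
    "crime_increase_perception_commune",
]


def percentage_distribution(data, columns):
    """Return the share (in percent) of each response, one column per question."""
    shares = {col: data[col].value_counts(normalize=True) * 100 for col in columns}
    # Responses that never occur for a question appear as NaN; report them as 0.
    return pd.DataFrame(shares).fillna(0.0)


def percentage_distribution_by_group(data, columns, group):
    """Return the same shares computed separately within each group."""
    return {
        col: pd.crosstab(data[group], data[col], normalize="index") * 100
        for col in columns
    }


overall = percentage_distribution(enusc_clean, perception_columns)
by_commune = percentage_distribution_by_group(enusc_clean, perception_columns, "commune")
print(overall)
print(by_commune["crime_increase_perception_commune"])
```

Normalising within each row of the cross-tabulation makes every commune's shares sum to 100, which keeps communes with different sample sizes comparable.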

src/project_mbb/analysis/model.py

Lines changed: 16 additions & 1 deletion
@@ -6,6 +6,17 @@
 
 
 def regression_perception_info(enusc_clean):
+    """Performs logistic regression analysis on how information sources influence
+
+    crime perception.
+
+    Args:
+        enusc_clean (pd.DataFrame): The cleaned ENUSC dataset.
+
+    Returns:
+        statsmodels.discrete.discrete_model.MNLogit: A fitted multinomial logistic
+            regression model.
+    """
     enusc_model_pre = _set_category_values(enusc_clean)
     enusc_model = _set_binary_for_info_source(enusc_model_pre)
     enusc_model_clean = _drop_missing(enusc_model)
@@ -14,6 +25,7 @@ def regression_perception_info(enusc_clean):
 
 
 def _set_category_values(enusc_clean):
+    """Encodes categorical values for perception and information source."""
     _fail_if_invalid_categories_perception(enusc_clean, perception_change_mapping)
     _fail_if_invalid_categories_source(enusc_clean, info_sources_mapping)
 
@@ -28,6 +40,7 @@ def _set_category_values(enusc_clean):
 
 
 def _set_binary_for_info_source(enusc_model):
+    """Creates a binary variable for technology-based information sources."""
     _fail_if_invalid_category_values(enusc_model, "crime_increase_perception_commune")
     _fail_if_invalid_category_values(enusc_model, "crime_info_source_commune")
 
@@ -40,6 +53,7 @@ def _set_binary_for_info_source(enusc_model):
 
 
 def _drop_missing(enusc_model):
+    """Removes rows with missing values in relevant columns."""
     _fail_if_invalid_tech_based_values(enusc_model)
 
     enusc_model_clean = enusc_model[
@@ -50,6 +64,7 @@ def _drop_missing(enusc_model):
 
 
 def _run_logistic_regression(enusc_model_clean):
+    """Fits a multinomial logistic regression model."""
     _fail_if_missing_values_after_drop(enusc_model_clean)
 
     x = enusc_model_clean[["tech_based"]]
@@ -97,7 +112,7 @@ def _fail_if_invalid_categories_perception(enusc_clean, perception_change_mappin
         raise ValueError(error_msg)
 
 
-def _fail_if_invalid_categories_source(enusc_clean, info_source_mapping):
+def _fail_if_invalid_categories_source(enusc_clean):
     """Raises ValueError if the categories in 'crime_info_source_commune'
 
     are missing.
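
The helper docstrings in this file describe two preparation steps before the model is fitted: deriving a binary `tech_based` regressor from the information source, then dropping rows with missing values. A minimal sketch of those two steps, assuming pandas. Which sources count as technology-based is an assumption made for this example, and the helper names are hypothetical stand-ins for the private functions:

```python
import pandas as pd

# Hypothetical labels: the set of technology-based sources is assumed here.
TECH_SOURCES = {"Social media", "Online news"}

enusc_model = pd.DataFrame(
    {
        "crime_increase_perception_commune": ["Increased", "Decreased", None, "Increased"],
        "crime_info_source_commune": ["Social media", "Television", "Online news", None],
    }
)


def set_binary_for_info_source(data):
    """Add `tech_based`: 1 for technology-based sources, 0 otherwise, NA if unknown."""
    out = data.copy()
    source = out["crime_info_source_commune"]
    # Keep missing sources as NA instead of silently coding them as 0.
    out["tech_based"] = source.isin(TECH_SOURCES).astype("Int64").where(source.notna())
    return out


def drop_missing(data):
    """Keep only rows where both the outcome and the regressor are observed."""
    return data.dropna(subset=["crime_increase_perception_commune", "tech_based"])


model_data = drop_missing(set_binary_for_info_source(enusc_model))
print(model_data)
```

Only the first two rows survive: the third has no recorded perception and the fourth has no recorded source.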

src/project_mbb/analysis/task_analysis.py

Lines changed: 37 additions & 0 deletions
@@ -17,6 +17,15 @@ def task_perception_general(
     enusc_clean=BLD / "data" / "enusc_clean.pkl",
     produces=BLD / "analysis" / "perception_general.arrow",
 ):
+    """Computes general crime perception statistics and saves results.
+
+    Args:
+        enusc_clean (str): Path to the cleaned ENUSC dataset (pickle file).
+        produces (str): Path to save the perception results (Feather format).
+
+    Returns:
+        None (saves file to produces)
+    """
     enusc_clean = pd.read_pickle(enusc_clean)
     perception_results = calculate_perception_general(enusc_clean)
 
@@ -27,6 +36,15 @@ def task_perception_by_commune(
     enusc_clean=BLD / "data" / "enusc_clean.pkl",
     produces=BLD / "analysis" / "perception_by_commune.arrow",
 ):
+    """Computes crime perception statistics by commune and saves results.
+
+    Args:
+        enusc_clean (str): Path to the cleaned ENUSC dataset (pickle file).
+        produces (str): Path to save the perception results by commune (Feather format).
+
+    Returns:
+        None (saves file to produces)
+    """
     enusc_clean = pd.read_pickle(enusc_clean)
     perception_results_commune = calculate_perception_by_commune(enusc_clean)
 
@@ -37,6 +55,16 @@ def task_perception_by_ses(
     enusc_clean=BLD / "data" / "enusc_clean.pkl",
     produces=BLD / "analysis" / "perception_by_ses.arrow",
 ):
+    """Computes crime perception statistics by socioeconomic status and saves results.
+
+    Args:
+        enusc_clean (str): Path to the cleaned ENUSC dataset (pickle file).
+        produces (str): Path to save the perception results by socioeconomic status
+            (Feather format).
+
+    Returns:
+        None (saves file to produces)
+    """
     enusc_clean = pd.read_pickle(enusc_clean)
     perception_results_ses = calculate_perception_by_ses(enusc_clean)
 
@@ -47,6 +75,15 @@ def task_regression(
     enusc_clean=BLD / "data" / "enusc_clean.pkl",
     produces=BLD / "analysis" / "regression_results.txt",
 ):
+    """Performs logistic regression on crime perception and saves the model summary.
+
+    Args:
+        enusc_clean (str): Path to the cleaned ENUSC dataset (pickle file).
+        produces (str): Path to save the regression model summary (text file).
+
+    Returns:
+        None (saves file to produces)
+    """
     enusc_clean = pd.read_pickle(enusc_clean)
     reg_result = regression_perception_info(enusc_clean)
     with produces.open("w") as f:
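
Each task above follows one pattern: the default of `enusc_clean` names the input file, the default of `produces` names the output, and the body reads, computes, and writes. The docstrings annotate both arguments as `str`, but the defaults are built with `BLD / ...` and the last task calls `produces.open("w")`, which suggests `pathlib.Path` objects. A minimal sketch of the pattern under that assumption. The `BLD` directory, the task name, and the computed statistic are hypothetical:

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical build directory; the project defines its own BLD constant.
BLD = Path(tempfile.mkdtemp())


def task_response_shares(
    enusc_clean: Path = BLD / "data" / "enusc_clean.pkl",
    produces: Path = BLD / "analysis" / "response_shares.txt",
) -> None:
    """Read the cleaned data, compute response shares, and write them as text."""
    data = pd.read_pickle(enusc_clean)
    shares = data["crime_increase_perception_commune"].value_counts(normalize=True)
    produces.parent.mkdir(parents=True, exist_ok=True)
    with produces.open("w") as f:
        f.write(shares.to_string())


# Calling the function directly stands in for a `pytask` run; the input file
# that an upstream task would normally produce is created by hand here.
input_path = BLD / "data" / "enusc_clean.pkl"
input_path.parent.mkdir(parents=True, exist_ok=True)
pd.DataFrame(
    {"crime_increase_perception_commune": ["Increased", "Increased", "Decreased"]}
).to_pickle(input_path)

task_response_shares()
print((BLD / "analysis" / "response_shares.txt").read_text())
```

Keeping the computation in a separate function, as the tasks above do with `calculate_perception_general` and the others, leaves the task body as thin file handling.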

src/project_mbb/data/.DS_Store

0 Bytes
Binary file not shown.
-157 KB
Binary file not shown.
-165 Bytes
Binary file not shown.

src/project_mbb/data_management/clean_enusc.py

Lines changed: 18 additions & 1 deletion
@@ -16,6 +16,16 @@
 
 
 def clean_enusc(raw_enusc):
+    """Cleans and preprocesses the ENUSC dataset by filtering, renaming,
+
+    mapping categories, filling missing values, and setting appropriate data types.
+
+    Args:
+        raw_enusc (pd.DataFrame): The raw ENUSC dataset.
+
+    Returns:
+        pd.DataFrame: The cleaned and processed ENUSC dataset.
+    """
     enusc_filtered = _filter_enusc(raw_enusc, relevant_var)
     enusc_renamed = _rename_enusc(enusc_filtered, rename_mapping)
     enusc_mapped = _map_categories(enusc_renamed)
@@ -25,20 +35,25 @@ def clean_enusc(raw_enusc):
 
 
 def _filter_enusc(raw_enusc, relevant_var):
+    """Filters the dataset to include only relevant variables."""
     _fail_if_not_list(relevant_var)
+
     enusc_filtered = raw_enusc[relevant_var]
     return enusc_filtered
 
 
 def _rename_enusc(enusc_filtered, rename_mapping):
+    """Renames columns in the dataset based on the given rename_mapping."""
     _fail_if_not_equal_length(enusc_filtered, rename_mapping)
+
     enusc_renamed = enusc_filtered.copy()
     enusc_renamed.columns = enusc_renamed.columns.str.lower()
     enusc_renamed = enusc_renamed.rename(columns=rename_mapping)
     return enusc_renamed
 
 
-def _map_categories(enusc_renamed):
+def _map_categories(enusc_renamed, map_category):
+    """Maps categorical values to their corresponding labels in map_category."""
     enusc_mapped = enusc_renamed.copy()
     for key, value in map_category.items():
         if key in enusc_mapped.columns:
@@ -53,6 +68,7 @@ def _map_categories(enusc_renamed):
 
 
 def _fill_missing(enusc_mapped):
+    """Handles missing values by replacing codes with values in replacements."""
     _fail_if_not_dataframe(enusc_mapped)
     _fail_if_missing_columns(enusc_mapped, categories, "categories")
     _fail_if_missing_columns(enusc_mapped, map_category, "map_category")
@@ -88,6 +104,7 @@ def _fill_missing(enusc_mapped):
 
 
 def _set_data_types_not_mapped_var(enusc_filled):
+    """Sets appropriate data types for numeric, categorical, and string variables."""
     _fail_if_columns_not_found(enusc_filled, floats)
     _fail_if_columns_not_found(enusc_filled, integers)
     _fail_if_columns_not_found(enusc_filled, categories)
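
This diff adds a `map_category` parameter to `_map_categories`, while the call visible in `clean_enusc` (`_map_categories(enusc_renamed)`) still passes a single argument. A minimal sketch of the mapping step with the mapping passed explicitly, assuming pandas. The loop header follows the hunk above, but the loop body (`Series.map`) and the codes and labels are assumptions, since the diff does not show them:

```python
import pandas as pd

# Hypothetical codes and labels; the project's real `map_category` is not shown.
map_category = {
    "crime_increase_perception_commune": {
        1: "Increased",
        2: "Stayed the same",
        3: "Decreased",
    },
    "not_in_data": {1: "Yes", 2: "No"},
}


def map_categories(enusc_renamed, map_category):
    """Replace numeric codes with labels for every mapped column that exists."""
    enusc_mapped = enusc_renamed.copy()
    for key, value in map_category.items():
        if key in enusc_mapped.columns:
            # Assumed loop body: codes without a label become NaN.
            enusc_mapped[key] = enusc_mapped[key].map(value)
    return enusc_mapped


enusc_renamed = pd.DataFrame({"crime_increase_perception_commune": [1, 3, 2, 88]})
# The mapping is passed explicitly, matching the two-parameter signature.
enusc_mapped = map_categories(enusc_renamed, map_category)
print(enusc_mapped)
```

Passing the mapping as an argument keeps the helper testable with small dictionaries like the one above, provided the caller supplies it.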
