Feature/25 integrate notebooks into docu (#149)
* Update dependencies, pre-commit, and add pandoc (instructions) #25

* Update docu creation and deployment workflow for nbsphinx #25

* Update headings for compatibility with nbsphinx/ docu rendering #25

* Incorporate coderabbit suggestions #25

* Update dependencies to fix safety issues for jupyterlab and notebook (not jinja) #25

* Rename notebooks, clean-up headings #25

* Attempt to fix case-sensitivity problem by renaming twice #25

* Attempt to fix case-sensitivity problem by renaming twice #25
MarcoHuebner authored and pmayd committed Oct 21, 2024
1 parent 88a75f1 commit fbb870a
Showing 13 changed files with 1,516 additions and 1,232 deletions.
13 changes: 13 additions & 0 deletions .github/workflows/deploy-docs.yaml
@@ -27,13 +27,26 @@ jobs:
uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}
- name: Install Pandoc
run: |
sudo apt-get update
sudo apt-get install -y pandoc
- name: Run poetry image
uses: abatilo/[email protected]
with:
poetry-version: ${{ vars.POETRY_VERSION }}
- name: Install dependencies
run: |
poetry install --with dev
- name: Remove existing nb directory
run: |
if [ -d "docs/source/nb" ]; then
rm -rf docs/source/nb
fi
- name: Copy Notebook to docs
run: |
mkdir -p docs/source/nb
cp -r nb/. docs/source/nb/
- name: Build docs
run: |
cd docs
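
For local builds, the two workflow steps above ("Remove existing nb directory" and "Copy Notebook to docs") can be approximated in Python. This is a minimal sketch, not part of the commit: `sync_notebooks` is a hypothetical helper name, and the paths are taken from the workflow on the assumption that it runs from the repository root.

```python
import shutil
from pathlib import Path


def sync_notebooks(src: Path, dst: Path) -> None:
    """Replace dst with a fresh copy of src (hypothetical helper, not in the repo)."""
    # Mirrors `rm -rf docs/source/nb`: drop stale copies, e.g. notebooks
    # that were renamed or deleted since the last build.
    if dst.is_dir():
        shutil.rmtree(dst)
    # Mirrors `mkdir -p docs/source/nb` followed by `cp -r nb/. docs/source/nb/`;
    # copytree creates dst and any missing parent directories itself.
    shutil.copytree(src, dst)
```

Calling `sync_notebooks(Path("nb"), Path("docs/source/nb"))` before `make html` would then give Sphinx the same inputs as the CI job, under the assumptions stated above.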
16 changes: 0 additions & 16 deletions .pre-commit-config.yaml
@@ -28,22 +28,6 @@ repos:
files: ^\.github/workflows/
types: [yaml]
args: ["--schemafile", "https://json.schemastore.org/github-workflow"]
- repo: https://github.com/thclark/pre-commit-sphinx
rev: 0.0.1
hooks:
- id: build-docs
name: "Check if documentation compiles"
args:
[
"--cache-dir",
"docs/build/doctrees",
"--html-dir",
"docs/build/html",
"--source-dir",
"docs/source",
]
language_version: python3
additional_dependencies: [myst-parser]
- repo: https://github.com/Lucas-C/pre-commit-hooks-safety
rev: v1.3.3
hooks:
2 changes: 1 addition & 1 deletion README.md
@@ -195,7 +195,7 @@ To learn more about `poetry`, see [Dependency Management With Python Poetry](htt

### Documentation process

Documentation can also be built locally by running
Documentation can also be built locally by ensuring that [pandoc is installed](https://pandoc.org/installing.html), e.g. via `conda install pandoc`, and then running

```bash
cd docs && make clean && make html
2 changes: 2 additions & 0 deletions docs/source/conf.py
@@ -28,6 +28,7 @@
# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration

extensions = [
"nbsphinx",
"myst_parser",
"sphinx.ext.autodoc",
"sphinx.ext.autosummary", # used to generate overview tables
@@ -36,6 +37,7 @@

templates_path = ["_templates"]
exclude_patterns = []
nbsphinx_execute = "never"


# -- Options for HTML output -------------------------------------------------
11 changes: 11 additions & 0 deletions docs/source/index.rst
@@ -22,6 +22,17 @@ pystatis
contribute
license

.. toctree::
:maxdepth: 2
:caption: Notebooks

nb/00_setup
nb/01_table
nb/02_geo_visualization_int_students_germany
nb/03_find
nb/04_jobs
nb/05_presentation

.. toctree::
:maxdepth: 2
:caption: Modules
2 changes: 1 addition & 1 deletion nb/00_Setup.ipynb → nb/00_setup.ipynb
@@ -155,7 +155,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.6"
"version": "3.11.9"
}
},
"nbformat": 4,
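
The commit message mentions "renaming twice" to work around a case-sensitivity problem, but the commands used are not shown. The following is a sketch of one common approach, assuming the issue is a filesystem that treats `00_Setup.ipynb` and `00_setup.ipynb` as the same name: rename to an intermediate name first, then to the final spelling. `rename_case_only` and the `.tmp-rename` suffix are hypothetical.

```python
import os
from pathlib import Path


def rename_case_only(path: Path, new_name: str) -> Path:
    """Rename path to new_name via a temporary name (hypothetical helper)."""
    # Assumed-unused intermediate name in the same directory.
    tmp = path.with_name(path.name + ".tmp-rename")
    target = path.with_name(new_name)
    os.rename(path, tmp)    # step 1: move away from the old spelling
    os.rename(tmp, target)  # step 2: move to the new spelling
    return target
```

In a Git working tree the same two steps would typically be done with `git mv` so that the rename is recorded in the index.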
4 changes: 2 additions & 2 deletions nb/01_table.ipynb
@@ -21,7 +21,7 @@
"id": "8e14f4db",
"metadata": {},
"source": [
"# The `Table` class\n",
"# The `Table` Class\n",
"\n",
"The `Table` class in `pystatis` is the main interface for users to interact with the different databases and download the data/tables in the form of `pandas` `DataFrames`."
]
@@ -2388,7 +2388,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.4"
"version": "3.11.9"
},
"vscode": {
"interpreter": {
@@ -4,11 +4,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Geo-Visualization Example Notebook\n",
"Geovisualization of Data on the Map of Germany\n",
"==============================================\n",
"\n",
"Welcome to the `Geo-Visualization Example` notebook! This notebook is designed to guide you through the process of visualizing geographical data from the Regionalstatistik database using Python and pystatis as API wrapper.\n",
"\n",
"## Libraries Overview\n",
"Libraries Overview\n",
"------------------\n",
"\n",
"In this notebook, we will require the following additional libraries:\n",
"\n",
"- GeoPandas: An open-source project that makes working with geospatial data in python easier. It extends the datatypes used by pandas to allow spatial operations on geometric types. GeoPandas enables us to work with geospatial data in Python similarly to how we work with pandas for regular data.\n",
@@ -35,7 +38,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Import Required Libraries"
"### Import Required Libraries\n",
"~~~~~~~~~~~~~~~~~~~~~~~~~~~~~"
]
},
{
@@ -56,7 +60,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Visualization on the Level of Bundesländer\n",
"Visualization on the Level of Bundesländer\n",
"------------------------------------------\n",
"\n",
"In this first example, we will visualize the ratio of international students among students on the level of the Bundesländer. We will use the table with code `21311-01-01-4` from the Regionalstatistik API for the student data and `12411-01-01-4` for the population data. You can find the data either by searching on the website or by using the `Find` class, which we also provide in `pystatis`, to skim through the available data.\n",
"\n",
@@ -245,7 +250,13 @@
"metadata": {},
"source": [
"### Load Regionalstatistik Data\n",
"\n",
"~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To now fill the map with our tables of interest, we need to query the data from the Regionalstatistik API. We will use the `pystatis` library, more specifically the `Table` class, to query the data."
]
},
@@ -293,7 +304,13 @@
"metadata": {},
"source": [
"### Process Students Data\n",
"\n",
"~~~~~~~~~~~~~~~~~~~~~~~~~"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To determine the ratio of international students among students per year and region we need to first filter the data for the relevant columns. We will then merge the two tables and calculate the ratio of international students among students."
]
},
@@ -436,10 +453,21 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The ratio is now calculated and grouped by `Kreise and kreisfreie Städte` (districts and urban districts) as well as further parameters and can be visualized on the map of Germany. The missing data for `Aachen, Kreis` will be discussed later.\n",
"\n",
"The ratio is now calculated and grouped by `Kreise and kreisfreie Städte` (districts and urban districts) as well as further parameters and can be visualized on the map of Germany. The missing data for `Aachen, Kreis` will be discussed later."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Plot the Development of International Student Ratio for All Bundesländer\n",
"\n",
"~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before we do this, we will first convert the grouped data into a DataFrame and re-sort the data by year to have a look at the time development for individual Bundesländer first. Lastly, we merge the DataFrame with international student ratios with the geopandas DataFrame to visualize the data on the map."
]
},
@@ -648,7 +676,13 @@
"metadata": {},
"source": [
"### Plot the Development of International Student Ratio on the Map of Germany\n",
"\n",
"~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As mentioned before, we now merge the DataFrame of international student ratios with the geopandas DataFrame to visualize the data on the map of Germany."
]
},
@@ -725,7 +759,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Visualization on the Level of Landkreise\n",
"Visualization on the Level of Landkreise\n",
"----------------------------------------\n",
"\n",
"In this second example, we will visualize the ratio of international students among students on the level of individual Landkreise. For this, we additionally need to load the map of Germany which outlines the individual Landkreise."
]
@@ -846,7 +881,13 @@
"metadata": {},
"source": [
"### Process Students Data\n",
"\n",
"~~~~~~~~~~~~~~~~~~~~~~~~~"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we can re-determine the ratio of international students among students per year and region. We will first look again at specific regions to see the time development of the ratio of international students among students before we then merge the DataFrame with the geopandas DataFrame to visualize the data on the map."
]
},
@@ -1021,7 +1062,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Plot the Development of International Student Ratio for Köln and Aachen"
"### Plot the Development of International Student Ratio for Köln and Aachen\n",
"~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~"
]
},
{
@@ -1252,15 +1294,27 @@
"Looking at the time development of the ratio of international students among students for specific regions shows a continuous increase in the ratio, although at different rates. However, it also shows that, for example, no data is available for `Aachen, Kreis` for the years in question.\n",
"\n",
"### Investigating Missing Data and the Data Quality Parameter\n",
"\n",
"~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A more detailed investigation of the `quality` parameter of the data would be necessary to potentially determine the reason for the missing data. While this is in principle supported by the API via `quality=\"on\"`, Regionalstatistik is the only one of the three GENESIS databases that does not actively support this. As a workaround, the website can be used to determine potential quality parameters of the data.\n",
"\n",
"Looking at the data on the [website](https://www.regionalstatistik.de/genesis/online?operation=ergebnistabelleUmfang&levelindex=3&levelid=1719518083070&downloadname=21311-01-01-4#abreadcrumb) reveals that there are indeed no values for `Aachen, Kreis` (more specifically, \"-\" means \"nichts vorhanden\"), while the data for `Aachen, kreisfreie Stadt` is unknown or to be kept secret (\".\" means \"Zahlenwert unbekannt oder geheimzuhalten\").\n",
"\n",
"(Explanation of legend [here](https://www.regionalstatistik.de/genesis/online?operation=ergebnistabelleQualitaet&language=de&levelindex=3&levelid=1719518083070#abreadcrumb))\n",
"\n",
"### Plot the Development of International Student Ratio on the Map of Germany With Finer Granularity\n",
"\n",
"~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As before we merge the DataFrame with international student ratios with the geopandas DataFrame to visualize the data on the map of Germany. However, this time we will visualize the data on the level of individual Landkreise."
]
},
@@ -1379,7 +1433,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.6"
"version": "3.11.9"
}
},
"nbformat": 4,
3 changes: 2 additions & 1 deletion nb/03_find.ipynb
@@ -4,7 +4,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Finding the right tables\n",
"Finding the Right Tables\n",
"========================\n",
"\n",
"Suppose you want to search for interesting tables about a certain project. `pystatis` offers the `Find` class to search for any piece of information with GENESIS. Behind the scenes, it uses the `find` endpoint."
]
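
Several notebooks in this commit replace `# Title` headings with a title line followed by an underline of the same length. The diff does not say how the underlines were produced; the sketch below shows one way to generate them consistently. `underlined_heading` is a hypothetical helper, and matching the underline length to the title is a convention assumed here from the lines shown in the diff.

```python
def underlined_heading(title: str, char: str = "=") -> str:
    """Return the title followed by an underline of equal length (hypothetical helper)."""
    if len(char) != 1:
        raise ValueError("char must be a single character")
    # One underline character per title character, as in the headings above.
    return f"{title}\n{char * len(title)}"


# Example: the heading introduced in nb/03_find.ipynb.
print(underlined_heading("Finding the Right Tables"))
```

Passing `char="-"` gives the second-level form used for headings such as "Libraries Overview" in the geo-visualization notebook.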
3 changes: 2 additions & 1 deletion nb/04_jobs.ipynb
@@ -8,7 +8,8 @@
}
},
"source": [
"# Jobs\n",
"Jobs: Loading Large Tables\n",
"==========================\n",
"\n",
"Some tables from the database are quite large and the API provides them in a different way:\n",
"1. The standard request is rejected with code 98\n",