Merge pull request #49 from dccuchile/fix/benchmark_doc

pbadillatorrealba · web-flow · commit bebc71be4716 · 2023-05-05T11:16:44.000-04:00
Fix/benchmark doc
diff --git a/.readthedocs.yaml b/.readthedocs.yaml
@@ -4,11 +4,16 @@ formats:
   - epub
   - pdf
 
+build:
+  os: ubuntu-22.04
+  tools:
+    python: "3.11"
+
+
 sphinx:
   configuration: docs/conf.py
 
 python:
-  version: "3.7"
   install:
     - requirements: requirements.txt
     - requirements: requirements-dev.txt
diff --git a/docs/benchmark/benchmark.rst b/docs/benchmark/benchmark.rst
@@ -519,8 +519,8 @@ supporting the same number of number of word sets).
 
 
 
-2. Fair Embedding Engine
-~~~~~~~~~~~~~~~~~~~~~~~~
+Fair Embedding Engine
+~~~~~~~~~~~~~~~~~~~~~
 
 In the case of Fair Embedding Engine, the WE model is passed in the
 metric instantiation. Then, the output value of the metric is computed
@@ -833,8 +833,8 @@ family vs. career).
         "relatives",
     ]
 
-1. WEFE
-~~~~~~~
+WEFE
+~~~~
 
 WEFE defines a standardized framework for executing bias mitigation
 algorithms based on the scikit-learn fit transform interface.
@@ -968,8 +968,8 @@ methods implemented in the library.
     Repulsion Attraction Neutralization debiased model WEAT evaluation:  0.26007230998948216
 
 
-1. Fair Embedding Engine
-~~~~~~~~~~~~~~~~~~~~~~~~
+Fair Embedding Engine
+~~~~~~~~~~~~~~~~~~~~~
 
 The Fair Embedding Engine (FEE) requires the embedding model to be
 passed during instantiation of the algorithm. It currently does not
@@ -1042,8 +1042,8 @@ interface
 
 
 
-1. Responsibly
-~~~~~~~~~~~~~~
+Responsibly
+~~~~~~~~~~~
 
 In Responsibly the embedding model is provided during the instantiation
 of the ``GenderBiasWe`` class. Definitional pairs cannot be provided by
@@ -1063,8 +1063,8 @@ such as ``twitter-25``.
     gender_bias_we = GenderBiasWE(word2vec)  # instance the GenderBiasWE
     gender_bias_we.debias(neutral_words=targets)  # apply the debias
 
-4. EmbeddingBiasScore
-~~~~~~~~~~~~~~~~~~~~~
+EmbeddingBiasScore
+~~~~~~~~~~~~~~~~~~
 
 The library does not implement mitigation methods, so it is not included
 in this comparison.
@@ -1111,15 +1111,13 @@ SAME             ✖    ✖   ✖           ✔
 Generalized WEAT ✖    ✖   ✖           ✔
 ================ ==== === =========== ===================
 
-The table exclusively focuses on metrics that directly compute from word
-embeddings (WE) using predefined word sets. As a result, it omits
-metrics that are not compatible with the wefe interface such as:
+The table exclusively focuses on metrics that directly compute from word embeddings
+(WE) using predefined word sets. As a result, it omits the following metrics:
 
--  IndirectBias, a metric that accepts as input only two words and the
-   gender direction, previously calculated in a distinct operation.
--  GIPE, PMN, and Proximity Bias, which evaluate WE models before and
-   after debiasing with auxiliary mitigation methods.
--  SemBias, which is an analogy evaluation dataset.
+- IndirectBias, a metric that accepts as input only two words and the gender
+  direction, previously calculated in a distinct operation.
+- GIPE, PMN, and Proximity Bias, which evaluate WE models before and after debiasing
+  with auxiliary mitigation methods.
 
 Mitigation algorithms
 ~~~~~~~~~~~~~~~~~~~~~
@@ -1140,47 +1138,15 @@ Conclusion
 The following table summarizes the main differences between the
 libraries analyzed in this benchmark study.
 
-+-------------+-----------+--------------------+------------+---------+
-|             | WEFE      | FEE                | Responsibl | Embeddi |
-|             |           |                    | y          | ngBiasS |
-|             |           |                    |            | cores   |
-+=============+===========+====================+============+=========+
-| Implemented | 7         | 7                  | 3          | 6       |
-| Metrics     |           |                    |            |         |
-+-------------+-----------+--------------------+------------+---------+
-| Implemented | 5         | 3                  | 1          | 0       |
-| Mitigation  |           |                    |            |         |
-| Algorithms  |           |                    |            |         |
-+-------------+-----------+--------------------+------------+---------+
-| Extensible  | Easy      | Easy               | Difficult, | Easy    |
-|             |           |                    | not very   |         |
-|             |           |                    | modular.   |         |
-+-------------+-----------+--------------------+------------+---------+
-| Well-define | ✔         | ✖                  | ✖          | ✔       |
-| d           |           |                    |            |         |
-| interface   |           |                    |            |         |
-| for metrics |           |                    |            |         |
-+-------------+-----------+--------------------+------------+---------+
-| Well-define | ✔         | ✖                  | ✖          | ✖       |
-| d           |           |                    |            |         |
-| interface   |           |                    |            |         |
-| for         |           |                    |            |         |
-| mitigation  |           |                    |            |         |
-| algorithms  |           |                    |            |         |
-+-------------+-----------+--------------------+------------+---------+
-| Lastest     | January   | October 2020       | April 2021 | April   |
-| update      | 2023      |                    |            | 2023    |
-+-------------+-----------+--------------------+------------+---------+
-| Installatio | Easy: pip | No instructions.   | Only with  | Only    |
-| n           | or conda  | It can be          | pip.       | from    |
-|             |           | installed from the | Presents   | the     |
-|             |           | repository         | problems   | reposit |
-|             |           |                    |            | ory     |
-+-------------+-----------+--------------------+------------+---------+
-| Documentati | Extensive | Almost no          | Limited    | No      |
-| on          | documenta | documentation      | documentat | documen |
-|             | tion      |                    | ion        | tation, |
-|             | with      |                    | with some  | only    |
-|             | examples  |                    | examples   | example |
-|             |           |                    |            | s.      |
-+-------------+-----------+--------------------+------------+---------+
+ ==================================================== ========================================= ========================================================== ========================================== ====================================
+ Item                                                 WEFE                                      FEE                                                        Responsibly                                EmbeddingBiasScores
+ ==================================================== ========================================= ========================================================== ========================================== ====================================
+  Implemented   Metrics                                7                                         7                                                          3                                          6
+  Implemented   Mitigation Algorithms                  5                                         3                                                          1                                          0
+  Extensible                                           Easy                                      Easy                                                       Difficult,   not very modular.             Easy
+  Well-defined   interface for metrics                 ✔                                         ✖                                                          ✖                                          ✔
+  Well-defined   interface for mitigation algorithms   ✔                                         ✖                                                          ✖                                          ✖
+  Latest update                                        January 2023                              October 2020                                               April 2021                                 April 2023
+  Installation                                         Easy:   pip or conda                      No instructions. It can be installed from the repository   Only   with pip. Presents problems         Only   from the repository
+  Documentation                                        Extensive   documentation with examples   Almost   no documentation                                  Limited documentation with some examples   No   documentation, only examples.
+ ==================================================== ========================================= ========================================================== ========================================== ====================================
diff --git a/examples/benchmark.ipynb b/examples/benchmark.ipynb
@@ -1491,6 +1491,7 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
@@ -1513,11 +1514,13 @@
     "| SAME             | ✖    | ✖   | ✖           | ✔                   |\n",
     "| Generalized WEAT | ✖    | ✖   | ✖           | ✔                   |\n",
     "\n",
-    "The table exclusively focuses on metrics that directly compute from word embeddings (WE) using predefined word sets. As a result, it omits metrics that are not compatible with the wefe interface such as: \n",
+    "The table exclusively focuses on metrics that directly compute from word embeddings\n",
+    "(WE) using predefined word sets. As a result, it omits the following metrics:\n",
     "\n",
-    "- IndirectBias, a metric that accepts as input only two words and the gender direction, previously calculated in a distinct operation.\n",
-    "- GIPE, PMN, and Proximity Bias, which evaluate WE models before and after debiasing with auxiliary mitigation methods.\n",
-    "- SemBias, which is an analogy evaluation dataset."
+    "- IndirectBias, a metric that accepts as input only two words and the gender\n",
+    "  direction, previously calculated in a distinct operation.\n",
+    "- GIPE, PMN, and Proximity Bias, which evaluate WE models before and after debiasing\n",
+    "  with auxiliary mitigation methods."
    ]
   },
   {
diff --git a/requirements-dev.txt b/requirements-dev.txt
@@ -1,16 +1,17 @@
 pytest>=7.0.0
 pytest-cov==3.0.0
-coverage==6.4.2
+coverage==7.2.5
 # flake8==5.0.4
-black==22.6.0
-isort==5.10.1
-mypy==0.812
-Sphinx==5.0.2
-sphinx-gallery==0.11.1
-sphinx-rtd-theme==1.0.0
-sphinx-copybutton==0.5.0
+urllib3==1.26.15
+black==23.3.0
+isort==5.11.5
+mypy==1.2.0
+Sphinx==5.3.0
+sphinx-gallery==0.13.0
+sphinx-rtd-theme==1.2.0
+sphinx-copybutton==0.5.2
 numpydoc==1.5.0
-docutils==0.16
+docutils==0.18
 torch==1.13.1
 ipython==7.34.0
-ruff==0.0.194
+ruff==0.0.264