
Commit 5a01869

update all pre-commit hooks (#167)
* update all pre-commit hooks via autoupdate
* switch back to revision 06907d0 for docformatter
1 parent 9d8030d commit 5a01869

File tree

11 files changed: +63 additions, −65 deletions


.pre-commit-config.yaml

Lines changed: 9 additions & 9 deletions
@@ -5,7 +5,7 @@ exclude: '^tests/fixtures/.*|^data/.*'
 
 repos:
   - repo: https://github.com/pre-commit/pre-commit-hooks
-    rev: v4.4.0
+    rev: v5.0.0
     hooks:
       # list of supported hooks: https://pre-commit.com/hooks.html
       - id: trailing-whitespace
@@ -23,21 +23,21 @@ repos:
 
   # python code formatting
   - repo: https://github.com/psf/black
-    rev: 23.7.0
+    rev: 24.10.0
     hooks:
       - id: black
         args: [--line-length, "99"]
 
   # python import sorting
   - repo: https://github.com/PyCQA/isort
-    rev: 5.12.0
+    rev: 5.13.2
     hooks:
       - id: isort
         args: ["--profile", "black", "--filter-files"]
 
   # python upgrading syntax to newer version
   - repo: https://github.com/asottile/pyupgrade
-    rev: v3.9.0
+    rev: v3.19.1
     hooks:
       - id: pyupgrade
         args: [--py38-plus]
@@ -53,10 +53,10 @@ repos:
 
   # python check (PEP8), programming errors and code complexity
   - repo: https://github.com/PyCQA/flake8
-    rev: 6.0.0
+    rev: 7.1.1
     hooks:
       - id: flake8
-        args: ["--ignore", "E501,F401,F841,W503,E203", "--extend-select", "W504", "--exclude", "logs/*"]
+        args: ["--ignore", "E501,F401,F841,W503,E203,E704", "--extend-select", "W504", "--exclude", "logs/*"]
 
   # python security linter
   # - repo: https://github.com/PyCQA/bandit
@@ -68,7 +68,7 @@ repos:
 
   # md formatting
   - repo: https://github.com/executablebooks/mdformat
-    rev: 0.7.16
+    rev: 0.7.21
     hooks:
       - id: mdformat
         args: ["--number"]
@@ -81,7 +81,7 @@ repos:
 
   # word spelling linter
   - repo: https://github.com/codespell-project/codespell
-    rev: v2.2.5
+    rev: v2.3.0
     hooks:
       - id: codespell
         args:
@@ -92,7 +92,7 @@ repos:
 
   # python static type checking
   - repo: https://github.com/pre-commit/mirrors-mypy
-    rev: v1.4.1
+    rev: v1.14.1
     hooks:
       - id: mypy
         files: src
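The only non-version change above is the new `E704` entry in the flake8 ignore list, and it goes hand in hand with the black bump: black 24's stable style collapses `...`-only stub bodies onto the `def` line (visible in the `builder.py` and `dataset.py` hunks below), which pycodestyle would flag as E704 ("statement on same line as def"). A minimal sketch of the pattern, with hypothetical names:

```python
from typing import Union, overload


# black 24 formats `...`-only overload stubs on one line; E704 has to be
# ignored so flake8 accepts this style.
@overload
def as_id(value: str) -> int: ...
@overload
def as_id(value: float) -> int: ...


def as_id(value: Union[str, float]) -> int:
    # Single runtime implementation behind the typed overload stubs above.
    return int(value)
```

Running `pre-commit autoupdate` bumped the `rev` fields; the `E704` addition is presumably the manual follow-up needed to keep flake8 and the new black style agreeing.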

dataset_builders/pie/aae2/README.md

Lines changed: 8 additions & 8 deletions
@@ -53,11 +53,11 @@ See [PIE-Brat Data Schema](https://huggingface.co/datasets/pie/brat#data-schema)
 
 ### Data Splits
 
-| Statistics | Train | Test |
-| ---------------------------------------------------------------- | -------------------------: | -----------------------: |
-| No. of document | 322 | 80 |
-| Components <br/>- `MajorClaim`<br/>- `Claim`<br/>- `Premise` | <br/>598<br/>1202<br/>3023 | <br/>153<br/>304<br/>809 |
-| Relations\*<br/>- `supports`<br/>- `attacks` | <br/>3820<br/>405 | <br/>1021<br/>92 |
+| Statistics                                                   | Train                      | Test                     |
+| ------------------------------------------------------------ | -------------------------: | -----------------------: |
+| No. of document                                              | 322                        | 80                       |
+| Components <br/>- `MajorClaim`<br/>- `Claim`<br/>- `Premise` | <br/>598<br/>1202<br/>3023 | <br/>153<br/>304<br/>809 |
+| Relations\*<br/>- `supports`<br/>- `attacks`                 | <br/>3820<br/>405          | <br/>1021<br/>92         |
 
 \* included all relations between claims and premises and all claim attributions.
 
@@ -90,7 +90,7 @@ See further statistics in Stab & Gurevych (2017), p. 650, Table A.1.
 
 See further description in Stab & Gurevych 2017, p.627 and the [annotation guideline](https://github.com/ArneBinder/pie-datasets/blob/db94035602610cefca2b1678aa2fe4455c96155d/data/datasets/ArgumentAnnotatedEssays-2.0/guideline.pdf).
 
-**Note that** relations between `MajorClaim` and `Claim` were not annotated; however, each claim is annotated with an `Attribute` annotation with value `for` or `against` - which indicates the relation between itself and `MajorClaim`. In addition, when two non-related `Claim` 's appear in one paragraph, there is also no relations to one another. An example of a document is shown here below.
+**Note that** relations between `MajorClaim` and `Claim` were not annotated; however, each claim is annotated with an `Attribute` annotation with value `for` or `against` - which indicates the relation between itself and `MajorClaim`. In addition, when two non-related `Claim` 's appear in one paragraph, there is also no relations to one another. An example of a document is shown here below.
 
 #### Example
 
@@ -351,7 +351,7 @@ Three non-native speakers; one of the three being an expert annotator.
 
 ### Social Impact of Dataset
 
-"\[Computational Argumentation\] have
+"[Computational Argumentation] have
 broad application potential in various areas such as legal decision support (Mochales-Palau and Moens 2009), information retrieval (Carstens and Toni 2015), policy making (Sardianos et al. 2015), and debating technologies (Levy et al. 2014; Rinott et al.
 2015)." (p. 619)
 
@@ -366,7 +366,7 @@ The relations between claims and major claims are not explicitly annotated.
 "The proportion of non-argumentative text amounts to 47,474 tokens (32.2%) and
 1,631 sentences (22.9%). The number of sentences with several argument components
 is 583, of which 302 include several components with different types (e.g., a claim followed by premise)...
-\[T\]he identification of argument components requires the
+[T]he identification of argument components requires the
 separation of argumentative from non-argumentative text units and the recognition of
 component boundaries at the token level...The proportion of paragraphs with unlinked
 argument components (e.g., unsupported claims without incoming relations) is 421

dataset_builders/pie/abstrct/README.md

Lines changed: 8 additions & 8 deletions
@@ -61,7 +61,7 @@ The dataset provides document converters for the following target document types
 - `LabeledSpans`, converted from `BratDocumentWithMergedSpans`'s `spans`
   - labels: `MajorClaim`, `Claim`, `Premise`
 - `BinraryRelations`, converted from `BratDocumentWithMergedSpans`'s `relations`
-  - labels: `Support`, `Partial-Attack`, `Attack`
+  - labels: `Support`, `Partial-Attack`, `Attack`
 
 See [here](https://github.com/ChristophAlt/pytorch-ie/blob/main/src/pytorch_ie/documents.py) for the document type definitions.
 
@@ -93,7 +93,7 @@ Morio et al. ([2022](https://aclanthology.org/2022.tacl-1.37.pdf); p. 642, Table
 
 - `MajorClaim` are more general/concluding `claim`'s, which is supported by more specific claims
 - `Claim` is a concluding statement made by the author about the outcome of the study. Claims only points to other claims.
-- `Premise` (a.k.a. evidence) is an observation or measurement in the study, which supports or attacks another argument component, usually a `claim`. They are observed facts, and therefore credible without further justifications, as this is the ground truth the argumentation is based on.
+- `Premise` (a.k.a. evidence) is an observation or measurement in the study, which supports or attacks another argument component, usually a `claim`. They are observed facts, and therefore credible without further justifications, as this is the ground truth the argumentation is based on.
 
 (Mayer et al. 2020, p.2110)
 
@@ -354,7 +354,7 @@ python src/evaluate_documents.py dataset=abstrct_base metric=count_text_tokens
 
 ### Curation Rationale
 
-"\[D\]espite its natural employment in healthcare applications, only few approaches have applied AM methods to this kind
+"[D]espite its natural employment in healthcare applications, only few approaches have applied AM methods to this kind
 of text, and their contribution is limited to the detection
 of argument components, disregarding the more complex phase of
 predicting the relations among them. In addition, no huge annotated
@@ -373,7 +373,7 @@ Extended from the previous dataset in [Mayer et al. 2018](https://webusers.i3s.u
 
 #### Who are the source language producers?
 
-\[More Information Needed\]
+[More Information Needed]
 
 ### Annotations
 
@@ -405,7 +405,7 @@ Two annotators with background in computational linguistics. No information was
 
 ### Personal and Sensitive Information
 
-\[More Information Needed\]
+[More Information Needed]
 
 ## Considerations for Using the Data
 
@@ -426,17 +426,17 @@ scale." (p. 2114)
 
 ### Discussion of Biases
 
-\[More Information Needed\]
+[More Information Needed]
 
 ### Other Known Limitations
 
-\[More Information Needed\]
+[More Information Needed]
 
 ## Additional Information
 
 ### Dataset Curators
 
-\[More Information Needed\]
+[More Information Needed]
 
 ### Licensing Information
 

dataset_builders/pie/cdcp/README.md

Lines changed: 1 addition & 1 deletion
@@ -49,7 +49,7 @@ The dataset provides document converters for the following target document types
   - labels: `fact`, `policy`, `reference`, `testimony`, `value`
   - if `propositions` contain whitespace at the beginning and/or the end, the whitespace are trimmed out.
 - `binary_relations`: `BinaryRelation` annotations, converted from `CDCPDocument`'s `relations`
-  - labels: `reason`, `evidence`
+  - labels: `reason`, `evidence`
 
 See [here](https://github.com/ChristophAlt/pytorch-ie/blob/main/src/pytorch_ie/documents.py) for the document type
 definitions.

dataset_builders/pie/conll2012_ontonotesv5/conll2012_ontonotesv5.py

Lines changed: 3 additions & 3 deletions
@@ -447,9 +447,9 @@ def _generate_document_kwargs(self, dataset):
         pos_tags_feature = dataset.features["sentences"][0]["pos_tags"].feature
         return dict(
             entity_labels=dataset.features["sentences"][0]["named_entities"].feature,
-            pos_tag_labels=pos_tags_feature
-            if isinstance(pos_tags_feature, datasets.ClassLabel)
-            else None,
+            pos_tag_labels=(
+                pos_tags_feature if isinstance(pos_tags_feature, datasets.ClassLabel) else None
+            ),
         )
 
     def _generate_document(self, example, **document_kwargs):
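This hunk is pure reformatting from the black upgrade: black 24 wraps a conditional expression in parentheses when it has to split, instead of letting `if`/`else` continue on bare lines under the keyword argument. A self-contained sketch of the same shape, with invented names:

```python
from typing import List


def label_kwargs(labels: List[str], use_labels: bool) -> dict:
    # Same shape as the diff above: the split conditional expression is
    # grouped in parentheses so the keyword argument reads as one unit.
    return dict(
        label_names=(
            sorted(labels) if use_labels and len(labels) > 0 else None
        )
    )
```

The same parenthesized-conditional rewrite shows up again in `dataset.py` and `test_sciarg.py` below.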

dataset_builders/pie/sciarg/README.md

Lines changed: 5 additions & 5 deletions
@@ -155,7 +155,7 @@ possibly since [Lauscher et al., 2018](https://aclanthology.org/W18-5206/) prese
 
 - `supports`:
   - if the assumed veracity of *b* increases with the veracity of *a*
-  - "Usually, this relationship exists from data to claim, but in many cases a claim might support another claim. Other combinations are still possible." - (*Annotation Guidelines*, p. 3)
+  - "Usually, this relationship exists from data to claim, but in many cases a claim might support another claim. Other combinations are still possible." - (*Annotation Guidelines*, p. 3)
 - `contradicts`:
   - if the assumed veracity of *b* decreases with the veracity of *a*
   - It is a **bi-directional**, i.e., symmetric relationship.
@@ -335,15 +335,15 @@ python src/evaluate_documents.py dataset=sciarg_base metric=count_text_tokens
 
 ### Curation Rationale
 
-"\[C\]omputational methods for analyzing scientific writing are becoming paramount...there is no publicly available corpus of scientific publications (in English), annotated with fine-grained argumentative structures. ...\[A\]rgumentative structure of scientific publications should not be studied in isolation, but rather in relation to other rhetorical aspects, such as the
+"[C]omputational methods for analyzing scientific writing are becoming paramount...there is no publicly available corpus of scientific publications (in English), annotated with fine-grained argumentative structures. ...[A]rgumentative structure of scientific publications should not be studied in isolation, but rather in relation to other rhetorical aspects, such as the
 discourse structure.
 (Lauscher et al. 2018, p. 40)
 
 ### Source Data
 
 #### Initial Data Collection and Normalization
 
-"\[W\]e randomly selected a set of 40 documents, available in PDF format, among a bigger collection provided by experts in the domain, who pre-selected a representative sample of articles in Computer Graphics. Articles were classified into four important subjects in this area: Skinning, Motion Capture, Fluid Simulation and Cloth Simulation. We included in the corpus 10 highly representative articles for each subject." (Fisas et al. 2015, p. 44)
+"[W]e randomly selected a set of 40 documents, available in PDF format, among a bigger collection provided by experts in the domain, who pre-selected a representative sample of articles in Computer Graphics. Articles were classified into four important subjects in this area: Skinning, Motion Capture, Fluid Simulation and Cloth Simulation. We included in the corpus 10 highly representative articles for each subject." (Fisas et al. 2015, p. 44)
 
 "The Corpus includes 10,789 sentences, with an average of 269.7 sentences per document." (p. 45)
 
@@ -367,7 +367,7 @@ The annotation were done using BRAT Rapid Annotation Tool ([Stenetorp et al., 20
 
 ### Personal and Sensitive Information
 
-\[More Information Needed\]
+[More Information Needed]
 
 ## Considerations for Using the Data
 
@@ -384,7 +384,7 @@ of the different rhetorical aspects of scientific language (which we dub *scitor
 
 "While the background claims and own claims are on average of similar length (85 and 87 characters, respectively), they are much longer than data components (average of 25 characters)."
 
-"\[A\]nnotators identified an average of 141 connected component per publication...This indicates that either authors write very short argumentative chains or that our annotators had difficulties noticing long-range argumentative dependencies."
+"[A]nnotators identified an average of 141 connected component per publication...This indicates that either authors write very short argumentative chains or that our annotators had difficulties noticing long-range argumentative dependencies."
 
 (Lauscher et al. 2018, p.43)
 

src/pie_datasets/core/builder.py

Lines changed: 6 additions & 12 deletions
@@ -176,12 +176,10 @@ def _generate_example_kwargs(
         return None  # pragma: no cover
 
     @overload  # type: ignore
-    def _convert_dataset_single(self, dataset: datasets.IterableDataset) -> IterableDataset:
-        ...
+    def _convert_dataset_single(self, dataset: datasets.IterableDataset) -> IterableDataset: ...
 
     @overload  # type: ignore
-    def _convert_dataset_single(self, dataset: datasets.Dataset) -> Dataset:
-        ...
+    def _convert_dataset_single(self, dataset: datasets.Dataset) -> Dataset: ...
 
     def _convert_dataset_single(
         self, dataset: Union[datasets.Dataset, datasets.IterableDataset]
@@ -204,22 +202,18 @@ def _convert_dataset_single(
         return result
 
     @overload  # type: ignore
-    def _convert_datasets(self, datasets: datasets.DatasetDict) -> datasets.DatasetDict:
-        ...
+    def _convert_datasets(self, datasets: datasets.DatasetDict) -> datasets.DatasetDict: ...
 
     @overload  # type: ignore
     def _convert_datasets(
         self, datasets: datasets.IterableDatasetDict
-    ) -> datasets.IterableDatasetDict:
-        ...
+    ) -> datasets.IterableDatasetDict: ...
 
     @overload  # type: ignore
-    def _convert_datasets(self, datasets: datasets.IterableDataset) -> IterableDataset:
-        ...
+    def _convert_datasets(self, datasets: datasets.IterableDataset) -> IterableDataset: ...
 
     @overload  # type: ignore
-    def _convert_datasets(self, datasets: datasets.Dataset) -> Dataset:
-        ...
+    def _convert_datasets(self, datasets: datasets.Dataset) -> Dataset: ...
 
     def _convert_datasets(
         self,

src/pie_datasets/core/dataset.py

Lines changed: 14 additions & 12 deletions
@@ -179,17 +179,15 @@ def dataset_to_document_type(
     dataset: "Dataset",
     document_type: Type[Document],
     **kwargs,
-) -> "Dataset":
-    ...
+) -> "Dataset": ...
 
 
 @overload
 def dataset_to_document_type(
     dataset: "IterableDataset",
     document_type: Type[Document],
     **kwargs,
-) -> "IterableDataset":
-    ...
+) -> "IterableDataset": ...
 
 
 def dataset_to_document_type(
@@ -383,9 +381,11 @@ def map(
         result_document_type: Optional[Type[Document]] = None,
     ) -> "Dataset":
         dataset = super().map(
-            function=decorate_convert_to_dict_of_lists(function)
-            if as_documents and function is not None
-            else function,
+            function=(
+                decorate_convert_to_dict_of_lists(function)
+                if as_documents and function is not None
+                else function
+            ),
             with_indices=with_indices,
             with_rank=with_rank,
             input_columns=input_columns,
@@ -588,11 +588,13 @@ def map(  # type: ignore
         **kwargs,
     ) -> "IterableDataset":
         dataset_mapped = super().map(
-            function=decorate_convert_to_document_and_back(
-                function, document_type=self.document_type, batched=batched
-            )
-            if as_documents and function is not None
-            else function,
+            function=(
+                decorate_convert_to_document_and_back(
+                    function, document_type=self.document_type, batched=batched
+                )
+                if as_documents and function is not None
+                else function
+            ),
             batched=batched,
             **kwargs,
         )

tests/dataset_builders/common.py

Lines changed: 1 addition & 1 deletion
@@ -33,7 +33,7 @@ def _deep_compare(
     if re.match(excluded_path, path):
         return
 
-    if type(obj) != type(obj_expected):
+    if type(obj) is not type(obj_expected):
         raise AssertionError(f"{path}: {obj} != {obj_expected}")
     if isinstance(obj, (list, tuple)):
         if len(obj) != len(obj_expected):
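The `!=` → `is not` switch here addresses pycodestyle's E721 ("do not compare types, for exact checks use `is` and `is not`"), which newer pycodestyle releases bundled with flake8 7 report on this pattern: class objects are unique, so an exact-type check is an identity check, and `==` could in principle be overridden. A small self-contained illustration:

```python
class Base:
    pass


class Child(Base):
    pass


# Exact-type checks compare class objects by identity (what E721 asks for).
assert type(Child()) is not type(Base())
assert type(Child()) is Child

# isinstance() would not do here: a Child instance is also a Base instance,
# so it cannot distinguish the exact types being compared.
assert isinstance(Child(), Base)
```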

tests/dataset_builders/pie/sciarg/test_sciarg.py

Lines changed: 5 additions & 3 deletions
@@ -842,9 +842,11 @@ def test_tokenize_documents_all(converted_dataset, tokenizer, dataset_variant):
         tokenizer=tokenizer,
         return_overflowing_tokens=True,
         result_document_type=TOKENIZED_DOCUMENT_TYPE_MAPPING[type(doc)],
-        partition_layer="labeled_partitions"
-        if isinstance(doc, TextDocumentWithLabeledPartitions)
-        else None,
+        partition_layer=(
+            "labeled_partitions"
+            if isinstance(doc, TextDocumentWithLabeledPartitions)
+            else None
+        ),
         strict_span_conversion=strict_span_conversion,
         verbose=True,
     )

tests/unit/builder/test_brat_builder.py

Lines changed: 3 additions & 3 deletions
@@ -107,9 +107,9 @@ def hf_example(request) -> dict:
 
 def test_generate_document(builder, hf_example):
     kwargs = dict()
-    generated_document: Union[
-        BratDocument, BratDocumentWithMergedSpans
-    ] = builder._generate_document(example=hf_example, **kwargs)
+    generated_document: Union[BratDocument, BratDocumentWithMergedSpans] = (
+        builder._generate_document(example=hf_example, **kwargs)
+    )
 
     if hf_example == HF_EXAMPLES[0]:
         assert len(generated_document.relations) == 0