Commit 4ac6be9 (1 parent: 5e21ffe)

updated dependencies; adopting black


76 files changed (+2153, -747 lines)

.black.toml

(new file, +18 lines)

[tool.black]
line-length = 88
target-version = ['py38']
include = '\.pyi?$'
exclude = '''
/(
    \.git
  | \.hg
  | \.mypy_cache
  | \.tox
  | \.venv
  | _build
  | buck-out
  | build
  | dist
  | \.pdpp.*
)/
'''
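Black treats `include` and `exclude` as ordinary Python regular expressions matched against POSIX-style file paths, and compiles multi-line patterns such as the `exclude` block above in verbose mode. A minimal sketch of that matching logic (the `would_format` helper and the trimmed pattern are illustrative, not black's actual implementation):

```python
import re

# Illustrative re-creation of how the .black.toml patterns select files.
# `include`/`exclude` are plain Python regexes; multi-line patterns are
# compiled in verbose mode, so whitespace and newlines are ignored.
INCLUDE = re.compile(r"\.pyi?$")
EXCLUDE = re.compile(
    r"""
    /(
        \.git
      | \.venv
      | build
      | dist
      | \.pdpp.*
    )/
    """,
    re.VERBOSE,
)


def would_format(path: str) -> bool:
    """True if `path` matches include and no path component matches exclude."""
    return bool(INCLUDE.search(path)) and not EXCLUDE.search(path)


print(would_format("pdpp/cli.py"))
print(would_format("README.md"))
print(would_format("/.venv/lib/site.py"))
```

The `\.pdpp.*` entry presumably keeps black out of `pdpp`'s own hidden bookkeeping directories.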

.flake8

(new file, +4 lines)

[flake8]
exclude = .venv, */dodo.py
max-line-length = 90
extend-ignore = E203, W503, E231

.gitignore

(mode changed 100755 → 100644; +3, -1; the final `/.idea` pair appears to be a newline-only fix)

@@ -1,3 +1,5 @@
+.DS_Store
+
 /.vscode*
 !/.gitignore
 *__pycache__
@@ -7,4 +9,4 @@
 /build/
 /*.egg-info
 /*.egg
-/.idea
+/.idea

.isort.cfg

(new file, +4 lines)

[settings]
profile = black
line_length = 88
skip = .pdpp*

.pre-commit-config.yaml

(new file, +31 lines)

repos:
  - repo: git@github.com:pre-commit/pre-commit-hooks.git
    rev: v4.4.0
    hooks:
      - id: check-added-large-files
        args: ['--maxkb=102400']
      - id: check-executables-have-shebangs
      - id: check-merge-conflict
      - id: check-symlinks
      - id: check-toml
      - id: check-yaml
      - id: detect-private-key
      - id: end-of-file-fixer
      - id: trailing-whitespace

  - repo: git@github.com:psf/black.git
    rev: 23.1.0
    hooks:
      - id: black
        args: ['--config', '.black.toml']

  - repo: git@github.com:pre-commit/mirrors-isort.git
    rev: v5.10.1
    hooks:
      - id: isort

  - repo: git@github.com:pycqa/flake8.git
    rev: 6.0.0
    hooks:
      - id: flake8
        args: ['--config', '.flake8']

LICENSE.txt

(mode changed 100755 → 100644)

README.md

(+12, -12)

Every hunk in this file is a whitespace-only change (trailing whitespace stripped), so each affected line is shown once rather than as a -/+ pair.

@@ -20,7 +20,7 @@

# `pdpp`

`pdpp` is a command-line interface for facilitating the creation and maintenance of transparent and reproducible data workflows. `pdpp` adheres to principles espoused by Patrick Ball in his manifesto on ['Principled Data Processing'](https://www.youtube.com/watch?v=ZSunU9GQdcI). `pdpp` can be used to create 'tasks', populate task directories with the requisite subdirectories, link together tasks' inputs and outputs, and execute the pipeline using the `doit` [suite of automation tools](https://pydoit.org/).

`pdpp` is also capable of producing rich visualizations of the data processing workflows it creates:

@@ -34,12 +34,12 @@ Each task directory contains at minimum three subdirectories:

2. `output`, which contains all of the task's local data outputs (also referred to as 'targets')
3. `src`, which contains all of the task's source code (which, ideally, would be contained within a single script file)

The `pdpp` package adds two additional constraints to Patrick Ball's original formulation of PDP:

1. All local data files needed by the workflow but not generated by any of the workflow's tasks must be included in the `_import_` directory, which `pdpp` places at the same directory level as the overall workflow during project initialization.
2. All local data files produced by the workflow as project outputs must be routed into the `_export_` directory, which `pdpp` places at the same directory level as the overall workflow during project initialization.

These additional constraints disambiguate the input and output of the overall workflow, which permits `pdpp` workflows to be embedded within one another.

## Installation Prerequisites

@@ -67,15 +67,15 @@ Doing so should produce a directory tree similar to this one:

![](img/init.png)

For the purposes of this example, a `.csv` file containing some toy data has been added to the `_import_` directory.

At this point, we're ready to add our first task to the project. To do this, we'll use the `new` command:

```bash
pdpp new
```

Upon executing the command, `pdpp` will request a name for the new task. We'll call it 'task_1'. After supplying the name, `pdpp` will display an interactive menu which allows users to specify which other tasks in the project contain files that 'task_1' will depend upon.

![](img/task_1_task_dep.png)

@@ -96,7 +96,7 @@ new_rows = []

with open('../input/example_data.csv', 'r') as f1:
    r = csv.reader(f1)
    for row in r:
        new_row = [int(row[0]) + 1, int(row[1]) + 1]
        new_rows.append(new_row)

@@ -112,7 +112,7 @@ After running `task_1.py`, a new file called `example_data_plus_one.csv` should

```bash
pdpp rig
```

Select `_export_` from the list of tasks available, then select `task_1` (and not `_import_`); finally, select `example_data_plus_one.csv` as the only dependency for `_export_`.

Once `_export_` has been rigged, this example project is a complete (if exceedingly simple) example of a `pdpp` workflow. The workflow imports a simple `.csv` file, adds one to each number in the file, and exports the resulting modified `.csv` file. `pdpp` workflows can be visualized using the built-in visualization suite like so:

@@ -124,7 +124,7 @@ The above command will prompt users for two pieces of information: the output fo

![](img/dependencies_all.png)

In `pdpp` visualizations, the box-like nodes represent tasks, the nodes with folded corners represent data files, and the nodes with two tabs on the left-hand side represent source code.

One may execute the entire workflow by using one of the two following commands (both are functionally identical):

@@ -148,7 +148,7 @@ When a workflow is run, the `doit` automation suite -- atop which `pdpp` is buil

-- task_1
```

This is because `doit` checks the relative ages of each task's inputs and outputs at runtime; if a task has any outputs (or 'targets,' in `doit` nomenclature) that are older than one or more of its inputs (or 'dependencies,' in `doit` nomenclature), that task must be re-run. If all of a task's inputs are older than its outputs, the task does not need to be run. This means that a `pdpp`/`doit` pipeline can be run as often as the user desires without needlessly wasting time or computing power: tasks will only be re-run if changes to 'upstream' files necessitate it. You can read more about this feature of the `doit` suite [here](https://pydoit.org/tasks.html).

## Usage from the Command Line

@@ -170,7 +170,7 @@ Adds a new custom task to a `pdpp` project and launches an interactive rigging s

### `pdpp sub`

Adds a new sub-project task to a `pdpp` project and launches an interactive rigging session for it (see `pdpp rig` below for more information). Sub-project tasks are distinct `pdpp` projects nested inside the main project -- structurally, they function identically to all other `pdpp` projects. Their dependencies are defined as any local files contained inside their `_import_` directory (which functions as if it were an `input` directory for a task) and their targets are defined as any local files contained inside their `_export_` directory (which functions as if it were an `output` directory for a task).

### `pdpp rig`

@@ -179,7 +179,7 @@ Launches an interactive rigging session for a selected task, which allows users

### `pdpp run` or `doit`

Runs the project. The `pdpp run` command provides basic functionality; users may instead pass arguments to the `doit` command, which provides a great deal of control and specificity. More information about the `doit` command can be found [here](https://pydoit.org/cmd-run.html).

### `pdpp graph`

@@ -196,4 +196,4 @@ Incorporates an already-PDP compliant directory (containing `input`, `output`, a

### `pdpp enable`

Allows users to toggle tasks 'on' or 'off'; tasks that are 'off' will not be executed when `pdpp run` or `doit` is used.
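The freshness rule described in the README above (re-run a task when any of its outputs is older than any of its inputs) can be sketched with file modification times. This is a deliberate simplification for illustration; `doit`'s real up-to-date check also tracks task metadata in its own dependency database:

```python
import os


def needs_rerun(input_paths, output_paths):
    """True if any output is missing or older than the newest input.

    A simplified sketch of doit's up-to-date check, for illustration
    only; it compares mtimes and nothing else.
    """
    if not all(os.path.exists(p) for p in output_paths):
        return True  # a missing target always forces a run
    newest_input = max(os.path.getmtime(p) for p in input_paths)
    oldest_output = min(os.path.getmtime(p) for p in output_paths)
    return oldest_output < newest_input
```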

dodo.py

(new file, +5 lines)

import doit

from pdpp.automation.task_creator import gen_many_tasks, task_all

doit.run(globals())
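`doit.run(globals())` hands this module's namespace to `doit`, which collects every callable whose name begins with `task_` (here `task_all`, plus whatever the imported `gen_many_tasks` contributes) and dispatches the command-line runner. A toy sketch of that naming convention (not `doit`'s actual discovery code):

```python
# Toy illustration of the convention dodo.py relies on: task definitions
# are callables whose names start with "task_". This is not doit's real
# discovery implementation, just the naming rule it applies.
def task_all():
    return {"actions": None}


def gen_many_tasks():  # no "task_" prefix, so discovery skips it
    pass


namespace = {"task_all": task_all, "gen_many_tasks": gen_many_tasks}
discovered = sorted(
    name for name, obj in namespace.items()
    if name.startswith("task_") and callable(obj)
)
print(discovered)
```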

env.yml

(+3, -3; the three changed lines are trailing-whitespace-only and are shown once)

@@ -5,10 +5,10 @@ channels:

dependencies:
  - python
  - pip
  - Click >= 7.0
  - doit >= 0.31.1
  - networkx >= 2.2
  - graphviz >= 0.10.1
  - pydot >= 1.4.1
  - questionary >= 1.0.2
  - pyyaml >= 5.3

pdpp/__init__.py

(mode changed 100755 → 100644)

pdpp/automation/__init__.py

(mode changed 100755 → 100644)

pdpp/automation/doit_run.py

(mode changed 100755 → 100644; +5, -2)

@@ -1,8 +1,11 @@
 from pdpp.automation.task_creator import gen_many_tasks, task_all

+
 def doit_run():
     import doit
+
     doit.run(globals())

-if __name__ == '__main__':
-    doit_run()
+
+if __name__ == "__main__":
+    doit_run()

pdpp/automation/link_task.py

(mode changed 100755 → 100644; +55, -25. The hunk is black's reformatting of the whole file plus isort's import reordering, with no behavioral change, so the resulting file is shown instead of the interleaved -/+ pairs.)

import os
from posixpath import join
from typing import List

from pdpp.automation.mylinker import dir_linker, file_linker
from pdpp.tasks.base_task import BaseTask


def make_link_task(task: BaseTask, disabled_list: List[str], final_dep_list: List):
    for task_with_dependency, dependency_metadata in task.dep_files.items():
        link_action_list = []
        link_dep_list = []
        link_targ_list = []

        if task_with_dependency not in disabled_list:
            file_link_start = [
                join(task_with_dependency, dependency_metadata.task_out, f)
                for f in dependency_metadata.file_list
            ]
            file_link_end = [
                join(task.target_dir, task.IN_DIR, f)
                for f in dependency_metadata.file_list
            ]

            link_action_list.extend(
                [
                    (file_linker, [fls, fle])
                    for fls, fle in list(zip(file_link_start, file_link_end))
                ]
            )

            dir_link_start = [
                join(task_with_dependency, dependency_metadata.task_out, f)
                for f in dependency_metadata.dir_list
            ]
            dir_link_end = [
                join(task.target_dir, task.IN_DIR, f)
                for f in dependency_metadata.dir_list
            ]

            link_action_list.extend(
                [
                    (dir_linker, [dls, dle])
                    for dls, dle in list(zip(dir_link_start, dir_link_end))
                ]
            )

            link_dep_list.extend(file_link_start)
            link_targ_list.extend(file_link_end)

            for dir_dependency in dependency_metadata.dir_list:
                path_to_dep_dir = join(
                    dependency_metadata.task_name, dependency_metadata.task_out
                )
                startdir = os.getcwd()
                os.chdir(path_to_dep_dir)
                for root, _, filenames in os.walk(dir_dependency):
                    for filename in filenames:
                        subdir_filepath_start = join(
                            dependency_metadata.task_name,
                            dependency_metadata.task_out,
                            root,
                            filename,
                        )
                        link_dep_list.append(subdir_filepath_start)

                        subdir_filepath_end = join(
                            task.target_dir, task.IN_DIR, root, filename
                        )
                        link_targ_list.append(subdir_filepath_end)
                os.chdir(startdir)

        final_dep_list.extend(link_targ_list)

        yield {
            "basename": "_task_{}_LINK_TO_{}".format(
                task_with_dependency, task.target_dir
            ),
            "actions": link_action_list,
            "file_dep": link_dep_list,
            "targets": link_targ_list,
            "clean": True,
        }
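Stripped of the directory walking, `make_link_task` yields one ordinary `doit` task dictionary per upstream dependency: the upstream task's output files become `file_dep`, and the hard-linked copies in the downstream task's input directory become `targets`. A reduced, self-contained sketch of that shape (the task names, directory layout, and file list here are hypothetical, not taken from a real project):

```python
# Reduced sketch of the task dicts make_link_task yields. The real
# function walks task.dep_files and attaches linker actions; here the
# actions list is left empty and all names are hypothetical.
def make_link_tasks(upstream, downstream, file_list):
    yield {
        "basename": "_task_{}_LINK_TO_{}".format(upstream, downstream),
        "actions": [],  # the real version appends (file_linker, [...]) pairs
        "file_dep": ["{}/output/{}".format(upstream, f) for f in file_list],
        "targets": ["{}/input/{}".format(downstream, f) for f in file_list],
        "clean": True,
    }


task = next(make_link_tasks("_import_", "task_1", ["example_data.csv"]))
print(task["basename"])
print(task["file_dep"])
print(task["targets"])
```

Because the linked files are declared as both targets of the link task and dependencies of the consuming task, `doit` orders the linking before the consumer runs.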

pdpp/automation/mylinker.py

(mode changed 100755 → 100644; +3, -3; besides the import reorder shown, the two other -/+ pairs were trailing-whitespace-only and appear once as context)

@@ -1,18 +1,18 @@
 from os import link, remove
-from shutil import rmtree, copytree
+from shutil import copytree, rmtree


 def file_linker(link_start, link_end):
     try:
         link(link_start, link_end)
     except FileExistsError:
         remove(link_end)
         link(link_start, link_end)


 def dir_linker(link_start, link_end):
     try:
         copytree(link_start, link_end, copy_function=link)
     except FileExistsError:
         rmtree(link_end)
         copytree(link_start, link_end, copy_function=link)
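`file_linker`'s link-or-replace behavior can be exercised on its own; a small sketch using a temporary directory:

```python
import os
import tempfile


def file_linker(link_start, link_end):
    """Hard-link link_start to link_end, replacing an existing link_end."""
    try:
        os.link(link_start, link_end)
    except FileExistsError:
        os.remove(link_end)
        os.link(link_start, link_end)


with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "src.txt")
    dst = os.path.join(d, "dst.txt")
    with open(src, "w") as f:
        f.write("data")
    file_linker(src, dst)
    file_linker(src, dst)  # second call takes the FileExistsError path
    linked = os.path.samefile(src, dst)
print(linked)
```

Because these are hard links (the two paths share one inode), the "copied" inputs can never go stale: writing to the upstream file is visible through the downstream path as well.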

0 commit comments