Updated datasets.py #31

ShreyParikh07 · 2024-10-08T07:47:51Z

Datasets implementation

for more information, see https://pre-commit.ci

test notebooks

…hackathon

review-notebook-app · 2024-10-08T07:59:32Z

Check out this pull request on ReviewNB.

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

eroell · 2024-10-08T10:29:06Z

Thanks for the PR 🥳 Comments are in the files!

eroell · 2024-10-08T10:34:43Z

Please also add the 3 datasets to pytest:
you can e.g. just call ed.dt.<dataset X> inside a function test_loading_dataset() in a test_loading_datasets.py script; pytest then calls it and checks that it actually downloads the desired data. I believe the datasets are small enough for this to be done.

eroell · 2024-10-08T10:30:14Z

docs/notebooks/ehrapy_data/GIBleed_dataset/GiBleed_5.3/CARE_SITE.csv

@@ -0,0 +1 @@
+CARE_SITE_ID,CARE_SITE_NAME,PLACE_OF_SERVICE_CONCEPT_ID,LOCATION_ID,CARE_SITE_SOURCE_VALUE,PLACE_OF_SERVICE_SOURCE_VALUE


Please don't store CSV files in the repository; download them from an external source instead :)

eroell · 2024-10-08T10:31:50Z

src/ehrdata/dt/datasets.py

@@ -62,7 +62,7 @@ def mimic_iv_omop(backend_handle: DuckDBPyConnection, data_path: Path | None = N
        >>> con.execute("SHOW TABLES;").fetchall()
    """
    if data_path is None:
-        data_path = "ehrapy_data/mimic-iv-demo-data-in-the-omop-common-data-model-0.9"
+        data_path = Path("ehrapy_data/mimic-iv-demo-data-in-the-omop-common-data-model-0.9")


yep good one
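
A short sketch of why wrapping the default in Path matters, assuming the loader later calls Path methods such as iterdir() or exists() on data_path, as other hunks in this PR do. resolve_data_path is a hypothetical helper for illustration, not part of ehrdata.

```python
from __future__ import annotations

from pathlib import Path

# Default location taken from the diff above.
DEFAULT_DATA_PATH = Path("ehrapy_data/mimic-iv-demo-data-in-the-omop-common-data-model-0.9")


def resolve_data_path(data_path: str | Path | None = None) -> Path:
    """Return data_path as a Path, falling back to the default location."""
    if data_path is None:
        return DEFAULT_DATA_PATH
    # Path() accepts both str and Path, so callers may pass either.
    return Path(data_path)


# A plain str has no filesystem methods, so the old str default would fail
# as soon as the loader called e.g. data_path.iterdir().
assert not hasattr("ehrapy_data", "iterdir")
assert hasattr(resolve_data_path(), "iterdir")
```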

eroell · 2024-10-08T10:36:19Z

src/ehrdata/dt/datasets.py

+    extracted_folder = next(
+        (folder for folder in data_path.iterdir() if folder.is_dir() and "_csv" in folder.name), data_path
+    )
+    return _set_up_duckdb(extracted_folder, backend_handle)


 def gibleed_omop(backend_handle: DuckDBPyConnection, data_path: Path | None = None) -> None:


Adding more information to the docstrings would be great. See e.g. the other ehrapy datasets that we already have: link to where the data is from, and mention that it is in the OMOP data format, maybe with a link to the OMOP data description.

eroell · 2024-11-04T10:49:23Z

Fixed in #62. It worked locally but failed on CI; my best guess is that Ubuntu is stricter than macOS about the capitalization of file names when reading files.
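
If capitalization is indeed the cause (this is only the guess stated above, and the actual fix in #62 is not shown here), one way to make file lookups behave the same on both systems is to match names case-insensitively. The helper below is a hypothetical sketch using only the standard library.

```python
from __future__ import annotations

from pathlib import Path


def find_file_case_insensitive(directory: Path, filename: str) -> Path | None:
    """Return the file in `directory` whose name matches `filename`, ignoring case."""
    wanted = filename.lower()
    # Sort so the result is deterministic if several names differ only by case.
    for candidate in sorted(Path(directory).iterdir()):
        if candidate.is_file() and candidate.name.lower() == wanted:
            return candidate
    return None
```

With this, a request for care_site.csv would resolve to CARE_SITE.csv on a case-sensitive file system instead of failing with a missing-file error.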

Zethson

Just random things I saw

Zethson · 2024-11-06T09:04:07Z

tests/test_datasets.py

+        shutil.rmtree(TEST_DATA_DIR)
+
+
+if __name__ == "__main__":


Don't think that this is necessary?

I don't think so either. Thanks for looking it over, appreciated!

Zethson · 2024-11-06T09:05:36Z

src/ehrdata/dt/datasets.py

+    if data_path is None:
+        data_path = Path("ehrapy_data/Synthea27Nj")
+
+    if data_path.exists():


I'd have a general downloading function that you can reuse. Just steal the one from ehrapy. Consider using proper logging instead of printing.
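
A rough sketch of what a reusable download helper with logging could look like. This is not ehrapy's implementation; the function name download and its parameters are assumptions for illustration, and it uses only the standard library.

```python
import logging
import shutil
import tempfile
import urllib.request
from pathlib import Path

logger = logging.getLogger(__name__)


def download(url: str, output_path: Path, overwrite: bool = False) -> Path:
    """Download `url` to `output_path` unless the file is already there."""
    output_path = Path(output_path)
    if output_path.exists() and not overwrite:
        logger.info("Found %s, skipping download.", output_path)
        return output_path

    output_path.parent.mkdir(parents=True, exist_ok=True)
    logger.info("Downloading %s to %s", url, output_path)
    # Write to a temporary file first so an interrupted download never
    # leaves a half-written file at the final location.
    with urllib.request.urlopen(url) as response:
        with tempfile.NamedTemporaryFile(dir=output_path.parent, delete=False) as tmp:
            shutil.copyfileobj(response, tmp)
    Path(tmp.name).replace(output_path)
    return output_path
```

Each dataset function could then call this helper with its own URL and target path, and users decide through the standard logging configuration whether the messages are shown.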

eroell · 2024-11-06T09:14:27Z

Some things here became part of #51 and #62, but as these branches have diverged quite a bit, I will likely close this PR soon.

ShreyParikh07 and others added 4 commits October 8, 2024 09:30

766f2f1  Update datasets.py (Datasets implementation)
d8cd5c1  [pre-commit.ci] auto fixes from pre-commit.com hooks
4642318  Create implement_datasets.ipynb (test notebooks)
deaa639  Merge branch 'hackathon' of https://github.com/theislab/ehrdata into hackathon

ShreyParikh07 and others added 2 commits October 8, 2024 10:05

3c7bc0c  corrected notebook
431609d  [pre-commit.ci] auto fixes from pre-commit.com hooks

eroell requested changes Oct 8, 2024


ShreyParikh07 and others added 20 commits October 8, 2024 14:13

5566d05  pytest for data implementation added
7a9ae2e  [pre-commit.ci] auto fixes from pre-commit.com hooks
cdeee11  Updated
3be9613  [pre-commit.ci] auto fixes from pre-commit.com hooks
26869fa  Update test_datasets.py
cf69716  Merge branch 'hackathon' of https://github.com/theislab/ehrdata into hackathon
ad313d7  Update datasets.py
5b19544  test repeated
e378d78  [pre-commit.ci] auto fixes from pre-commit.com hooks
e28be1e  Update datasets.py
f5db783  Update test_datasets.py
907e183  [pre-commit.ci] auto fixes from pre-commit.com hooks
55780ca  Update test_datasets.py
19604f3  Merge branch 'hackathon' of https://github.com/theislab/ehrdata into hackathon
6c5cc8f  [pre-commit.ci] auto fixes from pre-commit.com hooks
a080cd2  one last try
3c8552d  Merge branch 'hackathon' of https://github.com/theislab/ehrdata into hackathon
22884d1  [pre-commit.ci] auto fixes from pre-commit.com hooks
4f4a390  minor changes
ae1a5db  Merge branch 'hackathon' of https://github.com/theislab/ehrdata into hackathon
0c8cda3  [pre-commit.ci] auto fixes from pre-commit.com hooks

Zethson reviewed Nov 6, 2024
