Dear EMBED OPEN DATA team,
Would it be possible to adapt the 'NUS_example_pipeline.ipynb' notebook so that it is entirely self-contained and can be run using only the .csv files available on AWS? Currently, the notebook relies heavily on additional columns (exam_birads, exam_path_severity, exam_path_desc, exam_outcome, etc.) that come from complicated pre-processing and are not present in any of the AWS files...
I think having a self-contained example showing how to process EMBED data for cancer vs. no cancer classification, which includes official patient-level train-test split information, would greatly accelerate community adoption of this great resource!
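To illustrate what I mean by a patient-level split, here is a rough sketch on a toy table. The column names `patient_id` and `cancer` and the helper `patient_level_split` are placeholders I made up for illustration; they are not the actual EMBED column names or an official split.

```python
import numpy as np
import pandas as pd


def patient_level_split(df, patient_col="patient_id", test_frac=0.2, seed=0):
    """Split rows so that all rows of a given patient land in the same split.

    `patient_col` is a hypothetical column name, not an official EMBED field.
    """
    rng = np.random.default_rng(seed)
    patients = df[patient_col].unique()
    rng.shuffle(patients)  # shuffle patient IDs, not individual rows
    n_test = max(1, int(round(test_frac * len(patients))))
    test_patients = list(patients[:n_test])
    is_test = df[patient_col].isin(test_patients)
    return df[~is_test], df[is_test]


# Toy table standing in for the real metadata (made-up values).
toy = pd.DataFrame({
    "patient_id": [1, 1, 2, 3, 3, 4, 5, 5],
    "cancer": [0, 0, 1, 0, 0, 1, 0, 0],
})
train_df, test_df = patient_level_split(toy)
```

The point is only that the split is drawn over patients rather than over images or exams, so no patient contributes rows to both train and test.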
Thanks!