-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Roslin qc #151
base: develop
Are you sure you want to change the base?
Conversation
Travis CI Fails with |
@stevekm fix for this was version bump of GitPython https://github.com/mskcc/beagle/blob/develop/requirements.txt#L11. Fix is already on develop. Did you merge develop back to your branch? |
needs to be updated with the new Roslin CWL output format for operator input |
This is blocked by #154 until we can figure out a way to get copies of the updated Roslin pipeline output format into Beagle. |
|
Test that the API Run Create Serializer works and creates a Run | ||
""" | ||
# start with 0 runs in the database | ||
self.assertEqual(len(Run.objects.all()), 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this makes much more sense to be tested through POST /v0/run/api/
, but we can leave it for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you POST to <localhost:1234>/v0/run/api/
you are going to get the real server, not the test-instance.
|
||
# merge the dict's; the second dict will overwrite args from the first | ||
normal = {**normal_r1, **normal_r2} | ||
normal['R1'] = file_to_cwl(pair['normal']['R1'].file) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this converted to CWL format? Operator needs to create a job where files are referenced by uri, which means they should be specified with the location
instead of the path
and uri should be either in format juno://<file_path>
or bid://<file_id_in_beagle>
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was not clear when I first started building the Operator, but the code works well for producing the CWL output needed in order to develop the Operator so I am not sure what to do with it now that the Operator has been refactored to output Job JSON instead of CWL JSON
runparams['pi_email'] = "NA" | ||
return(runparams) | ||
|
||
def get_db_files(assay, references_json = "runner/operator/roslin_operator/reference_jsons/roslin_resources.json"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pep8. There shouldn't be
in kwargs between =
def get_db_files(assay, references_json="runner/operator/roslin_operator/reference_jsons/roslin_resources.json"):
db_files['ref_fasta'] = references["request_files"]['ref_fasta'] | ||
|
||
# need to resolve some items to URI's | ||
for key in ['fp_genotypes', 'hotspot_list_maf', 'ref_fasta']: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these files specified in references_json runner/operator/roslin_operator/reference_jsons/roslin_resources.json
? If they are why do we have this logic to check is it a path or uri?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because some of the paths must be sent to CWL as character strings because they correspond to files inside containers, while other items must be passed as CWL File records and must be sent as URI's for Beagle to convert back to CWL.
Operator.__init__(self, model, request_id = request_id, run_ids = run_ids) | ||
|
||
def get_pipeline_id(self): | ||
return "9b7f2ac8-03a5-4c44-ae87-1d9f6500d19a" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need this anymore as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to remove it and things started breaking. Not sure why I think it has to do with the recent changes to how Operators work.
@@ -95,6 +96,9 @@ def process_triggers(): | |||
@shared_task | |||
def create_run_task(run_id, inputs, output_directory=None): | |||
logger.info("Creating and validating Run") | |||
if settings.DUMP_JSON: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like putting this in a task. Prints like this for debug are fine, but I think we should remove them from production code. I don't like the idea of generating files in which are not used anywere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is required for development, debug, and manual run setup as per #162 and #150. Otherwise there is no straightforward method to actually see what your Job JSON and final output CWL look like. Its easier to leave it in as a feature than to have to re-add it every single time you need to use it for debugging.
for request_id in request_ids: | ||
logging.info("Submitting requestId %s to pipeline %s" % (request_id, pipeline_name)) | ||
create_jobs_from_request.delay(request_id, pipeline.operator_id) | ||
if pipeline_name == "roslin-qc": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why aren't we handling roslin-qc operator the same way as any other operator? request_handlers shouldn't have any operator specific logic in them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because Roslin QC Operator requires that all request_ids
and run_ids
be passed, while other Operators only currently support a single request_id
arg. Trying to refactor all other Operators to be able to use this would take a lot more work. Also, its less confusing when the logic for dictating the handling behavior is placed up front here near the endpoint/view, instead of buried way down inside some other module.
@sivkovic the copy outputs operator is failing the tests, I am not sure what is wrong with it; https://travis-ci.com/mskcc/beagle/builds/151844563#L2014 |
Roslin QC Operator