crab report - raises exception when all the jobs failed #5248

@mapellidario

While working on dmwm/CRABServer#6540, I noticed that if all jobs of a task fail, crab report refuses to compute notFinishedLumis.json and notPublishedLumis.json.

example of a problem

Consider task 231026_133050:dmapelli_crab_20231026_153049 on test11: it has 4 jobs, all of which failed (I killed the task after the jobs failed).

crab report fails with [1] because, in the check

if not reportData['lumisToProcess'] or not reportData['runsAndLumis']:

reportData['runsAndLumis'] is empty, as the server response shows:

> curl -L --key $X509_USER_PROXY --cert $X509_USER_PROXY "https://cmsweb-test11.cern.ch/crabserver/devtwo/workflow?workflow=231026_133050:dmapelli_crab_20231026_153049&subresource=report2"
{"result": [
 {"taskDBInfo": {"userWebDirURL": "http://vocms059.cern.ch/mon/dmapelli/231026_133050:dmapelli_crab_20231026_153049", "inputDataset": "/GenericTTbar/HC-CMSSW_9_2_6_91X_mcRun1_realistic_v2-v2/AODSIM", "outputDatasets": [], "publication": true}, "runsAndLumis": {}}
]}

expected result

If I remove the aforementioned check in crab report, then notPublishedLumis.json and notFinishedLumis.json are identical to lumisToProcess.json [2].
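
To make the reasoning explicit: the notFinished lumis are "the lumis to process minus the processed lumis", so with nothing processed the difference is the whole set. A minimal sketch of that arithmetic follows, using plain Python sets rather than whatever lumi-list class the client actually uses. The helper names `expand` and `not_finished` are hypothetical, and the input values are copied from the reportData dump in [3].

```python
def expand(lumi_ranges_by_run):
    """Expand {'run': [[first, last], ...]} into a set of (run, lumi) pairs."""
    pairs = set()
    for run, ranges in lumi_ranges_by_run.items():
        for first, last in ranges:
            pairs.update((run, lumi) for lumi in range(first, last + 1))
    return pairs


def not_finished(lumis_to_process, processed_pairs):
    """Lumis to process (keyed by job id) minus the already processed lumis."""
    to_process = set()
    for per_job in lumis_to_process.values():
        to_process |= expand(per_job)
    return to_process - processed_pairs


# Values copied from the reportData dump in [3].
lumis_to_process = {
    '1': {'1': [[419, 419], [592, 592]]},
    '2': {'1': [[652, 652], [1261, 1261]]},
    '3': {'1': [[1849, 1849], [1858, 1858]]},
    '4': {'1': [[2702, 2702], [2748, 2748]]},
}

# runsAndLumis is empty, so the set of processed lumis is empty too.
remaining = not_finished(lumis_to_process, set())
print(sorted(remaining))
```

With an empty set of processed lumis, `remaining` contains every lumi listed in `lumis_to_process`, which matches the "identical" result reported by diff in [2].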

discussion

I know that it is very unlikely that a user will ever care about running crab report on a task where all jobs failed. A crab recover will likely not be necessary either, since submitting the same task again is a proper alternative. However, I do not like that our client returns an ambiguous result.

What shall we do?

Keep in mind that crab report already has this information [3], so it would not be difficult to change the message to "sorry, all the jobs failed. You'd better submit a new task with the same config" if

  • reportData["runsAndLumis"] is empty, and
  • all entries in reportData["jobList"] are failed jobs
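
The two conditions above could be checked along these lines. This is only a sketch, not the client's actual code: the function name `all_jobs_failed` is hypothetical, and it assumes `reportData["jobList"]` is a list of `(status, jobid)` tuples as in the dump in [3].

```python
def all_jobs_failed(report_data):
    """Return True when nothing was processed and every job is in status 'failed'."""
    job_list = report_data.get('jobList', [])
    no_processed_lumis = not report_data.get('runsAndLumis')
    # Require at least one job, so that an empty job list is not reported as "all failed".
    every_job_failed = bool(job_list) and all(status == 'failed' for status, _ in job_list)
    return no_processed_lumis and every_job_failed


# Trimmed-down version of the reportData dump in [3].
report_data = {
    'jobList': [('failed', '2'), ('failed', '4'), ('failed', '1'), ('failed', '3')],
    'lumisToProcess': {'1': {'1': [[419, 419], [592, 592]]}},
    'runsAndLumis': {},
}

if all_jobs_failed(report_data):
    print("Sorry, all the jobs failed. You'd better submit a new task with the same config.")
```

Checking the job list in addition to `runsAndLumis` keeps the existing "Maybe no job has completed yet ?" message for tasks that are still running, and only swaps in the new message when the outcome is unambiguous.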

[1]

Singularity> crab report -d crab_20231026_153049 --recover notPublished --proxy=$X509_USER_PROXY
...
Error: Cannot get all the needed information for the report. Maybe no job has completed yet ?
 Notice, if your task has been submitted more than 30 days ago, then everything has been cleaned.
Will save lumi files into output directory /home/dario/crab/local/z-submitted/crab_20231026_153049/results
Additional report lumi files:
  Input dataset lumis (from DBS, at task submission time) written to inputDatasetLumis.json
  Lumis to process written to lumisToProcess.json
...

[2]

Singularity> crab report -d crab_20231026_153049 --recover notPublished --proxy=$X509_USER_PROXY
...
Will save lumi files into output directory /home/dario/crab/local/z-submitted/crab_20231026_153049/results
Summary from jobs in status 'finished':
  Number of files processed: 0
  Number of events read: 0
  Number of events written in EDM files: 0
  Number of events written in TFileService files: 0
  Number of events written in other type of files: 0
  Warning: 'notPublished' lumis written to notPublishedLumis.json
           The 'notPublished' lumis were calculated as: the lumis to process minus the lumis published in the output dataset.
...
Singularity> diff -s crab_20231026_153049/results/notPublishedLumis.json crab_20231026_153049/results/lumisToProcess.json
Files crab_20231026_153049/results/notPublishedLumis.json and crab_20231026_153049/results/lumisToProcess.json are identical
Singularity> crab report -d crab_20231026_153049 --proxy=$X509_USER_PROXY
...
Will save lumi files into output directory /home/dario/crab/local/z-submitted/crab_20231026_153049/results
Summary from jobs in status 'finished':
  Number of files processed: 0
  Number of events read: 0
  Number of events written in EDM files: 0
  Number of events written in TFileService files: 0
  Number of events written in other type of files: 0
  Warning: 'notFinished' lumis written to notFinishedLumis.json
           The 'notFinished' lumis were calculated as: the lumis to process minus the processed lumis.
...
Singularity> diff -s crab_20231026_153049/results/notFinishedLumis.json crab_20231026_153049/results/lumisToProcess.json
Files crab_20231026_153049/results/notFinishedLumis.json and crab_20231026_153049/results/lumisToProcess.json are identical

[3]

pprint.pprint(reportData)
{'inputDataset': '/GenericTTbar/HC-CMSSW_9_2_6_91X_mcRun1_realistic_v2-v2/AODSIM',
 'inputDatasetDuplicateLumis': {},
 'inputDatasetLumis': {'1': [[1, 49],
...
                             [3144, 3334]]},
 'jobList': [('failed', '2'),
             ('failed', '4'),
             ('failed', '1'),
             ('failed', '3')],
 'lumisToProcess': {'1': {'1': [[419, 419], [592, 592]]},
                    '2': {'1': [[652, 652], [1261, 1261]]},
                    '3': {'1': [[1849, 1849], [1858, 1858]]},
                    '4': {'1': [[2702, 2702], [2748, 2748]]}},
 'outputDatasets': [],
 'outputDatasetsInfo': {'outputDatasets': {}},
 'publication': True,
 'runsAndLumis': {}
}
