Synchronization exercise
In the synchronization exercise we want to compare our analysis FW to other groups' FWs at the Ntuple level. We consider two types of synchronizations:
- synchronization at the object level: to check how compatible the reconstructed objects are
- do we apply the same energy corrections (to jets, leptons);
- are we using the correct tau ID discriminant;
- how are the objects cleaned;
- are the object-level variables computed the same way in all implementations;
- etc etc;
- synchronization at the event level: to check if the event-level cuts are implemented correctly.
The trees and branches in the output Ntuple should be named according to a nomenclature that all groups have agreed to follow. For the 2016+2017+2018 analyses we decided on the nomenclature as explained here and here.
For the object-level synchronization we have a dedicated analysis executable `analyze_inclusive`, which doesn't select any events but computes the proper objects used in the analyses.
The standard analyses -- `0l+2tau`, `1l+1tau` etc, all the way up to `4l`, including the four CRs (`ttW`, `ttZ`, `WZ`, `ZZ`) -- are capable of producing event-level sync Ntuples.
The sync Ntuples are produced not only for the signal regions (SRs) but also for fake application regions (ARs), flip ARs and MC closure regions (when applicable).
We also have the capability to produce the sync Ntuples with the following shape uncertainties: `central`, `JES`, `JER`, `tauES`, `UnclusteredEn`, `btag`.
The following examples are based on the 2017 era.
The starting point for all groups is a ttH signal MC MINIAODSIM file, which is Ntupelized by each group individually.
In our framework we use our custom nanoAOD fork to Ntupelize the MINIAOD file.
The Ntupelization is carried out with the `cmsRun` executable, which requires a Python config file as input.
The config files are generated with a script called `launchall_nanoaod.sh`, which is inspired by the `launchall.sh` script from the old VHbb days.
Even though the script is globally available, it should always be executed in `$CMSSW_BASE/src/tthAnalysis/NanoAOD` for it to function properly.
The help message of this script is currently:
Usage: launchall_nanoaod.sh -e <era>
-j <type>
[-d]
[-g]
[-f <dataset file>]
[-v version]
[-w whitelist]
[-n <job events = 50000>]
[-N <cfg events = -1>]
[-r <frequency = 1000>]
[-t <threads = 1>]
[ -p <publish: 0|1 = 1> ]
Available eras: 2016v2, 2016v3, 2017v1, 2017v2, 2018, 2018prompt
Available job types: data, mc, fast, sync
And the explanation of each option or flag:
- `-e <era>`: mandatory option that specifies the MiniAOD production campaign you want to Ntupelize. Available eras are: `2016v2` (RunIISummer16MiniAODv2), `2016v3` (RunIISummer16MiniAODv3), `2017v1` (RunIIFall17MiniAOD), `2017v2` (RunIIFall17MiniAODv2), `2018` (RunIIAutumn18MiniAOD) and `2018prompt` (only for processing 2018 RunD data files);
- `-j <type>`: mandatory option that specifies which type of files you want to Ntupelize. Available options are: `data`, `mc` (FullSim MC), `fast` (FastSim MC) and `sync` (MINIAODSIM for the synchronization exercise);
- `-d`: flag that submits CRAB jobs with the `--dryrun` option. Useful when you want to validate the CRAB submission;
- `-g`: flag that tells the script to only generate config files, not prepare any CRAB jobs;
- `-f <dataset file>`: optional. Specifies the location of the text file that lists all datasets the user wants to Ntupelize. If this option is not provided, then the location is automatically guessed from the job type and campaign era. In specialized analyses like the multilepton or bbWW HH analysis the option is mandatory, because valid guesses are made only when you Ntupelize MINIAOD files in the ttH analysis;
- `-v <version>`: optional. Specifies the directory name of the CRAB jobs in `/store/cms/user/`. When the option is not provided, a default one is generated (consisting of the era and the date of submission);
- `-w whitelist`: optional comma-separated list of sites that the user wants their jobs to run on;
- `-n <job events = 50000>`: optional. The number of events to process per CRAB job. Defaults to 50'000 events per CRAB job;
- `-N <cfg events = -1>`: optional. The number of events to process per `cmsRun` task. Should never be changed, unless the user wants to test sync Ntuple production on a few events;
- `-r <frequency = 1000>`: optional setting that tells how often the job should inform the user about its progress. Defaults to 1000, which means that after every 1000th event a statement is made about the progress;
- `-t <threads = 1>`: option that specifies the number of concurrent threads in the job. Setting it to a higher value is reasonable when the Ntupelization job is run locally (e.g. when Ntupelizing for the synchronization), but should never be touched when submitting CRAB jobs;
- `-p <publish: 0|1 = 1>`: option that specifies whether the processed datasets should be published on DAS (1, the default) or not (0). Has no effect when the `-g` flag is enabled.
When running Ntuple mass production, you need to open a grid proxy for a long time (~weeks) before running the script.
However, in the case of the synchronization exercise we don't need the massive computing resources that grid computing provides -- we want the results as soon as possible.
So, the plan of attack is to only generate the config files (with the `-g` option), increase the number of concurrent threads to something reasonable (with the `-t` option) and run the `cmsRun` job locally.
If the job is run locally, there's no need to open a grid proxy at all.
Example. Let's say we want to produce the synchronization Ntuple for the 2017 legacy ttH analysis.
In order to do that, we need to generate the `cmsRun` config file with the following command:
launchall_nanoaod.sh -e 2017v2 -j sync -g -r 1
The command says that we only want to generate the config files (`-g`) from RunIIFall17MiniAODv2 (`2017v2`) MiniAOD for the synchronization exercise (`-j sync`), with the reporting frequency set to once per event (`-r 1`).
The user is first prompted with the following message:
Sure you want to run NanoAOD production on samples defined in $CMSSW_BASE/src/tthAnalysis/NanoAOD/test/datasets/txt/datasets_sync_2017_RunIIFall17MiniAODv2.txt? [y/N]
In other words, the script has made an educated guess for the location of the file that contains the list of datasets subject to processing.
After pressing the y character, the user is prompted again with another question:
Sure you want to use this config file: $CMSSW_BASE/src/tthAnalysis/NanoAOD/test/cfgs/nano_sync_RunIIFall17MiniAODv2_cfg.py? [y/N]
The script has made another educated guess, this time for the location of the generated config file. After entering y, the script proceeds to generate the config file for the Ntupelization job.
Finally, the Ntupelization job can be run anywhere, but the general advice is to create a directory somewhere in your `$HOME` with a name that is descriptive enough for the job you're about to run, and then execute the following:
cmsRun $CMSSW_BASE/src/tthAnalysis/NanoAOD/test/cfgs/nano_sync_RunIIFall17MiniAODv2_cfg.py &> log.txt
The standard output and standard error streams are redirected to the file `log.txt`, which is placed in the same directory where you executed the above command.
You can use `tail -F log.txt` to track the progress of the job in another shell session.
Example #2.
Let's say we want to Ntupelize a MINIAODSIM file for the synchronization exercise in HH bbWW analysis.
We start with the same command as in the previous example, but provide the `-f` option:
launchall_nanoaod.sh -e 2017v2 -j sync -g -r 1 -t 4 -f test/datasets/txt/datasets_hh_bbww_sync_2017_RunIIFall17MiniAODv2.txt
The script prompts the following question:
Sure you want to use this config file: $CMSSW_BASE/src/tthAnalysis/NanoAOD/test/cfgs/nano_sync_RunIIFall17MiniAODv2_cfg.py? [y/N]
After pressing y, the config file for this job is generated.
Notice that there's no question about the dataset file anymore, since the user has already provided one.
The location of the Ntuples is managed via sample dictionaries, which come about in multiple stages.
The only parts written by a human are the JSON files:
- `datasets.json` -- specifies all MC samples used in the ttH analysis;
- `datasets_sync.json` -- specifies the MC samples used in the ttH synchronization exercise;
- `datasets_hh_multilepton.json` -- specifies all MC signal samples used in the HH multilepton analysis;
- `datasets_hh_bbww.json` -- specifies all MC signal samples used in the HH bbWW analysis;
- `datasets_hh_bbww_sync.json` -- specifies the MC samples used in the HH bbWW synchronization exercise.
These JSON files follow certain rules:
- samples are grouped into an array of categories;
- each category is split into samples that share the same physics and cross-section information;
- each sample is further split into production campaigns;
- the DAS name (`dbs`), location (`loc`) or file name (`file`) of each individual sample is defined for every production campaign. It may be the case that there are multiple samples covering the same phase space (e.g. samples with parton shower weights or extended samples), so they must be defined in the same production campaign but must be given a different name via the `alt` option.
For instance, `datasets_hh_bbww_sync.json` currently reads:
[
{
"category": "signal",
"comment": "",
"samples": [
{
"name": "signal_ggf_spin0_750_hh_2b2v",
"enabled": 1,
"use_case": "signal extraction",
"process": "HH -> bbWW, WW -> 2l 2v (GGF)",
"datasets": {
"RunIISummer16MiniAODv2": [],
"RunIISummer16MiniAODv3": [],
"RunIIFall17MiniAOD": [],
"RunIIFall17MiniAODv2": [
{
"dbs": "/GluGluToRadionToHHTo2B2VTo2L2Nu_M-750_narrow_13TeV-madgraph_correctedcfg/RunIIFall17MiniAODv2-PU2017_12Apr2018_94X_mc2017_realistic_v14-v1/MINIAODSIM",
"file": "/local/karl/store/mc/RunIIFall17MiniAODv2/GluGluToRadionToHHTo2B2VTo2L2Nu_M-750_narrow_13TeV-madgraph_correctedcfg/MINIAODSIM/PU2017_12Apr2018_94X_mc2017_realistic_v14-v1/10000/F86CC95D-A1B0-E811-B516-0242AC130002.root"
}
],
"RunIIAutumn18MiniAOD": []
},
"xs": {
"value": 0.026422,
"order": "",
"references": [
"https://twiki.cern.ch/twiki/bin/view/LHCPhysics/CERNYellowReportPageBR",
"http://pdglive.lbl.gov/BranchingRatio.action?desig=7&parCode=S043"
],
"comment": "normalized to 1 pb: 2*BR(H->bb)*BR(H->WW)*BR(W->lnu)^2=2*0.5824*0.2137*0.3258^2"
}
}
]
}
]
From this JSON, it's clear that the sample from the `RunIIFall17MiniAODv2` campaign is considered for the HH bbWW synchronization exercise.
The JSON files give rise to so-called dataset tables and sample sum tables that are grouped by production campaigns (which is clear from the file names).
They're generated with the `generate_dataset_table.py` script like so:
generate_dataset_table.py -i test/datasets/json/datasets.json
The output will be stored in `$CMSSW_BASE/src/tthAnalysis/NanoAOD/test/datasets/txt`.
For instance, `generate_dataset_table.py -i test/datasets/json/datasets_hh_bbww_sync.json` creates only one file, `$CMSSW_BASE/src/tthAnalysis/NanoAOD/test/datasets/txt/datasets_hh_bbww_sync_2017_RunIIFall17MiniAODv2.txt`, which has the following content:
# file generated at 2019-05-12 02:26:43 with the following command:
# generate_dataset_table.py -i test/datasets/json/datasets_hh_bbww_sync.json
# HH -> bbWW, WW -> 2l 2v (GGF)
/GluGluToRadionToHHTo2B2VTo2L2Nu_M-750_narrow_13TeV-madgraph_correctedcfg/RunIIFall17MiniAODv2-PU2017_12Apr2018_94X_mc2017_realistic_v14-v1/MINIAODSIM 1 signal signal_ggf_spin0_750_hh_2b2v 0.026422 /local/karl/store/mc/RunIIFall17MiniAODv2/GluGluToRadionToHHTo2B2VTo2L2Nu_M-750_narrow_13TeV-madgraph_correctedcfg/MINIAODSIM/PU2017_12Apr2018_94X_mc2017_realistic_v14-v1/10000/F86CC95D-A1B0-E811-B516-0242AC130002.root # [1][2]; normalized to 1 pb: 2*BR(H->bb)*BR(H->WW)*BR(W->lnu)^2=2*0.5824*0.2137*0.3258^2
# References:
# [1] https://twiki.cern.ch/twiki/bin/view/LHCPhysics/CERNYellowReportPageBR
# [2] http://pdglive.lbl.gov/BranchingRatio.action?desig=7&parCode=S043
As is evident from the first line in this file, each dataset table shows the actual command that was used to generate it. This is useful information because
- the user doesn't have to memorize the commands that were used to generate the dataset tables;
- when there's a mistake in the dataset tables, it's easier to track down the bug in the original JSON file.
The dataset files serve two purposes:
- they are read by the `launchall_nanoaod.sh` script that generates config files for CRAB jobs or sync Ntuple production jobs;
- they are instrumental in building the so-called meta-dictionaries from which the sample dictionaries are generated.
NB! Unless you find new datasets that are missing from the JSON files, you don't need to generate the dataset tables. But if you do, please make sure that you push the newly generated tables to the repository.
The so-called meta-dictionaries contain basic information about the MINIAOD datasets: the sample names, category names, cross sections, how many MINIAOD files the dataset has, how many (unweighted) events the dataset contains, in which CMSSW release the MINIAOD files were produced and what the status of the dataset is according to DBS. All this information can be fetched only if you have opened a grid proxy for a sufficient amount of time.
Examples of meta-dictionaries can be found in `$CMSSW_BASE/src/tthAnalysis/HiggsToTauTau/python/samples/metaDict_2017_sync.py` and in `$CMSSW_BASE/src/hhAnalysis/bbww/python/samples/metaDict_2017_hh_sync.py`.
Each meta-dictionary contains the exact command that was used to generate it.
For instance, `metaDict_2017_hh_sync.py` was generated in `$CMSSW_BASE/src/hhAnalysis/bbww` with
find_samples.py -V -i ../../tthAnalysis/NanoAOD/test/datasets/txt/datasets_hh_bbww_sync_2017_RunIIFall17MiniAODv2.txt -m python/samples/metaDict_2017_hh_sync.py
as is evident from the content of this meta-dictionary:
from collections import OrderedDict as OD
# file generated at 2019-05-02 01:40:57 with the following command:
# find_samples.py -V -i ../../tthAnalysis/NanoAOD/test/datasets/txt/datasets_hh_bbww_sync_2017_RunIIFall17MiniAODv2.txt -m python/samples/metaDict_2017_hh_sync.py
meta_dictionary = OD()
### event sums
sum_events = {
}
meta_dictionary["/GluGluToRadionToHHTo2B2VTo2L2Nu_M-750_narrow_13TeV-madgraph_correctedcfg/RunIIFall17MiniAODv2-PU2017_12Apr2018_94X_mc2017_realistic_v14-v1/MINIAODSIM"] = OD([
("crab_string", ""),
("sample_category", "signal"),
("process_name_specific", "signal_ggf_spin0_750_hh_2b2v"),
("nof_db_events", 200000),
("nof_db_files", 11),
("fsize_db", 11931037531),
("xsection", 0.026422),
("use_it", True),
("genWeight", True),
("comment", "status: VALID; size: 11.93GB; nevents: 200.00k; release: 9_4_7; last modified: 2018-10-06 03:20:04"),
])
# event statistics by sample category:
# signal: 200.00k
Unlike the JSON files and dataset tables, which belong to the tth-nanoAOD repository, the meta-dictionaries are part of a particular analysis. The production campaign is now dropped from the file name and only the year of the production is kept.
NB! There's no need to generate any meta-dictionaries unless something has changed in the dataset tables (which in turn implies changes in the JSON files).
Sample dictionaries define the locations of input NanoAOD Ntuples in an analysis. There are two or three types of sample dictionaries:
- sample dictionaries for Ntuples that haven't been post-processed. Needed before the post-processing begins;
- sample dictionaries for Ntuples that have been post-processed. Needed before we can run actual analysis jobs;
- sample dictionaries for Ntuples that have been post-processed and further skimmed for analyzing them with systematic uncertainties. Needed before we can run analysis jobs with systematic uncertainties.
Similarly to generating dataset tables and meta-dictionaries, the sample dictionaries always contain the actual command that was used to generate the sample dictionary itself.
Here's an example from the sample dictionary that corresponds to the bbWW sync Ntuple in 2017 (`$CMSSW_BASE/src/hhAnalysis/bbww/python/samples/hhAnalyzeSamples_2017_nanoAOD_sync.py`):
from collections import OrderedDict as OD
# file generated at 2019-05-02 01:46:10 with the following command:
# create_dictionary.py -m python/samples/metaDict_2017_hh_sync.py -p /hdfs/local/karl/sync_ntuples/2017/nanoAODproduction/2019May01 -N samples_2017 -E 2017 -o python/samples -g hhAnalyzeSamples_2017_nanoAOD_sync.py -M
samples_2017 = OD()
samples_2017["/GluGluToRadionToHHTo2B2VTo2L2Nu_M-750_narrow_13TeV-madgraph_correctedcfg/RunIIFall17MiniAODv2-PU2017_12Apr2018_94X_mc2017_realistic_v14-v1/MINIAODSIM"] = OD([
("type", "mc"),
("sample_category", "signal"),
("process_name_specific", "signal_ggf_spin0_750_hh_2b2v"),
("nof_files", 1),
("nof_db_files", 11),
("nof_events", {
}),
("nof_tree_events", 52000),
("nof_db_events", 200000),
("fsize_local", 141613135), # 141.61MB, avg file size 141.61MB
("fsize_db", 11931037531), # 11.93GB, avg file size 1.08GB
("use_it", True),
("xsection", 0.026422),
("genWeight", True),
("triggers", ['1e', '1mu', '2e', '2mu', '1e1mu', '3e', '3mu', '2e1mu', '1e2mu', '1e1tau', '1mu1tau', '2tau']),
("has_LHE", True),
("LHE_set", "LHA IDs 306000 - 306102 -> NNPDF31_nnlo_hessian_pdfas PDF set, expecting 103 weights (counted 103 weights)"),
("local_paths",
[
OD([
("path", "/hdfs/local/karl/sync_ntuples/2017/nanoAODproduction/2019May01/signal_ggf_spin0_750_hh_2b2v"),
("selection", "*"),
("blacklist", []),
]),
]
),
("missing_from_superset", [
# not computed
]),
("missing_hlt_paths", [
]),
("hlt_paths", [
# not computed
]),
])
samples_2017["sum_events"] = [
]
NB! The sample dictionaries for Ntuples that haven't been post-processed must be generated every time you run the Ntupelization (i.e. `cmsRun`) job.
After the post-production job a new sample dictionary is needed for the sync Ntuple. It can be generated with the same command as the previous sample dictionary, but the output file name (`-g`) and the location of the Ntuples (`-p`) must be changed accordingly:
from collections import OrderedDict as OD
# file generated at 2019-05-16 20:02:57 with the following command:
# create_dictionary.py -m python/samples/metaDict_2017_hh_sync.py -p /hdfs/local/karl/ttHNtupleProduction/2017/2019May02_woPresel_nonNom_hh_bbww_sync/ntuples -N samples_2017 -E 2017 -o python/samples -g hhAnalyzeSamples_2017_sync.py -M
samples_2017 = OD()
samples_2017["/GluGluToRadionToHHTo2B2VTo2L2Nu_M-750_narrow_13TeV-madgraph_correctedcfg/RunIIFall17MiniAODv2-PU2017_12Apr2018_94X_mc2017_realistic_v14-v1/MINIAODSIM"] = OD([
("type", "mc"),
("sample_category", "signal"),
("process_name_specific", "signal_ggf_spin0_750_hh_2b2v"),
("nof_files", 1),
("nof_db_files", 11),
("nof_events", {
'Count' : [ 52000, ],
'CountWeighted' : [ 51972, 51953, 51984, ],
'CountWeightedNoPU' : [ 51992, ],
'CountFullWeighted' : [ 51972, 51953, 51984, ],
'CountFullWeightedNoPU' : [ 51992, ],
'CountWeightedL1PrefireNom' : [ 50011, 49988, 50026, ],
'CountWeightedL1Prefire' : [ 50011, 49562, 50455, ],
'CountWeightedNoPUL1PrefireNom' : [ 50031, ],
'CountFullWeightedL1PrefireNom' : [ 50011, 49988, 50026, ],
'CountFullWeightedL1Prefire' : [ 50011, 49562, 50455, ],
'CountFullWeightedNoPUL1PrefireNom' : [ 50031, ],
}),
("nof_tree_events", 52000),
("nof_db_events", 200000),
("fsize_local", 203322842), # 203.32MB, avg file size 203.32MB
("fsize_db", 11931037531), # 11.93GB, avg file size 1.08GB
("use_it", True),
("xsection", 0.026422),
("genWeight", True),
("triggers", ['1e', '1mu', '2e', '2mu', '1e1mu', '3e', '3mu', '2e1mu', '1e2mu', '1e1tau', '1mu1tau', '2tau']),
("has_LHE", True),
("LHE_set", "LHA IDs 306000 - 306102 -> NNPDF31_nnlo_hessian_pdfas PDF set, expecting 103 weights (counted 103 weights)"),
("local_paths",
[
OD([
("path", "/hdfs/local/karl/ttHNtupleProduction/2017/2019May02_woPresel_nonNom_hh_bbww_sync/ntuples/signal_ggf_spin0_750_hh_2b2v"),
("selection", "*"),
("blacklist", []),
]),
]
),
("missing_from_superset", [
# not computed
]),
("missing_hlt_paths", [
]),
("hlt_paths", [
# not computed
]),
])
samples_2017["sum_events"] = [
]
NB! A sample dictionary for the post-processed Ntuples must always be generated after the post-processing step, because the scripts that submit the analysis jobs read the Ntuple location from these sample dictionaries.
Once the `cmsRun` job is done, you need to copy the output file `tree.root` to `/hdfs/local/$USER`.
However, the script that generates the sample dictionaries (`create_dictionary.py`) assumes that the Ntuples are stored in a subdirectory that has the same name as the process.
Furthermore, in order to maintain compatibility with the directory structure produced by CRAB jobs, the Ntuple must be placed in another subdirectory called `0000` and renamed to `tree_1.root`.
Good examples of locations of NanoAOD Ntuples that haven't been post-processed are:
/hdfs/local/$USER/sync_ntuples/2017/nanoAODproduction/2019May04/ttHJetToNonbb_M125_amcatnlo/0000/tree_1.root
/hdfs/local/$USER/sync_ntuples/2017/nanoAODproduction/2019May01/signal_ggf_spin0_750_hh_2b2v/0000/tree_1.root
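A minimal sketch of this copy-and-rename step (assuming the local `cmsRun` output is a file called `tree.root` in the current directory; the target path is the HH bbWW example from above and must be adjusted to your own job):

```python
#!/usr/bin/env python
# Minimal sketch: place the local cmsRun output where create_dictionary.py expects it.
# The target directory is the example path from above; adjust it to your own job.
import os
import shutil

user = os.environ['USER']
target_dir = os.path.join(
    '/hdfs/local', user, 'sync_ntuples/2017/nanoAODproduction/2019May01',
    'signal_ggf_spin0_750_hh_2b2v',  # subdirectory named after the process
    '0000',                          # CRAB-like subdirectory
)

# create the CRAB-like directory structure if it doesn't exist yet
if not os.path.isdir(target_dir):
    os.makedirs(target_dir)

# copy the cmsRun output and rename it to tree_1.root
shutil.copyfile('tree.root', os.path.join(target_dir, 'tree_1.root'))
```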
You can now proceed with post-processing the Ntuple by executing
./test/tthProdNtuple.py -e 2017 -v 2018Nov07 -m sync -O
NB! The Ntuples used in the regular analysis are not produced with the `-O` flag.
With this flag, the jets are not smeared and hence MET is not recomputed with the smeared jets, i.e. we use "non-nominal" jets.
We use this option only in the synchronization because the other groups do not smear their jets; if we were to enable this feature during the synchronization exercise, it would be difficult to disentangle this effect from other (potentially more serious) problems that may arise in the synchronization.
However, in some cases it would be nice to see how large an impact smearing the jets would have.
In that case it makes sense to post-process the sync Ntuple again without the `-O` flag, put the output Ntuple into a different sample dictionary (`python/samples/tthAnalyzeSamples_2017_sync.py`) and perform the synchronization exercise with ourselves.
NBB! In case you want to run the sync Ntuple post-production for another analysis, you need to specify this accordingly with the `-m/--mode` option.
For instance, to run the Ntuple post-production for the bbWW sync, you need to use the `hh_bbww_sync` mode like so:
./test/tthProdNtuple.py -e 2017 -v 2019May02 -m hh_bbww_sync -O
Note that in case it's not possible to run the job on the cluster, you have two options:
- either run the makefile in "local" mode by adding `-R makefile` at the end of the above command; or
- run the Ntuple post-production interactively by refusing to submit the jobs to SLURM (which can be done automatically with the `-E` option) and run the `produceNtuple.sh` command directly on the generated config file.
However, if it's not possible to create any files on `/hdfs`, you have to set `outputDir` to `configDir` in `./test/tthProdNtuple.py`.
The same tips apply to analysis jobs as well.
Similarly to the previous step, a sample dictionary must be generated -- this time for the post-processed NanoAOD Ntuple.
However, there's no need to copy the output file explicitly, unless you ran the post-production job interactively (and even in that case you don't have to create the directory structure yourself -- it's already been created by the `tthProdNtuple.py` script).
Continuing the above examples, the post-processed Ntuple should be placed at:
/hdfs/local/$USER/ttHNtupleProduction/2017/2018Nov07_woPresel_nonNom_sync/ntuples/ttHJetToNonbb_M125_amcatnlo/0000/tree_1.root
in ttH analysis and
/hdfs/local/$USER/ttHNtupleProduction/2017/2019May02_woPresel_nonNom_hh_bbww_sync/ntuples/signal_ggf_spin0_750_hh_2b2v/0000/tree_1.root
in HH bbWW analysis.
The respective sample dictionaries can be created with:
# in $CMSSW_BASE/src/tthAnalysis/HiggsToTauTau
create_dictionary.py \
-m python/samples/metaDict_2017_sync.py \
-p /hdfs/local/$USER/ttHNtupleProduction/2017/2018Nov07_woPresel_nonNom_sync/ntuples \
-N samples_2017 \
-E 2017 \
-o python/samples \
-g tthAnalyzeSamples_2017_sync.py \
-M
# in $CMSSW_BASE/src/hhAnalysis/bbww
create_dictionary.py \
-m python/samples/metaDict_2017_hh_sync.py \
-p /hdfs/local/$USER/ttHNtupleProduction/2017/2019May02_woPresel_nonNom_hh_bbww_sync/ntuples \
-N samples_2017 \
-E 2017 \
-o python/samples \
-g hhAnalyzeSamples_2017_sync.py \
-M
To get the final sync Ntuple that can be compared to that provided by other groups, you need to run the following command:
./test/tthSyncNtuple.py -e 2017 -v 2018Nov07 -O -o sync_Tallinn_v30.root
NB!! Notice again the `-O` flag!
It has the same meaning as before: the jets in the input Ntuple are not smeared and MET is not recomputed.
This script (`tthSyncNtuple.py`) is actually a wrapper for running multiple analyses all at once, where each analysis job is told to produce a sync Ntuple from a single signal Ntuple.
The workflow then `hadd`s the files together into the file specified by the `-o` option.
The output file can be found in `/hdfs/local/$USER/ttHAnalysis/2017/2018Nov07/sync_ntuple/sync_Tallinn_v30.root`.
This file contains many trees, including `syncTree`, which is used for the object-level synchronization; the other, event-level trees are named following the convention `syncTree_$CHANNEL_$REGION`.
The tree structure is the same regardless of the tree.
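For a quick overview of which channels and regions ended up in the merged file, the trees can be listed with a few lines of pyROOT (a standalone sketch, not a framework tool; the file name is taken from the example above):

```python
#!/usr/bin/env python
# List the sync trees and their entry counts in the merged sync Ntuple.
import ROOT

f = ROOT.TFile.Open('sync_Tallinn_v30.root', 'read')
for key in f.GetListOfKeys():
    obj = key.ReadObj()
    if obj.InheritsFrom('TTree'):
        print('%s: %d entries' % (obj.GetName(), obj.GetEntries()))
f.Close()
```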
As mentioned earlier, the workflow also supports producing sync Ntuples for various systematic uncertainties.
Although we are currently (as of writing) the only group that has implemented this feature, and at first glance it would seem like an unnecessary use case, it's actually a useful way of quantifying the effects of various shape uncertainties at the sync Ntuple level.
Synchronizing with ourselves on central vs shape uncertainties serves as a preliminary sanity check, because running whole analyses with various shape uncertainties is very expensive in terms of computing and human time, and we want to catch mistakes as early as possible.
Note that in the sync exercise of the HH bbWW analysis the sync Ntuple is generated with `./test/hhSyncNtuple.py` in `$CMSSW_BASE/src/hhAnalysis/bbww`.
There are three to four major tools available in our FW that make it relatively easy to find discrepancies in the sync Ntuples:
- `compare_sync_objects.py`, which compares object-level sync Ntuples by counting the number of dR-matched objects;
- `compareRootRLENumbers.py`, which compares event-level sync Ntuples by performing various set operations based on the run, lumi and event numbers in each channel and region;
- `compareSyncNtuples.C`, which is a macro for producing plots of variables taken from the object-level sync Ntuple;
- `tthSyncNtuple.py`, to rerun the sync Ntuple production on the events that other groups select but we reject.
There are two main tools to tackle the object-level synchronization: `compareSyncNtuples.C` and `compare_sync_objects.py`.
The former is used to produce a bunch of plots that compare all branches in each tree between any two groups.
It's a very useful tool that shows how well each and every object- and event-level variable agrees in all channels and regions.
However, the downside is that one needs to sift through a huge number of plots by hand.
For instance, if there are 5 groups that have each provided 20 sync trees containing 150 variables, there will be C(5, 2) * 20 * 150 = 30'000 plots.
Although, to be fair, we're only interested in how well we are in sync with the other groups (C(5, 2) -> C(5, 1) = 5) and in the object-level sync we should only look at the inclusive tree (called `syncTree`: 20 -> 1), which leaves only 5 * 150 = 750 plots.
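As a quick check of this bookkeeping (the group, tree and variable counts are just the illustrative figures used above):

```python
# Quick check of the plot-count bookkeeping in the example above.
from math import factorial

def n_choose_k(n, k):
    return factorial(n) // (factorial(k) * factorial(n - k))

n_groups, n_trees, n_vars = 5, 20, 150
print(n_choose_k(n_groups, 2) * n_trees * n_vars)  # all pairwise comparisons: 30000
print(n_groups * 1 * n_vars)                       # only vs the other groups, inclusive tree only: 750
```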
When looking at the sync plots, you should pay attention to the following:
- are the variables cut off differently between the two groups?
  - if one or the other group cuts on the variable we sync on, it becomes apparent in the sync plot
- is the plot that does a relative bin-by-bin comparison of the event counts at the bottom shifted up or down a bit?
  - this indicates that the two groups select a different number of objects. If the normalized distributions match in the main plot, look more carefully at either the first or the last bin: there must be some kind of excess of events.
- do the distributions match, but not quite exactly?
  - if the variables are plotted for objects that depend on the cleaning (electrons, taus, jets), then you need to take this effect into account when looking at the plots
  - if only the positive range is filled, then it's likely that the absolute value is taken before filling the sync tree
- if the distributions look nothing alike, then it's a major pitfall that needs to be sorted out
However, these plots tell only half-truths: the two groups may at times select completely different objects while the overall distributions of the object-level variables still remain the same, either because the variables naturally follow the same trends or because the number of such cases is small relative to the sample size.
This is where `compare_sync_objects.py` comes into play.
The program has only two prerequisites -- `pyROOT` and `matplotlib` -- so in principle it can be run on any platform.
According to its help message, the script has three functions:
$ compare_sync_objects.py -h
usage: compare_sync_objects.py [-h] {count,inspect,plot} ...
optional arguments:
-h, --help show this help message and exit
commands:
{count,inspect,plot}
Each of these functions has its own help message, e.g.
$ compare_sync_objects.py count -h
usage: compare_sync_objects.py count [-h] -i path [path ...] [-t name] [-o]
optional arguments:
-h, --help show this help message and exit
-i path [path ...], --input path [path ...] Input files (default: None)
-t name, --tree name TTree name (default: syncTree)
-o, --count-objects Count the number of preselected objects (default: False)
-a analysis, --analysis analysis Type of analysis the sync Ntuple was produced in (default: tth)
Unfortunately, the plot function is a bit buggy, so you should avoid it.
The counting function is useful when updating the general sync table that is also filled by the other groups:
$ compare_sync_objects.py count -i /some/path/to/sync_Tallinn_v30.root
/some/path/to/sync_Tallinn_v30.root:
syncTree: 56465
syncTree_0l2tau_Fake: 38
syncTree_0l2tau_SR: 37
syncTree_0l2tau_mcClosure_t: 75
...
syncTree_ttZctrl_SR: 43
syncTree_ttZctrl_mcClosure_e: 51
syncTree_ttZctrl_mcClosure_m: 54
When adding `-o` to the previous command, you'll also get the object counts at the very end:
n_presel_mu: 18464
n_presel_ele: 17660
n_presel_tau: 14458
n_presel_jet: 56440
In order to compare a sync Ntuple produced in the HH bbWW analysis, you have to add `-a hh_bbww` to the above command.
This is needed because the nomenclature of the branch names in the sync Ntuples is a bit different between the two analyses.
The second function, `inspect`, takes a so-called reference Ntuple (`-i`) and a test Ntuple (`-j`) as inputs and compares the objects in the sync tree specified by `-t` (defaults to `syncTree`, aka the object-level sync tree) by dR-matching the objects with the cone size given by `-d` (defaults to `0.01`).
It's also possible to limit the number of events in this test, either by setting the maximum number of events with `-n` or by giving a list of RLE numbers (or a file containing the RLE numbers) as the argument to `-r`.
The flag `-v` increases the verbosity level as usual.
Finally, the option `-a` tells in which analysis the sync Ntuple was produced (defaults to `tth`).
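For illustration, the dR-matching criterion boils down to something like the following sketch (the objects are assumed to carry `eta` and `phi` attributes; this is not the actual implementation in `compare_sync_objects.py`):

```python
# Sketch of the dR-matching criterion: a reference object and a test object are
# considered the same object if they lie within the given cone (0.01 by default).
import math

def delta_r(eta1, phi1, eta2, phi2):
    dphi = math.atan2(math.sin(phi1 - phi2), math.cos(phi1 - phi2))  # wrap into (-pi, pi]
    return math.hypot(eta1 - eta2, dphi)

def is_dr_matched(ref_obj, test_obj, cone_size=0.01):
    # ref_obj and test_obj are assumed to have 'eta' and 'phi' attributes
    return delta_r(ref_obj.eta, ref_obj.phi, test_obj.eta, test_obj.phi) < cone_size
```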
Here's the full help message:
$ compare_sync_objects.py inspect -h
usage: compare_sync_objects.py inspect [-h] -i path -j path [-t name]
[-r path/list [path/list ...]]
[-n number] [-d cone size] [-v]
[-a analysis]
optional arguments:
-h, --help show this help message
and exit
-i path, --input-ref path Input reference file (default: None)
-j path, --input-test path Input test file (default: None)
-t name, --tree name TTree name (default: syncTree)
-r path/list [path/list ...], --rle path/list [path/list ...]
Path to the list of run:lumi:event numbers, or explicit space-separated list of those (default: [])
-n number, --max-events number Maximum number of events to be considered (default: -1, i.e. all dR-matched objects) (default: -1)
-d cone size, --dr cone size Maximum cone size used in object dR-matching (default: 0.01)
-v, --verbose Enable verbose output (default: False)
-a analysis, --analysis analysis Type of analysis the sync Ntuple was produced in (default: tth)
Example:
$ compare_sync_objects.py inspect -i sync_Tallinn_v30.root -j other_object_ntuple.root
Total number of events considered: 56465
ispresel mu1: 18464 ( 7) in ref, 18464 ( 7) in test, 18457 dR-matched
isfakeablesel mu1: 13847 ( 4) in ref, 13848 ( 5) in test, 13843 dR-matched
ismvasel mu1: 11790 ( 4) in ref, 11791 ( 5) in test, 11786 dR-matched
ispresel mu2: 2890 ( 6) in ref, 2889 ( 5) in test, 2884 dR-matched
isfakeablesel mu2: 1982 ( 3) in ref, 1981 ( 2) in test, 1979 dR-matched
ismvasel mu2: 1522 ( 4) in ref, 1519 ( 1) in test, 1518 dR-matched
ispresel ele1: 17660 ( 27) in ref, 17765 ( 132) in test, 17633 dR-matched
isfakeablesel ele1: 10945 ( 18) in ref, 11000 ( 73) in test, 10927 dR-matched
ismvasel ele1: 9345 ( 17) in ref, 9395 ( 67) in test, 9328 dR-matched
ispresel ele2: 2721 ( 8) in ref, 2750 ( 37) in test, 2713 dR-matched
isfakeablesel ele2: 1562 ( 3) in ref, 1580 ( 21) in test, 1559 dR-matched
ismvasel ele2: 1192 ( 2) in ref, 1210 ( 20) in test, 1190 dR-matched
ispresel tau1: 14458 ( 2) in ref, 14458 ( 2) in test, 14456 dR-matched
ispresel tau2: 2262 ( 1) in ref, 2261 ( 0) in test, 2261 dR-matched
ispresel jet1: 56440 ( 1509) in ref, 56454 ( 1523) in test, 54931 dR-matched
ispresel jet2: 56170 ( 3593) in ref, 56307 ( 3730) in test, 52577 dR-matched
ispresel jet3: 54997 ( 5664) in ref, 55604 ( 6271) in test, 49333 dR-matched
ispresel jet4: 51532 ( 6915) in ref, 53125 ( 8508) in test, 44617 dR-matched
Here, the Tallinn Ntuple is used as the reference; the other group's object Ntuple is the test. The meaning of the columns:
- the 1st column says which level of object selection the object has passed:
  - `ispresel` means loose
  - `isfakeablesel` means fakeable
  - `ismvasel` means tight
- the 2nd column tells which object we are dealing with:
  - `mu`, `ele`, `tau` and `jet` stand for muon, electron, hadronic tau and jet, respectively
  - the number tells the order of the object (1 for leading, 2 for subleading etc)
- the 3rd (4th) column tells how many objects the reference group selected (how many objects the reference group selected that aren't dR-matched with objects of the same class from the test Ntuple)
- the 5th (6th) column tells how many objects the test group selected (how many objects the test group selected that aren't dR-matched with objects of the same class from the reference Ntuple)
- the last column tells how many objects selected by both the reference and the test group were actually dR-matched.
From the above example we can tell the following:
- both groups select roughly the same objects for the leading and subleading muons, electrons and taus
  - some discrepancy (< 1%) is expected and acceptable because the frameworks may use different precision for the variables
- the number of unmatched electrons and taus looks relatively high, which may arise from a different cone size used in the cleaning of electrons and taus (but it can also mean nothing)
- there are serious discrepancies in the way the jets are selected, though. This high level of disagreement is likely due to a different strategy for cleaning the jets, due to JECs, due to smearing of the jets, or simply because the jets were not ordered by pT. The sync plots are probably very noisy/fuzzy in all jet variables.
By modifying the main loop of `compare_sync_objects.py` we can gain some insight into the matter:
# Modify only between these long lines
##################################################################################################
if not evt.jet1.is_matched or not evt.jet2.is_matched or not evt.jet3.is_matched or not evt.jet4.is_matched:
print('RLE: %s' % rle)
evt.jet1.printVars(['pt', 'E', 'eta', 'phi'])
evt.jet2.printVars(['pt', 'E', 'eta', 'phi'])
evt.jet3.printVars(['pt', 'E', 'eta', 'phi'])
evt.jet4.printVars(['pt', 'E', 'eta', 'phi'])
evt.tau1.printVars(['pt', 'eta', 'phi'])
evt.tau2.printVars(['pt', 'eta', 'phi'])
##################################################################################################
Sure enough, after running the inspection on the first 20 events we see the following:
RLE: 1:8009:13579612
jet1 pt 149.500000 vs 149.528900 => -0.028900
jet1 E 452.318207 vs 452.394928 => -0.076721
jet1 eta 1.771240 vs 1.771216 => 0.000025
jet1 phi 1.146484 vs 1.146584 => -0.000099
jet2 pt 58.656250 vs 67.665901 => -9.009651
jet2 E 209.465302 vs 68.779884 => 140.685417
jet2 eta 1.945557 vs -0.174771 => 2.120327
jet2 phi 1.736572 vs 2.937042 => -1.200469
jet3 pt 53.437500 vs 58.651024 => -5.213524
jet3 E 55.312141 vs 209.422897 => -154.110756
jet3 eta 0.110275 vs 1.945438 => -1.835163
jet3 phi -1.876709 vs 1.736583 => -3.613292
jet4 pt - vs 53.436680 => -
jet4 E - vs 55.311844 => -
jet4 eta - vs 0.110281 => -
jet4 phi - vs -1.876784 => -
tau1 pt 60.447723 vs 60.447723 => 0.000000
tau1 eta -0.177399 vs -0.177404 => 0.000005
tau1 phi 2.938477 vs 2.938482 => -0.000005
tau2 pt 21.484795 vs 21.484795 => 0.000000
tau2 eta 0.717651 vs 0.717695 => -0.000043
tau2 phi -1.442871 vs -1.442788 => -0.000083
So, it looks like the other group hasn't cleaned their jets wrt the hadronic taus. Further investigation indicates that the group hasn't cleaned their jets at all.
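For illustration, cleaning jets against selected hadronic taus amounts to something like the following sketch (not the framework implementation; the 0.4 cone size is a typical choice and an assumption here):

```python
# Sketch of jet cleaning w.r.t. selected hadronic taus: drop every jet that lies
# inside a cone around any selected tau. The objects are assumed to carry
# 'eta' and 'phi' attributes; the 0.4 cone size is an assumed, typical value.
import math

def delta_r(eta1, phi1, eta2, phi2):
    dphi = math.atan2(math.sin(phi1 - phi2), math.cos(phi1 - phi2))  # wrap into (-pi, pi]
    return math.hypot(eta1 - eta2, dphi)

def clean_jets(jets, taus, cone_size=0.4):
    # keep only the jets that are not dR-matched to any selected tau
    return [
        jet for jet in jets
        if all(delta_r(jet.eta, jet.phi, tau.eta, tau.phi) > cone_size for tau in taus)
    ]
```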
The synchronization at the event level is based on the run, lumi and event (RLE) numbers in each channel and analysis region. If a group provides only the RLE numbers and nothing else, you can still use this information to build a minimal sync Ntuple for the event selection; example:
#!/usr/bin/env python
import ROOT
import array
input_map = {
'syncTree_2lSS_SR' : '2lss_sr.txt',
'syncTree_3l_SR' : '3l_sr.txt',
'syncTree_ttWctrl_SR' : 'ttW_CR.txt',
'syncTree_ttZctrl_SR' : 'ttZ_CR.txt',
'syncTree_WZctrl_SR' : 'wz_CR.txt',
}
fn = 'sync_tree.root'
f = ROOT.TFile.Open(fn, 'recreate')
for tree in input_map:
tree_obj = ROOT.TTree(tree, tree)
run = array.array('I', [0])
lumi = array.array('I', [0])
evt = array.array('L', [0])
tree_obj.Branch("run", run, "run/i")
tree_obj.Branch("ls", lumi, "ls/i")
tree_obj.Branch("nEvent", evt, "nEvent/l")
input_rles = input_map[tree]
with open(input_rles, 'r') as input_f:
for line in input_f:
rles = list(map(int, line.rstrip('\n').split(':')))
if len(rles) != 3:
continue
run[0] = rles[0]
lumi[0] = rles[1]
evt[0] = rles[2]
tree_obj.Fill()
tree_obj.Write()
f.Close()
With the following command:
compareRootRLENumbers.py \
-i group_1.root group_3.root group_2.root \
-n Group1 Group3 Group2 \
-T -v -f \
-o ~/path/to/results \
-t syncTree_2lSS_SR 2lSS_Fake syncTree_WZctrl_SR syncTree_ttWctrl_SR syncTree_ttZctrl_SR
you'll generate various tables which are helpful for determining any discrepancies between two groups. The meaning of the flags and options is the following:
- `-i` takes a list of full paths to the (event-level) sync Ntuples;
- `-n` takes a complementary list of group names separated by spaces (the order in which you pass these labels must match the order in which you specify the input ROOT files);
- `-T` generates various cross-tables;
- `-v` increases the on-screen verbosity level;
- `-o` tells where to store the tables;
- `-f` creates the output directory specified by `-o` if it doesn't exist;
- `-t` lists the tree names on which you want to synchronize; if this option is not used, all trees are taken into account.
For more options, see `compareRootRLENumbers.py -h`.
Note that there is no upper limit on how many sync Ntuples you want to perform the synchronization on; the minimum number of input files is obviously two.
The tables are generated in two or three file formats:
- `.txt` files are human-readable tables;
- `.csv` files are tables in CSV (comma-separated values) format;
- `.xls` files are Excel tables, which are produced only if the `unoconv` program is installed.
It should be noted that this script is self-sufficient and requires only the following prerequisites:
- `pyROOT`;
- the `prettytable` module;
- the `unoconv` program (optional).
The program works on any platform as long as those two (or three) requirements are satisfied. The first two requirements are automatically satisfied in any (recent) CMSSW release.
The command produces 5 types of files as a result:
1. `cross_Group1.*`, `cross_Group2.*`, `cross_Group3.*` etc. show how many events in one channel & region are shared by another channel & region of the same group. You don't want to see any overlaps between two different SRs, or between the fake/flip AR and the SR of the same channel. So, this table serves as a cross-check of the mutual exclusivity of the SRs, and of the SRs and fake/flip ARs of the same channel.

Example 1.1: `cross_Group1.txt`

+------------+---------+-----------+-----------+------------+------------+
|   Group1   | 2lSS_SR | 2lSS_Fake | WZctrl_SR | ttWctrl_SR | ttZctrl_SR |
+------------+---------+-----------+-----------+------------+------------+
|  2lSS_SR   |   463   |     0     |     0     |     0      |     0      |
| 2lSS_Fake  |    0    |    157    |     0     |     0      |     0      |
| WZctrl_SR  |    0    |     0     |     9     |     0      |     2      |
| ttWctrl_SR |    0    |     0     |     0     |     79     |     0      |
| ttZctrl_SR |    0    |     0     |     2     |     0      |     43     |
+------------+---------+-----------+-----------+------------+------------+
According to this table, `Group1` has 9 events in the SR of the WZ CR, and 2 of those events also enter the SR of the ttZ CR. This is a bad sign because the SRs of two different channels should not overlap.

Example 1.2: `cross_Group2.txt`

+------------+---------+-----------+-----------+------------+------------+
|   Group2   | 2lSS_SR | 2lSS_Fake | WZctrl_SR | ttWctrl_SR | ttZctrl_SR |
+------------+---------+-----------+-----------+------------+------------+
|  2lSS_SR   |   453   |    10     |     0     |     0      |     0      |
| 2lSS_Fake  |   10    |    167    |     1     |     0      |     0      |
| WZctrl_SR  |    0    |     1     |    14     |     0      |     0      |
| ttWctrl_SR |    0    |     0     |     0     |     79     |     0      |
| ttZctrl_SR |    0    |     0     |     0     |     0      |     43     |
+------------+---------+-----------+-----------+------------+------------+
Here we see that `Group2` has selected 167 events in the fake AR of the 2lSS channel. However, 10 events from this region also enter the SR of the same channel, which is wrong. There is 1 event selected in both the fake AR of the 2lSS channel and the SR of the WZ CR, which is actually fine. Some minor overlaps between the SR of one channel and the AR of another channel are expected and acceptable.
2. The files `cross_Group1_Group2.*`, `cross_Group2_Group3.*`, `cross_Group1_Group3.*` show the overlaps of all channels and regions common between two groups. The tables don't tell who has implemented the event selection incorrectly, but they do tell that some kind of event migration is going on between the two groups.

Example 2.1:

+-----------------+---------+-----------+-----------+------------+------------+------+-------+
| Group1 v Group2 | 2lSS_SR | 2lSS_Fake | WZctrl_SR | ttWctrl_SR | ttZctrl_SR | none | total |
+-----------------+---------+-----------+-----------+------------+------------+------+-------+
|     2lSS_SR     |   433   |     2     |     0     |     0      |     0      |  18  |  453  |
|    2lSS_Fake    |    0    |    142    |     0     |     0      |     0      |  25  |  167  |
|    WZctrl_SR    |    0    |     0     |     5     |     0      |     0      |  2   |   7   |
|    ttWctrl_SR   |   11    |     1     |     0     |     70     |     0      |  5   |  87   |
|    ttZctrl_SR   |    2    |     0     |     0     |     0      |     0      |  5   |   7   |
|       none      |   17    |    12     |     4     |     9      |     43     |  x   |  76   |
|      total      |   463   |    157    |     9     |     79     |     43     |  62  |   x   |
+-----------------+---------+-----------+-----------+------------+------------+------+-------+
Here the rows (columns) correspond to the channels & regions as seen by `Group2` (`Group1`):
- the same 433 events are selected in the 2lSS SR by both Group1 and Group2
- 11 events selected in the 2lSS SR by `Group1` are actually selected in the ttW SR by `Group2`
- 2 events selected in the 2lSS SR by `Group2` are actually selected in the 2lSS fake AR by `Group1`
- etc etc
- `none` means that the events selected in one category by one group are not selected at all by the other group
3. For each cell in tables 1. there are text files containing the RLE numbers of these events:
- `cross_rle_Group1_syncTree_2lSS_SR_syncTree_2lSS_SR.txt`
- `cross_rle_Group1_syncTree_2lSS_SR_syncTree_2lSS_Fake.txt`
- `cross_rle_Group1_syncTree_2lSS_SR_syncTree_WZctrl_SR.txt`
- ...
- `cross_rle_Group3_syncTree_ttZctrl_SR_syncTree_ttZctrl_SR.txt`
4. `table_syncTree_$CHANNEL_$REGION.txt` shows the same information as 2., but it compares multiple groups at once. This is extremely useful when multiple groups have covered the same channel and SR -- it's easier to see whom to "blame".

Example 4.1: `table_syncTree_2lSS_SR.txt`
+--------------------------+------+--------+--------+--------+-----------------+-----------------+-----------------+-------+
| syncTree_2lSS_SR | none | Group2 | Group3 | Group1 | Group2 & Group3 | Group1 & Group2 | Group1 & Group3 | total |
+--------------------------+------+--------+--------+--------+-----------------+-----------------+-----------------+-------+
| Group2 | | | 19 | 20 | | | 19 | 453 |
+--------------------------+------+--------+--------+--------+-----------------+-----------------+-----------------+-------+
| Group3 | | 24 | | 1 | | 0 | | 458 |
+--------------------------+------+--------+--------+--------+-----------------+-----------------+-----------------+-------+
| Group1 | | 30 | 6 | | 6 | | | 463 |
+--------------------------+------+--------+--------+--------+-----------------+-----------------+-----------------+-------+
| Group2 & Group3 | 434 | | | 1 | | | | |
+--------------------------+------+--------+--------+--------+-----------------+-----------------+-----------------+-------+
| Group1 & Group2 | 433 | | 0 | | | | | |
+--------------------------+------+--------+--------+--------+-----------------+-----------------+-----------------+-------+
| Group1 & Group3 | 457 | 24 | | | | | | |
+--------------------------+------+--------+--------+--------+-----------------+-----------------+-----------------+-------+
| Group1 & Group2 & Group3 | 433 | | | | | | | |
+--------------------------+------+--------+--------+--------+-----------------+-----------------+-----------------+-------+
Explanation:
- the same 433 events are selected by `Group1`, `Group2` and `Group3`
- `Group1` and `Group3` select the same 457 events, but 24 of those events are rejected by `Group2`
- `Group2` and `Group3` select the same 434 events, but 1 of those events is rejected by `Group1`
- `Group1` selects
  - 30 events that are rejected by `Group2`
  - 6 events that are rejected by `Group2` and `Group3`
- `Group2` selects
  - 20 events that are rejected by `Group1`
  - 19 events that are rejected by `Group1` and `Group3`
- `Group3` selects
  - 1 event that is rejected by `Group1`
  - 24 events that are rejected by `Group2`
- conclusion: `Group1` and `Group3` agree the most and `Group2` has some catching up to do. Although in practice there have been instances where the group that disagreed the most was actually right...
5. For each filled cell in tables 4. there are files containing the corresponding RLE numbers:
syncTree_2lSS_SR_Group1_Group2_Group3_select.txt
syncTree_2lSS_SR_Group1_Group2_select_Group3_reject.txt
syncTree_2lSS_SR_Group1_Group2_select.txt
syncTree_2lSS_SR_Group1_Group3_select_Group2_reject.txt
syncTree_2lSS_SR_Group1_Group3_select.txt
syncTree_2lSS_SR_Group1_select_Group2_Group3_reject.txt
syncTree_2lSS_SR_Group1_select_Group2_reject.txt
syncTree_2lSS_SR_Group1_select_Group3_reject.txt
syncTree_2lSS_SR_Group2_Group3_select_Group1_reject.txt
syncTree_2lSS_SR_Group2_Group3_select.txt
syncTree_2lSS_SR_Group2_select_Group1_Group3_reject.txt
syncTree_2lSS_SR_Group2_select_Group1_reject.txt
syncTree_2lSS_SR_Group2_select_Group3_reject.txt
syncTree_2lSS_SR_Group3_select_Group1_Group2_reject.txt
syncTree_2lSS_SR_Group3_select_Group1_reject.txt
syncTree_2lSS_SR_Group3_select_Group2_reject.txt
What can be done with all this information:
- fix our code if we see an overlap between the same SRs or between the SR and fake/flip AR of the same channel
- send the RLE numbers (`syncTree_$CHANNEL_$REGION_Tallinn_select_*_reject.txt`) that the other group rejects but we select in the same channel & region
- figure out why we reject the events that the other group selects
The last point can be executed in an automatic way:
./test/tthSyncNtuple.py \
-e 2017 -v 2018Nov07_debug -O -o sync_Tallinn_v30.root \
-S "~/path/to/results/%s_Group1_Group2_select_Tallinn_reject.txt" \
-D -c 2lss ttWctrl ttZctrl WZctrl
The new options are:
- `-S` basically tells each analysis job where the RLE numbers are (`%s` is a placeholder for the tree name)
- `-D` enables debugging messages in each analysis job
- `-c` lists the channels (to save time on waiting)
Note that this command won't produce any sync Ntuples because the jobs are run on the events that we reject but the other groups (`Group1` and `Group2` in this example) select.
The important output of this command is actually the log files, which include the cutflow table and detailed information about each event (only if the `-D` flag is supplied).