Filtering ESDs on the Grid
Using the acmd.py command, one can filter ESDs given a list of events. Here is a prescription for doing that on the Grid. These commands have been tested and validated with release 15.6.9.13 on an lxplus node.
- Create a file called eventlist.txt containing the events you want to select. It should have two columns: the first is the run number and the second is the event number. This file must be in the local directory from which you will issue the prun command.
- Set up your release and grid environment. I used release 15.6.9.13.
- Issue the following command in a directory containing the file eventlist.txt:
%>prun --exec "acmd.py filter-files -o filter.ESD.pool.root -s eventlist.txt \`echo %IN | sed 's/,/ /g'\`" --outputs filter.ESD.pool.root --athenaTag=15.6.9.13,AtlasProduction --nFilesPerJob 12 --dbRelease LATEST --inDS ESDDATASET --outDS OUTPUTDATASETNAME
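A local dry run of this step can catch mistakes before anything is submitted. The sketch below assumes eventlist.txt uses whitespace-separated columns. It writes an example eventlist.txt (run 156682 is the run quoted on this page; the event numbers are hypothetical placeholders) and checks that every line has exactly two integer fields. It then shows what the escaped `echo %IN | sed 's/,/ /g'` part expands to on the worker node. prun replaces %IN with a comma-separated list of the subjob's input files, and the sed turns the commas into spaces so acmd.py receives each file as a separate argument. The backticks are escaped so that your local shell does not run the substitution when you type the prun command. The input file names are made up, and the acmd.py command is only printed, not executed.

```shell
#!/usr/bin/env bash
# Dry run of the filtering step; nothing is submitted to the Grid.

# 1. Example eventlist.txt, one "run event" pair per line. The event
#    numbers are hypothetical placeholders.
cat > eventlist.txt <<'EOF'
156682 1001
156682 1002
156682 2047
EOF

# 2. Check (assuming whitespace-separated columns) that every line has
#    exactly two fields and that both are non-negative integers.
if awk 'NF != 2 || $1 !~ /^[0-9]+$/ || $2 !~ /^[0-9]+$/ { bad = 1 }
        END { exit (bad ? 1 : 0) }' eventlist.txt; then
    echo "eventlist.txt OK: $(wc -l < eventlist.txt) events"
else
    echo "eventlist.txt contains malformed lines" >&2
fi

# 3. prun replaces %IN with a comma-separated list of the subjob's input
#    files; these file names are hypothetical.
IN="ESD.000001.pool.root.1,ESD.000002.pool.root.1,ESD.000003.pool.root.1"

# Same substitution as the escaped backticks in the prun command:
# commas become spaces, so each file becomes a separate argument.
FILES=$(echo "$IN" | sed 's/,/ /g')

# Print (do not run) the command line the subjob would execute.
# $FILES is deliberately unquoted so it splits into separate words.
echo acmd.py filter-files -o filter.ESD.pool.root -s eventlist.txt $FILES
```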
In the prun command above, replace ESDDATASET with the dataset you wish to run on and OUTPUTDATASETNAME with the name of your output dataset. Everything else should stay the same.
- When the prun jobs have completed, download the results with dq2-get:
dq2-get OUTPUTDATASETNAME
- cd into the OUTPUTDATASETNAME directory created by dq2-get.
- Merge the output ESD files with the following command:
Merging_trf.py --omitvalidation=ALL inputESDFile=`ls user*.root* | tr -d '\n' | sed 's/.rootu/.root,u/g'` --ignoreerrors=ALL autoConfiguration=everything outputESDFile=Filter.merge.ESD.pool.root
This final command produces a single output file from all the subjob ESDs and ignores ESDs with zero entries. Its run time depends on the number of ESDs: for run 156682, with 66 subjobs, it took ~10 min to run interactively on an lxplus node. The whole procedure took ~40 min from start to finish.
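The inputESDFile argument of the merge step needs all downloaded files as one comma-separated list. The tr/sed pipeline builds it by deleting the newlines from the ls output and then putting a comma back wherever ".root" runs straight into the "u" of the next user... file name. It therefore relies on every file name ending in ".root". The sketch below reproduces that pipeline in a scratch directory with hypothetical file names and compares it with `paste -sd, -`, which joins one name per line with commas regardless of how the names end. The paste variant is a suggested alternative and was not part of the setup tested with release 15.6.9.13.

```shell
#!/usr/bin/env bash
# Compare two ways of building the comma-separated inputESDFile list.
# The file names are hypothetical stand-ins for the dq2-get output.
set -e
workdir=$(mktemp -d)
cd "$workdir"
touch user.test._00001.filter.ESD.pool.root \
      user.test._00002.filter.ESD.pool.root \
      user.test._00003.filter.ESD.pool.root

# Original approach: join all names into one string, then re-insert a
# comma between ".root" and the "u" that starts the next "user..." name.
list_sed=$(ls user*.root* | tr -d '\n' | sed 's/.rootu/.root,u/g')

# Alternative: paste -s joins all input lines, using "," as delimiter.
list_paste=$(ls user*.root* | paste -sd, -)

echo "sed:   $list_sed"
echo "paste: $list_paste"

# Return to the previous directory and remove the scratch files.
cd - > /dev/null
rm -rf "$workdir"
```

In the merge command, the paste variant would read inputESDFile=`ls user*.root* | paste -sd, -`, with the remaining arguments unchanged.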