Accessing Data on the Grid

On this page, we provide an overview of how to access data on the grid using the Dirac Portal and Dirac commands. The last section, Using ganga, explains how to use ganga, a Python-based job submission and management system, to handle data on the grid.

Warning

Please note that this page is intended as a brief overview of accessing data on the grid and has not been reviewed by experts in bookkeeping, data management, or DIRAC. For accurate and detailed information, refer to the official documentation.

Note

Before executing any Dirac command, remember to initialise your LHCb proxy with lhcb-proxy-init.

Bookkeeping path, LFNs, replicas and PFNs

As explained in this twiki page, data on the grid is managed as follows:

  • Each production is assigned a bookkeeping path, which keeps track of all its output files.

  • The files are identified by their Logical File Names (LFNs), which are always of the form /lhcb/....

  • Each file can be stored in several Storage Elements (SE), such as those at CERN (e.g., CERN_MC-DST-EOS), at IN2P3 (e.g., IN2P3_MC-DST), and elsewhere.

  • These instances of a file are called replicas, and it is important to note that replicas may change with time.

  • Each replica is identified by its Physical File Name (PFN).
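The relationships between LFNs, Storage Elements and PFNs can be pictured as a small data model. The following is a minimal Python sketch for illustration only and is not a DIRAC API. The classes `Replica` and `GridFile` are hypothetical, and the example values are taken from the replica listing in the Get replicas section.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Replica:
    """One physical copy of a file (hypothetical helper, not a DIRAC class)."""
    se: str   # Storage Element name, e.g. "IN2P3_MC-DST"
    pfn: str  # Physical File Name of this copy within the SE


@dataclass
class GridFile:
    """A logical file and a snapshot of its replicas (hypothetical helper)."""
    lfn: str
    replicas: List[Replica] = field(default_factory=list)

    def __post_init__(self) -> None:
        # LFNs are always of the form /lhcb/...
        if not self.lfn.startswith("/lhcb/"):
            raise ValueError(f"not an LHCb LFN: {self.lfn!r}")

    def storage_elements(self) -> List[str]:
        """Names of the SEs holding a replica in this snapshot."""
        return [r.se for r in self.replicas]


example = GridFile(
    "/lhcb/MC/Upgrade/XDIGI/00171960/0000/00171960_00000353_1.xdigi",
    [Replica(
        "IN2P3_MC-DST",
        "root://gridproxy@ccxrootdlhcb.in2p3.fr//pnfs/in2p3.fr/data/lhcb/"
        "LHCb-Disk/lhcb/MC/Upgrade/XDIGI/00171960/0000/00171960_00000353_1.xdigi",
    )],
)
print(example.storage_elements())  # ['IN2P3_MC-DST']
```

Replicas may change with time, so the replica list should be treated as a snapshot to refresh when needed. The LFN, by contrast, is the stable identifier.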

The Dirac Portal is a web-based interface that allows you, among other things, to browse the data available on the grid, find their bookkeeping paths and LFNs (see the next section for more details), and obtain their dddb_tag and conddb_tag (see the Obtain the Condition Database Tags of a MC Production section).

Additionally, Dirac commands can be used to quickly retrieve LFNs (see Get the LFNs section) and PFNs (see Get replicas section).


Explore Available Data on the Grid

The Dirac Portal (https://lhcb-portal-dirac.cern.ch/DIRAC/) provides access to the data available on the grid, which is indexed under Application > Bookkeeping Browser.

For instance, consider the following bookkeeping path:

/MC/Upgrade/Beam7000GeV-Upgrade-MagDown-Nu7.6-25ns-Pythia8/Sim10b/30000000/XDIGI

It corresponds to simulated data with:

  • 8722 XDIGI files

  • Simulation version: Sim10b

  • Minimum-bias data (event type 30000000): no selection or particular decay applied

  • Spill-over: bunch crossings occur every 25 ns, and overlap with events from neighbouring bunch crossings is taken into account in the simulation

  • Magnet polarity: Down

  • Average number of \(p\)-\(p\) collisions per bunch crossing: \(\nu = 7.6\)

  • Beam energy: 7 TeV per beam, i.e. a centre-of-mass energy of \(\sqrt{s} = 14\) TeV
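The bookkeeping path follows a fixed pattern, so its components can be read off mechanically. Below is a minimal Python sketch built on two assumptions. First, the path layout is /ConfigName/ConfigVersion/Conditions/ProcessingPass/EventType/FileType, where the processing pass may span several levels. Second, the conditions string follows the naming pattern shown here. `parse_bk_path` and `parse_conditions` are hypothetical helpers, and their regular expressions are heuristics rather than an official grammar. `Beam7000GeV` is read as the energy per beam, so the centre-of-mass energy is twice that.

```python
import re


def parse_bk_path(bk_path: str) -> dict:
    """Split a bookkeeping path into named components.

    Assumed layout: /<ConfigName>/<ConfigVersion>/<Conditions>/
    <ProcessingPass, possibly several levels>/<EventType>/<FileType>
    """
    parts = [p for p in bk_path.split("/") if p]
    if len(parts) < 6:
        raise ValueError(f"unexpected bookkeeping path: {bk_path!r}")
    return {
        "config_name": parts[0],
        "config_version": parts[1],
        "conditions": parts[2],
        "processing_pass": "/".join(parts[3:-2]),
        "event_type": parts[-2],
        "file_type": parts[-1],
    }


def parse_conditions(conditions: str) -> dict:
    """Heuristically decode a conditions string such as
    'Beam7000GeV-Upgrade-MagDown-Nu7.6-25ns-Pythia8'."""
    beam = re.search(r"Beam(\d+)GeV", conditions)
    polarity = re.search(r"Mag(Up|Down)", conditions)
    nu = re.search(r"Nu(\d+(?:\.\d+)?)", conditions)
    spacing = re.search(r"(\d+)ns", conditions)
    beam_gev = int(beam.group(1)) if beam else None
    return {
        "beam_energy_gev": beam_gev,
        # Two colliding beams: sqrt(s) is twice the energy per beam
        "sqrt_s_tev": 2 * beam_gev / 1000 if beam_gev else None,
        "polarity": polarity.group(1) if polarity else None,
        "nu": float(nu.group(1)) if nu else None,
        "bunch_spacing_ns": int(spacing.group(1)) if spacing else None,
        "generator": conditions.split("-")[-1],  # assumed to be the last token
    }


info = parse_bk_path(
    "/MC/Upgrade/Beam7000GeV-Upgrade-MagDown-Nu7.6-25ns-Pythia8/Sim10b/30000000/XDIGI"
)
print(info["event_type"], info["file_type"])  # 30000000 XDIGI
print(parse_conditions(info["conditions"])["sqrt_s_tev"])  # 14.0
```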

Tip

To quickly find the files, you can type sim+std://MC/Upgrade/Beam7000GeV-Upgrade-MagDown-Nu7.6-25ns-Pythia8/Sim10b/30000000/XDIGI in the address bar at the bottom of the right panel of the LHCb Bookkeeping browser.

Obtain the Condition Database Tags of a MC Production

dddb_tag and conddb_tag correspond to two tags in the Condition Database (see DOI 10.1088/1742-6596/119/7/072010), which track different aspects of the simulation. Specifically:

  • dddb_tag corresponds to the detector description;

  • conddb_tag corresponds to the SIMCOND tag (source), which relates to the simulation conditions.

You will need these tags to run Moore on simulated data. In order to obtain them, you need to

  1. Obtain the ProductionID of the production. In the central panel of the Bookkeeping browser, a list of Logical File Names (LFNs) is displayed. For example:

    /lhcb/MC/Upgrade/XDIGI/00171960/0000/00171960_00001799_1.xdigi
    

    The ProductionID is common to all the files of the production; in this LFN it is 00171960.

  2. Then, go to the Transformation Monitor in Applications > Data > Transformation Monitor.

  3. Enter the ProductionID 00171960 in the text field labeled “ProductionID(s):”.

  4. Click “Submit” to search for this production.

  5. In the central panel, right-click on the only request and select “Show request”.

  6. In the “Production Request Manager” tab that opens, right-click on the only production shown in the central panel and select “View”. You should now see details about the simulation conditions of the samples.

    ID: 104392
    Name: Sim10b test for Upgrade - MD - Pythia8 - L=2x10^33 - reference
    Type: Simulation
    State: Done
    Priority: 1a
    Author: gcorti WG:
    Event type: 30000000 minbias
    Number of events: 10000000
    Starting Date: 2022-11-17
    Finalization Date: 2022-11-21
    Fast Simulation Type: None
    Retention Rate: 1
    
    Simulation Conditions: Beam7000GeV-Upgrade-MagDown-Nu7.6-25ns-Pythia8
    Beam: beta*~3m, zpv=0.0mm, interal xAngle=-0.135mrad to give total as in options Beam energy: 7000 GeV Generator: Pythia8 G4 settings: specified in sim step
    Magnetic field: -1 Detector: Upgrade - Baseline detector: VP compact w microchannel cooling, UT and FT monolayer, no M1, no SPD/PRS Luminosity: pp collisions nu = 7.6, 25ns spillover
    
    Processing Pass: Sim10b
    MC Version: Upgrade
    Step 1 Sim10b Run3 for basic tests - reference nominal lumi, spillover 25 ns(162045/Sim10b) : Gauss-v56r1
    System config: x86_64_v2-centos7-gcc11-opt MC TCK:
    Options: $APPCONFIGOPTS/Gauss/Beam7000GeV-md100-nu7.6-HorExtAngle.py;$APPCONFIGOPTS/Gauss/EnableSpillover-25ns.py;$DECFILESROOT/options/@{eventType}.py;$LBPYTHIA8ROOT/options/Pythia8.py;$APPCONFIGOPTS/Gauss/Gauss-Upgrade-Baseline-20150522.py;$APPCONFIGOPTS/Gauss/G4PL_FTFP_BERT_EmOpt2.py Options format: Multicore: N
    DDDB: dddb-20221004 Condition DB: sim-20220929-vc-md100 DQTag:
    Extra: AppConfig.v3r411;Gen/DecFiles.v32r1 Runtime projects:
    Visible: Y Usable:Yes
    Input file types: Output file types: SIM
    
    Step 2 Digi15-Upgrade for Upgrade studies with spillover - xdigi(162049/Digi15U2) : Boole-v44r0
    System config: MC TCK:
    Options: $APPCONFIGOPTS/Boole/Default.py;$APPCONFIGOPTS/Boole/EnableSpillover.py;$APPCONFIGOPTS/Boole/Boole-Upgrade-Baseline-20200616.py;$APPCONFIGOPTS/Boole/Upgrade-RichMaPMT-NoSpilloverDigi.py;$APPCONFIGOPTS/Boole/xdigi.py;$APPCONFIGOPTS/Boole/Boole-Upgrade-IntegratedLumi.py Options format: Multicore: N
    DDDB: fromPreviousStep Condition DB: fromPreviousStep DQTag:
    Extra: AppConfig.v3r411 Runtime projects:
    Visible: N Usable:Yes
    Input file types: SIM Output file types: XDIGI
    
    
    Inform also: michal.kreps@cern.ch,adam.davis@cern.ch
    
    Comments
    Test with reference conditions for timing checks and sub-detectors verification
    
  7. Under “Step 1”, you can find the dddb_tag and conddb_tag:

    DDDB: dddb-20221004 Condition DB: sim-20220929-vc-md100
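If you copy the text of the production request view, you can extract the ProductionID and the two tags with a short script. This minimal Python sketch rests on three assumptions. First, the ProductionID is the prefix of the file name before the first underscore, as in the LFN of step 1. Second, the tags appear as `DDDB: <tag> Condition DB: <tag>`, as in the view above. Third, later steps that only say `fromPreviousStep` should be skipped. `production_id_from_lfn` and `condition_tags` are hypothetical helpers.

```python
import re
from pathlib import PurePosixPath
from typing import Tuple

# "DDDB: <tag> Condition DB: <tag>" as printed in the production request view
_TAGS = re.compile(r"DDDB:\s*(\S+)\s+Condition DB:\s*(\S+)")


def production_id_from_lfn(lfn: str) -> str:
    """Return the ProductionID, assumed to be the file-name prefix before the first '_'."""
    if lfn.startswith("LFN:"):
        lfn = lfn[len("LFN:"):]
    return PurePosixPath(lfn).name.split("_")[0]


def condition_tags(request_text: str) -> Tuple[str, str]:
    """Return (dddb_tag, conddb_tag) from the first step that sets them explicitly."""
    for dddb, conddb in _TAGS.findall(request_text):
        if dddb != "fromPreviousStep":
            return dddb, conddb
    raise ValueError("no explicit DDDB/Condition DB tags found")


lfn = "/lhcb/MC/Upgrade/XDIGI/00171960/0000/00171960_00001799_1.xdigi"
print(production_id_from_lfn(lfn))  # 00171960

steps = (
    "DDDB: dddb-20221004 Condition DB: sim-20220929-vc-md100 DQTag:\n"
    "DDDB: fromPreviousStep Condition DB: fromPreviousStep DQTag:\n"
)
print(condition_tags(steps))  # ('dddb-20221004', 'sim-20220929-vc-md100')
```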
    

Note

To my knowledge, this is the only method currently available for obtaining the dddb_tag and conddb_tag. If you are aware of other methods, please let us know.
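Once obtained, the two tags are passed to Moore through its options. The fragment below is a sketch that assumes a DetDesc-based Moore release configured through the `options` object imported from `Moore`. Attribute names can differ between Moore versions, and newer releases may use different geometry and conditions settings, so check the documentation of the version you run. The tag values come from Step 1 above, and the input file is the replica PFN from the Get replicas section. Defining and running the actual Moore application is not shown.

```python
# Moore options fragment (sketch): assumes a DetDesc-based Moore release
# exposing the `options` object; adapt the settings to your Moore version.
from Moore import options

options.simulation = True
options.data_type = "Upgrade"
options.input_type = "ROOT"
options.input_files = [
    # PFN of one replica of a file from production 00171960
    "root://gridproxy@ccxrootdlhcb.in2p3.fr//pnfs/in2p3.fr/data/lhcb/LHCb-Disk/"
    "lhcb/MC/Upgrade/XDIGI/00171960/0000/00171960_00000353_1.xdigi",
]
# Tags read from "Step 1" of the production request
options.dddb_tag = "dddb-20221004"
options.conddb_tag = "sim-20220929-vc-md100"
options.evt_max = 100  # process only a few events while testing
```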

Get the LFNs

There are different ways to retrieve the LFNs of files in the grid. One way is to use the Bookkeeping browser in the Dirac Portal, as explained in the previous section. To save the LFNs, click on Save on the bottom right panel of the Bookkeeping browser and save them in a file with the .txt, .py or .csv extension.

Alternatively, you can use the dirac-bookkeeping-get-files command on LXPLUS. For example, to retrieve the LFNs of all files in a given production, run the following command:

lb-dirac dirac-bookkeeping-get-files --BKQuery /MC/Upgrade/Beam7000GeV-Upgrade-MagDown-Nu7.6-25ns-Pythia8/Sim10b/30000000/XDIGI

This command will output a list of LFN paths, such as:

LFN:/lhcb/MC/Upgrade/XDIGI/00171960/0000/00171960_00000353_1.xdigi
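To use the LFNs in a Python script, you can run the command and parse its output. Below is a minimal sketch that assumes the `LFN:/lhcb/...` lines shown above. `get_lfns` is a hypothetical wrapper that calls the same `lb-dirac dirac-bookkeeping-get-files --BKQuery` command through `subprocess`, so it needs LXPLUS and a valid proxy. The parser only accepts LFNs at the start of a line or after whitespace, so `/lhcb/` fragments inside PFNs are not mistaken for LFNs.

```python
import re
import subprocess
from typing import List

# An LFN, optionally prefixed by "LFN:", at the start of a line or after whitespace
_LFN = re.compile(r"(?:^|\s)(?:LFN:)?(/lhcb/\S+)", re.MULTILINE)


def parse_lfns(output: str) -> List[str]:
    """Extract LFNs (without the "LFN:" prefix), keeping order and dropping duplicates."""
    return list(dict.fromkeys(_LFN.findall(output)))


def get_lfns(bk_path: str) -> List[str]:
    """Hypothetical wrapper: run dirac-bookkeeping-get-files via lb-dirac and parse stdout."""
    result = subprocess.run(
        ["lb-dirac", "dirac-bookkeeping-get-files", "--BKQuery", bk_path],
        capture_output=True, text=True, check=True,
    )
    return parse_lfns(result.stdout)
```

For example, `get_lfns("/MC/Upgrade/Beam7000GeV-Upgrade-MagDown-Nu7.6-25ns-Pythia8/Sim10b/30000000/XDIGI")` would return the LFNs of the production as plain `/lhcb/...` strings. Keeping the parsing separate from the command makes it easy to check on saved output.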

If you have an LFN path and want to retrieve the bookkeeping path, you can use the dirac-bookkeeping-file-path command. For example:

lb-dirac dirac-bookkeeping-file-path /lhcb/MC/Upgrade/XDIGI/00171960/0000/00171960_00000353_1.xdigi

Get replicas

To retrieve the Physical File Names (PFNs) of a Logical File Name (LFN), use the dirac-dms-lfn-replicas command. For example:

lb-dirac dirac-dms-lfn-replicas LFN:/lhcb/MC/Upgrade/XDIGI/00171960/0000/00171960_00000353_1.xdigi

The output of this command will provide you with the PFNs of the file in the different Storage Elements (SE) where it’s stored. For instance:

Successful : 
    /lhcb/MC/Upgrade/XDIGI/00171960/0000/00171960_00000353_1.xdigi : 
        IN2P3_MC-DST : root://gridproxy@ccxrootdlhcb.in2p3.fr//pnfs/in2p3.fr/data/lhcb/LHCb-Disk/lhcb/MC/Upgrade/XDIGI/00171960/0000/00171960_00000353_1.xdigi

In this output:

  • IN2P3_MC-DST is the name of the Storage Element where the file is stored.

  • root://gridproxy@ccxrootdlhcb.in2p3.fr//pnfs/in2p3.fr/data/lhcb/LHCb-Disk/lhcb/MC/Upgrade/XDIGI/00171960/0000/00171960_00000353_1.xdigi is the PFN of the replica within this SE.
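If you need the replicas in a script, the indented `key : value` output can be turned into a dictionary. This is a minimal Python sketch, assuming the layout shown above. It expects a `Successful :` header, then one `LFN :` line per file, then one `SE : PFN` line per replica. It further assumes that entries under a `Failed :` header should be ignored. `parse_replicas` is a hypothetical helper.

```python
from typing import Dict, Optional


def parse_replicas(output: str) -> Dict[str, Dict[str, str]]:
    """Map each LFN listed under "Successful" to {Storage Element: PFN}."""
    replicas: Dict[str, Dict[str, str]] = {}
    section: Optional[str] = None
    lfn: Optional[str] = None
    for raw in output.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":"):
            # Header line: either a section name or an LFN
            header = line[:-1].strip()
            if header in ("Successful", "Failed"):
                section, lfn = header, None
            else:
                lfn = header
            continue
        # Replica line "SE : PFN"; split on the first " : " since PFNs contain ':'
        se, sep, pfn = line.partition(" : ")
        if section == "Successful" and lfn and sep:
            replicas.setdefault(lfn, {})[se.strip()] = pfn.strip()
    return replicas
```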

Get the XML catalog

You can generate an XML catalog that contains information about the association between LFNs and replicas. To generate the XML catalog, use the dirac-bookkeeping-genXMLCatalog command:

lb-dirac dirac-bookkeeping-genXMLCatalog -l LFN:/lhcb/MC/Upgrade/XDIGI/00171960/0000/00171960_00000353_1.xdigi
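The generated catalog can be inspected with Python's standard `xml.etree.ElementTree`. This sketch assumes the catalog uses the POOL-style layout. In that layout, each `<File>` element contains `<physical><pfn name="..."/></physical>` and `<logical><lfn name="..."/></logical>`. Check this against the file the command actually writes. `catalog_to_dict` is a hypothetical helper; read the catalog file's contents and pass them in as a string.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def catalog_to_dict(xml_text: str) -> Dict[str, List[str]]:
    """Map each LFN in a POOL-style XML catalog to the PFNs listed for it."""
    root = ET.fromstring(xml_text)
    mapping: Dict[str, List[str]] = {}
    for entry in root.iter("File"):
        pfns = [p.get("name") for p in entry.iter("pfn")]
        for lfn in entry.iter("lfn"):
            mapping[lfn.get("name")] = pfns
    return mapping
```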

Important

Replicas can change over time, so the XML catalog needs to be regenerated periodically.

Download a file

To download a file, you can use the dirac-dms-get-file command:

lb-dirac dirac-dms-get-file LFN:/lhcb/MC/Upgrade/XDIGI/00171960/0000/00171960_00000353_1.xdigi

You can also use the PFN of the file instead of the LFN:

lb-dirac dirac-dms-get-file root://gridproxy@ccxrootdlhcb.in2p3.fr//pnfs/in2p3.fr/data/lhcb/LHCb-Disk/lhcb/MC/Upgrade/XDIGI/00171960/0000/00171960_00000353_1.xdigi

You can also use the xrdcp command with the PFN:

xrdcp root://gridproxy@ccxrootdlhcb.in2p3.fr//pnfs/in2p3.fr/data/lhcb/LHCb-Disk/lhcb/MC/Upgrade/XDIGI/00171960/0000/00171960_00000353_1.xdigi .
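To copy several files, you can loop over their PFNs and run the same `xrdcp <PFN> <destination>` command for each one. Below is a minimal Python sketch. `download` and `xrdcp_commands` are hypothetical helpers, and the PFNs are assumed to have been obtained as in the Get replicas section. By default the function only returns the commands (`dry_run=True`), so you can check them before anything is copied.

```python
import subprocess
from pathlib import Path
from typing import List


def xrdcp_commands(pfns: List[str], dest: str = ".") -> List[List[str]]:
    """Build one `xrdcp <PFN> <dest>` command per replica."""
    return [["xrdcp", pfn, dest] for pfn in pfns]


def download(pfns: List[str], dest: str = ".", dry_run: bool = True) -> List[List[str]]:
    """Copy each PFN into dest with xrdcp; with dry_run=True only return the commands."""
    commands = xrdcp_commands(pfns, dest)
    if not dry_run:
        Path(dest).mkdir(parents=True, exist_ok=True)
        for cmd in commands:
            subprocess.run(cmd, check=True)  # stops at the first failed copy
    return commands
```

Running the copies one at a time with `check=True` keeps failures visible, at the cost of speed. For many files, the XML catalog route may be more convenient than local copies.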

Using ganga

You can perform the same operations using ganga, a Python-based job submission and management system.

To start an interactive ganga Python shell, run ganga in your terminal on LXPLUS.

Then, you can use the following code to perform the same operations:

# Query the bookkeeping database
bkpath = "/MC/Upgrade/Beam7000GeV-Upgrade-MagDown-Nu7.6-25ns-Pythia8/Sim10b/30000000/XDIGI"
bkq = BKQuery(bkpath)
data = bkq.getDataset()

# Get the LFNs of the dataset
data.getLFNs()

# Get the replicas of the first two files in the dataset
data[0:2].getReplicas()

# Generate the XML catalog for the first two files in the dataset
data[0:2].getCatalog()

You can also use ganga inside a Python script. To do so, you’ll first need to export the following environment variables:

export GANGA_CONFIG_PATH=${GANGA_CONFIG_PATH:-GangaLHCb/LHCb.ini}
export GANGA_SITE_CONFIG_AREA=${GANGA_SITE_CONFIG_AREA:-/cvmfs/lhcb.cern.ch/lib/GangaConfig/config}
export PYTHONPATH=$PYTHONPATH:/cvmfs/ganga.cern.ch/Ganga/install/LATEST/lib/python3.8/site-packages/

These environment variables are already set up in setup/setup.sh.

Once you’ve exported the environment variables, you can import ganga in your Python script using:

import ganga.ganga

All the ganga objects, such as BKQuery, are accessible inside the ganga namespace, e.g., ganga.BKQuery.

Important

Please note that ganga does not handle many concurrent sessions well, so it should not be used inside many subjobs running in parallel, for example on HTCondor.
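One possible way to respect this limitation is to run the bookkeeping query once, in a single ganga session. The LFNs are saved to a file, and each subjob then reads only its share. This pattern is a suggestion, not a documented recommendation. The helpers below are hypothetical; only `ganga.BKQuery`, `getDataset` and `getLFNs` come from the examples above, and `getLFNs` is assumed to return a list of LFN strings. The ganga import sits inside `query_and_save`, so the other helpers work without ganga.

```python
from pathlib import Path
from typing import List


def save_lfns(lfns: List[str], path: str) -> None:
    """Write one LFN per line (a plain-text format chosen for this sketch)."""
    Path(path).write_text("".join(f"{lfn}\n" for lfn in lfns))


def load_lfns(path: str) -> List[str]:
    """Read back the LFNs written by save_lfns, skipping blank lines."""
    return [line.strip() for line in Path(path).read_text().splitlines() if line.strip()]


def split_lfns(lfns: List[str], n_chunks: int) -> List[List[str]]:
    """Split the LFNs into n_chunks round-robin groups, one per subjob."""
    return [lfns[i::n_chunks] for i in range(n_chunks)]


def query_and_save(bk_path: str, path: str) -> None:
    """Run the bookkeeping query once, in a single ganga session."""
    import ganga.ganga  # requires the environment variables exported above

    data = ganga.BKQuery(bk_path).getDataset()
    save_lfns(data.getLFNs(), path)
```

A subjob with index `i` out of `n` would then call `split_lfns(load_lfns(path), n)[i]` and never start ganga itself.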