Configuration Files and Helper Scripts

This guide explains how to use configuration files to configure the programs. It does not explain how the programs work or how to run them on your data.

If you want to view all the possible options that can be set up in a YAML configuration file, along with their descriptions, the easiest way is to refer to the setup/config_default.yaml file, which contains the default configuration.

Configuration files

Brief point on YAML files

YAML (a recursive acronym for "YAML Ain't Markup Language", originally "Yet Another Markup Language") is a data serialisation language commonly used for configuration files. You can refer to this course for a small introduction to YAML.

When loaded in Python, a YAML file such as the ones used here yields a dictionary whose keys are strings and whose values can be str, float, int, bool, list, dict or None.

Here is an example:

key_string1: string  # str
key_string2: 'string'  # str
key_string3: "string"  # str
key_int: 1  # int
key_float: 1.  # float
key_bool1: true  # bool
key_bool2: false  # bool
key_list1: [el1, el2, el3] # list
key_list2: # list
- el1
- el2
- el3
key_dict1: {subkey1: subvalue1, subkey2: subvalue2} # dict
key_dict2: # dict
    subkey1: subvalue1
    subkey2: subvalue2
key_none: null  # `None`

This example shows different data types that can be used in YAML files. Note that the hash symbol (#) is used to indicate comments in the YAML file.
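To make the type mapping concrete, the dictionary below is what a YAML 1.1 loader is expected to return for the example above. This is a sketch that assumes a loader such as PyYAML's `safe_load`; the repository may use a different one. The dictionary is written out by hand so that the snippet runs without any third-party package.

```python
# Python dictionary that a YAML 1.1 loader (assumed: PyYAML's safe_load)
# is expected to produce for the YAML example above.
expected = {
    "key_string1": "string",
    "key_string2": "string",
    "key_string3": "string",
    "key_int": 1,
    "key_float": 1.0,
    "key_bool1": True,
    "key_bool2": False,
    "key_list1": ["el1", "el2", "el3"],
    "key_list2": ["el1", "el2", "el3"],
    "key_dict1": {"subkey1": "subvalue1", "subkey2": "subvalue2"},
    "key_dict2": {"subkey1": "subvalue1", "subkey2": "subvalue2"},
    "key_none": None,
}

# Unquoted, single-quoted and double-quoted strings load to the same value.
assert expected["key_string1"] == expected["key_string2"] == expected["key_string3"]
# Inline (flow) and indented (block) collections are equivalent.
assert expected["key_list1"] == expected["key_list2"]
assert expected["key_dict1"] == expected["key_dict2"]
```

Comments are dropped at load time, so they never appear in the resulting dictionary.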

Note

In this repository, YAML files are used instead of JSON files because YAML allows comments to be included in the file.

Configuration and default configuration

The programs in this repository are fully configurable using YAML files. The YAML files are divided into sections, as shown below:

section1:
    param1.1: value1.1
    param1.2: value1.2
section2:
    param2.1: value2.1
    param2.2: value2.2

This allows scripts used to run the programs (see next section) to load only the sections and parameters that are needed.

The default sections, parameters, and their default values are defined in the YAML file setup/config_default.yaml. Default values can be overridden by creating setup/config.yaml and setting the relevant parameters there.
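The override of default values can be pictured as a section-by-section merge of two dictionaries. The sketch below is only an illustration: `merge_config` is a hypothetical helper, not a function taken from this repository, and it assumes that a parameter set in the user configuration replaces the default while untouched parameters keep their default value.

```python
import copy


def merge_config(default, override):
    """Return a copy of `default` updated section by section with `override`.

    Hypothetical helper: the repository's own merging logic may differ.
    """
    merged = copy.deepcopy(default)
    for section, params in override.items():
        if isinstance(params, dict) and isinstance(merged.get(section), dict):
            # Known section: replace only the parameters that are given.
            merged[section].update(copy.deepcopy(params))
        else:
            # New section (or non-dictionary value): take it as is.
            merged[section] = copy.deepcopy(params)
    return merged


# Stand-ins for the parsed content of config_default.yaml and config.yaml.
default_config = {
    "section1": {"param1.1": "value1.1", "param1.2": "value1.2"},
    "section2": {"param2.1": "value2.1", "param2.2": "value2.2"},
}
user_config = {"section1": {"param1.2": "custom"}}

config = merge_config(default_config, user_config)
```

Here `config["section1"]["param1.2"]` takes the user value, while every other parameter keeps its default. The inputs are deep-copied so that the default dictionary is never modified in place.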

The purpose of each section and parameter can be found in definitions/dconfig.py. The available sections are:

  • build: the paths to the build directories of Moore and Allen standalone. By default, these point to the repositories set up in the author's AFS public space.

  • global: general variables

  • moore_input: parameters used to configure the input of a Moore algorithm

  • output: parameters used to configure the output directory

  • computing: parameters used to divide a job into sub-jobs using ganga or HTCondor

In addition, each program in this repository has its own section in the configuration file: xdigi2csv, xdigi2root, xdigi2mdf and mdf2csv.

Run a program

Helper scripts to run the programs are located in the run folder:

  • run/moore/run.py is used to run one of the Moore programs (XDIGI2CSV, XDIGI2ROOT and XDIGI2MDF). You select the program by running ./run/moore/run.py xdigi2csv, ./run/moore/run.py xdigi2root, or ./run/moore/run.py xdigi2mdf.

  • ./run/mdf2csv/run.py is used to run the MDF2CSV program.

Parse arguments

To configure a program, you can also pass the parameters listed in setup/config_default.yaml as command-line arguments.

For instance, if you execute ./run/moore/run.py xdigi2csv -h, the help message lists the relevant parameters from config_default.yaml. This means you can set (or override) the parameter values from the command line.

For example, you can configure the XDIGI2CSV program by running:

./run/moore/run.py xdigi2csv --detectors velo ut --extended true --paths /path/to/xdigi/file --outdir output/

To ensure that all the parameters in setup/config_default.yaml can be set from the command line, the following rules apply:

  • If the parameter is a boolean such as extended, --extended or --extended true sets the parameter to True, while --extended false sets the parameter to False.

  • If the parameter is a str such as outdir, --outdir with no argument value sets the parameter to None.

  • If the parameter is a list such as detectors, the elements of the list are given one after the other, separated by spaces.
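The three rules above can be reproduced with the standard `argparse` module. This is a minimal sketch under stated assumptions: the helper `str2bool` and the parser setup are hypothetical and are not copied from run/moore/run.py, which may implement the rules differently.

```python
import argparse


def str2bool(text):
    """Convert 'true' or 'false' (case-insensitive) to a bool (hypothetical helper)."""
    lowered = text.lower()
    if lowered == "true":
        return True
    if lowered == "false":
        return False
    raise argparse.ArgumentTypeError(f"expected true or false, got {text!r}")


parser = argparse.ArgumentParser(prog="run.py")
# Boolean rule: `--extended` alone means True, `--extended false` means False.
parser.add_argument("--extended", type=str2bool, nargs="?", const=True,
                    default=argparse.SUPPRESS)
# String rule: `--outdir` with no value means None.
parser.add_argument("--outdir", type=str, nargs="?", const=None,
                    default=argparse.SUPPRESS)
# List rule: elements are separated by spaces.
parser.add_argument("--detectors", nargs="+", default=argparse.SUPPRESS)

args = parser.parse_args(["--detectors", "velo", "ut", "--extended", "--outdir"])
print(vars(args))
```

Using `default=argparse.SUPPRESS` keeps options that were not given out of the result, so "not given" can be told apart from "explicitly set to None". That distinction matters when command-line values are later layered on top of values read from a configuration file.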

Using a configuration file

Passing all the arguments through the command line is not very practical. For this reason, the scripts accept a -c (or --config) option that takes the path to a configuration file. The previous command is equivalent to

./run/moore/run.py xdigi2csv --config local_config.yaml

where the content of local_config.yaml is

xdigi2csv:
    detectors:
    - velo
    - ut
    extended: true
moore_input:
    paths: /path/to/xdigi/file
output:
    outdir: output/

You can still use the command line to override the arguments in local_config.yaml, e.g.,

./run/moore/run.py xdigi2csv --config local_config.yaml --extended false # finally set extended to `False`

Concretely, the order of precedence is, from highest to lowest: command-line arguments, local_config.yaml, config.yaml, then config_default.yaml.

Important

Relative paths in a YAML file are always interpreted relative to the location of that YAML file, not relative to the directory from which the script is run.
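As an illustration of this rule, the sketch below resolves a path value against the directory that contains the YAML file. The function `resolve_path` and the example file locations are hypothetical, and the reading of "relative to this YAML file" as "relative to its parent directory" is an assumption. `PurePosixPath` is used only so that the example behaves the same on every operating system.

```python
from pathlib import PurePosixPath


def resolve_path(yaml_file, value):
    """Interpret `value` relative to the directory containing `yaml_file`.

    Hypothetical helper; absolute paths are returned unchanged.
    """
    path = PurePosixPath(value)
    if path.is_absolute():
        return path
    return PurePosixPath(yaml_file).parent / path


# Hypothetical location of a configuration file.
config_file = "/home/user/configs/local_config.yaml"

relative = resolve_path(config_file, "output/")
absolute = resolve_path(config_file, "/path/to/xdigi/file")
```

With these inputs, `relative` points inside `/home/user/configs`, whatever the current working directory is, and `absolute` is left as given.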

Use several configuration files

--config can take a list of configuration files, so you can also run

./run/moore/run.py xdigi2csv --config local_config1.yaml local_config2.yaml

where local_config1.yaml is

xdigi2csv:
    detectors:
    - velo
    - ut
    extended: true
moore_input:
    paths: /path/to/xdigi/file

and local_config2.yaml is

xdigi2csv:
    extended: true
output:
    outdir: output/

The parameters in local_config2.yaml override the parameters in local_config1.yaml.

It is also possible to include local_config1.yaml in local_config2.yaml:

include:
- local_config1.yaml
xdigi2csv:
    extended: true
output:
    outdir: output/

and execute ./run/moore/run.py xdigi2csv -c local_config2.yaml. In this case, the arguments in local_config2.yaml override the ones in local_config1.yaml.
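The include mechanism can be sketched as a recursive merge in which included files are applied first and the including file last. Everything below is illustrative: `merge_config` and `load_with_includes` are hypothetical helpers, the files are represented by already-parsed dictionaries instead of being read from disk, and extended is set to False in the second file only to make the override visible.

```python
import copy


def merge_config(base, override):
    """Return a copy of `base` updated section by section with `override`."""
    merged = copy.deepcopy(base)
    for section, params in override.items():
        if isinstance(params, dict) and isinstance(merged.get(section), dict):
            merged[section].update(copy.deepcopy(params))
        else:
            merged[section] = copy.deepcopy(params)
    return merged


def load_with_includes(name, files):
    """Resolve the `include` entries of `files[name]` recursively.

    `files` maps a file name to its parsed content (a stand-in for reading
    YAML from disk). No protection against circular includes is attempted.
    """
    content = copy.deepcopy(files[name])
    merged = {}
    # Included files come first, so the including file can override them.
    for included in content.pop("include", []):
        merged = merge_config(merged, load_with_includes(included, files))
    return merge_config(merged, content)


files = {
    "local_config1.yaml": {
        "xdigi2csv": {"detectors": ["velo", "ut"], "extended": True},
        "moore_input": {"paths": "/path/to/xdigi/file"},
    },
    "local_config2.yaml": {
        "include": ["local_config1.yaml"],
        "xdigi2csv": {"extended": False},  # altered from the text to show the override
        "output": {"outdir": "output/"},
    },
}

config = load_with_includes("local_config2.yaml", files)
```

The resulting `config` keeps detectors and paths from local_config1.yaml, takes extended and outdir from local_config2.yaml, and no longer contains the include key.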