tools package#

Package that contains helper functions used within this repository.

tools.configmanager module#

Module to handle the configuration of the repository (the YAML files in the setup/ folder)

tools.configmanager.assert_config_validity(config)[source]#

Check that the configuration has the correct format.

Parameters:: config (dict) – configuration as configured in YAML files.

tools.configmanager.filter_config(config, sections=None, constraints=None)[source]#

Filter the configuration to keep only what is necessary for the algorithm to run.

Parameters:

config – actual configuration
sections – parameter sections to add
constraints – associates a section with the parameters to include. If the section is not in this dictionary, all the parameters are included.

Returns:

Filtered configuration
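
The filtering rule described above can be illustrated with a minimal sketch. It assumes that the configuration is a dictionary mapping section names to parameter dictionaries. The name filter_config_sketch, the treatment of sections=None and the sample sections are hypothetical, and the actual implementation may differ.

```python
from typing import Dict, Iterable, Optional


def filter_config_sketch(
    config: dict,
    sections: Optional[Iterable[str]] = None,
    constraints: Optional[Dict[str, Iterable[str]]] = None,
) -> dict:
    """Keep only the requested sections and, where constrained, parameters."""
    constraints = constraints or {}
    # Assumption: sections=None means "keep every section".
    kept_sections = list(config) if sections is None else list(sections)
    filtered = {}
    for section in kept_sections:
        if section not in config:
            continue  # Assumption: unknown sections are silently skipped.
        params = config[section]
        if section in constraints:
            # Only the listed parameters of a constrained section are kept.
            allowed = set(constraints[section])
            params = {key: value for key, value in params.items() if key in allowed}
        else:
            # Unconstrained sections keep all their parameters.
            params = dict(params)
        filtered[section] = params
    return filtered


# Hypothetical sample configuration.
example = {
    "build": {"moore": "lb-run:v1", "platform": "x86"},
    "output": {"outdir": "out", "suffix": "a"},
    "extra": {"flag": True},
}
result = filter_config_sketch(example, ["build", "output"], {"output": ["outdir"]})
print(result)
```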

tools.configmanager.group_argparse_args(parser_args, sections, constraints=None, ignore=None)[source]#

Group the argparse parameters by section, as expected for a configuration.

Parameters:

parser_args (argparse.Namespace | dict) – result of the parse_args method
sections (typing.Iterable[str]) – parameter sections to include
constraints (typing.Optional[typing.Dict[str, typing.Iterable[str]]]) – associates a section with the parameters to include. If the section is not in this dictionary, all the parameters are included.
ignore (typing.Optional[typing.List[str]]) – list of variables to ignore in the grouping

Return type:

dict

Returns:

Configuration described by parser_args

tools.configmanager.import_config(path, resolve_relative_paths=True)[source]#

Import the configuration and assert that its format is valid. Also supports an include: section, through which another YAML file can be included.

Parameters:

path (str) – path to the YAML configuration file
resolve_relative_paths (bool) – whether to resolve the relative paths using the resolve_paths_in_config() function

Return type:

dict

Returns:

Configuration

tools.configmanager.import_default_config(repo, sections=None, constraints=None)[source]#

Import the default configuration.

Parameters:: repo – path to the root of the repository

Notes

The default configuration is in <repo>/setup/config_default.yaml. The custom configuration is in <repo>/setup/config.yaml

tools.configmanager.merge_configs(configs)[source]#

Merge configurations. Each configuration is given either by its path (in which case it is loaded) or as the actual dictionary.

Parameters:: configs (typing.List[str | dict]) – list of configs or their paths
Return type:: dict
Returns:: Merged configuration, obtained by overriding the configurations one by one
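
As an illustration of the "one by one" ordering, here is a minimal sketch in which later configurations take precedence over earlier ones. The names merge_configs_sketch and _deep_update are hypothetical, and the loader argument is an assumption standing in for whatever YAML loading the module performs on paths.

```python
from typing import Callable, List, Union


def _deep_update(base: dict, new: dict) -> dict:
    """Return a copy of base in which the values from new take precedence."""
    merged = dict(base)
    for key, value in new.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = _deep_update(merged[key], value)
        else:
            merged[key] = value
    return merged


def merge_configs_sketch(
    configs: List[Union[str, dict]], loader: Callable[[str], dict]
) -> dict:
    """Merge configurations in order; each one overrides the previous result."""
    merged: dict = {}
    for config in configs:
        if isinstance(config, str):
            config = loader(config)  # A path is loaded before being merged.
        merged = _deep_update(merged, config)
    return merged


# Hypothetical in-memory "files" so that the example needs no YAML library.
fake_files = {"default.yaml": {"build": {"moore": "a", "platform": "p"}}}
merged = merge_configs_sketch(
    ["default.yaml", {"build": {"moore": "b"}}], fake_files.__getitem__
)
print(merged)
```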

tools.configmanager.override_nested_dict(dict_to_insert, dict_to_override, inplace=False)[source]#

Override a nested dictionary with another nested dictionary.

Parameters:

dict_to_insert (dict) – dictionary to add to dict_to_override
dict_to_override (dict) – base dictionary to which dict_to_insert is added
inplace (bool) – whether to override in place

Return type:

dict | None

Returns:

Overridden dictionary if not inplace
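
A recursive override of this kind could look like the sketch below. It assumes that nested dictionaries are merged key by key while any other value is replaced. The name override_nested_dict_sketch is hypothetical and the actual behaviour may differ in edge cases.

```python
import copy
from typing import Optional


def override_nested_dict_sketch(
    dict_to_insert: dict, dict_to_override: dict, inplace: bool = False
) -> Optional[dict]:
    """Override dict_to_override with the values of dict_to_insert."""
    # Work on a deep copy unless the caller asked for an in-place update.
    target = dict_to_override if inplace else copy.deepcopy(dict_to_override)
    for key, value in dict_to_insert.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            # Both sides are dictionaries: descend instead of replacing.
            override_nested_dict_sketch(value, target[key], inplace=True)
        else:
            target[key] = copy.deepcopy(value)
    return None if inplace else target


base = {"a": {"x": 1, "y": 2}, "b": 3}
new = {"a": {"y": 20, "z": 30}, "c": 4}
overridden = override_nested_dict_sketch(new, base)
print(overridden)
```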

tools.configmanager.prepare_parser(parser, sections, program_args=None, constraints=None, with_config=True)[source]#

Add sections to a parser.

Parameters:

parser (argparse.ArgumentParser | str) – parser to complete (type argparse.ArgumentParser) or description of the parser to create (if type str)
sections (typing.Iterable[str]) – parameter sections to add
program_args (typing.Optional[dict]) – Dictionary that specifies how to configure the subparsers that define which program to use. It contains the key-value pair dest, which specifies which program to run, and the possible program sections (as a program is also associated with a section). The other key-value pairs (such as help) are passed to the add_subparsers() method.
constraints (typing.Optional[typing.Dict[str, typing.Iterable[str]]]) – associates a section with the parameters to include. If the section is not in this dictionary, all the parameters are included.
with_config (bool) – whether to add the --config argument.

Return type:

argparse.ArgumentParser | None

Returns:

The parser that was created, if parser of type str

tools.configmanager.resolve_paths_in_config(config, config_path)[source]#

If relative paths or paths with wildcards are given in a configuration file:

Replace the wildcards with the actual environment variable values.
Express the relative paths relative to the directory that contains the YAML configuration file.

The function is in-place.

Parameters:

config (dict) – configuration in config_path
config_path (str) – path to the YAML file that contained config

Notes

This is applied to the options specified in definitions.dconfig.path_in_config
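
The two steps can be sketched as follows. PATH_OPTIONS is a hypothetical stand-in for definitions.dconfig.path_in_config, whose real structure is not shown here, and the $VAR placeholder syntax handled by os.path.expandvars is an assumption.

```python
import os

# Hypothetical stand-in for definitions.dconfig.path_in_config:
# maps a section to the options of that section that hold paths.
PATH_OPTIONS = {"moore_input": ["python_input"]}


def resolve_paths_in_config_sketch(config: dict, config_path: str) -> None:
    """Resolve, in place, the path options of config relative to config_path."""
    config_dir = os.path.dirname(os.path.abspath(config_path))
    for section, options in PATH_OPTIONS.items():
        for option in options:
            value = config.get(section, {}).get(option)
            if not isinstance(value, str):
                continue
            # Step 1 (assumed syntax): replace $VAR placeholders.
            value = os.path.expandvars(value)
            # Step 2: anchor relative paths at the configuration directory.
            if not os.path.isabs(value):
                value = os.path.normpath(os.path.join(config_dir, value))
            config[section][option] = value


cfg = {"moore_input": {"python_input": os.path.join("inputs", "run.py")}}
resolve_paths_in_config_sketch(cfg, os.path.join("some", "dir", "config.yaml"))
print(cfg["moore_input"]["python_input"])
```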

tools.configmanager.return_config_from_parser(repo, sections, program_args=None, constraints=None, parse_known_args=False)[source]#

Load the default configuration. Create a parser that can parse one or more YAML configuration files as well as arguments that change the configuration. Return the configuration altered by the parsed arguments.

Parameters:

repo – path to the root of the repository
sections – parameter sections to add
constraints – associates a section with the parameters to include. If the section is not in this dictionary, all the parameters are included.
program_args – Dictionary that specifies how to configure the subparsers that define which program to use. It contains the key-value pair dest, which specifies which program to run, and the possible program sections (as a program is also associated with a section). The other key-value pairs (such as help) are passed to the add_subparsers() method.
parse_known_args – whether to only parse known arguments. The other arguments that are unknown given sections and constraints will be passed to the other scripts.
return_last_config_path – whether to return the last configuration path that was used. The latter is used to determine the output directory in the case where auto_output_mode is set to same.

Returns:

Configuration. If parse_known_args is set to True, also returns the arguments that were not parsed. If return_last_config_path is set, also returns the path of the last configuration file that was used (or None if there was no configuration file).

tools.configmanager.str2bool(value)[source]#

Correct parsing of a boolean.

Parameters:: value (str) – value to parse
Return type:: bool
Returns:: True if value represents True (yes, true, …). False if value represents False (no, false, …)

Notes

Taken from https://stackoverflow.com/questions/15008758
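
A sketch in the spirit of the approach commonly given for that question is shown below. The exact set of accepted spellings and the name str2bool_sketch are assumptions, and the --erase option is only an illustrative argument.

```python
import argparse


def str2bool_sketch(value):
    """Parse common textual spellings of a boolean."""
    if isinstance(value, bool):
        return value
    if value.lower() in ("yes", "true", "t", "y", "1"):
        return True
    if value.lower() in ("no", "false", "f", "n", "0"):
        return False
    raise argparse.ArgumentTypeError("Boolean value expected.")


# Using the function as an argparse type avoids the pitfall of type=bool,
# for which any non-empty string (including "False") is parsed as True.
parser = argparse.ArgumentParser()
parser.add_argument("--erase", type=str2bool_sketch, default=True)
args = parser.parse_args(["--erase", "no"])
print(args.erase)
```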

tools.configmanager.update_config(config, base_config, inplace=False)[source]#

Update the configuration with another configuration

Parameters:

config (str | dict) – path to the new YAML configuration or actual configuration
base_config (str | dict) – base configuration that will be updated
inplace (bool) – whether to update base_config in place

Return type:

dict | None

Returns:

Updated configuration if inplace set to False

tools.envvar module#

Helper functions to interface with environment variables.

tools.envvar.get_environment_variable(env_var_name, is_bool=False)[source]#

Get the value of an environment variable.

Parameters:

env_var_name (str) – Name of the environment variable to load
is_bool (bool) – whether the environment variable is assumed to be a boolean variable (either false or true). In this case, a boolean variable is returned.

Raises:

AssertionError – The environment variable does not exist
AssertionError – is_bool is set but the environment variable is neither true nor false

Return type:

Union[bool, str]
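
The documented behaviour, including the two assertion errors, can be sketched as below. The name get_environment_variable_sketch and the SKETCH_FLAG variable are hypothetical. The strict lowercase comparison is an assumption.

```python
import os
from typing import Union


def get_environment_variable_sketch(
    env_var_name: str, is_bool: bool = False
) -> Union[bool, str]:
    """Read an environment variable, optionally as a boolean."""
    assert env_var_name in os.environ, (
        f"The environment variable {env_var_name} does not exist"
    )
    value = os.environ[env_var_name]
    if not is_bool:
        return value
    # Assumption: only the exact strings "true" and "false" are accepted.
    assert value in ("true", "false"), (
        f"{env_var_name} is neither 'true' nor 'false'"
    )
    return value == "true"


os.environ["SKETCH_FLAG"] = "true"  # Hypothetical variable for the example.
as_string = get_environment_variable_sketch("SKETCH_FLAG")
as_bool = get_environment_variable_sketch("SKETCH_FLAG", is_bool=True)
print(as_string, as_bool)
```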

tools.envvar.get_repo_path()[source]#

Get the path to the root of the repository.

Return type:: str

tools.envvar.resolve_wildcards(strings)[source]#

Replace wildcards or placeholders in strings by the corresponding environment variables.

Parameters:: strings (str | typing.List[str]) – a string or list of strings
Return type:: str | typing.List[str]
Returns:: string or list of strings with any placeholders replaced by the corresponding environment variable.

Notes

An error is raised if not all the placeholders are replaced.
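
The placeholder syntax is not specified on this page. The sketch below assumes placeholders of the form {NAME} and raises a KeyError when no matching environment variable exists. Both choices, as well as the name resolve_wildcards_sketch, are assumptions made for illustration.

```python
import os
import re
from typing import List, Union

# Assumed placeholder syntax: {NAME}
PLACEHOLDER = re.compile(r"\{(\w+)\}")


def resolve_wildcards_sketch(
    strings: Union[str, List[str]]
) -> Union[str, List[str]]:
    """Replace {NAME} placeholders by the value of the variable NAME."""

    def resolve_one(string: str) -> str:
        def substitute(match: "re.Match[str]") -> str:
            name = match.group(1)
            if name not in os.environ:
                # Every placeholder must be resolved, otherwise fail loudly.
                raise KeyError(f"No environment variable for placeholder {{{name}}}")
            return os.environ[name]

        return PLACEHOLDER.sub(substitute, string)

    if isinstance(strings, str):
        return resolve_one(strings)
    return [resolve_one(string) for string in strings]


os.environ["SKETCH_REPO"] = "/repo"  # Hypothetical variable for the example.
single = resolve_wildcards_sketch("{SKETCH_REPO}/setup")
several = resolve_wildcards_sketch(["{SKETCH_REPO}/a", "plain"])
print(single, several)
```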

tools.envvar.set_environment_variable(env_var_name, value)[source]#

Create an environment variable.

Parameters:

env_var_name (str) – name of the environment variable
value (str | bool) – value of the environment variable. If boolean, the variable is set to true or false
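
The boolean convention described above (true or false as strings) can be sketched as follows. The name set_environment_variable_sketch and the variable names used in the example are hypothetical.

```python
import os
from typing import Union


def set_environment_variable_sketch(env_var_name: str, value: Union[str, bool]) -> None:
    """Set an environment variable, encoding booleans as 'true' or 'false'."""
    if isinstance(value, bool):
        value = "true" if value else "false"
    os.environ[env_var_name] = value


set_environment_variable_sketch("SKETCH_USE_RETINA", True)
set_environment_variable_sketch("SKETCH_OUTDIR", "output")
print(os.environ["SKETCH_USE_RETINA"], os.environ["SKETCH_OUTDIR"])
```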

tools.inoutconfig module#

Tools to configure the input and output paths.

class tools.inoutconfig.MooreInputConfig(config, repo, return_paths=False)[source]#

Bases: object

This context manager configures the input needed for Moore, given the configuration.

It returns the path to the required Python file, that is, either python_input in the configuration or a custom Python file that configures the input when bookkeeping_path or paths is used. In the latter case, it creates the temporary configuration file that the custom Python input file reads to configure the input.

When the context manager is exited, the environment variables and temporary files are deleted.

config#: configuration of moore_input

repo#: path to the root of the repository

return_paths#: whether to return the paths as well

tools.inoutconfig.ban_storage_elements(banned_storage_elements, paths, xml_catalog_path)[source]#

Remove the physical links to banned storage elements in the XML catalog file.

Parameters:

banned_storage_elements (List[str]) – storage elements that must not be used
paths (List[str]) – list of paths that may include LFN paths
xml_catalog_path (str) – path to the XML catalog to alter

Return type:

List[str]

Returns:

List of LFNs to remove as they are only stored on banned storage elements

Notes

All this is EXTREMELY ugly but I couldn’t find another way of removing storage elements while keeping XML files.

tools.inoutconfig.generate_catalog(paths, use_ganga=False)[source]#

Get the PFNs given LFNs.

Parameters:

paths (typing.Iterable[str]) – list of paths, that can contain LFNs (Logical File Name). LFNs must start with LFN:
use_ganga (bool) – whether to use ganga

Return type:

tempfile.NamedTemporaryFile | tempfile.TemporaryDirectory | None

Returns:

Temporary file that contains the XML catalog of the LFNs, or None if no LFNs were found.

Notes

This function uses ganga.

tools.inoutconfig.get_allen_input(indir=None, mdf_filename=None, geo_dirname=None, paths=None, geodir=None)[source]#

Get the MDF input paths and the geometry directory from the configuration. There are two ways of specifying an Allen input:

With indir, mdf_filename and geo_dirname. This is practical for files generated by the xdigi2mdf program because only the input directory needs to be specified.
With paths and geodir.

Parameters:

indir (typing.Optional[str]) – Input directory where the MDF files are
mdf_filename (typing.Optional[str]) – MDF file name in indir
geo_dirname (typing.Optional[str]) – geometry directory name in indir
paths (typing.Optional[str]) – list of MDF paths
geodir (typing.Optional[str]) – input geometry directory

Return type:

typing.Tuple[typing.List[str], str | None]

Returns:

List of MDF input files and the geometry directory.
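
The two ways of specifying the input can be sketched as below. The sketch assumes that the two ways are mutually exclusive and that file names are simply joined to indir. The name get_allen_input_sketch and the sample file names are hypothetical.

```python
import os
from typing import List, Optional, Tuple, Union


def get_allen_input_sketch(
    indir: Optional[str] = None,
    mdf_filename: Optional[str] = None,
    geo_dirname: Optional[str] = None,
    paths: Optional[Union[str, List[str]]] = None,
    geodir: Optional[str] = None,
) -> Tuple[List[str], Optional[str]]:
    """Return the MDF input paths and the geometry directory."""
    if indir is not None:
        # First way: everything is located inside a single input directory.
        assert paths is None and geodir is None  # Assumption: exclusive modes.
        assert mdf_filename is not None
        mdf_paths = [os.path.join(indir, mdf_filename)]
        geometry = os.path.join(indir, geo_dirname) if geo_dirname else None
        return mdf_paths, geometry
    # Second way: explicit MDF paths and geometry directory.
    assert paths is not None
    mdf_paths = [paths] if isinstance(paths, str) else list(paths)
    return mdf_paths, geodir


from_indir = get_allen_input_sketch(
    indir="data", mdf_filename="dumped.mdf", geo_dirname="geometry"
)
from_paths = get_allen_input_sketch(paths=["a.mdf", "b.mdf"], geodir="geo")
print(from_indir, from_paths)
```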

tools.inoutconfig.get_bookkeeping_lfns(bookkeeping_path, start_index=0, nb_files=-1)[source]#

Get the LFNs associated with a bookkeeping path.

Parameters:

bookkeeping_path (str) – path in the Dirac Bookkeeping browser
start_index (int) – index of the first LFN to retrieve
nb_files (int) – number of LFNs to retrieve

Return type:

List[str]

Returns:

List of LFNs from start_index to start_index + nb_files

Notes

This function uses ganga.
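
Leaving the bookkeeping query aside, the selection of LFNs from start_index to start_index + nb_files can be sketched on a plain list. The helper name select_lfns_sketch is hypothetical, and reading the default nb_files=-1 as "all remaining files" is an assumption.

```python
from typing import List


def select_lfns_sketch(lfns: List[str], start_index: int = 0, nb_files: int = -1) -> List[str]:
    """Select nb_files entries starting at start_index."""
    if nb_files < 0:
        # Assumption: a negative number means "no limit".
        return lfns[start_index:]
    return lfns[start_index:start_index + nb_files]


all_lfns = [f"LFN:/file_{index}.xdigi" for index in range(5)]  # Hypothetical LFNs.
window = select_lfns_sketch(all_lfns, start_index=1, nb_files=2)
rest = select_lfns_sketch(all_lfns, start_index=3)
print(window, rest)
```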

tools.inoutconfig.get_moore_build(moore_build, platform=None)[source]#

Get what to run in order to have access to the Moore build

Parameters:

moore_build (str) – value of the build/moore option
platform (str | None) – Platform of the build to use (within lb-run or the local stack)

Return type:

typing.List[str]

Returns:

If moore_build starts with lb-run:, it is interpreted as lb-run Moore/{version} and the latter is returned. Otherwise, moore_build is returned as is.
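
The prefix rule can be sketched as follows. The sketch ignores the platform argument, wraps the unchanged value in a list to match the documented return type, and uses a made-up version string. All three points, like the name get_moore_build_sketch, are assumptions.

```python
from typing import List, Optional


def get_moore_build_sketch(moore_build: str, platform: Optional[str] = None) -> List[str]:
    """Translate the build option into the command used to access Moore."""
    prefix = "lb-run:"
    if moore_build.startswith(prefix):
        version = moore_build[len(prefix):]
        # Note: how platform is forwarded is not covered by this sketch.
        return ["lb-run", f"Moore/{version}"]
    # Assumption: any other value is used unchanged, as a one-element command.
    return [moore_build]


released = get_moore_build_sketch("lb-run:v1r0")  # Hypothetical version.
local = get_moore_build_sketch("/path/to/stack/Moore/run")
print(released, local)
```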

tools.inoutconfig.get_moore_input_config(input_config)[source]#

Return the configuration dictionary of the input file for Moore in the case where python_input is not used, so that the generic Python Moore input file is used instead.

Parameters:: input_config (dict) – section moore_input of the configuration
Return type:: dict | None
Returns:: configuration dictionary of the input file for Moore, or None if a python file is used as input.

tools.inoutconfig.get_outdir(config, datatype=None)[source]#

Get the output path according to the output mode that was chosen.

Parameters:

config (dict) – current configuration
datatype (Optional[str]) – Type of the data (e.g., csv). This is used if auto_output_mode is set to eos.

Return type:

str

Returns:

output path

tools.xdigi2csvtools module#

Module that contains functions to configure the algorithms to execute, to get the hits, MC particles and MC hits from a (X)DIGI file.

tools.xdigi2csvtools.configure_algos(stack, algonames, outdir='persistence_csv', retina_clusters=True, extended=False, all_mc_particles=False, erase=True)[source]#

Configure the algorithms to run by changing the context (by basically doing a bunch of algo.bind(...))

Parameters:

stack (ExitStack) – context manager. Modified in place
algonames (List[str]) – list of the algorithms to run. This list is needed so that only the algorithms that will be run are configured.
retina_clusters (bool) – Whether to use Retina clusters
extended (bool) – Whether to erase existing CSV files at the same locations
outdir (Optional[str]) – path to the directory where the CSV files are saved
erase (bool) – whether to erase an existing CSV file

tools.xdigi2csvtools.get_algo_sequence(algonames)[source]#

A function that returns the list of algorithms to run to persist the hits and MC particles into CSV files, according to the configuration given as an input

Parameters:: algonames (List[str]) – list of the names of the algorithms to run
Return type:: Callable[[], Reconstruction]
Returns:: Returns a function that returns the list of the algorithms, wrapped around a Moore Reconstruction object

tools.xdigi2csvtools.get_algonames(detectors, dump_event_info=False, dump_mc_hits=False)[source]#

Obtain the list of the names of the algorithms to run.

Parameters:

detectors (List[str]) – list of the detectors for which the hits and MC particles are dumped
dump_event_info (bool) – whether to dump event_info.csv
dump_mc_hits (bool) – whether to dump the MC hits

Return type:

List[str]

Returns:

List of the names of the algorithms to run

tools.tcomputing module#

tools.tcomputing.add_environment_variables_to_copy(environment_variables, backend)[source]#

Add in-place the environment variables to copy.

Parameters:

environment_variables (Dict[str, str]) – dictionary of environment variables that is going to be altered in place.
backend (str) – backend used for the computation

tools.tcomputing.submit_job(script_path, args, nb_files=None, nb_files_per_job=-1, backend='condor', logdir=None, max_runtime=None, max_transfer_output_mb=None)[source]#

Submit a job in HTCondor or ganga.

Parameters:

script_path (str) – path to the script to run
args (list) – list of the arguments to pass to the script
nb_files (Optional[int]) – number of files in total
nb_files_per_job (int) – number of files per job
backend (str) – condor, ganga-local or ganga-dirac. Only the condor backend is supposed to work properly
logdir (Optional[str]) – path where to save the log
max_runtime (Optional[int]) – time within which the job should be run. This is used by the condor backend
max_transfer_output_mb (Optional[int]) – MAX_TRANSFER_OUTPUT_MB HTCondor parameter. This is used by the condor backend

tools.tcomputing.write_condor_job(job, path, items=None)[source]#

Write a HTCondor submit file.

Parameters:

job (Dict[str, str]) – associates an argument with its value in the submit file
path (str) – where to save the file
items (Optional[list]) – list of items, used after the queue statement
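
A submit file of this kind could be written as in the sketch below. The layout is an assumption: each entry of job becomes a "key = value" line, and the items are listed after a "queue arguments from (" statement, which is one of the queue forms accepted by HTCondor. The actual function may use a different form, and the name write_condor_job_sketch and the sample job are hypothetical.

```python
import os
import tempfile
from typing import Dict, List, Optional


def write_condor_job_sketch(
    job: Dict[str, str], path: str, items: Optional[List[str]] = None
) -> None:
    """Write an HTCondor submit description file."""
    lines = [f"{key} = {value}" for key, value in job.items()]
    if items is None:
        lines.append("queue")  # Submit a single job.
    else:
        # One job per item; each item is used as the job arguments.
        lines.append("queue arguments from (")
        lines.extend(f"    {item}" for item in items)
        lines.append(")")
    with open(path, "w") as submit_file:
        submit_file.write("\n".join(lines) + "\n")


with tempfile.TemporaryDirectory() as tmpdir:
    submit_path = os.path.join(tmpdir, "job.sub")
    write_condor_job_sketch(
        {"executable": "run.sh", "log": "job.log"},
        submit_path,
        items=["--index 0", "--index 1"],
    )
    with open(submit_path) as submit_file:
        content = submit_file.read()
print(content)
```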

tools.tconversion module#

Tools to convert a CSV file into another format.

tools.tconversion.convert_all_table_paths(indir, outformat, outdir=None, outcompression=None, informat='csv', incompression=None, keep_original=True, verbose=True)[source]#

Convert all the CSV-like files in a given directory into another format and/or compression.

Parameters:

indir (str) – input directory where the files to convert are
outformat (str) – output format
outdir (Optional[str]) – output directory. If not given, it is indir
outcompression (Optional[str]) – compression of the output files
informat (str) – format of the input files
incompression (Optional[str]) – compression of the input files. Only used to figure out the extension
keep_original (bool) – whether to keep the original files. If set to False, the files are removed
verbose (bool) – whether to print some information

tools.tconversion.get_all_table_paths(indir, ext='.csv')[source]#

This function reads all the CSV-like files that are in a folder.

Return type:: List[str]
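
Listing the files with a given extension can be sketched with the standard glob module. The sketch assumes a non-recursive search and a sorted result. The name get_all_table_paths_sketch and the sample file names are hypothetical.

```python
import glob
import os
import tempfile
from typing import List


def get_all_table_paths_sketch(indir: str, ext: str = ".csv") -> List[str]:
    """Return the paths of the files in indir that end with ext."""
    # glob.escape protects special characters that indir itself may contain.
    pattern = os.path.join(glob.escape(indir), "*" + ext)
    return sorted(glob.glob(pattern))


with tempfile.TemporaryDirectory() as tmpdir:
    for name in ("hits.csv", "particles.csv", "notes.txt"):
        open(os.path.join(tmpdir, name), "w").close()
    found = [os.path.basename(path) for path in get_all_table_paths_sketch(tmpdir)]
print(found)
```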

tools.tpaths module#

Module to manage paths.

tools.tpaths.expand_paths(paths)[source]#

Expand paths in a list of paths that have a *.

Parameters:

paths (List[str]) – list of paths. The paths that contain a star * are expanded using the python glob.glob() function
**kwargs – passed to glob.glob()

Return type:

List[str]

Returns:

List of paths, where the paths that contain * have been expanded
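
The expansion can be sketched as follows. Paths without a star are kept unchanged and the expanded paths are sorted. The sorting and the name expand_paths_sketch are assumptions.

```python
import glob
import os
import tempfile
from typing import List


def expand_paths_sketch(paths: List[str], **kwargs) -> List[str]:
    """Expand the paths that contain a star using glob.glob()."""
    expanded = []
    for path in paths:
        if "*" in path:
            # Assumption: matches are sorted to obtain a stable order.
            expanded.extend(sorted(glob.glob(path, **kwargs)))
        else:
            expanded.append(path)
    return expanded


with tempfile.TemporaryDirectory() as tmpdir:
    for name in ("a.mdf", "b.mdf"):
        open(os.path.join(tmpdir, name), "w").close()
    expanded = expand_paths_sketch([os.path.join(tmpdir, "*.mdf"), "other/c.mdf"])
    names = [os.path.basename(path) for path in expanded]
print(names)
```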

tools.tpaths.resolve_relative_path(path, reference)[source]#: If path is relative, turns it into an absolute path w.r.t. reference. If it is an absolute path, just return the path itself.

tools.tpaths.resolve_relative_paths(paths, reference)[source]#: Same as resolve_relative_path() but input can be one path or a list of paths.
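
Both helpers can be sketched with os.path. The function names carrying the _sketch suffix are hypothetical, and normalising the joined path is an assumption.

```python
import os
from typing import List, Union


def resolve_relative_path_sketch(path: str, reference: str) -> str:
    """Turn a relative path into an absolute one w.r.t. reference."""
    if os.path.isabs(path):
        return path  # Absolute paths are returned unchanged.
    return os.path.normpath(os.path.join(reference, path))


def resolve_relative_paths_sketch(
    paths: Union[str, List[str]], reference: str
) -> Union[str, List[str]]:
    """Same as above, for one path or a list of paths."""
    if isinstance(paths, str):
        return resolve_relative_path_sketch(paths, reference)
    return [resolve_relative_path_sketch(path, reference) for path in paths]


reference = os.path.abspath("setup")
absolute = os.path.abspath("elsewhere")
resolved = resolve_relative_paths_sketch(["config.yaml", absolute], reference)
print(resolved)
```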

tools.tpaths.write_yaml_file(dictionnary, path)[source]#: Write a dictionary to a YAML file.

tools.tpaths.write_yaml_temp_file(dictionnary, **kwargs)[source]#

Write a dictionary to a temporary YAML file.

Parameters:

dictionnary (dict) – dictionary to dump
kwargs – passed to tempfile.NamedTemporaryFile()

Return type:

NamedTemporaryFile

Returns:

Written temporary file
