The PyRDF API reference¶
-
PyRDF.
include_headers
(headers_paths)[source]¶ Includes the C++ headers to be declared before execution. Each header is also declared on the current running session.
- Parameters
headers_paths (str, iter) – A string or an iterable (such as a list, set…) containing the paths to all necessary C++ headers as strings. This function accepts both paths to the headers themselves and paths to directories containing the headers.
-
PyRDF.
include_shared_libraries
(shared_libraries_paths) Includes the C++ shared libraries to be declared before execution. Each library is also declared on the current running session. If any pcm files are present in the same folder as the shared libraries, the function will try to retrieve them (and distribute them if working on a distributed backend).
- Parameters
shared_libraries_paths (str, iter) – A string or an iterable (such as a list, set…) containing the paths to all necessary C++ shared libraries as strings. This function accepts both paths to the libraries themselves and paths to directories containing the libraries.
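The path parameters above accept either a single string or an iterable of strings, and each path may point to a file or to a directory. The following is a minimal sketch of how such input could be flattened into a list of files. The helper name `collect_files` and its `extensions` filter are hypothetical and are not part of PyRDF.

```python
import os
import tempfile


def collect_files(paths, extensions=None):
    """Flatten a str-or-iterable of file/directory paths into a sorted list.

    Hypothetical helper for illustration; not part of the PyRDF API.
    """
    # A bare string is a single path, not an iterable of characters.
    if isinstance(paths, str):
        paths = [paths]

    found = set()
    for path in paths:
        if os.path.isdir(path):
            # A directory contributes every regular file directly inside it.
            for name in os.listdir(path):
                full = os.path.join(path, name)
                if os.path.isfile(full):
                    found.add(full)
        else:
            found.add(path)

    if extensions is not None:
        found = {f for f in found if f.endswith(tuple(extensions))}
    return sorted(found)


# Small demonstration inside a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.h", "b.h", "notes.txt"):
        open(os.path.join(tmp, name), "w").close()
    headers = collect_files(tmp, extensions=[".h"])
    header_names = [os.path.basename(h) for h in headers]
    single = collect_files(os.path.join(tmp, "a.h"))
    single_names = [os.path.basename(s) for s in single]
```

Treating a lone string as one path, rather than iterating over its characters, is the detail that makes the `str` and `iter` cases behave alike.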
-
PyRDF.
initialize
(fun, *args, **kwargs)[source]¶ Set a function that will be executed as a first step on every backend before any other operation. This method also executes the function on the current user environment so changes are visible on the running session.
This allows users to inject and execute custom code on the worker environment without that code being part of the RDataFrame computational graph.
- Parameters
fun (function) – Function to be executed.
*args (list) – Variable length argument list used to execute the function.
**kwargs (dict) – Keyword arguments used to execute the function.
-
PyRDF.
send_generic_files
(files_paths)[source]¶ Sends to the workers the generic files needed by the user.
- Parameters
files_paths (str, iter) – Paths to the files to be sent to the distributed workers.
-
PyRDF.
use
(backend_name, conf={})[source]¶ Allows the user to choose the execution backend.
- Parameters
backend_name (str) – This is the name of the chosen backend.
conf (dict, optional) – A dictionary with the necessary configuration parameters. Its default value is an empty dictionary {}.
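As an illustration of selecting a backend by name together with a configuration dictionary, here is a minimal sketch. The registry, the class names, the backend keys `"local"` and `"distributed"`, and the configuration key `npartitions` are all hypothetical placeholders. The sketch also uses `conf=None` rather than a mutable `{}` default, which is a general Python precaution and not a statement about PyRDF's code.

```python
class LocalBackend:
    """Placeholder backend; the class name is hypothetical."""

    def __init__(self, conf):
        self.conf = conf


class DistributedBackend(LocalBackend):
    """Placeholder for a backend that ships work to remote workers."""


# Hypothetical registry; the keys are illustrative, not PyRDF's backend names.
_BACKENDS = {"local": LocalBackend, "distributed": DistributedBackend}
current_backend = None


def use(backend_name, conf=None):
    """Select a backend by name and hand it a private copy of conf."""
    global current_backend
    if backend_name not in _BACKENDS:
        raise ValueError("unknown backend: {!r}".format(backend_name))
    # Copy so later changes to the caller's dict do not leak into the backend.
    current_backend = _BACKENDS[backend_name](dict(conf or {}))
    return current_backend


settings = {"npartitions": 4}  # hypothetical configuration key
backend = use("distributed", settings)
settings["npartitions"] = 8  # does not affect the backend's copy
```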
The CallableGenerator module¶
-
class
PyRDF.CallableGenerator.
CallableGenerator
(head_node)[source]¶ Class that generates a callable to parse a PyRDF graph.
-
head_node
¶ Head node of a PyRDF graph.
-
__init__
(head_node)[source]¶ Creates a new CallableGenerator.
- Parameters
head_node – Head node of a PyRDF graph.
-
get_action_nodes
(node_py=None)[source]¶ Recurses through the PyRDF graph and collects its action nodes.
- Parameters
node_py (optional) – The current state’s PyRDF node. If None, it takes the value of self.head_node.
- Returns
A list of the action nodes of the graph in DFS order, which coincides with the order of execution in the callable function.
- Return type
list
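The depth-first collection described above can be sketched with a small stand-in node class. `SketchNode` and `collect_action_nodes` are hypothetical names, and the pre-order traversal shown here is one reasonable reading of "DFS order", not a copy of PyRDF's implementation.

```python
class SketchNode:
    """Minimal stand-in for a graph node; not PyRDF's Node class."""

    def __init__(self, name, is_action=False):
        self.name = name
        self.is_action = is_action
        self.children = []


def collect_action_nodes(node):
    """Return the action nodes at or below node in depth-first order."""
    actions = [node] if node.is_action else []
    for child in node.children:
        actions.extend(collect_action_nodes(child))
    return actions


# head -> Filter -> (Count, Define -> Histo1D)
head = SketchNode("head")
flt = SketchNode("Filter")
count = SketchNode("Count", is_action=True)
define = SketchNode("Define")
histo = SketchNode("Histo1D", is_action=True)
head.children.append(flt)
flt.children.extend([count, define])
define.children.append(histo)

order = [n.name for n in collect_action_nodes(head)]
```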
-
The Node module¶
-
class
PyRDF.Node.
Node
(get_head, operation, *args)[source]¶ A class that represents a node in the RDataFrame operations graph. A Node houses an operation and has references to its children nodes. To list the operations supported by the current backend, try:
Example:
import PyRDF
PyRDF.use(...)  # Choose your backend
print(PyRDF.current_backend.supported_operations)
-
get_head
¶ A lambda function that returns the head node of the current graph.
- Type
function
-
operation
¶ The operation that this Node represents. This could be None.
-
children
¶ A list of PyRDF.Node objects which represent the children nodes connected to the current node.
- Type
list
-
_new_op_name
¶ The name of the new incoming operation of the next child, which is the last child node among the current node’s children.
- Type
str
-
value
¶ The computed value after executing the operation in the current node for a particular PyRDF graph. This is permanently None for transformation nodes, while action nodes get a ROOT.RResultPtr after event-loop execution.
-
pyroot_node
¶ Reference to the PyROOT object that implements the functionality of this node on the cpp side.
-
has_user_references
¶ A flag that records whether the node has direct user references, that is, whether it is assigned to a variable. The default value is True; it turns to False if the proxy that wraps the node gets garbage collected by Python.
- Type
bool
-
__getstate__
()[source]¶ Converts the state of the current node to a Python dictionary.
- Returns
A dictionary that stores all instance variables that represent the current PyRDF node.
- Return type
dictionary
-
__init__
(get_head, operation, *args)[source]¶ Creates a new node based on the operation passed as argument.
- Parameters
get_head (function) – A lambda function that returns the head node of the current graph. This value could be None.
operation (PyRDF.Operation.Operation) – The operation that this Node represents. This could be None.
-
__setstate__
(state)[source]¶ Retrieves the state dictionary of the current node and sets the instance variables.
- Parameters
state (dict) – This is the state dictionary that needs to be converted to a Node object.
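`__getstate__` and `__setstate__` are the hooks that Python's `pickle` module calls when serializing and restoring an object. The sketch below shows the general pattern with a stand-in class. Which attributes PyRDF's Node actually exports is not stated above, so the choice to drop `native_handle` here is purely an assumption for illustration.

```python
import pickle


class PicklableNode:
    """Minimal stand-in for a node that controls its own pickled state."""

    def __init__(self, operation, children=None):
        self.operation = operation
        self.children = children or []
        # Something that cannot be pickled, like a handle to a live object.
        self.native_handle = lambda: None

    def __getstate__(self):
        # Export only plain data; the native handle is deliberately left out.
        return {"operation": self.operation, "children": self.children}

    def __setstate__(self, state):
        self.operation = state["operation"]
        self.children = state["children"]
        self.native_handle = None  # to be re-created on the receiving side


root = PicklableNode("Filter", [PicklableNode("Count")])
clone = pickle.loads(pickle.dumps(root))
```

Because children are themselves `PicklableNode` objects, `pickle` applies the same pair of hooks recursively to the whole subtree.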
-
graph_prune
()[source]¶ Prunes nodes from the current PyRDF graph under certain conditions. The current node will be pruned if it has no children and the user application does not hold any reference to it. The children of the current node will get recursively pruned.
- Returns
True if the current node has to be pruned, False otherwise.
- Return type
bool
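The pruning rule above (a node is removed when it has no children left and the user holds no reference to it, with children handled recursively) can be sketched as follows. `PrunableNode` is a hypothetical stand-in, and the children-first recursion is an assumption about ordering that matches the description.

```python
class PrunableNode:
    """Minimal stand-in for a graph node; not PyRDF's Node class."""

    def __init__(self, name):
        self.name = name
        self.children = []
        self.has_user_references = True

    def graph_prune(self):
        """Prune dead descendants, then report whether this node is dead."""
        # Recurse first, keeping only the children that must survive.
        self.children = [c for c in self.children if not c.graph_prune()]
        # A node is dead if nothing hangs below it and the user dropped it.
        return not self.children and not self.has_user_references


head = PrunableNode("head")
kept = PrunableNode("kept")
dropped = PrunableNode("dropped")
leaf = PrunableNode("leaf-of-dropped")
head.children = [kept, dropped]
dropped.children = [leaf]

# The user lets go of the whole 'dropped' branch.
dropped.has_user_references = False
leaf.has_user_references = False

head_is_dead = head.graph_prune()
remaining = [c.name for c in head.children]
```

Pruning the leaf first is what allows its parent to become childless and therefore prunable in the same pass.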
-
The Operation module¶
-
class
PyRDF.Operation.
Operation
(name, *args, **kwargs)[source]¶ A generic representation of an operation. The operation could be a transformation or an action.
-
Types
¶ A class member that is an Enum of the types of operations supported. This can be either ACTION, TRANSFORMATION or INSTANT_ACTION.
-
name
¶ Name of the current operation.
- Type
str
-
args
¶ Variable length argument list for the current operation.
- Type
list
-
kwargs
¶ Arbitrary keyword arguments for the current operation.
- Type
dict
-
op_type
¶ The type or category of the current operation (ACTION, TRANSFORMATION or INSTANT_ACTION).
For the list of operations that your current backend supports, try:
Example:
import PyRDF
PyRDF.use(...)  # Choose a backend
print(PyRDF.current_backend.supported_operations)
-
class
Types
An enumeration.
-
__init__
(name, *args, **kwargs)[source]¶ Creates a new Operation for the given name and arguments.
- Parameters
name (str) – Name of the current operation.
*args (list) – Variable length argument list for the current operation.
**kwargs (dict) – Keyword arguments for the current operation.
-
is_action
()[source]¶ Checks if the current operation is an action.
- Returns
True if the current operation is an action, False otherwise.
- Return type
bool
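A record with the attributes listed above (`name`, `args`, `kwargs`, `op_type`) and an `is_action` check could look like the sketch below. `SketchOperation` and its small lookup table are hypothetical. The table covers only a few RDataFrame operation names for illustration, and treating `INSTANT_ACTION` as distinct from `ACTION` in `is_action` is an assumption.

```python
from enum import Enum


class SketchOperation:
    """Minimal stand-in for an operation record; not PyRDF's Operation."""

    Types = Enum("Types", "ACTION TRANSFORMATION INSTANT_ACTION")

    # Illustrative lookup table covering a handful of RDataFrame operations.
    _KNOWN = {
        "Define": Types.TRANSFORMATION,
        "Filter": Types.TRANSFORMATION,
        "Count": Types.ACTION,
        "Histo1D": Types.ACTION,
        "Snapshot": Types.INSTANT_ACTION,
    }

    def __init__(self, name, *args, **kwargs):
        if name not in self._KNOWN:
            raise ValueError("unsupported operation: {!r}".format(name))
        self.name = name
        self.args = list(args)
        self.kwargs = kwargs
        self.op_type = self._KNOWN[name]

    def is_action(self):
        # Assumption: only plain ACTION counts; instant actions are separate.
        return self.op_type == self.Types.ACTION


flt = SketchOperation("Filter", "x > 0")
cnt = SketchOperation("Count")
snap = SketchOperation("Snapshot", "tree", "out.root")
```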
-
The Proxy module¶
-
class
PyRDF.Proxy.
ActionProxy
(node)[source]¶ Instances of ActionProxy act as futures of the result produced by some action node. They implement a lazy synchronization mechanism, i.e., when they are accessed for the first time, they trigger the execution of the whole RDataFrame graph.
-
GetValue
()[source]¶ Returns the result value of the current action node if it was executed before, else triggers the execution of the entire PyRDF graph before returning the value.
- Returns
The value of the current action node, obtained after executing the current action node in the computational graph.
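The lazy behaviour of `GetValue` (execute the whole graph on first access, reuse the stored result afterwards) can be sketched with two small stand-in classes. `LazyResult` and `SketchGraph` are hypothetical, and the graph here is reduced to a dictionary of deferred computations.

```python
class SketchGraph:
    """Holds deferred computations and counts how often they are run."""

    def __init__(self, computations):
        self._computations = computations
        self.values = {}
        self.executed = False
        self.runs = 0

    def execute(self):
        # One pass fills in the value of every booked action.
        self.values = {k: f() for k, f in self._computations.items()}
        self.executed = True
        self.runs += 1


class LazyResult:
    """Minimal future-like wrapper; not PyRDF's ActionProxy."""

    def __init__(self, graph, key):
        self._graph = graph
        self._key = key

    def GetValue(self):
        # Run the whole graph only if nobody has done so yet.
        if not self._graph.executed:
            self._graph.execute()
        return self._graph.values[self._key]


graph = SketchGraph({"count": lambda: 3, "sum": lambda: 1 + 2 + 3})
count = LazyResult(graph, "count")
total = LazyResult(graph, "sum")
runs_before = graph.runs
values = (count.GetValue(), total.GetValue())
```

Both results are served by a single execution, which is the point of booking several actions before asking for any value.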
-
-
class
PyRDF.Proxy.
Proxy
(node)[source]¶ Abstract class for proxy objects. These objects help to keep track of whether nodes are assigned to variables. That is, when a node is no longer assigned to a variable by the user, the role of the proxy is to signal that. This is done by changing the value of the has_user_references attribute of the proxied node from True to False.
-
__del__
()[source]¶ This function is called right before the current Proxy gets deleted by Python. Its purpose is to show that the wrapped node has no more user references, which is one of the conditions for the node to be pruned from the computational graph.
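The mechanism described for `__del__` can be sketched with a wrapper that flips a flag on the wrapped object when the wrapper is destroyed. `TrackedNode`, `SketchProxy` and the attribute name `proxied_node` are hypothetical stand-ins.

```python
import gc


class TrackedNode:
    """Minimal stand-in for a graph node."""

    def __init__(self):
        self.has_user_references = True


class SketchProxy:
    """Wraps a node and flags it when the wrapper itself is destroyed."""

    def __init__(self, node):
        self.proxied_node = node

    def __del__(self):
        # The user no longer holds this proxy, so the node becomes prunable.
        self.proxied_node.has_user_references = False


node = TrackedNode()
proxy = SketchProxy(node)
flag_while_held = node.has_user_references

del proxy     # drop the only reference to the proxy
gc.collect()  # CPython finalizes on del; this covers other collectors
flag_after_del = node.has_user_references
```

The node itself stays alive because the graph still references it; only the user-facing wrapper is gone.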
-
The RDataFrame module¶
-
class
PyRDF.RDataFrame.
HeadNode
(*args)[source]¶ The Python equivalent of ROOT C++’s RDataFrame class.
-
args
¶ A list of arguments that were provided to construct the RDataFrame object.
- Type
list
PyRDF’s RDataFrame constructor accepts the same arguments as ROOT’s RDataFrame constructor (see RDataFrame).
In addition, PyRDF allows you to use Python lists in place of C++ vectors as arguments of the constructor, for example:
PyRDF.RDataFrame("myTree", ["file1.root", "file2.root"])
- Raises
RDataFrameException – An exception raised when input arguments to the RDataFrame constructor are incorrect.
-
__init__
(*args)[source]¶ Creates a new RDataFrame instance for the given arguments.
- Parameters
*args (list) – Variable length argument list to construct the RDataFrame object.
-
get_inputfiles
()[source]¶ Get list of input files.
This list can be extracted from a given TChain or from the list of arguments.
- Returns
Name of a single file, list of files (both may contain globbing characters), or None if there are no input files.
- Return type
(str, list, None)
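The three documented return shapes (a single name, a list of names, or None) can be illustrated with a simplified sketch that looks only at the constructor arguments. The function name `guess_input_files` is hypothetical, and the TChain case mentioned above is deliberately left out because it needs ROOT.

```python
def guess_input_files(*args):
    """Return the file argument of an RDataFrame-style call, or None."""
    # A string followed by a string or a list is read as
    # (tree name, file name or list of file names).
    if len(args) >= 2 and isinstance(args[0], str):
        files = args[1]
        if isinstance(files, str):
            return files
        if isinstance(files, (list, tuple)):
            return list(files)
    # Anything else, e.g. a bare entry count, carries no input files.
    return None


one_file = guess_input_files("myTree", "data_*.root")
many_files = guess_input_files("myTree", ["file1.root", "file2.root"])
no_files = guess_input_files(100)
```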
-
get_num_entries
()[source]¶ Gets the number of entries in the given dataset.
- Returns
This is the computed number of entries in the input dataset.
- Return type
int
-
-
class
PyRDF.RDataFrame.
RDataFrame
[source]¶ User interface to the object containing the Python equivalent of ROOT C++’s RDataFrame class. The purpose of this class is to kickstart the head node of the computational graph, together with a proxy wrapping it.