...
Run "systemctl daemon-reload" to pick up the changed .conf file above. Omar still needs to find out how to tell RHEL7 to find the .ko files in /cds/sw/package/daq/modules/ (i.e., this is still a work in progress).
Libfabric
Libfabric supplies a program called fi_info that lists the available providers for transferring data between nodes and processes. The results are sorted from highest to lowest performing. Options can be given to filter the list by various features or capabilities. On systems with InfiniBand, the verbs provider is returned as the most performant interface. On systems without InfiniBand, the tcp provider is listed as the most performant.
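To see which provider libfabric would pick on a given node, you can read the fi_info output in order. The sketch below parses text in the layout fi_info typically prints: a "provider:" line followed by indented "fabric:", "domain:" and similar fields. It returns an ordered list of entries. The helper name `parse_fi_info` and the sample text are illustrative assumptions and were not captured from a DAQ node.

```python
from typing import Dict, List


def parse_fi_info(text: str) -> List[Dict[str, str]]:
    """Split fi_info-style text into one dict per entry, preserving order.

    Assumption: each entry starts with a 'provider:' line, and the
    'key: value' lines after it belong to that entry until the next
    'provider:' line.
    """
    entries: List[Dict[str, str]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or ":" not in line:
            continue  # skip blank or unrecognized lines
        key, value = (part.strip() for part in line.split(":", 1))
        if key == "provider":
            entries.append({"provider": value})  # start a new entry
        elif entries:
            entries[-1][key] = value  # attach the field to the current entry
    return entries


# Hypothetical output in the usual fi_info layout (not captured from UED).
SAMPLE = """\
provider: verbs
    fabric: IB-0xfe80000000000000
    domain: mlx5_0
provider: tcp
    fabric: 172.21.36.0/24
    domain: enp129s0
"""

entries = parse_fi_info(SAMPLE)
# With no constraints, the first entry is the one libfabric would choose.
print([e["provider"] for e in entries])  # ['verbs', 'tcp']
```

Fields the parser doesn't know about, such as version, type or protocol, are kept as extra keys. This means the sketch does not depend on the exact set of fields a given libfabric release prints.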
Our code follows this same pattern, so absent constraining parameters, libfabric chooses the highest performing interface it has access to. The following kwargs exist to narrow the selection:
- ep_domain: Forces the use of a particular domain (physical network interface)
- ep_fabric: Forces the use of a particular fabric
- ep_provider: Forces the use of a particular provider
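The narrowing these kwargs perform can be sketched as a filter over an ordered list of candidates. Each unset kwarg acts as a wildcard, and the first entry that survives every filter is chosen. This is a hypothetical model, not the DAQ's code. The real code presumably passes the values to libfabric's fi_getinfo as hints (for example `fabric_attr->prov_name`, `fabric_attr->name` and `domain_attr->name`), but that mapping is an assumption here. The name `select_entry` and the sample entries are also illustrative.

```python
from typing import Dict, List, Optional


def select_entry(entries: List[Dict[str, str]],
                 ep_provider: Optional[str] = None,
                 ep_fabric: Optional[str] = None,
                 ep_domain: Optional[str] = None) -> Optional[Dict[str, str]]:
    """Return the first entry matching every constraint that was given.

    Entries are assumed to be sorted best-first, as fi_info lists them.
    A constraint left as None matches anything.
    """
    wanted = {"provider": ep_provider, "fabric": ep_fabric, "domain": ep_domain}
    for entry in entries:
        if all(value is None or entry.get(key) == value
               for key, value in wanted.items()):
            return entry
    return None  # nothing satisfies the constraints


# Hypothetical candidates, best-performing first.
ENTRIES = [
    {"provider": "verbs", "fabric": "IB-0xfe80000000000000", "domain": "mlx5_0"},
    {"provider": "tcp", "fabric": "172.21.36.0/24", "domain": "enp129s0"},
    {"provider": "sockets", "fabric": "172.21.36.0/24", "domain": "enp129s0"},
]

print(select_entry(ENTRIES)["provider"])                        # verbs
print(select_entry(ENTRIES, ep_domain="enp129s0")["provider"])  # tcp
print(select_entry(ENTRIES, ep_provider="sockets",
                   ep_domain="enp129s0")["provider"])           # sockets
```

Because filtering never reorders the list, the chosen entry is always the highest-performing one that meets the constraints. This matches the "absent constraining parameters, choose the best" behaviour described above.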
The valid kwarg values are those listed by the fi_info command. More information about the various providers can be found on the libfabric GitHub site, in the README and in the Provider Feature Matrix on the Wiki tab. These are updated with each release.
Recently, a problem in UED turned out to be caused by the provider libfabric chose. The highest-performing network hardware on the UED machines is 100 Gbit/s Mellanox mlx5 interfaces. Although these are capable of running InfiniBand, we run Ethernet over them. By default, libfabric chooses the verbs provider for these interfaces, so the parameters above were created to force the tcp provider to be selected instead. This was done in ued.cnf with the line:
```
kwargs = 'ep_fabric="172.21.36.0/24",ep_domain=enp129s0'
```
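This page does not describe how the DAQ splits this string into individual settings. As a hypothetical sketch, a comma-separated key=value string with optional double-quoted values could be parsed as shown below. The name `parse_kwargs` and the quoting rules are assumptions.

```python
import re
from typing import Dict

# Hypothetical grammar: key=value pairs separated by commas, where a value
# is either double-quoted (and may then contain '/' or '.') or runs to the
# next comma.
_PAIR = re.compile(r'(\w+)=("[^"]*"|[^,]*)')


def parse_kwargs(spec: str) -> Dict[str, str]:
    """Turn 'k1=v1,k2="v2"' into {'k1': 'v1', 'k2': 'v2'} with quotes stripped."""
    return {key: value.strip('"') for key, value in _PAIR.findall(spec)}


kwargs = 'ep_fabric="172.21.36.0/24",ep_domain=enp129s0'
print(parse_kwargs(kwargs))
# {'ep_fabric': '172.21.36.0/24', 'ep_domain': 'enp129s0'}
```

The same sketch handles an unquoted provider setting such as 'ep_provider=sockets,ep_domain=enp129s0'. Quoting is only needed when a value could otherwise be confused with the separators.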
The tcp provider is a replacement for the deprecated sockets provider, which was originally used to commission running the DAQ over Ethernet. The sockets provider behaves similarly to verbs but differently from tcp. To get UED going again, I switched to the sockets provider, which appears to solve the problem:
```
kwargs = 'ep_provider=sockets,ep_domain=enp129s0'
```
(I now think the fabric specification was redundant.) I have a guess as to what difference between tcp and sockets caused the problem, but it needs further investigation.