Miscellaneous Parameters

rmem_max

# echo 268435456 > /proc/sys/net/core/rmem_max

vm.max_map_count

For the pgp driver, this parameter needs to be increased in /etc/sysctl.conf:

# grep vm /etc/sysctl.conf 
vm.max_map_count=1000000
#
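
The two settings above can be verified on a node without changing anything. The following is a minimal sketch, not an established DAQ tool: the function name sysctl_at_least and its optional third argument (an alternate root directory, used only for testing) are hypothetical. It reads the live value from /proc/sys, which is also where the echo command above writes. Note that a value written with echo does not survive a reboot, whereas entries in /etc/sysctl.conf are applied at boot or after running sysctl -p.

```shell
# Hypothetical helper: succeed if a sysctl key is at least a minimum value.
#   $1 = key, e.g. vm.max_map_count
#   $2 = minimum acceptable value
#   $3 = root directory to read from (defaults to /proc/sys; overridable for testing)
sysctl_at_least() {
    local key="$1" min="$2" root="${3:-/proc/sys}" path cur
    # Map dotted key to a path: net.core.rmem_max -> net/core/rmem_max
    path="$root/$(printf '%s' "$key" | tr . /)"
    if [ ! -r "$path" ]; then
        echo "$key: cannot read $path" >&2
        return 2
    fi
    cur=$(cat "$path")
    if [ "$cur" -ge "$min" ]; then
        echo "$key=$cur OK"
    else
        echo "$key=$cur is below $min"
        return 1
    fi
}
```

With the values from this page, the checks would be: sysctl_at_least net.core.rmem_max 268435456 and sysctl_at_least vm.max_map_count 1000000.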

DAQ Setup of DSS/FFB Nodes (LCLS-I)

Link here

CPU Frequency Governor

All daq nodes should run the cpu frequency governor in "performance" mode.
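
One way to check this rule on a node is to list every CPU whose governor is something other than "performance". This is a minimal sketch under stated assumptions: the function name governor_violations is hypothetical, and the optional argument (an alternate sysfs directory) exists only so the function can be exercised against test files. The sysfs path is the same one used in the transcripts on this page.

```shell
# Hypothetical helper: print one line per CPU whose governor is not "performance".
#   $1 = sysfs CPU directory (defaults to the real one; overridable for testing)
governor_violations() {
    local root="${1:-/sys/devices/system/cpu}" f rel gov
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -r "$f" ] || continue        # no cpufreq support, or the glob did not match
        gov=$(cat "$f")
        rel=${f#"$root"/}              # e.g. cpu0/cpufreq/scaling_governor
        [ "$gov" = performance ] || echo "${rel%%/*}: $gov"
    done
}
```

An empty result means every CPU that exposes a governor is in "performance" mode. A node with no cpufreq support also produces an empty result, so this check alone does not distinguish the two cases.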

As of this writing (May 2, 2022), the DAQ node daq-xpp-cam02 is not running the CPU frequency governor in "performance" mode.
It appears to be running in "ondemand" mode, which "tries to use the slowest speed as much as possible, but quickly switches up or down when needed."
The transcript below also shows daq-xpp-dss03 running in "powersave" mode.

XPP: WRONG scaling_governor setting
$ hostname
daq-xpp-cam02
$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
ondemand
$

===========

$ hostname
daq-xpp-dss03
$ cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
powersave
powersave
powersave
powersave
powersave
powersave
$

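As of this writing (May 4, 2022), two MEC pgp nodes differ in their scaling_governor settings: daq-mec-pgp01 is running in "powersave" mode, while daq-mec-pgp02 is running in "performance" mode.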

MEC: INCONSISTENT scaling_governor settings
-bash-4.2$ hostname
daq-mec-pgp01
-bash-4.2$ cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
powersave
powersave
powersave
powersave
powersave
powersave
powersave
powersave
powersave
powersave
powersave
powersave
powersave
powersave
powersave
powersave
-bash-4.2$

==============================

-bash-4.2$ hostname
daq-mec-pgp02
-bash-4.2$ cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
performance
performance
performance
performance
performance
performance
performance
performance
performance
performance
performance
performance
performance
performance
performance
performance
-bash-4.2$

LCLS-I dss nodes are running in 'powersave' mode
-bash-4.2$ hostname
daq-xcs-dss03
-bash-4.2$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
powersave
-bash-4.2$ 
============
-bash-4.2$ hostname
daq-mec-dss01
-bash-4.2$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
powersave
-bash-4.2$
===========
-bash-4.2$ hostname
daq-xpp-dss03
-bash-4.2$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
powersave
-bash-4.2$
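
For nodes found in "powersave" or "ondemand" mode, the governor can be changed by writing to the same sysfs files that the transcripts above read. The sketch below is an illustration, not the procedure used on the DAQ nodes: the function name set_performance_governor and its optional directory argument are hypothetical. It assumes that root privileges are available and that "performance" appears in each CPU's scaling_available_governors. The change made this way does not persist across a reboot.

```shell
# Hypothetical helper: write "performance" into every scaling_governor file.
#   $1 = sysfs CPU directory (defaults to the real one; overridable for testing)
set_performance_governor() {
    local root="${1:-/sys/devices/system/cpu}" f
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -e "$f" ] || continue        # no cpufreq support, or the glob did not match
        if [ -w "$f" ]; then
            echo performance > "$f"
        else
            echo "skipping $f (not writable; root is required)" >&2
        fi
    done
}
```

After running it, cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor should report "performance" on every line.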

See earlier notes on "CPU Freq Scaling" here.

Hyperthreading

As of this writing (May 6, 2022), hyperthreading is enabled on drp-neh-cmp001 and drp-neh-cmp007 (and perhaps elsewhere).

Arguments for hyperthreading

Ric Claus writes:

I can say that the state of the hyperthreading flag is not important for running with trigger rates of 71 kHz and below, but I’m not ready to say that we can dispense with it for 1 MHz running.

Arguments against hyperthreading

Chris O'Grady writes, in reply to Amedeo Perazzo:

On May 6, 2022, at 5:52 PM, Perazzo, Amedeo <perazzo@slac.stanford.edu> wrote:


I had the rhino thought when we installed drp-srcf so we decided to enable hyperthreading (the AMD one, which has a different name) on the new system and we asked Chris (O'Grady) to give it another try. After Chris' tests we decided there was no value in hyperthreading, even on AMD, and we disabled it entirely on drp-srcf. Chris, do you remember?


I had forgotten, but you are correct Amedeo.  My results (for mpi-psana analysis) are here.

It looks like an early test suggested HT helped, but when I tried to reproduce it later on I couldn’t, so we disabled it.  Many years ago I also benchmarked some quantum-chemistry code with/without HT and it didn’t help there either.

chris

March 2015 email (LCLS-I)
From: Ford, Christopher <caf@slac.stanford.edu>
Sent: Thursday, March 26, 2015 4:45 PM
To: pcds-daq-l
Subject: SXR: hyperthreading enabled on daq-sxr-cam02 and '03?

Folks,

While testing the Andor camera, I learned that some DAQ nodes were known to work better than others for this USB-based device.
Tomy writes, "One example I remember is that daq-sxr-cam01 was okay to run for hours, but daq-sxr-cam{02,03} would hang after some minutes."
Today I took a closer look, and I noticed that hyperthreading seems to be enabled on daq-sxr-cam02 and '03, where Andor fails.
Hyperthreading seems to be *disabled* on daq-sxr-cam01, where Andor runs well.

Even if hyperthreading is not proven to cause Andor failures on daq-sxr-cam02 and '03, it should not be enabled there.
Hyperthreading is not a feature useful for real-time systems, and it's not worth our time debugging this extra unknown.

Thanks,
 -caf

[caf@psdev02 03]$ ssh daq-sxr-cam01 cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list
0
[caf@psdev02 03]$ ssh daq-sxr-cam02 cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list
0,8
[caf@psdev02 03]$ ssh daq-sxr-cam03 cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list
0,8
[caf@psdev02 03]$
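
The thread_siblings_list check in the email above can be wrapped in a small function. This is a minimal sketch, assuming that a sibling list naming more than one CPU (for example "0,8", or a range such as "0-1" on some systems) means hyperthreading is enabled, as in the output above. The function name hyperthreading_state and its optional directory argument are hypothetical.

```shell
# Hypothetical helper: report "enabled", "disabled", or "unknown".
#   $1 = sysfs CPU directory (defaults to the real one; overridable for testing)
hyperthreading_state() {
    local root="${1:-/sys/devices/system/cpu}" f seen=0
    for f in "$root"/cpu[0-9]*/topology/thread_siblings_list; do
        [ -r "$f" ] || continue
        seen=1
        case $(cat "$f") in
            *[,-]*) echo enabled; return 0 ;;   # more than one hardware thread per core
        esac
    done
    # No file listed a sibling: disabled if any file was readable, otherwise unknown
    if [ "$seen" -eq 1 ]; then echo disabled; else echo unknown; fi
}
```

To survey several nodes from a login host, the function body could be run over ssh in the same way as the cat commands above.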