As of this writing (May 6, 2022), hyperthreading is enabled on drp-neh-cmp001 and drp-neh-cmp007 (and perhaps elsewhere).

The case for enabling hyperthreading

In Progress

Arguments against hyperthreading

Chris O'Grady writes:

On May 6, 2022, at 5:52 PM, Perazzo, Amedeo <perazzo@slac.stanford.edu> wrote:



I had the rhino thought when we installed drp-srcf so we decided to enable hyperthreading (the AMD one, which has a different name) on the new system and we asked Chris (O'Grady) to give it another try. After Chris' tests we decided there was no value in hyperthreading, even on AMD, and we disabled it entirely on drp-srcf. Chris, do you remember?


I had forgotten, but you are correct Amedeo.  My results (for mpi-psana analysis) are here:

https://confluence.slac.stanford.edu/display/PCDS/Scaling+Measurements

It looks like an early test suggested HT helped, but when I tried to reproduce it later on I couldn’t, so we disabled it.  Many years ago I also benchmarked some quantum-chemistry code with/without HT and it didn’t help there either.

chris

March 2015 email (LCLS-I)

From: Ford, Christopher <caf@slac.stanford.edu>
Sent: Thursday, March 26, 2015 4:45 PM
To: pcds-daq-l
Subject: SXR: hyperthreading enabled on daq-sxr-cam02 and '03?

Folks,

While testing the Andor camera  I learned that some DAQ nodes were known to work better than others for this USB-based device.
Tomy writes, "One example I remember is that daq-sxr-cam01 was okay to run for hours, but daq-sxr-cam{02,03} would hang after some minutes."
Today I took a closer look, and I noticed that hyperthreading seems to be enabled on daq-sxr-cam02 and '03, where Andor fails.
Hyperthreading seems to be *disabled* on daq-sxr-cam01, where Andor runs well.

Even if hyperthreading is not proven to cause Andor failures on daq-sxr-cam02 and '03, it should not be enabled there.
Hyperthreading is not a feature useful for real-time systems, and it's not worth our time debugging this extra unknown.

Thanks,
 -caf

[caf@psdev02 03]$ ssh daq-sxr-cam01 cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list
0
[caf@psdev02 03]$ ssh daq-sxr-cam02 cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list
0,8
[caf@psdev02 03]$ ssh daq-sxr-cam03 cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list
0,8
[caf@psdev02 03]$
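
The check used in the transcript above can be automated. The following is a minimal sketch, not an existing tool: it assumes a Linux node that exposes the same sysfs file, and the function names parse_cpu_list and smt_active are hypothetical. It treats a thread_siblings_list with more than one logical CPU (for example "0,8") as hyperthreading being active on that core, and a single entry ("0") as hyperthreading being off.

```python
from pathlib import Path

# Same sysfs file that the March 2015 transcript reads with `cat`.
SIBLINGS_PATH = Path("/sys/devices/system/cpu/cpu0/topology/thread_siblings_list")


def parse_cpu_list(text):
    """Expand a sysfs CPU list such as '0,8' or '0-1' into a list of CPU numbers."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            # Ranges are inclusive, e.g. '0-1' means CPUs 0 and 1.
            low, high = part.split("-")
            cpus.extend(range(int(low), int(high) + 1))
        else:
            cpus.append(int(part))
    return cpus


def smt_active(siblings_text):
    """Return True if the core has more than one hardware thread online."""
    return len(parse_cpu_list(siblings_text)) > 1


if __name__ == "__main__":
    # Only meaningful on Linux; elsewhere the file does not exist.
    if SIBLINGS_PATH.exists():
        state = "enabled" if smt_active(SIBLINGS_PATH.read_text()) else "disabled"
        print(f"hyperthreading appears to be {state} on cpu0")
    else:
        print(f"{SIBLINGS_PATH} not found")
```

Run on the node itself or over ssh, as in the transcript. This inspects cpu0 only; a node with some cores offlined could report differently for other cores.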