Steps for New User
- request unix-admin to give them login access to the suncat machines
- add them to the LSF queue permissions list with ypgroup adduser -group suncat-norm -user <username> [-user ...]
- if appropriate, add them to majordomo list with approve <password> subscribe suncat-list <emailaddr>
- if appropriate, add them to .mailrc suncatcomp list
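The checklist above can be collected into a small helper that prints the commands for review instead of running them. This is only a sketch: `new_user_steps` is a hypothetical name, the majordomo password is deliberately left as a placeholder, and the unix-admin request and the .mailrc edit remain manual steps.

```shell
#!/bin/sh
# Print (do not run) the account-setup commands for one new user.
# new_user_steps is a hypothetical helper name, not an existing site tool.
new_user_steps() {
    user="$1"
    email="${2:-}"
    # LSF queue permissions list (command taken from the checklist above)
    echo "ypgroup adduser -group suncat-norm -user $user"
    # Optional majordomo subscription; <password> is intentionally left unfilled
    if [ -n "$email" ]; then
        echo "approve <password> subscribe suncat-list $email"
    fi
}
```

For example, `new_user_steps jdoe jdoe@example.org` prints both lines, while `new_user_steps jdoe` prints only the ypgroup line.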
Serial Port Linux Console Access
If you have a Kerberos ticket, you can connect to the serial port
console from any machine (e.g., your desktop, iris, noric, etc.).
...
To disconnect, use
Ctrl-e c .
hpiLO Command Line Interface
NOTE: this is for the suncatlsX/suncatfsX nodes ONLY. The suncat0XXX nodes have a BMC/IPMI interface instead (see below).
...
Sometimes the console baud rates are messed up. They can be changed with ^e-c-p. Choose 56000 or 9600 typically.
BMC/IPMI Command Line Interface
NOTE: this is for the suncat0XXX nodes ONLY. The suncatlsX/suncatfsX nodes have an hpiLO interface instead (see above).
...
    -----------------------------------------------------
    login: Login timed out

    Red Hat Enterprise Linux Client release 5.5 (Tikanga)
    Kernel 2.6.18-194.11.1.el5 on an x86_64

    suncat0006 login:
    -----------------------------------------------------
ipmitool
The command

    sudo ipmitool sel elist

lists the entries in the System Event Log (SEL), which is where hardware errors show up.
Linux kernel SysRq facility
When you are connected to the Linux serial port console (and NOT the BMC) you
can send the kernel SysRq commands. See this page if you are unfamiliar with
the Linux kernel SysRq facility:
...
The capital letters show the character that you use for each action. For example,
to show current memory statistics, you use 'M'.
System Monitoring/History: Ganglia and Nagios
For CPU usage monitoring:
...
There is also some history in /scswork/ranger
How bsub Command Functions
bsub is a script written by Neal Adams which calls the "real" executable:
...
Depending on the "-a" option (for suncat this is typically "openmpi"), bsubx calls an "esub" script in the LSF "etc" directory. The esub script in turn points to another wrapper script in the LSF "bin" directory; for openmpi this is "openmpirun_wrapper", and it is this wrapper that executes the mpirun command. mpirun uses "lsgrun" on the master node to direct the "res" daemons on the slave nodes to start the executables.
Batch Commands
    lsload -R suncat      (show CPU loading of all suncat machines)
    lshosts -R suncat     (show list of suncat machines and associated info)
    bhosts -w suncatfarm  (show status of hosts, from a batch perspective)
    bacct -u all -b -q "suncat-xlong suncat-long suncat-medium suncat-short" -C "2010/9/25," > bacct.out &
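The output of `bhosts -w suncatfarm` can be filtered to list only the hosts that LSF cannot reach. The sketch below is a hypothetical helper (`down_hosts` is not an existing tool) and assumes the usual bhosts layout: one header line, then the host name in the first column and the status in the second, with `unavail` and `unreach` marking hosts that are down from the batch system's point of view.

```shell
#!/bin/sh
# List hosts that bhosts reports as unavail or unreach.
# Reads "bhosts -w" style output on stdin. Assumed layout: one header line,
# then rows of the form "HOST_NAME STATUS ...".
# down_hosts is a hypothetical helper name.
down_hosts() {
    awk 'NR > 1 && ($2 == "unavail" || $2 == "unreach") { print $1 }'
}
```

Typical use would be `bhosts -w suncatfarm | down_hosts`. Hosts that are merely closed (for example, full) are not reported.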
Access to BIOS
For the farm nodes, at the command line type
...
For the login nodes, the best I have been able to do is hit ESC-9. This gives the "rbsu" CLI to view/control BIOS settings. We have only been able to get the VT100-graphics version of the BIOS with a crash cart.
Running Disk Tests
Quick stats:
    /usr/sbin/smartctl -a /dev/sda

The longer testing mode can be done while jobs are running (non-destructive).
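A minimal sketch of the longer test, using smartctl's extended self-test (`-t long`) and its self-test log (`-l selftest`). The function names and the `SMARTCTL` override are hypothetical conveniences; /dev/sda is just the device from the example above, and the commands will normally need root.

```shell
#!/bin/sh
# Start an extended (long) SMART self-test and read back the self-test log.
# SMARTCTL can be overridden (e.g. SMARTCTL=echo) to preview the command
# line without touching the disk. Function names are hypothetical.
SMARTCTL="${SMARTCTL:-/usr/sbin/smartctl}"

start_long_test() {
    dev="${1:-/dev/sda}"
    # The test runs inside the drive in the background; this returns at once.
    $SMARTCTL -t long "$dev"
}

show_selftest_log() {
    dev="${1:-/dev/sda}"
    # Lists completed and in-progress self-tests with their results.
    $SMARTCTL -l selftest "$dev"
}
```

After `start_long_test /dev/sda`, smartctl prints an estimated completion time; `show_selftest_log /dev/sda` can be run afterwards to see the result.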
Running Memory Tests
From Ole: run memtest86+ (from memtest.org).
Power Controller History
Copy ppic from /nfs/slac/g/suncatfs/sw/package/PPIC_LinuxV1/ppic to a local directory such as /tmp (sudo can't execute binaries that live on NFS), then run
ppic -d -v
The ipmi daemon must be running for ppic to talk to the "carbondale" chip
(if it is not, ppic will try to start ipmi itself, but doesn't do it right).
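The copy step can be wrapped in a small helper. This is a sketch only: `stage_ppic` is a hypothetical name, and the use of sudo in the printed command is inferred from the note about sudo and NFS rather than stated outright.

```shell
#!/bin/sh
# Copy ppic off NFS to a local directory and print the command to run next.
# stage_ppic is a hypothetical helper name; the defaults are the source path
# and /tmp from the notes above.
stage_ppic() {
    src="${1:-/nfs/slac/g/suncatfs/sw/package/PPIC_LinuxV1/ppic}"
    dest_dir="${2:-/tmp}"
    cp "$src" "$dest_dir/ppic" &&
        chmod +x "$dest_dir/ppic" &&
        # Assumption: ppic is run under sudo, as the NFS note implies.
        echo "sudo $dest_dir/ppic -d -v"
}
```

Running `stage_ppic` with no arguments copies the binary to /tmp/ppic and prints the command line to use.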
Checking RAID Controller
install rpm for hpacucli
/nfs/slac/g/suncatfs/sw/package/smartshow
...
1I:1:1: OK: 500 GB: HP MM0500FAMYT : HPD3
1I:1:2: OK: 500 GB: HP MM0500FAMYT : HPD3
2E:1:1: OK: 600 GB: HP EF0600FARNA : HPD2
2E:1:2: OK: 600 GB: HP EF0600FARNA : HPD2
2E:1:3: OK: 600 GB: HP EF0600FARNA : HPD2
2E:1:4: OK: 600 GB: HP EF0600FARNA : HPD2
2E:1:5: OK: 600 GB: HP EF0600FARNA : HPD2
2E:1:6: OK: 600 GB: HP EF0600FARNA : HPD2
2E:1:7: OK: 600 GB: HP EF0600FARNA : HPD2
2E:1:8: OK: 600 GB: HP EF0600FARNA : HPD2
2E:1:9: OK: 600 GB: HP EF0600FARNA : HPD2
2E:1:10: OK: 600 GB: HP EF0600FARNA : HPD2
2E:1:11: OK: 600 GB: HP EF0600FARNA : HPD2
2E:1:12: OK: 600 GB: HP EF0600FARNA : HPD2
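A listing like the one above can be scanned for drives whose status is anything other than OK. The sketch below assumes each drive line has the layout shown above, that is, position, status, size, model and firmware separated by ": ". `failed_drives` is a hypothetical helper name.

```shell
#!/bin/sh
# Print the position and status of every drive that is not "OK".
# Reads the drive listing on stdin. Assumed line layout (taken from the
# listing above): "<position>: <status>: <size>: <model> : <firmware>"
# failed_drives is a hypothetical helper name.
failed_drives() {
    # Splitting on ": " keeps the colons inside the position (e.g. 2E:1:3).
    # Lines with fewer than three fields are not drive lines and are skipped.
    awk -F': ' 'NF >= 3 && $2 != "OK" { print $1, $2 }'
}
```

With every drive healthy, as in the listing above, the function prints nothing.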
How Karl Monitors the NFS RAID Status
There is a cron job on suncatfs1 which checks the status of the RAID cards
every two hours. The current status is updated in this file:
...