Steps for New User

  • ask unix-admin to give the new user login access to the queues
  • add the user to the LSF queue permissions list with ypgroup adduser -group suncat-norm -user <username>
  • add the user to the majordomo list with approve <password> subscribe suncat-l <emailaddr>
  • if appropriate, add the user to the suncatcomp list in .mailrc
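
As a convenience, the two command lines from the list above can be printed with the username and address filled in. This is only a sketch: the function name new_user_steps and the sample values jdoe and jdoe@example.org are hypothetical, the <password> placeholder is left as it appears above, and nothing is executed.

```shell
# Hypothetical helper: print (do not run) the per-user commands from the list above.
# $1 = username, $2 = email address. <password> is left as a placeholder.
new_user_steps() {
  user=$1
  email=$2
  echo "ypgroup adduser -group suncat-norm -user $user"
  echo "approve <password> subscribe suncat-l $email"
}

# Example with made-up values:
new_user_steps jdoe jdoe@example.org
```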

Serial Port Linux Console Access

If you have a Kerberos ticket, you can connect to the serial port
console from any machine (e.g., your desktop, iris, noric, etc.).

To connect to the serial console:

$ /usr/local/bin/console suncat0006

Or, if your machine has a private /usr/local, use this command:

$ /afs/slac/local/bin/console suncat0006
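
The choice between the two paths above could be sketched as a small shell function that prefers the local copy when it exists and otherwise falls back to the AFS copy. The name pick_console is hypothetical. Testing for an executable at /usr/local/bin/console is an assumption about how a private /usr/local shows up in practice.

```shell
# Hypothetical helper: print the path of the console client to use.
# Assumption: a machine with a private /usr/local has no executable
# /usr/local/bin/console, so we fall back to the AFS copy.
pick_console() {
  if [ -x /usr/local/bin/console ]; then
    echo /usr/local/bin/console
  else
    echo /afs/slac/local/bin/console
  fi
}

# Show which client would be used on this machine.
pick_console
```

To connect with whichever client was found, something like "$(pick_console)" suncat0006 should work.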

You will get this message from the console software:

[suncat0006: Attached readwrite on conserv1]

Press enter once. You should see:

-----------------------------------------------------
Red Hat Enterprise Linux Client release 5.5 (Tikanga)
Kernel 2.6.18-194.11.1.el5 on an x86_64

suncat0006 login: 
-----------------------------------------------------

That is the linux login prompt on the serial console.

All commands you give to the console software start with
the escape sequence:

Ctrl-e c (Ctrl-e, followed by a 'c')

followed by one character (sometimes two characters).
Press Ctrl-e c ? to see the possible commands:

-----------------------------------------------------
Escape seq (currently ^Ec) + char
.    disconnect
a    attach readwrite
b    display heartbeats of consoles on this server
c    change to new console
d    down (close tty) a console
e    change escape sequence
f    force attach readwrite
g    get location infomation
h    print this message
i    display info about all consoles on this server
k    set idle timeouts for this session
l1   send break (halt host!)
m    execute a macro (? for list)
o    (re)open the tty file
p    Display and select the baud rate
qy   shutdown the server
r    replay the last 20 lines
s    attach readonly
u    show status of all consoles on this server
v    show server version info
w    show all users connected to this server
x    examine -- show detailed console info
<cr> ignore/abort command
?    print this message
-----------------------------------------------------

To disconnect, use

Ctrl-e c .

BMC/IPMI Command Line Interface

You can also connect to the HP Baseboard Management Controller (ie, IPMI)
on the serial port by pressing

Esc-( (that is, escape followed by shift-9)

Then press enter once. You will get this "Login: " prompt with a capital L.
This is the BMC login prompt:

-----------------------------------------------------
Command Line Interface
Copyright 2004-2008 ServerEngines Corporation
All rights reserved.

Login:
-----------------------------------------------------

That is the login to the command-line-interface to the IPMI management controller.

Once you log in, you get this prompt:

-----------------------------------------------------
CLP Session Initiated
/./->
-----------------------------------------------------

To see valid commands, type 'show'.

There are two entry points: system1 is the Linux OS, and map1 is the Management Service Processor.

You can do things like a hard power reset, query error logs, temperature states, etc.

It is important to return to the Linux OS login prompt when you are done
using the Baseboard Management Controller. You exit out of here and switch
the serial port back to Linux by typing 'exit', followed by Esc-Q
(escape, followed by Shift-q). Then press enter once.

-----------------------------------------------------
/./-> exit

Command Line Interface (CLI)
Copyright 2004-2008 ServerEngines Corporation
All rights reserved.

Login: 
-----------------------------------------------------

After typing Esc-Q followed by enter, you will see a login prompt with
a lowercase 'login: '. This is the Linux login prompt. It will time out
after a minute or so, then it will give you the full Linux login prompt:

-----------------------------------------------------
login: Login timed out 
Red Hat Enterprise Linux Client release 5.5 (Tikanga)
Kernel 2.6.18-194.11.1.el5 on an x86_64

suncat0006 login: 
-----------------------------------------------------

ipmitool

The command

sudo ipmitool sel elist

lists the entries in the BMC's System Event Log (SEL), which is where hardware errors are recorded.

Linux kernel SysRq facility

When you are connected to the linux serial port console (and NOT the BMC) you
can send the kernel SysRq commands. See this page if you are unfamiliar with
the linux kernel SysRq facility:

http://en.wikipedia.org/wiki/Magic_SysRq_key

You can send the kernel SysRq commands by first sending a break signal on
the serial port console, followed by a single character which represents a
command (e.g., hard reset, show memory, etc.).
You can send a break followed by a space to see the possible commands.

You send a break signal using the serial console software with
Ctrl-e c l 1

(that is, Ctrl-e, followed by a c, then l (lowercase 'L'), then 1 (the number one)).

The console software "help" display says that 'l1' will halt the host.
That was only true for older versions of Solaris. It will not halt
a Linux host, or modern Solaris systems (as we have them configured).

The console software will let you know when it has sent a break signal on the line
with this output:

halt - sent

The SysRq help looks like this:

SysRq : HELP : loglevel0-8 reBoot Crashdump tErm Full kIll thaw-filesystems(J) saK showMem Nice powerOff showPc unRaw Sync showTasks Unmount shoWcpus 

The capital letters show the character that you use for each action. For example,
to show current memory statistics, you use 'M'.
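
To make the capital-letter convention easier to read, the help line can be split into one "key action" pair per line. The following is a minimal sketch; the function name sysrq_keys is hypothetical, and it expects only the list of actions (the part after "SysRq : HELP :"). The loglevel0-8 entry contains no capital letter, so this sketch skips it.

```shell
# Hypothetical helper: for each action word in the SysRq help list,
# print the capital letter(s) it contains followed by the word itself.
sysrq_keys() {
  for word in $1; do
    # Keep only the uppercase letters of the word.
    key=$(printf '%s' "$word" | tr -cd 'A-Z')
    # Skip entries with no capital letter (e.g. loglevel0-8).
    if [ -n "$key" ]; then
      printf '%s %s\n' "$key" "$word"
    fi
  done
}

sysrq_keys 'loglevel0-8 reBoot Crashdump tErm Full kIll thaw-filesystems(J) saK showMem Nice powerOff showPc unRaw Sync showTasks Unmount shoWcpus'
```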

System Monitoring/History: Ganglia and Nagios

For CPU usage monitoring:

http://ganglia02.slac.stanford.edu:8080/ganglia/batch/?m=load_one&r=week&s=descending&c=suncat&h=&sh=1&hc=4&z=small

To look for memory/CPU problems:

http://nagios.slac.stanford.edu/nagios/cgi-bin/status.cgi?hostgroup=SUNCAT%20cluster&style=detail

There is also some history in /scswork/ranger

How bsub Command Functions

bsub is a script written by Neal Adams which calls the "real" executable:

/afs/slac/package/lsf/curr/bin/bsubx

Depending on the "-a" option (for suncat this is typically "openmpi"), bsubx calls an "esub" script in the LSF "etc" directory. That script in turn points to another wrapper script in the LSF "bin" directory; for openmpi this is "openmpirun_wrapper", which is the script that actually executes the mpirun command. mpirun then uses "lsgrun" on the master node to direct the "res" daemons on the slave nodes to start the executables.

Batch System Accounting History

bacct -u all -b -q "suncat-xlong suncat-long suncat-medium suncat-short" -C "2010/9/25," > bacct.out &
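
In the command above, -u all selects all users, -b requests the brief format, -q lists the queues, and -C with a trailing comma selects jobs completed from the given date onward. To reuse it with a different start date, a small helper could build the command line. This is a sketch only: the name bacct_since is hypothetical, and it prints the command rather than running it.

```shell
# Hypothetical helper: print (do not run) the bacct command for jobs
# completed since a given date. $1 = start date in the YYYY/M/D form used above.
bacct_since() {
  start_date=$1
  queues="suncat-xlong suncat-long suncat-medium suncat-short"
  printf 'bacct -u all -b -q "%s" -C "%s,"\n' "$queues" "$start_date"
}

# Reproduces the command shown above (without the redirection to bacct.out).
bacct_since 2010/9/25
```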