This page describes various monitoring tools used for the Fermi Xrootd cluster.
The disk space of each xrootd data server is recorded twice a day and written to an NFS log directory. From these values, a simple table with the latest usage values is created:
The data are also loaded into an Oracle database and are accessible through a web application (currently in the test stage):
The cron job, which runs from the crontab on each xrootd data server, is
/opt/xrootd/admin/mon_diskspace.sh
It collects the total and free disk space as well as the inode usage for ufs file systems (zfs has no inode restrictions).
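The contents of mon_diskspace.sh are not reproduced on this page. As a rough illustration of the quantities it gathers, the following sketch reads them with Python's `os.statvfs`. The function name `disk_stats`, the dictionary keys, and the choice of 2**30 bytes per GB are assumptions made for this example, not details taken from the script.

```python
import os

GB = 1024 ** 3  # assumption: 1 GB = 2**30 bytes; the real script may use 10**9


def disk_stats(path):
    """Return total/free space in GB and inode counts for the file system at path."""
    st = os.statvfs(path)
    total_gb = st.f_blocks * st.f_frsize / GB
    free_gb = st.f_bavail * st.f_frsize / GB  # space available to non-root users
    used_pct = 100.0 * (1 - free_gb / total_gb) if total_gb else 0.0
    stats = {"total_gb": total_gb, "free_gb": free_gb, "used_pct": used_pct}
    # Only report inode figures when the file system gives a non-zero inode total.
    if st.f_files:
        stats["inodes_free"] = st.f_ffree
        stats["inodes_free_pct"] = 100.0 * st.f_ffree / st.f_files
    return stats
```

Calling `disk_stats("/")` returns the figures for the root file system; a data server would pass the mount point of its xrootd data partition instead.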
The disk usage is stored in files:
/nfs/farm/g/glast/u15/xrootd/diskspace/df_server_YYYYMM
...
/nfs/farm/g/glast/
...
u18/xrootd/
...
hpss/
...
Contains the listing of files in HPSS. The current format is hl_<topdir><date>, where <topdir> is the directory below which all files are listed. The common /glast prefix is omitted. The listing contains the filename, the size of the file in bytes, and the date the file was put into HPSS. The date format is YYYYMMDDHHMMSS.
file_listing/
Each line in these files shows the disk usage for a particular date. The format is:
DF <date> <server> <totalSpace> <freeSpace> <%Used> [<inodesFree> <%inodesFree>]
The format of the date is YYYYMMDDTHHMMSS, e.g. 20080712T123258. The totalSpace and freeSpace values are in GB. The inode information is shown only for servers that use the ufs file system (all sulkies), not for servers that use zfs.
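A line in this format can be read with a few lines of Python. The sketch below is only an illustration: the names `DfRecord` and `parse_df_line` are invented here, and it assumes that the fields are separated by whitespace and that the square brackets mark the optional inode columns. Since it is not certain whether brackets or percent signs appear literally in the files, both are stripped before the fields are split.

```python
from datetime import datetime
from typing import NamedTuple, Optional


class DfRecord(NamedTuple):
    when: datetime
    server: str
    total_gb: float
    free_gb: float
    used_pct: float
    inodes_free: Optional[int] = None
    inodes_free_pct: Optional[float] = None


def parse_df_line(line):
    """Parse one 'DF ...' line; return None for lines that are not DF records."""
    # Assumption: '[', ']' and '%' may or may not be present literally.
    fields = line.replace("[", " ").replace("]", " ").replace("%", " ").split()
    if len(fields) < 6 or fields[0] != "DF":
        return None
    when = datetime.strptime(fields[1], "%Y%m%dT%H%M%S")
    rec = DfRecord(when, fields[2], float(fields[3]), float(fields[4]), float(fields[5]))
    if len(fields) >= 8:  # inode columns, written only for ufs servers
        rec = rec._replace(inodes_free=int(fields[6]), inodes_free_pct=float(fields[7]))
    return rec
```

For example, `parse_df_line("DF 20080712T123258 wain021 9000 2250 75")` yields a record for server wain021 with no inode fields set. The numbers in that line are made up for the example.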
To calculate the free and total disk space of the xrootd cluster, the values for all data servers, taken at around the same time, have to be summed.
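One possible way to do this summation is sketched below. It takes samples that have already been read from the log files and, for each server, uses the sample closest to a chosen reference time. The function name `cluster_totals`, the tuple layout, and the one-hour tolerance are assumptions made for this example; the page does not say how close in time the samples must be.

```python
from datetime import datetime, timedelta


def cluster_totals(samples, reference, tolerance=timedelta(hours=1)):
    """Sum total and free GB over all servers for samples near `reference`.

    samples: iterable of (when, server, total_gb, free_gb) tuples.
    For each server the sample closest in time to `reference` is used,
    provided it lies within `tolerance`.
    """
    closest = {}
    for when, server, total_gb, free_gb in samples:
        gap = abs(when - reference)
        if gap > tolerance:
            continue
        if server not in closest or gap < closest[server][0]:
            closest[server] = (gap, total_gb, free_gb)
    total = sum(t for _, t, _ in closest.values())
    free = sum(f for _, _, f in closest.values())
    # Also return the contributing servers so a missing server can be noticed.
    return total, free, sorted(closest)
```

Returning the list of contributing servers is a deliberate choice: if one server has no sample near the reference time, the sums are silently too low, and the list makes that visible.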
The disk space usage values are stored in Oracle. Each value is recorded as a key-value pair together with the date. The following key names are used:
Key | Description |
---|---|
xrootd_freeGB_<server> | free disk space in GB for <server>, e.g.: xrootd_freeGB_wain021 |
xrootd_diskGB_<server> | disk size in GB for <server> |
xrootd_usedGB_<server> | used disk space in GB for <server>, (xrootd_diskGB_<server> - xrootd_freeGB_<server>) |
xrootd_all_freeGB | free disk space in GB summed over all xrootd data servers. |
xrootd_all_diskGB | disk size in GB summed over all xrootd data servers. |
xrootd_all_usedGB | used disk space in GB summed over all xrootd data servers, (xrootd_all_diskGB - xrootd_all_freeGB) |
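The relation between the per-server keys and the summed keys can be illustrated with a short sketch. The function name `disk_usage_keys` and its input layout are assumptions made for this example, and the code only builds the key-value pairs; how they are loaded into Oracle is not covered here.

```python
def disk_usage_keys(per_server):
    """Map {server: (disk_gb, free_gb)} to the key names listed in the table."""
    values = {}
    all_disk = all_free = 0.0
    for server, (disk_gb, free_gb) in per_server.items():
        values[f"xrootd_diskGB_{server}"] = disk_gb
        values[f"xrootd_freeGB_{server}"] = free_gb
        values[f"xrootd_usedGB_{server}"] = disk_gb - free_gb
        all_disk += disk_gb
        all_free += free_gb
    values["xrootd_all_diskGB"] = all_disk
    values["xrootd_all_freeGB"] = all_free
    values["xrootd_all_usedGB"] = all_disk - all_free
    return values
```

With the made-up input `{"wain021": (1000.0, 250.0)}`, the result contains `xrootd_usedGB_wain021` with the value 750.0, and the three `xrootd_all_*` keys carry the same figures as the single server.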
...