Blog from March, 2009

The current 10.2.0 driver used by applications that need passwordless (wallet) Oracle database access has a bug that prevents it from connecting to Oracle from a client machine that has been up for more than 200 days (32-bit Linux kernel version 2.6) or 248 days (32-bit Linux kernel version 2.4).
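
To see whether a given client machine falls in the affected range, its uptime can be compared against the thresholds quoted above. The following is a minimal sketch, assuming a Linux host that exposes /proc/uptime. The function names and the threshold table are illustrative and are not part of the Oracle client tools.

```python
# Uptime thresholds in days, as quoted in this entry, keyed by 32-bit kernel series.
UPTIME_LIMIT_DAYS = {"2.6": 200, "2.4": 248}


def uptime_days(proc_uptime="/proc/uptime"):
    """Return the machine uptime in days, read from /proc/uptime (Linux only)."""
    with open(proc_uptime) as handle:
        # The first field of /proc/uptime is the uptime in seconds.
        seconds = float(handle.read().split()[0])
    return seconds / 86400.0


def is_at_risk(days_up, kernel_series):
    """True if days_up exceeds the threshold quoted for this kernel series."""
    return days_up > UPTIME_LIMIT_DAYS[kernel_series]


# Example call on a 32-bit 2.6 client (hypothetical usage):
#   is_at_risk(uptime_days(), "2.6")
```

A machine that reports True here would be expected to hit the connection failure with the unpatched driver. A reboot resets the uptime but only postpones the problem.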

A patched version of the client tools has been made available which fixes this problem.

Change details:

We have a central configuration script (actually two, one each for Unix sh and csh) that requires a one-line change to point everything to the new driver.

Rollback details:

Backing out the change is only a matter of restoring the original pointer.

Testing:

I've tested both clients on both 32-bit and 64-bit Linux machines and the results are consistent.

CCB Request:

https://jira.slac.stanford.edu/browse/SSC-187

Reason for change

This is a request to change the xrootd server version and to make a minor update to the client tools.

Server version update

We would like to upgrade the xrootd server version for the Fermi xrootd cluster from 20080828-1632 to 20090202-1402. The main change between these two versions is:

  1. Improved and corrected handling of checksum requests by the server. This fixes issues that could cause checksum requests to hang and the xrootd server to use a large amount of CPU.

Because of this issue, the crawler currently uses the test xrootd, which runs the new version, instead of the production xrootd.

Client version update

The first time an xrootd client connects to a cluster, it tries to connect FirstConnectMaxCnt times before it fails. The default for this number is 150, but xrd.pl overrides it and sets it to 10. A client using xrd.pl will therefore fail after about 3.3 min (the wait between connection attempts is 20 sec), whereas with the default setting the client fails only after 50 min. This is important because during an outage, which typically lasts 5-30 min, we stop the redirector to prevent clients from being redirected, and with the short wait time xrd.pl might fail.
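
The wait times quoted above follow from multiplying the attempt count by the 20 sec wait. The sketch below reproduces that arithmetic and checks it against the typical outage lengths. It assumes each attempt costs exactly the wait interval (the time spent on the connection attempt itself is ignored), and the function names are illustrative only.

```python
RETRY_WAIT_SECONDS = 20  # wait between connection attempts, as stated above


def time_until_failure(max_attempts, wait_seconds=RETRY_WAIT_SECONDS):
    """Seconds a client keeps retrying before it gives up."""
    return max_attempts * wait_seconds


def survives_outage(max_attempts, outage_minutes, wait_seconds=RETRY_WAIT_SECONDS):
    """True if the client is still retrying when the outage ends."""
    return time_until_failure(max_attempts, wait_seconds) >= outage_minutes * 60


for attempts in (10, 150):
    seconds = time_until_failure(attempts)
    print("FirstConnectMaxCnt=%d: gives up after %d s (%.1f min)"
          % (attempts, seconds, seconds / 60.0))
# FirstConnectMaxCnt=10: gives up after 200 s (3.3 min)
# FirstConnectMaxCnt=150: gives up after 3000 s (50.0 min)
```

With 10 attempts a client does not outlast even a 5 min outage (200 s is less than 300 s), while with 150 attempts it outlasts a 30 min outage (3000 s against 1800 s).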

Testing

As with every xrootd version, basic tests were done: reading from and writing to xrootd, and exercising the client admin interface (rm, stat, checksum, ...).

The new version has been installed as a test version on the Fermi xrootd cluster, which allows access to the GLAST data. The production crawler has been using this version for more than a month.
Skimmer jobs were also run successfully against this version.

The fix to the timeout for xrd.pl has been tested. It has been verified that xrd.pl waits the expected time if an xrootd server is not available.

Rollback

To switch the servers back to the old version, the xrootd configuration has to be reverted to the old version, followed by a restart of the old version.

The client version is rolled back by recreating the link to the old version.

CCB Request

https://jira.slac.stanford.edu/browse/SSC-185

Details

Server version upgrade

cmsd logfile name change

At the same time as the restart, I would like to change the logfile name for the cmsd from olbdlog to cmsdlog. This requires changing the name in StartXrd.cf.glast:

  1. CMSLOGFN=cmsdlog
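
The edit to StartXrd.cf.glast can be sketched as a small text substitution. This assumes the file already contains a single assignment of the form CMSLOGFN=olbdlog on its own line, which is not confirmed here. The function name is hypothetical, and the sketch works on the file contents as a string rather than touching the file itself.

```python
import re


def set_cms_log_name(config_text, new_name="cmsdlog"):
    """Return config_text with every CMSLOGFN assignment set to new_name.

    Raises ValueError if no CMSLOGFN line is found, so that a missing or
    misspelled setting does not go unnoticed.
    """
    pattern = re.compile(r"^([ \t]*CMSLOGFN=).*$", re.MULTILINE)
    # A function replacement keeps new_name from being read as a regex template.
    updated, count = pattern.subn(lambda m: m.group(1) + new_name, config_text)
    if count == 0:
        raise ValueError("no CMSLOGFN assignment found")
    return updated


# Assumed shape of the relevant line before the change.
print(set_cms_log_name("CMSLOGFN=olbdlog\n"), end="")
```

Rolling back the logfile name is the same call with new_name set to olbdlog.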

Restart of xrootd

  1. Stop the redirectors
  2. Restart the data servers with the new version
  3. Start the redirectors

The restart should take less than five minutes. Stopping the redirectors first prevents clients from being redirected and avoids the chance that a file is not found because a data server is being restarted. The clients will wait while the xrootds are down and reconnect once the data servers and redirectors are up.

Update the client

  1. Change the link /afs/slac.stanford.edu/g/glast/applications/xrootd/PROD from
    @sys/20080728-0933/ to dist/20080728-0933v1/@sys

This update does not change the xrootd client binaries; it changes only the xrd.pl Perl script.
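
One way to change the PROD link without a window in which it does not exist is to create the new link under a temporary name and rename it over the old one. The sketch below assumes POSIX rename semantics, which hold on a local filesystem; the behavior on AFS has not been verified here. The function name and the .new suffix are hypothetical.

```python
import os


def repoint_link(link_path, new_target):
    """Point the symlink at link_path to new_target and return the old target.

    The new link is created under a temporary name and then renamed over the
    existing one, so link_path always resolves to either the old or the new
    target.
    """
    old_target = os.readlink(link_path)
    tmp_path = link_path + ".new"
    os.symlink(new_target, tmp_path)  # fails if a stale .new link is left over
    os.replace(tmp_path, link_path)   # rename replaces the link itself
    return old_target


# Hypothetical usage for this change:
#   old = repoint_link("/afs/slac.stanford.edu/g/glast/applications/xrootd/PROD",
#                      "dist/20080728-0933v1/@sys")
# Rollback is the same call with the returned old target.
```

Keeping the returned old target makes the rollback described earlier a single call.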

The directory structure of the xrootd application changed. Releases are no longer kept in afs-sysname/release-name; they are now in dist/release-name/afs-sysname (e.g. i386_rhel30/20080728-0933v1 becomes dist/20080728-0933v1/i386_rhel30).
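
The layout change amounts to swapping the two path components and adding a dist prefix. As a minimal sketch, with a hypothetical helper name, the mapping from the old relative path to the new one looks like this:

```python
def old_to_new_layout(old_path):
    """Map 'afs-sysname/release-name' to 'dist/release-name/afs-sysname'."""
    parts = old_path.strip("/").split("/")
    if len(parts) != 2:
        raise ValueError("expected afs-sysname/release-name, got %r" % old_path)
    sysname, release = parts
    return "/".join(["dist", release, sysname])


print(old_to_new_layout("i386_rhel30/20080728-0933v1"))
# dist/20080728-0933v1/i386_rhel30
```

The helper only rewrites path strings. It does not move any files, and it treats the literal @sys component like any other sysname.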