Recent News

  2009/10/15
Updating Xrootd client version
Last changed: Oct 15, 2009 10:43 by Wilko Kroeger
Labels: sassoccb

Reason for change

This is a request to upgrade the xrootd client tools from version 20090202-1402v2 to 20091008-2019v1. The main reason for the update is that the new version runs on rhel5 whereas for the old version certain commands did not work properly on rhel5.

Testing

The new version has been installed in the Fermi xrootd application area and is available through the DEV link. All tools (xrdcp, xrd.pl, xrdls and xrdprel) were tested against the test and production xrootd server running on rhel3, rhel4-32, rhel4-64, rhel5-64 and rhel5-32.

Rollback

The client version is rolled back by recreating the link to the old version.

CCB Request

https://jira.slac.stanford.edu/browse/SSC-232

Details

Some of the client tools do not work on rhel5 because the bitness of the executable does not match that of the xrootd libraries. For example, perl is a 32-bit executable on all architectures, but on rhel5 perl attempts to load the 64-bit xrootd libraries, which fails.

The new version addresses the problem in the following way:

  • rhel3 (32bit) and rhel4-64 (64bit) xrootd versions are installed.
  • if xrdcp or xrd.pl is executed, the 32bit version of the xrootd release is used.
  • for xrdls the bitness is obtained from the system (which matches the one for ls).
  • for xrdprel the bitness is guessed from the command that is run with the preload library. If the bitness cannot
    be guessed (shell scripts, for example), the system's default is used.
  • the options -32 and -64 are available to force using the 32bit or 64bit xrootd release, respectively.
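
The bitness guess described for xrdprel can be illustrated by inspecting the ELF header of the command to be run. This is only a sketch and not the actual xrdprel logic: the function name guess_bitness and its default argument are hypothetical. It relies on the ELF convention that the fifth header byte (EI_CLASS) is 1 for 32-bit and 2 for 64-bit binaries; anything that is not an ELF file, such as a shell script, falls back to the supplied default.

```python
ELF_MAGIC = b"\x7fELF"


def guess_bitness(path, default=None):
    """Return 32 or 64 for an ELF binary, or `default` if it cannot be guessed."""
    with open(path, "rb") as f:
        header = f.read(5)
    # Shell scripts and other non-ELF files do not start with the ELF magic.
    if len(header) < 5 or header[:4] != ELF_MAGIC:
        return default
    # EI_CLASS byte: 1 = ELFCLASS32, 2 = ELFCLASS64.
    return {1: 32, 2: 64}.get(header[4], default)
```

A wrapper could call guess_bitness on the resolved command and use the system default only when the result is None.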

Besides addressing the bitness issue, a new xrootd version will be used: 20091008-2019. There are a few improvements for the xrootd client and xrdcp. In particular, the annoying message about the xrootd client version is no longer printed.

In order to update the version, the link /afs/slac.stanford.edu/g/glast/applications/xrootd/PROD has to be changed to point to dist/20091008-2019v1/@sys. This is an atomic operation and clients should not fail because of this change.
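
One common way to repoint a link without a window in which it is missing is to create the new link under a temporary name and rename it over the old one. The sketch below assumes a POSIX file system where rename replaces the destination atomically; whether the actual update was done this way is not stated here, and the function name switch_link and the ".tmp" suffix are hypothetical.

```python
import os


def switch_link(link_path, new_target):
    """Repoint the symlink `link_path` to `new_target` in a single rename."""
    tmp = link_path + ".tmp"  # hypothetical temporary name next to the link
    if os.path.lexists(tmp):
        os.remove(tmp)
    os.symlink(new_target, tmp)
    # rename(2) does not follow the old symlink; it replaces the link itself.
    os.replace(tmp, link_path)
```

Rolling back would be the same call with the old release directory as the target.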

Posted at 15 Oct @ 9:35 AM by Wilko Kroeger | 0 Comments
  2009/09/29
CCB - Pipeline monitor script modified to shut-down pipeline during scheduled outages
Last changed: Sep 29, 2009 16:31 by Daniel Flath
Labels: sassoccb

The pipeline can now be scheduled for shutdown by creating a file in the pipeline installation directory.  The file must be named "shutdown_schedule" and must contain exactly 2 lines, both of which are dates in the form understood by the unix date command.  When the first date has passed, the monitor (which runs every 5 minutes in cron) will shut down the pipeline and not restart it until the second date has passed or the file has been removed.  (The second date could also be changed to the current time in order to force the monitor to restart the pipeline on its next execution.)

As an example, the following file will be used to turn off the pipeline during the Sept 30th computing center 1st-floor power outage:

[dflath@glastlnx13 prod]$ pwd
/afs/slac.stanford.edu/u/gl/glast/pipeline-II/prod
[dflath@glastlnx13 prod]$ cat shutdown_schedule
Wed Sep 30 04:25:00 PDT 2009
Wed Sep 30 17:00:00 PDT 2009
[dflath@glastlnx13 prod]$
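
The decision the monitor makes from this file can be sketched as follows. This is an illustration only, not the monitor script itself: the function names are hypothetical, and for simplicity the time-zone token (e.g. PDT) is dropped, which assumes that both dates and the current time are in the same zone.

```python
from datetime import datetime


def parse_date(line):
    """Parse a line such as 'Wed Sep 30 04:25:00 PDT 2009' (zone token ignored)."""
    parts = line.split()
    if len(parts) != 6:
        raise ValueError("unexpected date format: %r" % line)
    no_zone = parts[:4] + parts[5:]  # drop the zone name in position 5
    return datetime.strptime(" ".join(no_zone), "%a %b %d %H:%M:%S %Y")


def should_be_down(schedule_text, now):
    """True if `now` lies between the two dates in the schedule file."""
    lines = [l for l in schedule_text.splitlines() if l.strip()]
    if len(lines) != 2:
        raise ValueError("shutdown_schedule must contain exactly 2 lines")
    start, end = parse_date(lines[0]), parse_date(lines[1])
    return start <= now < end
```

With the example file above, a check at noon on Sept 30th would keep the pipeline down, and a check after 17:00 would allow it to restart.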

See:

https://jira.slac.stanford.edu/browse/PII-398

And:

https://jira.slac.stanford.edu/browse/SSC-228

Posted at 29 Sep @ 4:30 PM by Daniel Flath | 0 Comments
  2009/09/28
Request to deploy Xroot version 20090721-0636
Last changed: Sep 28, 2009 13:55 by Wilko Kroeger
Labels: sassoccb

Reason for change

We would like to upgrade the xrootd server version for the Fermi xrootd cluster from 20090202-1402 to 20090721-0636.
The main reasons for the change are an improvement in the xrootd server and a configuration change:

  1. Better handling of sendfile error recovery. In the old version some sendfile errors caused the server to disconnect the client. In the new version the server recovers from the sendfile errors and does not disconnect the client. Disconnecting the client is not fatal as after a timeout the client notices the disconnect and reconnects again, but it will slow down the client.
  2. Allow production accounts to remove directories below /glast/Scratch/. So far this option has been available only for the test xrootd setup.

Testing

As with every xrootd version, basic tests were done: reading from and writing to xrootd, and testing the client admin interface (rm, stat, checksum, ...).

The new version has been installed as a test version on the Fermi xrootd cluster which allows access to the glast data. Tests were performed to read and write to the new version. Reprocessing test jobs were successfully run against the server and the new version was also used for L1 tests.

The test xrootd has been set up for the directory removal (rmdir). It has been successfully used for some production testing.

Rollback

To switch the servers back to the old version the production link has to be set to the old version and a restart of all xrootd servers is needed.

CCB Request

https://jira.slac.stanford.edu/browse/SSC-227

Details

To allow production accounts (glastraw, glastxrw, glastmc and glast) to remove directory trees, the xrootd forward method is used. The redirector will be configured to forward a rmdir request to all data servers. Upon such a request, the data servers will execute a script that first checks whether a directory is eligible for removal and then removes all files and directories below the specified directory. The xrootd configuration changes are:

  1. On the redirector allow forwarding of the rmdir command
  2. On the data servers specify the application that is called to remove directories. Only directories below /glast/Scratch will be allowed for removal.
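
The eligibility check performed before removal can be sketched as below. This is a hypothetical illustration, not the script deployed on the data servers: the function name is_removable is invented here, and the check works on the logical path only (it normalizes ".." components but does not resolve symbolic links).

```python
import posixpath

SCRATCH_ROOT = "/glast/Scratch"


def is_removable(path):
    """True only for directories strictly below /glast/Scratch."""
    norm = posixpath.normpath(path)
    # Require a path separator after the root so that the root itself and
    # look-alike names such as /glast/ScratchX are rejected.
    return norm.startswith(SCRATCH_ROOT + "/")
```

Rejecting the scratch root itself is a conservative choice made for this sketch.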

To deploy a new xrootd version the following steps are required:

  1. Update the xrootd config
  2. Stop the redirector
  3. Restart the data servers with the new version
  4. Start the redirectors with the new version

The restart should take less than five minutes. Stopping the redirectors first prevents clients from being redirected and avoids the chance that a file is not found because a data server is being restarted. The clients will wait while the xrootds are down and reconnect once the data servers and redirectors are up.

Posted at 28 Sep @ 11:55 AM by Wilko Kroeger | 0 Comments
  2009/09/21
CCB Request to Install pipeline version 1.3.5 to PROD
Last changed: Sep 21, 2009 09:50 by Daniel Flath
Labels: sassoccb

1.3.5 is built against a patched version of the Data-Handling-Common library which allows database connections to be removed from the connection pool as they age (and replaced with freshly created connections).  It also contains monitoring and run-time configuration capabilities.
The patch has been tested in DEV and works as expected.
This is intended to address the memory leak we see on the Oracle server, which slows down the pipeline software when the application has been running for some time.  Since the Oracle memory usage goes back down when the pipeline application is restarted, we feel that the problem is probably in the long-lived, cached connections.
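
The idea of retiring pooled connections by age can be sketched as follows. This is not the Data-Handling-Common implementation (which is a Java library); the class name AgingPool, its parameters and the injected clock are all hypothetical and serve only to show the technique.

```python
import time


class AgingPool:
    """Minimal pool that discards connections older than `max_age_seconds`."""

    def __init__(self, factory, max_age_seconds, clock=time.monotonic):
        self._factory = factory      # callable creating a new connection
        self._max_age = max_age_seconds
        self._clock = clock
        self._idle = []              # list of (connection, created_at)

    def acquire(self):
        """Return (connection, created_at), skipping connections that aged out."""
        while self._idle:
            conn, created = self._idle.pop()
            if self._clock() - created < self._max_age:
                return conn, created
            self._close(conn)
        return self._factory(), self._clock()

    def release(self, conn, created):
        """Return a connection to the pool unless it has exceeded its age."""
        if self._clock() - created < self._max_age:
            self._idle.append((conn, created))
        else:
            self._close(conn)

    @staticmethod
    def _close(conn):
        close = getattr(conn, "close", None)
        if close is not None:
            close()
```

Old connections are closed rather than reused, so any server-side memory tied to a long-lived session is released without restarting the application.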

Jira CCB Request:  https://jira.slac.stanford.edu/browse/SSC-224

Posted at 21 Sep @ 9:45 AM by Daniel Flath | 0 Comments
  2009/08/24
CCB Request for re-creating foreign keys on Pipeline database tables
Last changed: Aug 24, 2009 14:25 by Daniel Flath
Labels: sassoccb

Details are in the following page:

http://confluence.slac.stanford.edu/x/_gRzAw

I intend to perform the clean-up and foreign-key creation during the all-day Computing Center power outage on Tuesday, August 25th (6am-5pm)

Posted at 24 Aug @ 2:21 PM by Daniel Flath | 0 Comments
  2009/06/29
Upgrade Pipeline Server to use java 1.6 version of the ojdbc driver.
Last changed: Jun 29, 2009 15:17 by Daniel Flath
Labels: sassoccb

Oracle support insists that we use a java 1.6 driver with our java 1.6 application before they will give us more help on the PGA usage problem.

We will be moving to ojdbc6.jar version 11.1.0.7 and the associated native libraries provided in the oracle client software.

SSC jira is here:

https://jira.slac.stanford.edu/browse/SSC-208

Jira Release info for pipeline project is here:

https://jira.slac.stanford.edu/secure/IssueNavigator.jspa?reset=true&mode=hide&sorter/order=ASC&sorter/field=priority&pid=10360&fixfor=11960

Posted at 29 Jun @ 3:11 PM by Daniel Flath | 0 Comments
  2009/05/27
Updating Xrootd client version
Last changed: May 27, 2009 15:12 by Wilko Kroeger
Labels: sassoccb

Reason for change

This is a request to upgrade the xrootd client tools from version 20080728-0933v1 to 20090202-1402v2.
In the new version xrdcp is able to overwrite a file that is located on a data server that has no space left. The current xrdcp will fail in this case.

Testing

The new version has been installed in the Fermi xrootd application area and is available through the DEV link. All tools (xrdcp, xrd.pl, xrdls and xrdprel) were tested against the test and production xrootd server.

Rollback

The client version is rolled back by recreating the link to the old version.

CCB Request

https://jira.slac.stanford.edu/browse/SSC-202

Details

The current xrdcp version will fail to overwrite a file that is on a data server that has no free space left. It will fail because the redirector will not redirect the client. The new xrdcp version however will first remove the file and then write it to a new server.

The other client tools have not been changed, except for xrd.pl, to which an option to remove a directory tree has been added. This option is currently not applicable to the production xrootd.

In order to update the version the link /afs/slac.stanford.edu/g/glast/applications/xrootd/PROD has to be changed to point to dist/20090202-1402v2/@sys

Posted at 27 May @ 2:03 PM by Wilko Kroeger | 0 Comments
  2009/05/11
Change of Xrootd authorization for rm
Last changed: May 11, 2009 14:37 by Wilko Kroeger
Labels: sassoccb

Reason for change

The xrootd redirectors are configured to forward a file remove request to all of their data servers. Therefore we would like to configure the redirectors so that clients have to authenticate themselves and only one production account is authorized to remove files.

Testing

The Fermi xrootd test setup was configured to use authentication/authorization for the redirectors and data servers:
1) only glastxrw was allowed to remove files (through redirector or data server)
2) all clients were allowed to read/write files if connected to redirector
3) only Fermi users are allowed to read files from the data servers
4) only Fermi production accounts are allowed to write files

These rules were tested using four accounts: a Fermi user (read-only access), a production account, the account that has privileges to remove files, and a non-Fermi user account.
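
The four rules above can be written down as a small decision function. This is only a sketch for reasoning about the rules, not the xrootd authorization file format. It makes several assumptions: the production account names are taken from a later request on this page, production accounts are treated as Fermi users for reading, and the set of Fermi users is passed in by the caller; the function name allowed is hypothetical.

```python
# Assumption: production account names as listed in a later request on this page.
PRODUCTION = {"glastraw", "glastxrw", "glastmc", "glast"}
REMOVER = "glastxrw"


def allowed(user, op, server, fermi_users):
    """Apply rules 1-4; op is 'read', 'write' or 'rm', server is 'redirector' or 'data'."""
    if op == "rm":
        # Rule 1: only glastxrw may remove, on redirector and data server alike.
        return user == REMOVER
    if op not in ("read", "write"):
        return False
    if server == "redirector":
        # Rule 2: the redirector lets every client read and write.
        return True
    if op == "read":
        # Rule 3: data servers restrict reading to Fermi users.
        return user in fermi_users or user in PRODUCTION
    # Rule 4: data servers restrict writing to production accounts.
    return user in PRODUCTION
```

Because clients end up on a data server after being redirected, the effective permission for read and write is the data server's answer.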

Rollback

The configuration can be rolled back by using the previous xrootd configuration and authorization. A restart of the xrootd redirectors is needed.

CCB Request

https://jira.slac.stanford.edu/browse/SSC-199

Details

Authentication and authorization are required for all of the xrootd data servers in order to restrict access to the Fermi data to Fermi members only. Write and remove privileges are granted to production accounts only. No restrictions were needed for the redirectors, as all they did was redirect clients to the data servers.

The redirectors have been reconfigured so that they are able to remove files, and therefore authentication and authorization have to be enabled.
The same authentication scheme as used for the data servers will be used and the authorization will be very simple:
1. All users are allowed to read and write files (this is later restricted by the data servers)
2. Only glastxrw is allowed to remove files

For the data servers we would like to change the authorization so that only the glastxrw user is able to remove files (so far other production accounts are also allowed).

After changing the authorization files and the xrootd config file, the xrootd on the redirectors has to be restarted in order to activate the changes.
The data servers do not need to be restarted as they reread the authorization file periodically.

Posted at 11 May @ 10:23 AM by Wilko Kroeger | 0 Comments
  2009/05/06
Install DataCat Linemode-Client patch version 2.3.3 to PROD
Last changed: May 06, 2009 10:50 by Daniel Flath
Labels: sassoccb

2.3.3 fixes a bug that prevents (large) dataset searches issued from the linemode client from completing.

See https://jira.slac.stanford.edu/browse/SSC-197 for details.

Posted at 06 May @ 10:46 AM by Daniel Flath | 0 Comments
  2009/03/16
Request to migrate to patched Oracle client tools v10.2.0
Last changed: Mar 16, 2009 17:23 by Daniel Flath
Labels: sassoccb

The current 10.2.0 driver used by applications needing passwordless (wallet) Oracle database access has a bug which prevents it from connecting to Oracle from a client machine which has been up for more than 200 days (32-bit linux kernel version 2.6) or 248 days (32-bit linux kernel version 2.4)

A patched version of the client tools has been made available which fixes this problem.

Change details:

We have a central configuration script (actually 2, one for each of unix sh and csh) which requires a 1-line change to point everything to the new driver.

Rollback details:

Backing out the change is only a matter of restoring the original pointer.

Testing:

I've tested both clients on both 32-bit and 64-bit linux machines and the results are consistent.

CCB Request:

https://jira.slac.stanford.edu/browse/SSC-187

Posted at 16 Mar @ 4:25 PM by Daniel Flath | 0 Comments
  2009/03/13
Request to install a new Xrootd production version
Last changed: Mar 16, 2009 10:14 by Wilko Kroeger
Labels: sassoccb

Reason for change

This is a request to change the xrootd server version and have a minor update of the client tools.

Server version update

We would like to upgrade the xrootd server version for the Fermi xrootd cluster from 20080828-1632 to 20090202-1402. The main change between these two versions is:

  1. Improved and fixed handling of checksum requests by the server. This fixes issues that could cause checksum requests to hang and cause high cpu usage of the xrootd server.

Due to this issue the crawler is currently not using the production xrootd but the test xrootd that runs the new version.

Client version update

The first time an xrootd client connects to a cluster it tries FirstConnectMaxCnt times to connect before it fails. The default for this number is 150, but for xrd.pl it is overwritten and set to 10. Therefore xrd.pl will fail after about 3.3 min (the wait between connection attempts is 20 sec), whereas with the default setting the client fails only after 50 min. This is important because during an outage, which typically lasts 5-30 min, we stop the redirector to prevent clients from being redirected, and with the short wait time xrd.pl might fail.
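
The two wait times quoted above follow directly from the retry count and the 20 second wait between attempts. The short calculation below only reproduces that arithmetic; the function name is hypothetical.

```python
RETRY_WAIT_SECONDS = 20  # wait between connection attempts, as stated above


def minutes_until_failure(first_connect_max_cnt, wait=RETRY_WAIT_SECONDS):
    """Total time in minutes a client keeps retrying before it gives up."""
    return first_connect_max_cnt * wait / 60


print(round(minutes_until_failure(10), 1))  # 3.3 (xrd.pl override)
print(minutes_until_failure(150))           # 50.0 (default setting)
```

An outage of 5-30 min is longer than the 3.3 min window but well within the 50 min window.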

Testing

As with every xrootd version, basic tests were done: reading from and writing to xrootd, and testing the client admin interface (rm, stat, checksum, ...).

The new version has been installed as a test version on the Fermi xrootd cluster which allows access to the glast data. The production crawler has been using this version for more than a month.
Skimmer jobs were also run successfully against this version.

The fix to the timeout for xrd.pl has been tested. It has been verified that it will wait the expected time if a xrootd server is not available.

Rollback

To switch the servers back to the old version the xrootd configuration has to be reverted to the old version followed by a restart of the old version.

The client version is rolled back by recreating the link to the old version.

CCB Request

https://jira.slac.stanford.edu/browse/SSC-185

Details

Server version upgrade

cmsd logfile name change

At the time of the restart I would like to change the logfile name for the cmsd from olbdlog to cmsdlog. This requires changing the name in StartXrd.cf.glast:

  1. CMSLOGFN=cmsdlog

Restart of xrootd

  1. Stop the redirector
  2. Restart the data servers with the new version
  3. Start the redirectors

The restart should take less than five minutes. Stopping the redirectors first prevents clients from being redirected and avoids the chance that a file is not found because a data server is being restarted. The clients will wait while the xrootds are down and reconnect once the data servers and redirectors are up.

Update the client

  1. Change the link /afs/slac.stanford.edu/g/glast/applications/xrootd/PROD from
    @sys/20080728-0933/ to dist/20080728-0933v1/@sys

This update will not change the xrootd client binaries; it will only change the xrd.pl perl script.

The directory structure of the xrootd application changed. Instead of keeping releases in afs-sysname/release-name they are now in dist/release-name/afs-sysname (e.g.: i386_rhel30/20080728-0933v1 to dist/20080728-0933v1/i386_rhel30).
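
The layout change amounts to swapping the two path components and adding a dist/ prefix. A minimal sketch of that mapping, with a hypothetical function name, is:

```python
def to_new_layout(old_rel_path):
    """Map 'afs-sysname/release-name' to 'dist/release-name/afs-sysname'."""
    parts = old_rel_path.strip("/").split("/")
    if len(parts) != 2:
        raise ValueError("expected 'afs-sysname/release-name'")
    sysname, release = parts
    return "dist/%s/%s" % (release, sysname)


print(to_new_layout("i386_rhel30/20080728-0933v1"))
# dist/20080728-0933v1/i386_rhel30
```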

Posted at 13 Mar @ 8:41 AM by Wilko Kroeger | 0 Comments
  2008/12/12
Release Pipeline II version 1.3.2
Last changed: Dec 12, 2008 15:34 by Daniel Flath
Labels: sassoccb

These small changes to the pipeline make it possible to perform reprocessing, and make it possible for the run status to default to good. It also adds a feature to timeout database connections to see if this fixes the problem with gradual pipeline slowdown requiring frequent restarts.

The timeout can be easily turned off, and the new version can be easily backed out if any problems occur. The associated JIRA is SSC-168@JIRA.

Pipeline 1.3.2

DataCat Client 2.3.2, DataCat Stored Procedures 2.2.1, DataHandling Common 1.5.1

Run Quality 1.3.2

Posted at 12 Dec @ 2:42 PM by Daniel Flath | 0 Comments
  2008/10/30
CCB Request for Pipeline Upgrade
Last changed: Oct 30, 2008 23:23 by Tony Johnson
Labels: sassoccb

These small changes to the pipeline make it possible to perform reprocessing, and make it possible for the run status to default to good. It also adds a feature to timeout database connections to see if this fixes the problem with gradual pipeline slowdown requiring frequent restarts.

The timeout can be easily turned off, and the new version can be easily backed out if any problems occur. The associated JIRA is SSC-156@JIRA.

Pipeline 1.3.1

Pipeline Front-End 2.8

Run Quality 1.3

Posted at 30 Oct @ 7:19 AM by Tony Johnson | 0 Comments
  2008/09/08
Request to install a new Xrootd production version
Last changed: Nov 12, 2008 11:15 by Wilko Kroeger
Labels: sassoccb

Reason for change

Server version update

We would like to upgrade the xrootd server version for the glast xrootd cluster from version
20080513-1222 to 20080828-1632. The main changes between these two versions are:

  1. Allow removing files via the redirector from data servers that are filled (GXR-37@JIRA)
  2. Allow collection of xrootd statistics by Ganglia (GXR-38@JIRA)
  3. Option to suppress reverse DNS lookups.
  4. Support for sendfile(). Improves read performance and lowers xrootd's memory usage.
  5. Fixes a bug that could cause the redirector cmsd to crash if a suspended data server reconnects
    too quickly.

The full cvs Changelog is available at http://xrootd.slac.stanford.edu/download/20080828-1632/ChangeLog_to_v20080513-1222

Client version update

The xrootd client tools are installed in /afs/slac.stanford.edu/g/glast/applications/xrootd. We would also like to update the PROD version from 20080513-1222 to 20080728-0933.
The main changes for the 20080728-0933 version are:

  1. Support reading FITS files using the xrootd preload library.
  2. Bug fixes to xrootd that cause increased memory usage for large file transfers.

We also would like to link the FITS version to PROD.

Testing

As with every xrootd version, basic tests were done: reading from and writing to xrootd, and testing the client admin interface (rm, stat, checksum, ...).

The new version has been installed as a test version on the glast xrootd cluster which allows access to the glast data. MC jobs were run successfully against it.

The xrootd client version has been installed as TEST, DEV and FITS version in /afs/slac.stanford.edu/g/glast/applications/xrootd. The FITS and DEV versions have been successfully used for production activities.

Rollback

To switch the servers back to the old version the xrootd configuration has to be reverted to the old version followed by a restart of the old version.

The client version is rolled back by recreating the link to the old version.

CCB Request

https://jira.slac.stanford.edu/browse/SSC-141

Details

Server version upgrade

The following changes to the config file, xrootd.cf, are needed:

  1. use load balancing between xrootd and cmsd
  2. turn off dns reverse lookup
  3. allow the data server to login to xrootd without authentication (needed to gather statistics with Ganglia)

Restart of xrootd

  1. Stop the redirector
  2. Restart the data servers
  3. Start the redirectors

The restart should take less than five minutes. Stopping the redirectors first prevents clients from being redirected and avoids the chance that a file is not found because a data server is being restarted. The clients will wait during the restart and reconnect to the data servers and redirectors.

Update the client

  1. Change the link /afs/slac.stanford.edu/g/glast/applications/xrootd/PROD to point to 20080728-0933.
  2. Have FITS linked to PROD
Posted at 08 Sep @ 10:53 PM by Wilko Kroeger | 0 Comments
  2008/09/06
CCB Request to Deploy Pipeline-II version 1.3
Last changed: Sep 06, 2008 20:32 by Daniel Flath
Labels: sassoccb

Actions:

  • Install org-glast-runquality-web (version 1.2) library to PROD.
  • Install org-glast-datahandling-common (version 1.4) library to PROD.
  • Install org-glast-datacat-client (version 2.3) library to PROD.
  • Install org-glast-datacat-sp (version 2.2) library to PROD.
    • Upload org-glast-datacat-sp (version 2.2) stored procedures to PROD.
  • Install org-glast-pipeline-server (version 1.3) library to PROD.
    • Upload org-glast-pipeline-server (version 1.3) stored procedures to PROD.
  • Modify PROD pipeline startup script to use version 1.3
  • Apply patches to Database tables, adding new columns with default values to support new features.
    • this is non-destructive and does not have to be reverted in the event of a version back-out
  • Restart PROD

Estimated Length of Outage to Perform Upgrade

Approximately 20 minutes total based on length of time it took to add the new table-columns in DEV.

Motivation:

  • Support for L1 Data reprocessing
  • Support in Pipeline Jython scripts for newest dataset-find routine (with significant improvements in meta-data search capabilities and bugfixes to same)
  • Ability to auto-retry failed processes
  • New ability to adjust zombie-process reaping delay
  • New command line feature enables users to deal with 'zombie' processes when reaper is unable to. (previously this required a developer's direct intervention.)

Rollback Procedure:

Because of the new stored procedures that will be installed the back-out procedure is slightly more involved than usual:

  1. Shut-down pipeline server
  2. Return server start-up script to version 1.2.5
  3. Re-upload Pipeline stored procedures v1.2.5
  4. Re-upload Data Catalog stored procedures v2.1
  5. Restart pipeline server

Note that the additional steps are quite simple to perform and only extend the outage for a back-out from ~5 minutes to ~10.
Updates that would not require backing out (with justification):

  1. Columns added to Process and ProcessInstance tables for auto-retry support need not be removed as they are not used by the previous pipeline version and will be ignored.
  2. PFE (Pipeline Front End) need not be reverted from 2.7 back to 2.6 because additional features supporting process auto-retry will work if columns are not removed, and schema additions are optional and backward-compatible.

Associated Jira:

SSC-135@JIRA

Details

Pipeline 1.3

DataCat Client 2.3


Pipeline Front End 2.7

Posted at 06 Sep @ 7:36 PM by Daniel Flath | 0 Comments
