Updating Xrootd client version
Reason for change
This is a request to upgrade the xrootd client tools from version 20090202-1402v2 to 20091008-2019v1. The main reason for the update is that the new version runs on rhel5, whereas certain commands of the old version did not work properly on rhel5.
Testing
The new version has been installed in the Fermi xrootd application area and is available through the DEV link. All tools (xrdcp, xrd.pl, xrdls and xrdprel) were tested against the test and production xrootd server running on rhel3, rhel4-32, rhel4-64, rhel5-64 and rhel5-32.
Rollback
The client version is rolled back by recreating the link to the old version.
CCB Request
https://jira.slac.stanford.edu/browse/SSC-232
Details
Some of the client tools do not work on rhel5. The reason is a mismatch between the bitness of the executable and the xrootd libraries. For example, perl is a 32-bit executable on all architectures, but on rhel5 perl attempts to load the 64-bit xrootd libraries, which fails. The new version addresses the problem in the following way:
Besides addressing the bitness issue, a new xrootd version will be used. The xrootd version is 20091008-2019. In order to update the version, the link /afs/slac.stanford.edu/g/glast/applications/xrootd/PROD has to be changed to point to dist/20091008-2019v1/@sys. This is an atomic operation and clients should not fail because of this change.
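The link switch can be illustrated with a minimal sketch. It assumes a POSIX file system where renaming one symlink over another is atomic. The request does not say which command was actually used on AFS, and the function name `switch_link` is hypothetical.

```python
import os


def switch_link(link_path: str, new_target: str) -> None:
    """Repoint the symlink at link_path to new_target in a single rename."""
    # Create the replacement link under a temporary name next to the old one.
    tmp_path = f"{link_path}.new.{os.getpid()}"
    os.symlink(new_target, tmp_path)
    # rename(2) replaces the old link itself, so a client resolving the link
    # sees either the old target or the new one, never a missing link.
    os.replace(tmp_path, link_path)
```

With this helper, the update would be a call such as `switch_link(".../xrootd/PROD", "dist/<release>/@sys")`, and the rollback is the same call with the old release as target.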
CCB - Pipeline monitor script modified to shut-down pipeline during scheduled outages
The pipeline can now be scheduled for shutdown by creating a file in the pipeline installation directory. The file must be named "shutdown_schedule" and must contain exactly 2 lines, both of which are dates in a form understood by the unix date command. When the first date has passed, the monitor (which runs every 5 minutes in cron) will shut down the pipeline and not restart it until the second date has passed or the file has been removed. (The second date can also be changed to the current time in order to force the monitor to restart the pipeline on its next execution.) As an example, the following file will be used to turn off the pipeline during the Sept 30th computing center 1st-floor power outage:
[dflath@glastlnx13 prod]$ pwd
See: https://jira.slac.stanford.edu/browse/PII-398
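The decision the monitor makes on each run can be sketched as follows. This is an illustration only, not the monitor script itself: it assumes ISO 8601 timestamps for simplicity (the real file accepts anything the unix date command understands), and the function names are hypothetical.

```python
from datetime import datetime
from pathlib import Path
from typing import Tuple


def parse_schedule(text: str) -> Tuple[datetime, datetime]:
    """Return (shutdown_time, restart_time) from the two lines of the file."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) != 2:
        raise ValueError("shutdown_schedule must contain exactly 2 lines")
    # Assumption: ISO 8601 dates; the real script relies on the date command.
    start, end = (datetime.fromisoformat(ln) for ln in lines)
    return start, end


def pipeline_should_be_down(schedule_file: Path, now: datetime) -> bool:
    """True while the first date has passed and the second has not."""
    if not schedule_file.exists():
        return False  # removing the file lets the monitor restart the pipeline
    start, end = parse_schedule(schedule_file.read_text())
    return start <= now < end
```

A cron job running every 5 minutes would call `pipeline_should_be_down(Path("shutdown_schedule"), datetime.now())` and stop the pipeline, or leave it stopped, while the result is true.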
Request to deploy Xroot version 20090721-0636
Reason for change
We would like to upgrade the xrootd server version for the Fermi xrootd cluster from 20090202-1402 to 20090721-0636.
Testing
As with every xrootd version, basic tests were done reading from and writing to xrootd and testing the client admin interface (rm, stat, checksum,...). The new version has been installed as a test version on the Fermi xrootd cluster, which allows access to the glast data. Tests were performed to read and write to the new version. Reprocessing test jobs were successfully run against the server and the new version was also used for L1 tests. The test xrootd has been set up for the directory removal (rmdir). It has been successfully used for some production testing.
Rollback
To switch the servers back to the old version the production link has to be set to the old version and a restart of all xrootd servers is needed.
CCB Request
https://jira.slac.stanford.edu/browse/SSC-227
Details
To allow production accounts (glastraw, glastxrw, glastmc and glast) to remove directory trees the xrootd forward method is used. The redirector will be configured to forward a rmdir request to all data servers. Upon such a request, the data servers will execute a script that first checks whether a directory is eligible for removal and then removes all files and directories below the specified directory. The xrootd configuration changes are:
To deploy a new xrootd version the following steps are required:
The restart should take less than five minutes. Stopping the redirectors first prevents clients from being redirected and avoids the chance that a file is not found because a data server is being restarted. The clients will wait while the xrootds are down and reconnect once the data servers and redirectors are up.
CCB Request to Install pipeline version 1.3.5 to PROD
1.3.5 is built against a patched version of the Data-Handling-Common library which allows database connections to be removed from the connection pool as they age (and replaced with freshly created connections.) It also contains monitoring and run-time configuration capabilities. Jira CCB Request: https://jira.slac.stanford.edu/browse/SSC-224
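The idea of retiring pooled connections as they age can be sketched generically. This is not the Data-Handling-Common implementation; the class name `AgingPool`, its methods and the `max_age` parameter are assumptions made for illustration.

```python
import time
from collections import deque
from typing import Callable, Deque, Tuple


class AgingPool:
    """Toy pool that discards idle connections older than max_age seconds."""

    def __init__(self, factory: Callable[[], object], max_age: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._factory = factory
        self._max_age = max_age
        self._clock = clock
        self._idle: Deque[Tuple[float, object]] = deque()  # (created_at, conn)

    def acquire(self) -> Tuple[float, object]:
        """Hand out a young idle connection, or create a fresh one."""
        while self._idle:
            created_at, conn = self._idle.popleft()
            if self._clock() - created_at < self._max_age:
                return created_at, conn
            # The connection has aged out: close it if possible and drop it.
            close = getattr(conn, "close", None)
            if callable(close):
                close()
        return self._clock(), self._factory()

    def release(self, entry: Tuple[float, object]) -> None:
        """Return a connection, with its creation time, to the idle queue."""
        self._idle.append(entry)
```

Keeping the creation time next to each connection makes the age check a single subtraction, and passing the clock in makes the eviction behaviour easy to test.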
CCB Request for re-creating foreign keys on Pipeline database tables
Details are in the following page: http://confluence.slac.stanford.edu/x/_gRzAw I intend to perform the clean-up and foreign-key creation during the all-day Computing Center power outage on Tuesday, August 25th (6am-5pm)
Upgrade Pipeline Server to use java 1.6 version of the ojdbc driver.
Oracle support insists that we use a java 1.6 driver with our java 1.6 application before they will give us more help on the PGA usage problem. We will be moving to ojdbc6.jar version 11.1.0.7 and the associated native libraries provided in the oracle client software. SSC jira is here: https://jira.slac.stanford.edu/browse/SSC-208 Jira Release info for pipeline project is here:
Updating Xrootd client version
Reason for change
This is a request to upgrade the xrootd client tools from version 20080728-0933v1 to 20090202-1402v2.
Testing
The new version has been installed in the Fermi xrootd application area and is available through the DEV link. All tools xrdcp, xrd.pl, xrdls and
Rollback
The client version is rolled back by recreating the link to the old version.
CCB Request
https://jira.slac.stanford.edu/browse/SSC-202
Details
The current xrdcp version will fail to overwrite a file that is on a data server that has no free space left. It fails because the redirector will not redirect the client. The new xrdcp version, however, will first remove the file and then write it to a new server. The other client tools have not been changed, except xrd.pl, to which an option to remove a directory tree has been added; this option is currently not applicable to the production xrootd. In order to update the version, the link /afs/slac.stanford.edu/g/glast/applications/xrootd/PROD has to be changed to point to dist/20090202-1402v2/@sys.
Change of Xrootd authorization for rm
Reason for change
The xrootd redirectors are configured to forward a file remove request to all of their data servers. Therefore we would like to configure the redirectors so that clients have to authenticate themselves and only one production account is authorized to remove files.
Testing
The Fermi xrootd test setup was configured to use authentication/authorization for the redirectors and data servers: These rules were tested using four accounts, one being a Fermi user (read-only access), a production account, the account that has privileges to remove files and a non Fermi
Rollback
The configuration can be rolled back by using the previous xrootd configuration and authorization. A restart of the xrootd redirectors is needed.
CCB Request
https://jira.slac.stanford.edu/browse/SSC-199
Details
Authentication and authorization are required for all of the xrootd data servers in order to restrict access to the Fermi data to Fermi members only. Write and remove privileges are granted to production accounts only. No restrictions were needed for the redirectors, as all they did was redirect clients to the data servers. The redirectors have been reconfigured so that they are able to remove files, and therefore authentication and authorization have to be enabled. For the data servers we would like to change the authorization so that only the glastxrw user is able to remove files (so far other production accounts are also allowed). After changing the authorization files and the xrootd config file, the xrootd on the redirectors has to be restarted in order to activate the changes.
Install DataCat Linemode-Client patch version 2.3.3 to PROD
2.3.3 fixes a bug that prevents (large) dataset searches issued from the linemode client from completing. See https://jira.slac.stanford.edu/browse/SSC-197 for details.
Request to migrate to patched Oracle client tools v10.2.0
The current 10.2.0 driver used by applications needing passwordless (wallet) Oracle database access has a bug which prevents it from connecting to Oracle from a client machine which has been up for more than 200 days (32-bit linux kernel version 2.6) or 248 days (32-bit linux kernel version 2.4). A patched version of the client tools has been made available which fixes this problem.
Change details:
We have a central configuration script (actually 2, one for each of unix sh and csh) which requires a 1-line change to point everything to the new driver.
Rollback details:
Backing out the change is only a matter of restoring the original pointer.
Testing:
I've tested both clients on both 32-bit and 64-bit linux machines and the results are consistent.
CCB Request:
Request to install a new Xrootd production version
Reason for change
This is a request to change the xrootd server version and make a minor update of the client tools.
Server version update
We would like to upgrade the xrootd server version for the Fermi xrootd cluster from 20080828-1632 to 20090202-1402. The main changes between these two versions are:
Due to this issue the crawler is currently not using the production xrootd but the test xrootd that runs the new version.
Client version update
The first time an xrootd client connects to a cluster it tries FirstConnectMaxCnt times to connect before it fails. The default for this number is 150, but for xrd.pl it is overridden and set to 10. Therefore a client will fail after about 3.3 min (the wait between connection attempts is 20 sec), whereas with the default setting the client will fail only after 50 min. This is important because during an outage, which typically lasts 5-30 min, we stop the redirector to prevent clients from being redirected, and with the short wait time xrd.pl might fail.
Testing
As with every xrootd version, basic tests were done reading from and writing to xrootd and testing the client admin interface (rm, stat, checksum,...). The new version has been installed as a test version on the Fermi xrootd cluster, which allows access to the glast data. The production crawler has been using this version for more than a month. The fix to the timeout for xrd.pl has been tested. It has been verified that it will wait the expected time if an xrootd server is not available.
Rollback
To switch the servers back to the old version the xrootd configuration has to be reverted to the old version, followed by a restart of the old version. The client version is rolled back by recreating the link to the old version.
CCB Request
https://jira.slac.stanford.edu/browse/SSC-185
Details
Server version upgrade
cmsd logfile name change
At the time of the restart I would like to change the logfile name for the cmsd from olbdlog to cmsdlog. This requires changing the name in StartXrd.cf.glast:
Restart of xrootd
The restart should take less than five minutes. Stopping the redirectors first prevents clients from being redirected and avoids the chance that a file is not found because a data server is being restarted. The clients will wait while the xrootds are down and reconnect once the data servers and redirectors are up.
Update the client
This update will not change the xrootd client binaries; it will only change xrd.pl. The directory structure of the xrootd application has changed. Instead of keeping releases in afs-sysname/release-name they are now in dist/release-name/afs-sysname (e.g. i386_rhel30/20080728-0933v1 becomes dist/20080728-0933v1/i386_rhel30).
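The wait times quoted for the xrd.pl change follow from simple arithmetic, sketched here. The attempt counts (10 and 150) and the 20 second wait are taken from this request; the function name is illustrative.

```python
def max_wait_seconds(first_connect_max_cnt: int, wait_seconds: int = 20) -> int:
    """Total time a client keeps retrying its first connection before failing."""
    return first_connect_max_cnt * wait_seconds


for label, attempts in (("xrd.pl override", 10), ("default", 150)):
    total = max_wait_seconds(attempts)
    print(f"{label}: {attempts} attempts -> {total} s ({total / 60:.1f} min)")
# xrd.pl override: 10 attempts -> 200 s (3.3 min)
# default: 150 attempts -> 3000 s (50.0 min)
```

With 10 attempts the client gives up well inside a typical 5-30 min outage, while the default of 150 attempts outlasts it.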
Release Pipeline II version 1.3.2
These small changes to the pipeline make it possible to perform reprocessing, and make it possible for the run status to default to good. This version also adds a feature to time out database connections, to see whether this fixes the problem of gradual pipeline slowdown requiring frequent restarts. The timeout can easily be turned off, and the new version can easily be backed out if any problems occur. The associated JIRA is SSC-168.
Pipeline 1.3.2
DataCat Client 2.3.2, DataCat Stored Procedures 2.2.1, DataHandling Common 1.5.1
Run Quality 1.3.2
CCB Request for Pipeline Upgrade
These small changes to the pipeline make it possible to perform reprocessing, and make it possible for the run status to default to good. This version also adds a feature to time out database connections, to see whether this fixes the problem of gradual pipeline slowdown requiring frequent restarts. The timeout can easily be turned off, and the new version can easily be backed out if any problems occur. The associated JIRA is SSC-156.
Pipeline 1.3.1
Pipeline Front-End 2.8
Run Quality 1.3
Request to install a new Xrootd production version
Reason for change
Server version update
We would like to upgrade the xrootd server version for the glast xrootd cluster from version
The full cvs Changelog is available at http://xrootd.slac.stanford.edu/download/20080828-1632/ChangeLog_to_v20080513-1222
Client version update
The xrootd client tools are installed in /afs/slac.stanford.edu/g/glast/applications/xrootd. We would also like to update the PROD version from 20080513-1222 to 20080728-0933.
We would also like to link the FITS version to PROD.
Testing
As with every xrootd version, basic tests were done reading from and writing to xrootd and testing the client admin interface (rm, stat, checksum,...). The new version has been installed as a test version on the glast xrootd cluster, which allows access to the glast data. MC jobs were run successfully against it. The xrootd client version has been installed as the TEST, DEV and FITS version in /afs/slac.stanford.edu/g/glast/applications/xrootd. The FITS and DEV versions have been successfully used for production activities.
Rollback
To switch the servers back to the old version the xrootd configuration has to be reverted to the old version, followed by a restart of the old version. The client version is rolled back by recreating the link to the old version.
CCB Request
https://jira.slac.stanford.edu/browse/SSC-141
Details
Server version upgrade
The following changes to the config file, xrootd.cf, are needed:
Restart of xrootd
The restart should take less than five minutes. Stopping the redirectors first prevents clients from being redirected and avoids the chance that a file is not found because a data server is being restarted. The clients will wait during the restart and reconnect to the data servers and redirectors.
Update the client
CCB Request to Deploy Pipeline-II version 1.3
Actions:
Estimated Length of Outage to Perform Upgrade
Approximately 20 minutes total, based on the length of time it took to add the new table-columns in DEV.
Motivation:
Rollback Procedure:
Because of the new stored procedures that will be installed, the back-out procedure is slightly more involved than usual:
Note that the additional steps are quite simple to perform and only extend the outage for a back-out from ~5 minutes to ~10.
Associated Jira:
Details
Pipeline 1.3
DataCat Client 2.3
Pipeline Front End 2.7
