...

  • The four existing NFS servers are operating on equipment that is over 5 years old. Each of these servers hosts 32 GB of storage. 
  • wain025, the user/group disk, is routinely heavily loaded and fairly often overloaded, sometimes leading to hangs and crashes.
  • Within the xroot cluster, the oldest four 32-GB servers are over 6 years old (wain017, 019, 020, 021).
  • The 11 next oldest 32-GB xroot servers were purchased in 2009 (wain033, 034, 035, 036, 037, 038 and 039).
  • Six xroot servers have been made READ-ONLY by Wilko due to frequent problems (wain053, 054, 055, 056, 069, 071). This may be due to the extensive use of Seagate drives, which fail in a way that can hang the I/O bus. Note that Maintech has begun to use Hitachi drives for replacements.

...

Ordered Priorities:
  1. Growth and reliability for Fermi on-orbit data. 
  2. Upgrade server for Fermi user and group disk partitions to increase performance – but retain segregation of the four NFS groups.
  3. Retire old equipment
    1. oldest equipment
    2. unreliable equipment

...

Other factors:
  • SCS is planning a strategic change to GPFS and is now experimenting with the two new Fermi servers. One possible outcome of this work is to retain GPFS and run xroot on top of it. If not, the machines will be reinitialized, and the disks reformatted to XFS and configured like the earlier Dell super-servers.

...

  • There is concern about mixing xroot and NFS on the same server, so that practice has been avoided in the past.

...

Option 1:
  • Dedicate the two new servers to xroot service.
  • NFS upgrade
    • Migrate all xroot data from fermi-xrd001 (90 TB, installed 3/6/2012) to other servers
    • Reconfigure fermi-xrd001 for NFS use
    • Move contents of wain026, 031 and 032 to fermi-xrd001
    • Move user partitions on wain025 to wain031 (6 months newer than wain025)
    • Move group partitions on wain025 to wain032
    • Retire wain025 and wain026
  • Select eight (8) additional wains in xroot service to vacate and retire.
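
The NFS moves in Option 1 can be written down as data and checked for consistency before anything is scheduled. The following is only a minimal sketch: the host names and moves come from the list above, while the data layout and the `check_plan` helper are hypothetical and not part of any existing SCS or Fermi tooling. It verifies that every server marked for retirement has its contents moved elsewhere and is not itself the target of a move.

```python
# Moves taken from the Option 1 list above; the tuple layout is an assumption.
# Each entry is (source host, destination host, what is moved).
MOVES = [
    ("wain026", "fermi-xrd001", "all contents"),
    ("wain031", "fermi-xrd001", "all contents"),
    ("wain032", "fermi-xrd001", "all contents"),
    ("wain025", "wain031", "user partitions"),
    ("wain025", "wain032", "group partitions"),
]

# Servers that Option 1 retires once the moves are complete.
RETIRE = ["wain025", "wain026"]


def check_plan(moves, retire):
    """Return a list of problems; an empty list means the plan is consistent."""
    problems = []
    sources = {src for src, _, _ in moves}
    destinations = {dst for _, dst, _ in moves}
    for host in retire:
        if host not in sources:
            problems.append(f"{host} is retired but nothing is moved off it")
        if host in destinations:
            problems.append(f"{host} is retired but still receives data")
    return problems


if __name__ == "__main__":
    for src, dst, what in MOVES:
        print(f"{src} -> {dst}: {what}")
    print(check_plan(MOVES, RETIRE) or "plan is consistent")
```

The check does not model ordering: wain031 and wain032 must be emptied onto fermi-xrd001 before they receive the wain025 partitions, and that sequencing would still need to be enforced by whoever carries out the migration.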

Discussion:  

  • This idea might not be popular with ISOC, as it bundles their currently independent file server, wain031, with other user groups on fermi-xrd001. They seem comfortable with the status quo, so the remaining issue is the age of wain031 (installed 3/9/2009). (Question: Could one use VMs, CHOS, cgroups or some other mechanism to effectively partition a machine's resources and then run separate instances of NFS on each partition to achieve some guarantee of performance for each NFS group?)
  • While this option effectively doubles the performance for the user/group disks, as well as substantially increasing the amount of space available, it does not address the age of those servers (both installed March 2009).