...

    • devices_and_attributes row count checks: problem accessing the table, or row count 0, or the devices_and_attributes row count differs from the count of distinct rows in the curr_pvs view.
    • If there is a discrepancy:
      • A likely cause is a PV with the same name in both FACET and LCLS, or in more than one IOC within LCLS. devices_and_attributes holds unique PV names, while curr_pvs is keyed on PV name + IOC name, so a name duplicated in curr_pvs shows up only once in devices_and_attributes, hence the count difference. Here is a useful select statement to find the duplicates:
      • select distinct (rec_nm) from curr_pvs where system='LCLS'
        minus
        select distinct(rec_nm) from devices_and_attributes where system='LCLS'
      • The fix will have to occur in the implicated IOC application; the next crawl following the fix will resolve the problem.
      • For a short-term mitigation, edit IRMISDataValidation.pl and comment out the check_DEVICES_AND_ATTRIBUTES step.
    • check for successful completion of all crawler steps: check that every crawler step enters both a start row and a completion row in controls_global.data_validation_audit.
      Some circumstances in which steps go missing:
      • step launched but didn’t finish: check the status of processes launched by the cron job using ps -ef | grep. An example: when Perl DBI was hanging due to the 199-day-Linux-server-uptime bug, several LCLS PV crawler jobs had launched but had hung in the db_connect statement, and had to be killed from the Linux command line.
      • step launched and finished, but the completed step was never written: the getPwd problems cause this symptom. See entries starting 9/23 9 pm for an illustration.
      • step never launched: is the script available? Is the server up? Is crontab/trscrontab configured correctly? Are there permission problems? etc.
      • other mysteries: figure out where the job in question stopped, using ps -ef, log files, etc.
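
The two checks above (duplicate PV names and steps with no completion row) can be sketched in a few lines. This is only an illustration, not the logic in IRMISDataValidation.pl: it runs against an in-memory SQLite database so it is self-contained, whereas the real tables live in Oracle (where MINUS is used instead of EXCEPT, and the audit table is in the controls_global schema). The columns rec_nm and system come from the select statement above; ioc_nm, step_nm, event, and all demo row values are assumptions made up for the example.

```python
import sqlite3


def duplicate_pv_names(conn, system):
    """PV names listed more than once in curr_pvs for one system, with counts."""
    return conn.execute(
        "SELECT rec_nm, COUNT(*) FROM curr_pvs WHERE system = ? "
        "GROUP BY rec_nm HAVING COUNT(*) > 1 ORDER BY rec_nm",
        (system,),
    ).fetchall()


def incomplete_steps(conn):
    """Steps that wrote a 'start' audit row but no 'complete' row.

    step_nm and event are hypothetical column names; check the real
    data_validation_audit layout before adapting this.
    """
    return [row[0] for row in conn.execute(
        "SELECT step_nm FROM data_validation_audit WHERE event = 'start' "
        "EXCEPT "
        "SELECT step_nm FROM data_validation_audit WHERE event = 'complete' "
        "ORDER BY step_nm"
    )]


def build_demo_db():
    """Toy stand-in for the real tables, populated with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE curr_pvs (rec_nm TEXT, ioc_nm TEXT, system TEXT);
        CREATE TABLE data_validation_audit (step_nm TEXT, event TEXT);
    """)
    conn.executemany("INSERT INTO curr_pvs VALUES (?, ?, ?)", [
        ("DEMO:PV:A", "ioc-demo-1", "LCLS"),
        ("DEMO:PV:A", "ioc-demo-2", "LCLS"),  # same PV name on a second IOC
        ("DEMO:PV:B", "ioc-demo-1", "LCLS"),
    ])
    conn.executemany("INSERT INTO data_validation_audit VALUES (?, ?)", [
        ("demo_step_1", "start"),
        ("demo_step_1", "complete"),
        ("demo_step_2", "start"),  # launched, completion row never written
    ])
    return conn


if __name__ == "__main__":
    demo = build_demo_db()
    print("duplicate PV names:", duplicate_pv_names(demo, "LCLS"))
    print("steps without completion:", incomplete_steps(demo))
```

The GROUP BY query complements the minus statement above: it lists names that occur on more than one IOC within a single system, which the minus statement alone does not show.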

...

  1. Synchronization to MCCO that bypasses error checking
    • If you need to run the synchronization to MCCO even though IRMISDataValidation.pl failed (i.e. the LCLS crawler ran fine, but others failed), you can run a special version that bypasses the error checking and runs the sync no matter what:
      /afs/slac/u/cd/jrock/jrock2/DBTEST/tools/irmis/cd_script/runSync-no-check.csh

  2. Comment out code in IRMISDataValidation.pl
    • If the data validation needs to bypass a step, you can edit IRMISDataValidation.pl (see the tables above for its location) to remove or change a data validation step so that the crawler jobs can complete. For example, if a problem with the PV client crawlers keeps the sync to MCCO from running, you may want to simply remove the PV client crawler check from the data validation step.

  3. Really worst case! Edit the MCCO tables manually
    • If the PV crawlers will not complete with a sync of good data to MCCO, and you decide to wait until November for me to fix it (this is fine – the PV crawler parser is a complicated piece of code that needs tender loving care and testing!), AND accelerator operations are affected by missing PVs in one of these tables, the tables can be updated manually with the names that are needed to operate the machine:
      • aida_names (see Bob and Greg)
      • bsa_root_names (see Elie)
      • devices_and_attributes (see Elie)


...

crawling a brand new directory structure

  • start off by testing in the SLACDEV instance!! To do this, check the PV crawler directory out of CVS into a test directory and modify db.properties to point to the SLACDEV instance. Also check the crawl scripts out of CVS, and setenv TWO_TASK to SLACDEV for the set*IOCsActive scripts.
  • all directories have to be visible from each IOC boot directory
  • the host running the crawler must be able to "see" Oracle (software and database) and the boot directory structure.
  • set up env vars in run*PVCrawler.csh and pvCrawlerSetup.csh.
  • if necessary, create crawl group in IOC table, and set*IOCsActive.csh to activate.
  • add call to pv_crawler.csh to run with the new env vars...
  • hope it works!
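
Before launching a test crawl, the prerequisites in the list above can be checked mechanically. The following is a minimal sketch, not part of the crawler scripts: the function name preflight and the example directory are hypothetical, and it covers only two of the items (TWO_TASK pointing at SLACDEV, and the required directories being visible and readable from the crawler host). It does not verify db.properties or Oracle connectivity.

```python
import os


def preflight(env, required_dirs, expected_instance="SLACDEV"):
    """Return a list of problems found before a test crawl; empty means OK."""
    problems = []

    # The set*IOCsActive scripts pick up the Oracle instance from TWO_TASK.
    two_task = env.get("TWO_TASK")
    if two_task != expected_instance:
        problems.append(
            f"TWO_TASK is {two_task!r}, expected {expected_instance!r}"
        )

    # Every directory the crawl needs must be visible from this host.
    for path in required_dirs:
        if not os.path.isdir(path):
            problems.append(f"directory not visible: {path}")
        elif not os.access(path, os.R_OK | os.X_OK):
            problems.append(f"directory not readable: {path}")

    return problems


if __name__ == "__main__":
    # Hypothetical path; substitute the boot and test checkout directories.
    dirs_to_check = ["/path/to/test/checkout"]
    for problem in preflight(os.environ, dirs_to_check):
        print("PROBLEM:", problem)
```

Passing the environment in as a dictionary, rather than reading os.environ inside the function, makes it easy to try the check against the settings you intend to put in run*PVCrawler.csh and pvCrawlerSetup.csh before editing them.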

...