If they are in a Failed state, you can just do a standard rollback.
If findChunks is stuck, bkill the process and wait for the reaper to pick it up. Most of them should then roll back automatically without intervention.
Most likely, someone created run locks manually to stop runs with wrong MOOT keys from being processed and failing repeatedly. Don't forget to remove these run locks.

History

The MOOT Key Mismatch problem occurred on 2023 July 18. See the Fermi news item for 2023 July 20 at https://fermi.gsfc.nasa.gov/ssc/library/news/
- 15 LPA runs from the LAT had to have their MOOT keys manually fixed in Oracle and MySQL before they could be processed correctly through L1:
  - 711402686, 711405073, 711411046, 711416992, 711422925, 711428850, 711434773, 711440537, 711446088, 711451782, 711457476, 711463169, 711468863, 711474557, 711478257
The MOOT Key Mismatch problem also occurred on 2017 November 30. See: https://www-glast.stanford.edu/protected/mail/opsprob/10893.html

Problems with Magic 7 File

For instance, this error in ft2Runs stream 240415006.734877891:

> terminate called after throwing an instance of 'std::runtime_error'
> what(): FATAL: the provided Magic 7 file does not cover the requested time interval. To cover the requested interval we would need to extrapolate position and attitude (forward) more than what permitted by the current configuration (see the parameter 'extrapolationLimit').

Michael fixed this as follows (in his words):

1) doRun.ft2Runs (case of 240415006.734877891) reads from the runs area:
stageIn for: /nfs/farm/g/glast/u41/L1/runs/734/r0734877891/r0734877891_v000_magic7L1.txt

The magic7 file in the staging area, which is also copied into the runs area, is complete. makeM7L1 reads from the staging area.

Thus, I just rolled back makeM7L1, which found all packets and created a valid magic7L1 file in the runs area, to be read by ft2Runs.

2) doRun.doChunk.fakeFT2 (case of 240414007.734792595.6674757) stages from the staging area:
stageIn for: /nfs/farm/g/glast/u28/stage/240414007/magic7_240414007.txt

This file was incomplete! I replaced it with the 240414008 magic7 file.
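
When diagnosing a similar failure, it can help to locate both copies of the magic7 file for a stream before deciding what to roll back. The following is a minimal Python sketch, not part of the L1 pipeline. The helper names (split_stream, runs_magic7, stage_magic7, summarize) are hypothetical, and the path layout is only inferred from the two stageIn lines above, so the disk roots (u41, u28), the "r0" prefix, and the "v000" version may differ for other runs.

```python
from pathlib import Path

# ASSUMPTION: roots copied from the two stageIn lines above; other runs or
# deliveries may live on different disks.
RUNS_ROOT = Path("/nfs/farm/g/glast/u41/L1/runs")
STAGE_ROOT = Path("/nfs/farm/g/glast/u28/stage")


def split_stream(stream):
    """Split a stream ID 'delivery.run[.chunk]' into (delivery, run)."""
    parts = stream.split(".")
    if len(parts) < 2:
        raise ValueError(f"expected 'delivery.run[.chunk]', got {stream!r}")
    return parts[0], parts[1]


def runs_magic7(run, version=0, root=RUNS_ROOT):
    """Path of the magic7L1 file in the runs area (layout is an assumption)."""
    name = f"r0{run}"
    return root / run[:3] / name / f"{name}_v{version:03d}_magic7L1.txt"


def stage_magic7(delivery, root=STAGE_ROOT):
    """Path of the magic7 file in the staging area (layout is an assumption)."""
    return root / delivery / f"magic7_{delivery}.txt"


def summarize(path):
    """Report whether a file exists and, if so, how many lines it has."""
    if not path.is_file():
        return f"{path.as_posix()}: missing"
    with path.open("rb") as handle:
        lines = sum(1 for _ in handle)
    return f"{path.as_posix()}: {lines} lines"


if __name__ == "__main__":
    delivery, run = split_stream("240415006.734877891")
    print(summarize(stage_magic7(delivery)))
    print(summarize(runs_magic7(run)))
```

A line count alone does not prove a file is complete, and the magic7L1 file is produced by makeM7L1, so it need not match the staging file line for line. A missing file, or a staging file much shorter than that of a neighbouring delivery, is a hint to check the staging copy first.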

PGWave drpMonitoring fails due to duplicate stream

PGWave reports a duplicate stream error, e.g.,

Task drpMonitoring Process launchDrpMonitoring Stream 745027200.0
org.srs.pipeline.server.sql.DatabaseUtilities$DuplicateStreamException: A stream ALREADY exists with specified task, parent, and id

According to Jim, if the downstream drpMonitoring task has already been submitted successfully, the failed PGWave.launchDrpMonitoring stream can be left as-is. If you want to clean it up, temporarily disable the code that launches the drpMonitoring stream, then roll back the failed stream.
