...

http://glastlnx24:5441: org.apache.xmlrpc.XmlRpcException: Failed to create input stream: 
  Connection reset
or 
  Connection refused

This is a problem with the XML-RPC Python server and should be brought to the attention of the FO shifter.
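A quick way to tell "Connection refused" from other failures before calling the shifter is to try a plain TCP connection to the server port. The sketch below is only an illustration: `probe_port` is a hypothetical helper, not part of any existing monitoring tool, and it checks the TCP connect step only. It does not speak XML-RPC, so a reset that happens while the response is being read will not show up here.

```python
import socket


def probe_port(host, port, timeout=5.0):
    """Try one TCP connection and classify the outcome.

    Returns 'open', 'refused', 'reset', 'timeout', or 'error'.
    Hypothetical helper for illustration; checks the connect step only.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Nothing is listening on the port (server process down).
        return "refused"
    except ConnectionResetError:
        # The peer dropped the connection during setup.
        return "reset"
    except socket.timeout:
        # No answer within the timeout (host down or filtered).
        return "timeout"
    except OSError:
        # Anything else, including a host name that does not resolve.
        return "error"


# Example call (host and port taken from the error message above):
# probe_port("glastlnx24", 5441)
```

A result of "refused" suggests the server process is not running, while "open" means the port accepts connections and the fault lies further up, in the server itself.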

19:20 (Richard, for Tony)

Batch jobs appeared to be running slowly but had in fact failed without producing any log files. This was tracked down to DNS failures on the balis. It has been reset (reported by Neal Adama at 18:15).

July 5

7:10 pm I restarted tomcat12 since the monitoring programs were complaining and ServerMonitoring showed it missing - Tony

July 4

6:00 pm Old DQM ingestion script put back into production. The new script worked fine for some 24 hours, and then we started having "idle" sessions locking out all the following ones. There were some 60 of them waiting. Killing the first one did not solve the problem, as the next one went into the "idle" state. We decided to kill all the waiting sessions and put the old script back in production. The failed ingest scripts are being rolled back.

Panel
titleMail from Ian

All the sessions have been killed off. Is it the same script that ran successfully yesterday? The database was waiting on "SQL*Net message from client", which usually means a process has gone idle. The two processes both went idle after issuing
insert into DQMTRENDDATAID (dataid, loweredge, upperedge, runversionid) values(:1,:2,:3,:4)
There was no further action being taken by either session, such as reads, execute counts, etc. So either the process was idle or it didn't have enough resources to even attempt what was to be executed next.
I think for now the old script is probably best to run. It would be nice if serialization wasn't done via locking.
It would also be good if I could adjust a couple of database parameters, which requires a short shutdown.
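As an illustration of Ian's remark about serialization, one alternative to locking is to funnel all writes through a single worker fed by a queue, so that no session ever waits on a lock held by another. This is a generic sketch, not the DQM ingestion script: `run_single_writer` and the `write` callback are hypothetical names, and a real version would perform the database insert inside `write`.

```python
import queue
import threading


def run_single_writer(jobs, write):
    """Apply write(job) to each job from one worker thread, in submission order.

    Hypothetical sketch: serialization comes from the queue, not from locks.
    Jobs must not be None, since None is used as the stop signal.
    """
    q = queue.Queue()
    results = []

    def worker():
        while True:
            job = q.get()
            if job is None:  # sentinel: no more work
                break
            results.append(write(job))

    t = threading.Thread(target=worker)
    t.start()
    for job in jobs:
        q.put(job)
    q.put(None)  # tell the worker to stop once the queue drains
    t.join()
    return results
```

Producers only enqueue work and never block each other; if the single writer stalls, there is one process to inspect rather than a chain of waiting sessions.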

July 3

01:00 Restarted tomcat07 due to Data Quality Monitoring Unresponsive.

01:00 PM New DQM ingestion script put into production to avoid ORACLE slowdowns. If any problems, please contact Max.

July 2

02:55 Restarted tomcat07 due to Data Quality Monitoring Unresponsive.

July 1 (Canada Day)

19:50 - Data Processing page went unresponsive for 2.5 hours. See GDP-26@JIRA and SSC-84@JIRA

June 29

3:38 pm Restarted glast-tomcat07. Data Quality Monitoring Unresponsive

RunQuality Exception

Cannot set the run quality flag due to GRQ-4@JIRA

June 27

11:55am Restarted glast-tomcat07. Data Quality Monitoring Unresponsive

12:25am: OpsLog and Monitoring/Trending web-apps interfering with each other

June 26

Outstanding Issues:

David Decotigny requests we get calibration trending working again.

10:18pm Web servers are working again:

Panel
titleMail from Antonio Ceseracciu

The root problem was a software crash on rtr-slb1.
I just came in after Mike called me on the phone and power cycled the machine. It has come up fine and all services should now be restored.

...