Page History

...

June 27

12:25am: The OpsLog and Monitoring/Trending web apps appear to be interfering with each other. I was able to reproduce the problem; see the emails from OpsProb below for the symptoms and the steps to reproduce it. I have notified Steve Tether and Max Turri about the problem by email.

Panel

title	email from Flath, Daniel

Fabio, Anders, Max, and Steve:

I was following up with Fabio's report, and was having great success selecting and updating runs, and viewing the plots. The monitoring front end was very responsive.

Then Anders had a problem with the OpsLog. So I opened the OpsLog (I didn't even have the application opened in my browser previously) and logged in.

Then I returned to the monitoring application and tried to update a single run. It took a couple of minutes, re-selected ALL the runs, and "timed me out" of the OpsLog with this message (which I received only after trying to force an authentication check by creating a new entry):

<quote>
Your session has timed out.

If there is no activity from the browser in 3 hours the server releases the session. We limit the length of the session to control the load on the server. The server has no way of telling whether the browser is still connected other than to monitor its activity. If the sessions are too long the server keeps abandoned session open for an infinite time. They accumulate quickly and consume the memory.

Please return to the index and log in again.
</quote>

I'm going to hazard a guess that there's something conflicting between Max's and Steve's sessions, but I can't be sure as I'm not an expert web-developer.

For the moment, perhaps you can work around this by using one browser for the OpsLog and another for the monitoring/trending applications? I will notify Max and Steve by email, and copy my notes into the OpsLog (once I log in again) so they can follow up tomorrow morning.

If the problem is so severe that it can't wait until morning I can start waking people up. Let me know.

-Dan
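The timeout message quoted in the email above describes an idle-expiry rule: a session is released after 3 hours without browser activity. The sketch below illustrates such a rule under stated assumptions. `IdleSessionRegistry` and its methods are hypothetical names chosen for illustration, not the OpsLog's actual code. Under this rule a session that was active minutes earlier would still be live, so a "timed out" message shortly after login would suggest something other than idleness, which fits Dan's guess of a session conflict.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical idle-timeout registry; an illustration only, not the OpsLog implementation. */
final class IdleSessionRegistry {
    private final Duration maxIdle;
    private final Map<String, Instant> lastActivity = new ConcurrentHashMap<>();

    IdleSessionRegistry(Duration maxIdle) {
        this.maxIdle = maxIdle;
    }

    /** Record activity for a session, creating the entry if it does not exist yet. */
    void touch(String sessionId, Instant now) {
        lastActivity.put(sessionId, now);
    }

    /** A session is live if it is known and was last active no more than maxIdle ago. */
    boolean isLive(String sessionId, Instant now) {
        Instant last = lastActivity.get(sessionId);
        return last != null && Duration.between(last, now).compareTo(maxIdle) <= 0;
    }

    /** Release sessions idle for longer than maxIdle; returns how many were dropped. */
    int releaseExpired(Instant now) {
        int before = lastActivity.size();
        lastActivity.values().removeIf(
                last -> Duration.between(last, now).compareTo(maxIdle) > 0);
        return before - lastActivity.size();
    }

    public static void main(String[] args) {
        // 3 hours matches the limit stated in the quoted message; the session id is made up.
        IdleSessionRegistry registry = new IdleSessionRegistry(Duration.ofHours(3));
        Instant login = Instant.parse("2008-06-27T07:00:00Z");
        registry.touch("opslog-user", login);
        System.out.println(registry.isLive("opslog-user", login.plus(Duration.ofHours(2))));
        System.out.println(registry.isLive("opslog-user", login.plus(Duration.ofHours(4))));
    }
}
```

Passing the current time in explicitly keeps the rule easy to check without waiting three hours.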

Panel

title	email from Borgland, Anders

And at the same time I got logged out of Ops Log.

anders

On Thu, 26 Jun 2008, Anders Borgland wrote:

>
> I tried to look at the quantity SGPSBA_CURRDOP and got:
>
> java.lang.NullPointerException
> org.freehep.webutil.tree.TreeUtils.nodesForPath(TreeUtils.java:17)
> org.glast.base.application.web.filter.ApplicationFilter.defaultApplicationFilter(ApplicationFilter.java:248)
> org.glast.base.application.web.filter.ApplicationFilter.doFilter(ApplicationFilter.java:173)
> org.glast.base.web.multipart.filter.MultipartFilter.doFilter(MultipartFilter.java:37)
> com.opensymphony.module.sitemesh.filter.PageFilter.parsePage(PageFilter.java:118)
> com.opensymphony.module.sitemesh.filter.PageFilter.doFilter(PageFilter.java:52)
> org.glast.base.web.datasource.DataSourceFilter.doFilter(DataSourceFilter.java:119)
> org.glast.base.web.groupchecker.filter.GroupCheckerFilter.doFilter(GroupCheckerFilter.java:43)
> org.glast.base.web.login.filter.LoginFilter.doFilter(LoginFilter.java:127)
> org.glast.base.web.checkcookies.filter.CheckCookiesFilter.doFilter(CheckCookiesFilter.java:41)
>
> Seems to happen to all quantities.
>
> anders

Panel

title	email from Gargano, Fabio

During my last two shifts (177C and 178C) I sometimes had a problem with Data Quality Monitoring. On the DataQualityMonitoring page, http://glast-ground.slac.stanford.edu/DataQualityMonitoring/, I cannot check only one run in the run table. Every time I check only one and then click the "update" button, it spends a while "thinking" and in the end all the runs are checked again. If I instead click directly on the data products link (digi, recon, etc.) of the run I'm interested in, I go to the data monitoring page but cannot see any plots.
I have also noticed that when this happens I'm automatically logged off from the OpsLog. Do you have suggestions?

June 26

Outstanding Issues:

...

Had to restart the PROD data crawler once (because Nagios and http://glastlnx20.slac.stanford.edu:5080 were complaining). It looks like the problem was caused by MC writing to the DEV version of xrootd on glastlnx22.
Problems with Run Quality monitoring reported yesterday now fixed.

...

Panel

Watching plots on the web is right now very slow...

**Comment by David Paneque on Thursday, June 26, 2008 5:20:39 AM UTC
It is slow from both the shifter computers and our laptops.
**Comment by David Paneque on Thursday, June 26, 2008 5:21:33 AM UTC
now it is fast again...
**Comment by Tony Johnson on Thursday, June 26, 2008 5:55:18 AM UTC
Trending plots, data quality plot, all plots? One possibility is that xrootd load slows down plotting (some plots are read using xrootd). I noticed there were some ASP and MC jobs running in the pipeline around this time which may have been slowing things down.
**Comment by Tony Johnson on Thursday, June 26, 2008 6:07:59 AM UTC
Indeed in the DataQualityMonitoring log file around this time I see lots of messages about waiting for response from xrootd.
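The check Tony describes, looking in a log for clusters of xrootd wait messages around the slow period, can be sketched as a count of matching lines per minute. This is only an illustration: the class name `LogPhraseCounter`, the timestamp layout and the sample lines are assumptions, since the real DataQualityMonitoring log format is not shown here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: counts log lines that mention a phrase, grouped by minute. */
final class LogPhraseCounter {
    /**
     * Assumes each line starts with a timestamp such as "2008-06-26 05:20:39".
     * That layout is an assumption, not the real DataQualityMonitoring log format.
     */
    static Map<String, Integer> countPerMinute(List<String> lines, String phrase) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted by minute
        for (String line : lines) {
            if (line.length() >= 16 && line.contains(phrase)) {
                String minute = line.substring(0, 16); // "yyyy-MM-dd HH:mm"
                counts.merge(minute, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample lines, for illustration only.
        List<String> sample = Arrays.asList(
                "2008-06-26 05:20:39 WARN waiting for response from xrootd",
                "2008-06-26 05:20:50 WARN waiting for response from xrootd",
                "2008-06-26 05:21:33 INFO plot rendered");
        System.out.println(countPerMinute(sample, "waiting for response from xrootd"));
    }
}
```

A spike in the per-minute counts that lines up with the reported slow period would support the xrootd-load explanation, while a flat count would argue against it.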

Outstanding Issues:

[ELG-18@jira] OpsLog session times out immediately after login.


[] Some plots in data monitoring are inaccessible (the message says the plot is not available), even while the same plots are accessible from another browser at the same time. Maybe this is also related to the workstations in the control room running Firefox 1.5 (seems unlikely)? See discussion in OpsProbList.


[] Jim reported that several of his scripts were running and were killed when the pipeline server was restarted. We need to understand why his data catalog queries are still taking so long (>10 minutes) to run.


[GRQ-1] Run quality status is not being updated even though change entries are made in the history table. I looked at the code but could not see anything obviously wrong; maybe I am too sleepy.


[LONE-72] Attach intent as meta-data to files


[LONE-71] Digi merging loses IObfStatus -- results in Digi files being marked as ContentError in Data Catalog

18:24 PDT A new version (1.2.4) of the pipeline has been installed. See https://jira.slac.stanford.edu/browse/SSC-74

DataQualityMonitoring hanging

...


Versions Compared

Old Version 22

New Version 23
