diff --git a/docs/users/index.rst b/docs/users/index.rst index 1cee67f7..7014cc36 100644 --- a/docs/users/index.rst +++ b/docs/users/index.rst @@ -5,7 +5,7 @@ User Documentation ================== The https://monitor.sns.gov is a portal for the users to monitor -the status of data aquisition and reduction during experiments. +the status of data acquisition and reduction during experiments. Different views are described below, sorted by various access levels. @@ -17,6 +17,12 @@ Different views are described below, sorted by various access levels. General_user Instrument_scientist +.. toctree:: + :maxdepth: 2 + :caption: Troubleshooting + + troubleshooting/index + .. toctree:: :maxdepth: 2 :caption: Release Notes diff --git a/docs/users/troubleshooting/Autoreducer_configuration_file.rst b/docs/users/troubleshooting/Autoreducer_configuration_file.rst new file mode 100644 index 00000000..d9d19f53 --- /dev/null +++ b/docs/users/troubleshooting/Autoreducer_configuration_file.rst @@ -0,0 +1,21 @@ +============================== +Autoreducer configuration file +============================== + +The configuration file is located at ``/etc/autoreduce/post_processing.conf`` on each of the +autoreducers. Contact the Linux sysadmins to make changes to the configuration file. + +Parameters that may be changed: + +- ``"processors"``: List of post-processors – this determines the queues that the autoreducer + subscribes to. If not defined, the autoreducer will use the default post-processors:: + + [ + "oncat_processor.ONCatProcessor", + "oncat_reduced_processor.ONCatProcessor", + "create_reduction_script_processor.CreateReductionScriptProcessor", + "reduction_processor.ReductionProcessor" + ] + +- ``"jobs_per_instrument"``: Limit on the number of concurrent jobs per instrument of the + autoreducer instance. 
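The two parameters described above can be inspected with a short script. This is a minimal sketch, assuming the configuration file is JSON (suggested by the quoted keys, but not stated here); the function name ``summarize_config`` and the sample content are hypothetical and not part of the autoreducer code.

```python
import json

# Default post-processors, copied from the list documented above.
DEFAULT_PROCESSORS = [
    "oncat_processor.ONCatProcessor",
    "oncat_reduced_processor.ONCatProcessor",
    "create_reduction_script_processor.CreateReductionScriptProcessor",
    "reduction_processor.ReductionProcessor",
]


def summarize_config(text):
    """Parse configuration text (assumed to be JSON) and report the two tunable parameters."""
    config = json.loads(text)
    return {
        # Fall back to the documented defaults when "processors" is not defined.
        "processors": config.get("processors", DEFAULT_PROCESSORS),
        # None means the key is absent; the autoreducer's own default is not documented here.
        "jobs_per_instrument": config.get("jobs_per_instrument"),
    }


if __name__ == "__main__":
    sample = '{"jobs_per_instrument": 2}'  # hypothetical file content
    print(summarize_config(sample))
```

To check a real autoreducer, read the text from ``/etc/autoreduce/post_processing.conf`` and pass it to the function; the file itself should still be changed only through the Linux sysadmins.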
diff --git a/docs/users/troubleshooting/Autoreduction_report.rst b/docs/users/troubleshooting/Autoreduction_report.rst new file mode 100644 index 00000000..e74589a3 --- /dev/null +++ b/docs/users/troubleshooting/Autoreduction_report.rst @@ -0,0 +1,27 @@ +.. _autoreduction_report: + +==================== +Autoreduction report +==================== + +The script +`ar_report.py `_ +can be used to extract and aggregate information about autoreduction from the reduction logs, e.g. +which host the autoreduction ran on and how long it took. + +Example of creating a report for one run:: + + python ar_report.py /HFIR/HB2C/IPTS-31640/nexus/HB2C_1238907.nxs.h5 out_ar_report/ + +Example of creating a report for all runs of one IPTS:: + + python ar_report.py /HFIR/HB2C/IPTS-31640/ out_ar_report/ + +Example output: + +.. csv-table:: + :file: HB2C-IPTS-31640.csv + :header-rows: 1 + +Note that the last column ``meas-redux`` of ``HB2C_1238907`` shows that the autoreduction time was +longer than the measurement time for this run. 
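Runs like ``HB2C_1238907`` can be picked out of a report automatically. The following is a minimal sketch, assuming the report is a CSV file with the header shown in the example output and that a negative ``meas-redux`` value means the reduction took longer than the measurement; the function name ``slow_reductions`` and the trimmed sample are illustrative and not part of ``ar_report.py``.

```python
import csv
import io


def slow_reductions(csv_text):
    """Return (runID, meas-redux) pairs for runs whose meas-redux value is negative."""
    reader = csv.DictReader(io.StringIO(csv_text))
    flagged = []
    for row in reader:
        margin = float(row["meas-redux"])
        if margin < 0:
            flagged.append((row["runID"], margin))
    return flagged


# Sample trimmed to four of the report's columns, values taken from the example output.
SAMPLE = """runID,runDuration,reduxEstTime,meas-redux
HB2C_1238915,21.5,0.0,21.5
HB2C_1238907,44.3,48.8,-4.5
"""

if __name__ == "__main__":
    for run_id, margin in slow_reductions(SAMPLE):
        # Units appear to be seconds, judging by the runStart/runStop timestamps.
        print(f"{run_id}: reduction exceeded measurement by {-margin:.1f} s")
```

For a real report, read the generated CSV file from the output directory and pass its text to the function.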
diff --git a/docs/users/troubleshooting/HB2C-IPTS-31640.csv b/docs/users/troubleshooting/HB2C-IPTS-31640.csv new file mode 100644 index 00000000..4c1d74d3 --- /dev/null +++ b/docs/users/troubleshooting/HB2C-IPTS-31640.csv @@ -0,0 +1,3 @@ +runID,runStart,runStop,runDuration,runCTime,eventSizeMiB,host,numReduced,version,reduxStart,logCTime,longAlgo,algoSec,loadSecTotal,loadNexusSecTotal,reduxEstTime,meas-redux +HB2C_1238915,2024-02-28T13:04:28,2024-02-28T13:04:50,21.5,2024-02-28T13:04,4.3,autoreducer4.sns.gov,0,6.9.20240226.1932,2024-02-28T18:04Z,2024-02-28T13:04,UNKNOWN,0.0,0.0,0.0,0.0,21.5 +HB2C_1238907,2024-02-28T12:42:55,2024-02-28T12:43:39,44.3,2024-02-28T12:43,5.5,autoreducer4.sns.gov,0,6.9.20240226.1932,2024-02-28T17:43Z,2024-02-28T12:44,LoadWANDSCD,39.3,40.0,0.0,48.8,-4.5 diff --git a/docs/users/troubleshooting/Message_broker.rst b/docs/users/troubleshooting/Message_broker.rst new file mode 100644 index 00000000..838361cd --- /dev/null +++ b/docs/users/troubleshooting/Message_broker.rst @@ -0,0 +1,13 @@ +======================= +ActiveMQ message broker +======================= + +ActiveMQ Classic comes with a web console that can be used for monitoring queues, subscribers, +etc. At the moment users/developers need to work with Linux sysadmins to access the ActiveMQ web +console at https://amqbroker.sns.gov:8161/, but this will be addressed as part of the migration to +ActiveMQ Artemis. + +.. 
image:: images/activemq_web_console.png + :width: 100% + :align: center + :alt: ActiveMQ web console diff --git a/docs/users/troubleshooting/images/activemq_web_console.png b/docs/users/troubleshooting/images/activemq_web_console.png new file mode 100644 index 00000000..7da9b669 Binary files /dev/null and b/docs/users/troubleshooting/images/activemq_web_console.png differ diff --git a/docs/users/troubleshooting/index.rst b/docs/users/troubleshooting/index.rst new file mode 100644 index 00000000..99e34ef8 --- /dev/null +++ b/docs/users/troubleshooting/index.rst @@ -0,0 +1,30 @@ +=============== +Troubleshooting +=============== + +Autoreduction +------------- + +- Check that all autoreduction services are up: + https://monitor.sns.gov/dasmon/common/diagnostics/ +- Examine the autoreduction logs on the analysis cluster: + ``///IPTS-XXXX/shared/autoreduce/reduction_log/``, see also + :ref:`autoreduction_report`. +- Examine the autoreducer logs on the individual autoreducers at + ``/var/log/SNS_applications/postprocessing.log``. +- Verify the configuration of the autoreduction workflow at: + https://monitor.sns.gov/database/report/task/. + +Database view +------------- +Users with admin privileges can open the Django admin interface at +https://monitor.sns.gov/database/ to view the database tables. Log tables, e.g. "Run status", may +be useful for troubleshooting. + +.. toctree:: + :maxdepth: 2 + :caption: Topics with more detail: + + Autoreduction_report + Autoreducer_configuration_file + Message_broker