RCC Active Incidents

For information about the FlashLite and Euramoo clusters, please consult the QRIScloud portal.

This page lists the currently active or recently resolved incidents that have impacted other HPC services.

Active cases

There are 3 active cases.

#9722: RCC HPC: Tinaroo scheduler not starting some jobs (Incident Solved)

Logged 2017-05-17 07:53:16 +1000
Last updated 2017-05-17 11:12:17 +1000

RCC is investigating a problem that is preventing some jobs from starting.
The jobs show up in Q state in the qstat output, and even though there are resources available, the jobs do not start.
The affected jobs appear to have a persistent scheduler hold placed on them.

The queued jobs that were stuck have been freed and should run when resources become available.
The problem appears to have been caused by a "sour" compute node.
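
If one of your own jobs stays in Q state longer than expected, the full job record often gives a hint as to why. The sketch below is only an illustration: it assumes the attribute names printed by PBS Professional's qstat -f (job_state, comment, Hold_Types), and the helper name job_wait_summary is hypothetical. Long comment values that wrap onto continuation lines are shown only up to the first line.

```shell
#!/bin/sh
# Hypothetical helper: reduce `qstat -f` output (read on stdin) to the
# lines that usually explain why a job is still waiting.
# Assumption: PBS Professional attribute names; other batch systems differ.
job_wait_summary() {
    grep -E '^[[:space:]]*(Job Id:|job_state =|comment =|Hold_Types =)'
}

# Example use on a login node (123456 is a placeholder job ID):
#   qstat -f 123456 | job_wait_summary
```

If the output does not explain the delay, include it when you contact RCC support.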

#9615: Tinaroo/FlashLite/Euramoo HPC: Please refrain from running heavy processing on login nodes (Incident Solved)

Logged 2017-04-18 12:04:40 +1000
Last updated 2017-04-18 12:44:37 +1000

Users are reminded that running heavy processing or memory-intensive applications on the login nodes can seriously disadvantage other users.

Please use the qsub -I facility of the batch system to obtain a dedicated compute node.
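
As a rough illustration of an interactive request, the sketch below assembles a qsub -I command line and prints it for review. The resource values (4 cores, 8 GB of memory, one hour of walltime) are placeholders, and the select syntax assumes PBS Professional. Your site may require further options, such as an account string, that are not shown here.

```shell
#!/bin/sh
# Placeholder resource request; adjust to your workload and site limits.
NCPUS=4
MEM=8gb
WALLTIME=01:00:00

# Assumption: PBS Professional "select" resource syntax.
QSUB_CMD="qsub -I -l select=1:ncpus=${NCPUS}:mem=${MEM} -l walltime=${WALLTIME}"

# Print the command so it can be checked before it is run on a login node.
echo "$QSUB_CMD"
```

Once the scheduler grants the request, the shell prompt moves to the allocated compute node, and heavy work can run there without affecting the login nodes.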

If you require a remote desktop on Tinaroo, use the facility that is already provided.
Do not run your own VNC service on the login nodes.

Please attend Intro to HPC training if you have not already done so.

#9261: Tinaroo: Job related emails are not working (Incident Active)

Logged 2016-11-10 10:54:28 +1000
Last updated 2016-11-10 10:54:28 +1000

Job-related emails sent by the Tinaroo PBS server are not being delivered.

Recent cases

There are no recent cases.