RSCT
Added by Michael Parkes, last edited by Michael Parkes on Nov 19, 2007

Known Issues

Date Added: October 29, 2007
Date Last Updated: November 19, 2007

hags daemon core dumps on RSCT 2.3.11.4, 2.3.11.5, 2.4.7.4 or 2.4.7.5

Users Affected:
A cluster with more than 150 nodes, or any HACMP clusters, at rsct.basic.rte 2.3.11.4, rsct.basic.rte 2.3.11.5, rsct.basic.rte 2.4.7.4 or rsct.basic.rte 2.4.7.5.

Issue:
The hags daemon may dump core in clusters with more than 150 nodes or in any HACMP cluster. The core dump may occur in check_lost_lines_by_logthread ("TraceStream.C").

Solution:
Reject rsct.basic.rte 2.3.11.4, rsct.basic.rte 2.3.11.5, rsct.basic.rte 2.4.7.4 and rsct.basic.rte 2.4.7.5 from all nodes in the cluster where they have been applied.

For AIX, order the following APARs:
IZ07443 rsct.basic.rte 2.3.12.1
IZ06869 rsct.basic.rte 2.4.8.1

For Linux, download the following, as part of the CSM 1.7.0.1 update (http://www14.software.ibm.com/webapp/set2/sas/f/csm/download/home.html):
rsct.basic.rte 2.4.8.1
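
As a rough illustration of the level-to-fix mapping above, the following sketch returns the fix line for an affected rsct.basic.rte level. The function name hags_fix_for_level is hypothetical and is not part of RSCT. It checks the level only, not the cluster size or whether HACMP is in use. On AIX, the installed level can typically be listed with lslpp -L rsct.basic.rte and passed in by hand.

```shell
#!/bin/sh
# Sketch only: hags_fix_for_level is a hypothetical helper, not an RSCT command.
# It maps an affected rsct.basic.rte level to the fix listed in this notice.

hags_fix_for_level() {
    case "$1" in
        # 2.3.11.x stream: fixed by APAR IZ07443
        2.3.11.4|2.3.11.5) echo "IZ07443 rsct.basic.rte 2.3.12.1" ;;
        # 2.4.7.x stream: fixed by APAR IZ06869 (on Linux, the same
        # level ships with the CSM 1.7.0.1 update)
        2.4.7.4|2.4.7.5)   echo "IZ06869 rsct.basic.rte 2.4.8.1" ;;
        # Any other level is not listed as affected by this issue.
        *) return 1 ;;
    esac
}
```

For example, hags_fix_for_level 2.4.7.5 prints the 2.4.8.1 fix line, while a level that is not listed returns a non-zero status and prints nothing.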


Date Added: September 18, 2007

Peer Domain resiliency enhancements added in RSCT 2.4.7.4 (APAR IZ01378) and RSCT 2.3.11.4 (APAR IZ01379)

In heavily loaded systems, contention for resources such as memory, I/O, or CPU may prevent RSCT daemons from making progress in a timely manner. That may result in false node failures, or in RSCT daemons being recycled. To reduce the chance that the daemons are prevented from accessing system resources, the Topology Services, Group Services, and Configuration Resource Manager daemons now run with a fixed realtime CPU priority, which should allow them to access CPU resources even when several other processes in the system are running.

Note that the use of a realtime fixed CPU priority will not result in the RSCT daemons using additional CPU resources. The priority will only ensure that the daemons will be allowed to access the CPU whenever needed.

The second step in improving the daemons' resilience to resource contention involves locking ("pinning") their pages in real memory. Once the pages are brought to physical memory, they are not allowed to be paged out, thus minimizing the possibility that daemons become blocked or delayed during periods of high paging activity.

Because the daemons' pages are locked in memory, the corresponding physical pages are dedicated to the daemons and cannot be used by other processes in the system. Therefore the amount of physical memory available for other processes is slightly reduced.

By default, the daemons use a fixed CPU priority and lock their pages in memory. This behavior can be changed with the following commands:

/usr/sbin/rsct/bin/cthatstune -p 0
directs the RSCT daemons not to use a fixed CPU priority.
For the Group Services daemon, the setting takes effect only the next time the RSCT peer domain is brought online on the node.

CT_MANAGEMENT_SCOPE=2 chrsrc -c IBM.RSCTParameters TSPinnedRegions=256
directs the RSCT daemons not to lock their pages in memory.
The setting takes effect only the next time the RSCT peer domain is brought online on the node.
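
The two settings above can be collected in a small dry-run helper that prints the command for a given change rather than running it, so the command can be reviewed first. This is a minimal sketch: the function name rsct_resiliency_command and its priority and pinning keywords are hypothetical, and only the printed command lines come from this notice.

```shell
#!/bin/sh
# Sketch only: rsct_resiliency_command is a hypothetical helper.
# It prints (does not execute) the command documented in this notice.

rsct_resiliency_command() {
    case "$1" in
        priority)
            # Turn off the fixed realtime CPU priority.
            echo "/usr/sbin/rsct/bin/cthatstune -p 0"
            ;;
        pinning)
            # Turn off locking of the daemons' pages in memory.
            echo "CT_MANAGEMENT_SCOPE=2 chrsrc -c IBM.RSCTParameters TSPinnedRegions=256"
            ;;
        *)
            echo "usage: rsct_resiliency_command priority|pinning" >&2
            return 1
            ;;
    esac
}
```

Running rsct_resiliency_command pinning prints the chrsrc line. As noted above, the pinning change, and the priority change for the Group Services daemon, apply only the next time the peer domain is brought online on the node.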


Date Added: June 29, 2007

hats packet incompatibility with RSCT 2.3.11.2 or RSCT 2.4.7.2

Users Affected:
A cluster with some, but not all, nodes at rsct.basic.rte 2.3.11.2 or rsct.basic.rte 2.4.7.2

Issue:
Any node in a cluster at rsct.basic.rte 2.3.11.2 or rsct.basic.rte 2.4.7.2 will be unable to communicate via hatsd with other nodes at lower levels of rsct.basic.rte.

Symptoms of this problem may include:

  • A partitioned RSCT peer domain
  • HACMP - no heartbeating to lower level nodes over the IP networks
  • A loss of host responds in PSSP

Solution:
Reject rsct.basic.rte 2.3.11.2 and rsct.basic.rte 2.4.7.2 from all nodes in the cluster where they have been applied.

Order the following APARs when available:
IZ00913 rsct.basic.rte 2.3.11.3
IZ00912 rsct.basic.rte 2.4.7.3
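
The "some, but not all" condition under Users Affected can be checked mechanically once the rsct.basic.rte level of each node has been collected. The sketch below assumes an input format of one "node level" pair per line; that format and the function name hats_mixed_levels are illustrative assumptions, not RSCT output.

```shell
#!/bin/sh
# Sketch only: hats_mixed_levels is a hypothetical helper.
# Reads lines of the form "<node> <rsct.basic.rte level>" on stdin and
# prints "mixed" when some, but not all, nodes are at 2.3.11.2 or 2.4.7.2.

hats_mixed_levels() {
    awk '
        # Count nodes at one of the two incompatible levels.
        $2 == "2.3.11.2" || $2 == "2.4.7.2" { bad++ }
        # Count every node seen.
        { total++ }
        END {
            if (bad > 0 && bad < total)
                print "mixed"
            else
                print "ok"
        }'
}
```

For example, a two-node list with one node at 2.4.7.2 and one at 2.4.7.1 prints mixed, while a list in which every node is at 2.4.7.2 prints ok.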


Date Added: June 15, 2007

IBM.ConfigRMd core dump after migrating nodes in existing peer domains to RSCT 2.3.11.0 or RSCT 2.4.7.0

Users Affected:
Nodes in a peer domain migrated to rsct.core.rmc 2.3.11.0 or rsct.core.rmc 2.4.7.0.

Issue:
IBM.ConfigRMd will dump core every 24 hours on a node in a peer domain that was migrated to rsct.core.rmc 2.3.11.0 or rsct.core.rmc 2.4.7.0.

IBM.ConfigRM will restart automatically after the core dump.

Solution:
Apply IY99078 (rsct.core.rmc 2.3.11.1) or higher.
Apply IY99077 (rsct.core.rmc 2.4.7.1) or higher.


Date Added: June 15, 2007

Potential deadlock within certain Resource Managers

Users Affected:
Users with rsct.core.rmc 2.3.9.4, rsct.core.rmc 2.3.10.0, rsct.core.rmc 2.4.5.4 or rsct.core.rmc 2.4.6.0

Issue:
A potential deadlock exists within certain Resource Managers. The Resource Managers that are known to possibly be affected are IBM.WLMRM and IBM.HostRM.

Issuing lssrc against these Resource Managers, or lsrsrc against a resource class managed by one of the Resource Managers (e.g. IBM.Program) may hang.

A hang of IBM.HostRM may cause GUI DLPAR functions to hang and may cause lspartition -debug not to display all of the LPARs.

Solution:
Apply IY90698 (rsct.core.rmc 2.3.10.1) or higher.
Apply IY90697 (rsct.core.rmc 2.4.6.1) or higher.
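
Because lssrc or lsrsrc may hang on an affected node, a script that probes these Resource Managers can guard each call with a time limit. The following is a minimal sketch using only POSIX shell features; run_with_timeout is a hypothetical helper, and it assumes that the hung command terminates when sent SIGTERM, which may not hold in every case.

```shell
#!/bin/sh
# Sketch only: run_with_timeout is a hypothetical helper.
# Usage: run_with_timeout <seconds> <command> [args...]
# Returns the command's exit status, or a non-zero status if the
# command had to be terminated after <seconds>.

run_with_timeout() {
    secs=$1
    shift

    # Start the command in the background and remember its PID.
    "$@" &
    cmd_pid=$!

    # Watchdog: after the time limit, send SIGTERM to the command.
    ( sleep "$secs"; kill "$cmd_pid" 2>/dev/null ) >/dev/null 2>&1 &
    watchdog_pid=$!

    # Wait for the command; a terminated command yields a non-zero status.
    status=0
    wait "$cmd_pid" || status=$?

    # The command finished or was terminated; stop the watchdog.
    kill "$watchdog_pid" 2>/dev/null || true
    wait "$watchdog_pid" 2>/dev/null || true

    return "$status"
}
```

A call such as run_with_timeout 30 lsrsrc IBM.Program would then return a non-zero status instead of blocking indefinitely; the 30-second limit is an arbitrary example value.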


