This section is principally aimed at a Blueworx Voice Response Single System Image (SSI) environment running in an HACMP environment, but does not require HACMP and can be beneficial in a non-HACMP SSI environment.
This section describes how Blueworx Voice Response detects problems with DB2 and NFS servers, and what it does when it finds a problem.
This section describes DBHEALTH, a Blueworx Voice Response process that monitors DB2 and filesystems containing voice and customer applications. DBHEALTH also provides information on the accessibility of key resources to the rest of the system.
Most installations can use the default values of the system parameters used by DBHEALTH, and you should not need to read this section unless you see error messages about DBHEALTH or you are monitoring system status using SNMP or a custom server.
A key problem in an SSI environment is auto-restart after a power failure on the client and server systems in the cluster. At initialization, Blueworx Voice Response keeps trying to contact resources on the server until they become available.
Product initialization scripts such as vaeinit and vaeinit.nox (and the scripts they call) retry during a server outage. During this retry cycle, a message such as "Will retry DB2 connection in 20 seconds" is displayed.
To abort a Blueworx Voice Response startup that is stuck in retry mode, run DT_shutdown.
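The retry behavior described above can be pictured with a short shell sketch. This is not the code used by vaeinit; it is a minimal illustration, assuming a probe command of your own that exits 0 when the resource is reachable. The function name retry_until_available and its arguments are hypothetical.

```shell
#!/bin/sh
# Illustrative sketch only: NOT the product's vaeinit logic.
# Usage: retry_until_available <label> <interval-seconds> <probe command...>
# The probe command is supplied by the caller (hypothetical) and must
# exit 0 when the resource is available.
retry_until_available() {
    resource=$1       # label used in the message, for example DB2
    interval=$2       # seconds to wait between attempts, for example 20
    shift 2           # the remaining arguments form the probe command
    until "$@"; do
        echo "Will retry $resource connection in $interval seconds"
        sleep "$interval"
    done
}
```

The loop has no attempt limit, which mirrors the described behavior of retrying until the server returns; stopping it is an explicit operator action, as with DT_shutdown.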
If Blueworx Voice Response detects any resource problems when it starts, it does not start any AUTOEXEC custom servers or process other requests to start custom servers until the resources are available.
While waiting for a resource problem to clear, the following message is added to DTstatus.out:
CA_CNTL: initialization waiting on DBHEALTH system_state=8
where system_state is the DTstatus value returned by SNMP (see the Blueworx Voice Response resources information).
A similar message is added to trace.
This is a very unlikely condition representing a resource problem after vaeinit has successfully checked the resources, possibly caused by instability of the resource server.
The DBHEALTH process, which monitors resource availability, is started by NODEM with other programs in $SYS_DIR/tasklist.data.
The following parameters control the time DBHEALTH waits for a response from a resource before it considers there to be a problem with a resource and Blueworx Voice Response takes action:
These parameters are described in more detail in the Configuring the System guide.
The default timeout is good enough for most systems. A resource which causes a delay greater than 15 seconds is likely to be unacceptable to a caller, since the caller hears nothing during this time.
Operations that put a heavy load on the filesystem, such as backup, can cause a slow response which might be interpreted as a resource problem. If this happens, lower the priority of the operation rather than increasing the timeout values so that the response to your callers is not affected.
If you configure Blueworx Voice Response to disconnect calls in progress when DBHEALTH detects a problem and an HACMP failover is planned anyway, use the System Monitor or your switch to quiesce trunks first, to minimize the number of disconnected calls.
Note the following about DBHEALTH:
In most cases the System Administrator need not be aware of these details.
The resources monitored by DBHEALTH are DB2 itself and a file on five filesystems under $CUR_DIR. These files are:
These are the same resources initialized by $VAETOOLS/fsupdate when Blueworx Voice Response is installed or a Single System Image is configured. If these resources have been deleted, shut down Blueworx Voice Response, then run $VAETOOLS/fsupdate. If problems persist with an SSI configuration, see the section on creating and managing a single system image in the Configuring the System guide.
How DBHEALTH monitors each file:
If the signal SIGUSR1 (kill -30 <DBHEALTH process id>) is sent to DBHEALTH, the debugging mode is toggled.
The debugging mode is not intended for normal use but might be useful when debugging system problems or deciding values for the Database Availability Check Timeout and File Availability Check Timeout system parameters. DBHEALTH's debugging mode generates a lot of information, causing DTstatus.out to be archived and deleted.
In debugging mode the time taken for each resource to be polled and any error information from the polling command (DB2 query or filesystem access) is recorded. This additional error information might be useful because DBHEALTH records only the first problem per resource in the errorlog.
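The SIGUSR1 toggle described above can be scripted. The following is a minimal sketch, assuming that ps reports the process under the command name DBHEALTH (check this on your system). The function names are hypothetical. The symbolic signal name USR1 is used in place of the number 30 given in the text, because signal numbers vary between platforms.

```shell
#!/bin/sh
# Print the PID of every process whose command name is exactly DBHEALTH.
# Reads "pid command" pairs on standard input, in the form produced by:
#   ps -e -o pid= -o comm=
# Assumption: the process appears in ps output as DBHEALTH.
dbhealth_pids() {
    awk '$2 == "DBHEALTH" { print $1 }'
}

# Send SIGUSR1 to each DBHEALTH process found, toggling debugging mode.
# Calling this a second time toggles debugging mode back off.
toggle_dbhealth_debug() {
    for pid in $(ps -e -o pid= -o comm= | dbhealth_pids); do
        kill -s USR1 "$pid"
    done
}
```

Because the signal is a toggle, run toggle_dbhealth_debug once to turn debugging on and once more to turn it off, keeping the debugging period short to limit the growth of DTstatus.out.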
The fields written to DTstatus.out when the system is running normally are as follows:
For example:
Timestamp    Process   Function         Line      Time    Resource
14:44:45.86  DBHEALTH  DBpollingThread  LINE 982  0.003s  DB2(dtdbv230)
Similar information is also written to the system trace buffer.
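When using debugging output to choose timeout values, it can help to list only the polls that took longer than a candidate threshold. The sketch below assumes that every debugging line has the same whitespace-separated layout as the example above (timestamp, process, function, the word LINE, line number, time with a trailing s, resource). The function name slow_polls is hypothetical.

```shell
#!/bin/sh
# Print "timestamp resource seconds" for each DBHEALTH poll whose time
# exceeds the threshold, in seconds, given as the first argument.
# Assumption: field 6 is the poll time with a trailing "s" and field 7
# is the resource, as in the example line shown in the text.
slow_polls() {
    awk -v limit="$1" '
        $2 == "DBHEALTH" && $6 ~ /^[0-9.]+s$/ {
            secs = substr($6, 1, length($6) - 1) + 0   # strip trailing "s"
            if (secs > limit + 0) print $1, $7, secs
        }'
}
```

For example, slow_polls 15 < DTstatus.out lists polls slower than 15 seconds. Lines that do not match the assumed layout are ignored.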
Trace shows an entry like the following:
456 5.59218667 0.083580 CA_LIB: [29106] CA_CNTL Request 110 rejected system_state=8
(system state is the DTstatus as reported by SNMP).
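Both the DTstatus.out message and the trace entry carry the value as system_state=N, so the same filter can pull it out of either source. This is a minimal sketch, assuming the value is always a decimal number directly after system_state=. The function name system_states is hypothetical.

```shell
#!/bin/sh
# Print the numeric system_state value from every line on standard
# input that contains one; all other lines are ignored.
system_states() {
    sed -n 's/.*system_state=\([0-9][0-9]*\).*/\1/p'
}
```

Piping the output through sort -u gives the distinct states seen in a log or trace extract.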
Many of the Blueworx Voice Response windows continue to function after an outage. Some display the message 'Database server unavailable. Access to data including HELP system not possible at this time.', and some appear to hang. However, the following functions are still available:
When DBHEALTH detects a resource problem it sets a variable which you can monitor as follows:
System Stat : run
Errors logged by DBHEALTH (numbers 5301 and 5310 through 5317) are described in the Problem Determination information.
If DBHEALTH issues a red error indicating that the system is experiencing resource problems, there should be a corresponding green error (5314) indicating that the problem has cleared.
Check information in the error message for details of the problem.
When your system is experiencing problems and unable to handle new calls, you might want to let callers know. You can do this by changing the message played when there are technical difficulties. See the section on changing the technical difficulties message in Configuring the System.