Finding the right node
The Policy Engine is the component that takes the cluster’s current state, decides on the optimal next state and produces an ordered list of actions to achieve that state.
You can get a summary of what the cluster did in response to resource failures and nodes joining/leaving the cluster by looking at the logs from pengine
:
grep -e pengine\\[ -e pengine: /var/log/messages
Although the
pengine
process is active on all cluster nodes, it is only doing work on one of them. The “active” instance is chosen through thecrmd
’s DC election process and may move around as nodes leave/join the cluster.
If you do not see anything from pengine
at the time the problem occurs, continue to the next machine.
If you do not see anything from pengine
on any node, check your cluster if logging to syslog is enabled and the syslog configuration to see where it is being sent. If in doubt, refer to Pacemaker Logging.
Once you have located the correct node to investigate, the first thing to do is look for the terms ERROR and WARN, eg.
grep -e pengine\\[ -e pengine: /var/log/messages | grep -e ERROR -e WARN
This will highlight any problems the software encountered.
Next expand the query to all pengine
logs:
grep -e pengine\\[ -e pengine: /var/log/messages
The output will look a little like:
pengine[6132]: notice: LogActions: Move mysql (Started corosync-host-1 -> corosync-host-4)
pengine[6132]: notice: LogActions: Start www (corosync-host-6)
pengine[6132]: notice: process_pe_message: Calculated Transition 7: /var/lib/pacemaker/pengine/pe-input-4424.bz2
pengine[6132]: notice: process_pe_message: Calculated Transition 8: /var/lib/pacemaker/pengine/pe-input-4425.bz2
In the above logs, transition 7 resulted in mysql
being moved and www
being started. Later, transition 8 occurred but everything was where it should be and no action was required.
Other notable entries include:
pengine[6132]: warning: cluster_status: We do not have quorum - fencing and resource management disabled
pengine[6132]: notice: stage6: Scheduling Node corosync-host-1 for shutdown
pengine[6132]: warning: stage6: Scheduling Node corosync-host-8 for STONITH
as well as
pengine[6132]: notice: LogActions: Start Fencing (corosync-host-1 - blocked)
which indicates that the cluster would like to start the Fencing
resource, but some dependancy is not satisfied.
pengine[6132]: warning: determine_online_status: Node corosync-host-8 is unclean
which indicates that either corosync-host-8
has failed, or a resource on it has failed to stop when requested.
pengine[6132]: warning: unpack_rsc_op: Processing failed op monitor for www on corosync-host-4: unknown error (1)
which indicates a health check for the www
resource failed with a return code of 1
(aka. OCF_ERR_GENERIC
). See Pacemaker Explained for more details on OCF return codes.
Is there anything from the Policy Engine at about the time of the problem?
If not, go back to thecrmd
logs and see why no recovery was attempted.Did
pengine
log why something happened? does that sound correct?
Excellent, thanks for playing.
Getting more detail from the Policy Engine
The job performed by the Policy engine is a very complex and frequent task, so to avoid filling up the disk with logs, it only indicates what it is doing and rarely the reason why. Normally the why can be found in the crmd
logs, but it also saves the current state (the cluster configuration and the state of all resources) to disk for situations when it can’t.
These files can later be replayed using crm_simulate
with a higher level of verbosity to diagnose issues and, as part of our regression suite, to make sure they stay fixed afterwards.
Finding these state files is a matter of looking for logs such as
crmd[1811]: notice: run_graph: Transition 2 (... Source=/var/lib/pacemaker/pengine/pe-input-473.bz2): Complete
pengine[1810]: notice: process_pe_message: Calculated Transition 0: /var/lib/pacemaker/pengine/pe-input-473.bz2
The “correct” entry will depend on the context of your query.
Please note, sometimes events occur while the
pengine
is performing its calculation. In this situation, the calculation logged byprocess_pe_message()
is discarded and a new one performed. As a result, not all transitions/files listed by thepengine
process are executed by thecrmd
.
After obtaining the file named by run_graph()
or process_pe_message()
, either directly or from a crm_report
archive, pass it to crm_simulate
which will display its view of the cluster at that time:
crm_simulate --xml-file ./pe-input-473.bz2
Does the cluster state look correct?
If not, file a bug. It is possible we have misparsed the state of the resources, any calculation we make based on this would therefor also be wrong.
Next, see what recovery actions the cluster thinks need to be performed:
crm_simulate --xml-file ./pe-input-473.bz2 --save-graph problem.graph --save-dotfile problem.dot --run
In addition to the normal output, this command creates:
problem.graph
, the ordered graph of actions, their parameters and prerequisitesproblem.dot
, a more human readable version of the same graph focussed on the action ordering.
Open problem.dot
in dotty
or graphviz
to obtain a graphical representation:
- Arrows indicate ordering dependencies
- Dashed-arrows indicate dependencies that are not present in the transition graph
- Actions with a dashed border of any color do not form part of the transition graph
- Actions with a green border form part of the transition graph
- Actions with a red border are ones the cluster would like to execute but cannot run
- Actions with a blue border are ones the cluster does not feel need to be executed
- Actions with orange text are pseudo/pretend actions that the cluster uses to simplify the graph
- Actions with black text are sent to the
lrmd
- Resource actions have text of the form
${rsc}_${action}_${interval} ${node}
- Actions of the form
${rsc}_monitor_0 ${node}
is the cluster’s way of finding out the resource’s status before we try and start it anywhere - Any action depending on an action with a red border will not be able to execute.
- Loops are really bad. Please report them to the development team.
Check the relative ordering of actions:
- Are there any extra ones?
Do they need to be removed from the configuration?
Are they implied by thegroup
construct? - Are there any missing?
Are they specified in the configuration?
You can obtain excruicating levels of detial by adding additional -V
options to the crm_simulate
command line.
Now see what the cluster thinks the “next state” will look like:
crm_simulate --xml-file ./pe-input-473.bz2 --save-graph problem.graph --save-dotfile problem.dot --simulate
- Does the new cluster state look correct based on the input and actions performed?
If not, file a bug.