Over, under, sideways, down


Position, orientation and mental models
I recently conducted a Human Factors investigation on an incident that happened on a plant as a result of human error. As always, there were multiple causes, some of which I won’t go into. However, I’d like to share one cause in particular with blog readers.
Some background first. The equipment that we are looking at is a safety instrumented system, protecting the plant from process hazards. The system diagnostics reported a fault in the communications path between I/O clusters. These clusters are connected using a pair of redundant fibre optic cables in an A/B configuration. In order to trace the location of the fault, the engineer had to disconnect the cables in turn. As long as one complete path, either A or B, remained connected at both ends, the system would keep running. Unfortunately the engineer disconnected the A cable at one cluster and the B cable at the other, so neither path was complete and communication was lost. The safety instrumented system then did exactly what it was designed to do: it shut the process down safely. Of course this resulted in significant lost production and associated cost. It’s also worth mentioning that shutdown and start-up operations are themselves more risky than normal operation, and introduce hazards of their own. So any unplanned shutdown increases risk as well as costing money.
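For readers who like to see the logic written down, here is a minimal sketch of the redundancy rule described above. It is an illustration only, not the vendor’s diagnostics: the names `cluster_1`, `cluster_2`, `channel_up` and `comms_alive` are my own assumptions, and the model simply treats a channel as healthy when its cable is plugged in at both ends.

```python
# Hypothetical model of the A/B redundant link between two I/O clusters.
# All names here are illustrative assumptions, not taken from the real system.
CLUSTERS = ("cluster_1", "cluster_2")
CHANNELS = ("A", "B")


def all_plugged():
    """Return a state in which every cable end is connected."""
    return {(cluster, ch): True for cluster in CLUSTERS for ch in CHANNELS}


def channel_up(plugged, channel):
    """A channel carries traffic only if it is connected at BOTH clusters."""
    return all(plugged[(cluster, channel)] for cluster in CLUSTERS)


def comms_alive(plugged):
    """Communication survives while at least one complete channel remains."""
    return any(channel_up(plugged, ch) for ch in CHANNELS)


# Intended fault-finding step: pull channel A at both ends, leave B intact.
safe = all_plugged()
safe[("cluster_1", "A")] = False
safe[("cluster_2", "A")] = False

# What happened on the day: A pulled at one end, B pulled at the other.
tripped = all_plugged()
tripped[("cluster_1", "A")] = False
tripped[("cluster_2", "B")] = False

print(comms_alive(safe))     # one complete path (B) remains
print(comms_alive(tripped))  # neither path is complete
```

The point of the sketch is that the redundancy belongs to the path, not to the individual connector: two disconnections are harmless only if they are on the same channel.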
Simple human error? The owning company didn’t think so. It wanted to know whether any factors might have contributed to the chance of the error happening, and could therefore be eliminated or controlled in the future.
In taking a look at the possible factors that might have contributed to the error (‘Performance Influencing Factors’), one that stood out was the internal layout of the cabinets. Here are a couple of photos.
The yellow rectangle is a ‘cluster’. At the top are the A and B modules. The units with the yellow labels are I/O modules (not relevant to this discussion). The clusters are normally mounted horizontally, with the A and B modules on the left-hand side of the cluster. A is on the left, B on the right. In this case, because of the layout of the cabinets and restricted space, the clusters have been mounted vertically, with A at the top and B immediately below it. This allows the cables a clean run into the cable trunking on the left side of the cabinet. When I interviewed the engineers who look after these units, they commonly referred to the ‘top’ and ‘bottom’ unit, rather than A and B. So far, so good.
The sharp-eyed amongst you will have noticed that there is a lot of grey in these cabinets, and this is indeed the case. The modules are grey in colour and the cables are black with grey connectors. The modules were not labelled (they are now!). On one level this makes sense as they form common spares and a fixed label would introduce its own problems. Of course the lack of a label makes the engineer’s mental model of where the A and B modules are located more critical to the task.
So, one end of the disconnected cable is mounted vertically, with ‘A over B’. Let’s have a look at the other end of the cable, the next cluster in the chain.
As you can see, it’s also mounted vertically, but with the A and B modules at the bottom. This allows the cables on the I/O modules to exit to the right and have a clear, straight path to the trunking. However, it also puts the A and B modules in the opposite orientation to the majority of other clusters. They are now mounted ‘B over A’ and, again, were not labelled at the time. The engineer’s mental model of the modules is compromised by the different orientation. So instead of disconnecting the ‘top and top’, the engineer needs to disconnect ‘top and bottom’. Unfortunately, on the day in question, under some pressure to get the job done, he made the wrong choice.
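The ‘flip’ the engineer has to perform can be written as a small lookup. This is a sketch under my own assumptions: the layout names `A_over_B` and `B_over_A` and the helper `positions_for` are hypothetical labels for the two mounting orientations described above, not anything taken from the installation drawings.

```python
# Hypothetical lookup: which physical position holds which channel,
# for each of the two vertical mounting orientations described in the text.
LAYOUTS = {
    "A_over_B": {"top": "A", "bottom": "B"},
    "B_over_A": {"top": "B", "bottom": "A"},
}


def positions_for(channel, layouts):
    """For each cluster layout in turn, return the position holding `channel`."""
    positions = []
    for layout in layouts:
        for position, ch in LAYOUTS[layout].items():
            if ch == channel:
                positions.append(position)
    return positions


# Two clusters mounted the same way: the habit 'top and top' is correct.
print(positions_for("A", ["A_over_B", "A_over_B"]))

# The cabinets in the incident: the same channel is 'top' at one end
# and 'bottom' at the other, so the habit no longer works.
print(positions_for("A", ["A_over_B", "B_over_A"]))
```

With labels on the modules, the engineer reads the channel directly. Without them, he has to carry this table in his head and remember which row applies to the cabinet in front of him.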
Is this the only problem with the orientation and installation of the modules? Consider the location of the A and B modules in the extreme bottom right of the cabinet. This makes access difficult simply because the engineer has to work at almost ground level. Awkward, uncomfortable and unusual body postures require a degree of mental resources to maintain, just the resources the engineer requires to ‘flip’ his mental model of A and B when making the decision about which cable to disconnect. Don’t believe me? Consider the common scenario of walking and talking at the same time. On a clear path, it’s no problem to maintain a conversation requiring mental resources at the same time as performing the well-rehearsed skill of walking. Change that clear path for an icy path, and you need to concentrate on balance and locomotion, and the cognitive resources available for conversation dry up. It’s the same with any task, in this case a maintenance/engineering task. The more resources dedicated to an awkward or difficult posture, the fewer are available for problem solving or decision making, especially if the posture needs to be maintained or is perceived to put the person at risk (such as working at heights). Something to consider the next time you risk-assess a maintenance job involving sending someone up on a ‘cherry-picker’ platform to perform a task. The location of the module at ground level contributes to the chance of human error.
As well as the body posture required to access the units, the ability to view the units and the cable connections is also compromised by the location. The relatively cramped installation (see photo) also erodes the engineer’s mental resources, requiring conscious thought to maintain the uncomfortable and unfamiliar hand and finger movements required.
So we have compromised mental models, awkward location, and cramped and restricted access all contributing to the likelihood of error. Combine this with the stress of perceived pressure to complete the job, plus other personal and work-related pressures, and the likelihood rises even further. Can we definitively say that any one of these ‘caused’ the error? Not really. However, we can say that they made the error more likely.
As you can see, the modules are now labelled, to help with identification. What else could have been done to make the incident less likely to have happened?
Many thanks to the owners of the plant in question for allowing me to use their example. It’s always good to share learning on these things. Hopefully you will see installation standards and the effect on maintenance and fault-finding in a new light. Let me know in the comments section what you think could have been done to prevent or manage this error. Or maybe you have examples of installation practices that might contribute to the potential for error. If so, maybe you’d like to share them?
Image credit: By Corentin Lamy via Wikimedia Commons