John Ridgway on safety-critical systems

John Ridgway studied physics at the University of Newcastle Upon Tyne and Sussex University before embarking upon a career in software engineering. As part of that career he worked for 28 years in the field of Intelligent Transport Systems (ITS), undertaking software quality management and systems safety engineering roles on behalf of his employer, Serco Transportation Systems. In particular, John provided design assurance for Serco’s development of the Stockholm Ring Road Central Technical System (CTS) for the Swedish National Roads Administration (SNRA), safety analysis and safety case development for Serco’s M42 Active Traffic Management (ATM) Computer Control System for the UK Highways Agency (HA), and safety analysis for the National Traffic Control Centre (NTCC) for the HA.

John is a regular contributor to the Safety Critical Systems Club (SCSC) Newsletter, in which he encourages fellow practitioners to share his interest in the deeper issues associated with the conceptual framework encapsulated by the terms ‘uncertainty’, ‘chance’ and ‘risk’. Although now retired, John recently received the honour of providing the after-banquet speech for the SCSC 2014 Annual Symposium.

Luke Muehlhauser: What is the nature of your expertise and interest in safety engineering?

John Ridgway: I am not an expert and I would not wish to pass myself off as one. I am, instead, a humble practitioner, and a retired one at that. Having been educated as a physicist, I started my career as a software engineer, rising eventually to a senior position within Serco Transportation Systems, UK, in which I was responsible for ensuring the establishment and implementation of processes designed to foster and demonstrate the integrity of computerised systems. The systems concerned (road traffic management systems) were not, initially, considered to be safety-related, and so lack of integrity in the delivered product was held to have little more than a commercial or political significance. However, following a change of safety policy within the procurement departments of the UK Highways Agency, I recognised that a change of culture would be required within my organisation, if it were to continue as an approved supplier.

If there is any legitimacy in my contributing to this forum, it is this: Even before safety had become an issue, I had always felt that the average practitioner’s track record in the management of risk would benefit greatly from taking a closer interest in (what some may deem to be) philosophical issues. Indeed, over the years, I became convinced that many of the factors that have hampered software engineering’s development into a mature engineering discipline (let’s say on a par with civil or mechanical engineering) have at their root a failure to openly address such issues. I believe the same could also be said with regard to functional safety engineering. The heart of the problem lies in the conceptual framework encapsulated by the terms ‘uncertainty’, ‘chance’ and ‘risk’, all of which appear to be treated by practitioners as intuitive when, in fact, none of them are. This is not an academic concern, since failure to properly apprehend the deeper significance of this conceptual framework can, and does, lead practitioners towards errors of judgement. If I were to add to this the accusation that practitioners habitually fail to appreciate the extent to which their rationality is undermined by cognitive biases, then I feel there is more than enough justification for insisting that they pay more attention to what is going on in the world of academia and research organisations, particularly in the fields of cognitive science, decision theory and, indeed, neuroscience. This, at least, became my working precept.

Luke: What do you think was the origin of your concern for the relevance of philosophy and cognitive science to safety engineering? For example, did you study philosophy, or Kahneman and Tversky, in university?

John: As a physics student I was offered little opportunity (and to be honest had little ambition) to pursue an education in philosophy. Of course it is true that quantum mechanics made a mockery of my cherished belief in the existence of unobserved, objective reality, but I was very much in the ‘I know it makes no sense whatsoever but just shut up and carry on calculating’ school of practice. In fact, it wasn’t until I had become a jobbing software engineer, investigating the use of software metrics to predict development timescales, that philosophical issues started to take on an occupational importance.

The practice within my department had been to use single-point estimations for the duration of tasks, then feed these figures and the task dependencies into a Gantt chart and simply use the chart to read off the predicted project end date. Sometimes the results were very impressive: predicted project end dates often proved to be in the right decade! Then one bright spark said, ‘You’re living in the dark ages. What you need are three-point estimations for the duration of each task (i.e. estimate a most likely duration together with an upper and lower band). You then use Monte Carlo Simulation to create a project outturn curve; a curve which indicates a statistical spread of possible project durations. Everyone does this nowadays. And it just so happens I have an expensive risk management tool to do all the nasty calculations for you’. Imagine his look of disgust when I told him that I still thought the old way was better!

Was he talking to a smug idiot? I like to think not. The basis of my objection was this: As a physicist, I was well aware of the benefits of using Monte Carlo Simulation in areas such as solid state and nuclear physics. Here it is used to determine the macroscopic behaviour of a physical system, where probability distribution curves can be used to model stochastic variability of the micro behaviour of the system under study. Now, however, I was being invited to use Monte Carlo methods to draw conclusions based upon the averaging of various levels of posited (and decidedly non-stochastic) ignorance regarding the future. In such circumstances, no one could provide me with a convincing argument for deciding the most appropriate form for the probability distribution curve upon which the simulation would be based. In fact, I was told this choice didn’t matter, though quite clearly it did. If, as seemed likely, a realistic distribution curve would have a fat tail, the results would be hugely influenced by the choice of curve. Furthermore, the extra two estimates (i.e. the upper and lower bands for task duration) were supposed to represent a level of uncertainty, but the uncertainty behind their selection was at least of the same order of magnitude as the uncertainty these bounds were supposed to represent. In other words, one could not be at all certain how uncertain one was! It occurred to me that no real information was being added to the risk model by using these three-point estimates, and so no amount of Monte Carlo Simulation would help matters.
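
The point about the choice of curve can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and not taken from the interview: the three hypothetical tasks, their three-point estimates, the helper names, and the arbitrary `sigma` of the fat-tailed alternative. Both samplers share the same lower bound and the same most likely value for each task; only the shape of the curve differs.

```python
import math
import random

# Hypothetical three-point estimates in days: (low, most likely, high).
# The tasks are assumed to run one after another.
TASKS = [(5, 10, 20), (8, 12, 30), (3, 5, 15)]

def triangular(rng, low, mode, high):
    # Bounded curve: no sample can ever exceed the stated upper estimate.
    return rng.triangular(low, high, mode)

def fat_tailed(rng, low, mode, high):
    # Assumed alternative: a lognormal shifted to start at `low`, with its
    # peak at `mode`. The upper estimate is ignored and sigma is arbitrary.
    sigma = 0.8
    mu = math.log(mode - low) + sigma ** 2  # lognormal mode = exp(mu - sigma^2)
    return low + rng.lognormvariate(mu, sigma)

def simulate(sample_task, runs=20000, seed=1):
    # Monte Carlo: draw a duration for every task, add them up, repeat.
    rng = random.Random(seed)
    return sorted(sum(sample_task(rng, *t) for t in TASKS) for _ in range(runs))

def percentile(sorted_values, p):
    return sorted_values[int(p * (len(sorted_values) - 1))]

tri = simulate(triangular)
fat = simulate(fat_tailed)

for name, totals in (("triangular", tri), ("fat-tailed", fat)):
    print(name,
          "median:", round(percentile(totals, 0.50), 1),
          "95th percentile:", round(percentile(totals, 0.95), 1),
          "worst case:", round(totals[-1], 1))
```

With these assumed inputs the triangular run can never exceed 65 days, the sum of the upper estimates, whereas the fat-tailed run has no such ceiling, so the upper percentiles of the outturn curve depend heavily on a choice that the estimates themselves do nothing to justify.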

This led me to philosophical musing about the nature of uncertainty and the subtleties of its relationship with risk. These musings had a potentially practical value because I thought that a lot of people were spending a lot of time and money using inappropriate techniques to give themselves false confidence in the reliability of their predictions. Unfortunately, none of my colleagues seemed to share my concern and all I could do to try and persuade them was to wave my hands whilst speaking vaguely about second order uncertainty, model entropy and the important distinction between variability and incertitude. So I decided there was a gap in my education that needed filling.

As it happened, my investigations soon led me to the work of Professor Norman Fenton of Queen Mary University of London, and I familiarised myself with concepts such as subjective probability and epistemic versus aleatoric uncertainty. Furthermore, once subjectivity had been placed centre stage, the relevance of cognitive sciences loomed large and although I can’t claim to have studied Tversky and Kahneman, I became familiar with ideas associated with decision theory that owe their existence to their work.

Suddenly, my career seemed so much more interesting. And once I moved on to safety engineering, the same issues cropped up again in the form of over-engineered fault tree diagrams replete with probability values determined by ‘expert’ opinion. Now it seemed all the more important that practitioners should think more deeply about the philosophical and psychological basis for their confident proclamations on risk.

Luke: In many of your articles for the Safety-Critical Systems Club (SCSC) newsletter, you briefly discuss issues in philosophy and cognitive science and their relevance to safety-critical systems (e.g. 1, 2, 3, 4, 5, 6). During your time working on safety engineering projects, how interested did your colleagues seem to be in such discussions? Were many of them deeply familiar with the issues already? Does philosophy of probability and risk, and cognitive science of (e.g.) heuristics and biases, seem to be part of the standard training for those working in safety engineering — at least, among the people you encountered?

John: Perhaps my experience was atypical, but the sad fact is that I found it extremely difficult to persuade any of my colleagues to share an interest in such matters, and I found this doubly frustrating. Firstly, I thought it to be a missed opportunity on my colleagues’ part, as I felt certain that application of the ideas would be to their professional advantage. Their diffidence was to a certain extent understandable, however, since there was nothing in the occupational training provided for them that hinted at the importance of philosophy or psychology. However, what really frustrated me was the fact that no one appeared to be at all excited by the prospect of introducing these subjects into the workplace. How could that be? How could my colleagues fail to be anything other than utterly fascinated? In fact, their lack of interest seemed to me to represent nothing less than a wanton refusal to enjoy their job!

The key to the problem lay, of course, with the training provided by my employer, and it didn’t help that the internal department that provided such training gloried under the title of ‘The Best Practice Centre’. Clearly, anything that I might say that differed from the company-endorsed view was, by definition, less than best! And I soon found that berating the centre’s risk management course for failing to explore the concept of uncertainty was, if anything, counter-productive. Upon reflection, I think that some of these frustrations led me to seek an alternative forum in which I could express my thinking. Publishing articles for the Safety Critical Systems Club newsletter provided such an outlet.

Luke: You say that you “felt certain that application of the ideas would be to their professional advantage.” Can you give me some reasons and/or examples for why you felt certain of that?

John: I think that my concerns were the product of working within a profession that appears to see the world rather too much in frequentist terms, in which the assumption of aleatoric uncertainty would be valid. In reality, it is increasingly the case that the risks a systems safety engineer has to analyse are predicated predominantly upon epistemic uncertainty. I cite, in particular, safety risks associated with complex software-based systems, adaptive systems and so-called Systems of Systems (SoS), or indeed any system that interacts with its environment in a novel or inherently unpredictable manner. Whilst it is true that analysing stochastic failure of physical components may play a significant role in predicting system failure, the probabilistic techniques involved in such analysis simply cannot address epistemic concerns, i.e. where the parameters of any posited probability distribution curve may be a matter for pure speculation. (I am aware that Monte Carlo simulation is sometimes used to probabilistically model the parametric uncertainty in probabilistic models, but this strikes me as an act of desperation reminiscent of the invention of epicycles upon epicycles to shore up the Ptolemaic cosmology).

There are a number of suitable approaches available to the safety analyst, which seek to accommodate epistemic uncertainty (Bayesian methods, Possibility Theory and Dempster-Shafer, to name but three). However, whilst the practitioner is not even aware that there is an issue, and continues to assume the objectivity of all probability, there is little hope that these methods will attract the attention they deserve.

Then, of course, we have to consider the pernicious effect that cognitive bias has upon the analyst’s assessment of likelihood. It is in the nature of such biases that the individual is unaware of their impact. Surely, therefore, even the most basic training in this area would be of considerable benefit to the practitioner. On a similar theme, I have become concerned that the average safety analyst is insufficiently mindful of the distinction to be made between risk aversion and ambiguity aversion. This may lead to a failure to adequately understand the rationality that lies behind a particular decision, but it may also explain why my colleagues didn’t appear to appreciate the importance of undertaking uncertainty management alongside risk management.

Finally, when one considers the interconnectivity of risks, and the complications introduced by multi-stakeholders, it becomes very difficult to think about risk management strategies without having to address ethical issues associated with risk transfer, optimisation and sufficing. But perhaps that is another story.

Luke: Yes, can you say more about the “ethical issues associated with risk transfer, optimization, and sufficing”?

John: In UK health and safety legislation there is an obligation to reduce existing risk levels ‘So Far As Is Reasonably Practicable’ (SFAIRP). This leads to the idea of residual risks being ‘As Low As Reasonably Practicable’ (ALARP). The ALARP concept assumes that an upper limit can be defined, above which the risk is considered ‘intolerable’. In addition, there is a lower limit, below which the risk is considered to be ‘broadly acceptable’. A risk may be allowed to lie between these two limits as long as it can be demonstrated that further reduction would require disproportionate cost and effort. Apart from the vagueness of the terminology used here, the main problem with this view is that it says nothing about the possibility that the management of one risk may bear upon the scale of another. Indeed, one can envisage a network of inter-connected risks in which this knock-on effect will propagate, resulting in the net level of risk increasing (keep in mind that the propagation may include both positive and negative feedback loops). For this reason, there exists the Globally At Least Equivalent (GALE) principle, which holds that upon modifying a system, one must assess the overall, resulting risk level posed by the system rather than focussing purely upon the risk that the modification was intended to address. The idea, of course, is that the overall level should never increase.
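
The interplay between the two principles can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two threshold values, the hazard names, the risk figures and the idea of adding individual risks together to obtain an overall level are all hypothetical and are not taken from the interview or from any regulation.

```python
# Illustrative thresholds only (assumed annual probability of harm).
INTOLERABLE = 1e-3
BROADLY_ACCEPTABLE = 1e-6

def alarp_region(risk):
    """Place a single risk in one of the three ALARP bands."""
    if risk > INTOLERABLE:
        return "intolerable"
    if risk < BROADLY_ACCEPTABLE:
        return "broadly acceptable"
    # Between the limits: acceptable only if further reduction would be
    # disproportionately costly, which this sketch does not model.
    return "tolerable if ALARP"

def gale_ok(before, after):
    """GALE check: the overall risk level must not increase.

    Assumption: the overall level is the simple sum of the individual risks.
    """
    return sum(after.values()) <= sum(before.values())

# Hypothetical modification: it lowers the targeted risk, but the knock-on
# effect raises a second, connected risk.
before = {"collision": 4e-4, "driver_distraction": 1e-5}
after = {"collision": 1e-4, "driver_distraction": 5e-4}

for hazard in before:
    print(hazard, alarp_region(before[hazard]), "->", alarp_region(after[hazard]))
print("GALE satisfied:", gale_ok(before, after))
```

In this made-up case every individual risk stays inside the tolerable band and the targeted risk falls, yet the overall level rises, which is exactly the situation the GALE principle exists to catch.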

So far this has all been very basic risk management theory and, on the face of it, the ALARP and GALE principles appear to complement each other. But do they always? Well, in the simple case where all risks are owned and managed by a single authority, this may be the case. But what if the various risks under consideration have differing owners and stakeholders? In such circumstances, parties who own risks and seek to reduce them SFAIRP may find themselves in conflict with each other, with the various stakeholders and with any body that may exist to ensure that the global risk level is not increased.

Perhaps we are now in the province of game theory rather than decision theory. If so, it seems reasonable to insist that the game be played in accordance with ethical constraints, but has anyone declared what these might be? Some seem obvious; for example, never transfer risk to another party without their knowledge and consent. Others may not be so straightforward. I think we are all familiar with the old chestnut of the signalman who can prevent a runaway train from ploughing into a group of schoolchildren by changing the points, but only by causing the certain death of the train driver. Does the signalman have the moral authority to commit murder? Would it be murder whether or not he or she switches the points? If we find this difficult to answer, one can easily envisage similar difficulties when deciding the ethical framework associated with the management of risk collectives.

Luke: Which changes do you most desire for the safety-critical systems industry?

John: I think that there is a lot to be said for making membership of a suitably constructed professional body a legal imperative for undertaking key systems safety engineering roles. Membership would require demonstration of a specified level of competence, adherence to formulated codes of conduct and the adoption of appropriate ideologies. Given my responses to earlier questions, your readers will probably be unsurprised to hear that I hope that this would provide the opportunity to promote a greater understanding of the conceptual framework lying behind the terms ‘risk’ and ‘uncertainty’. In particular, I would like to see a professional promotion of the philosophical, ethical and cognitive dimensions of system safety engineering.

I appreciate that the various engineering disciplines are already well served by a number of professional societies and that, for example, an Independent Safety Assessor (ISA) in the UK would be expected to be a chartered engineer and a recognised expert in the field (whatever that means). However, the epistemic uncertainties surrounding the development and application of complex computer-based systems introduce issues that perhaps the commonly encountered engineering psyche may not be fully equipped to appreciate. It may be that the safety engineering professional needs to think more like a lawyer. Consequently, the professional body I am looking for could be modelled upon the legal profession, as much as upon the existing engineering professions. I know for some people ‘lawyer’ is a dirty word, but so is ‘subjectivity’ in some engineers’ minds. Being pragmatic, and in order to stay in the game, we may all have to become sophists.

Luke: Thanks, John!
