These papers are available for download subject to the standard copyright restrictions. A copy may be made for personal research use only. The document may not be copied or sold to third parties. Quoted extracts should acknowledge the source document. Use of larger portions of the document requires the permission of the copyright holder.
Recent publications are listed first.
- Overview of Approaches to the Use and Licensing of COTS Digital Devices in Safety Critical Industries
- The Role of Certification in the Safety Demonstration of COTS EDDs
- Justifying PLC-based applications with limited cooperation from platform supplier - the COGS approach
- Emphasis class 1 and class 2 assessment of Rosemount pressure and temperature transmitters
- Templates, databases and other harmonised approaches to the safety justification of embedded digital devices.
- Justification of commercial industrial instrumentation and control equipment for nuclear power plant applications.
- Safety Demonstration of a Class 1 Smart Device
- V&V Techniques for FPGA-Based I&C Systems – How Do They Compare with Techniques for Microprocessors?
- Security-Informed Safety: Integrating Security Within the Safety Demonstration of Smart Device
- The Risk Assessment of ERTMS-Based Railway Systems from a Cyber Security Perspective: Methodology and Lessons Learned
- Justifying Digital COTS Components when Compliance Cannot be Demonstrated – The Cogs Approach
- Why are I&C Modernisations So Difficult? Experiences with Requirements Engineering and Safety Demonstration in Swedish NPPs
- Understanding, assessing and justifying I&C systems using Claims, Arguments and Evidence
- Building Blocks for Assurance Cases
- Compliance with Standards or Claim-based Justification? The Interplay and Complementarity of the Approaches for Nuclear Software-based Systems
- Interpreting ALARP
- Combining testing and proof to gain high assurance in software: A case study
- Security-Informed Safety: If It's Not Secure, It's Not Safe
- Does Software have to be Ultra Reliable in Safety Critical Systems?
- HARMONICS EU FP7 Project on the Reliability Assessment of Modern Nuclear I&C Software
- Justification of a FPGA-Based System Performing a Category C Function: Development of the Approach and Application to a Case Study
- Safety Justification Frameworks: Integrating Rule-Based, Goal-Based, and Risk-Informed Approaches
- Toward a Formalism for Conservative Claims about the Dependability of Software-Based Systems
- Diversity for Security: a Study with Off-The-Shelf AntiVirus Engines
- Assessment and Qualification of Smart Sensors
- Overcoming Non-determinism in Testing Smart Devices: A Case Study
- An Approach to Using Non Safety-Assured Programmable Components in Modest Integrity Systems
- Infrastructure interdependency analysis: Introductory research review
- Infrastructure interdependency analysis: Requirements, capabilities and strategy
- Reliability Modeling of a 1-Out-Of-2 System: Research with Diverse Off-The-Shelf SQL Database Servers
- Measuring Hazard Identification
- Justification of smart sensors for nuclear applications
- Independent Safety Assessment of Safety Arguments
- Software and SILS
- Application of a Commercial Assurance Case Tool to Support Software Certification Services
- An Exploration of Software Faults and Failure Behaviour in a Large Population of Programs
- An Empirical Exploration of the Difficulty Function
- The future of goal-based assurance cases
- Estimating PLC logic program reliability
- MC/DC based estimation and detection of residual faults in PLC logic networks
- Using a Log-normal Failure Rate Distribution for Worst Case Bound Reliability Prediction
- Integrity Static Analysis of COTS/SOUP
- Software Criticality Analysis of COTS/SOUP
- Learning from incidents involving E/E/PE systems
- Worst Case Reliability Prediction Based on a Prior Estimate of Residual Defects
- Estimating Residual Faults from Code Coverage
- Rescaling Reliability Bounds for a New Operational Profile
- Learning from incidents involving electrical/electronic/programmable electronic safety-related systems. Project outline.
- Graphical Notations, Narratives and Persuasion: a Pliant Systems Approach to Hypertext Tool Design
- Process Modelling to Support Dependability Arguments
- The Practicalities of Goal-Based Safety Regulation
- Use of SOUP in safety related applications
- The REVERE project: experiments with the application of probabilistic NLP to systems engineering
- The Development of a Commercial 'Shrink-Wrapped Application' to Safety Integrity Level 2: The DUST-EXPERT™ Story
- Requirements for a Guide on the Development of Virtual Instruments
- The Formal Development of a Windows Interface
- A Methodology for Safety Case Development
- Using Reversible Computing to Achieve Fail-safety
- Viewpoints on Improving the Standards Making Process: Document Factory or Consensus Management?
- PERE: Evaluation and Improvement of Dependable Processes
- A Conservative Theory for Long-Term Reliability Growth Prediction
- Data Reification Without Explicit Abstraction Functions
- The SHIP Safety Case
- The SHIP Safety Case - A Combination of System and Software Methods
- Software Fault Tolerance by Design Diversity
- Stepwise Development and Verification of a Boiler System Specification
- The Variation of Software Survival Times for Different Operational Input Profiles
- characterise the types of fault that are present in these programs
- characterise how programs are debugged during development
- assess the effectiveness of diverse programming.
- Methods for assessing the safety integrity of safety-related software of uncertain pedigree (SOUP) Report No: CRR337 HSE Books 2001 ISBN 0 7176 2011 5 http://www.hse.gov.uk/research/crr_pdf/2001/crr01337.pdf
- Justifying the use of software of uncertain pedigree (SOUP) in safety-related applications Report No: CRR336 HSE Books 2001 ISBN 0 7176 2010 7 http://www.hse.gov.uk/research/crr_pdf/2001/crr01336.pdf
Overview of Approaches to the Use and Licensing of COTS Digital Devices in Safety Critical Industries
Commercial-Off-The-Shelf (COTS) components are increasingly used in nuclear Instrumentation and Control (I&C) applications. They have several commercial advantages: nuclear-specific products may not be available, and the cost of developing bespoke components may be prohibitive. In addition, commercial components typically benefit from a wider user base and therefore from greater amounts of operating data, which increase the chances of detecting (and fixing) systematic faults. Despite these benefits, the use of COTS components also raises a number of challenges and concerns regarding their safety demonstration and justification. This paper summarises a report that considered the use of COTS components in a range of safety-critical applications.
The Role of Certification in the Safety Demonstration of COTS EDDs
Embedded digital COTS devices are increasingly being used in Nuclear Power Plants. Although these devices are often not developed according to nuclear standards, they still need to be justified before they can be deployed in nuclear applications. Different countries have been developing their own processes to justify COTS digital devices. In many cases, this justification is based on an assessment of the development process. This is consistent with traditional standards-based approaches to safety justification, in which compliance to accepted practice was deemed to imply adequate safety. Compliance could be demonstrated either directly, through a review of the development artefacts, or indirectly, through existing certification (e.g., against IEC 61508). This paper discusses the use of development process-based approaches to the safety justification of COTS EDDs, the link between development processes and reliability, how certification may support the justification, and some of the pitfalls of relying on certification.
Justifying PLC-based applications with limited cooperation from platform supplier - the COGS approach
Several control and monitoring applications are implemented using commercial-off-the-shelf (COTS) PLCs that were not necessarily developed according to nuclear standards. The UK nuclear regulatory regime requires that a safety case be developed to justify and communicate their safety. Typically, the assessment of COTS components has focused on standards compliance: compliance to accepted practice was deemed to imply adequate safety. However, justifying COTS products can be difficult because of limited knowledge of the internal structure of the components or of their development processes. This is especially so when the supplier of the PLC platform is not willing to provide the information needed to complete a compliance case. This paper describes a claim-based approach to the justification of COTS PLC components using Cogs, developed in a project funded by the UK nuclear industry. The approach:
- focuses on the behaviour of the system rather than on the process followed to develop the PLC platform;
- structures the justification around behavioural attributes (such as functionality, performance and reliability) and considers them in terms of the application and/or the platform;
- uses information about the platform that is likely to be publicly available from the supplier.
Emphasis class 1 and class 2 assessment of Rosemount pressure and temperature transmitters
This paper describes the Class 1 assessment of the Rosemount 3051 Pressure Transmitter and the Class 2 assessment of the Rosemount 644 Temperature Transmitter using Emphasis, at SIL 3 and SIL 2 respectively. Emerson has pursued many approvals and certifications on these transmitter platforms. The audit for each of these assessments is unique and probes information at varying levels of detail. Compared with other approval and certification audits, the Emphasis assessment is a much more productive and in-depth review of design and project materials. The assessment focuses on reviewing quality procedures, design and project artefacts that demonstrate practical engineering practices, and processes that would lead to good product design. This paper describes Emerson’s approach to the assessment. For this assessment, Emerson answered over 300 assessment questions and provided over 150 archived documents as evidence for each individual product. Throughout the assessment, Emerson’s knowledge of IEC 61508, quality standards, product development processes and software engineering practices showed that, as a smart device manufacturer, Emerson is approaching design processes and procedures with the necessary rigor to produce devices capable of meeting the most stringent requirements. Key Words: smart devices, safety demonstration, embedded digital devices
Templates, databases and other harmonised approaches to the safety justification of embedded digital devices.
This paper describes work, funded by Energiforsk, that considered the feasibility of harmonised component-level safety demonstration. In particular, it considered using aspects of the UK approach to the licensing and qualification of smart devices in Finland. We concluded that the use of harmonised component justification is feasible. In the shorter term, such an approach seems more likely to succeed if it is developed within Finland. Using the assessments performed in the UK in Finland would have several advantages, but a number of technical and commercial issues would need to be overcome for this to be feasible. Key Words: Embedded digital devices, commercial-off-the-shelf components, smart devices
Justification of commercial industrial instrumentation and control equipment for nuclear power plant applications.
This paper discusses work done by the authors to develop an IAEA Nuclear Energy Series report. The report provides guidance on what would constitute an adequate justification process for a COTS device to be installed in an NPP for applications important to safety. The aim is reasonable assurance of high quality, and reasonable assurance that the use of the COTS device does not introduce new, unanalysed failure modes. The publication provides a process for the justification of digital COTS devices that may be used to guide the incorporation of these devices into the design of I&C systems important to safety. The process ensures that there is sufficient evidence to demonstrate that these products have adequate integrity to meet the requirements of their intended nuclear applications.
Safety Demonstration of a Class 1 Smart Device
Horizon Nuclear Power intends to build Advanced Boiling Water Reactors (ABWR) at Wylfa and Oldbury in the UK, based on the Hitachi design. In accordance with UK policy for new nuclear build, Hitachi, as the reactor designer, is the requesting party to the Generic Design Assessment (GDA) during which the reactor design will be reviewed by the Office for Nuclear Regulation (ONR) and the Environment Agency. This paper describes the scope, criteria, process, and approach for the safety class 1 (SC1) pilot study and summarizes the results of the study.
V&V Techniques for FPGA-Based I&C Systems – How Do They Compare with Techniques for Microprocessors?
We compare verification and validation (V&V) techniques for FPGA and microprocessor-based instrumentation and control (I&C) systems from the point of view of standards compliance, an approach based on behavioural properties, and the analysis of vulnerabilities. We found that the non-technology-specific elements of the standards considered are very similar. Differences are more marked when considering behavioural properties and vulnerabilities: the amount of effort required and confidence level obtained depend on a number of properties of the particular design under verification.
Security-Informed Safety: Integrating Security Within the Safety Demonstration of Smart Device
In this paper we discuss the impact of integrating security when developing a safety demonstration of a smart device. A smart device is an instrument, device or component that contains a microprocessor (and therefore contains both hardware and software) and is programmed to provide specialised capabilities, often measuring or controlling a process variable. Examples of smart devices include radiation monitors, relays, turbine governors, uninterruptible power supplies, and heating, ventilation and air conditioning controllers.
The Risk Assessment of ERTMS-Based Railway Systems from a Cyber Security Perspective: Methodology and Lessons Learned
The impact that cyber issues might have on the safety and resilience of railway systems has been studied for more than five years by industry specialists and government agencies. This paper presents some of the work done by Adelard in this area on behalf of the GB rail industry. The work ranges from an analysis of potential vulnerabilities in the ERTMS specifications, through a high-level cyber security risk assessment of a national ERTMS implementation, to detailed analysis of particular ERTMS systems. The focus of the paper is our overall methodology for security-informed safety and hazard analysis. Lessons learned are presented, although our detailed results remain proprietary or sensitive and cannot be published.
Justifying Digital COTS Components when Compliance Cannot be Demonstrated – The Cogs Approach
This paper describes a claim-based approach to the justification of COTS components (called Cogs) that was developed in a project sponsored by the UK nuclear industry. The Cogs approach is based on a set of top-level claims that remain the same for different components but allow different types of evidence to be used for specific COTS products. This gives greater flexibility in making a justification while ensuring that all safety-relevant attributes of the COTS component are justified.
Why are I&C Modernisations So Difficult? Experiences with Requirements Engineering and Safety Demonstration in Swedish NPPs
Several I&C modernisation projects have encountered difficulties resulting in delays and overspend. This paper describes our work to identify the main issues experienced in I&C modernisation projects and the lessons learnt from them. To this end, we conducted a number of interviews in Swedish nuclear plants, focusing on safety demonstration and requirements engineering. The paper discusses the findings from these interviews.
Understanding, assessing and justifying I&C systems using Claims, Arguments and Evidence
I&C systems important to safety need to be demonstrably safe. Usually this is achieved by demonstrating compliance with relevant standards. This paper argues that compliance is not necessarily enough, and suggests using a claim-based approach to understand, assess and justify the safety of I&C systems.
Building Blocks for Assurance Cases
The paper introduces an approach to structuring assurance cases using specially designed CAE (Claims, Arguments and Evidence) building blocks. The blocks are derived from an empirical analysis of the structures of real assurance cases. They can standardise the presentation of assurance cases by simplifying their architecture. CAE building blocks might also increase the precision and efficiency of the claims in arguments. They can be used as self-contained, reusable components of formal and semi-formal assurance cases.
Compliance with Standards or Claim-based Justification? The Interplay and Complementarity of the Approaches for Nuclear Software-based Systems
In the past, safety justifications tended to be standards-based: compliance to accepted practice was deemed to imply adequate safety. Over the last 20 years, there has been a trend towards an explicit claim-based approach, where specific safety claims are supported by arguments and evidence at progressively more detailed levels. This paper discusses software-based systems with only a modest integrity requirement, and the interplay of the two approaches. It describes our experience of justifying such systems for the nuclear industry, and argues that there are a number of benefits in taking both approaches together.
Interpreting ALARP
This paper explores some of the common difficulties in interpreting the ALARP principle, and traces the potential effects of these difficulties on system risk. We introduce two categories of risk reduction approach which permit us to characterise the risk profile of a system in more detail, and discuss their application to Systems of Systems (SoS).
Combining testing and proof to gain high assurance in software: A case study
There are potential benefits in combining static analysis and testing. The results obtained can be more general than those of standalone dynamic testing, but less resource-intensive than standalone static analysis. This paper presents a specific example of this approach applied to the verification of continuous monotonic functions. The approach combines a monotonicity analysis with a defined set of tests to demonstrate the accuracy of a software function over its entire input range. Unlike “standalone” dynamic methods, our approach provides full coverage and guarantees a bound on the maximum error. We present a case study of the application of our approach to the analysis and testing of the software-implemented transfer function in a smart sensor. The case study showed that relatively little effort was needed to apply the approach. We conclude by discussing future developments of this approach.
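The core of the monotonicity argument can be illustrated in a few lines of Python. This is a minimal sketch, not the authors' tool. It assumes that both the implemented function and the reference (specified) function are monotonically non-decreasing between the first and last test points. The names `max_error_bound`, `impl` and `ref` are hypothetical. For any x between adjacent test points a and b, monotonicity gives impl(x) - ref(x) <= impl(b) - ref(a) and ref(x) - impl(x) <= ref(b) - impl(a). A finite set of tests therefore bounds the error over the whole range.

```python
from typing import Callable, Sequence


def max_error_bound(
    impl: Callable[[float], float],
    ref: Callable[[float], float],
    test_points: Sequence[float],
) -> float:
    """Bound on |impl(x) - ref(x)| for every x between the smallest and
    largest test point, ASSUMING impl and ref are both monotonically
    non-decreasing on that range (established separately by analysis)."""
    xs = sorted(test_points)
    if len(xs) < 2:
        raise ValueError("at least two test points are needed")
    bound = 0.0
    for a, b in zip(xs, xs[1:]):
        impl_a, impl_b = impl(a), impl(b)
        ref_a, ref_b = ref(a), ref(b)
        # For a <= x <= b: impl(x) - ref(x) <= impl(b) - ref(a)
        # and ref(x) - impl(x) <= ref(b) - impl(a).
        bound = max(bound, impl_b - ref_a, ref_b - impl_a)
    return bound


def reference(x: float) -> float:
    """Hypothetical specified transfer function."""
    return x


def implementation(x: float) -> float:
    """Hypothetical implementation with a small constant offset."""
    return x + 0.001


coarse = max_error_bound(implementation, reference, [i / 10 for i in range(11)])
fine = max_error_bound(implementation, reference, [i / 1000 for i in range(1001)])
```

With eleven evenly spaced test points on [0, 1], the bound is about 0.101. With 1001 points it is about 0.002, against a true maximum error of 0.001. The bound is always conservative, and it tightens as the test points get closer together.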
Security-Informed Safety: If It's Not Secure, It's Not Safe
Traditionally, safety and security have been treated as separate disciplines, but this position is increasingly becoming untenable and stakeholders are beginning to argue that if it’s not secure, it’s not safe. In this paper we present some of the work we have been doing on “security-informed safety”. Our approach is based on the use of structured safety cases and we discuss the impact that security might have on an existing safety case. We also outline a method we have been developing for assessing the security risks associated with an existing safety system such as a large-scale critical infrastructure.
Does Software have to be Ultra Reliable in Safety Critical Systems?
This paper argues that higher levels of safety performance can be claimed by taking account of: (1) external mitigation to prevent an accident; and (2) the fact that software is corrected once failures are detected in operation. A model based on these concepts is developed to derive an upper bound on the number of expected failures and accidents under different assumptions about fault fixing, diagnosis, repair and accident mitigation. A numerical example is used to illustrate the approach. The implications and potential applications of the theory are discussed.
HARMONICS EU FP7 Project on the Reliability Assessment of Modern Nuclear I&C Software
This paper discusses the HARMONICS EU FP7 project on reliability assessment.
Justification of a FPGA-Based System Performing a Category C Function: Development of the Approach and Application to a Case Study
Field Programmable Gate Arrays (FPGAs) have been gaining interest in the nuclear industry for a number of years. Their simplicity compared to microprocessor-based platforms is expected to simplify the licensing approach, and therefore reduce licensing project risks compared to software based solutions. However, few safety-related applications have been licensed in the nuclear industry; those that have are typically safety applications at Category A, and work on standardizing the licensing approach has been focused on this category. This paper presents work currently being performed on the justification of an FPGA that performs a Category C function, i.e., a function of the lowest safety category. The FPGA is part of the system monitoring vibration of the gags of the fuel assembly in one of the UK nuclear plants. Part of this work involves developing an approach for the justification which is consistent with the UK nuclear regulatory framework and commensurate with the safety category of the function performed. We draw on a number of standards, including those for software performing a function of similar criticality. However, evidence that the design and verification of the system followed a well-structured development process does not provide direct evidence that the system achieves the required behavior. Therefore, the approach also considers behavioral attributes that are important for the system, using a goal-based approach. This is complemented by a risk-informed approach, in which postulated hazards are evaluated to ensure they have been addressed and any remaining vulnerabilities of the system mitigated.
Safety Justification Frameworks: Integrating Rule-Based, Goal-Based, and Risk-Informed Approaches
The reliability and safety of the digital I&C systems that implement safety functions are critical issues. In particular, software defects could result in common cause failures that defeat redundancy and defence-in-depth mechanisms. Unfortunately, the differences in current safety justification principles and methods for digital I&C restrict international co-operation and hinder the emergence of widely accepted best practices. These differences also prevent cost sharing and reduction, and unnecessarily increase licensing uncertainties. The result is a very difficult operating environment for utilities, vendors and regulatory bodies. The European project HARMONICS (Harmonised Assessment of Reliability of MOdern Nuclear I&C Software) is seeking to develop a more harmonised approach to the justification of software-based I&C systems important to safety. This paper outlines the justification framework we intend to develop in HARMONICS. It will integrate three strategies commonly used in safety justifications of I&C systems and their software:
- rule-based evidence of compliance with accepted standards;
- goal-based evidence that the intended behaviour and other claimed properties have been achieved;
- risk-informed evidence that unintended behaviour is unlikely.

The paper presents general forms of safety case that can be adapted to a variety of specific topics.
Toward a Formalism for Conservative Claims about the Dependability of Software-Based Systems
Here, we consider a simple case where an expert makes a claim about the probability of failure on demand (pfd) of a subsystem of a wider system and is able to express his confidence about that claim probabilistically. An important, but difficult, problem is then how such subsystem (claim, confidence) pairs can be propagated through a dependability case for the wider system, of which the subsystems are components. An informal way forward is to justify a strong claim at high confidence, and then, conservatively, only claim something much weaker. For example, if I am 99 percent confident that the pfd is less than 0.00001, it is reasonable to be 100 percent confident that it is less than 0.001. In this paper, we provide formal support for such reasoning.
Diversity for Security: a Study with Off-The-Shelf AntiVirus Engines
In this paper we present an empirical analysis, using a known set of software viruses. It explores the detection gains that can be achieved from greater diversity (i.e. more than two AntiVirus products) and how diversity may help to reduce the “at risk time” of a system. We also present a preliminary model fitting using the hyper-exponential distribution.
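The detection gain from adding more AntiVirus engines can be computed from a table of which engine detected which sample. A combination of engines is treated as a 1-out-of-k detector that raises an alert if any of its engines does. The sketch below is a hypothetical illustration of that calculation, not the analysis performed in the paper. The engine names and detection sets are invented.

```python
from itertools import combinations
from typing import Dict, Iterable, Set


def detection_rate(
    detected: Dict[str, Set[str]], engines: Iterable[str], samples: Set[str]
) -> float:
    """Fraction of samples flagged by at least one of the given engines."""
    flagged: Set[str] = set()
    for engine in engines:
        flagged |= detected[engine]
    return len(flagged & samples) / len(samples)


def mean_rate_by_diversity(
    detected: Dict[str, Set[str]], samples: Set[str]
) -> Dict[int, float]:
    """Average 1-out-of-k detection rate over every combination of k engines."""
    names = sorted(detected)
    result: Dict[int, float] = {}
    for k in range(1, len(names) + 1):
        rates = [detection_rate(detected, combo, samples)
                 for combo in combinations(names, k)]
        result[k] = sum(rates) / len(rates)
    return result


# Hypothetical detection results for three engines and four samples.
samples = {"v1", "v2", "v3", "v4"}
detected = {"A": {"v1", "v2"}, "B": {"v2", "v3"}, "C": {"v1", "v4"}}
rates = mean_rate_by_diversity(detected, samples)
```

For this invented data, the average detection rate rises from 0.5 with one engine to about 0.83 with two and to 1.0 with all three.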
Assessment and Qualification of Smart Sensors
This paper describes research work done on approaches to justifying smart instruments, and in particular, how some of this research has successfully been applied to the safety substantiation of such instruments.
Overcoming Non-determinism in Testing Smart Devices: A Case Study
Non-determinism can arise due to inaccuracy in an analogue measurement made by the device when two alternative actions are possible depending on the measured value. This non-determinism makes it difficult to predict the output values that are expected from a test sequence of analogue input values. The paper presents two approaches to dealing with this difficulty: (1) based on avoidance of test values that could have multiple responses, (2) based on consideration of all possible interpretations of input data.
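Both approaches can be sketched for a simple hypothetical device. The device is a trip that compares a measured value with a setpoint, has a measurement inaccuracy of ±`TOL`, and latches once tripped. The setpoint, tolerance and latching behaviour are assumptions for illustration only, not details from the paper. Approach (1) discards test inputs inside the uncertainty band. Approach (2) keeps them and checks that each observed output is consistent with at least one possible interpretation of the input.

```python
from typing import List, Sequence, Set

SETPOINT = 10.0  # hypothetical trip setpoint
TOL = 0.2        # hypothetical measurement inaccuracy (+/-)


def possible_trip(value: float) -> Set[bool]:
    """Trip decisions the device may legitimately make for one input value."""
    if value > SETPOINT + TOL:
        return {True}
    if value < SETPOINT - TOL:
        return {False}
    # Inside the uncertainty band either decision is correct.
    return {True, False}


def select_unambiguous(values: Sequence[float]) -> List[float]:
    """Approach (1): keep only test values with a single expected response."""
    return [v for v in values if len(possible_trip(v)) == 1]


def outputs_consistent(values: Sequence[float], observed: Sequence[bool]) -> bool:
    """Approach (2): accept the observed outputs if each one could result from
    some interpretation of the inputs, for a (hypothetical) latching trip."""
    if len(values) != len(observed):
        raise ValueError("one observed output is needed per input value")
    tripped = False  # the trip starts reset
    for value, out in zip(values, observed):
        # A latched trip stays tripped; otherwise any permitted decision may occur.
        allowed = {True} if tripped else possible_trip(value)
        if out not in allowed:
            return False
        # The observed output reveals which interpretation the device took.
        tripped = out
    return True
```

For the input sequence 9.0, 9.9, 10.1, 9.5, 11.0, approach (1) keeps only 9.0, 9.5 and 11.0. Approach (2) accepts a trip at either 9.9 or 10.1. It rejects a trip that first appears at 9.5, or one that resets after latching.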
An Approach to Using Non Safety-Assured Programmable Components in Modest Integrity Systems
There is a problem in justifying the use of programmable components if the components have not been safety justified to an appropriate integrity (e.g. to SIL 1 of IEC 61508). This paper outlines an approach (called LowSIL) developed in the UK CINIF nuclear industry research programme to justify the use of non safety-assured programmable components in modest integrity systems.
Infrastructure interdependency analysis: Introductory research review
This paper presents an introductory review of research in infrastructure interdependency modelling and analysis. In particular, it focuses on network models, interdependency analysis, infrastructure models, simulation under federation and visualization.
Infrastructure interdependency analysis: Requirements, capabilities and strategy
This paper aims at assessing the technical and commercial feasibility of the development of tools and services for analysing interdependency between infrastructures, particularly information infrastructures, and assessing associated risks, as well as establishing “interdependency analysis” as a distinct and recognisable service supported by tools and data.
Reliability Modeling of a 1-Out-Of-2 System: Research with Diverse Off-The-Shelf SQL Database Servers
This paper discusses two methods for modelling the reliability growth of a fault-tolerant database constructed from diverse database servers.
Measuring Hazard Identification
This paper discusses an experiment that measured the effectiveness of a hazard identification process used to support safety in a Defence Standard 00-56 project. The experimental case study used a Ministry of Defence (MOD) project that simultaneously assessed two potential suppliers competing for an MOD equipment contract. The UK MOD Corporate Research Programme funded the comparison work. The MOD Integrated Project Team funded the project, which included each contractor's project safety processes.
Justification of smart sensors for nuclear applications
This paper describes the results of a research study sponsored by the UK nuclear industry into methods of justifying smart sensors. Smart sensors are increasingly being used in the nuclear industry. They have potential benefits such as greater accuracy and better noise filtering, and in many cases their analogue counterparts are no longer manufactured. However, smart sensors (as is the case for most COTS products) are sold as black boxes, even though their safety justification might require knowledge of their internal structure and development process. The study covered both management aspects of interacting with manufacturers to obtain the information needed, and the technical aspects of designing an appropriate safety justification approach and assessing the feasibility of a range of technical analyses. The analyses performed include the methods we presented at Safecomp 2002 and 2003.
Independent Safety Assessment of Safety Arguments
The paper describes the role of the Independent Safety Auditor (ISA) as currently carried out in the defence and other sectors in the UK. It outlines the way the ISA role has developed over the past 15–20 years with the changing regulatory environment. The extent to which the role comprises audit, assessment or advice is a source of confusion. The paper clarifies this by means of some definitions, and by elaborating the tasks involved in scrutinising the safety argument for the system. The customers and interfaces for the safety audit are described, and pragmatic means for assessing the competence of ISAs are presented.
Software and SILS
This short article for the UK Safety Critical Systems Club Newsletter suggests an alternative interpretation of the SIL concept for software.
Application of a Commercial Assurance Case Tool to Support Software Certification Services
This short paper for the SoftCeMent 05 workshop presents an approach to delivering a range of software certification processes based on the commercial assurance case tool, ASCE.
An Exploration of Software Faults and Failure Behaviour in a Large Population of Programs
A large part of software engineering research suffers from a major problem: there are insufficient data to test software hypotheses or to estimate parameters in their models. To obtain statistically significant results, large sets of programs are needed, each comprising many programs built to the same specification. We have gained access to such a large body of programs (written in C, C++, Java or Pascal). In this paper we present the results of an exploratory analysis of around 29,000 C programs written to a common specification.
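The objectives of this study were to characterise the types of fault that are present in these programs, characterise how programs are debugged during development, and assess the effectiveness of diverse programming.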
The findings are discussed, together with the potential limitations on the realism of the findings.
An Empirical Exploration of the Difficulty Function
The theory developed by Eckhardt and Lee (and later extended by Littlewood and Miller) utilises the concept of a "difficulty function" to estimate the expected gain in reliability of fault tolerant architectures based on diverse programs. The "difficulty function" is the likelihood that a randomly chosen program will fail for any given input value. To date this has been an abstract concept that explains why dependent failures are likely to occur. This paper presents an empirical measurement of the difficulty function based on an analysis of over six thousand program versions implemented to a common specification. The study derived a "score function" for each version. It was found that several different program versions produced identical score functions, which when analysed, were usually found to be due to common programming faults. The score functions of the individual versions were combined to derive an approximation of the difficulty function. For this particular (relatively simple) problem specification, it was shown that the difficulty function derived from the program versions was fairly flat, and the reliability gain from using multi-version programs would be close to that expected from the independence assumption.
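Let θ(x) be the difficulty function and Q(x) the operational profile (summing to 1). In the Eckhardt and Lee model, the expected probability of failure of one randomly chosen version is Σ Q(x)θ(x). The probability that two independently chosen versions both fail is Σ Q(x)θ(x)². That is never less than the independence prediction (Σ Q(x)θ(x))², and equals it when θ is flat. The sketch below derives θ from per-version score functions and compares these quantities. Here a score function is taken to be a list of 0/1 failure indicators over a common set of inputs. The function names and data representation are assumptions, not the paper's own tooling.

```python
from typing import List, Sequence, Tuple


def difficulty_function(score_functions: Sequence[Sequence[int]]) -> List[float]:
    """theta(x): the fraction of program versions that fail on input x.

    Each score function holds one 0/1 failure indicator per input, and all
    versions are assumed to be scored on the same inputs."""
    n_versions = len(score_functions)
    n_inputs = len(score_functions[0])
    return [
        sum(score[x] for score in score_functions) / n_versions
        for x in range(n_inputs)
    ]


def single_and_pair_pfd(
    theta: Sequence[float], profile: Sequence[float]
) -> Tuple[float, float, float]:
    """Return (pfd of one randomly chosen version,
    pfd of a 1-out-of-2 pair of independently chosen versions,
    pair pfd predicted by assuming independent failures).

    `profile` gives the probability of each input and should sum to 1."""
    single = sum(q * t for q, t in zip(profile, theta))
    pair = sum(q * t * t for q, t in zip(profile, theta))  # both versions fail
    return single, pair, single * single
```

Take four hypothetical versions over four equally likely inputs, with θ = [0.5, 0, 0, 0.25]. The single-version pfd is 0.1875. The pair pfd is 0.078125, compared with 0.03515625 under independence. A flat θ makes the last two values equal.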
The future of goal-based assurance cases
Most regulations and guidelines for critical systems require a documented case that the system will meet its critical requirements, which we call an assurance case. Increasingly, the case is made using a goal-based approach, where claims are made (or goals are set) about the system and arguments and evidence are presented to support those claims. In this paper we describe Adelard's approach to safety cases in particular, and assurance cases more generally, and discuss some possible future directions to improve frameworks for goal-based assurance cases.
Estimating PLC logic program reliability
This paper applies earlier theoretical work to an industrial PLC logic example. The study required extensions to the previous work to estimate the number of residual logic faults (N), and we show that the worst case bound theory is applicable.
MC/DC based estimation and detection of residual faults in PLC logic networks
Coverage measurement has previously been used to estimate residual faults in conventional program code. The basic idea is that the relationship between code covered and faults found is nearly linear, so it is possible to estimate the number of residual faults from the proportion of uncovered code. In this paper we apply the same concept to PLC logic networks rather than conventional program code, combined with a random test strategy designed to maximise coverage growth. This proved to be very efficient in detecting the known faults in an industrial logic example.
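MC/DC (modified condition/decision coverage) requires each condition to be shown to affect the outcome independently. A minimal sketch, assuming a logic element can be modelled as a Python Boolean function and using the unique-cause form of MC/DC, checks which conditions a test set covers. The function `network` and the test vectors are hypothetical, and the paper's random test strategy and fault estimator are not reproduced.

```python
from itertools import combinations
from typing import Callable, Sequence

Vector = tuple[bool, ...]


def mcdc_covered(f: Callable[..., bool], tests: Sequence[Vector]) -> list[bool]:
    """For each condition i, report whether the tests contain an independence pair:
    two vectors that differ only in condition i and give different outputs of f."""
    n = len(tests[0])
    covered = [False] * n
    for a, b in combinations(tests, 2):
        diff = [i for i in range(n) if a[i] != b[i]]
        if len(diff) == 1 and f(*a) != f(*b):
            covered[diff[0]] = True
    return covered


def network(a: bool, b: bool, c: bool) -> bool:
    """Hypothetical logic network: out = (a AND b) OR c."""
    return (a and b) or c


T, F = True, False
full_tests = [(T, T, F), (F, T, F), (T, F, F), (T, F, T)]
partial_tests = [(T, T, F), (F, T, F)]
```

The uncovered fraction, here two of three conditions for `partial_tests`, is the kind of quantity a coverage-based residual fault estimate extrapolates from.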
Using a Log-normal Failure Rate Distribution for Worst Case Bound Reliability Prediction
Prior research has suggested that the failure rates of faults follow a log-normal distribution. We propose a specific model in which distributions close to a log-normal arise naturally from the program structure. The log-normal distribution presents a problem when used in reliability growth models, as it is not mathematically tractable. However, we demonstrate that a worst case bound can be estimated that is less pessimistic than our earlier worst case bound theory.
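The earlier worst case bound rests on the fact that a fault with failure rate λ that survives a test time T contributes an expected residual rate of λe^(-λT), which can never exceed 1/(eT). The sketch below assumes log-normally distributed fault rates with illustrative parameters `mu` and `sigma`. These are not values from the paper, and the paper's structural model is not reproduced. It estimates the expected contribution per fault by Monte Carlo and compares it with the worst case value.

```python
import math
import random


def expected_residual_rate(T: float, mu: float, sigma: float,
                           samples: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo estimate of E[lam * exp(-lam * T)] when ln(lam) ~ Normal(mu, sigma)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        lam = rng.lognormvariate(mu, sigma)
        total += lam * math.exp(-lam * T)  # each term is at most 1/(e*T)
    return total / samples


T = 1000.0                                  # hypothetical test time (hours)
worst_case = 1.0 / (math.e * T)             # per-fault bound from the earlier theory
# Illustrative assumption: median fault rate sits exactly at the worst case rate 1/T.
lognormal = expected_residual_rate(T, mu=math.log(1.0 / T), sigma=2.0)
```

Because every sampled term is bounded by 1/(eT), the log-normal average is always lower. How much lower depends on the spread `sigma`, which is the kind of dependence a distribution-specific bound can exploit.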
Integrity Static Analysis of COTS/SOUP
This paper describes the integrity static analysis approach developed to support the justification of commercial off-the-shelf software (COTS) used in a safety-related system. The static analysis was part of an overall software qualification programme, which also included the work reported in our paper presented at Safecomp 2002. The analysis addressed two main aspects: the internal integrity of the code (especially for the more critical functions), and the intra-component integrity, checking for covert channels. The analysis process was supported by an aggregation of tools, combined and engineered to support the checks done and to scale as necessary. Integrity static analysis proved feasible for industrial-scale software and did not require unreasonable resources; we provide data that illustrate its contribution to the software qualification programme.
Software Criticality Analysis of COTS/SOUP
This paper describes the Software Criticality Analysis (SCA) approach that was developed to support the justification of commercial off-the-shelf software (COTS) used in a safety-related system. The primary objective of SCA is to assess the importance to safety of the software components within the COTS and to show that there is segregation between software components with different safety importance. The approach combined Hazops based on design documents with a detailed analysis of the actual code (100 kloc). Considerable effort was spent on validation and on ensuring that the results were conservative. Reverse engineering of the code showed that results based only on architecture and design documents would have been misleading.
Learning from incidents involving E/E/PE systems
The UK Health and Safety Executive (HSE) commissioned a research study into methods of learning from incidents involving electrical, electronic and programmable electronic systems (E/E/PES). The approach is designed to comply with the IEC 61508 standard and to be suitable for organisations at different levels of maturity.
The three reports resulting from this work can be downloaded from the HSE web site:
Part 1: Review of methods and industry practice. HSE Contract Research Report RR179, December 2003.
Part 2: Recommended scheme. HSE Contract Research Report RR181, December 2003.
Part 3: Guidance examples and rationale. HSE Contract Research Report RR182, December 2003.
Worst Case Reliability Prediction Based on a Prior Estimate of Residual Defects
In this paper we extend an earlier worst case bound reliability theory to derive a worst case reliability function R(t), which gives the worst case probability of surviving a further time t given an estimate of residual defects in the software and a prior test time T. The earlier theory and its extension are presented and the paper also considers the case where there is a low probability of any defect existing in the program. The implications of the theory are discussed and compared with alternative reliability models.
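The abstract does not give the form of R(t). As a rough illustration only, and not the paper's derivation, the sketch below assumes the model of the earlier worst case bound theory: N faults, each with an unknown constant failure rate λ, removed on first failure. It then applies a simple union bound. A fault that has survived test time T fails within the next interval t with probability e^(-λT)(1 - e^(-λt)). Setting the derivative to zero shows this is largest at λ = ln((T+t)/T)/t, so 1 - N times that maximum is a conservative lower bound on R(t).

```python
import math


def worst_fault_fail_prob(T: float, t: float) -> float:
    """Max over lam of exp(-lam*T) * (1 - exp(-lam*t)): the worst case probability
    that one fault which survived test time T causes a failure in the next t."""
    lam = math.log((T + t) / T) / t  # stationary point of the expression above
    return math.exp(-lam * T) - math.exp(-lam * (T + t))


def worst_case_reliability(N: float, T: float, t: float) -> float:
    """Union-bound lower limit on surviving a further time t (illustrative only)."""
    return max(0.0, 1.0 - N * worst_fault_fail_prob(T, t))


# Hypothetical figures: 5 residual faults, 1000 h of prior testing, 1 h mission.
R = worst_case_reliability(N=5, T=1000.0, t=1.0)
```

For t much smaller than T the per-fault term approaches t/(eT), matching the N/(eT) failure rate bound of the earlier theory.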
Estimating Residual Faults from Code Coverage
Many reliability prediction techniques require an estimate for the number of residual faults. In this paper, a new theory is developed for using test coverage to estimate the number of residual faults. This theory is applied to a specific example with known faults and the results agree well with the theory. The theory is used to justify the use of linear extrapolation to estimate residual faults. It is also shown that it is important to establish the amount of unreachable code in order to make a realistic residual fault estimate.
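The linear extrapolation that the theory justifies can be sketched as follows, assuming that faults found grow in proportion to the coverage C of reachable code. The total number of faults is then about F/C, where F is the number found, so the residual estimate is about F(1 - C)/C. The branch counts are hypothetical, and the paper's underlying coverage theory is not reproduced.

```python
def residual_fault_estimate(faults_found: int, covered: int, total: int,
                            unreachable: int = 0) -> float:
    """Linear extrapolation of residual faults from coverage of reachable code.

    With C = covered / reachable, F * (1 - C) / C simplifies to
    F * (reachable - covered) / covered.
    """
    reachable = total - unreachable
    if not 0 < covered <= reachable:
        raise ValueError("covered must be in (0, reachable]")
    return faults_found * (reachable - covered) / covered


# Hypothetical test campaign: 900 of 1000 branches covered, 18 faults found.
naive = residual_fault_estimate(18, covered=900, total=1000)
adjusted = residual_fault_estimate(18, covered=900, total=1000, unreachable=80)
```

Excluding 80 unreachable branches raises the effective coverage and cuts the estimate from 2.0 to 0.4 faults. This is why the amount of unreachable code has to be established before extrapolating.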
Rescaling Reliability Bounds for a New Operational Profile
One of the main problems with reliability testing and prediction is that the result is specific to a particular operational profile. This paper extends an earlier reliability theory for computing a worst case reliability bound. The extended theory derives a re-scaled reliability bound based on the change in execution rates of the code segments in the program. In some cases it is possible to derive a maximum failure rate bound that applies to any change in the profile. It also predicts that (in principle) a fair test profile can be derived where the reliability bounds are relatively insensitive to the operational profile. In addition the theory allows unit and module test coverage measures to be incorporated into an operational reliability bound prediction. The implications of the theory are discussed, and the theory is evaluated by applying it to two example programs with known faults.
Learning from incidents involving electrical/electronic/programmable electronic safety-related systems. Project outline.
The UK Health and Safety Executive (HSE) has initiated a programme of work that will eventually provide guidance for those responsible on how to learn from their own incident data; a means for HSE to ensure that it has the best information attainable on incidents involving electrical/electronic/programmable electronic (E/E/PE) safety-related systems; and a stimulus to industry. HSE has contracted a consortium, led by Adelard and also involving the Glasgow (University) Accident Analysis Group (GAAG) and Blacksafe Consulting, to carry out a 7-month interactive project that will: 1) identify and evaluate existing schemes for classifying causes from incident data and generating lessons to avoid recurrence of similar incidents; 2) select and modify an existing scheme or schemes, or derive a new one, in order to create a method for analysing and classifying incident data to match the principles and activities of IEC 61508; 3) test the new method using data from a small number of real incidents; and 4) identify and present the significant strengths and weaknesses of the proposed method and how it fits in with wider issues such as incident reporting, incident investigation and process improvement. This project is part of HSE's longer-term programme to provide best advice in this field. The paper provides an outline of the project.
Graphical Notations, Narratives and Persuasion: a Pliant Systems Approach to Hypertext Tool Design
The Adelard Safety Case Editor (ASCE) is a hypertext tool for constructing and reviewing structured arguments. ASCE is used in the safety industry, and can be used in many other contexts when graphical presentation can make argument structure, inference or other dependencies explicit. ASCE supports a rich hypertext narrative mode for documenting traditional argument fragments. In this paper we document the motivation for developing the tool and describe its operation and novel features. Since usability and technology adoption issues are critical for software and hypertext tool uptake, our approach has been to develop a system that is highly usable and sufficiently "pliant" to support and integrate with a wide range of working practices and styles. We discuss some industrial application experience to date, which has informed the design and is informing future requirements. We draw from this some of the perhaps not so obvious characteristics of hypertext tools which are important for successful uptake in practical environments.
Process Modelling to Support Dependability Arguments
This paper reports work to support dependability arguments about the future reliability of a product before there is direct empirical evidence. We develop a method for estimating the number of residual faults at the time of release from a "barrier model" of the development process, where in each phase faults are created or detected. These estimates can be used in a conservative theory in which a reliability bound can be obtained or can be used to support arguments of fault freeness. We present the work done to demonstrate that the model can be applied in practice. A company that develops safety-critical systems provided access to two projects as well as data over a wide range of past projects. The software development process as enacted was determined and we developed a number of probabilistic process models calibrated with generic data from the literature and from the company projects. The predictive power of the various models was compared.
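A minimal expected-value version of a barrier model can be sketched as follows. It assumes each phase first adds an expected number of new faults and then removes a fixed fraction of all faults present. The phase names and figures are hypothetical. The paper's models are probabilistic and calibrated against literature and company data, and this sketch does not attempt that.

```python
from dataclasses import dataclass


@dataclass
class Phase:
    name: str
    faults_created: float  # expected new faults introduced in this phase
    detection_eff: float   # fraction of faults present that this phase removes


def residual_at_release(phases: list[Phase]) -> float:
    """Propagate the expected fault count through each barrier in turn."""
    present = 0.0
    for p in phases:
        present += p.faults_created           # faults injected by the phase
        present *= (1.0 - p.detection_eff)    # faults that escape its barrier
    return present


# Hypothetical lifecycle figures, for illustration only.
lifecycle = [
    Phase("requirements", faults_created=10, detection_eff=0.5),
    Phase("design", faults_created=20, detection_eff=0.6),
    Phase("code", faults_created=40, detection_eff=0.7),
    Phase("test", faults_created=0, detection_eff=0.9),
]
residual = residual_at_release(lifecycle)
```

The resulting estimate (1.5 faults here) is the kind of prior N that can feed a conservative reliability bound or a fault-freeness argument.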
The Practicalities of Goal-Based Safety Regulation
"Goal-based regulation" does not specify the means of achieving compliance but sets goals that allow alternative ways of achieving compliance, e.g. "People shall be prevented from falling over the edge of the cliff". In "prescriptive regulation" the specific means of achieving compliance is mandated, e.g. "You shall install a 1 meter high rail at the edge of the cliff". There is an increasing tendency to adopt a goal-based approach to safety regulation, and there are good technical and commercial reasons for believing this approach is preferable to more prescriptive regulation. It is, however, important to address the practical problems associated with goal-based regulation in order for it to be applied effectively. This paper discusses the motivation for adopting a goal-based regulatory approach, and then illustrates the implementation by describing SW01, which forms part of the CAP 670 regulations for ground-based air traffic services (ATS). The potential barriers to the implementation of such standards are discussed, together with methods for addressing them.
Use of SOUP in safety related applications
The UK Health and Safety Executive (HSE) recently commissioned research from Adelard into how pre-existing software components may be safely used in safety-related programmable electronic systems in a way that complies with the IEC 61508 standard. Two reports resulted from this work and are now published on the HSE web site:
The first report summarises the evidence that is likely to be available in practice relating to a software component to assist in assessing the safety integrity of a safety function that depends on that component.
The second report considers how the available evidence can best be used within the framework of the IEC 61508 safety lifecycle to support an argument for the safety integrity achieved by a safety function.
The REVERE project: experiments with the application of probabilistic NLP to systems engineering
Despite natural language's well-documented shortcomings as a medium for precise technical description, its use in software-intensive systems engineering remains inescapable. This poses many problems for engineers who must derive problem understanding and synthesise precise solution descriptions from free text. This is true both for the largely unstructured textual descriptions from which system requirements are derived, and for more formal documents, such as standards, which impose requirements on system development processes. This paper describes experiments that we have carried out in the REVERE project to investigate the use of probabilistic natural language processing techniques to provide systems engineering support.
The Development of a Commercial 'Shrink-Wrapped Application' to Safety Integrity Level 2: The DUST-EXPERT™ Story
We report on some of the development issues of a commercial "shrink-wrapped application" - DUST-EXPERT™ - that is of particular interest to the safety and software engineering community. Amongst other things, the following are reported on and discussed: the use of formal methods; advisory systems as safety related systems; safety integrity levels and the general construction of DUST-EXPERT's safety case; statistical testing checked by an "oracle" derived from the formal specification; and our achieved productivity and error density.
Requirements for a Guide on the Development of Virtual Instruments
Adelard is producing a good-practice guide and training course on the development of virtual instruments as part of the DTI's Software Support for Metrology programme. This paper describes our requirements capture process and presents some of the principal issues that are emerging.
The Formal Development of a Windows Interface
This paper describes an approach to the use of the formal method VDM in the design and implementation of Microsoft Windows™ interfaces. This approach evolved during the development of Dust-Expert™, a Windows-based system for providing design advice on the prevention and control of dust explosions, developed for the Health and Safety Executive (HSE). The approach we have adopted is deliberately conservative: we have aimed to see how we can take guidance in the design of the system from the standard Vienna Development Method rather than inventing new language constructs or new proof obligations. One advantage of this is that we can continue to use the tools that are available for supporting the standard language.
A Methodology for Safety Case Development
A safety case is a requirement in many safety standards for computer systems, and it is important that an adequate safety case is produced. In regulated industries such as the nuclear industry, the need to demonstrate safety to a regulator can be a major commercial risk. This paper outlines a safety case methodology that seeks to minimise safety risks and commercial risks by constructing a demonstrable safety case. The safety case ideas presented here were initially developed in European and UK research programmes and have subsequently been applied in industry. To implement the safety case we advocate the integration of safety case development into the design process so that the costs and risks of the associated safety case can be included in the design trade-offs. We propose a layered structure for the safety case that allows the safety case to evolve over time and helps to establish the safety requirements at each level. For large projects with sub-contractors, this "top-down" safety case approach helps to identify the subsystem requirements, and the subsystem safety case can be made an explicit contractual requirement to be delivered by the sub-contractor.
Using Reversible Computing to Achieve Fail-safety
This paper describes a fail-safe design approach that can be used to achieve a high level of fail-safety with conventional computing equipment which may contain design flaws. The method is based on the well-established concept of "reversible computing". Conventional programs destroy information and hence cannot be reversed. However it is easy to define a virtual machine that preserves sufficient intermediate information to permit reversal. Any program implemented on this virtual machine is inherently reversible. The integrity of a calculation can therefore be checked by reversing back from the output values and checking for the equivalence of intermediate values and original input values. By using different machine instructions on the forward and reverse paths, errors in any single instruction execution can be revealed. Random corruptions in data values are also detected. An assessment of the performance of the reversible computer design for a simple reactor trip application indicates that it runs about ten times slower than a conventional software implementation and requires about 20 kilobytes of additional storage. The trials also show a fail-safe bias of better than 99.998% for random data corruptions, and it is argued that failures due to systematic flaws could achieve similar levels of fail-safe bias. Potential extensions and applications of the technique are discussed.
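The reversal check can be illustrated with a toy virtual machine whose only instructions are reversible register updates. The instruction set here is a hypothetical stand-in, not the paper's design. Each `add` is undone by a `sub` and vice versa, so the reverse path uses different instructions from the forward path. Because the whole program is a bijection, any change to the outputs reverses to inputs that differ from the originals. The sketch checks only the recovered inputs, not every intermediate value.

```python
# Each instruction is (op, dst, src) with dst != src, so no information is destroyed.
INVERSE = {"add": "sub", "sub": "add"}


def step(regs: list[int], op: str, dst: int, src: int) -> None:
    if dst == src:
        raise ValueError("dst and src must differ to keep the step reversible")
    if op == "add":
        regs[dst] += regs[src]
    elif op == "sub":
        regs[dst] -= regs[src]
    else:
        raise ValueError(f"unknown op {op!r}")


def run_forward(program, inputs):
    regs = list(inputs)
    for op, dst, src in program:
        step(regs, op, dst, src)
    return regs


def run_reverse(program, outputs):
    regs = list(outputs)
    for op, dst, src in reversed(program):
        step(regs, INVERSE[op], dst, src)  # inverse instruction on the way back
    return regs


def checked_run(program, inputs, corrupt=None):
    """Run forward, optionally inject a fault, then verify by reversing."""
    outputs = run_forward(program, inputs)
    if corrupt is not None:
        corrupt(outputs)
    ok = run_reverse(program, outputs) == list(inputs)
    return outputs, ok  # ok == False would drive the output to its safe state


def flip_bit(regs: list[int]) -> None:
    regs[1] ^= 4  # simulated random data corruption


program = [("add", 1, 0), ("sub", 2, 1), ("add", 0, 2)]
```

With inputs `[3, 5, 7]` the forward pass yields `[2, 8, -1]` and the reversal recovers the inputs. After `flip_bit` it recovers `[3, 9, 11]`, so the corruption is revealed.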
Viewpoints on Improving the Standards Making Process: Document Factory or Consensus Management?
Emerging standards and guidelines need to be timely and reflect the requirements of the industrial sector they are designed to support. However, often, the delay between the identification of a need for a standard and its eventual release is too long. There is a need for increased understanding of the sources of delay and deadlock within the standards process. In this paper we describe an application of PERE (Process Evaluation in Requirements Engineering) to the standards process. PERE provides an integrated process analysis that identifies improvement opportunities by considering process weaknesses and protections from both mechanistic and human factors viewpoints. The resulting analysis identified both classical resource allocation problems and also specific problems concerning the construction and management of consensus within a typical standards making body. A number of process improvement opportunities are identified that could be implemented to improve the standards process. We conclude that consensus problems are the real barrier to timely standards production. Ironically the present trend for more distributed working and electronic support (via email etc.) may make the document factory aspect of standards production more efficient at the expense of consensus building.
PERE: Evaluation and Improvement of Dependable Processes
In the development of systems that have to be dependable, weaknesses in the requirements engineering (RE) process are highly undesirable. Such weaknesses may either introduce undetected system weaknesses or incur significant costs when they are corrected later in the development process. Typically, the RE process contains a number of individual and group activities and is thus particularly subject to weaknesses arising from human factors. Our work has concerned the development of PERE (Process Evaluation in Requirements Engineering), a structured method for analysing processes for weaknesses and proposing process improvements against them. PERE combines two complementary viewpoints within its process evaluation approach. First, a classical engineering analysis is used for process modelling and generic process weakness identification. This initial analysis is fed into the second analysis phase, in which those process components that are primarily composed of human activity, their interconnections and organisational context are subject to a systematic human factors analysis. In this paper we briefly describe PERE and provide examples of the application experience to date.
A Conservative Theory for Long-Term Reliability Growth Prediction
While existing reliability growth theories employ a wide range of underlying models, the basic strategy is the same: to extrapolate future reliability from past failures. This approach works reasonably successfully over the short term but lacks predictive power over the long term (i.e. for usage times which are orders of magnitude greater than the current usage time). This paper describes a different approach to reliability growth modelling which should enable conservative long term predictions to be made. Using relatively standard assumptions, it is shown that the expected value of the failure rate after a usage time T has an upper bound of N/(eT), where N is the initial number of faults and e is the exponential constant. This is conservative since it places a worst case bound on the reliability rather than making a best estimate. It is shown that less pessimistic results can be obtained if additional assumptions are made about the distribution of failure rates over the N faults. We also show that the predictions might be relatively insensitive to assumption violations over the longer term. The theory offers the potential for making long term software reliability growth predictions based solely on prior estimates of the number of residual faults (e.g. using the program size and other software development metrics). Some empirical evaluations of the theory have been made using a range of industrial and experimental reliability data, and the results appear to agree with the predicted bound.
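The N/(eT) bound can be reached in a few lines. This assumes that each fault i fails as a Poisson process with a constant but unknown rate λ_i, and that it is removed at its first failure without introducing new faults. Fault i is then still present after usage time T with probability e^(-λ_i T), and each term is maximised at λ = 1/T:

```latex
\begin{align}
E[\lambda(T)] &= \sum_{i=1}^{N} \lambda_i e^{-\lambda_i T} \\
\frac{d}{d\lambda}\left(\lambda e^{-\lambda T}\right) &= e^{-\lambda T}(1 - \lambda T) = 0
  \quad\Rightarrow\quad \lambda = \frac{1}{T} \\
\lambda_i e^{-\lambda_i T} &\le \frac{1}{T}\, e^{-1} = \frac{1}{eT} \\
E[\lambda(T)] &\le \frac{N}{eT}
\end{align}
```

For example, with a hypothetical N = 10 faults and T = 1000 hours of usage, the bound is 10/(e × 1000) ≈ 3.7 × 10⁻³ failures per hour, whatever the individual λ_i are.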
Data Reification Without Explicit Abstraction Functions
Data reification in VDM normally involves the explicit positing of an abstraction function with certain properties. However, the condition for one definition to reify another only requires that a function with such properties should exist. This suggests that it may be possible to carry through a data reification without giving an explicit definition of the abstraction function at all. This paper explores this possibility and compares it with the more conventional approach.
The SHIP Safety Case
This paper presents a safety case approach to the justification of safety-related systems. It combines methods used for handling software design faults with approaches used for hazardous plant. The general structure of the safety argument is presented together with the underlying models for system failure that can be used as the basis for quantified reliability estimates. The approach is illustrated using plant and computer based examples.
The SHIP Safety Case - A Combination of System and Software Methods
Software Fault Tolerance by Design Diversity
N-version programming is vulnerable to common faults. It was thought that the primary source of common faults was ambiguities and omissions in the specification, but the Knight and Leveson experiment showed that failure independence of design faults cannot be assumed. This result is backed up by later experiments and by qualitative evidence from other experiments. In addition, there is an "error masking" mechanism that will cause failure dependency in almost all programs. This catalogue of problems may paint too gloomy a picture of the potential for N-version programming, because back-to-back testing can certainly help to eliminate design faults, and failure dependency only arises if a majority of versions are faulty. For small applications developed with good quality controls, the probability of having multiple design faults can be quite low, so N-version programming can be a useful safeguard against residual design faults.
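A 2-out-of-3 arrangement can be sketched as a majority voter over the version outputs. The sketch assumes outputs can be compared for exact equality; practical voters often need tolerances for numeric values. The second function gives the system failure probability 3p^2 - 2p^3 (at least two of three versions failing) that would hold only if the versions failed independently. That is precisely the assumption the Knight and Leveson experiment undermines, so the figure is an optimistic reference point, not a prediction.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def majority_vote(outputs: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value produced by a strict majority of versions, else None."""
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) // 2 else None


def p_2oo3_fail_independent(p: float) -> float:
    """P(at least 2 of 3 versions fail), assuming independent failures with probability p."""
    return 3 * p**2 - 2 * p**3


voted = majority_vote([42, 42, 41])          # one faulty version is outvoted
no_majority = majority_vote([1, 2, 3])       # disagreement is detected, not masked
p_sys = p_2oo3_fail_independent(1e-3)
```

A single faulty version is always outvoted. A wrong result can only get through when a majority of versions fail together on the same input, which is where failure dependency matters.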
Stepwise Development and Verification of a Boiler System Specification
In attempting to demonstrate the safety of the Generic Boiler System, two main problems are faced. First, there is a wide range of possible failures that can occur. For example, the physical devices themselves can fail, sensors can fail, and sensed values can be delayed or lost in transmission. Taking careful account of all possible failures is difficult. A second problem, common to all safety-critical systems, is that absolute safety cannot be shown. One can only hope to demonstrate partial or probable safety. However, estimates of the probability of safety are hard to calculate, and it is hard to know whether one can place much confidence in them. The approach demonstrated here addresses both of these issues. Our report has two parts. In Part I, the technique of step-wise elaboration of the boiler controller is demonstrated. In Part II, verification of safety and failure properties is shown for a boiler system model developed at a late step of elaboration.
The Variation of Software Survival Times for Different Operational Input Profiles
This paper provides experimental and theoretical evidence for the existence of contiguous failure regions in the program input space ("blob" defects). For real-time systems where successive input values tend to be similar, blob defects can have a major impact on the software survival time because the failure probability is not constant. For example, with a "random walk" input sequence, the probability of failure decreases as the time from the last failure increases. It is shown that the key factors affecting the survival time are the input "trajectory", the rate of change of the input values and the "surface area" of the defect (rather than its volume). It is shown that large defects can exhibit very long mean times to failure when the rate of change of input values is decreased.
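The last claim can be illustrated with a deliberately simple one-dimensional model: the input performs a random walk on [0, 1] and a "blob" defect occupies a contiguous interval. The defect bounds, start point and step sizes are hypothetical. A 1-D interval cannot capture the surface-area effect described for multi-dimensional input spaces. The sketch only compares mean survival times when the rate of change of the input (the step size) is reduced.

```python
import random


def survival_time(defect, start, step, max_steps, seed):
    """Steps until a 1-D random-walk input first lands inside the defect interval.

    The input is clamped to [0, 1]; max_steps is returned if no failure occurs
    (a censored observation).
    """
    rng = random.Random(seed)
    lo, hi = defect
    x = start
    for n in range(1, max_steps + 1):
        x = min(max(x + rng.choice((-step, step)), 0.0), 1.0)
        if lo <= x <= hi:
            return n
    return max_steps


def mean_survival(step, trials=100, defect=(0.6, 0.7), start=0.1, max_steps=200_000):
    """Average survival time over independent, reproducibly seeded walks."""
    times = [survival_time(defect, start, step, max_steps, seed) for seed in range(trials)]
    return sum(times) / trials


fast = mean_survival(step=0.05)  # input values change quickly
slow = mean_survival(step=0.01)  # input values change five times more slowly
```

For a random walk the time to cover a given distance grows roughly with the square of the number of steps needed. Slowing the input by a factor of five therefore lengthens the mean survival time far more than fivefold in this toy model.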