Optimisation of inspection policy for multi-line production systems

This paper develops a simulation model to determine the cost-optimum inspection policy for a multi-line production system taking account of simultaneous downtime. The machines in the multi-line system are subject to a two-stage failure process that is modelled using the delay-time concept. Our study indicates that: consecutive inspection of lines with priority for failure repair is cost-optimal, with a cost reduction of 61% compared to a ‘run-to-failure’ policy; and maintainers need to be responsive to operational requirements. Our ideas are developed in the context of a case study of a plant with three parallel lines, one of which is on cold-standby.


Introduction
Many studies highlight the importance of maintenance within the production context. In an early paper, Geraerds (1978) argued that The Netherlands spent 14% of GDP on maintenance activities, 34% of which was associated with expenditure for industrial plant. More recently, Komonen (2002) reported that in Finland maintenance cost is typically 5.5% of the turnover of a company, but could be as much as 25%. Generally, organisations have become increasingly aware that proper maintenance of their production facilities is a vital part of their everyday business (Cholasuke et al., 2004). Alsyouf (2009) observes that the maintenance function contributes a potential improvement of 14% in return on investment, and in a later work Alsyouf et al. (2016) show that good maintenance planning can reduce maintenance costs significantly. There is, therefore, considerable financial interest in optimising maintenance operations and thus reducing the effect of plant downtime by identifying and removing defects (faults) before they cause plant to fail.
However, studies show that little research is directed towards the realistic scenario of optimising maintenance for a system composed of several machines; the focus is instead on optimising a single machine without considering the production configuration (Van Horenbeek et al., 2013). This view is supported by the many review papers that address the optimisation of preventive maintenance (e.g. Wang, 2002; Nicolai and Dekker, 2008; Das and Sarmah, 2010; Ding and Kamaruddin, 2015; Olde Keizer et al., 2017). Here all models that have closed-form solutions relate to single-line production facilities. Furthermore, most, if not all, of these models are based on assumptions that simplify real-life situations and make them less practical. In practical situations, simplifying assumptions are undesirable but necessary to some extent for the convergence, as far as possible, of theory and practice. To relax or eliminate some assumptions from these models, one would require more detailed modelling, which may not provide a closed-form solution. Note that, by "closed-form" solution, we strictly mean that the optimisation criterion is expressed in closed form.
In this paper, we determine the cost-optimum inspection policy for a multi-line production system taking account of simultaneous downtime. In general, we suppose simultaneous downtime occurs when more than one line is down simultaneously, arising when lines are inspected concurrently or when failures of lines coincide with either inspections or failures of other lines. We study a two-out-of-three line system, so that simultaneous downtime occurs when at least two lines are down simultaneously, whence production ceases entirely. Our review of the literature above indicates that a closed-form solution to the optimisation problem we pose is not available.
Simulation has the potential to address the increasingly difficult and dynamic nature of optimisation problems in manufacturing in general (Villarreal-Marroquín et al., 2013) and maintenance in particular. Alrabghi and Tiwari (2015) surveyed the literature and reported the state-of-the-art in simulation-based optimisation of preventive maintenance research, with 59 research articles since the year 2000. Discrete event simulation is the most reported technique for modelling maintenance systems. Specialised simulation software provides several advantages over general-purpose programs such as rapid modelling, animation, automatically collected performance measures and statistical analysis (Banks et al., 2010). Boschian et al. (2009) discuss the difficulty of obtaining closed-form solutions to maintenance optimisation problems for a 'small size' production system (two machines working in parallel) that is prone to random failures and undergoes preventive and/or corrective maintenance. 'To get round this complexity' they also chose an approach based on simulation.
For single-line systems, a number of models that aim to optimise an inspection interval have been proposed and tested. Early models include those due to, for example, Abdel-Hameed (1995). More recently, authors have integrated production quality into the inspection problem (e.g. Lu et al., 2016) and considered preventive maintenance planning in job shop scheduling (e.g. Thörnblad et al., 2015). In our paper we use the delay-time concept, first applied to the maintenance of industrial equipment by Christer and Waller (1984), and later developed further by many (e.g. Van Oosterom et al., 2014; Flage, 2014; Chellappachetty and Raju, 2015; Jiang, 2017). Related case studies include those due to Jones et al. (2010) and Zhao et al. (2015). The delay-time concept has the advantage that it explicitly models the relationship between plant failures and the inspection interval. The latest review of the recent advances in delay-time-based maintenance modelling including industrial applications is Wang (2012).
Based on the literature review undertaken, we make the following contribution in this paper: we model for the first time the inspection of multi-line production facilities using the delay-time concept. In so doing, we attempt to bring the theory of delay-time modelling closer to practice by solving a realistic industrial problem for which a closed-form solution is not available. This absence of a closed-form solution is a consequence of simultaneous downtime in the multi-line parallel system that we study. While we study a two-out-of-three line system, the solution methodology can be generalised to a k-out-of-n line system. The importance of our work lies in its implications for the design of preventive maintenance for multi-line production systems and the contribution that good maintenance can make to economic performance.
Our paper is structured as follows. Section 2 describes the delay-time concept and our modelling methodology, and introduces models of multi-line production systems with a focus upon how downtime affects production. In section 3, a case study is described; the focus there is on the development of our simulation models and the analysis of several alternative policy scenarios, beginning with a single-line packing facility and proceeding to several model extensions for multi-line production systems. In the final section, detailed conclusions are drawn and their implications are discussed.

Delay-time model development
The delay-time model describes the evolution of defects in industrial equipment in two separate, but linked stages (Figure 1). The first stage is the time lapse from new (or as new) until a defect (fault) arises. This is the time-to-defect arrival, U. Equivalently, it is the sojourn in the good state. The second stage is the time lapse from defect arrival to the point at which this defect causes the equipment to fail. This is the delay-time, H. Equivalently, it is the sojourn in the defective state. This second stage is the window of opportunity for inspection to identify and repair defects before they can cause a failure of operational function (Figure 2), and more frequent inspection implies fewer failures per unit time. Thus the delay-time concept (Christer, 1999) supposes: (i) that a defective state precedes failure; and (ii) that the defective state is identifiable by inspection. The 'change point' from the good state to the defective state occurs at a random time, failure occurs some random time later, and the time of transition from the good to defective state is unobservable. Nonetheless, using failure times and counting instances of defects found at inspection, the distributions of time of defect arrival and delay-time may be estimated (Baker and Wang, 1991).
In our study, we assume:
• A complex-system delay-time model (Wang, 2008), illustrated in Figure 3, with the associated notation given in section 2.1. Here, multiple concurrent defects are possible;
• Failures are repaired immediately but not instantaneously;
• Failure repair takes $d_f$ time units at a cost of $c_m$ per unit time;
• Inspections are carried out every $T$ time units;
• Each inspection takes $d_i$ time units again at a cost of $c_m$ per unit time, where $d_i < d_f < T$;
• All defects identified at inspection are repaired during the inspection, I;
• During the failure stoppage F, the system is returned to the operational state, but any defects present are not removed;
• During inspection and failure stoppage (repair), plant components are assumed to be in a state of suspension, so that the system is then not ageing and defects are not 'growing', and thus defects and failures can only arise when plant is operating;
• The system has operated sufficiently long to be in a steady state condition.
These assumptions represent an inspection problem in the class reported by Christer (1999), who, under these assumptions, gives the expected number of failure breakdowns over the interval $(0, T)$, $E[N_f(T)]$, and the expected downtime per unit time, $D(T)$. Provided that $F_U(u)$ and $F_H(h)$ can be estimated, either from data or subjective expert opinion or both, $D(T)$ and, equivalently, the expected cost per unit time, $C(T)$, can be calculated, and then the $T$ that minimises $D(T)$ or $C(T)$ can be determined. It is this optimisation step that links the inspection frequency to the defect arrival and failure rates, and the cost and downtime parameters.
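To illustrate how such criteria are typically constructed, the following is a sketch of the form commonly used in the delay-time literature; it is not a transcription of the expressions in Christer (1999). It assumes that defects arrive as a homogeneous Poisson process with rate $\lambda$ in operating time, that inspection is perfect, and that $T$ is measured in operating time because ageing is suspended during stoppages. The symbols $\lambda$, $F_H$, $d_f$, $d_i$, $N_f$ and $D$ are notation assumed for this sketch.

```latex
% A defect arriving at operating time u in (0, T) causes a failure before the
% next inspection with probability F_H(T - u).
\begin{align}
E[N_f(T)] &= \lambda \int_0^T F_H(T-u)\,\mathrm{d}u
           = \lambda \int_0^T F_H(h)\,\mathrm{d}h, \\
% Expected stoppage time per cycle divided by expected cycle length
% (operating time T plus the stoppages incurred in the cycle).
D(T) &= \frac{d_f\,E[N_f(T)] + d_i}{T + d_f\,E[N_f(T)] + d_i}.
\end{align}
```

Under these assumptions the inspection term $d_i$ dominates $D(T)$ when $T$ is small, and the failure term dominates when $T$ is large, which is why an interior optimum can exist.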
In many practical industrial situations, like ours with multiple lines, it is difficult to apply these models. For example, for a production system consisting of a two-out-of-three set-up with an inventory buffer (storage) facility, the mathematical analysis is very difficult. Thus we consider a different approach.

Modelling multi-line production systems
The main objective of our research is to determine the optimum inspection policy for a multi-line production system. To do so, we further assume that:
• if production stops then the downtime accrued costs $c_d$ per unit time, with $c_d \gg c_m$, where this downtime is defined as the duration of a stoppage to the downstream and/or the upstream processes (Figure 4), occurring only when the individual stoppages coincide (period of length z in Figure 4).
In this way, production downtime accrues when the lines are down simultaneously, and it is precisely this simultaneous downtime in a multi-line system that makes it hard to obtain closed-form expressions for decision criteria in the delay-time model.

Figure 4 Plant downtime in a simple multi-line production system, indicating downtime for L1 of duration x, downtime of L2 that is concurrent with L1 of duration y, and complete system downtime of duration z.
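The notion of simultaneous downtime can be made concrete with a short sketch. Given the stoppage intervals of each line, it sweeps through the start and end events and accumulates the time during which at least k lines are down together. The function name, the interval representation and the threshold argument `k` are assumptions made for illustration, and the sketch ignores any inventory buffer.

```python
def simultaneous_downtime(stoppages, k=2):
    """Total time during which at least k lines are down at once.

    stoppages: one list per line of (start, end) stoppage intervals,
    assumed non-overlapping within a line. Hypothetical helper.
    """
    events = []
    for line in stoppages:
        for start, end in line:
            events.append((start, 1))   # a line goes down
            events.append((end, -1))    # a line comes back up
    events.sort()  # at equal times, ends (-1) sort before starts (+1)

    total = 0.0
    lines_down = 0
    previous_time = None
    for time, change in events:
        # Accumulate the elapsed span if k or more lines were down during it.
        if previous_time is not None and lines_down >= k:
            total += time - previous_time
        lines_down += change
        previous_time = time
    return total


# L1 down on (0, 10) and L2 down on (4, 12): the lines overlap on (4, 10).
line_1 = [(0.0, 10.0)]
line_2 = [(4.0, 12.0)]
print(simultaneous_downtime([line_1, line_2]))
```

With two lines and `k=2` the result corresponds to the period of length z in Figure 4; for a two-out-of-three system the same call with three lists gives the time with at least two lines down.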
In other situations, upstream and downstream downtime may have different consequences and upstream and/or downstream inventory buffers may exist. For example, in a production system with a two-out-of-three line set-up (see e.g. Smith and Dekker, 1997, or De Smidt-Destombes et al., 2007, for a general discussion of k-out-of-n systems) and one line used as standby (Figure 5), the definition of downtime should depend on the way the management operates the facility.
There are two principal ways in which inspection can be performed for the system in Figure 5, namely, simultaneous (concurrent) inspection of all parallel lines, or consecutive inspection, inspecting each in sequence. If inspection is performed simultaneously, assuming that the required resources (spares, personnel) are available, then the inspection time itself is downtime (similar to a single-line scenario), and the long run cost per unit time (cost-rate) for the realisation shown in Figure 6 [...] since [...]. In practice, it may be possible to reduce the cost of downtime further by modifying the policy so that if a failure occurs while another line is being inspected, the inspection is suspended until the failed line becomes operational. Then for the realisation shown in Figure 6(c) for example the cost-rate is: [...]
Akbarov et al. (2008) studied a chocolate cake manufacturing plant with production downtime issues on its packing lines. They determined a closed-form expression for the cost-rate to optimise the inspection interval for a single-line packing system. In practice, the 'existence' of a defect may be identifiable by some operational "signal", such as excess heat or vibration, and in this particular case study, a defect was observable as the presence of significant chocolate contamination on the production line. This was the direct cause of several major failure modes on the packing lines, so that regular inspection (and removal of chocolate contamination if required) was considered to be of value. Figure 5 is a schematic representation of the real production process at this plant, in which the upstream process bakes cakes and the two-out-of-three system packs them. A stoppage of the upstream baking process is considered as downtime (as it leads to lost revenue) whilst a stoppage of one of the packing lines is seen as lost time (as the packing process can still continue). Under normal production conditions, baked cakes are packed on lines 1 and 3, the inventory buffer is empty, and line 2 is on cold-standby. If there is a stoppage to either line 1 or 3, line 2 (the standby) is started and the cakes are routed through the inventory buffer to this line. The inventory buffer storage area is designed to provide sufficient capacity to start line 2 without having to stop upstream production. When normal production is resumed after a stoppage, there is sufficient capacity in lines 1 and 3 to empty the inventory buffer storage. Production downtime accrues from the instant the inventory buffer is full and two lines are down.
Although the real system has three parallel lines, to obtain a closed-form solution to the delay-time model, Akbarov et al. (2008) consider this system as a single-line packing facility. We are interested in the implications of this limiting assumption, both for maintenance management of the plant and for modelling of the system. We began by simulating the single line proposed by Akbarov et al. (2008) (section 3.3) as a complex-system delay-time model of multiple components. In doing so, we ensured that our base model was validated against known results, as did Boschian et al. (2009) for their case study, and not simply based on an arbitrary situation. With this impetus we simulated the real practical situation for which a closed-form solution is not available (section 3.4). This scenario, which is precisely the system operated by the company's management, we call a modified two-out-of-three parallel line system. Thirdly, a further simulation model was developed for a standard two-out-of-three parallel packing system, in which any two lines are operational at any one time. Although the company did not operate the packing facility in this way, the development of such a model was useful for comparison purposes (section 3.5). Finally, the scenario in which all three parallel packing lines are operated concurrently was also simulated (section 3.6).
The data for our base model were taken from Akbarov et al. (2008). Defect arrivals were described by the exponential distribution $F_U(u) = 1 - \exp(-\lambda u)$ with rate $\lambda = 3$ per day; delay-times were described by the Weibull distribution $F_H(h) = 1 - \exp(-(h/\eta)^{\beta})$ with $\beta = 6.27$ and $\eta = 0.193$ days, implying a mean delay-time of 4.3 hours and a standard deviation of 0.8 hours. Many previous studies have proposed, in detail, ways to select and estimate these parameters in practice (see, for example, Wang, 2008). The duration of a stoppage of a line due to failure was $d_f = 10$ minutes, and due to inspection was $d_i = 2$ minutes. Both defect removal and failure repair in practice corresponded to removal of chocolate contamination, the former being carried out preventively, the latter correctively. The cost-rates of "inspection" and "repair" were thus assumed equal, and assigned as $c_m$ = £30 per hour. The production downtime cost-rate was $c_d$ = £1,000 per hour, based on the value of product output per unit time. The cost of a single failure event is then $c_m d_f$ plus the cost of production downtime (if any) resulting from the failure.
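The quoted delay-time moments can be checked directly from the Weibull parameters, using the standard results that the mean is the scale times Γ(1 + 1/shape) and the variance is the scale squared times Γ(1 + 2/shape) − Γ(1 + 1/shape)². The short sketch below does this, and also evaluates the repair cost of a single failure and the failure stoppage time per operating day when no inspection is carried out. The constant and function names are labels assumed for illustration, and 6.27 is taken to be the shape and 0.193 days the scale.

```python
import math

# Values quoted in the text; the names are labels assumed for this sketch.
SHAPE = 6.27           # Weibull shape parameter (dimensionless)
SCALE_DAYS = 0.193     # Weibull scale parameter, in days
DEFECT_RATE = 3.0      # defect arrivals per day
D_F_MINUTES = 10.0     # stoppage duration per failure, in minutes
C_M_PER_HOUR = 30.0    # inspection/repair cost-rate, in GBP per hour


def weibull_mean_sd(shape, scale):
    """Mean and standard deviation of a Weibull(shape, scale) variable."""
    g1 = math.gamma(1.0 + 1.0 / shape)
    g2 = math.gamma(1.0 + 2.0 / shape)
    return scale * g1, scale * math.sqrt(g2 - g1 * g1)


mean_days, sd_days = weibull_mean_sd(SHAPE, SCALE_DAYS)
mean_hours, sd_hours = 24.0 * mean_days, 24.0 * sd_days

# Repair cost of one failure, excluding any production downtime cost.
repair_cost_per_failure = C_M_PER_HOUR * D_F_MINUTES / 60.0

# With no inspection every defect eventually becomes a failure.
failure_minutes_per_day = DEFECT_RATE * D_F_MINUTES

print(f"mean delay-time: {mean_hours:.1f} h, sd: {sd_hours:.1f} h")
print(f"repair cost per failure: GBP {repair_cost_per_failure:.2f}")
print(f"failure stoppage with no inspection: {failure_minutes_per_day:.0f} min/day")
```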

Simulation modelling
ProModel (ProModel, 2016), a process-based discrete-event simulation language (see e.g. Harrell et al., 2011), was used for developing the base model and the various model extensions. The models were developed in three stages.

Stage 1: Overall model-framework construction with minimum system requirements
The development of any model using this programming environment requires, at least, the use of the paradigm 'LEAP': Locations, Entities, Arrivals, and Processing. 'Locations', which may be single or multiple capacity, are generally fixed positions in the system where Entities wait to be processed, such as machines, queues, or storage areas (buffers). 'Entities' are the objects that enter into, flow through and depart from the system as complete objects, such as parts, or even defects and failures. 'Arrivals' describe the precise pattern (timing, quantity, frequency and location) of Entities (defects, failures) entering into the system. Finally, 'Processing' defines the exact route that an Entity follows, from entering into, to leaving the system. This includes any activity that happens at a Location, such as the required operations that need to be performed, the amount of time an Entity spends at a Location, and the Resources it needs to complete Processing. Although the simplest model in this environment needs only 'LEAP' to be described, any further sophistication will almost certainly require the use of other 'modules' and/or development of special programming routines.

Stage 2: Detailed programming of the maintenance strategy
The arrival times of defects (faults) and their evolution into failures over the delay-time period are generated and scheduled based on their respective distribution functions. The maintenance strategies are programmed by scheduling inspections or failure occurrences, whichever occurs first, at which time the production of line $L_j$ is interrupted by the downtime process, which terminates after $d_i$ or $d_f$ time units respectively. All the relevant costs, system variables and attributes are continuously updated to determine the expected cost per unit time.
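The scheduling logic described above can be illustrated, for a single line only, with a minimal Python sketch. This is not the ProModel code used in the study. It assumes that inspection is perfect, that inspections fall every T units of operating time (so that the suspension of ageing during stoppages is handled by working on an operating-time clock), and that each defect leads to at most one failure. The function name, argument names and default values (taken from the base-model data) are assumptions for illustration.

```python
import random


def simulate_single_line(T, horizon=1000.0, lam=3.0, shape=6.27, scale=0.193,
                         d_i=2.0 / 1440.0, d_f=10.0 / 1440.0, seed=1):
    """Single-line delay-time simulation on an operating-time clock (in days).

    Returns (downtime in minutes per calendar day, failures per calendar day).
    Pass T = float("inf") to represent a run-to-failure policy.
    """
    rng = random.Random(seed)
    n_inspections = int(horizon // T)
    n_failures = 0

    u = rng.expovariate(lam)                   # operating time of first defect
    while u < horizon:
        h = rng.weibullvariate(scale, shape)   # delay-time of this defect
        next_inspection = (u // T + 1) * T     # first inspection after arrival
        # The defect fails only if its delay-time elapses before it is found.
        if u + h < min(next_inspection, horizon):
            n_failures += 1
        u += rng.expovariate(lam)              # next defect arrival

    downtime = n_inspections * d_i + n_failures * d_f
    calendar_time = horizon + downtime         # operating time plus stoppages
    return 1440.0 * downtime / calendar_time, n_failures / calendar_time


for hours in (2, 4, 6, 8):
    minutes, failures = simulate_single_line(hours / 24.0)
    print(f"T = {hours} h: {minutes:.1f} min/day down, {failures:.2f} failures/day")
print("run-to-failure:", simulate_single_line(float("inf")))
```

Because the sketch omits the multi-line interactions, the inventory buffer and any overlap between inspection and failure stoppages, its output should be read as indicative of the shape of the downtime curve rather than as a reproduction of the reported results.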

Stage 3: Development of model scenarios, input data, output analysis, and optimisation
The developed simulation models are non-terminating and the unit of time is days. Macros were set up so that input data, such as $d_i$, $d_f$, $c_m$, $c_d$, $F_H(h)$, $F_U(u)$, $\lambda$, $T$, simulation time, warm-up period and number of replications, could be changed instantly. The continuous on-screen data for each model replication include inspection duration, failure duration, downtime duration, total expected cost per unit time, number of defects present, number of defects removed, number of failure occurrences, and number of inspections carried out. The simulation output report includes various data and graphs, including the total expected cost per unit time and the total expected downtime per unit time. The models were 'warmed up' to a steady state before experimentation could begin, with a suitable warm-up period determined using Welch's graphical procedure (see e.g. Banks et al., 2010). Each experiment was run for 1,000 days, and results from the first 10 days (the warm-up period) were excluded to eliminate the transient component of the results. Each experiment was replicated 30 times to achieve sufficiently narrow 95% confidence intervals in the output data. The models were run through various simulation scenarios with different values of the inspection interval $T$. Finally, the optimum value of $T$ was determined.
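The output analysis described above (discard the warm-up period, average each replication, then form a 95% confidence interval across replications) can be sketched as follows. This illustrates the general procedure and is not the ProModel output report; the function names are hypothetical, and the Student-t critical value 2.045 applies only to 30 replications (29 degrees of freedom).

```python
import math
import statistics

T_CRITICAL_DF29 = 2.045  # two-sided 95% Student-t value, 29 degrees of freedom


def steady_state_mean(daily_values, warmup_days=10):
    """Mean of one replication's daily output after dropping the warm-up."""
    return statistics.mean(daily_values[warmup_days:])


def mean_and_ci95(replication_means):
    """Point estimate and 95% half-width across exactly 30 replications."""
    n = len(replication_means)
    if n != 30:
        raise ValueError("the hard-coded critical value assumes 30 replications")
    centre = statistics.mean(replication_means)
    half_width = T_CRITICAL_DF29 * statistics.stdev(replication_means) / math.sqrt(n)
    return centre, half_width


# Illustrative numbers only: 30 made-up replication means of a cost-rate.
example = [10.0] * 15 + [12.0] * 15
centre, half_width = mean_and_ci95(example)
print(f"{centre:.2f} +/- {half_width:.2f}")
```

If the half-width is too wide to separate two candidate inspection intervals, the usual remedies are more replications or longer runs.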
Figure 7 shows the flow chart of the base model, developed for the first simulation representing a single-line packing facility. The graphical representation refers to eight different processing routings (modelling routines) which were developed for different aspects of the model conceptualisation. Table 1 displays sample ProModel code written for the failure occurrence routine (see the flowchart), which handles the time between a defect arising and the subsequent failure. Further details regarding the precise structure and programming content of our simulation models can be made available, upon request, from the corresponding author.

Base model (validation)
Figure 8(a) compares the results obtained from our simulation model with those of the closed-form solution in the Akbarov et al. (2008) study. It shows downtime per day, in minutes, against inspection interval, in hours. The results are clearly very close. Akbarov et al. (2008) recommended the same optimal inspection interval of 4 hours, with an expected production downtime of 12.3 minutes per day against a simulated downtime of 13.2 minutes per day. Two factors may have contributed towards the difference of 6.7% between the results of the two approaches: i) the suspension of ageing of plant components during inspection and failure; ii) the possible overlap time of inspection and failure processes, both of which were included in the development of our simulation model and ignored in the previous study. With the defect arrival rate of 3 per day and the duration of a stoppage of a line due to failure of 10 minutes, there will be an expected downtime of 30 minutes/day when no inspection is carried out. The results suggest that regular inspection can reduce production downtime by 57%, with the number of failures per day reduced from 3 to almost zero (Figure 8(b)). The fact that the optimum inspection interval corresponds closely with the mean delay-time is not surprising given the relatively small delay-time standard deviation (since $\beta = 6.27$).

Modified two-out-of-three parallel system
If a policy of simultaneous inspection were to be followed for maintaining the modified two-out-of-three parallel line facility, then the simulation results shown in Figure 9(a) suggest that there is no optimal inspection interval; run-to-failure is then optimal. Here essentially the cost of lost production due to the stoppage of the upstream process during simultaneous inspection outweighs the cost of stoppages due to failure. In contrast, under a consecutive-inspection policy, there is no planned downtime. There are, however, occasions when downtime may occur: i) at least one failure and one inspection process occurring concurrently; ii) two or more simultaneous failures. Furthermore, under a consecutive-inspection policy prioritising failure-repair, if a failure occurs while the inspection of another line is taking place, the inspection operation is stopped and then restarted once the failed line becomes operational.
The simulation results for these consecutive policies are also shown in Figure 9(a), but are shown again in Figure 9(b) to resolve the cost-rates for the policies of interest. Figure 9(b) suggests the optimal inspection interval is every 4 hours for consecutive inspection and 5 hours if failure repair is prioritised. The advantage of following the latter policy is less frequent inspections and a cost-rate reduction of 8.3%. In Figure 9(a), we can see that as $T$ increases the cost-rates converge fairly quickly (as expected since the delay-time variance is small). Finally, the cost-rate reduction for the best policy relative to a run-to-failure policy is of the order of 60%.

Standard two-out-of-three parallel system
A standard two-out-of-three parallel line configuration was also investigated in order to compare results with the modified two-out-of-three parallel line system discussed above. For such a system, any two parallel lines would be operational at any one time, so that all packing lines would be equally utilised in the long run. A failed line would be repaired and ready to use at the next line failure. For the consecutive-inspection policy, the cost-rate appears to be equal to or higher than that for the modified parallel system (Figure 10(a)). Similarly, for the consecutive-inspection-prioritising-failure-repair policy, the cost-rate appears to be equal to or slightly higher than that for the modified parallel system. This is because all three lines are utilised more uniformly, which causes more simultaneous failures. For this system, the optimal intervals remain the same, at 4 and 5 hours, for the consecutive-inspection policy and the consecutive-inspection-prioritising-failure-repair policy, respectively. However, there will be 1.6% and 0.6% increases in the cost-rate for these policies when compared to the modified two-out-of-three mode of operation.

Three-parallel lines system
The final modelling scenario considered the system with three parallel lines. Although there cannot be any direct comparison between this and the previous two systems, looking at the related results alongside the two-out-of-three parallel systems is useful in case production needs to be increased. For the three-line system, downtime will necessarily be greater than under both two-out-of-three systems (Figure 10(b)) because there is more chance of at least two failures occurring simultaneously. However, production output will also be higher. As discussed before, the most sensible policy applicable in practice will be consecutive inspection prioritising failure repair. Figure 10(c) compares the costs for this policy under all three systems. The downtime-rate is higher if all three lines are operated at the same time.

Sensitivity analysis
We investigated the sensitivity of the consecutive-inspection-prioritising-failure-repair policy for the principal mode of operation of interest (modified two-out-of-three system) to parameter values. Figure 11(a) shows the sensitivity to inspection duration, $d_i$. The behaviour is as expected here, with the cost-rate of the optimum policy for $0.5d_i$ and $2d_i$ at respectively 54% and 173% of the baseline. Sensitivity to variation in the failure stoppage duration (Figure 11(b)) shows a somewhat different pattern (optimum cost-rate is 84% and 110% of the original cost for $0.5d_f$ and $2d_f$, respectively). Varying $d_f$ has the greatest effect when inspection is infrequent; varying $d_i$ has the greatest effect when inspection is frequent, again as we would expect, since failure stoppage duration dominates when inspection is infrequent, and downtime due to inspection duration dominates when inspection is frequent. Sensitivity to the rate of arrival of defects, Figure 11(c), shows the anticipated effects, although the doubling of the failure rate is not sufficient to increase the optimum inspection frequency.

Conclusions
Almost all previous delay-time inspection models in the literature are concerned with single-line single-component systems or series systems with multiple components, with restrictions. This paper uses simulation to determine optimal inspection policy for a number of multi-line production facility scenarios using the delay-time concept. In the first scenario, a single-line facility is simulated to validate an earlier closed-form solution.
In the second, a modified two-out-of-three parallel system is analysed to help address the issue of plant downtime under the actual operating conditions in the case study. Two further model extensions are developed and analysed to consider whether modifications to either the operation of the system or the design of the system in the case study would be worthwhile. The latter three models extend the study by Akbarov et al. (2008), in which the multi-line production facility is modelled as if it is a single line. Indeed, in their survey, Alrabghi and Tiwari (2015) found that studies that dominate the literature, such as cases of single machines producing single products, are oversimplified and do not reflect the interactions of real systems in practice.
We find that: 1) our simulation of the single-line system reproduces earlier results (see e.g. Boylan, 2016, for a discussion of reproducibility); 2) consecutive inspection with prioritised failure repair lowers the cost-rate (by 8.3%) and reduces the frequency of inspections (by 20%) compared to consecutive inspection; 3) the standard two-out-of-three design configuration raises the cost-rate by 1.6% and 0.6% for the consecutive-inspection and consecutive-inspection-prioritising-failure-repair policies, respectively, compared to the modified two-out-of-three configuration operated by the management; and 4) not surprisingly, the three parallel-line design configuration increases the frequency of inspections (by 25%) and increases the cost-rate (by 5.2%) for the consecutive-inspection-prioritising-failure-repair policy. This is clearly due to the third line being used permanently and not as a standby, which would naturally increase the number of inspections and the possibility of further failure occurrences. However, it should be noted that the production throughput would increase as well, increasing revenue.
The solution proposed in this paper may seem rather 'obvious' as it recommends the consecutive inspection policy with priority given to failure repair for the maintenance management of multi-line production systems. However, the implications for this case study are substantial as the policy proposition suggests a cost reduction of 61.3% compared to the 'run-to-failure' policy. Furthermore, we contend that the scenarios and policies we study have economic and engineering implications for the management of production lines and maintenance planning and execution therein. There is also scope to extend the simulation to analyse the wider implications of maintenance planning (Ding et al., 2014; Zahedi-Hosseini et al., 2017).
The simulation model for this paper has been specifically developed to address the optimisation of the inspection interval for a very specific two-out-of-three parallel production system with an inventory buffer. It takes 10 minutes to simulate 30,000 system-days on a standard desktop PC, and the model is easily scalable.

Figure 2 The effect of inspection on failure development: (a) no inspection; (b) infrequent inspection; (c) frequent inspection.

Figure 3 Defect arrivals (○), failures (•), failure repair F, and inspection I in our complex-system delay-time model of multiple components.

Figure 5 A multi-line production system with a two-out-of-three line set-up and inventory buffer.

Figure 6 Policy schematic for two-out-of-three line system: (a) simultaneous inspection; (b) consecutive inspection; (c) consecutive inspection prioritising failure repair.

Figure 7 The base model for the single-line packing facility showing eight programming algorithms.

Table 1 Sample ProModel code for the failure occurrence routine.