Risk Assessment of Railway Switch and Crossing Failures: Case Study of an Urban Rail Transit in Thailand

Railways are becoming the main transportation mode in Thailand. Thus, maintaining the optimal service level to passengers is one of the significant issues that need to be addressed. This makes the way to manage railway assets extremely important, especially with the essential assets on a railway network such as switches and crossings (S&C). This study presents how the risk of switch and crossing failures could be assessed and managed. The process of the risk management is based on ISO 31000, and the method used to analyse the risk is the Failure Modes, Effects and Criticality Analysis (FMECA). In detail, the switches and crossings on the Airport Rail Link city line (ARL), Thailand, are considered as a case study. The interview data from six experts, engineers and technicians who are responsible for the maintenance of S&C were used to determine the risk of each S&C component. The results show that the point machines, check and wing rails are the most critical components of S&C in the case study, and the least critical components are heel blocks, slide chairs and closure rails. Based on these findings, it is suggested that the current S&C maintenance program of the ARL may need to be improved. The priority of the inspection and maintenance activities of each S&C component could be adjusted according to the risk evaluation results presented in this study.


Introduction
Railways have a major influence on the social development and economic growth of a country. The benefits of railway systems are not limited to travel time savings; they also help reduce air pollution, the number of road accidents and social inequality. These are the reasons why many railway construction projects are currently under way in Thailand. When these projects are completed, Thailand will become the centre of the railway network in Southeast Asia [1].
Although a substantial railway network has many advantages, railway assets such as trains, tracks and signalling systems must be maintained and managed properly in order to obtain the optimal service level. If one of these assets fails, it could reduce system performance, for example through delays and cancellations that affect a wide area [2]. Therefore, an effective asset management strategy is required for the Thai railway network. Switches and crossings are among the essential assets of a railway network [3]. These assets are normally located at the conflict points in a network because their main function is to change the direction of trains from one track to another. Hence, to achieve the maximum safety and service reliability level, these assets should always be in good condition, and their inspection and maintenance strategies need to be planned efficiently.
This study aims to manage the risk of switch and crossing failures. The risk management process is based on ISO 31000, and the Failure Modes, Effects and Criticality Analysis (FMECA) method is applied. Switches and crossings on the Airport Rail Link city line (ARL), Thailand, are considered as a case study. This enables the current maintenance program of the ARL to be evaluated, and the results obtained can also be used to create a proper preventive maintenance strategy for switches and crossings on the ARL.
The paper is organised into six sections. Section 2 explains the functions of railway switch and crossing components. Section 3 summarises the risk management process based on ISO 31000. Sections 4 and 5 describe the methodology and the application. Finally, Section 6 concludes the paper.

Railway Switches and Crossings
Switches and crossings (S&C), sometimes called turnouts, are located at junctions where two or more railway tracks diverge or converge. The primary function of S&C is to enable trains to change routes [4]. Thus, their condition is directly related to the performance of railway operations. The twelve main components of S&C considered in this study are depicted in Figure 1. Point machines are devices used to operate S&C remotely. Normally, they are driven by an electric, hydraulic or pneumatic system and have three main tasks: moving, locking and ensuring that switch rails are in the correct and safe position for trains to travel across a junction. Signal/electrical wires connect control systems to point machines. Their primary function is to transfer the signal or electrical power that drives a point machine. The stretcher bar is directly linked between a point machine and the switch rails. It not only transfers the pulling or pushing lateral force from the point machine to the switch rails, but also keeps the coupled pair of switch rails at the designed distance apart. Consequently, S&C can have more than one stretcher bar, depending on the length of the S&C. Switch rails always come as a pair of tapering rails. They are moveable and are used to guide the train wheels in the desired direction (i.e., straight or diverging). Plates or slide chairs support the switch rails. Like fastening devices, they are fixed to sleepers, but they still allow the switch rails to move over a limited distance. Stock rails are standard running rails fixed on the sleepers. They are normally bent and installed alongside switch rails in order to provide lateral support when switch rails are in the closed position. Closure rails are the static parts of S&C that continue from the switch rails to the crossing nose. Heel blocks provide a pivot point for switch rails to move laterally. In addition, they maintain the distance between switch rails, closure rails and stock rails so that trains can run safely over the switches. Check rails or wing rails are the short sections of rail installed parallel to stock rails at the crossing nose. Their main function is to ensure that the train wheels stay on the running rails when passing through the crossing nose. The crossing nose is the crossing of two rails at the back of S&C. Generally, the crossing is V-shaped and is considered a static component. However, for high-speed lines, the crossing can be moveable in order to reduce the impact force generated by the wheels. This is sometimes called a swing-nose crossing. Fastenings or fixings are devices used to fix rails to sleepers. Finally, sleepers or bearers help maintain track gauge and spread the load from the rails to foundation structures such as ballast, subgrade and natural ground.

Risk Management Process
Nowadays, ISO 55001: Asset management is widely used in the railway industry to manage the assets of rail organisations. However, in terms of risk management, ISO 55001 explicitly refers to the ISO standard on risk management (ISO 31000) [5]. This international standard provides principles and guidelines for managing risk. In addition, it is very flexible and can be applied to different types of risk in any organisation [6]. The framework of the risk management process in ISO 31000 can be seen in Figure 2 and is summarised as follows [7].

Scope, Context and Criteria
The first step of the process is to establish the context of the risk in question. This step involves defining the objectives of the risk management activities, considering external (e.g., markets and stakeholders) and internal factors (e.g., regulations and organisation's culture) and specifying the evaluation criteria that will be used to decide whether the risk should be taken into account.

Risk Assessment
After setting the scope of the risk management, the next step is to assess the risk. This step includes three main sub-steps: risk identification, risk analysis and risk evaluation. These sub-steps can be briefly described as follows:
• Risk identification is the process of finding the events, scenarios or changes that might occur from any cause and affect the risk management objectives.
• Risk analysis is the process of estimating risk based on the scenarios found in the risk identification process. The estimate is generally obtained from the probability that a scenario occurs and its impact if it does.
• Risk evaluation compares the results of the risk analysis step with the criteria set in the first step in order to determine whether the risk is acceptable.

Risk Treatment
This step proposes treatment options to counter the unacceptable risks obtained from the previous step. The treatment options can aim to reduce the likelihood of the risky events, their consequences, or both.

Communication and Consultation
Communication and consultation is the step in which information or comments are received from stakeholders. This task can generally be carried out in every activity in order to create a proper risk management plan for the organisation.

Monitoring and Review
This step monitors and evaluates the effectiveness of the risk management strategies after implementation, based on the key performance indicators set in the objectives. Normally, it is done periodically to check whether the risk management strategies are still applicable or need improvement.

Recording and Reporting
The last step is to report the outcomes of the risk management process. The report could include, for example, the requirements of stakeholders, objectives, the decision-making process, risk management strategies and their outcomes, cost, monitoring results and so on.
Figure 2. The framework of the risk management process in ISO 31000 [7].
As explained, ISO 31000 is a key standard that provides an explicit process to reduce and mitigate risk in any system. This is the reason why it is applied in this study to manage the risk of railway switch and crossing failures. In addition, this study uses the Failure Modes, Effects and Criticality Analysis (FMECA) in the risk assessment step, because various researchers have shown that this method is effective and can be used to assess the risk of complex systems. For example, Srijuntra [8] applied FMECA to develop a module for condition-based maintenance of rolling stock compressors. Cicek et al. [9] presented a risk-based preventive maintenance planning evaluation for marine engine systems using the FMEA approach. Kim and Jeong [10] proposed preventive maintenance tasks and maintenance decision logic for the railway brake system using FMECA. Dinmohammadi et al. [11] also used this technique to evaluate the potential risks of railway rolling stock, with passenger door systems as a case study. The recent research by Szkoda and Satora [12] demonstrated the use of failure modes and effects analysis (FMEA) to improve the maintenance program of railway vehicles (the 6Dg type shunting locomotive). Last but not least, Catelani et al. [13] applied the FMECA method to assess the failure risk of heating, ventilation and air conditioning systems on trains. As can be seen, previous studies around the world have applied the FMECA method to evaluate the risk and improve the maintenance program of many systems. However, studies related to railway switch and crossing failures in Thailand are lacking. Therefore, this study presents the application of this method to assess the risk and develop a proper maintenance strategy for railway switches and crossings in Thailand through a case study.

Methodology
Failure Modes, Effects and Criticality Analysis (FMECA) is the method applied to assess the risk of S&C failures in this study. This method can be divided into 5 main steps as follows.

Understanding the Function of S&C Components
The first step of the method is to understand and collect information on the functions of S&C components. This can be achieved by conducting a literature review, checking the specification of the S&C type of interest or interviewing engineers or technicians who are responsible for the maintenance activities of S&C.

Identifying Potential Failure Modes, Causes and Effects of Each Component
This step produces the qualitative results of the risk assessment method. For each component, the failure modes, causes and effects can be identified from past failure data and inspection records. Where historical data is not available, engineers or technicians who work directly with S&C can be interviewed to obtain useful information about the failures of S&C.

Determining the Risk of Each Failure Mode
In this study, the risk priority number (RPN) is applied to measure the level of risk of each S&C component. The RPN is the product of the occurrence probability of the failure mode (O), the severity of the failure mode (S) and the ability to detect the failure mode (D), as given in equation (1):
RPN = O × S × D (1)
The values of these factors usually depend on expert opinion, and the experts can use the rating scales in Tables 1, 2 and 3 to reflect the real situation. In this way, the RPN ranges from 1 to 180.
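The RPN calculation can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names and the three sets of example ratings are hypothetical, and averaging each factor across experts before multiplying is one reading of the procedure described later under Data Collection.

```python
from statistics import mean


def rpn(occurrence, severity, detection):
    """Risk priority number as the product O * S * D (equation 1)."""
    return occurrence * severity * detection


def averaged_rpn(expert_ratings):
    """Average each factor over the experts, then form the RPN.

    expert_ratings: list of (O, S, D) tuples, one tuple per expert.
    Assumption: factors are averaged first and multiplied afterwards.
    """
    o = mean(r[0] for r in expert_ratings)
    s = mean(r[1] for r in expert_ratings)
    d = mean(r[2] for r in expert_ratings)
    return rpn(o, s, d)


# Hypothetical ratings from three experts for one failure mode (not study data).
ratings = [(4, 3, 1), (5, 3, 3), (3, 3, 2)]
print(averaged_rpn(ratings))
```

Averaging the factors rather than the per-expert products keeps the averaged O, S and D values available for separate discussion, which matches how the results are reported factor by factor.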

Table 3. Rating scales for the failure detection ability (D).
Low: Failures are easy to find by using visual inspection or sensors.
Moderate: Failures are quite difficult to find (might need some time to detect them). Rating: 3
High: Failures are difficult to find (might require special skills and equipment to detect them).

Evaluate Risk Based on the Criticality Level
The criticality levels of S&C components can be evaluated using the criteria given in Table 4 [10]. These include "very low", "low", "medium", "high" and "very high". "Very low" refers to a very low-risk exposure group, which means that almost no action is needed to improve the components in this group. Meanwhile, "very high" denotes the very high-risk exposure group. The components in this group should be treated as the top priority and require special care to reduce the risk of S&C failures. Note that, to use the criteria, the highest possible RPN (180) is normalised to 100. The RPN of each S&C component is then expressed as a proportion of this highest value. This allows the RPN of each S&C component to be presented in the range of 0 to 100, which is easier to understand.
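As a brief illustration of the normalisation described above, the sketch below rescales an RPN so that 180 maps to 100. The function name is hypothetical; the two input values are the averaged RPNs reported for point machines and for check and wing rails in the results of this study.

```python
MAX_RPN = 180  # highest possible RPN stated in the text


def normalise_rpn(rpn, max_rpn=MAX_RPN):
    """Express an RPN as a percentage of the highest possible RPN."""
    return rpn / max_rpn * 100


# Averaged RPN values reported in the results of this study.
for name, value in [("point machines", 28.13), ("check and wing rails", 21.55)]:
    print(f"{name}: {normalise_rpn(value):.2f}")
# point machines: 15.63
# check and wing rails: 11.97
```

The thresholds that map a normalised RPN to a criticality level come from Table 4 and are not reproduced in this sketch.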

Propose the Maintenance Strategy to Reduce the Risk of S&C Failures
After obtaining the criticality level of each S&C component, a new maintenance program can be created. This can be done by comparing the current maintenance strategy with the priority of the components that require action. If the current maintenance order differs from this priority, the maintenance program should be adjusted, and the periodicity of each maintenance activity could also be changed according to the risk evaluation results.

Airport Rail Link City Line
The Airport Rail Link city line (ARL) is a railway line in Bangkok, Thailand, opened in 2010. The line is approximately 28 km of double track and comprises 8 stations: Phayathai, Ratchaprarop, Makkasan, Ramkhamhaeng, Hua Mark, Ban Thab Chang, Lad Krabang and Suvarnabhumi Airport. The line operates from 6:00 AM to 12:00 AM (midnight), serving about 80,000-90,000 passengers daily. In the future, this line will be a part of the high-speed rail project connecting 3 airports: Don Mueang, Suvarnabhumi and U-Tapao. Therefore, the assets on this line, especially S&C, should be maintained properly to reduce the risk of component failures to a minimum.
In total, the ARL has 49 switches and crossings located at different stations and depots, as depicted in Figure 3. These S&C are scheduled for maintenance twice per year, i.e., every 6 months. This means that the current maintenance program implicitly assumes the same risk of failure for every S&C component. This study therefore investigates this assumption and attempts to improve the maintenance program for the ARL.

Data Collection
Due to the commercial sensitivity of the failure data, this study used an FMECA questionnaire to collect the failure modes, causes and effects of each S&C component. Six experts, engineers and technicians who are responsible for the maintenance of S&C on the ARL, were interviewed and asked to rate the RPN parameters based on the information provided in Tables 1, 2 and 3. The ratings obtained from the experts were averaged and then converted into the RPN of each failure mode of each component, as can be seen in Table 5.
Table 5 presents the results of the FMECA of ARL switches and crossings. The point machines and the check and wing rails have the greatest averaged RPN values, 28.13 and 21.55 respectively. However, when each factor of the RPN is considered separately, the failure modes with the highest occurrence rating (O = 5) are plastic deformation and breakout of material (spalling). These failure modes belong to the check rail and wing rail components. The other failure modes with a high occurrence rating are the cracking of crossing noses (O = 4.5), the deformation of crossing noses (O = 4) and the breakage of spring clips (O = 4). These should be taken into account when developing an inspection and maintenance strategy.
For the severity factor, the failure mode with the highest impact on rail services is cracking at the welding points of stock rails (S = 6). This failure mode is mostly caused by high dynamic wheel impact loads and welding defects. The ARL experts probably rated this failure mode as the most serious because of its safety implications. When it happens, it not only affects services but, in the worst case, could also cause a train to derail and endanger passengers. Another failure mode with a high severity rating is the failure of point machines in both normal and reverse operations (S = 4.5). Thus, if the operators would like to reduce the impact on services to a level that is as low as reasonably practicable, these failure modes should be given priority.
The last factor of the RPN is the ability to detect the failure. The results show that the improper position of switch rails and the failure of point machines are the failure modes most likely to be difficult to detect, because detecting them requires special skills and equipment. The operators may need to consider modern technologies such as health monitoring equipment to improve the detectability of failures in these components.
Table 6 presents the risk evaluation and the suggested maintenance plan of switch and crossing components according to the risk evaluation criteria shown in Table 4. The results show that most of the components are in the very low and low levels of risk exposure. However, two components have a medium level of risk exposure: point machines and check and wing rails, which have normalised RPNs of 15.63 and 11.97 respectively. Based on these findings, the existing inspection and maintenance plan of the ARL appears to be inconsistent with the risk of component failures. The ARL currently relies on time-based maintenance, with inspection and maintenance of S&C scheduled every 6 months. If a component failure is found, maintenance is recommended immediately after the operational period in order to avoid affecting passengers. In this study, the risk exposure is not the same for every S&C component, and the components can be divided into 3 groups. It is therefore recommended that an enhanced maintenance program be established based on the average normalised RPN value of each group. If the components in the very low risk exposure group continue to be inspected and maintained every 6 months, the components in the low and medium risk exposure groups could be inspected and maintained every 2 months and every month respectively, according to the ratio of the average normalised RPN values. The plan can be adjusted based on the availability of resources; however, it still needs to align with the risk evaluation results.
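The proportional scaling of inspection intervals can be sketched as follows. This is only an illustration under stated assumptions: the function name is hypothetical, and the group-average normalised RPN values (2.0, 6.0 and 12.0) are invented solely to reproduce the 6, 2 and 1 month pattern. The actual group averages are not given in the text.

```python
def scaled_interval(base_interval_months, base_group_rpn, group_rpn):
    """Shorten the interval in inverse proportion to the group's average RPN.

    A group whose average normalised RPN is k times that of the base group
    is inspected k times as often.
    """
    return base_interval_months * base_group_rpn / group_rpn


BASE_INTERVAL = 6  # months, the current ARL schedule kept for the very low group

# Hypothetical average normalised RPN per risk group (not study data).
group_average_rpn = {"very low": 2.0, "low": 6.0, "medium": 12.0}

base_rpn = group_average_rpn["very low"]
for group, avg in group_average_rpn.items():
    months = scaled_interval(BASE_INTERVAL, base_rpn, avg)
    print(f"{group}: every {months:g} month(s)")
```

In practice the resulting intervals would likely be rounded to whole months and checked against available resources, as the text notes.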

Conclusions
Switches and crossings are essential assets on a railway network. This study therefore aimed to manage the risk of switch and crossing failures based on ISO 31000, using the Failure Modes, Effects and Criticality Analysis (FMECA) to create a proper preventive maintenance strategy. An urban rail transit line in Thailand (the ARL) was considered as a case study, and the results show that the current S&C maintenance program of the ARL could be improved in order to achieve the maximum safety and service reliability level. The most critical S&C components in the case study are the point machines, followed by the check and wing rails, and the least critical components are heel blocks, slide chairs and closure rails. In detail, the S&C components were divided into 3 groups based on their risk exposure: very low, low and medium. Therefore, it is suggested that the priority of the inspection and maintenance activities of each S&C component be adjusted according to the risk evaluation results presented in this study. In the future, if historical maintenance data such as failure data and cost become available, an asset management model of S&C will be constructed in order to support the decision-making process of S&C maintenance in real time.