From: Feargal Timon
Subject: Model Accuracy
Date: Thu, 14 Oct 2004 08:05:55 -0500
REAL DATA COMPARED WITH STATISTICAL ANALYSIS
Feargal Timon B.E., M.Sc.Eng., MSCS, CIM Ireland Ltd, Brooklawn, Salthill, Galway, Rep of Ireland.
ABSTRACT
Simulation models by their nature are made up of a variety of interdependent timed events. In some models, the timings of these events are approximated and simplified; in others, the underlying data are analysed and fitted with statistical distributions. This paper identifies the circumstances in which real data logs should be used in the development of a simulation model.
KEYWORDS: Validation, Real Data, Event Logs
1. INTRODUCTION
Two of the most important aspects of building a simulation model are customer confidence and model accuracy/validation. This paper proposes that, to achieve the most accurate model possible, real data should be fed directly into the model rather than analysed and converted into statistical distributions. It also discusses when real data logs should be used. Three case studies are presented in which real data logs were used and a model accuracy of over 95% was achieved. The use of real data in modelling is particularly relevant where one or more elements have a high degree of variability.
2. REAL DATA DRIVEN MODELS
A real data driven model is one in which real data logs, recorded from the system being modelled, control one or more elements of the model. An example of a real data log is a downtime log, which records the time of every failure and its duration. The log may be recorded manually but is preferably recorded electronically. In a model driven by such a log, the simulated machine goes down at the same times as the real machine and for the same durations; the real data log therefore completely controls this element of the model. Table 1 shows what a real data log may look like and how it can be converted into a format that a simulation model can use.
Down Time Log                            | "Model Event File"
Time of Failure     | Time of Recovery   | Time of Failure (Min from midnight) | Duration (Min)
29-Aug-04 8:15 AM   | 29-Aug-04 8:19 AM  | 495.00                              | 4.32
29-Aug-04 9:31 AM   | 29-Aug-04 9:45 AM  | 571.25                              | 14.40
29-Aug-04 11:46 AM  | 29-Aug-04 11:47 AM | 706.15                              | 1.44
29-Aug-04 12:01 PM  | 29-Aug-04 12:29 PM | 721.02                              | 28.80
Table 1: A Down Time Log and its Conversion to a Format Suitable for Simulation
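The conversion in Table 1 is simple arithmetic: the time of failure becomes the minutes elapsed since midnight (8:15 AM is 8 × 60 + 15 = 495 minutes) and the duration is the recovery time minus the failure time. The Python sketch below assumes the log is available as pairs of timestamp strings in the layout printed in Table 1; the function name and the timestamp format are illustrative assumptions, not part of any particular simulation package.

```python
from datetime import datetime

# Timestamp layout as printed in Table 1 (an assumption; a real log export may differ)
LOG_FORMAT = "%d-%b-%y %I:%M %p"

def to_event_file(log_rows, day_start):
    """Convert (failure, recovery) timestamp strings into
    (minutes from day_start to the failure, downtime duration in minutes)."""
    events = []
    for failed, recovered in log_rows:
        t_fail = datetime.strptime(failed, LOG_FORMAT)
        t_recover = datetime.strptime(recovered, LOG_FORMAT)
        start_min = (t_fail - day_start).total_seconds() / 60.0
        duration_min = (t_recover - t_fail).total_seconds() / 60.0
        events.append((round(start_min, 2), round(duration_min, 2)))
    return events

down_time_log = [
    ("29-Aug-04 8:15 AM", "29-Aug-04 8:19 AM"),
    ("29-Aug-04 9:31 AM", "29-Aug-04 9:45 AM"),
    ("29-Aug-04 11:46 AM", "29-Aug-04 11:47 AM"),
    ("29-Aug-04 12:01 PM", "29-Aug-04 12:29 PM"),
]
print(to_event_file(down_time_log, datetime(2004, 8, 29)))
# [(495.0, 4.0), (571.0, 14.0), (706.0, 1.0), (721.0, 28.0)]
```

With the minute-resolution timestamps shown, this produces whole minutes; the fractional values in Table 1 (571.25, 4.32 and so on) suggest that the source log held more precise times than the table displays.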
The decision to use real data driven elements in a model will initially depend on data availability. Given that suitable data are available, real data driven models should be built:
a. When one or more elements have such a high degree of variability that, if a statistical distribution were used, the model would have to be run multiple times to obtain a valid result.
b. When two variable elements interact to produce widely differing results and a more stable model is needed to analyse a specific problem.
c. To give the customer confidence, by letting him or her see how the system would react under more realistic conditions.
The author has found that real data logs simplify the validation process and improve model accuracy.
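Criterion a can be illustrated with a deliberately small sketch: a single machine with a hypothetical run rate, whose downtime is either replayed from an event file (here, the four Table 1 events) or sampled from exponential distributions with a hypothetical mean time between failures and mean repair time. Replaying the log gives the same output on every run, whereas the sampled version changes with every random seed, which is why a distribution-driven model needs several replications. The machine, rates and distributions are all assumptions for illustration.

```python
import random

TABLE1_EVENTS = [(495.00, 4.32), (571.25, 14.40), (706.15, 1.44), (721.02, 28.80)]
RATE_PER_MIN = 20.0  # hypothetical run rate (units per minute) while the machine is up

def output_with_downtime(events, horizon_min, rate=RATE_PER_MIN):
    """Units produced in [0, horizon_min) given non-overlapping
    (start_min, duration_min) downtime events."""
    down = 0.0
    for start, duration in events:
        end = min(start + duration, horizon_min)  # clip outages at the horizon
        if start < end:
            down += end - start
    return rate * (horizon_min - down)

def sampled_events(mtbf, mttr, horizon_min, rng):
    """Hypothetical distribution-driven downtime: exponential time between
    failures (mean mtbf) and exponential repair time (mean mttr)."""
    events, t = [], 0.0
    while True:
        t += rng.expovariate(1.0 / mtbf)  # running time until the next failure
        if t >= horizon_min:
            return events
        duration = rng.expovariate(1.0 / mttr)
        events.append((t, duration))
        t += duration  # the machine stays down until it is repaired

HORIZON = 24 * 60  # one day, in minutes from midnight
replayed = [output_with_downtime(TABLE1_EVENTS, HORIZON) for _ in range(3)]
sampled = [
    output_with_downtime(sampled_events(60.0, 12.0, HORIZON, random.Random(seed)), HORIZON)
    for seed in range(3)
]
print("replayed log:", replayed)  # identical on every run
print("sampled:     ", sampled)   # differs from seed to seed
```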
3. MODEL VALIDATION
The validation of a model is the cornerstone of any simulation project. The use of real data logs both simplifies the model and improves its accuracy, making it easier to validate. The modelling task is simplified for the elements driven by real data logs, because there are no data to analyse and no model logic to develop for them. Real data logs also improve the model's accuracy because they represent reality, so no assumptions or approximations are required. The most beneficial aspect of real data logs is that, because the real data controlled elements represent reality, the other elements can be scrutinised and validated more intensely. In other words, once the accuracy of the controlled elements is ensured, the accuracy of the other modelled elements becomes more apparent. The validation process itself should also become simpler, as real data logs can be used to compare simulated results with actual events. For example, the actual start and end times of a batch can be compared with the simulated times. This is illustrated in the second case study, where a whole factory is modelled and every production order is validated. To demonstrate the effectiveness of real data driven models, three distinctly different case studies are now discussed from the following perspectives: a) why the elements were selected, b) how the model was validated, and c) the resulting model accuracy.
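The comparison of simulated results with actual events can be scripted. The paper does not state how its accuracy percentages were calculated, so the sketch below assumes one plausible definition, accuracy = 1 - |simulated total - actual total| / actual total, and uses the ordinary Pearson coefficient for the correlation columns of Tables 2 to 4. The function names and the hourly figures in the usage lines are invented for illustration.

```python
import math

def total_accuracy(actual, simulated):
    """Assumed definition: one minus the relative error of the simulated total."""
    total_actual = sum(actual)
    return 1.0 - abs(sum(simulated) - total_actual) / total_actual

def pearson(x, y):
    """Pearson correlation between paired actual and simulated values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical units produced per hour (actual log vs model output)
actual_per_hour = [1500, 920, 1980, 1310, 1750]
simulated_per_hour = [1480, 960, 1950, 1290, 1790]
print(f"accuracy {total_accuracy(actual_per_hour, simulated_per_hour):.1%}")
print(f"correlation {pearson(actual_per_hour, simulated_per_hour):.2f}")
```

For a per-order comparison such as batch start times, the correlation and the individual differences are more informative than a single total, since early and late errors can cancel in a sum.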
4. CASE STUDY 1 - HIGH VOLUME INK CARTRIDGE MANUFACTURER
The first case study is of a high-volume, closed-loop conveyor system, where parts are placed on a pallet conveyor at the beginning of the loop and removed at the end of the loop. The number of units produced in one hour can vary from 900 to 2000. This variation in output is caused by multiple short downtimes (less than 15 minutes) on all equipment. The objective of the project was to increase output on the loop by 5% to 15%. For the initial model, downtime was analysed and specific distributions were developed for each piece of equipment. When this model was run repeatedly under the same conditions, the average output varied by over 10% from run to run. To get a stable result, the model would have had to be run ten times to evaluate each scenario correctly. This proved too time-consuming, and the inherent variability in the simulated results made it difficult to be sure of the benefits of any change. Consequently, the model was rebuilt so that the downtime logs were fed directly into it and the other elements were simulated. Table 2 lists the elements that were simulated and those that were controlled by real data, together with the model accuracy in terms of units produced and the correlation of units produced per hour. Using real data logs, it was possible to build a model with an accuracy of 99.5% even though the output of the real system varies by over 100% from hour to hour. This demonstrates the benefits of real data driven models over models run using statistical distributions.
Elements                               | Output Per Hour
Modelled        | Real Data Controlled | Accuracy | Correlation
Conveyors       | Downtime Log         | 99.5%    | 0.93
Equipment Logic |                      |          |
Run Rates       |                      |          |
Pallets         |                      |          |
Table 2: Case Study 1 - Components Modelled and Components Driven by Real Data, Plus Model Accuracy
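The ten runs per scenario needed by the initial, distribution-driven model follow from its run-to-run variability. A common rule of thumb, sketched below under a normal approximation (a Student t quantile would be more accurate for a small pilot sample), estimates the number of replications n needed for a confidence interval of half-width h from a pilot sample with standard deviation s: n ≈ (z · s / h)², with z ≈ 1.96 for 95% confidence. The function name and the pilot outputs are hypothetical, and the paper does not say how its figure of ten runs was arrived at.

```python
import math
import statistics

def replications_needed(pilot_outputs, half_width, z=1.96):
    """Estimate replications so that the mean output is known to within
    +/- half_width at roughly 95% confidence (normal approximation)."""
    s = statistics.stdev(pilot_outputs)  # sample standard deviation of the pilot runs
    return math.ceil((z * s / half_width) ** 2)

# Hypothetical average hourly outputs from five distribution-driven runs
pilot = [1420, 1290, 1535, 1380, 1260]
print(replications_needed(pilot, half_width=50))
```

If the downtime element is replayed from the log and every remaining element is deterministic, all runs are identical, s is zero and a single run suffices.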
Figure 1, which further demonstrates the accuracy of the simulated model, was key in validating this project. The customer gained full confidence in the analysis once they saw the model reacting to real data.
The next case study looks at a large batch processing plant.
5. CASE STUDY 2 - PROCESS PLANT
This case study looked at a complete factory, from batch manufacturing to packaging. The manufacturing areas would produce batches and then wait for a packaging line to become free. When a suitable line became available, the product was packaged. Packaging times varied with the type of packaging and the changeover from the previous product. The next manufacturing batch could not be started until the previous batch was packaged, and one manufacturing batch could be processed on more than one packaging line. The production time of a batch in the manufacturing area could also vary significantly, and this variation determined when a packaging line would be required. After the real data were collected, the batch manufacturing times and packaging times were fed directly into the model and all other elements were modelled, as shown in Table 3. The accuracy of the model was measured on the start and finish times of the manufactured batches. These parameters were selected because a batch could not start until the packaging of the previous batch was completed. Figure 2 compares the simulated start time with the actual start time of all orders.
Elements                               | Order Completion
Modelled        | Real Data Controlled | Accuracy | Correlation
Tanks           | Batch Times          | 95.3%    | 0.98
Pipes           | Packaging Times      | 98.4%    |
Flow rates      |                      |          |
Logic           |                      |          |
Table 3: Case Study 2 - Components Modelled and Components Driven by Real Data, Plus Model Accuracy
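The interaction between manufacturing and packaging can be sketched as a small event-driven calculation in which batch and packaging times come straight from the real data logs. The sketch assumes several manufacturing vessels sharing a pool of identical packaging lines and dispatches finished batches first-come-first-served to the earliest-free line; unlike the real plant, it ignores changeovers and never splits a batch across lines. The function name and data layout are hypothetical.

```python
import heapq

def simulate_plant(vessels, n_lines):
    """vessels: {name: [(batch_min, pack_min), ...]} taken from the real data
    logs, in production order. Returns {(name, i): (start, packed)} in minutes."""
    line_free = [0.0] * n_lines  # time at which each packaging line becomes free
    events = []                  # heap of (mfg_finish, name, i, mfg_start)
    for name, batches in vessels.items():
        if batches:
            heapq.heappush(events, (batches[0][0], name, 0, 0.0))
    results = {}
    while events:
        finish, name, i, start = heapq.heappop(events)
        _, pack_min = vessels[name][i]
        # Finished batches are served in order of completion on the earliest-free line
        line = min(range(n_lines), key=lambda k: line_free[k])
        pack_start = max(finish, line_free[line])  # wait if every line is busy
        packed = pack_start + pack_min
        line_free[line] = packed
        results[(name, i)] = (start, packed)
        # The next batch in this vessel cannot start until this one is packaged
        if i + 1 < len(vessels[name]):
            next_batch_min = vessels[name][i + 1][0]
            heapq.heappush(events, (packed + next_batch_min, name, i + 1, packed))
    return results

# Hypothetical logged times (minutes) for two vessels sharing one packaging line
plant = {"T1": [(60, 30), (50, 40)], "T2": [(45, 90)]}
for batch, (start, packed) in sorted(simulate_plant(plant, n_lines=1).items()):
    print(batch, "start", start, "packed", packed)
```

The simulated start and packing-finish times per batch are exactly the quantities that can then be set against the actual order times, as was done for this case study.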
To validate the model, the customer wanted to use real schedules and real data. This approach proved to be the most effective, particularly during the validation steps of the project, as it provided a high degree of accuracy. This case study shows that real data driven modelling is effective for a factory-wide model. To test the proposed approach further, a non-manufacturing example, a call center, is studied next.
6. CASE STUDY 3 - CALL CENTER
This business receives a variety of calls: customer, business and corporate calls. These calls can be further divided into sales or service and, finally, by language requirement. This combination results in over one hundred different Skillsets. The objective of this project was to determine how many agents were required and which types of call they should answer. The element of the call center that varied the most and could not be controlled by management was the number of calls arriving at the center, so three months of actual calls were fed directly into the model (see Table 4). The main performance measure of this call center was the number of calls abandoned for each Skillset. Using real data driven elements, the model achieved an accuracy of 97.5%, based on comparing the number of actual abandoned calls with the number of simulated abandoned calls.
Elements                                | Abandoned Calls
Modelled            | Actual            | Accuracy | Correlation
Agent               | Call Time         | 97.5%    | 0.99
Breaks              | Arrival Log       |          |
Logic               | Emails Arrival    |          |
Email Response Time |                   |          |
Skill Sets          |                   |          |
Hang-up threshold   |                   |          |
Table 4: Case Study 3 - Components Modelled and Components Driven by Real Data, Plus Model Accuracy
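Replaying a call arrival log against a modelled agent pool can be sketched in a few lines. The sketch assumes a single Skillset served first-come-first-served by identical agents, talk times taken from the log, and one fixed hang-up threshold after which a waiting caller abandons; with over one hundred Skillsets and agents who answer several call types, the real model is considerably richer. The function name and the sample calls are hypothetical.

```python
import heapq

def count_abandoned(calls, n_agents, hangup_min):
    """calls: [(arrival_min, talk_min), ...] from the arrival log, any order.
    Assumes a FIFO queue, identical agents, and that every caller hangs up
    after waiting longer than hangup_min (a fixed, hypothetical threshold)."""
    agent_free = [0.0] * n_agents  # min-heap of times at which agents become free
    abandoned = 0
    for arrival, talk in sorted(calls):
        free_at = agent_free[0]  # earliest-available agent
        wait = max(0.0, free_at - arrival)
        if wait > hangup_min:
            abandoned += 1  # caller leaves; no agent time is consumed
        else:
            # That agent answers the call and is busy until it ends
            heapq.heapreplace(agent_free, max(arrival, free_at) + talk)
    return abandoned

calls = [(0.0, 10.0), (1.0, 10.0), (2.0, 10.0), (3.0, 1.0)]  # hypothetical log extract
for agents in (2, 3):
    print(agents, "agents:", count_abandoned(calls, agents, hangup_min=5.0), "abandoned")
```

Because callers are handled in arrival order and an abandoning caller uses no agent time, each caller's wait is fixed by the callers ahead of it, which is why a single pass over the sorted log is enough.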
7. CONCLUSION
In conclusion, most models have one or more elements with a high degree of variability. The three case studies presented here have demonstrated that, by controlling these elements with real data logs, the modelling task is simplified, model accuracy is increased and the validation requirements can be reduced. Equally, customers have a higher level of confidence in a model when they see it react to real data. Finally, the most beneficial aspect of real data logs is that, because the real data controlled elements represent reality, the other elements can be scrutinised more intensely.