## THE SIMULATION OF A PIPELINED EVENT SET PROCESSOR John Craig Comfort Florida International University Department of Mathematical Sciences Miami, Florida 33199 Anita Miller Department of Mathematics Ransom-Everglades School Coconut Grove, Florida 33133 Abstract. The availability of inexpensive, sophisticated processing elements affords the computer system designer the opportunity to create tailored computer systems for specialized applications. In this paper, the authors present a design for a discrete event simulation computer, in which event set manipulation is performed by a pipelined set of microprocessors. A simulation model for the system is presented and the selection of optimal parameters for the system are discussed. The results of the simulation and suggestions for further research are presented. ## INTRODUCTION When a program is designed using a top down methodology, each step of the refinement process results in the specification of a set of subtasks, where each subtask has associated with it a set of input specifications, a set of output specifications, a process specification, and a designated environment in which it functions. With the availability of inexpensive, powerful microprocessing elements (such as the Motorola 68000 and the Intel 432), a system designer has flexibility in assigning selected subtasks to diverse processing elements. Within the context of a discrete event simulation . program, certain groups of related tasks may be identified. One of the most significant of these is the group associated with maintaining the event set. In a recent study by one of the authors (Comfort, 1981), the behavior of a moderately large simulation program was studied. In various executions of the program, it was observed that between 32 and 38 percent of all instructions executed by the processor were used for event set maintainence. In this paper, we present an algorithm for event set processing which is distributed over a set of processors and memory units. Considerable work has been done in recent years in the design and analysis of event set algorithms and data structures (all cited references except Comfort and Coats, 1980). All of the referenced articles assume that a single processor performs both general simulation computation and event set processing. In the study mentioned above, a rough simulation was done of a simulation processor/event set processor system, indicating that further research into this problem is justified. # 2. EVENT SET DATA STRUCTURES AND OPERATIONS Every individual (here referred to as an "entity") which is represented in the simulation program will be assigned a unique set of data locations in the main memory of the simulation computer. This block contains information about the state of the entity, times used for the calculation of statistics, and various pointers to other entities in the system. Further, for those entities which have a defined time of next processing, an "event notice" will be created. This event notice will contain (minimally) the address of the block of the memory assigned to the entity and the entity's time of next processing. For the purposes of this study, event notice memory will be assumed to be physically disjoint from main memory. Further, in a multiple processor system, a separate event notice memory will be assigned to each processor. Of the operations associated with event set processing, three may be considered primitive: "schedule (time,entity)"-Insert an event notice with the given "time" for the given "entity" in the event set. - "next" Return the "entity" with the smallest next processing "time" to the calling program. Update the simulation clock to "time". - 3) "remove (entity, time)" Destroy the event notice associated with "entity". Do not change the simulation clock. Note that the "time" is not necessary for this operation, as an "entity" may have at most one event notice associated with it. The specification of the time greatly facilitates the location of the event notice to be removed. For the course of this discussion, our attention will be focused principally on the "schedule" and "next" operations. In simulations done by the authors, these operations accounted for over 98 percent of all calls on the event set programs (For a situation in which this is not the case, see Blackstone et.al. (1981)). ## 3. A PIPELINED EVENT SET ALGORITHM Let us assume that the computer system being simulated consists of one simulation process (called SIMP) and its associated memory, and N event set processors (called EVSPs), each with its own memory. Each event set processor has associated with it two additional control parameters, called UNDER and OVER. The event set algorithm always attempts to maintain the event set size between these two bounds. In addition each processor maintains: - A doubly linked list of event notices anchored at each end by the cells PAST, with time of -1, and FUTURE, with time infinity (or an engineering approximation thereof). - The variable MAXTIME, containing the largest next processing time of an event notice in this processor. In the situation where the SIMP wishes to schedule an event notice in EVSP(1), or when EVSP(n) attempts to schedule an event notice in EVSP(n+1), let the entity being referenced be "EN", and the time of processing "tsim". Then the SIMP or EVSP(n) will - Wait for processor EVSP(n+1) to become unbusy. - 2) Transfer "EN" and "tsim" to EVSP(n+1). Viewed from EVSP(n+1), this processor will: - Accept "EN" and "tsim" from EVSP(n) or SIMP, and acknowledge the receipt. - 2) If the processor has at least UNDER event notices, and "tsim" > MAXTIME, then - 2.1) Wait for EVSP(n+2) to become unbusy. - 2.2) Transfer "EN" and "tsim" to EVSP (n+2). - If the processor has fewer than UNDER event notices, or "tsim" ≤ MAXTIME, then schedule the event notice in the local event set memory of EVSP(n). - 4) If the processor now has more than OVER event notices, then unlink the event notice with largest next processing time. Let it's time of next processing be "tmax" and let it refer to the entity "ENmax". Update MAXTIME to relect the largest simulation time of the remaining event notices. - Wait for processor N+2 to become unbusy. - 4.2) Transfer "ENmax" to "tmax" to EVSP (n+2). If the SIMP or EVSP(n) (n>1) has requested to perform a "next" operation, EVSP(n+1) must: - Unlink the event notice with smallest next processing time "tnext" and its entity number "ENnext". If there are no such notices, set "tnext" and "ENnext" to zero. - Transmit "tnext" and "ENnext" to EVSP(n) or SIMP. - If the number of event notices in EVSP(n) is now less than UNDER (but greater than zero), then - 3.1) Wait until EVSP(n+2) is unbusy. - 3.2) Initiate a "next" operation on EVSP (n+2), and wait. - 3.3) Retrieve the time, "t", and the entity, "EN", from EVSP(n+2). - 3.4) Link "t", "EN" into the event set on processor n+1. ## 4. THE SIMULATION EXPERIMENT The simulation experiment consists of configuring a (simulated) computer system consisting of N EVSPs and the SIMP. This sytem is exercised using trace data taken from a real simulation program (described in Comfort and Coats, 1981). The primary dependent variable of the simulation is the percent of time that the SIMP must wait on the first EVSP in the pipeline - presumably all of the remainder of its time may be used in running simulations. Also of interest are the idle times of the EVSP's in the system. Timing data were obtained by counting the number of simulated machine instructions executed during and between calls on the event set routines. This information was gathered by inserting software probes into an already existing simulation program. Similar data were obtained for the event set routines by examining a compiled assembly language listing of the program. This metric was chosen instead of elapsed time because the Univac 11.00 real time clock, which has a resolution of 200 microseconds, is far too coarse for the purpose of this study. In addition, the Univac system has a large cache memory - measured performance would have a substantial amount of unexplained variance due to the memory access patterns of other jobs being run at the same time. The simulation program was written in University of Wisconsin Pascal, and designed using a top down methodology. The parameters to the simulation program are: - 1) N, the number of event set processors, with 0<N<3. - The overflow and underflow limits (OVER and UNDER) for each processor in the pipeline. - 3) An external trace file, consisting of one | | | . , | | | BLE 2 | | | | | |-------------------------------------|----------------------------------------------|------------------------------------------|----------------------------------------------|--------------------------|--------------------------------|------------------------------|--------------------------|--------------------------------|-----------------------------| | | | | PERFO | RMANCE OF | A TWO EVS | PSYSTEM | | | | | | | RUN H<br>Mean Event<br>Set Size<br>203.1 | | | RUN M<br>ean Event<br>Set Size | | , Me | RUN L<br>ean Event<br>Set Size | | | UNDER [1] | %<br>SIMP<br>Wait | %<br>EVSP1<br>Busy | %<br>EVSP2<br>Busy | %<br>SIMP<br>Wait | %<br>EVSP1<br>Busy | %<br>EVSP2<br>Busy | '%<br>SIMP<br>Wait | %<br>EVSP1<br>Busy | %<br>EVSP2<br>Busy | | 25<br>20<br>15<br>10<br>9<br>8<br>7 | 14.2<br>11.9<br>11.6<br>11.5<br>11.7<br>12.2 | 16.3<br>11.2<br>9.8<br>9.5<br>8.9<br>8.4 | 19.3<br>24.6<br>26.5<br>26.4<br>28.7<br>30.6 | 12.1<br>10.6<br>9.0 | 15.4<br>13.1<br>10.7 | 8.0<br>9.1<br>11.6 | 14.9<br>12.7<br>10.0 | 17.9<br>14.9<br>12.1 | 2.7<br>4.5<br>6.9 | | 6<br>5<br>4<br>3<br>2 | 13.3<br>15.9<br>20.9 | 7.8<br>7.0<br>6.9 | 33.3<br>38.6<br>47.1 | 8.1<br>8.7<br>9.2<br>9.7 | 7.7<br>5.8<br>4.8<br>4.8 | 20.4<br>34.2<br>42.5<br>47.8 | 8.6<br>7.7<br>6.9<br>6.2 | 9.4<br>7.9<br>6.6<br>5.2 | 9.7<br>14.5<br>21.4<br>29.6 | record for each event set call made in the original program. Each record consists of: - a) the kind of call ("schedule" or "next") - b) the incremental simulation clock time - c) the instruction execution count between this call and the previous call on the event routines - the instruction execution count to perform the event set operation in the driving program. - The time required to communicate between two processors. The program used to generate the trace files is an event driven simulation of a large specialized computer network (described in Comfort 1980). The program was implemented in FORTRAN on a Univac 1100 computer system. It was designed using a top down methodology, and no especial effort was made to minimize running time. The event set algorithm used was a variant of the adaptive indexed linked list algorithm (described in Comfort 1979 and Wyman 1975). The results presented are chosen to represent conditions of light, moderate, and moderately heavy loading of the simulated computer system. They will be referred to as RUN L, M, AND H respectively. ## 5. RESULTS The behavior of a system using the pipelined list event set in a system with one SIMP and one EVSP is shown in Table 1. By most standards, this performance is clearly unacceptable, as the CPU is idle (waiting on the EVSP) more than 20 percent of the time and more than 50 percent in a heavily loaded system. An analysis of the sequence of calls to the event set processors (performed in Comfort 1981) reveals that it is quite common to have a "next" operation immediately following a "schedule", requiring the CPU to wait for the completion of the "schedule". Further, in a one EVSP system, the pipelined algorithm reduces to the simple doubly linked list algorithm (Comfort 1979), which is quite inefficient in time. Thus, it is necessary either to employ a more sophisticated event set algorithm, or to add additional event set processors to the system. | | TAI | BLE 1 | | | |-------------|---------|-------|-------|--| | | RUN H | RUN M | RUN L | | | % SIMP Wait | 53.78 | 31.08 | 24.05 | | | % EVSP Busy | 60.41 . | 62.26 | 27.99 | | | | | | | | Following the second approach, a simulated three processor system (one SIMP plus two EVSP's was configured, and a set of simulation experiments was run. One of the first observations made was that the value of the OVER parameter is largely irrelevant. There are two mechanisms for transferring an event notice from EVSP(n) to EVSP(n+1). In the first, the new event notice has time larger than that of the most distant event notice in EVSP(n) and is passed on directly. In the second, the event notice is scheduled in EVSP(n), a local memory overflow occurs, and the most distant event is 'bumped' to EVSP(n+1). Apparently the first mechanism is used almost exclusively unless the OVER value is very close to that of UNDER. As a result, the OVER value was set to the maximum size for each processor's event set memory, and subsequently ignored. In a two EVSP system, more nearly satisfactory results have been obtained (as shown in Figure 1 and Table 2). Under conditions of heavy system loading (labeled RUN H), an optomal CPU wait of about 11 percent is obtained with an underflow value of 9. Under moderate load, the optomal point is less well defined, and occurs when UNDER is 5. With light loading, best behavior is seen when UNDER assumes its minimum value of 1. Fixing the parameter value at that which optimizes performance in the heaviest system loading, CPU wait percentages of 9.81, 10.53, and 12.08 are obtained for the three runs. It is interesting to observe that optimal behavior seems to occur when EVSP(2) is doing roughly three times as much work as EVSP(1). When the load ratio is greater than 3, the time that SIMP waits on EVSP(1) while that processor waits on EVSP(2) becomes significant. At a lesser load, EVSP(1) is doing too much scheduling, and the time that SIMP waits on EVSP(1) dominates. If a two EVSP system were desirable for some a priori reason, it would be quite possible to design an algorithm in which the SIMP is able to dynamically alter the value of UNDER (1). The pragmatics of such an algorithm are beyond the scope of this paper, however. By proceeding to a three EVSP system and varying the UNDER parmaters to EVSP (1) and (2), even better performance was obtained. Starting with the optomal value of UNDER (1) obtained in the two processor system and varying UNDER (2), the CPU idle percent is reduced to 8.38 (with an UNDER (2) value of 10). For this set of parameter values, the middle event set processor is fairly lightly loaded with respect to the front end processor. By analogy with the observed behavior of the two EVSP system, this suggested that a smaller value of UNDER (1) could be employed. Values of 5,3,2 and I were used for this parameter (see Table 3 and Figure 2). As expected, the value of one produces the best behavior, yielding a SIMP idle percentage of 5.30 and a relative loading ratio of 3.5:1. Using the same parameter values for the moderately and lightly loaded system, similar values for SIMP idle percentage were obtained. ### 6. CONCLUSION AND DIRECTIONS FOR FURTHER RESEARCH The simulation study here performed certainly indicates that a pipelined event set processor may result in a significant increase in the utilization of the central simulation processor and that the performance increase is nearly independent of processor loading. (see summary in Figure 4) | | | | | 7 | TABLE 3 | | | | |-----------------------|--------------------------------------|---------------------------------------|----------------------------------------|-------------------------------------------|--------------------------------------|---------------------------------------|-----------------------------------------|-------------------------------------------| | | | | PERFORMA | NCE OF A TH | REE EVSP SYST | EM (RUN H) | ) | | | | U | NDER(2): | 100 | | | UNDER ( | (2): 50 | | | UNDER | %<br>SIMP<br>Wait | %<br>EVSP1<br>Busy | %<br>EVSP2<br>Busy | %<br>EVSP3<br>Busy | %<br>SIMP<br>Wait | %<br>EVSP1<br>Busy | %<br>EVSP2<br>Busy | %<br>EVSP3<br>'Busy | | 9<br>5<br>3<br>2<br>1 | 8.51<br>8.35<br>11.09<br>13.34 | 9.92<br>7.80<br>5.03<br>4.55 | 10.79<br>16.83<br>32.60<br>42.90 | 3.29<br>3.61<br>3.78<br>3.75 | 8.36<br>7.16<br>6.48<br>6.10<br>6.40 | 10.08<br>7.88<br>6.27<br>5.20<br>4.36 | 4.46<br>7.82<br>15.74<br>23.10<br>27.90 | 15.21<br>15.36<br>15.69<br>16.50<br>16.89 | | | Ui | NDER(2): | 25 | | | UNDER( | (2): 10 | | | UNDER [1] | %<br>SIMP<br>Wait | %<br>EVSP1<br>Busy | %<br>EVSP2<br>Busy | %<br>EVSP3<br>Busy | %<br>SIMP<br>Wait | %<br>EVSP1<br>Busy | %<br>EVSP2<br>Busy | %<br>EVSP3<br>Busy | | 9<br>5<br>3<br>2<br>1 | 8.40<br>7.20<br>6.25<br>5.73<br>5.30 | 10.61<br>8.00<br>6.20<br>5.14<br>4.35 | 2.77<br>4.40<br>9.70<br>13.68<br>16.78 | 12.75<br>19.14<br>19.96<br>20.23<br>21.03 | 8.38<br>7.27<br>6.40<br>5.89<br>5.50 | 9.89<br>7.87<br>6.12<br>5.06<br>4.29 | 2.19<br>3.07<br>6.45<br>8.57<br>10.27 | 22.79<br>25.34<br>26.34<br>26.78<br>27.73 | | SIMP N | WAIT PERCENT AS A FUNCTION | ON OF THE NUMBER OF E<br>% CPU WAIT | EVSPS | |------------------|----------------------------|-------------------------------------|-------| | | RUN. H | RUN M | RUN L | | Original program | 38.1 | 34.2 | 32.3 | | One EVSP | 53.8 | 31.1 | 24.1 | | Two EVSPs | 11.5 | 9.2 | 10.6 | | Three EVSPs | 5.3 | 6.0 | 5.4 | The results as presented are not "real" in the sense that they are based on instruction counts taken from high level compiled code on a computer whose use as an event set processor would be moderately infeasible. In the next phase of this research a verifiable system will be simulated using a PDP-11/44 as the SIMP, and Motorola 68000 computers as the EVSPs. We plan to use data from this composite computer sytem to validate the simulation program. It seems reasonable that an extremely cost effective simulation computer system may be designed as a network of dedicated microprocessing elements. Extensions of the simulation program discussed here will be a major tool in the design and refinement of such a system. ### **ACKNOWLEDGEMENTS** The authors wish to thank Martin Miller for his assistance in the preparation of this manuscript. #### REFERENCES - Blackstone, J.M., Hogg, G.L., and Phillips, D.T., "A Two-List Method for Synchronization of Event Driven Simulation" <u>Proceedings of the Fourteenth Annual Simulation Symposium</u>, March 1981, pp.95-101. - Comfort, J.C.: "The Simulation of a Microprocessor Based Event Set Processor", <u>Proceedings of the</u> Fourteenth Annual Simulation Symposium, Mach 1981, pp. 17-33. - Comfort, J.C. and Coats, P.K., "Electronic Funds Transfer Networks: The Impace of Performance and Security Considerations", <a href="Proceedings of the Thirteenth Annual Simulation Symposium">Proceedings of the Thirteenth Annual Simulation Symposium</a>, March 1980, pp. 1-26. - Comfort, J.C., "A Taxonomy and Analysis of Event Set Management Algorithms for Discrete Event Simulation", Proceedings of the Twelfth Annual Simulation Symposium, March 1979, pp. 115-146. - Franta, D., and Maly, W., "An Efficient Data Structure for the Simulation Event Set", CACM 20, 8 (August 1977) pp. 596-602. - Gonnet, G.H., "Heaps Applied to Event Driven Mechanisms", CACM 19, 2 (July 1970), pp. 417-418. - Vaucher, J.G. and Duval, P., "A Comparison of Simulation Event Set Algorithms", CACM 18, 4 (April 1975), pp. 223-230. - Wyman, P.F., "Improved Event Scanning Mechanisms for Discrete Event Simulation", CACM 18,4 (April 1975), pp. 350-353.