# Simulation of Multiprocessor Computer with Local Memories ## MARCEL JIŘINA Research Institute of Mathematical Machines (VÚMS) Lužná 2, 160 00 Praha 6 - Vokovice, Czechoslovakia The performance of one arrangement of the multiprocessor computer with local memories is established. Namely the following influences are shown: the processing time between two accesses to main memory, the time for page transfer, and the time needed for working up each such interrupt on the degree of usage of the processors and on the throughput of the system. #### 1. Introduction The problem of the multiprocessor system behaviour was solved in many works. The survey can be found in /1/. In all cases the computer is considered as a queing network. In some papers /2, 3, 4/ analytical models are given, and therefore are limited to most simple cases. Another attempt is based on the simulation of the system described in Fortran, or in some discrete events simulation language e.g. in GPSS or SIMSCRIPT /1,5,6/. Sometimes contemporarily with such a simulation a simple analytical model is given for qualitative interpretation of some dependences found by simulation. In this contribution the latter attempt was used. The model of the computer was written in GPSS ٧. ### 2. The simulated computer The multiprocessor computers can be divided into two groups according to the access to information. There are computers where each computer has a direct access to main memory, and multiprocessor computers where each processor has its own local memory accessible only to it. In this latter case the transfer of information between local memory and main memory is possible only in larger portions (pages). In the computer considered the main memory is divided into modules MP1, ... MPm, and there are several operational processors OP1, ... OPn, see Figure 1. Each processors has its own local memory LP. The transfer of data and programs is fulfilled across the switch. Each processor can be connected to only one module of main memory and one module of main memory can be connected to only one processor. In the local memory of each processor there are two tasks. One of them is processed and contemporarily for the another task the data and parts of program can be transferred to or from some module of main memory. The transfer is made in portions of fixed length, the pages. When the processing of one task is interrupted, the processor automatically begins to process another task, if it is prepared, i.e. has all information needed. Otherwise the processor waits for the first of the two tasks, that which is prepared sooner. The interruption of the task is the sign of a demand for the transfer of one or more pages between local memory and the main memory. The attendance of all interruptions. the assignment of tasks and the modules of main memory to processors, the decision in controversial situations etc. solves the control processor CP. The computer is characterized by - number of processors -n, - number of modules of main memory -m. - the mean time for working up of the interruption in the control module $T_1$ , - the mean time for transfer of one page between a module of main memory and the local memory $T_p$ , - the mean time of uninterrupted processing of one task, i.e. the mean time for which the processor computes till the interruption occurs tz, and the statistics of this time. This parameter includes several influences. In the first place the speed of the processor, the capacity of the local memory etc., - the delay before each page transfer To, - the switching time of the switch T, - the percentage of cases when the interruption needs two page transfers between local and main memory - p<sub>2</sub>. The parameters established by the simulation are - the percentage of usage of the processors v, - the number of processors idle, - the length of queue interruption waiting for attendance in control processor, - the length of queue of the tasks waiting for operational processor. We need to know the dependance of these parameters of the computer. ## 3. Model for the simulation The modul of the computer was written in GPSS V. When starting the simulation, there are generated as many interruptions as equals to the number of tasks being possible to process in all an processors, i.e. 2n. Each interruption is characterized by the number of the processor from which the interruption comes and the number of the task in the processor (1 or 2). Further to each interruption are assigned the priority, the number of module of main memory to which the data from local memory should be send (zero denotes that no data will be transferred in this direction), and the number of module from which new data or part of program should be transferred to local memory. Contemporarily all n processors come to idle state waiting for the first task prepared. In Figure 2 the left hand column of blocks roughly describes the function of the control processor and the transfers between local memories and the main memory. The right hand column of blocks denotes the function of one operational processor. In Figure 2 there are two main queues SERVE and IDLE. The queue SERVE is a queue of interruptions waiting for a module of main memory and for the way in the switch from the module to processor. The transfer is impossible if the module of main memory is connected to another processor, or if the processor is connected to another module of main memory. The queue SERVE is a queue according to priority without interruption of service already started. The queue IDLE express the waiting for the first prepared task in the processor. The number of objects in this queue is just a number of processors idle. # 4. The influence of delays in the procesor In Figure 3 a simplified diagram of delays in the system is shown. The indivi- dual interruptions run in the direction of arrows and different blocks delay them. The number of interruptions is twice the number of processors. In Figure 3 $\rm t_p$ denotes the mean total time needed for each contact between the main memory and the local memory. It holds $$t_p = (1 + p_2) (T_p + T_s).$$ (1) In Figure 3 x denotes the time when the processor is idle. The delay in the control processor for one interruption is $$t_{cp} = T_1 + T_2 (1 + p_2).$$ Let us assume, that if the interruption in some processor occurs, nearly in all cases the processor immediately starts to compute another task in its local memory. Then in the loop in Figure 3 run only n interruptions, n others stay in the processors. From this we can derive the percentage of usage of the processors as follows: $$v = \frac{t_z}{x + t_z} = \frac{t_z}{n \cdot t_{cp} + t_p - t_z}. (2)$$ This formula represents the upper, optimistic estimation, because these facts are neglected - the conflict situation in the switch, - only n interruptions running in the loop are assumed, - the time of the uninterrupted processing has a large deviation (equal to $\mathbf{t_z}$ if it has an exponential distribution). This simplest analytical model could be made more precise. For the interpretation and check of results obtained by simulation it is sufficient. ## 5. The main results of simulation The parameters of the computer considered can be established sufficiently reliable. The most problematical parameter is the mean time of uninterrupted processing $\mathbf{t_z}$ . It depends heavily on the character of programs and data, and on page size and capacity of local memory. Therefore it has close connection with hit relation /7/. Our target is not to analyze this connection; for us $\mathbf{t_z}$ is simply one of the parameters of the system. It is clear, that the relative parameter as the percentage of processor usage v does not depend on real magnitude of mean values of individual delays but on their ratio. Therefore all time statements are related to the mean total time of contact between the main memory and local memory, $t_{\rm D}$ , see (1). In Figure 4 the dependence of v on the ratio $nt_{cp}/t_p$ is shown. The parameter of the individual curves is the ratio $t_z/t_p$ . The ratio m/n is supposed to be 5/3. It is easily seen that for large values of $nt_{cp}/t_p$ the usage of the processors is decreased due to relatively slow work of the control processor. The speed of control processor is sufficient if $nt_{ep}/t_p$ is less than approximately 1.5. Beside the speed of control processor, the usage of processors and therefore the throughput of the system is governed by the ratio tz/tn. In Figure 5 the dependence of v on t, for system of 6 processors and 10 modules of main memory is shown. In essence it is the vertical section across the Figure 4. In Figure 5 the influence of different distributions of $t_{_{\rm Z}}$ is shown. In this figure E4 denotes Erlang's distribution with k = 4, RO2 homogenous distribution in interval 0-2, EX denotes exponential distribution. In Figure 6 the dependence of v on number of processors n is given. The hyperbolic curve is the upper bound according to (2). #### 6. Conclusions For the system with rather immense trace of different actions simple and sufficiently general and reliable relations expressed in Figure 4 were found. If the control processor is sufficiently quick, the effective function of the multiprocessor system is given by the speed of processors in relation to the speed of one page transfer, and by behaviour of programs in paging environment. #### References - (1) Baer J.L.: <u>Performance analysis of</u> <u>multiprocessor systems</u>. Proc. of the Eight Hawaii Int. Conf. on System Sciences, Honolulu, Hawaii, Jan.7-9., 1975, pp. 24-26. - (2) Franta W.R.: The mathematical analysis of the computer system modelled as a two stage cyclic queue. Acta Informatica 6 (1976), pp.187-209. - (3) Peterson J., Bulgeen W.: Studies in Markov models of computer systems. Proc. ACM 75 Ann. Conf., Minneapolis, Sept. 20-22, 1975, pp. 102-107. - (4) Baskett F., Smith A.J.: Interference in multiprocessor computer systems with interleaved memory. CACM 19, No. 6, June 1975, pp. 327-334. - (5) Mitchell J., Knadler C., Lunsford G., Yang S.: <u>Multiprocessor performance</u> <u>analysis</u>. National Electronic Conference 1974, pp. 399-403. - (6) Sastry K.V., Kain R.Y.: On the performance of certain multiprocessor organizations. IEEE Trans. on Computers C-24, No.11, Nov. 1975, pp. 1066-1074. - (7) Mattson R.L.: Evaluation of multilevel memories. IEEE Trans. on Magnetics, MAG-7, No.4, Dec. 1971, pp. 814-819. The utilization of processors for m/n = 5/3 Relationship between utilization of processors and meantime of uninterrupted processing Figure 2 Gross Logic flow chart Figure 1 The multiprocessor computer Figure 3 Simplified diagram of delays Relationship between utilization of processors and number of processors $\mathbf{1}$