# **UC Berkeley** # **Research Reports** # **Title** Distributing Synchronous Programs Using Bounded Queues, a coordinated traffic signal application # **Permalink** https://escholarship.org/uc/item/0846252m # **Authors** Zennaro, Marco Sengupta, Raja # **Publication Date** 2005-05-01 Institute of Transportation Studies University of California at Berkeley Distributing Synchronous Programs Using Bounded Queues, a coordinated traffic signal application Marco Zennaro and Raja Sengupta RESEARCH REPORT UCB-ITS-RR-2005-4 # Distributing Synchronous Programs Using Bounded Queues, a coordinated traffic signal application Marco Zennaro and Raja Sengupta, {zennaro, raja}@path.berkeley.edu ### **Abstract** This paper is about the modular compilation and distribution of a sub-class of Simulink programs [10] across networks using bounded FIFO queues. The problem is first addressed mathematically. Then, based on these formal results, a software library for the modular compilation and distribution of Simulink program is given. The performance the library is given. The value of synchronous programming for the next generation of traffic control value is discussed. The adoption of these tools seems to be the natural candidate to address the needs of the traffic engineers. As a case study we present an implementation in Simulink of a controller for coordinated traffic signal in an asymmetric peak hour traffic scenario and we evaluate its computational performances in a distributed environment. # I. INTRODUCTION The synchronous paradigm was introduced in order to simplify the programming of reactive systems, hiding from the user the complexity of interleaving and its associated non determinism [1],[2],[3],[4]. The compiler takes care of translating the synchronous system into sequential code while preserving its semantic [4]-[5]. Synchronous programming languages like ESTEREL [6]-[7], LUSTRE [8], SIGNAL [9], or Simulink [10] are modular and compositional. This is essential for the programming of large control systems. Communication networks enable systems to be distributed, enhancing both concurrency and non-determinism, due to the asynchronous nature of the communication medium. In the synchronous philosophy, the resulting complexity should be hidden from the user and automatically taken care of by the compiler. This is now an active field of research. [11]-[12] propose algorithms to distribute particular subsets of ESTEREL programs, starting with a single synchronous program and splitting it into synchronous subsystems intercommunicating through an asynchronous medium creating what is called a Globally Asynchronous Locally Synchronous (GALS) system [13]. This approach preserves the synchronous semantics but does not maintain or exploit the modular structure in the original synchronous program. Consequently, modification to one module of the synchronous program may require re-compilation and re-distribution of the entire system. We try to achieve the same objectives while retaining any modular structure in the synchronous program in its asynchronous, semantic preserving, equivalent. Our aim is a distribution method in which any modification to a module of the synchronous program will only require recompilation of the altered module. The work was supported by Office of Naval Research (AINS), grant N00014-03-C-0187 and SPO 016671-004 Authors are with the Center for Collaborative Control of Unmanned Vehicles, University of California, 2105 Bancroft way, Berkeley CA 94704 [13] proves such a mapping to GALS, preserving modularity, exists for a particular class of synchronous systems. However, no algorithm computing on a finite representation of synchronous systems is given. In [14] we proposed such an algorithm based on CSP style randezvous [15]. In this paper we present an enhanced approach for the distribution. Our approach is most similar to [16] and [17]. In [16] a blocking scheme is used to distribute discrete event systems. In the discrete event system setting particular attention has to be paid to avoid deadlock and livelock, while we prove this is not necessary for the class of problem we address. In [17] microcircuit components are composed together under the assumption that they are stallable, and the communication between components is modeled using fix sized FIFO queues. Synchronous programs are here modelled using a finitary version of the Synchronous Transition System [18], modified to resemble Simulink. The asynchronous formalism is similar to the I/O automata of [19]. Define a synchronous and asynchronous composition operator. The synchronous composition operator is Simulink-like. The asynchronous composition operator is similar to the one used in Kahn Process Networks [20], [21], [22], but we assume the communication queues to have bounded size so that it can be implemented over reliable FIFO channels. An implementation algorithm to map synchronous programs to asynchronous ones is then given and it is proven that the implementation map preserves the synchronous semantics in the sense of [13]. The main result is that the implementation is a monomorphism with respect to the synchronous and asynchronous compositions. The monomorphism is our argument that a local change can be handled locally and that a subsystem can be re-used in different systems. The theoretical results are then transformed into software. The architecture of the BSDP library and its performances are presented. The results in this paper apply only to single rate Simulink programs without causal loops (see section II). Our compilation targets execution in a network of sequential machines communicating over reliable FIFO channels with bounded memory. This execution model fits the GALS architecture. The class of Simulink programs we consider lie within the endochronous programs [13]. Our compilation process does no global scheduling computation. Thus if a block is changed, only the block itself needs to be re-compiled. On the other hand, our methods only preserve the synchronous semantic in the sense of the logical order of computation. It does not try to meet any real-time deadlines. The paper is organized as follows. Section II introduces synchronous systems and section III asynchronous ones. Section IV formulates the problem mathematically. Section V presents the map from synchronous to asynchronous systems, discusses its abstraction of Simulink, and proves the map preserves the synchronous semantic. Section VI presents the main theorem supporting the distribution of Simulink programs. Section VII presents our software architecture while VIII presents its performances. Section IX presents a coordinated traffic control system developed using the tools introduced in this paper. # II. SYNCHRONOUS SYSTEMS Several synchronous system formalisms exists in the literature. The basic idea behind all of them is of a system evolving through discrete steps. At every step all the variables are updated and they do not change values until the next step is taken. # A. STS and FSTS The Synchronous Transition System formalism, was introduced by Manna and Pnueli in [18] as follows: Definition 1. A Synchronous Transition System (STS) is a pair (P, B) where P is the set of I/O ports and state variables of the system, B is the set of traces admitted by the system; A trace is an infinite sequence of states and a state is a valuation of all the elements of P. If P is a set of ports, $\sigma(P)$ denotes a valuation of the ports in P and $\Lambda(P)$ denotes the set of possible valuations of the ports in P. This lightweight formalism is easy to handle set-theoretically but is not finitary, since the set of traces is not. Hence it cannot be input to an algorithm. In this paper the Finitary STS (FSTS) formalism is used. The FSTS is chosen to relate to Simulink. A system is described in term of input and output ports, and internal state variables. The evolution of the system is captured by a set of functions used to compute the output and update the state. # Definition 2. A Finitary Synchronous Transition System (FSTS) is a tuple $(S, I, O, \sigma_0, \psi_O, \psi_S, \prec)$ where: - 2.a. S is the finite set of state variables of the system; - 2.b. I is the finite set of input ports of the system. I and S are required to be disjoint. - 2.c. O is the finite set of output ports of the system. O and S are required to be disjoint. O and I are not necessarily disjoint (this is needed for feedback as illustrated in the second example in section II-B). - 2.d. $\sigma_0(S)$ is the initial valuation of the state variables. $\sigma_0(s)$ denotes the initial value of the variable $s \in S$ . - 2.e. $\Psi_O$ is a set of computable functions indexed by the output ports, used to compute the system outputs. $\psi_o$ denotes the function indexed by the output port o. Assume the functions to have the standard syntax of a term in first order logic (see [34]), where the symbols occouring are either function symbols, state variables symbols or input port symbols. $I_o$ denotes the set of input port symbols occurring in $\psi_o$ and $S_o$ the set of state variable symbols occurring in $\psi_o$ . $P_o \stackrel{def}{=} S_o \cup I_o$ . For example if $\psi_o \stackrel{def}{=} 3 * i_3 + s_7$ , where $i_3 \in I$ and $s_7 \in S$ then $I_o = \{i_3\}$ and $S_o = \{s_7\}$ . - 2.f. $\Psi_S$ is a set of computable functions indexed by the state variables, used to compute the next system state. $\psi_s$ denotes the function indexed by the state variable s. Assume the functions to have the standard syntax of a term in first order logic (see [34]), where the symbols occouring are either function symbols, state variable symbols or input port symbols. $I_s$ denotes the set of input port symbol occurring in $\psi_s$ and $S_s$ the set of state variable symbol occurring in $\psi_s$ . $P_s \stackrel{def}{=} S_s \cup I_s$ . - 2.g. $\prec$ is an acyclic partial order over $I \cup O$ expressing the causality relation between input and output ports. Assume for example that the output $o_i$ is the sum of the two imputs $i_1$ and $i_2$ . Then $o_i$ depends upon $i_1$ and $i_2$ , written $i_1 \prec o_i$ and $i_2 \prec o_i$ . If P is a set of ports then $\forall p \in P$ . $p \prec p'$ is written as $P \prec p'$ . The concepts $\prec$ and $I_p$ are linked: $\prec$ is defined as follows: $$(\alpha, \beta) \in \prec \Leftrightarrow (\exists \psi_p \in \Psi_O : \alpha \in I_p \land \beta = p) \tag{1}$$ In the following sections $P = S \cup O \cup I$ and $\Psi = \Psi_O \cup \Psi_S$ and suscripts are used when more than one FSTS are used (e.g. $\Psi_O^{s_1}$ refers to the set of output port functions of the FSTS $s_1$ ). A Simulink block can be described by its I/O ports, state variables and the function used to update them. Later we capture Simulink using FSTS to make it work in a distributed computing environment. Some examples are given in II-B. # B. FSTS examples Consider the simple Simulink system in figure 1. It is composed of a single gain block. It reads from the input port $i_1$ and outputs its value multiplied by two on the port $o_1$ . Fig. 1. A simple Simulink program This system can be described as an FSTS $(S, I, O, \sigma_0(S), \Psi_O, \Psi_S, \prec)$ : - $S = \emptyset$ , $I = \{i_1\}$ , $O = \{o_1\}$ , - $\sigma_0(S) = \emptyset$ , - $\Psi_S = \emptyset$ , $\Psi_O = \{\psi_{o_1} \stackrel{def}{=} 2 * i_1\}$ - $\prec = \{(i_1, o_1)\}$ Notice that for the example in figure $1 \ I \cap O = \emptyset$ . In the example in figure $2 \ I \cap O \neq \emptyset$ . It is a block that accepts two inputs $i_1$ and $i_2$ and has two outputs $o_1$ and $o_2$ . $o_1$ and $o_2$ are twice $i_1$ and $i_2$ respectively. As a result $o_2$ is four times $i_1$ . This system can be described as the FSTS $(S, I, O, \sigma_0(S), \Psi_O, \Psi_S, \prec)$ : - $S = \emptyset$ , $I = \{i_1, o_1\}$ , $O = \{o_1, o_2\}$ , - $\sigma_0(S) = \emptyset$ - $\Psi_S = \emptyset$ , $\Psi_O = \{ \psi_{o_1} \stackrel{def}{=} 2 * i_1, \psi_{o_2} \stackrel{def}{=} 2 * o_1 \}$ - $\leftarrow = \{(i_1, o_1), (o_1, o_2)\}\$ Fig. 2. A simple Simulink program with feedback # C. FSTS semantics The semantic is given in terms of traces. Given a set of variables V, $\sigma(V)$ denotes a valuation of them and $\Lambda(V)$ the set of possible value assumed by the variables in V. As for STS systems a trace is defined as follows: Definition 3. A **trace** is an infinite sequence of valuations of $S \cup I \cup O$ . The $i^{th}$ vector of valuations in a trace t is denoted $t_i$ , where $t_i \in \Lambda(S \cup I \cup O)$ . t|P denotes the projection of the trace t over the set of ports and/or variables P. Definition 4. Tuple satisfaction: given a trace t, the tuple $t_i$ satisfies the system s, denoted $s \models t_i$ , if the following holds: $$s \models t_i \quad \Leftrightarrow \quad (i = 0 \Rightarrow \forall s \in S : t_0 | s = \sigma_0(s)) \land$$ $$\forall p \in O : t_i | p = \psi_p(t_i | (I_p \cup S_p)) \land$$ $$\forall s \in S : t_{i+1} | s = \psi_s(t_i | (I_s \cup S_p))$$ Where the semantic of function application is assumed to have no side effect. Definition 5. Trace satisfaction: An FSTS system s admits a trace t (or equivalently the trace t satisfies the system s), written $s \models t$ , as follows: $$s \models t \Leftrightarrow \forall i \in \mathbb{N} \ s \models t_i$$ where $\mathbb{N}$ denotes the set of natural numbers including 0. If $\prec$ is acyclic each $t_i$ and a valuation of the inputs at time i+1 dictates an unique $t_{i+1}$ . On the contrary, if $\prec$ has a cycle, there may be zero or multiple possibilities for $t_{i+1}$ . Some authors have assumed out cycles ([12]), while others have looked for a fixed-point solution ([29]). In this paper we follow the first approach. Thus every FSTS is *input deterministic*, i.e. given an input there is only one possible behaviour. . # D. Compatible FSTS composition In this section a composition operator for FSTS is defined. Once again, this is choosen to include Simulink. A complex system is composed of subsystems with interconnected inputs and outputs ports. Not all systems can be composed. Definition 6. Two FSTS systems $s_1 = (S^{s_1}, I^{s_1}, O^{s_1}, \sigma_0^{s_1}(S^{s_1}), \Psi_O^{s_1}, \Psi_S^{s_1}, \prec^{s_1})$ and $s_2 = (S^{s_2}, I^{s_2}, O^{s_2}, \sigma_0^{s_2}(S^{s_2}), \Psi_O^{s_2}, \Psi_S^{s_2}, \prec^{s_2})$ are **compatible** if and only if: - 6.a. $O^{s_1} \cap O^{s_2} = \emptyset$ , - 6.b. $S^{s_1} \cap S^{s_2} = \emptyset$ , - 6.c. $S^{s_1} \cap (O^{s_2} \cup I^{s_2}) = \emptyset$ , - 6.d. $S^{s_2} \cap (O^{s_1} \cup I^{s_1}) = \emptyset$ , - 6.e. $I^{s_1} \cap I^{s_2} = \emptyset$ , - 6.f. $\prec_a \cup \prec_b$ is acyclic. The first condition ensures the two subsystems do not race to write the same output (this would introduce non-determinism). The second, third and fourth conditions ensure that state variables are local and not shared between components. The fifth condition ensures that every input is received by a unique subsystem and that one output cannot be read by more than one inputs (this is not a limitation as it can be seen in the fourth example in II-B). The last condition ensures the composed system does not have cyclic causal dependencies between variables. Definition 7. The **composition** $s_1 \times_{FSTS} s_2 = (S, I, O, \sigma_0(S), \Psi_O, \Psi_S, \prec)$ of two compatible FSTS is defined as follows: - 7.a. $I = (I^{s_1} \cup I^{s_2}),$ - 7.b. $P_O = (O^{s_1} \cup O^{s_2}),$ - 7.c. $P_S = (S^{s_1} \cup S^{s_2}),$ - 7.d. $\sigma_0(S) = (\sigma_O^{s_1}(S^{s_1}) \cup \sigma_O^{s_2}(S^{s_2})),$ - 7.e. $\Psi_O = \Psi_O^{s_1} \cup \Psi_O^{s_2}$ , - 7.f. $\Psi_S = \Psi_S^{s_1} \cup \Psi_S^{s_2}$ , - 7.g. $\prec = (\prec_a \cup \prec_b)$ . In the following sections $\times_{FSTS}$ is denoted with $\times$ when it will not cause confusion. Notice that $s_1 \times s_2$ is an FSTS because the compatibility hypothesis ensures there are no circular dependences between ports preserving input determinism. As defined, $\times_{FSTS}$ is a partial function over the FSTS set, i.e. it is defined only for compatible FSTS. Some examples are given in II-E # E. FSTS examples Consider the Simulink system in figure 3. The system is composed of two blocks similar to the one described in section II-B. Both multiply the input but they do so by different factors; Fig. 3. A Simulink program composed of multiple blocks The composition is: • $I = \{p_1, p_2\}, O = \{p_2, p_3\}, S = \emptyset$ • $\sigma_0(S) = \emptyset$ • $\Psi_S = \emptyset$ , $\Psi_O = \{\psi_{p_2} \stackrel{def}{=} 2 * p_1, \psi_{p_3} \stackrel{def}{=} (3 * p_2)\}$ • $\prec = \{(p_1, p_2), (p_2, p_3)\}$ The composed system has the expected semantic. It multiplies the input by 6. It may apear that the compatibility conditions as defined in 6 are too restrictive, ruling out systems where the output of a block is feeded to more than one subsystem. This is not the case as illustrated by the example in figure 4. Fig. 4. A Simulink program composed of multiple blocks The system has three subsystems. Two of them are the gain blocks described in the previous examples. The third one is the *duplicate* block that is formally described as: • $$I = \{i_1\}, O = \{o_a, o_b\}, S = \emptyset$$ • $$\sigma_0(S) = \emptyset$$ $$\bullet \ \Psi_S = \emptyset, \, \Psi_O = \{ \psi_{o_a} \stackrel{def}{=} i_1, \psi_{o_b} \stackrel{def}{=} i_1 \}$$ • $$\prec = \{(i_1, o_a), (i_1, o_b)\}$$ The composition of the three block is described with the following FSTS: • $$I = \{i_1, o_a, o_b\}, O = \{o_a, o_b, o_1, o_2\}, S = \emptyset$$ - $\sigma_0(S) = \emptyset$ - $\Psi_S = \emptyset$ , $\Psi_O = \{ \psi_{o_a} \stackrel{def}{=} i_1, \psi_{o_b} \stackrel{def}{=} i_1, \psi_{o_1} \stackrel{def}{=} 3 * o_a, \psi_{o_2} \stackrel{def}{=} 3 * o_b \}$ - $\prec = \{(i_1, o_a), (i_1, o_b), (o_a, o_1), (o_b, o_2)\}$ # F. Properties of FSTS composition Next we state two simple propositions. The propositions merely assert our FSTS formalism has the usual properties of other STS formalisms in the literature. **Proposition II.1.** (FSTS, $\times_{FSTS}$ ) is a commutative monoid, with the identity element being the empty FSTS. *Proof:* Follows from the associativity and commutativity of the union operator and by the fact that the identity element of the union operator is the empty set. **Proposition II.2.** Given two FSTS $s_1$ and $s_2$ , $$s_1 \times_{FSTS} s_2 \models t \Leftrightarrow s_1 \models t | P^{s_1} \wedge s_2 \models t | P^{s_1}$$ Proof: We first prove $\Rightarrow$ by contradiction. Assume that: $$s_1 \times_{FSTS} s_2 \models t \land$$ $(s_1 \not\models t | P^{s_1} \lor s_2 \not\models t | P^{s_2})$ It follows by the definition (5) of trace satisfaction, that: $$\forall i \in \mathbb{N}, \quad s_1 \times s_2 \models t_i \ \land \tag{2}$$ $$\exists i \in \mathbb{N} \quad s_1 \not\models t_i | p^{s_1} \lor s_2 \not\models t_i | p^{s_2} \tag{3}$$ Now pick the smalles i for which (3) holds. There are two possible cases. Either i = 0 or i > 0. Case i > 0: By definition (4) of tuple satisfiability and by (2) it follows that $\exists p \in (O^{s_1} \cup O^{s_2} \cup S^{s_1} \cup S^{s_2})$ such that: $$t_i|p = \psi_p^{s_1 \times s_2}(t_i|P_p^{s_1 \times s_2}) \quad \text{if } p \in O^{s_1 \times s_2}$$ $$t_i|p = \psi_p^{s_1 \times s_2}(t_{i-1}|P_p^{s_1 \times s_2}) \quad \text{if } p \in S^{s_1 \times s_2}$$ (4) By definition (4) of tuple satisfiability and by (3) it follows that $\exists p \in (O^{s_1} \cup O^{s_2} \cup S^{s_1} \cup S^{s_2})$ such that: $$t_{i}|p \neq \psi_{p}^{s_{1}}(t_{i}|P_{p}^{s_{1}}) \quad \text{if } p \in O^{s_{1}}$$ $$t_{i}|p \neq \psi_{p}^{s_{1}}(t_{i-1}|P_{p}^{s_{1}}) \quad \text{if } p \in S^{s_{1}}$$ $$t_{i}|p \neq \psi_{p}^{s_{2}}(t_{i}|P_{p}^{s_{2}}) \quad \text{if } p \in O^{s_{2}}$$ $$t_{i}|p \neq \psi_{p}^{s_{2}}(t_{i-1}|P_{p}^{s_{2}}) \quad \text{if } p \in S^{s_{2}}$$ $$(5)$$ For the minimal i pick a minimal port for which conditions (2-3) hold with respect to $\prec_{s_1 \times s_2}$ . Denote this minimal port by p. We assume that $p \in P_O^{s_1 \times s_2}$ , the case $p \in P_S^{s_1 \times s_2}$ has a similar proof. By definition of FSTS composition it follows that either $p \in O^{s_1}$ or $p \in O^{s_2}$ . Assume that $p \in O^{s_1}$ . The proof for $p \in O^{s_2}$ is the same up to a change of superscipt. Now: $$t_i|p = \psi_p^{s_1 \times s_2}(t_i|P_p^{s_1 \times s_2}) \quad \text{from (4)}$$ $$= \psi_p^{s_1}(t_i|P_p^{s_1}) \quad \text{by def. of FSTS comp.}$$ (6) But this contradict (5). Case i=0: By definition (4) of tuple satisfiability and by (2) the following must hold: $\exists p \in (O^{s_1} \cup O^{s_2} \cup S^{s_1} \cup S^{s_2})$ . $$t_i | p = \sigma_0^{s_1 \times s_2}$$ if $p \in S^{s_1 \times s_2}$ (7) $t_i | p = \psi_p^{s_1 \times s_2}(t_i | P_p^{s_1 \times s_2})$ if $p \in O^{s_1 \times s_2}$ By definition (4) of tuple satisfiability and by (3) the following must hold: $\exists p \in (O^{s_1} \cup O^{s_2} \cup S^{s_1} \cup S^{s_2})$ . $$t_{i}|p \neq \sigma_{0}^{s_{1}}(p) \quad \text{if } p \in S^{s_{1}}$$ $$t_{i}|p \neq \psi_{p}^{s_{1}}(t_{i}|P_{p}^{s_{1}}) \quad \text{if } p \in O^{s_{1}}$$ $$t_{i}|p \neq \sigma_{0}^{s_{2}}(p) \quad \text{if } p \in S^{s_{2}}$$ $$t_{i}|p \neq \psi_{p}^{s_{2}}(t_{i}|P_{p}^{s_{2}}) \quad \text{if } p \in O^{s_{2}}$$ (8) For i=0, pick a minimal port for which the above conditions hold with respect to $\prec_{s_1\times s_2}$ and denote it p. If $p\in O^{s_1\times s_2}$ , we can follow the same proof as the previous case. Therefore let $p\in S^{s_1\times s_2}$ . Assume that $p\in P^{s_1}_S$ . The proof for the case $p\in S^{s_2}$ is the same up to a change of superscript. By definition of FSTS composition, given the assumption $p \in S^{s_1}$ , from (7) follows that: $$t_i|p = \sigma_0^{s_1}(p).$$ But this contradicts (8). We now prove the second implication $\Leftarrow$ by contradiction. Assume that: $$s_1 \times_{FSTS} s_2 \not\models t \land$$ $$(s_1 \models t | P^{s_1} \land s_2 \models t | P^{s_2})$$ It follows by the definition (5) of trace satisfaction that: $$\forall j \in \mathbb{N} \quad s_1 \models t_j | p^{s_1} \wedge s_2 \models t_j | p^{s_2} \wedge \tag{9}$$ $$\exists i \in \mathbb{N} \quad s_1 \times s_2 \not\models t_i \tag{10}$$ Now pick i to be the smallest number for which (10) holds. There are two possible cases. Either i = 0 or i > 0. Case i > 0: By definition (4) of trace satisfaction and by (9) it follows that: $\exists p \in (O^{s_1} \cup O^{s_2} \cup S^{s_1} \cup S^{s_2})$ . $$t_{i}|p = \psi_{p}^{s_{1}}(t_{i}|P_{p}^{s_{1}}) \quad \text{if } p \in O^{s_{1}}$$ $$t_{i}|p = \psi_{p}^{s_{1}}(t_{i-1}|P_{p}^{s_{1}}) \quad \text{if } p \in S^{s_{1}}$$ $$t_{i}|p = \psi_{p}^{s_{2}}(t_{i}|P_{p}^{s_{2}}) \quad \text{if } p \in O^{s_{2}}$$ $$t_{i}|p = \psi_{p}^{s_{2}}(t_{i-1}|P_{p}^{s_{2}}) \quad \text{if } p \in S^{s_{2}}$$ $$(12)$$ By definition (5) of trace satisfaction and by (10) it follows that: $\exists p \in (O^{s_1} \cup O^{s_2} \cup S^{s_1} \cup S^{s_2})$ . $$t_{i}|p \neq \psi_{p}^{s_{1} \times s_{2}}(t_{i}|P_{p}^{s_{1} \times s_{2}}) \quad \text{if } p \in O^{s_{1} \times s_{2}}$$ $$t_{i}|p \neq \psi_{p}^{s_{1} \times s_{2}}(t_{i-1}|P_{p}^{s_{1} \times s_{2}}) \quad \text{if } p \in S^{s_{1} \times s_{2}}$$ (13) For the minimal i for which conditions (10) holds, pick a minimal port for which conditions (9-10) hold with respect to $\prec_{s_1 \times s_2}$ . Denote this minimal port as p. We assume that $p \in O^{s_1 \times s_2}$ . The case $p \in S^{s_1 \times s_2}$ has a similar proof. By definition of FSTS composition it follows that either $p \in O^{s_1}$ or $p \in O^{s_2}$ . Assume that $p \in O^{s_1}$ (the proof for $p \in O^{s_2}$ is the same up to a change of superscipt). Now: $$t_i|p \neq \psi_p^{s_1 \times s_2}(t_i|P_p^{s_1 \times s_2}) \quad \text{from (13)}$$ $$= \psi_p^{s_1}(t_i|P_p^{s_1}) \quad \text{by def. of FSTS comp.}$$ (14) But this contradicts (11). Case i=0: By definition (5) of tuple satisfiability and by (10) follows that: $\exists p \in (O^{s_1} \cup O^{s_2} \cup S^{s_1} \cup S^{s_2})$ . $$t_{i}|p = \sigma_{0}^{s_{1}}(p) \quad \text{if } p \in S^{s_{1}}$$ $$t_{i}|p = \psi_{p}^{s_{1}}(t_{i}|P_{p}^{s_{1}}) \quad \text{if } p \in O^{s_{1}}$$ $$t_{i}|p = \sigma_{0}^{s_{2}}(p) \quad \text{if } p \in S^{s_{2}}$$ $$t_{i}|p = \psi_{p}^{s_{2}}(t_{i}|P_{p}^{s_{2}}) \quad \text{if } p \in O^{s_{2}}$$ (15) $\exists p \in (O^{s_1} \cup O^{s_2} \cup S^{s_1} \cup S^{s_2})$ . $$t_i|p \neq \sigma_0^{s_1 \times s_2}(p) \qquad \text{if } p \in S^{s_1 \times s_2}$$ $$t_i|p \neq \psi_p^{s_1 \times s_2}(t_i|P_p^{s_1 \times s_2}) \quad \text{if } p \in O^{s_1 \times s_2}$$ $$(16)$$ For i=0 pick a minimal port for which the above conditions hold with respect to $\prec_{s_1\times s_2}$ . Denote this port with p. If $p\in O^{s_1\times s_2}$ , we can follow the same proof as the previous case. Therefore $p\in S^{s_1\times s_2}$ . Assume that $p\in S^{s_1}$ . The proof for the case $p\in S^{s_2}$ is the same up to a change of superscript. By definition of FSTS composition, given the assumption $p\in S^{s_1}$ , from (16) follows that: $$t_i|p \neq \sigma_0^{s_1}(p)$$ but this contradict (15). This conclude the proof. # III. ASYNCHRONOUS SYSTEMS There are many asynchronous system formalisms in the literature. One of them is the asynchronous version of STS, called the Asynchronous Transition System (ATS) model, introduced by Benvenieste in [3]. In ATS an asynchronous system is a couple $(P_a, B_a)$ where $P_a$ is the set of I/O ports and $B_a$ the set of the possible behaviors. A behavior is an infinite sequence of valuations and a valuation is a couple (port number, value). Again the simplicity of the model makes it easy to handle it set-theoretically, but we seek a finitary formalism to be input to an algorithm. Instead we use automata augmented with queue variables. We call them Reactive Automata (RA). A reactive automaton is a labeled finite automaton communicating through shared variables. It is a discrete version of the IO-automata described in [19] augmented with communication ports. $\mathbb V$ denotes the set of variables, $\mathbb P$ the set of ports and for any port p in $\mathbb P$ , $\beta(p)$ is the bound (maximum capacity) of the queue p. Formally an RA is a tuple $(L, l_0, V, \sigma_0(V), P_I, P_O, T)$ where - L is a finite set of locations of the automaton; - $l_0$ is the initial location, $l_0 \in L$ ; - V is a finite set of variables read and written only by the RA; - $\sigma_0(V)$ is the initial value of the state variables; - P<sub>I</sub> is a finite set of communication ports, considered as environmental queues read by this RA; - $\bullet$ $P_O$ is a finite set of communication ports, considered as environmental queues, written by this RA; - T is a finite set of labeled transitions of the form $(l_i, l_f, (c, A))$ where $l_i, l_f \in L, c$ is a boolean condition over the values of the elements in V. A is defined by the following grammar: $$A \rightarrow ?p(v)$$ where $p \in P_I$ and $v \in V$ $$A \rightarrow !p(v)$$ where $p \in P_O$ and $v \in V$ $A \to v := f(V_1)$ where $v \in V$ , $V_1 \subseteq V$ , $f \in \mathbb{F}(V_1)$ is the set of functions with the standard syntax of a term in first order logic (see [34]), where the symbols occouring are either function symbols or variable symbols in $V_1$ . In the following sections P denotes the set $P_I \cup P_O$ . An example of an RA is given in figure 5 and is fomalized as the following RA: $$(\{W, P, S\}, W, \{v_1, v_2\}, \{0, 0\}, \{input\}, \{output\},$$ $$\{(W, P, True, ?input(v_1), (P, S, True, v_2 := v_1 + 1), (S, W, True, !output(v_2)\})$$ Fig. 5. A simple reactive automaton # A. RA semantic The semantic of an RA is in terms of runs and traces. Definition 8. A **run** of a Reactive Automaton is an infinite sequence of (location, variables valuation, transition, ports valuation) tuples. The actions are reads denoted p(v), writes denoted p(v), computations denoted v := f(V), and the silent action denoted $\epsilon$ . The silent action is introduced to denote the reception of data in a input queue due to an action of the environment. A transition with an input action removes the element at the head of an input port and writes it to an internal state variable, while a transition with an output action adds the value of a variable to the tail of an output port. Definition 9. A reactive automaton **trace** is a tuple, where each element of the tuple is an infinite sequence of valuations for a particular variable of the reactive automaton. The $i^{th}$ valuation of a variable v in a trace t is denoted by $(t|v)_i$ . The following is a representation of the initial part of a run of the RA in figure 5 for the input port valuation $\{1\}$ and the output port valuation $\emptyset$ : $$(W, (0,0), True \rightarrow ?Input(v_1), (<1>, \emptyset)),$$ $(P, (1,0), True \rightarrow v_2 := v_{1+1};, (\emptyset, \emptyset)),$ $(S, (1,2), True \rightarrow !Output(v_2), (\emptyset, \emptyset)), (W, (1,2), -, (\emptyset, <2>)), ...$ where W, P and S are the wait for input, Process Input and Send Output location respectively and the second element is a valuation of $v_1$ and $v_2$ , and the third element is a valuation for the two ports Input and Output. Thus mathematically a run is a sequence of tuples like the one above. The $i^{th}$ tuple in a run r is denoted by $r_i$ and its element are extracted using projection, for example $r_i|location$ denotes the location element of the tuple $r_i$ . Given a run, the **associated trace** can be computed by examining the update action on every variable of the RA, i.e. the $i^{th}$ element of the sequence associated with the state variable v is given by the $i^{th}$ update on that variable. A variable v can be updated in two possible ways: because of a read action p(v), or because of a computation action v := f(V). Given a RA run $v = (r_0, r_1, r_2, ...)$ , p(t|v) is computed extracting a sequence p(v) from v such that for all v is an update action for v and for all v is not an update action for v. An update action for v is an input action on the form v is an update of For the previous run, the associated trace is <(0,1,...),(0,2,...)> where the first and the second sequences are the successive valuations of $v_1$ and $v_2$ respectively. Definition 10. **Tuple satisfaction**: Given a reactive automaton run r, we say that the tuple $r_i$ satisfies a RA w, denoted $w \models r_i$ iff the following holds: $$(i = 0 \Rightarrow (r_0|_{location} = l_0 \land r_0|V = \sigma_0(V) \land \forall p \in P_0 \ r_0|p = \emptyset)) \land$$ $$(r_i|_{action} = \epsilon \Rightarrow \forall v \in Vr_i|v = r_{i+1}|v \land$$ $$\forall p \in P_O \ . \ (r_i|p = r_{i+1}|p \lor r_{i+1}|p = tail(r_i|p)) \land$$ $$\forall p \in P_I \ . \ (r_i|p = tail(r_{i+1}|p)) \lor (r_i|p = r_{i+1}|p)) \lor$$ $$(\exists (s, s', (c, a)) \in T \Rightarrow r_i|_{location} = s \land r_{i+1}|_{location} = s' \land$$ $$c \models r_i|(V \cup P) \land r_{i+1}|(V \cup P) = act(a, r_i|(V \cup P)))$$ Observe that the values of a port may change value without any input or output by the component, by its environment, simulating the reception of a message through that port, through an $\epsilon$ -transition. At the same time, by the definition of act in the next paragraph, input actions on empty input ports and output actions on full output ports are not defined. Hence input and output actions are blocking. Assume for now that $P_I \cup P_O = \{p_1, ..., p_m\}$ and that $V = \{v_1, ..., v_n\}$ . Then the function act is defined as follows: $$act(a, \sigma(p_1), ..., \sigma(p_m), \sigma(v_1), ..., \sigma(v_n)) =$$ $$\begin{cases} (\sigma(p_1),..,\sigma(p_m),\\ \sigma(v_1),..,\sigma(v_{j-1}),\sigma(f)(\sigma(v_{i_1}),..,\sigma(v_{i_k})),\sigma(v_{j+1}),..,\sigma(v_n)) & \text{if } a="v_j:=f(v_{i_1},..,v_{i_k})"\\ (\sigma(p_1),..,\sigma(p_{j-1}),push(\sigma(v_i),\sigma(p_j)),\sigma(p_{j+1}),..,\sigma(p_m),\\ \sigma(v_1),..,\sigma(v_n)) & \text{if } a="!p_j(v_i)" \wedge \neg full(\sigma(p_j))\\ (\sigma(p_1),..,\sigma(p_{j-1}),tail(\sigma(p_j)),\sigma(p_{j+1}),..,\sigma(p_m),\\ \sigma(v_1),..,\sigma(v_{i-1}),head(\sigma(p_j)),\sigma(v_{i+1}),..,\sigma(v_n)) & \text{if } a="?p_j(v_i)" \wedge \neg empty(\sigma(p_j)) \end{cases}$$ denotes the variable and port valuation. The function $full,empty,head,tail$ and $push$ are the standard where $\sigma(.)$ denotes the variable and port valuation. The function full, empty, head, tail and push are the standard operations over bounded size queues. Assume the semantic of function application to be the same used in the case of FSTS. In particular, a function evaluation has no side effects. Definition 11. Run satisfaction: A run r satisfies a reactive automaton w, denoted $w \models r$ iff: $$\forall i \in \mathbb{N} \ w \models r_i$$ Definition 12. Trace satisfaction: A trace t satisfies a RA w, denoted $w \models t$ iff there is a run r such that $w \models r$ and t is associated to r. We now define a composition operator $\times_{RA}$ for reactive automata. Definition 13. Given two reactive automata $(L^1, l_0^1, V^1, \sigma_0^1(V^1), P_I^1, P_O^1, T^1)$ and $(L^2, l_0^2, V^2, \sigma_0^2(V^2), P_I^2, P_O^2, T^2)$ they are **compatible** if the following condition hold: $$V^1 \cap V^2 = \emptyset \ \land \ P_O^1 \cap P_O^2 = \emptyset \ \land \ P_I^1 \cap P_I^2 = \emptyset.$$ The first conjunct requires the variables of each RA to be local. The last two say that two distinct automata cannot write the same port or read the same port. Definition 14. Reactive automaton composition: Given two compatible reactive automata $w_1 = (L^1, l_0^1, V^1, \sigma_0^1(V^1), P_I^1, P_O^1, T^1)$ and $w_2 = (L^2, l_0^2, V^2, \sigma_0^2(V^2), P_I^2, P_O^2, T^2)$ Their composition $w_1 \times_{RA} w_2$ is defined as the automaton $(L, l_0, V, \sigma_0(V), P, T)$ where: - 1) $L = \bigcup_{l_1 \in L^1, l_2 \in L^2} \{\{(w_1, l_1), (w_2, l_2)\}\}$ - 2) $l_0 = \{(w_1, l_0^1), (w_2, l_0^2)\}$ - 3) $V = V^1 \cup V^2$ - 4) $\sigma_0(V) = \sigma_0(V)^1 \cup \sigma_0(V)^2$ - 5) $P_I = (P_I^1 \cup P_I^2)$ - 6) $P_O = (P_O^1 \cup P_O^2)$ - 7) $T = \{(s, d, c, a) | ((s|L_1, d|L_1, c, a') \in T^1) \land (s|L_2 = d|L_2)) \lor ((s|L_2, d|L_2, c, a) \in T^2) \land (s|L_1 = d|L_1)) \}$ This is an interleaving of the executions of the two original automata. **Lemma III.1.** $(RA, \times_{RA})$ is a commutative monoid, with the identity element being the empty RA. *Proof:* Follows from the associativity and commutativity of the union operator and by the fact that the identity element of the union operator is the empty set. $\prod_{w \in W} w$ denotes an n-ary composition of RA's. Lemma (III.1) shows this is well-defined as the usual extension of the binary operator $\times_{RA}$ . Definition 15. Given a run w of the automaton $\prod_{w \in W} w$ , the projection of the product to one of the factors $w \in W$ is formally defined as follows: $\forall i \in \mathbb{N} . (r|w)_i|_{location} = l \wedge (w, l) \in (r_i|_{location}) \wedge$ - . $\forall v \in V^w . (r|_w)_i|_v = (r_i|_v) \land$ - $(r_i|_{transition}) \in w \Rightarrow (r|_w)_i|_{transition} = (r_i|_{transition}) \land$ - $(r_i|_{transition}) \notin w \Rightarrow (r|_w)_i|_{transition} = \epsilon \land$ - . $\forall p \in (P_O^w \cup P_I^w) \cdot (r|_w)_i|_p = (r_i|_p)$ Every tuple $r_i$ of the run of the product is projected to the variables and locations of w and the tuple with transition not belonging to w|T are replaced with a silent transition. **Lemma III.2.** Given two compatible reactive automata $w_1$ and $w_2$ and given a run r of their composition, the following holds: $$(w_1 \times w_2 \models r) \Rightarrow (w_1 \models r | w_1 \land w_2 \models r | w_2)$$ *Proof:* Follows from the observation that every $r_i$ in r belongs to $r|w_1$ or to $r|w_2$ . This is so because the transition in each tuple belongs to one of the two automata or it is an $\epsilon$ action. If the action belongs to $r|w_1$ , by definition of RA composition, it does not modify the location, variables or output ports of $w_2$ and viceversa. RA can be easily compiled to run on a sequential machine. A product of reactive automata could be compiled in a few ways. The composition can be carried out generating a third automaton, or the two original automata can be run in parallel as long as the following hyphothesis (embedded in our definition of satisfaction) holds: **Hyphothesis III.3.** The communication queues are FIFO queues, the values are not lost and their order is maintained. In the second approach the composition can be implemented within a single machine between processes using monitors and semaphors (see [30]), as well as with 3-way handshakes protocols over a network (see [31]). This means we can compose RAs located at different sites across networks. In section (VII) we will explore an approach that takes full advantage of the distribution of the code (maximising pipeline gain). # IV. PROBLEM STATEMENT Given the definition of FSTS and RA in the previous sections, we can now formally define our problem. Figure 3 illustrates the research program. First we need to find a way to associate RA and FSTS traces, that is to say we need a trace map $\chi : \mathbb{T}_{RA} \to \mathbb{T}_{FSTS}$ where $\mathbb{T}_{RA}$ and $\mathbb{T}_{STS}$ are the set of traces of STS and RA respectively. In [13] the following definition of $\chi$ is given: Definition 16. $$t' = \chi(t) \Leftrightarrow \forall i \in \mathbb{N} \ \forall v \in V \ . \ (t|_v)_i = t'_i|_v$$ We need to find a way to implement FSTS as RA while preserving the synchronous semantic, that is to say we need to find an implementation map $\phi : \mathbb{F}STS \to \mathbb{R}A$ such that the following holds: $$\forall w \in RA \ \forall s \in FSTS \ . \ w = \phi(s) \Rightarrow (r \models t \Leftrightarrow s \models \chi(t)) \tag{17}$$ If this holds then $\phi$ maps a synchronous system into an asynchronous system while preserving the synchronous semantic. It has been proved in [13] that for the set of *endochronous* programs such a $\phi$ exists. In section V we define a $\phi$ for the class of FSTS. So far we have just obtained what a Simulink compiler does, or what is done in [4]. Given such maps we can now formulate our problem (like [13]) as follows: we seek a composition operator $\times_{RA}$ such that, for any two FSTS $s_1$ and $s_2$ and RA $w_1$ and $w_2$ , the following holds: $$w_1 = \phi(s_1) \land w_2 = \phi(s_2) \Rightarrow$$ $$(w_1 \times_{BA} w_2 \models t \Leftrightarrow s_1 \times_{STS} s_2 \models \chi(t))$$ $$(18)$$ If this holds and if the composition operator $\times_{RA}$ can be implemented across a network then this constitutes a way to distribute the synchronous system $s_1 \times_{STS} s_2$ across a network while preserving its synchronous semantic. It has been proved in [13] that when the pair $(s_1, s_2)$ is *isochronous* than such an operator exists. In section VI we prove that property (18) holds if the two synchronous system are *compatible* (as defined in section II). Thus we claim $\phi$ is a monomorphism between (FSTS, $\times_{FSTS}$ ) and (RA, $\times_{RA}$ ). Fig. 6. A graphical representation of property (18) # V. IMPLEMENTATION OF FSTS SYSTEMS In this section $\phi$ , a mapping of FSTSs into RAs is given. It is then proven that the $\phi$ satisfies(17). $\phi$ is defined by the following algorithm: # Algorithm $\Phi$ Inputs: an FSTS $s=(S, I, O, \sigma_0(S), \psi_O, \psi_S, \prec)$ ``` Outputs: An RA r = (L, l_0, V, \sigma'_0(V), P_I, P_O, T) that implements the input system 1 \quad P_I := \{ p_j | j \in I \backslash O \} 2 P_O := \{p_i | j \in O \setminus I\} 3 \quad V = I \cup O \cup S 4 \quad \forall i \in (I \cup O) \ . \ \sigma'_0(i)| = 0 5 \forall j \in S : \sigma'_0(j) = \sigma_0(j) l_0 := l_{root} 7 \hspace{0.5cm} (N,E) := CG( \prec | (I \cup O), (I \cup O), root, leaf) 8 For all n \in N add l_n in L 9 For all (n, n', j) \in E do if j \in (I \backslash O) then do 10 add (l_n, l_{n'}, (true, ?p_j(j))) to T 11 12 od if j \in (O \backslash I) then do 13 14 add l_{n,i} in L add (l_n, l_{n,j}, (true, j := \psi_j(V|P_j))) to T 15 add (l_{n,j}, l_{n'}, (true, !p_j(j))) to T 16 17 od if j \in (O \cap I) then do 18 add (l_n, l_{n'}, (true, j := \psi_j(V|P_j))) to T 19 20 od 21 od 23 Let < be any linearization of | |_S 24 (N, E) := CG(<), (S), leaf, root) 25 For all n \in N add l_n in L 26 For all (n, n', j) \in E do add (l_n, l_{n'}, (true, j := \psi_j(V|P_i)) to T 27 28 od ``` # Algorithm CG (Compute Graph) ``` Input: (\prec, P, root, leaf) where \prec is a partial order over a set P, the set P, and two labels root, leaf Output: A graph (Nodes, Edges) 1 Nodes := \{root, leaf\} Edqes := \emptyset 3 % max-int is a global variable that holds the highest integer used to label a node counter := max - int + 1 \forall linearization w = (w_1, w_2, ..., w_m) of \prec in P do 5 pointer = root For all i \in [1, m] do 6 7 if (pointer, n, w_i) \in Edges do pointer = n 8 else do 9 add n_{counter} to Nodes 10 add (pointer, n_{counter}, w_i) to Edges 11 pointer := n_{counter} 12 counter + + 13 od 14 od 15 od ``` The algorithm is guaranted to terminate for every FSTS. All the for loops terminate in finitely many steps because the set of variables and ports of an FSTS is finite. If $\prec$ is not acyclic then the algorithm cannot be applied because $\prec$ would not be linearizable. Some lemmas are now proved. Replace the sinks in *Nodes* and *Edges* with *leaf* **Lemma V.1.** Compute $Graph(\prec, P, root, leaf)$ produces an acyclic graph with source, named root, and sink, named leaf. Every path in the graph from source to sink has one and only one edge labelled with an element of P. Moreover if $p' \prec p$ and $\{p, p'\} \subseteq P$ then the edge labelled p' appears before the one labelled p in every path from root to leaf. *Proof:* Every time an edge is added (on line 11), it does not create a loop because it connects an existing node to a new one. Line 17 does not create any loop since it flattens all the sinks into a single sink. Therefore the graph is acyclic, it has a source *root* and a single sink *leaf*. By construction every path corresponds to a linearization of $\prec$ in V Therefore an element p' of P appears as a label only once in a path and it appears before all the p for which $(p', p) \in \prec$ . **Lemma V.2.** For all w in $\phi[FSTS]$ and every infinite run r of w, r visits the location $l_{leaf}$ and $l_{root}$ infinitely often. **Proof:** Proof: The automaton generated by algorithm $\phi$ are obtained linking two graphs generated by ComputeGraph, so that the source of one is the sink of the other. The only nodes shared by the two graphs are root and leaf. Each graph is acyclic, has finitely many states, one source and one sink by lemma (V.1). Since every run correspond to an infinite length path in the combined graph and the two graphs are acyclic $l_{leaf}$ and $l_{root}$ are visited an infinite amount of times. From lemma (V.2), we see that any run $\mathbf{r} = \langle r_0, r_1, r_2, ... \rangle$ of an RA in $\phi[FSTS]$ has an infinite subsequence $\langle r_{i_0}, r_{i_1}, r_{i_2}, ... \rangle$ such that $\forall k \in \mathbb{N} \ r_{i_k} | location = l_{leaf}$ and $\forall k \in \mathbb{N} \ r_i \neq r_{i_k} \Rightarrow r_i | location \neq l_{leaf}$ . Thus we can write $\mathbf{r}$ equivalently as $\mathbf{r} = \langle u_0, u_1, u_2, ... \rangle$ where $u_0 = \langle r_0, ..., r_{i_0} \rangle, u_1 = \langle r_{i_0+1}, ..., r_{i_1} \rangle, u_2 = \langle r_{i_1+1}, ..., r_{i_2} \rangle$ and so on. We call these $u_i$ 's cycles. We can also define the function cycle(r, n), for a run r and $n \in \mathbb{N}$ as $cycle(r, n) = r_{i_n} | V$ , i.e. as the valuation of V at the $n^{th}$ visit to $l_{leaf}$ , where V is the set of variables of the RA. In the following, the initialization of the state variables is considered the $0^{th}$ write of the variables. **Lemma V.3.** Let $w = \phi(s)$ . In every cycle all the input ports of w are read once and only once. Similarly all the output ports and all the variables of w are written once and only once every cycle. If $v' \prec v$ in s then v' is written before v in every cycle of w. *Proof:* Proof: Follows from lemma (V.1) and the definition of algorithm $\phi$ . **Lemma V.4.** For a given run r of an RA $\phi(s) \in \phi(FSTS)$ , let t be the associated trace. Then the following holds $\forall i \in \mathbb{N} \ \forall v \in V \ . \ (t|_v)_i = cycle(r,i)|v$ . Moreover: $$\begin{split} (\forall i \in \ \mathbb{N} : \ \forall v \in O \ . \ (t|_v)_i &= cycle(r,i)|v = \psi_v(\chi(t)_i|I_v \cup S_v) \wedge \\ \forall v \in S \ . \ (t|_v)_{i+1} &= cycle(r,i)|v = \psi_v(\chi(t)_i|I_v \cup S_v)) \wedge \\ \forall v \in S \ . \ \ (t|_v)_0 &= cycle(r,0)|v = \sigma_0(v) \end{split}$$ *Proof:* By lemma (V.3) in every cycle a variable is written once and only once before hitting $l_{leaf}$ . When a run hits the location $l_{leaf}$ for the $i^{th}$ time, all the variables have been written exactly i times. Write actions are introduced by $\phi$ in lines 15, 19 and 23. Every write to v is $\psi_v$ applied to t|V. By lemma (V.3) we then get $(t|_v)_i = \psi_v((t|_{V_v})_i)$ . The result then follows by definition (16) of $\chi$ . The first theorem stated below asserts algorithm $\phi$ constructs an RA implementing of an FSTS while preserving its semantics in the sense of $\chi$ (the proof is in the appendix). **Theorem V.5.** Algorithm $\phi$ satisfies property (17), i.e. $$\forall w \in RA \ \forall s \in FSTS \ . \ w = \phi(s) \Rightarrow (r \models t \Leftrightarrow s \models \chi(t))$$ *Proof:* First the left to right $(\Rightarrow)$ implication is proved by contradiction. Assume that the implication does not hold. Then the following must hold: $$\exists s \in FSTS, \ \exists t \in \Gamma, \ w = \phi(s) \in RA. \ w \models t \land s \not\models \chi(t)$$ where $\Gamma$ is the set of traces of w. Let $s = (S, I, O\sigma_0(S), \Psi_O, \Psi_S, \prec)$ . Since $s \not\models \chi(t)$ , by definition of FSTS satisfiability, the following must hold: $$\exists i \in \mathbb{N} . s \not\models \chi(t)_i$$ By definition 4 and 5 of FSTS satisfaction it follows that: $\exists i \in \mathbb{N} . \exists v \in (O \cup S)$ . $$v \in O \Rightarrow \chi(t)_i | v \neq \psi_v(\chi(t)_i | I_v \cup S_v) \land$$ (19) $$v \in S \land i > 0 \Rightarrow \chi(t)_i | v \neq \psi_v(\chi(t_{i-1}) | I_v \cup S_v)$$ $$v \in S \land i = 0 \Rightarrow \chi(t)_i | v \neq \sigma_0(v)$$ (20) Assume $v \in O^s$ (the proof for the case $v \in S^s$ is similar). Since by hyphotesis $w \models t$ , by lemma (V.4) the following holds: $$\forall i \in \mathbb{N} . \forall v \in O . t_i | v = \psi_v(\chi(t)_i |_{I_v \cup S_v})$$ (21) Pick i to be the minimal for which (19) holds. Pick then v to be one of the minimal (with respect to $\prec^s$ ) variables for which (19) holds. It then follows from (19) and (21) that: $$\chi(t)_i | v \neq t_i | v$$ But this contradict definition (16) of $\chi$ . Hence the first implication of the theorem holds. The proof for the right to left ( $\Leftarrow$ ) is now given. It is shown that for any trace t' of an FSTS s there is a run r of $\phi(s)$ with associated trace t such that $t' = \chi(t)$ . Consider the run r constructed cycle by cycle as follows. Fix any linearization of $\prec |_{O \cup I}$ . This linearization correspond to an unique path from $l_{root}$ to $l_{leaf}$ , where each edge label updates a variable in the order given by the linearization. Extend the linearization with the total order defined on line (22) of $\phi$ so that the state variables in S follow all the others in the order. This order fixes now a unique cycle from $l_{root}$ back to itself, where by lemma (V.3) each variable is updated once and only once and in the order given by this extended linearization. Coonsider the run starting from location $l_0$ with all the variables initialized to $\sigma_0(V)$ . At the begining of every cycle, through through $\epsilon$ transition, all the output ports are emptied and all the external inputs are supplied. Then the run goes through the cycle identified by the selected linearization. The value written in each ports $p_v$ in the $i^{th}$ cycle is $(t_i')|_v$ . The value written in each variable v in the $i^{th}$ cycle is $(t_i')|_v$ . This run is associated by construction to a trace t such that $t' = \chi(t)$ . We need to show that it satisfies w. By lemma (V.4) the values of the variables and of the ports are the ones satisfying w. It is left to show that that w would not deadlock at any point of the run. All the reactive automata generated through $\phi$ have no sink states, and since all their transitions have only true guards, there is always a transition enabled. This means that the execution of w can be blocked only on a read from an empty input queue or a write on a full output queue. The external input ports are written at the beginning of each cycle of the constructed run and, by lemma (V.3), the queue is then read once and only once so there are no blocking reads on an external input queue. At the end of the cycle the queue is empty preventing writes on full queues at the beginning of the next cycle. The external output ports are emptied at the beginning of each cycle of the constructed run and, by lemma (V.3), they are written once and only once per cycle. Hence there are no blocking writes on external output queues. This conclude our proof. A Simulink program goes through the following phases (see figure 7): it starts in the initialization phase computing sample times and parameters, determining the block execution order and allocating memory. Then the loop phase starts, where the following steps are repeated: read the input (input step), compute the output and propagate it (output step) and update the state (state step). Least in the termination phase the memory is released. Fig. 7. Implementation of a Simulink System In Simulink programs without causal loops, the order of computation produced in the initialization step is computed through a linearization of the causality relation between inputs and outputs. The algorithm used by Simulink (Real-Time workshop) for the simulation (implementation) of a system is hence different from the one given in the previous section. For single rate systems with no causal loops the main difference is that an FSTS is not mapped into a RA able to receive its inputs in all the possible orders, but only in a particular order. The subroutine CG is no longer necessary and line 7 is replaced with a routine that constructs a single path graph. Alternatively we can just pass to the CG routine a linearization of $\prec$ instead of $\prec$ . In the next sections $\phi_{sim}$ denotes the algorithm with this modifications. All the claims and proof of the previous section will hold for $\phi_{sim}$ as well. However in the next sections it is showed that $\phi$ can be distributed with fewer assumption than $\phi_{sim}$ . # VI. DISTRIBUTION OF FSTS SYSTEMS We have proved in the previous section that there is a map $\phi$ between FSTS and RA satisfying property (17). We now prove that the composition operator $\times_{RA}$ as introduced in section III satisfies property (18). Since two different RAs may be running on different machines, they do not share the same notion of time. But, if we are using $\times_{RA}$ , then we can claim the following: if a variable v in one RA is valuated before writing on a port P and on the other side a variable v' is valuated after reading from P then we can be sure that v has been valuated before v'. For the class of reactive automata implementing an FSTS, i.e. $\psi[FSTS]$ this is formalized by the following observation: **Proposition VI.1.** Consider two compatible RA $w_1 = \phi(s_1)$ and $w_2 = \phi(s_2)$ with variables $v_1$ , $v_3$ of $w_1$ and $w_2$ respectively and a port $p_2$ written by $w_1$ and read by $w_2$ . If in each cycle of $w_1$ , $v_1$ is written before $p_2$ is written and in each cycle of $w_2$ , $p_2$ is read before $v_3$ is written by $w_2$ , then $v_1$ is written for the $i^{th}$ time after $v_3$ is written for the $i^{th}$ time in $w_1 \times_{RA} w_2$ . *Proof:* Since the two RA are compatible only one automaton can write on any port. By hyphothesis only $w_1$ writes on $p_2$ and by (III.3) no messages are lost. Hence, since every read operation removes an element from the queue and that the queues are initially empty, for $w_2$ to be reading from $p_2$ (i)<sup>th</sup> times, $w_1$ must have written $p_2$ (i)<sup>th</sup> times. By hyphothesis for $w_2$ to be writing $v_3$ for the i<sup>th</sup> time, it must have read $p_2$ i<sup>th</sup> times. Thus, by hyphothesis on $w_1$ $v_1$ has been written at least i<sup>th</sup> times. We have claimed in section III that $\times_{RA}$ can be implemented across comunicating machines. Hence, we argue that we can distribute a Simulink-like synchronous system across a network with the following theorem: **Theorem VI.2.** The composition operator $\times_{RA}$ satisfies property (18), i.e. for any two compatible FSTS $s = (S^s, I^s, O^s, I_O^s, \Psi_O^s, \Psi_O^s, \Psi_S^s, \prec^s)$ and $s' = (S^{s'}, I^{s'}, O^{s'}, I_O^{s'}, \Psi_O^{s'}, \Psi_S^{s'}, \prec^{s'})$ the following holds: $$\forall t \in \Gamma . \phi(s) \times_{RA} \phi(s') \models t \Leftrightarrow s \times_{STS} s' \models \chi(t)$$ *Proof:* The theorem is proved proving the two implications separately, starting with the left to right $(\Rightarrow)$ implication, now proved by contradiction. Assume that the thesis does not hold. Than the $\exists s, s' \in FSTS, \exists w, w' \in RA, \exists t \in \Gamma^{r \times r'}$ . $$w = \phi(s) \land w' = \phi(s') \land \tag{22}$$ $$(w \times_{RA} w') \models t \land \tag{23}$$ $$(s \times_{STS^*} s') \not\models \chi(t) \tag{24}$$ where $\Gamma^{w\times w'}$ denotes set of traces of $w\times_{RA}w'$ . Assume $w=(L^w,l_0^w,V^w,V_0^w,P_I^w,P_O^w,T^w)$ and $w'=(L^{w'},l_0^{w'},V^{w'},V_0^{w'},P_I^{w'},P_O^{w'},T^{w'})$ , From the definition (5) of trace satisfaction (24) is equivalent to: $$\exists i \in \mathbb{N} \ . \ (s \times_{FSTS} s') \not\models \chi(t)_i$$ Pick the smallest i for which the above condition holds and denote it with i. From definition (4) of tuple satisfaction it then follows that: $$\exists v \in (S^{s \times s'} \cup O^{s \times s'}) \ . \ \chi(t)_i|_v \neq \psi_v^{s \times s'}(\chi(t)_i|_\ell I_v^{s \times s'} \cup S_v^{s \times s'}))$$ (25) Amongst the variables at i satisfying (25) pick a minimal one w.r.t. $\prec_{s\times s'}$ , and denote it v, Assume that $v \in O^{s\times s'}$ . The proof for the case $v \in S^{s\times s'}$ is similar. Assume that in particular $v \in O^s$ . The proofs for the case $v \in O^{s'}$ is the same up to a superscript. Now by contradiction hyphothesis (23) and lemma (III.2) the following hold: $$w \models t$$ Hence, by lemma (V.4) and by the assumption the following hold: $$\forall k \in \mathbb{N} \ \forall y \in O^s \ . \ (t|_y)_k = \psi_y^s(\chi(t)_k|_{(I_v^s \cup S_y^s)}) \tag{26}$$ In particular this holds for y = v and k = i. So that: $$\begin{split} \chi(t)_i|_v &= (t|_v)_i & \text{by definition } (\chi) \text{ of } \chi \\ &= \psi^s_v(\chi(t)_i|_{(I^s_v \cup S^s_v)} & \text{from (26) by lemma (V.4)} \\ &= \psi^{s \times s'}_v(\chi(t)_i|_{I^{s \times s'}_v \cup S^{s \times s'}} & \text{by def. of FSTS comp.} \end{split}$$ But this contradict (25) hence the first implication is proved. The proof of the right to left implication ( $\Leftarrow$ ) is now given. It is shown that for any trace t' of an FSTS $s \times s'$ there is a run r of $\phi(s) \times \phi(s')$ with associated trace t such that $t' = \chi(t)$ . Consider the run r constructed cycle by cycle as follows. Let $E = ((O^{s \times s'} \setminus I^{s \times s'}) \cup (I^{s \times s'} \setminus O^{s \times s'}))$ be the set of external input and output ports. Fix a linearization of $\prec |_E$ . This linearization, projected on the ports of s identifies an unique path from $l_{root}$ to $l_{leaf}$ in $\phi(s)$ . Similarly when projected on the ports of s', it identifies an unique path from $l_{root}$ to $l_{leaf}$ in $\phi(s')$ . In both cases each label of each edge of the paths updates a variable in the order given by the linearization. Extend the linearization with the total orders defined on line (22) of phi. Then the state variables in $S^{s \times s'}$ follow all the other variables in the order. The orders fix a unique cycle from $l_{root}$ back to itself in both $\phi(s)$ and $\phi(s')$ , where by lemma (V.3) each variable is updated once and only once and in the order given by the selected total order. The run starts from location $\{(\phi(s), l_0^s), (\phi(s'), l_0^{s'})\}$ with all the variables initialized to $\sigma_0(V)$ and $\sigma_0^{s'}(V)$ . At the begining of every cycle, through $\epsilon$ transitions, all the external output ports are emptied and all the external inputs are given. Then the run goes through the two automata along the paths identified by the just constructed linearization. The value written in each ports $p_v$ in the $i^{th}$ cycle is $(t_i')|_v$ . The value written in each variable v in the $i^{th}$ cycle is $(t_i')|_v$ . This run is associated by construction to a trace t' such that $t' = \chi(t)$ . We need to show that it satisfies $\phi(s) \times \phi(s')$ . By lemmas (III.2-V.4) the values of the variables and of the ports are the ones satisfying $\phi(s) \times \phi(s')$ . It is left to show that that $\phi(s)$ and $\phi(s')$ would not deadlock at any point of the run. All the reactive automata generated through $\phi$ have no sink states, and since all their transitions have only true guards, there is always a transition enabled. This means that the execution of $\phi(s)$ and $\phi(s')$ can be blocked only on a read from an empty input queue or a write on a full output queue. The external input ports are written at the beginning of each cycle and, by lemma (V.3), the queue is then read once and only once so there are no blocking reads on an external input queue. At the end of the cycle the queue is empty preventing writes on full queues at the beginning of the next cycle. The external output ports are emptied at the beginning of each cycle of the constructed and, by lemma (V.3), they are written once and only once per cycle. Hence there are no blocking writes on external output queues. The only remaining blocking condition possible is on internal input (i.e. ports that belongs to $(I^s \cap O^{s'}) \cup (I^{s'} \cap O^s)$ ). Since they are not internal inputs these ports are empty at the beginning of every cycle. They are written once by one automaton and read once and only once by the other. Hence they are empty at the end of each cycle. By lemma (VI.1) the write action take place before the read action. Thus there is no blocking read or write on internal inputs. This conclude our proof. As noted in section V the implementation algorithm used by Matlab Simulink / RealTime Workshop differs from $\phi$ proposed for FSTS in the sense that it fixes the order in which the input are received and the outputs are computed and propagated to the other subsystems. The results proved in section VI for $\phi$ do not extend in the general case for $\phi_{sim}$ . It suffices to consider the FSTS in figure 8 (taken from [3]). It is easy to see that $s_0$ cannot be compiled through $\phi_{sim}$ without deadlocking if composed with $s_1$ or $s_2$ . If it is compiled to accept $i_1$ before $i_2$ then it will block if composed with $s_2$ . If compiled to accept $i_2$ before $i_1$ it will deadlock when composed with $s_1$ . In reality a Simulink systems reads all the inputs before computing any of the outputs. This means that $s_0$ will deadlock with both $s_1$ and $s_2$ . This shows that as long as the Matlab Simulink interpreter / Realtime Workshop compiler is used, synchronous systems cannot be distributed in the general case. However it can be done in the following particular case: Fig. 8. Three FSTS systems Consider though this particular case: **Theorem VI.3.** Given an FSTS $s' = \prod_{s \in S} s$ , if $\prec^{s'}$ projected to the ports of each subsystem s is a total order (i.e. the external outputs depends on all the external inputs), then for any two compatible FSTS $s = (S^s, I^s, O^s, I^s_O, \Psi^s_O, \Psi^s_S, \prec^s)$ and $s' = (S^{s'}, I^{s'}, O^{s'}, I^{s'}_O, \Psi^s_O, \Psi^s_S, \prec^s)$ the following holds: $$\forall t \in \Gamma . \phi_{sim}(s) \times_{RA} \phi_{sim}(s') \models t \Leftrightarrow s \times_{STS} s' \models \chi(t)$$ *Proof:* Since $\prec^{s'}$ projected over the subsystems is a total order the output of CG is a single path graph with root $l_{root}$ and sink $l_{leaf}$ . As a result $\phi$ and $\phi_{sim}$ produce the same output. The theorem follows. # VII. BDSP ARCHITECTURE In this section the software architecture for the distribution of Simulink programs (see figure 9) is described. We call this architecture Berkeley Distributed Simulink Program (BDSP) library. An initial version of the BDSP library has been implemented using a simple rendezvous scheme. The first version was developed as a proof of concept, a second version, utilising bounded queues as described in this section is currently under development. The current implementation relies on the Simulink interpreter. Because of it the systems are distributed as follows: first the original Simulink model is decomposed into atomic blocks. Then all the broken connections are replaced with external - linkboxes (i.e. S-function boxes we provide). These boxes hide the complexity of the distribution to the user. Fig. 9. BDSP architecture Input and Output external-link boxes structure: the structure of an Input external-link box and of an Output external-link box are the same but for the ports. While the input box has a single input and no outputs the output box should have one output and no inputs. The boxes have three parameters: the IP/port pair for the sender, the IP/port pair for the receiver and a name that is going to be used to resolve for the first two parameters. The box uses two UDP sockets to communicate with the queue manager (UDP is lightweight and since the communication is local there is no need for retransmission). One socket is used to receive messages from the queue manager and the second is used to send messages to it. Queue Manager structure: the structure of the queue manager is shown in the right side of figure 9. It consists of many queues, one for every input or output port of the block. It has a couple of UDP sockets to comunicate with the S-function boxes on the machine and a list of UDP sockets to communicate with the other queue managers. Every queue is associated with two flags (the *datarequested* and *queuefull*) and a counter. A reliable transmission protocol is implemented using a standard retransmission strategy. External-link box to queue manager interface: The life cycle of an external-link box is the same of any Simulink box (described in section V). In the initialization phase the box sends a packet to the queue manager to reserve a queue and pass the IP/port address to the other end of the pipe. If it is an input block it requests its input from the queue manager in the Input Read phase. If the queue is empty it blocks until something is available. The flag datarequested is switched on if the queue is empty. If it is not empty the data is removed from the queue and sent to the box. If it is an output block, in the Ouput Phase the output is sent to the Queue manager. If the queue is not full an ack is sent back to the output box. The box is blocked until the ack is received. If the queue is full and the box is trying to send, the flag Full is switched on. When the queue is empty andthe flag Full is on an ack is sent to the Output box. Queue manager to queue manager interface: the communication protocol between queue managers needs to be reliable and to preserve message order. A possible candidate is TCP, or a UDP with a acknowledgment-timeout protocol implemented on top. When an output queue is not empty the queue manager will try to send the message as soon as possible. It removes the message from the queue only when the ack is received. When it receives a message it will put it on the right queue. If the queue is full it will drop the packet (the message will not be lost, just retransmitted later). # VIII. PERFORMANCE ANALYSIS Code distribution may lead to a system speed-up through concurrency, but it has also a cost overhead associated with the rendezvous communication protocol. In this section this overhead is estimated for the first implementation of the BDSP library as described in section VII. We decompose the system in figure 10 into three subsystems running on two separate Pentium 4 850 Mhz, 512 Mb ram machines. The source and the sink gain are located on the same machine, while the middle gain is run on a second one. A timestamp is recorded by the external-link boxes at the beginning and at the end of each time step. Since the source and sink gain are on the same machine, i.e. they are running according to the same clock, the time stamps can be compared to get a conservative estimate of the overhead due to the rendezvous protocol. Fig. 10. The model used to estimate the overhead The measured overhead is conservative because it includes the middle gain computation time and the two Simulink processes on the first processor are competing on the first computer. The computers are connected through a shared 802.11b wireless ethernet. Fig. 11. A conservative estimate of the distribution overhead The results are plotted in figure 11. The overhead average is smaller that 0.2 seconds and the standard deviation is close to 30 ms. This result is promising considering that we are currently using the Simulink interpreter and not the Real-time workshop compiler. # IX. APPLICATION TO TRAFFIC SIGNAL CONTROL We are working to introduce synchronous programming techniques into traffic signal control. As they are growing rapidly in complexity we see an excellent opportunity for synchronous programming tools as a way to greatly simplify software development for these large-scale systems. In order to maximize the flow and minimize the average waiting time, the cycle length, defined as the time needed to go through all the phases, and the interval splitting defined as the ratio of the green time for the two directions, need to be properly set. The early traffic signal controllers were non-programmable devices. However, with time these devices has reached a high level of sophistication. An example of such a device is the 2070 controller, used widely in California, which support pre-timed, semi-actuated and fully actuated operation rules and support a wide set of sensors. These devices support many pre-defined rules that can be adjusted on the field or remotely (in the case of the 2070 the remote setting is fixed by the National Transportation Communications for ITS protocol, set of standards). At the same time signal control systems are growing spatially. The first dynamically adjustable lights were driven by traffic measurement sensors located next to them. They were isolated. Next it became possible to coordinate all the lights along an arterial to have the lights turn green in succession. Modern systems like [23] seek to coordinate entire downtown urban grids. The entire grid is operated on a common cycle time adjusted on the timescale of tens of minutes as demand changes. Controllers like 2070 are only partially programmable and the programming interfaces are low-level. Furthermore, embedded computing and the wireless revolution are being brought together by the US governments Vehicle Infrastructure Initiative (VII) [24]. It is envisaged that every roadside cabinet will have a general purpose computer with wired or wireless backhaul. A large-signal control system could be developed in high-level tools like Simulink and compiled to suit the hardware architecture at hand. The entire system may compute in a traffic management center with low-level commands going out to the filed or be distributed to compute entirely in field cabinets. As a first step we have used SIMULINK to model the system in figure 12: a major arterial road is intersected by 4 minor low traffic streets. Consider a peak hour asymmetric scenario, where almost all the traffic flow is in one direction on the major street. The flow is maximized by coordinating the traffic lights to create green waves thus a car that just got the right-of-way at the first intersection will get a green at all the intersections (see [25] and [26]). This is done synchronizing the controllers, fixing the same cycle length, and offsetting the beginning of each cycle by a statically determined d \* v, where d is the distance between the two intersections and v is the target traffic speed. Fig. 12. Asymmetric peak hour traffic on a major road intersected by four minor streets If the intersection has inductive loops the vehicle speed can be directly estimated. This value can be passed through a simple filter to make the system resistant to insignificant minor speed fluctuations, while adjusting to significant and permanent changes (due for example to congestion, road construction or minor accident). This actuated scheme is implemented by the Simulink model in figure 13. The average speed in response to the traffic light has been computed using traffic flow theory as described in [28]. The sensor input is passed through a simple filter to make the system resistant to insignificant minor speed fluctuations, while adjusting to significant and permanent changes. Fig. 13. Simulink Model for the distributed traffic controller We have run a simulation of the system where an accident is happening between the first and the second intersection during the $100^{th}$ cycles and it is cleared out during the $180^{th}$ , and a minor one happened between the third and the fourth one during the $150^{th}$ cycle and it is cleared out during the $200^{th}$ . The offsets computed by the last three intersection (the offset for the first one is always 0) computed using the model in figure 13 are plotted in figure 14. Fig. 14. The offset as computed by the model described in figure 13 This model can be compiled and run centrally at the traffic management center, making the traffic signal operating, quoting [27], "exactly they way the designer think it should be controlled". Moreover, as shown in sections VI, the model can be compiled into distributed code, with the same behaviour, that can be run by the controllers without any need of external coordination. The test to evaluate the performances of the system has been carried over the same hardware used in the previous section. In this case the performances have been measured as the total computation time needed to carry a step (i.e. from the end of the previous cycle to the end of the computation of all the offsets). The computation time is on average 0.3 s (the standard deviation is 6 ms). We expect this result to improve when moving from Simulink interpretation to direct execution of the code as generated by Real-Time workshop. Even interpreting the code though, the system largely met the time constraints of the application as described in [27]. # X. CONCLUSION The problem of distributing large scale synchronous systems across a network has been addressed. We defined a synchronous and asynchronous composition operator. The synchronous composition operator is Simulink-like. The asynchronous composition operator is similar to the one used in Kahn process networks. We presented an algorithm to implement a synchronous program into an asynchronous one and we proved the implementation map preserves the synchronous semantics in the sense of [13]. The main result was that the implementation is a monomorphism with respect to the synchronous and asynchronous compositions. The monomorphism is our argument that a local change can be handled locally and that a subsystem can be re-used in different systems. We have presented a software architecture consistent with our mathematics and studied its performantees. We have motivated the development of synchronous programming tools for traffic signal control. ### REFERENCES - [1] E. A. Lee, Concurrent Models of Computation for Embedded Software, Technical Memorandum UCB/ERL M05/2, University of California, Berkeley, 2005 - [2] E. A. Lee and Stephen Neuendorffer, Concurrent Models of Computation for Embedded Software, Technical Memorandum UCB/ERL M04/26, University of California, Berkeley, 2004. - [3] G. Berry, A. Benvenieste, *The synchronous approach to reactive and real-time systems*, Proceedings of the IEEE, 79(9):1270-1282, September 1991 - [4] C. Andre', F. Boulanger, A. Girault, *Software implemenentation of synchronous programs*, IEEE International Conference on Application of concurrency to System Design, June 2001 - [5] C. Andre', M.A. Peraldi, Effective implementation of ESTEREL programs, 5th Euromicro workshop on real-time systems, June 1993. - [6] G. Berry, The Foundations of Esterel, Proof, Language and Interaction: Essays in Honour of Robin Milner, G. Plotkin, C. Stirling and M. Tofte, editors, MIT Press, 1998. - [7] G. Berry, The Constructive Semantics of Pure Esterel, July 2, 1999 - [8] N. Halbwachs, P. Caspi, P. Raymond and D. Pilaud, *The synchronous dataflow programming language Lustre*, Proceedings of the IEEE, vol. 79, nr. 9. September 1991. - [9] B. Houssais The synchronous programming language SIGNAL, a tutorial, IRISA, April 2002 - [10] Learning Simulink 5, MathWorks edition, 2002 - [11] A. Girault, Automatic distribution of synchronous programs, ERIM News, January 2003 - [12] A. Girault, C. Menier, Automatic production of Globally Asynchronous Locally Synchronous Systems, EMSOFT 2002 - [13] A. Benvenieste, B. Caillaud, P. Le Guernic, Compositionality in dataflow synchronous languages: specification and distributed code generation, Information and Computation, vol.163, no.1, 25 Nov. 2000, pp.125-71. Publisher: Academic Press, USA. - [14] M. Zennaro, R. Sengupta Distributing Synchronous Systems with Modular Structure, IEEE 2004 44th Conference on Decision and Control, December 2004 - [15] C.A.R. Hoare, Communicating sequential processes, Prentice Hall, 2003 - [16] Jayadev Misra, Distributed discrete-event simulation, ACM Computing Surveys (CSUR), Volume 18 Issue 1 March 1986 - [17] L. P. Carloni, K. L. McMillan, A. L. Sangiovanni-Vincentelli, *Theory of Latency-Insensitive Design*, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems., 20(9):18, September 2001. - [18] Manna, Pnueli, The temporal logic of reactive and concurrent systems, Springer-Verlag 1992 - [19] N. Lynch, R. Segala, F. W. Vaandrager Hybrid I/O automata, Hybrid System III, LNCS 1066, Springer-Verlag, 1996, p.496-510 - [20] G. Kahn, The Semantics of a Simple Language for Parallel Programming, Proceedings of the IFIP Congress74. North Holland Publishing Company. - [21] G. Khan and D.B.MacQueen, Coroutines and networks of parallel processes, Information Processing, North-Holand Publishing Co. 1977 - [22] E. A. Lee, T. M. Parks, Dataflow process networks, Proceedings of the IEEE, 1987 - [23] http://www.scoot-utc.com - [24] http://www.its.dot.gov/initiatives/initiative9.htm - [25] J. H. Kell, Coordination of fixed-time traffic signal, J. H. K. and Associates Internal report, 1973 - [26] M. Boydstun, Coordinated Traffic Signal Systems, National Institute for Advanced Transportation Technology, Traffic Signal Summer Workshop, 2004 - [27] D. Gitelson, Traffic Signal Computers, California Division of Highways Internal report, 1972 - [28] C. F. Daganzo, Fundamentals of transportation and traffic operation, Pergamon Edition, 1997 - [29] S. Edwards, The specification and execution of Heterogeneous Synchronous Reactive Systems, PhD thesis, University of California at Berkeley, 1997 - [30] J. L. Hennessy, D.A. Patterson, D. Goldberg, Computer Architecture: A quantitative approach 3<sup>rd</sup> edition, Morgan Kaefmann, 2002 - [31] A. S. Tanenbaum, M. van Steen, Distributed Systems, Principles and Paradigms, Prentice Hall 2002 - [32] Simulink Help Manual: Writing S-functions, MathWorks edition, 2002 - [33] G. C. Sih, E. A. Lee, A compile-time scheduling heuristic for interconnection-constrained heterogeneous processor architectures, IEEE Transactions on Parallel and Distributed Systems, vol.4, no.2, Feb. 1993, pp.175-87. USA. - [34] H.B. Enderton, A Mathematical Introduction to Logic, Academic Press; 2 edition (December, 2000) - [35] Advantages and disadvantages of traffic signals, Institute of transportation and Traffic Engineering, University of California at Berkeley, Internal report UCB-ITS-RR-73-XX, 1973 - [36] Manual on Uniform Traffic Control Devices for street and highways, Federal Highway Administration. Department of Transportation, Washington, D.C., 1970 - [37] Traffic Manual, Chapter IX, Traffic Control Signals, California Division of Highways, 1971 - [38] J. C. Ray, Effective Traffic Signal Control, Unpublished report, Sacramento County, California - [39] M. Boydstun, Actuated Traffic Signal Systems, National Institute for Advanced Transportation Technology, Traffic Signal Summer Workshop, 2004 - [40] Sepac Actuated Signal Control Software with NTCIP User manual, Eagle SIEMENS, 2001 - [41] M. Zennaro, J. Misener A State Map Architecture for Safe Intelligent Intersections, ITS America 2003 13th annual meeting, May 2003 - [42] PATH website http://www.path.berkeley.edu - [43] P. Caspi, M. Ponzet, Synchronous Kahn networks ACM. Sigplan Notices (Acm Special Interest Group on Programming Languages), vol.31, no.6, June 1996, pp.226-38. USA. - [44] J.G.F. Francis, The QR Transformation I, Comput. J., vol. 4, 1961, pp 265-271. - [45] A. Garcia, C. Lucena, F. Zambonelli, A. Omicini, J. Castro, Software Engineering for large-scale multi-agent systems: research issues and practical applications, Sringer Verlag, 2003