Translating Hierarchical Simulink Applications to Real-time multi-core Execution

Matlab & Simulink is widely used as a de facto standard to design industrial applications, video coding and decoding, and signal processing applications. However, with the spectacular increase in the number of cores available in hardware platforms over the last years, passing from Simulink to multi-core execution becomes more and more complex. In this context, several research efforts aim to benefit from the high degree of parallelism and to perform multi-core programming of Simulink applications. In this paper, we present an automated method for transforming hierarchical Simulink applications into an embedded parallel software implementation. Our method uses IBSDF (Interface-Based Synchronous Dataflow) as an intermediate representation to extract parallelism. Moreover, our approach preserves the synchronous semantics and the hierarchical behavior of the Simulink model. The model-based approach makes it possible to verify key properties of the system at compile time, such as deadlock freeness and memory boundedness. The method has been implemented as an extension of the rapid prototyping tool Preesm. Experiments show that our proposal yields a schedulable IBSDF graph equivalent in size to the Simulink model and allows better multi-core implementation performance than Matlab & Simulink sequential execution.


Introduction
In recent years, the number of cores available in hardware platforms has increased dramatically, from tens of cores to hundreds of heterogeneous processing elements. Concurrently, applications in digital communications, telecommunications, digital signal processing and video coding/decoding are becoming increasingly complex and require more computational power. There is therefore a need to model software applications with a high-level, implementation-independent model that can be translated efficiently into a high-performance implementation on modern parallel architectures. These reasons motivate the transition from imperative programming languages, which are intrinsically sequential, to parallel models of computation. This transition offers two main advantages for software/hardware application programming: parallelism and performance improvement.
To benefit from this approach and exploit data parallelism, SDF (Synchronous Dataflow) [14] models have proven to be a well-suited representation for programming multi-core architectures. Indeed, thanks to its semantics, the SDF MoC (model of computation) provides a practical means to decompose applications into coarse-grain computational entities: the actors. Mapping and scheduling these actors on the available processing elements makes it possible to optimize diverse performance criteria of the application, such as throughput, speedup and latency. Moreover, each SDF actor is described by a host program. In our work, host programs are written in C, which contributes to the generation of compatible C code targeting multi-core architectures.
Currently, Simulink, a software package from MathWorks, is widely used as a de facto standard to design, simulate and validate industrial applications in many domains, such as digital communication, digital signal processing, and image processing. However, Simulink models are difficult to implement on multi-core platforms. Passing from a Simulink application to a multi-core platform must preserve the semantics and consistency of the model, as well as the determinism of the data exchanged between Simulink blocks. Additionally, the implementation must meet parallel hardware constraints and ensure that the behavior of the generated code conforms to the behavior of the Simulink model. Further, up to now, most references in this context do not take advantage of the hierarchical behavior of Simulink models to extract parallelism and optimize code generation. These references focus only on the transformation of atomic blocks and flattened systems.
The remaining question is how to overcome the shortcomings cited above and facilitate multi-core programming of hierarchical Simulink applications. This question leads to the idea of extending a rapid prototyping tool with code generation capabilities to support Simulink applications. Rapid prototyping tools make it possible to model hardware/software applications and to take early decisions that improve the deployment of applications on multi-core platforms. We choose to extend an open source tool called Preesm (the Parallel and Real-time Embedded Executives Scheduling Method) [21] thanks to its ability to prototype efficient multi-core applications starting from an SDF representation. We thereby allow designers to go from a Simulink model to an efficient hardware implementation. To make this idea concrete, we put forward a solution that automates the transformation of Simulink models and their deployment over multi-core architectures.
In this paper, we propose a translation approach and a rapid prototyping tool extension to support multi-core programming of Simulink applications. The method consists of automatically translating an application designed with Simulink into an extended version of the SDF MoC called IBSDF (Interface-Based Synchronous Dataflow) [15] while keeping the hierarchical behavior of the model. We choose the IBSDF MoC as an intermediate representation because it is a specific class of the SDF MoC characterized by a hierarchy mechanism. This mechanism enables the description of the internal behavior of nodes with SDF subgraphs and ensures deadlock freeness between levels thanks to the interface mechanism. Moreover, using the IBSDF MoC allows us to benefit from recent research such as [16], which exploits the hierarchical behavior of the IBSDF MoC to optimize the code generation process and enhance performance on multi-core architectures.
The aim of this work is to provide an efficient solution to automate the deployment of Simulink programs over multi-core architectures. The two main contributions of this work are as follows. First, we translate hierarchical Simulink models into IBSDF graphs in such a way that the hierarchy and the synchronous semantics of both models are preserved. During the translation we respect deadlock-freeness and consistency constraints in order to obtain a schedulable IBSDF graph. Second, we implement our proposal as a plugin extension of the Preesm workflow to generate optimized C code for multi-core platforms.
To achieve this, we first perform a mathematical study to demonstrate that hierarchical Simulink systems can be translated into schedulable IBSDF graphs. To do this, we first define the multi-level precedence, consistency and deadlock-freeness constraints which must be taken into account in the translation process. Then, we differentiate three block communication cases: direct communication, delayed communication and hybrid communication. Considering these three cases, we propose our translation theorems, each followed by a proof. For each communication case, we illustrate the translation process with simple Simulink models. The objective of the mathematical study is to demonstrate that, during the translation process, consistency and deadlock freeness are conserved and the resulting hierarchical graph is schedulable. To the best of our knowledge, our paper is the first to give a detailed study of conserving hierarchy during the translation of hierarchical Simulink models. Previous works focus only on the synchronous behavior of Simulink models and do not consider the internal hierarchical behavior, which is very important to extract parallelism and allows better multi-core implementation performance.
Our proposed translation process was implemented as three main sub-tasks: a Simulink parser, a translator and an IBSDF generator, integrated in the extended Preesm workflow. To the best of our knowledge, this work is the first to propose a complete workflow starting from hierarchical Simulink models and ending with parallel C code generation for multi-core architectures. Indeed, previous works, such as [1], [2], [3] and [4], focused only on the translation process and did not integrate their results into code generation tools to show the effectiveness of their approaches.
In this work, we consider only multi-rate, discrete-time Simulink models whose blocks have multiple sampling times. Blocks can be hierarchical or atomic. This means that blocks with dynamic behavior, such as enabled subsystems and triggered subsystems, and the continuous-time part of Simulink models are not considered in this work. Throughout the translation to IBSDF, we ensure that the resulting graph has the same behavior and the same size as the Simulink model. This means that the semantics of both models are conserved during the execution of the whole extended workflow, from the Simulink model to multi-core deployment.
The remainder of this paper is structured as follows. In Section 2, we give a background on the models of computation used in the presented work. Section 3 gives a study of deadlock freeness and consistency constraints and details our solution to translate hierarchical Simulink models. In Section 4, we present the Preesm tool support and the overall extended workflow. Experimental results, which are based on the state-of-the-art telecommunication application "LTE QPSK transmitter", are presented in Section 5. Finally, the last section concludes the paper and outlines future work.

Synchronous Data-flow and Interface-Based Synchronous Data-flow
Synchronous data-flow (SDF), introduced by Lee and Messerschmitt [9,10], is a MoC providing high-level design and implementation of embedded programs, which is notably popular for specifying digital signal processing applications.
An SDF graph $G = (A, F, P)$ consists of a set of actors $A$ interconnected by a set of FiFos $F$ carrying data tokens and a set of ports $P$ used as anchors for FiFo connections, such that:
• $A = (a_1, a_2, a_3, ...)$ is the set of actors that consume and/or produce data tokens.
• $F$ is the set of channels carrying data streams.
• $P = (inP, outP)$ is the port set of an actor $a_i$.
• $inP(a_i) = (inP_1(a_i), inP_2(a_i), inP_3(a_i), ...)$ is the set of input ports of an actor $a_i$.
• $outP(a_i) = (outP_1(a_i), outP_2(a_i), outP_3(a_i), ...)$ is the set of output ports of an actor $a_i$.
• $IN(a_i) = (in_1(a_i), in_2(a_i), in_3(a_i), ...)$ is the set of data consumed by the different input ports of $a_i$ at each firing.
• $OUT(a_i) = (out_1(a_i), out_2(a_i), out_3(a_i), ...)$ is the set of data produced by the different output ports of $a_i$.
• $d(a_i, a_j)$ is the amount of initial tokens on a FiFo connecting the actors $a_i$ and $a_j$; we refer to it as a delay. The delay makes it possible to represent data dependencies between successive graph iterations [23] and to perform analysis. A graph iteration is the minimal fixed sequence of actor firings that can be repeated indefinitely to execute the graph.
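The notion of graph iteration can be made concrete by computing how many times each actor fires per iteration. The following is a minimal sketch, assuming a connected graph given as a list of edges; the function name `repetition_vector` and the tuple layout `(source, target, produced, consumed)` are illustrative choices and not part of the paper or of Preesm. It solves the usual SDF balance equations, where the tokens produced on each FiFo during one iteration equal the tokens consumed.

```python
from fractions import Fraction
from math import lcm  # Python 3.9+


def repetition_vector(actors, edges):
    """Return the smallest positive firing counts balancing every FiFo.

    actors: list of actor names (the graph is assumed to be connected).
    edges:  list of (src, dst, produced, consumed) tuples (hypothetical layout).
    Raises ValueError if the rates are inconsistent.
    """
    rate = {actors[0]: Fraction(1)}
    changed = True
    while changed:  # propagate relative rates along the edges
        changed = False
        for src, dst, prod, cons in edges:
            if src in rate and dst not in rate:
                rate[dst] = rate[src] * prod / cons
                changed = True
            elif dst in rate and src not in rate:
                rate[src] = rate[dst] * cons / prod
                changed = True
    # Balance check: produced tokens must equal consumed tokens on every edge.
    for src, dst, prod, cons in edges:
        if rate[src] * prod != rate[dst] * cons:
            raise ValueError("inconsistent SDF graph")
    scale = lcm(*(r.denominator for r in rate.values()))
    return {name: int(r * scale) for name, r in rate.items()}


example = repetition_vector(
    ["A", "B", "C"],
    [("A", "B", 2, 3), ("B", "C", 1, 2)],
)
print(example)
```

In this example, A fires three times, B twice and C once per iteration, so each FiFo returns to its initial token count (its delay) at the end of the iteration.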
Further, to ease graph development and to allow optimization of the scheduling process, an SDF graph may support hierarchical behavior. An extended version of the SDF MoC, called IBSDF, adds a hierarchy mechanism to the SDF semantics. This mechanism brings more expressiveness to SDF graphs by enabling the specification of the internal behavior of an actor with an encapsulated subgraph. It also offers more flexibility to optimize the application for the scheduling and code generation processes. In addition to the SDF semantics, the IBSDF semantics is based on an interface mechanism allowing a hierarchical graph design. The interface mechanism obeys the "compositionality" principle. It was proven in [15] that, when a graph is instantiated, its behavior is not modified by its parent graph and does not introduce deadlock in its parent graph.
In addition to the SDF semantics, an IBSDF graph $G = (A, F, P, D, I)$ consists of a set of hierarchical actors $HA$ and a set of interfaces $I$ separating the hierarchical levels, where:
• $HA = (Ha_1, Ha_2, Ha_3, ...)$ is the set of hierarchical actors that consume and/or produce data tokens.
• $srcI$ is the vertex transmitting to the subgraph the amount of tokens received by its corresponding parent actor.
• $snkI$ is the vertex transmitting to the parent actor the amount of tokens produced by its corresponding subgraph.
• $srcI_{data}$ is the amount of data available on its corresponding source interface.
• $snkI_{data}$ is the amount of data available on its corresponding sink interface.
In the following sections, a hierarchical actor refers to an actor which encapsulates a hierarchy level, an actor refers to an atomic actor and a sub-graph refers to the graph embedded into a hierarchical actor. Figure 1 and Figure 2 illustrate SDF and IBSDF graphs, respectively.

Description of Simulink
Simulink, developed by MathWorks, is a widespread commercial tool for embedded system simulation and model-based design. Thanks to its graphical user interface (GUI) and its rich library of blocks, Simulink users are able to build, simulate and verify a variety of embedded systems such as telecommunication, signal processing, and image and video processing applications.
Simulink modeling consists of creating a network of blocks connected by lines, which represent signals. Block ports (inports and outports) represent connection endpoints for signals. There are two types of block ports: data ports, which give the dataflow of the Simulink model, and control ports, which produce conditional (enabled, triggered) events for the execution of subsystems. In this work, we consider only the data port type.
Further, Simulink is a synchronous model with a synchronous language based on hierarchical behavior. Indeed, Simulink enables users to group basic blocks recursively to design more complex diagrams. We refer to a composite diagram as a subsystem and to a non-composite one as an atomic block. Each port of a subsystem corresponds to an InPort block or an OutPort block. A subsystem may contain blocks sampled at different rates; we call it a multi-rate subsystem. We distinguish two classes of subsystems: virtual subsystems and non-virtual (functional) subsystems. A virtual subsystem helps to organize the Simulink model graphically and increases design readability, but it does not influence the internal behavior of the hierarchy. A non-virtual subsystem, in contrast, is provided to model functionalities and control the internal hierarchy.
Simulink supports simulation of discrete (sampled data) and continuous systems. For discrete-time blocks, the sample time is defined as the time between two consecutive instants at which a block executes. Networks can include discrete-time blocks operating at different rates (so-called multi-rate systems), where each block is associated with a different sample time period. For continuous-time blocks, the sample time is obtained by simulating ordinary differential equations.
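To illustrate what a multi-rate system implies for execution counts, the sketch below computes how many times each discrete-time block executes over one common period (the least common multiple of the sample times). This is a standard observation about periodic blocks rather than a rule stated in this paper; the function name `firings_per_hyperperiod` and the integer-millisecond representation of sample times are assumptions made for the example.

```python
from math import lcm  # Python 3.9+


def firings_per_hyperperiod(sample_times_ms):
    """Return (hyperperiod, executions per block) for integer sample times.

    sample_times_ms: dict mapping a block name to its sample time in ms.
    """
    hyperperiod = lcm(*sample_times_ms.values())
    counts = {name: hyperperiod // t for name, t in sample_times_ms.items()}
    return hyperperiod, counts


hyper, counts = firings_per_hyperperiod({"B1": 20, "B2": 30})
print(hyper, counts)
```

With sample times of 20 ms and 30 ms, the common period is 60 ms, during which the first block executes three times and the second twice.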
We differentiate two simulation options: multi-tasking, which assigns a priority to each block, and single-tasking, which does not assign priorities to blocks and respects the simulation time execution semantics. Both multi-tasking and single-tasking support different communication mechanisms: the direct communication mechanism and the delayed communication mechanism. In the direct communication mechanism, a block uses the data produced by the block connected to its input at the same time step. In the delayed communication mechanism, by contrast, a block does not use the data produced by the block connected to its input at the same time step. A combination of both mechanisms, the so-called hybrid communication mechanism, is used when data is transferred from a low priority block to a high priority block and the block periods are different. Indeed, when the blocks composing a multi-rate system are not executed at the same time, the direct data exchange mechanism is adopted because the lower priority block driving the input is fired after the higher priority block. Likewise, when the blocks composing a multi-rate system are executed at the same time, the delayed data exchange mechanism is adopted because the lower priority block driving the input is fired after the higher priority block. These communication rules apply to blocks of the same level of hierarchy and to blocks of different levels of hierarchy. Thus, in addition to the hierarchical behavior, a precedence relation exists between levels and obeys the same communication rules.
Through the next sections we use the following terminology:
• the set of atomic blocks and/or atomic subsystems, whose elements are noted $b_i$;
• the set of composed subsystems, whose elements are noted $S_i$;
• $T_{S_i}$ and $T_{b_i}$ denote the sample time period of a composed subsystem $S_i$ and the sample time period of an atomic block $b_i$, respectively.

Translation principle
In this section, we present our technique for transforming a discrete-time Simulink model into its corresponding IBSDF representation. The main aim of this transformation methodology is to obtain a schedulable IBSDF graph in which the structural properties of the Simulink model are preserved. To reach this goal, the translation must be performed in a way that ensures the consistency and the deadlock freeness of the obtained IBSDF graph. For this, we have to determine the data tokens consumed and produced by each actor and the delay available on each FiFo while respecting precedence, consistency and deadlock freeness constraints. In the following subsections we give an overview of these constraints before detailing the translation process.

Deadlock freeness and consistency constraints
We consider a hierarchical actor $Ha_1$ encapsulating $n$ atomic actors $a_1, a_2, ..., a_n$, where $a_1$ is the consumer actor (it consumes tokens produced by the hierarchical actor) and $a_n$ is the producer actor (it produces tokens for the hierarchical actor). We note $srcI$ the source interface of $Ha_1$ that transfers tokens to its sub-graph and $inP(a_1)$ the target port of the first consumer sub-actor $a_1$. We also note $snkI$ the sink interface of $Ha_1$, which consumes tokens produced by the last producer sub-actor $a_n$, and $outP(a_n)$ the source port of $a_n$, as shown in figure 3. As mentioned in section ??, the IBSDF MoC introduces interface elements to model the hierarchical behavior of an SDF graph and insulate its levels of hierarchy. This hierarchy semantics must obey some constraints, imposed by the IBSDF model, in order to ensure deadlock freeness and consistency at every level of the graph hierarchy:
• Deadlock freeness constraints: during a subgraph iteration, source and sink interfaces must stay write-locked and read-locked, respectively, meaning that the internal behavior is independent of any external actor during an iteration. Further, if a consumer sub-actor needs an amount of tokens greater than the amount available in the source interface, the latter behaves like a ring buffer. Regarding the sink interface, if the producer sub-actor produces an amount of tokens greater than required, it behaves like a circular buffer. Hence, it forwards only the number of tokens produced during the subgraph execution and required by the hierarchical actor.
• Consistency constraints: to ensure the consistency of the internal SDF subgraph, the subgraph must consume all the tokens made available by the source interface during an iteration. Symmetrically, all tokens required by the sink interface must be produced by the sub-graph during an iteration.
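The buffer behavior of the two interfaces can be sketched as follows. This is an illustrative model only: the function names are hypothetical, tokens are represented as list elements, and the sink interface is assumed to keep the most recently written tokens when the sub-graph produces more than the parent requires, which is one common reading of "circular buffer" and is not spelled out in the text above.

```python
def read_source_interface(tokens, needed):
    """Source interface as a ring buffer.

    The tokens received by the parent actor are replayed cyclically until
    the consumer sub-actor has read `needed` tokens.
    """
    if not tokens:
        raise ValueError("the source interface holds no token")
    return [tokens[i % len(tokens)] for i in range(needed)]


def write_sink_interface(produced, required):
    """Sink interface as a circular buffer of size `required`.

    Assumption: later writes overwrite earlier ones, so only the last
    `required` tokens are forwarded to the parent actor.
    """
    if required <= 0:
        raise ValueError("the parent actor must require at least one token")
    return produced[-required:]


# Two tokens available, six tokens read by the consumer sub-actor.
print(read_source_interface([10, 20], 6))
# Four tokens produced, one token required by the parent actor.
print(write_sink_interface([1, 2, 3, 4], 1))
```

Because the interfaces are locked during a sub-graph iteration, both functions operate on a fixed token list, which reflects the independence of the internal behavior from external actors.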
Lemma 1 highlights the conditions required to ensure deadlock freeness and consistency in every level of the hierarchy.
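Lemma 3.1 Let us consider a hierarchical actor $Ha$ with source interface $srcI$, sink interface $sinkI$ and encapsulating $n$ atomic actors $a_1, a_2, ..., a_n$, where $a_1$ is the consumer actor and $a_n$ is the producer actor. The internal SDF graph of the hierarchical actor $Ha$ is consistent and deadlock free if the source and sink interfaces obey these conditions at each iteration:
\[ srcI_{data} = \begin{cases} \dfrac{v}{x} \cdot in(a_1) & \text{if } srcI_{data} \le v \cdot in(a_1), \\ \dfrac{v}{\alpha} \cdot in(a_1) & \text{otherwise.} \end{cases} \tag{1} \]
\[ sinkI_{data} = \begin{cases} \dfrac{u}{\gamma} \cdot out(a_n) & \text{if } sinkI_{data} \le u \cdot out(a_n), \\ \dfrac{u}{\beta} \cdot out(a_n) & \text{otherwise.} \end{cases} \tag{2} \]
Where:
• $x > 1$ and $\alpha \in [0..1]$ are positive constants representing duplication numbers of the rate of tokens available in the source interface within two different cases.
• $\gamma > 1$ and $\beta \in [0..1]$ are positive constants representing duplication numbers of the rate of tokens available in the sink interface within two different cases.
• $v \in \mathbb{N}^*$ is the execution repetition number of $a_1$.
• $u \in \mathbb{N}^*$ is the execution repetition number of $a_n$.
Proof: Deadlock freeness constraints can be written as:
If $srcI_{data} \le v \cdot in(a_1)$ then $x \cdot srcI_{data} = v \cdot in(a_1)$. (3)
If $sinkI_{data} \le u \cdot out(a_n)$ then $\gamma \cdot sinkI_{data} = u \cdot out(a_n)$. (4)
Consistency constraints can be written as:
If $srcI_{data} > v \cdot in(a_1)$ then $\alpha \cdot srcI_{data} = v \cdot in(a_1)$. (5)
If $sinkI_{data} > u \cdot out(a_n)$ then $\beta \cdot sinkI_{data} = u \cdot out(a_n)$. (6)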
Equations (3) and (5) give the source interface conditions. Similarly, equations (4) and (6) give the sink interface conditions. Figure 4 illustrates Lemma 3.1 with an example. In this case, $in(Ha_1) = 2$ is the amount of tokens produced to the source interface $srcI$, $in(a_1) = 1$ and its execution repetition number $v$ is equal to 6. Consequently, we have $srcI_{data} = 2 \le v \cdot in(a_1) = 6$. According to Lemma 3.1, the number of data tokens available in the source interface must be duplicated $x$ times. The duplication number is determined by applying equation (1), which gives $x = 3$. Likewise, $out(Ha_1) = 1$ is the amount of data tokens required to be produced by the sink interface $sinkI$, while $out(a_n) = 3$ and its execution repetition number $u$ is equal to 4. Consequently, we have $sinkI_{data} = 1 \le u \cdot out(a_n) = 12$. Based on Lemma 3.1, the amount of tokens available in the sink interface must be duplicated $\gamma$ times in such a way that $sinkI_{data} = \frac{u}{\gamma} \cdot out(a_n)$, which gives $\gamma = 12$.
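The arithmetic of this example can be checked with a short sketch. It assumes that the duplication number is simply the ratio between the tokens exchanged by the sub-actor over its repetitions and the tokens held by the interface, as in the relation $\gamma \cdot sinkI_{data} = u \cdot out(a_n)$ used in the proof, with the same form applied to the source side. The function name `duplication_factor` is illustrative and does not come from the paper.

```python
from fractions import Fraction


def duplication_factor(interface_tokens, repetitions, rate):
    """Return k such that k * interface_tokens == repetitions * rate.

    interface_tokens: tokens available on the interface (srcI or sinkI data).
    repetitions:      execution repetition number of the sub-actor (v or u).
    rate:             tokens consumed or produced per firing of the sub-actor.
    A result above 1 corresponds to x or gamma, a result of at most 1 to
    alpha or beta.
    """
    return Fraction(repetitions * rate, interface_tokens)


# Source side of the example: srcI data = 2, v = 6, in(a_1) = 1.
x = duplication_factor(2, 6, 1)
# Sink side of the example: sinkI data = 1, u = 4, out(a_n) = 3.
gamma = duplication_factor(1, 4, 3)
print(x, gamma)
```

Using `Fraction` keeps the result exact when the interface holds more tokens than the sub-actor exchanges, in which case the factor falls in the range reserved for $\alpha$ and $\beta$.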

One-level precedence constraints
In [18], Marchetti gives a detailed study of couples of firings in SDF graphs, considering only flattened graphs (no hierarchy). Based on the fact that a schedule is feasible if the number of tokens is positive in every FiFo of the graph, the author demonstrates the existence of a precedence constraint between two executions of connected actors. Let us consider a couple of actors $a_i$ and $a_j$ linked by a FiFo, together with the $v$-th execution of $a_i$ and the $u$-th execution of $a_j$; $out(a_i)$ is the amount produced by $a_i$ and $in(a_j)$ is the amount consumed by $a_j$.
A precedence relationship exists between a i and a j if firings obey the following two conditions: cannot.
Lemma 2 characterizes the precedence constraint between the $v$-th execution of $a_i$ and the $u$-th execution of $a_j$.
Lemma 3.2 A precedence constraint exists between the $v$-th execution of $a_i$ and the $u$-th execution of $a_j$ iff:
Proof: A precedence relation is modeled between the $v$-th execution of $a_i$ and the $u$-th execution of $a_j$ if condition (1) and condition (2) are fulfilled:
Combining these resulting inequalities, we obtain inequality (7).

Multi-levels precedence constraints
We consider a hierarchical actor $Ha$ containing $n$ atomic actors $a_1, a_2, ..., a_n$. We denote by $a_1$ the consumer sub-actor, which consumes the amount of tokens $in(a_1)$, and by $a_n$ the producer sub-actor, which produces the amount of tokens $out(a_n)$. $srcI_{data}$ is the amount of data available on the source interface linking the hierarchical actor $Ha$ and the consumer sub-actor $a_1$, and $d(Ha, a_1)$ is the initial amount of tokens in the FiFo linking the source interface and the consumer sub-actor. We also denote by $snkI_{data}$ the amount of data available on the sink interface linking the hierarchical actor $Ha$ and the producer sub-actor, and by $d(a_n, Ha)$ the initial amount of tokens in the FiFo linking the sink interface and the producer sub-actor. Building on the precedence constraints between two firings of the same level and on the hierarchy dependency, we define the precedence constraints between a hierarchical actor and its sub-graph.
A precedence relationship exists between an hierarchical actor and its sub-graph if firings obey the following conditions:

Where:
• $\#v$ is the execution number of $a_1$ up to $[v]$.
• $[u]$ is the $u$-th execution of $a_n$.
• $\#u$ is the execution number of $a_n$ up to $[u]$.
We can now determine initial amounts of tokens modeling the precedence relation between firings of an hierarchical actor and its subgraph.

Lemma 3.3 A precedence relation exists between a hierarchical actor and its sub-graph if:
Proof: A precedence relation is modeled between the $w$-th execution of $Ha$ and the $v$-th execution of $a_1$ if condition (1) and condition (2) presented in Definition 1 are fulfilled:
Then, when combining these two inequalities, we obtain inequality (8).
A precedence relation is modeled between the $w$-th execution of $Ha$ and the $u$-th execution of $a_n$ if condition (3) and condition (4) presented in Definition 1 are fulfilled:
Condition (3) $\iff d(a_n, Ha) + out(a_n) \cdot (w \cdot \#u + u) - snkI_{data} \cdot w \ge 0$.
Condition (4) $\iff out(a_n) > d(a_n, Ha) + out(a_n)$
Then, when combining these two inequalities, we obtain inequality (9).

One level blocks translation
To illustrate one-level block translation, we consider a Simulink system $S$ containing three atomic blocks $B_{i-1}$, $B_i$ and $B_{i+1}$.
Atomic subsystems and basic blocks (such as sum blocks, constant blocks, ...) are translated into atomic actors in the IBSDF graph. Each resulting atomic actor $a_i$ takes the name of the corresponding Simulink block $B_i$. The sample times $T_{B_i}$ of Simulink blocks are retrieved when simulating the Simulink model.
Unit delay blocks are not taken into account during the translation process. They are only used to mark the delayed behavior of the communication between blocks. Further, blocks belonging to one level can be atomic or composed. During the translation of one level of a hierarchical Simulink model, composed blocks behave as atomic blocks.
The input and output blocks connecting blocks of the same level are respectively converted into input and output ports transferring data in the IBSDF graph. We refer to the rules proved in [1] to determine the amount of tokens available in these ports (consumed and produced data rates). We differentiate three communication cases: direct communication, delayed communication and hybrid communication. In the three cases, the consumed data available in the in-port of an actor $a_i$, $in(a_i)$, and the produced data available in the out-port of $a_i$, $out(a_i)$, are similarly determined:
Input and output blocks are also used to transfer signals between levels of a hierarchical Simulink model. They respectively correspond to the source interface $srcI$ and the sink interface $sinkI$ in the IBSDF graph. To translate the rates available in these interfaces and ensure deadlock freeness and consistency between levels, we rely on the theorems detailed and proved in the following section.
Lines transferring signals in the Simulink model are converted into FiFo channels connecting actors in the IBSDF graph. Each resulting FiFo is characterized by an initial amount of tokens that depends on the communication type:
• Hybrid communication: $d(a_i, a_{i+1}) = out(a_i)$.

Hierarchical subsystems translation
We consider $S_1$, a composed subsystem with sample time $T_{S_1}$ containing a set of atomic blocks $B_1, B_2, \cdots, B_n$. A sample time $T_{B_i}$ is associated with each block $B_i$. Two subsystems $S_2$ and $S_3$ are connected to $S_1$ with sample times $T_{S_2}$ and $T_{S_3}$. The block $B_1$ is the sub-consumer block and $B_n$ is the sub-producer block, as depicted in figure 5. We note that virtual subsystems are not taken into account during the translation process; they are only used to group blocks. We pose:
• $g_{(S_1,B_1)}$ the greatest common divisor of $T_{S_1}$ and $T_{B_1}$.
• $g_{(S_1,B_n)}$ the greatest common divisor of $T_{S_1}$ and $T_{B_n}$.
• $g_{(S_1,S_2)}$ the greatest common divisor of $T_{S_1}$ and $T_{S_2}$.
• $g_{(S_3,S_1)}$ the greatest common divisor of $T_{S_3}$ and $T_{S_1}$.
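As a small numerical illustration of these four quantities, the sketch below evaluates them for the sample times used in the direct-communication example of this section (100 ms, 50 ms and 80 ms for the top-level subsystems, 20 ms for the sub-consumer block), with 30 ms assumed for the sub-producer block. The dictionary keys and the function name `level_gcds` are hypothetical.

```python
from math import gcd


def level_gcds(t):
    """Compute the four greatest common divisors defined above.

    t: dict of integer sample times in ms with keys S1, S2, S3, B1 and Bn.
    """
    return {
        "g(S1,B1)": gcd(t["S1"], t["B1"]),
        "g(S1,Bn)": gcd(t["S1"], t["Bn"]),
        "g(S1,S2)": gcd(t["S1"], t["S2"]),
        "g(S3,S1)": gcd(t["S3"], t["S1"]),
    }


sample_times = {"S1": 100, "S2": 50, "S3": 80, "B1": 20, "Bn": 30}
print(level_gcds(sample_times))
```

Sample times are kept as integers in milliseconds so that `math.gcd` applies directly; fractional sample times would first need to be scaled to a common integer unit.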

Modeling levels direct communication
To model direct communication between levels, we have to model the source interface/sub-consumer actor communication and the sink interface/sub-producer actor communication.
Source interface/sub-consumer actor direct communication: a direct communication between the source interface and the consumer sub-block is defined through the following hierarchical dependency conditions:
Based on these conditions we deduce Lemma 4.
Lemma 3.4 Let $S_1$ be a composed subsystem with sample period $T_{S_1}$ containing a set of atomic blocks $B_1, B_2, ..., B_n$ firing in direct communication mode. $B_1$ represents the sub-consumer atomic block with sample period $T_{B_1}$. A hierarchical dependency exists between the $w$-th execution of $S_1$ and the $v$-th execution of $B_1$ if: Where $b$ is a coefficient greater than or equal to 1.
Adding inequality (12) and inequality (13), we obtain:
We multiply inequality (11) by −1 and add it to the resulting inequality. We obtain:
Since w·#v+v−2 w > 1, there exists a coefficient $b \ge 1$ such that:
To ensure deadlock freeness between the hierarchical actor and the sub-consumer actor in the direct communication case, we refer to Theorem 1:
Theorem 3.5 To ensure deadlock freeness between a hierarchical actor and its sub-consumer actor, IBSDF introduces the source interface concept such that:
Proof: We multiply equality (10) Equality (1) of Lemma 1 is obtained by replacing, in the resulting equation, by in(a 1 ), x and α by · v. Here $x$ and $\alpha$ represent duplication numbers of the rate of tokens available in the source interface within two different cases. Hence, the source interface $srcI$ and the sub-consumer actor $a_1$ obey the deadlock freeness and consistency condition already proved in Lemma 1.
Based on the precedence constraints between two levels we determine the initial token amount in the FiFo connecting the hierarchical actor and the sub-consumer actor.
Theorem 3.6 In the direct communication case, the initial amount of tokens $d(Ha, a_1)$ of the FiFo connecting the hierarchical actor and the sub-consumer actor is given by $in(a_1) - 1$.
Since $Z \cdot srcI_{data}$ and $srcI_{data}$ are both strictly greater than 0, this inequality remains true even if we replace $Z \cdot srcI_{data}$ by $srcI_{data}$. Hence, referring to inequality (8), we obtain a precedence relation between a hierarchical actor and its sub-consumer actor when replacing $d(Ha, a_1)$ by $in(a_1) - 1$.
Sink interface/sub-producer actor direct communication: a direct communication between the sub-producer actor and the sink interface is defined through the following hierarchical dependency conditions:
Based on these conditions we deduce Lemma 5.
Lemma 3.7 Let S 1 be a composed subsystem with sample period T S1 containing a set of atomic blocks B 1 , B 2 , ..., B n firing in direct communication mode. B n represents the sub-producer atomic block with sample period T Bn . A hierarchical dependency exists between the w th execution of S 1 and u th execution of B n if: Where c is in [0..1[
When combining inequalities (16), (17) and (18) we obtain: To ensure deadlock freeness between the hierarchical actor and the sub-producer actor in direct communication case, we refer to Theorem 3.
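Theorem 3.8 To ensure deadlock freeness between a hierarchical actor and its sub-producer actor, IBSDF introduces the sink interface concept such that:
\[ sinkI_{data} = \begin{cases} \dfrac{u}{\gamma} \cdot out(a_n) & \text{if } sinkI_{data} \le u \cdot out(a_n), \\ \dfrac{u}{\beta} \cdot out(a_n) & \text{otherwise.} \end{cases} \]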
Equality (2) of Lemma 1 is obtained by replacing, in the resulting equation, Bn ) by out(a n ), γ and β by Where γ and β represent duplication numbers of the rate of tokens available in the sink interface within two different cases. Hence, the sink interface sinkI and the sub-producer actor a n obey the deadlock freeness and consistency condition already proved in Lemma 1.
Based on the precedence constraints between two levels we determine the initial token amount in the FiFo connecting the hierarchical actor and the sub-producer actor.
Theorem 3.9 In the direct communication case, the initial amount of tokens $d(a_n, Ha)$ of the FiFo connecting the hierarchical actor and the sub-producer actor is given by $snkI_{data} - 1$.
Since $Z \cdot out(a_n)$ and $out(a_n)$ are both strictly greater than 0, this inequality remains true even if we replace $Z \cdot out(a_n)$ by $out(a_n)$. Hence, referring to inequality (9), we obtain a precedence relation between a hierarchical actor and its sub-producer actor when replacing $d(a_n, Ha)$ by $snkI_{data} - 1$.
To transform a composed Simulink subsystem S 1 , with direct communication between levels, into a deadlock free and consistent hierarchical actor, we rely on the two following Corollary 1 and Corollary 2.
Corollary 3.9.1 To model direct communication between two levels of the hierarchy and ensure deadlock freeness and consistency, IBSDF introduces the source and sink interface concepts such that: where $srcI_{data}$ = if $sinkI_{data} \le u \cdot out(a_n)$; $u \cdot \beta \cdot out(a_n)$ otherwise; where $sinkI_{data}$ =
Corollary 3.9.2 In the direct communication case, the initial amount of tokens $d(Ha, a_1)$ of the FiFo connecting the hierarchical actor and the sub-consumer actor is given by $in(a_1) - 1$, and the initial amount of tokens $d(a_n, Ha)$ of the FiFo connecting the hierarchical actor and the sub-producer actor is given by $sinkI_{data} - 1$.
To illustrate the Simulink to IBSDF transformation in the direct communication case, we consider the multi-rate Simulink system $S$ shown in figure 6, containing five blocks $S_1$, $S_2$, $S_3$, $B_1$ and $B_2$. $S_1$, $S_2$ and $S_3$ are the blocks of the top level, with sample times $T_{S_1} = 100$ ms, $T_{S_2} = 50$ ms and $T_{S_3} = 80$ ms, respectively. $S_1$ is a composed subsystem containing two atomic blocks $B_1$ and $B_2$ with sample times $T_{B_1} = 20$ ms and $T_{B_2} = 30$ ms, respectively. ($S_2$ and $S_3$ can be atomic or composed blocks; in this example they are composed subsystems, but we focus only on the transformation of $S_1$ to illustrate our results.) Subsystems $S_1$, $S_2$ and $S_3$ are transformed into hierarchical actors $Ha_1$, $Ha_2$ and $Ha_3$, respectively. Atomic blocks $B_1$ and $B_2$ are transformed into atomic actors $a_1$ and $a_2$, respectively. The communications between $S_1$ and $S_2$, between $S_1$ and $S_3$, and between $B_1$ and $B_2$ are obtained according to the rule for modeling one-level direct communication given in section 3.3.1. We obtain as results: • $in(Ha_1) = srcI_{data}$ =
The communications between $S_1$ and $B_1$ and between $S_1$ and $B_2$ are both direct multi-level communications. To model these communications and ensure deadlock freeness and consistency during the transformation process, we apply Corollary 1. Note that the execution repetition numbers $v$ and $u$ are obtained based on the "Compute Repetition Algorithm".
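The rate computation behind this example can be sketched as follows. This is only a minimal sketch under stated assumptions: it assumes that $g(X, Y)$ denotes the greatest common divisor of the two sample periods, so that a producer with period $T_p$ feeding a consumer with period $T_c$ produces $T_p / g$ tokens and the consumer reads $T_c / g$ tokens. The function names `edge_rates` and `repetitions` are hypothetical and are not part of S-Preesm, and the hyperperiod-based repetition count is a simple stand-in, not the "Compute Repetition Algorithm" itself.

```python
from math import gcd, lcm

def edge_rates(t_prod_ms: int, t_cons_ms: int) -> tuple[int, int]:
    """Token rates (produced, consumed) on an edge, assuming g is the gcd."""
    g = gcd(t_prod_ms, t_cons_ms)
    return t_prod_ms // g, t_cons_ms // g

def repetitions(periods_ms: dict[str, int]) -> dict[str, int]:
    """Number of firings of each block over one hyperperiod (lcm of periods)."""
    hyper = lcm(*periods_ms.values())
    return {name: hyper // t for name, t in periods_ms.items()}

# Top-level sample times of the direct communication example (in ms).
top = {"S1": 100, "S2": 50, "S3": 80}

# Edge S1 -> S3: tokens written by S1 and read by S3 per firing.
out_s1, in_s3 = edge_rates(top["S1"], top["S3"])
q = repetitions(top)

# Balance check: tokens produced over a hyperperiod equal tokens consumed.
assert q["S1"] * out_s1 == q["S3"] * in_s3
print(out_s1, in_s3, q)
```

Under these assumptions the balance equation holds by construction, which is the consistency property the transformation is meant to preserve.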

Modeling levels delayed communication
To model delayed communication between levels, we have to model source/sub-consumer actor delayed communication and sink/sub-producer actor delayed communication.
Source/sub-consumer actor delayed communication: a delayed communication between the source interface and the sub-consumer actor is defined through the following hierarchical dependency conditions: Based on these conditions, we deduce Lemma 6.
Lemma 3.10 Let $S_1$ be a composed subsystem with sample period $T_{S_1}$ containing a set of atomic blocks $B_1, B_2, \ldots, B_n$ firing in delayed communication mode. $B_1$ represents the sub-consumer atomic block with sample period $T_{B_1}$. A hierarchical dependency exists between the $w$-th execution of $S_1$ and the $v$-th execution of $B_1$ if: where $b$ is a coefficient greater than or equal to 1.
Combining the three inequalities (21), (22) and (23), we obtain: To ensure deadlock freeness between the hierarchical actor and the sub-consumer actor in the delayed communication case, we refer to Theorem 5.
Equality (1) of Lemma 1 is obtained by replacing, in the resulting equation, $\cdot\, v$, where $x$ and $\alpha$ represent the duplication numbers of the rate of tokens available in the source interface in the two cases. Hence, the source interface $srcI$ and the sub-consumer actor $a_1$ obey the deadlock freeness and consistency condition already proved in Lemma 1.
Based on the precedence constraints between two levels we determine the initial token amount in the FiFo connecting the hierarchical actor and the sub-consumer actor.
Theorem 3.12 In the delayed communication case, the initial amount of tokens $d(Ha, a_1)$ of the FiFo connecting the hierarchical actor and the sub-consumer actor is given by $in(a_1) + srcI_{data} - 1$.
Since $Z \cdot srcI_{data}$ and $srcI_{data}$ are both strictly greater than 0, the inequality remains true even if we replace $Z \cdot srcI_{data}$ by $srcI_{data}$. Hence, referring to inequality (8), we obtain a precedence relation between a hierarchical actor and its sub-consumer actor when $d(Ha, a_1)$ is set to $in(a_1) + srcI_{data} - 1$.
Sink interface/sub-producer actor delayed communication: a delayed communication between the sub-producer actor and the sink interface is defined through the following hierarchical dependency conditions: Based on these conditions, we deduce Lemma 7.
Lemma 3.13 Let $S_1$ be a composed subsystem with sample period $T_{S_1}$ containing a set of atomic blocks $B_1, B_2, \ldots, B_n$ firing in delayed communication mode. $B_n$ represents the sub-producer atomic block with sample period $T_{B_n}$. A hierarchical dependency exists between the $w$-th execution of $S_1$ and the $u$-th execution of $B_n$ if: where $c \in [0, 1)$.
Combining inequalities (26), (27) and (28), we obtain: To ensure deadlock freeness between the hierarchical actor and the sub-producer actor in the delayed communication case, we refer to Theorem 7.
Theorem 3.14 To ensure deadlock freeness between a hierarchical actor and its sub-producer actor, IBSDF introduces the sink interface concept such that: if $sinkI_{data} \le u \cdot out(a_n)$.
Equality (2) of Lemma 1 is obtained by replacing, in the resulting equation, $\frac{T_{S_1}}{g(S_1, S_3)}$ by $sinkI_{data}$, $\frac{T_{B_n}}{g(S_1, B_n)}$ by $out(a_n)$, and $\gamma$ and $\beta$ by their expressions, where $\gamma$ and $\beta$ represent the duplication numbers of the rate of tokens available in the sink interface in the two cases. Hence, the sink interface $sinkI$ and the sub-producer actor $a_n$ obey the deadlock freeness and consistency condition already proved in Lemma 1.
Based on the precedence constraints between two levels we determine the initial token amount in the FiFo connecting the hierarchical actor and the sub-producer actor.

Theorem 3.15
In the delayed communication case, the initial amount of tokens $d(a_n, Ha)$ of the FiFo connecting the hierarchical actor and the sub-producer actor is given by $sinkI_{data} + out(a_n) - 1$.
Since $Z > 0$, the inequality remains true even if we replace $Z \cdot out(a_n)$ by $out(a_n)$. Consequently, referring to inequality (9), we obtain a precedence relation between a hierarchical actor and its sub-producer actor when $d(a_n, Ha)$ is set to $sinkI_{data} + out(a_n) - 1$.
To transform a composed Simulink subsystem $S$, with delayed communication between levels, into a deadlock free and consistent hierarchical actor, we rely on the following two corollaries.
where $srcI_{data}$ = if $sinkI_{data} \le u \cdot out(a_n)$; $u \cdot \beta \cdot out(a_n)$ otherwise; where $sinkI_{data} = \frac{T_{S_1}}{g(S_1, S_3)}$; $out(a_n)$ =
In the delayed communication case, the initial amount of tokens $d(Ha, a_1)$ of the FiFo connecting the hierarchical actor and the sub-consumer actor is given by $in(a_1) + srcI_{data} - 1$, and the initial amount of tokens $d(a_n, Ha)$ of the FiFo connecting the hierarchical actor and the sub-producer actor is given by $sinkI_{data} + out(a_n) - 1$.
To illustrate the Simulink to IBSDF transformation in the delayed communication case, we consider the multi-rate Simulink system $S$ shown in figure 8, containing five blocks $S_1$, $S_2$, $S_3$, $B_1$ and $B_2$. $S_1$, $S_2$ and $S_3$ are the blocks of the top level, with sample times $T_{S_1} = 80$ ms, $T_{S_2} = 100$ ms and $T_{S_3} = 100$ ms, respectively. $S_1$ is a composed subsystem containing two atomic blocks $B_1$ and $B_2$ with sample times $T_{B_1} = 60$ ms and $T_{B_2} = 20$ ms, respectively. ($S_2$ and $S_3$ can be atomic or composed blocks; in this example they are composed subsystems, but we focus only on the transformation of $S_1$ to illustrate our results.) Subsystems $S_1$, $S_2$ and $S_3$ are transformed into hierarchical actors $Ha_1$, $Ha_2$ and $Ha_3$, respectively. Atomic blocks $B_1$ and $B_2$ are transformed into atomic actors $a_1$ and $a_2$, respectively. The communications between $S_1$ and $S_2$, between $S_1$ and $S_3$, and between $B_1$ and $B_2$ are obtained according to the rule for modeling one-level delayed communication given in section 3.3.1. We obtain as results: • $d(a_1, a_2) = out(a_1) - 1 = 0$.
The communications between $S_1$ and $B_1$ and between $S_1$ and $B_2$ are both delayed multi-level communications. To model these communications and ensure deadlock freeness and consistency during the transformation process, we apply Corollary 3. Note that the execution repetition numbers $v$ and $u$ are obtained based on the "Compute Repetition Algorithm": 20 20 . 60 80 .3 = 3 4 ≤ 1.

Modeling levels hybrid communication
To model hybrid communication between levels, we have to model source/sub-consumer actor communication and sink/sub-producer actor communication.
Source/sub-consumer actor hybrid communication: a hybrid communication between the source interface and the sub-consumer actor is defined through the following hierarchical dependency conditions: Based on these conditions, we deduce Lemma 8.
Lemma 3.16 Let $S_1$ be a composed subsystem with sample period $T_{S_1}$ containing a set of atomic blocks $B_1, B_2, \ldots, B_n$ firing in hybrid communication mode. $B_1$ represents the sub-consumer atomic block with sample period $T_{B_1}$. A hierarchical dependency exists between the $w$-th execution of $S_1$ and the $v$-th execution of $B_1$ if: where $b$ is a coefficient greater than or equal to 1.
Proof: The hierarchical dependency conditions are translated into the following inequalities: Adding inequality (32) and inequality (33), we obtain: Multiplying inequality (31) by $-1$ and adding it to the resulting inequality, we obtain: Since $\frac{w \cdot \#v + v - 2}{w} > 1$, there exists a coefficient $b > 1$ such that: To ensure deadlock freeness between the hierarchical actor and the sub-consumer actor in the hybrid communication case, we refer to Theorem 9.
Equality (1) of Lemma 1 is obtained by replacing, in the resulting equation, $\cdot\, v$, where $x$ and $\alpha$ represent the duplication numbers of the rate of tokens available in the source interface in the two cases. Hence, the source interface $srcI$ and the sub-consumer actor $a_1$ obey the deadlock freeness and consistency condition already proved in Lemma 1.
Based on the precedence constraints between two levels we determine the initial token amount in the FiFo connecting the hierarchical actor and the sub-consumer actor.
Theorem 3.18 In the hybrid communication case, the initial amount of tokens $d(Ha, a_1)$ of the FiFo connecting the hierarchical actor and the sub-consumer actor is given by $in(a_1)$.
Since $Z > 0$, the inequality remains true even if we replace $Z \cdot srcI_{data}$ by $srcI_{data}$. Hence, referring to inequality (8), we obtain a precedence relation between a hierarchical actor and its sub-consumer actor when $d(Ha, a_1)$ is set to $in(a_1)$.
Sink interface/sub-producer actor hybrid communication: a hybrid communication between the sub-producer actor and the sink interface is defined through the following hierarchical dependency conditions:
• $[w]$ fires strictly after the beginning time of $[w \cdot \#u + u]$.
• $[w - 1]$ fires before or at the same beginning time as $[w \cdot \#u + u]$.
• $[w]$ fires before or at the same beginning time as $[w \cdot \#u + u + 1]$.
Based on these conditions, we deduce Lemma 9.
Lemma 3.19 Let $S_1$ be a composed subsystem with sample period $T_{S_1}$ containing a set of atomic blocks $B_1, B_2, \ldots, B_n$ firing in hybrid communication mode. $B_n$ represents the sub-producer atomic block with sample period $T_{B_n}$. A hierarchical dependency exists between the $w$-th execution of $S_1$ and the $u$-th execution of $B_n$ if: where $c \in [0, 1)$.
Combining inequalities (36), (37) and (38), we obtain: To ensure deadlock freeness between the hierarchical actor and the sub-producer actor in the hybrid communication case, we refer to Theorem 11.
Theorem 3.20 To ensure deadlock freeness between a hierarchical actor and its sub-producer actor in the hybrid communication case, the IBSDF MoC introduces the sink interface concept such that: if $sinkI_{data} \le u \cdot out(a_n)$; $u \cdot \beta \cdot out(a_n)$ otherwise.
Equality (2) of Lemma 1 is obtained by replacing, in the resulting equation, $\frac{T_{B_n}}{g(S_1, B_n)}$ by $out(a_n)$, and $\gamma$ and $\beta$ by their expressions, where $\gamma$ and $\beta$ represent the duplication numbers of the rate of tokens available in the sink interface in the two cases. Hence, the sink interface $sinkI$ and the sub-producer actor $a_n$ obey the deadlock freeness and consistency condition already proved in Lemma 1.
Based on the precedence constraints between two levels we determine the initial token amount in the FiFo connecting the hierarchical actor and the sub-producer actor.

Theorem 3.21
In the hybrid communication case, the initial amount of tokens $d(a_n, Ha)$ of the FiFo connecting the hierarchical actor and the sub-producer actor is given by $sinkI_{data}$.
Adding $T_{S_1}$ and combining the three inequalities results in the following inequality: $T_{B_n} > T_{B_n} \cdot (w \cdot \#u + u) - T_{S_1} \cdot w + T_{S_1} \le \max(T_{B_n} - T_{S_1}, 0)$.
Since $Z > 0$, the inequality remains true even if we replace $Z \cdot out(a_n)$ by $out(a_n)$. Hence, referring to inequality (9), we obtain a precedence relation between a hierarchical actor and its sub-producer actor when $d(a_n, Ha)$ is set to $sinkI_{data}$.
To transform a composed Simulink subsystem $S$, with hybrid communication between levels, into a deadlock free and consistent hierarchical actor, we rely on the following Corollary 5 and Corollary 6.
if $sinkI_{data} \le u \cdot out(a_n)$; $u \cdot \beta \cdot out(a_n)$ otherwise; where $sinkI_{data}$ =
Corollary 3.21.2 In the hybrid communication case, the initial amount of tokens $d(Ha, a_1)$ of the FiFo connecting the hierarchical actor and the sub-consumer actor is given by $in(a_1)$, and the initial amount of tokens $d(a_n, Ha)$ of the FiFo connecting the hierarchical actor and the sub-producer actor is given by $sinkI_{data}$.
To illustrate the Simulink to IBSDF transformation in the hybrid communication case, we consider the multi-rate Simulink system $S$ shown in figure 10, containing five blocks $S_1$, $S_2$, $S_3$, $B_1$ and $B_2$. $S_1$, $S_2$ and $S_3$ are the blocks of the top level, with sample times $T_{S_1} = 100$ ms, $T_{S_2} = 10$ ms and $T_{S_3} = 100$ ms, respectively. $S_1$ is a composed subsystem containing two atomic blocks $B_1$ and $B_2$ with sample times $T_{B_1} = 50$ ms and $T_{B_2} = 30$ ms, respectively. ($S_2$ and $S_3$ can be atomic or composed blocks; in this example they are composed subsystems, but we focus only on the transformation of $S_1$ to illustrate our results.) Subsystems $S_1$, $S_2$ and $S_3$ are transformed into hierarchical actors $Ha_1$, $Ha_2$ and $Ha_3$, respectively. Atomic blocks $B_1$ and $B_2$ are transformed into atomic actors $a_1$ and $a_2$, respectively. The communications between $S_1$ and $S_2$, between $S_1$ and $S_3$, and between $B_1$ and $B_2$ are obtained according to the rules for modeling one-level hybrid communication given in section 3.3.1. We obtain as results: • $in(Ha_1) = srcI_{data}$ =
Delays between levels are obtained according to Corollary 6: • $d(Ha_1, a_1) = in(a_1) = 1$.
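The initial token rules for the three multi-level communication modes can be gathered in one small helper. This is a minimal sketch, not the S-Preesm implementation: the function name `initial_tokens` and its argument names are hypothetical, and the formulas are taken from Corollary 3.9.2 (direct), Theorems 3.12 and 3.15 (delayed) and Corollary 3.21.2 (hybrid) as stated in the text.

```python
def initial_tokens(mode: str, in_a1: int, out_an: int,
                   src_data: int, sink_data: int) -> tuple[int, int]:
    """Return (d(Ha, a1), d(an, Ha)) for a multi-level communication mode.

    in_a1     : tokens consumed by the sub-consumer actor a1
    out_an    : tokens produced by the sub-producer actor an
    src_data  : tokens available in the source interface (srcI data)
    sink_data : tokens available in the sink interface (sinkI data)
    """
    if mode == "direct":
        return in_a1 - 1, sink_data - 1
    if mode == "delayed":
        return in_a1 + src_data - 1, sink_data + out_an - 1
    if mode == "hybrid":
        return in_a1, sink_data
    raise ValueError(f"unknown communication mode: {mode}")

# Hybrid example from the text: in(a1) = 1 gives d(Ha1, a1) = 1.
# The other three arguments are placeholder values chosen for illustration.
d_src, d_sink = initial_tokens("hybrid", in_a1=1, out_an=1,
                               src_data=1, sink_data=1)
print(d_src, d_sink)
```

Keeping the three cases in one function makes the only difference between modes explicit: the number of tokens pre-loaded into the two FiFos that cross the hierarchy boundary.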

Implementation
The overall extended work-flow (figure 12) of our proposed approach is based on a specification of the application behavior as a Simulink model, a description of the multi-core architecture in the IP-XACT language, performance metrics estimation, and automatic C code generation.
The first task is to transform a given Simulink model into an IBSDF graph. This task involves three main steps. First, the Simulink model elements are gathered and converted into software objects by a Simulink parser. Second, these software objects are translated into IBSDF objects as detailed in section 3. Third, an IBSDF graph generator assembles the obtained objects into IBSDF graph elements and generates the IBSDF graph format. The resulting graph then undergoes a series of transformations [25] until a directed acyclic graph (DAG) is obtained, which exposes parallelism in an intuitive manner to the mapping step.
The next step is to map each actor of the DAG onto the multi-core platform using the simple ordering heuristic algorithm [21], which is a modified version of the list scheduling algorithm [26]. The performance of the scheduling solution is evaluated using ABC modules [27]. The performance metrics estimation serves to evaluate the parallel system and helps the designer make suitable decisions.
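To give an idea of what the mapping step does, the following is a minimal sketch of plain greedy list scheduling of a DAG onto identical cores. It is not the simple ordering heuristic of [21] nor the Preesm scheduler: the function name `list_schedule`, the actor names and the durations are hypothetical, and communication costs between cores are ignored.

```python
def list_schedule(durations, deps, n_cores):
    """Greedy list scheduling of a DAG onto n_cores identical cores.

    durations: {actor: execution time}
    deps:      {actor: [predecessor actors]}
    Returns {actor: (core, start, finish)}.
    """
    preds = {a: set(deps.get(a, ())) for a in durations}
    succs = {a: [] for a in durations}
    for actor, ps in preds.items():
        for p in ps:
            succs[p].append(actor)
    remaining = {a: len(ps) for a, ps in preds.items()}
    ready = sorted(a for a, n in remaining.items() if n == 0)
    core_free = [0.0] * n_cores   # time at which each core becomes idle
    schedule = {}
    while ready:
        actor = ready.pop(0)
        # An actor can start only when all its predecessors have finished.
        data_ready = max((schedule[p][2] for p in preds[actor]), default=0.0)
        # Choose the core that allows the earliest start time.
        core = min(range(n_cores), key=lambda c: max(core_free[c], data_ready))
        start = max(core_free[core], data_ready)
        finish = start + durations[actor]
        schedule[actor] = (core, start, finish)
        core_free[core] = finish
        for s in succs[actor]:
            remaining[s] -= 1
            if remaining[s] == 0:
                ready.append(s)
        ready.sort()
    if len(schedule) != len(durations):
        raise ValueError("the graph contains a cycle")
    return schedule

# Hypothetical four-actor DAG: A feeds B and C, which both feed D.
durations = {"A": 2, "B": 3, "C": 3, "D": 1}
deps = {"B": ["A"], "C": ["A"], "D": ["B", "C"]}
for cores in (1, 2):
    sched = list_schedule(durations, deps, cores)
    print(cores, max(finish for _, _, finish in sched.values()))
```

On this toy graph the makespan drops from 9 time units on one core to 6 on two cores, because B and C run in parallel once A has finished.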
Once the mapping decision is made, the last task of the work-flow is to automatically generate C code compatible with the target hardware platform. To achieve this, a host C code library is required. This library results from generating the code of each Simulink block composing the model using the Simulink Coder tool. This work-flow was implemented in the S-Preesm tool.

Results and Discussion
In this section, an embedded signal processing application is used to illustrate the efficiency of our approach in a realistic setting. Since LTE QPSK is a complex model adopting a multi-core architecture on the transmitter and receiver sides, it is a well-suited example to demonstrate the capabilities of our approach. We used the S-Preesm tool to translate the Simulink model provided by [22] and to generate C code compatible with the parallel hardware platform.

Case study overview: LTE QPSK
Long-Term Evolution (LTE) is a high-speed wireless data communication standard for mobile devices. The LTE system design, based on MIMO OFDM technology and Turbo coding, is required to be optimized for mobile speeds ranging from 15 to 120 km/h. The LTE QPSK model is a multi-rate Simulink model which has three levels of hierarchy. The top level contains three adjacent subsystems: the transmitter, the receiver and the channel. The channel is required only for simulation; consequently, we do not take it into account. Furthermore, the transmitter and receiver chains are alike and are treated in the same way. Hence, in the rest of our work we only illustrate the LTE QPSK transmitter side. The Simulink model of the transmitter is presented in Figure 13. The top level includes 8 subsystems and atomic blocks:
• The Bernoulli Binary Generator: it creates a Bernoulli random binary number. It generates 20 samples.
• The CRC encoder: it produces cyclic redundancy code bits for each input data frame.
• The Turbo encoder: it encodes a continuous stream of data using a concatenated encoding structure; the sequence is decoded with an iterative algorithm. The Turbo encoder is implemented as a composed subsystem. More details are found in [22].
• Modulation QPSK, 16QAM, 64QAM: the modulation is performed with a Gray mapping.
• OFDM block: orthogonal frequency division multiplexing (OFDM) is based on the inverse fast Fourier transform (IFFT) of each data symbol corresponding to each transmitting antenna. OFDM is well known for its ability to overcome multipath problems. The OFDM block is implemented as a composed subsystem, as depicted in figure 14.
• The serial-to-parallel P/S block: it consists of converting multiple data streams, received simultaneously, from serial format to parallel format.

Transformation
The Simulink model of the LTE QPSK transmitter chain has a hierarchy depth of two levels. It contains 24 atomic blocks and 4 subsystems (input and output blocks are not counted).
As a first step, we simulated the Simulink model to obtain the sample time of each block (atomic and composed) in the model. We then executed the transformation task of the work-flow. The LTE QPSK transmitter Simulink model was successfully transformed by applying the algorithms detailed and proved in section 3.
The resulting IBSDF graph is a consistent and deadlock free graph with the same hierarchy depth, the same numbers of actors (atomic and hierarchical) and the same number of FiFo channels as the input Simulink model. The output graph can be seen in figure 15.
To illustrate how our proposed approach is applied to this case study, we focus on the OFDM subsystem and detail its translation. The OFDM subsystem belongs to the top level of the LTE QPSK transmitter model. This subsystem is depicted in figure 14. Communications between blocks and levels are direct. According to Corollaries 1 and 2, we obtained:
• The amounts of tokens available in the source interfaces and the sink interface are, respectively, $srcI_{data1} = 25$, $srcI_{data2} = 1$ and $sinkI_{data1} = 25$.
• The data consumed by the sub-consumer concatenate2, the data consumed by the sub-consumer Select rows and the data produced by the sub-producer Add cyclic prefix are, respectively, $in(concatenate2) = 1$, $in(Selectrows) = 1$ and $out(Addcyclicprefix) = 1$.
• The source and sink interfaces must be duplicated $\alpha = 1/25$, $x = 1$ and $\beta = 1/25$ times, respectively, to ensure deadlock freeness between levels and the consistency of the output sub-graph.
• The initial amount of tokens available in the FiFo connecting the hierarchical actor OFDM and the sub-consumer concatenate2 is $d(OFDM, concatenate2) = 0$. The initial amount of tokens available in the FiFo connecting the hierarchical actor OFDM and the sub-consumer Select rows is $d(OFDM, Selectrows) = 0$. Similarly, the initial amount of tokens available in the FiFo connecting the hierarchical actor OFDM and the sub-producer Add cyclic prefix is $d(Addcyclicprefix, OFDM) = 0$.
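As a quick check of the sub-consumer side, the direct communication rule $d(Ha, a_1) = in(a_1) - 1$ from Corollary 2 can be applied to the consumption rates listed above:

$$d(OFDM, concatenate2) = in(concatenate2) - 1 = 1 - 1 = 0, \qquad d(OFDM, Selectrows) = in(Selectrows) - 1 = 1 - 1 = 0.$$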
The resulting graph is illustrated in figure 16.
Once the transformation task is completed, the resulting graph is converted into a DAG containing 203 actors and 305 FiFo channels. The information about the generated LTE QPSK transmitter graph is shown in Table 1. Starting from this result, we can provide solutions for scheduling and code generation. The execution of the whole work-flow using the S-Preesm tool is fast: it takes only a few milliseconds.

Code generation results and performance evaluation
After the transformation of the LTE QPSK transmitter side into a schedulable IBSDF graph through the proposed model transformation framework, the IBSDF graph undergoes several operations to be ready for the code generation process, as described in section 4. The LTE transmitter Simulink application is translated into parallel C code using the AAM (Algorithm Architecture Matching) [28] method. This method is based on generating self-timed coordination code from a data-flow graph schedule. The host code and communication libraries are obtained by generating the code of each Simulink block composing the model by means of the Simulink Coder tool. In the one-core architecture case, the code generated using S-Preesm counts 1084 lines, not including the host code library.
The output of the code generated using Simulink Coder is a vector of size (1000*1). This output corresponds to the data stream transferred to the LTE QPSK receiver side via the AWGN channel. To check the correctness of our approach, we generated the code using the S-Preesm tool. The resulting output file was fairly close to the Simulink Coder result.
To show the positive impact of our approach on the application performance, we deployed the generated codes on the Raspberry Pi 3 architecture. The Raspberry Pi 3 has a 64-bit Broadcom BCM2837 processor with four ARM Cortex-A53 cores and a clock speed of 1.2 GHz. The Raspberry Pi 3 is a good hardware platform to introduce multi-core programming. Furthermore, we chose to evaluate performance using the Raspberry Pi 3 because most smartphones use multi-core ARM processors similar to that of the Raspberry Pi. The metrics taken into account during the overall process are execution time, speedup and efficiency. The speedup measures how much faster an application runs on multiple processors than on a single processor. Efficiency, the second performance metric, is deduced from the speedup: it is the average utilization of the n processors, obtained as the ratio of the speedup to the number of processors allocated.
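In symbols, the two metrics defined above can be written as follows, where $T_1$ and $T_n$ are notation introduced here (it does not appear in the original text) for the execution times on one core and on $n$ cores:

$$S_n = \frac{T_1}{T_n}, \qquad E_n = \frac{S_n}{n} = \frac{T_1}{n \cdot T_n}.$$

With a single core, $T_n = T_1$, so $S_1 = E_1 = 1$.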

Performance results using Simulink coder tool
In this section we present the result obtained using the Simulink Coder tool. Since passing from Simulink applications to multi-core implementations is not trivial, as detailed in previous sections, we generated, using Simulink Coder, C code compatible only with a single-core architecture. Then, we deployed the generated code on the Raspberry Pi 3 platform. The resulting execution time is 0.288 s. The achieved speedup and efficiency are equal to 1, since only a single core is used.

Performance results using S-Preesm tool
In this section, we generated C code for the LTE transmitter side application for several multi-core architectures using the S-Preesm tool. First, the generated code was deployed on a single core. The resulting execution time is 0.273 s. Since the target architecture has one core, speedup and efficiency are consequently equal to 1.
To analyze the impact of the multi-core architecture on the execution time, speedup and efficiency of the "LTE transmitter side" application, we generated C code for dual-core, 3-core and 4-core architectures using S-Preesm, starting from the Simulink model of the application. On the dual-core architecture, the application execution time is 0.185 s. Speedup and efficiency reach 1.49 and 0.745, respectively. We can observe in figure 17 that deploying on two cores noticeably improves the performance of the Simulink application compared to deploying on a single core.
Better results in terms of execution time and speedup are obtained when deploying on 3 cores. Indeed, executing the Simulink application on the 3-core architecture yields an execution time of 0.153 s and a speedup of 1.78 relative to single-core execution, as depicted in figure 17.
The same application was deployed on 4 cores. Figure 17 shows that the minimum execution time of the case study application is achieved on 4 cores. Likewise, the best speedup, 2.48, is reached on the 4-core architecture, meaning that execution is 2.48 times faster than on a single core. Table 2 summarizes the performance measurements for each target hardware implementation when using the S-Preesm tool.
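The speedup and efficiency figures can be recomputed from the reported execution times with a few lines of code. This is a minimal sketch: the function names are hypothetical, and only the times quoted in the text are used (the 4-core execution time is not given in this section). Because the quoted times are rounded to three decimals, the recomputed dual-core speedup is about 1.48, slightly below the reported 1.49, presumably due to that rounding.

```python
def speedup(t_single: float, t_parallel: float) -> float:
    """Ratio of single-core execution time to multi-core execution time."""
    return t_single / t_parallel

def efficiency(t_single: float, t_parallel: float, n_cores: int) -> float:
    """Speedup divided by the number of cores used."""
    return speedup(t_single, t_parallel) / n_cores

T1 = 0.273                           # S-Preesm single-core time, in seconds
reported = {2: 0.185, 3: 0.153}      # cores -> execution time, in seconds

for n, t in reported.items():
    print(n, round(speedup(T1, t), 2), round(efficiency(T1, t, n), 3))
```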

Results analysis
The results obtained in the previous sections show the efficiency of our proposal in improving the performance of Simulink applications. In fact, even when deploying the generated codes on a single-core platform, the execution time of the code generated using our proposal is lower than that of the code generated using Simulink Coder. This is due to the fact that the Simulink Coder tool enforces the addition of memory buffers and latencies whenever there is a rate transition among nonvirtual blocks. Hence, the time performance of the application is negatively influenced. These additions are not required when using our approach. Further, S-Preesm implements a scheduling module which splits the scheduling/mapping functionality and the cost evaluation of the generated solutions into two submodules. This division improves scalability in terms of schedule quality and execution time.
In order to demonstrate the effectiveness of our approach in improving the performance of hierarchical Simulink applications, we investigate the execution time-efficiency profile. This profile represents an important cost-benefit trade-off in evaluating multi-core application performance: efficiency indicates benefit and execution time indicates cost. Figure 18 illustrates the profile for the "LTE Transmitter side" Simulink application when using S-Preesm. We first compare only the ratios of efficiency to execution time obtained with Simulink Coder and S-Preesm when deploying the generated codes on a single-core platform. The ratio of efficiency to execution time obtained with Simulink Coder is 3.47, and the ratio obtained with S-Preesm is 3.66. We find that the use of S-Preesm yields the better result. Furthermore, as depicted in Figure 18, the ratio of efficiency to execution time reaches its maximum when the execution is performed on the 4-core architecture.
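The single-core ratios quoted above follow directly from the reported execution times, since the efficiency on one core is 1. A minimal sketch (the function name `efficiency_time_ratio` is hypothetical):

```python
def efficiency_time_ratio(t_exec_s: float, speedup: float, n_cores: int) -> float:
    """Efficiency (speedup / n_cores) divided by the execution time in seconds."""
    return (speedup / n_cores) / t_exec_s

# Single-core deployments: speedup and efficiency are both 1.
simulink_coder = efficiency_time_ratio(0.288, 1.0, 1)
s_preesm = efficiency_time_ratio(0.273, 1.0, 1)
print(round(simulink_coder, 2), round(s_preesm, 2))  # 3.47 3.66
```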
Hence, compared to single-core execution, our Simulink application achieves the most efficient utilization of each core when executing on 4 cores, with a ratio of efficiency to execution time equal to 5.63. Thus, the results above reveal the impact of transforming hierarchical Simulink models into multi-core execution using our proposal on improving performance in terms of execution time, speedup and execution time-efficiency. Further, transforming the Simulink model of the LTE QPSK transmitter chain into multi-core execution using S-Preesm eases its parallelization and allows us to take advantage of this high degree of parallelism. Moreover, the OFDM subsystem can be reused for other similar systems. The use of our open source proposal eliminates many constraints and configurations imposed by the commercial toolbox "Real-Time Workshop Embedded Coder" that are required before code generation. S-Preesm also allows cost-free parallel C code generation.
Figure 18. Execution time-efficiency profile for "LTE Transmitter side".

Conclusion
In this article, we have described an efficient approach to automatically optimize and transform hierarchical Simulink models to multi-core execution. The proposed methodology consists of converting hierarchical Simulink models into an intermediate model before generating parallel code. In this work, we proposed IBSDF as the intermediate representation. Our translation approach is the first to preserve and exploit the hierarchical behavior of Simulink applications.
To achieve this, we extended the existing tool Preesm to support Simulink applications; we named the extension S-Preesm. S-Preesm has been successfully applied to the hierarchical Simulink application "LTE transmitter side". Thanks to our translation strategy, we succeeded in transforming the complex Simulink application into a deadlock free and consistent IBSDF graph, in which we can determine the initial amount of tokens for each FiFo channel and the consumed and produced data according to the communication type between blocks and levels of the Simulink model. After transforming the Simulink application, the obtained graph is subjected to a scheduling/mapping algorithm to perform parallel code generation. In addition, a host C code library corresponding to each graph actor is created to contribute to the code generation.
Based on the complex signal processing application "LTE transmitter side", experiments show the effectiveness and the potential of our approach in embedded systems development. The comparison of the results of our approach with the Simulink Coder results demonstrates the efficiency of our technique in performing and facilitating the transformation of hierarchical Simulink applications into multi-core execution.
For further developments, we may extend this work to support other block types, such as conditional execution blocks, which are characterized by variable periods. As future work, we may also adopt the hierarchical approach proposed in [16] to map hierarchical Simulink applications onto multi-core architectures. We also aim to extend the S-Preesm work-flow to support the optimal scheduler proposed by Rebaya et al. [24].