Performance realization of Bridge Model using Ethernet-MAC for NoC based system with FPGA Prototyping

Received Apr 9, 2019; Revised Mar 13, 2020; Accepted Mar 24, 2020
A System on Chip (SoC) integrates a number of processing elements (PEs) with different application requirements on a single chip. SoCs typically use bus-based interconnection with shared memory access. However, buses are not scalable and are limited to a particular interface protocol. To overcome these problems, the Network on Chip (NoC) has emerged as a scalable and reliable interconnect solution for SoCs. A bridge model is essential for communicating with NoC based systems on an SoC. In this article, a cost-effective and efficient bridge model with Ethernet-MAC is designed, and the placement of the bridge with an NoC based system is prototyped on an Artix-7 FPGA. The bridge model mainly contains FIFO modules, a serializer and deserializer, a priority-based arbiter with credit counter, a packet framer, and a packet parser with an Ethernet-MAC transceiver module. The bridge with a single router and with different sizes of mesh topology NoC based systems is designed using adaptive-XY routing. The performance of the bridge with NoC is evaluated in terms of average latency and maximum throughput for different Packet Injection Rates (PIR).


Most of the existing approaches use bus-based protocols for interconnection with NoC systems; they lack a cost-effective solution and suffer from hardware complexity. Very little work has been done on on-chip FPGA prototyping of a bridge architecture that includes an Ethernet-MAC. Most hardware-based bridge architectures use the off-chip Xilinx Ethernet-MAC wrapper, and these systems fail to handle synchronization issues.
Thus, a cost-effective standalone bridge architecture that includes an Ethernet-MAC for NoC based systems is needed to close these gaps. The next section explains the detailed architecture of the bridge model.

BRIDGE MODEL
The hardware architecture of the bridge model is represented in Figure 1. The bridge model has four First-In First-Out (FIFO) modules, a multiplexer and demultiplexer, a serializer and deserializer, a packet framer and packet parser, a priority-based arbiter, a credit counter, and an Ethernet-MAC transceiver.
The Network Interface (NI) receives data signals externally via different request ports and sends them to the bridge model. The bridge model stores the data of each port in a separate FIFO. The synchronous FIFO writes the data sequentially into consecutive memory locations, each holding 32 bits of data. Once the memory locations are filled, the full signal is set high; the data are then read from the same memory locations until the last entry, at which point the empty signal is set high to indicate that the memory is empty. The width of the FIFO depends on the amount of data transferred through the bridge model. The multiplexer receives the data of two or more FIFOs in parallel and establishes the connection based on priority arbitration. The priority-based arbiter receives requests from the different ports, such as Data Transaction Level (DTL) ports with memory-mapped access, and generates grants based on priority. It operates as a scheduler to improve the QoS parameters of every bridge connection. The serializer takes the multiplexed 32-bit data and generates sequential 8-bit data. It acts as a Parallel In Serial Out (PISO) converter and is driven by the credit counter, which receives its input credits from the deserializer. The credit counter is a 2-bit counter, and on every successive count one parallel-to-serial conversion step is performed in the serializer. The serial data are used as payload data in the Ethernet framer. The Ethernet frame and packet format are represented in Figure 2.
Figure 2. Ethernet frame and packet format
The Ethernet frame used in the bridge model follows the IEEE 802.3 [19] format. The Ethernet packet consists of the Ethernet frame preceded by a preamble and a Start Frame Delimiter (SFD). The 56-bit (7-byte) preamble synchronizes the receiver timing with the packet data, and the 8-bit SFD marks the start of the frame data. The frame carries a 6-byte destination address identifying the receiver and a 6-byte source address identifying the transmitter. A 2-byte type or length field describes the Medium Access Control (MAC) payload. A user can send up to 1500 bytes of data in the payload. A Cyclic Redundancy Check (CRC), stored in the 4-byte Frame Check Sequence (FCS), is used to detect bit errors.
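The frame layout described above can be illustrated with a small behavioral sketch in Python. It is not the Verilog design of this work; the function names, the example EtherType, and the byte order used when appending the FCS are assumptions made for illustration. The CRC-32 computed by Python's zlib module uses the same polynomial as the Ethernet FCS.

```python
import struct
import zlib

PREAMBLE = bytes([0x55] * 7)  # 7-byte (56-bit) preamble
SFD = bytes([0xD5])           # 1-byte start frame delimiter


def build_frame(dest, src, eth_type, payload):
    """Return destination + source + type/length + payload + FCS.

    Hypothetical helper: padding of short payloads is not handled here.
    """
    if len(dest) != 6 or len(src) != 6:
        raise ValueError("MAC addresses must be 6 bytes long")
    if len(payload) > 1500:
        raise ValueError("payload is limited to 1500 bytes")
    body = dest + src + struct.pack("!H", eth_type) + payload
    # Assumption: the 4-byte FCS is appended least significant byte first.
    fcs = struct.pack("<I", zlib.crc32(body) & 0xFFFFFFFF)
    return body + fcs


def build_packet(frame):
    """Prefix the frame with preamble and SFD to form the Ethernet packet."""
    return PREAMBLE + SFD + frame


if __name__ == "__main__":
    frame = build_frame(bytes(6), bytes([1] * 6), 0x0800, b"hello")
    print(len(frame), len(build_packet(frame)))
```

The header is 14 bytes (6 + 6 + 2), so a frame is always 18 bytes longer than its payload, and the packet adds a further 8 bytes for preamble and SFD.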
The hardware architecture of the Ethernet-MAC transceiver [20] is represented in Figure 3 and consists of a receiver model and a transmitter model. The receiver model contains a receiver Finite State Machine (FSM), a Frame Length Counter (FLC), a CRC unit, and a receiver FIFO. The transmitter model has a transmitter FIFO, zero padding followed by CRC, and a transmitter FSM. The carrier and collision signals are used to check whether the Ethernet-MAC operates in half-duplex or full-duplex mode. The receiver FSM performs the data processing: if the data form a valid packet, processing continues; otherwise, the packet is discarded. Data packets are stored in the receiver FIFO, and excess packets are dropped. The FLC checks whether the received packet is valid, in error, or rejected.
Figure 3. The hardware architecture of the Ethernet-MAC transceiver
The transmitter FIFO holds the packets until the packet timer expires. The transmitter FSM receives the data packets from the transmitter FIFO and processes them until the end of the frame. If the frame is short, the data are padded with zeros, and the CRC is computed and placed in the FCS so that the packet can be validated. If a packet collision occurs during processing, the FSM enters the jam state. After successive retries, transmission proceeds, and the valid data packets are passed to the next stage of the bridge model.
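As a rough illustration of the zero padding on the transmit side and the frame check on the receive side, the following Python sketch pads short payloads and classifies a received frame. The 46-byte minimum payload comes from IEEE 802.3; the mapping of the three outcomes (a length violation is "rejected", a CRC mismatch is "error") is an assumption, since the text does not define them, and all function names are hypothetical.

```python
import struct
import zlib

HEADER_LEN = 14     # destination (6) + source (6) + type/length (2)
FCS_LEN = 4
MIN_PAYLOAD = 46    # IEEE 802.3 minimum payload size in bytes
MAX_PAYLOAD = 1500


def pad_payload(payload):
    """Pad a short payload with zero bytes up to the minimum size."""
    return payload + bytes(max(0, MIN_PAYLOAD - len(payload)))


def append_fcs(body):
    """Append the CRC-32 of the body as a 4-byte FCS (assumed LSB first)."""
    return body + struct.pack("<I", zlib.crc32(body) & 0xFFFFFFFF)


def check_frame(frame):
    """Classify a received frame as 'valid', 'error', or 'rejected'.

    Assumed policy: out-of-range length -> 'rejected',
    FCS mismatch -> 'error', otherwise 'valid'.
    """
    n = len(frame)
    if n < HEADER_LEN + MIN_PAYLOAD + FCS_LEN:
        return "rejected"
    if n > HEADER_LEN + MAX_PAYLOAD + FCS_LEN:
        return "rejected"
    body, fcs = frame[:-FCS_LEN], frame[-FCS_LEN:]
    if struct.pack("<I", zlib.crc32(body) & 0xFFFFFFFF) != fcs:
        return "error"
    return "valid"
```

With a 3-byte payload, padding brings the frame to the 64-byte minimum (14 + 46 + 4), and flipping any single bit of that frame makes the check report an error.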
The packet parser receives the data packets from the payload of the Ethernet-MAC transceiver, divides them into a sequence of 8-bit serial data, and feeds these to the deserializer module. The deserializer works in a Serial In Parallel Out (SIPO) manner: it receives the 8-bit data and converts them to 32-bit parallel data using a shift operation with a counter. The counter values are input to the credit counter. The 32-bit parallel data are input to the demultiplexer, which, based on the counter values, generates two or more data values and feeds them to the FIFOs. To validate the bridge model, the data in the transmit FIFOs must match the data in the receive FIFOs. The next section describes the bridge placement in the NoC.
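The 32-bit to 8-bit conversion and its inverse can be sketched behaviorally in Python as follows. This is a minimal model, not the hardware implementation; the byte order (most significant byte first) and the function names are assumptions, and the 2-bit credit counter is modeled as a count from 0 to 3.

```python
def serialize(word):
    """PISO model: split one 32-bit word into four bytes.

    One byte is produced per value of the 2-bit credit counter (0..3).
    Assumption: the most significant byte is sent first.
    """
    out = []
    for credit in range(4):
        shift = 8 * (3 - credit)
        out.append((word >> shift) & 0xFF)
    return out


def deserialize(byte_stream):
    """SIPO model: shift bytes in and emit a 32-bit word every fourth byte."""
    words = []
    shift_reg = 0
    count = 0  # 2-bit counter, wraps from 3 back to 0
    for b in byte_stream:
        shift_reg = ((shift_reg << 8) | (b & 0xFF)) & 0xFFFFFFFF
        count = (count + 1) & 0x3
        if count == 0:
            words.append(shift_reg)
            shift_reg = 0
    return words


if __name__ == "__main__":
    sent = [0x11223344, 0xDEADBEEF]
    stream = [b for w in sent for b in serialize(w)]
    # Round trip: received words must equal transmitted words.
    print(deserialize(stream) == sent)
```

The round trip mirrors the validation criterion stated above: the words recovered on the receive side must equal the words written on the transmit side.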

BRIDGE WITH NOC USING ADAPTIVE-XY ROUTING
The bridge model is interconnected with NoC based systems and provides on-chip and off-chip data flow control, off-chip data transactions, off-chip connectivity with various interconnections, and arbitration between multiple connections of the NoC based Multiprocessing SoC (MPSoC). The MPSoC chips are considered to be ASIC or FPGA devices for prototyping the bridge interconnection. An example of the bridge interconnection with a 2x2 NoC architecture is represented in Figure 4. In this example, the bridge model receives data signals from external resources, such as a LAN or internet cable, through the Ethernet-MAC port. The Ethernet-MAC receives the valid data packets and transmits them to the bridge model, which performs the bridge interconnection operation and passes the data to the router model via the NI. Mesh topology is used in this design to build 2x2, 3x3, and 4x4 NoC architectures. The 2x2 NoC has four routers (R1, R2, R3, and R4), all interconnected using link wires. R1 receives the bridge data and performs the data transaction based on the destination router address. Each router model has five-port input registers, followed by packet framing with arbitration and the adaptive-XY routing algorithm [21].
The five-port input register has east, west, north, south, and local ports. It receives the bridge data packets on the local port, stores them temporarily, and passes them to the priority encoder. The priority encoder works based on an arbitration request: the arbiter receives the requests from the input registers and generates the grants to process the encoded data. Further, the encoded data are used in the packet formation. The router packet formation for the Bridge-NoC is represented in Figure 5. The packet is framed based on the request, the destination XY address, and the priority output. The 32-bit data act as the flit; together with the 2-bit destination X address, the 2-bit destination Y address, and the 1-bit request, they form a 37-bit router packet, which is considered a single phit. In this design, the 37-bit phit occupies 5 bytes when rounded up to whole bytes.
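The 37-bit router packet can be illustrated by packing the four fields into one integer. The sketch below is in Python for readability; the bit positions (request in the most significant bit, flit in the low 32 bits) and the function names are assumptions for illustration.

```python
PHIT_BITS = 1 + 2 + 2 + 32          # request + dest X + dest Y + flit = 37
PHIT_BYTES = (PHIT_BITS + 7) // 8   # rounded up to whole bytes


def pack_router_packet(request, dest_x, dest_y, flit):
    """Pack the fields into a 37-bit phit (assumed field order, MSB first:
    request, destination X, destination Y, flit)."""
    if request not in (0, 1):
        raise ValueError("request is a 1-bit field")
    if not (0 <= dest_x < 4 and 0 <= dest_y < 4):
        raise ValueError("destination X and Y are 2-bit fields")
    if not 0 <= flit < (1 << 32):
        raise ValueError("flit is a 32-bit field")
    return (request << 36) | (dest_x << 34) | (dest_y << 32) | flit


def unpack_router_packet(phit):
    """Recover (request, dest_x, dest_y, flit) from a 37-bit phit."""
    return (
        (phit >> 36) & 0x1,
        (phit >> 34) & 0x3,
        (phit >> 32) & 0x3,
        phit & 0xFFFFFFFF,
    )
```

Packing and then unpacking returns the original fields, and the largest possible phit fits in exactly 37 bits.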

Figure 5. Router packet format: Request (1 bit), Dest.X (2 bits), Dest.Y (2 bits), Flit (32 bits); 37 bits in total
The current XY address is fixed as an identity for each router. Router R1 has the 4-bit current XY address "0000"; similarly, R2 has "0001", R3 has "0100", and R4 has "0101". The routing computation in the NoC is performed by the routing algorithm; in this work, the adaptive-XY routing algorithm is used, which is an adaptive form of the normal XY routing algorithm. The algorithm routes in the X direction as the first dimension and in the Y direction as the second dimension, selects a path with few hops, and steers the packet toward the destination along the less congested dimension. The shortest routing path is preferred; when it is congested, the algorithm finds an alternative congestion-free route based on the congestion parameter configuration. In this example, router R4 is the destination, and the bridge data packets reach it according to the routing algorithm. In the next section, the hardware synthesis results and the performance analysis in terms of latency and throughput are evaluated for the bridge with NoC architecture.
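One possible reading of the adaptive-XY decision at a single router is sketched below in Python. The exact congestion rule is not specified in the text, so this is an assumption: the router prefers the X direction as in plain XY routing and switches to the Y direction only when the X output is congested and a Y move also brings the packet closer. The split of the 4-bit address into X (upper two bits) and Y (lower two bits), the port naming, and the function names are likewise hypothetical.

```python
def decode_addr(addr4):
    """Split a 4-bit router address into (x, y); assumed X = upper 2 bits."""
    return (addr4 >> 2) & 0x3, addr4 & 0x3


def next_port(cur, dst, congested=frozenset()):
    """Choose the output port for one hop.

    cur, dst: (x, y) tuples. congested: set of port names currently congested.
    Assumed port naming: east/west move along X, north/south along Y.
    """
    cx, cy = cur
    dx, dy = dst
    if (cx, cy) == (dx, dy):
        return "local"
    candidates = []  # productive ports, X direction first
    if dx != cx:
        candidates.append("east" if dx > cx else "west")
    if dy != cy:
        candidates.append("north" if dy > cy else "south")
    for port in candidates:
        if port not in congested:
            return port
    # Every productive port is congested: keep the XY order and wait.
    return candidates[0]


if __name__ == "__main__":
    r1, r4 = decode_addr(0b0000), decode_addr(0b0101)
    print(next_port(r1, r4), next_port(r1, r4, congested={"east"}))
```

Because only productive ports are considered, every hop reduces the distance to the destination, so the path from R1 to R4 stays at the minimum hop count whichever dimension is taken first.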

RESULTS AND DISCUSSION
This section presents the hardware synthesis results and the performance analysis of the Bridge-NoC architecture using mesh topology with different network sizes. The proposed work is synthesized in the Xilinx environment using Verilog-HDL, and the hardware prototype is implemented on the Artix-7 FPGA platform.

Hardware synthesis Results
Standalone bridge designs with and without the Ethernet-MAC packet transceiver are implemented. On advanced FPGA development boards, an Ethernet-MAC wrapper is available as an IP core from Xilinx; it is not used in the proposed bridge architecture. The synthesis results of the bridge with and without the Ethernet-MAC are tabulated in Table 1. The resource utilization of the bridge with NoC, covering a single router and 2x2, 3x3, and 4x4 mesh topology-based architectures, is tabulated in Table 2 and represented in Figure 6. The area cost of the bridge with a single router, 2x2, 3x3, and 4x4 NoC is < 2%, > 2%, > 3%, and > 4% of the FPGA resources, respectively. As the size of the network increases, the area resource utilization of the Bridge-NoC architecture also increases. The bridge with 4x4 NoC requires 2379 slice registers, 2846 slice LUTs, and 2100 LUT-FF pairs. It operates at up to 214.24 MHz on the Artix-7 FPGA, which makes it suitable for real-time scenarios. These synthesis results show that the proposed bridge with NoC architecture is effectively implemented and prototyped on the FPGA platform, achieving good speed with low FPGA resource utilization.

Performance Analysis
The performance of the bridge with NoC architecture is analyzed for different network sizes in terms of average latency and maximum throughput against the input load, expressed as PIR. A uniform traffic pattern is used as the input load. The PIR is the number of input data packets injected per clock cycle. For example, a PIR of 0.5 means that the bridge can send 50 input packets in 100 clock cycles. The average latency per flit of the bridge is represented in terms of clock cycles (ns). A flit is a 32-bit input data packet used in NoC systems. The minimum latency of the bridge with Ethernet-MAC is calculated from the number of flits used and the bridge architecture latency. The bridge with Ethernet-MAC takes 1232 clock cycles to complete the bridge packet transceiver operation. For the bridge with a single router or an NoC, the latency is calculated from the number of IPs used, the minimum bridge latency, and the router data flow logic. The router data flow logic takes two clock cycles to perform the routing operation in the NoC from source to destination based on the adaptive-XY algorithm. The average latency of the bridge with a single router and with 2x2, 3x3, and 4x4 NoC architectures is analyzed against PIR and presented in Figure 7. At 0.6 PIR, the latency of the bridge with a single router is 740 clock cycles, and that of the bridge with 4x4 NoC is 11846 clock cycles.
The proposed bridge with Ethernet-MAC is compared with the existing bridge with AXI [22] and with Ethernet-MAC [11] in Table 3. The bridge-based NoC router is compared with the existing Wishbone bridge-based NoC router [4] and shows improvements in slices, flip-flops, and Fmax.
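The PIR definition used in this section, and the conversion of a cycle count into time at a given clock frequency, can be written as two small Python helpers. This is only an arithmetic illustration; the function names are hypothetical, and the throughput formula of this work is not reproduced because it is not stated in the text.

```python
def packets_injected(pir, cycles):
    """Number of packets injected in `cycles` clock cycles at rate `pir`
    (packets per clock cycle)."""
    return int(round(pir * cycles))


def cycles_to_ns(cycles, freq_mhz):
    """Convert a latency in clock cycles to nanoseconds at `freq_mhz`."""
    return cycles * 1000.0 / freq_mhz


if __name__ == "__main__":
    # A PIR of 0.5 corresponds to 50 packets in 100 clock cycles.
    print(packets_injected(0.5, 100))
    # Router data flow logic: two cycles at the reported 214.24 MHz.
    print(cycles_to_ns(2, 214.24))
```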

CONCLUSION
This article presents an efficient and cost-effective bridge model that includes an Ethernet-MAC and integrates the bridge model with NoC based systems. The inclusion of the Ethernet-MAC makes the bridge model robust and allows it to be prototyped quickly on on-chip FPGA devices. The bridge with a single router and with different sizes of mesh topology-based NoC is designed using congestion-free adaptive-XY routing. According to the synthesis results, the bridge architecture without and with the Ethernet-MAC utilizes < 1% and > 1% of the FPGA resources, respectively, and the bridge with a single router and with 4x4 NoC uses < 2% and > 4% of the FPGA resources, respectively. The bridge with 4x4 NoC operates at 231.535 MHz on the FPGA. The performance metrics of the bridge with NoC are the average latency and maximum throughput for different PIR. The bridge with a single router achieves 4.615 Gbps, and the bridge with 4x4 NoC achieves 71.576 Gbps at 0.6 PIR. Compared with existing approaches, the bridge with Ethernet-MAC and the bridge-based NoC router show improvements in hardware constraints. In future research, this model can be extended with security features for the bridge and NoC based systems to protect the data packets from attacks.