



INTERNATIONAL JOURNAL FOR RESEARCH IN APPLIED SCIENCE & ENGINEERING TECHNOLOGY

Volume: 6 Issue: V Month of publication: May 2018

DOI: http://doi.org/10.22214/ijraset.2018.5467

# www.ijraset.com




Vadhiraj Kulkarni<sup>1</sup>, Pooja M<sup>2</sup>

<sup>1</sup>4<sup>th</sup> Sem, Digital Electronics, Appa IET, Kalaburgi, Karnataka, India

<sup>2</sup>Professor, Dept. of Digital Electronics, Appa IET, Kalaburgi, Karnataka, India

Abstract: Fixed-latency serial links are important components of distributed measurement and control systems. However, most high-speed Serializer-Deserializer (SerDes) chips do not keep the same link latency after each power-up or reset. In this project, we propose a fixed-latency serial transceiver based on dynamic clock phase shifting and changeable delay tuning technologies. Our solution can process all possible phase offsets between the transmitted and received clocks, so it relaxes the requirement of fanning in the same reference clock to both the transmitter and the receiver. It also eliminates the reset-relock process of the roulette approach. We present a specific example of an implementation based on the serial transceiver in an FPGA. The experimental results indicate that our transceiver can achieve a deterministic latency with sub-nanosecond precision.

Index Terms: ATLAS, fixed latency, field-programmable gate array (FPGA), serial link

## I. INTRODUCTION

A Multi-Gigabit Transceiver (MGT) is a SerDes capable of operating at serial bit rates above 1 Gigabit/second. MGTs are used increasingly for data communications because they can run over longer distances and use fewer wires, and thus cost less than parallel interfaces with equivalent data throughput.

Latency variations may come from both the serial and the parallel sections of a SerDes device. In a serializer, the parallel transmit clock is multiplied to provide the high-speed clock to the serial side of the PISO. In the deserializer, the high-speed recovered clock from the CDR is divided to provide the low-speed recovered clock (i.e. cd) to the parallel side of the SIPO. If the phase of the divided clock is chosen randomly, the parallel recovered clock can vary in phase with respect to the transmit clock by an integer number of unit intervals (UIs). A phase variation of the recovered clock implies a latency variation of the data transferred on the link. In the parallel section, latency variations may be induced by the presence of elastic buffers. A dedicated mechanism is needed to ensure that the same number of words has always been written into the buffer before reading starts (the receiver elastic buffer of the GTP implements this feature).

Like other SerDes, the primary function of the MGT is to transmit parallel data as a stream of serial bits, and to convert the serial bits it receives into parallel data. The most basic performance metric of an MGT is its serial bit rate, or line rate, which is the number of serial bits it can transmit or receive per second. Although there is no strict rule, MGTs typically run at line rates of 1 Gigabit/second or more. MGTs have become the 'data highways' for data processing systems that demand high raw data input and output rates (e.g. video processing applications).
They are becoming very common on FPGAs, since such programmable logic devices are especially well suited to parallel data processing algorithms.

Many distributed measurement and control systems are based on networks of high-speed serial links. In these systems, the timing and control signals (clock, reset, trigger, etc.), along with the data, are transferred across the serial links. A serial link that distributes these signals with a fixed latency is therefore highly desirable in these applications. However, most commercial high-speed Serializer-Deserializer (SerDes) chips do not keep a fixed latency through their data paths after each power-up or reset. For example, the chip latencies of the SCAN25100 and the TLK2711A vary after each power-up. Fixed chip latency requires additional hardware processing circuits and is usually not needed for most data communication applications. Therefore, how to implement a fixed-latency serial transceiver is an important issue when designing these distributed systems.

We briefly summarize some representative works in the field of serial links for distributed systems. A typical example is the Timing, Trigger, and Control (TTC) system, which has been used successfully as the trigger system of the Large Hadron Collider (LHC) experiments. To address the possible problems of timing synchronization and data transfer in the Super LHC experiments, CERN launched the Gigabit Transceiver (GBT) project [4], which features fixed-latency data transfer and phase-matched clock recovery. Another project being developed at CERN is White Rabbit, which can synchronize more than 1000 nodes with sub-nanosecond precision over lengths of 10 km.

A transceiver is a device comprising both a transmitter and a receiver which are combined and share common circuitry or a single housing. When no circuitry is common between transmit and receive functions, the device is a transmitter-receiver.

ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 6.887 Volume 6 Issue V, May 2018- Available at www.ijraset.com

## II. RELATED WORK

"High-speed, fixed-latency serial links with FPGAs". An important application of fixed-latency serial links in high energy physics experiments is in trigger systems, which need predictable data transfer timing. Here, the authors introduce an architecture based on the high-speed transceivers embedded in the newer generation of FPGAs. The latency of such transceivers is not fixed by default. Hence, a clocking scheme and a configuration are implemented to attain it.

The ATLAS experiment level-one muon trigger is constructed as a synchronous pipeline, which includes some fixed-latency serial links for data transfer from the detector to the counting room. The links are based on the Agilent G-Link chipset, whose production has halted, and no compatible off-the-shelf chipsets are available. In order to have a replacement solution for the G-Link-based links, its protocol is adopted.

"Fast Control and Timing distribution based on FPGA-embedded serial Transceivers". In high energy physics experiments, the fast control and timing distribution system (FCTS) must provide a clock distribution network with minimum jitter and must transmit data at fixed latency. In fact, the transferred data include trigger signals and fast control commands, whose timing needs to be preserved.

The embedded high-speed serializer-deserializer (SerDes) available in the new generation of field-programmable gate arrays (FPGAs) can be used effectively to implement serial links for FCTS applications. Here, jitter measurements are performed on an FCTS link based on FPGA-embedded SerDes, and fixed-latency data transfer and clock recovery are achieved even after a loss of lock or a power cycle.

This architecture is implemented with the Xilinx GTP transceiver embedded in Virtex-5 FPGAs. A variety of tests are performed at a link rate of 2.5 Gigabit/s with a distributed clock running at 62.5 MHz. The link is tested both with the 8b/10b encoding scheme, which is a well-established standard, and with the scrambling method adopted by the Gigabit Transceiver project under development at CERN.

"S-LINK on a Chip for Embedded Applications". Some of the data acquisition systems of the LHC experiments have adopted S-LINK, a custom data transmission protocol developed by CERN and introduced in the late '90s. S-LINK is based on a simple FIFO-type user interface that remains independent of the technology used to implement the physical layer.

In response to the rising need for higher density, higher bandwidth and embedded applications, the authors have developed a single-chip version of the S-LINK board, which is widely deployed in the ATLAS experiment. This architecture is based on a latest-generation Xilinx FPGA with high-speed serial transceivers in the multi-Gb/s domain. Targets with different requirements can be met by a specific implementation.

The requirements include increasing the bandwidth performance of the link, keeping it backward compatible, and providing fixed latency for real-time control operation.

In this paper, the authors describe the details of the S-LINK implementation and the various measurements carried out. Specifically, the performance in terms of BER, bandwidth, backward compatibility and fixed latency is analyzed. Resource occupation is also addressed in view of a possible IP release.

"Characterizing Jitter Performance of Multi Gigabit FPGA-Embedded Serial Transceivers". High-speed serial links are one of the key components of data acquisition systems for high energy physics. They carry physics event data as well as clock, trigger and fast control signals.

For the latter applications, the jitter on the clock recovered from the serial stream is a critical parameter, as it in turn affects the timing performance of the system. Multi-gigabit serial transceivers are included in the latest FPGA architectures; they can be configured with various options and support many data encodings. Here, the authors present various jitter measurements on the clock recovered by a GTP (gigabit transceiver pair) transceiver (embedded in a Xilinx Virtex-5 FPGA) as a function of coding, data pattern and logic activity on the transmitter and receiver FPGAs.

"High speed serial transceivers for data communication systems". This paper describes the critical circuit design and architecture issues related to high-speed serial data links at more than 1 Gb/s.

Power vs. performance trade-offs are presented for synchronous optical networking and synchronous digital hierarchy (SONET/SDH) transceivers and for backplane transceivers for InfiniBand or similar standards.




## III. DESIGN METHODOLOGY



Figure 2 Block diagram of the Receiver

A simplified block diagram of the Xilinx 7 series transceiver is shown in Figs. 1 and 2, in which Fig. 2 represents the RX interface and Fig. 1 depicts the TX side. In Fig. 2, an incoming serial stream passes through the clock data recovery (CDR) circuit and is converted into parallel data by the serial input parallel output (SIPO) module. The parallel stream from the SIPO is synchronized by the recovered parallel clock (XCLK) and can be programmed to cross a series of user service modules following the SIPO, such as the "comma detect and align" module and the "8-b/10-b" decoder, before arriving at the FPGA fabric. There are two clock domains along the signal path: the physical medium attachment (PMA) clock domain with XCLK and the physical coding sublayer (PCS) clock domain with the user clock from the FPGA fabric (USRCLK). An RX elastic buffer is available to resolve the difference between the two clock domains; however, it is not recommended for applications that need low and deterministic latency. In the implementation of the router, we bypass all the user service modules, and the RX stream is sent to the FPGA fabric directly from the SIPO, as shown in Fig. 2. In this case, a phase-alignment circuit in the RX is used to adjust the phase difference between XCLK and USRCLK.

There are also two clock domains on the GTP TX side, spanning several user service modules, such as the "8-b/10-b" encoder, as shown in Fig. 1. Similar to the RX side, the parallel data from the FPGA fabric flows into the parallel input serial output (PISO) module without passing through any user service modules, for low and deterministic latency [9], as indicated by the solid line in Fig. 1. The phase difference between the TX parallel clock in the PMA and the user clock in the PCS is adjusted by a phase-alignment circuit in the transmitter.



Fig. 3. Processing of serial streams inside the FPGA of the router: parts 1 and 7 correspond to the transceivers; parts 2–6 represent the user logic.




- A. There are a total of seven stages in the flow, denoted by 1–7 in Fig. 3:
- 1) Stage 1: GTP RX: There are a total of 12 GTP RXs at the inputs: RX #0–11. As the GTP RX does not support 30-bit operation in the SIPO, we set it to 20 bits. Correspondingly, the 4.8-Gbps serial stream is transferred to a 240-MHz clock domain in the RX.
- 2) Stage 2: Packet Decoder: The 20-bit parallel data from a GTP RX are buffered and reassembled into 30-bit FEB packets by keeping track of the periodic occurrence of the 4-bit headers ("1010" or "1100") every 30 bits in the buffer. The data stream is then transferred from the GTP RX 240-MHz clock domain to a 160-MHz clock domain.
- 3) Stage 3: Descrambler: The descrambler recovers the 26-bit payload of a 30-bit packet. Headers are not involved in the operation.
- 4) Stage 4: Packet Switch: Recovered packets from all available inputs are gathered. NULL packets are suppressed and data packets are forwarded. IDLE packets are inserted in each TX whenever necessary.
- 5) Stage 5: Scrambler: The payload of the forwarded packets in each output link is scrambled again via [10] to keep dc balance.
- 6) Stage 6: Packet Builder: The PISO in a GTP TX does not support 30-bit operation either. Packets are buffered and prepared in 20-bit format. Correspondingly, the data returns from the 160-MHz clock domain to a 240-MHz TX clock domain.
- 7) Stage 7: GTP TX: The 20-bit packets are loaded and serialized. In total, there are four outputs: TX #0–3.
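The width conversion in Stages 1, 2 and 6 can be illustrated with a short Python sketch. It is only a behavioural model under stated assumptions: the helper names (`word_clock_hz`, `find_packet_offset`, `regroup_words`) and the `min_packets` threshold are hypothetical, and the text does not say how the actual packet decoder searches for the header position.

```python
HEADERS = ("1010", "1100")  # 4-bit headers named in Stage 2
WORD_BITS = 20              # GTP RX/TX parallel width
PACKET_BITS = 30            # FEB packet length
LINE_RATE_BPS = 4_800_000_000


def word_clock_hz(line_rate_bps: int, width_bits: int) -> int:
    """Parallel clock frequency for a given serial rate and word width."""
    return line_rate_bps // width_bits


def find_packet_offset(stream: str, min_packets: int = 4) -> int:
    """Return the first bit offset at which a header recurs every 30 bits.

    Hypothetical helper: the source only says the decoder tracks the
    periodic occurrence of the headers, not how it does so.
    """
    for offset in range(PACKET_BITS):
        starts = range(offset, len(stream) - PACKET_BITS + 1, PACKET_BITS)
        if len(starts) >= min_packets and all(
            stream[s:s + 4] in HEADERS for s in starts
        ):
            return offset
    raise ValueError("no periodic header found in the buffered stream")


def regroup_words(words: list[str]) -> list[str]:
    """Reassemble 20-bit words into header-aligned 30-bit packets."""
    stream = "".join(words)
    offset = find_packet_offset(stream)
    last_start = len(stream) - PACKET_BITS
    return [stream[s:s + PACKET_BITS]
            for s in range(offset, last_start + 1, PACKET_BITS)]
```

With these definitions, `word_clock_hz(LINE_RATE_BPS, WORD_BITS)` gives the 240-MHz word clock and `word_clock_hz(LINE_RATE_BPS, PACKET_BITS)` gives the 160-MHz packet clock quoted in the stage descriptions.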



## IV. RESULT AND ANALYSIS

Figure 4 Simulation result of data latency variation with reference to different phases

The simulation result in Figure 4 shows how the data latency varies with the phase offset of the clock. To observe this, we generated four clocks with different phases: Z\_clk (0°), N\_clk (90°), O\_clk (180°) and T\_clk (270°), each shifted by 90° from the previous one. When the 'start' bit is high and the clock is turned on, data\_in is set to '0001'. When this data is received in the different clock domains, a bit shift appears in the received data. As Figure 4 shows, the data received at Z\_clk is '0001' (bit shift 0), at N\_clk is '0010' (bit shift 1), at O\_clk is '0100' (bit shift 2), and at T\_clk is '1000' (bit shift 3).
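The relation between clock phase and observed bit shift can be reproduced with a few lines of Python. This is a sketch that assumes one bit of slip per 90° of phase offset, which is what the four reported words suggest; the function names are illustrative and do not come from the simulated design.

```python
CLOCK_PHASES_DEG = {"Z_clk": 0, "N_clk": 90, "O_clk": 180, "T_clk": 270}


def rotate_left(word: str, shift: int) -> str:
    """Rotate a bit string left by `shift` positions."""
    shift %= len(word)
    return word[shift:] + word[:shift]


def received_word(data_in: str, phase_deg: int) -> tuple[str, int]:
    """Model the word seen in a clock domain offset by `phase_deg`.

    Assumption: each 90 degree step corresponds to one bit of shift.
    """
    shift = phase_deg // 90
    return rotate_left(data_in, shift), shift


for name, phase in CLOCK_PHASES_DEG.items():
    word, shift = received_word("0001", phase)
    print(f"{name}: received {word}, bit shift {shift}")
```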



Figure 5 Simulation result of Transmitter

In the simulation result shown in Figure 5, when the 'start' bit is high and the clock is turned on, we observe the following:

- *A.* Pin 'A' carries the input data of the encoder, and pin 'B' carries the output symbol of the 20b/24b encoder block. Input pin 'A': 11001100110011000011; output pin 'B': zz1001100110011000001111.
- *B.* When write\_en is high and read\_en is low, the FIFO stores the output of the encoder. When read\_en is high and write\_en is low, the stored data is read from the FIFO. When the FIFO is bypassed, the output of the encoder is fed directly to the PISO block.
- *C.* The PISO block converts the parallel output of the FIFO into a serial bit stream, which is transmitted to the receiver and appears on pin 'Tx'.
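The FIFO and PISO behaviour described in items B and C can be modelled in a few lines of Python. This is a minimal behavioural sketch, not the simulated RTL: the `Fifo` class and `piso` function are hypothetical names, and sending the leftmost bit first is an assumption, since the text does not state the bit order.

```python
from collections import deque
from typing import Iterator, Optional


class Fifo:
    """Behavioural FIFO controlled by write_en / read_en."""

    def __init__(self) -> None:
        self._queue: deque = deque()

    def clock(self, data_in: Optional[str] = None,
              write_en: bool = False, read_en: bool = False) -> Optional[str]:
        """Advance one clock cycle and return the word read, if any."""
        if write_en and not read_en:
            self._queue.append(data_in)      # store the encoder output
            return None
        if read_en and not write_en and self._queue:
            return self._queue.popleft()     # oldest stored word first
        return None


def piso(symbol: str) -> Iterator[int]:
    """Serialize a parallel symbol, leftmost bit first (assumed order)."""
    for bit in symbol:
        yield int(bit)
```

For example, writing a symbol with `write_en=True`, reading it back with `read_en=True`, and passing the result to `piso` yields the serial bit stream that would appear on pin 'Tx' in this model.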



Figure 6 Simulation result of CDDA block

In the simulation result shown in Figure 6, when the 'start' bit is high and the clock is turned on, the data received at the receiver undergoes deserialization and buffering (if the buffer is not bypassed). When the buffer is bypassed, the output of the SIPO block is passed directly to the CDDA block on pin 'data\_in'. For the received parallel input data\_in = 0011001100101010101101, the output of the CDDA block is data\_out = 00101010110100110011. This implies that the received data is shifted by 8 bits, as the comma bit 'u' is found at the 8<sup>th</sup> position; hence n = 01000. The CDDA block outputs the realigned data, with the comma bit 'u' removed, together with the calculated bit-shift value 'n'.
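The comma detection and realignment step can be sketched in Python as follows. This is an illustration under assumptions: the function name `cdda` is hypothetical, the comma is modelled as a literal 'u' character, and the example input is constructed so that the comma follows the first 8 bits, because the printed data\_in string does not show where the comma sits.

```python
def cdda(data_in: str, comma: str = "u") -> tuple[str, str]:
    """Locate the comma, drop it, and rotate the word back into alignment.

    Returns the realigned data and the bit shift n as a 5-bit binary string.
    """
    n = data_in.index(comma)                    # bits that precede the comma
    data_out = data_in[n + 1:] + data_in[:n]    # bits after comma come first
    return data_out, format(n, "05b")


# Hypothetical input: 8 bits, then the comma 'u', then the remaining 12 bits.
data_out, n = cdda("00110011" + "u" + "001010101101")
print(data_out, n)
```

With this assumed input, the sketch returns the data\_out and n values quoted in the text.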



Figure 7 Simulation result of DCPS block.

In the simulation result shown in Figure 7, when the 'start' bit is high and the clock is turned on, the bit-shift value 'n' obtained in the CDDA block is given as input to the DCPS block. As an example, consider n = 00010, that is, n = 2. From equation 2 (given in chapter 4), $\Delta P = n \times 360^{\circ}/N$, where N is the data bit width. As N is 20 bits, n = 2 gives $\Delta P = 36^{\circ}$. This implies that all the clocks generated by the DCPS block are shifted by a phase of 36°, while remaining 90° apart from one another. The output clocks are Z\_clk (0°), N\_clk (90°), O\_clk (180°) and T\_clk (270°).
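The phase formula can be checked numerically with a small Python sketch. The function names are illustrative, and adding ΔP to each nominal clock phase modulo 360° is an assumption about how the shift is applied. Evaluating the formula with N = 20 gives 36° for n = 2 and 216° for n = 12.

```python
NOMINAL_PHASES_DEG = {"Z_clk": 0, "N_clk": 90, "O_clk": 180, "T_clk": 270}


def phase_shift_deg(n: int, width: int = 20) -> float:
    """Delta P = n * 360 / N, with N the data bit width."""
    return n * 360.0 / width


def dcps_clock_phases(n: int, width: int = 20) -> dict[str, float]:
    """Shift all four clocks by the same Delta P, keeping them 90 degrees apart."""
    delta_p = phase_shift_deg(n, width)
    return {name: (base + delta_p) % 360.0
            for name, base in NOMINAL_PHASES_DEG.items()}


print(phase_shift_deg(2))     # 36.0
print(dcps_clock_phases(2))
```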




Figure 8 Simulation result of CDT block.

The simulation result in Figure 8 shows the output of the CDT block for two cases with different data and bit-shift values n. When the 'start' bit is high and the clock is turned on, the four clocks generated in the DCPS block are fed as inputs to the CDT block. The four input clocks are Z\_clk (0°), N\_clk (90°), O\_clk (180°) and T\_clk (270°).

When the bit-shift value is n = 01100, the corresponding phase shift is $\Delta P = 216^{\circ}$. As already observed, each clock is shifted by $\Delta P$. Since $\Delta P$ is greater than 180° but less than 270°, the data is received in the O\_clk domain, which is the clock domain with a 180° phase shift.
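The clock-domain choice in this example can be written as a quadrant lookup. The Python sketch below generalises from the single case given in the text, so the rule itself is an assumption: ΔP is mapped to the clock whose nominal phase is the largest multiple of 90° not exceeding it. How the actual CDT block treats values exactly on a 90° boundary is not stated.

```python
CLOCKS = ("Z_clk", "N_clk", "O_clk", "T_clk")  # nominal 0, 90, 180, 270 degrees


def select_clock_domain(n: int, width: int = 20) -> str:
    """Pick the receiving clock domain from the bit-shift value n.

    Assumed rule: use the 90 degree quadrant that contains Delta P.
    """
    delta_p = (n * 360.0 / width) % 360.0
    return CLOCKS[int(delta_p // 90)]


print(select_clock_domain(0b01100))  # Delta P = 216 degrees -> O_clk
```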



Figure 9 Simulation result of complete transceiver.

The four simulation results presented above are all parts of this complete simulation. In the simulation result shown in Figure 9, when the 'start' bit is high and the clock is turned on, we observe the following.

The input data is given at pin 'A' = 1100110011001100. The 16-bit input data is converted to a 20-bit symbol by the 16b/20b encoder. When write\_en is high and read\_en is low, the 20-bit symbol from the encoder is written into the FIFO; when write\_en is low and read\_en is high, the data is read from the FIFO. After serialization, this data is transmitted by a Tx driver. At the receiver, the received data undergoes deserialization and is buffered (if the elastic buffer is not bypassed). After this process it is fed to the CDS block. The results of the preceding blocks are then applied to obtain the output at the receiver, Rx\_out = 11001100110011001100, together with the appropriate Rec\_clk.

## V. CONCLUSION

Fixed-latency serial links are widely applicable in distributed measurement and control systems. However, most SerDes chips do not support fixed-latency transfers. Based on dynamic clock phase shifting and changeable delay tuning technologies, we developed a fixed-latency serial transceiver. Our solution can process all possible clock phase offsets, so the transmitter and receiver need not use the same reference clock. By using an external circuit to perform the clock phase shift and the data bit shift, our solution reduces the dependence on the transceiver architecture. We make some suggestions for applying our scheme to other SerDes devices.

## REFERENCES

- [1] Microsemi PolarFire FPGA Transceiver User Guide, document UG0677, Jun. 2017. [Online]. Available: https://www.microsemi.com/documentportal/doc\_view/136531-ug0677-polarfire-fpgatransceiver-user-guide
- [2] Altera Cyclone V Device Handbook: Transceivers, vol. 2, Altera, San Jose, CA, USA, Jan. 2016. [Online]. Available: https://www.altera.com/en\_US/pdfs/literature/hb/cyclone-v/cv\_5v3.pdf
- [3] 7 Series FPGAs GTP Transceivers User Guide, document UG482, v1.9, Xilinx, San Jose, CA, USA, 2016. [Online]. Available: https://www.xilinx.com/support/documentation/user\_guides/ug482\_7Series\_GTP\_Transceivers.pdf
- [4] J. Wang et al., "FPGA implementation of a fixed latency scheme in a signal packet router for the upgrade of ATLAS forward muon trigger electronics," IEEE Trans. Nucl. Sci., vol. 62, no. 5, pp. 2194–2201, Oct. 2015.
- [5] B. Deng et al., "Component prototypes towards a low-latency, small-form-factor optical link for the ATLAS liquid argon calorimeter phase-I trigger upgrade," IEEE Trans. Nucl. Sci., vol. 62, no. 1, pp. 250–256, Feb. 2015.
- [6] J. Wang et al., "Characterization of a serializer ASIC chip for the upgrade of the ATLAS muon detector," IEEE Trans. Nucl. Sci., vol. 62, no. 6, pp. 3242–3248, Dec. 2015.
- [7] 7 Series GTP Transceivers TX/RX Latency Values, document AR# 58981, Xilinx, San Jose, CA, USA, Jan. 2014. [Online]. Available: http://www.xilinx.com/support/answers/58981.html
- [8] X. Liu, Q.-X. Deng, and Z.-K. Wang, "Design and FPGA implementation of high-speed, fixed-latency serial transceivers," IEEE Trans. Nucl. Sci., vol. 61, no. 1, pp. 561–567, Feb. 2014.
- [9] R. Giordano and A. Aloisio, "Fixed-latency, Multi-Gigabit serial links with Xilinx FPGAs," IEEE Trans. Nucl. Sci., vol. 58, no. 1, pp. 194–201, Feb. 2011.
- [10] A. Aloisio, F. Cevenini, R. Giordano, and V. Izzo, "Emulating the GLink chip set with FPGA serial transceivers in the ATLAS level-1 Muon trigger," IEEE Trans. Nucl. Sci., vol. 57, no. 2, pp. 467–471, Apr. 2010.










