# INTERNATIONAL JOURNAL FOR RESEARCH IN APPLIED SCIENCE & ENGINEERING TECHNOLOGY

Volume: 11 Issue: VIII Month of publication: Aug 2023

DOI: https://doi.org/10.22214/ijraset.2023.55432

www.ijraset.com

Call: 08813907089 E-mail ID: ijraset@gmail.com

ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538

Volume 11 Issue VIII Aug 2023, available at www.ijraset.com

### Design of Double 16\_32-Bit RISC Processor

Nagendra Prasad N<sup>1</sup>, Dr. Sujatha Hiremath<sup>2</sup>, Arjumanth Farraj<sup>3</sup>

Department of ECE, RV College of Engineering

Abstract: Today's SoCs offer a high level of functionality, serve a variety of applications, and continue to improve in efficiency and cost. Embedded systems also face area and power consumption constraints in addition to real-time challenges. The main objective is to design and implement a 32-bit high-performance RISC (Reduced Instruction Set Computer) processor architecture. The processor is designed as an instantiation of submodules using Verilog HDL (Hardware Description Language). A 16-bit compatibility mode is introduced in which the ISA encodes two 16-bit operations that execute at the same time, so the execution unit can switch between one 32-bit operation and two 16-bit operations. The ISA is modified to support both 16-bit and 32-bit operations. The two 16-bit instructions are independent of each other and can be executed simultaneously. This allows the RISC-based architecture to increase the speed of the design by a factor of 2 for 16-bit operations.

Keywords: RTL, RISC architecture, 32-bit, 16-bit, Verilog, variable frequency, ISA (instruction set architecture)

#### I. INTRODUCTION

Processors are an essential part of any electronic device, controlling and operating its functions according to the user's needs. MIPS is an architecture used to design RISC processors.
MIPS design is based on the RISC principle, so it has fixed-length instructions with a few different formats. It also emphasizes the load/store architecture. Register access is much faster than memory access, so it is more advantageous in terms of speed to perform operations on on-chip registers rather than in memory. To limit the impact of memory operations, MIPS uses a load/store architecture in which data memory is accessed only by load and store instructions. Performance is an essential parameter for any electronic device, so pipelining is used to improve overall throughput. A pipelined MIPS processor must, however, deal with hazards. A data hazard occurs in a pipeline when an instruction depends on the result of a previous instruction that is still in progress and has not finished executing. Data hazards can be resolved by adding a forwarding unit, an extra hardware block that routes the ALU result through a multiplexer directly to the ALU input of the next instruction when required.

Computer architecture consists of computer organization and the Instruction Set Architecture (ISA). The ISA gives a logical view of what a computer is capable of doing, while computer organization describes how the ISA is implemented. The two together are normally called the computer architecture. Most processors are designed around one of two main architectures: Complex Instruction Set Computer (CISC) and Reduced Instruction Set Computer (RISC). RISC is a type of microprocessor architecture that uses a small, highly optimized set of instructions. It is used as an alternative to CISC and is often considered the more efficient CPU architecture.

#### II. METHODOLOGY

The flow diagram in Figure 1 shows the RTL-to-GDS flow. The RTL code is functionally verified using the NCLaunch simulator. Synthesis then creates the gate-level netlist, using the design constraints and the technology library files required for optimization and mapping to the target technology library. The gate-level netlist is the input to the Cadence Innovus tool, which starts the physical design process.

Figure 1. RTL to GDS flow

The steps below describe the flow:

- 1) The design uses a four-stage pipeline: Instruction Fetch (IF), Instruction Decode (ID), Execute (EX), and a combined Memory Access and Write Back stage (MEM WB). The sub-modules of the processor are first designed, coded, and tested using a bottom-up design methodology.
- 2) Once all sub-modules are designed and shown to be fully functional, they are instantiated in a top module to form the RISC processor.
- 3) The processor is then tested by executing a comprehensive set of instructions while verifying proper functionality and timing.
- 4) Synthesis of the top-level module produces the gate-level netlist for the physical design (PD) process.
- 5) The physical design process starts once the design is functionally verified.
- 6) Physical design includes partitioning, floor planning, and routing.
- 7) In partitioning, the complex circuit is divided into simpler blocks.
- 8) During floor planning, each block is assigned a proper boundary.
- 9) In routing, local and global routing are carried out as required.
- 10) The physical design process takes the netlist, .sdc, .lib, and .lef files as inputs for both placement and routing of the design.
- 11) Placement is based on the technology library file and the library exchange format (LEF) file, which contains the metal layer information, design rules, and abstract-level information.
- 12) Based on these files, the standard cells in the gate-level netlist are placed in the core area of the design partition.

The architecture of the design is illustrated in Figure 2.

Figure 2. Architecture of the RISC processor

#### **III. IMPLEMENTATION**

All RISC instructions can be classified into three groups in terms of instruction encoding:

- 1) R-type (Register)
- 2) I-type (Immediate)

TABLE I ISA OPCODE VALUES FOR THE INSTRUCTION

| OPCODE | | | | |
|--------|--|--|--|--|
| 0000 | | | | |
| 0001 | | | | |
| 0010 | | | | |
| 0011 | | | | |
| 0100 | | | | |
| 0101 | | | | |
| 0110 | | | | |
| 0111 | | | | |
| 1000 | | | | |
| 1001 | | | | |
| 1010 | | | | |
| 1011 | | | | |
| 1100 | | | | |

The opcodes for the instructions that can be executed are listed in Table I. The ISA for the 32-bit instruction set is illustrated in Figure 3.

| Field | Opcode | Source Reg | Secondary Reg | Destination Reg | Immediate value |
|-------|--------|------------|---------------|-----------------|-----------------|
| Bits | 31:26 | 25:21 | 20:16 | 15:11 | 10:0 |

Figure 3. ISA for 32 bit instruction set

- Bits 31:26, the most significant bits, contain the opcode for the operation to be performed.
- The Source Register address is taken from bits 25:21.
- The Secondary Register address is taken from bits 20:16.
- The Destination Register address is taken from bits 15:11.
- The Immediate value is taken from bits 10:0 of the instruction.

#### 3) 16-Bit Compatibility

The instruction set architecture and the decoder are built so that two 16-bit instructions can be decoded and executed at the same time. No additional hardware is required for the registers or the execution units; additional hardware is required only for decoding. This makes it possible to switch between one 32-bit operation and two 16-bit operations using the same execution unit. This methodology has several advantages:

- The speed of 16-bit operations is increased by a factor of 2.
- The number of usable registers is doubled, since each 32-bit register holds two 16-bit words.
- The power consumption remains the same as for 32-bit instructions.
- Two independent 16-bit operations execute in parallel, which gives a form of multi-threaded operation.

The ISA structure of the dual 16-bit execution methodology is shown in Figure 4.

| Opcode1 | Source Reg1 | Secondary Reg1 | Destination Reg | Source Reg2 | Secondary Reg2 | Opcode2 |
|---------|-------------|----------------|-----------------|-------------|----------------|---------|

Figure 4. ISA for the dual 16-bit operation Architecture

- Opcode1 is the opcode applied to Source Reg1 and Secondary Reg1.
- Source Reg1 is the address of the first source register.
- Secondary Reg1 is the address of the first secondary register.
- Destination Reg is the address of the final destination register.
- Source Reg2 is the address of the second source register.
- Secondary Reg2 is the address of the second secondary register.
- Opcode2 is the opcode applied to Source Reg2 and Secondary Reg2.

The idea behind executing two 16-bit operations at once is to divide each 32-bit register into a higher and a lower word. The instruction word is likewise divided into two words: the most significant word controls the higher words of all the registers, and the least significant word controls the lower words of the registers.

Figure 5. The flow diagram from the RTL to Physical implementation

- Figure 5 represents the block diagram of the RISC processor. The blocks include the instruction memory, program counter, instruction register, instruction decoder, registers, instruction execution, ALU, memory operations, and data memory.
- Once the design is functionally verified, the physical design process is carried out.
- The flow diagram in figure 3.6.7 indicates the flow of the physical implementation.
- Physical design includes partitioning, floor planning, and routing.
- In partitioning, the complex circuit is divided into simpler blocks.
- During floor planning, all the blocks are assigned proper boundaries.
- Placement is performed.
- In routing, local and global routing are carried out according to the specification.
- The physical design process takes the netlist, .sdc, .lib, and .lef files as inputs for both placement and routing of the design.
- Placement is based on the technology library file and the library exchange format (LEF) file, which contains the metal layer information, design rules, and abstract-level information.
- Based on these files, the standard cells in the gate-level netlist are placed in the core area of the design partition.
- Static timing analysis (STA) is performed and the worst negative slack is determined.
- Where the slack is negative, cell placement is adjusted until the slack is no longer negative.
- Static timing analysis is repeated on the optimized placement to confirm that the worst slack is positive.

#### IV. RESULTS AND ANALYSIS

Figure 6. Synthesized gate-level netlist of the top module

Figure 6 shows the topmost block, Integration\_Top. The top block contains the remaining blocks: ID\_ALU (instruction decode and execution), IM (instruction memory), PC (program counter), IF (instruction fetch), and Memory\_loader. In the synthesized design, however, only two blocks are created, ID\_ALU and IM. The other blocks have been ungrouped to form a simpler, optimized final design. Ungrouping was enabled to reduce area and power consumption.

Table 2 Power consumed by the synthesised design

| Type of power consumption | Power consumed |
|---------------------------|----------------|
| Leakage Power | 38.332 nW |
| Dynamic Power | 1.1649 mW |
| Total Power | 1.1649 mW |

The power report for the synthesized design is shown in Table 2.

Figure 7. Execution of a 32-bit instruction

The simulation results show the execution of a 32-bit instruction and the parallel execution of two 16-bit instructions. The following 32-bit instruction is executed:

ADD R0 R2 STORE IN R3

The value in register R0 is added to the contents of register R2 and the sum is stored in register R3, as shown in Figure 7.
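The difference between the two execution modes can be sketched with a small behavioral model. The Python below is not the authors' Verilog; it is a minimal illustration that packs the Figure 3 fields (opcode 31:26, source 25:21, secondary 20:16, destination 15:11, immediate 10:0) into a word, executes a 32-bit ADD, and contrasts it with an add applied separately to the higher and lower 16-bit words of the same registers. The opcode value `OP_ADD`, the helper names, and the register contents are assumptions made for this example, and both 16-bit lanes are assumed to perform an ADD.

```python
MASK32 = 0xFFFFFFFF
MASK16 = 0xFFFF

OP_ADD = 0b000001  # hypothetical opcode value, not taken from Table I


def encode(opcode, src, sec, dst, imm=0):
    """Pack the Figure 3 fields into one 32-bit instruction word."""
    return (((opcode & 0x3F) << 26) | ((src & 0x1F) << 21) |
            ((sec & 0x1F) << 16) | ((dst & 0x1F) << 11) | (imm & 0x7FF))


def decode(word):
    """Split a 32-bit instruction word back into the Figure 3 fields."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "src": (word >> 21) & 0x1F,
        "sec": (word >> 16) & 0x1F,
        "dst": (word >> 11) & 0x1F,
        "imm": word & 0x7FF,
    }


def add32(a, b):
    """One 32-bit addition; a carry propagates across bit 16."""
    return (a + b) & MASK32


def add_dual16(a, b):
    """Two independent 16-bit additions on the higher and lower words."""
    lo = ((a & MASK16) + (b & MASK16)) & MASK16
    hi = (((a >> 16) & MASK16) + ((b >> 16) & MASK16)) & MASK16
    return (hi << 16) | lo


# ADD R0 R2 STORE IN R3, using illustrative register contents.
regs = [0] * 32
regs[0] = 0x0001FFFF
regs[2] = 0x00010001
fields = decode(encode(OP_ADD, src=0, sec=2, dst=3))
regs[fields["dst"]] = add32(regs[fields["src"]], regs[fields["sec"]])

print(hex(regs[3]))                       # 32-bit mode result
print(hex(add_dual16(regs[0], regs[2])))  # dual 16-bit mode result
```

With these operands the 32-bit mode carries from the lower word into the higher word, while the dual 16-bit mode keeps the two lanes independent, so the two results differ. That lane isolation is what lets one execution unit serve two 16-bit operations at once.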
Figure 8 illustrates two 16 bit instruction execution in a single clock Both the Loading of R0 and R1 register takes place in a single edge of the clock cycle as indicated in the figure 8. #### V. ACKNOWLEDGMENT I am indebted to my guide, Dr. Sujatha Hiremath, Assistant Professor, RV College of Engineering for the wholehearted support, suggestions and invaluable advice throughout myproject work and also helped in the preparation of this paper. #### **VI.CONCLUSION** The 4- stage pipelined RISC architecture processor increases the speed of the operation as compared to the Von-Neumann architecture by using the separate data and address buses for both instruction and data using instruction memory and data memory respectively. The capability has been provided to execute two 16-bit operations at the same time which increases the efficiency and thus executes both 32-bit and two 16-bit operations using the same execution unit and as a result of which leads to increase in speed by a factor of 2. The total power consumed by the system is reported to be with frequency of 100MHz was 1.1649W.Tools Vivado ,Innovus and genus have been utilized to implement the design. #### **REFERENCES** - [1] T.-T. Hoang, C. Duran, R. Serrano, M. Sarmiento, and A. T. S. C.-K. P. Khai-Duy Nguyen, "System on a chip with 8 and 32 bits processor in 180 nm technology for iot application," IEEE Transactions on Circuits and Systems I, pp. 1–6, 2022. doi: 10.1109/HCS52781.2021.9566862. - [2] H. V. R. Aradhya, G. Kanase, and V. Y, "Rtl to gdsii of harvard structure risc processor," IEEE International Conference on Electronics, Computing and Communication Technologies (CONECCT, pp. 1–6, 2021. doi: 10.1109/CONECCT52877. 2021.9622735. - [3] A. Aadarsh, A. Kumar, A. Yadav, and P. Joshi, "Design and power analysis of 32-bit pipelined processor," 2021 International Conference on Advance Computing and Innovative Technologies in Engineering (ICACITE), pp. 1–6, 2021. doi: 10.1109/ICACITE51222.2021.9404622. 
- [4] K. P. K and V. P. A. M, "Designing and implementation of 32-bit 5 stage pipelined mips based risc processor capable of resolving data hazards," IEEE Transactions on Nanotechnology, pp. 1–6, 2021. doi: 10.1109/ICMNWC52512.2021.9688435. - [5] K. P. K and V. P. A. M, "Power aware study of 32-bit 5-stage pipeline risc cpu using 180nm cmos technology," 14th IEEE India Council International Conference, pp. 1–6, 2021. doi: 10.1109/INDICON.2017.8488074. - [6] H. V. R. Aradhya, G. Kanase, and V. Y, "Rtl to gdsii of harvard structure risc processor," IEEE International Conference on Electronics, Computing and Communication Technologies (CONECCT, pp. 1–6, 2021. doi: 10.1109/CONECCT52877. 2021.9622735. - [7] K. P. K and V. P. A. M, "Designing and implementation of 32-bit 5 stage pipelined mips based risc processor capable of resolving data hazards," "2021 IEEE Interna tional Conference on Mobile Networks and Wireless Communications (ICMNWC),, pp. 62–71, 2021, issn: 0167-9260. doi: 10.1109/ICMNWC52512.2021.9688435... - [8] P. K. Yadav and P. K. Misra, ""power aware study of 32-bit 5-stage pipeline risc cpu using 180nm cmos technology," Optical and Quantum Electronics, 2017. doi: 10.1109/INDICON.2017.8488074. - [9] J. Rohit and M. Raghavendra, "Implementation of 32-bit risc processors without interlocked pipelining on artix-7 fpga board," 2017 International Conference on Circuits, Controls, and Communications (CCUBE), 2017. doi: 10.1109/CCUBE. 2017.8394137. - [10] N. Dwivedi and P. Chhawcharia, ""power mitigation in high-performance 32-bit mips-based cpu on xilinx fpgas," 2017 IEEE International Conference on Consumer Electronics-Asia (ICCE-Asia), 2017. doi: 10.1109/ICCE-ASIA.2017.8307850. 45.98 IMPACT FACTOR: 7.129 IMPACT FACTOR: 7.429 ## INTERNATIONAL JOURNAL FOR RESEARCH IN APPLIED SCIENCE & ENGINEERING TECHNOLOGY Call: 08813907089 🕓 (24\*7 Support on Whatsapp)