Processor Modeling and Simulation


Source Code

For simplicity, every AND and OR gate, whether it has 2, 3, 4, or 5 inputs, is given a delay of 8ns, and every NOT gate a delay of 3ns, throughout each project.

EDA Selection


Installing and Problems

There were a few problems downloading and installing ModelSim on the Windows 10 virtual machine on my MacBook Pro. For some reason, after installing and initially running the program, it skipped the step where I enter my information for proper licensing. I was unaware of this, so I copied the orgate code over and ran it. I then encountered some problems, VMware lost synchronization, and Windows 10 crashed.


I rebooted and decided to start up in Windows 10 directly instead of running it in parallel. I reinstalled, and this time the installer did not skip the licensing step. I entered my information, obtained the .dat file, and copied it over.

Solution and Simulation

Apparently it worked: there were no issues compiling and simulating both the VHDL and the Verilog code. However, the simulations differed a bit, even though I simply copied and pasted the code into the .v and .vhd files and their respective test benches.


Installing and Problems

Installing Vivado, as well as figuring out Cadence, was difficult. For a start, the tutorial was out of date, since the tool was no longer Xilinx ISE but Vivado, so I had to figure things out myself. The installation process was different but fairly simple, and I encountered no problems. Afterward, however, I had many problems. For some reason, I kept getting an error message saying:

{Error: Vivado user apps not installed}

Initially, I disregarded it, and everything seemed to work fine until I added the .v and .vhd files and their test bench files. The source file hierarchy would then either not show the correct file icons or not update correctly. Whichever it was, it prevented me from editing the simulation settings or even simulating.


I decided to reinstall. Relicensing was no problem because I had made an account and it synchronized fine. However, I was still getting the "user apps not installed" error. I found a thread on the Xilinx discussion forum with a solution to the problem. Apparently, this was an issue that some users had with the then-current version of Vivado. The directory for third-party apps was damaged or installed incorrectly, so the workaround was to use the provided Tcl Console.

Solution and Simulation

Someone on a forum said to enter the following in the TCL Console:

tclapp::reset_tclstore

The command appeared to execute correctly. I restarted Vivado and, to my surprise, the error went away. However, the simulation still would not run. Looking around the program, I found that I could now access the user apps. There was an option to download Vivado Simulator and ModelSim Simulator, so I did, and this fixed my simulation problems. The source hierarchy updated correctly, and I could now compile and simulate.

After comparing the two tools, I decided to go with using Xilinx Vivado.

Figure 2.0: Vivado VHDL Simulation

Figure 2.1: Vivado Verilog Simulation

Battery Level Indicator

For this project, I limited myself to basic logic gates, e.g., AND, OR, and NOT. Since the warning is raised at every third level (levels 3, 6, 9, 12, and 15), 4 bits (indicators) are enough to represent the range from 0 to 15. Therefore, Indicator3, Indicator2, Indicator1, and Indicator0 were used as the bits of a battery level indicator that sets warning to 1 at 0010 (level 3), 0101 (level 6), 1000 (level 9), 1011 (level 12), and 1110 (level 15). The circuit used to implement this is shown in Figure 1.1.

Figure 1.1: Battery Level Indicator

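The minimum delay needed to guarantee a result is 19ns, which is consistent with a worst-case path of one NOT gate and two 8ns gates in series (3 + 8 + 8). The simulation confirms this, showing the signals before and after in Figures 1.2 and 1.3.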

Figure 1.2: Battery Level Indicator: `0010` is given to the indicators

Figure 1.3: Battery Level Indicator: Delay after `0010` is inputted

All values are correct and can be confirmed by the given truth table (Figure 1.4).

Figure 1.4: Battery Level Indicator Truth Table
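
As a quick cross-check of the truth table, the warning condition can be modeled in a few lines of Python. This is only a behavioral sketch, not the gate-level circuit from Figure 1.1: the function name `warning` and the constant names are made up for illustration, and the NOT, AND, OR path used for the delay is an assumption inferred from the 19ns figure.

```python
# Behavioral sketch of the battery level indicator (names are illustrative).
NOT_DELAY_NS = 3   # delay given to every NOT gate in this report
GATE_DELAY_NS = 8  # delay given to every AND and OR gate in this report


def warning(indicator3, indicator2, indicator1, indicator0):
    """Return 1 when the 4-bit value encodes level 3, 6, 9, 12, or 15."""
    value = (indicator3 << 3) | (indicator2 << 2) | (indicator1 << 1) | indicator0
    level = value + 1  # 0010 is level 3, so the level is the value plus one
    return int(level % 3 == 0)


# Collect every bit pattern that raises the warning.
warning_patterns = [
    format(v, "04b")
    for v in range(16)
    if warning((v >> 3) & 1, (v >> 2) & 1, (v >> 1) & 1, v & 1)
]
print(warning_patterns)

# Assumed worst-case path: one NOT, one AND, and one OR gate in series.
worst_case_ns = NOT_DELAY_NS + GATE_DELAY_NS + GATE_DELAY_NS
print(worst_case_ns, "ns")
```

The printed patterns should match the rows of the truth table where warning is 1.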

A Random Digital Circuit

Because only AND, OR, and NOT gates were allowed, the XOR gate at the end of the given circuit (Figure 2.1) had to be expanded. The circuit actually simulated is shown in Figure 2.2.

Figure 2.1: The given circuit and its incomplete waveform

Figure 2.2: The actual circuit implemented

Analyzing the implemented circuit (Figure 2.2): since a NAND gate consists of an AND gate followed by a NOT gate, its delay is 11ns, so x has an 11ns delay. Following this, the OR gate at Y has an 8ns delay, as shown in Figures 2.6 and 2.7. Finally, the critical path gives a delay of 38ns, so Z is updated after 38ns. The completed waveform is shown in Figures 2.3 – 2.7.
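
The delay arithmetic can be sketched in Python. The XOR expansion below is the standard AND/OR/NOT form, which may or may not be the exact arrangement in Figure 2.2, and the assumption that the critical path runs through the NAND, the OR, and then the expanded XOR is mine; it is simply the combination that adds up to 38ns.

```python
from itertools import product

NOT_NS, AND_NS, OR_NS = 3, 8, 8  # gate delays used throughout this report


def xor_expanded(a, b):
    """XOR built only from AND, OR, and NOT: (a AND NOT b) OR (NOT a AND b)."""
    not_a, not_b = 1 - a, 1 - b
    return (a & not_b) | (not_a & b)


# The expansion agrees with XOR on every input combination.
for a, b in product((0, 1), repeat=2):
    assert xor_expanded(a, b) == a ^ b

nand_ns = AND_NS + NOT_NS          # AND followed by NOT
xor_ns = NOT_NS + AND_NS + OR_NS   # NOT, then AND, then OR
critical_path_ns = nand_ns + OR_NS + xor_ns  # assumed path: NAND -> OR -> XOR
print(nand_ns, xor_ns, critical_path_ns)
```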

Figure 2.3: `x` output delay

Figure 2.4: `Y` output delay

Figure 2.5: `Z` output delay

Figure 2.6: First update change (`110`)

Figure 2.7: Z output delay of 38ns given inputs (`110`)

The values can be confirmed with the truth table in Figure 2.8.

Figure 2.8: The Given Circuit Truth Table

1-bit Full Adder/Subtractor

One big issue was encountered while creating the 1-bit full adder/subtractor, and it affected how the HDL would be designed for current and later use. A traditional adder/subtractor, as found in many well-known ICs, has three inputs: A, B, and a Carryin/Borrowin bit. It subtracts by inverting the second input bit and adding one, so the Carryin/Borrowin bit also serves as the control that selects between adding and subtracting. However, the given entity has four input bits (in_0, in_1, cin, and AddOrSub), which makes it a non-traditional full adder/subtractor, at least as I understood it. Because of this, the circuit used to create the adder/subtractor is the one shown in Figure 3.1.

Figure 3.1: The 1-bit Full Adder/Subtractor

To create the design shown in Figure 3.1, a new truth table (Figure 3.2) was created, and K-maps were then used to simplify it and derive the circuit.

Figure 3.2: 1-bit Full Adder/Subtractor with an AddOrSub bit

Because of the simplification that AND and OR gates have the same delay regardless of their number of inputs, the maximum delay before a correct answer is guaranteed is 19ns. This does not reflect reality, so the simulated timing is not strictly accurate. The delays can be confirmed in Figures 3.3 and 3.4.
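
One way to read the four-input entity is sketched below in Python. The equations are an assumption on my part, not the K-map result behind Figure 3.1: I take AddOrSub = 1 to mean subtract, and in subtract mode I treat cin as a borrow in and the carry output as a borrow out. The function name `add_sub_1bit` is made up for illustration.

```python
from itertools import product


def add_sub_1bit(in_0, in_1, cin, add_or_sub):
    """Assumed behavior: add_or_sub = 0 gives in_0 + in_1 + cin,
    add_or_sub = 1 gives in_0 - in_1 - cin (cin acts as borrow in)."""
    result = in_0 ^ in_1 ^ cin      # sum and difference bits are identical
    a = in_0 ^ add_or_sub           # in_0 is inverted only for the borrow logic
    cout = (a & in_1) | (a & cin) | (in_1 & cin)  # carry out, or borrow out
    return result, cout


# Check every row of the assumed truth table against plain arithmetic.
for in_0, in_1, cin in product((0, 1), repeat=3):
    s, c = add_sub_1bit(in_0, in_1, cin, 0)
    assert 2 * c + s == in_0 + in_1 + cin
    d, b = add_sub_1bit(in_0, in_1, cin, 1)
    assert d - 2 * b == in_0 - in_1 - cin
```

The loop covers all 16 rows, so if the asserts pass, the model behaves as a full adder in one mode and a full subtractor in the other.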

Figure 3.3: Adder/Subtractor at 150ns

Figure 3.4: Adder / Subtractor delay

To read the simulation: addition is tested from 0 to 400ns, without carry in from 0 to 200ns and with carry in from 200 to 400ns. Subtraction is tested from 400 to 750ns, without borrow in from 400 to 600ns and with borrow in from 600 to 750ns.

1-bit Comparator

A simple comparator, using the gate delays noted at the beginning of this report, was built using the circuit shown in Figure 4.1.

Figure 4.1: 1-bit Comparator

The truth table used to create the circuit is shown in Figure 4.2.

Figure 4.2: 1-bit Comparator Truth Table

Figures 4.3 and 4.4 show that, with these delays, the EQUAL output bit has a delay of 22ns. Since EQUAL depends on less and greater, each of which has its own delay of 11ns, one can safely assume that the delays for less and greater are correct in the simulation. If unsure, one can confirm by looking at Figures 4.3 and 4.4.
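
A small Python model makes the 11ns and 22ns figures easier to follow. The gate equations are an assumption inferred from those delays (NOT plus AND for less and greater, then OR plus NOT for EQUAL); the actual circuit is the one in Figure 4.1.

```python
NOT_NS, AND_NS, OR_NS = 3, 8, 8  # gate delays used throughout this report


def compare_1bit(a, b):
    """Return (less, equal, greater) for two 1-bit inputs."""
    greater = a & (1 - b)           # a AND NOT b
    less = (1 - a) & b              # NOT a AND b
    equal = 1 - (greater | less)    # NOR of the two results
    return less, equal, greater


less_greater_ns = NOT_NS + AND_NS             # one NOT, then one AND
equal_ns = less_greater_ns + OR_NS + NOT_NS   # plus one OR, then one NOT
print(compare_1bit(1, 1), less_greater_ns, equal_ns)
```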

Figure 4.3: 1-bit Comparator given 1 and 1

Figure 4.4: 1-bit Comparator showing the delay of EQUAL is 22ns later

Concurrent N-Bits Ones Counter

Creating a ones counter using only concurrent statements was trickier than expected. The initial idea was to increment a counter whenever a “1” is found in the signal and then output the counter. However, this approach does not work when coded the way one would in C. The add operation is not supported by default, so you cannot increment, and you cannot read from and write to the same cell of an array or vector, because the signal would then have multiple drivers. The numeric_std library was used to gain the ability to increment, and an array of N natural numbers was used to produce a natural-number output. The behavior of the array is shown in Figure 1.1.

Figure 1.1 also shows “count,” an array of natural numbers being incremented using the numeric_std library. Each cell takes the value of the cell before it, incremented when there is a one at that position in the signal and unchanged otherwise. For example, there is a “1” in the first cell of the signal (x), so the first cell of (count) is 1; there is another “1” in the next cell of (x), so the “1” is incremented and the result is placed in the next cell of (count); the last cell of (x) holds no “1”, so nothing is incremented and the “2” is simply carried over in (count).

This behavior can be emulated by using an adder to increment and a selector whose select bits are the values of (x) itself, as shown in the schematics for a 2-bit and a 3-bit wide input in Figure 1.2 and Figure 1.3.
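
The adder-and-selector chain can be sketched in Python as follows. This is a sequential model of what the concurrent VHDL describes, and the function name `ones_count` is made up for illustration. The key point it shows is that every cell of count is written exactly once and only reads the cell before it, so no cell has more than one driver.

```python
def ones_count(x):
    """x is a list of bits; return (number of ones, the intermediate count array)."""
    count = [0] * len(x)
    count[0] = x[0]  # the first cell simply takes the first bit of x
    for i in range(1, len(x)):
        # Selector driven by x[i]: pass the previous cell through an adder, or unchanged.
        count[i] = count[i - 1] + 1 if x[i] else count[i - 1]
    return count[-1], count


# The example from the text: ones in the first two cells, none in the last.
print(ones_count([1, 1, 0]))
```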

Figure 1.2: 2-Bit Wide Input Ones Counter

Figure 1.3: 3-Bit Wide Input Ones Counter

The RTL_ADD is essentially the “+” operator from the numeric_std library, and the RTL_MUX is the result of using when and else statements in the code. As one can see, each RTL_MUX is driven by the signal (x) itself and each RTL_ADD increments (count), which is initially given the value of the first cell of the input (x). This emulation works for any input width, as shown in Figure 1.4. “tb_a”, “tb_c”, and “tb_e” are input signals of different sizes, and “tb_b”, “tb_d”, and “tb_f” are the output values as natural numbers.

Figure 1.4: Simulation of 3 Different Sizes of Input (`x`)

Since no gates were used to make the adder or multiplexor, there is no truth table.

8-to-1 32-Input Bit Multiplexor

2-to-1 multiplexors are used to make the 8-to-1 selector. The 2-to-1 multiplexor was created at the gate level, with each gate given its own delay: 8ns for AND and OR gates, and 3ns for NOT gates. The schematic for this design is shown in Figure 2.1. This design needs at least 19ns to guarantee a correct output.

Figure 2.1: 32-Input Bit, 2-to-1 Multiplexor

Making an 8-to-1 multiplexor out of 2-to-1 multiplexors gives the design shown in Figure 2.2. Since each 2-to-1 multiplexor takes 19ns, and a signal passes through at most 3 multiplexors, this design gives an overall delay of 57ns, as confirmed in Figure 2.3.
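
The three-level tree can be sketched in Python. The mapping of select bits to tree levels (bit 0 at the first level, bit 2 at the last) is an assumption for illustration; the real wiring is the one in Figure 2.2.

```python
MUX2_DELAY_NS = 19  # delay of one gate-level 2-to-1 multiplexor


def mux2(sel, a, b):
    """2-to-1 multiplexor: a when sel is 0, b when sel is 1."""
    return b if sel else a


def mux8(sel, inputs):
    """8-to-1 multiplexor built from seven 2-to-1 multiplexors in three levels."""
    s0, s1, s2 = sel & 1, (sel >> 1) & 1, (sel >> 2) & 1
    level1 = [mux2(s0, inputs[i], inputs[i + 1]) for i in range(0, 8, 2)]
    level2 = [mux2(s1, level1[i], level1[i + 1]) for i in range(0, 4, 2)]
    return mux2(s2, level2[0], level2[1])


# Eight distinct 32-bit values, so each selection is easy to tell apart.
inputs = [0x10000000 + i for i in range(8)]
assert all(mux8(sel, inputs) == inputs[sel] for sel in range(8))

total_delay_ns = 3 * MUX2_DELAY_NS  # a signal crosses one multiplexor per level
print(total_delay_ns)
```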

Figure 2.2: 8-to-1 Multiplexor using 2-to-1 Multiplexors

Figure 2.3: Simulation of the 8-to-1 Multiplexor

The odd behavior between output values is the result of bits taking time to update due to the gate delays. This behavior is expected; it is of slight concern, but the output eventually settles on the selected signal. The truth table for the 8-to-1 multiplexor is given in Table 1.

Select Bit 0    Select Bit 1    Select Bit 2    Input Signal

Table 1: 8-to-1 Multiplexor Truth Table

32-bit Full Adder/Subtractor

For simplicity, the 32-bit full adder/subtractor is built by chaining 1-bit full adder/subtractors together as a ripple carry adder. This sort of design guarantees an answer, but at the cost of speed, because each 1-bit adder/subtractor waits for the carry-out bit from the previous one. The schematic for the design is shown in Figure 1.1. It may appear that one column of adders is connected to another column of adders, but they are really chained together one by one. The schematic looks this way because every instance is controlled by the same “sub” bit (Figure 1.2), so every instance acts as either an adder or a subtractor, and the resulting image is laid out to be roughly square.

Overflow detection for a ripple carry adder/subtractor is done by taking the XOR of the last two carry bits (Figure 1.3). When the last full adder/subtractor carries in a 0 and carries out a 1, it added two negative numbers and got a positive number, which is overflow. When it carries in a 1 and carries out a 0, it added two positive numbers and got a negative number, which is also overflow.

Looking at the simulation (Figure 1.4), the results appear to be correct. I add two positive numbers, then subtract the same numbers, then subtract 1 from -1, then test overflow, and finally test carry out together with overflow.
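
The ripple chain and the overflow check can be modeled in Python. The 1-bit cell here is an assumed reading of the four-input entity (sub = 1 means subtract, and the chain then carries borrows instead of carries); the names `cell` and `add_sub_32` are made up for illustration.

```python
WIDTH = 32


def cell(a, b, cin, sub):
    """Assumed 1-bit adder/subtractor: carry chain when sub = 0, borrow chain when sub = 1."""
    s = a ^ b ^ cin
    a_eff = a ^ sub  # a is inverted only for the borrow logic
    cout = (a_eff & b) | (a_eff & cin) | (b & cin)
    return s, cout


def add_sub_32(a, b, sub):
    """Ripple through 32 cells; return (result, carry_out, overflow)."""
    carry = 0
    carry_into_last = 0
    result = 0
    for i in range(WIDTH):
        carry_into_last = carry  # after the loop this is the carry into bit 31
        s, carry = cell((a >> i) & 1, (b >> i) & 1, carry, sub)
        result |= s << i
    overflow = carry_into_last ^ carry  # XOR of the last two carry bits
    return result, carry, overflow


print(add_sub_32(10, 15, 0))          # two positive numbers added
print(add_sub_32(0x7FFFFFFF, 1, 0))   # largest positive number plus one
print(add_sub_32(0xFFFFFFFF, 1, 1))   # 1 subtracted from -1
```

Because the same sub value reaches every cell, the whole chain switches between adding and subtracting together, which matches the role of the “sub” bit described above.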

Figure 1.1: Schematic for Full Adder/Subtractor

Figure 1.2: sub Bit Controls all AddOrSub Bits

Figure 1.3: Overflow Detection (with Last 2 Carry Bits)

Figure 1.4: Simulation of Full Adder/Subtractor

32-Bit Comparator

Similar to the ripple carry adder/subtractor, the 32-bit comparator is a ripple comparator (Figure 2.1) built by chaining together 2-bit comparators. To keep things simple, I designed a modified 2-bit comparator using gates (Figure 2.2), which is essentially two 1-bit comparators. To chain them together, I fed the great and less output bits into the next comparator as x(0) and y(0), and used the next bits of the input signals as x(1) and y(1). This effectively makes it a ripple comparator (Figure 2.3).

As mentioned, this is a modified version of a 2-bit comparator: a regular 2-bit comparator has a total of 4 NOT gates, whereas this one has only 2. The behavior is correct, except that it outputs 11 when the numbers are equal, where a regular comparator outputs 00. To fix this, the gates at the end of the ripple comparator, dubbed the “adjustment box”, are added (Figure 2.4). Correcting the output once at the end reduces the number of gates used throughout the design and reduces the delay significantly.

The equal bit is produced by taking the NOR of the final great and less signals, which completes the design.
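
The chaining idea can be sketched in Python. This models a regular 2-bit comparator at each stage, not the modified two-NOT version, so the adjustment box is not needed here; treating the operands as unsigned is also an assumption. The names `compare2` and `ripple_compare` are made up for illustration.

```python
def compare2(x1, x0, y1, y0):
    """Behavior of a regular 2-bit comparator: return (great, less)."""
    x = (x1 << 1) | x0
    y = (y1 << 1) | y0
    return int(x > y), int(x < y)


def ripple_compare(x, y, width=32):
    """Chain 2-bit comparators from bit 0 upward (width must be at least 2).

    Each stage takes the next input bits as x(1), y(1) and the previous
    great/less outputs as x(0), y(0). Returns (great, less, equal).
    """
    great, less = x & 1, y & 1  # bit 0 of each operand seeds the chain
    for i in range(1, width):
        great, less = compare2((x >> i) & 1, great, (y >> i) & 1, less)
    equal = 1 - (great | less)  # NOR of the final great and less
    return great, less, equal


print(ripple_compare(10, 15))
print(ripple_compare(15, 15))
```

A higher bit always outranks the result carried in from the lower bits, which is why the last stage's outputs describe the whole 32-bit comparison.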

Figure 2.5 shows the successful simulation of the 32-bit comparator.

Figure 2.1: Ripple Comparator

Figure 2.2: 2-bit Comparator Using 2 1-bit Comparators at Gate Level

Figure 2.3: Ripple Comparator Mapping

Figure 2.4: **adjustment box** and Equal handling

Figure 2.5: Simulation of 32-Bit Comparator

32-Bit Shifter

Building this module was more difficult than anticipated. It was implemented mostly by relying on VHDL language constructs rather than on individual gates.

There are 4 possible cases:

  • logical shift right
  • logical shift left
  • arithmetic shift right
  • arithmetic shift left

Each case is the concatenation of the input bits and a fill: zeros for a logical shift, and the MSB or LSB for an arithmetic shift. The output takes the value of one of these cases.

A generate statement is used to generate all possible shifts for the four cases, each with its own delay (10ns per shift). One can visualize this as four 32-to-1 multiplexors: one for logical right shifts, one for logical left shifts, one for arithmetic right shifts, and one for arithmetic left shifts. These multiplexors are controlled by the shift amount, and their outputs are fed to two 2-to-1 multiplexors that select the direction, whose outputs are in turn fed to a final 2-to-1 multiplexor that chooses between arithmetic and logical (Figure 3.1).
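
The four cases can be modeled in Python as a slice of the input combined with a fill. This is a behavioral sketch with made-up names; following the description above, an arithmetic left shift is filled with copies of the LSB, and the delay is taken as 10ns per shifted position.

```python
MASK = 0xFFFFFFFF
DELAY_PER_SHIFT_NS = 10


def shift(value, amount, right, arithmetic):
    """Shift a 32-bit value by 0..31 positions; return (result, delay in ns)."""
    fill_bit = 0  # logical shifts fill with zeros
    if arithmetic:
        # Arithmetic shifts fill with the MSB (right) or the LSB (left).
        fill_bit = (value >> 31) & 1 if right else value & 1
    fill = ((1 << amount) - 1) if fill_bit else 0  # 'amount' copies of the fill bit
    if right:
        result = (value >> amount) | (fill << (32 - amount))
    else:
        result = ((value << amount) & MASK) | fill
    return result & MASK, DELAY_PER_SHIFT_NS * amount


# The first test from the simulation: the maximum unsigned number shifted 31 times.
print(shift(0xFFFFFFFF, 31, right=True, arithmetic=False))
```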

Figure 3.1: General outline of VHDL Code

However, rather than build every multiplexor shown in Figure 3.1 by hand, the design was written in VHDL, and Vivado generated the schematic shown in Figure 3.2. The generated schematic is extremely difficult to interpret, but the simulation of the shifts appears correct (Figure 3.3). The first shift is the maximum unsigned number shifted 31 times, and there is a 310ns delay. The other shifts are correct as well.

Figure 3.2: Vivado Schematic of N-Shift 32-bit Shifter

Figure 3.3: Simulation of 32-Bit Shifter (310ns delay for shifting 31 times)

The Decoder/Controller

Building the controller is fairly simple. The outputs are assigned their parts of the 32-bit instruction using concatenation, and the write enable is handled with a multiplexor. Writing is done on every instruction except a comparison or a no-operation, so the multiplexor is controlled by the function portion of the 32-bit instruction. The write enable could also have been handled with logic gates, since the function codes act as a truth table, but that would take time and require finding the simplest equation. I instead took the simplest solution and used a multiplexor.

The simulation (Figure 1.2) shows the proper behavior of the controller.

Figure 1.1: The Decoder/Controller

Figure 1.2: Decoder/Controller Simulation


The ALU

Putting together the ALU was fairly simple. The difficult part was making sure all functions work correctly, because functions like shifting take in the second 32-bit input (b_alu) to determine how many shifts are to be done.

Each sub-component (the adder/subtractor, the comparator, and the shifter) is instantiated only once (Figures 2.1 - 2.2), and switching between its behaviors, such as adding and subtracting, is done with a multiplexor.

Each of the logical operators (AND, OR, NOT, and XOR) is implemented with a generate statement that creates 2-input logic operators across all 32 bits of a_alu and b_alu (Figure 2.4).

The multiplexor that selects between the 14 operations is controlled by a 4-bit input (alu function). It is written with when and else statements, which effectively build the large multiplexor out of smaller 2-input multiplexors (Figure 2.3).

The overall schematic for the ALU is shown in Figure 2.5, and the behavior (simulation) is shown in Figure 2.6. In the testbench, the numbers “10” and “15” are used for nearly every calculation; one input is set to all zeros for the OR operation, and one input is set to all ones for the AND and XOR operations. The delays of the operations are the delays created by each subcomponent (adder, shifter, comparator).

Figure 2.1: Shifter Instantiated Once

Figure 2.2: Comparator and Adder/Subtractor Instantiated Once

Figure 2.3: A Look at some 2-input Multiplexors to make a larger Multiplexor

Figure 2.4: 32 2-Input Logic Operators for `AND`, `OR`, `NOT`, and `XOR`

Figure 2.6: ALU Simulation, with the Cursor at 2451ns to show delay when comparing

Figure 2.5: ALU Schematic

The Processor

Once the ALU and decoder/controller components were created and shown to behave correctly, the next step was to connect each component to a register file in order to deal with memory, which was not difficult. What was difficult was tailoring the processor to accommodate the ALU. A couple of operations did not work initially: shifting and the mov/movI operations.

First, the mov/movI operations did not behave correctly because both share one command (1101, or alu_mov) and had to switch between a_alu and b_alu (register value or immediate value). This was a problem because the ALU has no input for OP (register value or immediate), so it could not distinguish between the two. To solve this, a multiplexor controlled by the OP bit switches between the register output and the immediate value, and its output is fed into a_alu (Figure 3.1).

Next, the shifting operation uses b_alu to determine how many shifts are to be done on a_alu. This appeared simple at first, but the assignment requires that the shift be done on one register value (rs) and the result put into a different register (rd), with the number of shifts determined solely by rt concatenated with the immediate value (rt & imm). To solve this, b_alu is given rt & imm only when the ALU is doing a shift. This is done with yet another multiplexor, which lets b_alu be rt & imm only for shifts; otherwise b_alu behaves normally (Figure 3.2).
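
Both fixes amount to a 2-to-1 multiplexor in front of an ALU input, which can be sketched in Python. The function and parameter names are hypothetical, the polarity of the OP bit (1 selects the immediate) is an assumption, and `rt_imm` stands for the already concatenated rt & imm value so that no field widths need to be guessed.

```python
def select_a_alu(op, reg_source, rt_imm):
    """mov/movI fix: feed a_alu the immediate (rt & imm) or the register output."""
    return rt_imm if op else reg_source


def select_b_alu(is_shift, rt_imm, normal_b):
    """Shift fix: feed b_alu rt & imm only for shifts, otherwise its normal value."""
    return rt_imm if is_shift else normal_b


# Hypothetical values, only to show which input each multiplexor passes on.
print(select_a_alu(op=1, reg_source=7, rt_imm=42))        # movI: immediate
print(select_a_alu(op=0, reg_source=7, rt_imm=42))        # mov: register value
print(select_b_alu(is_shift=1, rt_imm=3, normal_b=99))    # shift amount from rt & imm
```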

Overall, the schematic closely matches the schematic given by the professor (Figure 3.3), with slight changes for the mov/movI and shifting fixes mentioned above (Figure 3.4).

Comments in the testbench file show what values I am testing, and after analysis, the results appear to be correct (Figure 3.5).

Figure 3.1: The Multiplexor Fix for `mov/movI` Operation (1st slot is `rt` & `imm`, 2nd slot is `reg_source`)

Figure 3.2: The Multiplexor Fix for shifting (1st slot `rt` & `imm`, 2nd slot is normal behavior)

Figure 3.3: Initial Design (without fixes for `mov/movI` and shifting)

Figure 3.4: Final Generated Schematic of Processor (with fixes for `mov/movI` and shifting)

Figure 3.5: Simulation for the Processor

University of California, Irvine
Henry Samueli School of Engineering
EECS Department