04/12/2020
Ilia Zarka 9289
Iosif Chrysostomou 9130
We run the benchmarks on this step with a cache line size of 64B
, a 2 way set associative data cache with a size of 64kB
, a 2 way set associative instruction cache with a size of 32kB
and an 8 way set associative L2 cache with a size of 2MB
.
In both cases, the system runs at 2GHz
(system.clk_domain). This clock is used to synchronize everything on the motherboard. The cpu_clk_domain clock refers to the cpu clock. That clock is by default, multiple of the system clock. So a second CPU, would run at a multiple of 1GHz
.
Simulated ms | CPI | Data Cache Miss Rate | Instruction Cache Miss Rate | L2 Cache Miss Rate | |
---|---|---|---|---|---|
specbzip | 160.703 | 1.607035 | 0.014133 | 0.000076 | 0.294739 |
spechmmer | 118.547 | 1.185466 | 0.001684 | 0.000204 | 0.079949 |
speclibm | 262.248 | 2.622476 | 0.060971 | 0.000098 | 0.999927 |
specmcf | 109.233 | 1.09233 | 0.002038 | 0.000037 | 0.727788 |
specsjeng | 705.453 | 7.054533 | 0.121829 | 0.000020 | 0.999979 |
The simulated seconds do not scale with the clock frequency. This was also apparent on the first lab where we run our code with a wider variety of frequencies. The reason is that some stages of the execution of a command, do not depend only on the CPU frequency. A couple of those stages are reading and writing on the memory. The memory transfers rates are not tied to the CPU frequency, which means there can be delays which are independent from the rest of the system.
As we have learnt from the lessons, having set associativity greatly increases performance by avoiding memory conflicts, but also has diminishing returns at larger set size. This is the reason we chose to simulate 2, 4 and 8-way associativities. Also, Cache Line Size has a rather negetive impact on the performance when small values are used, however very large values increase the implementation cost (based on Step 3).
- Cache Line Size
- Data Cache Size and Associativity
- Instruction Cache Size and Associativity
- L2 Cache Size and Associativity
For calculating the cost, we had a few key criteria in our minds.
- First, the cost of every memory level scales linearly with its size.
- Then, the higher level memories with the same characteristics (size, associativity etc) are cheaper to manufacture. In this case, we assumed that the L2 cache is 50% cheaper than the L1 cache.
- Increasing the associativity should also increase the cost, but not linearly. As we saw in the lectures, increasing the associativity, has diminishing returns. More specifically, more than an 8 way associative memory, shows very little improvements in performance. We made an educated guess and chose a logarithmic increase in the overall cost from the increase in assosiativity.
- Lastly, cash line size should also play a role in the final cost. We choose to multiply the above cost with the line size, because it affects both memory levels, and it increases the complexity of the whole CPU.
So we came up with the following formula
cost = (InstCacheSize * log(InstCacheAssoc) + DataCacheSize * log(DataCacheAssoc) + 0.5 * L2Size * log(L2Assoc)) * log(CacheLineSize)
Cache Line Size | 32kB | 64kB | 128kB |
---|---|---|---|
cost | 4.44M | 5.33M | 6.22M |
From the simulations we run, it looks like increasing the cache line size, benefits only certain workloads. Out of our five benchmarks, only specjeng and speclibm showed a significant decrease in the simulated ms.
Data Cache | 32kB 2-way | 64kB 2-way | 128kB 2-way | 32kB 4-way | 64kB 4-way | 128kB 4-way |
---|---|---|---|---|---|---|
cost | 5.25M | 5.568M | 6.16M | 5.39M | 5.84M | 6.73M |
Changing the Data cache configurations showed minimal improvements in performance across the board, with only the specbzip benchmark having a measurable difference in the simulated ms.
Instr. Cache | 32kB 2-way | 64kB 2-way | 128kB 2-way | 32kB 4-way | 64kB 4-way | 128kB 4-way |
---|---|---|---|---|---|---|
cost | 5.44M | 5.75M | 6.36M | 5.58M | 6.03M | 6.92M |
The benefits of changing the Instruction caches size and associativity are basically zero in every benchmark we tried.
L2 Cache | 512kB 4-way | 1MB 4-way | 2MB 4-way | 4MB 4-way | 512kB 8-way | 1MB 8-way | 2MB 8-way | 4MB 8-way |
---|---|---|---|---|---|---|---|---|
cost | 4.14M | 7.54M | 14.5M | 28.44M | 5.45M | 10.09M | 19.6M | 38.61M |
L2 cache shows a similar trend to the other two memories we tried tweaking. Most of the benchmarks had little to no improvements except specbzip, which had an decrease in the simulated ms of about 5%.\
Considering our knowledge and our results from the benchmarks we run, we concluded that the optimum configurations are the following:
- 64B Cache line size
- 64kB 4 way associative Data cache
- 32kB 2 way associative Instr. Cache
- 1MB 4 way associative L2 cache
Default Parameters | Chosen Parameters | |
---|---|---|
cost | 19.464.192 | 7.274.496 |
The performance of the 2 configurations is displayed in the graph below. The final subplot indicates the Price-to-Performance Ratio, in this case in the form of cost * cpi
where less is better.
First of all, we have encountered many compatibility issues between the lastest Ubuntu release, the cross-compiler and the spec2006 benchmarks. Secondly, despite changing different parameters of the architecture the results of the simulation did not show significant performance changes which was disappointing and difficult to work with. We suspect the simulated time being so low (less than half a second in most cases) is the main reason the results could not differentiate from one another. On the other hand, the script given to extract values from stats.txt has been proven quite usefull and time saving, after a small modification we saved the results in a .csv
file and manipulated the data in MATLAB.