NVDLA Integration + Cleanup Ariane Preprocessing #505

abejgonzalez · 2020-04-05T22:02:01Z

Related issue:

Type of change: new feature

Impact: rtl change | software change

Release Notes

Chipyard
This integrates the master branch of NVDLA into Chipyard, based on work done in (https://github.com/CSL-KU/nvidia-dla-blocks) and (https://github.com/sifive/block-nvdla-sifive). It builds in both VCS and Verilator, and was tested with the NVDLA test added to tests/. It also changes Ariane to use the insert-includes.py script, which replaces includes directly (instead of preprocessing with Verilator).

FireSim
Added both small@75MHz and large@50MHz NVDLA to Rocket configs. Both small and large have completed AFI builds.

Remaining Todos

Add documentation
Run FPGA tests (run YOLO3, ResNet-50, nvdla-sw regressions)
Add to CI (basic test only)
Consider removing FireSim configs in favor of just keeping config fragments (i.e. WithNVDLALarge WithNVDLASmall) - Removed
Add recipe to FireSim for small and large (just added small unless FireSim devs want more)
Rename opendla_small.h to something more reasonable
Rename HasPeripheryNVDLA to CanHave
Change .gitmodules for nvdla-workload nvdla-sw to be https

sims/vcs/Makefile

abejgonzalez · 2020-04-05T22:39:30Z

Thanks to @farzadfch for the help!

farzadfch · 2020-04-05T23:57:07Z

Thanks for sharing the PR. A few points:

For FPGA tests, we want to start from the basic tests in here.
The NVDLA driver is written for Linux 4.13.3. I hope that it ports smoothly to 5.3.
The driver has been updated since I integrated NVDLA with FireSim. The DTS here is based on the older version. In the updated driver, the correct driver parameters are selected based on the DTS (here and here). So we should generate the correct DTS based on the configuration in the Chisel code.

tymcauley · 2020-04-06T00:48:14Z

Agreed, thanks for sharing this! I've also done work recently on integrating the NVDLA into Chipyard/FireSim, so I'd love to contribute some of my learnings and help move this along.

For @farzadfch's third point, I generated the DTS compatibility string like this to integrate with the new kernel mode driver:

val dtsdevice = new SimpleDevice("nvdla",Seq("nvidia,nv_" + params.config))

For integrating the NVDLA Linux driver into the current FireSim Linux kernel (version 5.3), the process was not so bad, but changes were necessary. Some functions were deprecated, but have drop-in replacements, and some new #includes were needed (struct definitions moved around), but that was about the extent of it. I can give more specific input during the week 🙂.

I have a concern with the test tests/nvdla.c. It appears to be generated from the .cfg file in this directory (one of the NVDLA hardware testbench files). However, tests/nvdla.c doesn't appear to do any correctness testing (such as feeding in the data from CDP_0_input.dat and comparing the results with the data in CDP_0_output.dat). It'd be great for the test to ensure correctness in the operation.

Furthermore, since the tests/nvdla.c test appears to be targeted at the nv_large configuration, it would be great to describe a process for integrating tests for other NVDLA configurations, since this directory appears to contain tests for a number of configurations. Perhaps this test will work fine for other configurations, but it would still be valuable to make it clear how to develop other NVDLA tests that exercise other portions of the design.

I'm also curious to read through scripts/insert-includes.py a bit more this week. I had to jump through some hoops to get all of the RTL collateral to integrate with Chipyard/FireSim (I ended up adding a ton of addResource calls to nvdla.scala).

Regarding the nvdla_large.v and nvdla_small.v wrapper files, do we want to always tie together the core_clk and csb_clk inputs? I know this was done to accommodate the single-clock requirements for Black Boxes in FireSim. Is that requirement still in place? Also, could we design a system that ties these clocks together only when you're building the system for FireSim and not for an SoC?

Related FireSim vs SoC question: I haven't thought about this too much yet, but it would be good to look into the issue of selecting the NVDLA-provided FPGA vs. synthesis RAM models for these two different use cases as well.

abejgonzalez · 2020-04-06T01:22:47Z

Wow. Thanks for the quick feedback!

Linux Testing:
I haven't done any testing with Linux yet so that is the next step. @tymcauley would you be willing to share your changes/integration for the NVDLA Linux driver?

CI Tests/Other Tests:
As for the nvdla.riscv test, I merely copied it from @farzadfch's other work, so frankly I don't know anything about what it does. I think it would be a good idea to have a better test to make sure it is integrated properly. However, I know very little about the tests in the NVDLA repo. I can try to look into some of the NVDLA tests when I have the chance, but any recommendations/pointers are welcome!

Insert Includes:
The scripts/insert-includes.py script is a very "dumb" script that just looks for includes and replaces each one with the contents of the file it matches. Normally, I would preprocess with Verilator and then pass the preprocessed output to the rest of the flow; however, this removes any comments that may be needed later (i.e. pragmas and *synopsys*). So this script might be a reasonable band-aid? It works for these two cases, but it can change if it needs to be more robust.
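For illustration only, the approach described above could look roughly like the following. This is a minimal sketch and not the actual scripts/insert-includes.py: the function names, the list of search directories, and the recursive expansion of nested includes are all assumptions made for the example.

```python
import os
import re

# Matches a Verilog include directive such as:  `include "defs.vh"
INCLUDE_RE = re.compile(r'^\s*`include\s+"([^"]+)"')


def find_include(name, search_dirs):
    """Return the first existing path for `name` in `search_dirs`, else None."""
    for directory in search_dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None


def inline_includes(path, search_dirs):
    """Return the text of `path` with each resolvable include line replaced
    by the included file's contents. All other lines, including comments
    and pragmas, are copied through unchanged."""
    out = []
    with open(path) as f:
        for line in f:
            m = INCLUDE_RE.match(line)
            target = find_include(m.group(1), search_dirs) if m else None
            if target is None:
                # Not an include, or the file was not found: keep the line.
                out.append(line)
            else:
                # Hypothetical choice: expand nested includes too
                # (no include-cycle detection in this sketch).
                out.append(inline_includes(target, search_dirs))
    return "".join(out)
```

Because the text is copied line by line rather than run through a preprocessor, comments such as synopsys pragmas survive, which is the property the paragraph above relies on.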

Clock Requirements:
Yes. I think the single-clock BB requirement is still in place (@davidbiancolin). We could potentially write the wrapper files (nvdla_small/large.v) in Chisel and pass in two different clocks. Thus the black boxes would be the inner modules (NV_NVDLA_apb2csb1, NV_nvdla), not the wrapper module. I think that might make things work nicely with multi-clock support in FireSim. With that being said, I would like to keep extensive FireSim multi-clock changes out of this PR since we are still testing/developing that feature.

FPGA vs SoC RAMs:
I think this should be relatively easy. We can just pass a flag into the NVDLA preprocess Verilog makefile that switches between the two types of RAMs.

Please let me know if you guys have any other questions or concerns!

davidbiancolin · 2020-04-06T04:42:05Z

It is now legal to have blackboxes with multiple clocks in FireSim, but I have not tested that case. It should just work, though, if the existing clock gating scheme works on the NVDLA with a single clock.

generators/chipyard/src/main/scala/ConfigFragments.scala

tymcauley · 2020-04-06T18:06:05Z

Here's what I did with the Linux driver:

I started with @farzadfch's firesim-nvdla branch of riscv-linux
I then applied a few updates to the KMD from the NVDLA software repo. I used these commits from that repo:
- 2841094c12531252ad8b7f73a3635bbd13ca5858
- 8df37971b901b623aaeba66e6c998f74a499c272
- d01dfc8640b8664703047315e5a337602ab47d2d
- 9deea34f2d61bd55abb8ec5648e7079d9665e220
- 4022452c3124cdf1fcbb1124ce884187375a80ea
Notably, I avoided including the changes from the following commits from the upstream nvdla/sw repo, as at least one of them introduced regressions in the KMD's behavior. I didn't do an exhaustive test to determine which commit(s) were the problem, though:
- 16befcb84f8ccd2b3d1b0683b965375ae11274df
- c906fc63eb0f20bc6ea6337fc7ef38647923339f
- fae1eab705673fd63df03f4a927d6fda81e2a13b

At this point, I moved all of this work into the current Linux kernel fork in the FireMarshal repo. If you try to compile the Linux kernel, you'll run into a number of compilation errors. Here are the diffs I had to make to solve those errors:

Add #include <linux/uaccess.h> to include/nvdla_linux.h
The following changes to nvdla_gem.c:
- Add #include <drm/drm_device.h>, #include <drm/drm_drv.h>, and #include <linux/dma-mapping.h>
- Replace drm_gem_object_unreference_unlocked(...) with drm_gem_object_put_unlocked(...)
- Replace drm_dev_unref(...) with drm_dev_put(...)
- Change the dma_declare_coherent_memory(...) invocation to remove the final argument, DMA_MEMORY_EXCLUSIVE.
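The mechanical part of the nvdla_gem.c changes listed above can be sketched as a small text rewrite. This is only an illustration under stated assumptions: the helper names are hypothetical, the added #include lines still have to be inserted by hand, and `drop_exclusive_flag` assumes that DMA_MEMORY_EXCLUSIVE appears as the sole final argument, as the list describes.

```python
import re

# Old DRM helper name -> replacement, as listed in the comment above.
RENAMES = {
    "drm_gem_object_unreference_unlocked": "drm_gem_object_put_unlocked",
    "drm_dev_unref": "drm_dev_put",
}


def apply_renames(source):
    """Replace each deprecated function name with its replacement."""
    for old, new in RENAMES.items():
        # \b keeps us from touching longer identifiers that merely
        # contain the old name as a substring.
        source = re.sub(r"\b" + re.escape(old) + r"\b", new, source)
    return source


def drop_exclusive_flag(source):
    """Remove a trailing DMA_MEMORY_EXCLUSIVE argument from a call.

    Assumption: the flag is passed on its own as the last argument.
    Calls that pass it in any other form are left unchanged."""
    return re.sub(r",\s*DMA_MEMORY_EXCLUSIVE\s*\)", ")", source)
```

Any output of a rewrite like this would still need to be reviewed and compiled against the target kernel.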

After making those changes, the driver compiled and functioned properly. I've only tested with the nv_large configuration, not nv_small. Also @farzadfch, I couldn't run your YOLOv3 workload yet, as there's an issue with the GLIBC headers used in the buildroot image. Haven't yet looked into how to resolve that issue.

abejgonzalez · 2020-04-06T19:09:59Z

Thanks @tymcauley, I'll take a look at the Linux driver work. I also added a switch to the config to choose between FPGA and synthesis RAMs. Let me know if that works for you.

tymcauley · 2020-04-06T19:38:16Z

> Thanks @tymcauley, I'll take a look at the Linux driver work. I also added a switch to the config to choose between FPGA and synthesis RAMs. Let me know if that works for you.

That switch looks great to me, thanks!

tymcauley · 2020-04-06T20:44:57Z

generators/firechip/src/main/scala/TargetConfigs.scala

+class WithNVDLALarge extends chipyard.config.WithNVDLA("large")
+class WithNVDLASmall extends chipyard.config.WithNVDLA("small")


Do you think these might belong in this file?

IIRC, to make a build recipe that uses something like WithNVDLALarge_SomeFireSimConfig as the TARGET_CONFIG, all of the components need to be in the same Scala package.

farzadfch · 2020-04-07T18:17:58Z

> Also @farzadfch, I couldn't run your YOLOv3 workload yet, as there's an issue with the GLIBC headers used in the buildroot image. Haven't yet looked into how to resolve that issue.

What is the error? Does it complain about libgomp.so.1?

tymcauley · 2020-04-07T18:25:06Z

> > Also @farzadfch, I couldn't run your YOLOv3 workload yet, as there's an issue with the GLIBC headers used in the buildroot image. Haven't yet looked into how to resolve that issue.
>
> What is the error? Does it complain about libgomp.so.1?

No, although I did have to add libgomp.so.1 to the BR2_TOOLCHAIN_EXTRA_EXTERNAL_LIBS line in software/firemarshal/wlutil/br/buildroot-config:

diff --git a/software/firemarshal/wlutil/br/buildroot-config b/software/firemarshal/wlutil/br/buildroot-config
index 69df8c3..54f3af4 100644
--- a/software/firemarshal/wlutil/br/buildroot-config
+++ b/software/firemarshal/wlutil/br/buildroot-config
@@ -717,7 +717,7 @@ BR2_TOOLCHAIN_EXTERNAL_INET_RPC=y
 BR2_TOOLCHAIN_EXTERNAL_PATH="$(RISCV)"
 BR2_TOOLCHAIN_EXTERNAL_PREFIX="$(ARCH)-unknown-linux-gnu"
 BR2_TOOLCHAIN_EXTERNAL_PREINSTALLED=y
-BR2_TOOLCHAIN_EXTRA_EXTERNAL_LIBS=""
+BR2_TOOLCHAIN_EXTRA_EXTERNAL_LIBS="libgomp.so.1"
 BR2_TOOLCHAIN_HAS_FULL_GETTEXT=y
 BR2_TOOLCHAIN_HAS_NATIVE_RPC=y
 BR2_TOOLCHAIN_HAS_SSP=y

Once I did that and ran solo.sh, I got this error:

# ./solo.sh
./darknet: /lib/libpthread.so.0: version `GLIBC_2.26' not found (required by ./darknet)
./darknet: /lib/libm.so.6: version `GLIBC_2.26' not found (required by ./darknet)
./darknet: /lib/libc.so.6: version `GLIBC_2.26' not found (required by ./darknet)
./darknet: /lib/libc.so.6: version `GLIBC_2.26' not found (required by /root/darknet-nvdla/libodlalayer.so)
./darknet: /lib/libpthread.so.0: version `GLIBC_2.26' not found (required by /root/darknet-nvdla/libdarknet.so)
./darknet: /lib/libm.so.6: version `GLIBC_2.26' not found (required by /root/darknet-nvdla/libdarknet.so)
./darknet: /lib/libc.so.6: version `GLIBC_2.26' not found (required by /root/darknet-nvdla/libdarknet.so)
./darknet: /lib/libm.so.6: version `GLIBC_2.26' not found (required by /root/darknet-nvdla/libnvdla_runtime.so)
./darknet: /lib/libc.so.6: version `GLIBC_2.26' not found (required by /root/darknet-nvdla/libnvdla_runtime.so)
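A log like this is easier to act on when grouped by the object that needs the missing symbol version. The following is a small sketch, with hypothetical names, that parses the loader messages in the shape shown above; it is not part of the workload or of FireMarshal.

```python
import re
from collections import defaultdict

# Loader message shape seen in the log above:
#   ./darknet: /lib/libc.so.6: version `GLIBC_2.26' not found (required by ./darknet)
LINE_RE = re.compile(
    r"^\S+: (?P<lib>\S+): version `(?P<ver>[^']+)' not found "
    r"\(required by (?P<user>[^)]+)\)$"
)


def summarize(log):
    """Map each requiring object to the set of (library, version) pairs
    that the dynamic loader reported as missing for it."""
    missing = defaultdict(set)
    for line in log.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            missing[m.group("user")].add((m.group("lib"), m.group("ver")))
    return dict(missing)
```

Applied to the output above, every entry asks for GLIBC_2.26, which suggests that the prebuilt binaries were linked against a newer glibc than the one in the buildroot image, rather than a problem with one particular library.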

tymcauley · 2020-04-29T21:43:37Z

A few comments about some files in the nvdla-wrapper submodule:

src/main/resources/vsrc.mk: Should we include some information on how these file lists were generated from the files that come out of the NVDLA tree build? It'd be great to have a relatively clear path for people who are interested in adding support for a new NVDLA configuration.
src/main/resources/build-hw-vmod.sh: Probably related to the above question. I have changed this script locally so that instead of using cp to copy over the NVDLA RTL, it uses rsync with some --exclude and --include flags to gate which files we grab. Does that sound useful? If so, then we could also augment this script to auto-generate the contents of src/main/resources/vsrc.mk, if that seems practical. I'm thinking of something along these lines:

#!/bin/bash

set -ex

cd "$(dirname "$0")"

function build_nvdla_rtl() {
    local NVDLA_TYPE="$1"
    echo "Building NVDLA configuration: \"$NVDLA_TYPE\""
    cp tree.make hw
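    # tree.make ships with nv_small; rewrite it to the requested project name
    # so that the output lands in hw/outdir/nv_$NVDLA_TYPE (used below).
    sed -i -e "s/nv_small/nv_$NVDLA_TYPE/" hw/tree.make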
    cd hw
    ./tools/bin/tmake -build vmod
    cd ..
}

function install_nvdla_rtl() {
    local NVDLA_TYPE="$1"
    local SRC_DIR=hw/outdir/nv_$NVDLA_TYPE/vmod
    local DST_DIR=vsrc/$NVDLA_TYPE
    rm -rf "$DST_DIR/vmod"
    rsync -av \
        --exclude="*.vcp" \
        --exclude="*.swl" \
        --include="nv_assert_no_x.vlib" \
        --exclude="*.vlib" \
        "$SRC_DIR" "$DST_DIR"
}

function generate_vsrc_mk() {
    # TODO: generate `vsrc.mk` from the files in `vsrc/*`
    exit 1
}

NVDLA_TYPE=small
build_nvdla_rtl $NVDLA_TYPE
install_nvdla_rtl $NVDLA_TYPE

NVDLA_TYPE=large
build_nvdla_rtl $NVDLA_TYPE
install_nvdla_rtl $NVDLA_TYPE

generate_vsrc_mk

Then we could adjust the install_nvdla_rtl function to filter out files that we don't want to copy over, in case they would create issues for an auto-generated vsrc.mk.
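One possible shape for the `generate_vsrc_mk` step left as a TODO above is sketched below. It is written in Python only for illustration, and several things are assumptions: the variable names `NVDLA_<TYPE>_VSRCS` and `$(vsrc_dir)`, the set of file extensions, and the idea that every copied file belongs in the list (the existing vsrc.mk may well be a hand-pruned subset).

```python
import os


def collect_sources(vsrc_root, nvdla_type, exts=(".v", ".vlib")):
    """Return sorted paths, relative to vsrc_root, of the RTL files
    found under vsrc_root/<nvdla_type>."""
    base = os.path.join(vsrc_root, nvdla_type)
    found = []
    for dirpath, _dirs, files in os.walk(base):
        for name in files:
            if name.endswith(exts):
                full = os.path.join(dirpath, name)
                found.append(os.path.relpath(full, vsrc_root))
    return sorted(found)


def render_vsrc_mk(vsrc_root, nvdla_types=("small", "large")):
    """Render one Makefile variable per configuration.

    The variable naming scheme here is hypothetical."""
    chunks = []
    for nvdla_type in nvdla_types:
        files = collect_sources(vsrc_root, nvdla_type)
        body = " \\\n".join("\t$(vsrc_dir)/" + f for f in files)
        chunks.append(f"NVDLA_{nvdla_type.upper()}_VSRCS := \\\n{body}\n")
    return "\n".join(chunks)
```

Since the file list is generated from whatever `install_nvdla_rtl` copied, the rsync filters become the single place where files are included or excluded.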

abejgonzalez · 2020-04-29T23:41:01Z

The vsrc.mk actually took a long time to get right, since it is hard to know which base set of files is needed. I took a brute-force approach: I copied the initial vsrc.mk from Farzad and added/deleted files that caused problems in VCS/Verilator/Synth. I'm not sure if there is a specific way to know which files are used. If there is a specific way to get the file list, I think that would be good.

As mentioned earlier, after I used the build-hw-vmod.sh script to build the RTL I just brute-forced searching for the right files. I'm not opposed to changing it... but I'm not sure how many files are included versus excluded. The script might be very large with excluded files but that is just speculation.

abejgonzalez · 2020-05-08T03:17:29Z

@tymcauley @farzadfch I'm looking to push this through into dev right after the #544 PR goes through. Any final comments or questions?

@alonamid Any documentation / usability comments?

tymcauley

LGTM! Thanks for driving this @abejgonzalez!

alonamid · 2020-05-08T22:12:44Z

docs/Generators/NVDLA.rst

+NVDLA Software with FireMarshal
+-------------------------------
+
+Located at ``software/nvdla-workload`` is a FireMarshal-based workload to boot Linux with the proper NVDLA drivers.


How likely is this to get broken upon Linux kernel upgrades?

Hopefully not by much... but frankly I wouldn't know until one happened.

One is expected to happen this week? firesim/FireMarshal#151

I updated the SW to match the bumped kernel (added extra instructions in the SW workload section to address this).

alonamid · 2020-05-08T22:20:21Z

Why does this include a firesim bump now that the firesim configs are inside the nvdla wrapper?

alonamid · 2020-05-08T22:21:07Z

LGTM overall, although other people tend to have stronger opinions than me about adding Verilator flags

abejgonzalez · 2020-05-08T22:22:31Z

> Why does this include a firesim bump now that the firesim configs are inside the nvdla wrapper?

When I bump after the #544 PR, I will point to the correct FireSim repository.

abejgonzalez · 2020-05-08T22:24:38Z

> LGTM overall, although other people tend to have stronger opinions than me about adding Verilator flags

The Verilator flag matches RC and fixes the same issue they were seeing so I don't see others having a problem with it.

jerryz123 · 2020-05-15T18:05:21Z

generators/firechip/src/main/scala/TargetConfigs.scala

 // Enables tracing on all cores
 class WithTraceIO extends Config((site, here, up) => {
  case BoomTilesKey => up(BoomTilesKey) map (tile => tile.copy(trace = true))
  case ArianeTilesKey => up(ArianeTilesKey) map (tile => tile.copy(trace = true))
  case TracePortKey => Some(TracePortParams())
 })

+// Adds a small/large NVDLA to the system
+class WithNVDLALarge extends nvidia.blocks.dla.WithNVDLA("large")
+class WithNVDLASmall extends nvidia.blocks.dla.WithNVDLA("small")


Why are these here?

IIRC, to use something like WithNVDLASmall_DDR3FRFCFSLLC4MB_FireSimQuadRocketConfig in the build recipe, you need to have WithNVDLA... here (so that everything is in the same Scala package). I assume that this would be the easiest way to add an NVDLA for external users.

See https://github.com/ucb-bar/nvdla-wrapper/blob/master/firesim-collateral/nvdla_config_build_recipes.ini.
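The constraint described in this thread can be illustrated with a toy model. The sketch below assumes, per the comments above, that an underscore-joined name is split into class names that must all be visible in one package; the function and the `package_classes` set are hypothetical stand-ins and do not reflect FireSim's actual resolution code.

```python
def resolve_target_config(target_config, package_classes):
    """Split an underscore-joined config name and check that every part
    is a known class name.

    `package_classes` stands in for the classes visible in one Scala
    package (an assumption made for this illustration)."""
    parts = target_config.split("_")
    missing = [p for p in parts if p not in package_classes]
    if missing:
        raise LookupError("not found in package: " + ", ".join(missing))
    return parts
```

In this model, a name such as WithNVDLASmall_DDR3FRFCFSLLC4MB_FireSimQuadRocketConfig only resolves if WithNVDLASmall sits alongside the other two classes, which is why the thin wrapper classes are declared next to the FireSim configs.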

abejgonzalez and others added 8 commits April 2, 2020 11:49

[nvdla] initial nvdla integration

7be0cd1

[nvdla] add firesim configs

bdb8fcd

[nvdla] re-add accidentally deleted line

1c0b249

[nvdla] works on master with small

5c9ff11

[nvdla] use master branch of nvdla

ef97407

[nvdla] remove extra sources

2418d0c

[nvdla] bump

8c558dd

[nvdla + ariane] bump and use insert-includes for pre-processing

b4e379e

abejgonzalez added the enhancement label Apr 5, 2020

abejgonzalez self-assigned this Apr 5, 2020

[nvdla] add ci | remove target configs in FireChip | update naming

b460322

abejgonzalez added the DO NOT MERGE label Apr 6, 2020

jerryz123 reviewed Apr 6, 2020

generators/chipyard/src/main/scala/ConfigFragments.scala (outdated)

[nvdla] bump nvdla | fix ci run-tests error

e65f0ca

abejgonzalez added 3 commits April 6, 2020 13:32

[nvdla] re-enable PCWM-L error | fix/update makefile(s)

047ab17

[nvdla] bump nvdla fragments in FireChip

efa7d7c

[misc] bump tutorial patches

10d4975

tymcauley reviewed Apr 6, 2020

[chipyard] remove extra import

85853b3

abejgonzalez mentioned this pull request Apr 29, 2020

SimDRAM fails for 32-bit designs on verilator #462

Open

abejgonzalez mentioned this pull request Apr 29, 2020

Multi-Clock Support for NVDLA #539

Open

[ci skip] [nvdla] bump submodule urls

0bc0e35

abejgonzalez force-pushed the nvdla-integration branch from 2818e8f to c9d72ea Compare May 8, 2020 03:15

[misc] move firesim specific configs into nvdla dir [ci skip]

bd748d1

abejgonzalez force-pushed the nvdla-integration branch from c9d72ea to bd748d1 Compare May 8, 2020 03:16

tymcauley approved these changes May 8, 2020

alonamid reviewed May 8, 2020

abejgonzalez mentioned this pull request May 13, 2020

Support BlackBox RTL designs containing "include" directives firesim/firesim#522

Closed

abejgonzalez and others added 5 commits May 13, 2020 17:50

Merge remote-tracking branch 'origin/dev' into nvdla-integration

425838f

[nvdla] fix run-tests in ci

6cbeea2

update RC configs | bump marshal | bump nvdla-workload

0b4644c

[nvdla] bump nvdla-workload [ci skip]

ee8c698

Merge remote-tracking branch 'origin/dev' into nvdla-integration

6b2def8

jerryz123 reviewed May 15, 2020

alonamid approved these changes May 15, 2020

abejgonzalez added 3 commits May 15, 2020 21:39

add topology mixin to nvdla configs

9611bba

update tutorial patches

39100c8

Merge branch 'dev' into nvdla-integration

01f4c12

abejgonzalez merged commit 85b555d into dev May 16, 2020

abejgonzalez deleted the nvdla-integration branch May 17, 2020 03:33

alonamid mentioned this pull request May 30, 2020

Chipyard 1.3 Release #500

Merged

manox mentioned this pull request Aug 10, 2021

YOLO3 workload ucb-bar/nvdla-workload#5

Open
