Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NVDLA Integration + Cleanup Ariane Preprocessing #505

Merged
merged 35 commits into from
May 16, 2020
Merged

Conversation

abejgonzalez
Copy link
Contributor

@abejgonzalez abejgonzalez commented Apr 5, 2020

Related issue:

Type of change: new feature

Impact: rtl change | software change

Release Notes

Chipyard
This integrates master of NVDLA into Chipyard based off work done in (https://github.com/CSL-KU/nvidia-dla-blocks) and (https://github.com/sifive/block-nvdla-sifive). This builds in both VCS and Verilator. Tested with the NVDLA test put in tests/. This also changes Ariane to use the insert-includes.py script that replaces includes (instead of using Verilator).

FireSim
Added both small@75mhz and large@50MHz NVDLA to Rocket configs. Both small and large have completed AFI builds.

Remaining Todos

  • Add documentation
  • Run FPGA tests (run YOLO3, ResNet-50, nvdla-sw regressions)
  • Add to CI (basic test only)
  • Consider removing FireSim configs in favor of just keeping config fragments (i.e. WithNVDLALarge WithNVDLASmall) - Removed
  • Add recipe to FireSim for small and large (just added small unless FireSim devs want more)
  • Rename opendla_small.h to something more reasonable
  • Rename HasPeripheryNVDLA to CanHave
  • Change .gitmodules for nvdla-workload nvdla-sw to be https

@abejgonzalez
Copy link
Contributor Author

Thanks to @farzadfch for the help!

@farzadfch
Copy link

Thanks for sharing the PR. A few points:

  • For FPGA tests, we want to start from the basic tests in here.
  • The NVDLA driver is written for Linux 4.13.3. I hope that is ports smoothly to 5.3.
  • The driver has been updated since I integrated NVDLA with FireSim. The DTS here is based on the older version. In the updated driver, the correct driver parameters are selected based on the DTS (here and here). So we should generate the correct DTS based on the configuration in the Chisel code.

@tymcauley
Copy link
Contributor

Agreed, thanks for sharing this! I've also done work recently on integrating the NVDLA into Chipyard/FireSim, so I'd love to contribute some of my learnings and help move this along.

For @farzadfch's third point, I generated the DTS compatibility string like this to integrate with the new kernel mode driver:

val dtsdevice = new SimpleDevice("nvdla",Seq("nvidia,nv_" + params.config))

For integrating the NVDLA Linux driver into the current FireSim Linux kernel (version 5.3), the process was not so bad, but changes were necessary. Some functions were deprecated, but have drop-in replacements, and some new #includes were needed (struct definitions moved around), but that was about the extent of it. I can give more specific input during the week 🙂.

I have a concern with the test tests/nvdla.c. It appears to be generated from the .cfg file in this directory (one of the NVDLA hardware testbench files). However, tests/nvdla.c doesn't appear to do any correctness testing (such as feeding in the data from CDP_0_input.dat and comparing the results with the data in CDP_0_output.dat). It'd be great for the test to ensure correctness in the operation.

Furthermore, since the tests/nvdla.c test appears to be targeted at the the nv_large configuration, it would be great to describe a process for integrating tests for other NVDLA configurations, since this directory appears to contain tests for a number of configurations. And perhaps this test will work fine for other configurations, but it would still be valuable to make it clear how to develop other NVDLA tests to exercise other portions of the design.

I'm also curious to read through scripts/insert-includes.py a bit more this week, I had to jump through some hoops to get all of the RTL collateral to integrate with Chipyard/FireSim (I ended up adding a ton of addResource calls to nvdla.scala).

Regarding the nvdla_large.v and nvdla_small.v wrapper files, do we want to always tie together the core_clk and csb_clk inputs? I know this was done to accommodate the single-clock requirements for Black Boxes in FireSim. Is that requirement still in place? Also, could we design a system that ties these clocks together only when you're building the system for FireSim and not for an SoC?

Related FireSim vs SoC question: I haven't thought about this too much yet, but it would be good to look into the issue of selecting the NVDLA-provided FPGA vs. synthesis RAM models for these two different use cases as well.

@abejgonzalez
Copy link
Contributor Author

abejgonzalez commented Apr 6, 2020

Wow. Thanks for the quick feedback!

Linux Testing:
I haven't done any testing with Linux yet so that is the next step. @tymcauley would you be willing to share your changes/integration for the NVDLA Linux driver?

CI Tests/Other Tests:
As for the nvdla.riscv test, I merely copied it from @farzadfch's other work, so frankly I don't know anything about what it does. I think it would be a good idea to have a better test to make sure it is integrated properly however, I know very little about the tests in the NVDLA repo. I can try to look into some of the NVDLA tests when I have the chance but any recommendations/pointers are welcome!

Insert Includes:
The scripts/insert-includes.py is a very "dumb" script that just looks for includes and replaces them with the file it matches with. Normally, I would pre-process with Verilator then pass in the pre-processed output to the rest of the flow, however this removes any comments that may be needed later (i.e. pragmas and *synopsys*). So this script might be a reasonable band-aid? This script works for these two cases, but it can change if it needs to be more robust.

Clock Requirements:
Yes. I think the single-clock BB requirement is still in place (@davidbiancolin). We could potentially write the wrapper files (nvdla_small/large.v) in Chisel and pass in two different clocks. Thus the black boxes would be the inner modules (NV_NVDLA_apb2csb1, NV_nvdla), not the wrapper module. I think that might make things work nicely with multi-clock support in FireSim. With that being said, I would like to keep extensive FireSim multi-clock changes out of this PR since we are still testing/developing that feature.

FPGA vs SoC RAMs:
I think this should be relatively easy. We can just pass a flag into the NVDLA preprocess Verilog makefile that switches between the two types of RAMs.

Please let me know if you guys have any other questions or concerns!

@davidbiancolin
Copy link
Contributor

davidbiancolin commented Apr 6, 2020

It is now legal to have blackboxes with multiple clocks in FireSim, but i have not tested that case. It should just trivially work though if the existing clock gating scheme works on the NVDLA with a single clock.

@tymcauley
Copy link
Contributor

Here's what I did with the Linux driver:

  • I started with the @farzadfch's firesim-nvdla branch of riscv-linux
  • I then applied a few updates to the KMD from the NVDLA software repo. I used these commits from that repo:
    • 2841094c12531252ad8b7f73a3635bbd13ca5858
    • 8df37971b901b623aaeba66e6c998f74a499c272
    • d01dfc8640b8664703047315e5a337602ab47d2d
    • 9deea34f2d61bd55abb8ec5648e7079d9665e220
    • 4022452c3124cdf1fcbb1124ce884187375a80ea
  • Notably, I avoided including the changes from the following commits from the upstream nvdla/sw repo, as at least one of them introduced regressions in the KMD's behavior. I didn't do an exhaustive test to ensure which commit(s) were the problem though:
    • 16befcb84f8ccd2b3d1b0683b965375ae11274df
    • c906fc63eb0f20bc6ea6337fc7ef38647923339f
    • fae1eab705673fd63df03f4a927d6fda81e2a13b

At this point, I moved all of this work into the current Linux kernel fork in the FireMarshal repo. If you try to compile the Linux kernel, you'll run into a number of compilation errors. Here are the diffs I had to make to solve those errors:

  • Add #include <linux/uaccess.h> to include/nvdla_linux.h
  • The following changes to nvdla_gem.c:
    • Add #include <drm/drm_device.h>, #include <drm/drm_drv.h>, and #include <linux/dma-mapping.h>
    • Replace drm_gem_object_unreference_unlocked(...) with drm_gem_object_put_unlocked(...)
    • Replace drm_dev_unref(...) with drm_dev_put(...)
    • Change the dma_declare_coherent_memory(...) invocation to remove the final argument, DMA_MEMORY_EXCLUSIVE.

After making those changes, the driver compiled and functioned properly. I've only tested with the nv_large configuration, not nv_small. Also @farzadfch, I couldn't run your YOLOv3 workload yet, as there's an issue with the GLIBC headers used in the buildroot image. Haven't yet looked into how to resolve that issue.

@abejgonzalez
Copy link
Contributor Author

Thanks @tymcauley Ill take a look at the Linux driver work. I also added a switch to the config to choose between fpga/synth rams. Let me know if that works for you.

@tymcauley
Copy link
Contributor

Thanks @tymcauley Ill take a look at the Linux driver work. I also added a switch to the config to choose between fpga/synth rams. Let me know if that works for you.

That switch looks great to me, thanks!

Comment on lines 78 to 79
class WithNVDLALarge extends chipyard.config.WithNVDLA("large")
class WithNVDLASmall extends chipyard.config.WithNVDLA("small")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think these might belong in this file?

Copy link
Contributor Author

@abejgonzalez abejgonzalez Apr 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC I think for you to make a build recipe like WithNVDLALarge_SomeFireSimConfig as the TARGET_CONFIG you need all of the components in the same scala package.

@farzadfch
Copy link

Also @farzadfch, I couldn't run your YOLOv3 workload yet, as there's an issue with the GLIBC headers used in the buildroot image. Haven't yet looked into how to resolve that issue.

What is the error? Does it complain about libgomp.so.1?

@tymcauley
Copy link
Contributor

Also @farzadfch, I couldn't run your YOLOv3 workload yet, as there's an issue with the GLIBC headers used in the buildroot image. Haven't yet looked into how to resolve that issue.

What is the error? Does it complain about libgomp.so.1?

No, although I did have to add libgomp.so.1 to the BR2_TOOLCHAIN_EXTRA_EXTERNAL_LIBS line in software/firemarshal/wlutil/br/buildroot-config:

diff --git a/software/firemarshal/wlutil/br/buildroot-config b/software/firemarshal/wlutil/br/buildroot-config
index 69df8c3..54f3af4 100644
--- a/software/firemarshal/wlutil/br/buildroot-config
+++ b/software/firemarshal/wlutil/br/buildroot-config
@@ -717,7 +717,7 @@ BR2_TOOLCHAIN_EXTERNAL_INET_RPC=y
 BR2_TOOLCHAIN_EXTERNAL_PATH="$(RISCV)"
 BR2_TOOLCHAIN_EXTERNAL_PREFIX="$(ARCH)-unknown-linux-gnu"
 BR2_TOOLCHAIN_EXTERNAL_PREINSTALLED=y
-BR2_TOOLCHAIN_EXTRA_EXTERNAL_LIBS=""
+BR2_TOOLCHAIN_EXTRA_EXTERNAL_LIBS="libgomp.so.1"
 BR2_TOOLCHAIN_HAS_FULL_GETTEXT=y
 BR2_TOOLCHAIN_HAS_NATIVE_RPC=y
 BR2_TOOLCHAIN_HAS_SSP=y

Once I did that and ran solo.sh, I got this error:

# ./solo.sh
./darknet: /lib/libpthread.so.0: version `GLIBC_2.26' not found (required by ./darknet)
./darknet: /lib/libm.so.6: version `GLIBC_2.26' not found (required by ./darknet)
./darknet: /lib/libc.so.6: version `GLIBC_2.26' not found (required by ./darknet)
./darknet: /lib/libc.so.6: version `GLIBC_2.26' not found (required by /root/darknet-nvdla/libodlalayer.so)
./darknet: /lib/libpthread.so.0: version `GLIBC_2.26' not found (required by /root/darknet-nvdla/libdarknet.so)
./darknet: /lib/libm.so.6: version `GLIBC_2.26' not found (required by /root/darknet-nvdla/libdarknet.so)
./darknet: /lib/libc.so.6: version `GLIBC_2.26' not found (required by /root/darknet-nvdla/libdarknet.so)
./darknet: /lib/libm.so.6: version `GLIBC_2.26' not found (required by /root/darknet-nvdla/libnvdla_runtime.so)
./darknet: /lib/libc.so.6: version `GLIBC_2.26' not found (required by /root/darknet-nvdla/libnvdla_runtime.so)

@tymcauley
Copy link
Contributor

A few comments about some files in the nvdla-wrapper submodule:

  • src/main/resources/vsrc.mk: Should we include some information on how these file lists were generated from the files that come out of the NVDLA tree build? It'd be great to have a relatively clear path for people who are interested in adding support for a new NVDLA configuration.
  • src/main/resources/build-hw-vmod.sh: Probably related to the above question. I have changed this script locally so that instead of using cp to copy over the NVDLA RTL, it uses rsync with some --exclude and --include flags to gate which files we grab. Does that sound useful? If so, then we could also augment this script to auto-generate the contents of src/main/resources/vsrc.mk, if that seems practical. I'm thinking of something along these lines:
#!/bin/bash

set -ex

cd "$(dirname "$0")"

function build_nvdla_rtl() {
    local NVDLA_TYPE="$1"
    echo "Building NVDLA configuration: \"$NVDLA_TYPE\""
    cp tree.make hw
    sed -i -e "s/nv_small/$NVDLA_TYPE/" hw/tree.make
    cd hw
    ./tools/bin/tmake -build vmod
    cd ..
}

function install_nvdla_rtl() {
    local NVDLA_TYPE="$1"
    local SRC_DIR=hw/outdir/nv_$NVDLA_TYPE/vmod
    local DST_DIR=vsrc/$NVDLA_TYPE
    rm -rf "$DST_DIR/vmod"
    rsync -av \
        --exclude="*.vcp" \
        --exclude="*.swl" \
        --include="nv_assert_no_x.vlib" \
        --exclude="*.vlib" \
        "$SRC_DIR" "$DST_DIR"
}

function generate_vsrc_mk() {
    # TODO: generate `vsrc.mk` from the files in `vsrc/*`
    exit 1
}

NVDLA_TYPE=small
build_nvdla_rtl $NVDLA_TYPE
install_nvdla_rtl $NVDLA_TYPE

NVDLA_TYPE=large
build_nvdla_rtl $NVDLA_TYPE
install_nvdla_rtl $NVDLA_TYPE

generate_vsrc_mk

Then we could adjust the install_nvdla_rtl function to filter out files that we don't want to copy over, in case they would create issues for an auto-generated vsrc.mk.

@abejgonzalez
Copy link
Contributor Author

abejgonzalez commented Apr 29, 2020

The vsrc.mk actually took a long time to get right since it is hard to know what the base set of files needed are. I just did a brute force approach of copying the initial vsrc.mk from Farzad and adding/deleting files that caused problems in VCS/Verilator/Synth. I'm not sure if there is a specific way to know what files are used. If there is a specific way to get the file list I think that would be good.

As mentioned earlier, after I used the build-hw-vmod.sh script to build the RTL I just brute-forced searching for the right files. I'm not opposed to changing it... but I'm not sure how many files are included versus excluded. The script might be very large with excluded files but that is just speculation.

@abejgonzalez
Copy link
Contributor Author

@tymcauley @farzadfch I'm looking to push this through into dev right after the #544 PR goes through. Any final comments or questions?

@alonamid Any documentation / usability comments?

Copy link
Contributor

@tymcauley tymcauley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for driving this @abejgonzalez!

NVDLA Software with FireMarshal
-------------------------------

Located at ``software/nvdla-workload`` is a FireMarshal-based workload to boot Linux with the proper NVDLA drivers.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How likely is this to get broken upon Linux kernel upgrades?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hopefully not by much... but frankly I wouldn't know until one happened.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One is expected to happen this week? firesim/FireMarshal#151

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the SW to match the bumped kernel (added extra instructions in the SW workload section to address this).

@alonamid
Copy link
Contributor

alonamid commented May 8, 2020

Why does this include a firesim bump now that the firesim configs are inside the nvdla wrapper?

@alonamid
Copy link
Contributor

alonamid commented May 8, 2020

LGTM overall, although other people tend to have stronger opinions than me about adding Verilator flags

@abejgonzalez
Copy link
Contributor Author

Why does this include a firesim bump now that the firesim configs are inside the nvdla wrapper?

When I bump after the #544 PR, I will point to the correct FireSim repository.

@abejgonzalez
Copy link
Contributor Author

LGTM overall, although other people tend to have stronger opinions than me about adding Verilator flags

The Verilator flag matches RC and fixes the same issue they were seeing so I don't see others having a problem with it.

// Enables tracing on all cores
class WithTraceIO extends Config((site, here, up) => {
case BoomTilesKey => up(BoomTilesKey) map (tile => tile.copy(trace = true))
case ArianeTilesKey => up(ArianeTilesKey) map (tile => tile.copy(trace = true))
case TracePortKey => Some(TracePortParams())
})

// Adds a small/large NVDLA to the system
class WithNVDLALarge extends nvidia.blocks.dla.WithNVDLA("large")
class WithNVDLASmall extends nvidia.blocks.dla.WithNVDLA("small")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC to do something like this WithNVDLASmall_DDR3FRFCFSLLC4MB_FireSimQuadRocketConfig in the build recipe you need to have WithNVDLA... here. (so that everything is in the same scala package). I assume that this would be the easiest way to add an NVDLA for external users.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@abejgonzalez abejgonzalez merged commit 85b555d into dev May 16, 2020
@abejgonzalez abejgonzalez deleted the nvdla-integration branch May 17, 2020 03:33
@alonamid alonamid mentioned this pull request May 30, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants