
Balar mmio with Vanadis #2428

Open
wants to merge 59 commits into
base: devel

William-An
Contributor

@William-An William-An commented Dec 12, 2024

Balar mmio with Vanadis

  • Add support for using Balar as an MMAP device for Vanadis to access
  • Create a custom CUDA runtime lib to run CUDA programs with Vanadis
  • Add more CUDA runtime APIs to support the rodinia-2.0 benchmarks
  • Add more unit test cases

	* Add a new CUDA API id "GPU_PARAM_CONFIG" to support
	  querying kernel function argument size and alignment
	  information from GPGPU-Sim.

	* Add param "cuda_executable" to BalarMMIO so that it
	  can know the CUDA binary path when running LLVM CUDA
	  code (Vanadis cannot know the host file structure).

	* Add all the CUDA API implementations needed to link
	  the test program inside tests/vanadisLLVMRISCV.

	* Minor formatting changes.
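
As a rough illustration of why per-argument size and alignment information matters, here is a minimal sketch of packing kernel arguments into a parameter buffer. It is not the code in this PR or in GPGPU-Sim: the helper names `align_up` and `layout_params` and the example kernel signature are hypothetical, and the packing rule (each argument placed at the next offset that satisfies its alignment) is an assumption.

```python
def align_up(offset: int, alignment: int) -> int:
    """Round offset up to the next multiple of alignment (must be a power of two)."""
    return (offset + alignment - 1) & ~(alignment - 1)


def layout_params(params):
    """Compute buffer offsets for a list of (size, alignment) pairs.

    Returns (offsets, total_size), where offsets[i] is the byte offset
    of argument i in the packed parameter buffer.
    """
    offsets = []
    cursor = 0
    for size, alignment in params:
        cursor = align_up(cursor, alignment)  # honor this argument's alignment
        offsets.append(cursor)
        cursor += size
    return offsets, cursor


# Hypothetical kernel signature: (float *data, int n, double scale)
offsets, total = layout_params([(8, 8), (4, 4), (8, 8)])
```

In this example the `double` cannot start right after the 4-byte `int`, so it is pushed to the next 8-byte boundary. That is the kind of layout detail the host side cannot infer without asking for the argument metadata.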
@sst-autotester
Contributor

Status Flag 'Pre-Test Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging
NO INSPECTION HAS BEEN PERFORMED ON THIS PULL REQUEST! - This PR must be inspected by setting label 'AT: PRE-TEST INSPECTED'.

Member

@gvoskuilen @feldergast Do we want to keep the prerequisites in this readme or remove them in favor of the list that we test against? Already discussed what testing we want in the nightlies versus weeklies.


- Tested on commit `0f358dda178f96db3b0da88b2b965492c4be187d`
- Use `./configure --prefix=$SST_CORE_HOME --disable-mpi --disable-mem-pools` for sst-core config
Member

@William-An Did you test at all with mem pools enabled?

balar->cuda_ret.is_cuda_call_done = false;

// Create a DMA request to read the cuda call packet from cache to balar
DMAEngine::DMAEngineControlRegisters dma_registers;
Member

@William-An Did we discuss putting this in memH or vanadis?
@gvoskuilen

gridDim,
blockDim,
packet->configure_call.sharedMem,
packet->configure_call.stream
(cudaStream_t) packet->configure_call.stream
Member

Do CUDA streams work in this framework?

GPU_MALLOC_HOST_RET,
};

// Future: Make this into a class with additional serialization methods?
Member

@gvoskuilen @feldergast Is this going to be necessary for checkpointing/debug?

# Constants shared across components
network_bw = "25GB/s"
clock = "2GHz"
balar_mmio_testcpu_addr = 4096
Member

@William-An How configurable are the mmio addresses?

clock = "2GHz"
balar_mmio_testcpu_addr = 4096
balar_mmio_vanadis_addr = 0x80100000
balar_mmio_size = 1024
Member

What about the mmio sizes?
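
If the base addresses and sizes do become user-configurable, a small check in the Python config could catch overlapping MMIO regions early. The following is a minimal sketch, not part of this PR: it reuses the three constants from the snippet above, and the helpers `mmio_range` and `ranges_overlap` are hypothetical. It also assumes both regions could appear in the same address map, which may not hold for the actual test configurations.

```python
# Values copied from the config snippet under review
balar_mmio_testcpu_addr = 4096
balar_mmio_vanadis_addr = 0x80100000
balar_mmio_size = 1024


def mmio_range(base: int, size: int):
    """Return the half-open address interval [base, base + size)."""
    return (base, base + size)


def ranges_overlap(a, b) -> bool:
    """True if two half-open intervals share at least one address."""
    return a[0] < b[1] and b[0] < a[1]


testcpu_region = mmio_range(balar_mmio_testcpu_addr, balar_mmio_size)
vanadis_region = mmio_range(balar_mmio_vanadis_addr, balar_mmio_size)

if ranges_overlap(testcpu_region, vanadis_region):
    raise ValueError("Balar MMIO regions overlap; check base addresses and sizes")
```

With the current constants the two regions are far apart, so the check passes silently. It only becomes useful once the values are exposed as parameters.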

uint64_t size;
uint64_t offset;
uint8_t value[200];
Member

@William-An If this is related to the array from above, we should find a way to ensure that this is propagated everywhere that relies on it.

@@ -43,7 +48,8 @@ int main( int argc, char* argv[] ) {

// Preparing the data
Member

Why only five updates? And why is n = 10k?


/**
* @file cuda_runtime_api.h
* @author Weili An (an107@purdue.edu)
Member

@William-An You should probably remove your email address from these unless you want users bugging you directly. ^-^


3 participants