Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added useful "Build script exited at step X" errors for each step in build-setup.sh #1614

Merged
merged 7 commits into from
Oct 9, 2023

Conversation

JL102
Copy link
Contributor

@JL102 JL102 commented Oct 5, 2023

This was the only way I knew how to display the step at which the build-setup process failed. I've personally experienced failures at multiple of the build steps, and before I got used to Chipyard, it was hard to figure out which step was the culprit. With this, users should have a bit more info to troubleshoot their issues. For some of the build steps that required multiple lines, I figured it made more sense to put them into a sub-script, rather than putting a && at the end of each line. But for the firesim one for example, since it was two .sh calls, I just put a && after the first one, inside of the try block, to make sure both lines run.

Of course, it would be ideal if all the build steps work without issue, but Chipyard has so many moving parts that I think there's the potential for any of the steps to break. For example, when working with my own fork, I've often accidentally left typos or syntax errors in the code before cloning and re-running setup, leading to step 5 breaking.

Related PRs / Issues:

Type of change:

  • Bug fix
  • New feature
  • Other enhancement

Impact:

  • RTL change
  • Software change (RISC-V software)
  • Build system change
  • Other

Contributor Checklist:

  • Did you set main as the base branch?
  • Is this PR's title suitable for inclusion in the changelog and have you added a changelog:<topic> label?
  • Did you state the type-of-change/impact?
  • Did you delete any extraneous prints/debugging code?
  • (N/A, no permissions) Did you mark the PR with a changelog: label?
  • (N/A) Did you update the conda .conda-lock.yml file if you updated the conda requirements file?
  • (Note below**) Did you add documentation for the feature?
  • (If applicable) Did you add a test demonstrating the PR?
  • (If applicable) Did you mark the PR as Please Backport?

** I checked the help text ./build-setup.sh --help and Chapter 1.4 of readthedocs and I don't see any specific place that needs to be changed. It may be beneficial to include general tips on debugging issues? But I think that may be outside of the scope of this PR.

Example output (I intentionally added a typo to DigitalTop.scala and ran ./build-setup.sh -s 1 -s 2 -s 3 -s 4):

...
[info] set current project to chipyardRoot (in build file:/home/drak/chipyard_wip/)
[info] set current project to chipyard (in build file:/home/drak/chipyard_wip/)
[info] compiling 1 Scala source to /home/drak/chipyard_wip/generators/chipyard/target/scala-2.13/classes ...
[error] /home/drak/chipyard_wip/generators/chipyard/src/main/scala/DigitalTop.scala:42:30: not found: type DigitalTop
[error] class DigitalTopModule[+L <: DigitalTop](l: L) extends ChipyardSystemModule(l)
[error]                              ^
[error] /home/drak/chipyard_wip/generators/chipyard/src/main/scala/DigitalTop.scala:42:56: inferred type arguments [L] do not conform to class ChipyardSystemModule's type parameter bounds [+
L <: chipyard.ChipyardSystem]
[error] class DigitalTopModule[+L <: DigitalTop](l: L) extends ChipyardSystemModule(l)
[error]                                                        ^
[error] /home/drak/chipyard_wip/generators/chipyard/src/main/scala/DigitalTop.scala:42:77: type mismatch;
[error]  found   : L(in class DigitalTopModule)
[error]  required: L(in class ChipyardSystemModule)
[error] class DigitalTopModule[+L <: DigitalTop](l: L) extends ChipyardSystemModule(l)
[error]                                                                             ^
[error] three errors found
[error] (Compile / compileIncremental) Compilation failed
[error] Total time: 4 s, completed Oct 5, 2023, 2:29:18 AM
make: *** [/home/drak/chipyard_wip/common.mk:438: launch-sbt] Error 1
Build script exited with exit code 2 at step 5: chipyard pre-compile sources. Check the above logs for more details on the error.

This was the only way I knew how to display the step at which the build-setup process failed.
I've personally experienced failures at multiple of the build steps, and before I got used to Chipyard,
it was hard to figure out which step was the culprit. With this, users should have a bit more info to
troubleshoot their issues. For some of the build steps that required multiple lines, I figured it made
more sense to put them into a sub-script, rather than putting a && at the end of each line. But for the
firesim one for example, since it was two .sh calls, I just put a && after the first one, inside of the
try block, to make sure both lines run.
Copy link
Contributor

@jerryz123 jerryz123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like this change. @abejgonzalez does this look fine to you?

@JL102
Copy link
Contributor Author

JL102 commented Oct 5, 2023

Actually, hang on. For some reason, when the Conda steps are put into the separate script build-step-init-conda-environment.sh, it causes step 3 to fail.

==>  Building riscv-pk
+ /usr/bin/make -C build
make: Entering directory '/home/drak/chipyard_pr/toolchains/riscv-tools/riscv-pk/build'
gcc -MMD -MP -Wall -Werror -D__NO_INLINE__ -mcmodel=medany -O2 -std=gnu99 -Wno-unused -Wno-attributes -fno-delete-null-pointer-checks -fno-PIE    -DBBL_LOGO_FILE=\"bbl_logo_file\" -DMEM_START=0x80000000 -fno-stack-protector -U_FORTIFY_SOURCE -DBBL_PAYLOAD=\"bbl_payload\" -I. -I../pk -I../bbl -I../softfloat -I../dummy_payload -I../machine -I../util -c ../pk/file.c
gcc -MMD -MP -Wall -Werror -D__NO_INLINE__ -mcmodel=medany -O2 -std=gnu99 -Wno-unused -Wno-attributes -fno-delete-null-pointer-checks -fno-PIE    -DBBL_LOGO_FILE=\"bbl_logo_file\" -DMEM_START=0x80000000 -fno-stack-protector -U_FORTIFY_SOURCE -DBBL_PAYLOAD=\"bbl_payload\" -I. -I../pk -I../bbl -I../softfloat -I../dummy_payload -I../machine -I../util -c ../pk/syscall.c
gcc -MMD -MP -Wall -Werror -D__NO_INLINE__ -mcmodel=medany -O2 -std=gnu99 -Wno-unused -Wno-attributes -fno-delete-null-pointer-checks -fno-PIE    -DBBL_LOGO_FILE=\"bbl_logo_file\" -DMEM_START=0x80000000 -fno-stack-protector -U_FORTIFY_SOURCE -DBBL_PAYLOAD=\"bbl_payload\" -I. -I../pk -I../bbl -I../softfloat -I../dummy_payload -I../machine -I../util -c ../pk/handlers.c
gcc: error: unrecognized argument in option ‘-mcmodel=medany’
gcc: note: valid arguments to ‘-mcmodel=’ are: 32 kernel large medium small
gcc: error: unrecognized argument in option ‘-mcmodel=medany’
gcc -MMD -MP -Wall -Werror -D__NO_INLINE__ -mcmodel=medany -O2 -std=gnu99 -Wno-unused -Wno-attributes -fno-delete-null-pointer-checks -fno-PIE    -DBBL_LOGO_FILE=\"bbl_logo_file\" -DMEM_START=0x80000000 -fno-stack-protector -U_FORTIFY_SOURCE -DBBL_PAYLOAD=\"bbl_payload\" -I. -I../pk -I../bbl -I../softfloat -I../dummy_payload -I../machine -I../util -c ../pk/frontend.c
gcc: note: valid arguments to ‘-mcmodel=’ are: 32 kernel large medium small
make: *** [Makefile:336: file.o] Error 1
make: *** Waiting for unfinished jobs....
make: *** [Makefile:336: syscall.o] Error 1
gcc: error: unrecognized argument in option ‘-mcmodel=medany’
gcc: note: valid arguments to ‘-mcmodel=’ are: 32 kernel large medium small
gcc: error: unrecognized argument in option ‘-mcmodel=medany’
gcc: note: valid arguments to ‘-mcmodel=’ are: 32 kernel large medium small
make: *** [Makefile:336: handlers.o] Error 1
make: *** [Makefile:336: frontend.o] Error 1
make: Leaving directory '/home/drak/chipyard_pr/toolchains/riscv-tools/riscv-pk/build'

Any idea why gcc would not recognize the -mcmodel=medany argument when the Conda steps are put into the separate script? I confirmed that it was not an issue with the $TOOLCHAIN_TYPE or $PREFIX variables, by simply changing $CYDIR/scripts/build-toolchain-extra.sh $TOOLCHAIN_TYPE -p $PREFIX to echo $CYDIR/scripts/build-toolchain-extra.sh $TOOLCHAIN_TYPE -p $PREFIX, both when using the new sub-script and after reverting step 1 to its original code. In both cases, build-toolchain-extra.sh was being called with /home/drak/chipyard_pr/scripts/build-toolchain-extra.sh riscv-tools -p /home/drak/chipyard_pr/.conda-env/riscv-tools

I tested this multiple times, and I'm confident that the Conda step is what causes this error. But it seems like a non sequitur, since all the same commands are being run, and build-step-init-conda-environment.sh gets access to all of build-setup.sh's internal variables due to it being called with the source prefix. Makes no sense to me.

scripts/build-setup.sh Outdated Show resolved Hide resolved
@abejgonzalez
Copy link
Contributor

Actually, hang on. For some reason, when the Conda steps are put into the separate script build-step-init-conda-environment.sh, it causes step 3 to fail.

==>  Building riscv-pk
+ /usr/bin/make -C build
make: Entering directory '/home/drak/chipyard_pr/toolchains/riscv-tools/riscv-pk/build'
gcc -MMD -MP -Wall -Werror -D__NO_INLINE__ -mcmodel=medany -O2 -std=gnu99 -Wno-unused -Wno-attributes -fno-delete-null-pointer-checks -fno-PIE    -DBBL_LOGO_FILE=\"bbl_logo_file\" -DMEM_START=0x80000000 -fno-stack-protector -U_FORTIFY_SOURCE -DBBL_PAYLOAD=\"bbl_payload\" -I. -I../pk -I../bbl -I../softfloat -I../dummy_payload -I../machine -I../util -c ../pk/file.c
gcc -MMD -MP -Wall -Werror -D__NO_INLINE__ -mcmodel=medany -O2 -std=gnu99 -Wno-unused -Wno-attributes -fno-delete-null-pointer-checks -fno-PIE    -DBBL_LOGO_FILE=\"bbl_logo_file\" -DMEM_START=0x80000000 -fno-stack-protector -U_FORTIFY_SOURCE -DBBL_PAYLOAD=\"bbl_payload\" -I. -I../pk -I../bbl -I../softfloat -I../dummy_payload -I../machine -I../util -c ../pk/syscall.c
gcc -MMD -MP -Wall -Werror -D__NO_INLINE__ -mcmodel=medany -O2 -std=gnu99 -Wno-unused -Wno-attributes -fno-delete-null-pointer-checks -fno-PIE    -DBBL_LOGO_FILE=\"bbl_logo_file\" -DMEM_START=0x80000000 -fno-stack-protector -U_FORTIFY_SOURCE -DBBL_PAYLOAD=\"bbl_payload\" -I. -I../pk -I../bbl -I../softfloat -I../dummy_payload -I../machine -I../util -c ../pk/handlers.c
gcc: error: unrecognized argument in option ‘-mcmodel=medany’
gcc: note: valid arguments to ‘-mcmodel=’ are: 32 kernel large medium small
gcc: error: unrecognized argument in option ‘-mcmodel=medany’
gcc -MMD -MP -Wall -Werror -D__NO_INLINE__ -mcmodel=medany -O2 -std=gnu99 -Wno-unused -Wno-attributes -fno-delete-null-pointer-checks -fno-PIE    -DBBL_LOGO_FILE=\"bbl_logo_file\" -DMEM_START=0x80000000 -fno-stack-protector -U_FORTIFY_SOURCE -DBBL_PAYLOAD=\"bbl_payload\" -I. -I../pk -I../bbl -I../softfloat -I../dummy_payload -I../machine -I../util -c ../pk/frontend.c
gcc: note: valid arguments to ‘-mcmodel=’ are: 32 kernel large medium small
make: *** [Makefile:336: file.o] Error 1
make: *** Waiting for unfinished jobs....
make: *** [Makefile:336: syscall.o] Error 1
gcc: error: unrecognized argument in option ‘-mcmodel=medany’
gcc: note: valid arguments to ‘-mcmodel=’ are: 32 kernel large medium small
gcc: error: unrecognized argument in option ‘-mcmodel=medany’
gcc: note: valid arguments to ‘-mcmodel=’ are: 32 kernel large medium small
make: *** [Makefile:336: handlers.o] Error 1
make: *** [Makefile:336: frontend.o] Error 1
make: Leaving directory '/home/drak/chipyard_pr/toolchains/riscv-tools/riscv-pk/build'

Any idea why gcc would not recognize the -mcmodel=medany argument when the Conda steps are put into the separate script? I confirmed that it was not an issue with the $TOOLCHAIN_TYPE or $PREFIX variables, by simply changing $CYDIR/scripts/build-toolchain-extra.sh $TOOLCHAIN_TYPE -p $PREFIX to echo $CYDIR/scripts/build-toolchain-extra.sh $TOOLCHAIN_TYPE -p $PREFIX, both when using the new sub-script and after reverting step 1 to its original code. In both cases, build-toolchain-extra.sh was being called with /home/drak/chipyard_pr/scripts/build-toolchain-extra.sh riscv-tools -p /home/drak/chipyard_pr/.conda-env/riscv-tools

I tested this multiple times, and I'm confident that the Conda step is what causes this error. But it seems like a non sequitur, since all the same commands are being run, and build-step-init-conda-environment.sh gets access to all of build-setup.sh's internal variables due to it being called with the source prefix. Makes no sense to me.

Creating the conda env adds a new environment variable called $RISCV that is needed for cross-compiling things. So activating conda in a sub-shell probably doesn't work here.

@JL102
Copy link
Contributor Author

JL102 commented Oct 6, 2023

Learned a lot about bash today!

When making the new set of functions, I originally tried to make an alias begin_step and end_step, which created a sub-shell using parentheses and set -e inside of the sub shell; but realized that the environment variables set inside the sub shell don't persist outside of it.

After being inspired by http://mywiki.wooledge.org/BashFAQ/105, I added some &&s and placed exit_if_last_command_failed wherever a command is run that could possibly error out. I think this revision is much more stable than the first.

@JL102
Copy link
Contributor Author

JL102 commented Oct 6, 2023

This revision should be good

Copy link
Contributor

@abejgonzalez abejgonzalez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SFTM! Thanks!

@jerryz123 jerryz123 merged commit 0665f98 into ucb-bar:main Oct 9, 2023
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants