Added useful "Build script exited at step X" errors for each step in build-setup.sh #1614

JL102 · 2023-10-05T06:33:50Z

This was the only way I knew how to display the step at which the build-setup process failed. I've personally experienced failures at multiple of the build steps, and before I got used to Chipyard, it was hard to figure out which step was the culprit. With this, users should have a bit more info to troubleshoot their issues. For some of the build steps that required multiple lines, I figured it made more sense to put them into a sub-script, rather than putting a && at the end of each line. But for the firesim one for example, since it was two .sh calls, I just put a && after the first one, inside of the try block, to make sure both lines run.

Of course, it would be ideal if all the build steps work without issue, but Chipyard has so many moving parts that I think there's the potential for any of the steps to break. For example, when working with my own fork, I've often accidentally left typos or syntax errors in the code before cloning and re-running setup, leading to step 5 breaking.

Related PRs / Issues:

Type of change:

Bug fix
New feature
Other enhancement

Impact:

RTL change
Software change (RISC-V software)
Build system change
Other

Contributor Checklist:

Did you set main as the base branch?
Is this PR's title suitable for inclusion in the changelog and have you added a changelog:<topic> label?
Did you state the type-of-change/impact?
Did you delete any extraneous prints/debugging code?
(N/A, no permissions) Did you mark the PR with a changelog: label?
(N/A) Did you update the conda .conda-lock.yml file if you updated the conda requirements file?
(Note below**) Did you add documentation for the feature?
(If applicable) Did you add a test demonstrating the PR?

(If applicable) Did you mark the PR as Please Backport?

** I checked the help text ./build-setup.sh --help and Chapter 1.4 of readthedocs and I don't see any specific place that needs to be changed. It may be beneficial to include general tips on debugging issues? But I think that may be outside of the scope of this PR.

Example output (I intentionally added a typo to DigitalTop.scala and ran ./build-setup.sh -s 1 -s 2 -s 3 -s 4):

...
[info] set current project to chipyardRoot (in build file:/home/drak/chipyard_wip/)
[info] set current project to chipyard (in build file:/home/drak/chipyard_wip/)
[info] compiling 1 Scala source to /home/drak/chipyard_wip/generators/chipyard/target/scala-2.13/classes ...
[error] /home/drak/chipyard_wip/generators/chipyard/src/main/scala/DigitalTop.scala:42:30: not found: type DigitalTop
[error] class DigitalTopModule[+L <: DigitalTop](l: L) extends ChipyardSystemModule(l)
[error]                              ^
[error] /home/drak/chipyard_wip/generators/chipyard/src/main/scala/DigitalTop.scala:42:56: inferred type arguments [L] do not conform to class ChipyardSystemModule's type parameter bounds [+
L <: chipyard.ChipyardSystem]
[error] class DigitalTopModule[+L <: DigitalTop](l: L) extends ChipyardSystemModule(l)
[error]                                                        ^
[error] /home/drak/chipyard_wip/generators/chipyard/src/main/scala/DigitalTop.scala:42:77: type mismatch;
[error]  found   : L(in class DigitalTopModule)
[error]  required: L(in class ChipyardSystemModule)
[error] class DigitalTopModule[+L <: DigitalTop](l: L) extends ChipyardSystemModule(l)
[error]                                                                             ^
[error] three errors found
[error] (Compile / compileIncremental) Compilation failed
[error] Total time: 4 s, completed Oct 5, 2023, 2:29:18 AM
make: *** [/home/drak/chipyard_wip/common.mk:438: launch-sbt] Error 1
Build script exited with exit code 2 at step 5: chipyard pre-compile sources. Check the above logs for more details on the error.

This was the only way I knew how to display the step at which the build-setup process failed. I've personally experienced failures at multiple of the build steps, and before I got used to Chipyard, it was hard to figure out which step was the culprit. With this, users should have a bit more info to troubleshoot their issues. For some of the build steps that required multiple lines, I figured it made more sense to put them into a sub-script, rather than putting a && at the end of each line. But for the firesim one for example, since it was two .sh calls, I just put a && after the first one, inside of the try block, to make sure both lines run.

jerryz123

I really like this change. @abejgonzalez does this look fine to you?

JL102 · 2023-10-05T19:49:42Z

Actually, hang on. For some reason, when the Conda steps are put into the separate script build-step-init-conda-environment.sh, it causes step 3 to fail.

==>  Building riscv-pk
+ /usr/bin/make -C build
make: Entering directory '/home/drak/chipyard_pr/toolchains/riscv-tools/riscv-pk/build'
gcc -MMD -MP -Wall -Werror -D__NO_INLINE__ -mcmodel=medany -O2 -std=gnu99 -Wno-unused -Wno-attributes -fno-delete-null-pointer-checks -fno-PIE    -DBBL_LOGO_FILE=\"bbl_logo_file\" -DMEM_START=0x80000000 -fno-stack-protector -U_FORTIFY_SOURCE -DBBL_PAYLOAD=\"bbl_payload\" -I. -I../pk -I../bbl -I../softfloat -I../dummy_payload -I../machine -I../util -c ../pk/file.c
gcc -MMD -MP -Wall -Werror -D__NO_INLINE__ -mcmodel=medany -O2 -std=gnu99 -Wno-unused -Wno-attributes -fno-delete-null-pointer-checks -fno-PIE    -DBBL_LOGO_FILE=\"bbl_logo_file\" -DMEM_START=0x80000000 -fno-stack-protector -U_FORTIFY_SOURCE -DBBL_PAYLOAD=\"bbl_payload\" -I. -I../pk -I../bbl -I../softfloat -I../dummy_payload -I../machine -I../util -c ../pk/syscall.c
gcc -MMD -MP -Wall -Werror -D__NO_INLINE__ -mcmodel=medany -O2 -std=gnu99 -Wno-unused -Wno-attributes -fno-delete-null-pointer-checks -fno-PIE    -DBBL_LOGO_FILE=\"bbl_logo_file\" -DMEM_START=0x80000000 -fno-stack-protector -U_FORTIFY_SOURCE -DBBL_PAYLOAD=\"bbl_payload\" -I. -I../pk -I../bbl -I../softfloat -I../dummy_payload -I../machine -I../util -c ../pk/handlers.c
gcc: error: unrecognized argument in option ‘-mcmodel=medany’
gcc: note: valid arguments to ‘-mcmodel=’ are: 32 kernel large medium small
gcc: error: unrecognized argument in option ‘-mcmodel=medany’
gcc -MMD -MP -Wall -Werror -D__NO_INLINE__ -mcmodel=medany -O2 -std=gnu99 -Wno-unused -Wno-attributes -fno-delete-null-pointer-checks -fno-PIE    -DBBL_LOGO_FILE=\"bbl_logo_file\" -DMEM_START=0x80000000 -fno-stack-protector -U_FORTIFY_SOURCE -DBBL_PAYLOAD=\"bbl_payload\" -I. -I../pk -I../bbl -I../softfloat -I../dummy_payload -I../machine -I../util -c ../pk/frontend.c
gcc: note: valid arguments to ‘-mcmodel=’ are: 32 kernel large medium small
make: *** [Makefile:336: file.o] Error 1
make: *** Waiting for unfinished jobs....
make: *** [Makefile:336: syscall.o] Error 1
gcc: error: unrecognized argument in option ‘-mcmodel=medany’
gcc: note: valid arguments to ‘-mcmodel=’ are: 32 kernel large medium small
gcc: error: unrecognized argument in option ‘-mcmodel=medany’
gcc: note: valid arguments to ‘-mcmodel=’ are: 32 kernel large medium small
make: *** [Makefile:336: handlers.o] Error 1
make: *** [Makefile:336: frontend.o] Error 1
make: Leaving directory '/home/drak/chipyard_pr/toolchains/riscv-tools/riscv-pk/build'

Any idea why gcc would not recognize the -mcmodel=medany argument when the Conda steps are put into the separate script? I confirmed that it was not an issue with the $TOOLCHAIN_TYPE or $PREFIX variables, by simply changing $CYDIR/scripts/build-toolchain-extra.sh $TOOLCHAIN_TYPE -p $PREFIX to echo $CYDIR/scripts/build-toolchain-extra.sh $TOOLCHAIN_TYPE -p $PREFIX, both when using the new sub-script and after reverting step 1 to its original code. In both cases, build-toolchain-extra.sh was being called with /home/drak/chipyard_pr/scripts/build-toolchain-extra.sh riscv-tools -p /home/drak/chipyard_pr/.conda-env/riscv-tools

I tested this multiple times, and I'm confident that the Conda step is what causes this error. But it seems like a non sequitur, since all the same commands are being run, and build-step-init-conda-environment.sh gets access to all of build-setup.sh's internal variables due to it being called with the source prefix. Makes no sense to me.

scripts/build-setup.sh

abejgonzalez · 2023-10-06T18:49:29Z

Actually, hang on. For some reason, when the Conda steps are put into the separate script build-step-init-conda-environment.sh, it causes step 3 to fail.
==>  Building riscv-pk
+ /usr/bin/make -C build
make: Entering directory '/home/drak/chipyard_pr/toolchains/riscv-tools/riscv-pk/build'
gcc -MMD -MP -Wall -Werror -D__NO_INLINE__ -mcmodel=medany -O2 -std=gnu99 -Wno-unused -Wno-attributes -fno-delete-null-pointer-checks -fno-PIE    -DBBL_LOGO_FILE=\"bbl_logo_file\" -DMEM_START=0x80000000 -fno-stack-protector -U_FORTIFY_SOURCE -DBBL_PAYLOAD=\"bbl_payload\" -I. -I../pk -I../bbl -I../softfloat -I../dummy_payload -I../machine -I../util -c ../pk/file.c
gcc -MMD -MP -Wall -Werror -D__NO_INLINE__ -mcmodel=medany -O2 -std=gnu99 -Wno-unused -Wno-attributes -fno-delete-null-pointer-checks -fno-PIE    -DBBL_LOGO_FILE=\"bbl_logo_file\" -DMEM_START=0x80000000 -fno-stack-protector -U_FORTIFY_SOURCE -DBBL_PAYLOAD=\"bbl_payload\" -I. -I../pk -I../bbl -I../softfloat -I../dummy_payload -I../machine -I../util -c ../pk/syscall.c
gcc -MMD -MP -Wall -Werror -D__NO_INLINE__ -mcmodel=medany -O2 -std=gnu99 -Wno-unused -Wno-attributes -fno-delete-null-pointer-checks -fno-PIE    -DBBL_LOGO_FILE=\"bbl_logo_file\" -DMEM_START=0x80000000 -fno-stack-protector -U_FORTIFY_SOURCE -DBBL_PAYLOAD=\"bbl_payload\" -I. -I../pk -I../bbl -I../softfloat -I../dummy_payload -I../machine -I../util -c ../pk/handlers.c
gcc: error: unrecognized argument in option ‘-mcmodel=medany’
gcc: note: valid arguments to ‘-mcmodel=’ are: 32 kernel large medium small
gcc: error: unrecognized argument in option ‘-mcmodel=medany’
gcc -MMD -MP -Wall -Werror -D__NO_INLINE__ -mcmodel=medany -O2 -std=gnu99 -Wno-unused -Wno-attributes -fno-delete-null-pointer-checks -fno-PIE    -DBBL_LOGO_FILE=\"bbl_logo_file\" -DMEM_START=0x80000000 -fno-stack-protector -U_FORTIFY_SOURCE -DBBL_PAYLOAD=\"bbl_payload\" -I. -I../pk -I../bbl -I../softfloat -I../dummy_payload -I../machine -I../util -c ../pk/frontend.c
gcc: note: valid arguments to ‘-mcmodel=’ are: 32 kernel large medium small
make: *** [Makefile:336: file.o] Error 1
make: *** Waiting for unfinished jobs....
make: *** [Makefile:336: syscall.o] Error 1
gcc: error: unrecognized argument in option ‘-mcmodel=medany’
gcc: note: valid arguments to ‘-mcmodel=’ are: 32 kernel large medium small
gcc: error: unrecognized argument in option ‘-mcmodel=medany’
gcc: note: valid arguments to ‘-mcmodel=’ are: 32 kernel large medium small
make: *** [Makefile:336: handlers.o] Error 1
make: *** [Makefile:336: frontend.o] Error 1
make: Leaving directory '/home/drak/chipyard_pr/toolchains/riscv-tools/riscv-pk/build'
Any idea why gcc would not recognize the -mcmodel=medany argument when the Conda steps are put into the separate script? I confirmed that it was not an issue with the $TOOLCHAIN_TYPE or $PREFIX variables, by simply changing $CYDIR/scripts/build-toolchain-extra.sh $TOOLCHAIN_TYPE -p $PREFIX to echo $CYDIR/scripts/build-toolchain-extra.sh $TOOLCHAIN_TYPE -p $PREFIX, both when using the new sub-script and after reverting step 1 to its original code. In both cases, build-toolchain-extra.sh was being called with /home/drak/chipyard_pr/scripts/build-toolchain-extra.sh riscv-tools -p /home/drak/chipyard_pr/.conda-env/riscv-tools

I tested this multiple times, and I'm confident that the Conda step is what causes this error. But it seems like a non sequitur, since all the same commands are being run, and build-step-init-conda-environment.sh gets access to all of build-setup.sh's internal variables due to it being called with the source prefix. Makes no sense to me.

Creating the conda env adds a new environment variable called $RISCV that is needed for cross-compiling things. So activating conda in a sub-shell probably doesn't work here.

This also fixed the weird issue I was experiencing where the try-catch in step 1 caused step 3 to break

JL102 · 2023-10-06T23:00:37Z

Learned a lot about bash today!

When making the new set of functions, I originally tried to make an alias begin_step and end_step, which created a sub-shell using parentheses and set -e inside of the sub shell; but realized that the environment variables set inside the sub shell don't persist outside of it.

After being inspired by http://mywiki.wooledge.org/BashFAQ/105, I added some &&s and placed exit_if_last_command_failed wherever a command is run that could possibly error out. I think this revision is much more stable than the first.

JL102 · 2023-10-06T23:49:36Z

This revision should be good

abejgonzalez

SFTM! Thanks!

JL102 added 2 commits October 5, 2023 02:17

Merge branch 'ucb-bar:main' into script-trycatch

3b7057c

jerryz123 requested review from abejgonzalez and jerryz123 October 5, 2023 18:32

jerryz123 approved these changes Oct 5, 2023

View reviewed changes

jerryz123 added the changelog:added label Oct 5, 2023

abejgonzalez requested changes Oct 6, 2023

View reviewed changes

scripts/build-setup.sh Outdated Show resolved Hide resolved

JL102 added 3 commits October 6, 2023 18:43

Replaced "try-catch" with a more special-purpose set of functions

b76ab6b

This also fixed the weird issue I was experiencing where the try-catch in step 1 caused step 3 to break

Made indentation consistent

aded25f

I think now I put &&s everywhere that is necessary

a7993db

Remove now-unused build-step scripts

9b55722

JL102 mentioned this pull request Oct 7, 2023

error occurred in "./build-setup.sh riscv-tools" for the 1.9.0 #1441

Open

3 tasks

Merge branch 'main' into script-trycatch

5bbebb8

JL102 requested a review from abejgonzalez October 8, 2023 05:13

abejgonzalez approved these changes Oct 8, 2023

View reviewed changes

jerryz123 merged commit 0665f98 into ucb-bar:main Oct 9, 2023
1 check passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Added useful "Build script exited at step X" errors for each step in build-setup.sh #1614

Added useful "Build script exited at step X" errors for each step in build-setup.sh #1614

JL102 commented Oct 5, 2023 •

edited

Loading

jerryz123 left a comment

JL102 commented Oct 5, 2023 •

edited

Loading

abejgonzalez commented Oct 6, 2023

JL102 commented Oct 6, 2023

JL102 commented Oct 6, 2023

abejgonzalez left a comment

Added useful "Build script exited at step X" errors for each step in build-setup.sh #1614

Added useful "Build script exited at step X" errors for each step in build-setup.sh #1614

Conversation

JL102 commented Oct 5, 2023 • edited Loading

jerryz123 left a comment

Choose a reason for hiding this comment

JL102 commented Oct 5, 2023 • edited Loading

abejgonzalez commented Oct 6, 2023

JL102 commented Oct 6, 2023

JL102 commented Oct 6, 2023

abejgonzalez left a comment

Choose a reason for hiding this comment

JL102 commented Oct 5, 2023 •

edited

Loading

JL102 commented Oct 5, 2023 •

edited

Loading