📝 15 Dec 2024
2 Dec 2024: Christmas ain’t here yet, but our Dashboard for Apache NuttX RTOS is already Decked in Red…
Which says that NuttX Build is failing for ESP32-C6, as reported by NuttX Build Farm. (More about CI Test in the next article)
“riscv_exit.c: error: ‘tcb’ undeclared:
g_running_tasks[this_cpu()] = tcb”
Normally our NuttX Maintainers will scramble to identify the Breaking Commit. (Before it gets piled on by More Breaking Commits)
Not any more! Now we can go back in time and “Rewind The Build”, when something breaks the Daily Build…
## Rewind The Build for
## NuttX Target esp32c6-devkitc:gpio
$ sudo sh -c '
. ../github-token.sh &&
./rewind-build.sh
esp32c6-devkitc:gpio
'
Build Failed for This Commit:
nuttx @ 400239877d55b3f63f72c96ca27d44220ae35a89
[Build OK for Previous Commit:
nuttx @ 19e42a8978179d23a49c9090c9a713206e6575d0]
Build Failed for Next Commit:
nuttx @ 140b3080c5f6921e0f9cec0a56ebdb72ca51d1d8
## A-ha! 40023987 is the Breaking Commit!
In this article, we look inside our new tool to Rewind The NuttX Build…
How we run the Rewind Build Script
How the Breaking Commit appears in NuttX Build History (pic below)
What’s inside our Rewind Script
How we used to Rewind Builds Manually
Why we build NuttX in Docker
How does it work?
## Rewind The Build for NuttX Target esp32c6-devkitc:gpio
## TODO: Install Docker Engine
## https://docs.docker.com/engine/install/ubuntu/
## TODO: For WSL, we may need to install Docker on Native Windows
## https://github.com/apache/nuttx/issues/14601#issuecomment-2453595402
$ sudo apt install neofetch glab gh
$ git clone https://github.com/lupyuen/nuttx-build-farm
$ cd nuttx-build-farm
## github-token.sh contains a GitHub Token with Gist Permission:
## export GITHUB_TOKEN=...
$ sudo sh -c '
. ../github-token.sh &&
./rewind-build.sh
esp32c6-devkitc:gpio
'
Build Failed for This Commit:
nuttx @ 400239877d55b3f63f72c96ca27d44220ae35a89
[Build OK for Previous Commit:
nuttx @ 19e42a8978179d23a49c9090c9a713206e6575d0]
Build Failed for Next Commit:
nuttx @ 140b3080c5f6921e0f9cec0a56ebdb72ca51d1d8
## A-ha! 40023987 is the Breaking Commit!
We fly our DeLorean back to 2 Dec 2024. And inspect the NuttX Commits that might have broken our build…
## Show the NuttX Commits on 2 Dec 2024
git clone https://github.com/apache/nuttx
cd nuttx
git reset --hard cc96289e2d88a9cdd5a9bedf0be2d72bf5b0e509
git log
| Time (2 Dec) | Commit | Title |
|---|---|---|
| 12:05 | cc96289e | xtensa: syscall SYS_switch_context and SYS_restore_context use 0 para |
| 11:59 | dc8bde8d | cmake(enhance): Enhance romfs so that RAWS files can be added in any location |
| 11:49 | 208f31c2 | boards/qemu64: Due to dependency changes, the test program of kasantest is deleted |
| 11:47 | 9fbb81e8 | samv7: fix bytes to words calculation in user signature read |
| 11:14 | 140b3080 | drivers/audio/wm8994.c: Include nuttx/arch.h to fix compilation (up_mdelay prototype) |
| 09:41 | 40023987 | risc-v: remove g_running_tasks[this_cpu()] = NULL |
| 09:23 | 19e42a89 | arch/tricore: migrate to SPDX identifier |
| | | (Many more commits!) |
One of these is the Breaking Commit. Which one?
This is the Manual Way to find the Breaking Commit (pic above)…
## Build the Latest Commit: "xtensa syscall"
make distclean
git reset --hard cc96289e
tools/configure.sh esp32c6-devkitc:gpio
make
## If Build Fails: Try the Previous Commit "Enhance romfs"
make distclean
git reset --hard dc8bde8d
tools/configure.sh esp32c6-devkitc:gpio
make
## If Build Fails: Try the Previous Commit "Test program of kasantest"
make distclean
git reset --hard 208f31c2
tools/configure.sh esp32c6-devkitc:gpio
make
## Repeat until the Build Succeeds
## Record everything we've done as evidence
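The manual steps above boil down to a loop: try each commit, newest first, until one builds. Here is a minimal sketch of that loop, not the actual Rewind Script. The names `try_build`, `find_last_good` and `BUILD_CMD` are hypothetical, made up for this example. `BUILD_CMD` lets us swap in a fake build so the loop can be tried without a NuttX checkout.

```bash
## Sketch only: try_build, find_last_good and BUILD_CMD are hypothetical names

## Build one commit for a target. Returns 0 if the build succeeds.
try_build() {
  local commit=$1 target=$2
  if [ -n "${BUILD_CMD:-}" ]; then
    ## Assumption: BUILD_CMD names a stand-in build command, handy for dry runs
    "$BUILD_CMD" "$commit" "$target"
    return $?
  fi
  ## Same steps as the manual way above (run inside the nuttx folder)
  make distclean >/dev/null 2>&1   ## Ignore errors if nothing is configured yet
  git reset --hard "$commit" \
    && tools/configure.sh "$target" \
    && make
}

## Walk through the commits (newest first) until one builds OK.
## The commit tried just before it is the suspected Breaking Commit.
find_last_good() {
  local target=$1 ; shift
  local newer="" commit
  for commit in "$@"; do
    if try_build "$commit" "$target"; then
      echo "Build OK: $commit"
      [ -n "$newer" ] && echo "Suspected Breaking Commit: $newer"
      return 0
    fi
    echo "Build Failed: $commit"
    newer=$commit
  done
  echo "No good commit found"
  return 1
}
```

We might call it as `find_last_good esp32c6-devkitc:gpio cc96289e dc8bde8d 208f31c2`, listing as many commits as we wish to rewind.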
But for NuttX Maintainers: Compiling NuttX Locally might not always work!
We might be missing some toolchains and fail the build: Arm, RISC-V, Xtensa, x86_64, …
Thus we run Docker to Compile NuttX. Which has All Toolchains bundled inside (pic above)…
## Build the Latest Commit: "xtensa syscall"
## With the NuttX Docker Image
sudo docker run -it \
ghcr.io/apache/nuttx/apache-nuttx-ci-linux:latest \
/bin/bash
cd
git clone https://github.com/apache/nuttx
git clone https://github.com/apache/nuttx-apps apps
cd nuttx
git reset --hard cc96289e
tools/configure.sh esp32c6-devkitc:gpio
make -j
exit
## If Build Fails: Try the Previous Commit "Enhance romfs"
sudo docker run ...
git reset --hard dc8bde8d ...
tools/configure.sh esp32c6-devkitc:gpio
make -j ...
## Repeat until the Build Succeeds
## Record everything we've done as evidence
(More about NuttX Docker Build)
Yep, this gets tedious: we repeat all this 20 times (or more) to catch the Breaking Commit!
That’s why we run a script to “Rewind the Build”, Step Back in Time 20 times (says Kylie), to discover the Breaking Commit…
## Rewind The Build for NuttX Target esp32c6-devkitc:gpio
## TODO: Install Docker Engine on Ubuntu x64
## https://docs.docker.com/engine/install/ubuntu/
$ sudo apt install neofetch glab gh
$ git clone https://github.com/lupyuen/nuttx-build-farm
$ cd nuttx-build-farm
## github-token.sh contains a GitHub Token with Gist Permission:
## export GITHUB_TOKEN=...
$ sudo sh -c '
. ../github-token.sh &&
./rewind-build.sh
esp32c6-devkitc:gpio
'
Build Failed for This Commit:
nuttx @ 400239877d55b3f63f72c96ca27d44220ae35a89
[Build OK for Previous Commit:
nuttx @ 19e42a8978179d23a49c9090c9a713206e6575d0]
Build Failed for Next Commit:
nuttx @ 140b3080c5f6921e0f9cec0a56ebdb72ca51d1d8
## A-ha! 40023987 is the Breaking Commit!
The Rewind Build Log looks kinda messy. We have a better way to record the rewinding, and reveal the Breaking Commit…
Head over to NuttX Dashboard and click “NuttX Build History”. (At the top)
Set the Board and Config to esp32c6-devkitc and gpio…
In reverse chronological order, NuttX Build History says that…
NuttX Build is currently failing (reported by NuttX Build Farm)
Commit 40023987 Onwards: All Builds Failed
Before Commit 40023987: NuttX Builds were Successful
Which means: Commit 40023987 is our Breaking Commit!
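The reasoning above can be sketched as a tiny script: scan the builds from newest to oldest, and remember the oldest failure seen before the first success. This is only an illustration. The function name `find_breaking_commit` and the input format (one `<hash> <OK|FAILED>` line per build) are assumptions, not how NuttX Dashboard stores its Build History.

```bash
## Sketch only: find_breaking_commit and its input format are assumptions.
## Input (stdin): one line per build, newest first, as "<hash> <OK|FAILED>"
find_breaking_commit() {
  local hash status oldest_failed=""
  while read -r hash status; do
    case "$status" in
      FAILED)
        ## Still failing: remember this commit and keep rewinding
        oldest_failed=$hash ;;
      OK)
        ## First success: the failure just after it is the Breaking Commit
        [ -n "$oldest_failed" ] && echo "$oldest_failed"
        return 0 ;;
    esac
  done
  ## Never saw a successful build: we need to rewind further
  return 1
}

## Prints 40023987
printf '%s\n' \
  "140b3080 FAILED" \
  "40023987 FAILED" \
  "19e42a89 OK" \
  | find_breaking_commit
```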
See the “sudo docker” entries above? They were helpfully inserted by our Rewind Build Script
Much neater than the Rewind Build Log!
After fixing the Breaking Commit, NuttX Build History shows that everything is hunky dory again (top row)
How did our Rewind Build Script update the Build History?
Our Rewind Build Script exports the Build Logs to GitLab Snippets. (Or GitHub Gists, pic below)
The Build Logs are then ingested into our NuttX Build History by a Scheduled Task. So when you run the Rewind Build Script, please tell me your GitLab or GitHub User ID.
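For the GitHub Gist route, the upload could look roughly like this with the GitHub CLI (`gh`, installed earlier). It is a sketch under assumptions: `upload_gist` and `DRY_RUN` are hypothetical names, the Gist is assumed to be public so the Scheduled Task can read it, and the real `upload_log` in rewind-build.sh may work differently.

```bash
## Sketch only: upload_gist and DRY_RUN are hypothetical names
upload_gist() {
  local log_file=$1 description=$2
  if [ -n "${DRY_RUN:-}" ]; then
    ## Show the command without contacting GitHub
    echo "gh gist create --public --desc \"$description\" $log_file"
    return 0
  fi
  ## gh picks up the token from the GITHUB_TOKEN environment variable
  gh gist create --public --desc "$description" "$log_file"
}

## Try it without uploading anything
DRY_RUN=1 upload_gist /tmp/example.log "Build Log for esp32c6-devkitc:gpio"
```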
What’s inside the Rewind Build Script?
We fetch the Latest 20 Commits from NuttX Repo and Build Each Commit, latest one first: rewind-build.sh
## First Parameter is Target, like "ox64:nsh"
## Checkout the NuttX Repo and NuttX Apps
target=$1
tmp_dir=/tmp/rewind-build/$target
rm -rf $tmp_dir && mkdir -p $tmp_dir && cd $tmp_dir
git clone https://github.com/apache/nuttx-apps apps
git clone https://github.com/apache/nuttx
cd nuttx
## Fetch the Latest 21 Commits, In Reverse Chronological Order:
## 20 Commits to build, plus 1 more so that the oldest build has a Previous Commit
for commit in $(
TZ=UTC0 \
git log \
-21 \
--date='format-local:%Y-%m-%dT%H:%M:%S' \
--format="%cd,%H"
); do
## Commit looks like 2024-11-24T09:52:42,9f9cc7ecebd97c1a6b511a1863b1528295f68cd7
prev_timestamp=$(echo $commit | cut -d ',' -f 1) ## 2024-11-24T09:52:42
prev_hash=$(echo $commit | cut -d ',' -f 2) ## 9f9cc7ecebd97c1a6b511a1863b1528295f68cd7
## For First Commit: Shift the Commits, don't build yet
if [[ "$next_hash" == "" ]]; then
next_hash=$prev_hash
fi;
if [[ "$nuttx_hash" == "" ]]; then
nuttx_hash=$prev_hash
fi;
if [[ "$timestamp" == "" ]]; then
timestamp=$prev_timestamp
continue
fi;
## Compile NuttX for this Commit
build_commit \
$tmp_dir/$nuttx_hash.log \
$timestamp $apps_hash \
$nuttx_hash $prev_hash $next_hash
## Shift the Commits
next_hash=$nuttx_hash
nuttx_hash=$prev_hash
timestamp=$prev_timestamp
done
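The shifting of commits is easier to follow with a dry run. The sketch below mirrors the loop above, but prints each build instead of compiling it. `list_builds` is a hypothetical helper written for this example, and the timestamps are illustrative sample data.

```bash
## Sketch only: list_builds is a hypothetical helper.
## Input (stdin): "<timestamp>,<hash>" per line, newest first
list_builds() {
  local line prev_hash prev_timestamp
  local nuttx_hash="" next_hash="" timestamp=""
  while IFS= read -r line; do
    prev_timestamp=${line%%,*}   ## Text before the comma
    prev_hash=${line#*,}         ## Text after the comma
    if [ -z "$timestamp" ]; then
      ## First line: nothing to build yet, just fill the window.
      ## The newest commit has no Next Commit, so it stands in for itself.
      next_hash=$prev_hash
      nuttx_hash=$prev_hash
      timestamp=$prev_timestamp
      continue
    fi
    echo "build=$nuttx_hash prev=$prev_hash next=$next_hash at=$timestamp"
    ## Shift the window one commit into the past
    next_hash=$nuttx_hash
    nuttx_hash=$prev_hash
    timestamp=$prev_timestamp
  done
}

## Three commits in, two builds out
printf '%s\n' \
  "2024-12-02T11:14:00,140b3080" \
  "2024-12-02T09:41:00,40023987" \
  "2024-12-02T09:23:00,19e42a89" \
  | list_builds
```

Each build needs the commit after it as its Previous Commit, which is why N+1 commits go in for N builds.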
build_commit will compile a NuttX Commit (pic below) and upload the Build Log: rewind-build.sh
## Build the NuttX Commit for the Target
function build_commit {
...
## Run the Build Job and find errors / warnings
run_job \
$log $timestamp $apps_hash \
$nuttx_hash $prev_hash $next_hash
clean_log $log
find_messages $log
## Upload the log
upload_log \
$log "unknown" \
$nuttx_hash $apps_hash $timestamp
}
## Run the Build Job for the NuttX Commit and Target.
## Record the Build Log into a file.
function run_job {
...
pushd /tmp
script $log_file \
$script_option \
" \
$script_dir/rewind-commit.sh \
$target $nuttx_hash $apps_hash \
$timestamp $prev_hash $next_hash \
"
popd
}
Which will call rewind-commit.sh to compile One Single Commit…
(clean_log removes Control Chars)
(find_messages searches for Errors)
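To give a feel for these two helpers, here is a minimal sketch of what they might do. These are assumptions about their behaviour, not the actual functions from rewind-build.sh: the sketch strips ANSI colour codes and carriage returns, then greps for lines that look like errors or warnings.

```bash
## Sketch only: assumed behaviour, not the real helpers from rewind-build.sh

## Remove ANSI colour codes and carriage returns from a log file, in place
clean_log() {
  local log_file=$1 tmp_file esc
  tmp_file=$(mktemp)
  esc=$(printf '\033')   ## The ESC control character
  tr -d '\r' <"$log_file" \
    | sed "s/${esc}\[[0-9;?]*[A-Za-z]//g" \
    >"$tmp_file"
  mv "$tmp_file" "$log_file"
}

## Print the lines that look like errors or warnings.
## Returns non-zero if nothing was found.
find_messages() {
  local log_file=$1
  grep -E -i 'error|warning|BUILD FAILED' "$log_file"
}
```

We would call them just like the script above: `clean_log $log` followed by `find_messages $log`.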
Earlier we saw our Rewind Build Script compiling the Latest 20 Commits. (Pic above)
This is how we compile One Single Commit for NuttX: rewind-commit.sh
target=$1 ## NuttX Target, like "ox64:nsh"
nuttx_hash=$2 ## Commit Hash of NuttX Repo, like "7f84a64109f94787d92c2f44465e43fde6f3d28f"
apps_hash=$3 ## Commit Hash of NuttX Apps Repo, like "d6edbd0cec72cb44ceb9d0f5b932cbd7a2b96288"
timestamp=$4 ## Timestamp of the NuttX Commit, like "2024-11-24T00:00:00"
prev_hash=$5 ## Previous Commit Hash of NuttX Repo, like "7f84a64109f94787d92c2f44465e43fde6f3d28f"
next_hash=$6 ## Next Commit Hash of NuttX Repo, like "7f84a64109f94787d92c2f44465e43fde6f3d28f"
## Download the Docker Image
sudo docker pull \
ghcr.io/apache/nuttx/apache-nuttx-ci-linux:latest
## Build the Target for This Commit
build_nuttx $nuttx_hash $apps_hash
## If it fails: Rebuild with Previous Commit and Next Commit
if [[ "$res" != "0" ]]; then
build_nuttx $prev_hash $apps_hash
build_nuttx $next_hash $apps_hash
fi
Which calls build_nuttx to compile the commit with the NuttX Docker Image: rewind-commit.sh
## Build NuttX in Docker Container
## If CI Test Hangs: Kill it after 1 hour
## We follow the CI Log Format, so that ingest-nuttx-builds will
## ingest our log into NuttX Dashboard and appear in NuttX Build History
## https://github.com/lupyuen/ingest-nuttx-builds/blob/main/src/main.rs
function build_nuttx {
...
sudo docker run -it \
ghcr.io/apache/nuttx/apache-nuttx-ci-linux:latest \
/bin/bash -c "
set -e ;
set -x ;
cd ;
git clone https://github.com/apache/nuttx ;
git clone https://github.com/apache/nuttx-apps apps ;
pushd nuttx ; git reset --hard $nuttx_commit ; popd ;
pushd apps ; git reset --hard $apps_commit ; popd ;
cd nuttx ;
( sleep 3600 ; echo Killing pytest after timeout... ; pkill -f pytest )&
(
(./tools/configure.sh $target && make -j) || (res=\$? ; echo '***** BUILD FAILED' ; exit \$res)
)
"
res=$?
}
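Two ideas in build_nuttx are worth isolating: putting a time limit on the build, and keeping the exit status so the caller knows whether the build failed. The sketch below shows both. Note the swap: instead of the "sleep + pkill" watchdog above, it uses the `timeout` command, assuming GNU coreutils is installed. `run_with_limit` is a hypothetical helper, not part of rewind-commit.sh.

```bash
## Sketch only: run_with_limit is a hypothetical helper.
## Runs a command with a time limit (in seconds) and returns its exit status.
run_with_limit() {
  local seconds=$1 ; shift
  local res=0
  timeout "$seconds" "$@" || res=$?
  if [ "$res" -ne 0 ]; then
    ## Same marker as the build script, so failures stand out in the log
    echo '***** BUILD FAILED'
    ## timeout exits with status 124 when the time limit was hit
    [ "$res" -eq 124 ] && echo "Timed out after $seconds seconds"
  fi
  return "$res"
}

## Real use might look like:
##   run_with_limit 3600 sh -c "./tools/configure.sh $target && make -j"
run_with_limit 5 true  && echo "Build OK"
run_with_limit 5 false || echo "Exit status: $?"
```

Unlike the watchdog above, which kills only pytest, `timeout` stops the whole command it wraps.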
Finally we see the whole picture, closing the loop with NuttX Repo, NuttX Build Farm, NuttX Dashboard and NuttX Build History!
Phew, that was quick for finding the Breaking Commit, wasn’t it?
Yeah our Rewind Build Script took only one hour to rewind 20 commits and isolate the Breaking Commit! Though fixing it took longer…
QEMU RISC-V crashed with an Instruction Page Fault
Which we tracked down by Rewinding the Past 50 Commits
And auto-testing Each Commit in QEMU RISC-V
Yep it’s the same idea as Rewinding a Build! Just that we’re (slowly) locating a Runtime Fault instead of a (quicker) Compile Error
Happy Holidays! Will we have more stories about NuttX CI?
Next Article: We study the internals of a Mystifying Bug that concerns PyTest, QEMU RISC-V and expect…
Then we’ll chat about an Experimental Mastodon Server for NuttX Continuous Integration.
Many Thanks to the awesome NuttX Admins and NuttX Devs! And my GitHub Sponsors, for sticking with me all these years.
Got a question, comment or suggestion? Create an Issue or submit a Pull Request here…