"Rewinding a Build" for Apache NuttX RTOS (Docker)

📝 15 Dec 2024


2 Dec 2024: Christmas ain’t here yet, but our Dashboard for Apache NuttX RTOS is already Decked in Red


Which says that NuttX Build is failing for ESP32-C6, as reported by NuttX Build Farm. (More about CI Test next article)

“riscv_exit.c: error: ‘tcb’ undeclared:
g_running_tasks[this_cpu()] = tcb”

Normally our NuttX Maintainers will scramble to identify the Breaking Commit. (Before it gets piled on by More Breaking Commits)

Not any more! Now we can go back in time and “Rewind The Build”, when something breaks the Daily Build…

## Rewind The Build for
## NuttX Target esp32c6-devkitc:gpio
$ sudo sh -c '
    . ../github-token.sh &&
    ./rewind-build.sh esp32c6-devkitc:gpio
  '
Build Failed for This Commit:
  nuttx @ 400239877d55b3f63f72c96ca27d44220ae35a89

[Build OK for Previous Commit:
  nuttx @ 19e42a8978179d23a49c9090c9a713206e6575d0]

Build Failed for Next Commit:
  nuttx @ 140b3080c5f6921e0f9cec0a56ebdb72ca51d1d8

## A-ha! 40023987 is the Breaking Commit!

In this article, we look inside our new tool to Rewind The NuttX Build

NuttX Build History

§1 Rewind The Build

How does it work?

## Rewind The Build for NuttX Target esp32c6-devkitc:gpio
## TODO: Install Docker Engine
## https://docs.docker.com/engine/install/ubuntu/

## TODO: For WSL, we may need to install Docker on Native Windows
## https://github.com/apache/nuttx/issues/14601#issuecomment-2453595402

$ sudo apt install neofetch glab gh
$ git clone https://github.com/lupyuen/nuttx-build-farm
$ cd nuttx-build-farm

## github-token.sh contains a GitHub Token with Gist Permission:
## export GITHUB_TOKEN=...
$ sudo sh -c '
    . ../github-token.sh &&
    ./rewind-build.sh esp32c6-devkitc:gpio
  '
Build Failed for This Commit:
  nuttx @ 400239877d55b3f63f72c96ca27d44220ae35a89

[Build OK for Previous Commit:
  nuttx @ 19e42a8978179d23a49c9090c9a713206e6575d0]

Build Failed for Next Commit:
  nuttx @ 140b3080c5f6921e0f9cec0a56ebdb72ca51d1d8

## A-ha! 40023987 is the Breaking Commit!

(Works also for GitLab Snippets)

(See the Complete Log)

We fly our DeLorean back to 2 Dec 2024. And inspect the NuttX Commits that might have broken our build…

## Show the NuttX Commits on 2 Dec 2024
git clone https://github.com/apache/nuttx
cd nuttx
git reset --hard cc96289e2d88a9cdd5a9bedf0be2d72bf5b0e509
git log
| 2 Dec | Commit | Title |
|:------|:-------|:------|
| 12:05 | cc96289e | xtensa: syscall SYS_switch_context and SYS_restore_context use 0 para |
| 11:59 | dc8bde8d | cmake(enhance): Enhance romfs so that RAWS files can be added in any location |
| 11:49 | 208f31c2 | boards/qemu64: Due to dependency changes, the test program of kasantest is deleted |
| 11:47 | 9fbb81e8 | samv7: fix bytes to words calculation in user signature read |
| 11:14 | 140b3080 | drivers/audio/wm8994.c: Include nuttx/arch.h to fix compilation (up_mdelay prototype) |
| 09:41 | 40023987 | risc-v: remove g_running_tasks[this_cpu()] = NULL |
| 09:23 | 19e42a89 | arch/tricore: migrate to SPDX identifier |

(Many more commits!)

One of these is the Breaking Commit. Which one?
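To print a listing like the table above straight from Git, something like this might do. It's only a sketch: `list_commits` is a hypothetical helper (not part of nuttx-build-farm), and it shows Committer Timestamps in UTC, which may or may not match how the table was produced.

```shell
## Hypothetical helper: List the latest commits as "HH:MM hash title"
## $1 = Path of the Git Repo
## $2 = Number of commits to show (defaults to 7)
list_commits() {
  TZ=UTC0 git -C "$1" log \
    -n "${2:-7}" \
    --abbrev=8 \
    --date='format-local:%H:%M' \
    --format='%cd %h %s'
}

## Usage, assuming NuttX Repo was cloned into ./nuttx
## and reset to the commit that we wish to inspect:
##   list_commits ./nuttx 7
```

`--abbrev=8` asks for 8-character hashes like the ones in the table. Git may print longer hashes if 8 characters are ambiguous.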

Rewinding a Build the Manual Way

§2 The Manual Way

This is the Manual Way to find the Breaking Commit (pic above)…

## Build the Latest Commit: "xtensa syscall"
make distclean
git reset --hard cc96289e
tools/configure.sh esp32c6-devkitc:gpio
make

## If Build Fails: Try the Previous Commit "Enhance romfs"
make distclean
git reset --hard dc8bde8d
tools/configure.sh esp32c6-devkitc:gpio
make

## If Build Fails: Try the Previous Commit "Test program of kasantest"
make distclean
git reset --hard 208f31c2
tools/configure.sh esp32c6-devkitc:gpio
make

## Repeat until the Build Succeeds
## Record everything we've done as evidence

But for NuttX Maintainers: Compiling NuttX Locally might not always work!

We might be missing some toolchains, and the build will fail: Arm, RISC-V, Xtensa, x86_64, …

Rewinding a Build with Docker

§3 The Docker Way

Thus we run Docker to Compile NuttX. Which has All Toolchains bundled inside (pic above)…

## Build the Latest Commit: "xtensa syscall"
## With the NuttX Docker Image
sudo docker run -it \
  ghcr.io/apache/nuttx/apache-nuttx-ci-linux:latest \
  /bin/bash
cd
git clone https://github.com/apache/nuttx
git clone https://github.com/apache/nuttx-apps apps
cd nuttx
git reset --hard cc96289e
tools/configure.sh esp32c6-devkitc:gpio
make -j
exit

## If Build Fails: Try the Previous Commit "Enhance romfs"
sudo docker run ...
git reset --hard dc8bde8d ...
tools/configure.sh esp32c6-devkitc:gpio
make -j ...

## Repeat until the Build Succeeds
## Record everything we've done as evidence

(More about NuttX Docker Build)

Yep this gets tedious: We repeat all this 20 times (or more) to catch the Breaking Commit!

That’s why we run a script to “Rewind the Build”, Step Back in Time 20 times (says Kylie), to discover the Breaking Commit…

## Rewind The Build for NuttX Target esp32c6-devkitc:gpio
## TODO: Install Docker Engine on Ubuntu x64
## https://docs.docker.com/engine/install/ubuntu/
$ sudo apt install neofetch glab gh
$ git clone https://github.com/lupyuen/nuttx-build-farm
$ cd nuttx-build-farm

## github-token.sh contains a GitHub Token with Gist Permission:
## export GITHUB_TOKEN=...
$ sudo sh -c '
    . ../github-token.sh &&
    ./rewind-build.sh esp32c6-devkitc:gpio
  '
Build Failed for This Commit:
  nuttx @ 400239877d55b3f63f72c96ca27d44220ae35a89

[Build OK for Previous Commit:
  nuttx @ 19e42a8978179d23a49c9090c9a713206e6575d0]

Build Failed for Next Commit:
  nuttx @ 140b3080c5f6921e0f9cec0a56ebdb72ca51d1d8

## A-ha! 40023987 is the Breaking Commit!

(Works also for GitLab Snippets)

(See the Complete Log)
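How do we read the three results above? The rule is simple: a commit is the Breaking Commit when it Fails and its Previous Commit is OK. Here's a rough sketch of that rule in shell. `find_breaking_commit` is a hypothetical helper (not part of the Rewind Build Script), and it assumes that the summary looks exactly like the excerpt above.

```shell
## Hypothetical helper: Read a Rewind Build summary from stdin.
## Print the first commit that Failed while its Previous Commit was OK.
find_breaking_commit() {
  awk '
    /Build Failed for This Commit/ { want = "this" ; next }
    /Build OK for Previous Commit/ { want = "prev" ; next }
    /nuttx @/ {
      ## Fields are: "nuttx", "@", then the Commit Hash
      if (want == "this") { this_hash = $3 }
      if (want == "prev" && this_hash != "") { print this_hash ; exit }
      want = ""
    }
  '
}

## Usage: Feed it the summary shown above
find_breaking_commit <<'EOF'
Build Failed for This Commit:
  nuttx @ 400239877d55b3f63f72c96ca27d44220ae35a89

[Build OK for Previous Commit:
  nuttx @ 19e42a8978179d23a49c9090c9a713206e6575d0]

Build Failed for Next Commit:
  nuttx @ 140b3080c5f6921e0f9cec0a56ebdb72ca51d1d8
EOF
```

For the summary above, this prints the hash of This Commit (40023987…), the same Breaking Commit that we spotted by eye.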

The Rewind Build Log looks kinda messy. We have a better way to record the rewinding, and reveal the Breaking Commit…

§4 NuttX Build History

Head over to NuttX Dashboard and click “NuttX Build History”. (At the top)

Set the Board and Config to esp32c6-devkitc and gpio

NuttX Build History before fixing

In reverse chronological order, NuttX Build History says that…

After fixing the Breaking Commit, NuttX Build History shows that everything is hunky dory again (top row)

NuttX Build History after fixing

How did our Rewind Build Script update the Build History?

Our Rewind Build Script exports the Build Logs to GitLab Snippets. (Or GitHub Gists, pic below)

The Build Logs are then ingested into our NuttX Build History by a Scheduled Task. So when you run the Rewind Build Script, please tell me your GitLab or GitHub User ID.

Our Rewind Build Script exports the Build Logs to GitLab Snippets or GitHub Gists

§5 Rewind Build Script

What’s inside the Rewind Build Script?

We fetch the Latest 20 Commits from NuttX Repo and Build Each Commit, latest one first: rewind-build.sh

## First Parameter is Target, like "ox64:nsh"
## Checkout the NuttX Repo and NuttX Apps
target=$1
tmp_dir=/tmp/rewind-build/$target
rm -rf $tmp_dir && mkdir -p $tmp_dir && cd $tmp_dir
git clone https://github.com/apache/nuttx-apps apps
git clone https://github.com/apache/nuttx
cd nuttx

## Fetch the Latest 21 Commits, In Reverse Chronological Order.
## We build 20 of them: The 21st is only the Previous Commit of the 20th.
for commit in $(
  TZ=UTC0 \
  git log \
  -21 \
  --date='format-local:%Y-%m-%dT%H:%M:%S' \
  --format="%cd,%H"
); do
  ## Commit looks like 2024-11-24T09:52:42,9f9cc7ecebd97c1a6b511a1863b1528295f68cd7
  prev_timestamp=$(echo $commit | cut -d ',' -f 1)  ## 2024-11-24T09:52:42
  prev_hash=$(echo $commit | cut -d ',' -f 2)  ## 9f9cc7ecebd97c1a6b511a1863b1528295f68cd7

  ## For First Commit: Shift the Commits, don't build yet
  if [[ "$next_hash" == "" ]]; then
    next_hash=$prev_hash
  fi;
  if [[ "$nuttx_hash" == "" ]]; then
    nuttx_hash=$prev_hash
  fi;
  if [[ "$timestamp" == "" ]]; then
    timestamp=$prev_timestamp
    continue
  fi;

  ## Compile NuttX for this Commit
  build_commit \
    $tmp_dir/$nuttx_hash.log \
    $timestamp $apps_hash \
    $nuttx_hash $prev_hash $next_hash

  ## Shift the Commits
  next_hash=$nuttx_hash
  nuttx_hash=$prev_hash
  timestamp=$prev_timestamp
done
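The shifting of `next_hash`, `nuttx_hash` and `prev_hash` is easier to follow with a dry run. Below is a simplified replay of the loop on 7 Short Hashes from 2 Dec 2024, newest first. `show_windows` is a hypothetical name, the three `if` checks are merged into one, and nothing is actually built: we only print what would be passed to `build_commit`.

```shell
## Replay the shifting logic, without building anything.
## Arguments: Commit Hashes, latest one first
show_windows() {
  local next_hash="" nuttx_hash="" prev_hash
  for prev_hash in "$@"; do
    ## For First Commit: Shift the Commits, don't build yet
    if [ -z "$nuttx_hash" ]; then
      next_hash=$prev_hash
      nuttx_hash=$prev_hash
      continue
    fi

    ## Here the real script calls build_commit
    echo "build=$nuttx_hash prev=$prev_hash next=$next_hash"

    ## Shift the Commits
    next_hash=$nuttx_hash
    nuttx_hash=$prev_hash
  done
}

show_windows \
  cc96289e dc8bde8d 208f31c2 9fbb81e8 \
  140b3080 40023987 19e42a89
```

Seven hashes give us six builds, because the oldest hash only serves as the Previous Commit. That's why the loop above asks `git log` for 21 commits to build 20. The last line printed is `build=40023987 prev=19e42a89 next=140b3080`, matching the three commits in our Rewind Build Log. (The Latest Commit has no newer neighbour, so its Next Commit is itself.)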

build_commit will compile a NuttX Commit (pic below) and upload the Build Log: rewind-build.sh

## Build the NuttX Commit for the Target
function build_commit {
  ...
  ## Run the Build Job and find errors / warnings
  run_job \
    $log $timestamp $apps_hash \
    $nuttx_hash $prev_hash $next_hash
  clean_log $log
  find_messages $log

  ## Upload the log
  upload_log \
    $log "unknown" \
    $nuttx_hash $apps_hash $timestamp
}

## Run the Build Job for the NuttX Commit and Target.
## Record the Build Log into a file.
function run_job {
  ...
  pushd /tmp
  script $log_file \
    $script_option \
    " \
      $script_dir/rewind-commit.sh \
        $target $nuttx_hash $apps_hash \
        $timestamp $prev_hash $next_hash \
    "
  popd
}

Which will call rewind-commit.sh to compile One Single Commit…

(clean_log removes Control Chars)

(find_messages searches for Errors)

(upload_log uploads to GitLab Snippet or GitHub Gist)

(Simon Filgis suggests that we could use “git bisect” to replace the loop that walks through all commits, making it even faster!)
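What might the “git bisect” approach look like? Here's a small demo in a throwaway repo, not the NuttX Repo. Everything in it is made up for illustration: the 20 fake commits, `status.txt`, and `fake-build.sh`, which stands in for `tools/configure.sh` plus `make`. It's a sketch of the idea, not a replacement for the Rewind Build Script.

```shell
#!/usr/bin/env bash
## Demo of "git bisect run" in a throwaway repo (NOT the NuttX Repo)
demo_dir=$(mktemp -d)
mkdir "$demo_dir/repo"
cd "$demo_dir/repo" || exit 1
git init -q .
git config user.name "Demo"
git config user.email "demo@example.com"

## Create 20 commits. Commit 13 breaks the build,
## and every commit after that stays broken.
for i in $(seq 1 20); do
  if [ "$i" -ge 13 ]; then echo "fail" >status.txt ; else echo "pass" >status.txt ; fi
  echo "$i" >version.txt
  git add status.txt version.txt
  git commit -q -m "commit $i"
done

## Stand-in for the real build:
## Exit 0 if the build is OK, non-zero if it fails
cat >"$demo_dir/fake-build.sh" <<'EOF'
grep -q pass status.txt
EOF

## Bad Commit = Latest Commit, Good Commit = First Commit
good=$(git rev-list --max-parents=0 HEAD)
git bisect start HEAD "$good" >/dev/null
git bisect run sh "$demo_dir/fake-build.sh" >"$demo_dir/bisect.log" 2>&1

## When bisect is done, refs/bisect/bad points to the First Bad Commit
breaking_commit=$(git log -1 --format='%h %s' refs/bisect/bad)
echo "Breaking Commit: $breaking_commit"
git bisect reset >/dev/null 2>&1
```

Bisect halves the search range at every step, so it needs roughly 5 builds here instead of 20. The catch: it assumes that every commit before the Breaking Commit builds OK and every commit after it fails. And we won't get a Build Log for every commit, which NuttX Build History relies on.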

Rewind Build Script

§6 Rewind One Commit

Earlier we saw our Rewind Build Script compiling the Latest 20 Commits. (Pic above)

This is how we compile One Single Commit for NuttX: rewind-commit.sh

target=$1      ## NuttX Target, like "ox64:nsh"
nuttx_hash=$2  ## Commit Hash of NuttX Repo, like "7f84a64109f94787d92c2f44465e43fde6f3d28f"
apps_hash=$3   ## Commit Hash of NuttX Apps Repo, like "d6edbd0cec72cb44ceb9d0f5b932cbd7a2b96288"
timestamp=$4   ## Timestamp of the NuttX Commit, like "2024-11-24T00:00:00"
prev_hash=$5   ## Previous Commit Hash of NuttX Repo, like "7f84a64109f94787d92c2f44465e43fde6f3d28f"
next_hash=$6   ## Next Commit Hash of NuttX Repo, like "7f84a64109f94787d92c2f44465e43fde6f3d28f"

## Download the Docker Image
sudo docker pull \
  ghcr.io/apache/nuttx/apache-nuttx-ci-linux:latest

## Build the Target for This Commit
build_nuttx $nuttx_hash $apps_hash

## If it fails: Rebuild with Previous Commit and Next Commit
if [[ "$res" != "0" ]]; then
  build_nuttx $prev_hash $apps_hash
  build_nuttx $next_hash $apps_hash
fi

Which calls build_nuttx to compile the commit with the NuttX Docker Image: rewind-commit.sh

## Build NuttX in Docker Container
## If CI Test Hangs: Kill it after 1 hour
## We follow the CI Log Format, so that ingest-nuttx-builds will
## ingest our log into NuttX Dashboard and appear in NuttX Build History
## https://github.com/lupyuen/ingest-nuttx-builds/blob/main/src/main.rs
function build_nuttx {
  ...
  sudo docker run -it \
    ghcr.io/apache/nuttx/apache-nuttx-ci-linux:latest \
    /bin/bash -c "
    set -e ;
    set -x ;
    cd ;
    git clone https://github.com/apache/nuttx ;
    git clone https://github.com/apache/nuttx-apps apps ;
    pushd nuttx ; git reset --hard $nuttx_commit ; popd ;
    pushd apps  ; git reset --hard $apps_commit  ; popd ;
    cd nuttx ;
    ( sleep 3600 ; echo Killing pytest after timeout... ; pkill -f pytest )&
    (
      (./tools/configure.sh $target && make -j) || (res=\$? ; echo '***** BUILD FAILED' ; exit \$res)
    )
  "
  res=$?
}
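How does `res` end up with the build status? The `( … ) || ( res=$? ; … ; exit $res )` pattern prints the failure banner, then exits the inner shell with the same Exit Code as the failed build. We may try the pattern without Docker in this sketch. `try_build` is a hypothetical stand-in for `build_nuttx`, and its argument plays the role of `configure.sh && make`.

```shell
## Stand-in for build_nuttx, without Docker.
## $1 = Command that plays the role of "configure.sh && make"
function try_build {
  bash -c "
    ( $1 ) || ( res=\$? ; echo '***** BUILD FAILED' ; exit \$res )
  "
  ## Exit Code of the inner shell, same as the failed command
  res=$?
}

try_build "true"   ; echo "res=$res"  ## Build OK: res=0
try_build "exit 2" ; echo "res=$res"  ## Build Failed: res=2
```

The second call prints `***** BUILD FAILED` before `res=2`. That non-zero `res` is what triggers the rebuild of the Previous Commit and Next Commit in rewind-commit.sh.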

Finally we see the whole picture, closing the loop with NuttX Repo, NuttX Build Farm, NuttX Dashboard and NuttX Build History!

(How we Ingest Build Logs)


§7 What’s Next

Phew, that was quick! Finding the Breaking Commit is really that fast?

Yeah our Rewind Build Script took only one hour to rewind 20 commits and isolate the Breaking Commit! Though fixing it took longer…

Happy Holidays! Will we have more stories about NuttX CI?

Next Article: We study the internals of a Mystifying Bug that concerns PyTest, QEMU RISC-V and expect

Then we’ll chat about an Experimental Mastodon Server for NuttX Continuous Integration.

Many Thanks to the awesome NuttX Admins and NuttX Devs! And my GitHub Sponsors, for sticking with me all these years.

Got a question, comment or suggestion? Create an Issue or submit a Pull Request here…

lupyuen.github.io/src/ci6.md