I have a CI failure because I’m building a docker image inside of Travis, and within that build, there is a package being built that has a lot of plugins / libraries. That build takes more than 10 minutes and doesn’t output anything. See here: https://travis-ci.org/ros-planning/navigation2/builds/601413527
This is the error:
Starting >>> gazebo_plugins

No output has been received in the last 10m0s, this potentially indicates a stalled build or something wrong with the build itself.
Check the details on how to adjust your build configuration on: https://docs.travis-ci.com/user/common-build-problems/#build-times-out-because-no-output-was-received

The build has been terminated
I’ve tried looking for solutions, the only thing I found was adding a travis_wait to the beginning of the build. But now I don’t see the output of the build, just the travis_wait messages:
travis_time:start:210720f0
$ travis_wait 45 docker build --tag navigation2:latest --build-arg PULLREQ=$TRAVIS_PULL_REQUEST --build-arg CMAKE_BUILD_TYPE --build-arg COVERAGE_ENABLED ./
Still running (1 of 45): docker build --tag navigation2:latest --build-arg PULLREQ=1268 --build-arg CMAKE_BUILD_TYPE --build-arg COVERAGE_ENABLED ./
Still running (2 of 45): docker build --tag navigation2:latest --build-arg PULLREQ=1268 --build-arg CMAKE_BUILD_TYPE --build-arg COVERAGE_ENABLED ./
Still running (3 of 45): docker build --tag navigation2:latest --build-arg PULLREQ=1268 --build-arg CMAKE_BUILD_TYPE --build-arg COVERAGE_ENABLED ./
Still running (4 of 45): docker build --tag navigation2:latest --build-arg PULLREQ=1268 --build-arg CMAKE_BUILD_TYPE --build-arg COVERAGE_ENABLED ./
Still running (5 of 45): docker build --tag navigation2:latest --build-arg PULLREQ=1268 --build-arg CMAKE_BUILD_TYPE --build-arg COVERAGE_ENABLED ./
Is there any way to run a travis_wait but still see the output of the command in the console? Otherwise, is there any better solution for this?
Consider making that build report on its progress. Not outputting anything is awful UX because you can’t tell if anything’s happening or it’s stuck (which is bad because if it’s stuck, you are wasting your time waiting for it; and you also have no idea where exactly it’s stuck to be able to fix that). Travis’ auto termination is there specifically to kill stuck commands – for our mutual benefit, as per above.
There is no supported way. travis_wait saves output to a log file and prints it all at the end. You may try tail -F "travis_wait_${$}.log" & (make sure to save the job’s PID to kill it at the end) but this is not a supported way (thus may break in the future) and will get you a duplicate of the output at the end.
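A minimal sketch of that workaround, assuming the build step runs in bash. `follow_log_while` is a hypothetical helper name, not part of Travis, and the log file name is the one quoted above, which is an internal detail of travis_wait that may change.

```shell
# Hypothetical helper: follow LOGFILE in the background while CMD runs,
# then stop following and return CMD's exit status.
follow_log_while() {
  local log_file="$1"; shift
  # -F keeps retrying until the file exists and copes with truncation.
  tail -F "$log_file" 2>/dev/null &
  local tail_pid=$!
  local status=0
  "$@" || status=$?
  sleep 2   # give tail a moment to print the last lines
  kill "$tail_pid" 2>/dev/null || true
  wait "$tail_pid" 2>/dev/null || true
  return "$status"
}
```

It would be called as `follow_log_while "travis_wait_$$.log" travis_wait 45 docker build --tag navigation2:latest ./`. As noted above, travis_wait still prints the whole log once more at the end.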
Actually, the build tool (colcon) does report on the progress every 30 seconds when I run it on my system, and even when I build the docker image locally. For some reason it seems to be buffered in the Travis run however, and that is why it times out. Is there a way to fix the buffering?
I wasn’t aware of that, and it may meet my needs. I tested this yesterday and I do see the output, so unless there’s a way of making sure the output is printed immediately (non-buffered), this may be the solution.
I’ve just discovered travis_wait from the documentation. If it can’t show the output of the executed command while it is running, then it isn’t very useful, given what you wrote about how bad the UX is when nothing is printed for a long time. I don’t see why it can’t be changed to work in a useful way. To me, it’s just a matter of not hijacking the stdout of the executed command and adding a keepalive message from time to time.
I can’t use it that way. I’d rather run a background process of my own that prints a message from time to time and stops completely after some time (or when the main process kills it).
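One way this idea could look, as a rough sketch rather than a supported Travis feature. `with_keepalive` and its parameters are hypothetical names: it prints a message every INTERVAL seconds, gives up after MAX_BEATS messages, and leaves the command's own stdout untouched.

```shell
# Hypothetical helper: with_keepalive INTERVAL MAX_BEATS CMD...
with_keepalive() {
  local interval="$1" max_beats="$2"; shift 2
  (
    i=1
    while [ "$i" -le "$max_beats" ]; do
      sleep "$interval"
      echo "[keepalive] still running ($i of $max_beats): $*"
      i=$((i + 1))
    done
  ) &
  local keepalive_pid=$!
  local status=0
  "$@" || status=$?          # the command writes straight to the console
  # Stop the keepalive loop; its current sleep may linger until it expires.
  kill "$keepalive_pid" 2>/dev/null || true
  wait "$keepalive_pid" 2>/dev/null || true
  return "$status"
}
```

For example, `with_keepalive 60 45 docker build --tag navigation2:latest ./` would print a line every minute for at most 45 minutes. Once the keepalive messages stop, Travis' usual 10-minute no-output timeout applies again, so a truly stuck build is still terminated.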