Work out kinks in interactions between stages, allow_fail, and fast_finish

jonhoo · November 27, 2018, 6:34pm

In the Rust library imap, I’m looking to reduce the time it takes to get results from CI. My plan was to use a combination of build stages, allow_fail, and fast_finish. The result looks like this: https://github.com/jonhoo/rust-imap/blob/562b77255c61cd1cf813b8034d1c3d3d1325f582/.travis.yml

Some noteworthy things:

fast_finish is enabled for all jobs.
Builds on Windows are allowed to fail (mostly because they take forever to run, and I don’t want to wait for them).
The last build stage, “coverage”, has only a single job, and it is allowed to fail.
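
For readers without the linked file at hand, a setup along these lines might look roughly like the following. This is a minimal sketch, not the actual rust-imap configuration: the stage names come from this post, while the OS list, the COVERAGE=1 marker, and every script line are assumptions made for illustration.

```yaml
# Hypothetical sketch of the setup described above.
# Stage names follow the post; everything else is assumed.
language: rust
rust: stable

# Stages run one after another, in this order.
stages:
  - test
  - integration
  - lint
  - coverage

jobs:
  # Intended to mark the build finished once all required jobs are done.
  fast_finish: true
  # Jobs matching any of these entries may fail without failing the build.
  allow_failures:
    - os: windows
    - env: COVERAGE=1        # assumed marker for the coverage job
  include:
    - stage: test
      os: linux
      script: cargo test
    - stage: test
      os: osx
      script: cargo test
    - stage: test
      os: windows            # slow job, allowed to fail
      script: cargo test
    - stage: integration
      script: cargo test -- --ignored   # placeholder for integration tests
    - stage: lint
      script: cargo fmt --all -- --check
    - stage: coverage
      env: COVERAGE=1
      script: echo "run the coverage tool here"   # placeholder command
```

The point of the sketch is only to show where fast_finish and allow_failures sit relative to the stage definitions; the real file linked above is the authoritative version.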

I tried to use that .travis.yml file, and ended up with this build. Two things seem to go wrong:

In the “test” stage, when all jobs but the Windows one had completed, Travis did not move on to the next stage (“integration”). Instead, it was waiting for the Windows job to finish. I cancelled the Windows job, and then the next stage was immediately started. I then re-started the Windows job, and then Travis again waited for it to complete before it continued to the “lint” stage. I believe this is GH issue #9677.
The last stage with only one job (“coverage”) takes a while to run (~16m without cache), and is allowed to fail. However, the build was not marked as successful once the last of the jobs in the “lint” stage succeeded. Furthermore, when I cancelled the coverage job, the entire build was marked as cancelled, not successful as I would have expected given that the last remaining job was marked allow_fail. I’ve filed this as GH issue #10356.

mkurz · November 27, 2018, 7:49pm

+1 Same here.

jonhoo · December 5, 2018, 2:41pm

Now that #9677 has also been closed, this probably also becomes the tracking issue for that, namely that Travis does not advance to the next build stage when fast_finish: true is set.

mmcc007 · March 25, 2019, 11:26pm

+1 I’m getting the same thing:
https://travis-ci.org/mmcc007/flutter_architecture_samples/builds/510839158
Does this also happen with matrices?

(In my case, since I no longer depend on stages, I may be able to switch from stages to a matrix.)
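
As a rough illustration of what a matrix-only layout could look like, here is a minimal sketch without any stages. The language, OS list, and script are assumptions for illustration and are not taken from the build linked above; whether this layout avoids the problem is not confirmed anywhere in this thread.

```yaml
# Hypothetical matrix-only configuration (no build stages).
language: rust
rust: stable

# Each OS listed here expands into one job of the build matrix.
os:
  - linux
  - osx
  - windows

jobs:
  # Mark the build finished as soon as all required jobs have completed.
  fast_finish: true
  # The Windows job may fail without failing the build.
  allow_failures:
    - os: windows

script: cargo test
```

With no stages, all jobs are eligible to start together, so there is no later stage left waiting on the Windows job.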

nurupo · May 4, 2019, 6:41am

In the “test” stage, when all jobs but the Windows one had completed, Travis did not move on to the next stage (“integration”). Instead, it was waiting for the Windows job to finish. I cancelled the Windows job, and then the next stage was immediately started. I then re-started the Windows job, and then Travis again waited for it to complete before it continued to the “lint” stage.

By design, only one stage is allowed to run at a time. That’s what stages are: they run in sequence, one at a time. Even if the only remaining job in a stage is listed under allow_failures, the next stage will not start until that job is finished. That’s the correct and expected behavior.

mvz · October 18, 2019, 9:36am

Maybe this should be split since there are two or maybe even three issues:

Stages wait even if only allow_failures jobs are running
Build is not marked successful even if only allow_failures jobs are left
Canceling an allow_failures job makes the entire build fail

(I came here because I also have the third issue)
