E.g. in https://travis-ci.org/native-api/opencv-python/builds/568519173, jobs 69 and 70 were running when the build was cancelled. After that, jobs 71 and 73 started and ran for a few minutes.
What exactly is the problem? The cancel request was received at 6:52:04 UTC, at which point some jobs in the “Final” stage were already running. Unless a job in the “Final” stage started before a job in the “S1” stage was finished, I don’t see a problem here.
Jobs are placed in parallel queues, and the workers pick them up in parallel. Within a single stage, there is no guarantee that a job with a smaller number starts before one with a larger number.
If you want the guarantee, jobs will have to be in different stages.
The problem is that when I cancel the build, the jobs currently running terminate, as they should, but at the same time a few pending jobs following them are started, which they shouldn’t be.
Some may have already been queued and may no longer be cancelable.
Whatever the case, the end result is visibly incorrect behavior, and the build status is not changed to cancelled for a few minutes (which may delay the following queued builds and such).
Another consideration is that your build has 88 jobs. I think the current logic is to fire cancel requests sequentially and it can take some time for the jobs to be actually canceled.
The cause, as far as I can tell from the behavior, is that the machinery that manages the jobs is not told to stop starting new jobs for this build before the UI begins cancelling them.
The simplest solution would be to cancel jobs in backwards order or to cancel pending jobs first.
The robust one would be to somehow dequeue the build or pending jobs atomically.
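To illustrate the difference between these options, here is a minimal toy model. It is not Travis CI's actual scheduler: the function `simulate`, the `hold_build` flag, and the job numbers are all hypothetical, and the only assumption is that a worker freed by a cancelled job immediately picks up the next pending job of the same build.

```python
from collections import deque


def simulate(job_ids, running, cancel_order, hold_build=False):
    """Toy model of sequential cancel requests (hypothetical, not Travis code).

    Cancelling a running job frees its worker, which immediately starts the
    next pending job unless the whole build was put on hold first.
    Returns the jobs that were started after cancellation began.
    """
    pending = deque(j for j in job_ids if j not in running)
    active = set(running)
    started_after_cancel = []

    for job in cancel_order:
        if job in pending:
            # Pending job: simply drop it from the queue.
            pending.remove(job)
        elif job in active:
            # Running job: stop it, which frees a worker.
            active.remove(job)
            if not hold_build and pending:
                nxt = pending.popleft()
                active.add(nxt)
                started_after_cancel.append(nxt)
    return started_after_cancel


jobs = list(range(69, 75))   # jobs 69..74
running = [69, 70]           # running when "cancel" is pressed
pending_first = [j for j in jobs if j not in running] + running

print(simulate(jobs, running, cancel_order=jobs))              # [71, 72, 73, 74]
print(simulate(jobs, running, cancel_order=jobs[::-1]))        # []
print(simulate(jobs, running, cancel_order=pending_first))     # []
print(simulate(jobs, running, cancel_order=jobs, hold_build=True))  # []
```

In this model, cancelling in forward order lets every pending job start briefly before its own cancel request arrives, while cancelling backwards, cancelling pending jobs first, or holding the build before cancelling starts no extra jobs. With a real scheduler only the last variant would be free of races, since jobs could still be picked up between individual cancel requests.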