AMD builds either finish in 12 minutes or hang / time out, randomly; only happens if more than 6 jobs in build

I raised the number of jobs from 6 to 10; since then, all builds time out.
I’m not really sure if this issue has something to do with the number of builds.

Worked fine here:
When I revert my commit and go back to 6 jobs, the problem is the same. Did something change between these two builds in your environment?

No one around here to have a look into it?

Switching to g++-9 doesn’t solve it either. The build just dies at ~22%: no error, nothing. I don’t think this problem has anything to do with Ubuntu or the packages. Is the container running out of memory?
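One way to test the out-of-memory theory would be to log the available memory right before the compile step and compare it between a passing and a hanging job. A minimal sketch, assuming a Linux build VM that exposes /proc/meminfo; the variable name mem_kb is only illustrative:

```shell
# Read MemAvailable (in kB) from /proc/meminfo; fall back to "unknown"
# if the file or the field is missing (assumption: Linux build VM).
mem_kb=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo 2>/dev/null)
mem_kb=${mem_kb:-unknown}
echo "MemAvailable before build: ${mem_kb} kB"
```

If the number is very low on the jobs that die, fewer parallel compiler processes would be the first thing to try.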

Still waiting for someone who is able to look into it, idk what Travis changed!

Overall it is annoying that things magically break or stop working… why do I even try to get some kind of support -.-

Update: this build shows how inconsistent builds are right now:
Sure, two builds fail, two finish, three run into timeout and three could finish or timeout.

Overall it is horrible… 1 h 23 min… the last normal builds finished after ~16 minutes, like here:

Still timeout in

So it is still the same: (worked fine) (broke somehow)

I can’t see anything wrong with this commit… so I am waiting for help from the Travis CI team.

Confirmed: (passed fast, 6 jobs) vs (timed out, 10 jobs).

It does seem that they give shorter builds faster VMs, probably on an assumption that authors of larger builds are more willing to wait and/or hand-tune their build scripts.

I can suggest using -j $(ncpus) instead of a hard-coded number.

Using CCache should also really speed up repeated builds. Since you are using CMake and a custom compiler, how to hook it into the build process is going to be project-specific.
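As a rough starting point, CMake 3.4 and later can route compiler calls through ccache via the CMAKE_<LANG>_COMPILER_LAUNCHER variables. The sketch below only assembles the extra configure flags and prints the resulting command; the variable name cmake_args and the `..` source directory are assumptions, and a custom compiler would still be selected separately in the project's own setup:

```shell
# Hypothetical sketch: add the ccache launcher flags only when ccache is installed.
cmake_args=""
if command -v ccache >/dev/null 2>&1; then
    cmake_args="-DCMAKE_C_COMPILER_LAUNCHER=ccache -DCMAKE_CXX_COMPILER_LAUNCHER=ccache"
fi
# Print the configure command instead of running it, since the real
# source directory and other options are project-specific.
echo "cmake ${cmake_args} .."
```

For the cache to survive between builds, the ccache directory also has to be kept by the CI cache; as far as I know, Travis has a `cache: ccache` setting for that.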

@native-api I tried to go back to 6 jobs, but it was still slow. I can try ccache, but I don’t think it is possible to get ~12-minute builds on the slower VMs with it.

I know from experience that for exactly the same source, CCache speeds up the build about 10x. My idea is that with it, you only need to make a job complete once (e.g. you can restart it until it succeeds) – then it will be faster in the following builds and VM’s slowness won’t be affecting you as much.

That means at least the first build has to finish, which is impossible since all builds time out before they have a chance to finish on the slow VMs.

Not the entire build, but each single job separately. Jobs use and save caches independently, and you can restart individual jobs, too.

Never tried ccache, and I would love to do other things with my code instead of messing around with Travis… Another x days of reading and “trial and error” with .travis.yml gives me a headache. Right now I bet there will be a problem with ccache in a few months because some “weird” logic in Travis kicks in after changing some other config.

The best solution for me right now is to get back to 6 jobs and enjoy a reasonable build time, but that does not work either, because some other rule in the background says “if you are on the slow VM there is no going back”…

There should be a warning: “Do not raise the jobs over 6, otherwise you will be on the slow VMs forever, although there is an option for max parallel jobs”.

Thank you for your help and time.

I don’t think you will be on it “forever”. Most probably, they use some heuristics to allocate resources based on availability and your past use.

I can suggest using -j $(ncpus) instead of a hard-coded number.

@native-api - do you have a reference to using ncpus in the docs? I cannot find a mention of “ncpus” when searching (but Google search is not what it used to be).

Also, as I understand things, Travis provides free software projects a machine with 2 cores. See the Travis docs and discussion of “Cores” at Build Environment Overview. The most we are supposed to use is -j 3 (max cpus + 1). I thought I read about max cpus + 1 in the GNU Make manual, but I cannot find it at the moment.
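For what it is worth, I am not aware of a standard `ncpus` command; the closest thing I know of is `nproc` from GNU coreutils, with `getconf _NPROCESSORS_ONLN` as a fallback. A minimal sketch of the "cpus + 1" rule of thumb mentioned above, treating that rule as a convention rather than a documented limit (the variable names cpus and jobs are illustrative):

```shell
# Detect the number of online CPUs: nproc (GNU coreutils) first,
# then getconf as a fallback on systems without nproc.
cpus=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)
# Rule of thumb from the discussion: one more job than there are CPUs.
jobs=$((cpus + 1))
# Print the make invocation instead of running it.
echo "make -j ${jobs}"
```

On a 2-core VM this comes out to -j 3 without hard-coding the number in the build script.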