AMD builds either finish in 12 minutes or hang/time out at random; only happens with more than 6 jobs in a build

Zyres · June 16, 2020, 8:33am

I raised the number of jobs from 6 to 10, and since then all builds time out.
I’m not really sure whether this issue has anything to do with the number of jobs.
https://travis-ci.org/github/AscEmu/AscEmu/builds/698704600

Zyres · June 18, 2020, 9:01pm

Worked fine here: https://travis-ci.org/github/AscEmu/AscEmu/jobs/696809463
When I revert my commit and go back to 6 jobs, the same problem occurs. Did something change in your environment between these two builds?

Zyres · June 24, 2020, 12:53pm

No one around here to have a look into it?

Zyres · June 24, 2020, 4:52pm

Switching to g++-9 doesn’t solve it either. The build just dies at ~22% with no error, nothing. I don’t think this problem has anything to do with Ubuntu or the packages. Is the container running out of memory?

Still waiting for someone who is able to look into it; idk what Travis changed!

Overall, it is annoying that things magically break or stop working… why do I even try to get some kind of support -.-

Zyres · June 24, 2020, 7:28pm

Update: this build shows how inconsistent builds are right now: https://travis-ci.org/github/AscEmu/AscEmu/builds/701721202
Two jobs fail, two finish, three run into the timeout, and three could either finish or time out.

Overall it is horrible… 1 h 23 min… the last normal builds finished in ~16 minutes, like this one: https://travis-ci.org/github/AscEmu/AscEmu/builds/696809462

Zyres · June 25, 2020, 9:38pm

Still timeout in https://travis-ci.org/github/AscEmu/AscEmu/builds/701804734

Zyres · June 25, 2020, 10:17pm

Okay…
So it is still the same:
https://travis-ci.org/github/AscEmu/AscEmu/builds/696809462 (worked fine)
https://travis-ci.org/github/AscEmu/AscEmu/builds/698704600 (broke somehow)

I can’t see anything wrong with this commit… so I’m waiting for help from the Travis CI team.

native-api · June 26, 2020, 1:16am

Confirmed: https://travis-ci.com/github/native-api/AscEmu/builds/173086397 (passed fast, 6 jobs) vs https://travis-ci.com/github/native-api/AscEmu/jobs/353770116 (timed out, 10 jobs).

It does seem that they give shorter builds faster VMs, probably on the assumption that authors of larger builds are more willing to wait and/or hand-tune their build scripts.

native-api · June 26, 2020, 1:17am

I can suggest using -j $(ncpus) instead of a hard-coded number.

Using CCache should also really speed up repeated builds. Since you are using CMake and a custom compiler, how to hook it into the build process is going to be project-specific.
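One common way to hook ccache into a CMake build, assuming CMake 3.4 or newer (which added the CMAKE_<LANG>_COMPILER_LAUNCHER variables) and that the project does not already set a launcher of its own: pass ccache as the compiler launcher at configure time. The Python wrapper below only assembles the command line; `cmake_configure_args` is a hypothetical helper, and the source/build directories are placeholders. Travis’s caching docs also describe a `cache: ccache` setting for keeping the ccache directory between builds.

```python
import shutil
from typing import List, Optional


def cmake_configure_args(source_dir: str, build_dir: str,
                         cxx: Optional[str] = None) -> List[str]:
    """Hypothetical helper: CMake configure command that uses ccache if installed."""
    # -S/-B need CMake 3.13 or newer; older versions configure from inside build_dir.
    args = ["cmake", "-S", source_dir, "-B", build_dir]
    if cxx:
        # The custom compiler mentioned in the thread, e.g. g++-9.
        args.append(f"-DCMAKE_CXX_COMPILER={cxx}")
    if shutil.which("ccache"):
        # CMAKE_<LANG>_COMPILER_LAUNCHER (CMake 3.4+) prefixes every compile with ccache.
        args += [
            "-DCMAKE_C_COMPILER_LAUNCHER=ccache",
            "-DCMAKE_CXX_COMPILER_LAUNCHER=ccache",
        ]
    return args


if __name__ == "__main__":
    print(" ".join(cmake_configure_args(".", "build", cxx="g++-9")))
```

The launcher approach leaves the compiler choice itself untouched, which matters here since the project selects a specific compiler.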

Zyres · June 26, 2020, 7:27am

@native-api I tried going back to 6 jobs, but it was still slow. I can try ccache, but I don’t think it can get builds down to ~12 minutes on the slower VMs.

native-api · June 26, 2020, 9:54am

I know from experience that for exactly the same source, CCache speeds up the build about 10x. My idea is that with it, you only need to make a job complete once (e.g. you can restart it until it succeeds) – then it will be faster in the following builds and VM’s slowness won’t be affecting you as much.
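Roughly why a repeat run is so much faster: ccache keys each compilation on a hash of the (preprocessed) source plus the compiler and its arguments, and on a hit it reuses the stored object file instead of running the compiler. The toy in-memory model below illustrates only that idea and is not ccache’s actual implementation; `CompileCache` and `fake_compile` are hypothetical names, with `fake_compile` standing in for the real, expensive compiler call.

```python
import hashlib
from typing import Callable, Dict, List


class CompileCache:
    """Toy content-addressed cache: key = hash(compiler args + source text)."""

    def __init__(self, compile_fn: Callable[[str, List[str]], bytes]) -> None:
        self._compile = compile_fn
        self._store: Dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(source: str, args: List[str]) -> str:
        h = hashlib.sha256()
        h.update("\0".join(args).encode())
        h.update(b"\0\0")  # separator so args and source cannot run together
        h.update(source.encode())
        return h.hexdigest()

    def compile(self, source: str, args: List[str]) -> bytes:
        key = self._key(source, args)
        if key in self._store:
            self.hits += 1  # same input and flags: reuse the stored object
            return self._store[key]
        self.misses += 1  # first time seen: pay the full compile cost
        obj = self._compile(source, args)
        self._store[key] = obj
        return obj


def fake_compile(source: str, args: List[str]) -> bytes:
    # Hypothetical stand-in for an expensive compiler invocation.
    return ("OBJ:" + " ".join(args) + ":" + source).encode()
```

Compiling the same sources twice gives all misses on the first pass and all hits on the second; changing the flags or the source text produces a miss again.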

Zyres · June 26, 2020, 10:08am

That means at least the first build has to finish, which is impossible since all builds time out on the slow VMs before they have a chance to finish.

native-api · June 26, 2020, 10:12am

Not the entire build, but each single job separately. Jobs use and save caches independently, and you can restart individual jobs, too.

Zyres · June 26, 2020, 10:27am

I’ve never tried ccache, and I would love to do other things with my code instead of messing around with Travis… Another x days of reading and “trial and error” with .travis.yml gives me a headache. Right now I bet there will be a problem with ccache in a few months because some “weird” logic in Travis kicks in when some other config changes.

The best solution for me right now is to go back to 6 jobs and enjoy a reasonable build time, but that doesn’t work either, because of some other rule in the background that says “if you are on the slow VM there is no going back”…

There should be a warning: “Do not raise the jobs above 6, otherwise you will be on the slow VMs forever, even though there is an option for max parallel jobs”.

Thank you for your help and time.

native-api · June 29, 2020, 5:11pm

I don’t think you will be on it “forever”. Most probably, they use some heuristics to allocate resources based on availability and your past use.

noloader · July 4, 2020, 4:39am

Quoting @native-api: “I can suggest using -j $(ncpus) instead of a hard-coded number.”

@native-api, do you have a reference for using ncpus in the docs? I cannot find any mention of it when searching for “ncpus” site:docs.travis-ci.com (but Google search is not what it used to be).

Also, as I understand things, Travis provides free software projects a machine with 2 cores. See the Travis docs and discussion of “Cores” at Build Environment Overview. The most we are supposed to use is -j 3 (max cpus + 1). I thought I read about max cpus + 1 in the GNU Make manual, but I cannot find it at the moment.
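A minimal sketch of deriving the make parallelism from the detected core count instead of hard-coding it, applying the “cores + 1” rule above. On Linux, the GNU coreutils command `nproc` reports the available processing units from the shell; the Python below does the same via `os.sched_getaffinity` (Linux-only) with `os.cpu_count()` as a fallback. Neither accounts for cgroup CPU quotas, so a container may report more CPUs than it is actually allowed to use. The helper names are hypothetical, and GNU Make is assumed as the build driver.

```python
import os
from typing import List


def detected_cpus(fallback: int = 2) -> int:
    """CPUs this process may run on, or `fallback` if that cannot be determined.

    sched_getaffinity is Linux-only; elsewhere use os.cpu_count(), which may
    return None. Neither call reflects cgroup CPU quotas.
    """
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or fallback


def make_jobs(extra: int = 1) -> int:
    """Hypothetical helper: detected CPUs plus `extra` (the "cores + 1" rule)."""
    return detected_cpus() + extra


def make_command(jobs: int) -> List[str]:
    """Hypothetical helper: the make invocation as an argv list."""
    return ["make", f"-j{jobs}"]


if __name__ == "__main__":
    # If 2 CPUs are detected (the documented Travis core count), this prints: make -j3
    print(" ".join(make_command(make_jobs())))
```

Passing `extra=0` gives plain “one job per core”, which can be the safer choice if the compile is memory-bound on a small VM.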

native-api · July 4, 2020, 5:48am
