Our multiarch build time out, whereas they should fail immediately on any error in the unit tests.
The issue is correlated with the virtualization solution: x86 uses gce vms and the others use lxd.
As travis has planned to switch x86 to lxd as well, it would be nice to fix this regression before that happens.
Here is a test matrix, where builds should either fail or pass within one minute. However, it can be seen that failing lxd builds last 10 minutes (the default timeout).
I’ve been stuck on this problem for months: basically right after the arm/s390x/ppc64le support was announced, I made https://github.com/gap-system/gap/pull/3744 and for all three architectures, the tests end in a timeout. I’ve changed our executable to abort and inserted a bunch of echo FOOBAR NNN statements in the script orchestrating the CI run to narrow it down, and it’s always the same: our exec exits with a non-zero code (or crashes via abort), and then… nothing; the shell script that launched it does not continue to execute, the VM just hangs until the timeout kills it.
Sadly that means we can’t use these; I was looking in particular forward to having s390x available so that we finally could do CI tests on a big endian machine. Ah well
Don’t set errexit in the main shell. This shell also runs the build’s internal logic and the sudden exit terminates it abnormally.
If a failing command should terminate the build, put it into a section other than script: – i.e. install. The script: section is designed to run tests – i.e. stuff that you want to run to completion even if something fails.