Travis arm64: a job does not start with "An error occurred while generating the build script."

I hit the following error on a Travis arm64 build while another build was running in parallel.
The job did not start and showed only the error message.

Here is the .travis.yml file.

https://github.com/junaruga/ruby/blob/wip/report-travis-error-arm64/.travis.yml
https://travis-ci.com/github/junaruga/ruby/jobs/506962115

An error occurred while generating the build script.

Here are other jobs for the error.
https://travis-ci.com/github/junaruga/ruby/jobs/506961096
https://travis-ci.com/github/junaruga/ruby/jobs/506961095

Thanks.

Same error on pypa/cibuildwheel job
The error is random. Restarting the build works.

Yes, restarting the build works. But it’s not an ideal behavior as a CI.

You’re obviously right. What I was trying to say is that the configs are valid and that the error is random and purely on the Travis CI side, since restarting the build works (i.e. same config, same commit). It’s not an ideal behavior as a CI indeed!

I just had 5 out of 9 jobs fail this way:

https://travis-ci.com/github/stephengold/Libbulletjme/builds/227029453

Arm-based building on Arm64 CPU is only available for Open Source repositories (at both travis-ci.org and travis-ci.com). While available to all Open Source repositories, the concurrency available for multiple CPU arch-based jobs is limited during the beta period.

Perhaps this sentence, “the concurrency … is limited during the beta period”, might be the reason for this issue.

It seems the arm64 parallel jobs are stable now? I have not seen this error recently.

The jobs should be stable, as I’ve done many builds without a hitch. If there are issues that do pop up, please post them in the community forum and I’ll be glad to fix them for you.

Thanks! I am happy that this issue with the concurrent jobs was fixed. How many concurrent jobs in a matrix work for arm64 (and also ppc64le/s390x)? Have you tested it? I would like to know, as a reference, the maximum number of jobs that works for each of the arm64/ppc64le/s390x pipelines.

Thanks @junaruga,

Initial testing capacity starts at around 60 concurrent jobs, but this may have risen with the deployment of the newer Arm Neoverse N1 based Ampere Altra arm64 servers by Equinix. I will run some tests to get a definitive number for you.

For future reference, there’s tons of tests, examples and other things I’ve done with ppc64le/s390x on my GitHub, you may find some of it interesting and relevant to your interests, and don’t hesitate to ask me any questions and I’ll be glad to answer them @junaruga.

I recently started to experience problems with ARM64 builds as well. Most of the time they fail either with the same error, “An error occurred while generating the build script”, or with network errors. Sometimes I have to restart the job a few times in a row just to make it pass. At the same time, PPC64 jobs work smoothly.

For example, this job showed the problem: Travis CI - Test and Deploy with Confidence

Here is our .travis.yml for reference: rizin/.travis.yml at dev · rizinorg/rizin · GitHub

Yeah, I also experienced the Arm64 build issues recently in the GitHub ruby/ruby repository. In ruby/ruby, we used the Arm64 pipeline for 2 jobs, arm64 and arm32 on arm64, but we had to reduce it to just one job, arm32. As a note, in ruby/ruby we also started to use the Cirrus CI Arm pipeline to test the Arm64 cases. In our tests, it lets us run a maximum of 16 concurrent jobs, and so far it is stable.

Hey all,

This issue has been fixed from what I understand.

Not really, still experiencing network problems on ARM64 builds. It often fails either on apt-get or git clone for build dependencies.
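
For transient network failures like these, one possible workaround is Travis's built-in `travis_retry` helper, which re-runs a failing command (up to three attempts). Below is a minimal sketch of a .travis.yml fragment, assuming the dependencies are fetched in `before_install`; the package name and the repository URL are placeholders, not taken from any project in this thread. It would not help with the "An error occurred while generating the build script" error, because that happens before the job starts running any commands.

```yaml
# Sketch only: the package name and clone URL below are hypothetical placeholders.
os: linux
arch: arm64
language: c

before_install:
  # travis_retry re-runs the wrapped command if it exits with a non-zero status.
  - travis_retry sudo apt-get update
  - travis_retry sudo apt-get install -y libexample-dev
  # Placeholder URL for a build dependency fetched with git.
  - travis_retry git clone --depth 1 https://example.com/some/dependency.git
```

This only masks flaky network steps inside a job that has already started; a job that never starts still needs a manual restart.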

I experienced this issue 10 days ago again. As a reference, here are the logs.

Hey @junaruga,

Are you experiencing these problems now?

No. I don’t see the problems now.

Great to hear @junaruga!

Unfortunately I still see this problem. A freshly failed job: Travis CI - Test and Deploy with Confidence (one failed out of four jobs).

Perhaps this issue happens when there are more than 2 Arm64 pipelines? In the ruby/ruby repo, there is only 1 Arm64 pipeline now.
