PPC64LE jobs are stuck from time to time

XVilka · June 1, 2022, 12:18am

Over the past few months we have noticed that, while ARMv8 and System Z jobs start from the queue normally, PPC64LE jobs often take longer to start and frequently get stuck in an endless “Job received → Queued” loop.

See, for example: Travis CI - Test and Deploy with Confidence

Then, after a long time, the job silently fails with a timeout. This has happened multiple times, most recently yesterday.

Moreover, while some of these jobs can be restarted successfully once the problematic period ends, others always fail, no matter how often you restart them. For example: Travis CI - Test and Deploy with Confidence

Every time, such jobs show me “Automatic restarts limited: Please try restarting this job later or contact support@travis-ci.com.”

Could you please address this problem? The last time I contacted support there was no answer from them at all, and we have a paid plan for two parallel jobs!

Notably, despite our many reports of incidents lasting at least a day, and other reports on this forum, the Travis CI Status page is always green, as if nothing happened.

XVilka · June 13, 2022, 4:04pm

And it happened again. There seems to be a major flaw in the infrastructure that should be addressed.

XVilka · June 14, 2022, 3:00pm

Aaand one more time. It’s getting really ridiculous, given that we pay $129 every month for two concurrent jobs.

XVilka · July 8, 2022, 11:54pm

It happened once more.

mustafa · July 14, 2022, 10:51am

Hey, this should already be fixed, sorry about the late notice. Thanks.

mayeut · July 16, 2022, 9:34am

@mustafa,
It’s happening again.
On top of the “queued / booting” issue, there’s also a new gnutls_handshake error on git clone.
Our 2-concurrent-jobs plan effectively becomes a 0-concurrent-jobs plan when there are 2 ppc64le jobs trying to run…

Link to the gnutls_handshake error: Travis CI - Test and Deploy with Confidence

XVilka · July 17, 2022, 2:20am

I confirm. I have noticed this pattern of the problem appearing on weekends and going unaddressed until working days.
I think they mitigate it by doing something manually every time. If the root cause can’t be resolved, what about automating that mitigation instead?

mayeut · September 4, 2022, 11:13am

Just a note that it’s been working fine for the past month.
I don’t know what changed, but it’s working…

XVilka · June 3, 2024, 2:33am

It is happening again. I reported it to support a month ago and it’s still not fixed. It’s ridiculous how unreliable Travis CI is in general. And they charge more than other CI services!

I am on the verge of just setting up a few QEMU instances on my own server for the same architectures Travis CI provides. It will surely be more reliable.

Montana · June 4, 2024, 3:58pm

Hi @XVilka,

This should be okay now.
