Over the past few months we have noticed that, while ARMv8 and System Z jobs start from the queue normally, PPC64 jobs take longer to start and quite often get stuck in an endless “Job received → Queued” loop.
Then, after a long time, the job silently fails because of a timeout. This has happened multiple times, most recently yesterday.
Moreover, while some of these jobs can be restarted once the problematic period ends, others always fail, no matter how often you restart them. Like this one: Travis CI - Test and Deploy with Confidence
Every time, such jobs show me “Automatic restarts limited: Please try restarting this job later or contact support@travis-ci.com.”
Could you please address this problem? The last time I contacted support there was no answer from them at all, and we have a paid plan for two parallel jobs!
Notably, despite our many reports of incidents lasting at least a day, and other reports on this forum, the Travis CI Status page is always green, as if nothing happened.
@mustafa,
It’s happening again.
Mixed in with the “queued / booting” issue, there’s also a new gnutls_handshake error on git clone.
Our plan for 2 concurrent jobs effectively becomes a plan for 0 concurrent jobs when 2 ppc64le jobs are trying to run…
I can confirm. I have noticed a pattern: it happens on weekends and goes unaddressed until the next working day.
I think they mitigate it by doing something manually every time. What about automating that mitigation if you can’t resolve the root cause?
It’s happening again. I reported it to support a month ago and it’s still not fixed. It’s ridiculous how unreliable Travis CI is in general. And they charge more than other CI services!
I am on the verge of just setting up a few QEMU instances on my server for the same architectures Travis CI provides. It will surely be more reliable.