Travis builds fail due to slowness

Most of the recent pull request checks in Keycloak fail because some of the Travis builds time out after 50 minutes. This started roughly a week ago (around Aug 1, 2019). There was no change in the code base that would cause such long test runs, so the infrastructure seems to be causing this. Is this a known issue?

See e.g. the following builds:

https://travis-ci.org/keycloak/keycloak/jobs/566586956
https://travis-ci.org/keycloak/keycloak/jobs/568226590
https://travis-ci.org/keycloak/keycloak/jobs/568226589

In the (very few) successful builds in the last month (e.g. https://travis-ci.org/keycloak/keycloak/builds/561057415, https://travis-ci.org/keycloak/keycloak/builds/565156194), that job takes ~45 minutes, which is dangerously close to the limit.

Split that test suite so that jobs take a more comfortable amount of time.

Maybe it is the network slowness experienced by others?

Yes! For the last 5 days we have had network issues. They are totally random, from a failing apt-get install command to a timed-out docker push. I created a ticket about these problems yesterday, but there has been no response from Travis :frowning:

Just as a side note to what the others said: this is also happening for private repositories.

Yes, we have had timeout failures since August 1-2 too. And there were no changes in the code.

Thank you for the suggestion. We are already working on that part as well, but I think this is not the core of the issue, since the same thing happens to jobs that previously finished in about 25 minutes. In some cases, the first step (compilation) failed after 50 minutes, while it usually finishes within 10 minutes.

Very similar case for us (and there are several other threads opened recently for network timeouts in various situations):

Get https://registry-1.docker.io/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
+rm -rf -- /home/travis/build/apache/airflow/.build/cache/UVy7yccSkc
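
As a stopgap for intermittent failures like this one, the network-bound command can be retried a few times with a growing delay. Below is a minimal Python sketch, not something taken from our build: `run_with_retry` and its parameters are hypothetical names chosen for illustration, and it assumes the command signals failure through a non-zero exit code.

```python
import subprocess
import time


def run_with_retry(cmd, attempts=3, base_delay=2.0, sleep=time.sleep):
    """Run cmd, retrying on a non-zero exit code with exponential backoff.

    Returns the 1-based number of the attempt that succeeded.
    Raises RuntimeError if every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        result = subprocess.run(cmd)
        if result.returncode == 0:
            return attempt
        if attempt < attempts:
            # Wait base_delay, then twice that, and so on before retrying.
            sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError(f"{cmd!r} failed after {attempts} attempts")


# Hypothetical usage; substitute whichever command actually flakes:
# run_with_retry(["docker", "pull", "some/image"])
```

For plain commands in `.travis.yml`, Travis's built-in `travis_retry` shell function covers the simple case. Either way this only hides the symptom; it does not fix the underlying network problem.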

If you enable some output on the “compiling…” stage, we’d get something to work with to find out what takes it so long.

It could be the network issues. Or it could be e.g. excessive thrashing due to too many jobs running in parallel. Java build jobs are known to use up to multiple gigabytes of memory each, and a build machine has 7.5GB.
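
To make the memory argument concrete, here is a rough back-of-the-envelope sketch in Python. Only the 7.5GB figure comes from the build machine; the per-job memory and the amount reserved for the OS are assumptions picked for illustration, and `max_parallel_jobs` is a hypothetical helper.

```python
import math


def max_parallel_jobs(machine_gb, per_job_gb, reserved_gb=1.0):
    """Largest number of jobs whose combined memory fits on the machine."""
    usable = machine_gb - reserved_gb
    if usable <= 0 or per_job_gb <= 0:
        return 0
    return math.floor(usable / per_job_gb)


# 7.5 GB machine, assumed 1 GB kept for the OS.
print(max_parallel_jobs(7.5, 2.0))  # assumed 2 GB per JVM -> 3
print(max_parallel_jobs(7.5, 3.0))  # assumed 3 GB per JVM -> 2
```

If the build forks more JVMs than this estimate allows, swapping would be a plausible explanation for a compilation step that suddenly takes several times longer.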

I have created a PR revealing compilation details, the build is here: https://travis-ci.org/keycloak/keycloak/builds/568509983

The results are:

  • 1 job failing because of “No output has been received in the last 10m0s” (link)
  • 1 job failing with no obvious reason because it exceeded max time during compilation (link)
  • 3 connection timeouts when accessing repository.jboss.org (link) (link) (link)

Perhaps a networking issue (limited bandwidth) could be the culprit?

Currently many jobs are failing due to “Connection reset” from maven-central@google: link link link, and more are likely to follow.

“Connection timed out” was also spotted when accessing github.com: link

Hope this helps with identifying the root cause.
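
To keep track of how often each symptom shows up across jobs, the messages quoted in this thread could be tallied from downloaded job logs. A small Python sketch under that assumption follows; the category names and the `classify_line` and `tally` helpers are made up for illustration, and the patterns cover only the messages seen so far.

```python
import re
from collections import Counter

# Patterns taken from messages quoted in this thread; the category
# names on the left are invented for this sketch.
PATTERNS = {
    "no_output": re.compile(r"No output has been received in the last"),
    "connection_timeout": re.compile(
        r"Connection timed out|Client\.Timeout exceeded"
    ),
    "connection_reset": re.compile(r"Connection reset"),
}


def classify_line(line):
    """Return the first matching category name, or None if nothing matches."""
    for name, pattern in PATTERNS.items():
        if pattern.search(line):
            return name
    return None


def tally(lines):
    """Count how many log lines fall into each failure category."""
    counts = Counter()
    for line in lines:
        category = classify_line(line)
        if category is not None:
            counts[category] += 1
    return counts
```

Feeding it the lines of each failed job log gives a quick count per category, which may help show whether the failures cluster around one host or are spread across the network.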