Travis Python archive download is flaky

#1

Hi Travis CI,

My daily cron build failed this morning because the download of Travis’s Python archives failed in two of my jobs, as follows:

pypy2.7-6.0 is not installed; attempting download
Downloading archive: https://s3.amazonaws.com/travis-python-archives/binaries/ubuntu/16.04/x86_64/pypy2.7-6.0.tar.bz2
curl -sSf -o pypy2.7-6.0.tar.bz2 {archive_url}
curl: (56) GnuTLS recv error (-54): Error in the pull function.
Unable to download pypy2.7-6.0 archive. The archive may not exist. Please consider a different version.

(The same thing happened for my Python 3.8-dev job with the https://s3.amazonaws.com/travis-python-archives/binaries/ubuntu/16.04/x86_64/python-3.8-dev.tar.bz2 download.)

I couldn’t get back to my computer to click the “restart job” button until just now, and that button is hidden in the mobile UI, so my project showed the red “build failing” badge all day. When I restarted the 3.8-dev job just now, it succeeded, so this was a transient error. Because restarting replaced that job’s failed build log, I won’t restart the pypy2.7 job yet (leaving my project with the “build failing” badge for the time being) so you can still see the failure: https://travis-ci.org/jab/bidict/jobs/489092043

This same thing has happened for my project several times in the past. I suggest making the following improvements (in order of impact):

  1. Improve reliability of these downloads, and document how often they’re expected to fail.
  2. Detect this type of failure and automatically retry later at a better time.
  3. Don’t use the “build failing” badge when this is the only cause of failure (e.g. behave as though allow_failures had been set).
  4. Don’t hide the “restart job” button in the mobile UI.
  5. When a job is restarted, don’t have the new build output clobber the previous build output.

Thanks for your consideration!

#2

We can add retries to the curl command to make it a little more resilient.
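
As a rough illustration of what retrying the download could look like, here is a minimal sketch of a shell retry wrapper. The `retry` function, its arguments, and the `archive_url` variable are hypothetical names chosen for this example; this is not Travis's actual build script. curl also has a built-in `--retry` option, but by default it only retries errors it classifies as transient, which may not cover a mid-transfer TLS receive error like the one above, so an explicit loop is shown instead.

```shell
#!/usr/bin/env bash
# Hypothetical helper (an assumption for illustration, not Travis's code):
# run a command up to <max_attempts> times, sleeping <delay> seconds
# between failed attempts.
# Usage: retry <max_attempts> <delay_seconds> <command> [args...]
retry() {
  local max_attempts=$1
  local delay=$2
  shift 2
  local attempt=1
  # Re-run the command until it exits with status 0.
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "Command failed after $attempt attempts: $*" >&2
      return 1
    fi
    echo "Attempt $attempt failed; retrying in ${delay}s..." >&2
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Example usage (archive_url is a placeholder for the real download URL):
# retry 3 5 curl -sSf -o pypy2.7-6.0.tar.bz2 "$archive_url"
```

A wrapper like this only helps with transient failures; if the archive really does not exist, every attempt fails and the original error message still applies.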

#3

This just happened again in two of the jobs for today’s CRON build:
https://travis-ci.org/jab/bidict/jobs/490102372
https://travis-ci.org/jab/bidict/jobs/490102374

I’ll follow that PR. Thanks for looking into this.

#4

@BanzaiMan, since this flakiness can cause working builds to display an erroneous “build failing” badge, could you please give users a one-click manual override to set a project’s build badge back to “passing”?

P.S. I didn’t want to leave my project’s “build failing” badge up any longer, and the failed build log no longer seemed necessary for debugging, so I just restarted the https://travis-ci.org/jab/bidict/jobs/490102372 job. This time it got past the flaky download step, but this could happen again at any time.

#5

Happened again today: https://travis-ci.org/jab/bidict/jobs/493292613

#6

Just clicked restart build, to hopefully clear the spurious “build failing” badge. While your underlying fix is still in flight, can you provide any workaround or manual override such as any of the ones suggested above? Thanks for your consideration.

#7

Well, a “manual override” is to use language: generic and download and install Python yourself.
https://github.com/matthew-brett/multibuild does this.

#8

I used to do this but decided to go back to using Travis-provided Python versions once they became better-maintained a while back. The kind of workaround I’m looking for is for Travis to implement a stopgap for “don’t mark my whole project as ‘build failing’ when we can tell a job has failed in the first few seconds, before any of its tests have run” given the known flakiness of the Travis Python download step.

#9

Python archives should now be distributed via CDN and the download should be more reliable.

#10

We’ve moved our archives again, to GCS. Speed should be decent, and connections more reliable than S3.
