Sporadic POST failures when deploying to GitHub Releases

I get random failures when deploying to GitHub. The deploy appears to fail for reasons unrelated to Travis, but it is not retried:

POST https://uploads.github.com/repos/DanySK/Experiment-2019-EAAI-Processes/releases/23795018/assets?name=by-time.tar.gz: 500 - Error saving asset (Octokit::InternalServerError)
failed to deploy

Example build

Should the deploy get wrapped into a travis_retry internally?
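For context, travis_retry re-runs a command a few times before giving up. A minimal sketch of that behavior (hypothetical, not Travis's actual implementation; the retry count and pause are illustrative) looks like:

```shell
# Hypothetical sketch of a travis_retry-style wrapper: run the given
# command, and retry up to 3 times with a short pause between attempts.
retry() {
  local result=0
  local count=1
  while [ "$count" -le 3 ]; do
    "$@" && return 0          # success: stop retrying
    result=$?                 # remember the failing exit status
    echo "Command failed (attempt $count of 3), retrying..." >&2
    count=$((count + 1))
    sleep 2
  done
  return "$result"            # all attempts failed
}

# Usage (illustrative): wrap the deploy step instead of failing on one 500.
# retry dpl --provider=releases ...
```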

This looks like a 500 from GitHub, and it may prove difficult to eradicate on our end.

A quick look at … points to the use of … in dpl.

I’ve read that most commands are (under the hood) called with travis_retry. Can a similar strategy apply to dpl?
If I restart the build, the deployment succeeds.

Sure, we can look into adding retry logic around dpl, but I also think a built-in retry mechanism within dpl would be useful.

A 500 means a problem on the remote server, so retrying immediately won't solve it in the general case; it will only mitigate some of them. A retry alone isn't a solution that will let you rest easy.

A real solution would retry after a significant delay, by which time whatever problem existed on the server may have been resolved.
This, however, creates a perverse incentive to retry regardless of the cause of the failure, even when it is on your side, thus wasting CI resources.
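The "retry after a significant delay" idea can be sketched as a wrapper with a growing wait between attempts (a hypothetical example; the initial delay, attempt count, and doubling factor are assumptions, not dpl's actual behavior):

```shell
# Hypothetical sketch: retry a command with a long, growing delay so a
# transient server-side 500 has time to clear before the next attempt.
# RETRY_DELAY (seconds) is an illustrative knob, defaulting to one minute.
retry_with_backoff() {
  local delay="${RETRY_DELAY:-60}"
  local attempt
  for attempt in 1 2 3; do
    "$@" && return 0          # success: stop retrying
    echo "Attempt $attempt failed; waiting ${delay}s before retrying..." >&2
    sleep "$delay"
    delay=$((delay * 2))      # back off: 60s, 120s, ...
  done
  return 1                    # all attempts failed
}
```

Compared to an immediate retry, this trades a few minutes of wall-clock time for a much better chance that the server-side incident has passed.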

The reported failure was most likely caused by https://www.githubstatus.com/incidents/fxbbtd7mhz1c; I was getting 500s for operations like push and login during that window.

I believe the current situation is worse than automatic retrying: my only option right now is restarting the whole build, which takes over 20 minutes of full CPU usage. That is hardly a good way to save CI resources or shorten time to delivery.

I believe a retry after a minute would mitigate many of these cases. It would also be great to be able to restart a single phase of the build (ideally, just the deploy).