This is honestly just an expression of frustration with the Travis deploy tooling, as my team has already decided to migrate off. It would be reassuring to hear that y’all recognize these as problems and have a plan to address them, however.
My team adopted the deploy feature to deploy a fairly simple application whose deploy consists of (a) running migrations and (b) firing off two codedeploy deployment groups, in that order.
We’ve run into countless problems since:
1. There is no way to limit concurrent builds on master. As a result, we frequently run into issues where two people merge around the same time and the deploy fails: the first deploy hasn’t finished before the second one starts, and dpl errors out. We have to manually intervene and retry in this case. This happens about once a day at 5-10 deploys a day.
2. Further, because two builds can race to production, it’s possible for an older commit to deploy after a newer commit, simply because the newer commit’s build finishes faster. If we manage to detect this, it requires manual intervention. This happens about once a week at 5-10 deploys a day.
3. An error at one stage of a deploy doesn’t break the build, but allows the next stage to proceed. This is still an outstanding issue in the codedeploy plugin for dpl v2. My team is looking at making a PR to fix this since it caused a major outage (though we would appreciate it if your team could address it first).
4. dpl v2’s error handling is still flawed. Besides the issue in #3, there were several times during development where the codedeploy plugin simply failed without explanation. In one case, I believe I had misspecified the GitHub repository.
5. AFAICT there is no support for common deploy workflows, like canarying or human approval.
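To make the first two items concrete, here is a minimal Python sketch of the guarantee we’d expect any deploy orchestrator to provide: one deploy at a time, and never an older build after a newer one. Everything in it is hypothetical (`DeployGate`, `run_deploy`, the idea of ordering by a monotonically increasing build number); it is not anything Travis or dpl provides, and a real version would need shared state across build workers rather than an in-process lock.

```python
import threading


class DeployGate:
    """Hypothetical gate: serializes deploys and skips stale ones."""

    def __init__(self):
        self._lock = threading.Lock()
        # Build number of the newest deploy that has gone out so far.
        self._last_deployed = 0

    def deploy(self, build_number, run_deploy):
        # Assumption: build numbers increase in merge order.
        # Only one deploy runs at a time; a concurrent caller waits here
        # instead of erroring out.
        with self._lock:
            if build_number <= self._last_deployed:
                # A newer build already shipped, so this one is stale.
                return "skipped"
            run_deploy()
            # Record the build only after the deploy succeeded.
            self._last_deployed = build_number
            return "deployed"
```

With this in place, if build 42 deploys first and the slower build 41 arrives afterwards, `gate.deploy(41, ...)` returns `"skipped"` rather than rolling production back to an older commit.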
Our interactions with support and this forum have also been really discouraging.
First, one support staff told me I shouldn’t use YAML anchors, even though that’s recommended in the docs and was unrelated to our problem.
Another person seemed to miss the context of what I’d posted and asked us to tell them what we’d changed in the config. Nothing had changed: the system failed after weeks of not failing in that way, which was apparent from the support message.
The feature request thread asking for a fix to the concurrent deploys issue didn’t get a commitment to put anything on the roadmap. Instead, we were told to build our own tooling: “Feature Request: Limit concurrent builds by branch / tag / PR”
Y’all, there is no company on earth that wants continuous deploy tooling that doesn’t enforce order. Ensuring that deploys go out in the correct order is a minimally viable feature of any deploy orchestrator. Except for teams that never deploy more than once a day, practically nobody wants a tool that doesn’t support this.
Yet this basic functionality is literally impossible in Travis.
P.S. To be clear, we are paying customers to the tune of several k a year and projected 5-8k spend by EOY, if we were to stick around.