Duplicate job attempts for PR builds causing "Queued — Build Created" problems

We’ve been having an issue on and off for a while now that looks extremely similar to the issues others have been reporting, where CI jobs never complete successfully for GitHub PRs. However, I only just noticed one key difference.

Unlike what others have been reporting, when our PR checks won’t complete, it turns out it’s because there are TWO “Travis CI - Pull Request” checks listed for the same PR. They’re also both listed in the Checks tab as separate GitHub Actions. I thought one of the two was a Branch build, but only today realized they’re both Pull Request builds.

What’s weirder, they’re both THE SAME BUILD. Both jobs are showing status for the exact same TravisCI Build job, listed by the same number, but showing different status. If I click through to Travis-ci.com from either one, I land on the same status page there (showing a successful build).

It looks like, because they’re both pointing at the same Travis job, only one of the two checks ever receives the build status update from Travis, and the other one will remain in limbo forever.
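For anyone who wants to confirm the same pattern without clicking through the UI, here is a minimal sketch. It assumes the commit's check runs have already been fetched as dictionaries shaped like the entries in GitHub's "list check runs for a Git reference" REST response (`name`, `status`, `details_url`), and that both duplicate checks carry the same `details_url`. The function name and the sample data are hypothetical.

```python
from collections import defaultdict


def find_duplicate_checks(check_runs):
    """Return {(name, details_url): [statuses]} for checks that appear more than once.

    Two check runs with the same name and the same details URL are treated
    as duplicates watching the same build (an assumption of this sketch).
    """
    groups = defaultdict(list)
    for run in check_runs:
        key = (run["name"], run["details_url"])
        groups[key].append(run["status"])
    return {key: statuses for key, statuses in groups.items() if len(statuses) > 1}


# Hypothetical sample data mirroring the situation described above:
# one check finished, its twin is stuck in "queued".
sample = [
    {"name": "Travis CI - Pull Request", "status": "completed",
     "details_url": "https://travis-ci.com/example/repo/builds/1"},
    {"name": "Travis CI - Pull Request", "status": "queued",
     "details_url": "https://travis-ci.com/example/repo/builds/1"},
    {"name": "Other check", "status": "completed",
     "details_url": "https://example.invalid/other"},
]

for (name, url), statuses in find_duplicate_checks(sample).items():
    print(f"{name} -> {url}: {statuses}")
```

A group whose statuses differ (one `completed`, one `queued`) is the stuck-twin case.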

Here’s an example of a PR currently in that state. It’s not urgent, so I can leave it there for a bit.

Experience tells me that if I were to push a commit, even an empty one, onto that PR branch, that would un-stick the build. The second “phantom” PR check would vanish, and the other one would complete successfully.
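In case it helps anyone else stuck in the same state, the empty-commit workaround can be sketched like this. It relies on git's `--allow-empty` flag for `git commit` and assumes the PR branch is the one currently checked out; the function names, the default remote name, and the commit message are just placeholders.

```python
import subprocess


def empty_commit_commands(branch, remote="origin", message="Re-trigger CI"):
    """Return the git commands that create an empty commit and push it to `branch`.

    Assumes `branch` is the currently checked-out PR branch.
    """
    return [
        ["git", "commit", "--allow-empty", "-m", message],
        ["git", "push", remote, branch],
    ]


def push_empty_commit(branch, remote="origin", repo_dir="."):
    """Run the commands inside `repo_dir`; raises CalledProcessError if git fails."""
    for cmd in empty_commit_commands(branch, remote):
        subprocess.run(cmd, cwd=repo_dir, check=True)


if __name__ == "__main__":
    # Print the commands rather than running them, so nothing is pushed by accident.
    for cmd in empty_commit_commands("my-pr-branch"):
        print(" ".join(cmd))
```

Calling `push_empty_commit("my-pr-branch")` from inside the fork's working copy would actually perform the push.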

But nothing else seems to correct this, including restarting the Travis job — that would only cause both checks to go back into “Queued — Build Created” state, and once again only one of them would make it back out to “Successful”.

As far as I know we have everything configured correctly. We’ve switched over to the GitHub Actions integration, we’re using travis-ci.com, etc. We were encouraged to upgrade our config after having more-typical issues with builds on travis-ci.org.

What could be causing those duplicate PR build entries?

In possibly-related news, I only just now noticed that my fork (which is the source of that PR) had an old travis-ci.org Webhook entry still configured in its settings, so I removed that.

Is there any way that could have been the cause of this?

Could a similar old Webhook be causing this, if it’s still configured on the target repository? (I don’t have access to those settings, but I can fire off an email and ask the admin to check for something, if I know where to point them.)

In your project’s Settings on GitHub, check the Webhooks and Integrations tabs.

There should be no Travis webhooks, and only one Travis app, “Travis CI”.

Then check the branch protection rules on the Branches tab and make sure there are no duplicate status check entries.
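If you would rather script those checks, here is a rough sketch. It assumes you have already fetched the repository's webhook list (entries shaped like GitHub's "list repository webhooks" REST response, where each hook has a `config.url`) and the protected branch's required status check contexts (a list of strings). The function names, the sample URLs, and the idea of matching on the `travis-ci.org` hostname are assumptions made for illustration, not something documented by Travis or GitHub for this problem.

```python
from collections import Counter
from urllib.parse import urlparse


def legacy_travis_hooks(hooks, legacy_host="travis-ci.org"):
    """Return the hooks whose delivery URL points at `legacy_host` or one of its subdomains."""
    matches = []
    for hook in hooks:
        url = hook.get("config", {}).get("url", "")
        host = urlparse(url).hostname or ""
        if host == legacy_host or host.endswith("." + legacy_host):
            matches.append(hook)
    return matches


def repeated_contexts(contexts):
    """Return the required status check names that are listed more than once."""
    counts = Counter(contexts)
    return sorted(name for name, n in counts.items() if n > 1)


# Made-up sample data; the URLs are placeholders, not real hook endpoints.
sample_hooks = [
    {"id": 1, "config": {"url": "https://hooks.travis-ci.org/placeholder"}},
    {"id": 2, "config": {"url": "https://example.invalid/hook"}},
]
sample_contexts = ["Travis CI - Pull Request", "docs", "Travis CI - Pull Request"]

print([h["id"] for h in legacy_travis_hooks(sample_hooks)])
print(repeated_contexts(sample_contexts))
```

Anything the first function returns is a candidate for deletion from the Webhooks tab; anything the second returns is a duplicated required check.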

So, as an update on this, it somehow appears that it somehow was the old travis-ci.org webhook somehow still installed on my fork, that was somehow causing duplicate CI job checks on upstream PRs created from my fork. Somehow!*

The upstream admin checked all of the suggested settings and found nothing amiss, so nothing was changed there, but since I’ve deleted that webhook the issue hasn’t recurred.

I won’t even pretend to understand why that would have fixed it, nor why it would’ve been causing this problem to begin with. I’m just crossing my fingers that it really is cleared up for good.

* – (I feel the number of 'somehow’s in that paragraph adequately expresses my level of comprehension regarding the actual mechanics of whatever was happening here.)

Can someone please the Open Source on travis-ci.com - Travis CI documentation?

We followed that to migrate the Python project from travis-ci.org to travis-ci.com, but there was no explanation about webhooks: https://github.com/python/core-workflow/issues/371

…Can someone please update the docs, I’m guessing? (You out a word. :wink:)

I have to concur, there’s insufficient info in the current (still-“beta” officially, I know) docs. Especially since the migration from legacy .org to travis-ci.com / GitHub Actions is apparently not 100% reliable about cleaning up after itself (if it even tries to). And as we’ve seen, leftover old-Travis configuration can apparently cause problems.

I decided to test whether this situation was just a fluke, by checking whether any of my other repos were using travis-ci.org. It turned out I did have one last holdout, another forked project for which I have commit access.

So I signed in to my account @ travis-ci.org for the first time in ~6-8 months and submitted a new “beta application” for migration. After (instantly) receiving the email with the migration link, I followed it and re-initiated that process, only this time I made sure to authorize migration of ALL of my repositories.

Since the migration completed, that last repo is now showing up at travis-ci.com and is no longer listed as using “Legacy integration” on my travis-ci.org dashboard.

HOWEVER, looking in the settings, once again the Webhook for travis-ci.org IS still installed.

It seems like migrating users should be advised to check for (and then delete) any travis-ci.org webhooks after the migration. And it’d be useful for the migration documentation to present it as SOP / a blanket recommendation, rather than something that’s only suggested as a troubleshooting / problem response step.

I suppose I’ll email this to support@travis-ci.com; I don’t know whether they monitor these forums in an official-enough capacity that posting it here is enough to assume it reaches the proper channels.

(Specifically, while I believe that @native-api is a Travis “insider”, I don’t want to make any presumptions that their assistance is offered in any sort of official capacity. Thanks for that assistance, BTW.)

This is still happening for CPython PRs. Some PRs appear to have the Travis CI check added twice; one of them completes successfully, whereas the other stalls, blocking the PR from being merged. Example from this PR:

However, for some PRs the check is only added once and completes, so that the PR isn’t blocked from being merged (example here), but there are other examples of blocked PRs, like this one. All in the same repo, and it’s not clear what conditions cause some PRs to have the doubled checks. The repo uses the GitHub Actions API rather than an older API.

@vsajip Ayup, that sure looks familiar. And I bet if you were to click on those two Travis checks, you’d find that they’re both watching the exact same Travis JOB (same exact build#), which is why only one of them will ever receive status updates.

If I’m right about how this happens (though I have no idea why), the cause is that the submitter of the PR in question has old, legacy travis-ci.org Webhooks still configured on their fork of the repo. Pushing any commit at all to the PR branch should unstick the individual PR, but only deleting those webhooks from their fork’s Settings would fix it for good.

(I should also mention that I did indeed escalate this to Travis-CI support, and one of their reps is actively tracking the issue and working on finding a permanent solution.)

I don’t think this is the cause because the PR that @vsajip linked is by a completely new contributor who only forked the repo a few days ago. I doubt they have their own travis-ci webhook set up for the repo.

Ah, but they may at some point have authorized Travis to automatically install on “all new repos”. Then again, I agree that I could simply be wrong about the pathology.

Speculation aside, that’s a very useful data point which I’m going to pass along to Mustafa @ Travis (the rep tracking this issue), as I don’t think this has been encountered on forks that were created post-migration until now.

Everything else certainly looks the same — both checks are watching Travis build 62373, so only one of them received the status notifications… oh! This is also interesting.

One job:

The other job:

(Sorry about the blurry text, that’s a driver issue on my system.)

…Check out the build times, though. I don’t think I’ve ever seen one of these where the checks appear to have been created for the same job, but DAYS apart from each other!

Anyway, hopefully all of this new info will be helpful in pinning down the source of these problems. Thanks!

You need to contact GitHub support to find out why this happens; only they know the mechanism and have the logs.

@native-api Travis-CI ticket #20712 is open to track this issue.

Awww, nutz!

@ammaraskar was right, it isn’t the webhooks. I just had this happen again on one of my PRs, despite the old webhooks having been completely cleared out of my account weeks ago.

I’ve now experimentally disabled the “Require status checks to pass before merging” feature entirely for my fork of that repo (it previously did have “Travis CI - Pull Request” set as a required check), to see if that perhaps is the real culprit.

Guess we wait and see, now…

I seem to be having the same problem with one of my PRs. It had been working fine all last week, but started misbehaving yesterday.

PR:

Hello,

We have made related changes to prevent duplicate Travis CI GitHub checks on your PRs. Could you please let us know at support@travis-ci.com if you are still experiencing the issue?

Thanks.

Thanks, Mustafa!

In my own case, the issue hasn’t occurred for the past 2-3 weeks, since I removed all of my fork’s required status checks on 2020-09-26. (That may be coincidental, of course. I’ve thought this was cleared up before only to have it come back again.) I’ll keep an eye out for any more occurrences.

I know this was affecting some of the other members of our project as well, so I’ll ask around to see if anyone is still having issues.

With three copies of the same check, no less! I think that’s a new record, I’ve only ever had two. :laughing:

@mustafa - Thanks for the update. I updated my PR and this time it only created the one expected check. Looks good.

Thanks again.