A Rust job somehow uses wrong compiler version installed by another job

RalfJung · June 22, 2020, 9:59am

Something is going very wrong with this job: it says that the rustc version is rustc 1.46.0-nightly (4fb54ed48 2020-06-14), which is 1 week outdated, but at the same time rustup update says

  nightly-x86_64-unknown-linux-gnu unchanged - rustc 1.46.0-nightly (a8cf39911 2020-06-21)

To me it looks like whatever caching travis is doing when I say - rust: nightly has gone seriously wrong.

RalfJung · June 22, 2020, 10:10am

Wow, even after adding

      # Make sure we really have the latest nightly.
      - rustup toolchain uninstall nightly
      - rustup toolchain install nightly
      - rustc -Vv

I still get a nightly from a week ago. How is that even possible?

RalfJung · June 22, 2020, 10:16am

Oh that is odd, I somehow have two toolchains:

$ rustup show
Default host: x86_64-unknown-linux-gnu
rustup home:  /home/travis/.rustup

installed toolchains
--------------------

nightly-2020-06-15-x86_64-unknown-linux-gnu (default)
nightly-x86_64-unknown-linux-gnu

active toolchain
----------------

nightly-2020-06-15-x86_64-unknown-linux-gnu (default)
rustc 1.46.0-nightly (4fb54ed48 2020-06-14)

RalfJung · June 22, 2020, 11:01am

I ended up with this work-around:

      # Make sure we really have the latest nightly.
      # For some reason Travis puts us on a snapshot by default...
      - rustup default nightly
      - rustc -Vv
      - cargo -V

native-api · June 22, 2020, 3:05pm

It looks like this default was set by your script, ci/miri.sh, in an earlier build.

Rustup installs toolchains into ~/.rustup/ (with its binaries in ~/.cargo/bin/), so everything that you change there is cached between builds.

As such, cleaning your cache and avoiding messing up Travis’ stock functionality without knowing what you are doing should fix the problem.

RalfJung · June 22, 2020, 3:19pm

That’s a different job though? It’s the Miri job, not the all-features job.
Are caches shared across all jobs…?

Travis doesn’t provide a good way to compute the nightly version we need, so there’s no avoiding messing with rustup unfortunately. I know pretty well what I am doing with rustup, but I didn’t expect travis to confuse the caches of different jobs.

native-api · June 22, 2020, 3:35pm

I didn’t check if that was the exact job that poisoned the cache or not (I’ll look into that now). This is just very strong evidence that you are doing something funny and as a result, are yourself responsible for diagnosing the resulting errors rather than (or at least before) reporting them as Travis bugs. A build log reports cache IDs that jobs save and restore so you are able to pull the thread back from a faulty job.

If you don’t want Travis’ language installation logic for whatever reason, use language: generic and install all the necessary tooling yourself (the generic image doesn’t have Rustup, but you can install and cache it). nightly by definition means “rolling release”, so it’s logical that Travis doesn’t allow selecting a specific build.
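A minimal sketch of that language: generic route, assuming the standard rustup installer script and that both ~/.cargo and ~/.rustup should be cached; the toolchain choice and the script line are only placeholders:

```yaml
language: generic

cache:
  directories:
    # rustup and cargo binaries
    - $HOME/.cargo
    # installed toolchains and the default-toolchain setting
    - $HOME/.rustup

install:
  # Install rustup non-interactively with a minimal nightly toolchain.
  - curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y --default-toolchain nightly --profile minimal
  # Make cargo and rustc visible to the following steps.
  - export PATH="$HOME/.cargo/bin:$PATH"
  - rustc -Vv

script:
  # Placeholder: replace with the project's real build and test commands.
  - cargo test
```

With this setup nothing is installed behind your back, but everything rustup changes in the cached directories still carries over to the next build that uses the same cache.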

RalfJung · June 22, 2020, 3:39pm

I don’t have any idea how to go about debugging this, but my theory for now is that indeed state somehow got mixed up between jobs, so I changed the other job to set the default back to nightly before the cache gets created. Thanks for pointing that out!
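Roughly, the change looks like this (a sketch, assuming the pinned default lives in the cached rustup home shown by rustup show above, and that Travis runs before_cache right before it packs the cache):

```yaml
before_cache:
  # Put the default back to plain nightly so the saved cache
  # does not carry a pinned toolchain into other jobs.
  - rustup default nightly
```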

There are various good reasons to test not always the latest nightly but some specific nightly, determined e.g. by the contents of some file.
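For illustration, picking the nightly from a file could look something like this. It is only a sketch: ci/nightly-date is a hypothetical file holding a date such as 2020-06-15, not something from the actual repository.

```yaml
before_script:
  # ci/nightly-date is a hypothetical file containing e.g. 2020-06-15
  - NIGHTLY="nightly-$(cat ci/nightly-date)"
  - rustup toolchain install "$NIGHTLY"
  # This changes rustup's default, which ends up in the cache
  # unless it is reset before the cache is saved.
  - rustup default "$NIGHTLY"
  - rustc -Vv
```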

Yeah, I did that (language: generic) elsewhere, also because Travis tends to cache a bit too much for Rust. But that always feels like a hack.

native-api · June 22, 2020, 4:31pm

Here’s the thread pull:

Faulty job. Pull Request #44 #164.8 failed, June 22, 2020 11:53:29, rustc 1.46.0-nightly (4fb54ed48 2020-06-14), cache ID PR.44/cache--linux-xenial-e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855--cargo-nightly – takes it from PR
1st build in this PR. Pull Request #44 #162.8 failed, June 22, 2020 11:11:51, rustc 1.46.0-nightly (4fb54ed48 2020-06-14), cache ID master/cache--linux-xenial-e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855--cargo-nightly.tgz – takes it from master
Last master build before that. master Merge pull request #43 from RalfJung/raw_ref, #161.8 passed, June 20, 2020 16:40:24, rustc 1.46.0-nightly (2d8bd9b74 2020-06-19), no cache, saves cache ID master/cache--linux-xenial-e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855--cargo-nightly

So cache poisoning happens somewhere between 161.8 and 162.8. Let’s take a closer look at build 161, especially at Miri’s handiwork.

Miri job from 161. master Merge pull request #43 from RalfJung/raw_ref, #161.1 failed, sets default to nightly-2020-06-15-x86_64-unknown-linux-gnu… saves cache ID master/cache--linux-xenial-e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855--cargo-nightly

So jobs 161.1 and 161.8 use the same cache ID!

That’s because their configurations differ only in job name and script. The cache ID is computed from the OS image, language version and environment variables as specified in the .yml (see the topic “What is the cache ID for a job generated from? How can staged jobs use the same cache?” for source code links). Job name and script are not considered!

It looks like whoever was using the MIRI= env var in an earlier build (huh, that was actually you) was doing that for a reason! But they forgot to leave a comment in the yml explaining their decision, so that knowledge got lost!

RalfJung · June 23, 2020, 6:48am

I added it because the other jobs did, and I think it was just a hack to get the name to display, but I might be wrong. Anyway the MIRI env var actually became meaningful for the Miri tool, so we had to stop using it.

Thanks a lot! That is very helpful. So I’ll add some dummy env var then to ensure the cache is separate.
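Something along these lines, as a sketch: CACHE_NAME is just an arbitrary variable name, the job names and the all-features script line are placeholders, and the whole thing assumes the env vars declared in the yml really do feed into the cache ID as described above.

```yaml
jobs:
  include:
    - name: miri
      rust: nightly
      # Dummy variable: its only purpose is to give this job its own cache.
      # Job name and script alone do not separate the caches.
      env: CACHE_NAME=miri
      script: ci/miri.sh
    - name: all-features
      rust: nightly
      env: CACHE_NAME=all-features
      # Placeholder for whatever this job actually runs.
      script: cargo test --all-features
```

As a side note, the e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 part of the cache IDs above is the SHA-256 digest of empty input, which would fit with neither job setting any env var.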

native-api · June 23, 2020, 3:10pm

I created a feature request to eliminate this source of hard-to-catch errors: Use all parameters of a job to calculate cache ID; retire "shared caches"

You may wish to upvote it if you are interested in seeing it implemented.
