20+ minutes spent adding yarn folder to the cache

I noticed that our build times on travis have been getting exceptionally long since mid-November:

(Our builds generally take around 20 minutes but lately they regularly hit 40 minutes.

I’ve investigated the travis_time time spans in the log and two operations stand out:

  1. “adding /home/travis/.cache/yarn to cache”
  2. packing a new cache archive.
I've captured some excerpts from this build's log...
e[0Ktravis_time:start:1c2711ba
e[0Kadding /home/travis/.cache/yarn to cachee[0m
travis_time:end:1c2711ba:start=1579128522968430154,finish=1579128959926954236,duration=436958524082,event=setup_cache
e[0Ktravis_fold:end:cache.yarn

That’s a duration of 436 seconds (~7.2 minutes).

[0Ktravis_time:start:04a85855
e[0Ke[32;1mchanges detected (content changed, file is created, or file is deleted):\n/home/travis/.cache/yarn/v4/npm-lighthouse-5.6.0-88d67a30-d5bf-4de8-a1a2-fc22ba290046-1579130159063/node_modules/lighthouse/.appveyor.yml
/home/travis/.cache/yarn/v4/npm-lighthouse-5.6.0-88d67a30-d5bf-4de8-a1a2-fc22ba290046-1579130159063/node_modules/lighthouse/assets/architecture.jpg
/home/travis/.cache/yarn/v4/npm-lighthouse-5.6.0-88d67a30-d5bf-4de8-a1a2-fc22ba290046-1579130159063/node_modules/lighthouse/assets/example_audit.png
/home/travis/.cache/yarn/v4/npm-lighthouse-5.6.0-88d67a30-d5bf-4de8-a1a2-fc22ba290046-1579130159063/node_modules/lighthouse/assets/example_dev_tools.png
/home/travis/.cache/yarn/v4/npm-lighthouse-5.6.0-88d67a30-d5bf-4de8-a1a2-fc22ba290046-1579130159063/node_modules/lighthouse/assets/lh_favicon_32px.png
/home/travis/.cache/yarn/v4/npm-lighthouse-5.6.0-88d67a30-d5bf-4de8-a1a2-fc22ba290046-1579130159063/node_modules/lighthouse/assets/lh_favicon_76px.png
/home/travis/.cache/yarn/v4/npm-lighthouse-5.6.0-88d67a30-d5bf-4de8-a1a2-fc22ba290046-1579130159063/node_modul\n...e[0m
e[32;1mchanges detected, packing new archivee[0m
e[32;1muploading master/cache--linux-xenial-e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855--node-12.tgze[0m
e[32;1mcache uploadede[0m
travis_time:end:04a85855:start=1579130219140939102,finish=1579130862739083015,duration=643598143913,event=cache

That’s a duration of 643.60 seconds (~10.7 minutes).

Obviously, we don’t expect either of these operations to take 10 minutes to complete. (I’ll note we also ran into this yarn gpg problem, though I’m not sure it’s related.)

And the sawtooth pattern in the graph is curious. Each tooth is roughly a week long.

Here’s our travis.yml if that’s useful.

In https://travis-ci.org/GoogleChrome/lighthouse/jobs/637668750#L1545-L1555, I see file names such as:

/home/travis/.cache/yarn/v4/npm-lighthouse-5.6.0-88d67a30-d5bf-4de8-a1a2-fc22ba290046-1579130159063/node_modules/lighthouse/assets/architecture.jpg
/home/travis/.cache/yarn/v4/npm-lighthouse-5.6.0-88d67a30-d5bf-4de8-a1a2-fc22ba290046-1579130159063/node_modules/lighthouse/assets/example_audit.png
/home/travis/.cache/yarn/v4/npm-lighthouse-5.6.0-88d67a30-d5bf-4de8-a1a2-fc22ba290046-1579130159063/node_modules/lighthouse/assets/example_dev_tools.png
/home/travis/.cache/yarn/v4/npm-lighthouse-5.6.0-88d67a30-d5bf-4de8-a1a2-fc22ba290046-1579130159063/node_modules/lighthouse/assets/lh_favicon_32px.png
/home/travis/.cache/yarn/v4/npm-lighthouse-5.6.0-88d67a30-d5bf-4de8-a1a2-fc22ba290046-1579130159063/node_modules/lighthouse/assets/lh_favicon_76px.png

What are these, and what big are they? The name such as npm-lighthouse-5.6.0-88d67a30-d5bf-4de8-a1a2-fc22ba290046-1579130159063seems to suggest some sort of generated files that you don’t know about before they are actually generated.

We determine whether or not to update the cache by comparing the file paths and their checksums before and after the job, so if a small file is moved from one generated directory to another, it will trigger a cache upload. In addition, numerous small files (say, 10,000 files of 1KB each) is a lot more time-consuming to archive and compress than a single large file (a single 10,000KB file). If either of these considerations is true, I suggest not caching these files.

One build from early November (https://travis-ci.org/GoogleChrome/lighthouse/jobs/607335291#L1350) does not seem to cache these files, and the cache upload is much faster.

I’ve also looked at the sizes of the caches. The latest one on the master branch is 2.63 GB. If these small files are easy/quick to generate, then not caching those may also be beneficial.