No cache support on arm64?

It seems arm64 does not support caching yet? I have cache directories defined in my .travis.yml, and other builds, including Linux, macOS and Windows, cache the directories properly, but arm64 seems to neither check for an existing cache nor store one after a successful run.
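
For context, a minimal example of the kind of cache configuration I mean (the directory is just a placeholder, not my actual path):

cache:
  directories:
    # anything listed here should be restored at the start of a build
    # and re-uploaded after a successful run
    - $HOME/.cache/example-tool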

Recognizing this is an alpha release, is there a known-issues page tracking what we should not expect to work at this time?

Thanks for working on arm64 by the way! I’d wanted it for some time but never had enough time to stand up my own environment.

Jay

Hi @jay0lee

Yes, indeed, cache support is not there yet - it will come, though, over the next few weeks. We’re working on making arm builds feature-complete.

It’s terrific that we could fill the ARM gap in your build chain. It’ll take some more time to get out of Alpha, and all your feedback is much appreciated!

Happy building!

Michał

I also need caching for this project: https://travis-ci.org/status-im/nim-beacon-chain/jobs/599265212

(mainly so I don’t abuse a GitHub LFS repo’s download limits, and also to reduce the runtime by caching some binaries)

By the way, since ARM64 builds are significantly slower than AMD64 builds, would it be possible to increase the 50-minute time limit?

I’d love to see the 50-minute time limit bumped too, though I have found the arm64 environment can perform pretty well, since it offers 32 CPUs for a job while amd64 only offers 2. I did need to go back and make sure my workloads were optimized for parallel processing, and obviously not all workloads can be.

I don’t expect this to last past the alpha stage.

Hi there,

@stefantalpalaru:
since ARM64 builds are significantly slower than AMD64 builds,

This is actually one of the things we observe - some builds are a couple of seconds faster, some take significantly longer when comparing AMD vs. ARM. We’re still gathering data on this, but with the capacity currently available, it seems to depend a lot on how things are structured in code and tests. Still, raising the 50-minute timeout cap is indeed something we’ll look into soon as a quick remedy.

@jay0lee:
since it offers 32 CPUs for a job, amd64 only offers 2

Every LXD container is also assigned 2 vCPUs when it starts up (see our CI Environments → Overview documentation); however, the LXD host can dynamically shuffle the available CPU time between jobs. There’s a nice top-level explanation of LXD resource usage control by the LXD lead developer, Stéphane Graber. Thus, the computing resources assigned to a build job can vary with each triggered build, depending on the current LXD host workload. So, with regard to

@stefantalpalaru:
I don’t expect this to last past the alpha stage.

It all depends on the utilization of the ARM infrastructure. I’d expect there to be days, or times of day, where there’s plenty of capacity free for your more complex builds (if you think in terms of nightly builds, etc.).
And yes, it’s a good idea to optimize your workload for parallel processing if that fits your case.
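
For illustration, that per-container cap is the kind of limit LXD expresses through instance/profile config keys; a minimal sketch of such a profile, assuming plain LXD defaults (not necessarily our actual production configuration):

# hypothetical LXD profile snippet, for illustration only
config:
  limits.cpu: "2"              # expose 2 host CPUs to the container
  limits.cpu.allowance: "100%" # soft CPU-time share the host can rebalance between containers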

Hope this helps and gives some insight into the resources behind your ARM build jobs!

So what’s the optimal number of parallel jobs, in this flexible environment?

It depends on your exact case (edit: and the workload on our end, of course /edit), I’m afraid. I’d start with 2-4 and see how it works for you.

Personally, I detect the CPU count and then set make’s parallelism accordingly:

# use every available CPU for parallel make jobs
cpucount=$(nproc --all)
make -j"$cpucount"

Which seems to work well for C compilers. I also have some tests that can run in parallel and seem to perform best with 2x CPU count threads:

-j$(( $cpucount * 2 ))

This way, if they lower the CPU count on arm or raise it on x86_64, my workloads just auto-adjust.

Jay

You might be getting all the host’s cores this way: nproc shows all hosts cores in lxc despite limitiation | Proxmox Support Forum

I got a comment stuck in the spam queue, but nproc in LXC seems to show the host’s cores, not the container’s limit. That’s why it’s 32 instead of 2.

Our arm64 builds are slow chiefly because the cache doesn’t exist yet, but also because pre-built software is less available - for example, there are no wheels for the Python package numpy, so it must be built (compiling C code), and neither ccache nor the pip package cache can store the result for the next build. We probably can’t incorporate arm64 into our CI until this improves, because the build time is already at 37 minutes. That’s a shame, because arm64’s extended-precision support differs in a way that usefully exercises our code base.
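
Once caching works on arm64, the kind of configuration I’d expect to use is roughly the following (the paths are pip’s and ccache’s default cache locations; adjust as needed):

cache:
  directories:
    - $HOME/.cache/pip   # pip's download/built-wheel cache, so numpy isn't rebuilt every time
    - $HOME/.ccache      # ccache's object cache for the C compilation step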

Thanks for that, @aarchiba - a +1 for the cache priority. We’ll let you know when it’s available.

Hello @aarchiba @stefantalpalaru @jay0lee

Cache support has been added today for ARM builds.

Please take a look at the “Xenial and cache support added for ARM builds” changelog post, as well as the Build Environment Overview documentation update.

Happy Building!

Thanks, Michał! Great to see work on this. Note that, starting today, I was getting failures at the cache stage on arm64 due to Ruby not being present:

https://travis-ci.org/jay0lee/GAM/jobs/600803062

I did manage to resolve this by pre-installing ruby:

  addons:
    apt:
      packages:
        - ruby

Jay

There doesn’t seem to be a separate cache key (file name) per architecture, so on ARM I’m getting the cached binaries I created on AMD64: https://travis-ci.org/status-im/nim-bearssl/jobs/600974595

There’s also a suspicious error in the “store build cache” stage: “/home/travis/.casher/bin/casher: line 230: md5deep: command not found” (even though it’s clear there was an attempt to install it a few lines above).

Thanks for the feedback, @stefantalpalaru and @jay0lee - looking into that!

I’m getting the same error as @stefantalpalaru (“store build cache” stage: “/home/travis/.casher/bin/casher: line 230: md5deep: command not found”).

Has there been any progress made here? Any workaround?

There is a workaround: use different (YAML-defined) env vars to get different cache keys. E.g.: https://github.com/status-im/nim-beacon-chain/blob/93d501cd435fbf37debf44b7d04c228762e157ef/.travis.yml#L37

Official docs: https://docs.travis-ci.com/user/caching/#caches-and-build-matrices
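
For anyone who just wants the shape of that workaround, here is a rough sketch (ARCH_LABEL is an arbitrary name, nothing Travis-specific - any YAML-defined env var becomes part of the cache key):

jobs:
  include:
    - arch: amd64
      env: ARCH_LABEL=amd64   # distinct env var value => distinct cache entry
    - arch: arm64
      env: ARCH_LABEL=arm64

cache:
  directories:
    - $HOME/cached-binaries   # placeholder for whatever you actually cache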

Hi!

For the original issue (no proper cache key taking the architecture into account) there’s a fix deployed - see https://github.com/travis-ci/travis-build/pull/1801

@oskar - feedback much appreciated, thank you. As @stefantalpalaru already correctly pointed out, in a build-matrix case where specific job characteristics are shared by more than one job, adding a public env var to create unique cache entries is the proper approach. Thanks for surfacing that one, @stefantalpalaru!