No cache support on arm64?

It seems arm64 does not support caching yet? I have cache directories defined in my .travis.yml, and other builds, including Linux, macOS and Windows, cache the directories properly, but arm64 seems to neither check for an existing cache nor store one after a successful run.
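
For context, a minimal example of the kind of cache configuration I mean (the directory is just a placeholder, not my actual path):

cache:
  directories:
    # anything listed here should be restored at the start of a build
    # and re-uploaded after a successful run
    - $HOME/.cache/example-tool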

Recognizing this is an alpha release, is there a known-issues page tracking what we should not expect to work at this time?

Thanks for working on arm64 by the way! I’d wanted it for some time but never had enough time to stand up my own environment.

Jay

Hi @jay0lee

Yes, indeed, cache support is not there yet - it will come, though, over the next few weeks. We’re working on making arm builds feature-complete.

It’s terrific that we could fill the ARM gap in your build chain. It’ll take some more time to get out of Alpha, and all your feedback is much appreciated!

Happy building!

Michał

I also need caching for this project: https://travis-ci.org/status-im/nim-beacon-chain/jobs/599265212

(mainly so I don’t abuse a GitHub LFS repo’s download limits, and also to reduce the runtime by caching some binaries)

By the way, since ARM64 builds are significantly slower than AMD64 builds, would it be possible to increase the 50-minute time limit?

I’d love to see the 50-minute time limit bumped too, though I have found the arm64 environment can perform pretty well, since it offers 32 CPUs for a job while amd64 only offers 2. I did need to go back and make sure my workloads were optimized for parallel processing, and obviously not all workloads can be.

I don’t expect this to last past the alpha stage.

Hi there,

@stefantalpalaru:
since ARM64 builds are significantly slower than AMD64 builds,

This is actually one of the things we observe - some builds are a couple of seconds faster, some take significantly longer when comparing AMD vs. ARM. We’re still gathering data on this, but with the capacity currently available, it seems to depend a lot on how things are structured in code and tests. Still, raising the 50-minute timeout cap is indeed something we’ll look into soon as a quick remedy.

@jay0lee:
since it offers 32 CPUs for a job, amd64 only offers 2

Every LXD container is also assigned 2 vCPUs when it starts up (see our CI Environments → Overview documentation); however, the LXD host can dynamically shuffle the available CPU time between jobs. There’s a nice top-level explanation of LXD resource usage control by the LXD lead developer, Stéphane Graber. Thus, the computing resources assigned to a build job can vary with each triggered build, depending on the current LXD host workload. So, with regard to

@stefantalpalaru:
I don’t expect this to last past the alpha stage.

It all depends on the utilization of the ARM infrastructure. I’d expect there to be days, or times of day, where there’s plenty of capacity free for your more complex builds (if you think in terms of nightly builds, etc.).
And yes, it’s a good idea to optimize your workload for parallel processing if that fits your case.
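
For illustration, that per-container cap is the kind of limit LXD expresses through instance/profile config keys; a minimal sketch of such a profile, assuming plain LXD defaults (not necessarily our actual production configuration):

# hypothetical LXD profile snippet, for illustration only
config:
  limits.cpu: "2"              # expose 2 host CPUs to the container
  limits.cpu.allowance: "100%" # soft CPU-time share the host can rebalance between containers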

Hope this helps and gives some insight into the resources behind your ARM build jobs!

So what’s the optimal number of parallel jobs, in this flexible environment?

It depends on your exact case (edit: and the workload on our end, of course /edit), I’m afraid. I’d start with 2-4 and see how it works for you.

Personally, I detect the CPU count and then set make’s parallelism accordingly:

# use every available CPU for parallel make jobs
cpucount=$(nproc --all)
make -j"$cpucount"

Which seems to work well for C compilers. I also have some tests that can run in parallel and seem to perform best with 2x CPU count threads:

-j$(( $cpucount * 2 ))

This way, if they lower the CPU count on arm or raise it on x86_64, my workloads just auto-adjust.

Jay

You might be getting all the host’s cores this way: nproc shows all hosts cores in lxc despite limitiation | Proxmox Support Forum

I got a comment stuck in the spam queue, but nproc in LXC seems to show the host’s cores, not the container’s limit. That’s why it’s 32 instead of 2.

Our arm64 builds are slow chiefly because the cache doesn’t exist yet, but also because pre-built software is less available - for example, there are no wheels for the Python package numpy, so it must be built (compiling C code), and neither ccache nor the pip package cache can store the result for the next build. We probably can’t incorporate arm64 into our CI until this improves, because the build time is already at 37 minutes. That’s a shame, because arm64’s extended-precision support differs in a way that usefully exercises our code base.
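
Once caching works on arm64, the kind of configuration I’d expect to use is roughly the following (the paths are pip’s and ccache’s default cache locations; adjust as needed):

cache:
  directories:
    - $HOME/.cache/pip   # pip's download/built-wheel cache, so numpy isn't rebuilt every time
    - $HOME/.ccache      # ccache's object cache for the C compilation step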

Thanks for that, @aarchiba - a +1 for the cache priority. We’ll let you know when it’s available.

Hello @aarchiba @stefantalpalaru @jay0lee

Cache support has been added today for ARM builds.

Please take a look at the “Xenial and cache support added for ARM builds” changelog post, as well as the Build Environment Overview documentation update.

Happy Building!

Thanks, Michał! Great to see work on this. Note that, starting today, I was getting failures at the cache stage on arm64 due to Ruby not being present:

https://travis-ci.org/jay0lee/GAM/jobs/600803062

I did manage to resolve this by pre-installing ruby:

  addons:
    apt:
      packages:
        - ruby

Jay

There doesn’t seem to be a separate cache key (file name) per architecture, so on ARM I’m getting the cached binaries I created on AMD64: https://travis-ci.org/status-im/nim-bearssl/jobs/600974595

There’s also a suspicious error in the “store build cache” stage: “/home/travis/.casher/bin/casher: line 230: md5deep: command not found” (even though it’s clear there was an attempt to install it a few lines above).

Thanks for the feedback, @stefantalpalaru and @jay0lee - looking into that!

I’m getting the same error as @stefantalpalaru (“store build cache” stage: “/home/travis/.casher/bin/casher: line 230: md5deep: command not found”).

Has there been any progress made here? Any workaround?

There is a workaround: use different (YAML-defined) env vars to get different cache keys. E.g.: https://github.com/status-im/nim-beacon-chain/blob/93d501cd435fbf37debf44b7d04c228762e157ef/.travis.yml#L37

Official docs: https://docs.travis-ci.com/user/caching/#caches-and-build-matrices
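
For anyone who just wants the shape of that workaround, here is a rough sketch (ARCH_LABEL is an arbitrary name, nothing Travis-specific - any YAML-defined env var becomes part of the cache key):

jobs:
  include:
    - arch: amd64
      env: ARCH_LABEL=amd64   # distinct env var value => distinct cache entry
    - arch: arm64
      env: ARCH_LABEL=arm64

cache:
  directories:
    - $HOME/cached-binaries   # placeholder for whatever you actually cache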

Hi!

For the original issue (no proper cache key taking the architecture into account) there’s a fix deployed - see https://github.com/travis-ci/travis-build/pull/1801

@oskar - feedback much appreciated, thank you. As @stefantalpalaru already correctly pointed out, in a build-matrix case where specific job characteristics are shared by more than one job, adding a public env var to create unique cache entries is the proper approach. Thanks for surfacing that one, @stefantalpalaru!