It seems arm64 does not support caching yet? I have cache directories defined in my .travis.yml, and the other builds (Linux, macOS and Windows) cache those directories properly, but arm64 seems to neither check for an existing cache nor store one after a successful run.
Recognizing this is an alpha release, is there a known-issues page tracking what we should not expect to work at this time?
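For reference, this is roughly the relevant part of my .travis.yml (the cached paths here are illustrative, not my exact directories):

```yaml
# Illustrative excerpt - the same cache block is honored on my
# Linux/macOS/Windows jobs but appears to be ignored on arm64.
arch: arm64

cache:
  directories:
    - $HOME/.cache/pip   # pip's download/wheel cache
    - $HOME/.ccache      # ccache's object store
```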
Thanks for working on arm64 by the way! I’d wanted it for some time but never had enough time to stand up my own environment.
Yes, indeed, cache support has not been added yet. It will come over the next weeks; we’re working on making the ARM builds feature-complete.
It’s terrific that we could fill the ARM gap in your build chain. It’ll take some more time to get out of alpha, and all your feedback is much appreciated!
I’d also love to see the 50-minute time limit bumped, though I have found the arm64 environment can perform pretty well since it offers 32 CPUs per job, where amd64 only offers 2. I did need to go back and make sure my workloads were optimized for parallel processing, and obviously not all workloads can be.
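For anyone else trying this: the only changes I needed were in the build and test commands themselves, so they scale with however many cores the job actually gets. A rough sketch, assuming a make-based build and pytest-xdist for the test suite (neither of which may match your project):

```yaml
script:
  # Scale compilation with whatever CPU count the job was given,
  # instead of a hard-coded -j2 carried over from amd64.
  - make -j"$(nproc)"
  # Run tests in parallel too (requires the pytest-xdist plugin).
  - pytest -n "$(nproc)" tests/
```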
@stefantalpalaru:
since ARM64 builds are significantly slower than AMD64 builds,
This is actually one of the things we observe: some builds are a couple of seconds faster, while others take significantly longer when comparing AMD vs. ARM. We’re still gathering data, but with the capacity currently available, it seems to depend a lot on how the code and tests are structured. Still, raising the 50-minute timeout cap is indeed something we’ll look into soon as a quick remedy.
@jay0lee:
since it offers 32 CPUs for a job, amd64 only offers 2
Every LXD container is also assigned 2 vCPUs when starting up (see our CI Environments → Overview documentation); however, the LXD host can dynamically shuffle available CPU time between jobs. There’s a nice top-level explanation of LXD resource usage control by the LXD lead dev, Stéphane Graber. Thus, the computing resources assigned to a build job can vary for each triggered build, depending on the current LXD host workload. So, with regard to
@stefantalpalaru:
I don’t expect this to last past the alpha stage.
It all depends on the utilization of the ARM infrastructure. I’d expect there would be days, or times of the day, where there’s plenty of capacity free for your more complex builds (if you think in terms of nightly builds, etc.).
And yes, it’s a good idea to optimize your workload for parallel processing if that fits your case.
Hope this helps and gives some insight into the resources behind your ARM build job.
It depends on your exact case (edit: and the workload on our end, of course /edit), I’m afraid. I’d try with 2-4 in the beginning and see how it works for you.
Our arm64 builds are slow chiefly because the cache doesn’t exist, but also because pre-built software is less available. For example, there are no arm64 wheels for the Python package numpy, so it must be built from source (compiling C code), and neither ccache nor the pip package cache is able to store the result for the next build. We probably can’t incorporate arm64 into our CI until this improves, because the build time is already at 37 minutes. That’s a shame, because arm64’s extended-precision support differs in a way that usefully exercises our code base.
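Once caching does land for arm64, this is roughly the setup I’d expect us to need so that the from-source numpy build pays off across runs (a sketch, assuming the default pip and ccache locations and a gcc toolchain):

```yaml
cache:
  ccache: true              # persists $HOME/.ccache between builds
  directories:
    - $HOME/.cache/pip      # pip keeps locally built wheels here

before_install:
  # Route numpy's C compilation through ccache so a repeated source
  # build can reuse cached object files (toolchain-dependent).
  - export CC="ccache gcc" CXX="ccache g++"
```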
There’s also a suspicious error in the “store build cache” stage: “/home/travis/.casher/bin/casher: line 230: md5deep: command not found” (when it’s clear that there was an attempt to install it a few lines above)
@oskar - feedback much appreciated, thank you. As @stefantalpalaru already correctly pointed out, in a build matrix where specific job characteristics are shared by more than one job, adding a public env var to create unique cache entries is the proper approach. Thanks for surfacing that one, @stefantalpalaru!
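For readers landing here later, a minimal sketch of that workaround: give each matrix job a distinct public env var so it gets its own cache entry (the variable name and values are just examples):

```yaml
jobs:
  include:
    - arch: amd64
      env: CACHE_NAME=amd64   # public env vars are part of the cache key,
    - arch: arm64
      env: CACHE_NAME=arm64   # so each arch gets its own cache entry

cache:
  directories:
    - $HOME/.ccache
```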