Specify CPU type

Is it possible to specify the CPU used in a Travis build? I am aware of the settings described at
Building on Multiple CPU Architectures - Travis CI, but they seem too broad for my use case (happy to be corrected).
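For reference, those settings select an instruction-set architecture per job rather than a specific CPU model. A minimal sketch of what that documented multi-arch matrix looks like (the arch values are from the Travis docs; the language and dist lines are illustrative):

language: python
dist: bionic
arch:
  - amd64    # runs on whatever x86-64 host Travis assigns; the CPU model is not selectable
  - arm64

As far as I can tell there is no key that pins a job to a particular CPU model or feature set within an architecture.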

In particular, I have a branch whose unit tests sometimes pass and sometimes fail, and I have narrowed the cause down to the CPU being used, based on messages produced by TensorFlow.

For example, the passing build has these messages:

2019-12-17 05:57:57.694007: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-12-17 05:57:57.713689: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2300000000 Hz
2019-12-17 05:57:57.714934: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x27ac2b20 executing computations on platform Host. Devices:
2019-12-17 05:57:57.714960: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): Host, Default Version
2019-12-17 05:58:08.328999: W tensorflow/core/framework/cpu_allocator_impl.cc:81] Allocation of 17067520 exceeds 10% of system memory.
2019-12-17 05:58:08.341346: W tensorflow/core/framework/cpu_allocator_impl.cc:81] Allocation of 17067520 exceeds 10% of system memory.
2019-12-17 05:58:08.355016: W tensorflow/core/framework/cpu_allocator_impl.cc:81] Allocation of 17067520 exceeds 10% of system memory.
2019-12-17 05:58:08.373354: W tensorflow/core/framework/cpu_allocator_impl.cc:81] Allocation of 17067520 exceeds 10% of system memory.
2019-12-17 05:58:08.391213: W tensorflow/core/framework/cpu_allocator_impl.cc:81] Allocation of 17067520 exceeds 10% of system memory.

However, the failing build has these messages:

2019-12-19 03:42:03.611822: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2019-12-19 03:42:03.623020: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2800180000 Hz
2019-12-19 03:42:03.623234: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x28340c00 executing computations on platform Host. Devices:
2019-12-19 03:42:03.623259: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): Host, Default Version

From the messages it seems the 2.3GHz CPU produces a pass and the 2.8GHz CPU produces a failure.

I would like to debug a previously passing build for reproducibility.
Currently, all the debug builds I have launched seem to run on a 2.8GHz system/Docker container. Is it possible to force the 2.3GHz configuration? I have tried 20 repeated debug builds recently, and they have all landed on a 2.8GHz system.
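For reference, I can at least confirm which host class a given job landed on by logging the CPU details at the start of the build. A minimal sketch for a Linux build, reading the standard /proc/cpuinfo interface (the before_script placement is just one option):

grep -m1 'model name' /proc/cpuinfo
grep -qw avx512f /proc/cpuinfo && echo 'AVX512F present' || echo 'AVX512F absent'

That makes it obvious from the job log whether a run hit a 2.3GHz or a 2.8GHz host.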


Hello jrwishart,

Do you think it’s possible to link the logs here? Have you considered adding Bazel to your .travis.yml file?

Here’s a good example repository on GitHub on how to do that: Bazel repository.

-Montana

Hi Montana, sorry for the late reply. I stopped actively checking the thread after the new year; I thought there would be email notifications, but there were none. Unfortunately I can’t link to the logs since it’s a private repo. I could try adding Bazel. I’m unfamiliar with it, but it seems powerful based on the wiki page.

For those reading this thread who face a similar issue: the problem was due to the numerical precision of the TensorFlow output. The different hardware caused the TensorFlow output to differ (in the 6th decimal place). That difference is small, but aggregated over the models we were fitting it produced noticeably different output (to 1 decimal place). The solution was to round the TensorFlow output before it was used in follow-up models.
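A minimal sketch of that workaround, assuming the outputs are NumPy arrays (the values and the choice of 4 decimal places are illustrative, not our actual numbers):

import numpy as np

# Two outputs of the same model that differ only around the 6th decimal place,
# as happened when the same TensorFlow graph ran on the two CPU types.
output_cpu_a = np.array([0.1234561, 2.7182811])
output_cpu_b = np.array([0.1234567, 2.7182818])

# Rounding before the values feed any follow-up model makes them identical.
assert np.array_equal(np.round(output_cpu_a, 4), np.round(output_cpu_b, 4))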


Here’s a good example repository on GitHub on how to do that: Bazel repository.

I’m not a Bazel expert, but I believe Bazel uses -march=native by default. That builds for the native machine and can cause problems if the binary is moved to another machine. I also seem to recall it is difficult to get Bazel to stop using -march=native.

Also see How to cross-compile tensorflow-serving on docker build image with bazel when copts contain spaces on Stack Overflow.

If you know how to modify a Bazel configuration, you might try -mno-avx2 to disable AVX2. But it has been my experience that disabling all of the obscure features with -mno-xxx can be tricky. I’ve found it best to avoid -march=native and explicitly specify what you want, like -mavx (for AVX but not AVX2).
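For example, when building TensorFlow from source, instruction-set flags can be passed to Bazel on the command line via --copt (the target label here is the usual TensorFlow pip-package target, shown for illustration):

bazel build --config=opt --copt=-mavx --copt=-mno-avx2 //tensorflow/tools/pip_package:build_pip_package

That should produce a binary that uses AVX but never emits AVX2 instructions, regardless of what the build machine supports.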