Several executables just don't run on your build system (under Docker), and I can see no way to debug them

I have a build system which runs in a Docker container. That Docker container works fine on other Windows hosts.

Running this Docker container to do my builds mostly works, but some of the executables just do nothing. They emit no error message and not even an error status code.

For example, look at the very end of this build:

https://travis-ci.com/github/SophistSolutions/Stroika/jobs/300679115

I run the line:

docker exec --workdir c:/Stroika buildContainer sh -c "cd ThirdPartyComponents/WIX/CURRENT/ && ./candle -v"

This produces no output - no error status.

‘candle’ is a standard build tool. It’s extracted from a zip file in an earlier step. Earlier steps also show that the file permissions all look good, yet it just fails to run.

I also have executables I’ve BUILT in earlier steps of the build (another build):

https://travis-ci.com/github/SophistSolutions/Stroika/jobs/300679112

Look at the end of the console log there:

Builds/Debug/Tests/Test01

I am just trying to run the executable I built (you can see it listed by ls), but it returns exit status 127.

Again, I’ve done basically the same thing using the same container on other Windows hosts. I’m not sure how to debug this without being able to ssh into this VM/container.

There’s always the exit status of the last statement, available in Bash’s special variable $?:

docker exec --workdir c:/Stroika buildContainer sh -c 'cd ThirdPartyComponents/WIX/CURRENT/ && ./candle -v; echo $?'
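
One thing worth checking with a command like this is the quoting, since it decides which shell expands $?. Inside double quotes the calling shell substitutes its own $? before sh -c even starts; inside single quotes the inner shell reports the status of the command it just ran. A minimal sketch that needs no Docker, using false as a stand-in for the failing executable (the variable names are only for illustration):

```shell
# Make sure the calling shell's last status is 0.
true

# Double quotes: the calling shell expands $? first, so the inner shell
# effectively runs "false; echo 0".
outer=$(sh -c "false; echo $?")

# Single quotes: $? is passed through untouched and the inner shell
# prints the exit status of `false`.
inner=$(sh -c 'false; echo $?')

echo "double-quoted: $outer"   # double-quoted: 0
echo "single-quoted: $inner"   # single-quoted: 1
```

The same expansion happens when the string is handed to docker exec ... sh -c from a Bash script, so the single-quoted form is the one that reports what happened inside the container.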

Depending on which product (and which version of it) the sh in your container comes from, it may or may not automatically append “.exe” to the name it searches for if you omit it in the command.

Anything beyond that can only be deduced from the exit code: Windows has a few standard exit codes for when a process fails to start.


The Error 127 and the messages that follow it actually belong to the make command above; stderr output is shown after the build log rather than inline.

Thanks for your feedback.

I’ve updated my test according to your hints:

https://travis-ci.com/github/SophistSolutions/Stroika/jobs/301004464

docker exec --workdir c:/Stroika buildContainer sh -c "./Builds/Debug/Tests/Test01.exe; echo $?"

1

Error code #1 appears to mean "ERROR_INVALID_FUNCTION"

Not a totally clear message, but it provided me with a hint. So I looked at the version of Windows you guys are running in your VM:

OS Version: 10.0.17763 N/A Build 17763

That appears to be almost 2 years old. I’m not sure Docker worked so well two years ago on Windows. I’m testing on version 1909 (18363.720).

I think there is a decent chance this is why this Docker container isn’t working right on Travis CI, but DOES work correctly on all my machines, plus CircleCI.

Is there a setting I can provide to get a more recent version of Windows (I see https://docs.travis-ci.com/user/reference/windows/ says you only support 1809)? Or when do you plan to update to a more recent copy of Windows? This version MAY not support docker very well.

FYI, I rebuilt my container on top of the mcr.microsoft.com/windows/servercore:1809 Docker image, and I get the same issue. So it’s not a mismatch between the container OS and the host OS; it appears that such an old host just doesn’t really support Docker.

The “standard exit codes” I meant are large – e.g. 128+ if the process was terminated or 3221225781 if a dependent DLL was not found. They are substituted by the system when it forcibly terminates the process.
(They are not actually “standard” in that they are not documented as such or “restricted”, but programs themselves typically don’t use such large codes (and it’s in their best interests not to), so if you see one, it’s safe to assume that it was substituted by the system.)
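
To look up one of these large codes, it helps to print it in hexadecimal, since that is how Microsoft lists NTSTATUS values. A small sketch (the helper name to_hex is made up for this example):

```shell
# Print a decimal Windows exit code as an 8-digit hex value.
to_hex() {
    printf '0x%08X\n' "$1"
}

code_hex=$(to_hex 3221225781)
echo "$code_hex"   # 0xC0000135, listed by Microsoft as STATUS_DLL_NOT_FOUND
```

A value in the 0xC0000000 range is a strong hint that the system substituted the code, whereas a small value such as 1 was most likely set by the program itself.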

1 is rather a generic failure exit code that the program itself sets if it encountered an error.

According to https://docs.microsoft.com/ru-ru/virtualization/windowscontainers/deploy-containers/version-compatibility, if the container wasn’t compatible with the host OS, it would not start at all.

The generic exit code rather suggests that the program encountered some error.
Since it gives no output whatsoever about what it considered wrong, I cannot say anything else.

E.g. it might unconditionally assume availability of something that is only available in the latest Windows releases.

At some level, I think your analysis may be right: SOMETHING getting run probably involves a call to the OS which vectors through from the container to the underlying host, and that call is not supported. I DON’T think you have it right, however, about which program is doing that.

This test program is one I coded. If it had a call to a Windows API that was unavailable, that Windows API would return with an error (#1) and the code would continue. That’s not what we are seeing.

More importantly, this issue is NOT restricted to just these test programs I built (compiled as part of the Travis CI build).

It ALSO happens to the ‘boost bootstrapper’ program, which prevents Boost from building.

It ALSO happens to code I don’t even BUILD, but simply extract from a zip file downloaded from the internet:

wget --quiet --no-check-certificate --tries=2 -O …/Origs-Cache/wix311-binaries.zip https://github.com/wixtoolset/wix3/releases/download/wix3111rtm/wix311-binaries.zip .

docker exec --workdir c:/Stroika buildContainer sh -c "cd ThirdPartyComponents/WIX/CURRENT/ && ./candle -v; echo $?"
1

(see https://api.travis-ci.com/v3/job/301395702/log.txt for this above script fragment/output)

WIX is a very widely used Windows build tool. I cannot run even that; I get the same ‘no output’ and ‘exit code 1’.

I tried using the PowerShell ‘unblock’ command on these executables (on the theory that it was some sort of antivirus thing). I’ve tried several such anti-antivirus workarounds. NONE have worked. I’m not sure how to debug this without some sort of debugging access to these environments.

As I said, I’ve run this EXACT container on CircleCI without any problems, and on my own laptop. I THINK it’s pretty likely the difference stems from the version of the host (though I don’t have strong evidence for this).

Still looking for any ideas on how to move forward/fix this.

Do you have any examples of using Travis CI to run a Docker container to build executables and run them?

Thanks,
Lewis.

FYI, sometime in the last week or two there were updates to the build environment for Windows which fixed this issue. Even if it was inadvertent - THANKS!

Lewis.

I am seeing the same issue and it is still there: https://travis-ci.com/github/rweickelt/qbs/jobs/360713587

Various programs in the Windows 1809 Docker image just crash. The image has been working for a few months now and nothing has changed in the host environment.

Since the problem appeared in March, I suspect you’re falling over “You might encounter issues when using Windows Server containers with the February 11, 2020 security update release”.

If it’s only 32-bit executables that are failing, that’s almost certainly it. To fix it, you need to ensure that the base image for the image you’re running is exactly the version of the Windows Server host you’re running on, including the minor parts. Non-matching versions are usually fine as long as they’re on the same side of the February updates, but an exact match is the easiest way to be certain.

My guess is that the host has the security update (I can see 10.0.17763.1098 in @LewisPringle’s failing logs), and the Windows image you’re using does not have the update. It’s possible the other way 'round though.

I’d suggest adding something to your log that outputs the exact host version (cmd //c ver works from bash) and the exact container image platform version (docker image inspect image | grep OsVersion should work) so that this problem can be ruled in or ruled out.
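
As a rough sketch of what that logging could look like, the snippet below pulls the four-part version number out of each piece of text and compares them. The two sample strings stand in for the real command output so the snippet runs anywhere; the image version shown is hypothetical, and the helper name extract_version is made up for this example.

```shell
# Extract the first dotted four-part version number from a line of text.
extract_version() {
    printf '%s\n' "$1" | grep -Eo '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+' | head -n 1
}

# Sample inputs. In a real build these would come from:
#   cmd //c ver
#   docker image inspect <image> | grep OsVersion
host_text='Microsoft Windows [Version 10.0.17763.1098]'
image_text='"OsVersion": "10.0.17763.973",'   # hypothetical image version

host_version=$(extract_version "$host_text")
image_version=$(extract_version "$image_text")

# Compare the full version strings, including the minor (revision) part.
if [ "$host_version" = "$image_version" ]; then
    verdict="match"
else
    verdict="mismatch"
fi

echo "host=$host_version image=$image_version -> $verdict"
```

Printing both values in the build log makes it easy to see at a glance whether host and image sit on the same revision.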

It’s possible that the fix was to roll back the host to pre-February, which would fix old images, but break new images.

Hopefully all the Windows builders are using the same version, otherwise you’d have to dynamically change your base image in the script.
