Arm64/ppc64le segfaults

I get reproducible segfaults in getc(stdin) when stdin is a fifo accessed like this:

mkfifo fifo
exec 8<>fifo
echo asdf >fifo
rm fifo
./a-program-calling-getc <&8

This segfaults on arm64 and ppc64le, on both xenial and bionic, with both gcc and clang. amd64 (on both Linux and OS X) and s390x are unaffected. FWIW, read(0, ...) doesn’t segfault, but it doesn’t read what it was supposed to, either.
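For anyone who wants to try the same setup without the shell, here is a minimal sketch in C. It is an illustration only, not code from the failing builds: the helper names open_unlinked_fifo and getc_from_fd are made up for this example, and it assumes Linux, where opening a FIFO with O_RDWR does not block.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from the thread). Mimics
 *   mkfifo fifo; exec 8<>fifo; echo asdf >fifo; rm fifo
 * and returns a read/write descriptor for the now-unlinked FIFO,
 * or -1 on error. */
int open_unlinked_fifo(const char *path)
{
	if (mkfifo(path, 0600) != 0)
		return -1;

	/* Like `exec 8<>fifo`: O_RDWR on a FIFO does not block on Linux. */
	int fd = open(path, O_RDWR);
	if (fd < 0) {
		unlink(path);
		return -1;
	}

	/* Same payload as `echo asdf >fifo`; it stays buffered in the pipe. */
	if (write(fd, "asdf\n", 5) != 5) {
		close(fd);
		unlink(path);
		return -1;
	}

	/* Like `rm fifo`: the descriptor stays usable after the name is gone. */
	if (unlink(path) != 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Hypothetical helper: moves fd onto descriptor 0 and reads one byte
 * through stdio, roughly what `./a-program-calling-getc <&8` does. */
int getc_from_fd(int fd)
{
	if (dup2(fd, STDIN_FILENO) < 0)
		return EOF;
	return getc(stdin);
}
```

On an unaffected machine, calling getc_from_fd(open_unlinked_fifo("fifo")) should return 'a'. Whether this in-process variant also crashes on the affected workers has not been checked here.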

Fairly minimal example:
https://travis-ci.org/szeder/git/builds/617513593
https://travis-ci.org/szeder/git/builds/617513614

@Michal On the Bionic and Xenial distributions, I am getting segfaults in the ppc64le and arm64 jobs, whereas the amd64 jobs pass.

All the passed and failing jobs can be seen here:
https://travis-ci.com/ghatwala/git/builds/140921043

I saw a similar issue here: “Segfaults in arm64 environment”.

Looking forward to your input. Thanks in advance.

I think your program has undefined behavior:

#include <stdio.h>

int main(int argc, char *argv[])
{
	int c = 'z';
	c = getc(stdin);

	if (c == EOF)
		printf("EOF!\n");
	else
		printf("got '%c' (0x%x)\n", c, c);

	return 0;
}

The first c should probably use %d, not %c, since c is an int. Maybe something like:

printf("got '%d' (0x%x)\n", c, c);

Use g++ -Wall or a Sanitizer to vet the code at compile time.

It doesn’t matter,

int main(...)
{
        getc(stdin);
        return 0;
}

segfaults all the same.

https://travis-ci.org/szeder/git/builds/624647282
https://travis-ci.org/szeder/git/builds/624647400

I checked the issue in the repository https://github.com/junaruga/git/tree/arm64-libc-fifo-segfault-jaruga forked from your repository: https://github.com/szeder/git/tree/arm64-libc-fifo-segfault .

I checked it with strace.

sudo apt-get -yq install strace
strace -f ci/arm64-segfault-demo.sh

The result is here.
https://travis-ci.org/junaruga/git/builds/624724660

For the s390x + gcc (ok) case:

[pid  2769] fstat(0, {st_mode=000, st_size=0, ...}) = 0

For the arm64 + gcc (error) case:

[pid  2834] fstat(0,  <unfinished ...>) = ?

I also checked the glibc version in use.

dpkg -S /usr/include/stdio.h
apt-cache show libc6-dev

See https://travis-ci.org/junaruga/git/jobs/624724663#L61

It seems that libc6-dev version 2.27-3ubuntu1 is used in each Ubuntu case. The latest upstream version is 2.30.

I think you can report this to the glibc project or to Ubuntu’s libc6 package to narrow down the issue: is it specific to Travis, specific to Ubuntu, or independent of the Linux distribution?

For arm64 + gcc (error) case

[pid  2834] fstat(0,  <unfinished ...>) = ?

Indeed, a simple

struct stat st;
fstat(0, &st);

segfaults, too.
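A slightly fuller version of that check, as a sketch: it calls fstat() on a descriptor and reports what kind of file it refers to, so that on a working machine you can confirm that stdin really is the FIFO. The names fd_kind and report_stdin are hypothetical helpers for this example only.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: classify a descriptor using fstat(). */
const char *fd_kind(int fd)
{
	struct stat st;

	/* This is the call that reportedly crashes on the affected workers. */
	if (fstat(fd, &st) != 0)
		return "fstat failed";
	if (S_ISFIFO(st.st_mode))
		return "fifo";
	if (S_ISREG(st.st_mode))
		return "regular file";
	if (S_ISCHR(st.st_mode))
		return "character device";
	return "other";
}

/* Hypothetical helper: print the classification of descriptor 0. */
void report_stdin(void)
{
	printf("stdin is: %s\n", fd_kind(0));
}
```

Calling report_stdin() from main and running the program with <&8 as in the original reproduction would show whether fstat() returns at all before getc() is involved.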

FWIW (probably not much), it works fine on a small Raspberry Pi,
though that’s armv6l, not arm64.

I think you can report this to the glibc project or to Ubuntu’s libc6 package to narrow down the issue: is it specific to Travis, specific to Ubuntu, or independent of the Linux distribution?

No way. It’s an issue in an alpha feature, involving virtualization
infrastructure that I know absolutely nothing about. So it’s up to
the Travis CI folks to investigate further, and to narrow down whether the
issue is rooted on their side in the first place.

I am interested in seeing a resolution to this issue. At least one upstream project will not implement multi-arch CI support because of the perception that Travis multi-arch support is not stable, citing this and other issues.

I am unable to reproduce the segfault on ppc64le outside of the Travis environment. I have attempted to match the environment as closely as possible, with no luck. I made an attempt to troubleshoot the issue inside Travis but found that I could not create a core file on Travis multi-arch, so I am stuck.
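One thing that might be worth trying when no core file appears is raising the core-size limit from inside the test program itself. The sketch below is only a guess at one possible cause: enable_core_dumps is a hypothetical helper, and it will not help if the hard limit is already 0 or if the worker’s kernel.core_pattern setting sends dumps somewhere else.

```c
#include <sys/resource.h>

/* Hypothetical helper: raise the soft core-size limit to the hard limit.
 * Returns 0 on success, -1 on error. */
int enable_core_dumps(void)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_CORE, &rl) != 0)
		return -1;

	/* An unprivileged process may raise its soft limit up to the hard limit. */
	rl.rlim_cur = rl.rlim_max;
	return setrlimit(RLIMIT_CORE, &rl);
}
```

Calling enable_core_dumps() at the top of main, before the getc() or fstat() call, would at least rule out a zero soft limit as the reason for the missing core file.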

Reporting the problem upstream without the ability to reproduce it is useless.
If anyone has ideas or a status update, please post here.

Another possible way to run the ppc64le case is to use os: linux-ppc64le without specifying arch: foo. It is a legacy ppc64le environment in Travis.
It had been running experimentally before the arch: foo syntax was introduced, and os: linux-ppc64le is no longer recommended by Travis. But I think it’s worth trying for now.

Here is the example.

Another possible way to run the arm64 case is to use Drone CI’s arm64 environment.

Here is the example.

If anyone has ideas or a status update, please post here.

One more idea is to use QEMU on the Travis x86_64 (amd64) environment.

Here is the example running aarch64 with the aarch64 cross compiler on x86_64.

Here is the tool I sometimes maintain, as one of its maintainers, to run a container for a specific CPU architecture on x86_64.

But note that the QEMU environment is very slow, and it is only emulation.
So, in my opinion, the native architecture is much better than the QEMU environment. But I would suggest the QEMU way as one of the possible ways to run the aarch64 and ppc64le cases.

@szeder @junaruga
Much appreciated - report and possible verification ways!

@UlkaAsati - thanks for interlinking. The related topic has been narrowed down to a suspicious instance for now, and we are verifying whether that was indeed the case. Here I lean toward agreeing with @junaruga that there may be a cross-platform issue with the tools.

We need to double check that.

cc @djlwilder

Please stand by.
It seems there’s a kernel fix to be done; we’re waiting for that to be available.


@szeder,

Sorry for the late reply.

It’s an issue in an alpha feature, involving virtualization
infrastructure

I found I could side-step a lot of the issues by using Bionic and avoiding down-level platforms like Xenial.

I still have a few failures but I am pretty sure they are due to compiler bugs.