S390x IPv6 connect system call sometimes returning "Connection timed out" in apt-add-repository command

In Ruby project Travis s390x case, apt-add-repositorycommand sometimes , once in a 2 times or 3 times, has existed with error “Error: retrieving gpg key timed out”.

https://travis-ci.org/ruby/ruby/jobs/623537864#L58

$ tool/travis_retry.sh sudo -E apt-add-repository -y "ppa:ubuntu-toolchain-r/test"
+ sudo -E apt-add-repository -y ppa:ubuntu-toolchain-r/test
Error: retrieving gpg key timed out.
gpg: keyring `/tmp/tmpya5afzam/secring.gpg' created
gpg: keyring `/tmp/tmpya5afzam/pubring.gpg' created
gpg: requesting key BA9EF27F from hkp server keyserver.ubuntu.com
gpg: /tmp/tmpya5afzam/trustdb.gpg: trustdb created
gpg: key BA9EF27F: public key "Launchpad Toolchain builds" imported
gpg: Total number processed: 1
gpg:               imported: 1  (RSA: 1)
OK

Note the used tool/travis_retry.sh is just the internal retry script.

Then I found the root cause and how to avoid it after the investigation. The cause is s390x environment’s IPv6 network.

Cause

Here is the simple reproducing script I prepared with strace.

- sudo apt-get install -y strace
- sudo -E strace -f apt-add-repository -y "ppa:ubuntu-toolchain-r/test"

You see only arch: s390x case is failed in the list of CPU architectures cases.

Then download each raw log (log.txt) from the Travis log page, check and compare around connect system calls.

$ grep -n connect log_s390x.txt
$ grep -n connect log_x86_64.txt
$ grep -n connect log_arm64.txt

s390x log (log_s390x.txt)

https://travis-ci.org/junaruga/report-travis-error-retrieving-gpg-key-timed-out/jobs/632360432#L10375-L10380
https://travis-ci.org/junaruga/report-travis-error-retrieving-gpg-key-timed-out/jobs/632360432#L13091-L13099

You see the message “Error: retrieving gpg key timed out.” in the output. The timeout only happned for IPv6 (AF_INET6).

$ sudo -E strace -f apt-add-repository -y "ppa:ubuntu-toolchain-r/test"
...
socket(PF_INET6, SOCK_STREAM|SOCK_CLOEXEC, IPPROTO_TCP) = 3
connect(3, {sa_family=AF_INET6, sin6_port=htons(443), inet_pton(AF_INET6, "2001:67c:1560:8003::8003", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = -1 ETIMEDOUT (Connection timed out)
close(3)                                = 0
socket(PF_INET6, SOCK_STREAM|SOCK_CLOEXEC, IPPROTO_TCP) = 3
connect(3, {sa_family=AF_INET6, sin6_port=htons(443), inet_pton(AF_INET6, "2001:67c:1560:8003::8004", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = -1 ETIMEDOUT (Connection timed out)
close(3)    
...
pid  1948] futex(0x3ff88000c14, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, {1578070812, 434829000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
[pid  1948] write(1, "Error: retrieving gpg key timed "..., 37Error: retrieving gpg key timed out.
) = 37
[pid  1948] futex(0x3ff88000c14, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, ffffffff <unfinished ...>
[pid  1961] <... connect resumed> )     = -1 ETIMEDOUT (Connection timed out)
[pid  1961] close(3)                    = 0
[pid  1961] socket(PF_INET6, SOCK_STREAM|SOCK_CLOEXEC, IPPROTO_TCP) = 3
[pid  1961] connect(3, {sa_family=AF_INET6, sin6_port=htons(443), inet_pton(AF_INET6, "2001:67c:1560:8003::8004", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = -1 ETIMEDOUT (Connection timed out)

amd64 (x86_64) log (log_x86_64.txt)

The part connecting to IPv6 is EADDRNOTAVAIL and passed, because the amd64 environment, IPv6 is disabled. And the apt-add-repository exits successfully.

$ sudo -E strace -f apt-add-repository -y "ppa:ubuntu-toolchain-r/test"
...
connect(3, {sa_family=AF_INET6, sin6_port=htons(443), inet_pton(AF_INET6, "2001:67c:1560:8003::8004", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = -1 EADDRNOTAVAIL (Cannot assign requested address)
...

arm64 log (log_arm64.txt)

In arm64 environment, IPv6 is enabled.
And the apt-add-repository exits successfully.

$ sudo -E strace -f apt-add-repository -y "ppa:ubuntu-toolchain-r/test"
...
socket(PF_INET6, SOCK_DGRAM, IPPROTO_IP) = 3
connect(3, {sa_family=AF_INET6, sin6_port=htons(443), inet_pton(AF_INET6, "2001:67c:1560:8003::8003", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 0
getsockname(3, {sa_family=AF_INET6, sin6_port=htons(59930), inet_pton(AF_INET6, "2001:db8::216:3eff:fe20:cf30", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
connect(3, {sa_family=AF_UNSPEC, sa_data="\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 16) = 0
connect(3, {sa_family=AF_INET6, sin6_port=htons(443), inet_pton(AF_INET6, "2001:67c:1560:8003::8004", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 0
getsockname(3, {sa_family=AF_INET6, sin6_port=htons(43769), inet_pton(AF_INET6, "2001:db8::216:3eff:fe20:cf30", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
close(3)                                = 0
socket(PF_INET6, SOCK_STREAM|SOCK_CLOEXEC, IPPROTO_TCP) = 3
connect(3, {sa_family=AF_INET6, sin6_port=htons(443), inet_pton(AF_INET6, "2001:67c:1560:8003::8003", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = -1 EHOSTUNREACH (No route to host)
close(3)                                = 0
socket(PF_INET6, SOCK_STREAM|SOCK_CLOEXEC, IPPROTO_TCP) = 3
connect(3, {sa_family=AF_INET6, sin6_port=htons(443), inet_pton(AF_INET6, "2001:67c:1560:8003::8004", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = -1 EHOSTUNREACH (No route to host)
close(3)                                = 0
...

How to avoid this issue (temporary workflow)

Here is the .travis.yml for the temporary workflow.
The solution is to disable IPv6 for only s390x case.
I executed the modified s390x case’s job 10 times continuously, and all job is passed. So, I can say it avoided this issue.

matrix:
  include:
    - arch: amd64
    - arch: arm64
    - arch: ppc64le
    - arch: s390x
      before_script:
        - ip a
        - sudo sysctl -w net.ipv6.conf.all.disable_ipv6=1
        - sudo sysctl -w net.ipv6.conf.default.disable_ipv6=1
        - sudo sysctl -w net.ipv6.conf.lo.disable_ipv6=1
        - ip a

But I wish this issue will be fixed without disabling IPv6 on s390x.

Note: software-properties-common deb package fixing the related issue.

The above reported case is on dist: xenial, but this issue happens on both dist: xenial and dist: bionic.

apt-add-repository command is in software-properties-common deb package.

# dpkg -S /usr/bin/apt-add-repository
software-properties-common: /usr/bin/apt-add-repository

https://packages.ubuntu.com/xenial-updates/software-properties-common

Here is an issue related to “Error: retrieving gpg key timed out.”
https://bugs.launchpad.net/ubuntu/+source/software-properties/+bug/1713962

This issue exists software-properties-common deb package for xenial and xenial-updates, and fixed in bionic and bionic-updates repository.

But this issue actually still happens in bionic and bionic-updates.
I felt that when the command apt-add-repository failing with “Error: retrieving gpg key timed out.” in dist: bionic, the command is freezing more long time.

Just fyi, travis build on z is enabled here… https://travis-ci.org/ruby/ruby

Yes, we are trying to add and enable Travis Z (s390x) and PowerPC (ppc64le) case to the ruby/ruby GitHub repository by following Ruby bug tracking system’s ticket.

Enabling IBM PowerPC/Z cases in Travis CI
https://bugs.ruby-lang.org/issues/16360

And after my following pull-request to disable IPv6 on Travis s390x case was merged recently, the s390x case has been stable, so far no failure after merging the PR to master branch.

Disable IPv6 on Travis s390x case.

After we will see the s390x case is stable enough, we are going to run s390x case without Allowed Failures.

But this is the temporary workflow to pass the s390x case.

This reporting IPv6 connection issue can happen when users add an additional repository such as “ppa:ubuntu-toolchain-r/test” to use gcc-N, in a default condition that is enabling IPv6 on s390x.

So, I wish this issue will be fixed.

1 Like

Hi @junaruga @cwsolee

Thanks for information and write-up.

Does it still happen often on s390x when ipv6 is enabled? I mean is the issue persisting w/o this workaround?

Does it still happen often on s390x when ipv6 is enabled? I mean is the issue persisting w/o this workaround?

Yes. it still happens often on s390x now when ipv6 is enabled without this workaround. In my experience that is once in 3 times.

How to reproduce this issue for your environment.

There is an easy way to reproduce this issue.

  1. Fork this repository: https://github.com/junaruga/report-travis-error-retrieving-gpg-key-timed-out
  2. Then enable Travis for your forked repository, and run Travis for the latest master branch.
  3. Rerun “s390x default condition” job 3 times continuously clicking the “Restart Job” button. (Click => Confirm the job’s done, Click => Confirm the job’s done => …)
  4. You see the error for the 3rd time or 5th’s restarting job.

As a reference, see following job that today I executed.

Thank you for checking it.

1 Like

Got it, thanks for setting up reproducibility case for us!

Imprint