No space left on device after xenial update

After updating the dist to xenial, our builds are failing intermittently, but repeatedly, with the error: ‘No space left on device’

We have 254 MB of dependencies, and we test against a PostgreSQL database that we clear before each test, so it is unlikely that our tests are filling up the disk.

These are the failing builds, there are more of these in the project:

https://travis-ci.org/fossasia/open-event-server/builds/591743394?utm_source=github_status&utm_medium=notification
https://travis-ci.org/fossasia/open-event-server/builds/591743551?utm_source=github_status&utm_medium=notification

Use df -h and du -shx to check the space available on the device and the space occupied by individual subtrees.
This will help you localize at which stages, and to which places, the space goes.
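In shell form, the two checks look like this (the paths passed to du below are only examples; point it at whichever subtrees you suspect):

```shell
# Overall size, usage and free space for every mounted filesystem
df -h

# Space occupied by individual subtrees:
#   -s  summarize (one total per argument)
#   -h  human-readable sizes
#   -x  do not cross filesystem boundaries
# The paths here are examples; errors on unreadable files are discarded.
du -shx /tmp /var/log 2>/dev/null
```

Running these at the start of the job and again near the failure point shows which mount is shrinking and which directory tree is growing.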

We know the failing command; everything is localized. There’s only one stage – the dredd tests – where the space goes. It’s not a question of what we can do to prevent it or how to debug what is going wrong. We already know that.

Also, this only happens on the xenial distribution, not if we go back to trusty. So I don’t think this is something we control.

Our tests write the bare minimum of data to the PostgreSQL database between tests, so it can’t be reduced.

Since you still don’t know which exact operations eat up the space, and where – that’s not enough localization.
Localize it further – e.g. run those commands after each test, or at specific points inside the implicated tests if needed.
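One way to get that per-test localization is a small wrapper that measures free space around each command. This is a sketch, not anything from the project’s build; run_and_measure is a hypothetical helper name, and df --output assumes GNU coreutils (as on Travis’s Ubuntu images):

```shell
# Hypothetical wrapper: run one test command, then report how much
# space on / it consumed. Requires GNU df for --output.
run_and_measure() {
    before=$(df --output=avail / | tail -1)   # KiB free before
    "$@"                                      # run the wrapped command
    after=$(df --output=avail / | tail -1)    # KiB free after
    echo "$* consumed $((before - after)) KiB on /"
}
```

Wrapping each test (or each suspicious step) this way turns “the space goes somewhere during the dredd stage” into “this specific test consumed this many KiB”.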

So, I don’t think this is something we control.

Without localization, Travis doesn’t know whether it’s something they control, either.

I don’t know if you have even seen the builds I have linked.

As I have said, this does not happen on the trusty distribution, only on xenial. And since all the operations are DB-based, I don’t have anything else to say. The project is open source, and the build logs have been provided. It is definitely something the xenial image of Travis controls, as it does not happen on trusty and also sometimes does not happen on xenial.

Our tests are not flaky. I am not sure what more you want us to localize.

The build shows this error: sqlalchemy.exc.OperationalError: (psycopg2.errors.DiskFull) could not write to file "pg_xlog/xlogtemp.8485": No space left on device

What more localization do you want?

fail: GET (200) /v1/tasks/1 duration: 11011ms

info: Hooks handler stderr: Traceback (most recent call last):
  File "/home/travis/virtualenv/python3.7.1/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 761, in _commit_impl
    self.engine.dialect.do_commit(self.connection)
  File "/home/travis/virtualenv/python3.7.1/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 505, in do_commit
    dbapi_connection.commit()
psycopg2.errors.DiskFull: could not write to file "pg_xlog/xlogtemp.8018": No space left on device

How will more localization help? What will it accomplish?

This build, with the same code, does not fail: https://travis-ci.org/fossasia/open-event-server/builds/591743128?utm_source=github_status&utm_medium=notification

How can that not be controlled by Travis?

Similar issues have been reported in the past by several people: https://github.com/travis-ci/travis-ci/issues/8375

Still, I ran df before each test and at the start of the build.

Here are the results:

Before the start of the tests:

Filesystem      Size  Used Avail Use% Mounted on
udev            3.7G     0  3.7G   0% /dev
tmpfs           748M  8.6M  739M   2% /run
/dev/sda1        68G   12G   57G  18% /
tmpfs           3.7G  8.0K  3.7G   1% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           3.7G     0  3.7G   0% /sys/fs/cgroup
none            768M   51M  718M   7% /var/ramfs
tmpfs           748M     0  748M   0% /run/user/2000

Last output after the tests:

Filesystem      Size  Used Avail Use% Mounted on
udev            3.7G     0  3.7G   0% /dev
tmpfs           748M  8.7M  739M   2% /run
/dev/sda1        68G   12G   57G  18% /
tmpfs           3.7G  8.0K  3.7G   1% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           3.7G     0  3.7G   0% /sys/fs/cgroup
none            768M  713M   56M  93% /var/ramfs
tmpfs           748M     0  748M   0% /run/user/2000

This time, the build did not fail.

Hi @iamareebjamal @native-api
We found the solution for this: increasing the space for /var/ramfs.
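For anyone hitting the same wall: the df output above shows /var/ramfs (768M) going from 7% to 93% used during the tests, which is where the space goes. A remount along these lines, run early in the build (e.g. in before_install), is one way to grow that mount; the 2G size and the assumption that this RAM-backed mount honours a size option are ours to verify against the image:

```shell
# Hypothetical before_install step: grow the RAM-backed filesystem
# that fills up during the tests in the Travis xenial image.
sudo mount -o remount,size=2G /var/ramfs
df -h /var/ramfs   # confirm the new size took effect
```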
