No space left on device after xenial update

After updating the dist to xenial, our builds are failing intermittently, but repeatedly, with the error: ‘No space left on device’

We have 254 MB of dependencies, and we test against a PostgreSQL database that we clear before each test, so it is unlikely that our tests are filling up the disk.

These are the failing builds, there are more of these in the project:

https://travis-ci.org/fossasia/open-event-server/builds/591743394?utm_source=github_status&utm_medium=notification
https://travis-ci.org/fossasia/open-event-server/builds/591743551?utm_source=github_status&utm_medium=notification

Use df -h and du -shx to check the space available on the device and the space occupied by individual subtrees.
This will help you localize at which stages, and to which places, the space goes.
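In shell form, the two checks look like this (the paths passed to du below are only examples; point it at whichever subtrees you suspect):

```shell
# Overall size, usage and free space for every mounted filesystem
df -h

# Space occupied by individual subtrees:
#   -s  summarize (one total per argument)
#   -h  human-readable sizes
#   -x  do not cross filesystem boundaries
# The paths here are examples; errors on unreadable files are discarded.
du -shx /tmp /var/log 2>/dev/null
```

Running these at the start of the job and again near the failure point shows which mount is shrinking and which directory tree is growing.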

We know the failing command; everything is localized. There’s only one stage – the dredd tests – where the space goes. It’s not a question of what we can do to prevent it or how to debug what is going wrong. We already know that.

Also, this only happens on the xenial distribution, not if we go back to trusty. So I don’t think this is something we control.

Our tests write the bare minimum of data to the PostgreSQL database between tests, so it can’t be reduced.

Since you still don’t know which exact operations eat up the space, and where – that’s not enough localization.
Localize it further – e.g. run those commands after each test, or at specific points inside the implicated tests if needed.
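One way to get that per-test localization is a small wrapper that measures free space around each command. This is a sketch, not anything from the project’s build; run_and_measure is a hypothetical helper name, and df --output assumes GNU coreutils (as on Travis’s Ubuntu images):

```shell
# Hypothetical wrapper: run one test command, then report how much
# space on / it consumed. Requires GNU df for --output.
run_and_measure() {
    before=$(df --output=avail / | tail -1)   # KiB free before
    "$@"                                      # run the wrapped command
    after=$(df --output=avail / | tail -1)    # KiB free after
    echo "$* consumed $((before - after)) KiB on /"
}
```

Wrapping each test (or each suspicious step) this way turns “the space goes somewhere during the dredd stage” into “this specific test consumed this many KiB”.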

So, I don’t think this is something we control.

Without localization, Travis doesn’t know whether it’s something they control, either.

I don’t know if you have even seen the builds I have linked.

As I have said, this does not happen on the trusty distribution, only on xenial. And since all the operations are DB-based, I don’t have anything else to say. The project is open source, and the build logs have been provided. It is definitely something the xenial image of Travis controls, as it does not happen on trusty and also sometimes does not happen on xenial.

Our tests are not flaky. I am not sure what more you want us to localize.

The build shows this error: sqlalchemy.exc.OperationalError: (psycopg2.errors.DiskFull) could not write to file "pg_xlog/xlogtemp.8485": No space left on device

What more localization do you want?

fail: GET (200) /v1/tasks/1 duration: 11011ms

info: Hooks handler stderr: Traceback (most recent call last):
  File "/home/travis/virtualenv/python3.7.1/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 761, in _commit_impl
    self.engine.dialect.do_commit(self.connection)
  File "/home/travis/virtualenv/python3.7.1/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 505, in do_commit
    dbapi_connection.commit()
psycopg2.errors.DiskFull: could not write to file "pg_xlog/xlogtemp.8018": No space left on device

How will more localization help? What will it accomplish?

This build, with the same code, does not fail: https://travis-ci.org/fossasia/open-event-server/builds/591743128?utm_source=github_status&utm_medium=notification

How can that not be controlled by Travis?

Similar issues have been reported in the past by several people: https://github.com/travis-ci/travis-ci/issues/8375

Still, I ran df before each test and at the start of the build.

Here are the results:

Before the start of the tests:

Filesystem      Size  Used Avail Use% Mounted on
udev            3.7G     0  3.7G   0% /dev
tmpfs           748M  8.6M  739M   2% /run
/dev/sda1        68G   12G   57G  18% /
tmpfs           3.7G  8.0K  3.7G   1% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           3.7G     0  3.7G   0% /sys/fs/cgroup
none            768M   51M  718M   7% /var/ramfs
tmpfs           748M     0  748M   0% /run/user/2000

Last output after the tests:

Filesystem      Size  Used Avail Use% Mounted on
udev            3.7G     0  3.7G   0% /dev
tmpfs           748M  8.7M  739M   2% /run
/dev/sda1        68G   12G   57G  18% /
tmpfs           3.7G  8.0K  3.7G   1% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           3.7G     0  3.7G   0% /sys/fs/cgroup
none            768M  713M   56M  93% /var/ramfs
tmpfs           748M     0  748M   0% /run/user/2000

This time, the build did not fail.

Hi @iamareebjamal @native-api
We found the solution for this: increasing the space for /var/ramfs.
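For anyone hitting the same wall: the df output above shows /var/ramfs (768M) going from 7% to 93% used during the tests, which is where the space goes. A remount along these lines, run early in the build (e.g. in before_install), is one way to grow that mount; the 2G size and the assumption that this RAM-backed mount honours a size option are ours to verify against the image:

```shell
# Hypothetical before_install step: grow the RAM-backed filesystem
# that fills up during the tests in the Travis xenial image.
sudo mount -o remount,size=2G /var/ramfs
df -h /var/ramfs   # confirm the new size took effect
```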
