After updating the dist to xenial, our builds keep failing at random with the error: ‘No space left on device’
We have 254 MB of dependencies and we test against a PostgreSQL database which we clear before each test, so it is unlikely that our tests are causing the disk to fill up.
These are some of the failing builds; there are more of them in the project:
https://travis-ci.org/fossasia/open-event-server/builds/591743394?utm_source=github_status&utm_medium=notification
https://travis-ci.org/fossasia/open-event-server/builds/591743551?utm_source=github_status&utm_medium=notification
Use df -h and du -shx to check the space on the device and the space occupied by individual subtrees.
This will help you localize at what stages the space goes and where it goes.
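If calling df from a shell is awkward inside the test run, the same check can be done from Python. This is only a minimal sketch: the function name usage_report is made up for illustration, and the path passed in is a placeholder, so substitute whatever mount points matter in your build.

```python
import shutil


def usage_report(paths):
    """Return one summary line per path, similar in spirit to `df -h`."""
    lines = []
    for path in paths:
        # disk_usage reports total, used and free bytes for the
        # filesystem that contains `path`.
        total, used, free = shutil.disk_usage(path)
        percent = 100 * used / total if total else 0.0
        lines.append(
            f"{path}: {used / 2**20:.0f}M used of "
            f"{total / 2**20:.0f}M ({percent:.0f}%)"
        )
    return lines


if __name__ == "__main__":
    # "/" is only an example; add other mount points as needed.
    for line in usage_report(["/"]):
        print(line)
```

Calling it at the start of the build and again after each test gives a series of readings that can be compared.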
We know the failing command, everything is localized. There’s only one stage where the space goes: the dredd tests. It’s not a question of what we can do to prevent it or of debugging what is going wrong. We already know that.
Also, this only happens on the xenial distribution, and not if we go back to trusty. So, I don’t think this is something we control.
Our tests write the bare minimum of data to the PostgreSQL database between tests, so it can’t be reduced.
Since you still don’t know what exact operations eat up the space and where – that’s not enough localization.
Localize it further, e.g. run those commands after each test, and at specific points inside the offending tests if needed.
"So, I don’t think this is something we control."
Without localization, Travis cannot know whether that’s something they control either.
I don’t know if you have even seen the builds I have linked.
As I have said, this does not happen on the trusty distribution, only on xenial. All the operations are DB based, so I don’t have anything else to say. The project is open source, and the build logs have been provided. It is definitely something the xenial image of Travis controls, as it does not happen on trusty and, sometimes, does not happen on xenial either.
Our tests are not flaky. I am not sure what more you want to localize?
The build shows this error - sqlalchemy.exc.OperationalError: (psycopg2.errors.DiskFull) could not write to file "pg_xlog/xlogtemp.8485": No space left on device
What more localization do you want?
fail: GET (200) /v1/tasks/1 duration: 11011ms
info: Hooks handler stderr: Traceback (most recent call last):
  File "/home/travis/virtualenv/python3.7.1/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 761, in _commit_impl
    self.engine.dialect.do_commit(self.connection)
  File "/home/travis/virtualenv/python3.7.1/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 505, in do_commit
    dbapi_connection.commit()
psycopg2.errors.DiskFull: could not write to file "pg_xlog/xlogtemp.8018": No space left on device
How will more localization help? What will it accomplish?
This build, same code, does not fail - https://travis-ci.org/fossasia/open-event-server/builds/591743128?utm_source=github_status&utm_medium=notification
How can that not be controlled by Travis?
Similar issues have been reported in the past by several people: https://github.com/travis-ci/travis-ci/issues/8375
I still ran df before each test and at the start of the build.
Here are the results:
Before the start of the tests:
Filesystem      Size  Used Avail Use% Mounted on
udev            3.7G     0  3.7G   0% /dev
tmpfs           748M  8.6M  739M   2% /run
/dev/sda1        68G   12G   57G  18% /
tmpfs           3.7G  8.0K  3.7G   1% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           3.7G     0  3.7G   0% /sys/fs/cgroup
none            768M   51M  718M   7% /var/ramfs
tmpfs           748M     0  748M   0% /run/user/2000
Last output after tests:
Filesystem      Size  Used Avail Use% Mounted on
udev            3.7G     0  3.7G   0% /dev
tmpfs           748M  8.7M  739M   2% /run
/dev/sda1        68G   12G   57G  18% /
tmpfs           3.7G  8.0K  3.7G   1% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           3.7G     0  3.7G   0% /sys/fs/cgroup
none            768M  713M   56M  93% /var/ramfs
tmpfs           748M     0  748M   0% /run/user/2000
This time the build did not fail.
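To make the difference between the two readings easier to spot, here is a rough sketch that parses two df -h outputs and lists the mounts whose used space changed. The helper names (to_mib, parse_df, growth) are made up for this post, and the parser assumes the plain column layout shown above, with no spaces in mount paths.

```python
# Multipliers to convert df -h suffixes to MiB (df -h uses powers of 1024).
UNITS = {"K": 1 / 1024, "M": 1.0, "G": 1024.0}

BEFORE = """Filesystem Size Used Avail Use% Mounted on
udev 3.7G 0 3.7G 0% /dev
tmpfs 748M 8.6M 739M 2% /run
/dev/sda1 68G 12G 57G 18% /
tmpfs 3.7G 8.0K 3.7G 1% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 3.7G 0 3.7G 0% /sys/fs/cgroup
none 768M 51M 718M 7% /var/ramfs
tmpfs 748M 0 748M 0% /run/user/2000"""

AFTER = """Filesystem Size Used Avail Use% Mounted on
udev 3.7G 0 3.7G 0% /dev
tmpfs 748M 8.7M 739M 2% /run
/dev/sda1 68G 12G 57G 18% /
tmpfs 3.7G 8.0K 3.7G 1% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 3.7G 0 3.7G 0% /sys/fs/cgroup
none 768M 713M 56M 93% /var/ramfs
tmpfs 748M 0 748M 0% /run/user/2000"""


def to_mib(text):
    """Convert a df -h size such as '713M' or '0' to MiB."""
    if text[-1] in UNITS:
        return float(text[:-1]) * UNITS[text[-1]]
    return float(text) / 2**20  # bare number: treat as bytes


def parse_df(output):
    """Map each mount point to its used space in MiB."""
    used = {}
    for line in output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        used[fields[5]] = to_mib(fields[2])
    return used


def growth(before, after):
    """Return the change in used MiB for mounts present in both readings."""
    first, second = parse_df(before), parse_df(after)
    return {
        mount: second[mount] - first[mount]
        for mount in second
        if mount in first and second[mount] != first[mount]
    }


if __name__ == "__main__":
    for mount, delta in sorted(growth(BEFORE, AFTER).items(), key=lambda kv: -kv[1]):
        print(f"{mount}: {delta:+.1f}M")
```

With the two readings above, /var/ramfs is the only mount that grows noticeably, from 51M to 713M used out of 768M, while / stays at 12G.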
Hi @iamareebjamal @native-api
We found the solution for this: increasing the space for /var/ramfs