I am running two CI jobs whose covered test suites partly overlap. The second job finishes in under an hour, but the first one exceeds the maximum time limit of 1.5 hours, and has been doing so for a few months now. I temporarily added some print statements to look for bottlenecks and found that identical test suites (based on doctest, if that matters) take far longer when executed by the first job, for example 365 s vs. 8 s. Changing the Python and NumPy versions made no difference. Everything else looks identical to me.
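For reference, the per-suite timings I collected came from ad-hoc print statements; a small helper like the following (my own sketch, not part of the actual setup) does the same job more cleanly:

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label):
    """Print the wall-clock time spent inside the block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        print(f"{label}: {time.perf_counter() - start:.1f} s")

# Example: wrap any suite invocation, e.g. a doctest run
if True:
    import doctest
    with timed("doctest self-check"):
        doctest.testmod()
```

Wrapping each suite call this way makes the 365 s vs. 8 s discrepancy show up directly in the CI log.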
Note that not all tests are affected to the same extent. Tests that start other Python subprocesses via Python’s subprocess module are affected much more strongly than those that do not. (I measure code coverage of those spawned subprocesses with the coverage package.)
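For context, subprocess coverage with coverage.py is typically wired up through the COVERAGE_PROCESS_START environment variable plus a startup hook that runs in every child interpreter. A minimal sketch of that hook, assuming a sitecustomize.py (or .pth file) on the child interpreter’s path:

```python
# sitecustomize.py -- imported automatically by every Python interpreter
# that finds it on its path, so each spawned subprocess starts its own
# coverage tracer.
import coverage

# This is a no-op unless the COVERAGE_PROCESS_START environment variable
# points at a coverage configuration file; otherwise it begins measurement.
coverage.process_startup()
```

This means every spawned child pays the tracing cost from interpreter startup onward, which is relevant to the timing discrepancy described above.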
I have no idea what is going on; any help would be appreciated!
My gut tells me the first job (Python 3.10) is hitting a resource bottleneck—likely CPU or I/O contention—exacerbated by subprocess overhead and coverage instrumentation. The second job (Python 3.13) might run fewer tests or benefit from a cleaner VM state, avoiding the same strain. The doctest/subprocess combo could be amplifying this due to output capture or process management inefficiencies.
You’re using the coverage package to measure code coverage of spawned subprocesses. This involves intercepting Python execution (via sys.settrace or similar) in both the parent and child processes, which adds overhead. If the first job runs more tests or has a different execution order, this overhead could compound, especially for subprocesses.
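The cost of a trace function is easy to demonstrate. This standalone sketch (unrelated to coverage.py’s internals, which use a faster C tracer) times the same loop with and without a sys.settrace hook installed:

```python
import sys
import timeit

def busy():
    total = 0
    for i in range(50_000):
        total += i
    return total

def tracer(frame, event, arg):
    # Returning the tracer from the call event keeps
    # per-line events enabled inside the frame.
    return tracer

plain = timeit.timeit(busy, number=20)

sys.settrace(tracer)
try:
    traced = timeit.timeit(busy, number=20)
finally:
    sys.settrace(None)

print(f"untraced: {plain:.3f} s, traced: {traced:.3f} s")
```

On CPython this typically shows a large slowdown, which is why tracing compounding across parent and child processes matters.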
To test this theory, temporarily disable the subprocess-using tests in the first job (e.g., skip them in nox via a flag or filter) and check whether the runtime drops below 90 minutes. This isolates whether the subprocesses are the sole culprit.
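One way to make that switchable, assuming the affected suites can also be driven through unittest, is an environment-controlled skip (the SKIP_SUBPROCESS variable name is invented here for illustration):

```python
import os
import subprocess
import sys
import unittest

# Hypothetical opt-out flag: set SKIP_SUBPROCESS=1 in the CI job
# to exclude the subprocess-heavy tests from the run.
SKIP_SUBPROCESS = os.environ.get("SKIP_SUBPROCESS") == "1"

class SubprocessTests(unittest.TestCase):
    @unittest.skipIf(SKIP_SUBPROCESS, "subprocess-heavy tests disabled")
    def test_child_process(self):
        result = subprocess.run(
            [sys.executable, "-c", "print('ok')"],
            capture_output=True, text=True,
        )
        self.assertEqual(result.stdout.strip(), "ok")
```

Run the suite normally with `python -m unittest`, and with `SKIP_SUBPROCESS=1` in the first job to compare runtimes.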
In noxfile.py, you could add a condition like:

    import os

    if os.environ.get("TRAVIS_PYTHON_VERSION") == "3.10" and os.environ.get("SKIP_SUBPROCESS"):
        # deselect the subprocess-heavy tests in this session
        ...
Since the first job exceeds 90 minutes, use Travis CI’s build matrix or stages to split its workload.
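As a sketch of what such a split could look like in .travis.yml (the job names and the TEST_SHARD variable are invented for illustration; the shard would be consumed by the noxfile):

```yaml
jobs:
  include:
    - name: "3.10 core tests"
      python: "3.10"
      env: TEST_SHARD=core
    - name: "3.10 subprocess tests"
      python: "3.10"
      env: TEST_SHARD=subprocess
    - name: "3.13 full run"
      python: "3.13"
```

Each shard then stays comfortably under the 90-minute limit, at the cost of some duplicated setup time per job.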
So I would start with steps 1 and 2: log timings and isolate the subprocess tests. If subprocesses turn out to be the main issue, optimize their coverage setup (step 3) or split the job (step 4). That should get you under 90 minutes and clarify where the discrepancy comes from.
Unfortunately, the simplest possible solution, switching to bionic, did not help. Splitting the first job into two would make some things more complicated. Coverage.py could indeed play the key role here. I will check whether turning it off for some subprocess-related tests helps.
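If it helps, that check can be as simple as clearing COVERAGE_PROCESS_START from the environment passed to the spawned children, so they run untraced. A standalone sketch:

```python
import os
import subprocess
import sys

# Copy the environment and drop the variable that would trigger
# coverage.process_startup() in the child interpreter.
env = dict(os.environ)
env.pop("COVERAGE_PROCESS_START", None)

child = subprocess.run(
    [sys.executable, "-c",
     "import os; print('COVERAGE_PROCESS_START' in os.environ)"],
    capture_output=True, text=True, env=env,
)
print(child.stdout.strip())  # → False
```

If the slow tests speed up with this change, subprocess coverage is confirmed as the bottleneck, and the trade-off becomes losing coverage data for those children vs. staying under the time limit.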