I am running two CI jobs whose covered test suites partly overlap. The second job finishes in under an hour, but the first one exceeds the maximum time limit of 1.5 hours, and has been doing so for a few months now. I temporarily added some print statements to look for bottlenecks and found that identical test suites (based on doctest, if that matters) take far longer when executed by the first job, for example 365 s vs. 8 s. Changing the Python and NumPy versions made no difference. Everything else looks identical to me.
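For reference, the per-suite timings I collected came from ad-hoc print statements; a small helper like the following (my own sketch, not part of the actual setup) does the same job more cleanly:

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label):
    """Print the wall-clock time spent inside the block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        print(f"{label}: {time.perf_counter() - start:.1f} s")

# Example: wrap any suite invocation, e.g. a doctest run
if True:
    import doctest
    with timed("doctest self-check"):
        doctest.testmod()
```

Wrapping each suite call this way makes the 365 s vs. 8 s discrepancy show up directly in the CI log.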
Note that not all tests are affected to the same extent. Tests that start other Python subprocesses via Python’s subprocess module are affected much more strongly than those that do not. (I measure code coverage of those spawned subprocesses with the coverage package.)
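For context, subprocess coverage with coverage.py is typically wired up through the COVERAGE_PROCESS_START environment variable plus a startup hook that runs in every child interpreter. A minimal sketch of that hook, assuming a sitecustomize.py (or .pth file) on the child interpreter’s path:

```python
# sitecustomize.py -- imported automatically by every Python interpreter
# that finds it on its path, so each spawned subprocess starts its own
# coverage tracer.
import coverage

# This is a no-op unless the COVERAGE_PROCESS_START environment variable
# points at a coverage configuration file; otherwise it begins measurement.
coverage.process_startup()
```

This means every spawned child pays the tracing cost from interpreter startup onward, which is relevant to the timing discrepancy described above.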
I have no idea what is going on; any help would be appreciated!
My gut tells me the first job (Python 3.10) is hitting a resource bottleneck—likely CPU or I/O contention—exacerbated by subprocess overhead and coverage instrumentation. The second job (Python 3.13) might run fewer tests or benefit from a cleaner VM state, avoiding the same strain. The doctest/subprocess combo could be amplifying this due to output capture or process management inefficiencies.
You’re using the coverage package to measure code coverage of spawned subprocesses. This involves intercepting Python execution (via sys.settrace or similar) in both the parent and child processes, which adds overhead. If the first job runs more tests or has a different execution order, this overhead could compound, especially for subprocesses.
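The cost of a trace function is easy to demonstrate. This standalone sketch (unrelated to coverage.py’s internals, which use a faster C tracer) times the same loop with and without a sys.settrace hook installed:

```python
import sys
import timeit

def busy():
    total = 0
    for i in range(50_000):
        total += i
    return total

def tracer(frame, event, arg):
    # Returning the tracer from the call event keeps
    # per-line events enabled inside the frame.
    return tracer

plain = timeit.timeit(busy, number=20)

sys.settrace(tracer)
try:
    traced = timeit.timeit(busy, number=20)
finally:
    sys.settrace(None)

print(f"untraced: {plain:.3f} s, traced: {traced:.3f} s")
```

On CPython this typically shows a large slowdown, which is why tracing compounding across parent and child processes matters.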
To test this theory, temporarily disable the subprocess-using tests in the first job (e.g., skip them in nox via a flag or filter) and check whether the runtime drops below 90 minutes. This isolates whether the subprocesses are the sole culprit.
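One way to make that switchable, assuming the affected suites can also be driven through unittest, is an environment-controlled skip (the SKIP_SUBPROCESS variable name is invented here for illustration):

```python
import os
import subprocess
import sys
import unittest

# Hypothetical opt-out flag: set SKIP_SUBPROCESS=1 in the CI job
# to exclude the subprocess-heavy tests from the run.
SKIP_SUBPROCESS = os.environ.get("SKIP_SUBPROCESS") == "1"

class SubprocessTests(unittest.TestCase):
    @unittest.skipIf(SKIP_SUBPROCESS, "subprocess-heavy tests disabled")
    def test_child_process(self):
        result = subprocess.run(
            [sys.executable, "-c", "print('ok')"],
            capture_output=True, text=True,
        )
        self.assertEqual(result.stdout.strip(), "ok")
```

Run the suite normally with `python -m unittest`, and with `SKIP_SUBPROCESS=1` in the first job to compare runtimes.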
In noxfile.py, you could add a condition like:

    import os

    if os.environ.get("TRAVIS_PYTHON_VERSION") == "3.10" and os.environ.get("SKIP_SUBPROCESS"):
        # deselect the subprocess-heavy tests in this session
        ...
Since the first job exceeds 90 minutes, use Travis CI’s build matrix or stages to split its workload.
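As a sketch of what such a split could look like in .travis.yml (the job names and the TEST_SHARD variable are invented for illustration; the shard would be consumed by the noxfile):

```yaml
jobs:
  include:
    - name: "3.10 core tests"
      python: "3.10"
      env: TEST_SHARD=core
    - name: "3.10 subprocess tests"
      python: "3.10"
      env: TEST_SHARD=subprocess
    - name: "3.13 full run"
      python: "3.13"
```

Each shard then stays comfortably under the 90-minute limit, at the cost of some duplicated setup time per job.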
So I would start with steps 1 and 2: log timings and isolate the subprocess tests. If subprocesses turn out to be the main issue, optimize their coverage setup (step 3) or split the job (step 4). That should get you under 90 minutes and clarify where the discrepancy comes from.
Unfortunately, the simplest possible solution, switching to bionic, did not help. Splitting the first job into two would make some things more complicated. Coverage.py could indeed play the key role here. I will check whether turning it off for some subprocess-related tests helps.
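If it helps, that check can be as simple as clearing COVERAGE_PROCESS_START from the environment passed to the spawned children, so they run untraced. A standalone sketch:

```python
import os
import subprocess
import sys

# Copy the environment and drop the variable that would trigger
# coverage.process_startup() in the child interpreter.
env = dict(os.environ)
env.pop("COVERAGE_PROCESS_START", None)

child = subprocess.run(
    [sys.executable, "-c",
     "import os; print('COVERAGE_PROCESS_START' in os.environ)"],
    capture_output=True, text=True, env=env,
)
print(child.stdout.strip())  # → False
```

If the slow tests speed up with this change, subprocess coverage is confirmed as the bottleneck, and the trade-off becomes losing coverage data for those children vs. staying under the time limit.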