How to skip jobs based on the files changed in a subdirectory?

I am working at a company that uses the monorepo pattern to store its code. Each subproject has its own test and deploy job, so every push to this repo blindly tests and deploys all the projects. To reduce build times, I would like to build only the projects in which files have changed. Sadly, this does not seem to be possible using the Travis conditional syntax, as that only works on predefined metadata (such as the branch name).

Ideally I would be able to run a script, as early as possible in the job lifecycle, which performs a git diff inside each project's subdirectory and skips the jobs as required. I have managed to get something working by copying the example in this repo - https://github.com/arnaskro/monorepo-travis.

A snippet of our slightly modified job definition looks like the following…

jobs:
  include:
  - stage: "Test"
    name: "common"
    before_install: if devops/check_for_changes.sh $TRAVIS_COMMIT_RANGE common; then echo "Testing"; else echo "NOT testing" && travis_terminate 0; fi
    install:
      - cd common
      - npm install
      - cd ..
    script:
      - cd common
      - npm test
    cache:
      npm: true
      directories:
        - common/node_modules

It seems before_install is the earliest lifecycle phase I can hook into, but it only runs after the language environment has been installed (in our case node via nvm) and the build cache has been restored. This means it takes at least ~1 minute to determine that a job is not necessary, even though we need neither the language nor the cache to make that decision.
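
The contents of devops/check_for_changes.sh are not shown in this thread. As a rough sketch only, a check of this kind could be written as below. The function name has_changes and the choice to treat an empty or unusable commit range as "changed" are assumptions made for illustration, not the linked repository's code.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the script from the linked repository.
# Succeeds (exit status 0) when files under DIR differ across RANGE.

has_changes() {
    local range="$1" dir="$2"

    # TRAVIS_COMMIT_RANGE can be empty (for example on the first push of a
    # new branch). Assumption: run the job rather than skip it in that case.
    if [ -z "$range" ]; then
        return 0
    fi

    # git diff --quiet exits 0 only when there are no differences.
    if git diff --quiet "$range" -- "$dir"; then
        return 1
    fi

    # Differences found, or git failed (for example an unknown commit after
    # a force push). Both are reported as "changed" so the job still runs.
    return 0
}
```

If the function were sourced into the build shell, it could be called as has_changes "$TRAVIS_COMMIT_RANGE" common, mirroring the before_install line above.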

Is there a way to abort the job earlier than the before_install phase?

Or is there a better way to achieve the same thing? I considered…

  • a preliminary stage that runs the git diff and stores a record of the subprojects to build or not as env vars for later stages to consume
  • using the Travis API to trigger child builds for each of the subprojects that need building. I am not sure if that would work with GitHub build reporting.
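
For the first option, the list of subprojects to build can be derived from a single git diff. The sketch below only covers that computation and assumes each subproject lives in its own top-level directory. The function name changed_subprojects is hypothetical, and how the result would reach later stages is left open, since the thread does not settle that.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each top-level directory that contains a file
# changed in the given commit range, one name per line.

changed_subprojects() {
    local range="$1"
    # With default git settings, --name-only prints paths relative to the
    # repository root. awk keeps the first path component and ignores files
    # that sit in the root itself; sort -u removes duplicates.
    git diff --name-only "$range" | awk -F/ 'NF > 1 { print $1 }' | sort -u
}
```

A call such as changed_subprojects "$TRAVIS_COMMIT_RANGE" would then print the directory names that a later step could compare against its own subproject.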

Any advice would be greatly appreciated!

This capacity does not exist at the moment. The previous discussion is found in

Ok no problem. Thanks BanzaiMan.

A second scenario would be to try running a preliminary stage to execute the git diff which records the jobs to run on the environment. Can I set an environment variable on one stage and have it available in a later stage?

This appears to be a dup of feature request Support for skipping builds via file filtering

You can just make whatever checks you need ASAP – e.g. in before_install – and if they fail, exit the build early with travis_terminate <exit code>.

That doesn't help if the build should pass early because certain files were not modified and therefore do not need to be re-tested.

What would help is a way to terminate the entire build (or cancel subsequent phases), not just a job, from inside one job, and indicate the build is a pass or fail even though it is incomplete.

This sounds rather dangerous to do. Even if the particular file is unchanged, something else that affects it may have changed.
In any case, such logic is very project-specific and needs to be part of a test script: the script may find that it can skip part of its work, up to all of it. Making "skip everything" a special case would be duplicating effort.

Since a build consists of multiple independent pipelines, it would only make sense to be able to terminate the current pipeline.

It may be dangerous sometimes, but the alternative may be being unable to run CI at all if builds are queued. And it may never be dangerous in other cases, where a repo has a clear separation of multiple components.

At the moment I don't do any OSX jobs, ever, because it is just too slow to start workers. If I could trigger OSX jobs only when certain files are modified, that would increase the amount of QA being done. More QA is not dangerous.

AppVeyor and GitLab CI both provide the functionality requested in this feature request: path-based selection of jobs. Bitrise CI also supports fast finishes of multi-job builds, albeit not quite as declaratively as AppVeyor and GitLab. I haven't played significantly with Circle CI v2, but it seems it can also do this fairly easily, and it might even have specialised syntax for it.

In this case, you should be able to come up with something more concrete than “what would help is a way…”. This will also make you actually think of how this could be integrated into the existing build logic, including possible corner cases.

Of course I can solve it, but I am not being paid to spec it or build it. The purpose of a feature request is to capture requirements, not the solution.
However the way that AppVeyor and GitLab CI do it would be acceptable for many use cases, and is illustrative and simple enough to be almost a spec, if Travis CI wanted to solve it that way.

I suggest instead allowing successful termination of a build programmatically, as I share your concern about declarative approaches to this problem. I haven't started using it for GitLab CI in OSS projects because I would want to generate the list of files that trigger each job; in my OSS repos I know the declarations will get out of sync, and builds will pass when they shouldn't.

travis_terminate_build 0|1 for a black box spec :stuck_out_tongue: Implementing it might be difficult, but that is mostly technical and depends a lot on the travis infrastructure constraints. IMO, the hard part is spec’ing out how that feature would work with the API and UI, etc., and building it. Lots of little details to be designed, starting with what color should be a fast-finished successful build. But it seems Travis CI tends to do those bits of work after deploying the technical capability as a beta.

I had a similar problem where I ended up early exiting if the current commit didn’t change files in the correct folders. This was for CircleCI but the logic would be the same.

# 1. Get all the arguments of the script
# https://unix.stackexchange.com/a/197794
PATHS_TO_SEARCH="$*"

# 2. Make sure the paths to search are not empty
if [ -z "$PATHS_TO_SEARCH" ]; then
    echo "Please provide the paths to search for."
    echo "Example usage:"
    echo "./exit-if-path-not-changed.sh path/to/dir1 path/to/dir2"
    exit 1
fi

# 3. Get the latest commit
LATEST_COMMIT=$(git rev-parse HEAD)

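# 4. Get the latest commit in the searched paths
# ($PATHS_TO_SEARCH is left unquoted so that each path becomes its own argument)
LATEST_COMMIT_IN_PATH=$(git log -1 --format=format:%H --full-diff -- $PATHS_TO_SEARCH)

# Quote both sides so the test still works if no commit touched the paths
if [ "$LATEST_COMMIT" != "$LATEST_COMMIT_IN_PATH" ]; then
    echo "Exiting this job because code in the following paths has not changed:"
    echo "$PATHS_TO_SEARCH"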
    # Change this with whatever is appropriate with TravisCI
    circleci step halt
fi

For reference:

Another script:

#!/bin/bash
set -e
# This script is called by Travis during the install step.
# It exits with 137 if no files were changed. In that case
# no further building/testing is required for this image.

IMAGE_NAME=$1

echo "Travis event type: $TRAVIS_EVENT_TYPE"

if [ "$TRAVIS_EVENT_TYPE" == "api" ]
then
    # trigger via travis dashboard
    # test all
    echo "Trigger all tests"
    exit 0
fi

# in our repo each job runs in its own folder
cd "$IMAGE_NAME-docker"

CHANGED_FILES=$(git diff --name-status HEAD~1...HEAD .)
if [ -z "$CHANGED_FILES" ]
then
    # nothing changed, skip building
    echo "No changes in $IMAGE_NAME, terminate"
   
    # this indicates to the parent script that the build can be terminated
    exit 137
fi

echo "There were changes in $IMAGE_NAME: $CHANGED_FILES"
exit 0

In the .travis.yml file the script is called during the install phase:

./check_changed.sh $IMAGE_NAME ; RETURN_CODE=$? ; if [ $RETURN_CODE -eq 137 ]; then travis_terminate 0; elif [ $RETURN_CODE -ne 0 ]; then travis_terminate $RETURN_CODE; fi

If the check_changed.sh script itself fails for some reason (any non-zero exit code other than 137), you want to detect this and fail the build with an error.
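
The same one-liner may be easier to follow when spread over several lines. This is only a sketch: the function name run_change_check is hypothetical, and it assumes the function is defined in the shell that runs the build commands (for example by sourcing a file), so that Travis's travis_terminate is available there.

```shell
#!/usr/bin/env bash
# Hypothetical multi-line form of the one-liner above.
# $1: path to the check script, $2: image name passed on to it.
# Assumes travis_terminate is defined by the Travis build environment.

run_change_check() {
    local check_script="$1" image_name="$2" return_code=0

    # Capture the script's exit status without aborting on failure.
    "$check_script" "$image_name" || return_code=$?

    if [ "$return_code" -eq 137 ]; then
        # 137 is the script's "nothing changed" signal: end the job as passed.
        travis_terminate 0
    elif [ "$return_code" -ne 0 ]; then
        # Any other non-zero status means the check itself failed.
        travis_terminate "$return_code"
    fi
    # Status 0: changes were found, so the job carries on.
}
```

It would be invoked as run_change_check ./check_changed.sh "$IMAGE_NAME".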

Would love to increase visibility on this as a feature request to be better handled as a native travis function during stage definition in the yml declaration.

I’m running a script similar to what @dwjbosman has shared, however for my use case where I’m building submodules in a mono repo, I have many substages running in parallel - each substage job would need to run this check, and even at the earliest chance to run these checks a job has run for ~30-45sec. That adds up to a lot of dead time consuming build threads other commits might make more ready use of rather than being enqueued.

Based on the title of the question, I think this is asking about bypassing jobs within the build, not the entire build? At least, that's what I would like. I would like to publish versions only if package.json has changed.

I’ve read up on monorepos, and it looks like Travis is not really equipped to support monorepos out of the box.

  • In monorepos, the amount of rebuilding necessary is determined dynamically by intelligent build tools, and the logic is completely tool-specific.
    • So the proposed feature is not an adequate solution. It may be good enough specifically for the OP in the specific situation, but I foresee requests for all sorts of additional logic flooding in quickly if this is implemented.

An adequate solution would either be

  • provide beefier build machines with more cores (to commercial users on demand) that can run many tasks in parallel and build everything within a single job; or
  • introduce a “planning” stage that would be able to edit .travis.yml (or, more probably, its representation) on the fly, e.g. disabling jobs as necessary; or
  • in conjunction with the flowchart build model, ability to edit config of the jobs down the flowchart edges from the current job
    • to avoid conflicts, there should probably be a phase for this only during which the configs are accessible and are protected with a critical section synchronization primitive

A workaround would be to

  • Have a series of stages with the same configuration which would
    • determine what to build according to the specific tool’s logic
    • build what is possible within the time limit
    • pass information to the next stage so that it can do the same, picking up from where this job stopped
    • of course, you can run several such jobs in parallel, too, but it’s going to be tricky due to inability to pass information between sibling jobs