In 2019, GitHub released GitHub Actions, its own solution for running automated workflows, which lets those hosting their code on GitHub define and run their CI/CD pipelines on the same platform.
When it was released, one of the main pain points was that defining pipelines required large YAML config files, in which duplication was sometimes hard to avoid.
Since then, and based on user feedback, GitHub has introduced several improvements in this regard.
Recently, I have been refactoring and improving a pipeline in one of my projects, and I wanted to share the different approaches I used to reduce duplication.
Those approaches are:
Matrix.
Composite actions.
Reusable workflows.
This article assumes you have some knowledge of how GitHub Actions works. If that’s not the case, you probably want to take a look at its documentation.
Introducing the workflow
Let’s imagine we start with this continuous integration workflow for a PHP project:
In human language, this is what the pipeline does:
Check coding styles.
Run a static analysis.
Run unit tests and generate code coverage.
Run E2E tests and generate code coverage.
Run mutation tests for the unit tests.
Run mutation tests for the E2E tests.
The first 4 jobs are all run in parallel, and the last two are run after the tests have finished (as they require the code coverage reports).
Also, for all the jobs, an environment needs to be set up, with a certain version of PHP (sometimes just 8.0, sometimes also 8.1) and some PHP extensions.
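To make the duplication concrete, here is a minimal sketch of what such a starting workflow might look like. It is not the original file: only three of the eight jobs are shown, and the extension list, the Composer script names (cs, test:unit, infect:unit) and the coverage paths are assumptions made for illustration.

```yaml
# Sketch only: three of the eight jobs, all names and paths are assumed.
name: Continuous integration

on:
  pull_request:
  push:
    branches: [main]

env:
  # Hypothetical extension list, shared by every job
  PHP_EXTENSIONS: mbstring, intl, pdo_sqlite

jobs:
  coding-styles:
    runs-on: ubuntu-22.04
    steps:
      - uses: actions/checkout@v4
      - uses: shivammathur/setup-php@v2
        with:
          php-version: '8.0'
          extensions: ${{ env.PHP_EXTENSIONS }}
          coverage: none
      - run: composer install --no-interaction --prefer-dist
      - run: composer cs

  unit-tests-php-80:
    runs-on: ubuntu-22.04
    steps:
      - uses: actions/checkout@v4
      - uses: shivammathur/setup-php@v2
        with:
          php-version: '8.0'
          extensions: ${{ env.PHP_EXTENSIONS }}
          coverage: pcov
      - run: composer install --no-interaction --prefer-dist
      - run: composer test:unit
      # Keep the coverage report for the mutation tests job
      - uses: actions/upload-artifact@v4
        with:
          name: coverage-unit
          path: build/coverage-unit

  mutation-tests-unit:
    # Runs only after the unit tests have produced their coverage report
    needs: unit-tests-php-80
    runs-on: ubuntu-22.04
    steps:
      - uses: actions/checkout@v4
      - uses: shivammathur/setup-php@v2
        with:
          php-version: '8.0'
          extensions: ${{ env.PHP_EXTENSIONS }}
          coverage: none
      - run: composer install --no-interaction --prefer-dist
      - uses: actions/download-artifact@v4
        with:
          name: coverage-unit
          path: build/coverage-unit
      - run: composer infect:unit
```

Even in this shortened form, the checkout, PHP set-up and dependency installation steps are repeated in every job.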
Now, let’s see how to improve all those duplicated steps.
Use a matrix
The first thing we can do is merge the jobs that only differ in the PHP version, and pass the version as a matrix argument.
Also, coding styles and static analysis are both static code checks which require a very similar set-up. We can merge those two and pass them via matrix as well.
With this change, we are down from 8 jobs to 5, with the only consideration that we now publish code coverage conditionally based on the PHP version.
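As a rough sketch of the idea (job names, Composer scripts and paths are assumptions, not the original listing), the merged static checks and unit tests jobs could look like this:

```yaml
# Sketch only: Composer script names and coverage paths are assumed.
env:
  PHP_EXTENSIONS: mbstring, intl, pdo_sqlite

jobs:
  static-checks:
    runs-on: ubuntu-22.04
    strategy:
      matrix:
        # One job run per check: coding styles and static analysis
        command: ['cs', 'stan']
    steps:
      - uses: actions/checkout@v4
      - uses: shivammathur/setup-php@v2
        with:
          php-version: '8.0'
          extensions: ${{ env.PHP_EXTENSIONS }}
          coverage: none
      - run: composer install --no-interaction --prefer-dist
      - run: composer ${{ matrix.command }}

  unit-tests:
    runs-on: ubuntu-22.04
    strategy:
      matrix:
        # Quoted so that YAML does not parse 8.0 as the number 8
        php-version: ['8.0', '8.1']
    steps:
      - uses: actions/checkout@v4
      - uses: shivammathur/setup-php@v2
        with:
          php-version: ${{ matrix.php-version }}
          extensions: ${{ env.PHP_EXTENSIONS }}
          coverage: pcov
      - run: composer install --no-interaction --prefer-dist
      - run: composer test:unit
      # Publish coverage only once, for a single PHP version
      - if: ${{ matrix.php-version == '8.0' }}
        uses: actions/upload-artifact@v4
        with:
          name: coverage-unit
          path: build/coverage-unit
```

The if condition on the last step is what the conditional publishing refers to: both PHP versions run the tests, but only one of them uploads the report.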
Reuse steps with a composite action
The next most obvious thing is that a handful of steps appear in every one of the jobs to set up the environment.
We can combine those into a local composite action that wraps all the individual steps and can be called as a whole by every job.
Local actions are usually placed inside .github/actions, in a folder with the name we want, containing an action.yml file, which in our case could look like this:
.github/actions/ci-setup/action.yml:
This action wraps the 4 steps that we have on every job. The only step we can’t add here is the checkout step, as we need the code to have been checked out first in order to find the action file itself.
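A minimal sketch of such an action.yml follows. The four steps, the input names and the extension list are assumptions about what a typical PHP set-up includes, not the original file.

```yaml
# .github/actions/ci-setup/action.yml (sketch, contents are assumed)
name: CI setup
description: Sets up PHP and installs the project dependencies

inputs:
  php-version:
    description: PHP version to install
    required: true
  coverage:
    description: Coverage driver to enable (none, pcov or xdebug)
    required: false
    default: none

runs:
  using: composite
  steps:
    - uses: shivammathur/setup-php@v2
      with:
        php-version: ${{ inputs.php-version }}
        # The extensions are now listed in a single place
        extensions: mbstring, intl, pdo_sqlite
        coverage: ${{ inputs.coverage }}

    # "run" steps in a composite action must declare their shell
    - id: composer-cache
      run: echo "dir=$(composer config cache-files-dir)" >> "$GITHUB_OUTPUT"
      shell: bash

    - uses: actions/cache@v4
      with:
        path: ${{ steps.composer-cache.outputs.dir }}
        key: composer-${{ inputs.php-version }}-${{ hashFiles('**/composer.lock') }}

    - run: composer install --no-interaction --prefer-dist
      shell: bash
```

Unlike workflow steps, every run step inside a composite action needs an explicit shell, which is easy to miss when moving steps over from a workflow.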
With this local composite action, we can refactor the workflow to look like this:
Every job is now much shorter, with almost all duplicated code moved to the composite action.
Also, as a side effect, we got rid of defining the PHP extensions as an env var, since we now pass them as an arg to the action only in one place.
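For illustration, a job using the local action might be reduced to something like the following sketch, where the php-version and coverage inputs and the Composer script name are assumptions:

```yaml
# Sketch only: input names and the Composer script are assumed.
jobs:
  unit-tests:
    runs-on: ubuntu-22.04
    strategy:
      matrix:
        php-version: ['8.0', '8.1']
    steps:
      # The checkout has to stay here, so the runner can find the action
      - uses: actions/checkout@v4
      - uses: ./.github/actions/ci-setup
        with:
          php-version: ${{ matrix.php-version }}
          coverage: pcov
      - run: composer test:unit
```

The path given to uses is relative to the root of the checked-out repository and points to the folder containing action.yml, not to the file itself.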
Reuse a whole workflow
But that’s not it. There’s still a lot of duplication between the two tests jobs and the two mutation tests jobs.
One way to reduce the duplication even further is to extract them into a reusable workflow.
Reusable workflows are similar to composite actions, except that instead of wrapping a few steps, they can contain one or more complete jobs, which we then invoke from the main workflow.
Also, our reusable workflow can still use the composite action we created in the previous step.
Let’s define our ci-test reusable workflow:
.github/workflows/ci-test.yml:
And now, we can invoke this reusable workflow from our main CI workflow like this:
This reduces the duplication to the bare minimum, allowing us to reuse the tests + mutation-tests logic both for unit tests and E2E tests, while keeping the benefit of making the latter depend on the former.
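As a sketch of how the two pieces might fit together, the reusable workflow could take the test suite as an input. The test-group input, the Composer script names and the coverage paths are hypothetical; only the workflow_call trigger and the uses syntax are standard GitHub Actions features.

```yaml
# .github/workflows/ci-test.yml (sketch, names and paths are assumed)
name: Tests

on:
  workflow_call:
    inputs:
      test-group:
        description: Test suite to run, for example "unit" or "e2e"
        type: string
        required: true

jobs:
  tests:
    runs-on: ubuntu-22.04
    strategy:
      matrix:
        php-version: ['8.0', '8.1']
    steps:
      - uses: actions/checkout@v4
      - uses: ./.github/actions/ci-setup
        with:
          php-version: ${{ matrix.php-version }}
          coverage: pcov
      - run: composer test:${{ inputs.test-group }}
      - if: ${{ matrix.php-version == '8.0' }}
        uses: actions/upload-artifact@v4
        with:
          name: coverage-${{ inputs.test-group }}
          path: build/coverage-${{ inputs.test-group }}

  mutation-tests:
    # Depends only on the tests job of this same workflow call
    needs: tests
    runs-on: ubuntu-22.04
    steps:
      - uses: actions/checkout@v4
      - uses: ./.github/actions/ci-setup
        with:
          php-version: '8.0'
      - uses: actions/download-artifact@v4
        with:
          name: coverage-${{ inputs.test-group }}
          path: build/coverage-${{ inputs.test-group }}
      - run: composer infect:${{ inputs.test-group }}
```

Under the same assumptions, the main CI workflow would then call it once per suite. Note that uses sits at the job level here, not inside steps:

```yaml
jobs:
  unit-tests:
    uses: ./.github/workflows/ci-test.yml
    with:
      test-group: unit

  e2e-tests:
    uses: ./.github/workflows/ci-test.yml
    with:
      test-group: e2e
```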
There are a couple of things to clarify from the examples above:
Here, we use local composite actions and reusable workflows.
However, GitHub Actions supports loading them from a different repository, and therefore using them in multiple projects if needed.
In the case of actions, they can of course also be published in the marketplace, which makes them easier for others to discover.
It may seem as if unit and mutation tests could have been simplified with a matrix, as we did with the static checks.
However, that would not let each mutation-tests job depend only on its corresponding tests job; they would all have to wait for every tests job to finish.
That’s why a reusable workflow is a better solution here.
It may also look like the unit-tests and e2e-tests jobs, which in the last version only invoke the ci-test reusable workflow, could have been merged using a matrix.
However, at the time of writing, GitHub Actions did not allow using a matrix with reusable workflows (support for this was added later, in August 2022).
That’s why they are defined as two independent jobs.
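To illustrate the first point, loading the same pieces from another repository only changes the uses reference. In this sketch, my-org/shared-ci, the v1 ref, the test-group input and the composer cs script are all hypothetical:

```yaml
# Sketch only: repository, ref, input and script names are assumed.
jobs:
  unit-tests:
    # Reusable workflow: {owner}/{repo}/.github/workflows/{file}@{ref}
    uses: my-org/shared-ci/.github/workflows/ci-test.yml@v1
    with:
      test-group: unit

  static-checks:
    runs-on: ubuntu-22.04
    steps:
      - uses: actions/checkout@v4
      # Action in a subfolder: {owner}/{repo}/{path}@{ref}
      - uses: my-org/shared-ci/.github/actions/ci-setup@v1
        with:
          php-version: '8.0'
      - run: composer cs
```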
The example used in this article is made up, but it tries to cover a bit of everything to justify all the strategies presented in it.