
Setting up a CI/CD Pipeline from scratch

This guide will help you set up a CI/CD pipeline from scratch for a Python project, building a Docker image and pushing it to a container registry in GCP.

This article is written by Edgar Ochoa, Data Engineer at Devoteam G Cloud

A CI/CD pipeline is vital in modern software development to ensure that code is shipped quickly and to facilitate all the steps of a release. The pipeline ensures that all the steps needed to push code to a production environment are done in a repeatable manner.

The goal is to have a functional CI/CD pipeline that performs some validations even before code is pushed to a git repository and, once the code is in the master branch, automatically pushes our docker image to a container registry in GCP.

For Python we will use poetry, a package and dependency manager that works much like pip or conda but has some advantages, such as scaffolding the project structure for us. To install poetry you can follow the official installation guide.
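
If you just want to try poetry quickly, installing it through pip also works, although the official installer script is the recommended approach:

```bash
pip install poetry
poetry --version
```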

Another tool that will be used is pre-commit. Pre-commit hooks let you run some verifications before you are allowed to make a commit from your local machine, which saves time when enforcing best practices and code guidelines. The pre-commit documentation covers how to install it.

Table of contents

  • Local Setup
    • Initialize our git repository
    • Set up the folder structure
    • Set up pre-commit
    • Create our gitignore file
    • Adding python code to our repository
    • Creating a test for our function
    • Docker image setup
  • CI/CD Pipeline in GCP
    • Setting up Source Repository in GCP
    • Artifact Registry
    • Cloud Build set up
    • Cloud source Trigger
    • Pushing to Cloud Source repository
  • Conclusion

Local Setup

In this chapter, we will be going through the setup of all the local requirements to ensure that our code is clean before pushing it to a git repository. We will set up pre-commits, pytests, and basic project structure.

Initialize our git repository

Navigate to the folder and initialize the git repository

```bash
git init
```

Set up the folder structure

To set up the folder structure, including unit tests, use the following command from poetry:

```bash
poetry new {project-name}
```

In my case, my project will be named `py-docker`

```bash
poetry new py-docker
```

The folder structure should show up as follows:

```text
py-docker
│   README.rst
│   pyproject.toml
│
└───py_docker
│   │   __init__.py
│
└───tests
│   │   __init__.py
│   │   test_py_docker.py
│
...
```

Poetry automatically creates a `README.rst` file and a `pyproject.toml`, along with the tests folder and the folder where the code will go. The `pyproject.toml` is the file that poetry uses to keep track of dependencies and information such as the authors, version, name of the package, and even the type of license, among other things.
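
For reference, a freshly generated `pyproject.toml` looks roughly like the sketch below; the exact Python and pytest versions depend on your local setup, and the author is taken from your git configuration.

```text
[tool.poetry]
name = "py-docker"
version = "0.1.0"
description = ""
authors = ["Your Name <you@example.com>"]

[tool.poetry.dependencies]
python = "^3.8"

[tool.poetry.dev-dependencies]
pytest = "^5.2"

[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"
```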

The next thing is to run a poetry installation. This will create a `poetry.lock` file where the resolved dependency information is stored.

```bash
poetry install
```

Now it is possible to run tests

```bash
poetry run pytest
```

When adding a new library we can now use poetry as follows:

```bash
poetry add {name_of_the_library}
```

Set up pre-commit

[pre-commit](https://pre-commit.com/) allows you to run some checks on the code before it is committed locally. These checks can cover Python coding style as well as potential security vulnerabilities. To install pre-commit run the following command:

```bash
pip install pre-commit
```

We will be using the following hooks:

  • black: Checks and enforces Python code formatting.
  • bandit: Finds common security issues in Python code.
  • pydocstyle: Checks compliance with Python docstring conventions.
  • pygrep-hooks: Grep-based checks; we use them to enforce Python type annotations and to flag deprecated logging calls.
  • hadolint: Lints the Dockerfile.
  • flake8: A wrapper around several tools that check Python code.
  • poetry: Automatically exports our poetry setup to a requirements.txt file.

This is the list of other hooks from the pre-commit project that we will be using:

  • end-of-file-fixer
  • trailing-whitespace
  • check-ast
  • check-json
  • check-merge-conflict
  • detect-private-key
  • pretty-format-json

All these hooks can be added to the pre-commit configuration file `.pre-commit-config.yaml`. The content of the file should be:

```yaml
---
repos:
  - repo: https://github.com/psf/black
    rev: 22.6.0
    hooks:
      - id: black
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.3.0
    hooks:
      - id: end-of-file-fixer
      - id: trailing-whitespace
      - id: check-ast
      - id: check-json
      - id: check-merge-conflict
      - id: detect-private-key
      - id: pretty-format-json
        args: [--autofix]
  - repo: https://github.com/PyCQA/bandit
    rev: 1.7.4
    hooks:
      - id: bandit
  - repo: https://github.com/PyCQA/pydocstyle
    rev: 6.1.1
    hooks:
      - id: pydocstyle
        args: [--match, "(?!tests/test_).*\\.py"]
  - repo: https://github.com/pre-commit/pygrep-hooks
    rev: v1.9.0
    hooks:
      - id: python-use-type-annotations
      - id: python-no-log-warn
  - repo: https://github.com/AleksaC/hadolint-py
    rev: v2.10.0
    hooks:
      - id: hadolint
  - repo: https://github.com/PyCQA/flake8
    rev: 4.0.1
    hooks:
      - id: flake8
  - repo: https://github.com/python-poetry/poetry
    rev: 1.1.13
    hooks:
      - id: poetry-check
      - id: poetry-lock
      - id: poetry-export
        args: ["-f", "requirements.txt", "-o", "requirements.txt"]
```

The folder structure should show up as follows:

```text
py-docker
│   README.rst
│   pyproject.toml
│   .pre-commit-config.yaml
│
└───py_docker
│   │   __init__.py
│
└───tests
│   │   __init__.py
│   │   test_py_docker.py
│
...
```

Now you can run all the pre-commit hooks against every file in the repository:

```bash
pre-commit run --all-files
```
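
To have these checks run automatically every time you commit, install the hooks into your local git hooks as well:

```bash
pre-commit install
```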

NOTE: If you start committing after this step, you will need to fix all the issues flagged by the pre-commit hooks.

This repository is now ready to start writing tests, enforcing code styles locally, and ready to keep track of the python dependencies using poetry.

Create our gitignore file

We will create our `.gitignore` file. [This file](https://git-scm.com/docs/gitignore) lists all the files that we don’t want to track, such as logs, local configuration files, or OS-generated files. The file should look like this:

```text
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
#  Usually these files are written by a python script from a template
#  before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/

# pyenv
.python-version

# pipenv
#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
#   However, in case of collaboration, if having platform-specific dependencies or dependencies
#   having no cross-platform support, pipenv may install dependencies that don't work, or not
#   install all needed dependencies.
#Pipfile.lock

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

# Environments
.env
.venv
venv/
ENV/
env.bak/
venv.bak/

#mac file system attributes
.DS_Store
```

Make sure to commit everything in your initial commit:

```bash
git add . && git commit -m 'initial commit'
```

Adding python code to our repository

First, we will create a simple function that sums two numbers, and we will document it so it complies with our pre-commit hooks. A new file `python_code.py` is needed in our py_docker folder. The content of the file should look like this:

```python
def sum_numbers(num1: int, num2: int) -> int:
    """Sum two numbers.

    :param num1: Number one to sum
    :param num2: Number two to sum
    :return: Sum of the two numbers
    :rtype: int
    """
    return num1 + num2
```

Creating a test for our function

Tests allow us to catch potential issues when writing code and to avoid repeating the same mistake once an issue has occurred. In an ideal scenario, you would first write the tests for your code and then write the code that makes those tests pass.

Now we can create a test for our function to check if the result of the function is indeed doing a sum for us:

```python
from py_docker.python_code import sum_numbers


def test_sum_numbers():
    """Test sum of positive numbers."""
    assert 2 == sum_numbers(1, 1)  # nosec
```

This function can be added to the existing file `py-docker/tests/test_py_docker.py`. To run all the tests you can use poetry:

```bash
poetry run pytest
```

Or use pytest directly

```bash
pytest
```

Docker image setup

Docker is a popular container platform that will allow us to containerize our application to later push it to Artifact Registry so it can be reused for a Pod, VM, or whatever we need.

First, we need to create our [Dockerfile](https://docs.docker.com/). This file describes how our application image is built:


```dockerfile
# syntax=docker/dockerfile:1
FROM python:3.8.13-slim-bullseye
WORKDIR /app
COPY ./requirements.txt /app
COPY ./py_docker/ /app/py_docker
RUN pip install --no-cache-dir -r /app/requirements.txt
```

A step-by-step explanation of what is going on in this file:

  1. FROM: Defines the base image we will be using and its version
  2. WORKDIR: Sets the working directory for any RUN, CMD, COPY or ENTRYPOINT instruction. This allows for everything to be under the /app folder
  3. COPY: Copies the content of a folder or a file to a target directory. We are copying the py_docker folder and the `requirements.txt` file
  4. RUN: Runs a shell command; in our case we run pip install to install all the libraries defined in the requirements file
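
Before wiring this into Cloud Build, you can sanity-check the image locally. The tag `py-docker` below is just a local name chosen for illustration, and the one-liner assumes the module layout created earlier:

```bash
docker build -t py-docker .
docker run --rm py-docker python -c "from py_docker.python_code import sum_numbers; print(sum_numbers(1, 2))"
```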

CI/CD Pipeline in GCP

We will now look into how to leverage Cloud Source Repositories and Cloud Build to publish our docker image to Artifact Registry. This guide sets up a basic pipeline for continuous delivery of a Docker image, which can later be used as a backend service, for example.

Setting up Source Repository in GCP

Source Repository is a private git solution hosted in GCP. We will use it as our git repository.

1.- We need to create a Source Repository. One way to do it is with Terraform; if you are not sure how to start using Terraform in GCP you can follow [this guide](https://gcloud.devoteam.com/blog/a-step-by-step-guide-to-set-up-a-gcp-project-to-start-using-terraform/). You can choose between cloning an existing repository or pushing a new one. In my case, I will be pushing a new repo, and its name will be `py-docker`.

2.- Follow the instructions in the GCP Console to push or sync your repository to the GCP Source Repository and push the whole content of the folder to Source Repository.

In case you would like to create a Cloud Source repository manually, you can run the following command:

```bash
gcloud source repos create "py-docker-test" --project=PROJECT_ID
```

Artifact Registry

Artifact Registry is a container registry hosted in GCP. It is where our built docker image will be hosted.

In order to enable the service, you can run the following command:

```bash
gcloud services enable artifactregistry.googleapis.com
```

We will now need to create a repository. In our case the name of the repository is `py-docker` and the location is `europe-west1`; for a full list of locations you can run `gcloud artifacts locations list`. There are multiple [repository formats](https://cloud.google.com/artifact-registry/docs/repositories/create-repos#repo-formats); in our case we need a Docker repository.

```bash
gcloud artifacts repositories create py-docker \
    --repository-format=docker \
    --location=europe-west1 \
    --description="Python docker image test"
```
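
Cloud Build will push to this registry for us, but if you ever want to push an image from your local machine you first need to authenticate Docker against the Artifact Registry host (a one-time setup per host):

```bash
gcloud auth configure-docker europe-west1-docker.pkg.dev
```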

Cloud Build set up

Cloud Build allows us to run builds in Google Cloud.

To enable Cloud Build you can run the following command

```bash
gcloud services enable cloudbuild.googleapis.com
```

The default Cloud Build service account does not have access to Artifact Registry, so we need to grant it permission to write to the registry. You will need to use your own project ID and Cloud Build service account:

```bash
gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="serviceAccount:CLOUDBUILDSERVICEACCOUNT@cloudbuild.gserviceaccount.com" \
    --role="roles/artifactregistry.writer"
```
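
The default Cloud Build service account is named after the project number, in the form `PROJECT_NUMBER@cloudbuild.gserviceaccount.com`. A quick way to look up the project number, assuming gcloud is already pointed at your project, is:

```bash
gcloud projects describe PROJECT_ID --format="value(projectNumber)"
```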

The image should be built and pushed only if all the pytest tests pass, so the Cloud Build configuration needs a step that runs the tests before the build step:

```yaml
- name: 'docker.io/library/python:3.7'
  id: Tests
  entrypoint: /bin/sh
  args: ['-c', 'pip install pytest && pip install -r requirements.txt && pytest tests/']
```

The service account should now be ready to push to Artifact Registry. Next we create the Cloud Build configuration in a file called `cloudbuild.yaml`, with one step to build the docker image and push it to the registry:

```yaml
steps:
- name: 'gcr.io/cloud-builders/docker'
  id: build
  args: ['build', '-t', 'europe-west1-docker.pkg.dev/$PROJECT_ID/py-docker/py-docker-image:latest', '.']
images:
- 'europe-west1-docker.pkg.dev/$PROJECT_ID/py-docker/py-docker-image'
```

There should now be two steps in the Cloud Build configuration: one to run all the tests and one to build and push the docker image to our Artifact Registry. The cloudbuild.yaml file should look like this:

```yaml
steps:
- name: 'docker.io/library/python:3.7'
  id: Test
  entrypoint: /bin/sh
  args: ['-c', 'pip install pytest && pip install -r requirements.txt && pytest tests/']
- name: 'gcr.io/cloud-builders/docker'
  id: build
  args: ['build', '-t', 'europe-west1-docker.pkg.dev/$PROJECT_ID/py-docker/py-docker-image:latest', '.']
images:
- 'europe-west1-docker.pkg.dev/$PROJECT_ID/py-docker/py-docker-image'
```

Cloud source Trigger

Cloud Source gives us our own git repository hosted in GCP with out-of-the-box integration with Cloud Build. Cloud Build also integrates with other git repository services such as GitHub and GitLab. We can create a trigger that waits for a push to the master branch of our `py-docker` Cloud Source repo. This trigger will run what is defined in the cloudbuild.yaml:

```bash
gcloud beta builds triggers create cloud-source-repositories \
    --repo=py-docker \
    --branch-pattern=master \
    --build-config=cloudbuild.yaml \
    --region=europe-west1 \
    --name="py-docker-master"
```
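
To double-check that the trigger exists, you can list the triggers in the same region:

```bash
gcloud beta builds triggers list --region=europe-west1
```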

Pushing to Cloud Source repository

We have a Cloud Source repository ready, a trigger waiting for a push on the master branch to run all the steps defined in the Cloud Build file, and an Artifact Registry repository ready for Cloud Build to push an image to. To push from our local repository you can run the following commands:

```bash
git remote add google ssh://USER@DOMAIN.com@source.developers.google.com:2022/p/PROJECT_ID/r/py-docker
git push --all google
```

After pushing, you should see a new Cloud Build job running in GCP that tests and builds our docker image.
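
You can also follow the build from the command line; assuming the build runs in the same region as the trigger, something like this lists the most recent builds:

```bash
gcloud builds list --region=europe-west1 --limit=5
```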

Conclusion

We now have a full CI/CD pipeline leveraging Cloud Source to host our git repository. Cloud Build is set up to run our tests before building the image and pushing it to Artifact Registry. With this setup we can also run our tests locally thanks to pytest. With the pre-commit hooks, best practices are enforced even before a commit is made locally and pushed to the git repo.

What’s next?

Cloud Source: For Cloud Source, it is not mandatory to host all the code in GCP; you can also mirror an existing repository.

Cloud Build: The Docker image can be pushed to wherever it needs to be hosted. We can also modify and extend the cloudbuild.yaml file to include other steps.

Pytest: More tests can be added for all the code developed; this setup will run all the pytest tests available in the tests folder.

Pre-commit: The pre-commit hooks can be further extended or modified with parameters depending on the needs of your team.

This is a baseline setup to further extend and experiment with a CI/CD pipeline; a project could contain more steps in the build process, and more tests would be needed to ensure that all your code is covered.