Macaron Developer’s Guide

To get started with contributing to Macaron, see the CONTRIBUTING page.

To follow the project’s code style, see the Macaron Style Guide page.

For API reference, see the API Reference page.

Writing a New Check

Contributors to Macaron will likely need to write a new check or modify an existing one at some point. This section explains how Macaron checks work and shows how to develop a new check.

High-level Design

Before jumping into coding, it is useful to understand how Macaron as a framework works. Macaron is an extensible framework designed to make writing new supply chain security analyses easy. It provides an interface that you can leverage to access existing models and abstractions instead of implementing everything from scratch. For instance, many security checks require traversing through the code in GitHub Actions configurations. Normally, you would need to find the right repository and commit, clone it, find the workflows, and parse them. With Macaron, you don’t need to do any of that and can simply write your security check by using the parsed shell scripts that are triggered in the CI.

Another important aspect of our design is that all the check results are automatically mapped and stored in a local database. By performing this mapping, we make it possible to enforce use case-specific policies on the results of the checks. While storing the check results in the database happens automatically in Macaron’s backend, the developer needs to add a brief specification to make that possible as we will see later.

Once you are familiar with writing a basic check, you can explore the check dependency feature in Macaron. The checks in our framework can be customized to run only if another check has run and returned a specific result type. This feature is useful when checks have an ordering and a parent-child relationship, i.e., one check implements a weaker or stronger version of a security property in a parent check. In that case, it can make sense to skip running the child check and report a result type based on the result of the parent check.
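As a rough illustration of the dependency idea, the sketch below decides whether a child check should run based on the results of its parents. It is not Macaron's scheduler: CheckResultType here is a local stand-in enum, should_run is a hypothetical helper, and reporting SKIPPED for a check that does not run is an assumption made for the example. Only the shape of depends_on (a list of check ID and result type pairs) is taken from this guide.

```python
from enum import Enum


class CheckResultType(Enum):
    """Local stand-in for Macaron's result types (only the values used here)."""

    PASSED = "PASSED"
    FAILED = "FAILED"
    SKIPPED = "SKIPPED"


def should_run(
    depends_on: list[tuple[str, CheckResultType]],
    finished: dict[str, CheckResultType],
) -> bool:
    """Return True if every parent check finished with the required result type."""
    return all(finished.get(parent_id) == required for parent_id, required in depends_on)


# Hypothetical child check that is only worth running when the parent passed.
child_depends_on = [("mcn_repo_exists_1", CheckResultType.PASSED)]

# The parent failed, so the child is not run and a result is reported for it instead.
finished = {"mcn_repo_exists_1": CheckResultType.FAILED}
child_result = None if should_run(child_depends_on, finished) else CheckResultType.SKIPPED
```

A check with an empty depends_on list always runs in this sketch, which matches the simple example developed later in this guide.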

The Check Interface

Each check needs to be implemented as a Python class in a Python module under src/macaron/slsa_analyzer/checks. A check class should subclass the BaseCheck class. The name of the source file containing the check should end with _check.py.

The main logic of a check should be implemented in the run_check abstract method. It is important to understand the input parameters and output objects computed by this method.

Input Parameters

The run_check method is a callback called by our checker framework. The framework pre-computes a context object, ctx: AnalyzeContext, and passes it to the method as its input parameter. The ctx object contains various intermediate representations and models. Most likely, you will need to use the following properties:

The component object represents a software component and contains data such as its corresponding Repository and dependencies. Note that component is also stored in the database, and its attributes, such as repository, are established as database relationships. You can see the existing tables and their relationships in our data model.

The dynamic_data property is particularly useful because it contains data about the CI service, artifact registry, and build tool used for building the software component. Note that this object is shared state among checks. If a check runs before another check, it can make changes to this object, and those changes will be visible to the checks that run subsequently.
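The shared-state behaviour can be illustrated with a small sketch. It assumes dynamic_data behaves like a plain dictionary and uses a SimpleNamespace as a stand-in for the context object; the key "example_key" and both functions are hypothetical and do not correspond to real Macaron checks.

```python
from types import SimpleNamespace


def first_check(ctx) -> None:
    """Hypothetical check that records a value in the shared state."""
    ctx.dynamic_data["example_key"] = "computed by first_check"


def second_check(ctx):
    """Hypothetical check that reads what an earlier check recorded, if anything."""
    return ctx.dynamic_data.get("example_key")


# Stand-in for the context object; only the shared dictionary is modelled.
ctx = SimpleNamespace(dynamic_data={})

seen_before = second_check(ctx)  # Nothing has been recorded yet.
first_check(ctx)
seen_after = second_check(ctx)  # The change made by first_check is now visible.
```

Because the outcome depends on the order in which checks run, a check that relies on data written by another check is a natural candidate for declaring that check as a dependency.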

Output

The run_check method returns a CheckResultData object. This object consists of result_tables and result_type. The result_tables object is the list of facts generated from the check. The result_type value shows the final result type of the check.

Example

In this example, we show how to add a check to determine if a software component has a source-code repository. Note that this is a simple example to just demonstrate how to add a check from scratch. Feel free to explore other existing checks under src/macaron/slsa_analyzer/checks for more examples.

As discussed earlier, each check needs to be implemented as a Python class in a Python module under src/macaron/slsa_analyzer/checks. A check class should subclass the BaseCheck class.

Create a module

First create a module called repo_check.py under src/macaron/slsa_analyzer/checks.

Add a class for the database

Add a class that subclasses CheckFacts to map your outputs to a table in the database. The class name should follow the <MyCheck>Facts pattern.
Specify the table name in the __tablename__ class variable. Note that the table name should start with _ and it should not have been used by other checks.
Add the id column as the primary key, with a foreign key to _check_facts.id.
Add columns for the check outputs that you would like to store in the database. If a column needs to appear as a justification in the HTML/JSON report, pass info={"justification": JustificationType.<TEXT or HREF>} to the column mapper.
Add the __mapper_args__ class variable and set its "polymorphic_identity" key to the table name.

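import logging

from sqlalchemy import ForeignKey, String
from sqlalchemy.orm import Mapped, mapped_column

# Macaron imports for the whole module. These module paths are assumed from the
# Macaron source tree; verify them against your checkout.
from macaron.database.table_definitions import CheckFacts
from macaron.slsa_analyzer.analyze_context import AnalyzeContext
from macaron.slsa_analyzer.checks.base_check import BaseCheck
from macaron.slsa_analyzer.checks.check_result import CheckResultData, CheckResultType, Confidence, JustificationType
from macaron.slsa_analyzer.registry import registry
from macaron.slsa_analyzer.slsa_req import ReqName

# Create the logger object near the top of the file if you plan to use it.
logger: logging.Logger = logging.getLogger(__name__)


class RepoCheckFacts(CheckFacts):
    """The ORM mapping for justifications in the repository check."""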

    __tablename__ = "_repo_check"

    #: The primary key.
    id: Mapped[int] = mapped_column(ForeignKey("_check_facts.id"), primary_key=True)

    #: The Git repository path.
    git_repo: Mapped[str] = mapped_column(String, nullable=True, info={"justification": JustificationType.HREF})

    __mapper_args__ = {
        "polymorphic_identity": "_repo_check",
    }

Add the check class

Add a class for your check that subclasses BaseCheck, provide the check details in the initializer method, and implement the logic of the check in run_check.

A check_id should match the ^mcn_([a-z]+_)+([0-9]+)$ regular expression, which means it should meet the following requirements:

The general format: mcn_<name>_<digits>.

Use lowercase alphabetical letters in name. If name contains multiple words, they must be separated by underscores.
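To try candidate IDs against the regular expression above, a small helper such as the following can be used. The function name is_valid_check_id is hypothetical and is not part of Macaron; only the pattern itself comes from this guide.

```python
import re

# The pattern quoted above for check IDs.
CHECK_ID_PATTERN = re.compile(r"^mcn_([a-z]+_)+([0-9]+)$")


def is_valid_check_id(check_id: str) -> bool:
    """Return True if check_id follows the mcn_<name>_<digits> format."""
    return CHECK_ID_PATTERN.match(check_id) is not None


valid = is_valid_check_id("mcn_repo_exists_1")
missing_digits = is_valid_check_id("mcn_repo_exists")
uppercase_name = is_valid_check_id("mcn_Repo_exists_1")
missing_prefix = is_valid_check_id("repo_exists_1")
```

In this sketch only the first ID is accepted; the others fail because they lack the trailing digits, use an uppercase letter, or omit the mcn_ prefix.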

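If your check depends on the results of other checks (see the check dependency feature described in the High-level Design section), you can declare these dependencies by setting the depends_on attribute in the initializer method. In this example, we leave this list empty.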

class RepoCheck(BaseCheck):
    """This Check checks whether the target software component has a source-code repository."""

    def __init__(self) -> None:
        """Initialize instance."""
        check_id = "mcn_repo_exists_1"
        description = "Check whether the target software component has a source-code repository."
        depends_on: list[tuple[str, CheckResultType]] = []  # This check doesn't depend on any other checks.
        eval_reqs = [
            ReqName.VCS
        ]  # Choose a SLSA requirement that roughly matches this check from the ReqName enum class.
        super().__init__(check_id=check_id, description=description, depends_on=depends_on, eval_reqs=eval_reqs)

    def run_check(self, ctx: AnalyzeContext) -> CheckResultData:
        """Implement the check in this method.

        Parameters
        ----------
        ctx : AnalyzeContext
              The object containing processed data for the target software component.

        Returns
        -------
        CheckResultData
              The result of the check.
        """
        if not ctx.component.repository:
            logger.info("Unable to find a Git repository for %s", ctx.component.purl)
            # We do not store any results in the database if a check fails. So, just leave result_tables empty.
            return CheckResultData(result_tables=[], result_type=CheckResultType.FAILED)

        return CheckResultData(
            result_tables=[RepoCheckFacts(git_repo=ctx.component.repository.remote_path, confidence=Confidence.HIGH)],
            result_type=CheckResultType.PASSED,
        )

As you can see, the result of the check is returned via the CheckResultData object. You should specify a confidence score by choosing one of the Confidence enum values, e.g., Confidence.HIGH, and pass it via the confidence keyword argument. Choose the score based on the accuracy of your check's analysis.

Register your check

Next, register your check with the registry at the end of your check module:

registry.register(RepoCheck())

Test your check

Finally, you can add tests for your check. We use two types of tests: unit tests and integration tests.

For unit tests, you can add a tests/slsa_analyzer/checks/test_repo_check.py module. Macaron uses pytest and hypothesis for unit testing. Take a look at other tests for inspiration!
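As a starting point, a unit test for the example check might look roughly like the sketch below. To keep it self-contained, it does not import Macaron: CheckResultType is a local stand-in enum and run_repo_check is a hypothetical function that mirrors the logic of RepoCheck.run_check. In a real test module you would import RepoCheck from your check module and call its run_check method with a mocked context instead.

```python
from enum import Enum
from types import SimpleNamespace
from unittest.mock import MagicMock


class CheckResultType(Enum):
    """Local stand-in for Macaron's result types (only two values modelled)."""

    PASSED = "PASSED"
    FAILED = "FAILED"


def run_repo_check(ctx) -> CheckResultType:
    """Hypothetical stand-in mirroring RepoCheck.run_check: pass if a repository exists."""
    if not ctx.component.repository:
        return CheckResultType.FAILED
    return CheckResultType.PASSED


def test_repo_check_passes_with_repository() -> None:
    """A component with a repository should pass."""
    ctx = MagicMock()
    ctx.component.repository = SimpleNamespace(remote_path="https://github.com/example/project")
    assert run_repo_check(ctx) is CheckResultType.PASSED


def test_repo_check_fails_without_repository() -> None:
    """A component without a repository should fail."""
    ctx = MagicMock()
    ctx.component.repository = None
    assert run_repo_check(ctx) is CheckResultType.FAILED
```

pytest collects functions whose names start with test_, so saving such functions in the test module is enough for them to be picked up.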

For integration tests, please refer to the README file under tests/integration for further instructions and have a look at our existing integration test cases if you need some examples.

Updating the database diagram

Macaron uses a visual representation of its database to help developers understand the relationships between its tables. This diagram is created using eralchemy2, a Python library that generates entity-relationship diagrams. When modifications have been made to Macaron’s database, the diagram needs to be regenerated to match. This can be done using the following command:

eralchemy2 -i 'sqlite:///<path_to_output>/macaron.db' -o er-diagram.svg

Here, <path_to_output> is the location of Macaron’s output folder. The resulting diagram can then replace the previous version found at docs/source/assets/er-diagram.svg.