macaron.repo_finder package

This package contains the dependency resolvers for Java projects.

macaron.repo_finder.to_domain_from_known_purl_types(purl_type)

Return the git service domain from a known web-based purl type.

This method is used to handle cases where the purl type value is not the git domain but a pre-defined repo-based type in https://github.com/package-url/purl-spec/blob/master/PURL-TYPES.rst.

Note that this method will be updated when there are new pre-defined types as per the PURL specification.

Parameters:

purl_type (str) – The type field of the PURL.

Returns:

The git service domain corresponding to the purl type or None if the purl type is unknown.

Return type:

str | None
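
As a rough illustration of the lookup described above, the sketch below maps a repo-based purl type to a git service domain. The table name KNOWN_REPO_PURL_TYPES, the function name, and the two entries shown are assumptions made for illustration (the PURL specification lists github and bitbucket as repo-based types); this is not necessarily how Macaron implements it.

```python
from __future__ import annotations

# Hypothetical lookup table. The PURL specification defines `github` and
# `bitbucket` as repo-based types with these default hosts.
KNOWN_REPO_PURL_TYPES = {
    "github": "github.com",
    "bitbucket": "bitbucket.org",
}


def to_domain_sketch(purl_type: str) -> str | None:
    """Return the git service domain for a known purl type, or None if unknown."""
    return KNOWN_REPO_PURL_TYPES.get(purl_type)
```

A type that is not in the table, such as maven, yields None, which signals to the caller that the type is not a known repo-based type.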

Submodules

macaron.repo_finder.commit_finder module

This module contains the logic for matching PackageURL versions to repository commits via the tags they contain.

class macaron.repo_finder.commit_finder.AbstractPurlType(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: Enum

The type represented by a PURL in terms of repositories versus artifacts.

Unsupported types are allowed as a third type.

REPOSITORY = (0,)
ARTIFACT = (1,)
UNSUPPORTED = (2,)
macaron.repo_finder.commit_finder.find_commit(git_obj, purl)

Try to find the commit matching the passed PURL.

The PURL may be a repository type, e.g. GitHub, in which case the commit might be in its version part. Otherwise, the PURL should be a package manager type, e.g. Maven, in which case the commit must be found from the artifact version.

Parameters:
  • git_obj (Git) – The repository.

  • purl (PackageURL) – The PURL of the analysis target.

Returns:

The digest, or None if the commit cannot be correctly retrieved.

Return type:

str | None

macaron.repo_finder.commit_finder.determine_abstract_purl_type(purl)

Determine if the passed purl is a repository type, artifact type, or unsupported type.

Parameters:

purl (PackageURL) – A PURL that represents a repository, artifact, or something that is not supported.

Returns:

The identified type of the PURL.

Return type:

AbstractPurlType

macaron.repo_finder.commit_finder.extract_commit_from_version(git_obj, version)

Try to extract the commit from the PURL’s version parameter.

E.g. With commit: pkg:github/package-url/purl-spec@244fd47e07d1004f0aed9c. With tag: pkg:github/apache/maven@maven-3.9.1.

Parameters:
  • git_obj (Git) – The repository.

  • version (str) – The version part from the analysis target’s PURL.

Returns:

The digest, or None if the commit cannot be correctly retrieved.

Return type:

str | None
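
The two example PURLs above differ in whether the version part is a commit hash or a tag. A minimal sketch of that first classification step is shown below. The function name and the accepted length range (7 to 40 hexadecimal characters) are assumptions; the real function additionally resolves the value against the repository, which is omitted here.

```python
import re

# Assumed rule: a full or abbreviated SHA-1 digest is 7 to 40 hex characters.
_COMMIT_HASH = re.compile(r"^[0-9a-fA-F]{7,40}$")


def looks_like_commit_hash(version: str) -> bool:
    """Return True if the version string has the shape of a git commit hash."""
    return bool(_COMMIT_HASH.match(version))
```

With this rule, the version 244fd47e07d1004f0aed9c is treated as a commit candidate, while maven-3.9.1 falls through to tag matching.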

macaron.repo_finder.commit_finder.find_commit_from_version_and_name(git_obj, name, version)

Try to find the matching commit in a repository of a given version (and name) via tags.

The passed version is used to match with the tags in the target repository. The passed name is used in cases where a repository makes use of named prefixes in its tags.

Parameters:
  • git_obj (Git) – The repository.

  • name (str) – The name of the analysis target.

  • version (str) – The version of the analysis target.

Returns:

The digest, or None if the commit cannot be correctly retrieved.

Return type:

str | None

macaron.repo_finder.commit_finder.match_tags(tag_list, name, version)

Return items of the passed tag list that match the passed artifact name and version.

Parameters:
  • tag_list (list[str]) – The list of tags to check.

  • name (str) – The name of the analysis target.

  • version (str) – The version of the analysis target.

Returns:

The list of tags that matched the pattern.

Return type:

list[str]
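
To make the idea of matching tags against a name and version concrete, here is a deliberately simplified sketch. The pattern (an optional name prefix followed by a separator, an optional v, then the exact version) is an assumption chosen for illustration; the matching rules Macaron actually applies are likely broader.

```python
import re


def match_tags_sketch(tag_list: list[str], name: str, version: str) -> list[str]:
    """Return tags equal to the version, optionally prefixed by the name and/or 'v'."""
    pattern = re.compile(
        rf"^(?:{re.escape(name)}[-_/])?v?{re.escape(version)}$",
        re.IGNORECASE,
    )
    # Preserve the order of the input list.
    return [tag for tag in tag_list if pattern.match(tag)]
```

Escaping the version matters here: without re.escape, the dots in 3.9.1 would match any character.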

macaron.repo_finder.provenance_extractor module

This module contains methods for extracting repository and commit metadata from provenance files.

macaron.repo_finder.provenance_extractor.extract_repo_and_commit_from_provenance(payload)

Extract the repository and commit metadata from the passed provenance payload.

Parameters:

payload (InTotoPayload) – The payload to extract from.

Returns:

The repository URL and commit hash if found, a pair of empty strings otherwise.

Return type:

tuple[str, str]

Raises:

ProvenanceError – If the extraction process fails for any reason.

macaron.repo_finder.provenance_extractor.check_if_input_repo_provenance_conflict(repo_path_input, provenance_repo_url)

Test if the input repo and commit match the contents of the provenance.

Parameters:
  • repo_path_input (str | None) – The repo URL from input.

  • provenance_repo_url (str | None) – The repo URL from provenance.

Returns:

True if there is a conflict between the inputs, False otherwise, or if the comparison cannot be performed.

Return type:

bool

macaron.repo_finder.provenance_extractor.check_if_input_purl_provenance_conflict(git_obj, repo_path_input, digest_input, provenance_repo_url, provenance_commit_digest, purl)

Test if the input repository type PURL’s repo and commit match the contents of the provenance.

Parameters:
  • git_obj (Git) – The Git object.

  • repo_path_input (bool) – True if there is a repo as input.

  • digest_input (bool) – True if there is a commit as input.

  • provenance_repo_url (str | None) – The repo url from provenance.

  • provenance_commit_digest (str | None) – The commit digest from provenance.

  • purl (PackageURL) – The input repository PURL.

Returns:

True if there is a conflict between the inputs, False otherwise, or if the comparison cannot be performed.

Return type:

bool

macaron.repo_finder.provenance_extractor.check_if_repository_purl_and_url_match(url, repo_purl)

Compare a repository PURL and URL for equality.

Parameters:
  • url (str) – The URL.

  • repo_purl (PackageURL) – A PURL that is of the repository abstract type. E.g. GitHub.

Returns:

True if the two inputs match in terms of URL netloc/domain and path.

Return type:

bool
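
The comparison on netloc/domain and path can be sketched with the standard library alone. In the sketch below, the PURL is passed as already-extracted fields (domain, namespace, name) rather than as a PackageURL object, and the case-insensitive comparison and the stripping of a trailing .git are assumptions, not confirmed behaviour of the real function.

```python
from urllib.parse import urlparse


def purl_and_url_match_sketch(url: str, purl_domain: str, namespace: str, name: str) -> bool:
    """Compare a URL against repository PURL fields by domain and path."""
    parsed = urlparse(url)
    expected_path = f"/{namespace}/{name}".lower()
    # Normalise the URL path: drop a trailing slash and an optional ".git" suffix.
    actual_path = parsed.path.rstrip("/").removesuffix(".git").lower()
    return parsed.netloc.lower() == purl_domain.lower() and actual_path == expected_path
```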

class macaron.repo_finder.provenance_extractor.ProvenanceBuildDefinition

Bases: ABC

Abstract base class for representing provenance build definitions.

This class serves as a blueprint for various types of build definitions in provenance data. It outlines the methods and properties that derived classes must implement to handle specific build definition types.

expected_build_type: str

Determines the expected buildType field in the provenance predicate.

abstract get_build_invocation(statement)

Retrieve the build invocation information from the given statement.

This method is intended to be implemented by subclasses to extract specific invocation details from a provenance statement.

Parameters:

statement (InTotoV1Statement | InTotoV01Statement) – The provenance statement from which to extract the build invocation details. This statement contains the metadata about the build process and its associated artifacts.

Returns:

A tuple containing two elements:
  • The first element is the build invocation entry point (e.g., workflow name), or None if not found.

  • The second element is the invocation URL or identifier (e.g., job URL), or None if not found.

Return type:

tuple[str | None, str | None]

Raises:

NotImplementedError – If the method is called directly without being overridden in a subclass.

class macaron.repo_finder.provenance_extractor.SLSAGithubGenericBuildDefinitionV01

Bases: ProvenanceBuildDefinition

Class representing the SLSA GitHub Generic Build Definition (v0.1).

This class implements the abstract methods defined in ProvenanceBuildDefinition to extract build invocation details specific to the GitHub provenance generator’s generic build type.

expected_build_type: str = 'https://github.com/slsa-framework/slsa-github-generator/generic@v1'

Determines the expected buildType field in the provenance predicate.

get_build_invocation(statement)

Retrieve the build invocation information from the given statement.

Parameters:

statement (InTotoV1Statement | InTotoV01Statement) – The provenance statement from which to extract the build invocation details. This statement contains the metadata about the build process and its associated artifacts.

Returns:

A tuple containing two elements:
  • The first element is the build invocation entry point (e.g., workflow name), or None if not found.

  • The second element is the invocation URL or identifier (e.g., job URL), or None if not found.

Return type:

tuple[str | None, str | None]

class macaron.repo_finder.provenance_extractor.SLSAGithubActionsBuildDefinitionV1

Bases: ProvenanceBuildDefinition

Class representing the SLSA GitHub Actions Build Definition (v1).

This class implements the abstract methods from the ProvenanceBuildDefinition to extract build invocation details specific to the GitHub Actions build type.

expected_build_type: str = 'https://slsa-framework.github.io/github-actions-buildtypes/workflow/v1'

Determines the expected buildType field in the provenance predicate.

get_build_invocation(statement)

Retrieve the build invocation information from the given statement.

Parameters:

statement (InTotoV1Statement | InTotoV01Statement) – The provenance statement from which to extract the build invocation details. This statement contains the metadata about the build process and its associated artifacts.

Returns:

A tuple containing two elements:
  • The first element is the build invocation entry point (e.g., workflow name), or None if not found.

  • The second element is the invocation URL or identifier (e.g., job URL), or None if not found.

Return type:

tuple[str | None, str | None]

class macaron.repo_finder.provenance_extractor.SLSANPMCLIBuildDefinitionV2

Bases: ProvenanceBuildDefinition

Class representing the SLSA NPM CLI Build Definition (v2).

This class implements the abstract methods from the ProvenanceBuildDefinition to extract build invocation details specific to the GitHub Actions build type.

expected_build_type: str = 'https://github.com/npm/cli/gha/v2'

Determines the expected buildType field in the provenance predicate.

get_build_invocation(statement)

Retrieve the build invocation information from the given statement.

Parameters:

statement (InTotoV1Statement | InTotoV01Statement) – The provenance statement from which to extract the build invocation details. This statement contains the metadata about the build process and its associated artifacts.

Returns:

A tuple containing two elements:
  • The first element is the build invocation entry point (e.g., workflow name), or None if not found.

  • The second element is the invocation URL or identifier (e.g., job URL), or None if not found.

Return type:

tuple[str | None, str | None]

class macaron.repo_finder.provenance_extractor.SLSAGCBBuildDefinitionV1

Bases: ProvenanceBuildDefinition

Class representing the SLSA Google Cloud Build (GCB) Build Definition (v1).

This class implements the abstract methods from ProvenanceBuildDefinition to extract build invocation details specific to the Google Cloud Build (GCB).

expected_build_type: str = 'https://slsa-framework.github.io/gcb-buildtypes/triggered-build/v1'

Determines the expected buildType field in the provenance predicate.

get_build_invocation(statement)

Retrieve the build invocation information from the given statement.

Parameters:

statement (InTotoV1Statement | InTotoV01Statement) – The provenance statement from which to extract the build invocation details. This statement contains the metadata about the build process and its associated artifacts.

Returns:

A tuple containing two elements:
  • The first element is the build invocation entry point (e.g., workflow name), or None if not found.

  • The second element is the invocation URL or identifier (e.g., job URL), or None if not found.

Return type:

tuple[str | None, str | None]

class macaron.repo_finder.provenance_extractor.SLSAOCIBuildDefinitionV1

Bases: ProvenanceBuildDefinition

Class representing the SLSA Oracle Cloud Infrastructure (OCI) Build Definition (v1).

This class implements the abstract methods from ProvenanceBuildDefinition to extract build invocation details specific to OCI builds.

expected_build_type: str = 'https://github.com/oracle/macaron/tree/main/src/macaron/resources/provenance-buildtypes/oci/v1'

Determines the expected buildType field in the provenance predicate.

get_build_invocation(statement)

Retrieve the build invocation information from the given statement.

Parameters:

statement (InTotoV1Statement | InTotoV01Statement) – The provenance statement from which to extract the build invocation details. This statement contains the metadata about the build process and its associated artifacts.

Returns:

A tuple containing two elements:
  • The first element is the build invocation entry point (e.g., workflow name), or None if not found.

  • The second element is the invocation URL or identifier (e.g., job URL), or None if not found.

Return type:

tuple[str | None, str | None]

class macaron.repo_finder.provenance_extractor.WitnessGitLabBuildDefinitionV01

Bases: ProvenanceBuildDefinition

Class representing the Witness GitLab Build Definition (v0.1).

This class implements the abstract methods from ProvenanceBuildDefinition to extract build invocation details specific to GitLab.

expected_build_type: str = 'https://witness.testifysec.com/attestation-collection/v0.1'

Determines the expected buildType field in the provenance predicate.

expected_attestation_type = 'https://witness.dev/attestations/gitlab/v0.1'

Determines the expected attestations.type field in the Witness provenance predicate.

get_build_invocation(statement)

Retrieve the build invocation information from the given statement.

Parameters:

statement (InTotoV1Statement | InTotoV01Statement) – The provenance statement from which to extract the build invocation details. This statement contains the metadata about the build process and its associated artifacts.

Returns:

A tuple containing two elements:
  • The first element is the build invocation entry point (e.g., workflow name), or None if not found.

  • The second element is the invocation URL or identifier (e.g., job URL), or None if not found.

Return type:

tuple[str | None, str | None]

class macaron.repo_finder.provenance_extractor.ProvenancePredicate

Bases: object

Class providing utility methods for handling provenance predicates.

This class contains static methods for extracting information from predicates in provenance statements related to various build definitions. It serves as a helper for identifying build types and finding the appropriate build definitions based on the extracted data.

static get_build_type(statement)

Extract the build type from the provided provenance statement.

Parameters:

statement (InTotoV1Statement | InTotoV01Statement) – The provenance statement from which to extract the build type.

Returns:

The build type if found; otherwise, None.

Return type:

str | None

static find_build_def(statement)

Find the appropriate build definition class based on the extracted build type.

This method checks the provided provenance statement for its build type and returns the corresponding ProvenanceBuildDefinition subclass.

Parameters:

statement (InTotoV01Statement | InTotoV1Statement) – The provenance statement containing the build type information.

Returns:

An instance of the appropriate build definition class that matches the extracted build type.

Return type:

ProvenanceBuildDefinition

Raises:

ProvenanceError – Raised when the build definition cannot be found in the provenance statement.
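
The relationship between get_build_type, find_build_def, and the expected_build_type attribute of each build definition can be sketched as a simple dispatch. Everything below is a hypothetical stand-in: statements are plain dictionaries rather than InTotoV1Statement or InTotoV01Statement objects, only two build definitions are modelled, invocation extraction is stubbed out, and ValueError takes the place of ProvenanceError. The location of buildType follows the SLSA provenance layouts (nested under buildDefinition in v1, top-level in earlier versions).

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class BuildDefSketch(ABC):
    """Hypothetical stand-in for ProvenanceBuildDefinition."""

    expected_build_type: str

    @abstractmethod
    def get_build_invocation(self, statement: dict) -> tuple[str | None, str | None]:
        """Return (entry point, invocation URL), each None if not found."""


class GitHubActionsSketch(BuildDefSketch):
    expected_build_type = "https://slsa-framework.github.io/github-actions-buildtypes/workflow/v1"

    def get_build_invocation(self, statement: dict) -> tuple[str | None, str | None]:
        return None, None  # Extraction is omitted in this sketch.


class GCBSketch(BuildDefSketch):
    expected_build_type = "https://slsa-framework.github.io/gcb-buildtypes/triggered-build/v1"

    def get_build_invocation(self, statement: dict) -> tuple[str | None, str | None]:
        return None, None  # Extraction is omitted in this sketch.


def get_build_type_sketch(statement: dict) -> str | None:
    """Read buildType from a v1-style or an earlier-style predicate."""
    predicate = statement.get("predicate") or {}
    build_definition = predicate.get("buildDefinition") or {}
    return build_definition.get("buildType") or predicate.get("buildType")


def find_build_def_sketch(statement: dict) -> BuildDefSketch:
    """Return an instance of the build definition whose expected type matches."""
    build_type = get_build_type_sketch(statement)
    for candidate in (GitHubActionsSketch, GCBSketch):
        if candidate.expected_build_type == build_type:
            return candidate()
    # The documented function raises ProvenanceError; ValueError stands in here.
    raise ValueError(f"No build definition for build type: {build_type!r}")
```

Keeping the expected build type as a class attribute means supporting a new provenance format only requires a new subclass and one more entry in the candidate tuple.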

macaron.repo_finder.provenance_finder module

This module contains methods for finding provenance files.

class macaron.repo_finder.provenance_finder.ProvenanceFinder

Bases: object

This class is used to find and retrieve provenance files from supported registries.

__init__()
find_provenance(purl)

Find the provenance file(s) of the passed PURL.

Parameters:

purl (PackageURL) – The PURL to find provenance for.

Returns:

The provenance payload, or an empty list if not found.

Return type:

list[InTotoPayload]

verify_provenance(purl, provenance)

Verify the passed provenance.

Parameters:
  • purl (PackageURL) – The PURL of the analysis target.

  • provenance (list[InTotoPayload]) – The list of provenance.

Returns:

True if the provenance could be verified, or False otherwise.

Return type:

bool

macaron.repo_finder.provenance_finder.find_npm_provenance(purl, registry)

Find and download the NPM based provenance for the passed PURL.

Two kinds of attestation can be retrieved from npm: “Provenance” and “Publish”. The “Provenance” attestation contains the important information Macaron seeks, but is not signed. The “Publish” attestation is signed. Comparing the signed and unsigned attestations at the subject level allows the unsigned one to be verified. See: https://docs.npmjs.com/generating-provenance-statements

Parameters:
  • purl (PackageURL) – The PURL of the analysis target.

  • registry (NPMRegistry) – The npm registry to use.

Returns:

The provenance payload(s), or an empty list if not found.

Return type:

list[InTotoPayload]

macaron.repo_finder.provenance_finder.verify_npm_provenance(purl, provenance)

Compare the unsigned payload subject digest with the signed payload digest, if available.

Parameters:
  • purl (PackageURL) – The PURL of the analysis target.

  • provenance (list[InTotoPayload]) – The provenances to verify.

Returns:

True if the provenance was verified, or False otherwise.

Return type:

bool
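
A subject-level comparison of this kind might look like the sketch below. It assumes both attestations have already been decoded into in-toto statement dictionaries with a subject list of name/digest entries; the function name and the rule that an empty subject list never verifies are assumptions, and the checks Macaron actually performs may differ.

```python
def subjects_match_sketch(unsigned_statement: dict, signed_statement: dict) -> bool:
    """Return True if both statements list the same subjects with the same digests."""

    def subject_set(statement: dict) -> set:
        # Reduce each subject to a hashable (name, sorted digest items) pair.
        return {
            (subject.get("name"), tuple(sorted((subject.get("digest") or {}).items())))
            for subject in statement.get("subject", [])
        }

    unsigned = subject_set(unsigned_statement)
    signed = subject_set(signed_statement)
    # Assumed rule: nothing to compare means nothing is verified.
    return bool(unsigned) and unsigned == signed
```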

macaron.repo_finder.provenance_finder.find_gav_provenance(purl, registry)

Find and download the GAV based provenance for the passed PURL.

Parameters:
  • purl (PackageURL) – The PURL of the analysis target.

  • registry (JFrogMavenRegistry) – The registry to use for finding.

Returns:

The provenance payload if found, or an empty list otherwise.

Return type:

list[InTotoPayload] | None

Raises:

ProvenanceAvailableException – If the discovered provenance file size exceeds the configured limit.

macaron.repo_finder.provenance_finder.find_provenance_from_ci(analyze_ctx, git_obj)

Try to find provenance from CI services of the repository.

Note that we stop going through the CI services once we encounter a CI service that does host provenance assets.

This method also loads the provenance payloads into the CIInfo object where the provenance assets are found.

Parameters:
  • analyze_ctx (AnalyzeContext) – The context of the ongoing analysis.

  • git_obj (Git | None) – The Pydriller Git object representing the repository, if any.

Returns:

The provenance payload, or None if not found.

Return type:

InTotoPayload | None

macaron.repo_finder.provenance_finder.download_provenances_from_github_actions_ci_service(ci_info)

Download provenances from GitHub Actions.

Parameters:

ci_info (CIInfo) – A CIInfo instance that holds a GitHub Actions git service object.

Return type:

None

macaron.repo_finder.repo_finder module

This module contains the logic for using/calling the different repo finders.

Input

The entry point of the repo finder depends on the type of PURL being analyzed.
  • If passing a PURL representing an artifact, the find_repo function in this file should be called.

  • If passing a PURL representing a repository, the to_repo_path function in this file should be called.

Artifact PURLs

For artifact PURLs, the PURL type determines how the repositories are searched for. Currently, for Maven PURLs, SCM metadata is retrieved from the matching POM retrieved from Maven Central (or other configured location).

For Python, .NET, Rust, and NodeJS type PURLs, Google’s Open Source Insights API is used to find the metadata.

In either case, any repository links are extracted from the metadata, then checked for validity via repo_validator::find_valid_repository_url, which accepts URLs that point to a GitHub repository or similar.

Repository PURLs

For repository PURLs, the type is checked against the configured valid domains, and accepted or rejected based on that data.

Result

If all goes well, a repository URL that matches the initial artifact or repository PURL will be returned for analysis.

macaron.repo_finder.repo_finder.find_repo(purl)

Retrieve the repository URL that matches the given PURL.

Parameters:

purl (PackageURL) – The parsed PURL to convert to the repository path.

Returns:

The repository URL found for the passed package.

Return type:

str

macaron.repo_finder.repo_finder.to_repo_path(purl, available_domains)

Return the repository path from the PURL string.

This method only supports converting a PURL with the following format:

pkg:<type>/<namespace>/<name>[…]

Where type could be either:
  • The pre-defined repository-based PURL type as defined in https://github.com/package-url/purl-spec/blob/master/PURL-TYPES.rst

  • The supported git service domains (e.g. github.com) defined in available_domains.

The repository path will be generated with the following format https://<type>/<namespace>/<name>.

Parameters:
  • purl (PackageURL) – The parsed PURL to convert to the repository path.

  • available_domains (list[str]) – The list of available domains.

Returns:

The URL to the repository which the PURL is referring to or None if we cannot convert it.

Return type:

str | None
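
The conversion rule described above can be sketched as follows. The PURL is passed as separate fields instead of a PackageURL object, the table of known types is a hypothetical two-entry stand-in, and the order in which the two kinds of type are checked is an assumption.

```python
from __future__ import annotations

# Hypothetical table of pre-defined repo-based PURL types.
KNOWN_TYPES = {"github": "github.com", "bitbucket": "bitbucket.org"}


def to_repo_path_sketch(
    purl_type: str, namespace: str | None, name: str | None, available_domains: list[str]
) -> str | None:
    """Build https://<domain>/<namespace>/<name>, or return None if not convertible."""
    domain = KNOWN_TYPES.get(purl_type)
    if domain is None and purl_type in available_domains:
        # The type itself is a supported git service domain, e.g. "gitlab.com".
        domain = purl_type
    if domain is None or not namespace or not name:
        return None
    return f"https://{domain}/{namespace}/{name}"
```

For instance, the fields of pkg:github/oracle/macaron would produce https://github.com/oracle/macaron, while a maven type produces None because it is neither a known repo-based type nor a configured domain.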

macaron.repo_finder.repo_finder.find_source(purl_string, input_repo)

Perform repo and commit finding for a passed PURL, or commit finding for a passed PURL and repo.

Parameters:
  • purl_string (str) – The PURL string of the target.

  • input_repo (str | None) – The repository path optionally provided by the user.

Returns:

True if the source was found.

Return type:

bool

macaron.repo_finder.repo_finder.get_tags_via_git_remote(repo)

Retrieve all tags from a given repository using ls-remote.

Parameters:

repo (str) – The repository to perform the operation on.

Returns:

A dictionary of tags mapped to their commits, or None if the operation failed.

Return type:

dict[str, str] | None
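
For readers unfamiliar with the output of git ls-remote, the sketch below parses text in the format printed by git ls-remote --tags into a tag-to-commit dictionary. The function name is hypothetical and the subprocess call itself is left out; preferring the peeled ^{} entry for annotated tags is a design choice of this sketch, not a confirmed detail of Macaron.

```python
def parse_ls_remote_tags(output: str) -> dict[str, str]:
    """Map tag names to commit hashes from `git ls-remote --tags` style output."""
    tags: dict[str, str] = {}
    prefix = "refs/tags/"
    for line in output.splitlines():
        parts = line.split("\t")
        if len(parts) != 2 or not parts[1].startswith(prefix):
            continue  # Skip blank or unexpected lines.
        commit, ref = parts
        tag = ref[len(prefix):]
        if tag.endswith("^{}"):
            # Peeled entry: the commit that an annotated tag points to.
            tags[tag[:-3]] = commit
        else:
            tags.setdefault(tag, commit)
    return tags
```

An annotated tag appears twice in the output, once for the tag object and once with a ^{} suffix for the underlying commit, which is why the peeled entry overrides the plain one.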

macaron.repo_finder.repo_finder_base module

This module contains the base class for the repo finders.

class macaron.repo_finder.repo_finder_base.BaseRepoFinder

Bases: ABC

This abstract class is used to represent Repository Finders.

abstract find_repo(purl)

Generate iterator from _find_repo that attempts to retrieve a repository URL that matches the passed artifact.

Parameters:

purl (PackageURL) – The PURL of an artifact.

Returns:

The URL of the found repository.

Return type:

str

macaron.repo_finder.repo_finder_deps_dev module

This module contains the DepsDevRepoFinder class to be used for finding repositories using deps.dev.

class macaron.repo_finder.repo_finder_deps_dev.DepsDevType(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: StrEnum

The package manager types supported by deps.dev.

This enum should be updated based on updates to deps.dev.

MAVEN = 'maven'
PYPI = 'pypi'
NUGET = 'nuget'
CARGO = 'cargo'
NPM = 'npm'
class macaron.repo_finder.repo_finder_deps_dev.DepsDevRepoFinder

Bases: BaseRepoFinder

This class is used to find repositories using Google’s Open Source Insights A.K.A. deps.dev.

find_repo(purl)

Attempt to retrieve a repository URL that matches the passed artifact.

Parameters:

purl (PackageURL) – The PURL of an artifact.

Returns:

The URL of the found repository.

Return type:

str

static get_project_info(project_url)

Retrieve project information from deps.dev.

Parameters:

project_url (str) – The URL of the project.

Returns:

The project information or None if the information could not be retrieved.

Return type:

dict[str, Any] | None

macaron.repo_finder.repo_finder_java module

This module contains the JavaRepoFinder class to be used for finding Java repositories.

class macaron.repo_finder.repo_finder_java.JavaRepoFinder

Bases: BaseRepoFinder

This class is used to find Java repositories.

__init__()

Initialise the Java repository finder instance.

find_repo(purl)

Attempt to retrieve a repository URL that matches the passed artifact.

Parameters:

purl (PackageURL) – The PURL of an artifact.

Yields:

str – The URL of the found repository.

Return type:

str

macaron.repo_finder.repo_utils module

This module contains the utility functions for repo and commit finder operations.

macaron.repo_finder.repo_utils.create_filename(purl)

Create the filename of the report based on the PURL.

Parameters:

purl (PackageURL) – The PackageURL of the artifact.

Returns:

The filename to save the report under.

Return type:

str

macaron.repo_finder.repo_utils.generate_report(purl, commit, repo, target_dir)

Create the report and save it to the passed directory.

Parameters:
  • purl (str) – The PackageURL of the target artifact, as a string.

  • commit (str) – The commit hash to report.

  • repo (str) – The repository to report.

  • target_dir (str) – The path of the directory where the report will be saved.

Returns:

True if the report was created. False otherwise.

Return type:

bool

macaron.repo_finder.repo_utils.create_report(purl, commit, repo)

Generate report for standalone uses of the repo / commit finder.

Parameters:
  • purl (str) – The PackageURL of the target artifact, as a string.

  • commit (str) – The commit hash to report.

  • repo (str) – The repository to report.

Returns:

The report as a JSON string.

Return type:

str

macaron.repo_finder.repo_utils.prepare_repo(target_dir, repo_path, branch_name='', digest='', purl=None)

Prepare the target repository for analysis.

If repo_path is a remote path, the target repo is cloned to {target_dir}/{unique_path}. The unique_path of a repository will depend on its remote url. For example, if given the repo_path https://github.com/org/name.git, it will be cloned to {target_dir}/github_com/org/name.

If repo_path is a local path, this method will check if repo_path resolves to a directory inside local_repos_path and to a valid git repository.

Parameters:
  • target_dir (str) – The directory where all remote repositories will be cloned.

  • repo_path (str) – The path to the repository, can be either local or remote.

  • branch_name (str) – The name of the branch we want to check out.

  • digest (str) – The hash of the commit that we want to check out in the branch.

  • purl (PackageURL | None) – The PURL of the analysis target.

Returns:

The pydriller.Git object of the repository, or None if an error occurs.

Return type:

Git | None
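
The mapping from a remote URL to its unique clone path, as in the https://github.com/org/name.git example above, can be sketched like this. The function name is hypothetical, and the exact normalisation rules (replacing dots in the host with underscores, dropping a trailing .git) are inferred from that single example.

```python
from urllib.parse import urlparse


def unique_path_sketch(remote_url: str) -> str:
    """Derive a clone sub-path such as 'github_com/org/name' from a remote URL."""
    parsed = urlparse(remote_url)
    host = parsed.netloc.replace(".", "_")
    path = parsed.path.strip("/").removesuffix(".git")
    return f"{host}/{path}"
```

Joining the result onto target_dir gives the {target_dir}/{unique_path} location mentioned in the description.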

macaron.repo_finder.repo_utils.get_local_repos_path()

Get the local repos path from global config or use default.

If the directory does not exist, it is created.

Return type:

str

macaron.repo_finder.repo_utils.get_git_service(remote_path)

Return the git service used from the remote path.

Parameters:

remote_path (str | None) – The remote path of the repo.

Returns:

The git service derived from the remote path.

Return type:

BaseGitService

macaron.repo_finder.repo_validator module

This module exists to validate URLs in terms of their use as a repository that can be analyzed.

macaron.repo_finder.repo_validator.find_valid_repository_url(urls)

Find a valid URL from the provided URLs.

Parameters:

urls (Iterable[str]) – An Iterable object containing URLs.

Returns:

The first valid URL from the iterable, or an empty string if none can be found.

Return type:

str
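
A minimal sketch of selecting the first valid URL is given below. The validity rule used here (an http or https scheme and a host from an allow-list) and the default allow-list are assumptions for illustration; the criteria Macaron applies, including its configured domains and redirect handling, are not reproduced.

```python
from __future__ import annotations

from collections.abc import Iterable
from urllib.parse import urlparse

# Assumed allow-list; the real set of accepted domains comes from configuration.
DEFAULT_DOMAINS = ("github.com", "gitlab.com", "bitbucket.org")


def first_valid_url_sketch(
    urls: Iterable[str], allowed_domains: tuple[str, ...] = DEFAULT_DOMAINS
) -> str:
    """Return the first URL with a web scheme and an allowed host, else ''."""
    for url in urls:
        parsed = urlparse(url)
        if parsed.scheme in ("http", "https") and parsed.netloc.lower() in allowed_domains:
            return url
    return ""
```

Returning an empty string rather than None mirrors the documented return type of str.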

macaron.repo_finder.repo_validator.resolve_redirects(parsed_url)

Resolve redirecting URLs by returning the location they point to.

Parameters:

parsed_url (ParseResult) – A parsed URL.

Returns:

The resolved redirect location, or None if none was found.

Return type:

str | None