macaron.repo_finder package

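This package contains the repository finders, which locate the source repository and commit that match an analysis target, together with helpers for finding and verifying provenance.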

Submodules

macaron.repo_finder.commit_finder module

This module contains the logic for matching PackageURL versions to repository commits via the tags they contain.

class macaron.repo_finder.commit_finder.AbstractPurlType(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: Enum

The type represented by a PURL in terms of repositories versus artifacts.

Unsupported types are allowed as a third type.

REPOSITORY = (0,)
ARTIFACT = (1,)
UNSUPPORTED = (2,)

macaron.repo_finder.commit_finder.find_commit(git_obj, purl)

Try to find the commit matching the passed PURL.

The PURL may be a repository type, e.g. GitHub, in which case the commit might be in its version part. Otherwise, the PURL should be a package manager type, e.g. Maven, in which case the commit must be found from the artifact version.

Parameters:
  • git_obj (Git) – The repository.

  • purl (PackageURL) – The PURL of the analysis target.

Returns:

The digest, or None if the commit cannot be correctly retrieved.

Return type:

str | None

macaron.repo_finder.commit_finder.determine_abstract_purl_type(purl)

Determine if the passed purl is a repository type, artifact type, or unsupported type.

Parameters:

purl (PackageURL) – A PURL that represents a repository, artifact, or something that is not supported.

Returns:

The identified type of the PURL.

Return type:

AbstractPurlType
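The three-way classification performed by determine_abstract_purl_type can be sketched as below. This is a minimal sketch, not Macaron's implementation: the type sets REPOSITORY_TYPES and ARTIFACT_TYPES, the helper classify_purl_type, and its available_domains parameter are hypothetical, and the enum values are simplified to plain integers.

```python
from enum import Enum


class AbstractPurlType(Enum):
    """Stand-in for the documented enum, with values simplified to plain ints."""

    REPOSITORY = 0
    ARTIFACT = 1
    UNSUPPORTED = 2


# Hypothetical type sets; the lists Macaron actually uses may differ.
REPOSITORY_TYPES = {"github", "bitbucket"}
ARTIFACT_TYPES = {"maven", "pypi", "npm", "nuget", "cargo"}


def classify_purl_type(purl_type: str, available_domains: list[str]) -> AbstractPurlType:
    """Classify a PURL type field as repository, artifact, or unsupported."""
    purl_type = purl_type.lower()
    # Repository PURLs use either a pre-defined type (e.g. "github") or a git service domain.
    if purl_type in REPOSITORY_TYPES or purl_type in available_domains:
        return AbstractPurlType.REPOSITORY
    if purl_type in ARTIFACT_TYPES:
        return AbstractPurlType.ARTIFACT
    return AbstractPurlType.UNSUPPORTED
```

For example, classify_purl_type("maven", ["github.com"]) returns AbstractPurlType.ARTIFACT, while a type such as "deb" falls through to UNSUPPORTED.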

macaron.repo_finder.commit_finder.extract_commit_from_version(git_obj, version)

Try to extract the commit from the PURL’s version parameter.

For example, with a commit: pkg:github/package-url/purl-spec@244fd47e07d1004f0aed9c; with a tag: pkg:github/apache/maven@maven-3.9.1.

Parameters:
  • git_obj (Git) – The repository.

  • version (str) – The version part from the analysis target’s PURL.

Returns:

The digest, or None if the commit cannot be correctly retrieved.

Return type:

str | None

macaron.repo_finder.commit_finder.find_commit_from_version_and_name(git_obj, name, version)

Try to find the commit in a repository that matches a given version (and name) via the repository's tags.

The passed version is used to match with the tags in the target repository. The passed name is used in cases where a repository makes use of named prefixes in its tags.

Parameters:
  • git_obj (Git) – The repository.

  • name (str) – The name of the analysis target.

  • version (str) – The version of the analysis target.

Returns:

The digest, or None if the commit cannot be correctly retrieved.

Return type:

str | None

macaron.repo_finder.commit_finder.match_tags(tag_list, name, version)

Return items of the passed tag list that match the passed artifact name and version.

Parameters:
  • tag_list (list[str]) – The list of tags to check.

  • name (str) – The name of the analysis target.

  • version (str) – The version of the analysis target.

Returns:

The list of tags that matched the pattern.

Return type:

list[str]
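A simplified version of tag matching is sketched below. It assumes tags take one of a few common shapes ("1.2.3", "v1.2.3", "name-1.2.3", "name_v1.2.3", "name@1.2.3", "name/v1.2.3"); match_tags_sketch is a hypothetical helper, and Macaron's own matching is likely more involved.

```python
import re


def match_tags_sketch(tag_list: list[str], name: str, version: str) -> list[str]:
    """Return tags equal to the version, optionally prefixed by the name and/or a 'v'."""
    # An optional "<name><separator>" prefix, an optional "v", then the exact version.
    pattern = re.compile(
        rf"^(?:{re.escape(name)}[-_@/]?)?v?{re.escape(version)}$",
        re.IGNORECASE,
    )
    return [tag for tag in tag_list if pattern.match(tag)]
```

Anchoring the pattern at both ends keeps "1.2.30" or "1.2.3-rc1" from matching version 1.2.3.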

macaron.repo_finder.provenance_extractor module

This module contains methods for extracting repository and commit metadata from provenance files.

macaron.repo_finder.provenance_extractor.extract_repo_and_commit_from_provenance(payload)

Extract the repository and commit metadata from the passed provenance payload.

Parameters:

payload (InTotoPayload) – The payload to extract from.

Returns:

The repository URL and commit hash if found, a pair of empty strings otherwise.

Return type:

tuple[str, str]

Raises:

ProvenanceError – If the extraction process fails for any reason.
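As an illustration, the sketch below reads the repository and commit from a SLSA v0.2-style in-toto statement held as a plain dictionary. It assumes the data sits in predicate.invocation.configSource (a "git+" URI with an "@ref" suffix and a sha1 digest); Macaron supports more provenance formats, and this hypothetical helper returns empty strings instead of raising ProvenanceError.

```python
from urllib.parse import urlsplit


def extract_repo_and_commit_sketch(statement: dict) -> tuple[str, str]:
    """Read the repository URL and commit from a SLSA v0.2-style statement."""
    predicate = statement.get("predicate") or {}
    config_source = (predicate.get("invocation") or {}).get("configSource") or {}
    uri = config_source.get("uri", "")
    commit = (config_source.get("digest") or {}).get("sha1", "")
    if not uri or not commit:
        # Mirror the documented fallback: a pair of empty strings.
        return "", ""
    if uri.startswith("git+"):
        uri = uri[len("git+"):]
    parts = urlsplit(uri)
    # The git ref follows an "@" in the path, e.g. ".../owner/repo@refs/heads/main".
    path = parts.path.split("@", 1)[0]
    return f"{parts.scheme}://{parts.netloc}{path}", commit
```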

macaron.repo_finder.provenance_extractor.check_if_input_repo_provenance_conflict(repo_path_input, provenance_repo_url)

Test if the input repository URL matches the repository URL found in the provenance.

Parameters:
  • repo_path_input (str | None) – The repo URL from input.

  • provenance_repo_url (str | None) – The repo URL from provenance.

Returns:

True if there is a conflict between the inputs; False if there is no conflict or if the comparison cannot be performed.

Return type:

bool

macaron.repo_finder.provenance_extractor.check_if_input_purl_provenance_conflict(git_obj, repo_path_input, digest_input, provenance_repo_url, provenance_commit_digest, purl)

Test if the input repository type PURL’s repo and commit match the contents of the provenance.

Parameters:
  • git_obj (Git) – The Git object.

  • repo_path_input (bool) – True if there is a repository as input.

  • digest_input (bool) – True if there is a commit as input.

  • provenance_repo_url (str | None) – The repository URL from provenance.

  • provenance_commit_digest (str | None) – The commit digest from provenance.

  • purl (PackageURL) – The input repository PURL.

Returns:

True if there is a conflict between the inputs; False if there is no conflict or if the comparison cannot be performed.

Return type:

bool

macaron.repo_finder.provenance_extractor.check_if_repository_purl_and_url_match(url, repo_purl)

Compare a repository PURL and URL for equality.

Parameters:
  • url (str) – The URL.

  • repo_purl (PackageURL) – A PURL of the repository abstract type, e.g. GitHub.

Returns:

True if the two inputs match in terms of URL netloc/domain and path.

Return type:

bool
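A sketch of the netloc/path comparison follows. The PURL is passed as separate type, namespace, and name strings rather than a PackageURL object, and the KNOWN_TYPES mapping from pre-defined PURL types to domains is an assumption; purl_and_url_match_sketch is a hypothetical helper.

```python
from urllib.parse import urlsplit

# Assumed mapping from pre-defined repository PURL types to git service domains.
KNOWN_TYPES = {"github": "github.com", "bitbucket": "bitbucket.org"}


def purl_and_url_match_sketch(url: str, purl_type: str, namespace: str, name: str) -> bool:
    """Compare the URL's domain and path with the PURL's type, namespace, and name."""
    parts = urlsplit(url)
    # Pre-defined types map to a domain; other types are assumed to be domains already.
    expected_domain = KNOWN_TYPES.get(purl_type, purl_type)
    path = parts.path.strip("/")
    if path.endswith(".git"):
        path = path[: -len(".git")]
    return parts.netloc.lower() == expected_domain.lower() and path == f"{namespace}/{name}"
```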

macaron.repo_finder.provenance_finder module

This module contains methods for finding provenance files.

class macaron.repo_finder.provenance_finder.ProvenanceFinder

Bases: object

This class is used to find and retrieve provenance files from supported registries.

__init__()
find_provenance(purl)

Find the provenance file(s) of the passed PURL.

Parameters:

purl (PackageURL) – The PURL to find provenance for.

Returns:

The provenance payload(s), or an empty list if not found.

Return type:

list[InTotoPayload]

verify_provenance(purl, provenance)

Verify the passed provenance.

Parameters:
  • purl (PackageURL) – The PURL of the analysis target.

  • provenance (list[InTotoPayload]) – The list of provenance payloads.

Returns:

True if the provenance could be verified, or False otherwise.

Return type:

bool

macaron.repo_finder.provenance_finder.find_npm_provenance(purl, registry)

Find and download the npm-based provenance for the passed PURL.

Two kinds of attestation can be retrieved from npm: “Provenance” and “Publish”. The “Provenance” attestation contains the information Macaron seeks but is not signed, whereas the “Publish” attestation is signed. Comparing the signed and unsigned attestations at the subject level allows the unsigned one to be verified. See: https://docs.npmjs.com/generating-provenance-statements

Parameters:
  • purl (PackageURL) – The PURL of the analysis target.

  • registry (NPMRegistry) – The npm registry to use.

Returns:

The provenance payload(s), or an empty list if not found.

Return type:

list[InTotoPayload]

macaron.repo_finder.provenance_finder.verify_npm_provenance(purl, provenance)

Compare the unsigned payload subject digest with the signed payload digest, if available.

Parameters:
  • purl (PackageURL) – The PURL of the analysis target.

  • provenance (list[InTotoPayload]) – The provenances to verify.

Returns:

True if the provenance was verified, or False otherwise.

Return type:

bool
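The subject-level comparison can be sketched with in-toto statements held as plain dictionaries. This sketch assumes the signature on the “Publish” attestation has already been checked elsewhere; subject_digests and verify_npm_sketch are hypothetical helpers, not Macaron functions.

```python
def subject_digests(statement: dict) -> set[tuple[str, str, str]]:
    """Collect (name, algorithm, digest) triples from an in-toto statement's subjects."""
    return {
        (subject.get("name", ""), algorithm, value)
        for subject in statement.get("subject", [])
        for algorithm, value in subject.get("digest", {}).items()
    }


def verify_npm_sketch(provenance_statement: dict, publish_statement: dict) -> bool:
    """Accept the unsigned statement only if its subjects match the signed one's."""
    unsigned = subject_digests(provenance_statement)
    signed = subject_digests(publish_statement)
    # An empty subject list cannot vouch for anything, so treat it as unverified.
    return bool(unsigned) and unsigned == signed
```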

macaron.repo_finder.provenance_finder.find_gav_provenance(purl, registry)

Find and download the GAV-based provenance for the passed PURL.

Parameters:
  • purl (PackageURL) – The PURL of the analysis target.

  • registry (JFrogMavenRegistry) – The registry to use for finding.

Returns:

The provenance payload if found, or an empty list otherwise.

Return type:

list[InTotoPayload] | None

Raises:

ProvenanceAvailableException – If the discovered provenance file size exceeds the configured limit.

macaron.repo_finder.provenance_finder.find_provenance_from_ci(analyze_ctx, git_obj)

Try to find provenance from CI services of the repository.

Note that we stop going through the CI services once we encounter a CI service that does host provenance assets.

This method also loads the provenance payloads into the CIInfo object where the provenance assets are found.

Parameters:
  • analyze_ctx (AnalyzeContext) – The context of the ongoing analysis.

  • git_obj (Git | None) – The Pydriller Git object representing the repository, if any.

Returns:

The provenance payload, or None if not found.

Return type:

InTotoPayload | None

macaron.repo_finder.provenance_finder.download_provenances_from_github_actions_ci_service(ci_info)

Download provenances from GitHub Actions.

Parameters:

ci_info (CIInfo) – A CIInfo instance that holds a GitHub Actions git service object.

Return type:

None

macaron.repo_finder.repo_finder module

This module contains the logic for using/calling the different repo finders.

Input

The entry point of the repo finder depends on the type of PURL being analyzed:

  • If the PURL represents an artifact, call the find_repo function in this file.

  • If the PURL represents a repository, call the to_repo_path function in this file.

Artifact PURLs

For artifact PURLs, the PURL type determines how the repositories are searched for. Currently, for Maven PURLs, SCM metadata is read from the matching POM retrieved from Maven Central (or another configured location).

For Python, .NET, Rust, and NodeJS type PURLs, Google’s Open Source Insights API is used to find the metadata.

In either case, any repository links are extracted from the metadata and then checked for validity via repo_validator::find_valid_repository_url, which accepts URLs that point to a GitHub repository or similar.

Repository PURLs

For repository PURLs, the type is checked against the configured valid domains, and accepted or rejected based on that data.

Result

If all goes well, a repository URL that matches the initial artifact or repository PURL will be returned for analysis.

macaron.repo_finder.repo_finder.find_repo(purl)

Retrieve the repository URL that matches the given PURL.

Parameters:

purl (PackageURL) – The parsed PURL to convert to the repository path.

Returns:

The repository URL found for the passed package.

Return type:

str

macaron.repo_finder.repo_finder.to_domain_from_known_purl_types(purl_type)

Return the git service domain from a known web-based purl type.

This method is used to handle cases where the purl type value is not the git domain but a pre-defined repo-based type in https://github.com/package-url/purl-spec/blob/master/PURL-TYPES.rst.

Note that this method will be updated when there are new pre-defined types as per the PURL specification.

Parameters:

purl_type (str) – The type field of the PURL.

Returns:

The git service domain corresponding to the purl type or None if the purl type is unknown.

Return type:

str | None

macaron.repo_finder.repo_finder.to_repo_path(purl, available_domains)

Return the repository URL for the passed PURL.

This method only supports converting a PURL with the following format:

pkg:<type>/<namespace>/<name>[…]

Where type is either:

  • A pre-defined repository-based PURL type as defined in https://github.com/package-url/purl-spec/blob/master/PURL-TYPES.rst, or

  • One of the supported git service domains (e.g. github.com) defined in available_domains.

The repository path is generated in the following format: https://<domain>/<namespace>/<name>, where <domain> is either the type itself or, for a pre-defined type, its corresponding git service domain.

Parameters:
  • purl (PackageURL) – The parsed PURL to convert to the repository path.

  • available_domains (list[str]) – The list of available domains.

Returns:

The URL to the repository which the PURL is referring to or None if we cannot convert it.

Return type:

str | None

macaron.repo_finder.repo_finder_base module

This module contains the base class for the repo finders.

class macaron.repo_finder.repo_finder_base.BaseRepoFinder

Bases: ABC

This abstract class is used to represent Repository Finders.

abstract find_repo(purl)

Attempt to retrieve a repository URL that matches the passed artifact.

Parameters:

purl (PackageURL) – The PURL of an artifact.

Returns:

The URL of the found repository.

Return type:

str
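A minimal sketch of the abstract-base pattern is shown below, with a hypothetical StaticRepoFinder that looks PURL strings up in a fixed table. Passing the PURL as a string and returning an empty string when nothing is found are assumptions of this sketch.

```python
from abc import ABC, abstractmethod


class BaseRepoFinderSketch(ABC):
    """Mirror of the documented abstract base: one abstract find_repo method."""

    @abstractmethod
    def find_repo(self, purl: str) -> str:
        """Return a repository URL for the artifact, or "" if none is found."""


class StaticRepoFinder(BaseRepoFinderSketch):
    """Hypothetical finder backed by a fixed PURL-to-URL table, e.g. for tests."""

    def __init__(self, table: dict[str, str]) -> None:
        self.table = table

    def find_repo(self, purl: str) -> str:
        return self.table.get(purl, "")
```

Because find_repo is abstract, BaseRepoFinderSketch itself cannot be instantiated; only concrete subclasses can.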

macaron.repo_finder.repo_finder_deps_dev module

This module contains the DepsDevRepoFinder class to be used for finding repositories using deps.dev.

class macaron.repo_finder.repo_finder_deps_dev.DepsDevType(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: StrEnum

The package manager types supported by deps.dev.

This enum should be updated based on updates to deps.dev.

MAVEN = 'maven'
PYPI = 'pypi'
NUGET = 'nuget'
CARGO = 'cargo'
NPM = 'npm'

class macaron.repo_finder.repo_finder_deps_dev.DepsDevRepoFinder

Bases: BaseRepoFinder

This class is used to find repositories using Google’s Open Source Insights, also known as deps.dev.

find_repo(purl)

Attempt to retrieve a repository URL that matches the passed artifact.

Parameters:

purl (PackageURL) – The PURL of an artifact.

Returns:

The URL of the found repository.

Return type:

str
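A deps.dev lookup could be sketched in two pure steps: building a version-lookup URL and picking the source repository link out of a parsed JSON response. The API base URL, the path layout, and the "links"/"SOURCE_REPO" response fields are assumptions to be checked against the deps.dev API documentation; no network request is made here.

```python
from urllib.parse import quote

# Assumed API base; check the deps.dev API documentation for the current version.
DEPS_DEV_API = "https://api.deps.dev/v3"


def deps_dev_version_url(system: str, name: str, version: str) -> str:
    """Build a version-lookup URL, percent-encoding names such as "group:artifact"."""
    encoded_name = quote(name, safe="")
    encoded_version = quote(version, safe="")
    return f"{DEPS_DEV_API}/systems/{system}/packages/{encoded_name}/versions/{encoded_version}"


def source_repo_from_response(response: dict) -> str:
    """Pick the SOURCE_REPO link from a parsed response body, or "" if absent."""
    for link in response.get("links", []):
        if link.get("label") == "SOURCE_REPO":
            return link.get("url", "")
    return ""
```

Encoding with safe="" matters because Maven names contain ":" and scoped npm names contain "@" and "/".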

macaron.repo_finder.repo_finder_java module

This module contains the JavaRepoFinder class to be used for finding Java repositories.

class macaron.repo_finder.repo_finder_java.JavaRepoFinder

Bases: BaseRepoFinder

This class is used to find Java repositories.

__init__()

Initialise the Java repository finder instance.

find_repo(purl)

Attempt to retrieve a repository URL that matches the passed artifact.

Parameters:

purl (PackageURL) – The PURL of an artifact.

Returns:

The URL of the found repository.

Return type:

str
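Reading SCM metadata from a POM, as described for Maven PURLs, might look like the sketch below. It assumes the POM uses the standard Maven 4.0.0 namespace and does not follow parent POMs; scm_urls_from_pom is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def scm_urls_from_pom(pom_text: str) -> list[str]:
    """Collect SCM URL candidates in the order url, connection, developerConnection."""
    root = ET.fromstring(pom_text)
    urls = []
    for tag in ("url", "connection", "developerConnection"):
        element = root.find(f"m:scm/m:{tag}", POM_NS)
        if element is None or not element.text:
            continue
        text = element.text.strip()
        # Connection strings look like "scm:git:https://...", so drop "scm:<type>:".
        if text.startswith("scm:"):
            parts = text.split(":", 2)
            text = parts[2] if len(parts) == 3 else ""
        if text:
            urls.append(text)
    return urls
```

The candidates would then still need validation, for example by a check like find_valid_repository_url.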

macaron.repo_finder.repo_validator module

This module exists to validate URLs in terms of their use as a repository that can be analyzed.

macaron.repo_finder.repo_validator.find_valid_repository_url(urls)

Find a valid URL from the provided URLs.

Parameters:

urls (Iterable[str]) – An Iterable object containing URLs.

Returns:

The first valid URL from the iterable, or an empty string if none can be found.

Return type:

str
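The "first valid URL, else empty string" contract can be sketched as follows. The ALLOWED_HOSTS set and the requirement for an <owner>/<repo> path are assumptions standing in for Macaron's configured checks, and find_valid_repository_url_sketch is hypothetical.

```python
from collections.abc import Iterable
from urllib.parse import urlsplit

# Assumed set of accepted git service hosts; Macaron reads its own configuration.
ALLOWED_HOSTS = {"github.com", "gitlab.com", "bitbucket.org"}


def find_valid_repository_url_sketch(urls: Iterable[str]) -> str:
    """Return the first URL on an allowed host with an <owner>/<repo> path, else ""."""
    for url in urls:
        parts = urlsplit(url.strip())
        segments = [segment for segment in parts.path.split("/") if segment]
        if (
            parts.scheme in {"http", "https"}
            and parts.hostname in ALLOWED_HOSTS
            and len(segments) >= 2
        ):
            # Normalise to https://<host>/<owner>/<repo> without a ".git" suffix.
            repo = segments[1].removesuffix(".git")
            return f"https://{parts.hostname}/{segments[0]}/{repo}"
    return ""
```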

macaron.repo_finder.repo_validator.resolve_redirects(parsed_url)

Resolve redirecting URLs by returning the location they point to.

Parameters:

parsed_url (ParseResult) – A parsed URL.

Returns:

The resolved redirect location, or None if none was found.

Return type:

str | None