macaron.slsa_analyzer.git_service package

The git_service package contains the supported git services for Macaron.

Submodules

macaron.slsa_analyzer.git_service.api_client module

The module provides API clients for VCS services, such as GitHub.

class macaron.slsa_analyzer.git_service.api_client.GitHubReleaseAsset(name: str, url: str, size_in_bytes: int, api_client: GhAPIClient)

Bases: NamedTuple

An asset published from a GitHub Release.

name: str

The asset name.

url: str

The URL to the asset.

size_in_bytes: int

The size of the asset, in bytes.

api_client: GhAPIClient

The GitHub API client.

download(dest)

Download the asset.

Parameters:

dest (str) – The local destination where the asset is downloaded to. Note that this must include the file name.

Returns:

True if the asset is downloaded successfully; False if not.

Return type:

bool
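The relationship between an asset and its API client can be sketched as below. This is a minimal illustration, not the Macaron implementation: ReleaseAsset, AssetDownloader, and FakeClient are hypothetical names, and the assumption that download delegates to the client's download_asset method is ours.

```python
from typing import NamedTuple, Protocol


class AssetDownloader(Protocol):
    """Hypothetical interface: anything exposing download_asset(url, path)."""

    def download_asset(self, url: str, download_path: str) -> bool: ...


class ReleaseAsset(NamedTuple):
    """Illustrative mirror of the documented fields; not the Macaron class."""

    name: str
    url: str
    size_in_bytes: int
    api_client: AssetDownloader

    def download(self, dest: str) -> bool:
        # Assumption: the asset hands its own URL to the client it carries.
        return self.api_client.download_asset(self.url, dest)


class FakeClient:
    """Test double that records requests instead of touching the network."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, str]] = []

    def download_asset(self, url: str, download_path: str) -> bool:
        self.calls.append((url, download_path))
        return True


client = FakeClient()
asset = ReleaseAsset("provenance.jsonl", "https://example.com/asset/1", 2048, client)
# dest includes the file name, as the parameter description requires.
ok = asset.download("/tmp/provenance.jsonl")
```

Carrying the client inside the tuple means a caller holding only the asset can download it without knowing which client produced it.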

class macaron.slsa_analyzer.git_service.api_client.BaseAPIClient

Bases: object

This is the base class for API clients.

get_latest_release(full_name)

Return the latest release for the repo.

Parameters:

full_name (str) – The full name of the repo.

Returns:

The latest release object in JSON format.

Return type:

dict

fetch_assets(release, ext='')

Return the release assets that match, or an empty list if none exist.

The extension is ignored if name is set.

Parameters:
  • release (dict) – The release object in JSON format.

  • ext (str) – The asset extension to find; this parameter is ignored if name is set.

Returns:

The list of matching release assets, or an empty list if none exist.

Return type:

list[dict]
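A minimal sketch of extension-based filtering, assuming the release object follows the GitHub REST API schema, where the payload carries an "assets" list and each asset has a "name" field. The helper name filter_assets_by_ext and the suffix-matching rule are assumptions made for illustration only.

```python
def filter_assets_by_ext(release: dict, ext: str = "") -> list[dict]:
    """Return the assets in a release payload whose name ends with ext."""
    assets = release.get("assets", [])
    # str.endswith("") is always True, so an empty ext keeps every asset.
    return [asset for asset in assets if asset.get("name", "").endswith(ext)]


release = {
    "tag_name": "v1.0.0",
    "assets": [
        {"name": "app.tar.gz", "size": 1024},
        {"name": "app.intoto.jsonl", "size": 512},
    ],
}
matching = filter_assets_by_ext(release, ext=".jsonl")
```

Returning an empty list instead of None lets callers iterate over the result without a separate check.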

download_asset(url, download_path)

Download the assets of the release that match the pattern (if specified).

Parameters:
  • url (str) – The release URL.

  • download_path (str) – The path to download assets.

Returns:

Returns True if successful and False otherwise.

Return type:

bool

get_file_link(full_name, commit_sha, file_path)

Return a hyperlink to the file.

Parameters:
  • full_name (str) – The full name of the repository.

  • commit_sha (str) – The sha checksum of the commit that file belongs to.

  • file_path (str) – The relative path of the file to the root dir of the repository.

Returns:

The hyperlink tag to the file.

Return type:

str

get_relative_path_of_workflow(workflow_name)

Return the relative path of the workflow from the root dir of the repo.

Parameters:

workflow_name (str) – The name of the CI configuration file.

Returns:

The relative path of the CI configuration file from the root dir of the repo.

Return type:

str

class macaron.slsa_analyzer.git_service.api_client.GhAPIClient(profile)

Bases: BaseAPIClient

This class acts as a client to use GitHub API.

See https://docs.github.com/en/rest for the GitHub API documentation.

__init__(profile)

Initialize GhAPIClient.

Parameters:

profile (dict) – The JSON object describing the profile to be included in each request by this client.

get_repo_workflow_data(full_name, workflow_name)

Query GitHub REST API for the information of a workflow.

The url would be in the following form: https://api.github.com/repos/{full_name}/actions/workflows/{workflow_name}

Parameters:
  • full_name (str) – The full name of the target repo in the form owner/repo.

  • workflow_name (str) – The full name of the workflow YAML file.

Returns:

The json query result or an empty dict if failed.

Return type:

dict

Examples

For example, with full_name set to owner/repo and workflow_name set to build.yml, this method performs a query to https://api.github.com/repos/owner/repo/actions/workflows/build.yml

get_workflow_runs(full_name, branch_name=None, created_after=None, page=1)

Query the GitHub REST API for the data of all workflow runs of a repository.

The url would be in the following form: https://api.github.com/repos/{full_name}/actions/runs?page={page}&branch={branch_name}&created=>={created_after}&per_page={MAX_ITEMS_NUM}

The branch_name and created_after parameters can be empty. MAX_ITEMS_NUM can be configured via defaults.ini.

Parameters:
  • full_name (str) – The full name of the target repo in the form owner/repo.

  • branch_name (str | None) – The name of the branch to look for workflow runs (e.g. master).

  • created_after (str | None) – Only look for workflow runs created after this date (e.g. 2022-03-11T16:44:40Z).

  • page (int) – The page number to query, since the target workflow run might be on a later page (each page holds at most 100 items).

Returns:

The json query result or an empty dict if failed.

Return type:

dict

Examples

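For example, with full_name set to owner/repo, branch_name set to master, created_after set to 2022-03-11T16:44:40Z, and page set to 1, this method performs a query to https://api.github.com/repos/owner/repo/actions/runs?page=1&branch=master&created=>=2022-03-11T16:44:40Z&per_page=100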
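How the optional filters might be assembled into the query string can be sketched as follows. This is an illustration under stated assumptions, not the Macaron code: the helper name workflow_runs_url is hypothetical, the page number is assumed to be sent as page=<n>, and the values are joined without percent-encoding so the result matches the URL form shown above.

```python
from typing import Optional


def workflow_runs_url(
    full_name: str,
    branch_name: Optional[str] = None,
    created_after: Optional[str] = None,
    page: int = 1,
    per_page: int = 100,
) -> str:
    """Build the workflow-runs URL, skipping filters that are empty."""
    params = [f"page={page}"]
    if branch_name:
        params.append(f"branch={branch_name}")
    if created_after:
        # ">=" keeps only runs created at or after the given timestamp.
        params.append(f"created=>={created_after}")
    params.append(f"per_page={per_page}")
    query = "&".join(params)
    return f"https://api.github.com/repos/{full_name}/actions/runs?{query}"


filtered = workflow_runs_url("owner/repo", "master", "2022-03-11T16:44:40Z")
unfiltered = workflow_runs_url("owner/repo")
```

Omitting an empty filter entirely, rather than sending branch= with no value, avoids relying on how the server treats blank parameters.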

get_workflow_run_jobs(full_name, run_id)

Query the GitHub REST API for the workflow run jobs.

The url would be in the following form: https://api.github.com/repos/{full_name}/actions/runs/{run_id}/jobs

Parameters:
  • full_name (str) – The full name of the target repo in the form owner/repo.

  • run_id (str) – The target workflow run ID.

Returns:

The json query result or an empty dict if failed.

Return type:

dict

Examples

A call to this method performs a query to https://api.github.com/repos/{full_name}/actions/runs/{run_id}/jobs, with full_name and run_id replaced by the given arguments.

get_workflow_run_for_date_time_range(full_name, datetime_range)

Query the GitHub REST API for the workflow run within a datetime range.

The url would be in the following form: https://api.github.com/repos/{full_name}/actions/runs?created={datetime_range}

Parameters:
  • full_name (str) – The full name of the target repo in the form owner/repo.

  • datetime_range (str) – The datetime range to query.

Returns:

The json query result or an empty dict if failed.

Return type:

dict

Examples

For example, with full_name set to owner/repo and datetime_range set to 2022-11-05T20:38:40..2022-11-05T20:38:58, this method performs a query to https://api.github.com/repos/owner/repo/actions/runs?created=2022-11-05T20:38:40..2022-11-05T20:38:58
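The datetime_range string uses two timestamps joined by "..". A small sketch of building such a string from datetime objects is shown below; the helper name make_datetime_range is hypothetical, and naive datetimes are assumed so that no timezone offset is appended.

```python
from datetime import datetime, timedelta


def make_datetime_range(start: datetime, end: datetime) -> str:
    """Join two timestamps with '..', at second precision."""
    first = start.isoformat(timespec="seconds")
    last = end.isoformat(timespec="seconds")
    return f"{first}..{last}"


start = datetime(2022, 11, 5, 20, 38, 40)
range_str = make_datetime_range(start, start + timedelta(seconds=18))
url = f"https://api.github.com/repos/owner/repo/actions/runs?created={range_str}"
```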

get_commit_data_from_hash(full_name, commit_hash)

Query the GitHub API for the data of a commit using the hash for that commit.

The url would be in the following form: https://api.github.com/repos/{full_name}/commits/{commit_hash}

Parameters:
  • full_name (str) – The full name of the repository in the format {owner/name}.

  • commit_hash (str) – The sha commit hash of the target commit.

Returns:

The json query result or an empty dict if failed.

Return type:

dict

Examples

The following call to this method will perform a query to: https://api.github.com/repos/owner/repo/commits/6dcb09b5b57875f334f61aebed695e2e4193db5e

gh_client.get_commit_data_from_hash(
    full_name="owner/repo",
    commit_hash="6dcb09b5b57875f334f61aebed695e2e4193db5e",
)

search(target, query)

Perform a search using GitHub REST API.

This query is at endpoint: api.github.com/search/{target}?{query}

Parameters:
  • target (str) – The search target.

  • query (str) – The query string.

Returns:

The json query result or an empty dict if failed.

Return type:

dict

Examples

The following call to this method will perform a query to: https://api.github.com/search/code?q=addClass+in:file+language:js+repo:jquery/jquery

gh_client.search(
    target="code",
    query="q=addClass+in:file+language:js+repo:jquery/jquery",
)

get(url)

Perform a GET request to the given URL.

Parameters:

url (str) – The url to send the GET request.

Returns:

The json query result or an empty dict if failed.

Return type:

dict

get_job_build_log(log_url)

Download and return the build log indicated at log_url.

Parameters:

log_url (str) – The link to get the build log from GitHub API.

Returns:

The entire build log as a string.

Return type:

str

get_repo_data(full_name)

Get the repo data using GitHub REST API.

The query is at endpoint: api.github.com/repos/{full_name}

Parameters:

full_name (str) – The full name of the repository in the format {owner/name}.

Returns:

The json query result or an empty dict if failed.

Return type:

dict

Examples

To get the repo data from repository apache/maven:

gh_client.get_repo_data("apache/maven")

get_file_link(full_name, commit_sha, file_path)

Return a GitHub hyperlink tag or just a link to the file.

The format for the link is https://github.com/{full_name}/blob/{commit_sha}/{file_path}. The path of the file is relative to the root dir of the repository. The commit sha must be in full form.

Parameters:
  • full_name (str) – The full name of the repository in the format {owner/name}.

  • commit_sha (str) – The sha checksum of the commit that file belongs to.

  • file_path (str) – The relative path of the file to the root dir of the repository.

Returns:

The hyperlink tag to the file.

Return type:

str

Examples

>>> api_client = GhAPIClient(profile={"headers": "", "query": []})
>>> api_client.get_file_link("owner/repo", "5aaaaa43caabbdbc26c254df8f3aaa7bb3f4ec01", ".travis_ci.yml")
'https://github.com/owner/repo/blob/5aaaaa43caabbdbc26c254df8f3aaa7bb3f4ec01/.travis_ci.yml'

get_relative_path_of_workflow(workflow_name)

Return the relative path of the workflow from the root dir of the repo.

Parameters:

workflow_name (str) – The name of the GitHub Actions workflow YAML file.

Returns:

The relative path of the workflow from the root dir of the repo.

Return type:

str

Examples

>>> api_client = GhAPIClient(profile={"headers": "", "query": []})
>>> api_client.get_relative_path_of_workflow("build.yaml")
'.github/workflows/build.yaml'

get_release_by_tag(full_name, tag)

Return the release of the passed tag.

Parameters:
  • full_name (str) – The full name of the repo.

  • tag (str) – The tag being analyzed.

Returns:

The release object in JSON format, or None if not found.

Return type:

dict | None

get_latest_release(full_name)

Return the latest release for the repo.

Parameters:

full_name (str) – The full name of the repo.

Returns:

The latest release object in JSON format. Schema: https://docs.github.com/en/rest/releases/releases?apiVersion=2022-11-28#get-the-latest-release.

Return type:

dict

fetch_assets(release, ext='')

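Return the release assets that match, or an empty sequence if none exist.

The extension is ignored if name is set.

Parameters:
  • release (dict) – The release object in JSON format.

  • ext (str) – The asset extension to find; this parameter is ignored if name is set.

Returns: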

A sequence of release assets.

Return type:

Sequence[AssetLocator]

download_asset(url, download_path)

Download the assets of the release that match the pattern (if specified).

Parameters:
  • url (str) – The release URL.

  • download_path (str) – The path to download assets.

Returns:

Returns True if successful and False otherwise.

Return type:

bool

macaron.slsa_analyzer.git_service.api_client.get_default_gh_client(access_token)

Return a GhAPIClient instance with default values.

Parameters:

access_token (str) – The GitHub personal access token.

Return type:

GhAPIClient

macaron.slsa_analyzer.git_service.base_git_service module

This module contains the BaseGitService class to be inherited by a git service.

class macaron.slsa_analyzer.git_service.base_git_service.BaseGitService(name)

Bases: object

This abstract class is used to implement git services.

__init__(name)

Initialize instance.

Parameters:

name (str) – The name of the git service.

abstract load_defaults()

Load the values for this git service from the ini configuration.

Return type:

None

load_hostname(section_name)

Load the hostname of the git service from the ini configuration section section_name.

The section may or may not be available in the configuration. In both cases, the method should not raise ConfigurationError.

Meanwhile, if the section is present but there is a schema violation (e.g. a key such as hostname is missing), this method will raise a ConfigurationError.

Parameters:

section_name (str) – The name of the git service section in the ini configuration file.

Returns:

The hostname. This can be None if the git service section is not found in the ini configuration file, meaning the user does not enable the corresponding git service.

Return type:

str | None

Raises:

ConfigurationError – If there is a schema violation in the git service section.
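The three outcomes described above (section absent, section valid, section present but missing its hostname key) can be sketched with the standard configparser module. This is an illustration only: the local ConfigurationError class stands in for the one named above, and the hostname value gitlab.com is an assumed example.

```python
import configparser
from typing import Optional


class ConfigurationError(Exception):
    """Local stand-in for the ConfigurationError named in the text."""


def load_hostname(config: configparser.ConfigParser, section_name: str) -> Optional[str]:
    """Return the section's hostname, or None when the section is absent."""
    if not config.has_section(section_name):
        # A missing section means the user has not enabled this service.
        return None
    hostname = config.get(section_name, "hostname", fallback=None)
    if not hostname:
        # The section exists but violates the expected schema.
        raise ConfigurationError(f"Missing 'hostname' in [{section_name}].")
    return hostname


config = configparser.ConfigParser()
config.read_string("""
[git_service.gitlab.publicly_hosted]
hostname = gitlab.com

[git_service.gitlab.self_hosted]
""")

public_host = load_hostname(config, "git_service.gitlab.publicly_hosted")
```

Here the self_hosted section is deliberately left without a hostname key, so loading it raises the error, while a section name that does not appear at all yields None.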

is_detected(url)

Check if the remote repo at the given url is hosted on this git service.

This check is done by checking the URL of the repo against the hostname of this git service.

Parameters:

url (str) – The url of the remote repo.

Returns:

True if the repo is indeed hosted on this git service.

Return type:

bool
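A hostname check of this kind can be sketched with urllib.parse. This is an assumption about one reasonable way to compare, not the Macaron implementation: the standalone function url_matches_hostname is hypothetical, and it only handles URLs that carry a scheme (an scp-style address such as git@host:owner/repo is not recognised).

```python
from typing import Optional
from urllib.parse import urlparse


def url_matches_hostname(url: str, hostname: Optional[str]) -> bool:
    """Return True when the URL's host equals the service hostname."""
    if hostname is None:
        # No hostname loaded: the service is not enabled.
        return False
    # Compare the parsed host rather than searching the raw string, so that
    # look-alike hosts such as github.com.evil.example do not match.
    return urlparse(url).hostname == hostname


on_github = url_matches_hostname("https://github.com/owner/repo", "github.com")
```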

abstract clone_repo(clone_dir, url)

Clone a repository.

Parameters:
  • clone_dir (str) – The name of the directory to clone into. This is equivalent to the <directory> argument of git clone.

  • url (str) – The url to the repository.

Raises:

CloneError – If there is an error cloning the repo.

Return type:

None

abstract check_out_repo(git_obj, branch, digest, offline_mode)

Check out the branch and commit of a repository specified by the user.

Parameters:
  • git_obj (Git) – The Git object for the repository to check out.

  • branch (str) – The branch to check out.

  • digest (str) – The sha of the commit to check out.

  • offline_mode (bool) – If true, no fetching is performed.

Returns:

The same Git object from the input.

Return type:

Git

Raises:

RepoError – If there is an error while checking out the specific branch or commit.

class macaron.slsa_analyzer.git_service.base_git_service.NoneGitService

Bases: BaseGitService

This class can be used to initialize an empty git service.

__init__()

Initialize instance.

load_defaults()

Load the values for this git service from the ini configuration.

In this particular case, since this class represents a None git service, we do nothing.

Return type:

None

is_detected(url)

Return True if the remote repo is using this git service.

Parameters:

url (str) – The url of the remote repo.

Returns:

True if this git service is detected else False.

Return type:

bool

clone_repo(_clone_dir, url)

Clone a repo.

In this particular case, since this class represents a None git service, we do nothing but raise a CloneError.

Raises:

CloneError – Always raised, since this method should not be used to clone any repository.

Return type:

None

check_out_repo(git_obj, branch, digest, offline_mode)

Check out the branch and commit of a repository specified by the user.

In this particular case, since this class represents a None git service, we do nothing but raise a RepoError.

Raises:

RepoError – Always raised, since this method should not be used to check out in any repository.

Return type:

Git

macaron.slsa_analyzer.git_service.bitbucket module

This module contains the spec for the BitBucket service.

class macaron.slsa_analyzer.git_service.bitbucket.BitBucket

Bases: BaseGitService

This class contains the spec of the BitBucket service.

__init__()

Initialize instance.

load_defaults()

Load the values for this git service from the ini configuration.

Return type:

None

clone_repo(_clone_dir, _url)

Clone a BitBucket repo.

Return type:

None

check_out_repo(git_obj, branch, digest, offline_mode)

Check out the branch and commit of a repository specified by the user.

Return type:

Git

macaron.slsa_analyzer.git_service.github module

This module contains the spec for the GitHub service.

class macaron.slsa_analyzer.git_service.github.GitHub

Bases: BaseGitService

This class contains the spec of the GitHub service.

__init__()

Initialize instance.

load_defaults()

Load the values for this git service from the ini configuration and environment variables.

Raises:

ConfigurationError – If there is an error loading the configuration.

Return type:

None

property api_client: GhAPIClient

Return the API client used for querying GitHub API.

This API is used to check if a GitHub repo can be cloned.

clone_repo(clone_dir, url)

Clone a GitHub repository.

Parameters:
  • clone_dir (str) – The name of the directory to clone into. This is equivalent to the <directory> argument of git clone.

  • url (str) – The url to the repository.

Raises:

CloneError – If there is an error cloning the repo.

Return type:

None

check_out_repo(git_obj, branch, digest, offline_mode)

Check out the branch and commit of a repository specified by the user.

Parameters:
  • git_obj (Git) – The Git object for the repository to check out.

  • branch (str) – The branch to check out.

  • digest (str) – The sha of the commit to check out.

  • offline_mode (bool) – If true, no fetching is performed.

Returns:

The same Git object from the input.

Return type:

Git

Raises:

RepoError – If there is an error while checking out the specific branch and digest.

macaron.slsa_analyzer.git_service.gitlab module

This module contains the spec for the GitLab service.

Note: We assume that only two GitLab services are supported: one called publicly_hosted and the other called self_hosted.

The corresponding access tokens are stored in the environment variables MCN_GITLAB_TOKEN and MCN_SELF_HOSTED_GITLAB_TOKEN, respectively.

The main reason is our assumption that Macaron is used as a container. Fixing static names for the environment variables makes it easier to propagate these variables into the container.

In the ini configuration file, settings for the publicly_hosted GitLab service are in the [git_service.gitlab.publicly_hosted] section; settings for the self_hosted GitLab service are in the [git_service.gitlab.self_hosted] section.

class macaron.slsa_analyzer.git_service.gitlab.GitLab(token_function)

Bases: BaseGitService

This class contains the spec of the GitLab service.

__init__(token_function)

Initialize instance.

Parameters:

token_function (Callable[[], str]) – A function that returns a token when called.
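One way to supply such a callable is a closure that reads an environment variable each time it is invoked. The sketch below is an assumption about how a token function could be written, not how Macaron wires it up; the helper name env_token_function is hypothetical, and an unset variable is assumed to map to an empty string.

```python
import os
from typing import Callable


def env_token_function(var_name: str) -> Callable[[], str]:
    """Return a zero-argument function that looks up var_name when called."""

    def get_token() -> str:
        # Read lazily so a token exported after construction is still seen.
        return os.environ.get(var_name, "")

    return get_token


public_token = env_token_function("MCN_GITLAB_TOKEN")
self_hosted_token = env_token_function("MCN_SELF_HOSTED_GITLAB_TOKEN")
```

Passing a function rather than the token itself defers the lookup until the token is actually needed.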

abstract load_defaults()

Load the .ini configuration.

Return type:

None

construct_clone_url(url)

Construct a clone URL for GitLab, with or without an access token.

Parameters:

url (str) – The URL of the repository to be cloned.

Returns:

The URL that is actually used for cloning, containing the access token if one is available. See GitLab documentation: https://docs.gitlab.com/ee/gitlab-basics/start-using-git.html#clone-using-a-token.

Return type:

str

Raises:

CloneError – If there is an error parsing the URL.
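Embedding a token in an HTTPS clone URL can be sketched as follows. This is a hedged illustration, not the Macaron code: the oauth2 user name is an assumed placeholder, restricting input to https URLs is our simplification, and CloneError here is a local stand-in for the exception named above.

```python
from urllib.parse import urlparse, urlunparse


class CloneError(Exception):
    """Local stand-in for the CloneError named in the text."""


def construct_clone_url(url: str, token: str = "") -> str:
    """Return url unchanged without a token, else embed the token in it."""
    parsed = urlparse(url)
    if parsed.scheme != "https" or not parsed.hostname:
        raise CloneError(f"Cannot parse a clone URL from {url!r}.")
    if not token:
        return url
    # Assumption: "oauth2" is used as the user-name part of the credentials.
    netloc = f"oauth2:{token}@{parsed.hostname}"
    if parsed.port:
        netloc += f":{parsed.port}"
    return urlunparse(parsed._replace(netloc=netloc))


with_token = construct_clone_url("https://gitlab.com/owner/repo.git", "abc123")
without_token = construct_clone_url("https://gitlab.com/owner/repo.git")
```

Because the resulting URL contains a secret, it should not be logged or left in the repository configuration after the clone finishes.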

clone_repo(clone_dir, url)

Clone a repository.

To clone a GitLab repository with an access token, we embed the access token in the https URL. See GitLab documentation: https://docs.gitlab.com/ee/gitlab-basics/start-using-git.html#clone-using-a-token.

If we clone using the https URL with the token embedded, this URL is stored as plain text in .git/config as the origin remote URL. Therefore, after a repository is cloned, the origin remote URL is reset to the original url (which does not have the embedded token).

Parameters:
  • clone_dir (str) – The name of the directory to clone into. This is equivalent to the <directory> argument of git clone.

  • url (str) – The url to the GitLab repository.

Raises:

CloneError – If there is an error cloning the repository.

Return type:

None

check_out_repo(git_obj, branch, digest, offline_mode)

Check out the branch and commit of a repository specified by the user.

For GitLab, this method sets the origin remote URL of the target repository to the token-embedded URL, if a token is available, before performing the checkout operation.

After the checkout operation finishes, the origin remote URL is set back again to ensure that no token-embedded URL remains.

Parameters:
  • git_obj (Git) – The Git object for the repository to check out.

  • branch (str) – The branch to check out.

  • digest (str) – The sha of the commit to check out.

  • offline_mode (bool) – If true, no fetching is performed.

Returns:

The same Git object from the input.

Return type:

Git

Raises:

RepoCheckOutError – If there is an error while checking out the specific branch and digest.

class macaron.slsa_analyzer.git_service.gitlab.SelfHostedGitLab

Bases: GitLab

The self-hosted GitLab instance.

__init__()

Initialize instance.

load_defaults()

Load the values for this git service from the ini configuration and environment variables.

In this case, the environment variable MCN_SELF_HOSTED_GITLAB_TOKEN holding the access token for the private GitLab service is expected.

Raises:

ConfigurationError – If there is an error loading the configuration.

Return type:

None

class macaron.slsa_analyzer.git_service.gitlab.PubliclyHostedGitLab

Bases: GitLab

The publicly-hosted GitLab instance.

__init__()

Initialize instance.

load_defaults()

Load the values for this git service from the ini configuration and environment variables.

In this case, the environment variable MCN_GITLAB_TOKEN holding the access token for the public GitLab service is optional.

Raises:

ConfigurationError – If there is an error loading the configuration.

Return type:

None

macaron.slsa_analyzer.git_service.local_repo_git_service module

This module contains the spec for the local repo git service.

class macaron.slsa_analyzer.git_service.local_repo_git_service.LocalRepoGitService

Bases: BaseGitService

This class contains the spec of the local repo git service.

__init__()

Initialize instance.

load_defaults()

Load the values for this git service from the ini configuration.

Return type:

None

clone_repo(_clone_dir, _url)

Cloning from a local repo git service is not supported.

Return type:

None

check_out_repo(git_obj, branch, digest, offline_mode)

Check out the branch and commit of a repository specified by the user.

Return type:

Git