macaron.slsa_analyzer package
Subpackages
- macaron.slsa_analyzer.asset package
- macaron.slsa_analyzer.build_tool package
- macaron.slsa_analyzer.checks package
- macaron.slsa_analyzer.ci_service package
- macaron.slsa_analyzer.git_service package
- macaron.slsa_analyzer.package_registry package
- macaron.slsa_analyzer.provenance package
- macaron.slsa_analyzer.specs package
Submodules
macaron.slsa_analyzer.analyze_context module
This module contains the Analyze Context class.
The AnalyzeContext is used to store the data of the repository being analyzed.
- class macaron.slsa_analyzer.analyze_context.ChecksOutputs
Bases:
TypedDict
Data computed at runtime by checks.
-
git_service:
BaseGitService
The git service information for the target software component.
-
repo_verification:
list
[RepositoryVerificationResult
] The repository verification info.
-
is_inferred_prov:
bool
True if we cannot find the provenance and Macaron needs to infer the provenance.
-
expectation:
Expectation
|None
The expectation to verify the provenance for the target software component.
-
package_registries:
list
[PackageRegistryInfo
] The package registries for the target software component.
-
provenance:
InTotoV01Payload
|InTotoV1Payload
|None
The provenance payload for the target software component.
- class macaron.slsa_analyzer.analyze_context.AnalyzeContext(component, macaron_path='', output_dir='')
Bases:
object
This class contains data of the current analyzed repository.
- __init__(component, macaron_path='', output_dir='')
Initialize instance.
- property component: Component
Return the object associated with a target software component.
This property contains the information about a software component, such as its corresponding repository and dependencies.
- Return type:
Component
- property dynamic_data: ChecksOutputs
Return the dynamic_data object that contains various intermediate representations.
This object is used to pass various models and intermediate representations from the backend in Macaron to checks. A check can also store intermediate results in this object to be used by checks that depend on it. However, please avoid adding arbitrary attributes to this object!
We recommend looking at the attributes in this object before writing a new check. Chances are that what you are trying to implement already exists and the results are available in the dynamic_data object.
- Return type:
ChecksOutputs
- property provenances: dict[str, list[InTotoV01Statement | InTotoV1Statement]]
Return the provenances data as a dictionary.
- Returns:
A dictionary in which each key is a CI service’s name and each value is the corresponding provenance payload.
- Return type:
dict[str, list[InTotoV01Statement | InTotoV1Statement]]
- property is_inferred_provenance: bool
Return True if the provenance for this repo is an inferred one.
- Return type:
bool
- update_req_status(req_name, status, feedback)
Update the status of a single requirement.
- bulk_update_req_status(req_list, status, feedback)
Update the status of all requirements in req_list.
- get_slsa_level_table()
Return filled ORM table storing the level for this component.
- Return type:
- get_check_summary()
Return the summary of all checks results for the target repository.
- Returns:
The mapping of the check result type and the related check results.
- Return type:
- macaron.slsa_analyzer.analyze_context.store_inferred_build_info_results(ctx, ci_info, ci_service, trigger_link, job_id=None, step_id=None, step_name=None, callee_node_type=None)
Store the data related to the build.
- Parameters:
ctx (AnalyzeContext) – The analyze context object.
ci_info (CIInfo) – The CI data representation.
ci_service (BaseCIService) – The CI service representation.
trigger_link (str) – The link to the CI workflow.
job_id (str | None) – The CI job ID.
step_id (str | None) – The CI step ID.
step_name (str | None) – The CI step name.
callee_node_type (str | None) – The callee node type in the call graph.
- Return type:
macaron.slsa_analyzer.analyzer module
This module handles cloning and analyzing a Git repo.
- class macaron.slsa_analyzer.analyzer.Analyzer(output_path, build_log_path)
Bases:
object
This class is used to analyze SLSA levels of a Git repo.
- __init__(output_path, build_log_path)
Initialize instance.
- run(user_config, sbom_path='', deps_depth=0, provenance_payload=None)
Run the analysis and write results to the output path.
This method handles the configuration file and writes the resulting HTML reports, including those for dependencies. The returned status code depends only on the analysis status of the main repo.
- Parameters:
user_config (dict) – The dictionary that contains the user config parsed from the yaml file.
sbom_path (str) – The path to the SBOM.
deps_depth (int) – The depth of dependency resolution. Default: 0.
provenance_payload (InToToPayload | None) – The provenance intoto payload for the main software component.
- Returns:
The return status code.
- Return type:
- generate_reports(report)
Generate the report of the analysis to all registered reporters.
- run_single(config, analysis, existing_records=None, provenance_payload=None)
Run the checks for a single repository target.
Please use Analyzer.run if you want to run the analysis for a config parsed from a user-provided YAML file.
- Parameters:
config (Configuration) – The configuration for running Macaron.
analysis (Analysis) – The current analysis instance.
existing_records (dict[str, Record] | None) – The mapping of existing records for which the analysis has already run successfully.
provenance_payload (InToToPayload | None) – The provenance intoto payload for the analyzed software component.
- Returns:
The record of the analysis for this repository.
- Return type:
- add_repository(branch_name, git_obj)
Create a repository instance for a target repository.
The repository instances are transient objects for SQLAlchemy, which may be added to the database ultimately.
- Parameters:
branch_name (str | None) – The name of the branch that we are analyzing. We need this because when the target repository is in a detached state, the current branch name cannot be determined.
git_obj (Git) – The pydriller Git object of the target repository.
- Returns:
The target repository or None if not found.
- Return type:
Repository | None
- class AnalysisTarget(parsed_purl: PackageURL | None, repo_path: str, branch: str, digest: str)
Bases:
NamedTuple
Contains the resolved details of a software component to be analyzed.
For repo_path, branch and digest, an empty string is used to indicate that they are not available. This is a temporary measure due to a current limitation of the Configuration class.
-
parsed_purl:
PackageURL
|None
The parsed PackageURL object from the PackageURL string of the software component. This field will be None if no PackageURL string is provided for this component.
- add_component(analysis, analysis_target, git_obj, existing_records=None, provenance_payload=None)
Add a software component if it does not exist in the DB already.
The component instances are transient objects for SQLAlchemy, which may be added to the database ultimately.
- Parameters:
analysis (Analysis) – The current analysis instance.
analysis_target (AnalysisTarget) – The target of this analysis.
git_obj (Git | None) – The pydriller.Git object of the repository.
existing_records (dict[str, Record] | None) – The mapping of existing records for which the analysis has already run successfully.
provenance_payload (InTotoVPayload | None) – The provenance intoto payload for the analyzed software component.
- Returns:
The software component.
- Return type:
- Raises:
PURLNotFoundError – No PURL is found for the component.
DuplicateCmpError – The component is already analyzed in the same session.
- static parse_purl(config)
Parse the PURL provided in the input.
- Parameters:
config (Configuration) – The target configuration that stores the user input values for the software component.
- Returns:
The parsed PURL, or None if one was not provided as input.
- Return type:
PackageURL | None
- Raises:
InvalidPURLError – If the PURL provided from the user is invalid.
- static to_analysis_target(config, available_domains, parsed_purl, provenance_repo_url=None, provenance_commit_digest=None)
Resolve the details of a software component from user input.
- Parameters:
config (Configuration) – The target configuration that stores the user input values for the software component.
available_domains (list[str]) – The list of supported git service host domains. This is used to convert a repo-based PURL to a repository path of the corresponding software component.
parsed_purl (PackageURL | None) – The PURL to use for the analysis target, or None if one has not been provided.
provenance_repo_url (str | None) – The repository URL extracted from provenance, or None if not found or no provenance.
provenance_commit_digest (str | None) – The commit extracted from provenance, or None if not found or no provenance.
- Returns:
The NamedTuple that contains the resolved details for the software component.
- Return type:
- Raises:
InvalidAnalysisTargetError – Raised if a valid Analysis Target cannot be created.
- exception macaron.slsa_analyzer.analyzer.DuplicateCmpError(*args, context=None, **kwargs)
Bases:
DuplicateError
This class is used for duplicated software component errors.
- __init__(*args, context=None, **kwargs)
Create a DuplicateCmpError instance.
- Parameters:
context (AnalyzeContext | None) – The context in which the exception is raised.
macaron.slsa_analyzer.database_store module
The database_store module contains the methods to store analysis results to the database.
- macaron.slsa_analyzer.database_store.store_analyze_context_to_db(analyze_ctx)
Store the content of an analyzed context into the database.
- Parameters:
analyze_ctx (AnalyzeContext) – The analyze context to store into the database.
- Return type:
macaron.slsa_analyzer.git_url module
This module provides methods to perform generic actions on Git URLs.
- macaron.slsa_analyzer.git_url.GIT_REPOS_DIR = 'git_repos'
The directory in the output dir to store all cloned repositories.
- macaron.slsa_analyzer.git_url.parse_git_branch_output(content)
Return the list of branch names from a string that has a format similar to the output of git branch --list.
- Parameters:
content (str) – The raw output as string from the git branch command.
- Returns:
The list of strings where each string is a branch element from the raw output.
- Return type:
Examples
>>> from pprint import pprint
>>> content = '''
... * (HEAD detached at 7fc81f8)
...   master
...   remotes/origin/HEAD -> origin/master
...   remotes/origin/master
...   remotes/origin/v2.dev
...   remotes/origin/v3.dev
... '''
>>> pprint(parse_git_branch_output(content))
['(HEAD detached at 7fc81f8)',
 'master',
 'remotes/origin/HEAD -> origin/master',
 'remotes/origin/master',
 'remotes/origin/v2.dev',
 'remotes/origin/v3.dev']
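The parsing illustrated by this example can be sketched in a few lines. This is a minimal sketch and not Macaron's implementation: the helper name parse_branch_lines is hypothetical, and it assumes that every non-empty line holds exactly one branch entry and that the current branch is marked with a leading "* ".

```python
def parse_branch_lines(content: str) -> list[str]:
    """Split `git branch`-style output into branch entries (hypothetical helper)."""
    branches: list[str] = []
    for line in content.splitlines():
        name = line.strip()
        if not name:
            # Skip blank lines, e.g. the ones around a triple-quoted string.
            continue
        if name.startswith("* "):
            # Assumption: "* " marks the currently checked-out entry.
            name = name[2:].strip()
        branches.append(name)
    return branches
```

Entries such as 'remotes/origin/HEAD -> origin/master' are kept verbatim, as in the example above.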
- macaron.slsa_analyzer.git_url.get_branches_containing_commit(git_obj, commit, remote='origin')
Get the branches from a remote that contain a specific commit.
The returned branch names will be in the form of <remote>/<branch_name>.
- Parameters:
- Returns:
The list of branches that contain the commit.
- Return type:
- macaron.slsa_analyzer.git_url.check_out_repo_target(git_obj, branch_name='', digest='', offline_mode=False)
Check out the branch and commit specified by the user.
This function assumes that a remote “origin” exists and checks out from that remote ONLY.
If offline_mode is False, this function will fetch new changes from origin remote. The fetching operation will prune and update all references (e.g. tags, branches) to make sure that the local repository is up-to-date with the repository specified by origin remote.
If offline_mode is True and neither branch_name nor a commit is provided, this function will not do anything and the HEAD commit will be analyzed. If there are uncommitted local changes, the HEAD commit will appear in the report but the repo with local changes will be analyzed. We leave it up to the user to decide whether to commit the changes or not.
If branch_name is provided and a commit is not provided, this function will check out that branch from origin remote (i.e. origin/<branch_name>).
If branch_name is not provided and a commit is provided, this function will check out the commit directly.
If both branch_name and a commit are provided, this function will check out the commit directly only if that commit exists in the branch origin/<branch_name>. If not, this function will return False.
For all scenarios:
- If the checkout fails (e.g. a branch or a commit doesn’t exist), this function will return False.
- This function will perform a force checkout: https://git-scm.com/docs/git-checkout#Documentation/git-checkout.txt---force
This function supports repositories which are cloned from existing remote repositories. Other scenarios are not covered (e.g. a newly initiated repository).
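The branch/commit rules above can be summarized as a small decision function. The sketch below is illustrative only: plan_checkout and the step strings are hypothetical, no Git command is run, and the case where neither a branch nor a commit is given in online mode is left open because the description does not cover it.

```python
def plan_checkout(branch_name: str = "", digest: str = "", offline_mode: bool = False) -> list[str]:
    """Return the ordered actions implied by the rules above (hypothetical helper)."""
    steps: list[str] = []
    if not offline_mode:
        # Online mode refreshes local references before any checkout.
        steps.append("fetch origin with prune")
    if branch_name and digest:
        # The commit is only checked out if it belongs to the branch.
        steps.append(f"verify {digest} is in origin/{branch_name}")
        steps.append(f"force checkout {digest}")
    elif branch_name:
        steps.append(f"force checkout origin/{branch_name}")
    elif digest:
        steps.append(f"force checkout {digest}")
    elif offline_mode:
        # Nothing to do: the current HEAD is analyzed as-is.
        steps.append("keep current HEAD")
    else:
        # Not described in the text above, so the sketch does not guess.
        steps.append("unspecified in this description")
    return steps
```

For example, plan_checkout("main") yields a fetch followed by a force checkout of origin/main.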
- Parameters:
git_obj (Git) – The pydriller.Git wrapper object of the target repository.
branch_name (str) – The name of the branch we want to checkout.
digest (str) – The hash of the commit that we want to checkout in the branch.
offline_mode (bool) – If True, this function will not perform any online operation (fetch, pull).
- Returns:
True if the checkout succeeds, else False.
- Return type:
bool
- macaron.slsa_analyzer.git_url.get_default_branch(git_obj)
Return the default branch name of the target repository.
This function does not perform any online operation. It depends on the existence of the remote reference origin/HEAD in the git repository. This remote reference will point to the default branch of the remote repository and it’s usually set when the repository is first cloned with git clone <url>. Therefore, this method will fail to obtain the default branch name if origin/HEAD is not available. An example of this case is when a repository is shallow-cloned from a non-default branch (e.g. git clone --depth=1 <url> -b some_branch).
- Parameters:
git_obj (Git) – The pydriller.Git wrapper object of the target repository.
- Returns:
The default branch name, or an empty string on error.
- Return type:
- macaron.slsa_analyzer.git_url.is_remote_repo(path_to_repo)
Verify if the given repository path is a remote path.
- macaron.slsa_analyzer.git_url.clone_remote_repo(clone_dir, url)
Clone the remote repository and return the git.Repo object for that repository.
If there is an existing non-empty clone_dir, Macaron assumes the repository has been cloned already and cancels the clone. This could happen when multiple runs of Macaron use the same <output_dir>, leading to Macaron potentially trying to clone a repository multiple times.
We use treeless partial clone to reduce clone time, by retrieving trees and blobs lazily. For more details, see the following:
- https://git-scm.com/docs/partial-clone
- https://git-scm.com/docs/git-rev-list
- https://github.blog/2020-12-21-get-up-to-speed-with-partial-clone-and-shallow-clone
- Parameters:
- Returns:
The git.Repo object of the repository, or None if the clone directory already exists.
- Return type:
git.Repo | None
- Raises:
CloneError – If the repository has not been cloned and the clone attempt fails.
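To make the two ideas above concrete (skipping an already populated clone_dir, and requesting a treeless partial clone), here is a minimal sketch. Both helper names are hypothetical and this is not Macaron's code; the only Git detail relied on is the documented --filter=tree:0 option of git clone. The command is only built, not executed.

```python
import os


def clone_dir_is_populated(clone_dir: str) -> bool:
    """Return True if clone_dir exists and is non-empty (hypothetical helper)."""
    return os.path.isdir(clone_dir) and len(os.listdir(clone_dir)) > 0


def build_treeless_clone_command(url: str, clone_dir: str) -> list[str]:
    """Build the argument list for a treeless partial clone (hypothetical helper)."""
    # --filter=tree:0 omits trees and blobs up front; Git fetches them lazily.
    return ["git", "clone", "--filter=tree:0", url, clone_dir]
```

A caller would check clone_dir_is_populated first and only run the command (for instance with subprocess.run) when it returns False.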
- macaron.slsa_analyzer.git_url.list_remote_references(arguments, repo)
Retrieve references from a remote repository using Git’s ls-remote.
- macaron.slsa_analyzer.git_url.resolve_local_path(start_dir, local_path)
Resolve the local path and check if it’s within a directory.
This method returns an empty string if there are errors with resolving local_path (e.g. non-existent dir, broken symlinks, etc.) or start_dir does not exist.
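One way to implement this kind of containment check is shown below as a minimal sketch. The function name resolve_within is hypothetical, and returning an empty string when the resolved path falls outside start_dir is an assumption made for the sketch rather than documented behaviour.

```python
import os


def resolve_within(start_dir: str, local_path: str) -> str:
    """Resolve local_path against start_dir, or return "" on any problem (hypothetical helper)."""
    if not os.path.isdir(start_dir):
        return ""
    base = os.path.realpath(start_dir)
    # realpath follows symlinks and collapses ".." segments.
    target = os.path.realpath(os.path.join(base, local_path))
    if not os.path.exists(target):
        # Covers missing directories and broken symlinks.
        return ""
    try:
        inside = os.path.commonpath([base, target]) == base
    except ValueError:
        # Raised on Windows when the paths are on different drives.
        return ""
    # Assumption: a path that escapes start_dir is treated as an error.
    return target if inside else ""
```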
- macaron.slsa_analyzer.git_url.get_repo_name_from_url(url)
Extract the repo name of the repository from the remote url.
- Parameters:
url (str) – The remote url of the repository.
- Returns:
The name of the repository or an empty string if errors.
- Return type:
Examples
>>> get_repo_name_from_url("https://github.com/owner/repo")
'repo'
- macaron.slsa_analyzer.git_url.get_repo_full_name_from_url(url)
Extract the full name of the repository from the remote url.
The full name is in the form <owner>/<name>. Note that this function assumes url is a remote url.
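As an illustration of the <owner>/<name> form, the following minimal sketch extracts it with urllib. The helper owner_and_name is hypothetical and not part of Macaron; it assumes an http(s) remote URL with exactly two path components, strips an optional .git suffix, and returns an empty string otherwise (that error convention is borrowed from the sibling functions in this module and is an assumption here).

```python
from urllib.parse import urlparse


def owner_and_name(url: str) -> str:
    """Return "<owner>/<name>" from a remote URL, or "" if it does not fit (hypothetical helper)."""
    path = urlparse(url).path.strip("/")
    if path.endswith(".git"):
        path = path[: -len(".git")]
    parts = path.split("/")
    # Assumption: exactly two non-empty components, owner and repository name.
    if len(parts) != 2 or not all(parts):
        return ""
    return f"{parts[0]}/{parts[1]}"
```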
- macaron.slsa_analyzer.git_url.get_repo_complete_name_from_url(url)
Return the complete name of the repo from a remote repo url.
The complete name will be in the form <git_host>/org/name.
- Parameters:
url (str) – The remote url of the target repository.
- Returns:
The unique path resolved from the remote path or an empty string if errors.
- Return type:
Examples
>>> from macaron.config.defaults import load_defaults
>>> load_defaults("")
True
>>> get_repo_complete_name_from_url("https://github.com/apache/maven")
'github.com/apache/maven'
- macaron.slsa_analyzer.git_url.get_remote_origin_of_local_repo(git_obj)
Get the origin remote of a repository.
Note that this origin remote can be either a remote url or a path to a local repo.
- Parameters:
git_obj (Git) – The pydriller.Git object of the repository.
- Returns:
The origin remote path, or an empty string on error.
- Return type:
- macaron.slsa_analyzer.git_url.clean_up_repo_path(repo_path)
Clean up the repo path.
This method returns the repo path after cleaning up.
- macaron.slsa_analyzer.git_url.get_remote_vcs_url(url, clean_up=True)
Verify if the given repository path is a valid vcs.
We support some of the patterns listed in https://git-scm.com/docs/git-clone#_git_urls.
- macaron.slsa_analyzer.git_url.clean_url(url)
Clean the passed url, removing extraneous prefixes and parsing it with urllib.
- Parameters:
url (str) – The path to a repository.
- Returns:
The parsed URL.
- Return type:
ParseResult
- macaron.slsa_analyzer.git_url.parse_remote_url(url, allowed_git_service_hostnames=None)
Verify if the given repository path is a valid vcs.
This method converts the url to a https:// url and returns a urllib.parse.ParseResult object to be consumed by Macaron. Note that the port number in the original url will be removed.
- Parameters:
- Returns:
The parse result of the url or None if errors.
- Return type:
urllib.parse.ParseResult | None
Examples
>>> parse_remote_url("ssh://git@github.com:7999/owner/org.git")
ParseResult(scheme='https', netloc='github.com', path='owner/org.git', params='', query='', fragment='')
- macaron.slsa_analyzer.git_url.get_allowed_git_service_hostnames(config)
Load allowed git service hostnames from ini configuration.
Some notes for future improvements:
The fact that this method is here is not ideal.
Q: Why do we need this method here in this git_url module in the first place?
A: A number of functions in this module also do “URL validation” as part of their logic. This requires loading in the allowed git service hostnames from the ini config.
Q: Why don’t we use the GIT_SERVICES list from macaron.slsa_analyzer.git_service instead of having this second place of loading git service configuration?
A: Referencing GIT_SERVICES in this module results in cyclic imports since the module where GIT_SERVICES is defined also references this module.
- macaron.slsa_analyzer.git_url.get_repo_dir_name(url, sanitize=True)
Return the repo directory name from a remote repo url.
The directory name will be in the form <git_host>/org/name. When sanitize is True (default), this method makes sure that git_host is a valid directory name:
- Contains only lowercase letters and numbers
- Only starts with lowercase letters or numbers
- Words are separated by _
- Parameters:
- Returns:
The unique path resolved from the remote path or an empty string if errors.
- Return type:
Examples
>>> get_repo_dir_name("https://github.com/apache/maven")
'github_com/apache/maven'
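The sanitization rules listed for git_host can be approximated with a single regular expression. This is a minimal sketch under stated assumptions, not Macaron's implementation: sanitize_host and repo_dir_name are hypothetical names, only http(s) URLs with an org/name path are handled, and every run of characters outside a-z and 0-9 is replaced by one underscore.

```python
import re
from urllib.parse import urlparse


def sanitize_host(host: str) -> str:
    """Lowercase the host and join its alphanumeric words with "_" (hypothetical helper)."""
    # Stripping "_" keeps the result starting with a letter or digit.
    return re.sub(r"[^a-z0-9]+", "_", host.lower()).strip("_")


def repo_dir_name(url: str, sanitize: bool = True) -> str:
    """Return "<git_host>/org/name" for a remote URL, or "" on error (hypothetical helper)."""
    parsed = urlparse(url)
    parts = parsed.path.strip("/").split("/")
    if not parsed.hostname or len(parts) != 2 or not all(parts):
        return ""
    host = sanitize_host(parsed.hostname) if sanitize else parsed.hostname
    return f"{host}/{parts[0]}/{parts[1]}"
```

With these assumptions, repo_dir_name("https://github.com/apache/maven") gives 'github_com/apache/maven', matching the example above.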
macaron.slsa_analyzer.levels module
This module contains classes that handle the analysis of each SLSA level.
- class macaron.slsa_analyzer.levels.SLSALevels(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)
Bases:
Enum
The enum for the SLSA level of each SLSA requirement.
See Also: https://slsa.dev/spec.
- LEVEL0 = 'SLSA Level 0'
- LEVEL1 = 'SLSA Level 1'
- LEVEL2 = 'SLSA Level 2'
- LEVEL3 = 'SLSA Level 3'
- LEVEL4 = 'SLSA Level 4'
macaron.slsa_analyzer.registry module
This module contains the Registry class for loading checks.
- class macaron.slsa_analyzer.registry.Registry
Bases:
object
This abstract class is used to store checks in Macaron.
- __init__()
Initialize the Registry instance.
- register(check)
Register the check.
This method will terminate the program if there is any error while registering the check.
- get_parents(check_id)
Return the ids of all direct parent checks.
- get_children(check_id)
Return the ids of all direct child checks.
- static get_reachable_nodes(node, get_successors)
Return the set that contains node and nodes that can be transitively reached from it.
This method obtains the successors of a node from get_successors. This get_successors function takes a node as input and returns a Collection of successors of that node.
- Parameters:
node (T) – The start node to find the transitive successors.
get_successors (Callable[[T], Iterable[T]]) – The function to obtain successors of every node.
- Returns:
Contains node and its transitive successors.
- Return type:
Iterable[T]
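The transitive traversal described here is a standard graph search. The following minimal sketch shows one way to write it with an explicit stack; reachable_nodes is a hypothetical stand-in and may differ from the actual method in traversal order and return type (a set is returned here).

```python
from collections.abc import Callable, Hashable, Iterable
from typing import TypeVar

T = TypeVar("T", bound=Hashable)


def reachable_nodes(node: T, get_successors: Callable[[T], Iterable[T]]) -> set[T]:
    """Return node plus everything transitively reachable from it (hypothetical helper)."""
    seen: set[T] = {node}
    stack = [node]
    while stack:
        current = stack.pop()
        for successor in get_successors(current):
            # The seen set also protects against cycles in the graph.
            if successor not in seen:
                seen.add(successor)
                stack.append(successor)
    return seen
```

Passing a function that returns parents yields all transitive parents; passing one that returns children yields all transitive children.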
- get_final_checks(ex_pats, in_pats)
Return a set of the check ids to run from the exclude and include glob patterns.
The exclude and include glob patterns are used to match against the id of registered checks.
Including a check would effectively include all transitive parents of that check. Excluding a check would effectively exclude all transitive children of that check.
The final list of checks to run would be the included checks minus the excluded checks.
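The include/exclude semantics above can be sketched as set arithmetic over two transitive closures. This is an illustration only: select_checks, the parents mapping and the check ids used in the note below are hypothetical, and glob matching is done with the standard fnmatch module, which may differ in detail from what Macaron uses.

```python
from fnmatch import fnmatchcase


def _closure(starts: set[str], edges: dict[str, list[str]]) -> set[str]:
    """Return starts plus every node transitively reachable through edges."""
    seen = set(starts)
    stack = list(starts)
    while stack:
        for nxt in edges.get(stack.pop(), []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def select_checks(parents: dict[str, list[str]], ex_pats: list[str], in_pats: list[str]) -> set[str]:
    """Pick check ids from glob patterns (hypothetical helper).

    parents maps every check id to the ids of its direct parents.
    """
    # Invert the parent relation to obtain the direct children of each check.
    children: dict[str, list[str]] = {}
    for child, direct_parents in parents.items():
        for parent in direct_parents:
            children.setdefault(parent, []).append(child)

    def matching(patterns: list[str]) -> set[str]:
        return {cid for cid in parents if any(fnmatchcase(cid, p) for p in patterns)}

    included = _closure(matching(in_pats), parents)   # pulls in transitive parents
    excluded = _closure(matching(ex_pats), children)  # pulls in transitive children
    return included - excluded
```

For a chain vcs -> build -> provenance, including only "provenance" selects all three checks, while including "*" and excluding "build" leaves only "vcs".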
- get_check_execution_order()
Get the execution order of checks.
This follows the topological order on the check graph.
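A topological order over a parent/child check graph can be computed with the standard library. The sketch below is a minimal illustration, assuming that parents must run before their children; execution_order and the check ids in the note are hypothetical, and Macaron may compute the order differently.

```python
from graphlib import TopologicalSorter


def execution_order(parents: dict[str, list[str]]) -> list[str]:
    """Return check ids so that every parent precedes its children (hypothetical helper)."""
    # TopologicalSorter takes a mapping of node -> predecessors,
    # which is exactly the check -> direct parents relation.
    return list(TopologicalSorter(parents).static_order())
```

For parents = {"vcs": [], "build": ["vcs"], "provenance": ["build"]}, "vcs" comes before "build", which comes before "provenance". A cycle in the graph raises graphlib.CycleError.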
- scan(target)
Run all checks on a target repo.
- Parameters:
target (AnalyzeContext) – The object containing processed data for the target repo.
skipped_checks (list[SkippedInfo]) – The list of skipped checks information.
- Returns:
The mapping between the check id and its result.
- Return type:
- prepare()
Prepare for the analysis.
Return False if any error prevents the analysis from starting.
- Returns:
True if there are no errors, else False.
- Return type:
bool
- static get_all_checks_mapping()
Return the dictionary that includes all registered checks.
macaron.slsa_analyzer.slsa_req module
This module contains the base classes for defining SLSA requirements.
- class macaron.slsa_analyzer.slsa_req.ReqName(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)
Bases:
Enum
Store the name of each requirement.
- VCS = 'Version controlled'
- VERIFIED_HISTORY = 'Verified history'
- RETAINED_INDEFINITELY = 'Retained indefinitely'
- TWO_PERSON_REVIEWED = 'Two-person reviewed'
- SCRIPTED_BUILD = 'Scripted Build'
- BUILD_SERVICE = 'Build service'
- BUILD_AS_CODE = 'Build as code'
- EPHEMERAL_ENVIRONMENT = 'Ephemeral environment'
- ISOLATED = 'Isolated'
- PARAMETERLESS = 'Parameterless'
- HERMETIC = 'Hermetic'
- REPRODUCIBLE = 'Reproducible'
- PROV_AVAILABLE = 'Provenance - Available'
- PROV_AUTH = 'Provenance - Authenticated'
- PROV_SERVICE_GEN = 'Provenance - Service generated'
- PROV_NON_FALSIFIABLE = 'Provenance - Non falsifiable'
- PROV_DEPENDENCIES_COMPLETE = 'Provenance - Dependencies complete'
- PROV_CONT_ARTI = 'Provenance content - Identifies artifacts'
- PROV_CONT_BUILDER = 'Provenance content - Identifies builder'
- PROV_CONT_BUILD_INS = 'Provenance content - Identifies build instructions'
- PROV_CONT_SOURCE = 'Provenance content - Identifies source code'
- PROV_CONT_ENTRY = 'Provenance content - Identifies entry point'
- PROV_CONT_BUILD_PARAMS = 'Provenance content - Includes all build parameters'
- PROV_CONT_TRANSITIVE_DEPS = 'Provenance content - Includes all transitive dependencies'
- PROV_CONT_REPRODUCIBLE_INFO = 'Provenance content - Includes reproducible info'
- PROV_CONT_META_DATA = 'Provenance content - Includes metadata'
- SECURITY = 'Security'
- ACCESS = 'Access'
- SUPERUSERS = 'Superusers'
- EXPECTATION = 'Provenance conforms with expectations'
- class macaron.slsa_analyzer.slsa_req.Category(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)
Bases:
Enum
The category each requirement belongs to.
- BUILD = 'Build'
Related to the build process.
- SOURCE = 'Source'
Related to the source control.
- PROVENANCE = 'Provenance'
Related to how the provenance is generated and consumed.
- PROVENANCE_CONTENT = 'Provenance content'
Related to the content of provenance.
- COMMON = 'Common requirements'
Related to common requirements for every trusted system involved in the supply chain.
- class macaron.slsa_analyzer.slsa_req.SLSAReq(name, desc, category, req_level)
Bases:
object
This class represents a SLSA requirement (e.g. Version Controlled).
- __init__(name, desc, category, req_level)
Initialize instance.
- Parameters:
name (str) – The name of the SLSA requirement.
desc (str) – The description of the SLSA requirement.
category (Category) – The category of the SLSA requirement.
req_level (SLSALevels) – The SLSA level that this requirement belongs to.
- class macaron.slsa_analyzer.slsa_req.SLSAReqStatus
Bases:
object
This class represents the status of a SLSA requirement.
- __init__()
Initialize instance.
- get_tuple()
Return the current feedback of a requirement.
- Return type:
- Returns:
is_addressed (bool) – Whether this SLSA req has been addressed from the analysis.
is_pass (bool) – True if the repository passes this requirement else False.
feedback (str) – The feedback from the analyzer for this requirement.
- macaron.slsa_analyzer.slsa_req.create_requirement_status_dict()
Create a dictionary containing a new, unfilled, SLSA requirement status object for each known SLSA requirement.
- Return type: