macaron.slsa_analyzer.package_registry package
This module defines the package registries.
Submodules
macaron.slsa_analyzer.package_registry.deps_dev module
This module contains the implementation of the deps.dev service.
- class macaron.slsa_analyzer.package_registry.deps_dev.DepsDevService
Bases:
object
The deps.dev service class.
- static get_package_info(purl)
Check if the package identified by the PackageURL (PURL) exists and return its information.
- Parameters:
purl (str) – The PackageURL (PURL).
- Returns:
The package metadata or None if it doesn’t exist.
- Return type:
dict | None
- Raises:
APIAccessError – If the service is misconfigured, the API is invalid, a network error happens, or an unexpected response is returned by the API.
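As a rough illustration of how a PURL might be turned into a lookup request, the sketch below percent-encodes the PURL into a single path segment. The base URL is an assumption (the public deps.dev v3alpha PURL lookup path) and the helper name is hypothetical; Macaron reads its actual endpoint from configuration.

```python
from urllib.parse import quote

# Assumption: public deps.dev host and v3alpha PURL lookup path.
DEPS_DEV_PURL_BASE = "https://api.deps.dev/v3alpha/purl"


def build_purl_lookup_url(purl: str) -> str:
    """Percent-encode the whole PURL so it fits in one URL path segment."""
    # safe="" forces ":", "/" and "@" to be encoded as well.
    return f"{DEPS_DEV_PURL_BASE}/{quote(purl, safe='')}"


print(build_purl_lookup_url("pkg:pypi/requests@2.31.0"))
```

Encoding with an empty safe set matters here because a PURL contains slashes that would otherwise be read as path separators.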
macaron.slsa_analyzer.package_registry.jfrog_maven_registry module
Assets on a package registry.
- class macaron.slsa_analyzer.package_registry.jfrog_maven_registry.JFrogMavenAsset(name: str, group_id: str, artifact_id: str, version: str, metadata: JFrogMavenAssetMetadata, jfrog_maven_registry: JFrogMavenRegistry)
Bases:
NamedTuple
An asset hosted on a JFrog Artifactory repository with Maven layout.
- metadata: JFrogMavenAssetMetadata
The metadata of the JFrog Maven asset.
- jfrog_maven_registry: JFrogMavenRegistry
The JFrog repo that acts as a package registry following the Maven layout.
- class macaron.slsa_analyzer.package_registry.jfrog_maven_registry.JFrogMavenAssetMetadata(size_in_bytes: int, sha256_digest: str, download_uri: str)
Bases:
NamedTuple
Metadata of an asset on a JFrog Maven registry.
- class macaron.slsa_analyzer.package_registry.jfrog_maven_registry.JFrogMavenRegistry(hostname=None, repo=None, request_timeout=None, download_timeout=None, enabled=None)
Bases:
PackageRegistry
A JFrog Artifactory repository that acts as a package registry with Maven layout.
For more details on JFrog Artifactory repository, see: https://jfrog.com/help/r/jfrog-artifactory-documentation/repository-management
- __init__(hostname=None, repo=None, request_timeout=None, download_timeout=None, enabled=None)
Instantiate a JFrogMavenRegistry object.
- Parameters:
hostname (str) – The hostname of the JFrog instance.
repo (str | None) – The Artifactory repository with Maven layout on the JFrog instance.
request_timeout (int | None) – The timeout (in seconds) for regular requests made to the package registry.
download_timeout (int | None) – The timeout (in seconds) for downloading files from the package registry.
enabled (bool | None) – Whether the package registry should be active in the analysis or not. “Not active” means no target repo/software component can be matched against this package registry.
- load_defaults()
Load the .ini configuration for the current package registry.
- Raises:
ConfigurationError – If there is a schema violation in the package_registry.jfrog.maven section.
- Return type:
- fetch_artifact_ids(group_id)
Get all artifact ids under a group id.
This is done by fetching all children folders under the group folder on the registry.
- construct_folder_info_url(folder_path)
Construct a URL for the JFrog Folder Info API.
Documentation: https://jfrog.com/help/r/jfrog-rest-apis/folder-info.
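A minimal sketch of how such a URL could be assembled, assuming an https scheme and the /api/storage/ path of the Folder Info API. The function names and the hostname in the usage line are hypothetical; the group-id-to-folder mapping follows the standard Maven layout.

```python
def group_id_to_folder_path(group_id: str) -> str:
    """Map a Maven group id to its folder path under the Maven layout."""
    return group_id.replace(".", "/")


def folder_info_url(hostname: str, repo: str, folder_path: str) -> str:
    """Build a Folder Info URL (assumed shape: /api/storage/<repo>/<path>)."""
    return f"https://{hostname}/api/storage/{repo}/{folder_path.strip('/')}"


# Hypothetical host and repository names, for illustration only.
print(folder_info_url("artifactory.example.com", "libs-release",
                      group_id_to_folder_path("com.example.tools")))
```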
- construct_file_info_url(file_path)
Construct a URL for the JFrog File Info API.
Documentation: https://jfrog.com/help/r/jfrog-rest-apis/file-info.
- construct_latest_version_url(group_id, artifact_id)
Construct a URL for the JFrog Latest Version Search API.
The response payload includes the latest version of the package with the given group id and artifact id. Documentation: https://jfrog.com/help/r/jfrog-rest-apis/artifact-latest-version-search-based-on-layout.
- fetch_latest_version(group_id, artifact_id)
Fetch the latest version of a Java package on this JFrog Maven registry.
- fetch_asset_names(group_id, artifact_id, version, extensions=None)
Retrieve the metadata of assets published for a version of a Maven package.
- Parameters:
group_id (str) – The group id of the Maven package.
artifact_id (str) – The artifact id of the Maven package.
version (str) – The version of the Maven package.
extensions (set[str] | None) – The set of asset extensions. Only assets with names ending in these extensions are fetched. If this is None, then all assets are returned regardless of their extensions.
- Returns:
The list of asset names.
- Return type:
- extract_folder_names_from_folder_info_payload(folder_info_payload)
Extract a list of folder names from the Folder Info payload of a Maven group folder.
- extract_file_names_from_folder_info_payload(folder_info_payload, extensions=None)
Extract file names from the Folder Info response payload.
For the schema of this payload and other details regarding the API, see: https://jfrog.com/help/r/jfrog-rest-apis/folder-info.
Note: Currently, we do not validate the schema of the payload. Instead, we read only the parts of it that we can recognise.
- Parameters:
- Returns:
The list of filenames in the folder, extracted from the payload.
- Return type:
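The lenient reading described in the note can be sketched as follows. This is not Macaron's code: it assumes the Folder Info payload carries a "children" list whose entries have a "uri" string and a "folder" flag, and it silently skips anything that does not match.

```python
from __future__ import annotations

import json


def extract_file_names(folder_info_payload: str,
                       extensions: set[str] | None = None) -> list[str]:
    """Read file names from a Folder Info payload, skipping unknown shapes."""
    try:
        payload = json.loads(folder_info_payload)
    except json.JSONDecodeError:
        return []
    children = payload.get("children") if isinstance(payload, dict) else None
    if not isinstance(children, list):
        return []

    names = []
    for child in children:
        # Ignore malformed entries and sub-folders.
        if not isinstance(child, dict) or child.get("folder"):
            continue
        uri = child.get("uri")
        if not isinstance(uri, str):
            continue
        name = uri.lstrip("/")
        if extensions is None or any(name.endswith(ext) for ext in extensions):
            names.append(name)
    return names
```

Returning an empty list on unrecognised input keeps callers simple, at the cost of hiding malformed responses; a stricter variant could raise instead.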
- fetch_asset_metadata(group_id, artifact_id, version, asset_name)
Fetch an asset’s metadata from JFrog.
- Parameters:
- Returns:
The asset’s metadata, or None if the metadata cannot be retrieved.
- Return type:
JFrogMavenAssetMetadata | None
- extract_asset_metadata_from_file_info_payload(file_info_payload)
Extract the metadata of an asset from the File Info request payload.
Documentation: https://jfrog.com/help/r/jfrog-rest-apis/file-info.
- Parameters:
file_info_payload (str) – The File Info request payload used to extract the metadata of an asset.
- Returns:
The asset’s metadata, or None if the metadata cannot be retrieved.
- Return type:
JFrogMavenAssetMetadata | None
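For illustration, the extraction could look like the sketch below. It assumes the File Info payload exposes "size" (as a numeric string), "checksums.sha256" and "downloadUri"; the AssetMetadata tuple is a stand-in that mirrors the three fields of JFrogMavenAssetMetadata.

```python
from __future__ import annotations

import json
from typing import NamedTuple


class AssetMetadata(NamedTuple):
    """Stand-in mirroring the fields of JFrogMavenAssetMetadata."""

    size_in_bytes: int
    sha256_digest: str
    download_uri: str


def extract_asset_metadata(file_info_payload: str) -> AssetMetadata | None:
    """Return the asset metadata, or None if any expected field is unusable."""
    try:
        payload = json.loads(file_info_payload)
    except json.JSONDecodeError:
        return None
    if not isinstance(payload, dict):
        return None

    checksums = payload.get("checksums")
    sha256 = checksums.get("sha256") if isinstance(checksums, dict) else None
    download_uri = payload.get("downloadUri")
    try:
        size = int(payload.get("size"))
    except (TypeError, ValueError):
        return None
    if not isinstance(sha256, str) or not isinstance(download_uri, str):
        return None
    return AssetMetadata(size, sha256, download_uri)
```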
- fetch_assets(group_id, artifact_id, version, extensions=None)
Fetch the assets of a Maven package.
- Parameters:
- Returns:
The list of assets of the package.
- Return type:
- construct_asset_url(group_id, artifact_id, version, asset_name)
Get the URL to download an asset.
- Parameters:
- Returns:
The URL to the asset, which can be used for downloading the asset.
- Return type:
- download_asset(url, dest)
Download an asset from the given URL to a given location.
- find_publish_timestamp(purl)
Make a search request to Maven Central to find the publishing timestamp of an artifact.
The reason for directly fetching timestamps from Maven Central is that deps.dev occasionally misses timestamps for Maven artifacts, making it unreliable for this purpose.
For the search API syntax, see: https://central.sonatype.org/search/rest-api-guide/
- Parameters:
purl (str) – The Package URL (purl) of the package whose publication timestamp is to be retrieved. This should conform to the PURL specification.
- Returns:
A timezone-aware datetime object representing the publication timestamp of the specified package.
- Return type:
datetime
- Raises:
InvalidHTTPResponseError – If the URL construction fails, the HTTP response is invalid, or if the response cannot be parsed correctly, or if the expected timestamp is missing or invalid.
NotImplementedError – If not implemented for a registry.
macaron.slsa_analyzer.package_registry.maven_central_registry module
The module provides abstractions for the Maven Central package registry.
- macaron.slsa_analyzer.package_registry.maven_central_registry.same_organization(group_id_1, group_id_2)
Check if two Maven group ids are from the same organization.
Note: It is assumed that for recognized source platforms, the top level domain doesn’t change the organization. I.e., io.github.foo and com.github.foo are assumed to be from the same organization.
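One way to read the rule above is sketched below. The set of recognized platforms and the exact comparison steps are assumptions made for illustration, not the library's actual logic.

```python
# Assumption: an illustrative set of source-platform labels.
RECOGNIZED_PLATFORMS = {"github", "gitlab", "bitbucket"}


def same_organization_sketch(group_id_1: str, group_id_2: str) -> bool:
    """Compare two Maven group ids, ignoring the TLD on known platforms."""
    if group_id_1 == group_id_2:
        return True
    parts_1 = group_id_1.split(".")
    parts_2 = group_id_2.split(".")
    shortest = min(len(parts_1), len(parts_2))
    if shortest < 2:
        return False
    # io.github.foo vs com.github.foo: same platform, so compare the account.
    if parts_1[1] == parts_2[1] and parts_1[1] in RECOGNIZED_PLATFORMS:
        return shortest >= 3 and parts_1[2] == parts_2[2]
    # Otherwise the first two components must match, e.g. org.apache.*.
    return parts_1[:2] == parts_2[:2]


print(same_organization_sketch("io.github.foo", "com.github.foo"))
```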
- class macaron.slsa_analyzer.package_registry.maven_central_registry.MavenCentralRegistry(search_netloc=None, search_scheme=None, search_endpoint=None, registry_url_netloc=None, registry_url_scheme=None, request_timeout=None)
Bases:
PackageRegistry
This class implements a Maven Central package registry.
- __init__(search_netloc=None, search_scheme=None, search_endpoint=None, registry_url_netloc=None, registry_url_scheme=None, request_timeout=None)
Initialize a Maven Central Registry instance.
- Parameters:
search_netloc (str | None) – The netloc of the Maven Central search URL.
search_scheme (str | None) – The scheme of the Maven Central search URL.
search_endpoint (str | None) – The search REST API to find artifacts.
registry_url_netloc (str | None) – The netloc of the Maven Central registry url.
registry_url_scheme (str | None) – The scheme of the Maven Central registry url.
request_timeout (int | None) – The timeout (in seconds) for requests made to the package registry.
- load_defaults()
Load the .ini configuration for the current package registry.
- Raises:
ConfigurationError – If there is a schema violation in the maven_central section.
- Return type:
- find_publish_timestamp(purl)
Make a search request to Maven Central to find the publishing timestamp of an artifact.
The reason for directly fetching timestamps from Maven Central is that deps.dev occasionally misses timestamps for Maven artifacts, making it unreliable for this purpose.
For the search API syntax, see: https://central.sonatype.org/search/rest-api-guide/
- Parameters:
purl (str) – The Package URL (purl) of the package whose publication timestamp is to be retrieved. This should conform to the PURL specification.
- Returns:
A timezone-aware datetime object representing the publication timestamp of the specified package.
- Return type:
datetime
- Raises:
InvalidHTTPResponseError – If the URL construction fails, the HTTP response is invalid, or if the response cannot be parsed correctly, or if the expected timestamp is missing or invalid.
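The lookup can be sketched in two steps: build the search query, then convert the returned timestamp. This sketch assumes the search response nests results under response.docs and reports "timestamp" in milliseconds since the Unix epoch; the function names are hypothetical and ValueError stands in for InvalidHTTPResponseError.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def build_search_query(group_id: str, artifact_id: str, version: str) -> str:
    """Build the query string for a group/artifact/version (GAV) search."""
    query = f"g:{group_id} AND a:{artifact_id} AND v:{version}"
    return urlencode({"q": query, "core": "gav", "rows": 1, "wt": "json"})


def timestamp_from_search_response(response: dict) -> datetime:
    """Convert the first document's millisecond timestamp to an aware datetime."""
    docs = response.get("response", {}).get("docs", [])
    if not docs or not isinstance(docs[0].get("timestamp"), int):
        raise ValueError("The search response contains no usable timestamp.")
    return datetime.fromtimestamp(docs[0]["timestamp"] / 1000, tz=timezone.utc)
```

Passing tz=timezone.utc is what makes the result timezone-aware, matching the documented return value.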
macaron.slsa_analyzer.package_registry.npm_registry module
The module provides abstractions for the npm package registry.
- class macaron.slsa_analyzer.package_registry.npm_registry.NPMRegistry(hostname=None, attestation_endpoint=None, request_timeout=None, enabled=True)
Bases:
PackageRegistry
This class implements the npm package registry.
There is no complete and up-to-date API documentation for the npm registry and the endpoints are discovered by manual inspection of links on https://www.npmjs.com.
- __init__(hostname=None, attestation_endpoint=None, request_timeout=None, enabled=True)
Initialize the npm Registry instance.
- Parameters:
- load_defaults()
Load the .ini configuration for the current package registry.
- Raises:
ConfigurationError – If there is a schema violation in the npm registry section.
- Return type:
- download_attestation_payload(url, download_path)
Download the npm attestation from npm registry.
Each npm package can have the following types of attestations:
publish with “https://github.com/npm/attestation/tree/main/specs/publish/v0.1” predicateType
SLSA with “https://slsa.dev/provenance/v0.2” predicateType
SLSA with “https://slsa.dev/provenance/v1” predicateType
We download the unsigned SLSA provenance v0.2 or v1 in this method, and the signed npm type.
An example SLSA v0.2 provenance: https://registry.npmjs.org/-/npm/v1/attestations/@sigstore/mock@0.1.0
An example SLSA v1 provenance: https://registry.npmjs.org/-/npm/v1/attestations/@sigstore/mock@0.6.3
- Parameters:
- Returns:
True if the asset is downloaded successfully; False if not.
- Return type:
- Raises:
InvalidHTTPResponseError – If the HTTP request to the registry fails or an unexpected response is returned.
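The sketch below shows how an attestation URL like the examples above could be formed and how the SLSA entries might be picked out by predicateType. The payload shape (a top-level "attestations" list whose entries carry "predicateType") is an assumption, and both helper names are hypothetical.

```python
from __future__ import annotations

SLSA_PREDICATE_TYPES = (
    "https://slsa.dev/provenance/v0.2",
    "https://slsa.dev/provenance/v1",
)


def attestation_url(hostname: str, endpoint: str, namespace: str | None,
                    name: str, version: str) -> str:
    """Build the attestation URL for an optionally scoped npm package."""
    package = f"{namespace}/{name}" if namespace else name
    return f"https://{hostname}/{endpoint}/{package}@{version}"


def select_slsa_attestations(payload: dict) -> list[dict]:
    """Keep only the entries whose predicateType is a SLSA provenance type."""
    return [
        item for item in payload.get("attestations", [])
        if isinstance(item, dict)
        and item.get("predicateType") in SLSA_PREDICATE_TYPES
    ]
```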
- get_latest_version(namespace, name)
Try to retrieve the latest version of a package from the registry.
- class macaron.slsa_analyzer.package_registry.npm_registry.NPMAttestationAsset(namespace: str | None, artifact_id: str, version: str, npm_registry: NPMRegistry, size_in_bytes: int)
Bases:
NamedTuple
An attestation asset hosted on the npm registry.
The API Documentation can be found here:
- namespace: str | None
The optional scope of a package on npm, which is used as the namespace in a PURL string. See https://docs.npmjs.com/cli/v10/using-npm/scope to know about npm scopes. See https://github.com/package-url/purl-spec/blob/master/PURL-TYPES.rst#npm for the namespace in an npm PURL string.
- npm_registry: NPMRegistry
The npm registry.
- size_in_bytes: int
The size of the asset (in bytes). This attribute is added to match the AssetLocator protocol and is not used because the npm registry API does not provide it.
macaron.slsa_analyzer.package_registry.osv_dev module
This module contains the implementation of the osv.dev service.
- class macaron.slsa_analyzer.package_registry.osv_dev.OSVDevService
Bases:
object
The osv.dev service class.
- static get_vulnerabilities_purl(purl)
Retrieve vulnerabilities associated with a specific package URL (PURL) by querying the OSV API.
This method calls the OSV query API with the provided package URL (PURL) to fetch any known vulnerabilities associated with that package.
- Parameters:
purl (str) – A string representing the Package URL (PURL) of the package to query for vulnerabilities.
- Returns:
A list of vulnerabilities under the key “vulns” if any vulnerabilities are found for the provided package.
- Return type:
- Raises:
APIAccessError – If there are issues with the API URL construction, missing configuration values, or invalid responses.
- static get_vulnerabilities_package_name(ecosystem, name)
Retrieve vulnerabilities associated with a specific package name and ecosystem by querying the OSV API.
This method calls the OSV query API with the provided ecosystem and package name to fetch any known vulnerabilities associated with that package.
- Parameters:
- Returns:
A list of vulnerabilities under the key “vulns” if any vulnerabilities are found for the provided ecosystem and package name.
- Return type:
- Raises:
APIAccessError – If there are issues with the API URL construction, missing configuration values, or invalid responses.
- static get_vulnerabilities_package_name_batch(packages)
Retrieve vulnerabilities for a batch of packages based on their ecosystem and name.
This method constructs a batch query to the OSV API to check for vulnerabilities in multiple packages by querying the ecosystem and package name. It processes the results while preserving the order of the input packages. If a package has associated vulnerabilities, it is included in the returned list.
- Parameters:
packages (list) – A list of dictionaries, where each dictionary represents a package with the keys:
“ecosystem” (str): The package’s ecosystem (e.g., “GitHub Actions”, “npm”).
“name” (str): The name of the package.
- Returns:
A list of packages from the input packages list that have associated vulnerabilities. The order of the returned packages matches the order of the input.
- Return type:
- Raises:
APIAccessError – If there is an issue with querying the OSV API or if the results do not match the expected size.
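The order-preserving behaviour can be sketched without any network access. The request body follows the OSV batch query format ("queries" with a "package" object per entry); the helper names are hypothetical and ValueError stands in for APIAccessError.

```python
def build_batch_query(packages: list) -> dict:
    """Build one OSV batch query body from ecosystem/name pairs."""
    return {
        "queries": [
            {"package": {"ecosystem": pkg["ecosystem"], "name": pkg["name"]}}
            for pkg in packages
        ]
    }


def vulnerable_packages(packages: list, results: list) -> list:
    """Return the input packages whose result at the same index lists vulns."""
    if len(results) != len(packages):
        raise ValueError("The number of results does not match the query size.")
    # zip keeps input order, so the output order follows the input order.
    return [pkg for pkg, result in zip(packages, results) if result.get("vulns")]
```

Because results are matched to packages purely by position, the size check has to run before any pairing is done.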
- static get_osv_url(endpoint)
Construct a full API URL for a given OSV endpoint using values from the .ini configuration.
The configuration is expected to be in a section named [osv_dev] within the defaults object, and must include the following keys:
url_netloc: The base domain of the API.
url_scheme (optional): The scheme (e.g., “https”). Defaults to “https” if not provided.
A key matching the provided endpoint argument (e.g., “query_endpoint”), which defines the URL path.
- Parameters:
endpoint (str) – The key name of the endpoint in the [osv_dev] section to construct the URL path.
- Returns:
The fully constructed API URL.
- Return type:
- Raises:
APIAccessError – If required keys are missing from the configuration or if the URL cannot be constructed.
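A minimal sketch of this lookup using the standard configparser module is shown below. The function name and the values in the usage example are illustrative, and ValueError stands in for APIAccessError.

```python
import configparser
from urllib.parse import urlunsplit


def build_osv_url(config: configparser.ConfigParser, endpoint: str) -> str:
    """Build an OSV API URL from the [osv_dev] section of an .ini config."""
    if not config.has_section("osv_dev"):
        raise ValueError("The osv_dev section is missing.")
    section = config["osv_dev"]
    netloc = section.get("url_netloc")
    path = section.get(endpoint)
    if not netloc or not path:
        raise ValueError(f"Missing url_netloc or {endpoint} in osv_dev.")
    # url_scheme is optional and falls back to https.
    scheme = section.get("url_scheme", fallback="https")
    return urlunsplit((scheme, netloc, path, "", ""))


config = configparser.ConfigParser()
config.read_string("[osv_dev]\nurl_netloc = api.osv.dev\nquery_endpoint = v1/query\n")
print(build_osv_url(config, "query_endpoint"))
```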
- static call_osv_query_api(query_data)
Query the OSV (Open Source Vulnerability) knowledge base API with the given data.
This method sends a POST request to the OSV API and processes the response to extract information about vulnerabilities based on the provided query data.
- Parameters:
query_data (dict) – A dictionary containing the query parameters to be sent to the OSV API. The query data should conform to the format expected by the OSV API for querying vulnerabilities.
- Returns:
A list of vulnerabilities under the key “vulns” if the query is successful and the response is valid.
- Return type:
- Raises:
APIAccessError – If there are issues with the API URL construction, missing configuration values, or invalid responses.
- static call_osv_querybatch_api(query_data, expected_size=None)
Query the OSV (Open Source Vulnerability) knowledge base API in batch mode and retrieve vulnerability data.
This method sends a batch query to the OSV API and processes the response to extract a list of results. The method also validates that the number of results matches an optional expected size. It handles API URL construction, error handling, and response validation.
- Parameters:
query_data (dict) – A dictionary containing the batch query data to be sent to the OSV API. This data should conform to the expected format for batch querying vulnerabilities.
expected_size (int, optional) – The expected number of results from the query. If provided, the method checks that the number of results matches this value. If the actual number of results does not match the expected size, an exception is raised. Default is None.
- Returns:
A list of results from the OSV API containing the vulnerability data that matches the query parameters.
- Return type:
- Raises:
APIAccessError – If any of the required configuration keys are missing, if the API URL construction fails, or if the response from the OSV API is invalid or the number of results does not match the expected size.
- static is_version_affected(vuln, pkg_name, pkg_version, ecosystem, source_repo=None)
Check whether a specific version of a package is affected by a vulnerability.
This method parses a vulnerability dictionary to determine whether a given package version falls within the affected version ranges for the specified ecosystem. The function handles version comparisons, extracting details about introduced and fixed versions, and determines if the version is affected by the vulnerability.
- Parameters:
vuln (dict) – A dictionary representing the vulnerability data. It should contain the affected versions and ranges of the package in question, as well as the details of the introduced and fixed versions for each affected range.
pkg_name (str) – The name of the package to check for vulnerability. This should match the package name in the vulnerability data.
pkg_version (str) – The version of the package to check against the vulnerability data.
ecosystem (str) – The ecosystem (e.g., npm, GitHub Actions) to which the package belongs. This should match the ecosystem in the vulnerability data.
source_repo (str | None, optional) – The source repository URL, used if the pkg_version is a commit hash. If provided, the method will try to retrieve the corresponding version tag from the repository. Default is None.
- Returns:
Returns True if the given package version is affected by the vulnerability, otherwise returns False.
- Return type:
- Raises:
APIAccessError – If the vulnerability data is incomplete or malformed, or if the version strings cannot be parsed correctly. This is raised in cases such as:
Missing affected version information
Malformed version data (e.g., invalid version strings)
Failure to parse the version ranges
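The core of the range check can be sketched as follows, under simplifying assumptions: versions are plain dotted integers (optionally prefixed with "v"), each range carries at most one "introduced" and one "fixed" event, and commit-hash resolution is left out. The helper names are hypothetical.

```python
from __future__ import annotations


def parse_version(version: str) -> tuple[int, ...]:
    """Parse a dotted numeric version such as 'v1.2.3' into a tuple."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def is_in_ranges(pkg_version: str, ranges: list) -> bool:
    """Check whether a version lies in any introduced/fixed range."""
    target = parse_version(pkg_version)
    for version_range in ranges:
        introduced = fixed = None
        for event in version_range.get("events", []):
            introduced = event.get("introduced", introduced)
            fixed = event.get("fixed", fixed)
        if introduced is None:
            continue
        # Affected from 'introduced' (inclusive) up to 'fixed' (exclusive).
        if target >= parse_version(introduced) and (
            fixed is None or target < parse_version(fixed)
        ):
            return True
    return False
```

Tuple comparison treats 1.2 and 1.2.0 as different versions, so a real implementation would need a proper version parser for the ecosystem in question.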
macaron.slsa_analyzer.package_registry.package_registry module
This module defines package registries.
- class macaron.slsa_analyzer.package_registry.package_registry.PackageRegistry(name, build_tool_names)
Bases:
ABC
Base package registry class.
- __init__(name, build_tool_names)
- abstractmethod load_defaults()
Load the .ini configuration for the current package registry.
- Return type:
- is_detected(build_tool_name)
Detect if artifacts of the repo under analysis can possibly be published to this package registry.
The detection here is based on the repo’s detected build tool. If the package registry is compatible with the given build tool, it can be a possible place where the artifacts produced from the repo are published.
- find_publish_timestamp(purl)
Retrieve the publication timestamp for a package specified by its purl from the deps.dev repository by default.
This method constructs a request URL based on the provided purl, sends an HTTP GET request to fetch metadata about the package, and extracts the publication timestamp from the response.
Note: The method expects the response to include a version field with a publishedAt subfield containing an ISO 8601 formatted timestamp.
- Parameters:
purl (str) – The Package URL (purl) of the package whose publication timestamp is to be retrieved. This should conform to the PURL specification.
- Returns:
A timezone-aware datetime object representing the publication timestamp of the specified package.
- Return type:
datetime
- Raises:
InvalidHTTPResponseError – If the URL construction fails, the HTTP response is invalid, or if the response cannot be parsed correctly, or if the expected timestamp is missing or invalid.
NotImplementedError – If not implemented for a registry.
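Parsing the expected response shape might look like the sketch below. It covers only the version.publishedAt extraction described in the note; the function name is hypothetical and ValueError stands in for InvalidHTTPResponseError.

```python
from datetime import datetime


def parse_published_at(metadata: dict) -> datetime:
    """Extract version.publishedAt as a timezone-aware datetime."""
    try:
        raw = metadata["version"]["publishedAt"]
    except (KeyError, TypeError) as error:
        raise ValueError("The publishedAt field is missing.") from error
    if not isinstance(raw, str):
        raise ValueError("The publishedAt field is not a string.")
    # datetime.fromisoformat rejects a trailing "Z" before Python 3.11.
    parsed = datetime.fromisoformat(raw.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        raise ValueError("The timestamp carries no timezone information.")
    return parsed


print(parse_published_at({"version": {"publishedAt": "2023-05-22T15:12:44Z"}}))
```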
macaron.slsa_analyzer.package_registry.pypi_registry module
The module provides abstractions for the pypi package registry.
- class macaron.slsa_analyzer.package_registry.pypi_registry.PyPIRegistry(registry_url_netloc=None, registry_url_scheme=None, fileserver_url_netloc=None, fileserver_url_scheme=None, inspector_url_netloc=None, inspector_url_scheme=None, request_timeout=None, enabled=True)
Bases:
PackageRegistry
This class implements the pypi package registry.
- __init__(registry_url_netloc=None, registry_url_scheme=None, fileserver_url_netloc=None, fileserver_url_scheme=None, inspector_url_netloc=None, inspector_url_scheme=None, request_timeout=None, enabled=True)
Initialize the pypi Registry instance.
- Parameters:
registry_url_netloc (str | None) – The netloc of the pypi registry url.
registry_url_scheme (str | None) – The scheme of the pypi registry url.
fileserver_url_netloc (str | None) – The netloc of the server url that stores package source files, which contains the hostname and port.
fileserver_url_scheme (str | None) – The scheme of the server url that stores package source files.
inspector_url_netloc (str | None) – The netloc of the inspector server url, which contains the hostname and port.
inspector_url_scheme (str | None) – The scheme of the inspector server url.
request_timeout (int | None) – The timeout (in seconds) for requests made to the package registry.
enabled (bool) – Shows whether making REST API calls to pypi registry is enabled.
- load_defaults()
Load the .ini configuration for the current package registry.
- Raises:
ConfigurationError – If there is a schema violation in the
pypi
section.- Return type:
- download_package_json(url)
Download the package JSON metadata from pypi registry.
- Parameters:
url (str) – The package JSON url.
- Returns:
The JSON response if the request is successful.
- Return type:
- Raises:
InvalidHTTPResponseError – If the HTTP request to the registry fails or an unexpected response is returned.
- fetch_sourcecode(src_url)
Get the source code of the package.
- Returns:
The source code.
- Return type:
str | None
- get_package_page(package_name)
Implement custom API to get package main page.
- get_maintainers_of_package(package_name)
Implement custom API to get all maintainers of the package.
- get_maintainer_profile_page(username)
Implement custom API to get maintainer’s profile page.
- class macaron.slsa_analyzer.package_registry.pypi_registry.PyPIPackageJsonAsset(component_name, component_version, has_repository, pypi_registry, package_json)
Bases:
object
The package JSON hosted on the PyPI registry.
- pypi_registry: PyPIRegistry
The pypi registry.
- property url: str
Get the download URL of the asset.
Note: we assume that the path parameters used to construct the URL are sanitized already.
- Return type:
- download(dest)
Download the package JSON metadata and store it in the package_json attribute.
- Returns:
True if the asset is downloaded successfully; False if not.
- Return type:
- get_project_links()
Retrieve the project links from the base metadata.
This method accesses the “info” section of the base metadata to extract the “project_urls” dictionary, which contains various links related to the project.
- Returns:
A dictionary of project URLs, where the keys are the names of the links and the values are the corresponding URLs, or None if the “project_urls” section is not found in the base metadata.
- Return type:
dict | None
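As an illustration, reading the links from an already-downloaded package JSON document could be sketched like this. The free function below is hypothetical; the real method works on the object's package_json attribute.

```python
from __future__ import annotations


def get_project_links(package_json: dict) -> dict | None:
    """Return info.project_urls from PyPI package JSON, or None if absent."""
    info = package_json.get("info")
    if not isinstance(info, dict):
        return None
    project_urls = info.get("project_urls")
    # PyPI may report null here, so anything but a dict is treated as absent.
    return project_urls if isinstance(project_urls, dict) else None


sample = {"info": {"project_urls": {"Source": "https://example.org/repo"}}}
print(get_project_links(sample))
```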
- get_latest_version()
Get the latest version of the package.
- Returns:
The latest version.
- Return type:
str | None
- get_sourcecode_url()
Get the url of the source distribution.
- Returns:
The URL of the source distribution.
- Return type:
str | None
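A sketch of how the source distribution URL might be located, assuming the package JSON lists release files under "urls" with a "packagetype" of "sdist" for source distributions. The function name and the sample URLs are hypothetical.

```python
from __future__ import annotations


def get_sdist_url(package_json: dict) -> str | None:
    """Return the URL of the first source distribution, or None."""
    for entry in package_json.get("urls") or []:
        if isinstance(entry, dict) and entry.get("packagetype") == "sdist":
            url = entry.get("url")
            if isinstance(url, str):
                return url
    return None


sample_files = {
    "urls": [
        {"packagetype": "bdist_wheel", "url": "https://files.example.org/pkg-1.0-py3-none-any.whl"},
        {"packagetype": "sdist", "url": "https://files.example.org/pkg-1.0.tar.gz"},
    ]
}
print(get_sdist_url(sample_files))
```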
- get_latest_release_upload_time()
Get upload time of the latest release.
- Returns:
The upload time of the latest release.
- Return type:
str | None
- get_sourcecode()
Get source code of the package.
- __init__(component_name, component_version, has_repository, pypi_registry, package_json)