macaron.slsa_analyzer.package_registry package

This module defines the package registries.

Submodules

macaron.slsa_analyzer.package_registry.jfrog_maven_registry module

Assets on a package registry.

class macaron.slsa_analyzer.package_registry.jfrog_maven_registry.JFrogMavenAsset(name: str, group_id: str, artifact_id: str, version: str, metadata: JFrogMavenAssetMetadata, jfrog_maven_registry: JFrogMavenRegistry)

Bases: NamedTuple

An asset hosted on a JFrog Artifactory repository with Maven layout.

name: str

The name of the Maven asset.

group_id: str

The group id.

artifact_id: str

The artifact id.

version: str

The version of the Maven asset.

metadata: JFrogMavenAssetMetadata

The metadata of the JFrog Maven asset.

jfrog_maven_registry: JFrogMavenRegistry

The JFrog repo that acts as a package registry following the Maven layout.

property url: str

Get the URL to the asset.

This URL can be used to download the asset.

property sha256_digest: str

Get the SHA256 digest of the asset.

property size_in_bytes: int

Get the size of the asset (in bytes).

download(dest)

Download the asset.

Parameters:

dest (str) – The local destination where the asset is downloaded to. Note that this must include the file name.

Returns:

True if the asset is downloaded successfully; False if not.

Return type:

bool

class macaron.slsa_analyzer.package_registry.jfrog_maven_registry.JFrogMavenAssetMetadata(size_in_bytes: int, sha256_digest: str, download_uri: str)

Bases: NamedTuple

Metadata of an asset on a JFrog Maven registry.

size_in_bytes: int

The size of the asset (in bytes).

sha256_digest: str

The SHA256 digest of the asset.

download_uri: str

The download URI of the asset.

class macaron.slsa_analyzer.package_registry.jfrog_maven_registry.JFrogMavenRegistry(hostname=None, repo=None, request_timeout=None, download_timeout=None, enabled=None)

Bases: PackageRegistry

A JFrog Artifactory repository that acts as a package registry with Maven layout.

For more details on JFrog Artifactory repository, see: https://jfrog.com/help/r/jfrog-artifactory-documentation/repository-management

__init__(hostname=None, repo=None, request_timeout=None, download_timeout=None, enabled=None)

Instantiate a JFrogMavenRegistry object.

Parameters:
  • hostname (str) – The hostname of the JFrog instance.

  • repo (str | None) – The Artifactory repository with Maven layout on the JFrog instance.

  • request_timeout (int | None) – The timeout (in seconds) for regular requests made to the package registry.

  • download_timeout (int | None) – The timeout (in seconds) for downloading files from the package registry.

  • enabled (bool | None) – Whether the package registry should be active in the analysis or not. “Not active” means no target repo/software component can be matched against this package registry.

load_defaults()

Load the .ini configuration for the current package registry.

Raises:

ConfigurationError – If there is a schema violation in the package_registry.jfrog.maven section.

Return type:

None
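The following is a minimal sketch of what loading the package_registry.jfrog.maven section could look like, using the standard-library configparser. The option names, their default values, and the choice to return None when the section is absent are assumptions made for illustration. They mirror the constructor parameters above, not a documented schema.

```python
from __future__ import annotations

from configparser import ConfigParser

# Hypothetical section contents: the option names and values are assumptions
# chosen to mirror the constructor parameters, not Macaron's actual defaults.
SAMPLE_INI = """
[package_registry.jfrog.maven]
hostname = jfrog.example.com
repo = libs-release
request_timeout = 10
download_timeout = 120
enabled = true
"""

SECTION_NAME = "package_registry.jfrog.maven"


class ConfigurationError(Exception):
    """Raised when the section violates the expected schema."""


def load_jfrog_maven_defaults(ini_text: str) -> dict | None:
    """Read the JFrog Maven section, returning None if it is absent."""
    parser = ConfigParser()
    parser.read_string(ini_text)
    if not parser.has_section(SECTION_NAME):
        # Assumption: a missing section means the registry is not configured.
        return None
    section = parser[SECTION_NAME]
    try:
        return {
            "hostname": section.get("hostname", ""),
            "repo": section.get("repo", ""),
            "request_timeout": section.getint("request_timeout", fallback=10),
            "download_timeout": section.getint("download_timeout", fallback=120),
            "enabled": section.getboolean("enabled", fallback=True),
        }
    except ValueError as error:
        # getint() and getboolean() raise ValueError on malformed values.
        raise ConfigurationError(f"Invalid value in [{SECTION_NAME}]: {error}") from error
```

Wrapping the ValueError in a dedicated exception keeps schema problems distinguishable from other errors, in the spirit of the ConfigurationError documented above.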

is_detected(build_tool)

Detect if artifacts of the repo under analysis can possibly be published to this package registry.

The detection here is based on the repo’s detected build tool. If the package registry is compatible with the given build tool, it can be a possible place where the artifacts produced from the repo are published.

JFrogMavenRegistry is compatible with Maven and Gradle.

Parameters:

build_tool (BaseBuildTool) – A detected build tool of the repository under analysis.

Returns:

True if the repo under analysis can be published to this package registry, based on the given build tool.

Return type:

bool

construct_maven_repository_path(group_id, artifact_id=None, version=None, asset_name=None)

Construct a path to a folder or file on the registry, assuming Maven repository layout.

For more details regarding Maven repository layout, see the following:

  • https://maven.apache.org/repository/layout.html

  • https://maven.apache.org/guides/mini/guide-naming-conventions.html

Parameters:
  • group_id (str) – The group id of a Maven package.

  • artifact_id (str) – The artifact id of a Maven package.

  • version (str) – The version of a Maven package.

  • asset_name (str) – The asset name.

Returns:

The path to a folder or file on the registry.

Return type:

str
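In the Maven repository layout, the dots in a group id become directory separators, and the artifact id, version, and file name follow as nested path segments. The free function below is a hypothetical sketch of that rule. It assumes that when an optional component is missing, the later ones are ignored, which may differ from how the documented method handles partial arguments.

```python
from __future__ import annotations


def construct_maven_repository_path(
    group_id: str,
    artifact_id: str | None = None,
    version: str | None = None,
    asset_name: str | None = None,
) -> str:
    # Dots in the group id map to directories, e.g. org.apache -> org/apache.
    path = group_id.replace(".", "/")
    # Each optional component narrows the path; stop at the first missing one.
    for component in (artifact_id, version, asset_name):
        if component is None:
            break
        path = f"{path}/{component}"
    return path
```

For example, the group id org.apache.commons alone yields a folder path, while adding the artifact id, version, and file name yields the path to a single JAR.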

fetch_artifact_ids(group_id)

Get all artifact ids under a group id.

This is done by fetching all child folders under the group folder on the registry.

Parameters:

group_id (str) – The group id.

Returns:

The artifact ids under the group.

Return type:

list[str]

construct_folder_info_url(folder_path)

Construct a URL for the JFrog Folder Info API.

Documentation: https://jfrog.com/help/r/jfrog-rest-apis/folder-info.

Parameters:

folder_path (str) – The path to the folder.

Returns:

The URL to request the info of the folder.

Return type:

str

construct_file_info_url(file_path)

Construct a URL for the JFrog File Info API.

Documentation: https://jfrog.com/help/r/jfrog-rest-apis/file-info.

Parameters:

file_path (str) – The path to the file.

Returns:

The URL to request the info of the file.

Return type:

str
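JFrog serves both Folder Info and File Info through the same Storage API path pattern, /api/storage/{repoKey}/{path}. The sketch below builds such a URL from a hostname, a repository key, and a path. The "/artifactory" context path and the HTTPS scheme are assumptions, because the prefix depends on how a given JFrog instance is deployed. The function name is hypothetical.

```python
from urllib.parse import quote, urlunsplit


def construct_storage_api_url(hostname: str, repo: str, path: str) -> str:
    # Folder Info and File Info share the Storage API endpoint.
    # Assumption: the REST API lives under the "/artifactory" context path.
    # Strip slashes so the joined path never contains "//".
    api_path = f"/artifactory/api/storage/{repo}/{path.strip('/')}"
    # quote() keeps "/" by default and percent-encodes unsafe characters.
    return urlunsplit(("https", hostname, quote(api_path), "", ""))
```

The same helper serves a folder path, such as a group folder, and a file path, such as a single JAR.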

construct_latest_version_url(group_id, artifact_id)

Construct a URL for the JFrog Latest Version Search API.

The response payload includes the latest version of the package with the given group id and artifact id. Documentation: https://jfrog.com/help/r/jfrog-rest-apis/artifact-latest-version-search-based-on-layout.

Parameters:
  • group_id (str) – The group id of the package.

  • artifact_id (str) – The artifact id of the package.

Returns:

The URL to request the latest version of the package.

Return type:

str
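The layout-based latest-version search takes the group id and artifact id as the g and a query parameters, and it can be restricted to one repository with repos. This is a minimal sketch that assumes the same "/artifactory" context path as a typical cloud instance. The function name and signature are hypothetical.

```python
from urllib.parse import urlencode, urlunsplit


def construct_latest_version_url(hostname: str, repo: str, group_id: str, artifact_id: str) -> str:
    # g/a identify the package; repos limits the search to one repository.
    query = urlencode({"g": group_id, "a": artifact_id, "repos": repo})
    # Assumption: the REST API lives under the "/artifactory" context path.
    return urlunsplit(("https", hostname, "/artifactory/api/search/latestVersion", query, ""))
```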

fetch_latest_version(group_id, artifact_id)

Fetch the latest version of a Java package on this JFrog Maven registry.

Parameters:
  • group_id (str) – The group id of the Java package.

  • artifact_id (str) – The artifact id of the Java package.

Returns:

The latest version of the Java package if it could be retrieved, or None otherwise.

Return type:

str | None

fetch_asset_names(group_id, artifact_id, version, extensions=None)

Retrieve the names of the assets published for a version of a Maven package.

Parameters:
  • group_id (str) – The group id of the Maven package.

  • artifact_id (str) – The artifact id of the Maven package.

  • version (str) – The version of the Maven package.

  • extensions (set[str] | None) – The set of asset extensions. Only assets with names ending in these extensions are fetched. If this is None, then all assets are returned regardless of their extensions.

Returns:

The list of asset names.

Return type:

list[str]

extract_folder_names_from_folder_info_payload(folder_info_payload)

Extract a list of folder names from the Folder Info payload of a Maven group folder.

Parameters:

folder_info_payload (str) – The Folder Info payload.

Returns:

The artifact ids found in the payload.

Return type:

list[str]

extract_file_names_from_folder_info_payload(folder_info_payload, extensions=None)

Extract file names from the Folder Info response payload.

For the schema of this payload and other details regarding the API, see: https://jfrog.com/help/r/jfrog-rest-apis/folder-info.

Note: Currently, we do not try to validate the schema of the payload. Rather, we only read as much of the payload as we can recognise.

Parameters:
  • folder_info_payload (JsonType) – The JSON payload of a Folder Info response.

  • extensions (set[str] | None) – The set of allowed extensions. Filenames not ending in these extensions are omitted from the result. If this is None, then all file names are returned regardless of their extensions.

Returns:

The list of filenames in the folder, extracted from the payload.

Return type:

list[str]
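A Folder Info payload lists the entries of a folder under "children", where each child has a "uri" such as "/commons-lang3" and a boolean "folder" flag. The sketch below reads that list defensively, in line with the note that the schema is not validated. It splits the payload into folder names and file names, optionally filtered by extension. The function names are hypothetical.

```python
from __future__ import annotations

from typing import Any


def _children(payload: Any) -> list[dict]:
    # Skip anything that does not look like the expected structure.
    if not isinstance(payload, dict):
        return []
    children = payload.get("children")
    if not isinstance(children, list):
        return []
    return [child for child in children if isinstance(child, dict)]


def extract_folder_names(payload: Any) -> list[str]:
    names = []
    for child in _children(payload):
        uri = child.get("uri")
        if child.get("folder") is True and isinstance(uri, str):
            # Child URIs are relative and start with "/".
            names.append(uri.lstrip("/"))
    return names


def extract_file_names(payload: Any, extensions: set[str] | None = None) -> list[str]:
    names = []
    for child in _children(payload):
        uri = child.get("uri")
        if child.get("folder") is False and isinstance(uri, str):
            name = uri.lstrip("/")
            # With extensions=None, every file name is kept.
            if extensions is None or any(name.endswith(ext) for ext in extensions):
                names.append(name)
    return names
```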

fetch_asset_metadata(group_id, artifact_id, version, asset_name)

Fetch an asset’s metadata from JFrog.

Parameters:
  • group_id (str) – The group id of the package containing the asset.

  • artifact_id (str) – The artifact id of the package containing the asset.

  • version (str) – The version of the package containing the asset.

  • asset_name (str) – The name of the asset.

Returns:

The asset’s metadata, or None if the metadata cannot be retrieved.

Return type:

JFrogMavenAssetMetadata | None

extract_asset_metadata_from_file_info_payload(file_info_payload)

Extract the metadata of an asset from the File Info request payload.

Documentation: https://jfrog.com/help/r/jfrog-rest-apis/file-info.

Parameters:

file_info_payload (str) – The File Info request payload used to extract the metadata of an asset.

Returns:

The asset’s metadata, or None if the metadata cannot be retrieved.

Return type:

JFrogMavenAssetMetadata | None
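A File Info response carries the fields that JFrogMavenAssetMetadata needs: "size", "checksums" (including "sha256"), and "downloadUri". This is a hypothetical sketch that returns None whenever a field is missing or malformed. It converts "size" explicitly because JFrog's documented example reports it as a string. The AssetMetadata tuple is a local stand-in for the documented class.

```python
from __future__ import annotations

from typing import Any, NamedTuple


class AssetMetadata(NamedTuple):
    # Local stand-in mirroring the fields of JFrogMavenAssetMetadata.
    size_in_bytes: int
    sha256_digest: str
    download_uri: str


def extract_asset_metadata(payload: Any) -> AssetMetadata | None:
    if not isinstance(payload, dict):
        return None
    checksums = payload.get("checksums")
    sha256 = checksums.get("sha256") if isinstance(checksums, dict) else None
    download_uri = payload.get("downloadUri")
    if not isinstance(sha256, str) or not isinstance(download_uri, str):
        return None
    # "size" may arrive as a string, so convert it and reject bad values.
    try:
        size_in_bytes = int(payload.get("size"))
    except (TypeError, ValueError):
        return None
    return AssetMetadata(size_in_bytes, sha256, download_uri)
```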

fetch_assets(group_id, artifact_id, version, extensions=None)

Fetch the assets of a Maven package.

Parameters:
  • group_id (str) – The group id of the Maven package.

  • artifact_id (str) – The artifact id of the Maven package.

  • version (str) – The version of the Maven package.

  • extensions (set[str] | None) – The extensions of the assets to fetch. If this is None, all available assets are fetched.

Returns:

The list of assets of the package.

Return type:

list[JFrogMavenAsset]

construct_asset_url(group_id, artifact_id, version, asset_name)

Get the URL to download an asset.

Parameters:
  • group_id (str) – The group id of the package containing the asset.

  • artifact_id (str) – The artifact id of the package containing the asset.

  • version (str) – The version of the package containing the asset.

  • asset_name (str) – The name of the asset.

Returns:

The URL to the asset, which can be used for downloading the asset.

Return type:

str

download_asset(url, dest)

Download an asset from the given URL to a given location.

Parameters:
  • url (str) – The URL to the asset on the package registry.

  • dest (str) – The local destination where the asset is downloaded to.

Returns:

True if the file is downloaded successfully; False if not.

Return type:

bool

find_publish_timestamp(purl, registry_url=None)

Make a search request to Maven Central to find the publishing timestamp of an artifact.

The reason for directly fetching timestamps from Maven Central is that deps.dev occasionally misses timestamps for Maven artifacts, making it unreliable for this purpose.

For the search API syntax, see: https://central.sonatype.org/search/rest-api-guide/

Parameters:
  • purl (str) – The Package URL (purl) of the package whose publication timestamp is to be retrieved. This should conform to the PURL specification.

  • registry_url (str | None) – The registry URL that can be set for testing.

Returns:

A timezone-aware datetime object representing the publication timestamp of the specified package.

Return type:

datetime

Raises:
  • InvalidHTTPResponseError – If the URL construction fails, the HTTP response is invalid, or if the response cannot be parsed correctly, or if the expected timestamp is missing or invalid.

  • NotImplementedError – If not implemented for a registry.

macaron.slsa_analyzer.package_registry.maven_central_registry module

The module provides abstractions for the Maven Central package registry.

macaron.slsa_analyzer.package_registry.maven_central_registry.same_organization(group_id_1, group_id_2)

Check if two Maven group ids are from the same organization.

Note: For recognized source platforms, the top-level domain is assumed not to change the organization. For example, io.github.foo and com.github.foo are assumed to be from the same organization.

Parameters:
  • group_id_1 (str) – The first group id.

  • group_id_2 (str) – The second group id.

Returns:

True if the two group ids are from the same organization, False otherwise.

Return type:

bool
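One way to implement the rule described in the note is sketched below. For group ids whose second component is a recognized source platform, the top-level domain is ignored and the platform and account components are compared. Otherwise, the first two reversed-domain components are compared. The RECOGNIZED_SOURCE_PLATFORMS set and the exact comparison rules are assumptions for illustration, not the documented function's logic.

```python
# Hypothetical set of recognized source platforms; the real list may differ.
RECOGNIZED_SOURCE_PLATFORMS = {"github", "gitlab", "bitbucket"}


def same_organization(group_id_1: str, group_id_2: str) -> bool:
    if group_id_1 == group_id_2:
        return True
    parts_1 = group_id_1.split(".")
    parts_2 = group_id_2.split(".")
    if min(len(parts_1), len(parts_2)) < 2:
        return False
    # Under a recognized platform (e.g. io.github.foo), ignore the top-level
    # domain and compare the platform plus the account that follows it.
    if parts_1[1] in RECOGNIZED_SOURCE_PLATFORMS and parts_2[1] in RECOGNIZED_SOURCE_PLATFORMS:
        return len(parts_1) > 2 and len(parts_2) > 2 and parts_1[1:3] == parts_2[1:3]
    # Otherwise, compare the reversed-domain prefix (e.g. org.apache).
    return parts_1[:2] == parts_2[:2]
```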

class macaron.slsa_analyzer.package_registry.maven_central_registry.MavenCentralRegistry(search_netloc=None, search_scheme=None, search_endpoint=None, registry_url_netloc=None, registry_url_scheme=None, request_timeout=None)

Bases: PackageRegistry

This class implements a Maven Central package registry.

__init__(search_netloc=None, search_scheme=None, search_endpoint=None, registry_url_netloc=None, registry_url_scheme=None, request_timeout=None)

Initialize a Maven Central Registry instance.

Parameters:
  • search_netloc (str | None) – The netloc of the Maven Central search URL.

  • search_scheme (str | None) – The scheme of the Maven Central search URL.

  • search_endpoint (str | None) – The search REST API to find artifacts.

  • registry_url_netloc (str | None) – The netloc of the Maven Central registry URL.

  • registry_url_scheme (str | None) – The scheme of the Maven Central registry URL.

  • request_timeout (int | None) – The timeout (in seconds) for requests made to the package registry.

load_defaults()

Load the .ini configuration for the current package registry.

Raises:

ConfigurationError – If there is a schema violation in the maven_central section.

Return type:

None

is_detected(build_tool)

Detect if artifacts of the repo under analysis can possibly be published to this package registry.

The detection here is based on the repo’s detected build tools. If the package registry is compatible with the given build tools, it can be a possible place where the artifacts produced from the repo are published.

MavenCentralRegistry is compatible with Maven and Gradle.

Parameters:

build_tool (BaseBuildTool) – A detected build tool of the repository under analysis.

Returns:

True if the repo under analysis can be published to this package registry, based on the given build tool.

Return type:

bool

find_publish_timestamp(purl, registry_url=None)

Make a search request to Maven Central to find the publishing timestamp of an artifact.

The reason for directly fetching timestamps from Maven Central is that deps.dev occasionally misses timestamps for Maven artifacts, making it unreliable for this purpose.

For the search API syntax, see: https://central.sonatype.org/search/rest-api-guide/

Parameters:
  • purl (str) – The Package URL (purl) of the package whose publication timestamp is to be retrieved. This should conform to the PURL specification.

  • registry_url (str | None) – The registry URL that can be set for testing.

Returns:

A timezone-aware datetime object representing the publication timestamp of the specified package.

Return type:

datetime

Raises:

InvalidHTTPResponseError – If the URL construction fails, the HTTP response is invalid, or if the response cannot be parsed correctly, or if the expected timestamp is missing or invalid.
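The sketch below covers the three steps of this lookup. It parses a Maven purl, builds a Solr-style search query on the gav core of search.maven.org, and converts the millisecond "timestamp" of the first match into a timezone-aware datetime. The purl parser is deliberately minimal and ignores qualifiers and subpaths, so a real implementation would use a PURL library instead. The function names are hypothetical, and the sketch raises ValueError where the documented method raises InvalidHTTPResponseError.

```python
from __future__ import annotations

from datetime import datetime, timezone
from urllib.parse import urlencode, urlunsplit


def parse_maven_purl(purl: str) -> tuple[str, str, str]:
    # Minimal parser for "pkg:maven/<group>/<artifact>@<version>".
    prefix = "pkg:maven/"
    if not purl.startswith(prefix) or "@" not in purl:
        raise ValueError(f"Unsupported purl: {purl}")
    coordinates, version = purl[len(prefix):].rsplit("@", 1)
    group_id, artifact_id = coordinates.split("/", 1)
    # Drop any qualifiers ("?...") or subpath ("#...") after the version.
    return group_id, artifact_id, version.split("?")[0].split("#")[0]


def construct_search_url(purl: str, netloc: str = "search.maven.org") -> str:
    group_id, artifact_id, version = parse_maven_purl(purl)
    query = urlencode(
        {"q": f"g:{group_id} AND a:{artifact_id} AND v:{version}", "core": "gav", "rows": "1", "wt": "json"}
    )
    return urlunsplit(("https", netloc, "/solrsearch/select", query, ""))


def extract_timestamp(payload: dict) -> datetime:
    response = payload.get("response")
    docs = response.get("docs") if isinstance(response, dict) else None
    if not isinstance(docs, list) or not docs or not isinstance(docs[0], dict):
        raise ValueError("No matching artifact in the search response.")
    timestamp_ms = docs[0].get("timestamp")
    if not isinstance(timestamp_ms, int):
        raise ValueError("Timestamp is missing or invalid.")
    # The search API reports milliseconds since the Unix epoch.
    return datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)
```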

macaron.slsa_analyzer.package_registry.npm_registry module

The module provides abstractions for the npm package registry.

class macaron.slsa_analyzer.package_registry.npm_registry.NPMRegistry(hostname=None, attestation_endpoint=None, request_timeout=None, enabled=True)

Bases: PackageRegistry

This class implements the npm package registry.

There is no complete and up-to-date API documentation for the npm registry and the endpoints are discovered by manual inspection of links on https://www.npmjs.com.

__init__(hostname=None, attestation_endpoint=None, request_timeout=None, enabled=True)

Initialize the npm Registry instance.

Parameters:
  • hostname (str | None) – The hostname of the npm registry.

  • attestation_endpoint (str | None) – The attestation REST API.

  • request_timeout (int | None) – The timeout (in seconds) for requests made to the package registry.

  • enabled (bool) – Whether REST API calls to the npm registry are enabled.

load_defaults()

Load the .ini configuration for the current package registry.

Raises:

ConfigurationError – If there is a schema violation in the npm registry section.

Return type:

None

is_detected(build_tool)

Detect if artifacts under analysis can be published to this package registry.

The detection here is based on the repo’s detected build tools. If the package registry is compatible with the given build tools, it can be a possible place where the artifacts are published.

NPMRegistry is compatible with npm and Yarn build tools.

Note: if the npm registry is disabled through the ini configuration, this method returns False.

Parameters:

build_tool (BaseBuildTool) – A detected build tool of the repository under analysis.

Returns:

True if the repo under analysis can be published to this package registry, based on the given build tool.

Return type:

bool

download_attestation_payload(url, download_path)

Download the npm attestation from npm registry.

Each npm package can have the following types of attestations:

This method downloads the unsigned SLSA provenance (v0.2 or v1) and the signed npm-type attestation.

An example SLSA v0.2 provenance: https://registry.npmjs.org/-/npm/v1/attestations/@sigstore/mock@0.1.0

An example SLSA v1 provenance: https://registry.npmjs.org/-/npm/v1/attestations/@sigstore/mock@0.6.3

Parameters:
  • url (str) – The attestation URL.

  • download_path (str) – The download path for the asset.

Returns:

True if the asset is downloaded successfully; False if not.

Return type:

bool

Raises:

InvalidHTTPResponseError – If the HTTP request to the registry fails or an unexpected response is returned.

get_latest_version(namespace, name)

Try to retrieve the latest version of a package from the registry.

Parameters:
  • namespace (str | None) – The optional namespace of the package.

  • name (str) – The name of the package.

Returns:

The latest version of the package, or None if one cannot be found.

Return type:

str | None
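The npm registry serves a package document at /<name> (or /@scope/<name> for scoped packages), and that document records the current release in the "latest" entry of its "dist-tags" object. The sketch below shows the URL and the extraction step without the HTTP call. It assumes the namespace already includes the leading "@", as in a decoded npm PURL namespace. The function names are hypothetical.

```python
from __future__ import annotations

from typing import Any


def construct_package_url(namespace: str | None, name: str, hostname: str = "registry.npmjs.org") -> str:
    # Assumption: a scoped namespace already carries the "@", e.g. "@sigstore".
    package = f"{namespace}/{name}" if namespace else name
    return f"https://{hostname}/{package}"


def extract_latest_version(package_document: Any) -> str | None:
    # The "latest" dist-tag names the version installed by default.
    if not isinstance(package_document, dict):
        return None
    dist_tags = package_document.get("dist-tags")
    latest = dist_tags.get("latest") if isinstance(dist_tags, dict) else None
    return latest if isinstance(latest, str) else None
```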

class macaron.slsa_analyzer.package_registry.npm_registry.NPMAttestationAsset(namespace: str | None, artifact_id: str, version: str, npm_registry: NPMRegistry, size_in_bytes: int)

Bases: NamedTuple

An attestation asset hosted on the npm registry.

The API Documentation can be found here:

namespace: str | None

The optional scope of a package on npm, which is used as the namespace in a PURL string. For details on npm scopes, see https://docs.npmjs.com/cli/v10/using-npm/scope. For the namespace in an npm PURL string, see https://github.com/package-url/purl-spec/blob/master/PURL-TYPES.rst#npm.

artifact_id: str

The artifact ID.

version: str

The version of the asset.

npm_registry: NPMRegistry

The npm registry.

size_in_bytes: int

The size of the asset (in bytes). This attribute exists only to match the AssetLocator protocol; it is not used because the npm registry API does not provide the size.

property name: str

Get the asset name.

property url: str

Get the download URL of the asset.

Note: we assume that the path parameters used to construct the URL are sanitized already.

Return type:

str
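The attestation URLs given as examples for download_attestation_payload follow the pattern /-/npm/v1/attestations/[@scope/]name@version. The sketch below builds a URL from an NPMAttestationAsset's namespace, artifact_id, and version fields using that pattern. The function name is hypothetical, and, as the note says, the inputs are assumed to be sanitized already.

```python
from __future__ import annotations


def construct_attestation_url(
    namespace: str | None, artifact_id: str, version: str, hostname: str = "registry.npmjs.org"
) -> str:
    # Path layout taken from the example attestation URLs in this document.
    package = f"{namespace}/{artifact_id}" if namespace else artifact_id
    return f"https://{hostname}/-/npm/v1/attestations/{package}@{version}"
```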

download(dest)

Download the asset.

Parameters:

dest (str) – The local destination where the asset is downloaded to. Note that this must include the file name.

Returns:

True if the asset is downloaded successfully; False if not.

Return type:

bool

macaron.slsa_analyzer.package_registry.package_registry module

This module defines package registries.

class macaron.slsa_analyzer.package_registry.package_registry.PackageRegistry(name)

Bases: ABC

Base package registry class.

__init__(name)

abstract load_defaults()

Load the .ini configuration for the current package registry.

Return type:

None

abstract is_detected(build_tool)

Detect if artifacts of the repo under analysis can possibly be published to this package registry.

The detection here is based on the repo’s detected build tool. If the package registry is compatible with the given build tool, it can be a possible place where the artifacts produced from the repo are published.

Parameters:

build_tool (BaseBuildTool) – A detected build tool of the repository under analysis.

Returns:

True if the repo under analysis can be published to this package registry, based on the given build tool.

Return type:

bool
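The abstract interface above implies the following shape for a concrete registry: subclasses must provide load_defaults and is_detected, and detection reduces to checking whether the build tool is among those the registry supports. In this minimal sketch, BuildTool is a hypothetical stand-in for BaseBuildTool that only carries a name, and ExampleMavenRegistry is an illustrative subclass, not one of the registries documented here.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class BuildTool:
    # Hypothetical stand-in for BaseBuildTool; only a name is needed here.
    name: str


class PackageRegistry(ABC):
    def __init__(self, name: str) -> None:
        self.name = name

    @abstractmethod
    def load_defaults(self) -> None:
        """Load the .ini configuration for the registry."""

    @abstractmethod
    def is_detected(self, build_tool: BuildTool) -> bool:
        """Return True if the build tool's artifacts can be published here."""


class ExampleMavenRegistry(PackageRegistry):
    # Hypothetical subclass compatible with the build tools in this set.
    compatible_build_tools = {"maven", "gradle"}

    def __init__(self) -> None:
        super().__init__("Example Maven Registry")
        self.enabled = False

    def load_defaults(self) -> None:
        # A real registry would read its .ini section here.
        self.enabled = True

    def is_detected(self, build_tool: BuildTool) -> bool:
        # A disabled registry never matches, mirroring the npm note above.
        return self.enabled and build_tool.name in self.compatible_build_tools
```

Keeping the compatibility list as class data makes each subclass's detection rule a one-line declaration.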

find_publish_timestamp(purl, registry_url=None)

Retrieve the publication timestamp of a package, specified by its purl. By default, the timestamp is fetched from deps.dev.

This method constructs a request URL based on the provided purl, sends an HTTP GET request to fetch metadata about the package, and extracts the publication timestamp from the response.

Note: The method expects the response to include a version field with a publishedAt subfield containing an ISO 8601 formatted timestamp.

Parameters:
  • purl (str) – The Package URL (purl) of the package whose publication timestamp is to be retrieved. This should conform to the PURL specification.

  • registry_url (str | None) – The registry URL that can be set for testing.

Returns:

A timezone-aware datetime object representing the publication timestamp of the specified package.

Return type:

datetime

Raises:
  • InvalidHTTPResponseError – If the URL construction fails, the HTTP response is invalid, or if the response cannot be parsed correctly, or if the expected timestamp is missing or invalid.

  • NotImplementedError – If not implemented for a registry.
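Following the note above, the timestamp is read from version.publishedAt as an ISO 8601 string. In the sketch below, the endpoint layout (a base URL followed by the percent-encoded purl as one path segment) is an assumption. The parsing step normalizes a trailing "Z" because datetime.fromisoformat accepts it only from Python 3.11 onward. The function names are hypothetical, and ValueError stands in for InvalidHTTPResponseError.

```python
from __future__ import annotations

from datetime import datetime
from typing import Any
from urllib.parse import quote


def construct_deps_dev_url(purl: str, base_url: str = "https://api.deps.dev/v3alpha/purl") -> str:
    # Assumed endpoint layout: the whole purl is encoded as a single path segment.
    return f"{base_url}/{quote(purl, safe='')}"


def extract_published_at(payload: Any) -> datetime:
    version = payload.get("version") if isinstance(payload, dict) else None
    published_at = version.get("publishedAt") if isinstance(version, dict) else None
    if not isinstance(published_at, str):
        raise ValueError("publishedAt is missing from the response.")
    # Replace "Z" so older Python versions can parse the UTC designator.
    timestamp = datetime.fromisoformat(published_at.replace("Z", "+00:00"))
    if timestamp.tzinfo is None:
        raise ValueError("publishedAt is not timezone-aware.")
    return timestamp
```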

macaron.slsa_analyzer.package_registry.pypi_registry module

The module provides abstractions for the PyPI package registry.

class macaron.slsa_analyzer.package_registry.pypi_registry.PyPIRegistry(registry_url_netloc=None, registry_url_scheme=None, fileserver_url_netloc=None, fileserver_url_scheme=None, request_timeout=None, enabled=True)

Bases: PackageRegistry

This class implements the PyPI package registry.

__init__(registry_url_netloc=None, registry_url_scheme=None, fileserver_url_netloc=None, fileserver_url_scheme=None, request_timeout=None, enabled=True)

Initialize the PyPI registry instance.

Parameters:
  • registry_url_netloc (str | None) – The netloc of the PyPI registry URL.

  • registry_url_scheme (str | None) – The scheme of the PyPI registry URL.

  • fileserver_url_netloc (str | None) – The netloc of the server URL that stores package source files, which contains the hostname and port.

  • fileserver_url_scheme (str | None) – The scheme of the server URL that stores package source files.

  • request_timeout (int | None) – The timeout (in seconds) for requests made to the package registry.

  • enabled (bool) – Whether REST API calls to the PyPI registry are enabled.

load_defaults()

Load the .ini configuration for the current package registry.

Raises:

ConfigurationError – If there is a schema violation in the pypi section.

Return type:

None

is_detected(build_tool)

Detect if artifacts of the repo under analysis can possibly be published to this package registry.

The detection here is based on the repo’s detected build tools. If the package registry is compatible with the given build tools, it can be a possible place where the artifacts produced from the repo are published.

PyPIRegistry is compatible with Pip and Poetry.

Parameters:

build_tool (BaseBuildTool) – A detected build tool of the repository under analysis.

Returns:

True if the repo under analysis can be published to this package registry, based on the given build tool.

Return type:

bool

download_package_json(url)

Download the package JSON metadata from pypi registry.

Parameters:

url (str) – The package JSON url.

Returns:

The JSON response if the request is successful.

Return type:

dict

Raises:

InvalidHTTPResponseError – If the HTTP request to the registry fails or an unexpected response is returned.
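PyPI's JSON API serves project metadata at /pypi/<project>/json. The sketch below builds that URL from the netloc and scheme the registry is configured with, and validates that a response body is a JSON object. The function names are hypothetical, and the sketch raises ValueError where the documented method raises InvalidHTTPResponseError.

```python
import json
from urllib.parse import urlunsplit


def construct_package_json_url(package_name: str, netloc: str = "pypi.org", scheme: str = "https") -> str:
    # PyPI's JSON API serves project metadata at /pypi/<project>/json.
    return urlunsplit((scheme, netloc, f"/pypi/{package_name}/json", "", ""))


def parse_package_json(body: str) -> dict:
    # json.loads raises JSONDecodeError (a ValueError) on malformed input.
    data = json.loads(body)
    if not isinstance(data, dict):
        raise ValueError("Expected a JSON object in the package JSON response.")
    return data
```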

get_package_page(package_name)

Get the main page of the package through a custom API.

Parameters:

package_name (str) – The package name.

Returns:

The package main page.

Return type:

str | None

get_maintainers_of_package(package_name)

Get all maintainers of the package through a custom API.

Parameters:

package_name (str) – The package name.

Returns:

The list of maintainers.

Return type:

list | None

get_maintainer_profile_page(username)

Get the maintainer’s profile page through a custom API.

Parameters:

username (str) – The maintainer’s username.

Returns:

The profile page.

Return type:

str | None

get_maintainer_join_date(username)

Get the maintainer’s join date through a custom API.

Parameters:

username (str) – The maintainer’s username.

Returns:

The maintainer’s join date. Only data for recent maintainers is available.

Return type:

datetime | None

class macaron.slsa_analyzer.package_registry.pypi_registry.PyPIPackageJsonAsset(component, pypi_registry, package_json)

Bases: object

The package JSON hosted on the PyPI registry.

component: Component

The target pypi software component.

pypi_registry: PyPIRegistry

The pypi registry.

package_json: dict

The asset content.

property size_in_bytes: int

Get the size of the asset.

property name: str

Get the asset name.

property url: str

Get the download URL of the asset.

Note: we assume that the path parameters used to construct the URL are sanitized already.

Return type:

str

download(dest)

Download the package JSON metadata and store it in the package_json attribute.

Returns:

True if the asset is downloaded successfully; False if not.

Return type:

bool

get_releases()

Get all releases.

Returns:

A mapping from each version to its metadata.

Return type:

dict | None

Retrieve the project links from the base metadata.

This method accesses the “info” section of the base metadata to extract the “project_urls” dictionary, which contains various links related to the project.

Returns:

A dictionary of project URLs, where the keys are the names of the links and the values are the corresponding URLs. Returns None if the “project_urls” section is not found in the base metadata.

Return type:

dict | None

get_latest_version()

Get the latest version of the package.

Returns:

The latest version.

Return type:

str | None

get_sourcecode_url()

Get the URL of the source distribution.

Returns:

The URL of the source distribution.

Return type:

str | None

get_latest_release_upload_time()

Get upload time of the latest release.

Returns:

The upload time of the latest release.

Return type:

str | None

__init__(component, pypi_registry, package_json)
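The accessors above read fields of the PyPI package JSON. "info" holds the latest "version" and the "project_urls" links, and "urls" lists the files of the latest release, each with a "packagetype" (such as "sdist") and an "upload_time". The free functions below are hypothetical sketches of how such accessors could read a downloaded package_json dictionary. Treating the first file's upload time as the release upload time is an assumption.

```python
from __future__ import annotations


def get_latest_version(package_json: dict) -> str | None:
    info = package_json.get("info")
    version = info.get("version") if isinstance(info, dict) else None
    return version if isinstance(version, str) else None


def extract_project_urls(package_json: dict) -> dict | None:
    # Hypothetical helper for the project-links accessor described above.
    info = package_json.get("info")
    links = info.get("project_urls") if isinstance(info, dict) else None
    return links if isinstance(links, dict) else None


def get_sourcecode_url(package_json: dict) -> str | None:
    # The source distribution is the file whose packagetype is "sdist".
    for file_info in package_json.get("urls", []):
        if isinstance(file_info, dict) and file_info.get("packagetype") == "sdist":
            url = file_info.get("url")
            return url if isinstance(url, str) else None
    return None


def get_latest_release_upload_time(package_json: dict) -> str | None:
    # Assumption: the first listed file's upload time stands for the release.
    files = package_json.get("urls")
    if isinstance(files, list) and files and isinstance(files[0], dict):
        upload_time = files[0].get("upload_time")
        return upload_time if isinstance(upload_time, str) else None
    return None
```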