macaron.malware_analyzer.pypi_heuristics.metadata package

Submodules

macaron.malware_analyzer.pypi_heuristics.metadata.anomalous_version module

The heuristic analyzer to check for an anomalous package version.

class macaron.malware_analyzer.pypi_heuristics.metadata.anomalous_version.AnomalousVersionAnalyzer

Bases: BaseHeuristicAnalyzer

Analyze the version number (if there is only a single release) to detect if it is anomalous.

A version number is anomalous if any of its values are greater than the epoch or major threshold values. If the version does not adhere to PyPI standards (PEP 440, as per the ‘packaging’ module), this heuristic cannot analyze it.

Calendar versioning is detected as version numbers with the year, month and day present in the following combinations (using the example 11th October 2016):

  • YYYY.MM.DD, e.g. 2016.10.11
  • YYYY.DD.MM, e.g. 2016.11.10
  • YY.DD.MM, e.g. 16.11.10
  • YY.MM.DD, e.g. 16.10.11
  • MM.DD.YYYY, e.g. 10.11.2016
  • DD.MM.YYYY, e.g. 11.10.2016
  • DD.MM.YY, e.g. 11.10.16
  • MM.DD.YY, e.g. 10.11.16
  • YYYYMMDD, e.g. 20161011
  • YYYYDDMM, e.g. 20161110
  • YYDDMM, e.g. 161110
  • YYMMDD, e.g. 161011
  • MMDDYYYY, e.g. 10112016
  • DDMMYYYY, e.g. 11102016
  • DDMMYY, e.g. 111016
  • MMDDYY, e.g. 101116

This may be followed by further versioning (e.g. 2016.10.11.5.6.2). This type of versioning is detected based on the date of the upload time for the release within a threshold of a number of days (in the defaults file).

Calendar-semantic versioning is detected as version numbers with the major value as the year (either YYYY or YY), and any other series of numbers following it:

  • 2016.7.1 would be version 7.1 of 2016
  • 16.1.4 would be version 1.4 of 2016

This type of versioning is detected based on the exact year of the upload time for the release.

All other versionings are detected as semantic versioning.
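The date check for digit-only calendar versions described above can be pictured roughly as follows. This is a minimal sketch, not Macaron's implementation: the function name looks_like_calendar_version and the threshold_days parameter are hypothetical, and only the list of format strings mirrors DIGIT_DATE_FORMATS.

```python
from datetime import datetime, timedelta

# Same format strings as the DIGIT_DATE_FORMATS class attribute.
DIGIT_DATE_FORMATS = [
    "%Y%m%d", "%Y%d%m", "%d%m%Y", "%m%d%Y",
    "%y%m%d", "%y%d%m", "%d%m%y", "%m%d%y",
]


def looks_like_calendar_version(digits: str, upload_time: datetime, threshold_days: int) -> bool:
    """Return True if `digits` parses as a date close to `upload_time`.

    Hypothetical helper: `threshold_days` stands in for the value
    that the documentation says lives in the defaults file.
    """
    for fmt in DIGIT_DATE_FORMATS:
        try:
            parsed = datetime.strptime(digits, fmt)
        except ValueError:
            continue  # Not a valid date in this format; try the next one.
        if abs(parsed - upload_time) <= timedelta(days=threshold_days):
            return True
    return False
```

A string such as 20161011 only counts as a calendar version here if the release was uploaded within the threshold of 11th October 2016 (or of another date the string can be read as); the same digits uploaded years later would not match.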

DETAIL_INFO_KEY: str = 'versioning'
DIGIT_DATE_FORMATS: list[str] = ['%Y%m%d', '%Y%d%m', '%d%m%Y', '%m%d%Y', '%y%m%d', '%y%d%m', '%d%m%y', '%m%d%y']
__init__()
analyze(pypi_package_json)

Analyze the package.

Parameters:

pypi_package_json (PyPIPackageJsonAsset) – The PyPI package JSON asset object.

Returns:

The result and related information collected during the analysis.

Return type:

tuple[HeuristicResult, dict[str, JsonType]]

Raises:

HeuristicAnalyzerValueError – if there is no release information available.

class macaron.malware_analyzer.pypi_heuristics.metadata.anomalous_version.Versioning(value)

Bases: Enum

Enum used to assign different versioning methods.

INVALID = 'invalid'
CALENDAR = 'calendar'
CALENDAR_SEMANTIC = 'calendar_semantic'
SEMANTIC = 'semantic'

macaron.malware_analyzer.pypi_heuristics.metadata.closer_release_join_date module

Analyzer checks whether the maintainers’ join date is close to the package’s latest release date.

class macaron.malware_analyzer.pypi_heuristics.metadata.closer_release_join_date.CloserReleaseJoinDateAnalyzer

Bases: BaseHeuristicAnalyzer

Check whether the maintainers’ join date is close to the package’s latest release date.

If, for any maintainer, the duration between the join date and the latest release date is larger than the threshold, the result is “PASS”.
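The comparison can be sketched as below. This is an illustration under stated assumptions rather than the analyzer's code: the function name, the threshold_days parameter, and the choice to measure the gap from join date to latest release date are all hypothetical.

```python
from datetime import datetime, timedelta


def join_date_is_far_enough(join_dates: list[datetime], latest_release: datetime, threshold_days: int) -> bool:
    """Return True ("PASS") if any maintainer joined well before the latest release.

    Hypothetical helper: the gap is assumed to be
    `latest_release - join_date`, compared against a day threshold.
    """
    threshold = timedelta(days=threshold_days)
    return any(latest_release - joined > threshold for joined in join_dates)
```

Under this reading, one long-standing maintainer is enough for the heuristic to pass, even if other maintainers joined shortly before the release.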

__init__()
analyze(pypi_package_json)

Analyze the package.

Parameters:

pypi_package_json (PyPIPackageJsonAsset) – The PyPI package JSON asset object.

Returns:

The result and related information collected during the analysis.

Return type:

tuple[HeuristicResult, dict[str, JsonType]]

macaron.malware_analyzer.pypi_heuristics.metadata.fake_email module

The heuristic analyzer to check the email address of the package maintainers.

class macaron.malware_analyzer.pypi_heuristics.metadata.fake_email.FakeEmailAnalyzer

Bases: BaseHeuristicAnalyzer

Analyze the email address of the package maintainers.

PATTERN = re.compile('\\b            # word‑boundary\n        [A-Za-z0-9]+      # first alpha‑numeric segment\n        (?:\\.[A-Za-z0-9]+)*   # optional “.segment” repeats\n        @\n        [A-Za-z0-9]+      # domain na, re.VERBOSE)
__init__()
get_emails(email_field)

Extract emails from the given email field.

Parameters:

email_field (str) – The email field from which to extract emails.

Returns:

A list of emails extracted from the email field.

Return type:

list[str]
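As a rough illustration of extracting addresses from a free-form field, the sketch below uses a simplified regular expression. The pattern SIMPLE_EMAIL and the function extract_emails are hypothetical stand-ins; they are not the class's PATTERN attribute, whose full definition is not shown here.

```python
import re

# Simplified, hypothetical pattern: dot-separated alphanumeric segments
# on both sides of "@", with at least one dot in the domain.
SIMPLE_EMAIL = re.compile(
    r"\b[A-Za-z0-9]+(?:\.[A-Za-z0-9]+)*@[A-Za-z0-9]+(?:\.[A-Za-z0-9]+)+\b"
)


def extract_emails(email_field: str) -> list[str]:
    """Return every substring of `email_field` that looks like an email address."""
    return SIMPLE_EMAIL.findall(email_field)
```

Because the pattern has no capturing groups, findall returns the full matched addresses in the order they appear.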

is_valid_email(email)

Check if the email format is valid and the domain has MX records.

Parameters:

email (str) – The email address to check.

Returns:

The validated email object if the email is valid, otherwise None.

Return type:

ValidatedEmail | None

analyze(pypi_package_json)

Analyze the package.

Parameters:

pypi_package_json (PyPIPackageJsonAsset) – The PyPI package JSON asset object.

Returns:

The result and related information collected during the analysis.

Return type:

tuple[HeuristicResult, dict[str, JsonType]]

macaron.malware_analyzer.pypi_heuristics.metadata.high_release_frequency module

Analyzer for the high release frequency heuristic.

class macaron.malware_analyzer.pypi_heuristics.metadata.high_release_frequency.HighReleaseFrequencyAnalyzer

Bases: BaseHeuristicAnalyzer

Check whether the release frequency is high.

__init__()
analyze(pypi_package_json)

Analyze the package.

Parameters:

pypi_package_json (PyPIPackageJsonAsset) – The PyPI package JSON asset object.

Returns:

The result and related information collected during the analysis.

Return type:

tuple[HeuristicResult, dict[str, JsonType]]
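The documentation does not state how release frequency is measured. One plausible reading, sketched here purely as an assumption, is to compare the average gap between consecutive uploads with a threshold; the function names and the threshold_days parameter are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional


def average_release_gap(upload_times: list[datetime]) -> Optional[timedelta]:
    """Return the mean gap between consecutive releases, or None if fewer than two."""
    ordered = sorted(upload_times)
    if len(ordered) < 2:
        return None
    # The sum of consecutive gaps equals last minus first.
    return (ordered[-1] - ordered[0]) / (len(ordered) - 1)


def is_high_frequency(upload_times: list[datetime], threshold_days: int) -> bool:
    """Return True if releases are, on average, closer together than the threshold."""
    gap = average_release_gap(upload_times)
    return gap is not None and gap < timedelta(days=threshold_days)
```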

macaron.malware_analyzer.pypi_heuristics.metadata.one_release module

Analyzer checks whether the package contains only one release.

class macaron.malware_analyzer.pypi_heuristics.metadata.one_release.OneReleaseAnalyzer

Bases: BaseHeuristicAnalyzer

Determine if there is only one release of the package.

__init__()
analyze(pypi_package_json)

Analyze the package.

Parameters:

pypi_package_json (PyPIPackageJsonAsset) – The PyPI package JSON asset object.

Returns:

The result and related information collected during the analysis.

Return type:

tuple[HeuristicResult, dict[str, JsonType]]

macaron.malware_analyzer.pypi_heuristics.metadata.similar_projects module

This analyzer checks if the package has a similar structure to other packages maintained by the same user.

class macaron.malware_analyzer.pypi_heuristics.metadata.similar_projects.SimilarProjectAnalyzer

Bases: BaseHeuristicAnalyzer

Check whether the package has a similar structure to other packages maintained by the same user.

__init__()
analyze(pypi_package_json)

Analyze the package.

Parameters:

pypi_package_json (PyPIPackageJsonAsset) – The PyPI package JSON asset object.

Returns:

The result and related information collected during the analysis.

Return type:

tuple[HeuristicResult, dict[str, JsonType]]

Raises:

HeuristicAnalyzerValueError – if the analysis fails.

get_url(package_name, package_type='sdist')

Get the URL of the package’s sdist.

Parameters:
  • package_name (str) – The name of the package.

  • package_type (str) – The package type to retrieve the URL of.

Returns:

The URL of the package’s sdist or None if not found.

Return type:

str | None

get_structure(package_name)

Get the file structure of the package’s sdist.

Parameters:

package_name (str) – The name of the package.

Returns:

The list of files in the package’s sdist.

Return type:

list[str]

get_structure_hash(package_name)

Get the hash of the package’s file structure.

Parameters:

package_name (str) – The name of the package.

Returns:

The hash of the package’s file structure.

Return type:

str
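To illustrate how a file structure can be reduced to a comparable hash, here is a minimal sketch. The use of SHA-256, the sorting step, and the function name structure_hash are assumptions made for the example; the documentation does not say which hash or normalization the analyzer uses.

```python
import hashlib


def structure_hash(file_paths: list[str]) -> str:
    """Return a hex digest that depends only on the set of file paths.

    Sorting first makes the digest independent of archive ordering
    (an assumption for this sketch).
    """
    normalized = "\n".join(sorted(file_paths))
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()
```

Two packages with identical file listings then yield the same digest, which makes similar structures cheap to compare.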

macaron.malware_analyzer.pypi_heuristics.metadata.source_code_repo module

The heuristic analyzer to check if a source code repo was found.

class macaron.malware_analyzer.pypi_heuristics.metadata.source_code_repo.SourceCodeRepoAnalyzer

Bases: BaseHeuristicAnalyzer

Analyze the accessibility of the source code repository.

Passes if a repository was found and validated by the repo finder, otherwise fails.

__init__()
analyze(pypi_package_json)

Analyze the package.

Parameters:

pypi_package_json (PyPIPackageJsonAsset) – The PyPI package JSON asset object.

Returns:

The result and related information collected during the analysis.

Return type:

tuple[HeuristicResult, dict[str, JsonType]]

macaron.malware_analyzer.pypi_heuristics.metadata.typosquatting_presence module

Analyzer checks whether the package name shows signs of typosquatting.

class macaron.malware_analyzer.pypi_heuristics.metadata.typosquatting_presence.TyposquattingPresenceAnalyzer(popular_packages_path=None)

Bases: BaseHeuristicAnalyzer

Check whether the PyPI package has typosquatting presence.

KEYBOARD_LAYOUT = {'-': (0, 10), '0': (0, 9), '1': (0, 0), '2': (0, 1), '3': (0, 2), '4': (0, 3), '5': (0, 4), '6': (0, 5), '7': (0, 6), '8': (0, 7), '9': (0, 8), 'a': (2, 0), 'b': (3, 4), 'c': (3, 2), 'd': (2, 2), 'e': (1, 2), 'f': (2, 3), 'g': (2, 4), 'h': (2, 5), 'i': (1, 7), 'j': (2, 6), 'k': (2, 7), 'l': (2, 8), 'm': (3, 6), 'n': (3, 5), 'o': (1, 8), 'p': (1, 9), 'q': (1, 0), 'r': (1, 3), 's': (2, 1), 't': (1, 4), 'u': (1, 6), 'v': (3, 3), 'w': (1, 1), 'x': (3, 1), 'y': (1, 5), 'z': (3, 0)}
__init__(popular_packages_path=None)
are_neighbors(first_char, second_char)

Check if two characters are adjacent on a QWERTY keyboard.

Adjacent characters are those that are next to each other either horizontally, vertically, or diagonally.

Parameters:
  • first_char (str) – The first character.

  • second_char (str) – The second character.

Returns:

True if the characters are neighbors, False otherwise.

Return type:

bool
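Using (row, column) coordinates like those in KEYBOARD_LAYOUT, adjacency can be sketched as a distance check. This is an illustration only: it uses a small excerpt of the layout, and treating "neighbors" as keys at most one row and one column apart is an assumption about how the check is done.

```python
# Excerpt of the (row, column) coordinates listed in KEYBOARD_LAYOUT.
LAYOUT_EXCERPT = {
    "q": (1, 0), "w": (1, 1), "e": (1, 2), "p": (1, 9),
    "a": (2, 0), "s": (2, 1), "d": (2, 2),
}


def are_neighbors(first_char: str, second_char: str) -> bool:
    """Return True if the two keys touch horizontally, vertically, or diagonally."""
    if first_char not in LAYOUT_EXCERPT or second_char not in LAYOUT_EXCERPT:
        return False
    row1, col1 = LAYOUT_EXCERPT[first_char]
    row2, col2 = LAYOUT_EXCERPT[second_char]
    # Different keys whose row and column each differ by at most one.
    return (row1, col1) != (row2, col2) and max(abs(row1 - row2), abs(col1 - col2)) <= 1
```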

substitution_func(first_char, second_char)

Calculate the substitution cost between two characters.

Parameters:
  • first_char (str) – The first character.

  • second_char (str) – The second character.

Returns:

0.0 if the characters are the same, self.keyboard if they are neighbors on a QWERTY keyboard, otherwise self.cost.

Return type:

float

jaro_distance(package_name, popular_package_name)

Calculate the Jaro distance between two package names.

Parameters:
  • package_name (str) – The name of the package being analyzed.

  • popular_package_name (str) – The name of a popular package to compare against.

Returns:

The Jaro distance between the two package names.

Return type:

float

ratio(package_name, popular_package_name)

Calculate the Jaro-Winkler distance ratio.

Parameters:
  • package_name (str) – The name of the package being analyzed.

  • popular_package_name (str) – The name of a popular package to compare against.

Returns:

The Jaro-Winkler distance ratio, incorporating a prefix bonus for common initial characters.

Return type:

float
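For orientation, the textbook Jaro and Jaro-Winkler similarity can be sketched as below. This is the standard formulation, not necessarily what the analyzer computes: it ignores the keyboard-aware substitution cost described above, and the prefix scale of 0.1 and maximum prefix length of 4 are the conventional defaults, assumed here.

```python
def jaro_similarity(s1: str, s2: str) -> float:
    """Textbook Jaro similarity in [0, 1]; 1.0 means identical strings."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, char in enumerate(s1):
        # Look for an unmatched equal character within the match window.
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == char:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    out_of_order = 0
    k = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2
    return (matches / len1 + matches / len2 + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, scale: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity plus a bonus for a shared prefix (assumed defaults)."""
    base = jaro_similarity(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return base + prefix * scale * (1.0 - base)
```

The prefix bonus means names that agree at the start score higher than names with the same edits near the beginning, which suits typosquats that mimic the opening of a popular name.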

analyze(pypi_package_json)

Analyze the package.

Parameters:

pypi_package_json (PyPIPackageJsonAsset) – The PyPI package JSON asset object.

Returns:

The result and related information collected during the analysis.

Return type:

tuple[HeuristicResult, dict[str, JsonType]]

macaron.malware_analyzer.pypi_heuristics.metadata.unchanged_release module

Heuristic analyzer to check for unchanged content across multiple releases.

class macaron.malware_analyzer.pypi_heuristics.metadata.unchanged_release.UnchangedReleaseAnalyzer

Bases: BaseHeuristicAnalyzer

Analyze whether the content of the package is updated by the maintainer.

__init__()
analyze(pypi_package_json)

Check whether the content of the releases keeps being updated.

Parameters:

pypi_package_json (PyPIPackageJsonAsset) – The PyPI package JSON asset object.

Returns:

The result and related information collected during the analysis.

Return type:

tuple[HeuristicResult, dict[str, JsonType]]
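One way to picture an unchanged-content check is to compare content digests across releases. The sketch below assumes each release is represented by a single digest string; that representation and the function name are hypothetical, since the documentation does not describe how content is compared.

```python
def has_unchanged_releases(release_digests: list[str]) -> bool:
    """Return True if at least two releases share the same content digest.

    Hypothetical helper: `release_digests` holds one digest per release.
    """
    return len(set(release_digests)) < len(release_digests)
```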

macaron.malware_analyzer.pypi_heuristics.metadata.wheel_absence module

The heuristic analyzer to check .whl file absence.

class macaron.malware_analyzer.pypi_heuristics.metadata.wheel_absence.WheelAbsenceAnalyzer

Bases: BaseHeuristicAnalyzer

Analyze to see if a .whl file is available for the package.

If a package is distributed with a .whl file, this heuristic passes. Otherwise, the heuristic fails.
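The pass/fail rule can be illustrated with a short sketch. It assumes the release files are available as dictionaries carrying a packagetype key, as in the file entries of the PyPI JSON API; the function name has_wheel is hypothetical and the analyzer's own data access is not shown.

```python
WHEEL = "bdist_wheel"  # Same value as the class attribute WHEEL.


def has_wheel(release_files: list[dict]) -> bool:
    """Return True if any file entry of the release is a wheel distribution."""
    return any(entry.get("packagetype") == WHEEL for entry in release_files)
```

A release that ships only an sdist entry would fail this check, while a release with at least one bdist_wheel entry would pass.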

WHEEL: str = 'bdist_wheel'
INSPECTOR_TEMPLATE = '{inspector_url_scheme}://{inspector_url_netloc}/project/{name}/{version}/packages/{first}/{second}/{rest}/{filename}'
__init__()
analyze(pypi_package_json)

Analyze the package.

Parameters:

pypi_package_json (PyPIPackageJsonAsset) – The PyPI package JSON asset object.

Returns:

The result and related information collected during the analysis.

Return type:

tuple[HeuristicResult, dict[str, JsonType]]

Raises:

HeuristicAnalyzerValueError – if there is no release information, or if other package information is missing.