macaron.malware_analyzer.pypi_heuristics.metadata package
Submodules
macaron.malware_analyzer.pypi_heuristics.metadata.anomalous_version module
The heuristic analyzer to check for an anomalous package version.
- class macaron.malware_analyzer.pypi_heuristics.metadata.anomalous_version.AnomalousVersionAnalyzer
Bases:
BaseHeuristicAnalyzer
Analyze the version number (if there is only a single release) to detect if it is anomalous.
A version number is anomalous if any of its values are greater than the epoch or major threshold values. If the version does not adhere to PyPI standards (PEP 440, as per the ‘packaging’ module), this heuristic cannot analyze it.
Calendar versioning is detected as version numbers with the year, month and day present in the following combinations (using the example 11th October 2016):
- YYYY.MM.DD, e.g. 2016.10.11
- YYYY.DD.MM, e.g. 2016.11.10
- YY.DD.MM, e.g. 16.11.10
- YY.MM.DD, e.g. 16.10.11
- MM.DD.YYYY, e.g. 10.11.2016
- DD.MM.YYYY, e.g. 11.10.2016
- DD.MM.YY, e.g. 11.10.16
- MM.DD.YY, e.g. 10.11.16
- YYYYMMDD, e.g. 20161011
- YYYYDDMM, e.g. 20161110
- YYDDMM, e.g. 161110
- YYMMDD, e.g. 161011
- MMDDYYYY, e.g. 10112016
- DDMMYYYY, e.g. 11102016
- DDMMYY, e.g. 111016
- MMDDYY, e.g. 101116
This may be followed by further versioning (e.g. 2016.10.11.5.6.2). This type of versioning is detected based on the date of the upload time for the release, within a threshold number of days (set in the defaults file).
Calendar-semantic versioning is detected as version numbers with the major value as the year (either yyyy or yy), and any other series of numbers following it:
- 2016.7.1 would be version 7.1 of 2016
- 16.1.4 would be version 1.4 of 2016
This type of versioning is detected based on the exact year of the upload time for the release.
All other versionings are detected as semantic versioning.
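The two date-based checks described above can be illustrated with a minimal sketch. It assumes the digits of the version have already been extracted (for example `20161011`), and it is not Macaron's implementation: `matches_upload_date`, `is_calendar_semantic_major` and the default `threshold_days=2` are hypothetical names and values chosen for illustration.

```python
from datetime import datetime, timedelta

# Same format strings as the documented DIGIT_DATE_FORMATS attribute.
DIGIT_DATE_FORMATS = [
    "%Y%m%d", "%Y%d%m", "%d%m%Y", "%m%d%Y",
    "%y%m%d", "%y%d%m", "%d%m%y", "%m%d%y",
]


def matches_upload_date(digits: str, upload_time: datetime, threshold_days: int = 2) -> bool:
    """Return True if `digits` parses as a date close to `upload_time`.

    Hypothetical helper: the real threshold lives in the defaults file.
    """
    for fmt in DIGIT_DATE_FORMATS:
        try:
            parsed = datetime.strptime(digits, fmt)
        except ValueError:
            # The digits do not fit this format; try the next one.
            continue
        if abs(parsed - upload_time) <= timedelta(days=threshold_days):
            return True
    return False


def is_calendar_semantic_major(major: int, upload_time: datetime) -> bool:
    """Return True if `major` equals the upload year as yyyy or yy."""
    return major in (upload_time.year, upload_time.year % 100)


upload = datetime(2016, 10, 11, 12, 0)
print(matches_upload_date("20161011", upload))   # YYYYMMDD form of the upload date
print(matches_upload_date("161011", upload))     # YYMMDD form of the upload date
print(is_calendar_semantic_major(16, upload))    # 16.x.y released in 2016
```

Trying every format in turn keeps the check simple, at the cost of accepting ambiguous dates (such as day and month swapped) whenever any interpretation falls inside the threshold.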
- DIGIT_DATE_FORMATS: list[str] = ['%Y%m%d', '%Y%d%m', '%d%m%Y', '%m%d%Y', '%y%m%d', '%y%d%m', '%d%m%y', '%m%d%y']
- __init__()
- analyze(pypi_package_json)
Analyze the package.
- Parameters:
pypi_package_json (PyPIPackageJsonAsset) – The PyPI package JSON asset object.
- Returns:
The result and related information collected during the analysis.
- Return type:
tuple[HeuristicResult, dict[str, JsonType]]
- Raises:
HeuristicAnalyzerValueError – if there is no release information available.
macaron.malware_analyzer.pypi_heuristics.metadata.closer_release_join_date module
Analyzer that checks whether the maintainers’ join date is close to the package’s latest release date.
- class macaron.malware_analyzer.pypi_heuristics.metadata.closer_release_join_date.CloserReleaseJoinDateAnalyzer
Bases:
BaseHeuristicAnalyzer
Check whether the maintainers’ join date is close to the package’s latest release date.
If the gap between any maintainer’s join date and the latest release date is larger than the threshold, the heuristic is considered a “PASS”.
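As a rough illustration of the pass condition, the sketch below compares each maintainer's join date with the latest release date. It is written under assumptions: `join_date_gap_passes` is a hypothetical helper, the result is a plain boolean rather than a `HeuristicResult`, and the threshold value is made up for the example.

```python
from datetime import datetime, timedelta


def join_date_gap_passes(
    join_dates: list[datetime],
    latest_release: datetime,
    threshold_days: int,
) -> bool:
    """Return True if any maintainer joined more than `threshold_days` before the release."""
    threshold = timedelta(days=threshold_days)
    return any(latest_release - joined > threshold for joined in join_dates)


release = datetime(2024, 6, 1)
# One long-standing maintainer is enough for the check to pass.
print(join_date_gap_passes([datetime(2024, 5, 30), datetime(2020, 1, 1)], release, 5))
# An account created two days before the release does not pass.
print(join_date_gap_passes([datetime(2024, 5, 30)], release, 5))
```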
- __init__()
- analyze(pypi_package_json)
Analyze the package.
- Parameters:
pypi_package_json (PyPIPackageJsonAsset) – The PyPI package JSON asset object.
- Returns:
The result and related information collected during the analysis.
- Return type:
tuple[HeuristicResult, dict[str, JsonType]]
macaron.malware_analyzer.pypi_heuristics.metadata.empty_project_link module
Analyzer that checks whether the package has no project links.
- class macaron.malware_analyzer.pypi_heuristics.metadata.empty_project_link.EmptyProjectLinkAnalyzer
Bases:
BaseHeuristicAnalyzer
Check whether the PyPI package has no project links.
- __init__()
- analyze(pypi_package_json)
Analyze the package.
- Parameters:
pypi_package_json (PyPIPackageJsonAsset) – The PyPI package JSON asset object.
- Returns:
The result and related information collected during the analysis.
- Return type:
tuple[HeuristicResult, dict[str, JsonType]]
macaron.malware_analyzer.pypi_heuristics.metadata.fake_email module
The heuristic analyzer to check the email address of the package maintainers.
- class macaron.malware_analyzer.pypi_heuristics.metadata.fake_email.FakeEmailAnalyzer
Bases:
BaseHeuristicAnalyzer
Analyze the email address of the package maintainers.
- PATTERN = re.compile('\\b # word‑boundary\n [A-Za-z0-9]+ # first alpha‑numeric segment\n (?:\\.[A-Za-z0-9]+)* # optional “.segment” repeats\n @\n [A-Za-z0-9]+ # domain na, re.VERBOSE)
- __init__()
- get_emails(email_field)
Extract emails from the given email field.
- is_valid_email(email)
Check if the email format is valid and the domain has MX records.
- Parameters:
email (str) – The email address to check.
- Returns:
The validated email object if the email is valid, otherwise None.
- Return type:
ValidatedEmail | None
- analyze(pypi_package_json)
Analyze the package.
- Parameters:
pypi_package_json (PyPIPackageJsonAsset) – The PyPI package JSON asset object.
- Returns:
The result and related information collected during the analysis.
- Return type:
tuple[HeuristicResult, dict[str, JsonType]]
macaron.malware_analyzer.pypi_heuristics.metadata.high_release_frequency module
Analyzer that checks the high release frequency heuristic.
- class macaron.malware_analyzer.pypi_heuristics.metadata.high_release_frequency.HighReleaseFrequencyAnalyzer
Bases:
BaseHeuristicAnalyzer
Check whether the release frequency is high.
- __init__()
- analyze(pypi_package_json)
Analyze the package.
- Parameters:
pypi_package_json (PyPIPackageJsonAsset) – The PyPI package JSON asset object.
- Returns:
The result and related information collected during the analysis.
- Return type:
tuple[HeuristicResult, dict[str, JsonType]]
macaron.malware_analyzer.pypi_heuristics.metadata.one_release module
Analyzer that checks whether the package contains only one release.
- class macaron.malware_analyzer.pypi_heuristics.metadata.one_release.OneReleaseAnalyzer
Bases:
BaseHeuristicAnalyzer
Determine if there is only one release of the package.
- __init__()
- analyze(pypi_package_json)
Analyze the package.
- Parameters:
pypi_package_json (PyPIPackageJsonAsset) – The PyPI package JSON asset object.
- Returns:
The result and related information collected during the analysis.
- Return type:
tuple[HeuristicResult, dict[str, JsonType]]
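The underlying idea can be sketched against a `releases` mapping of version strings to file lists, which is the shape the PyPI JSON API uses for its `releases` key. The helper name `has_single_release` is hypothetical, and the sketch returns a boolean instead of the `HeuristicResult` the analyzer reports.

```python
def has_single_release(releases: dict[str, list]) -> bool:
    """Return True if the mapping holds exactly one version."""
    return len(releases) == 1


# Illustrative data only; real entries carry per-file metadata dictionaries.
print(has_single_release({"0.0.1": []}))
print(has_single_release({"1.0.0": [], "1.0.1": []}))
```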
macaron.malware_analyzer.pypi_heuristics.metadata.similar_projects module
This analyzer checks if the package has a similar structure to other packages maintained by the same user.
- class macaron.malware_analyzer.pypi_heuristics.metadata.similar_projects.SimilarProjectAnalyzer
Bases:
BaseHeuristicAnalyzer
Check whether the package has a similar structure to other packages maintained by the same user.
- __init__()
- analyze(pypi_package_json)
Analyze the package.
- Parameters:
pypi_package_json (PyPIPackageJsonAsset) – The PyPI package JSON asset object.
- Returns:
The result and related information collected during the analysis.
- Return type:
tuple[HeuristicResult, dict[str, JsonType]]
- Raises:
HeuristicAnalyzerValueError – if the analysis fails.
- get_url(package_name, package_type='sdist')
Get the URL of the package’s sdist.
- get_structure(package_name)
Get the file structure of the package’s sdist.
macaron.malware_analyzer.pypi_heuristics.metadata.source_code_repo module
The heuristic analyzer to check if a source code repo was found.
- class macaron.malware_analyzer.pypi_heuristics.metadata.source_code_repo.SourceCodeRepoAnalyzer
Bases:
BaseHeuristicAnalyzer
Analyze the accessibility of the source code repository.
Passes if a repository was found and validated by the repo finder, otherwise fails.
- __init__()
- analyze(pypi_package_json)
Analyze the package.
- Parameters:
pypi_package_json (PyPIPackageJsonAsset) – The PyPI package JSON asset object.
- Returns:
The result and related information collected during the analysis.
- Return type:
tuple[HeuristicResult, dict[str, JsonType]]
macaron.malware_analyzer.pypi_heuristics.metadata.typosquatting_presence module
Analyzer that checks for typosquatting in the package name.
- class macaron.malware_analyzer.pypi_heuristics.metadata.typosquatting_presence.TyposquattingPresenceAnalyzer(popular_packages_path=None)
Bases:
BaseHeuristicAnalyzer
Check whether the PyPI package has typosquatting presence.
- KEYBOARD_LAYOUT = {'-': (0, 10), '0': (0, 9), '1': (0, 0), '2': (0, 1), '3': (0, 2), '4': (0, 3), '5': (0, 4), '6': (0, 5), '7': (0, 6), '8': (0, 7), '9': (0, 8), 'a': (2, 0), 'b': (3, 4), 'c': (3, 2), 'd': (2, 2), 'e': (1, 2), 'f': (2, 3), 'g': (2, 4), 'h': (2, 5), 'i': (1, 7), 'j': (2, 6), 'k': (2, 7), 'l': (2, 8), 'm': (3, 6), 'n': (3, 5), 'o': (1, 8), 'p': (1, 9), 'q': (1, 0), 'r': (1, 3), 's': (2, 1), 't': (1, 4), 'u': (1, 6), 'v': (3, 3), 'w': (1, 1), 'x': (3, 1), 'y': (1, 5), 'z': (3, 0)}
- __init__(popular_packages_path=None)
- are_neighbors(first_char, second_char)
Check if two characters are adjacent on a QWERTY keyboard.
Adjacent characters are those that are next to each other either horizontally, vertically, or diagonally.
- substitution_func(first_char, second_char)
Calculate the substitution cost between two characters.
- jaro_distance(package_name, popular_package_name)
Calculate the Jaro distance between two package names.
- ratio(package_name, popular_package_name)
Calculate the Jaro-Winkler distance ratio.
- analyze(pypi_package_json)
Analyze the package.
- Parameters:
pypi_package_json (PyPIPackageJsonAsset) – The PyPI package JSON asset object.
- Returns:
The result and related information collected during the analysis.
- Return type:
tuple[HeuristicResult, dict[str, JsonType]]
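To make the keyboard adjacency and the string comparison more concrete, here is a minimal sketch. The `KEYBOARD_LAYOUT` values are copied from the attribute above, reading each tuple as (row, column). Everything else is an assumption: treating identical or unknown characters as non-neighbors is a choice made for this example, and `jaro_similarity` is the textbook Jaro measure, not Macaron's variant, which also applies a substitution cost via `substitution_func`.

```python
KEYBOARD_LAYOUT = {
    "-": (0, 10), "0": (0, 9), "1": (0, 0), "2": (0, 1), "3": (0, 2), "4": (0, 3),
    "5": (0, 4), "6": (0, 5), "7": (0, 6), "8": (0, 7), "9": (0, 8),
    "a": (2, 0), "b": (3, 4), "c": (3, 2), "d": (2, 2), "e": (1, 2), "f": (2, 3),
    "g": (2, 4), "h": (2, 5), "i": (1, 7), "j": (2, 6), "k": (2, 7), "l": (2, 8),
    "m": (3, 6), "n": (3, 5), "o": (1, 8), "p": (1, 9), "q": (1, 0), "r": (1, 3),
    "s": (2, 1), "t": (1, 4), "u": (1, 6), "v": (3, 3), "w": (1, 1), "x": (3, 1),
    "y": (1, 5), "z": (3, 0),
}


def are_neighbors(first_char: str, second_char: str) -> bool:
    """Return True if the keys touch horizontally, vertically or diagonally."""
    pos1 = KEYBOARD_LAYOUT.get(first_char)
    pos2 = KEYBOARD_LAYOUT.get(second_char)
    if pos1 is None or pos2 is None or pos1 == pos2:
        # Assumption: unknown or identical characters are not neighbors.
        return False
    return abs(pos1[0] - pos2[0]) <= 1 and abs(pos1[1] - pos2[1]) <= 1


def jaro_similarity(s1: str, s2: str) -> float:
    """Textbook Jaro similarity in [0, 1]; 1.0 means identical strings."""
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    # Characters match only if they are within this distance of each other.
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    matched1 = [False] * len(s1)
    matched2 = [False] * len(s2)
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len(s2), i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    out_of_order = 0
    k = 0
    for i, ch in enumerate(s1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if ch != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2
    return (matches / len(s1) + matches / len(s2) + (matches - transpositions) / matches) / 3


print(are_neighbors("q", "w"))
print(round(jaro_similarity("martha", "marhta"), 4))
```

A keyboard-aware variant would lower the penalty when a mismatched pair of characters satisfies `are_neighbors`, which is presumably the role of `substitution_func`.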
macaron.malware_analyzer.pypi_heuristics.metadata.unchanged_release module
Heuristics analyzer to check unchanged content in multiple releases.
- class macaron.malware_analyzer.pypi_heuristics.metadata.unchanged_release.UnchangedReleaseAnalyzer
Bases:
BaseHeuristicAnalyzer
Analyze whether the content of the package is updated by the maintainer.
- __init__()
- analyze(pypi_package_json)
Check whether the content of the releases keeps being updated.
- Parameters:
pypi_package_json (PyPIPackageJsonAsset) – The PyPI package JSON asset object.
- Returns:
The result and related information collected during the analysis.
- Return type:
tuple[HeuristicResult, dict[str, JsonType]]
macaron.malware_analyzer.pypi_heuristics.metadata.wheel_absence module
The heuristic analyzer to check .whl file absence.
- class macaron.malware_analyzer.pypi_heuristics.metadata.wheel_absence.WheelAbsenceAnalyzer
Bases:
BaseHeuristicAnalyzer
Analyze to see if a .whl file is available for the package.
If a package is distributed with a .whl file, this heuristic passes. Otherwise, the heuristic fails.
- INSPECTOR_TEMPLATE = '{inspector_url_scheme}://{inspector_url_netloc}/project/{name}/{version}/packages/{first}/{second}/{rest}/{filename}'
- __init__()
- analyze(pypi_package_json)
Analyze the package.
- Parameters:
pypi_package_json (PyPIPackageJsonAsset) – The PyPI package JSON asset object.
- Returns:
The result and related information collected during the analysis.
- Return type:
tuple[HeuristicResult, dict[str, JsonType]]
- Raises:
HeuristicAnalyzerValueError – if there is no release information, or if other package information is missing.
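The following sketch shows how the wheel check and the `INSPECTOR_TEMPLATE` string could fit together. The template is copied from the attribute above; the helper names and all sample values (host, package name, path segments) are illustrative assumptions. In particular, reading `first`, `second` and `rest` as the hash-based path segments of a PyPI file URL is a guess, not something this page states.

```python
INSPECTOR_TEMPLATE = (
    "{inspector_url_scheme}://{inspector_url_netloc}/project/{name}/{version}"
    "/packages/{first}/{second}/{rest}/{filename}"
)


def has_wheel(filenames: list[str]) -> bool:
    """Return True if any distributed file is a .whl archive."""
    return any(name.endswith(".whl") for name in filenames)


def inspector_link(name: str, version: str, first: str, second: str, rest: str, filename: str) -> str:
    """Fill the template; scheme and host here are assumed values."""
    return INSPECTOR_TEMPLATE.format(
        inspector_url_scheme="https",
        inspector_url_netloc="inspector.pypi.io",
        name=name,
        version=version,
        first=first,
        second=second,
        rest=rest,
        filename=filename,
    )


files = ["example_pkg-1.0.0.tar.gz", "example_pkg-1.0.0-py3-none-any.whl"]
print(has_wheel(files))
print(inspector_link("example-pkg", "1.0.0", "ab", "cd", "ef0123", files[1]))
```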