macaron.code_analyzer.dataflow_analysis package

Submodules

macaron.code_analyzer.dataflow_analysis.analysis module

Entry points to perform and use the dataflow analysis.

macaron.code_analyzer.dataflow_analysis.analysis.analyse_github_workflow_file(workflow_path, repo_path, dump_debug=False)

Perform dataflow analysis for a GitHub Actions workflow file.

Parameters:
  • workflow_path (str) – The path to the workflow file.

  • repo_path (str | None) – The path to the repo.

  • dump_debug (bool) – Whether to output a debug dot file (in the current working directory).

Returns:

Graph representation of the workflow and analysis results.

Return type:

core.Node

macaron.code_analyzer.dataflow_analysis.analysis.analyse_github_workflow(workflow, workflow_source_path, repo_path, dump_debug=False)

Perform dataflow analysis for a GitHub Actions workflow.

Parameters:
  • workflow (github_workflow_model.Workflow) – The workflow.

  • workflow_source_path (str) – The source path for the workflow.

  • repo_path (str | None) – The path to the repo.

  • dump_debug (bool) – Whether to output a debug dot file (in the current working directory).

Returns:

Graph representation of the workflow and analysis results.

Return type:

core.Node

macaron.code_analyzer.dataflow_analysis.analysis.analyse_bash_script(bash_content, source_path, repo_path, dump_debug=False)

Perform dataflow analysis for a Bash script.

Parameters:
  • bash_content (str) – The Bash script content.

  • source_path (str) – The source path for the Bash script.

  • repo_path (str | None) – The path to the repo.

  • dump_debug (bool) – Whether to output a debug dot file (in the current working directory).

Returns:

Graph representation of the Bash script and analysis results.

Return type:

core.Node

class macaron.code_analyzer.dataflow_analysis.analysis.FindSecretsVisitor(workflow_var_scope)

Bases: object

Visitor to find references to GitHub secrets in analysis expressions.

__init__(workflow_var_scope)

Construct a visitor to find secrets.

Parameters:

workflow_var_scope (facts.Scope) – Scope in which secrets may be found.

workflow_var_scope: Scope

Scope in which secrets may be found.

secrets: set[str]

Found secret variable names, populated by running the visitor.

visit_value(value)

Search value expression for secrets.

Return type:

None

visit_location(location)

Search location expression for secrets.

Return type:

None

visit_location_specifier(location)

Search location expression for secrets.

Return type:

None
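
The search performed by this visitor follows the classic visitor pattern over value expressions. As a rough illustration (the Literal, Read, and Concat classes below are hypothetical stand-ins, not Macaron's real facts types), a visitor of this shape recurses through an expression and records every read from the secrets scope:

```python
from dataclasses import dataclass

@dataclass
class Literal:
    """A constant string fragment."""
    text: str

@dataclass
class Read:
    """A read of a variable from a named scope."""
    scope: str   # e.g. "secrets" for the GitHub secrets scope
    name: str

@dataclass
class Concat:
    """Concatenation of sub-expressions."""
    parts: list

class SecretsVisitor:
    def __init__(self, secret_scope: str) -> None:
        self.secret_scope = secret_scope
        self.secrets: set[str] = set()  # populated by running the visitor

    def visit_value(self, value) -> None:
        if isinstance(value, Read) and value.scope == self.secret_scope:
            self.secrets.add(value.name)
        elif isinstance(value, Concat):
            for part in value.parts:
                self.visit_value(part)

expr = Concat([Literal("token="), Read("secrets", "GH_TOKEN")])
visitor = SecretsVisitor("secrets")
visitor.visit_value(expr)
print(sorted(visitor.secrets))  # ['GH_TOKEN']
```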

macaron.code_analyzer.dataflow_analysis.analysis.get_reachable_secrets(bash_cmd_node)

Get GitHub secrets that are reachable at a bash command.

Parameters:

bash_cmd_node (bash.BashSingleCommandNode) – The target Bash command node.

Returns:

The set of reachable secret variable names.

Return type:

set[str]

macaron.code_analyzer.dataflow_analysis.analysis.get_containing_github_job(node, parents)

Return the GitHub job node containing the given node, if any.

Parameters:
  • node (core.Node) – The target node.

  • parents (dict[core.Node, core.Node]) – The mapping of nodes to their parent nodes.

Returns:

The containing job node, or None if there is no containing job.

Return type:

github.GitHubActionsNormalJobNode | None

macaron.code_analyzer.dataflow_analysis.analysis.get_containing_github_step(node, parents)

Return the GitHub step node containing the given node, if any.

Parameters:
  • node (core.Node) – The target node.

  • parents (dict[core.Node, core.Node]) – The mapping of nodes to their parent nodes.

Returns:

The containing step node, or None if there is no containing step.

Return type:

github.GitHubActionsRunStepNode | None

macaron.code_analyzer.dataflow_analysis.analysis.get_containing_github_workflow(node, parents)

Return the GitHub workflow node containing the given node, if any.

Parameters:
  • node (core.Node) – The target node.

  • parents (dict[core.Node, core.Node]) – The mapping of nodes to their parent nodes.

Returns:

The containing workflow node, or None if there is no containing workflow.

Return type:

github.GitHubActionsWorkflowNode | None

macaron.code_analyzer.dataflow_analysis.analysis.get_build_tool_commands(nodes, build_tool)

Traverse the callgraph and find all the reachable build tool commands.

This generator yields build tool command objects in sorted order to ensure deterministic behavior. The objects are sorted by the string representation of the build tool object.

Parameters:
  • nodes (core.NodeForest) – The callgraph reachable from the CI workflows.

  • build_tool (BaseBuildTool) – The corresponding build tool for which shell commands need to be detected.

Yields:

BuildToolCommand – The object that contains the build command as well as useful contextual information.

Return type:

Iterable[BuildToolCommand]
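
The deterministic ordering described above can be sketched as follows. This is a simplified stand-in: real build tool matching is richer than a command-name check, and BuildToolCommand carries more context than a bare argument list:

```python
def get_build_tool_commands(commands, build_tool_names):
    """Yield the matching build tool commands in a deterministic (sorted) order."""
    matched = [cmd for cmd in commands if cmd and cmd[0] in build_tool_names]
    yield from sorted(matched, key=str)  # sort by string representation

cmds = [["mvn", "package"], ["ls", "-la"], ["mvn", "deploy"]]
result = list(get_build_tool_commands(cmds, {"mvn"}))
print(result)  # [['mvn', 'deploy'], ['mvn', 'package']]
```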

macaron.code_analyzer.dataflow_analysis.analysis.get_ci_events_from_workflow(workflow)

Get the CI events that trigger the GitHub Action workflow.

Parameters:

workflow (github_workflow_model.Workflow) – The target GitHub Action workflow.

Returns:

The list of event names.

Return type:

list[str]
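
For illustration, a parsed workflow can be treated as a plain mapping and the trigger names read from its `on` section. This sketch uses a dict in place of github_workflow_model.Workflow and handles the three YAML shapes the `on` key can take (a single string, a list, or a mapping):

```python
def get_ci_events(workflow: dict) -> list[str]:
    """Return the trigger event names from a workflow's `on` section."""
    trigger = workflow.get("on", {})
    if isinstance(trigger, str):  # on: push
        return [trigger]
    return list(trigger)          # on: [push, ...] or on: {push: ..., ...}

wf = {"name": "CI", "on": {"push": {"branches": ["main"]}, "pull_request": {}}}
print(get_ci_events(wf))  # ['push', 'pull_request']
```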

macaron.code_analyzer.dataflow_analysis.bash module

Dataflow analysis implementation for analysing Bash shell scripts.

class macaron.code_analyzer.dataflow_analysis.bash.BashExit

Bases: ExitType

Exit type for Bash exit statement.

class macaron.code_analyzer.dataflow_analysis.bash.BashReturn

Bases: ExitType

Exit type for returning from a Bash function.

class macaron.code_analyzer.dataflow_analysis.bash.BashScriptContext(outer_context, filesystem, env, func_decls, stdin_scope, stdin_loc, stdout_scope, stdout_loc, source_filepath)

Bases: Context

Context for a Bash script.

outer_context: Union[OwningContextRef[GitHubActionsStepContext], NonOwningContextRef[GitHubActionsStepContext], OwningContextRef[BashScriptContext], NonOwningContextRef[BashScriptContext], OwningContextRef[AnalysisContext], NonOwningContextRef[AnalysisContext]]

Outer context, which may be a GitHub run step, another Bash script that ran this script, or just the outermost analysis context if analysing the script in isolation.

filesystem: Union[OwningContextRef[Scope], NonOwningContextRef[Scope]]

Scope for filesystem used by the script.

env: Union[OwningContextRef[Scope], NonOwningContextRef[Scope]]

Scope for env variables within the script.

func_decls: Union[OwningContextRef[Scope], NonOwningContextRef[Scope]]

Scope for defined functions within the script.

stdin_scope: Union[OwningContextRef[Scope], NonOwningContextRef[Scope]]

Scope for the stdin attached to the Bash process.

stdin_loc: LocationSpecifier

Location for the stdin attached to the Bash process.

stdout_scope: Union[OwningContextRef[Scope], NonOwningContextRef[Scope]]

Scope for the stdout attached to the Bash process.

stdout_loc: LocationSpecifier

Location for the stdout attached to the Bash process.

source_filepath: str

Filepath for Bash script file.

static create_from_run_step(context, source_filepath)

Create a new Bash script context (for being called from a GitHub step) and its associated scopes.

Reuses the filesystem and stdout scopes from the outer context; the env scope inherits from the outer context.

Parameters:
Returns:

The new Bash script context.

Return type:

BashScriptContext

static create_from_bash_script(context, source_filepath)

Create a new Bash script context (for being called from another Bash script) and its associated scopes.

Reuses the filesystem, stdin, and stdout scopes from the outer context; the env scope inherits from the outer context.

Parameters:
  • context (core.ContextRef[BashScriptContext]) – Outer Bash script context.

  • source_filepath (str) – Filepath of Bash script file.

Returns:

The new Bash script context.

Return type:

BashScriptContext

static create_in_isolation(context, source_filepath)

Create a new Bash script context (for being analysed in isolation) and its associated scopes.

Parameters:
  • context (core.ContextRef[core.AnalysisContext]) – Outer analysis context.

  • source_filepath (str) – Filepath of Bash script file.

Returns:

The new Bash script context.

Return type:

BashScriptContext

with_stdin(stdin_scope, stdin_loc)

Return a modified bash script context with the given stdin.

Return type:

BashScriptContext

with_stdout(stdout_scope, stdout_loc)

Return a modified bash script context with the given stdout.

Return type:

BashScriptContext

get_containing_github_context()

Return the (possibly transitive) containing GitHub step context, if there is one.

Return type:

GitHubActionsStepContext | None

get_containing_analysis_context()

Return the (possibly transitive) containing analysis context.

Return type:

AnalysisContext

direct_refs()

Yield the direct references of the context, either to scopes or to other contexts.

Return type:

Iterator[Union[OwningContextRef[Context], NonOwningContextRef[Context], OwningContextRef[Scope], NonOwningContextRef[Scope]]]

__init__(outer_context, filesystem, env, func_decls, stdin_scope, stdin_loc, stdout_scope, stdout_loc, source_filepath)

class macaron.code_analyzer.dataflow_analysis.bash.RawBashScriptNode(script, context)

Bases: InterpretationNode

Interpretation node representing a Bash script (with the script as an unparsed string value).

Defines how to resolve and parse the Bash script content and generate the analysis representation.

__init__(script, context)

Initialize Bash script node.

Parameters:
  • script (facts.Value) – Value for Bash script content (as a string).

  • context (core.ContextRef[BashScriptContext]) – Bash script context.

script: facts.Value

Value for Bash script content (as a string).

context: core.ContextRef[BashScriptContext]

Bash script context.

identify_interpretations(state)

Interpret the Bash script to resolve and parse the Bash script content and generate the analysis representation.

Return type:

dict[InterpretationKey, Callable[[], Node]]

get_exit_state_transfer_filter()

Return state transfer filter to clear scopes owned by this node after this node exits.

Return type:

StateTransferFilter

get_printable_properties_table()

Return a properties table containing the scopes.

Return type:

dict[str, set[tuple[str | None, str]]]

class macaron.code_analyzer.dataflow_analysis.bash.BashScriptNode(definition, stmts, context)

Bases: ControlFlowGraphNode

Control-flow-graph node representing a Bash script.

Control flow structure consists of a sequence of Bash statements. Note that this can model complex control flow with branching, loops, etc. because those control flow constructs will be statement nodes with their own control flow nested within.

Control flow that cuts across multiple levels, such as an exit statement within an if-statement branch that causes the entire script to exit early, is modelled using the alternate exits mechanism: an exit statement creates a BashExit exit state; in each enclosing control-flow construct, the successor of a child node's BashExit exit is an early BashExit exit of that construct; and so on up to this node, where it becomes an early normal exit, so that the caller of this script proceeds as normal after the script exits.
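
A rough sketch of this propagation, with hypothetical names and a tuple-based successor encoding that is not Macaron's real API: each enclosing construct forwards a child's BashExit exit upward, and the script node finally converts it into a normal exit for the caller:

```python
# Illustrative sketch of the alternate exits mechanism.
NORMAL, BASH_EXIT = "normal", "bash_exit"

def block_successor(stmts, idx, exit_type):
    """Successor inside a nested block: a BashExit propagates upward."""
    if exit_type == BASH_EXIT:
        return ("exit", BASH_EXIT)       # early exit of the enclosing construct
    if idx + 1 < len(stmts):
        return ("node", stmts[idx + 1])  # fall through to the next statement
    return ("exit", NORMAL)

def script_successor(stmts, idx, exit_type):
    """At the script level a BashExit becomes an early *normal* exit."""
    if exit_type == BASH_EXIT:
        return ("exit", NORMAL)          # the caller proceeds as usual afterwards
    return block_successor(stmts, idx, exit_type)

stmts = ["echo start", "exit 1", "echo unreachable"]
print(script_successor(stmts, 1, BASH_EXIT))  # ('exit', 'normal')
```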

__init__(definition, stmts, context)

Initialize Bash script node.

Typically, construction should be done via the create function rather than using this constructor directly.

Parameters:
definition: bashparser_model.File

Parsed Bash script AST.

stmts: list[BashStatementNode]

Statement nodes in execution order.

context: core.ContextRef[BashScriptContext]

Bash script context.

children()

Yield the nodes in the sequence.

Return type:

Iterator[Node]

get_entry()

Return the entry node, the first statement in the sequence.

Return type:

Node

get_successors(node, exit_type)

Return the successor for a given node.

Returns the next node in the sequence, or the exit in the case of the last node, or an early exit in the case of a BashExit or BashReturn exit type.

Return type:

set[Node | ExitType]

get_exit_state_transfer_filter()

Return state transfer filter to clear scopes owned by this node after this node exits.

Return type:

StateTransferFilter

get_printable_properties_table()

Return a properties table containing the scopes.

Return type:

dict[str, set[tuple[str | None, str]]]

static create(script, context)

Create Bash script node from Bash script AST.

Parameters:
Return type:

BashScriptNode

class macaron.code_analyzer.dataflow_analysis.bash.BashBlockNode(definition, stmts, context)

Bases: ControlFlowGraphNode

Control-flow-graph node representing a Bash block.

Control flow structure consists of a sequence of Bash statements.

__init__(definition, stmts, context)

Initialize Bash block node.

Typically, construction should be done via the create function rather than using this constructor directly.

Parameters:
definition: bashparser_model.Block | list[bashparser_model.Stmt]

Parsed block AST or list of statement ASTs.

stmts: list[BashStatementNode]

Statement nodes in execution order.

context: core.ContextRef[BashScriptContext]

Bash script context.

children()

Yield the nodes in the sequence.

Return type:

Iterator[Node]

get_entry()

Return the entry node, the first statement in the sequence.

Return type:

Node

get_successors(node, exit_type)

Return the successor for a given node.

Returns the next node in the sequence, or the exit in the case of the last node, or a propagated early exit of the same type in the case of a BashExit or BashReturn exit type.

Return type:

set[Node | ExitType]

get_exit_state_transfer_filter()

Return state transfer filter to clear scopes owned by this node after this node exits.

Return type:

StateTransferFilter

get_printable_properties_table()

Return a properties table containing the line number and scopes.

Return type:

dict[str, set[tuple[str | None, str]]]

static create(script, context)

Create Bash block node from block AST or list of statement ASTs.

Parameters:
Return type:

BashBlockNode

class macaron.code_analyzer.dataflow_analysis.bash.BashFuncCallNode(call_definition, func_definition, block, context)

Bases: ControlFlowGraphNode

Control-flow-graph node representing a call to a Bash function.

Control flow structure consists of a single block containing the function body.

__init__(call_definition, func_definition, block, context)

Initialize Bash function call node.

Parameters:
call_definition: bashparser_model.Stmt

The parsed AST of the callsite statement.

func_definition: bashparser_model.FuncDecl

The parsed AST of the function declaration.

block: BashBlockNode

Node representing the function body.

context: core.ContextRef[BashScriptContext]

Bash script context.

children()

Yield the function body block node.

Return type:

Iterator[Node]

get_entry()

Return the function body block node.

Return type:

Node

get_successors(node, exit_type)

Return the successor for a given node.

Returns the next node in the sequence or the exit in the case of the last node, or an early exit in the case of a BashReturn exit type, or a propagated early BashExit exit in the case of a BashExit exit type.

Return type:

set[Node | ExitType]

get_exit_state_transfer_filter()

Return state transfer filter to clear scopes owned by this node after this node exits.

Return type:

StateTransferFilter

get_printable_properties_table()

Return a properties table.

Contains the line number of the callsite, the line number of the function declaration, and the scopes.

Return type:

dict[str, set[tuple[str | None, str]]]

macaron.code_analyzer.dataflow_analysis.bash.get_stdout_redirects(stmt, context)

Extract the stdout redirects specified on the statement as a set of location expressions.

Return type:

set[Location]

class macaron.code_analyzer.dataflow_analysis.bash.BashStatementNode(definition, context)

Bases: InterpretationNode

Interpretation node representing any kind of Bash statement.

Defines how to interpret the different kinds of statements and generate the appropriate analysis representation.

__init__(definition, context)

Initialize statement node.

definition: bashparser_model.Stmt

The parsed statement AST.

context: core.ContextRef[BashScriptContext]

Bash script context.

identify_interpretations(state)

Interpret the different kinds of statements and generate the appropriate analysis representation.

Return type:

dict[InterpretationKey, Callable[[], Node]]

get_exit_state_transfer_filter()

Return state transfer filter to clear scopes owned by this node after this node exits.

Return type:

StateTransferFilter

get_printable_properties_table()

Return a properties table containing the line number and scopes.

Return type:

dict[str, set[tuple[str | None, str]]]

class macaron.code_analyzer.dataflow_analysis.bash.BashIfClauseNode(definition, cond_stmts, then_stmts, else_stmts, context)

Bases: ControlFlowGraphNode

Control-flow-graph node representing a Bash if statement.

Control flow structure consists of executing the statements of the condition, followed by a branch to execute either the then node or the else node (or, if there is no else node, to exit immediately). The analysis is not path sensitive, so both branches are always considered possible regardless of the condition.
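
Path insensitivity means that after the if statement the analysis keeps the union of the facts from both branches. A minimal sketch of such a join over states, where each location maps to a set of possible values:

```python
def join(state_a: dict, state_b: dict) -> dict:
    """Union the possible values for every location across two branch states."""
    out: dict[str, set[str]] = {}
    for loc in state_a.keys() | state_b.keys():
        out[loc] = state_a.get(loc, set()) | state_b.get(loc, set())
    return out

then_state = {"env:MODE": {"release"}}
else_state = {"env:MODE": {"debug"}, "env:EXTRA": {"1"}}
after_if = join(then_state, else_state)
# both branch values for env:MODE survive; env:EXTRA may or may not be set
```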

__init__(definition, cond_stmts, then_stmts, else_stmts, context)

Initialize Bash if statement node.

Typically, construction should be done via the create function rather than using this constructor directly.

Parameters:
definition: bashparser_model.IfClause

Parsed if statement AST.

cond_stmts: BashBlockNode

Block node to execute the condition.

then_stmts: BashBlockNode

Block node for the case where the condition is true.

else_stmts: BashBlockNode | BashIfClauseNode | None

Node for the case where the condition is false, if any (will be another if node in the case of an elif).

context: core.ContextRef[BashScriptContext]

Bash script context.

children()

Yield the condition node, then node and (if present) else node.

Return type:

Iterator[Node]

get_entry()

Return the entry node (the condition node).

Return type:

Node

get_successors(node, exit_type)

Return the successor for a given node.

Returns a propagated early exit of the same type in the case of a BashExit or BashReturn exit type.

Return type:

set[Node | ExitType]

get_exit_state_transfer_filter()

Return state transfer filter to clear scopes owned by this node after this node exits.

Return type:

StateTransferFilter

get_printable_properties_table()

Return a properties table containing the line number and scopes.

Return type:

dict[str, set[tuple[str | None, str]]]

static create(if_stmt, context)

Create a Bash if statement node from if statement AST.

Parameters:
Return type:

BashIfClauseNode

class macaron.code_analyzer.dataflow_analysis.bash.BashForClauseNode(definition, init_stmts, cond_stmts, body_stmts, post_stmts, context)

Bases: ControlFlowGraphNode

Control-flow-graph node representing a Bash for statement.

Control flow structure consists of executing the statements of the condition, followed by a branch to execute or skip the loop body node. The analysis is not path sensitive, so both branches are always considered possible regardless of the condition.

TODO: Currently doesn’t actually model the loop back edge (need more testing to be confident of analysis termination in the presence of loops).

__init__(definition, init_stmts, cond_stmts, body_stmts, post_stmts, context)

Initialize Bash for statement node.

Typically, construction should be done via the create function rather than using this constructor directly.

Parameters:
definition: bashparser_model.ForClause

Parsed for statement AST.

init_stmts: BashBlockNode | None

Block node to execute the initializer.

cond_stmts: BashBlockNode | None

Block node to execute the condition.

body_stmts: BashBlockNode

Block node for the loop body.

post_stmts: BashBlockNode | None

Block node to execute the post.

context: core.ContextRef[BashScriptContext]

Bash script context.

children()

Yield the initializer, condition, body and post nodes.

Return type:

Iterator[Node]

get_entry()

Return the entry node.

Return type:

Node

get_successors(node, exit_type)

Return the successor for a given node.

Returns a propagated early exit of the same type in the case of a BashExit or BashReturn exit type.

Return type:

set[Node | ExitType]

get_exit_state_transfer_filter()

Return state transfer filter to clear scopes owned by this node after this node exits.

Return type:

StateTransferFilter

get_printable_properties_table()

Return a properties table containing the line number and scopes.

Return type:

dict[str, set[tuple[str | None, str]]]

static create(for_stmt, context)

Create a Bash for statement node from for statement AST.

Parameters:
Return type:

BashForClauseNode

class macaron.code_analyzer.dataflow_analysis.bash.BashPipeContext(bash_script_context, pipe_scope, pipe_loc)

Bases: Context

Context for a Bash pipe operation.

Introduces a scope and location to represent the pipe itself connecting the piped commands, where output from the piped-from command is written prior to being read as input by the piped-to command.
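
A minimal sketch of the mechanism (names hypothetical): the pipe contributes a fresh location, the left command's stdout facts are written to it, and the right command reads the same location as its stdin:

```python
# Abstract state: locations mapped to sets of possible string values.
state: dict[str, set[str]] = {}

pipe_loc = "pipe:1"             # fresh location from the pipe's own scope
state[pipe_loc] = {"hello\n"}   # lhs: `echo hello` writes its stdout to the pipe
stdin_of_rhs = state[pipe_loc]  # rhs: e.g. `cat` reads its stdin from the pipe
print(stdin_of_rhs)
```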

bash_script_context: Union[OwningContextRef[BashScriptContext], NonOwningContextRef[BashScriptContext]]

Outer Bash script context.

pipe_scope: Union[OwningContextRef[Scope], NonOwningContextRef[Scope]]

Scope for pipe.

pipe_loc: LocationSpecifier

Location for pipe.

static create(context)

Create a new pipe context and its associated scope.

Return type:

BashPipeContext

direct_refs()

Yield the direct references of the context, either to scopes or to other contexts.

Return type:

Iterator[Union[OwningContextRef[Context], NonOwningContextRef[Context], OwningContextRef[Scope], NonOwningContextRef[Scope]]]

__init__(bash_script_context, pipe_scope, pipe_loc)

class macaron.code_analyzer.dataflow_analysis.bash.BashPipeNode(definition, lhs, rhs, context)

Bases: ControlFlowGraphNode

Control flow node representing a Bash pipe (“|”) binary command.

Control flow structure consists of executing the left-hand side, followed by the right-hand side. A pipe scope and location are introduced to model the piping of the output from the first command to the input of the second command.

__init__(definition, lhs, rhs, context)

Initialize Bash pipe node.

Typically, construction should be done via the create function rather than using this constructor directly.

Parameters:
definition: bashparser_model.BinaryCmd

Parsed pipe binary command AST.

lhs: BashStatementNode

Left-hand side (first) command.

rhs: BashStatementNode

Right-hand side (second) command.

context: core.ContextRef[BashPipeContext]

Pipe context.

children()

Yield the subcommands.

Return type:

Iterator[Node]

get_entry()

Return the entry node (the lhs node).

Return type:

Node

get_successors(node, exit_type)

Return the successor for a given node.

Returns a propagated early exit of the same type in the case of a BashExit or BashReturn exit type.

Return type:

set[Node | ExitType]

get_exit_state_transfer_filter()

Return state transfer filter to clear scopes owned by this node after this node exits.

Return type:

StateTransferFilter

get_printable_properties_table()

Return a properties table containing the line number and scopes.

Return type:

dict[str, set[tuple[str | None, str]]]

static create(pipe_cmd, context)

Create Bash pipe node from pipe binary command AST.

Parameters:
Return type:

BashPipeNode

class macaron.code_analyzer.dataflow_analysis.bash.BashAndNode(definition, lhs, rhs, context)

Bases: ControlFlowGraphNode

Control flow node representing a Bash AND (“&&”) binary command.

Control flow structure consists of executing the left-hand side, followed by the right-hand side.

(TODO model short circuit?)

__init__(definition, lhs, rhs, context)

Initialize Bash and node.

Typically, construction should be done via the create function rather than using this constructor directly.

Parameters:
definition: bashparser_model.BinaryCmd

Parsed AND binary command AST.

lhs: BashStatementNode

Left-hand side (first) command.

rhs: BashStatementNode

Right-hand side (second) command.

context: core.ContextRef[BashScriptContext]

Bash script context.

children()

Yield the subcommands.

Return type:

Iterator[Node]

get_entry()

Return the entry node (the lhs node).

Return type:

Node

get_successors(node, exit_type)

Return the successor for a given node.

Returns a propagated early exit of the same type in the case of a BashExit or BashReturn exit type.

Return type:

set[Node | ExitType]

get_exit_state_transfer_filter()

Return state transfer filter to clear scopes owned by this node after this node exits.

Return type:

StateTransferFilter

get_printable_properties_table()

Return a properties table containing the line number and scopes.

Return type:

dict[str, set[tuple[str | None, str]]]

static create(and_cmd, context)

Create Bash and node from AND binary command AST.

Parameters:
Return type:

BashAndNode

class macaron.code_analyzer.dataflow_analysis.bash.BashOrNode(definition, lhs, rhs, context)

Bases: ControlFlowGraphNode

Control flow node representing a Bash OR (“||”) binary command.

Control flow structure consists of executing the left-hand side, followed by the right-hand side.

(TODO model short circuit?)

__init__(definition, lhs, rhs, context)

Initialize Bash OR node.

Typically, construction should be done via the create function rather than using this constructor directly.

Parameters:
definition: bashparser_model.BinaryCmd

Parsed OR binary command AST.

lhs: BashStatementNode

Left-hand side (first) command.

rhs: BashStatementNode

Right-hand side (second) command.

context: core.ContextRef[BashScriptContext]

Bash script context.

children()

Yield the subcommands.

Return type:

Iterator[Node]

get_entry()

Return the entry node (the lhs node).

Return type:

Node

get_successors(node, exit_type)

Return the successor for a given node.

Returns a propagated early exit of the same type in the case of a BashExit or BashReturn exit type.

Return type:

set[Node | ExitType]

get_exit_state_transfer_filter()

Return state transfer filter to clear scopes owned by this node after this node exits.

Return type:

StateTransferFilter

get_printable_properties_table()

Return a properties table containing the line number and scopes.

Return type:

dict[str, set[tuple[str | None, str]]]

static create(or_cmd, context)

Create Bash OR node from OR binary command AST.

Parameters:
Return type:

BashOrNode

class macaron.code_analyzer.dataflow_analysis.bash.BashSingleCommandNode(definition, context, cmd, args, stdout_redirects)

Bases: InterpretationNode

Interpretation node representing a single Bash command.

Defines how to interpret the semantics of the different supported commands that may be invoked.

__init__(definition, context, cmd, args, stdout_redirects)

Initialize Bash single command node.

Parameters:
definition: bashparser_model.Stmt

Parsed statement AST.

context: core.ContextRef[BashScriptContext]

Bash script context.

cmd: facts.Value

Expression for command name.

args: list[facts.Value | None]

Expressions for argument values (None if unrepresentable).

stdout_redirects: set[facts.Location]

Location expressions for where stdout is redirected to.

identify_interpretations(state)

Interpret the semantics of the different supported commands that may be invoked.

Return type:

dict[InterpretationKey, Callable[[], Node]]

get_exit_state_transfer_filter()

Return state transfer filter to clear scopes owned by this node after this node exits.

Return type:

StateTransferFilter

get_printable_properties_table()

Return a properties table.

Contains the line number, command expression, argument expressions, stdout redirect location expressions, and scopes.

Return type:

dict[str, set[tuple[str | None, str]]]

class macaron.code_analyzer.dataflow_analysis.bash.BashExitNode

Bases: StatementNode

Statement node representing a Bash exit command.

Always exits with the BashExit exit type (which causes the whole script to exit).

apply_effects(before_state)

Apply the effects of the Bash exit.

Returns a BashExit exit state that is otherwise the same as the before state.

Return type:

dict[ExitType, State]

class macaron.code_analyzer.dataflow_analysis.bash.LiteralOrEnvVar(is_env_var, literal)

Bases: object

Represents either a literal or a read of an environment variable.

__init__(is_env_var, literal)

is_env_var: bool

Whether this represents an environment variable (or else a string literal).

literal: str

The environment variable name or string literal value.

macaron.code_analyzer.dataflow_analysis.bash.is_simple_var_read(param_exp)

Return whether the expression is a simple env var read, e.g. $ENV_VAR.

Return type:

bool

macaron.code_analyzer.dataflow_analysis.bash.parse_env_var_read_word_part(part, allow_dbl_quoted)

Parse word part as a read of an environment variable.

If the given word part is a read of an env var (possibly enclosed in double quotes, if allowed), return the name of the variable, otherwise None.

Return type:

str | None

macaron.code_analyzer.dataflow_analysis.bash.parse_env_var_read_word(word, allow_dbl_quoted)

Parse word as a read of an environment variable.

If the given word is a read of an env var (possibly enclosed in double quotes, if allowed), return the name of the variable, otherwise None.

Return type:

str | None

macaron.code_analyzer.dataflow_analysis.bash.parse_content(parts, allow_dbl_quoted)

Parse the given sequence of word parts.

Return a representation as a sequence of string literal and env var reads, or else return None if not representable in this way.

If allow_dbl_quoted is True, permit word parts to be double quoted expressions, the content of which will be included in the sequence (if False, return None if the sequence contains double quoted expressions).

Return type:

list[LiteralOrEnvVar] | None
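
As an illustration of the representation these parsers produce, here is a simplified regex-based splitter. This is an assumption for illustration only: the real implementation works over the parsed word-part AST, not raw strings:

```python
import re
from dataclasses import dataclass

@dataclass
class LiteralOrEnvVar:
    is_env_var: bool
    literal: str

def parse_parts(text: str) -> list[LiteralOrEnvVar]:
    """Split text into string-literal and env-var-read parts."""
    parts = []
    for tok in re.split(r"(\$\w+|\$\{\w+\})", text):
        if tok:  # re.split yields empty strings around adjacent matches
            if tok.startswith("$"):
                parts.append(LiteralOrEnvVar(True, tok.strip("${}")))
            else:
                parts.append(LiteralOrEnvVar(False, tok))
    return parts

print(parse_parts("dist-$VERSION.tar.gz"))
```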

macaron.code_analyzer.dataflow_analysis.bash.convert_shell_value_sequence_to_fact_value(content, context)

Convert a sequence of Bash values into a single concatenated expression.

Return type:

Value

macaron.code_analyzer.dataflow_analysis.bash.convert_shell_value_to_fact_value(val, context)

Convert a Bash literal or env var read into a value expression.

Return type:

Value

macaron.code_analyzer.dataflow_analysis.bash.convert_shell_word_to_value(word, context)

Convert a Bash word into a value expression.

Return value expression alongside a bool indicating whether the value is “quoted” (or else may require further expansion post-resolution if “unquoted”).

Return type:

tuple[Value, bool] | None

macaron.code_analyzer.dataflow_analysis.bash.parse_dbl_quoted_string(word)

Parse double quoted string.

If the given word is a double quoted expression, return a representation as a sequence of string literal and env var reads, or else return None if it is not a double quoted expression or if it is not representable in this way.

Return type:

list[LiteralOrEnvVar] | None

macaron.code_analyzer.dataflow_analysis.bash.parse_sgl_quoted_string(word)

Parse single quoted string.

If the given word is a single quoted string, return the string literal content, otherwise return None.

Return type:

str | None

macaron.code_analyzer.dataflow_analysis.bash.parse_singular_literal(word)

Parse singular literal word.

If the given word is a single literal, return the string literal content, otherwise return None.

Return type:

str | None

macaron.code_analyzer.dataflow_analysis.bash.parse_bash_expr(expr)

Parse bash expression.

Results are cached to avoid unnecessary invocations of the Bash parser (since each invocation requires spawning a separate process).

Return type:

list[Word] | None
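
The caching can be pictured with functools.lru_cache around the expensive call. This is a sketch: the stand-in parser below just splits the string, rather than spawning the real external Bash parser process:

```python
from functools import lru_cache

calls = 0

@lru_cache(maxsize=None)
def parse_bash_expr(expr: str):
    global calls
    calls += 1            # stands in for spawning the external parser process
    return expr.split()   # stand-in result, not the real Word AST

parse_bash_expr("echo hello")
parse_bash_expr("echo hello")  # served from the cache: no second invocation
print(calls)  # 1
```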

macaron.code_analyzer.dataflow_analysis.cmd_parser module

This module contains parsers for the command-line interfaces of commands relevant to the analysis.

macaron.code_analyzer.dataflow_analysis.cmd_parser.parse_python_command_line(args)

Parse python command line.

Parameters:

args (list[str]) – Argument list to python command

Returns:

Parsed python command args

Return type:

argparse.Namespace

macaron.code_analyzer.dataflow_analysis.cmd_parser.main()

Test python command line parser.

Return type:

None

macaron.code_analyzer.dataflow_analysis.core module

Core dataflow analysis framework definitions and algorithm.

macaron.code_analyzer.dataflow_analysis.core.reset_debug_sequence_number()

Reset debug sequence number.

Return type:

None

macaron.code_analyzer.dataflow_analysis.core.get_debug_sequence_number()

Get current debug sequence number value.

Return type:

int

macaron.code_analyzer.dataflow_analysis.core.increment_debug_sequence_number()

Increment debug sequence number.

Return type:

None

class macaron.code_analyzer.dataflow_analysis.core.StateDebugLabel(sequence_number, copied)

Bases: object

Label for state fact providing information useful for debugging.

Provides a record of analysis ordering and whether the fact was just copied from another state rather than newly produced.

sequence_number: int

Sequence number at time when state fact was created.

copied: bool

Whether the state fact is just copied from another state rather than newly produced.

__init__(sequence_number, copied)
class macaron.code_analyzer.dataflow_analysis.core.StateTransferFilter

Bases: ABC

Interface for state transfer filters, which filter out state facts by location.

abstractmethod should_transfer(loc)

Return whether facts with the given locations should be transferred or else filtered out.

Return type:

bool

class macaron.code_analyzer.dataflow_analysis.core.State

Bases: object

Representation of the abstract storage state at some program point.

Consists of a set of abstract locations, each associated with a set of possible values.

__init__()

Construct an empty state.

state: dict[Location, dict[Value, StateDebugLabel]]

Mapping of locations to a set of possible values. Values are annotated with a label containing information relevant for debugging.

class macaron.code_analyzer.dataflow_analysis.core.DefaultStateTransferFilter

Bases: StateTransferFilter

Default state transfer filter that includes all locations.

should_transfer(loc)

Transfer all locations.

Return type:

bool

class macaron.code_analyzer.dataflow_analysis.core.ExcludedLocsStateTransferFilter(excluded_locs)

Bases: StateTransferFilter

State transfer filter that excludes any locations in the given set.

__init__(excluded_locs)

Construct filter that excludes the given locations.

excluded_locs: set[Location]

Locations to exclude.

should_transfer(loc)

Return whether facts with the given locations should be transferred or else filtered out.

Return type:

bool

class macaron.code_analyzer.dataflow_analysis.core.ExcludedScopesStateTransferFilter(excluded_scopes)

Bases: StateTransferFilter

State transfer filter that excludes any locations that are within the scopes in the given set.

__init__(excluded_scopes)

Construct filter that excludes the given scopes.

excluded_scopes: set[Scope]

Scopes to exclude.

should_transfer(loc)

Return whether facts with the given locations should be transferred or else filtered out.

Return type:

bool

macaron.code_analyzer.dataflow_analysis.core.transfer_state(src_state, dest_state, transfer_filter=<macaron.code_analyzer.dataflow_analysis.core.DefaultStateTransferFilter object>, debug_is_copy=True)

Transfer/copy all facts in the src state to the dest state, except those excluded by the given filter.

Parameters:
  • src_state (State) – The state to transfer facts from.

  • dest_state (State) – The state to modify by transferring facts to.

  • transfer_filter (StateTransferFilter) – The filter to apply to the transferred facts (by default, transfer all).

  • debug_is_copy (bool) – Whether the facts newly added to the dest state should be recorded as being copied or not (for debugging purposes).

Returns:

Whether the dest state was modified.

Return type:

bool
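
A minimal sketch of this filtered transfer, using plain dicts and a predicate in place of the actual State and StateTransferFilter classes (illustrative only, not the real implementation):

```python
def transfer_state(src, dest, should_transfer=lambda loc: True):
    """Copy facts from src to dest, skipping filtered locations; report changes."""
    modified = False
    for loc, values in src.items():
        if not should_transfer(loc):
            continue  # location excluded by the filter
        existing = dest.setdefault(loc, set())
        new = values - existing
        if new:
            existing |= new
            modified = True
    return modified

src = {"x": {"1"}, "tmp": {"2"}}
dest = {}
changed = transfer_state(src, dest, should_transfer=lambda loc: loc != "tmp")
# dest now holds only "x"; "tmp" was filtered out
```

A second, identical transfer would return False, since no new facts are added; this is what lets the surrounding fixpoint loop detect convergence.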

class macaron.code_analyzer.dataflow_analysis.core.ExitType

Bases: ABC

Representation of an exit type, describing the manner in which the execution of a node may terminate.

class macaron.code_analyzer.dataflow_analysis.core.DefaultExit

Bases: ExitType

Default, normal exit.

class macaron.code_analyzer.dataflow_analysis.core.Node

Bases: ABC

Base class of all node types in dataflow analysis.

Subclasses will represent the various program/semantic constructs, and define how to analyse them.

__init__()

Initialize with empty states.

before_state: State

Abstract state at the point before the execution of this node.

exit_states: dict[ExitType, State]

Abstract state at the point after the execution of this node, for each possible distinct exit type.

created_debug_sequence_num: int

Sequence number at the point the node was created, recorded for debugging purposes.

processed_log: list[tuple[int, int]]

Log of begin/end sequence numbers each time this node was processed, recorded for debugging purposes.

abstractmethod children()

Yield the child nodes of this node.

Return type:

Iterator[Node]

abstractmethod analyse()

Perform analysis of this node (and potentially any child nodes).

Update the exit states with the analysis result. Returns whether anything was modified.

Return type:

bool

is_processed()

Return whether this node has been processed.

Return type:

bool

notify_processed(begin_seq_num, end_seq_num)

Record that this node has been processed.

Return type:

None

get_exit_state_transfer_filter()

Return the state transfer filter applicable to the exit state of this node.

By default, nothing is excluded. Subclasses should override to provide appropriate filters to avoid transferring state that will be irrelevant after the node exits.

Return type:

StateTransferFilter

get_printable_properties_table()

Return a table of stringified properties, describing the details of this node, for debugging purposes.

The returned properties table is a mapping of name to value-set, which can be rendered via the functions in the printing module.

Return type:

dict[str, set[tuple[str | None, str]]]

macaron.code_analyzer.dataflow_analysis.core.node_is_not_none(node)

Return whether the given node is not None.

Return type:

TypeGuard[Node]

macaron.code_analyzer.dataflow_analysis.core.traverse_bfs(node)

Traverse the node tree in a breadth-first manner, yielding the nodes (including this node) in traversal order.

Return type:

Iterator[Node]
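
The traversal can be sketched generically; this illustrative version takes a children function instead of using the actual Node class:

```python
from collections import deque

def traverse_bfs(node, children):
    """Yield node and its descendants in breadth-first order."""
    queue = deque([node])
    while queue:
        current = queue.popleft()
        yield current
        queue.extend(children(current))

# Toy tree: root has children a and b; a has child c.
tree = {"root": ["a", "b"], "a": ["c"], "b": [], "c": []}
order = list(traverse_bfs("root", lambda n: tree[n]))
```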

macaron.code_analyzer.dataflow_analysis.core.build_parent_mapping(node)

Construct a mapping of nodes to their parent nodes.

Return type:

dict[Node, Node]

class macaron.code_analyzer.dataflow_analysis.core.NodeForest(root_nodes)

Bases: object

A collection of independent root nodes (with no control-flow or relation between them).

__init__(root_nodes)

Construct a NodeForest for the given nodes, and build the parent mapping.

root_nodes: list[Node]

Collection of root nodes.

parents: dict[Node, Node]

Mapping of nodes to their parent nodes.

class macaron.code_analyzer.dataflow_analysis.core.ControlFlowGraph(entry)

Bases: object

Graph structure to represent control flow graphs.

__init__(entry)

Construct an initially-empty control flow graph.

entry: Node

Entry node.

successors: dict[Node, dict[ExitType, set[Node | ExitType]]]

Graph of successor edges. Each edge is from a particular exit of a particular node, either to a node or to an exit of the control flow itself.

get_entry()

Return the entry node.

Return type:

Node

add_successor(src, exit_type, dest)

Add a successor edge to the control flow graph.

Return type:

None

get_successors(node, exit_type)

Return the successors for a particular exit of a particular node.

Return type:

set[Node | ExitType]

static create_from_sequence(seq)

Construct a control flow graph from a linear sequence of nodes.

Return type:

ControlFlowGraph

class macaron.code_analyzer.dataflow_analysis.core.ControlFlowGraphNode

Bases: Node

Base class for nodes representing control-flow constructs.

Defines the generic algorithm for analysing control flow graphs. Subclasses will define the child nodes and concrete graph structure.

analyse()

Perform analysis of this node.

Performs analysis of the child nodes and propagates state from the exit state of an updated node to the before state of its successor nodes, according to the control-flow-graph structure, then analyses the successor nodes, and so on until a fixpoint is reached and no further updates may be made to any node states.

Returns whether anything was modified.

Return type:

bool
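
The fixpoint propagation described above follows the classic worklist pattern. A hedged, generic sketch (the names and the set-based state representation here are illustrative, not the actual Node/State classes):

```python
from collections import deque

def run_to_fixpoint(entry, successors, transfer):
    """successors: node -> iterable of nodes; transfer: (node, in_state) -> out_state."""
    before = {entry: frozenset()}
    worklist = deque([entry])
    while worklist:
        node = worklist.popleft()
        out = transfer(node, before.get(node, frozenset()))
        for succ in successors.get(node, ()):
            merged = before.get(succ, frozenset()) | out
            if merged != before.get(succ, frozenset()):
                before[succ] = merged       # successor's before-state grew
                worklist.append(succ)       # so it must be re-analysed
    return before

# Toy transfer: each node adds its own name to the set of reaching facts.
succ = {"a": ["b"], "b": ["c"], "c": []}
result = run_to_fixpoint("a", succ, lambda n, s: s | {n})
```

Because states only ever grow and merging is monotone, the loop terminates once no node's before-state changes.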

abstractmethod get_entry()

Return the entry node.

Return type:

Node | None

abstractmethod get_successors(node, exit_type)

Return the successors for a particular exit of a particular node.

Return type:

set[Node | ExitType]

class macaron.code_analyzer.dataflow_analysis.core.StatementNode

Bases: Node

Base class for nodes representing constructs with direct effects (and no child nodes).

Subclasses will define the effects that apply when the node is executed.

analyse()

Perform analysis of this node, by applying the effects to update the after state.

Returns whether anything was modified.

Return type:

bool

children()

Yield nothing, as statements have no child nodes.

Return type:

Iterator[Node]

abstractmethod apply_effects(before_state)

Apply the effects of the statement, given the before state, returning the resulting exit state.

Return type:

dict[ExitType, State]

class macaron.code_analyzer.dataflow_analysis.core.NoOpStatementNode

Bases: StatementNode

Statement that has no effect.

apply_effects(before_state)

Apply the effects of the no-op, returning an exit state that is the same as the before state.

Return type:

dict[ExitType, State]

class macaron.code_analyzer.dataflow_analysis.core.InterpretationKey(*args, **kwargs)

Bases: Protocol

Interpretation key used to identify interpretations that have been produced before.

Must support hashing and equality comparison to allow use as a dict key.

__init__(*args, **kwargs)
class macaron.code_analyzer.dataflow_analysis.core.InterpretationNode

Bases: Node

Base class for nodes representing constructs requiring interpretation.

Such constructs must be interpreted to produce possibly-multiple child nodes representing possible interpretations of the semantics of the node.

Analysing the interpretation node will apply the combined effects of all of the possible interpretations. Subclasses will define how to identify the possible interpretations and generate the corresponding nodes.

__init__()

Initialize node with no interpretations.

interpretations: dict[InterpretationKey, Node]

The generated interpretations of this node, identified/deduplicated by some interpretation key.

children()

Yield each of the possible interpretations.

Return type:

Iterator[Node]

update_interpretations()

Analyse the node to identify interpretations.

Analysis is done in the context of the current before state, adding any new interpretations generated to the interpretations dict.

Return type:

bool

abstractmethod identify_interpretations(state)

Analyse the node, in the context of the given before state, to identify interpretations.

Returns, for each discovered interpretation, an identifying interpretation key that can be used to determine if the interpretation has been produced previously, and a callable that generates the node representing that interpretation (used to generate the node if the interpretation is new, otherwise the previously-generated node will be reused).

Return type:

dict[InterpretationKey, Callable[[], Node]]

analyse()

Perform analysis of this node, by analysing each possible interpretation.

Merges the exit states of each analysed interpretation to update the exit state of this node.

Returns whether anything was modified.

Return type:

bool

class macaron.code_analyzer.dataflow_analysis.core.OwningContextRef(ref)

Bases: Generic[R_co]

A reference to a part of a node’s context that “owns” it.

Ownership is used to identify what scopes are tied to a particular node such that they cease to exist or become irrelevant after the node exits, and thus any values stored in locations within those scopes may be erased from the state beyond that point to simplify the state.

ref: TypeVar(R_co, covariant=True)
get_non_owned()

Return a non-owning reference to the same object.

Return type:

NonOwningContextRef[TypeVar(R_co, covariant=True)]

__init__(ref)
class macaron.code_analyzer.dataflow_analysis.core.NonOwningContextRef(ref)

Bases: Generic[R_co]

A reference to a part of a node’s context that does not “own” it.

Ownership is used to identify what scopes are tied to a particular node such that they cease to exist or become irrelevant after the node exits, and thus any values stored in locations within those scopes may be erased from the state beyond that point to simplify the state.

ref: TypeVar(R_co, covariant=True)
get_non_owned()

Return a non-owning reference to the same object.

Return type:

NonOwningContextRef[TypeVar(R_co, covariant=True)]

__init__(ref)
class macaron.code_analyzer.dataflow_analysis.core.Context

Bases: ABC

Base class for node contexts.

Represents the necessary context that influences the analysis of a node, primarily that of identifying the concrete scopes that fill particular roles in the node.

abstractmethod direct_refs()

Yield the direct references of the context, either to scopes or to other contexts.

Return type:

Iterator[Union[OwningContextRef[Context], NonOwningContextRef[Context], OwningContextRef[Scope], NonOwningContextRef[Scope]]]

owned_scopes()

Yield the scopes that are owned by this context.

Owned scopes are those that are directly referenced by owning references or scopes that are indirectly referenced by owning references, through referenced contexts that are referenced by owning references.

Return type:

Iterator[OwningContextRef[Scope]]

class macaron.code_analyzer.dataflow_analysis.core.AnalysisContext(repo_path)

Bases: Context

Outermost context of the analysis.

Records the path to the repo checkout, to allow the analysis access to files in the repo.

repo_path: str | None
direct_refs()

No direct references, yields nothing.

Return type:

Iterator[Union[OwningContextRef[Context], NonOwningContextRef[Context], OwningContextRef[Scope], NonOwningContextRef[Scope]]]

__init__(repo_path)
class macaron.code_analyzer.dataflow_analysis.core.SimpleSequence(seq)

Bases: ControlFlowGraphNode

Control-flow-graph node representing the execution of a sequence of nodes.

__init__(seq)

Construct control-flow-graph from sequence.

seq: list[Node]

The sequence of nodes to execute.

children()

Yield the nodes in the sequence.

Return type:

Iterator[Node]

get_entry()

Return the entry node, the first in the sequence.

Return type:

Node

get_successors(node, exit_type)

Return the successor for a given node (the next in the sequence or the exit in the case of the last node).

Return type:

set[Node | ExitType]

class macaron.code_analyzer.dataflow_analysis.core.SimpleAlternatives(alts)

Bases: InterpretationNode

Interpretation node representing a concrete set of alternative nodes.

__init__(alts)

Initialize node.

alts: list[Node]

The alternatives.

identify_interpretations(state)

Return the interpretations of this node, that is, each of the alternatives.

Return type:

dict[InterpretationKey, Callable[[], Node]]

macaron.code_analyzer.dataflow_analysis.core.get_owned_scopes(context)

Return the set of scopes owned via the given reference to a context.

Returns an empty set if the given reference is non-owning.

Return type:

set[Scope]

macaron.code_analyzer.dataflow_analysis.evaluation module

Functions for evaluating and resolving dataflow analysis expressions.

macaron.code_analyzer.dataflow_analysis.evaluation.evaluate(node, value)

Evaluate the given value, at the point immediately prior to the execution of the given node.

Parameters:
  • node (core.Node) – The node at which to evaluate the value (i.e. in the context of the before state of the node).

  • value (facts.Value) – The value expression to evaluate.

Returns:

The set of possible resolved values for the value expression, each with a record of the resolved value chosen for any read expressions.

Return type:

set[tuple[facts.Value, ReadBindings]]
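
The idea of resolving a value expression against an abstract state can be sketched as follows. This toy version models an expression as a list of literal and read parts (a simplification of the actual Value classes); since each read location may hold several possible values, evaluation yields every combination:

```python
from itertools import product

def evaluate_concat(parts, state):
    """parts: list of ('lit', s) or ('read', loc); returns set of possible strings."""
    options = []
    for kind, payload in parts:
        if kind == "lit":
            options.append({payload})               # a literal has one possibility
        else:
            options.append(state.get(payload, set()))  # a read may have several
    return {"".join(combo) for combo in product(*options)}

state = {"BRANCH": {"main", "dev"}}
values = evaluate_concat([("lit", "refs/heads/"), ("read", "BRANCH")], state)
```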

class macaron.code_analyzer.dataflow_analysis.evaluation.WriteStatement(location, value)

Bases: object

Representation of a write to a given location of a given value.

location: Location

The location to write to.

value: Value

The value to write.

perform_write(before_state)

Return a state containing only the values stored by the write operation, in context of the before state.

Also returns the set of locations within that state which should be considered to have been overwritten, erasing any previous values.

Return type:

tuple[State, set[Location]]

__init__(location, value)
class macaron.code_analyzer.dataflow_analysis.evaluation.StatementSet(stmts)

Bases: object

Representation of a set of (simultaneous) write operations.

stmts: set[WriteStatement]

The set of writes.

apply_effects(before_state)

Apply the effect of the set of writes, returning the resulting state.

Return type:

State

static union(*stmt_sets)

Combine multiple write sets into one.

Return type:

StatementSet

__init__(stmts)
class macaron.code_analyzer.dataflow_analysis.evaluation.ParameterPlaceholderTransformer(allow_unbound_params=True, value_parameter_binds=None, location_parameter_binds=None, scope_parameter_binds=None)

Bases: object

Expression transformer which replaces parameter placeholders with their corresponding bound values.

__init__(allow_unbound_params=True, value_parameter_binds=None, location_parameter_binds=None, scope_parameter_binds=None)

Initialize transformer with bindings.

Parameters:
  • allow_unbound_params (bool) – Whether to allow parameters with no provided binding (if False, an exception is raised when an unbound parameter is found).

  • value_parameter_binds (dict[str, facts.Value] | None) – Bindings for value parameter placeholders, mapping parameter name to bound value expression.

  • location_parameter_binds (dict[str, facts.LocationSpecifier] | None) – Bindings for location parameter placeholders, mapping parameter name to bound location expression.

  • scope_parameter_binds (dict[str, facts.Scope] | None) – Bindings for scope parameter placeholders, mapping parameter name to bound scope.

allow_unbound_params: bool

Whether to allow parameters with no provided binding (if False, an exception is raised when an unbound parameter is found).

value_parameter_binds: dict[str, Value]

Bindings for value parameter placeholders, mapping parameter name to bound value expression.

location_parameter_binds: dict[str, LocationSpecifier]

Bindings for location parameter placeholders, mapping parameter name to bound location expression.

scope_parameter_binds: dict[str, Scope]

Bindings for scope parameter placeholders, mapping parameter name to bound scope.

transform_value(value)

Transform given value expression.

Returns a value expression with any parameter placeholders replaced with their bound values.

Return type:

Value

transform_location(location)

Transform given location expression.

Returns a location expression with any parameter placeholders replaced with their bound values.

Return type:

Location

transform_location_specifier(location)

Transform given location specifier expression.

Returns a location specifier expression with any parameter placeholders replaced with their bound values.

Return type:

LocationSpecifier

transform_scope(scope)

Transform given scope.

Returns a scope with any parameter placeholders replaced with their bound values.

Return type:

Scope

transform_statement(statement)

Transform given write statement.

Returns a write statement with any parameter placeholders replaced with their bound values.

Return type:

WriteStatement

transform_statement_set(statement_set)

Transform given write statement set.

Returns a write statement set with any parameter placeholders replaced with their bound values.

Return type:

StatementSet

macaron.code_analyzer.dataflow_analysis.evaluation.is_singleton(s, e)

Return whether the given set contains only the single given element.

Return type:

bool

macaron.code_analyzer.dataflow_analysis.evaluation.is_singleton_no_bindings(s, e)

Return whether the given set contains only the single given element with no read bindings.

Return type:

bool

macaron.code_analyzer.dataflow_analysis.evaluation.scope_matches(read_scope, stored_scope)

Return whether the given read scope matches the given stored scope.

Matching means that a read of the read scope may return values from the stored scope.

Return type:

bool
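
The outer-scope matching rule can be sketched with simplified stand-in classes (illustrative, not the macaron Scope type): a read scope matches a stored scope when the stored scope is the read scope itself or one of its enclosing scopes, so the check walks the outer-scope chain.

```python
class Scope:
    def __init__(self, name, outer=None):
        self.name = name
        self.outer = outer  # enclosing scope, if any

def scope_matches(read_scope, stored_scope):
    """A read may see values stored in its own scope or any enclosing scope."""
    s = read_scope
    while s is not None:
        if s is stored_scope:   # scopes compare by object identity
            return True
        s = s.outer
    return False

workflow = Scope("workflow")
job = Scope("job", outer=workflow)
```

Note the identity comparison (`is`), mirroring the documented fact that scopes are distinguished by object identity rather than structural equality.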

macaron.code_analyzer.dataflow_analysis.evaluation.location_subsumes(loc, subloc)

Return whether the given location subsumes the given sub location.

Subsumption means that a read of subloc may be considered to be a read of loc or some part thereof.

Return type:

bool

macaron.code_analyzer.dataflow_analysis.evaluation.get_values_for_subsumed_read(read_loc, state_loc, state_vals)

Return the set of values stored in the state location, if relevant for the given read location.

Return type:

set[Value]

class macaron.code_analyzer.dataflow_analysis.evaluation.ReadBindings(binds=None)

Bases: object

Set of bindings of read expressions to values bound as the result of those read expressions.

__init__(binds=None)

Initialize with given bindings.

bindings: frozendict[Read, Value]

Mapping of read expressions to bound values.

with_binding(read, value)

Return bindings with the given additional binding, or None if the bindings conflict.

Return type:

ReadBindings | None

with_bindings(bindings)

Return bindings with the given additional bindings, or None if the bindings conflict.

Return type:

ReadBindings | None

static combine_bindings(bindings_list)

Return bindings combining all bindings in the given list, or None if the bindings conflict.

Return type:

ReadBindings | None
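
The conflict-detecting combination can be sketched with plain dicts standing in for the ReadBindings class (illustrative only): merging succeeds unless the same read expression is bound to two different values.

```python
def combine_bindings(bindings_list):
    """Merge binding dicts; return None if the same read is bound to two values."""
    combined = {}
    for bindings in bindings_list:
        for read, value in bindings.items():
            if read in combined and combined[read] != value:
                return None  # conflicting bindings for the same read expression
            combined[read] = value
    return combined
```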

class macaron.code_analyzer.dataflow_analysis.evaluation.EvaluationTransformer(state)

Bases: object

Expression transformer which evaluates the expression to produce a set of resolved values.

The expression is evaluated in the context of a specified abstract storage state.

__init__(state)

Initialize transformer with state from which to resolve reads.

state: State

The state from which to resolve reads.

transform_write(location, value)

Transform a write location and value, returning the set of resolved values with the necessary bindings.

Return type:

set[tuple[Location, Value, ReadBindings]]

transform_value(value)

Transform a value expression, returning the set of resolved values with the necessary bindings.

Return type:

set[tuple[Value, ReadBindings]]

transform_location(location)

Transform a location expression, returning the set of resolved values with the necessary bindings.

Return type:

set[tuple[Location, ReadBindings]]

transform_location_specifier(location)

Transform a location specifier expression, returning the set of resolved values with the necessary bindings.

Return type:

set[tuple[LocationSpecifier, ReadBindings]]

class macaron.code_analyzer.dataflow_analysis.evaluation.ContainsSymbolicVisitor

Bases: object

Visitor to determine whether a given expression contains any symbolic expressions.

visit_value(value)

Search value expression for symbolic expressions and return whether any were found.

Return type:

bool

visit_location(location)

Search location expression for symbolic expressions and return whether any were found.

Return type:

bool

visit_location_specifier(location)

Search location specifier expression for symbolic expressions and return whether any were found.

Return type:

bool

macaron.code_analyzer.dataflow_analysis.evaluation.filter_symbolic_values(values)

Filter out symbolic values.

Returns a set containing all elements from the given set that do not contain any symbolic expressions.

Return type:

set[tuple[Value, ReadBindings]]

macaron.code_analyzer.dataflow_analysis.evaluation.filter_symbolic_locations(locs)

Filter out symbolic locations.

Returns a set containing all elements from the given set that do not contain any symbolic expressions.

Return type:

set[tuple[Location, ReadBindings]]

macaron.code_analyzer.dataflow_analysis.evaluation.filter_symbolic_location_specifiers(locs)

Filter out symbolic location specifiers.

Returns a set containing all elements from the given set that do not contain any symbolic expressions.

Return type:

set[tuple[LocationSpecifier, ReadBindings]]

macaron.code_analyzer.dataflow_analysis.evaluation.get_single_resolved_str(resolved_values)

If the given set contains only a single string literal value, return that string, or else None.

Return type:

str | None

macaron.code_analyzer.dataflow_analysis.evaluation.get_single_resolved_str_with_default(resolved_values, default_value)

If the given set contains only a single string literal value, return that string, else return default value.

Return type:

str

macaron.code_analyzer.dataflow_analysis.evaluation.parse_str_expr_split(str_expr, delimiter_char, maxsplit=-1)

Split a string expression on the appearance of the delimiter char in literal parts of the expression.

Return type:

list[Value]
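
The splitting behaviour can be sketched as follows: only delimiters occurring in literal parts of a mixed literal/symbolic expression cause a split, while symbolic parts are opaque and never split. Expressions here are modelled as lists of ('lit', s) / ('sym', name) pairs, a simplification of the actual Value classes:

```python
def split_on_literal_delimiter(parts, delim):
    """Split a mixed expression on delim, but only inside literal parts."""
    results, current = [], []
    for kind, payload in parts:
        if kind == "lit" and delim in payload:
            pieces = payload.split(delim)
            current.append(("lit", pieces[0]))
            results.append(current)                 # close the current chunk
            for mid in pieces[1:-1]:
                results.append([("lit", mid)])      # fully-delimited middles
            current = [("lit", pieces[-1])]         # start the next chunk
        else:
            current.append((kind, payload))         # symbolic parts pass through
    results.append(current)
    return results

expr = [("lit", "a:b"), ("sym", "X"), ("lit", "c")]
chunks = split_on_literal_delimiter(expr, ":")
```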

macaron.code_analyzer.dataflow_analysis.facts module

Definitions of dataflow analysis representation for value expressions and abstract storage locations.

Also includes an incomplete implementation of serialization/deserialization to a Souffle-datalog-compatible representation. This originated as a remnant of a previous prototype that involved the datalog engine in the analysis, but it is retained here because the serialization is useful for producing a human-readable string representation for debugging, and it may be necessary in future to make these expressions available to the policy engine (which uses datalog). Deserialization is currently non-functional, primarily due to the inability to deserialize scope identity, but may potentially be revisited in future, so it is left here for posterity.

class macaron.code_analyzer.dataflow_analysis.facts.Value

Bases: ABC

Base class for value expressions.

Subclasses should be comparable by structural equality.

abstractmethod to_datalog_fact_string()

Return string representation of expression (in datalog serialized format).

Return type:

str

class macaron.code_analyzer.dataflow_analysis.facts.LocationSpecifier

Bases: ABC

Base class for location expressions.

Subclasses should be comparable by structural equality.

abstractmethod to_datalog_fact_string()

Return string representation of expression (in datalog serialized format).

Return type:

str

class macaron.code_analyzer.dataflow_analysis.facts.Scope(name, outer_scope=None)

Bases: object

Representation of a scope in which a location may exist.

This allows for distinct locations with the same name/path/expression to exist separately in different namespaces.

A scope may have an outer scope, such that a read from a scope may return values from the outer scope(s).

Unlike other expression classes, scopes are distinguished by object identity and not structural equality (TODO now that scopes have names, maybe should revisit this since it makes serialization/deserialization difficult).

__init__(name, outer_scope=None)

Initialize scope.

Parameters:
  • name (str) – Name for display purposes (a sequence number will automatically be appended to make it unique).

  • outer_scope (Scope | None) – Outer scope, if any.

outer_scope: Scope | None

Outer scope, if any.

identifier: str

Name for display purposes.

to_datalog_fact_string(include_outer_scope=False)

Return string representation of scope (in datalog serialized format).

Return type:

str

class macaron.code_analyzer.dataflow_analysis.facts.ParameterPlaceholderScope(name)

Bases: Scope

Special scope placeholder to allow generic parameterized expressions.

TODO This is not really a proper subclass of Scope, should revisit type relationship.

__init__(name)

Initialize placeholder scope with given parameter name.

name: str

Parameter name.

to_datalog_fact_string(include_outer_scope=False)

Return string representation of scope (in datalog serialized format).

Return type:

str

class macaron.code_analyzer.dataflow_analysis.facts.Location(scope, loc)

Bases: object

A location expression qualified with the scope it resides in.

scope: Scope

Scope the location resides in.

loc: LocationSpecifier

Location expression.

to_datalog_fact_string()

Return string representation of expression (in datalog serialized format).

Return type:

str

__init__(scope, loc)
class macaron.code_analyzer.dataflow_analysis.facts.StringLiteral(literal)

Bases: Value

Value expression representing a string literal.

literal: str

String literal.

to_datalog_fact_string()

Return string representation of expression (in datalog serialized format).

Return type:

str

__init__(literal)
class macaron.code_analyzer.dataflow_analysis.facts.Read(loc)

Bases: Value

Value expression representing a read of the value stored at a location.

loc: Location

Read value location.

to_datalog_fact_string()

Return string representation of expression (in datalog serialized format).

Return type:

str

__init__(loc)
class macaron.code_analyzer.dataflow_analysis.facts.ArbitraryNewData(at)

Bases: Value

Value expression representing some arbitrary data.

at: str

Name distinguishing the origin of the data.

to_datalog_fact_string()

Return string representation of expression (in datalog serialized format).

Return type:

str

__init__(at)
class macaron.code_analyzer.dataflow_analysis.facts.InstalledPackage(name, version, distribution, url)

Bases: Value

Value expression representing an installed package, with identifying metadata (name, version, etc.).

name: Value

Package name.

version: Value

Package version.

distribution: Value

Package distribution.

url: Value

URL of the package.

to_datalog_fact_string()

Return string representation of expression (in datalog serialized format).

Return type:

str

__init__(name, version, distribution, url)
class macaron.code_analyzer.dataflow_analysis.facts.UnaryStringOperator(value)

Bases: Enum

Unary operators.

BASENAME = 1
BASE64_ENCODE = 2
BASE64DECODE = 3
macaron.code_analyzer.dataflow_analysis.facts.un_op_to_datalog_fact_string(op)

Return string representation of operator (in datalog serialized format).

Return type:

str

class macaron.code_analyzer.dataflow_analysis.facts.BinaryStringOperator(value)

Bases: Enum

Binary operators.

STRING_CONCAT = 1
macaron.code_analyzer.dataflow_analysis.facts.bin_op_to_datalog_fact_string(op)

Return string representation of operator (in datalog serialized format).

Return type:

str

class macaron.code_analyzer.dataflow_analysis.facts.UnaryStringOp(op, operand)

Bases: Value

Value expression representing a unary operator.

op: UnaryStringOperator

Operator.

operand: Value

Operand value.

to_datalog_fact_string()

Return string representation of expression (in datalog serialized format).

Return type:

str

__init__(op, operand)
class macaron.code_analyzer.dataflow_analysis.facts.BinaryStringOp(op, operand1, operand2)

Bases: Value

Value expression representing a binary operator.

op: BinaryStringOperator

Operator.

operand1: Value

First operand value.

operand2: Value

Second operand value.

to_datalog_fact_string()

Return string representation of expression (in datalog serialized format).

Return type:

str

static get_string_concat(operand1, operand2)

Construct a string concatenation operator.

Applies some simple constant-folding simplifications.

Return type:

Value

__init__(op, operand1, operand2)
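
The constant folding applied by get_string_concat can be sketched with simplified stand-in classes (illustrative; the real Value hierarchy differs, and the specific simplifications shown here, folding adjacent literals and dropping empty literals, are plausible examples rather than a description of the actual ones):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Lit:
    s: str

@dataclass(frozen=True)
class Concat:
    left: object
    right: object

def string_concat(a, b):
    if isinstance(a, Lit) and isinstance(b, Lit):
        return Lit(a.s + b.s)   # fold two literals into one
    if isinstance(a, Lit) and a.s == "":
        return b                # drop empty left operand
    if isinstance(b, Lit) and b.s == "":
        return a                # drop empty right operand
    return Concat(a, b)         # otherwise build the operator node

folded = string_concat(Lit("foo"), Lit("bar"))
```
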
class macaron.code_analyzer.dataflow_analysis.facts.ParameterPlaceholderValue(name)

Bases: Value

Special placeholder value to allow generic parameterized expressions.

name: str

Parameter name.

to_datalog_fact_string()

Return string representation of expression (in datalog serialized format).

Return type:

str

__init__(name)
class macaron.code_analyzer.dataflow_analysis.facts.Symbolic(val)

Bases: Value

Value expression representing a symbolic expression.

Represents an expression that has been “frozen” in symbolic form rather than evaluated concretely.

val: Value

Symbolic expression.

to_datalog_fact_string()

Return string representation of expression (in datalog serialized format).

Return type:

str

__init__(val)
class macaron.code_analyzer.dataflow_analysis.facts.SingleBashTokenConstraint(val)

Bases: Value

Value expression representing a constraint that the underlying value does not parse as multiple Bash tokens.

val: Value

Constrained expression.

to_datalog_fact_string()

Return string representation of expression (in datalog serialized format).

Return type:

str

__init__(val)
class macaron.code_analyzer.dataflow_analysis.facts.Filesystem(path)

Bases: LocationSpecifier

Location expression representing a filesystem location at a particular file path.

path: Value

Filepath value.

to_datalog_fact_string()

Return string representation of expression (in datalog serialized format).

Return type:

str

__init__(path)
class macaron.code_analyzer.dataflow_analysis.facts.Variable(name)

Bases: LocationSpecifier

Location expression representing a variable.

name: Value

Variable name.

to_datalog_fact_string()

Return string representation of expression (in datalog serialized format).

Return type:

str

__init__(name)
class macaron.code_analyzer.dataflow_analysis.facts.Artifact(name, file)

Bases: LocationSpecifier

Location expression representing a file stored within some named artifact storage location.

name: Value

Artifact name.

file: Value

File name within artifact.

to_datalog_fact_string()

Return string representation of expression (in datalog serialized format).

Return type:

str

__init__(name, file)
class macaron.code_analyzer.dataflow_analysis.facts.FilesystemAnyUnderDir(path)

Bases: LocationSpecifier

Location expression representing any file under a particular directory.

path: Value

Directory file path.

to_datalog_fact_string()

Return string representation of expression (in datalog serialized format).

Return type:

str

__init__(path)
class macaron.code_analyzer.dataflow_analysis.facts.ArtifactAnyFilename(name)

Bases: LocationSpecifier

Location expression representing any file contained within a named artifact storage location.

name: Value

Artifact name.

to_datalog_fact_string()

Return string representation of expression (in datalog serialized format).

Return type:

str

__init__(name)
class macaron.code_analyzer.dataflow_analysis.facts.ParameterPlaceholderLocation(name)

Bases: LocationSpecifier

Special placeholder location expression to allow generic parameterized expressions.

name: str

Parameter name.

to_datalog_fact_string()

Return string representation of expression (in datalog serialized format).

Return type:

str

__init__(name)
class macaron.code_analyzer.dataflow_analysis.facts.Console

Bases: LocationSpecifier

Location expression representing a console, pipe or other text stream.

to_datalog_fact_string()

Return string representation of expression (in datalog serialized format).

Return type:

str

__init__()
class macaron.code_analyzer.dataflow_analysis.facts.Installed(name)

Bases: LocationSpecifier

Location expression representing an installed package.

name: Value

Package name.

to_datalog_fact_string()

Return string representation of expression (in datalog serialized format).

Return type:

str

__init__(name)
macaron.code_analyzer.dataflow_analysis.facts.enquote_datalog_string_literal(literal)

Enquote a datalog string literal, with appropriate escaping.

Return type:

str
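A minimal sketch of the quote-and-escape idea (a standalone re-implementation for illustration, not the module's code; the exact escape set is an assumption and depends on the target datalog engine's lexer):

```python
def enquote_string_literal(literal: str) -> str:
    """Wrap a string in double quotes, escaping backslashes and quotes.

    Illustrative only: a real serializer may also need to escape
    newlines or other characters, depending on the datalog dialect.
    """
    escaped = literal.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'
```
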

exception macaron.code_analyzer.dataflow_analysis.facts.FactParseError

Bases: Exception

Happens when an error occurs during fact parsing.

macaron.code_analyzer.dataflow_analysis.facts.consume_whitespace(text)

Consume leading whitespace, returning the remainder of the text.

Return type:

str

macaron.code_analyzer.dataflow_analysis.facts.consume(text, token)

Consume the leading token from the text.

Raises exception if text does not start with the token.

Return type:

str

macaron.code_analyzer.dataflow_analysis.facts.parse_qualified_name(text)

Parse a qualified name, returning the name and the remainder of the text.

Return type:

tuple[str, str]
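The consume/parse functions above follow a common hand-rolled recursive-descent convention: each function takes the remaining text and returns what it parsed together with the leftover text. A minimal sketch of that convention (standalone illustrative re-implementations, not the module's actual code; the identifier character set in parse_name is an assumption):

```python
def consume_whitespace(text: str) -> str:
    """Drop leading whitespace and return the remainder of the text."""
    return text.lstrip()


def consume(text: str, token: str) -> str:
    """Strip a required leading token, raising if it is absent."""
    if not text.startswith(token):
        raise ValueError(f"expected {token!r} at {text[:20]!r}")
    return text[len(token):]


def parse_name(text: str) -> tuple[str, str]:
    """Parse a leading dotted name, returning (name, remainder)."""
    i = 0
    while i < len(text) and (text[i].isalnum() or text[i] in "_."):
        i += 1
    if i == 0:
        raise ValueError(f"expected a name at {text[:20]!r}")
    return text[:i], text[i:]
```
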

macaron.code_analyzer.dataflow_analysis.facts.parse_symbol(text)

Parse a datalog-serialized string literal.

Return type:

tuple[str, str]

macaron.code_analyzer.dataflow_analysis.facts.parse_location_specifier(text)

Deserialize location specifier from string representation (in datalog serialized format).

Return type:

tuple[LocationSpecifier, str]

macaron.code_analyzer.dataflow_analysis.facts.parse_location(text)

Deserialize location from string representation (in datalog serialized format).

Currently non-functional primarily due to the inability to deserialize scope identity.

Return type:

tuple[Location, str]

macaron.code_analyzer.dataflow_analysis.facts.parse_value(text)

Deserialize value expression from string representation (in datalog serialized format).

Return type:

tuple[Value, str]

macaron.code_analyzer.dataflow_analysis.facts.parse_un_op(text)

Deserialize unary operator from string representation (in datalog serialized format).

Return type:

tuple[UnaryStringOperator, str]

macaron.code_analyzer.dataflow_analysis.facts.parse_bin_op(text)

Deserialize binary operator from string representation (in datalog serialized format).

Return type:

tuple[BinaryStringOperator, str]

macaron.code_analyzer.dataflow_analysis.github module

Dataflow analysis implementation for analysing GitHub Actions Workflow build pipelines.

class macaron.code_analyzer.dataflow_analysis.github.GitHubActionsWorkflowContext(analysis_context, artifacts, releases, env, workflow_variables, console, source_filepath)

Bases: Context

Context for the top-level scope of a GitHub Actions Workflow.

analysis_context: Union[OwningContextRef[AnalysisContext], NonOwningContextRef[AnalysisContext]]

Outer analysis context.

artifacts: Union[OwningContextRef[Scope], NonOwningContextRef[Scope]]

Scope for artifact storage within the pipeline execution (for upload/download artifact).

releases: Union[OwningContextRef[Scope], NonOwningContextRef[Scope]]

Scope for artifacts published as GitHub releases by the pipeline.

env: Union[OwningContextRef[Scope], NonOwningContextRef[Scope]]

Scope for environment variables (env block at top-level of workflow).

workflow_variables: Union[OwningContextRef[Scope], NonOwningContextRef[Scope]]

Scope for variables within the workflow.

console: Union[OwningContextRef[Scope], NonOwningContextRef[Scope]]

Scope for console output.

source_filepath: str

Filepath of workflow file.

static create(analysis_context, source_filepath)

Create a new workflow context and its associated scopes.

Parameters:
  • analysis_context (core.ContextRef[core.AnalysisContext]) – Outer analysis context.

  • source_filepath (str) – Filepath of workflow file.

Returns:

The new workflow context.

Return type:

GitHubActionsWorkflowContext

direct_refs()

Yield the direct references of the context, either to scopes or to other contexts.

Return type:

Iterator[Union[OwningContextRef[Context], NonOwningContextRef[Context], OwningContextRef[Scope], NonOwningContextRef[Scope]]]

__init__(analysis_context, artifacts, releases, env, workflow_variables, console, source_filepath)
class macaron.code_analyzer.dataflow_analysis.github.GitHubActionsJobContext(workflow_context, filesystem, env, job_variables)

Bases: Context

Context for a job within a GitHub Actions Workflow.

workflow_context: Union[OwningContextRef[GitHubActionsWorkflowContext], NonOwningContextRef[GitHubActionsWorkflowContext]]

Outer workflow context.

filesystem: Union[OwningContextRef[Scope], NonOwningContextRef[Scope]]

Scope for filesystem used by the job and its steps.

env: Union[OwningContextRef[Scope], NonOwningContextRef[Scope]]

Scope for environment variables (env block at job level).

job_variables: Union[OwningContextRef[Scope], NonOwningContextRef[Scope]]

Scope for variables within the job (step output variables, etc.).

static create(workflow_context)

Create a new job context and its associated scopes.

Env and job variables scopes inherit from outer context.

Parameters:

workflow_context (core.ContextRef[GitHubActionsWorkflowContext]) – Outer workflow context.

Returns:

The new job context.

Return type:

GitHubActionsJobContext

direct_refs()

Yield the direct references of the context, either to scopes or to other contexts.

Return type:

Iterator[Union[OwningContextRef[Context], NonOwningContextRef[Context], OwningContextRef[Scope], NonOwningContextRef[Scope]]]

__init__(workflow_context, filesystem, env, job_variables)
class macaron.code_analyzer.dataflow_analysis.github.GitHubActionsStepContext(job_context, env, output_var_prefix)

Bases: Context

Context for a step within a job within a GitHub Actions Workflow.

job_context: Union[OwningContextRef[GitHubActionsJobContext], NonOwningContextRef[GitHubActionsJobContext]]

Outer job context.

env: Union[OwningContextRef[Scope], NonOwningContextRef[Scope]]

Scope for environment variables (env block at step level).

output_var_prefix: str | None

Name prefix for step output variables (stored in the job variables) belonging to this step (e.g. “steps.step_id.outputs.”)

static create(job_context, step_id)

Create a new step context and its associated scopes.

Env scope inherits from outer context. Output var prefix is derived from step_id.

Parameters:
  • job_context (core.ContextRef[GitHubActionsJobContext]) – Outer job context.

  • step_id (str | None) – Step id. If provided, used to derive the name prefix for step output variables.

Returns:

The new step context.

Return type:

GitHubActionsStepContext

direct_refs()

Yield the direct references of the context, either to scopes or to other contexts.

Return type:

Iterator[Union[OwningContextRef[Context], NonOwningContextRef[Context], OwningContextRef[Scope], NonOwningContextRef[Scope]]]

__init__(job_context, env, output_var_prefix)
class macaron.code_analyzer.dataflow_analysis.github.RawGitHubActionsWorkflowNode(definition, context)

Bases: InterpretationNode

Interpretation node representing a GitHub Actions Workflow.

Defines how to interpret a parsed workflow and generate its analysis representation.

__init__(definition, context)

Initialize node.

Typically, construction should be done via the create function rather than using this constructor directly.

definition: github_workflow_model.Workflow

Parsed workflow AST.

context: core.ContextRef[GitHubActionsWorkflowContext]

Workflow context.

identify_interpretations(state)

Interpret the workflow AST to generate control flow representation.

Return type:

dict[InterpretationKey, Callable[[], Node]]

get_exit_state_transfer_filter()

Return state transfer filter to clear scopes owned by this node after this node exits.

Return type:

StateTransferFilter

get_printable_properties_table()

Return a properties table containing the workflow name and scopes.

Return type:

dict[str, set[tuple[str | None, str]]]

static create(workflow, analysis_context, source_filepath)

Create workflow node and its associated context.

Parameters:
  • workflow (github_workflow_model.Workflow) – Parsed workflow AST.

  • analysis_context (core.ContextRef[core.AnalysisContext]) – Outer analysis context.

  • source_filepath (str) – Filepath of workflow file.

Returns:

The new workflow node.

Return type:

RawGitHubActionsWorkflowNode

class macaron.code_analyzer.dataflow_analysis.github.GitHubActionsWorkflowNode(definition, context, env_block, jobs, order)

Bases: ControlFlowGraphNode

Control-flow-graph node representing a GitHub Actions Workflow.

Control flow structure executes each job in an arbitrary linear sequence (by default a topological sort satisfying the job dependencies). If an env block exists, it is applied beforehand.

__init__(definition, context, env_block, jobs, order)

Initialize workflow node.

Typically, construction should be done via the create function rather than using this constructor directly.

Parameters:
definition: github_workflow_model.Workflow

Parsed workflow AST.

context: core.ContextRef[GitHubActionsWorkflowContext]

Workflow context.

env_block: RawGitHubActionsEnvNode | None

Node to apply effects of env block, if any.

jobs: dict[str, RawGitHubActionsJobNode]

Job nodes, identified by their job id.

order: list[str]

List of job ids specifying job execution order.

children()

Yield the child nodes of this node.

Return type:

Iterator[Node]

get_entry()

Return the entry node.

Return type:

Node

get_successors(node, exit_type)

Return the successors for a particular exit of a particular node.

Return type:

set[Node | ExitType]

get_exit_state_transfer_filter()

Return state transfer filter to clear scopes owned by this node after this node exits.

Return type:

StateTransferFilter

get_printable_properties_table()

Return a properties table containing the workflow name and scopes.

Return type:

dict[str, set[tuple[str | None, str]]]

static create(workflow, context)

Create workflow node from workflow AST.

Also creates a job node for each job, and performs a topological sort of the job dependency graph to choose an arbitrary valid sequential execution order.

Parameters:
Returns:

The new workflow node.

Return type:

GitHubActionsWorkflowNode
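Choosing a valid sequential execution order from the job dependency graph, as create does, is a standard topological-sort problem over the jobs' needs edges. A hedged sketch using Kahn's algorithm (job_order is a hypothetical helper, not the module's API; cycle handling here is an assumption):

```python
from collections import deque


def job_order(needs: dict[str, list[str]]) -> list[str]:
    """Return one valid execution order for jobs given their dependencies.

    ``needs`` maps each job id to the job ids it depends on, mirroring
    the GitHub Actions ``needs`` key. Uses Kahn's algorithm.
    """
    indegree = {job: len(deps) for job, deps in needs.items()}
    dependents: dict[str, list[str]] = {job: [] for job in needs}
    for job, deps in needs.items():
        for dep in deps:
            dependents[dep].append(job)
    ready = deque(sorted(job for job, d in indegree.items() if d == 0))
    order: list[str] = []
    while ready:
        job = ready.popleft()
        order.append(job)
        for dependent in dependents[job]:
            indegree[dependent] -= 1
            if indegree[dependent] == 0:
                ready.append(dependent)
    if len(order) != len(needs):
        raise ValueError("cycle in job dependency graph")
    return order
```
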

class macaron.code_analyzer.dataflow_analysis.github.RawGitHubActionsJobNode(definition, job_id, context)

Bases: InterpretationNode

Interpretation node representing a GitHub Actions Job.

Defines how to interpret the different kinds of jobs (normal jobs, reusable workflow call jobs), and generate their analysis representation.

__init__(definition, job_id, context)

Initialize node.

definition: github_workflow_model.Job

Parsed job AST.

job_id: str

Job id.

context: core.ContextRef[GitHubActionsJobContext]

Job context.

identify_interpretations(state)

Interpret job AST to generate representation for either a normal job or a reusable workflow call job.

Return type:

dict[InterpretationKey, Callable[[], Node]]

get_exit_state_transfer_filter()

Return state transfer filter to clear scopes owned by this node after this node exits.

Return type:

StateTransferFilter

get_printable_properties_table()

Return a properties table containing the job id and scopes.

Return type:

dict[str, set[tuple[str | None, str]]]

class macaron.code_analyzer.dataflow_analysis.github.GitHubActionsNormalJobNode(definition, job_id, matrix_block, env_block, steps, output_block, context)

Bases: ControlFlowGraphNode

Control-flow-graph node representing a GitHub Actions Normal Job.

Control flow structure executes each step in the order defined by the job, preceded by applying the effects of the matrix and env blocks if they exist and succeeded by applying the effects of the output block if it exists. (TODO generating output block not yet implemented).

__init__(definition, job_id, matrix_block, env_block, steps, output_block, context)

Initialize job node.

Typically, construction should be done via the create function rather than using this constructor directly.

Parameters:
definition: github_workflow_model.NormalJob

Parsed job AST.

job_id: str

Job id.

matrix_block: RawGitHubActionsMatrixNode | None

Node to apply effects of matrix block, if any.

env_block: RawGitHubActionsEnvNode | None

Node to apply effects of env block, if any.

steps: list[RawGitHubActionsStepNode]

Step nodes, in execution order.

output_block: core.Node | None

Node to apply effects of output block, if any.

context: core.ContextRef[GitHubActionsJobContext]

Job context.

children()

Yield the child nodes of this node.

Return type:

Iterator[Node]

get_entry()

Return the entry node.

Return type:

Node

get_successors(node, exit_type)

Return the successors for a particular exit of a particular node.

Return type:

set[Node | ExitType]

get_exit_state_transfer_filter()

Return state transfer filter to clear scopes owned by this node after this node exits.

Return type:

StateTransferFilter

get_printable_properties_table()

Return a properties table containing the job id and scopes.

Return type:

dict[str, set[tuple[str | None, str]]]

static create(job, job_id, context)

Create normal job node from job AST. Also creates a step node for each step.

Parameters:
Returns:

The new job node.

Return type:

GitHubActionsNormalJobNode

class macaron.code_analyzer.dataflow_analysis.github.GitHubActionsReusableWorkflowCallNode(definition, job_id, context, uses_name, uses_version, with_parameters)

Bases: InterpretationNode

Interpretation node representing a GitHub Actions Reusable Workflow Call Job.

Defines how to interpret the semantics of different supported reusable workflows that may be invoked (TODO currently none are supported).

__init__(definition, job_id, context, uses_name, uses_version, with_parameters)

Initialize reusable workflow call node.

Parameters:
definition: github_workflow_model.ReusableWorkflowCallJob

Parsed reusable workflow call AST.

job_id: str

Job id.

context: core.ContextRef[GitHubActionsJobContext]

Job context.

uses_name: str

Name of the reusable workflow being invoked (without version component).

uses_version: str | None

Version of the reusable workflow being invoked (if specified).

with_parameters: dict[str, facts.Value]

Input parameters specified for reusable workflow.

identify_interpretations(state)

Interpret the semantics of the different supported reusable workflows.

(TODO currently none are supported).

Return type:

dict[InterpretationKey, Callable[[], Node]]

get_exit_state_transfer_filter()

Return state transfer filter to clear scopes owned by this node after this node exits.

Return type:

StateTransferFilter

get_printable_properties_table()

Return a properties table.

Contains the job id, reusable workflow name, and scopes.

Return type:

dict[str, set[tuple[str | None, str]]]

class macaron.code_analyzer.dataflow_analysis.github.RawGitHubActionsStepNode(definition, context)

Bases: InterpretationNode

Interpretation node representing a GitHub Actions Step.

Defines how to interpret the different kinds of steps (run jobs, action steps), and generate their analysis representation.

__init__(definition, context)

Initialize node.

definition: github_workflow_model.Step

Parsed step AST.

context: core.ContextRef[GitHubActionsStepContext]

Step context.

identify_interpretations(state)

Interpret step AST to generate representation depending on whether it is a run step or an action step.

Return type:

dict[InterpretationKey, Callable[[], Node]]

get_exit_state_transfer_filter()

Return state transfer filter to clear scopes owned by this node after this node exits.

Return type:

StateTransferFilter

get_printable_properties_table()

Return a properties table.

Contains the step id, name, action name (if action step), and scopes.

Return type:

dict[str, set[tuple[str | None, str]]]

class macaron.code_analyzer.dataflow_analysis.github.RawGitHubActionsActionStepNode(definition, context)

Bases: InterpretationNode

Interpretation node representing a GitHub Actions Action Step.

Defines how to extract the name, version and parameters used to invoke the action, and generate a node with those details resolved for further interpretation.

__init__(definition, context)

Initialize node.

definition: github_workflow_model.ActionStep

Parsed step AST.

context: core.ContextRef[GitHubActionsStepContext]

Step context.

identify_interpretations(state)

Interpret action step AST to extract the name, version and parameters.

Return type:

dict[InterpretationKey, Callable[[], Node]]

get_exit_state_transfer_filter()

Return state transfer filter to clear scopes owned by this node after this node exits.

Return type:

StateTransferFilter

get_printable_properties_table()

Return a properties table containing the step id, name, action name, and scopes.

Return type:

dict[str, set[tuple[str | None, str]]]

class macaron.code_analyzer.dataflow_analysis.github.GitHubActionsActionStepNode(definition, context, uses_name, uses_version, with_parameters)

Bases: InterpretationNode

Interpretation node representing a GitHub Actions Action Step.

Defines how to interpret the semantics of different supported actions that may be invoked.

__init__(definition, context, uses_name, uses_version, with_parameters)

Initialize action step node.

Parameters:
  • definition (github_workflow_model.ActionStep) – Parsed step AST.

  • context (core.ContextRef[GitHubActionsStepContext]) – Step context.

  • uses_name (str) – Name of the action being invoked (without version component).

  • uses_version (str | None) – Version of the action being invoked (if specified).

  • with_parameters (dict[str, facts.Value]) – Input parameters specified for action.

definition: github_workflow_model.ActionStep

Parsed step AST.

context: core.ContextRef[GitHubActionsStepContext]

Step context.

uses_name: str

Name of the action being invoked (without version component).

uses_version: str | None

Version of the action being invoked (if specified).

with_parameters: dict[str, facts.Value]

Input parameters specified for action.

identify_interpretations(state)

Interpret the semantics of the different supported actions.

Return type:

dict[InterpretationKey, Callable[[], Node]]

get_exit_state_transfer_filter()

Return state transfer filter to clear scopes owned by this node after this node exits.

Return type:

StateTransferFilter

get_printable_properties_table()

Return a properties table containing the step id, name, action name, with parameters, and scopes.

Return type:

dict[str, set[tuple[str | None, str]]]

class macaron.code_analyzer.dataflow_analysis.github.GitHubActionsRunStepNode(definition, env_block, shell_block, context)

Bases: ControlFlowGraphNode

Control-flow-graph node representing a GitHub Actions Run Step.

Control flow structure executes the shell script defined by the step. If an env block exists, it is applied beforehand.

__init__(definition, env_block, shell_block, context)

Initialize run step node.

Typically, construction should be done via the create function rather than using this constructor directly.

Parameters:
definition: github_workflow_model.RunStep

Parsed step AST.

env_block: RawGitHubActionsEnvNode | None

Node to apply effects of env block, if any.

shell_block: bash.RawBashScriptNode

Shell script to be run.

context: core.ContextRef[GitHubActionsStepContext]

Step context.

children()

Yield the child nodes of this node.

Return type:

Iterator[Node]

get_entry()

Return the entry node.

Return type:

Node

get_successors(node, exit_type)

Return the successors for a particular exit of a particular node.

Return type:

set[Node | ExitType]

get_exit_state_transfer_filter()

Return state transfer filter to clear scopes owned by this node after this node exits.

Return type:

StateTransferFilter

get_printable_properties_table()

Return a properties table containing the step id, name, and scopes.

Return type:

dict[str, set[tuple[str | None, str]]]

static create(run_step, context)

Create run step node from step AST.

Parameters:
Returns:

The new run step node.

Return type:

GitHubActionsRunStepNode

class macaron.code_analyzer.dataflow_analysis.github.RawGitHubActionsEnvNode(definition, context)

Bases: InterpretationNode

Interpretation node representing an env block in a GitHub Actions Workflow/Job/Step.

Defines how to interpret the declarative env block to generate imperative constructs to write the values to the env variables.

__init__(definition, context)

Initialize env block node.

Parameters:
definition: github_workflow_model.Env

Parsed env block AST.

context: core.ContextRef[GitHubActionsWorkflowContext | GitHubActionsJobContext | GitHubActionsStepContext]

Outer context.

identify_interpretations(state)

Interpret declarative env block to generate imperative constructs to write to the env vars.

Return type:

dict[InterpretationKey, Callable[[], Node]]

get_exit_state_transfer_filter()

Return state transfer filter to clear scopes owned by this node after this node exits.

Return type:

StateTransferFilter

get_printable_properties_table()

Return a properties table containing the scopes.

Return type:

dict[str, set[tuple[str | None, str]]]

class macaron.code_analyzer.dataflow_analysis.github.RawGitHubActionsMatrixNode(definition, context)

Bases: InterpretationNode

Interpretation node representing a matrix block in a GitHub Actions Job.

Defines how to interpret the declarative matrix block to generate imperative constructs to write the values to the matrix variables.

__init__(definition, context)

Initialize matrix node.

Parameters:
  • definition (github_workflow_model.Matrix) – Parsed matrix block AST.

  • context (core.ContextRef[GitHubActionsJobContext]) – Outer job context.

definition: github_workflow_model.Matrix

Parsed matrix block AST.

context: core.ContextRef[GitHubActionsJobContext]

Outer job context.

identify_interpretations(state)

Interpret declarative matrix block to generate imperative constructs to write to the matrix variables.

Return type:

dict[InterpretationKey, Callable[[], Node]]

get_exit_state_transfer_filter()

Return state transfer filter to clear scopes owned by this node after this node exits.

Return type:

StateTransferFilter

get_printable_properties_table()

Return a properties table containing the scopes.

Return type:

dict[str, set[tuple[str | None, str]]]
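Interpreting a declarative matrix block amounts to expanding the cross product of its axes into per-job-instance variable assignments. A minimal sketch (matrix_combinations is a hypothetical helper and ignores include/exclude entries):

```python
from itertools import product


def matrix_combinations(matrix: dict[str, list[str]]) -> list[dict[str, str]]:
    """Expand a matrix block into one variable assignment per job instance."""
    keys = list(matrix)
    return [dict(zip(keys, combo)) for combo in product(*matrix.values())]
```
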

macaron.code_analyzer.dataflow_analysis.github_expr module

Parser for GitHub Actions expression language.

macaron.code_analyzer.dataflow_analysis.github_expr.extract_expr_variable_name(node)

Return variable access path for token.

If the given node is a variable access or sequence of property accesses, return the access path as a string, otherwise return None.

Return type:

str | None

macaron.code_analyzer.dataflow_analysis.github_expr.extract_value_from_expr_string(s, var_scope)

Return a value expression representation of a string containing GitHub Actions expressions.

GitHub Action expressions within the string are denoted by “${{ <expr> }}”.

Returns None if it is unrepresentable.

Return type:

Value | None
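A string such as "v=${{ github.sha }}" mixes literal text with "${{ <expr> }}" segments. A sketch of the tokenization step only (split_expr_string is a hypothetical helper; the real function parses each expression and builds Value trees, e.g. string concatenations, rather than returning tagged strings):

```python
import re

# Matches ${{ <expr> }} segments, capturing the trimmed inner expression.
EXPR_RE = re.compile(r"\$\{\{\s*(.*?)\s*\}\}")


def split_expr_string(s: str) -> list[tuple[str, str]]:
    """Split a string into ('literal', text) and ('expr', expr) pieces."""
    parts: list[tuple[str, str]] = []
    pos = 0
    for m in EXPR_RE.finditer(s):
        if m.start() > pos:
            parts.append(("literal", s[pos:m.start()]))
        parts.append(("expr", m.group(1)))
        pos = m.end()
    if pos < len(s):
        parts.append(("literal", s[pos:]))
    return parts
```
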

macaron.code_analyzer.dataflow_analysis.models module

Models of supported commands, actions, etc. that may be invoked by build pipelines.

Defines how they are modelled by the dataflow analysis in terms of their effect on the abstract state.

class macaron.code_analyzer.dataflow_analysis.models.BoundParameterisedStatementSet(parameterised_stmts, value_parameter_binds=None, location_parameter_binds=None, scope_parameter_binds=None)

Bases: object

Representation of a set of (simultaneous) write operations.

Defined as a reference to a set of generic parameterised statements, along with a set of parameter bindings that instantiate the parameterised statements with concrete subexpressions.

__init__(parameterised_stmts, value_parameter_binds=None, location_parameter_binds=None, scope_parameter_binds=None)

Initialize bound parameterised statement set.

Parameters:
parameterised_stmts: StatementSet

Set of generic parameterised statements.

value_parameter_binds: dict[str, Value]

Parameter bindings for values.

location_parameter_binds: dict[str, LocationSpecifier]

Parameter bindings for locations.

scope_parameter_binds: dict[str, Scope]

Parameter bindings for scopes.

instantiated_statements: StatementSet

Instantiated statements.

get_statements()

Return instantiated statement set.

Return type:

StatementSet
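The binding mechanism can be pictured as substituting placeholders inside a small expression tree. A hedged sketch with hypothetical Placeholder and instantiate names (the real implementation substitutes ParameterPlaceholderValue/ParameterPlaceholderLocation occurrences inside statements, not plain tuples):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placeholder:
    """Hypothetical stand-in for ParameterPlaceholderValue."""
    name: str


def instantiate(expr: object, binds: dict[str, object]) -> object:
    """Replace placeholders in a toy expression tree with bound values."""
    if isinstance(expr, Placeholder):
        return binds[expr.name]
    if isinstance(expr, tuple):
        return tuple(instantiate(e, binds) for e in expr)
    return expr
```
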

class macaron.code_analyzer.dataflow_analysis.models.BoundParameterisedModelNode(stmts)

Bases: StatementNode

Statement node that applies effects as defined in a provided model.

Subclasses will define a statement node with a specific model.

__init__(stmts)

Initialize model statement node.

stmts: BoundParameterisedStatementSet

Statement effects model.

apply_effects(before_state)

Apply effects as defined in a provided model.

Return type:

dict[ExitType, State]

class macaron.code_analyzer.dataflow_analysis.models.InstallPackageNode(install_scope, name, version, distribution, url)

Bases: BoundParameterisedModelNode

Model for package installation.

Stores a representation of the installed package into the abstract “installed packages” location.

static get_model()

Return the model.

Return type:

StatementSet

__init__(install_scope, name, version, distribution, url)

Initialize install package node.

Parameters:
install_scope: facts.Scope

Scope into which to install.

name: facts.Value

Package name.

version: facts.Value

Package version.

distribution: facts.Value

Package distribution.

url: facts.Value

URL of package.

get_printable_properties_table()

Return a properties table with the model parameters.

Return type:

dict[str, set[tuple[str | None, str]]]

class macaron.code_analyzer.dataflow_analysis.models.VarAssignKind(value)

Bases: Enum

Kind of variable assignment.

BASH_ENV_VAR = 1

Bash environment variable.

BASH_FUNC_DECL = 2

Bash function declaration.

GITHUB_JOB_VAR = 3

GitHub job variable.

GITHUB_ENV_VAR = 4

GitHub environment variable.

OTHER = 5

Other uncategorized variable.

class macaron.code_analyzer.dataflow_analysis.models.VarAssignNode(kind, var_scope, var_name, value)

Bases: BoundParameterisedModelNode

Model for variable assignment.

Stores the assigned value to the variable location.

static get_model()

Return the model.

Return type:

StatementSet

__init__(kind, var_scope, var_name, value)

Initialize variable assignment node.

Parameters:
  • kind (VarAssignKind) – The kind of variable.

  • var_scope (facts.Scope) – The scope in which the variable is stored.

  • var_name (facts.Value) – The name of the variable.

  • value (facts.Value) – The value to assign to the variable.

kind: VarAssignKind

The kind of variable.

var_scope: facts.Scope

The scope in which the variable is stored.

var_name: facts.Value

The name of the variable.

value: facts.Value

The value to assign to the variable.

get_printable_properties_table()

Return a properties table with the model parameters.

Return type:

dict[str, set[tuple[str | None, str]]]

class macaron.code_analyzer.dataflow_analysis.models.GitHubActionsGitCheckoutModelNode

Bases: StatementNode

Model for GitHub git checkout operation.

Currently modelled as a no-op.

apply_effects(before_state)

Apply effects for git checkout (currently nothing).

Return type:

dict[ExitType, State]

class macaron.code_analyzer.dataflow_analysis.models.GitHubActionsUploadArtifactModelNode(artifacts_scope, artifact_name, artifact_file, filesystem_scope, path)

Bases: BoundParameterisedModelNode

Model for uploading artifacts to GitHub pipeline artifact storage.

Stores the content read from a file to the artifact storage location.

static get_model()

Return the model.

Return type:

StatementSet

__init__(artifacts_scope, artifact_name, artifact_file, filesystem_scope, path)

Initialize upload artifacts node.

Parameters:
  • artifacts_scope (facts.Scope) – Scope for pipeline artifact storage.

  • artifact_name (facts.Value) – Artifact name.

  • artifact_file (facts.Value) – Artifact filename.

  • filesystem_scope (facts.Scope) – Scope for filesystem from which to read file.

  • path (facts.Value) – File path to read artifact content from.

artifacts_scope: facts.Scope

Scope for pipeline artifact storage.

artifact_name: facts.Value

Artifact name.

artifact_file: facts.Value

Artifact filename.

filesystem_scope: facts.Scope

Scope for filesystem from which to read file.

path: facts.Value

File path to read artifact content from.

get_printable_properties_table()

Return a properties table with the model parameters.

Return type:

dict[str, set[tuple[str | None, str]]]

class macaron.code_analyzer.dataflow_analysis.models.GitHubActionsDownloadArtifactModelNode(artifacts_scope, artifact_name, filesystem_scope)

Bases: BoundParameterisedModelNode

Model for downloading artifacts from GitHub pipeline artifact storage.

For each file in the artifact, reads the content of that artifact and stores it to the filesystem under the same filename.

static get_model()

Return model.

Return type:

StatementSet

__init__(artifacts_scope, artifact_name, filesystem_scope)

Initialize download artifacts node.

Parameters:
  • artifacts_scope (facts.Scope) – Scope for pipeline artifact storage.

  • artifact_name (facts.Value) – Artifact name.

  • filesystem_scope (facts.Scope) – Scope for filesystem to store artifacts to.

artifacts_scope: facts.Scope

Scope for pipeline artifact storage.

artifact_name: facts.Value

Artifact name.

filesystem_scope: facts.Scope

Scope for filesystem to store artifacts to.

get_printable_properties_table()

Return a properties table with the model parameters.

Return type:

dict[str, set[tuple[str | None, str]]]

class macaron.code_analyzer.dataflow_analysis.models.GitHubActionsReleaseModelNode(artifacts_scope, artifact_name, artifact_file, filesystem_scope, path)

Bases: GitHubActionsUploadArtifactModelNode

Model for uploading artifacts to a GitHub release.

Modelled in the same way as artifact upload.

class macaron.code_analyzer.dataflow_analysis.models.BashEchoNode(out_loc, value)

Bases: BoundParameterisedModelNode

Model for Bash echo command, which writes the echoed value to some location.

static get_model()

Return model.

Return type:

StatementSet

__init__(out_loc, value)

Initialize echo node.

Parameters:
  • out_loc (facts.Location) – Output location.

  • value (facts.Value) – Value written.

out_loc: facts.Location

Output location.

value: facts.Value

Value written.

get_printable_properties_table()

Return a properties table with the model parameters.

Return type:

dict[str, set[tuple[str | None, str]]]

class macaron.code_analyzer.dataflow_analysis.models.Base64EncodeNode(in_loc, out_loc)

Bases: BoundParameterisedModelNode

Model for Base64 encode operation.

Reads a value from some location, Base64-encodes it and writes the result to another location.

static get_model()

Return model.

Return type:

StatementSet

__init__(in_loc, out_loc)

Initialize Base64 encode node.

Parameters:
  • in_loc (facts.Location) – Location to read input from.

  • out_loc (facts.Location) – Location to write encoded output to.

in_loc: facts.Location

Location to read input from.

out_loc: facts.Location

Location to write encoded output to.

get_printable_properties_table()

Return a properties table with the model parameters.

Return type:

dict[str, set[tuple[str | None, str]]]

class macaron.code_analyzer.dataflow_analysis.models.Base64DecodeNode(in_loc, out_loc)

Bases: BoundParameterisedModelNode

Model for Base64 decode operation.

Reads a value from some location, Base64-decodes it and writes the result to another location.

static get_model()

Return model.

Return type:

StatementSet

__init__(in_loc, out_loc)

Initialize Base64 decode node.

Parameters:
  • in_loc (facts.Location) – Location to read input from.

  • out_loc (facts.Location) – Location to write decoded output to.

in_loc: facts.Location

Location to read input from.

out_loc: facts.Location

Location to write decoded output to.

get_printable_properties_table()

Return a properties table with the model parameters.

Return type:

dict[str, set[tuple[str | None, str]]]
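
For reference, the concrete transformation that these two model nodes abstract over is ordinary Base64 encoding and decoding, illustrated here with Python's standard library (the example values are invented for illustration):

```python
import base64

# The operation approximated by Base64EncodeNode: read a value,
# Base64-encode it, and write the result to another location.
plain = b"secret-token"
encoded = base64.b64encode(plain)  # b'c2VjcmV0LXRva2Vu'

# The inverse transformation, approximated by Base64DecodeNode.
decoded = base64.b64decode(encoded)
assert decoded == plain
```

Modelling both directions matters for dataflow analysis because encoding does not destroy information: a secret that flows through a Base64 encode and decode pair is still the secret.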

class macaron.code_analyzer.dataflow_analysis.models.MavenBuildModelNode(filesystem_scope)

Bases: BoundParameterisedModelNode

Model for Maven build commands.

Maven build behaviour is approximated as writing some files under the target directory.

static get_model()

Return model.

Return type:

StatementSet

__init__(filesystem_scope)

Initialize Maven build node.

Parameters:

filesystem_scope (facts.Scope) – Scope for filesystem written to.

filesystem_scope: facts.Scope

Scope for filesystem written to.

get_printable_properties_table()

Return a properties table with the model parameters.

Return type:

dict[str, set[tuple[str | None, str]]]

macaron.code_analyzer.dataflow_analysis.printing module

Functions for printing/displaying dataflow analysis nodes in the form of graphviz (dot) output.

Allows the analysis representation and results to be rendered as a human-readable node-link graph.

Makes use of graphviz’s html-like label feature to add detailed information to each node. Tables are specified in the form of a dict[str, set[tuple[str | None, str]]], which is rendered as a two-column table: the first column contains each of the keys of the dict, and the second column contains the corresponding set of values as a nested vertical table. Each value has an optional label that, if present, is rendered in a visually distinguished manner alongside the value.

macaron.code_analyzer.dataflow_analysis.printing.print_as_dot_graph(node, out, include_properties, include_states)

Print root node as dot graph.

Parameters:
  • node (core.Node) – The root node to print.

  • out (TextIO) – Output stream to print to.

  • include_properties (bool) – Whether to include detail on the properties of each node (disable to make nodes simpler/smaller).

  • include_states (bool) – Whether to include detail on the abstract state at each node (disable to make nodes simpler/smaller).

Return type:

None

macaron.code_analyzer.dataflow_analysis.printing.get_printable_table_for_state(state, state_filter=None)

Return a table of the stringified representation of the state.

Consists of a mapping of storage locations to the set of values they may contain (see module comment for description of the return type).

Values are additionally labelled with whether they were new and not copied, and whether they will be excluded by the given filter.

Return type:

dict[str, set[tuple[str | None, str]]]

macaron.code_analyzer.dataflow_analysis.printing.print_as_dot_string(node, out, include_properties, include_states)

Print node as dot representation (to be embedded within a dot graph).

Parameters:
  • node (core.Node) – The node to print.

  • out (TextIO) – Output stream to print to.

  • include_properties (bool) – Whether to include detail on the properties of each node (disable to make nodes simpler/smaller).

  • include_states (bool) – Whether to include detail on the abstract state at each node (disable to make nodes simpler/smaller).

Return type:

None

macaron.code_analyzer.dataflow_analysis.printing.print_cfg_node_as_dot_string(cfg_node, out, include_properties, include_states)

Print control-flow-graph node as dot representation (to be embedded within a dot graph).

Parameters:
  • cfg_node (core.ControlFlowGraphNode) – The control-flow-graph node to print.

  • out (TextIO) – Output stream to print to.

  • include_properties (bool) – Whether to include detail on the properties of each node (disable to make nodes simpler/smaller).

  • include_states (bool) – Whether to include detail on the abstract state at each node (disable to make nodes simpler/smaller).

Return type:

None

macaron.code_analyzer.dataflow_analysis.printing.print_statement_node_as_dot_string(node, out, include_properties, include_states)

Print statement node as dot representation (to be embedded within a dot graph).

Parameters:
  • node (core.StatementNode) – The statement node to print.

  • out (TextIO) – Output stream to print to.

  • include_properties (bool) – Whether to include detail on the properties of each node (disable to make nodes simpler/smaller).

  • include_states (bool) – Whether to include detail on the abstract state at each node (disable to make nodes simpler/smaller).

Return type:

None

macaron.code_analyzer.dataflow_analysis.printing.print_interpretation_node_as_dot_string(node, out, include_properties, include_states)

Print interpretation node as dot representation (to be embedded within a dot graph).

Parameters:
  • node (core.InterpretationNode) – The interpretation node to print.

  • out (TextIO) – Output stream to print to.

  • include_properties (bool) – Whether to include detail on the properties of each node (disable to make nodes simpler/smaller).

  • include_states (bool) – Whether to include detail on the abstract state at each node (disable to make nodes simpler/smaller).

Return type:

None

macaron.code_analyzer.dataflow_analysis.printing.escape_for_dot_html_like_label(s)

Return the string escaped for inclusion in a dot html-like label.

Return type:

str
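
Graphviz html-like labels use an XML-like syntax, so at minimum the characters &, < and > must be escaped. A minimal sketch of such an escape (an assumption about this helper's behaviour, not its actual implementation) is:

```python
def escape_for_html_like_label(s: str) -> str:
    """Escape characters that are special inside a graphviz html-like label."""
    return (
        s.replace("&", "&amp;")  # must come first so later escapes survive
         .replace("<", "&lt;")
         .replace(">", "&gt;")
    )

print(escape_for_html_like_label("a < b && b > c"))
# a &lt; b &amp;&amp; b &gt; c
```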

class macaron.code_analyzer.dataflow_analysis.printing.DotHtmlLikeTableConfiguration(header_colour, header_font_colour, header_font_size, header_font_bold, body_colour, body_font_colour, body_font_size)

Bases: object

Configuration for rendering of dot html-like table.

header_colour: str

Background colour for table header.

header_font_colour: str

Font colour for table header.

header_font_size: int

Font size for table header.

header_font_bold: bool

Whether font of table header should be bold.

body_colour: str

Background colour for table body.

body_font_colour: str

Font colour for table body.

body_font_size: int

Font size for table body.

__init__(header_colour, header_font_colour, header_font_size, header_font_bold, body_colour, body_font_colour, body_font_size)

macaron.code_analyzer.dataflow_analysis.printing.truncate_long_strings_for_display(s)

Truncate long string if necessary for display.

Return type:

str
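
A typical truncation of this kind replaces the tail of an over-long string with an ellipsis marker. The sketch below assumes a 60-character limit and a "..." marker; the cutoff and marker actually used by the module may differ:

```python
def truncate_for_display(s: str, max_len: int = 60) -> str:
    """Truncate s to at most max_len characters, marking the cut with '...'."""
    if len(s) <= max_len:
        return s
    # Reserve three characters for the ellipsis marker.
    return s[: max_len - 3] + "..."

print(truncate_for_display("x" * 100))  # 57 x's followed by '...'
```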

macaron.code_analyzer.dataflow_analysis.printing.produce_dot_html_like_table(header, data, config)

Return the given data table rendered as a dot html-like label table.

See module comment for description of how data tables are rendered.

Return type:

str

macaron.code_analyzer.dataflow_analysis.printing.produce_node_dot_html_like_label(node_kind, node_type, node_label, config, subtables)

Return the given node table data rendered as a dot html-like label table.

Contains nested tables for each subtable (see module comment for description of how data tables are rendered).

Return type:

str

macaron.code_analyzer.dataflow_analysis.printing.produce_node_dot_def(node_id, node_kind, node_type, node_label, config, subtables)

Return the given node table data rendered as a dot node containing a html-like label table.

Contains nested tables for each subtable (see module comment for description of how data tables are rendered).

Return type:

str

macaron.code_analyzer.dataflow_analysis.printing.add_context_owned_scopes_to_properties_table(table, context)

Add an entry to the given data table listing the scopes owned by the given context.

Return type:

None

macaron.code_analyzer.dataflow_analysis.run_analysis_standalone module

Module providing entry point to run dataflow analysis independently of Macaron command.

For experimentation and debugging purposes only.

macaron.code_analyzer.dataflow_analysis.run_analysis_standalone.main()

Entry point for running standalone analysis.

Return type:

None