Build a Simple Code Review Assistant#

Prerequisites

This guide does not assume any prior knowledge about Project WayFlow. However, it assumes the reader has a basic knowledge of LLMs.

You will need a working installation of WayFlow - see Installation.

Learning goals#

In this use-case tutorial, you will build a more advanced WayFlow application, a Pull Request (PR) Reviewing Assistant, using a WayFlow Flow to automate basic reviews of Python source code.

In this tutorial you will:

  1. Learn the basics of using Flows to build an assistant.

  2. Learn how to compose multiple sub-flows to create a more complex Flow.

  3. Learn more about building Tools that can be used within your Flows.

You can download a Jupyter Notebook for this use-case to follow along from Code PR Review Bot Tutorial.

Introduction to the task#

Code reviews are crucial for maintaining code quality and reviewers often spend considerable time pointing out routine issues such as the presence of debug statements, formatting inconsistencies, or common coding convention violations that may not be fully captured by static code analysis tools. This consumes valuable time that could be spent on reviewing more important things such as the core logic, architecture, and business requirements.

Note

Building an agent with WayFlow to perform such code reviews has a number of advantages:

  1. Review rules can be written using natural language, making an agent much more flexible than a simple static checker.

  2. Writing rules in natural language makes updating the rules very easy.

  3. More general issues can be captured. You can allow the LLM to infer from the rule to more general cases that could be missed by a simple static checker.

  4. New review rules can be generated from the collected comments of existing PRs.

In this tutorial, you will create a WayFlow Flow assistant designed to scan Python pull requests for common oversights such as:

  • Having TODO comments without associated tickets.

  • Using unclear or ambiguous variable naming.

  • Using risky Python code practices such as mutable defaults.
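To make the last item concrete, the short sketch below shows why a mutable default argument is risky. The function names are illustrative and are not part of the sample codebase.

```python
def add_item(item, items=[]):
    # The default list is created once, when the function is defined,
    # and is then shared by every call that omits `items`.
    items.append(item)
    return items


first = add_item("a")
second = add_item("b")  # reuses the same list object as the first call


def add_item_safe(item, items=None):
    # Create a fresh list on each call when none is supplied.
    if items is None:
        items = []
    items.append(item)
    return items


print(second)              # ['a', 'b']
print(add_item_safe("b"))  # ['b']
```

This is the kind of pattern that the MUTABLE_DEFAULT_ARGUMENT check defined later in the tutorial is meant to flag.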

To build this assistant you will break the task into configuration and two sub-flows that will be composed into a single flow:

Complete Flow of the PR Bot

  1. Configure your application, choose an LLM and import required modules [Part 1].

  2. The first sub-flow retrieves and diffs information from a local codebase in a Git repository [Part 2].

  3. The second sub-flow iterates over the file diffs using a MapStep and generates comments with an LLM using the PromptExecutionStep [Part 3].

You will also learn how to extract information using the RegexExtractionStep and the ExtractValueFromJsonStep, and how to build and execute tools with the ServerTool and the ToolExecutionStep.

Note

This is not a production-ready code review assistant that can be used as-is.

Setup#

First, let’s set up the environment. For this tutorial you need to have wayflowcore installed (for additional information please read the installation guide).

Next, download the example codebase Git repository. It is used to generate the sample code diffs for the assistant to review.

Extract the codebase Git repository folder from the compressed archive. Make a note of where the codebase Git repository is extracted to.

Part 1: Imports and LLM configuration#

WayFlow supports several LLM API providers. To learn more about them, read the guide on how to use LLMs from different providers.

Choose an LLM provider and configure it. The example below uses the OCI GenAI service:

from wayflowcore.models import OCIGenAIModel, OCIClientConfigWithApiKey

llm = OCIGenAIModel(
    model_id="provider.model-id",
    compartment_id="compartment-id",
    client_config=OCIClientConfigWithApiKey(
        service_endpoint="https://url-to-service-endpoint.com",
    ),
)

Note

API keys should never be stored in code. Use environment variables and/or tools such as python-dotenv instead.

Be cautious when using external LLM providers and ensure that you comply with your organization’s security policies and any applicable laws and regulations. Consider using a self-hosted LLM solution or a provider that offers on-premises deployment options if you need to maintain strict control over your code and data.
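As a minimal sketch of the advice above, the snippet below reads the model settings from environment variables instead of hard-coding them. The variable names OCI_MODEL_ID, OCI_COMPARTMENT_ID and OCI_SERVICE_ENDPOINT, and the helper load_llm_settings, are hypothetical choices made for this example; they are not defined by WayFlow.

```python
import os


def load_llm_settings(environ=os.environ) -> dict:
    """Collect LLM settings from environment variables (hypothetical names)."""
    required = ("OCI_MODEL_ID", "OCI_COMPARTMENT_ID", "OCI_SERVICE_ENDPOINT")
    missing = [name for name in required if not environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {
        "model_id": environ["OCI_MODEL_ID"],
        "compartment_id": environ["OCI_COMPARTMENT_ID"],
        "service_endpoint": environ["OCI_SERVICE_ENDPOINT"],
    }


# A plain dictionary stands in for the real environment in this demonstration.
example_env = {
    "OCI_MODEL_ID": "provider.model-id",
    "OCI_COMPARTMENT_ID": "compartment-id",
    "OCI_SERVICE_ENDPOINT": "https://url-to-service-endpoint.com",
}
settings = load_llm_settings(example_env)
print(settings["model_id"])
```

The returned values could then be passed to the model constructor shown above in place of the literal strings.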

Part 2: Retrieve the PR diff information#

The first phase of the assistant requires retrieving information about the code diffs from a code repository. You have already extracted the sample codebase Git repository to your local environment.

This will be a sub-flow that consists of two simple steps:

  • ToolExecutionStep that collects PR diff information using a Python subprocess to run the Git command.

  • RegexExtractionStep which separates the raw diff information into diffs for each file.

Steps to retrieve the PR diff information

First, take a look at what a diff looks like. The following example shows how a real diff appears when using Git:

MOCK_DIFF = """
diff --git src://calculators/utils.py dst://calculators/utils.py
index 12345678..90123456 100644
--- src://calculators/utils.py
+++ dst://calculators/utils.py
@@ -10,6 +10,15 @@

 def calculate_total(data):
     # TODO: implement tax calculation
     return data

+def get_items(items=[]):
+    result = []
+    for item in items:
+        result.append(item * 2)
+    return result
+
+def process_numbers(numbers):
+    res = []
+    for x in numbers:
+        res.append(x + 1)
+    return res
+
 def calculate_average(numbers):
     return sum(numbers) / len(numbers)


diff --git src://example/utils.py dst://example/utils.py
index 000000000..123456789
--- /dev/null
+++ dst://example/utils.py
@@ -0,0 +1,20 @@
+# Copyright © 2024 Oracle and/or its affiliates.
+
+def calculate_sum(numbers=[]):
+    total = 0
+    for num in numbers:
+        total += num
+    return total
+
+
+def process_data(data):
+    # TODO: Handle exceptions here
+    result = data * 2
+    return result
+
+
+def main():
+    numbers = [1, 2, 3, 4, 5]
+    result = calculate_sum(numbers)
+    print("Sum:", result)
+    data = 10
+    processed_data = process_data(data)
+    print("Processed Data:", processed_data)
+
+
+if __name__ == "__main__":
+    main()
""".strip()

Reading a diff: Removals are identified by the “-” marks and additions by the “+” marks. In this example, there were only additions.

The diff above covers two files, calculators/utils.py and example/utils.py. It is an illustrative example only: the diff that you generate from the sample codebase is different and longer.
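To make the "+" and "-" markers concrete, here is a small sketch that counts added and removed lines in a unified diff. The helper count_changes and the sample text are illustrative and are not part of the tutorial script. As a simplification, any line starting with "+++" or "---" is treated as a file header.

```python
def count_changes(diff_text: str) -> tuple[int, int]:
    """Return (added, removed) line counts for a unified diff."""
    added = removed = 0
    for line in diff_text.splitlines():
        # "+++" and "---" lines are file headers, not content changes.
        if line.startswith("+") and not line.startswith("+++"):
            added += 1
        elif line.startswith("-") and not line.startswith("---"):
            removed += 1
    return added, removed


sample_diff = "\n".join(
    [
        "diff --git src://demo.py dst://demo.py",
        "--- src://demo.py",
        "+++ dst://demo.py",
        "@@ -1,2 +1,3 @@",
        " import os",
        "-print('debug')",
        "+import sys",
        "+print(sys.argv)",
    ]
)
print(count_changes(sample_diff))  # (2, 1)
```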

Build a tool#

You need to create a tool to extract a code diff from the local code repository. The @tool decorator serves that purpose: it wraps a Python function.

The function, local_get_pr_diff_tool, in the code below extracts the diffs by running the git diff HEAD shell command in a subprocess and capturing the output.

Applying the @tool decorator to this function creates a ServerTool from it.

from wayflowcore.tools import tool


@tool(description_mode="only_docstring")
def local_get_pr_diff_tool(repo_dirpath: str) -> str:
    """
    Retrieves code diff with a git command given the
    path to the repository root folder.
    """
    import subprocess  # nosec: documentation example invoking git locally

    result = subprocess.run(
        ["git", "diff", "HEAD"],
        capture_output=True,
        cwd=repo_dirpath,
        text=True,
    )  # nosec: documentation example invoking git locally
    return result.stdout.strip()

Building the steps and the sub-flow#

Let’s write the code for the first sub-flow.

from wayflowcore.controlconnection import ControlFlowEdge
from wayflowcore.dataconnection import DataFlowEdge
from wayflowcore.flow import Flow
from wayflowcore.property import StringProperty
from wayflowcore.steps import RegexExtractionStep, StartStep, ToolExecutionStep

# IO Variable Names
REPO_DIRPATH_IO = "$repo_dirpath_io"
PR_DIFF_IO = "$raw_pr_diff"
FILE_DIFF_LIST_IO = "$file_diff_list"

# Define the steps

start_step = StartStep(name="start_step", input_descriptors=[StringProperty(name=REPO_DIRPATH_IO)])

# Step 1: Retrieve the pull request diff using the local tool
get_pr_diff_step = ToolExecutionStep(
    name="get_pr_diff",
    tool=local_get_pr_diff_tool,
    raise_exceptions=True,
    input_mapping={"repo_dirpath": REPO_DIRPATH_IO},
    output_mapping={ToolExecutionStep.TOOL_OUTPUT: PR_DIFF_IO},
)

# Step 2: Extract the file diffs from the raw diff using a regular expression
extract_into_list_of_file_diff_step = RegexExtractionStep(
    name="extract_into_list_of_file_diff",
    regex_pattern=r"(diff --git[\s\S]*?)(?=diff --git|$)",
    return_first_match_only=False,
    input_mapping={RegexExtractionStep.TEXT: PR_DIFF_IO},
    output_mapping={RegexExtractionStep.OUTPUT: FILE_DIFF_LIST_IO},
)

# Define the sub flow
retrieve_diff_subflow = Flow(
    name="Retrieve PR diff flow",
    begin_step=start_step,
    control_flow_edges=[
        ControlFlowEdge(source_step=start_step, destination_step=get_pr_diff_step),
        ControlFlowEdge(
            source_step=get_pr_diff_step, destination_step=extract_into_list_of_file_diff_step
        ),
        ControlFlowEdge(source_step=extract_into_list_of_file_diff_step, destination_step=None),
    ],
    data_flow_edges=[
        DataFlowEdge(
            source_step=start_step,
            source_output=REPO_DIRPATH_IO,
            destination_step=get_pr_diff_step,
            destination_input=REPO_DIRPATH_IO,
        ),
        DataFlowEdge(
            source_step=get_pr_diff_step,
            source_output=PR_DIFF_IO,
            destination_step=extract_into_list_of_file_diff_step,
            destination_input=PR_DIFF_IO,
        ),
    ],
)

API Reference: Flow | RegexExtractionStep | ToolExecutionStep | tool

The code does the following:

  1. It lists the names of the steps and input/output variables for the sub-flow.

  2. It then creates the different steps within the sub-flow.

  3. Finally, it instantiates the sub-flow. This will be covered in more detail later in the tutorial.

The input/output variables used in the Flow are given explicit names, e.g. “$repo_dirpath_io” for the variable holding the path to the local repository. The dollar ($) prefix is not required; it is only there to make these names stand out in the code. Each name is also stored in a Python constant (e.g. REPO_DIRPATH_IO) to minimize the number of magic strings in the code. You will use REPO_DIRPATH_IO to pass in the location of the sample codebase Git repository.

See also

To learn about the basics of Flows, check out our introductory tutorial on WayFlow Flows.

Now take a look at each of the steps used in the sub-flow in more detail.

Get the PR diff, get_pr_diff_step#

This uses a ToolExecutionStep to gather the diff information, as described in the earlier notes on building the tool. When creating it, you need to provide the following:

  • tool: Specifies the tool that will be called within the step. This is the tool that was created earlier, local_get_pr_diff_tool.

  • raise_exceptions: Whether to raise exceptions generated by the tool that is called. Here it is set to True and so exceptions will be raised.

  • input_mapping: Specifies the names used for the input parameters of the step. See ToolExecutionStep for more details on using an input_mapping with this type of step.

  • output_mapping: Specifies the name used for the output parameter of the step. The name held in PR_DIFF_IO will be mapped to the name for the output parameter of the step. Again, see ToolExecutionStep for more details on using an output_mapping with this type of step.

Extract file diffs into a list, extract_into_list_of_file_diff_step#

You now have the diff information from the PR. This step performs a regex extraction on the raw diff text to extract the code to review.

Use a RegexExtractionStep to perform this action. When creating the step, you need to provide the following:

  • regex_pattern: The regex pattern for the extraction. This uses re.findall underneath.

  • return_first_match_only: You want to return all results, so set this to False.

  • input_mapping: Specifies the names used for the input parameters of the step. The input parameter, RegexExtractionStep.TEXT, is mapped to the name held in PR_DIFF_IO. See RegexExtractionStep for more details on using an input_mapping with this type of step.

  • output_mapping: Specifies the name used for the output parameter of the step. Here, the default name RegexExtractionStep.OUTPUT is renamed to the name defined in FILE_DIFF_LIST_IO. Again, see RegexExtractionStep for more details on using an output_mapping with this type of step.

About the pattern:

(diff --git[\s\S]*?)(?=diff --git|$)

The pattern looks for text starting with diff --git, followed by any characters (both whitespace, \s, and non-whitespace, \S), until it encounters either another diff --git or the end of the text ($). The lookahead (?=...) means that the next diff --git, or the end of the text, is not included in the match.

The *? makes it “lazy” or non-greedy, meaning it takes the shortest possible match, rather than the longest.
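The behaviour of the pattern can be checked with Python's re module directly. The sketch below applies re.findall to a tiny two-file diff; the sample text is made up for this illustration.

```python
import re

FILE_DIFF_PATTERN = r"(diff --git[\s\S]*?)(?=diff --git|$)"

raw_diff = (
    "diff --git src://first.py dst://first.py\n"
    "+x = 1\n"
    "diff --git src://second.py dst://second.py\n"
    "+y = 2"
)

# With a single capture group, re.findall returns one string per match.
file_diffs = re.findall(FILE_DIFF_PATTERN, raw_diff)

for file_diff in file_diffs:
    print(repr(file_diff))
```

Each element of file_diffs holds the diff of one file. One caveat of this simple approach is that a file whose content itself contains the text diff --git would be split at that point.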

Tip

Recent large language models are useful for creating, debugging, and explaining regex patterns from a natural-language description.

Finally, create the sub-flow using the Flow class. You specify the steps in the Flow, the starting step of the Flow, the transitions between steps, and how data held in the variables passes from one step to the next.

The transitions between steps are defined with ControlFlowEdges. These take a source step and a destination step. Each ControlFlowEdge maps one such transition.

Passing values between steps is a very common occurrence when building Flows. This is done using DataFlowEdges which define that a value is passed from one step to another.

Inputs to a step are most commonly parameters within a Jinja template, of which there are several examples in this tutorial, or parameters to callables used by tools. In a DataFlowEdge you can use the name of the parameter, a string, as the destination of a value that is being passed in. It is often less error-prone to create a variable that is set to the name.

Similarly, when a value is the output of a step, such as when a user’s input is captured in an InputMessageStep, the value is available as a property of the step, for example InputMessageStep.USER_PROVIDED_INPUT. However, it lacks a meaningful name, so it is often helpful to specify one. This is done using an output_mapping when creating the step. Again, create a variable to hold the name to avoid errors.

Defining a Flow#

Defining the Flow is the last step in the code shown above. There are a couple of things that are worth highlighting:

  • begin_step: A start step needs to be defined for a Flow.

  • control_flow_edges: The transitions between the steps in the Flow are defined as ControlFlowEdges. They have a source_step, which defines the start of a transition, and a destination_step, which defines the destination of a transition. All transitions for the flow will need to be defined.

  • data_flow_edges: Maps the variables between steps connected by a transition using DataFlowEdges. It maps variables from a source step into variables in a destination step. You only need to do this for the variables that need to be passed between steps.

Testing the flow#

You can test this sub-flow by creating an assistant conversation with Flow.start_conversation() and specifying the inputs, in this case the location of the Git repository. The conversation can then be executed with Conversation.execute(). This returns an object that represents the status of the conversation which you can check to confirm that the conversation has successfully finished.

The code below shows how the inputs are passed in. Set the PATH_TO_DIR to the actual path you extracted the sample codebase Git repository to. You then extract the outputs from the conversation.

The full code for testing the sub-flow is shown below:

from wayflowcore.executors.executionstatus import FinishedStatus

# Replace the path below with the path to your actual codebase sample git repository.
PATH_TO_DIR = "path/to/repository_root"

test_conversation = retrieve_diff_subflow.start_conversation(
    inputs={
        REPO_DIRPATH_IO: PATH_TO_DIR,
    }
)

execution_status = test_conversation.execute()

if not isinstance(execution_status, FinishedStatus):
    raise ValueError("Unexpected status type")

FILE_DIFF_LIST = execution_status.output_values[FILE_DIFF_LIST_IO]

print(FILE_DIFF_LIST[0])

API Reference: Flow

Part 3: Review the list of diffs#

Now that we have a list of diffs for each file, we can review them and generate comments using an LLM.

This task can be broken into a sub-flow made up of five steps:

Sub Flow to review the PR diffs

Build the tools and checks#

Before creating the steps and sub-flow to generate the comments, it is important to define the list of checks the assistant should perform, along with any specific instructions. Additionally, a tool must be created to prefix the diffs with line numbers, allowing the LLM to determine where to add comments.

Below is the full code to achieve this. It is broken into sections so that you can see, in detail, what is happening in each part.

PR_BOT_CHECKS = [
    """
Name: TODO_WITHOUT_TICKET
Description: TODO comments should reference a ticket number for tracking.
Example code:
```python
# TODO: Add validation here
def process_user_input(data):
    return data
```
Example comment:
[BOT] TODO_WITHOUT_TICKET: TODO comment should reference a ticket number for tracking (e.g., "TODO: Add validation here (TICKET-1234)").
""",
    """
Name: MUTABLE_DEFAULT_ARGUMENT
Description: Using mutable objects as default arguments can lead to unexpected behavior.
Example code:
```python
def add_item(item, items=[]):
    items.append(item)
    return items
```
Example comment:
[BOT] MUTABLE_DEFAULT_ARGUMENT: Avoid using mutable default arguments. Use None and initialize in the function: `def add_item(item, items=None): items = items or []`
""",
    """
Name: NON_DESCRIPTIVE_NAME
Description: Variable names should clearly indicate their purpose or content.
Example code:
```python
def process(lst):
    res = []
    for i in lst:
        res.append(i * 2)
    return res
```
Example comment:
[BOT] NON_DESCRIPTIVE_NAME: Use more descriptive names: 'lst' could be 'numbers', 'res' could be 'doubled_numbers', 'i' could be 'number'
""",
]

# Join the individual checks into one block that is inserted into the prompt
CONCATENATED_CHECKS = "\n\n---\n\n".join(PR_BOT_CHECKS)

PROMPT_TEMPLATE = """You are a very experienced code reviewer. You are given a git diff on a file: {{filename}}

## Context
The git diff contains all changes of a single file. All lines are prepended with their number. Lines without line number were removed from the file.
After the line number, a line that was changed has a "+" before the code. All lines without a "+" are just here for context, you will not comment on them.

## Input
### Code diff
{{diff}}

## Task
Your task is to review these changes, according to different rules. Only comment lines that were added, so the lines that have a + just after the line number.
The rules are the following:

{{checks}}

### Response Format
You need to return a review as a json as follows:
```json
[
    {
        "content": "the comment as a text",
        "suggestion": "if the change you propose is a single line, then put here the single line rewritten that includes your proposal change. IMPORTANT: a single line, which will erase the current line. Put empty string if no suggestion or if the suggestion is more than a single line",
        "line": "line number where the comment applies"
    },

]
```
Please use triple backticks ``` to delimit your JSON list of comments. Don't output more than 5 comments, only comment the most relevant sections.
If there are no comments and the code seems fine, just output an empty JSON list."""


@tool(description_mode="only_docstring")
def format_git_diff(diff_text: str) -> str:
    """
    Formats a git diff by adding line numbers to each line except removal lines.
    """

    def pad_number(number: int, width: int) -> str:
        """Right-align a number with specified width using space padding."""
        return str(number).rjust(width)

    LINE_NUMBER_WIDTH = 5
    PADDING_WIDTH = LINE_NUMBER_WIDTH + 1
    current_line_number = 0
    formatted_lines = []

    for line in diff_text.split("\n"):
        # Handle diff header lines (e.g., "@@ -1,7 +1,6 @@")
        if line.startswith("@@"):
            try:
                # Extract the starting line number of the new file.
                # The line count is optional: git omits it when it equals 1.
                _, position_info, _ = line.split("@@", 2)
                new_file_info = position_info.split()[1][1:]  # Remove the '+' prefix
                start_line = int(new_file_info.split(",")[0])

                current_line_number = start_line
                formatted_lines.append(line)
                continue

            except (ValueError, IndexError):
                raise ValueError(f"Invalid diff header format: {line}")

        # Handle content lines (lines before the first "@@" header are skipped)
        if current_line_number > 0 and line:
            if not line.startswith("-"):
                # Add line number for added/context lines
                line_prefix = pad_number(current_line_number, LINE_NUMBER_WIDTH)
                formatted_lines.append(f"{line_prefix} {line}")
                current_line_number += 1
            else:
                # Just add padding for removal lines
                formatted_lines.append(" " * PADDING_WIDTH + line)

    return "\n".join(formatted_lines)

API Reference: ExtractValueFromJsonStep | MapStep | OutputMessageStep | PromptExecutionStep | ToolExecutionStep

Checks and LLM instructions#

You will use the three simple checks shown above. For each check you specify a name, a description of what the LLM should be checking, and an example of code with the expected comment, so that the LLM gets a better understanding of the task.

The prompt uses a simple structure:

  1. Role Definition: Define who/what you want the LLM to act as (e.g., “You are a very experienced code reviewer”).

  2. Context Section: Provide relevant background information or specific circumstances that frame the task.

  3. Input Section: Specify the exact information, data, or materials that the LLM will be provided with.

  4. Task Section: Clearly state what you want the LLM to do with the input provided.

  5. Response Format Section: Define how you want the response to be structured or formatted (e.g., bullet points, JSON, with XML tags, and so on).

The checks are defined in the list PR_BOT_CHECKS. The individual checks are then concatenated into a single string, CONCATENATED_CHECKS, so that it can be used inside the prompt you will pass to the LLM.

The prompt template, PROMPT_TEMPLATE, contains placeholders for the file name, the diff, and the checks. These are replaced when the prompt is specialized for each diff.

Tip

How to write high-quality prompts

There is no consensus on what makes the best LLM prompt. With recent LLMs, however, a good strategy is simply to be very specific about the task to be solved, giving enough context and explaining the edge cases to consider.

A useful test for a prompt: if you gave the same instructions to an experienced colleague with no prior context about the task, would they be sufficient for that colleague to reach the intended result?

Diff formatting tool#

Next, you need to create a tool, a ServerTool, to format the diffs in a manner that makes them consumable by the LLM. A tool, as you have already seen, is a simple wrapper around a Python callable that makes it usable within a flow.

The function, format_git_diff, in the code above does the work of formatting the diffs.
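To see what this formatting produces, the following simplified, standalone sketch applies the same numbering rules to a small diff without any WayFlow dependency. The helper number_diff_lines and the sample diff are written for this illustration only. Unlike the tool above, it uses a regular expression to read the hunk header.

```python
import re

# Matches hunk headers such as "@@ -3,3 +3,3 @@"; line counts are optional.
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def number_diff_lines(diff_text: str, width: int = 5) -> str:
    """Prefix added and context lines with their line number in the new file."""
    numbered = []
    current = 0  # 0 means that no hunk header has been seen yet
    for line in diff_text.split("\n"):
        header = HUNK_HEADER.match(line)
        if header:
            current = int(header.group(1))
            numbered.append(line)
        elif current > 0 and line:
            if line.startswith("-"):
                # Removed lines do not exist in the new file: pad only.
                numbered.append(" " * (width + 1) + line)
            else:
                numbered.append(f"{str(current).rjust(width)} {line}")
                current += 1
    return "\n".join(numbered)


sample_diff = "\n".join(
    [
        "diff --git src://demo.py dst://demo.py",
        "--- src://demo.py",
        "+++ dst://demo.py",
        "@@ -3,3 +3,3 @@",
        " def total(values):",
        "-    return 0",
        "+    return sum(values)",
        " # end",
    ]
)
numbered_diff = number_diff_lines(sample_diff)
print(numbered_diff)
```

The added line is printed as line 4 with a "+" right after the number, the removed line carries no number, and the file header lines that precede the first hunk header are dropped.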

See also

For more information about WayFlow tools please read our guide, How to use tools.

Building the steps and the sub-flow#

With the prompts and diff formatting tool written you can now build the second sub-flow. This sub-flow will iterate over the diffs, generated previously, and then use an LLM to generate review comments from them.

from wayflowcore._utils._templating_helpers import render_template_partially
from wayflowcore.property import AnyProperty, DictProperty, ListProperty, StringProperty
from wayflowcore.steps import (
    ExtractValueFromJsonStep,
    MapStep,
    OutputMessageStep,
    PromptExecutionStep,
    ToolExecutionStep,
)

# Flow, ControlFlowEdge, DataFlowEdge and RegexExtractionStep were imported in Part 2.

# IO Variable Names
DIFF_TO_STRING_IO = "$diff_to_string"
DIFF_WITH_LINES_IO = "$diff_with_lines"
FILEPATH_IO = "$filename"
JSON_COMMENTS_IO = "$json_comments"
EXTRACTED_COMMENTS_IO = "$extracted_comments"
NESTED_COMMENT_LIST_IO = "$nested_comment_list"
FILEPATH_LIST_IO = "$filepath_list"

# Define the steps

# Step 1: Format the diff to a string
format_diff_to_string_step = OutputMessageStep(
    name="format_diff_to_string",
    message_template="{{ message | string }}",
    output_mapping={OutputMessageStep.OUTPUT: DIFF_TO_STRING_IO},
)

# Step 2: Add lines on the diff using a tool
add_lines_on_diff_step = ToolExecutionStep(
    name="add_lines_on_diff",
    tool=format_git_diff,
    input_mapping={"diff_text": DIFF_TO_STRING_IO},
    output_mapping={ToolExecutionStep.TOOL_OUTPUT: DIFF_WITH_LINES_IO},
)

# Step 3: Extract the file path from the diff string using a regular expression
extract_file_path_step = RegexExtractionStep(
    name="extract_file_path",
    regex_pattern=r"diff --git src://(.+?) dst://",
    return_first_match_only=True,
    input_mapping={RegexExtractionStep.TEXT: DIFF_TO_STRING_IO},
    output_mapping={RegexExtractionStep.OUTPUT: FILEPATH_IO},
)

# Step 4: Generate comments using a prompt
generate_comments_step = PromptExecutionStep(
    name="generate_comments",
    prompt_template=render_template_partially(PROMPT_TEMPLATE, {"checks": CONCATENATED_CHECKS}),
    llm=llm,
    input_mapping={"diff": DIFF_WITH_LINES_IO, "filename": FILEPATH_IO},
    output_mapping={PromptExecutionStep.OUTPUT: JSON_COMMENTS_IO},
)

# Step 5: Extract comments from the JSON output
# Define the value type for extracted comments
comments_valuetype = ListProperty(
    name="values",
    description="The extracted comments content and line number",
    item_type=DictProperty(value_type=AnyProperty()),
    default_value=[],
)
extract_comments_from_json_step = ExtractValueFromJsonStep(
    name="extract_comments_from_json",
    output_values={comments_valuetype: '[.[] | {"content": .["content"], "line": .["line"]}]'},
    retry=True,
    llm=llm,
    input_mapping={ExtractValueFromJsonStep.TEXT: JSON_COMMENTS_IO},
    output_mapping={"values": EXTRACTED_COMMENTS_IO},
)

# Define the sub flow to generate comments for each file diff
generate_comments_subflow = Flow(
    name="Generate review comments flow",
    begin_step=format_diff_to_string_step,
    control_flow_edges=[
        ControlFlowEdge(format_diff_to_string_step, add_lines_on_diff_step),
        ControlFlowEdge(add_lines_on_diff_step, extract_file_path_step),
        ControlFlowEdge(extract_file_path_step, generate_comments_step),
        ControlFlowEdge(generate_comments_step, extract_comments_from_json_step),
        ControlFlowEdge(extract_comments_from_json_step, None),
    ],
    data_flow_edges=[
        DataFlowEdge(
            format_diff_to_string_step, DIFF_TO_STRING_IO, add_lines_on_diff_step, DIFF_TO_STRING_IO
        ),
        DataFlowEdge(
            format_diff_to_string_step, DIFF_TO_STRING_IO, extract_file_path_step, DIFF_TO_STRING_IO
        ),
        DataFlowEdge(
            add_lines_on_diff_step, DIFF_WITH_LINES_IO, generate_comments_step, DIFF_WITH_LINES_IO
        ),
        DataFlowEdge(extract_file_path_step, FILEPATH_IO, generate_comments_step, FILEPATH_IO),
        DataFlowEdge(
            generate_comments_step,
            JSON_COMMENTS_IO,
            extract_comments_from_json_step,
            JSON_COMMENTS_IO,
        ),
    ],
)

# Use the MapStep to apply the sub flow to each file
for_each_file_step = MapStep(
    flow=generate_comments_subflow,
    unpack_input={"message": "."},
    input_mapping={MapStep.ITERATED_INPUT: FILE_DIFF_LIST_IO},
    output_descriptors=[
        ListProperty(name=NESTED_COMMENT_LIST_IO, item_type=AnyProperty()),
        ListProperty(name=FILEPATH_LIST_IO, item_type=StringProperty()),
    ],
    output_mapping={EXTRACTED_COMMENTS_IO: NESTED_COMMENT_LIST_IO, FILEPATH_IO: FILEPATH_LIST_IO},
)

generate_all_comments_subflow = Flow.from_steps([for_each_file_step])

API Reference: Property | ListProperty | DictProperty | StringProperty | ExtractValueFromJsonStep | MapStep | OutputMessageStep | PromptExecutionStep | ToolExecutionStep

Take a look at each of the steps used in the sub-flow to get an understanding of what is happening.

Format diff to string, format_diff_to_string_step#

This step converts each file diff into a string so that it can be used by the following steps.

It uses an OutputMessageStep whose message template applies the string Jinja filter: {{ message | string }}.

Note

Jinja templating introduces security concerns that are addressed by WayFlow by restricting Jinja’s rendering capabilities. Please check our guide on How to write secure prompts with Jinja templating for more information.

Add lines to the diff, add_lines_on_diff_step#

This step prefixes the lines of the diff with the line numbers that the LLM needs in order to place its review comments. It uses a ToolExecutionStep to run the tool that you previously defined.

The input to the tool, within the I/O dictionary, is specified using the input_mapping. For all these steps, it is important to remember that the outputs of one step are linked to the inputs of the next.

Extract file path, extract_file_path_step#

This extracts the file path from the diff string. The file path is needed for assigning the review comments. The RegexExtractionStep step is used to extract the file path from the diff.

The regular expression is applied to the diff string, extracted from the input map using the input_mapping parameter.

Note: Compared to the RegexExtractionStep used in Part 2, here only the first match is required.
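The pattern can be tried with Python's re module on a diff header line. The sketch below is only an illustration; the helper extract_file_path is not part of the tutorial script.

```python
import re

FILE_PATH_PATTERN = r"diff --git src://(.+?) dst://"


def extract_file_path(diff_text: str):
    """Return the first file path found in the diff, or None if there is none."""
    match = re.search(FILE_PATH_PATTERN, diff_text)
    return match.group(1) if match else None


print(extract_file_path("diff --git src://calculators/utils.py dst://calculators/utils.py"))
# calculators/utils.py

# A header that uses git's default a/ and b/ prefixes does not match.
print(extract_file_path("diff --git a/calculators/utils.py b/calculators/utils.py"))
# None
```

The second call shows that the pattern relies on the src:// and dst:// prefixes seen in the example diff. If your own diffs use git's default a/ and b/ prefixes, the pattern would need to be adapted.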

Generate comments, generate_comments_step#

This generates comments using the LLM and the prompt template defined earlier. The PromptExecutionStep step executes the prompt with the LLM defined earlier in this tutorial.

Since the list of checks has already been defined, the template can be pre-rendered using the render_template_partially method. This renders the parts of the template that have been provided, while the remaining information is gathered from the I/O dictionary.

Extract comments from JSON, extract_comments_from_json_step#

This extracts the comments and line numbers from the generated LLM output, which is a serialized JSON structure due to the prompt used. An ExtractValueFromJsonStep is used to do the extraction. When creating the step, specify the following in addition to the usual input_mapping and output_mapping:

  • output_values: This defines the JQ query to extract the comments from the JSON generated by the LLM.

  • llms: An LLM that can be used to help resolve any parsing errors. This is related to retry.

  • retry: Whether to retry when parsing fails. Here it is set to True, so the LLM is used to help resolve any such issues.
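As a rough illustration of what this extraction amounts to, the sketch below parses a JSON string in plain Python rather than with JQ. The key names comment and line, the sample output, and the helper extract_comments are assumptions for this sketch; the keys your LLM returns depend on the prompt template you defined.

```python
import json

# Hypothetical LLM output; the key names are assumptions for this sketch.
llm_output = (
    '[{"comment": "Remove the debug print.", "line": 12},'
    ' {"comment": "Add a docstring.", "line": 3}]'
)


def extract_comments(raw: str) -> list:
    """Return a list of {"content", "line"} dicts, or [] if the text is not valid JSON."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        # With retry enabled, the step described above would ask the LLM
        # for help at this point; this sketch simply returns nothing.
        return []
    if not isinstance(parsed, list):
        return []
    return [{"content": item["comment"], "line": item["line"]} for item in parsed]


comments = extract_comments(llm_output)
print(comments)
```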

Create the sub-flow, generate_comments_subflow#

Here you define which steps are in the sub-flow, the transitions between them, and the starting step. This is the same process you followed previously when defining the sub-flow to fetch the PR data.

Applying the comment generation to all file diffs#

Now that you have created the sub-flow, you need to apply it to every file diff. This is done using a MapStep. MapStep takes a sub-flow as input, in this case generate_comments_subflow, and applies it to an iterable, in this case the list of file diffs.

You simply specify:

  • flow: The sub-flow that is applied to each element of the iterable.

  • unpack_input: Defines how to unpack the input. A JQ query can be used to transform the input, but in this case, it is kept as a list.

  • input_mapping: Defines what the sub-flow will iterate over. The key, MapStep.ITERATED_INPUT, is used to pass in the diffs.

  • output_descriptors: Specifies the values to collect from the output generated by applying the sub-flow. In this case, these will be the generated comments and the associated file path.

Note

The MapStep works similarly to how the Python map function works. For more information, see https://docs.python.org/3/library/functions.html#map
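The analogy with map can be sketched in plain Python as follows. The function review_one_diff is a hypothetical stand-in for the sub-flow, and the placeholder comment it returns is made up; the point is only to show how per-item outputs are collected into one list per output name.

```python
from typing import Dict, List


def review_one_diff(diff: str) -> Dict[str, object]:
    """Hypothetical stand-in for the sub-flow: one file path and one list of comments."""
    first_line = diff.splitlines()[0]
    path = first_line.split(" b/")[-1]
    comments = [{"content": "placeholder comment", "line": 1}]
    return {"filename": path, "extracted_comments": comments}


# Made-up list of per-file diffs.
file_diff_list: List[str] = [
    "diff --git a/a.py b/a.py\n+print('x')",
    "diff --git a/b.py b/b.py\n+import pdb",
]

# Apply the stand-in to every element, as map does.
results = list(map(review_one_diff, file_diff_list))

# Collect each per-item output into one list per output name.
filepath_list = [result["filename"] for result in results]
nested_comment_list = [result["extracted_comments"] for result in results]
print(filepath_list)
```

Each entry of nested_comment_list is itself a list, one per file, which is why the collected output is described as nested.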

Finally, create the sub-flow to generate all comments using the helper method Flow.from_steps.

Testing the sub-flow#

You can test the sub-flow by creating a conversation, as shown in the code below, and specifying the inputs as done in Part 2: Retrieve the PR diff information.

Since each sub-flow is tested independently, you can reuse the output from the first sub-flow.

# we reuse the FILE_DIFF_LIST from the previous test
test_conversation = generate_all_comments_subflow.start_conversation(
    inputs={
        FILE_DIFF_LIST_IO: FILE_DIFF_LIST,
    }
)

execution_status = test_conversation.execute()

if not isinstance(execution_status, FinishedStatus):
    raise ValueError("Unexpected status type")

NESTED_COMMENT_LIST = execution_status.output_values[NESTED_COMMENT_LIST_IO]
FILEPATH_LIST = execution_status.output_values[FILEPATH_LIST_IO]
print(NESTED_COMMENT_LIST[0])
print(FILEPATH_LIST)
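Because the two outputs are parallel lists, one entry per file, they can be paired up for inspection. The sketch below uses made-up sample values and assumes each comment is a dictionary with content and line keys; adjust the key names to whatever your extraction step actually produces.

```python
# Made-up values shaped like the two outputs of the sub-flow.
sample_paths = ["src/app.py", "tests/test_app.py"]
sample_nested_comments = [
    [{"content": "Remove the debug print.", "line": 12}],
    [],  # a file with no comments
]


def flatten_comments(paths, nested_comments):
    """Pair each comment with the path of the file it belongs to."""
    flat = []
    for path, comments in zip(paths, nested_comments):
        for comment in comments:
            flat.append((path, comment["line"], comment["content"]))
    return flat


flat_comments = flatten_comments(sample_paths, sample_nested_comments)
for path, line, content in flat_comments:
    print(f"{path}:{line}: {content}")
```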

Building the final Flow#

Congratulations! You have completed the three sub-flows, which, when combined into a single flow, will retrieve the PR diff information and generate comments on the diffs using an LLM.

You wire together the sub-flows you have built by wrapping each of them in a FlowExecutionStep. The FlowExecutionSteps are then composed into the final combined Flow.

The code for this is shown below:

from wayflowcore.steps import FlowExecutionStep


# Steps
retrieve_diff_flowstep = FlowExecutionStep(name="retrieve_diff_flowstep", flow=retrieve_diff_subflow)
generate_all_comments_flowstep = FlowExecutionStep(
    name="generate_comments_flowstep",
    flow=generate_all_comments_subflow,
)

pr_bot = Flow(
    name="PR bot flow",
    begin_step=retrieve_diff_flowstep,
    control_flow_edges=[
        ControlFlowEdge(retrieve_diff_flowstep, generate_all_comments_flowstep),
        ControlFlowEdge(generate_all_comments_flowstep, None),
    ],
    data_flow_edges=[
        DataFlowEdge(
            retrieve_diff_flowstep,
            FILE_DIFF_LIST_IO,
            generate_all_comments_flowstep,
            FILE_DIFF_LIST_IO,
        )
    ],
)

API Reference: Flow | FlowExecutionStep

Testing the combined assistant#

You can now run the PR bot end-to-end on your repo or locally.

Set PATH_TO_DIR to the path where you extracted the sample codebase Git repository. The code also shows how the output of the conversation is read from the execution_status object, using execution_status.output_values.

# Replace the path below with the path to your actual codebase sample git repository.
PATH_TO_DIR = "path/to/repository_root"

conversation = pr_bot.start_conversation(inputs={REPO_DIRPATH_IO: PATH_TO_DIR})

execution_status = conversation.execute()

if not isinstance(execution_status, FinishedStatus):
    raise ValueError("Unexpected status type")

print(execution_status.output_values)

NESTED_COMMENT_LIST = execution_status.output_values[NESTED_COMMENT_LIST_IO]

Agent Spec Exporting/Loading#

You can export the assistant to its Agent Spec configuration using the AgentSpecExporter.

from wayflowcore.agentspec import AgentSpecExporter

serialized_assistant = AgentSpecExporter().to_json(pr_bot)
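Since the exported value is a JSON string, it can be inspected with Python's standard json module. The sketch below runs on a trimmed, hand-written stand-in that reuses a few top-level keys from the representation on this page; in your own script you would pass serialized_assistant instead. The helper name summarize is hypothetical.

```python
import json

# Trimmed, hand-written stand-in for the exported string.
sample_export = json.dumps(
    {
        "component_type": "Flow",
        "name": "PR bot flow",
        "control_flow_connections": [
            {"name": "retrieve_diff_flowstep_to_generate_comments_flowstep_control_flow_edge"},
            {"name": "__StartStep___to_retrieve_diff_flowstep_control_flow_edge"},
        ],
        "data_flow_connections": [
            {"source_output": "$file_diff_list", "destination_input": "$file_diff_list"},
        ],
    }
)


def summarize(serialized: str) -> dict:
    """Return the flow name and the number of control and data flow edges."""
    config = json.loads(serialized)
    return {
        "name": config["name"],
        "control_edges": len(config.get("control_flow_connections", [])),
        "data_edges": len(config.get("data_flow_connections", [])),
    }


summary = summarize(sample_export)
print(summary)
```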

Here is what the Agent Spec representation will look like ↓

{
  "component_type": "Flow",
  "id": "9c65246d-a0dd-4ec4-801d-afd640b2488e",
  "name": "PR bot flow",
  "description": "",
  "metadata": {
    "__metadata_info__": {}
  },
  "inputs": [
    {
      "type": "string",
      "title": "$repo_dirpath_io"
    }
  ],
  "outputs": [
    {
      "type": "array",
      "items": {
        "type": "string"
      },
      "title": "$filepath_list"
    },
    {
      "type": "array",
      "items": {},
      "title": "$nested_comment_list"
    },
    {
      "type": "string",
      "title": "$raw_pr_diff"
    },
    {
      "description": "the list of extracted value using the regex \"(diff --git[\\s\\S]*?)(?=diff --git|$)\" from the raw input",
      "type": "array",
      "items": {
        "type": "string"
      },
      "title": "$file_diff_list",
      "default": []
    }
  ],
  "start_node": {
    "$component_ref": "020c885e-6d0b-472a-bb91-246ab70ab1db"
  },
  "nodes": [
    {
      "$component_ref": "47e367be-4d74-49dc-ac3b-89bb97ffa7df"
    },
    {
      "$component_ref": "43d58c76-23a0-4d10-943d-f9c5e0835a7c"
    },
    {
      "$component_ref": "020c885e-6d0b-472a-bb91-246ab70ab1db"
    },
    {
      "$component_ref": "a544af64-e63b-4ccf-9ab0-8d25cdbc0b93"
    }
  ],
  "control_flow_connections": [
    {
      "component_type": "ControlFlowEdge",
      "id": "a5c123ff-c14c-4291-b174-61d61170f187",
      "name": "retrieve_diff_flowstep_to_generate_comments_flowstep_control_flow_edge",
      "description": null,
      "metadata": {
        "__metadata_info__": {}
      },
      "from_node": {
        "$component_ref": "47e367be-4d74-49dc-ac3b-89bb97ffa7df"
      },
      "from_branch": null,
      "to_node": {
        "$component_ref": "43d58c76-23a0-4d10-943d-f9c5e0835a7c"
      }
    },
    {
      "component_type": "ControlFlowEdge",
      "id": "8a10b23a-2d0c-46c4-82ac-e66ad0b9399b",
      "name": "__StartStep___to_retrieve_diff_flowstep_control_flow_edge",
      "description": null,
      "metadata": {
        "__metadata_info__": {}
      },
      "from_node": {
        "$component_ref": "020c885e-6d0b-472a-bb91-246ab70ab1db"
      },
      "from_branch": null,
      "to_node": {
        "$component_ref": "47e367be-4d74-49dc-ac3b-89bb97ffa7df"
      }
    },
    {
      "component_type": "ControlFlowEdge",
      "id": "dac07720-8a5a-4a61-b1e7-50be506ed937",
      "name": "generate_comments_flowstep_to_None End node_control_flow_edge",
      "description": null,
      "metadata": {},
      "from_node": {
        "$component_ref": "43d58c76-23a0-4d10-943d-f9c5e0835a7c"
      },
      "from_branch": null,
      "to_node": {
        "$component_ref": "a544af64-e63b-4ccf-9ab0-8d25cdbc0b93"
      }
    }
  ],
  "data_flow_connections": [
    {
      "component_type": "DataFlowEdge",
      "id": "7b12dfed-309b-46ff-8a2d-bb6f2a3154b6",
      "name": "retrieve_diff_flowstep_$file_diff_list_to_generate_comments_flowstep_$file_diff_list_data_flow_edge",
      "description": null,
      "metadata": {
        "__metadata_info__": {}
      },
      "source_node": {
        "$component_ref": "47e367be-4d74-49dc-ac3b-89bb97ffa7df"
      },
      "source_output": "$file_diff_list",
      "destination_node": {
        "$component_ref": "43d58c76-23a0-4d10-943d-f9c5e0835a7c"
      },
      "destination_input": "$file_diff_list"
    },
    {
      "component_type": "DataFlowEdge",
      "id": "51122844-22d3-40a8-b652-1b020ce24945",
      "name": "__StartStep___$repo_dirpath_io_to_retrieve_diff_flowstep_$repo_dirpath_io_data_flow_edge",
      "description": null,
      "metadata": {
        "__metadata_info__": {}
      },
      "source_node": {
        "$component_ref": "020c885e-6d0b-472a-bb91-246ab70ab1db"
      },
      "source_output": "$repo_dirpath_io",
      "destination_node": {
        "$component_ref": "47e367be-4d74-49dc-ac3b-89bb97ffa7df"
      },
      "destination_input": "$repo_dirpath_io"
    },
    {
      "component_type": "DataFlowEdge",
      "id": "72aa469c-98cd-4f0d-9496-0aa454373aef",
      "name": "generate_comments_flowstep_$filepath_list_to_None End node_$filepath_list_data_flow_edge",
      "description": null,
      "metadata": {},
      "source_node": {
        "$component_ref": "43d58c76-23a0-4d10-943d-f9c5e0835a7c"
      },
      "source_output": "$filepath_list",
      "destination_node": {
        "$component_ref": "a544af64-e63b-4ccf-9ab0-8d25cdbc0b93"
      },
      "destination_input": "$filepath_list"
    },
    {
      "component_type": "DataFlowEdge",
      "id": "eac1b375-1541-41f7-87f3-f3e626cc2c9c",
      "name": "generate_comments_flowstep_$nested_comment_list_to_None End node_$nested_comment_list_data_flow_edge",
      "description": null,
      "metadata": {},
      "source_node": {
        "$component_ref": "43d58c76-23a0-4d10-943d-f9c5e0835a7c"
      },
      "source_output": "$nested_comment_list",
      "destination_node": {
        "$component_ref": "a544af64-e63b-4ccf-9ab0-8d25cdbc0b93"
      },
      "destination_input": "$nested_comment_list"
    },
    {
      "component_type": "DataFlowEdge",
      "id": "0869acb5-4d8f-4b17-b59b-3b915912b628",
      "name": "retrieve_diff_flowstep_$raw_pr_diff_to_None End node_$raw_pr_diff_data_flow_edge",
      "description": null,
      "metadata": {},
      "source_node": {
        "$component_ref": "47e367be-4d74-49dc-ac3b-89bb97ffa7df"
      },
      "source_output": "$raw_pr_diff",
      "destination_node": {
        "$component_ref": "a544af64-e63b-4ccf-9ab0-8d25cdbc0b93"
      },
      "destination_input": "$raw_pr_diff"
    },
    {
      "component_type": "DataFlowEdge",
      "id": "9fb2ab9e-ece1-4195-8f51-ef618dcb72bb",
      "name": "retrieve_diff_flowstep_$file_diff_list_to_None End node_$file_diff_list_data_flow_edge",
      "description": null,
      "metadata": {},
      "source_node": {
        "$component_ref": "47e367be-4d74-49dc-ac3b-89bb97ffa7df"
      },
      "source_output": "$file_diff_list",
      "destination_node": {
        "$component_ref": "a544af64-e63b-4ccf-9ab0-8d25cdbc0b93"
      },
      "destination_input": "$file_diff_list"
    }
  ],
  "$referenced_components": {
    "43d58c76-23a0-4d10-943d-f9c5e0835a7c": {
      "component_type": "FlowNode",
      "id": "43d58c76-23a0-4d10-943d-f9c5e0835a7c",
      "name": "generate_comments_flowstep",
      "description": "",
      "metadata": {
        "__metadata_info__": {}
      },
      "inputs": [
        {
          "description": "iterated input for the map step",
          "type": "array",
          "items": {
            "description": "\"message\" input variable for the template",
            "title": "message"
          },
          "title": "$file_diff_list"
        }
      ],
      "outputs": [
        {
          "type": "array",
          "items": {
            "type": "string"
          },
          "title": "$filepath_list"
        },
        {
          "type": "array",
          "items": {},
          "title": "$nested_comment_list"
        }
      ],
      "branches": [
        "next"
      ],
      "subflow": {
        "component_type": "Flow",
        "id": "f95e0e5d-f573-4e25-9d68-8508371246f9",
        "name": "flow_028a7dfb__auto",
        "description": "",
        "metadata": {
          "__metadata_info__": {}
        },
        "inputs": [
          {
            "description": "iterated input for the map step",
            "type": "array",
            "items": {
              "description": "\"message\" input variable for the template",
              "title": "message"
            },
            "title": "$file_diff_list"
          }
        ],
        "outputs": [
          {
            "type": "array",
            "items": {
              "type": "string"
            },
            "title": "$filepath_list"
          },
          {
            "type": "array",
            "items": {},
            "title": "$nested_comment_list"
          }
        ],
        "start_node": {
          "$component_ref": "367ae568-317d-42ec-ae70-4c41afe0dbd0"
        },
        "nodes": [
          {
            "$component_ref": "f127a297-842d-4d17-bc89-4704019458d7"
          },
          {
            "$component_ref": "367ae568-317d-42ec-ae70-4c41afe0dbd0"
          },
          {
            "$component_ref": "6f62aecf-03a1-4e38-b551-8eef0efaf4bb"
          }
        ],
        "control_flow_connections": [
          {
            "component_type": "ControlFlowEdge",
            "id": "85a2cdff-6ad4-4f58-8d1c-c8deeb05880c",
            "name": "__StartStep___to_step_0_control_flow_edge",
            "description": null,
            "metadata": {
              "__metadata_info__": {}
            },
            "from_node": {
              "$component_ref": "367ae568-317d-42ec-ae70-4c41afe0dbd0"
            },
            "from_branch": null,
            "to_node": {
              "$component_ref": "f127a297-842d-4d17-bc89-4704019458d7"
            }
          },
          {
            "component_type": "ControlFlowEdge",
            "id": "396e218f-225e-4e36-a33c-a176ca77d345",
            "name": "step_0_to_None End node_control_flow_edge",
            "description": null,
            "metadata": {},
            "from_node": {
              "$component_ref": "f127a297-842d-4d17-bc89-4704019458d7"
            },
            "from_branch": null,
            "to_node": {
              "$component_ref": "6f62aecf-03a1-4e38-b551-8eef0efaf4bb"
            }
          }
        ],
        "data_flow_connections": [
          {
            "component_type": "DataFlowEdge",
            "id": "6c8b8f78-b587-49ff-a401-6262cdafb0ee",
            "name": "__StartStep___$file_diff_list_to_step_0_$file_diff_list_data_flow_edge",
            "description": null,
            "metadata": {
              "__metadata_info__": {}
            },
            "source_node": {
              "$component_ref": "367ae568-317d-42ec-ae70-4c41afe0dbd0"
            },
            "source_output": "$file_diff_list",
            "destination_node": {
              "$component_ref": "f127a297-842d-4d17-bc89-4704019458d7"
            },
            "destination_input": "$file_diff_list"
          },
          {
            "component_type": "DataFlowEdge",
            "id": "84d3a783-38c8-4d53-bc0b-4205732d1fbf",
            "name": "step_0_$filepath_list_to_None End node_$filepath_list_data_flow_edge",
            "description": null,
            "metadata": {},
            "source_node": {
              "$component_ref": "f127a297-842d-4d17-bc89-4704019458d7"
            },
            "source_output": "$filepath_list",
            "destination_node": {
              "$component_ref": "6f62aecf-03a1-4e38-b551-8eef0efaf4bb"
            },
            "destination_input": "$filepath_list"
          },
          {
            "component_type": "DataFlowEdge",
            "id": "b7ffd4c3-4a03-47f0-95fc-0ba670010729",
            "name": "step_0_$nested_comment_list_to_None End node_$nested_comment_list_data_flow_edge",
            "description": null,
            "metadata": {},
            "source_node": {
              "$component_ref": "f127a297-842d-4d17-bc89-4704019458d7"
            },
            "source_output": "$nested_comment_list",
            "destination_node": {
              "$component_ref": "6f62aecf-03a1-4e38-b551-8eef0efaf4bb"
            },
            "destination_input": "$nested_comment_list"
          }
        ],
        "$referenced_components": {
          "f127a297-842d-4d17-bc89-4704019458d7": {
            "component_type": "ExtendedMapNode",
            "id": "f127a297-842d-4d17-bc89-4704019458d7",
            "name": "step_0",
            "description": "",
            "metadata": {
              "__metadata_info__": {}
            },
            "inputs": [
              {
                "description": "iterated input for the map step",
                "type": "array",
                "items": {
                  "description": "\"message\" input variable for the template",
                  "title": "message"
                },
                "title": "$file_diff_list"
              }
            ],
            "outputs": [
              {
                "type": "array",
                "items": {},
                "title": "$nested_comment_list"
              },
              {
                "type": "array",
                "items": {
                  "type": "string"
                },
                "title": "$filepath_list"
              }
            ],
            "branches": [
              "next"
            ],
            "input_mapping": {
              "iterated_input": "$file_diff_list"
            },
            "output_mapping": {
              "$extracted_comments": "$nested_comment_list",
              "$filename": "$filepath_list"
            },
            "flow": {
              "component_type": "Flow",
              "id": "3da67cce-b8de-40be-bb8d-e1edead178f0",
              "name": "Generate review comments flow",
              "description": "",
              "metadata": {
                "__metadata_info__": {}
              },
              "inputs": [
                {
                  "description": "\"message\" input variable for the template",
                  "title": "message"
                }
              ],
              "outputs": [
                {
                  "description": "The extracted comments content and line number",
                  "type": "array",
                  "items": {
                    "type": "object",
                    "additionalProperties": {},
                    "key_type": {
                      "type": "string"
                    }
                  },
                  "title": "$extracted_comments"
                },
                {
                  "description": "the generated text",
                  "type": "string",
                  "title": "$json_comments"
                },
                {
                  "type": "string",
                  "title": "$diff_with_lines"
                },
                {
                  "description": "the first extracted value using the regex \"diff --git a/(.+?) b/\" from the raw input",
                  "type": "string",
                  "title": "$filename",
                  "default": ""
                },
                {
                  "description": "the message added to the messages list",
                  "type": "string",
                  "title": "$diff_to_string"
                }
              ],
              "start_node": {
                "$component_ref": "e20f5870-d594-4089-9fcd-08146232910d"
              },
              "nodes": [
                {
                  "$component_ref": "f0fb3ab4-a950-43b6-a583-6f0044f18c7f"
                },
                {
                  "$component_ref": "6000ee3f-ac80-4937-b36c-94fd65cdcda4"
                },
                {
                  "$component_ref": "6f6dc822-9352-47ae-9b48-173402a334fe"
                },
                {
                  "$component_ref": "0ce752d7-3ef1-481b-bb01-c7081ef86103"
                },
                {
                  "$component_ref": "48057b9c-bee7-4286-baf5-625b6f1a6f1a"
                },
                {
                  "$component_ref": "e20f5870-d594-4089-9fcd-08146232910d"
                },
                {
                  "$component_ref": "39f36227-8910-414c-8b6b-517c0d65b0d8"
                }
              ],
              "control_flow_connections": [
                {
                  "component_type": "ControlFlowEdge",
                  "id": "becf6951-96fd-4152-97d0-4a4eff042a29",
                  "name": "format_diff_to_string_to_add_lines_on_diff_control_flow_edge",
                  "description": null,
                  "metadata": {
                    "__metadata_info__": {}
                  },
                  "from_node": {
                    "$component_ref": "f0fb3ab4-a950-43b6-a583-6f0044f18c7f"
                  },
                  "from_branch": null,
                  "to_node": {
                    "$component_ref": "6000ee3f-ac80-4937-b36c-94fd65cdcda4"
                  }
                },
                {
                  "component_type": "ControlFlowEdge",
                  "id": "c197b0d5-8002-4910-ae8d-61f97f1f8f26",
                  "name": "add_lines_on_diff_to_extract_file_path_control_flow_edge",
                  "description": null,
                  "metadata": {
                    "__metadata_info__": {}
                  },
                  "from_node": {
                    "$component_ref": "6000ee3f-ac80-4937-b36c-94fd65cdcda4"
                  },
                  "from_branch": null,
                  "to_node": {
                    "$component_ref": "6f6dc822-9352-47ae-9b48-173402a334fe"
                  }
                },
                {
                  "component_type": "ControlFlowEdge",
                  "id": "406e0670-cc49-4da4-8d15-8c1c320193e8",
                  "name": "extract_file_path_to_generate_comments_control_flow_edge",
                  "description": null,
                  "metadata": {
                    "__metadata_info__": {}
                  },
                  "from_node": {
                    "$component_ref": "6f6dc822-9352-47ae-9b48-173402a334fe"
                  },
                  "from_branch": null,
                  "to_node": {
                    "$component_ref": "0ce752d7-3ef1-481b-bb01-c7081ef86103"
                  }
                },
                {
                  "component_type": "ControlFlowEdge",
                  "id": "e54eb347-2e6c-42c4-a7d6-a42c8059bdf3",
                  "name": "generate_comments_to_extract_comments_from_json_control_flow_edge",
                  "description": null,
                  "metadata": {
                    "__metadata_info__": {}
                  },
                  "from_node": {
                    "$component_ref": "0ce752d7-3ef1-481b-bb01-c7081ef86103"
                  },
                  "from_branch": null,
                  "to_node": {
                    "$component_ref": "48057b9c-bee7-4286-baf5-625b6f1a6f1a"
                  }
                },
                {
                  "component_type": "ControlFlowEdge",
                  "id": "ebe5e60b-2724-4b51-b287-79f3e8e7fdd1",
                  "name": "__StartStep___to_format_diff_to_string_control_flow_edge",
                  "description": null,
                  "metadata": {
                    "__metadata_info__": {}
                  },
                  "from_node": {
                    "$component_ref": "e20f5870-d594-4089-9fcd-08146232910d"
                  },
                  "from_branch": null,
                  "to_node": {
                    "$component_ref": "f0fb3ab4-a950-43b6-a583-6f0044f18c7f"
                  }
                },
                {
                  "component_type": "ControlFlowEdge",
                  "id": "98e7631e-7206-4ba9-b5b0-eb308ac89c0f",
                  "name": "extract_comments_from_json_to_None End node_control_flow_edge",
                  "description": null,
                  "metadata": {},
                  "from_node": {
                    "$component_ref": "48057b9c-bee7-4286-baf5-625b6f1a6f1a"
                  },
                  "from_branch": null,
                  "to_node": {
                    "$component_ref": "39f36227-8910-414c-8b6b-517c0d65b0d8"
                  }
                }
              ],
              "data_flow_connections": [
                {
                  "component_type": "DataFlowEdge",
                  "id": "ab8ed6de-3ea7-424e-a830-bca10ac57a32",
                  "name": "format_diff_to_string_$diff_to_string_to_add_lines_on_diff_$diff_to_string_data_flow_edge",
                  "description": null,
                  "metadata": {
                    "__metadata_info__": {}
                  },
                  "source_node": {
                    "$component_ref": "f0fb3ab4-a950-43b6-a583-6f0044f18c7f"
                  },
                  "source_output": "$diff_to_string",
                  "destination_node": {
                    "$component_ref": "6000ee3f-ac80-4937-b36c-94fd65cdcda4"
                  },
                  "destination_input": "$diff_to_string"
                },
                {
                  "component_type": "DataFlowEdge",
                  "id": "3caaa171-9b4b-44df-8ebd-4d060329f91a",
                  "name": "format_diff_to_string_$diff_to_string_to_extract_file_path_$diff_to_string_data_flow_edge",
                  "description": null,
                  "metadata": {
                    "__metadata_info__": {}
                  },
                  "source_node": {
                    "$component_ref": "f0fb3ab4-a950-43b6-a583-6f0044f18c7f"
                  },
                  "source_output": "$diff_to_string",
                  "destination_node": {
                    "$component_ref": "6f6dc822-9352-47ae-9b48-173402a334fe"
                  },
                  "destination_input": "$diff_to_string"
                },
                {
                  "component_type": "DataFlowEdge",
                  "id": "cdf0945b-5a96-42ff-b410-f7c56b5f8e45",
                  "name": "add_lines_on_diff_$diff_with_lines_to_generate_comments_$diff_with_lines_data_flow_edge",
                  "description": null,
                  "metadata": {
                    "__metadata_info__": {}
                  },
                  "source_node": {
                    "$component_ref": "6000ee3f-ac80-4937-b36c-94fd65cdcda4"
                  },
                  "source_output": "$diff_with_lines",
                  "destination_node": {
                    "$component_ref": "0ce752d7-3ef1-481b-bb01-c7081ef86103"
                  },
                  "destination_input": "$diff_with_lines"
                },
                {
                  "component_type": "DataFlowEdge",
                  "id": "ca6ed62b-6f6a-405f-9f16-5e1304de6608",
                  "name": "extract_file_path_$filename_to_generate_comments_$filename_data_flow_edge",
                  "description": null,
                  "metadata": {
                    "__metadata_info__": {}
                  },
                  "source_node": {
                    "$component_ref": "6f6dc822-9352-47ae-9b48-173402a334fe"
                  },
                  "source_output": "$filename",
                  "destination_node": {
                    "$component_ref": "0ce752d7-3ef1-481b-bb01-c7081ef86103"
                  },
                  "destination_input": "$filename"
                },
                {
                  "component_type": "DataFlowEdge",
                  "id": "dec4b4bb-56c9-445a-a282-9d095ff6038e",
                  "name": "generate_comments_$json_comments_to_extract_comments_from_json_$json_comments_data_flow_edge",
                  "description": null,
                  "metadata": {
                    "__metadata_info__": {}
                  },
                  "source_node": {
                    "$component_ref": "0ce752d7-3ef1-481b-bb01-c7081ef86103"
                  },
                  "source_output": "$json_comments",
                  "destination_node": {
                    "$component_ref": "48057b9c-bee7-4286-baf5-625b6f1a6f1a"
                  },
                  "destination_input": "$json_comments"
                },
                {
                  "component_type": "DataFlowEdge",
                  "id": "611478d7-281a-4587-81e6-97e8c745da53",
                  "name": "__StartStep___message_to_format_diff_to_string_message_data_flow_edge",
                  "description": null,
                  "metadata": {
                    "__metadata_info__": {}
                  },
                  "source_node": {
                    "$component_ref": "e20f5870-d594-4089-9fcd-08146232910d"
                  },
                  "source_output": "message",
                  "destination_node": {
                    "$component_ref": "f0fb3ab4-a950-43b6-a583-6f0044f18c7f"
                  },
                  "destination_input": "message"
                },
                {
                  "component_type": "DataFlowEdge",
                  "id": "227ae098-0baf-4fe8-9615-094bb386c9a9",
                  "name": "extract_comments_from_json_$extracted_comments_to_None End node_$extracted_comments_data_flow_edge",
                  "description": null,
                  "metadata": {},
                  "source_node": {
                    "$component_ref": "48057b9c-bee7-4286-baf5-625b6f1a6f1a"
                  },
                  "source_output": "$extracted_comments",
                  "destination_node": {
                    "$component_ref": "39f36227-8910-414c-8b6b-517c0d65b0d8"
                  },
                  "destination_input": "$extracted_comments"
                },
                {
                  "component_type": "DataFlowEdge",
                  "id": "6e25b4d8-5656-471b-8ffa-1fe8cfffbc05",
                  "name": "generate_comments_$json_comments_to_None End node_$json_comments_data_flow_edge",
                  "description": null,
                  "metadata": {},
                  "source_node": {
                    "$component_ref": "0ce752d7-3ef1-481b-bb01-c7081ef86103"
                  },
                  "source_output": "$json_comments",
                  "destination_node": {
                    "$component_ref": "39f36227-8910-414c-8b6b-517c0d65b0d8"
                  },
                  "destination_input": "$json_comments"
                },
                {
                  "component_type": "DataFlowEdge",
                  "id": "fdbf1eeb-0278-4dc8-b897-c924937a1692",
                  "name": "add_lines_on_diff_$diff_with_lines_to_None End node_$diff_with_lines_data_flow_edge",
                  "description": null,
                  "metadata": {},
                  "source_node": {
                    "$component_ref": "6000ee3f-ac80-4937-b36c-94fd65cdcda4"
                  },
                  "source_output": "$diff_with_lines",
                  "destination_node": {
                    "$component_ref": "39f36227-8910-414c-8b6b-517c0d65b0d8"
                  },
                  "destination_input": "$diff_with_lines"
                },
                {
                  "component_type": "DataFlowEdge",
                  "id": "3b6bcba7-635b-45fa-b450-cf0a15dae463",
                  "name": "extract_file_path_$filename_to_None End node_$filename_data_flow_edge",
                  "description": null,
                  "metadata": {},
                  "source_node": {
                    "$component_ref": "6f6dc822-9352-47ae-9b48-173402a334fe"
                  },
                  "source_output": "$filename",
                  "destination_node": {
                    "$component_ref": "39f36227-8910-414c-8b6b-517c0d65b0d8"
                  },
                  "destination_input": "$filename"
                },
                {
                  "component_type": "DataFlowEdge",
                  "id": "2f95704b-4cc1-4983-8a20-e39c79a94e01",
                  "name": "format_diff_to_string_$diff_to_string_to_None End node_$diff_to_string_data_flow_edge",
                  "description": null,
                  "metadata": {},
                  "source_node": {
                    "$component_ref": "f0fb3ab4-a950-43b6-a583-6f0044f18c7f"
                  },
                  "source_output": "$diff_to_string",
                  "destination_node": {
                    "$component_ref": "39f36227-8910-414c-8b6b-517c0d65b0d8"
                  },
                  "destination_input": "$diff_to_string"
                }
              ],
              "$referenced_components": {
                "6000ee3f-ac80-4937-b36c-94fd65cdcda4": {
                  "component_type": "ExtendedToolNode",
                  "id": "6000ee3f-ac80-4937-b36c-94fd65cdcda4",
                  "name": "add_lines_on_diff",
                  "description": "",
                  "metadata": {
                    "__metadata_info__": {}
                  },
                  "inputs": [
                    {
                      "type": "string",
                      "title": "$diff_to_string"
                    }
                  ],
                  "outputs": [
                    {
                      "type": "string",
                      "title": "$diff_with_lines"
                    }
                  ],
                  "branches": [
                    "next"
                  ],
                  "tool": {
                    "component_type": "ServerTool",
                    "id": "e936566f-7a25-40f3-9434-3e740a7bfb02",
                    "name": "format_git_diff",
                    "description": "Formats a git diff by adding line numbers to each line except removal lines.",
                    "metadata": {
                      "__metadata_info__": {}
                    },
                    "inputs": [
                      {
                        "type": "string",
                        "title": "diff_text"
                      }
                    ],
                    "outputs": [
                      {
                        "type": "string",
                        "title": "tool_output"
                      }
                    ]
                  },
                  "input_mapping": {
                    "diff_text": "$diff_to_string"
                  },
                  "output_mapping": {
                    "tool_output": "$diff_with_lines"
                  },
                  "raise_exceptions": false,
                  "component_plugin_name": "NodesPlugin",
                  "component_plugin_version": "25.4.0.dev0"
                },
                "f0fb3ab4-a950-43b6-a583-6f0044f18c7f": {
                  "component_type": "PluginOutputMessageNode",
                  "id": "f0fb3ab4-a950-43b6-a583-6f0044f18c7f",
                  "name": "format_diff_to_string",
                  "description": "",
                  "metadata": {
                    "__metadata_info__": {}
                  },
                  "inputs": [
                    {
                      "description": "\"message\" input variable for the template",
                      "title": "message"
                    }
                  ],
                  "outputs": [
                    {
                      "description": "the message added to the messages list",
                      "type": "string",
                      "title": "$diff_to_string"
                    }
                  ],
                  "branches": [
                    "next"
                  ],
                  "expose_message_as_output": true,
                  "message": "{{ message | string }}",
                  "input_mapping": {},
                  "output_mapping": {
                    "output_message": "$diff_to_string"
                  },
                  "message_type": "AGENT",
                  "rephrase": false,
                  "llm_config": null,
                  "component_plugin_name": "NodesPlugin",
                  "component_plugin_version": "25.4.0.dev0"
                },
                "6f6dc822-9352-47ae-9b48-173402a334fe": {
                  "component_type": "PluginRegexNode",
                  "id": "6f6dc822-9352-47ae-9b48-173402a334fe",
                  "name": "extract_file_path",
                  "description": "",
                  "metadata": {
                    "__metadata_info__": {}
                  },
                  "inputs": [
                    {
                      "description": "raw text to extract information from",
                      "type": "string",
                      "title": "$diff_to_string"
                    }
                  ],
                  "outputs": [
                    {
                      "description": "the first extracted value using the regex \"diff --git a/(.+?) b/\" from the raw input",
                      "type": "string",
                      "title": "$filename",
                      "default": ""
                    }
                  ],
                  "branches": [
                    "next"
                  ],
                  "input_mapping": {
                    "text": "$diff_to_string"
                  },
                  "output_mapping": {
                    "output": "$filename"
                  },
                  "regex_pattern": "diff --git a/(.+?) b/",
                  "return_first_match_only": true,
                  "component_plugin_name": "NodesPlugin",
                  "component_plugin_version": "25.4.0.dev0"
                },
                "0ce752d7-3ef1-481b-bb01-c7081ef86103": {
                  "component_type": "ExtendedLlmNode",
                  "id": "0ce752d7-3ef1-481b-bb01-c7081ef86103",
                  "name": "generate_comments",
                  "description": "",
                  "metadata": {
                    "__metadata_info__": {}
                  },
                  "inputs": [
                    {
                      "description": "\"filename\" input variable for the template",
                      "type": "string",
                      "title": "$filename"
                    },
                    {
                      "description": "\"diff\" input variable for the template",
                      "type": "string",
                      "title": "$diff_with_lines"
                    }
                  ],
                  "outputs": [
                    {
                      "description": "the generated text",
                      "type": "string",
                      "title": "$json_comments"
                    }
                  ],
                  "branches": [
                    "next"
                  ],
                  "llm_config": {
                    "component_type": "VllmConfig",
                    "id": "fb043839-1e69-404c-a178-d8c3de0bfe20",
                    "name": "LLAMA_MODEL_ID",
                    "description": null,
                    "metadata": {
                      "__metadata_info__": {}
                    },
                    "default_generation_parameters": null,
                    "url": "LLAMA_API_URL",
                    "model_id": "LLAMA_MODEL_ID"
                  },
                  "prompt_template": "You are a very experienced code reviewer. You are given a git diff on a file: {{ filename }}\n\n## Context\nThe git diff contains all changes of a single file. All lines are prepended with their number. Lines without line number where removed from the file.\nAfter the line number, a line that was changed has a \"+\" before the code. All lines without a \"+\" are just here for context, you will not comment on them.\n\n## Input\n### Code diff\n{{ diff }}\n\n## Task\nYour task is to review these changes, according to different rules. Only comment lines that were added, so the lines that have a + just after the line number.\nThe rules are the following:\n\n\nName: TODO_WITHOUT_TICKET\nDescription: TODO comments should reference a ticket number for tracking.\nExample code:\n```python\n# TODO: Add validation here\ndef process_user_input(data):\n    return data\n```\nExample comment:\n[BOT] TODO_WITHOUT_TICKET: TODO comment should reference a ticket number for tracking (e.g., \"TODO: Add validation here (TICKET-1234)\").\n\n\n---\n\n\nName: MUTABLE_DEFAULT_ARGUMENT\nDescription: Using mutable objects as default arguments can lead to unexpected behavior.\nExample code:\n```python\ndef add_item(item, items=[]):\n    items.append(item)\n    return items\n```\nExample comment:\n[BOT] MUTABLE_DEFAULT_ARGUMENT: Avoid using mutable default arguments. Use None and initialize in the function: `def add_item(item, items=None): items = items or []`\n\n\n---\n\n\nName: NON_DESCRIPTIVE_NAME\nDescription: Variable names should clearly indicate their purpose or content.\nExample code:\n```python\ndef process(lst):\n    res = []\n    for i in lst:\n        res.append(i * 2)\n    return res\n```\nExample comment:\n[BOT] NON_DESCRIPTIVE_NAME: Use more descriptive names: 'lst' could be 'numbers', 'res' could be 'doubled_numbers', 'i' could be 'number'\n\n\n### Response Format\nYou need to return a review as a json as follows:\n```json\n[\n    {\n        \"content\": \"the comment as a text\",\n        \"suggestion\": \"if the change you propose is a single line, then put here the single line rewritten that includes your proposal change. IMPORTANT: a single line, which will erase the current line. Put empty string if no suggestion of if the suggestion is more than a single line\",\n        \"line\": \"line number where the comment applies\"\n    },\n    \u2026\n]\n```\nPlease use triple backticks ``` to delimitate your JSON list of comments. Don't output more than 5 comments, only comment the most relevant sections.\nIf there are no comments and the code seems fine, just output an empty JSON list.",
                  "input_mapping": {
                    "diff": "$diff_with_lines",
                    "filename": "$filename"
                  },
                  "output_mapping": {
                    "output": "$json_comments"
                  },
                  "prompt_template_object": null,
                  "send_message": false,
                  "component_plugin_name": "NodesPlugin",
                  "component_plugin_version": "25.4.0.dev0"
                },
                "48057b9c-bee7-4286-baf5-625b6f1a6f1a": {
                  "component_type": "PluginExtractNode",
                  "id": "48057b9c-bee7-4286-baf5-625b6f1a6f1a",
                  "name": "extract_comments_from_json",
                  "description": "",
                  "metadata": {
                    "__metadata_info__": {}
                  },
                  "inputs": [
                    {
                      "description": "raw text to extract information from",
                      "type": "string",
                      "title": "$json_comments"
                    }
                  ],
                  "outputs": [
                    {
                      "description": "The extracted comments content and line number",
                      "type": "array",
                      "items": {
                        "type": "object",
                        "additionalProperties": {},
                        "key_type": {
                          "type": "string"
                        }
                      },
                      "title": "$extracted_comments"
                    }
                  ],
                  "branches": [
                    "next"
                  ],
                  "input_mapping": {
                    "text": "$json_comments"
                  },
                  "output_mapping": {
                    "values": "$extracted_comments"
                  },
                  "output_values": {
                    "values": "[.[] | {\"content\": .[\"content\"], \"line\": .[\"line\"]}]"
                  },
                  "component_plugin_name": "NodesPlugin",
                  "component_plugin_version": "25.4.0.dev0"
                },
                "e20f5870-d594-4089-9fcd-08146232910d": {
                  "component_type": "StartNode",
                  "id": "e20f5870-d594-4089-9fcd-08146232910d",
                  "name": "__StartStep__",
                  "description": "",
                  "metadata": {
                    "__metadata_info__": {}
                  },
                  "inputs": [
                    {
                      "description": "\"message\" input variable for the template",
                      "title": "message"
                    }
                  ],
                  "outputs": [
                    {
                      "description": "\"message\" input variable for the template",
                      "title": "message"
                    }
                  ],
                  "branches": [
                    "next"
                  ]
                },
                "39f36227-8910-414c-8b6b-517c0d65b0d8": {
                  "component_type": "EndNode",
                  "id": "39f36227-8910-414c-8b6b-517c0d65b0d8",
                  "name": "None End node",
                  "description": "End node representing all transitions to None in the WayFlow flow",
                  "metadata": {},
                  "inputs": [
                    {
                      "description": "The extracted comments content and line number",
                      "type": "array",
                      "items": {
                        "type": "object",
                        "additionalProperties": {},
                        "key_type": {
                          "type": "string"
                        }
                      },
                      "title": "$extracted_comments"
                    },
                    {
                      "description": "the generated text",
                      "type": "string",
                      "title": "$json_comments"
                    },
                    {
                      "type": "string",
                      "title": "$diff_with_lines"
                    },
                    {
                      "description": "the first extracted value using the regex \"diff --git a/(.+?) b/\" from the raw input",
                      "type": "string",
                      "title": "$filename",
                      "default": ""
                    },
                    {
                      "description": "the message added to the messages list",
                      "type": "string",
                      "title": "$diff_to_string"
                    }
                  ],
                  "outputs": [
                    {
                      "description": "The extracted comments content and line number",
                      "type": "array",
                      "items": {
                        "type": "object",
                        "additionalProperties": {},
                        "key_type": {
                          "type": "string"
                        }
                      },
                      "title": "$extracted_comments"
                    },
                    {
                      "description": "the generated text",
                      "type": "string",
                      "title": "$json_comments"
                    },
                    {
                      "type": "string",
                      "title": "$diff_with_lines"
                    },
                    {
                      "description": "the first extracted value using the regex \"diff --git a/(.+?) b/\" from the raw input",
                      "type": "string",
                      "title": "$filename",
                      "default": ""
                    },
                    {
                      "description": "the message added to the messages list",
                      "type": "string",
                      "title": "$diff_to_string"
                    }
                  ],
                  "branches": [],
                  "branch_name": "next"
                }
              }
            },
            "unpack_input": {
              "message": "."
            },
            "parallel_execution": false,
            "component_plugin_name": "NodesPlugin",
            "component_plugin_version": "25.4.0.dev0"
          },
          "367ae568-317d-42ec-ae70-4c41afe0dbd0": {
            "component_type": "StartNode",
            "id": "367ae568-317d-42ec-ae70-4c41afe0dbd0",
            "name": "__StartStep__",
            "description": "",
            "metadata": {
              "__metadata_info__": {}
            },
            "inputs": [
              {
                "description": "iterated input for the map step",
                "type": "array",
                "items": {
                  "description": "\"message\" input variable for the template",
                  "title": "message"
                },
                "title": "$file_diff_list"
              }
            ],
            "outputs": [
              {
                "description": "iterated input for the map step",
                "type": "array",
                "items": {
                  "description": "\"message\" input variable for the template",
                  "title": "message"
                },
                "title": "$file_diff_list"
              }
            ],
            "branches": [
              "next"
            ]
          },
          "6f62aecf-03a1-4e38-b551-8eef0efaf4bb": {
            "component_type": "EndNode",
            "id": "6f62aecf-03a1-4e38-b551-8eef0efaf4bb",
            "name": "None End node",
            "description": "End node representing all transitions to None in the WayFlow flow",
            "metadata": {},
            "inputs": [
              {
                "type": "array",
                "items": {
                  "type": "string"
                },
                "title": "$filepath_list"
              },
              {
                "type": "array",
                "items": {},
                "title": "$nested_comment_list"
              }
            ],
            "outputs": [
              {
                "type": "array",
                "items": {
                  "type": "string"
                },
                "title": "$filepath_list"
              },
              {
                "type": "array",
                "items": {},
                "title": "$nested_comment_list"
              }
            ],
            "branches": [],
            "branch_name": "next"
          }
        }
      }
    },
    "47e367be-4d74-49dc-ac3b-89bb97ffa7df": {
      "component_type": "FlowNode",
      "id": "47e367be-4d74-49dc-ac3b-89bb97ffa7df",
      "name": "retrieve_diff_flowstep",
      "description": "",
      "metadata": {
        "__metadata_info__": {}
      },
      "inputs": [
        {
          "type": "string",
          "title": "$repo_dirpath_io"
        }
      ],
      "outputs": [
        {
          "type": "string",
          "title": "$raw_pr_diff"
        },
        {
          "description": "the list of extracted value using the regex \"(diff --git[\\s\\S]*?)(?=diff --git|$)\" from the raw input",
          "type": "array",
          "items": {
            "type": "string"
          },
          "title": "$file_diff_list",
          "default": []
        }
      ],
      "branches": [
        "next"
      ],
      "subflow": {
        "component_type": "Flow",
        "id": "9e7aed22-876c-4c32-9d44-20ee7ceb3771",
        "name": "Retrieve PR diff flow",
        "description": "",
        "metadata": {
          "__metadata_info__": {}
        },
        "inputs": [
          {
            "type": "string",
            "title": "$repo_dirpath_io"
          }
        ],
        "outputs": [
          {
            "type": "string",
            "title": "$raw_pr_diff"
          },
          {
            "description": "the list of extracted value using the regex \"(diff --git[\\s\\S]*?)(?=diff --git|$)\" from the raw input",
            "type": "array",
            "items": {
              "type": "string"
            },
            "title": "$file_diff_list",
            "default": []
          }
        ],
        "start_node": {
          "$component_ref": "4fcb7ebe-325b-446d-a46b-59187c30e260"
        },
        "nodes": [
          {
            "$component_ref": "4fcb7ebe-325b-446d-a46b-59187c30e260"
          },
          {
            "$component_ref": "5c73da9c-6ba9-44ce-aab1-212a78d0a720"
          },
          {
            "$component_ref": "cf841053-2414-48b6-ba6d-0f0f5e11044c"
          },
          {
            "$component_ref": "dd0e56ab-1267-4345-9f59-ecc053baf2af"
          }
        ],
        "control_flow_connections": [
          {
            "component_type": "ControlFlowEdge",
            "id": "60dc14b8-d9b9-4aec-a958-9f3676848f48",
            "name": "start_step_to_get_pr_diff_control_flow_edge",
            "description": null,
            "metadata": {
              "__metadata_info__": {}
            },
            "from_node": {
              "$component_ref": "4fcb7ebe-325b-446d-a46b-59187c30e260"
            },
            "from_branch": null,
            "to_node": {
              "$component_ref": "5c73da9c-6ba9-44ce-aab1-212a78d0a720"
            }
          },
          {
            "component_type": "ControlFlowEdge",
            "id": "500f97de-78b1-42e0-944c-0375dfca734e",
            "name": "get_pr_diff_to_extract_into_list_of_file_diff_control_flow_edge",
            "description": null,
            "metadata": {
              "__metadata_info__": {}
            },
            "from_node": {
              "$component_ref": "5c73da9c-6ba9-44ce-aab1-212a78d0a720"
            },
            "from_branch": null,
            "to_node": {
              "$component_ref": "cf841053-2414-48b6-ba6d-0f0f5e11044c"
            }
          },
          {
            "component_type": "ControlFlowEdge",
            "id": "22d0cf0d-8edb-4b04-8f54-a234f5705360",
            "name": "extract_into_list_of_file_diff_to_None End node_control_flow_edge",
            "description": null,
            "metadata": {},
            "from_node": {
              "$component_ref": "cf841053-2414-48b6-ba6d-0f0f5e11044c"
            },
            "from_branch": null,
            "to_node": {
              "$component_ref": "dd0e56ab-1267-4345-9f59-ecc053baf2af"
            }
          }
        ],
        "data_flow_connections": [
          {
            "component_type": "DataFlowEdge",
            "id": "106e3740-de45-4472-8168-2873ae1dbc82",
            "name": "start_step_$repo_dirpath_io_to_get_pr_diff_$repo_dirpath_io_data_flow_edge",
            "description": null,
            "metadata": {
              "__metadata_info__": {}
            },
            "source_node": {
              "$component_ref": "4fcb7ebe-325b-446d-a46b-59187c30e260"
            },
            "source_output": "$repo_dirpath_io",
            "destination_node": {
              "$component_ref": "5c73da9c-6ba9-44ce-aab1-212a78d0a720"
            },
            "destination_input": "$repo_dirpath_io"
          },
          {
            "component_type": "DataFlowEdge",
            "id": "a32cbb1c-eafe-4138-80e2-2cf2e1248312",
            "name": "get_pr_diff_$raw_pr_diff_to_extract_into_list_of_file_diff_$raw_pr_diff_data_flow_edge",
            "description": null,
            "metadata": {
              "__metadata_info__": {}
            },
            "source_node": {
              "$component_ref": "5c73da9c-6ba9-44ce-aab1-212a78d0a720"
            },
            "source_output": "$raw_pr_diff",
            "destination_node": {
              "$component_ref": "cf841053-2414-48b6-ba6d-0f0f5e11044c"
            },
            "destination_input": "$raw_pr_diff"
          },
          {
            "component_type": "DataFlowEdge",
            "id": "3ef5dcf4-acdf-4962-8df6-07b53f249e18",
            "name": "get_pr_diff_$raw_pr_diff_to_None End node_$raw_pr_diff_data_flow_edge",
            "description": null,
            "metadata": {},
            "source_node": {
              "$component_ref": "5c73da9c-6ba9-44ce-aab1-212a78d0a720"
            },
            "source_output": "$raw_pr_diff",
            "destination_node": {
              "$component_ref": "dd0e56ab-1267-4345-9f59-ecc053baf2af"
            },
            "destination_input": "$raw_pr_diff"
          },
          {
            "component_type": "DataFlowEdge",
            "id": "08cbca39-e591-4cf4-9057-ae67938d9557",
            "name": "extract_into_list_of_file_diff_$file_diff_list_to_None End node_$file_diff_list_data_flow_edge",
            "description": null,
            "metadata": {},
            "source_node": {
              "$component_ref": "cf841053-2414-48b6-ba6d-0f0f5e11044c"
            },
            "source_output": "$file_diff_list",
            "destination_node": {
              "$component_ref": "dd0e56ab-1267-4345-9f59-ecc053baf2af"
            },
            "destination_input": "$file_diff_list"
          }
        ],
        "$referenced_components": {
          "5c73da9c-6ba9-44ce-aab1-212a78d0a720": {
            "component_type": "ExtendedToolNode",
            "id": "5c73da9c-6ba9-44ce-aab1-212a78d0a720",
            "name": "get_pr_diff",
            "description": "",
            "metadata": {
              "__metadata_info__": {}
            },
            "inputs": [
              {
                "type": "string",
                "title": "$repo_dirpath_io"
              }
            ],
            "outputs": [
              {
                "type": "string",
                "title": "$raw_pr_diff"
              }
            ],
            "branches": [
              "next"
            ],
            "tool": {
              "component_type": "ServerTool",
              "id": "275aaf19-cdd4-4ed7-a436-e53f922cd740",
              "name": "local_get_pr_diff_tool",
              "description": "Retrieves code diff with a git command given the\npath to the repository root folder.",
              "metadata": {
                "__metadata_info__": {}
              },
              "inputs": [
                {
                  "type": "string",
                  "title": "repo_dirpath"
                }
              ],
              "outputs": [
                {
                  "type": "string",
                  "title": "tool_output"
                }
              ]
            },
            "input_mapping": {
              "repo_dirpath": "$repo_dirpath_io"
            },
            "output_mapping": {
              "tool_output": "$raw_pr_diff"
            },
            "raise_exceptions": true,
            "component_plugin_name": "NodesPlugin",
            "component_plugin_version": "25.4.0.dev0"
          },
          "4fcb7ebe-325b-446d-a46b-59187c30e260": {
            "component_type": "StartNode",
            "id": "4fcb7ebe-325b-446d-a46b-59187c30e260",
            "name": "start_step",
            "description": "",
            "metadata": {
              "__metadata_info__": {}
            },
            "inputs": [
              {
                "type": "string",
                "title": "$repo_dirpath_io"
              }
            ],
            "outputs": [
              {
                "type": "string",
                "title": "$repo_dirpath_io"
              }
            ],
            "branches": [
              "next"
            ]
          },
          "cf841053-2414-48b6-ba6d-0f0f5e11044c": {
            "component_type": "PluginRegexNode",
            "id": "cf841053-2414-48b6-ba6d-0f0f5e11044c",
            "name": "extract_into_list_of_file_diff",
            "description": "",
            "metadata": {
              "__metadata_info__": {}
            },
            "inputs": [
              {
                "description": "raw text to extract information from",
                "type": "string",
                "title": "$raw_pr_diff"
              }
            ],
            "outputs": [
              {
                "description": "the list of extracted value using the regex \"(diff --git[\\s\\S]*?)(?=diff --git|$)\" from the raw input",
                "type": "array",
                "items": {
                  "type": "string"
                },
                "title": "$file_diff_list",
                "default": []
              }
            ],
            "branches": [
              "next"
            ],
            "input_mapping": {
              "text": "$raw_pr_diff"
            },
            "output_mapping": {
              "output": "$file_diff_list"
            },
            "regex_pattern": "(diff --git[\\s\\S]*?)(?=diff --git|$)",
            "return_first_match_only": false,
            "component_plugin_name": "NodesPlugin",
            "component_plugin_version": "25.4.0.dev0"
          },
          "dd0e56ab-1267-4345-9f59-ecc053baf2af": {
            "component_type": "EndNode",
            "id": "dd0e56ab-1267-4345-9f59-ecc053baf2af",
            "name": "None End node",
            "description": "End node representing all transitions to None in the WayFlow flow",
            "metadata": {},
            "inputs": [
              {
                "type": "string",
                "title": "$raw_pr_diff"
              },
              {
                "description": "the list of extracted value using the regex \"(diff --git[\\s\\S]*?)(?=diff --git|$)\" from the raw input",
                "type": "array",
                "items": {
                  "type": "string"
                },
                "title": "$file_diff_list",
                "default": []
              }
            ],
            "outputs": [
              {
                "type": "string",
                "title": "$raw_pr_diff"
              },
              {
                "description": "the list of extracted value using the regex \"(diff --git[\\s\\S]*?)(?=diff --git|$)\" from the raw input",
                "type": "array",
                "items": {
                  "type": "string"
                },
                "title": "$file_diff_list",
                "default": []
              }
            ],
            "branches": [],
            "branch_name": "next"
          }
        }
      }
    },
    "020c885e-6d0b-472a-bb91-246ab70ab1db": {
      "component_type": "StartNode",
      "id": "020c885e-6d0b-472a-bb91-246ab70ab1db",
      "name": "__StartStep__",
      "description": "",
      "metadata": {
        "__metadata_info__": {}
      },
      "inputs": [
        {
          "type": "string",
          "title": "$repo_dirpath_io"
        }
      ],
      "outputs": [
        {
          "type": "string",
          "title": "$repo_dirpath_io"
        }
      ],
      "branches": [
        "next"
      ]
    },
    "a544af64-e63b-4ccf-9ab0-8d25cdbc0b93": {
      "component_type": "EndNode",
      "id": "a544af64-e63b-4ccf-9ab0-8d25cdbc0b93",
      "name": "None End node",
      "description": "End node representing all transitions to None in the WayFlow flow",
      "metadata": {},
      "inputs": [
        {
          "type": "array",
          "items": {
            "type": "string"
          },
          "title": "$filepath_list"
        },
        {
          "type": "array",
          "items": {},
          "title": "$nested_comment_list"
        },
        {
          "type": "string",
          "title": "$raw_pr_diff"
        },
        {
          "description": "the list of extracted value using the regex \"(diff --git[\\s\\S]*?)(?=diff --git|$)\" from the raw input",
          "type": "array",
          "items": {
            "type": "string"
          },
          "title": "$file_diff_list",
          "default": []
        }
      ],
      "outputs": [
        {
          "type": "array",
          "items": {
            "type": "string"
          },
          "title": "$filepath_list"
        },
        {
          "type": "array",
          "items": {},
          "title": "$nested_comment_list"
        },
        {
          "type": "string",
          "title": "$raw_pr_diff"
        },
        {
          "description": "the list of extracted value using the regex \"(diff --git[\\s\\S]*?)(?=diff --git|$)\" from the raw input",
          "type": "array",
          "items": {
            "type": "string"
          },
          "title": "$file_diff_list",
          "default": []
        }
      ],
      "branches": [],
      "branch_name": "next"
    }
  },
  "agentspec_version": "25.4.1"
}

You can then load the configuration back into an assistant using the AgentSpecLoader.

from wayflowcore.agentspec import AgentSpecLoader

tool_registry = {
    "local_get_pr_diff_tool": local_get_pr_diff_tool,
    "format_git_diff": format_git_diff,
}

assistant = AgentSpecLoader(tool_registry=tool_registry).load_json(serialized_assistant)
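
The registry keys mirror the names of the tools defined earlier, so a missing or misspelled key is an easy mistake to make. As a minimal sketch, the check below lists tool names that a serialized configuration mentions but the registry lacks. It assumes that tool components appear in the JSON as objects whose `component_type` ends in `Tool` (for example `ServerTool`) and that carry a `name` field. This shape, the helper names `find_tool_names` and `missing_tools`, and the sample configuration are illustrative assumptions, not part of the WayFlow API.

```python
import json
from typing import Any, Iterable, List, Set


def find_tool_names(node: Any) -> Set[str]:
    """Recursively collect the names of tool-like components (assumed JSON shape)."""
    names: Set[str] = set()
    if isinstance(node, dict):
        component_type = node.get("component_type", "")
        if (
            isinstance(component_type, str)
            and component_type.endswith("Tool")
            and "name" in node
        ):
            names.add(node["name"])
        for value in node.values():
            names |= find_tool_names(value)
    elif isinstance(node, list):
        for item in node:
            names |= find_tool_names(item)
    return names


def missing_tools(config_json: str, registry_keys: Iterable[str]) -> List[str]:
    """Return the tool names found in the config that have no registry entry."""
    return sorted(find_tool_names(json.loads(config_json)) - set(registry_keys))


# Hypothetical, heavily trimmed configuration used only to exercise the helpers.
sample_config_json = json.dumps(
    {
        "component_type": "Flow",
        "nodes": [
            {
                "component_type": "ToolNode",
                "tool": {"component_type": "ServerTool", "name": "local_get_pr_diff_tool"},
            },
            {
                "component_type": "ToolNode",
                "tool": {"component_type": "ServerTool", "name": "format_git_diff"},
            },
        ],
    }
)

incomplete_registry_result = missing_tools(sample_config_json, ["local_get_pr_diff_tool"])
complete_registry_result = missing_tools(
    sample_config_json, ["local_get_pr_diff_tool", "format_git_diff"]
)
print(incomplete_registry_result)
print(complete_registry_result)
```

If the assumed JSON shape holds for your export, calling `missing_tools(serialized_assistant, tool_registry)` before loading should return an empty list.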

Note

This guide uses the following extension/plugin Agent Spec components:

  • PluginOutputMessageNode

  • PluginExtractNode

  • PluginRegexNode

  • ExtendedLlmNode

  • ExtendedToolNode

  • ExtendedMapNode

See the list of available Agent Spec extension/plugin components in the API Reference.
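
To see which plugin components a given export actually relies on, you can walk the serialized JSON. The sketch below is a hypothetical helper (`collect_plugin_components` is not part of WayFlow or Agent Spec). It assumes, as in the configuration shown above, that plugin-provided components carry a `component_plugin_name` key next to their `component_type`. The sample configuration is made up for illustration.

```python
import json
from typing import Any, Dict, Optional, Set


def collect_plugin_components(
    node: Any, found: Optional[Dict[str, Set[str]]] = None
) -> Dict[str, Set[str]]:
    """Map each plugin name to the set of component types it provides in `node`."""
    if found is None:
        found = {}
    if isinstance(node, dict):
        plugin_name = node.get("component_plugin_name")
        component_type = node.get("component_type")
        if plugin_name is not None and component_type is not None:
            found.setdefault(plugin_name, set()).add(component_type)
        # Components can be nested at any depth, so visit every value.
        for value in node.values():
            collect_plugin_components(value, found)
    elif isinstance(node, list):
        for item in node:
            collect_plugin_components(item, found)
    return found


# Hypothetical, heavily trimmed configuration used only to exercise the helper.
sample_config = json.loads(
    """
    {
      "component_type": "Flow",
      "nodes": [
        {"component_type": "StartNode", "name": "__StartStep__"},
        {
          "component_type": "PluginRegexNode",
          "component_plugin_name": "NodesPlugin",
          "component_plugin_version": "25.4.0.dev0"
        }
      ]
    }
    """
)

plugin_components = collect_plugin_components(sample_config)
print(plugin_components)
```

With the real export you would pass `json.loads(serialized_assistant)` instead of the sample.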

Recap#

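In this tutorial you learned how to build a simple PR bot using WayFlow Flows. Along the way, you used Flows to build an assistant, composed multiple sub-flows into a more complex Flow, and built Tools for use within those Flows.

Finally, you learned how to structure your code when building an assistant as code, and how to execute and combine sub-flows to build a complex assistant.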

This is an example of the kind of fully featured tool that you can build with WayFlow.

Next Steps#

Now that you have learned how to build a PR reviewing assistant, you may want to check our other guides.

Full Code#

Click on the card at the top of this page to download the full code for this guide or copy the code below.

# Copyright © 2025 Oracle and/or its affiliates.
#
# This software is under the Apache License 2.0
# (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0) or Universal Permissive License
# (UPL) 1.0 (LICENSE-UPL or https://oss.oracle.com/licenses/upl), at your option.

# %%[markdown]
# Tutorial - Build a Simple Code Review Assistant
# -----------------------------------------------

# How to use:
# Create a new Python virtual environment and install the latest WayFlow version.
# ```bash
# python -m venv venv-wayflowcore
# source venv-wayflowcore/bin/activate
# pip install --upgrade pip
# pip install "wayflowcore==26.1.2"
# ```

# You can now run the script
# 1. As a Python file:
# ```bash
# python usecase_prbot.py
# ```
# 2. As a Notebook (in VSCode):
# When viewing the file,
#  - press the keys Ctrl + Enter to run the selected cell
#  - or Shift + Enter to run the selected cell and move to the cell below

# nosec


from types import MethodType
from typing import Dict, List


# %%[markdown]
## Define the LLM

# %%
from wayflowcore.models import VllmModel

llm = VllmModel(
    model_id="meta-llama/Meta-Llama-3.1-8B-Instruct",
    host_port="VLLM_HOST_PORT",
)

# %%[markdown]
## Define the tool that retrieves the PR diff

# %%
from wayflowcore.tools import tool


@tool(description_mode="only_docstring")
def local_get_pr_diff_tool(repo_dirpath: str) -> str:
    """
    Retrieves code diff with a git command given the
    path to the repository root folder.
    """
    import subprocess  # nosec: documentation example invoking git locally

    result = subprocess.run(
        ["git", "diff", "HEAD"],
        capture_output=True,
        cwd=repo_dirpath,
        text=True,
    )  # nosec: documentation example invoking git locally
    return result.stdout.strip()


# %%[markdown]
## Define a mocked PR diff

# %%
MOCK_DIFF = """
diff --git src://calculators/utils.py dst://calculators/utils.py
index 12345678..90123456 100644
--- src://calculators/utils.py
+++ dst://calculators/utils.py
@@ -10,6 +10,15 @@

 def calculate_total(data):
     # TODO: implement tax calculation
     return data

+def get_items(items=[]):
+    result = []
+    for item in items:
+        result.append(item * 2)
+    return result
+
+def process_numbers(numbers):
+    res = []
+    for x in numbers:
+        res.append(x + 1)
+    return res
+
 def calculate_average(numbers):
     return sum(numbers) / len(numbers)


diff --git src://example/utils.py dst://example/utils.py
index 000000000..123456789
--- /dev/null
+++ dst://example/utils.py
@@ -0,0 +1,20 @@
+# Copyright © 2024 Oracle and/or its affiliates.
+
+def calculate_sum(numbers=[]):
+    total = 0
+    for num in numbers:
+        total += num
+    return total
+
+
+def process_data(data):
+    # TODO: Handle exceptions here
+    result = data * 2
+    return result
+
+
+def main():
+    numbers = [1, 2, 3, 4, 5]
+    result = calculate_sum(numbers)
+    print("Sum:", result)
+    data = 10
+    processed_data = process_data(data)
+    print("Processed Data:", processed_data)
+
+
+if __name__ == "__main__":
+    main()
""".strip()



# %%[markdown]
## Create the flow that retrieves the diff of a PR

# %%
from wayflowcore.controlconnection import ControlFlowEdge
from wayflowcore.dataconnection import DataFlowEdge
from wayflowcore.flow import Flow
from wayflowcore.property import StringProperty
from wayflowcore.steps import RegexExtractionStep, StartStep, ToolExecutionStep

# IO Variable Names
REPO_DIRPATH_IO = "$repo_dirpath_io"
PR_DIFF_IO = "$raw_pr_diff"
FILE_DIFF_LIST_IO = "$file_diff_list"

# Define the steps

start_step = StartStep(name="start_step", input_descriptors=[StringProperty(name=REPO_DIRPATH_IO)])

# Step 1: Retrieve the pull request diff using the local tool
get_pr_diff_step = ToolExecutionStep(
    name="get_pr_diff",
    tool=local_get_pr_diff_tool,
    raise_exceptions=True,
    input_mapping={"repo_dirpath": REPO_DIRPATH_IO},
    output_mapping={ToolExecutionStep.TOOL_OUTPUT: PR_DIFF_IO},
)

# Step 2: Extract the file diffs from the raw diff using a regular expression
extract_into_list_of_file_diff_step = RegexExtractionStep(
    name="extract_into_list_of_file_diff",
    regex_pattern=r"(diff --git[\s\S]*?)(?=diff --git|$)",
    return_first_match_only=False,
    input_mapping={RegexExtractionStep.TEXT: PR_DIFF_IO},
    output_mapping={RegexExtractionStep.OUTPUT: FILE_DIFF_LIST_IO},
)

# Define the sub flow
retrieve_diff_subflow = Flow(
    name="Retrieve PR diff flow",
    begin_step=start_step,
    control_flow_edges=[
        ControlFlowEdge(source_step=start_step, destination_step=get_pr_diff_step),
        ControlFlowEdge(
            source_step=get_pr_diff_step, destination_step=extract_into_list_of_file_diff_step
        ),
        ControlFlowEdge(source_step=extract_into_list_of_file_diff_step, destination_step=None),
    ],
    data_flow_edges=[
        DataFlowEdge(
            source_step=start_step,
            source_output=REPO_DIRPATH_IO,
            destination_step=get_pr_diff_step,
            destination_input=REPO_DIRPATH_IO,
        ),
        DataFlowEdge(
            source_step=get_pr_diff_step,
            source_output=PR_DIFF_IO,
            destination_step=extract_into_list_of_file_diff_step,
            destination_input=PR_DIFF_IO,
        ),
    ],
)


# %%[markdown]
## Alternative step that retrieves the PR diff through an API call

# %%
from wayflowcore.retrypolicy import RetryPolicy
from wayflowcore.steps import ApiCallStep

# IO Variable Names
USER_PROVIDED_TOKEN_IO = "$user_provided_token"  # nosec: placeholder IO variable name
REPO_WORKSPACE_IO = "$repo_workspace"
REPO_SLUG_IO = "$repo_slug"
PULL_REQUEST_ID_IO = "$pull_request_id"
PR_DIFF_IO = "$raw_pr_diff"

get_pr_diff_step = ApiCallStep(
    url="https://example.com/projects/{{workspace}}/repos/{{repo_slug}}/pull-requests/{{pr_id}}.diff",
    method="GET",
    headers={"Authorization": "Bearer {{token}}"},
    ignore_bad_http_requests=False,
    retry_policy=RetryPolicy(max_attempts=2),
    store_response=True,
    input_mapping={
        "token": USER_PROVIDED_TOKEN_IO,
        "workspace": REPO_WORKSPACE_IO,
        "repo_slug": REPO_SLUG_IO,
        "pr_id": PULL_REQUEST_ID_IO,
    },
    output_mapping={ApiCallStep.HTTP_RESPONSE: PR_DIFF_IO},
)


# %%[markdown]
## Test the flow that retrieves the PR diff

# %%
from wayflowcore.executors.executionstatus import FinishedStatus

# Replace the path below with the path to your actual codebase sample git repository.
PATH_TO_DIR = "path/to/repository_root"

test_conversation = retrieve_diff_subflow.start_conversation(
    inputs={
        REPO_DIRPATH_IO: PATH_TO_DIR,
    }
)

execution_status = test_conversation.execute()

if not isinstance(execution_status, FinishedStatus):
    raise ValueError("Unexpected status type")

FILE_DIFF_LIST = execution_status.output_values[FILE_DIFF_LIST_IO]

print(FILE_DIFF_LIST[0])


# %%[markdown]
## Define the tool that formats the diff for the LLM

# %%
PR_BOT_CHECKS = [
    """
Name: TODO_WITHOUT_TICKET
Description: TODO comments should reference a ticket number for tracking.
Example code:
```python
# TODO: Add validation here
def process_user_input(data):
    return data
```
Example comment:
[BOT] TODO_WITHOUT_TICKET: TODO comment should reference a ticket number for tracking (e.g., "TODO: Add validation here (TICKET-1234)").
""",
    """
Name: MUTABLE_DEFAULT_ARGUMENT
Description: Using mutable objects as default arguments can lead to unexpected behavior.
Example code:
```python
def add_item(item, items=[]):
    items.append(item)
    return items
```
Example comment:
[BOT] MUTABLE_DEFAULT_ARGUMENT: Avoid using mutable default arguments. Use None and initialize in the function: `def add_item(item, items=None): items = items or []`
""",
    """
Name: NON_DESCRIPTIVE_NAME
Description: Variable names should clearly indicate their purpose or content.
Example code:
```python
def process(lst):
    res = []
    for i in lst:
        res.append(i * 2)
    return res
```
Example comment:
[BOT] NON_DESCRIPTIVE_NAME: Use more descriptive names: 'lst' could be 'numbers', 'res' could be 'doubled_numbers', 'i' could be 'number'
""",
]

CONCATENATED_CHECKS = "\n\n---\n\n".join(PR_BOT_CHECKS)

PROMPT_TEMPLATE = """You are a very experienced code reviewer. You are given a git diff on a file: {{filename}}

## Context
The git diff contains all changes of a single file. All lines are prepended with their number. Lines without a line number were removed from the file.
After the line number, a line that was changed has a "+" before the code. All lines without a "+" are just here for context, you will not comment on them.

## Input
### Code diff
{{diff}}

## Task
Your task is to review these changes, according to different rules. Only comment on lines that were added, so the lines that have a + just after the line number.
The rules are the following:

{{checks}}

### Response Format
You need to return a review as JSON as follows:
```json
[
    {
        "content": "the comment as a text",
        "suggestion": "if the change you propose is a single line, then put here the single line rewritten that includes your proposal change. IMPORTANT: a single line, which will erase the current line. Put an empty string if there is no suggestion or if the suggestion is more than a single line",
        "line": "line number where the comment applies"
    },

]
```
Please use triple backticks ``` to delimit your JSON list of comments. Don't output more than 5 comments, only comment on the most relevant sections.
If there are no comments and the code seems fine, just output an empty JSON list."""


@tool(description_mode="only_docstring")
def format_git_diff(diff_text: str) -> str:
    """
    Formats a git diff by adding line numbers to each line except removal lines.
    """

    def pad_number(number: int, width: int) -> str:
        """Right-align a number with specified width using space padding."""
        return str(number).rjust(width)

    LINE_NUMBER_WIDTH = 5
    PADDING_WIDTH = LINE_NUMBER_WIDTH + 1
    current_line_number = 0
    formatted_lines = []

    for line in diff_text.split("\n"):
        # Handle hunk header lines (e.g., "@@ -1,7 +1,6 @@")
        if line.startswith("@@"):
            try:
                # Extract the starting line number in the new version of the file.
                # The line count after the comma is optional (e.g., "@@ -3 +3 @@").
                _, position_info, _ = line.split("@@", 2)
                new_file_info = position_info.split()[1][1:]  # Remove the '+' prefix
                start_line = int(new_file_info.split(",")[0])
            except (ValueError, IndexError) as exc:
                raise ValueError(f"Invalid diff header format: {line}") from exc

            current_line_number = start_line
            formatted_lines.append(line)
            continue

        # Handle content lines (lines before the first hunk header are skipped)
        if current_line_number > 0 and line:
            if not line.startswith("-"):
                # Add line number for added/context lines
                line_prefix = pad_number(current_line_number, LINE_NUMBER_WIDTH)
                formatted_lines.append(f"{line_prefix} {line}")
                current_line_number += 1
            else:
                # Just add padding for removal lines
                formatted_lines.append(" " * PADDING_WIDTH + line)

    return "\n".join(formatted_lines)


# %%[markdown]
## Create the flow that generates review comments

# %%
from wayflowcore._utils._templating_helpers import render_template_partially
from wayflowcore.property import AnyProperty, DictProperty, ListProperty, StringProperty
from wayflowcore.steps import (
    ExtractValueFromJsonStep,
    MapStep,
    OutputMessageStep,
    PromptExecutionStep,
    ToolExecutionStep,
)

# IO Variable Names
DIFF_TO_STRING_IO = "$diff_to_string"
DIFF_WITH_LINES_IO = "$diff_with_lines"
FILEPATH_IO = "$filename"
JSON_COMMENTS_IO = "$json_comments"
EXTRACTED_COMMENTS_IO = "$extracted_comments"
NESTED_COMMENT_LIST_IO = "$nested_comment_list"
FILEPATH_LIST_IO = "$filepath_list"

# Define the steps

# Step 1: Format the diff to a string
format_diff_to_string_step = OutputMessageStep(
    name="format_diff_to_string",
    message_template="{{ message | string }}",
    output_mapping={OutputMessageStep.OUTPUT: DIFF_TO_STRING_IO},
)

# Step 2: Add lines on the diff using a tool
add_lines_on_diff_step = ToolExecutionStep(
    name="add_lines_on_diff",
    tool=format_git_diff,
    input_mapping={"diff_text": DIFF_TO_STRING_IO},
    output_mapping={ToolExecutionStep.TOOL_OUTPUT: DIFF_WITH_LINES_IO},
)

# Step 3: Extract the file path from the diff string using a regular expression
# Note: the pattern expects the src:// and dst:// prefixes used in MOCK_DIFF.
# A default `git diff` uses a/ and b/ prefixes instead, so adapt it to your diff source.
extract_file_path_step = RegexExtractionStep(
    name="extract_file_path",
    regex_pattern=r"diff --git src://(.+?) dst://",
    return_first_match_only=True,
    input_mapping={RegexExtractionStep.TEXT: DIFF_TO_STRING_IO},
    output_mapping={RegexExtractionStep.OUTPUT: FILEPATH_IO},
)

# Step 4: Generate comments using a prompt
generate_comments_step = PromptExecutionStep(
    name="generate_comments",
    prompt_template=render_template_partially(PROMPT_TEMPLATE, {"checks": CONCATENATED_CHECKS}),
    llm=llm,
    input_mapping={"diff": DIFF_WITH_LINES_IO, "filename": FILEPATH_IO},
    output_mapping={PromptExecutionStep.OUTPUT: JSON_COMMENTS_IO},
)

# Step 5: Extract comments from the JSON output
# Define the value type for extracted comments
comments_valuetype = ListProperty(
    name="values",
    description="The extracted comments content and line number",
    item_type=DictProperty(value_type=AnyProperty()),
    default_value=[],
)
extract_comments_from_json_step = ExtractValueFromJsonStep(
    name="extract_comments_from_json",
    output_values={comments_valuetype: '[.[] | {"content": .["content"], "line": .["line"]}]'},
    retry=True,
    llm=llm,
    input_mapping={ExtractValueFromJsonStep.TEXT: JSON_COMMENTS_IO},
    output_mapping={"values": EXTRACTED_COMMENTS_IO},
)

# Define the sub flow to generate comments for each file diff
generate_comments_subflow = Flow(
    name="Generate review comments flow",
    begin_step=format_diff_to_string_step,
    control_flow_edges=[
        ControlFlowEdge(format_diff_to_string_step, add_lines_on_diff_step),
        ControlFlowEdge(add_lines_on_diff_step, extract_file_path_step),
        ControlFlowEdge(extract_file_path_step, generate_comments_step),
        ControlFlowEdge(generate_comments_step, extract_comments_from_json_step),
        ControlFlowEdge(extract_comments_from_json_step, None),
    ],
    data_flow_edges=[
        DataFlowEdge(
            format_diff_to_string_step, DIFF_TO_STRING_IO, add_lines_on_diff_step, DIFF_TO_STRING_IO
        ),
        DataFlowEdge(
            format_diff_to_string_step, DIFF_TO_STRING_IO, extract_file_path_step, DIFF_TO_STRING_IO
        ),
        DataFlowEdge(
            add_lines_on_diff_step, DIFF_WITH_LINES_IO, generate_comments_step, DIFF_WITH_LINES_IO
        ),
        DataFlowEdge(extract_file_path_step, FILEPATH_IO, generate_comments_step, FILEPATH_IO),
        DataFlowEdge(
            generate_comments_step,
            JSON_COMMENTS_IO,
            extract_comments_from_json_step,
            JSON_COMMENTS_IO,
        ),
    ],
)

# Use the MapStep to apply the sub flow to each file
for_each_file_step = MapStep(
    flow=generate_comments_subflow,
    unpack_input={"message": "."},
    input_mapping={MapStep.ITERATED_INPUT: FILE_DIFF_LIST_IO},
    output_descriptors=[
        ListProperty(name=NESTED_COMMENT_LIST_IO, item_type=AnyProperty()),
        ListProperty(name=FILEPATH_LIST_IO, item_type=StringProperty()),
    ],
    output_mapping={EXTRACTED_COMMENTS_IO: NESTED_COMMENT_LIST_IO, FILEPATH_IO: FILEPATH_LIST_IO},
)

generate_all_comments_subflow = Flow.from_steps([for_each_file_step])


# %%[markdown]
## Test the flow that generates review comments

# %%
# We reuse the FILE_DIFF_LIST from the previous test
test_conversation = generate_all_comments_subflow.start_conversation(
    inputs={
        FILE_DIFF_LIST_IO: FILE_DIFF_LIST,
    }
)

execution_status = test_conversation.execute()

if not isinstance(execution_status, FinishedStatus):
    raise ValueError("Unexpected status type")

NESTED_COMMENT_LIST = execution_status.output_values[NESTED_COMMENT_LIST_IO]
FILEPATH_LIST = execution_status.output_values[FILEPATH_LIST_IO]
print(NESTED_COMMENT_LIST[0])
print(FILEPATH_LIST)



# %%[markdown]
## Create the tool that formats the review comments

# %%
@tool(description_mode="only_docstring")
def flatten_information(
    nested_comments_list: List[List[Dict[str, str]]], filepath_list: List[str]
) -> List[Dict[str, str]]:
    """Flattens information from comments and filepaths."""
    if len(nested_comments_list) != len(filepath_list):
        raise ValueError(
            f"Inconsistent list lengths ({len(nested_comments_list)=} and {len(filepath_list)=})"
        )

    result: List[Dict[str, str]] = []
    for comments_list, filepath in zip(nested_comments_list, filepath_list):
        for comment_dict in comments_list:
            result.append(
                {
                    **{key: str(value) for key, value in comment_dict.items()},
                    "path": filepath,
                }
            )

    return result


# %%[markdown]
## Create the flow that posts review comments to Bitbucket

# %%
import json

# IO Values
PR_POST_URL_IO = "$pr_post_url"
FLATTENED_COMMENT_LIST_IO = "$flattened_comment_list"
FINAL_HTTP_CODES_IO = "$http_codes"

# Define the steps

# Step 1: Flatten the generated comments into a list of comments
flatten_nested_comments_list_step = ToolExecutionStep(
    name="flatten_nested_comment_list",
    tool=flatten_information,
    input_mapping={
        "nested_comments_list": NESTED_COMMENT_LIST_IO,
        "filepath_list": FILEPATH_LIST_IO,
    },
    output_mapping={ToolExecutionStep.TOOL_OUTPUT: FLATTENED_COMMENT_LIST_IO},
)

# Step 2: Post the comments to Bitbucket
post_comment_step = ApiCallStep(
    url="https://example.com/rest/api/latest/projects/{{workspace}}/repos/{{repo_slug}}/pull-requests/{{pr_id}}/comments?diffType=EFFECTIVE&markup=true&avatarSize=48",
    method="POST",
    data=json.dumps(
        {
            "text": "{{content}}",
            "severity": "NORMAL",
            "anchor": {
                "diffType": "EFFECTIVE",
                "path": "{{path}}",
                "lineType": "ADDED",
                "line": "{{line | int}}",
                "fileType": "TO",
            },
        }
    ),
    headers={"Accept": "application/json", "Authorization": "Bearer {{token}}"},
    ignore_bad_http_requests=False,
    retry_policy=RetryPolicy(max_attempts=2),
    store_response=True,
    input_mapping={
        "token": USER_PROVIDED_TOKEN_IO,
        "workspace": REPO_WORKSPACE_IO,
        "repo_slug": REPO_SLUG_IO,
        "pr_id": PULL_REQUEST_ID_IO,
    },
)

post_comments_mapstep = MapStep(
    name="post_comment",
    flow=Flow.from_steps([post_comment_step]),
    unpack_input={"content": ".content", "line": ".line", "path": ".path"},
    input_mapping={MapStep.ITERATED_INPUT: FLATTENED_COMMENT_LIST_IO},
    output_descriptors=[ApiCallStep.HTTP_STATUS_CODE],
    output_mapping={ApiCallStep.HTTP_STATUS_CODE: FINAL_HTTP_CODES_IO},
)

post_comments_subflow = Flow(
    name="Post comments to PR flow",
    begin_step=flatten_nested_comments_list_step,
    control_flow_edges=[
        ControlFlowEdge(flatten_nested_comments_list_step, post_comments_mapstep),
        ControlFlowEdge(post_comments_mapstep, None),
    ],
    data_flow_edges=[
        DataFlowEdge(
            flatten_nested_comments_list_step,
            FLATTENED_COMMENT_LIST_IO,
            post_comments_mapstep,
            FLATTENED_COMMENT_LIST_IO,
        )
    ],
)

# Mock the API call so that testing the flow does not send real HTTP requests
from wayflowcore.steps.step import StepResult


async def _mock_api_post_step_invoke(self, inputs, conversation):
    output_values = {ApiCallStep.HTTP_RESPONSE: MOCK_DIFF, ApiCallStep.HTTP_STATUS_CODE: 200}
    return StepResult(
        outputs=output_values,
    )


post_comment_step.invoke_async = MethodType(_mock_api_post_step_invoke, post_comment_step)


# %%[markdown]
## Test the flow that posts review comments

# %%
# We reuse the NESTED_COMMENT_LIST and FILEPATH_LIST from the previous test

test_conversation = post_comments_subflow.start_conversation(
    inputs={
        USER_PROVIDED_TOKEN_IO: "MY_TOKEN",
        REPO_WORKSPACE_IO: "MY_REPO_WORKSPACE",
        REPO_SLUG_IO: "MY_REPO_SLUG",
        PULL_REQUEST_ID_IO: "MY_REPO_ID",
        NESTED_COMMENT_LIST_IO: NESTED_COMMENT_LIST,
        FILEPATH_LIST_IO: FILEPATH_LIST,
    }
)
execution_status = test_conversation.execute()

if not isinstance(execution_status, FinishedStatus):
    raise ValueError("Unexpected status type")

FINAL_HTTP_CODES = execution_status.output_values[FINAL_HTTP_CODES_IO]
print(FINAL_HTTP_CODES)


# %%[markdown]
## Create the flow that performs the review

# %%
from wayflowcore.steps import FlowExecutionStep


# Steps
retrieve_diff_flowstep = FlowExecutionStep(name="retrieve_diff_flowstep", flow=retrieve_diff_subflow)
generate_all_comments_flowstep = FlowExecutionStep(
    name="generate_comments_flowstep",
    flow=generate_all_comments_subflow,
)

pr_bot = Flow(
    name="PR bot flow",
    begin_step=retrieve_diff_flowstep,
    control_flow_edges=[
        ControlFlowEdge(retrieve_diff_flowstep, generate_all_comments_flowstep),
        ControlFlowEdge(generate_all_comments_flowstep, None),
    ],
    data_flow_edges=[
        DataFlowEdge(
            retrieve_diff_flowstep,
            FILE_DIFF_LIST_IO,
            generate_all_comments_flowstep,
            FILE_DIFF_LIST_IO,
        )
    ],
)


# %%[markdown]
## Test the flow that performs the review

# %%
# Replace the path below with the path to your actual codebase sample git repository.
PATH_TO_DIR = "path/to/repository_root"

conversation = pr_bot.start_conversation(inputs={REPO_DIRPATH_IO: PATH_TO_DIR})

execution_status = conversation.execute()

if not isinstance(execution_status, FinishedStatus):
    raise ValueError("Unexpected status type")

print(execution_status.output_values)

NESTED_COMMENT_LIST = execution_status.output_values[NESTED_COMMENT_LIST_IO]


# %%[markdown]
## Export config to Agent Spec

# %%
from wayflowcore.agentspec import AgentSpecExporter

serialized_assistant = AgentSpecExporter().to_json(pr_bot)


# %%[markdown]
## Load Agent Spec config

# %%
from wayflowcore.agentspec import AgentSpecLoader

tool_registry = {
    "local_get_pr_diff_tool": local_get_pr_diff_tool,
    "format_git_diff": format_git_diff,
}

assistant = AgentSpecLoader(tool_registry=tool_registry).load_json(serialized_assistant)