Build a Simple Code Review Assistant#

Prerequisites

This guide does not assume any prior knowledge about Project WayFlow. However, it assumes the reader has a basic knowledge of LLMs.

You will need a working installation of WayFlow - see Installation.

Learning goals#

In this use-case tutorial, you will build a more advanced WayFlow application, a Pull Request (PR) Reviewing Assistant, using a WayFlow Flow to automate basic reviews of Python source code.

In this tutorial you will:

  1. Learn the basics of using Flows to build an assistant.

  2. Learn how to compose multiple sub-flows to create a more complex Flow.

  3. Learn more about building Tools that can be used within your Flows.

You can download a Jupyter Notebook for this use-case to follow along from Code PR Review Bot Tutorial.

Introduction to the task#

Code reviews are crucial for maintaining code quality and reviewers often spend considerable time pointing out routine issues such as the presence of debug statements, formatting inconsistencies, or common coding convention violations that may not be fully captured by static code analysis tools. This consumes valuable time that could be spent on reviewing more important things such as the core logic, architecture, and business requirements.

Note

Building an agent with WayFlow to perform such code reviews has a number of advantages:

  1. Review rules can be written using natural language, making an agent much more flexible than a simple static checker.

  2. Writing rules in natural language makes updating the rules very easy.

  3. More general issues can be captured. The LLM can generalize from a rule to related cases that a simple static checker would miss.

  4. New review rules can be generated from the collected comments of existing PRs.

In this tutorial, you will create a WayFlow Flow assistant designed to scan Python pull requests for common oversights such as:

  • Having TODO comments without associated tickets.

  • Using unclear or ambiguous variable naming.

  • Using risky Python code practices such as mutable defaults.

To build this assistant you will break the task into configuration and two sub-flows that will be composed into a single flow:

Complete Flow of the PR Bot

  1. Configure your application, choose an LLM and import required modules [Part 1].

  2. The first sub-flow retrieves and diffs information from a local codebase in a Git repository [Part 2].

  3. The second sub-flow iterates over the file diffs using a MapStep and generates comments with an LLM using the PromptExecutionStep [Part 3].

You will also learn how to extract information using the RegexExtractionStep and the ExtractValueFromJsonStep, and how to build and execute tools with the ServerTool and the ToolExecutionStep.

Note

This is not a production-ready code review assistant that can be used as-is.

Setup#

First, let’s set up the environment. For this tutorial you need to have wayflowcore installed (for additional information please read the installation guide).

Next, download the example codebase Git repository. This will be used to generate the sample code diffs for the assistant to review.

Extract the codebase Git repository folder from the compressed archive. Make a note of where the codebase Git repository is extracted to.

Part 1: Imports and LLM configuration#

WayFlow supports several LLM API providers. To learn more about the supported providers, read the guide How to use LLMs from different providers.

Choose an LLM from one of the supported providers. The example below uses OCIGenAIModel:

from wayflowcore.models import OCIGenAIModel

if __name__ == "__main__":

    llm = OCIGenAIModel(
        model_id="provider.model-id",
        service_endpoint="https://url-to-service-endpoint.com",
        compartment_id="compartment-id",
        auth_type="API_KEY",
    )

Note

API keys should never be stored in code. Use environment variables and/or tools such as python-dotenv instead.

Be cautious when using external LLM providers and ensure that you comply with your organization’s security policies and any applicable laws and regulations. Consider using a self-hosted LLM solution or a provider that offers on-premises deployment options if you need to maintain strict control over your code and data.
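
One way to keep connection settings out of the source code is to read them from environment variables. The sketch below uses only the standard library; the variable names (OCI_GENAI_MODEL_ID and so on) and the load_llm_settings helper are hypothetical and not part of WayFlow.

```python
import os

# Hypothetical environment variable names; use whatever names your deployment defines.
REQUIRED_SETTINGS = ("OCI_GENAI_MODEL_ID", "OCI_GENAI_ENDPOINT", "OCI_COMPARTMENT_ID")


def load_llm_settings(environ=os.environ):
    """Collect the LLM settings from the environment, failing early if any is missing."""
    missing = [name for name in REQUIRED_SETTINGS if not environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: environ[name] for name in REQUIRED_SETTINGS}
```

The returned values could then be passed as the model_id, service_endpoint, and compartment_id arguments shown above.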

Part 2: Retrieve the PR diff information#

The first phase of the assistant requires retrieving information about the code diffs from a code repository. You have already extracted the sample codebase Git repository to your local environment.

This will be a sub-flow that consists of two simple steps:

  • ToolExecutionStep that collects PR diff information using a Python subprocess to run the Git command.

  • RegexExtractionStep which separates the raw diff information into diffs for each file.

Steps to retrieve the PR diff information

First, take a look at what a diff looks like. The following mock example illustrates the structure of a diff produced by Git:

MOCK_DIFF = """
diff --git src://calculators/utils.py dst://calculators/utils.py
index 12345678..90123456 100644
--- src://calculators/utils.py
+++ dst://calculators/utils.py
@@ -10,6 +10,15 @@

 def calculate_total(data):
     # TODO: implement tax calculation
     return data

+def get_items(items=[]):
+    result = []
+    for item in items:
+        result.append(item * 2)
+    return result
+
+def process_numbers(numbers):
+    res = []
+    for x in numbers:
+        res.append(x + 1)
+    return res
+
 def calculate_average(numbers):
     return sum(numbers) / len(numbers)


diff --git src://example/utils.py dst://example/utils.py
index 000000000..123456789
--- /dev/null
+++ dst://example/utils.py
@@ -0,0 +1,20 @@
+# Copyright © 2024 Oracle and/or its affiliates.
+
+def calculate_sum(numbers=[]):
+    total = 0
+    for num in numbers:
+        total += num
+    return total
+
+
+def process_data(data):
+    # TODO: Handle exceptions here
+    result = data * 2
+    return result
+
+
+def main():
+    numbers = [1, 2, 3, 4, 5]
+    result = calculate_sum(numbers)
+    print("Sum:", result)
+    data = 10
+    processed_data = process_data(data)
+    print("Processed Data:", processed_data)
+
+
+if __name__ == "__main__":
+    main()
""".strip()

Reading a diff: Removals are identified by the “-” marks and additions by the “+” marks. In this example, there were only additions.

The diff above contains information about two files, calculators/utils.py and example/utils.py. It is only a mock example, included to show what a Git diff looks like; it differs from, and is shorter than, the diff that you will generate from the sample codebase.
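
The "+" and "-" convention can be checked with a few lines of Python. The following is a minimal sketch, assuming a unified diff as input; count_changes and the sample diff are hypothetical and only serve as an illustration. As a simplification, any line starting with "+++" or "---" is treated as a file header.

```python
def count_changes(diff_text):
    """Return (additions, removals) for a unified diff, ignoring the ---/+++ file headers."""
    additions = removals = 0
    for line in diff_text.splitlines():
        if line.startswith(("+++", "---")):
            continue  # file header lines, not content changes
        if line.startswith("+"):
            additions += 1
        elif line.startswith("-"):
            removals += 1
    return additions, removals


sample = """diff --git a/app.py b/app.py
--- a/app.py
+++ b/app.py
@@ -1,3 +1,3 @@
 import sys
-print("debug")
+print("ready")
 sys.exit(0)
"""
print(count_changes(sample))  # (1, 1)
```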

Build a tool#

You need a tool that extracts a code diff from the local code repository. Wrapping a Python function with the @tool decorator creates a ServerTool from it.

The function local_get_pr_diff_tool in the code below extracts the diffs by running the git diff HEAD command in a subprocess and capturing its output.

from wayflowcore.tools import tool


@tool(description_mode="only_docstring")
def local_get_pr_diff_tool(repo_dirpath: str) -> str:
    """
    Retrieves code diff with a git command given the
    path to the repository root folder.
    """
    import subprocess

    result = subprocess.run(
        ["git", "diff", "HEAD"],
        capture_output=True,
        cwd=repo_dirpath,
        text=True,
    )
    return result.stdout.strip()
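
Note that the function above returns whatever Git wrote to standard output, so a failing command (for example, a path that is not a Git repository) yields an empty string. The sketch below shows one possible stricter variant that raises instead. The name run_git_diff and its runner parameter are hypothetical; the parameter exists only so that the function can be exercised without Git installed.

```python
import subprocess


def run_git_diff(repo_dirpath, runner=subprocess.run):
    """Return the output of `git diff HEAD`, raising if Git reports an error."""
    result = runner(
        ["git", "diff", "HEAD"],
        capture_output=True,
        cwd=repo_dirpath,
        text=True,
    )
    if result.returncode != 0:
        # Surface Git's own error message instead of silently returning "".
        raise RuntimeError(f"git diff failed: {result.stderr.strip()}")
    return result.stdout.strip()
```

When such a function is wrapped as a tool, the raise_exceptions setting of the ToolExecutionStep determines whether the exception is raised by the step.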

Building the steps and the sub-flow#

Let’s write the code for the first sub-flow.

from wayflowcore.controlconnection import ControlFlowEdge
from wayflowcore.dataconnection import DataFlowEdge
from wayflowcore.flow import Flow
from wayflowcore.property import StringProperty
from wayflowcore.steps import RegexExtractionStep, StartStep, ToolExecutionStep

# IO Variable Names
REPO_DIRPATH_IO = "$repo_dirpath_io"
PR_DIFF_IO = "$raw_pr_diff"
FILE_DIFF_LIST_IO = "$file_diff_list"

# Define the steps

start_step = StartStep(name="start_step", input_descriptors=[StringProperty(name=REPO_DIRPATH_IO)])

# Step 1: Retrieve the pull request diff using the local tool
get_pr_diff_step = ToolExecutionStep(
    name="get_pr_diff",
    tool=local_get_pr_diff_tool,
    raise_exceptions=True,
    input_mapping={"repo_dirpath": REPO_DIRPATH_IO},
    output_mapping={ToolExecutionStep.TOOL_OUTPUT: PR_DIFF_IO},
)

# Step 2: Extract the file diffs from the raw diff using a regular expression
extract_into_list_of_file_diff_step = RegexExtractionStep(
    name="extract_into_list_of_file_diff",
    regex_pattern=r"(diff --git[\s\S]*?)(?=diff --git|$)",
    return_first_match_only=False,
    input_mapping={RegexExtractionStep.TEXT: PR_DIFF_IO},
    output_mapping={RegexExtractionStep.OUTPUT: FILE_DIFF_LIST_IO},
)

# Define the sub flow
retrieve_diff_subflow = Flow(
    name="Retrieve PR diff flow",
    begin_step=start_step,
    control_flow_edges=[
        ControlFlowEdge(source_step=start_step, destination_step=get_pr_diff_step),
        ControlFlowEdge(
            source_step=get_pr_diff_step, destination_step=extract_into_list_of_file_diff_step
        ),
        ControlFlowEdge(source_step=extract_into_list_of_file_diff_step, destination_step=None),
    ],
    data_flow_edges=[
        DataFlowEdge(
            source_step=start_step,
            source_output=REPO_DIRPATH_IO,
            destination_step=get_pr_diff_step,
            destination_input=REPO_DIRPATH_IO,
        ),
        DataFlowEdge(
            source_step=get_pr_diff_step,
            source_output=PR_DIFF_IO,
            destination_step=extract_into_list_of_file_diff_step,
            destination_input=PR_DIFF_IO,
        ),
    ],
)

API Reference: Flow | RegexExtractionStep | ToolExecutionStep | tool

The code does the following:

  1. It lists the names of the steps and input/output variables for the sub-flow.

  2. It then creates the different steps within the sub-flow.

  3. Finally, it instantiates the sub-flow. This will be covered in more detail later in the tutorial.

The input/output variables used in the Flow are given explicit names, for example "$repo_dirpath_io" for the variable holding the path to the local repository. The dollar ($) prefix is not required; it is only a convention used here to make these names easy to spot.

Each name is also stored in a Python constant (for example, REPO_DIRPATH_IO) to minimize the number of magic strings in the code. You will use REPO_DIRPATH_IO to pass in the location of the sample codebase Git repository.

See also

To learn about the basics of Flows, check out our introductory tutorial on WayFlow Flows.

Now take a look at each of the steps used in the sub-flow in more detail.

Get the PR diff, get_pr_diff_step#

This uses a ToolExecutionStep to gather the diff information - see the notes on how this is done earlier. When creating it, you need to provide the following:

  • tool: Specifies the tool that will be called within the step. This is the tool that was created earlier, local_get_pr_diff_tool.

  • raise_exceptions: Whether to raise exceptions generated by the tool that is called. Here it is set to True, so exceptions will be raised.

  • input_mapping: Specifies the names used for the input parameters of the step. Here, the tool parameter repo_dirpath is mapped to the name held in REPO_DIRPATH_IO. See ToolExecutionStep for more details on using an input_mapping with this type of step.

  • output_mapping: Specifies the name used for the output parameter of the step. Here, the default output name ToolExecutionStep.TOOL_OUTPUT is mapped to the name held in PR_DIFF_IO. Again, see ToolExecutionStep for more details on using an output_mapping with this type of step.
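
The renaming performed by input_mapping and output_mapping can be pictured with plain Python. The sketch below is only an illustration of the idea and not how WayFlow implements steps: run_step_with_mappings, fake_diff_tool, and the "tool_output" default name are hypothetical.

```python
def run_step_with_mappings(func, flow_values, input_mapping, output_mapping,
                           default_output_name="tool_output"):
    """Call func with renamed inputs and publish its result under a renamed output."""
    # input_mapping: {parameter name inside the step: name used in the flow}
    kwargs = {param: flow_values[flow_name] for param, flow_name in input_mapping.items()}
    result = func(**kwargs)
    # output_mapping: {default output name of the step: name used in the flow}
    published_name = output_mapping.get(default_output_name, default_output_name)
    return {published_name: result}


def fake_diff_tool(repo_dirpath):
    """Hypothetical stand-in for a tool; it only echoes its input."""
    return f"diff for {repo_dirpath}"


outputs = run_step_with_mappings(
    fake_diff_tool,
    flow_values={"$repo_dirpath_io": "path/to/repository_root"},
    input_mapping={"repo_dirpath": "$repo_dirpath_io"},
    output_mapping={"tool_output": "$raw_pr_diff"},
)
print(outputs)  # {'$raw_pr_diff': 'diff for path/to/repository_root'}
```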

Extract file diffs into a list, extract_into_list_of_file_diff_step#

You now have the diff information from the PR. This step performs a regex extraction on the raw diff text to extract the code to review.

Use a RegexExtractionStep to perform this action. When creating the step, you need to provide the following:

  • regex_pattern: The regex pattern for the extraction. This uses re.findall underneath.

  • return_first_match_only: You want to return all results, so set this to False.

  • input_mapping: Specifies the names used for the input parameters of the step. The input parameter will be mapped to the name, held in PR_DIFF_IO. See RegexExtractionStep for more details on using an input_mapping with this type of step.

  • output_mapping: Specifies the name used for the output parameter of the step. Here, the default name RegexExtractionStep.OUTPUT is renamed to the name defined in FILE_DIFF_LIST_IO. Again, see RegexExtractionStep for more details on using an output_mapping with this type of step.

About the pattern:

(diff --git[\s\S]*?)(?=diff --git|$)

The pattern looks for text starting with diff --git, followed by any characters ([\s\S] matches both whitespace, \s, and non-whitespace, \S, so newlines are included), until it encounters either another diff --git or the end of the text ($). The lookahead (?=...) means that the next diff --git, or the end of the text, is not included in the match.

The *? makes it “lazy” or non-greedy, meaning it takes the shortest possible match, rather than the longest.
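
The behavior of the pattern can be checked directly with re.findall from the standard library. The two-file diff below is a shortened, made-up input used only for illustration.

```python
import re

FILE_DIFF_PATTERN = r"(diff --git[\s\S]*?)(?=diff --git|$)"

# A made-up raw diff covering two files.
raw_diff = (
    "diff --git a/one.py b/one.py\n"
    "+print('one')\n"
    "diff --git a/two.py b/two.py\n"
    "+print('two')"
)

# With a single capturing group, findall returns the captured text of each match.
file_diffs = re.findall(FILE_DIFF_PATTERN, raw_diff)
print(len(file_diffs))  # 2
print(file_diffs[1])
```

One limitation to keep in mind: because the split relies on the literal text diff --git, a changed file whose content contains that text would be split in the wrong place.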

Tip

Recent Large Language Models are very helpful tools to create, debug and explain Regex patterns given a natural language description.

Finally, create the sub-flow using the Flow class. You specify the starting step of the Flow, the transitions between steps, and how values are passed from one step to the next.

The transitions between steps are defined with ControlFlowEdges. These take a source step and a destination step. Each ControlFlowEdge maps one such transition.

Passing values between steps is a very common occurrence when building Flows. This is done using DataFlowEdges which define that a value is passed from one step to another.

Inputs to a step are most commonly parameters of a Jinja template, of which there are several examples in this tutorial, or parameters of the callables used by tools. In a DataFlowEdge, you use the name of the parameter (a string) as the destination of the value being passed in. Storing that name in a variable is often less error-prone.

Similarly, when a step produces an output, such as the user input captured by an InputMessageStep, the value is available under a default name, for example InputMessageStep.USER_PROVIDED_INPUT. That default name is rarely meaningful in the context of your Flow, so it is often helpful to specify a more descriptive one using an output_mapping when creating the step. Again, store the name in a variable to avoid errors.

Defining a Flow#

Defining the Flow is the last step in the code shown above. There are a couple of things that are worth highlighting:

  • begin_step: A start step needs to be defined for a Flow.

  • control_flow_edges: The transitions between the steps in the Flow are defined as ControlFlowEdges. They have a source_step, which defines the start of a transition, and a destination_step, which defines the destination of a transition. All transitions for the flow will need to be defined.

  • data_flow_edges: Maps the variables between steps connected by a transition using DataFlowEdges. It maps variables from a source step into variables in a destination step. You only need to do this for the variables that need to be passed between steps.

Testing the flow#

You can test this sub-flow by creating an assistant conversation with Flow.start_conversation() and specifying the inputs, in this case the location of the Git repository. The conversation can then be executed with Conversation.execute(). This returns an object that represents the status of the conversation which you can check to confirm that the conversation has successfully finished.

The code below shows how the inputs are passed in. Set the PATH_TO_DIR to the actual path you extracted the sample codebase Git repository to. You then extract the outputs from the conversation.

The full code for testing the sub-flow is shown below:

from wayflowcore.executors.executionstatus import FinishedStatus

# Replace the path below with the path to your actual codebase sample git repository.
PATH_TO_DIR = "path/to/repository_root"

test_conversation = retrieve_diff_subflow.start_conversation(
    inputs={
        REPO_DIRPATH_IO: PATH_TO_DIR,
    }
)

execution_status = test_conversation.execute()

if not isinstance(execution_status, FinishedStatus):
    raise ValueError("Unexpected status type")

FILE_DIFF_LIST = execution_status.output_values[FILE_DIFF_LIST_IO]

print(FILE_DIFF_LIST[0])

API Reference: Flow

Part 3: Review the list of diffs#

Now that we have a list of diffs for each file, we can review them and generate comments using an LLM.

This task can be broken into a sub-flow made up of five steps:

Sub Flow to review the PR diffs

Build the tools and checks#

Before creating the steps and sub-flow to generate the comments, it is important to define the list of checks the assistant should perform, along with any specific instructions. Additionally, a tool must be created to prefix the diffs with line numbers, allowing the LLM to determine where to add comments.

Below is the full code to achieve this. It is broken into sections so that you can see, in detail, what is happening in each part.

PR_BOT_CHECKS = [
    """
Name: TODO_WITHOUT_TICKET
Description: TODO comments should reference a ticket number for tracking.
Example code:
```python
# TODO: Add validation here
def process_user_input(data):
    return data
```
Example comment:
[BOT] TODO_WITHOUT_TICKET: TODO comment should reference a ticket number for tracking (e.g., "TODO: Add validation here (TICKET-1234)").
""",
    """
Name: MUTABLE_DEFAULT_ARGUMENT
Description: Using mutable objects as default arguments can lead to unexpected behavior.
Example code:
```python
def add_item(item, items=[]):
    items.append(item)
    return items
```
Example comment:
[BOT] MUTABLE_DEFAULT_ARGUMENT: Avoid using mutable default arguments. Use None and initialize in the function: `def add_item(item, items=None): items = items or []`
""",
    """
Name: NON_DESCRIPTIVE_NAME
Description: Variable names should clearly indicate their purpose or content.
Example code:
```python
def process(lst):
    res = []
    for i in lst:
        res.append(i * 2)
    return res
```
Example comment:
[BOT] NON_DESCRIPTIVE_NAME: Use more descriptive names: 'lst' could be 'numbers', 'res' could be 'doubled_numbers', 'i' could be 'number'
""",
]

CONCATENATED_CHECKS = "\n\n---\n\n".join(PR_BOT_CHECKS)

PROMPT_TEMPLATE = """You are a very experienced code reviewer. You are given a git diff on a file: {{filename}}

## Context
The git diff contains all changes of a single file. All lines are prepended with their number. Lines without line number were removed from the file.
After the line number, a line that was changed has a "+" before the code. All lines without a "+" are just here for context, you will not comment on them.

## Input
### Code diff
{{diff}}

## Task
Your task is to review these changes, according to different rules. Only comment lines that were added, so the lines that have a + just after the line number.
The rules are the following:

{{checks}}

### Response Format
You need to return a review as a json as follows:
```json
[
    {
        "content": "the comment as a text",
        "suggestion": "if the change you propose is a single line, then put here the single line rewritten that includes your proposal change. IMPORTANT: a single line, which will erase the current line. Put empty string if no suggestion or if the suggestion is more than a single line",
        "line": "line number where the comment applies"
    },

]
```
Please use triple backticks ``` to delimit your JSON list of comments. Don't output more than 5 comments, only comment the most relevant sections.
If there are no comments and the code seems fine, just output an empty JSON list."""


@tool(description_mode="only_docstring")
def format_git_diff(diff_text: str) -> str:
    """
    Formats a git diff by adding line numbers to each line except removal lines.
    """

    def pad_number(number: int, width: int) -> str:
        """Right-align a number with specified width using space padding."""
        return str(number).rjust(width)

    LINE_NUMBER_WIDTH = 5
    PADDING_WIDTH = LINE_NUMBER_WIDTH + 1
    current_line_number = 0
    formatted_lines = []

    for line in diff_text.split("\n"):
        # Handle diff header lines (e.g., "@@ -1,7 +1,6 @@")
        if line.startswith("@@"):
            try:
                # Extract the starting line number of the hunk in the new file
                _, position_info, _ = line.split("@@", 2)
                new_file_info = position_info.split()[1][1:]  # Remove the '+' prefix
                # The line count is optional in hunk headers (e.g., "@@ -3 +3 @@")
                start_line = int(new_file_info.split(",")[0])
            except (ValueError, IndexError):
                raise ValueError(f"Invalid diff header format: {line}")

            current_line_number = start_line
            formatted_lines.append(line)
            continue

        # Handle content lines
        if current_line_number > 0 and line:
            if not line.startswith("-"):
                # Add line number for added/context lines
                line_prefix = pad_number(current_line_number, LINE_NUMBER_WIDTH)
                formatted_lines.append(f"{line_prefix} {line}")
                current_line_number += 1
            else:
                # Just add padding for removal lines
                formatted_lines.append(" " * PADDING_WIDTH + line)

    return "\n".join(formatted_lines)

API Reference: ExtractValueFromJsonStep | MapStep | OutputMessageStep | PromptExecutionStep | ToolExecutionStep

Checks and LLM instructions#

You will use the three simple checks shown above. Each check specifies a name, a description of what the LLM should look for, and an example of offending code with the expected comment, so that the LLM gets a better understanding of the task.

The prompt uses a simple structure:

  1. Role Definition: Define who/what you want the LLM to act as (e.g., “You are a very experienced code reviewer”).

  2. Context Section: Provide relevant background information or specific circumstances that frame the task.

  3. Input Section: Specify the exact information, data, or materials that the LLM will be provided with.

  4. Task Section: Clearly state what you want the LLM to do with the input provided.

  5. Response Format Section: Define how you want the response to be structured or formatted (e.g., bullet points, JSON, with XML tags, and so on).

The checks are defined in the list PR_BOT_CHECKS. They are then concatenated into a single string, CONCATENATED_CHECKS, so that they can be used inside the system prompt passed to the LLM.

The system prompt, or prompt template, is defined in PROMPT_TEMPLATE. It contains placeholders for the file name, the diff, and the checks, which are replaced when specializing the prompt for each diff.

Tip

How to write high-quality prompts

There is no consensus on what makes the best LLM prompt. However, a strategy that works well with recent LLMs is to be very specific about the task to be solved, give enough context, and explain the potential edge cases to consider.

To assess a prompt, ask yourself whether an experienced colleague with no prior context about the task could reach the intended result using only these instructions.

Diff formatting tool#

Next, create a tool (a ServerTool) that formats the diffs so that they can be consumed by the LLM. As you have already seen, a tool is a simple wrapper around a Python callable that makes it usable within a flow.

The function, format_git_diff, in the code above does the work of formatting the diffs.
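
To see what the formatted output looks like, here is a condensed sketch of the same numbering idea applied to a small hunk. It is a simplified stand-in rather than the tool itself: number_hunk_lines and HUNK_HEADER are hypothetical names, and the hunk is made up.

```python
import re

# Matches hunk headers such as "@@ -10,3 +10,3 @@" and captures the new-file start line.
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def number_hunk_lines(diff_text, width=5):
    """Prefix added and context lines with their line number in the new file."""
    numbered = []
    current = 0
    for line in diff_text.split("\n"):
        header = HUNK_HEADER.match(line)
        if header:
            current = int(header.group(1))  # first line number of the hunk in the new file
            numbered.append(line)
        elif current > 0 and line:
            if line.startswith("-"):
                numbered.append(" " * (width + 1) + line)  # removed lines get no number
            else:
                numbered.append(f"{current:>{width}} {line}")
                current += 1
    return "\n".join(numbered)


hunk = "@@ -10,3 +10,3 @@\n def total(data):\n-    return None\n+    return data"
print(number_hunk_lines(hunk))
# @@ -10,3 +10,3 @@
#    10  def total(data):
#       -    return None
#    11 +    return data
```

Numbering follows the new version of the file, which is why the removed line is padded instead of numbered.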

See also

For more information about WayFlow tools please read our guide, How to use tools.

Building the steps and the sub-flow#

With the prompts and diff formatting tool written you can now build the second sub-flow. This sub-flow will iterate over the diffs, generated previously, and then use an LLM to generate review comments from them.

from wayflowcore._utils._templating_helpers import render_template_partially
from wayflowcore.property import AnyProperty, DictProperty, ListProperty, StringProperty
from wayflowcore.steps import (
    ExtractValueFromJsonStep,
    MapStep,
    OutputMessageStep,
    PromptExecutionStep,
    ToolExecutionStep,
)

# IO Variable Names
DIFF_TO_STRING_IO = "$diff_to_string"
DIFF_WITH_LINES_IO = "$diff_with_lines"
FILEPATH_IO = "$filename"
JSON_COMMENTS_IO = "$json_comments"
EXTRACTED_COMMENTS_IO = "$extracted_comments"
NESTED_COMMENT_LIST_IO = "$nested_comment_list"
FILEPATH_LIST_IO = "$filepath_list"

# Define the steps

# Step 1: Format the diff to a string
format_diff_to_string_step = OutputMessageStep(
    name="format_diff_to_string",
    message_template="{{ message | string }}",
    output_mapping={OutputMessageStep.OUTPUT: DIFF_TO_STRING_IO},
)

# Step 2: Add lines on the diff using a tool
add_lines_on_diff_step = ToolExecutionStep(
    name="add_lines_on_diff",
    tool=format_git_diff,
    input_mapping={"diff_text": DIFF_TO_STRING_IO},
    output_mapping={ToolExecutionStep.TOOL_OUTPUT: DIFF_WITH_LINES_IO},
)

# Step 3: Extract the file path from the diff string using a regular expression
extract_file_path_step = RegexExtractionStep(
    name="extract_file_path",
    regex_pattern=r"diff --git a/(.+?) b/",
    return_first_match_only=True,
    input_mapping={RegexExtractionStep.TEXT: DIFF_TO_STRING_IO},
    output_mapping={RegexExtractionStep.OUTPUT: FILEPATH_IO},
)

# Step 4: Generate comments using a prompt
generate_comments_step = PromptExecutionStep(
    name="generate_comments",
    prompt_template=render_template_partially(PROMPT_TEMPLATE, {"checks": CONCATENATED_CHECKS}),
    llm=llm,
    input_mapping={"diff": DIFF_WITH_LINES_IO, "filename": FILEPATH_IO},
    output_mapping={PromptExecutionStep.OUTPUT: JSON_COMMENTS_IO},
)

# Step 5: Extract comments from the JSON output
# Define the value type for extracted comments
comments_valuetype = ListProperty(
    name="values",
    description="The extracted comments content and line number",
    item_type=DictProperty(value_type=AnyProperty()),
)
extract_comments_from_json_step = ExtractValueFromJsonStep(
    name="extract_comments_from_json",
    output_values={comments_valuetype: '[.[] | {"content": .["content"], "line": .["line"]}]'},
    retry=True,
    llm=llm,
    input_mapping={ExtractValueFromJsonStep.TEXT: JSON_COMMENTS_IO},
    output_mapping={"values": EXTRACTED_COMMENTS_IO},
)

# Define the sub flow to generate comments for each file diff
generate_comments_subflow = Flow(
    name="Generate review comments flow",
    begin_step=format_diff_to_string_step,
    control_flow_edges=[
        ControlFlowEdge(format_diff_to_string_step, add_lines_on_diff_step),
        ControlFlowEdge(add_lines_on_diff_step, extract_file_path_step),
        ControlFlowEdge(extract_file_path_step, generate_comments_step),
        ControlFlowEdge(generate_comments_step, extract_comments_from_json_step),
        ControlFlowEdge(extract_comments_from_json_step, None),
    ],
    data_flow_edges=[
        DataFlowEdge(
            format_diff_to_string_step, DIFF_TO_STRING_IO, add_lines_on_diff_step, DIFF_TO_STRING_IO
        ),
        DataFlowEdge(
            format_diff_to_string_step, DIFF_TO_STRING_IO, extract_file_path_step, DIFF_TO_STRING_IO
        ),
        DataFlowEdge(
            add_lines_on_diff_step, DIFF_WITH_LINES_IO, generate_comments_step, DIFF_WITH_LINES_IO
        ),
        DataFlowEdge(extract_file_path_step, FILEPATH_IO, generate_comments_step, FILEPATH_IO),
        DataFlowEdge(
            generate_comments_step,
            JSON_COMMENTS_IO,
            extract_comments_from_json_step,
            JSON_COMMENTS_IO,
        ),
    ],
)

# Use the MapStep to apply the sub flow to each file
for_each_file_step = MapStep(
    flow=generate_comments_subflow,
    unpack_input={"message": "."},
    input_mapping={MapStep.ITERATED_INPUT: FILE_DIFF_LIST_IO},
    output_descriptors=[
        ListProperty(name=NESTED_COMMENT_LIST_IO, item_type=AnyProperty()),
        ListProperty(name=FILEPATH_LIST_IO, item_type=StringProperty()),
    ],
    output_mapping={EXTRACTED_COMMENTS_IO: NESTED_COMMENT_LIST_IO, FILEPATH_IO: FILEPATH_LIST_IO},
)

generate_all_comments_subflow = Flow.from_steps([for_each_file_step])

API Reference: Property | ListProperty | DictProperty | StringProperty | ExtractValueFromJsonStep | MapStep | OutputMessageStep | PromptExecutionStep | ToolExecutionStep

Take a look at each of the steps used in the sub-flow to get an understanding of what is happening.

Format diff to string, format_diff_to_string_step#

This step converts a file diff, one item of the file diff list, into a string so that it can be used by the following steps.

This is done with the string Jinja filter as follows: {{ message | string }}. It uses an OutputMessageStep to achieve this.

Add lines to the diff, add_lines_on_diff_step#

This step prefixes the diff lines with the line numbers needed to attach review comments to the right place. It uses a ToolExecutionStep to run the format_git_diff tool that you defined earlier.

The input to the tool, within the I/O dictionary, is specified using the input_mapping. For all these steps, it is important to remember that the outputs of one step are linked to the inputs of the next.

Extract file path, extract_file_path_step#

This extracts the file path from the diff string. The file path is needed for assigning the review comments. The RegexExtractionStep step is used to extract the file path from the diff.

The regular expression is applied to the diff string, which is read from the step inputs using the input_mapping parameter.

Note: Unlike the RegexExtractionStep used in Part 2, only the first match is required here.
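
The pattern used by this step can be tried out with re.search, which stops at the first match. This sketch assumes the default a/ and b/ prefixes that Git writes in diff headers; a header with other prefixes, such as the src:// and dst:// prefixes of the mock diff shown earlier, would not match. The file_diff value is a made-up example.

```python
import re

FILE_PATH_PATTERN = r"diff --git a/(.+?) b/"

# Made-up header of a single file diff, using Git's default a/ and b/ prefixes.
file_diff = "diff --git a/calculators/utils.py b/calculators/utils.py\nindex 1234567..89abcde 100644"

match = re.search(FILE_PATH_PATTERN, file_diff)
file_path = match.group(1) if match else None
print(file_path)  # calculators/utils.py
```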

Generate comments, generate_comments_step#

This generates comments using the LLM and the prompt template defined earlier. The PromptExecutionStep step executes the prompt with the LLM defined earlier in this tutorial.

Since the list of checks has already been defined, the template can be pre-rendered using the render_template_partially method. This renders the parts of the template that have been provided, while the remaining information is gathered from the I/O dictionary.
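
The idea of partial rendering can be approximated in plain Python. The sketch below is not WayFlow's render_template_partially and not a Jinja renderer: fill_known_placeholders is a hypothetical helper that only handles simple {{ name }} placeholders, which is enough to show how the known values are filled in while the other placeholders stay in place.

```python
import re

# Matches simple placeholders such as {{checks}} or {{ checks }}.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def fill_known_placeholders(template, values):
    """Replace {{name}} placeholders found in values; leave the others untouched."""

    def substitute(match):
        name = match.group(1)
        return values[name] if name in values else match.group(0)

    return PLACEHOLDER.sub(substitute, template)


template = "File: {{filename}}\nRules:\n{{checks}}\nDiff:\n{{diff}}"
partial = fill_known_placeholders(template, {"checks": "- TODO_WITHOUT_TICKET"})
print(partial)
# File: {{filename}}
# Rules:
# - TODO_WITHOUT_TICKET
# Diff:
# {{diff}}
```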

Extract comments from JSON, extract_comments_from_json_step#

This step extracts the comments and line numbers from the generated LLM output, which is a serialized JSON structure because of the prompt used. An ExtractValueFromJsonStep is used to do the extraction. When creating the step, specify the following in addition to the usual input_mapping and output_mapping:

  • output_values: This defines the JQ query to extract the comments from the JSON generated by the LLM.

  • llms: An LLM that can be used to help resolve any parsing errors. This is related to retry.

  • retry: If parsing fails, you may want to retry. This is set to True, which results in trying to use the LLM to help resolve any such issues.
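
As a rough standalone illustration of what this extraction involves, the sketch below parses a JSON string with the standard library and falls back to an empty list when parsing fails. It does not use ExtractValueFromJsonStep or JQ; the function name `extract_comments` and the field names `content` and `line` are assumptions chosen for the example.

```python
import json

# Hypothetical LLM output; the field names are assumptions for illustration.
llm_output = (
    '[{"content": "Remove the debug print statement.", "line": 2}, '
    '{"content": "Add a docstring.", "line": 10}]'
)


def extract_comments(raw: str) -> list:
    """Parse raw text as JSON and return the list of comment objects."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        # Malformed output: this is the situation the retry option addresses.
        return []
    if not isinstance(parsed, list):
        return []
    # Keep only JSON objects, dropping any stray values.
    return [item for item in parsed if isinstance(item, dict)]


comments = extract_comments(llm_output)
print(comments)
print(extract_comments("Sure, here are my comments"))
```

LLM output is not guaranteed to be valid JSON, which is why a fallback such as the retry option is useful.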

Create the sub-flow, generate_comments_subflow#

Here you define which steps are in the sub-flow, what the transitions between the steps are, and which step is the starting step. This is the same process you followed previously when defining the sub-flow to fetch the PR data.

Applying the comment generation to all file diffs#

Now that you have created the sub-flow, you need to apply it to every file diff. This is done using a MapStep. MapStep takes a sub-flow as input, in this case the generate_comments_subflow, and applies it to each element of an iterable, in this case the list of file diffs.

You simply specify:

  • flow: The sub-flow to apply to each element of the iterable.

  • unpack_input: Defines how to unpack the input. A JQ query can be used to transform the input, but in this case, it is kept as a list.

  • input_mapping: Defines what the sub-flow will iterate over. The key, MapStep.ITERATED_INPUT, is used to pass in the diffs.

  • output_descriptors: Specifies the values to collect from the output generated by applying the sub-flow. In this case, these will be the generated comments and the associated file path.

Note

The MapStep works similarly to how the Python map function works. For more information, see https://docs.python.org/3/library/functions.html#map
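
The analogy with `map` can be sketched as follows. This is plain Python rather than WayFlow code: `review_one_diff` is a hypothetical stand-in for the per-file sub-flow, and the sample diffs and the comment fields are invented for the example.

```python
def review_one_diff(file_diff):
    # Hypothetical stand-in for the per-file sub-flow: returns its two outputs.
    lines = file_diff.splitlines()
    file_path = lines[0].split(" b/")[-1]
    comments = [
        {"content": "Remove debug print.", "line": number}
        for number, line in enumerate(lines, start=1)
        if line.startswith("+") and "print(" in line
    ]
    return {"comments": comments, "file_path": file_path}


# Hypothetical list of file diffs, one string per file.
file_diff_list = [
    "diff --git a/app.py b/app.py\n+print('debug')",
    "diff --git a/util.py b/util.py\n+x = 1",
]

# Apply the same function to every element, then collect each output in its own list.
results = list(map(review_one_diff, file_diff_list))
nested_comment_list = [result["comments"] for result in results]
filepath_list = [result["file_path"] for result in results]

print(nested_comment_list)
print(filepath_list)
```

Each file contributes one entry to both lists, so a file without findings still appears with an empty list of comments.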

Finally, create the sub-flow that generates all comments using the helper method Flow.from_steps.

Testing the sub-flow#

You can test the sub-flow by creating a conversation, as shown in the code below, and specifying the inputs as done in Part 2: Retrieve the PR diff information.

Since each sub-flow is tested independently, you can reuse the output from the first sub-flow.

# we reuse the FILE_DIFF_LIST from the previous test
test_conversation = generate_all_comments_subflow.start_conversation(
    inputs={
        FILE_DIFF_LIST_IO: FILE_DIFF_LIST,
    }
)

execution_status = test_conversation.execute()

if not isinstance(execution_status, FinishedStatus):
    raise ValueError("Unexpected status type")

NESTED_COMMENT_LIST = execution_status.output_values[NESTED_COMMENT_LIST_IO]
FILEPATH_LIST = execution_status.output_values[FILEPATH_LIST_IO]
print(NESTED_COMMENT_LIST[0])
print(FILEPATH_LIST)

Building the final Flow#

Congratulations! You have completed the three sub-flows. When combined into a single flow, they retrieve the PR diff information and generate comments on the diffs using an LLM.

You will wire the sub-flows that you have built together by wrapping them in a FlowExecutionStep. The FlowExecutionSteps are then composed into the final combined Flow.

The code for this is shown below:

from wayflowcore.steps import FlowExecutionStep


# Steps
retrieve_diff_flowstep = FlowExecutionStep(name="retrieve_diff_flowstep", flow=retrieve_diff_subflow)
generate_all_comments_flowstep = FlowExecutionStep(
    name="generate_comments_flowstep",
    flow=generate_all_comments_subflow,
)

pr_bot = Flow(
    name="PR bot flow",
    begin_step=retrieve_diff_flowstep,
    control_flow_edges=[
        ControlFlowEdge(retrieve_diff_flowstep, generate_all_comments_flowstep),
        ControlFlowEdge(generate_all_comments_flowstep, None),
    ],
    data_flow_edges=[
        DataFlowEdge(
            retrieve_diff_flowstep,
            FILE_DIFF_LIST_IO,
            generate_all_comments_flowstep,
            FILE_DIFF_LIST_IO,
        )
    ],
)

API Reference: Flow | FlowExecutionStep

Testing the combined assistant#

You can now run the PR bot end-to-end on your repo or locally.

Set PATH_TO_DIR to the path where you extracted the sample codebase Git repository. The code below also shows how the output of the conversation is read from execution_status.output_values.

# Replace the path below with the path to your actual codebase sample git repository.
PATH_TO_DIR = "path/to/repository_root"

conversation = pr_bot.start_conversation(inputs={REPO_DIRPATH_IO: PATH_TO_DIR})

execution_status = conversation.execute()

if not isinstance(execution_status, FinishedStatus):
    raise ValueError("Unexpected status type")

print(execution_status.output_values)

NESTED_COMMENT_LIST = execution_status.output_values[NESTED_COMMENT_LIST_IO]
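
Once the run has finished, the nested comment list can be paired with the file paths to obtain one flat list of comments. The sketch below assumes that both outputs are aligned by position, with one entry per file diff. The sample values and the field names `content`, `line`, and `path` are hypothetical stand-ins for real output.

```python
# Hypothetical sample values with the same shape as the two flow outputs.
filepath_list = ["app.py", "util.py"]
nested_comment_list = [
    [{"content": "Remove debug print.", "line": 2}],
    [],
]

# Attach the file path to every comment and flatten the nesting.
flat_comments = [
    {"path": path, **comment}
    for path, comments in zip(filepath_list, nested_comment_list)
    for comment in comments
]

for comment in flat_comments:
    print(f"{comment['path']}:{comment['line']} {comment['content']}")
```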

Agent Spec Exporting/Loading#

You can export the assistant to its Agent Spec configuration using the AgentSpecExporter.

from wayflowcore.agentspec import AgentSpecExporter

serialized_assistant = AgentSpecExporter().to_json(pr_bot)
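
Because the export is a JSON string, it can be inspected with the standard `json` module. The sketch below runs on a short, hand-trimmed excerpt rather than on a real export, so that it is self-contained. The variable names are assumptions, and a real configuration contains many more fields.

```python
import json

# Trimmed excerpt of an exported configuration; a real export is much longer.
serialized_excerpt = """
{
  "component_type": "Flow",
  "name": "PR bot flow",
  "control_flow_connections": [
    {"name": "retrieve_diff_flowstep_to_generate_comments_flowstep_control_flow_edge"},
    {"name": "__StartStep___to_retrieve_diff_flowstep_control_flow_edge"}
  ],
  "data_flow_connections": [
    {"source_output": "$file_diff_list", "destination_input": "$file_diff_list"}
  ]
}
"""

config = json.loads(serialized_excerpt)

# List the control flow edges to check how the steps are chained.
edge_names = [edge["name"] for edge in config["control_flow_connections"]]
print(config["name"])
print(edge_names)
```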

Here is what the Agent Spec representation will look like ↓

{
  "component_type": "Flow",
  "id": "9c65246d-a0dd-4ec4-801d-afd640b2488e",
  "name": "PR bot flow",
  "description": "",
  "metadata": {
    "__metadata_info__": {}
  },
  "inputs": [
    {
      "type": "string",
      "title": "$repo_dirpath_io"
    }
  ],
  "outputs": [
    {
      "type": "array",
      "items": {
        "type": "string"
      },
      "title": "$filepath_list"
    },
    {
      "type": "array",
      "items": {},
      "title": "$nested_comment_list"
    },
    {
      "type": "string",
      "title": "$raw_pr_diff"
    },
    {
      "description": "the list of extracted value using the regex \"(diff --git[\\s\\S]*?)(?=diff --git|$)\" from the raw input",
      "type": "array",
      "items": {
        "type": "string"
      },
      "title": "$file_diff_list",
      "default": []
    }
  ],
  "start_node": {
    "$component_ref": "020c885e-6d0b-472a-bb91-246ab70ab1db"
  },
  "nodes": [
    {
      "$component_ref": "47e367be-4d74-49dc-ac3b-89bb97ffa7df"
    },
    {
      "$component_ref": "43d58c76-23a0-4d10-943d-f9c5e0835a7c"
    },
    {
      "$component_ref": "020c885e-6d0b-472a-bb91-246ab70ab1db"
    },
    {
      "$component_ref": "a544af64-e63b-4ccf-9ab0-8d25cdbc0b93"
    }
  ],
  "control_flow_connections": [
    {
      "component_type": "ControlFlowEdge",
      "id": "a5c123ff-c14c-4291-b174-61d61170f187",
      "name": "retrieve_diff_flowstep_to_generate_comments_flowstep_control_flow_edge",
      "description": null,
      "metadata": {
        "__metadata_info__": {}
      },
      "from_node": {
        "$component_ref": "47e367be-4d74-49dc-ac3b-89bb97ffa7df"
      },
      "from_branch": null,
      "to_node": {
        "$component_ref": "43d58c76-23a0-4d10-943d-f9c5e0835a7c"
      }
    },
    {
      "component_type": "ControlFlowEdge",
      "id": "8a10b23a-2d0c-46c4-82ac-e66ad0b9399b",
      "name": "__StartStep___to_retrieve_diff_flowstep_control_flow_edge",
      "description": null,
      "metadata": {
        "__metadata_info__": {}
      },
      "from_node": {
        "$component_ref": "020c885e-6d0b-472a-bb91-246ab70ab1db"
      },
      "from_branch": null,
      "to_node": {
        "$component_ref": "47e367be-4d74-49dc-ac3b-89bb97ffa7df"
      }
    },
    {
      "component_type": "ControlFlowEdge",
      "id": "dac07720-8a5a-4a61-b1e7-50be506ed937",
      "name": "generate_comments_flowstep_to_None End node_control_flow_edge",
      "description": null,
      "metadata": {},
      "from_node": {
        "$component_ref": "43d58c76-23a0-4d10-943d-f9c5e0835a7c"
      },
      "from_branch": null,
      "to_node": {
        "$component_ref": "a544af64-e63b-4ccf-9ab0-8d25cdbc0b93"
      }
    }
  ],
  "data_flow_connections": [
    {
      "component_type": "DataFlowEdge",
      "id": "7b12dfed-309b-46ff-8a2d-bb6f2a3154b6",
      "name": "retrieve_diff_flowstep_$file_diff_list_to_generate_comments_flowstep_$file_diff_list_data_flow_edge",
      "description": null,
      "metadata": {
        "__metadata_info__": {}
      },
      "source_node": {
        "$component_ref": "47e367be-4d74-49dc-ac3b-89bb97ffa7df"
      },
      "source_output": "$file_diff_list",
      "destination_node": {
        "$component_ref": "43d58c76-23a0-4d10-943d-f9c5e0835a7c"
      },
      "destination_input": "$file_diff_list"
    },
    {
      "component_type": "DataFlowEdge",
      "id": "51122844-22d3-40a8-b652-1b020ce24945",
      "name": "__StartStep___$repo_dirpath_io_to_retrieve_diff_flowstep_$repo_dirpath_io_data_flow_edge",
      "description": null,
      "metadata": {
        "__metadata_info__": {}
      },
      "source_node": {
        "$component_ref": "020c885e-6d0b-472a-bb91-246ab70ab1db"
      },
      "source_output": "$repo_dirpath_io",
      "destination_node": {
        "$component_ref": "47e367be-4d74-49dc-ac3b-89bb97ffa7df"
      },
      "destination_input": "$repo_dirpath_io"
    },
    {
      "component_type": "DataFlowEdge",
      "id": "72aa469c-98cd-4f0d-9496-0aa454373aef",
      "name": "generate_comments_flowstep_$filepath_list_to_None End node_$filepath_list_data_flow_edge",
      "description": null,
      "metadata": {},
      "source_node": {
        "$component_ref": "43d58c76-23a0-4d10-943d-f9c5e0835a7c"
      },
      "source_output": "$filepath_list",
      "destination_node": {
        "$component_ref": "a544af64-e63b-4ccf-9ab0-8d25cdbc0b93"
      },
      "destination_input": "$filepath_list"
    },
    {
      "component_type": "DataFlowEdge",
      "id": "eac1b375-1541-41f7-87f3-f3e626cc2c9c",
      "name": "generate_comments_flowstep_$nested_comment_list_to_None End node_$nested_comment_list_data_flow_edge",
      "description": null,
      "metadata": {},
      "source_node": {
        "$component_ref": "43d58c76-23a0-4d10-943d-f9c5e0835a7c"
      },
      "source_output": "$nested_comment_list",
      "destination_node": {
        "$component_ref": "a544af64-e63b-4ccf-9ab0-8d25cdbc0b93"
      },
      "destination_input": "$nested_comment_list"
    },
    {
      "component_type": "DataFlowEdge",
      "id": "0869acb5-4d8f-4b17-b59b-3b915912b628",
      "name": "retrieve_diff_flowstep_$raw_pr_diff_to_None End node_$raw_pr_diff_data_flow_edge",
      "description": null,
      "metadata": {},
      "source_node": {
        "$component_ref": "47e367be-4d74-49dc-ac3b-89bb97ffa7df"
      },
      "source_output": "$raw_pr_diff",
      "destination_node": {
        "$component_ref": "a544af64-e63b-4ccf-9ab0-8d25cdbc0b93"
      },
      "destination_input": "$raw_pr_diff"
    },
    {
      "component_type": "DataFlowEdge",
      "id": "9fb2ab9e-ece1-4195-8f51-ef618dcb72bb",
      "name": "retrieve_diff_flowstep_$file_diff_list_to_None End node_$file_diff_list_data_flow_edge",
      "description": null,
      "metadata": {},
      "source_node": {
        "$component_ref": "47e367be-4d74-49dc-ac3b-89bb97ffa7df"
      },
      "source_output": "$file_diff_list",
      "destination_node": {
        "$component_ref": "a544af64-e63b-4ccf-9ab0-8d25cdbc0b93"
      },
      "destination_input": "$file_diff_list"
    }
  ],
  "$referenced_components": {
    "43d58c76-23a0-4d10-943d-f9c5e0835a7c": {
      "component_type": "FlowNode",
      "id": "43d58c76-23a0-4d10-943d-f9c5e0835a7c",
      "name": "generate_comments_flowstep",
      "description": "",
      "metadata": {
        "__metadata_info__": {}
      },
      "inputs": [
        {
          "description": "iterated input for the map step",
          "type": "array",
          "items": {
            "description": "\"message\" input variable for the template",
            "title": "message"
          },
          "title": "$file_diff_list"
        }
      ],
      "outputs": [
        {
          "type": "array",
          "items": {
            "type": "string"
          },
          "title": "$filepath_list"
        },
        {
          "type": "array",
          "items": {},
          "title": "$nested_comment_list"
        }
      ],
      "branches": [
        "next"
      ],
      "subflow": {
        "component_type": "Flow",
        "id": "f95e0e5d-f573-4e25-9d68-8508371246f9",
        "name": "flow_028a7dfb__auto",
        "description": "",
        "metadata": {
          "__metadata_info__": {}
        },
        "inputs": [
          {
            "description": "iterated input for the map step",
            "type": "array",
            "items": {
              "description": "\"message\" input variable for the template",
              "title": "message"
            },
            "title": "$file_diff_list"
          }
        ],
        "outputs": [
          {
            "type": "array",
            "items": {
              "type": "string"
            },
            "title": "$filepath_list"
          },
          {
            "type": "array",
            "items": {},
            "title": "$nested_comment_list"
          }
        ],
        "start_node": {
          "$component_ref": "367ae568-317d-42ec-ae70-4c41afe0dbd0"
        },
        "nodes": [
          {
            "$component_ref": "f127a297-842d-4d17-bc89-4704019458d7"
          },
          {
            "$component_ref": "367ae568-317d-42ec-ae70-4c41afe0dbd0"
          },
          {
            "$component_ref": "6f62aecf-03a1-4e38-b551-8eef0efaf4bb"
          }
        ],
        "control_flow_connections": [
          {
            "component_type": "ControlFlowEdge",
            "id": "85a2cdff-6ad4-4f58-8d1c-c8deeb05880c",
            "name": "__StartStep___to_step_0_control_flow_edge",
            "description": null,
            "metadata": {
              "__metadata_info__": {}
            },
            "from_node": {
              "$component_ref": "367ae568-317d-42ec-ae70-4c41afe0dbd0"
            },
            "from_branch": null,
            "to_node": {
              "$component_ref": "f127a297-842d-4d17-bc89-4704019458d7"
            }
          },
          {
            "component_type": "ControlFlowEdge",
            "id": "396e218f-225e-4e36-a33c-a176ca77d345",
            "name": "step_0_to_None End node_control_flow_edge",
            "description": null,
            "metadata": {},
            "from_node": {
              "$component_ref": "f127a297-842d-4d17-bc89-4704019458d7"
            },
            "from_branch": null,
            "to_node": {
              "$component_ref": "6f62aecf-03a1-4e38-b551-8eef0efaf4bb"
            }
          }
        ],
        "data_flow_connections": [
          {
            "component_type": "DataFlowEdge",
            "id": "6c8b8f78-b587-49ff-a401-6262cdafb0ee",
            "name": "__StartStep___$file_diff_list_to_step_0_$file_diff_list_data_flow_edge",
            "description": null,
            "metadata": {
              "__metadata_info__": {}
            },
            "source_node": {
              "$component_ref": "367ae568-317d-42ec-ae70-4c41afe0dbd0"
            },
            "source_output": "$file_diff_list",
            "destination_node": {
              "$component_ref": "f127a297-842d-4d17-bc89-4704019458d7"
            },
            "destination_input": "$file_diff_list"
          },
          {
            "component_type": "DataFlowEdge",
            "id": "84d3a783-38c8-4d53-bc0b-4205732d1fbf",
            "name": "step_0_$filepath_list_to_None End node_$filepath_list_data_flow_edge",
            "description": null,
            "metadata": {},
            "source_node": {
              "$component_ref": "f127a297-842d-4d17-bc89-4704019458d7"
            },
            "source_output": "$filepath_list",
            "destination_node": {
              "$component_ref": "6f62aecf-03a1-4e38-b551-8eef0efaf4bb"
            },
            "destination_input": "$filepath_list"
          },
          {
            "component_type": "DataFlowEdge",
            "id": "b7ffd4c3-4a03-47f0-95fc-0ba670010729",
            "name": "step_0_$nested_comment_list_to_None End node_$nested_comment_list_data_flow_edge",
            "description": null,
            "metadata": {},
            "source_node": {
              "$component_ref": "f127a297-842d-4d17-bc89-4704019458d7"
            },
            "source_output": "$nested_comment_list",
            "destination_node": {
              "$component_ref": "6f62aecf-03a1-4e38-b551-8eef0efaf4bb"
            },
            "destination_input": "$nested_comment_list"
          }
        ],
        "$referenced_components": {
          "f127a297-842d-4d17-bc89-4704019458d7": {
            "component_type": "ExtendedMapNode",
            "id": "f127a297-842d-4d17-bc89-4704019458d7",
            "name": "step_0",
            "description": "",
            "metadata": {
              "__metadata_info__": {}
            },
            "inputs": [
              {
                "description": "iterated input for the map step",
                "type": "array",
                "items": {
                  "description": "\"message\" input variable for the template",
                  "title": "message"
                },
                "title": "$file_diff_list"
              }
            ],
            "outputs": [
              {
                "type": "array",
                "items": {},
                "title": "$nested_comment_list"
              },
              {
                "type": "array",
                "items": {
                  "type": "string"
                },
                "title": "$filepath_list"
              }
            ],
            "branches": [
              "next"
            ],
            "input_mapping": {
              "iterated_input": "$file_diff_list"
            },
            "output_mapping": {
              "$extracted_comments": "$nested_comment_list",
              "$filename": "$filepath_list"
            },
            "flow": {
              "component_type": "Flow",
              "id": "3da67cce-b8de-40be-bb8d-e1edead178f0",
              "name": "Generate review comments flow",
              "description": "",
              "metadata": {
                "__metadata_info__": {}
              },
              "inputs": [
                {
                  "description": "\"message\" input variable for the template",
                  "title": "message"
                }
              ],
              "outputs": [
                {
                  "description": "The extracted comments content and line number",
                  "type": "array",
                  "items": {
                    "type": "object",
                    "additionalProperties": {},
                    "key_type": {
                      "type": "string"
                    }
                  },
                  "title": "$extracted_comments"
                },
                {
                  "description": "the generated text",
                  "type": "string",
                  "title": "$json_comments"
                },
                {
                  "type": "string",
                  "title": "$diff_with_lines"
                },
                {
                  "description": "the first extracted value using the regex \"diff --git a/(.+?) b/\" from the raw input",
                  "type": "string",
                  "title": "$filename",
                  "default": ""
                },
                {
                  "description": "the message added to the messages list",
                  "type": "string",
                  "title": "$diff_to_string"
                }
              ],
              "start_node": {
                "$component_ref": "e20f5870-d594-4089-9fcd-08146232910d"
              },
              "nodes": [
                {
                  "$component_ref": "f0fb3ab4-a950-43b6-a583-6f0044f18c7f"
                },
                {
                  "$component_ref": "6000ee3f-ac80-4937-b36c-94fd65cdcda4"
                },
                {
                  "$component_ref": "6f6dc822-9352-47ae-9b48-173402a334fe"
                },
                {
                  "$component_ref": "0ce752d7-3ef1-481b-bb01-c7081ef86103"
                },
                {
                  "$component_ref": "48057b9c-bee7-4286-baf5-625b6f1a6f1a"
                },
                {
                  "$component_ref": "e20f5870-d594-4089-9fcd-08146232910d"
                },
                {
                  "$component_ref": "39f36227-8910-414c-8b6b-517c0d65b0d8"
                }
              ],
              "control_flow_connections": [
                {
                  "component_type": "ControlFlowEdge",
                  "id": "becf6951-96fd-4152-97d0-4a4eff042a29",
                  "name": "format_diff_to_string_to_add_lines_on_diff_control_flow_edge",
                  "description": null,
                  "metadata": {
                    "__metadata_info__": {}
                  },
                  "from_node": {
                    "$component_ref": "f0fb3ab4-a950-43b6-a583-6f0044f18c7f"
                  },
                  "from_branch": null,
                  "to_node": {
                    "$component_ref": "6000ee3f-ac80-4937-b36c-94fd65cdcda4"
                  }
                },
                {
                  "component_type": "ControlFlowEdge",
                  "id": "c197b0d5-8002-4910-ae8d-61f97f1f8f26",
                  "name": "add_lines_on_diff_to_extract_file_path_control_flow_edge",
                  "description": null,
                  "metadata": {
                    "__metadata_info__": {}
                  },
                  "from_node": {
                    "$component_ref": "6000ee3f-ac80-4937-b36c-94fd65cdcda4"
                  },
                  "from_branch": null,
                  "to_node": {
                    "$component_ref": "6f6dc822-9352-47ae-9b48-173402a334fe"
                  }
                },
                {
                  "component_type": "ControlFlowEdge",
                  "id": "406e0670-cc49-4da4-8d15-8c1c320193e8",
                  "name": "extract_file_path_to_generate_comments_control_flow_edge",
                  "description": null,
                  "metadata": {
                    "__metadata_info__": {}
                  },
                  "from_node": {
                    "$component_ref": "6f6dc822-9352-47ae-9b48-173402a334fe"
                  },
                  "from_branch": null,
                  "to_node": {
                    "$component_ref": "0ce752d7-3ef1-481b-bb01-c7081ef86103"
                  }
                },
                {
                  "component_type": "ControlFlowEdge",
                  "id": "e54eb347-2e6c-42c4-a7d6-a42c8059bdf3",
                  "name": "generate_comments_to_extract_comments_from_json_control_flow_edge",
                  "description": null,
                  "metadata": {
                    "__metadata_info__": {}
                  },
                  "from_node": {
                    "$component_ref": "0ce752d7-3ef1-481b-bb01-c7081ef86103"
                  },
                  "from_branch": null,
                  "to_node": {
                    "$component_ref": "48057b9c-bee7-4286-baf5-625b6f1a6f1a"
                  }
                },
                {
                  "component_type": "ControlFlowEdge",
                  "id": "ebe5e60b-2724-4b51-b287-79f3e8e7fdd1",
                  "name": "__StartStep___to_format_diff_to_string_control_flow_edge",
                  "description": null,
                  "metadata": {
                    "__metadata_info__": {}
                  },
                  "from_node": {
                    "$component_ref": "e20f5870-d594-4089-9fcd-08146232910d"
                  },
                  "from_branch": null,
                  "to_node": {
                    "$component_ref": "f0fb3ab4-a950-43b6-a583-6f0044f18c7f"
                  }
                },
                {
                  "component_type": "ControlFlowEdge",
                  "id": "98e7631e-7206-4ba9-b5b0-eb308ac89c0f",
                  "name": "extract_comments_from_json_to_None End node_control_flow_edge",
                  "description": null,
                  "metadata": {},
                  "from_node": {
                    "$component_ref": "48057b9c-bee7-4286-baf5-625b6f1a6f1a"
                  },
                  "from_branch": null,
                  "to_node": {
                    "$component_ref": "39f36227-8910-414c-8b6b-517c0d65b0d8"
                  }
                }
              ],
              "data_flow_connections": [
                {
                  "component_type": "DataFlowEdge",
                  "id": "ab8ed6de-3ea7-424e-a830-bca10ac57a32",
                  "name": "format_diff_to_string_$diff_to_string_to_add_lines_on_diff_$diff_to_string_data_flow_edge",
                  "description": null,
                  "metadata": {
                    "__metadata_info__": {}
                  },
                  "source_node": {
                    "$component_ref": "f0fb3ab4-a950-43b6-a583-6f0044f18c7f"
                  },
                  "source_output": "$diff_to_string",
                  "destination_node": {
                    "$component_ref": "6000ee3f-ac80-4937-b36c-94fd65cdcda4"
                  },
                  "destination_input": "$diff_to_string"
                },
                {
                  "component_type": "DataFlowEdge",
                  "id": "3caaa171-9b4b-44df-8ebd-4d060329f91a",
                  "name": "format_diff_to_string_$diff_to_string_to_extract_file_path_$diff_to_string_data_flow_edge",
                  "description": null,
                  "metadata": {
                    "__metadata_info__": {}
                  },
                  "source_node": {
                    "$component_ref": "f0fb3ab4-a950-43b6-a583-6f0044f18c7f"
                  },
                  "source_output": "$diff_to_string",
                  "destination_node": {
                    "$component_ref": "6f6dc822-9352-47ae-9b48-173402a334fe"
                  },
                  "destination_input": "$diff_to_string"
                },
                {
                  "component_type": "DataFlowEdge",
                  "id": "cdf0945b-5a96-42ff-b410-f7c56b5f8e45",
                  "name": "add_lines_on_diff_$diff_with_lines_to_generate_comments_$diff_with_lines_data_flow_edge",
                  "description": null,
                  "metadata": {
                    "__metadata_info__": {}
                  },
                  "source_node": {
                    "$component_ref": "6000ee3f-ac80-4937-b36c-94fd65cdcda4"
                  },
                  "source_output": "$diff_with_lines",
                  "destination_node": {
                    "$component_ref": "0ce752d7-3ef1-481b-bb01-c7081ef86103"
                  },
                  "destination_input": "$diff_with_lines"
                },
                {
                  "component_type": "DataFlowEdge",
                  "id": "ca6ed62b-6f6a-405f-9f16-5e1304de6608",
                  "name": "extract_file_path_$filename_to_generate_comments_$filename_data_flow_edge",
                  "description": null,
                  "metadata": {
                    "__metadata_info__": {}
                  },
                  "source_node": {
                    "$component_ref": "6f6dc822-9352-47ae-9b48-173402a334fe"
                  },
                  "source_output": "$filename",
                  "destination_node": {
                    "$component_ref": "0ce752d7-3ef1-481b-bb01-c7081ef86103"
                  },
                  "destination_input": "$filename"
                },
                {
                  "component_type": "DataFlowEdge",
                  "id": "dec4b4bb-56c9-445a-a282-9d095ff6038e",
                  "name": "generate_comments_$json_comments_to_extract_comments_from_json_$json_comments_data_flow_edge",
                  "description": null,
                  "metadata": {
                    "__metadata_info__": {}
                  },
                  "source_node": {
                    "$component_ref": "0ce752d7-3ef1-481b-bb01-c7081ef86103"
                  },
                  "source_output": "$json_comments",
                  "destination_node": {
                    "$component_ref": "48057b9c-bee7-4286-baf5-625b6f1a6f1a"
                  },
                  "destination_input": "$json_comments"
                },
                {
                  "component_type": "DataFlowEdge",
                  "id": "611478d7-281a-4587-81e6-97e8c745da53",
                  "name": "__StartStep___message_to_format_diff_to_string_message_data_flow_edge",
                  "description": null,
                  "metadata": {
                    "__metadata_info__": {}
                  },
                  "source_node": {
                    "$component_ref": "e20f5870-d594-4089-9fcd-08146232910d"
                  },
                  "source_output": "message",
                  "destination_node": {
                    "$component_ref": "f0fb3ab4-a950-43b6-a583-6f0044f18c7f"
                  },
                  "destination_input": "message"
                },
                {
                  "component_type": "DataFlowEdge",
                  "id": "227ae098-0baf-4fe8-9615-094bb386c9a9",
                  "name": "extract_comments_from_json_$extracted_comments_to_None End node_$extracted_comments_data_flow_edge",
                  "description": null,
                  "metadata": {},
                  "source_node": {
                    "$component_ref": "48057b9c-bee7-4286-baf5-625b6f1a6f1a"
                  },
                  "source_output": "$extracted_comments",
                  "destination_node": {
                    "$component_ref": "39f36227-8910-414c-8b6b-517c0d65b0d8"
                  },
                  "destination_input": "$extracted_comments"
                },
                {
                  "component_type": "DataFlowEdge",
                  "id": "6e25b4d8-5656-471b-8ffa-1fe8cfffbc05",
                  "name": "generate_comments_$json_comments_to_None End node_$json_comments_data_flow_edge",
                  "description": null,
                  "metadata": {},
                  "source_node": {
                    "$component_ref": "0ce752d7-3ef1-481b-bb01-c7081ef86103"
                  },
                  "source_output": "$json_comments",
                  "destination_node": {
                    "$component_ref": "39f36227-8910-414c-8b6b-517c0d65b0d8"
                  },
                  "destination_input": "$json_comments"
                },
                {
                  "component_type": "DataFlowEdge",
                  "id": "fdbf1eeb-0278-4dc8-b897-c924937a1692",
                  "name": "add_lines_on_diff_$diff_with_lines_to_None End node_$diff_with_lines_data_flow_edge",
                  "description": null,
                  "metadata": {},
                  "source_node": {
                    "$component_ref": "6000ee3f-ac80-4937-b36c-94fd65cdcda4"
                  },
                  "source_output": "$diff_with_lines",
                  "destination_node": {
                    "$component_ref": "39f36227-8910-414c-8b6b-517c0d65b0d8"
                  },
                  "destination_input": "$diff_with_lines"
                },
                {
                  "component_type": "DataFlowEdge",
                  "id": "3b6bcba7-635b-45fa-b450-cf0a15dae463",
                  "name": "extract_file_path_$filename_to_None End node_$filename_data_flow_edge",
                  "description": null,
                  "metadata": {},
                  "source_node": {
                    "$component_ref": "6f6dc822-9352-47ae-9b48-173402a334fe"
                  },
                  "source_output": "$filename",
                  "destination_node": {
                    "$component_ref": "39f36227-8910-414c-8b6b-517c0d65b0d8"
                  },
                  "destination_input": "$filename"
                },
                {
                  "component_type": "DataFlowEdge",
                  "id": "2f95704b-4cc1-4983-8a20-e39c79a94e01",
                  "name": "format_diff_to_string_$diff_to_string_to_None End node_$diff_to_string_data_flow_edge",
                  "description": null,
                  "metadata": {},
                  "source_node": {
                    "$component_ref": "f0fb3ab4-a950-43b6-a583-6f0044f18c7f"
                  },
                  "source_output": "$diff_to_string",
                  "destination_node": {
                    "$component_ref": "39f36227-8910-414c-8b6b-517c0d65b0d8"
                  },
                  "destination_input": "$diff_to_string"
                }
              ],
              "$referenced_components": {
                "6000ee3f-ac80-4937-b36c-94fd65cdcda4": {
                  "component_type": "ExtendedToolNode",
                  "id": "6000ee3f-ac80-4937-b36c-94fd65cdcda4",
                  "name": "add_lines_on_diff",
                  "description": "",
                  "metadata": {
                    "__metadata_info__": {}
                  },
                  "inputs": [
                    {
                      "type": "string",
                      "title": "$diff_to_string"
                    }
                  ],
                  "outputs": [
                    {
                      "type": "string",
                      "title": "$diff_with_lines"
                    }
                  ],
                  "branches": [
                    "next"
                  ],
                  "tool": {
                    "component_type": "ServerTool",
                    "id": "e936566f-7a25-40f3-9434-3e740a7bfb02",
                    "name": "format_git_diff",
                    "description": "Formats a git diff by adding line numbers to each line except removal lines.",
                    "metadata": {
                      "__metadata_info__": {}
                    },
                    "inputs": [
                      {
                        "type": "string",
                        "title": "diff_text"
                      }
                    ],
                    "outputs": [
                      {
                        "type": "string",
                        "title": "tool_output"
                      }
                    ]
                  },
                  "input_mapping": {
                    "diff_text": "$diff_to_string"
                  },
                  "output_mapping": {
                    "tool_output": "$diff_with_lines"
                  },
                  "raise_exceptions": false,
                  "component_plugin_name": "NodesPlugin",
                  "component_plugin_version": "25.4.0.dev0"
                },
                "f0fb3ab4-a950-43b6-a583-6f0044f18c7f": {
                  "component_type": "PluginOutputMessageNode",
                  "id": "f0fb3ab4-a950-43b6-a583-6f0044f18c7f",
                  "name": "format_diff_to_string",
                  "description": "",
                  "metadata": {
                    "__metadata_info__": {}
                  },
                  "inputs": [
                    {
                      "description": "\"message\" input variable for the template",
                      "title": "message"
                    }
                  ],
                  "outputs": [
                    {
                      "description": "the message added to the messages list",
                      "type": "string",
                      "title": "$diff_to_string"
                    }
                  ],
                  "branches": [
                    "next"
                  ],
                  "expose_message_as_output": true,
                  "message": "{{ message | string }}",
                  "input_mapping": {},
                  "output_mapping": {
                    "output_message": "$diff_to_string"
                  },
                  "message_type": "AGENT",
                  "rephrase": false,
                  "llm_config": null,
                  "component_plugin_name": "NodesPlugin",
                  "component_plugin_version": "25.4.0.dev0"
                },
                "6f6dc822-9352-47ae-9b48-173402a334fe": {
                  "component_type": "PluginRegexNode",
                  "id": "6f6dc822-9352-47ae-9b48-173402a334fe",
                  "name": "extract_file_path",
                  "description": "",
                  "metadata": {
                    "__metadata_info__": {}
                  },
                  "inputs": [
                    {
                      "description": "raw text to extract information from",
                      "type": "string",
                      "title": "$diff_to_string"
                    }
                  ],
                  "outputs": [
                    {
                      "description": "the first extracted value using the regex \"diff --git a/(.+?) b/\" from the raw input",
                      "type": "string",
                      "title": "$filename",
                      "default": ""
                    }
                  ],
                  "branches": [
                    "next"
                  ],
                  "input_mapping": {
                    "text": "$diff_to_string"
                  },
                  "output_mapping": {
                    "output": "$filename"
                  },
                  "regex_pattern": "diff --git a/(.+?) b/",
                  "return_first_match_only": true,
                  "component_plugin_name": "NodesPlugin",
                  "component_plugin_version": "25.4.0.dev0"
                },
                "0ce752d7-3ef1-481b-bb01-c7081ef86103": {
                  "component_type": "ExtendedLlmNode",
                  "id": "0ce752d7-3ef1-481b-bb01-c7081ef86103",
                  "name": "generate_comments",
                  "description": "",
                  "metadata": {
                    "__metadata_info__": {}
                  },
                  "inputs": [
                    {
                      "description": "\"filename\" input variable for the template",
                      "type": "string",
                      "title": "$filename"
                    },
                    {
                      "description": "\"diff\" input variable for the template",
                      "type": "string",
                      "title": "$diff_with_lines"
                    }
                  ],
                  "outputs": [
                    {
                      "description": "the generated text",
                      "type": "string",
                      "title": "$json_comments"
                    }
                  ],
                  "branches": [
                    "next"
                  ],
                  "llm_config": {
                    "component_type": "VllmConfig",
                    "id": "fb043839-1e69-404c-a178-d8c3de0bfe20",
                    "name": "LLAMA_MODEL_ID",
                    "description": null,
                    "metadata": {
                      "__metadata_info__": {}
                    },
                    "default_generation_parameters": null,
                    "url": "LLAMA_API_URL",
                    "model_id": "LLAMA_MODEL_ID"
                  },
                  "prompt_template": "You are a very experienced code reviewer. You are given a git diff on a file: {{ filename }}\n\n## Context\nThe git diff contains all changes of a single file. All lines are prepended with their number. Lines without line number where removed from the file.\nAfter the line number, a line that was changed has a \"+\" before the code. All lines without a \"+\" are just here for context, you will not comment on them.\n\n## Input\n### Code diff\n{{ diff }}\n\n## Task\nYour task is to review these changes, according to different rules. Only comment lines that were added, so the lines that have a + just after the line number.\nThe rules are the following:\n\n\nName: TODO_WITHOUT_TICKET\nDescription: TODO comments should reference a ticket number for tracking.\nExample code:\n```python\n# TODO: Add validation here\ndef process_user_input(data):\n    return data\n```\nExample comment:\n[BOT] TODO_WITHOUT_TICKET: TODO comment should reference a ticket number for tracking (e.g., \"TODO: Add validation here (TICKET-1234)\").\n\n\n---\n\n\nName: MUTABLE_DEFAULT_ARGUMENT\nDescription: Using mutable objects as default arguments can lead to unexpected behavior.\nExample code:\n```python\ndef add_item(item, items=[]):\n    items.append(item)\n    return items\n```\nExample comment:\n[BOT] MUTABLE_DEFAULT_ARGUMENT: Avoid using mutable default arguments. Use None and initialize in the function: `def add_item(item, items=None): items = items or []`\n\n\n---\n\n\nName: NON_DESCRIPTIVE_NAME\nDescription: Variable names should clearly indicate their purpose or content.\nExample code:\n```python\ndef process(lst):\n    res = []\n    for i in lst:\n        res.append(i * 2)\n    return res\n```\nExample comment:\n[BOT] NON_DESCRIPTIVE_NAME: Use more descriptive names: 'lst' could be 'numbers', 'res' could be 'doubled_numbers', 'i' could be 'number'\n\n\n### Reponse Format\nYou need to return a review as a json as follows:\n```json\n[\n    {\n        \"content\": \"the comment as a text\",\n        \"suggestion\": \"if the change you propose is a single line, then put here the single line rewritten that includes your proposal change. IMPORTANT: a single line, which will erase the current line. Put empty string if no suggestion of if the suggestion is more than a single line\",\n        \"line\": \"line number where the comment applies\"\n    },\n    \u2026\n]\n```\nPlease use triple backticks ``` to delimitate your JSON list of comments. Don't output more than 5 comments, only comment the most relevant sections.\nIf there are no comments and the code seems fine, just output an empty JSON list.",
                  "input_mapping": {
                    "diff": "$diff_with_lines",
                    "filename": "$filename"
                  },
                  "output_mapping": {
                    "output": "$json_comments"
                  },
                  "prompt_template_object": null,
                  "send_message": false,
                  "component_plugin_name": "NodesPlugin",
                  "component_plugin_version": "25.4.0.dev0"
                },
                "48057b9c-bee7-4286-baf5-625b6f1a6f1a": {
                  "component_type": "PluginExtractNode",
                  "id": "48057b9c-bee7-4286-baf5-625b6f1a6f1a",
                  "name": "extract_comments_from_json",
                  "description": "",
                  "metadata": {
                    "__metadata_info__": {}
                  },
                  "inputs": [
                    {
                      "description": "raw text to extract information from",
                      "type": "string",
                      "title": "$json_comments"
                    }
                  ],
                  "outputs": [
                    {
                      "description": "The extracted comments content and line number",
                      "type": "array",
                      "items": {
                        "type": "object",
                        "additionalProperties": {},
                        "key_type": {
                          "type": "string"
                        }
                      },
                      "title": "$extracted_comments"
                    }
                  ],
                  "branches": [
                    "next"
                  ],
                  "input_mapping": {
                    "text": "$json_comments"
                  },
                  "output_mapping": {
                    "values": "$extracted_comments"
                  },
                  "output_values": {
                    "values": "[.[] | {\"content\": .[\"content\"], \"line\": .[\"line\"]}]"
                  },
                  "component_plugin_name": "NodesPlugin",
                  "component_plugin_version": "25.4.0.dev0"
                },
                "e20f5870-d594-4089-9fcd-08146232910d": {
                  "component_type": "StartNode",
                  "id": "e20f5870-d594-4089-9fcd-08146232910d",
                  "name": "__StartStep__",
                  "description": "",
                  "metadata": {
                    "__metadata_info__": {}
                  },
                  "inputs": [
                    {
                      "description": "\"message\" input variable for the template",
                      "title": "message"
                    }
                  ],
                  "outputs": [
                    {
                      "description": "\"message\" input variable for the template",
                      "title": "message"
                    }
                  ],
                  "branches": [
                    "next"
                  ]
                },
                "39f36227-8910-414c-8b6b-517c0d65b0d8": {
                  "component_type": "EndNode",
                  "id": "39f36227-8910-414c-8b6b-517c0d65b0d8",
                  "name": "None End node",
                  "description": "End node representing all transitions to None in the WayFlow flow",
                  "metadata": {},
                  "inputs": [
                    {
                      "description": "The extracted comments content and line number",
                      "type": "array",
                      "items": {
                        "type": "object",
                        "additionalProperties": {},
                        "key_type": {
                          "type": "string"
                        }
                      },
                      "title": "$extracted_comments"
                    },
                    {
                      "description": "the generated text",
                      "type": "string",
                      "title": "$json_comments"
                    },
                    {
                      "type": "string",
                      "title": "$diff_with_lines"
                    },
                    {
                      "description": "the first extracted value using the regex \"diff --git a/(.+?) b/\" from the raw input",
                      "type": "string",
                      "title": "$filename",
                      "default": ""
                    },
                    {
                      "description": "the message added to the messages list",
                      "type": "string",
                      "title": "$diff_to_string"
                    }
                  ],
                  "outputs": [
                    {
                      "description": "The extracted comments content and line number",
                      "type": "array",
                      "items": {
                        "type": "object",
                        "additionalProperties": {},
                        "key_type": {
                          "type": "string"
                        }
                      },
                      "title": "$extracted_comments"
                    },
                    {
                      "description": "the generated text",
                      "type": "string",
                      "title": "$json_comments"
                    },
                    {
                      "type": "string",
                      "title": "$diff_with_lines"
                    },
                    {
                      "description": "the first extracted value using the regex \"diff --git a/(.+?) b/\" from the raw input",
                      "type": "string",
                      "title": "$filename",
                      "default": ""
                    },
                    {
                      "description": "the message added to the messages list",
                      "type": "string",
                      "title": "$diff_to_string"
                    }
                  ],
                  "branches": [],
                  "branch_name": "next"
                }
              }
            },
            "unpack_input": {
              "message": "."
            },
            "parallel_execution": false,
            "component_plugin_name": "NodesPlugin",
            "component_plugin_version": "25.4.0.dev0"
          },
          "367ae568-317d-42ec-ae70-4c41afe0dbd0": {
            "component_type": "StartNode",
            "id": "367ae568-317d-42ec-ae70-4c41afe0dbd0",
            "name": "__StartStep__",
            "description": "",
            "metadata": {
              "__metadata_info__": {}
            },
            "inputs": [
              {
                "description": "iterated input for the map step",
                "type": "array",
                "items": {
                  "description": "\"message\" input variable for the template",
                  "title": "message"
                },
                "title": "$file_diff_list"
              }
            ],
            "outputs": [
              {
                "description": "iterated input for the map step",
                "type": "array",
                "items": {
                  "description": "\"message\" input variable for the template",
                  "title": "message"
                },
                "title": "$file_diff_list"
              }
            ],
            "branches": [
              "next"
            ]
          },
          "6f62aecf-03a1-4e38-b551-8eef0efaf4bb": {
            "component_type": "EndNode",
            "id": "6f62aecf-03a1-4e38-b551-8eef0efaf4bb",
            "name": "None End node",
            "description": "End node representing all transitions to None in the WayFlow flow",
            "metadata": {},
            "inputs": [
              {
                "type": "array",
                "items": {
                  "type": "string"
                },
                "title": "$filepath_list"
              },
              {
                "type": "array",
                "items": {},
                "title": "$nested_comment_list"
              }
            ],
            "outputs": [
              {
                "type": "array",
                "items": {
                  "type": "string"
                },
                "title": "$filepath_list"
              },
              {
                "type": "array",
                "items": {},
                "title": "$nested_comment_list"
              }
            ],
            "branches": [],
            "branch_name": "next"
          }
        }
      }
    },
    "47e367be-4d74-49dc-ac3b-89bb97ffa7df": {
      "component_type": "FlowNode",
      "id": "47e367be-4d74-49dc-ac3b-89bb97ffa7df",
      "name": "retrieve_diff_flowstep",
      "description": "",
      "metadata": {
        "__metadata_info__": {}
      },
      "inputs": [
        {
          "type": "string",
          "title": "$repo_dirpath_io"
        }
      ],
      "outputs": [
        {
          "type": "string",
          "title": "$raw_pr_diff"
        },
        {
          "description": "the list of extracted value using the regex \"(diff --git[\\s\\S]*?)(?=diff --git|$)\" from the raw input",
          "type": "array",
          "items": {
            "type": "string"
          },
          "title": "$file_diff_list",
          "default": []
        }
      ],
      "branches": [
        "next"
      ],
      "subflow": {
        "component_type": "Flow",
        "id": "9e7aed22-876c-4c32-9d44-20ee7ceb3771",
        "name": "Retrieve PR diff flow",
        "description": "",
        "metadata": {
          "__metadata_info__": {}
        },
        "inputs": [
          {
            "type": "string",
            "title": "$repo_dirpath_io"
          }
        ],
        "outputs": [
          {
            "type": "string",
            "title": "$raw_pr_diff"
          },
          {
            "description": "the list of extracted value using the regex \"(diff --git[\\s\\S]*?)(?=diff --git|$)\" from the raw input",
            "type": "array",
            "items": {
              "type": "string"
            },
            "title": "$file_diff_list",
            "default": []
          }
        ],
        "start_node": {
          "$component_ref": "4fcb7ebe-325b-446d-a46b-59187c30e260"
        },
        "nodes": [
          {
            "$component_ref": "4fcb7ebe-325b-446d-a46b-59187c30e260"
          },
          {
            "$component_ref": "5c73da9c-6ba9-44ce-aab1-212a78d0a720"
          },
          {
            "$component_ref": "cf841053-2414-48b6-ba6d-0f0f5e11044c"
          },
          {
            "$component_ref": "dd0e56ab-1267-4345-9f59-ecc053baf2af"
          }
        ],
        "control_flow_connections": [
          {
            "component_type": "ControlFlowEdge",
            "id": "60dc14b8-d9b9-4aec-a958-9f3676848f48",
            "name": "start_step_to_get_pr_diff_control_flow_edge",
            "description": null,
            "metadata": {
              "__metadata_info__": {}
            },
            "from_node": {
              "$component_ref": "4fcb7ebe-325b-446d-a46b-59187c30e260"
            },
            "from_branch": null,
            "to_node": {
              "$component_ref": "5c73da9c-6ba9-44ce-aab1-212a78d0a720"
            }
          },
          {
            "component_type": "ControlFlowEdge",
            "id": "500f97de-78b1-42e0-944c-0375dfca734e",
            "name": "get_pr_diff_to_extract_into_list_of_file_diff_control_flow_edge",
            "description": null,
            "metadata": {
              "__metadata_info__": {}
            },
            "from_node": {
              "$component_ref": "5c73da9c-6ba9-44ce-aab1-212a78d0a720"
            },
            "from_branch": null,
            "to_node": {
              "$component_ref": "cf841053-2414-48b6-ba6d-0f0f5e11044c"
            }
          },
          {
            "component_type": "ControlFlowEdge",
            "id": "22d0cf0d-8edb-4b04-8f54-a234f5705360",
            "name": "extract_into_list_of_file_diff_to_None End node_control_flow_edge",
            "description": null,
            "metadata": {},
            "from_node": {
              "$component_ref": "cf841053-2414-48b6-ba6d-0f0f5e11044c"
            },
            "from_branch": null,
            "to_node": {
              "$component_ref": "dd0e56ab-1267-4345-9f59-ecc053baf2af"
            }
          }
        ],
        "data_flow_connections": [
          {
            "component_type": "DataFlowEdge",
            "id": "106e3740-de45-4472-8168-2873ae1dbc82",
            "name": "start_step_$repo_dirpath_io_to_get_pr_diff_$repo_dirpath_io_data_flow_edge",
            "description": null,
            "metadata": {
              "__metadata_info__": {}
            },
            "source_node": {
              "$component_ref": "4fcb7ebe-325b-446d-a46b-59187c30e260"
            },
            "source_output": "$repo_dirpath_io",
            "destination_node": {
              "$component_ref": "5c73da9c-6ba9-44ce-aab1-212a78d0a720"
            },
            "destination_input": "$repo_dirpath_io"
          },
          {
            "component_type": "DataFlowEdge",
            "id": "a32cbb1c-eafe-4138-80e2-2cf2e1248312",
            "name": "get_pr_diff_$raw_pr_diff_to_extract_into_list_of_file_diff_$raw_pr_diff_data_flow_edge",
            "description": null,
            "metadata": {
              "__metadata_info__": {}
            },
            "source_node": {
              "$component_ref": "5c73da9c-6ba9-44ce-aab1-212a78d0a720"
            },
            "source_output": "$raw_pr_diff",
            "destination_node": {
              "$component_ref": "cf841053-2414-48b6-ba6d-0f0f5e11044c"
            },
            "destination_input": "$raw_pr_diff"
          },
          {
            "component_type": "DataFlowEdge",
            "id": "3ef5dcf4-acdf-4962-8df6-07b53f249e18",
            "name": "get_pr_diff_$raw_pr_diff_to_None End node_$raw_pr_diff_data_flow_edge",
            "description": null,
            "metadata": {},
            "source_node": {
              "$component_ref": "5c73da9c-6ba9-44ce-aab1-212a78d0a720"
            },
            "source_output": "$raw_pr_diff",
            "destination_node": {
              "$component_ref": "dd0e56ab-1267-4345-9f59-ecc053baf2af"
            },
            "destination_input": "$raw_pr_diff"
          },
          {
            "component_type": "DataFlowEdge",
            "id": "08cbca39-e591-4cf4-9057-ae67938d9557",
            "name": "extract_into_list_of_file_diff_$file_diff_list_to_None End node_$file_diff_list_data_flow_edge",
            "description": null,
            "metadata": {},
            "source_node": {
              "$component_ref": "cf841053-2414-48b6-ba6d-0f0f5e11044c"
            },
            "source_output": "$file_diff_list",
            "destination_node": {
              "$component_ref": "dd0e56ab-1267-4345-9f59-ecc053baf2af"
            },
            "destination_input": "$file_diff_list"
          }
        ],
        "$referenced_components": {
          "5c73da9c-6ba9-44ce-aab1-212a78d0a720": {
            "component_type": "ExtendedToolNode",
            "id": "5c73da9c-6ba9-44ce-aab1-212a78d0a720",
            "name": "get_pr_diff",
            "description": "",
            "metadata": {
              "__metadata_info__": {}
            },
            "inputs": [
              {
                "type": "string",
                "title": "$repo_dirpath_io"
              }
            ],
            "outputs": [
              {
                "type": "string",
                "title": "$raw_pr_diff"
              }
            ],
            "branches": [
              "next"
            ],
            "tool": {
              "component_type": "ServerTool",
              "id": "275aaf19-cdd4-4ed7-a436-e53f922cd740",
              "name": "local_get_pr_diff_tool",
              "description": "Retrieves code diff with a git command given the\npath to the repository root folder.",
              "metadata": {
                "__metadata_info__": {}
              },
              "inputs": [
                {
                  "type": "string",
                  "title": "repo_dirpath"
                }
              ],
              "outputs": [
                {
                  "type": "string",
                  "title": "tool_output"
                }
              ]
            },
            "input_mapping": {
              "repo_dirpath": "$repo_dirpath_io"
            },
            "output_mapping": {
              "tool_output": "$raw_pr_diff"
            },
            "raise_exceptions": true,
            "component_plugin_name": "NodesPlugin",
            "component_plugin_version": "25.4.0.dev0"
          },
          "4fcb7ebe-325b-446d-a46b-59187c30e260": {
            "component_type": "StartNode",
            "id": "4fcb7ebe-325b-446d-a46b-59187c30e260",
            "name": "start_step",
            "description": "",
            "metadata": {
              "__metadata_info__": {}
            },
            "inputs": [
              {
                "type": "string",
                "title": "$repo_dirpath_io"
              }
            ],
            "outputs": [
              {
                "type": "string",
                "title": "$repo_dirpath_io"
              }
            ],
            "branches": [
              "next"
            ]
          },
          "cf841053-2414-48b6-ba6d-0f0f5e11044c": {
            "component_type": "PluginRegexNode",
            "id": "cf841053-2414-48b6-ba6d-0f0f5e11044c",
            "name": "extract_into_list_of_file_diff",
            "description": "",
            "metadata": {
              "__metadata_info__": {}
            },
            "inputs": [
              {
                "description": "raw text to extract information from",
                "type": "string",
                "title": "$raw_pr_diff"
              }
            ],
            "outputs": [
              {
                "description": "the list of extracted value using the regex \"(diff --git[\\s\\S]*?)(?=diff --git|$)\" from the raw input",
                "type": "array",
                "items": {
                  "type": "string"
                },
                "title": "$file_diff_list",
                "default": []
              }
            ],
            "branches": [
              "next"
            ],
            "input_mapping": {
              "text": "$raw_pr_diff"
            },
            "output_mapping": {
              "output": "$file_diff_list"
            },
            "regex_pattern": "(diff --git[\\s\\S]*?)(?=diff --git|$)",
            "return_first_match_only": false,
            "component_plugin_name": "NodesPlugin",
            "component_plugin_version": "25.4.0.dev0"
          },
          "dd0e56ab-1267-4345-9f59-ecc053baf2af": {
            "component_type": "EndNode",
            "id": "dd0e56ab-1267-4345-9f59-ecc053baf2af",
            "name": "None End node",
            "description": "End node representing all transitions to None in the WayFlow flow",
            "metadata": {},
            "inputs": [
              {
                "type": "string",
                "title": "$raw_pr_diff"
              },
              {
                "description": "the list of extracted value using the regex \"(diff --git[\\s\\S]*?)(?=diff --git|$)\" from the raw input",
                "type": "array",
                "items": {
                  "type": "string"
                },
                "title": "$file_diff_list",
                "default": []
              }
            ],
            "outputs": [
              {
                "type": "string",
                "title": "$raw_pr_diff"
              },
              {
                "description": "the list of extracted value using the regex \"(diff --git[\\s\\S]*?)(?=diff --git|$)\" from the raw input",
                "type": "array",
                "items": {
                  "type": "string"
                },
                "title": "$file_diff_list",
                "default": []
              }
            ],
            "branches": [],
            "branch_name": "next"
          }
        }
      }
    },
    "020c885e-6d0b-472a-bb91-246ab70ab1db": {
      "component_type": "StartNode",
      "id": "020c885e-6d0b-472a-bb91-246ab70ab1db",
      "name": "__StartStep__",
      "description": "",
      "metadata": {
        "__metadata_info__": {}
      },
      "inputs": [
        {
          "type": "string",
          "title": "$repo_dirpath_io"
        }
      ],
      "outputs": [
        {
          "type": "string",
          "title": "$repo_dirpath_io"
        }
      ],
      "branches": [
        "next"
      ]
    },
    "a544af64-e63b-4ccf-9ab0-8d25cdbc0b93": {
      "component_type": "EndNode",
      "id": "a544af64-e63b-4ccf-9ab0-8d25cdbc0b93",
      "name": "None End node",
      "description": "End node representing all transitions to None in the WayFlow flow",
      "metadata": {},
      "inputs": [
        {
          "type": "array",
          "items": {
            "type": "string"
          },
          "title": "$filepath_list"
        },
        {
          "type": "array",
          "items": {},
          "title": "$nested_comment_list"
        },
        {
          "type": "string",
          "title": "$raw_pr_diff"
        },
        {
          "description": "the list of extracted value using the regex \"(diff --git[\\s\\S]*?)(?=diff --git|$)\" from the raw input",
          "type": "array",
          "items": {
            "type": "string"
          },
          "title": "$file_diff_list",
          "default": []
        }
      ],
      "outputs": [
        {
          "type": "array",
          "items": {
            "type": "string"
          },
          "title": "$filepath_list"
        },
        {
          "type": "array",
          "items": {},
          "title": "$nested_comment_list"
        },
        {
          "type": "string",
          "title": "$raw_pr_diff"
        },
        {
          "description": "the list of extracted value using the regex \"(diff --git[\\s\\S]*?)(?=diff --git|$)\" from the raw input",
          "type": "array",
          "items": {
            "type": "string"
          },
          "title": "$file_diff_list",
          "default": []
        }
      ],
      "branches": [],
      "branch_name": "next"
    }
  },
  "agentspec_version": "25.4.1"
}

You can then load the configuration back to an assistant using the AgentSpecLoader.

from wayflowcore.agentspec import AgentSpecLoader

tool_registry = {
    "local_get_pr_diff_tool": local_get_pr_diff_tool,
    "format_git_diff": format_git_diff,
}

assistant = AgentSpecLoader(tool_registry=tool_registry).load_json(serialized_assistant)
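
The registry maps tool names to tool objects, presumably so that the loader can reattach the Python callables, which are not stored in the JSON. As a rough illustration of that idea only (this is not WayFlow's implementation; `resolve_tools` and the stand-in functions are hypothetical), a name-based lookup that fails early on a missing tool could look like this:

```python
from typing import Callable, Dict, List


def resolve_tools(
    required_names: List[str], registry: Dict[str, Callable]
) -> Dict[str, Callable]:
    """Return the registry entries for the required names, failing early if any are missing."""
    missing = [name for name in required_names if name not in registry]
    if missing:
        raise KeyError(f"Tools missing from the registry: {missing}")
    return {name: registry[name] for name in required_names}


# Stand-in callables (assumption); in the tutorial these would be the decorated tools.
def fake_get_pr_diff(repo_dirpath: str) -> str:
    return "diff --git a/x.py b/x.py"


def fake_format_git_diff(diff_text: str) -> str:
    return diff_text


registry = {
    "local_get_pr_diff_tool": fake_get_pr_diff,
    "format_git_diff": fake_format_git_diff,
}

resolved = resolve_tools(["local_get_pr_diff_tool", "format_git_diff"], registry)
print(sorted(resolved))
```

Checking the names before loading gives a clearer error than a failure deep inside deserialization would.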

Note

This guide uses the following extension/plugin Agent Spec components:

  • PluginOutputMessageNode

  • PluginExtractNode

  • PluginRegexNode

  • ExtendedLlmNode

  • ExtendedToolNode

  • ExtendedMapNode

See the list of available Agent Spec extension/plugin components in the API Reference.
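
To check which component types an exported configuration actually contains, one option is to walk the parsed JSON and count every `component_type` value. The sketch below uses only the standard library; `iter_component_types` is a hypothetical helper, and `sample_config` is a made-up, heavily shortened stand-in for `serialized_assistant` whose layout is an assumption (the walk itself does not depend on the layout).

```python
import json
from collections import Counter
from typing import Any, Iterator


def iter_component_types(node: Any) -> Iterator[str]:
    """Yield every "component_type" value found anywhere in a parsed JSON document."""
    if isinstance(node, dict):
        component_type = node.get("component_type")
        if isinstance(component_type, str):
            yield component_type
        for value in node.values():
            yield from iter_component_types(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_component_types(item)


# Hypothetical stand-in for `serialized_assistant` (assumed layout, shortened).
sample_config = json.dumps(
    {
        "component_type": "Flow",
        "nodes": [
            {"component_type": "StartNode", "name": "__StartStep__"},
            {"component_type": "PluginRegexNode", "name": "extract_into_list_of_file_diff"},
            {"component_type": "EndNode", "name": "None End node"},
            {"component_type": "EndNode", "name": "None End node"},
        ],
    }
)

counts = Counter(iter_component_types(json.loads(sample_config)))
print(counts)
```

Applied to the real export, the same walk would show at a glance which plugin or extended components the target runtime needs to support.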

Recap#

In this tutorial you learned how to build a simple PR bot using WayFlow Flows.

You also learned how to structure code when building an assistant as code, and how to execute and combine sub-flows to build a complex assistant.

This is an example of the kind of fully featured tool that you can build with WayFlow.

Next Steps#

Now that you have learned how to build a PR reviewing assistant, you may want to check out our other guides.

Full Code#

Click on the card at the top of this page to download the full code for this guide or copy the code below.

# Copyright © 2025 Oracle and/or its affiliates.
#
# This software is under the Universal Permissive License
# (UPL) 1.0 (LICENSE-UPL or https://oss.oracle.com/licenses/upl) or Apache License
# 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0), at your option.

# %%[markdown]
# Tutorial - Build a Simple Code Review Assistant
# -----------------------------------------------

# How to use:
# Create a new Python virtual environment and install the latest WayFlow version.
# ```bash
# python -m venv venv-wayflowcore
# source venv-wayflowcore/bin/activate
# pip install --upgrade pip
# pip install "wayflowcore==26.1"
# ```

# You can now run the script
# 1. As a Python file:
# ```bash
# python usecase_prbot.py
# ```
# 2. As a Notebook (in VSCode):
# When viewing the file,
#  - press the keys Ctrl + Enter to run the selected cell
#  - or Shift + Enter to run the selected cell and move to the cell below

# nosec


from types import MethodType
from typing import Dict, List


# %%[markdown]
## Define the LLM

# %%
from wayflowcore.models import VllmModel

llm = VllmModel(
    model_id="meta-llama/Meta-Llama-3.1-8B-Instruct",
    host_port="VLLM_HOST_PORT",
)

# %%[markdown]
## Define the tool that retrieves the PR diff

# %%
from wayflowcore.tools import tool


@tool(description_mode="only_docstring")
def local_get_pr_diff_tool(repo_dirpath: str) -> str:
    """
    Retrieves code diff with a git command given the
    path to the repository root folder.
    """
    import subprocess

    result = subprocess.run(
        ["git", "diff", "HEAD"],
        capture_output=True,
        cwd=repo_dirpath,
        text=True,
    )
    return result.stdout.strip()


# %%[markdown]
## Define a mocked PR diff

# %%
MOCK_DIFF = """
diff --git src://calculators/utils.py dst://calculators/utils.py
index 12345678..90123456 100644
--- src://calculators/utils.py
+++ dst://calculators/utils.py
@@ -10,6 +10,15 @@

 def calculate_total(data):
     # TODO: implement tax calculation
     return data

+def get_items(items=[]):
+    result = []
+    for item in items:
+        result.append(item * 2)
+    return result
+
+def process_numbers(numbers):
+    res = []
+    for x in numbers:
+        res.append(x + 1)
+    return res
+
 def calculate_average(numbers):
     return sum(numbers) / len(numbers)


diff --git src://example/utils.py dst://example/utils.py
index 000000000..123456789
--- /dev/null
+++ dst://example/utils.py
@@ -0,0 +1,20 @@
+# Copyright © 2024 Oracle and/or its affiliates.
+
+def calculate_sum(numbers=[]):
+    total = 0
+    for num in numbers:
+        total += num
+    return total
+
+
+def process_data(data):
+    # TODO: Handle exceptions here
+    result = data * 2
+    return result
+
+
+def main():
+    numbers = [1, 2, 3, 4, 5]
+    result = calculate_sum(numbers)
+    print("Sum:", result)
+    data = 10
+    processed_data = process_data(data)
+    print("Processed Data:", processed_data)
+
+
+if __name__ == "__main__":
+    main()
""".strip()



# %%[markdown]
## Create the flow that retrieves the diff of a PR

# %%
from wayflowcore.controlconnection import ControlFlowEdge
from wayflowcore.dataconnection import DataFlowEdge
from wayflowcore.flow import Flow
from wayflowcore.property import StringProperty
from wayflowcore.steps import RegexExtractionStep, StartStep, ToolExecutionStep

# IO Variable Names
REPO_DIRPATH_IO = "$repo_dirpath_io"
PR_DIFF_IO = "$raw_pr_diff"
FILE_DIFF_LIST_IO = "$file_diff_list"

# Define the steps

start_step = StartStep(name="start_step", input_descriptors=[StringProperty(name=REPO_DIRPATH_IO)])

# Step 1: Retrieve the pull request diff using the local tool
get_pr_diff_step = ToolExecutionStep(
    name="get_pr_diff",
    tool=local_get_pr_diff_tool,
    raise_exceptions=True,
    input_mapping={"repo_dirpath": REPO_DIRPATH_IO},
    output_mapping={ToolExecutionStep.TOOL_OUTPUT: PR_DIFF_IO},
)

# Step 2: Extract the file diffs from the raw diff using a regular expression
extract_into_list_of_file_diff_step = RegexExtractionStep(
    name="extract_into_list_of_file_diff",
    regex_pattern=r"(diff --git[\s\S]*?)(?=diff --git|$)",
    return_first_match_only=False,
    input_mapping={RegexExtractionStep.TEXT: PR_DIFF_IO},
    output_mapping={RegexExtractionStep.OUTPUT: FILE_DIFF_LIST_IO},
)

# Define the sub flow
retrieve_diff_subflow = Flow(
    name="Retrieve PR diff flow",
    begin_step=start_step,
    control_flow_edges=[
        ControlFlowEdge(source_step=start_step, destination_step=get_pr_diff_step),
        ControlFlowEdge(
            source_step=get_pr_diff_step, destination_step=extract_into_list_of_file_diff_step
        ),
        ControlFlowEdge(source_step=extract_into_list_of_file_diff_step, destination_step=None),
    ],
    data_flow_edges=[
        DataFlowEdge(
            source_step=start_step,
            source_output=REPO_DIRPATH_IO,
            destination_step=get_pr_diff_step,
            destination_input=REPO_DIRPATH_IO,
        ),
        DataFlowEdge(
            source_step=get_pr_diff_step,
            source_output=PR_DIFF_IO,
            destination_step=extract_into_list_of_file_diff_step,
            destination_input=PR_DIFF_IO,
        ),
    ],
)


# %%[markdown]
## Alternative step that retrieves the PR diff through an API call

# %%
from wayflowcore.steps import ApiCallStep

# IO Variable Names
USER_PROVIDED_TOKEN_IO = "$user_provided_token"
REPO_WORKSPACE_IO = "$repo_workspace"
REPO_SLUG_IO = "$repo_slug"
PULL_REQUEST_ID_IO = "$pull_request_id"
PR_DIFF_IO = "$raw_pr_diff"

# Note: this rebinds the name `get_pr_diff_step`. The `retrieve_diff_subflow`
# defined above keeps using the local tool step it was built with.
get_pr_diff_step = ApiCallStep(
    url="https://example.com/projects/{{workspace}}/repos/{{repo_slug}}/pull-requests/{{pr_id}}.diff",
    method="GET",
    headers={"Authorization": "Bearer {{token}}"},
    ignore_bad_http_requests=False,
    num_retry_on_bad_http_request=3,
    store_response=True,
    input_mapping={
        "token": USER_PROVIDED_TOKEN_IO,
        "workspace": REPO_WORKSPACE_IO,
        "repo_slug": REPO_SLUG_IO,
        "pr_id": PULL_REQUEST_ID_IO,
    },
    output_mapping={ApiCallStep.HTTP_RESPONSE: PR_DIFF_IO},
)


# %%[markdown]
## Test the flow that retrieves the PR diff

# %%
from wayflowcore.executors.executionstatus import FinishedStatus

# Replace the path below with the path to your actual codebase sample git repository.
PATH_TO_DIR = "path/to/repository_root"

test_conversation = retrieve_diff_subflow.start_conversation(
    inputs={
        REPO_DIRPATH_IO: PATH_TO_DIR,
    }
)

execution_status = test_conversation.execute()

if not isinstance(execution_status, FinishedStatus):
    raise ValueError("Unexpected status type")

FILE_DIFF_LIST = execution_status.output_values[FILE_DIFF_LIST_IO]

print(FILE_DIFF_LIST[0])


# %%[markdown]
## Define the tool that formats the diff for the LLM

# %%
PR_BOT_CHECKS = [
    """
Name: TODO_WITHOUT_TICKET
Description: TODO comments should reference a ticket number for tracking.
Example code:
```python
# TODO: Add validation here
def process_user_input(data):
    return data
```
Example comment:
[BOT] TODO_WITHOUT_TICKET: TODO comment should reference a ticket number for tracking (e.g., "TODO: Add validation here (TICKET-1234)").
""",
    """
Name: MUTABLE_DEFAULT_ARGUMENT
Description: Using mutable objects as default arguments can lead to unexpected behavior.
Example code:
```python
def add_item(item, items=[]):
    items.append(item)
    return items
```
Example comment:
[BOT] MUTABLE_DEFAULT_ARGUMENT: Avoid using mutable default arguments. Use None and initialize in the function: `def add_item(item, items=None): items = items or []`
""",
    """
Name: NON_DESCRIPTIVE_NAME
Description: Variable names should clearly indicate their purpose or content.
Example code:
```python
def process(lst):
    res = []
    for i in lst:
        res.append(i * 2)
    return res
```
Example comment:
[BOT] NON_DESCRIPTIVE_NAME: Use more descriptive names: 'lst' could be 'numbers', 'res' could be 'doubled_numbers', 'i' could be 'number'
""",
]

CONCATENATED_CHECKS = "\n\n---\n\n".join(PR_BOT_CHECKS)

PROMPT_TEMPLATE = """You are a very experienced code reviewer. You are given a git diff on a file: {{filename}}

## Context
The git diff contains all changes of a single file. All lines are prefixed with their line number. Lines without a line number were removed from the file.
After the line number, a line that was changed has a "+" before the code. All lines without a "+" are just here for context; you will not comment on them.

## Input
### Code diff
{{diff}}

## Task
Your task is to review these changes, according to different rules. Only comment on lines that were added, i.e. the lines that have a + just after the line number.
The rules are the following:

{{checks}}

### Response Format
You need to return a review as JSON, as follows:
```json
[
    {
        "content": "the comment as a text",
        "suggestion": "if the change you propose is a single line, then put here the single line rewritten that includes your proposal change. IMPORTANT: a single line, which will erase the current line. Put an empty string if there is no suggestion or if the suggestion is more than a single line",
        "line": "line number where the comment applies"
    },

]
```
Please use triple backticks ``` to delimit your JSON list of comments. Don't output more than 5 comments, only comment on the most relevant sections.
If there are no comments and the code seems fine, just output an empty JSON list."""


@tool(description_mode="only_docstring")
def format_git_diff(diff_text: str) -> str:
    """
    Formats a git diff by adding line numbers to each line except removal lines.
    """
    import re

    def pad_number(number: int, width: int) -> str:
        """Right-align a number with specified width using space padding."""
        return str(number).rjust(width)

    LINE_NUMBER_WIDTH = 5
    PADDING_WIDTH = LINE_NUMBER_WIDTH + 1
    # Matches hunk headers such as "@@ -1,7 +1,6 @@" and "@@ -1 +1 @@"
    # (git omits the line count when it is 1).
    HUNK_HEADER_PATTERN = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")
    current_line_number = 0
    formatted_lines = []

    for line in diff_text.split("\n"):
        # Handle diff header lines (e.g., "@@ -1,7 +1,6 @@")
        if line.startswith("@@"):
            header_match = HUNK_HEADER_PATTERN.match(line)
            if header_match is None:
                raise ValueError(f"Invalid diff header format: {line}")

            # The captured group is the starting line number in the new file
            current_line_number = int(header_match.group(1))
            formatted_lines.append(line)
            continue

        # Handle content lines
        if current_line_number > 0 and line:
            if not line.startswith("-"):
                # Add line number for added/context lines
                line_prefix = pad_number(current_line_number, LINE_NUMBER_WIDTH)
                formatted_lines.append(f"{line_prefix} {line}")
                current_line_number += 1
            else:
                # Just add padding for removal lines
                formatted_lines.append(" " * PADDING_WIDTH + line)

    return "\n".join(formatted_lines)


# %%[markdown]
## Create the flow that generates review comments

# %%
from wayflowcore._utils._templating_helpers import render_template_partially
from wayflowcore.property import AnyProperty, DictProperty, ListProperty, StringProperty
from wayflowcore.steps import (
    ExtractValueFromJsonStep,
    MapStep,
    OutputMessageStep,
    PromptExecutionStep,
    ToolExecutionStep,
)

# IO Variable Names
DIFF_TO_STRING_IO = "$diff_to_string"
DIFF_WITH_LINES_IO = "$diff_with_lines"
FILEPATH_IO = "$filename"
JSON_COMMENTS_IO = "$json_comments"
EXTRACTED_COMMENTS_IO = "$extracted_comments"
NESTED_COMMENT_LIST_IO = "$nested_comment_list"
FILEPATH_LIST_IO = "$filepath_list"

# Define the steps

# Step 1: Format the diff to a string
format_diff_to_string_step = OutputMessageStep(
    name="format_diff_to_string",
    message_template="{{ message | string }}",
    output_mapping={OutputMessageStep.OUTPUT: DIFF_TO_STRING_IO},
)

# Step 2: Add lines on the diff using a tool
add_lines_on_diff_step = ToolExecutionStep(
    name="add_lines_on_diff",
    tool=format_git_diff,
    input_mapping={"diff_text": DIFF_TO_STRING_IO},
    output_mapping={ToolExecutionStep.TOOL_OUTPUT: DIFF_WITH_LINES_IO},
)

# Step 3: Extract the file path from the diff string using a regular expression
extract_file_path_step = RegexExtractionStep(
    name="extract_file_path",
    regex_pattern=r"diff --git a/(.+?) b/",
    return_first_match_only=True,
    input_mapping={RegexExtractionStep.TEXT: DIFF_TO_STRING_IO},
    output_mapping={RegexExtractionStep.OUTPUT: FILEPATH_IO},
)

# Step 4: Generate comments using a prompt
generate_comments_step = PromptExecutionStep(
    name="generate_comments",
    prompt_template=render_template_partially(PROMPT_TEMPLATE, {"checks": CONCATENATED_CHECKS}),
    llm=llm,
    input_mapping={"diff": DIFF_WITH_LINES_IO, "filename": FILEPATH_IO},
    output_mapping={PromptExecutionStep.OUTPUT: JSON_COMMENTS_IO},
)

# Step 5: Extract comments from the JSON output
# Define the value type for extracted comments
comments_valuetype = ListProperty(
    name="values",
    description="The extracted comments content and line number",
    item_type=DictProperty(value_type=AnyProperty()),
)
extract_comments_from_json_step = ExtractValueFromJsonStep(
    name="extract_comments_from_json",
    output_values={comments_valuetype: '[.[] | {"content": .["content"], "line": .["line"]}]'},
    retry=True,
    llm=llm,
    input_mapping={ExtractValueFromJsonStep.TEXT: JSON_COMMENTS_IO},
    output_mapping={"values": EXTRACTED_COMMENTS_IO},
)

# Define the sub flow to generate comments for each file diff
generate_comments_subflow = Flow(
    name="Generate review comments flow",
    begin_step=format_diff_to_string_step,
    control_flow_edges=[
        ControlFlowEdge(format_diff_to_string_step, add_lines_on_diff_step),
        ControlFlowEdge(add_lines_on_diff_step, extract_file_path_step),
        ControlFlowEdge(extract_file_path_step, generate_comments_step),
        ControlFlowEdge(generate_comments_step, extract_comments_from_json_step),
        ControlFlowEdge(extract_comments_from_json_step, None),
    ],
    data_flow_edges=[
        DataFlowEdge(
            format_diff_to_string_step, DIFF_TO_STRING_IO, add_lines_on_diff_step, DIFF_TO_STRING_IO
        ),
        DataFlowEdge(
            format_diff_to_string_step, DIFF_TO_STRING_IO, extract_file_path_step, DIFF_TO_STRING_IO
        ),
        DataFlowEdge(
            add_lines_on_diff_step, DIFF_WITH_LINES_IO, generate_comments_step, DIFF_WITH_LINES_IO
        ),
        DataFlowEdge(extract_file_path_step, FILEPATH_IO, generate_comments_step, FILEPATH_IO),
        DataFlowEdge(
            generate_comments_step,
            JSON_COMMENTS_IO,
            extract_comments_from_json_step,
            JSON_COMMENTS_IO,
        ),
    ],
)

# Use the MapStep to apply the sub flow to each file
for_each_file_step = MapStep(
    flow=generate_comments_subflow,
    unpack_input={"message": "."},
    input_mapping={MapStep.ITERATED_INPUT: FILE_DIFF_LIST_IO},
    output_descriptors=[
        ListProperty(name=NESTED_COMMENT_LIST_IO, item_type=AnyProperty()),
        ListProperty(name=FILEPATH_LIST_IO, item_type=StringProperty()),
    ],
    output_mapping={EXTRACTED_COMMENTS_IO: NESTED_COMMENT_LIST_IO, FILEPATH_IO: FILEPATH_LIST_IO},
)

generate_all_comments_subflow = Flow.from_steps([for_each_file_step])


# %%[markdown]
## Test the flow that generates review comments

# %%
# we reuse the FILE_DIFF_LIST from the previous test
test_conversation = generate_all_comments_subflow.start_conversation(
    inputs={
        FILE_DIFF_LIST_IO: FILE_DIFF_LIST,
    }
)

execution_status = test_conversation.execute()

if not isinstance(execution_status, FinishedStatus):
    raise ValueError("Unexpected status type")

NESTED_COMMENT_LIST = execution_status.output_values[NESTED_COMMENT_LIST_IO]
FILEPATH_LIST = execution_status.output_values[FILEPATH_LIST_IO]
print(NESTED_COMMENT_LIST[0])
print(FILEPATH_LIST)



# %%[markdown]
## Create the tool that formats the review comments

# %%
@tool(description_mode="only_docstring")
def flatten_information(
    nested_comments_list: List[List[Dict[str, str]]], filepath_list: List[str]
) -> List[Dict[str, str]]:
    """Flattens information from comments and filepaths."""
    if len(nested_comments_list) != len(filepath_list):
        raise ValueError(
            f"Inconsistent list lengths ({len(nested_comments_list)=} and {len(filepath_list)=})"
        )

    result: List[Dict[str, str]] = []
    for comments_list, filepath in zip(nested_comments_list, filepath_list):
        for comment_dict in comments_list:
            # Copy each comment (values cast to str) and attach the path of its file
            result.append(
                {
                    **{key: str(value) for key, value in comment_dict.items()},
                    "path": filepath,
                }
            )

    return result


# %%[markdown]
## Create the flow that posts review comments to Bitbucket

# %%
import json

# IO Values
PR_POST_URL_IO = "$pr_post_url"
FLATTENED_COMMENT_LIST_IO = "$flattened_comment_list"
FINAL_HTTP_CODES_IO = "$http_codes"

# Define the steps

# Step 1: Flatten the generated comments into a list of comments
flatten_nested_comments_list_step = ToolExecutionStep(
    name="flatten_nested_comment_list",
    tool=flatten_information,
    input_mapping={
        "nested_comments_list": NESTED_COMMENT_LIST_IO,
        "filepath_list": FILEPATH_LIST_IO,
    },
    output_mapping={ToolExecutionStep.TOOL_OUTPUT: FLATTENED_COMMENT_LIST_IO},
)

# Step 2: Post the comments to Bitbucket
post_comment_step = ApiCallStep(
    url="https://example.com/rest/api/latest/projects/{{workspace}}/repos/{{repo_slug}}/pull-requests/{{pr_id}}/comments?diffType=EFFECTIVE&markup=true&avatarSize=48",
    method="POST",
    json_body=json.dumps(
        {
            "text": "{{content}}",
            "severity": "NORMAL",
            "anchor": {
                "diffType": "EFFECTIVE",
                "path": "{{path}}",
                "lineType": "ADDED",
                "line": "{{line | int}}",
                "fileType": "TO",
            },
        }
    ),
    headers={"Accept": "application/json", "Authorization": "Bearer {{token}}"},
    ignore_bad_http_requests=False,
    num_retry_on_bad_http_request=3,
    store_response=True,
    input_mapping={
        "token": USER_PROVIDED_TOKEN_IO,
        "workspace": REPO_WORKSPACE_IO,
        "repo_slug": REPO_SLUG_IO,
        "pr_id": PULL_REQUEST_ID_IO,
    },
)

post_comments_mapstep = MapStep(
    name="post_comment",
    flow=Flow.from_steps([post_comment_step]),
    unpack_input={"content": ".content", "line": ".line", "path": ".path"},
    input_mapping={MapStep.ITERATED_INPUT: FLATTENED_COMMENT_LIST_IO},
    output_descriptors=[ApiCallStep.HTTP_STATUS_CODE],
    output_mapping={ApiCallStep.HTTP_STATUS_CODE: FINAL_HTTP_CODES_IO},
)

post_comments_subflow = Flow(
    name="Post comments to PR flow",
    begin_step=flatten_nested_comments_list_step,
    control_flow_edges=[
        ControlFlowEdge(flatten_nested_comments_list_step, post_comments_mapstep),
        ControlFlowEdge(post_comments_mapstep, None),
    ],
    data_flow_edges=[
        DataFlowEdge(
            flatten_nested_comments_list_step,
            FLATTENED_COMMENT_LIST_IO,
            post_comments_mapstep,
            FLATTENED_COMMENT_LIST_IO,
        )
    ],
)


# Mock the API call so that the flow can be tested without a real server
from wayflowcore.steps.step import StepResult


async def _mock_api_post_step_invoke(self, inputs, conversation):
    output_values = {ApiCallStep.HTTP_RESPONSE: MOCK_DIFF, ApiCallStep.HTTP_STATUS_CODE: 200}
    return StepResult(
        outputs=output_values,
    )


post_comment_step.invoke_async = MethodType(_mock_api_post_step_invoke, post_comment_step)


# %%[markdown]
## Test the flow that posts review comments

# %%
# we reuse the NESTED_COMMENT_LIST and FILEPATH_LIST from the previous test

test_conversation = post_comments_subflow.start_conversation(
    inputs={
        USER_PROVIDED_TOKEN_IO: "MY_TOKEN",
        REPO_WORKSPACE_IO: "MY_REPO_WORKSPACE",
        REPO_SLUG_IO: "MY_REPO_SLUG",
        PULL_REQUEST_ID_IO: "MY_REPO_ID",
        NESTED_COMMENT_LIST_IO: NESTED_COMMENT_LIST,
        FILEPATH_LIST_IO: FILEPATH_LIST,
    }
)
execution_status = test_conversation.execute()

if not isinstance(execution_status, FinishedStatus):
    raise ValueError("Unexpected status type")

FINAL_HTTP_CODES = execution_status.output_values[FINAL_HTTP_CODES_IO]
print(FINAL_HTTP_CODES)


# %%[markdown]
## Create the flow that performs the review

# %%
from wayflowcore.steps import FlowExecutionStep


# Steps
retrieve_diff_flowstep = FlowExecutionStep(name="retrieve_diff_flowstep", flow=retrieve_diff_subflow)
generate_all_comments_flowstep = FlowExecutionStep(
    name="generate_comments_flowstep",
    flow=generate_all_comments_subflow,
)

pr_bot = Flow(
    name="PR bot flow",
    begin_step=retrieve_diff_flowstep,
    control_flow_edges=[
        ControlFlowEdge(retrieve_diff_flowstep, generate_all_comments_flowstep),
        ControlFlowEdge(generate_all_comments_flowstep, None),
    ],
    data_flow_edges=[
        DataFlowEdge(
            retrieve_diff_flowstep,
            FILE_DIFF_LIST_IO,
            generate_all_comments_flowstep,
            FILE_DIFF_LIST_IO,
        )
    ],
)


# %%[markdown]
## Test the flow that performs the review

# %%
# Replace the path below with the path to your actual codebase sample git repository.
PATH_TO_DIR = "path/to/repository_root"

conversation = pr_bot.start_conversation(inputs={REPO_DIRPATH_IO: PATH_TO_DIR})

execution_status = conversation.execute()

if not isinstance(execution_status, FinishedStatus):
    raise ValueError("Unexpected status type")

print(execution_status.output_values)

NESTED_COMMENT_LIST = execution_status.output_values[NESTED_COMMENT_LIST_IO]


# %%[markdown]
## Export config to Agent Spec

# %%
from wayflowcore.agentspec import AgentSpecExporter

serialized_assistant = AgentSpecExporter().to_json(pr_bot)


# %%[markdown]
## Load Agent Spec config

# %%
from wayflowcore.agentspec import AgentSpecLoader

tool_registry = {
    "local_get_pr_diff_tool": local_get_pr_diff_tool,
    "format_git_diff": format_git_diff,
}

assistant = AgentSpecLoader(tool_registry=tool_registry).load_json(serialized_assistant)
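
To see what the two regular expressions in the script do without running a flow, the sketch below applies them directly with Python's `re` module to a small hand-written diff. The sample diff and file names are made up, and the internals of `RegexExtractionStep` may differ from a plain `re.findall` call. The file path pattern assumes the default `a/` and `b/` prefixes of `git diff`; a diff that uses other prefixes, such as the `src://` and `dst://` prefixes in `MOCK_DIFF`, does not match it.

```python
import re

# Hand-written sample diff (assumption) with git's default a/ and b/ prefixes.
SAMPLE_DIFF = """diff --git a/pkg/one.py b/pkg/one.py
index 1111111..2222222 100644
--- a/pkg/one.py
+++ b/pkg/one.py
@@ -1,2 +1,3 @@
 import os
+import sys
 print(os.name)
diff --git a/pkg/two.py b/pkg/two.py
index 3333333..4444444 100644
--- a/pkg/two.py
+++ b/pkg/two.py
@@ -5 +5 @@
-x = 1
+x = 2
"""

# Same patterns as in the RegexExtractionStep definitions of the script.
FILE_DIFF_PATTERN = r"(diff --git[\s\S]*?)(?=diff --git|$)"
FILE_PATH_PATTERN = r"diff --git a/(.+?) b/"

# Split the raw diff into one chunk per file.
file_diffs = re.findall(FILE_DIFF_PATTERN, SAMPLE_DIFF)

# Extract the file path from the header of each chunk.
file_paths = [re.search(FILE_PATH_PATTERN, file_diff).group(1) for file_diff in file_diffs]

# A header with non-default prefixes is not matched by the path pattern.
mock_style_header = "diff --git src://calculators/utils.py dst://calculators/utils.py"
mock_match = re.search(FILE_PATH_PATTERN, mock_style_header)

print(len(file_diffs), file_paths, mock_match)
```

If diffs with other prefixes need to be reviewed, the path pattern would have to be adapted accordingly.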