Build a Simple Code Review Assistant
Prerequisites
This guide does not assume any prior knowledge about Project WayFlow. However, it assumes that the reader has a basic understanding of LLMs.
You will need a working installation of WayFlow - see Installation.
Learning goals
In this use-case tutorial, you will build a more advanced WayFlow application, a Pull Request (PR) Reviewing Assistant, using a WayFlow Flow to automate basic reviews of Python source code.
In this tutorial you will:
Learn the basics of using Flows to build an assistant.
Learn how to compose multiple sub-flows to create a more complex Flow.
Learn more about building Tools that can be used within your Flows.
To follow along, you can download a Jupyter Notebook for this use case from Code PR Review Bot Tutorial.
Introduction to the task
Code reviews are crucial for maintaining code quality, but reviewers often spend considerable time pointing out routine issues such as leftover debug statements, formatting inconsistencies, or violations of common coding conventions that static code analysis tools may not fully capture. This time could be better spent reviewing more important aspects such as the core logic, architecture, and business requirements.
Note
Building an agent with WayFlow to perform such code reviews has a number of advantages:
Review rules can be written using natural language, making an agent much more flexible than a simple static checker.
Writing rules in natural language makes updating the rules very easy.
More general issues can be captured: the LLM can generalize from a rule to related cases that a simple static checker would miss.
New review rules can be generated from the collected comments of existing PRs.
In this tutorial, you will create a WayFlow Flow assistant designed to scan Python pull requests for common oversights such as:
Having TODO comments without associated tickets.
Using unclear or ambiguous variable naming.
Using risky Python code practices such as mutable defaults.
To build this assistant you will break the task into configuration and two sub-flows that will be composed into a single flow:
Configure your application, choose an LLM and import required modules [Part 1].
The first sub-flow retrieves and diffs information from a local codebase in a Git repository [Part 2].
The second sub-flow iterates over the file diffs using a MapStep and generates comments with an LLM using the PromptExecutionStep [Part 3].
You will also learn how to extract information using the RegexExtractionStep and the ExtractValueFromJsonStep, and how to build and execute tools with the ServerTool and the ToolExecutionStep.
Note
This is not a production-ready code review assistant that can be used as-is.
Setup
First, set up the environment. For this tutorial you need to have wayflowcore installed (for additional information, read the installation guide).
Next, download the example codebase Git repository. It will be used to generate the sample code diffs for the assistant to review.
Extract the codebase Git repository folder from the compressed archive and make a note of where it is extracted to.
Part 1: Imports and LLM configuration
WayFlow supports several LLM API providers. To learn more about the supported LLM providers, read the guide How to use LLMs from different providers.
First choose an LLM from one of the options below:
# Use only ONE of the following options; each one assigns the `llm` variable.

# Option 1: OCI Generative AI
from wayflowcore.models import OCIGenAIModel, OCIClientConfigWithApiKey

llm = OCIGenAIModel(
    model_id="provider.model-id",
    compartment_id="compartment-id",
    client_config=OCIClientConfigWithApiKey(
        service_endpoint="https://url-to-service-endpoint.com",
    ),
)

# Option 2: vLLM
from wayflowcore.models import VllmModel

llm = VllmModel(
    model_id="model-id",
    host_port="VLLM_HOST_PORT",
)

# Option 3: Ollama
from wayflowcore.models import OllamaModel

llm = OllamaModel(
    model_id="model-id",
)
Note
API keys should never be stored in code. Use environment variables and/or tools such as python-dotenv instead.
Be cautious when using external LLM providers and ensure that you comply with your organization’s security policies and any applicable laws and regulations. Consider using a self-hosted LLM solution or a provider that offers on-premises deployment options if you need to maintain strict control over your code and data.
Part 2: Retrieve the PR diff information
The first phase of the assistant requires retrieving information about the code diffs from a code repository. You have already extracted the sample codebase Git repository to your local environment.
This will be a sub-flow that consists of two simple steps:
ToolExecutionStep that collects PR diff information using a Python subprocess to run the Git command.
RegexExtractionStep which separates the raw diff information into diffs for each file.
First, take a look at what a diff looks like. The following example shows how a real diff appears when using Git:
MOCK_DIFF = """
diff --git src://calculators/utils.py dst://calculators/utils.py
index 12345678..90123456 100644
--- src://calculators/utils.py
+++ dst://calculators/utils.py
@@ -10,6 +10,15 @@
 def calculate_total(data):
     # TODO: implement tax calculation
     return data
+def get_items(items=[]):
+    result = []
+    for item in items:
+        result.append(item * 2)
+    return result
+
+def process_numbers(numbers):
+    res = []
+    for x in numbers:
+        res.append(x + 1)
+    return res
+
 def calculate_average(numbers):
     return sum(numbers) / len(numbers)
diff --git src://example/utils.py dst://example/utils.py
index 000000000..123456789
--- /dev/null
+++ dst://example/utils.py
@@ -0,0 +1,20 @@
+# Copyright © 2024 Oracle and/or its affiliates.
+
+def calculate_sum(numbers=[]):
+    total = 0
+    for num in numbers:
+        total += num
+    return total
+
+
+def process_data(data):
+    # TODO: Handle exceptions here
+    result = data * 2
+    return result
+
+
+def main():
+    numbers = [1, 2, 3, 4, 5]
+    result = calculate_sum(numbers)
+    print("Sum:", result)
+    data = 10
+    processed_data = process_data(data)
+    print("Processed Data:", processed_data)
+
+
+if __name__ == "__main__":
+    main()
""".strip()
Reading a diff: removals are marked with “-” and additions with “+”; lines without either marker are unchanged context. In this example, there are only additions.
The diff above contains information about two files, calculators/utils.py and example/utils.py.
This example diff is only meant to show what a Git diff looks like; it is shorter than, and different from, the diff you will generate from the sample codebase.
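To make the markers concrete, here is a small standalone sketch in plain Python (not part of WayFlow) that tallies the lines of a unified diff by their leading marker. The helper name summarize_diff is hypothetical, and it is deliberately simplified: it just skips the file and hunk header lines.

```python
def summarize_diff(diff_text: str) -> dict:
    """Count added, removed and context lines in a unified diff (simplified)."""
    counts = {"added": 0, "removed": 0, "context": 0}
    for line in diff_text.splitlines():
        # Skip metadata lines so "--- "/"+++ " headers are not counted as changes.
        if line.startswith(("diff --git", "index ", "--- ", "+++ ", "@@")):
            continue
        if line.startswith("+"):
            counts["added"] += 1
        elif line.startswith("-"):
            counts["removed"] += 1
        elif line.startswith(" "):
            counts["context"] += 1  # unchanged line shown for context
    return counts


example = """diff --git src://app.py dst://app.py
index 1111111..2222222 100644
--- src://app.py
+++ dst://app.py
@@ -1,3 +1,3 @@
 import os
-DEBUG = True
+DEBUG = False
 print(os.getcwd())"""

print(summarize_diff(example))  # {'added': 1, 'removed': 1, 'context': 2}
```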
Build a tool
You need a tool that extracts the code diff from the local code repository. The @tool decorator creates one by wrapping a plain Python function.
The function local_get_pr_diff_tool, shown below, runs the git diff HEAD shell command in a subprocess and returns its captured output.
Decorating the function with @tool turns it into a WayFlow ServerTool.
from wayflowcore.tools import tool


@tool(description_mode="only_docstring")
def local_get_pr_diff_tool(repo_dirpath: str) -> str:
    """
    Retrieves code diff with a git command given the
    path to the repository root folder.
    """
    import subprocess  # nosec: documentation example invoking git locally

    result = subprocess.run(
        ["git", "diff", "HEAD"],
        capture_output=True,
        cwd=repo_dirpath,
        text=True,
    )  # nosec: documentation example invoking git locally
    return result.stdout.strip()
Building the steps and the sub-flow
Let’s write the code for the first sub-flow.
from wayflowcore.controlconnection import ControlFlowEdge
from wayflowcore.dataconnection import DataFlowEdge
from wayflowcore.flow import Flow
from wayflowcore.property import StringProperty
from wayflowcore.steps import RegexExtractionStep, StartStep, ToolExecutionStep

# IO Variable Names
REPO_DIRPATH_IO = "$repo_dirpath_io"
PR_DIFF_IO = "$raw_pr_diff"
FILE_DIFF_LIST_IO = "$file_diff_list"

# Define the steps

start_step = StartStep(name="start_step", input_descriptors=[StringProperty(name=REPO_DIRPATH_IO)])

# Step 1: Retrieve the pull request diff using the local tool
get_pr_diff_step = ToolExecutionStep(
    name="get_pr_diff",
    tool=local_get_pr_diff_tool,
    raise_exceptions=True,
    input_mapping={"repo_dirpath": REPO_DIRPATH_IO},
    output_mapping={ToolExecutionStep.TOOL_OUTPUT: PR_DIFF_IO},
)

# Step 2: Extract the file diffs from the raw diff using a regular expression
extract_into_list_of_file_diff_step = RegexExtractionStep(
    name="extract_into_list_of_file_diff",
    regex_pattern=r"(diff --git[\s\S]*?)(?=diff --git|$)",
    return_first_match_only=False,
    input_mapping={RegexExtractionStep.TEXT: PR_DIFF_IO},
    output_mapping={RegexExtractionStep.OUTPUT: FILE_DIFF_LIST_IO},
)

# Define the sub flow
retrieve_diff_subflow = Flow(
    name="Retrieve PR diff flow",
    begin_step=start_step,
    control_flow_edges=[
        ControlFlowEdge(source_step=start_step, destination_step=get_pr_diff_step),
        ControlFlowEdge(
            source_step=get_pr_diff_step, destination_step=extract_into_list_of_file_diff_step
        ),
        ControlFlowEdge(source_step=extract_into_list_of_file_diff_step, destination_step=None),
    ],
    data_flow_edges=[
        DataFlowEdge(
            source_step=start_step,
            source_output=REPO_DIRPATH_IO,
            destination_step=get_pr_diff_step,
            destination_input=REPO_DIRPATH_IO,
        ),
        DataFlowEdge(
            source_step=get_pr_diff_step,
            source_output=PR_DIFF_IO,
            destination_step=extract_into_list_of_file_diff_step,
            destination_input=PR_DIFF_IO,
        ),
    ],
)
API Reference: Flow | RegexExtractionStep | ToolExecutionStep | tool
The code does the following:
It defines the names of the input/output variables for the sub-flow.
It then creates the different steps within the sub-flow.
Finally, it instantiates the sub-flow. This is covered in more detail later in the tutorial.
The input/output variables used in the Flow are given explicit names, such as “$repo_dirpath_io” for the variable holding the path to the local repository. The dollar ($) prefix is not required; it only makes these names easier to spot in the code. To minimize the number of magic strings, the names are stored in string constants such as REPO_DIRPATH_IO, which you will use to pass in the location of the sample codebase Git repository.
See also
To learn about the basics of Flows, check out our introductory tutorial on WayFlow Flows.
Now take a look at each of the steps used in the sub-flow in more detail.
Get the PR diff, get_pr_diff_step
This step uses a ToolExecutionStep to run the tool described earlier, which gathers the diff information. When creating it, you need to provide the following:
tool: The tool to call within the step. This is the tool that was created earlier, local_get_pr_diff_tool.
raise_exceptions: Whether to raise exceptions generated by the called tool. Here it is set to True, so exceptions will be raised.
input_mapping: Maps the tool's parameter, repo_dirpath, to the name held in REPO_DIRPATH_IO. See ToolExecutionStep for more details on using an input_mapping with this type of step.
output_mapping: Renames the step's default output, ToolExecutionStep.TOOL_OUTPUT, to the name held in PR_DIFF_IO. Again, see ToolExecutionStep for more details on using an output_mapping with this type of step.
Extract file diffs into a list, extract_into_list_of_file_diff_step
You now have the diff information from the PR. This step performs a regex extraction on the raw diff text to split it into one diff per file.
Use a RegexExtractionStep to perform this action. When creating the step, you need to provide the following:
regex_pattern: The regex pattern for the extraction. This uses re.findall underneath.
return_first_match_only: You want to return all results, so set this to False.
input_mapping: Maps the step's default input, RegexExtractionStep.TEXT, to the name held in PR_DIFF_IO. See RegexExtractionStep for more details on using an input_mapping with this type of step.
output_mapping: Renames the step's default output, RegexExtractionStep.OUTPUT, to the name held in FILE_DIFF_LIST_IO. Again, see RegexExtractionStep for more details on using an output_mapping with this type of step.
About the pattern:
(diff --git[\s\S]*?)(?=diff --git|$)
The pattern looks for text starting with diff --git, followed by any characters (\s matches whitespace and \S non-whitespace, so [\s\S] matches anything, including newlines), until it encounters either another diff --git or the end of the text ($). Because this stop condition is a lookahead, (?=...), neither the next diff --git nor the end of the text is included in the match.
The *? quantifier makes the match “lazy” (non-greedy), meaning it takes the shortest possible match rather than the longest.
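You can try the pattern outside WayFlow with Python's re.findall, which the step uses underneath. The two-file diff below is made up for illustration; because the pattern has one capture group, findall returns the captured text of each match.

```python
import re

# Same pattern as extract_into_list_of_file_diff_step.
FILE_DIFF_PATTERN = r"(diff --git[\s\S]*?)(?=diff --git|$)"

raw_diff = (
    "diff --git src://a.py dst://a.py\n"
    "+print('a')\n"
    "diff --git src://b.py dst://b.py\n"
    "+print('b')\n"
)

# Each match runs lazily up to the next "diff --git" or the end of the text.
file_diffs = re.findall(FILE_DIFF_PATTERN, raw_diff)
print(len(file_diffs))                 # 2
print(file_diffs[0].splitlines()[0])   # diff --git src://a.py dst://a.py
print(file_diffs[1].splitlines()[1])   # +print('b')
```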
Tip
Recent Large Language Models are very helpful tools to create, debug and explain Regex patterns given a natural language description.
Finally, create the sub-flow using the Flow class. You specify the starting step of the Flow, the transitions between steps, and how data passes from one step to the next.
The transitions between steps are defined with ControlFlowEdges. Each ControlFlowEdge maps one transition from a source step to a destination step; a destination step of None marks the end of the flow.
Passing values between steps is very common when building Flows. This is done using DataFlowEdges, which define that a value is passed from one step to another.
Inputs to a step are most commonly parameters of a Jinja template (this tutorial contains several examples) or parameters of the callables used by tools. In a DataFlowEdge, you use the name of the parameter, a string, as the destination of the value being passed in. It is less error-prone to store that name in a variable.
Similarly, when a value is the output of a step, such as a user's input captured by an InputMessageStep, it is available under a default name exposed as a property of the step, for example InputMessageStep.USER_PROVIDED_INPUT. Because this default name is not very meaningful, it is often helpful to specify another one with an output_mapping when creating the step. Again, store the name in a variable to avoid errors.
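The renaming performed by an output_mapping can be pictured with a toy dictionary model. This is a hypothetical helper for illustration only, not how WayFlow implements mappings, and the default output name "tool_output" used here is an assumption.

```python
# Toy model for illustration only (hypothetical helper, not WayFlow code).
def apply_output_mapping(outputs: dict, output_mapping: dict) -> dict:
    """Rename the output names listed in output_mapping; keep the others unchanged."""
    return {output_mapping.get(name, name): value for name, value in outputs.items()}


PR_DIFF_IO = "$raw_pr_diff"

# Assume a step exposes its result under a generic default name.
default_outputs = {"tool_output": "diff --git src://a.py dst://a.py"}
renamed = apply_output_mapping(default_outputs, {"tool_output": PR_DIFF_IO})
print(renamed)  # {'$raw_pr_diff': 'diff --git src://a.py dst://a.py'}
```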
Defining a Flow
Defining the Flow is the last part of the code shown above. A few things are worth highlighting:
begin_step: Every Flow needs a starting step.
control_flow_edges: The transitions between the steps in the Flow, defined as ControlFlowEdges. Each has a source_step, where the transition starts, and a destination_step, where it ends. All transitions of the flow need to be defined.
data_flow_edges: Maps variables from a source step to variables of a destination step using DataFlowEdges. You only need to do this for the variables that must be passed between steps.
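To see how the two kinds of edges divide the work, here is a toy, pure-Python model of a linear flow: control-flow edges decide which step runs next (None ends the run), and data-flow edges copy one step's output into a later step's input. Every name here is hypothetical; this is a sketch of the idea, not of WayFlow's execution engine.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# A toy "step" maps a dict of named inputs to a dict of named outputs.
Step = Callable[[Dict[str, Any]], Dict[str, Any]]
# (source_step, source_output, destination_step, destination_input)
DataEdge = Tuple[str, str, str, str]


def run_toy_flow(
    begin_step: str,
    steps: Dict[str, Step],
    control_edges: Dict[str, Optional[str]],
    data_edges: List[DataEdge],
    inputs: Dict[str, Any],
) -> Dict[str, Dict[str, Any]]:
    """Run steps one by one, following control edges until None is reached."""
    outputs: Dict[str, Dict[str, Any]] = {}
    current: Optional[str] = begin_step
    while current is not None:
        # The begin step receives the flow inputs; other steps get data-edge values.
        step_inputs = dict(inputs) if current == begin_step else {}
        for src, src_out, dst, dst_in in data_edges:
            if dst == current:
                step_inputs[dst_in] = outputs[src][src_out]
        outputs[current] = steps[current](step_inputs)
        current = control_edges[current]  # follow the control-flow edge
    return outputs


steps = {
    "start": lambda i: {"$path": i["$path"]},
    "get_diff": lambda i: {"$diff": "diff for " + i["$path"]},
    "split_words": lambda i: {"$words": i["$diff"].split()},
}
result = run_toy_flow(
    begin_step="start",
    steps=steps,
    control_edges={"start": "get_diff", "get_diff": "split_words", "split_words": None},
    data_edges=[
        ("start", "$path", "get_diff", "$path"),
        ("get_diff", "$diff", "split_words", "$diff"),
    ],
    inputs={"$path": "repo"},
)
print(result["split_words"]["$words"])  # ['diff', 'for', 'repo']
```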
Testing the flow
You can test this sub-flow by creating a conversation with Flow.start_conversation() and specifying the inputs, in this case the location of the Git repository. The conversation can then be executed with Conversation.execute().
This returns an object representing the status of the conversation, which you can check to confirm that the conversation finished successfully.
Set PATH_TO_DIR to the path where you extracted the sample codebase Git repository; the outputs are then read from the final execution status.
The full code for testing the sub-flow is shown below:
from wayflowcore.executors.executionstatus import FinishedStatus

# Replace the path below with the path to your actual codebase sample git repository.
PATH_TO_DIR = "path/to/repository_root"

test_conversation = retrieve_diff_subflow.start_conversation(
    inputs={
        REPO_DIRPATH_IO: PATH_TO_DIR,
    }
)

execution_status = test_conversation.execute()

if not isinstance(execution_status, FinishedStatus):
    raise ValueError("Unexpected status type")

FILE_DIFF_LIST = execution_status.output_values[FILE_DIFF_LIST_IO]

print(FILE_DIFF_LIST[0])
API Reference: Flow
Part 3: Review the list of diffs
Now that we have a list of diffs for each file, we can review them and generate comments using an LLM.
This task can be broken into a sub-flow made up of five steps:
OutputMessageStep: This converts each file diff into a string to be processed by the following steps.
ToolExecutionStep: This prefixes the diffs with line numbers for additional context to the LLM.
RegexExtractionStep: This extracts the file path from the diff string.
PromptExecutionStep: This generates comments using the LLM based on a list of user-defined checks.
ExtractValueFromJsonStep: This extracts the comments and lines they apply to from the LLM output.
Build the tools and checks
Before creating the steps and sub-flow that generate the comments, you need to define the list of checks the assistant should perform, along with any specific instructions. You also need a tool that prefixes the diffs with line numbers, allowing the LLM to specify where to add comments.
Below is the full code to achieve this. The sections that follow explain each part in detail.
PR_BOT_CHECKS = [
    """
Name: TODO_WITHOUT_TICKET
Description: TODO comments should reference a ticket number for tracking.
Example code:
```python
# TODO: Add validation here
def process_user_input(data):
    return data
```
Example comment:
[BOT] TODO_WITHOUT_TICKET: TODO comment should reference a ticket number for tracking (e.g., "TODO: Add validation here (TICKET-1234)").
""",
    """
Name: MUTABLE_DEFAULT_ARGUMENT
Description: Using mutable objects as default arguments can lead to unexpected behavior.
Example code:
```python
def add_item(item, items=[]):
    items.append(item)
    return items
```
Example comment:
[BOT] MUTABLE_DEFAULT_ARGUMENT: Avoid using mutable default arguments. Use None and initialize in the function: `def add_item(item, items=None): items = items or []`
""",
    """
Name: NON_DESCRIPTIVE_NAME
Description: Variable names should clearly indicate their purpose or content.
Example code:
```python
def process(lst):
    res = []
    for i in lst:
        res.append(i * 2)
    return res
```
Example comment:
[BOT] NON_DESCRIPTIVE_NAME: Use more descriptive names: 'lst' could be 'numbers', 'res' could be 'doubled_numbers', 'i' could be 'number'
""",
]

CONCATENATED_CHECKS = "\n\n---\n\n".join(PR_BOT_CHECKS)

PROMPT_TEMPLATE = """You are a very experienced code reviewer. You are given a git diff on a file: {{filename}}

## Context
The git diff contains all changes of a single file. All lines are prepended with their number. Lines without line number were removed from the file.
After the line number, a line that was changed has a "+" before the code. All lines without a "+" are just here for context, you will not comment on them.

## Input
### Code diff
{{diff}}

## Task
Your task is to review these changes, according to different rules. Only comment on lines that were added, i.e. the lines that have a + just after the line number.
The rules are the following:

{{checks}}

### Response Format
You need to return a review as a json as follows:
```json
[
    {
        "content": "the comment as a text",
        "suggestion": "if the change you propose is a single line, then put here the single line rewritten that includes your proposal change. IMPORTANT: a single line, which will erase the current line. Put empty string if no suggestion or if the suggestion is more than a single line",
        "line": "line number where the comment applies"
    },
    …
]
```
Please use triple backticks ``` to delimit your JSON list of comments. Don't output more than 5 comments, only comment the most relevant sections.
If there are no comments and the code seems fine, just output an empty JSON list."""


@tool(description_mode="only_docstring")
def format_git_diff(diff_text: str) -> str:
    """
    Formats a git diff by adding line numbers to each line except removal lines.
    """

    def pad_number(number: int, width: int) -> str:
        """Right-align a number with specified width using space padding."""
        return str(number).rjust(width)

    LINE_NUMBER_WIDTH = 5
    PADDING_WIDTH = LINE_NUMBER_WIDTH + 1
    current_line_number = 0
    formatted_lines = []

    for line in diff_text.split("\n"):
        # Handle diff header lines (e.g., "@@ -1,7 +1,6 @@")
        if line.startswith("@@"):
            try:
                # Extract the starting line number in the new file. The line count
                # is optional: Git omits it for single-line hunks ("@@ -3 +3 @@").
                _, position_info, _ = line.split("@@")
                new_file_info = position_info.split()[1][1:]  # Remove the '+' prefix
                current_line_number = int(new_file_info.split(",")[0])
                formatted_lines.append(line)
                continue

            except (ValueError, IndexError):
                raise ValueError(f"Invalid diff header format: {line}")

        # Handle content lines; lines before the first hunk header are skipped
        if current_line_number > 0 and line:
            if not line.startswith("-"):
                # Add line number for added/context lines
                line_prefix = pad_number(current_line_number, LINE_NUMBER_WIDTH)
                formatted_lines.append(f"{line_prefix} {line}")
                current_line_number += 1
            else:
                # Just add padding for removal lines
                formatted_lines.append(" " * PADDING_WIDTH + line)

    return "\n".join(formatted_lines)
API Reference: ExtractValueFromJsonStep | MapStep | OutputMessageStep | PromptExecutionStep | ToolExecutionStep
Checks and LLM instructions
You will use the three simple checks defined in PR_BOT_CHECKS above. For each check, you specify a name, a description of what the LLM should check, and an example of code together with the expected comment, so that the LLM gets a better understanding of the task.
The prompt uses a simple structure:
Role Definition: Define who/what you want the LLM to act as (e.g., “You are a very experienced code reviewer”).
Context Section: Provide relevant background information or specific circumstances that frame the task.
Input Section: Specify the exact information, data, or materials that the LLM will be provided with.
Task Section: Clearly state what you want the LLM to do with the input provided.
Response Format Section: Define how you want the response to be structured or formatted (e.g., bullet points, JSON, with XML tags, and so on).
The checks are defined in the list PR_BOT_CHECKS. They are then concatenated into a single string, CONCATENATED_CHECKS, so that they can be inserted into the prompt you will pass to the LLM.
The prompt template, PROMPT_TEMPLATE, contains placeholders for the file name, the diff and the checks, which are filled in when specializing the prompt for each diff.
Tip
How to write high-quality prompts
There is no consensus on what makes the best LLM prompt. However, with recent LLMs, a good strategy is simply to be very specific about the task to be solved, giving enough context and explaining potential edge cases to consider.
To evaluate a prompt, ask yourself whether an experienced colleague with no prior context about the task could reach the intended result from those instructions alone.
Diff formatting tool
Next, you need a tool that formats the diffs so that they can be consumed by the LLM. As you have already seen, a tool is a simple wrapper around a Python callable that makes it usable within a flow; the @tool decorator creates a ServerTool from it.
The function format_git_diff, in the code above, does the work of formatting the diffs: it prefixes every added or context line with its line number in the new version of the file and pads removal lines so that they stay aligned.
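format_git_diff only needs the first line number of the new file from each hunk header. As an alternative sketch, assuming standard unified-diff hunk headers, a regular expression can extract it. Git omits the line count when a hunk covers a single line (for example @@ -3 +3 @@), so the counts are optional in the pattern. The helper new_file_start is hypothetical.

```python
import re

# Matches "@@ -a,b +c,d @@"; ",b" and ",d" are optional (omitted for one-line hunks).
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def new_file_start(header: str) -> int:
    """Return the first line number of the hunk in the new version of the file."""
    match = HUNK_HEADER.match(header)
    if match is None:
        raise ValueError(f"Invalid diff header format: {header}")
    return int(match.group(3))


print(new_file_start("@@ -10,6 +10,15 @@"))  # 10
print(new_file_start("@@ -0,0 +1,20 @@"))   # 1
print(new_file_start("@@ -3 +3 @@"))        # 3
```

Anything after the closing @@, such as the function context Git sometimes appends, is ignored by the pattern.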
See also
For more information about WayFlow tools please read our guide, How to use tools.
Building the steps and the sub-flow
With the prompts and the diff formatting tool written, you can now build the second sub-flow. This sub-flow iterates over the previously generated diffs and uses an LLM to generate review comments for them.
from wayflowcore._utils._templating_helpers import render_template_partially
from wayflowcore.property import AnyProperty, DictProperty, ListProperty, StringProperty
from wayflowcore.steps import (
    ExtractValueFromJsonStep,
    MapStep,
    OutputMessageStep,
    PromptExecutionStep,
    ToolExecutionStep,
)

# IO Variable Names
DIFF_TO_STRING_IO = "$diff_to_string"
DIFF_WITH_LINES_IO = "$diff_with_lines"
FILEPATH_IO = "$filename"
JSON_COMMENTS_IO = "$json_comments"
EXTRACTED_COMMENTS_IO = "$extracted_comments"
NESTED_COMMENT_LIST_IO = "$nested_comment_list"
FILEPATH_LIST_IO = "$filepath_list"

# Define the steps

# Step 1: Format the diff to a string
format_diff_to_string_step = OutputMessageStep(
    name="format_diff_to_string",
    message_template="{{ message | string }}",
    output_mapping={OutputMessageStep.OUTPUT: DIFF_TO_STRING_IO},
)

# Step 2: Add lines on the diff using a tool
add_lines_on_diff_step = ToolExecutionStep(
    name="add_lines_on_diff",
    tool=format_git_diff,
    input_mapping={"diff_text": DIFF_TO_STRING_IO},
    output_mapping={ToolExecutionStep.TOOL_OUTPUT: DIFF_WITH_LINES_IO},
)

# Step 3: Extract the file path from the diff string using a regular expression
extract_file_path_step = RegexExtractionStep(
    name="extract_file_path",
    regex_pattern=r"diff --git src://(.+?) dst://",
    return_first_match_only=True,
    input_mapping={RegexExtractionStep.TEXT: DIFF_TO_STRING_IO},
    output_mapping={RegexExtractionStep.OUTPUT: FILEPATH_IO},
)

# Step 4: Generate comments using a prompt
generate_comments_step = PromptExecutionStep(
    name="generate_comments",
    prompt_template=render_template_partially(PROMPT_TEMPLATE, {"checks": CONCATENATED_CHECKS}),
    llm=llm,
    input_mapping={"diff": DIFF_WITH_LINES_IO, "filename": FILEPATH_IO},
    output_mapping={PromptExecutionStep.OUTPUT: JSON_COMMENTS_IO},
)

# Step 5: Extract comments from the JSON output
# Define the value type for extracted comments
comments_valuetype = ListProperty(
    name="values",
    description="The extracted comments content and line number",
    item_type=DictProperty(value_type=AnyProperty()),
    default_value=[],
)
extract_comments_from_json_step = ExtractValueFromJsonStep(
    name="extract_comments_from_json",
    output_values={comments_valuetype: '[.[] | {"content": .["content"], "line": .["line"]}]'},
    retry=True,
    llm=llm,
    input_mapping={ExtractValueFromJsonStep.TEXT: JSON_COMMENTS_IO},
    output_mapping={"values": EXTRACTED_COMMENTS_IO},
)

# Define the sub flow to generate comments for each file diff
generate_comments_subflow = Flow(
    name="Generate review comments flow",
    begin_step=format_diff_to_string_step,
    control_flow_edges=[
        ControlFlowEdge(format_diff_to_string_step, add_lines_on_diff_step),
        ControlFlowEdge(add_lines_on_diff_step, extract_file_path_step),
        ControlFlowEdge(extract_file_path_step, generate_comments_step),
        ControlFlowEdge(generate_comments_step, extract_comments_from_json_step),
        ControlFlowEdge(extract_comments_from_json_step, None),
    ],
    data_flow_edges=[
        DataFlowEdge(
            format_diff_to_string_step, DIFF_TO_STRING_IO, add_lines_on_diff_step, DIFF_TO_STRING_IO
        ),
        DataFlowEdge(
            format_diff_to_string_step, DIFF_TO_STRING_IO, extract_file_path_step, DIFF_TO_STRING_IO
        ),
        DataFlowEdge(
            add_lines_on_diff_step, DIFF_WITH_LINES_IO, generate_comments_step, DIFF_WITH_LINES_IO
        ),
        DataFlowEdge(extract_file_path_step, FILEPATH_IO, generate_comments_step, FILEPATH_IO),
        DataFlowEdge(
            generate_comments_step,
            JSON_COMMENTS_IO,
            extract_comments_from_json_step,
            JSON_COMMENTS_IO,
        ),
    ],
)

# Use the MapStep to apply the sub flow to each file
for_each_file_step = MapStep(
    flow=generate_comments_subflow,
    unpack_input={"message": "."},
    input_mapping={MapStep.ITERATED_INPUT: FILE_DIFF_LIST_IO},
    output_descriptors=[
        ListProperty(name=NESTED_COMMENT_LIST_IO, item_type=AnyProperty()),
        ListProperty(name=FILEPATH_LIST_IO, item_type=StringProperty()),
    ],
    output_mapping={EXTRACTED_COMMENTS_IO: NESTED_COMMENT_LIST_IO, FILEPATH_IO: FILEPATH_LIST_IO},
)

generate_all_comments_subflow = Flow.from_steps([for_each_file_step])
API Reference: Property | ListProperty | DictProperty | StringProperty | ExtractValueFromJsonStep | MapStep | OutputMessageStep | PromptExecutionStep | ToolExecutionStep
Take a look at each of the steps used in the sub-flow to get an understanding of what is happening.
Format diff to string, format_diff_to_string_step
This step converts a file diff into a string so that it can be used by the following steps. It uses an OutputMessageStep whose template applies the Jinja string filter: {{ message | string }}.
Note
Jinja templating introduces security concerns that are addressed by WayFlow by restricting Jinja’s rendering capabilities. Please check our guide on How to write secure prompts with Jinja templating for more information.
Add lines to the diff, add_lines_on_diff_step
This step prefixes the diff with line numbers so that the LLM can indicate which line each review comment applies to. It uses a ToolExecutionStep to run the format_git_diff tool defined earlier.
The input to the tool is specified using the input_mapping, which maps the tool's diff_text parameter to DIFF_TO_STRING_IO. For all these steps, remember that the outputs of one step are linked to the inputs of the next.
Extract file path, extract_file_path_step
This step extracts the file path from the diff string. The file path is needed to assign the review comments to the right file. A RegexExtractionStep is used to do this.
The regular expression is applied to the diff string, which is taken from the I/O dictionary using the input_mapping parameter.
Note: Unlike the RegexExtractionStep used in Part 2, here only the first match is required, so return_first_match_only is set to True.
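The pattern relies on the src:// and dst:// prefixes that appear in the diffs of this tutorial. A quick check with Python's re.search, used here as an approximation of a single-match extraction (an assumption, not a description of the step's internals), shows what the capture group returns. Git's default prefixes are a/ and b/, which this pattern does not match; if your diffs use them, git diff's --src-prefix and --dst-prefix options are one way to set custom prefixes.

```python
import re

# Same pattern as extract_file_path_step; the capture group is the file path.
FILE_PATH_PATTERN = r"diff --git src://(.+?) dst://"

file_diff = "diff --git src://calculators/utils.py dst://calculators/utils.py\n+x = 1"
match = re.search(FILE_PATH_PATTERN, file_diff)
print(match.group(1) if match else None)  # calculators/utils.py

# With Git's default "a/" and "b/" prefixes, the pattern finds nothing.
print(re.search(FILE_PATH_PATTERN, "diff --git a/app.py b/app.py"))  # None
```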
Generate comments, generate_comments_step
This step generates comments using the prompt template defined earlier. The PromptExecutionStep executes the prompt with the LLM configured in Part 1.
Since the list of checks is already known, the template can be pre-rendered using the render_template_partially function. This renders the parts of the template that have been provided (here, the checks), while the remaining placeholders, diff and filename, are filled from the I/O dictionary when the step runs.
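The idea behind partial rendering can be illustrated with a simplified stand-in that fills only the placeholders it is given and leaves the others for later. The helper render_known_placeholders is hypothetical: it handles only plain {{ name }} placeholders, does not implement Jinja, and is not WayFlow's render_template_partially.

```python
import re


def render_known_placeholders(template: str, values: dict) -> str:
    """Fill {{name}} placeholders present in `values`; leave the others untouched."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        return str(values[name]) if name in values else match.group(0)

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


template = "File: {{filename}}\nRules:\n{{checks}}\nDiff:\n{{diff}}"

# First pass: only the checks are known in advance.
partial = render_known_placeholders(template, {"checks": "TODO_WITHOUT_TICKET"})
print(partial)
# File: {{filename}}
# Rules:
# TODO_WITHOUT_TICKET
# Diff:
# {{diff}}

# Second pass: the per-file values are filled in later.
print(render_known_placeholders(partial, {"filename": "a.py", "diff": "+x = 1"}))
```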
Extract comments from JSON, extract_comments_from_json_step
This step extracts the comments and line numbers from the LLM output which, because of the prompt used, is a serialized JSON structure.
An ExtractValueFromJsonStep is used for the extraction. When creating the step, specify the following in addition to the usual input_mapping and output_mapping:
output_values: Maps the output property to the JQ query that extracts the comments from the JSON generated by the LLM. The query keeps only the content and line fields of each comment.
llm: An LLM that can be used to help resolve any parsing errors. This is related to retry.
retry: Whether to retry if parsing fails. It is set to True, so the LLM is used to try to resolve such issues.
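For readers more familiar with Python than JQ, the query [.[] | {"content": .["content"], "line": .["line"]}] keeps, for every element of the JSON list, only its content and line fields. A rough plain-Python equivalent is sketched below. The helper extract_comments is hypothetical, and pulling the JSON out of the triple-backtick block is an assumption based on the response format the prompt requests, not a description of what ExtractValueFromJsonStep does internally.

```python
import json
import re

FENCE = "`" * 3  # triple backticks, built programmatically to avoid a literal fence
FENCED_BLOCK = re.compile(FENCE + r"(?:json)?\s*([\s\S]*?)" + FENCE)


def extract_comments(llm_output: str) -> list:
    """Rough Python analogue of the JQ query used by extract_comments_from_json_step."""
    block = FENCED_BLOCK.search(llm_output)
    payload = block.group(1) if block else llm_output  # fall back to the raw text
    comments = json.loads(payload)
    # Keep only "content" and "line"; as in JQ, a missing key becomes None (null).
    return [{"content": c.get("content"), "line": c.get("line")} for c in comments]


llm_output = (
    "Here is my review:\n" + FENCE + "json\n"
    '[{"content": "Avoid mutable default arguments.", "suggestion": "", "line": "3"}]\n'
    + FENCE
)
print(extract_comments(llm_output))
# [{'content': 'Avoid mutable default arguments.', 'line': '3'}]
```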
Create the sub-flow, generate_comments_subflow
Here you define the starting step of the sub-flow, the transitions between its steps, and the data passed between them. This is the same process you followed previously when defining the sub-flow that fetches the PR data.
Applying the comment generation to all file diffs
Now that you have created the sub-flow, you need to apply it to every file diff. This is done using a MapStep.
A MapStep takes a sub-flow as input (in this case, generate_comments_subflow) and applies it to each element of an iterable (in this case, the list of file
diffs).
You only need to specify:
flow: The sub-flow to apply to each element of the iterable.
unpack_input: Defines how to unpack each element of the input. A JQ query can be used to transform each element; in this case the identity query "." is used, so each diff is passed unchanged to the sub-flow's message input.
input_mapping: Defines what the sub-flow iterates over. The key MapStep.ITERATED_INPUT is used to pass in the diffs.
output_descriptors: Specifies which outputs of the sub-flow to collect. In this case, these are the generated comments and the associated file path.
Note
The MapStep works similarly to how the Python map function works. For more information, see https://docs.python.org/3/library/functions.html#map
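The collection behavior can be sketched in plain Python: apply the sub-flow to each item in order and append each selected output to its own list. This sketch is a simplification, not the MapStep implementation. It runs sequentially, matching `"parallel_execution": false` in the serialized configuration, and the output mapping is copied from it; fake_review is a hypothetical stand-in for generate_comments_subflow.

```python
import re


def map_subflow(subflow, iterated_input, output_mapping):
    """Apply `subflow` to each item in order and collect each output into a list.

    output_mapping maps a sub-flow output name to the name of the collected list.
    """
    collected = {list_name: [] for list_name in output_mapping.values()}
    for item in iterated_input:  # sequential, like "parallel_execution": false
        outputs = subflow(item)
        for output_name, list_name in output_mapping.items():
            collected[list_name].append(outputs[output_name])
    return collected


def fake_review(diff):
    """Hypothetical stand-in for generate_comments_subflow."""
    path = re.search(r"diff --git a/(.+?) b/", diff).group(1)
    comments = [
        {"content": "[BOT] TODO_WITHOUT_TICKET", "line": str(number)}
        for number, line in enumerate(diff.splitlines(), 1)
        if "TODO" in line
    ]
    return {"$filename": path, "$extracted_comments": comments}


result = map_subflow(
    fake_review,
    ["diff --git a/a.py b/a.py\n+# TODO: fix", "diff --git a/b.py b/b.py\n+x = 1"],
    {"$extracted_comments": "$nested_comment_list", "$filename": "$filepath_list"},
)
```

The result is a list of comment lists (one per file, possibly empty), which is why the collected output is called a nested comment list.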
Finally, create the sub-flow to generate all comments using the helper method create_single_step_flow.
Testing the sub-flow
You can test the sub-flow by creating a conversation, as shown in the code below, and specifying the inputs as done in Part 2: Retrieve the PR diff information.
Because each sub-flow is tested independently, you can reuse the output from the test of the first sub-flow.
# Reuse FILE_DIFF_LIST from the previous test
test_conversation = generate_all_comments_subflow.start_conversation(
    inputs={
        FILE_DIFF_LIST_IO: FILE_DIFF_LIST,
    }
)

execution_status = test_conversation.execute()

if not isinstance(execution_status, FinishedStatus):
    raise ValueError("Unexpected status type")

# One list of comments per file, and the matching file paths
NESTED_COMMENT_LIST = execution_status.output_values[NESTED_COMMENT_LIST_IO]
FILEPATH_LIST = execution_status.output_values[FILEPATH_LIST_IO]
print(NESTED_COMMENT_LIST[0])
print(FILEPATH_LIST)
Building the final Flow
Congratulations! You have completed the three sub-flows which, when combined into a single flow, retrieve the PR diff information and generate comments on the diffs using an LLM.
You wire the two top-level sub-flows, retrieve_diff_subflow and generate_all_comments_subflow, together by wrapping each one in a FlowExecutionStep. These FlowExecutionSteps are then composed into the final combined Flow.
The code for this is shown below:
from wayflowcore.steps import FlowExecutionStep


# Steps: wrap each sub-flow so that it can be used as a single step
retrieve_diff_flowstep = FlowExecutionStep(name="retrieve_diff_flowstep", flow=retrieve_diff_subflow)
generate_all_comments_flowstep = FlowExecutionStep(
    name="generate_comments_flowstep",
    flow=generate_all_comments_subflow,
)

pr_bot = Flow(
    name="PR bot flow",
    begin_step=retrieve_diff_flowstep,
    control_flow_edges=[
        ControlFlowEdge(retrieve_diff_flowstep, generate_all_comments_flowstep),
        ControlFlowEdge(generate_all_comments_flowstep, None),
    ],
    data_flow_edges=[
        # Pass the list of file diffs from the first step to the second
        DataFlowEdge(
            retrieve_diff_flowstep,
            FILE_DIFF_LIST_IO,
            generate_all_comments_flowstep,
            FILE_DIFF_LIST_IO,
        )
    ],
)
API Reference: Flow | FlowExecutionStep
Testing the combined assistant
You can now run the PR bot end-to-end, either on your own repository or locally on the sample repository.
Set PATH_TO_DIR to the path where you extracted the sample codebase Git repository. The code also shows how the outputs of the conversation
are read from the execution_status object, via execution_status.output_values.
# Replace the path below with the path to your actual codebase sample Git repository.
PATH_TO_DIR = "path/to/repository_root"

conversation = pr_bot.start_conversation(inputs={REPO_DIRPATH_IO: PATH_TO_DIR})

execution_status = conversation.execute()

if not isinstance(execution_status, FinishedStatus):
    raise ValueError("Unexpected status type")

print(execution_status.output_values)

NESTED_COMMENT_LIST = execution_status.output_values[NESTED_COMMENT_LIST_IO]
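To turn the two output lists into readable review lines, you could pair them up with zip. This sketch assumes that the i-th file path corresponds to the i-th comment list, since both are collected from the same MapStep iteration. The helper name and sample values are hypothetical.

```python
def format_review(filepaths, nested_comments):
    """Flatten the per-file comment lists into "path:line: comment" strings.

    Assumes filepaths[i] is the file that nested_comments[i] refers to.
    """
    lines = []
    for path, comments in zip(filepaths, nested_comments):
        for comment in comments:
            lines.append(f"{path}:{comment['line']}: {comment['content']}")
    return lines


# Hypothetical sample values with the same shape as the flow outputs.
sample_paths = ["app/main.py", "app/utils.py"]
sample_comments = [
    [{"content": "[BOT] TODO_WITHOUT_TICKET: reference a ticket", "line": "2"}],
    [],
]
review = format_review(sample_paths, sample_comments)
```

With the real flow, you would pass execution_status.output_values[FILEPATH_LIST_IO] and NESTED_COMMENT_LIST instead of the sample values. Files without comments simply contribute no lines.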
Agent Spec Exporting/Loading
You can export the assistant configuration to its Agent Spec configuration using the AgentSpecExporter.
from wayflowcore.agentspec import AgentSpecExporter
serialized_assistant = AgentSpecExporter().to_json(pr_bot)
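In the exported configuration, a Flow lists its nodes as `{"$component_ref": "<id>"}` entries whose full definitions live under `"$referenced_components"`. The following sketch uses only the standard json module to resolve those references in a trimmed, hypothetical excerpt. It assumes every id listed under "nodes" is defined in the same object's "$referenced_components", and it is not part of the WayFlow or Agent Spec APIs.

```python
import json


def resolve_node_names(spec: dict) -> list[str]:
    """Return the names of a Flow's nodes by following their $component_ref ids."""
    registry = spec.get("$referenced_components", {})
    return [registry[node["$component_ref"]]["name"] for node in spec["nodes"]]


# Trimmed, hypothetical excerpt with the same shape as the exported configuration.
excerpt = json.loads("""
{
  "component_type": "Flow",
  "name": "PR bot flow",
  "nodes": [{"$component_ref": "node-1"}, {"$component_ref": "node-2"}],
  "$referenced_components": {
    "node-1": {"component_type": "FlowNode", "name": "retrieve_diff_flowstep"},
    "node-2": {"component_type": "FlowNode", "name": "generate_comments_flowstep"}
  }
}
""")
names = resolve_node_names(excerpt)
```

Applied to json.loads(serialized_assistant), the same function would also list the start and end nodes that the exporter adds.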
Here is what the Agent Spec representation looks like:
{
"component_type": "Flow",
"id": "9c65246d-a0dd-4ec4-801d-afd640b2488e",
"name": "PR bot flow",
"description": "",
"metadata": {
"__metadata_info__": {}
},
"inputs": [
{
"type": "string",
"title": "$repo_dirpath_io"
}
],
"outputs": [
{
"type": "array",
"items": {
"type": "string"
},
"title": "$filepath_list"
},
{
"type": "array",
"items": {},
"title": "$nested_comment_list"
},
{
"type": "string",
"title": "$raw_pr_diff"
},
{
"description": "the list of extracted value using the regex \"(diff --git[\\s\\S]*?)(?=diff --git|$)\" from the raw input",
"type": "array",
"items": {
"type": "string"
},
"title": "$file_diff_list",
"default": []
}
],
"start_node": {
"$component_ref": "020c885e-6d0b-472a-bb91-246ab70ab1db"
},
"nodes": [
{
"$component_ref": "47e367be-4d74-49dc-ac3b-89bb97ffa7df"
},
{
"$component_ref": "43d58c76-23a0-4d10-943d-f9c5e0835a7c"
},
{
"$component_ref": "020c885e-6d0b-472a-bb91-246ab70ab1db"
},
{
"$component_ref": "a544af64-e63b-4ccf-9ab0-8d25cdbc0b93"
}
],
"control_flow_connections": [
{
"component_type": "ControlFlowEdge",
"id": "a5c123ff-c14c-4291-b174-61d61170f187",
"name": "retrieve_diff_flowstep_to_generate_comments_flowstep_control_flow_edge",
"description": null,
"metadata": {
"__metadata_info__": {}
},
"from_node": {
"$component_ref": "47e367be-4d74-49dc-ac3b-89bb97ffa7df"
},
"from_branch": null,
"to_node": {
"$component_ref": "43d58c76-23a0-4d10-943d-f9c5e0835a7c"
}
},
{
"component_type": "ControlFlowEdge",
"id": "8a10b23a-2d0c-46c4-82ac-e66ad0b9399b",
"name": "__StartStep___to_retrieve_diff_flowstep_control_flow_edge",
"description": null,
"metadata": {
"__metadata_info__": {}
},
"from_node": {
"$component_ref": "020c885e-6d0b-472a-bb91-246ab70ab1db"
},
"from_branch": null,
"to_node": {
"$component_ref": "47e367be-4d74-49dc-ac3b-89bb97ffa7df"
}
},
{
"component_type": "ControlFlowEdge",
"id": "dac07720-8a5a-4a61-b1e7-50be506ed937",
"name": "generate_comments_flowstep_to_None End node_control_flow_edge",
"description": null,
"metadata": {},
"from_node": {
"$component_ref": "43d58c76-23a0-4d10-943d-f9c5e0835a7c"
},
"from_branch": null,
"to_node": {
"$component_ref": "a544af64-e63b-4ccf-9ab0-8d25cdbc0b93"
}
}
],
"data_flow_connections": [
{
"component_type": "DataFlowEdge",
"id": "7b12dfed-309b-46ff-8a2d-bb6f2a3154b6",
"name": "retrieve_diff_flowstep_$file_diff_list_to_generate_comments_flowstep_$file_diff_list_data_flow_edge",
"description": null,
"metadata": {
"__metadata_info__": {}
},
"source_node": {
"$component_ref": "47e367be-4d74-49dc-ac3b-89bb97ffa7df"
},
"source_output": "$file_diff_list",
"destination_node": {
"$component_ref": "43d58c76-23a0-4d10-943d-f9c5e0835a7c"
},
"destination_input": "$file_diff_list"
},
{
"component_type": "DataFlowEdge",
"id": "51122844-22d3-40a8-b652-1b020ce24945",
"name": "__StartStep___$repo_dirpath_io_to_retrieve_diff_flowstep_$repo_dirpath_io_data_flow_edge",
"description": null,
"metadata": {
"__metadata_info__": {}
},
"source_node": {
"$component_ref": "020c885e-6d0b-472a-bb91-246ab70ab1db"
},
"source_output": "$repo_dirpath_io",
"destination_node": {
"$component_ref": "47e367be-4d74-49dc-ac3b-89bb97ffa7df"
},
"destination_input": "$repo_dirpath_io"
},
{
"component_type": "DataFlowEdge",
"id": "72aa469c-98cd-4f0d-9496-0aa454373aef",
"name": "generate_comments_flowstep_$filepath_list_to_None End node_$filepath_list_data_flow_edge",
"description": null,
"metadata": {},
"source_node": {
"$component_ref": "43d58c76-23a0-4d10-943d-f9c5e0835a7c"
},
"source_output": "$filepath_list",
"destination_node": {
"$component_ref": "a544af64-e63b-4ccf-9ab0-8d25cdbc0b93"
},
"destination_input": "$filepath_list"
},
{
"component_type": "DataFlowEdge",
"id": "eac1b375-1541-41f7-87f3-f3e626cc2c9c",
"name": "generate_comments_flowstep_$nested_comment_list_to_None End node_$nested_comment_list_data_flow_edge",
"description": null,
"metadata": {},
"source_node": {
"$component_ref": "43d58c76-23a0-4d10-943d-f9c5e0835a7c"
},
"source_output": "$nested_comment_list",
"destination_node": {
"$component_ref": "a544af64-e63b-4ccf-9ab0-8d25cdbc0b93"
},
"destination_input": "$nested_comment_list"
},
{
"component_type": "DataFlowEdge",
"id": "0869acb5-4d8f-4b17-b59b-3b915912b628",
"name": "retrieve_diff_flowstep_$raw_pr_diff_to_None End node_$raw_pr_diff_data_flow_edge",
"description": null,
"metadata": {},
"source_node": {
"$component_ref": "47e367be-4d74-49dc-ac3b-89bb97ffa7df"
},
"source_output": "$raw_pr_diff",
"destination_node": {
"$component_ref": "a544af64-e63b-4ccf-9ab0-8d25cdbc0b93"
},
"destination_input": "$raw_pr_diff"
},
{
"component_type": "DataFlowEdge",
"id": "9fb2ab9e-ece1-4195-8f51-ef618dcb72bb",
"name": "retrieve_diff_flowstep_$file_diff_list_to_None End node_$file_diff_list_data_flow_edge",
"description": null,
"metadata": {},
"source_node": {
"$component_ref": "47e367be-4d74-49dc-ac3b-89bb97ffa7df"
},
"source_output": "$file_diff_list",
"destination_node": {
"$component_ref": "a544af64-e63b-4ccf-9ab0-8d25cdbc0b93"
},
"destination_input": "$file_diff_list"
}
],
"$referenced_components": {
"43d58c76-23a0-4d10-943d-f9c5e0835a7c": {
"component_type": "FlowNode",
"id": "43d58c76-23a0-4d10-943d-f9c5e0835a7c",
"name": "generate_comments_flowstep",
"description": "",
"metadata": {
"__metadata_info__": {}
},
"inputs": [
{
"description": "iterated input for the map step",
"type": "array",
"items": {
"description": "\"message\" input variable for the template",
"title": "message"
},
"title": "$file_diff_list"
}
],
"outputs": [
{
"type": "array",
"items": {
"type": "string"
},
"title": "$filepath_list"
},
{
"type": "array",
"items": {},
"title": "$nested_comment_list"
}
],
"branches": [
"next"
],
"subflow": {
"component_type": "Flow",
"id": "f95e0e5d-f573-4e25-9d68-8508371246f9",
"name": "flow_028a7dfb__auto",
"description": "",
"metadata": {
"__metadata_info__": {}
},
"inputs": [
{
"description": "iterated input for the map step",
"type": "array",
"items": {
"description": "\"message\" input variable for the template",
"title": "message"
},
"title": "$file_diff_list"
}
],
"outputs": [
{
"type": "array",
"items": {
"type": "string"
},
"title": "$filepath_list"
},
{
"type": "array",
"items": {},
"title": "$nested_comment_list"
}
],
"start_node": {
"$component_ref": "367ae568-317d-42ec-ae70-4c41afe0dbd0"
},
"nodes": [
{
"$component_ref": "f127a297-842d-4d17-bc89-4704019458d7"
},
{
"$component_ref": "367ae568-317d-42ec-ae70-4c41afe0dbd0"
},
{
"$component_ref": "6f62aecf-03a1-4e38-b551-8eef0efaf4bb"
}
],
"control_flow_connections": [
{
"component_type": "ControlFlowEdge",
"id": "85a2cdff-6ad4-4f58-8d1c-c8deeb05880c",
"name": "__StartStep___to_step_0_control_flow_edge",
"description": null,
"metadata": {
"__metadata_info__": {}
},
"from_node": {
"$component_ref": "367ae568-317d-42ec-ae70-4c41afe0dbd0"
},
"from_branch": null,
"to_node": {
"$component_ref": "f127a297-842d-4d17-bc89-4704019458d7"
}
},
{
"component_type": "ControlFlowEdge",
"id": "396e218f-225e-4e36-a33c-a176ca77d345",
"name": "step_0_to_None End node_control_flow_edge",
"description": null,
"metadata": {},
"from_node": {
"$component_ref": "f127a297-842d-4d17-bc89-4704019458d7"
},
"from_branch": null,
"to_node": {
"$component_ref": "6f62aecf-03a1-4e38-b551-8eef0efaf4bb"
}
}
],
"data_flow_connections": [
{
"component_type": "DataFlowEdge",
"id": "6c8b8f78-b587-49ff-a401-6262cdafb0ee",
"name": "__StartStep___$file_diff_list_to_step_0_$file_diff_list_data_flow_edge",
"description": null,
"metadata": {
"__metadata_info__": {}
},
"source_node": {
"$component_ref": "367ae568-317d-42ec-ae70-4c41afe0dbd0"
},
"source_output": "$file_diff_list",
"destination_node": {
"$component_ref": "f127a297-842d-4d17-bc89-4704019458d7"
},
"destination_input": "$file_diff_list"
},
{
"component_type": "DataFlowEdge",
"id": "84d3a783-38c8-4d53-bc0b-4205732d1fbf",
"name": "step_0_$filepath_list_to_None End node_$filepath_list_data_flow_edge",
"description": null,
"metadata": {},
"source_node": {
"$component_ref": "f127a297-842d-4d17-bc89-4704019458d7"
},
"source_output": "$filepath_list",
"destination_node": {
"$component_ref": "6f62aecf-03a1-4e38-b551-8eef0efaf4bb"
},
"destination_input": "$filepath_list"
},
{
"component_type": "DataFlowEdge",
"id": "b7ffd4c3-4a03-47f0-95fc-0ba670010729",
"name": "step_0_$nested_comment_list_to_None End node_$nested_comment_list_data_flow_edge",
"description": null,
"metadata": {},
"source_node": {
"$component_ref": "f127a297-842d-4d17-bc89-4704019458d7"
},
"source_output": "$nested_comment_list",
"destination_node": {
"$component_ref": "6f62aecf-03a1-4e38-b551-8eef0efaf4bb"
},
"destination_input": "$nested_comment_list"
}
],
"$referenced_components": {
"f127a297-842d-4d17-bc89-4704019458d7": {
"component_type": "ExtendedMapNode",
"id": "f127a297-842d-4d17-bc89-4704019458d7",
"name": "step_0",
"description": "",
"metadata": {
"__metadata_info__": {}
},
"inputs": [
{
"description": "iterated input for the map step",
"type": "array",
"items": {
"description": "\"message\" input variable for the template",
"title": "message"
},
"title": "$file_diff_list"
}
],
"outputs": [
{
"type": "array",
"items": {},
"title": "$nested_comment_list"
},
{
"type": "array",
"items": {
"type": "string"
},
"title": "$filepath_list"
}
],
"branches": [
"next"
],
"input_mapping": {
"iterated_input": "$file_diff_list"
},
"output_mapping": {
"$extracted_comments": "$nested_comment_list",
"$filename": "$filepath_list"
},
"flow": {
"component_type": "Flow",
"id": "3da67cce-b8de-40be-bb8d-e1edead178f0",
"name": "Generate review comments flow",
"description": "",
"metadata": {
"__metadata_info__": {}
},
"inputs": [
{
"description": "\"message\" input variable for the template",
"title": "message"
}
],
"outputs": [
{
"description": "The extracted comments content and line number",
"type": "array",
"items": {
"type": "object",
"additionalProperties": {},
"key_type": {
"type": "string"
}
},
"title": "$extracted_comments"
},
{
"description": "the generated text",
"type": "string",
"title": "$json_comments"
},
{
"type": "string",
"title": "$diff_with_lines"
},
{
"description": "the first extracted value using the regex \"diff --git a/(.+?) b/\" from the raw input",
"type": "string",
"title": "$filename",
"default": ""
},
{
"description": "the message added to the messages list",
"type": "string",
"title": "$diff_to_string"
}
],
"start_node": {
"$component_ref": "e20f5870-d594-4089-9fcd-08146232910d"
},
"nodes": [
{
"$component_ref": "f0fb3ab4-a950-43b6-a583-6f0044f18c7f"
},
{
"$component_ref": "6000ee3f-ac80-4937-b36c-94fd65cdcda4"
},
{
"$component_ref": "6f6dc822-9352-47ae-9b48-173402a334fe"
},
{
"$component_ref": "0ce752d7-3ef1-481b-bb01-c7081ef86103"
},
{
"$component_ref": "48057b9c-bee7-4286-baf5-625b6f1a6f1a"
},
{
"$component_ref": "e20f5870-d594-4089-9fcd-08146232910d"
},
{
"$component_ref": "39f36227-8910-414c-8b6b-517c0d65b0d8"
}
],
"control_flow_connections": [
{
"component_type": "ControlFlowEdge",
"id": "becf6951-96fd-4152-97d0-4a4eff042a29",
"name": "format_diff_to_string_to_add_lines_on_diff_control_flow_edge",
"description": null,
"metadata": {
"__metadata_info__": {}
},
"from_node": {
"$component_ref": "f0fb3ab4-a950-43b6-a583-6f0044f18c7f"
},
"from_branch": null,
"to_node": {
"$component_ref": "6000ee3f-ac80-4937-b36c-94fd65cdcda4"
}
},
{
"component_type": "ControlFlowEdge",
"id": "c197b0d5-8002-4910-ae8d-61f97f1f8f26",
"name": "add_lines_on_diff_to_extract_file_path_control_flow_edge",
"description": null,
"metadata": {
"__metadata_info__": {}
},
"from_node": {
"$component_ref": "6000ee3f-ac80-4937-b36c-94fd65cdcda4"
},
"from_branch": null,
"to_node": {
"$component_ref": "6f6dc822-9352-47ae-9b48-173402a334fe"
}
},
{
"component_type": "ControlFlowEdge",
"id": "406e0670-cc49-4da4-8d15-8c1c320193e8",
"name": "extract_file_path_to_generate_comments_control_flow_edge",
"description": null,
"metadata": {
"__metadata_info__": {}
},
"from_node": {
"$component_ref": "6f6dc822-9352-47ae-9b48-173402a334fe"
},
"from_branch": null,
"to_node": {
"$component_ref": "0ce752d7-3ef1-481b-bb01-c7081ef86103"
}
},
{
"component_type": "ControlFlowEdge",
"id": "e54eb347-2e6c-42c4-a7d6-a42c8059bdf3",
"name": "generate_comments_to_extract_comments_from_json_control_flow_edge",
"description": null,
"metadata": {
"__metadata_info__": {}
},
"from_node": {
"$component_ref": "0ce752d7-3ef1-481b-bb01-c7081ef86103"
},
"from_branch": null,
"to_node": {
"$component_ref": "48057b9c-bee7-4286-baf5-625b6f1a6f1a"
}
},
{
"component_type": "ControlFlowEdge",
"id": "ebe5e60b-2724-4b51-b287-79f3e8e7fdd1",
"name": "__StartStep___to_format_diff_to_string_control_flow_edge",
"description": null,
"metadata": {
"__metadata_info__": {}
},
"from_node": {
"$component_ref": "e20f5870-d594-4089-9fcd-08146232910d"
},
"from_branch": null,
"to_node": {
"$component_ref": "f0fb3ab4-a950-43b6-a583-6f0044f18c7f"
}
},
{
"component_type": "ControlFlowEdge",
"id": "98e7631e-7206-4ba9-b5b0-eb308ac89c0f",
"name": "extract_comments_from_json_to_None End node_control_flow_edge",
"description": null,
"metadata": {},
"from_node": {
"$component_ref": "48057b9c-bee7-4286-baf5-625b6f1a6f1a"
},
"from_branch": null,
"to_node": {
"$component_ref": "39f36227-8910-414c-8b6b-517c0d65b0d8"
}
}
],
"data_flow_connections": [
{
"component_type": "DataFlowEdge",
"id": "ab8ed6de-3ea7-424e-a830-bca10ac57a32",
"name": "format_diff_to_string_$diff_to_string_to_add_lines_on_diff_$diff_to_string_data_flow_edge",
"description": null,
"metadata": {
"__metadata_info__": {}
},
"source_node": {
"$component_ref": "f0fb3ab4-a950-43b6-a583-6f0044f18c7f"
},
"source_output": "$diff_to_string",
"destination_node": {
"$component_ref": "6000ee3f-ac80-4937-b36c-94fd65cdcda4"
},
"destination_input": "$diff_to_string"
},
{
"component_type": "DataFlowEdge",
"id": "3caaa171-9b4b-44df-8ebd-4d060329f91a",
"name": "format_diff_to_string_$diff_to_string_to_extract_file_path_$diff_to_string_data_flow_edge",
"description": null,
"metadata": {
"__metadata_info__": {}
},
"source_node": {
"$component_ref": "f0fb3ab4-a950-43b6-a583-6f0044f18c7f"
},
"source_output": "$diff_to_string",
"destination_node": {
"$component_ref": "6f6dc822-9352-47ae-9b48-173402a334fe"
},
"destination_input": "$diff_to_string"
},
{
"component_type": "DataFlowEdge",
"id": "cdf0945b-5a96-42ff-b410-f7c56b5f8e45",
"name": "add_lines_on_diff_$diff_with_lines_to_generate_comments_$diff_with_lines_data_flow_edge",
"description": null,
"metadata": {
"__metadata_info__": {}
},
"source_node": {
"$component_ref": "6000ee3f-ac80-4937-b36c-94fd65cdcda4"
},
"source_output": "$diff_with_lines",
"destination_node": {
"$component_ref": "0ce752d7-3ef1-481b-bb01-c7081ef86103"
},
"destination_input": "$diff_with_lines"
},
{
"component_type": "DataFlowEdge",
"id": "ca6ed62b-6f6a-405f-9f16-5e1304de6608",
"name": "extract_file_path_$filename_to_generate_comments_$filename_data_flow_edge",
"description": null,
"metadata": {
"__metadata_info__": {}
},
"source_node": {
"$component_ref": "6f6dc822-9352-47ae-9b48-173402a334fe"
},
"source_output": "$filename",
"destination_node": {
"$component_ref": "0ce752d7-3ef1-481b-bb01-c7081ef86103"
},
"destination_input": "$filename"
},
{
"component_type": "DataFlowEdge",
"id": "dec4b4bb-56c9-445a-a282-9d095ff6038e",
"name": "generate_comments_$json_comments_to_extract_comments_from_json_$json_comments_data_flow_edge",
"description": null,
"metadata": {
"__metadata_info__": {}
},
"source_node": {
"$component_ref": "0ce752d7-3ef1-481b-bb01-c7081ef86103"
},
"source_output": "$json_comments",
"destination_node": {
"$component_ref": "48057b9c-bee7-4286-baf5-625b6f1a6f1a"
},
"destination_input": "$json_comments"
},
{
"component_type": "DataFlowEdge",
"id": "611478d7-281a-4587-81e6-97e8c745da53",
"name": "__StartStep___message_to_format_diff_to_string_message_data_flow_edge",
"description": null,
"metadata": {
"__metadata_info__": {}
},
"source_node": {
"$component_ref": "e20f5870-d594-4089-9fcd-08146232910d"
},
"source_output": "message",
"destination_node": {
"$component_ref": "f0fb3ab4-a950-43b6-a583-6f0044f18c7f"
},
"destination_input": "message"
},
{
"component_type": "DataFlowEdge",
"id": "227ae098-0baf-4fe8-9615-094bb386c9a9",
"name": "extract_comments_from_json_$extracted_comments_to_None End node_$extracted_comments_data_flow_edge",
"description": null,
"metadata": {},
"source_node": {
"$component_ref": "48057b9c-bee7-4286-baf5-625b6f1a6f1a"
},
"source_output": "$extracted_comments",
"destination_node": {
"$component_ref": "39f36227-8910-414c-8b6b-517c0d65b0d8"
},
"destination_input": "$extracted_comments"
},
{
"component_type": "DataFlowEdge",
"id": "6e25b4d8-5656-471b-8ffa-1fe8cfffbc05",
"name": "generate_comments_$json_comments_to_None End node_$json_comments_data_flow_edge",
"description": null,
"metadata": {},
"source_node": {
"$component_ref": "0ce752d7-3ef1-481b-bb01-c7081ef86103"
},
"source_output": "$json_comments",
"destination_node": {
"$component_ref": "39f36227-8910-414c-8b6b-517c0d65b0d8"
},
"destination_input": "$json_comments"
},
{
"component_type": "DataFlowEdge",
"id": "fdbf1eeb-0278-4dc8-b897-c924937a1692",
"name": "add_lines_on_diff_$diff_with_lines_to_None End node_$diff_with_lines_data_flow_edge",
"description": null,
"metadata": {},
"source_node": {
"$component_ref": "6000ee3f-ac80-4937-b36c-94fd65cdcda4"
},
"source_output": "$diff_with_lines",
"destination_node": {
"$component_ref": "39f36227-8910-414c-8b6b-517c0d65b0d8"
},
"destination_input": "$diff_with_lines"
},
{
"component_type": "DataFlowEdge",
"id": "3b6bcba7-635b-45fa-b450-cf0a15dae463",
"name": "extract_file_path_$filename_to_None End node_$filename_data_flow_edge",
"description": null,
"metadata": {},
"source_node": {
"$component_ref": "6f6dc822-9352-47ae-9b48-173402a334fe"
},
"source_output": "$filename",
"destination_node": {
"$component_ref": "39f36227-8910-414c-8b6b-517c0d65b0d8"
},
"destination_input": "$filename"
},
{
"component_type": "DataFlowEdge",
"id": "2f95704b-4cc1-4983-8a20-e39c79a94e01",
"name": "format_diff_to_string_$diff_to_string_to_None End node_$diff_to_string_data_flow_edge",
"description": null,
"metadata": {},
"source_node": {
"$component_ref": "f0fb3ab4-a950-43b6-a583-6f0044f18c7f"
},
"source_output": "$diff_to_string",
"destination_node": {
"$component_ref": "39f36227-8910-414c-8b6b-517c0d65b0d8"
},
"destination_input": "$diff_to_string"
}
],
"$referenced_components": {
"6000ee3f-ac80-4937-b36c-94fd65cdcda4": {
"component_type": "ExtendedToolNode",
"id": "6000ee3f-ac80-4937-b36c-94fd65cdcda4",
"name": "add_lines_on_diff",
"description": "",
"metadata": {
"__metadata_info__": {}
},
"inputs": [
{
"type": "string",
"title": "$diff_to_string"
}
],
"outputs": [
{
"type": "string",
"title": "$diff_with_lines"
}
],
"branches": [
"next"
],
"tool": {
"component_type": "ServerTool",
"id": "e936566f-7a25-40f3-9434-3e740a7bfb02",
"name": "format_git_diff",
"description": "Formats a git diff by adding line numbers to each line except removal lines.",
"metadata": {
"__metadata_info__": {}
},
"inputs": [
{
"type": "string",
"title": "diff_text"
}
],
"outputs": [
{
"type": "string",
"title": "tool_output"
}
]
},
"input_mapping": {
"diff_text": "$diff_to_string"
},
"output_mapping": {
"tool_output": "$diff_with_lines"
},
"raise_exceptions": false,
"component_plugin_name": "NodesPlugin",
"component_plugin_version": "25.4.0.dev0"
},
"f0fb3ab4-a950-43b6-a583-6f0044f18c7f": {
"component_type": "PluginOutputMessageNode",
"id": "f0fb3ab4-a950-43b6-a583-6f0044f18c7f",
"name": "format_diff_to_string",
"description": "",
"metadata": {
"__metadata_info__": {}
},
"inputs": [
{
"description": "\"message\" input variable for the template",
"title": "message"
}
],
"outputs": [
{
"description": "the message added to the messages list",
"type": "string",
"title": "$diff_to_string"
}
],
"branches": [
"next"
],
"expose_message_as_output": true,
"message": "{{ message | string }}",
"input_mapping": {},
"output_mapping": {
"output_message": "$diff_to_string"
},
"message_type": "AGENT",
"rephrase": false,
"llm_config": null,
"component_plugin_name": "NodesPlugin",
"component_plugin_version": "25.4.0.dev0"
},
"6f6dc822-9352-47ae-9b48-173402a334fe": {
"component_type": "PluginRegexNode",
"id": "6f6dc822-9352-47ae-9b48-173402a334fe",
"name": "extract_file_path",
"description": "",
"metadata": {
"__metadata_info__": {}
},
"inputs": [
{
"description": "raw text to extract information from",
"type": "string",
"title": "$diff_to_string"
}
],
"outputs": [
{
"description": "the first extracted value using the regex \"diff --git a/(.+?) b/\" from the raw input",
"type": "string",
"title": "$filename",
"default": ""
}
],
"branches": [
"next"
],
"input_mapping": {
"text": "$diff_to_string"
},
"output_mapping": {
"output": "$filename"
},
"regex_pattern": "diff --git a/(.+?) b/",
"return_first_match_only": true,
"component_plugin_name": "NodesPlugin",
"component_plugin_version": "25.4.0.dev0"
},
"0ce752d7-3ef1-481b-bb01-c7081ef86103": {
"component_type": "ExtendedLlmNode",
"id": "0ce752d7-3ef1-481b-bb01-c7081ef86103",
"name": "generate_comments",
"description": "",
"metadata": {
"__metadata_info__": {}
},
"inputs": [
{
"description": "\"filename\" input variable for the template",
"type": "string",
"title": "$filename"
},
{
"description": "\"diff\" input variable for the template",
"type": "string",
"title": "$diff_with_lines"
}
],
"outputs": [
{
"description": "the generated text",
"type": "string",
"title": "$json_comments"
}
],
"branches": [
"next"
],
"llm_config": {
"component_type": "VllmConfig",
"id": "fb043839-1e69-404c-a178-d8c3de0bfe20",
"name": "LLAMA_MODEL_ID",
"description": null,
"metadata": {
"__metadata_info__": {}
},
"default_generation_parameters": null,
"url": "LLAMA_API_URL",
"model_id": "LLAMA_MODEL_ID"
},
"prompt_template": "You are a very experienced code reviewer. You are given a git diff on a file: {{ filename }}\n\n## Context\nThe git diff contains all changes of a single file. All lines are prepended with their number. Lines without line number where removed from the file.\nAfter the line number, a line that was changed has a \"+\" before the code. All lines without a \"+\" are just here for context, you will not comment on them.\n\n## Input\n### Code diff\n{{ diff }}\n\n## Task\nYour task is to review these changes, according to different rules. Only comment lines that were added, so the lines that have a + just after the line number.\nThe rules are the following:\n\n\nName: TODO_WITHOUT_TICKET\nDescription: TODO comments should reference a ticket number for tracking.\nExample code:\n```python\n# TODO: Add validation here\ndef process_user_input(data):\n return data\n```\nExample comment:\n[BOT] TODO_WITHOUT_TICKET: TODO comment should reference a ticket number for tracking (e.g., \"TODO: Add validation here (TICKET-1234)\").\n\n\n---\n\n\nName: MUTABLE_DEFAULT_ARGUMENT\nDescription: Using mutable objects as default arguments can lead to unexpected behavior.\nExample code:\n```python\ndef add_item(item, items=[]):\n items.append(item)\n return items\n```\nExample comment:\n[BOT] MUTABLE_DEFAULT_ARGUMENT: Avoid using mutable default arguments. 
Use None and initialize in the function: `def add_item(item, items=None): items = items or []`\n\n\n---\n\n\nName: NON_DESCRIPTIVE_NAME\nDescription: Variable names should clearly indicate their purpose or content.\nExample code:\n```python\ndef process(lst):\n res = []\n for i in lst:\n res.append(i * 2)\n return res\n```\nExample comment:\n[BOT] NON_DESCRIPTIVE_NAME: Use more descriptive names: 'lst' could be 'numbers', 'res' could be 'doubled_numbers', 'i' could be 'number'\n\n\n### Response Format\nYou need to return a review as a json as follows:\n```json\n[\n {\n \"content\": \"the comment as a text\",\n \"suggestion\": \"if the change you propose is a single line, then put here the single line rewritten that includes your proposal change. IMPORTANT: a single line, which will erase the current line. Put empty string if no suggestion of if the suggestion is more than a single line\",\n \"line\": \"line number where the comment applies\"\n },\n \u2026\n]\n```\nPlease use triple backticks ``` to delimitate your JSON list of comments. Don't output more than 5 comments, only comment the most relevant sections.\nIf there are no comments and the code seems fine, just output an empty JSON list.",
"input_mapping": {
"diff": "$diff_with_lines",
"filename": "$filename"
},
"output_mapping": {
"output": "$json_comments"
},
"prompt_template_object": null,
"send_message": false,
"component_plugin_name": "NodesPlugin",
"component_plugin_version": "25.4.0.dev0"
},
"48057b9c-bee7-4286-baf5-625b6f1a6f1a": {
"component_type": "PluginExtractNode",
"id": "48057b9c-bee7-4286-baf5-625b6f1a6f1a",
"name": "extract_comments_from_json",
"description": "",
"metadata": {
"__metadata_info__": {}
},
"inputs": [
{
"description": "raw text to extract information from",
"type": "string",
"title": "$json_comments"
}
],
"outputs": [
{
"description": "The extracted comments content and line number",
"type": "array",
"items": {
"type": "object",
"additionalProperties": {},
"key_type": {
"type": "string"
}
},
"title": "$extracted_comments"
}
],
"branches": [
"next"
],
"input_mapping": {
"text": "$json_comments"
},
"output_mapping": {
"values": "$extracted_comments"
},
"output_values": {
"values": "[.[] | {\"content\": .[\"content\"], \"line\": .[\"line\"]}]"
},
"component_plugin_name": "NodesPlugin",
"component_plugin_version": "25.4.0.dev0"
},
"e20f5870-d594-4089-9fcd-08146232910d": {
"component_type": "StartNode",
"id": "e20f5870-d594-4089-9fcd-08146232910d",
"name": "__StartStep__",
"description": "",
"metadata": {
"__metadata_info__": {}
},
"inputs": [
{
"description": "\"message\" input variable for the template",
"title": "message"
}
],
"outputs": [
{
"description": "\"message\" input variable for the template",
"title": "message"
}
],
"branches": [
"next"
]
},
"39f36227-8910-414c-8b6b-517c0d65b0d8": {
"component_type": "EndNode",
"id": "39f36227-8910-414c-8b6b-517c0d65b0d8",
"name": "None End node",
"description": "End node representing all transitions to None in the WayFlow flow",
"metadata": {},
"inputs": [
{
"description": "The extracted comments content and line number",
"type": "array",
"items": {
"type": "object",
"additionalProperties": {},
"key_type": {
"type": "string"
}
},
"title": "$extracted_comments"
},
{
"description": "the generated text",
"type": "string",
"title": "$json_comments"
},
{
"type": "string",
"title": "$diff_with_lines"
},
{
"description": "the first extracted value using the regex \"diff --git a/(.+?) b/\" from the raw input",
"type": "string",
"title": "$filename",
"default": ""
},
{
"description": "the message added to the messages list",
"type": "string",
"title": "$diff_to_string"
}
],
"outputs": [
{
"description": "The extracted comments content and line number",
"type": "array",
"items": {
"type": "object",
"additionalProperties": {},
"key_type": {
"type": "string"
}
},
"title": "$extracted_comments"
},
{
"description": "the generated text",
"type": "string",
"title": "$json_comments"
},
{
"type": "string",
"title": "$diff_with_lines"
},
{
"description": "the first extracted value using the regex \"diff --git a/(.+?) b/\" from the raw input",
"type": "string",
"title": "$filename",
"default": ""
},
{
"description": "the message added to the messages list",
"type": "string",
"title": "$diff_to_string"
}
],
"branches": [],
"branch_name": "next"
}
}
},
"unpack_input": {
"message": "."
},
"parallel_execution": false,
"component_plugin_name": "NodesPlugin",
"component_plugin_version": "25.4.0.dev0"
},
"367ae568-317d-42ec-ae70-4c41afe0dbd0": {
"component_type": "StartNode",
"id": "367ae568-317d-42ec-ae70-4c41afe0dbd0",
"name": "__StartStep__",
"description": "",
"metadata": {
"__metadata_info__": {}
},
"inputs": [
{
"description": "iterated input for the map step",
"type": "array",
"items": {
"description": "\"message\" input variable for the template",
"title": "message"
},
"title": "$file_diff_list"
}
],
"outputs": [
{
"description": "iterated input for the map step",
"type": "array",
"items": {
"description": "\"message\" input variable for the template",
"title": "message"
},
"title": "$file_diff_list"
}
],
"branches": [
"next"
]
},
"6f62aecf-03a1-4e38-b551-8eef0efaf4bb": {
"component_type": "EndNode",
"id": "6f62aecf-03a1-4e38-b551-8eef0efaf4bb",
"name": "None End node",
"description": "End node representing all transitions to None in the WayFlow flow",
"metadata": {},
"inputs": [
{
"type": "array",
"items": {
"type": "string"
},
"title": "$filepath_list"
},
{
"type": "array",
"items": {},
"title": "$nested_comment_list"
}
],
"outputs": [
{
"type": "array",
"items": {
"type": "string"
},
"title": "$filepath_list"
},
{
"type": "array",
"items": {},
"title": "$nested_comment_list"
}
],
"branches": [],
"branch_name": "next"
}
}
}
},
"47e367be-4d74-49dc-ac3b-89bb97ffa7df": {
"component_type": "FlowNode",
"id": "47e367be-4d74-49dc-ac3b-89bb97ffa7df",
"name": "retrieve_diff_flowstep",
"description": "",
"metadata": {
"__metadata_info__": {}
},
"inputs": [
{
"type": "string",
"title": "$repo_dirpath_io"
}
],
"outputs": [
{
"type": "string",
"title": "$raw_pr_diff"
},
{
"description": "the list of extracted value using the regex \"(diff --git[\\s\\S]*?)(?=diff --git|$)\" from the raw input",
"type": "array",
"items": {
"type": "string"
},
"title": "$file_diff_list",
"default": []
}
],
"branches": [
"next"
],
"subflow": {
"component_type": "Flow",
"id": "9e7aed22-876c-4c32-9d44-20ee7ceb3771",
"name": "Retrieve PR diff flow",
"description": "",
"metadata": {
"__metadata_info__": {}
},
"inputs": [
{
"type": "string",
"title": "$repo_dirpath_io"
}
],
"outputs": [
{
"type": "string",
"title": "$raw_pr_diff"
},
{
"description": "the list of extracted value using the regex \"(diff --git[\\s\\S]*?)(?=diff --git|$)\" from the raw input",
"type": "array",
"items": {
"type": "string"
},
"title": "$file_diff_list",
"default": []
}
],
"start_node": {
"$component_ref": "4fcb7ebe-325b-446d-a46b-59187c30e260"
},
"nodes": [
{
"$component_ref": "4fcb7ebe-325b-446d-a46b-59187c30e260"
},
{
"$component_ref": "5c73da9c-6ba9-44ce-aab1-212a78d0a720"
},
{
"$component_ref": "cf841053-2414-48b6-ba6d-0f0f5e11044c"
},
{
"$component_ref": "dd0e56ab-1267-4345-9f59-ecc053baf2af"
}
],
"control_flow_connections": [
{
"component_type": "ControlFlowEdge",
"id": "60dc14b8-d9b9-4aec-a958-9f3676848f48",
"name": "start_step_to_get_pr_diff_control_flow_edge",
"description": null,
"metadata": {
"__metadata_info__": {}
},
"from_node": {
"$component_ref": "4fcb7ebe-325b-446d-a46b-59187c30e260"
},
"from_branch": null,
"to_node": {
"$component_ref": "5c73da9c-6ba9-44ce-aab1-212a78d0a720"
}
},
{
"component_type": "ControlFlowEdge",
"id": "500f97de-78b1-42e0-944c-0375dfca734e",
"name": "get_pr_diff_to_extract_into_list_of_file_diff_control_flow_edge",
"description": null,
"metadata": {
"__metadata_info__": {}
},
"from_node": {
"$component_ref": "5c73da9c-6ba9-44ce-aab1-212a78d0a720"
},
"from_branch": null,
"to_node": {
"$component_ref": "cf841053-2414-48b6-ba6d-0f0f5e11044c"
}
},
{
"component_type": "ControlFlowEdge",
"id": "22d0cf0d-8edb-4b04-8f54-a234f5705360",
"name": "extract_into_list_of_file_diff_to_None End node_control_flow_edge",
"description": null,
"metadata": {},
"from_node": {
"$component_ref": "cf841053-2414-48b6-ba6d-0f0f5e11044c"
},
"from_branch": null,
"to_node": {
"$component_ref": "dd0e56ab-1267-4345-9f59-ecc053baf2af"
}
}
],
"data_flow_connections": [
{
"component_type": "DataFlowEdge",
"id": "106e3740-de45-4472-8168-2873ae1dbc82",
"name": "start_step_$repo_dirpath_io_to_get_pr_diff_$repo_dirpath_io_data_flow_edge",
"description": null,
"metadata": {
"__metadata_info__": {}
},
"source_node": {
"$component_ref": "4fcb7ebe-325b-446d-a46b-59187c30e260"
},
"source_output": "$repo_dirpath_io",
"destination_node": {
"$component_ref": "5c73da9c-6ba9-44ce-aab1-212a78d0a720"
},
"destination_input": "$repo_dirpath_io"
},
{
"component_type": "DataFlowEdge",
"id": "a32cbb1c-eafe-4138-80e2-2cf2e1248312",
"name": "get_pr_diff_$raw_pr_diff_to_extract_into_list_of_file_diff_$raw_pr_diff_data_flow_edge",
"description": null,
"metadata": {
"__metadata_info__": {}
},
"source_node": {
"$component_ref": "5c73da9c-6ba9-44ce-aab1-212a78d0a720"
},
"source_output": "$raw_pr_diff",
"destination_node": {
"$component_ref": "cf841053-2414-48b6-ba6d-0f0f5e11044c"
},
"destination_input": "$raw_pr_diff"
},
{
"component_type": "DataFlowEdge",
"id": "3ef5dcf4-acdf-4962-8df6-07b53f249e18",
"name": "get_pr_diff_$raw_pr_diff_to_None End node_$raw_pr_diff_data_flow_edge",
"description": null,
"metadata": {},
"source_node": {
"$component_ref": "5c73da9c-6ba9-44ce-aab1-212a78d0a720"
},
"source_output": "$raw_pr_diff",
"destination_node": {
"$component_ref": "dd0e56ab-1267-4345-9f59-ecc053baf2af"
},
"destination_input": "$raw_pr_diff"
},
{
"component_type": "DataFlowEdge",
"id": "08cbca39-e591-4cf4-9057-ae67938d9557",
"name": "extract_into_list_of_file_diff_$file_diff_list_to_None End node_$file_diff_list_data_flow_edge",
"description": null,
"metadata": {},
"source_node": {
"$component_ref": "cf841053-2414-48b6-ba6d-0f0f5e11044c"
},
"source_output": "$file_diff_list",
"destination_node": {
"$component_ref": "dd0e56ab-1267-4345-9f59-ecc053baf2af"
},
"destination_input": "$file_diff_list"
}
],
"$referenced_components": {
"5c73da9c-6ba9-44ce-aab1-212a78d0a720": {
"component_type": "ExtendedToolNode",
"id": "5c73da9c-6ba9-44ce-aab1-212a78d0a720",
"name": "get_pr_diff",
"description": "",
"metadata": {
"__metadata_info__": {}
},
"inputs": [
{
"type": "string",
"title": "$repo_dirpath_io"
}
],
"outputs": [
{
"type": "string",
"title": "$raw_pr_diff"
}
],
"branches": [
"next"
],
"tool": {
"component_type": "ServerTool",
"id": "275aaf19-cdd4-4ed7-a436-e53f922cd740",
"name": "local_get_pr_diff_tool",
"description": "Retrieves code diff with a git command given the\npath to the repository root folder.",
"metadata": {
"__metadata_info__": {}
},
"inputs": [
{
"type": "string",
"title": "repo_dirpath"
}
],
"outputs": [
{
"type": "string",
"title": "tool_output"
}
]
},
"input_mapping": {
"repo_dirpath": "$repo_dirpath_io"
},
"output_mapping": {
"tool_output": "$raw_pr_diff"
},
"raise_exceptions": true,
"component_plugin_name": "NodesPlugin",
"component_plugin_version": "25.4.0.dev0"
},
"4fcb7ebe-325b-446d-a46b-59187c30e260": {
"component_type": "StartNode",
"id": "4fcb7ebe-325b-446d-a46b-59187c30e260",
"name": "start_step",
"description": "",
"metadata": {
"__metadata_info__": {}
},
"inputs": [
{
"type": "string",
"title": "$repo_dirpath_io"
}
],
"outputs": [
{
"type": "string",
"title": "$repo_dirpath_io"
}
],
"branches": [
"next"
]
},
"cf841053-2414-48b6-ba6d-0f0f5e11044c": {
"component_type": "PluginRegexNode",
"id": "cf841053-2414-48b6-ba6d-0f0f5e11044c",
"name": "extract_into_list_of_file_diff",
"description": "",
"metadata": {
"__metadata_info__": {}
},
"inputs": [
{
"description": "raw text to extract information from",
"type": "string",
"title": "$raw_pr_diff"
}
],
"outputs": [
{
"description": "the list of extracted value using the regex \"(diff --git[\\s\\S]*?)(?=diff --git|$)\" from the raw input",
"type": "array",
"items": {
"type": "string"
},
"title": "$file_diff_list",
"default": []
}
],
"branches": [
"next"
],
"input_mapping": {
"text": "$raw_pr_diff"
},
"output_mapping": {
"output": "$file_diff_list"
},
"regex_pattern": "(diff --git[\\s\\S]*?)(?=diff --git|$)",
"return_first_match_only": false,
"component_plugin_name": "NodesPlugin",
"component_plugin_version": "25.4.0.dev0"
},
"dd0e56ab-1267-4345-9f59-ecc053baf2af": {
"component_type": "EndNode",
"id": "dd0e56ab-1267-4345-9f59-ecc053baf2af",
"name": "None End node",
"description": "End node representing all transitions to None in the WayFlow flow",
"metadata": {},
"inputs": [
{
"type": "string",
"title": "$raw_pr_diff"
},
{
"description": "the list of extracted value using the regex \"(diff --git[\\s\\S]*?)(?=diff --git|$)\" from the raw input",
"type": "array",
"items": {
"type": "string"
},
"title": "$file_diff_list",
"default": []
}
],
"outputs": [
{
"type": "string",
"title": "$raw_pr_diff"
},
{
"description": "the list of extracted value using the regex \"(diff --git[\\s\\S]*?)(?=diff --git|$)\" from the raw input",
"type": "array",
"items": {
"type": "string"
},
"title": "$file_diff_list",
"default": []
}
],
"branches": [],
"branch_name": "next"
}
}
}
},
"020c885e-6d0b-472a-bb91-246ab70ab1db": {
"component_type": "StartNode",
"id": "020c885e-6d0b-472a-bb91-246ab70ab1db",
"name": "__StartStep__",
"description": "",
"metadata": {
"__metadata_info__": {}
},
"inputs": [
{
"type": "string",
"title": "$repo_dirpath_io"
}
],
"outputs": [
{
"type": "string",
"title": "$repo_dirpath_io"
}
],
"branches": [
"next"
]
},
"a544af64-e63b-4ccf-9ab0-8d25cdbc0b93": {
"component_type": "EndNode",
"id": "a544af64-e63b-4ccf-9ab0-8d25cdbc0b93",
"name": "None End node",
"description": "End node representing all transitions to None in the WayFlow flow",
"metadata": {},
"inputs": [
{
"type": "array",
"items": {
"type": "string"
},
"title": "$filepath_list"
},
{
"type": "array",
"items": {},
"title": "$nested_comment_list"
},
{
"type": "string",
"title": "$raw_pr_diff"
},
{
"description": "the list of extracted value using the regex \"(diff --git[\\s\\S]*?)(?=diff --git|$)\" from the raw input",
"type": "array",
"items": {
"type": "string"
},
"title": "$file_diff_list",
"default": []
}
],
"outputs": [
{
"type": "array",
"items": {
"type": "string"
},
"title": "$filepath_list"
},
{
"type": "array",
"items": {},
"title": "$nested_comment_list"
},
{
"type": "string",
"title": "$raw_pr_diff"
},
{
"description": "the list of extracted value using the regex \"(diff --git[\\s\\S]*?)(?=diff --git|$)\" from the raw input",
"type": "array",
"items": {
"type": "string"
},
"title": "$file_diff_list",
"default": []
}
],
"branches": [],
"branch_name": "next"
}
},
"agentspec_version": "25.4.1"
}
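Apart from `generate_comments`, the steps serialized above are deterministic text processing. `extract_into_list_of_file_diff` and `extract_file_path` are regex nodes, and `extract_comments_from_json` applies a jq-style filter. The sketch below reproduces those steps in plain Python so you can try the patterns on a sample diff before running the flow. It makes a few assumptions:

* The regex nodes behave like Python's `re.findall` (all matches) and `re.search` (first match only).
* The extraction step amounts to parsing the model's JSON list and keeping only the `content` and `line` keys.
* The model wraps its list in a json fence, as the prompt requests.

The helper names `split_file_diffs`, `extract_filename` and `extract_comments` are hypothetical and are not WayFlow APIs.

```python
import json
import re

# Patterns copied verbatim from the serialized flow above.
FILE_DIFF_PATTERN = r"(diff --git[\s\S]*?)(?=diff --git|$)"
FILENAME_PATTERN = r"diff --git a/(.+?) b/"
# Assumption: the LLM wraps its JSON list in a ```json fence, as the prompt asks.
FENCE_PATTERN = r"```(?:json)?\s*([\s\S]*?)```"


def split_file_diffs(raw_pr_diff: str) -> list[str]:
    """Like `extract_into_list_of_file_diff`: return every per-file diff."""
    return re.findall(FILE_DIFF_PATTERN, raw_pr_diff)


def extract_filename(file_diff: str) -> str:
    """Like `extract_file_path`: first match only, empty string by default."""
    match = re.search(FILENAME_PATTERN, file_diff)
    return match.group(1) if match else ""


def extract_comments(llm_output: str) -> list[dict]:
    """Rough equivalent of the jq filter
    '[.[] | {"content": .["content"], "line": .["line"]}]'
    applied to the JSON list produced by `generate_comments`."""
    fenced = re.search(FENCE_PATTERN, llm_output)
    # Fall back to the raw text if the model did not use a fence.
    payload = fenced.group(1) if fenced else llm_output
    return [
        {"content": item.get("content"), "line": item.get("line")}
        for item in json.loads(payload)
    ]


# Hypothetical two-file PR diff used only for illustration.
raw_pr_diff = (
    "diff --git a/app.py b/app.py\n+# TODO: add validation\n"
    "diff --git a/utils.py b/utils.py\n+def add_item(item, items=[]):\n"
)
file_diffs = split_file_diffs(raw_pr_diff)
print([extract_filename(d) for d in file_diffs])  # ['app.py', 'utils.py']

llm_output = '```json\n[{"content": "[BOT] TODO_WITHOUT_TICKET: add a ticket", "suggestion": "", "line": "1"}]\n```'
print(extract_comments(llm_output))
```

Keeping these steps as regex and JSON parsing, rather than extra LLM calls, means that only the comment generation step depends on the model's behaviour. The splitting, file-name extraction and comment parsing steps stay cheap and predictable.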
component_type: Flow
id: 9c65246d-a0dd-4ec4-801d-afd640b2488e
name: PR bot flow
description: ''
metadata:
__metadata_info__: {}
inputs:
- type: string
title: $repo_dirpath_io
outputs:
- type: array
items:
type: string
title: $filepath_list
- type: array
items: {}
title: $nested_comment_list
- type: string
title: $raw_pr_diff
- description: the list of extracted value using the regex "(diff --git[\s\S]*?)(?=diff
--git|$)" from the raw input
type: array
items:
type: string
title: $file_diff_list
default: []
start_node:
$component_ref: 020c885e-6d0b-472a-bb91-246ab70ab1db
nodes:
- $component_ref: 47e367be-4d74-49dc-ac3b-89bb97ffa7df
- $component_ref: 43d58c76-23a0-4d10-943d-f9c5e0835a7c
- $component_ref: 020c885e-6d0b-472a-bb91-246ab70ab1db
- $component_ref: a544af64-e63b-4ccf-9ab0-8d25cdbc0b93
control_flow_connections:
- component_type: ControlFlowEdge
id: a5c123ff-c14c-4291-b174-61d61170f187
name: retrieve_diff_flowstep_to_generate_comments_flowstep_control_flow_edge
description: null
metadata:
__metadata_info__: {}
from_node:
$component_ref: 47e367be-4d74-49dc-ac3b-89bb97ffa7df
from_branch: null
to_node:
$component_ref: 43d58c76-23a0-4d10-943d-f9c5e0835a7c
- component_type: ControlFlowEdge
id: 8a10b23a-2d0c-46c4-82ac-e66ad0b9399b
name: __StartStep___to_retrieve_diff_flowstep_control_flow_edge
description: null
metadata:
__metadata_info__: {}
from_node:
$component_ref: 020c885e-6d0b-472a-bb91-246ab70ab1db
from_branch: null
to_node:
$component_ref: 47e367be-4d74-49dc-ac3b-89bb97ffa7df
- component_type: ControlFlowEdge
id: dac07720-8a5a-4a61-b1e7-50be506ed937
name: generate_comments_flowstep_to_None End node_control_flow_edge
description: null
metadata: {}
from_node:
$component_ref: 43d58c76-23a0-4d10-943d-f9c5e0835a7c
from_branch: null
to_node:
$component_ref: a544af64-e63b-4ccf-9ab0-8d25cdbc0b93
data_flow_connections:
- component_type: DataFlowEdge
id: 7b12dfed-309b-46ff-8a2d-bb6f2a3154b6
name: retrieve_diff_flowstep_$file_diff_list_to_generate_comments_flowstep_$file_diff_list_data_flow_edge
description: null
metadata:
__metadata_info__: {}
source_node:
$component_ref: 47e367be-4d74-49dc-ac3b-89bb97ffa7df
source_output: $file_diff_list
destination_node:
$component_ref: 43d58c76-23a0-4d10-943d-f9c5e0835a7c
destination_input: $file_diff_list
- component_type: DataFlowEdge
id: 51122844-22d3-40a8-b652-1b020ce24945
name: __StartStep___$repo_dirpath_io_to_retrieve_diff_flowstep_$repo_dirpath_io_data_flow_edge
description: null
metadata:
__metadata_info__: {}
source_node:
$component_ref: 020c885e-6d0b-472a-bb91-246ab70ab1db
source_output: $repo_dirpath_io
destination_node:
$component_ref: 47e367be-4d74-49dc-ac3b-89bb97ffa7df
destination_input: $repo_dirpath_io
- component_type: DataFlowEdge
id: 72aa469c-98cd-4f0d-9496-0aa454373aef
name: generate_comments_flowstep_$filepath_list_to_None End node_$filepath_list_data_flow_edge
description: null
metadata: {}
source_node:
$component_ref: 43d58c76-23a0-4d10-943d-f9c5e0835a7c
source_output: $filepath_list
destination_node:
$component_ref: a544af64-e63b-4ccf-9ab0-8d25cdbc0b93
destination_input: $filepath_list
- component_type: DataFlowEdge
id: eac1b375-1541-41f7-87f3-f3e626cc2c9c
name: generate_comments_flowstep_$nested_comment_list_to_None End node_$nested_comment_list_data_flow_edge
description: null
metadata: {}
source_node:
$component_ref: 43d58c76-23a0-4d10-943d-f9c5e0835a7c
source_output: $nested_comment_list
destination_node:
$component_ref: a544af64-e63b-4ccf-9ab0-8d25cdbc0b93
destination_input: $nested_comment_list
- component_type: DataFlowEdge
id: 0869acb5-4d8f-4b17-b59b-3b915912b628
name: retrieve_diff_flowstep_$raw_pr_diff_to_None End node_$raw_pr_diff_data_flow_edge
description: null
metadata: {}
source_node:
$component_ref: 47e367be-4d74-49dc-ac3b-89bb97ffa7df
source_output: $raw_pr_diff
destination_node:
$component_ref: a544af64-e63b-4ccf-9ab0-8d25cdbc0b93
destination_input: $raw_pr_diff
- component_type: DataFlowEdge
id: 9fb2ab9e-ece1-4195-8f51-ef618dcb72bb
name: retrieve_diff_flowstep_$file_diff_list_to_None End node_$file_diff_list_data_flow_edge
description: null
metadata: {}
source_node:
$component_ref: 47e367be-4d74-49dc-ac3b-89bb97ffa7df
source_output: $file_diff_list
destination_node:
$component_ref: a544af64-e63b-4ccf-9ab0-8d25cdbc0b93
destination_input: $file_diff_list
$referenced_components:
43d58c76-23a0-4d10-943d-f9c5e0835a7c:
component_type: FlowNode
id: 43d58c76-23a0-4d10-943d-f9c5e0835a7c
name: generate_comments_flowstep
description: ''
metadata:
__metadata_info__: {}
inputs:
- description: iterated input for the map step
type: array
items:
description: '"message" input variable for the template'
title: message
title: $file_diff_list
outputs:
- type: array
items:
type: string
title: $filepath_list
- type: array
items: {}
title: $nested_comment_list
branches:
- next
subflow:
component_type: Flow
id: f95e0e5d-f573-4e25-9d68-8508371246f9
name: flow_028a7dfb__auto
description: ''
metadata:
__metadata_info__: {}
inputs:
- description: iterated input for the map step
type: array
items:
description: '"message" input variable for the template'
title: message
title: $file_diff_list
outputs:
- type: array
items:
type: string
title: $filepath_list
- type: array
items: {}
title: $nested_comment_list
start_node:
$component_ref: 367ae568-317d-42ec-ae70-4c41afe0dbd0
nodes:
- $component_ref: f127a297-842d-4d17-bc89-4704019458d7
- $component_ref: 367ae568-317d-42ec-ae70-4c41afe0dbd0
- $component_ref: 6f62aecf-03a1-4e38-b551-8eef0efaf4bb
control_flow_connections:
- component_type: ControlFlowEdge
id: 85a2cdff-6ad4-4f58-8d1c-c8deeb05880c
name: __StartStep___to_step_0_control_flow_edge
description: null
metadata:
__metadata_info__: {}
from_node:
$component_ref: 367ae568-317d-42ec-ae70-4c41afe0dbd0
from_branch: null
to_node:
$component_ref: f127a297-842d-4d17-bc89-4704019458d7
- component_type: ControlFlowEdge
id: 396e218f-225e-4e36-a33c-a176ca77d345
name: step_0_to_None End node_control_flow_edge
description: null
metadata: {}
from_node:
$component_ref: f127a297-842d-4d17-bc89-4704019458d7
from_branch: null
to_node:
$component_ref: 6f62aecf-03a1-4e38-b551-8eef0efaf4bb
data_flow_connections:
- component_type: DataFlowEdge
id: 6c8b8f78-b587-49ff-a401-6262cdafb0ee
name: __StartStep___$file_diff_list_to_step_0_$file_diff_list_data_flow_edge
description: null
metadata:
__metadata_info__: {}
source_node:
$component_ref: 367ae568-317d-42ec-ae70-4c41afe0dbd0
source_output: $file_diff_list
destination_node:
$component_ref: f127a297-842d-4d17-bc89-4704019458d7
destination_input: $file_diff_list
- component_type: DataFlowEdge
id: 84d3a783-38c8-4d53-bc0b-4205732d1fbf
name: step_0_$filepath_list_to_None End node_$filepath_list_data_flow_edge
description: null
metadata: {}
source_node:
$component_ref: f127a297-842d-4d17-bc89-4704019458d7
source_output: $filepath_list
destination_node:
$component_ref: 6f62aecf-03a1-4e38-b551-8eef0efaf4bb
destination_input: $filepath_list
- component_type: DataFlowEdge
id: b7ffd4c3-4a03-47f0-95fc-0ba670010729
name: step_0_$nested_comment_list_to_None End node_$nested_comment_list_data_flow_edge
description: null
metadata: {}
source_node:
$component_ref: f127a297-842d-4d17-bc89-4704019458d7
source_output: $nested_comment_list
destination_node:
$component_ref: 6f62aecf-03a1-4e38-b551-8eef0efaf4bb
destination_input: $nested_comment_list
$referenced_components:
f127a297-842d-4d17-bc89-4704019458d7:
component_type: ExtendedMapNode
id: f127a297-842d-4d17-bc89-4704019458d7
name: step_0
description: ''
metadata:
__metadata_info__: {}
inputs:
- description: iterated input for the map step
type: array
items:
description: '"message" input variable for the template'
title: message
title: $file_diff_list
outputs:
- type: array
items: {}
title: $nested_comment_list
- type: array
items:
type: string
title: $filepath_list
branches:
- next
input_mapping:
iterated_input: $file_diff_list
output_mapping:
$extracted_comments: $nested_comment_list
$filename: $filepath_list
flow:
component_type: Flow
id: 3da67cce-b8de-40be-bb8d-e1edead178f0
name: Generate review comments flow
description: ''
metadata:
__metadata_info__: {}
inputs:
- description: '"message" input variable for the template'
title: message
outputs:
- description: The extracted comments content and line number
type: array
items:
type: object
additionalProperties: {}
key_type:
type: string
title: $extracted_comments
- description: the generated text
type: string
title: $json_comments
- type: string
title: $diff_with_lines
- description: the first extracted value using the regex "diff --git a/(.+?)
b/" from the raw input
type: string
title: $filename
default: ''
- description: the message added to the messages list
type: string
title: $diff_to_string
start_node:
$component_ref: e20f5870-d594-4089-9fcd-08146232910d
nodes:
- $component_ref: f0fb3ab4-a950-43b6-a583-6f0044f18c7f
- $component_ref: 6000ee3f-ac80-4937-b36c-94fd65cdcda4
- $component_ref: 6f6dc822-9352-47ae-9b48-173402a334fe
- $component_ref: 0ce752d7-3ef1-481b-bb01-c7081ef86103
- $component_ref: 48057b9c-bee7-4286-baf5-625b6f1a6f1a
- $component_ref: e20f5870-d594-4089-9fcd-08146232910d
- $component_ref: 39f36227-8910-414c-8b6b-517c0d65b0d8
control_flow_connections:
- component_type: ControlFlowEdge
id: becf6951-96fd-4152-97d0-4a4eff042a29
name: format_diff_to_string_to_add_lines_on_diff_control_flow_edge
description: null
metadata:
__metadata_info__: {}
from_node:
$component_ref: f0fb3ab4-a950-43b6-a583-6f0044f18c7f
from_branch: null
to_node:
$component_ref: 6000ee3f-ac80-4937-b36c-94fd65cdcda4
- component_type: ControlFlowEdge
id: c197b0d5-8002-4910-ae8d-61f97f1f8f26
name: add_lines_on_diff_to_extract_file_path_control_flow_edge
description: null
metadata:
__metadata_info__: {}
from_node:
$component_ref: 6000ee3f-ac80-4937-b36c-94fd65cdcda4
from_branch: null
to_node:
$component_ref: 6f6dc822-9352-47ae-9b48-173402a334fe
- component_type: ControlFlowEdge
id: 406e0670-cc49-4da4-8d15-8c1c320193e8
name: extract_file_path_to_generate_comments_control_flow_edge
description: null
metadata:
__metadata_info__: {}
from_node:
$component_ref: 6f6dc822-9352-47ae-9b48-173402a334fe
from_branch: null
to_node:
$component_ref: 0ce752d7-3ef1-481b-bb01-c7081ef86103
- component_type: ControlFlowEdge
id: e54eb347-2e6c-42c4-a7d6-a42c8059bdf3
name: generate_comments_to_extract_comments_from_json_control_flow_edge
description: null
metadata:
__metadata_info__: {}
from_node:
$component_ref: 0ce752d7-3ef1-481b-bb01-c7081ef86103
from_branch: null
to_node:
$component_ref: 48057b9c-bee7-4286-baf5-625b6f1a6f1a
- component_type: ControlFlowEdge
id: ebe5e60b-2724-4b51-b287-79f3e8e7fdd1
name: __StartStep___to_format_diff_to_string_control_flow_edge
description: null
metadata:
__metadata_info__: {}
from_node:
$component_ref: e20f5870-d594-4089-9fcd-08146232910d
from_branch: null
to_node:
$component_ref: f0fb3ab4-a950-43b6-a583-6f0044f18c7f
- component_type: ControlFlowEdge
id: 98e7631e-7206-4ba9-b5b0-eb308ac89c0f
name: extract_comments_from_json_to_None End node_control_flow_edge
description: null
metadata: {}
from_node:
$component_ref: 48057b9c-bee7-4286-baf5-625b6f1a6f1a
from_branch: null
to_node:
$component_ref: 39f36227-8910-414c-8b6b-517c0d65b0d8
data_flow_connections:
- component_type: DataFlowEdge
id: ab8ed6de-3ea7-424e-a830-bca10ac57a32
name: format_diff_to_string_$diff_to_string_to_add_lines_on_diff_$diff_to_string_data_flow_edge
description: null
metadata:
__metadata_info__: {}
source_node:
$component_ref: f0fb3ab4-a950-43b6-a583-6f0044f18c7f
source_output: $diff_to_string
destination_node:
$component_ref: 6000ee3f-ac80-4937-b36c-94fd65cdcda4
destination_input: $diff_to_string
- component_type: DataFlowEdge
id: 3caaa171-9b4b-44df-8ebd-4d060329f91a
name: format_diff_to_string_$diff_to_string_to_extract_file_path_$diff_to_string_data_flow_edge
description: null
metadata:
__metadata_info__: {}
source_node:
$component_ref: f0fb3ab4-a950-43b6-a583-6f0044f18c7f
source_output: $diff_to_string
destination_node:
$component_ref: 6f6dc822-9352-47ae-9b48-173402a334fe
destination_input: $diff_to_string
- component_type: DataFlowEdge
id: cdf0945b-5a96-42ff-b410-f7c56b5f8e45
name: add_lines_on_diff_$diff_with_lines_to_generate_comments_$diff_with_lines_data_flow_edge
description: null
metadata:
__metadata_info__: {}
source_node:
$component_ref: 6000ee3f-ac80-4937-b36c-94fd65cdcda4
source_output: $diff_with_lines
destination_node:
$component_ref: 0ce752d7-3ef1-481b-bb01-c7081ef86103
destination_input: $diff_with_lines
- component_type: DataFlowEdge
id: ca6ed62b-6f6a-405f-9f16-5e1304de6608
name: extract_file_path_$filename_to_generate_comments_$filename_data_flow_edge
description: null
metadata:
__metadata_info__: {}
source_node:
$component_ref: 6f6dc822-9352-47ae-9b48-173402a334fe
source_output: $filename
destination_node:
$component_ref: 0ce752d7-3ef1-481b-bb01-c7081ef86103
destination_input: $filename
- component_type: DataFlowEdge
id: dec4b4bb-56c9-445a-a282-9d095ff6038e
name: generate_comments_$json_comments_to_extract_comments_from_json_$json_comments_data_flow_edge
description: null
metadata:
__metadata_info__: {}
source_node:
$component_ref: 0ce752d7-3ef1-481b-bb01-c7081ef86103
source_output: $json_comments
destination_node:
$component_ref: 48057b9c-bee7-4286-baf5-625b6f1a6f1a
destination_input: $json_comments
- component_type: DataFlowEdge
id: 611478d7-281a-4587-81e6-97e8c745da53
name: __StartStep___message_to_format_diff_to_string_message_data_flow_edge
description: null
metadata:
__metadata_info__: {}
source_node:
$component_ref: e20f5870-d594-4089-9fcd-08146232910d
source_output: message
destination_node:
$component_ref: f0fb3ab4-a950-43b6-a583-6f0044f18c7f
destination_input: message
- component_type: DataFlowEdge
id: 227ae098-0baf-4fe8-9615-094bb386c9a9
name: extract_comments_from_json_$extracted_comments_to_None End node_$extracted_comments_data_flow_edge
description: null
metadata: {}
source_node:
$component_ref: 48057b9c-bee7-4286-baf5-625b6f1a6f1a
source_output: $extracted_comments
destination_node:
$component_ref: 39f36227-8910-414c-8b6b-517c0d65b0d8
destination_input: $extracted_comments
- component_type: DataFlowEdge
id: 6e25b4d8-5656-471b-8ffa-1fe8cfffbc05
name: generate_comments_$json_comments_to_None End node_$json_comments_data_flow_edge
description: null
metadata: {}
source_node:
$component_ref: 0ce752d7-3ef1-481b-bb01-c7081ef86103
source_output: $json_comments
destination_node:
$component_ref: 39f36227-8910-414c-8b6b-517c0d65b0d8
destination_input: $json_comments
- component_type: DataFlowEdge
id: fdbf1eeb-0278-4dc8-b897-c924937a1692
name: add_lines_on_diff_$diff_with_lines_to_None End node_$diff_with_lines_data_flow_edge
description: null
metadata: {}
source_node:
$component_ref: 6000ee3f-ac80-4937-b36c-94fd65cdcda4
source_output: $diff_with_lines
destination_node:
$component_ref: 39f36227-8910-414c-8b6b-517c0d65b0d8
destination_input: $diff_with_lines
- component_type: DataFlowEdge
id: 3b6bcba7-635b-45fa-b450-cf0a15dae463
name: extract_file_path_$filename_to_None End node_$filename_data_flow_edge
description: null
metadata: {}
source_node:
$component_ref: 6f6dc822-9352-47ae-9b48-173402a334fe
source_output: $filename
destination_node:
$component_ref: 39f36227-8910-414c-8b6b-517c0d65b0d8
destination_input: $filename
- component_type: DataFlowEdge
id: 2f95704b-4cc1-4983-8a20-e39c79a94e01
name: format_diff_to_string_$diff_to_string_to_None End node_$diff_to_string_data_flow_edge
description: null
metadata: {}
source_node:
$component_ref: f0fb3ab4-a950-43b6-a583-6f0044f18c7f
source_output: $diff_to_string
destination_node:
$component_ref: 39f36227-8910-414c-8b6b-517c0d65b0d8
destination_input: $diff_to_string
$referenced_components:
6000ee3f-ac80-4937-b36c-94fd65cdcda4:
component_type: ExtendedToolNode
id: 6000ee3f-ac80-4937-b36c-94fd65cdcda4
name: add_lines_on_diff
description: ''
metadata:
__metadata_info__: {}
inputs:
- type: string
title: $diff_to_string
outputs:
- type: string
title: $diff_with_lines
branches:
- next
tool:
component_type: ServerTool
id: e936566f-7a25-40f3-9434-3e740a7bfb02
name: format_git_diff
description: Formats a git diff by adding line numbers to each line
except removal lines.
metadata:
__metadata_info__: {}
inputs:
- type: string
title: diff_text
outputs:
- type: string
title: tool_output
input_mapping:
diff_text: $diff_to_string
output_mapping:
tool_output: $diff_with_lines
raise_exceptions: false
component_plugin_name: NodesPlugin
component_plugin_version: 25.4.0.dev0
f0fb3ab4-a950-43b6-a583-6f0044f18c7f:
component_type: PluginOutputMessageNode
id: f0fb3ab4-a950-43b6-a583-6f0044f18c7f
name: format_diff_to_string
description: ''
metadata:
__metadata_info__: {}
inputs:
- description: '"message" input variable for the template'
title: message
outputs:
- description: the message added to the messages list
type: string
title: $diff_to_string
branches:
- next
expose_message_as_output: true
message: '{{ message | string }}'
input_mapping: {}
output_mapping:
output_message: $diff_to_string
message_type: AGENT
rephrase: false
llm_config: null
component_plugin_name: NodesPlugin
component_plugin_version: 25.4.0.dev0
6f6dc822-9352-47ae-9b48-173402a334fe:
component_type: PluginRegexNode
id: 6f6dc822-9352-47ae-9b48-173402a334fe
name: extract_file_path
description: ''
metadata:
__metadata_info__: {}
inputs:
- description: raw text to extract information from
type: string
title: $diff_to_string
outputs:
- description: the first extracted value using the regex "diff --git
a/(.+?) b/" from the raw input
type: string
title: $filename
default: ''
branches:
- next
input_mapping:
text: $diff_to_string
output_mapping:
output: $filename
regex_pattern: diff --git a/(.+?) b/
return_first_match_only: true
component_plugin_name: NodesPlugin
component_plugin_version: 25.4.0.dev0
0ce752d7-3ef1-481b-bb01-c7081ef86103:
component_type: ExtendedLlmNode
id: 0ce752d7-3ef1-481b-bb01-c7081ef86103
name: generate_comments
description: ''
metadata:
__metadata_info__: {}
inputs:
- description: '"filename" input variable for the template'
type: string
title: $filename
- description: '"diff" input variable for the template'
type: string
title: $diff_with_lines
outputs:
- description: the generated text
type: string
title: $json_comments
branches:
- next
llm_config:
component_type: VllmConfig
id: fb043839-1e69-404c-a178-d8c3de0bfe20
name: LLAMA_MODEL_ID
description: null
metadata:
__metadata_info__: {}
default_generation_parameters: null
url: LLAMA_API_URL
model_id: LLAMA_MODEL_ID
prompt_template: "You are a very experienced code reviewer. You are\
\ given a git diff on a file: {{ filename }}\n\n## Context\nThe\
\ git diff contains all changes of a single file. All lines are\
\ prepended with their number. Lines without line number were removed\
\ from the file.\nAfter the line number, a line that was changed\
\ has a \"+\" before the code. All lines without a \"+\" are just\
\ here for context, you will not comment on them.\n\n## Input\n\
### Code diff\n{{ diff }}\n\n## Task\nYour task is to review these\
\ changes, according to different rules. Only comment lines that\
\ were added, so the lines that have a + just after the line number.\n\
The rules are the following:\n\n\nName: TODO_WITHOUT_TICKET\nDescription:\
\ TODO comments should reference a ticket number for tracking.\n\
Example code:\n```python\n# TODO: Add validation here\ndef process_user_input(data):\n\
\ return data\n```\nExample comment:\n[BOT] TODO_WITHOUT_TICKET:\
\ TODO comment should reference a ticket number for tracking (e.g.,\
\ \"TODO: Add validation here (TICKET-1234)\").\n\n\n---\n\n\nName:\
\ MUTABLE_DEFAULT_ARGUMENT\nDescription: Using mutable objects as\
\ default arguments can lead to unexpected behavior.\nExample code:\n\
```python\ndef add_item(item, items=[]):\n items.append(item)\n\
\ return items\n```\nExample comment:\n[BOT] MUTABLE_DEFAULT_ARGUMENT:\
\ Avoid using mutable default arguments. Use None and initialize\
\ in the function: `def add_item(item, items=None): items = items\
\ or []`\n\n\n---\n\n\nName: NON_DESCRIPTIVE_NAME\nDescription:\
\ Variable names should clearly indicate their purpose or content.\n\
Example code:\n```python\ndef process(lst):\n res = []\n for\
\ i in lst:\n res.append(i * 2)\n return res\n```\nExample\
\ comment:\n[BOT] NON_DESCRIPTIVE_NAME: Use more descriptive names:\
\ 'lst' could be 'numbers', 'res' could be 'doubled_numbers', 'i'\
\ could be 'number'\n\n\n### Response Format\nYou need to return\
\ a review as a json as follows:\n```json\n[\n {\n \"\
content\": \"the comment as a text\",\n \"suggestion\": \"\
if the change you propose is a single line, then put here the single\
\ line rewritten that includes your proposal change. IMPORTANT:\
\ a single line, which will erase the current line. Put empty string\
\ if no suggestion or if the suggestion is more than a single line\"
,\n \"line\": \"line number where the comment applies\"\n\
\ },\n \u2026\n]\n```\nPlease use triple backticks ``` to\
\ delimit your JSON list of comments. Don't output more than\
\ 5 comments, only comment the most relevant sections.\nIf there\
\ are no comments and the code seems fine, just output an empty\
\ JSON list."
input_mapping:
diff: $diff_with_lines
filename: $filename
output_mapping:
output: $json_comments
prompt_template_object: null
send_message: false
component_plugin_name: NodesPlugin
component_plugin_version: 25.4.0.dev0
48057b9c-bee7-4286-baf5-625b6f1a6f1a:
component_type: PluginExtractNode
id: 48057b9c-bee7-4286-baf5-625b6f1a6f1a
name: extract_comments_from_json
description: ''
metadata:
__metadata_info__: {}
inputs:
- description: raw text to extract information from
type: string
title: $json_comments
outputs:
- description: The extracted comments content and line number
type: array
items:
type: object
additionalProperties: {}
key_type:
type: string
title: $extracted_comments
branches:
- next
input_mapping:
text: $json_comments
output_mapping:
values: $extracted_comments
output_values:
values: '[.[] | {"content": .["content"], "line": .["line"]}]'
component_plugin_name: NodesPlugin
component_plugin_version: 25.4.0.dev0
e20f5870-d594-4089-9fcd-08146232910d:
component_type: StartNode
id: e20f5870-d594-4089-9fcd-08146232910d
name: __StartStep__
description: ''
metadata:
__metadata_info__: {}
inputs:
- description: '"message" input variable for the template'
title: message
outputs:
- description: '"message" input variable for the template'
title: message
branches:
- next
39f36227-8910-414c-8b6b-517c0d65b0d8:
component_type: EndNode
id: 39f36227-8910-414c-8b6b-517c0d65b0d8
name: None End node
description: End node representing all transitions to None in the
WayFlow flow
metadata: {}
inputs:
- description: The extracted comments content and line number
type: array
items:
type: object
additionalProperties: {}
key_type:
type: string
title: $extracted_comments
- description: the generated text
type: string
title: $json_comments
- type: string
title: $diff_with_lines
- description: the first extracted value using the regex "diff --git
a/(.+?) b/" from the raw input
type: string
title: $filename
default: ''
- description: the message added to the messages list
type: string
title: $diff_to_string
outputs:
- description: The extracted comments content and line number
type: array
items:
type: object
additionalProperties: {}
key_type:
type: string
title: $extracted_comments
- description: the generated text
type: string
title: $json_comments
- type: string
title: $diff_with_lines
- description: the first extracted value using the regex "diff --git
a/(.+?) b/" from the raw input
type: string
title: $filename
default: ''
- description: the message added to the messages list
type: string
title: $diff_to_string
branches: []
branch_name: next
unpack_input:
message: .
parallel_execution: false
component_plugin_name: NodesPlugin
component_plugin_version: 25.4.0.dev0
367ae568-317d-42ec-ae70-4c41afe0dbd0:
component_type: StartNode
id: 367ae568-317d-42ec-ae70-4c41afe0dbd0
name: __StartStep__
description: ''
metadata:
__metadata_info__: {}
inputs:
- description: iterated input for the map step
type: array
items:
description: '"message" input variable for the template'
title: message
title: $file_diff_list
outputs:
- description: iterated input for the map step
type: array
items:
description: '"message" input variable for the template'
title: message
title: $file_diff_list
branches:
- next
6f62aecf-03a1-4e38-b551-8eef0efaf4bb:
component_type: EndNode
id: 6f62aecf-03a1-4e38-b551-8eef0efaf4bb
name: None End node
description: End node representing all transitions to None in the WayFlow
flow
metadata: {}
inputs:
- type: array
items:
type: string
title: $filepath_list
- type: array
items: {}
title: $nested_comment_list
outputs:
- type: array
items:
type: string
title: $filepath_list
- type: array
items: {}
title: $nested_comment_list
branches: []
branch_name: next
47e367be-4d74-49dc-ac3b-89bb97ffa7df:
component_type: FlowNode
id: 47e367be-4d74-49dc-ac3b-89bb97ffa7df
name: retrieve_diff_flowstep
description: ''
metadata:
__metadata_info__: {}
inputs:
- type: string
title: $repo_dirpath_io
outputs:
- type: string
title: $raw_pr_diff
- description: the list of extracted value using the regex "(diff --git[\s\S]*?)(?=diff
--git|$)" from the raw input
type: array
items:
type: string
title: $file_diff_list
default: []
branches:
- next
subflow:
component_type: Flow
id: 9e7aed22-876c-4c32-9d44-20ee7ceb3771
name: Retrieve PR diff flow
description: ''
metadata:
__metadata_info__: {}
inputs:
- type: string
title: $repo_dirpath_io
outputs:
- type: string
title: $raw_pr_diff
- description: the list of extracted value using the regex "(diff --git[\s\S]*?)(?=diff
--git|$)" from the raw input
type: array
items:
type: string
title: $file_diff_list
default: []
start_node:
$component_ref: 4fcb7ebe-325b-446d-a46b-59187c30e260
nodes:
- $component_ref: 4fcb7ebe-325b-446d-a46b-59187c30e260
- $component_ref: 5c73da9c-6ba9-44ce-aab1-212a78d0a720
- $component_ref: cf841053-2414-48b6-ba6d-0f0f5e11044c
- $component_ref: dd0e56ab-1267-4345-9f59-ecc053baf2af
control_flow_connections:
- component_type: ControlFlowEdge
id: 60dc14b8-d9b9-4aec-a958-9f3676848f48
name: start_step_to_get_pr_diff_control_flow_edge
description: null
metadata:
__metadata_info__: {}
from_node:
$component_ref: 4fcb7ebe-325b-446d-a46b-59187c30e260
from_branch: null
to_node:
$component_ref: 5c73da9c-6ba9-44ce-aab1-212a78d0a720
- component_type: ControlFlowEdge
id: 500f97de-78b1-42e0-944c-0375dfca734e
name: get_pr_diff_to_extract_into_list_of_file_diff_control_flow_edge
description: null
metadata:
__metadata_info__: {}
from_node:
$component_ref: 5c73da9c-6ba9-44ce-aab1-212a78d0a720
from_branch: null
to_node:
$component_ref: cf841053-2414-48b6-ba6d-0f0f5e11044c
- component_type: ControlFlowEdge
id: 22d0cf0d-8edb-4b04-8f54-a234f5705360
name: extract_into_list_of_file_diff_to_None End node_control_flow_edge
description: null
metadata: {}
from_node:
$component_ref: cf841053-2414-48b6-ba6d-0f0f5e11044c
from_branch: null
to_node:
$component_ref: dd0e56ab-1267-4345-9f59-ecc053baf2af
data_flow_connections:
- component_type: DataFlowEdge
id: 106e3740-de45-4472-8168-2873ae1dbc82
name: start_step_$repo_dirpath_io_to_get_pr_diff_$repo_dirpath_io_data_flow_edge
description: null
metadata:
__metadata_info__: {}
source_node:
$component_ref: 4fcb7ebe-325b-446d-a46b-59187c30e260
source_output: $repo_dirpath_io
destination_node:
$component_ref: 5c73da9c-6ba9-44ce-aab1-212a78d0a720
destination_input: $repo_dirpath_io
- component_type: DataFlowEdge
id: a32cbb1c-eafe-4138-80e2-2cf2e1248312
name: get_pr_diff_$raw_pr_diff_to_extract_into_list_of_file_diff_$raw_pr_diff_data_flow_edge
description: null
metadata:
__metadata_info__: {}
source_node:
$component_ref: 5c73da9c-6ba9-44ce-aab1-212a78d0a720
source_output: $raw_pr_diff
destination_node:
$component_ref: cf841053-2414-48b6-ba6d-0f0f5e11044c
destination_input: $raw_pr_diff
- component_type: DataFlowEdge
id: 3ef5dcf4-acdf-4962-8df6-07b53f249e18
name: get_pr_diff_$raw_pr_diff_to_None End node_$raw_pr_diff_data_flow_edge
description: null
metadata: {}
source_node:
$component_ref: 5c73da9c-6ba9-44ce-aab1-212a78d0a720
source_output: $raw_pr_diff
destination_node:
$component_ref: dd0e56ab-1267-4345-9f59-ecc053baf2af
destination_input: $raw_pr_diff
- component_type: DataFlowEdge
id: 08cbca39-e591-4cf4-9057-ae67938d9557
name: extract_into_list_of_file_diff_$file_diff_list_to_None End node_$file_diff_list_data_flow_edge
description: null
metadata: {}
source_node:
$component_ref: cf841053-2414-48b6-ba6d-0f0f5e11044c
source_output: $file_diff_list
destination_node:
$component_ref: dd0e56ab-1267-4345-9f59-ecc053baf2af
destination_input: $file_diff_list
$referenced_components:
5c73da9c-6ba9-44ce-aab1-212a78d0a720:
component_type: ExtendedToolNode
id: 5c73da9c-6ba9-44ce-aab1-212a78d0a720
name: get_pr_diff
description: ''
metadata:
__metadata_info__: {}
inputs:
- type: string
title: $repo_dirpath_io
outputs:
- type: string
title: $raw_pr_diff
branches:
- next
tool:
component_type: ServerTool
id: 275aaf19-cdd4-4ed7-a436-e53f922cd740
name: local_get_pr_diff_tool
description: 'Retrieves code diff with a git command given the
path to the repository root folder.'
metadata:
__metadata_info__: {}
inputs:
- type: string
title: repo_dirpath
outputs:
- type: string
title: tool_output
input_mapping:
repo_dirpath: $repo_dirpath_io
output_mapping:
tool_output: $raw_pr_diff
raise_exceptions: true
component_plugin_name: NodesPlugin
component_plugin_version: 25.4.0.dev0
4fcb7ebe-325b-446d-a46b-59187c30e260:
component_type: StartNode
id: 4fcb7ebe-325b-446d-a46b-59187c30e260
name: start_step
description: ''
metadata:
__metadata_info__: {}
inputs:
- type: string
title: $repo_dirpath_io
outputs:
- type: string
title: $repo_dirpath_io
branches:
- next
cf841053-2414-48b6-ba6d-0f0f5e11044c:
component_type: PluginRegexNode
id: cf841053-2414-48b6-ba6d-0f0f5e11044c
name: extract_into_list_of_file_diff
description: ''
metadata:
__metadata_info__: {}
inputs:
- description: raw text to extract information from
type: string
title: $raw_pr_diff
outputs:
- description: the list of extracted value using the regex "(diff --git[\s\S]*?)(?=diff
--git|$)" from the raw input
type: array
items:
type: string
title: $file_diff_list
default: []
branches:
- next
input_mapping:
text: $raw_pr_diff
output_mapping:
output: $file_diff_list
regex_pattern: (diff --git[\s\S]*?)(?=diff --git|$)
return_first_match_only: false
component_plugin_name: NodesPlugin
component_plugin_version: 25.4.0.dev0
dd0e56ab-1267-4345-9f59-ecc053baf2af:
component_type: EndNode
id: dd0e56ab-1267-4345-9f59-ecc053baf2af
name: None End node
description: End node representing all transitions to None in the WayFlow
flow
metadata: {}
inputs:
- type: string
title: $raw_pr_diff
- description: the list of extracted value using the regex "(diff --git[\s\S]*?)(?=diff
--git|$)" from the raw input
type: array
items:
type: string
title: $file_diff_list
default: []
outputs:
- type: string
title: $raw_pr_diff
- description: the list of extracted value using the regex "(diff --git[\s\S]*?)(?=diff
--git|$)" from the raw input
type: array
items:
type: string
title: $file_diff_list
default: []
branches: []
branch_name: next
020c885e-6d0b-472a-bb91-246ab70ab1db:
component_type: StartNode
id: 020c885e-6d0b-472a-bb91-246ab70ab1db
name: __StartStep__
description: ''
metadata:
__metadata_info__: {}
inputs:
- type: string
title: $repo_dirpath_io
outputs:
- type: string
title: $repo_dirpath_io
branches:
- next
a544af64-e63b-4ccf-9ab0-8d25cdbc0b93:
component_type: EndNode
id: a544af64-e63b-4ccf-9ab0-8d25cdbc0b93
name: None End node
description: End node representing all transitions to None in the WayFlow flow
metadata: {}
inputs:
- type: array
items:
type: string
title: $filepath_list
- type: array
items: {}
title: $nested_comment_list
- type: string
title: $raw_pr_diff
- description: the list of extracted value using the regex "(diff --git[\s\S]*?)(?=diff
--git|$)" from the raw input
type: array
items:
type: string
title: $file_diff_list
default: []
outputs:
- type: array
items:
type: string
title: $filepath_list
- type: array
items: {}
title: $nested_comment_list
- type: string
title: $raw_pr_diff
- description: the list of extracted value using the regex "(diff --git[\s\S]*?)(?=diff
--git|$)" from the raw input
type: array
items:
type: string
title: $file_diff_list
default: []
branches: []
branch_name: next
agentspec_version: 25.4.1
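The PluginRegexNode in the configuration above stores the pattern (diff --git[\s\S]*?)(?=diff --git|$) with return_first_match_only set to false. The sketch below approximates what that node computes using Python's standard re module. It assumes the node behaves like re.findall on the first capture group. The filename pattern comes from the Python listing, which uses the src:// and dst:// prefixes of its mock diff. The helper names split_diff_by_file and extract_filename are hypothetical, and this is an illustration rather than the WayFlow implementation.

```python
import re
from typing import List

# Same pattern as the serialized PluginRegexNode: each match starts at a
# "diff --git" header and lazily extends until the next header or the end of the text.
FILE_DIFF_PATTERN = re.compile(r"(diff --git[\s\S]*?)(?=diff --git|$)")

# Filename pattern from the Python listing (mock diffs use src:// and dst:// prefixes).
FILENAME_PATTERN = re.compile(r"diff --git src://(.+?) dst://")


def split_diff_by_file(raw_diff: str) -> List[str]:
    """Hypothetical helper: one chunk per file, like return_first_match_only=False."""
    return FILE_DIFF_PATTERN.findall(raw_diff)


def extract_filename(file_diff: str) -> str:
    """Hypothetical helper: first match only, like return_first_match_only=True."""
    match = FILENAME_PATTERN.search(file_diff)
    return match.group(1) if match else ""


raw_diff = (
    "diff --git src://a.py dst://a.py\n"
    "+x = 1\n"
    "diff --git src://b.py dst://b.py\n"
    "+y = 2"
)
chunks = split_diff_by_file(raw_diff)
print(len(chunks))  # 2
print([extract_filename(chunk) for chunk in chunks])  # ['a.py', 'b.py']
```

The pattern ends in a lookahead, so each chunk keeps its own diff --git header and stops just before the next one. As a result, no text after the first header is lost or duplicated between chunks.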
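You can then load the configuration back into an assistant using the AgentSpecLoader. The serialized tools record only their names, descriptions, inputs, and outputs, not their Python code. You must therefore pass a tool registry that maps each serialized tool name to its implementation: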
from wayflowcore.agentspec import AgentSpecLoader
tool_registry = {
"local_get_pr_diff_tool": local_get_pr_diff_tool,
"format_git_diff": format_git_diff,
}
assistant = AgentSpecLoader(tool_registry=tool_registry).load_json(serialized_assistant)
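The keys of tool_registry must match the tool names stored in the serialized configuration, such as the ServerTool named local_get_pr_diff_tool in the YAML above. The sketch below is a hedged pre-flight check: it walks a JSON-serialized configuration and reports every ServerTool name that has no registry entry. It assumes the JSON form uses the same component_type and name keys as the YAML above. The helpers iter_server_tool_names and missing_server_tools are hypothetical, not WayFlow APIs.

```python
import json
from typing import Any, Dict, Iterator, Set


def iter_server_tool_names(node: Any) -> Iterator[str]:
    """Yield the name of every mapping tagged as a ServerTool, at any nesting depth."""
    if isinstance(node, dict):
        if node.get("component_type") == "ServerTool" and "name" in node:
            yield node["name"]
        for value in node.values():
            yield from iter_server_tool_names(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_server_tool_names(item)


def missing_server_tools(serialized_json: str, tool_registry: Dict[str, Any]) -> Set[str]:
    """Hypothetical check: ServerTool names in the config with no registry entry."""
    config = json.loads(serialized_json)
    return set(iter_server_tool_names(config)) - set(tool_registry)


# Tiny stand-in configuration; a real one would come from the Agent Spec serializer.
example = json.dumps(
    {
        "component_type": "Flow",
        "nodes": [
            {
                "component_type": "ExtendedToolNode",
                "tool": {"component_type": "ServerTool", "name": "local_get_pr_diff_tool"},
            },
            {
                "component_type": "ExtendedToolNode",
                "tool": {"component_type": "ServerTool", "name": "format_git_diff"},
            },
        ],
    }
)
print(missing_server_tools(example, {"local_get_pr_diff_tool": print}))  # {'format_git_diff'}
```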
Note
This guide uses the following extension/plugin Agent Spec components:
PluginOutputMessageNode
PluginExtractNode
PluginRegexNode
ExtendedLlmNode
ExtendedToolNode
ExtendedMapNode
See the list of available Agent Spec extension/plugin components in the API Reference.
Recap#
In this tutorial, you learned how to build a simple PR bot using WayFlow Flows. Along the way, you learned:
How to use core steps such as the OutputMessageStep and PromptExecutionStep.
How to build and execute tools using the ServerTool and the ToolExecutionStep.
How to extract information using the RegexExtractionStep and the ExtractValueFromJsonStep.
How to apply a sub-flow to each element of an iterable input using the MapStep.
Finally, you learned how to structure code when building assistants as code, and how to execute and combine sub-flows to build a complex assistant.
This is an example of the kind of fully featured tool that you can build with WayFlow.
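The following plain-Python sketch makes the data shapes behind this recap concrete by mimicking the comment pipeline without WayFlow. It works in three stages:
1. It pulls the JSON list out of the triple-backtick block that the prompt asks for.
2. It keeps only content and line, as the jq-style expression [.[] | {"content": .["content"], "line": .["line"]}] does.
3. It pairs each per-file list with its path, the way flatten_information does.
The fence-stripping regex and the helper names extract_comments and flatten_comments are assumptions for illustration. This is not how ExtractValueFromJsonStep parses text internally.

```python
import json
import re
from typing import Any, Dict, List

# Matches a fenced block such as ```json [...] ``` (the prompt asks for triple backticks).
FENCED_JSON = re.compile(r"```(?:json)?\s*([\s\S]*?)```")


def extract_comments(llm_output: str) -> List[Dict[str, Any]]:
    """Hypothetical stand-in for ExtractValueFromJsonStep with the tutorial's jq expression."""
    match = FENCED_JSON.search(llm_output)
    payload = match.group(1) if match else llm_output
    comments = json.loads(payload)
    # Python equivalent of: [.[] | {"content": .["content"], "line": .["line"]}]
    return [{"content": c.get("content"), "line": c.get("line")} for c in comments]


def flatten_comments(
    nested: List[List[Dict[str, Any]]], paths: List[str]
) -> List[Dict[str, str]]:
    """Same logic as the tutorial's flatten_information tool, without the @tool decorator."""
    if len(nested) != len(paths):
        raise ValueError("Inconsistent list lengths")
    return [
        {**{key: str(value) for key, value in comment.items()}, "path": path}
        for comments, path in zip(nested, paths)
        for comment in comments
    ]


# One LLM answer per file diff, as the MapStep would collect them.
llm_outputs = [
    '```json\n[{"content": "Avoid mutable defaults", "suggestion": "", "line": "85"}]\n```',
    "```json\n[]\n```",
]
nested = [extract_comments(text) for text in llm_outputs]
flat = flatten_comments(nested, ["calculators/utils.py", "example/utils.py"])
print(flat)
# [{'content': 'Avoid mutable defaults', 'line': '85', 'path': 'calculators/utils.py'}]
```

This sketch assumes that MapStep keeps the iteration order, which flatten_information relies on when it zips the two lists. Under that assumption, a file without comments contributes an empty list and simply disappears after flattening.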
Next Steps#
Now that you have learned how to build a PR reviewing assistant, you may want to check out our other guides, such as:
Full Code#
Click on the card at the top of this page to download the full code for this guide or copy the code below.
# Copyright © 2025 Oracle and/or its affiliates.
#
# This software is under the Apache License 2.0
# (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0) or Universal Permissive License
# (UPL) 1.0 (LICENSE-UPL or https://oss.oracle.com/licenses/upl), at your option.

# %%[markdown]
# Tutorial - Build a Simple Code Review Assistant
# -----------------------------------------------

# How to use:
# Create a new Python virtual environment and install the latest WayFlow version.
# ```bash
# python -m venv venv-wayflowcore
# source venv-wayflowcore/bin/activate
# pip install --upgrade pip
# pip install "wayflowcore==26.1.2"
# ```

# You can now run the script
# 1. As a Python file:
# ```bash
# python usecase_prbot.py
# ```
# 2. As a Notebook (in VSCode):
# When viewing the file,
# - press the keys Ctrl + Enter to run the selected cell
# - or Shift + Enter to run the selected cell and move to the cell below

# nosec


from types import MethodType
from typing import Dict, List


# %%[markdown]
## Define the LLM

# %%
from wayflowcore.models import VllmModel

llm = VllmModel(
    model_id="meta-llama/Meta-Llama-3.1-8B-Instruct",
    host_port="VLLM_HOST_PORT",
)

# %%[markdown]
## Define the tool that retrieves the PR diff

# %%
from wayflowcore.tools import tool


@tool(description_mode="only_docstring")
def local_get_pr_diff_tool(repo_dirpath: str) -> str:
    """
    Retrieves code diff with a git command given the
    path to the repository root folder.
    """
    import subprocess  # nosec: documentation example invoking git locally

    result = subprocess.run(
        ["git", "diff", "HEAD"],
        capture_output=True,
        cwd=repo_dirpath,
        text=True,
    )  # nosec: documentation example invoking git locally
    return result.stdout.strip()


# %%[markdown]
## Define a mocked PR diff

# %%
MOCK_DIFF = """
diff --git src://calculators/utils.py dst://calculators/utils.py
index 12345678..90123456 100644
--- src://calculators/utils.py
+++ dst://calculators/utils.py
@@ -10,6 +10,15 @@

 def calculate_total(data):
     # TODO: implement tax calculation
     return data

+def get_items(items=[]):
+    result = []
+    for item in items:
+        result.append(item * 2)
+    return result
+
+def process_numbers(numbers):
+    res = []
+    for x in numbers:
+        res.append(x + 1)
+    return res
+
 def calculate_average(numbers):
     return sum(numbers) / len(numbers)


diff --git src://example/utils.py dst://example/utils.py
index 000000000..123456789
--- /dev/null
+++ dst://example/utils.py
@@ -0,0 +1,20 @@
+# Copyright © 2024 Oracle and/or its affiliates.
+
+def calculate_sum(numbers=[]):
+    total = 0
+    for num in numbers:
+        total += num
+    return total
+
+
+def process_data(data):
+    # TODO: Handle exceptions here
+    result = data * 2
+    return result
+
+
+def main():
+    numbers = [1, 2, 3, 4, 5]
+    result = calculate_sum(numbers)
+    print("Sum:", result)
+    data = 10
+    processed_data = process_data(data)
+    print("Processed Data:", processed_data)
+
+
+if __name__ == "__main__":
+    main()
""".strip()


# %%[markdown]
## Create the flow that retrieves the diff of a PR

# %%
from wayflowcore.controlconnection import ControlFlowEdge
from wayflowcore.dataconnection import DataFlowEdge
from wayflowcore.flow import Flow
from wayflowcore.property import StringProperty
from wayflowcore.steps import RegexExtractionStep, StartStep, ToolExecutionStep

# IO Variable Names
REPO_DIRPATH_IO = "$repo_dirpath_io"
PR_DIFF_IO = "$raw_pr_diff"
FILE_DIFF_LIST_IO = "$file_diff_list"

# Define the steps

start_step = StartStep(name="start_step", input_descriptors=[StringProperty(name=REPO_DIRPATH_IO)])

# Step 1: Retrieve the pull request diff using the local tool
get_pr_diff_step = ToolExecutionStep(
    name="get_pr_diff",
    tool=local_get_pr_diff_tool,
    raise_exceptions=True,
    input_mapping={"repo_dirpath": REPO_DIRPATH_IO},
    output_mapping={ToolExecutionStep.TOOL_OUTPUT: PR_DIFF_IO},
)

# Step 2: Extract the file diffs from the raw diff using a regular expression
extract_into_list_of_file_diff_step = RegexExtractionStep(
    name="extract_into_list_of_file_diff",
    regex_pattern=r"(diff --git[\s\S]*?)(?=diff --git|$)",
    return_first_match_only=False,
    input_mapping={RegexExtractionStep.TEXT: PR_DIFF_IO},
    output_mapping={RegexExtractionStep.OUTPUT: FILE_DIFF_LIST_IO},
)

# Define the sub flow
retrieve_diff_subflow = Flow(
    name="Retrieve PR diff flow",
    begin_step=start_step,
    control_flow_edges=[
        ControlFlowEdge(source_step=start_step, destination_step=get_pr_diff_step),
        ControlFlowEdge(
            source_step=get_pr_diff_step, destination_step=extract_into_list_of_file_diff_step
        ),
        ControlFlowEdge(source_step=extract_into_list_of_file_diff_step, destination_step=None),
    ],
    data_flow_edges=[
        DataFlowEdge(
            source_step=start_step,
            source_output=REPO_DIRPATH_IO,
            destination_step=get_pr_diff_step,
            destination_input=REPO_DIRPATH_IO,
        ),
        DataFlowEdge(
            source_step=get_pr_diff_step,
            source_output=PR_DIFF_IO,
            destination_step=extract_into_list_of_file_diff_step,
            destination_input=PR_DIFF_IO,
        ),
    ],
)


# %%[markdown]
## Alternative step that retrieves the PR diff through an API call

# %%
from wayflowcore.retrypolicy import RetryPolicy
from wayflowcore.steps import ApiCallStep

# IO Variable Names
USER_PROVIDED_TOKEN_IO = "$user_provided_token"  # nosec: placeholder IO variable name
REPO_WORKSPACE_IO = "$repo_workspace"
REPO_SLUG_IO = "$repo_slug"
PULL_REQUEST_ID_IO = "$pull_request_id"
PR_DIFF_IO = "$raw_pr_diff"

# Alternative to the local tool step. It is not wired into retrieve_diff_subflow,
# which was already built with the local get_pr_diff_step defined above.
get_pr_diff_step = ApiCallStep(
    url="https://example.com/projects/{{workspace}}/repos/{{repo_slug}}/pull-requests/{{pr_id}}.diff",
    method="GET",
    headers={"Authorization": "Bearer {{token}}"},
    ignore_bad_http_requests=False,
    retry_policy=RetryPolicy(max_attempts=2),
    store_response=True,
    input_mapping={
        "token": USER_PROVIDED_TOKEN_IO,
        "workspace": REPO_WORKSPACE_IO,
        "repo_slug": REPO_SLUG_IO,
        "pr_id": PULL_REQUEST_ID_IO,
    },
    output_mapping={ApiCallStep.HTTP_RESPONSE: PR_DIFF_IO},
)


# %%[markdown]
## Test the flow that retrieves the PR diff

# %%
from wayflowcore.executors.executionstatus import FinishedStatus

# Replace the path below with the path to your actual codebase sample git repository.
PATH_TO_DIR = "path/to/repository_root"

test_conversation = retrieve_diff_subflow.start_conversation(
    inputs={
        REPO_DIRPATH_IO: PATH_TO_DIR,
    }
)

execution_status = test_conversation.execute()

if not isinstance(execution_status, FinishedStatus):
    raise ValueError("Unexpected status type")

FILE_DIFF_LIST = execution_status.output_values[FILE_DIFF_LIST_IO]

print(FILE_DIFF_LIST[0])


# %%[markdown]
## Define the tool that formats the diff for the LLM

# %%
PR_BOT_CHECKS = [
    """
Name: TODO_WITHOUT_TICKET
Description: TODO comments should reference a ticket number for tracking.
Example code:
```python
# TODO: Add validation here
def process_user_input(data):
    return data
```
Example comment:
[BOT] TODO_WITHOUT_TICKET: TODO comment should reference a ticket number for tracking (e.g., "TODO: Add validation here (TICKET-1234)").
""",
    """
Name: MUTABLE_DEFAULT_ARGUMENT
Description: Using mutable objects as default arguments can lead to unexpected behavior.
Example code:
```python
def add_item(item, items=[]):
    items.append(item)
    return items
```
Example comment:
[BOT] MUTABLE_DEFAULT_ARGUMENT: Avoid using mutable default arguments. Use None and initialize in the function: `def add_item(item, items=None): items = items or []`
""",
    """
Name: NON_DESCRIPTIVE_NAME
Description: Variable names should clearly indicate their purpose or content.
Example code:
```python
def process(lst):
    res = []
    for i in lst:
        res.append(i * 2)
    return res
```
Example comment:
[BOT] NON_DESCRIPTIVE_NAME: Use more descriptive names: 'lst' could be 'numbers', 'res' could be 'doubled_numbers', 'i' could be 'number'
""",
]

CONCATENATED_CHECKS = "\n\n---\n\n".join(check for check in PR_BOT_CHECKS)

PROMPT_TEMPLATE = """You are a very experienced code reviewer. You are given a git diff on a file: {{filename}}

## Context
The git diff contains all changes of a single file. All lines are prepended with their number. Lines without line number were removed from the file.
After the line number, a line that was changed has a "+" before the code. All lines without a "+" are just here for context, you will not comment on them.

## Input
### Code diff
{{diff}}

## Task
Your task is to review these changes, according to different rules. Only comment on lines that were added, so the lines that have a + just after the line number.
The rules are the following:

{{checks}}

### Response Format
You need to return a review as a json as follows:
```json
[
  {
    "content": "the comment as a text",
    "suggestion": "if the change you propose is a single line, then put here the single line rewritten that includes your proposal change. IMPORTANT: a single line, which will erase the current line. Put empty string if no suggestion or if the suggestion is more than a single line",
    "line": "line number where the comment applies"
  },
  …
]
```
Please use triple backticks ``` to delimit your JSON list of comments. Don't output more than 5 comments, only comment the most relevant sections.
If there are no comments and the code seems fine, just output an empty JSON list."""


@tool(description_mode="only_docstring")
def format_git_diff(diff_text: str) -> str:
    """
    Formats a git diff by adding line numbers to each line except removal lines.
    """

    def pad_number(number: int, width: int) -> str:
        """Right-align a number with specified width using space padding."""
        return str(number).rjust(width)

    LINE_NUMBER_WIDTH = 5
    PADDING_WIDTH = LINE_NUMBER_WIDTH + 1
    current_line_number = 0
    formatted_lines = []

    for line in diff_text.split("\n"):
        # Handle diff header lines (e.g., "@@ -1,7 +1,6 @@")
        if line.startswith("@@"):
            try:
                # Extract the starting line number of the new file. The line count is
                # optional in unified diffs (e.g., "@@ -1 +1 @@"), so it is not parsed.
                _, position_info, _ = line.split("@@")
                new_file_info = position_info.split()[1][1:]  # Remove the '+' prefix
                start_line = int(new_file_info.split(",")[0])

                current_line_number = start_line
                formatted_lines.append(line)
                continue

            except (ValueError, IndexError):
                raise ValueError(f"Invalid diff header format: {line}")

        # Handle content lines
        if current_line_number > 0 and line:
            if not line.startswith("-"):
                # Add line number for added/context lines
                line_prefix = pad_number(current_line_number, LINE_NUMBER_WIDTH)
                formatted_lines.append(f"{line_prefix} {line}")
                current_line_number += 1
            else:
                # Just add padding for removal lines
                formatted_lines.append(" " * PADDING_WIDTH + line)

    return "\n".join(formatted_lines)


# %%[markdown]
## Create the flow that generates review comments

# %%
from wayflowcore._utils._templating_helpers import render_template_partially
from wayflowcore.property import AnyProperty, DictProperty, ListProperty, StringProperty
from wayflowcore.steps import (
    ExtractValueFromJsonStep,
    MapStep,
    OutputMessageStep,
    PromptExecutionStep,
    ToolExecutionStep,
)

# IO Variable Names
DIFF_TO_STRING_IO = "$diff_to_string"
DIFF_WITH_LINES_IO = "$diff_with_lines"
FILEPATH_IO = "$filename"
JSON_COMMENTS_IO = "$json_comments"
EXTRACTED_COMMENTS_IO = "$extracted_comments"
NESTED_COMMENT_LIST_IO = "$nested_comment_list"
FILEPATH_LIST_IO = "$filepath_list"

# Define the steps

# Step 1: Format the diff to a string
format_diff_to_string_step = OutputMessageStep(
    name="format_diff_to_string",
    message_template="{{ message | string }}",
    output_mapping={OutputMessageStep.OUTPUT: DIFF_TO_STRING_IO},
)

# Step 2: Add line numbers to the diff using a tool
add_lines_on_diff_step = ToolExecutionStep(
    name="add_lines_on_diff",
    tool=format_git_diff,
    input_mapping={"diff_text": DIFF_TO_STRING_IO},
    output_mapping={ToolExecutionStep.TOOL_OUTPUT: DIFF_WITH_LINES_IO},
)

# Step 3: Extract the file path from the diff string using a regular expression
extract_file_path_step = RegexExtractionStep(
    name="extract_file_path",
    regex_pattern=r"diff --git src://(.+?) dst://",
    return_first_match_only=True,
    input_mapping={RegexExtractionStep.TEXT: DIFF_TO_STRING_IO},
    output_mapping={RegexExtractionStep.OUTPUT: FILEPATH_IO},
)

# Step 4: Generate comments using a prompt
generate_comments_step = PromptExecutionStep(
    name="generate_comments",
    prompt_template=render_template_partially(PROMPT_TEMPLATE, {"checks": CONCATENATED_CHECKS}),
    llm=llm,
    input_mapping={"diff": DIFF_WITH_LINES_IO, "filename": FILEPATH_IO},
    output_mapping={PromptExecutionStep.OUTPUT: JSON_COMMENTS_IO},
)

# Step 5: Extract comments from the JSON output
# Define the value type for extracted comments
comments_valuetype = ListProperty(
    name="values",
    description="The extracted comments content and line number",
    item_type=DictProperty(value_type=AnyProperty()),
    default_value=[],
)
extract_comments_from_json_step = ExtractValueFromJsonStep(
    name="extract_comments_from_json",
    output_values={comments_valuetype: '[.[] | {"content": .["content"], "line": .["line"]}]'},
    retry=True,
    llm=llm,
    input_mapping={ExtractValueFromJsonStep.TEXT: JSON_COMMENTS_IO},
    output_mapping={"values": EXTRACTED_COMMENTS_IO},
)

# Define the sub flow to generate comments for each file diff
generate_comments_subflow = Flow(
    name="Generate review comments flow",
    begin_step=format_diff_to_string_step,
    control_flow_edges=[
        ControlFlowEdge(format_diff_to_string_step, add_lines_on_diff_step),
        ControlFlowEdge(add_lines_on_diff_step, extract_file_path_step),
        ControlFlowEdge(extract_file_path_step, generate_comments_step),
        ControlFlowEdge(generate_comments_step, extract_comments_from_json_step),
        ControlFlowEdge(extract_comments_from_json_step, None),
    ],
    data_flow_edges=[
        DataFlowEdge(
            format_diff_to_string_step, DIFF_TO_STRING_IO, add_lines_on_diff_step, DIFF_TO_STRING_IO
        ),
        DataFlowEdge(
            format_diff_to_string_step, DIFF_TO_STRING_IO, extract_file_path_step, DIFF_TO_STRING_IO
        ),
        DataFlowEdge(
            add_lines_on_diff_step, DIFF_WITH_LINES_IO, generate_comments_step, DIFF_WITH_LINES_IO
        ),
        DataFlowEdge(extract_file_path_step, FILEPATH_IO, generate_comments_step, FILEPATH_IO),
        DataFlowEdge(
            generate_comments_step,
            JSON_COMMENTS_IO,
            extract_comments_from_json_step,
            JSON_COMMENTS_IO,
        ),
    ],
)

# Use the MapStep to apply the sub flow to each file
for_each_file_step = MapStep(
    flow=generate_comments_subflow,
    unpack_input={"message": "."},
    input_mapping={MapStep.ITERATED_INPUT: FILE_DIFF_LIST_IO},
    output_descriptors=[
        ListProperty(name=NESTED_COMMENT_LIST_IO, item_type=AnyProperty()),
        ListProperty(name=FILEPATH_LIST_IO, item_type=StringProperty()),
    ],
    output_mapping={EXTRACTED_COMMENTS_IO: NESTED_COMMENT_LIST_IO, FILEPATH_IO: FILEPATH_LIST_IO},
)

generate_all_comments_subflow = Flow.from_steps([for_each_file_step])


# %%[markdown]
## Test the flow that generates review comments

# %%
# we reuse the FILE_DIFF_LIST from the previous test
test_conversation = generate_all_comments_subflow.start_conversation(
    inputs={
        FILE_DIFF_LIST_IO: FILE_DIFF_LIST,
    }
)

execution_status = test_conversation.execute()

if not isinstance(execution_status, FinishedStatus):
    raise ValueError("Unexpected status type")

NESTED_COMMENT_LIST = execution_status.output_values[NESTED_COMMENT_LIST_IO]
FILEPATH_LIST = execution_status.output_values[FILEPATH_LIST_IO]
print(NESTED_COMMENT_LIST[0])
print(FILEPATH_LIST)


# %%[markdown]
## Create the tool that formats the review comments

# %%
@tool(description_mode="only_docstring")
def flatten_information(
    nested_comments_list: List[List[Dict[str, str]]], filepath_list: List[str]
) -> List[Dict[str, str]]:
    """Flattens information from comments and filepaths."""
    if len(nested_comments_list) != len(filepath_list):
        raise ValueError(
            f"Inconsistent list lengths ({len(nested_comments_list)=} and {len(filepath_list)=})"
        )

    result: List[Dict[str, str]] = []
    for comments_list, filepath in zip(nested_comments_list, filepath_list):
        for comment_dict in comments_list:
            result.append(
                {
                    **{key: str(value) for key, value in comment_dict.items()},
                    "path": filepath,
                }
            )

    return result


552# %%[markdown]
553## Create flow that posts review comments to bitbucket
554
555# %%
556import json
557
558# IO Values
559PR_POST_URL_IO = "$pr_post_url"
560FLATTENED_COMMENT_LIST_IO = "$flattened_comment_list"
561FINAL_HTTP_CODES_IO = "$http_codes"
562
563# Define the steps
564
565# Step 1: Flatten the generated comments into a list of comments
566flatten_nested_comments_list_step = ToolExecutionStep(
567 name="flatten_nested_comment_list",
568 tool=flatten_information,
569 input_mapping={
570 "nested_comments_list": NESTED_COMMENT_LIST_IO,
571 "filepath_list": FILEPATH_LIST_IO,
572 },
573 output_mapping={ToolExecutionStep.TOOL_OUTPUT: FLATTENED_COMMENT_LIST_IO},
574)
575
576# Step 2: Post the comments to bitbucket
577post_comment_step = ApiCallStep(
578 url="https://example.com/rest/api/latest/projects/{{workspace}}/repos/{{repo_slug}}/pull-requests/{{pr_id}}/comments?diffType=EFFECTIVE&markup=true&avatarSize=48",
579 method="POST",
580 data=json.dumps(
581 {
582 "text": "{{content}}",
583 "severity": "NORMAL",
584 "anchor": {
585 "diffType": "EFFECTIVE",
586 "path": "{{path}}",
587 "lineType": "ADDED",
588 "line": "{{line | int}}",
589 "fileType": "TO",
590 },
591 }
592 ),
593 headers={"Accept": "application/json", "Authorization": "Bearer {{token}}"},
594 ignore_bad_http_requests=False,
595 retry_policy=RetryPolicy(max_attempts=2),
596 store_response=True,
597 input_mapping={
598 "token": USER_PROVIDED_TOKEN_IO,
599 "workspace": REPO_WORKSPACE_IO,
600 "repo_slug": REPO_SLUG_IO,
601 "pr_id": PULL_REQUEST_ID_IO,
602 },
603)
604
605post_comments_mapstep = MapStep(
606 name="post_comment",
607 flow=Flow.from_steps([post_comment_step]),
608 unpack_input={"content": ".content", "line": ".line", "path": ".path"},
609 input_mapping={MapStep.ITERATED_INPUT: FLATTENED_COMMENT_LIST_IO},
610 output_descriptors=[ApiCallStep.HTTP_STATUS_CODE],
611 output_mapping={ApiCallStep.HTTP_STATUS_CODE: FINAL_HTTP_CODES_IO},
612)

post_comments_subflow = Flow(
    name="Post comments to PR flow",
    begin_step=flatten_nested_comments_list_step,
    control_flow_edges=[
        ControlFlowEdge(flatten_nested_comments_list_step, post_comments_mapstep),
        ControlFlowEdge(post_comments_mapstep, None),
    ],
    data_flow_edges=[
        DataFlowEdge(
            flatten_nested_comments_list_step,
            FLATTENED_COMMENT_LIST_IO,
            post_comments_mapstep,
            FLATTENED_COMMENT_LIST_IO,
        )
    ],
)

# Mock the HTTP call so that this tutorial can run without a Bitbucket server.
from types import MethodType

from wayflowcore.steps.step import StepResult


async def _mock_api_post_step_invoke(self, inputs, conversation):
    # Only the status code is collected by post_comments_mapstep; the mocked
    # response body is not used.
    output_values = {ApiCallStep.HTTP_RESPONSE: MOCK_DIFF, ApiCallStep.HTTP_STATUS_CODE: 200}
    return StepResult(
        outputs=output_values,
    )


post_comment_step.invoke_async = MethodType(_mock_api_post_step_invoke, post_comment_step)
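# %%
# Stdlib sketch (illustration only) of the patching pattern used above.
# types.MethodType binds a plain async function to ONE object. Only that
# instance's invoke_async is replaced; the class and every other instance keep
# the original method. FakeApiStep is a hypothetical stand-in, not wayflowcore's
# ApiCallStep, and _drive is a helper written only for this demo.
from types import MethodType


class FakeApiStep:
    async def invoke_async(self, inputs, conversation):
        raise RuntimeError("would perform a real HTTP request")


async def _mock_invoke(self, inputs, conversation):
    # `self` is the patched instance, just as in a regular method.
    return {"http_status_code": 200}


def _drive(coro):
    # Run a coroutine that never suspends without starting an event loop,
    # so the demo also works inside an already-running notebook loop.
    try:
        coro.send(None)
    except StopIteration as stop:
        return stop.value
    coro.close()
    raise RuntimeError("coroutine suspended unexpectedly")


patched_step, untouched_step = FakeApiStep(), FakeApiStep()
patched_step.invoke_async = MethodType(_mock_invoke, patched_step)

print(_drive(patched_step.invoke_async({}, None)))  # {'http_status_code': 200}
print(untouched_step.invoke_async.__func__ is FakeApiStep.invoke_async)  # True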


# %%[markdown]
## Test flow that posts review comments

# %%
# We reuse NESTED_COMMENT_LIST and FILEPATH_LIST from the previous test.

test_conversation = post_comments_subflow.start_conversation(
    inputs={
        USER_PROVIDED_TOKEN_IO: "MY_TOKEN",
        REPO_WORKSPACE_IO: "MY_REPO_WORKSPACE",
        REPO_SLUG_IO: "MY_REPO_SLUG",
        PULL_REQUEST_ID_IO: "MY_PULL_REQUEST_ID",
        NESTED_COMMENT_LIST_IO: NESTED_COMMENT_LIST,
        FILEPATH_LIST_IO: FILEPATH_LIST,
    }
)
execution_status = test_conversation.execute()

if not isinstance(execution_status, FinishedStatus):
    raise ValueError("Unexpected status type")

FINAL_HTTP_CODES = execution_status.output_values[FINAL_HTTP_CODES_IO]
print(FINAL_HTTP_CODES)


# %%[markdown]
## Create flow that performs the review

# %%
from wayflowcore.steps import FlowExecutionStep


# Steps
retrieve_diff_flowstep = FlowExecutionStep(name="retrieve_diff_flowstep", flow=retrieve_diff_subflow)
generate_all_comments_flowstep = FlowExecutionStep(
    name="generate_comments_flowstep",
    flow=generate_all_comments_subflow,
)

pr_bot = Flow(
    name="PR bot flow",
    begin_step=retrieve_diff_flowstep,
    control_flow_edges=[
        ControlFlowEdge(retrieve_diff_flowstep, generate_all_comments_flowstep),
        ControlFlowEdge(generate_all_comments_flowstep, None),
    ],
    data_flow_edges=[
        DataFlowEdge(
            retrieve_diff_flowstep,
            FILE_DIFF_LIST_IO,
            generate_all_comments_flowstep,
            FILE_DIFF_LIST_IO,
        )
    ],
)


# %%[markdown]
## Test flow that performs the review

# %%
# Replace the path below with the root directory of a local git repository to review.
PATH_TO_DIR = "path/to/repository_root"

conversation = pr_bot.start_conversation(inputs={REPO_DIRPATH_IO: PATH_TO_DIR})

execution_status = conversation.execute()

if not isinstance(execution_status, FinishedStatus):
    raise ValueError("Unexpected status type")

print(execution_status.output_values)

NESTED_COMMENT_LIST = execution_status.output_values[NESTED_COMMENT_LIST_IO]


# %%[markdown]
## Export config to Agent Spec

# %%
from wayflowcore.agentspec import AgentSpecExporter

serialized_assistant = AgentSpecExporter().to_json(pr_bot)


# %%[markdown]
## Load Agent Spec config

# %%
from wayflowcore.agentspec import AgentSpecLoader

tool_registry = {
    "local_get_pr_diff_tool": local_get_pr_diff_tool,
    "format_git_diff": format_git_diff,
}

assistant = AgentSpecLoader(tool_registry=tool_registry).load_json(serialized_assistant)
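# %%
# Stdlib sketch (illustration only) of the likely reason AgentSpecLoader is
# given a tool_registry: a serialized configuration can only refer to locally
# implemented tools by name, so the Python callables have to be supplied again
# at load time. The JSON layout and the helper names below (format_diff_stub,
# resolve_tools, sketch_config) are hypothetical and far simpler than a real
# Agent Spec document.
import json
from typing import Callable, Dict


def format_diff_stub(diff: str) -> str:
    # Hypothetical stand-in for a registered tool implementation.
    return diff.strip()


sketch_config = json.dumps({"name": "PR bot flow", "tools": [{"name": "format_git_diff"}]})


def resolve_tools(serialized: str, registry: Dict[str, Callable]) -> Dict[str, Callable]:
    spec = json.loads(serialized)
    names = [tool["name"] for tool in spec["tools"]]
    missing = [name for name in names if name not in registry]
    if missing:
        # Without an implementation, a tool referenced by name cannot be rebuilt.
        raise KeyError(f"no implementation registered for {missing}")
    return {name: registry[name] for name in names}


sketch_tools = resolve_tools(sketch_config, {"format_git_diff": format_diff_stub})
print(sketch_tools["format_git_diff"]("  diff --git a/x b/x  "))  # diff --git a/x b/x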