Build a Simple Code Review Assistant#
Prerequisites
This guide does not assume any prior knowledge about Project WayFlow. However, it assumes the reader has a basic knowledge of LLMs.
You will need a working installation of WayFlow - see Installation.
Learning goals#
In this use-case tutorial, you will build a more advanced WayFlow application, a Pull Request (PR) Reviewing Assistant, using a WayFlow Flow to automate basic reviews of Python source code.
In this tutorial you will:
Learn the basics of using Flows to build an assistant.
Learn how to compose multiple sub-flows to create a more complex Flow.
Learn more about building Tools that can be used within your Flows.
You can download a Jupyter Notebook for this use-case to follow along from Code PR Review Bot Tutorial.
Introduction to the task#
Code reviews are crucial for maintaining code quality and reviewers often spend considerable time pointing out routine issues such as the presence of debug statements, formatting inconsistencies, or common coding convention violations that may not be fully captured by static code analysis tools. This consumes valuable time that could be spent on reviewing more important things such as the core logic, architecture, and business requirements.
Note
Building an agent with WayFlow to perform such code reviews has a number of advantages:
Review rules can be written using natural language, making an agent much more flexible than a simple static checker.
Writing rules in natural language makes updating the rules very easy.
More general issues can be captured. You can allow the LLM to infer from the rule to more general cases that could be missed by a simple static checker.
New review rules can be generated from the collected comments of existing PRs.
In this tutorial, you will create a WayFlow Flow assistant designed to scan Python pull requests for common oversights such as:
Having TODO comments without associated tickets.
Using unclear or ambiguous variable naming.
Using risky Python code practices such as mutable defaults.
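To make the last item concrete, here is a minimal sketch of why a mutable default argument is risky. The function names are illustrative only and are not taken from the sample codebase.

```python
def add_item(item, items=[]):
    # The default list is created once, when the function is defined,
    # so every call that omits `items` appends to the same list object.
    items.append(item)
    return items


def add_item_safe(item, items=None):
    # Using None as a sentinel creates a fresh list on each call.
    if items is None:
        items = []
    items.append(item)
    return items


first = add_item("a")
second = add_item("b")
print(first is second)  # both names refer to the shared default list
print(add_item_safe("a"), add_item_safe("b"))
```

This is the kind of pattern the MUTABLE_DEFAULT_ARGUMENT check defined later in the tutorial is meant to flag.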
To build this assistant you will break the task into configuration and two sub-flows that will be composed into a single flow:
Configure your application, choose an LLM and import required modules [Part 1].
The first sub-flow retrieves and diffs information from a local codebase in a Git repository [Part 2].
The second sub-flow iterates over the file diffs using a MapStep and generates comments with an LLM using the PromptExecutionStep [Part 3].
You will also learn how to extract information using the RegexExtractionStep and the ExtractValueFromJsonStep, and how to build and execute tools with the ServerTool and the ToolExecutionStep.
Note
This is not a production-ready code review assistant that can be used as-is.
Setup#
First, let’s set up the environment. For this tutorial you need to have wayflowcore installed (for additional information, please read the installation guide).
Next, download the example codebase Git repository. This will be used to generate the sample code diffs for the assistant to review.
Extract the codebase Git repository folder from the compressed archive. Make a note of where the codebase Git repository is extracted to.
Part 1: Imports and LLM configuration#
WayFlow supports several LLM API providers. To learn more about the supported providers, read the guide on how to use LLMs from different providers.
First choose an LLM from one of the options below:
# Option 1: OCI Generative AI
from wayflowcore.models import OCIGenAIModel

if __name__ == "__main__":
    llm = OCIGenAIModel(
        model_id="provider.model-id",
        service_endpoint="https://url-to-service-endpoint.com",
        compartment_id="compartment-id",
        auth_type="API_KEY",
    )

# Option 2: vLLM
from wayflowcore.models import VllmModel

llm = VllmModel(
    model_id="model-id",
    host_port="VLLM_HOST_PORT",
)

# Option 3: Ollama
from wayflowcore.models import OllamaModel

llm = OllamaModel(
    model_id="model-id",
)
Note
API keys should never be stored in code. Use environment variables and/or tools such as python-dotenv instead.
Be cautious when using external LLM providers and ensure that you comply with your organization’s security policies and any applicable laws and regulations. Consider using a self-hosted LLM solution or a provider that offers on-premises deployment options if you need to maintain strict control over your code and data.
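As a minimal sketch of the advice above, configuration values can be read from environment variables instead of being written into the source. The helper name read_setting and the variable name LLM_MODEL_ID are assumptions made for this illustration; they are not defined by WayFlow.

```python
import os


def read_setting(name: str) -> str:
    """Return the value of an environment variable or fail with a clear error."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# For demonstration only: provide a placeholder when the variable is missing.
os.environ.setdefault("LLM_MODEL_ID", "model-id")

model_id = read_setting("LLM_MODEL_ID")
print(model_id)
```

The value read this way can then be passed to the model constructor of your choice, for example as its model_id argument.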
Part 2: Retrieve the PR diff information#
The first phase of the assistant requires retrieving information about the code diffs from a code repository. You have already extracted the sample codebase Git repository to your local environment.
This will be a sub-flow that consists of two simple steps:
ToolExecutionStep that collects PR diff information using a Python subprocess to run the Git command.
RegexExtractionStep which separates the raw diff information into diffs for each file.
First, take a look at what a diff looks like. The following example shows how a real diff appears when using Git:
MOCK_DIFF = """
diff --git src://calculators/utils.py dst://calculators/utils.py
index 12345678..90123456 100644
--- src://calculators/utils.py
+++ dst://calculators/utils.py
@@ -10,6 +10,15 @@
 def calculate_total(data):
     # TODO: implement tax calculation
     return data
+def get_items(items=[]):
+    result = []
+    for item in items:
+        result.append(item * 2)
+    return result
+
+def process_numbers(numbers):
+    res = []
+    for x in numbers:
+        res.append(x + 1)
+    return res
+
 def calculate_average(numbers):
     return sum(numbers) / len(numbers)
diff --git src://example/utils.py dst://example/utils.py
index 000000000..123456789
--- /dev/null
+++ dst://example/utils.py
@@ -0,0 +1,20 @@
+# Copyright © 2024 Oracle and/or its affiliates.
+
+def calculate_sum(numbers=[]):
+    total = 0
+    for num in numbers:
+        total += num
+    return total
+
+
+def process_data(data):
+    # TODO: Handle exceptions here
+    result = data * 2
+    return result
+
+
+def main():
+    numbers = [1, 2, 3, 4, 5]
+    result = calculate_sum(numbers)
+    print("Sum:", result)
+    data = 10
+    processed_data = process_data(data)
+    print("Processed Data:", processed_data)
+
+
+if __name__ == "__main__":
+    main()
""".strip()
Reading a diff: Removals are identified by the “-” marks and additions by the “+” marks. In this example, there were only additions.
The diff above contains information about two files, calculators/utils.py and example/utils.py.
This is an example diff and it is different from the diff that will be generated from the sample codebase.
It is included here to show how a Git diff looks and is shorter than the diff that you generate from the sample codebase.
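To illustrate how the "+" and "-" markers can be read programmatically, here is a small sketch that counts added and removed lines in a diff. The function count_changes and the sample text are illustrative assumptions and are not part of the tutorial code. As a simplification, lines starting with "+++" or "---" are always treated as file headers.

```python
def count_changes(diff_text: str) -> tuple:
    """Count added and removed lines in a unified diff (simplified)."""
    added = 0
    removed = 0
    for line in diff_text.splitlines():
        # "+++" and "---" introduce the file names, not code changes.
        if line.startswith("+") and not line.startswith("+++"):
            added += 1
        elif line.startswith("-") and not line.startswith("---"):
            removed += 1
    return added, removed


sample_diff = "\n".join(
    [
        "--- a/app.py",
        "+++ b/app.py",
        "@@ -1,2 +1,3 @@",
        " import os",
        "-print('debug')",
        "+import sys",
        "+print('ready')",
    ]
)

added, removed = count_changes(sample_diff)
print(f"added={added}, removed={removed}")
```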
Build a tool#
You need to create a tool to extract a code diff from the local code repository. The @tool decorator can be used for that purpose by simply wrapping a Python function.
The function, local_get_pr_diff_tool
, in the code below does the work of extracting the diffs by
running the git diff HEAD
shell command and capturing the output. It uses a subprocess to run the shell command.
To turn this function into a WayFlow tool, a @tool
annotation is used to create a ServerTool from the function.
from wayflowcore.tools import tool


@tool(description_mode="only_docstring")
def local_get_pr_diff_tool(repo_dirpath: str) -> str:
    """
    Retrieves code diff with a git command given the
    path to the repository root folder.
    """
    import subprocess

    result = subprocess.run(
        ["git", "diff", "HEAD"],
        capture_output=True,
        cwd=repo_dirpath,
        text=True,
    )
    return result.stdout.strip()
Building the steps and the sub-flow#
Let’s write the code for the first sub-flow.
from wayflowcore.controlconnection import ControlFlowEdge
from wayflowcore.dataconnection import DataFlowEdge
from wayflowcore.flow import Flow
from wayflowcore.property import StringProperty
from wayflowcore.steps import RegexExtractionStep, StartStep, ToolExecutionStep

# IO Variable Names
REPO_DIRPATH_IO = "$repo_dirpath_io"
PR_DIFF_IO = "$raw_pr_diff"
FILE_DIFF_LIST_IO = "$file_diff_list"

# Define the steps

start_step = StartStep(name="start_step", input_descriptors=[StringProperty(name=REPO_DIRPATH_IO)])

# Step 1: Retrieve the pull request diff using the local tool
get_pr_diff_step = ToolExecutionStep(
    name="get_pr_diff",
    tool=local_get_pr_diff_tool,
    raise_exceptions=True,
    input_mapping={"repo_dirpath": REPO_DIRPATH_IO},
    output_mapping={ToolExecutionStep.TOOL_OUTPUT: PR_DIFF_IO},
)

# Step 2: Extract the file diffs from the raw diff using a regular expression
extract_into_list_of_file_diff_step = RegexExtractionStep(
    name="extract_into_list_of_file_diff",
    regex_pattern=r"(diff --git[\s\S]*?)(?=diff --git|$)",
    return_first_match_only=False,
    input_mapping={RegexExtractionStep.TEXT: PR_DIFF_IO},
    output_mapping={RegexExtractionStep.OUTPUT: FILE_DIFF_LIST_IO},
)

# Define the sub flow
retrieve_diff_subflow = Flow(
    name="Retrieve PR diff flow",
    begin_step=start_step,
    control_flow_edges=[
        ControlFlowEdge(source_step=start_step, destination_step=get_pr_diff_step),
        ControlFlowEdge(
            source_step=get_pr_diff_step, destination_step=extract_into_list_of_file_diff_step
        ),
        ControlFlowEdge(source_step=extract_into_list_of_file_diff_step, destination_step=None),
    ],
    data_flow_edges=[
        DataFlowEdge(
            source_step=start_step,
            source_output=REPO_DIRPATH_IO,
            destination_step=get_pr_diff_step,
            destination_input=REPO_DIRPATH_IO,
        ),
        DataFlowEdge(
            source_step=get_pr_diff_step,
            source_output=PR_DIFF_IO,
            destination_step=extract_into_list_of_file_diff_step,
            destination_input=PR_DIFF_IO,
        ),
    ],
)
API Reference: Flow | RegexExtractionStep | ToolExecutionStep | tool
The code does the following:
It lists the names of the steps and input/output variables for the sub-flow.
It then creates the different steps within the sub-flow.
Finally, it instantiates the sub-flow. This will be covered in more detail later in the tutorial.
The input/output variables used in the Flow are given explicit names, e.g. “$repo_dirpath_io” for the variable holding the path to the local repository. The names are prefixed with a dollar ($) sign; this is not necessary and is only done for code clarity. These names are also defined as string variables (e.g. REPO_DIRPATH_IO) to minimize the number of magic strings in the code.
The variable REPO_DIRPATH_IO holds the name of the input that you will use to pass in the location of the sample codebase Git repository.
See also
To learn about the basics of Flows, check out our introductory tutorial on WayFlow Flows.
Now take a look at each of the steps used in the sub-flow in more detail.
Get the PR diff, get_pr_diff_step#
This uses a ToolExecutionStep to gather the diff information - see the notes on how this is done earlier. When creating it, you need to provide the following:
tool: Specifies the tool that will be called within the step. This is the tool that was created earlier, local_get_pr_diff_tool.
raise_exceptions: Whether to raise exceptions generated by the tool that is called. Here it is set to True and so exceptions will be raised.
input_mapping: Specifies the names used for the input parameters of the step. Here, the tool parameter repo_dirpath is mapped to the name held in REPO_DIRPATH_IO. See ToolExecutionStep for more details on using an input_mapping with this type of step.
output_mapping: Specifies the name used for the output parameter of the step. The default output name, ToolExecutionStep.TOOL_OUTPUT, is mapped to the name held in PR_DIFF_IO. Again, see ToolExecutionStep for more details on using an output_mapping with this type of step.
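Note that raise_exceptions only matters if the tool itself raises. As written, the tool calls subprocess.run without check=True, so a failing git command returns empty output rather than raising. The following is a minimal sketch of how a non-zero exit code could be surfaced as an exception. The helper run_command is hypothetical, and the Python interpreter is used as a stand-in command so that the example runs without a Git repository.

```python
import subprocess
import sys


def run_command(command, cwd=None):
    """Run a command and return its stripped stdout, raising on failure."""
    result = subprocess.run(
        command,
        capture_output=True,
        cwd=cwd,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout.strip()


# Stand-in for a successful `git diff HEAD` call.
output = run_command([sys.executable, "-c", "print('diff output')"])
print(output)

# Stand-in for a failing command: the error is raised instead of ignored.
try:
    run_command([sys.executable, "-c", "import sys; sys.exit(2)"])
    failure_code = 0
except subprocess.CalledProcessError as error:
    failure_code = error.returncode
print(failure_code)
```

Whether to adopt this in the tool is a design choice; the tutorial code keeps the simpler version.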
Extract file diffs into a list, extract_into_list_of_file_diff_step#
You now have the diff information from the PR. This step performs a regex extraction on the raw diff text to extract the diff of each file to review.
Use a RegexExtractionStep to perform this action. When creating the step, you need to provide the following:
regex_pattern: The regex pattern for the extraction. This uses re.findall underneath.
return_first_match_only: You want to return all results, so set this to False.
input_mapping: Specifies the names used for the input parameters of the step. The input parameter, RegexExtractionStep.TEXT, will be mapped to the name held in PR_DIFF_IO. See RegexExtractionStep for more details on using an input_mapping with this type of step.
output_mapping: Specifies the name used for the output parameter of the step. Here, the default name RegexExtractionStep.OUTPUT is renamed to the name defined in FILE_DIFF_LIST_IO. Again, see RegexExtractionStep for more details on using an output_mapping with this type of step.
About the pattern:
(diff --git[\s\S]*?)(?=diff --git|$)
The pattern looks for text starting with diff --git, followed by any characters (both whitespace, \s, and non-whitespace, \S), until it encounters either another diff --git or the end of the text ($). However, it does not include the next diff --git or the end in the match, because these are placed inside a lookahead, (?=...).
The *? makes it “lazy” or non-greedy, meaning it takes the shortest possible match, rather than the longest.
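The behavior of the pattern can be checked directly with Python's re.findall. The following sketch uses a tiny hand-written diff; the file names and contents are made up for illustration.

```python
import re

FILE_DIFF_PATTERN = r"(diff --git[\s\S]*?)(?=diff --git|$)"

raw_diff = (
    "diff --git a/x.py b/x.py\n"
    "+line one\n"
    "diff --git a/y.py b/y.py\n"
    "+line two"
)

# With a single capturing group, re.findall returns one string per match.
file_diffs = re.findall(FILE_DIFF_PATTERN, raw_diff)

for file_diff in file_diffs:
    print(repr(file_diff))
```

One limitation of this approach is that a file whose changed content contains the literal text diff --git would be split at that point as well.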
Tip
Recent Large Language Models are very helpful tools to create, debug and explain Regex patterns given a natural language description.
Finally, create the sub-flow using the Flow class. You specify the steps in the Flow, the starting step of the Flow, the transitions between steps and how data, from the variables, is to pass from one step to the next.
The transitions between steps are defined with ControlFlowEdges. These take a source step and a destination step. Each ControlFlowEdge maps one such transition.
Passing values between steps is a very common occurrence when building Flows. This is done using DataFlowEdges which define that a value is passed from one step to another.
Inputs to a step will most commonly be parameters within a Jinja template, of which there are several examples in this tutorial, or parameters to callables used by tools. In a DataFlowEdge you can use the name of the parameter, a string, to act as the destination of a value that is being passed in. It is often less error-prone if you create a variable that is set to the name.
Similarly, when a value is the output of a step, such as when a user’s input is captured in an InputMessageStep, the value is available as a property of the step, for example InputMessageStep.USER_PROVIDED_INPUT. However, it lacks a meaningful name, so it is often helpful to specify one. This is done using an output_mapping when creating the step. Again, you will want to create a variable to hold the name to avoid errors.
Defining a Flow#
Defining the Flow is the last step in the code shown above. There are a couple of things that are worth highlighting:
begin_step: A start step needs to be defined for a Flow.
control_flow_edges: The transitions between the steps in the Flow are defined as ControlFlowEdges. They have a source_step, which defines the start of a transition, and a destination_step, which defines the destination of a transition. All transitions for the flow will need to be defined.
data_flow_edges: Maps the variables between steps connected by a transition using DataFlowEdges. It maps variables from a source step into variables in a destination step. You only need to do this for the variables that need to be passed between steps.
Testing the flow#
You can test this sub-flow by creating an assistant conversation with Flow.start_conversation() and specifying the inputs, in this case the location of the Git repository. The conversation can then be executed with Conversation.execute().
This returns an object that represents the status of the conversation which you can check to confirm that the conversation has successfully finished.
The code below shows how the inputs are passed in. Set PATH_TO_DIR to the actual path you extracted the sample codebase Git repository to. You then extract the outputs from the conversation.
The full code for testing the sub-flow is shown below:
from wayflowcore.executors.executionstatus import FinishedStatus

# Replace the path below with the path to your actual codebase sample git repository.
PATH_TO_DIR = "path/to/repository_root"

test_conversation = retrieve_diff_subflow.start_conversation(
    inputs={
        REPO_DIRPATH_IO: PATH_TO_DIR,
    }
)

execution_status = test_conversation.execute()

if not isinstance(execution_status, FinishedStatus):
    raise ValueError("Unexpected status type")

FILE_DIFF_LIST = execution_status.output_values[FILE_DIFF_LIST_IO]

print(FILE_DIFF_LIST[0])
API Reference: Flow
Part 3: Review the list of diffs#
Now that we have a list of diffs for each file, we can review them and generate comments using an LLM.
This task can be broken into a sub-flow made up of five steps:
OutputMessageStep: This converts the file diff list into a string to be processed by the following steps.
ToolExecutionStep: This prefixes the diffs with line numbers for additional context to the LLM.
RegexExtractionStep: This extracts the file path from the diff string.
PromptExecutionStep: This generates comments using the LLM based on a list of user-defined checks.
ExtractValueFromJsonStep: This extracts the comments and lines they apply to from the LLM output.
Build the tools and checks#
Before creating the steps and sub-flow to generate the comments, it is important to define the list of checks the assistant should perform, along with any specific instructions. Additionally, a tool must be created to prefix the diffs with line numbers, allowing the LLM to determine where to add comments.
Below is the full code to achieve this. It is broken into sections so that you can see, in detail, what is happening in each part.
PR_BOT_CHECKS = [
    """
Name: TODO_WITHOUT_TICKET
Description: TODO comments should reference a ticket number for tracking.
Example code:
```python
# TODO: Add validation here
def process_user_input(data):
    return data
```
Example comment:
[BOT] TODO_WITHOUT_TICKET: TODO comment should reference a ticket number for tracking (e.g., "TODO: Add validation here (TICKET-1234)").
""",
    """
Name: MUTABLE_DEFAULT_ARGUMENT
Description: Using mutable objects as default arguments can lead to unexpected behavior.
Example code:
```python
def add_item(item, items=[]):
    items.append(item)
    return items
```
Example comment:
[BOT] MUTABLE_DEFAULT_ARGUMENT: Avoid using mutable default arguments. Use None and initialize in the function: `def add_item(item, items=None): items = items or []`
""",
    """
Name: NON_DESCRIPTIVE_NAME
Description: Variable names should clearly indicate their purpose or content.
Example code:
```python
def process(lst):
    res = []
    for i in lst:
        res.append(i * 2)
    return res
```
Example comment:
[BOT] NON_DESCRIPTIVE_NAME: Use more descriptive names: 'lst' could be 'numbers', 'res' could be 'doubled_numbers', 'i' could be 'number'
""",
]

CONCATENATED_CHECKS = "\n\n---\n\n".join(PR_BOT_CHECKS)

PROMPT_TEMPLATE = """You are a very experienced code reviewer. You are given a git diff on a file: {{filename}}

## Context
The git diff contains all changes of a single file. All lines are prepended with their number. Lines without line number were removed from the file.
After the line number, a line that was changed has a "+" before the code. All lines without a "+" are just here for context, you will not comment on them.

## Input
### Code diff
{{diff}}

## Task
Your task is to review these changes, according to different rules. Only comment lines that were added, so the lines that have a + just after the line number.
The rules are the following:

{{checks}}

### Response Format
You need to return a review as a json as follows:
```json
[
    {
        "content": "the comment as a text",
        "suggestion": "if the change you propose is a single line, then put here the single line rewritten that includes your proposal change. IMPORTANT: a single line, which will erase the current line. Put empty string if no suggestion or if the suggestion is more than a single line",
        "line": "line number where the comment applies"
    },
    …
]
```
Please use triple backticks ``` to delimit your JSON list of comments. Don't output more than 5 comments, only comment the most relevant sections.
If there are no comments and the code seems fine, just output an empty JSON list."""


@tool(description_mode="only_docstring")
def format_git_diff(diff_text: str) -> str:
    """
    Formats a git diff by adding line numbers to each line except removal lines.
    """

    def pad_number(number: int, width: int) -> str:
        """Right-align a number with specified width using space padding."""
        return str(number).rjust(width)

    LINE_NUMBER_WIDTH = 5
    PADDING_WIDTH = LINE_NUMBER_WIDTH + 1
    current_line_number = 0
    formatted_lines = []

    for line in diff_text.split("\n"):
        # Handle diff header lines (e.g., "@@ -1,7 +1,6 @@")
        if line.startswith("@@"):
            try:
                # Extract the position info located between the two "@@" markers
                _, position_info, _ = line.split("@@", 2)
                new_file_info = position_info.split()[1][1:]  # Remove the '+' prefix
                # Git omits the ",count" part for one-line hunks, so only
                # the starting line number is read here
                current_line_number = int(new_file_info.split(",")[0])
            except (ValueError, IndexError):
                raise ValueError(f"Invalid diff header format: {line}")

            formatted_lines.append(line)
            continue

        # Handle content lines
        if current_line_number > 0 and line:
            if not line.startswith("-"):
                # Add line number for added/context lines
                line_prefix = pad_number(current_line_number, LINE_NUMBER_WIDTH)
                formatted_lines.append(f"{line_prefix} {line}")
                current_line_number += 1
            else:
                # Just add padding for removal lines
                formatted_lines.append(" " * PADDING_WIDTH + line)

    return "\n".join(formatted_lines)
API Reference: ExtractValueFromJsonStep | MapStep | OutputMessageStep | PromptExecutionStep | ToolExecutionStep
Checks and LLM instructions#
You will use the three simple checks shown above. For each check you specify a name, a description of what the LLM should be checking, as well as a code example and an expected comment example so that the LLM gets a better understanding of what the task is about.
The prompt uses a simple structure:
Role Definition: Define who/what you want the LLM to act as (e.g., “You are a very experienced code reviewer”).
Context Section: Provide relevant background information or specific circumstances that frame the task.
Input Section: Specify the exact information, data, or materials that the LLM will be provided with.
Task Section: Clearly state what you want the LLM to do with the input provided.
Response Format Section: Define how you want the response to be structured or formatted (e.g., bullet points, JSON, with XML tags, and so on).
The prompts are defined in the list, PR_BOT_CHECKS. The individual prompts for the checks are then concatenated into a single string, CONCATENATED_CHECKS, so that it can be used inside the system prompt you will be passing to the LLM.
Define a system prompt, or prompt template, PROMPT_TEMPLATE. It contains placeholders for the file name, the diff and the checks that will be replaced when specializing the prompt for each diff.
Tip
How to write high-quality prompts
There is no consensus on what makes the best LLM prompt. However, for recent LLMs, a good strategy is simply to be very specific about the task to be solved, giving enough context and explaining potential edge cases to consider.
Given a prompt, ask yourself whether an experienced colleague with no prior context about the task could reach the intended result from that set of instructions alone.
Diff formatting tool#
You next need to create a tool using the ServerTool to format the diffs in a manner that makes them consumable by the LLM. A tool, as you will have already seen, is a simple wrapper around a Python callable that makes it usable within a flow.
The function, format_git_diff, in the code above does the work of formatting the diffs.
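To see what the numbered output looks like, here is a simplified, plain-Python stand-in for format_git_diff (without the @tool decorator) applied to a small hand-written hunk. The name number_diff_lines and the sample hunk are assumptions made for this sketch, and the stand-in reads only the starting line number from each hunk header.

```python
def number_diff_lines(diff_text: str, width: int = 5) -> str:
    """Prefix added and context lines with their line number in the new file."""
    current_line_number = 0
    formatted_lines = []
    for line in diff_text.split("\n"):
        if line.startswith("@@"):
            # "@@ -10,2 +10,3 @@" -> the new file section starts at line 10
            new_file_info = line.split("@@")[1].split()[1][1:]
            current_line_number = int(new_file_info.split(",")[0])
            formatted_lines.append(line)
        elif current_line_number > 0 and line:
            if line.startswith("-"):
                # Removed lines do not exist in the new file: pad only
                formatted_lines.append(" " * (width + 1) + line)
            else:
                prefix = str(current_line_number).rjust(width)
                formatted_lines.append(f"{prefix} {line}")
                current_line_number += 1
    return "\n".join(formatted_lines)


sample_hunk = "\n".join(
    [
        "@@ -10,2 +10,3 @@",
        " def total(data):",
        "-    return data",
        "+    # TODO: add tax",
        "+    return data",
    ]
)

numbered = number_diff_lines(sample_hunk)
print(numbered)
```

The context line and the two added lines receive the numbers 10, 11 and 12, while the removed line is only padded so that the columns stay aligned.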
See also
For more information about WayFlow tools please read our guide, How to use tools.
Building the steps and the sub-flow#
With the prompts and diff formatting tool written you can now build the second sub-flow. This sub-flow will iterate over the diffs, generated previously, and then use an LLM to generate review comments from them.
from wayflowcore._utils._templating_helpers import render_template_partially
from wayflowcore.property import AnyProperty, DictProperty, ListProperty, StringProperty
from wayflowcore.steps import (
    ExtractValueFromJsonStep,
    MapStep,
    OutputMessageStep,
    PromptExecutionStep,
    ToolExecutionStep,
)

# IO Variable Names
DIFF_TO_STRING_IO = "$diff_to_string"
DIFF_WITH_LINES_IO = "$diff_with_lines"
FILEPATH_IO = "$filename"
JSON_COMMENTS_IO = "$json_comments"
EXTRACTED_COMMENTS_IO = "$extracted_comments"
NESTED_COMMENT_LIST_IO = "$nested_comment_list"
FILEPATH_LIST_IO = "$filepath_list"

# Define the steps

# Step 1: Format the diff to a string
format_diff_to_string_step = OutputMessageStep(
    name="format_diff_to_string",
    message_template="{{ message | string }}",
    output_mapping={OutputMessageStep.OUTPUT: DIFF_TO_STRING_IO},
)

# Step 2: Add lines on the diff using a tool
add_lines_on_diff_step = ToolExecutionStep(
    name="add_lines_on_diff",
    tool=format_git_diff,
    input_mapping={"diff_text": DIFF_TO_STRING_IO},
    output_mapping={ToolExecutionStep.TOOL_OUTPUT: DIFF_WITH_LINES_IO},
)

# Step 3: Extract the file path from the diff string using a regular expression
extract_file_path_step = RegexExtractionStep(
    name="extract_file_path",
    regex_pattern=r"diff --git a/(.+?) b/",
    return_first_match_only=True,
    input_mapping={RegexExtractionStep.TEXT: DIFF_TO_STRING_IO},
    output_mapping={RegexExtractionStep.OUTPUT: FILEPATH_IO},
)

# Step 4: Generate comments using a prompt
generate_comments_step = PromptExecutionStep(
    name="generate_comments",
    prompt_template=render_template_partially(PROMPT_TEMPLATE, {"checks": CONCATENATED_CHECKS}),
    llm=llm,
    input_mapping={"diff": DIFF_WITH_LINES_IO, "filename": FILEPATH_IO},
    output_mapping={PromptExecutionStep.OUTPUT: JSON_COMMENTS_IO},
)

# Step 5: Extract comments from the JSON output
# Define the value type for extracted comments
comments_valuetype = ListProperty(
    name="values",
    description="The extracted comments content and line number",
    item_type=DictProperty(value_type=AnyProperty()),
)
extract_comments_from_json_step = ExtractValueFromJsonStep(
    name="extract_comments_from_json",
    output_values={comments_valuetype: '[.[] | {"content": .["content"], "line": .["line"]}]'},
    retry=True,
    llm=llm,
    input_mapping={ExtractValueFromJsonStep.TEXT: JSON_COMMENTS_IO},
    output_mapping={"values": EXTRACTED_COMMENTS_IO},
)

# Define the sub flow to generate comments for each file diff
generate_comments_subflow = Flow(
    name="Generate review comments flow",
    begin_step=format_diff_to_string_step,
    control_flow_edges=[
        ControlFlowEdge(format_diff_to_string_step, add_lines_on_diff_step),
        ControlFlowEdge(add_lines_on_diff_step, extract_file_path_step),
        ControlFlowEdge(extract_file_path_step, generate_comments_step),
        ControlFlowEdge(generate_comments_step, extract_comments_from_json_step),
        ControlFlowEdge(extract_comments_from_json_step, None),
    ],
    data_flow_edges=[
        DataFlowEdge(
            format_diff_to_string_step, DIFF_TO_STRING_IO, add_lines_on_diff_step, DIFF_TO_STRING_IO
        ),
        DataFlowEdge(
            format_diff_to_string_step, DIFF_TO_STRING_IO, extract_file_path_step, DIFF_TO_STRING_IO
        ),
        DataFlowEdge(
            add_lines_on_diff_step, DIFF_WITH_LINES_IO, generate_comments_step, DIFF_WITH_LINES_IO
        ),
        DataFlowEdge(extract_file_path_step, FILEPATH_IO, generate_comments_step, FILEPATH_IO),
        DataFlowEdge(
            generate_comments_step,
            JSON_COMMENTS_IO,
            extract_comments_from_json_step,
            JSON_COMMENTS_IO,
        ),
    ],
)

# Use the MapStep to apply the sub flow to each file
for_each_file_step = MapStep(
    flow=generate_comments_subflow,
    unpack_input={"message": "."},
    input_mapping={MapStep.ITERATED_INPUT: FILE_DIFF_LIST_IO},
    output_descriptors=[
        ListProperty(name=NESTED_COMMENT_LIST_IO, item_type=AnyProperty()),
        ListProperty(name=FILEPATH_LIST_IO, item_type=StringProperty()),
    ],
    output_mapping={EXTRACTED_COMMENTS_IO: NESTED_COMMENT_LIST_IO, FILEPATH_IO: FILEPATH_LIST_IO},
)

generate_all_comments_subflow = Flow.from_steps([for_each_file_step])
API Reference: Property | ListProperty | DictProperty | StringProperty | ExtractValueFromJsonStep | MapStep | OutputMessageStep | PromptExecutionStep | ToolExecutionStep
Take a look at each of the steps used in the sub-flow to get an understanding of what is happening.
Format diff to string, format_diff_to_string_step#
This step converts the file diff it receives as its message input into a string so that it can be used by the following steps.
This is done with the string Jinja filter as follows: {{ message | string }}. It uses an OutputMessageStep to achieve this.
Add lines to the diff, add_lines_on_diff_step#
This step prefixes the diff with the line numbers that the LLM needs in order to attach review comments to specific lines. It uses a ToolExecutionStep to run the tool that you previously defined in order to do this.
The input to the tool, within the I/O dictionary, is specified using the input_mapping. For all these steps, it is important to remember that the outputs of one step are linked to the inputs of the next.
Extract file path, extract_file_path_step#
This extracts the file path from the diff string. The file path is needed for assigning the review comments. The RegexExtractionStep step is used to extract the file path from the diff.
The regular expression is applied to the diff string, extracted from the input map using the input_mapping parameter.
Note: Compared to the RegexExtractionStep used in Part 2, here only the first match is required.
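The file path pattern can be tried out with re.findall, which is what the tutorial says RegexExtractionStep uses underneath. The sketch below uses made-up header lines. It also shows that the pattern expects the a/ and b/ prefixes that git diff prints by default, so a header using other prefixes, such as the src:// and dst:// prefixes of the mock diff shown earlier, would not match.

```python
import re

FILE_PATH_PATTERN = r"diff --git a/(.+?) b/"

default_header = "diff --git a/src/utils.py b/src/utils.py\nindex 123..456 100644"
mock_header = "diff --git src://calculators/utils.py dst://calculators/utils.py"

# Take the first match, mirroring return_first_match_only=True.
matches = re.findall(FILE_PATH_PATTERN, default_header)
file_path = matches[0] if matches else ""
print(file_path)

# No match: the header does not use the a/ and b/ prefixes.
print(re.findall(FILE_PATH_PATTERN, mock_header))
```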
Generate comments, generate_comments_step#
This generates comments using the LLM and the prompt template defined earlier. The PromptExecutionStep step executes the prompt with the LLM defined earlier in this tutorial.
Since the list of checks has already been defined, the template can be pre-rendered using the render_template_partially function. This renders the parts of the template that have been provided, while the remaining information is gathered from the I/O dictionary.
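The idea of partial rendering can be illustrated without WayFlow. The sketch below is not how render_template_partially is implemented; it is a simplified stand-in, with the hypothetical name fill_known_placeholders, that replaces only the placeholders for which a value is supplied and leaves the others untouched.

```python
import re


def fill_known_placeholders(template: str, values: dict) -> str:
    """Replace {{name}} placeholders that have a value; keep the others as-is."""

    def substitute(match):
        name = match.group(1)
        # Unknown placeholders are returned unchanged for a later rendering pass.
        return str(values[name]) if name in values else match.group(0)

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


template = "File: {{filename}}\nDiff:\n{{diff}}\nRules:\n{{checks}}"
partially_rendered = fill_known_placeholders(template, {"checks": "RULE_A\n---\nRULE_B"})
print(partially_rendered)
```

After this pass the checks are fixed in the prompt, while the filename and diff placeholders remain to be filled for each file.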
Extract comments from JSON, extract_comments_from_json_step#
This extracts the comments and line numbers from the generated LLM output, which is a serialized JSON structure due to the prompt used.
An ExtractValueFromJsonStep is used to do the extraction. When creating the step, specify the following in addition to the usual input_mapping and output_mapping:
output_values: This defines the JQ query to extract the comments from the JSON generated by the LLM.
llm: An LLM that can be used to help resolve any parsing errors. This is related to retry.
retry: If parsing fails, you may want to retry. This is set to True, which results in trying to use the LLM to help resolve any such issues.
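For intuition, the effect of the JQ query can be approximated in plain Python. This is only a sketch of the transformation, not of how ExtractValueFromJsonStep works internally, and the sample LLM output is invented for the example.

```python
import json
import re

FENCE = "`" * 3  # triple backticks, built programmatically for readability

llm_output = (
    "Here is my review:\n"
    f"{FENCE}json\n"
    '[{"content": "[BOT] MUTABLE_DEFAULT_ARGUMENT: avoid mutable defaults", '
    '"suggestion": "", "line": "3"}]\n'
    f"{FENCE}"
)

# Locate the JSON payload between the triple backticks.
match = re.search(FENCE + r"(?:json)?\s*([\s\S]*?)" + FENCE, llm_output)
raw_comments = json.loads(match.group(1))

# Equivalent of the query: keep only "content" and "line" for each comment.
comments = [{"content": c["content"], "line": c["line"]} for c in raw_comments]
print(comments)
```

The suggestion field is dropped by this projection, just as it is by the query used in the step.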
Create the sub-flow, generate_comments_subflow#
Here you define what steps are in the sub-flow, what the transitions between the steps are and what will be the starting step. This is exactly the same process you did previously when defining the sub-flow to fetch the PR data.
Applying the comment generation to all file diffs#
Now that you have the sub-flow created, you need to apply it to every file diff. This is done using a MapStep.
MapStep takes a sub-flow as input, in this case the generate_comments_subflow, and applies it to an iterable, in this case the list of file diffs.
You simply specify:
flow
: The sub-flow to map, that is applied to the iterable.unpack_input
: Defines how to unpack the input. A JQ query can be used to transform the input, but in this case, it is kept as a list.input_mapping
: Defines what the sub-flow will iterate over. The key,MapStep.ITERATED_INPUT
, is used to pass in the diffs.output_descriptors
: Specifies the values to collect from the output generated by applying the sub-flow. In this case, these will be the generated comments and the associated file path.
Note
The MapStep works similarly to how the Python map function works. For more information, see https://docs.python.org/3/library/functions.html#map
Finally, create the sub-flow to generate all comments using the helper method create_single_step_flow.
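The analogy with Python's built-in map can be made concrete with a small sketch. Here `review_single_diff` is a hypothetical stand-in for the per-file sub-flow (it uses a toy rule and toy line numbering, not the LLM), and the two list comprehensions play the role of collecting the outputs into one list of comments per file and one list of file paths.

```python
def review_single_diff(file_diff: str) -> dict:
    """Hypothetical stand-in for the per-file sub-flow (no LLM involved)."""
    lines = file_diff.splitlines()
    filename = lines[0].split(" b/")[-1]
    comments = [
        {"content": "[BOT] TODO_WITHOUT_TICKET", "line": str(number)}
        for number, line in enumerate(lines, start=1)
        if line.startswith("+") and "TODO" in line
    ]
    return {"filename": filename, "extracted_comments": comments}


# Hypothetical list of per-file diffs, one string per file.
file_diff_list = [
    "diff --git a/a.py b/a.py\n+# TODO: fix\n+x = 1",
    "diff --git a/b.py b/b.py\n+y = 2",
]

# Apply the same function to every element, as map does.
results = list(map(review_single_diff, file_diff_list))

# Collect each output across iterations into its own list.
nested_comment_list = [result["extracted_comments"] for result in results]
filepath_list = [result["filename"] for result in results]

print(nested_comment_list)
print(filepath_list)
```

The comment list is nested because each file contributes its own list of comments, which may be empty.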
Testing the sub-flow#
You can test the sub-flow by creating a conversation, as shown in the code below, and specifying the inputs as was done in Part 2: Retrieve the PR diff information.
Since each sub-flow is tested independently, you can reuse the output from the first sub-flow.
# Reuse the FILE_DIFF_LIST from the previous test
test_conversation = generate_all_comments_subflow.start_conversation(
    inputs={
        FILE_DIFF_LIST_IO: FILE_DIFF_LIST,
    }
)

execution_status = test_conversation.execute()

if not isinstance(execution_status, FinishedStatus):
    raise ValueError("Unexpected status type")

NESTED_COMMENT_LIST = execution_status.output_values[NESTED_COMMENT_LIST_IO]
FILEPATH_LIST = execution_status.output_values[FILEPATH_LIST_IO]
print(NESTED_COMMENT_LIST[0])
print(FILEPATH_LIST)
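The two outputs are parallel lists: the entry at a given index in the nested comment list belongs to the file path at the same index. A minimal sketch of pairing them up is shown below. The helper name and the sample values are hypothetical; the `content` and `line` keys follow the extraction query used by the sub-flow.

```python
def flatten_comments(nested_comment_list: list, filepath_list: list) -> list:
    """Pair every comment with the path of the file it belongs to."""
    flat = []
    for file_path, comments in zip(filepath_list, nested_comment_list):
        for comment in comments:
            flat.append((file_path, comment["line"], comment["content"]))
    return flat


# Hypothetical values shaped like the outputs of the sub-flow.
sample_nested = [
    [{"content": "[BOT] NON_DESCRIPTIVE_NAME: rename 'res'", "line": "3"}],
    [],
]
sample_paths = ["src/app.py", "src/utils.py"]

flat_comments = flatten_comments(sample_nested, sample_paths)
for file_path, line, content in flat_comments:
    print(f"{file_path}:{line} {content}")
```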
Building the final Flow#
Congratulations! You have completed the three sub-flows, which, when combined into a single flow, will retrieve the PR diff information and generate comments on the diffs using an LLM.
You will wire the sub-flows that you have built together by wrapping them in a FlowExecutionStep. The FlowExecutionSteps are then composed into the final combined Flow.
The code for this is shown below:
from wayflowcore.steps import FlowExecutionStep


# Steps
retrieve_diff_flowstep = FlowExecutionStep(name="retrieve_diff_flowstep", flow=retrieve_diff_subflow)
generate_all_comments_flowstep = FlowExecutionStep(
    name="generate_comments_flowstep",
    flow=generate_all_comments_subflow,
)

pr_bot = Flow(
    name="PR bot flow",
    begin_step=retrieve_diff_flowstep,
    control_flow_edges=[
        ControlFlowEdge(retrieve_diff_flowstep, generate_all_comments_flowstep),
        ControlFlowEdge(generate_all_comments_flowstep, None),
    ],
    data_flow_edges=[
        DataFlowEdge(
            retrieve_diff_flowstep,
            FILE_DIFF_LIST_IO,
            generate_all_comments_flowstep,
            FILE_DIFF_LIST_IO,
        )
    ],
)
API Reference: Flow | FlowExecutionStep
Testing the combined assistant#
You can now run the PR bot end-to-end on your repo or locally.
Set PATH_TO_DIR to the path where you extracted the sample codebase Git repository. The code below also shows how the output of the conversation is read from the execution_status object, using execution_status.output_values.
# Replace the path below with the path to your actual codebase sample git repository.
PATH_TO_DIR = "path/to/repository_root"

conversation = pr_bot.start_conversation(inputs={REPO_DIRPATH_IO: PATH_TO_DIR})

execution_status = conversation.execute()

if not isinstance(execution_status, FinishedStatus):
    raise ValueError("Unexpected status type")

print(execution_status.output_values)

NESTED_COMMENT_LIST = execution_status.output_values[NESTED_COMMENT_LIST_IO]
Agent Spec Exporting/Loading#
You can export the assistant configuration to its Agent Spec configuration using the AgentSpecExporter.
from wayflowcore.agentspec import AgentSpecExporter
serialized_assistant = AgentSpecExporter().to_json(pr_bot)
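Because the export is serialized as JSON, it can be inspected with the standard library. The sketch below walks a parsed configuration and counts how many components of each `component_type` it contains. The helper name and the tiny sample document are hypothetical; with the real export you would, assuming `serialized_assistant` is a JSON string, pass `json.loads(serialized_assistant)` instead.

```python
import json
from collections import Counter


def count_component_types(node, counts=None) -> Counter:
    """Recursively count the "component_type" values in a parsed configuration."""
    if counts is None:
        counts = Counter()
    if isinstance(node, dict):
        component_type = node.get("component_type")
        if isinstance(component_type, str):
            counts[component_type] += 1
        for value in node.values():
            count_component_types(value, counts)
    elif isinstance(node, list):
        for item in node:
            count_component_types(item, counts)
    return counts


# Hypothetical, heavily shortened stand-in for an exported configuration.
sample = json.dumps(
    {
        "component_type": "Flow",
        "name": "PR bot flow",
        "control_flow_connections": [{"component_type": "ControlFlowEdge"}],
        "$referenced_components": {"abc": {"component_type": "StartNode"}},
    }
)

type_counts = count_component_types(json.loads(sample))
print(dict(type_counts))
```

A summary like this is a quick way to check that every step and edge you defined made it into the export.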
Here is what the Agent Spec representation will look like ↓
{
"component_type": "Flow",
"id": "9c65246d-a0dd-4ec4-801d-afd640b2488e",
"name": "PR bot flow",
"description": "",
"metadata": {
"__metadata_info__": {}
},
"inputs": [
{
"type": "string",
"title": "$repo_dirpath_io"
}
],
"outputs": [
{
"type": "array",
"items": {
"type": "string"
},
"title": "$filepath_list"
},
{
"type": "array",
"items": {},
"title": "$nested_comment_list"
},
{
"type": "string",
"title": "$raw_pr_diff"
},
{
"description": "the list of extracted value using the regex \"(diff --git[\\s\\S]*?)(?=diff --git|$)\" from the raw input",
"type": "array",
"items": {
"type": "string"
},
"title": "$file_diff_list",
"default": []
}
],
"start_node": {
"$component_ref": "020c885e-6d0b-472a-bb91-246ab70ab1db"
},
"nodes": [
{
"$component_ref": "47e367be-4d74-49dc-ac3b-89bb97ffa7df"
},
{
"$component_ref": "43d58c76-23a0-4d10-943d-f9c5e0835a7c"
},
{
"$component_ref": "020c885e-6d0b-472a-bb91-246ab70ab1db"
},
{
"$component_ref": "a544af64-e63b-4ccf-9ab0-8d25cdbc0b93"
}
],
"control_flow_connections": [
{
"component_type": "ControlFlowEdge",
"id": "a5c123ff-c14c-4291-b174-61d61170f187",
"name": "retrieve_diff_flowstep_to_generate_comments_flowstep_control_flow_edge",
"description": null,
"metadata": {
"__metadata_info__": {}
},
"from_node": {
"$component_ref": "47e367be-4d74-49dc-ac3b-89bb97ffa7df"
},
"from_branch": null,
"to_node": {
"$component_ref": "43d58c76-23a0-4d10-943d-f9c5e0835a7c"
}
},
{
"component_type": "ControlFlowEdge",
"id": "8a10b23a-2d0c-46c4-82ac-e66ad0b9399b",
"name": "__StartStep___to_retrieve_diff_flowstep_control_flow_edge",
"description": null,
"metadata": {
"__metadata_info__": {}
},
"from_node": {
"$component_ref": "020c885e-6d0b-472a-bb91-246ab70ab1db"
},
"from_branch": null,
"to_node": {
"$component_ref": "47e367be-4d74-49dc-ac3b-89bb97ffa7df"
}
},
{
"component_type": "ControlFlowEdge",
"id": "dac07720-8a5a-4a61-b1e7-50be506ed937",
"name": "generate_comments_flowstep_to_None End node_control_flow_edge",
"description": null,
"metadata": {},
"from_node": {
"$component_ref": "43d58c76-23a0-4d10-943d-f9c5e0835a7c"
},
"from_branch": null,
"to_node": {
"$component_ref": "a544af64-e63b-4ccf-9ab0-8d25cdbc0b93"
}
}
],
"data_flow_connections": [
{
"component_type": "DataFlowEdge",
"id": "7b12dfed-309b-46ff-8a2d-bb6f2a3154b6",
"name": "retrieve_diff_flowstep_$file_diff_list_to_generate_comments_flowstep_$file_diff_list_data_flow_edge",
"description": null,
"metadata": {
"__metadata_info__": {}
},
"source_node": {
"$component_ref": "47e367be-4d74-49dc-ac3b-89bb97ffa7df"
},
"source_output": "$file_diff_list",
"destination_node": {
"$component_ref": "43d58c76-23a0-4d10-943d-f9c5e0835a7c"
},
"destination_input": "$file_diff_list"
},
{
"component_type": "DataFlowEdge",
"id": "51122844-22d3-40a8-b652-1b020ce24945",
"name": "__StartStep___$repo_dirpath_io_to_retrieve_diff_flowstep_$repo_dirpath_io_data_flow_edge",
"description": null,
"metadata": {
"__metadata_info__": {}
},
"source_node": {
"$component_ref": "020c885e-6d0b-472a-bb91-246ab70ab1db"
},
"source_output": "$repo_dirpath_io",
"destination_node": {
"$component_ref": "47e367be-4d74-49dc-ac3b-89bb97ffa7df"
},
"destination_input": "$repo_dirpath_io"
},
{
"component_type": "DataFlowEdge",
"id": "72aa469c-98cd-4f0d-9496-0aa454373aef",
"name": "generate_comments_flowstep_$filepath_list_to_None End node_$filepath_list_data_flow_edge",
"description": null,
"metadata": {},
"source_node": {
"$component_ref": "43d58c76-23a0-4d10-943d-f9c5e0835a7c"
},
"source_output": "$filepath_list",
"destination_node": {
"$component_ref": "a544af64-e63b-4ccf-9ab0-8d25cdbc0b93"
},
"destination_input": "$filepath_list"
},
{
"component_type": "DataFlowEdge",
"id": "eac1b375-1541-41f7-87f3-f3e626cc2c9c",
"name": "generate_comments_flowstep_$nested_comment_list_to_None End node_$nested_comment_list_data_flow_edge",
"description": null,
"metadata": {},
"source_node": {
"$component_ref": "43d58c76-23a0-4d10-943d-f9c5e0835a7c"
},
"source_output": "$nested_comment_list",
"destination_node": {
"$component_ref": "a544af64-e63b-4ccf-9ab0-8d25cdbc0b93"
},
"destination_input": "$nested_comment_list"
},
{
"component_type": "DataFlowEdge",
"id": "0869acb5-4d8f-4b17-b59b-3b915912b628",
"name": "retrieve_diff_flowstep_$raw_pr_diff_to_None End node_$raw_pr_diff_data_flow_edge",
"description": null,
"metadata": {},
"source_node": {
"$component_ref": "47e367be-4d74-49dc-ac3b-89bb97ffa7df"
},
"source_output": "$raw_pr_diff",
"destination_node": {
"$component_ref": "a544af64-e63b-4ccf-9ab0-8d25cdbc0b93"
},
"destination_input": "$raw_pr_diff"
},
{
"component_type": "DataFlowEdge",
"id": "9fb2ab9e-ece1-4195-8f51-ef618dcb72bb",
"name": "retrieve_diff_flowstep_$file_diff_list_to_None End node_$file_diff_list_data_flow_edge",
"description": null,
"metadata": {},
"source_node": {
"$component_ref": "47e367be-4d74-49dc-ac3b-89bb97ffa7df"
},
"source_output": "$file_diff_list",
"destination_node": {
"$component_ref": "a544af64-e63b-4ccf-9ab0-8d25cdbc0b93"
},
"destination_input": "$file_diff_list"
}
],
"$referenced_components": {
"43d58c76-23a0-4d10-943d-f9c5e0835a7c": {
"component_type": "FlowNode",
"id": "43d58c76-23a0-4d10-943d-f9c5e0835a7c",
"name": "generate_comments_flowstep",
"description": "",
"metadata": {
"__metadata_info__": {}
},
"inputs": [
{
"description": "iterated input for the map step",
"type": "array",
"items": {
"description": "\"message\" input variable for the template",
"title": "message"
},
"title": "$file_diff_list"
}
],
"outputs": [
{
"type": "array",
"items": {
"type": "string"
},
"title": "$filepath_list"
},
{
"type": "array",
"items": {},
"title": "$nested_comment_list"
}
],
"branches": [
"next"
],
"subflow": {
"component_type": "Flow",
"id": "f95e0e5d-f573-4e25-9d68-8508371246f9",
"name": "flow_028a7dfb__auto",
"description": "",
"metadata": {
"__metadata_info__": {}
},
"inputs": [
{
"description": "iterated input for the map step",
"type": "array",
"items": {
"description": "\"message\" input variable for the template",
"title": "message"
},
"title": "$file_diff_list"
}
],
"outputs": [
{
"type": "array",
"items": {
"type": "string"
},
"title": "$filepath_list"
},
{
"type": "array",
"items": {},
"title": "$nested_comment_list"
}
],
"start_node": {
"$component_ref": "367ae568-317d-42ec-ae70-4c41afe0dbd0"
},
"nodes": [
{
"$component_ref": "f127a297-842d-4d17-bc89-4704019458d7"
},
{
"$component_ref": "367ae568-317d-42ec-ae70-4c41afe0dbd0"
},
{
"$component_ref": "6f62aecf-03a1-4e38-b551-8eef0efaf4bb"
}
],
"control_flow_connections": [
{
"component_type": "ControlFlowEdge",
"id": "85a2cdff-6ad4-4f58-8d1c-c8deeb05880c",
"name": "__StartStep___to_step_0_control_flow_edge",
"description": null,
"metadata": {
"__metadata_info__": {}
},
"from_node": {
"$component_ref": "367ae568-317d-42ec-ae70-4c41afe0dbd0"
},
"from_branch": null,
"to_node": {
"$component_ref": "f127a297-842d-4d17-bc89-4704019458d7"
}
},
{
"component_type": "ControlFlowEdge",
"id": "396e218f-225e-4e36-a33c-a176ca77d345",
"name": "step_0_to_None End node_control_flow_edge",
"description": null,
"metadata": {},
"from_node": {
"$component_ref": "f127a297-842d-4d17-bc89-4704019458d7"
},
"from_branch": null,
"to_node": {
"$component_ref": "6f62aecf-03a1-4e38-b551-8eef0efaf4bb"
}
}
],
"data_flow_connections": [
{
"component_type": "DataFlowEdge",
"id": "6c8b8f78-b587-49ff-a401-6262cdafb0ee",
"name": "__StartStep___$file_diff_list_to_step_0_$file_diff_list_data_flow_edge",
"description": null,
"metadata": {
"__metadata_info__": {}
},
"source_node": {
"$component_ref": "367ae568-317d-42ec-ae70-4c41afe0dbd0"
},
"source_output": "$file_diff_list",
"destination_node": {
"$component_ref": "f127a297-842d-4d17-bc89-4704019458d7"
},
"destination_input": "$file_diff_list"
},
{
"component_type": "DataFlowEdge",
"id": "84d3a783-38c8-4d53-bc0b-4205732d1fbf",
"name": "step_0_$filepath_list_to_None End node_$filepath_list_data_flow_edge",
"description": null,
"metadata": {},
"source_node": {
"$component_ref": "f127a297-842d-4d17-bc89-4704019458d7"
},
"source_output": "$filepath_list",
"destination_node": {
"$component_ref": "6f62aecf-03a1-4e38-b551-8eef0efaf4bb"
},
"destination_input": "$filepath_list"
},
{
"component_type": "DataFlowEdge",
"id": "b7ffd4c3-4a03-47f0-95fc-0ba670010729",
"name": "step_0_$nested_comment_list_to_None End node_$nested_comment_list_data_flow_edge",
"description": null,
"metadata": {},
"source_node": {
"$component_ref": "f127a297-842d-4d17-bc89-4704019458d7"
},
"source_output": "$nested_comment_list",
"destination_node": {
"$component_ref": "6f62aecf-03a1-4e38-b551-8eef0efaf4bb"
},
"destination_input": "$nested_comment_list"
}
],
"$referenced_components": {
"f127a297-842d-4d17-bc89-4704019458d7": {
"component_type": "ExtendedMapNode",
"id": "f127a297-842d-4d17-bc89-4704019458d7",
"name": "step_0",
"description": "",
"metadata": {
"__metadata_info__": {}
},
"inputs": [
{
"description": "iterated input for the map step",
"type": "array",
"items": {
"description": "\"message\" input variable for the template",
"title": "message"
},
"title": "$file_diff_list"
}
],
"outputs": [
{
"type": "array",
"items": {},
"title": "$nested_comment_list"
},
{
"type": "array",
"items": {
"type": "string"
},
"title": "$filepath_list"
}
],
"branches": [
"next"
],
"input_mapping": {
"iterated_input": "$file_diff_list"
},
"output_mapping": {
"$extracted_comments": "$nested_comment_list",
"$filename": "$filepath_list"
},
"flow": {
"component_type": "Flow",
"id": "3da67cce-b8de-40be-bb8d-e1edead178f0",
"name": "Generate review comments flow",
"description": "",
"metadata": {
"__metadata_info__": {}
},
"inputs": [
{
"description": "\"message\" input variable for the template",
"title": "message"
}
],
"outputs": [
{
"description": "The extracted comments content and line number",
"type": "array",
"items": {
"type": "object",
"additionalProperties": {},
"key_type": {
"type": "string"
}
},
"title": "$extracted_comments"
},
{
"description": "the generated text",
"type": "string",
"title": "$json_comments"
},
{
"type": "string",
"title": "$diff_with_lines"
},
{
"description": "the first extracted value using the regex \"diff --git a/(.+?) b/\" from the raw input",
"type": "string",
"title": "$filename",
"default": ""
},
{
"description": "the message added to the messages list",
"type": "string",
"title": "$diff_to_string"
}
],
"start_node": {
"$component_ref": "e20f5870-d594-4089-9fcd-08146232910d"
},
"nodes": [
{
"$component_ref": "f0fb3ab4-a950-43b6-a583-6f0044f18c7f"
},
{
"$component_ref": "6000ee3f-ac80-4937-b36c-94fd65cdcda4"
},
{
"$component_ref": "6f6dc822-9352-47ae-9b48-173402a334fe"
},
{
"$component_ref": "0ce752d7-3ef1-481b-bb01-c7081ef86103"
},
{
"$component_ref": "48057b9c-bee7-4286-baf5-625b6f1a6f1a"
},
{
"$component_ref": "e20f5870-d594-4089-9fcd-08146232910d"
},
{
"$component_ref": "39f36227-8910-414c-8b6b-517c0d65b0d8"
}
],
"control_flow_connections": [
{
"component_type": "ControlFlowEdge",
"id": "becf6951-96fd-4152-97d0-4a4eff042a29",
"name": "format_diff_to_string_to_add_lines_on_diff_control_flow_edge",
"description": null,
"metadata": {
"__metadata_info__": {}
},
"from_node": {
"$component_ref": "f0fb3ab4-a950-43b6-a583-6f0044f18c7f"
},
"from_branch": null,
"to_node": {
"$component_ref": "6000ee3f-ac80-4937-b36c-94fd65cdcda4"
}
},
{
"component_type": "ControlFlowEdge",
"id": "c197b0d5-8002-4910-ae8d-61f97f1f8f26",
"name": "add_lines_on_diff_to_extract_file_path_control_flow_edge",
"description": null,
"metadata": {
"__metadata_info__": {}
},
"from_node": {
"$component_ref": "6000ee3f-ac80-4937-b36c-94fd65cdcda4"
},
"from_branch": null,
"to_node": {
"$component_ref": "6f6dc822-9352-47ae-9b48-173402a334fe"
}
},
{
"component_type": "ControlFlowEdge",
"id": "406e0670-cc49-4da4-8d15-8c1c320193e8",
"name": "extract_file_path_to_generate_comments_control_flow_edge",
"description": null,
"metadata": {
"__metadata_info__": {}
},
"from_node": {
"$component_ref": "6f6dc822-9352-47ae-9b48-173402a334fe"
},
"from_branch": null,
"to_node": {
"$component_ref": "0ce752d7-3ef1-481b-bb01-c7081ef86103"
}
},
{
"component_type": "ControlFlowEdge",
"id": "e54eb347-2e6c-42c4-a7d6-a42c8059bdf3",
"name": "generate_comments_to_extract_comments_from_json_control_flow_edge",
"description": null,
"metadata": {
"__metadata_info__": {}
},
"from_node": {
"$component_ref": "0ce752d7-3ef1-481b-bb01-c7081ef86103"
},
"from_branch": null,
"to_node": {
"$component_ref": "48057b9c-bee7-4286-baf5-625b6f1a6f1a"
}
},
{
"component_type": "ControlFlowEdge",
"id": "ebe5e60b-2724-4b51-b287-79f3e8e7fdd1",
"name": "__StartStep___to_format_diff_to_string_control_flow_edge",
"description": null,
"metadata": {
"__metadata_info__": {}
},
"from_node": {
"$component_ref": "e20f5870-d594-4089-9fcd-08146232910d"
},
"from_branch": null,
"to_node": {
"$component_ref": "f0fb3ab4-a950-43b6-a583-6f0044f18c7f"
}
},
{
"component_type": "ControlFlowEdge",
"id": "98e7631e-7206-4ba9-b5b0-eb308ac89c0f",
"name": "extract_comments_from_json_to_None End node_control_flow_edge",
"description": null,
"metadata": {},
"from_node": {
"$component_ref": "48057b9c-bee7-4286-baf5-625b6f1a6f1a"
},
"from_branch": null,
"to_node": {
"$component_ref": "39f36227-8910-414c-8b6b-517c0d65b0d8"
}
}
],
"data_flow_connections": [
{
"component_type": "DataFlowEdge",
"id": "ab8ed6de-3ea7-424e-a830-bca10ac57a32",
"name": "format_diff_to_string_$diff_to_string_to_add_lines_on_diff_$diff_to_string_data_flow_edge",
"description": null,
"metadata": {
"__metadata_info__": {}
},
"source_node": {
"$component_ref": "f0fb3ab4-a950-43b6-a583-6f0044f18c7f"
},
"source_output": "$diff_to_string",
"destination_node": {
"$component_ref": "6000ee3f-ac80-4937-b36c-94fd65cdcda4"
},
"destination_input": "$diff_to_string"
},
{
"component_type": "DataFlowEdge",
"id": "3caaa171-9b4b-44df-8ebd-4d060329f91a",
"name": "format_diff_to_string_$diff_to_string_to_extract_file_path_$diff_to_string_data_flow_edge",
"description": null,
"metadata": {
"__metadata_info__": {}
},
"source_node": {
"$component_ref": "f0fb3ab4-a950-43b6-a583-6f0044f18c7f"
},
"source_output": "$diff_to_string",
"destination_node": {
"$component_ref": "6f6dc822-9352-47ae-9b48-173402a334fe"
},
"destination_input": "$diff_to_string"
},
{
"component_type": "DataFlowEdge",
"id": "cdf0945b-5a96-42ff-b410-f7c56b5f8e45",
"name": "add_lines_on_diff_$diff_with_lines_to_generate_comments_$diff_with_lines_data_flow_edge",
"description": null,
"metadata": {
"__metadata_info__": {}
},
"source_node": {
"$component_ref": "6000ee3f-ac80-4937-b36c-94fd65cdcda4"
},
"source_output": "$diff_with_lines",
"destination_node": {
"$component_ref": "0ce752d7-3ef1-481b-bb01-c7081ef86103"
},
"destination_input": "$diff_with_lines"
},
{
"component_type": "DataFlowEdge",
"id": "ca6ed62b-6f6a-405f-9f16-5e1304de6608",
"name": "extract_file_path_$filename_to_generate_comments_$filename_data_flow_edge",
"description": null,
"metadata": {
"__metadata_info__": {}
},
"source_node": {
"$component_ref": "6f6dc822-9352-47ae-9b48-173402a334fe"
},
"source_output": "$filename",
"destination_node": {
"$component_ref": "0ce752d7-3ef1-481b-bb01-c7081ef86103"
},
"destination_input": "$filename"
},
{
"component_type": "DataFlowEdge",
"id": "dec4b4bb-56c9-445a-a282-9d095ff6038e",
"name": "generate_comments_$json_comments_to_extract_comments_from_json_$json_comments_data_flow_edge",
"description": null,
"metadata": {
"__metadata_info__": {}
},
"source_node": {
"$component_ref": "0ce752d7-3ef1-481b-bb01-c7081ef86103"
},
"source_output": "$json_comments",
"destination_node": {
"$component_ref": "48057b9c-bee7-4286-baf5-625b6f1a6f1a"
},
"destination_input": "$json_comments"
},
{
"component_type": "DataFlowEdge",
"id": "611478d7-281a-4587-81e6-97e8c745da53",
"name": "__StartStep___message_to_format_diff_to_string_message_data_flow_edge",
"description": null,
"metadata": {
"__metadata_info__": {}
},
"source_node": {
"$component_ref": "e20f5870-d594-4089-9fcd-08146232910d"
},
"source_output": "message",
"destination_node": {
"$component_ref": "f0fb3ab4-a950-43b6-a583-6f0044f18c7f"
},
"destination_input": "message"
},
{
"component_type": "DataFlowEdge",
"id": "227ae098-0baf-4fe8-9615-094bb386c9a9",
"name": "extract_comments_from_json_$extracted_comments_to_None End node_$extracted_comments_data_flow_edge",
"description": null,
"metadata": {},
"source_node": {
"$component_ref": "48057b9c-bee7-4286-baf5-625b6f1a6f1a"
},
"source_output": "$extracted_comments",
"destination_node": {
"$component_ref": "39f36227-8910-414c-8b6b-517c0d65b0d8"
},
"destination_input": "$extracted_comments"
},
{
"component_type": "DataFlowEdge",
"id": "6e25b4d8-5656-471b-8ffa-1fe8cfffbc05",
"name": "generate_comments_$json_comments_to_None End node_$json_comments_data_flow_edge",
"description": null,
"metadata": {},
"source_node": {
"$component_ref": "0ce752d7-3ef1-481b-bb01-c7081ef86103"
},
"source_output": "$json_comments",
"destination_node": {
"$component_ref": "39f36227-8910-414c-8b6b-517c0d65b0d8"
},
"destination_input": "$json_comments"
},
{
"component_type": "DataFlowEdge",
"id": "fdbf1eeb-0278-4dc8-b897-c924937a1692",
"name": "add_lines_on_diff_$diff_with_lines_to_None End node_$diff_with_lines_data_flow_edge",
"description": null,
"metadata": {},
"source_node": {
"$component_ref": "6000ee3f-ac80-4937-b36c-94fd65cdcda4"
},
"source_output": "$diff_with_lines",
"destination_node": {
"$component_ref": "39f36227-8910-414c-8b6b-517c0d65b0d8"
},
"destination_input": "$diff_with_lines"
},
{
"component_type": "DataFlowEdge",
"id": "3b6bcba7-635b-45fa-b450-cf0a15dae463",
"name": "extract_file_path_$filename_to_None End node_$filename_data_flow_edge",
"description": null,
"metadata": {},
"source_node": {
"$component_ref": "6f6dc822-9352-47ae-9b48-173402a334fe"
},
"source_output": "$filename",
"destination_node": {
"$component_ref": "39f36227-8910-414c-8b6b-517c0d65b0d8"
},
"destination_input": "$filename"
},
{
"component_type": "DataFlowEdge",
"id": "2f95704b-4cc1-4983-8a20-e39c79a94e01",
"name": "format_diff_to_string_$diff_to_string_to_None End node_$diff_to_string_data_flow_edge",
"description": null,
"metadata": {},
"source_node": {
"$component_ref": "f0fb3ab4-a950-43b6-a583-6f0044f18c7f"
},
"source_output": "$diff_to_string",
"destination_node": {
"$component_ref": "39f36227-8910-414c-8b6b-517c0d65b0d8"
},
"destination_input": "$diff_to_string"
}
],
"$referenced_components": {
"6000ee3f-ac80-4937-b36c-94fd65cdcda4": {
"component_type": "ExtendedToolNode",
"id": "6000ee3f-ac80-4937-b36c-94fd65cdcda4",
"name": "add_lines_on_diff",
"description": "",
"metadata": {
"__metadata_info__": {}
},
"inputs": [
{
"type": "string",
"title": "$diff_to_string"
}
],
"outputs": [
{
"type": "string",
"title": "$diff_with_lines"
}
],
"branches": [
"next"
],
"tool": {
"component_type": "ServerTool",
"id": "e936566f-7a25-40f3-9434-3e740a7bfb02",
"name": "format_git_diff",
"description": "Formats a git diff by adding line numbers to each line except removal lines.",
"metadata": {
"__metadata_info__": {}
},
"inputs": [
{
"type": "string",
"title": "diff_text"
}
],
"outputs": [
{
"type": "string",
"title": "tool_output"
}
]
},
"input_mapping": {
"diff_text": "$diff_to_string"
},
"output_mapping": {
"tool_output": "$diff_with_lines"
},
"raise_exceptions": false,
"component_plugin_name": "NodesPlugin",
"component_plugin_version": "25.4.0.dev0"
},
"f0fb3ab4-a950-43b6-a583-6f0044f18c7f": {
"component_type": "PluginOutputMessageNode",
"id": "f0fb3ab4-a950-43b6-a583-6f0044f18c7f",
"name": "format_diff_to_string",
"description": "",
"metadata": {
"__metadata_info__": {}
},
"inputs": [
{
"description": "\"message\" input variable for the template",
"title": "message"
}
],
"outputs": [
{
"description": "the message added to the messages list",
"type": "string",
"title": "$diff_to_string"
}
],
"branches": [
"next"
],
"expose_message_as_output": true,
"message": "{{ message | string }}",
"input_mapping": {},
"output_mapping": {
"output_message": "$diff_to_string"
},
"message_type": "AGENT",
"rephrase": false,
"llm_config": null,
"component_plugin_name": "NodesPlugin",
"component_plugin_version": "25.4.0.dev0"
},
"6f6dc822-9352-47ae-9b48-173402a334fe": {
"component_type": "PluginRegexNode",
"id": "6f6dc822-9352-47ae-9b48-173402a334fe",
"name": "extract_file_path",
"description": "",
"metadata": {
"__metadata_info__": {}
},
"inputs": [
{
"description": "raw text to extract information from",
"type": "string",
"title": "$diff_to_string"
}
],
"outputs": [
{
"description": "the first extracted value using the regex \"diff --git a/(.+?) b/\" from the raw input",
"type": "string",
"title": "$filename",
"default": ""
}
],
"branches": [
"next"
],
"input_mapping": {
"text": "$diff_to_string"
},
"output_mapping": {
"output": "$filename"
},
"regex_pattern": "diff --git a/(.+?) b/",
"return_first_match_only": true,
"component_plugin_name": "NodesPlugin",
"component_plugin_version": "25.4.0.dev0"
},
"0ce752d7-3ef1-481b-bb01-c7081ef86103": {
"component_type": "ExtendedLlmNode",
"id": "0ce752d7-3ef1-481b-bb01-c7081ef86103",
"name": "generate_comments",
"description": "",
"metadata": {
"__metadata_info__": {}
},
"inputs": [
{
"description": "\"filename\" input variable for the template",
"type": "string",
"title": "$filename"
},
{
"description": "\"diff\" input variable for the template",
"type": "string",
"title": "$diff_with_lines"
}
],
"outputs": [
{
"description": "the generated text",
"type": "string",
"title": "$json_comments"
}
],
"branches": [
"next"
],
"llm_config": {
"component_type": "VllmConfig",
"id": "fb043839-1e69-404c-a178-d8c3de0bfe20",
"name": "LLAMA_MODEL_ID",
"description": null,
"metadata": {
"__metadata_info__": {}
},
"default_generation_parameters": null,
"url": "LLAMA_API_URL",
"model_id": "LLAMA_MODEL_ID"
},
"prompt_template": "You are a very experienced code reviewer. You are given a git diff on a file: {{ filename }}\n\n## Context\nThe git diff contains all changes of a single file. All lines are prepended with their number. Lines without line number where removed from the file.\nAfter the line number, a line that was changed has a \"+\" before the code. All lines without a \"+\" are just here for context, you will not comment on them.\n\n## Input\n### Code diff\n{{ diff }}\n\n## Task\nYour task is to review these changes, according to different rules. Only comment lines that were added, so the lines that have a + just after the line number.\nThe rules are the following:\n\n\nName: TODO_WITHOUT_TICKET\nDescription: TODO comments should reference a ticket number for tracking.\nExample code:\n```python\n# TODO: Add validation here\ndef process_user_input(data):\n return data\n```\nExample comment:\n[BOT] TODO_WITHOUT_TICKET: TODO comment should reference a ticket number for tracking (e.g., \"TODO: Add validation here (TICKET-1234)\").\n\n\n---\n\n\nName: MUTABLE_DEFAULT_ARGUMENT\nDescription: Using mutable objects as default arguments can lead to unexpected behavior.\nExample code:\n```python\ndef add_item(item, items=[]):\n items.append(item)\n return items\n```\nExample comment:\n[BOT] MUTABLE_DEFAULT_ARGUMENT: Avoid using mutable default arguments. 
Use None and initialize in the function: `def add_item(item, items=None): items = items or []`\n\n\n---\n\n\nName: NON_DESCRIPTIVE_NAME\nDescription: Variable names should clearly indicate their purpose or content.\nExample code:\n```python\ndef process(lst):\n res = []\n for i in lst:\n res.append(i * 2)\n return res\n```\nExample comment:\n[BOT] NON_DESCRIPTIVE_NAME: Use more descriptive names: 'lst' could be 'numbers', 'res' could be 'doubled_numbers', 'i' could be 'number'\n\n\n### Reponse Format\nYou need to return a review as a json as follows:\n```json\n[\n {\n \"content\": \"the comment as a text\",\n \"suggestion\": \"if the change you propose is a single line, then put here the single line rewritten that includes your proposal change. IMPORTANT: a single line, which will erase the current line. Put empty string if no suggestion of if the suggestion is more than a single line\",\n \"line\": \"line number where the comment applies\"\n },\n \u2026\n]\n```\nPlease use triple backticks ``` to delimitate your JSON list of comments. Don't output more than 5 comments, only comment the most relevant sections.\nIf there are no comments and the code seems fine, just output an empty JSON list.",
"input_mapping": {
"diff": "$diff_with_lines",
"filename": "$filename"
},
"output_mapping": {
"output": "$json_comments"
},
"prompt_template_object": null,
"send_message": false,
"component_plugin_name": "NodesPlugin",
"component_plugin_version": "25.4.0.dev0"
},
"48057b9c-bee7-4286-baf5-625b6f1a6f1a": {
"component_type": "PluginExtractNode",
"id": "48057b9c-bee7-4286-baf5-625b6f1a6f1a",
"name": "extract_comments_from_json",
"description": "",
"metadata": {
"__metadata_info__": {}
},
"inputs": [
{
"description": "raw text to extract information from",
"type": "string",
"title": "$json_comments"
}
],
"outputs": [
{
"description": "The extracted comments content and line number",
"type": "array",
"items": {
"type": "object",
"additionalProperties": {},
"key_type": {
"type": "string"
}
},
"title": "$extracted_comments"
}
],
"branches": [
"next"
],
"input_mapping": {
"text": "$json_comments"
},
"output_mapping": {
"values": "$extracted_comments"
},
"output_values": {
"values": "[.[] | {\"content\": .[\"content\"], \"line\": .[\"line\"]}]"
},
"component_plugin_name": "NodesPlugin",
"component_plugin_version": "25.4.0.dev0"
},
"e20f5870-d594-4089-9fcd-08146232910d": {
"component_type": "StartNode",
"id": "e20f5870-d594-4089-9fcd-08146232910d",
"name": "__StartStep__",
"description": "",
"metadata": {
"__metadata_info__": {}
},
"inputs": [
{
"description": "\"message\" input variable for the template",
"title": "message"
}
],
"outputs": [
{
"description": "\"message\" input variable for the template",
"title": "message"
}
],
"branches": [
"next"
]
},
"39f36227-8910-414c-8b6b-517c0d65b0d8": {
"component_type": "EndNode",
"id": "39f36227-8910-414c-8b6b-517c0d65b0d8",
"name": "None End node",
"description": "End node representing all transitions to None in the WayFlow flow",
"metadata": {},
"inputs": [
{
"description": "The extracted comments content and line number",
"type": "array",
"items": {
"type": "object",
"additionalProperties": {},
"key_type": {
"type": "string"
}
},
"title": "$extracted_comments"
},
{
"description": "the generated text",
"type": "string",
"title": "$json_comments"
},
{
"type": "string",
"title": "$diff_with_lines"
},
{
"description": "the first extracted value using the regex \"diff --git a/(.+?) b/\" from the raw input",
"type": "string",
"title": "$filename",
"default": ""
},
{
"description": "the message added to the messages list",
"type": "string",
"title": "$diff_to_string"
}
],
"outputs": [
{
"description": "The extracted comments content and line number",
"type": "array",
"items": {
"type": "object",
"additionalProperties": {},
"key_type": {
"type": "string"
}
},
"title": "$extracted_comments"
},
{
"description": "the generated text",
"type": "string",
"title": "$json_comments"
},
{
"type": "string",
"title": "$diff_with_lines"
},
{
"description": "the first extracted value using the regex \"diff --git a/(.+?) b/\" from the raw input",
"type": "string",
"title": "$filename",
"default": ""
},
{
"description": "the message added to the messages list",
"type": "string",
"title": "$diff_to_string"
}
],
"branches": [],
"branch_name": "next"
}
}
},
"unpack_input": {
"message": "."
},
"parallel_execution": false,
"component_plugin_name": "NodesPlugin",
"component_plugin_version": "25.4.0.dev0"
},
"367ae568-317d-42ec-ae70-4c41afe0dbd0": {
"component_type": "StartNode",
"id": "367ae568-317d-42ec-ae70-4c41afe0dbd0",
"name": "__StartStep__",
"description": "",
"metadata": {
"__metadata_info__": {}
},
"inputs": [
{
"description": "iterated input for the map step",
"type": "array",
"items": {
"description": "\"message\" input variable for the template",
"title": "message"
},
"title": "$file_diff_list"
}
],
"outputs": [
{
"description": "iterated input for the map step",
"type": "array",
"items": {
"description": "\"message\" input variable for the template",
"title": "message"
},
"title": "$file_diff_list"
}
],
"branches": [
"next"
]
},
"6f62aecf-03a1-4e38-b551-8eef0efaf4bb": {
"component_type": "EndNode",
"id": "6f62aecf-03a1-4e38-b551-8eef0efaf4bb",
"name": "None End node",
"description": "End node representing all transitions to None in the WayFlow flow",
"metadata": {},
"inputs": [
{
"type": "array",
"items": {
"type": "string"
},
"title": "$filepath_list"
},
{
"type": "array",
"items": {},
"title": "$nested_comment_list"
}
],
"outputs": [
{
"type": "array",
"items": {
"type": "string"
},
"title": "$filepath_list"
},
{
"type": "array",
"items": {},
"title": "$nested_comment_list"
}
],
"branches": [],
"branch_name": "next"
}
}
}
},
"47e367be-4d74-49dc-ac3b-89bb97ffa7df": {
"component_type": "FlowNode",
"id": "47e367be-4d74-49dc-ac3b-89bb97ffa7df",
"name": "retrieve_diff_flowstep",
"description": "",
"metadata": {
"__metadata_info__": {}
},
"inputs": [
{
"type": "string",
"title": "$repo_dirpath_io"
}
],
"outputs": [
{
"type": "string",
"title": "$raw_pr_diff"
},
{
"description": "the list of extracted value using the regex \"(diff --git[\\s\\S]*?)(?=diff --git|$)\" from the raw input",
"type": "array",
"items": {
"type": "string"
},
"title": "$file_diff_list",
"default": []
}
],
"branches": [
"next"
],
"subflow": {
"component_type": "Flow",
"id": "9e7aed22-876c-4c32-9d44-20ee7ceb3771",
"name": "Retrieve PR diff flow",
"description": "",
"metadata": {
"__metadata_info__": {}
},
"inputs": [
{
"type": "string",
"title": "$repo_dirpath_io"
}
],
"outputs": [
{
"type": "string",
"title": "$raw_pr_diff"
},
{
"description": "the list of extracted value using the regex \"(diff --git[\\s\\S]*?)(?=diff --git|$)\" from the raw input",
"type": "array",
"items": {
"type": "string"
},
"title": "$file_diff_list",
"default": []
}
],
"start_node": {
"$component_ref": "4fcb7ebe-325b-446d-a46b-59187c30e260"
},
"nodes": [
{
"$component_ref": "4fcb7ebe-325b-446d-a46b-59187c30e260"
},
{
"$component_ref": "5c73da9c-6ba9-44ce-aab1-212a78d0a720"
},
{
"$component_ref": "cf841053-2414-48b6-ba6d-0f0f5e11044c"
},
{
"$component_ref": "dd0e56ab-1267-4345-9f59-ecc053baf2af"
}
],
"control_flow_connections": [
{
"component_type": "ControlFlowEdge",
"id": "60dc14b8-d9b9-4aec-a958-9f3676848f48",
"name": "start_step_to_get_pr_diff_control_flow_edge",
"description": null,
"metadata": {
"__metadata_info__": {}
},
"from_node": {
"$component_ref": "4fcb7ebe-325b-446d-a46b-59187c30e260"
},
"from_branch": null,
"to_node": {
"$component_ref": "5c73da9c-6ba9-44ce-aab1-212a78d0a720"
}
},
{
"component_type": "ControlFlowEdge",
"id": "500f97de-78b1-42e0-944c-0375dfca734e",
"name": "get_pr_diff_to_extract_into_list_of_file_diff_control_flow_edge",
"description": null,
"metadata": {
"__metadata_info__": {}
},
"from_node": {
"$component_ref": "5c73da9c-6ba9-44ce-aab1-212a78d0a720"
},
"from_branch": null,
"to_node": {
"$component_ref": "cf841053-2414-48b6-ba6d-0f0f5e11044c"
}
},
{
"component_type": "ControlFlowEdge",
"id": "22d0cf0d-8edb-4b04-8f54-a234f5705360",
"name": "extract_into_list_of_file_diff_to_None End node_control_flow_edge",
"description": null,
"metadata": {},
"from_node": {
"$component_ref": "cf841053-2414-48b6-ba6d-0f0f5e11044c"
},
"from_branch": null,
"to_node": {
"$component_ref": "dd0e56ab-1267-4345-9f59-ecc053baf2af"
}
}
],
"data_flow_connections": [
{
"component_type": "DataFlowEdge",
"id": "106e3740-de45-4472-8168-2873ae1dbc82",
"name": "start_step_$repo_dirpath_io_to_get_pr_diff_$repo_dirpath_io_data_flow_edge",
"description": null,
"metadata": {
"__metadata_info__": {}
},
"source_node": {
"$component_ref": "4fcb7ebe-325b-446d-a46b-59187c30e260"
},
"source_output": "$repo_dirpath_io",
"destination_node": {
"$component_ref": "5c73da9c-6ba9-44ce-aab1-212a78d0a720"
},
"destination_input": "$repo_dirpath_io"
},
{
"component_type": "DataFlowEdge",
"id": "a32cbb1c-eafe-4138-80e2-2cf2e1248312",
"name": "get_pr_diff_$raw_pr_diff_to_extract_into_list_of_file_diff_$raw_pr_diff_data_flow_edge",
"description": null,
"metadata": {
"__metadata_info__": {}
},
"source_node": {
"$component_ref": "5c73da9c-6ba9-44ce-aab1-212a78d0a720"
},
"source_output": "$raw_pr_diff",
"destination_node": {
"$component_ref": "cf841053-2414-48b6-ba6d-0f0f5e11044c"
},
"destination_input": "$raw_pr_diff"
},
{
"component_type": "DataFlowEdge",
"id": "3ef5dcf4-acdf-4962-8df6-07b53f249e18",
"name": "get_pr_diff_$raw_pr_diff_to_None End node_$raw_pr_diff_data_flow_edge",
"description": null,
"metadata": {},
"source_node": {
"$component_ref": "5c73da9c-6ba9-44ce-aab1-212a78d0a720"
},
"source_output": "$raw_pr_diff",
"destination_node": {
"$component_ref": "dd0e56ab-1267-4345-9f59-ecc053baf2af"
},
"destination_input": "$raw_pr_diff"
},
{
"component_type": "DataFlowEdge",
"id": "08cbca39-e591-4cf4-9057-ae67938d9557",
"name": "extract_into_list_of_file_diff_$file_diff_list_to_None End node_$file_diff_list_data_flow_edge",
"description": null,
"metadata": {},
"source_node": {
"$component_ref": "cf841053-2414-48b6-ba6d-0f0f5e11044c"
},
"source_output": "$file_diff_list",
"destination_node": {
"$component_ref": "dd0e56ab-1267-4345-9f59-ecc053baf2af"
},
"destination_input": "$file_diff_list"
}
],
"$referenced_components": {
"5c73da9c-6ba9-44ce-aab1-212a78d0a720": {
"component_type": "ExtendedToolNode",
"id": "5c73da9c-6ba9-44ce-aab1-212a78d0a720",
"name": "get_pr_diff",
"description": "",
"metadata": {
"__metadata_info__": {}
},
"inputs": [
{
"type": "string",
"title": "$repo_dirpath_io"
}
],
"outputs": [
{
"type": "string",
"title": "$raw_pr_diff"
}
],
"branches": [
"next"
],
"tool": {
"component_type": "ServerTool",
"id": "275aaf19-cdd4-4ed7-a436-e53f922cd740",
"name": "local_get_pr_diff_tool",
"description": "Retrieves code diff with a git command given the\npath to the repository root folder.",
"metadata": {
"__metadata_info__": {}
},
"inputs": [
{
"type": "string",
"title": "repo_dirpath"
}
],
"outputs": [
{
"type": "string",
"title": "tool_output"
}
]
},
"input_mapping": {
"repo_dirpath": "$repo_dirpath_io"
},
"output_mapping": {
"tool_output": "$raw_pr_diff"
},
"raise_exceptions": true,
"component_plugin_name": "NodesPlugin",
"component_plugin_version": "25.4.0.dev0"
},
"4fcb7ebe-325b-446d-a46b-59187c30e260": {
"component_type": "StartNode",
"id": "4fcb7ebe-325b-446d-a46b-59187c30e260",
"name": "start_step",
"description": "",
"metadata": {
"__metadata_info__": {}
},
"inputs": [
{
"type": "string",
"title": "$repo_dirpath_io"
}
],
"outputs": [
{
"type": "string",
"title": "$repo_dirpath_io"
}
],
"branches": [
"next"
]
},
"cf841053-2414-48b6-ba6d-0f0f5e11044c": {
"component_type": "PluginRegexNode",
"id": "cf841053-2414-48b6-ba6d-0f0f5e11044c",
"name": "extract_into_list_of_file_diff",
"description": "",
"metadata": {
"__metadata_info__": {}
},
"inputs": [
{
"description": "raw text to extract information from",
"type": "string",
"title": "$raw_pr_diff"
}
],
"outputs": [
{
"description": "the list of extracted value using the regex \"(diff --git[\\s\\S]*?)(?=diff --git|$)\" from the raw input",
"type": "array",
"items": {
"type": "string"
},
"title": "$file_diff_list",
"default": []
}
],
"branches": [
"next"
],
"input_mapping": {
"text": "$raw_pr_diff"
},
"output_mapping": {
"output": "$file_diff_list"
},
"regex_pattern": "(diff --git[\\s\\S]*?)(?=diff --git|$)",
"return_first_match_only": false,
"component_plugin_name": "NodesPlugin",
"component_plugin_version": "25.4.0.dev0"
},
"dd0e56ab-1267-4345-9f59-ecc053baf2af": {
"component_type": "EndNode",
"id": "dd0e56ab-1267-4345-9f59-ecc053baf2af",
"name": "None End node",
"description": "End node representing all transitions to None in the WayFlow flow",
"metadata": {},
"inputs": [
{
"type": "string",
"title": "$raw_pr_diff"
},
{
"description": "the list of extracted value using the regex \"(diff --git[\\s\\S]*?)(?=diff --git|$)\" from the raw input",
"type": "array",
"items": {
"type": "string"
},
"title": "$file_diff_list",
"default": []
}
],
"outputs": [
{
"type": "string",
"title": "$raw_pr_diff"
},
{
"description": "the list of extracted value using the regex \"(diff --git[\\s\\S]*?)(?=diff --git|$)\" from the raw input",
"type": "array",
"items": {
"type": "string"
},
"title": "$file_diff_list",
"default": []
}
],
"branches": [],
"branch_name": "next"
}
}
}
},
"020c885e-6d0b-472a-bb91-246ab70ab1db": {
"component_type": "StartNode",
"id": "020c885e-6d0b-472a-bb91-246ab70ab1db",
"name": "__StartStep__",
"description": "",
"metadata": {
"__metadata_info__": {}
},
"inputs": [
{
"type": "string",
"title": "$repo_dirpath_io"
}
],
"outputs": [
{
"type": "string",
"title": "$repo_dirpath_io"
}
],
"branches": [
"next"
]
},
"a544af64-e63b-4ccf-9ab0-8d25cdbc0b93": {
"component_type": "EndNode",
"id": "a544af64-e63b-4ccf-9ab0-8d25cdbc0b93",
"name": "None End node",
"description": "End node representing all transitions to None in the WayFlow flow",
"metadata": {},
"inputs": [
{
"type": "array",
"items": {
"type": "string"
},
"title": "$filepath_list"
},
{
"type": "array",
"items": {},
"title": "$nested_comment_list"
},
{
"type": "string",
"title": "$raw_pr_diff"
},
{
"description": "the list of extracted value using the regex \"(diff --git[\\s\\S]*?)(?=diff --git|$)\" from the raw input",
"type": "array",
"items": {
"type": "string"
},
"title": "$file_diff_list",
"default": []
}
],
"outputs": [
{
"type": "array",
"items": {
"type": "string"
},
"title": "$filepath_list"
},
{
"type": "array",
"items": {},
"title": "$nested_comment_list"
},
{
"type": "string",
"title": "$raw_pr_diff"
},
{
"description": "the list of extracted value using the regex \"(diff --git[\\s\\S]*?)(?=diff --git|$)\" from the raw input",
"type": "array",
"items": {
"type": "string"
},
"title": "$file_diff_list",
"default": []
}
],
"branches": [],
"branch_name": "next"
}
},
"agentspec_version": "25.4.1"
}
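The two regular expressions recorded in the serialized flow above (`regex_pattern` of `extract_into_list_of_file_diff` and the pattern quoted in the `$filename` description) can be checked in isolation. The following is a minimal sketch using only Python's standard `re` module. It does not call WayFlow, and it assumes the regex nodes behave like `re.findall` and `re.search`. The sample diff is hypothetical and exists only for illustration.

```python
import re

# Patterns copied verbatim from the serialized flow.
FILE_DIFF_PATTERN = r"(diff --git[\s\S]*?)(?=diff --git|$)"
FILENAME_PATTERN = r"diff --git a/(.+?) b/"

# Hypothetical two-file diff (not taken from the tutorial repository).
raw_pr_diff = (
    "diff --git a/src/app.py b/src/app.py\n"
    "--- a/src/app.py\n"
    "+++ b/src/app.py\n"
    "@@ -1,2 +1,3 @@\n"
    " import os\n"
    "+# TODO: add validation\n"
    "diff --git a/tests/test_app.py b/tests/test_app.py\n"
    "--- a/tests/test_app.py\n"
    "+++ b/tests/test_app.py\n"
    "@@ -1 +1,2 @@\n"
    "+def test_ok(): pass\n"
)

# Split the raw diff into one chunk per file: each chunk starts at
# "diff --git" and stops right before the next one (or at the end).
file_diff_list = re.findall(FILE_DIFF_PATTERN, raw_pr_diff)

# Extract the first file path of each chunk, falling back to "" when
# nothing matches (mirroring the default shown for $filename).
filepath_list = []
for file_diff in file_diff_list:
    match = re.search(FILENAME_PATTERN, file_diff)
    filepath_list.append(match.group(1) if match else "")

print(len(file_diff_list))
print(filepath_list)
```

The lazy quantifier `*?` combined with the lookahead `(?=diff --git|$)` is what keeps each chunk from swallowing the next file's diff.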
component_type: Flow
id: 9c65246d-a0dd-4ec4-801d-afd640b2488e
name: PR bot flow
description: ''
metadata:
__metadata_info__: {}
inputs:
- type: string
title: $repo_dirpath_io
outputs:
- type: array
items:
type: string
title: $filepath_list
- type: array
items: {}
title: $nested_comment_list
- type: string
title: $raw_pr_diff
- description: the list of extracted value using the regex "(diff --git[\s\S]*?)(?=diff
--git|$)" from the raw input
type: array
items:
type: string
title: $file_diff_list
default: []
start_node:
$component_ref: 020c885e-6d0b-472a-bb91-246ab70ab1db
nodes:
- $component_ref: 47e367be-4d74-49dc-ac3b-89bb97ffa7df
- $component_ref: 43d58c76-23a0-4d10-943d-f9c5e0835a7c
- $component_ref: 020c885e-6d0b-472a-bb91-246ab70ab1db
- $component_ref: a544af64-e63b-4ccf-9ab0-8d25cdbc0b93
control_flow_connections:
- component_type: ControlFlowEdge
id: a5c123ff-c14c-4291-b174-61d61170f187
name: retrieve_diff_flowstep_to_generate_comments_flowstep_control_flow_edge
description: null
metadata:
__metadata_info__: {}
from_node:
$component_ref: 47e367be-4d74-49dc-ac3b-89bb97ffa7df
from_branch: null
to_node:
$component_ref: 43d58c76-23a0-4d10-943d-f9c5e0835a7c
- component_type: ControlFlowEdge
id: 8a10b23a-2d0c-46c4-82ac-e66ad0b9399b
name: __StartStep___to_retrieve_diff_flowstep_control_flow_edge
description: null
metadata:
__metadata_info__: {}
from_node:
$component_ref: 020c885e-6d0b-472a-bb91-246ab70ab1db
from_branch: null
to_node:
$component_ref: 47e367be-4d74-49dc-ac3b-89bb97ffa7df
- component_type: ControlFlowEdge
id: dac07720-8a5a-4a61-b1e7-50be506ed937
name: generate_comments_flowstep_to_None End node_control_flow_edge
description: null
metadata: {}
from_node:
$component_ref: 43d58c76-23a0-4d10-943d-f9c5e0835a7c
from_branch: null
to_node:
$component_ref: a544af64-e63b-4ccf-9ab0-8d25cdbc0b93
data_flow_connections:
- component_type: DataFlowEdge
id: 7b12dfed-309b-46ff-8a2d-bb6f2a3154b6
name: retrieve_diff_flowstep_$file_diff_list_to_generate_comments_flowstep_$file_diff_list_data_flow_edge
description: null
metadata:
__metadata_info__: {}
source_node:
$component_ref: 47e367be-4d74-49dc-ac3b-89bb97ffa7df
source_output: $file_diff_list
destination_node:
$component_ref: 43d58c76-23a0-4d10-943d-f9c5e0835a7c
destination_input: $file_diff_list
- component_type: DataFlowEdge
id: 51122844-22d3-40a8-b652-1b020ce24945
name: __StartStep___$repo_dirpath_io_to_retrieve_diff_flowstep_$repo_dirpath_io_data_flow_edge
description: null
metadata:
__metadata_info__: {}
source_node:
$component_ref: 020c885e-6d0b-472a-bb91-246ab70ab1db
source_output: $repo_dirpath_io
destination_node:
$component_ref: 47e367be-4d74-49dc-ac3b-89bb97ffa7df
destination_input: $repo_dirpath_io
- component_type: DataFlowEdge
id: 72aa469c-98cd-4f0d-9496-0aa454373aef
name: generate_comments_flowstep_$filepath_list_to_None End node_$filepath_list_data_flow_edge
description: null
metadata: {}
source_node:
$component_ref: 43d58c76-23a0-4d10-943d-f9c5e0835a7c
source_output: $filepath_list
destination_node:
$component_ref: a544af64-e63b-4ccf-9ab0-8d25cdbc0b93
destination_input: $filepath_list
- component_type: DataFlowEdge
id: eac1b375-1541-41f7-87f3-f3e626cc2c9c
name: generate_comments_flowstep_$nested_comment_list_to_None End node_$nested_comment_list_data_flow_edge
description: null
metadata: {}
source_node:
$component_ref: 43d58c76-23a0-4d10-943d-f9c5e0835a7c
source_output: $nested_comment_list
destination_node:
$component_ref: a544af64-e63b-4ccf-9ab0-8d25cdbc0b93
destination_input: $nested_comment_list
- component_type: DataFlowEdge
id: 0869acb5-4d8f-4b17-b59b-3b915912b628
name: retrieve_diff_flowstep_$raw_pr_diff_to_None End node_$raw_pr_diff_data_flow_edge
description: null
metadata: {}
source_node:
$component_ref: 47e367be-4d74-49dc-ac3b-89bb97ffa7df
source_output: $raw_pr_diff
destination_node:
$component_ref: a544af64-e63b-4ccf-9ab0-8d25cdbc0b93
destination_input: $raw_pr_diff
- component_type: DataFlowEdge
id: 9fb2ab9e-ece1-4195-8f51-ef618dcb72bb
name: retrieve_diff_flowstep_$file_diff_list_to_None End node_$file_diff_list_data_flow_edge
description: null
metadata: {}
source_node:
$component_ref: 47e367be-4d74-49dc-ac3b-89bb97ffa7df
source_output: $file_diff_list
destination_node:
$component_ref: a544af64-e63b-4ccf-9ab0-8d25cdbc0b93
destination_input: $file_diff_list
$referenced_components:
43d58c76-23a0-4d10-943d-f9c5e0835a7c:
component_type: FlowNode
id: 43d58c76-23a0-4d10-943d-f9c5e0835a7c
name: generate_comments_flowstep
description: ''
metadata:
__metadata_info__: {}
inputs:
- description: iterated input for the map step
type: array
items:
description: '"message" input variable for the template'
title: message
title: $file_diff_list
outputs:
- type: array
items:
type: string
title: $filepath_list
- type: array
items: {}
title: $nested_comment_list
branches:
- next
subflow:
component_type: Flow
id: f95e0e5d-f573-4e25-9d68-8508371246f9
name: flow_028a7dfb__auto
description: ''
metadata:
__metadata_info__: {}
inputs:
- description: iterated input for the map step
type: array
items:
description: '"message" input variable for the template'
title: message
title: $file_diff_list
outputs:
- type: array
items:
type: string
title: $filepath_list
- type: array
items: {}
title: $nested_comment_list
start_node:
$component_ref: 367ae568-317d-42ec-ae70-4c41afe0dbd0
nodes:
- $component_ref: f127a297-842d-4d17-bc89-4704019458d7
- $component_ref: 367ae568-317d-42ec-ae70-4c41afe0dbd0
- $component_ref: 6f62aecf-03a1-4e38-b551-8eef0efaf4bb
control_flow_connections:
- component_type: ControlFlowEdge
id: 85a2cdff-6ad4-4f58-8d1c-c8deeb05880c
name: __StartStep___to_step_0_control_flow_edge
description: null
metadata:
__metadata_info__: {}
from_node:
$component_ref: 367ae568-317d-42ec-ae70-4c41afe0dbd0
from_branch: null
to_node:
$component_ref: f127a297-842d-4d17-bc89-4704019458d7
- component_type: ControlFlowEdge
id: 396e218f-225e-4e36-a33c-a176ca77d345
name: step_0_to_None End node_control_flow_edge
description: null
metadata: {}
from_node:
$component_ref: f127a297-842d-4d17-bc89-4704019458d7
from_branch: null
to_node:
$component_ref: 6f62aecf-03a1-4e38-b551-8eef0efaf4bb
data_flow_connections:
- component_type: DataFlowEdge
id: 6c8b8f78-b587-49ff-a401-6262cdafb0ee
name: __StartStep___$file_diff_list_to_step_0_$file_diff_list_data_flow_edge
description: null
metadata:
__metadata_info__: {}
source_node:
$component_ref: 367ae568-317d-42ec-ae70-4c41afe0dbd0
source_output: $file_diff_list
destination_node:
$component_ref: f127a297-842d-4d17-bc89-4704019458d7
destination_input: $file_diff_list
- component_type: DataFlowEdge
id: 84d3a783-38c8-4d53-bc0b-4205732d1fbf
name: step_0_$filepath_list_to_None End node_$filepath_list_data_flow_edge
description: null
metadata: {}
source_node:
$component_ref: f127a297-842d-4d17-bc89-4704019458d7
source_output: $filepath_list
destination_node:
$component_ref: 6f62aecf-03a1-4e38-b551-8eef0efaf4bb
destination_input: $filepath_list
- component_type: DataFlowEdge
id: b7ffd4c3-4a03-47f0-95fc-0ba670010729
name: step_0_$nested_comment_list_to_None End node_$nested_comment_list_data_flow_edge
description: null
metadata: {}
source_node:
$component_ref: f127a297-842d-4d17-bc89-4704019458d7
source_output: $nested_comment_list
destination_node:
$component_ref: 6f62aecf-03a1-4e38-b551-8eef0efaf4bb
destination_input: $nested_comment_list
$referenced_components:
f127a297-842d-4d17-bc89-4704019458d7:
component_type: ExtendedMapNode
id: f127a297-842d-4d17-bc89-4704019458d7
name: step_0
description: ''
metadata:
__metadata_info__: {}
inputs:
- description: iterated input for the map step
type: array
items:
description: '"message" input variable for the template'
title: message
title: $file_diff_list
outputs:
- type: array
items: {}
title: $nested_comment_list
- type: array
items:
type: string
title: $filepath_list
branches:
- next
input_mapping:
iterated_input: $file_diff_list
output_mapping:
$extracted_comments: $nested_comment_list
$filename: $filepath_list
flow:
component_type: Flow
id: 3da67cce-b8de-40be-bb8d-e1edead178f0
name: Generate review comments flow
description: ''
metadata:
__metadata_info__: {}
inputs:
- description: '"message" input variable for the template'
title: message
outputs:
- description: The extracted comments content and line number
type: array
items:
type: object
additionalProperties: {}
key_type:
type: string
title: $extracted_comments
- description: the generated text
type: string
title: $json_comments
- type: string
title: $diff_with_lines
- description: the first extracted value using the regex "diff --git a/(.+?)
b/" from the raw input
type: string
title: $filename
default: ''
- description: the message added to the messages list
type: string
title: $diff_to_string
start_node:
$component_ref: e20f5870-d594-4089-9fcd-08146232910d
nodes:
- $component_ref: f0fb3ab4-a950-43b6-a583-6f0044f18c7f
- $component_ref: 6000ee3f-ac80-4937-b36c-94fd65cdcda4
- $component_ref: 6f6dc822-9352-47ae-9b48-173402a334fe
- $component_ref: 0ce752d7-3ef1-481b-bb01-c7081ef86103
- $component_ref: 48057b9c-bee7-4286-baf5-625b6f1a6f1a
- $component_ref: e20f5870-d594-4089-9fcd-08146232910d
- $component_ref: 39f36227-8910-414c-8b6b-517c0d65b0d8
control_flow_connections:
- component_type: ControlFlowEdge
id: becf6951-96fd-4152-97d0-4a4eff042a29
name: format_diff_to_string_to_add_lines_on_diff_control_flow_edge
description: null
metadata:
__metadata_info__: {}
from_node:
$component_ref: f0fb3ab4-a950-43b6-a583-6f0044f18c7f
from_branch: null
to_node:
$component_ref: 6000ee3f-ac80-4937-b36c-94fd65cdcda4
- component_type: ControlFlowEdge
id: c197b0d5-8002-4910-ae8d-61f97f1f8f26
name: add_lines_on_diff_to_extract_file_path_control_flow_edge
description: null
metadata:
__metadata_info__: {}
from_node:
$component_ref: 6000ee3f-ac80-4937-b36c-94fd65cdcda4
from_branch: null
to_node:
$component_ref: 6f6dc822-9352-47ae-9b48-173402a334fe
- component_type: ControlFlowEdge
id: 406e0670-cc49-4da4-8d15-8c1c320193e8
name: extract_file_path_to_generate_comments_control_flow_edge
description: null
metadata:
__metadata_info__: {}
from_node:
$component_ref: 6f6dc822-9352-47ae-9b48-173402a334fe
from_branch: null
to_node:
$component_ref: 0ce752d7-3ef1-481b-bb01-c7081ef86103
- component_type: ControlFlowEdge
id: e54eb347-2e6c-42c4-a7d6-a42c8059bdf3
name: generate_comments_to_extract_comments_from_json_control_flow_edge
description: null
metadata:
__metadata_info__: {}
from_node:
$component_ref: 0ce752d7-3ef1-481b-bb01-c7081ef86103
from_branch: null
to_node:
$component_ref: 48057b9c-bee7-4286-baf5-625b6f1a6f1a
- component_type: ControlFlowEdge
id: ebe5e60b-2724-4b51-b287-79f3e8e7fdd1
name: __StartStep___to_format_diff_to_string_control_flow_edge
description: null
metadata:
__metadata_info__: {}
from_node:
$component_ref: e20f5870-d594-4089-9fcd-08146232910d
from_branch: null
to_node:
$component_ref: f0fb3ab4-a950-43b6-a583-6f0044f18c7f
- component_type: ControlFlowEdge
id: 98e7631e-7206-4ba9-b5b0-eb308ac89c0f
name: extract_comments_from_json_to_None End node_control_flow_edge
description: null
metadata: {}
from_node:
$component_ref: 48057b9c-bee7-4286-baf5-625b6f1a6f1a
from_branch: null
to_node:
$component_ref: 39f36227-8910-414c-8b6b-517c0d65b0d8
data_flow_connections:
- component_type: DataFlowEdge
id: ab8ed6de-3ea7-424e-a830-bca10ac57a32
name: format_diff_to_string_$diff_to_string_to_add_lines_on_diff_$diff_to_string_data_flow_edge
description: null
metadata:
__metadata_info__: {}
source_node:
$component_ref: f0fb3ab4-a950-43b6-a583-6f0044f18c7f
source_output: $diff_to_string
destination_node:
$component_ref: 6000ee3f-ac80-4937-b36c-94fd65cdcda4
destination_input: $diff_to_string
- component_type: DataFlowEdge
id: 3caaa171-9b4b-44df-8ebd-4d060329f91a
name: format_diff_to_string_$diff_to_string_to_extract_file_path_$diff_to_string_data_flow_edge
description: null
metadata:
__metadata_info__: {}
source_node:
$component_ref: f0fb3ab4-a950-43b6-a583-6f0044f18c7f
source_output: $diff_to_string
destination_node:
$component_ref: 6f6dc822-9352-47ae-9b48-173402a334fe
destination_input: $diff_to_string
- component_type: DataFlowEdge
id: cdf0945b-5a96-42ff-b410-f7c56b5f8e45
name: add_lines_on_diff_$diff_with_lines_to_generate_comments_$diff_with_lines_data_flow_edge
description: null
metadata:
__metadata_info__: {}
source_node:
$component_ref: 6000ee3f-ac80-4937-b36c-94fd65cdcda4
source_output: $diff_with_lines
destination_node:
$component_ref: 0ce752d7-3ef1-481b-bb01-c7081ef86103
destination_input: $diff_with_lines
- component_type: DataFlowEdge
id: ca6ed62b-6f6a-405f-9f16-5e1304de6608
name: extract_file_path_$filename_to_generate_comments_$filename_data_flow_edge
description: null
metadata:
__metadata_info__: {}
source_node:
$component_ref: 6f6dc822-9352-47ae-9b48-173402a334fe
source_output: $filename
destination_node:
$component_ref: 0ce752d7-3ef1-481b-bb01-c7081ef86103
destination_input: $filename
- component_type: DataFlowEdge
id: dec4b4bb-56c9-445a-a282-9d095ff6038e
name: generate_comments_$json_comments_to_extract_comments_from_json_$json_comments_data_flow_edge
description: null
metadata:
__metadata_info__: {}
source_node:
$component_ref: 0ce752d7-3ef1-481b-bb01-c7081ef86103
source_output: $json_comments
destination_node:
$component_ref: 48057b9c-bee7-4286-baf5-625b6f1a6f1a
destination_input: $json_comments
- component_type: DataFlowEdge
id: 611478d7-281a-4587-81e6-97e8c745da53
name: __StartStep___message_to_format_diff_to_string_message_data_flow_edge
description: null
metadata:
__metadata_info__: {}
source_node:
$component_ref: e20f5870-d594-4089-9fcd-08146232910d
source_output: message
destination_node:
$component_ref: f0fb3ab4-a950-43b6-a583-6f0044f18c7f
destination_input: message
- component_type: DataFlowEdge
id: 227ae098-0baf-4fe8-9615-094bb386c9a9
name: extract_comments_from_json_$extracted_comments_to_None End node_$extracted_comments_data_flow_edge
description: null
metadata: {}
source_node:
$component_ref: 48057b9c-bee7-4286-baf5-625b6f1a6f1a
source_output: $extracted_comments
destination_node:
$component_ref: 39f36227-8910-414c-8b6b-517c0d65b0d8
destination_input: $extracted_comments
- component_type: DataFlowEdge
id: 6e25b4d8-5656-471b-8ffa-1fe8cfffbc05
name: generate_comments_$json_comments_to_None End node_$json_comments_data_flow_edge
description: null
metadata: {}
source_node:
$component_ref: 0ce752d7-3ef1-481b-bb01-c7081ef86103
source_output: $json_comments
destination_node:
$component_ref: 39f36227-8910-414c-8b6b-517c0d65b0d8
destination_input: $json_comments
- component_type: DataFlowEdge
id: fdbf1eeb-0278-4dc8-b897-c924937a1692
name: add_lines_on_diff_$diff_with_lines_to_None End node_$diff_with_lines_data_flow_edge
description: null
metadata: {}
source_node:
$component_ref: 6000ee3f-ac80-4937-b36c-94fd65cdcda4
source_output: $diff_with_lines
destination_node:
$component_ref: 39f36227-8910-414c-8b6b-517c0d65b0d8
destination_input: $diff_with_lines
- component_type: DataFlowEdge
id: 3b6bcba7-635b-45fa-b450-cf0a15dae463
name: extract_file_path_$filename_to_None End node_$filename_data_flow_edge
description: null
metadata: {}
source_node:
$component_ref: 6f6dc822-9352-47ae-9b48-173402a334fe
source_output: $filename
destination_node:
$component_ref: 39f36227-8910-414c-8b6b-517c0d65b0d8
destination_input: $filename
- component_type: DataFlowEdge
id: 2f95704b-4cc1-4983-8a20-e39c79a94e01
name: format_diff_to_string_$diff_to_string_to_None End node_$diff_to_string_data_flow_edge
description: null
metadata: {}
source_node:
$component_ref: f0fb3ab4-a950-43b6-a583-6f0044f18c7f
source_output: $diff_to_string
destination_node:
$component_ref: 39f36227-8910-414c-8b6b-517c0d65b0d8
destination_input: $diff_to_string
$referenced_components:
6000ee3f-ac80-4937-b36c-94fd65cdcda4:
component_type: ExtendedToolNode
id: 6000ee3f-ac80-4937-b36c-94fd65cdcda4
name: add_lines_on_diff
description: ''
metadata:
__metadata_info__: {}
inputs:
- type: string
title: $diff_to_string
outputs:
- type: string
title: $diff_with_lines
branches:
- next
tool:
component_type: ServerTool
id: e936566f-7a25-40f3-9434-3e740a7bfb02
name: format_git_diff
description: Formats a git diff by adding line numbers to each line
except removal lines.
metadata:
__metadata_info__: {}
inputs:
- type: string
title: diff_text
outputs:
- type: string
title: tool_output
input_mapping:
diff_text: $diff_to_string
output_mapping:
tool_output: $diff_with_lines
raise_exceptions: false
component_plugin_name: NodesPlugin
component_plugin_version: 25.4.0.dev0
f0fb3ab4-a950-43b6-a583-6f0044f18c7f:
component_type: PluginOutputMessageNode
id: f0fb3ab4-a950-43b6-a583-6f0044f18c7f
name: format_diff_to_string
description: ''
metadata:
__metadata_info__: {}
inputs:
- description: '"message" input variable for the template'
title: message
outputs:
- description: the message added to the messages list
type: string
title: $diff_to_string
branches:
- next
expose_message_as_output: true
message: '{{ message | string }}'
input_mapping: {}
output_mapping:
output_message: $diff_to_string
message_type: AGENT
rephrase: false
llm_config: null
component_plugin_name: NodesPlugin
component_plugin_version: 25.4.0.dev0
6f6dc822-9352-47ae-9b48-173402a334fe:
component_type: PluginRegexNode
id: 6f6dc822-9352-47ae-9b48-173402a334fe
name: extract_file_path
description: ''
metadata:
__metadata_info__: {}
inputs:
- description: raw text to extract information from
type: string
title: $diff_to_string
outputs:
- description: the first extracted value using the regex "diff --git
a/(.+?) b/" from the raw input
type: string
title: $filename
default: ''
branches:
- next
input_mapping:
text: $diff_to_string
output_mapping:
output: $filename
regex_pattern: diff --git a/(.+?) b/
return_first_match_only: true
component_plugin_name: NodesPlugin
component_plugin_version: 25.4.0.dev0
0ce752d7-3ef1-481b-bb01-c7081ef86103:
component_type: ExtendedLlmNode
id: 0ce752d7-3ef1-481b-bb01-c7081ef86103
name: generate_comments
description: ''
metadata:
__metadata_info__: {}
inputs:
- description: '"filename" input variable for the template'
type: string
title: $filename
- description: '"diff" input variable for the template'
type: string
title: $diff_with_lines
outputs:
- description: the generated text
type: string
title: $json_comments
branches:
- next
llm_config:
component_type: VllmConfig
id: fb043839-1e69-404c-a178-d8c3de0bfe20
name: LLAMA_MODEL_ID
description: null
metadata:
__metadata_info__: {}
default_generation_parameters: null
url: LLAMA_API_URL
model_id: LLAMA_MODEL_ID
prompt_template: "You are a very experienced code reviewer. You are\
\ given a git diff on a file: {{ filename }}\n\n## Context\nThe\
\ git diff contains all changes of a single file. All lines are\
\ prepended with their number. Lines without a line number were removed\
\ from the file.\nAfter the line number, a line that was changed\
\ has a \"+\" before the code. All lines without a \"+\" are just\
\ here for context, you will not comment on them.\n\n## Input\n\
### Code diff\n{{ diff }}\n\n## Task\nYour task is to review these\
\ changes, according to different rules. Only comment lines that\
\ were added, so the lines that have a + just after the line number.\n\
The rules are the following:\n\n\nName: TODO_WITHOUT_TICKET\nDescription:\
\ TODO comments should reference a ticket number for tracking.\n\
Example code:\n```python\n# TODO: Add validation here\ndef process_user_input(data):\n\
\ return data\n```\nExample comment:\n[BOT] TODO_WITHOUT_TICKET:\
\ TODO comment should reference a ticket number for tracking (e.g.,\
\ \"TODO: Add validation here (TICKET-1234)\").\n\n\n---\n\n\nName:\
\ MUTABLE_DEFAULT_ARGUMENT\nDescription: Using mutable objects as\
\ default arguments can lead to unexpected behavior.\nExample code:\n\
```python\ndef add_item(item, items=[]):\n items.append(item)\n\
\ return items\n```\nExample comment:\n[BOT] MUTABLE_DEFAULT_ARGUMENT:\
\ Avoid using mutable default arguments. Use None and initialize\
\ in the function: `def add_item(item, items=None): items = items\
\ or []`\n\n\n---\n\n\nName: NON_DESCRIPTIVE_NAME\nDescription:\
\ Variable names should clearly indicate their purpose or content.\n\
Example code:\n```python\ndef process(lst):\n res = []\n for\
\ i in lst:\n res.append(i * 2)\n return res\n```\nExample\
\ comment:\n[BOT] NON_DESCRIPTIVE_NAME: Use more descriptive names:\
\ 'lst' could be 'numbers', 'res' could be 'doubled_numbers', 'i'\
\ could be 'number'\n\n\n### Response Format\nYou need to return\
\ a review as a json as follows:\n```json\n[\n {\n \"\
content\": \"the comment as a text\",\n \"suggestion\": \"\
if the change you propose is a single line, then put here the single\
\ line rewritten that includes your proposal change. IMPORTANT:\
\ a single line, which will erase the current line. Put empty string\
\ if no suggestion or if the suggestion is more than a single line\"\
,\n \"line\": \"line number where the comment applies\"\n\
\ },\n \u2026\n]\n```\nPlease use triple backticks ``` to\
\ delimit your JSON list of comments. Don't output more than\
\ 5 comments, only comment the most relevant sections.\nIf there\
\ are no comments and the code seems fine, just output an empty\
\ JSON list."
input_mapping:
diff: $diff_with_lines
filename: $filename
output_mapping:
output: $json_comments
prompt_template_object: null
send_message: false
component_plugin_name: NodesPlugin
component_plugin_version: 25.4.0.dev0
48057b9c-bee7-4286-baf5-625b6f1a6f1a:
component_type: PluginExtractNode
id: 48057b9c-bee7-4286-baf5-625b6f1a6f1a
name: extract_comments_from_json
description: ''
metadata:
__metadata_info__: {}
inputs:
- description: raw text to extract information from
type: string
title: $json_comments
outputs:
- description: The extracted comments content and line number
type: array
items:
type: object
additionalProperties: {}
key_type:
type: string
title: $extracted_comments
branches:
- next
input_mapping:
text: $json_comments
output_mapping:
values: $extracted_comments
output_values:
values: '[.[] | {"content": .["content"], "line": .["line"]}]'
component_plugin_name: NodesPlugin
component_plugin_version: 25.4.0.dev0
e20f5870-d594-4089-9fcd-08146232910d:
component_type: StartNode
id: e20f5870-d594-4089-9fcd-08146232910d
name: __StartStep__
description: ''
metadata:
__metadata_info__: {}
inputs:
- description: '"message" input variable for the template'
title: message
outputs:
- description: '"message" input variable for the template'
title: message
branches:
- next
39f36227-8910-414c-8b6b-517c0d65b0d8:
component_type: EndNode
id: 39f36227-8910-414c-8b6b-517c0d65b0d8
name: None End node
description: End node representing all transitions to None in the
WayFlow flow
metadata: {}
inputs:
- description: The extracted comments content and line number
type: array
items:
type: object
additionalProperties: {}
key_type:
type: string
title: $extracted_comments
- description: the generated text
type: string
title: $json_comments
- type: string
title: $diff_with_lines
- description: the first extracted value using the regex "diff --git
a/(.+?) b/" from the raw input
type: string
title: $filename
default: ''
- description: the message added to the messages list
type: string
title: $diff_to_string
outputs:
- description: The extracted comments content and line number
type: array
items:
type: object
additionalProperties: {}
key_type:
type: string
title: $extracted_comments
- description: the generated text
type: string
title: $json_comments
- type: string
title: $diff_with_lines
- description: the first extracted value using the regex "diff --git
a/(.+?) b/" from the raw input
type: string
title: $filename
default: ''
- description: the message added to the messages list
type: string
title: $diff_to_string
branches: []
branch_name: next
unpack_input:
message: .
parallel_execution: false
component_plugin_name: NodesPlugin
component_plugin_version: 25.4.0.dev0
367ae568-317d-42ec-ae70-4c41afe0dbd0:
component_type: StartNode
id: 367ae568-317d-42ec-ae70-4c41afe0dbd0
name: __StartStep__
description: ''
metadata:
__metadata_info__: {}
inputs:
- description: iterated input for the map step
type: array
items:
description: '"message" input variable for the template'
title: message
title: $file_diff_list
outputs:
- description: iterated input for the map step
type: array
items:
description: '"message" input variable for the template'
title: message
title: $file_diff_list
branches:
- next
6f62aecf-03a1-4e38-b551-8eef0efaf4bb:
component_type: EndNode
id: 6f62aecf-03a1-4e38-b551-8eef0efaf4bb
name: None End node
description: End node representing all transitions to None in the WayFlow
flow
metadata: {}
inputs:
- type: array
items:
type: string
title: $filepath_list
- type: array
items: {}
title: $nested_comment_list
outputs:
- type: array
items:
type: string
title: $filepath_list
- type: array
items: {}
title: $nested_comment_list
branches: []
branch_name: next
47e367be-4d74-49dc-ac3b-89bb97ffa7df:
component_type: FlowNode
id: 47e367be-4d74-49dc-ac3b-89bb97ffa7df
name: retrieve_diff_flowstep
description: ''
metadata:
__metadata_info__: {}
inputs:
- type: string
title: $repo_dirpath_io
outputs:
- type: string
title: $raw_pr_diff
- description: the list of extracted value using the regex "(diff --git[\s\S]*?)(?=diff
--git|$)" from the raw input
type: array
items:
type: string
title: $file_diff_list
default: []
branches:
- next
subflow:
component_type: Flow
id: 9e7aed22-876c-4c32-9d44-20ee7ceb3771
name: Retrieve PR diff flow
description: ''
metadata:
__metadata_info__: {}
inputs:
- type: string
title: $repo_dirpath_io
outputs:
- type: string
title: $raw_pr_diff
- description: the list of extracted value using the regex "(diff --git[\s\S]*?)(?=diff
--git|$)" from the raw input
type: array
items:
type: string
title: $file_diff_list
default: []
start_node:
$component_ref: 4fcb7ebe-325b-446d-a46b-59187c30e260
nodes:
- $component_ref: 4fcb7ebe-325b-446d-a46b-59187c30e260
- $component_ref: 5c73da9c-6ba9-44ce-aab1-212a78d0a720
- $component_ref: cf841053-2414-48b6-ba6d-0f0f5e11044c
- $component_ref: dd0e56ab-1267-4345-9f59-ecc053baf2af
control_flow_connections:
- component_type: ControlFlowEdge
id: 60dc14b8-d9b9-4aec-a958-9f3676848f48
name: start_step_to_get_pr_diff_control_flow_edge
description: null
metadata:
__metadata_info__: {}
from_node:
$component_ref: 4fcb7ebe-325b-446d-a46b-59187c30e260
from_branch: null
to_node:
$component_ref: 5c73da9c-6ba9-44ce-aab1-212a78d0a720
- component_type: ControlFlowEdge
id: 500f97de-78b1-42e0-944c-0375dfca734e
name: get_pr_diff_to_extract_into_list_of_file_diff_control_flow_edge
description: null
metadata:
__metadata_info__: {}
from_node:
$component_ref: 5c73da9c-6ba9-44ce-aab1-212a78d0a720
from_branch: null
to_node:
$component_ref: cf841053-2414-48b6-ba6d-0f0f5e11044c
- component_type: ControlFlowEdge
id: 22d0cf0d-8edb-4b04-8f54-a234f5705360
name: extract_into_list_of_file_diff_to_None End node_control_flow_edge
description: null
metadata: {}
from_node:
$component_ref: cf841053-2414-48b6-ba6d-0f0f5e11044c
from_branch: null
to_node:
$component_ref: dd0e56ab-1267-4345-9f59-ecc053baf2af
data_flow_connections:
- component_type: DataFlowEdge
id: 106e3740-de45-4472-8168-2873ae1dbc82
name: start_step_$repo_dirpath_io_to_get_pr_diff_$repo_dirpath_io_data_flow_edge
description: null
metadata:
__metadata_info__: {}
source_node:
$component_ref: 4fcb7ebe-325b-446d-a46b-59187c30e260
source_output: $repo_dirpath_io
destination_node:
$component_ref: 5c73da9c-6ba9-44ce-aab1-212a78d0a720
destination_input: $repo_dirpath_io
- component_type: DataFlowEdge
id: a32cbb1c-eafe-4138-80e2-2cf2e1248312
name: get_pr_diff_$raw_pr_diff_to_extract_into_list_of_file_diff_$raw_pr_diff_data_flow_edge
description: null
metadata:
__metadata_info__: {}
source_node:
$component_ref: 5c73da9c-6ba9-44ce-aab1-212a78d0a720
source_output: $raw_pr_diff
destination_node:
$component_ref: cf841053-2414-48b6-ba6d-0f0f5e11044c
destination_input: $raw_pr_diff
- component_type: DataFlowEdge
id: 3ef5dcf4-acdf-4962-8df6-07b53f249e18
name: get_pr_diff_$raw_pr_diff_to_None End node_$raw_pr_diff_data_flow_edge
description: null
metadata: {}
source_node:
$component_ref: 5c73da9c-6ba9-44ce-aab1-212a78d0a720
source_output: $raw_pr_diff
destination_node:
$component_ref: dd0e56ab-1267-4345-9f59-ecc053baf2af
destination_input: $raw_pr_diff
- component_type: DataFlowEdge
id: 08cbca39-e591-4cf4-9057-ae67938d9557
name: extract_into_list_of_file_diff_$file_diff_list_to_None End node_$file_diff_list_data_flow_edge
description: null
metadata: {}
source_node:
$component_ref: cf841053-2414-48b6-ba6d-0f0f5e11044c
source_output: $file_diff_list
destination_node:
$component_ref: dd0e56ab-1267-4345-9f59-ecc053baf2af
destination_input: $file_diff_list
$referenced_components:
5c73da9c-6ba9-44ce-aab1-212a78d0a720:
component_type: ExtendedToolNode
id: 5c73da9c-6ba9-44ce-aab1-212a78d0a720
name: get_pr_diff
description: ''
metadata:
__metadata_info__: {}
inputs:
- type: string
title: $repo_dirpath_io
outputs:
- type: string
title: $raw_pr_diff
branches:
- next
tool:
component_type: ServerTool
id: 275aaf19-cdd4-4ed7-a436-e53f922cd740
name: local_get_pr_diff_tool
description: '# docs-skiprow
Retrieves code diff with a git command given the # docs-skiprow
path to the repository root folder. # docs-skiprow'
metadata:
__metadata_info__: {}
inputs:
- type: string
title: repo_dirpath
outputs:
- type: string
title: tool_output
input_mapping:
repo_dirpath: $repo_dirpath_io
output_mapping:
tool_output: $raw_pr_diff
raise_exceptions: true
component_plugin_name: NodesPlugin
component_plugin_version: 25.4.0.dev0
4fcb7ebe-325b-446d-a46b-59187c30e260:
component_type: StartNode
id: 4fcb7ebe-325b-446d-a46b-59187c30e260
name: start_step
description: ''
metadata:
__metadata_info__: {}
inputs:
- type: string
title: $repo_dirpath_io
outputs:
- type: string
title: $repo_dirpath_io
branches:
- next
cf841053-2414-48b6-ba6d-0f0f5e11044c:
component_type: PluginRegexNode
id: cf841053-2414-48b6-ba6d-0f0f5e11044c
name: extract_into_list_of_file_diff
description: ''
metadata:
__metadata_info__: {}
inputs:
- description: raw text to extract information from
type: string
title: $raw_pr_diff
outputs:
- description: the list of extracted value using the regex "(diff --git[\s\S]*?)(?=diff
--git|$)" from the raw input
type: array
items:
type: string
title: $file_diff_list
default: []
branches:
- next
input_mapping:
text: $raw_pr_diff
output_mapping:
output: $file_diff_list
regex_pattern: (diff --git[\s\S]*?)(?=diff --git|$)
return_first_match_only: false
component_plugin_name: NodesPlugin
component_plugin_version: 25.4.0.dev0
dd0e56ab-1267-4345-9f59-ecc053baf2af:
component_type: EndNode
id: dd0e56ab-1267-4345-9f59-ecc053baf2af
name: None End node
description: End node representing all transitions to None in the WayFlow
flow
metadata: {}
inputs:
- type: string
title: $raw_pr_diff
- description: the list of extracted value using the regex "(diff --git[\s\S]*?)(?=diff
--git|$)" from the raw input
type: array
items:
type: string
title: $file_diff_list
default: []
outputs:
- type: string
title: $raw_pr_diff
- description: the list of extracted value using the regex "(diff --git[\s\S]*?)(?=diff
--git|$)" from the raw input
type: array
items:
type: string
title: $file_diff_list
default: []
branches: []
branch_name: next
020c885e-6d0b-472a-bb91-246ab70ab1db:
component_type: StartNode
id: 020c885e-6d0b-472a-bb91-246ab70ab1db
name: __StartStep__
description: ''
metadata:
__metadata_info__: {}
inputs:
- type: string
title: $repo_dirpath_io
outputs:
- type: string
title: $repo_dirpath_io
branches:
- next
a544af64-e63b-4ccf-9ab0-8d25cdbc0b93:
component_type: EndNode
id: a544af64-e63b-4ccf-9ab0-8d25cdbc0b93
name: None End node
description: End node representing all transitions to None in the WayFlow flow
metadata: {}
inputs:
- type: array
items:
type: string
title: $filepath_list
- type: array
items: {}
title: $nested_comment_list
- type: string
title: $raw_pr_diff
- description: the list of extracted value using the regex "(diff --git[\s\S]*?)(?=diff
--git|$)" from the raw input
type: array
items:
type: string
title: $file_diff_list
default: []
outputs:
- type: array
items:
type: string
title: $filepath_list
- type: array
items: {}
title: $nested_comment_list
- type: string
title: $raw_pr_diff
- description: the list of extracted value using the regex "(diff --git[\s\S]*?)(?=diff
--git|$)" from the raw input
type: array
items:
type: string
title: $file_diff_list
default: []
branches: []
branch_name: next
agentspec_version: 25.4.1
You can then load the configuration back into an assistant using the AgentSpecLoader.
from wayflowcore.agentspec import AgentSpecLoader

tool_registry = {
    "local_get_pr_diff_tool": local_get_pr_diff_tool,
    "format_git_diff": format_git_diff,
}
assistant = AgentSpecLoader(tool_registry=tool_registry).load_json(serialized_assistant)
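The keys of `tool_registry` are expected to match the tool names stored in the serialized configuration (for example, the `ServerTool` named `local_get_pr_diff_tool` in the configuration above). The following is a minimal sketch of a sanity check that could be run before loading. It is not part of WayFlow: it assumes a JSON-serialized configuration in which every tool appears as a mapping with `component_type` set to `ServerTool` and a `name` field, and `iter_server_tool_names` and `missing_tools` are hypothetical helper names.

```python
import json
from typing import Any, Dict, Iterator, Set


def iter_server_tool_names(node: Any) -> Iterator[str]:
    """Yield the name of every mapping whose component_type is ServerTool."""
    if isinstance(node, dict):
        if node.get("component_type") == "ServerTool" and "name" in node:
            yield node["name"]
        # Recurse into nested components (sub-flows, nodes, referenced components).
        for value in node.values():
            yield from iter_server_tool_names(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_server_tool_names(item)


def missing_tools(serialized_json: str, registry: Dict[str, Any]) -> Set[str]:
    """Return tool names referenced in the config but absent from the registry."""
    config = json.loads(serialized_json)
    return set(iter_server_tool_names(config)) - set(registry)


# Illustrative, heavily trimmed configuration (assumed shape, not a real export).
sample_config = json.dumps(
    {
        "component_type": "Flow",
        "nodes": [
            {
                "component_type": "ExtendedToolNode",
                "tool": {"component_type": "ServerTool", "name": "local_get_pr_diff_tool"},
            },
            {
                "component_type": "ExtendedToolNode",
                "tool": {"component_type": "ServerTool", "name": "format_git_diff"},
            },
        ],
    }
)
sample_registry = {"local_get_pr_diff_tool": object()}
missing = missing_tools(sample_config, sample_registry)
print(missing)  # {'format_git_diff'}
```

An empty result suggests that every serialized tool has a counterpart in the registry, under the assumptions stated above.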
Note
This guide uses the following extension/plugin Agent Spec components:
PluginOutputMessageNode
PluginExtractNode
PluginRegexNode
ExtendedLlmNode
ExtendedToolNode
ExtendedMapNode
See the list of available Agent Spec extension/plugin components in the API Reference.
Recap#
In this tutorial you built a simple PR bot using WayFlow Flows. Along the way, you learned:
How to use core steps such as the OutputMessageStep and PromptExecutionStep.
How to build and execute tools using the ServerTool and the ToolExecutionStep.
How to extract information using the RegexExtractionStep and the ExtractValueFromJsonStep.
How to apply a sub-flow over iterable data using the MapStep.
Finally, you learned how to structure your code when building an assistant as code, and how to execute and combine sub-flows to build a complex assistant.
This is an example of the kind of fully featured tool that you can build with WayFlow.
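To see what the two regular expressions passed to the RegexExtractionStep steps match, they can be tried with Python's `re` module alone. The following is a minimal sketch on a made-up two-file diff. The names `sample_diff` and `first_match` are illustrative and not part of the tutorial code, and the empty-string fallback mirrors the default shown for `$filename` in the serialized configuration.

```python
import re

# Patterns copied from the tutorial steps.
FILE_DIFF_PATTERN = r"(diff --git[\s\S]*?)(?=diff --git|$)"
FILE_PATH_PATTERN = r"diff --git a/(.+?) b/"

# Illustrative diff covering two files (no trailing newline).
sample_diff = (
    "diff --git a/pkg/first.py b/pkg/first.py\n"
    "@@ -1,2 +1,3 @@\n"
    " import os\n"
    "+import sys\n"
    "diff --git a/pkg/second.py b/pkg/second.py\n"
    "@@ -0,0 +1,1 @@\n"
    "+print('hi')"
)


def first_match(pattern: str, text: str) -> str:
    """Return the first captured group, or an empty string when nothing matches."""
    match = re.search(pattern, text)
    return match.group(1) if match else ""


# The lazy quantifier plus the lookahead cuts the text just before each
# following "diff --git" header, so every file gets its own chunk.
file_diffs = re.findall(FILE_DIFF_PATTERN, sample_diff)
file_paths = [first_match(FILE_PATH_PATTERN, file_diff) for file_diff in file_diffs]

print(len(file_diffs))  # 2
print(file_paths)  # ['pkg/first.py', 'pkg/second.py']
```

Note that the path pattern only matches headers that use the `a/` and `b/` prefixes. For any other header style, this sketch returns an empty string.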
Next Steps#
Now that you have learned how to build a PR reviewing assistant, you may want to explore the other WayFlow guides.
Full Code#
Click on the card at the top of this page to download the full code for this guide or copy the code below.
# Copyright © 2025 Oracle and/or its affiliates.
#
# This software is under the Universal Permissive License
# (UPL) 1.0 (LICENSE-UPL or https://oss.oracle.com/licenses/upl) or Apache License
# 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0), at your option.

# %%[markdown]
# Tutorial - Build a Simple Code Review Assistant
# -----------------------------------------------

# How to use:
# Create a new Python virtual environment and install the latest WayFlow version.
# ```bash
# python -m venv venv-wayflowcore
# source venv-wayflowcore/bin/activate
# pip install --upgrade pip
# pip install "wayflowcore==26.1"
# ```

# You can now run the script
# 1. As a Python file:
# ```bash
# python usecase_prbot.py
# ```
# 2. As a Notebook (in VSCode):
# When viewing the file,
#  - press the keys Ctrl + Enter to run the selected cell
#  - or Shift + Enter to run the selected cell and move to the cell below

# nosec


from types import MethodType
from typing import Dict, List


# %%[markdown]
## Define the LLM

# %%
from wayflowcore.models import VllmModel

llm = VllmModel(
    model_id="meta-llama/Meta-Llama-3.1-8B-Instruct",
    host_port="VLLM_HOST_PORT",
)

# %%[markdown]
## Define the tool that retrieves the PR diff

# %%
from wayflowcore.tools import tool


@tool(description_mode="only_docstring")
def local_get_pr_diff_tool(repo_dirpath: str) -> str:
    """
    Retrieves code diff with a git command given the
    path to the repository root folder.
    """
    import subprocess

    result = subprocess.run(
        ["git", "diff", "HEAD"],
        capture_output=True,
        cwd=repo_dirpath,
        text=True,
    )
    return result.stdout.strip()


# %%[markdown]
## Define a mocked PR diff

# %%
MOCK_DIFF = """
diff --git src://calculators/utils.py dst://calculators/utils.py
index 12345678..90123456 100644
--- src://calculators/utils.py
+++ dst://calculators/utils.py
@@ -10,6 +10,15 @@

 def calculate_total(data):
     # TODO: implement tax calculation
     return data

+def get_items(items=[]):
+    result = []
+    for item in items:
+        result.append(item * 2)
+    return result
+
+def process_numbers(numbers):
+    res = []
+    for x in numbers:
+        res.append(x + 1)
+    return res
+
 def calculate_average(numbers):
     return sum(numbers) / len(numbers)


diff --git src://example/utils.py dst://example/utils.py
index 000000000..123456789
--- /dev/null
+++ dst://example/utils.py
@@ -0,0 +1,20 @@
+# Copyright © 2024 Oracle and/or its affiliates.
+
+def calculate_sum(numbers=[]):
+    total = 0
+    for num in numbers:
+        total += num
+    return total
+
+
+def process_data(data):
+    # TODO: Handle exceptions here
+    result = data * 2
+    return result
+
+
+def main():
+    numbers = [1, 2, 3, 4, 5]
+    result = calculate_sum(numbers)
+    print("Sum:", result)
+    data = 10
+    processed_data = process_data(data)
+    print("Processed Data:", processed_data)
+
+
+if __name__ == "__main__":
+    main()
""".strip()



# %%[markdown]
## Create the flow that retrieves the diff of a PR

# %%
from wayflowcore.controlconnection import ControlFlowEdge
from wayflowcore.dataconnection import DataFlowEdge
from wayflowcore.flow import Flow
from wayflowcore.property import StringProperty
from wayflowcore.steps import RegexExtractionStep, StartStep, ToolExecutionStep

# IO Variable Names
REPO_DIRPATH_IO = "$repo_dirpath_io"
PR_DIFF_IO = "$raw_pr_diff"
FILE_DIFF_LIST_IO = "$file_diff_list"

# Define the steps

start_step = StartStep(name="start_step", input_descriptors=[StringProperty(name=REPO_DIRPATH_IO)])

# Step 1: Retrieve the pull request diff using the local tool
get_pr_diff_step = ToolExecutionStep(
    name="get_pr_diff",
    tool=local_get_pr_diff_tool,
    raise_exceptions=True,
    input_mapping={"repo_dirpath": REPO_DIRPATH_IO},
    output_mapping={ToolExecutionStep.TOOL_OUTPUT: PR_DIFF_IO},
)

# Step 2: Extract the file diffs from the raw diff using a regular expression
extract_into_list_of_file_diff_step = RegexExtractionStep(
    name="extract_into_list_of_file_diff",
    regex_pattern=r"(diff --git[\s\S]*?)(?=diff --git|$)",
    return_first_match_only=False,
    input_mapping={RegexExtractionStep.TEXT: PR_DIFF_IO},
    output_mapping={RegexExtractionStep.OUTPUT: FILE_DIFF_LIST_IO},
)

# Define the sub flow
retrieve_diff_subflow = Flow(
    name="Retrieve PR diff flow",
    begin_step=start_step,
    control_flow_edges=[
        ControlFlowEdge(source_step=start_step, destination_step=get_pr_diff_step),
        ControlFlowEdge(
            source_step=get_pr_diff_step, destination_step=extract_into_list_of_file_diff_step
        ),
        ControlFlowEdge(source_step=extract_into_list_of_file_diff_step, destination_step=None),
    ],
    data_flow_edges=[
        DataFlowEdge(
            source_step=start_step,
            source_output=REPO_DIRPATH_IO,
            destination_step=get_pr_diff_step,
            destination_input=REPO_DIRPATH_IO,
        ),
        DataFlowEdge(
            source_step=get_pr_diff_step,
            source_output=PR_DIFF_IO,
            destination_step=extract_into_list_of_file_diff_step,
            destination_input=PR_DIFF_IO,
        ),
    ],
)


# %%[markdown]
## Alternative step that retrieves the PR diff through an API call

# %%
from wayflowcore.steps import ApiCallStep

# IO Variable Names
USER_PROVIDED_TOKEN_IO = "$user_provided_token"
REPO_WORKSPACE_IO = "$repo_workspace"
REPO_SLUG_IO = "$repo_slug"
PULL_REQUEST_ID_IO = "$pull_request_id"
PR_DIFF_IO = "$raw_pr_diff"

get_pr_diff_step = ApiCallStep(
    url="https://example.com/projects/{{workspace}}/repos/{{repo_slug}}/pull-requests/{{pr_id}}.diff",
    method="GET",
    headers={"Authorization": "Bearer {{token}}"},
    ignore_bad_http_requests=False,
    num_retry_on_bad_http_request=3,
    store_response=True,
    input_mapping={
        "token": USER_PROVIDED_TOKEN_IO,
        "workspace": REPO_WORKSPACE_IO,
        "repo_slug": REPO_SLUG_IO,
        "pr_id": PULL_REQUEST_ID_IO,
    },
    output_mapping={ApiCallStep.HTTP_RESPONSE: PR_DIFF_IO},
)


# %%[markdown]
## Test the flow that retrieves the PR diff

# %%
from wayflowcore.executors.executionstatus import FinishedStatus

# Replace the path below with the path to your actual codebase sample git repository.
PATH_TO_DIR = "path/to/repository_root"

test_conversation = retrieve_diff_subflow.start_conversation(
    inputs={
        REPO_DIRPATH_IO: PATH_TO_DIR,
    }
)

execution_status = test_conversation.execute()

if not isinstance(execution_status, FinishedStatus):
    raise ValueError("Unexpected status type")

FILE_DIFF_LIST = execution_status.output_values[FILE_DIFF_LIST_IO]

print(FILE_DIFF_LIST[0])


# %%[markdown]
## Define the tool that formats the diff for the LLM

# %%
PR_BOT_CHECKS = [
    """
Name: TODO_WITHOUT_TICKET
Description: TODO comments should reference a ticket number for tracking.
Example code:
```python
# TODO: Add validation here
def process_user_input(data):
    return data
```
Example comment:
[BOT] TODO_WITHOUT_TICKET: TODO comment should reference a ticket number for tracking (e.g., "TODO: Add validation here (TICKET-1234)").
""",
    """
Name: MUTABLE_DEFAULT_ARGUMENT
Description: Using mutable objects as default arguments can lead to unexpected behavior.
Example code:
```python
def add_item(item, items=[]):
    items.append(item)
    return items
```
Example comment:
[BOT] MUTABLE_DEFAULT_ARGUMENT: Avoid using mutable default arguments. Use None and initialize in the function: `def add_item(item, items=None): items = items or []`
""",
    """
Name: NON_DESCRIPTIVE_NAME
Description: Variable names should clearly indicate their purpose or content.
Example code:
```python
def process(lst):
    res = []
    for i in lst:
        res.append(i * 2)
    return res
```
Example comment:
[BOT] NON_DESCRIPTIVE_NAME: Use more descriptive names: 'lst' could be 'numbers', 'res' could be 'doubled_numbers', 'i' could be 'number'
""",
]

CONCATENATED_CHECKS = "\n\n---\n\n".join(check for check in PR_BOT_CHECKS)

PROMPT_TEMPLATE = """You are a very experienced code reviewer. You are given a git diff on a file: {{filename}}

## Context
The git diff contains all changes of a single file. All lines are prepended with their number. Lines without line number where removed from the file.
After the line number, a line that was changed has a "+" before the code. All lines without a "+" are just here for context, you will not comment on them.

## Input
### Code diff
{{diff}}

## Task
Your task is to review these changes, according to different rules. Only comment lines that were added, so the lines that have a + just after the line number.
The rules are the following:

{{checks}}

### Reponse Format
You need to return a review as a json as follows:
```json
[
    {
        "content": "the comment as a text",
        "suggestion": "if the change you propose is a single line, then put here the single line rewritten that includes your proposal change. IMPORTANT: a single line, which will erase the current line. Put empty string if no suggestion of if the suggestion is more than a single line",
        "line": "line number where the comment applies"
    },
    …
]
```
Please use triple backticks ``` to delimitate your JSON list of comments. Don't output more than 5 comments, only comment the most relevant sections.
If there are no comments and the code seems fine, just output an empty JSON list."""


@tool(description_mode="only_docstring")
def format_git_diff(diff_text: str) -> str:
    """
    Formats a git diff by adding line numbers to each line except removal lines.
    """

    def pad_number(number: int, width: int) -> str:
        """Right-align a number with specified width using space padding."""
        return str(number).rjust(width)

    LINE_NUMBER_WIDTH = 5
    PADDING_WIDTH = LINE_NUMBER_WIDTH + 1
    current_line_number = 0
    formatted_lines = []

    for line in diff_text.split("\n"):
        # Handle diff header lines (e.g., "@@ -1,7 +1,6 @@")
        if line.startswith("@@"):
            try:
                # Extract the starting line number and line count
                _, position_info, _ = line.split("@@")
                new_file_info = position_info.split()[1][1:]  # Remove the '+' prefix
                start_line, line_count = map(int, new_file_info.split(","))

                current_line_number = start_line
                formatted_lines.append(line)
                continue

            except (ValueError, IndexError):
                raise ValueError(f"Invalid diff header format: {line}")

        # Handle content lines
        if current_line_number > 0 and line:
            if not line.startswith("-"):
                # Add line number for added/context lines
                line_prefix = pad_number(current_line_number, LINE_NUMBER_WIDTH)
                formatted_lines.append(f"{line_prefix} {line}")
                current_line_number += 1
            else:
                # Just add padding for removal lines
                formatted_lines.append(" " * PADDING_WIDTH + line)

    return "\n".join(formatted_lines)


# %%[markdown]
## Create the flow that generates review comments

# %%
from wayflowcore._utils._templating_helpers import render_template_partially
from wayflowcore.property import AnyProperty, DictProperty, ListProperty, StringProperty
from wayflowcore.steps import (
    ExtractValueFromJsonStep,
    MapStep,
    OutputMessageStep,
    PromptExecutionStep,
    ToolExecutionStep,
)

# IO Variable Names
DIFF_TO_STRING_IO = "$diff_to_string"
DIFF_WITH_LINES_IO = "$diff_with_lines"
FILEPATH_IO = "$filename"
JSON_COMMENTS_IO = "$json_comments"
EXTRACTED_COMMENTS_IO = "$extracted_comments"
NESTED_COMMENT_LIST_IO = "$nested_comment_list"
FILEPATH_LIST_IO = "$filepath_list"

# Define the steps

# Step 1: Format the diff to a string
format_diff_to_string_step = OutputMessageStep(
    name="format_diff_to_string",
    message_template="{{ message | string }}",
    output_mapping={OutputMessageStep.OUTPUT: DIFF_TO_STRING_IO},
)

# Step 2: Add lines on the diff using a tool
add_lines_on_diff_step = ToolExecutionStep(
    name="add_lines_on_diff",
    tool=format_git_diff,
    input_mapping={"diff_text": DIFF_TO_STRING_IO},
    output_mapping={ToolExecutionStep.TOOL_OUTPUT: DIFF_WITH_LINES_IO},
)

# Step 3: Extract the file path from the diff string using a regular expression
extract_file_path_step = RegexExtractionStep(
    name="extract_file_path",
    regex_pattern=r"diff --git a/(.+?) b/",
    return_first_match_only=True,
    input_mapping={RegexExtractionStep.TEXT: DIFF_TO_STRING_IO},
    output_mapping={RegexExtractionStep.OUTPUT: FILEPATH_IO},
)

# Step 4: Generate comments using a prompt
generate_comments_step = PromptExecutionStep(
    name="generate_comments",
    prompt_template=render_template_partially(PROMPT_TEMPLATE, {"checks": CONCATENATED_CHECKS}),
    llm=llm,
    input_mapping={"diff": DIFF_WITH_LINES_IO, "filename": FILEPATH_IO},
    output_mapping={PromptExecutionStep.OUTPUT: JSON_COMMENTS_IO},
)

# Step 5: Extract comments from the JSON output
# Define the value type for extracted comments
comments_valuetype = ListProperty(
    name="values",
    description="The extracted comments content and line number",
    item_type=DictProperty(value_type=AnyProperty()),
)
extract_comments_from_json_step = ExtractValueFromJsonStep(
    name="extract_comments_from_json",
    output_values={comments_valuetype: '[.[] | {"content": .["content"], "line": .["line"]}]'},
    retry=True,
    llm=llm,
    input_mapping={ExtractValueFromJsonStep.TEXT: JSON_COMMENTS_IO},
    output_mapping={"values": EXTRACTED_COMMENTS_IO},
)

# Define the sub flow to generate comments for each file diff
generate_comments_subflow = Flow(
    name="Generate review comments flow",
    begin_step=format_diff_to_string_step,
    control_flow_edges=[
        ControlFlowEdge(format_diff_to_string_step, add_lines_on_diff_step),
        ControlFlowEdge(add_lines_on_diff_step, extract_file_path_step),
        ControlFlowEdge(extract_file_path_step, generate_comments_step),
        ControlFlowEdge(generate_comments_step, extract_comments_from_json_step),
        ControlFlowEdge(extract_comments_from_json_step, None),
    ],
    data_flow_edges=[
        DataFlowEdge(
            format_diff_to_string_step, DIFF_TO_STRING_IO, add_lines_on_diff_step, DIFF_TO_STRING_IO
        ),
        DataFlowEdge(
            format_diff_to_string_step, DIFF_TO_STRING_IO, extract_file_path_step, DIFF_TO_STRING_IO
        ),
        DataFlowEdge(
            add_lines_on_diff_step, DIFF_WITH_LINES_IO, generate_comments_step, DIFF_WITH_LINES_IO
        ),
        DataFlowEdge(extract_file_path_step, FILEPATH_IO, generate_comments_step, FILEPATH_IO),
        DataFlowEdge(
            generate_comments_step,
            JSON_COMMENTS_IO,
            extract_comments_from_json_step,
            JSON_COMMENTS_IO,
        ),
    ],
)

# Use the MapStep to apply the sub flow to each file
for_each_file_step = MapStep(
    flow=generate_comments_subflow,
    unpack_input={"message": "."},
    input_mapping={MapStep.ITERATED_INPUT: FILE_DIFF_LIST_IO},
    output_descriptors=[
        ListProperty(name=NESTED_COMMENT_LIST_IO, item_type=AnyProperty()),
        ListProperty(name=FILEPATH_LIST_IO, item_type=StringProperty()),
    ],
    output_mapping={EXTRACTED_COMMENTS_IO: NESTED_COMMENT_LIST_IO, FILEPATH_IO: FILEPATH_LIST_IO},
)

generate_all_comments_subflow = Flow.from_steps([for_each_file_step])


# %%[markdown]
## Test the flow that generates review comments

# %%
# We reuse the FILE_DIFF_LIST from the previous test
test_conversation = generate_all_comments_subflow.start_conversation(
    inputs={
        FILE_DIFF_LIST_IO: FILE_DIFF_LIST,
    }
)

execution_status = test_conversation.execute()

if not isinstance(execution_status, FinishedStatus):
    raise ValueError("Unexpected status type")

NESTED_COMMENT_LIST = execution_status.output_values[NESTED_COMMENT_LIST_IO]
FILEPATH_LIST = execution_status.output_values[FILEPATH_LIST_IO]
print(NESTED_COMMENT_LIST[0])
print(FILEPATH_LIST)



# %%[markdown]
## Create the tool that formats the review comments

# %%
@tool(description_mode="only_docstring")
def flatten_information(
    nested_comments_list: List[List[Dict[str, str]]], filepath_list: List[str]
) -> List[Dict[str, str]]:
    """Flattens information from comments and filepaths."""
    if len(nested_comments_list) != len(filepath_list):
        raise ValueError(
            f"Inconsistent list lengths ({len(nested_comments_list)=} and {len(filepath_list)=})"
        )

    result: List[Dict[str, str]] = []
    for comments_list, filepath in zip(nested_comments_list, filepath_list):
        for comment_dict in comments_list:
            result.append(
                {
                    **{key: str(value) for key, value in comment_dict.items()},
                    "path": filepath,
                }
            )

    return result


# %%[markdown]
## Create the flow that posts review comments to Bitbucket

# %%
import json

# IO Values
PR_POST_URL_IO = "$pr_post_url"
FLATTENED_COMMENT_LIST_IO = "$flattened_comment_list"
FINAL_HTTP_CODES_IO = "$http_codes"

# Define the steps

# Step 1: Flatten the generated comments into a list of comments
flatten_nested_comments_list_step = ToolExecutionStep(
    name="flatten_nested_comment_list",
    tool=flatten_information,
    input_mapping={
        "nested_comments_list": NESTED_COMMENT_LIST_IO,
        "filepath_list": FILEPATH_LIST_IO,
    },
    output_mapping={ToolExecutionStep.TOOL_OUTPUT: FLATTENED_COMMENT_LIST_IO},
)

# Step 2: Post the comments to Bitbucket
post_comment_step = ApiCallStep(
    url="https://example.com/rest/api/latest/projects/{{workspace}}/repos/{{repo_slug}}/pull-requests/{{pr_id}}/comments?diffType=EFFECTIVE&markup=true&avatarSize=48",
    method="POST",
    json_body=json.dumps(
        {
            "text": "{{content}}",
            "severity": "NORMAL",
            "anchor": {
                "diffType": "EFFECTIVE",
                "path": "{{path}}",
                "lineType": "ADDED",
                "line": "{{line | int}}",
                "fileType": "TO",
            },
        }
    ),
    headers={"Accept": "application/json", "Authorization": "Bearer {{token}}"},
    ignore_bad_http_requests=False,
    num_retry_on_bad_http_request=3,
    store_response=True,
    input_mapping={
        "token": USER_PROVIDED_TOKEN_IO,
        "workspace": REPO_WORKSPACE_IO,
        "repo_slug": REPO_SLUG_IO,
        "pr_id": PULL_REQUEST_ID_IO,
    },
)

post_comments_mapstep = MapStep(
    name="post_comment",
    flow=Flow.from_steps([post_comment_step]),
    unpack_input={"content": ".content", "line": ".line", "path": ".path"},
    input_mapping={MapStep.ITERATED_INPUT: FLATTENED_COMMENT_LIST_IO},
    output_descriptors=[ApiCallStep.HTTP_STATUS_CODE],
    output_mapping={ApiCallStep.HTTP_STATUS_CODE: FINAL_HTTP_CODES_IO},
)

post_comments_subflow = Flow(
    name="Post comments to PR flow",
    begin_step=flatten_nested_comments_list_step,
    control_flow_edges=[
        ControlFlowEdge(flatten_nested_comments_list_step, post_comments_mapstep),
        ControlFlowEdge(post_comments_mapstep, None),
    ],
    data_flow_edges=[
        DataFlowEdge(
            flatten_nested_comments_list_step,
            FLATTENED_COMMENT_LIST_IO,
            post_comments_mapstep,
            FLATTENED_COMMENT_LIST_IO,
        )
    ],
)

from wayflowcore.steps.step import StepResult


# Replace the real API call with a mock so the flow can be tested offline
async def _mock_api_post_step_invoke(self, inputs, conversation):
    output_values = {ApiCallStep.HTTP_RESPONSE: MOCK_DIFF, ApiCallStep.HTTP_STATUS_CODE: 200}
    return StepResult(
        outputs=output_values,
    )


post_comment_step.invoke_async = MethodType(_mock_api_post_step_invoke, post_comment_step)


# %%[markdown]
## Test flow that posts review comments

# %%
# we reuse the NESTED_COMMENT_LIST and FILEPATH_LIST from the previous test

test_conversation = post_comments_subflow.start_conversation(
    inputs={
        USER_PROVIDED_TOKEN_IO: "MY_TOKEN",
        REPO_WORKSPACE_IO: "MY_REPO_WORKSPACE",
        REPO_SLUG_IO: "MY_REPO_SLUG",
        PULL_REQUEST_ID_IO: "MY_REPO_ID",
        NESTED_COMMENT_LIST_IO: NESTED_COMMENT_LIST,
        FILEPATH_LIST_IO: FILEPATH_LIST,
    }
)
execution_status = test_conversation.execute()

if not isinstance(execution_status, FinishedStatus):
    raise ValueError("Unexpected status type")

FINAL_HTTP_CODES = execution_status.output_values[FINAL_HTTP_CODES_IO]
print(FINAL_HTTP_CODES)


# %%[markdown]
## Create flow that performs the review

# %%
from wayflowcore.steps import FlowExecutionStep


# Steps
retrieve_diff_flowstep = FlowExecutionStep(name="retrieve_diff_flowstep", flow=retrieve_diff_subflow)
generate_all_comments_flowstep = FlowExecutionStep(
    name="generate_comments_flowstep",
    flow=generate_all_comments_subflow,
)

pr_bot = Flow(
    name="PR bot flow",
    begin_step=retrieve_diff_flowstep,
    control_flow_edges=[
        ControlFlowEdge(retrieve_diff_flowstep, generate_all_comments_flowstep),
        ControlFlowEdge(generate_all_comments_flowstep, None),
    ],
    data_flow_edges=[
        DataFlowEdge(
            retrieve_diff_flowstep,
            FILE_DIFF_LIST_IO,
            generate_all_comments_flowstep,
            FILE_DIFF_LIST_IO,
        )
    ],
)


# %%[markdown]
## Test flow that performs the review

# %%
# Replace the path below with the path to the root of your sample Git repository.
PATH_TO_DIR = "path/to/repository_root"

conversation = pr_bot.start_conversation(inputs={REPO_DIRPATH_IO: PATH_TO_DIR})

execution_status = conversation.execute()

if not isinstance(execution_status, FinishedStatus):
    raise ValueError("Unexpected status type")

print(execution_status.output_values)

NESTED_COMMENT_LIST = execution_status.output_values[NESTED_COMMENT_LIST_IO]


# %%[markdown]
## Export config to Agent Spec

# %%
from wayflowcore.agentspec import AgentSpecExporter

serialized_assistant = AgentSpecExporter().to_json(pr_bot)


# %%[markdown]
## Load Agent Spec config

# %%
from wayflowcore.agentspec import AgentSpecLoader

tool_registry = {
    "local_get_pr_diff_tool": local_get_pr_diff_tool,
    "format_git_diff": format_git_diff,
}

assistant = AgentSpecLoader(tool_registry=tool_registry).load_json(serialized_assistant)
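
To make the data handling in the comment-posting sub-flow easier to follow, here is a minimal, library-free sketch of what happens to its inputs: each file's comments are paired with the file path, flattened into one list, and turned into one JSON body per comment. The helper names `flatten_comments` and `build_comment_body`, the sample data, and the assumption that each comment is a dict with `content` and `line` keys are illustrative only and are not part of the WayFlow API. The payload fields mirror the `json_body` template in the listing, with `int()` standing in for the `| int` template filter.

```python
import json
from typing import Any, Dict, List


def flatten_comments(
    nested_comments: List[List[Dict[str, Any]]], filepaths: List[str]
) -> List[Dict[str, Any]]:
    """Attach each file's path to its comments and return a single flat list."""
    flat: List[Dict[str, Any]] = []
    for path, comments in zip(filepaths, nested_comments):
        for comment in comments:
            flat.append(
                {
                    "content": comment["content"],
                    "line": int(comment["line"]),  # stands in for the `| int` filter
                    "path": path,
                }
            )
    return flat


def build_comment_body(comment: Dict[str, Any]) -> str:
    """Build the JSON request body for one flattened comment."""
    return json.dumps(
        {
            "text": comment["content"],
            "severity": "NORMAL",
            "anchor": {
                "diffType": "EFFECTIVE",
                "path": comment["path"],
                "lineType": "ADDED",
                "line": comment["line"],
                "fileType": "TO",
            },
        }
    )


# Hypothetical sample data: two files, the second one has no comments.
sample_nested = [
    [
        {"content": "TODO without a ticket", "line": 12},
        {"content": "Mutable default argument", "line": "30"},
    ],
    [],
]
sample_paths = ["src/app.py", "src/utils.py"]

flat_comments = flatten_comments(sample_nested, sample_paths)
bodies = [build_comment_body(comment) for comment in flat_comments]
print(len(flat_comments), "comment bodies built")
```

In the flow itself, the per-comment iteration is handled by the `MapStep`, which unpacks `content`, `line`, and `path` from each flattened item. The loop over `flat_comments` here plays the same role.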