# Build a Simple Code Review Assistant

> **Prerequisites**
> 
> This guide does not assume any prior knowledge about Project WayFlow. However, it assumes the reader has a basic knowledge of LLMs.
>
> You will need a working installation of WayFlow - see [Installation](https://TODO/development/docs/installation.html).

## Learning goals

In this use-case tutorial, you will build a more advanced WayFlow application, a **Pull Request (PR) Reviewing Assistant**, using a WayFlow Flow to automate basic reviews of Python source code.

In this tutorial you will:

1. Learn the basics of using Flows to build an assistant.
2. Learn how to compose multiple sub-flows to create a more complex Flow.
3. Learn more about building Tools that can be used within your Flows.

## Introduction to the task

Code reviews are crucial for maintaining code quality and reviewers often spend considerable time pointing out routine issues such as the presence of debug statements, formatting inconsistencies, or common coding convention violations that may not be fully captured by static code analysis tools. This consumes valuable time that could be spent on reviewing more important things such as the core logic, architecture, and business requirements.

Building an agent with WayFlow to perform such code reviews has a number of advantages:

1. Review rules can be written using natural language. This can make an agent much more flexible than a simple static checker.
2. More general issues can be captured. You can allow the LLM to generalize from a rule to related cases that a simple static checker could miss.
3. New review rules can be learned by looking at comment traces.

In this tutorial, you will create a WayFlow Flow assistant designed to scan Python pull requests for common oversights such as:

1. Having TODO comments without associated tickets.
2. Using unclear or ambiguous variable naming.
3. Using risky Python code practices such as mutable defaults.
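
To make the third item concrete, here is a minimal sketch of the mutable default pitfall. The function names `add_item_shared` and `add_item_safe` are hypothetical and are used only for this illustration.

```python
def add_item_shared(item, items=[]):
    # The default list is created once, when the function is defined,
    # so every call that omits `items` appends to the same list.
    items.append(item)
    return items


def add_item_safe(item, items=None):
    # Create a fresh list on each call when no list is supplied.
    if items is None:
        items = []
    items.append(item)
    return items


# Copy the results of the shared version to snapshot the list after each call.
first_shared = list(add_item_shared("a"))
second_shared = list(add_item_shared("b"))
first_safe = add_item_safe("a")
second_safe = add_item_safe("b")

print(first_shared, second_shared)
print(first_safe, second_safe)
```

The second call to the shared version still contains the item from the first call, which is the kind of surprise the assistant is meant to flag.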

To build this assistant you will break the task into configuration and two sub-flows that will be composed into a single flow:

![PR Review Bot Schematic Diagram.](https://TODO/development/docs/_images/prbot_main.svg)


1. Configure your application, choose an LLM, and import the required modules [Part 1].
2. The first sub-flow retrieves the diff information from a local codebase in a Git repository [Part 2].
3. The second sub-flow iterates over the file diffs using a [MapStep](https://TODO/development/docs/api/flows.html#wayflowcore.steps.mapstep.MapStep) and generates comments with an LLM using the [PromptExecutionStep](https://TODO/development/docs/api/flows.html#wayflowcore.steps.promptexecutionstep.PromptExecutionStep) [Part 3].

You will also learn how to extract information using the [RegexExtractionStep](https://TODO/development/docs/api/flows.html#wayflowcore.steps.textextractionstep.regexextractionstep.RegexExtractionStep) and the [ExtractValueFromJsonStep](https://TODO/development/docs/api/flows.html#wayflowcore.steps.textextractionstep.extractvaluefromjsonstep.ExtractValueFromJsonStep), and how to build and execute tools with the [ServerTool](https://TODO/development/docs/api/tools.html#wayflowcore.tools.servertools.ServerTool) and the [ToolExecutionStep](https://TODO/development/docs/api/flows.html#wayflowcore.steps.toolexecutionstep.ToolExecutionStep).

> **Note**
>
> This is not a production-ready code review assistant that can be used as-is.

## Setup

First, set up the environment. For this tutorial you need to have `wayflowcore` installed.

Next, download the example codebase Git repository, `agentix.zip` (`../_static/usecases/agentix.zip`). This will be used as the sample codebase that the assistant reviews.

Extract the codebase Git repository folder from the compressed archive and make a note of where it is extracted to.


## Part 1: Imports and LLM configuration

Make sure `wayflowcore` is installed; for additional information, read the [Installation guide](https://TODO/development/docs/installation.html). Then configure a Large Language Model (LLM) to use. WayFlow supports several LLM API providers; to learn more, read the guide on how to use LLMs from different providers.

Choose an LLM from one of the options below:

> **Note**
> API keys should never be stored in code. Use environment variables and/or tools such as [python-dotenv](https://pypi.org/project/python-dotenv) instead.
>

#### OCI GenAI (Cohere)

In [None]:
from wayflowcore.models import OCIGenAIModel

llm = OCIGenAIModel(
    model_id="cohere.model-id",
    service_endpoint="https://url-to-service-endpoint.com",
    compartment_id="compartment-id",
    auth_type="API_KEY",
)

#### Hosted Model

In [None]:
from wayflowcore.models import VllmModel

llm = VllmModel(
    model_id="model-id",
    host_port="VLLM_HOST_PORT",
)

#### Ollama

In [None]:
from wayflowcore.models import OllamaModel

llm = OllamaModel(
    model_id="model-id",
)

Be cautious when using external LLM providers and ensure that you comply with your organization's security policies and any applicable laws and regulations. Consider using a self-hosted LLM solution or a provider that offers on-premises deployment options if you need to maintain strict control over your code and data.
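
As noted earlier, API keys and endpoints should not be stored in code. The following is a minimal sketch of reading connection settings from environment variables with the standard library. The variable names `LLM_MODEL_ID` and `LLM_HOST_PORT` and the helper `read_llm_settings` are assumptions made for this example; they are not defined by WayFlow.

```python
import os


def read_llm_settings(env=None):
    """Collect LLM connection settings from environment variables."""
    env = os.environ if env is None else env
    # LLM_MODEL_ID and LLM_HOST_PORT are hypothetical names chosen for this sketch.
    required = {"model_id": "LLM_MODEL_ID", "host_port": "LLM_HOST_PORT"}
    missing = [name for name in required.values() if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {key: env[name] for key, name in required.items()}


# An explicit mapping stands in for the real environment in this example.
settings = read_llm_settings(
    {"LLM_MODEL_ID": "model-id", "LLM_HOST_PORT": "localhost:8000"}
)
print(settings)
```

The resulting values could then be passed to one of the model classes shown above, for example as the `model_id` and `host_port` arguments of `VllmModel`.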

## Part 2: Retrieve the PR diff information

The first phase of the assistant requires retrieving information about the code diffs from a code repository. You have already extracted the sample
codebase git repository to your local environment.

This will be a sub-flow that consists of two simple steps:

* [ToolExecutionStep](https://TODO/development/docs/api/flows.html#wayflowcore.steps.toolexecutionstep.ToolExecutionStep): collects the PR diff information using a Python subprocess to run the Git command.
* [RegexExtractionStep](https://TODO/development/docs/api/flows.html#wayflowcore.steps.textextractionstep.regexextractionstep.RegexExtractionStep): separates the raw diff information into diffs for each file.

First, take a look at what a diff looks like. The following mock example shows how a diff appears when using Git:

In [None]:
MOCK_DIFF = """
diff --git src://calculators/utils.py dst://calculators/utils.py
index 12345678..90123456 100644
--- src://calculators/utils.py
+++ dst://calculators/utils.py
@@ -10,6 +10,15 @@

 def calculate_total(data):
     # TODO: implement tax calculation
     return data

+def get_items(items=[]):
+    result = []
+    for item in items:
+        result.append(item * 2)
+    return result
+
+def process_numbers(numbers):
+    res = []
+    for x in numbers:
+        res.append(x + 1)
+    return res
+
 def calculate_average(numbers):
     return sum(numbers) / len(numbers)


diff --git src://example/utils.py dst://example/utils.py
index 000000000..123456789
--- /dev/null
+++ dst://example/utils.py
@@ -0,0 +1,26 @@
+# Copyright (C) 2024 Oracle and/or its affiliates.
+
+def calculate_sum(numbers=[]):
+    total = 0
+    for num in numbers:
+        total += num
+    return total
+
+
+def process_data(data):
+    # TODO: Handle exceptions here
+    result = data * 2
+    return result
+
+
+def main():
+    numbers = [1, 2, 3, 4, 5]
+    result = calculate_sum(numbers)
+    print("Sum:", result)
+    data = 10
+    processed_data = process_data(data)
+    print("Processed Data:", processed_data)
+
+
+if __name__ == "__main__":
+    main()
""".strip()

**Reading a diff**: Removals are identified by the "-" marks and additions by the "+" marks. In this example, there were only additions.

The diff above contains information about two files, `calculators/utils.py` and `example/utils.py`. It is a mock example that shows how a Git diff looks; it is shorter than, and different from, the diff that you will generate from the sample codebase.
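
To illustrate how the "+" and "-" markers can be read programmatically, here is a minimal sketch that counts additions and removals in a small diff. The helper `count_changes` and the sample diff are made up for this example and are not part of the tutorial's flow.

```python
def count_changes(diff_text):
    """Return (additions, removals) for a unified diff."""
    additions = removals = 0
    for line in diff_text.splitlines():
        # "+++" and "---" introduce the file names, not content changes.
        # Simplification: a changed line whose content itself starts with
        # "++" or "--" would also be skipped by this check.
        if line.startswith("+++") or line.startswith("---"):
            continue
        if line.startswith("+"):
            additions += 1
        elif line.startswith("-"):
            removals += 1
    return additions, removals


SAMPLE_DIFF = """\
diff --git a/example.py b/example.py
--- a/example.py
+++ b/example.py
@@ -1,3 +1,3 @@
 def greet(name):
-    print("Hello " + name)
+    print(f"Hello {name}")
     return name
"""

additions, removals = count_changes(SAMPLE_DIFF)
print(additions, removals)
```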

### Build a tool

To extract the diffs from the codebase Git repository, you need to create a tool using the [ServerTool](https://TODO/development/docs/api/tools.html#wayflowcore.tools.servertools.ServerTool).

The function `local_get_pr_diff_tool` in the code below extracts the diffs by running the `git diff HEAD` shell command in a subprocess and capturing the output.

To turn this function into a WayFlow tool, the `@tool` decorator is used to create a [ServerTool](https://TODO/development/docs/api/tools.html#wayflowcore.tools.servertools.ServerTool) from the function.

In [None]:
from wayflowcore.tools import tool

@tool(description_mode="only_docstring")
def local_get_pr_diff_tool(repo_dirpath: str) -> str:
    """
    Retrieves code diff with a git command given the
    path to the repository root folder.
    """
    import subprocess

    result = subprocess.run(
        ["git", "diff", "HEAD"],
        capture_output=True,
        cwd=repo_dirpath,
        text=True,
    )
    return result.stdout.strip()
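
The tool above returns whatever was written to standard output, whatever the exit code of the command. If the command fails, for example because the path is not a Git repository, the output is likely to be empty. The following sketch shows one possible way to surface such failures. The helper `run_and_capture` is hypothetical, and the Python interpreter stands in for `git` so that the example runs anywhere.

```python
import subprocess
import sys


def run_and_capture(args, cwd=None):
    """Run a command and return its stripped standard output, raising on failure."""
    result = subprocess.run(args, capture_output=True, cwd=cwd, text=True)
    if result.returncode != 0:
        raise RuntimeError(f"Command {args} failed: {result.stderr.strip()}")
    return result.stdout.strip()


# A successful command: the captured output is returned.
greeting = run_and_capture([sys.executable, "-c", "print('hello')"])

# A failing command: the non-zero exit code is turned into an exception.
try:
    run_and_capture([sys.executable, "-c", "import sys; sys.exit(3)"])
    failure_detected = False
except RuntimeError:
    failure_detected = True

print(greeting, failure_detected)
```

Raising an exception fits with a step created with `raise_exceptions=True`, because a failure then stops the flow instead of passing an empty diff to later steps.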

### Building the steps and the sub-flow

Let's write the code for the first sub-flow.

In [None]:
from wayflowcore.controlconnection import ControlFlowEdge
from wayflowcore.dataconnection import DataFlowEdge
from wayflowcore.flow import Flow
from wayflowcore.property import StringProperty
from wayflowcore.steps import RegexExtractionStep, StartStep, ToolExecutionStep

# Step Names
START_STEP = "start_step"
GET_PR_DIFF_STEP = "get_pr_diff"
EXTRACT_INTO_LIST_OF_FILE_DIFF_STEP = "extract_into_list_of_file_diff"

# IO Variable Names
REPO_DIRPATH_IO = "$repo_dirpath_io"
PR_DIFF_IO = "$raw_pr_diff"
FILE_DIFF_LIST_IO = "$file_diff_list"

# Define the steps

start_step = StartStep(input_descriptors=[StringProperty(name=REPO_DIRPATH_IO)])

# Step 1: Retrieve the pull request diff using the local tool
get_pr_diff_step = ToolExecutionStep(
    tool=local_get_pr_diff_tool,
    raise_exceptions=True,
    input_mapping={"repo_dirpath": REPO_DIRPATH_IO},
    output_mapping={ToolExecutionStep.TOOL_OUTPUT: PR_DIFF_IO},
)

# Step 2: Extract the file diffs from the raw diff using a regular expression
extract_into_list_of_file_diff_step = RegexExtractionStep(
    regex_pattern=r"(diff --git[\s\S]*?)(?=diff --git|$)",
    return_first_match_only=False,
    input_mapping={RegexExtractionStep.TEXT: PR_DIFF_IO},
    output_mapping={RegexExtractionStep.OUTPUT: FILE_DIFF_LIST_IO},
)

# Define the sub flow
retrieve_diff_subflow = Flow(
    begin_step=start_step,
    steps={
        START_STEP: start_step,
        GET_PR_DIFF_STEP: get_pr_diff_step,
        EXTRACT_INTO_LIST_OF_FILE_DIFF_STEP: extract_into_list_of_file_diff_step,
    },
    control_flow_edges=[
        ControlFlowEdge(source_step=start_step, destination_step=get_pr_diff_step),
        ControlFlowEdge(
            source_step=get_pr_diff_step, destination_step=extract_into_list_of_file_diff_step
        ),
        ControlFlowEdge(source_step=extract_into_list_of_file_diff_step, destination_step=None),
    ],
    data_flow_edges=[
        DataFlowEdge(
            source_step=start_step,
            source_output=REPO_DIRPATH_IO,
            destination_step=get_pr_diff_step,
            destination_input=REPO_DIRPATH_IO,
        ),
        DataFlowEdge(
            source_step=get_pr_diff_step,
            source_output=PR_DIFF_IO,
            destination_step=extract_into_list_of_file_diff_step,
            destination_input=PR_DIFF_IO,
        ),
    ],
)

**API Reference:** [Flow](https://TODO/development/docs/api/flows.html#wayflowcore.flow.Flow) | [RegexExtractionStep](https://TODO/development/docs/api/flows.html#wayflowcore.steps.textextractionstep.regexextractionstep.RegexExtractionStep) | [ToolExecutionStep](https://TODO/development/docs/api/flows.html#wayflowcore.steps.toolexecutionstep.ToolExecutionStep)

The code does the following:

1. It lists the names of the steps and input/output variables for the sub-flow.
2. It then creates the different steps within the sub-flow.
3. Finally, it instantiates the sub-flow. This will be covered in more detail later in the tutorial.

The input/output variables used in the Flow are given explicit names, for example `"$repo_dirpath_io"` for the variable holding the path to the local repository. The dollar ($) prefix is not required; it is used only to make the variable names stand out. These names are defined as string constants (for example `REPO_DIRPATH_IO`) to minimize the number of magic strings in the code. You will use `REPO_DIRPATH_IO` to pass in the location of the sample codebase Git repository.

> **See also**
> To learn about the basics of Flows, check out the introductory tutorial on WayFlow Flows (`basic_flow`).

Now take a look at each of the steps used in the sub-flow in more detail.

#### Get the PR diff, `get_pr_diff_step`

This uses a [ToolExecutionStep](https://TODO/development/docs/api/flows.html#wayflowcore.steps.toolexecutionstep.ToolExecutionStep) to gather the diff information with the tool defined earlier. When creating it, you need to provide the following:

* `tool`: Specifies the tool that will be called within the step. This is the tool that was created earlier, `local_get_pr_diff_tool`.
* `raise_exceptions`: Whether to raise exceptions generated by the tool that is called. Here it is set to `True`, so exceptions will be raised.
* `input_mapping`: Specifies the names used for the input parameters of the step. Here, the tool parameter `repo_dirpath` is mapped to the name held in `REPO_DIRPATH_IO`. See [ToolExecutionStep](https://TODO/development/docs/api/flows.html#wayflowcore.steps.toolexecutionstep.ToolExecutionStep) for more details on using an `input_mapping` with this type of step.
* `output_mapping`: Specifies the name used for the output parameter of the step. The default output name, `ToolExecutionStep.TOOL_OUTPUT`, is mapped to the name held in `PR_DIFF_IO`. Again, see [ToolExecutionStep](https://TODO/development/docs/api/flows.html#wayflowcore.steps.toolexecutionstep.ToolExecutionStep) for more details on using an `output_mapping` with this type of step.

#### Extract file diffs into a list, `extract_into_list_of_file_diff_step`

You now have the diff information from the PR. This step performs a regex extraction on the raw diff text to extract the code to review.

Use a [RegexExtractionStep](https://TODO/development/docs/api/flows.html#wayflowcore.steps.textextractionstep.regexextractionstep.RegexExtractionStep) to perform this action. When creating the step, you need to provide the following:

* `regex_pattern`: The regex pattern for the extraction. This uses `re.findall` underneath.
* `return_first_match_only`: You want to return all results, so set this to `False`.
* `input_mapping`: Specifies the names used for the input parameters of the step. The input parameter, `RegexExtractionStep.TEXT`, is mapped to the name held in `PR_DIFF_IO`. See [RegexExtractionStep](https://TODO/development/docs/api/flows.html#wayflowcore.steps.textextractionstep.regexextractionstep.RegexExtractionStep) for more details on using an `input_mapping` with this type of step.
* `output_mapping`: Specifies the name used for the output parameter of the step. Here, the default name `RegexExtractionStep.OUTPUT` is renamed to the name defined in `FILE_DIFF_LIST_IO`. Again, see [RegexExtractionStep](https://TODO/development/docs/api/flows.html#wayflowcore.steps.textextractionstep.regexextractionstep.RegexExtractionStep) for more details on using an `output_mapping` with this type of step.

**About the pattern:**

    (diff --git[\s\S]*?)(?=diff --git|$)

The pattern looks for text starting with `diff --git`, followed by any characters (both whitespace `\s` and non-whitespace `\S`), until it encounters either another `diff --git` or the end of the text (`$`). The lookahead `(?=...)` means that the next `diff --git` or the end of the text is not included in the match.

The `*?` makes the match "lazy" or non-greedy, meaning it takes the shortest possible match rather than the longest.
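
The behaviour of the pattern can be checked directly with `re.findall` from the standard library, which the tutorial states is used underneath. The two-file diff below is a made-up, shortened input for illustration only.

```python
import re

FILE_DIFF_PATTERN = r"(diff --git[\s\S]*?)(?=diff --git|$)"

# A shortened, made-up diff containing two files.
raw_diff = (
    "diff --git a/first.py b/first.py\n"
    "+print('first')\n"
    "diff --git a/second.py b/second.py\n"
    "+print('second')"
)

# With a single capture group, findall returns a list of the captured strings.
file_diffs = re.findall(FILE_DIFF_PATTERN, raw_diff)
for file_diff in file_diffs:
    print(repr(file_diff))
```

Each element of `file_diffs` holds the diff of one file, which is the shape the second sub-flow expects to iterate over.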

> **Tip**
> Recent Large Language Models are very helpful tools to create, debug and explain Regex patterns given a natural language
> description.

Finally, create the sub-flow using the [Flow](https://TODO/development/docs/api/flows.html#wayflowcore.flow.Flow) class. You specify the steps in the Flow, the starting step of the Flow, the transitions between steps, and how data is passed from one step to the next.

### Defining a Flow

Defining the Flow is the last step in the code shown above. There are a couple of things that are worth highlighting:

* `begin_step`: A start step needs to be defined for a [Flow](https://TODO/development/docs/api/flows.html#wayflowcore.flow.Flow).
* `steps`: A dictionary that maps step names to the steps that make up the [Flow](https://TODO/development/docs/api/flows.html#wayflowcore.flow.Flow).
* `control_flow_edges`: The transitions between the steps in the [Flow](https://TODO/development/docs/api/flows.html#wayflowcore.flow.Flow) are defined as [ControlFlowEdges](https://TODO/development/docs/api/flows.html#wayflowcore.controlconnection.ControlFlowEdge). They have a `source_step`, which defines the start of a transition, and a `destination_step`, which defines the destination of a transition. All transitions for the flow must be defined. In the code above, the last edge has `destination_step=None`, which marks the end of the Flow.
* `data_flow_edges`: Maps the variables between steps connected by a transition using [DataFlowEdges](https://TODO/development/docs/api/flows.html#wayflowcore.dataconnection.DataFlowEdge). It maps variables from a source step into variables in a destination step. You only need to do this for the variables that need to be passed between steps.
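
The difference between control-flow edges and data-flow edges can be pictured with a toy model in plain Python. This is only an illustration of the idea under simplifying assumptions, not how WayFlow implements Flows; every name in it (`get_text`, `split_text`, `run`) is hypothetical.

```python
def get_text(inputs):
    return {"text": "a,b,c"}


def split_text(inputs):
    return {"parts": inputs["text"].split(",")}


steps = {"get_text": get_text, "split_text": split_text}

# Control edges: which step runs next (None ends the run).
control_edges = {"get_text": "split_text", "split_text": None}

# Data edges: (source step, output name) -> (destination step, input name).
data_edges = {("get_text", "text"): ("split_text", "text")}


def run(begin_step):
    pending_inputs = {name: {} for name in steps}
    outputs = {}
    current = begin_step
    while current is not None:
        outputs = steps[current](pending_inputs[current])
        # Copy each output to the destination input named by a data edge.
        for (source, output_name), (destination, input_name) in data_edges.items():
            if source == current:
                pending_inputs[destination][input_name] = outputs[output_name]
        current = control_edges[current]
    return outputs


final_outputs = run("get_text")
print(final_outputs)
```

Control edges decide the order of execution, while data edges decide which values travel between steps; the two are declared separately, as in the Flow definition above.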

### Testing the flow

You can test this sub-flow by creating a conversation with the `start_conversation` method and specifying the inputs, in this case the location of the Git repository. The conversation can then be executed with its `execute` method. This returns an object that represents the status of the conversation, which you can check to confirm that the conversation has finished successfully. You then extract the outputs from this status object.

First, create a variable that points to the location of your sample codebase Git repository. Set `PATH_TO_DIR` to the actual path you extracted the repository to.

In [None]:
# Replace the path below with the path to your sample codebase Git repository.
PATH_TO_DIR = "path/to/repository_root"

The full code for testing the sub-flow is shown below:

In [None]:
from wayflowcore.executors.executionstatus import FinishedStatus

test_conversation = retrieve_diff_subflow.start_conversation(
    inputs={
        REPO_DIRPATH_IO: PATH_TO_DIR,
    }
)

execution_status = test_conversation.execute()

assert isinstance(execution_status, FinishedStatus)
FILE_DIFF_LIST = execution_status.output_values[FILE_DIFF_LIST_IO]

print(FILE_DIFF_LIST[0])

**API Reference:** [Flow](https://TODO/development/docs/api/flows.html#wayflowcore.flow.Flow)

## Part 3: Review the list of diffs

Now that you have a list of diffs, one for each file, you can review them and generate comments using an LLM.

This task can be broken into a sub-flow made up of five steps:

* [OutputMessageStep](https://TODO/development/docs/api/flows.html#wayflowcore.steps.outputmessagestep.OutputMessageStep): This converts the file diff list into a string to be processed by the following steps.
* [ToolExecutionStep](https://TODO/development/docs/api/flows.html#wayflowcore.steps.toolexecutionstep.ToolExecutionStep): This prefixes the diffs with line numbers for additional context to the LLM.
* [RegexExtractionStep](https://TODO/development/docs/api/flows.html#wayflowcore.steps.textextractionstep.regexextractionstep.RegexExtractionStep): This extracts the file path from the diff string.
* [PromptExecutionStep](https://TODO/development/docs/api/flows.html#wayflowcore.steps.promptexecutionstep.PromptExecutionStep): This generates comments using the LLM based on a list of user-defined checks.
* [ExtractValueFromJsonStep](https://TODO/development/docs/api/flows.html#wayflowcore.steps.textextractionstep.extractvaluefromjsonstep.ExtractValueFromJsonStep): This extracts the comments and lines they apply to from the LLM output.

![PR Review Bot Schematic Diagram.](https://TODO/development/docs/_images/prbot_generate_comment.svg)
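
The file path extraction in the third step relies on the `a/` and `b/` prefixes that Git writes in the header of each file diff by default. A minimal sketch of that extraction with the standard `re` module is shown below, using the pattern that appears later in this tutorial; the header line is a made-up example.

```python
import re

FILE_PATH_PATTERN = r"diff --git a/(.+?) b/"

# A made-up header line in the default Git format.
header = "diff --git a/calculators/utils.py b/calculators/utils.py"

match = re.search(FILE_PATH_PATTERN, header)
file_path = match.group(1) if match else None
print(file_path)
```

If the header does not follow this format, the pattern finds nothing, so the sketch falls back to `None` rather than failing.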

### Build the tools and checks

Before creating the steps and sub-flow to generate the comments, it is important to define the list of checks the assistant should perform,
along with any specific instructions. Additionally, a tool must be created to prefix the diffs with line numbers, allowing the LLM to determine
where to add comments.

Below is the full code to achieve this. It is broken into sections so that you can see, in detail, what is happening in each part.

In [None]:
PR_BOT_CHECKS = [
    """
Name: TODO_WITHOUT_TICKET
Description: TODO comments should reference a ticket number for tracking.
Example code:
```python
# TODO: Add validation here
def process_user_input(data):
    return data
```
Example comment:
[BOT] TODO_WITHOUT_TICKET: TODO comment should reference a ticket number for tracking (e.g., "TODO: Add validation here (TICKET-1234)").
""",
    """
Name: MUTABLE_DEFAULT_ARGUMENT
Description: Using mutable objects as default arguments can lead to unexpected behavior.
Example code:
```python
def add_item(item, items=[]):
    items.append(item)
    return items
```
Example comment:
[BOT] MUTABLE_DEFAULT_ARGUMENT: Avoid using mutable default arguments. Use None and initialize in the function: `def add_item(item, items=None): items = items or []`
""",
    """
Name: NON_DESCRIPTIVE_NAME
Description: Variable names should clearly indicate their purpose or content.
Example code:
```python
def process(lst):
    res = []
    for i in lst:
        res.append(i * 2)
    return res
```
Example comment:
[BOT] NON_DESCRIPTIVE_NAME: Use more descriptive names: 'lst' could be 'numbers', 'res' could be 'doubled_numbers', 'i' could be 'number'
""",
]

CONCATENATED_CHECKS = "\n\n---\n\n".join(PR_BOT_CHECKS)

PROMPT_TEMPLATE = """You are a very experienced code reviewer. You are given a git diff on a file: {{filename}}

## Context
The git diff contains all changes of a single file. All lines are prepended with their number. Lines without a line number were removed from the file.
After the line number, a line that was changed has a "+" before the code. All lines without a "+" are just here for context, you will not comment on them.

## Input
### Code diff
{{diff}}

## Task
Your task is to review these changes, according to different rules. Only comment lines that were added, so the lines that have a + just after the line number.
The rules are the following:

{{checks}}

### Response Format
You need to return a review as a json as follows:
```json
[
    {
        "content": "the comment as a text",
        "suggestion": "if the change you propose is a single line, then put here the single line rewritten that includes your proposal change. IMPORTANT: a single line, which will erase the current line. Put empty string if no suggestion or if the suggestion is more than a single line",
        "line": "line number where the comment applies"
    },
    …
]
```
Please use triple backticks ``` to delimit your JSON list of comments. Don't output more than 5 comments, only comment on the most relevant sections.
If there are no comments and the code seems fine, just output an empty JSON list."""


# Tools
@tool(description_mode="only_docstring")
def format_git_diff(diff_text: str) -> str:
    """
    Formats a git diff with line numbers everywhere.
    """

    def pad_number(number: int, width: int) -> str:
        """Right-align a number with specified width using space padding."""
        return str(number).rjust(width)

    LINE_NUMBER_WIDTH = 5
    PADDING_WIDTH = LINE_NUMBER_WIDTH + 1
    current_line_number = 0
    formatted_lines = []

    for line in diff_text.split("\n"):
        # Handle diff header lines (e.g., "@@ -1,7 +1,6 @@")
        if line.startswith("@@"):
            try:
                # Extract the starting line number in the new file
                _, position_info, _ = line.split("@@", 2)
                new_file_info = position_info.split()[1][1:]  # Remove the '+' prefix
                # The line count is optional in hunk headers (e.g. "+3"), so parse only the start line
                start_line = int(new_file_info.split(",")[0])

                current_line_number = start_line
                formatted_lines.append(line)
                continue

            except (ValueError, IndexError):
                raise ValueError(f"Invalid diff header format: {line}")

        # Handle content lines
        if current_line_number > 0 and line:
            if not line.startswith("-"):
                # Add line number for added/context lines
                line_prefix = pad_number(current_line_number, LINE_NUMBER_WIDTH)
                formatted_lines.append(f"{line_prefix} {line}")
                current_line_number += 1
            else:
                # Just add padding for removal lines
                formatted_lines.append(" " * PADDING_WIDTH + line)

    return "\n".join(formatted_lines)

**API Reference:** [ExtractValueFromJsonStep](https://TODO/development/docs/api/flows.html#wayflowcore.steps.textextractionstep.extractvaluefromjsonstep.ExtractValueFromJsonStep) | [MapStep](https://TODO/development/docs/api/flows.html#wayflowcore.steps.mapstep.MapStep) |
[OutputMessageStep](https://TODO/development/docs/api/flows.html#wayflowcore.steps.outputmessagestep.OutputMessageStep) | [PromptExecutionStep](https://TODO/development/docs/api/flows.html#wayflowcore.steps.promptexecutionstep.PromptExecutionStep) | [ToolExecutionStep](https://TODO/development/docs/api/flows.html#wayflowcore.steps.toolexecutionstep.ToolExecutionStep)

#### Checks and LLM instructions

You will use the three simple checks shown above. For each check you specify a name, a description of what the LLM should be checking, and an example of code together with the expected comment, so that the LLM gets a better understanding of what the task is about.

The prompt uses a simple structure:

1. **Role Definition**: Define who/what you want the LLM to act as (e.g., "You are a very experienced code reviewer").
2. **Context Section**: Provide relevant background information or specific circumstances that frame the task.
3. **Input Section**: Specify the exact information, data, or materials that the LLM will be provided with.
4. **Task Section**: Clearly state what you want the LLM to do with the input provided.
5. **Response Format Section**: Define how you want the response to be structured or formatted (e.g., bullet points, JSON, with XML tags, and so on).

The checks are defined in the list `PR_BOT_CHECKS`. The individual checks are then concatenated into a single string, `CONCATENATED_CHECKS`, so that they can be used inside the system prompt you will be passing to the LLM.

Define a system prompt, or prompt template, `PROMPT_TEMPLATE`. It contains placeholders for the file name, the diff, and the checks, which are replaced when specializing the prompt for each diff.
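
The idea of filling some placeholders first and the rest later can be sketched with a small helper. This is a simplified stand-in written for illustration; it is not the WayFlow helper used later in this tutorial, and the name `fill_placeholders` and the short template are assumptions.

```python
import re


def fill_placeholders(template, values):
    """Replace {{name}} placeholders that have a value; leave the others untouched."""

    def substitute(match):
        name = match.group(1)
        return str(values[name]) if name in values else match.group(0)

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


TEMPLATE = "File: {{filename}}\nDiff:\n{{diff}}\nRules:\n{{checks}}"

# First pass: the checks are the same for every file, so fill them once.
with_checks = fill_placeholders(TEMPLATE, {"checks": "RULE_A\n---\nRULE_B"})

# Second pass: fill the per-file values.
final_prompt = fill_placeholders(
    with_checks, {"filename": "utils.py", "diff": "+x = 1"}
)
print(final_prompt)
```

Filling the checks once up front means that only the file name and the diff need to be supplied for each file.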

> **Tip**
> **How to write high-quality prompts**
>
>   There is no consensus on what makes the best LLM prompt. However, for recent LLMs, a good strategy is simply to be
>   very specific about the task to be solved, giving enough context and explaining potential edge cases to consider.
>
>   Given a prompt, ask yourself whether an experienced colleague with no prior context about the task could reach the
>   intended result from that set of instructions alone.

#### Diff formatting tool

You next need to create a tool to format the diffs so that the LLM can consume them. A tool, as you have already seen, is a simple wrapper around a Python callable that makes it usable within a flow.

The function, `format_git_diff`, in the code above does the work of formatting the diffs.
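
To see what the numbering looks like, here is a simplified, standalone sketch of the same idea without the `@tool` wrapper. The function `number_diff_lines` and the sample hunk are made up for this illustration and omit the error handling of `format_git_diff`.

```python
def number_diff_lines(diff_text, width=5):
    """Prefix added and context lines with their line number in the new file."""
    numbered = []
    current = 0
    for line in diff_text.split("\n"):
        if line.startswith("@@"):
            # Hunk header such as "@@ -7,2 +7,2 @@": take the start line after "+".
            new_file_info = line.split("@@")[1].split()[1][1:]
            current = int(new_file_info.split(",")[0])
            numbered.append(line)
        elif current > 0 and line:
            if line.startswith("-"):
                # Removed lines do not exist in the new file: padding only.
                numbered.append(" " * (width + 1) + line)
            else:
                numbered.append(f"{str(current).rjust(width)} {line}")
                current += 1
    return "\n".join(numbered)


HUNK = "\n".join(
    [
        "@@ -7,2 +7,2 @@",
        " def total(values):",
        "-    return 0",
        "+    return sum(values)",
    ]
)

formatted = number_diff_lines(HUNK)
print(formatted)
```

The context line and the added line receive consecutive numbers starting from the hunk's start line, while the removed line is only padded so that the columns stay aligned.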

> **See also**
> 
>    For more information about WayFlow tools please read our guide, [How to use tools](https://TODO/development/docs/howtoguides/howto_build_assistants_with_tools.html).
>

#### Building the steps and the sub-flow

With the prompts and the diff formatting tool written, you can now build the second sub-flow. This sub-flow iterates over the diffs generated previously and uses an LLM to generate review comments from them.

In [None]:
from wayflowcore._utils._templating_helpers import render_template_partially
from wayflowcore.property import AnyProperty, DictProperty, ListProperty, StringProperty
from wayflowcore.steps import (
    ExtractValueFromJsonStep,
    MapStep,
    OutputMessageStep,
    PromptExecutionStep,
    ToolExecutionStep,
)

# Step Names
FORMAT_DIFF_TO_STRING_STEP = "format_diff_to_string"
EXTRACT_FILE_PATH_STEP = "extract_file_path"
EXTRACT_COMMENTS_FROM_JSON_STEP = "extract_comments_from_json"
GENERATE_COMMENTS_STEP = "generate_comments"
ADD_LINES_ON_DIFF_STEP = "add_lines_on_diff"

# IO Variable Names
DIFF_TO_STRING_IO = "$diff_to_string"
DIFF_WITH_LINES_IO = "$diff_with_lines"
FILEPATH_IO = "$filename"
JSON_COMMENTS_IO = "$json_comments"
EXTRACTED_COMMENTS_IO = "$extracted_comments"
NESTED_COMMENT_LIST_IO = "$nested_comment_list"
FILEPATH_LIST_IO = "$filepath_list"

# Define the steps

# Step 1: Format the diff to a string
format_diff_to_string_step = OutputMessageStep(
    message_template="{{ message | string }}",
    output_mapping={OutputMessageStep.OUTPUT: DIFF_TO_STRING_IO},
)

# Step 2: Add lines on the diff using a tool
add_lines_on_diff_step = ToolExecutionStep(
    tool=format_git_diff,
    input_mapping={"diff_text": DIFF_TO_STRING_IO},
    output_mapping={ToolExecutionStep.TOOL_OUTPUT: DIFF_WITH_LINES_IO},
)

# Step 3: Extract the file path from the diff string using a regular expression
extract_file_path_step = RegexExtractionStep(
    regex_pattern=r"diff --git a/(.+?) b/",
    return_first_match_only=True,
    input_mapping={RegexExtractionStep.TEXT: DIFF_TO_STRING_IO},
    output_mapping={RegexExtractionStep.OUTPUT: FILEPATH_IO},
)

# Step 4: Generate comments using a prompt
generate_comments_step = PromptExecutionStep(
    prompt_template=render_template_partially(PROMPT_TEMPLATE, {"checks": CONCATENATED_CHECKS}),
    llm=llm,
    input_mapping={"diff": DIFF_WITH_LINES_IO, "filename": FILEPATH_IO},
    output_mapping={PromptExecutionStep.OUTPUT: JSON_COMMENTS_IO},
)

# Step 5: Extract comments from the JSON output
# Define the value type for extracted comments
comments_valuetype = ListProperty(
    name="values",
    description="The extracted comments content and line number",
    item_type=DictProperty(value_type=AnyProperty()),
)
extract_comments_from_json_step = ExtractValueFromJsonStep(
    output_values={comments_valuetype: '[.[] | {"content": .["content"], "line": .["line"]}]'},
    retry=True,
    llm=llm,
    input_mapping={ExtractValueFromJsonStep.TEXT: JSON_COMMENTS_IO},
    output_mapping={"values": EXTRACTED_COMMENTS_IO},
)

# Define the sub flow to generate comments for each file diff
generate_comments_subflow = Flow(
    begin_step=format_diff_to_string_step,
    steps={
        FORMAT_DIFF_TO_STRING_STEP: format_diff_to_string_step,
        ADD_LINES_ON_DIFF_STEP: add_lines_on_diff_step,
        EXTRACT_FILE_PATH_STEP: extract_file_path_step,
        GENERATE_COMMENTS_STEP: generate_comments_step,
        EXTRACT_COMMENTS_FROM_JSON_STEP: extract_comments_from_json_step,
    },
    control_flow_edges=[
        ControlFlowEdge(format_diff_to_string_step, add_lines_on_diff_step),
        ControlFlowEdge(add_lines_on_diff_step, extract_file_path_step),
        ControlFlowEdge(extract_file_path_step, generate_comments_step),
        ControlFlowEdge(generate_comments_step, extract_comments_from_json_step),
        ControlFlowEdge(extract_comments_from_json_step, None),
    ],
    data_flow_edges=[
        DataFlowEdge(
            format_diff_to_string_step, DIFF_TO_STRING_IO, add_lines_on_diff_step, DIFF_TO_STRING_IO
        ),
        DataFlowEdge(
            format_diff_to_string_step, DIFF_TO_STRING_IO, extract_file_path_step, DIFF_TO_STRING_IO
        ),
        DataFlowEdge(
            add_lines_on_diff_step, DIFF_WITH_LINES_IO, generate_comments_step, DIFF_WITH_LINES_IO
        ),
        DataFlowEdge(extract_file_path_step, FILEPATH_IO, generate_comments_step, FILEPATH_IO),
        DataFlowEdge(
            generate_comments_step,
            JSON_COMMENTS_IO,
            extract_comments_from_json_step,
            JSON_COMMENTS_IO,
        ),
    ],
)

# Use the MapStep to apply the sub flow to each file
for_each_file_step = MapStep(
    flow=generate_comments_subflow,
    unpack_input={"message": "."},
    input_mapping={MapStep.ITERATED_INPUT: FILE_DIFF_LIST_IO},
    output_descriptors=[
        ListProperty(name=NESTED_COMMENT_LIST_IO, item_type=AnyProperty()),
        ListProperty(name=FILEPATH_LIST_IO, item_type=StringProperty()),
    ],
    output_mapping={EXTRACTED_COMMENTS_IO: NESTED_COMMENT_LIST_IO, FILEPATH_IO: FILEPATH_LIST_IO},
)

generate_all_comments_subflow = Flow.from_steps([for_each_file_step])

**API Reference:** [Property](https://TODO/development/docs/api/flows.html#wayflowcore.property.Property) | [ListProperty](https://TODO/development/docs/api/flows.html#wayflowcore.property.ListProperty) | [DictProperty](https://TODO/development/docs/api/flows.html#wayflowcore.property.DictProperty) | [StringProperty](https://TODO/development/docs/api/flows.html#wayflowcore.property.StringProperty) |
[ExtractValueFromJsonStep](https://TODO/development/docs/api/flows.html#wayflowcore.steps.textextractionstep.extractvaluefromjsonstep.ExtractValueFromJsonStep) | [MapStep](https://TODO/development/docs/api/flows.html#wayflowcore.steps.mapstep.MapStep) | [OutputMessageStep](https://TODO/development/docs/api/flows.html#wayflowcore.steps.outputmessagestep.OutputMessageStep) | [PromptExecutionStep](https://TODO/development/docs/api/flows.html#wayflowcore.steps.promptexecutionstep.PromptExecutionStep) | [ToolExecutionStep](https://TODO/development/docs/api/flows.html#wayflowcore.steps.toolexecutionstep.ToolExecutionStep)

Take a look at each of the steps used in the sub-flow to get an understanding of what is happening.

#### Format diff to string, `format_diff_to_string_step`

This step converts the file diff list into a string so that it can be used by the following steps.

This is done with the `string` Jinja filter as follows: `{{ message | string }}`. It uses an [OutputMessageStep](https://TODO/development/docs/api/flows.html#wayflowcore.steps.outputmessagestep.OutputMessageStep)
to achieve this.

#### Add lines to the diff, `add_lines_on_diff_step`

This step prefixes the diff with the line numbers needed to anchor the review comments. It uses a [ToolExecutionStep](https://TODO/development/docs/api/flows.html#wayflowcore.steps.toolexecutionstep.ToolExecutionStep) to run the
tool that you defined previously.

The input to the tool, within the I/O dictionary, is specified using the `input_mapping`. For all these steps, it is important to remember
that the outputs of one step are linked to the inputs of the next.

#### Extract file path, `extract_file_path_step`

This extracts the file path from the diff string. The file path is needed for assigning the review comments. The [RegexExtractionStep](https://TODO/development/docs/api/flows.html#wayflowcore.steps.textextractionstep.regexextractionstep.RegexExtractionStep) step
is used to extract the file path from the diff.

The regular expression is applied to the diff string, extracted from the input map using the `input_mapping` parameter.

Note: Compared to the [RegexExtractionStep](https://TODO/development/docs/api/flows.html#wayflowcore.steps.textextractionstep.regexextractionstep.RegexExtractionStep) used in Part 1, here only the first match is required.
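
To make the "first match only" behaviour concrete, here is a minimal sketch in plain Python using the standard `re` module rather than WayFlow. The pattern, the helper name `extract_first_file_path`, and the sample diff are assumptions made for illustration; the regular expression used by the tutorial's step may differ.

```python
import re
from typing import Optional

# Hypothetical pattern: captures the path after "b/" in a unified diff header.
FILE_PATH_PATTERN = re.compile(r"^diff --git a/\S+ b/(\S+)", re.MULTILINE)


def extract_first_file_path(diff_text: str) -> Optional[str]:
    """Return the first file path found in the diff, or None if there is none."""
    match = FILE_PATH_PATTERN.search(diff_text)  # search() stops at the first match
    return match.group(1) if match else None


# Illustrative diff for a single file (not taken from the sample repository).
sample_diff = (
    "diff --git a/src/app.py b/src/app.py\n"
    "index 1a2b3c4..5d6e7f8 100644\n"
    "--- a/src/app.py\n"
    "+++ b/src/app.py\n"
    "@@ -1,2 +1,3 @@\n"
    " import os\n"
    "+print('debug')\n"
)

first_path = extract_first_file_path(sample_diff)
print(first_path)
```

Because each diff passed to this sub-flow describes a single file, one match is enough to identify the file that the comments belong to.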

#### Generate comments, `generate_comments_step`

This generates comments using the LLM and the prompt template defined earlier. The [PromptExecutionStep](https://TODO/development/docs/api/flows.html#wayflowcore.steps.promptexecutionstep.PromptExecutionStep) step executes
the prompt with the LLM defined earlier in this tutorial.

Since the list of checks has already been defined, the template can be pre-rendered using the `render_template_partially` method. This renders the parts of the
template that have been provided, while the remaining information is gathered from the I/O dictionary.
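
The idea behind partial rendering can be sketched with the standard library alone. The example below uses `string.Template.safe_substitute`, which fills in the placeholders it is given and leaves the others untouched. It is an analogy only: WayFlow templates use Jinja-style `{{ ... }}` placeholders, and the template text and values here are invented for illustration.

```python
from string import Template

# Illustrative template with three placeholders (stdlib "$name" syntax).
template = Template(
    "Review the diff of $filename against these checks:\n"
    "$checks\n"
    "Diff:\n"
    "$diff"
)

# The checks are known up front, so they can be rendered immediately.
known_values = {"checks": "- No TODO without a ticket\n- No mutable default arguments"}
partially_rendered = template.safe_substitute(known_values)
print(partially_rendered)

# The remaining placeholders are filled later, once per file.
final_prompt = Template(partially_rendered).safe_substitute(
    filename="src/app.py",
    diff="+print('debug')",
)
print(final_prompt)
```

Rendering the static part once keeps the per-file work limited to the values that actually change, here the diff and the file name.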

#### Extract comments from JSON, `extract_comments_from_json_step`

This extracts the comments and line numbers from the generated LLM output, which is a serialized JSON structure due to the prompt used.
An [ExtractValueFromJsonStep](https://TODO/development/docs/api/flows.html#wayflowcore.steps.textextractionstep.extractvaluefromjsonstep.ExtractValueFromJsonStep) is used to do the extraction. When creating the step, specify the following in addition to the usual `input_mapping` and `output_mapping`:

* `output_values`: This defines the [JQ](https://jqlang.github.io/jq/) query to extract the comments from the JSON generated by the LLM.
* `llm`: An LLM that can be used to help resolve any parsing errors. This is related to `retry`.
* `retry`: If parsing fails, you may want to retry. This is set to `True`, which results in trying to use the LLM to help resolve any such issues.
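
For readers unfamiliar with JQ, the query used in `output_values` keeps only the `content` and `line` fields of every element in the JSON array. A rough plain-Python equivalent is sketched below; the helper name `extract_comments` and the sample LLM output are assumptions for illustration, and no LLM-assisted retry is shown.

```python
import json
from typing import Any, Dict, List


def extract_comments(raw_json: str) -> List[Dict[str, Any]]:
    """Keep only "content" and "line" for each item, like the JQ query does."""
    parsed = json.loads(raw_json)  # raises json.JSONDecodeError on malformed input
    # Missing keys become None, mirroring how JQ yields null for absent fields.
    return [{"content": item.get("content"), "line": item.get("line")} for item in parsed]


# Illustrative LLM output; the extra "severity" field is dropped by the extraction.
llm_output = (
    '[{"content": "Remove the debug print.", "line": 12, "severity": "low"},'
    ' {"content": "TODO has no ticket.", "line": 30}]'
)

comments = extract_comments(llm_output)
print(comments)
```

In this sketch malformed JSON simply raises an exception, whereas the step described above can, with `retry` enabled, ask the LLM to help repair the output.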

#### Create the sub-flow, `generate_comments_subflow`

Here you define what steps are in the sub-flow, what the transitions between the steps are and what will be the starting step. This is exactly
the same process you did previously when defining the sub-flow to fetch the PR data.

#### Applying the comment generation to all file diffs

Now that you have created the sub-flow, you need to apply it to every file diff. This is done using a [MapStep](https://TODO/development/docs/api/flows.html#wayflowcore.steps.mapstep.MapStep). `MapStep` takes a sub-flow as input, in this case the `generate_comments_subflow`,
and applies it to an iterable, here the list of file diffs.

You simply specify:

* `flow`: The sub-flow to map, that is applied to the iterable.
* `unpack_input`: Defines how to unpack each item of the input. A [JQ](https://jqlang.github.io/jq/) query can be used to transform the item, but in this case the identity query `.` passes each item through unchanged as `message`.
* `input_mapping`: Defines what the sub-flow will iterate over. The key, [MapStep.ITERATED_INPUT](https://TODO/development/docs/api/flows.html#wayflowcore.steps.mapstep.MapStep.ITERATED_INPUT), is used to pass in the diffs.
* `output_descriptors`: Specifies the values to collect from the output generated by applying the sub-flow. In this case, these will be the generated comments and the associated file path.

> **Note**
>
> The [MapStep](https://TODO/development/docs/api/flows.html#wayflowcore.steps.mapstep.MapStep) works similarly to Python's built-in `map` function. For more information, see
> https://docs.python.org/3/library/functions.html#map
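
The analogy with `map` can be sketched in plain Python. In the example below, `review_single_diff` is a hypothetical stand-in for the sub-flow, and the input strings and dictionary keys are invented for illustration. The point is the shape of the result: one entry per input item, collected into parallel lists.

```python
from typing import Any, Dict, List


def review_single_diff(file_diff: str) -> Dict[str, Any]:
    """Hypothetical stand-in for the sub-flow: produce outputs for one file diff."""
    header, _, body = file_diff.partition("\n")
    filepath = header.replace("file: ", "", 1)
    comments = [
        {"content": "Remove debug print.", "line": line_number}
        for line_number, line in enumerate(body.splitlines(), start=1)
        if "print(" in line
    ]
    return {"extracted_comments": comments, "filepath": filepath}


# Illustrative inputs in a simplified format, one string per file.
file_diffs = ["file: a.py\nx = 1\nprint(x)", "file: b.py\ny = 2"]

# Apply the same function to every item, as the MapStep does with the sub-flow.
results = list(map(review_single_diff, file_diffs))

# Collect each output across iterations into its own list.
nested_comment_list: List[List[Dict[str, Any]]] = [r["extracted_comments"] for r in results]
filepath_list: List[str] = [r["filepath"] for r in results]

print(nested_comment_list)
print(filepath_list)
```

The comment list is nested because each file contributes its own list of comments, which may be empty.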

Finally, create the sub-flow to generate all comments using the helper method `Flow.from_steps`.

#### Testing the sub-flow

You can test the sub-flow by creating a conversation, as shown in the code below, and specifying the inputs as done in `Part 2: Retrieve the PR diff information`.

Since each sub-flow is tested independently, you can reuse the output from the first sub-flow.

# we reuse the FILE_DIFF_LIST from the previous test
test_conversation = generate_all_comments_subflow.start_conversation(
    inputs={
        FILE_DIFF_LIST_IO: FILE_DIFF_LIST,
    }
)

execution_status = test_conversation.execute()

assert isinstance(execution_status, FinishedStatus)
NESTED_COMMENT_LIST = execution_status.output_values[NESTED_COMMENT_LIST_IO]
FILEPATH_LIST = execution_status.output_values[FILEPATH_LIST_IO]
print(NESTED_COMMENT_LIST[0])
print(FILEPATH_LIST)

## Building the final Flow

Congratulations! You have completed both sub-flows, which, when combined into a single flow, will retrieve the PR diff information and generate comments on the diffs using an LLM.

You will wire the sub-flows that you have built together by wrapping them in a [FlowExecutionStep](https://TODO/development/docs/api/flows.html#wayflowcore.steps.flowexecutionstep.FlowExecutionStep).
The [FlowExecutionSteps](https://TODO/development/docs/api/flows.html#wayflowcore.steps.flowexecutionstep.FlowExecutionStep) are then composed into the final combined Flow.

The code for this is shown below:

from wayflowcore.steps import FlowExecutionStep

# Step Names
RETRIEVE_DIFF_FLOWSTEP = "retrieve_diff_flowstep"
GENERATE_COMMENTS_FLOWSTEP = "generate_comments_flowstep"

# Steps
retrieve_diff_flowstep = FlowExecutionStep(flow=retrieve_diff_subflow)
generate_all_comments_flowstep = FlowExecutionStep(flow=generate_all_comments_subflow)

pr_bot = Flow(
    begin_step=retrieve_diff_flowstep,
    steps={
        RETRIEVE_DIFF_FLOWSTEP: retrieve_diff_flowstep,
        GENERATE_COMMENTS_FLOWSTEP: generate_all_comments_flowstep,
    },
    control_flow_edges=[
        ControlFlowEdge(retrieve_diff_flowstep, generate_all_comments_flowstep),
        ControlFlowEdge(generate_all_comments_flowstep, None),
    ],
    data_flow_edges=[
        DataFlowEdge(
            retrieve_diff_flowstep,
            FILE_DIFF_LIST_IO,
            generate_all_comments_flowstep,
            FILE_DIFF_LIST_IO,
        )
    ],
)

**API Reference:** [Flow](https://TODO/development/docs/api/flows.html#wayflowcore.flow.Flow) | [FlowExecutionStep](https://TODO/development/docs/api/flows.html#wayflowcore.steps.flowexecutionstep.FlowExecutionStep)

### Testing the combined assistant

You can now run the PR bot end-to-end on your repo or locally.

Set `PATH_TO_DIR` to the path where you extracted the sample codebase Git repository. The code also shows how the output of the conversation
is read from the `execution_status` object, using `execution_status.output_values`.

conversation = pr_bot.start_conversation(inputs={REPO_DIRPATH_IO: PATH_TO_DIR})

execution_status = conversation.execute()

assert isinstance(execution_status, FinishedStatus)
print(execution_status.output_values)

NESTED_COMMENT_LIST = execution_status.output_values[NESTED_COMMENT_LIST_IO]
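
Since the comments come back as one list per file, a common follow-up is to pair them with the matching file paths. The sketch below shows one way to flatten the two parallel lists in plain Python. The helper name `flatten_comments` and the sample values are assumptions for illustration and are not output from the flow.

```python
from typing import Any, Dict, List


def flatten_comments(
    filepaths: List[str], nested_comments: List[List[Dict[str, Any]]]
) -> List[Dict[str, Any]]:
    """Pair every comment with the path of the file it belongs to."""
    flat: List[Dict[str, Any]] = []
    for filepath, comments in zip(filepaths, nested_comments):
        for comment in comments:
            flat.append(
                {"path": filepath, "line": comment["line"], "content": comment["content"]}
            )
    return flat


# Illustrative values shaped like the file path list and nested comment list.
sample_filepaths = ["src/app.py", "src/utils.py"]
sample_nested_comments = [
    [{"content": "TODO has no ticket.", "line": 4}],
    [],
]

flat_comments = flatten_comments(sample_filepaths, sample_nested_comments)
for entry in flat_comments:
    print(f"{entry['path']}:{entry['line']}: {entry['content']}")
```

A flat list of this kind is a convenient shape for posting the comments to a code review system, one entry per file and line.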

## Conclusion

In this tutorial you built a simple PR bot using WayFlow Flows, and learned:

- How to use core steps such as the [OutputMessageStep](https://TODO/development/docs/api/flows.html#wayflowcore.steps.outputmessagestep.OutputMessageStep) and [PromptExecutionStep](https://TODO/development/docs/api/flows.html#wayflowcore.steps.promptexecutionstep.PromptExecutionStep).
- How to build and execute tools using the [ServerTool](https://TODO/development/docs/api/tools.html#wayflowcore.tools.servertools.ServerTool) and the [ToolExecutionStep](https://TODO/development/docs/api/flows.html#wayflowcore.steps.toolexecutionstep.ToolExecutionStep).
- How to extract information using the [RegexExtractionStep](https://TODO/development/docs/api/flows.html#wayflowcore.steps.textextractionstep.regexextractionstep.RegexExtractionStep) and the [ExtractValueFromJsonStep](https://TODO/development/docs/api/flows.html#wayflowcore.steps.textextractionstep.extractvaluefromjsonstep.ExtractValueFromJsonStep).
- How to apply a sub-flow over iterable data using the [MapStep](https://TODO/development/docs/api/flows.html#wayflowcore.steps.mapstep.MapStep).

Finally, you learned how to structure code when building an assistant as code, and how to execute and combine sub-flows to build a complex assistant.

This is an example of the kind of fully featured tool that you can build with WayFlow.

## Next Steps

Now that you have learned how to build a PR reviewing assistant, you may want to check our other guides, such as:

- [Build a Simple Assistant with Agents](https://TODO/development/docs/tutorials/basic_agent.html).
- [How to Catch Exceptions in Flows](https://TODO/development/docs/howtoguides/catching_exceptions.html).