Extending pipelines

Motivation

In the previous tutorial, we showed how Spannerlib can be used to build sophisticated LLM agents by combining LLMs with structured IE functions. The agent could be described elegantly with a small number of logical statements in Spannerlog, using very generic IE functions.

However, an elegant codebase is not worth much if it is hard to extend. In this tutorial, we will show how easy it is to extend Spannerlib code, demonstrating that the Spannerlib framework can be used to create modular code that is easily modifiable.

Our use-case will be to extend our previous code documentation agent with two of the most commonly used prompt augmentation techniques for building LLM agents, namely:

Retrieval Augmented Generation (RAG)
Few-shot Prompting

We will introduce both briefly here.

RAG - is a technique that uses a vector database to dynamically augment an LLM prompt with information that is likely to be relevant and helpful for answering the prompt well. RAG requires a vector database, in which documents are stored alongside their embedded vector representations. In RAG, before calling an LLM with a given question/task, we:

Embed the question as a vector.
Look for similar documents in our database, based on a vector similarity measure.
Add these documents to the prompt.
Call the LLM function.
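
The four steps above can be sketched in plain Python. This is only an illustration under simplifying assumptions, not the implementation used later in this tutorial: `toy_embed` is a hypothetical bag-of-words stand-in for a real embedding model, and `fake_llm` is a placeholder that does not call any model.

```python
import math
from collections import Counter

def toy_embed(text):
    """Hypothetical stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question, documents, k):
    """Steps 1-2: embed the question and return the k most similar documents."""
    q = toy_embed(question)
    ranked = sorted(documents, key=lambda d: cosine(q, toy_embed(d)), reverse=True)
    return ranked[:k]

def rag_prompt(question, documents, k=2):
    """Step 3: add the retrieved documents to the prompt."""
    context = "\n".join(retrieve(question, documents, k))
    return f"Based on the following context:\n{context}\nAnswer the following question:\n{question}"

def fake_llm(prompt):
    """Step 4 placeholder: a real system would call an LLM here."""
    return f"<answer to a prompt of {len(prompt)} characters>"

docs = [
    "use docstrings to document python functions",
    "bananas are rich in potassium",
    "document function parameters and return values",
]
prompt = rag_prompt("how to document a python function", docs, k=2)
answer = fake_llm(prompt)
print(prompt)
```

Only the two documentation-related documents end up in the prompt; the unrelated one is filtered out by the similarity ranking.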

In our use-case, we will demonstrate how to add RAG over Stack Overflow posts to give our code completion agent better context.

Few-shot Prompting - is a technique for tweaking a prompt to elicit answers that use the type of semantics/style/reasoning that we would like the LLM to use. To do so, for a given task, we augment the prompt with question-answer pairs from similar tasks. This helps condition the LLM towards answers that resemble the examples and are relevant to our question.

In our use-case, we will demonstrate how to add Few-shot Prompting using a database of question-answer pairs that were given a positive review by a given user, to implement a user feedback system. This system will enable us to improve our agent per user without requiring any fine-tuning of the model’s weights.
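
As a rough sketch of the idea, the snippet below builds a few-shot prompt from question-answer pairs that a user approved. The `approved` dictionary, the `few_shot_prompt` helper and the prompt wording are all hypothetical and only serve the illustration; the actual templates used in this tutorial are defined later.

```python
def few_shot_prompt(task, examples):
    """Prepend question/answer pairs to a task so the model imitates their style."""
    shots = "\n".join(f"user: {q}\nassistant: {a}" for q, a in examples)
    return f"system: answer similar to the following:\n{shots}\nuser: {task}"

# Hypothetical feedback store: user -> list of (question, answer) pairs they approved.
approved = {
    "bob": [("def add(a, b): return a + b", "Return the sum of a and b.")],
    "joe": [("def neg(a): return -a", ":param a: a number\n:returns: the negation of a")],
}

prompt = few_shot_prompt("def mul(a, b): return a * b", approved["bob"])
print(prompt)
```

Because the examples are looked up per user, bob's prompt only contains answers in the style bob approved, which is what lets the agent adapt to each user without touching the model's weights.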

Problem definition

Given:

A collection of python files.
A cursor position in a python file.
A vector database populated with high quality answers from Stack Overflow.
A database that contains question-answer pairs from previous tasks that got positive feedback from users.

Return:

A doc string of the python function that wraps the position of our cursor.

We will reuse all previously introduced IE functions and add a new one:

vector_search(query_document,k,namespace)->(similar_document,similarity_score) which, given a query document and a number \(k\), uses an external vector database to return the \(k\) most similar documents from a given namespace, each together with its similarity score.
- Note that vector DBs include namespaces for their data to enable categorising vectors and querying per category.
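
To make the role of namespaces concrete, here is a minimal in-memory sketch. `ToyVectorStore` is a hypothetical class written for this illustration and works on hand-written vectors; it is not the vector database built later in this tutorial.

```python
from collections import defaultdict

class ToyVectorStore:
    """Hypothetical in-memory store: one list of (vector, document) pairs per namespace."""

    def __init__(self):
        self.namespaces = defaultdict(list)

    def add(self, vector, document, namespace="default"):
        self.namespaces[namespace].append((vector, document))

    def search(self, query_vector, k, namespace="default"):
        """Return up to k (document, squared_distance) rows from one namespace only."""
        scored = [
            (doc, sum((q - v) ** 2 for q, v in zip(query_vector, vec)))
            for vec, doc in self.namespaces[namespace]
        ]
        # smaller distance means a closer match
        return sorted(scored, key=lambda row: row[1])[:k]

store = ToyVectorStore()
store.add([0.0, 1.0], "post about docstrings", namespace="stackoverflow")
store.add([1.0, 0.0], "post about git", namespace="stackoverflow")
store.add([0.0, 1.0], "internal wiki page", namespace="wiki")

rows = store.search([0.1, 0.9], k=1, namespace="stackoverflow")
print(rows)
```

The query never sees the `wiki` entry, even though its vector is just as close, because the search is restricted to the requested namespace.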

TLDR

%%spannerlog
# get documents from a vector db based on a prompt
RagContext(cursor,lex_concat(context))<-
    DocumentFunctionPrompt(cursor,prompt),
    vector_search(prompt,4,'stackoverflow')->(context,similarity_score).

# inject documents into a prompt as context
RagPrompt(cursor,prompt)<-
    RagContext(cursor,context),
    DocumentFunctionPrompt(cursor,document_prompt),
    format($rag_prompt,context,document_prompt)->(prompt).

# format all user approved completion into q,a pairs
FewShotExamples(user,lex_concat(qa_pair))<-
    PositiveFeedback(user,q,a),
    format($single_example_template,q,a)->(qa_pair).

# build a few shot prompt from approved completions
FewShotPrompt(user,prompt)<-
    FewShotExamples(user,examples),
    format($fewshot_template,examples)->(prompt).

# combine the few shot prompt with the rag prompt
# and call the llm to get the answer
FewShotRagDocument(user,cursor,answer)<-
    PerUserCompletion(user,cursor),
    FewShotPrompt(user,few_shot_prompt),
    RagPrompt(cursor,rag_prompt),
    format("{} {}",rag_prompt,few_shot_prompt)->(prompt),
    llm($model,prompt)->(answer).

Importing IE functions and Logic from previous tutorials

# importing dependencies
import re
import pandas as pd
from pandas import DataFrame
from pathlib import Path
from spannerlib.utils import load_env
from spannerlib import get_magic_session,Session,Span
import ast

# load openAI api key from env file
load_env()

IE functions and logic from previous implementations

from spannerlib.tutorials.basic import llm,format_ie,string_schema,_get_client
from spannerlib.tutorials.copilot import ast_xpath,ast_to_span,lex_concat

sess = get_magic_session()
sess.register('llm',llm,[str,str],[str])
sess.register('format', format_ie, string_schema,[str])
sess.register('ast_xpath',ast_xpath,[(str,Path,Span),str],[ast.AST])
sess.register('ast_to_span',ast_to_span,[(str,Span,Path),ast.AST],[Span])
sess.register_agg('lex_concat',lex_concat,[(str,Span)],[str])

code_file = Path('copilot_data/example_code.py')

example_files = pd.DataFrame([(Span(code_file),)])
cursors =pd.DataFrame([(Span(code_file,16,17),)])

sess.import_rel('Files',example_files)
sess.import_rel('Cursors',cursors)

func_document_template = """system: based on the following context:
{}
Explain the following function:
{}
In the format of a doc string.
"""
sess.import_var('func_document_template',func_document_template)

FuncDefSpan(span,name)<-
    Files(text),
    ast_xpath(text, "//FunctionDef")->(node),
    ast_to_span(text,node)->(span),
    expr_eval("{0}.name",node)->(name).

FuncCallSpan(span,name)<-
    Files(text),
    ast_xpath(text, "//Call/func/Name")->(node),
    ast_to_span(text,node)->(span),
    as_str(span)->(name).

CursorWrappingFunc(cursor,name)<-
    Cursors(cursor),
    FuncDefSpan(span,name),
    span_contained(cursor,span)->(True).

Mentions(lex_concat(caller_span),called_name)<-
    FuncCallSpan(called_span,called_name),
    FuncDefSpan(caller_span,caller_name),
    span_contained(called_span,caller_span)->(True).

model = 'gpt-3.5-turbo'
DocumentFunctionPrompt(cursor,prompt)<-
    CursorWrappingFunc(cursor,name),
    Mentions(mentions,name),
    FuncDefSpan(def_span,name),
    as_str(def_span)->(def_string),
    format($func_document_template,mentions,def_string)->(prompt).

DocumentFunction(cursor,answer)<-
    DocumentFunctionPrompt(cursor,prompt),
    llm($model,prompt)->(answer).

Adding RAG

building a vecdb IE function

If the implementation details are not of interest, feel free to move to the next section.

import faiss
import numpy as np
import openai
from collections import defaultdict
from openai import OpenAI

def get_openai_embeddings(texts):
    client = _get_client()
    response = client.embeddings.create(
        model="text-embedding-ada-002",  # or another embedding model
        input=texts
    )
    embeddings = [item.embedding for item in response.data]
    return np.array(embeddings)

class VecDB():
    def __init__(self):
        self.index_map={}# namespace: index
        self.doc_map=defaultdict(list)# namespace: list of docs
        self.dim = 1536
    def add_index(self,namespace):
        self.index_map[namespace] = faiss.IndexFlatL2(self.dim)

    def add_docs(self,documents,namespace='default'):
        if namespace not in self.index_map:
            self.add_index(namespace)
        documents = [str(doc) for doc in documents]
        embeddings = get_openai_embeddings(documents)
        self.index_map[namespace].add(embeddings.astype('float32'))
        self.doc_map[namespace].extend(documents)

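    def search(self, query, k=1,namespace='default'):
        # returns up to k (document, distance) pairs; IndexFlatL2 reports
        # squared L2 distances, so a lower score means a closer match
        query_embedding = get_openai_embeddings([str(query)])[0]
        index = self.index_map[namespace]
        documents = self.doc_map[namespace]
        D, I = index.search(np.array([query_embedding]).astype('float32'), k)
        # FAISS pads the result with index -1 when fewer than k documents exist
        return [(documents[i], float(D[0][j])) for j, i in enumerate(I[0]) if i >= 0]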

documents = [
    "FAISS is a library for efficient similarity search.",
    "Vector databases are crucial for RAG pipelines.",
    "FAISS was developed by Facebook AI Research.",
    "RAG combines retrieval and generation for better results."
]

db=VecDB()
db.add_docs(documents)
db.search("RAG?",4)

[('RAG combines retrieval and generation for better results.',
  0.22323353588581085),
 ('Vector databases are crucial for RAG pipelines.', 0.3760342001914978),
 ('FAISS was developed by Facebook AI Research.', 0.5168014168739319),
 ('FAISS is a library for efficient similarity search.', 0.5336617231369019)]
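
Note that the second element of each result is a distance rather than a similarity: `faiss.IndexFlatL2` reports squared Euclidean distances, so smaller values indicate closer matches. If the embeddings have unit length (an assumption here; check the documentation of your embedding model), the squared distance can be converted to a cosine similarity. The helper names below are hypothetical and the sketch uses plain Python lists instead of real embeddings.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def squared_l2(a, b):
    """Squared Euclidean distance, the quantity an L2 index reports."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cosine_from_squared_l2(d2):
    """For unit-length vectors, ||a - b||^2 = 2 - 2 * cos(a, b)."""
    return 1.0 - d2 / 2.0

a = normalize([1.0, 2.0, 3.0])
b = normalize([2.0, 1.0, 0.5])

d2 = squared_l2(a, b)
cos_direct = sum(x * y for x, y in zip(a, b))
print(cosine_from_squared_l2(d2), cos_direct)  # the two values agree up to rounding
```

Under that unit-length assumption, the best score of roughly 0.223 in the output above would correspond to a cosine similarity of about 0.89.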

sess.register('vector_search',db.search,[(str,Span),int,str],[str,float])

Adding stack overflow posts to vector DB

docs = Path('copilot_data/stackoverflow_posts.txt').read_text().split('DELIM')
docs = [doc.strip() for doc in docs]
docs

['1. **Use clear and concise language**\n   Always strive for clarity in your documentation. Use simple, straightforward language and provide examples:\n\n   ```python\n   def calculate_area(length, width):\n       """\n       Calculate the area of a rectangle.\n\n       :param length: The length of the rectangle\n       :param width: The width of the rectangle\n       :return: The area of the rectangle\n       """\n       return length * width\n   ```',
 '2. **Include code examples with comments**\n   Provide relevant code snippets with inline comments to explain each step:\n\n   ```javascript\n   // Function to calculate factorial\n   function factorial(n) {\n       if (n === 0 || n === 1) {\n           return 1; // Base case: 0! and 1! are 1\n       } else {\n           return n * factorial(n - 1); // Recursive case\n       }\n   }\n   ```',
 "3. **Structure your documentation with markdown**\n   Use markdown to structure your documentation for better readability:\n\n   ```markdown\n   # My Project\n\n   ## Installation\n   ```bash\n   npm install my-project\n   ```\n\n   ## Usage\n   ```javascript\n   const myProject = require('my-project');\n   myProject.doSomething();\n   ```\n   ```",
 '4. **Write for your audience with examples**\n   Adjust your language and examples based on your audience:\n\n   ```python\n   # For beginners\n   name = input("What\'s your name? ")\n   print(f"Hello, {name}!")\n\n   # For advanced users\n   def greet(name: str) -> str:\n       return f"Hello, {name}!"\n   ```',
 '5. **Keep it up-to-date with version information**\n   Include version information and update logs:\n\n   ```python\n   """\n   MyModule - A helpful utility\n\n   Version: 1.2.3\n   Last Updated: 2024-07-30\n\n   Changelog:\n   - 1.2.3: Fixed bug in process_data function\n   - 1.2.2: Added new feature X\n   """\n\n   def process_data(data):\n       # Implementation here\n       pass\n   ```',
 '6. **Use diagrams and visuals with code**\n   Include ASCII diagrams or links to visual aids in your code comments:\n\n   ```python\n   def binary_search(arr, target):\n       """\n       Performs binary search.\n\n       ASCII Diagram:\n       [1, 3, 5, 7, 9]\n        ^     ^     ^\n       low   mid   high\n\n       :param arr: Sorted array\n       :param target: Target value\n       :return: Index of target or -1 if not found\n       """\n       # Implementation here\n       pass\n   ```',
 '7. **Provide a table of contents with code sections**\n   For longer documents, include a table of contents with links to code sections:\n\n   ```markdown\n   # Table of Contents\n   1. [Installation](#installation)\n   2. [Usage](#usage)\n   3. [API Reference](#api-reference)\n\n   ## Installation\n   ```bash\n   pip install mypackage\n   ```\n\n   ## Usage\n   ```python\n   import mypackage\n   mypackage.function()\n   ```\n\n   ## API Reference\n   ```python\n   def function(param1, param2):\n       """Detailed function description"""\n       pass\n   ```\n   ```',
 '8. **Use consistent formatting**\n   Maintain consistent formatting throughout your documentation:\n\n   ```python\n   def function_one(param1: int, param2: str) -> bool:\n       """Does something."""\n       pass\n\n   def function_two(param1: float, param2: list) -> dict:\n       """Does something else."""\n       pass\n   ```',
 '9. **Include a "Getting Started" section with code**\n   Provide a quick start guide with simple code examples:\n\n   ```python\n   # Getting Started with MyLibrary\n\n   # 1. Import the library\n   import mylibrary\n\n   # 2. Create an instance\n   my_instance = mylibrary.MyClass()\n\n   # 3. Use a basic function\n   result = my_instance.do_something()\n\n   # 4. Print the result\n   print(result)\n   ```',
 '10. **Document error messages and troubleshooting steps**\n    Include common error messages and their solutions:\n\n    ```python\n    try:\n        result = 10 / 0\n    except ZeroDivisionError as e:\n        print(f"Error: {e}")\n        print("Solution: Ensure the divisor is not zero.")\n    ```',
 '11. **Use version control for documentation**\n    Show how to include documentation in version control:\n\n    ```bash\n    # Add documentation to git\n    git add docs/\n\n    # Commit changes\n    git commit -m "Updated API documentation for v2.0"\n\n    # Push to remote repository\n    git push origin main\n    ```',
 '12. **Provide examples of input and output**\n    When documenting functions or APIs, include examples of expected inputs and outputs:\n\n    ```python\n    def square(n):\n        """\n        Return the square of a number.\n\n        :param n: The number to square\n        :return: The square of the input number\n\n        Example:\n        >>> square(4)\n        16\n        >>> square(-3)\n        9\n        """\n        return n ** 2\n    ```',
 '13. **Use docstrings for inline documentation**\n    Use docstrings to provide inline documentation:\n\n    ```python\n    class MyClass:\n        """\n        A class that represents MyClass.\n\n        Attributes:\n            attr1 (int): Description of attr1\n            attr2 (str): Description of attr2\n        """\n\n        def __init__(self, attr1, attr2):\n            self.attr1 = attr1\n            self.attr2 = attr2\n\n        def my_method(self, param1):\n            """\n            Description of my_method.\n\n            :param param1: Description of param1\n            :return: Description of return value\n            """\n            pass\n    ```',
 '14. **Include a changelog in your code**\n    Maintain a changelog to track major changes:\n\n    ```python\n    """\n    Changelog:\n\n    v1.1.0 (2024-07-30):\n    - Added new feature X\n    - Fixed bug in function Y\n\n    v1.0.1 (2024-07-15):\n    - Updated documentation\n    - Performance improvements\n\n    v1.0.0 (2024-07-01):\n    - Initial release\n    """\n\n    # Your code here\n    ```',
 "15. **Provide context and explanations in comments**\n    Don't just describe what something does, explain why it's important:\n\n    ```python\n    # We use a cache to store expensive computation results\n    # This significantly improves performance for repeated calls\n    cache = {}\n\n    def expensive_function(n):\n        if n in cache:\n            return cache[n]\n        result = # ... some expensive computation\n        cache[n] = result\n        return result\n    ```",
 '16. **Use links effectively in documentation**\n    Link to related sections or external resources:\n\n    ```python\n    """\n    For more information on this module, see:\n    - [API Documentation](https://example.com/api-docs)\n    - [Usage Examples](https://example.com/examples)\n    - Related function: `other_function()`\n    """\n\n    def my_function():\n        pass\n\n    def other_function():\n        pass\n    ```',
 "17. **Include a search function (for online docs)**\n    For online documentation, implement a search feature. Here's a simple JavaScript example:\n\n    ```javascript\n    function searchDocs() {\n        var input = document.getElementById('searchInput').value.toLowerCase();\n        var elements = document.getElementsByClassName('searchable');\n        \n        for (var i = 0; i < elements.length; i++) {\n            var content = elements[i].textContent.toLowerCase();\n            if (content.includes(input)) {\n                elements[i].style.display = 'block';\n            } else {\n                elements[i].style.display = 'none';\n            }\n        }\n    }\n    ```",
 '18. **Write clear method and function signatures**\n    Clearly document the parameters, return values, and any exceptions:\n\n    ```python\n    def process_data(data: List[Dict[str, Any]],\n                     options: Optional[Dict[str, Any]] = None) -> Tuple[List[Any], int]:\n        """\n        Process the input data according to specified options.\n\n        :param data: A list of dictionaries containing the input data\n        :param options: Optional dictionary of processing options\n        :return: A tuple containing the processed data and a status code\n        :raises ValueError: If the input data is empty or invalid\n        """\n        if not data:\n            raise ValueError("Input data cannot be empty")\n        \n        # Processing logic here\n        \n        return processed_data, status_code\n    ```',
 '19. **Use meaningful variable and function names**\n    Choose descriptive names that convey the purpose or functionality:\n\n    ```python\n    def calculate_total_price(item_prices: List[float], tax_rate: float) -> float:\n        """\n        Calculate the total price including tax.\n\n        :param item_prices: List of individual item prices\n        :param tax_rate: The tax rate as a decimal (e.g., 0.08 for 8%)\n        :return: The total price including tax\n        """\n        subtotal = sum(item_prices)\n        tax_amount = subtotal * tax_rate\n        total_price = subtotal + tax_amount\n        return total_price\n    ```',
 '20. **Include a license and contribution guidelines**\n    For open-source projects, clearly state the license and provide contribution guidelines:\n\n    ```python\n    """\n    MyProject - A helpful Python utility\n\n    Copyright (c) 2024 Your Name\n\n    Licensed under the MIT License.\n    See LICENSE file for details.\n\n    Contribution Guidelines:\n    1. Fork the repository\n    2. Create a new branch for your feature\n    3. Write tests for your changes\n    4. Ensure all tests pass\n    5. Submit a pull request\n\n    For more details, see CONTRIBUTING.md\n    """\n\n    # Your code here\n    ```']

db.add_docs(docs,namespace='stackoverflow')

VecDBQueryExample(relevant_docs,similarity_score)<-
    vector_search('python',4,'stackoverflow')->(relevant_docs,similarity_score).

?VecDBQueryExample(relevant_docs,similarity_score)

'?VecDBQueryExample(relevant_docs,similarity_score)'

relevant_docs	similarity_score
16. Use links effectively in documentation Link to related sections or external resources: ```python """ For more information on this module, see: - [API Documentation](https://example.com/api-docs) - [Usage Examples](https://example.com/examples) - Related function: `other_function()` """ def my_function(): pass def other_function(): pass ```	0.463158
20. Include a license and contribution guidelines For open-source projects, clearly state the license and provide contribution guidelines: ```python """ MyProject - A helpful Python utility Copyright (c) 2024 Your Name Licensed under the MIT License. See LICENSE file for details. Contribution Guidelines: 1. Fork the repository 2. Create a new branch for your feature 3. Write tests for your changes 4. Ensure all tests pass 5. Submit a pull request For more details, see CONTRIBUTING.md """ # Your code here ```	0.474433
4. Write for your audience with examples Adjust your language and examples based on your audience: ```python # For beginners name = input("What's your name? ") print(f"Hello, {name}!") # For advanced users def greet(name: str) -> str: return f"Hello, {name}!" ```	0.479981
9. Include a "Getting Started" section with code Provide a quick start guide with simple code examples: ```python # Getting Started with MyLibrary # 1. Import the library import mylibrary # 2. Create an instance my_instance = mylibrary.MyClass() # 3. Use a basic function result = my_instance.do_something() # 4. Print the result print(result) ```	0.455388

Extending our pipeline

# let us recall the prompt we get from the original implementation

?DocumentFunctionPrompt(C,P)

'?DocumentFunctionPrompt(C,P)'

C	P
[@example_code.py,16,17) "x"	system: based on the following context: def g(x,y): return f(x,y)**2 def method(self, y): return f(self.x, y) Explain the following function: def f(x,y): x+y In the format of a doc string.

We use our previous prompt to query our Stack Overflow database for relevant posts. We concatenate the relevant documents into one string so we can format them into a new prompt.

RagContext(cursor,lex_concat(context))<-
    DocumentFunctionPrompt(cursor,prompt),
    vector_search(prompt,4,'stackoverflow')->(context,similarity_score).
?RagContext(cursor,context)

'?RagContext(cursor,context)'

cursor	context
[@example_code.py,16,17) "x"	1. Use clear and concise language Always strive for clarity in your documentation. Use simple, straightforward language and provide examples: ```python def calculate_area(length, width): """ Calculate the area of a rectangle. :param length: The length of the rectangle :param width: The width of the rectangle :return: The area of the rectangle """ return length * width ``` 12. Provide examples of input and output When documenting functions or APIs, include examples of expected inputs and outputs: ```python def square(n): """ Return the square of a number. :param n: The number to square :return: The square of the input number Example: >>> square(4) 16 >>> square(-3) 9 """ return n ** 2 ``` 13. Use docstrings for inline documentation Use docstrings to provide inline documentation: ```python class MyClass: """ A class that represents MyClass. Attributes: attr1 (int): Description of attr1 attr2 (str): Description of attr2 """ def __init__(self, attr1, attr2): self.attr1 = attr1 self.attr2 = attr2 def my_method(self, param1): """ Description of my_method. :param param1: Description of param1 :return: Description of return value """ pass ``` 15. Provide context and explanations in comments Don't just describe what something does, explain why it's important: ```python # We use a cache to store expensive computation results # This significantly improves performance for repeated calls cache = {} def expensive_function(n): if n in cache: return cache[n] result = # ... some expensive computation cache[n] = result return result ```
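
The `RagContext` rule relies on the `lex_concat` aggregation imported from the previous tutorial, whose implementation is not shown here. Judging from the output, it collects all values that share the same non-aggregated columns, orders them lexicographically and joins them. The sketch below mimics that behaviour in plain Python; the separator and the exact ordering are assumptions, and `lex_concat_sketch` and `aggregate` are hypothetical names.

```python
from collections import defaultdict

def lex_concat_sketch(values, sep="\n"):
    """Assumed behaviour: sort the strings lexicographically, then join them."""
    return sep.join(sorted(str(v) for v in values))

def aggregate(rows):
    """Group (key, value) rows by key and aggregate each group's values."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return {key: lex_concat_sketch(vals) for key, vals in groups.items()}

rows = [
    ("cursor_1", "12. Provide examples"),
    ("cursor_1", "1. Use clear language"),
    ("cursor_2", "13. Use docstrings"),
]
result = aggregate(rows)
print(result["cursor_1"])
```

Each cursor ends up with a single concatenated context string, which is what lets the next rule pass it to `format` as one argument.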

Now we will build a RAG template.

rag_prompt = """system: Based on the following context
{}
answer the following question
{}
"""

sess.import_var('rag_prompt',rag_prompt)

We then compose our old prompt with the context we retrieved, and finally send the result to our LLM.

RagPrompt(cursor,prompt)<-
    RagContext(cursor,context),
    DocumentFunctionPrompt(cursor,document_prompt),
    format($rag_prompt,context,document_prompt)->(prompt).

RagCompletion(cursor,answer)<-
    RagPrompt(cursor,prompt),
    llm($model,prompt)->(answer).

?RagCompletion(cursor,answer)

'?RagCompletion(cursor,answer)'

cursor	answer
[@example_code.py,16,17) "x"	python def f(x, y): """ Calculate the sum of two numbers. :param x: The first number to be added :param y: The second number to be added :return: The sum of x and y Example: >>> f(2, 3) 5 >>> f(-2, 5) 3 """ return x + y

Adding user feedback

Getting positive user feedback data.

Here is some example feedback data, showing one completion that was liked by bob and one that was liked by joe.

positive_feedback =pd.DataFrame([
    ['bob',"""
def calculate_area(length, width):
    return length * width
""",
"""Calculate the area of a rectangle.

Args:
    length (float): The length of the rectangle.
    width (float): The width of the rectangle.

Returns:
    float: The area of the rectangle.

Example:
    >>> calculate_area(5, 3)
    15.0
"""],
['joe',"""
def factorial(n):
    if n < 0:
        raise ValueError("Factorial is not defined for negative numbers")
    if n == 0 or n == 1:
        return 1
    return n * factorial(n - 1)

""",
"""
    Calculate the factorial of a non-negative integer.

    This function computes the factorial of a given non-negative integer using
    a recursive approach.

    :param n: The non-negative integer to calculate the factorial for.
    :type n: int
    :returns: The factorial of the input number.
    :rtype: int
    :raises ValueError: If the input is negative.

    :Example:

    >>> factorial(5)
    120
    >>> factorial(0)
    1

    .. note::
       The factorial of 0 is defined to be 1.

    .. warning::
       This function may cause a stack overflow for very large inputs due to its recursive nature.
    """]
],columns=['user','q','a'])
positive_feedback

	user	q	a
0	bob	def calculate_area(length, width): return...	Calculate the area of a rectangle. Args: ...
1	joe	def factorial(n): if n < 0: raise...	Calculate the factorial of a non-negative...

sess.import_rel('PositiveFeedback',positive_feedback)

Building a few shot prompt and pipeline

fewshot_prompt_template = """system: answer similar to the following:
{}"""

few_shot_single_example_prompt_template = """
user: {}
assistant: {}
"""

sess.import_var('fewshot_template',fewshot_prompt_template)
sess.import_var('single_example_template',few_shot_single_example_prompt_template)

# remove any previous definition of this rule so it can be redefined below
sess.remove_head('FewShotRagDocumentPrompt')

Now we simply get the feedback of the relevant user and compose it into a prompt.

FewShotExamples(user,lex_concat(qa_pair))<-
    PositiveFeedback(user,q,a),
    format($single_example_template,q,a)->(qa_pair).

?FewShotExamples('bob',Q)

FewShotPrompt(user,prompt)<-
    FewShotExamples(user,examples),
    format($fewshot_template,examples)->(prompt).

?FewShotPrompt('bob',prompt)

"?FewShotExamples('bob',Q)"

Q
user: def calculate_area(length, width): return length * width assistant: Calculate the area of a rectangle. Args: length (float): The length of the rectangle. width (float): The width of the rectangle. Returns: float: The area of the rectangle. Example: >>> calculate_area(5, 3) 15.0

"?FewShotPrompt('bob',prompt)"

prompt
system: answer similar to the following: user: def calculate_area(length, width): return length * width assistant: Calculate the area of a rectangle. Args: length (float): The length of the rectangle. width (float): The width of the rectangle. Returns: float: The area of the rectangle. Example: >>> calculate_area(5, 3) 15.0

Finally, we compose our RAG and few-shot prompts into one single prompt.

FewShotRagDocumentPrompt(user,cursor,prompt)<-
    FewShotPrompt(user,few_shot_prompt),
    RagPrompt(cursor,rag_prompt),
    format("{} {}",rag_prompt,few_shot_prompt)->(prompt).

display(sess.export('?FewShotRagDocumentPrompt("bob",C,P)')['P'][0])

'system: Based on the following context\n1. **Use clear and concise language**\n   Always strive for clarity in your documentation. Use simple, straightforward language and provide examples:\n\n   ```python\n   def calculate_area(length, width):\n       """\n       Calculate the area of a rectangle.\n\n       :param length: The length of the rectangle\n       :param width: The width of the rectangle\n       :return: The area of the rectangle\n       """\n       return length * width\n   ```\n12. **Provide examples of input and output**\n    When documenting functions or APIs, include examples of expected inputs and outputs:\n\n    ```python\n    def square(n):\n        """\n        Return the square of a number.\n\n        :param n: The number to square\n        :return: The square of the input number\n\n        Example:\n        >>> square(4)\n        16\n        >>> square(-3)\n        9\n        """\n        return n ** 2\n    ```\n13. **Use docstrings for inline documentation**\n    Use docstrings to provide inline documentation:\n\n    ```python\n    class MyClass:\n        """\n        A class that represents MyClass.\n\n        Attributes:\n            attr1 (int): Description of attr1\n            attr2 (str): Description of attr2\n        """\n\n        def __init__(self, attr1, attr2):\n            self.attr1 = attr1\n            self.attr2 = attr2\n\n        def my_method(self, param1):\n            """\n            Description of my_method.\n\n            :param param1: Description of param1\n            :return: Description of return value\n            """\n            pass\n    ```\n15. **Provide context and explanations in comments**\n    Don\'t just describe what something does, explain why it\'s important:\n\n    ```python\n    # We use a cache to store expensive computation results\n    # This significantly improves performance for repeated calls\n    cache = {}\n\n    def expensive_function(n):\n        if n in cache:\n            return cache[n]\n        result = # ... some expensive computation\n        cache[n] = result\n        return result\n    ```\nanswer the following question\nsystem: based on the following context:\ndef g(x,y):\n    return f(x,y)**2\ndef method(self, y):\n        return f(self.x, y)\nExplain the following function:\ndef f(x,y):\n    x+y\nIn the format of a doc string.\n\n system: answer similar to the following:\n\nuser: \ndef calculate_area(length, width):\n    return length * width\n\nassistant: Calculate the area of a rectangle.\n\nArgs:\n    length (float): The length of the rectangle.\n    width (float): The width of the rectangle.\n\nReturns:\n    float: The area of the rectangle.\n\nExample:\n    >>> calculate_area(5, 3)\n    15.0\n\n'

Now let's assume we have completion requests reaching our agent from multiple users.

per_user_completion = pd.DataFrame(
    [['bob',Span(code_file,16,17)]]
)
sess.import_rel('PerUserCompletion',per_user_completion)

We generate the prompt based on the user and their cursor position and call the LLM.

FewShotCompletion(user,cursor,answer)<-
    PerUserCompletion(user,cursor),
    FewShotRagDocumentPrompt(user,cursor,prompt),
    llm($model,prompt)->(answer).

print(sess.export("?FewShotCompletion('bob',C,A)")['A'][0])

Calculate the sum of two numbers.

Args:
    x (float): The first number.
    y (float): The second number.

Returns:
    float: The sum of x and y.

Example:
    >>> f(3, 5)
    8
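
One way to read the `FewShotCompletion` rule is as a relational join: every request in `PerUserCompletion` is matched with the prompt that shares its user and cursor, and the LLM is called once per match. The sketch below spells this out over toy lists of tuples. It is only an analogy for the rule's meaning, not a description of how Spannerlib evaluates rules; the data and the `fake_llm` placeholder are hypothetical.

```python
def fake_llm(model, prompt):
    """Placeholder for the real llm IE function."""
    return f"[{model}] answer for: {prompt}"

# Toy relations as lists of tuples (hypothetical data).
per_user_completion = [("bob", "cursor_1"), ("joe", "cursor_2")]
prompts = [
    ("bob", "cursor_1", "prompt for bob at cursor_1"),
    ("bob", "cursor_2", "prompt for bob at cursor_2"),
    ("joe", "cursor_2", "prompt for joe at cursor_2"),
]

def few_shot_completion(requests, prompts, model="toy-model"):
    """Join requests with prompts on (user, cursor), then call the LLM once per match."""
    wanted = set(requests)
    return [
        (user, cursor, fake_llm(model, prompt))
        for user, cursor, prompt in prompts
        if (user, cursor) in wanted
    ]

completions = few_shot_completion(per_user_completion, prompts)
for row in completions:
    print(row)
```

The prompt for bob at `cursor_2` exists but is never sent to the model, because no request asked for it.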

Putting it all together

So with the addition of a single simple IE function, we extended our pipeline’s logic:

FuncDefSpan(span,name)<-
    Files(text),
    ast_xpath(text, "//FunctionDef")->(node),
    ast_to_span(text,node)->(span),
    expr_eval("{0}.name",node)->(name).

FuncCallSpan(span,name)<-
    Files(text),
    ast_xpath(text, "//Call/func/Name")->(node),
    ast_to_span(text,node)->(span),
    as_str(span)->(name).

CursorWrappingFunc(cursor,name)<-
    Cursors(cursor),
    FuncDefSpan(span,name),
    span_contained(cursor,span)->(True).

Mentions(lex_concat(caller_span),called_name)<-
    FuncCallSpan(called_span,called_name),
    FuncDefSpan(caller_span,caller_name),
    span_contained(called_span,caller_span)->(True).

model = 'gpt-3.5-turbo'
DocumentFunctionPrompt(cursor,prompt)<-
    CursorWrappingFunc(cursor,name),
    Mentions(mentions,name),
    FuncDefSpan(def_span,name),
    as_str(def_span)->(def_string),
    format($func_document_template,mentions,def_string)->(prompt).

DocumentFunction(cursor,answer)<-
    DocumentFunctionPrompt(cursor,prompt),
    llm($model,prompt)->(answer).

By adding the following rules:

RagContext(cursor,lex_concat(context))<-
    DocumentFunctionPrompt(cursor,prompt),
    vector_search(prompt,4,'stackoverflow')->(context,similarity_score).

RagPrompt(cursor,prompt)<-
    RagContext(cursor,context),
    DocumentFunctionPrompt(cursor,document_prompt),
    format($rag_prompt,context,document_prompt)->(prompt).

RagCompletion(cursor,answer)<-
    RagPrompt(cursor,prompt),
    llm($model,prompt)->(answer).

FewShotExamples(user,lex_concat(qa_pair))<-
    PositiveFeedback(user,q,a),
    format($single_example_template,q,a)->(qa_pair).

FewShotPrompt(user,prompt)<-
    FewShotExamples(user,examples),
    format($fewshot_template,examples)->(prompt).

FewShotRagDocumentPrompt(user,cursor,prompt)<-
    FewShotPrompt(user,few_shot_prompt),
    RagPrompt(cursor,rag_prompt),
    format("{} {}",rag_prompt,few_shot_prompt)->(prompt).

FewShotCompletion(user,cursor,answer)<-
    PerUserCompletion(user,cursor),
    FewShotRagDocumentPrompt(user,cursor,prompt),
    llm($model,prompt)->(answer).

And that was all it took to add RAG and few-shot prompting to our pipeline.

Note that since we kept our different completion rules, we can also compare them easily. We went from adding context from our code base:

for cursor,answer in sess.export('?DocumentFunction(cursor,answer)').itertuples(index=False,name=None):
    print(answer)

"""
This function calculates the sum of two inputs x and y.
"""

To adding context from Stack Overflow, resulting in a docstring that follows best practices:

for cursor,answer in sess.export('?RagCompletion(cursor,answer)').itertuples(index=False,name=None):
    print(f'{answer}'.replace('```',''))

python
def f(x, y):
    """
    Calculate the sum of two numbers.

    :param x: The first number to be added
    :param y: The second number to be added
    :return: The sum of x and y

    Example:
    >>> f(2, 3)
    5
    >>> f(-2, 5)
    3
    """
    return x + y

To adding user feedback, resulting in a docstring for bob that follows his preferred docstring formatting style:

for cursor,answer in sess.export('?FewShotCompletion("bob",cursor,answer)').itertuples(index=False,name=None):
    print(answer)

Calculate the sum of two numbers.

Args:
    x (float): The first number.
    y (float): The second number.

Returns:
    float: The sum of x and y.

Example:
    >>> f(3, 5)
    8

This is by no means a production-ready agent, but it does demonstrate that complex agents can be programmed and modified easily using a few generic IE functions and small, elegant chunks of declarative logic.