LLM standard library

Given our diagrams’ ability to use generic functions, we can build a standard library of functions that are useful for working with LLMs.

import nest_asyncio
nest_asyncio.apply()  # allow `await` at the top level of the notebook's running event loop

Wrapping instructor


source

json_client

 json_client ()

source

tools_client

 tools_client ()
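Both helpers presumably return instructor-patched OpenAI clients, one configured for JSON-mode structured output and one for tool-calling mode (an assumption based on their names and on the mode argument of complete below). A minimal sketch of the underlying instructor pattern:

import instructor
from openai import AsyncOpenAI

# Hypothetical equivalents of json_client()/tools_client(): patch an async
# OpenAI client so chat completions can return validated Pydantic models.
def json_client_sketch():
    return instructor.from_openai(AsyncOpenAI(), mode=instructor.Mode.JSON)

def tools_client_sketch():
    return instructor.from_openai(AsyncOpenAI(), mode=instructor.Mode.TOOLS)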

source

complete

 complete (model, messages, response_model, mode='json',
           print_prompt=False, **kwargs)

source

complete_raw

 complete_raw (model, messages, response_model=None, response_schema=None,
               mode='json', **kwargs)

This function runs a chat completion through instructor without requiring BaseModels as input or output, which makes it suitable for disk caching of results.
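Because it avoids BaseModels, both its arguments and its results are plain JSON-serializable objects. A minimal sketch, assuming a JSON-schema dict is passed via response_schema and that the return convention mirrors complete:

# Hedged sketch: a plain dict schema keeps inputs and outputs picklable
# for the disk cache; the exact return shape is an assumption.
schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name", "age"],
}
raw, usage = await complete_raw(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Extract: jason is 25 years old"}],
    response_schema=schema,
)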

class UserExtract(BaseModel):
    name: str
    age: int


user, usage = await complete(
    model="gpt-3.5-turbo",
    response_model=UserExtract,
    messages=[
        {"role": "user", "content": "Extract jason is 25 years old"},
    ],
)

user,usage
(UserExtract(name='Extract jason', age=25),
 {'input_tokens': 147, 'output_tokens': 19})

Completion types

Simpler answer


source

answer_question

 answer_question (model, messages, **api_kwargs)
await answer_question("gpt-3.5-turbo",[{"role":"user","content":"What is the capital of France?"}])
('Paris', {'input_tokens': 120, 'output_tokens': 9})

Choice


source

choose

 choose (model, messages, choices, **api_kwargs)
await choose("gpt-3.5-turbo",[{"role":"user","content":"What is the capital of the country France?"}],["PARIS", "HILTON"],print_prompt=True)
[{'role': 'user', 'content': 'What is the capital of the country France?'}]
('PARIS', {'input_tokens': 140, 'output_tokens': 11})

Multi choice


source

choose_many

 choose_many (model, messages, choices, **api_kwargs)
await choose_many("gpt-3.5-turbo",
    messages=[{"role":"user","content":"what parameters did i pass in? my name is jason and i am 25 years old"}],
    choices=["Age", "Name","City"],print_prompt=True)
[{'role': 'user', 'content': 'what parameters did i pass in? my name is jason and i am 25 years old'}]
(['Name', 'Age'], {'input_tokens': 235, 'output_tokens': 32})

Structured output


source

clean_model

 clean_model (model:Type[sqlmodel.main.SQLModel], name:Optional[str]=None)

*Convert an SQLModel to a Pydantic BaseModel. Used to clean up the output for the LLM.

Args:
    model: SQLModel class to convert
    name: Optional name for the new model class

Returns:
    A Pydantic BaseModel class with the same fields*
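A minimal sketch of converting a hypothetical SQLModel table into a plain BaseModel (Pet and its fields are illustrative):

from typing import Optional
from sqlmodel import SQLModel, Field

class Pet(SQLModel, table=True):  # hypothetical demo table
    id: Optional[int] = Field(default=None, primary_key=True)
    name: Optional[str] = None

PetModel = clean_model(Pet, name="PetModel")
list(PetModel.model_fields)  # expected: ['id', 'name'], now plain Pydantic fields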


source

structured_output

 structured_output (model, messages, output_schema, as_json=False,
                    **api_kwargs)
class UserExtract(BaseModel):
    name: str
    age: int

res,usage = await structured_output("gpt-3.5-turbo",
    [{"role":"user","content":"what parameters did i pass in? my name is Jason and i am 25 years old"}],
    UserExtract)
assert res == UserExtract(name="Jason",age=25), res
res,usage
(UserExtract(name='Jason', age=25), {'input_tokens': 157, 'output_tokens': 17})
res,usage = await structured_output("gpt-3.5-turbo",
    [{"role":"user","content":"what parameters did i pass in? my name is Jason and i am 25 years old"}],
    UserExtract,as_json=True)
assert res == {"name": "Jason", "age": 25},res
res,usage
({'name': 'Jason', 'age': 25}, {'input_tokens': 157, 'output_tokens': 17})
import sqlite3
from sqlalchemy.engine import create_engine

source

User

 User (id:Optional[int]=None, name:Optional[str]=None,
       age:Optional[int]=None, email:Optional[str]=None)

*A sample SQLModel used below to demonstrate structured output directly into a database model.*
res,usage = await structured_output("gpt-3.5-turbo",
    [{"role":"user","content":"my name is jyson, my age is 25, my id is 1"}],
    User)
res
User(id=1, name='jyson', age=25, email=None)

Tool calling


source

description_to_model

 description_to_model (desc:Dict[str,Any], model_name:Optional[str]=None)

*Create a Pydantic model from a function description.

Args:
    desc: Function description from function_to_input_description
    model_name: Optional name for the model class

Returns:
    Pydantic model class*
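A minimal sketch using a hand-written description dict in the shape produced by function_to_input_description (see the asserted examples below; greet and who are illustrative names):

# Hedged sketch: desc mirrors the dicts asserted for add/math_op below.
desc = {
    "name": "greet",
    "params": {"who": {"type": str, "description": "Name to greet", "default": None}},
}
GreetInput = description_to_model(desc, model_name="GreetInput")
GreetInput(who="Ada")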


source

function_to_input_description

 function_to_input_description (func:Callable)

*Extract parameter information from a function’s signature and docstring.

Args:
    func: Function to analyze

Returns:
    Dictionary containing:
    - name: Function name
    - params: Dict of parameter info with:
        - type: Parameter type annotation
        - description: Parameter description from docstring
        - default: Default value if any*

async def add(a: int, b: int = 0) -> int:
    """Add two numbers.
    
    Args:
        a: First number to add
        b: Second number to add
    """
    return a + b

# Get function description
desc = function_to_input_description(add)
assert desc == {
    'name': 'add',
    'params': {'a': {'type': int,'description': 'First number to add','default': None},
               'b': {'type': int, 'description': 'Second number to add', 'default': 0}
               }
    }
def math_op(op: Literal["add", "multiply", "divide"], a: int = 0, b: int = 0) -> int:
    """Perform a math operation on two numbers.
    
    Args:
        op: Operation to perform
        a: First number to add
        b: Second number to add
    """
    if op == "add":
        return a + b
    if op == "multiply":
        return a * b
    return a // b  # integer division to match the int return annotation

# Get function description
desc = function_to_input_description(math_op)
desc
assert desc == {'name': 'math_op',
 'params': {'op': {'type': Literal['add', 'multiply', 'divide'],
   'description': 'Operation to perform',
   'default': None},
  'a': {'type': int, 'description': 'First number to add', 'default': 0},
  'b': {'type': int, 'description': 'Second number to add', 'default': 0}}}

source

function_to_input_model

 function_to_input_model (func:Callable, name:str,
                          descriminator_field:str='tool_name')

*Convert a function to a Pydantic input model.

Args:
    func: Function to analyze

Returns:
    Pydantic model class for function inputs*
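A minimal sketch reusing the add function defined above; the resulting model presumably exposes fields for a and b plus the tool_name discriminator (inferred from the descriminator_field default):

# Hedged sketch: the field layout is an assumption based on the signature above.
AddInput = function_to_input_model(add, name="add")
sorted(AddInput.model_fields)  # expected to include 'a', 'b' and 'tool_name'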


source

call_tools

 call_tools (model:str, messages:List[Dict[str,str]],
             tools:Dict[str,Callable], call_function:bool=False,
             descriminator_field:str='tool_name', **api_kwargs)

*Call OpenAI chat completion with tool selection and input parsing.

Args:
    model: OpenAI model name
    messages: List of message dicts with role and content
    tools: Dictionary mapping tool names to functions
    call_function: Whether to call the selected function and store its return value under the output key
    descriminator_field: The name of the field to use as the discriminator

Returns:
    Dict with:
    - name: Selected tool name
    - input: Parsed input for the tool
    - output: The function's return value (only when call_function is True)*

def add(a: int, b: int = 0) -> int:
    """Add two numbers.
    
    Args:
        a: First number to add
        b: Second number to add
    """
    return a + b

def multiply(x: int, y: int) -> int:
    """Multiply two numbers.
    
    Args:
        x: First number to multiply
        y: Second number to multiply
    """
    return x * y

def divide(numerator: int, denominator: int) -> float:
    """Divide two numbers.
    
    Args:
        numerator: Number to divide
        denominator: Number to divide by
    """
    return numerator / denominator

tools = {
    'add': add,
    'multiply': multiply,
    'divide': divide
}
result,usage = await call_tools(
    model="gpt-3.5-turbo",  
    messages=[{"role": "user", "content": "What is 5 plus 3?"}],
    tools=tools,
    print_prompt=True
)
assert result == {'name': 'add', 'input': {'a': 5, 'b': 3}}
result,usage
[{'role': 'system', 'content': 'choose an appropriate tool to use to answer the following thought based on the following tools:\nadd:Add two numbers.\n\nArgs:\n    a: First number to add\n    b: Second number to add\nmultiply:Multiply two numbers.\n\nArgs:\n    x: First number to multiply\n    y: Second number to multiply\ndivide:Divide two numbers.\n\nArgs:\n    numerator: Number to divide\n    denominator: Number to divide by'}, {'role': 'user', 'content': 'What is 5 plus 3?'}]
({'name': 'add', 'input': {'a': 5, 'b': 3}},
 {'input_tokens': 813, 'output_tokens': 32})
result,usage = await call_tools(
    model="gpt-3.5-turbo",  
    messages=[{"role": "user", "content": "What is 5 plus 3?"}],
    tools=tools,
    call_function=True
)
assert result == {'name': 'add', 'input': {'a': 5, 'b': 3}, 'output': 8}
result,usage
({'name': 'add', 'input': {'a': 5, 'b': 3}, 'output': 8},
 {'input_tokens': 813, 'output_tokens': 32})
def exception_raiser(a:int,b:int):
    raise ValueError("This is a test error")

tools= {
    'add': exception_raiser,
}

result,usage = await call_tools(
    model="gpt-3.5-turbo",  
    messages=[{"role": "user", "content": "What is 5 plus 3?"}],
    tools=tools,
    call_function=True
    )

assert result == {'name': 'add', 'input': {'a': 5, 'b': 3}, 'error': 'This is a test error'}
result,usage
({'name': 'add', 'input': {'a': 5, 'b': 3}, 'error': 'This is a test error'},
 {'input_tokens': 368, 'output_tokens': 33})

The main chat class


source

Chat

 Chat (model:Optional[str]=None,
       messages:Optional[List[Dict[str,str]]]=None,
       output_schema:Optional[pydantic.main.BaseModel]=None,
       as_json:Optional[bool]=False,
       tools:Optional[Dict[str,Callable]]=None,
       call_function:Optional[bool]=False,
       choices:Optional[enum.Enum]=None,
       multi_choice:Optional[bool]=False, seed:Optional[int]=42,
       stop:Union[str,List[str],NoneType]=None, log_prompt:bool=False,
       save_history:bool=False, append_output:bool=False,
       init_messages:Optional[List[Dict[str,str]]]=None, **kwargs)

*A Chat object that renders a prompt and calls an LLM. Currently supports OpenAI models.

Args:
    model: OpenAI model name
    messages: List of message dicts; each must have at least a role and a content field
    output_schema: Optional schema for structured output
    as_json: Optional boolean to return the response as a JSON object
    tools: Optional dictionary of tool names and functions that the LLM can decide to call. Causes the content of the response to be a dict of the form {'name': tool_name, 'input': tool_input_dict}
    call_function: If tools are provided, whether to call the function and save the output in the output field of the response's content
    choices: Optional list of choices for multi-choice questions
    multi_choice: If choices are provided, whether to choose multiple items from the list
    seed: Optional seed for random number generation
    stop: Optional string or list of strings where the model should stop generating
    log_prompt: Optional boolean to log the rendered prompt before each call
    save_history: Optional boolean to save the history of the chat between calls
    append_output: Optional, whether to append the output of the chat to the history automatically. Defaults to False
    init_messages: Optional list of messages that are always prepended to messages. Useful for supplying additional messages during calls. Can have template variables that are filled during initialization only. If save_history is True, the init messages are added to the history
    **kwargs: Keyword arguments to interpolate into the messages*

For a cheatsheet on how to use Jinja templates, see this link.
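Because messages are rendered as Jinja templates, they can use more than simple {{var}} substitution. A minimal sketch, assuming list-valued kwargs render through standard Jinja loops (lister and items are illustrative names):

lister = Chat(
    model="gpt-4o-mini",
    messages=[{
        "role": "user",
        "content": "Summarize these items:\n{% for item in items %}- {{ item }}\n{% endfor %}",
    }],
)
# await lister(items=["apples", "oranges"]) would render a bulleted user message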

Chat Tests

Basic

chat_with_history = Chat(model="gpt-3.5-turbo", 
                        save_history=True,
                        append_output=True,
                        init_messages=[{"role":"system","content":"You are a helpful {{role}}"}],
                        messages=[{"role": "user", "content": "Hi, im {{name}}, answer me: {{text}}"}],
                        role = 'AI overlord',
                        name = 'ernio',
                        )
res = await chat_with_history(text="What is the capital of France?",print_prompt=True)
res
[{'role': 'system', 'content': 'You are a helpful AI overlord'}, {'role': 'user', 'content': 'Hi, im ernio, answer me: What is the capital of France?'}]
{'role': 'assistant',
 'content': 'The capital of France is Paris.',
 'meta': {'input_tokens': 136, 'output_tokens': 15}}
assert len(chat_with_history.history) == 3
chat_with_history.history
[{'role': 'system', 'content': 'You are a helpful AI overlord'},
 {'role': 'user',
  'content': 'Hi, im ernio, answer me: What is the capital of France?'},
 {'role': 'assistant',
  'content': 'The capital of France is Paris.',
  'meta': {'input_tokens': 136, 'output_tokens': 15}}]
res = await chat_with_history(text="And what is the closest city to it?",print_prompt=True)
res
[{'role': 'system', 'content': 'You are a helpful AI overlord'}, {'role': 'user', 'content': 'Hi, im ernio, answer me: What is the capital of France?'}, {'role': 'assistant', 'content': 'The capital of France is Paris.', 'meta': {'input_tokens': 136, 'output_tokens': 15}}, {'role': 'user', 'content': 'Hi, im ernio, answer me: And what is the closest city to it?'}]
{'role': 'assistant',
 'content': 'The closest city to Paris is Saint-Denis.',
 'meta': {'input_tokens': 169, 'output_tokens': 18}}
assert len(chat_with_history.history)==5

Choices

messages=[
        {"role": "system", "content": "Given a sentence, classify it into one of these topics: science, history, technology, or arts. Choose the single most relevant topic."},
        {"role": "user", "content": "{{text}}"}
    ]
    
topic_classifier = Chat(
    model="gpt-4o-mini",
    messages=messages,
    choices = ['science', 'history', 'technology', 'arts'],
    seed=42,
    log_prompt=True
)
topic_classifier
Chat(model='gpt-4o-mini', required_keys={'text'}, seed=42)
wwii_topic = await topic_classifier(text="WWII was a global conflict that lasted from 1939 to 1945.")
asteroid_topic = await topic_classifier(text="The asteroid belt is a region of space between the orbits of Mars and Jupiter.")

assert wwii_topic['content'] == 'history', wwii_topic
assert asteroid_topic['content'] == 'science', asteroid_topic
calling llm with model=gpt-4o-mini and prompt:
messages=[{'content': 'Given a sentence, classify it into one of these topics: science, '
             'history, technology, or arts. Choose the single most relevant '
             'topic.',
  'role': 'system'},
 {'content': 'WWII was a global conflict that lasted from 1939 to 1945.',
  'role': 'user'}]

calling llm with model=gpt-4o-mini and prompt:
messages=[{'content': 'Given a sentence, classify it into one of these topics: science, '
             'history, technology, or arts. Choose the single most relevant '
             'topic.',
  'role': 'system'},
 {'content': 'The asteroid belt is a region of space between the orbits of '
             'Mars and Jupiter.',
  'role': 'user'}]

Structured output

class Person(BaseModel):
    first_name: str
    last_name: str
    date_of_birth: int

prompted_llm = Chat(model="gpt-4o-mini", messages=
    [   
        {"role": "user", "content": "how old am i? {{name}}, {{age}} years old"},
        {"role": "assistant", "content": "Iam {{model_name}}, You are {{name}}, {{age}} years old"}
    ],
     output_schema=Person)
prompted_llm
Chat(model='gpt-4o-mini', required_keys={'model_name', 'age', 'name'}, output_schema=Person, seed=42)
baked_llm = Chat(model="gpt-4o-mini", messages=
    [
        {"role": "user", "content": "how old am i? {{name}}, {{age}} years old"},
        {"role": "assistant", "content": "Iam {{model_name}}, You are {{name}}, {{age}} years old"}
    ],
    output_schema=Person, model_name="gpt-4o-mini", age=30)
baked_llm
Chat(model='gpt-4o-mini', required_keys={'name'}, output_schema=Person, seed=42)
res = await prompted_llm(model_name="gpt-4o-mini", age=30,name="Dean")
assert res['content'] == Person(first_name='Dean', last_name='', date_of_birth=1993)
res
{'role': 'assistant',
 'content': Person(first_name='Dean', last_name='', date_of_birth=1993),
 'meta': {'input_tokens': 206, 'output_tokens': 27}}
res = await baked_llm(name="Dean")
assert res['content'] == Person(first_name='Dean', last_name='', date_of_birth=1993)
res
{'role': 'assistant',
 'content': Person(first_name='Dean', last_name='', date_of_birth=1993),
 'meta': {'input_tokens': 206, 'output_tokens': 27}}
res = await baked_llm(name="Dean",as_json=True)
assert res['content'] == {"first_name": "Dean", "last_name": "", "date_of_birth": 1993} 
res
{'role': 'assistant',
 'content': {'first_name': 'Dean', 'last_name': '', 'date_of_birth': 1993},
 'meta': {'input_tokens': 206, 'output_tokens': 27}}
course_chooser = Chat(model="gpt-4o-mini",
    messages=[{"role": "user", "content": "{{text}}"}],
    choices=["science","genocide", "history", "defence against the dark arts", "arts"],
    multi_choice=True)
res = await course_chooser(text="choose everything that is not genocide")
assert not 'genocide' in res['content'] , res
res
{'role': 'assistant',
 'content': ['science', 'history', 'defence against the dark arts', 'arts'],
 'meta': {'input_tokens': 236, 'output_tokens': 59}}
def google_search_stub(query:str):
    """
    Search the web for the query
    Args:
        query: The query to search for
    Returns:
        The URL of the search results
    """
    return f"https://www.google.com/search?q={query.replace(' ','_')}"

tools = {'google_search': google_search_stub}

google_search = Chat(model="gpt-4o-mini", messages=[{"role": "user", "content": "{{text}}"}], tools=tools, call_function=True)
res = await google_search(text="What is the capital of France?")
assert res['content'] == {'name': 'google_search',
  'input': {'query': 'What is the capital of France?'},
  'output': 'https://www.google.com/search?q=What_is_the_capital_of_France?'}
res
{'role': 'assistant',
 'content': {'name': 'google_search',
  'input': {'query': 'What is the capital of France?'},
  'output': 'https://www.google.com/search?q=What_is_the_capital_of_France?'},
 'meta': {'input_tokens': 365, 'output_tokens': 32}}

Init messages without saving history

chat_with_init_messages = Chat(model="gpt-4o-mini",
    init_messages=[{"role": "system", "content": "You are an unhelpful assistant. Whenever asked to help, you say no."}],
)
res = await chat_with_init_messages(messages=[{"role": "user", "content": "What is the capital of {{country}}?"}],country="France")
res
assert 'no' in res['content'].lower(),res

Image to Text


source

image_to_text

 image_to_text (path:str, model:str='gpt-4o-mini', url=False)

*This function takes an image (either from a local file path or a URL) and uses OpenAI’s vision model to generate a detailed description of the image contents. The results are cached using disk_cache to avoid redundant API calls.

Args:
    path (str): Path to the image file or URL of the image
    model (str, optional): OpenAI model to use for image analysis. Defaults to “gpt-4o-mini”.
    url (bool, optional): Whether the path is a URL. Defaults to False.

Returns:
    dict: A dictionary containing:
    - role (str): Always “assistant”
    - content (str): Detailed description of the image
    - meta (dict): Usage statistics including input and output tokens*

from textwrap import wrap
res= await image_to_text(get_git_root()/"sample_data/fox.jpeg")

assert 'fox' in res['content']
print('\n'.join(wrap(res['content'],width=100)))
res
The image showcases the close-up face of a fox. The fox has a thick, bushy fur coat that ranges in
color from reddish-brown to lighter orange hues. Its ears are pointed and alert, while its eyes are
sharp and expressive, showcasing a mixture of intelligence and curiosity. The muzzle is slightly
elongated and features a white patch on its chin, complementing the darker tones around the eyes and
nose. The background appears blurred, suggesting that the focus is on the fox, enhancing its
striking features.
{'role': 'assistant',
 'content': 'The image showcases the close-up face of a fox. The fox has a thick, bushy fur coat that ranges in color from reddish-brown to lighter orange hues. Its ears are pointed and alert, while its eyes are sharp and expressive, showcasing a mixture of intelligence and curiosity. The muzzle is slightly elongated and features a white patch on its chin, complementing the darker tones around the eyes and nose. The background appears blurred, suggesting that the focus is on the fox, enhancing its striking features.',
 'meta': {'input_tokens': 8626, 'output_tokens': 109}}
from textwrap import wrap
image_url = 'https://upload.wikimedia.org/wikipedia/commons/thumb/3/30/Vulpes_vulpes_ssp_fulvus.jpg/800px-Vulpes_vulpes_ssp_fulvus.jpg'
res= await image_to_text(image_url,url=True)

assert 'fox' in res['content']
print('\n'.join(wrap(res['content'],width=100)))
res
The image features a red fox standing on a snowy landscape. The fox has a predominantly orange-brown
fur coat with a white underside and a bushy, black-tipped tail. Its legs are slender and its ears
are upright, displaying a softer fur on the insides. The fox has bright, alert eyes that seem to be
observing its surroundings. The snow around it is untouched and pristine, enhancing the fox's vivid
coloration against the wintery backdrop. In the background, hints of green foliage can be seen
peeking through the snow.
{'role': 'assistant',
 'content': "The image features a red fox standing on a snowy landscape. The fox has a predominantly orange-brown fur coat with a white underside and a bushy, black-tipped tail. Its legs are slender and its ears are upright, displaying a softer fur on the insides. The fox has bright, alert eyes that seem to be observing its surroundings. The snow around it is untouched and pristine, enhancing the fox's vivid coloration against the wintery backdrop. In the background, hints of green foliage can be seen peeking through the snow.",
 'meta': {'input_tokens': 25627, 'output_tokens': 116}}

Speech to text


source

speech_to_text

 speech_to_text (audio_path:str, model:str='whisper-1')

*Extract text from an audio file using OpenAI’s Whisper model.

Args:
    audio_path (str): Path to the audio file
    model (str, optional): OpenAI model to use. Defaults to “whisper-1”.

Returns:
    dict: A dictionary containing:
    - role (str): Always “assistant”
    - content (str): Transcribed text from the audio*

res = await speech_to_text(get_git_root()/"sample_data/happy_speech.wav")
assert res['content'] == "Look at this, my hands are standing up in my arms, I'm giving myself goosebumps." , res
res
{'role': 'assistant',
 'content': "Look at this, my hands are standing up in my arms, I'm giving myself goosebumps."}

Export