RAG - Baseline implementation#
Overview#
In this part, we will build the core building blocks of a RAG solution:
Create a Search Index
Upload the data
Perform a search
Create a prompt
Wire everything together
Goal#
The goal of this section is to get hands-on familiarity with RAG, so that later on we can experiment with its different aspects.
This will also serve as the baseline for our RAG application.
Setup#
%%capture --no-display
%run -i ./pre-requisites.ipynb
%run -i ./helpers/search.ipynb
Import required libraries and environment variables#
import os
import json
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery
from azure.search.documents.indexes.models import (
SearchIndex,
SearchFieldDataType,
SimpleField,
SearchableField,
SearchField,
VectorSearchProfile,
HnswAlgorithmConfiguration,
VectorSearch,
HnswParameters
)
from azure.search.documents.indexes import SearchIndexClient
import openai
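# Module-level OpenAI configuration (legacy openai<1.0 style); the helper
# notebooks loaded above are assumed to rely on these settings.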
openai.api_key = os.getenv("AZURE_OPENAI_KEY")
openai.api_base = os.getenv("AZURE_OPENAI_ENDPOINT")
openai.api_type = "azure"
openai.api_version = "2023-07-01-preview"
1. Create a Search Index#
For those familiar with relational databases, you can imagine that:
A (search) index ~= A table
it describes the schema of your data
it consists of field definitions, described by field attributes (searchable, filterable, sortable, etc.)
A (search) document ~= A row in your table
In our case, we would like to represent the following:
Field | Type | Description
---|---|---
ChunkId | SimpleField | The id of the chunk (e.g. chunk16_3)
Source | SimpleField | The path to the source document
ChunkContent | SearchableField | The content of the chunk
ChunkContentVector | SearchField | The vectorized content of the chunk
Run the cell below to define a function which creates an index with the schema described above:
def create_index(search_index_name, service_endpoint, key):
    client = SearchIndexClient(service_endpoint, AzureKeyCredential(key))

    # 1. Define the fields
    fields = [
        SimpleField(
            name="chunkId",
            type=SearchFieldDataType.String,
            sortable=True,
            filterable=True,
            key=True,
        ),
        SimpleField(
            name="source",
            type=SearchFieldDataType.String,
            sortable=True,
            filterable=True,
        ),
        SearchableField(name="chunkContent", type=SearchFieldDataType.String),
        SearchField(
            name="chunkContentVector",
            type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
            searchable=True,
            vector_search_dimensions=1536,  # the dimension of the embedded vectors
            vector_search_profile_name="my-vector-config",
        ),
    ]

    # 2. Configure the vector search
    vector_search = VectorSearch(
        profiles=[
            VectorSearchProfile(
                name="my-vector-config",
                algorithm_configuration_name="my-algorithms-config",
            )
        ],
        algorithms=[
            # Configuration options specific to the HNSW approximate nearest
            # neighbors algorithm used during indexing and querying.
            HnswAlgorithmConfiguration(
                name="my-algorithms-config",
                kind="hnsw",
                # https://learn.microsoft.com/en-us/python/api/azure-search-documents/azure.search.documents.indexes.models.hnswparameters?view=azure-python-preview#variables
                parameters=HnswParameters(
                    m=4,
                    # The size of the dynamic list of nearest neighbors used during indexing.
                    # Increasing this parameter may improve index quality, at the expense of increased indexing time.
                    ef_construction=400,
                    # The size of the dynamic list of nearest neighbors used during search.
                    # Increasing this parameter may improve search results, at the expense of slower search.
                    ef_search=500,
                    # The similarity metric to use for vector comparisons.
                    # Known values are "cosine", "euclidean", and "dotProduct".
                    metric="cosine",
                ),
            )
        ],
    )

    # 3. Create (or update) the index with the fields and vector search configuration
    index = SearchIndex(
        name=search_index_name,
        fields=fields,
        vector_search=vector_search,
    )
    result = client.create_or_update_index(index)
    print(f"Index: {result.name} created or updated")
Run the cell below to create the index. If the index already exists, it will be updated. Make sure to update the search_index_name
variable to a unique name.
search_index_name = "first_index"
create_index(search_index_name, service_endpoint, search_index_key)
Index: first_index created or updated
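Optionally, you can fetch the index back from the service to confirm the schema landed as expected (a quick sanity check using the same SearchIndexClient):
client = SearchIndexClient(service_endpoint, AzureKeyCredential(search_index_key))
index = client.get_index(search_index_name)
print(index.name, [field.name for field in index.fields])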
2. Upload the Data to the Index#
2.1 Chunking#
Data ingestion requires special attention, as it can impact the outcome of the RAG solution. Which chunking strategy to use and what AI enrichment to perform are just a few of the considerations. Further discussion and experimentation will be done in Chapter 3. Experimentation - Chunking.
In this baseline setup, the data has already been chunked into fixed-size chunks of 180 tokens with a 30% overlap.
The chunks can be found here. You can take a look at the content of the file.
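To make the strategy concrete, here is a minimal sketch of fixed-size chunking with overlap. It assumes the tiktoken tokenizer (cl100k_base, the encoding used by text-embedding-ada-002); the pre-generated chunks were produced offline, not necessarily by this exact code:
import tiktoken

def chunk_text(text: str, chunk_size: int = 180, overlap_pct: float = 0.3) -> list[str]:
    # Encode with the same tokenizer family as text-embedding-ada-002
    encoding = tiktoken.get_encoding("cl100k_base")
    tokens = encoding.encode(text)
    # With 30% overlap, each window starts 70% of a chunk after the previous one
    step = max(1, int(chunk_size * (1 - overlap_pct)))
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(encoding.decode(tokens[start : start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break
    return chunks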
2.2 Embedding#
Embedding the chunks into vectors can also be done in various ways. Further discussion and experimentation will be done in Chapter 3. Experimentation - Embedding.
In this baseline setup, we take a vanilla approach:
We use the embedding model from OpenAI, text-embedding-ada-002, since it is an obvious choice to start with.
The outcome can be found here. You can take a look at the content of the file.
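As a sketch of what that embedding step looks like with the OpenAI Python library (the embeddings were pre-generated offline; the deployment name text-embedding-ada-002 is an assumption):
import os
from openai import AzureOpenAI

def embed_text(text: str) -> list[float]:
    client = AzureOpenAI(
        api_key=os.getenv("AZURE_OPENAI_KEY"),
        api_version="2023-07-01-preview",
        azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
    )
    # "text-embedding-ada-002" is assumed to be the embedding deployment name
    response = client.embeddings.create(model="text-embedding-ada-002", input=text)
    return response.data[0].embedding  # a 1536-dimensional vector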
Let’s define the path to the embedded chunks:
chunk_size = 180
chunk_overlap = 30
path_to_embedded_chunks = f"./output/pre-generated/embeddings/fixed-size-chunks-{chunk_size}-{chunk_overlap}-batch-engineering-mlops-ada.json"
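To sanity-check the file, you can peek at the first record, assuming (as the upload step below implies) that the JSON is a list of documents whose keys match the index fields:
with open(path_to_embedded_chunks, "r") as file:
    sample = json.load(file)[0]
print(sample["chunkId"], "-", sample["source"])
print(sample["chunkContent"][:120])
print(len(sample["chunkContentVector"]))  # expect 1536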
2.3 Upload the Data to the Index#
def upload_data(file_path, search_index_name):
    try:
        with open(file_path, "r") as file:
            documents = json.load(file)

        search_client = SearchClient(
            endpoint=service_endpoint,
            index_name=search_index_name,
            credential=credential,
        )
        search_client.upload_documents(documents)
        print(f"Uploaded {len(documents)} documents to Index: {search_index_name}")
    except Exception as e:
        print(f"Error uploading documents: {e}")
upload_data(path_to_embedded_chunks, search_index_name)
Uploaded 3236 documents to Index: first_index
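Here upload_documents sends everything in a single call. For larger datasets you may need to split the upload into batches, since the service caps a single indexing request; the 1000-document batch size below is a common choice, not a tuned value:
def upload_in_batches(documents, search_client, batch_size=1000):
    # Split the document list into fixed-size batches and upload them sequentially
    for i in range(0, len(documents), batch_size):
        batch = documents[i : i + batch_size]
        results = search_client.upload_documents(batch)
        succeeded = sum(1 for result in results if result.succeeded)
        print(f"Batch {i // batch_size + 1}: {succeeded}/{len(batch)} documents succeeded")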
3. Perform a vector search#
There are various types of search one can perform, such as keyword search, semantic search, vector search, and hybrid search. Since we generated embeddings for our chunks and would like to leverage the power of vector search, in this baseline solution we will perform a simple vector search.
Perform a vector similarity search#
def search_documents(query_embeddings):
    search_client = SearchClient(
        service_endpoint, search_index_name, credential=credential
    )

    # Retrieve the 3 nearest neighbors of the query vector
    vector_query = VectorizedQuery(
        vector=query_embeddings, k_nearest_neighbors=3, fields="chunkContentVector"
    )

    results = search_client.search(
        search_text=None,
        vector_queries=[vector_query],
        select=["chunkContent", "chunkId", "source"],
    )

    documents = []
    for document in results:
        documents.append(
            {
                "chunkContent": document["chunkContent"],
                "source": document["source"],
                "chunkId": document["chunkId"],
            }
        )
    return documents
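For comparison, a hybrid query would pass the raw query text alongside the vector, letting the service combine keyword and vector scores. A sketch (this baseline sticks to pure vector search):
def hybrid_search_documents(query_text: str, query_embeddings: list[float]):
    search_client = SearchClient(
        service_endpoint, search_index_name, credential=credential
    )
    vector_query = VectorizedQuery(
        vector=query_embeddings, k_nearest_neighbors=3, fields="chunkContentVector"
    )
    # Supplying both search_text and vector_queries makes this a hybrid query
    results = search_client.search(
        search_text=query_text,
        vector_queries=[vector_query],
        select=["chunkContent", "chunkId", "source"],
    )
    return [
        {"chunkContent": doc["chunkContent"], "source": doc["source"], "chunkId": doc["chunkId"]}
        for doc in results
    ]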
Run the search_documents function to find the documents most similar to a given query.
query = "What does the develop phase include"
embedded_query = oai_query_embedding(query)
search_documents(embedded_query)
[{'chunkContent': 'Steps\n\nDesign Phase: Both developers design the interface together. This includes:\n\nMethod signatures and names\nWriting documentation or docstrings for what the methods are intended to do.\nArchitecture decisions that would influence testing (Factory patterns, etc.)\n\nImplementation Phase: The developers separate and parallelize work, while continuing to communicate.\n\nDeveloper A will design the implementation of the methods, adhering to the previously decided design.\nDeveloper B will concurrently write tests for the same method signatures, without knowing details of the implementation.\n\nIntegration & Testing Phase: Both developers commit their code and run the tests.',
'source': '..\\data\\docs\\code-with-engineering\\agile-development\\advanced-topics\\collaboration\\virtual-collaboration.md',
'chunkId': 'chunk16_3'},
{'chunkContent': 'In order to minimize the risk and set the expectations on the right way for all parties, an identification phase is important to understand each other.\nSome potential steps in this phase may be as following (not limited):\n\nWorking agreement\n\nIdentification of styles/preferences in communication, sharing, learning, decision making of each team member\n\nTalking about necessity of pair programming\n\nDecisions on backlog management & refinement meetings, weekly design sessions, social time sessions...etc.\n\nSync/Async communication methods, work hours/flexible times\n\nDecisions and identifications of charts that will be helpful to provide transparent and true information to everyone\n\nIdentification of "Software Craftspersonship" areas which means the tools and methods will be widely used during the engagement and taking the required actions on team upskilling side if necessary.',
'source': '..\\data\\docs\\code-with-engineering\\agile-development\\advanced-topics\\collaboration\\teaming-up.md',
'chunkId': 'chunk15_1'},
{'chunkContent': 'Integration & Testing Phase: Both developers commit their code and run the tests.\n\nUtopian Scenario: All tests run and pass correctly.\nRealistic Scenario: The tests have either broken or failed due to flaws in testing. This leads to further clarification of the design and a discussion of why the tests failed.\n\nThe developers will repeat the three phases until the code is functional and tested.\n\nWhen to follow the RTT strategy\n\nRTT works well under specific circumstances. If collaboration needs to happen virtually, and all communication is virtual, RTT reduces the need for constant communication while maintaining the benefits of a joint design session. This considers the human element: Virtual communication is more exhausting than in person communication.',
'source': '..\\data\\docs\\code-with-engineering\\agile-development\\advanced-topics\\collaboration\\virtual-collaboration.md',
'chunkId': 'chunk16_4'}]
4. Create a prompt#
def create_prompt(query, documents):
    system_prompt = """
    Instructions:
    You are an AI assistant that helps users answer questions given a specific context.
    You will be given a context (Retrieved Documents) and asked a question (User Question) based on that context.
    Your answer should be as precise as possible and should only come from the context.
    Please add a citation after each sentence when possible, in the form "(Source: source+chunkId)",
    where both 'source' and 'chunkId' are taken from the Retrieved Documents.
    """

    user_prompt = f"""
    ## Retrieved Documents
    {documents}

    ## User Question
    {query}
    """

    final_message = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt + "\nEND OF CONTEXT"},
    ]
    return final_message
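To see exactly what the model will receive, you can assemble the messages with the query and embeddings from the previous step and inspect them:
messages = create_prompt(query, search_documents(embedded_query))
print(messages[0]["content"])        # system instructions
print(messages[1]["content"][:300])  # retrieved context + question, truncated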
Create a function to call the Chat Completion endpoint#
For this, we will use the OpenAI library for Python:
from openai import AzureOpenAI

def call_llm(messages: list[dict]):
    client = AzureOpenAI(
        api_key=azure_openai_key,
        api_version=azure_openai_api_version,
        azure_endpoint=azure_aoai_endpoint,
    )
    response = client.chat.completions.create(
        model=azure_openai_chat_deployment, messages=messages
    )
    return response.choices[0].message.content
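If you later want more reproducible answers for evaluation, the same endpoint also accepts sampling parameters. A variant (temperature is a standard chat-completions parameter; the value 0 is just a suggestion):
def call_llm_deterministic(messages: list[dict]) -> str:
    client = AzureOpenAI(
        api_key=azure_openai_key,
        api_version=azure_openai_api_version,
        azure_endpoint=azure_aoai_endpoint,
    )
    response = client.chat.completions.create(
        model=azure_openai_chat_deployment,
        messages=messages,
        temperature=0,  # minimize sampling randomness between runs
    )
    return response.choices[0].message.content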
5. Finally, put all the pieces together#
Note: A RAG solution usually includes an intent extraction step. However, since we are building a QA system rather than a chat application, in this workshop we assume that the intent is the query itself.
def custom_rag_solution(query):
    try:
        # 1. Embed the query with the same embedding model used for the data in the index
        query_embeddings = oai_query_embedding(query)
        # (Intent recognition would normally happen here - skipped in our workshop)
        # 2. Search for relevant documents
        search_response = search_documents(query_embeddings)
        # 3. Create the prompt from the query and the retrieved documents
        prompt_from_chunk_context = create_prompt(query, search_response)
        # 4. Call the Azure OpenAI GPT model
        response = call_llm(prompt_from_chunk_context)
        return response
    except Exception as e:
        print(f"Error: {e}")
Try it out#
query = "What does the develop phase include?"
print(f"User question: {query}")
response = custom_rag_solution(query)
print(f"Response: {response}")
User question: What does the develop phase include?
Response: The develop phase includes designing the interface, which involves creating method signatures and names, writing documentation for the methods, and making architecture decisions that would influence testing (Source: code-with-engineering/agile-development/advanced-topics/collaboration/virtual-collaboration.md, chunk16_3).
Perfect! This answer seems to make sense.
Now… what?#
Is this good enough?
What does good enough even mean?
How can I prove that this works as expected?
What does works as expected even mean?!
Let’s go to Chapter 3. Experimentation to try to tackle these questions.