RAG, Semantic Kernel, LangChain, Azure OpenAI? #AzureSpringClean

Amine Charot
Dec 27, 2023


Hello and welcome to Azure Spring Clean 2024! In this blog post we are going to talk about:

  • Retrieval Augmented Generation
  • Embedding, Semantic Search, Vector
  • Semantic Kernel, LangChain

Don’t hesitate to visit Azure Spring Clean for more amazing posts!

Retrieval Augmented Generation

Retrieval-Augmented Generation (RAG) is an architecture that augments the capabilities of a Large Language Model (LLM) like ChatGPT by adding an information retrieval system that provides grounding data. Adding an information retrieval system gives you control over the grounding data used by an LLM when it formulates a response. For an enterprise solution, RAG architecture means that you can constrain generative AI to your enterprise content sourced from vectorized documents, images, audio, and video. Source: RAG and generative AI — Azure AI Search | Microsoft Learn

Imagine you are a detective who needs to solve a mystery. You have a query, which is the case you are working on, and you need to find some clues to crack it. You also have a partner, who is an expert in finding and analyzing information. Your partner is the retriever, and you are the generator.

The retriever has access to a huge library of documents, which are sources of knowledge that can help you with your query. The retriever can scan the documents and select the most relevant ones for your case. The retriever can also rank the documents by their importance and similarity to your query.

The generator is responsible for producing the final response, which is the solution to your mystery. The generator uses a powerful language model that can generate fluent and coherent texts. The generator can also incorporate the information from the documents that the retriever selected, as well as from your query, to make the text more informative and relevant.

Using Azure, you can easily implement RAG for your own applications. You can use Azure Search to create an index of documents that you want to use as knowledge sources.
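To make the flow concrete, here is a minimal retrieve-then-generate sketch in Python, assuming the 0.28-era openai package and the azure-search-documents client; the "clues-index" index name and the "content" field are hypothetical, while the environment variable names mirror the ones used in the demo later in this post:

import os

import openai
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

# Retriever: Azure AI Search over an index of your documents.
search_client = SearchClient(
    endpoint=os.environ["AZURE_COGNITIVE_SEARCH_ENDPOINT"],
    index_name="clues-index",  # hypothetical index name
    credential=AzureKeyCredential(os.environ["AZURE_COGNITIVE_SEARCH_API_KEY"]),
)

# Generator: an Azure OpenAI chat deployment.
openai.api_type = "azure"
openai.api_base = os.environ["OPENAI_API_BASE"]
openai.api_version = "2023-03-15-preview"
openai.api_key = os.environ["OPENAI_API_KEY"]

def answer(query: str) -> str:
    # 1. Retrieve the most relevant documents for the query.
    results = search_client.search(search_text=query, top=3)
    context = "\n\n".join(doc["content"] for doc in results)  # "content" is a hypothetical field
    # 2. Generate a response grounded in the retrieved context.
    response = openai.ChatCompletion.create(
        engine=os.environ["OPENAI_MODEL_NAME"],  # chat deployment name
        messages=[
            {"role": "system", "content": "Answer using only this context:\n" + context},
            {"role": "user", "content": query},
        ],
    )
    return response["choices"][0]["message"]["content"]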

Embedding, Semantic Search, Vector

Embedding, semantic search and vector are concepts related to natural language processing, a field of artificial intelligence that deals with understanding and generating natural language.

Embedding is a process of transforming words or sentences into numerical vectors that capture their meaning and context. For example, the word “apple” can be represented by a vector [0.2, -0.5, 0.7] and the word “orange” by a vector [0.3, -0.4, 0.6]. These vectors are not random, but learned from large amounts of text data using algorithms such as word2vec or BERT.
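In practice you obtain such vectors from an embedding model rather than training one yourself. A minimal sketch with the 0.28-era openai package, reusing the environment variable names from the demo later in this post:

import os

import openai

openai.api_type = "azure"
openai.api_base = os.environ["OPENAI_API_BASE"]
openai.api_version = "2023-03-15-preview"
openai.api_key = os.environ["OPENAI_API_KEY"]

# Embed a word (or sentence) with an Azure OpenAI ada-002 deployment.
resp = openai.Embedding.create(
    engine=os.environ["OPENAI_ADA_EMBEDDING_DEPLOYMENT_NAME"],
    input="apple",
)
vector = resp["data"][0]["embedding"]  # 1536 floats for text-embedding-ada-002
print(len(vector))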

Semantic search is a technique of finding the most similar texts or documents based on their embeddings. For example, if the detective wants to find all the clues that mention “fruit”, he can use semantic search to compare the embedding of “fruit” with the embeddings of all the clues and retrieve the ones that have the highest similarity score. This way, he can narrow down his search and focus on the most relevant information.

Vector is a mathematical object that has both magnitude and direction. It can be used to represent any quantity that has these two properties, such as force, velocity, or position. In embedding and semantic search, vectors are used to represent the meaning and context of words or sentences in a numerical way that can be easily manipulated and compared.

To illustrate these concepts with a parallel story, imagine a detective who needs to solve a mystery. The detective has a collection of clues, such as fingerprints, DNA samples, witness statements, etc. Each clue can be seen as an embedding, a vector that represents some aspect of the crime scene or the suspect. The detective also has a question, such as “Who is the killer?” or “Where is the missing person?”. The question can also be seen as an embedding, a vector that represents the detective’s intent and information need. The detective then performs a semantic search, which means finding the most relevant clues based on their similarity or distance to the question vector. The semantic search can use different methods, such as cosine similarity, dot product, or Euclidean distance, to compare vectors and rank clues. The detective then examines the top-ranked clues and tries to draw conclusions and solve the mystery.
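As a toy illustration of that ranking step, here is cosine similarity over the three-dimensional example vectors from above (real embeddings have around 1536 dimensions; the "car" vector is made up for contrast):

import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

query = [0.25, -0.45, 0.65]  # say, the embedding of "fruit"
clues = {
    "apple": [0.2, -0.5, 0.7],
    "orange": [0.3, -0.4, 0.6],
    "car": [-0.8, 0.1, 0.2],  # a made-up unrelated word
}

# Rank the clues by similarity to the query vector, best first.
for word, vec in sorted(clues.items(), key=lambda kv: -cosine_similarity(query, kv[1])):
    print(word, round(cosine_similarity(query, vec), 3))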

Semantic Kernel, LangChain

Semantic Kernel SDK and LangChain are two frameworks that enable developers to build applications using large language models (LLMs) such as OpenAI, Azure OpenAI, and Hugging Face.

Semantic Kernel SDK is an open-source project by Microsoft that allows developers to define plugins that can be chained together and orchestrated by an LLM planner.

  • Plugins are the building blocks of Semantic Kernel SDK. They are pieces of code, written in C#, Python, or Java, that perform various tasks such as calling external services, manipulating data, or generating content. Plugins can be annotated with attributes that describe their functionality and parameters to the LLMs, and they can be chained together into complex workflows that Semantic Kernel SDK executes (a minimal plugin sketch follows this list).
  • Planners are the brains of Semantic Kernel SDK. They are LLMs that can generate and execute plans based on user goals. Planners can use natural language to ask the user for clarifications, inputs, or feedback, and to provide outputs or suggestions. Planners can also use variables to store and manipulate information across sessions. Planners can leverage the plugins to perform actions that achieve the user goals.
  • Memory is the storage of Semantic Kernel SDK. It is a feature that allows Semantic Kernel SDK to store and retrieve information across sessions using a key-value store. Memory can be used to remember user preferences, context, history, etc. Memory can also be accessed and modified by the plugins and the planners.
  • Embeddings are the representations of Semantic Kernel SDK. They are vectors that capture the semantic meaning of words, phrases, sentences, or documents. Embeddings can be used to compare, search, or cluster similar items based on their meaning. Embeddings can also be used to store and retrieve memory more efficiently and accurately.
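To make the plugin idea concrete, here is a minimal native-function plugin, sketched against the same pre-1.0 semantic-kernel Python API used in the demo below; the plugin itself (ClockPlugin) is made up for illustration:

from datetime import datetime, timezone

from semantic_kernel.orchestration.sk_context import SKContext
from semantic_kernel.skill_definition import sk_function


class ClockPlugin:
    """A toy plugin a planner could call when a goal needs the current time."""

    @sk_function(
        description="Returns the current UTC time as an ISO 8601 string.",
        name="utc_now",
    )
    def utc_now(self, context: SKContext) -> str:
        return datetime.now(timezone.utc).isoformat()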

More information: Create AI agents with Semantic Kernel | Microsoft Learn

LangChain is a popular library that provides an easy-to-use interface and integrations with several tools to help build workflow chains. Both frameworks have their own advantages and disadvantages, depending on the use case and the level of control desired by the developer.

More information: LangChain
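For comparison, here is a minimal LangChain chain, sketched against the pre-0.1 langchain Python API that was current when this post was written; the deployment name is a placeholder:

import os

from langchain.chains import LLMChain
from langchain.chat_models import AzureChatOpenAI
from langchain.prompts import PromptTemplate

# An Azure OpenAI chat model wrapped for LangChain.
llm = AzureChatOpenAI(
    deployment_name="gpt-35-turbo",  # hypothetical Azure OpenAI deployment name
    openai_api_type="azure",
    openai_api_base=os.environ["OPENAI_API_BASE"],
    openai_api_key=os.environ["OPENAI_API_KEY"],
    openai_api_version="2023-03-15-preview",
)

# A single-step chain: prompt template -> LLM.
prompt = PromptTemplate.from_template("Summarize this clue for a detective: {clue}")
chain = LLMChain(llm=llm, prompt=prompt)
print(chain.run(clue="A joker playing card was left at the scene."))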

Some of the pros and cons of Semantic Kernel SDK are:

  • Pros: It supports multiple programming languages (C#, Python, and Java); it has a powerful planning feature that can generate and execute plans based on user goals; it has a memory feature that stores and retrieves information across sessions; and it has a plugin system that integrates with external tools and services.
  • Cons: It is less mature and stable than LangChain, it has fewer out-of-the-box tools and integrations, and it requires more coding and configuration to set up and use.

Some of the pros and cons of LangChain are:

  • Pros: It is built around Python and JavaScript, which are widely used and familiar languages; it has more out-of-the-box tools and integrations; it has a simple and intuitive interface that lowers the barrier for beginners; and it has a large and active community of users and contributors.
  • Cons: It only supports Python and JavaScript, and its approach to planning, memory, and tool integration is looser than Semantic Kernel SDK’s planner, memory, and plugin model, which can mean more manual orchestration for goal-driven scenarios.

Demo — Sherlock GPT:

The demo is about a chatbot we created to help detective Sherlock Holmes solve any mystery.

I was playing with Semantic Kernel and created a small project called isGPT. It is powered by Semantic Kernel SDK, LangChain, Python, and Vue.

This project is a ChatGPT-like app that allows you to:

  • Chatbot: it works with your own data and needs at least one plugin to work!
  • Create and upload an index in Azure Search: I created a default plugin connected to Azure Search that lets us query it whenever we want. To do so we need an index, which we can create automatically from the UI.
  • Manage and import a plugin (skill): to automate things and adapt to any idea you have, we can manage plugins and import them as needed. In other words, we can just create a plugin, import it, and start using the ChatGPT-like app.

Examples:

Say I want to query my resources in Azure and then run KQL against them: we can create a plugin with a Semantic Function that transforms the input into a KQL query, and the plugin then sends a request to the Graph API with this query.
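The skprompt.txt of such a Semantic Function could be as simple as this (a hypothetical sketch, not the project’s actual prompt):

Convert the following user request into a single KQL query
for Azure Resource Graph. Return only the query, with no explanation.
Request: {{$query}}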

Maybe I don’t want to query Azure Search but another database, say SQL: we can do the same and just create a plugin that achieves this task, import it, and then enable it:

How does it work?

The readme is available: GitHub — charotAmine/readme-gpt

Demo — Create a Plugin for Sherlock:

With ChatGPT, Sherlock can upload any mystery he wants to the Azure Search index, and we can add a Semantic Kernel plugin that will analyze the mystery and provide clues, hints and suggestions.

ChatGPT is not only a chatbot, but also a smart assistant that can help Sherlock crack the most challenging cases.

Create Sherlock Plugin:

Our GPT first answers like this:

Let’s now customize it to our end-user case (Sherlock) so that the chat can answer questions about mysteries!

Thanks to Semantic Kernel, we can abstract away all the code behind the chat and focus only on the use case: instead of recoding the whole chat, we just add a plugin that adapts it to the need:

First, let’s create what we call a Semantic Function:

getIntent: this function takes the user input and transforms it into an intent, so we can produce a more relevant response.

To create a Semantic Function, we need the following directory:
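Following the Semantic Kernel convention of one folder per function inside the plugin folder, the layout is:

sherlockPlugin/
└── getIntent/
    ├── config.json   (the function's configuration)
    └── skprompt.txt  (the prompt template)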

The config.json looks like:

{
  "type": "completion",
  "description": "An AI assistant that infers user intent.",
  "completion": {
    "temperature": 0.7,
    "top_p": 0.5,
    "max_tokens": 200,
    "number_of_responses": 1,
    "presence_penalty": 0,
    "frequency_penalty": 0
  },
  "input": {
    "parameters": [
      {
        "name": "query",
        "description": "The question asked by the user.",
        "defaultValue": ""
      },
      {
        "name": "chat_history",
        "description": "All the user and assistant messages so far.",
        "defaultValue": ""
      }
    ]
  }
}

and skprompt.txt is:

You're an AI assistant reading the transcript of a conversation
between a user and an assistant. Given the chat history and the
user's query, infer the user's real intent.
The user is a detective who needs to solve several mysteries.
Chat history: ```{{$chat_history}}```
User's query: ```{{$query}}```

Then let’s create a second Semantic Function:

response: this function helps improve the answer and gives a realistic response to our detective. To create it, we again need a directory layout like:
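As before, one folder per function:

sherlockPlugin/
└── response/
    ├── config.json
    └── skprompt.txt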

The skprompt looks like:

You're a helpful assistant called WatsonAI.
Always finish with "thanks for asking".
Never mention that you found the answer in a context.
Please answer the user's question using the information
found in the context.
If the user's question is unrelated to the information in the
context, say you don't know.
If the user is greeting you, always say "let's solve the mystery, Sherlock!"
Chat history: {{$chat_history}}
Context: {{$context}}
User: {{$query}}

Finally, let’s create a Native Function. This function will interact with Azure Search:

from semantic_kernel import Kernel, ContextVariables
from semantic_kernel.skill_definition import (
    sk_function,
    sk_function_context_parameter,
)
from semantic_kernel.orchestration.sk_context import SKContext
from semantic_kernel.connectors.ai.open_ai import (
    AzureChatCompletion,
    OpenAITextEmbedding,
)
from semantic_kernel.connectors.memory.azure_cognitive_search import (
    AzureCognitiveSearchMemoryStore,
)

import os
from dotenv import load_dotenv

load_dotenv()

AZURE_OPENAI_API_TYPE = "azure"
AZURE_OPENAI_API_BASE = os.getenv("OPENAI_API_BASE")
AZURE_OPENAI_API_VERSION = "2023-03-15-preview"
AZURE_OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
AZURE_OPENAI_EMBEDDING_DEPLOYMENT = os.getenv("OPENAI_ADA_EMBEDDING_DEPLOYMENT_NAME")
AZURE_SEARCH_ENDPOINT = os.getenv("AZURE_COGNITIVE_SEARCH_ENDPOINT")
AZURE_SEARCH_KEY = os.getenv("AZURE_COGNITIVE_SEARCH_API_KEY")
AZURE_OPENAI_CHATGPT_DEPLOYMENT = os.getenv("OPENAI_MODEL_NAME")


class SherlockSearch:

    async def get_context(self, query: str, index_name: str) -> list[str]:
        """Gets the relevant documents from Azure Cognitive Search."""
        kernel = Kernel()
        # Register the embedding service used to vectorize the query.
        kernel.add_text_embedding_generation_service(
            "openai-embedding",
            OpenAITextEmbedding(
                model_id=AZURE_OPENAI_EMBEDDING_DEPLOYMENT,
                api_key=AZURE_OPENAI_API_KEY,
                endpoint=AZURE_OPENAI_API_BASE,
                api_type=AZURE_OPENAI_API_TYPE,
                api_version=AZURE_OPENAI_API_VERSION,
            ),
        )
        # Use Azure Cognitive Search as the kernel's memory store.
        kernel.register_memory_store(
            memory_store=AzureCognitiveSearchMemoryStore(
                vector_size=1536,  # dimension of text-embedding-ada-002 vectors
                search_endpoint=AZURE_SEARCH_ENDPOINT,
                admin_key=AZURE_SEARCH_KEY,
            )
        )

        # Retrieve the 10 most similar documents for the query.
        docs = await kernel.memory.search_async(index_name, query, limit=10)
        return [doc.text for doc in docs]

    @sk_function(
        description="This function gets the information needed to help the detective solve the mystery.",
        name="index_search",
        input_description="Questions about the mystery that the detective is trying to solve",
    )
    async def find_response(self, context: SKContext) -> str:
        chat_history = context["chat_history"]
        query = context["query"]
        user_query = context["user_query"]
        index_name = context["index_name"]
        kernel: Kernel = context["kernel"]

        variables = ContextVariables()
        variables["query"] = query
        variables["chat_history"] = chat_history
        variables["user_query"] = user_query
        variables["context"] = index_name
        variables["options"] = "general"

        # Step 1: distill the user's real intent with the getIntent semantic function.
        intent_function = kernel.skills.get_function("sherlockPlugin", "getIntent")
        response = await kernel.run_async(
            intent_function,
            input_vars=variables,
        )
        intent_general = response["input"]

        # Step 2: retrieve documents relevant to that intent from Azure Search.
        list_context = await self.get_context(intent_general, index_name)
        variables["context"] = "\n\n".join(list_context)

        # Step 3: phrase the grounded answer with the response semantic function.
        chat_function = kernel.skills.get_function("sherlockPlugin", "response")
        output = await kernel.run_async(
            chat_function,
            input_vars=variables,
        )

        return output["input"]
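For completeness, this is roughly how such a plugin gets registered on the kernel with this SDK version (a sketch; the "plugins" parent directory is hypothetical):

# Load the semantic functions (getIntent, response) from their directory,
# then add the native function class under the same plugin name.
kernel.import_semantic_skill_from_directory("plugins", "sherlockPlugin")
kernel.import_skill(SherlockSearch(), skill_name="sherlockPlugin")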

Once that’s done, let’s upload the plugin!

Let’s enable the plugin we are interested in; in this case it will be SherlockPlugin!

Now that the plugin is enabled:

We have our WatsonAI!

Upload Case

Let’s assume we have a case to solve: let’s upload it to Azure Search and try our WatsonAI (the plugin we just added) to check whether it can find the solution. The platform also supports uploading indexes!

The case:

The Case of the Vanishing Heirloom:

In the quaint village of Harrowdale, an eccentric collector named Reginald Hawthorne is known for his prized possession—an ancient family heirloom, the Enigmatic Emerald, rumored to hold mystical powers. One foggy morning, the village is abuzz with whispers that the Enigmatic Emerald has mysteriously disappeared from its secured display in Hawthorne's private gallery. As an aspiring detective, you decide to take on the challenge and investigate.

Clue 1: The Disappearing Act

Upon arriving at Hawthorne Manor, you find the gallery locked and guarded. The security logs show no signs of forced entry. The last recorded visit was by Hawthorne himself the night before. As you inspect the display case, you notice a faint residue and a single playing card left behind—a joker.

Clue 2: The Cryptic Note

In Hawthorne's study, you find a cryptic note on his desk:

"To reclaim the jewel that shines so bright,
Navigate the shadows and follow the light.
Three clues concealed, a puzzle to unfold,
In riddles and reflections, the truth will be told."

Clue 3: The Mirror Puzzle

The note leads you to a room filled with antique mirrors. One mirror stands out, reflecting an odd pattern of light on the floor. Investigating further, you find a hidden compartment containing a small, ornate key and a mirror shard engraved with the image of a clock pointing to 3:45.

Clue 4: The Clock Puzzle

At 3:45, you return to the gallery and use the key to unlock a secret compartment in the display case. Inside, you discover a miniature clockwork mechanism. Within the mechanism, a hollow space reveals a rolled-up parchment with another riddle:

"Beneath the boughs where whispers are heard,
The emerald rests, like a secret, undisturbed.
Follow the roots, let intuition guide,
In the heart of nature, the truth shall reside."

Help solve the case:

Code Repository : charotAmine/isGptLike (github.com)
