Semantic Kernel, Prompt Flow: Evaluate your plugins
Once again, I’ve embarked on a journey to grasp the fundamentals of deep learning and machine learning. My motivation stems from a desire to enhance my mastery of Azure OpenAI and gain a deeper understanding of its inner workings. As I delve into this realm, I’ve come to appreciate the significance of evaluating the model. This realization has sparked my curiosity about the methods employed in OpenAI for assessing the outcomes.
In a previous post, I delved into the concepts of Semantic Kernel and touched on some aspects of planners. Now, I want to shine a light on yet another intriguing tool: Prompt Flow.
Determining whether your descriptions are effective can be a challenge. In this segment, we will look at how you can leverage Prompt Flow to evaluate both plugins and planners and make sure they consistently produce the desired results. From the Microsoft Docs:
In the overview and planner articles, we demonstrated the importance of providing descriptions for your plugins so planners can effectively use them for autogenerated plans. Knowing whether or not your descriptions are effective, however, can be difficult. In this section, we’ll describe how you can use Prompt flow to evaluate plugins and planners to ensure that they are consistently producing the desired results.
If we parallel this with building our own model (seen in the previous Deep Learning posts), we understand that evaluation is a major concept in the AI field.
So, is evaluating plugins all it does? Nope:
- Empower Prompt Flow with the capabilities of planners: Prompt Flow excels at defining and executing static chains of functions, which suits many AI applications. However, it falls short in scenarios where you expect an AI application to dynamically adapt to new inputs and situations. This is where Semantic Kernel comes into play.
- Streamline the evaluation of Semantic Kernel: with Prompt Flow, you can harness the capabilities of Azure ML to assess the accuracy, performance, and error rates of your plugins and planners.
- Effortlessly deploy Semantic Kernel to Azure ML: finally, Prompt Flow’s deployment feature lets you deploy your Semantic Kernel applications to Azure Machine Learning with minimal effort, streamlining the whole deployment process.
Let’s Dive into Prompt Flow!
Okay, last time we created a Sherlock plugin that lets us fetch an investigation document and try to solve the case using our ChatGPT-like assistant!
Let’s now create a plugin and use Prompt Flow on top of it to evaluate our model... oops, sorry, let me say that again: to evaluate our PLUGIN! (Yes, the steps look a lot like the Deep Learning ones.)
Requirements:
In VS Code, you will need the Prompt Flow for VS Code extension:
You also need the promptflow package:
pip install promptflow promptflow-tools
Everything ready? Let’s go!
Plugin’s PromptFlow
- Create a PromptFlow:
pf flow init --flow performSherlock
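This scaffolds a new flow folder. The exact contents can vary with your promptflow version, but the files we care about here are the flow definition, the default Python node that we will replace with our planner code, and a requirements.txt for extra dependencies:
performSherlock/
├── flow.dag.yaml       # the flow definition we will edit below
├── hello.py            # the default Python node, soon to hold our planner code
└── requirements.txt    # extra Python dependencies for the flow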
- Create a Plugin:
Let’s first create a plugin; mine will simply answer two different questions, to keep things simple:
from semantic_kernel.skill_definition import (
    sk_function,
    sk_function_context_parameter,
)
from semantic_kernel.orchestration.sk_context import SKContext


class Sherlock:
    @sk_function(
        description="Returns the wheels number of a car",
        name="wheels"
    )
    def wheels_car(self) -> str:
        return "4"

    @sk_function(
        description="Returns the number of glasses in Marine's House",
        name="windows",
    )
    def windows_house(self) -> str:
        return "20"
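Before wiring this into a flow, a quick sanity check never hurts. Here is a minimal sketch that calls the native functions directly, with no kernel or planner involved (assuming we are in the same module as the Sherlock class above, or have imported it):
# Minimal sanity check of the plugin, outside of any kernel or planner
plugin = Sherlock()
print(plugin.wheels_car())     # expected: "4"
print(plugin.windows_house())  # expected: "20"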
- Create a Planner:
import asyncio

from promptflow import tool
import semantic_kernel as sk
from semantic_kernel.planning.action_planner import ActionPlanner
from plugins.SherlockPlugin.Sherlock import Sherlock as Sherlock
from promptflow.connections import (
    AzureOpenAIConnection,
)
from semantic_kernel.connectors.ai.open_ai import (
    AzureChatCompletion,
    AzureTextCompletion,
)
import semantic_kernel.connectors.ai.open_ai as sk_oai


@tool
def my_python_tool(
    input: str,
    deployment_type: str,
    deployment_name: str,
    AzureOpenAIConnection: AzureOpenAIConnection,
) -> str:
    # Initialize the kernel
    kernel = sk.Kernel(log=sk.NullLogger())
    print(AzureOpenAIConnection)

    chat_service = sk_oai.AzureChatCompletion(
        deployment_name=deployment_name,
        endpoint=AzureOpenAIConnection.api_base,
        api_key=AzureOpenAIConnection.api_key,
        api_version="2023-12-01-preview",
    )
    kernel.add_chat_service("chat-gpt", chat_service)

    planner = ActionPlanner(kernel=kernel)

    # Import the native functions
    Sherlock_plugin = kernel.import_skill(Sherlock(), "SherlockPlugin")
    print("Kernel")
    print(kernel)

    ask = "Use the available Sherlock functions to solve this word problem: " + input
    plan = asyncio.run(planner.create_plan_async(ask))
    print("MY QUESTION :")
    print(plan)

    # Execute the plan
    result = asyncio.run(plan.invoke_async()).result

    for index, step in enumerate(plan._steps):
        print("Function: " + step.skill_name + "." + step._function.name)
        print("Input vars: " + str(step.parameters.variables))
        print("Output vars: " + str(step._outputs))
    print("Result: " + str(result))

    return str(result)
- Modify our YAML file to create a flow:
$schema: https://azuremlschemas.azureedge.net/promptflow/latest/Flow.schema.json
environment:
  python_requirements_txt: requirements.txt
inputs:
  text:
    type: string
    default: How many windows has marine ?
outputs:
  output_prompt:
    type: string
    reference: ${echo_my_prompt.output}
nodes:
- name: echo_my_prompt
  type: python
  source:
    type: code
    path: hello.py
  inputs:
    AzureOpenAIConnection: sherlock_plugin
    input: ${inputs.text}
    deployment_type: Standard
    deployment_name: gpt-35-turbo
Now we need an “AzureOpenAIConnection”. How do we create it?
In the Connections tab, create your connection.
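If you prefer the command line over the extension’s Connections tab, you can also create the connection from a YAML file. Here is a sketch of what that could look like (the exact schema may differ slightly between promptflow versions; the name must match the sherlock_plugin connection referenced in the flow above):
# connection.yaml: assumed layout for an Azure OpenAI connection
$schema: https://azuremlschemas.azureedge.net/promptflow/latest/AzureOpenAIConnection.schema.json
name: sherlock_plugin
type: azure_open_ai
api_key: "<your-azure-openai-key>"
api_base: "https://<your-resource>.openai.azure.com/"
api_type: azure
api_version: "2023-12-01-preview"
Then register it with:
pf connection create --file connection.yaml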
- Run the PromptFlow:
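You can run it from the VS Code extension, or, if you prefer the terminal, test a single line with the CLI (a sketch, assuming you run it from inside the performSherlock folder):
pf flow test --flow . --inputs text="How many windows has marine ?"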
Looks good! How do we run a batch of questions?
Well, I have created a dataset file:
{"text": "How many wheels have a car","groundtruth":"4"}
{"text": "How many windows marine has","groundtruth":"20"}
Let’s run a batch by clicking here:
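If you would rather start the batch from the CLI than from the button, the command should look roughly like this (a sketch, assuming data.jsonl sits next to the flow folder):
pf run create --flow ./performSherlock --data ./data.jsonl --column-mapping text='${data.text}' --stream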
Once it’s done, let’s check our results:
Now that we have our outputs, let’s evaluate them.
Evaluation PromptFlow
Remember what we talked about in the last blog post on neural network classification? The loss and the accuracy? Well, let’s do the same to evaluate our Planner & Plugin:
- Create an aggregation function:
from typing import List

from promptflow import tool
from promptflow import log_metric


@tool
def accuracy_aggregate(processed_results: List[int]):
    num_exception = 0
    num_correct = 0

    for i in range(len(processed_results)):
        if processed_results[i] == -1:
            num_exception += 1
        elif processed_results[i] == 1:
            num_correct += 1

    num_total = len(processed_results)
    accuracy = round(1.0 * num_correct / num_total, 2)
    error_rate = round(1.0 * num_exception / num_total, 2)

    log_metric(key="accuracy", value=accuracy)
    log_metric(key="error_rate", value=error_rate)

    return {
        "num_total": num_total,
        "num_correct": num_correct,
        "num_exception": num_exception,
        "accuracy": accuracy,
        "error_rate": error_rate
    }


if __name__ == "__main__":
    # Quick local test: each entry is a processed result code (1 = correct, 0 = wrong, -1 = exception)
    numbers = [1, 1]
    accuracy = accuracy_aggregate(numbers)
    print("The accuracy is", accuracy)
- Create a line processing function:
from promptflow import tool


@tool
def line_process(groundtruth: str, prediction: str) -> int:
    processed_result = 0

    if prediction == "JSONDecodeError" or prediction.startswith("Unknown Error:"):
        processed_result = -1
        return processed_result

    try:
        groundtruth = int(groundtruth)
        prediction = int(prediction)
    except ValueError:
        processed_result = -1
        return processed_result

    if round(prediction, 2) == round(groundtruth, 2):
        processed_result = 1

    return processed_result


if __name__ == "__main__":
    processed_result = line_process("2", "2")
    print("The processed result is", processed_result)

    processed_result = line_process("2", "3")
    print("The processed result is", processed_result)

    processed_result = line_process("20", "2")
    print("The processed result is", processed_result)
- Create the Flow:
$schema: https://azuremlschemas.azureedge.net/promptflow/latest/Flow.schema.json
inputs:
  groundtruth:
    type: string
    default: "1"
  prediction:
    type: string
    default: "2"
outputs:
  score:
    type: string
    reference: ${line_process.output}
nodes:
- name: line_process
  type: python
  source:
    type: code
    path: line_process.py
  inputs:
    groundtruth: ${inputs.groundtruth}
    prediction: ${inputs.prediction}
- name: aggregate
  type: python
  source:
    type: code
    path: aggregate.py
  inputs:
    processed_results: ${line_process.output}
  aggregation: true
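Before running it against the batch outputs, you can sanity-check the scoring logic on a single line (a sketch; the aggregate node is meant for batch runs, so this mainly exercises line_process):
pf flow test --flow ./sherlockEvaluation --inputs groundtruth="20" prediction="20"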
Remember, all of this is to test the plugin we just created. So let’s calculate its accuracy:
pf run create --flow C:\Users\aminecharot\Documents\file\openAiProject01\backend\sherlockEvaluation --data ./data.jsonl --column-mapping groundtruth='${data.groundtruth}' prediction='${run.outputs.output_prompt}' --run performSherlock_default_20240112_155443_440000 --stream --name pe001
In this command, I am instructing it to execute the ‘sherlockEvaluation’ flow on the data stored in ‘./data.jsonl’ (created in a previous step). The evaluation essentially needs both the ground truth and the predicted output. We obtain the predicted output from the plugin’s prompt flow batch run, through the mapping prediction='${run.outputs.output_prompt}', taken from the run named 'performSherlock_default_20240112_155443_440000'.
To sum it up, this command launches the evaluation prompt flow, reusing the outputs obtained from the batch run of the plugin’s prompt flow.
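Once the evaluation run finishes, the aggregated metrics can also be pulled back from the CLI (a sketch, reusing the pe001 run name from the command above):
pf run show-metrics --name pe001
pf run show-details --name pe001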
Since it was not a huge plugin, the accuracy should be 100%:
Here it is!