Benchmarks
Benchmarking is the process of evaluating the performance of a system or component against a set of predefined metrics. In ToothFairyAI, benchmarks store predefined sets of questions, answers, reasoning snippets, and context used to rate and verify the quality of generated responses. Unlike most platforms, ToothFairyAI benchmarks are agentic in nature: rather than evaluating the quality of a model's responses, they evaluate the quality of the agent's responses. This includes scenarios where the agent must generate a response based on a given set of documents (RAG) or use a series of tools to accomplish a task.
Benchmark Types
ToothFairyAI supports two types of benchmarks:
| Type | Description | Use Case |
|---|---|---|
| Standard BM | Evaluates agent responses against expected answers | Quality assurance, performance testing |
| Antagonistic BM (ABM) | Tests agent consistency over multi-turn conversations | Consistency testing, reliability verification |
Standard BM
Standard benchmarks evaluate an agent's responses against predefined correct answers. The reviewer model compares the agent's output with the expected answer, reasoning, and context to generate a quality score.
Antagonistic BM (ABM)
Antagonistic benchmarks feature a multi-turn conversation where two agents interact in a "battle" format to test the consistency and robustness of the Benchmarked Agent. The system uses three key components:
| Role | Description |
|---|---|
| Benchmarked Agent | The agent being evaluated - responds to questions and challenges |
| Examiner Agent | The tester agent - questions and probes the benchmarked agent |
| Reviewer Model | An independent model that scores the quality of responses from both agents |
ABM operates in two modes:
| Mode | Description | Question Requirement | Max Turns |
|---|---|---|---|
| Static | Examiner follows a fully scripted set of questions from the benchmark | 1-200 questions | Up to 50 turns |
| Dynamic | Examiner generates probing questions dynamically based on an initial seed question | 1 seed question minimum | Up to 50 turns |
How ABM Works
- Agent Selection: Select two agents - one as the Benchmarked Agent and one as the Examiner Agent
- Conversation Flow: The Examiner Agent engages the Benchmarked Agent in a multi-turn conversation
- Response Scoring: The Reviewer Model evaluates each response from the Benchmarked Agent
- Quality Assessment: Scores are calculated based on correctness, reasoning, and context adherence
- Turn Limit: Conversations can run up to 50 turns for comprehensive testing
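The steps above can be sketched as a simple loop. This is an illustrative sketch only, not ToothFairyAI's actual API: `examiner`, `benchmarked`, and `reviewer` are hypothetical callables standing in for the three roles, and the Dynamic-mode behavior of generating the next probing question from the conversation history is assumed.

```python
MAX_TURNS = 50  # upper limit on conversation length, per the ABM modes table

def run_abm(examiner, benchmarked, reviewer, seed_question, max_turns=MAX_TURNS):
    """Run a multi-turn antagonistic benchmark and collect per-turn scores."""
    history = []   # (question, answer) pairs accumulated as conversation context
    scores = []
    question = seed_question
    for _ in range(max_turns):
        answer = benchmarked(question, history)    # Benchmarked Agent responds
        scores.append(reviewer(question, answer))  # Reviewer Model scores the turn
        history.append((question, answer))
        question = examiner(history)               # Examiner probes further (Dynamic mode)
    return scores
```

In Static mode, `examiner` would simply pop the next scripted question from the benchmark instead of generating one from the history.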
ABM Use Cases
- Consistency Testing: Verify agents maintain coherent responses across extended conversations
- Robustness Validation: Test how well agents handle challenging or adversarial questions
- Edge Case Discovery: Identify weaknesses in agent responses through probing questions
- Performance Benchmarking: Compare agent performance against standardized adversarial tests
- Static Mode: Best for reproducible tests with known question sequences - ideal for regression testing
- Dynamic Mode: Best for discovering unknown weaknesses through AI-generated probing questions
Create a benchmark
- Click on the Create button.
- Assign a name and a description (optional) to the benchmark for easy identification.
- Upload the benchmark CSV file. The file should contain the following columns:
File size limits for benchmark CSV files:
- All subscriptions: Maximum 2MB per CSV file (fixed limit for benchmark files)
Question count requirements:
| Benchmark Type | Minimum | Recommended | Maximum |
|---|---|---|---|
| Standard BM | 1 | - | - |
| ABM Static | 1 | 10+ | 200 |
| ABM Dynamic | 1 | 1 | - |
| question | answer | reasoning | context |
|---|---|---|---|
| Question 1 | Answer 1 | Reasoning 1 | Context 1 |
| Question 2 | Answer 2 | Reasoning 2 | Context 2 |
| Question 3 | Answer 3 | Reasoning 3 | Context 3 |
For best compatibility of the CSV files you upload, it is recommended to use text qualification with double-quote notation.
For example:
"What is the capital of France?";"Paris";"The capital of France is Paris.";"Let's talk about geography."
"What is the capital of Italy?";"Rome";"The capital of Italy is Rome.";"Let's talk about best countries in the world."
When uploading a CSV file, it is important to ensure that the file is correctly formatted. The file must not contain a header row with the column names. Each field should be wrapped in double quotes and separated by semicolons.
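A file in this format (semicolon-separated, every field double-quoted, no header row) can be produced with Python's standard `csv` module. This is a sketch using the example rows from this page; the filename `benchmark.csv` is just an illustration:

```python
import csv

# Each row: question, answer, reasoning, context (2-4 fields per row).
rows = [
    ("What is the capital of France?", "Paris",
     "The capital of France is Paris.", "Let's talk about geography."),
    ("What is the capital of Italy?", "Rome",
     "The capital of Italy is Rome.", "Let's talk about geography."),
]

# QUOTE_ALL wraps every field in double quotes; delimiter=";" matches the
# semicolon-separated format described above. No header row is written.
with open("benchmark.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f, delimiter=";", quoting=csv.QUOTE_ALL)
    writer.writerows(rows)
```

`newline=""` is the documented way to open CSV files for writing in Python, and UTF-8 matches the encoding recommendation in the validation rules below.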
Upon uploading the file, the system will automatically parse the file and display the contents in the preview section, specifically the first 10 rows. The system will also show a Benchmark Compatibility indicator showing which benchmark types are supported.
If the file is not correctly formatted, an error message will be displayed.
- Click on the Save button to create the benchmark.
Benchmarks are available only for Pro, Business and Enterprise plans. Please contact us for more information to start creating benchmarks.
Benchmark Compatibility Indicators
After uploading a benchmark file, the system displays compatibility indicators:
| Indicator | Meaning |
|---|---|
| ✓ Standard BM | File has at least 1 question - compatible with Standard BM |
| ✗ Standard BM | No questions - not compatible |
| ✓ ABM Static | File has 10-200 questions - compatible with ABM Static mode |
| ✗ ABM Static | Fewer than 10 questions - not recommended for ABM Static |
| ⚠ ABM Static | More than 200 questions - exceeds maximum limit |
| ✓ ABM Dynamic | File has at least 1 question - compatible with ABM Dynamic mode |
| ✗ ABM Dynamic | No questions - not compatible |
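The indicator table above can be summarized as a small mapping from question count to indicators. This is a sketch of the thresholds as documented here, not the platform's actual implementation:

```python
def compatibility(n_questions: int) -> dict:
    """Map a parsed question count to the compatibility indicators."""
    # ABM Static: ✓ within 10-200, ⚠ above the 200 maximum, ✗ below 10.
    static = "⚠" if n_questions > 200 else ("✓" if n_questions >= 10 else "✗")
    return {
        "Standard BM": "✓" if n_questions >= 1 else "✗",
        "ABM Static": static,
        "ABM Dynamic": "✓" if n_questions >= 1 else "✗",
    }
```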
Benchmark details
- Question: The question that the agent is required to answer. This field mimics the user input.
- Answer: The expected answer that the agent should generate. This field is used to evaluate the quality of the response (primarily used in Standard BM).
- Reasoning: The reasoning behind the answer. This field is used to evaluate the quality of the response.
- Context: The context in which the question is asked. This field mimics the context in which the user is asking the question. For example the summary of previous turns in a conversation.
Validation Rules
The following validation rules apply when uploading benchmark CSV files:
Field Validation
- Each field must be wrapped in double quotes
- Fields are separated by semicolons (;)
- Maximum field length: 6,000 characters per field
- Minimum fields required: 2 (question and answer)
- Maximum fields allowed: 4 (question, answer, reasoning, context)
File Validation
- Maximum file size: 2MB
- No header row required
- UTF-8 encoding recommended
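The field and file rules above could be checked client-side before upload. The following is a sketch only, based on the limits documented on this page; ToothFairyAI's own parser may differ in details:

```python
import csv
import io

MAX_FIELD_LEN = 6000              # maximum characters per field
MAX_FILE_BYTES = 2 * 1024 * 1024  # 2MB file size limit

def validate_benchmark(raw: bytes) -> list:
    """Return a list of error messages for a benchmark CSV file."""
    errors = []
    if len(raw) > MAX_FILE_BYTES:
        errors.append("File exceeds 2MB")
    reader = csv.reader(io.StringIO(raw.decode("utf-8")), delimiter=";")
    for i, row in enumerate(reader, start=1):
        if not 2 <= len(row) <= 4:                  # question+answer required,
            errors.append(f"Line {i}: invalid number of fields")  # reasoning/context optional
            continue
        if not row[0].strip():
            errors.append(f"Line {i}: question field is required")
        if not row[1].strip():
            errors.append(f"Line {i}: answer field is required")
        if any(len(field) > MAX_FIELD_LEN for field in row):
            errors.append(f"Line {i}: field exceeds 6000 characters")
    return errors
```

An empty list means the file passed every documented rule; each error message mirrors one row of the error table below.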
Error Messages
| Error | Cause | Solution |
|---|---|---|
| "Invalid number of fields" | Line has fewer than 2 or more than 4 fields | Ensure each line has 2-4 semicolon-separated fields |
| "Question field is required" | Question field is empty | Provide a question for each row |
| "Answer field is required" | Answer field is empty | Provide an answer for each row |
| "Question exceeds 6000 characters" | Question too long | Shorten the question text |
| "File parsing error" | Malformed CSV structure | Check file formatting and encoding |
Orchestrator agents can now be benchmarked just like standard agents, enabling comprehensive performance testing of complex workflows. Desktop agents are not yet supported for benchmarking.
Orchestrator Benchmarking
Orchestrator agents can be benchmarked to evaluate workflow performance and coordination capabilities:
Key Capabilities:
- Workflow Performance Testing - Measure how well orchestrators accomplish specific workflows
- Condition-Based Evaluation - Test orchestrator performance under various conditions and scenarios
- Standard Benchmarking Interface - Same familiar benchmarking tools you use for regular agents
- Parallel Task Assessment - Evaluate how orchestrators handle multiple concurrent operations
- Reliability Metrics - Track success rates, execution times, and resource efficiency
Benchmarking Orchestrators:
- Create a benchmark with questions that require multi-step orchestration
- Assign the orchestrator agent as the benchmarked agent
- Run the benchmark to evaluate planning, execution, and coordination performance
When benchmarking orchestrators, include scenarios that test:
- Multi-agent coordination
- Dynamic plan adjustment
- Error recovery and re-planning
- Resource allocation decisions