Threshold optimization
Route score thresholds define whether a route is chosen. If the score we identify for a given route is higher than that route's `score_threshold`, it passes; otherwise either another route is chosen, or we return no route.
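Conceptually, the decision looks something like the sketch below. This is illustrative only, not the library's actual implementation, and the names `choose_route`, `scores`, and `thresholds` are hypothetical:

```python
# Illustrative only: a simplified version of the routing decision,
# not the library's actual implementation.
def choose_route(scores: dict[str, float], thresholds: dict[str, float]) -> str | None:
    # Keep only the routes whose similarity score clears their threshold.
    passing = {name: s for name, s in scores.items() if s >= thresholds[name]}
    # Return the highest-scoring passing route, or None if nothing passes.
    return max(passing, key=passing.get) if passing else None
```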
Given that this one `score_threshold` parameter can decide the choice of a route, it's important to get it right, but it's incredibly inefficient to do so manually. Instead, we can use the `fit` and `evaluate` methods of our `SemanticRouter`. All we must do is pass a small number of (utterance, target route) examples to these methods, and with `fit` we will often see dramatically improved performance within seconds.
Full Example
Define SemanticRouter
As usual, we will define our `SemanticRouter`. The `SemanticRouter` requires just `routes` and an `encoder`. If using dynamic routes, you must also define an `llm` (or use the OpenAI default).
We will start by defining four routes: politics, chitchat, mathematics, and biology.
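For example (the exact utterances below are illustrative; any short phrases representative of each topic will work):

```python
from semantic_router import Route

politics = Route(
    name="politics",
    utterances=[
        "isn't politics the best thing ever",
        "why don't you tell me about your political opinions",
        "don't you just love the president",
    ],
)
chitchat = Route(
    name="chitchat",
    utterances=[
        "how's the weather today?",
        "how are things going?",
        "lovely weather today",
    ],
)
mathematics = Route(
    name="mathematics",
    utterances=[
        "can you explain the concept of a derivative?",
        "what is the formula for the area of a circle?",
        "how do you solve quadratic equations?",
    ],
)
biology = Route(
    name="biology",
    utterances=[
        "what is the process of osmosis?",
        "can you explain the structure of a cell?",
        "what is the role of DNA in genetics?",
    ],
)

routes = [politics, chitchat, mathematics, biology]
```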
For our encoder we will use the local `HuggingFaceEncoder`. Other popular encoders include `CohereEncoder`, `FastEmbedEncoder`, `OpenAIEncoder`, and `AzureOpenAIEncoder`.
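```python
from semantic_router.encoders import HuggingFaceEncoder

# Runs a local sentence-transformers model; the default model can
# differ between library versions.
encoder = HuggingFaceEncoder()
```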
Now we initialize our `SemanticRouter`.
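A minimal sketch of the initialization; depending on your installed version, `SemanticRouter` may be importable from the top-level package or from `semantic_router.routers`:

```python
from semantic_router.routers import SemanticRouter

router = SemanticRouter(encoder=encoder, routes=routes)
```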
By default, we should get reasonable performance:
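For example, calling the router directly returns a route choice whose `name` attribute tells us which route matched (exact results will depend on your encoder):

```python
router("don't you love politics?").name
# -> 'politics'
router("how's the weather today?").name
# -> 'chitchat'
```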
We can evaluate the performance of our router using the `evaluate` method. All we need to do is pass a list of utterances and target route labels:
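A sketch of this, assuming `evaluate` takes `X` and `y` keyword arguments and returns an accuracy score (the exact signature may vary between versions):

```python
# A handful of (utterance, target route) pairs; depending on the library
# version, utterances that should match no route are labeled with None.
test_data = [
    ("don't you love politics?", "politics"),
    ("how's the weather today?", "chitchat"),
    ("can you explain the concept of a derivative?", "mathematics"),
    ("what is the role of DNA in genetics?", "biology"),
    ("I'm looking for a new phone", None),
]

# Unpack into X (utterances) and y (target route labels).
X, y = zip(*test_data)

accuracy = router.evaluate(X=list(X), y=list(y))
print(f"Accuracy: {accuracy * 100:.2f}%")
```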
On this small subset we get perfect accuracy — but what if we try with a larger, more robust dataset?
Hint: try using GPT-4 or another LLM to generate some examples for your own use-cases. The more accurate examples you provide, the better you can expect the routes to perform on your actual use-case.
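For example, appending a larger LLM-generated set of labeled utterances and re-running the evaluation (the extra examples here are illustrative):

```python
# Extend test_data with a much larger, more diverse set of labeled
# utterances (for example, generated with GPT-4).
test_data += [
    ("what do you think of the new tax policy?", "politics"),
    ("what's the integral of x squared?", "mathematics"),
    ("do you prefer cats or dogs?", "chitchat"),
    ("how does photosynthesis work?", "biology"),
    ("recommend me a good laptop", None),
    # ... many more examples ...
]

X, y = zip(*test_data)
accuracy = router.evaluate(X=list(X), y=list(y))
print(f"Accuracy: {accuracy * 100:.2f}%")
```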
Ouch, that’s not so good! Fortunately, we can easily improve our performance here.
Router Optimization
Our optimization works by finding the best score threshold for each `Route` in our `SemanticRouter`. We can see the current, default thresholds by calling the `get_thresholds` method:
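```python
router.get_thresholds()
# e.g. {'politics': 0.5, 'chitchat': 0.5, 'mathematics': 0.5, 'biology': 0.5}
# (the exact default value depends on the encoder in use)
```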
These are all preset route threshold values. Fortunately, it's very easy to optimize these: we simply call the `fit` method and provide our training utterances `X` and target route labels `y`:
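```python
# Here we reuse our labeled test_data for training; for a rigorous
# measure of performance, keep separate train and test splits.
router.fit(X=list(X), y=list(y))
```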
Let’s see what our new thresholds look like:
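```python
router.get_thresholds()
# each route now has its own optimized threshold rather than
# a single shared default
```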
These thresholds are vastly different from what we were seeing before. It's worth noting that optimal values can vary greatly between encoders; for example, OpenAI's Ada 002 model, used via the `OpenAIEncoder`, tends to output much larger similarity scores in the 0.5 to 0.8 range.
After training we have a final performance of:
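Re-running the same evaluation as before:

```python
accuracy = router.evaluate(X=list(X), y=list(y))
print(f"Accuracy: {accuracy * 100:.2f}%")
```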
That is much better. If we want to optimize further, we can focus on adding more utterances to our existing routes, analyzing exactly where our failures occur, and modifying our routes around those. This extended optimization process is much more manual, but with it we can continue refining routes to get even better performance.