Route Threshold Optimization
Route score thresholds define whether a route should be chosen. If the score we compute for a given route is higher than that route’s `Route.score_threshold`, the route passes; otherwise it does not, and either another route is chosen or we return no route.
Given that this one `score_threshold` parameter can decide the choice of a route, it’s important to get it right, but it’s incredibly inefficient to do so manually. Instead, we can use the `fit` and `evaluate` methods of our `RouteLayer`. All we must do is pass a small number of (utterance, target route) examples to these methods, and with `fit` we will often see dramatically improved performance within seconds.
Full Example
Define RouteLayer
As usual, we will define our `RouteLayer`. The `RouteLayer` requires just `routes` and an `encoder`. If using dynamic routes you must also define an `llm` (or use the OpenAI default).
We will start by defining four routes: politics, chitchat, mathematics, and biology.
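Here is a sketch of those route definitions; the utterances are purely illustrative, so replace them with examples from your own use-case:

```python
from semantic_router import Route

politics = Route(
    name="politics",
    utterances=[
        "isn't politics the best thing ever",
        "why don't you tell me about your political opinions",
        "what do you think about the president?",
    ],
)
chitchat = Route(
    name="chitchat",
    utterances=[
        "how's the weather today?",
        "lovely weather we're having",
        "how are things going?",
    ],
)
mathematics = Route(
    name="mathematics",
    utterances=[
        "can you explain how to calculate a derivative?",
        "what is the integral of x squared?",
        "solve for x in 2x + 3 = 7",
    ],
)
biology = Route(
    name="biology",
    utterances=[
        "what is the process of osmosis?",
        "can you explain the structure of a cell?",
        "what is the function of mitochondria?",
    ],
)

routes = [politics, chitchat, mathematics, biology]
```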
For our encoder we will use the local `HuggingFaceEncoder`. Other popular encoders include `CohereEncoder`, `FastEmbedEncoder`, `OpenAIEncoder`, and `AzureOpenAIEncoder`.
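For example, assuming the optional local-encoder dependencies are installed:

```python
from semantic_router.encoders import HuggingFaceEncoder

# uses the library's default local embedding model
encoder = HuggingFaceEncoder()
```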
Now we initialize our `RouteLayer`.
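Using the routes and encoder defined above:

```python
from semantic_router.layer import RouteLayer

rl = RouteLayer(encoder=encoder, routes=routes)
```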
By default, we should get reasonable performance:
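```python
# quick sanity check: calling the layer returns a RouteChoice,
# whose .name is the matched route (or None if nothing passes);
# exact matches will depend on your encoder and utterances
for query in [
    "don't you love politics?",
    "how's the weather today?",
    "what is the integral of x squared?",
]:
    print(query, "->", rl(query).name)
```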
We can evaluate the performance of our route layer using the `evaluate` method. All we need to do is pass a list of utterances and matching target route labels; the pairs in the sketch below are illustrative:
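```python
test_data = [
    ("don't you love politics?", "politics"),
    ("how's the weather today?", "chitchat"),
    ("what is the integral of x squared?", "mathematics"),
    ("what is the function of mitochondria?", "biology"),
]
# unpack into utterances (X) and target route labels (y)
X, y = zip(*test_data)

accuracy = rl.evaluate(X=list(X), y=list(y))
print(f"Accuracy: {accuracy * 100:.2f}%")
```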
On this small subset we get perfect accuracy, but what if we try a larger, more robust dataset?
Hint: try using GPT-4 or another LLM to generate some examples for your own use-cases. The more accurate examples you provide, the better you can expect the routes to perform on your actual use-case.
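A larger labeled set follows the same pattern. It can also include out-of-scope utterances with a target of None, meaning no route should match; the examples below are again illustrative:

```python
big_test_data = [
    ("what do you think of the new prime minister?", "politics"),
    ("explain photosynthesis to me", "biology"),
    ("I'm interested in learning about llama 2", None),
    # ... many more labeled examples, ideally LLM-generated
]
X, y = zip(*big_test_data)

accuracy = rl.evaluate(X=list(X), y=list(y))
print(f"Accuracy: {accuracy * 100:.2f}%")
```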
Ouch, that’s not so good! Fortunately, we can easily improve our performance here.
Route Layer Optimization
Our optimization works by finding the best route thresholds for each `Route` in our `RouteLayer`. We can see the current, default thresholds by calling the `get_thresholds` method:
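```python
# the exact default values depend on the encoder being used
route_thresholds = rl.get_thresholds()
print("Default route thresholds:", route_thresholds)
```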
These are all preset route threshold values. Fortunately, they are very easy to optimize: we simply call the `fit` method and provide our training utterances `X` and target route labels `y`:
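```python
# X holds the training utterances, y the target route labels
# (use None for utterances that should match no route)
rl.fit(X=list(X), y=list(y))
```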
Let’s see what our new thresholds look like:
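```python
route_thresholds = rl.get_thresholds()
print("Updated route thresholds:", route_thresholds)
```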
These are vastly different thresholds from what we were seeing before. It’s worth noting that optimal values can vary greatly between encoders; for example, OpenAI’s Ada 002 model, used via the `OpenAIEncoder`, tends to output much larger numbers, in the 0.5 to 0.8 range.
After training, we can re-run the evaluation to check our final performance:
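```python
accuracy = rl.evaluate(X=list(X), y=list(y))
print(f"Accuracy: {accuracy * 100:.2f}%")
```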
That is much better. If we wanted to optimize further, we could focus on adding more utterances to our existing routes, analyzing where exactly our failures occur, and modifying our routes around those. This extended optimization process is more manual, but with it we can continue refining routes to achieve even better performance.