Hello everyone! I made good progress with my AI app: using a Qdrant vector store and OpenAI, I can query a French book and it works great 🙂
Now, what about English? When I ask a question in English, it answers properly in English using the context of the book. It's magic. Of course, proper names remain in French, which makes sense.
As it happens, I have a French -> English glossary for these names. I wrote a test OpenAI prompt in which I give the first English/French answer and a table of the few relevant glossary entries, and presto, it rewrites the text, replacing the French names with their English equivalents. Very nice. So, where is my problem?
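For what it's worth, the rewrite step can be sketched as a small prompt builder. This is my own minimal sketch, not the poster's exact prompt: the function name, wording, and example glossary entry are all illustrative.

```python
# Hypothetical sketch of the rewrite prompt: given the mixed English/French
# answer and a small table of relevant glossary entries, ask the model to
# substitute the English equivalents. Prompt wording is illustrative.
def build_rewrite_prompt(answer: str, sub_glossary: dict[str, str]) -> str:
    # Render the glossary as one "French -> English" line per entry.
    table = "\n".join(f"{fr} -> {en}" for fr, en in sub_glossary.items())
    return (
        "Rewrite the text below, replacing each French name with its English "
        "equivalent from the glossary. Leave everything else unchanged.\n\n"
        f"Glossary:\n{table}\n\nText:\n{answer}"
    )

prompt = build_rewrite_prompt(
    "The Comte de la Fère rides to Paris.",
    {"Comte de la Fère": "Count de la Fère"},  # example entry, not from the post
)
```

The resulting string would go in as the user message of a chat-completion call.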
My problem is finding these "relevant glossary entries". I thought about building a separate Vector Store with the Glossary and performing a query() with the first English/French answer, but the list of Nodes I get back is very far from relevant (my glossary has one Node per entry). Even when I greatly increase the number of returned Nodes, I don't get relevant entries. I even get entries containing words that appear nowhere in the text. Weird.
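One plausible explanation for the weird results, shown here with a toy bag-of-words cosine similarity as a stand-in for real embeddings (the texts and scoring are mine, purely illustrative): a whole paragraph mixes many topics, so its vector is diluted and no single short glossary entry matches it strongly, whereas a short focused query matches well.

```python
# Toy illustration: querying short one-entry Nodes with a long paragraph
# dilutes similarity. Bag-of-words vectors stand in for embeddings here.
from collections import Counter
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def bow(text: str) -> Counter:
    return Counter(text.lower().split())

entry = bow("comté de la marche")  # one short glossary entry (made up)
long_query = bow(
    "the story follows the heir of the comté de la marche as he travels "
    "through many provinces meeting soldiers merchants and priests"
)
short_query = bow("comté de la marche")

# The focused query scores higher against the entry than the full answer does.
assert cosine(short_query, entry) > cosine(long_query, entry)
```

This suggests narrowing the query to the suspicious terms before searching, which is essentially what the solution below does.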
How can I narrow the search?
1 comment
Here is the solution I found. It's kind of brute force and does not work 100% of the time, but it's a start.
  • First, I build a second Vector Store index with only the Glossary French expressions.
  • I call OpenAI and ask it to extract from the English/French answer the groups of words that are not English and to make a list from that.
  • For each word (or group of words) in this list, I perform a similarity_search() on the French-only glossary to find the relevant entries; k=2 seems to be enough.
  • I consolidate all these search results into one string that gives me a French->English sub-glossary.
  • I call OpenAI to ask it to translate the original English/French answer into English using the sub-glossary
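The steps above can be sketched as follows. The two OpenAI calls (term extraction and translation) are left out; glossary_search() is a hypothetical stand-in for the real similarity_search() on the French-only index (here it just ranks by word overlap). What actually runs is the glue: per-term search with k=2 and consolidation into a sub-glossary string. All names and glossary entries are illustrative.

```python
# Example glossary (made-up entries, one per "Node").
GLOSSARY = {
    "Comte de la Fère": "Count de la Fère",
    "le Roi Soleil": "the Sun King",
}

def glossary_search(term: str, k: int = 2) -> list[str]:
    """Stand-in for similarity_search() on the French-only index:
    rank glossary keys by word overlap with the extracted term."""
    def overlap(entry: str) -> int:
        return len(set(term.lower().split()) & set(entry.lower().split()))
    return sorted(GLOSSARY, key=overlap, reverse=True)[:k]

def build_sub_glossary(french_terms: list[str]) -> str:
    """Consolidate per-term search results into one French->English string."""
    lines: list[str] = []
    for term in french_terms:
        for fr in glossary_search(term, k=2):
            line = f"{fr} -> {GLOSSARY[fr]}"
            if line not in lines:  # deduplicate across terms
                lines.append(line)
    return "\n".join(lines)

# Terms that the extraction call (OpenAI call #2) would have returned:
terms = ["Comte de la Fère"]
sub_glossary = build_sub_glossary(terms)
```

The sub-glossary string then gets pasted into the final translation prompt (OpenAI call #3).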
It works pretty well, except when there are similar entries in the Glossary: the search can pick the wrong entry. "Garbage in, garbage out" syndrome, I guess.
The entire process takes 3 calls to OpenAI: the original question, the extraction query, and the translation query. So it's more expensive.
Anyone care to comment?