Here is the solution I found. It's kind of brute force and does not work 100% of the time, but it's a start.
- First, I build a second Vector Store index with only the Glossary French expressions.
- I call OpenAI and ask it to extract the groups of words that are not English from the mixed English/French answer and return them as a list.
- For each word (or group of words) in this list, I perform a similarity_search() on the French-only glossary to find the relevant entries. k=2 seems to be enough.
- I consolidate all these search results into one string, which gives me a French->English sub-glossary.
- I call OpenAI and ask it to translate the original English/French answer into English using the sub-glossary.
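
To make the steps above concrete, here is a rough sketch of how they could fit together. It is not my exact code: the function names, the "french -> english" line format, and the three callables passed in are all placeholders I made up for illustration. In a real setup, `extract` and `translate` would each wrap one OpenAI call, and `search` would wrap similarity_search() on the French-only index and return (French, English) pairs.

```python
from typing import Callable, List, Tuple

# Hypothetical signatures, not taken from any library:
#   extract(answer)            -> list of non-English phrases (one LLM call)
#   search(phrase, k)          -> up to k (french, english) glossary pairs
#   translate(answer, glossary)-> English text (one LLM call)
Extractor = Callable[[str], List[str]]
Searcher = Callable[[str, int], List[Tuple[str, str]]]
Translator = Callable[[str, str], str]


def build_sub_glossary(phrases: List[str], search: Searcher, k: int = 2) -> str:
    """Look up each phrase and merge the hits into one French->English string."""
    entries = {}
    for phrase in phrases:
        for french, english in search(phrase, k):
            # Keep the first translation seen for each French entry,
            # so repeated hits do not produce duplicate lines.
            entries.setdefault(french, english)
    return "\n".join(f"{fr} -> {en}" for fr, en in entries.items())


def translate_mixed_answer(
    answer: str,
    extract: Extractor,
    search: Searcher,
    translate: Translator,
    k: int = 2,
) -> str:
    """Run the extraction, lookup, and translation steps in order."""
    phrases = extract(answer)                           # extraction query
    sub_glossary = build_sub_glossary(phrases, search, k)  # glossary lookups
    return translate(answer, sub_glossary)              # translation query
```

Passing the three steps in as plain functions also makes it easy to try the lookup logic with a fake glossary before spending any API calls.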
It works pretty well, except when there are similar entries in the Glossary: the search can then pick the wrong entry. "Junk-in, junk-out" syndrome, I guess.
The entire process takes 3 calls to OpenAI: the original question, the extraction query, and the translation query. So it's more expensive.
Does anyone care to comment?