
Hi guys - so I would like some guidance on how to tune my bot to give more detailed answers. The source data is a set of differently formatted Word documents. At the moment it summarises "too much", and I really need to generate large volumes of text with the output. Where should I start?
Likely you would need to either:

a) append to every query an instruction to write a certain length
b) modify the prompt templates for the query engine
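For option (a), a tiny wrapper can do it. A minimal sketch in plain Python - the wrapper name and the exact instruction wording are my own, not a LlamaIndex API:

```python
def with_length_instruction(query: str, min_words: int = 500) -> str:
    """Append an explicit length instruction to a user query.

    The wording below is a hypothetical example; tune it for your model.
    """
    return (
        f"{query}\n\n"
        f"Answer in detail, using at least {min_words} words. "
        "Do not summarise; include every relevant example from the context."
    )

# Every query gets wrapped before it reaches the query engine, e.g.:
# response = query_engine.query(
#     with_length_instruction("Tell me about a time when we delivered Azure")
# )
```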
Ok great, I've done that - much better results! But how do I improve accuracy further? I've fed the system 20 hypothetical use cases about delivering AWS, Azure and GCP services to a bunch of fictional clients. The sort of questions I'm asking it are "Tell me about a time when we delivered Azure" and "Tell me all the clients we have done a data migration at, and what were the associated business outcomes".

I think there are two parts for investigation:
1) unstructured.io - adding structure may help here for the SimpleNodeParser? Could be wrong?
2) a knowledge graph to get more detailed answers

Is there anything else I should be looking at? For 1), is there any further documentation I can read to understand how that works? I assumed I would need to parse into separate indexes and then add metadata to each index, then get the query engine to go through each index with a metadata tag of "Azure", for instance?
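The metadata idea can be sketched in plain Python before touching any framework code. The client names and the `provider` key below are made-up examples, not part of any library:

```python
# Tag each chunk with a cloud-provider label at index time, then filter
# candidates by that tag before they ever reach the retriever/LLM.
nodes = [
    {"text": "Migrated Contoso's estate to Azure.", "metadata": {"provider": "Azure"}},
    {"text": "Deployed a GCP data lake for Fabrikam.", "metadata": {"provider": "GCP"}},
    {"text": "Azure landing zone rollout for Northwind.", "metadata": {"provider": "Azure"}},
]

def filter_by_provider(nodes, provider):
    """Keep only nodes whose metadata matches the requested provider."""
    return [n for n in nodes if n["metadata"].get("provider") == provider]

azure_nodes = filter_by_provider(nodes, "Azure")  # 2 of the 3 sample nodes
```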
Unstructured.io is really good for parsing large amounts of unstructured data, but since you've created the data yourself it sounds like it's already pretty structured 😅 If you want to learn more about Unstructured, I would read about their core library or pipeline templates here:
https://github.com/Unstructured-IO

You could try a knowledge graph index, but tbh I'm not sure how well it will work πŸ€”

Is the accuracy bad because it's not retrieving proper nodes? Or bad because the LLM is not understanding something properly?
Thanks for the advice! So let's take this scenario: if I send a query which says "tell me all the clients we have delivered Azure at", it only gives me a handful, when in fact there are loads of examples in the source data.
So I guess in this case I either need the query engine to send more nodes to the LLM, and/or summarise each node more concisely when building the index so that more of them fit within the prompt? Any recommendations on what I should look at to experiment with here?
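On fitting more nodes into the prompt: at its core this is just packing summaries into a token budget. A rough sketch, using a crude words-to-tokens heuristic rather than a real tokenizer:

```python
def pack_nodes(summaries, budget_tokens, tokens_per_word=1.3):
    """Greedily pack node summaries into an approximate token budget.

    Token counts are estimated from word counts; tokens_per_word is a
    rough heuristic, not a real tokenizer.
    """
    packed, used = [], 0.0
    for s in summaries:
        cost = len(s.split()) * tokens_per_word
        if used + cost > budget_tokens:
            break
        packed.append(s)
        used += cost
    return packed

summaries = [
    "Azure migration at client A.",
    "GCP build for client B.",
    "Azure rollout at client C.",
]
subset = pack_nodes(summaries, budget_tokens=12)
```

Shorter per-node summaries lower the cost of each item, so more clients fit in the same budget - which is exactly the trade-off being described above.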
Yea, there are a few options that come to mind:
  • use a router query engine, and route queries like that to a list index so that all the data gets read/summarized to answer the query. Normal queries would go to a vector index
  • set up a SQL index with stats on your clients, in addition to a vector index, using a router query engine. This DB could actually be built automatically by the LLM (which the SQL index supports), or the LLM could assist in extracting stats for the DB (using a pydantic program, or pydantic output parsing)
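The router idea in the first bullet can be illustrated with a toy keyword-based dispatcher. This is plain Python, not LlamaIndex's actual router (which typically selects a sub-engine with an LLM selector), and the marker strings are made-up examples:

```python
# Queries that ask for exhaustive listings go to a "read everything" path;
# everything else goes to a top-k retrieval path.
EXHAUSTIVE_MARKERS = ("all the clients", "every client", "list all")

def route(query: str) -> str:
    """Pick which index should answer the query."""
    q = query.lower()
    if any(marker in q for marker in EXHAUSTIVE_MARKERS):
        return "list_index"   # summarise over all nodes
    return "vector_index"     # retrieve top-k similar nodes

route("Tell me all the clients we have delivered Azure at")  # -> "list_index"
```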