Hey all! I have a lot of chat archives and want to train an AI to behave like a specific person. Do you think it would be best to use indices, or maybe fine-tuning?
For generating a persona, I think fine-tuning would work best (assuming you have access to enough resources)
Otherwise, prepending some short examples of the person talking to each prompt could work, but that uses valuable input space. For that, I would use langchain
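To make the prompt-prepending idea concrete, here's a rough sketch of assembling a few-shot persona prompt by hand. The example messages and the `build_persona_prompt` helper are made up for illustration; any framework (or none) could build this string.

```python
# Hypothetical few-shot persona prompt: prepend real chat excerpts so the
# model imitates the person's style. Examples below are invented.
EXAMPLES = [
    ("How was the game?", "haha we got destroyed, as usual lol"),
    ("Are you coming tonight?", "yeah gimme 20 min, just finishing smth"),
]

def build_persona_prompt(examples, user_message, persona="X"):
    """Assemble a prompt that shows the model how the persona talks."""
    lines = [f"Here are examples of how {persona} chats:"]
    for question, reply in examples:
        lines.append(f"Friend: {question}")
        lines.append(f"{persona}: {reply}")
    lines.append("Reply to the next message in the same style.")
    lines.append(f"Friend: {user_message}")
    lines.append(f"{persona}:")
    return "\n".join(lines)

prompt = build_persona_prompt(EXAMPLES, "What are you up to this weekend?")
print(prompt)
```

The downside mentioned above is visible here: every example pair eats tokens from the context window on every single request.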
llama_index is more for information retrieval, but you could maaaybe do some hacky stuff to get it to respond more like your persona without fine-tuning (e.g. changing the text_qa prompt to be something like "Given the following examples of a chat, generate a response to the user query while pretending to be user X")
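A sketch of what that persona-flavored text_qa template could look like. llama_index's QA templates use `{context_str}` and `{query_str}` placeholders; it's written here as a plain format string so it stays version-agnostic, and the chat snippets are invented.

```python
# Hypothetical persona-style QA template. In llama_index, the retrieved
# text fills {context_str} and the user's message fills {query_str}.
PERSONA_QA_TMPL = (
    "Below are examples of a chat involving user X.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given those examples, respond to the user query while pretending "
    "to be user X.\n"
    "Query: {query_str}\n"
    "Answer: "
)

filled = PERSONA_QA_TMPL.format(
    context_str="Friend: you up?\nX: yeah lol what's good",
    query_str="What should we do tomorrow?",
)
print(filled)
```

The hacky part is that retrieval will surface chat snippets *similar to the query*, not necessarily the most style-representative ones, so results can be hit or miss.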
Another question I have is about multiple sources. I've created a basic PoC app to chat about a long report I have. I built a graph index that combines a vector index and an empty index, and created a langchain tool with that graph index
I'm using the graph index for now, but it's hard to write a good description for it. With separate tools I imagine it's easier to write the descriptions, but harder to get combined answers, right?
Yea, writing the summary for each index in the graph can be hard. You can also use a temporary list index and response_mode="tree_summarize" to let the LLM write the summary for you
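For intuition, here's a toy illustration of what a tree-summarize response mode does: summarize leaf chunks, then merge the summaries level by level until one remains. `fake_llm_summarize` is a stand-in for a real LLM call, and the report sentences are invented.

```python
def fake_llm_summarize(texts):
    """Stand-in for an LLM summarization call: joins and truncates."""
    return " / ".join(t[:40] for t in texts)

def tree_summarize(chunks, fan_in=2):
    """Hierarchically reduce chunks to a single summary string."""
    level = list(chunks)
    while len(level) > 1:
        # Summarize groups of `fan_in` items, building the next level up.
        level = [
            fake_llm_summarize(level[i:i + fan_in])
            for i in range(0, len(level), fan_in)
        ]
    return level[0]

summary = tree_summarize([
    "Q1 revenue grew 12% year over year.",
    "Churn dropped after the pricing change.",
    "Headcount stayed flat across all teams.",
])
print(summary)
```

The resulting top-level summary is exactly the kind of text you can paste in as the index description for the graph.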
Using multiple tools, yea I think the combined answers thing can be a problem (I think langchain only uses one tool per response? I need to look into that a bit more)
I think that if langchain can break the prompt into 2 or 3 steps it can use different tools, but then it increases prompting complexity. I think graph will work better for me. Ty so much Logan!
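A toy sketch of that multi-step idea: break the request into sub-questions and route each one to the tool whose description fits, then combine the results. The tool names, descriptions, and the keyword router are all made up; in a real langchain agent the LLM itself picks the tool at each step.

```python
# Hypothetical tools keyed by name; each has a description (what the agent
# would read) and a run function (the actual index query, stubbed here).
TOOLS = {
    "report_qa": {
        "description": "answer questions about the long report",
        "run": lambda q: f"[report answer to: {q}]",
    },
    "chat_persona": {
        "description": "respond in the style of user X's chat archive",
        "run": lambda q: f"[persona reply to: {q}]",
    },
}

def route(step):
    """Crude keyword router standing in for the LLM's tool choice."""
    name = "report_qa" if "report" in step.lower() else "chat_persona"
    return TOOLS[name]["run"](step)

def answer(steps):
    """Run each sub-question through a tool, then combine the outputs."""
    return "\n".join(route(s) for s in steps)

print(answer([
    "What does the report say about Q1 revenue?",
    "Reply to my friend about weekend plans",
]))
```

This shows the trade-off from the messages above: per-tool descriptions are easy to write, but combining answers now depends on the decomposition step being done well.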