Do I have this right?

When we load the data, we're getting GPT to do something (building the index), but it's not being stored as a JSON file here? Then we can ask our question against the index, which is currently in ChatGPT's memory? And both steps cost tokens.

How does this cut down cost? Or is it the query against the JSON files holding the stored information that cuts the cost, since we've then built our index into JSON using OpenAI embeddings?

My example would be: I have a bunch of data for a Python library, and I want to ask questions over its whole documentation using ChatGPT and GPT Index.

  1. Get the data and put it in text files
  2. Index it using the shown methods
  3. Start chatting?
Is it really that simple?
"When we are loading the data we are getting GPT to do something to do (Building the index) but its not being stored as a json file here?"
  • In this example, when the index is created, we call ada-002 from OpenAI to create embedding vectors for text chunks
  • these vectors are in memory (we support 3rd party vector stores if you end up having a large index)
  • when we query, we embed the query text, and return the closest matched text chunk using cosine similarity
  • That text chunk is sent to the LLM, which generates an answer to the query
  • the index can be saved to/loaded from disk at anytime, so you only need to calculate embeddings once for all your text
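Roughly, that whole flow looks like this. It's a minimal sketch against the gpt_index / llama_index API from around this time; class names like GPTSimpleVectorIndex and the save_to_disk/load_from_disk helpers differ between versions, and the "docs" folder and file names are just placeholders:

```python
# Minimal sketch, assuming the gpt_index / llama_index API of this era;
# class names and helpers differ between versions.
from llama_index import GPTSimpleVectorIndex, SimpleDirectoryReader

# 1. Load your text files (the "docs" folder is a placeholder)
documents = SimpleDirectoryReader("docs").load_data()

# 2. Build the index -- this is where the text chunks are embedded with
#    OpenAI's embedding model (embedding tokens are spent here, once)
index = GPTSimpleVectorIndex(documents)

# 3. Persist the index so the embeddings never need to be recalculated
index.save_to_disk("index.json")

# Later: reload without re-embedding, then query. The query text is embedded,
# the closest chunk is found by cosine similarity, and only that chunk is
# sent to the LLM to generate the answer.
index = GPTSimpleVectorIndex.load_from_disk("index.json")
response = index.query("How do I create a container?")
print(response)
```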
"How does this cut down cost?"
  • Instead of looking at every piece of text in our index (expensive!), in the example above, we only looked at one (cheaper!)
"3. Start chatting? "
"Is it really that simple?"
  • yea, pretty much! Answering questions about documentation is a perfect use case. It may take some experimenting to get the right index type and data organization, but it's not too complicated.
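If it helps to see the retrieval step stripped down to plain Python, here's a toy illustration of the idea. This is not llama_index internals, just the concept; the chunk vectors are assumed to have already been produced by an embedding model:

```python
# Toy illustration of embedding-based retrieval, not llama_index internals.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(query_vec: np.ndarray, chunk_vecs: list, chunks: list) -> str:
    # Score the query vector against every stored chunk vector and
    # return the text of the closest match.
    scores = [cosine_similarity(query_vec, np.asarray(v)) for v in chunk_vecs]
    return chunks[int(np.argmax(scores))]

# Only the best-matching chunk (plus the question) goes to the LLM, instead of
# the entire documentation -- that's where the token savings come from.
```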
That first part is insane!

So, for example:

I've got a page about creating "Containers".
It finds a chunk about "Containers" and relays that back in the response.
Instead of reading the 300 words on that page, it only reads those 300 as one chunk. Saving 299 words?

-----------------------------------------

I've got a follow-up question here.

I have code blocks that contain code examples. Now, if I wanted a user to use chat to, say, create an example based on one of ours, could I use a "trigger word" like "Code" so that it only searches the "code" chunks and responds from those?

Thanks for the detailed response as well!
  1. Ah, maybe I oversold it a bit 😆 When I said chunks, I meant that each input document is split into chunks according to either how much text can be sent to the LLM at once, or a chunk size you pre-define.
So in your example, it would hopefully match the text chunk containing the information on containers. The key cost-saving part is that the LLM didn't have to read all your documentation, just the part that's relevant.

  2. Yea, that might work! For this example, it would really depend on how you structure your index.
From what you are describing, you might get the best results using a vector index for each page, and then wrapping all those vector indexes with either a keyword index or another vector index. Or maybe even some other combination.

By stacking indexes, it tries to ensure that queries get routed to only relevant information. Check out the docs for this here: https://gpt-index.readthedocs.io/en/latest/how_to/composability.html

(In fact, the llama index docs contain a lot of good information, you might find it helpful πŸ’ͺ )
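For the "vector index per page, wrapped in a higher-level index" idea, the rough shape is something like the sketch below. Treat it as a sketch only: the composability API has moved around between versions, so defer to the docs linked above for the exact names, and the folder paths and page summaries here are made up.

```python
# Rough sketch of composing indexes, following the composability docs linked above.
# Exact class/method names have shifted between gpt_index / llama_index versions.
from llama_index import GPTListIndex, GPTSimpleVectorIndex, SimpleDirectoryReader
from llama_index.indices.composability import ComposableGraph

# One vector index per documentation page (folder paths are placeholders)
containers_index = GPTSimpleVectorIndex(SimpleDirectoryReader("docs/containers").load_data())
networking_index = GPTSimpleVectorIndex(SimpleDirectoryReader("docs/networking").load_data())

# Wrap the per-page indexes in a top-level index; the summaries are what the
# top level uses to route a query to the right sub-index.
graph = ComposableGraph.from_indices(
    GPTListIndex,
    [containers_index, networking_index],
    index_summaries=[
        "How to create and manage containers",
        "How to configure networking",
    ],
)

response = graph.query("How do I create a container?")
```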
Again thanks for the information, I will be doing more research!

Never knew we could use OpenAI with our own data, and now do it for next to nothing with GPT Turbo.

Could OpenAI stop this if they removed embeddings?
Even if OpenAI removed embeddings, there are lots of other options (other third-party vendors like Cohere, or even open-source models from Hugging Face, will work well enough for most embeddings).
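As a rough sketch of what that looks like, an open-source model from the sentence-transformers library can produce embeddings locally (the model name here is just one common small choice, not something recommended in this thread):

```python
# Sketch: open-source embeddings as an alternative to OpenAI's embedding API.
# "all-MiniLM-L6-v2" is just one commonly used small model.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
vectors = model.encode([
    "How do I create a container?",
    "Containers are created from the containers page of the docs.",
])
print(vectors.shape)  # (2, 384) -- this model produces 384-dimensional vectors
```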
In your experience, how much are you saving doing it this way instead of just using Davinci or ChatGPT Turbo directly?
Doing it which way? Using the vector index from llama_index?

There's really no comparison. Davinci and ChatGPT have a limited context window, so you need some sort of architecture in place to present the LLM with relevant info to answer a question.

Sure, you could do something more manual on your own, but I think llama_index streamlines the process quite a bit. Furthermore, as LLMs come down in cost (open source models, smaller models running on consumer hardware), having a process in place like llama_index is a good advantage.

In my personal opinion, doing anything at a huge scale with OpenAI is definitely going to be expensive πŸ’Έ open-source models and consumer hardware needs to catch up to bring the costs down
"There's really no comparison. Davinici and ChatGPT have a limited context window, so you need some sort of architecture in place to present the LLM with relevant info to answer a question. "

I get that issue all the time with just Davinci, so the benefit here really is that we can hold and share (as in, give to LLMs) more data for less?
Yea exactly! Llama Index can hold a ton of data in the index and provide a way for the LLM to work with it, while still observing that maximum context window.
Sorry, one more question. So with embeddings it's basically: here's "500 words of blah blah", but I'm going to give you those 500 words as 20 numbers. "You remember what I'm talking about now?" GIVE ME A RESPONSE!

Sorry I am trying to just dumb it down some more so I know exactly where I am going to spend my time. haha
Haha yea pretty much! Embeddings take those 500 words and represent them with a list of numbers (by default, it's a list of 1536 numbers 💫 )
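If you ever want to see that list of numbers yourself, here's a quick sketch with the pre-1.0 openai Python package that was current at the time (you'd need your API key set up first):

```python
# Sketch using the pre-1.0 openai Python package; newer versions use a client object.
import openai

resp = openai.Embedding.create(
    model="text-embedding-ada-002",
    input="500 words of blah blah ...",
)
vector = resp["data"][0]["embedding"]
print(len(vector))  # 1536 -- the "list of numbers" that stands in for your text
```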
Ok cool, so def worth it! again thanks for your help and hopefully others can learn from this as well. πŸ™‚ Have a good day
Good luck! Happy to help! πŸ’ͺ
Ran a small test with a mid-sized new Python library and yeah, I have to say this will change everything for companies. Especially now that OpenAI doesn't read / train off your data anymore.
Glad it worked out for you! And yea that was a very smart change they made haha
I noticed this guide: https://github.com/jerryjliu/gpt_index/blob/main/examples/vector_indices/SimpleIndexDemo-ChatGPT.ipynb got updated, and I'm trying to read the index from disk, but it's taking a while and using a fair number of tokens to do so.