No worries! I will try to clarify!
So when you index a bunch of documents, they get broken into chunks and embeddings are generated for each chunk. They are chunked using the node parser, which has a default chunk_size_limit of 3900 and a default overlap of 200. If you set chunk_size_limit directly in the service context, then that becomes the chunk size limit for this step
Then, during queries, the index retrieves the top k chunks of text (assuming you have a vector index here). If the text from the chunk + the prompt template + the query is bigger than max_input_size minus num_output, it breaks up the text into multiple chunks. This process is controlled by the prompt helper settings.
If top k is bigger than one, it refines an answer over the chunks. After getting a response from the first chunk, it sends the next chunk + prompt template + query + previous answer to the LLM to get an updated answer
So... pretty complicated haha llama index is always trying to make sure the text sent to the LLM isn't too big
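If you want to tweak those knobs yourself, it'd be roughly something like this (just a sketch, the values are placeholders and the exact API may differ a bit depending on your llama_index version):

from llama_index import GPTSimpleVectorIndex, PromptHelper, ServiceContext, SimpleDirectoryReader

# chunk_size_limit controls how documents get split into nodes at index time;
# max_input_size / num_output / max_chunk_overlap control how retrieved text is packed at query time
prompt_helper = PromptHelper(max_input_size=4096, num_output=256, max_chunk_overlap=20)
service_context = ServiceContext.from_defaults(chunk_size_limit=512, prompt_helper=prompt_helper)

documents = SimpleDirectoryReader("./data").load_data()
index = GPTSimpleVectorIndex.from_documents(documents, service_context=service_context)
response = index.query("my query", similarity_top_k=2)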
Oh I see, I understand now! Thank you so much!
I feel like giving chatgpt 3.5 context makes it much less smart, is this really the case...?
Are you getting answers like "The previous answer remains the same" stuff?
There's been a ton of problems with gpt-3.5 lately. Especially with the refine prompt.
It really feels like they downgraded the model lol
I have a refine prompt that I've been working on. I can share it if you want to try it out?
not really, it is just refusing to use the tools and context I provided, it insists on "fetching stuff on the internet" (which I didn't know it was capable of, maybe it's just lying to me)
maybe they downgraded to get more paying users for gpt4 XD
please do! would love to try it
this is my exact conspiracy theory too LOL 10x cheaper than davinci-003 seemed too good to be true
one sec, I'll get the code!
from langchain.prompts.chat import (
    AIMessagePromptTemplate,
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
)
from llama_index.prompts.prompts import RefinePrompt

# Refine Prompt
CHAT_REFINE_PROMPT_TMPL_MSGS = [
    HumanMessagePromptTemplate.from_template("{query_str}"),
    AIMessagePromptTemplate.from_template("{existing_answer}"),
    HumanMessagePromptTemplate.from_template(
        "I have more context below which can be used "
        "(only if needed) to update your previous answer.\n"
        "------------\n"
        "{context_msg}\n"
        "------------\n"
        "Given the new context, update the previous answer to better "
        "answer my previous query. "
        "If the previous answer remains the same, repeat it verbatim. "
        "Never reference the new context or my previous query directly."
    ),
]

CHAT_REFINE_PROMPT_LC = ChatPromptTemplate.from_messages(CHAT_REFINE_PROMPT_TMPL_MSGS)
CHAT_REFINE_PROMPT = RefinePrompt.from_langchain_prompt(CHAT_REFINE_PROMPT_LC)

...

index.query("my query", similarity_top_k=3, refine_template=CHAT_REFINE_PROMPT)
In my (limited) testing, this seemed to improve the quality of the refine process with gpt3.5. Lately lots of people have been complaining about it giving answers like "The new context is not relevant, so the previous answer remains the same" which is very unhelpful lol
thank you! I'll try it tomorrow and get back with my experience with it
is davinci-003 or gpt3.5 better...?
Davinci-003 is much better, at least in my experience (hallucinates less, better at following instructions)
And of course gpt4 is king. But it's also still on waitlists, and is also expensive
Oh I see! The refine prompt is working well btw
I'm just curious, are you working on this full time? Is llama index backed by a company or just a community
Amazing! Maybe I should make a PR for it. Always scary to change something used so frequently though lol
Nah this is my spare time thing. I work full time elsewhere as a machine learning engineer.
Llama Index might be a company/full time thing someday though
Oh I see, that's very cool! So do you have to design algorithms as an ML engineer?
It's more of training/designing models and datasets. Been doing lots of work with document analysis mostly (extracting key information/line items from invoices), some product categorization, and a few other random projects lol
It's not bad haha but tbh working on the llama index stuff has been much more interesting
I know that feeling! Working at a big company can be boring and your work can be very isolated
btw, you mentioned that we could save the json in memory, I was thinking of caching it in redis, would that make any sense at all?
Yea that definitely makes sense! You could use save_to_string
and cache the entire string. I'm not sure how well that will scale though as the index gets bigger
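Something like this is what I was picturing (just a sketch, the redis key name and connection details are made up, and it assumes the save_to_string/load_from_string helpers in your version):

import redis
from llama_index import GPTSimpleVectorIndex

r = redis.Redis(host="localhost", port=6379)

# serialize the whole index to a json string and cache it
index_str = index.save_to_string()
r.set("my_index", index_str)

# later, pull it back out and rebuild the index
cached = r.get("my_index")
index = GPTSimpleVectorIndex.load_from_string(cached.decode("utf-8"))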
There are some other options, like using qdrant or chroma, etc. For the docstore, there is also a recently added mongodb support, and I think official redis support is coming soon (but that's just for the documents)
Yea pretty much! I mean, it will vectorize text the same way for all vector indexes, it's just differences in how the vectors are stored (simple is all local and in memory + save to disk, others are more like dedicated databases)
oh I see! The chroma example looks a bit too simplified. I read it is purely in memory, so does that mean I don't have to start an "instance" of chroma (unlike other vector DBs)?
also, from some blog posts I see that they do
from llama_index import GPTSimpleVectorIndex
Whereas in the docs its
from gpt_index import GPTSimpleVectorIndex
I'm guessing they are the same?
Yeaaa they are the same. There was a renaming at some point, and now it's complicated haha always use llama_index tho
And yea, normally with these vector store integrations you'll have an "instance" of that vector store running somewhere already
oh that's very convenient then! so much easier to serve it as an api this way, thanks!
Yea no worries! It also scales a little better as you index more data
I was wondering, is there a catch to storing everything in memory? How is it persisted, and what if there's too much data?
It's persisted by calling save/load from disk.
I've used it with index.json files that were up to 2GB, and tbh I didn't really notice any performance impacts. At a certain point yea it's going to hit a scaling wall, but then that's where the vector store and doc store integrations come in
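For reference, the save/load itself is just something like this (sketch, the file name is arbitrary):

index.save_to_disk("index.json")

# later / in another process
from llama_index import GPTSimpleVectorIndex
index = GPTSimpleVectorIndex.load_from_disk("index.json")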
I was wondering, can I just vectorize and index the content of a few html pages in a json file? Would that impact performance a lot? I know vector search is fast
that's great to know!
Yea they aren't wrong. For production use-cases you would want a dedicated server running. (but again, this is only when you are dealing with large amounts of documents/vectors)
Other vector clients like pinecone and weaviate provide dedicated servers for you, so you don't have to worry about deploying your own. Depends on what works best for you
aw I see, I'm trying to ship a product to production, so I guess I will need weaviate then. I'm just wondering if there's a production vector database that's in memory, like how redis caches stuff in memory even in production
I think it can't really be in memory, or it has to be a combination of in memory/disk. It's similar to setting up a SQL database, but it just holds vector data instead. But with enough data, the vectors will use a lot of RAM if all held in memory
If you aren't planning on inserting much data though and only supporting queries, anything running in memory would be fine (GPTSimpleVectorIndex, that chroma example that was in memory)
yup that's what I thought too, so I was a bit skeptical of chroma when I read it's completely in memory. I'm planning to build something and sell it as a service, so it's up to the user to decide how much data to insert ... I guess I'll go with something like weaviate then
Sounds like a good choice then!
Btw, the chatbot tutorial uses a graph index as a top level index. I'm wondering why aren't all the 10-K files just vectorized into a single json file for faster lookup...?
Because each index contains financial information for a specific year. Don't want to mix that data up in one pile yanno?
Plus with separate indexes, you can use the query decomposition transform to compare different years
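Very roughly, the graph part looks something like this (a sketch from memory of the composability API; the per-year indexes and summaries here are placeholders):

from llama_index import GPTListIndex
from llama_index.indices.composability import ComposableGraph

# index_2021 / index_2022 are hypothetical vector indexes built separately, one per 10-K filing
graph = ComposableGraph.from_indices(
    GPTListIndex,
    [index_2021, index_2022],
    index_summaries=["10-K filing for 2021", "10-K filing for 2022"],
)
response = graph.query("Compare revenue between 2021 and 2022")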
Ah I see, I was under the impression it was done to optimize performance, thanks!
Btw, is there a place where people building these sorts of things hang out? (Like conferences or something) We are building something and would like to find someone who's fully focused on tech
hmmm not really sure haha. You can ask in the #founders channel maybe? I know most in-person activity is focused in silicon valley (assuming you are in north america)
I'm in the middle of nowhere lol so I'm a little removed from that scene
How did you start contributing to llama index, if I may ask?
Not too familiar with open source world
Honestly, I think my github feed recommended the repo LOL and I just really liked what the repo was doing
made a few small contributions to help learn the codebase (minor features/bug fixes), and here I am lol
That's very impressive! Extremely gifted engineer
haha nah, years of experience before that
hey logan, I'm wondering, do you know anything about deploying models on the cloud?
Deploying models on the cloud isn't too bad. Basically you set up a docker container, and then you can deploy that
If you need a GPU, your options are basically google or aws. But the IT team at my job handles everything after we create the docker image
I've used sagemaker a bit in the past, and it wasn't the best experience (their docs are a mess)
But yup trynna do that. What embedding does llamaindex use...? When using weaviate there are tons of options, such as choosing which text2vec, NER, QnA transformers, and I'm not sure if these will override what llamaindex is using? Slightly confused
okok I'll take a moment to read up on this, thanks!!!
do you know how good ada2 is? If I use the pre-trained weaviate transformers I get to use all of: text2vec-transformers, QnA-transformers, and NER-transformers
Tbh I think Ada is pretty good, at least in my experience.
If you use a weaviate model, it sounds like text2vec is what you would want
It would be nice if they shared benchmark results of all these models on traditional benchmark datasets so people could compare them
oh I see, I also looked into weaviate a bit more and it seems like I have to define a schema? I'm wondering how does weaviate store the embeddings llamaindex sends to it, since we never have to define a schema?
Ah I see, it's like a catch-all schema for documents
for these production vector DBs, which one would you recommend TBH
Tbh they all seem really similar. Although I've heard some sketchy stuff about pinecone so maybe stay away from that one lol
Seems like weaviate, qdrant, and chroma are the most popular
lol thanks for telling me this, I wasn't gonna choose pinecone anyway cuz I can't self-host it, but I'm surprised there's sketchy stuff about it...?
Just a few weird comments people were making in the enterprise channel LOL seemed like they had some insider info on how their data is stored and managed. Some weird stuff with payment issues too
Then a few more people piped in after that haha
Well I definitely thought pinecone looked solid at first
hahaha it seems popular too! Who knows xD
Maybe I should try it so no one else has to
Hey Logan, I'm wondering, do you know how well gpt or davinci handles structured data, and is this something relevant to llamaindex? For example, if I have a CSV file and I want to ask some questions on it, would vectorizing the content even make sense?
Like, it can do some text2sql for structured data. So given the schemas of a few tables and maybe some extra context description of the tables, it can convert user queries to sql commands
However, sometimes the models hallucinate things like column names, especially gpt3.5 lol
You could also convert each row into a document, but that doesn't make sense to do for every document type
oh I see, in what form should the schemas be provided? Is there a strict format?
For the struct indices, it either needs to be a database or a pandas dataframe
Then from there, the code derives the schema automatically
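A rough sketch of what that looks like (the sqlite file and table name here are made up):

from sqlalchemy import create_engine
from llama_index import GPTSQLStructStoreIndex, SQLDatabase

# point it at an existing database; the schema is read from the table itself
engine = create_engine("sqlite:///data.db")
sql_database = SQLDatabase(engine, include_tables=["invoices"])

# empty documents list, since the data already lives in the table
index = GPTSQLStructStoreIndex([], sql_database=sql_database, table_name="invoices")
response = index.query("What is the total amount across all invoices?")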
Hey Logan! I'm wondering, how does semantic search work under the hood for llamaindex?
Hey! Yea sure thing
The process goes something like this, with a vector index
- ingest documents. These documents are broken up into smaller overlapping nodes, so that they can be used for embeddings and LLM calls
- each node is then embedded using the embed_model (default is text-embedding-ada-002 from openai, which uses 1536 dimensions)
- then at query time, the query text is also embedded. Cosine similarity is calculated comparing the query embedding to all node embeddings
- the top 2 (by default) nodes are retrieved (there's a tiny numpy sketch of this step right after this list)
- if the text from the nodes is too big to fit into a single llm call, it gets broken into overlapping chunks again
- the first call to the llm sends your query and the node text, inside a prompt template
- the llm returns an answer to the query
- if there is more text for the model to read, the next chunk of text is sent. This time, the text, query, template, and existing answer are sent. The llm has to either update the existing answer using the new context, or repeat it
- finally, the answer is returned to the user, along with the source nodes + similarities used to create that answer
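Here's that retrieval step as a toy sketch (just numpy, with random vectors standing in for the real ada embeddings):

import numpy as np

# pretend these came from the embedding model
node_embs = np.random.rand(3, 1536)
query_emb = np.random.rand(1536)

# cosine similarity = dot product of the normalized vectors
node_norm = node_embs / np.linalg.norm(node_embs, axis=1, keepdims=True)
query_norm = query_emb / np.linalg.norm(query_emb)
scores = node_norm @ query_norm

# keep the top 2 nodes, highest similarity first
top_k = np.argsort(scores)[::-1][:2]
print(top_k, scores[top_k])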
I hope that's what you were looking for haha
It is! So to my understanding llamaindex really shines when it comes to unstructured data like documents with lots of text. But for structured data, can I just use a vector database like weaviate instead for semantic search? Interested in hearing your thoughts
And thank you for being so helpful as always
For structured data, embeddings don't make as much sense, especially for highly numeric spreadsheets and stuff like that
For that, what you can do is use an LLM to convert queries into something like sql commands (which llama index also does, but it will only return the result of that sql)
If you mean structured data as in JSON or something more textual, there are ways to make embeddings work for it
for structured data I mean something more like a vector database with a schema, like weaviate. Weaviate offers semantic search out of the box, so I'm guessing for this kind of structured data I don't need llamaindex?
Ahhh I see. I think llama index still provides some value there (since it integrates with weaviate, handles a lot of document ingestion, chunking, prompting), but definitely up to you
By "document", do you mean an actual document? As long as llamaindex will improve semantic search in weaviate i'm happy, just wanted to see what you think π
Yea, I meant a full document file