Retrieval

but this works very well:

# imports shown for context; these paths assume the llama_index.core namespace (0.10+)
from llama_index.core import SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter

vector_store = self.setup_chroma(collectionName)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

documents = SimpleDirectoryReader(input_files=files).load_data()

node_parser = SentenceSplitter(chunk_size=180, chunk_overlap=80)
nodes = node_parser.get_nodes_from_documents(documents)
index = VectorStoreIndex(nodes, storage_context=storage_context)
Why is your chunk size so small? (And the overlap is very large compared to the chunk size.)
If you aren't changing the top k, it's only retrieving the top 2 nodes -- and both of those nodes are tiny.
So retrieval will probably not be good with these settings.
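For reference, a minimal sketch of raising the top k on the query engine (similarity_top_k is the LlamaIndex parameter; the value of 5 here is just an example, not a recommendation from this thread):

Plain Text
# retrieve more nodes per query than the default of 2
query_engine = index.as_query_engine(similarity_top_k=5)
response = query_engine.query("...")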
What should be the ideal chunk size and chunk_overlap?
I am using gpt-3.5-turbo and text-embedding-ada-002
The default is a good choice (1024)

512 is also not bad.

Chunk overlap generally depends on the chunk size. I usually go with 20, but it's less important.
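As a sketch, swapping those values into the splitter from the snippet above (1024 is the default chunk size, 20 is the overlap mentioned here):

Plain Text
# rebuild nodes with the suggested settings
node_parser = SentenceSplitter(chunk_size=1024, chunk_overlap=20)
nodes = node_parser.get_nodes_from_documents(documents)
index = VectorStoreIndex(nodes, storage_context=storage_context)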
How can I improve performance?
RAG is unable to give proper answers to questions. I did chunking before storing in Chroma.
What else can be done?
Did you change the chunk size? How much data are you indexing?
What kinds of questions are you asking? You can debug retrieval somewhat by checking the source nodes

Plain Text
response = query_engine.query("...")
for node in response.source_nodes:
    print(node.text)
Yes, I did change the size. The data is 4.78 MB across 4 documents.
Most of the data is in the form of tables in Word documents and Excel files.
I think it is having trouble storing and retrieving the data in the tables.
4.78 MB is a looot of text 😅 I would suggest:
a) using a chunk size of 512
b) increasing the top k -- probably 12
c) using a reranker with top-n = 3 or 4
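Roughly, putting (a)-(c) together could look like the sketch below. It assumes the llama_index.core import paths and the SentenceTransformerRerank postprocessor; the cross-encoder model name is just a commonly used default, not something from this thread:

Plain Text
from llama_index.core.postprocessor import SentenceTransformerRerank

# smaller chunks, wider retrieval, then rerank down to a few nodes
node_parser = SentenceSplitter(chunk_size=512, chunk_overlap=20)
nodes = node_parser.get_nodes_from_documents(documents)
index = VectorStoreIndex(nodes, storage_context=storage_context)

reranker = SentenceTransformerRerank(
    model="cross-encoder/ms-marco-MiniLM-L-6-v2", top_n=3
)
query_engine = index.as_query_engine(
    similarity_top_k=12, node_postprocessors=[reranker]
)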