Does chunk_size_limit not work on GPTPineconeIndex?

I've created my index like this: index = GPTPineconeIndex(documents, pinecone_index=index, chunk_size_limit=512)

But when I query it, some of the chunks are way larger:
index.query(q, llm_predictor=LLM_PREDICTOR, text_qa_template=QA_PROMPT, similarity_top_k=5, response_mode="compact")
Do you have any insight @Logan M?
I still need to try setting up pinecone, but I can try and help sure πŸ˜†

How are you checking the chunk size?
Just checking the console to see which chunks were used in the query.
And then running the chunk through the OpenAI tokenizer
How I first picked up on the problem: I was using similarity_top_k=5 with 512-sized chunks on the vector index and all of those fit in one query. But when I used the pinecone index I started getting errors back because the query was too big.
Ohhhh interesting πŸ€” tbh that sounds like either a bug or there's somewhere else you need to set the chunk size limit

I'll look at the codebase and see if I can tell lol
Just in case that makes a difference.

index = GPTPineconeIndex(documents, pinecone_index=index, chunk_size_limit=512)

That's how I first initialized the index

Now I am using it like this:
index = GPTPineconeIndex('', pinecone_index=index, chunk_size_limit=512)
I mean, from what I can tell that should be working πŸ€” maybe it's a bug? @jerryjliu0 thoughts?
Yeah, I tried to index again with a smaller dataset just to see that I am not hallucinating with all the AI hallucination going on, but same result. All the chunks are different sizes and not even close to the limit I have set.
I am on version 0.4.28
hey @Erik @Logan M , thanks for flagging
oooooh hm this may be a bug
will take a look soon tonight!
@jerryjliu0 Any updates on the issue? Or should I use a different vector store for the time being?
hey @Logan M @jerryjliu0. Do you have any updates on the issue? If not, is there a vector store that for sure supports chunking data to the size I want?

Sorry, I am not trying to be annoying here. I would just like to know whether this is something that can be fixed or whether I should start indexing on a new vector store.
Oh! I might actually know the reason (after reading this again)

So documents get chunked twice. Once at index construction (to a default length of 4000), and then again during query time.

This is because the chunk size is constrained by the size of the prompt, and ada-002 has a context length of ~8192ish

So, it is chunking again at query time according to max_chunk_size. But when it records the source node, it maps the second level chunking to the original node.

@jerryjliu0 did I describe that properly? Just reading the code to understand lol I think that's how it works.

Maybe try the llama logger (bottom of this notebook https://github.com/jerryjliu/llama_index/blob/main/examples/vector_indices/SimpleIndexDemo.ipynb) to double confirm the size of the chunks sent to the LLM
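Roughly what I mean, going from that notebook (the exact wiring may differ between versions, so double-check it there; q is just your query string):

from llama_index.logger import LlamaLogger

llama_logger = LlamaLogger()
# pass the logger in so intermediate steps get recorded, then inspect the logs
response = index.query(q, llama_logger=llama_logger, similarity_top_k=5)
print(llama_logger.get_logs())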
@Logan M [{'index_type': <IndexStructType.PINECONE: 'pinecone'>, 'doc_id': 'ba835566-a957-4834-beea-d9a41ae39a4c', 'initial_response': "The timeline to book wedding vendors can vary depending on the vendor and your wedding date. However, there are some general guidelines you can follow. Here is a rough timeline for booking your wedding vendors:\n\n12+ months before your wedding:\n- Wedding planner: If you're hiring a full-service wedding planner, you may want to hire them first thing.\n- Wedding venue: Book your venue as soon as possible, as this will determine your wedding date.\n\n10-12 months before your wedding:\n- Photographer: Good wedding photographers are in high demand, so book them early.\n- Videographer: Like photographers, good videographers are also in high demand.\n\n8-10 months before your wedding:\n- Caterer: If your venue doesn't provide catering, book a caterer early.\n- Florist: Book your florist early to ensure they're available on your wedding date.\n- Music: Book your DJ, band, or ceremony musicians.\n\n6-8 months before your wedding:\n- Officiant: Book your officiant early to ensure they're available on your wedding date.\n- Hair and makeup: Book your hair stylist and makeup artist.\n- Wedding cake: Book your wedding cake baker.\n\n4-6 months before your wedding:\n- Stationery: Order your save-the-dates, invitations, and other stationery.\n- Transportation: Book your transportation, such as limos or party buses.\n- Wedding dress and attire: Start shopping for your wedding dress, suit, or tuxedo.\n\n2-4 months before your wedding:\n- Event rentals: Book any rentals you need, such as tables, chairs, or linens.\n- Favors: Order your wedding favors.\n- Lighting: Book your lighting designer.\n\nOf course, this timeline is just a general guideline. You may need to adjust it based on your wedding date, vendor availability, and personal preferences. It's always a good idea to book your vendors as early as possible to ensure they're available on your wedding date."}]
That's the output of the llama logger. The only thing I see is the answer. How would I see what is sent to the LLM on index creation?
Awe, it only logged the response πŸ˜… whoops, sorry for the misdirection there
On index creation? I'm not seeing anything obvious there
Nvm, I understood what you said wrong
Hmm, so sounds like this a bug then? Are there any other vector stores that I could use instead? @Logan M
Thanks for the tip on llama logger!
I still don't think this is technically a bug (sorry, it was late last night and I had to put my phone down lol)

All other vector stores will have the same behavior from what I can see. I want to run a little test with pdb in a bit and step through the code to confirm
@Erik did you start with a fresh pinecone store when you added the chunk size limit? Was the text possibly added before the limit was set? I just stepped through the code locally, I can't really see why it wouldn't work tbh. Every vector store uses the same code for chunk_size_limit
I've tried it a few times now. To my knowledge I did a fresh start each time, like this:
pinecone.init(api_key=pinecone_api_key, environment="us-east1-gcp")
pinecone.create_index("name", dimension=1536, metric="euclidean", pod_type="p1")  # fresh index; 1536 dims for ada-002 embeddings
index = pinecone.Index("name")

documents = SimpleDirectoryReader('data').load_data()
index = GPTPineconeIndex(documents, pinecone_index=index, chunk_size_limit=512)

And after that, each time I run the program I do it like this:
pinecone.init(api_key=pinecone_api_key, environment="us-east1-gcp")
index = pinecone.Index("name")  # reconnect to the existing index
index = GPTPineconeIndex("", pinecone_index=index, chunk_size_limit=512)  # empty string: don't re-ingest documents


I have pasted the parts of code together that should be relevant here if you wanna have a look: https://pastecode.io/s/rceobmis
Cool, thanks for the details! And when you do create_index, I'm guessing you give it a different name each time?

If you are able to, can you query the index outside of langchain, save the response, and print(len(response.source_nodes[0].source_text.split(" "))) (or loop over all response nodes and print that?) Just want to sanity check how long the response nodes are (this will approximate the words - should be around 350ish for a chunk size of 512)
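Something like this, if you want to loop over them (q being whatever query you were running):

response = index.query(q, similarity_top_k=5)
for node in response.source_nodes:
    # rough word count per source chunk; ~350ish words should correspond to 512 tokens
    print(len(node.source_text.split(" ")))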
Yes, I give it a different name each time, and I even tried with different accounts to be sure I am not messing something up
Ok, I will try it out
I didn't loop through them, but the first one I got with print(len(response.source_nodes[0].source_text.split(" "))) was already 591
[Attachment: image.png]
bruh, your index is cursed lol
lol, that's what I thought
I would open a github issue for this... share that pastecode with it. Needs some serious debugging πŸ˜†

One more thing you can try: print(INDEX.prompt_helper.chunk_size_limit) just before you would query (it should be 512, the prompt_helper is what helps build the inputs to the LLM, and it should be following the chunk_size_limit)
Yep, this one returns 512
Alrighty, I will open a github issue for that.

So as I understand it, all the other vector stores follow the same process, so I would most likely not get different results with them. Or do you think it would be worth a try? @Logan M
How is this not working?? absolutely mind blowing lol that's a good piece of evidence to narrow down the bug though
I mean, I thought they were all using mostly the same code πŸ˜… Like it works fine for simple vector index. It might be worth trying one more just to see?

I'll let you know if I find any solution though
Any suggestions on which to use? The reason I even tried Pinecone in the first place is that I have used it before.

And yeah vector index worked fine for me before I tried to use vector stores instead.
qdrant seems like another easy one to set up + it's open source
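Something like this should get you started (going from memory of the qdrant example notebook, so double-check the kwargs in the docs; the host/port and collection name are placeholders):

import qdrant_client
from llama_index import GPTQdrantIndex, SimpleDirectoryReader

client = qdrant_client.QdrantClient(host="localhost", port=6333)
documents = SimpleDirectoryReader('data').load_data()
index = GPTQdrantIndex(documents, client=client, collection_name="name", chunk_size_limit=512)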
Okay, thanks a lot btw, you've been super helpful. I will try the qdrant out and let you know if that works out better
Hihi, sorry, taking a look at this thread now! @Erik @Logan M is the main issue still that when you specify a chunk_size_limit of 512 tokens in pinecone, the chunk sizes end up bigger?
there seem to be a few different points in here
Yea that's the main issue with pinecone πŸ˜… we confirmed it by printing the length of the source node, even though the prompt helper has chunk_size_limit set to 512 πŸ€”
i see.....weird. using pinecone shouldn't affect the chunk size
I agree lol looking at the code it doesn't really make any sense how it's not following the size limit
@Erik i might have missed this, did you have a sample code snippet + data to help me repro?
No, sorry, I don't have one, but I will create a sample code snippet + data tomorrow, gonna go to bed rn. But yes, the problem was that the chunk sizes were bigger than 512 even when I had chunk_size_limit set.
@Logan M @jerryjliu0 I think I found out what is happening. I am not sure why, though. I started to read through the chunks it was using, and the text for each chunk appears twice.

For example, this is what one of the chunks looks like: https://pastebin.com/JZLBnFqC
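If anyone wants to check their own index for the same thing, this is roughly how I spotted it (response being the query response object from earlier):

text = response.source_nodes[0].source_text
first_200 = text[:200]
# prints 2 if the opening text appears twice within the same chunk
print(text.count(first_200))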
@jerryjliu0 @Logan M Anyways this could be used to repro: https://github.com/eriktlu/llama-pinecone-sample
@Erik thanks for passing along, taking a look soon
@jerryjliu0 Hey, I saw that there was an update on the newer version regarding Pinecone, any chance this could fix the issue here?
So I finally got around to updating the packages and catching up with the changes.

If anyone finds this thread: this is fixed now, and this is how I create my pinecone index and set a chunk size limit:

import pinecone
from llama_index import GPTPineconeIndex
from llama_index.indices.service_context import ServiceContext

pinecone.init(api_key=pinecone_api_key, environment=env)
index = pinecone.Index(name)

service_context = ServiceContext.from_defaults(chunk_size_limit=512)
index = GPTPineconeIndex("", pinecone_index=index, service_context=service_context)
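And querying works the same as before, e.g. (q is just your query string):

response = index.query(q, similarity_top_k=5)
print(len(response.source_nodes[0].source_text.split(" ")))  # sanity check from earlier in the thread; should be ~350ish words for 512 tokens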


Works perfectly. Thanks for all the help @Logan M @jerryjliu0