Chunk sizes

Hey y'all - is there a way to get the query engine to submit multiple nodes at once with queries? Is that what embedding_limit is used for? I'm running into an issue where my Vector Store Index (when using the Curie model on OpenAI) is creating small chunks because it creates a chunk per page in the documents it's indexing. I like that it keeps track of pages, but the chunks are so small that the query doesn't have enough context to create a decent response. Does that make sense? Alternatively, a minimum chunk size limit could also work, I guess, if that option is available (I don't see anything like that in the docs)?
With a vector index? You can set the top k in the query engine to be larger

index.as_query_engine(similarity_top_k=5)

You could also manually process the document objects to combine them if some are too small. I don't think there's a minimum size setting though 🤔
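If you go the manual route, the merging could look something like this (just a sketch, assuming the Document class accepts text=... and exposes its contents as .text; merge_small_documents is a made-up helper, not a built-in llama_index feature):
Plain Text
from llama_index import Document

def merge_small_documents(documents, min_chars=2000):
    """Glue consecutive per-page documents into bigger ones so each
    indexed chunk carries more context. Illustrative only."""
    merged, buffer = [], ""
    for doc in documents:
        buffer = (buffer + "\n\n" + doc.text).strip()
        if len(buffer) >= min_chars:
            merged.append(Document(text=buffer))
            buffer = ""
    if buffer:
        merged.append(Document(text=buffer))
    return merged

# then build the index from merge_small_documents(docs) as usual
Keep in mind that this trades away the per-page tracking you said you liked, since several pages end up in one document.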
So top 5 will pick the top 5 chunks but then the query doesn't process them all together, right? At least in the Refine (and now Accumulate) response builders.
This is true. But the compact mode will
Or at least, it will stuff as much text as possible into each LLM call
Will Compact mode make multiple LLM calls? Looking at the code I thought it would only make 1 call with as much text as possible.
Nope it will still do multiple
Makes me think maybe I need to update my Accumulate PR again to include a new Accumulate Compact 😆
Hahaha if you need it, the power is in your hands 😅
is Compact mode the same as Compact & Refine?
Yeah, the compact mode extends the refine mode. The only difference is how it sets up each text chunk to be as large as possible to reduce LLM calls
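If you want to try it, compact mode is picked with the response_mode option on the query engine — something like this (assuming your version accepts response_mode on as_query_engine):
Plain Text
# compact mode: same refine loop, but packs as much retrieved text
# as possible into each LLM call
query_engine = index.as_query_engine(
    similarity_top_k=5,
    response_mode="compact",
)
response = query_engine.query("your question here")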
sweet yeah I think I might need to make another PR 😆
I'll get the Accumulate PR merged first though. 1 step at a time
So I'm experimenting with making a Compact and Accumulate mode and it's pretty straightforward, but I'm really confused by how compact_text_chunks works. I've created this simple test and it's failing and I don't understand why. Regardless of what I set in my PromptHelper init params, it's keeping all of the chunks separate. Any thoughts?
Plain Text
mock_qa_prompt_tmpl = "{context_str}{query_str}"
mock_qa_prompt = QuestionAnswerPrompt(mock_qa_prompt_tmpl)
prompt_helper = PromptHelper(
    max_input_size=11,
    num_output=0,
    max_chunk_overlap=0,
    tokenizer=mock_tokenizer,
    separator="\n\n",
    chunk_size_limit=20,
)
texts = [
    "This",
    "is",
    "bar",
    "This",
    "is",
    "foo",
]
compacted_chunks = prompt_helper.compact_text_chunks(mock_qa_prompt, texts)
assert compacted_chunks == ["This\n\nis\n\nbar", "This\n\nis\n\nfoo"]
# AssertionError: assert ['This', '', 'is', '', 'bar', '', 'This', '', 'is', '', 'foo'] == ['This\n\nis\n\nbar', 'This\n\nis\n\nfoo']
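(For reference, the behavior the test expects from compact_text_chunks is roughly this kind of greedy packing — a simplified illustration, not the actual llama_index implementation:)
Plain Text
def compact(chunks, max_tokens, tokenizer, separator="\n\n"):
    """Pack input chunks into as few output chunks as fit the token
    budget left over by the prompt template. Simplified: ignores the
    tokens taken up by the separator itself."""
    packed, current, used = [], [], 0
    for chunk in chunks:
        n = len(tokenizer(chunk))
        if current and used + n > max_tokens:
            packed.append(separator.join(current))
            current, used = [], 0
        current.append(chunk)
        used += n
    if current:
        packed.append(separator.join(current))
    return packed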
Hmmm what happens if you increase the max input size?
The code for compact_text_chunks seems so simple, kinda weird that it's not working lol
yeah I increased it to 10000 and it was the same
that was my thought too
I actually got a full compact-and-accumulate response builder working, but then I went to make a unit test for it, couldn't get it to do anything I expected, and eventually ended up at the compact_text_chunks function
what's interesting is that the one unit test for compact_text_chunks does work
and the only difference I can see is the prompt template
This is the working unit test:
Plain Text
def test_compact_text() -> None:
    """Test compact text."""
    test_prompt_text = "This is the prompt{text}"
    test_prompt = TestPrompt(test_prompt_text)
    prompt_helper = PromptHelper(
        max_input_size=9,
        num_output=1,
        max_chunk_overlap=0,
        tokenizer=mock_tokenizer,
        separator="\n\n",
    )
    text_chunks = ["Hello", "world", "foo", "Hello", "world", "bar"]
    compacted_chunks = prompt_helper.compact_text_chunks(test_prompt, text_chunks)
    assert compacted_chunks == ["Hello\n\nworld\n\nfoo", "Hello\n\nworld\n\nbar"]
I can push my changes to a public branch if you want to take a look
Let me see if I can test something locally first here
hmmm it worked for me somehow haha
Plain Text
>>> from llama_index import PromptHelper
>>> prompt_helper = PromptHelper(max_input_size=4096, num_output=256, max_chunk_overlap=20)
>>> from llama_index.prompts.prompts import QuestionAnswerPrompt
>>> mock_prompt = QuestionAnswerPrompt("{context_str}{query_str}")
>>> prompt_helper.compact_text_chunks(mock_prompt, ['This', 'Should', 'be', 'one', 'msg.'])
['This\n\nShould\n\nbe\n\none\n\nmsg.']
>>> 
not quite the same example I guess, but at least it's possible lol
🤔 that's so weird
ok let me keep messing around with it
ok yeah.... there must be something else causing an issue here
ok so... What does having mock_service_context as a param (even if it isn't referenced at all) do here?
Plain Text
def test_accumulate_compact_response(
    mock_service_context: ServiceContext,
) -> None:
    mock_qa_prompt_tmpl = "{context_str}{query_str}"
    mock_qa_prompt = QuestionAnswerPrompt(mock_qa_prompt_tmpl)
    prompt_helper = PromptHelper(
        max_input_size=100,
        num_output=0,
        max_chunk_overlap=0,
        tokenizer=mock_tokenizer,
        separator="\n\n",
    )
    texts = [
        "This",
        "is",
        "bar",
        "This",
        "is",
        "foo",
    ]
    compacted_chunks = prompt_helper.compact_text_chunks(mock_qa_prompt, texts)
    assert compacted_chunks == ["This\n\nis\n\nbar\n\nThis\n\nis\n\nfoo"]
Because that fails if I run this:
Plain Text
pytest -k test_accumulate_compact_response tests/indices/response/test_response_builder.py -vv
But it passes if I just drop the param even though it isn't referenced in the test at all 🤯
I'd say drop the param then 😅 not 100% sure what it's doing if it's not being used...
having that param actually triggers this:
Plain Text
@pytest.fixture()
def mock_service_context(
    patch_token_text_splitter: Any, patch_llm_predictor: Any
) -> ServiceContext:
    return ServiceContext.from_defaults(embed_model=MockEmbedding())
which then you're like "well, so what does that do?"
well that param, patch_token_text_splitter, actually triggers a new fixture:
Plain Text
@pytest.fixture
def patch_token_text_splitter(monkeypatch: pytest.MonkeyPatch) -> None:
    monkeypatch.setattr(TokenTextSplitter, "split_text", patch_token_splitter_newline)
    monkeypatch.setattr(
        TokenTextSplitter,
        "split_text_with_overlaps",
        patch_token_splitter_newline_with_overlaps,
    )
so the mock_service_context fixture basically sets global overrides of the text splitter. What specifically it does, I don't have time to investigate. But I'm pretty sure that's what's causing this weirdness.
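(Minimal standalone illustration of why an "unused" fixture argument still matters — requesting a fixture runs it, and its own fixture dependencies can monkeypatch things globally for that test. Hypothetical names, nothing from the llama_index test suite:)
Plain Text
import pytest

class Splitter:
    def split(self, text: str) -> list:
        return text.split()  # real behavior

@pytest.fixture()
def patch_splitter(monkeypatch: pytest.MonkeyPatch) -> None:
    # side effect: globally override Splitter.split for this test
    monkeypatch.setattr(Splitter, "split", lambda self, text: [text])

@pytest.fixture()
def service_context(patch_splitter: None) -> str:
    # requesting this fixture pulls in patch_splitter's side effect too
    return "service-context"

def test_without_fixture() -> None:
    assert Splitter().split("a b") == ["a", "b"]

def test_with_fixture(service_context: str) -> None:
    # the unused-looking param changed Splitter's behavior
    assert Splitter().split("a b") == ["a b"]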
Lol absolutely bonkers. Good catch!
It's basically overriding a key function of the text splitter 👀
Because... reasons lol
omg what an adventure
it turns out I need that patch_llm_predictor side effect because without it the predict call just hangs indefinitely with no error
WOOOO I got the tests passing
new PR incoming!
oh no haha wow, between now and yesterday y'all split out the response builders into their own files!
Oh crap, Simon is moving fast hahaha
Was probably time to do that, it was getting a bit monolithic
no, it was a good idea to do it, it just surprised me when I went to try to merge 😆