Chunk sizes

Hey y'all - is there a way to get the query engine to submit multiple nodes at once with queries? Is that what embedding_limit is used for? I'm running into an issue where my Vector Store Index (when using the Curie model on OpenAI) is creating small chunks because it creates a chunk per page in the documents it's indexing. I like that it keeps track of pages, but the chunks are so small that the query doesn't have enough context to create a decent response. Does that make sense? Alternatively, a minimum chunk size limit could also work, I guess, if that option is available (I don't see anything like that in the docs)?
With a vector index? You can set the top k in the query engine to be larger

index.as_query_engine(similarity_top_k=5)

You could also manually process the document objects to combine them if some are too small. I don't think there's a minimum size setting though 🤔
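If you go the manual route, the merging could look something like this (just a sketch, assuming the Document class accepts text=... and exposes its contents as .text; merge_small_documents is a made-up helper, not a built-in llama_index feature):
Plain Text
from llama_index import Document

def merge_small_documents(documents, min_chars=2000):
    """Glue consecutive per-page documents into bigger ones so each
    indexed chunk carries more context. Illustrative only."""
    merged, buffer = [], ""
    for doc in documents:
        buffer = (buffer + "\n\n" + doc.text).strip()
        if len(buffer) >= min_chars:
            merged.append(Document(text=buffer))
            buffer = ""
    if buffer:
        merged.append(Document(text=buffer))
    return merged

# then build the index from merge_small_documents(docs) as usual
Keep in mind that this trades away the per-page tracking you said you liked, since several pages end up in one document.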
So top 5 will pick the top 5 chunks but then the query doesn't process them all together, right? At least in the Refine (and now Accumulate) response builders.
This is true. But the compact mode will
Or at least, it will stuff as much text as possible into each LLM call
Will Compact mode make multiple LLM calls? Looking at the code I thought it would only make 1 call with as much text as possible.
Nope it will still do multiple
Makes me think maybe I need to update my Accumulate PR again to include a new Accumulate Compact 😆
Hahaha if you need it, the power is in your hands 😅
is Compact mode the same as Compact & Refine?
Yeah, the compact mode extends the refine mode. The only difference is how it sets up each text chunk to be as large as possible to reduce LLM calls
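If you want to try it, compact mode is picked with the response_mode option on the query engine — something like this (assuming your version accepts response_mode on as_query_engine):
Plain Text
# compact mode: same refine loop, but packs as much retrieved text
# as possible into each LLM call
query_engine = index.as_query_engine(
    similarity_top_k=5,
    response_mode="compact",
)
response = query_engine.query("your question here")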
sweet yeah I think I might need to make another PR 😆
I'll get the Accumulate PR merged first though. 1 step at a time
So I'm experimenting with making a Compact and Accumulate mode and it's pretty straightforward, but I'm really confused by how compact_text_chunks works. I've created this simple test and it's failing and I don't understand why. Regardless of what I set in my PromptHelper init params, it's keeping all of the chunks separate. Any thoughts?
Plain Text
mock_qa_prompt_tmpl = "{context_str}{query_str}"
mock_qa_prompt = QuestionAnswerPrompt(mock_qa_prompt_tmpl)
prompt_helper = PromptHelper(
    max_input_size=11,
    num_output=0,
    max_chunk_overlap=0,
    tokenizer=mock_tokenizer,
    separator="\n\n",
    chunk_size_limit=20,
)
texts = [
    "This",
    "is",
    "bar",
    "This",
    "is",
    "foo",
]
compacted_chunks = prompt_helper.compact_text_chunks(mock_qa_prompt, texts)
assert compacted_chunks == ["This\n\nis\n\nbar", "This\n\nis\n\nfoo"]
# AssertionError: assert ['This', '', 'is', '', 'bar', '', 'This', '', 'is', '', 'foo'] == ['This\n\nis\n\nbar', 'This\n\nis\n\nfoo']
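(For reference, the behavior the test expects from compact_text_chunks is roughly this kind of greedy packing — a simplified illustration, not the actual llama_index implementation:)
Plain Text
def compact(chunks, max_tokens, tokenizer, separator="\n\n"):
    """Pack input chunks into as few output chunks as fit the token
    budget left over by the prompt template. Simplified: ignores the
    tokens taken up by the separator itself."""
    packed, current, used = [], [], 0
    for chunk in chunks:
        n = len(tokenizer(chunk))
        if current and used + n > max_tokens:
            packed.append(separator.join(current))
            current, used = [], 0
        current.append(chunk)
        used += n
    if current:
        packed.append(separator.join(current))
    return packed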
Hmmm what happens if you increase the max input size?
The code for compact_text_chunks seems so simple, kinda weird that it's not working lol
yeah I increased it to 10000 and it was the same
that was my thought too
I actually got a full compact-and-accumulate response builder working, but then I went to make a unit test for it, couldn't get it to do anything I expected, and eventually ended up at the compact_text_chunks function
what's interesting is that the one unit test for compact_text_chunks does work
and the only difference I can see is the prompt template
This is the working unit test:
Plain Text
def test_compact_text() -> None:
    """Test compact text."""
    test_prompt_text = "This is the prompt{text}"
    test_prompt = TestPrompt(test_prompt_text)
    prompt_helper = PromptHelper(
        max_input_size=9,
        num_output=1,
        max_chunk_overlap=0,
        tokenizer=mock_tokenizer,
        separator="\n\n",
    )
    text_chunks = ["Hello", "world", "foo", "Hello", "world", "bar"]
    compacted_chunks = prompt_helper.compact_text_chunks(test_prompt, text_chunks)
    assert compacted_chunks == ["Hello\n\nworld\n\nfoo", "Hello\n\nworld\n\nbar"]
I can push my changes to a public branch if you want to take a look
Let me see if I can test something locally first here
hmmm it worked for me somehow haha
Plain Text
>>> from llama_index import PromptHelper
>>> prompt_helper = PromptHelper(max_input_size=4096, num_output=256, max_chunk_overlap=20)
>>> from llama_index.prompts.prompts import QuestionAnswerPrompt
>>> mock_prompt = QuestionAnswerPrompt("{context_str}{query_str}")
>>> prompt_helper.compact_text_chunks(mock_prompt, ['This', 'Should', 'be', 'one', 'msg.'])
['This\n\nShould\n\nbe\n\none\n\nmsg.']
>>> 
not quite the same example I guess, but at least it's possible lol
🤔 that's so weird
ok let me keep messing around with it
ok yeah.... there must be something else causing an issue here
ok so... What does having mock_service_context as a param (even if it isn't referenced at all) do here?
Plain Text
def test_accumulate_compact_response(
    mock_service_context: ServiceContext,
) -> None:
    mock_qa_prompt_tmpl = "{context_str}{query_str}"
    mock_qa_prompt = QuestionAnswerPrompt(mock_qa_prompt_tmpl)
    prompt_helper = PromptHelper(
        max_input_size=100,
        num_output=0,
        max_chunk_overlap=0,
        tokenizer=mock_tokenizer,
        separator="\n\n",
    )
    texts = [
        "This",
        "is",
        "bar",
        "This",
        "is",
        "foo",
    ]
    compacted_chunks = prompt_helper.compact_text_chunks(mock_qa_prompt, texts)
    assert compacted_chunks == ["This\n\nis\n\nbar\n\nThis\n\nis\n\nfoo"]
Because that fails if I run this:
Plain Text
pytest -k test_accumulate_compact_response tests/indices/response/test_response_builder.py -vv
But it passes if I just drop the param even though it isn't referenced in the test at all 🤯
I'd say drop the param then 😅 not 100% sure what it's doing if it's not being used...
having that param actually triggers this:
Plain Text
@pytest.fixture()
def mock_service_context(
    patch_token_text_splitter: Any, patch_llm_predictor: Any
) -> ServiceContext:
    return ServiceContext.from_defaults(embed_model=MockEmbedding())
which then you're like "well, so what does that do?"
well that param, patch_token_text_splitter, actually triggers a new fixture:
Plain Text
@pytest.fixture
def patch_token_text_splitter(monkeypatch: pytest.MonkeyPatch) -> None:
    monkeypatch.setattr(TokenTextSplitter, "split_text", patch_token_splitter_newline)
    monkeypatch.setattr(
        TokenTextSplitter,
        "split_text_with_overlaps",
        patch_token_splitter_newline_with_overlaps,
    )
so the mock_service_context fixture basically sets global overrides of the text splitter. What specifically it does, I don't have time to investigate. But I'm pretty sure that's what's causing this weirdness.
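(Minimal standalone illustration of why an "unused" fixture argument still matters — requesting a fixture runs it, and its own fixture dependencies can monkeypatch things globally for that test. Hypothetical names, nothing from the llama_index test suite:)
Plain Text
import pytest

class Splitter:
    def split(self, text: str) -> list:
        return text.split()  # real behavior

@pytest.fixture()
def patch_splitter(monkeypatch: pytest.MonkeyPatch) -> None:
    # side effect: globally override Splitter.split for this test
    monkeypatch.setattr(Splitter, "split", lambda self, text: [text])

@pytest.fixture()
def service_context(patch_splitter: None) -> str:
    # requesting this fixture pulls in patch_splitter's side effect too
    return "service-context"

def test_without_fixture() -> None:
    assert Splitter().split("a b") == ["a", "b"]

def test_with_fixture(service_context: str) -> None:
    # the unused-looking param changed Splitter's behavior
    assert Splitter().split("a b") == ["a b"]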
Lol absolutely bonkers. Good catch!
It's basically overriding a key function of the text splitter 👀
Because... reasons lol
omg what an adventure
it turns out I need that patch_llm_predictor side effect because without it the predict call just hangs indefinitely with no error
WOOOO I got the tests passing
new PR incoming!
oh no haha wow, between now and yesterday y'all split out the response builders into their own files!
Oh crap, Simon is moving fast hahaha
Was probably time to do that, it was getting a bit monolithic
no, it was a good idea to do it, it just surprised me when I went to try to merge 😆