I think that feature is a tad experimental, but glad to hear it's working for you! Would probably be an easy change to not clobber the metadata 🙂
Yeah, I took a look but the solution wasn't too obvious
I'm using the default metadata mode of MetadataMode.ALL, but all the metadata I get back at the end of the query is
{'05603d6e-02fd-4ef3-bfa2-068961df84cf': {},
'40c874f6-edd5-4747-87b1-eda8ae72f2cb': {},
'4acb82e9-9f62-4795-a14e-d94f7bf2c05a': {},
'632a3505-0bfb-4d73-b792-71e436e5a916': {},
'9d0ee846-bce2-4374-bcf7-6cc2afe7ee6e': {},
'ae9c50eb-548d-499f-bfc9-7070087df97a': {},
'f35a3639-0319-467c-b1da-9fc74ce1728b': {},
'f41aed07-f230-40e2-892f-46c926a7ecb5': {},
'fb943654-6449-40e2-adf3-8c946d990c7f': {}}
on the plus side, the improvement in detail from the responses is almost magical
oh wait, I think I see the issue
we're returning:
return [
    NodeWithScore(node=TextNode(text=t)) for t in compressed_prompt_txt_list
]
seems to me that would do it 🙂
hmmm, the question is: how do you correlate the proper metadata back after it's compressed? I guess that's why it wasn't done originally... probably a "handle this another day" sorta issue
hmmm yea, a tricky problem 🙂
Once it compresses, there's no way to associate back to an original source
haha, well, good to know that we came to the same conclusion at least
the real problem is that LLMLingua also reorders the context chunks to improve compressibility (which is part of the secret sauce for information retention downstream)
there should be the same number of chunks out as in, though... so I guess if you don't care if the metadata matches up chunk to chunk, then that's fine
I wonder, though, if it would be better to make a "metadata batch" so to speak...
"These 3 documents are were batched together and transformed, here's the metadata for those original 3 documents"
or maybe a better way to say it is "This TextNode was synthesized from some or all of these 3 documents; here is the corresponding metadata for those 3 documents"
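something like this is what I'm picturing (rough sketch against the snippet above; assuming the original nodes passed into the postprocessor are still in scope, and the helper name is just illustrative):
from llama_index.schema import NodeWithScore, TextNode

def build_compressed_nodes(compressed_prompt_txt_list, nodes):
    # Gather the metadata of every original node in this compression batch,
    # keyed by node id, since chunk-to-chunk correlation is lost.
    batch_metadata = {n.node.node_id: n.node.metadata for n in nodes}
    return [
        NodeWithScore(
            node=TextNode(
                text=t,
                # Each compressed chunk may have been synthesized from any of
                # the originals, so attach the whole batch's metadata.
                metadata={"source_metadata": batch_metadata},
            )
        )
        for t in compressed_prompt_txt_list
    ]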
Whew... so the LLMLingua module is pretty dang amazing... it virtually eliminates lost-in-the-middle problems
right now it's only being used as a postprocessor for document retrieval, but I think it would also be effective for the refine step in the SubQuestionQueryEngine
I'm blown away right now... even a 47x compression yields great info retrieval from the context... :mindexplosion:
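for reference, this is roughly how I'm wiring it in (a sketch based on the LongLLMLingua example; parameter names may differ from the current LongLLMLinguaPostprocessor API, and the retriever is just whatever you already have):
from llama_index.postprocessor import LongLLMLinguaPostprocessor
from llama_index.query_engine import RetrieverQueryEngine

# Compress the retrieved context before it ever reaches the LLM.
node_postprocessor = LongLLMLinguaPostprocessor(
    instruction_str="Given the context, please answer the final question",
    target_token=300,
    rank_method="longllmlingua",
    additional_compress_kwargs={
        "condition_compare": True,
        "condition_in_question": "after",
        "context_budget": "+100",
        "reorder_context": "sort",  # the chunk reordering mentioned above
    },
)

query_engine = RetrieverQueryEngine.from_args(
    retriever,  # assumed: any retriever from an existing index
    node_postprocessors=[node_postprocessor],
)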
wow that's kind of wild lol
I should look into trying that module again... maybe we can make it better 🙂
lol, it totally is... when it compresses down that much, the majority of the compressed output is just gobbledygook, but apparently the LLM is able to make sense of it
My retrieval stack: documents are split with the sentence splitter and stored in Elasticsearch with bge-small for embeddings; then I retrieve the top 30 (lol) using hybrid search, expand the window by 3 sentences, rerank with bge-reranker-large to pick the top 5, and then use LLMLingua to reduce it to a 100-token target... It's, amazingly, able to compress all the relevant info down to 100-ish tokens while throwing out the irrelevant stuff that remains after reranking...
All three of the models involved in that retrieval and reranking fit comfortably on an RTX 3060 with room to spare
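in llama_index terms the stack looks roughly like this (a sketch, not my exact code; I'm using sentence-window parsing for the "expand by 3 sentences" step, the ./data path and index name are placeholders, and the Elasticsearch / hybrid-mode argument names may differ slightly by version):
from llama_index import (
    ServiceContext,
    SimpleDirectoryReader,
    StorageContext,
    VectorStoreIndex,
)
from llama_index.embeddings import HuggingFaceEmbedding
from llama_index.node_parser import SentenceWindowNodeParser
from llama_index.postprocessor import (
    LongLLMLinguaPostprocessor,
    MetadataReplacementPostProcessor,
    SentenceTransformerRerank,
)
from llama_index.vector_stores import ElasticsearchStore

# Each node is a single sentence, with a +/-3 sentence window kept in metadata.
node_parser = SentenceWindowNodeParser.from_defaults(window_size=3)
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
service_context = ServiceContext.from_defaults(
    embed_model=embed_model, node_parser=node_parser
)

vector_store = ElasticsearchStore(index_name="docs", es_url="http://localhost:9200")
storage_context = StorageContext.from_defaults(vector_store=vector_store)

documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(
    documents, storage_context=storage_context, service_context=service_context
)

query_engine = index.as_query_engine(
    similarity_top_k=30,  # top 30 via hybrid search
    vector_store_query_mode="hybrid",
    node_postprocessors=[
        # Swap each sentence for its surrounding window before reranking.
        MetadataReplacementPostProcessor(target_metadata_key="window"),
        SentenceTransformerRerank(model="BAAI/bge-reranker-large", top_n=5),
        LongLLMLinguaPostprocessor(target_token=100),
    ],
)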
In the Contribution Guide, I noticed that there is the following section
however, that link is dead
that seems like an ideal place to put LLM Lingua, though
oh, wow... you were quick to comment on my PR
I was sitting here trying to figure out how to test it 🙂
tbh I'm a little lost on how to mock the embeddings so that we can get a valid test
my "test" so far has been a directory of 380 medical pdfs that it handled well π
yup! I'm doing a bit of refactoring now
yeah... tried it out with LLM Lingua in mind... I'm honestly blown away
This is the result of a chatbot I'm building with LlamaIndex... the LLM is Mistral 7B Instruct
all of that information is correct, and the references it plopped down are actual, real references... correctly cited in its response
the "Context Sources" are plucked from the metadata, multiple of which were originally 4000+ tokens compressed down to 100 ~ 150 with LLM Lingua
a big problem I was having with small-to-big was that if the window was too small it would leave out lots of details, and if the window was too big it would hallucinate because there might be irrelevant data in there
splitting the document semantically tends to group all the related text together... sometimes a single paragraph, sometimes multiple related paragraphs
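for context, the splitting is roughly this (a sketch; assuming it lands as SemanticSplitterNodeParser, and the parameter names may differ):
from llama_index.embeddings import HuggingFaceEmbedding
from llama_index.node_parser import SemanticSplitterNodeParser
from llama_index.schema import Document

embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

# Breakpoints go where adjacent sentences stop being semantically similar,
# so related sentences (often whole paragraphs) end up in the same node.
splitter = SemanticSplitterNodeParser.from_defaults(
    embed_model=embed_model,
    buffer_size=1,
    breakpoint_percentile_threshold=95,
)

nodes = splitter.get_nodes_from_documents(
    [
        Document(
            text="First topic sentence one. First topic sentence two. "
            "A completely different topic starts here."
        )
    ]
)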
lol at the lotr quotes in the mock embedding
awesome, got a couple of basic tests written
that being said... make test didn't seem to pick them up
I had to pytest /path/to/test.py
to get it to run
I feel like I'm missing something to register the test...
tests/multi_modal_llms/test_replicate_multi_modal.py . [ 49%]
tests/node_parser/test_html.py ..... [ 50%]
tests/node_parser/test_json.py ..... [ 51%]
tests/node_parser/test_markdown.py .... [ 51%]
tests/node_parser/test_markdown_element.py ... [ 52%]
tests/node_parser/test_unstructured.py s [ 52%]
tests/objects/test_base.py ... [ 52%]
tests/objects/test_node_mapping.py .... [ 53%]
oh, I'm a dummy... I didn't prefix the filename with test_
Yea that will do it! 🙂 Awesome, we got tests now 💪