Find answers from the community

kristian
Joined September 25, 2024
I'm using LlamaCloudIndex with similarity_top_k=3. I noticed that when no match is found, source_nodes contains only one chunk, and that chunk is the whole document. Is that expected behaviour? I'm surprised, as I thought there would always be source nodes since we're looking at top similarities, i.e. it would return the most similar nodes even if they aren't similar at all.
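For context, a minimal sketch of my setup (index name, project name, and the query are placeholders):

Python
from llama_index.indices.managed.llama_cloud import LlamaCloudIndex

# "my-index" and "Default" stand in for my actual index/project
index = LlamaCloudIndex("my-index", project_name="Default")
retriever = index.as_retriever(similarity_top_k=3)

nodes = retriever.retrieve("a query with no good match")
for node in nodes:
    print(node.score, len(node.text))  # sometimes just one whole-document chunk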
5 comments
Hello! Is there a way to log the actual prompt that was sent to the LLM, i.e. including the chunks retrieved from the vector store, etc.?
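For example, a minimal sketch of the kind of logging I'm after, assuming the "simple" global handler prints full LLM inputs and outputs (query_engine stands in for whatever engine is in use):

Python
from llama_index.core import set_global_handler

# The "simple" handler prints each LLM prompt/completion to stdout,
# which should include the retrieved chunks stuffed into the prompt
set_global_handler("simple")

# query_engine is assumed to be built elsewhere; every call is now logged
response = query_engine.query("What does the document say about X?")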
2 comments
Hello! I'm using LlamaParse to parse a big PDF file, and I have page numbers added to every page. The resulting file is a txt/md file. If I load this file with SimpleDirectoryReader, I lose the information about which page each piece of text came from (it's obviously no longer part of the metadata, since the file doesn't have pages the way a PDF does); if I'm lucky the page number shows up in the source node, but that's not reliable enough.
How should I handle this?
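For context, a rough sketch of the workaround I'm considering, assuming a marker like "--- Page N ---" was injected into the markdown during parsing (the marker format is just an example):

Python
import re
from llama_index.core import Document

with open("parsed.md") as f:
    text = f.read()

# re.split with one capture group yields [preamble, "1", page1_body, "2", page2_body, ...]
parts = re.split(r"^--- Page (\d+) ---$", text, flags=re.MULTILINE)
docs = [
    Document(text=body.strip(), metadata={"page_label": num})
    for num, body in zip(parts[1::2], parts[2::2])
]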
4 comments
Hello! I am playing around with llama-index and Pinecone as a vector DB. It seems like there's no control over the ID when ingesting data into Pinecone:

Python
from llama_index.core import Document, VectorStoreIndex

# storage_context wraps the PineconeVectorStore (set up earlier)
llama_doc = Document(id_="f1", text="My text")
index = VectorStoreIndex.from_documents([llama_doc], storage_context=storage_context)


Pinecone ends up with a different internal ID. The reason I'm asking is that there seems to be no way to delete docs in Pinecone with metadata filtering; you have to use the ID. And I don't seem to be able to get the Pinecone-assigned ID after ingestion either.
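One workaround I'm sketching, assuming (I haven't confirmed this) that the node IDs llama-index generates are what PineconeVectorStore uses as vector IDs: chunk the document into nodes myself before building the index, so I hold on to the IDs. storage_context is the same Pinecone-backed one as above.

Python
from llama_index.core import Document, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter

llama_doc = Document(id_="f1", text="My text")
nodes = SentenceSplitter().get_nodes_from_documents([llama_doc])
node_ids = [n.node_id for n in nodes]  # keep these around for later deletes

index = VectorStoreIndex(nodes, storage_context=storage_context)
# later, if the assumption holds: pinecone_index.delete(ids=node_ids)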
18 comments