Find answers from the community

Sayan
·

Reranker

Hybrid Search and Re-ranking:

Hello team, I'm planning to implement Qdrant Hybrid Search: https://docs.llamaindex.ai/en/stable/examples/vector_stores/qdrant_hybrid.html

At the moment, I'm using Cohere for re-ranking at the final stage. However, the document mentions, "A fusion algorithm is applied to rank and order the nodes from different vector spaces (relative score fusion in this case)." Does this mean that the search already includes a built-in re-ranker, and therefore, I wouldn't need to use Cohere if I opt for this?
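
For context, here is roughly the setup I have in mind (a sketch only, not tested; client, collection_name, and COHERE_API_KEY come from my existing code, and the import paths assume a 0.10-style install):

Plain Text
from llama_index.core import VectorStoreIndex
from llama_index.vector_stores.qdrant import QdrantVectorStore
from llama_index.postprocessor.cohere_rerank import CohereRerank

# enable_hybrid=True stores dense and sparse vectors; relative score fusion
# merges the two result lists at query time
vector_store = QdrantVectorStore(client=client,
                                 collection_name=collection_name,
                                 enable_hybrid=True)
index = VectorStoreIndex.from_vector_store(vector_store=vector_store)

# the question: is the Cohere re-ranker below redundant once fusion is applied?
query_engine = index.as_query_engine(
    vector_store_query_mode="hybrid",
    similarity_top_k=2,
    sparse_top_k=12,
    node_postprocessors=[CohereRerank(api_key=COHERE_API_KEY, top_n=2)],
)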
6 comments
I am currently using LlamaIndex 0.10.5.

This does not work: llm = Anthropic(model="claude-3-opus-20240229")

Error:
Plain Text
Unexpected err=ValueError('Unknown model: claude-3-opus-20240229. Please provide a valid Anthropic model name.Known models are: claude-instant-1, claude-instant-1.2, claude-2, claude-2.0, claude-2.1'), type(err)=<class 'ValueError'>


However, the documentation here shows usage of that model: https://docs.llamaindex.ai/en/stable/examples/llm/anthropic.html

Is this because I am using an older version of LlamaIndex? Any way to get around this without updating LlamaIndex?
2 comments
I have some code running on a server that's designed to index documents into an existing Qdrant collection. Here's the code:

Plain Text
client = qdrant_client.QdrantClient(host=QDRANT_HOST,
                                    grpc_port=QDRANT_GRPC_PORT,
                                    prefer_grpc=True,
                                    api_key=QDRANT_API_KEY)
vector_store = QdrantVectorStore(client=client,
                                  collection_name=collection_name,
                                  batch_size=20)
index = VectorStoreIndex.from_vector_store(vector_store=vector_store,
                                           service_context=service_context)
for document in documents:
    try:
        log.info(f"SETUP: Source {source_id} Updating document")
        index.update_ref_doc(
            document,
            update_kwargs={
                "delete_kwargs": {
                    "delete_from_docstore": True
                }
            },
        )
    except Exception as err:
        log.info(f"SETUP: Source {source_id} Error: {err}")
        log.info(f"SETUP: Source {source_id} Update failed, trying insert")
        index.insert(document)


This code performs well when processing documents one at a time. However, it encounters issues under multiple concurrent requests. Some requests fail with the error: "UNKNOWN:Error received from peer {grpc_message:"Wrong input: Collection 166850 already exists!", grpc_status:3}". Consequently, both index.update_ref_doc in the try block and index.insert(document) in the exception handler fail.

Can anyone offer some advice on this? Is Qdrant not capable of handling concurrent insertions?
3 comments
Sayan
·

Cohere

I have a question regarding Cohere Reranking based on the following - https://docs.llamaindex.ai/en/stable/examples/node_postprocessor/CohereRerank.html

If Cohere is set as a node postprocessor as shown here, and there's a Cohere API outage (or say we have reached a rate limit or something like that), will this code still execute without any re-ranking, or will it crash?
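
For reference, the pattern from that page looks roughly like this (a sketch; api_key and index come from my own code, and the import path assumes a 0.10-style install):

Plain Text
from llama_index.postprocessor.cohere_rerank import CohereRerank

# Cohere is called as a node postprocessor after retrieval
cohere_rerank = CohereRerank(api_key=api_key, top_n=2)
query_engine = index.as_query_engine(
    similarity_top_k=10,
    node_postprocessors=[cohere_rerank],
)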
4 comments
L
S
Does the order of postprocessors matter?

i.e.

Plain Text
node_postprocessors=[
            MetadataReplacementPostProcessor(target_metadata_key="window"),
            cohere_rerank
 ]


vs

Plain Text
node_postprocessors=[
            cohere_rerank,
            MetadataReplacementPostProcessor(target_metadata_key="window")
]


would it make a difference?
4 comments
Sayan
·

Hi,

I'm currently using index.as_chat_engine() alongside Qdrant as a vector store.

I have a scenario where, if a vector lookup fails to return relevant search results, I need to send an error back to the client code interacting with this system.

By default, as_query_engine() / as_chat_engine() would still invoke the LLM without any context, from my understanding. How can I change this behaviour to return an error when no matches are found in the vector store?
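
To illustrate the behaviour I'm after, roughly (a sketch only, not working code; user_query is a placeholder):

Plain Text
# check retrieval myself before invoking the chat engine, and surface an
# error to the caller if the vector store returns nothing relevant
retriever = index.as_retriever(similarity_top_k=2)
nodes = retriever.retrieve(user_query)
if not nodes:
    raise ValueError("No relevant context found in the vector store")
response = chat_engine.chat(user_query)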
4 comments
Thought of giving Gemini a go. Followed these instructions - https://docs.llamaindex.ai/en/stable/examples/multi_modal/gemini.html,

but pip install llama-index-llms-gemini is throwing this error:

Plain Text
ERROR: Could not find a version that satisfies the requirement llama-index-llms-gemini (from versions: none)
ERROR: No matching distribution found for llama-index-llms-gemini



Is this only available for certain Python versions?
5 comments
I currently use a system that processes JSON, Markdown, PDF, HTML, and DOCX files, storing them in a Qdrant vector database. The database is then queried in a separate session.

At the moment, I employ the following Node Parser for all file types:

Plain Text
node_parser = SentenceWindowNodeParser.from_defaults(
    window_size=3,
    window_metadata_key="window",
    original_text_metadata_key="original_text",
)


However, I've discovered that LlamaIndex offers specialized node parsers for JSON, Markdown, and HTML. Consequently, I plan to switch to MarkdownNodeParser, JSONNodeParser, and HTMLNodeParser for those respective formats, while continuing to use SentenceWindowNodeParser for PDF and DOCX files.
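
Roughly what I have in mind for selecting the parser at ingestion time (a sketch; file_extension is a placeholder for however I detect the file type):

Plain Text
from llama_index.core.node_parser import (
    HTMLNodeParser,
    JSONNodeParser,
    MarkdownNodeParser,
    SentenceWindowNodeParser,
)

# specialised parsers for structured formats, sentence windows for the rest
parser_by_extension = {
    ".json": JSONNodeParser.from_defaults(),
    ".md": MarkdownNodeParser.from_defaults(),
    ".html": HTMLNodeParser.from_defaults(),
}
default_parser = SentenceWindowNodeParser.from_defaults(
    window_size=3,
    window_metadata_key="window",
    original_text_metadata_key="original_text",
)
node_parser = parser_by_extension.get(file_extension, default_parser)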

I have two questions:

  1. Do you foresee any issues with this approach?
  2. My query code is as follows:
Plain Text
service_context = ServiceContext.from_defaults(llm=llm,
                                               node_parser=node_parser,
                                               embed_model=embed_model)
index = VectorStoreIndex.from_vector_store(vector_store=vector_store,
                                           service_context=service_context)
chat_engine = index.as_chat_engine(
    similarity_top_k=2,
    # the target key defaults to `window` to match the node_parser's default
    node_postprocessors=[
        MetadataReplacementPostProcessor(target_metadata_key="window")
    ],
    vector_store_kwargs={"qdrant_filters": filters})


This setup is specifically tailored to the SentenceWindowNodeParser due to the node_parser parameter in ServiceContext.from_defaults and the node_postprocessors configuration in index.as_chat_engine():

Plain Text
node_postprocessors=[
    MetadataReplacementPostProcessor(target_metadata_key="window")
],


Is there a way to make the query code Node Parser agnostic?
7 comments
I run a Python server for data ingestion and handling queries. The server handles multiple concurrent document ingestion requests efficiently when using:

embed_model = OpenAIEmbedding(embed_batch_size=50)

However, when I switch to using:

embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

it can only process one request at a time, leaving the others stuck indefinitely.

Does anyone have advice on how to effectively use HuggingFace embeddings with LlamaIndex on a server that receives a high volume of concurrent requests?
11 comments
Sayan
·

Window

I'm encountering token limit errors with OpenAI when processing very large PDFs.

Here's my code (just the relevant snippets):

Plain Text
MODEL = "gpt-4-1106-preview"
EMBED_MODEL = "text-embedding-3-large"
llm = OpenAI(model=MODEL, temperature=0.1)
node_parser = SentenceWindowNodeParser.from_defaults(
    window_size=3,
    window_metadata_key="window",
    original_text_metadata_key="original_text",
)
embed_model = OpenAIEmbedding()
client = qdrant_client.QdrantClient(QDRANT_URL, api_key=QDRANT_API_KEY)

pdf_reader = SimpleDirectoryReader(input_files=pdf_files)
documents = pdf_reader.load_data()
vector_store = QdrantVectorStore(client=client,
                                 collection_name=collection_name,
                                 batch_size=20)
service_context = ServiceContext.from_defaults(llm=llm,
                                               node_parser=node_parser,
                                               embed_model=embed_model)
index = VectorStoreIndex.from_vector_store(vector_store=vector_store,
                                           service_context=service_context)
refreshed_docs = index.refresh_ref_docs(documents)


And here's the error I'm getting:

Plain Text
WARNING - Retrying llama_index.embeddings.openai.get_embeddings in 1.6310027891256675 seconds as it raised BadRequestError: Error code: 400 - {'error': {'message': "This model's maximum context length is 8192 tokens, however you requested 8212 tokens (8212 in your prompt; 0 for the completion). Please reduce your prompt or completion length.", 'type': 'invalid_request_error', 'param': None, 'code': None}}.


I've already tried using the newer OpenAI embedding model text-embedding-3-large and experimented with different values for embed_batch_size (10, 50, 100), but nothing has worked.

Does anyone have any suggestions?
9 comments
Refreshing the Index Results in a Python Error

I am adhering to the instructions provided here: https://docs.llamaindex.ai/en/latest/module_guides/indexing/document_management.html#refresh

Plain Text
index = VectorStoreIndex.from_vector_store(vector_store=vector_store, service_context=service_context)
refreshed_docs = index.refresh_ref_docs(
    documents,
    update_kwargs={"delete_kwargs": {
        "delete_from_docstore": True
    }}
)


Error: ERROR - Unexpected err=TypeError("delete_ref_doc() got multiple values for keyword argument 'delete_from_docstore'"), type(err)=<class 'TypeError'>

I am using the latest version of LlamaIndex. Can anyone please help?
7 comments
I'm trying to set up a system that can store data in a vector store and can subsequently perform both Q&A and summarization on the stored data. Could someone point me to any documentation, please?

I came across this - https://docs.llamaindex.ai/en/stable/examples/query_engine/JointQASummary.html, but it's not clear how to implement this with a vector store.
1 comment
I'm currently using LlamaIndex with Qdrant as the vector database.

When adding metadata to nodes via LlamaIndex like this:

Plain Text
document.metadata = {
    "source_id": source_id,
    "document_name": document_name
}


and retrieving it with:

Plain Text
retriever = index.as_retriever(...)
retrieved_nodes = retriever.retrieve(query)


I can access the added metadata through retrieved_nodes[0].metadata.

However, when I add metadata using Qdrant's Python SDK (https://qdrant.tech/documentation/concepts/payload/#:~:text=%7D-,You%20don%E2%80%99t%20need%20to%20know%20the%20ids%20of%20the%20points%20you%20want%20to%20modify.%20The%20alternative%20is%20to%20use%20filters.,-http), the metadata isn't returned by LlamaIndex's retrieval process, even though it's visible on the Qdrant UI.
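
For reference, this is roughly how I set the payload directly through the Qdrant SDK (a sketch; the filter values are illustrative):

Plain Text
from qdrant_client import models

client.set_payload(
    collection_name=collection_name,
    payload={"source_id": source_id, "document_name": document_name},
    points=models.Filter(
        must=[
            models.FieldCondition(
                key="document_name",
                match=models.MatchValue(value=document_name),
            )
        ]
    ),
)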

What does LlamaIndex do differently that allows the metadata to be returned upon retrieval?
14 comments
I'm interested in using AutoMergingRetriever but I need it to function as a chat engine and to stream chat responses. My existing chat engine code is as follows:

Plain Text
chat_engine = index.as_chat_engine(
        similarity_top_k=similarity_top_k,
        node_postprocessors=node_postprocessors,
        vector_store_kwargs={"qdrant_filters": filters})


I'm unsure how to integrate AutoMergingRetriever with the chat functionality. The documentation (https://docs.llamaindex.ai/en/latest/examples/retrievers/auto_merging_retriever.html) suggests using RetrieverQueryEngine, but that would only provide me with a query engine. How can I get a chat engine?
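
To make the question concrete, this is the kind of arrangement I'm wondering about (a sketch only, not tested; base_retriever and storage_context come from the linked example, and import paths assume a 0.10-style install):

Plain Text
from llama_index.core.retrievers import AutoMergingRetriever
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.chat_engine import CondenseQuestionChatEngine

retriever = AutoMergingRetriever(base_retriever, storage_context, verbose=True)
query_engine = RetrieverQueryEngine.from_args(retriever)

# is wrapping the query engine in a chat engine the right way to get
# chat + streaming on top of AutoMergingRetriever?
chat_engine = CondenseQuestionChatEngine.from_defaults(query_engine=query_engine)
streaming_response = chat_engine.stream_chat("my question")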
4 comments
Sayan
·

Delete

Hi,

I want to evaluate this retriever within the scope of our application - https://docs.llamaindex.ai/en/stable/examples/retrievers/auto_merging_retriever.html.

Currently, we're utilising Qdrant as our vector store, handling inserts, updates, and deletes.

For this particular retriever, we're considering using Redis as a document store while continuing with Qdrant for managing the vector store at the leaf-level nodes. This means we'll need to perform inserts, updates, and deletes on both the vector store and the document store.
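
For reference, the insert path as I understand it from the auto-merging retriever example, adapted to Redis (a sketch; the Redis host/port are placeholders):

Plain Text
from llama_index.core import StorageContext, VectorStoreIndex
from llama_index.core.node_parser import HierarchicalNodeParser, get_leaf_nodes
from llama_index.storage.docstore.redis import RedisDocumentStore

docstore = RedisDocumentStore.from_host_and_port(host="127.0.0.1", port=6379)

nodes = HierarchicalNodeParser.from_defaults().get_nodes_from_documents(documents)
docstore.add_documents(nodes)  # all nodes (parents + leaves) go to Redis

storage_context = StorageContext.from_defaults(docstore=docstore,
                                               vector_store=vector_store)
# only the leaf nodes are embedded and stored in Qdrant
index = VectorStoreIndex(get_leaf_nodes(nodes), storage_context=storage_context)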

Could someone please show the standard way to achieve this in LlamaIndex? The documentation only discusses inserts.

Thanks!
1 comment
OpenAI has a "System Role" set by default to "You are a helpful assistant." If I directly use the OpenAI API or SDK, I can customize it. When using LlamaIndex with OpenAI as the LLM, is there a way to customize it?
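
For example, with the OpenAI SDK directly I can do this (sketch; the system prompt is illustrative):

Plain Text
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4-1106-preview",
    messages=[
        # the "system" message is what I'd like to customise via LlamaIndex
        {"role": "system", "content": "You are a terse financial analyst."},
        {"role": "user", "content": "Summarise the quarterly results."},
    ],
)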
2 comments
LlamaIndex + Qdrant: Keep getting this error "The write operation timed out"

I'm unsure whether this question pertains to LlamaIndex or Qdrant. If it relates to Qdrant, please inform me so I can reach out to their customer support.

I've been utilizing LlamaIndex alongside Qdrant, hosted on Qdrant's cloud. Initially, I was on their free tier cluster but have since upgraded to a paid, production-grade cluster. However, I'm still encountering the issue described below.

I'm working with a large JSON file (417 kB) and here are the steps I've taken to index it:

Plain Text
node_parser = SentenceWindowNodeParser.from_defaults(
    window_size=3,
    window_metadata_key="window",
    original_text_metadata_key="original_text",
)
embed_model = OpenAIEmbedding(embed_batch_size=100)
client = qdrant_client.QdrantClient(QDRANT_URL, api_key=QDRANT_API_KEY)

loader = JsonDataReader()
documents = loader.load_data(json_string)
vector_store = QdrantVectorStore(client=client,
                                 collection_name=collection_name)
service_context = ServiceContext.from_defaults(llm=llm,
                                               node_parser=node_parser,
                                               embed_model=embed_model)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents,
                                        storage_context=storage_context,
                                        service_context=service_context)


I keep encountering this error:
Plain Text
ERROR - Unexpected err=ResponseHandlingException(WriteTimeout('The write operation timed out')), type(err)=<class 'qdrant_client.http.exceptions.ResponseHandlingException'>


I've experimented with adjusting this value:
Plain Text
embed_model = OpenAIEmbedding(embed_batch_size=100)

I tried the default, 50, 100, and 200. It successfully worked once with 100, but every other attempt resulted in either a write timeout or a read timeout error.
4 comments
Sayan
·

Chat

Chat Engine is stateful: https://docs.llamaindex.ai/en/stable/module_guides/deploying/chat_engines/root.html

What is the idiomatic way to implement it in a setup where the code runs on multiple servers, possibly behind a load balancer? In this setup, two consecutive chat questions may not reach the same server, so question 2 won't have the state from question 1 and answer 1.
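
To make it concrete, is explicitly passing the prior turns on every call the intended pattern here? A rough sketch (load_history and save_history are placeholders for my own session storage, e.g. Redis):

Plain Text
from llama_index.core.llms import ChatMessage

# rebuild the history for this session, regardless of which server got the request
chat_history = [
    ChatMessage(role=m["role"], content=m["content"])
    for m in load_history(session_id)
]
response = chat_engine.chat(user_message, chat_history=chat_history)
save_history(session_id, user_message, str(response))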
3 comments
Sayan
·

For this one:

Plain Text
node_parser = SentenceWindowNodeParser.from_defaults(
    window_size=3,
    window_metadata_key="window",
    original_text_metadata_key="original_text",
)

Is there a "best practice" for an "ideal" value of "window_size"? As per the doc, the default is 5 and the example uses 3 - https://docs.llamaindex.ai/en/stable/examples/node_postprocessor/MetadataReplacementDemo.html

What factors should one consider to arrive at this number?
1 comment
Sayan
·

Embeddings

I am using the EmbeddedTablesUnstructuredRetrieverPack like this:

Plain Text
EmbeddedTablesUnstructuredRetrieverPack = download_llama_pack(
    "EmbeddedTablesUnstructuredRetrieverPack",
    "./embedded_tables_unstructured_pack",
)


and then

Plain Text
embedded_tables_unstructured_pack = EmbeddedTablesUnstructuredRetrieverPack(
    "quarterly-nvidia/quarterly-nvidia.html",
    nodes_save_path="quarterly-nvidia.pkl")

However, I get this message: "Embeddings have been explicitly disabled. Using MockEmbedding.".

How to fix this?
8 comments