Find answers from the community

Joshhhh
Joined September 25, 2024
I'm using reranker = SentenceTransformerRerank(top_n=5, model="BAAI/bge-reranker-base") as a global variable in my chat engine.

Now when I try to deploy my app via Render, it fails with: "Ran out of memory (used over 2GB) while running your code."

What are some best practices for storing / deploying models?
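One pattern I'm considering is lazy-loading the reranker so the model isn't pulled into memory at import time (a minimal sketch; the caching helper is mine, not a LlamaIndex API):
Python
# Build the reranker on first use instead of at module import, so the
# cross-encoder weights aren't resident before they're actually needed.
from functools import lru_cache
from llama_index.postprocessor import SentenceTransformerRerank

@lru_cache(maxsize=1)
def get_reranker() -> SentenceTransformerRerank:
    return SentenceTransformerRerank(top_n=5, model="BAAI/bge-reranker-base")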
11 comments
How can I do parallel processing on IngestionPipelines?

My conversations have as many as 200 documents with as many as 800 pages, so I need to preprocess data before my customers can start a conversation.

I’ve scoured the docs/code, but haven’t found a way to run multiple pipeline calls at once. I’m currently using asyncio.gather over documents and then pages, calling pipeline.arun for each page, but my results still appear to be sequential…

Plain Text
Processed 6 documents in 130.94 seconds
Total number of pages processed: 6
Average time per document: 21.82 seconds
Average time per page: 21.50 seconds
Doc 4 took 16.62 seconds
  Page 1 took 14.89 seconds
Doc 2 took 39.05 seconds
  Page 1 took 38.35 seconds
Doc 6 took 38.55 seconds
  Page 1 took 37.80 seconds
Doc 5 took 93.89 seconds
  Page 1 took 85.75 seconds
Doc 1 took 129.99 seconds
  Page 1 took 128.76 seconds
Doc 3 took 130.94 seconds
  Page 1 took 129.01 seconds


If each page in this test conversation of 6 docs / 6 pages (all small text) takes ~20 seconds on its own, then with true concurrency the whole job should take roughly as long as the slowest page, about ~20 seconds, not 130, right? Any recs on how to make this work?
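For reference, the shape of my current fan-out is roughly this (simplified sketch; pages is my own list of per-page Documents, not a LlamaIndex API):
Python
import asyncio

async def ingest_all(pipeline, pages):
    # One arun per page, awaited together. If the transformations are
    # CPU-bound (parsing, embedding), they can still serialize on the
    # event loop even though the coroutines are "concurrent".
    tasks = [pipeline.arun(documents=[page]) for page in pages]
    return await asyncio.gather(*tasks)

# nodes = asyncio.run(ingest_all(pipeline, pages))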
23 comments
I’m manually rebuilding an index from my vector_store and it’s breaking a few things, although the chat engine works fine. I think it has to do with my metadata handling.

Issues:
  1. My query engine is not honoring the node_text_template. In image_1, the node is properly formatted and metadata keys are excluded as expected. In image_2, they’re not, even though the node._node_content.text_template is explicitly "text_template": "[Excerpt from document]\n{metadata_str}\nExcerpt:\n-----\n{content}\n-----". This means I'm sending the LLM junk that could mislead it.
  2. I’m getting chat responses and no errors, but my citations aren’t showing up. When inspecting sub_question_answer_pair.sources between image_1 and image_2, the only difference is that the former seems to be missing _node_content.
Pasting relevant code snippets in thread. Appreciate any help here 🙏
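For context, this is the node setup I mean (a minimal sketch assuming llama_index's TextNode fields; the metadata values are made up):
Python
from llama_index.schema import TextNode

node = TextNode(
    text="...",
    metadata={"file_name": "records.pdf", "page_label": "3"},
    excluded_llm_metadata_keys=["file_name"],  # keys the LLM should not see
    text_template="[Excerpt from document]\n{metadata_str}\nExcerpt:\n-----\n{content}\n-----",
)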
26 comments
I don't think RagDatasetGenerator is respecting ServiceContext's chunk size/overlap.

Getting this error:
Plain Text
File "/Users/joshuasabol/Library/Caches/pypoetry/virtualenvs/llama-app-backend-CfJQzey9-py3.11/lib/python3.11/site-packages/llama_index/llama_dataset/generator.py", line 105, in from_documents
    nodes = run_transformations(
...
File "/Users/joshuasabol/Library/Caches/pypoetry/virtualenvs/llama-app-backend-CfJQzey9-py3.11/lib/python3.11/site-packages/llama_index/node_parser/text/sentence.py", line 147, in split_text_metadata_aware
    raise ValueError(
ValueError: Metadata length (1493) is longer than chunk size (1024). Consider increasing the chunk size or decreasing the size of your metadata to avoid this.


Despite explicitly setting chunk params:
Plain Text
NODE_PARSER_CHUNK_SIZE = 3000
NODE_PARSER_CHUNK_OVERLAP = 200

# set context for llm provider
gpt_35_context = ServiceContext.from_defaults(
    llm=OpenAI(model="gpt-3.5-turbo-1106",
               temperature=0.1,
               chunk_size=NODE_PARSER_CHUNK_SIZE,
               chunk_overlap=NODE_PARSER_CHUNK_OVERLAP,
    )
)

# instantiate a DatasetGenerator
dataset_generator = RagDatasetGenerator.from_documents(
    parent_nodes,
    service_context=gpt_35_context,
    num_questions_per_chunk=2,  # set the number of questions per nodes
    show_progress=True,
)
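The chunk size in the error (1024) is the default, which makes me think my values never reach the node parser. A sketch of the variant I'm going to try next, with the chunk params on ServiceContext.from_defaults itself rather than on the LLM (assuming the legacy ServiceContext API):
Python
# chunk_size/chunk_overlap as keyword args of ServiceContext.from_defaults,
# not of the OpenAI constructor:
gpt_35_context = ServiceContext.from_defaults(
    llm=OpenAI(model="gpt-3.5-turbo-1106", temperature=0.1),
    chunk_size=NODE_PARSER_CHUNK_SIZE,
    chunk_overlap=NODE_PARSER_CHUNK_OVERLAP,
)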
2 comments
Anyone have success building an Auto-Retriever like the one in the docs [1]?

I ran into 3 issues:
  1. Error from FunctionTool's description being too long. This is odd, since the example's vector_store_info token count is 168 and mine is only 244. As a result, I had to remove a bunch of metadata filters.
  2. The LLM's function creation was quite bad. I'm using the OpenAI Function API to infer function parameters, so I'm passing vector_store_info in the function's description, and it keeps picking content_info as the filter key instead of the right MetadataInfo field (in this case, medical_provider; sketch of my vector_store_info below).
Plain Text
User: What is the patient's history with Dr. Woods?
**************************************************
=== Calling Function ===
Calling function: fabc5870-ce73-4ee0-9d09-c2c1158b1bdd with args: {
  "query": "chief complaint",
  "filter_key_list": ["content_info"],
  "filter_value_list": ["Dr. Woods"],
  "filter_operator_list": ["=="],
  "filter_condition": "AND"
}

  1. I'm realizing the filter operator I want is a sort of "text contains" or "approximately equal to," where the Auto-Retriever could retrieve "Dr Woods", "Woods", or "John Woods." Perhaps Auto-Retriever isn't a good fit for this?
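For reference, here's roughly the vector_store_info I'm passing (sketch; the descriptions are from my own schema):
Python
from llama_index.vector_stores.types import MetadataInfo, VectorStoreInfo

vector_store_info = VectorStoreInfo(
    content_info="Medical records for a single patient",
    metadata_info=[
        MetadataInfo(
            name="medical_provider",
            type="str",
            description="Treating provider's name, e.g. 'Dr. Woods'",
        ),
    ],
)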
[1] https://docs.llamaindex.ai/en/stable/examples/agent/openai_agent_query_cookbook.html#load-and-index-structured-data
1 comment
I'm a bit confused about how metadata extractors work:
  1. Is the metadata just used for retrieval or is it sent to the LLM as well?
  2. If the former, do we explicitly have to tell the index query engine to consider the metadata?
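The closest thing I've found so far is the per-node exclusion lists, which suggest metadata flows to both embeddings and the LLM by default (a sketch, assuming llama_index's node schema):
Python
from llama_index.schema import TextNode

node = TextNode(text="...", metadata={"file_path": "/data/records.pdf"})
node.excluded_embed_metadata_keys = ["file_path"]  # hidden from embeddings/retrieval
node.excluded_llm_metadata_keys = ["file_path"]    # hidden from the LLM prompt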
32 comments
Reset

How can I reset the values of docstore and index_store?

This approach isn't working:
Plain Text
# Create a new instance of SimpleDocumentStore to clear it --> DOESN'T WORK
logger.info(f"Current docstore: {json.dumps(storage_context.docstore, indent=4, sort_keys=True, default=custom_serializer)}")
docstore = SimpleDocumentStore()
logger.info(f"New docstore: {json.dumps(storage_context.docstore, indent=4, sort_keys=True, default=custom_serializer)}")

# Create a new instance of SimpleIndexStore to clear it --> DOESN'T WORK
logger.info(f"Current index_store: {json.dumps(storage_context.index_store, indent=4, sort_keys=True, default=custom_serializer)}")
index_store = SimpleIndexStore()
logger.info(f"New index_store: {json.dumps(storage_context.index_store, indent=4, sort_keys=True, default=custom_serializer)}")

# RESULT: The values of each are the same before and after resetting the instance
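My suspicion is that the new instances only rebind local variables while storage_context still points at the old stores. A sketch of what I'll try next (assuming StorageContext.from_defaults accepts fresh stores):
Python
from llama_index import StorageContext
from llama_index.storage.docstore import SimpleDocumentStore
from llama_index.storage.index_store import SimpleIndexStore

# Rebuild the context around empty stores, keeping the existing vector store.
storage_context = StorageContext.from_defaults(
    docstore=SimpleDocumentStore(),
    index_store=SimpleIndexStore(),
    vector_store=storage_context.vector_store,
)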
2 comments
Does anyone have any evaluation results for the Generative Language Semantic Retriever?

This is an awesome doc, but I'd love to see how it compares to a Base Retriever: https://docs.llamaindex.ai/en/latest/examples/managed/GoogleDemo.html#
3 comments
My original user input query is getting transformed before it hits my SubQuestionQueryEngine and I’m stumped as to why. Does OpenAIAgent.from_tools transform the user’s input query_str?

For example:

  1. User’s input question: List all of the insurance providers covering medical costs for the patient, including entities providing Letters of Protection (LOP)
  2. Is being changed to: insurance providers
  3. Which gets passed to SubQuestionQueryEngine, so it generates an incomplete question: What insurance providers does {patient} have?
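The wiring is roughly this (simplified sketch; the tool name and description are mine):
Python
from llama_index.agent import OpenAIAgent
from llama_index.tools import QueryEngineTool, ToolMetadata

tool = QueryEngineTool(
    query_engine=sub_question_engine,  # my SubQuestionQueryEngine
    metadata=ToolMetadata(
        name="medical_records",
        description="Answers questions about the patient's medical records.",
    ),
)
# The agent's LLM composes the tool input itself, so the string that reaches
# the SubQuestionQueryEngine is whatever the model wrote, not the raw query.
agent = OpenAIAgent.from_tools([tool], verbose=True)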
Logs and code in thread 🧵. Appreciate any guidance here!
27 comments
I have a bug where index.as_retriever is retrieving nodes that are not in the index.

For example, I created an index that has 97 nodes, but when I set top_k=100 it returns 100 nodes.

Could use another set of eyes 🙏😅

Plain Text
Query:  What's the X-ray details?
Creating tool: xray
Index xray_f30dc4df-d3f1-4926-9820-31d5867d0aa1 has 97 nodes
Tool xray has the description: A set of XRAY medical record for patient **************, Date of Birth: **************, that were exported from the user's electronic medical record system.
Total Nodes count: 97
Total Nodes: {...}
Filtered Nodes count: 60
BM25 Nodes count: 60
BM25 Node 1 of 60...

base_retriever Index ID: xray_f30dc4df-d3f1-4926-9820-31d5867d0aa1
Base Nodes count: 100 # ---> HERE'S THE ISSUE
Base Node 1 of 100...
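The base retrieval call is essentially this (simplified sketch; I pass top_k via similarity_top_k):
Python
# Index reports 97 nodes, but the retriever hands back 100:
base_retriever = index.as_retriever(similarity_top_k=100)
nodes = base_retriever.retrieve("What's the X-ray details?")
print(len(nodes))  # 100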
44 comments
Getting this weird error for index.insert_nodes(child_nodes), where each child_node = TextNode(id_=str(uuid4()), text=formatted_text):
Plain Text
UniqueViolation: duplicate key value violates unique constraint "data_pg_vector_store_pkey"
DETAIL:  Key (id)=(93735) already exists.

I'm using pg_vector as my vector_store and have inserted many nodes in a similar fashion as above. This is the first time I've seen this error.
1 comment
Running evaluations, but getting a BadRequestError ("maximum context length is 8192 tokens...").

The evals appear to be running against the entire context all at once instead of in chunks.

Specifically:
Plain Text
relevancy_result = judges["relevancy"].evaluate(
    query=example.query,
    response=prediction.response,
    contexts=prediction.contexts,
)


Runs against this example query/response/context combo:
Plain Text
Query tokens: 14
Response tokens: 141
Context tokens: 38395
Remaining tokens: -30358
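A workaround I'm considering: cap the context tokens before calling evaluate (sketch; the tiktoken-based helper and the 6000-token budget are mine):
Python
import tiktoken

def truncate_contexts(contexts, max_tokens=6000, model="gpt-3.5-turbo"):
    # Trim the combined contexts to a fixed token budget.
    enc = tiktoken.encoding_for_model(model)
    out, budget = [], max_tokens
    for ctx in contexts:
        if budget <= 0:
            break
        toks = enc.encode(ctx)[:budget]
        out.append(enc.decode(toks))
        budget -= len(toks)
    return out

relevancy_result = judges["relevancy"].evaluate(
    query=example.query,
    response=prediction.response,
    contexts=truncate_contexts(prediction.contexts),
)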
8 comments
Can I use MetadataFilters with ExactMatchFilter for multiple values of the same key? I want to filter my index down to a set of documents.
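Here's what I mean (sketch assuming the legacy MetadataFilters API; whether two filters on the same key OR together is exactly my question):
Python
from llama_index.vector_stores.types import ExactMatchFilter, MetadataFilters

filters = MetadataFilters(
    filters=[
        ExactMatchFilter(key="doc_id", value="doc_1"),
        ExactMatchFilter(key="doc_id", value="doc_2"),
    ]
)
retriever = index.as_retriever(filters=filters)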
9 comments