lucawang

Crash

I'm trying to use BM25Retriever with Chroma, following this documentation: https://docs.llamaindex.ai/en/stable/examples/retrievers/bm25_retriever/#hybrid-retriever-with-bm25-chroma
When I execute the line index = VectorStoreIndex(nodes=nodes, storage_context=storage_context), my JupyterLab kernel dies and restarts every time.
I'm only using a very small PDF, so running out of memory shouldn't be the issue.
Do you have any ideas of what's going on? Thanks!
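
One way to narrow down a kernel that dies without a traceback is to run the failing step in a separate Python process with faulthandler enabled, so a native crash leaves an exit code and a low-level traceback instead of a silent restart. This is a minimal standard-library sketch, not a fix. `run_isolated` is a hypothetical helper name, and the snippet passed to it stands in for the index-building code from the notebook.

```python
import subprocess
import sys


def run_isolated(snippet: str, timeout: float = 120.0) -> subprocess.CompletedProcess:
    """Run `snippet` in a fresh interpreter with faulthandler enabled.

    `run_isolated` is a hypothetical helper for illustration. The child
    process's stdout, stderr and return code are captured so a hard crash
    can be inspected instead of taking the notebook kernel down with it.
    """
    return subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", snippet],
        capture_output=True,
        text=True,
        timeout=timeout,
    )


def describe(result: subprocess.CompletedProcess) -> str:
    """Summarise how the child process ended."""
    if result.returncode == 0:
        return "finished normally"
    if result.returncode < 0:
        # On POSIX, a negative return code means the child was killed by a signal.
        return f"killed by signal {-result.returncode}"
    return f"exited with code {result.returncode}"
```

Usage would look like `result = run_isolated(open("build_index.py").read())` followed by `print(describe(result), result.stderr)`, where `build_index.py` is an assumed file holding the cell that crashes. If the child is killed by a signal, the faulthandler output in `result.stderr` should show which native call was active at the time.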

6 comments

llucawang_nfls

[Question]: How to print the final promp...

Hi, I'm trying to get the final processed prompt that is sent to the LLM. I searched online and found a related post answered by @Logan M: https://github.com/run-llama/llama_index/issues/13310
The code provided is:
from typing import List

from llama_index.core.instrumentation.event_handlers import BaseEventHandler
from llama_index.core.instrumentation.events import BaseEvent
from llama_index.core.instrumentation.events.llm import (
    LLMChatEndEvent,
    LLMChatStartEvent,
    LLMChatInProgressEvent,
)

class ExampleEventHandler(BaseEventHandler):
    events: List[BaseEvent] = []

    @classmethod
    def class_name(cls) -> str:
        """Class name."""
        return "ExampleEventHandler"

    def handle(self, event: BaseEvent) -> None:
        """Logic for handling event."""
        print("-----------------------")
        # all events have these attributes
        print(event.id_)
        print(event.timestamp)
        print(event.span_id)

        # event specific attributes
        if isinstance(event, LLMChatStartEvent):
            # initial
            print(event.messages)
            print(event.additional_kwargs)
            print(event.model_dict)
        elif isinstance(event, LLMChatInProgressEvent):
            # streaming
            print(event.response.delta)
        elif isinstance(event, LLMChatEndEvent):
            # final response
            print(event.response)

        self.events.append(event)
        print("-----------------------")

import llama_index.core.instrumentation as instrument

dispatcher = instrument.get_dispatcher(__name__)
dispatcher.add_event_handler(ExampleEventHandler())

However, I'm not sure how to incorporate this code into mine. I have a vector store index built from nodes, and I'm using it as a query engine to ask questions:

from llama_index.core import VectorStoreIndex

recursive_index = VectorStoreIndex(nodes=base_nodes + objects)
recursive_query_engine = recursive_index.as_query_engine(
    similarity_top_k=5,
    verbose=True,
    response_mode="compact",
)

Thanks for your help!

11 comments

llucawang_nfls

Python pdf parsing random seed

I'm using Python to parse PDF documents with a parsing instruction. I noticed that the parsing results are different every time I run my code. Is there a way to set a random seed for the parsing process?
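
Independent of whether the parser exposes a seed, one workaround for run-to-run differences is to cache the first result per PDF and instruction, so later runs reuse it instead of parsing again. The sketch below assumes this is acceptable for the use case. `cached_parse`, `parse_fn` and the `.parse_cache` directory are hypothetical names, and `parse_fn` stands in for whatever call actually performs the parsing.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def cached_parse(
    pdf_path: str,
    instruction: str,
    parse_fn: Callable[[str, str], str],
    cache_dir: str = ".parse_cache",
) -> str:
    """Return the parsed text for (PDF, instruction), parsing at most once.

    `parse_fn` is a placeholder for the real parser call; it receives the
    PDF path and the parsing instruction and returns the parsed text.
    """
    # Key the cache on the file contents and the instruction, so changing
    # either one triggers a fresh parse.
    pdf_bytes = Path(pdf_path).read_bytes()
    key = hashlib.sha256(pdf_bytes + b"\x00" + instruction.encode("utf-8")).hexdigest()
    cache_file = Path(cache_dir) / f"{key}.json"

    if cache_file.exists():
        return json.loads(cache_file.read_text(encoding="utf-8"))["text"]

    text = parse_fn(pdf_path, instruction)
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps({"text": text}), encoding="utf-8")
    return text
```

This does not make the parser itself deterministic. It only pins the first output, which is usually enough to keep downstream steps reproducible between runs.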

2 comments

llucawang_nfls

Extracting plots from academic publications using LlamaParse

Hi folks, I'm trying to use LlamaParse to extract plots from academic publications. While the parser is able to extract obvious pictures, it cannot extract plots (see examples). I also added a prompt (see below) to improve the results, but so far I've had no luck. Suggestions and insights are greatly appreciated. Thanks!

Prompt:
ins = """
You are a highly proficient language model designed to convert pages from PDF into structured markdown text. Your goal is to accurately transcribe text, identify and describe images, particularly graphs and other graphical elements.

You have been tasked with creating a markdown copy of each page from the provided PDF image. Each image description must include a full description of the content, a summary of the graphical object.

Maintain the sequence of all the elements.

For the following element, follow the requirement of extraction:
for Text:

Extract all readable text from the page.
Exclude any diagonal text, headers, and footers.

for Text which includes hyperlink:
-Extract hyperlink and present it with the text

for Image Identification and Description:

Identify all images, graphs, and other graphical elements on the page.
If image contains wording that is hard to extract , flag it with <unidentifiable section> instead of parsing.
For each image, include a full description of the content in the alt text, followed by a brief summary of the graphical object.
If the image has a subtitle or caption, include it in the description.
If the image has a organisation chart , convert it into a hierachical understandable format.
for graph , extract the value in table form as markdown representation

OUTPUT INSTRUCTIONS

Exclude any diagonal text, headers, and footers from the output.
For each image and graph, provide a detailed description and summary.

"""

14 comments
