`TextNode`, and at the same time adds the page number and image path of the image as metadata. I then run `MarkdownElementNodeParser`, which separates texts and tables into `IndexNode` and `BaseNode`. Similarly, I would like to add the page number and image path into these nodes' metadata. But the sequence of the nodes is already jumbled up from line 2 onwards, so how can I still add the page number and image path to them? Thanks

```python
[1] node_parser = MarkdownElementNodeParser(llm=llm)
[2] nodes = node_parser.get_nodes_from_documents([document])
[3] base_nodes, objects = node_parser.get_nodes_and_objects(nodes)
```
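Since the parsed nodes come back in a different order, one order-independent approach is to key the page metadata by each node's source document id rather than by position, then merge it into every node that points back at that page. A minimal pure-Python sketch of the idea (the `SimpleNode` class and its field names are stand-ins for illustration, not LlamaIndex APIs; LlamaIndex nodes expose a similar source-document reference):

```python
from dataclasses import dataclass, field

@dataclass
class SimpleNode:
    # stand-in for a parsed node; ref_doc_id points back to the source page
    text: str
    ref_doc_id: str
    metadata: dict = field(default_factory=dict)

def attach_page_metadata(nodes, page_metadata_by_doc_id):
    """Copy page-level metadata onto each node via its source doc id,
    regardless of the order the nodes come back in."""
    for node in nodes:
        node.metadata.update(page_metadata_by_doc_id.get(node.ref_doc_id, {}))
    return nodes

# page metadata captured before parsing, keyed by the page's doc id
pages = {"page-1": {"page_number": 1, "image_path": "img/p1.png"}}
nodes = [SimpleNode("a table", "page-1"), SimpleNode("some text", "page-1")]
attach_page_metadata(nodes, pages)
```

Because the lookup goes through the doc id, it does not matter that `get_nodes_and_objects` shuffles the node order.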
`FunctionCallingAgentWorker`. The first set of tools includes 3 `QueryEngineTool`s and the second set includes 1 `FunctionTool` (a pydantic base model that does simple addition on any number of input floats).

How can I create a `Document` from a list of `TextNode`? I parse my files with `LlamaParse`, where I break them down into a list of nodes using `MarkdownNodeParser`, utilising `node.metadata['Header_1']` as a way of filtering those nodes by the md headers from my document, and do text amendment. With the latest `llama-index-core`, the `node.metadata` dictionary is missing the `Header_1` key. What I do now is manually add them back, but I'm stuck with a list of updated `TextNode`s, not knowing how to convert them into a `Document`.

```python
node_parser = MarkdownElementNodeParser(num_workers=8, show_progress=False)
nodes = node_parser.get_nodes_from_documents([document])
base_nodes, objects = node_parser.get_nodes_and_objects(nodes)
index = VectorStoreIndex(nodes=base_nodes + objects)
recursive_query_engine = index.as_query_engine(
    similarity_top_k=3,
    node_postprocessors=[FlagEmbeddingReranker(top_n=2, model=RERANKER_MODEL)],
    verbose=False,
)
```
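On the "convert the nodes back into a `Document`" question: a common workaround is to concatenate the amended node texts and build a fresh `Document` from the result. A pure-Python sketch of the joining step (the separator is an assumption — use whatever the original markdown had between sections; the `Document(text=...)` construction is left as a comment since it depends on your LlamaIndex version):

```python
def rebuild_document_text(node_texts, separator="\n\n"):
    """Join amended node texts back into one document body.

    Assumes the nodes are already in reading order; drops empty chunks.
    """
    return separator.join(t.strip() for t in node_texts if t.strip())

texts = ["# A - text for A", "# B - New text for B"]
body = rebuild_document_text(texts)
# then, e.g.: document = Document(text=body, metadata=original_metadata)
```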
`query_engine`, or do we have to calculate it ourselves? I am using OpenAI LLMs.

`ReActAgent.from_tools` has an `output_parser` argument; however, I had no luck with it.

```python
recursive_query_engine = recursive_index.as_query_engine(
    similarity_top_k=5,
    node_postprocessors=[reranker],
    verbose=True,
)

class DocumentTypeResponse(BaseModel):
    """Data model for the document type."""
    document_type: str

document_type_identifier = QueryEngineTool(
    query_engine=recursive_query_engine,
    metadata=ToolMetadata(
        name='document_type',
        description=(
            "Only use this tool when required. "
            "Answers questions relating to document type. "
            "Identifies document types as either purchase order or invoice."
        ),
        fn_schema=DocumentTypeResponse,
    ),
)

context_document_agent = """\
You are an expert administrative assistant who specializes in answering questions about documents.
The questions mainly revolve around extracting key information from either an invoice or a purchase order.
Only use the necessary tool to answer the questions. Only use more tools when needed.
"""

document_agent = ReActAgent.from_tools(
    tools=[document_type_identifier],  # from_tools expects a list of tools
    verbose=True,
    context=context_document_agent,
)
```
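If `output_parser` is not cooperating, one fallback is to validate the agent's final answer yourself against the expected schema. A minimal sketch using only the standard library (the field name mirrors the `DocumentTypeResponse` model above; that the LLM returns a JSON-shaped answer is an assumption — you would prompt it to do so):

```python
import json

ALLOWED_TYPES = {"invoice", "purchase order"}

def parse_document_type(raw_answer: str) -> str:
    """Validate a JSON-shaped agent answer like '{"document_type": "invoice"}'."""
    data = json.loads(raw_answer)
    doc_type = data["document_type"].strip().lower()
    if doc_type not in ALLOWED_TYPES:
        raise ValueError(f"unexpected document_type: {doc_type!r}")
    return doc_type
```

With pydantic available, `DocumentTypeResponse.model_validate_json(raw_answer)` would be the more direct equivalent of this check.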
```
TypeError: BFloat16 is not supported on MPS
ImportError: Using `bitsandbytes` 8-bit quantization requires Accelerate: `pip install accelerate` and the latest version of bitsandbytes: `pip install -i https://pypi.org/simple/ bitsandbytes`
```
```python
# set up llm using HuggingFaceLLM
import torch
from llama_index.llms.huggingface import HuggingFaceLLM
from transformers import BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)

llm = HuggingFaceLLM(
    model_name=model_name,
    model_kwargs={
        "token": hf_token,
        "torch_dtype": torch.bfloat16,
        "quantization_config": quantization_config,
    },
    generate_kwargs={
        "do_sample": True,
        "temperature": 0,
        "top_p": 0.9,
    },
    tokenizer_name=model_name,
    tokenizer_kwargs={"token": hf_token},
    stopping_ids=stopping_ids,
)
```
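On the `TypeError`, the `torch.bfloat16` in `model_kwargs` is the likely trigger: Apple's MPS backend does not support bfloat16, so a common fix is to fall back to float16 on Mac. (Note also that `bitsandbytes` quantization generally requires a CUDA GPU, which may explain the `ImportError` on Apple hardware.) The selection logic, sketched in plain Python so it runs anywhere — with torch you would map these names to `torch.float16` / `torch.bfloat16` and detect the device via the MPS backend:

```python
def pick_torch_dtype(device: str) -> str:
    """Fall back to float16 on Apple MPS, where bfloat16 is unsupported."""
    return "float16" if device == "mps" else "bfloat16"
```

Then pass the resulting dtype as `torch_dtype` in `model_kwargs` instead of hard-coding `torch.bfloat16`.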
```markdown
# A - text for A
# B - text for b
```

```python
from llama_index.core.node_parser import MarkdownNodeParser

parser = MarkdownNodeParser()
for page in document:
    page_nodes = parser.get_nodes_from_documents([page])
    for node in page_nodes:
        if node.metadata['Header_1'] == 'B':
            node.text = 'New text for B'
            print(node.text)  # text is changed
print(page.text)  # text is not changed!
```
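The behaviour in the snippet above is expected: the parser's nodes carry copies of the page text, so editing `node.text` never propagates back to `page.text`. If the goal is to push the amendment into the page itself, one blunt workaround is a string replacement on the page body. A pure-Python sketch of that step (how you then write the new text back onto a real `Document` object depends on your LlamaIndex version, so only the string logic is shown):

```python
def amend_page_text(page_text: str, old_section: str, new_section: str) -> str:
    """Replace a section's text in the full page body, since node edits
    do not flow back to the source page."""
    if old_section not in page_text:
        raise ValueError("section not found in page text")
    return page_text.replace(old_section, new_section)

page_text = "# A - text for A\n# B - text for b"
amended = amend_page_text(page_text, "text for b", "New text for B")
```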
```
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[20], line 1
----> 1 response_header = header_query_engine.query(new_header_query)

File <hidden_path>\.venv\lib\site-packages\llama_index\core\instrumentation\dispatcher.py:260, in Dispatcher.span.<locals>.wrapper(func, instance, args, kwargs)
    252 self.span_enter(
    253     id_=id_,
    254     bound_args=bound_args,
        (...)
    257     tags=tags,
    258 )
    259 try:
--> 260     result = func(*args, **kwargs)
    261 except BaseException as e:
    262     self.event(SpanDropEvent(span_id=id_, err_str=str(e)))

File <hidden_path>\.venv\lib\site-packages\llama_index\core\base\base_query_engine.py:52, in BaseQueryEngine.query(self, str_or_query_bundle)
     50 if isinstance(str_or_query_bundle, str):
     51     str_or_query_bundle = QueryBundle(str_or_query_bundle)
---> 52 query_result = self._query(str_or_query_bundle)
     53 dispatcher.event(
     54     QueryEndEvent(query=str_or_query_bundle, response=query_result)
     55 )
     56 return query_result
...
--> 302 content = content_template.format(**relevant_kwargs)
    304 message: ChatMessage = message_template.copy()
    305 message.content = content

KeyError: "' Item No"
```
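The traceback ends in `content_template.format(**relevant_kwargs)`, which suggests the query string itself contains curly braces (e.g. a literal `{' Item No': ...}` fragment) that Python's `str.format` then tries to interpret as template fields, raising `KeyError: "' Item No"`. A common fix is to escape the braces before sending the query; a minimal sketch:

```python
def escape_braces(text: str) -> str:
    """Double curly braces so str.format treats them as literals."""
    return text.replace("{", "{{").replace("}", "}}")

# a query containing literal braces, as the KeyError suggests
query = "Find the {' Item No'} column"
safe_query = escape_braces(query)
# str.format now leaves the braces alone instead of raising KeyError
formatted = ("Query: " + safe_query).format()
```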
`MarkdownNodeParser` to amend some texts in one of the nodes. Now I want to convert it back to a `Document`; how can I do that?

```python
# create a set of tools on top of the query engine
query_engine_tools = [
    QueryEngineTool(
        query_engine=recursive_query_engine,
        metadata=ToolMetadata(
            name='document_type_finder',
            description=(
                '''
                This tool finds the type of the document and outputs the answer.
                It looks for obvious features in the document, such as words like
                "invoice" or "tax invoice" for an invoice, and words like
                "purchase order" or "sales order" for a purchase order.
                Finally, this tool only outputs the answer as "invoice" or "purchase order".
                '''
            )
        )
    ),
    QueryEngineTool(
        query_engine=recursive_query_engine,
        metadata=ToolMetadata(
            name='document_origin_finder',
            description=(
                '''
                This tool finds the country of origin of the document.
                It first identifies the country of origin of the document sender by
                looking into its company name, address, or any other relevant
                information. Finally, this tool formats the answer following
                ISO 3166-1 alpha-3 and outputs it.
                '''
            )
        )
    ),
]

# create a RAG ReAct QueryEngineTool agent
agent = ReActAgent.from_tools(tools=query_engine_tools, llm=llm, verbose=True)
response = agent.chat(
    '''
    What is the document type and origin?
    Use any of the tools provided to you.
    '''
)
```
```
# output
>>> Thought: The user has not provided the document text for analysis. I need to ask for the document text to proceed with the analysis using the appropriate tool.
Answer: Could you please provide the text of the document you would like me to analyze for its type and origin?
```