harshit_alpha


Doc source

Hi guys, I am using LlamaIndex to index my documents, and I want to create the Document objects myself, like this:


from llama_index import Document
text_list = [text1, text2, ...]
documents = [Document(t) for t in text_list]

Is there a way to find out which node is used when I query these documents with a question? My main objective is to get the page number of the answer as well. When I create text1, text2, ..., I will store the page number for each one too.
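To make the goal concrete, here is a minimal, library-free sketch of the idea: keep the page number next to each text chunk, so that whichever chunk is retrieved for a question also tells you its page. The names `PageChunk`, `build_chunks`, and `best_chunk` are hypothetical, the sample page texts are made up, and the word-overlap "retriever" is only a stand-in for a real vector index. In LlamaIndex the equivalent would presumably be to attach the page number as metadata on each Document and read it back from the source nodes returned with the response; the exact parameter and attribute names depend on the installed version, so none are assumed here.

```python
from dataclasses import dataclass


@dataclass
class PageChunk:
    """A piece of text together with the page it came from."""
    text: str
    page: int


def build_chunks(pages):
    """Pair each page's text with its 1-based page number."""
    return [PageChunk(text=t, page=i) for i, t in enumerate(pages, start=1)]


def best_chunk(chunks, question):
    """Toy retriever: return the chunk sharing the most words with the question."""
    q_words = set(question.lower().split())
    return max(chunks, key=lambda c: len(q_words & set(c.text.lower().split())))


# Made-up page texts, standing in for text1, text2, ...
pages = ["Revenue grew in 2021.", "The audit committee met four times."]
chunks = build_chunks(pages)

hit = best_chunk(chunks, "How many times did the audit committee meet?")
print(hit.page, hit.text)
```

The point of the design is that the page number travels with the chunk from the moment it is created, so no separate lookup is needed once the matching chunk is known.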

8 comments


harshit_alpha


Tables plus text

Hey community members
I need some help from you guys. I am trying to create a bot for financial documents.

def ask(file):
    print(" Loading...")
    PDFReader = download_loader("PDFReader")
    loader = PDFReader()
    documents = loader.load_data(file=Path(file))
    print("Path: ", Path(file))

    # Check if the index file exists
    if os.path.exists(INDEX_FILE):
        # Load the index from the file
        logger.info("found index.json in the directory")
        index = GPTSimpleVectorIndex.load_from_disk(INDEX_FILE)
    else:
        logger.info("didnt find index.json in the directory")
        llm_predictor = LLMPredictor(llm=OpenAI(temperature=0, model_name="text-davinci-003"))

        service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, chunk_size_limit=1024)
        index = GPTSimpleVectorIndex.from_documents(documents, service_context=service_context)

        # Save the index to the file
        index.save_to_disk(INDEX_FILE)

Above is my code snippet for generating an index for a PDF. I used PDFReader from LlamaHub to extract the text from the PDF. The bot answers well when asked about the text, but it fails when I ask for a value from a table in the PDF.

I tried different OpenAI text models, the best one being text-davinci-003. The bot is still not able to answer questions about the values in the PDF's tables. This is because PDFReader simply converts the content of the PDF to text (it does not take any special steps to convert the table content). I want to know how I can successfully index both the text and the tables in the PDF using LangChain and LlamaIndex.
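One direction that might help, sketched here under assumptions: if the tables can be extracted as rows by a table-aware PDF tool (not shown, and not part of the snippet above), each row can be rewritten as a self-describing line of text before indexing, so a value stays attached to its row label and column header. The function name `table_to_sentences` and the sample figures are hypothetical, made up purely for illustration.

```python
def table_to_sentences(header, rows):
    """Turn each table row into one line that repeats the column headers.

    Assumes the first column holds the row label and the remaining
    columns hold values, which is an assumption about the table layout.
    """
    sentences = []
    for row in rows:
        label, values = row[0], row[1:]
        parts = [f"{col}: {val}" for col, val in zip(header[1:], values)]
        sentences.append(f"{label} | " + " | ".join(parts))
    return sentences


# Made-up example table, not taken from any real document
header = ["Item", "2021", "2022"]
rows = [
    ["Revenue", "100", "120"],
    ["Net income", "10", "15"],
]

table_text = table_to_sentences(header, rows)
for line in table_text:
    print(line)
```

Each resulting line could then be indexed like any other text chunk alongside the prose extracted from the PDF. Whether this improves answers depends on how cleanly the tables can be extracted in the first place.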

11 comments
