Can someone point me to an example of querying a PDF document? Thanks.
https://docs.llamaindex.ai/en/stable/getting_started/starter_example.html#load-data-and-build-an-index

In this tutorial, data is the folder name. If you just want to pass a single file:
documents = SimpleDirectoryReader(input_files=["filepath1"]).load_data()
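
Putting it together, a minimal end-to-end sketch for querying a PDF (the file path and question are placeholders; assumes an LLM and embedding model are already configured in Settings, and that a PDF parser such as pypdf is installed):

Plain Text
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# Load a single PDF; SimpleDirectoryReader picks a PDF parser for .pdf files
documents = SimpleDirectoryReader(input_files=["./data/sample.pdf"]).load_data()

# Build an in-memory vector index and query it
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
print(query_engine.query("What is this document about?"))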
What do we do if we want to get the page number along with the result?
You get the page number and file name in the metadata of the documents.

Once you have the documents, you can iterate over them to inspect the metadata, and add more info if you want.

Plain Text
for doc in documents:
    print(doc.metadata)  # e.g. page_label and file_name for PDFs
    doc.metadata["new_key"] = "new_value"  # attach your own metadata


Once you query your data, you get the source nodes in your response object.

print(response.source_nodes) will show the metadata created above.
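
For example, to show the page number alongside each result (a minimal sketch; query_engine is a query engine built from your index, and page_label / file_name are the metadata keys the default PDF reader sets):

Plain Text
response = query_engine.query("your question")
print(response)

# Each source node carries the metadata of the chunk it was retrieved from
for node in response.source_nodes:
    meta = node.node.metadata
    print(meta.get("file_name"), "page", meta.get("page_label"), "score:", node.score)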
@WhiteFang_Jr Does this work on PDF files? For me, it works for plain-text files, but when I use PDF files as input, the query fails with the following error message:

ConnectError: [WinError 10061] No connection could be made because the target machine actively refused it
🤔 are you using a custom setup? Because this error definitely means some request was not fulfilled.

Can you share your code if possible?
Yes, please see below:

from llama_index.core import Settings
from llama_index.core import download_loader
from llama_index.llms.ollama import Ollama
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

from dotenv import load_dotenv
load_dotenv()

#Settings.llm = Ollama(model="llama2", request_timeout=3000.0)
Settings.llm = Ollama(model="mistral", request_timeout=3000.0)
#Settings.llm = Ollama(model="starling-lm", request_timeout=3000.0)
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-base-en-v1.5")

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader("./data").load_data()

index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()

question="What is a PID controller?"
response = query_engine.query(question)
print(response)
Is your Ollama model instance running?
Probably not... How do I run the Ollama model instance?
You need to deploy the model first, then you'll be able to use it. That is why you are getting this error: the model is not active.
You are right. When you say that one has to deploy the model first, what does that entail? Does that mean issuing a command like "ollama pull mistral"? @WhiteFang_Jr
So Ollama helps to host open-source models, and you can interact with them using their library.
In the first line, it says to read the readme for hosting the LLM.
Ollama's README explains how to run Ollama by itself, not how to use it with other tools. In other words, it explains how to use the CLI to chat with an LLM via Ollama.
No no, see here: the first requirement is to deploy the model by following the readme.
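
Concretely, on a default local install that deployment step looks like this (a sketch; on some installs Ollama already runs as a background service, in which case only the pull is needed):

Plain Text
ollama pull mistral    # download the model weights
ollama serve           # start the local server on http://localhost:11434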
Then follow this doc to interact with the hosted LLM: https://docs.llamaindex.ai/en/stable/examples/llm/ollama.html
Attachment: image.png
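
As a quick sanity check that llama-index can reach the hosted model (a minimal sketch following that doc; if the server is not running, this raises the same ConnectError as above):

Plain Text
from llama_index.llms.ollama import Ollama

# A single completion call; fails fast if the Ollama server is unreachable
llm = Ollama(model="mistral", request_timeout=120.0)
print(llm.complete("Say hello."))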