Doc source

Hi guys, I am using LlamaIndex to index my documents, and I want to create the Document objects myself, like this:

Plain Text
from llama_index import Document

# one Document per text chunk
text_list = [text1, text2, ...]
documents = [Document(t) for t in text_list]

Is there a way to find out which node is being used when I query this document with a question? My main objective is to get the page number of the answer as well; when I create text1, text2, ..., I will store the page number too.
You can check the response object for the source nodes

response.source_nodes

Any info you add to the document's extra_info dict will be inherited by the nodes as well 👍
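
For example, something like this (just a sketch against the 0.6-era API; the page_number key is whatever name you choose):

Plain Text
from llama_index import Document

# attach the page number to each document; nodes built from it inherit extra_info
documents = [
    Document(t, extra_info={"page_number": i + 1})
    for i, t in enumerate(text_list)
]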
Thanks Logan, but is this the right approach?
I have a bunch of text paragraphs, and when I ask a question I should get the answer along with the page number from the document.
As long as that info is set in the extra_info dict of the document, you will find it in the source nodes, although this really only makes sense for vector indexes.

Better citation support is planned for the future too 🫑
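
In the meantime, reading the info back looks something like this (again a sketch; in this version of the API, source_nodes is a list of NodeWithScore objects):

Plain Text
query_engine = index.as_query_engine()
response = query_engine.query("my query")

# each source node carries the extra_info inherited from its document
for src in response.source_nodes:
    print(src.score, src.node.extra_info.get("page_number"))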
So now I am creating the nodes myself:

Plain Text
import os

from langchain import OpenAI
from llama_index import (
    GPTVectorStoreIndex,
    LLMPredictor,
    PromptHelper,
    ServiceContext,
    StorageContext,
)
from llama_index.data_structs.node import Node

# 1. Load in documents
os.environ["OPENAI_API_KEY"] = "<api key>"
text1 = "sample text 1"
text2 = "sample text 2"
text3 = "sample text 3"

# doc_id here represents the page number
node1 = Node(text=text1, doc_id="1")
node2 = Node(text=text2, doc_id="2")
node3 = Node(text=text3, doc_id="3")
nodes = [node1, node2, node3]

# 2. Build an index over the nodes
storage_context = StorageContext.from_defaults()
storage_context.docstore.add_documents(nodes)
index1 = GPTVectorStoreIndex(nodes, storage_context=storage_context)

# ...or start from an empty index and insert the nodes afterwards
index = GPTVectorStoreIndex([])
index.insert_nodes(nodes)

# 3. Configure the LLM
max_input_size = 4096
# set number of output tokens
num_outputs = 256
# set maximum chunk overlap
max_chunk_overlap = 20
# set chunk size limit
chunk_size_limit = 600
prompt_helper = PromptHelper(max_input_size, num_outputs, max_chunk_overlap, chunk_size_limit=chunk_size_limit)

# define LLM
llm_predictor = LLMPredictor(llm=OpenAI(temperature=0, model_name="text-davinci-003", max_tokens=num_outputs))
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)

Now I am confused about what I should do. How can I get my final index created? Could you please help?
I have used this link for reference: https://github.com/jerryjliu/llama_index/blob/a873f7dc467990c25b8676511888441bc4c339d7/docs/guides/primer/usage_pattern.md?plain=1#L81
Hmm you've already created two "final" indexes using those nodes though? πŸ‘€
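
Either of those two blocks alone already gives you a "final" index, i.e. trimming your code down to one of:

Plain Text
# option 1: build directly from the nodes
index = GPTVectorStoreIndex(nodes, storage_context=storage_context)

# option 2: start empty and insert the nodes afterwards
index = GPTVectorStoreIndex([])
index.insert_nodes(nodes)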
Yes, I am a little confused.
Yea, the starter page is just giving examples of a few ways to do things.

Normally the simplest pattern is something like this:

Plain Text
# load every file in ./data into Document objects
documents = SimpleDirectoryReader("./data").load_data()

index = GPTVectorStoreIndex.from_documents(documents)

query_engine = index.as_query_engine()

response = query_engine.query("my query")

print(str(response))
From there, you can customize things like the service context, etc.
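
For example, to plug in the custom LLM settings from your snippet, you could pass the service_context you built when constructing the index (a sketch, reusing the names from above):

Plain Text
index = GPTVectorStoreIndex.from_documents(
    documents, service_context=service_context
)

query_engine = index.as_query_engine()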