Doc source

Hi guys, I am using LlamaIndex to index my documents, and I want to create the Document objects myself, like this:

Plain Text
from llama_index import Document

# one Document per text chunk
text_list = [text1, text2, ...]
documents = [Document(t) for t in text_list]

Is there a way to find out which node is being used when I query this document with a question? My main objective is to get the page number of the answer as well; when I create text1, text2, ..., I will store the page number too.
You can check the response object for the source nodes

response.source_nodes

Any info you add to the document's extra_info dict will be inherited by the nodes as well 👍
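
For example, something like this (just a sketch against the 0.6-era API; the page_number key is whatever name you choose):

Plain Text
from llama_index import Document

# attach the page number to each document; nodes built from it inherit extra_info
documents = [
    Document(t, extra_info={"page_number": i + 1})
    for i, t in enumerate(text_list)
]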
Thanks Logan, but is this the right approach?
I have a bunch of text paragraphs, and when I ask a question I should get the answer along with the page number from the document.
As long as that info is set in the extra_info dict of the document, you will find it in the source nodes, although this really only makes sense for vector indexes.

Better citation support is planned for the future too 🫑
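
In the meantime, reading the info back looks something like this (again a sketch; in this version of the API, source_nodes is a list of NodeWithScore objects):

Plain Text
query_engine = index.as_query_engine()
response = query_engine.query("my query")

# each source node carries the extra_info inherited from its document
for src in response.source_nodes:
    print(src.score, src.node.extra_info.get("page_number"))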
So now I am creating the nodes myself:

Plain Text
import os

from langchain import OpenAI
from llama_index import (
    GPTVectorStoreIndex,
    LLMPredictor,
    PromptHelper,
    ServiceContext,
    StorageContext,
)
from llama_index.data_structs.node import Node

# 1. Load in documents
os.environ["OPENAI_API_KEY"] = "<api key>"
text1 = "sample text 1"
text2 = "sample text 2"
text3 = "sample text 3"

# doc_id here represents the page number
node1 = Node(text=text1, doc_id="1")
node2 = Node(text=text2, doc_id="2")
node3 = Node(text=text3, doc_id="3")
nodes = [node1, node2, node3]

# 2. Build an index over the nodes
storage_context = StorageContext.from_defaults()
storage_context.docstore.add_documents(nodes)
index1 = GPTVectorStoreIndex(nodes, storage_context=storage_context)

# ...or start from an empty index and insert the nodes afterwards
index = GPTVectorStoreIndex([])
index.insert_nodes(nodes)

# 3. Configure the LLM
max_input_size = 4096
# set number of output tokens
num_outputs = 256
# set maximum chunk overlap
max_chunk_overlap = 20
# set chunk size limit
chunk_size_limit = 600
prompt_helper = PromptHelper(max_input_size, num_outputs, max_chunk_overlap, chunk_size_limit=chunk_size_limit)

# define LLM
llm_predictor = LLMPredictor(llm=OpenAI(temperature=0, model_name="text-davinci-003", max_tokens=num_outputs))
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)

Now I am confused about what I should do. How can I get my final index created? Could you please help?
I have used this link for reference: https://github.com/jerryjliu/llama_index/blob/a873f7dc467990c25b8676511888441bc4c339d7/docs/guides/primer/usage_pattern.md?plain=1#L81
Hmm you've already created two "final" indexes using those nodes though? πŸ‘€
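
Either of those two blocks alone already gives you a "final" index, i.e. trimming your code down to one of:

Plain Text
# option 1: build directly from the nodes
index = GPTVectorStoreIndex(nodes, storage_context=storage_context)

# option 2: start empty and insert the nodes afterwards
index = GPTVectorStoreIndex([])
index.insert_nodes(nodes)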
Yes, I am a little confused.
Yea, the starter page is just giving examples of a few ways to do things.

Normally the simplest pattern is something like this:

Plain Text
# load every file in ./data into Document objects
documents = SimpleDirectoryReader("./data").load_data()

index = GPTVectorStoreIndex.from_documents(documents)

query_engine = index.as_query_engine()

response = query_engine.query("my query")

print(str(response))
From there, you can customize things like the service context, etc.
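
For example, to plug in the custom LLM settings from your snippet, you could pass the service_context you built when constructing the index (a sketch, reusing the names from above):

Plain Text
index = GPTVectorStoreIndex.from_documents(
    documents, service_context=service_context
)

query_engine = index.as_query_engine()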