KG storage

Hey quick question on loading locally stored knowledge graphs - should I store with storage_context.to_dict() or with persist()? And how do I load the knowledge graph from these files? Recommendations?
Yea you'll want to use the index.storage_context.persist(persist_dir="./storage") and index = load_index_from_storage(StorageContext.from_defaults(persist_dir="./storage")) combo

to_dict and from_dict are mostly there to help with json blob storage for aws lol
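
Something like this is the usual pattern (a minimal sketch; the persist_dir is just a placeholder and `index` is whatever variable holds your knowledge graph index):

```python
from llama_index import StorageContext, load_index_from_storage

# save the index's docstore / index store / graph store to disk
index.storage_context.persist(persist_dir="./storage")

# later: rebuild the storage context from that directory and reload the index
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)
```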
hmm I'm not sure if this properly loads the LLMs I need
but I can just pass the service context to load_index_from_storage, silly me
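i.e. something like this (just a sketch; `service_context` here is whatever you've already built with your LLM and embedding settings):

```python
index = load_index_from_storage(
    StorageContext.from_defaults(persist_dir="./storage"),
    service_context=service_context,  # carries the LLM / embedding config
)
```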
What's the difference between num_chunks_per_query and similarity_top_k in the KG query params?
Also, what's your favourite method of visualizing simple knowledge graphs that aren't using any graph db? πŸ™‚
So the KG has a few modes. If include_text=True (the default), then once the triplets are retrieved for a query, the text chunk that each triplet was extracted from is included in the prompt to the LLM

num_chunks_per_query sets a limit on how many chunks are included here; the default is 10 (chunks are sorted by how many of the retrieved triplets they are connected to, i.e. 3 triplets might come from 1 text chunk, so that chunk should definitely be included before the limit kicks in)

similarity_top_k is an additional parameter for fetching triplets using embeddings. It's only used if retriever_mode="embedding" or "hybrid". The default is "keyword" though.
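
Roughly how you'd pass those in (just a sketch; the extra kwargs get forwarded to the KG retriever, and the values are only examples):

```python
# example query engine setup on an existing KnowledgeGraphIndex
query_engine = index.as_query_engine(
    include_text=True,          # include the source chunk for each retrieved triplet
    retriever_mode="hybrid",    # "keyword" (default), "embedding", or "hybrid"
    similarity_top_k=5,         # triplets fetched by embedding similarity
    num_chunks_per_query=10,    # cap on how many source chunks go into the prompt
)
response = query_engine.query("What did the author work on?")
```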

Definitely check out the example notebook for some exploration into this https://gpt-index.readthedocs.io/en/latest/examples/index_structs/knowledge_graph/KnowledgeGraphDemo.html

The embed in that page is broken, but using networkx to visualize the graph is a good option
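
Something along these lines (a sketch based on the notebook's pyvis example; assumes the index is a KnowledgeGraphIndex with get_networkx_graph, and that pyvis is installed):

```python
from pyvis.network import Network

g = index.get_networkx_graph()        # networkx graph of the extracted triplets
net = Network(notebook=True, directed=True)
net.from_nx(g)
net.show("example.html")              # renders an interactive HTML view
```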
So similarity_top_k maps my query to triplets (as in retriever_mode="embedding"), right?
Let's say I've set it to find N triplets. Each triplet maps to a text chunk, so if num_chunks_per_query < N, it may limit the amount of text context used in the query?
(assuming each embedding hit mapped to a unique triplet-chunk pair)
networkx is giving an error, AttributeError: 'NoneType' object has no attribute 'render', when creating the HTML viz for my graph. The graph works for queries, but I'm not sure what could be missing.
I think it should be used in the hybrid mode as well
yea exactly. If it fetches too many chunks, then the response may take ages, hence the limit option
hmmm, I'll give it a shot as well
This worked for me. It's actually a pretty good visualization, interactive as well
hmm, I'm using the exact same code, just not in a notebook. Maybe I'll try that
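
One thing that might be worth trying outside a notebook (a sketch, not verified against every pyvis version) is writing the HTML file directly instead of calling show():

```python
from pyvis.network import Network

g = index.get_networkx_graph()
net = Network(directed=True)      # note: no notebook=True when running as a plain script
net.from_nx(g)
net.save_graph("kg_viz.html")     # write the HTML directly, then open it in a browser
```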