KG storage

Hey quick question on loading locally stored knowledge graphs - should I store with storage_context.to_dict() or with persist()? And how do I load the knowledge graph from these files? Recommendations?
Yea you'll want to use the index.storage_context.persist(persist_dir="./storage") and index = load_index_from_storage(StorageContext.from_defaults(persist_dir="./storage")) combo

to_dict and from_dict are mostly there to help with json blob storage for aws lol
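
Something like this is the usual pattern (a minimal sketch; the persist_dir is just a placeholder and `index` is whatever variable holds your knowledge graph index):

```python
from llama_index import StorageContext, load_index_from_storage

# save the index's docstore / index store / graph store to disk
index.storage_context.persist(persist_dir="./storage")

# later: rebuild the storage context from that directory and reload the index
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)
```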
hmm I'm not sure if this properly loads the LLMs I need
but I can just pass the service context to load_index_from_storage, silly me
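i.e. something like this (just a sketch; `service_context` here is whatever you've already built with your LLM and embedding settings):

```python
index = load_index_from_storage(
    StorageContext.from_defaults(persist_dir="./storage"),
    service_context=service_context,  # carries the LLM / embedding config
)
```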
What's the difference between num_chunks_per_query and similarity_top_k in the KG query params?
Also, what's your favourite method of visualizing simple knowledge graphs that aren't using any graph db? πŸ™‚
So the KG has a few modes. If include_text=True (the default), then once the triplets are retrieved for a query, the text chunk that each triplet was extracted from is included in the prompt to the LLM

num_chunks_per_query sets a limit on how many chunks are included here; the default is 10 (chunks are sorted by how many of the retrieved triplets they are connected to, i.e. 3 triplets might come from 1 text chunk, so that chunk should definitely be included before the limit kicks in)

similarity_top_k is an additional parameter for fetching triplets using embeddings. It's only used if retriever_mode="embedding" or "hybrid". The default is "keyword" though.
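
Roughly how you'd pass those in (just a sketch; the extra kwargs get forwarded to the KG retriever, and the values are only examples):

```python
# example query engine setup on an existing KnowledgeGraphIndex
query_engine = index.as_query_engine(
    include_text=True,          # include the source chunk for each retrieved triplet
    retriever_mode="hybrid",    # "keyword" (default), "embedding", or "hybrid"
    similarity_top_k=5,         # triplets fetched by embedding similarity
    num_chunks_per_query=10,    # cap on how many source chunks go into the prompt
)
response = query_engine.query("What did the author work on?")
```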

Definitely check out the example notebook for some exploration into this https://gpt-index.readthedocs.io/en/latest/examples/index_structs/knowledge_graph/KnowledgeGraphDemo.html

The embed in that page is broken, but using networkx to visualize the graph is a good option
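
Something along these lines (a sketch based on the notebook's pyvis example; assumes the index is a KnowledgeGraphIndex with get_networkx_graph, and that pyvis is installed):

```python
from pyvis.network import Network

g = index.get_networkx_graph()        # networkx graph of the extracted triplets
net = Network(notebook=True, directed=True)
net.from_nx(g)
net.show("example.html")              # renders an interactive HTML view
```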
So similarity_top_k maps my query to triplets (as in retriever_mode="embedding"), right?
Let's say I've set it to find N triplets. Each triplet maps to a text chunk, so if num_chunks_per_query < N, it may limit the amount of text context used in the query?
(assuming each embedding hit mapped to a unique triplet-chunk pair)
networkx is giving an error, AttributeError: 'NoneType' object has no attribute 'render', when creating the HTML viz for my graph. The graph works for queries, but I'm not sure what could be missing.
I think it should be used in the hybrid mode as well
yea exactly. If it fetches too many chunks, then the response may take ages, hence the limit option
hmmm, I'll give it a shot as well
This worked for me. It's actually a pretty good visualization, interactive as well
hmm, I'm using the exact same code, just not in a notebook. Maybe I'll try that
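
One thing that might be worth trying outside a notebook (a sketch, not verified against every pyvis version) is writing the HTML file directly instead of calling show():

```python
from pyvis.network import Network

g = index.get_networkx_graph()
net = Network(directed=True)      # note: no notebook=True when running as a plain script
net.from_nx(g)
net.save_graph("kg_viz.html")     # write the HTML directly, then open it in a browser
```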