The community member has a large content base for their SaaS product in the wealthTech space and is planning to build a customer-facing chatbot. Rather than simply dumping the text into GPT Index, they are considering building a knowledge graph first. The discussion covers whether this is the right approach, how to create an index from a knowledge graph, and whether the knowledge graph can serve as the underlying content for GPT Index, as well as the potential advantages of creating embeddings from the knowledge graph and storing them in a vector store for indexing.
The replies suggest that the simplest approach is to convert all documents to text, embed them, and load them into a vector store index. However, the community member may want to define a more precise "knowledge graph" structure over their data, which GPT Index can help with. The replies also note that users can pass in their own embeddings instead of having GPT Index embed the text for them.
Great job on GPT Index! Really fascinating. @jerryjliu0
Background: We have a lot of content for our SaaS product in the wealthTech space. We are now planning to build a customer-facing chatbot that provides answers based on this content.
We think it might be a good idea to build a knowledge graph first rather than just dumping the text into GPT Index.
Question: Is this the right approach? If yes, how do we create an index out of knowledge graphs? And can we use this as the underlying content to be fed into GPT Index?
Secondly, if we create embeddings based on a knowledge graph, store them in a vector store (like Pinecone or Weaviate), and then use those embeddings for indexing, will that have any advantage?
I'd love to understand your use case a bit better, would be happy to hop on a call if you're available. The absolute simplest approach is to convert all documents to text, embed them, and dump them into a vector store index (e.g. with the quickstart https://gpt-index.readthedocs.io/en/latest/getting_started/starter_example.html). But with your use case you may want to define a more precise "knowledge graph" structure over your data, and I think GPT Index can help with that!
To your second point, you can definitely pass in your own embeddings vs. us embedding the text for you - simply specify the embedding when you create a Document object, e.g.
from gpt_index import Document, GPTSimpleVectorIndex

# Attach a precomputed embedding to the document instead of
# letting GPT Index embed the text for you
doc = Document(text, embedding=embedding)
index = GPTSimpleVectorIndex([doc])
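For intuition about what a vector store index does with those embeddings at query time, here is a minimal, library-free sketch of embedding-based retrieval. The toy store and the hard-coded vectors are illustrative stand-ins; a real setup would use Pinecone/Weaviate and an actual embedding model:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

class TinyVectorStore:
    """Toy stand-in for a vector store: holds (text, embedding) pairs
    and returns the texts closest to a query embedding."""
    def __init__(self):
        self.items = []

    def add(self, text, embedding):
        self.items.append((text, embedding))

    def query(self, embedding, top_k=1):
        ranked = sorted(
            self.items,
            key=lambda item: cosine_similarity(item[1], embedding),
            reverse=True,
        )
        return [text for text, _ in ranked[:top_k]]

# Made-up embeddings for illustration only.
store = TinyVectorStore()
store.add("Fees are charged quarterly.", [1.0, 0.0, 0.1])
store.add("Accounts support joint ownership.", [0.0, 1.0, 0.1])

print(store.query([0.9, 0.1, 0.0], top_k=1))  # -> ['Fees are charged quarterly.']
```

The retrieved passages would then be handed to the LLM as context for answering the user's question.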
Chiming in (even though I'm no expert). Recall that the goal here is to reduce your massive knowledge base to token-sized paragraphs. The vector stores mentioned are a good way to semantically query for a subset of the content. Of course, there are many ways to run NLP-style queries across a knowledge base or knowledge graph; you should probably go with the standard setup before trying any customizations. However, if you want to arrange content in a graph, it's likely best to do it within GPT Index by extending the concept of nodes and "node trees" to an attribute graph that could hold more types of content for assembling token chunks. **No one is supposed to understand this, but you can ask ChatGPT what it means.
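To make the knowledge-graph idea from the question concrete: one simple bridge between a knowledge graph and GPT Index is to flatten graph triples into short text passages, which can then be embedded and indexed like any other document. A hedged sketch (the triples and sentence template are made up for illustration):

```python
def triples_to_passages(triples):
    """Flatten (subject, predicate, object) triples into plain sentences
    suitable for embedding and indexing."""
    return [f"{s} {p} {o}." for s, p, o in triples]

# Hypothetical wealthTech-flavored triples for illustration.
triples = [
    ("RetirementAccount", "has contribution limit", "$6,500 per year"),
    ("RetirementAccount", "allows withdrawals after", "age 59.5"),
]
passages = triples_to_passages(triples)
print(passages[0])  # RetirementAccount has contribution limit $6,500 per year.
```

Each passage could then become a Document (optionally with its own precomputed embedding, as shown earlier in the thread), so the graph structure informs the content without requiring a separate graph query engine.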