Do we have an implementation of indexing all LlamaIndex documents and building a search on top of it?

At a glance
Do we have an implementation of indexing all LlamaIndex documents and building a search on top of it? If so, I'd like to compare against it to see the differences.
7 comments
Please stand by @Qingsong Yao as I search the knowledge base. I normally take about 20 seconds or so. :wave:
----------------------
Yes, LlamaIndex provides a way to index all documents and build a search on top of it. The process involves creating a list of Document objects and building an Index over these objects. This allows you to start querying them. LlamaIndex offers several different index types, including Vector Store Index and Summary Index.

Here is an example of how to use the Vector Store Index:

Python
from llama_index import VectorStoreIndex

# `documents` is a list of Document objects, e.g. loaded with SimpleDirectoryReader
index = VectorStoreIndex.from_documents(documents)


You can also build an index over a list of Node objects directly:

Python
from llama_index import VectorStoreIndex

# `nodes` is a list of Node objects (chunks parsed from your documents)
index = VectorStoreIndex(nodes)
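
One way to produce that list of nodes from documents, as a minimal sketch using the legacy (pre-0.10) SimpleNodeParser:

Python
from llama_index.node_parser import SimpleNodeParser

# Split previously loaded Document objects into Node chunks.
parser = SimpleNodeParser.from_defaults()
nodes = parser.get_nodes_from_documents(documents)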


Once your text is indexed, it is ready for querying. However, embedding all your text can be time-consuming and, if you are using a hosted LLM, it can also be expensive. To save time and money, you will want to store your embeddings first.
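
For example, under the same legacy top-level API used above, an index can be persisted to disk and reloaded later instead of re-embedding everything (a minimal sketch; the ./storage directory name is just an example):

Python
from llama_index import StorageContext, load_index_from_storage

# Write the index, including its embeddings, to disk.
index.storage_context.persist(persist_dir="./storage")

# Later: rebuild the index from the stored data instead of re-embedding.
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)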

LlamaIndex also offers composability of your indices, meaning that you can build indices on top of other indices. This allows you to more effectively index your entire document tree in order to feed custom knowledge to GPT.

Here is an example of how to define subindices for each document:

Python
from llama_index import SimpleDirectoryReader, StorageContext, TreeIndex

doc1 = SimpleDirectoryReader("data1").load_data()
doc2 = SimpleDirectoryReader("data2").load_data()
doc3 = SimpleDirectoryReader("data3").load_data()

# A shared storage context keeps the subindices in the same store.
storage_context = StorageContext.from_defaults()

index1 = TreeIndex.from_documents(doc1, storage_context=storage_context)
index2 = TreeIndex.from_documents(doc2, storage_context=storage_context)
index3 = TreeIndex.from_documents(doc3, storage_context=storage_context)
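
To actually compose these subindices into one searchable structure, the legacy (pre-0.10) ComposableGraph API can be layered on top. A minimal sketch, where the summary strings are hypothetical placeholders:

Python
from llama_index import SummaryIndex
from llama_index.indices.composability import ComposableGraph

# Build a parent index over the three subindices; each child needs a short
# summary so the parent can route queries to the right subindex.
graph = ComposableGraph.from_indices(
    SummaryIndex,
    [index1, index2, index3],
    index_summaries=[
        "Docs loaded from data1",  # hypothetical summaries
        "Docs loaded from data2",
        "Docs loaded from data3",
    ],
    storage_context=storage_context,
)

query_engine = graph.as_query_engine()
response = query_engine.query("What does the documentation say about indexing?")
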
For more details, you can refer to the LlamaIndex documentation and the GitHub examples.

To compare it with another method, you would need to provide more information about the method you want to compare it with.
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------
@Qingsong Yao You could manually gather all of the llama_index docs and create a VectorStoreIndex, then perform queries on the VectorStoreIndex.

You could download the docs, or the entire repo, by just downloading the files from GitHub.

Then you could use Beautiful Soup, if you want, or some other text parser, to clean anything if you need to.

Then you could add metadata to each document based on what the document is documenting.

Then add the documents to a VectorStoreIndex and query, I guess? A rough sketch of those steps is below.
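
Something like the following, as a minimal sketch: the ./llama_index_docs path, the assumption that the downloaded pages are HTML files, and the metadata key are all hypothetical, and it uses the same legacy (pre-0.10) top-level imports as the rest of this thread.

Python
from pathlib import Path

from bs4 import BeautifulSoup
from llama_index import Document, VectorStoreIndex

documents = []
# Hypothetical local copy of the downloaded docs.
for html_file in Path("llama_index_docs").rglob("*.html"):
    # Strip HTML markup, keeping only the visible text.
    soup = BeautifulSoup(html_file.read_text(encoding="utf-8"), "html.parser")
    text = soup.get_text(separator="\n", strip=True)
    documents.append(
        Document(
            text=text,
            # Record which page each document came from.
            metadata={"source": str(html_file)},
        )
    )

index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
print(query_engine.query("How do I build a VectorStoreIndex?"))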
I think the LlamaIndex team should do that, right?
I think they have already created tools for searching the documentation. For example, the kapa.ai bot. Or, on the documentation page, press Cmd+K to find "Ask Mendable".

If, for some reason, you want to do that yourself, I think the steps I outlined above could be the way to start.
kapa.ai is not using LlamaIndex. I think if the team can demo how to do that as a reference implementation, that would be helpful.