Yes, LlamaIndex provides a way to index all of your documents and build a search on top of them. The process involves creating a list of Document objects and building an Index over those objects, which lets you start querying them. LlamaIndex offers several different index types, including the Vector Store Index and the Summary Index.
Here is an example of how to use the Vector Store Index:
from llama_index import SimpleDirectoryReader, VectorStoreIndex

# Load documents from a local "data" directory, then embed and index them
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)
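Once the index is built, you can query it directly. For example (with a hypothetical question):

# Turn the index into a query engine and ask a question
query_engine = index.as_query_engine()
response = query_engine.query("What are these documents about?")
print(response)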
You can also build an index over a list of Node objects directly, for example by first parsing your documents into nodes:

from llama_index import VectorStoreIndex
from llama_index.node_parser import SimpleNodeParser

# Split the documents into Node objects, then index the nodes directly
nodes = SimpleNodeParser.from_defaults().get_nodes_from_documents(documents)
index = VectorStoreIndex(nodes)
Once your text is indexed, it is ready for querying. However, embedding all of your text can be time-consuming and, if you are using a hosted LLM, expensive. To save time and money, you will want to store your embeddings first.
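A minimal sketch of persisting an index to disk and reloading it later, assuming the default local storage and an illustrative "./storage" directory:

from llama_index import StorageContext, load_index_from_storage

# Persist the index (including its embeddings) to disk
index.storage_context.persist(persist_dir="./storage")

# Later, reload the index without re-embedding the text
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)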
LlamaIndex also offers composability of your indices, meaning that you can build indices on top of other indices. This allows you to more effectively index your entire document tree in order to feed custom knowledge to GPT.
Here is an example of how to define subindices for each document:
from llama_index import SimpleDirectoryReader, StorageContext, TreeIndex

# Load a separate document set from each directory
doc1 = SimpleDirectoryReader("data1").load_data()
doc2 = SimpleDirectoryReader("data2").load_data()
doc3 = SimpleDirectoryReader("data3").load_data()

# Share one storage context so all subindices live in the same store
storage_context = StorageContext.from_defaults()

index1 = TreeIndex.from_documents(doc1, storage_context=storage_context)
index2 = TreeIndex.from_documents(doc2, storage_context=storage_context)
index3 = TreeIndex.from_documents(doc3, storage_context=storage_context)
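You can then compose these subindices into a single queryable graph. A minimal sketch, assuming a SummaryIndex as the root and hypothetical one-line summaries for each subindex:

from llama_index import SummaryIndex
from llama_index.indices.composability import ComposableGraph

# Hypothetical summaries describing what each subindex contains
summaries = ["Summary of data1", "Summary of data2", "Summary of data3"]

# Build a SummaryIndex root over the three TreeIndex subindices
graph = ComposableGraph.from_indices(
    SummaryIndex,
    [index1, index2, index3],
    index_summaries=summaries,
    storage_context=storage_context,
)

# Queries against the graph are routed to the relevant subindex
query_engine = graph.as_query_engine()
response = query_engine.query("Which document set mentions X?")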