
Updated 3 months ago

Data sources

Just asking if I can create a vector index of multiple data types
Yes! Since loaders all return Document objects, you can build your index with whatever data mixture you want 👍
Ah, and I assume I don't need to do anything more. How can I add to an existing GPTVectorStoreIndex?
like can I add to
Plain Text
documents = loader.load_data(document_ids=gdoc_ids)
index = GPTVectorStoreIndex.from_documents(documents)
Yea if the index is already created, you can call insert on each new document

Plain Text
for doc in documents:
    index.insert(doc)
If you haven't created the index yet, you can just create a giant list of documents
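As a toy sketch of why this works (stand-in classes for illustration only, not the real llama_index API): every loader returns the same Document type, so the index never needs to care where a document came from, and late arrivals can be inserted one at a time.

```python
from dataclasses import dataclass

# Toy stand-ins, NOT the llama_index classes:
@dataclass
class Document:
    text: str
    source: str

class ToyIndex:
    """Minimal index sketch that accepts any Document, regardless of origin."""
    def __init__(self, documents):
        self.docs = list(documents)

    def insert(self, doc):
        self.docs.append(doc)

# Two "loaders" returning the same Document type from different sources:
gdocs = [Document("meeting notes", "gdoc"), Document("spec draft", "gdoc")]
pdfs = [Document("quarterly report", "pdf")]

index = ToyIndex(gdocs + pdfs)             # build from a mixed list up front...
index.insert(Document("new page", "web"))  # ...or insert into the existing index

print(len(index.docs))  # 4
```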
Ahhhh, I see. And I assume the data type doesn't matter, since it's all Document objects like you stated above
Absolutely amazing, thank you!
Oh also, one more thing: is it possible to feed it live data? I saw you guys have a database loader; does that take live data or a snapshot?
or does it query for data relevant to the query itself
Does it make smart database statements?
Mmm, live data is pretty tricky. The db readers don't really do anything too special I think 🤔

Still looking for ways to improve this part of the library 🙏

But thankfully embeddings are super cheap, so rebuilding the index isn't usually too expensive (unless it's gigantic)
Oh yeah, I'm not worried about costs!
But just to confirm, does the DB loader take a snapshot of the DB, or how exactly does it query the database?
Does it just read a snapshot, or does it save a cache to a file?
It will just be a snapshot. It looks like you have to pass in a query (I'm assuming you are looking at this reader? https://llama-hub-ui.vercel.app/l/database)
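The snapshot behavior can be seen with a plain stdlib sqlite3 example (not the llama-hub reader itself): the rows you load are fixed at query time, and rows inserted afterwards don't show up unless you re-run the query and rebuild the index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("INSERT INTO articles (body) VALUES ('first post')")

# The "loader" step: run the query once and keep the results as plain rows.
snapshot = [row[0] for row in conn.execute("SELECT body FROM articles")]

# Data arriving after the snapshot is invisible to it:
conn.execute("INSERT INTO articles (body) VALUES ('second post')")

live = [row[0] for row in conn.execute("SELECT body FROM articles")]
print(snapshot)  # ['first post']  (still only the old row)
print(live)      # ['first post', 'second post']
```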
Sorry for the late reply but yes I am!
Is it possible to keep the index in memory or in a json file somehow?
Or is it only initialized and built when the program runs?
Yea you can save/load the index to a folder 👍

Plain Text
# Save the index to a folder
index.storage_context.persist(persist_dir="./storage")

# Later, load it back from that folder
from llama_index import StorageContext, load_index_from_storage
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)
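To illustrate the save-then-reload pattern with nothing but stdlib json (llama_index similarly persists JSON files under persist_dir): the state written to the folder survives between program runs.

```python
import json, pathlib, tempfile

storage_dir = pathlib.Path(tempfile.mkdtemp()) / "storage"
storage_dir.mkdir()

# "Persist": write the in-memory state to a folder...
state = {"documents": ["doc-1", "doc-2"]}
(storage_dir / "index.json").write_text(json.dumps(state))

# ...later (e.g. on the next program run), load it back:
restored = json.loads((storage_dir / "index.json").read_text())
print(restored == state)  # True
```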
Sorry for the bother but for this

Plain Text
from llama_index import SimpleDirectoryReader

documents = SimpleDirectoryReader('./data').load_data()


Does this automatically read folders within folders?
so that if we had
Plain Text
data/
  transcript/
    - Transcript1.txt
    - Transcript2.txt
  documents/
    - Document1.txt
    - Document2.txt
 
Just need to set it to recursive

Plain Text
from llama_index import SimpleDirectoryReader

documents = SimpleDirectoryReader('./data', recursive=True).load_data()
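For a sense of what recursive=True picks up, here is the same traversal done with stdlib pathlib (SimpleDirectoryReader's actual file filtering may differ):

```python
import pathlib, tempfile

# Recreate the nested layout from the question in a temp directory:
root = pathlib.Path(tempfile.mkdtemp())
layout = {"transcript": ["Transcript1.txt", "Transcript2.txt"],
          "documents": ["Document1.txt", "Document2.txt"]}
for sub, names in layout.items():
    (root / sub).mkdir()
    for name in names:
        (root / sub / name).write_text("example")

# Non-recursive: only files directly under root (none here).
flat = list(root.glob("*.txt"))
# Recursive: descends into transcript/ and documents/, like recursive=True.
nested = sorted(p.name for p in root.rglob("*.txt"))

print(len(flat))  # 0
print(nested)     # ['Document1.txt', 'Document2.txt', 'Transcript1.txt', 'Transcript2.txt']
```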
Awesome, thank you!