
Updated 3 months ago

Data sources

Just asking if I can create a vector index of multiple data types
Yes! Since loaders all return Document objects, you can build your index with whatever data mixture you want 👍
Ah, and I assume I don't need to do anything more. How can I add to an existing GPTVectorStoreIndex?
like can I add to
Plain Text
documents = loader.load_data(document_ids=gdoc_ids)
index = GPTVectorStoreIndex.from_documents(documents)
Yea if the index is already created, you can call insert on each new document

Plain Text
for doc in documents:
    index.insert(doc)
If you haven't created the index yet, you can just create a giant list of documents
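As a toy sketch of why this works (stand-in classes for illustration only, not the real llama_index API): every loader returns the same Document type, so the index never needs to care where a document came from, and late arrivals can be inserted one at a time.

```python
from dataclasses import dataclass

# Toy stand-ins, NOT the llama_index classes:
@dataclass
class Document:
    text: str
    source: str

class ToyIndex:
    """Minimal index sketch that accepts any Document, regardless of origin."""
    def __init__(self, documents):
        self.docs = list(documents)

    def insert(self, doc):
        self.docs.append(doc)

# Two "loaders" returning the same Document type from different sources:
gdocs = [Document("meeting notes", "gdoc"), Document("spec draft", "gdoc")]
pdfs = [Document("quarterly report", "pdf")]

index = ToyIndex(gdocs + pdfs)             # build from a mixed list up front...
index.insert(Document("new page", "web"))  # ...or insert into the existing index

print(len(index.docs))  # 4
```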
Ahhhh, I see. And I assume the data type doesn't matter, since it's all Document objects like you stated above
Absolutely amazing, thank you!
Oh also, one more thing: is it possible to feed it live data? I saw you guys have a database loader; does that take live data or a snapshot?
or does it query for data relevant to the query itself
Does it make smart database statements?
Mmm, live data is pretty tricky. The db readers don't really do anything too special I think 🤔

Still looking for ways to improve this part of the library 🙏

But thankfully embeddings are super cheap, so rebuilding the index isn't usually too expensive (unless it's gigantic)
Oh yeah, I'm not worried about costs!
But just to confirm, does the DB loader take a snapshot of the DB, or how exactly does it query the database?
Does it just read a snapshot, or does it save a cache to a file?
It will just be a snapshot. It looks like you have to pass in a query (I'm assuming you are looking at this reader? https://llama-hub-ui.vercel.app/l/database)
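The snapshot behavior can be seen with a plain stdlib sqlite3 example (not the llama-hub reader itself): the rows you load are fixed at query time, and rows inserted afterwards don't show up unless you re-run the query and rebuild the index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("INSERT INTO articles (body) VALUES ('first post')")

# The "loader" step: run the query once and keep the results as plain rows.
snapshot = [row[0] for row in conn.execute("SELECT body FROM articles")]

# Data arriving after the snapshot is invisible to it:
conn.execute("INSERT INTO articles (body) VALUES ('second post')")

live = [row[0] for row in conn.execute("SELECT body FROM articles")]
print(snapshot)  # ['first post']  (still only the old row)
print(live)      # ['first post', 'second post']
```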
Sorry for the late reply but yes I am!
Is it possible to keep the index in memory or in a json file somehow?
Or is it only initialized and built when the program runs?
Yea you can save/load the index to a folder 👍

Plain Text
# Save the index to a folder
index.storage_context.persist(persist_dir="./storage")

# Later, load it back from that folder
from llama_index import StorageContext, load_index_from_storage
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)
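To illustrate the save-then-reload pattern with nothing but stdlib json (llama_index similarly persists JSON files under persist_dir): the state written to the folder survives between program runs.

```python
import json, pathlib, tempfile

storage_dir = pathlib.Path(tempfile.mkdtemp()) / "storage"
storage_dir.mkdir()

# "Persist": write the in-memory state to a folder...
state = {"documents": ["doc-1", "doc-2"]}
(storage_dir / "index.json").write_text(json.dumps(state))

# ...later (e.g. on the next program run), load it back:
restored = json.loads((storage_dir / "index.json").read_text())
print(restored == state)  # True
```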
Sorry for the bother but for this

Plain Text
from llama_index import SimpleDirectoryReader

documents = SimpleDirectoryReader('./data').load_data()


Does this automatically read folders within folders?
so that if we had
Plain Text
data/
  transcript/
    - Transcript1.txt
    - Transcript2.txt
  documents/
    - Document1.txt
    - Document2.txt
 
Just need to set it to recursive

Plain Text
from llama_index import SimpleDirectoryReader

documents = SimpleDirectoryReader('./data', recursive=True).load_data()
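For a sense of what recursive=True picks up, here is the same traversal done with stdlib pathlib (SimpleDirectoryReader's actual file filtering may differ):

```python
import pathlib, tempfile

# Recreate the nested layout from the question in a temp directory:
root = pathlib.Path(tempfile.mkdtemp())
layout = {"transcript": ["Transcript1.txt", "Transcript2.txt"],
          "documents": ["Document1.txt", "Document2.txt"]}
for sub, names in layout.items():
    (root / sub).mkdir()
    for name in names:
        (root / sub / name).write_text("example")

# Non-recursive: only files directly under root (none here).
flat = list(root.glob("*.txt"))
# Recursive: descends into transcript/ and documents/, like recursive=True.
nested = sorted(p.name for p in root.rglob("*.txt"))

print(len(flat))  # 0
print(nested)     # ['Document1.txt', 'Document2.txt', 'Transcript1.txt', 'Transcript2.txt']
```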
Awesome, thank you!