Andre Tättar
Offline, last seen 3 months ago
Joined September 25, 2024
Is there a way to extend the LlamaIndex document loader or vector builder so that it does not add duplicate files, i.e. filters them out at the document-loading or vector-building step? Are there any code examples for that?
Reason: web scrapers often load the same page multiple times, so content can end up duplicated.
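
One way to filter at the loading step is to hash each document's text and keep only the first copy before building the index. A minimal sketch, assuming llama-index 0.10+ import paths; the deduplicate helper and the ./scraped_pages path are made up for illustration:

import hashlib

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

def deduplicate(documents):
    # Keep only the first document for each distinct text content
    seen = set()
    unique = []
    for doc in documents:
        digest = hashlib.sha256(doc.text.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique

documents = SimpleDirectoryReader("./scraped_pages").load_data()
vector_index = VectorStoreIndex.from_documents(deduplicate(documents))

Note that exact hashing only catches byte-identical pages; near-duplicates (the same article scraped with different timestamps, say) would need fuzzier matching.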
65 comments
Is there any way to find out how many documents I have in a vector index, plus some basic information: size, the embedding model used, dimensionality, etc.?
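
For the default in-memory setup, a rough sketch: attribute names assume llama-index 0.10+, a vector_index built via from_documents, and that Settings.embed_model is the model you indexed with (an external vector store may leave the local docstore empty):

from llama_index.core import Settings

# Source documents vs. chunked nodes held in the local docstore
print("documents:", len(vector_index.ref_doc_info))
print("nodes:", len(vector_index.docstore.docs))

# Embedding model and its output dimensionality, probed with a dummy string
embed_model = Settings.embed_model
print("model:", embed_model.model_name)
print("dimensions:", len(embed_model.get_text_embedding("probe")))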
5 comments
I have my prototype set up for a product, and I want to start scaling it up in Google Cloud. Are there any tutorials/notebooks/anything to help me? My current setup is simple:

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Load every file under the temp dir, skipping hidden files
loader = SimpleDirectoryReader(self.LOCAL_TEMP_DIR, recursive=True, exclude_hidden=True)
documents = loader.load_data()
# Chunk, embed, and index everything in one single-threaded pass
vector_index = VectorStoreIndex.from_documents(documents)

I'd like to make everything here parallel by using Google Cloud Functions or some alternative. I'd also like to start using a vector database (any recommendations on GCP?).
VectorStoreIndex sometimes takes hours and gets timed out by Cloud Functions, since I process all the documents in one thread. That is the main reason I want to parallelize it (currently my prototype stores everything on disk). Also, I have one vector index per company, and I have lots of companies, so the setup must handle that use case (a query engine must never see another company's data).
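
One sketch of parallelizing the parsing step with the core ingestion pipeline, which a Cloud Functions fan-out could wrap per batch of files; assumes llama-index 0.10+, with ./company_docs and num_workers=4 as placeholder values:

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter

documents = SimpleDirectoryReader("./company_docs", recursive=True).load_data()

# Split documents into nodes across several worker processes
pipeline = IngestionPipeline(transformations=[SentenceSplitter()])
nodes = pipeline.run(documents=documents, num_workers=4)

# Build the index from the pre-parsed nodes; keeping one index
# (or one vector store collection) per company isolates tenants,
# so a query engine can never see another company's data
vector_index = VectorStoreIndex(nodes)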
17 comments
I just updated too, and I cannot even run the code from the README. pip freeze -> llama-index==0.10.4
The code from the README that doesn't run is this simple import: "from llama_index.core import StorageContext, load_index_from_storage"

It leads to this error: ImportError: cannot import name 'StorageContext' from 'llama_index.core' (unknown location)
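
The "(unknown location)" in that ImportError usually means leftovers from a pre-0.10 install are shadowing the new namespace packages. Assuming that is the cause here, the v0.10 migration guidance amounts to a clean reinstall (or starting from a fresh virtual environment):

pip uninstall llama-index
pip install --upgrade --no-cache-dir --force-reinstall llama-index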
5 comments