Hi everyone, I have a question on index.refresh_ref_docs: does this not work for de-duplication when there is a vector store like DeepLake or Mongo?
https://github.com/run-llama/llama_index/blob/cc739d10069a7f2ac653d6d019fbeb18a891fea2/llama_index/indices/base.py#L310

Edit: searching through past messages I found the message linked below. Is this still the recommended way to deal with duplicates?

https://discord.com/channels/1059199217496772688/1163880111074971790/1163900056718553169
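(For reference, a minimal sketch, not from the thread, of how refresh_ref_docs is normally called; the doc_ids and texts are placeholders and default OpenAI embeddings are assumed to be configured. The refresh compares each document's hash by doc_id against what the index's docstore recorded earlier, which is why behaviour depends on how the docstore and the external vector store are wired up.)

Plain Text
from llama_index import Document, VectorStoreIndex

# stable, caller-supplied doc_ids are required so hashes can be compared across runs
docs = [
    Document(text="hello world", doc_id="doc-1"),
    Document(text="something else", doc_id="doc-2"),
]

index = VectorStoreIndex.from_documents(docs)

# later, after the source documents may have changed upstream
refreshed = index.refresh_ref_docs(docs)
print(refreshed)  # list of bools, True where a document was inserted or updated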

=====
That's certainly one way. You can also attach a docstore and a vector store to an ingestion pipeline, which is a bit easier (just be sure to save/load the docstore to/from disk, or use a remote docstore like MongoDB or Redis).

https://docs.llamaindex.ai/en/stable/examples/ingestion/document_management_pipeline.html

Plain Text
# imports assumed for the pre-0.10 llama_index layout used in this thread
from llama_index.embeddings import OpenAIEmbedding
from llama_index.ingestion import IngestionPipeline
from llama_index.node_parser import SentenceSplitter
from llama_index.storage.docstore import SimpleDocumentStore

docstore = SimpleDocumentStore()

pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(),
        OpenAIEmbedding(),
    ],
    vector_store=vector_store,  # any supported vector store instance, defined elsewhere
    docstore=docstore,
)
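A minimal sketch of the save/load step mentioned above (the file path and the documents variable are placeholders): persist the local docstore after a run and reload it before the next one, so the pipeline can skip documents it has already seen.

Plain Text
nodes = pipeline.run(documents=documents)

# write the dedup state to disk after the run...
docstore.persist("./pipeline_docstore.json")

# ...and load it back before rebuilding the pipeline on the next run
docstore = SimpleDocumentStore.from_persist_path("./pipeline_docstore.json")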
The pipeline doesn't work with the vector DB I'm trying to use 🙂 DeepLake => it needs updating to BasePydanticVectorStore to work with the pipeline.
haha ah classic
should be an easy PR
I saw your AstraDB commit update, not sure if it would be exactly the same for DeepLake? I'm a Go dev, just dabbling with Python for AI FOMO.
It's basically the exact same 👀 I can get to it in a bit, but if you make a PR I can review it pretty quickly.
I'll be a Python code monkey for the day and do it.
When I run poetry install, it assumes Python 3.12 and fails to install numpy.
So I made the change, but I can't run any linting or formatting because poetry fails at numpy and won't move on to the other deps.
how are you running linting?

make lint should work (assuming you have pre-commit installed)
lint didn't work
as pre-commit was not installed
install pre-commit? sudo apt-get install pre-commit I think
oh, maybe it's a pip package actually
It is part of the dev dependencies in pyproject.toml; I'm using a Mac.
it's been a minute
I use a mac as well
well, I switch between haha
brew install pre-commit
should the python version in poetry virtual env be 3.12?
I'm not 100% sure if 3.12 works yet or not, it might not
I use 3.11 or 3.10 most often locally
Done 🙂
Thanks for your help.
I just saw from your LinkedIn that you live in Canada. Nice, me too!
awesome! Will get that merged!
Haha yeah! 🇨🇦 I am probably one of the very few working in this space in Sask lol
That's awesome 🙂 I only know one software dev in Sask, works at Angi. Are there more Canadians on the LlamaIndex team?
There's 3.5 of us! (Our cofounder is from Toronto, moved to SF for the company. Two others are still in the Toronto area.)
Hi Logan 🙂

I'm still getting the same error as before:

Plain Text
ValidationError: 1 validation error for IngestionPipeline
vector_store
  value is not a valid dict (type=type_error.dict)
this is the code:
Plain Text
# imports assumed for the pre-0.10 llama_index layout used in this thread
from llama_index.ingestion import IngestionPipeline
from llama_index.storage.docstore import PostgresDocumentStore
from llama_index.vector_stores import DeepLakeVectorStore

init_vector_store = DeepLakeVectorStore(
    dataset_path=deeplake_path,
    token=deeplake_api_key,
    overwrite=True,
)

# the postgres table is created here
docstore = PostgresDocumentStore.from_uri(
    uri=postgres_url,
    table_name=docstore_table,
    schema_name=docstore_schema,
    perform_setup=True,
    use_jsonb=True,
    namespace=docstore_namespace,
)

pipeline = IngestionPipeline.from_service_context(
    service_context=sc,
    docstore=docstore,
    vector_store=init_vector_store,
)
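For what it's worth, a hedged sketch of the follow-on step once the snippet above runs (after the version issue discussed below is sorted out), assuming documents is loaded elsewhere: the pipeline only embeds new or changed documents into DeepLake, and the index can then be built straight from the vector store.

Plain Text
from llama_index import VectorStoreIndex

# documents whose doc_id + hash are already in the Postgres docstore are skipped;
# new or changed ones are split, embedded, and written to DeepLake
nodes = pipeline.run(documents=documents)

index = VectorStoreIndex.from_vector_store(
    vector_store=init_vector_store,
    service_context=sc,
)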
Are you sure you have the latest version?
pip install -U llama-index
:PSadge: lemme try and test
Take your time 🙂
I get a different error -- seems like a small bug with some missing PrivateAttrs
ok works for me now, making a PR

I never hit the same error as you though, which makes me think you have a slight env issue. If you are working in a notebook, you need to refresh/reboot it for package changes to take effect (and sometimes multiple times, notebooks are weird)
I'll try again and let you know 🙂 Thanks a lot.
What was the error that you had?
Some classic pydantic issues, ValueError: "DeepLakeVectorStore" object has no field "ingestion_batch_size"
merged the above, you can use pip install -U git+https://github.com/run-llama/llama_index.git to get it
Yeah, I got that too.
One time I got the ingestion one,
then I got num_workers.
These attributes were an issue too.
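For anyone hitting the same thing, a minimal sketch in plain pydantic (not the actual DeepLakeVectorStore code, class and field names are illustrative) of why these errors appear: a pydantic model only allows attributes that are declared as fields, so anything assigned in __init__ has to be a declared field or a PrivateAttr, which is the kind of missing-PrivateAttr fix mentioned above.

Plain Text
from typing import Any
from pydantic import BaseModel, PrivateAttr


class BrokenStore(BaseModel):
    dataset_path: str

    def __init__(self, dataset_path: str, **kwargs: Any) -> None:
        super().__init__(dataset_path=dataset_path, **kwargs)
        # undeclared attribute -> ValueError: "BrokenStore" object has no field "ingestion_batch_size"
        self.ingestion_batch_size = 1024


class FixedStore(BaseModel):
    dataset_path: str
    ingestion_batch_size: int = 1024          # declared field: assignment is allowed
    _client: Any = PrivateAttr(default=None)  # private attr for non-serializable state

    def __init__(self, dataset_path: str, **kwargs: Any) -> None:
        super().__init__(dataset_path=dataset_path, **kwargs)
        self._client = object()               # fine, _client is a PrivateAttr


FixedStore(dataset_path="hub://org/dataset")    # works
# BrokenStore(dataset_path="hub://org/dataset") # raises the ValueError above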
Gotcha -- it was working with my PR at least 🙂