That's certainly one way. You can also attach a docstore and vector store to an ingestion pipeline, which is a bit easier (just be sure to save/load the docstore to/from disk, or use a remote docstore like MongoDB or Redis):
https://docs.llamaindex.ai/en/stable/examples/ingestion/document_management_pipeline.html

# imports for the llama-index version in use here (pre-0.10 layout; newer releases move these under llama_index.core)
from llama_index.embeddings import OpenAIEmbedding
from llama_index.ingestion import IngestionPipeline
from llama_index.node_parser import SentenceSplitter
from llama_index.storage.docstore import SimpleDocumentStore

docstore = SimpleDocumentStore()
pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(),
        OpenAIEmbedding(),
    ],
    vector_store=vector_store,
    docstore=docstore,
)
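If you want the dedup to survive restarts, persist the docstore and reload it on the next run. A minimal sketch (the "./docstore.json" path is just an example):

docstore.persist("./docstore.json")
# ...on the next run...
docstore = SimpleDocumentStore.from_persist_path("./docstore.json")

Or point at a remote store instead, e.g. RedisDocumentStore.from_host_and_port(host="127.0.0.1", port=6379).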
the pipeline doesn't work with the vectordb I'm trying to use, DeepLake => needs updating to BasePydanticVectorStore to work with the pipeline
I saw your AstraDB commit, not sure if it would be exactly the same for DeepLake? I'm a Go dev, just dabbling with Python for AI FOMO
It's basically the exact same. I can get to it in a bit, but if you make a PR I can review it pretty quickly
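For context, the "value is not a valid dict" error happens because the pipeline's vector_store field is typed as BasePydanticVectorStore, so pydantic rejects the old-style class outright. The shape of the change, roughly (a sketch mirroring the AstraDB migration, not the actual diff; details are illustrative):

from typing import Any

from llama_index.bridge.pydantic import PrivateAttr
from llama_index.vector_stores.types import BasePydanticVectorStore

class DeepLakeVectorStore(BasePydanticVectorStore):
    stores_text: bool = True
    ingestion_batch_size: int  # config values become declared pydantic fields

    _vectorstore: Any = PrivateAttr()  # non-serializable client state goes in a PrivateAttr

    def __init__(self, dataset_path: str, ingestion_batch_size: int = 1024, **kwargs: Any) -> None:
        super().__init__(ingestion_batch_size=ingestion_batch_size)
        self._vectorstore = ...  # create the underlying DeepLake dataset here
    # (add/delete/query and the client property are omitted from this sketch)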
I'll be a Python code monkey for the day and do it
When I run poetry install, it assumes Python 3.12 and fails to install numpy
So I made the change, but I cannot run any linting or formatting because poetry fails at numpy and won't move on to the other deps
How are you running linting?
make lint
should work (assuming you have pre-commit installed)
It fails, as pre-commit was not installed
install pre-commit? sudo apt-get install pre-commit
I think
oh, maybe it's a pip package actually
It is part of the dev dependencies in the pyproject.toml. I'm using a Mac
well, I switch between haha
Should the Python version in the poetry virtual env be 3.12?
I'm not 100% sure if 3.12 works yet or not, it might not
I use 3.11 or 3.10 most often locally
I just saw from your LinkedIn that you live in Canada. Nice, me too
awesome! Will get that merged!
haha yea! 🇨🇦 I am probably one of very few working in this space in Sask lol
That's awesome! I only know one software dev in Sask, works at Angi. Are there more Canadians on the LlamaIndex team?
There's 3.5 of us! (Our cofounder is from Toronto, moved to SF for the company. Two others are still in the Toronto area)
Hi Logan
I'm still getting the same error as before:
ValidationError: 1 validation error for IngestionPipeline
vector_store
value is not a valid dict (type=type_error.dict)
# imports for the versions in use here (pre-0.10 layout)
from llama_index.ingestion import IngestionPipeline
from llama_index.storage.docstore import PostgresDocumentStore
from llama_index.vector_stores import DeepLakeVectorStore

init_vector_store = DeepLakeVectorStore(
    dataset_path=deeplake_path,
    token=deeplake_api_key,
    overwrite=True,
)

# postgres table is created here
docstore = PostgresDocumentStore.from_uri(
    uri=postgres_url,
    table_name=docstore_table,
    schema_name=docstore_schema,
    perform_setup=True,
    use_jsonb=True,
    namespace=docstore_namespace,
)

pipeline = IngestionPipeline.from_service_context(
    service_context=sc,
    docstore=docstore,
    vector_store=init_vector_store,
)
Are you sure you have the latest version?
pip install -U llama-index
Sadge, lemme try and test
I get a different error -- seems like a small bug with some missing PrivateAttrs
ok works for me now, making a PR
I never hit the same error as you though, which makes me think you have a slight env issue. If you are working in a notebook, you need to refresh/reboot it for package changes to take effect (and sometimes multiple times, notebooks are weird)
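One quick sanity check after restarting the kernel, assuming a 0.9.x-era install where the version string is exposed at the top level:

import llama_index
print(llama_index.__version__)  # should match the version you just installed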
I'll try again and let you know. Thanks a lot
What was the error that you had?
Some classic pydantic issues, ValueError: "DeepLakeVectorStore" object has no field "ingestion_batch_size"
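That error is plain pydantic v1 behavior (which llama-index bridged at the time), not DeepLake-specific: assigning an attribute that was never declared as a field raises it. A minimal repro with illustrative names, plus the fix:

from pydantic import BaseModel, PrivateAttr

class Store(BaseModel):
    pass

store = Store()
# store.ingestion_batch_size = 1024  # ValueError: "Store" object has no field "ingestion_batch_size"

class FixedStore(BaseModel):
    ingestion_batch_size: int = 1024  # declare it as a real field...
    _client: object = PrivateAttr(default=None)  # ...or keep internal state in a PrivateAttr

print(FixedStore().ingestion_batch_size)  # 1024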
merged the above, you can use pip install -U git+https://github.com/run-llama/llama_index.git
to get it
One time I got the ingestion
These attributes were an issue too
gotcha, it was working with my PR at least