```python
transformations=[
    SentenceSplitter(chunk_size=256, chunk_overlap=0),
    embed_model
],
```
```python
pipeline = IngestionPipeline(
    transformations=[
        SemanticSplitterNodeParser(
            buffer_size=1,
            breakpoint_percentile_threshold=95,
        ),
        embed_model
    ],
    docstore=SimpleDocumentStore(),
    vector_store=vector_store,
)
```
```text
Traceback (most recent call last):
  File "C:\Users\User\Projects\project\TEST_qdrant_ingest_data.py", line 80, in <module>
    SemanticSplitterNodeParser(
  File "C:\Users\User\Projects\project\venv\lib\site-packages\pydantic\v1\main.py", line 341, in __init__
    raise validation_error
pydantic.v1.error_wrappers.ValidationError: 1 validation error for SemanticSplitterNodeParser
embed_model
  field required (type=value_error.missing)
```
This worked with `SentenceSplitter(chunk_size=256, chunk_overlap=0)`, but it didn't work with `SemanticSplitterNodeParser(buffer_size=1, breakpoint_percentile_threshold=95)`. Do you know what the reason could be?

The error says `embed_model` is a required field, so it has to be passed to the parser directly:

```python
SemanticSplitterNodeParser(
    buffer_size=1,
    breakpoint_percentile_threshold=95,
    embed_model=embed_model
),
```
```python
pipeline = IngestionPipeline(
    transformations=[
        SemanticSplitterNodeParser(
            buffer_size=1,
            breakpoint_percentile_threshold=95,
            embed_model=embed_model,
        ),
        embed_model,
    ],
    docstore=SimpleDocumentStore(),
    vector_store=vector_store,
)
```
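Beyond the validation fix, it may help to see why `SemanticSplitterNodeParser` requires an embedding model at all: it embeds neighbouring sentences and starts a new chunk wherever the distance between them exceeds a percentile threshold. The sketch below is a toy version of that idea in plain Python, not llama_index internals; the function names and the cosine-distance-plus-percentile logic are my own simplification.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity of two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

def semantic_split(sentences, embed, breakpoint_percentile=95):
    """Toy semantic splitter: break where neighbouring-sentence distance
    exceeds the given percentile of all neighbour distances."""
    if len(sentences) < 2:
        return [" ".join(sentences)]
    vectors = [embed(s) for s in sentences]  # <- this is why an embed model is required
    distances = [cosine_distance(vectors[i], vectors[i + 1])
                 for i in range(len(vectors) - 1)]
    # Distances above this percentile become chunk boundaries.
    idx = min(len(distances) - 1, int(len(distances) * breakpoint_percentile / 100))
    cutoff = sorted(distances)[idx]
    chunks, current = [], [sentences[0]]
    for sent, dist in zip(sentences[1:], distances):
        if dist > cutoff:
            chunks.append(" ".join(current))
            current = []
        current.append(sent)
    chunks.append(" ".join(current))
    return chunks
```

Since the split points are computed from embeddings, the parser cannot do anything without an `embed_model`, which is why pydantic flags the field as required.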
```python
pipeline = IngestionPipeline(
    transformations=[
        SemanticSplitterNodeParser(
            buffer_size=1,
            breakpoint_percentile_threshold=95,
            embed_model=embed_model,
        ),
        embed_model,
    ],
    docstore=SimpleDocumentStore(),
    vector_store=vector_store,
    cache=IngestionCache(),
)

pipeline.persist('./pipeline_storage')

pipeline.run(
    documents=reader.load_data(),
)
```
In the `./pipeline_storage` directory, two files, `docstore.json` and `llama_cache`, appear, but both contain only an empty object, `{}`. Could there be something I'm missing in the setup, or in the way persistence is supposed to work?

```text
DEBUG:fsspec.local:open file: C:/Users/User/Projects/project/pipeline_storage/llama_cache
DEBUG:fsspec.local:open file: C:/Users/User/Projects/project/pipeline_storage/docstore.json
```
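One likely cause, judging from the call order in the snippet above (my reading, not confirmed from the library source): `persist()` writes whatever the docstore and cache hold at the moment it is called, and here it runs *before* `pipeline.run(...)` has processed any documents, so both files are snapshots of empty stores. A toy stand-in (a hypothetical `ToyPipeline`, not the real API) shows that snapshot behaviour:

```python
import json

class ToyPipeline:
    """Hypothetical stand-in: persist() snapshots the docstore as it is NOW."""

    def __init__(self):
        self.docstore = {}  # only filled once run() has processed documents

    def run(self, documents):
        for i, doc in enumerate(documents):
            self.docstore[str(i)] = doc

    def persist(self, path):
        # Writes the current state to disk; nothing is re-persisted later,
        # so persisting before run() yields an empty object: {}
        with open(path, "w") as f:
            json.dump(self.docstore, f)
```

If this matches the real behaviour, moving `pipeline.persist('./pipeline_storage')` to after `pipeline.run(...)` should leave `docstore.json` and `llama_cache` populated instead of `{}`.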
```python
pipeline = IngestionPipeline(
    transformations=[
        SemanticSplitterNodeParser(
            buffer_size=1,
            breakpoint_percentile_threshold=95,
            embed_model=embed_model,
        ),
        embed_model,
    ],
    docstore=RedisDocumentStore.from_host_and_port(
        'localhost', 6379, namespace='document_store'
    ),
    vector_store=vector_store,
    cache=IngestionCache(
        cache=RedisCache.from_host_and_port('localhost', 6379),
        collection='redis_cache',
    ),
)
```
In `cache.py` (`llama_index.core.ingestion.cache`) there seems to be no reference to `RedisCache`. Am I missing something?
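If this is a recent llama-index version, `RedisCache` may simply not live in `llama_index.core.ingestion.cache`: the pattern shown in the LlamaIndex ingestion docs (worth verifying against your installed version) imports it as an alias of `RedisKVStore` from the separate Redis kvstore integration package:

```python
# Assumed import path from the LlamaIndex docs; verify against your version.
# RedisCache is an alias for RedisKVStore from the integration package
# (installed via `pip install llama-index-storage-kvstore-redis`), not a
# class defined in llama_index.core.ingestion.cache.
from llama_index.core.ingestion import IngestionCache
from llama_index.storage.kvstore.redis import RedisKVStore as RedisCache

ingest_cache = IngestionCache(
    cache=RedisCache.from_host_and_port("localhost", 6379),
    collection="redis_cache",
)
```

So grepping `cache.py` for `RedisCache` finds nothing because the name only exists as an import alias at the call site, not as a class in the core module.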