Storing json data from file into qdrant database using ingestion pipeline fails with attribute error

Hi, how can I store JSON data that comes from a JSON file into a Qdrant database using an ingestion pipeline?

I tried storing the documents but I get this error:

Plain Text
[str(node.get_content(metadata_mode=MetadataMode.ALL)) for node in nodes]
     ^^^^^^^^^^^^^^^^
AttributeError: 'dict' object has no attribute 'get_content'
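A minimal sketch of what usually causes this: `json.load()` yields plain dicts, which have no `.get_content()`, so each record has to be serialized to text before it is wrapped in a node. The `id` and `payload` field names below are made up for illustration.

```python
import json

# json.load() yields plain dicts, which lack .get_content(); turn each
# record into a text string first, then wrap that string in a
# TextNode/Document before handing it to the ingestion pipeline.
def record_to_text(record):
    return json.dumps(record, sort_keys=True)

print(record_to_text({"id": 1, "payload": "temp=21"}))
# {"id": 1, "payload": "temp=21"}
```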
How can I create custom nodes for the JSON and store them in the Qdrant DB?

I'm also slightly confused between nodes and documents.

Are nodes and documents the same thing? From the documentation I understand that they are. Am I correct?
Basically the same, mostly semantics
Plain Text
from llama_index.core.schema import TextNode

node = TextNode(text="...", metadata={"key": "val", ...})
I have huge IoT sensor data, around 5M records. How can I use multiprocessing capabilities that already exist in LlamaIndex to generate my custom text nodes?
You can use regular Python multiprocessing or threading, nothing stopping you there.
Just use a hosted vector store like Qdrant, etc. so that you can handle concurrent inserts into an index properly.
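A rough sketch of the plain-multiprocessing route. The record shape, the `make_node_payload` helper, and the worker/chunksize numbers are all assumptions for illustration, not LlamaIndex APIs:

```python
import multiprocessing

def make_node_payload(record):
    # Stand-in for building a TextNode: keep the raw record as metadata.
    return {"text": "", "metadata": record}

def build_payloads(records, workers=6):
    # With ~5M records, chunksize matters so workers aren't handed
    # items one at a time.
    with multiprocessing.Pool(processes=workers) as pool:
        return pool.map(make_node_payload, records, chunksize=1000)
```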
Hey @Logan M
I created custom nodes and tried storing them into a Qdrant index, but I get the error below:

Plain Text
File "/home/tharakn/llama-index-qdrant/app/routers/parser.py", line 129, in json_parsing
    index = VectorStoreIndex.from_documents(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tharakn/llama-index-qdrant/llama_index_qdrant/lib/python3.12/site-packages/llama_index/core/indices/base.py", line 110, in from_documents
    docstore.set_document_hash(doc.get_doc_id(), doc.hash)
                               ^^^^^^^^^^^^^^
AttributeError: 'list' object has no attribute 'get_doc_id'


This is the code for creating the custom nodes:
Plain Text
async def _generate_custom_nodes(json_data_list):
  # Note: pool.map blocks the event loop until the whole batch is done.
  with multiprocessing.Pool(processes=6) as pool:
    results = pool.map(_process_node, json_data_list)
  return results


def _process_node(json_data):
  custom_node = TextNode(
    text="",
    metadata=_get_node_dict(json_data),
  )
  return custom_node


Code for running the ingestion pipeline and storing the generated custom documents into Qdrant:
Plain Text
  documents = await _generate_custom_nodes(json_data)
  pipeline_tasks = []
  batch_size = 5
  for i in range(0, len(documents), batch_size): 
    batch_documents = documents[i:i+batch_size]
    task = generate_pipeline_tasks(batch_documents=batch_documents, llm=llm, qdrant_client=qdrant_client)
    pipeline_tasks.append(task)

  results = await asyncio.gather(*pipeline_tasks)

  qdrant_vector_store = QdrantVectorStore(  # class name is QdrantVectorStore
    client=qdrant_client,
    collection_name="llama_index_searchx"
  )

  storage_index = StorageContext.from_defaults(vector_store=qdrant_vector_store)

  index = VectorStoreIndex.from_documents(
    documents=results,
    storage_context=storage_index
  )


Can you help me with this issue? I tried using both the TextNode and Document schemas for custom document generation before passing them to the ingestion pipeline, but I still face the same error.
You are passing in a list of lists
You need to flatten it.
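Concretely: `asyncio.gather` returns one list per batch task, so `results` is a list of lists, while `from_documents` expects one flat list. A minimal flatten, with strings standing in for nodes:

```python
from itertools import chain

def flatten(batches):
    # results from asyncio.gather looks like [[node, node], [node], ...];
    # VectorStoreIndex.from_documents wants a single flat list.
    return list(chain.from_iterable(batches))

print(flatten([["n1", "n2"], ["n3"]]))
# ['n1', 'n2', 'n3']
```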