
Storing JSON data from a file into a Qdrant database using an ingestion pipeline fails with an AttributeError

At a glance

The community member is trying to store JSON data from a file into a Qdrant database using an ingestion pipeline, but encounters an AttributeError on get_content because the data is passed as raw dicts rather than node objects. Other community members suggest creating custom nodes with the TextNode schema from the llama_index library and using multiprocessing to handle the large dataset. The community member then hits a second error when storing the custom documents in the Qdrant index: an AttributeError on get_doc_id. The community members point out that a list of lists is being passed in and that it needs to be flattened.

Hi, how can I store JSON data from a JSON file into a Qdrant database using an ingestion pipeline?

I tried storing the documents, but I get this error:
Plain Text
[str(node.get_content(metadata_mode=MetadataMode.ALL)) for node in nodes]
^^^^^^^^^^^^^^^^
AttributeError: 'dict' object has no attribute 'get_content'
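The error means the pipeline received raw dicts where it expects Document or Node objects, which expose get_content. Below is a minimal sketch of wrapping each JSON record in a Document before ingesting into Qdrant; the file name, collection name, embedding model, and JSON layout are assumptions, not taken from this thread:
Plain Text
import json

from qdrant_client import QdrantClient
from llama_index.core import Document
from llama_index.core.ingestion import IngestionPipeline
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.vector_stores.qdrant import QdrantVectorStore

# Load the raw JSON records (assumed: a top-level list of dicts)
with open("data.json") as f:
    json_data = json.load(f)

# Wrap each dict in a Document so the pipeline can call get_content on it
documents = [
    Document(text=json.dumps(record), metadata={"source": "data.json"})
    for record in json_data
]

client = QdrantClient(url="http://localhost:6333")  # assumed local instance
vector_store = QdrantVectorStore(client=client, collection_name="json_docs")

pipeline = IngestionPipeline(
    transformations=[OpenAIEmbedding()],  # embedding choice is an assumption
    vector_store=vector_store,
)
pipeline.run(documents=documents)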
How can I create custom nodes for the JSON and store them in the Qdrant DB?

I'm also slightly confused between nodes and documents.

Are nodes and documents the same?
From the documentation, I understand that they are the same. Am I correct?
Basically the same, mostly semantics
Plain Text
from llama_index.core.schema import TextNode

node = TextNode(text="...", metadata={"key": "val", ...})
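For reference, a Document is constructed the same way; in llama_index, Document is essentially a TextNode subclass that indexes and pipelines accept as an ingestion input. A minimal example:
Plain Text
from llama_index.core import Document

doc = Document(text="...", metadata={"key": "val"})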
I have a huge IoT sensor dataset of about 5M records. How can I use the multiprocessing capabilities that already exist in LlamaIndex to generate my custom text nodes?
You can use regular Python multiprocessing or threading; nothing is stopping you there.
Just use a hosted vector store like Qdrant so that you can handle concurrent inserts into the index properly.
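A minimal sketch of node construction with the standard library; build_node, the record fields, and load_sensor_records are hypothetical stand-ins for your own code:
Plain Text
import multiprocessing

from llama_index.core.schema import TextNode


def build_node(record: dict) -> TextNode:
    # Hypothetical mapping: keep the raw reading as text, the rest as metadata
    return TextNode(text=str(record.get("reading", "")), metadata=record)


if __name__ == "__main__":
    records = load_sensor_records()  # hypothetical loader for the 5M records
    with multiprocessing.Pool(processes=8) as pool:
        # chunksize batches records per worker, reducing IPC overhead
        nodes = pool.map(build_node, records, chunksize=1000)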
Hey @Logan M
I created custom nodes and tried storing them in the Qdrant index, but I get the error below:

Plain Text
File "/home/tharakn/llama-index-qdrant/app/routers/parser.py", line 129, in json_parsing
    index = VectorStoreIndex.from_documents(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tharakn/llama-index-qdrant/llama_index_qdrant/lib/python3.12/site-packages/llama_index/core/indices/base.py", line 110, in from_documents
    docstore.set_document_hash(doc.get_doc_id(), doc.hash)
                               ^^^^^^^^^^^^^^
AttributeError: 'list' object has no attribute 'get_doc_id'


This is the code for creating the custom nodes:
Plain Text
import multiprocessing

from llama_index.core.schema import TextNode


async def _generate_custom_nodes(json_data_list):
  # pool.map blocks until every record in json_data_list is processed
  with multiprocessing.Pool(processes=6) as pool:
    results = pool.map(_process_node, json_data_list)
  return results


def _process_node(json_data):
  # _get_node_dict (defined elsewhere) builds the metadata dict for one record
  custom_node = TextNode(
    text="",
    metadata=_get_node_dict(json_data)
  )
  return custom_node


Code for running the ingestion pipeline and storing the generated custom documents in Qdrant:
Plain Text
  documents = await _generate_custom_nodes(json_data)
  pipeline_tasks = []
  batch_size = 5
  for i in range(0, len(documents), batch_size):
    batch_documents = documents[i:i + batch_size]
    task = generate_pipeline_tasks(batch_documents=batch_documents, llm=llm, qdrant_client=qdrant_client)
    pipeline_tasks.append(task)

  # each pipeline task returns a list, so `results` is a list of lists
  results = await asyncio.gather(*pipeline_tasks)

  # QdrantVectorStore comes from llama_index.vector_stores.qdrant
  qdrant_vector_store = QdrantVectorStore(
    client=qdrant_client,
    collection_name="llama_index_searchx"
  )

  storage_index = StorageContext.from_defaults(vector_store=qdrant_vector_store)

  index = VectorStoreIndex.from_documents(
    documents=results,
    storage_context=storage_index
  )


Can you help me with this issue? I tried using both the TextNode and Document schemas for the custom document generation before passing them to the ingestion pipeline, but I'm still facing the same issue.
You are passing in a list of lists.
You need to flatten it.
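Concretely, asyncio.gather returns one list per task, so `results` above is a list of lists. A minimal sketch of the fix, reusing the variable names from the code above:
Plain Text
# flatten the per-batch lists returned by asyncio.gather
flat_documents = [doc for batch in results for doc in batch]

index = VectorStoreIndex.from_documents(
  documents=flat_documents,
  storage_context=storage_index
)
If the batches contain TextNode objects rather than Document objects, pass them to the constructor instead, e.g. VectorStoreIndex(nodes=flat_documents, storage_context=storage_index), since from_documents expects Document objects that implement get_doc_id.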