
Storing JSON data from a file into a Qdrant database using an ingestion pipeline fails with an AttributeError

At a glance

The community member is trying to store JSON data from a file into a Qdrant database using an ingestion pipeline, but encounters an AttributeError on get_content because the data is passed as raw dicts rather than node objects. Other community members suggest creating custom nodes with the TextNode schema from the llama_index library and using multiprocessing to handle the large dataset. The community member then hits a second error when storing the custom documents in the Qdrant index: an AttributeError on get_doc_id. The community members point out that a list of lists is being passed in and that it needs to be flattened.

Hi, how can I store JSON data from a JSON file into a Qdrant database using an ingestion pipeline?

I tried storing the documents, but I get this error:
Plain Text
[str(node.get_content(metadata_mode=MetadataMode.ALL)) for node in nodes]
^^^^^^^^^^^^^^^^
AttributeError: 'dict' object has no attribute 'get_content'
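The error means the pipeline received raw dicts where it expects Document or Node objects, which expose get_content. Below is a minimal sketch of wrapping each JSON record in a Document before ingesting into Qdrant; the file name, collection name, embedding model, and JSON layout are assumptions, not taken from this thread:
Plain Text
import json

from qdrant_client import QdrantClient
from llama_index.core import Document
from llama_index.core.ingestion import IngestionPipeline
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.vector_stores.qdrant import QdrantVectorStore

# Load the raw JSON records (assumed: a top-level list of dicts)
with open("data.json") as f:
    json_data = json.load(f)

# Wrap each dict in a Document so the pipeline can call get_content on it
documents = [
    Document(text=json.dumps(record), metadata={"source": "data.json"})
    for record in json_data
]

client = QdrantClient(url="http://localhost:6333")  # assumed local instance
vector_store = QdrantVectorStore(client=client, collection_name="json_docs")

pipeline = IngestionPipeline(
    transformations=[OpenAIEmbedding()],  # embedding choice is an assumption
    vector_store=vector_store,
)
pipeline.run(documents=documents)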
How can I create custom nodes for the JSON and store them in the Qdrant DB?

I'm also slightly confused between nodes and documents.

Are nodes and documents the same?
From the documentation, I understand that they are the same. Am I correct?
Basically the same, mostly semantics
Plain Text
from llama_index.core.schema import TextNode

node = TextNode(text="...", metadata={"key": "val", ...})
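For reference, a Document is constructed the same way; in llama_index, Document is essentially a TextNode subclass that indexes and pipelines accept as an ingestion input. A minimal example:
Plain Text
from llama_index.core import Document

doc = Document(text="...", metadata={"key": "val"})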
I have a huge IoT sensor dataset of about 5M records. How can I use the multiprocessing capabilities that already exist in LlamaIndex to generate my custom text nodes?
You can use regular Python multiprocessing or threading; nothing is stopping you there.
Just use a hosted vector store like Qdrant so that you can handle concurrent inserts into the index properly.
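A minimal sketch of node construction with the standard library; build_node, the record fields, and load_sensor_records are hypothetical stand-ins for your own code:
Plain Text
import multiprocessing

from llama_index.core.schema import TextNode


def build_node(record: dict) -> TextNode:
    # Hypothetical mapping: keep the raw reading as text, the rest as metadata
    return TextNode(text=str(record.get("reading", "")), metadata=record)


if __name__ == "__main__":
    records = load_sensor_records()  # hypothetical loader for the 5M records
    with multiprocessing.Pool(processes=8) as pool:
        # chunksize batches records per worker, reducing IPC overhead
        nodes = pool.map(build_node, records, chunksize=1000)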
Hey @Logan M
I created custom nodes and tried storing them in the Qdrant index, but I get the error below:

Plain Text
File "/home/tharakn/llama-index-qdrant/app/routers/parser.py", line 129, in json_parsing
    index = VectorStoreIndex.from_documents(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tharakn/llama-index-qdrant/llama_index_qdrant/lib/python3.12/site-packages/llama_index/core/indices/base.py", line 110, in from_documents
    docstore.set_document_hash(doc.get_doc_id(), doc.hash)
                               ^^^^^^^^^^^^^^
AttributeError: 'list' object has no attribute 'get_doc_id'


This is the code for creating the custom nodes:
Plain Text
import multiprocessing

from llama_index.core.schema import TextNode


async def _generate_custom_nodes(json_data_list):
  # pool.map blocks until every record in json_data_list is processed
  with multiprocessing.Pool(processes=6) as pool:
    results = pool.map(_process_node, json_data_list)
  return results


def _process_node(json_data):
  # _get_node_dict (defined elsewhere) builds the metadata dict for one record
  custom_node = TextNode(
    text="",
    metadata=_get_node_dict(json_data)
  )
  return custom_node


Code for running the ingestion pipeline and storing the generated custom documents in Qdrant:
Plain Text
  documents = await _generate_custom_nodes(json_data)
  pipeline_tasks = []
  batch_size = 5
  for i in range(0, len(documents), batch_size):
    batch_documents = documents[i:i + batch_size]
    task = generate_pipeline_tasks(batch_documents=batch_documents, llm=llm, qdrant_client=qdrant_client)
    pipeline_tasks.append(task)

  # each pipeline task returns a list, so `results` is a list of lists
  results = await asyncio.gather(*pipeline_tasks)

  # QdrantVectorStore comes from llama_index.vector_stores.qdrant
  qdrant_vector_store = QdrantVectorStore(
    client=qdrant_client,
    collection_name="llama_index_searchx"
  )

  storage_index = StorageContext.from_defaults(vector_store=qdrant_vector_store)

  index = VectorStoreIndex.from_documents(
    documents=results,
    storage_context=storage_index
  )


Can you help me with this issue? I tried using both the TextNode and Document schemas for the custom document generation before passing them to the ingestion pipeline, but I'm still facing the same issue.
You are passing in a list of lists.
You need to flatten it.
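Concretely, asyncio.gather returns one list per task, so `results` above is a list of lists. A minimal sketch of the fix, reusing the variable names from the code above:
Plain Text
# flatten the per-batch lists returned by asyncio.gather
flat_documents = [doc for batch in results for doc in batch]

index = VectorStoreIndex.from_documents(
  documents=flat_documents,
  storage_context=storage_index
)
If the batches contain TextNode objects rather than Document objects, pass them to the constructor instead, e.g. VectorStoreIndex(nodes=flat_documents, storage_context=storage_index), since from_documents expects Document objects that implement get_doc_id.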