
NLTK

At a glance

The post describes an error encountered while indexing nodes: the NLTK resource "punkt_tab" was not found. Community members suggested installing the missing resource. After that was solved, the community member hit a new error: the semantic double merging splitter emitted warnings about Doc.similarity being evaluated on empty vectors, followed by a "list index out of range" failure. The community members suspected the embeddings were not working properly, and one noted that the many try/except blocks in the code may be hiding the actual traceback for the error.

2025-01-01 09:24:14.745 | ERROR | indexing.indexing_hugging:indexing:140 - Error injecting nodes:
**
Resource punkt_tab not found.
Please use the NLTK Downloader to obtain the resource:

>>> import nltk
>>> nltk.download('punkt_tab')

For more information see: https://www.nltk.org/data.html

Attempted to load tokenizers/punkt_tab/english/

Searched in:
  • '/root/nltk_data'
  • '/usr/nltk_data'
  • '/usr/share/nltk_data'
  • '/usr/lib/nltk_data'
  • '/usr/share/nltk_data'
  • '/usr/local/share/nltk_data'
  • '/usr/lib/nltk_data'
  • '/usr/local/lib/nltk_data'
  • '/usr/local/lib/python3.10/dist-packages/llama_index/core/_static/nltk_cache'
**


from llama_index.core.node_parser import (
    LanguageConfig,
    SemanticDoubleMergingSplitterNodeParser,
)

lang_config = LanguageConfig(language="english", spacy_model="en_core_web_md")

splitter = SemanticDoubleMergingSplitterNodeParser(
    language_config=lang_config,
    initial_threshold=0.4,
    appending_threshold=0.5,
    merging_threshold=0.5,
    max_chunk_size=5000,
)
nodes = splitter.get_nodes_from_documents(documents)
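
Note: the spacy_model passed to LanguageConfig has to be installed in the environment beforehand, or the splitter has nothing to load. A minimal sketch of one way to fetch it, assuming the standard spaCy download helper:

import spacy.cli

# one-time setup: fetch the model referenced by LanguageConfig above
spacy.cli.download("en_core_web_md")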

Error while using SemanticDoubleMerging
@WhiteFang_Jr @Logan M
You need to install the nltk resource as mentioned in the error
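A minimal sketch of that fix, lifted straight from the error message:

import nltk

# one-time download of the tokenizer data NLTK could not find
nltk.download("punkt_tab")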
solved it, but getting this error now:
20.73s/it]
/usr/local/lib/python3.10/dist-packages/llama_index/core/node_parser/text/semantic_double_merging_splitter.py:228: UserWarning: [W008] Evaluating Doc.similarity based on empty vectors.
  ).similarity(
/usr/local/lib/python3.10/dist-packages/llama_index/core/node_parser/text/semantic_double_merging_splitter.py:297: UserWarning: [W008] Evaluating Doc.similarity based on empty vectors.
  current_nlp.similarity(
/usr/local/lib/python3.10/dist-packages/llama_index/core/node_parser/text/semantic_double_merging_splitter.py:310: UserWarning: [W008] Evaluating Doc.similarity based on empty vectors.
  and current_nlp.similarity(
/usr/local/lib/python3.10/dist-packages/llama_index/core/node_parser/text/semantic_double_merging_splitter.py:328: UserWarning: [W008] Evaluating Doc.similarity based on empty vectors.
  and current_nlp.similarity(
/usr/local/lib/python3.10/dist-packages/llama_index/core/node_parser/text/semantic_double_merging_splitter.py:255: UserWarning: [W008] Evaluating Doc.similarity based on empty vectors.
  ).similarity(
2025-01-01 09:35:42.895 | ERROR | indexing.indexing_hugging:indexing:143 - Error injecting nodes: list index out of range
2025-01-01 09:35:42.896 | DEBUG | indexing.utils:move_files:46 - Moved PF Testing Locations_Condos.pdf to /home/raaadmin/data/321/28ab04a7-09bd-4d58-af5e-4b8878e0209a/error-files/PF Testing Locations/attachments/PF Testing Locations_Condos.pdf
Seems like your embeddings aren't working properly?
any solution to mitigate this?
i am using openai embeddings, this is the file that's giving the error
@Logan M @WhiteFang_Jr
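For context, the [W008] warnings above come from spaCy rather than from the OpenAI embedding model: this splitter compares chunks with spaCy's Doc.similarity, which relies on the spaCy model's word vectors. A minimal sketch of checking that the configured model actually carries vectors, assuming en_core_web_md as in the snippet above:

import spacy

nlp = spacy.load("en_core_web_md")

# md/lg models ship with word vectors; sm models do not, and a
# vector-less model (or an empty/whitespace-only chunk) triggers [W008]
print(nlp.vocab.vectors.shape)                    # expect a non-zero row count
print(nlp("a short test sentence").vector_norm)   # expect a value > 0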
Hard to say without an actual traceback. Your code has so many try/excepts that it's hiding the traceback for the error
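A minimal sketch of an except block that keeps the traceback visible instead of swallowing it, assuming loguru (which the |-separated log lines above suggest):

from loguru import logger

try:
    nodes = splitter.get_nodes_from_documents(documents)
except Exception:
    # logger.exception records the full traceback, not just str(e)
    logger.exception("Error injecting nodes")
    raise  # re-raise so the real failure point is not hidden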