2025-01-01 09:24:14.745 | ERROR | indexing.indexing_hugging:indexing:140 - Error injecting nodes:
** Resource punkt_tab not found.
Please use the NLTK Downloader to obtain the resource:
>>> import nltk
>>> nltk.download('punkt_tab')
For more information see: https://www.nltk.org/data.html
Attempted to load tokenizers/punkt_tab/english/
Searched in:
- '/root/nltk_data'
- '/usr/nltk_data'
- '/usr/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'
- '/usr/local/lib/python3.10/dist-packages/llama_index/core/_static/nltk_cache'
**
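from llama_index.core.node_parser import (
    LanguageConfig,
    SemanticDoubleMergingSplitterNodeParser,
)

# `documents` is assumed to be the list of Document objects loaded earlier
lang_config = LanguageConfig(language="english", spacy_model="en_core_web_md")
splitter = SemanticDoubleMergingSplitterNodeParser(
    language_config=lang_config,
    initial_threshold=0.4,
    appending_threshold=0.5,
    merging_threshold=0.5,
    max_chunk_size=5000,
)
nodes = splitter.get_nodes_from_documents(documents)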
I get the error above while using SemanticDoubleMergingSplitterNodeParser.
@WhiteFang_Jr @Logan M
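The traceback itself suggests downloading `punkt_tab`, so a possible fix is to make sure the resource exists before building the splitter. Below is a minimal sketch, not a confirmed fix for this setup. `ensure_nltk_resource` and its `finder`/`downloader` parameters are hypothetical names introduced here (the two hooks exist only so the helper can be exercised without NLTK or network access); `nltk.data.find` and `nltk.download` are the real NLTK calls it falls back to. If `download_dir` is passed, it is assumed to be one of the directories listed under "Searched in", otherwise NLTK will still not find the data.

```python
from typing import Callable, Optional


def ensure_nltk_resource(
    package: str = "punkt_tab",
    category: str = "tokenizers",
    download_dir: Optional[str] = None,
    finder: Optional[Callable] = None,      # hypothetical hook, defaults to nltk.data.find
    downloader: Optional[Callable] = None,  # hypothetical hook, defaults to nltk.download
) -> bool:
    """Return True if the resource was already present, False if a download was triggered."""
    if finder is None or downloader is None:
        import nltk  # imported lazily so the hooks can replace it entirely

        finder = finder or nltk.data.find
        downloader = downloader or nltk.download

    try:
        # nltk.data.find raises LookupError when the resource is missing
        finder(f"{category}/{package}")
        return True
    except LookupError:
        # Only pass download_dir when the caller set one; otherwise NLTK picks its default
        kwargs = {"download_dir": download_dir} if download_dir else {}
        downloader(package, **kwargs)
        return False
```

Calling `ensure_nltk_resource()` once, before `splitter.get_nodes_from_documents(documents)`, should be enough if the machine has network access. In a container, it may be simpler to run the download at image build time instead.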