
Updated 2 months ago

Hey, I want to index a Python codebase

Hey, I want to index a Python codebase, and the default text splitter is SentenceSplitter. How do I change this to do code-aware chunking?
You can use CodeSplitter in place of the default splitter, like this:

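# CodeSplitter needs the tree_sitter and tree_sitter_languages packages installed.
from llama_index.node_parser import SimpleNodeParser
from llama_index.text_splitter import CodeSplitter

# CodeSplitter chunks along syntax-tree boundaries, so it takes a language
# and line/character limits rather than a separator or tokenizer.
text_splitter = CodeSplitter(
  language="python",
  chunk_lines=40,
  chunk_lines_overlap=15,
  max_chars=1500,  # upper bound on characters per chunk
)

node_parser = SimpleNodeParser.from_defaults(text_splitter=text_splitter)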

Set the parameters as per your requirements. For more details, you can refer to the source here:
https://github.com/jerryjliu/llama_index/blob/main/llama_index/text_splitter/code_splitter.py
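
If it helps to see what "code-aware" chunking means in practice, here is a minimal sketch of the idea using only Python's standard-library ast module. To be clear, this is not how CodeSplitter is implemented (it relies on tree-sitter and handles many languages); chunk_python_source and its max_chars parameter are hypothetical names made up for this illustration. The sketch cuts only between top-level statements, so a function or class is never split in the middle.

```python
import ast


def chunk_python_source(source: str, max_chars: int = 1500) -> list[str]:
    """Split Python source into chunks along top-level statement boundaries.

    Hypothetical helper for illustration only; not part of llama_index.
    """
    lines = source.splitlines(keepends=True)
    tree = ast.parse(source)

    def first_line(node: ast.stmt) -> int:
        # A decorated def/class starts at its first decorator, not at `def`.
        decorators = getattr(node, "decorator_list", [])
        return min([node.lineno] + [d.lineno for d in decorators])

    # 0-based line index where each top-level piece begins. Starting at 0
    # keeps any leading comments or blank lines with the first piece.
    starts = sorted({0} | {first_line(node) - 1 for node in tree.body})
    ends = starts[1:] + [len(lines)]
    pieces = ["".join(lines[s:e]) for s, e in zip(starts, ends)]

    # Greedily pack whole pieces into chunks of at most max_chars characters.
    # A single piece longer than max_chars is kept whole rather than cut.
    chunks: list[str] = []
    current = ""
    for piece in pieces:
        if current and len(current) + len(piece) > max_chars:
            chunks.append(current)
            current = ""
        current += piece
    if current:
        chunks.append(current)
    return chunks
```

Calling chunk_python_source(open("some_module.py").read()) (the file name is a placeholder) returns a list of strings that join back into the original file. Unlike this sketch, a tree-sitter based splitter can also descend into an oversized class or function and split it at nested boundaries.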