So specifically, for LangChain LLMs, llama-index defaults to context_window=3900 (which leaves some wiggle room) and num_output=256.
If either of those values is wrong for your model, adjust them in the service context so they match the LLM:
from llama_index import ServiceContext, set_global_service_context
from llama_index.llms import LangChainLLM

# lc_llm is your existing LangChain LLM instance
llm = LangChainLLM(lc_llm)
service_context = ServiceContext.from_defaults(llm=llm, context_window=3900, num_output=1000)
set_global_service_context(service_context)
For other officially supported LLMs, these numbers can be pulled from the LLM class directly; LangChainLLM is a special case.
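For example (a minimal sketch using the OpenAI LLM class from llama-index; the exact numbers depend on the model you pick), you can read those values off the LLM's metadata:

from llama_index.llms import OpenAI

openai_llm = OpenAI(model="gpt-3.5-turbo")
print(openai_llm.metadata.context_window)  # model's context window, e.g. 4096
print(openai_llm.metadata.num_output)      # default max output tokens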
Then, using these values, llama-index chunks things appropriately. The one snag is the system prompt, which is not accounted for properly when constructing LLM inputs (long story). So if your system prompt is causing issues, try shortening the context window in the service context.
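A rough sketch of that workaround (the 500-token reserve is just an illustrative guess; size it to your actual system prompt):

# Reserve ~500 tokens of headroom for the system prompt
service_context = ServiceContext.from_defaults(llm=llm, context_window=3900 - 500, num_output=1000)
set_global_service_context(service_context)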