I think this is working as expected, at least in my testing.
Tested with gpt-3.5-turbo, gpt-3.5-turbo-16k, and gpt-4-32k:
>>> from llama_index import ServiceContext, SummaryIndex, SimpleDirectoryReader
>>> from llama_index.llms import OpenAI
>>> ctx = ServiceContext.from_defaults(llm=OpenAI(model="gpt-3.5-turbo-16k"))
>>> documents = SimpleDirectoryReader("./docs/examples/data/paul_graham").load_data()
>>> index = SummaryIndex.from_documents(documents, service_context=ctx)
>>> res = index.as_query_engine(response_mode="tree_summarize").query("What did the author do growing up?")
2 text chunks after repacking
1 text chunks after repacking
>>> ctx = ServiceContext.from_defaults(llm=OpenAI(model="gpt-3.5-turbo"))
>>> index = SummaryIndex.from_documents(documents, service_context=ctx)
>>> res = index.as_query_engine(response_mode="tree_summarize").query("What did the author do growing up?")
6 text chunks after repacking
1 text chunks after repacking
>>> ctx = ServiceContext.from_defaults(llm=OpenAI(model="gpt-4-32k"))
>>> index = SummaryIndex.from_documents(documents, service_context=ctx)
>>> res = index.as_query_engine(response_mode="tree_summarize").query("What did the author do growing up?")
1 text chunks after repacking
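For anyone curious about why the chunk counts shrink as the context window grows: repacking merges consecutive text chunks until they approach the model's context limit, so a bigger window means fewer packed chunks. A minimal sketch of that idea (not the actual llama_index implementation; the greedy merge and the whitespace token count are assumptions for illustration):

```python
def repack(chunks, max_chunk_tokens):
    """Greedily merge consecutive chunks up to max_chunk_tokens tokens."""
    packed, current, current_len = [], [], 0
    for chunk in chunks:
        n = len(chunk.split())  # crude whitespace "token" count (assumption)
        if current and current_len + n > max_chunk_tokens:
            # Current bucket is full: emit it and start a new one.
            packed.append(" ".join(current))
            current, current_len = [], 0
        current.append(chunk)
        current_len += n
    if current:
        packed.append(" ".join(current))
    return packed

# Six 100-token chunks: a small window leaves several packed chunks,
# a large window fits everything into one.
chunks = ["word " * 100 for _ in range(6)]
print(len(repack(chunks, 250)))  # -> 3
print(len(repack(chunks, 700)))  # -> 1
```

This mirrors the pattern in the transcript: the 16k and 32k models pack the same documents into fewer chunks than the 4k gpt-3.5-turbo, so tree_summarize needs fewer summarization rounds.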