
Updated 2 months ago

Doc Summary Index

Hi! Is it possible to build a Document Summary Index in French?
This is what I do:
Plain Text
response_synthesizer = get_response_synthesizer(response_mode="tree_summarize", use_async=True)
doc_summary_index = DocumentSummaryIndex.from_documents(
    [data_document],
    service_context=service_context,
    response_synthesizer=response_synthesizer,
    show_progress=True,
)
doc_summary_index.storage_context.persist("index_summary")

But the result of doc_summary_index.get_document_summary(DOC_ID) is in English, not in French.
Note: [data_document] contains text in French.
9 comments
I checked the code. You need to modify summary_query in DocumentSummaryIndex, as it is used as the instruction when creating the summary.

https://github.com/run-llama/llama_index/blob/09a55d53936fe104f7b4b5390bce4344f17a8b88/llama_index/indices/document_summary/base.py#L77
Ah ! OK.
By default summary_query = DEFAULT_SUMMARY_QUERY with
DEFAULT_SUMMARY_QUERY = ( "Describe what the provided text is about. " "Also describe some of the questions that this text can answer. " )
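One way to keep the default instruction and still steer the output language is to append an explicit language sentence to it. The snippet below is only a sketch: `summary_query_for` is a hypothetical helper (not part of llama_index), and the wording of the appended sentence is an assumption that may need tuning for a given model.

```python
# Default instruction as quoted in this thread.
DEFAULT_SUMMARY_QUERY = (
    "Describe what the provided text is about. "
    "Also describe some of the questions that this text can answer. "
)


def summary_query_for(language: str, base: str = DEFAULT_SUMMARY_QUERY) -> str:
    """Return `base` with an explicit output-language instruction appended.

    Hypothetical helper for illustration; it is not part of llama_index.
    """
    return f"{base.strip()} Write the entire answer in {language}."


french_summary_query = summary_query_for("French")
print(french_summary_query)
```

The resulting string is what would be passed as summary_query when building the index.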
Plain Text
docSummIndex = DocumentSummaryIndex(summary_query= "YOUR_MODIFIED_INSTRUCTION", .... rest of things)

index = docSummIndex.from_documents(
    [data_document],
    service_context=service_context,
    response_synthesizer=response_synthesizer,
    show_progress=True,
)
OK. Thanks.
All that remains is to find the right prompt/good instruction to summarize in French. Do you have any ideas?
You could try prompts.chat; they have a good collection of prompts.
I think this can help you create a better prompt for your use case.
Plain Text
french_summary_query = (
  "Summarize the content in 5 lines. "
  "The response should be in the language french. "
)
docSummIndex = DocumentSummaryIndex(
    service_context=service_context,
    response_synthesizer=response_synthesizer,
    summary_query=french_summary_query, 
    show_progress=True
)
french_doc_summary_index = docSummIndex.from_documents([data_document])

I got this error
Plain Text
---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

<ipython-input-14-1c2d4fe7b925> in <cell line: 5>()
      3   "The response should be in the language french. "
      4 )
----> 5 docSummIndex = DocumentSummaryIndex(
      6     service_context=service_context,
      7     response_synthesizer=response_synthesizer,

1 frames

/usr/local/lib/python3.10/dist-packages/llama_index/indices/base.py in __init__(self, nodes, index_struct, storage_context, service_context, show_progress, **kwargs)
     45         """Initialize with parameters."""
     46         if index_struct is None and nodes is None:
---> 47             raise ValueError("One of nodes or index_struct must be provided.")
     48         if index_struct is not None and nodes is not None:
     49             raise ValueError("Only one of nodes or index_struct can be provided.")

ValueError: One of nodes or index_struct must be provided.

What am I doing wrong?
I found the following solution:
Plain Text
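# Imports assume the legacy llama_index package layout (the one that still has ServiceContext).
from llama_index import ServiceContext, get_response_synthesizer
from llama_index.indices.document_summary import DocumentSummaryIndex
from llama_index.llms import OpenAI

# Instruction used to generate each document summary; it asks for French output.
french_summary_query = (
    "Summarize the content in 5 lines. "
    "The response should be in the language french. "
)
chatgpt = OpenAI(temperature=0, model="gpt-3.5-turbo")
response_synthesizer = get_response_synthesizer(response_mode="tree_summarize", use_async=True)
service_context = ServiceContext.from_defaults(llm=chatgpt, chunk_size=1024)
node_parser = service_context.node_parser

# The constructor needs nodes (or an index_struct), so parse the document into nodes first.
# data_document is the French document loaded earlier in the thread.
french_doc_summary_index = DocumentSummaryIndex(
    service_context=service_context,
    nodes=node_parser.get_nodes_from_documents([data_document]),
    response_synthesizer=response_synthesizer,
    summary_query=french_summary_query,
    show_progress=True,
)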
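For anyone wondering why the earlier attempt failed: the traceback shows the base constructor refusing to run without nodes or an index_struct. The sketch below reproduces that guard with a simplified stand-in class (`MiniIndex` is hypothetical, not llama_index code). It also assumes that the real from_documents classmethod parses the documents into nodes and forwards extra keyword arguments to the constructor; that is an assumption worth checking against the base.py linked above.

```python
class MiniIndex:
    """Simplified stand-in for the index base class; not llama_index code."""

    def __init__(self, nodes=None, index_struct=None, summary_query="default", **kwargs):
        # Same guard as the one visible in the traceback.
        if index_struct is None and nodes is None:
            raise ValueError("One of nodes or index_struct must be provided.")
        if index_struct is not None and nodes is not None:
            raise ValueError("Only one of nodes or index_struct can be provided.")
        self.nodes = nodes
        self.summary_query = summary_query

    @classmethod
    def from_documents(cls, documents, **kwargs):
        # Assumption: the real classmethod turns documents into nodes and
        # forwards extra keyword arguments to the constructor.
        nodes = [doc.strip() for doc in documents]
        return cls(nodes=nodes, **kwargs)


# Calling the constructor without nodes reproduces the error from the thread.
try:
    MiniIndex(summary_query="Résume le texte en français.")
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Going through the classmethod supplies the nodes and keeps the custom query.
index = MiniIndex.from_documents(
    [" Bonjour le monde. "], summary_query="Résume le texte en français."
)
```

If that assumption holds for the installed version, passing summary_query=french_summary_query directly to DocumentSummaryIndex.from_documents([data_document], ...) might work as a one-call alternative to building the nodes by hand.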