LuroDev
Any chance anyone has used the KeywordTableIndex successfully?

I am currently using it but getting an empty index returned every time.
Using the callback manager, I can see that the payload to GPT-3.5 successfully gets there and comes back with X keywords, but for some reason the index is always empty at the end...

Plain Text
# Templating step is successful (2 events per step)
CBEvent(event_type=<CBEventType.TEMPLATING: 'templating'>, payload={<EventPayload.TEMPLATE: 'template'>: "Some text is provided below. Given the text, extract up to {m......

CBEvent(event_type=<CBEventType.TEMPLATING: 'templating'>, payload=None, time='01/09/2024, 11:25:30.....

# LLM step is successful (Also 2 events per step)
CBEvent(event_type=<CBEventType.LLM: 'llm'>, payload={<EventPayload.MESSAGES: 'messages'>: [ChatMessage(role=<MessageRole.USER: 'user'>, content="Some text is provided below. Given the text, extract up to 3 keywords from

CBEvent(event_type=<CBEventType.LLM: 'llm'>, payload={<EventPayload.MESSAGES: 'messages'>: [ChatMessage(role=<MessageRole.USER: 'user'>, content="Some text is
...
EventPayload.RESPONSE: 'response'>: ChatResponse(message=ChatMessage(role=<MessageRole.ASSISTANT: 'assistant'>, content='KEYWORDS: Ticket ID: 7690, Project....


Any ideas?
I'm using a ServiceContext with an AzureOpenAI deployment. The index is instantiated like this:
Plain Text
index = KeywordTableIndex(nodes, max_keywords_per_chunk=3, use_async=False, service_context=service_context, show_progress=True)


index.summary returns 'None'
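
For debugging, here is a rough sketch of what I can inspect next (assuming pre-0.10 llama_index internals; index_struct.table and the parser util are private and may differ across versions):
Plain Text
# Peek at the keyword table: it maps each extracted keyword to a set
# of node ids, so an empty dict means extraction parsed zero keywords.
print(index.index_struct.table)

# Re-run the response parser on the raw LLM output from the callback
# events to check whether the parsing step is where keywords get lost.
from llama_index.indices.keyword_table.utils import (
    extract_keywords_given_response,
)

# Stand-in for the truncated response seen in the logs above
raw = "KEYWORDS: Ticket ID: 7690, Project"
print(extract_keywords_given_response(raw, start_token="KEYWORDS:"))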
5 comments
LuroDev

Hey everyone,
Keyword extractor question:

I am setting up a keyword extractor pipeline like so:
Plain Text
from llama_index.extractors import (
    KeywordExtractor,
)

extractors = [
    KeywordExtractor(keywords=5, llm=llm),
]

from llama_index.ingestion import IngestionPipeline

pipeline = IngestionPipeline(transformations=extractors)

nodes = pipeline.run(documents=d[0:5])
nodes[1].metadata

Which ends up printing:
Plain Text
{'excerpt_keywords': "I'm sorry, but your request doesn't match the context provided. Could you please provide more information or clarify your question?"}

or
Plain Text
{'excerpt_keywords': "I'm sorry, but I don't have enough information to answer your question."}


I have tried changing the prompt in the KeywordExtractor definition, but it works for some documents and not for others.
I know for sure that the documents giving the answers above have content, and they have no meaningful difference from the ones that return actual keywords.

I'm not familiar enough with the code to know where the prompt might be going wrong here.
Any ideas / suggestions?

Will be glad to submit a PR if I can get some direction on where this might be fixed. I also noticed a TODO for the KeywordExtractor prompt that I'll look into as well.
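
In the meantime, a workaround sketch I'm considering: subclass BaseExtractor and own the prompt end to end (the class below is hypothetical, not library API, and it assumes an llm with an async acomplete method in scope):
Plain Text
from typing import Dict, List, Sequence

from llama_index.extractors import BaseExtractor
from llama_index.schema import BaseNode

class ExplicitKeywordExtractor(BaseExtractor):
    """Hypothetical extractor with a fully explicit keyword prompt."""

    async def aextract(self, nodes: Sequence[BaseNode]) -> List[Dict]:
        metadata_list = []
        for node in nodes:
            # Spell out that the text is a chunk to label, not a question
            # to answer, so the model stops replying "I'm sorry, but...".
            prompt = (
                "Here is a chunk of a document:\n"
                f"{node.get_content()}\n"
                "Give 5 unique keywords for this chunk as a "
                "comma-separated list. Answer with the keywords only."
            )
            response = await llm.acomplete(prompt)  # llm assumed in scope
            metadata_list.append({"excerpt_keywords": str(response).strip()})
        return metadata_list

ExplicitKeywordExtractor() could then go into the transformations list in place of KeywordExtractor.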
14 comments
LuroDev
Hey everyone,
Hoping to find some help with the following:

I'm using the Azure OpenAI API, so maintaining a service context with these variables has proved tricky.
Plain Text
import os

from llama_index import ServiceContext
from llama_index.embeddings import HuggingFaceEmbedding
from llama_index.llms import AzureOpenAI
from llama_index.text_splitter import TokenTextSplitter

def setup_service_context():
    api_key = os.environ['openai_api_key']
    azure_endpoint = "xxx"
    api_version = "xxx"

    # Declaring the LLM to use - GPT-4-32K by default
    llm = AzureOpenAI(
        engine=settings.CHAT_MODEL,
        api_key=api_key,
        azure_endpoint=azure_endpoint,
        api_version=api_version,
        temperature=0.2,
        system_prompt='''prompt here''',
    )

    # Using HuggingFace embeddings
    embed_model = HuggingFaceEmbedding(model_name="thenlper/gte-small")

    # Setting up the node parser for chunking documents
    node_parser = TokenTextSplitter.from_defaults(
        chunk_size=650,
        chunk_overlap=0,
        separator=" ",
        backup_separators=["\n", "\n\n"],
    )

    service_context = ServiceContext.from_defaults(
        llm=llm,
        embed_model=embed_model,
        node_parser=node_parser,
    )

    settings.SERVICE_CONTEXT = service_context


That shows how I set up my service context. I then assign it to a settings variable that I import in other files.
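
One way I could make this stick everywhere is to also pin it globally (a sketch using the pre-0.10 set_global_service_context helper):
Plain Text
from llama_index import set_global_service_context

# Components constructed without an explicit service_context
# (retrievers, synthesizers, lazily created embeddings) fall back
# to this global instead of a default OpenAI-backed context.
set_global_service_context(settings.SERVICE_CONTEXT)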

In a different py file, I do the following:
Plain Text
import streamlit as st

from llama_index import get_response_synthesizer
from llama_index.query_engine import RetrieverQueryEngine
from llama_index.retrievers import BM25Retriever, QueryFusionRetriever

def setup_retrievers():
    vector_retriever = settings.INDEX.as_retriever(
        similarity_top_k=st.session_state.sim_top_k,
        service_context=settings.SERVICE_CONTEXT,
    )

    bm25_retriever = BM25Retriever.from_defaults(
        docstore=settings.INDEX.docstore,
        similarity_top_k=st.session_state.sim_top_k,
    )

    retriever = QueryFusionRetriever(
        [vector_retriever, bm25_retriever],
        similarity_top_k=st.session_state.sim_top_k,
        num_queries=3,  # set this to 1 to disable query generation
        mode="reciprocal_rerank",  # can be changed to different rerank modes
        use_async=True,
        verbose=True,
        llm=settings.SERVICE_CONTEXT.llm,  # AzureOpenAI llm needed here for generated queries
        query_gen_prompt=settings.QUERY_GEN_PROMPT,  # override the query generation prompt here
    )

    settings.FUSION_RETRIEVER = retriever

def setup_query_engine():
    response_synthesizer = get_response_synthesizer(
        service_context=settings.SERVICE_CONTEXT,
        response_mode=settings.RESPONSE_MODE,
    )

    # TODO: Add a node postprocessor
    settings.QUERY_ENGINE = RetrieverQueryEngine.from_args(
        retriever=settings.FUSION_RETRIEVER,
        response_synthesizer=response_synthesizer,
        service_context=settings.SERVICE_CONTEXT,
    )


I get the following error specifically in the query engine:
Plain Text
openai.AuthenticationError: Error code: 401 - {'error': {'message': 'Incorrect API key provided: ********************. You can find your API key at https://platform.openai.com/account/api-keys.', 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_api_key'}}

To me that looks like it is going to the regular OpenAI API instead of Azure, but I have tried everything I can think of to anchor the Azure API into the query engine. (It looks like the lazy embedding is what triggers it.)
I have also checked that the service context passed to the query engine has the correct HF embedding model.
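
One thing I have not fully ruled out: if settings.INDEX was loaded from storage without the service context, the index itself may still hold the default OpenAI embed model. A sketch of reloading with the context attached (the persist_dir is hypothetical):
Plain Text
from llama_index import StorageContext, load_index_from_storage

# "./storage" is a hypothetical persist directory; reloading with the
# service context attached makes the index use the HuggingFace embed
# model and the Azure LLM instead of default OpenAI clients.
storage_context = StorageContext.from_defaults(persist_dir="./storage")
settings.INDEX = load_index_from_storage(
    storage_context, service_context=settings.SERVICE_CONTEXT
)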

Would GREATLY appreciate any advice / help. Been staring at this for hours.
1 comment