I do:
Plain Text
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)
gpt_pinecone_index = GPTVectorStoreIndex.from_documents(
    documents, pinecone_index=pinecone_index, service_context=service_context
)

but I still get:
Plain Text
response_stream = query_engine.query("...")
print(type(response_stream))
<class 'llama_index.response.schema.Response'>
Hmm, not sure what's up. Do you have the latest version installed? I just tried myself (minus pinecone) and it works fine. Using pinecone shouldn't matter though

Plain Text
>>> from llama_index import GPTVectorStoreIndex, ServiceContext, LLMPredictor
>>> from langchain.chat_models import ChatOpenAI
>>> llm_predictor = LLMPredictor(llm=ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0, streaming=True))
>>> service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)
>>> index = GPTVectorStoreIndex.from_documents(docs, service_context=service_context)
>>> query_engine = index.as_query_engine(streaming=True)
>>> response = query_engine.query("what did the author do growing up?")
>>> type(response)
<class 'llama_index.response.schema.StreamingResponse'>


Do you spot any major differences between my attempt and yours?
langchain==0.0.167
llama-index==0.6.6
No difference except the Pinecone part:


Plain Text
pinecone.init(api_key="...", environment="...")

index_name = '...'

def construct_pinecone_index(directory_path):
    if index_name not in pinecone.list_indexes():
        pinecone.create_index(
            index_name, dimension=1536, metric="euclidean", pod_type="Starter"
        )
    pinecone_index = pinecone.Index(index_name)

    documents = SimpleDirectoryReader(directory_path).load_data()
    gpt_pinecone_index = GPTVectorStoreIndex.from_documents(
        documents, pinecone_index=pinecone_index, service_context=service_context
    )

    absolute_path = os.path.dirname(__file__)
    src_folder = os.path.join(absolute_path, "docs/")
    dest_folder = os.path.join(absolute_path, "indexed_documents/")
    files = os.listdir(src_folder)
    for file in files:
        if file != "do_not_delete.txt":
            src_path = os.path.join(src_folder, file)
            dest_path = os.path.join(dest_folder, file)
            shutil.move(src_path, dest_path)

    return gpt_pinecone_index

index = construct_pinecone_index("docs")
As a sanity check, if you remove the pinecone stuff, does it work?

Maybe if you strip it back to a super simple test script like I did? πŸ€” then once it works without pinecone, add pinecone back in
Just trying to narrow down the issue
Yep, without the Pinecone part, just with a simple document, it now works
And then as soon as you add pinecone to that same test, it stops working? 🫠
Yeah I get:
...
[retrieve] Total embedding token usage: 11 tokens
INFO:llama_index.token_counter.token_counter:> [get_response] Total LLM token usage: 0 tokens
[get_response] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [get_response] Total embedding token usage: 0 tokens
[get_response] Total embedding token usage: 0 tokens

<class 'llama_index.response.schema.Response'>
---
Not sure what's wrong with the Pinecone part:

Plain Text
gpt_pinecone_index = GPTVectorStoreIndex.from_documents(
    documents, pinecone_index=pinecone_index, service_context=service_context
)
ok, time to look at the source code and figure out if this is a bug lol

one last thing you could try is query_engine = index.as_query_engine(..., service_context=service_context) as well
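i.e. something like this (just a sketch, reusing the streaming setup from above):

Plain Text
query_engine = index.as_query_engine(streaming=True, service_context=service_context)
response = query_engine.query("...")
print(type(response))  # checking whether this now comes back as a StreamingResponse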
Thanks for taking the time to debug this though
index.as_query_engine(..., service_context=service_context)

it didn't help πŸ˜…
I will check further tomorrow!
Thanks for the help πŸ™β€οΈ
Yea for sure! I'll let you know if I find anything :dotsCATJAM:
huh, it works for me with pinecone LOL
here's the full test script
Plain Text
import os

from llama_index import (
    GPTVectorStoreIndex,
    GPTSimpleKeywordTableIndex,
    SimpleDirectoryReader,
    LLMPredictor,
    ServiceContext,
    StorageContext
)
from llama_index.vector_stores import PineconeVectorStore
from langchain.llms.openai import OpenAIChat

api_key = "<api-key>"
environment = "asia-southeast1-gcp-free"
index_name = "quickstart"

os.environ['PINECONE_API_KEY'] = api_key

llm_predictor_chatgpt = LLMPredictor(
    llm=OpenAIChat(temperature=0, model_name="gpt-3.5-turbo", streaming=True)
)
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor_chatgpt)

vector_store = PineconeVectorStore(
        index_name=index_name,
        environment=environment,
)

storage_context = StorageContext.from_defaults(vector_store=vector_store)

documents = SimpleDirectoryReader("./paul_graham").load_data()

index = GPTVectorStoreIndex.from_documents(documents, storage_context=storage_context, service_context=service_context)

query_engine = index.as_query_engine(streaming=True)

response = query_engine.query("What did the author do growing up?")

print(type(response))

response.print_response_stream()
I also tried using the pinecone_index object too, and that also worked fine

Plain Text
...
pinecone.init(api_key=api_key, environment=environment)

pinecone_index = pinecone.Index(index_name)

vector_store = PineconeVectorStore(
        pinecone_index=pinecone_index
)
...
The problem was that I had built my Pinecone index with an older version of llama-index. I just built a new index, and streaming is working!

Now I need to test it inside Flask! πŸ˜ƒ
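A minimal sketch of that rebuild, reusing the Pinecone settings from the snippets above (the index name, dimension, and pod type are the ones used earlier; the exact commands may have differed):

Plain Text
# drop the index that was built with the old llama-index version
if index_name in pinecone.list_indexes():
    pinecone.delete_index(index_name)

# recreate it and re-ingest the documents with the current llama-index
pinecone.create_index(index_name, dimension=1536, metric="euclidean", pod_type="Starter")
pinecone_index = pinecone.Index(index_name)

vector_store = PineconeVectorStore(pinecone_index=pinecone_index)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = GPTVectorStoreIndex.from_documents(
    documents, storage_context=storage_context, service_context=service_context
)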
@Logan M
What specific setup would you recommend for running the Python backend with React in streaming mode?
I'm afraid this won't work with Heroku and looking for alternatives.
I think streaming should work with either Flask or FastAPI + React.

In Flask, you just need to use stream_with_context with the generator from llama-index (i.e. response.response_gen)

Then in React, you'd need to open a ReadableStream with the response, and update some global variable with the current text from the response
https://developer.mozilla.org/en-US/docs/Web/API/Streams_API/Using_readable_streams#reading_the_stream
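A minimal sketch of the Flask side, assuming the index built above (the /query route name and app wiring are hypothetical, just for illustration):

Plain Text
from flask import Flask, Response, request, stream_with_context

app = Flask(__name__)

@app.route("/query")
def query():
    question = request.args.get("q", "")
    query_engine = index.as_query_engine(streaming=True)
    streaming_response = query_engine.query(question)
    # response_gen yields text chunks as the LLM produces them
    return Response(
        stream_with_context(streaming_response.response_gen),
        mimetype="text/plain",
    )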