I do:
Plain Text
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)
gpt_pinecone_index = GPTVectorStoreIndex.from_documents(
    documents, pinecone_index=pinecone_index, service_context=service_context
)

but I still get:
Plain Text
response_stream = query_engine.query("...")
print(type(response_stream))
<class 'llama_index.response.schema.Response'>
Hmm, not sure what's up. Do you have the latest version installed? I just tried myself (minus pinecone) and it works fine. Using pinecone shouldn't matter though

Plain Text
>>> from llama_index import GPTVectorStoreIndex, ServiceContext, LLMPredictor
>>> from langchain.chat_models import ChatOpenAI
>>> llm_predictor = LLMPredictor(llm=ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0, streaming=True))
>>> service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)
>>> index = GPTVectorStoreIndex.from_documents(docs, service_context=service_context)
>>> query_engine = index.as_query_engine(streaming=True)
>>> response = query_engine.query("what did the author do growing up?")
>>> type(response)
<class 'llama_index.response.schema.StreamingResponse'>


Do you spot any major differences between my attempt and yours?
langchain==0.0.167
llama-index==0.6.6
No difference except the Pinecone part:


Plain Text
pinecone.init(api_key="...", environment="...")

index_name = '...'

def construct_pinecone_index(directory_path):
    if index_name not in pinecone.list_indexes():
        pinecone.create_index(
            index_name, dimension=1536, metric="euclidean", pod_type="Starter"
        )
    pinecone_index = pinecone.Index(index_name)

    documents = SimpleDirectoryReader(directory_path).load_data()
    gpt_pinecone_index = GPTVectorStoreIndex.from_documents(
        documents, pinecone_index=pinecone_index, service_context=service_context
    )

    absolute_path = os.path.dirname(__file__)
    src_folder = os.path.join(absolute_path, "docs/")
    dest_folder = os.path.join(absolute_path, "indexed_documents/")
    files = os.listdir(src_folder)
    for file in files:
        if file != "do_not_delete.txt":
            src_path = os.path.join(src_folder, file)
            dest_path = os.path.join(dest_folder, file)
            shutil.move(src_path, dest_path)

    return gpt_pinecone_index

index = construct_pinecone_index("docs")
As a sanity check, if you remove the pinecone stuff, does it work?

Maybe if you strip it back to a super simple test script like I did? πŸ€” then once it works without pinecone, add pinecone back in
Just trying to narrow down the issue
Yep, without the Pinecone part, just with a simple document, it now works
And then as soon as you add pinecone to that same test, it stops working? 🫠
Yeah I get:
...
[retrieve] Total embedding token usage: 11 tokens
INFO:llama_index.token_counter.token_counter:> [get_response] Total LLM token usage: 0 tokens
[get_response] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [get_response] Total embedding token usage: 0 tokens
[get_response] Total embedding token usage: 0 tokens

<class 'llama_index.response.schema.Response'>
---
Not sure what's wrong with the Pinecone part:

Plain Text
gpt_pinecone_index = GPTVectorStoreIndex.from_documents(
    documents, pinecone_index=pinecone_index, service_context=service_context
)
ok, time to look at the source code and figure out if this is a bug lol

one last thing you could try is query_engine = index.as_query_engine(..., service_context=service_context) as well
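i.e. something like this (just a sketch, reusing the streaming setup from above):

Plain Text
query_engine = index.as_query_engine(streaming=True, service_context=service_context)
response = query_engine.query("...")
print(type(response))  # checking whether this now comes back as a StreamingResponse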
Thanks for taking the time to debug this though
index.as_query_engine(..., service_context=service_context)

it didn't help πŸ˜…
I will check further tomorrow!
Thanks for the help πŸ™β€οΈ
Yea for sure! I'll let you know if I find anything :dotsCATJAM:
huh, it works for me with pinecone LOL
here's the full test script
Plain Text
import os

from llama_index import (
    GPTVectorStoreIndex,
    GPTSimpleKeywordTableIndex,
    SimpleDirectoryReader,
    LLMPredictor,
    ServiceContext,
    StorageContext
)
from llama_index.vector_stores import PineconeVectorStore
from langchain.llms.openai import OpenAIChat

api_key = "<api-key>"
environment = "asia-southeast1-gcp-free"
index_name = "quickstart"

os.environ['PINECONE_API_KEY'] = api_key

llm_predictor_chatgpt = LLMPredictor(
    llm=OpenAIChat(temperature=0, model_name="gpt-3.5-turbo", streaming=True)
)
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor_chatgpt)

vector_store = PineconeVectorStore(
        index_name=index_name,
        environment=environment,
)

storage_context = StorageContext.from_defaults(vector_store=vector_store)

documents = SimpleDirectoryReader("./paul_graham").load_data()

index = GPTVectorStoreIndex.from_documents(documents, storage_context=storage_context, service_context=service_context)

query_engine = index.as_query_engine(streaming=True)

response = query_engine.query("What did the author do growing up?")

print(type(response))

response.print_response_stream()
I also tried using the pinecone_index object too, and that also worked fine

Plain Text
...
pinecone.init(api_key=api_key, environment=environment)

pinecone_index = pinecone.Index(index_name)

vector_store = PineconeVectorStore(
        pinecone_index=pinecone_index
)
...
The problem was that I had built my Pinecone index with an older version of llama-index. I just built a new index, and streaming is working!

Now I need to test it inside Flask! πŸ˜ƒ
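A minimal sketch of that rebuild, reusing the Pinecone settings from the snippets above (the index name, dimension, and pod type are the ones used earlier; the exact commands may have differed):

Plain Text
# drop the index that was built with the old llama-index version
if index_name in pinecone.list_indexes():
    pinecone.delete_index(index_name)

# recreate it and re-ingest the documents with the current llama-index
pinecone.create_index(index_name, dimension=1536, metric="euclidean", pod_type="Starter")
pinecone_index = pinecone.Index(index_name)

vector_store = PineconeVectorStore(pinecone_index=pinecone_index)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = GPTVectorStoreIndex.from_documents(
    documents, storage_context=storage_context, service_context=service_context
)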
@Logan M
What specific setup would you recommend for running the Python backend with React in streaming mode?
I'm afraid this won't work with Heroku and looking for alternatives.
I think streaming should work with either Flask or FastAPI + React.

In Flask, you just need to use stream_with_context with the generator from llama-index (i.e. response.response_gen)

Then in React, you'd need to open a ReadableStream with the response, and update some global variable with the current text from the response
https://developer.mozilla.org/en-US/docs/Web/API/Streams_API/Using_readable_streams#reading_the_stream
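A minimal sketch of the Flask side, assuming the index built above (the /query route name and app wiring are hypothetical, just for illustration):

Plain Text
from flask import Flask, Response, request, stream_with_context

app = Flask(__name__)

@app.route("/query")
def query():
    question = request.args.get("q", "")
    query_engine = index.as_query_engine(streaming=True)
    streaming_response = query_engine.query(question)
    # response_gen yields text chunks as the LLM produces them
    return Response(
        stream_with_context(streaming_response.response_gen),
        mimetype="text/plain",
    )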