yea you can!
llm_predictor = LLMPredictor(llm=ChatOpenAI(model_name="gpt-3.5-turbo-0613", temperature=0))
do you know by chance the difference for 0613?
It added the new "function calling api" stuff, but it's optional to use that
might be useful at some point for your JSON stuff though, works very nicely with pydantic
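something like this, roughly (illustrative only -- the schema and prompt are made up, and this assumes the 0613-era openai chat completions API with pydantic v1):
```python
import json
import openai
from pydantic import BaseModel

# hypothetical schema -- define the JSON shape you want as a pydantic model
class BlogPost(BaseModel):
    blog: str

resp = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0613",
    messages=[{"role": "user", "content": "Write a short blog post about this episode."}],
    functions=[{"name": "write_blog", "parameters": BlogPost.schema()}],
    function_call={"name": "write_blog"},  # force the model to call this "function"
)
args = json.loads(resp["choices"][0]["message"]["function_call"]["arguments"])
post = BlogPost(**args)  # validated structured output instead of hoping for clean JSON
```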
i didn't know they versioned like that
yeah, I haven't gotten around to implementing that
@Logan M I am really struggling. I have a blog summary aspect, here is the prompt
prompt = f"""You are a copywriter. Write a long-form marketing blog post about the podcast. Include HTML tags such as headers and paragraphs. Include two or three headers, no conclusion. Optimize it for SEO. Make it exciting and captivating. {self.extra_prompt} \n """
prompt += """Make sure it is between 600 and 800 words long. No longer than 1300 tokens. \n ONLY return a valid JSON object (no other text is necessary). JSON object: {"blog": ["long-form marketing blog post"]}\n return only valid JSON:"""
pretty consistently it goes over my params:
max_input_size = 4000
num_output = 1400
max_chunk_overlap = 20
prompt_helper = PromptHelper(
    max_input_size, num_output, max_chunk_overlap)
llm_predictor = LLMPredictor(llm=ChatOpenAI(
    temperature=0.1, model_name="gpt-3.5-turbo-0613", max_tokens=1400))
service_context = ServiceContext.from_defaults(
    llm_predictor=llm_predictor, prompt_helper=prompt_helper
)
any idea?
will switch to pydantic soon
what do you mean it goes over your params? Like it gets cut off?
rip. Maybe specifying number of sentences too would help?
Seems like prompt engineering a length is tough
I see there is a "gpt-3.5-turbo-16k-0613" with a 16K context length, which might be helpful
not sure what access is like for that
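if you do get access, it should mostly be a drop-in swap plus bigger limits -- untested sketch, reusing your variables:
```python
# hypothetical swap to the 16k model -- assumes your account has access to it
max_input_size = 16000   # leave some headroom under the 16k context window
num_output = 1400
prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap)
llm_predictor = LLMPredictor(llm=ChatOpenAI(
    temperature=0.1, model_name="gpt-3.5-turbo-16k-0613", max_tokens=1400))
service_context = ServiceContext.from_defaults(
    llm_predictor=llm_predictor, prompt_helper=prompt_helper)
```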
hahaha I can increase the max_tokens but that gets expensive
@Logan M another question, so sorry for bothering you. How would I create a ChatGPT-like chat for each individual podcast episode I upload?
Like, I want it to remember the context and be able to keep adding to it.
can you describe a bit more what you mean? Like a chatbot that only knows about a single podcast?
Could create an index per podcast then right? And use that to answer questions in a chat setting
yeah I create an index per podcast right now to generate all the summaries
I guess I'm asking how to make it hold memory for each summary I generate
i.e. I generate a blog post, and then want to refine that blog post a bit more, incrementally.
"Write a blog post"
"Add this twitter handle @x"
"Now make it in the voice of a pirate"
I thiiiink that index.as_chat_engine()
should handle this use-case out of the box
the chat engines are still a little new, so some extra hacking around the memory buffer size might be needed for extended conversations, but should work!
even just added streaming support today (it's in the source, but not on pypi yet)
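rough sketch of what I mean (the chat engine API is new, so exact details might shift a bit):
```python
# reuse the per-podcast index as a chat engine so follow-ups keep the conversation context
chat_engine = index.as_chat_engine()

draft = chat_engine.chat("Write a long-form marketing blog post about this episode")
draft = chat_engine.chat("Add this twitter handle: @x")
draft = chat_engine.chat("Now rewrite it in the voice of a pirate")
print(draft.response)
```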
would it get confused if I use the same index to generate a summary, titles, blogs, etc.?
that's pretty awesome though!
I would probably create an additional index on top of the generated stuff, and maybe make it a vector or keyword index so that the user queries are more directed (rather than reading the entire index again)
Might take some playing around, but happy to help you work through any issues as you try it out
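something along these lines (sketch only -- assumes the newer VectorStoreIndex naming, and episode_summary / episode_blog are whatever your pipeline already produced):
```python
from llama_index import Document, VectorStoreIndex

# hypothetical: wrap each generated artifact (summary, blog, etc.) in its own Document
docs = [Document(text=episode_summary), Document(text=episode_blog)]
content_index = VectorStoreIndex.from_documents(docs, service_context=service_context)

# user questions now hit only the pre-generated content, not the full transcript
print(content_index.as_query_engine().query("What was the main takeaway of the blog?"))
```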
so generate a blog and then put that blog in a vector index
right. But that vector index could hold the blogs for each episode in the podcast
see, I guess that's what I'm confused about in general: why does it need to read the entire index again to generate new summaries? If we put all of the context of every document into the list index and then ask it "Generate a summary" -- "Generate a blog", it should use all of the same context it used when creating the index haha
like if I do as_chat_engine maybe it'd work that way?
Right, so let me know if I understand this correctly
- your initial pipeline generates summaries for podcast episodes
- on top of those summaries, you want to write a blog post, but interactively/in a chat setting?
Or is this kind of a new pipeline that would start from the raw podcast transcripts?
My initial pipeline creates an index with a transcription of a podcast (sometimes very long, up to 16 chunks)
I then use tree_summarize to generate summaries, titles, blogs, linkedin threads, etc.
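roughly this shape right now (simplified):
```python
# simplified version of my current flow -- every output re-reads the whole transcript index
query_engine = index.as_query_engine(response_mode="tree_summarize")

summary = query_engine.query("Generate a summary of the podcast")
blog = query_engine.query(prompt)                              # the blog prompt from earlier
titles = query_engine.query("Generate five title options")     # reads all chunks yet again
```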
this would be a new pipeline
or the same pipeline, if it had all of the context saved in the memory of the transcription
I guess my question is: does every index.query re-pass each node of the index, or is there a way to share all of that "tree_summarize" work it does on the transcription across each prompt I ask (can this be solved by chat_engine)?
Yea there's no (easy) way to avoid repassing that input, unless you save an initial set of summaries and use that going forward.
At the end of the day, at some point the LLM has to read all chunks to generate a summary. Just a matter of what you do with that summary afterwards to avoid re-reading the entire podcast
To me, I'm imagining a database of pre-generated summaries that a user would then query on top of in a chat setting. That makes the most sense to me. This would especially work well in an agent setting (i.e. an agent with access to many indexes for different podcasts; we also have some recent agent stuff that this would fit nicely into, or you could use langchain with llama index tools)
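very rough sketch of the agent idea (the index/tool names are made up, and the agent module is brand new, so treat this as directional):
```python
from llama_index.agent import OpenAIAgent
from llama_index.tools import QueryEngineTool, ToolMetadata

# hypothetical: one tool per podcast's summary index, so the agent routes questions itself
tools = [
    QueryEngineTool(
        query_engine=episode_42_summaries.as_query_engine(),
        metadata=ToolMetadata(
            name="episode_42",
            description="Pre-generated summaries, blog, and titles for podcast episode 42",
        ),
    ),
    # ... one tool per episode
]

agent = OpenAIAgent.from_tools(tools, verbose=True)
print(agent.chat("Write a tweet thread based on episode 42's summary"))
```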
Okay, this is some good food for thought. Thank you.