You'll only see a large embedding cost during index construction. After that, queries only need to embed the query text (usually very short).
The easiest way to lower token costs (for davinci and gpt-3.5) is to lower the chunk size in the service context:
service_context = ServiceContext.from_defaults(..., chunk_size_limit=2000)
But the smaller the chunk size, the more you might need to increase the top k to find the right answer. It's a bit of a balancing act.
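Roughly what I mean, as a minimal sketch (this assumes the older GPTSimpleVectorIndex / index.query style API from the same era as the ServiceContext snippet above; the "data" folder and the numbers are just placeholders to experiment with):

from llama_index import GPTSimpleVectorIndex, SimpleDirectoryReader, ServiceContext

# Smaller chunks -> fewer prompt tokens sent to the LLM per retrieved node (512 is only an example)
service_context = ServiceContext.from_defaults(chunk_size_limit=512)

# "data" is a placeholder folder containing your document(s)
documents = SimpleDirectoryReader("data").load_data()
index = GPTSimpleVectorIndex.from_documents(documents, service_context=service_context)

# Smaller chunks may need a higher top k so the right passage is still retrieved
response = index.query("what are the symptoms of covid?", similarity_top_k=3)
print(response)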
@mathada @Logan M To add to this, use response.source_nodes to look at which nodes were actually selected. If the relevant text isn't there, increase top k; if there's too much irrelevant text, decrease top k.
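For example, something like this (a rough sketch — the attributes on source nodes have changed between llama_index versions, so this assumes the older SourceNode layout with source_text and similarity):

# Inspect what the retriever actually pulled in before the LLM saw it
for i, source in enumerate(response.source_nodes):
    print(f"--- source node {i} (similarity={source.similarity}) ---")
    print(source.source_text[:300])

If the text that answers your question isn't in there, bump similarity_top_k up; if most of it is irrelevant, bring it down.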
Thanks @Logan M and @BioHacker .
A few observations:
- Using "chunk_size_limit" and "similarity_top_k" increases response time.
- The "gpt-3.5-turbo" model takes a really long time to respond (avg. response time of 30 seconds, compared to 3 seconds for the "text-davinci" model).
- The "gpt-3.5-turbo" results are much worse than the "text-davinci" model's.
Please provide details on the document and your query. My experience has been the opposite across a wide variety of queries and docs.
@BioHacker
This was the document used.
https://www.wbhealth.gov.in/uploaded_files/corona/Management_Protocol_for_COVID-19_-_WB_(1).pdf
I converted this document to a Word file and then used it.
Queries asked:
- what are the symptoms of covid?
  - The "text-davinci" model gives the correct answer, while "gpt-3.5-turbo" does not (it gives a very generic answer and beats around the bush).
- who issued these management protocol guidelines?
  - Both give the correct answer.
- what are the precautions to take in time of pregnancy?
  - The "text-davinci" model gives the correct answer, while "gpt-3.5-turbo" says it is an out-of-context question.
The screenshot below (image.png) shows the "gpt-3.5-turbo" code along with its response.
What's your experience, @BioHacker? Does "gpt-3.5-turbo" give better results, and in less time too? Would like to know. Thank you.
@mathada Ohhh, you're running on Colab? I've heard many people complain about slow OpenAI API usage on Colab. I think OpenAI is throttling traffic from their servers, tbh.
Locally, I think you'll have a better experience in terms of speed.
The symptoms of COVID-19 include fever, cough, sore throat, malaise, myalgia, difficulty breathing, respiratory rate greater than 24/min, SpO2 less than 94% in room air, altered sensorium (drowsiness, confusion, stupor), infiltrates on chest X-ray, altered liver or renal function test, and hyperthermia or hypothermia.
Pregnancy:
The precautions to take in time of pregnancy include: avoiding contact with people who are sick, washing hands frequently, avoiding large gatherings, wearing a face mask when in public, and avoiding travel. Additionally, pregnant women should seek medical advice if they experience any symptoms of COVID-19, such as fever, cough, or difficulty breathing. If a pregnant woman tests positive for COVID-19, she should follow the management protocol for all COVID-19 patients, which includes proper hand hygiene, wearing a three-layered medical mask, and proper cleaning of her breast and nipple before each feeding. Additionally, if vaginal delivery is feasible, it should be recommended, and if urgent delivery by Caesarean Section is needed, spinal anaesthesia should be used to minimise the need for general anaesthesia.
@mathada How are these answers?
The token usage for the first answer is 931, in under 5 seconds. For the second it's 1200, also in under 5 seconds.
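For reference, roughly how one could measure this — just a sketch continuing the earlier snippet; last_token_usage is an attribute from older llama_index releases and may not exist in your version:

import time

llm_predictor = service_context.llm_predictor

start = time.time()
response = index.query("what are the precautions to take in time of pregnancy?", similarity_top_k=2)
elapsed = time.time() - start

# last_token_usage is assumed from older LLMPredictor versions
print(f"{elapsed:.1f} s, ~{llm_predictor.last_token_usage} LLM tokens")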
@Logan M I will certainly try running it locally.
@BioHacker Perfect answers. These are using the "gpt-3.5-turbo" model?
Can you kindly share the code you used?
What mistake am I making?
Haha, I know, right? You can get crazy outcomes with the right settings.
I'll share with you in the DMs
If there are magic numbers, I'm interested too!
Well, they're not that magical if you know what you're doing. But if you're using all default values, the results won't be optimized. A lot of trial and error goes into all this, so just experiment.
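To give a flavour of what "not all defaults" can look like (just a hypothetical illustration, not the exact code I'm sharing in the DMs — the model wrapper and the numbers are assumptions to tune):

from langchain.chat_models import ChatOpenAI
from llama_index import LLMPredictor, ServiceContext

# Explicitly wrap gpt-3.5-turbo instead of relying on the default LLM
llm_predictor = LLMPredictor(llm=ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0))

# Non-default chunk size; tune it per document, together with similarity_top_k at query time
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, chunk_size_limit=512)

Then build the index and query it as in the earlier sketch, adjusting similarity_top_k until response.source_nodes contains what you expect.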
Definitely have. My findings: larger chunks are less accurate; smaller ones sometimes lose context. 2048 works well, but it sometimes splits mid-paragraph and mid-sentence, as per the other thread. The embedding size has to be 4095 (!) if you're on Azure.
The bigger the chunk, the fewer "seams" (e.g. splits in the middle of a paragraph), but the less accurate.