To change the model, you'll want to edit the service context object.
Is there a specific model you wanted to use? I can write the code as an example
Probably 3.5-turbo. I think that's the largest available in terms of input token length. I've got a big report from work I'm trying to index.
Think I managed to find it.
Hmm... yeah, I switched the model in the package and was immediately told that the chat model isn't supported through that initialization. I'm not really sure what to do. I need to be able to feed it 4000 tokens in a file to index :/ I really hoped I could just use a different model.
It's definitely supported
service_context = ServiceContext.from_defaults(llm=ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo"))
index = GPTSimpleVectorIndex.from_documents(documents, service_context=service_context)
Where are these lines though? I'm looking at the base.py for llama_index.
I changed this to 3.5Turbo
Are you modifying the source code?
You'll want to change OpenAI to ChatOpenAI (imported from langchain)
Although the preferred way of doing this would be something closer to the notebook I sent, rather than fiddling with the source files
Look, I know just enough to be dangerous. lol
Will adding this to my python script be more akin to what I want to do then?
Sorry to keep bugging you. It's not recognizing my API key any longer. Is there somewhere new I should be passing it?
No worries!
Is your key set in your env?
Alternatively you can do
import os
os.environ["OPENAI_API_KEY"] = "..."
No, I don't have it set as an environment variable. Where would I just pass it as a named param? (I know, I know, not best practice, but this'll never leave my PC)
ChatOpenAI(...., openai_api_key="mykey")
I definitely tried this and it told me to eff off. Seems to work the second time. It doesn't recognize the keyword argument llm though >.>
You're doing something like this? llm is for the service context object:
service_context = ServiceContext.from_defaults(llm=ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo"))
index = GPTSimpleVectorIndex.from_documents(documents, service_context=service_context)
(And of course throw your key in there, I'm just copy-pasting stuff lol)
import json
from llama_index import GPTSimpleVectorIndex, download_loader, GPTFaissIndex, ServiceContext
from langchain.chat_models import ChatOpenAI
GoogleSheetsReader = download_loader('GoogleSheetsReader')
loader = GoogleSheetsReader()
documents = loader.load_data(["1EOeVyGdvPg0BlKc13SQEx9RclaMMoXSGuZuZ3qE4n28"])
service_context = ServiceContext.from_defaults(LLM=ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo", openai_api_key=""))
index = GPTSimpleVectorIndex.from_documents(documents, service_context=service_context)
index_data = index.save_to_disk("/home/equious/testbots/indexes/index_data.json")
tried it, same thing. This was my second attempt
TypeError: from_defaults() got an unexpected keyword argument 'llm'
from llama_index import LLMPredictor
...
service_context = ServiceContext.from_defaults(llm_predictor=LLMPredictor(llm=ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo", openai_api_key="")))
...
yet another layer to the onion lol
Also, I see you're saving to disk. When you load from disk, you'll have to pass the service context back in to maintain the settings.
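A sketch of the reload path, assuming the same 0.x-era llama_index API as the snippets above (load_from_disk and its signature may differ in your version, so double-check against your install):

```python
# Sketch: rebuild the same service context, then pass it back in when
# loading the index from disk, so queries keep using gpt-3.5-turbo
# instead of falling back to the default (non-chat) LLM.
from llama_index import GPTSimpleVectorIndex, ServiceContext, LLMPredictor
from langchain.chat_models import ChatOpenAI

service_context = ServiceContext.from_defaults(
    llm_predictor=LLMPredictor(
        llm=ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo")
    )
)
index = GPTSimpleVectorIndex.load_from_disk(
    "/home/equious/testbots/indexes/index_data.json",
    service_context=service_context,
)
```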
Mm... yeah, I'm not sure what other errors I'm going to run into. I'm saving to disk as a means to pass the data from a child_process (Python) to my telegram bot's main script (JavaScript).
Should be fine... probably lol
stderr: WARNING:llama_index.llm_predictor.base:Unknown max input size for gpt-3.5-turbo, using defaults.
Where do I go about changing these defaults? I'm still getting 3772 > 1024
The default is 4097 (which is correct)
Usually those warnings are safe to ignore, in my experience. Are your documents in English, btw? Or another language?
English, they're spreadsheets imported from Google.
Yeah, I'm just going to err on the side of saying it's fine lol.
llama_index splits all the input documents into chunks according to max_chunk_size (which is 3900 tokens by default). At query time, if the context + prompt + query are too big, the context might get split again.
For google sheets, I think it treats each row as a document?
I haven't tracked down where exactly that warning comes from (maybe a tokenizer somewhere?), but it hasn't caused anything too scary to happen
Looking at github issues, maybe shrinking the chunk size will remove the warnings?
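Roughly, the chunking step works like this toy sketch, which assumes a naive whitespace "tokenizer" for illustration (llama_index actually counts tokens with a real tokenizer like tiktoken):

```python
# Toy illustration (not llama_index's actual code): split a document's
# tokens into chunks of at most max_chunk_size tokens each.

def split_into_chunks(text: str, max_chunk_size: int = 3900) -> list[str]:
    tokens = text.split()  # stand-in for real tokenization
    return [
        " ".join(tokens[i : i + max_chunk_size])
        for i in range(0, len(tokens), max_chunk_size)
    ]

doc = " ".join(f"tok{i}" for i in range(10000))
chunks = split_into_chunks(doc)
print(len(chunks))  # 10000 tokens at 3900 per chunk -> 3 chunks
```

In llama_index itself, the equivalent knob is (I believe) chunk_size_limit on ServiceContext.from_defaults, so shrinking that should be enough to quiet the warnings without touching source files.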
Fair. It seems to be working in some preliminary testing now. Really appreciate your help
Nice! Happy to help! :dotsCATJAM:
How're you such an expert, I have to ask? I consider myself as cutting-edge as a man my age who doesn't do this professionally can be haha, but I can't get help for anything so new
I just spent some time contributing to the library hahah, picked it up over time.
The best way to learn a codebase is (imo)
- Contribute with a PR
- Step through code with a debugger (pdb, PyCharm)
Got any advice on how long it takes OpenAI's API to recognize my updated billing information? lol Just ran out of my free credit, ffs
Lol, for that one I have no idea
Is the GoogleSheetsReader working?