I'm not specifying the PromptHelper, and the llm_predictor is where I define the model_name as gpt-3.5-turbo. For the CSV I'm using SimpleDirectoryReader without any hangups.
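For context, a minimal sketch of that setup, assuming the llama_index 0.4.x-era API (GPTSimpleVectorIndex, langchain's OpenAI wrapper) and an `OPENAI_API_KEY` in the environment; the `./data` path is illustrative:

```python
from langchain.llms import OpenAI
from llama_index import GPTSimpleVectorIndex, LLMPredictor, SimpleDirectoryReader

# gpt-3.5-turbo here is used for response synthesis at query time;
# it does not change which embedding model is used during indexing.
llm_predictor = LLMPredictor(llm=OpenAI(temperature=0.0, model_name="gpt-3.5-turbo"))

# SimpleDirectoryReader picks up the CSV along with anything else in the folder.
documents = SimpleDirectoryReader("./data").load_data()
index = GPTSimpleVectorIndex(documents, llm_predictor=llm_predictor)
```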
Yeah. Under the hood, indexing uses text-embedding-ada-002-v2 for the embeddings. I made the mistake of indexing with Davinci once, and it took 4 hours.
Yes, it appears that during indexing, even if you define a model in the llm_predictor: llm_predictor = LLMPredictor(llm=OpenAI(temperature=0.0, model_name="gpt-3.5-turbo"))
The llama_index code still uses its own default embedding model: OpenAIEmbeddingModelType.TEXT_EMBED_ADA_002. The LLM you pass only affects query-time synthesis; the embedding model for indexing is configured separately.
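If you do want to pin the embedding model explicitly rather than rely on the default, a sketch along these lines should work — assuming llama_index's `OpenAIEmbedding` wrapper of that era; the exact import path and keyword names may differ between versions:

```python
from llama_index.embeddings.openai import OpenAIEmbedding, OpenAIEmbeddingModelType

# Ada is the intended default for embeddings: it is cheap and fast, and it is
# a different axis of choice from the completion model in llm_predictor.
embed_model = OpenAIEmbedding(model=OpenAIEmbeddingModelType.TEXT_EMBED_ADA_002)
```

Note that using Ada here is not a downgrade: embedding models and completion models are separate, and ada-002 is OpenAI's recommended embedding model regardless of which chat/completion model you query with.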
Since I'm using a simple vector store, I don't think a more sophisticated keyword extraction technique is necessary.
When using the playground, I noticed that Ada was not used for the indexing.