
Updated 2 years ago

I've been able to ingest 7k lines of csv

At a glance
I've been able to ingest 7k+ lines of csv without issue. Do you have any malformed content that could possibly trip up the ingest process?
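One quick way to rule out malformed content before ingesting is to scan the CSV for rows whose field count differs from the header. This is only a minimal sketch using Python's standard `csv` module; the helper name `find_malformed_rows` and the sample data are made up for illustration and are not part of any reader mentioned in this thread.

```python
import csv
import io


def find_malformed_rows(csv_text, expected_columns=None):
    """Return (line_number, field_count) for rows whose width differs from the header.

    If expected_columns is None, the first row is treated as the header
    and defines the expected width.
    """
    reader = csv.reader(io.StringIO(csv_text))
    problems = []
    for row in reader:
        if expected_columns is None:
            expected_columns = len(row)  # first row sets the expected width
            continue
        if len(row) != expected_columns:
            # reader.line_num is the number of source lines read so far
            problems.append((reader.line_num, len(row)))
    return problems


# Hypothetical sample: the third line has two extra fields.
sample = "id,text\n1,hello\n2,too,many,fields\n3,ok\n"
bad_rows = find_malformed_rows(sample)
print(bad_rows)
```

For a real file, read its contents with `open(path, newline="")` and pass the text in. Rows flagged here are worth inspecting by hand before blaming the index.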
18 comments
and these are some params:
max_input_size = 4096
num_output = 256
max_chunk_overlap = 10
model_name = "gpt-3.5-turbo"
What kind of index?
SimpleCSVReader and GPTSimpleVectorIndex
actually, it looks like it could be related to including the llm_predictor or prompt helper @Logan M. I am able to ingest more data when I use those
I'm not specifying the prompthelper and the llm_predictor is where I define the model_name as gpt-3.5-turbo. Also for the csv I'm leveraging SimpleDirectoryReader without any hangups.
so csvReader is working fine for up to 8k records but at 10k it just hangs indefinitely lol
are you getting fast responses @hesselgesser? mine seem to take up to 30 secs
Slow in comparison. 10MB of text data, 2247888 tokens used, about 3 minutes with gpt-3.5-turbo
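For a rough sense of scale, the figures above work out to a throughput like this. It is only back-of-the-envelope arithmetic, assuming "about 3 minutes" means roughly 180 seconds.

```python
# Figures quoted in the message above; elapsed time is approximate.
tokens_used = 2_247_888
elapsed_seconds = 3 * 60  # "about 3 minutes", assumed to be ~180 s

tokens_per_second = tokens_used / elapsed_seconds
print(f"{tokens_per_second:,.0f} tokens/s")
```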
ok, I guess that's just the name of the game
Yeah. Actually when indexing, under the hood the indexing process leverages text-embedding-ada-002-v2. I made the mistake of indexing with Davinci, and it took 4 hours.
woah, even when you define it in llm_predictor?
like you have to define it somewhere else?
Yes, it appears that during indexing, if you define a model within llm_predictor:
llm_predictor = LLMPredictor(llm=OpenAI(temperature=0.0, model_name="gpt-3.5-turbo"))

The llama code still forces a more rudimentary model:
OpenAIEmbeddingModelType.TEXT_EMBED_ADA_002

As I'm using simple vector store, I don't think a more sophisticated keyword extraction technique is necessary.

When using playground, I noticed that ADA was not utilized for the indexing.
very interesting, found the documentation on this in case anyone else is curious how to change it: https://gpt-index.readthedocs.io/en/latest/how_to/embeddings.html#custom-embeddings
thanks @hesselgesser 🙏
I was going to link this when I saw your earlier messages, but didn't get a chance 😆 👌