i have a index file which is 120M i just

At a glance

I have an index file that is 120 MB. I just tried to convert a 1,100-page income tax act into that index file.

15 comments

Well, that's actually a little big, especially if you are using a vector index.

A simple vector index stores every embedding in memory.

With that many documents, you might want to look at integrating a vector store like Pinecone, Qdrant, etc.

Otherwise, I would try bumping up the RAM of your Oracle machine.
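
For a rough sense of scale, the memory cost of holding embeddings in RAM can be estimated with a sketch like the one below. The chunk count is a made-up illustration, and the 1536-dimension default assumes OpenAI's text-embedding-ada-002 embeddings; neither number comes from this thread, and the function name estimate_embedding_bytes is hypothetical. It ignores list headers, node text, and metadata, so treat the result as a lower bound.

```python
import struct
import sys


def estimate_embedding_bytes(num_chunks, dims=1536):
    """Rough memory estimate for embeddings held in RAM.

    Returns (bytes as Python lists of floats, bytes as packed float32).
    dims=1536 is an assumption (text-embedding-ada-002 output size).
    """
    pointer_size = struct.calcsize("P")  # one list slot pointing at a float
    float_size = sys.getsizeof(0.0)      # one CPython float object
    as_python_lists = num_chunks * dims * (float_size + pointer_size)
    as_float32 = num_chunks * dims * 4
    return as_python_lists, as_float32


if __name__ == "__main__":
    # 5,000 chunks is a hypothetical figure, not taken from the thread.
    lists_bytes, packed_bytes = estimate_embedding_bytes(5_000)
    print(f"as Python lists: {lists_bytes / 1e6:.0f} MB")
    print(f"as float32:      {packed_bytes / 1e6:.0f} MB")
```

Embeddings decoded from JSON end up as plain Python lists of float objects, which is why the first figure is several times larger than the packed one.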

aautratec

Thanks for the feedback. Should I change to another index type, like a list index, to save space? Will composing one index on top of another also help resolve the issue, or will it make the situation more complex?

LLogan M

A list index will make the queries much more expensive (because it checks every single node). It will also be much slower (but with less memory usage, I suppose).

Wrapping indexes with another index won't help much either (and yes, it will make queries slower and more complex).
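
To make the cost difference concrete, here is a toy sketch of the two retrieval styles. It is not the library's implementation: the embeddings are hand-written two-dimensional vectors, and top_k_nodes and all_nodes are hypothetical helper names. A vector index ranks nodes by similarity and passes only the top few to the LLM, while a list-style index passes every node.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_nodes(query_embedding, node_embeddings, k=2):
    """Return the ids of the k nodes most similar to the query."""
    scored = sorted(
        node_embeddings.items(),
        key=lambda item: cosine_similarity(query_embedding, item[1]),
        reverse=True,
    )
    return [node_id for node_id, _ in scored[:k]]


def all_nodes(node_embeddings):
    """A list-style index has no ranking step: every node is visited."""
    return list(node_embeddings)


if __name__ == "__main__":
    # Toy data, invented for illustration.
    nodes = {
        "a": [1.0, 0.0],
        "b": [0.9, 0.1],
        "c": [0.0, 1.0],
        "d": [-1.0, 0.0],
    }
    query = [1.0, 0.05]
    print("vector index sends to the LLM:", top_k_nodes(query, nodes, k=2))
    print("list index sends to the LLM:  ", all_nodes(nodes))
```

With thousands of nodes, the second path means thousands of LLM calls per query, which is where the extra cost and latency come from.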

aautratec

Thanks for the feedback. I just tried removing some programs from memory to free up some space. Now the program runs, but it fails with KeyError: 'index_struct'. Is that still related to memory, or might I need to regenerate the index file?

LLogan M

You shouldn't have to regenerate (unless the index you are loading was created before v0.5.0?)

What's the full error?

(You can also regenerate if you don't care about spending the embedding tokens)

aautratec

Here is the full error:

Traceback (most recent call last):
  File "consultingGPTv1.py", line 53, in <module>
    bot.polling()
  File "/usr/local/lib/python3.8/dist-packages/telebot/__init__.py", line 1043, in polling
    self.threaded_polling(non_stop=non_stop, interval=interval, timeout=timeout, long_polling_timeout=long_polling_timeout,
  File "/usr/local/lib/python3.8/dist-packages/telebot/__init__.py", line 1118, in threaded_polling
    raise e
  File "/usr/local/lib/python3.8/dist-packages/telebot/__init__.py", line 1074, in threaded_polling
    self.worker_pool.raise_exceptions()
  File "/usr/local/lib/python3.8/dist-packages/telebot/util.py", line 148, in raise_exceptions
    raise self.exception_info
  File "/usr/local/lib/python3.8/dist-packages/telebot/util.py", line 91, in run
    task(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/telebot/__init__.py", line 6428, in _run_middlewares_and_handler
    result = handler'function'
  File "consultingGPTv1.py", line 46, in echo_message
    index = GPTSimpleVectorIndex.load_from_disk('index.json')
  File "/usr/local/lib/python3.8/dist-packages/gpt_index/indices/base.py", line 353, in load_from_disk
    return cls.load_from_string(file_contents, kwargs)
  File "/usr/local/lib/python3.8/dist-packages/gpt_index/indices/base.py", line 329, in load_from_string
    return cls.load_from_dict(result_dict, kwargs)
  File "/usr/local/lib/python3.8/dist-packages/gpt_index/indices/vector_store/base.py", line 237, in load_from_dict
    return super().load_from_dict(result_dict, config_dict, kwargs)
  File "/usr/local/lib/python3.8/dist-packages/gpt_index/indices/base.py", line 303, in load_from_dict
    index_struct = load_index_struct_from_dict(result_dict[INDEX_STRUCT_KEY])
KeyError: 'index_struct'

LLogan M

This feels like you have LlamaIndex v0.5.0 but created that index JSON with an older version?
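
One quick way to test that theory, sketched with only the standard library: open the saved JSON and see whether it has the top-level 'index_struct' key that the loader in the traceback tries to read. The helper names below are hypothetical, and a missing key only suggests (it does not prove) that the file was written by an older version.

```python
import json

INDEX_STRUCT_KEY = "index_struct"  # key name taken from the KeyError in the traceback


def has_index_struct(index_dict):
    """True if the decoded index has the top-level key the loader looks up."""
    return isinstance(index_dict, dict) and INDEX_STRUCT_KEY in index_dict


def check_index_file(path):
    """Load a saved index JSON file and report whether the key is present."""
    with open(path, "r", encoding="utf-8") as f:
        return has_index_struct(json.load(f))


if __name__ == "__main__":
    import sys

    # Usage: python check_index.py index.json
    ok = check_index_file(sys.argv[1])
    print("key present" if ok else "key missing: possibly saved by an older version")
```

Note that this still parses the whole file, so on a 120 MB index it needs roughly as much memory as loading the index itself.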

LLogan M

If that's the case, there is a tool to migrate old indexes:
https://twitter.com/gpt_index/status/1640792033433620480?cxt=HHwWgMDQ8afqocUtAAAA

For --index_struct_type, use "simple_dict" for a vector index

aautratec

Thanks. Let me try to build the new index again.

aautratec

Hi, I just ran the program to rebuild the index with the instruction construct_index("data"), and got this new error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-8-5af116ce8b4e> in <module>
----> 1 construct_index("data")

2 frames
/usr/local/lib/python3.9/dist-packages/gpt_index/indices/vector_store/base.py in __init__(self, nodes, index_struct, service_context, text_qa_template, vector_store, use_async, **kwargs)
     55         self.text_qa_template = text_qa_template or DEFAULT_TEXT_QA_PROMPT
     56         self._use_async = use_async
---> 57         super().__init__(
     58             nodes=nodes,
     59             index_struct=index_struct,

TypeError: __init__() got an unexpected keyword argument 'llm_predictor'

aautratec

Please help. It looks like my previously working Colab notebook has an issue now.

LLogan M

Yea, v0.5.0 changed some things.

There's a new service context that contains the LLM predictor, prompt helper, and more.

See this example:
https://gpt-index.readthedocs.io/en/latest/guides/primer/usage_pattern.html#customizing-llm-s
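
As a rough sketch of what that change looks like, assuming the gpt_index 0.5.x API and a langchain release from the same period: the predictor is wrapped in a ServiceContext and passed as service_context=, rather than being given to the index as llm_predictor=. The body of construct_index was never posted in this thread, so the function below is a hypothetical reconstruction; the model name, temperature, and output path are illustrative choices, and an OpenAI API key must be set in the environment for it to run.

```python
def construct_index(directory_path, output_path="index.json"):
    """Build and save a vector index using the v0.5.x ServiceContext pattern."""
    # Imports live inside the function so the sketch can be imported without
    # the libraries installed. Names assume gpt_index 0.5.x (assumption).
    from gpt_index import (
        GPTSimpleVectorIndex,
        LLMPredictor,
        ServiceContext,
        SimpleDirectoryReader,
    )
    from langchain import OpenAI

    # Read every file in the directory into document objects.
    documents = SimpleDirectoryReader(directory_path).load_data()

    # Model name and temperature are illustrative, not from the thread.
    llm_predictor = LLMPredictor(
        llm=OpenAI(temperature=0, model_name="text-davinci-003")
    )

    # v0.5.0 style: the predictor goes into a ServiceContext instead of
    # being passed to the index constructor as llm_predictor=.
    service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor)

    index = GPTSimpleVectorIndex.from_documents(
        documents, service_context=service_context
    )
    index.save_to_disk(output_path)
    return index
```

Rebuilding this way re-embeds every document, so it spends embedding tokens again.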

LLogan M

I recommend checking out the docs again, a few things changed in v0.5.0

LLogan M

I need to go to bed lol but I trust the docs to help you out, otherwise I'll check back tomorrow

aautratec

Yes, thanks for the support. Have a good sleep. The issue is resolved; I needed to upgrade my code.
