LlamaIndex

Updated 4 months ago

I am not, python 3.9

At a glance

The community members are discussing an issue with Python 3.9, where certain imports are causing connection timeouts. They have tried creating a virtual environment and using a different Python version, but the issue persists. The community members have identified that the issue is likely related to the tiktoken or NLTK libraries making calls to external URLs that cannot be accessed in the controlled environment. They are exploring ways to either skip or allowlist these calls, such as by downloading the required files and placing them in the appropriate cache locations. One community member has found the NLTK call in the llama_index library and is seeking help in identifying the remaining NLTK call that is causing the issue.

I am not, python 3.9

21 comments

You could try creating a Python environment using venv and check if you still get the same issue.

just did, same issue

python version Python 3.9.18

going to try a different python version

no idea why and this shouldn't happen though.

OK, so I was able to find out why it hangs:

File ~/neo4nan/myenv/lib/python3.9/site-packages/requests/adapters.py:501, in HTTPAdapter.send(self, request, stream, timeout, verify, cert, proxies)
486 resp = conn.urlopen(
487 method=request.method,
488 url=url,
(...)
497 chunked=chunked,
498 )
500 except (ProtocolError, OSError) as err:
--> 501 raise ConnectionError(err, request=request)
503 except MaxRetryError as e:
504 if isinstance(e.reason, ConnectTimeoutError):
505 # TODO: Remove this in 3.0.0: see #2811

ConnectionError: ('Connection aborted.', TimeoutError(110, 'Connection timed out'))

The import makes connections to pull certain files. I am in an environment where I cannot make those connections, and I won't be using the files either, since I am using my own LLM.

Is there a way to not make these calls?

I am running this in a controlled environment. If I know all the endpoints the import is calling, I can allowlist them. So either I skip the calls, or knowing exactly which calls are made would be helpful too.
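One way to list the endpoints before allowlisting them is to record every hostname that gets resolved while the import runs. The following is a minimal sketch, not part of llama_index: `record_dns_lookups` is a hypothetical helper name, and it assumes the HTTP client in use resolves hosts through `socket.getaddrinfo` (which is the case for the `requests` stack shown in the traceback, as far as I know).

```python
import contextlib
import socket


@contextlib.contextmanager
def record_dns_lookups():
    """Collect every host passed to socket.getaddrinfo inside the block.

    Hypothetical helper for illustration; it only sees lookups that go
    through socket.getaddrinfo.
    """
    seen = []
    original = socket.getaddrinfo

    def logging_getaddrinfo(host, *args, **kwargs):
        # Record the host first, so it is captured even if the lookup
        # or the later connection attempt fails.
        seen.append(host)
        return original(host, *args, **kwargs)

    socket.getaddrinfo = logging_getaddrinfo
    try:
        yield seen
    finally:
        # Always restore the real function, even if the block raised.
        socket.getaddrinfo = original
```

To use it, wrap the slow import in `with record_dns_lookups() as hosts:` inside a `try`/`except`, then print `hosts` afterwards. The list is filled before each connection attempt, so it should contain the blocked hostname even when the import ends in a timeout.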

What is the url it’s calling?

this is likely either the tiktoken URL call or the NLTK URL call -- I think both of these libraries make calls to fetch small files on first run

def gpt2():
---> 11 mergeable_ranks = data_gym_to_mergeable_bpe_ranks(
12 vocab_bpe_file="https://openaipublic.blob.core.windows.net/gpt-2/encodings/main/vocab.bpe",
13 encoder_json_file="https://openaipublic.blob.core.windows.net/gpt-2/encodings/main/encoder.json",
14 )
15 return {
16 "name": "gpt2",
17 "explicit_n_vocab": 50257,
(...)
20 "special_tokens": {ENDOFTEXT: 50256},
21 }

note that it may call other urls after this, as it hangs on the first call

the other option would be downloading the files and scp them, assuming the urls will not be called if the file is detected

yea, the files just need to be in the cache location for tiktoken (not sure where that is though, would have to dig into tiktoken code)

i am manually doing each one

right now

apparently I have to name each cached file after the hex hash of the URL being called as well
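The manual steps above can be sketched as follows. This is based on my reading of tiktoken's loader, so treat it as an assumption and check it against the installed version: the cache file name appears to be the SHA-1 hex digest of the URL, and the cache directory can be set with the `TIKTOKEN_CACHE_DIR` environment variable. The helper names `tiktoken_cache_name` and `stage_file` are hypothetical.

```python
import hashlib
import os
import shutil
from pathlib import Path


def tiktoken_cache_name(url: str) -> str:
    """Return the assumed cache file name: SHA-1 hex digest of the URL."""
    return hashlib.sha1(url.encode()).hexdigest()


def stage_file(downloaded: Path, url: str, cache_dir: Path) -> Path:
    """Copy a manually downloaded file into the cache under the hashed name."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / tiktoken_cache_name(url)
    shutil.copyfile(downloaded, target)
    return target


def use_cache_dir(cache_dir: Path) -> None:
    """Point tiktoken at the cache; call this before importing tiktoken."""
    os.environ["TIKTOKEN_CACHE_DIR"] = str(cache_dir)
```

For example, after copying `vocab.bpe` and `encoder.json` to the machine, call `stage_file` once per file with the matching URL from the traceback, then `use_cache_dir` before the import that was hanging.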

So I was able to download the files and fix the tiktoken call. I spent some time trying to figure out where the NLTK call is but couldn't find it; it seems to happen after loading the BAAI BGE embedding. Can anyone help?

NLTK is here, I think:
https://github.com/jerryjliu/llama_index/blob/57b9c427c2e441d878842831502f1eec546ab666/llama_index/text_splitter/utils.py#L38
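The same offline approach should work for NLTK: copy the data over by hand and point NLTK at it. The sketch below is an assumption-heavy illustration. It assumes the resource needed is the `tokenizers/punkt` tokenizer data (not confirmed in this thread) and relies on NLTK reading the `NLTK_DATA` environment variable at import time. `has_nltk_resource` and `use_nltk_data_dir` are hypothetical helper names.

```python
import os
from pathlib import Path


def has_nltk_resource(data_dir: Path, resource: str = "tokenizers/punkt") -> bool:
    """Check whether a resource was copied into data_dir, unpacked or zipped."""
    base = data_dir / resource
    return base.is_dir() or base.with_suffix(".zip").is_file()


def use_nltk_data_dir(data_dir: Path) -> None:
    """Point NLTK at the local data; call this before importing nltk."""
    os.environ["NLTK_DATA"] = str(data_dir)
```

If `has_nltk_resource` returns `True` for the directory and `use_nltk_data_dir` runs before the import, NLTK should find the data locally. Whether llama_index then skips its own download depends on the code at the link above, so verify that against the installed version.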
