I have a few warnings as well:
2023-06-09 16:42:29.102941: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-06-09 16:42:30.085561: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
bin /users/tluong2/.local/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda116.so
CUDA SETUP: CUDA runtime path found: /dcsrsoft/spack/arolle/v1.0/spack/opt/spack/linux-rhel8-zen2/gcc-10.4.0/cuda-11.6.2-hjqfaeelfbajionp4uptpb6grp2uheb6/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.0
CUDA SETUP: Detected CUDA version 116
CUDA SETUP: Loading binary /users/tluong2/.local/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda116.so...
The model weights are not tied. Please use the `tie_weights` method before using the `infer_auto_device` function.
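For reference, what that warning is asking for is a call to `tie_weights()` on the model before the device map is computed. A minimal sketch, assuming a transformers model and an accelerate install; the model name below is a placeholder I chose for illustration, not one from my setup:

```python
# Minimal sketch: tie shared weights before inferring a device map.
# "facebook/opt-350m" is a placeholder model name (an assumption, not from the log).
from transformers import AutoModelForCausalLM
from accelerate import infer_auto_device_map

model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")
model.tie_weights()  # tie shared weights (e.g. input/output embeddings) first
device_map = infer_auto_device_map(model)  # then compute the device placement
print(device_map)
```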
Loading checkpoint shards: 100%|██████████| 242/242 [01:27<00:00, 2.76it/s]
Running on local URL: http://127.0.0.1:7860
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
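If you take the environment-variable route, the key detail is that it has to be set before `tokenizers` does any work in the parent process. A minimal sketch (the tokenizer name is a placeholder):

```python
import os

# Must be set before the tokenizer is first used in the parent process;
# setting it after parallel tokenization has run won't prevent the warning.
os.environ["TOKENIZERS_PARALLELISM"] = "false"

from transformers import AutoTokenizer  # imported after setting the variable

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # placeholder model name
```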