Hello,
I have noticed a discrepancy between the dtype of the features (both for images and texts) depending on the availability of a GPU.
- if I run the code with CPU only on Colab,
image_features.dtype returns torch.float32.
This happens if I do not install the package properly and then do not notice that device is set to cpu.
%pip install git+https://github.com/openai/CLIP.git
- if I run the code with GPU on Colab,
image_features.dtype returns torch.float16.
This happens if I follow the installation process properly and install the matching CUDA builds of PyTorch (1.7.1+cu101) for Colab:
torch==1.7.1+cu101
torchvision==0.8.2+cu101
Q1: Is there a good reason why both versions do not return the same dtype? Is it due to AMP with GPU?
Moreover, if I wanted to store normalized features, float16 would let me cut the file size in half, so I would like to make sure that casting the float32 results (obtained with CPU only) to float16 would not lead to a meaningful loss of precision.
Q2: Would casting the results to float16 be safe? Or would it be safer to cast the float16 results (obtained with GPU) to float32 instead?
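For what it's worth, here is a minimal sketch of how one could measure the round-off introduced by such a cast, using NumPy with random unit-normalized vectors standing in for actual CLIP features (the shapes and values are illustrative, not from the model):

```python
import numpy as np

rng = np.random.default_rng(0)

# random vectors standing in for CLIP features, unit-normalized
feats = rng.standard_normal((1000, 512)).astype(np.float32)
feats /= np.linalg.norm(feats, axis=-1, keepdims=True)

# round-trip through float16 and measure the worst-case absolute error
half = feats.astype(np.float16)
max_err = np.abs(half.astype(np.float32) - feats).max()

# float16 has a 10-bit mantissa, so the relative error is bounded by ~2**-11
print(max_err)
```

For unit-normalized vectors the components lie well within float16 range, so the error is pure rounding rather than overflow.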
Finally, the discrepancy can be slightly confusing for people who would pre-compute features on a machine with GPU, and then use the pre-computed features along with features computed on the fly in a web app with CPU only. This is how I noticed the discrepancy when running this line:
logits = 100. * image_features @ zeroshot_weights
where image_features were computed on the fly (float32) and zeroshot_weights had been pre-computed (float16).
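One way to make such mixed pipelines consistent is to cast both operands to a common dtype before the matmul. A minimal NumPy sketch (random arrays and shapes standing in for the actual torch tensors, which raise a dtype error on a mixed Half/Float matmul):

```python
import numpy as np

rng = np.random.default_rng(0)

# image_features computed on the fly on CPU (float32), unit-normalized
image_features = rng.standard_normal((1, 512)).astype(np.float32)
image_features /= np.linalg.norm(image_features, axis=-1, keepdims=True)

# zeroshot_weights pre-computed on GPU and stored as float16
zeroshot_weights = rng.standard_normal((512, 10)).astype(np.float16)

# cast the float16 side up to float32 so both operands share a dtype
logits = 100. * image_features @ zeroshot_weights.astype(np.float32)
print(logits.dtype)
```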