When using .enable_model_cpu_offload(), if the model is currently offloaded to the CPU, generating a normal image waits for the model to load back onto the GPU before generating. With hypertile enabled, generation does not wait for the model to load, and the first generation fails with "found at least two devices, cuda:0 and cpu!". Subsequent generations work fine until the model is offloaded again, at which point the first generation after offloading fails once more.
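
As a possible stopgap until the underlying problem is fixed, the sketch below retries a generation call once when it hits this specific error. It relies on the behaviour described above, namely that the call following a failed one succeeds, and it does not address the root cause. `call_with_device_retry` and the `generate` argument are hypothetical names used for illustration; `generate` stands for whatever callable runs the pipeline with hypertile enabled.

```python
def call_with_device_retry(generate, *args, retries=1, **kwargs):
    """Call generate(*args, **kwargs), retrying on a device-mismatch error.

    `generate` is a hypothetical stand-in for the real generation call.
    Assumption: a failed first call leaves the model loaded on the GPU,
    so the retry succeeds (this matches the behaviour reported above).
    """
    for attempt in range(retries + 1):
        try:
            return generate(*args, **kwargs)
        except RuntimeError as err:
            # Retry only the error from this report; re-raise anything else,
            # and re-raise this one too once the retries are used up.
            is_mismatch = "found at least two devices" in str(err)
            if not is_mismatch or attempt == retries:
                raise
```

Usage would look like `image = call_with_device_retry(generate, prompt)`. Any other RuntimeError, such as running out of memory, is passed through unchanged so it is not hidden by the retry.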