1404 Commits

Author SHA1 Message Date
Jedrzej Kosinski
048f4f0b3a Merge branch 'master' into worksplit-multigpu 2025-02-17 19:35:58 -06:00
comfyanonymous
31e54b7052 Improve AMD arch detection. 2025-02-17 04:53:40 -05:00
comfyanonymous
8c0bae50c3 bf16 manual cast works on old AMD. 2025-02-17 04:42:40 -05:00
comfyanonymous
530412cb9d Refactor torch version checks to be more future proof. 2025-02-17 04:36:45 -05:00
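A future-proof version check usually means comparing parsed versions rather than raw strings; a minimal sketch, assuming the packaging library is available (the actual helper in the codebase may differ):

```python
from packaging import version
import torch


def torch_version_at_least(minimum: str) -> bool:
    # Parsed comparison keeps working for "2.10", dev/nightly suffixes and
    # local build tags, where naive string comparison breaks.
    return version.parse(torch.__version__) >= version.parse(minimum)


if torch_version_at_least("2.7"):
    print("torch >= 2.7 features can be enabled")
```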
comfyanonymous
e2919d38b4 Disable bf16 on AMD GPUs that don't support it. 2025-02-16 05:46:10 -05:00
comfyanonymous
1cd6cd6080 Disable pytorch attention in VAE for AMD. 2025-02-14 05:42:14 -05:00
comfyanonymous
d7b4bf21a2 Auto enable mem efficient attention on gfx1100 on pytorch nightly 2.7
I'm not sure which arches are supported yet. If you see improvements in
memory usage while using --use-pytorch-cross-attention on your AMD GPU, let
me know and I will add it to the list.
2025-02-14 04:18:14 -05:00
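A sketch of what arch-gated enabling can look like; gfx1100 comes from the commit message, while the gcnArchName access and the backend toggle below are assumptions about where such logic would plug in, not the project's actual code.

```python
import torch

SUPPORTED_ARCHES = ("gfx1100",)  # only arch named in the commit; others unknown


def maybe_enable_mem_efficient_attention(device_index: int = 0) -> None:
    if not torch.cuda.is_available():
        return
    props = torch.cuda.get_device_properties(device_index)
    arch = getattr(props, "gcnArchName", "")  # present on ROCm builds, absent elsewhere
    if any(arch.startswith(a) for a in SUPPORTED_ARCHES):
        torch.backends.cuda.enable_mem_efficient_sdp(True)
```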
comfyanonymous
019c7029ea Add a way to set a different compute dtype for the model at runtime.
Currently only works for diffusion models.
2025-02-13 20:34:03 -05:00
comfyanonymous
8773ccf74d Better memory estimation for ROCm devices that support mem efficient attention.
There is no way to check whether the card actually supports it, so support is
assumed if you use --use-pytorch-cross-attention.
2025-02-13 08:32:36 -05:00
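A rough illustration of why the assumption matters for the estimate: mem efficient attention avoids materializing the quadratic score matrix, so the memory budget can be much smaller. Names and factors below are placeholders, not the project's estimator.

```python
def estimated_attention_bytes(batch: int, seq_len: int, dim: int,
                              dtype_size: int, assume_mem_efficient: bool) -> int:
    if assume_mem_efficient:
        # roughly linear in sequence length: q, k, v style buffers only
        return batch * seq_len * dim * dtype_size * 3
    # otherwise the full (seq_len x seq_len) score matrix dominates
    return batch * seq_len * seq_len * dtype_size
```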
comfyanonymous
1d5d6586f3 Fix ruff. 2025-02-12 06:49:16 -05:00
zhoufan2956
35740259de mix_ascend_bf16_infer_err (#6794) 2025-02-12 06:48:11 -05:00
comfyanonymous
ab888e1e0b Add add_weight_wrapper function to model patcher.
Functions can now easily be added to wrap/modify model weights.
2025-02-12 05:55:35 -05:00
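The function name add_weight_wrapper comes from the commit; the registry below is only a toy sketch of the idea, that callables registered per weight key get applied whenever that weight is fetched, and is not ComfyUI's ModelPatcher API.

```python
import torch


class WeightWrapperRegistry:
    def __init__(self):
        self._wrappers = {}  # weight key -> list of wrapper callables

    def add_weight_wrapper(self, key, fn):
        self._wrappers.setdefault(key, []).append(fn)

    def get_weight(self, key, weight: torch.Tensor) -> torch.Tensor:
        for fn in self._wrappers.get(key, []):
            weight = fn(weight)  # each wrapper may modify or replace the tensor
        return weight


registry = WeightWrapperRegistry()
registry.add_weight_wrapper("linear.weight", lambda w: w * 0.5)
print(registry.get_weight("linear.weight", torch.ones(2, 2)))
```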
Jedrzej Kosinski
d2504fb701 Merge branch 'master' into worksplit-multigpu 2025-02-11 22:34:51 -06:00
comfyanonymous
d9f0fcdb0c Cleanup. 2025-02-11 17:17:03 -05:00
HishamC
b124256817 Fix for running via DirectML (#6542)
* Fix for running via DirectML

Fix DirectML empty image generation issue with Flux1. Add CPU fallback for the unsupported path. Verified the model works on AMD GPUs.

* fix formatting

* update causal mask calculation
2025-02-11 17:11:32 -05:00
comfyanonymous
af4b7c91be Make --force-fp16 actually force the diffusion model to be fp16. 2025-02-11 08:33:09 -05:00
comfyanonymous
4027466c80 Make lumina model work with any latent resolution. 2025-02-10 00:24:20 -05:00
comfyanonymous
095d867147 Remove useless function. 2025-02-09 07:02:57 -05:00
Pam
caeb27c3a5 res_multistep: Fix cfgpp and add ancestral samplers (#6731) 2025-02-08 19:39:58 -05:00
comfyanonymous
3d06e1c555 Make error more clear to user. 2025-02-08 18:57:24 -05:00
catboxanon
43a74c0de1 Allow FP16 accumulation with --fast (#6453)
Currently only applies to PyTorch nightly releases. (>=20250208)
2025-02-08 17:00:56 -05:00
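On new enough nightlies this corresponds to a matmul backend switch; the attribute probed below is an assumption about where it is exposed, and the hasattr guard keeps the sketch harmless on stable builds that lack it.

```python
import torch

matmul_backend = torch.backends.cuda.matmul
if hasattr(matmul_backend, "allow_fp16_accumulation"):
    # trade accumulation precision for faster fp16 matmuls
    matmul_backend.allow_fp16_accumulation = True
else:
    print("this PyTorch build does not expose fp16 accumulation")
```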
Jedrzej Kosinski
b03763bca6 Merge branch 'multigpu_support' into worksplit-multigpu 2025-02-07 13:27:49 -06:00
comfyanonymous
079eccc92a Don't compress HTTP responses by default.
Remove the argument that disabled it.

Add new --enable-compress-response-body argument to enable it.
2025-02-07 03:29:21 -05:00
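A hedged aiohttp sketch of the resulting opt-in behavior; the flag name is from the commit, while the argparse wiring and the middleware are illustrative rather than the server's actual code.

```python
import argparse

from aiohttp import web

parser = argparse.ArgumentParser()
parser.add_argument("--enable-compress-response-body", action="store_true")
args = parser.parse_args()


@web.middleware
async def compress_middleware(request, handler):
    response = await handler(request)
    response.enable_compression()  # lets aiohttp negotiate gzip/deflate with the client
    return response


middlewares = [compress_middleware] if args.enable_compress_response_body else []
app = web.Application(middlewares=middlewares)
```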
Jedrzej Kosinski
476aa79b64 Let --cuda-device take in a string to allow multiple devices (or device order) to be chosen, print available devices on startup, potentially support MultiGPU Intel and Ascend setups 2025-02-06 08:44:07 -06:00
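A minimal sketch of accepting the device string and exporting it before torch initializes, assuming CUDA/ROCm; the Intel and Ascend handling mentioned in the commit is not covered here and the wiring is illustrative.

```python
import argparse
import os

parser = argparse.ArgumentParser()
# "1" selects a single device, "0,1" exposes several and fixes their order
parser.add_argument("--cuda-device", type=str, default=None)
args = parser.parse_args()

if args.cuda_device is not None:
    # must be set before torch initializes its device runtime
    os.environ["CUDA_VISIBLE_DEVICES"] = args.cuda_device
    os.environ["HIP_VISIBLE_DEVICES"] = args.cuda_device  # ROCm equivalent

import torch  # noqa: E402  imported after the environment is set

if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        print(f"Device {i}: {torch.cuda.get_device_name(i)}")
```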
Jedrzej Kosinski
441cfd1a7a Merge branch 'master' into multigpu_support 2025-02-06 08:10:48 -06:00
comfyanonymous
14880e6dba Remove some useless code. 2025-02-06 05:00:37 -05:00
comfyanonymous
37cd448529 Set the shift for Lumina back to 6. 2025-02-05 14:49:52 -05:00
comfyanonymous
94f21f9301 Upcasting rope to fp32 seems to make no difference in this model. 2025-02-05 04:32:47 -05:00
comfyanonymous
60653004e5 Use regular numbers for rope in lumina model. 2025-02-05 04:17:25 -05:00
comfyanonymous
a57d635c5f Fix lumina 2 batches. 2025-02-04 21:48:11 -05:00
comfyanonymous
8ac2dddeed Lower the default shift of lumina to reduce artifacts. 2025-02-04 06:50:37 -05:00
comfyanonymous
3e880ac709 Fix on python 3.9 2025-02-04 04:20:56 -05:00
comfyanonymous
e5ea112a90 Support Lumina 2 model. 2025-02-04 04:16:30 -05:00
comfyanonymous
44e19a28d3 Use maximum negative value instead of -inf for masks in text encoders.
This is probably more correct.
2025-02-02 09:46:00 -05:00
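The motivation, sketched standalone: torch.finfo(dtype).min stays representable in fp16/bf16, while a literal -inf can turn a fully masked row into NaNs after softmax. This is an illustration, not the text encoders' actual masking code.

```python
import torch


def build_attention_mask(pad_mask: torch.Tensor, dtype: torch.dtype) -> torch.Tensor:
    """pad_mask: (batch, seq) bool, True where tokens are valid."""
    neg = torch.finfo(dtype).min  # most negative representable value instead of -inf
    mask = torch.zeros(pad_mask.shape, dtype=dtype)
    return mask.masked_fill(~pad_mask, neg).unsqueeze(1).unsqueeze(1)  # (batch, 1, 1, seq)


print(build_attention_mask(torch.tensor([[True, True, False]]), torch.float16))
```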
Dr.Lt.Data
0a0df5f136 better guide message for sageattention (#6634) 2025-02-02 09:26:47 -05:00
KarryCharon
24d6871e47 add disable-compres-response-body cli args; add compress middleware; (#6672) 2025-02-02 09:24:55 -05:00
Jedrzej Kosinski
99a5c1068a Merge branch 'master' into multigpu_support 2025-02-02 03:19:18 -06:00
comfyanonymous
9e1d301129 Only use stable cascade lora format with cascade model. 2025-02-01 06:35:22 -05:00
comfyanonymous
8d8dc9a262 Allow batch of different sigmas when noise scaling. 2025-01-30 06:49:52 -05:00
Jedrzej Kosinski
02747cde7d Carry over change from _calc_cond_batch into _calc_cond_batch_multigpu 2025-01-29 11:10:23 -06:00
filtered
222f48c0f2 Allow changing folder_paths.base_path via command line argument. (#6600)
* Reimpl. CLI arg directly inside folder_paths.

* Update tests to use CLI arg mocking.

* Revert last-minute refactor.

* Fix test state pollution.
2025-01-29 08:06:28 -05:00
comfyanonymous
13fd4d6e45 More friendly error messages for corrupted safetensors files. 2025-01-28 09:41:09 -05:00
Jedrzej Kosinski
0b3233b4e2 Merge remote-tracking branch 'origin/master' into multigpu_support 2025-01-28 06:11:07 -06:00
Jedrzej Kosinski
eda866bf51 Extracted multigpu core code into multigpu.py, added load_balance_devices to get subdivision of work based on available devices and splittable work item count, added MultiGPU Options nodes to set relative_speed of specific devices; does not change behavior yet 2025-01-27 06:25:48 -06:00
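load_balance_devices and relative_speed are names taken from the commit message; the proportional split below is only a guess at what the subdivision of work could look like, not the code in multigpu.py.

```python
def load_balance_devices(work_items: int, relative_speeds: dict) -> dict:
    """Split work_items across devices proportionally to their relative speed."""
    total = sum(relative_speeds.values())
    counts = {dev: int(work_items * speed / total)
              for dev, speed in relative_speeds.items()}
    # hand the rounding remainder to the fastest devices first
    remainder = work_items - sum(counts.values())
    for dev in sorted(relative_speeds, key=relative_speeds.get, reverse=True)[:remainder]:
        counts[dev] += 1
    return counts


print(load_balance_devices(10, {"cuda:0": 1.0, "cuda:1": 0.5}))
```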
comfyanonymous
255edf2246 Lower minimum ratio of loaded weights on Nvidia. 2025-01-27 05:26:51 -05:00
Jedrzej Kosinski
c7feef9060 Cast transformer_options for multigpu 2025-01-26 05:29:27 -06:00
comfyanonymous
67feb05299 Remove redundant code. 2025-01-25 19:04:53 -05:00
Jedrzej Kosinski
51af7fa1b4 Fix multigpu ControlBase get_models and cleanup calls to avoid multiple calls of functions on multigpu_clones versions of controlnets 2025-01-25 06:05:01 -06:00
Jedrzej Kosinski
46969c380a Initial MultiGPU support for controlnets 2025-01-24 05:39:38 -06:00
comfyanonymous
14ca5f5a10 Remove useless code. 2025-01-24 06:15:54 -05:00