--fast now takes a number as an argument to indicate how fast you want it.

The idea is that you can indicate how much quality you are willing to trade for speed.

At the moment:

--fast 2 enables fp16 accumulation if your PyTorch build supports it.
--fast 5 enables fp8 matrix multiplication on fp8 models, in addition to the optimization above.
--fast without a number enables all optimizations (see the sketch below).
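
For reference, a minimal argparse sketch of a flag that behaves this way. This is an illustration only: the const/default values and the help text are assumptions, not the actual cli_args.py change from this commit.

    import argparse

    parser = argparse.ArgumentParser()
    # Sketch only: nargs="?" lets --fast be passed with or without a number.
    # default=0 means the optimizations are off, and const=99 stands in for
    # "enable everything" when --fast is given bare (values are assumptions).
    parser.add_argument("--fast", nargs="?", const=99, default=0, type=int,
                        help="Enable some untested and potentially quality deteriorating optimizations.")

    print(parser.parse_args(["--fast", "2"]).fast)  # 2 (fp16 accumulation level)
    print(parser.parse_args(["--fast"]).fast)       # 99 (all optimizations)
    print(parser.parse_args([]).fast)               # 0 (optimizations off)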
Author: comfyanonymous
Date: 2025-02-28 02:48:20 -05:00
Parent: eb4543474b
Commit: cf0b549d48
3 changed files with 4 additions and 3 deletions

@@ -280,9 +280,10 @@ if ENABLE_PYTORCH_ATTENTION:
 PRIORITIZE_FP16 = False # TODO: remove and replace with something that shows exactly which dtype is faster than the other
 try:
-    if is_nvidia() and args.fast:
+    if is_nvidia() and args.fast >= 2:
         torch.backends.cuda.matmul.allow_fp16_accumulation = True
         PRIORITIZE_FP16 = True # TODO: limit to cards where it actually boosts performance
         logging.info("Enabled fp16 accumulation.")
 except:
     pass
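
The hunk above only shows the level-2 gate (fp16 accumulation). The level-5 fp8 matrix multiplication path lives in the other changed files, which are not shown here; as a rough illustration of how the same numeric level could gate it, with purely hypothetical names:

    # Hypothetical helper, not part of this commit's visible diff.
    def should_use_fp8_matmul(fast_level: int, model_is_fp8: bool) -> bool:
        # Level 5 and above also enables fp8 matrix multiplication,
        # but only for models that are already stored in fp8.
        return model_is_fp8 and fast_level >= 5

    print(should_use_fp8_matmul(5, True))  # True: --fast 5 on an fp8 model
    print(should_use_fp8_matmul(2, True))  # False: level 2 keeps fp8 matmul off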