--fast now takes a number as an argument to indicate how fast you want it.

The idea is that you can indicate how much quality you are willing to trade for speed.

At the moment:

--fast 2 enables fp16 accumulation if your PyTorch build supports it.
--fast 5 enables fp8 matrix multiplication on fp8 models, in addition to the optimization above.
--fast without a number enables all optimizations (see the sketch below).
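
For reference, a minimal argparse sketch of a flag that behaves this way. This is an illustration only: the const/default values and the help text are assumptions, not the actual cli_args.py change from this commit.

    import argparse

    parser = argparse.ArgumentParser()
    # Sketch only: nargs="?" lets --fast be passed with or without a number.
    # default=0 means the optimizations are off, and const=99 stands in for
    # "enable everything" when --fast is given bare (values are assumptions).
    parser.add_argument("--fast", nargs="?", const=99, default=0, type=int,
                        help="Enable some untested and potentially quality deteriorating optimizations.")

    print(parser.parse_args(["--fast", "2"]).fast)  # 2 (fp16 accumulation level)
    print(parser.parse_args(["--fast"]).fast)       # 99 (all optimizations)
    print(parser.parse_args([]).fast)               # 0 (optimizations off)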
Author: comfyanonymous
Date: 2025-02-28 02:48:20 -05:00
Parent: eb4543474b
Commit: cf0b549d48
3 changed files with 4 additions and 3 deletions

@@ -280,9 +280,10 @@ if ENABLE_PYTORCH_ATTENTION:
 PRIORITIZE_FP16 = False # TODO: remove and replace with something that shows exactly which dtype is faster than the other
 try:
-    if is_nvidia() and args.fast:
+    if is_nvidia() and args.fast >= 2:
         torch.backends.cuda.matmul.allow_fp16_accumulation = True
         PRIORITIZE_FP16 = True # TODO: limit to cards where it actually boosts performance
         logging.info("Enabled fp16 accumulation.")
 except:
     pass
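
The hunk above only shows the level-2 gate (fp16 accumulation). The level-5 fp8 matrix multiplication path lives in the other changed files, which are not shown here; as a rough illustration of how the same numeric level could gate it, with purely hypothetical names:

    # Hypothetical helper, not part of this commit's visible diff.
    def should_use_fp8_matmul(fast_level: int, model_is_fp8: bool) -> bool:
        # Level 5 and above also enables fp8 matrix multiplication,
        # but only for models that are already stored in fp8.
        return model_is_fp8 and fast_level >= 5

    print(should_use_fp8_matmul(5, True))  # True: --fast 5 on an fp8 model
    print(should_use_fp8_matmul(2, True))  # False: level 2 keeps fp8 matmul off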