It can even run directly off an SSD
- I have also tested it at 1.73-bit (158GB):
- NVIDIA GeForce RTX 3090 + AMD Ryzen 9 5900X + 64GB RAM (DDR4 3600 XMP)
- llama_perf_sampler_print: sampling time = 33,60 ms / 512 runs ( 0,07 ms per token, 15236,28 tokens per second)
- llama_perf_context_print: load time = 122508,11 ms
- llama_perf_context_print: prompt eval time = 5295,91 ms / 10 tokens ( 529,59 ms per token, 1,89 tokens per second)
- llama_perf_context_print: eval time = 355534,51 ms / 501 runs ( 709,65 ms per token, 1,41 tokens per second)
- llama_perf_context_print: total time = 360931,55 ms / 511 tokens
- It's amazing!!! Running DeepSeek-R1-UD-IQ1_M, a 671B model, with 24GB of VRAM.
- EDIT: 7 layers offloaded.
https://old.reddit.com/r/LocalLL ... xxs_200gb_from_ssd/
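For anyone who wants to sanity-check the numbers in the quoted log, here is a rough Python sketch that re-derives tokens per second from the raw millisecond and token counts. The helper names (`parse_perf_line`, `tokens_per_second`) are made up for illustration and are not part of llama.cpp. It also assumes the comma in the log is a decimal separator (the poster's locale), not a thousands separator.

```python
import re

# Hypothetical helpers for illustration; not part of llama.cpp.
def parse_perf_line(line):
    """Return (milliseconds, count) from a llama_perf_* timing line, or None."""
    m = re.search(r"=\s*([\d.,]+)\s*ms\s*/\s*(\d+)\s*(?:runs|tokens)", line)
    if m is None:
        return None
    # Assumption: the comma is a decimal separator, as in the quoted log.
    ms = float(m.group(1).replace(",", "."))
    return ms, int(m.group(2))

def tokens_per_second(line):
    """Throughput = token count divided by elapsed seconds."""
    ms, count = parse_perf_line(line)
    return count / (ms / 1000.0)

# Lines copied from the quoted log.
prompt_line = ("llama_perf_context_print: prompt eval time = 5295,91 ms / 10 tokens "
               "( 529,59 ms per token, 1,89 tokens per second)")
eval_line = ("llama_perf_context_print: eval time = 355534,51 ms / 501 runs "
             "( 709,65 ms per token, 1,41 tokens per second)")

print(f"prompt eval: {tokens_per_second(prompt_line):.2f} tokens/s")
print(f"generation:  {tokens_per_second(eval_line):.2f} tokens/s")
```

The generation line is the one that matters for day-to-day use: 501 tokens in about 355.5 seconds, so the whole 511-token run took roughly six minutes on top of a load time of about two minutes.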