婆罗门
精华
|
战斗力 鹅
|
回帖 0
注册时间 2016-8-13
|
本帖最后由 9Suns 于 2025-2-18 10:17 编辑
KTransformer的新更新
https://github.com/kvcache-ai/kt ... ekR1_V3_tutorial.md
- [NEW!!!] Local 671B DeepSeek-Coder-V3/R1: Running its Q4_K_M version using only 14GB VRAM and 382GB DRAM.
- Prefill Speed (tokens/s):
- KTransformers: 54.21 (32 cores) → 74.362 (dual-socket, 2×32 cores) → 255.26 (optimized AMX-based MoE kernel, V0.3 only) → 286.55 (selectively using 6 experts, V0.3 only)
- Compared to 10.31 tokens/s in llama.cpp with 2×32 cores, achieving up to 27.79× speedup.
- Decode Speed (tokens/s):
- KTransformers: 8.73 (32 cores) → 11.26 (dual-socket, 2×32 cores) → 13.69 (selectively using 6 experts, V0.3 only)
- Compared to 4.51 tokens/s in llama.cpp with 2×32 cores, achieving up to 3.03× speedup.
看这个速度算是不错了,不过硬件要求还是有的, 跑的也只是Q4量化
- Model: DeepseekV3-q4km (int4)
- CPU: cpu_model_name: Intel (R) Xeon (R) Gold 6454S, 32 cores per socket, 2 sockets, 2 numa nodes
- GPU: 4090 24G VRAM
|
|