Summary
Hi RTP-LLM team, thank you for the great work on this inference engine!
I noticed that the latest release supports DeepSeek V1/V2/V3/R1, but DeepSeek-V4 is not yet mentioned in the supported models list. Meanwhile, other frameworks like vLLM (v0.20+) and SGLang have already added Day-0/early support for DeepSeek-V4.
Questions
- Is DeepSeek-V4 support currently on the RTP-LLM roadmap?
- If yes, is there an estimated timeline for when it might be available?
- Are there any specific technical challenges with V4 that the team is working on (e.g., MoE changes or new kernel requirements)?
Context
RTP-LLM already has excellent optimizations for DeepSeek-V3 (DeepEP, DeepGEMM, FlashMLA, EPLB, TP/DP/EP hybrid deployment), achieving impressive throughput numbers (Prefill 42.6K TPS, Decode 14.7K TPS on H800). Given the architectural continuity from V3 to V4, it would be great to know when we can expect V4 support.
Thanks!