<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[AIGC Newsletter: pxiaoer's  AI Notes]]></title><description><![CDATA[Notes on AI, products, and experiments by Pxiaoer]]></description><link>https://aigc.news/s/pxiaoer</link><image><url>https://substackcdn.com/image/fetch/$s_!syhd!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F08e14566-d83d-46bd-b836-3270106b5892_310x310.png</url><title>AIGC Newsletter: pxiaoer&apos;s  AI Notes</title><link>https://aigc.news/s/pxiaoer</link></image><generator>Substack</generator><lastBuildDate>Thu, 30 Apr 2026 04:52:21 GMT</lastBuildDate><atom:link href="https://aigc.news/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[pxiaoer]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[aigc@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[aigc@substack.com]]></itunes:email><itunes:name><![CDATA[pxiaoer]]></itunes:name></itunes:owner><itunes:author><![CDATA[pxiaoer]]></itunes:author><googleplay:owner><![CDATA[aigc@substack.com]]></googleplay:owner><googleplay:email><![CDATA[aigc@substack.com]]></googleplay:email><googleplay:author><![CDATA[pxiaoer]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[DeepSeek-V4 Launched, Then Cut Prices Twice in a Row]]></title><description><![CDATA[The model is the headline. 
The pricing is the strategy.]]></description><link>https://aigc.news/p/deepseek-v4-launched-then-cut-prices</link><guid isPermaLink="false">https://aigc.news/p/deepseek-v4-launched-then-cut-prices</guid><dc:creator><![CDATA[pxiaoer]]></dc:creator><pubDate>Mon, 27 Apr 2026 15:12:19 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/f4c4d06e-91ec-4cb0-9c95-0f5251e818e9_1376x768.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>DeepSeek-V4 was <a href="https://x.com/deepseek_ai/status/2047516922263285776">released</a>, and then DeepSeek cut prices twice in a row.</p><div class="twitter-embed" data-attrs="{&quot;url&quot;:&quot;https://x.com/deepseek_ai/status/2047516922263285776&quot;,&quot;full_text&quot;:&quot;&#128640; DeepSeek-V4 Preview is officially live &amp;amp; open-sourced! Welcome to the era of cost-effective 1M context length.\n\n&#128313; DeepSeek-V4-Pro: 1.6T total / 49B active params. Performance rivaling the world's top closed-source models.\n&#128313; DeepSeek-V4-Flash: 284B total / 13B active params. 
&quot;,&quot;username&quot;:&quot;deepseek_ai&quot;,&quot;name&quot;:&quot;DeepSeek&quot;,&quot;profile_image_url&quot;:&quot;https://pbs.substack.com/profile_images/1717417613775757312/Uk1zNOj4_normal.jpg&quot;,&quot;date&quot;:&quot;2026-04-24T03:24:09.000Z&quot;,&quot;photos&quot;:[{&quot;img_url&quot;:&quot;https://pbs.substack.com/media/HGo9ronbUAAKlRk.jpg&quot;,&quot;link_url&quot;:&quot;https://t.co/n1AgwMIymu&quot;}],&quot;quoted_tweet&quot;:{},&quot;reply_count&quot;:1556,&quot;retweet_count&quot;:7617,&quot;like_count&quot;:44462,&quot;impression_count&quot;:9103250,&quot;expanded_url&quot;:null,&quot;video_url&quot;:null,&quot;belowTheFold&quot;:false}" data-component-name="Twitter2ToDOM"></div><p></p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Hje5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71951366-8085-4e4b-88fa-5896b8755634_2531x601.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Hje5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71951366-8085-4e4b-88fa-5896b8755634_2531x601.png 424w, https://substackcdn.com/image/fetch/$s_!Hje5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71951366-8085-4e4b-88fa-5896b8755634_2531x601.png 848w, https://substackcdn.com/image/fetch/$s_!Hje5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71951366-8085-4e4b-88fa-5896b8755634_2531x601.png 1272w, 
https://substackcdn.com/image/fetch/$s_!Hje5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71951366-8085-4e4b-88fa-5896b8755634_2531x601.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Hje5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71951366-8085-4e4b-88fa-5896b8755634_2531x601.png" width="1456" height="346" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/71951366-8085-4e4b-88fa-5896b8755634_2531x601.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:346,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:434886,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://aigc.news/i/195637473?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71951366-8085-4e4b-88fa-5896b8755634_2531x601.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Hje5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71951366-8085-4e4b-88fa-5896b8755634_2531x601.png 424w, https://substackcdn.com/image/fetch/$s_!Hje5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71951366-8085-4e4b-88fa-5896b8755634_2531x601.png 848w, https://substackcdn.com/image/fetch/$s_!Hje5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71951366-8085-4e4b-88fa-5896b8755634_2531x601.png 1272w, 
https://substackcdn.com/image/fetch/$s_!Hje5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71951366-8085-4e4b-88fa-5896b8755634_2531x601.png 1456w" sizes="100vw" fetchpriority="high"></picture><div></div></div></a></figure></div><p>After the release, OpenClaw quickly added support for it. Judging by the price list, Flash is extremely cheap while Pro is relatively expensive: the gap between the two is around 12x.</p><p></p><p>V4 also quickly appeared on OpenRouter&#8217;s usage rankings, which suggests that people were already trying it in real workflows.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-0uN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3209b3fd-f6ca-446c-bb40-fe5628be707a_2120x1416.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-0uN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3209b3fd-f6ca-446c-bb40-fe5628be707a_2120x1416.png 424w, https://substackcdn.com/image/fetch/$s_!-0uN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3209b3fd-f6ca-446c-bb40-fe5628be707a_2120x1416.png 848w, https://substackcdn.com/image/fetch/$s_!-0uN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3209b3fd-f6ca-446c-bb40-fe5628be707a_2120x1416.png 1272w, https://substackcdn.com/image/fetch/$s_!-0uN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3209b3fd-f6ca-446c-bb40-fe5628be707a_2120x1416.png 1456w" 
sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-0uN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3209b3fd-f6ca-446c-bb40-fe5628be707a_2120x1416.png" width="1456" height="972" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3209b3fd-f6ca-446c-bb40-fe5628be707a_2120x1416.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:972,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:349661,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aigc.news/i/195637473?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3209b3fd-f6ca-446c-bb40-fe5628be707a_2120x1416.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!-0uN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3209b3fd-f6ca-446c-bb40-fe5628be707a_2120x1416.png 424w, https://substackcdn.com/image/fetch/$s_!-0uN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3209b3fd-f6ca-446c-bb40-fe5628be707a_2120x1416.png 848w, https://substackcdn.com/image/fetch/$s_!-0uN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3209b3fd-f6ca-446c-bb40-fe5628be707a_2120x1416.png 1272w, https://substackcdn.com/image/fetch/$s_!-0uN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3209b3fd-f6ca-446c-bb40-fe5628be707a_2120x1416.png 
1456w" sizes="100vw"></picture></div></a></figure></div><p></p><p>Then, on the second day, DeepSeek launched a 75% discount for the V4-Pro API.</p><div class="twitter-embed" data-attrs="{&quot;url&quot;:&quot;https://x.com/deepseek_ai/status/2048062777357750316&quot;,&quot;full_text&quot;:&quot;&#128293;DeepSeek-V4-Pro API is 75% OFF until May 5th, 2026, 15:59 (UTC Time)! 
Don't miss out on this massive discount.\n\n&#128736;&#65039;Integration Updates:\n&#128313;Claude Code: Set model to deepseek-v4-pro[1m]  to unlock 1M context!\n&#128313;OpenCode: Update to v1.14.24+\n&#128313;OpenClaw: Update to v2026.4.24+\n\nCheck &quot;,&quot;username&quot;:&quot;deepseek_ai&quot;,&quot;name&quot;:&quot;DeepSeek&quot;,&quot;profile_image_url&quot;:&quot;https://pbs.substack.com/profile_images/1717417613775757312/Uk1zNOj4_normal.jpg&quot;,&quot;date&quot;:&quot;2026-04-25T15:33:11.000Z&quot;,&quot;photos&quot;:[{&quot;img_url&quot;:&quot;https://pbs.substack.com/media/HGwt-7VasAAPM7i.jpg&quot;,&quot;link_url&quot;:&quot;https://t.co/rvf11eSsoJ&quot;}],&quot;quoted_tweet&quot;:{},&quot;reply_count&quot;:318,&quot;retweet_count&quot;:905,&quot;like_count&quot;:9088,&quot;impression_count&quot;:617926,&quot;expanded_url&quot;:null,&quot;video_url&quot;:null,&quot;belowTheFold&quot;:true}" data-component-name="Twitter2ToDOM"></div><p>The DeepSeek-V4-Pro API is now available at a limited-time 75% discount, ending at 15:59 UTC on May 5, 2026.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!TQlX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90aef194-b53b-4732-b620-0cac435baed5_1912x1012.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!TQlX!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90aef194-b53b-4732-b620-0cac435baed5_1912x1012.png 424w, https://substackcdn.com/image/fetch/$s_!TQlX!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90aef194-b53b-4732-b620-0cac435baed5_1912x1012.png 848w, 
https://substackcdn.com/image/fetch/$s_!TQlX!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90aef194-b53b-4732-b620-0cac435baed5_1912x1012.png 1272w, https://substackcdn.com/image/fetch/$s_!TQlX!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90aef194-b53b-4732-b620-0cac435baed5_1912x1012.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!TQlX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90aef194-b53b-4732-b620-0cac435baed5_1912x1012.png" width="1456" height="771" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/90aef194-b53b-4732-b620-0cac435baed5_1912x1012.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:771,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:912914,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aigc.news/i/195637473?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90aef194-b53b-4732-b620-0cac435baed5_1912x1012.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!TQlX!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90aef194-b53b-4732-b620-0cac435baed5_1912x1012.png 424w, https://substackcdn.com/image/fetch/$s_!TQlX!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90aef194-b53b-4732-b620-0cac435baed5_1912x1012.png 
848w, https://substackcdn.com/image/fetch/$s_!TQlX!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90aef194-b53b-4732-b620-0cac435baed5_1912x1012.png 1272w, https://substackcdn.com/image/fetch/$s_!TQlX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90aef194-b53b-4732-b620-0cac435baed5_1912x1012.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p></p><p>My guess is that the initial pricing of Pro was indeed a bit too high. 
Many people doing coding or agentic coding would naturally use Pro or Pro Max-like settings. A single day of use could easily cost dozens of RMB. Compared with a flat-rate coding plan, that is simply too expensive.</p><p>DeepSeek may have noticed this quickly and introduced the limited-time discount in response.</p><p>This also puts OpenRouter in an awkward position. If OpenRouter cannot adjust prices in time, many users will simply go back to the official API.</p><p></p><p>Then, on the third day, DeepSeek announced another price cut:</p><div class="twitter-embed" data-attrs="{&quot;url&quot;:&quot;https://x.com/deepseek_ai/status/2048440764368347611&quot;,&quot;full_text&quot;:&quot;&#128293;DeepSeek Input Cache Price Drop!\n\nEffective immediately, the price for input cache hits across the ENTIRE DeepSeek API series is reduced to just 1/10th of the original price! Build more efficiently for less.\n\n&#128204;Reminder: The DeepSeek-V4-Pro 75% OFF promotion is still active &quot;,&quot;username&quot;:&quot;deepseek_ai&quot;,&quot;name&quot;:&quot;DeepSeek&quot;,&quot;profile_image_url&quot;:&quot;https://pbs.substack.com/profile_images/1717417613775757312/Uk1zNOj4_normal.jpg&quot;,&quot;date&quot;:&quot;2026-04-26T16:35:11.000Z&quot;,&quot;photos&quot;:[{&quot;img_url&quot;:&quot;https://pbs.substack.com/media/HG2Fzg4bUAAXN5S.jpg&quot;,&quot;link_url&quot;:&quot;https://t.co/glpUSyzqKy&quot;}],&quot;quoted_tweet&quot;:{},&quot;reply_count&quot;:253,&quot;retweet_count&quot;:540,&quot;like_count&quot;:5530,&quot;impression_count&quot;:724715,&quot;expanded_url&quot;:null,&quot;video_url&quot;:null,&quot;belowTheFold&quot;:true}" data-component-name="Twitter2ToDOM"></div><p>Effective immediately, the input cache hit price for all DeepSeek API products has been reduced to one-tenth of the original price. 
Build applications more efficiently at a lower cost.</p><p>This cache price cut is the real killer move.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!C6ES!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f0f9c3b-c836-4fc4-9230-baaec70e5d32_1632x900.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!C6ES!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f0f9c3b-c836-4fc4-9230-baaec70e5d32_1632x900.png 424w, https://substackcdn.com/image/fetch/$s_!C6ES!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f0f9c3b-c836-4fc4-9230-baaec70e5d32_1632x900.png 848w, https://substackcdn.com/image/fetch/$s_!C6ES!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f0f9c3b-c836-4fc4-9230-baaec70e5d32_1632x900.png 1272w, https://substackcdn.com/image/fetch/$s_!C6ES!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f0f9c3b-c836-4fc4-9230-baaec70e5d32_1632x900.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!C6ES!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f0f9c3b-c836-4fc4-9230-baaec70e5d32_1632x900.png" width="1456" height="803" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5f0f9c3b-c836-4fc4-9230-baaec70e5d32_1632x900.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:803,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1063470,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aigc.news/i/195637473?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f0f9c3b-c836-4fc4-9230-baaec70e5d32_1632x900.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!C6ES!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f0f9c3b-c836-4fc4-9230-baaec70e5d32_1632x900.png 424w, https://substackcdn.com/image/fetch/$s_!C6ES!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f0f9c3b-c836-4fc4-9230-baaec70e5d32_1632x900.png 848w, https://substackcdn.com/image/fetch/$s_!C6ES!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f0f9c3b-c836-4fc4-9230-baaec70e5d32_1632x900.png 1272w, https://substackcdn.com/image/fetch/$s_!C6ES!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5f0f9c3b-c836-4fc4-9230-baaec70e5d32_1632x900.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Once the cache price drops to one-tenth of the original level, few vendors can really follow. DeepSeek is clearly starting to grab market share. 
My guess is that this price is close to what the normal price level will be once Huawei&#8217;s inference servers begin shipping at scale.</p><p></p><p>It almost feels intentional.</p><p>First, set the price high.<br>Then cut it twice.<br>Get trending twice.<br>And finally land at a price point that looks extremely competitive.</p><p>The result is that DeepSeek-V4 suddenly looks like one of the best price-performance options on the market.</p><p>This is not just a model release.</p><p>It is a pricing campaign, a developer acquisition campaign, and probably the beginning of a new round of inference cost competition.</p>]]></content:encoded></item><item><title><![CDATA[DeepSeek OpenSourceWeek Day 6: In-Depth Analysis of DeepSeek-V3/R1 Inference System Overview]]></title><description><![CDATA[DeepSeek's One More Thing]]></description><link>https://aigc.news/p/deepseek-opensourceweek-day-6-in</link><guid isPermaLink="false">https://aigc.news/p/deepseek-opensourceweek-day-6-in</guid><dc:creator><![CDATA[pxiaoer]]></dc:creator><pubDate>Sat, 01 Mar 2025 14:06:16 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/bd3e736c-5de2-4992-a509-7b602f727105_1400x788.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Today at 12 PM, DeepSeek added a "One More Thing" to its Open Source Week: details and cost calculations for the DeepSeek-V3/R1 inference system, a topic many readers will care about.</p><p></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!eZD2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd487ce70-2d54-4824-848a-b52b1852e531_1198x1040.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!eZD2!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd487ce70-2d54-4824-848a-b52b1852e531_1198x1040.png 424w, https://substackcdn.com/image/fetch/$s_!eZD2!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd487ce70-2d54-4824-848a-b52b1852e531_1198x1040.png 848w, https://substackcdn.com/image/fetch/$s_!eZD2!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd487ce70-2d54-4824-848a-b52b1852e531_1198x1040.png 1272w, https://substackcdn.com/image/fetch/$s_!eZD2!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd487ce70-2d54-4824-848a-b52b1852e531_1198x1040.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!eZD2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd487ce70-2d54-4824-848a-b52b1852e531_1198x1040.png" width="622" height="539.9666110183639" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d487ce70-2d54-4824-848a-b52b1852e531_1198x1040.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1040,&quot;width&quot;:1198,&quot;resizeWidth&quot;:622,&quot;bytes&quot;:518293,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://aigc.openbot.ai/i/158169889?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd487ce70-2d54-4824-848a-b52b1852e531_1198x1040.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!eZD2!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd487ce70-2d54-4824-848a-b52b1852e531_1198x1040.png 424w, https://substackcdn.com/image/fetch/$s_!eZD2!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd487ce70-2d54-4824-848a-b52b1852e531_1198x1040.png 848w, https://substackcdn.com/image/fetch/$s_!eZD2!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd487ce70-2d54-4824-848a-b52b1852e531_1198x1040.png 1272w, https://substackcdn.com/image/fetch/$s_!eZD2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd487ce70-2d54-4824-848a-b52b1852e531_1198x1040.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">https://x.com/deepseek_ai/status/1895688300574462431</figcaption></figure></div><p></p><p><strong>Inference System Design Principles</strong></p><p>DeepSeek first introduced the design principles of the inference system, which has two optimization goals: higher throughput and lower latency.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!prhG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fd2ef98-4b8b-4122-9e32-d6749d04cb5b_1280x572.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!prhG!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fd2ef98-4b8b-4122-9e32-d6749d04cb5b_1280x572.png 424w, https://substackcdn.com/image/fetch/$s_!prhG!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fd2ef98-4b8b-4122-9e32-d6749d04cb5b_1280x572.png 848w, https://substackcdn.com/image/fetch/$s_!prhG!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fd2ef98-4b8b-4122-9e32-d6749d04cb5b_1280x572.png 1272w, 
https://substackcdn.com/image/fetch/$s_!prhG!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fd2ef98-4b8b-4122-9e32-d6749d04cb5b_1280x572.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!prhG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fd2ef98-4b8b-4122-9e32-d6749d04cb5b_1280x572.png" width="1280" height="572" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3fd2ef98-4b8b-4122-9e32-d6749d04cb5b_1280x572.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:572,&quot;width&quot;:1280,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:289791,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aigc.openbot.ai/i/158169889?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fd2ef98-4b8b-4122-9e32-d6749d04cb5b_1280x572.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!prhG!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fd2ef98-4b8b-4122-9e32-d6749d04cb5b_1280x572.png 424w, https://substackcdn.com/image/fetch/$s_!prhG!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fd2ef98-4b8b-4122-9e32-d6749d04cb5b_1280x572.png 848w, https://substackcdn.com/image/fetch/$s_!prhG!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fd2ef98-4b8b-4122-9e32-d6749d04cb5b_1280x572.png 
1272w, https://substackcdn.com/image/fetch/$s_!prhG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fd2ef98-4b8b-4122-9e32-d6749d04cb5b_1280x572.png 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><em>Architecture Diagram</em></figcaption></figure></div><p>The solution currently adopted by DeepSeek: Large-Scale Node Expert Parallelism (EP)</p><ul><li><p>EP significantly increases batch size, thereby improving the efficiency of GPU matrix multiplication and boosting throughput.</p></li><li><p>EP distributes experts across different GPUs, with each 
GPU only needing to compute a small number of experts (which reduces memory access demands), lowering latency.</p></li></ul><p><strong>Introduced Complexity:</strong></p><ul><li><p>EP introduces cross-node communication. To keep throughput high, the computation workflow must be designed so that communication and computation overlap.</p></li><li><p>EP spans multiple nodes, so it inherently requires Data Parallelism (DP), and load must be balanced across the DP instances.</p></li></ul><p></p><p><strong>Large-Scale Cross-Node Expert Parallelism</strong></p><p>DeepSeek employs a multi-node, multi-GPU expert parallelism strategy:</p><ul><li><p><strong>Prefill:</strong> Routed Expert EP32, with MLA and Shared Expert DP32. One deployment unit spans 4 nodes and carries 32 redundant routed experts, with 9 routed experts and 1 shared expert per GPU.</p></li><li><p><strong>Decode:</strong> Routed Expert EP144, with MLA and Shared Expert DP144. One deployment unit spans 18 nodes and carries 32 redundant routed experts, with 2 routed experts and 1 shared expert per GPU.</p></li></ul><p>Multi-node, multi-GPU expert parallelism introduces significant communication overhead, so dual-batch overlapping is used to hide communication costs and improve overall throughput.</p><ul><li><p>In the <strong>prefill phase</strong>, the computation and communication of two batches are interleaved: one batch&#8217;s computation hides the communication overhead of the other.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!VulS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb089098-71f6-419b-9166-12f9635badb7_1280x281.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!VulS!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb089098-71f6-419b-9166-12f9635badb7_1280x281.png 424w, https://substackcdn.com/image/fetch/$s_!VulS!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb089098-71f6-419b-9166-12f9635badb7_1280x281.png 848w, https://substackcdn.com/image/fetch/$s_!VulS!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb089098-71f6-419b-9166-12f9635badb7_1280x281.png 1272w, https://substackcdn.com/image/fetch/$s_!VulS!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb089098-71f6-419b-9166-12f9635badb7_1280x281.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!VulS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb089098-71f6-419b-9166-12f9635badb7_1280x281.png" width="1280" height="281" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bb089098-71f6-419b-9166-12f9635badb7_1280x281.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:281,&quot;width&quot;:1280,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:185425,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aigc.openbot.ai/i/158169889?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb089098-71f6-419b-9166-12f9635badb7_1280x281.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" 
alt="" srcset="https://substackcdn.com/image/fetch/$s_!VulS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb089098-71f6-419b-9166-12f9635badb7_1280x281.png 424w, https://substackcdn.com/image/fetch/$s_!VulS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb089098-71f6-419b-9166-12f9635badb7_1280x281.png 848w, https://substackcdn.com/image/fetch/$s_!VulS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb089098-71f6-419b-9166-12f9635badb7_1280x281.png 1272w, https://substackcdn.com/image/fetch/$s_!VulS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbb089098-71f6-419b-9166-12f9635badb7_1280x281.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p></p><ul><li><p>In the <strong>decode phase</strong>, execution times vary across stages, so the attention part is split into two stages, creating a 5-stage pipeline to overlap computation and communication.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!IULZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ffbf181-8d5d-498b-b4ff-06c61d45af57_1280x306.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!IULZ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ffbf181-8d5d-498b-b4ff-06c61d45af57_1280x306.png 424w, 
https://substackcdn.com/image/fetch/$s_!IULZ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ffbf181-8d5d-498b-b4ff-06c61d45af57_1280x306.png 848w, https://substackcdn.com/image/fetch/$s_!IULZ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ffbf181-8d5d-498b-b4ff-06c61d45af57_1280x306.png 1272w, https://substackcdn.com/image/fetch/$s_!IULZ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ffbf181-8d5d-498b-b4ff-06c61d45af57_1280x306.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!IULZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ffbf181-8d5d-498b-b4ff-06c61d45af57_1280x306.png" width="1280" height="306" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6ffbf181-8d5d-498b-b4ff-06c61d45af57_1280x306.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:306,&quot;width&quot;:1280,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:220292,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aigc.openbot.ai/i/158169889?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ffbf181-8d5d-498b-b4ff-06c61d45af57_1280x306.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!IULZ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ffbf181-8d5d-498b-b4ff-06c61d45af57_1280x306.png 
424w, https://substackcdn.com/image/fetch/$s_!IULZ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ffbf181-8d5d-498b-b4ff-06c61d45af57_1280x306.png 848w, https://substackcdn.com/image/fetch/$s_!IULZ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ffbf181-8d5d-498b-b4ff-06c61d45af57_1280x306.png 1272w, https://substackcdn.com/image/fetch/$s_!IULZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ffbf181-8d5d-498b-b4ff-06c61d45af57_1280x306.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p></p><p>Due to the use of large-scale parallelism (including expert parallelism and data parallelism), some GPUs may become overloaded, necessitating computational and communication load balancing.</p><ul><li><p><strong>Prefill Load Balancer</strong></p><ul><li><p>Core Issue: Variations in the number and length of requests across different Data Parallelism (DP) instances lead to differences in core-attention computation and dispatch transmission volumes.</p></li><li><p>Optimization Goal: Ensure roughly equal computation load across GPUs (core-attention load balancing) and similar token input volumes (dispatch transmission load balancing) to avoid prolonged processing times on some GPUs.</p></li></ul></li><li><p><strong>Decode Load Balancer</strong></p><ul><li><p>Core Issue: Variations in the number and length of requests across DP instances result in differences in core-attention computation (related to KVCache usage) and dispatch transmission volumes.</p></li><li><p>Optimization Goal: Ensure roughly equal KVCache usage across GPUs (core-attention load balancing) and similar request volumes (dispatch transmission load balancing).</p></li></ul></li><li><p><strong>Expert-Parallel Load 
Balancer</strong></p><ul><li><p>Core Issue: For a given MoE model, certain naturally high-load experts exist, leading to uneven expert computation loads across GPUs.</p></li><li><p>Optimization Goal: Balance expert computation across GPUs (i.e., minimize the maximum dispatch reception volume across all GPUs).</p></li></ul></li></ul><p></p><p><strong>Real-World Statistics of the Online Inference System</strong></p><p>All DeepSeek R1/V3 services use H800 GPUs. Matrix computations and dispatch transmissions use FP8 format consistent with training, while core-attention computations and combine transmissions use BF16 consistent with training, maximizing service performance.</p><p>Since service loads are high during the day and low at night, all servers handle inference during peak load times, while some machines are freed up for research and training during low-load periods.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!hrWs!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8efbab03-a91b-45b9-a950-947e08ddbc62_1194x410.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!hrWs!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8efbab03-a91b-45b9-a950-947e08ddbc62_1194x410.png 424w, https://substackcdn.com/image/fetch/$s_!hrWs!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8efbab03-a91b-45b9-a950-947e08ddbc62_1194x410.png 848w, https://substackcdn.com/image/fetch/$s_!hrWs!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8efbab03-a91b-45b9-a950-947e08ddbc62_1194x410.png 1272w, 
https://substackcdn.com/image/fetch/$s_!hrWs!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8efbab03-a91b-45b9-a950-947e08ddbc62_1194x410.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!hrWs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8efbab03-a91b-45b9-a950-947e08ddbc62_1194x410.png" width="1194" height="410" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8efbab03-a91b-45b9-a950-947e08ddbc62_1194x410.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:410,&quot;width&quot;:1194,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:72772,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aigc.openbot.ai/i/158169889?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8efbab03-a91b-45b9-a950-947e08ddbc62_1194x410.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!hrWs!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8efbab03-a91b-45b9-a950-947e08ddbc62_1194x410.png 424w, https://substackcdn.com/image/fetch/$s_!hrWs!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8efbab03-a91b-45b9-a950-947e08ddbc62_1194x410.png 848w, https://substackcdn.com/image/fetch/$s_!hrWs!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8efbab03-a91b-45b9-a950-947e08ddbc62_1194x410.png 
1272w, https://substackcdn.com/image/fetch/$s_!hrWs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8efbab03-a91b-45b9-a950-947e08ddbc62_1194x410.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><p>Over the past 24 hours (Beijing time, 2025/02/27 12:00 to 2025/02/28 12:00), the total node usage for DeepSeek V3 and R1 inference services peaked at 278 nodes and averaged 226.75 nodes (each node with 8 H800 GPUs). 
Assuming a GPU rental cost of $2 per GPU-hour, the total daily cost is $87,072 (226.75 nodes &#215; 8 GPUs &#215; 24 hours &#215; $2).</p><p>Within this 24-hour statistical period, DeepSeek V3 and R1 recorded:</p><ul><li><p>Total input tokens: 608 billion, of which 342 billion (56.3%) hit the on-disk KVCache.</p></li><li><p>Total output tokens: 168 billion. The average output speed was 20&#8211;22 tokens per second (tps), with an average KVCache length of 4,989 tokens per output token.</p></li><li><p>Average throughput per H800 node:</p><ul><li><p>For prefill tasks: ~73.7k tokens/s input throughput (including cache hits).</p></li><li><p>For decode tasks: ~14.8k tokens/s output throughput.</p></li></ul></li></ul><p>These statistics cover all load from web, app, and API usage. If all tokens were priced at DeepSeek R1&#8217;s rates, the theoretical daily revenue would be $562,027, yielding a cost-profit margin of <strong>545%</strong> (profit of $474,955 divided by cost of $87,072).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!E-Lz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce9d85a9-398e-421e-b836-5a676b3fd890_1195x441.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!E-Lz!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce9d85a9-398e-421e-b836-5a676b3fd890_1195x441.png 424w, https://substackcdn.com/image/fetch/$s_!E-Lz!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce9d85a9-398e-421e-b836-5a676b3fd890_1195x441.png 848w, 
https://substackcdn.com/image/fetch/$s_!E-Lz!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce9d85a9-398e-421e-b836-5a676b3fd890_1195x441.png 1272w, https://substackcdn.com/image/fetch/$s_!E-Lz!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce9d85a9-398e-421e-b836-5a676b3fd890_1195x441.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!E-Lz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce9d85a9-398e-421e-b836-5a676b3fd890_1195x441.png" width="1195" height="441" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ce9d85a9-398e-421e-b836-5a676b3fd890_1195x441.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:441,&quot;width&quot;:1195,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:147690,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aigc.openbot.ai/i/158169889?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce9d85a9-398e-421e-b836-5a676b3fd890_1195x441.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!E-Lz!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce9d85a9-398e-421e-b836-5a676b3fd890_1195x441.png 424w, https://substackcdn.com/image/fetch/$s_!E-Lz!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce9d85a9-398e-421e-b836-5a676b3fd890_1195x441.png 
848w, https://substackcdn.com/image/fetch/$s_!E-Lz!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce9d85a9-398e-421e-b836-5a676b3fd890_1195x441.png 1272w, https://substackcdn.com/image/fetch/$s_!E-Lz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce9d85a9-398e-421e-b836-5a676b3fd890_1195x441.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><p>In reality, DeepSeek doesn&#8217;t generate this much revenue because V3&#8217;s pricing is lower, paid services only 
account for a portion of usage, and discounts are offered at night.</p><p></p><p><strong>What Can We Infer?</strong></p><p>Rumors previously suggested that DeepSeek&#8217;s official deployment was an inference cluster of 320 H800s. The peak now appears to be 278 nodes, or 2,224 H800s. Officially, DeepSeek acknowledges owning at least 10,000 H800s, so the GPUs used for inference are a relatively small share.</p><ul><li><p><strong>Cost:</strong> An average of 226.75 nodes (1,814 GPUs) at $2 per GPU-hour yields a daily cost of $87,072.</p></li><li><p><strong>Revenue:</strong> 608B input tokens and 168B output tokens, priced at R1 rates, give a theoretical daily revenue of $562,027.</p></li><li><p><strong>Gross Daily Profit:</strong> $562,027 &#8722; $87,072 = $474,955, or about 3.46 million RMB per day (an implied rate of roughly 7.28 RMB per USD).</p></li></ul><p>However, the above calculation overstates actual revenue. Even if the margin of roughly 5.5x cost were halved, it would remain very high. Many domestic vendors deploying DeepSeek have nonetheless shut down their API services due to losses, which raises the question of where the problem lies.</p><p>In an interview, Liang Wenfeng said: &#8220;We just do things at our own pace, then calculate costs and set prices. Our principle is not to lose money.&#8221;</p><p>This suggests DeepSeek is currently profitable, with earnings likely reinvested into R&amp;D. 
We look forward to R2&#8217;s release soon.</p>]]></content:encoded></item><item><title><![CDATA[DeepSeek OpenSourceWeek Day 5: In-Depth Analysis of 3FS and Smallpond]]></title><description><![CDATA[February 28th, the last day of February, also marks the final day of DeepSeek's Open Source Week.]]></description><link>https://aigc.news/p/deepseek-opensourceweek-day-5-in</link><guid isPermaLink="false">https://aigc.news/p/deepseek-opensourceweek-day-5-in</guid><dc:creator><![CDATA[pxiaoer]]></dc:creator><pubDate>Fri, 28 Feb 2025 16:01:58 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/3b95fc66-8fe9-43f5-8cad-5b55601e786a_1400x788.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>February 28th, the last day of February, also marks the final day of DeepSeek's Open Source Week. 
On this day, DeepSeek open-sourced two projects: <strong><a href="https://github.com/deepseek-ai/3FS">3FS</a></strong> and <strong><a href="https://github.com/deepseek-ai/smallpond">Smallpond</a></strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!pOYJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7999f92c-e545-4953-990e-eafc4ce0158e_1198x1076.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!pOYJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7999f92c-e545-4953-990e-eafc4ce0158e_1198x1076.png 424w, https://substackcdn.com/image/fetch/$s_!pOYJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7999f92c-e545-4953-990e-eafc4ce0158e_1198x1076.png 848w, https://substackcdn.com/image/fetch/$s_!pOYJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7999f92c-e545-4953-990e-eafc4ce0158e_1198x1076.png 1272w, https://substackcdn.com/image/fetch/$s_!pOYJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7999f92c-e545-4953-990e-eafc4ce0158e_1198x1076.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!pOYJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7999f92c-e545-4953-990e-eafc4ce0158e_1198x1076.png" width="609" height="546.9816360601002" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7999f92c-e545-4953-990e-eafc4ce0158e_1198x1076.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1076,&quot;width&quot;:1198,&quot;resizeWidth&quot;:609,&quot;bytes&quot;:633277,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://aigc.openbot.ai/i/158109566?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7999f92c-e545-4953-990e-eafc4ce0158e_1198x1076.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!pOYJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7999f92c-e545-4953-990e-eafc4ce0158e_1198x1076.png 424w, https://substackcdn.com/image/fetch/$s_!pOYJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7999f92c-e545-4953-990e-eafc4ce0158e_1198x1076.png 848w, https://substackcdn.com/image/fetch/$s_!pOYJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7999f92c-e545-4953-990e-eafc4ce0158e_1198x1076.png 1272w, https://substackcdn.com/image/fetch/$s_!pOYJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7999f92c-e545-4953-990e-eafc4ce0158e_1198x1076.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><p>According to the official introduction, the <strong>Fire-Flyer File System (3FS)</strong> is a parallel file system designed to fully utilize the bandwidth of modern SSDs and RDMA networks.</p><ul><li><p>Achieves an aggregate read throughput of <strong>6.6 TiB/s</strong> in a 180-node cluster.</p></li><li><p>Delivers <strong>3.66 TiB/minute</strong> throughput in the GraySort benchmark on a 25-node cluster.</p></li><li><p>Provides peak throughput of <strong>over 40 GiB/s</strong> per client node in KVCache lookups.</p></li><li><p>Features a disaggregated architecture with strong consistency semantics.</p></li><li><p>Supports training data preprocessing, dataset loading, checkpoint saving/reloading, embedding vector search, and KVCache lookups for V3/R1 inference.</p></li></ul><p><strong>Smallpond</strong>, on the other hand, 
is a data processing framework built on top of 3FS.</p><p></p><p><strong>Fire-Flyer File System (3FS)</strong></p><p>3FS is part of the <strong>Fire-Flyer AI-HPC</strong> developed by DeepSeek. It is detailed in the paper <em><a href="https://arxiv.org/abs/2408.14158">Fire-Flyer AI-HPC: A Cost-Effective Software-Hardware Co-Design for Deep Learning</a></em>.</p><p>Fire-Flyer AI-HPC consists of three components: the <strong><a href="https://github.com/HFAiLab/hai-platform">HAI Platform</a></strong> (open-sourced two years ago), <strong>3FS</strong> (open-sourced today), and <strong>HaiScale</strong> (yet to be open-sourced).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!GmOe!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5cb3ea3-3eed-4bc0-8f29-7fecd55a8e66_1026x832.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!GmOe!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5cb3ea3-3eed-4bc0-8f29-7fecd55a8e66_1026x832.png 424w, https://substackcdn.com/image/fetch/$s_!GmOe!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5cb3ea3-3eed-4bc0-8f29-7fecd55a8e66_1026x832.png 848w, https://substackcdn.com/image/fetch/$s_!GmOe!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5cb3ea3-3eed-4bc0-8f29-7fecd55a8e66_1026x832.png 1272w, https://substackcdn.com/image/fetch/$s_!GmOe!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5cb3ea3-3eed-4bc0-8f29-7fecd55a8e66_1026x832.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!GmOe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5cb3ea3-3eed-4bc0-8f29-7fecd55a8e66_1026x832.png" width="614" height="497.9025341130604" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c5cb3ea3-3eed-4bc0-8f29-7fecd55a8e66_1026x832.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:832,&quot;width&quot;:1026,&quot;resizeWidth&quot;:614,&quot;bytes&quot;:552823,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aigc.openbot.ai/i/158109566?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5cb3ea3-3eed-4bc0-8f29-7fecd55a8e66_1026x832.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!GmOe!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5cb3ea3-3eed-4bc0-8f29-7fecd55a8e66_1026x832.png 424w, https://substackcdn.com/image/fetch/$s_!GmOe!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5cb3ea3-3eed-4bc0-8f29-7fecd55a8e66_1026x832.png 848w, https://substackcdn.com/image/fetch/$s_!GmOe!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5cb3ea3-3eed-4bc0-8f29-7fecd55a8e66_1026x832.png 1272w, https://substackcdn.com/image/fetch/$s_!GmOe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc5cb3ea3-3eed-4bc0-8f29-7fecd55a8e66_1026x832.png 1456w" 
sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><p>In summary, 3FS has several key features:</p><ol><li><p><strong>High-Performance Design</strong>: 3FS is tailored to leverage the high IOPS (input/output operations per second) and throughput of NVMe SSDs, as well as RDMA networks. This design enables it to efficiently handle large-scale data requests, meeting the demands of deep learning and large-scale computing.</p></li><li><p><strong>System Architecture</strong>: The 3FS system comprises four roles: cluster manager, metadata service, storage service, and client. 
The metadata and storage services periodically send heartbeats to the cluster manager so that it can track which services are alive. Multiple cluster managers are deployed for high availability.</p></li><li><p><strong>Request Control Mechanism</strong>: 3FS implements a request transmission control mechanism to alleviate network congestion. Upon receiving a read request, the storage service asks the client for permission to transfer data. This limits the number of concurrent senders, maintaining good performance under high load.</p></li><li><p><strong>Strong Consistency with Chain Replication</strong>: 3FS adopts Chain Replication with Apportioned Queries (CRAQ) to provide strong consistency. File contents are split into blocks and replicated across a chain of storage targets, which lets the system use the full throughput and IOPS of all SSDs.</p></li><li><p><strong>High Throughput</strong>: By optimizing batch write and read operations, 3FS achieves write speeds exceeding <strong>10 GiB/s per node</strong>, accelerating checkpoint saving and loading, and reducing latency during training.</p></li><li><p><strong>3FS-KV System</strong>: 3FS also supports <strong>3FS-KV</strong>, a shared-storage distributed data processing system built on 3FS. 
It supports key-value storage, message queues, and object storage models, further enhancing system flexibility and performance.</p></li></ol><p>3FS provides robust storage support for deep learning and large-scale computing, effectively meeting demands for high throughput and low latency.</p><p></p><p><strong>Description from the Paper:</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!gPYH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cb98ce7-efa9-4ba1-a83d-b9ac651d4cf1_950x1706.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!gPYH!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cb98ce7-efa9-4ba1-a83d-b9ac651d4cf1_950x1706.png 424w, https://substackcdn.com/image/fetch/$s_!gPYH!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cb98ce7-efa9-4ba1-a83d-b9ac651d4cf1_950x1706.png 848w, https://substackcdn.com/image/fetch/$s_!gPYH!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cb98ce7-efa9-4ba1-a83d-b9ac651d4cf1_950x1706.png 1272w, https://substackcdn.com/image/fetch/$s_!gPYH!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cb98ce7-efa9-4ba1-a83d-b9ac651d4cf1_950x1706.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!gPYH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cb98ce7-efa9-4ba1-a83d-b9ac651d4cf1_950x1706.png" width="636" height="1142.122105263158" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7cb98ce7-efa9-4ba1-a83d-b9ac651d4cf1_950x1706.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1706,&quot;width&quot;:950,&quot;resizeWidth&quot;:636,&quot;bytes&quot;:801942,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aigc.openbot.ai/i/158109566?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cb98ce7-efa9-4ba1-a83d-b9ac651d4cf1_950x1706.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!gPYH!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cb98ce7-efa9-4ba1-a83d-b9ac651d4cf1_950x1706.png 424w, https://substackcdn.com/image/fetch/$s_!gPYH!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cb98ce7-efa9-4ba1-a83d-b9ac651d4cf1_950x1706.png 848w, https://substackcdn.com/image/fetch/$s_!gPYH!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cb98ce7-efa9-4ba1-a83d-b9ac651d4cf1_950x1706.png 1272w, https://substackcdn.com/image/fetch/$s_!gPYH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cb98ce7-efa9-4ba1-a83d-b9ac651d4cf1_950x1706.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6BS5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc86428aa-23de-4c48-8ef1-d6d576552c97_912x640.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6BS5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc86428aa-23de-4c48-8ef1-d6d576552c97_912x640.png 424w, 
https://substackcdn.com/image/fetch/$s_!6BS5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc86428aa-23de-4c48-8ef1-d6d576552c97_912x640.png 848w, https://substackcdn.com/image/fetch/$s_!6BS5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc86428aa-23de-4c48-8ef1-d6d576552c97_912x640.png 1272w, https://substackcdn.com/image/fetch/$s_!6BS5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc86428aa-23de-4c48-8ef1-d6d576552c97_912x640.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6BS5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc86428aa-23de-4c48-8ef1-d6d576552c97_912x640.png" width="634" height="444.9122807017544" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c86428aa-23de-4c48-8ef1-d6d576552c97_912x640.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:640,&quot;width&quot;:912,&quot;resizeWidth&quot;:634,&quot;bytes&quot;:339427,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aigc.openbot.ai/i/158109566?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc86428aa-23de-4c48-8ef1-d6d576552c97_912x640.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!6BS5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc86428aa-23de-4c48-8ef1-d6d576552c97_912x640.png 424w, https://substackcdn.com/image/fetch/$s_!6BS5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc86428aa-23de-4c48-8ef1-d6d576552c97_912x640.png 848w, https://substackcdn.com/image/fetch/$s_!6BS5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc86428aa-23de-4c48-8ef1-d6d576552c97_912x640.png 1272w, https://substackcdn.com/image/fetch/$s_!6BS5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc86428aa-23de-4c48-8ef1-d6d576552c97_912x640.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" 
stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!u0FS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca88bf57-8bd3-4b84-adb3-c067be797a67_924x772.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!u0FS!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca88bf57-8bd3-4b84-adb3-c067be797a67_924x772.png 424w, https://substackcdn.com/image/fetch/$s_!u0FS!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca88bf57-8bd3-4b84-adb3-c067be797a67_924x772.png 848w, https://substackcdn.com/image/fetch/$s_!u0FS!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca88bf57-8bd3-4b84-adb3-c067be797a67_924x772.png 1272w, https://substackcdn.com/image/fetch/$s_!u0FS!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca88bf57-8bd3-4b84-adb3-c067be797a67_924x772.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!u0FS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca88bf57-8bd3-4b84-adb3-c067be797a67_924x772.png" width="610" height="509.65367965367966" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ca88bf57-8bd3-4b84-adb3-c067be797a67_924x772.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:772,&quot;width&quot;:924,&quot;resizeWidth&quot;:610,&quot;bytes&quot;:404125,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aigc.openbot.ai/i/158109566?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca88bf57-8bd3-4b84-adb3-c067be797a67_924x772.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!u0FS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca88bf57-8bd3-4b84-adb3-c067be797a67_924x772.png 424w, https://substackcdn.com/image/fetch/$s_!u0FS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca88bf57-8bd3-4b84-adb3-c067be797a67_924x772.png 848w, https://substackcdn.com/image/fetch/$s_!u0FS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca88bf57-8bd3-4b84-adb3-c067be797a67_924x772.png 1272w, https://substackcdn.com/image/fetch/$s_!u0FS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca88bf57-8bd3-4b84-adb3-c067be797a67_924x772.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 
20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><p><strong>Performance</strong></p><p><strong>Peak Throughput</strong></p><p>The read throughput test ran on a 3FS cluster of <strong>180 storage nodes</strong>, each equipped with <strong>2&#215;200Gbps InfiniBand NICs</strong> and <strong>16&#215;14TiB NVMe SSDs</strong>. More than <strong>500 client nodes</strong>, each with a <strong>1&#215;200Gbps InfiniBand NIC</strong>, were used for the read stress test. 
Under background traffic from training jobs, the aggregate read throughput reached approximately <strong>6.6 TiB/s</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!bC3l!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6186eeaf-4791-4a6a-a799-d24d85a1798f_2048x669.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bC3l!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6186eeaf-4791-4a6a-a799-d24d85a1798f_2048x669.png 424w, https://substackcdn.com/image/fetch/$s_!bC3l!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6186eeaf-4791-4a6a-a799-d24d85a1798f_2048x669.png 848w, https://substackcdn.com/image/fetch/$s_!bC3l!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6186eeaf-4791-4a6a-a799-d24d85a1798f_2048x669.png 1272w, https://substackcdn.com/image/fetch/$s_!bC3l!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6186eeaf-4791-4a6a-a799-d24d85a1798f_2048x669.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!bC3l!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6186eeaf-4791-4a6a-a799-d24d85a1798f_2048x669.png" width="609" height="199.09615384615384" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6186eeaf-4791-4a6a-a799-d24d85a1798f_2048x669.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:476,&quot;width&quot;:1456,&quot;resizeWidth&quot;:609,&quot;bytes&quot;:659714,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aigc.openbot.ai/i/158109566?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6186eeaf-4791-4a6a-a799-d24d85a1798f_2048x669.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!bC3l!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6186eeaf-4791-4a6a-a799-d24d85a1798f_2048x669.png 424w, https://substackcdn.com/image/fetch/$s_!bC3l!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6186eeaf-4791-4a6a-a799-d24d85a1798f_2048x669.png 848w, https://substackcdn.com/image/fetch/$s_!bC3l!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6186eeaf-4791-4a6a-a799-d24d85a1798f_2048x669.png 1272w, https://substackcdn.com/image/fetch/$s_!bC3l!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6186eeaf-4791-4a6a-a799-d24d85a1798f_2048x669.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p></p><p><strong>Sorting Performance</strong></p><p>The test cluster consists of <strong>25 storage nodes</strong> (2 NUMA domains per node, 1 storage service per NUMA, 2&#215;400Gbps NICs per 
node) and <strong>50 compute nodes</strong> (2 NUMA domains, 192 physical cores, 2.2 TiB RAM, and 1&#215;200Gbps NIC per node). Sorting <strong>110.5 TiB of data</strong> across 8192 partitions took <strong>30 minutes and 14 seconds</strong>, achieving an average throughput of <strong>3.66 TiB/minute</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!O9Zo!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bb42440-857e-4604-98b9-065b905c50b7_1616x1054.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!O9Zo!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bb42440-857e-4604-98b9-065b905c50b7_1616x1054.png 424w, https://substackcdn.com/image/fetch/$s_!O9Zo!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bb42440-857e-4604-98b9-065b905c50b7_1616x1054.png 848w, https://substackcdn.com/image/fetch/$s_!O9Zo!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bb42440-857e-4604-98b9-065b905c50b7_1616x1054.png 1272w, https://substackcdn.com/image/fetch/$s_!O9Zo!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bb42440-857e-4604-98b9-065b905c50b7_1616x1054.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!O9Zo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bb42440-857e-4604-98b9-065b905c50b7_1616x1054.png" width="573" height="373.86675824175825" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7bb42440-857e-4604-98b9-065b905c50b7_1616x1054.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:950,&quot;width&quot;:1456,&quot;resizeWidth&quot;:573,&quot;bytes&quot;:1011744,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aigc.openbot.ai/i/158109566?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bb42440-857e-4604-98b9-065b905c50b7_1616x1054.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!O9Zo!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bb42440-857e-4604-98b9-065b905c50b7_1616x1054.png 424w, https://substackcdn.com/image/fetch/$s_!O9Zo!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bb42440-857e-4604-98b9-065b905c50b7_1616x1054.png 848w, https://substackcdn.com/image/fetch/$s_!O9Zo!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bb42440-857e-4604-98b9-065b905c50b7_1616x1054.png 1272w, https://substackcdn.com/image/fetch/$s_!O9Zo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7bb42440-857e-4604-98b9-065b905c50b7_1616x1054.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><p><strong>KVCache</strong></p><p>KVCache optimizes LLM inference by caching the key and value vectors of previous tokens in the decoder layers, which avoids recomputing them for every new token. The figure above shows the read throughput for all KVCache clients, highlighting peak and average values, with a peak throughput of up to <strong>40 GiB/s</strong>. 
The figure below shows the IOPS of delete operations during garbage collection (GC) over the same period.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Bfpx!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F467b1f42-3b83-4211-b89c-ed7624f9d626_1640x1078.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Bfpx!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F467b1f42-3b83-4211-b89c-ed7624f9d626_1640x1078.png 424w, https://substackcdn.com/image/fetch/$s_!Bfpx!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F467b1f42-3b83-4211-b89c-ed7624f9d626_1640x1078.png 848w, https://substackcdn.com/image/fetch/$s_!Bfpx!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F467b1f42-3b83-4211-b89c-ed7624f9d626_1640x1078.png 1272w, https://substackcdn.com/image/fetch/$s_!Bfpx!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F467b1f42-3b83-4211-b89c-ed7624f9d626_1640x1078.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Bfpx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F467b1f42-3b83-4211-b89c-ed7624f9d626_1640x1078.png" width="605" height="397.654532967033" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/467b1f42-3b83-4211-b89c-ed7624f9d626_1640x1078.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:957,&quot;width&quot;:1456,&quot;resizeWidth&quot;:605,&quot;bytes&quot;:1403643,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aigc.openbot.ai/i/158109566?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F467b1f42-3b83-4211-b89c-ed7624f9d626_1640x1078.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Bfpx!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F467b1f42-3b83-4211-b89c-ed7624f9d626_1640x1078.png 424w, https://substackcdn.com/image/fetch/$s_!Bfpx!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F467b1f42-3b83-4211-b89c-ed7624f9d626_1640x1078.png 848w, https://substackcdn.com/image/fetch/$s_!Bfpx!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F467b1f42-3b83-4211-b89c-ed7624f9d626_1640x1078.png 1272w, https://substackcdn.com/image/fetch/$s_!Bfpx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F467b1f42-3b83-4211-b89c-ed7624f9d626_1640x1078.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><p><strong>Why is a Specialized File System Like 3FS Needed?</strong></p><p>LLM training and inference workloads need a distributed file system that is highly concurrent, high-throughput, and scalable, and that also provides strong consistency, intelligent routing, and cache management. 3FS targets exactly this niche: it is built for RDMA networks and SSDs, has a flexible metadata design, and offers asynchronous zero-copy I/O.</p><p></p><p></p><p><strong>Code Analysis</strong></p><p>One notable aspect of 3FS&#8217;s implementation is that DeepSeek used <strong>Rust</strong> to develop the <strong>chunk_engine</strong>.</p><p>The <strong>chunk_engine</strong> is a core module at the bottom layer of the 3FS storage service, responsible for managing, allocating, and reclaiming physical disk blocks. 
The upper layers can read and write block data through this engine. It primarily uses <strong>cxx</strong> to automatically generate C++ bindings, allowing C++ code to directly call Rust code.</p><p>In recent years, Rust has gained popularity in the MLSys (Machine Learning Systems) field. For example, Hugging Face&#8217;s <strong><a href="https://github.com/huggingface/tokenizers">tokenizers</a></strong> library is also implemented in Rust. The DeepSeek team likely chose Rust for the chunk_engine for its maintainability, memory safety, and performance.</p><p>The DeepSeek team may also use the Rust framework <strong>Tokio</strong> in backend services, since I found several Rust open-source projects, including Tokio, in Quant AI&#8217;s open-source initiatives. I sincerely hope more teams adopt Rust for developing machine learning systems.</p><p></p><p><strong>Smallpond</strong></p><p><strong>Smallpond</strong> is a lightweight data processing framework built on top of <strong>DuckDB</strong> and <strong>3FS</strong>. It delivers high-performance data processing and scales to petabyte-scale datasets.</p><p>Installation and usage are straightforward. The API is minimal and comes in two styles: one builds dataflow graphs dynamically, the other constructs them statically.</p><p></p><p><strong>Installation:</strong></p><pre><code><code>pip install smallpond</code></code></pre><p><strong>Usage Example:</strong></p><pre><code><code># Download example data
# Run in a shell, not in Python: wget https://duckdb.org/data/prices.parquet

import smallpond
# Initialize session
sp = smallpond.init()
# Load data
df = sp.read_parquet("prices.parquet")
# Process data
df = df.repartition(3, hash_by="ticker")
df = sp.partial_sql("SELECT ticker, min(price), max(price) FROM {0} GROUP BY ticker", df)
# Save results
df.write_parquet("output/")
# Show results
print(df.to_pandas())</code></code></pre><p>For performance, refer to the sorting performance of 3FS.</p><p></p><p><strong>Conclusion of DeepSeek Open Source Week</strong></p><p>DeepSeek Open Source Week concludes today. Thank you, DeepSeek, for sharing valuable resources for everyone to learn and use.</p><p>Many teams have already taken action and achieved tangible performance improvements. For instance, the vLLM team replaced <strong>TRITON_MLA</strong> with <strong>FLASHMLA</strong>, boosting throughput by <strong>2-16%</strong>, delivering real results.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!80wj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85fefe48-aa59-4158-b085-ec4aeb5d98d0_1192x1506.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!80wj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85fefe48-aa59-4158-b085-ec4aeb5d98d0_1192x1506.png 424w, https://substackcdn.com/image/fetch/$s_!80wj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85fefe48-aa59-4158-b085-ec4aeb5d98d0_1192x1506.png 848w, https://substackcdn.com/image/fetch/$s_!80wj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85fefe48-aa59-4158-b085-ec4aeb5d98d0_1192x1506.png 1272w, https://substackcdn.com/image/fetch/$s_!80wj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85fefe48-aa59-4158-b085-ec4aeb5d98d0_1192x1506.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!80wj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85fefe48-aa59-4158-b085-ec4aeb5d98d0_1192x1506.png" width="497" height="627.9211409395973" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/85fefe48-aa59-4158-b085-ec4aeb5d98d0_1192x1506.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1506,&quot;width&quot;:1192,&quot;resizeWidth&quot;:497,&quot;bytes&quot;:588194,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aigc.openbot.ai/i/158109566?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85fefe48-aa59-4158-b085-ec4aeb5d98d0_1192x1506.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!80wj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85fefe48-aa59-4158-b085-ec4aeb5d98d0_1192x1506.png 424w, https://substackcdn.com/image/fetch/$s_!80wj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85fefe48-aa59-4158-b085-ec4aeb5d98d0_1192x1506.png 848w, https://substackcdn.com/image/fetch/$s_!80wj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85fefe48-aa59-4158-b085-ec4aeb5d98d0_1192x1506.png 1272w, https://substackcdn.com/image/fetch/$s_!80wj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85fefe48-aa59-4158-b085-ec4aeb5d98d0_1192x1506.png 
1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">https://x.com/vllm_project/status/1894994674630435123</figcaption></figure></div><p></p><p>These projects open-sourced by DeepSeek will continue to influence us. 
Our journey of intense learning continues.</p><p></p><p><strong>More</strong></p><ul><li><p>day0: <a href="https://aigc.openbot.ai/p/deepseek-opensourceweek-is-coming">https://aigc.openbot.ai/p/deepseek-opensourceweek-is-coming</a></p></li><li><p>day1: <a href="https://aigc.openbot.ai/p/deepseek-open-source-week-day-1-in">https://aigc.openbot.ai/p/deepseek-open-source-week-day-1-in</a></p></li><li><p>day2: <a href="https://aigc.openbot.ai/p/day-2-of-deepseek-opensourceweek">https://aigc.openbot.ai/p/day-2-of-deepseek-opensourceweek</a></p></li><li><p>day3: <a href="https://aigc.openbot.ai/p/deepseek-opensourceweek-day-3-deepgemm">https://aigc.openbot.ai/p/deepseek-opensourceweek-day-3-deepgemm</a></p></li><li><p>day4: <a href="https://aigc.openbot.ai/p/deepseek-opensourceweek-day-4-in">https://aigc.openbot.ai/p/deepseek-opensourceweek-day-4-in</a></p></li></ul><p></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://aigc.news/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://aigc.news/subscribe?"><span>Subscribe now</span></a></p><p></p>]]></content:encoded></item><item><title><![CDATA[DeepSeek OpenSourceWeek Day 4: In-Depth Analysis of DualPipe & EPLB]]></title><description><![CDATA[Today marks the fourth day of DeepSeek Open Source Week, and DeepSeek has introduced three projects, all centered around optimizing parallel strategies for V3/R1 training and inference.]]></description><link>https://aigc.news/p/deepseek-opensourceweek-day-4-in</link><guid isPermaLink="false">https://aigc.news/p/deepseek-opensourceweek-day-4-in</guid><dc:creator><![CDATA[pxiaoer]]></dc:creator><pubDate>Thu, 27 Feb 2025 15:25:15 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/1e6dbf42-d5ce-4021-bcb5-18e006c5fe56_1000x420.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Today marks 
the fourth day of DeepSeek Open Source Week, and DeepSeek has introduced three projects, all centered around optimizing parallel strategies for V3/R1 training and inference.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!BqTT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30e58d2b-3cd4-4adf-8c26-16e63e27aaed_1194x1384.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!BqTT!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30e58d2b-3cd4-4adf-8c26-16e63e27aaed_1194x1384.png 424w, https://substackcdn.com/image/fetch/$s_!BqTT!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30e58d2b-3cd4-4adf-8c26-16e63e27aaed_1194x1384.png 848w, https://substackcdn.com/image/fetch/$s_!BqTT!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30e58d2b-3cd4-4adf-8c26-16e63e27aaed_1194x1384.png 1272w, https://substackcdn.com/image/fetch/$s_!BqTT!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30e58d2b-3cd4-4adf-8c26-16e63e27aaed_1194x1384.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!BqTT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30e58d2b-3cd4-4adf-8c26-16e63e27aaed_1194x1384.png" width="569" height="659.5443886097153" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/30e58d2b-3cd4-4adf-8c26-16e63e27aaed_1194x1384.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1384,&quot;width&quot;:1194,&quot;resizeWidth&quot;:569,&quot;bytes&quot;:590801,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://aigc.openbot.ai/i/158040173?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30e58d2b-3cd4-4adf-8c26-16e63e27aaed_1194x1384.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!BqTT!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30e58d2b-3cd4-4adf-8c26-16e63e27aaed_1194x1384.png 424w, https://substackcdn.com/image/fetch/$s_!BqTT!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30e58d2b-3cd4-4adf-8c26-16e63e27aaed_1194x1384.png 848w, https://substackcdn.com/image/fetch/$s_!BqTT!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30e58d2b-3cd4-4adf-8c26-16e63e27aaed_1194x1384.png 1272w, https://substackcdn.com/image/fetch/$s_!BqTT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30e58d2b-3cd4-4adf-8c26-16e63e27aaed_1194x1384.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><p><strong><a href="https://github.com/deepseek-ai/DualPipe">DualPipe</a></strong> is a bidirectional pipeline parallelism algorithm designed for computation-communication overlap in V3/R1 training. Meanwhile, <strong><a href="https://github.com/deepseek-ai/eplb">EPLB</a></strong> serves as an expert-parallel load balancer for V3/R1.</p><p>The final project, <strong><a href="https://github.com/deepseek-ai/profile-data">profile-data</a></strong>, primarily releases analytical data from DeepSeek&#8217;s infrastructure for training and inference. 
So far, it covers training and the prefilling stage of inference; the profiling data for the decoding stage of inference has yet to be made public.</p><p>Today, we&#8217;ll dive into an analysis of the DualPipe and EPLB projects, both of which lean toward engineering optimization.</p><p></p><p><strong>DualPipe</strong></p><p>DualPipe is introduced in the DeepSeek V3 paper as a bidirectional pipeline parallelism algorithm. Its main purpose is to overlap computation with communication and so improve training efficiency for large-scale models.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!deYq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0fa3f89a-58c2-4eb0-a368-76d8d1d72f1e_1202x756.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!deYq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0fa3f89a-58c2-4eb0-a368-76d8d1d72f1e_1202x756.png 424w, https://substackcdn.com/image/fetch/$s_!deYq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0fa3f89a-58c2-4eb0-a368-76d8d1d72f1e_1202x756.png 848w, https://substackcdn.com/image/fetch/$s_!deYq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0fa3f89a-58c2-4eb0-a368-76d8d1d72f1e_1202x756.png 1272w, https://substackcdn.com/image/fetch/$s_!deYq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0fa3f89a-58c2-4eb0-a368-76d8d1d72f1e_1202x756.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!deYq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0fa3f89a-58c2-4eb0-a368-76d8d1d72f1e_1202x756.png" width="1202" height="756" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0fa3f89a-58c2-4eb0-a368-76d8d1d72f1e_1202x756.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:756,&quot;width&quot;:1202,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:664621,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aigc.openbot.ai/i/158040173?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0fa3f89a-58c2-4eb0-a368-76d8d1d72f1e_1202x756.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!deYq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0fa3f89a-58c2-4eb0-a368-76d8d1d72f1e_1202x756.png 424w, https://substackcdn.com/image/fetch/$s_!deYq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0fa3f89a-58c2-4eb0-a368-76d8d1d72f1e_1202x756.png 848w, https://substackcdn.com/image/fetch/$s_!deYq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0fa3f89a-58c2-4eb0-a368-76d8d1d72f1e_1202x756.png 1272w, https://substackcdn.com/image/fetch/$s_!deYq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0fa3f89a-58c2-4eb0-a368-76d8d1d72f1e_1202x756.png 1456w" sizes="100vw" 
loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><p><strong>Key Features:</strong></p><ul><li><p><strong>Computation-Communication Overlap</strong><br>DualPipe&#8217;s design aims to maximize cluster computing performance by achieving full overlap of computation and communication during forward and backward passes, reducing idle wait times typical in traditional pipeline parallelism. 
This is especially critical for cross-node expert parallelism (EP) in MoE models, where cross-node communication is expensive.</p></li><li><p><strong>Bidirectional Scheduling</strong><br>DualPipe employs a bidirectional scheduling strategy, feeding micro-batches from both ends of the pipeline simultaneously to use hardware resources efficiently. Its schedule is organized into 8 steps, which is intricate but highly effective.</p></li><li><p><strong>Memory Optimization</strong><br>DualPipe deploys the shallowest layers (including the embedding layer) and the deepest layers (including the output layer) on the same pipeline-parallel rank (PP rank), enabling physical sharing of parameters and gradients to further enhance memory efficiency.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!X2pe!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d35c3e1-6bdb-41e6-8bf9-e972fad22239_1674x548.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!X2pe!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d35c3e1-6bdb-41e6-8bf9-e972fad22239_1674x548.png 424w, https://substackcdn.com/image/fetch/$s_!X2pe!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d35c3e1-6bdb-41e6-8bf9-e972fad22239_1674x548.png 848w, https://substackcdn.com/image/fetch/$s_!X2pe!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d35c3e1-6bdb-41e6-8bf9-e972fad22239_1674x548.png 1272w, 
https://substackcdn.com/image/fetch/$s_!X2pe!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d35c3e1-6bdb-41e6-8bf9-e972fad22239_1674x548.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!X2pe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d35c3e1-6bdb-41e6-8bf9-e972fad22239_1674x548.png" width="1456" height="477" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3d35c3e1-6bdb-41e6-8bf9-e972fad22239_1674x548.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:477,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:562902,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aigc.openbot.ai/i/158040173?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d35c3e1-6bdb-41e6-8bf9-e972fad22239_1674x548.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!X2pe!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d35c3e1-6bdb-41e6-8bf9-e972fad22239_1674x548.png 424w, https://substackcdn.com/image/fetch/$s_!X2pe!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d35c3e1-6bdb-41e6-8bf9-e972fad22239_1674x548.png 848w, https://substackcdn.com/image/fetch/$s_!X2pe!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d35c3e1-6bdb-41e6-8bf9-e972fad22239_1674x548.png 
1272w, https://substackcdn.com/image/fetch/$s_!X2pe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3d35c3e1-6bdb-41e6-8bf9-e972fad22239_1674x548.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><p><strong>Pipeline Bubble and Memory Usage Comparison</strong> (Pipeline bubble refers to idle wait time)</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!TqRF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49840cee-cac4-4fa3-9047-de3d64db9506_1682x1036.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!TqRF!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49840cee-cac4-4fa3-9047-de3d64db9506_1682x1036.png 424w, https://substackcdn.com/image/fetch/$s_!TqRF!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49840cee-cac4-4fa3-9047-de3d64db9506_1682x1036.png 848w, https://substackcdn.com/image/fetch/$s_!TqRF!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49840cee-cac4-4fa3-9047-de3d64db9506_1682x1036.png 1272w, https://substackcdn.com/image/fetch/$s_!TqRF!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49840cee-cac4-4fa3-9047-de3d64db9506_1682x1036.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!TqRF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49840cee-cac4-4fa3-9047-de3d64db9506_1682x1036.png" width="1456" height="897" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/49840cee-cac4-4fa3-9047-de3d64db9506_1682x1036.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:897,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:734628,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aigc.openbot.ai/i/158040173?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49840cee-cac4-4fa3-9047-de3d64db9506_1682x1036.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!TqRF!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49840cee-cac4-4fa3-9047-de3d64db9506_1682x1036.png 424w, https://substackcdn.com/image/fetch/$s_!TqRF!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49840cee-cac4-4fa3-9047-de3d64db9506_1682x1036.png 848w, https://substackcdn.com/image/fetch/$s_!TqRF!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49840cee-cac4-4fa3-9047-de3d64db9506_1682x1036.png 1272w, https://substackcdn.com/image/fetch/$s_!TqRF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F49840cee-cac4-4fa3-9047-de3d64db9506_1682x1036.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><p>For those interested, you can check out the code&#8212;it&#8217;s not very long and is great for learning.</p><p></p><p><strong>EPLB</strong></p><p><strong>EPLB (Expert Parallelism Load Balancer)</strong> is primarily designed to optimize the distributed deployment of MoE models. It ensures load balancing among different experts in the MoE portion by replicating shared experts and fine-grained high-load experts across multiple GPUs in the cluster. 
This allows GPUs to handle more "hot data" (data sent to shared experts) efficiently.</p><p>EPLB isn&#8217;t detailed in DeepSeek&#8217;s paper, and its code is remarkably concise at just 160 lines.</p><p><strong>Key Features:</strong></p><ul><li><p><strong>Load Balancing Optimization</strong><br>It replicates high-load experts (the "redundant experts" strategy) and then heuristically packs the replicas onto GPUs so that workloads stay balanced across GPUs.</p></li><li><p><strong>Hierarchical Load Balancing</strong><br>EPLB works in three steps: pack expert groups onto nodes &#8594; replicate experts within each node &#8594; pack the replicas onto GPUs. It keeps experts from the same group on the same node to reduce cross-node data transfers, then balances load at each step. This fits DeepSeek V3&#8217;s Group-Limited Expert Routing strategy and makes distributed MoE deployment more efficient.</p></li><li><p><strong>Dynamic Scheduling Strategy</strong><br>EPLB selects a load balancing strategy based on the deployment: the hierarchical strategy when the number of expert groups is divisible by the number of nodes (typically the prefilling phase), and the global strategy otherwise (typically the decoding phase).</p></li></ul><div><hr></div><p><strong>Let&#8217;s Look at the Code:</strong></p><p><strong>Redundant Expert Strategy</strong></p><pre><code><code>def replicate_experts(weight: torch.Tensor, num_phy: int):
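    # Excerpt: num_log, phy2log, logcnt, and arangen are set up earlier in the full function.
    # Replicate high-load experts: each spare physical slot goes to the
    # logical expert with the highest load per existing replica.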
    for i in range(num_log, num_phy):
        redundant_indices = (weight / logcnt).max(dim=-1).indices
        phy2log[:, i] = redundant_indices
        logcnt[arangen, redundant_indices] += 1</code></code></pre><p></p><p><strong>Hierarchical Load Balancing</strong></p><pre><code><code>def rebalance_experts_hierarchical():
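    # Excerpt: arguments and the index maps (mlog2log, mlogcnt, phy2mlog) come from the full function.
    # Step 1: Sum the load of each expert group, then pack groups onto nodes evenly
    tokens_per_group = weight.unflatten(-1, (num_groups, group_size)).sum(-1)
    group_pack_index, group_rank_in_pack = balanced_packing(tokens_per_group, num_nodes)

    # Step 2: Build redundant experts within nodes, using per-node logical expert loads
    tokens_per_mlog = weight.gather(-1, mlog2log).view(-1, num_logical_experts // num_nodes)

    # Step 3: Pack physical experts to GPUs; each replica carries an equal share of its expert's load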
    tokens_per_phy = (tokens_per_mlog / mlogcnt).gather(-1, phy2mlog)</code></code></pre><p><strong>Dynamic Scheduling Strategy</strong></p><pre><code><code>def rebalance_experts():
    if num_groups % num_nodes == 0:
        # Expert groups divide evenly across nodes: use hierarchical strategy
        phy2log, phyrank, logcnt = rebalance_experts_hierarchical()
    else:
        # Otherwise use global strategy, which replicates experts regardless of groups
        phy2log, phyrank, logcnt = replicate_experts()</code></code></pre><p>Interested readers can explore the full code.</p><p></p><p>Tomorrow is the final day of DeepSeek Open Source Week&#8212;will they drop a heavyweight open-source project? Let&#8217;s wait and see!</p><p></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://aigc.news/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://aigc.news/subscribe?"><span>Subscribe now</span></a></p><p></p>]]></content:encoded></item><item><title><![CDATA[DeepSeek OpenSourceWeek Day 3: DeepGEMM In-Depth Analysis]]></title><description><![CDATA[Today marks the third day of DeepSeek's Open Source Week, with the release of DeepGEMM right on schedule at 9 AM.]]></description><link>https://aigc.news/p/deepseek-opensourceweek-day-3-deepgemm</link><guid isPermaLink="false">https://aigc.news/p/deepseek-opensourceweek-day-3-deepgemm</guid><dc:creator><![CDATA[pxiaoer]]></dc:creator><pubDate>Wed, 26 Feb 2025 16:06:08 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/7577d166-4b4d-49bd-9b40-be89b41dd8d7_800x450.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p></p><p>Today marks the third day of DeepSeek's Open Source Week, with the release of DeepGEMM right on schedule at 9 AM.<br></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!LbJr!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef10be7c-2bcb-4ad0-89bb-5eb4d02a0ab1_1194x850.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!LbJr!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef10be7c-2bcb-4ad0-89bb-5eb4d02a0ab1_1194x850.png 424w, https://substackcdn.com/image/fetch/$s_!LbJr!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef10be7c-2bcb-4ad0-89bb-5eb4d02a0ab1_1194x850.png 848w, https://substackcdn.com/image/fetch/$s_!LbJr!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef10be7c-2bcb-4ad0-89bb-5eb4d02a0ab1_1194x850.png 1272w, https://substackcdn.com/image/fetch/$s_!LbJr!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef10be7c-2bcb-4ad0-89bb-5eb4d02a0ab1_1194x850.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!LbJr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef10be7c-2bcb-4ad0-89bb-5eb4d02a0ab1_1194x850.png" width="620" height="441.37353433835847" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ef10be7c-2bcb-4ad0-89bb-5eb4d02a0ab1_1194x850.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:850,&quot;width&quot;:1194,&quot;resizeWidth&quot;:620,&quot;bytes&quot;:452047,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://aigc.openbot.ai/i/157971007?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef10be7c-2bcb-4ad0-89bb-5eb4d02a0ab1_1194x850.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!LbJr!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef10be7c-2bcb-4ad0-89bb-5eb4d02a0ab1_1194x850.png 424w, https://substackcdn.com/image/fetch/$s_!LbJr!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef10be7c-2bcb-4ad0-89bb-5eb4d02a0ab1_1194x850.png 848w, https://substackcdn.com/image/fetch/$s_!LbJr!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef10be7c-2bcb-4ad0-89bb-5eb4d02a0ab1_1194x850.png 1272w, https://substackcdn.com/image/fetch/$s_!LbJr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef10be7c-2bcb-4ad0-89bb-5eb4d02a0ab1_1194x850.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" 
stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><p>As of now, the project has garnered 3.3k stars since its release.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!M6GC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F63fb62ae-5308-4fc6-a423-d85d0084a330_2488x1128.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!M6GC!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F63fb62ae-5308-4fc6-a423-d85d0084a330_2488x1128.png 424w, https://substackcdn.com/image/fetch/$s_!M6GC!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F63fb62ae-5308-4fc6-a423-d85d0084a330_2488x1128.png 848w, https://substackcdn.com/image/fetch/$s_!M6GC!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F63fb62ae-5308-4fc6-a423-d85d0084a330_2488x1128.png 1272w, https://substackcdn.com/image/fetch/$s_!M6GC!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F63fb62ae-5308-4fc6-a423-d85d0084a330_2488x1128.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!M6GC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F63fb62ae-5308-4fc6-a423-d85d0084a330_2488x1128.png" width="674" height="305.52197802197804" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/63fb62ae-5308-4fc6-a423-d85d0084a330_2488x1128.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:660,&quot;width&quot;:1456,&quot;resizeWidth&quot;:674,&quot;bytes&quot;:288381,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aigc.openbot.ai/i/157971007?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F63fb62ae-5308-4fc6-a423-d85d0084a330_2488x1128.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!M6GC!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F63fb62ae-5308-4fc6-a423-d85d0084a330_2488x1128.png 424w, https://substackcdn.com/image/fetch/$s_!M6GC!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F63fb62ae-5308-4fc6-a423-d85d0084a330_2488x1128.png 848w, https://substackcdn.com/image/fetch/$s_!M6GC!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F63fb62ae-5308-4fc6-a423-d85d0084a330_2488x1128.png 1272w, https://substackcdn.com/image/fetch/$s_!M6GC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F63fb62ae-5308-4fc6-a423-d85d0084a330_2488x1128.png 
1456w" sizes="100vw"></picture></div></a><figcaption class="image-caption">https://github.com/deepseek-ai/DeepGEMM</figcaption></figure></div><p></p><p>The official introduction describes DeepGEMM as an FP8-supporting GEMM library compatible with both dense and MoE (Mixture of Experts) GEMM operations, designed for training and inference of V3/R1 models.</p><p> </p><p> <strong>A Brief Introduction to GEMM</strong></p><p>General Matrix Multiplication (GEMM) is one of the most fundamental and critical operations in deep learning and scientific computing. 
GEMM refers to the multiplication of two matrices, A and B, to produce a result matrix C, typically expressed as C = A &#215; B.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!LVpM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21c9717a-b48d-4236-aabf-82e3f313fb2a_1252x500.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!LVpM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21c9717a-b48d-4236-aabf-82e3f313fb2a_1252x500.png 424w, https://substackcdn.com/image/fetch/$s_!LVpM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21c9717a-b48d-4236-aabf-82e3f313fb2a_1252x500.png 848w, https://substackcdn.com/image/fetch/$s_!LVpM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21c9717a-b48d-4236-aabf-82e3f313fb2a_1252x500.png 1272w, https://substackcdn.com/image/fetch/$s_!LVpM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21c9717a-b48d-4236-aabf-82e3f313fb2a_1252x500.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!LVpM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21c9717a-b48d-4236-aabf-82e3f313fb2a_1252x500.png" width="1252" height="500" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/21c9717a-b48d-4236-aabf-82e3f313fb2a_1252x500.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:500,&quot;width&quot;:1252,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:140311,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aigc.openbot.ai/i/157971007?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21c9717a-b48d-4236-aabf-82e3f313fb2a_1252x500.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!LVpM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21c9717a-b48d-4236-aabf-82e3f313fb2a_1252x500.png 424w, https://substackcdn.com/image/fetch/$s_!LVpM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21c9717a-b48d-4236-aabf-82e3f313fb2a_1252x500.png 848w, https://substackcdn.com/image/fetch/$s_!LVpM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21c9717a-b48d-4236-aabf-82e3f313fb2a_1252x500.png 1272w, https://substackcdn.com/image/fetch/$s_!LVpM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F21c9717a-b48d-4236-aabf-82e3f313fb2a_1252x500.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><p>In deep learning, GEMM underpins core components such as fully connected layers, convolutional layers, and attention mechanisms. For instance, in Transformer architectures, both self-attention and feedforward network layers heavily rely on matrix multiplication. As model sizes grow, GEMM operations dominate the computational time in training and inference, making their performance a key factor in the efficiency of deep learning systems.</p><p>Modern GPU architectures, like NVIDIA&#8217;s Tensor Core technology, are specifically designed to accelerate matrix multiplication. 
As models keep growing, the demand for high-performance GEMM implementations keeps rising, especially for large language models (LLMs) and MoE models, where efficient GEMM is critical for real-time inference and cost-effective training.</p><p>DeepSeek mentions GEMM in the paper <em>DeepSeek LLM: Scaling Open-Source Language Models with Longtermism</em>, and the topic ties into another of their papers, <em><a href="https://arxiv.org/abs/2408.14158">Fire-Flyer AI-HPC: A Cost-Effective Software-Hardware Co-Design for Deep Learning</a></em>, which introduces the HAI-LLM training system. If you are interested, I recommend reading the <em>Fire-Flyer AI-HPC</em> paper.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!VtAi!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35687d2b-05f0-4f94-bd16-12438e61d75b_1876x906.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!VtAi!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35687d2b-05f0-4f94-bd16-12438e61d75b_1876x906.png 424w, https://substackcdn.com/image/fetch/$s_!VtAi!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35687d2b-05f0-4f94-bd16-12438e61d75b_1876x906.png 848w, https://substackcdn.com/image/fetch/$s_!VtAi!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35687d2b-05f0-4f94-bd16-12438e61d75b_1876x906.png 1272w, 
https://substackcdn.com/image/fetch/$s_!VtAi!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35687d2b-05f0-4f94-bd16-12438e61d75b_1876x906.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!VtAi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35687d2b-05f0-4f94-bd16-12438e61d75b_1876x906.png" width="1456" height="703" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/35687d2b-05f0-4f94-bd16-12438e61d75b_1876x906.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:703,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:906763,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aigc.openbot.ai/i/157971007?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35687d2b-05f0-4f94-bd16-12438e61d75b_1876x906.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!VtAi!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35687d2b-05f0-4f94-bd16-12438e61d75b_1876x906.png 424w, https://substackcdn.com/image/fetch/$s_!VtAi!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35687d2b-05f0-4f94-bd16-12438e61d75b_1876x906.png 848w, https://substackcdn.com/image/fetch/$s_!VtAi!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35687d2b-05f0-4f94-bd16-12438e61d75b_1876x906.png 
1272w, https://substackcdn.com/image/fetch/$s_!VtAi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F35687d2b-05f0-4f94-bd16-12438e61d75b_1876x906.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p></p><p>Today&#8217;s open-sourced DeepGEMM supports FP8 and is tailored for training and inference of DeepSeek&#8217;s V3/R1 models. 
In the <a href="https://arxiv.org/abs/2412.19437">V3 paper</a>, DeepSeek details several optimizations for FP8 training.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!C6cE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d0cf8c5-2572-43a5-9f1b-87d67e1f9325_1312x828.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!C6cE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d0cf8c5-2572-43a5-9f1b-87d67e1f9325_1312x828.png 424w, https://substackcdn.com/image/fetch/$s_!C6cE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d0cf8c5-2572-43a5-9f1b-87d67e1f9325_1312x828.png 848w, https://substackcdn.com/image/fetch/$s_!C6cE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d0cf8c5-2572-43a5-9f1b-87d67e1f9325_1312x828.png 1272w, https://substackcdn.com/image/fetch/$s_!C6cE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d0cf8c5-2572-43a5-9f1b-87d67e1f9325_1312x828.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!C6cE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d0cf8c5-2572-43a5-9f1b-87d67e1f9325_1312x828.png" width="1312" height="828" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4d0cf8c5-2572-43a5-9f1b-87d67e1f9325_1312x828.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:828,&quot;width&quot;:1312,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:487818,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aigc.openbot.ai/i/157971007?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d0cf8c5-2572-43a5-9f1b-87d67e1f9325_1312x828.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!C6cE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d0cf8c5-2572-43a5-9f1b-87d67e1f9325_1312x828.png 424w, https://substackcdn.com/image/fetch/$s_!C6cE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d0cf8c5-2572-43a5-9f1b-87d67e1f9325_1312x828.png 848w, https://substackcdn.com/image/fetch/$s_!C6cE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d0cf8c5-2572-43a5-9f1b-87d67e1f9325_1312x828.png 1272w, https://substackcdn.com/image/fetch/$s_!C6cE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4d0cf8c5-2572-43a5-9f1b-87d67e1f9325_1312x828.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><p>The main challenges of FP8 training lie in precision and error handling. 
To tackle these, DeepSeek implemented the following optimizations:</p><ul><li><p><strong>Fine-Grained Quantization</strong>: Data is split into small groups (tiles and blocks), each with its own scaling factor, so that precision is preserved.</p></li><li><p><strong>Online Quantization</strong>: The maximum absolute value is computed online for each 1x128 activation tile or 128x128 weight block; the scaling factor is derived from it, and the values are then converted to FP8 on the fly.</p></li><li><p><strong>Improved Accumulation Precision</strong>: Accumulating FP8 products at low precision loses accuracy, so partial results are promoted to FP32 at regular intervals and accumulated there.</p></li><li><p><strong>Low-Precision/Mixed-Precision Storage and Communication</strong>: For MoE model training, FP8 is used for storage and communication where possible, while precision-sensitive values stay in BF16/FP32 to keep training stable.</p></li></ul><p>For a detailed look at these optimizations, check out the DeepSeek V3 paper.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!eyaI!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8491a232-0faf-4f9b-8517-c7a1a58c3788_1266x998.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!eyaI!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8491a232-0faf-4f9b-8517-c7a1a58c3788_1266x998.png 424w, https://substackcdn.com/image/fetch/$s_!eyaI!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8491a232-0faf-4f9b-8517-c7a1a58c3788_1266x998.png 848w, 
https://substackcdn.com/image/fetch/$s_!eyaI!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8491a232-0faf-4f9b-8517-c7a1a58c3788_1266x998.png 1272w, https://substackcdn.com/image/fetch/$s_!eyaI!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8491a232-0faf-4f9b-8517-c7a1a58c3788_1266x998.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!eyaI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8491a232-0faf-4f9b-8517-c7a1a58c3788_1266x998.png" width="650" height="512.4012638230648" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8491a232-0faf-4f9b-8517-c7a1a58c3788_1266x998.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:998,&quot;width&quot;:1266,&quot;resizeWidth&quot;:650,&quot;bytes&quot;:582532,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aigc.openbot.ai/i/157971007?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8491a232-0faf-4f9b-8517-c7a1a58c3788_1266x998.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!eyaI!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8491a232-0faf-4f9b-8517-c7a1a58c3788_1266x998.png 424w, 
https://substackcdn.com/image/fetch/$s_!eyaI!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8491a232-0faf-4f9b-8517-c7a1a58c3788_1266x998.png 848w, https://substackcdn.com/image/fetch/$s_!eyaI!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8491a232-0faf-4f9b-8517-c7a1a58c3788_1266x998.png 1272w, https://substackcdn.com/image/fetch/$s_!eyaI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8491a232-0faf-4f9b-8517-c7a1a58c3788_1266x998.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><br><strong>DeepGEMM Overview</strong></p><p>Here&#8217;s a summary of DeepGEMM&#8217;s key features:</p><ul><li><p><strong>FP8 Support</strong>: DeepGEMM uses two-level accumulation on CUDA cores (promotion) to compensate for the imprecise accumulation of FP8 Tensor Cores.</p></li><li><p><strong>Grouped GEMM Support</strong>: It improves on CUTLASS&#8217;s grouped GEMM, with targeted optimizations for MoE models.</p></li><li><p><strong>Just-In-Time Compilation</strong>: Kernels are generated and compiled at runtime by a JIT module rather than at install time, boosting performance and flexibility.</p></li><li><p><strong>FFMA SASS Interleaving</strong>: DeepSeek analyzed the compiled SASS in depth and adjusts FFMA/FADD instructions in the binary to speed up fine-grained FP8 GEMM.</p></li></ul><p><strong>Performance</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!zaGg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F12ff51ba-6e5d-446b-9954-6db1682b54f7_1282x1518.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!zaGg!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F12ff51ba-6e5d-446b-9954-6db1682b54f7_1282x1518.png 424w, https://substackcdn.com/image/fetch/$s_!zaGg!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F12ff51ba-6e5d-446b-9954-6db1682b54f7_1282x1518.png 848w, https://substackcdn.com/image/fetch/$s_!zaGg!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F12ff51ba-6e5d-446b-9954-6db1682b54f7_1282x1518.png 1272w, 
https://substackcdn.com/image/fetch/$s_!zaGg!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F12ff51ba-6e5d-446b-9954-6db1682b54f7_1282x1518.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!zaGg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F12ff51ba-6e5d-446b-9954-6db1682b54f7_1282x1518.png" width="618" height="731.7659906396256" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/12ff51ba-6e5d-446b-9954-6db1682b54f7_1282x1518.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1518,&quot;width&quot;:1282,&quot;resizeWidth&quot;:618,&quot;bytes&quot;:709290,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aigc.openbot.ai/i/157971007?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F12ff51ba-6e5d-446b-9954-6db1682b54f7_1282x1518.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!zaGg!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F12ff51ba-6e5d-446b-9954-6db1682b54f7_1282x1518.png 424w, https://substackcdn.com/image/fetch/$s_!zaGg!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F12ff51ba-6e5d-446b-9954-6db1682b54f7_1282x1518.png 848w, 
https://substackcdn.com/image/fetch/$s_!zaGg!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F12ff51ba-6e5d-446b-9954-6db1682b54f7_1282x1518.png 1272w, https://substackcdn.com/image/fetch/$s_!zaGg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F12ff51ba-6e5d-446b-9954-6db1682b54f7_1282x1518.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>All metrics show improvement, with the highest gain reaching a 2.7x speedup. 
DeepSeek notes that performance isn&#8217;t yet optimal in some cases and welcomes PRs from anyone interested in optimizing further.</p><p>The DeepGEMM README gives a detailed breakdown of the optimizations. It&#8217;s worth reading alongside the code for a hands-on look.</p><p></p><p></p><p><strong>Spotlight: The interleave_ffma.py File</strong></p><p>Today, let&#8217;s focus on one file in the project: interleave_ffma.py, under the jit directory. It contains some clever tricks worth exploring.</p><p>Here&#8217;s the code:</p><pre><code><code>import argparse
import mmap
import os
import re
import subprocess
from torch.utils.cpp_extension import CUDA_HOME

def run_cuobjdump(file_path):
    command = [f'{CUDA_HOME}/bin/cuobjdump', '-sass', file_path]
    result = subprocess.run(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)
    assert result.returncode == 0
    return result.stdout

def extract_ffma(sass):
    lines = sass.splitlines()
    collected = []
    current = []
    arch_name, func_name = 'N/A', 'N/A'
    skip_next_line = False
    for line in lines:
        if 'code for' in line:
            arch_name = line.lstrip().lstrip('code for ').rstrip()
        elif 'Function :' in line:
            func_name = line.lstrip().lstrip('Function :').rstrip()
        elif 'FFMA' in line:
            current.append(line)
            skip_next_line = True
        elif skip_next_line:
            current.append(line)
            skip_next_line = False
        else:
            if len(current) &gt;= 16:
                assert len(current) % 2 == 0
                collected.append((f'{arch_name}::{func_name}', current))
            current = []
    if os.getenv('DG_PRINT_REG_REUSE', None):
        print(f"Found {len(collected)} FFMA segments")
    return collected

def extract_hex_from_line(line):
    match = re.search(r'/\*\s*(0x[0-9a-fA-F]+)\s*\*/', line)
    assert match
    return int(match.group(1), 16)
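
# --- Illustrative sketch, NOT part of the upstream file ---
# Assumption: each instruction in the dump spans two lines, and each line ends
# in a 64-bit hex comment (low word first, then high word). The helper name
# `_demo_pack_instruction` is hypothetical and exists only for illustration.
def _demo_pack_instruction(low_line, high_line):
    import re
    pattern = r'/\*\s*(0x[0-9a-fA-F]+)\s*\*/'
    low = int(re.search(pattern, low_line).group(1), 16)
    high = int(re.search(pattern, high_line).group(1), 16)
    # Same packing that modify_segment() uses below: low word, then high word,
    # each as 8 little-endian bytes, giving one 16-byte instruction.
    return low.to_bytes(8, 'little') + high.to_bytes(8, 'little')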

def validate(m, offset, le_bytes, num_lines):
    assert len(le_bytes) == num_lines // 2
    assert m[offset:offset + 16] == le_bytes[0]
    for i in range(1, num_lines // 2):
        if m[offset + i * 16:offset + i * 16 + 16] != le_bytes[i]:
            return False
    return True

def parse_registers(line):
    import re
    line = re.sub(r'/\*.*?\*/', '', line)
    line = line.replace(';', '')
    tokens = line.strip().split(',')
    registers = []
    for token in tokens:
        token = token.strip()
        words = token.split()
        for word in words:
            if word.startswith('R'):
                reg = word.split('.')[0]
                registers.append(reg)
    return registers

def modify_segment(m, name, ffma_lines):
    num_lines = len(ffma_lines)
    assert num_lines % 2 == 0
    le_bytes, new_le_bytes = [], []
    reused_list = []
    dst_reg_set = set()
    last_reused, last_dst_reg = False, ''
    num_changed = 0
    for i in range(num_lines // 2):
        dst_reg = parse_registers(ffma_lines[i * 2])[-2]
        low_line, high_line = ffma_lines[i * 2], ffma_lines[i * 2 + 1]
        low_hex, high_hex = extract_hex_from_line(low_line), extract_hex_from_line(high_line)
        le_bytes.append(low_hex.to_bytes(8, 'little') + high_hex.to_bytes(8, 'little'))
        reused = (high_hex &amp; 0x0800000000000000) != 0
        if reused:
            is_first_occurred = dst_reg not in dst_reg_set
            if is_first_occurred or (last_reused and dst_reg == last_dst_reg):
                assert high_hex &amp; 0x0800200000000000, f"{hex(high_hex)}"
                high_hex ^= 0x0800200000000000
                reused = False
                num_changed += 1
            else:
                reused_list.append(i)
        dst_reg_set.add(dst_reg)
        new_le_bytes.append(low_hex.to_bytes(8, 'little') + high_hex.to_bytes(8, 'little'))
        last_reused, last_dst_reg = reused, dst_reg
    if os.getenv('DG_PRINT_REG_REUSE', None):
        print(f" &gt; segment `{name}` new reused list ({num_changed} changed): {reused_list}")
    offsets = []
    offset = m.find(le_bytes[0])
    while offset != -1:
        offsets.append(offset)
        offset = m.find(le_bytes[0], offset + 1)
    offsets = list(filter(lambda x: validate(m, x, le_bytes, num_lines), offsets))
    for offset in offsets:
        for i in range(num_lines // 2):
            m[offset + i * 16:offset + i * 16 + 16] = new_le_bytes[i]
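
# --- Illustrative sketch, NOT part of the upstream file ---
# Mirrors the XOR step in modify_segment() on its own. The helper name
# `_demo_flip_mask` is hypothetical; any sample word passed in is made up.
def _demo_flip_mask(high_hex, mask=0x0800200000000000):
    # XOR toggles exactly the bits present in `mask` and leaves every other
    # bit untouched, so applying it twice restores the original word.
    return high_hex ^ mask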

def process(path):
    if os.getenv('DG_PRINT_REG_REUSE', None):
        print(f'Processing {path}')
    output = run_cuobjdump(path)
    segments = extract_ffma(output)
    with open(path, 'r+b') as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_WRITE)
        for segment in segments:
            modify_segment(mm, *segment)
        mm.close()

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='Interleave FFMA reg reuse')
    parser.add_argument('--so', help='Path to the SO file')
    args = parser.parse_args()
    process(args.so)</code></code></pre><p>This script patches a compiled CUDA binary in place. It rewrites the register-reuse and yield bits of FFMA (fused multiply-add) instructions in the SASS, with the goal of making the GPU execute those instructions more efficiently.</p><p></p><p><strong>Key Function Breakdown:</strong></p><p></p><p><strong>SASS Code Extraction</strong></p><pre><code><code>def run_cuobjdump(file_path):
    command = [f'{CUDA_HOME}/bin/cuobjdump', '-sass', file_path]</code></code></pre><p>Runs NVIDIA&#8217;s cuobjdump tool with -sass to dump the SASS (the GPU&#8217;s native assembly) from the binary.</p><p></p><p><strong>FFMA Instruction Analysis</strong></p><pre><code><code>def extract_ffma(sass):</code></code></pre><p>Walks the dump and collects runs of consecutive FFMA instructions, tagging each run with its architecture and function name. Every FFMA contributes two lines (the instruction line and the line after it), and only runs of at least 16 lines are kept.</p><p></p><p><strong>Register Usage Analysis</strong></p><pre><code><code>def parse_registers(line):</code></code></pre><p>Strips comments and semicolons from a line, then returns its register operands: every token starting with 'R', with any dotted suffix removed.</p><p></p><p><strong>Binary Modification</strong></p><pre><code><code>def modify_segment(m, name, ffma_lines):</code></code></pre><p>Rebuilds each 16-byte instruction from its two hex words, decides per instruction whether to XOR the high word with 0x0800200000000000 (the reuse and yield bits), then locates the original byte sequence in the memory-mapped file and overwrites it with the patched bytes.</p><p></p><p><strong>Workflow:</strong></p><ul><li><p>Reads a compiled CUDA shared library (.so file).</p></li><li><p>Extracts SASS code using cuobjdump.</p></li><li><p>Identifies and collects all FFMA instruction sequences.</p></li><li><p>Analyzes register usage patterns in each FFMA instruction.</p></li><li><p>Decides, instruction by instruction, whether to flip the reuse and yield bits.</p></li><li><p>Writes the patched instructions back to the original file in place.</p></li></ul><p></p><p><strong>Optimization Strategy:</strong></p><p>For an FFMA whose reuse bit is set, the script flips the bits in two cases:</p><ul><li><p>The register the code tracks as dst_reg appears for the first time in the segment.</p></li><li><p>The previous FFMA kept its reuse bit and has the same dst_reg.</p></li></ul><p>In every other case the instruction is left unchanged. By tweaking the reuse and yield bits this way, the script changes how the instructions are scheduled.</p><p><strong>Usage:</strong></p><pre><code><code>python interleave_ffma.py --so path/to/cuda_lib.so</code></code></pre><p></p><p><strong>Analysis:</strong></p><p>This tool acts as a post-processing optimizer, running after CUDA compilation to enhance GPU instruction efficiency by tweaking the binary. 
Its focus on FFMA register reuse matters most for compute-intensive applications like deep learning.</p><p>One regex in the file tends to trip readers up:</p><pre><code><code>def extract_hex_from_line(line):
    match = re.search(r'/\*\s*(0x[0-9a-fA-F]+)\s*\*/', line)
    assert match
    return int(match.group(1), 16)</code></code></pre><p>In a SASS dump, an instruction line ends with a comment that holds its encoding, like this:</p><pre><code><code>FFMA R8, R8, R6, R4;                  /* 0x5c98078000870808 */</code></code></pre><p>The regex matches the /* ... */ comment and captures 0x5c98078000870808, the hexadecimal machine encoding of the instruction. The function:</p><ul><li><p>Extracts the hex code from the assembly line.</p></li><li><p>Converts it to an integer for subsequent modification.</p></li></ul><p>This step is crucial: the integer form is what lets the script locate instructions in the binary, modify specific bits (e.g., the reuse and yield flags), and write them back.</p><p></p><p>Honestly, DeepSeek&#8217;s engineers seem to outshine even some NVIDIA folks when it comes to CUDA mastery! Oh, and they&#8217;ve slashed their API prices again.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!I1Qn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4977a542-b376-4449-815b-207e5fd9ae4a_1280x1815.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!I1Qn!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4977a542-b376-4449-815b-207e5fd9ae4a_1280x1815.png 424w, https://substackcdn.com/image/fetch/$s_!I1Qn!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4977a542-b376-4449-815b-207e5fd9ae4a_1280x1815.png 848w, https://substackcdn.com/image/fetch/$s_!I1Qn!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4977a542-b376-4449-815b-207e5fd9ae4a_1280x1815.png 1272w, 
https://substackcdn.com/image/fetch/$s_!I1Qn!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4977a542-b376-4449-815b-207e5fd9ae4a_1280x1815.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!I1Qn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4977a542-b376-4449-815b-207e5fd9ae4a_1280x1815.png" width="456" height="646.59375" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4977a542-b376-4449-815b-207e5fd9ae4a_1280x1815.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1815,&quot;width&quot;:1280,&quot;resizeWidth&quot;:456,&quot;bytes&quot;:962887,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aigc.openbot.ai/i/157971007?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4977a542-b376-4449-815b-207e5fd9ae4a_1280x1815.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!I1Qn!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4977a542-b376-4449-815b-207e5fd9ae4a_1280x1815.png 424w, https://substackcdn.com/image/fetch/$s_!I1Qn!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4977a542-b376-4449-815b-207e5fd9ae4a_1280x1815.png 848w, 
https://substackcdn.com/image/fetch/$s_!I1Qn!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4977a542-b376-4449-815b-207e5fd9ae4a_1280x1815.png 1272w, https://substackcdn.com/image/fetch/$s_!I1Qn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4977a542-b376-4449-815b-207e5fd9ae4a_1280x1815.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p></p>]]></content:encoded></item><item><title><![CDATA[ Day 2 of DeepSeek OpenSourceWeek: In-Depth Analysis of DeepEP]]></title><description><![CDATA[On the second day of OpenSourceWeek, the official DeepSeek X account posted an article at 10:24, introducing the second open-source project of Open Source Week: DeepEP.]]></description><link>https://aigc.news/p/day-2-of-deepseek-opensourceweek</link><guid isPermaLink="false">https://aigc.news/p/day-2-of-deepseek-opensourceweek</guid><dc:creator><![CDATA[pxiaoer]]></dc:creator><pubDate>Tue, 25 Feb 2025 16:08:04 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/8294b4e2-fba6-4172-aea7-1b95b68ae29a_1000x420.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!2ZKT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f7518a4-f7a8-412d-8dab-d2641c2cc92e_1198x986.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!2ZKT!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f7518a4-f7a8-412d-8dab-d2641c2cc92e_1198x986.png 424w, https://substackcdn.com/image/fetch/$s_!2ZKT!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f7518a4-f7a8-412d-8dab-d2641c2cc92e_1198x986.png 848w, 
https://substackcdn.com/image/fetch/$s_!2ZKT!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f7518a4-f7a8-412d-8dab-d2641c2cc92e_1198x986.png 1272w, https://substackcdn.com/image/fetch/$s_!2ZKT!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f7518a4-f7a8-412d-8dab-d2641c2cc92e_1198x986.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!2ZKT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f7518a4-f7a8-412d-8dab-d2641c2cc92e_1198x986.png" width="500" height="411.51919866444075" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0f7518a4-f7a8-412d-8dab-d2641c2cc92e_1198x986.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:986,&quot;width&quot;:1198,&quot;resizeWidth&quot;:500,&quot;bytes&quot;:512158,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://aigc.openbot.ai/i/157895277?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f7518a4-f7a8-412d-8dab-d2641c2cc92e_1198x986.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!2ZKT!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f7518a4-f7a8-412d-8dab-d2641c2cc92e_1198x986.png 424w, 
https://substackcdn.com/image/fetch/$s_!2ZKT!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f7518a4-f7a8-412d-8dab-d2641c2cc92e_1198x986.png 848w, https://substackcdn.com/image/fetch/$s_!2ZKT!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f7518a4-f7a8-412d-8dab-d2641c2cc92e_1198x986.png 1272w, https://substackcdn.com/image/fetch/$s_!2ZKT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f7518a4-f7a8-412d-8dab-d2641c2cc92e_1198x986.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">https://x.com/deepseek_ai/status/1894211757604049133</figcaption></figure></div><p>On the second day of OpenSourceWeek, the official DeepSeek X account posted at 10:24 to announce the second open-source project of the week: <a href="https://github.com/deepseek-ai/DeepEP">DeepEP</a>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!KB8q!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F083480d5-0321-42f6-bb57-08e7a8f91643_2048x1003.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!KB8q!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F083480d5-0321-42f6-bb57-08e7a8f91643_2048x1003.png 424w, https://substackcdn.com/image/fetch/$s_!KB8q!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F083480d5-0321-42f6-bb57-08e7a8f91643_2048x1003.png 848w, https://substackcdn.com/image/fetch/$s_!KB8q!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F083480d5-0321-42f6-bb57-08e7a8f91643_2048x1003.png 1272w, https://substackcdn.com/image/fetch/$s_!KB8q!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F083480d5-0321-42f6-bb57-08e7a8f91643_2048x1003.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!KB8q!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F083480d5-0321-42f6-bb57-08e7a8f91643_2048x1003.png" width="1456" height="713" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/083480d5-0321-42f6-bb57-08e7a8f91643_2048x1003.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:713,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:527089,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aigc.openbot.ai/i/157895277?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F083480d5-0321-42f6-bb57-08e7a8f91643_2048x1003.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!KB8q!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F083480d5-0321-42f6-bb57-08e7a8f91643_2048x1003.png 424w, https://substackcdn.com/image/fetch/$s_!KB8q!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F083480d5-0321-42f6-bb57-08e7a8f91643_2048x1003.png 848w, https://substackcdn.com/image/fetch/$s_!KB8q!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F083480d5-0321-42f6-bb57-08e7a8f91643_2048x1003.png 1272w, https://substackcdn.com/image/fetch/$s_!KB8q!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F083480d5-0321-42f6-bb57-08e7a8f91643_2048x1003.png 1456w" 
sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">https://github.com/deepseek-ai/DeepEP</figcaption></figure></div><p><br>Since the code was released, the repository has already garnered 4.3K stars.</p><p></p><p>Many readers are not very familiar with MoE models, so this article first briefly introduces MoE and some of DeepSeek's work on it.</p><p></p><p><strong>MoE Introduction</strong></p><p>The Mixture-of-Experts (MoE) model is a simple extension of the Transformer architecture that is rapidly becoming the preferred architecture for medium-to-large-scale language models (2 billion to 600 billion parameters).</p><p><strong>Key 
Advantages:</strong></p><ul><li><p>Faster pre-training compared to dense models</p></li><li><p>Faster inference compared to dense models with the same total number of parameters</p></li></ul><p></p><p><strong>Challenges:</strong><br>An MoE model requires significant memory because all experts need to be loaded into memory, even though only a few are active for each token. Additionally, because MoE is typically used for medium-to-large models, it often requires parallel processing across multiple GPUs, so communication between them must be highly efficient.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!gCHh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8f01bce-ccb4-424f-9fb0-1949a4c21121_1274x982.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!gCHh!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8f01bce-ccb4-424f-9fb0-1949a4c21121_1274x982.png 424w, https://substackcdn.com/image/fetch/$s_!gCHh!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8f01bce-ccb4-424f-9fb0-1949a4c21121_1274x982.png 848w, https://substackcdn.com/image/fetch/$s_!gCHh!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8f01bce-ccb4-424f-9fb0-1949a4c21121_1274x982.png 1272w, https://substackcdn.com/image/fetch/$s_!gCHh!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8f01bce-ccb4-424f-9fb0-1949a4c21121_1274x982.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!gCHh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8f01bce-ccb4-424f-9fb0-1949a4c21121_1274x982.png" width="1274" height="982" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a8f01bce-ccb4-424f-9fb0-1949a4c21121_1274x982.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:982,&quot;width&quot;:1274,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:411360,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aigc.openbot.ai/i/157895277?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8f01bce-ccb4-424f-9fb0-1949a4c21121_1274x982.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!gCHh!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8f01bce-ccb4-424f-9fb0-1949a4c21121_1274x982.png 424w, https://substackcdn.com/image/fetch/$s_!gCHh!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8f01bce-ccb4-424f-9fb0-1949a4c21121_1274x982.png 848w, https://substackcdn.com/image/fetch/$s_!gCHh!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8f01bce-ccb4-424f-9fb0-1949a4c21121_1274x982.png 1272w, https://substackcdn.com/image/fetch/$s_!gCHh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa8f01bce-ccb4-424f-9fb0-1949a4c21121_1274x982.png 1456w" sizes="100vw" 
loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><a href="https://arxiv.org/abs/2101.03961">Switch Transformers paper</a></figcaption></figure></div><p></p><p>MoE models primarily consist of two key components:</p><ol><li><p><strong>Sparse MoE Layers</strong><br>These layers replace the feed-forward network (FFN) layers in traditional Transformer models. An MoE layer contains several "experts" (e.g., 8), each of which is an independent neural network.</p></li><li><p><strong>Gating Network or Router</strong><br>This component determines which tokens are sent to which experts. 
Sometimes, a token may even be routed to multiple experts.</p><p></p></li></ol><p><strong>Summary:</strong><br>A notable advantage of Mixture-of-Experts (MoE) models is their ability to perform effective pre-training with far fewer computational resources than dense models. This means that, under the same computational budget, you can significantly scale up the model or dataset size. Especially during pre-training, MoE models typically reach the same quality level faster than dense models.</p><p></p><p></p><p><strong>DeepSeek MoE</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!GDzr!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8df25801-6501-4d73-87e0-d53cd00d1dab_1290x854.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!GDzr!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8df25801-6501-4d73-87e0-d53cd00d1dab_1290x854.png 424w, https://substackcdn.com/image/fetch/$s_!GDzr!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8df25801-6501-4d73-87e0-d53cd00d1dab_1290x854.png 848w, https://substackcdn.com/image/fetch/$s_!GDzr!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8df25801-6501-4d73-87e0-d53cd00d1dab_1290x854.png 1272w, https://substackcdn.com/image/fetch/$s_!GDzr!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8df25801-6501-4d73-87e0-d53cd00d1dab_1290x854.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!GDzr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8df25801-6501-4d73-87e0-d53cd00d1dab_1290x854.png" width="1290" height="854" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8df25801-6501-4d73-87e0-d53cd00d1dab_1290x854.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:854,&quot;width&quot;:1290,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:329088,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aigc.openbot.ai/i/157895277?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8df25801-6501-4d73-87e0-d53cd00d1dab_1290x854.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!GDzr!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8df25801-6501-4d73-87e0-d53cd00d1dab_1290x854.png 424w, https://substackcdn.com/image/fetch/$s_!GDzr!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8df25801-6501-4d73-87e0-d53cd00d1dab_1290x854.png 848w, https://substackcdn.com/image/fetch/$s_!GDzr!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8df25801-6501-4d73-87e0-d53cd00d1dab_1290x854.png 1272w, https://substackcdn.com/image/fetch/$s_!GDzr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8df25801-6501-4d73-87e0-d53cd00d1dab_1290x854.png 1456w" sizes="100vw" 
loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>On January 11, 2024, DeepSeek released the paper <em><a href="https://arxiv.org/pdf/2401.06066">DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models</a></em>, making it one of the earlier companies to publish research on open MoE language models.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!i3yf!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95c4c21a-1957-4a1c-a6d0-2a78097dba84_1242x860.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!i3yf!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95c4c21a-1957-4a1c-a6d0-2a78097dba84_1242x860.png 424w, https://substackcdn.com/image/fetch/$s_!i3yf!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95c4c21a-1957-4a1c-a6d0-2a78097dba84_1242x860.png 848w, https://substackcdn.com/image/fetch/$s_!i3yf!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95c4c21a-1957-4a1c-a6d0-2a78097dba84_1242x860.png 1272w, https://substackcdn.com/image/fetch/$s_!i3yf!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95c4c21a-1957-4a1c-a6d0-2a78097dba84_1242x860.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!i3yf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95c4c21a-1957-4a1c-a6d0-2a78097dba84_1242x860.png" width="1242" height="860" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/95c4c21a-1957-4a1c-a6d0-2a78097dba84_1242x860.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:860,&quot;width&quot;:1242,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:633748,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aigc.openbot.ai/i/157895277?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95c4c21a-1957-4a1c-a6d0-2a78097dba84_1242x860.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!i3yf!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95c4c21a-1957-4a1c-a6d0-2a78097dba84_1242x860.png 424w, https://substackcdn.com/image/fetch/$s_!i3yf!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95c4c21a-1957-4a1c-a6d0-2a78097dba84_1242x860.png 848w, https://substackcdn.com/image/fetch/$s_!i3yf!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95c4c21a-1957-4a1c-a6d0-2a78097dba84_1242x860.png 1272w, https://substackcdn.com/image/fetch/$s_!i3yf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95c4c21a-1957-4a1c-a6d0-2a78097dba84_1242x860.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The DeepSeekMoE paper introduced the concepts of <strong>Routed Expert</strong> and <strong>Shared Expert</strong>, and ran experiments on finer-grained expert segmentation.</p><p>In the paper, DeepSeekMoE 16B has 2 shared experts and 64 routed experts per MoE layer, with each token activating the 2 shared experts and 6 of the routed experts. 
The 145B version, on the other hand, has 4 shared experts and 128 routed experts, with each token activating the 4 shared experts and 12 of the routed experts.</p><p></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!fk2z!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00ab8abc-cef4-41e1-81d9-5c98ed1ab11c_1264x534.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!fk2z!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00ab8abc-cef4-41e1-81d9-5c98ed1ab11c_1264x534.png 424w, https://substackcdn.com/image/fetch/$s_!fk2z!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00ab8abc-cef4-41e1-81d9-5c98ed1ab11c_1264x534.png 848w, https://substackcdn.com/image/fetch/$s_!fk2z!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00ab8abc-cef4-41e1-81d9-5c98ed1ab11c_1264x534.png 1272w, https://substackcdn.com/image/fetch/$s_!fk2z!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00ab8abc-cef4-41e1-81d9-5c98ed1ab11c_1264x534.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!fk2z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00ab8abc-cef4-41e1-81d9-5c98ed1ab11c_1264x534.png" width="1264" height="534" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/00ab8abc-cef4-41e1-81d9-5c98ed1ab11c_1264x534.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:534,&quot;width&quot;:1264,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:429498,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aigc.openbot.ai/i/157895277?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00ab8abc-cef4-41e1-81d9-5c98ed1ab11c_1264x534.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!fk2z!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00ab8abc-cef4-41e1-81d9-5c98ed1ab11c_1264x534.png 424w, https://substackcdn.com/image/fetch/$s_!fk2z!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00ab8abc-cef4-41e1-81d9-5c98ed1ab11c_1264x534.png 848w, https://substackcdn.com/image/fetch/$s_!fk2z!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00ab8abc-cef4-41e1-81d9-5c98ed1ab11c_1264x534.png 1272w, https://substackcdn.com/image/fetch/$s_!fk2z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F00ab8abc-cef4-41e1-81d9-5c98ed1ab11c_1264x534.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><p>The DeepSeekMoE paper also mentions today&#8217;s topic: <strong>Expert Parallelism</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Bpkv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f0c8c64-94b9-4722-b043-164ea484343e_1308x1610.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Bpkv!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f0c8c64-94b9-4722-b043-164ea484343e_1308x1610.png 424w, 
https://substackcdn.com/image/fetch/$s_!Bpkv!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f0c8c64-94b9-4722-b043-164ea484343e_1308x1610.png 848w, https://substackcdn.com/image/fetch/$s_!Bpkv!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f0c8c64-94b9-4722-b043-164ea484343e_1308x1610.png 1272w, https://substackcdn.com/image/fetch/$s_!Bpkv!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f0c8c64-94b9-4722-b043-164ea484343e_1308x1610.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Bpkv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f0c8c64-94b9-4722-b043-164ea484343e_1308x1610.png" width="1308" height="1610" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4f0c8c64-94b9-4722-b043-164ea484343e_1308x1610.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1610,&quot;width&quot;:1308,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:919292,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aigc.openbot.ai/i/157895277?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f0c8c64-94b9-4722-b043-164ea484343e_1308x1610.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!Bpkv!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f0c8c64-94b9-4722-b043-164ea484343e_1308x1610.png 424w, https://substackcdn.com/image/fetch/$s_!Bpkv!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f0c8c64-94b9-4722-b043-164ea484343e_1308x1610.png 848w, https://substackcdn.com/image/fetch/$s_!Bpkv!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f0c8c64-94b9-4722-b043-164ea484343e_1308x1610.png 1272w, https://substackcdn.com/image/fetch/$s_!Bpkv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f0c8c64-94b9-4722-b043-164ea484343e_1308x1610.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Two paragraphs in the DeepSeek-V3 paper provide more detailed data. DeepSeek-R1 was trained on top of V3, and although no further details are provided, its setup is largely consistent with V3's.</p><p></p><p></p><p><strong>DeepEP Introduction</strong></p><p>DeepEP is a communication library designed for MoE models and Expert Parallelism (EP). It provides high-throughput and low-latency all-to-all GPU kernels, also known as MoE <strong>dispatch</strong> and <strong>combine</strong>. Additionally, the library supports low-precision operations, including FP8.</p><p>To align with the group-limited gating algorithm proposed in the DeepSeek-V3 paper, DeepEP offers a set of kernels optimized for asymmetric-domain bandwidth forwarding (e.g., forwarding data from the NVLink domain to the RDMA domain). These kernels deliver high throughput, making them suitable for both training and inference prefilling tasks. They also support control over the number of Streaming Multiprocessors (SMs) used.</p><p>For latency-sensitive inference decoding tasks, DeepEP includes a set of low-latency kernels that use pure RDMA to minimize delays. The library also introduces a hook-based method for overlapping communication and computation that does not occupy any SM resources.</p><p><strong>Note:</strong> The implementation in this library may differ slightly from the DeepSeek-V3 paper.</p><p></p><p></p><p><strong>Performance</strong></p><p><strong>NVLink and RDMA Forwarding Test for Normal Kernels</strong></p><p>DeepSeek tested the normal kernels on H800 GPUs (NVLink max bandwidth ~160 GB/s), with each device connected to a CX7 InfiniBand 400 Gb/s RDMA NIC (max bandwidth ~50 GB/s). 
The test followed the pre-training setup of DeepSeek-V3/R1: 4096 tokens per batch, hidden dimension of 7168, top-4 grouping, top-8 experts, FP8 dispatch, and BF16 combine.</p><p></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1zk1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdb22bc7-7f9f-47dc-9c76-abdc6f8c09a4_1564x420.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1zk1!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdb22bc7-7f9f-47dc-9c76-abdc6f8c09a4_1564x420.png 424w, https://substackcdn.com/image/fetch/$s_!1zk1!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdb22bc7-7f9f-47dc-9c76-abdc6f8c09a4_1564x420.png 848w, https://substackcdn.com/image/fetch/$s_!1zk1!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdb22bc7-7f9f-47dc-9c76-abdc6f8c09a4_1564x420.png 1272w, https://substackcdn.com/image/fetch/$s_!1zk1!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdb22bc7-7f9f-47dc-9c76-abdc6f8c09a4_1564x420.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!1zk1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdb22bc7-7f9f-47dc-9c76-abdc6f8c09a4_1564x420.png" width="1456" height="391" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cdb22bc7-7f9f-47dc-9c76-abdc6f8c09a4_1564x420.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:391,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:98429,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aigc.openbot.ai/i/157895277?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdb22bc7-7f9f-47dc-9c76-abdc6f8c09a4_1564x420.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!1zk1!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdb22bc7-7f9f-47dc-9c76-abdc6f8c09a4_1564x420.png 424w, https://substackcdn.com/image/fetch/$s_!1zk1!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdb22bc7-7f9f-47dc-9c76-abdc6f8c09a4_1564x420.png 848w, https://substackcdn.com/image/fetch/$s_!1zk1!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdb22bc7-7f9f-47dc-9c76-abdc6f8c09a4_1564x420.png 1272w, https://substackcdn.com/image/fetch/$s_!1zk1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcdb22bc7-7f9f-47dc-9c76-abdc6f8c09a4_1564x420.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><p><strong>Low-Latency Kernel Test with Pure RDMA</strong></p><p>We tested low-latency kernels on H800, with each device connected to a CX7 InfiniBand 400 Gb/s RDMA NIC (max bandwidth ~50 GB/s). 
The test followed a typical DeepSeek-V3/R1 production environment setup: 128 tokens per batch, hidden dimension of 7168, top-8 experts, FP8 dispatch, and BF16 combine.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!m2v9!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59df353f-4f55-4c8b-b7cc-6396c9cddcc4_1530x562.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!m2v9!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59df353f-4f55-4c8b-b7cc-6396c9cddcc4_1530x562.png 424w, https://substackcdn.com/image/fetch/$s_!m2v9!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59df353f-4f55-4c8b-b7cc-6396c9cddcc4_1530x562.png 848w, https://substackcdn.com/image/fetch/$s_!m2v9!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59df353f-4f55-4c8b-b7cc-6396c9cddcc4_1530x562.png 1272w, https://substackcdn.com/image/fetch/$s_!m2v9!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59df353f-4f55-4c8b-b7cc-6396c9cddcc4_1530x562.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!m2v9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59df353f-4f55-4c8b-b7cc-6396c9cddcc4_1530x562.png" width="1456" height="535" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/59df353f-4f55-4c8b-b7cc-6396c9cddcc4_1530x562.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:535,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:107584,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aigc.openbot.ai/i/157895277?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59df353f-4f55-4c8b-b7cc-6396c9cddcc4_1530x562.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!m2v9!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59df353f-4f55-4c8b-b7cc-6396c9cddcc4_1530x562.png 424w, https://substackcdn.com/image/fetch/$s_!m2v9!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59df353f-4f55-4c8b-b7cc-6396c9cddcc4_1530x562.png 848w, https://substackcdn.com/image/fetch/$s_!m2v9!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59df353f-4f55-4c8b-b7cc-6396c9cddcc4_1530x562.png 1272w, https://substackcdn.com/image/fetch/$s_!m2v9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59df353f-4f55-4c8b-b7cc-6396c9cddcc4_1530x562.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><p></p><p><strong>Code Analysis</strong></p><p>The optimization strategy focuses heavily on low-latency communication. Here, we&#8217;ll highlight this aspect, while other features like dynamic scheduling, asynchronous communication, and stream management can be explored in the relevant documentation and code.</p><p><strong>Double Buffering</strong></p><pre><code><code>auto buffer = layout.buffers[low_latency_buffer_idx];
auto next_buffer = layout.buffers[low_latency_buffer_idx ^= 1];</code></code></pre><ul><li><p>Alternates between two buffers: one for the current operation and one for the next.</p></li><li><p>Uses the XOR assignment ^= 1 to flip the buffer index between 0 and 1 in a single step.</p><p></p></li></ul><p><strong>TMA (Tensor Memory Accelerator) Optimization</strong></p><ul><li><p>Leverages the Hopper architecture&#8217;s TMA instructions to accelerate data transfer.</p></li><li><p>Supports low-precision formats such as FP8 to reduce the communication bandwidth required.</p><p></p></li></ul><p><strong>IBGDA Direct Communication</strong></p><pre><code><code>// Initialize recv queues for low-latency mode AR
ibgda_initialize_recv_queue&lt;&lt;&lt;num_ranks, 128&gt;&gt;&gt;(rank);</code></code></pre><ul><li><p>Uses NVSHMEM&#8217;s IBGDA (InfiniBand GPUDirect Async) transport, which lets the GPU issue RDMA operations to the NIC directly.</p></li><li><p>Removes the CPU from the communication path, which reduces latency.</p></li></ul><p><strong>Expert-Level QP Allocation</strong></p><pre><code><code>_buffer = Buffer(group, 0, num_rdma_bytes, low_latency_mode=True,
num_qps_per_rank=num_experts // group.size())</code></code></pre><ul><li><p>Sets num_qps_per_rank to num_experts // group.size(), the number of experts hosted on each rank, so every local expert gets its own Queue Pair (QP) and experts do not contend for a shared queue.</p></li></ul><p></p><p><strong>DeepEP Use Cases</strong></p><ul><li><p>Large-scale MoE model training (e.g., models with hundreds of billions of parameters)</p></li><li><p>High-concurrency, low-latency real-time inference services</p></li><li><p>Heterogeneous computing tasks such as multimodal applications and scientific computing</p></li></ul><p></p><p>Notably, at the end of the repository&#8217;s README, DeepSeek notes that it relies on undocumented behavior of an NVIDIA PTX instruction for extra performance. That is true hacker spirit worth learning from!</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!AkRZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52adb263-bc15-4693-8d31-9c23cb9fbd91_1790x578.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!AkRZ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52adb263-bc15-4693-8d31-9c23cb9fbd91_1790x578.png 424w, https://substackcdn.com/image/fetch/$s_!AkRZ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52adb263-bc15-4693-8d31-9c23cb9fbd91_1790x578.png 848w, https://substackcdn.com/image/fetch/$s_!AkRZ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52adb263-bc15-4693-8d31-9c23cb9fbd91_1790x578.png 1272w, 
https://substackcdn.com/image/fetch/$s_!AkRZ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52adb263-bc15-4693-8d31-9c23cb9fbd91_1790x578.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!AkRZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52adb263-bc15-4693-8d31-9c23cb9fbd91_1790x578.png" width="1456" height="470" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/52adb263-bc15-4693-8d31-9c23cb9fbd91_1790x578.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:470,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:428160,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aigc.openbot.ai/i/157895277?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52adb263-bc15-4693-8d31-9c23cb9fbd91_1790x578.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!AkRZ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52adb263-bc15-4693-8d31-9c23cb9fbd91_1790x578.png 424w, https://substackcdn.com/image/fetch/$s_!AkRZ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52adb263-bc15-4693-8d31-9c23cb9fbd91_1790x578.png 848w, https://substackcdn.com/image/fetch/$s_!AkRZ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52adb263-bc15-4693-8d31-9c23cb9fbd91_1790x578.png 
1272w, https://substackcdn.com/image/fetch/$s_!AkRZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52adb263-bc15-4693-8d31-9c23cb9fbd91_1790x578.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p></p>]]></content:encoded></item><item><title><![CDATA[DeepSeek Open Source Week Day 1: In-Depth Analysis of FlashMLA]]></title><description><![CDATA[This morning at 9:34, DeepSeek announced the first project of Open Source Week on X: FlashMLA.]]></description><link>https://aigc.news/p/deepseek-open-source-week-day-1-in</link><guid isPermaLink="false">https://aigc.news/p/deepseek-open-source-week-day-1-in</guid><dc:creator><![CDATA[pxiaoer]]></dc:creator><pubDate>Mon, 24 Feb 2025 15:09:54 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!rClU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F187c8e4c-0940-4692-b31a-d3597d7b586c_1364x848.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!rClU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F187c8e4c-0940-4692-b31a-d3597d7b586c_1364x848.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rClU!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F187c8e4c-0940-4692-b31a-d3597d7b586c_1364x848.jpeg 424w, https://substackcdn.com/image/fetch/$s_!rClU!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F187c8e4c-0940-4692-b31a-d3597d7b586c_1364x848.jpeg 848w, https://substackcdn.com/image/fetch/$s_!rClU!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F187c8e4c-0940-4692-b31a-d3597d7b586c_1364x848.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!rClU!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F187c8e4c-0940-4692-b31a-d3597d7b586c_1364x848.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!rClU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F187c8e4c-0940-4692-b31a-d3597d7b586c_1364x848.jpeg" width="1364" height="848" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/187c8e4c-0940-4692-b31a-d3597d7b586c_1364x848.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:848,&quot;width&quot;:1364,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:63467,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://aigc.openbot.ai/i/157812222?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F298e6a2f-cb95-4cd6-81ba-25805e84cd4d_1600x900.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!rClU!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F187c8e4c-0940-4692-b31a-d3597d7b586c_1364x848.jpeg 424w, https://substackcdn.com/image/fetch/$s_!rClU!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F187c8e4c-0940-4692-b31a-d3597d7b586c_1364x848.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!rClU!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F187c8e4c-0940-4692-b31a-d3597d7b586c_1364x848.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!rClU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F187c8e4c-0940-4692-b31a-d3597d7b586c_1364x848.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!FSh6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b5ce142-9ed8-4a46-ae85-4f48b67bb6da_930x1142.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!FSh6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b5ce142-9ed8-4a46-ae85-4f48b67bb6da_930x1142.png 424w, https://substackcdn.com/image/fetch/$s_!FSh6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b5ce142-9ed8-4a46-ae85-4f48b67bb6da_930x1142.png 848w, https://substackcdn.com/image/fetch/$s_!FSh6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b5ce142-9ed8-4a46-ae85-4f48b67bb6da_930x1142.png 1272w, https://substackcdn.com/image/fetch/$s_!FSh6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b5ce142-9ed8-4a46-ae85-4f48b67bb6da_930x1142.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!FSh6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b5ce142-9ed8-4a46-ae85-4f48b67bb6da_930x1142.png" width="582" height="714.6709677419354" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9b5ce142-9ed8-4a46-ae85-4f48b67bb6da_930x1142.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1142,&quot;width&quot;:930,&quot;resizeWidth&quot;:582,&quot;bytes&quot;:414480,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aigc.openbot.ai/i/157812222?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b5ce142-9ed8-4a46-ae85-4f48b67bb6da_930x1142.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!FSh6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b5ce142-9ed8-4a46-ae85-4f48b67bb6da_930x1142.png 424w, https://substackcdn.com/image/fetch/$s_!FSh6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b5ce142-9ed8-4a46-ae85-4f48b67bb6da_930x1142.png 848w, https://substackcdn.com/image/fetch/$s_!FSh6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b5ce142-9ed8-4a46-ae85-4f48b67bb6da_930x1142.png 1272w, https://substackcdn.com/image/fetch/$s_!FSh6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b5ce142-9ed8-4a46-ae85-4f48b67bb6da_930x1142.png 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" 
fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This morning at 9:34, DeepSeek announced the first project of Open Source Week on X: FlashMLA. 
This article provides an in-depth analysis of FlashMLA.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ZM8r!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F073f3d34-f5c1-4dd5-a7d3-c58362cce0ba_2048x1090.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ZM8r!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F073f3d34-f5c1-4dd5-a7d3-c58362cce0ba_2048x1090.png 424w, https://substackcdn.com/image/fetch/$s_!ZM8r!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F073f3d34-f5c1-4dd5-a7d3-c58362cce0ba_2048x1090.png 848w, https://substackcdn.com/image/fetch/$s_!ZM8r!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F073f3d34-f5c1-4dd5-a7d3-c58362cce0ba_2048x1090.png 1272w, https://substackcdn.com/image/fetch/$s_!ZM8r!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F073f3d34-f5c1-4dd5-a7d3-c58362cce0ba_2048x1090.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ZM8r!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F073f3d34-f5c1-4dd5-a7d3-c58362cce0ba_2048x1090.png" width="686" height="365.1442307692308" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/073f3d34-f5c1-4dd5-a7d3-c58362cce0ba_2048x1090.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:775,&quot;width&quot;:1456,&quot;resizeWidth&quot;:686,&quot;bytes&quot;:581601,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aigc.openbot.ai/i/157812222?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F073f3d34-f5c1-4dd5-a7d3-c58362cce0ba_2048x1090.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ZM8r!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F073f3d34-f5c1-4dd5-a7d3-c58362cce0ba_2048x1090.png 424w, https://substackcdn.com/image/fetch/$s_!ZM8r!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F073f3d34-f5c1-4dd5-a7d3-c58362cce0ba_2048x1090.png 848w, https://substackcdn.com/image/fetch/$s_!ZM8r!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F073f3d34-f5c1-4dd5-a7d3-c58362cce0ba_2048x1090.png 1272w, https://substackcdn.com/image/fetch/$s_!ZM8r!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F073f3d34-f5c1-4dd5-a7d3-c58362cce0ba_2048x1090.png 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 
20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The FlashMLA project has quickly gained popularity: its GitHub repository has already reached 6.8k stars.</p><p></p><p><strong>Brief Introduction to MLA</strong></p><p>MLA (Multi-Head Latent Attention) is an optimization of Multi-Head Attention (MHA) that DeepSeek proposed in the paper <em><a href="https://arxiv.org/abs/2405.04434">DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model</a></em>. It compresses keys and values into a low-rank latent vector, which shrinks the KV cache needed at inference time.</p><p>In Transformer models, MHA is one of the most computationally intensive modules. 
To maintain high efficiency in large-scale scenarios, further optimization is necessary.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ywrG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff74c2b91-77a9-4ed5-b6d9-24fc4b39fc06_1310x1244.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ywrG!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff74c2b91-77a9-4ed5-b6d9-24fc4b39fc06_1310x1244.png 424w, https://substackcdn.com/image/fetch/$s_!ywrG!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff74c2b91-77a9-4ed5-b6d9-24fc4b39fc06_1310x1244.png 848w, https://substackcdn.com/image/fetch/$s_!ywrG!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff74c2b91-77a9-4ed5-b6d9-24fc4b39fc06_1310x1244.png 1272w, https://substackcdn.com/image/fetch/$s_!ywrG!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff74c2b91-77a9-4ed5-b6d9-24fc4b39fc06_1310x1244.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ywrG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff74c2b91-77a9-4ed5-b6d9-24fc4b39fc06_1310x1244.png" width="567" height="538.4335877862595" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f74c2b91-77a9-4ed5-b6d9-24fc4b39fc06_1310x1244.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1244,&quot;width&quot;:1310,&quot;resizeWidth&quot;:567,&quot;bytes&quot;:605978,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aigc.openbot.ai/i/157812222?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff74c2b91-77a9-4ed5-b6d9-24fc4b39fc06_1310x1244.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ywrG!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff74c2b91-77a9-4ed5-b6d9-24fc4b39fc06_1310x1244.png 424w, https://substackcdn.com/image/fetch/$s_!ywrG!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff74c2b91-77a9-4ed5-b6d9-24fc4b39fc06_1310x1244.png 848w, https://substackcdn.com/image/fetch/$s_!ywrG!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff74c2b91-77a9-4ed5-b6d9-24fc4b39fc06_1310x1244.png 1272w, https://substackcdn.com/image/fetch/$s_!ywrG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff74c2b91-77a9-4ed5-b6d9-24fc4b39fc06_1310x1244.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>MLA can be considered a variant of MHA. In its implementation, it borrows some concepts from FlashAttention. 
The DeepSeek-V2 paper primarily compares it with MHA, GQA, and MQA, with optimization results shown in the figure below:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!jNDN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e558075-e8bd-4c13-959c-b68adf2092e0_1282x576.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!jNDN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e558075-e8bd-4c13-959c-b68adf2092e0_1282x576.png 424w, https://substackcdn.com/image/fetch/$s_!jNDN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e558075-e8bd-4c13-959c-b68adf2092e0_1282x576.png 848w, https://substackcdn.com/image/fetch/$s_!jNDN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e558075-e8bd-4c13-959c-b68adf2092e0_1282x576.png 1272w, https://substackcdn.com/image/fetch/$s_!jNDN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e558075-e8bd-4c13-959c-b68adf2092e0_1282x576.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!jNDN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e558075-e8bd-4c13-959c-b68adf2092e0_1282x576.png" width="635" height="285.30421216848674" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9e558075-e8bd-4c13-959c-b68adf2092e0_1282x576.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:576,&quot;width&quot;:1282,&quot;resizeWidth&quot;:635,&quot;bytes&quot;:333386,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aigc.openbot.ai/i/157812222?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e558075-e8bd-4c13-959c-b68adf2092e0_1282x576.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!jNDN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e558075-e8bd-4c13-959c-b68adf2092e0_1282x576.png 424w, https://substackcdn.com/image/fetch/$s_!jNDN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e558075-e8bd-4c13-959c-b68adf2092e0_1282x576.png 848w, https://substackcdn.com/image/fetch/$s_!jNDN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e558075-e8bd-4c13-959c-b68adf2092e0_1282x576.png 1272w, https://substackcdn.com/image/fetch/$s_!jNDN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e558075-e8bd-4c13-959c-b68adf2092e0_1282x576.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-7r1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcae913c2-b98f-4274-9548-881c1cf819be_1274x596.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-7r1!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcae913c2-b98f-4274-9548-881c1cf819be_1274x596.png 424w, 
https://substackcdn.com/image/fetch/$s_!-7r1!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcae913c2-b98f-4274-9548-881c1cf819be_1274x596.png 848w, https://substackcdn.com/image/fetch/$s_!-7r1!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcae913c2-b98f-4274-9548-881c1cf819be_1274x596.png 1272w, https://substackcdn.com/image/fetch/$s_!-7r1!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcae913c2-b98f-4274-9548-881c1cf819be_1274x596.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-7r1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcae913c2-b98f-4274-9548-881c1cf819be_1274x596.png" width="610" height="285.3689167974882" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cae913c2-b98f-4274-9548-881c1cf819be_1274x596.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:596,&quot;width&quot;:1274,&quot;resizeWidth&quot;:610,&quot;bytes&quot;:381066,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aigc.openbot.ai/i/157812222?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcae913c2-b98f-4274-9548-881c1cf819be_1274x596.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!-7r1!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcae913c2-b98f-4274-9548-881c1cf819be_1274x596.png 424w, https://substackcdn.com/image/fetch/$s_!-7r1!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcae913c2-b98f-4274-9548-881c1cf819be_1274x596.png 848w, https://substackcdn.com/image/fetch/$s_!-7r1!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcae913c2-b98f-4274-9548-881c1cf819be_1274x596.png 1272w, https://substackcdn.com/image/fetch/$s_!-7r1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcae913c2-b98f-4274-9548-881c1cf819be_1274x596.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Some inference frameworks have also implemented MLA. As shown below, integrating MLA into SGLang increased throughput by 2-3 times.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!gccI!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20c69bfd-0486-427e-a9ed-3e6fd222822d_720x310.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!gccI!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20c69bfd-0486-427e-a9ed-3e6fd222822d_720x310.png 424w, https://substackcdn.com/image/fetch/$s_!gccI!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20c69bfd-0486-427e-a9ed-3e6fd222822d_720x310.png 848w, https://substackcdn.com/image/fetch/$s_!gccI!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20c69bfd-0486-427e-a9ed-3e6fd222822d_720x310.png 1272w, https://substackcdn.com/image/fetch/$s_!gccI!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20c69bfd-0486-427e-a9ed-3e6fd222822d_720x310.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!gccI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20c69bfd-0486-427e-a9ed-3e6fd222822d_720x310.png" width="634" height="272.97222222222223" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/20c69bfd-0486-427e-a9ed-3e6fd222822d_720x310.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:310,&quot;width&quot;:720,&quot;resizeWidth&quot;:634,&quot;bytes&quot;:73536,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aigc.openbot.ai/i/157812222?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20c69bfd-0486-427e-a9ed-3e6fd222822d_720x310.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!gccI!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20c69bfd-0486-427e-a9ed-3e6fd222822d_720x310.png 424w, https://substackcdn.com/image/fetch/$s_!gccI!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20c69bfd-0486-427e-a9ed-3e6fd222822d_720x310.png 848w, https://substackcdn.com/image/fetch/$s_!gccI!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20c69bfd-0486-427e-a9ed-3e6fd222822d_720x310.png 1272w, https://substackcdn.com/image/fetch/$s_!gccI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F20c69bfd-0486-427e-a9ed-3e6fd222822d_720x310.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">https://lmsys.org/blog/2024-09-04-sglang-v0-3/</figcaption></figure></div><p></p><p></p><p><strong>Using FlashMLA</strong></p><p><strong>Environment Requirements:</strong></p><ul><li><p>Hopper GPUs</p></li><li><p>Minimum CUDA 12.3</p></li><li><p>Minimum PyTorch 2.0</p></li></ul><p><strong>Installation:</strong></p><pre><code><code>git clone https://github.com/deepseek-ai/FlashMLA  
cd FlashMLA  
python setup.py install  </code></code></pre><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!o08t!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1bba6c35-b478-4a8d-9033-ac30922c97d8_1666x584.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!o08t!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1bba6c35-b478-4a8d-9033-ac30922c97d8_1666x584.png 424w, https://substackcdn.com/image/fetch/$s_!o08t!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1bba6c35-b478-4a8d-9033-ac30922c97d8_1666x584.png 848w, https://substackcdn.com/image/fetch/$s_!o08t!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1bba6c35-b478-4a8d-9033-ac30922c97d8_1666x584.png 1272w, https://substackcdn.com/image/fetch/$s_!o08t!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1bba6c35-b478-4a8d-9033-ac30922c97d8_1666x584.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!o08t!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1bba6c35-b478-4a8d-9033-ac30922c97d8_1666x584.png" width="1456" height="510" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1bba6c35-b478-4a8d-9033-ac30922c97d8_1666x584.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:510,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:239738,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aigc.openbot.ai/i/157812222?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1bba6c35-b478-4a8d-9033-ac30922c97d8_1666x584.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!o08t!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1bba6c35-b478-4a8d-9033-ac30922c97d8_1666x584.png 424w, https://substackcdn.com/image/fetch/$s_!o08t!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1bba6c35-b478-4a8d-9033-ac30922c97d8_1666x584.png 848w, https://substackcdn.com/image/fetch/$s_!o08t!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1bba6c35-b478-4a8d-9033-ac30922c97d8_1666x584.png 1272w, https://substackcdn.com/image/fetch/$s_!o08t!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1bba6c35-b478-4a8d-9033-ac30922c97d8_1666x584.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><br></p><p><strong>Performance:</strong><br>The repository provides a benchmark script that can be run directly. 
Official results show that on an H800 SXM5 with CUDA 12.6, it achieves speeds of up to 3000 GB/s under memory-bound configurations and 580 TFLOPS under compute-bound configurations.</p><p></p><p><strong>Code Analysis</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!78sG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98535b2d-d90a-4d1e-9e00-903020272662_850x858.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!78sG!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98535b2d-d90a-4d1e-9e00-903020272662_850x858.png 424w, https://substackcdn.com/image/fetch/$s_!78sG!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98535b2d-d90a-4d1e-9e00-903020272662_850x858.png 848w, https://substackcdn.com/image/fetch/$s_!78sG!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98535b2d-d90a-4d1e-9e00-903020272662_850x858.png 1272w, https://substackcdn.com/image/fetch/$s_!78sG!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98535b2d-d90a-4d1e-9e00-903020272662_850x858.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!78sG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98535b2d-d90a-4d1e-9e00-903020272662_850x858.png" width="497" height="501.67764705882354" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/98535b2d-d90a-4d1e-9e00-903020272662_850x858.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:858,&quot;width&quot;:850,&quot;resizeWidth&quot;:497,&quot;bytes&quot;:163327,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aigc.openbot.ai/i/157812222?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98535b2d-d90a-4d1e-9e00-903020272662_850x858.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!78sG!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98535b2d-d90a-4d1e-9e00-903020272662_850x858.png 424w, https://substackcdn.com/image/fetch/$s_!78sG!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98535b2d-d90a-4d1e-9e00-903020272662_850x858.png 848w, https://substackcdn.com/image/fetch/$s_!78sG!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98535b2d-d90a-4d1e-9e00-903020272662_850x858.png 1272w, https://substackcdn.com/image/fetch/$s_!78sG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98535b2d-d90a-4d1e-9e00-903020272662_850x858.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>FlashMLA&#8217;s codebase is relatively small with minimal dependencies.<br>The primary optimization techniques are as follows:</p><p><strong>1. Computation Chunking and Scheduling Optimization</strong></p><pre><code><code>template&lt;int kHeadDim_, int kBlockM_, int kBlockN_, int kNWarps_&gt;  
struct Flash_fwd_kernel_traits_mla {  
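    // Head dimension from the template parameter (used by kBlockKSmem below)  
    static constexpr int kHeadDim = kHeadDim_;  
    
    // Tile sizes come from the template parameters; this walkthrough assumes 64x64  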
    static constexpr int kBlockM = kBlockM_;    
    static constexpr int kBlockN = kBlockN_;    
    
    // Warps per thread block, also a template parameter (8 in this walkthrough)  
    static constexpr int kNWarps = kNWarps_;    
    static constexpr int kNThreads = kNWarps * 32;  
    
    // Shared-memory tile width along the head dimension: 64 if kHeadDim is a multiple of 64, else 32  
    static constexpr int kBlockKSmem = kHeadDim % 64 == 0 ? 64 : 32;  
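    // Illustrative arithmetic (values assumed from this article, not read from the source):  
    // kNWarps_ = 8 gives kNThreads = 8 * 32 = 256 threads per block, and  
    // kHeadDim_ = 576 gives 576 % 64 == 0, so kBlockKSmem = 64.  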
};</code></code></pre><p><strong>Key Points:</strong></p><ul><li><p>Improves computational efficiency by tiling the computation into blocks (block size of 64) and running several warps per thread block in parallel; the block size also matches the 64-token pages of the paged KV cache.</p></li></ul><p><strong>2. Memory Access Optimization</strong></p><pre><code><code>struct Flash_fwd_mla_params {  
    using index_t = int64_t;  
    int b, seqlen_q, d, d_v;  
    int h, h_h_k_ratio, ngroups;  
    bool is_causal;  
    float scale_softmax, scale_softmax_log2;  
    int *__restrict__ cu_seqlens_k;  
    void *__restrict__ q_ptr;  
    void *__restrict__ k_ptr;  
    void *__restrict__ v_ptr;  
    void *__restrict__ o_ptr;  
    void *__restrict__ softmax_lse_ptr;  
    index_t q_batch_stride;  
    index_t k_batch_stride;  
    index_t v_batch_stride;  
    index_t o_batch_stride;  
    index_t q_row_stride;  
    index_t k_row_stride;  
    index_t v_row_stride;  
    index_t o_row_stride;  
    index_t q_head_stride;  
    index_t k_head_stride;  
    index_t v_head_stride;  
    index_t o_head_stride;  
    int *__restrict__ block_table;  
    index_t block_table_batch_stride;  
    int page_block_size;  
    int *__restrict__ tile_scheduler_metadata_ptr;  
    int num_sm_parts;  
    int *__restrict__ num_splits_ptr;  
    void *__restrict__ softmax_lseaccum_ptr;  
    void *__restrict__ oaccum_ptr;  
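    // Illustrative note, not part of the FlashMLA source. Assuming the usual paged-KV  
    // convention (as in FlashAttention's paged cache), token t of batch b is located via:  
    //   page   = block_table[b * block_table_batch_stride + t / page_block_size];  
    //   offset = page * k_batch_stride + (t % page_block_size) * k_row_stride;  
    // so the pages of one sequence do not need to be contiguous in memory.  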
};</code></code></pre><p><strong>Key Points:</strong></p><ul><li><p>Uses paged KV caching (block_table, page_block_size).</p></li><li><p>Passes explicit batch, row, and head strides for Q, K, V, and O, so the kernel can address each tensor without assuming a contiguous layout.</p></li><li><p>Schedules work with precomputed metadata (tile_scheduler_metadata_ptr, num_sm_parts, num_splits_ptr).</p></li></ul><p><strong>3. Softmax Computation Optimization</strong></p><pre><code><code>for (int mi = 0; mi &lt; size&lt;0&gt;(tensor); ++mi) {  
    MaxOp&lt;float&gt; max_op;  
    max(mi) = zero_init ? tensor(mi, 0) : max_op(max(mi), tensor(mi, 0));  
    #pragma unroll  
    for (int ni = 1; ni &lt; size&lt;1&gt;(tensor); ni++) {  
        max(mi) = max_op(max(mi), tensor(mi, ni));  
    }  
    max(mi) = Allreduce&lt;4&gt;::run(max(mi), max_op);  
    const float max_scaled = max(mi) == -INFINITY ? 0.f : max(mi) * scale;  
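    // Illustrative note (an assumption, not stated in this snippet): exp(x) equals exp2(x * log2(e)),  
    // so if `scale` already includes the factor log2(e) (compare scale_softmax_log2 in the  
    // parameter struct), the exp2f call below computes exp(softmax_scale * (x - max)), and the  
    // multiply-subtract inside it can be emitted as a single FFMA.  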
    sum(mi) = 0;  
    #pragma unroll  
    for (int ni = 0; ni &lt; size&lt;1&gt;(tensor); ++ni)  {  
        tensor(mi, ni) = exp2f(tensor(mi, ni) * scale - max_scaled);  
        sum(mi) += tensor(mi, ni);  
    }  
    SumOp&lt;float&gt; sum_op;  
    sum(mi) = Allreduce&lt;4&gt;::run(sum(mi), sum_op);  
}</code></code></pre><p><strong>Key Points:</strong></p><ul><li><p>Uses exp2 on pre-scaled inputs instead of exp (and log2 instead of log).</p></li><li><p>The scale-and-subtract inside exp2f can be fused into a single FFMA instruction.</p></li><li><p>Reduces both the row maximum and the row sum across threads with a warp-level Allreduce&lt;4&gt;.</p></li></ul><p><strong>4. Double Buffering Optimization</strong></p><pre><code><code>struct SharedStorageMLA {  
    union {  
        struct {  
            cute::array_aligned&lt;Element, cosize_v&lt;SmemLayoutQ&gt;&gt; smem_q;  
            // The K buffer is allocated at twice the tile size for double buffering  
            cute::array_aligned&lt;Element, cosize_v&lt;SmemLayoutK&gt; * 2&gt; smem_k;  // Double buffer  
            cute::array_aligned&lt;Element, cosize_v&lt;SmemLayoutP&gt;&gt; smem_p;  
            cute::array_aligned&lt;ElementAccum, cosize_v&lt;SmemLayoutRow&gt;&gt; smem_scale;  
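            // Illustrative note (generic double-buffering pattern, not traced from the FlashMLA kernel):  
            // with two K tiles in shared memory, iteration i can compute on tile (i % 2) while the  
            // tile for iteration i + 1 is copied into the other half, overlapping loads with math.  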
        };  
    };  
};</code></code></pre><p><strong>Key Points:</strong></p><ul><li><p>Hides memory latency with double buffering to improve hardware utilization.</p></li></ul><p></p><p><strong>Summary of FlashMLA</strong></p><p>FlashMLA is essentially a customized version of FlashAttention. Its current applicable scenarios include:</p><ul><li><p>Environments with CUDA 12.3 or later and SM90 (Hopper) GPUs.</p></li><li><p>Decoding-time inference with BF16 attention (QK head dimension 576, V head dimension 512).</p></li><li><p>Long-sequence scenarios that benefit from split-K schemes to boost throughput.</p><p></p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!JEzB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0cc22417-02e9-4a3c-96ec-a4656d17f5de_1654x934.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!JEzB!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0cc22417-02e9-4a3c-96ec-a4656d17f5de_1654x934.png 424w, https://substackcdn.com/image/fetch/$s_!JEzB!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0cc22417-02e9-4a3c-96ec-a4656d17f5de_1654x934.png 848w, https://substackcdn.com/image/fetch/$s_!JEzB!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0cc22417-02e9-4a3c-96ec-a4656d17f5de_1654x934.png 1272w, https://substackcdn.com/image/fetch/$s_!JEzB!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0cc22417-02e9-4a3c-96ec-a4656d17f5de_1654x934.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!JEzB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0cc22417-02e9-4a3c-96ec-a4656d17f5de_1654x934.png" width="1456" height="822" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0cc22417-02e9-4a3c-96ec-a4656d17f5de_1654x934.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:822,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:484379,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aigc.openbot.ai/i/157812222?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0cc22417-02e9-4a3c-96ec-a4656d17f5de_1654x934.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!JEzB!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0cc22417-02e9-4a3c-96ec-a4656d17f5de_1654x934.png 424w, https://substackcdn.com/image/fetch/$s_!JEzB!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0cc22417-02e9-4a3c-96ec-a4656d17f5de_1654x934.png 848w, https://substackcdn.com/image/fetch/$s_!JEzB!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0cc22417-02e9-4a3c-96ec-a4656d17f5de_1654x934.png 1272w, https://substackcdn.com/image/fetch/$s_!JEzB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0cc22417-02e9-4a3c-96ec-a4656d17f5de_1654x934.png 1456w" sizes="100vw" 
loading="lazy"></picture></div></a></figure></div><p></p><p>As the figure above shows, the official team still has many other optimization techniques. I&#8217;m looking forward to tomorrow&#8217;s project: could it be infra-related? Perhaps something as challenging as MTP (multi-token prediction)? 
</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://aigc.news/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://aigc.news/subscribe?"><span>Subscribe now</span></a></p><p></p>]]></content:encoded></item><item><title><![CDATA[DeepSeek OpenSourceWeek is Coming: What Mysterious Technologies Might Be Unveiled?]]></title><description><![CDATA[Let&#8217;s guess what kind of projects DeepSeek might open-source next week.]]></description><link>https://aigc.news/p/deepseek-opensourceweek-is-coming</link><guid isPermaLink="false">https://aigc.news/p/deepseek-opensourceweek-is-coming</guid><dc:creator><![CDATA[pxiaoer]]></dc:creator><pubDate>Sun, 23 Feb 2025 14:34:45 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!GNiV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2488486e-4b53-4e2a-83f6-650972e753c3_680x453.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!GNiV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2488486e-4b53-4e2a-83f6-650972e753c3_680x453.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!GNiV!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2488486e-4b53-4e2a-83f6-650972e753c3_680x453.jpeg 424w, https://substackcdn.com/image/fetch/$s_!GNiV!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2488486e-4b53-4e2a-83f6-650972e753c3_680x453.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!GNiV!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2488486e-4b53-4e2a-83f6-650972e753c3_680x453.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!GNiV!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2488486e-4b53-4e2a-83f6-650972e753c3_680x453.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!GNiV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2488486e-4b53-4e2a-83f6-650972e753c3_680x453.jpeg" width="592" height="394.3764705882353" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2488486e-4b53-4e2a-83f6-650972e753c3_680x453.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:453,&quot;width&quot;:680,&quot;resizeWidth&quot;:592,&quot;bytes&quot;:20551,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://aigc.openbot.ai/i/157742571?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2488486e-4b53-4e2a-83f6-650972e753c3_680x453.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!GNiV!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2488486e-4b53-4e2a-83f6-650972e753c3_680x453.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!GNiV!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2488486e-4b53-4e2a-83f6-650972e753c3_680x453.jpeg 848w, https://substackcdn.com/image/fetch/$s_!GNiV!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2488486e-4b53-4e2a-83f6-650972e753c3_680x453.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!GNiV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2488486e-4b53-4e2a-83f6-650972e753c3_680x453.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>On February 21, DeepSeek announced on X that it was warming up for Open Source Week: starting next week, it will open-source five projects over five consecutive days. I will write a detailed article on each project the day it is announced. Feel free to follow me for the latest analyses.</p><p>Today, I&#8217;ll make some predictions about which projects might be open-sourced. If I guess even one correctly, I&#8217;ll consider it a win.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!q1NO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd03abd3a-54a1-4dd4-b3c3-905ba032dec6_1080x1296.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!q1NO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd03abd3a-54a1-4dd4-b3c3-905ba032dec6_1080x1296.png 424w, https://substackcdn.com/image/fetch/$s_!q1NO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd03abd3a-54a1-4dd4-b3c3-905ba032dec6_1080x1296.png 848w, https://substackcdn.com/image/fetch/$s_!q1NO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd03abd3a-54a1-4dd4-b3c3-905ba032dec6_1080x1296.png 1272w, https://substackcdn.com/image/fetch/$s_!q1NO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd03abd3a-54a1-4dd4-b3c3-905ba032dec6_1080x1296.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!q1NO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd03abd3a-54a1-4dd4-b3c3-905ba032dec6_1080x1296.png" width="524" height="628.8" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d03abd3a-54a1-4dd4-b3c3-905ba032dec6_1080x1296.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1296,&quot;width&quot;:1080,&quot;resizeWidth&quot;:524,&quot;bytes&quot;:563803,&quot;alt&quot;:&quot;https://x.com/deepseek_ai/status/1892786555494019098&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aigc.openbot.ai/i/157742571?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd03abd3a-54a1-4dd4-b3c3-905ba032dec6_1080x1296.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="https://x.com/deepseek_ai/status/1892786555494019098" title="https://x.com/deepseek_ai/status/1892786555494019098" srcset="https://substackcdn.com/image/fetch/$s_!q1NO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd03abd3a-54a1-4dd4-b3c3-905ba032dec6_1080x1296.png 424w, https://substackcdn.com/image/fetch/$s_!q1NO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd03abd3a-54a1-4dd4-b3c3-905ba032dec6_1080x1296.png 848w, https://substackcdn.com/image/fetch/$s_!q1NO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd03abd3a-54a1-4dd4-b3c3-905ba032dec6_1080x1296.png 1272w, 
https://substackcdn.com/image/fetch/$s_!q1NO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd03abd3a-54a1-4dd4-b3c3-905ba032dec6_1080x1296.png 1456w" sizes="100vw"></picture></div></a><figcaption class="image-caption">https://x.com/deepseek_ai/status/1892786555494019098</figcaption></figure></div><p></p><p><strong>First, there will almost certainly be something related to infra.</strong><br>In the <a href="https://x.com/deepseek_ai/status/1892786555494019098">X post</a>, they describe themselves as a small team within DeepSeek sharing small but genuine progress, specifically modules used in their online services. The emphasis on &#8220;small&#8221; likely points to code for model inference optimization.</p><p></p><p>The recent release of DeepSeek-R1 has generated significant buzz, but major frameworks still lack robust support for optimized inference. My guess is that they will release some official implementations directly, since these are currently in high demand.</p><p></p><p>These optimizations could include the deployment and inference strategies described in the <a href="https://arxiv.org/abs/2412.19437">DeepSeek V3 technical report</a>, such as its separate prefilling and decoding stages.</p><p></p><p><strong>Second, the official open-infra-index repository points to a paper that is likely related.</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!kSAw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F432dbb17-c11b-4a12-82d2-fcdf07cd3b53_1784x1630.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!kSAw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F432dbb17-c11b-4a12-82d2-fcdf07cd3b53_1784x1630.png 424w, https://substackcdn.com/image/fetch/$s_!kSAw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F432dbb17-c11b-4a12-82d2-fcdf07cd3b53_1784x1630.png 848w, https://substackcdn.com/image/fetch/$s_!kSAw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F432dbb17-c11b-4a12-82d2-fcdf07cd3b53_1784x1630.png 1272w, 
https://substackcdn.com/image/fetch/$s_!kSAw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F432dbb17-c11b-4a12-82d2-fcdf07cd3b53_1784x1630.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!kSAw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F432dbb17-c11b-4a12-82d2-fcdf07cd3b53_1784x1630.png" width="577" height="527.0673076923077" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/432dbb17-c11b-4a12-82d2-fcdf07cd3b53_1784x1630.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1330,&quot;width&quot;:1456,&quot;resizeWidth&quot;:577,&quot;bytes&quot;:776053,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aigc.openbot.ai/i/157742571?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F432dbb17-c11b-4a12-82d2-fcdf07cd3b53_1784x1630.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!kSAw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F432dbb17-c11b-4a12-82d2-fcdf07cd3b53_1784x1630.png 424w, https://substackcdn.com/image/fetch/$s_!kSAw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F432dbb17-c11b-4a12-82d2-fcdf07cd3b53_1784x1630.png 848w, 
https://substackcdn.com/image/fetch/$s_!kSAw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F432dbb17-c11b-4a12-82d2-fcdf07cd3b53_1784x1630.png 1272w, https://substackcdn.com/image/fetch/$s_!kSAw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F432dbb17-c11b-4a12-82d2-fcdf07cd3b53_1784x1630.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">https://github.com/deepseek-ai/open-infra-index</figcaption></figure></div><p><br>Today, DeepSeek created an "<a 
href="https://github.com/deepseek-ai/open-infra-index">open-infra-index</a>" repo on GitHub, which includes a paper: <em><a href="https://arxiv.org/abs/2408.14158">Fire-Flyer AI-HPC: A Cost-Effective Software-Hardware Co-Design for Deep Learning</a></em>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Pyc2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85920787-4058-43e1-a396-ad030fef687a_1136x1392.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Pyc2!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85920787-4058-43e1-a396-ad030fef687a_1136x1392.png 424w, https://substackcdn.com/image/fetch/$s_!Pyc2!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85920787-4058-43e1-a396-ad030fef687a_1136x1392.png 848w, https://substackcdn.com/image/fetch/$s_!Pyc2!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85920787-4058-43e1-a396-ad030fef687a_1136x1392.png 1272w, https://substackcdn.com/image/fetch/$s_!Pyc2!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85920787-4058-43e1-a396-ad030fef687a_1136x1392.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Pyc2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85920787-4058-43e1-a396-ad030fef687a_1136x1392.png" width="565" height="692.3239436619718" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/85920787-4058-43e1-a396-ad030fef687a_1136x1392.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1392,&quot;width&quot;:1136,&quot;resizeWidth&quot;:565,&quot;bytes&quot;:933441,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aigc.openbot.ai/i/157742571?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85920787-4058-43e1-a396-ad030fef687a_1136x1392.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Pyc2!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85920787-4058-43e1-a396-ad030fef687a_1136x1392.png 424w, https://substackcdn.com/image/fetch/$s_!Pyc2!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85920787-4058-43e1-a396-ad030fef687a_1136x1392.png 848w, https://substackcdn.com/image/fetch/$s_!Pyc2!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85920787-4058-43e1-a396-ad030fef687a_1136x1392.png 1272w, https://substackcdn.com/image/fetch/$s_!Pyc2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85920787-4058-43e1-a396-ad030fef687a_1136x1392.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>This <em>Fire-Flyer AI-HPC</em> paper mainly discusses optimizations for a training cluster with 10,000 PCIe A100 GPUs, focusing on reducing system construction costs and energy consumption. Compared to DGX-A100, it reaches roughly 80% of the performance while, according to the paper&#8217;s abstract, cutting costs by about half and energy consumption by about 40%. I won&#8217;t go into the detailed innovations here; interested readers can read the paper themselves.</p><p>Why mention this paper? My guess is that they&#8217;ll open-source some of the system code referenced in it. Here are a few possibilities:</p><ul><li><p><strong>HFReduce</strong>: A library developed specifically for efficient allreduce operations, designed to optimize GPU communication in PCIe architectures. 
HFReduce overlaps computation and communication through asynchronous allreduce operations, significantly boosting performance.</p></li><li><p><strong>HaiScale</strong>: A distributed data-parallel training tool that uses HFReduce as its communication backend, enabling asynchronous allreduce operations during backpropagation to improve training efficiency.</p></li><li><p><strong>3FS</strong>: A high-performance distributed file system designed to fully leverage the high IOPS and throughput of NVMe SSDs and RDMA networks. It supports efficient read/write operations and achieves traffic isolation under high loads.</p></li><li><p><strong>3FS-KV</strong>: A shared-storage distributed data processing system built on 3FS, supporting various data models (e.g., key-value storage, message queues, and object storage) with read-write separation and on-demand startup features.</p></li><li><p><strong>HAI Platform</strong>: A time-sharing scheduling platform that manages and schedules training tasks to ensure efficient GPU resource utilization.</p></li><li><p><strong>Checkpoint Manager</strong>: A tool for managing checkpoints during large language model training, supporting rapid recovery from hardware failures to ensure training continuity.</p></li></ul><p></p><p>Whether they open-source one or several of these is hard to say, but at the very least the releases should involve systems from this paper, possibly the HAI Platform.</p><p></p><p><strong>Third, an RL training framework? It&#8217;s possible.</strong><br>They might open-source some RL methods from DeepSeek-R1. RL training is genuinely hard to get right. 
However, I think the likelihood of this is low.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Rlwb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6b502ee-6bd5-480e-8b7a-0ac2263eae9c_675x499.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Rlwb!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6b502ee-6bd5-480e-8b7a-0ac2263eae9c_675x499.png 424w, https://substackcdn.com/image/fetch/$s_!Rlwb!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6b502ee-6bd5-480e-8b7a-0ac2263eae9c_675x499.png 848w, https://substackcdn.com/image/fetch/$s_!Rlwb!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6b502ee-6bd5-480e-8b7a-0ac2263eae9c_675x499.png 1272w, https://substackcdn.com/image/fetch/$s_!Rlwb!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6b502ee-6bd5-480e-8b7a-0ac2263eae9c_675x499.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Rlwb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6b502ee-6bd5-480e-8b7a-0ac2263eae9c_675x499.png" width="581" height="429.5096296296296" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e6b502ee-6bd5-480e-8b7a-0ac2263eae9c_675x499.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:499,&quot;width&quot;:675,&quot;resizeWidth&quot;:581,&quot;bytes&quot;:185491,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aigc.openbot.ai/i/157742571?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6b502ee-6bd5-480e-8b7a-0ac2263eae9c_675x499.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Rlwb!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6b502ee-6bd5-480e-8b7a-0ac2263eae9c_675x499.png 424w, https://substackcdn.com/image/fetch/$s_!Rlwb!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6b502ee-6bd5-480e-8b7a-0ac2263eae9c_675x499.png 848w, https://substackcdn.com/image/fetch/$s_!Rlwb!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6b502ee-6bd5-480e-8b7a-0ac2263eae9c_675x499.png 1272w, https://substackcdn.com/image/fetch/$s_!Rlwb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6b502ee-6bd5-480e-8b7a-0ac2263eae9c_675x499.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 
20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><p></p><p><strong>Fourth, an inference framework? The most likely.</strong><br>DeepSeek&#8217;s internal inference optimization is top-notch, and its API pricing is very low. Inference costs for external vendors, however, remain high. If DeepSeek open-sources an inference framework, it could help the major frameworks improve their efficiency more quickly.</p><p>For example, the recently announced NSA (<em><a href="https://arxiv.org/abs/2502.11089">Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention</a></em>) might come with code. It&#8217;d be even better if they included something like distributed KV caching.</p><p></p><p><strong>Fifth, a big guess about the open-source license.</strong><br>Currently, DeepSeek uses the MIT license, one of the most permissive licenses available. 
Many vendors&#8217; open-source licenses are stricter. LLaMA, for instance, has changed its license multiple times and is not generally considered a truly open-source model.</p><p>Here&#8217;s a rundown of licenses for some popular models:</p><ul><li><p><strong>DeepSeek</strong>: MIT, fully open.</p></li><li><p><strong>Qwen</strong>: Apache 2.0 + additional terms for some models.</p></li><li><p><strong>Mistral</strong>: Apache 2.0.</p></li><li><p><strong>LLaMA</strong>: Custom community license with usage restrictions (the original release was research-only), the most restrictive of the four.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!BaXU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f2c606f-b64a-418b-8de2-294abcb308aa_1752x1096.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!BaXU!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f2c606f-b64a-418b-8de2-294abcb308aa_1752x1096.png 424w, https://substackcdn.com/image/fetch/$s_!BaXU!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f2c606f-b64a-418b-8de2-294abcb308aa_1752x1096.png 848w, https://substackcdn.com/image/fetch/$s_!BaXU!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f2c606f-b64a-418b-8de2-294abcb308aa_1752x1096.png 1272w, https://substackcdn.com/image/fetch/$s_!BaXU!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f2c606f-b64a-418b-8de2-294abcb308aa_1752x1096.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!BaXU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f2c606f-b64a-418b-8de2-294abcb308aa_1752x1096.png" width="604" height="377.91483516483515" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4f2c606f-b64a-418b-8de2-294abcb308aa_1752x1096.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:911,&quot;width&quot;:1456,&quot;resizeWidth&quot;:604,&quot;bytes&quot;:410470,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aigc.openbot.ai/i/157742571?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f2c606f-b64a-418b-8de2-294abcb308aa_1752x1096.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!BaXU!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f2c606f-b64a-418b-8de2-294abcb308aa_1752x1096.png 424w, https://substackcdn.com/image/fetch/$s_!BaXU!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f2c606f-b64a-418b-8de2-294abcb308aa_1752x1096.png 848w, https://substackcdn.com/image/fetch/$s_!BaXU!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f2c606f-b64a-418b-8de2-294abcb308aa_1752x1096.png 1272w, https://substackcdn.com/image/fetch/$s_!BaXU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f2c606f-b64a-418b-8de2-294abcb308aa_1752x1096.png 
1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Perplexity&#8217;s recent r1-1776 fiasco was clownish, and behavior like that might push DeepSeek toward a stricter license. Still, I&#8217;m guessing these five projects will stick with MIT. I&#8217;ll share more details on the licenses once they&#8217;re officially announced.</p><p></p><p>Thanks, DeepSeek. 
Next week, we will continue analyzing the five open-sourced projects.</p><p></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://aigc.news/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://aigc.news/subscribe?"><span>Subscribe now</span></a></p><p></p>]]></content:encoded></item><item><title><![CDATA[#0 Scale Mindset]]></title><description><![CDATA[Scale Mindset]]></description><link>https://aigc.news/p/0-scale-mindset</link><guid isPermaLink="false">https://aigc.news/p/0-scale-mindset</guid><dc:creator><![CDATA[pxiaoer]]></dc:creator><pubDate>Sun, 02 Jun 2024 05:22:41 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!1a0Y!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99203108-3b5b-443d-976b-49fa251f6042_2000x1118.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1a0Y!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99203108-3b5b-443d-976b-49fa251f6042_2000x1118.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1a0Y!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99203108-3b5b-443d-976b-49fa251f6042_2000x1118.png 424w, https://substackcdn.com/image/fetch/$s_!1a0Y!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99203108-3b5b-443d-976b-49fa251f6042_2000x1118.png 848w, 
https://substackcdn.com/image/fetch/$s_!1a0Y!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99203108-3b5b-443d-976b-49fa251f6042_2000x1118.png 1272w, https://substackcdn.com/image/fetch/$s_!1a0Y!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99203108-3b5b-443d-976b-49fa251f6042_2000x1118.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!1a0Y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99203108-3b5b-443d-976b-49fa251f6042_2000x1118.png" width="1456" height="814" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/99203108-3b5b-443d-976b-49fa251f6042_2000x1118.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:814,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1555646,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!1a0Y!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99203108-3b5b-443d-976b-49fa251f6042_2000x1118.png 424w, https://substackcdn.com/image/fetch/$s_!1a0Y!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99203108-3b5b-443d-976b-49fa251f6042_2000x1118.png 848w, 
https://substackcdn.com/image/fetch/$s_!1a0Y!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99203108-3b5b-443d-976b-49fa251f6042_2000x1118.png 1272w, https://substackcdn.com/image/fetch/$s_!1a0Y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F99203108-3b5b-443d-976b-49fa251f6042_2000x1118.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><h2><strong>Scale Mindset</strong></h2><p>Scale refers to the replication, development, and expansion of something to 
generate cumulative positive-feedback effects that can drive exponential growth.</p><p>This mindset encourages y&#8230;</p>
      <p>
          <a href="https://aigc.news/p/0-scale-mindset">
              Read more
          </a>
      </p>
   ]]></content:encoded></item></channel></rss>