Welcome to a new world! Today, d-Matrix introduces Corsair, the first-of-its-kind AI compute platform: 60,000 tokens/sec at 1 ms/token latency for Llama3 8B in a single server, and 30,000 tokens/sec at 2 ms/token latency for Llama3 70B in a single rack. Corsair shines with ultra-low-latency batched throughput! Ideal for tomorrow's use cases where models will "think" more, supercharging reasoning, agents and video generation. Celebrated with a toast at SC'24. Onwards and upwards! #dmatrix #corsair
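As a rough, unofficial sanity check (my assumption, not d-Matrix's published math): if the quoted latency is the per-token latency each generation stream observes, then the headline throughput and latency together imply how many streams are being served concurrently.

```python
# Back-of-the-envelope sketch (assumption: the quoted ms/token is the per-stream
# generation latency). Then aggregate throughput ~= concurrent streams / latency,
# so the implied number of concurrent streams is throughput * latency.

def implied_streams(tokens_per_sec: float, ms_per_token: float) -> float:
    """Concurrent generation streams implied by aggregate throughput and per-token latency."""
    return tokens_per_sec * (ms_per_token / 1000.0)

print(implied_streams(60_000, 1.0))  # Llama3 8B, single server -> ~60 streams
print(implied_streams(30_000, 2.0))  # Llama3 70B, single rack  -> ~60 streams
```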
About us
To make AI inference commercially viable, d-Matrix has built a new computing platform from the ground up: Corsair, the world's most efficient compute solution for AI inference at datacenter scale. We are redefining performance and efficiency for AI inference at scale.
- Website
- https://www.d-matrix.ai
- Industry
- Semiconductor Manufacturing
- Company size
- 51-200 employees
- Headquarters
- Santa Clara, California
- Type
- Privately held
- Founded
- 2019
- Specialties
- AI, semiconductor, Inference, AI chips, and Blazing fast inference
Locations
- Primary
5201 Great America Pkwy
Santa Clara, California 95054, US
Updates
-
#Hiring! Apply: https://lnkd.in/gEkh6YuC
-
Standing room only at The University of Texas at Austin Hook 'Em House session on energy efficiency and AI with d-Matrix's Richard Ogawa at SXSW 2025. AI's requirement for more and more energy in the data center as we move into the age of reasoning and inference calls for a whole new architecture and technology, and the panel looked at the technologies that are paving the way, including d-Matrix's innovations. #AI #Inference #EnergyEfficiency
-
d-Matrix reposted this
Ultra-low latency, batched inference + HumanX 2025. I often get asked how the d-Matrix solution differs from GPU computing platforms for Gen AI inference. The GPU is an excellent throughput engine, but it needs a lot of batched users to keep it fully utilized, and that comes at the expense of latency. The d-Matrix solution, with 2.5D and 3D stacked chiplets and memory-compute integration, is a throughput engine that needs fewer users to be fully utilized and accomplishes this with ultra-low latency. Back in the 1930s, a journey from NY to San Francisco required you to take a train that could carry 1,000 passengers over 5 days. Then came the jet engine that ushered in commercial air travel, and now a journey from NY to San Francisco takes a plane that can carry 250 passengers (say) over 5 hours. At d-Matrix, we have built a new engine for the inference age: we can carry a collection of users to their destination a lot faster, while ensuring commercial viability for the carrier. Both train travel and plane travel exist today, and they serve different purposes. If you care to dig into the math a bit, here is a link: https://lnkd.in/gez9f8K5 I will be at HumanX tomorrow speaking on a panel with Ami Badani and Rob Pegoraro on the need for "Rethinking infrastructure: Custom solutions for the AI era" (Track Stage 1, Level 6). Please join us. See you there. #HumanX2025 #GenAI #Inference
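To put rough numbers behind the analogy, here is a minimal illustrative sketch of the throughput-versus-latency trade-off; the batch sizes and per-token latencies below are invented for illustration and are not measured figures for any platform.

```python
# Illustrative sketch only: made-up batch sizes and latencies, not benchmarks.

def time_to_answer(num_tokens: int, ms_per_token: float) -> float:
    """Seconds a single user waits for an answer of num_tokens tokens."""
    return num_tokens * ms_per_token / 1000.0

def aggregate_throughput(batch_size: int, ms_per_token: float) -> float:
    """Tokens/sec across all users when batch_size streams decode in parallel."""
    return batch_size * 1000.0 / ms_per_token

# "Train": needs a large batch to stay utilized, higher per-token latency.
print(aggregate_throughput(256, 20.0), time_to_answer(500, 20.0))  # 12800 tok/s, 10 s per user

# "Plane": fewer users to saturate, ultra-low per-token latency.
print(aggregate_throughput(60, 2.0), time_to_answer(500, 2.0))     # 30000 tok/s, 1 s per user
```

The point of the comparison: both engines can post high aggregate throughput, but the low-latency one delivers each user's answer an order of magnitude sooner at a much smaller batch size.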
-
Corsair is the first-of-its-kind AI compute platform. It offers unparalleled performance and efficiency for generative AI inference in the datacenter. Each Corsair PCIe card contains two ASIC packages connected back-to-back using PCIe Gen5, and each package has four chiplets connected in an all-to-all topology using our DMX Link. Four packages across a pair of cards are connected via DMX Bridge in an all-to-all topology across 16 chiplets. This unique topology is critical for fast token generation speeds. In fact, four such pairs of Corsair cards (i.e., 8 cards) can be connected with PCIe switches and scaled up to build an inference server that integrates easily into AI rack infrastructure. Built for Fast AI Inference. Take the 3-minute tour: https://lnkd.in/g3sG5nVj #Fast #AI #Inference
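As a quick sanity check on the counts above, the sketch below multiplies out the chiplets per bridged card pair and per 8-card server, and derives how many point-to-point links a full 16-chiplet mesh would imply. The link-count formula is my own derivation under the assumption that "all-to-all" means a full mesh; it is not a figure from d-Matrix.

```python
# Counts taken from the post; the full-mesh link count is a derived assumption.
CHIPLETS_PER_PACKAGE = 4
PACKAGES_PER_CARD = 2
CARDS_PER_PAIR = 2
PAIRS_PER_SERVER = 4  # 8 cards per server, joined over PCIe switches

chiplets_per_pair = CHIPLETS_PER_PACKAGE * PACKAGES_PER_CARD * CARDS_PER_PAIR
print(chiplets_per_pair)  # 16 chiplets across a DMX Bridge-connected card pair

def full_mesh_links(n: int) -> int:
    """Point-to-point links needed to connect n endpoints in a full mesh."""
    return n * (n - 1) // 2

print(full_mesh_links(chiplets_per_pair))        # 120 links for a 16-chiplet full mesh
print(chiplets_per_pair * PAIRS_PER_SERVER)      # 64 chiplets in an 8-card server
```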
-
What are the hardware implications of DeepSeek R1? d-Matrix + DeepSeek = blazing fast, efficient inference. See how d-Matrix and DeepSeek match up: https://www.d-matrix.ai/ #AI #Inference #deepseek
-
"So it's no surprise that startups like d-Matrix and others are looking to get in on the "fast inference" game as well. The company expects its?Corsair accelerators, due out in Q2, will be able to run models like Llama 70B at latencies as low as 2 ms per token,?which, by our estimate, works out to 500 tokens a second. The company has set its sights on even larger models for its next-gen Raptor series of chips, which we're told will use vertically stacked DRAM to boost memory capacity and bandwidth." https://lnkd.in/g78Am9q9
-
d-Matrix's Sid Sheth talks about what's next for real-world adoption of AI on stage at HumanX with Arm's Ami Badani. Here's a preview of the talk, centering on the catalysts that are democratizing AI and making the technology commercially viable for enterprises with hardware designed for reasoning. Up next? The application layer and developers. #AI #Inference #reasoning
-
#Hiring! Apply: https://lnkd.in/gxJEiZnt
-
HumanX is on! d-Matrix is taking the stage with Arm to talk about power in the datacenter and the DeepSeek moment, and how the Inference Age demands a rethink of semiconductor architecture. Rob Pegoraro sits down with d-Matrix CEO and founder Sid Sheth and Arm's Ami Badani to discuss energy in the datacenter and how enterprises in the real world are scaling their AI inference deployments around workloads. 11:15 AM on stage. See you there! #HumanX #AI #Inference