Teraflop AI

Data Infrastructure and Analytics

About us

Big data

Website: https://www.teraflop.ai
Industry: Data Infrastructure and Analytics
Company size: 1 employee
Type: Privately held

Updates

Teraflop AI

38 followers
8 months ago
Teraflop AI is excited to help support the Caselaw Access Project and Harvard LIL in the release of over 6.6 million state and federal court decisions published throughout U.S. history. In collaboration with Ravel Law, hlslib digitized over 40 million pages of U.S. court decisions, comprising 6.7 million cases from the last 360 years, into a widely accessible dataset. You can bulk download the data using the CAP API: https://case.law/caselaw/

It is important to democratize fair access to data for the public, the legal community, and researchers. A processed and cleaned version of the data is available on Hugging Face: https://lnkd.in/ezkqG5bH

You can find more information about accessing state and federal written court decisions of common law in the bulk data service documentation: https://case.law/docs/

You can learn more about the Caselaw Access Project and all of the phenomenal work done by Jack Cushman, Greg Leppert, and macargnelutti here: https://case.law/about/

Digitizing these texts introduced OCR errors. We post-processed each text for model training to fix encoding, normalization, repetition, redundancy, parsing, and formatting. Teraflop AI's data engine allows for massively parallel processing of web-scale datasets into cleaned text. Our one-click deployment allowed us to easily split the computation across thousands of nodes on our managed infrastructure.

Thank you to Nomic AI for providing us with Atlas research credits to store and visualize each of the jurisdictions in this dataset. You can access the New York jurisdiction map and all of the other Nomic AI Atlas maps on Hugging Face here: https://lnkd.in/e2JGH7Bf

Nomic's Atlas projection algorithm clusters semantically similar data together, generating a topic hierarchy. You can find more information here: https://lnkd.in/e9JwrPJw

Nomic AI released nomic-embed-text-v1.5, an open-source text embedding model with an 8192-token context length, here: https://lnkd.in/e7qx6-Hy

You can find the detailed research paper on the methodologies used by Zach Nussbaum, Andriy Mulyar, and Brandon Duderstadt for the nomic-embed-text-v1.5 model here: https://lnkd.in/ejMHsT2W

You can find all of this information detailed in this post: https://lnkd.in/e5hyAvKr

Thank you to Shayne Longpre, Robert Mahari, Jon Tow, StabilityAI, Barry Zhang, Sam Ching, Eleuther AI, Daniel Chang, and the many others who have been supportive over these last months. We plan to release trillions of commercially licensed text tokens, images, audio, videos, and other datasets spanning numerous domains and modalities over the coming months.
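To make the post-processing steps named above more concrete, here is a minimal Python sketch of the kind of cleanup that encoding, normalization, and repetition fixes might involve. It is an illustration only and not Teraflop AI's actual pipeline: the function name `clean_ocr_text` and the specific rules (NFKC normalization, rejoining words hyphenated across line breaks, dropping consecutive duplicate lines, collapsing whitespace) are assumptions chosen for the example.

```python
import re
import unicodedata


def clean_ocr_text(text: str) -> str:
    """Hypothetical cleanup pass for OCR'd text (illustrative rules only)."""
    # Encoding/normalization: NFKC folds compatibility characters,
    # e.g. the ligature "ﬁ" becomes the two letters "fi".
    text = unicodedata.normalize("NFKC", text)

    # Parsing: rejoin words hyphenated across a line break.
    # Caveat: this also merges genuine compounds that happen to break
    # at a line end, so a real pipeline would need a dictionary check.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)

    cleaned_lines = []
    previous = None
    for line in text.split("\n"):
        # Formatting: collapse runs of spaces and tabs, trim the edges.
        line = re.sub(r"[ \t]+", " ", line).strip()
        # Repetition: skip a non-empty line identical to the one before it.
        if line and line == previous:
            continue
        cleaned_lines.append(line)
        previous = line

    # Redundancy: collapse three or more newlines into one blank line.
    text = re.sub(r"\n{3,}", "\n\n", "\n".join(cleaned_lines))
    return text.strip()


if __name__ == "__main__":
    sample = "The ﬁrst  court\nThe ﬁrst  court\nheld juris-\ndiction  proper.\n\n\n\nAffirmed."
    print(clean_ocr_text(sample))
```

Because the function is pure (it reads one string and returns another), it could in principle be mapped over document shards independently, which is the property that makes this kind of work easy to spread across many nodes.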

Employees at Teraflop AI

Enrico Shippole

ML Engineer

David Andrews

CS @ Georgia Tech | ML/AI Engineer @ TeraflopAI

Daniel Ching

Alum @ TKS | RISE Finalist | ML @ Teraflop

Similar pages

Ancient Tech LLC

DuckAI

IDXExchange

QuestMobile

Foundation

EuroLeaps

Talkyverse

Amariso Eyewear

Dhaka Reader

Framebird