The CAT’s Out of the Bag: Introducing Content-Addressable Transformers
BlockScience
A complex systems engineering firm combining research & engineering to design safe & resilient socio-technical systems.
A New Framework for Data Transformation Provenance in Complex Systems Modeling
This piece introduces and explores the concept of Content-Addressable Transformers (CATs) at a high level. CATs are part of the BlockScience team’s research on data-driven systems, inspired by our work in the Filecoin ecosystem but applicable far beyond. While this research exists at the design-pattern level for the time being, we see it enabling a new frontier of impactful use cases on the content-addressable web. We invite you to join us on the development journey of this exciting new software framework for data provenance in complex systems modeling.
A process diagram that shows how a Content-Addressable Transformer might provide reliable, verifiable data transformation & outputs.
What is a CAT?
Content-Addressable Transformers (shortened to CATs) are a unified software framework that empowers cross-domain collaboration on data-processing pipelines between decentralized, multi-disciplinary teams and organizations. They enable data and process verification and provenance as chains of evidence for retrieval and re-execution by content-addressing the means of processing (input, process, output, infrastructure-as-code). CATs are implemented using the horizontal scaling capacity of Web2 cloud services and enable data provenance using content identifiers (CIDs) as the means of data transport between services. This framework offers the data transformation provenance that is critical for data verification in large-scale open source modeling and data science.
Example outputs (pink boxes) of the cryptographic hashing of data (blue boxes).
CATs use content-addressing to find input data, the code for processing that data, and instructions for building the system on which the code will run. In this way, a CAT defines running specific data through specific programs with specific parameters. As a result, a verifiable output is obtained with a receipt of its computational provenance, which is also content-addressed. This process creates a chain of evidence that can be used to verify data sources and the processes used to transform them into their present form, for more transparent, verifiable, and reliable data.
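The flow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the CATs implementation: SHA-256 hex digests stand in for IPFS content identifiers (real CIDs use a multihash-based encoding), a plain dictionary stands in for a content-addressed network, and the names `content_address`, `run_cat`, `verify`, and the receipt fields are all hypothetical.

```python
import hashlib
import json


def content_address(data: bytes) -> str:
    """Stand-in for a CID: the SHA-256 hex digest of the bytes (assumption)."""
    return hashlib.sha256(data).hexdigest()


def run_cat(input_data: bytes, process_src: str, infra_spec: str, store: dict) -> str:
    """Run a process on an input and return the address of its receipt.

    `store` is a hypothetical in-memory stand-in for a content-addressed network.
    """
    # Execute the process source; it must define transform(bytes) -> bytes.
    namespace = {}
    exec(process_src, namespace)
    output_data = namespace["transform"](input_data)

    # Address every ingredient and the result, storing each under its own hash.
    receipt = {}
    for role, blob in [
        ("input", input_data),
        ("process", process_src.encode()),
        ("infrastructure", infra_spec.encode()),
        ("output", output_data),
    ]:
        address = content_address(blob)
        store[address] = blob
        receipt[role] = address

    # The receipt is serialized deterministically and content-addressed as well.
    receipt_bytes = json.dumps(receipt, sort_keys=True).encode()
    receipt_address = content_address(receipt_bytes)
    store[receipt_address] = receipt_bytes
    return receipt_address


def verify(receipt_address: str, store: dict) -> bool:
    """Re-execute the recorded process on the recorded input and compare outputs."""
    receipt = json.loads(store[receipt_address])
    namespace = {}
    exec(store[receipt["process"]].decode(), namespace)
    output = namespace["transform"](store[receipt["input"]])
    return content_address(output) == receipt["output"]


PROCESS = "def transform(data):\n    return data.upper()\n"
INFRA = "python: '3.11'\n"  # hypothetical infrastructure-as-code stand-in

store = {}
receipt_id = run_cat(b"hello cats", PROCESS, INFRA, store)
print(verify(receipt_id, store))
```

Because the receipt only holds addresses, anyone holding the receipt's address can fetch the exact input, code, and infrastructure description, re-run the process, and check that the output hash matches. In this sketch the infrastructure description is recorded but not used to rebuild an environment, which a real system would need to do.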
A system diagram of a CAT, with inputs, outputs & process flows.
Using the nomenclature of CATs, we can verify each of the components named above: the input, the process, the output, and the infrastructure-as-code.
These transformation processes can then be composed into more complex workflows, and those compositions can also be content-addressed, and so on. This enables Kubernetes-style containerized computation that’s also content-addressable on decentralized networks like the InterPlanetary File System (IPFS).
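One way to picture this composition: if each step is identified by the hash of its code, a workflow can be identified by the hash of its ordered list of step addresses, and workflows can nest in the same way. The sketch below is a hedged illustration only. SHA-256 digests again stand in for CIDs, a dictionary stands in for the network, and `address_of`, `register`, `compose`, and `run` are illustrative names that are not taken from the CATs codebase.

```python
import hashlib
import json


def address_of(obj) -> str:
    """Stand-in for a CID: SHA-256 of a deterministic JSON encoding (assumption)."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()


def register(step_src: str, store: dict) -> str:
    """Store a single step's source code under its content address."""
    node = {"step": step_src}
    address = address_of(node)
    store[address] = node
    return address


def compose(addresses: list, store: dict) -> str:
    """Store a workflow as an ordered list of addresses; the list is itself addressed."""
    node = {"workflow": addresses}
    address = address_of(node)
    store[address] = node
    return address


def run(address: str, value, store: dict):
    """Resolve an address and run it; workflows recurse into their parts in order."""
    node = store[address]
    if "step" in node:
        namespace = {}
        exec(node["step"], namespace)
        return namespace["transform"](value)
    for part in node["workflow"]:
        value = run(part, value, store)
    return value


store = {}
double = register("def transform(x):\n    return x * 2\n", store)
add_one = register("def transform(x):\n    return x + 1\n", store)

inner = compose([double, add_one], store)  # x -> 2x + 1
outer = compose([inner, double], store)    # a workflow that contains a workflow

print(run(outer, 3, store))
```

Since a workflow's address depends on the addresses of its parts and their order, changing any step or reordering steps yields a different workflow address, which is what lets a composition be referenced and verified as a single unit.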
How CATs might integrate into a larger ecosystem of web services.
What Can CATs Do?
The design space is ripe for exploration! For something as general purpose as verifiable data and compute, it’s hard to imagine the depths of what’s possible. That being said, we have a few ideas:
Collaborative data science
Shared tooling for data-driven reasoning about the world
Better bug reporting
Dynamical art
What’s Next for CATs?
This is the first part of an ongoing research series on CATs. In the future we’ll share a more technical post exploring the mechanisms that make CATs work, and a demo to start to interact with the code. We hope this sparks your imagination and interest in CATs and the content-addressed web!
CATs are part of our blue sky research stream and are very much a work in progress. You can find our technical work on this process in our GitHub repo, https://github.com/BlockScience/cats, as well as our technical next steps on the project.
If you are interested in supporting further research in this area, experimenting with CATs for your project’s use case, or want to contribute to the codebase and be an early adopter, please reach out to [email protected].
Content for this article was produced by David Sisson, Joshua Jodesty, Burrrata, and Jeff Emmett. Special thanks to Michael Zargham, Jessica Zartler, and Dr Kelsie N. for taking the time to review and provide feedback.
About BlockScience
BlockScience is an engineering, R&D, and analytics firm specializing in complex systems. Our focus is to design and build data-driven decision systems for new and legacy businesses, leveraging engineering methodologies and academic-grade rigor.
With deep expertise in Blockchain, Token Engineering, AI/Data Science, and Operations Research, we can provide quantitative consulting to technology-enabled businesses. Our work includes pre-launch design and evaluation of economic business and ecosystem models based on simulation and analysis. We also provide post-launch monitoring and maintenance via reporting, analytics, and decision support tools.
Original article published on the BlockScience Medium site.