From Data Threads to Entity Fabric
Ken Johnston Entity Graph

Here is a metaphor that has helped me in my work as a Data Scientist.

How do you go from individual threads strung across a loom to a piece of fabric and how do you take that fabric and turn it into a product? Turns out there are great big industrial machines that weave all those threads together for us. These machines are pretty amazing when in motion. Racks upon racks of thread are fed straight into the industrial loom at high speed and out the other end is a full piece of cloth where the thread count can range from a loose weave to one-thousand thread count.

Ken Aside: I always thought high thread count for sheets meant they were better. They certainly felt smoother and more luxurious, but thread count as the only measure of a good sheet is a myth. As a general rule, going over 800 thread count actually degrades durability. Also, the higher the thread count, the less room between fibers, so the sheets breathe less (Huffington Post Buying Guide). This is far from ideal for those hot summer nights.

I just attended AI Day for a group of us here within Microsoft. One of the common refrains was, “and then after we spent months organizing and labeling our data we built a DNN to …” In data science I feel like I am always tugging at threads, and at times I’m out in the fields harvesting the cotton just to make a new spool. But when it all works, when it all comes together, we produce a new fabric. Sometimes it’s one full of rich browns and oranges for the fall, and at other times we make pastels with flower prints for the spring.

In the end, though, the real value comes from finding the right threads and stitching them together with the right balance of thread count and weight, with room to breathe.

Our Fabric Making Process

It usually starts with an idea: “I wonder if.” It could be a simple idea or something crazy like, “I wonder if batteries in mobile devices last longer in colder climates?” We’ve never looked into this, but we all know how heat drains the batteries on our mobile devices faster, so maybe colder air around them would help.

From this idea we look for the raw material, some sort of data source. We spelunk through the data (Big Data Spelunking is another concept I’ll post about later) and try to see if we can find a truth or even a view of a truth. 

As the research evolves across a set of observations, we eventually need to organize the data for research at a larger scale. The data layer at this stage is typically a set of streams organized into an entity graph model.

Ken Aside: Data scientists seem to like to talk about the size of their graphs. When we do, we don’t use thread count but instead talk about nodes and edges. My current graph has billions of nodes, but the ratio of edges to nodes is lower than I expected. Dense edges help us better measure the relationships between entities. We need to continue to grow the edges of our new graph before publishing a full fabric.
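To make the talk of nodes and edges a little more concrete, here is a minimal sketch in Python of how an edge-to-node ratio might be measured on a tiny graph. The entity names (`user:1`, `device:a`) and the helper functions `build_graph` and `edge_node_ratio` are hypothetical, made up for illustration; a real entity graph with billions of nodes would live in a distributed store, not an in-memory dictionary.

```python
from collections import defaultdict

# Hypothetical entities and relationships, invented for illustration only.
edges = [
    ("user:1", "device:a"),
    ("user:1", "device:b"),
    ("user:2", "device:b"),
    ("user:3", "device:c"),
]


def build_graph(edge_list):
    """Build an undirected adjacency map from (source, target) pairs."""
    graph = defaultdict(set)
    for source, target in edge_list:
        graph[source].add(target)
        graph[target].add(source)
    return graph


def edge_node_ratio(graph):
    """Return edges per node, counting each undirected edge once."""
    node_count = len(graph)
    # Every edge appears in two neighbor sets, so halve the total.
    edge_count = sum(len(neighbors) for neighbors in graph.values()) // 2
    return edge_count / node_count if node_count else 0.0


graph = build_graph(edges)
ratio = edge_node_ratio(graph)
print(f"{len(graph)} nodes, ratio of edges to nodes: {ratio:.2f}")
```

Note that this sketch only sees nodes that have at least one edge. Entities with no relationships at all would need to be added separately, and they would pull the ratio down further.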

The graph is really the place where we can pull the various threads together. Threads are those insights that produce business value. They can be seen in personalization on websites, fraud detection algorithms, cross-sell, or up-sell opportunities. The path to production from a graph and the threads we pluck from it takes time and quite often a large set of production jobs. These jobs help weave these insights into a product we can publish, a fabric.

The one misconception I often encounter about the fabric layer is the assumption that it is analogous to an OLAP cube. The fabric is really a flat table from which dimensions and time series can be drawn. In that sense the Entity Fabric is the same as a search index or a fact table, just with a more colorful metaphor. Building a cube is more analogous to manufacturing shirts from a bolt of fabric.
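The difference between the flat fabric and a cube can be sketched in a few lines of Python. Everything here is an assumption for illustration: the column names (`date`, `region`, `device`, `sessions`), the sample rows, and the `roll_up` helper are invented, not taken from any real fabric. The point is only that the fabric is the flat list of rows, while the dimension totals, the time series, and the cube slice are all cut from it afterward.

```python
from collections import defaultdict

# Hypothetical fabric: a flat table with one record per row.
fabric = [
    {"date": "2017-01-01", "region": "north", "device": "phone", "sessions": 3},
    {"date": "2017-01-01", "region": "south", "device": "tablet", "sessions": 5},
    {"date": "2017-01-02", "region": "north", "device": "phone", "sessions": 4},
    {"date": "2017-01-02", "region": "north", "device": "tablet", "sessions": 2},
]


def roll_up(rows, keys, measure):
    """Sum `measure` over rows grouped by the given column names."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[key] for key in keys)] += row[measure]
    return dict(totals)


# A single dimension drawn from the flat table.
by_region = roll_up(fabric, ["region"], "sessions")

# A time series drawn from the same flat table.
by_date = roll_up(fabric, ["date"], "sessions")

# A cube-style slice: the "shirt" cut from the bolt of fabric.
cube_slice = roll_up(fabric, ["region", "device"], "sessions")
```

Keeping the fabric flat means a new dimension or a different time grain is just another call over the same rows, rather than a rebuild of a precomputed cube.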

Raw material, graphs, threads, and then a final fabric to drive business value for production. It’s a metaphor that has worked well for me. 
