
Seeing Images in Single Cell Data (Pareidolia)

Jon Hill

Experienced Data Scientist Focused on Computational Biology

Published: March 22, 2023

This post describes a somewhat unusual application of generative AI. To be honest, I'm still not sure whether it belongs in the bin of genuinely useful things or is just a bit of whimsy with data, but once I had the idea, I really, really wanted to see it implemented.

I was sitting with two collaborators a few weeks back, analyzing single cell data. For those not familiar, a common analysis technique is to assign the cells in the data set to clusters and then reduce the data to two dimensions with UMAP or t-SNE for visualization. You wind up with a scatter plot of cells, lumped together into blobs of various shapes according to similarities in their gene expression patterns.

Each blob has a cluster ID and, eventually, maybe a name with a bit more biological meaning (“these are the activated fibroblasts!”), but the actual navigation with my collaborators went much more like this:

Collaborator: “OK, now click on the cluster that looks like a snake. Hmmm… interesting. OK, now the round blob to the northeast shaped like Australia.”

Me: “This one?”

Collaborator: “No, the other one, which is sort of pear-shaped.”

This experience will be familiar to anyone who has spotted animals in clouds while cloud-gazing with friends, and it turns out to have a name: pareidolia, seeing familiar things in random patterns. Inkblot tests are another example.

Shortly afterward, I was reading about image generation models and got to thinking: why not make all this description explicit? Instead of cluster numbers, let’s agree on what the blobs look like, generate a map of those pictures, and then use it to navigate!

Two developments made this scheme possible: first, Stable Diffusion models that fit even on the GPU of my creaky old personal computer; and second, the fact that these models support in-painting. With in-painting, you mask off the areas of an image that you don’t want regenerated. The usual application is retouching existing images; I used it to generate images that approximate specific shapes.
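To make the in-painting idea concrete, here is a minimal sketch, assuming each cluster's blob has already been rasterized into a grid of booleans. It crops the blob to its bounding box and turns it into an in-painting mask. The mask convention is an assumption borrowed from the Hugging Face diffusers in-painting pipelines (white, 255, is repainted; black, 0, is kept), so the blob is white and everything outside it is masked off. The names `bounding_box` and `inpaint_mask` are hypothetical and not taken from the author's notebook.

```python
def bounding_box(blob):
    """Return (top, left, bottom, right) of the True cells; bottom/right are exclusive."""
    rows = [r for r, row in enumerate(blob) if any(row)]
    cols = [c for c in range(len(blob[0])) if any(row[c] for row in blob)]
    if not rows:
        raise ValueError("blob is empty")
    return rows[0], cols[0], rows[-1] + 1, cols[-1] + 1


def inpaint_mask(blob, pad=1):
    """Crop the blob to its bounding box (plus padding) as a 0/255 mask.

    Assumed convention: 255 = region the model may paint (inside the blob),
    0 = region that is masked off and left untouched.
    """
    top, left, bottom, right = bounding_box(blob)
    height, width = len(blob), len(blob[0])
    # Grow the box by `pad` cells, clipped to the grid edges.
    top, left = max(0, top - pad), max(0, left - pad)
    bottom, right = min(height, bottom + pad), min(width, right + pad)
    return [
        [255 if blob[r][c] else 0 for c in range(left, right)]
        for r in range(top, bottom)
    ]
```

In practice the mask would also need to be resized to the model's input resolution (for example 512x512 for Stable Diffusion 1.x) before being handed to a pipeline.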

So, the workflow in the end is:

Generate your UMAP visualization, tuning the dot sizes so that each cluster forms a “filled” blob where possible rather than a scatter of separate dots.
Come up with ideas for objects to use as prompts that might fill the spaces. Don’t overthink this: many animals work well, since, for example, cats are liquid and can fill many spaces.
The algorithm then generates a mask for each cluster and generates images with Stable Diffusion. I haven’t fully optimized the prompts or models yet, so there is some guesswork and plenty of room for improvement. Currently, I inefficiently generate more images than necessary and pick the one that is most space-filling.
Stitch everything back together into a single image.
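The dot-size tuning in the first step and the “most space-filling” selection can be sketched in plain Python under a few assumptions: UMAP coordinates for all clusters are min-max scaled together onto one square pixel grid (with y flipped so “north” stays at the top), each cell is stamped as a disc whose `radius` plays the role of the dot size, and each candidate image has already been reduced to a boolean foreground grid (how foreground is detected is left out). The best candidate is then the one covering the largest fraction of the cluster's mask. All names here are hypothetical, and the author's notebook may score candidates differently.

```python
def normalize(coords):
    """Min-max scale (x, y) UMAP coordinates into [0, 0.999], flipping y so north is up.

    Assumption: call this on ALL cells at once so every cluster shares one frame.
    """
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    x0, x1, y0, y1 = min(xs), max(xs), min(ys), max(ys)
    sx = (x1 - x0) or 1.0  # avoid division by zero on a degenerate axis
    sy = (y1 - y0) or 1.0
    # The 0.999 factor keeps int(v * size) strictly below size.
    return [((x - x0) / sx * 0.999, (y1 - y) / sy * 0.999) for x, y in coords]


def rasterize(points, size, radius):
    """Stamp each normalized (x, y) point as a filled disc of `radius` grid cells."""
    grid = [[False] * size for _ in range(size)]
    for x, y in points:
        cx, cy = int(x * size), int(y * size)
        for r in range(cy - radius, cy + radius + 1):
            for c in range(cx - radius, cx + radius + 1):
                inside_grid = 0 <= r < size and 0 <= c < size
                if inside_grid and (r - cy) ** 2 + (c - cx) ** 2 <= radius ** 2:
                    grid[r][c] = True
    return grid


def fill_fraction(mask, foreground):
    """Fraction of the mask's True cells that the candidate's foreground covers."""
    inside = covered = 0
    for mask_row, fg_row in zip(mask, foreground):
        for m, f in zip(mask_row, fg_row):
            if m:
                inside += 1
                covered += bool(f)
    return covered / inside if inside else 0.0


def pick_most_filling(mask, candidates):
    """Return the candidate foreground grid with the highest fill fraction."""
    return max(candidates, key=lambda fg: fill_fraction(mask, fg))
```

Raising `radius` here has the same effect as raising the dot size in the plot: neighbouring discs merge, and a sparse cluster becomes one solid blob that in-painting can fill.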

As someone who hasn’t worked with direct graphics manipulation in a while, I found the last step a bit painful to program, and I spent quite a bit of time on it when I should have been rambling about GPT-4.
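For the stitching step, here is a minimal sketch: each generated tile is pasted onto a shared canvas at its cluster's bounding-box offset, copying only the pixels that fall inside that cluster's mask. Tiles are plain nested lists of RGB tuples and are assumed to have been resized back to their bounding-box size already; `stitch` is a hypothetical helper, not the notebook's code.

```python
def stitch(height, width, pieces, background=(255, 255, 255)):
    """Composite masked tiles onto a blank canvas.

    pieces: iterable of (top, left, tile, mask), where tile is a 2D list of
    RGB tuples and mask is a same-shaped 2D list of booleans.
    Later pieces overwrite earlier ones where their masks overlap.
    """
    canvas = [[background] * width for _ in range(height)]
    for top, left, tile, mask in pieces:
        for r, (tile_row, mask_row) in enumerate(zip(tile, mask)):
            for c, (pixel, keep) in enumerate(zip(tile_row, mask_row)):
                y, x = top + r, left + c
                # Copy only masked pixels, and clip anything past the canvas edge.
                if keep and 0 <= y < height and 0 <= x < width:
                    canvas[y][x] = pixel
    return canvas
```

With Pillow, `Image.paste(tile, (left, top), mask)` performs the same masked copy for one tile, which is likely the more practical route for full-size images.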

For the test, I grabbed a medium-sized data set from GEO on cells from a mouse paw, uploaded by Morgan G Anderson-Crannage at New York Medical College. It fit my criteria of "not something I'm directly working on for real projects" and "not an insane number of clusters to test things out."

I started by generating a typical UMAP to get the clusters:

Figure: Original UMAP with numbered clusters assigned by Leiden clustering

After going through the process described above, here’s an example of the final product.

And here's another, less space-filling version. It's probably best to pick the most visually appealing representation for each cluster for the final result.

Now, if you are navigating a data set with your collaborators, rather than boring old "cluster 1," you can refer to the “cluster of two fishes” or the “curled-up cat” and easily find your place in the data set!

The notebook is available on GitHub. It would be great to turn it into a more robust tool; I’d be happy to collaborate if you’d enjoy this as a project. There's no reason this couldn't be applied to clusters outside of single cell data as well.

Hopefully, someone will find this useful in their work and…if not, maybe I’ll just make a submission to the annual ISMB art exhibition. :-)
