Big Data and supercomputing...
Somalee Datta
Specialization in petascale computing for health and biotech research applications; broad experience in healthcare research, genomics, drug design, privacy, and everything in between...
From the annals of Stanford University biomedical analytics:
Biomedical Big Data, like any other Big Data, is noisy and sparse. And as with any other Big Data analytics, wrangling and making sense of biomedical data takes great skill and even greater patience.
Scale-up and scale-out are both critical approaches to data analysis and, consequently, to a Big Data analytics platform. The Stanford community is blessed with a state-of-the-art data center at the Stanford Linear Accelerator Center (SLAC) and a fantastic research computing center. To manage the accelerating pace of data growth, Stanford has steadily adopted the Cloud (the scale-out approach) - first with low-risk data and more recently with protected health information. Earlier this year, the Department of Genetics at Stanford became the proud recipient of a supercomputer grant from the NIH. And over the course of the year, we developed our scale-up approach - the supercomputer was eagerly procured, lovingly deployed, extensively benchmarked, and let loose in the hands of the biomedical research community.
And our community is doing precisely what we expected them to do - wrangling data in novel ways that would otherwise have been significantly harder and more expensive using existing scale-out approaches. The Stanford supercomputer, an SGI UV300 (SGI is now part of HPE), has 360 Intel cores (720 threads), 10 terabytes of RAM, 20 TB of NVMe flash storage, and four NVIDIA Pascal GPGPU cards. Read more at: https://med.stanford.edu/gbsc/uv300.html.
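To make the scale-up idea a little more concrete, here is a minimal sketch in Python (the file name, matrix layout, and toy allele-frequency computation are hypothetical illustrations, not anything from our benchmarks): on a node with terabytes of RAM, an entire genotype matrix can sit in memory and be scanned in parallel across hundreds of cores, with no cluster, no sharding, and no distributed framework.

import os
import numpy as np
from multiprocessing import Pool

# Hypothetical dense genotype matrix: samples x variants, coded 0/1/2 as int8.
# On a node with 10 TB of RAM, even a multi-terabyte matrix fits in memory,
# so it is loaded once and shared by all workers.
genotypes = np.load("genotypes.npy")  # assumed file name, for illustration only

def allele_frequency(bounds):
    """Mean alternate-allele frequency for one slice of variant columns."""
    start, stop = bounds
    return genotypes[:, start:stop].mean(axis=0) / 2.0

if __name__ == "__main__":
    n_variants = genotypes.shape[1]
    n_workers = os.cpu_count() or 1  # one task per hardware thread
    edges = np.linspace(0, n_variants, n_workers + 1, dtype=int)
    chunks = list(zip(edges[:-1], edges[1:]))

    # On Linux (fork start method), worker processes share the read-only
    # array via copy-on-write, so nothing is copied or sent over a network.
    with Pool(processes=n_workers) as pool:
        freqs = np.concatenate(pool.map(allele_frequency, chunks))

    print(f"Computed allele frequencies for {n_variants} variants.")

The same analysis on a scale-out platform would mean splitting the matrix across machines and shuffling intermediate results over the network - exactly the kind of overhead the scale-up machine lets our researchers skip.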
If you are a fellow supercomputing enthusiast, have a cool biomedical use case in mind, and/or are willing to collaborate (ahem, intern!) with us at Stanford, please drop me a note. And look out for success stories from our community at the website mentioned above.
And did we mention, we are hiring!