As part of the OpenPOWER Workgroup on Personalized Medicine, more than 40 interested attendees participated in the Personalized Medicine Workshop organized on Nov 14, 2015 in Austin (TX). The objective of the workshop was to bring together important players and address relevant issues in the field as well as chart future developments, challenges and opportunities. The workshop had 4 different sessions focusing on:
1) Clinical users’ perspective
2) Technology providers’ perspective
3) HPC researchers’ perspective
4) Challenges and trends in the form of a panel session
CLINICAL USERS’ PERSPECTIVE
The workshop started with two talks from the clinical users’ perspective by John Zhang (MD Anderson) and Hans Hofmann (UT Austin). Dr. Zhang described the state of the art of the computational infrastructure used at the MD Anderson Cancer Center to run the center’s genomics pipelines, followed by a discussion of the challenges that need to be addressed in the future. These challenges fall into four categories: genomics data storage, clinical algorithm adaptation, data mining, and data visualization.
Dr. Hofmann presented a global analytical framework needed to link genotype information to phenotype information by addressing the biochemistry, cell biology and physiology of an organism. This framework charts the various computational and analytical issues that the scientific and engineering community needs to study to achieve an understanding of phenotype expression. He further noted that for personalized medicine approaches to succeed, we need an understanding of the causes and consequences of individual and population variation that goes well beyond current genome-wide association studies and genotype variation.
TECHNOLOGY PROVIDERS’ PERSPECTIVE
Zaid Al-Ars (BlueBee) presented the platform that BlueBee is developing to address the genome analysis challenge: an accelerated HPC-based private cloud solution that speeds up the processing of massive volumes of genomics data. The platform provides unrestricted scale-up and on-the-fly provisioning of computational and data storage capacity, with industry-grade security and data integrity features. Such a platform abstracts away the complexity of using highly specialized HPC technologies (such as hardware acceleration) and offers those active in the genomics field an easy environment in which to deploy both BlueBee technologies and other OpenPOWER technologies developed by various partners.
Yinhe Cheng (IBM Life Sciences) discussed IBM’s efforts to port various important genomics software tools to the Power platform and to optimize these tools to ensure a significant speedup on Power. In addition, IBM brought together contributions from various academic partners (such as the Delft University of Technology) and industrial partners (such as Nallatech), ranging from scalable genomics big data implementations to hardware acceleration, and integrated them on the Power platform.
HPC RESEARCHERS’ PERSPECTIVE
A number of academic partners talked about their HPC research contributions to the genomics field. Ravishankar Iyer (UIUC) presented research projects aimed at improving the performance of cancer diagnostics pipelines, including a computational pipeline coded from scratch that executes significantly faster than current state-of-the-art pipelines. In addition, the group is developing algorithms for health monitoring systems and wearable devices and is integrating them into a unified personalized medicine platform.
Jason Cong (UCLA) showed an approach, built on the Spark framework, that enables big data applications to combine a scale-out architecture with FPGA-based hardware acceleration, reducing the processing time of computationally intensive programs on a hybrid cluster of CPUs and FPGAs. This approach is being used to rapidly increase the performance of genomics computational pipelines, such as those used for whole-genome and whole-exome sequencing experiments.
Wayne Luk (Imperial College London) presented a talk covering reconfigurable acceleration of genomics data processing and compression, in which he showed FPGA-accelerated components that speed up parts of the RNA diagnostics pipelines used to identify cancer. To address the large sizes of genomics datasets, the group implemented accelerated compression algorithms for faster and more efficient storage and management of DNA information. This effort will continue, focusing on optimizing and speeding up the tranSMART downstream DNA data analysis system on Power platforms.
“CHALLENGES AND TRENDS” PANEL SESSION
Four panelists representing various users of genomics information and pipelines participated in a panel moderated by Peter Hofstee (IBM): Chris Webb (Dell Medical School), Phil Greer (University of Pittsburgh), John Zhang (MD Anderson) and Hans Hofmann (UT Austin).
Dr. Webb started the discussion by indicating that relevant personalized medicine questions cannot be answered by individual scientists or research groups in isolation. Effectively addressing such research questions requires close collaboration among multidisciplinary teams of doctors, geneticists, computer scientists and mathematicians to come up with suitable models and efficient computational methods usable in a clinical environment. Mr. Greer pointed out that a number of changes need to take place to enable effective analysis of personalized medicine information. One important issue he highlighted is the lack of unified approaches to documenting and storing patient medical records, which complicates linking the different sources of information relevant to personalized medical care.
Answering a question from Dr. Hofstee about the upcoming challenges in the growing field of population sequencing, Dr. Zhang identified the need to develop methods that help doctors make “actionable decisions” based on patient medical information. Dr. Hofmann commented that even common tasks such as data transmission are rapidly becoming a bottleneck due to the staggering size of population sequencing data. He further elaborated that standards need to be developed to ensure security and easy integration between the various genomics data types.
The panel concluded that a challenging issue the community needs to address in the coming years is the creation of computational approaches that take into consideration the inherent variations of the human genome and the different ways these variations play a role in different individuals. This would provide doctors with the computational tools needed to identify the levels of confidence associated with a specific therapeutic intervention. Such tools will play an important role in bringing about the medical revolution of personalized medicine.
The workshop summary is also covered in the OpenPOWER Foundation blog.