How Big Data Drives Scientific Progress


What underlies disruptive scientific advancement? Does science progress best when a ‘theory-driven’ approach is followed?

Or is scientific progress instead driven by the development and introduction of new technologies that lead to the ‘sudden’ availability of phenomenal amounts of data?

This question has never been more relevant than in an era in which Big Data is being generated en masse. The philosophers of science Karl Popper and Thomas Kuhn, together with the crystallographer Max Perutz, may provide us with some interesting clues in answering it.

Karl Popper postulated that scientific advancement is primarily based on theory-driven research in which hypotheses are formed and subjected to experimental falsification. Science evolves through the rejection of falsified theories, and theories that survive the falsification process gain strength over time.

According to Thomas Kuhn, the actual acceptance of revolutionary theories generated by this theory-driven research relies heavily on the social and cultural habits of the scientific community. In many scientific fields, however, a small fraction of scientists dominates. This carries a serious risk: hypotheses can be rejected on the basis of the personal and/or political motivations of only a handful of highly influential scientists. This seriously hampers scientific progress, as it slows the rate at which novel hypotheses are generated, proposed, and rejected by the scientific community.

In contrast to Karl Popper’s and Thomas Kuhn’s ‘theory-driven’ views on scientific progress, Max Perutz postulated that scientific advancement is predominantly driven by observations, made either by accident or by design, without any hypothesis or paradigm in mind.

Perutz’s viewpoint was likely shaped by his career in crystallography, a field that is less theory-driven and less subjective than other areas of biomedical science. His observation-centric perspective on scientific progress implies that disruptive technological advancements are key, as they provide data that, once explored, can reveal remarkable ‘experimental phenomena’.

Data generated by most Big Data approaches (e.g. Next Generation Sequencing (NGS) applications) are conceptually similar to the data generated by crystallography: they are unbiased, and many data points are generated in a single experiment. This can lead to interesting observations and trigger the formulation of new, disruptive hypotheses. Considering scientific progress in the Big Data era, it therefore appears that the mechanism described by Max Perutz is currently at play: disruptive advances in data-gathering technology underlie, and drive, the disruptive progress being made in science.

It is clear that Big Data approaches already reach well beyond research. NGS approaches, for example, were initially used only in academia but are now being included in numerous clinical, industrial, and forensic applications. Another exciting thought is that this bottom-up disruption is still in its infancy. Big players such as IBM, Google, Amazon, and Apple are dedicated to further developing cloud computing and deep-learning-based technologies. We are therefore, undoubtedly, still underestimating the scientific and societal impact of technology-driven disruption in the Big Data era.