How do you predict whether a given patient is likely to die from a heart attack? Conventional medical wisdom would base a risk assessment on factors such as the person’s age, whether they smoke or have diabetes, plus the results of a cardiac ultrasound and various blood tests. It may be that a better predictor is a computer program that analyses the patient’s electrocardiogram, looking for subtle features within the data provided by the instrument.
A team of researchers at the Massachusetts Institute of Technology and the University of Michigan analysed a large data-set of 24-hour electrocardiogram recordings collected at a Boston hospital as part of a clinical trial for a new drug. Employing a number of computational techniques involving algorithms for signal processing, data mining and machine learning, the researchers developed a way to analyse how the shape of the electrical waveform varies, a measure they dubbed morphological variability. At the heart of the approach are mathematical techniques used in speech recognition and genome analysis which allow researchers to compare individual beats. “We compute the differences for every pair of beats,” reported one of the researchers. “If there is lots of variability, that patient is in bad shape.”
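The idea of comparing every pair of beats can be sketched in a few lines of Python. This is only an illustration of the general approach, not the study's actual algorithm: it assumes beats arrive as lists of voltage samples, uses a textbook dynamic-time-warping distance (the alignment technique speech recognition borrows), and the function names `dtw_distance` and `morphological_variability` are invented for the example.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping between two beats, each a
    sequence of voltage samples. Returns the cost of the cheapest
    alignment of one beat onto the other."""
    n, m = len(a), len(b)
    INF = float("inf")
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch beat b
                                 cost[i][j - 1],      # stretch beat a
                                 cost[i - 1][j - 1])  # step both
    return cost[n][m]

def morphological_variability(beats):
    """Mean pairwise DTW distance over all pairs of beats: a rough
    stand-in for the MV score (higher = more variable beat shapes)."""
    dists = [dtw_distance(beats[i], beats[j])
             for i in range(len(beats))
             for j in range(i + 1, len(beats))]
    return sum(dists) / len(dists)

# Toy data: four identical beats versus four erratically shaped ones.
steady = [[0, 1, 5, 1, 0]] * 4
erratic = [[0, 1, 5, 1, 0], [0, 3, 2, 4, 0],
           [1, 0, 6, 0, 2], [0, 5, 1, 3, 0]]
```

On the toy data, the steady rhythm scores zero and the erratic one scores higher, matching the researchers' intuition that high variability signals a patient in bad shape.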
The team then applied their algorithm to a second set of electrocardiogram recordings and found that patients with the highest morphological variability were six to eight times more likely to die from a heart attack than those with low variability. They concluded that it consistently predicted as well as, or better than, the indicators commonly used by physicians…
Although these stories are reports about medical research, they are really about computing – in the sense that neither would have been possible without the application of serious computer power to masses of data. In that way they reflect a new – but so far unacknowledged – reality: that in many important fields leading-edge scientific research cannot be done without access to vast computational and data-handling facilities, with sophisticated software for analysing huge data-sets…
The man who did most to alert the world to the urgent need to take “computational science” seriously was Jim Gray, a much-loved visionary who worked for Microsoft Research. Towards the end of his life, Gray argued that we had moved into what he called “the Fourth Paradigm” of scientific research, which he dubbed “data-intensive scientific discovery”. In 2007 he went sailing off the Californian coast – and simply disappeared. Neither he nor his boat was ever found, despite an intensive conventional search buttressed by a huge online effort by volunteers who scanned satellite images of the maritime area where the boat was estimated to be.
Last week, in a touching tribute to a lost colleague, Microsoft Research published a handsome book of essays in his memory. It’s entitled The Fourth Paradigm: Data-Intensive Scientific Discovery and is available as a free download. In it are 30 thoughtful essays on four areas which were central to Jim Gray’s vision – environment, health, scientific infrastructure and scholarly communication. This book should be required reading for every policymaker responsible for science and technology, to remind them that we now have to provide the resources to fund the IT infrastructure researchers require. If we don’t give scientists these tools, then we cannot expect them to finish the job.
Regular readers know how often I characterize one or another field of physical research as something I’d do if I were young enough to start all over again. And that the thread binding together all these endeavors is computational analysis.