Neo4j’s Emil Eifrem reports on how Germany is using graph
technology in combination with AI to make connections in research that no-one
else is doing. Is this something the UK’s health sector leaders should also be
doing?
Diabetes is
one of the most widespread diseases worldwide. Rising rates of type 2
diabetes in our ageing population, and of type 1 diabetes too, will present
major challenges to the NHS in the coming years. Type 2 diabetes in children,
for instance, has risen 40% in three years amid Britain’s obesity epidemic.
Health
policymakers are using all the tools they can summon to try to help. In
Germany, for example, the country’s national Centre for Diabetes Research (DZD)
is looking to investigate the causes of the disease and, through new scientific
findings, develop effective prevention and treatment measures to halt the
emergence or progression of diabetes. DZD is an instructive example of what can
happen with diabetes research when a new way of tackling the problem is
explored.
A ‘master database’ to consolidate diabetes information
Based in
Munich, DZD brings together experts from across the Federal Republic to develop
effective prevention and treatment measures for diabetes across multiple
disciplines, and to see what treatment the latest biomedical technologies may
offer citizens dealing with the condition. In order to better understand
diabetes’ causes, its scientists examine the disease from as many different
angles as they can.
DZD’s
researchers, then, combine basic research data sources – genetics, epigenetics,
metabolic pathways – with data from clinical studies. Connecting this highly
heterogeneous data is a challenge, but necessary in order to answer biomedical
questions across disciplines.
Recently,
its IT leadership decided it needed a better way of connecting this research
data from various disciplines, locations and species. Besides connecting data
sources, it wanted an easy-to-understand visualisation of the data and easy
querying, so that scientists could actually benefit from it. The result is a
‘master database’ that consolidates this information and gives DZD’s 400-strong
team of scientists a holistic view of what is available, enabling them to gain
valuable insights into the causes and progression of diabetes.
In search of
a suitable data tool to build such a system on, Dr Alexander Jarasch, the
Centre’s Head of Bioinformatics and Data Management, drew on experience gleaned
from previous work on a project at Munich’s Helmholtz Zentrum. That project had
used a graph database, and the positive experience prompted him to test graph
technology at DZD, specifically Neo4j’s graph software. Dr Jarasch has thus
offered his colleagues a new internal tool, DZDconnect, built on graph software
that sits as a layer over the various relational databases, linking different
DZD systems and data silos. DZDconnect is not fully implemented yet, but DZD
staffers can already access metadata from clinical studies in the prototype,
and are particularly impressed by the visualisation and the easy querying it
has made possible.
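To make the idea of a graph layer over separate data silos more concrete, here is a minimal Python sketch. It is not DZDconnect and it does not use Neo4j's API: the TinyGraph class, the relationship names and every identifier (study:S1, metabolite:M1, pathway:P1, gene:G1) are hypothetical placeholders invented for illustration. It assumes one silo holds clinical study metadata and another holds basic research data, and shows how, once both are loaded as nodes and relationships, a single traversal can connect a study to a gene.

```python
from collections import defaultdict, deque


class TinyGraph:
    """A toy in-memory graph (illustrative only, not a real graph database)."""

    def __init__(self):
        self.nodes = {}                # node id -> properties
        self.edges = defaultdict(set)  # node id -> {(relation, neighbour id)}

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def add_edge(self, a, relation, b):
        # Stored in both directions (same label) so a traversal can start
        # from either end of the relationship.
        self.edges[a].add((relation, b))
        self.edges[b].add((relation, a))

    def shortest_path(self, start, goal):
        """Breadth-first search; returns the node ids on a shortest path."""
        queue = deque([[start]])
        seen = {start}
        while queue:
            path = queue.popleft()
            if path[-1] == goal:
                return path
            for _, neighbour in sorted(self.edges[path[-1]]):
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(path + [neighbour])
        return None  # no connection found


graph = TinyGraph()

# Silo 1 (assumed): clinical study metadata.
graph.add_node("study:S1", source="clinical")
graph.add_node("metabolite:M1", source="clinical")
graph.add_edge("study:S1", "MEASURES", "metabolite:M1")

# Silo 2 (assumed): basic research data.
graph.add_node("pathway:P1", source="research")
graph.add_node("gene:G1", source="research")
graph.add_edge("pathway:P1", "INVOLVES", "metabolite:M1")
graph.add_edge("gene:G1", "REGULATES", "pathway:P1")

# One traversal crosses both silos.
path = graph.shortest_path("study:S1", "gene:G1")
print(" -> ".join(path))
```

In a real graph database this kind of traversal would be written in the database's own query language rather than by hand. The point here is only that the link between study and gene is never stored explicitly in either silo; it emerges once the two are connected.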
‘The more detailed the information, the easier it is to
identify relationships and patterns’
Many researchers wonder if graph databases
(the technology that powered the Paradise Papers
investigation, among other intriguing examples of cracking big data problems)
could help in the prevention, early diagnosis and treatment of major illnesses,
and so save lives. Why? Because graph technology is not only ideally suited to
depicting hidden relationships and discovering unknowns at big data scale; it
can also handle dynamic, constantly evolving data, something medical thinkers
say is vital in scientific and bioinformatics research.
“With graph technology we were able to combine
and query data across various locations,” Dr Jarasch enthuses, adding: “Even though only
part of the data has been integrated, queries have already shown interesting
connections, which will now be further researched by our scientists.”
In the long term, as much DZD data as possible
should be integrated into the graph database, Dr Jarasch believes. The next
step, he notes, is to see how human data from clinical research can be
complemented with highly standardised data from animal models, such as mice,
to find commonalities or other insights.
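As a rough illustration of what finding commonalities between human and animal-model data might involve, the sketch below maps findings from a mouse model onto their human counterparts and intersects the two sets. Everything in it is an assumption made for the example: the gene names are placeholders, the mapping table is invented, and the function `shared_findings` is hypothetical rather than anything DZD has described.

```python
def shared_findings(human, mouse, mouse_to_human):
    """Return the human genes that also appear, via mapping, in the mouse data."""
    # Translate each mouse gene to its human counterpart where one is known.
    mapped = {mouse_to_human[g] for g in mouse if g in mouse_to_human}
    return human & mapped


# Placeholder data (hypothetical, not real findings).
human_hits = {"GENE_A", "GENE_B", "GENE_C"}
mouse_hits = {"gene_a", "gene_d", "gene_x"}
orthologs = {"gene_a": "GENE_A", "gene_d": "GENE_D"}  # gene_x has no known mapping

common = shared_findings(human_hits, mouse_hits, orthologs)
print(sorted(common))
```

Standardised animal-model data matters for exactly this reason: the comparison only works if both sides can be mapped onto a shared vocabulary.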
It’s not just graph software that is being
employed. AI techniques such as machine learning, in combination with graph
software, will play a key role going forward, says DZD. One area of particular
interest is building a system able to ‘read’ scientific texts and
integrate them into the database, ready for analysis.
The promise
is that the more detailed the information, the easier it is to identify
relationships and patterns – which could really help in cracking the diabetes
problem. The kind of innovative data management and analysis approach DZD is
pioneering could well be the way forward in precision medicine, prevention and
treatment of diabetes – and, perhaps, other diseases.
Given that the NHS needs as much help combating
diseases as possible, could graph technology’s innate ability to discover
relationships between data points have an important role to play in fighting
not just diabetes, but many other problems? The DZD example does seem to suggest
this is a pathway worth exploring.
The author is co-founder and CEO of Neo4j, the world’s
leading graph database (http://neo4j.com/)