On July 1, 2014, as part of the 1st UCL Festival for Digital Health, a roundtable discussion was held on the theme of open data in healthcare (see further information here), co-badged with i-sense, the EPSRC IRC in Early Warning Sensing Systems for Infectious Diseases. The meeting was a closed panel of 13 panelists, chaired by Sir John Tooke. Over the two-hour meeting, a broad range of themes was addressed. Because the event was closed, the organisers decided to provide the public with a summary of the discussions.

[EDIT 08/07/14]: A few amendments were made to the blog after panelists had some time to review the content. The text below has seen minor changes since the original version.

The initial proposal was to address the following five questions, but due to time constraints only the first four were discussed.

Q1. What are the benefits of opening up clinical data for health research?
Q2. How can data from different sources (public and private, including non-traditional sources) be merged to deliver healthcare benefits?
Q3. How do we balance access to data with patient privacy?
Q4. What policy changes are needed to responsibly develop big data for health?
Q5. What are the lessons learned from care.data?

Below are the main themes discussed under each question.


Q1. What are the benefits of opening up clinical data for health research?

We have entered the era of the “Internet of Everything”, in which data, processes, people and things (soon 50 billion smart objects) will be connected to the Internet. The level of personal data sharing, from social media and personal fitness tracking to shopping habits, is unprecedented. This increases the potential for mining and analysing these large data sources, but we need a secure, federated data architecture to do so.

It is important to look at this question from the patients’ perspective, where most of the benefits come from better understanding and treatment of specific diseases, and from personal care enabled by easier access to information. From the perspective of practitioners, in addition to the far-reaching research potential, the practical benefits seem to come primarily from more efficient ways of working. For example, modern interactive interfaces might allow GPs to prescribe more safely in less time. We can also imagine a larger, easily accessible database that provides quick comparisons between symptoms, diagnoses and prescriptions. By pulling out hundreds of similar cases, and comparing what other GPs have done in those cases and how each particular case relates to the population, we can imagine better diagnoses as a result.

Opening up clinical data can also provide information that highlights social inequities in population health. It is well known that these exist between developing and developed countries, but they also exist within the same national or even local territory (class inequity, for example). The sharing of large amounts of data can provide us with further evidence of these inequities, highlighting problematic areas which need to be tackled.

There are, however, problems that come with large amounts of clinical data, not the least of which is data overload. A balance needs to be found between quantity and quality, particularly when it comes to analysing data. An argument was made that the problem we have today lies not so much in seeing and sharing data as in our analytic capacity to deal with the data sets. There is a whole ecosystem of people generating new signals and running different queries, and the risk of noise is becoming overwhelming. We need to think proactively about ways of filtering the information and of designing secure ecosystems of computer systems.


Q2. How can data from different sources (public and private, including non-traditional sources) be merged to deliver healthcare benefits?

It is not essential to move data in order to analyse them; data “merging” in the traditional sense is thus obsolete. You can analyse data without removing them from their source, provided there is an appropriate secure data infrastructure to do so. We can work with massive amounts of data and run queries remotely and securely with disruptive technologies based on data federation and data virtualization. This brings up questions concerning how well the data sets correspond with each other, and whether personal data and/or the initial source could be identified. Further, are there technologies which would harmonise the data? Analyses that draw on two or more data sets must come together around unique identifiers.
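To make the idea of analysing data without moving them more concrete, here is a minimal sketch in Python. It is an illustration only, not a system described by the panel: the `Site` class, the `local_summary` method and the record layout are all hypothetical. Each data holder answers a query with aggregate counts, and only those counts are combined centrally.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """A hypothetical data holder (e.g. one practice) that never exports raw records."""
    name: str
    records: list  # each record is a dict with a "diagnoses" list (assumed layout)

    def local_summary(self, condition: str) -> tuple:
        """Run the query locally and return (matching patients, total patients).

        Only these two counts leave the site; the records themselves stay put.
        """
        matches = sum(1 for r in self.records if condition in r["diagnoses"])
        return matches, len(self.records)


def federated_prevalence(sites, condition: str) -> float:
    """Combine per-site counts into an overall prevalence without pooling records."""
    matches = 0
    total = 0
    for site in sites:
        site_matches, site_total = site.local_summary(condition)
        matches += site_matches
        total += site_total
    return matches / total if total else 0.0


# Made-up example data for two sites.
sites = [
    Site("practice_a", [{"diagnoses": ["asthma"]}, {"diagnoses": []}]),
    Site("practice_b", [{"diagnoses": ["asthma", "diabetes"]}, {"diagnoses": ["diabetes"]}]),
]
prevalence = federated_prevalence(sites, "asthma")
```

A real federated infrastructure would also need authentication, auditing and protection against very small counts revealing individuals, none of which is shown here.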

There is a public perception that the use of data from social media platforms and/or private companies in general is done and dusted. Yet there remains a challenge to develop and harness the potential of these unconventional data sources. Never has so much data about so many people been held by so few, and in the case of some IT companies, these data are even deleted after a certain time period. These data sources need to be made available for research, or their potential will never be realised. It was argued that access to these different sources needs to be brought together, regardless of where the data come from and by whom they were collected.

A point was made that when talking about merging data from different sources, we are talking about IT skills that already exist, but in silos. What needs to be developed is a new coherent information architecture (intercloud) which brings together not only patient data but also evidence from guidelines, pharmaceutical trials, non-medical data, etc. There needs to be a structural reform in terms of what is seen as necessary information, both for research and for better-informed diagnoses and prescriptions.


Q3. How do we balance access to data with patient privacy?

There has been a large social and cultural change in the world of healthcare, and the argument was made that the public (turned from “patients” into “customers”) no longer perceives their GPs as they did two decades ago. GPs are struggling to keep up with workloads, which affects the traditionally valued doctor-patient relationship, and the notion of the family physician has changed with new approaches to delivering treatment and continuity of care. There is now a need to advertise that GPs do care about their patients, while the consultation seems to be an increasingly short and superficial interaction in which the practitioner is occupied with entering information into the system. Patients are seen as consumers, and what was once a social contract between healthcare services and patients is now a relationship which takes profit into consideration.

What do we really mean by patient privacy, in that case? Traditionally, an individual was part of a healthcare system through an accepted social contract, which implied rights as well as responsibilities, as both sides were negotiating parties. In the context of research, changes to the “social contract” mean that thought needs to be put into different kinds of arrangement. Whereas before it was dependent on a patriarchal approach, we can no longer think solely about consent, protection and privacy, but have to focus on potential positive benefits in order to negotiate new approaches, such as rights to use of data, right for case and trust. It might not benefit the person as an individual to share their data, but the debate should stress the benefits for population health. A similar argument can be made about vaccines and herd immunity.

Not sharing information can stem from fear of the data being used against the patient: to take away their benefits, affect their insurance coverage or invade their privacy. On the debate of privacy vs. access, the notion of anonymisation and the feasibility of the concept were discussed. To what level are we able to anonymise data, and will the data still be usable afterwards? Similarly, patients who are being recorded commonly fear being recognised and identifiable outside the practice. There needs to be transparency and clarity of terms and conditions in sharing data for any purpose.
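The trade-off between anonymisation and usability can be illustrated with a small sketch. It uses k-anonymity, one commonly used measure that was not necessarily the one the panel had in mind: a data set is k-anonymous if every combination of identifying attributes is shared by at least k records. The field names, the generalisation rule and the records below are all made up for illustration.

```python
from collections import Counter


def raw_key(record):
    """Identifying attributes exactly as recorded."""
    return (record["age"], record["postcode"])


def generalised_key(record):
    """Coarsened attributes: 10-year age band and the first part of the postcode only."""
    decade = (record["age"] // 10) * 10
    return (f"{decade}-{decade + 9}", record["postcode"].split()[0])


def smallest_group(records, key):
    """Size of the smallest group of records sharing the same key (0 if no records)."""
    counts = Counter(key(r) for r in records)
    return min(counts.values(), default=0)


def is_k_anonymous(records, key, k):
    """True if every key value is shared by at least k records."""
    return smallest_group(records, key) >= k


# Made-up records; postcodes are illustrative only.
records = [
    {"age": 34, "postcode": "AB1 2CD"},
    {"age": 37, "postcode": "AB1 9ZZ"},
    {"age": 52, "postcode": "XY9 1AA"},
    {"age": 58, "postcode": "XY9 4BB"},
]

raw_ok = is_k_anonymous(records, raw_key, 2)                  # every record is unique
generalised_ok = is_k_anonymous(records, generalised_key, 2)  # groups of two
```

Here the coarsened data pass the check with k = 2 while the raw data do not, but the price is precision: exact ages and full postcodes are no longer available for analysis.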


Q4. What policy changes are needed to responsibly develop big data for health? 

Similar points have been grouped together in the following themes:

  • Development of ‘safe-haven’ data registries: A big library of data, managed by an independent, not-for-profit organisation, through which researchers can access and share data. This could potentially be paired with government policy to make data for research purposes available in machine-readable format, and even with funders and journals requiring data sets to be published alongside articles.
  • Public and citizen engagement: A change in the current media landscape by providing the public with success stories of communities (on a national, international level, or even for future generations) benefiting from data sharing. These stories must, however, be based on scientific, empirical methods as opposed to anecdotal and tabloid-esque stories. In addition, there should be proactive public engagement instead of damage control and a defensive approach. This will allow for a feedback loop from the public towards policy changes which have more public benefits.
    • On this note, we need to take into consideration how dynamic the world of policy is: every key consideration will soon be outdated and new amendments will be required. Therefore, there needs to be constant dialogue between the public and private sectors, and with the public through citizen engagement. We need to do much better at promoting trust between the public and the government.
  • Clarity and transparency: There is a need for patients to understand their rights in terms of agreeing to data being used in various contexts. Today, it is an overly complicated process. We require real clarity and transparency about what data are going to be used for, and how; and on the other end, those who are using the data need to understand what the available data mean and how accurate they are (not always directly translatable between contexts).
    • This level of transparency and clarity also needs to be extended to public understanding of benefits and risks of data-sharing.
  • New data structures: We need to work on a new information architecture (as discussed above) and to deliver finished products that benefit patients, practitioners and clinicians on every level. This requires funding and further engagement with IT developers, who in turn need to engage more with the various areas of research and to share responsibility for how these new technologies may be used.
  • Training and education: Workforce policies also need to be considered. These are hot themes and developing areas, and we need to train the future workforce accordingly. They will be working in a different world in terms of information technology, and need to know how to use it both ethically and efficiently.