The Quest – Mapping Health Information

Published by Lodestone Logic on

The following post describes the research and findings associated with the Health Information Map (HIMAP), a project of the Indiana University Center for Law, Ethics, and Applied Research in Health Information (IU CLEAR).

Two years ago, my friend, Stan Crosley (@croshoops), approached me with a project for the Indiana University Center for Law, Ethics, and Applied Research in Health Information. His request was for Lodestone Logic to help them map where health information is collected, shared, used, and goes to die. Initially, I laughed at him because I believed that someone else had already created such a map. I agreed to sign a contract, but I also told him that once I found the existing map, I’d move on.

That was two years ago. I can officially say that I’ve been schooled.

The initial phase of the project included reaching out to different parts of my network to find the existing map. Even though no one was able to produce a map for my reference, I persisted in my hunt. Finally, after approximately three months, I gave in and proclaimed that a map didn’t exist.

The next phase was to collect and filter the information that I had received during the previous months and determine an approach for constructing a map that could be used for policy-level discussions. What I found interesting about this phase of the research was that many people defined ‘health information’ only as the data that is collected and produced during care within a healthcare system or setting. What was missing was the recognition of all of the health data and information that is being produced by individuals, patients, and/or consumers as part of their daily lives – through posts to Facebook, tweets on Twitter, or text messages on their mobile phones. This was a significant disconnect then and continues to go unrecognized even today.

We then began to collect information from online entities. We created accounts and looked at the fields of data that they asked individuals to complete – among these many fields were name, address, zip code, diagnosis, and medical history. We noted whether these fields were mandatory or optional at the point of data collection. We also reviewed privacy policies. These policies gave us information about data that was being collected without the user needing to do anything – e.g. IP address and operating system. The privacy policies also gave us insight into how the data was used, stored, and shared by that particular entity. Between the data fields required at sign-up and the data collected by the entities, we quickly amassed 144 data attributes. We used these attributes in one-on-one interviews with stakeholders and representatives of other entities where this type of data was not freely available – e.g. MD offices and hospitals.

This part of the research was comparatively easy. Our interviewees were very cooperative and provided us with information about the data that they collect in a typical interaction with patients in a care setting.

We quickly realized that 144 attributes were too many to manage and represent in a data model. So, we invested time and energy in grouping related attributes together. We were finally able to create a list of 44 attributes. We worked with Jeb Banner (@jebbanner) and Smallbox to mock up an initial visualization that would help users select the entities and see where the attributes overlap. The final version of this table is here.
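
As a rough illustration of the grouping and overlap check described above, here is a minimal Python sketch. The field names, grouped attributes, and entities are hypothetical placeholders chosen for the example, not the project's actual list of 144 or 44 attributes, and this is not how the real table was built.

```python
from itertools import combinations

# Hypothetical mapping from raw fields to grouped attributes.
# These names are placeholders, not the project's real attribute list.
RAW_TO_GROUPED = {
    "street_address": "address",
    "zip_code": "address",
    "diagnosis": "medical history",
    "past_surgeries": "medical history",
    "ip_address": "device information",
    "operating_system": "device information",
}

# Hypothetical entities and the raw fields each one collects.
COLLECTED = {
    "online_health_site": {"zip_code", "diagnosis", "ip_address"},
    "physician_office": {"street_address", "diagnosis", "past_surgeries"},
}


def grouped_attributes(raw_fields):
    """Collapse a set of raw fields into their grouped attributes."""
    return {RAW_TO_GROUPED[field] for field in raw_fields}


def overlaps(collected):
    """Return the grouped attributes shared by each pair of entities."""
    grouped = {entity: grouped_attributes(fields) for entity, fields in collected.items()}
    return {
        (a, b): grouped[a] & grouped[b]
        for a, b in combinations(sorted(grouped), 2)
    }


if __name__ == "__main__":
    for pair, shared in overlaps(COLLECTED).items():
        print(pair, sorted(shared))
```

Grouping first and comparing second keeps the overlap readable: two entities that ask for different raw fields (a zip code versus a street address) still show up as collecting the same kind of information.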

The next phase of the project was to determine where the data flowed following initial data collection. We reached back out to our original entities and asked them for another interview. The responses that we received were unexpected. One-third of the time we were able to schedule an interview, one-third of the time there was no response, and the remaining one-third of the time we received an apology that they would not be able to provide that type of information to us. I believe that these responses were based on a couple of factors –

  1. The entity felt that the flow of the data was proprietary and speaking to a researcher doing research that will be public was not in the company’s best interest
  2. The representative of the entity really didn’t know where the data flowed

With the data that we were able to collect, we embarked on the development of a visualization. This visualization would make use of the initial data set AND expose the flow aspects. My original concept was called “Fireworks” and it included time elements. My belief was that people need to know that the data that is collected lives on beyond the original point of capture. After shopping this idea around with several visualization folks, I realized that it wasn’t possible with the budget that we had.

HIMAP fireworks

However, after attending the Health 2.0 conference in September 2010, I found my visualization person in Damien Leri (@damienleri) from Big Yellow Star.

Damien quickly educated us on many different off-the-shelf visualization tools and helped us structure our existing data so that he could quickly mock up examples. It was a huge leap forward for the project. We finally committed to the “sideways circus tent” design. With this design, we were able to show that all of the data originates from the patient, but that what each entity does with it is different. By clicking on the “color” option, viewers of the visualization can also see the individual data attributes that move; the flows are bits and pieces, not comprehensive.

We continued to try to track down data flows, but realized that our progress was slowing.

We were also coming to the realization that even though we had made significant progress in identifying where data is collected and where some of it moves to, we still hadn’t gotten to the ‘so what?’ So, we embarked on trying to capture the story of how the adoption of technology and the progress of society have made health information more ubiquitous. Health information moves not just within health systems but also publicly through non-HIPAA-regulated entities.

To create this comparison, we initiated a new method of research for the project. We established a standard set of questions based on the original 44 data attributes. We engaged in discussions and interviews with entities about their historical practices versus how they practice currently in 2012. Oddly enough, framing our interview requests around the past and a specific patient case scenario re-opened doors, and people were more willing to speak with us. These responses were collected and analyzed.

We also realized that the major shift in the ubiquity of health data is going to take place in the next decade. That meant that we needed to create a heuristic model of the potential for health data collection and use in the future. We leveraged our network of thought leaders for insights and predictions. We tracked futurists and their predictions. We collected our own insights based on current trends within consumer models that will bleed over into healthcare as the quantified-self movement grows and the consumerization of healthcare happens.

With this new data and information, it took us a few months to figure out how to structure the data for visualization so that viewers could appreciate the storyline and the differences. As we were working with Damien, we realized that it couldn’t be encapsulated in a single visualization. Instead, it needed to be multiple visualizations that allowed the viewer to choose a year (2002, 2012, or 2022) and view each one individually to see the difference. This allowed us to determine the best visualizations for the different aspects of the data that we wanted to highlight.
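
One way to picture the per-year structure described above is a set of snapshots keyed by year, each of which can be viewed on its own or compared with another. The Python sketch below is only an assumption about how such data could be organized; the entities and attributes in it are placeholders for illustration, not findings from the research.

```python
# Hypothetical snapshots: year -> entity -> attributes collected.
# Every entry is an illustrative placeholder, not a research finding.
SNAPSHOTS = {
    2002: {
        "physician_office": {"name", "diagnosis"},
    },
    2012: {
        "physician_office": {"name", "diagnosis"},
        "online_health_site": {"name", "zip_code"},
    },
    2022: {
        "physician_office": {"name", "diagnosis"},
        "online_health_site": {"name", "zip_code"},
        "wearable_sensor": {"heart_rate"},
    },
}


def view(year):
    """Return the snapshot for a single year, as a viewer choosing that year would see it."""
    return SNAPSHOTS[year]


def entities_added(earlier, later):
    """List the entities present in the later snapshot but not in the earlier one."""
    return sorted(set(SNAPSHOTS[later]) - set(SNAPSHOTS[earlier]))


if __name__ == "__main__":
    print(sorted(view(2012)))
    print(entities_added(2002, 2022))
```

Keeping each year as a separate snapshot mirrors the decision to use multiple visualizations: each year can be rendered in whatever form suits it, and the differences fall out of a simple comparison.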

During the project, the environment for health data continued to evolve rapidly. Even though many aspects of our data were becoming outdated, we knew that we just had to finish and produce what we had learned. The entire project came to a natural close in the fall of 2012.

Our Learnings –

1. The definition of health data is messy. Many entities are collecting identifiable health data elements, but do not recognize or tag the data as such.

2. As the US moves to the Affordable Care Act and people are incented to manage care based on outcomes rather than activities, there is a HUGE opportunity to combine health data streams and provide personalized care to an individual based on their whole self, not just what is shared with a Healthcare Practitioner in a 20-minute appointment.

3. Based on current-day technologies, most people assume that their healthcare data is moving and available when they present themselves in a care setting. Yet, we found health data silos that created islands of health data not connected to anything else. These islands exist in healthcare settings that are part of the same ‘system’ and even under the same roof, but the data isn’t connected. For example, we found a situation where the radiology lab at a hospital was outsourced to a contract vendor. This vendor was unable to connect to the hospital’s electronic health record to access the patient’s record. This created administrative work and increased the potential for error when re-connecting the radiology results to the patient record.

4. In general, individuals lack knowledge about their digital health footprint and the information about them that is out in public. That information is being used by entities to segment and target customers. Last year, one of my employees emailed me to let me know she would not be in because her daughter had a suspected case of head lice. When I opened the email in my Gmail account, I immediately saw an advertisement for a head lice prevention treatment. Clearly, Google was using the text of her email to target an advertisement to me. I found this amazing and I was really excited (and somewhat frightened) to see our research in action.


5. With the continued increase in the everyday utilization of mobile and sensor technologies, there will be volumes of data associated with individuals. It will be Orwellian, but if this information is integrated in an effective manner with genetic and other data sources, it will lead to a better understanding of diseases. This will generate relevant, innovative, and personalized solutions. Medication labels won’t need to be generalized warnings based on population statistics; they will identify specific things that relate to the individual who needs the medicine. N=1 is definitely on the horizon.

Finally, we are so honored that Stan and IU CLEAR chose us to help them with this research. We are mega fans of the work that they are doing and look forward to continuing to partner with them in the future.

Visit the HIMAP to see how all of these experiences led to the development of multiple visualizations that represent a high-level model of where health data is captured, stored, used, and shared. If you have any thoughts about this topic, please comment on this post or send me an email at