Mark Fortner, Founder and CEO of Aspen Biosciences, joined to explain the concept of FAIR data. The idea behind FAIR data is to ensure that data is available and labeled consistently so that people AND machines can correctly interpret, for example, what the label on a column of data means. The benefit for everyone is more effective data sharing, whether in collaborations or over time from one study to another.
Mark provided the following summary for me.
We can take the FAIR acronym as a laundry list of specific problems:
Findable
Data has to have a unique, persistent address. Data in an academic environment often has a brief shelf life before disappearing into the bit bucket, never to be found again.
It has to be well described.
It has to be registered and indexed so that people (or machines) can search through it.
Accessible
Data and metadata have to be retrievable via that unique address using open standards and protocols.
Metadata has to be accessible even when the data are not. Even if the data have disappeared, you should still be able to find the people, institutions, or publications where the data were presented.
Interoperable
Data and metadata need to be represented in a standard, broadly accepted language.
They have to use a common vocabulary.
They may include references to other data sets, and those references need to be as specific as possible. Instead of "gene X is associated with disease Y," the reference should indicate that a specific mutation in gene X is responsible for disease Y.
Reusable
In order for data and metadata to be reusable, we have to understand their provenance. Who generated them? How were they generated? What instruments and materials were used? What protocol was followed to generate the data?
You want to give the people who follow the trail you blazed enough information to allow them to reproduce those results.
You want them to know whether they can combine the data they've generated with the data you've generated. Are those data sets comparable? Or were there too many differences in the protocol to make that possible?
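To make the four letters concrete, here is a minimal sketch of what a metadata record accompanying a dataset might look like. The field names, identifiers, ontology terms, and URLs are illustrative placeholders I've invented for this example, not a real FAIR standard or anything from Mark's summary.

```python
# A minimal sketch of a FAIR-style metadata record.
# All identifiers, URLs, and ontology terms below are hypothetical.

import json

fair_metadata = {
    # Findable: a unique, persistent identifier plus a rich description
    "identifier": "doi:10.9999/example.assay.001",           # hypothetical DOI
    "title": "Kinase inhibition assay, compound series A",
    "keywords": ["kinase", "IC50", "inhibition assay"],

    # Accessible: how to retrieve the data, and whom to ask if it is gone
    "access": {
        "protocol": "https",
        "url": "https://repository.example.org/datasets/assay-001",  # hypothetical
        "contact": "data-steward@example.org",
    },

    # Interoperable: controlled-vocabulary terms rather than free text,
    # and references that are as specific as possible
    "annotations": {
        "organism": "NCBITaxon:9606",   # Homo sapiens
        "gene": "HGNC:6407",            # a specific gene identifier, not just a name
        "variant": "p.Val600Glu",       # the specific mutation being referenced
    },

    # Reusable: provenance -- who generated it, how, with what instruments and protocol
    "provenance": {
        "generated_by": "Example Biotech, Assay Team",
        "instrument": "plate reader, model XYZ-123",
        "protocol": "https://protocols.example.org/kinase-assay-v2",  # hypothetical
        "date": "2023-05-04",
    },
}

# Serialize the record so it can travel alongside the data itself
print(json.dumps(fair_metadata, indent=2))
```

The point isn't the particular fields; it's that the answers to "where is it, how do I get it, what do the terms mean, and how was it made" are written down in a form both people and machines can read.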
Example
Imagine a large pharma company (call it PharmaCo) investing in a project at a small biotech that is developing a novel therapy. Because the therapy is novel, PharmaCo has no in-house expertise in the area.
The biotech hands over a package of assay data for evaluation. If it's not clear how the data were created and what they mean, it will take time and effort (money) to sort out. Having FAIR data helps smooth out the process.
The alternative is that PharmaCo risks a lot of money, or the biotech spends a lot of time educating its pharma partner on methods and so on. Both companies benefit from FAIR data, but the burden of producing it falls on the smaller company.
Spreading and embedding
Ideally, instruments such as mass spectrometers and other analytical equipment would output FAIR data by default.
What surprises me is that the idea of FAIR data is taking hold in academic research because of its collaborative nature. Mark mentioned after the call that it's not just for corporations, and it goes beyond biology and chemistry: archaeologists are adopting it, and some European funding agencies are requiring it. That makes total sense to me. The agencies are paying people to produce data of all kinds, and they want that data to be persistent and accessible for later studies.