Webinar: Linking up all those data – why should we make data FAIR?

So welcome everyone to the third of the RD-Connect webinars we have been asked to run for the European Reference Networks. Today we are going to focus on FAIR data: what it means and why it is important that rare disease data are shareable and interoperable across databases.

First some organisational points. The presentation is going to take roughly thirty minutes – right, Marco? – and then we will have time for questions. If you have any questions, please type them into the chat box: you should see a bar at the bottom of the screen, and one of the buttons is labelled "chat". Please stay muted during the presentation so we can go through it without interruptions.

Before Marco starts, I am going to give a very brief introduction to RD-Connect for those who would like to learn a bit more about what we do. Give me a second to share my screen – ah, that is not what I wanted to share. Can you see my screen now? Good.

So today we are going to talk about FAIR data, but first a little more about RD-Connect. We deal with rare diseases, and what is difficult about rare diseases is that the patients are so few and they are scattered across countries and across the world, so it is very difficult to do research or even to provide healthcare. That is why, for rare diseases, international collaboration is so important, and why we need to share, exchange and pool data: it allows us to do better research and diagnosis, to run clinical trials and to provide better care to the patients.

Currently there is a lot of research going on in rare diseases, and a lot of data is being generated, but those different types of data are stored in different databases, usually siloed, without much connection between them. That is why RD-Connect aims to connect these different types of data and make them available to researchers and clinicians, to improve research and healthcare. For this reason the European Commission has funded RD-Connect to enhance, promote and enable data sharing in rare disease research. When I talk about data sharing I mean sharing of genomic and phenotypic data, as well as information about patient registries and biosamples in biobanks.

For this purpose RD-Connect has created an integrated platform consisting of three systems: the Genome-Phenome Analysis Platform, which lets researchers and clinicians analyse genomic data, diagnose patients and discover new rare disease genes; the Sample Catalogue, in which you can find information about individual biosamples used in rare disease experiments; and the Registry and Biobank Finder, which contains information about different rare disease registries and biobanks and the type of information they hold.

How does this work? RD-Connect receives data from researchers and clinicians – for example those in the European Reference Networks – as well as from biobanks: omics data, phenotypic data and biosample data. All the data go into the integrated RD-Connect platform, where they are made available to researchers and clinicians for gene discovery, diagnosis and therapy development. This is of course very beneficial for clinicians and researchers, but most importantly it is beneficial for rare disease patients.
The RD-Connect project as such is coming to an end next month, but RD-Connect is going to continue its work in a different form. Recently we launched the RD-Connect Community, an association of individuals, organisations and research groups who share the common aim of supporting and promoting data sharing in rare disease research. Membership of this community is open and free of charge for anyone involved in rare disease research, so whether you are a clinician, a patient, a policymaker or someone else, you are all welcome to join. If you are interested, you can find more details on our website and register there. That is the end of my very short presentation; if you have any questions in the future you can visit our site or write to us at info@rd-connect.eu. That is it from my side; I am going to pass over to Marco.

Okay, thank you Dorota. Hopefully I am sharing the right screen – do you see my start-up screen with the little guy? Okay. Welcome everybody, and thank you Dorota for the introduction to the RD-Connect Community, to which we owe a lot: the work I am presenting for rare diseases would not have been possible without RD-Connect.

So, to continue from the community: what I was asked to do here is say a few words about why FAIR matters for rare diseases. To be honest, I have loads of slides and different ways to explain this, so it was not that trivial to put a slide deck together. I also hope we will have a nice discussion at the end of the presentation, and I see some of my colleagues in the group, so there are others who can help with answers as well.

I wanted to start with a student who did an internship with Claire Shovlin, one of the people involved in the ERNs, and who also spent a little time here in our group – a very bright student. She was doing research on HHT, hereditary haemorrhagic telangiectasia – I won't try to pronounce that perfectly. I am not going to go into all of her research: it is one of the vascular anomalies, you see some of the symptoms, it is associated with nosebleeds, and there is a genetic background with some possible mechanisms that researchers here are looking into. Lisa also worked on the hypothesis you see here. As I said, I won't go deeply into the science, but I want to share some observations we made while she was doing this research.

So what did Lisa do, and what could she not do? One of the things she had to do was manually extract data from medical records. I think she was lucky enough not to have to take it from handwritten notes, but it was still a major task to get the data into a shape where she could do statistical analyses. All in all, it took a big effort to obtain a relatively small data set to work with, and that had a major influence – that is one of the observations we could make – on the statistics she could do. The analysis was largely assumption-driven: "if this could happen, and maybe if we add this little bit of extra data, then we can just about answer that question". It affected the strength of her statistics, and it also meant she could not do data-driven analysis – she could not simply take all the data and see what
answers were hidden in it. The other thing we observed was that we did not even think about comparing her results with data from patients with other diseases that potentially have similar symptoms. This was all about iron deficiency, and it would have been nice if she could simply ask, for other diseases with the same kind of deficiency, how treatments worked there. These were some of the observations that hampered a student trying to do useful research. In conclusion, the local data and other people's data were not readily available for analysis. And one thing for all of us to think about is that you should multiply this by all the researchers doing similar kinds of investigations; in our own bioinformatics group, bioinformaticians typically spend months on making data usable.

Coming back to Dorota's starting slide, with the different silos and how to deal with them: in this situation people usually define the goal as "share data and put data together", and nobody will look at that very strangely. But if you think a little deeper, maybe that is not actually the end goal – sharing data and putting data together are means. The actual goal is that we should be able to analyse data and answer questions, like we would hope Lisa could, very efficiently. Questions like these – I put up a list of what we call driving questions for the FAIRification cases we work on. What are the differences in age at loss of ambulation as a result of steroid use in DMD patients? How do patients with HHT – that relates to the disease Lisa was investigating – respond to radiotherapy in other hospitals, and what associations with iron infusions do we see for other patients? What is the current information about course and treatment of vascular malformations across several registries? What number of fractures is most common in patients carrying an osteogenesis imperfecta mutation, across Europe? What associations do we see on a molecular level between genetic variants, disease severity and phenotypes in Rett syndrome? And where can I obtain tissue samples for transcriptome analysis, so that I can start doing some analysis?

The common denominator is that all of these questions require multiple data sources to answer them. The question we could then ask is: why is this not easy? Why is it such a problem? A possible bottleneck to think about is that maybe the biggest bottleneck for data sharing is the perceived need to physically share data, because across countries and institutes that is a problem. Some signs of this are that people spend a lot of time on data sharing guidelines, publication embargoes, data access forms and debates about common data elements; aligning policies between institutes and across borders and attempts at data warehousing are all signs of trying to overcome it. We do, however, have a clear need to enable analysis across resources, especially for rare diseases, because we do not have these data silos in one location – they are spread all across the world – and because rare disease data are sparse, we need to do analysis across them. So the data are sparse, highly distributed, heterogeneous, often in different formats, poorly interoperable, and often sensitive.
So there is quite a big need here. If sharing is such a bottleneck, then maybe we can think about a different approach, and that is to a large extent what FAIR data is about; it also relates to the Personal Health Train paradigm that I will talk about. If sharing is hard, then maybe we should just not share. Instead, maybe we should make our own data findable, accessible under well-defined conditions, interoperable and reusable – for humans and for computers. Those are the FAIR principles: don't share, but be FAIR at the source.

One of the early examples of where this principle was applied comes from André Dekker, a colleague from the MAASTRO Clinic in Maastricht in the Netherlands, who together with his team used machine learning to predict whether patients should have chemotherapy or not. He was able to show that his predictions were actually doing better than what doctors could do, with quite drastic results. He could show, for instance, that in Australia, where doctors assess whether people should get the chemotherapy or not, patients to whom doctors were not giving the therapy would often actually have benefited from it. In the Netherlands everybody gets the chemotherapy by definition, but that is equally bad, because he could also predict that many patients in the Netherlands who get the chemotherapy would not benefit from it – and it is not a nice treatment, so people in the last period of their lives get very harsh chemotherapy where that could have been prevented.

For this story the point is not the analysis itself, but how he did it. To train the machine learning he needed data from all over the world, yet the data were never shared and never put together – apparently that is not necessary. The way he did it was by making the data FAIR at each of the sources: people from his team went there and made the data what we would now call FAIR, and especially interoperable, preparing the data for analysis in the same way and applying the same standards at each location. That allowed him to run the machine learning algorithm on site, at the source, and then pool the results. The results are not sensitive at all; only the source data are sensitive. That is the basis for what we call the Personal Health Train; I will come back to that.
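To make the "analyse at the source, pool only the results" idea a little more concrete, here is a minimal Python sketch. It is not RD-Connect or MAASTRO code; the station names, the records and the pooled statistic (a simple mean) are invented for illustration. The point is only that each station runs the analysis task locally and returns non-sensitive aggregates, which a coordinator then combines.

```python
# Minimal sketch of "bring the analysis to the data": each station computes
# local, non-sensitive aggregates; only those leave the station.
# Station names and records are invented for illustration.

stations = {
    "registry_A": [62.0, 71.5, 58.2],   # local outcome values (never shared)
    "registry_B": [65.1, 69.9],
    "biobank_C":  [60.4, 73.3, 55.0, 68.7],
}

def local_task(records):
    """Runs inside a station; returns only aggregate results."""
    return {"n": len(records), "sum": sum(records)}

# The coordinator (the "train") visits each station and collects aggregates only.
partial_results = [local_task(data) for data in stations.values()]

total_n = sum(r["n"] for r in partial_results)
pooled_mean = sum(r["sum"] for r in partial_results) / total_n
print(f"Pooled mean over {total_n} patients: {pooled_mean:.2f}")
```

The same pattern scales from a simple pooled mean to the iterative model updates used in federated machine learning: the raw records never move, only intermediate results do.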
First, also a little demonstrator of our own. Dekker's example is quite advanced machine learning, but you can also ask the more typical questions that I showed before. Here is one example: where can I obtain biosamples of donors with an abnormality of head or neck, and in which biobanks can I find these samples? We built this into a demonstrator using the same approach that André Dekker used. In the demonstrator I select one of the questions – get a number of biosamples from donors with a specific phenotype and from a specific region – and type in the variables, in this case the region and the abnormality of head or neck. The demonstrator processes this and you get a result: a table with the requested samples. So, first of all, it worked. But if you look at the biobank and registry columns, you also see that the information comes from different biobanks and different registries. You also see that all the information is in blue, which means everything is clickable – everything links to something else. For instance, in the registry column you see the Ring 14 clinical database; if you click on that, you go to the Registry and Biobank Finder that Dorota mentioned in her introduction, and you find information about this database. That also answers the question "where can I find it?". You can do the same for the other information, the diseases and the phenotypes. Note that you see different phenotypes here, and different diseases: here you can get more information about ring 14 disease, and here you have the four phenotypes. What some of you may also have noticed is that we asked for abnormalities of head or neck, and you do not see that literally in the phenotype column. Here we are actually making use of the underlying hierarchy of terms, because these are all more specific versions of "abnormality of head or neck". And we can click on one of them to see more information about that particular phenotype.
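As a rough illustration of how such a hierarchy can be exploited (this is not the demonstrator's actual code; the tiny slice of hierarchy and the sample annotations are invented, mimicking HPO-style terms), a query term can be expanded to all of its more specific descendants before matching records:

```python
# Toy slice of an ontology hierarchy: parent term -> more specific child terms.
# The terms mimic HPO-style labels; a real system would load the full HPO.
hierarchy = {
    "Abnormality of head or neck": ["Abnormality of the head", "Abnormality of the neck"],
    "Abnormality of the head": ["Abnormality of the face"],
    "Abnormality of the face": ["Abnormality of the nose"],
}

def descendants(term):
    """Return the term plus all more specific terms below it."""
    terms = {term}
    for child in hierarchy.get(term, []):
        terms |= descendants(child)
    return terms

# Biosample annotations as they might appear in different sources (invented).
samples = {
    "sample-001": "Abnormality of the nose",
    "sample-002": "Abnormality of the face",
    "sample-003": "Abnormality of the heart",
}

query = "Abnormality of head or neck"
matches = {s for s, phenotype in samples.items() if phenotype in descendants(query)}
print(matches)  # sample-001 and sample-002 match via the hierarchy
```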
The underlying technology is what we often refer to as the Personal Health Train, and there is a nice video that explains it, so bear with me while I try to play it. The video narration, picking up part-way through: these stakeholders collect and manage their data in different ways, making the data hard to find and use. Furthermore, personal medical information is very privacy sensitive. As a consequence, personal health data cannot be used easily by citizens themselves, by physicians or by researchers. The Personal Health Train goes to the root of this problem by building FAIR data stations, whose protocols ensure that data are findable, accessible, interoperable and reusable. The data stations are connected by tracks, which are strictly secured and protected. The data trains are constantly monitored, and only trains with the appropriate validation may enter a particular station. To secure privacy, every data station has rules: each data owner has access to his or her secured station, can easily request and control their personal information, and can define who may access the data and how it may be used. Communities with common interests, like patient societies, can choose to connect and organise their information through umbrella stations. With this approach a researcher can learn, for example, which cancer patients benefit the most from novel therapies such as proton therapy, or a citizen with a specific variant in his genome can be automatically alerted when new research results or possible risks become available. The information is accessible for research, prevention and education without the data ever leaving the station. With the Personal Health Train, health data becomes available for individuals, and institutions maintain control.

I hope that was audible? – Yes, and I am pasting the link to the video in the chat right now, so you can watch it later. – Thank you.

To focus on one thing we saw in the animation: the FAIR data station. What is a FAIR data station? It contains the data at the source, and what I think is interesting for our dealings with data sharing is that it is not limited to a common or minimal data set – it is simply the data that you have at the local station. It exposes a description of the data so that the data are findable; this description is not necessarily privacy sensitive, it is just a description of the whole thing, and it makes the station findable. It controls access. The content, the inside of the FAIR data station, is prepared for analysis across FAIR data stations – that is the interoperability part. And finally, the content is richly described for efficient and correct reuse, which is a sort of extension of the interoperability. That is the basis of a FAIR data station.
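One way a data station can expose such a findable, non-sensitive description is as machine-readable metadata, for example using the DCAT and Dublin Core vocabularies. The sketch below is only an assumed illustration of what such a description might look like – the registry name, URL, keywords and access note are invented, and this is not the implementation used in the project:

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

# Standard vocabularies commonly used for dataset descriptions.
DCAT = Namespace("http://www.w3.org/ns/dcat#")
DCT = Namespace("http://purl.org/dc/terms/")

g = Graph()
g.bind("dcat", DCAT)
g.bind("dct", DCT)

# Invented example: a registry describing itself without exposing any patient data.
dataset = URIRef("https://example.org/ring14-registry/dataset")
g.add((dataset, RDF.type, DCAT.Dataset))
g.add((dataset, DCT.title, Literal("Ring 14 clinical database (example)")))
g.add((dataset, DCT.description,
       Literal("Pseudonymised clinical records of patients with ring chromosome 14 syndrome.")))
g.add((dataset, DCAT.keyword, Literal("rare disease")))
g.add((dataset, DCAT.keyword, Literal("phenotype")))
g.add((dataset, DCT.accessRights, Literal("Request via the registry's data access committee")))

print(g.serialize(format="turtle"))  # the findable, shareable description of the station
```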
That is what we have in mind for FAIRification in the rare disease domain: to increasingly produce rare disease FAIR data stations. The first independently FAIR rare disease data stations are the pilots I am showing here, which we are working on. There are also some bigger resources that have taken steps towards FAIR, machine-readable and interoperable data; I show a few here: a platform at the European Bioinformatics Institute, Orphanet, which has an interface producing interoperable data, a large protein database, and a drug resource that has made several steps in this direction – and there are a few more. In the end you get an ecosystem of FAIR data stations, and that would allow someone like Lisa to do her analysis much more efficiently and actually do good statistics on her data, without spending a huge amount of time on making the data usable. So instead of making a massive central database – which comes with massive sharing issues and several other issues that I won't go into right now, and which is usually limited to common data elements only, elements that are not sensitive and can be shared – we are pushing out standards, protocols, support and services to the different sources, to enable this ecosystem of FAIR data stations.

How do we get there for your data? I am not going into all the details, just giving you a glimpse. One thing to realise is that FAIR requires that the four letters hold not only for humans but also for machines: findable, accessible, interoperable and reusable for machines. We need that so that computers can help us, and that is typically quite hard to achieve. For those who are not so computer-minded – and I know some on this call are – one way to look at it is by analogy with Mr. Spock. I don't know whether all of you know the Star Trek character: Mr. Spock was born on Vulcan, and everybody there thinks very logically – everything has to be logical. When thinking of what a computer needs, you should actually think of baby Spock. Computers are like little baby Spocks: they know nothing, they can learn a lot, but everything has to be 100% explicit, logical and unambiguous. That is what we really have to work on for everybody's data.

Just to give you a glimpse: this is some example data – scrambled a bit – from one of the workshops we organised. At the top you see, from four different sources, what we as humans easily recognise as mutation information. But you also see that they have different column headers, and in some cases – we are not entirely sure – it seems they describe a mutation in different ways. For baby Spock this would be horrible: it is not logical at all. You also see, in one case in the fourth column, that there is obviously a mistake. As human beings it would be easy for us to fix that mistake, but for baby Spock, for computers, it is much harder. You should also think about scaling: doing this for four datasets may be doable, but for tens or hundreds it becomes undoable. At the bottom you see data from one table, and here I want to highlight that the relationships between the values in the columns are not necessarily clear to baby Spock, because it is not specified how this W1 relates to this WSS1, or how the patient relates to gene A1. As human beings we can come up with an interpretation – with the risk, by the way, that it is the wrong interpretation – but baby Spock would not be happy. It is even worse for real computers, because what we see here is the human-readable version; to a computer it looks more or less like this: some text, some different type of text, some other text. Not very readable. So we have to make this readable for computers.

The way we do that – and this is how we set up the rare disease data linkage plan a couple of years ago, with the help of RD-Connect – is through interdisciplinary collaboration. For this you need people who know the data, who know what everything means and can interpret what I showed on the previous slide, but you also need people who know how to make data machine-readable. The people who know about machine readability do not necessarily know what the data mean biologically, and the people who know what the data mean biologically do not know, at the start, how machine readability works. What is very important in our data linkage plan is that knowledge exchange takes place: we have our FAIR data stewards, and in every FAIRification process we want the domain experts – who may also be database maintainers, for instance – to learn about the process, about the ontologies we use, and so on. The process we defined is depicted here in seven steps; I won't go through the individual steps right now – there are other places for that, and I can elaborate later if you want; we have also written it up as an ELIXIR deliverable.

I just wanted to show you very briefly what happens when we do this. Here are two registries and – oh, my screen sharing has crashed on me, I have to start it up again, sorry about that. Where was I? I can still see my screen – are we sharing my screen, can you see it again? Yes? Okay. So here you see two registries,
and there is actually interesting information to be gained from the combination of these two registries – I know this because I put the slides together – but I think everybody agrees that you cannot see it quickly. Integration at this level is quite impossible; this is the situation Lisa was confronted with. The next step, then – and now think of baby Spock – is that we make the information explicit. On the human side you see that we have already made the information from these two registries very explicit and very logical, so to say: subject has disease "ring 14 disease", and has phenotypes such as seizures. We do the same for the other registry. At this level we still cannot see the common linking point. From that we also produce the machine-readable part, which you see with all these codes: that is the machine-readable version. As humans we do not have to be able to read it, but the machine can handle it. All the codes you see here actually come from ontologies, and what is so interesting about ontologies is that they are for humans and for computers – they already have both faces; I will come back to that with an example in the next slide. The other important thing is that the way this is described is in a computer language that is globally understood. The analogy you can draw is with English: just as we use English to communicate between scientists, there is a computer language that is globally understood for communication between computers, and that is what we use here. So this information is explicit, unambiguous, logical and linkable – exactly as baby Spock would like it.

Now, what is an ontology in this case? You see one of these terms, obo HP and so on; you can see by analogy that it is one of the phenotypes, and the "HP" refers to the Human Phenotype Ontology. Here you see a snapshot of that Human Phenotype Ontology, and what you can see – and this is important – at the bottom is a hierarchy of terms. That is typically what an ontology gives you: this hierarchy of terms. What it also does, though it is not shown here, is define relations between terms. For instance, phenotypes are associated with diseases, so an ontology can also define the relation that epilepsy is associated with a particular disease. What is depicted in the part I enlarged is that the same information in an ontology – this term – is available both as a machine-readable term, this code, and as a human-readable term, this label and definition, so both humans and computers can use it. Does that make it a little clearer what an ontology is? Okay.
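To make the jump from the human-readable statement to the machine-readable one a little more tangible, here is a small sketch of how a single registry record could be expressed as explicit, linkable statements using rdflib. The "ex:" predicates and the disease code are invented placeholders, and HP_0001250 is used only as an example of an HPO-style identifier; the exact model and terms in the pilots may differ:

```python
from rdflib import Graph, Namespace, BNode, Literal
from rdflib.namespace import RDF, RDFS

OBO = Namespace("http://purl.obolibrary.org/obo/")
EX = Namespace("https://example.org/vocab/")   # invented predicates, for illustration only

g = Graph()
g.bind("obo", OBO)
g.bind("ex", EX)

# One anonymous patient from the first registry: has a disease and a phenotype.
patient = BNode()
g.add((patient, RDF.type, EX.Patient))
g.add((patient, EX.hasDisease, EX.ring14_disease))   # placeholder; a real record would use an Orphanet/ORDO code
g.add((patient, EX.hasPhenotype, OBO.HP_0001250))    # HPO-style identifier, used here only as an example

# Human-readable labels attached to the codes, so people and machines can both use them.
g.add((EX.ring14_disease, RDFS.label, Literal("ring 14 disease")))
g.add((OBO.HP_0001250, RDFS.label, Literal("seizure / epilepsy phenotype")))

print(g.serialize(format="turtle"))
```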
What we then have, if you look at one of these registries, is the human-readable version on the right and the machine-readable version, and what is rather nice is that the language we use for the machine-readable version can also be represented as a little graph – we often call it a little knowledge graph. Here you can see that there is an anonymous patient – the two brackets – that has a phenotype, epilepsy. Everything here has codes, but I have put in the labels, the human-readable versions, so that it stays readable for us humans: the patient has the phenotype epilepsy and has the disease ring 14 disease. This is done at the source; this would be the information inside a FAIR data station. This is the other registry, where we have done the same thing: here you see the knowledge graph that represents the information in this other registry – again a person with a disease and a phenotype, again epilepsy, but in this case there is also a treatment for this patient. When we put that together – because we used exactly the same code for this human phenotype, and why wouldn't we – they actually become linked, and that is what we can exploit by taking this step towards machine-readable data. Now we can say – and it is a very simple example, of course – that for this patient, who did not have any treatment recorded, perhaps lamotrigine would be a treatment to consider, because we found it in another registry. I have done this between two registries, but obviously, as computational scientists, we do not want any limit on the number of registries we can ask this question of. And I am not going into the detail of how it works, but although in this picture I really did put the graphs together, we do not need to physically put the data together to do this kind of analysis across registries.
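Once two registries use the same ontology code for the shared phenotype, a question like "which treatments were recorded elsewhere for patients with this phenotype?" becomes a simple query. In the sketch below the two toy graphs are merged in memory purely to keep the example short – as just said, in practice the query travels to the stations rather than the data being pooled – and all predicates and codes are invented placeholders:

```python
from rdflib import Graph, Namespace, BNode, Literal
from rdflib.namespace import RDF

OBO = Namespace("http://purl.obolibrary.org/obo/")
EX = Namespace("https://example.org/vocab/")     # invented predicates for illustration

# Registry 1: a patient with the phenotype but no recorded treatment.
r1 = Graph()
p1 = BNode()
r1.add((p1, RDF.type, EX.Patient))
r1.add((p1, EX.hasPhenotype, OBO.HP_0001250))    # shared HPO-style phenotype code

# Registry 2: a patient with the same phenotype code and a recorded treatment.
r2 = Graph()
p2 = BNode()
r2.add((p2, RDF.type, EX.Patient))
r2.add((p2, EX.hasPhenotype, OBO.HP_0001250))
r2.add((p2, EX.hasTreatment, Literal("lamotrigine")))

# Merged here only to keep the sketch short; a Personal Health Train setup
# would send the query to each station and pool only the answers.
merged = r1 + r2

query = """
PREFIX ex:  <https://example.org/vocab/>
PREFIX obo: <http://purl.obolibrary.org/obo/>
SELECT ?treatment WHERE {
    ?patient ex:hasPhenotype obo:HP_0001250 ;
             ex:hasTreatment ?treatment .
}
"""
for row in merged.query(query):
    print("Treatment seen elsewhere for this phenotype:", row.treatment)
```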
In the rare disease data linkage plan I also wanted to show this picture, because it is a very human exercise. Critical to the data linkage plan and the FAIRification process are the people here in the middle, the FAIR data stewards, and they work with local data stewards: the disease experts, or the designated people who manage the data – for instance for DMD or for osteogenesis imperfecta, their specific experts. I have also indicated registry software providers here, because we are working towards registry software that generates FAIR data, which would make everything much easier. On the right you see the FAIR software engineers, who work on tooling that, for instance, the registry software providers can build into their tools. So the people in the middle, the data stewards, are critical, and you see all these little L's: that is the knowledge-transfer aspect – when we collaborate, the people with an L also become a bit of an expert in how to make FAIR data. That is an example of how we set up the rare disease data linkage plan. We have also done this within our own hospital for one of the cases we worked on, and I thought it would be nice to show, because it could be a paradigm for how to scale this up. Again we have FAIR data stewards in the middle; in our case, at the LUMC, librarians and data managers from a data management group participated in a FAIRification process, and we also had people from our ICT department participating. So we could transfer knowledge to these people, who hopefully in the future will provide further help with FAIRification, so that we can move on to other FAIRification processes where we have not yet transferred knowledge.

In conclusion: FAIR is about efficient and correct computer analysis across rare disease data resources. It is about being clear and explicit about what data elements mean, and then coding the data with a globally understood computer language. Ultimately, FAIR is about making baby Spock happy – and, in the end, the Lisas of this world.

So what support is there for FAIR? I have a few slides on that. First I wanted to show you this flowchart, or decision tree. The first question to ask is: do I wish my data to be more reusable? If the answer is no, then be our guest and keep your data siloed. If the answer is yes, then at this point in time, where the tooling for FAIR is not yet perfect, we advise consulting rare disease data linking experts – and this is one of the benefits of setting up the RD-Connect Community, because the community contains these experts. I have put contact information here, but you can also contact the RD-Connect Community. After that, you can ask: can we organise a FAIRification project, and can we do it now? Because it is costly – you should know this. If you can, then you plan a FAIRification project and start FAIRifying your data. It takes roughly four to eight months, depending on how large and complex your data set is, and, as I mentioned, it is a very interdisciplinary process, so it takes about three or four people – not necessarily all of the time, but you need a little team. If the answer is no, you cannot do that right now, then maybe you want to learn more about it first, and there are a couple of training workshops; I will especially mention the Rome summer school, which is always very nice – that is two days. You can also recode some of your data, which is already a step towards FAIR: if you start using the Orphanet rare disease codes or the Human Phenotype Ontology codes, you are already taking a step towards FAIR, and structuring your data in general also helps for a later step towards FAIR. Here is just a snapshot of such a FAIR project blueprint, the multi-month project towards FAIRification; this is something we discuss in the consultancy, including what the costs are and what you would need.

Where can you get support? There is supporting infrastructure, and I want to start with what I find the most important part: the community itself, the domain experts. That can be patient representatives; it can also be the experts now brought together in RD-Connect, because they have the knowledge of what the data mean and what should be done with them, and to me that is a vitally important part of the infrastructure. Then there are the more technical infrastructures, which are collaborating – and sometimes we, from the rare disease community, make them collaborate, which I think is a nice effect. This is actually an older slide; I have not yet added the European Joint Programme, which starts next year and will be added to this. One very important infrastructure I want to mention specifically is ELIXIR, because it is an infrastructure that is really there for the long term. It is based on national nodes: the countries of Europe – it is not necessarily a European Union thing, it is the countries of Europe – have joined forces to set up services across Europe, and they have also set up the five platforms you see on the left.
You could see those platforms as a kind of consultancy for harmonising how we handle data in Europe; ELIXIR is a life-science data infrastructure. One of the things we do to help implement FAIR is develop tools and support, and in this picture I have put that in three layers. The top layer contains the tools we develop: some, like the Data Stewardship Wizard, are tools with a web interface that you can use yourself as an end user; some, like the FAIRifier, require collaboration with a FAIR data expert; and there are tools like the FAIR Data Point, which is really software that goes under the hood of another tool, so it is not for end users – but they are all software tools. There is also the more human exchange: courses, the "bring your own data" workshops – one is associated with the Rome summer school – and sometimes hackathons, where we put programmers together and also exchange knowledge. And finally there is the level of the data stewards who run FAIR data projects. For rare diseases we do all of this under the umbrella of the rare disease data linkage plan, and under the European Joint Programme we want to develop this into services for the community that are also partly sustained by the community itself, for instance by the ERNs.

Finally, we are in the process of setting up a rare disease GO FAIR network – GO FAIR stands for "globally open FAIR" – and this is really a networking exercise. At the moment it is a small group of volunteers who want to help foster the adoption of the FAIR principles in the rare disease community and provide stakeholders with guidance for the transition – a small group of volunteers to help the larger community. In the centre we have what we call handlers, and they each communicate with one or more liaisons who are really working in the field. In some cases the liaisons come from networks themselves – an ERN would be such a network – and we also hope that data stewards will form their own network, to start helping each other manage data better, including the FAIR principles. So it would be interesting for us to know whether you would like to be one of those liaisons, or be otherwise involved in this GO FAIR network.

For the future, we want to address the FAIR principles more fully. For those who are less technical, the term metadata is always difficult – it is data about data – but you can structure that as well, and we want to do more work on it. There are the FAIR metrics: people are working on how to measure how FAIR a resource is, and that can also be an aid in becoming more FAIR. Especially interesting for health data is automating ELSI procedures – procedures around the ethical, legal and societal implications: consent codes, consent checking, and so on. We also want to keep collaborating with infrastructures like ELIXIR, and fill the gaps on the analytics side, because FAIR will help a lot but probably not with everything, so we need to see where the gaps are. The other activity for the future is scaling up: as I mentioned, we hope to set up a FAIR data steward network in which people start helping each other; Annika, a colleague in our group, is working with other colleagues on FAIR guidelines – it is quite hard to be completely FAIR, especially on the machine-readability side, so we would like to provide guidelines on how to take steps towards FAIR, which is already very helpful, and we
would like to publish on that. Then there is the activity of putting FAIR data generation into tools that are generally used, such as registry software; we have already started this in collaboration with some of the registry software providers. The approach will always be to work through real-life cases, in interdisciplinary collaboration – that is the essence of the rare disease data linkage plan. Lastly, I wanted to mention the summer school in Rome, which is always very nice. It has limited space, but keep an eye out for it, because it brings medical doctors, patient registry owners, patient representatives and FAIR experts together, and over the whole summer school people learn a lot about how to set up a good registry. That was it – thank you very much, and I am open to questions.

Maybe I can start. You mentioned that making your data FAIR is quite time-consuming and costly – you mentioned, for example, that you need three to four people to work on it. Is that something where, once you have done it, you need them to keep working on FAIR forever, or is it just an investment in that initial phase?

That depends, and it also depends on the source. First of all, it does not necessarily mean hiring new people: it can also be that people who already do data management in some form learn this extra piece from our FAIR data experts, so it becomes a sort of in-kind contribution to a FAIR project. To be honest, every database will require some maintenance. The first step is the biggest step, for sure, but when new data come in you will have to go through the procedure again – although then it will be less work.

We have another question. Partly, we are in the process of doing that: we recently put out a deliverable for ELIXIR and we are going to work that up into a publication, which is a guideline. There are websites – GO FAIR, for instance, has a website, and ELIXIR probably also provides guidelines – but my general advice at this stage would be to contact people; I think the best way forward is still to discuss this with experts.

Right, so for this you would need to contact the email address – can you remind us of the email? Yes, it was on one of the slides – there it is, at the very bottom, the info address. Okay.
Another question: how are the FAIR principles doing in the rest of the world? It seems more of a European thing. Oh, it is actually extremely international. At the NIH in the US you would find it under the Commons name – they call it the NIH Commons – and that also refers to the FAIR principles. What you do see are nuanced differences in the implementation, or rather in the focus: here in Europe we spend a lot of time on making data linkable, which is definitely something we put forward, while in the US I think there is slightly more focus on ontologies – and that is actually a nice combination, because we can combine the two. One of the task forces we set up, partly through RD-Connect, is an international task force, and one of the resources applying a similar strategy is Monarch, for instance, where you have the mouse phenotype database applying similar principles.

Do you know whether funding agencies support data FAIRification, for example by giving grants to different types of databases to FAIRify their data? Actually they do: there are several funders that put FAIR into their calls, more and more in fact – I think at some point you will hardly find a call without it. The Innovative Medicines Initiative, for instance, has had specific calls for FAIRification of its own resources. What is more – and I am not unhappy about it – is that funders now almost always ask for data stewardship and data management plans, and I think it is a good thing to put the FAIR principles in there even if the funder does not ask for it. A very easy way to become more FAIR is, when you have to buy a software platform, to put FAIR on the requirements list for that software. We did that at the LUMC: the data management software procured for our clinical data now carries the obligation to become a more FAIR tool, and that was very easy, because you just put it in the requirements of your tender. You can also reach out to other experts to do some of the data management for you. We see in the Netherlands, and I think internationally as well, that it is more and more required that you allocate part of your budget to data management. And, as Dorota mentions, the International Rare Diseases Research Consortium also basically states that the FAIR guiding principles are the way to go and wants to draw attention to that. Thank you for mentioning it, Dorota – that is indeed a reference to a guidelines document, but it is also where I would repeat my advice to contact experts, because it is a document that data stewards and data scientists can read and understand, while for the typical registry manager it is probably still one step too far.

Ah, that is actually a very nice question – we are working on that. You can imagine that for the Personal Health Train, the paradigm from the animation, it is essential that we can check consent, otherwise we would be breaking the law. So the answer is basically yes, and one piece of ongoing work at the moment is on consent ontologies; there are also the consent codes from the Global Alliance. We need projects where we actually put this to the test and see whether we can build it into the technology of the FAIR data stations – in principle a FAIR data station should do that. The ideal is that patients simply provide consent as they normally would, but that we have a machine-readable equivalent of that information, so that we can do automated checking and possibly also involve the patients in the process – so that they can get a message on their phone: "a scientist wants to use your data, is that okay?". We are not there yet, but this is one of the paradigms we are working on, like the personal locker: the idea that everybody would have a locker that is basically a personal FAIR data station for their own data. I think we are now at the stage where we need patients to help us advocate and push for this to become a standard, so that it is also supported at the highest levels by policymakers and funders. And typically patients are our friends in this, because they see the need to make their data usable for scientists in the most controlled way – and that is what we are after.
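As an illustration of the kind of automated consent check described here, the sketch below compares a requested use against a machine-readable consent category. The category names are loosely inspired by the GA4GH consent-code idea, but the specific codes, labels and matching rule are invented; a real implementation would rely on the agreed consent ontologies.

```python
# Invented, simplified consent categories (loosely inspired by GA4GH-style
# consent codes; not the real code list or semantics).
CONSENT_COVERS = {
    "GRU": {"GRU", "HMB", "DS-RARE"},   # general research use covers the others (assumption)
    "HMB": {"HMB", "DS-RARE"},          # health/medical/biomedical research
    "DS-RARE": {"DS-RARE"},             # disease-specific: rare disease research only
}

def use_permitted(consent_code: str, requested_use: str) -> bool:
    """Return True if the requested use falls under the patient's consent."""
    return requested_use in CONSENT_COVERS.get(consent_code, set())

# A data station could run a check like this before letting an analysis "train" in.
print(use_permitted("HMB", "DS-RARE"))  # True: rare disease research is covered
print(use_permitted("DS-RARE", "HMB"))  # False: consent was disease-specific only
```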
I don't see any more questions, so if there are none, I think we can finish the session. If you have other questions in the future, please contact us at the two addresses shown: info@rd-connect.eu, or the fair-rd address shown on the slide. All right, thank you very much for attending, and Marco, thanks for an excellent webinar. Thank you all very much, and I am looking forward to your responses. Thank you.