Find below a non-exhaustive but detailed list of scientific publications on genome privacy and security. Please contact us for any missing publications.
Richard Mott, Christian Fischer, Pjotr Prins; Robert William Davies: Private Genomes and Public SNPs: Homomorphic encryption of genotypes and phenotypes for shared quantitative genetics. Genetics, 215 (2), pp. 359–372, 2020. (Type: Journal Article | Abstract | BibTeX) @article{mott2020private, title = {Private Genomes and Public SNPs: Homomorphic encryption of genotypes and phenotypes for shared quantitative genetics}, author = {Richard Mott and Christian Fischer and Pjotr Prins and Robert William Davies}, editor = {Genetics Soc America}, year = {2020}, date = {2020-06-18}, journal = {Genetics}, volume = {215}, number = {2}, pages = {359--372}, abstract = {Sharing human genotype and phenotype data is essential to discover otherwise inaccessible genetic associations, but is a challenge because of privacy concerns. Here, we present a method of homomorphic encryption that obscures individuals’ genotypes and phenotypes, and is suited to quantitative genetic association analysis. Encrypted ciphertext and unencrypted plaintext are analytically interchangeable. The encryption uses a high-dimensional random linear orthogonal transformation key that leaves the likelihood of quantitative trait data unchanged under a linear model with normally distributed errors. It also preserves linkage disequilibrium between genetic variants and associations between variants and phenotypes. It scrambles relationships between individuals: encrypted genotype dosages closely resemble Gaussian deviates, and can be replaced by quantiles from a Gaussian with negligible effects on accuracy. Likelihood-based inferences are unaffected by orthogonal encryption. These include linear mixed models to control for unequal relatedness between individuals, heritability estimation, and including covariates when testing association. 
Orthogonal transformations can be applied in a modular fashion for multiparty federated mega-analyses where the parties first agree to share a common set of genotype sites and covariates prior to encryption. Each then privately encrypts and shares their own ciphertext, and analyses all parties’ ciphertexts. In the absence of private variants, or knowledge of the key, we show that it is infeasible to decrypt ciphertext using existing brute-force or noise-reduction attacks. We present the method as a challenge to the community to determine its security.}, keywords = {}, pubstate = {published}, tppubtype = {article} }
Sero, Dzemila; Zaidi, Arslan; Li, Jiarui; White, Julie D; Zarzar, Tomás B González; Marazita, Mary L; Weinberg, Seth M; Suetens, Paul; Vandermeulen, Dirk; Wagner, Jennifer K; others: Facial recognition from DNA using face-to-DNA classifiers. Nature communications, 10 (1), pp. 2557, 2019. (Type: Journal Article | Abstract | BibTeX) @article{sero2019facial, title = {Facial recognition from DNA using face-to-DNA classifiers}, author = {Sero, Dzemila and Zaidi, Arslan and Li, Jiarui and White, Julie D and Zarzar, Tom{\'a}s B Gonz{\'a}lez and Marazita, Mary L and Weinberg, Seth M and Suetens, Paul and Vandermeulen, Dirk and Wagner, Jennifer K and others}, editor = {Nature Publishing Group}, year = {2019}, date = {2019-07-09}, journal = {Nature communications}, volume = {10}, number = {1}, pages = {2557}, abstract = {Facial recognition from DNA refers to the identification or verification of unidentified biological material against facial images with known identity. One approach to establish the identity of unidentified biological material is to predict the face from DNA, and subsequently to match against facial images. However, DNA phenotyping of the human face remains challenging. Here, another proof of concept to biometric authentication is established by using multiple face-to-DNA classifiers, each classifying given faces by a DNA-encoded aspect (sex, genomic background, individual genetic loci), or by a DNA-inferred aspect (BMI, age). Face-to-DNA classifiers on distinct DNA aspects are fused into one matching score for any given face against DNA. In a globally diverse, and subsequently in a homogeneous cohort, we demonstrate preliminary, but substantial true (83%, 80%) over false (17%, 20%) matching in verification mode. 
Consequences of future efforts include forensic applications, necessitating careful consideration of ethical and legal implications for privacy in genomic databases.}, keywords = {}, pubstate = {published}, tppubtype = {article} }
Jean Louis Raisaro, Juan Ramon Troncoso-Pastoriza, Mickael Misbach, Joao Sa Sousa, Sylvain Pradervand, Edoardo Missiaglia, Olivier Michielin, Bryan Ford; Jean-Pierre Hubaux: MedCo: Enabling Secure and Privacy-Preserving Exploration of Distributed Clinical and Genomic Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 16 (4), pp. 1328-1341, 2019. (Type: Journal Article | Abstract | BibTeX) @article{raisaro2019medco, title = {MedCo: Enabling Secure and Privacy-Preserving Exploration of Distributed Clinical and Genomic Data}, author = {Jean Louis Raisaro and Juan Ramon Troncoso-Pastoriza and Mickael Misbach and Joao Sa Sousa and Sylvain Pradervand and Edoardo Missiaglia and Olivier Michielin and Bryan Ford and Jean-Pierre Hubaux}, year = {2019}, date = {2019-07-01}, journal = {IEEE/ACM Transactions on Computational Biology and Bioinformatics}, volume = {16}, number = {4}, pages = {1328-1341}, abstract = {The increasing number of health-data breaches is creating a complicated environment for medical-data sharing and, consequently, for medical progress. Therefore, the development of new solutions that can reassure clinical sites by enabling privacy-preserving sharing of sensitive medical data in compliance with stringent regulations (e.g., HIPAA, GDPR) is now more urgent than ever. In this work, we introduce MedCo, the first operational system that enables a group of clinical sites to federate and collectively protect their data in order to share them with external investigators without worrying about security and privacy concerns. MedCo uses (a) collective homomorphic encryption to provide trust decentralization and end-to-end confidentiality protection, and (b) obfuscation techniques to achieve formal notions of privacy, such as differential privacy. A critical feature of MedCo is that it is fully integrated within the i2b2 (Informatics for Integrating Biology and the Bedside) framework, currently used in more than 300 hospitals worldwide. 
Therefore, it is easily adoptable by clinical sites. We demonstrate MedCo’s practicality by testing it on data from The Cancer Genome Atlas in a simulated network of three institutions. Its performance is comparable to the ones of SHRINE (networked i2b2), which, in contrast, does not provide any data protection guarantee.}, keywords = {}, pubstate = {published}, tppubtype = {article} }
Marc Fiume, Miroslav Cupak, Stephen Keenan, Jordi Rambla, Sabela de la Torre, Stephanie O. M. Dyke, Anthony J. Brookes, Knox Carey, David Lloyd, Peter Goodhand, Maximilian Haeussler, Michael Baudis, Heinz Stockinger, Lena Dolman, Ilkka Lappalainen, Juha Törnroos, Mikael Linden, J. Dylan Spalding, Saif Ur-Rehman, Angela Page, Paul Flicek, Stephen Sherry, David Haussler, Susheel Varma, Gary Saunders & Serena Scollen: Federated discovery and sharing of genomic data using Beacons. Nature biotechnology, 37 (3), 2019. (Type: Journal Article | Abstract | BibTeX) @article{fiume2019federated, title = {Federated discovery and sharing of genomic data using Beacons}, author = {Marc Fiume and Miroslav Cupak and Stephen Keenan and Jordi Rambla and Sabela de la Torre and Stephanie O. M. Dyke and Anthony J. Brookes and Knox Carey and David Lloyd and Peter Goodhand and Maximilian Haeussler and Michael Baudis and Heinz Stockinger and Lena Dolman and Ilkka Lappalainen and Juha Törnroos and Mikael Linden and J. Dylan Spalding and Saif Ur-Rehman and Angela Page and Paul Flicek and Stephen Sherry and David Haussler and Susheel Varma and Gary Saunders and Serena Scollen}, year = {2019}, date = {2019-03-04}, journal = {Nature biotechnology}, volume = {37}, number = {3}, abstract = {To the Editor — The Beacon Project (https://github.com/ga4gh-beacon/) is a Global Alliance for Genomics & Health (GA4GH) initiative that enables genomic and clinical data sharing across federated networks. The project is working toward developing regulatory, ethics and security guidance to ensure proportionate safeguards for distribution of data according to the GA4GH-developed “Framework for Responsible Sharing of Genomic and Health-Related Data”. 
Here we describe the Beacon protocol and how it can be used as a model for the federated discovery and sharing of genomic data.}, keywords = {}, pubstate = {published}, tppubtype = {article} }
Hagestedt, Inken; Zhang, Yang; Humbert, Mathias; Berrang, Pascal; Tang, Haixu; Wang, XiaoFeng; Backes, Michael: MBeacon: Privacy-Preserving Beacons for DNA Methylation Data. NDSS 2019, 2019. (Type: Journal Article | Abstract | BibTeX) @article{hagestedt2018mbeacon, title = {MBeacon: Privacy-Preserving Beacons for DNA Methylation Data}, author = {Hagestedt, Inken and Zhang, Yang and Humbert, Mathias and Berrang, Pascal and Tang, Haixu and Wang, XiaoFeng and Backes, Michael}, year = {2019}, date = {2019-02-28}, journal = {NDSS 2019}, abstract = {The advancement of molecular profiling techniques fuels biomedical research with a deluge of data. To facilitate data sharing, the Global Alliance for Genomics and Health established the Beacon system, a search engine designed to help researchers find datasets of interest. While the current Beacon system only supports genomic data, other types of biomedical data, such as DNA methylation, are also essential for advancing our understanding in the field. In this paper, we propose the first Beacon system for DNA methylation data sharing: MBeacon. As the current genomic Beacon is vulnerable to privacy attacks, such as membership inference, and DNA methylation data is highly sensitive, we take a privacy-by-design approach to construct MBeacon. First, we demonstrate the privacy threat, by proposing a membership inference attack tailored specifically to unprotected methylation Beacons. Our experimental results show that 100 queries are sufficient to achieve a successful attack with AUC (area under the ROC curve) above 0.9. To remedy this situation, we propose a novel differential privacy mechanism, namely SVT^2, which is the core component of MBeacon. Extensive experiments over multiple datasets show that SVT^2 can successfully mitigate membership privacy risks without significantly harming utility. 
We further implement a fully functional prototype of MBeacon which we make available to the research community.}, keywords = {}, pubstate = {published}, tppubtype = {article} }
Mittos, Alexandros; Malin, Bradley; De Cristofaro, Emiliano: Systematizing genome privacy research: a privacy-enhancing technologies perspective. Proceedings on Privacy Enhancing Technologies, 2019 (1), pp. 87-107, 2018. (Type: Journal Article | Abstract | BibTeX) @article{mittos2019systematizing, title = {Systematizing genome privacy research: a privacy-enhancing technologies perspective}, author = {Mittos, Alexandros and Malin, Bradley and De Cristofaro, Emiliano}, year = {2018}, date = {2018-12-24}, journal = {Proceedings on Privacy Enhancing Technologies}, volume = {2019}, number = {1}, pages = {87-107}, abstract = {Rapid advances in human genomics are enabling researchers to gain a better understanding of the role of the genome in our health and well-being, stimulating hope for more effective and cost efficient healthcare. However, this also prompts a number of security and privacy concerns stemming from the distinctive characteristics of genomic data. To address them, a new research community has emerged and produced a large number of publications and initiatives. In this paper, we rely on a structured methodology to contextualize and provide a critical analysis of the current knowledge on privacy-enhancing technologies used for testing, storing, and sharing genomic data, using a representative sample of the work published in the past decade. We identify and discuss limitations, technical challenges, and issues faced by the community, focusing in particular on those that are inherently tied to the nature of the problem and are harder for the community alone to address. 
Finally, we report on the importance and difficulty of the identified challenges based on an online survey of genome data privacy experts.}, keywords = {}, pubstate = {published}, tppubtype = {article} }
Erlich, Yaniv; Shor, Tal; Pe’er, Itsik; Carmi, Shai: Identity inference of genomic data using long-range familial searches. Science, 2018. (Type: Journal Article | Abstract | Links | BibTeX) @article{10.1126/science.aau4832, title = {Identity inference of genomic data using long-range familial searches}, author = {Erlich, Yaniv and Shor, Tal and Pe’er, Itsik and Carmi, Shai}, url = {http://science.sciencemag.org/content/early/2018/10/10/science.aau4832}, year = {2018}, date = {2018-10-11}, journal = {Science}, abstract = {Consumer genomics databases have reached the scale of millions of individuals. Recently, law enforcement authorities have exploited some of these databases to identify suspects via distant familial relatives. Using genomic data of 1.28 million individuals tested with consumer genomics, we investigated the power of this technique. We project that about 60% of the searches for individuals of European-descent will result in a third cousin or closer match, which can allow their identification using demographic identifiers. Moreover, the technique could implicate nearly any US-individual of European-descent in the near future. We demonstrate that the technique can also identify research participants of a public sequencing project. Based on these results, we propose a potential mitigation strategy and policy implications to human subject research.}, keywords = {}, pubstate = {published}, tppubtype = {article} }
Asharov, G., Halevi, S., Lindell, Y., & Rabin, T.: Privacy-Preserving Search of Similar Patients in Genomic Data. Proceedings on Privacy Enhancing Technologies, 2018 (4), pp. 104-124, 2018. (Type: Journal Article | Abstract | BibTeX) @article{asharov2018privacy, title = {Privacy-Preserving Search of Similar Patients in Genomic Data}, author = {Asharov, G. and Halevi, S. and Lindell, Y. and Rabin, T.}, year = {2018}, date = {2018-07-13}, journal = {Proceedings on Privacy Enhancing Technologies}, volume = {2018}, number = {4}, pages = {104-124}, abstract = {The growing availability of genomic data holds great promise for advancing medicine and research, but unlocking its full potential requires adequate methods for protecting the privacy of individuals whose genome data we use. One example of this tension is running Similar Patient Query on remote genomic data: In this setting a doctor that holds the genome of his/her patient may try to find other individuals with “close” genomic data, and use the data of these individuals to help diagnose and find effective treatment for that patient’s conditions. This is clearly a desirable mode of operation. However, the privacy exposure implications are considerable, and so we would like to carry out the above “closeness” computation in a privacy preserving manner. In this work we put forward a new approach for highly efficient secure computation for computing an approximation of the Similar Patient Query problem. We present contributions on two fronts. First, an approximation method that is designed with the goal of achieving efficient private computation. Second, further optimizations of the two-party protocol. Our tests indicate that the approximation method works well, it returns the exact closest records in 98% of the queries and very good approximation otherwise. 
As for speed, our protocol implementation takes just a few seconds to run on databases with thousands of records, each of length thousands of alleles, and it scales almost linearly with both the database size and the length of the sequences in it. As an example, in the datasets of the recent iDASH competition, after a one-time preprocessing of around 12 seconds, it takes around a second to find the nearest five records to a query, in a size-500 dataset of length-3500 sequences. This is 2-3 orders of magnitude faster than using state-of-the-art secure protocols with existing edit distance algorithms.}, keywords = {}, pubstate = {published}, tppubtype = {article} }
Cho, Hyunghoon; Wu, David J; Berger, Bonnie: Secure genome-wide association analysis using multiparty computation. Nature biotechnology, 36 (6), pp. 547, 2018. (Type: Journal Article | Abstract | BibTeX) @article{cho2018secure, title = {Secure genome-wide association analysis using multiparty computation}, author = {Cho, Hyunghoon and Wu, David J and Berger, Bonnie}, editor = {Nature Publishing Group}, year = {2018}, date = {2018-05-07}, journal = {Nature biotechnology}, volume = {36}, number = {6}, pages = {547}, abstract = {Most sequenced genomes are currently stored in strict access-controlled repositories. Free access to these data could improve the power of genome-wide association studies (GWAS) to identify disease-causing genetic variants and aid the discovery of new drug targets. However, concerns over genetic data privacy may deter individuals from contributing their genomes to scientific studies and could prevent researchers from sharing data with the scientific community. Although cryptographic techniques for secure data analysis exist, none scales to computationally intensive analyses, such as GWAS. Here we describe a protocol for large-scale genome-wide analysis that facilitates quality control and population stratification correction in 9K, 13K, and 23K individuals while maintaining the confidentiality of underlying genotypes and phenotypes. We show the protocol could feasibly scale to a million individuals. This approach may help to make currently restricted data available to the scientific community and could potentially enable secure genome crowdsourcing, allowing individuals to contribute their genomes to a study without compromising their privacy.}, keywords = {}, pubstate = {published}, tppubtype = {article} }
Jean Louis Raisaro (EPFL), Juan Ramon Troncoso-Pastoriza (EPFL), Mickaël Misbach (EPFL; Centre Hospitalier Universitaire Vaudois), João Sá Sousa (EPFL), Sylvain Pradervand (Centre Hospitalier Universitaire Vaudois; University of Lausanne), Edoardo Missiaglia (Centre Hospitalier Universitaire Vaudois), Olivier Michielin (Centre Hospitalier Universitaire Vaudois), Bryan Ford (EPFL); Jean-Pierre Hubaux (EPFL): MedCo: Enabling Privacy-Conscious Exploration of Distributed Clinical and Genomic Data. Genopri 2017, 2017. (Type: Journal Article | Abstract | BibTeX) @article{Raisaro2017b, title = {MedCo: Enabling Privacy-Conscious Exploration of Distributed Clinical and Genomic Data}, author = {Jean Louis Raisaro and Juan Ramon Troncoso-Pastoriza and Mickaël Misbach and João Sá Sousa and Sylvain Pradervand and Edoardo Missiaglia and Olivier Michielin and Bryan Ford and Jean-Pierre Hubaux}, year = {2017}, date = {2017-10-15}, journal = {Genopri 2017}, abstract = {Being able to share large amounts of sensitive clinical and genomic data across several institutions is crucial for precision medicine to scale up. Unfortunately, existing solutions only partially address this challenge and are still unable to provide the strong privacy and security guarantees required by regulations (e.g., HIPAA, GDPR). As a result, currently only very limited datasets of non-sensitive and moderately useful information can be shared. In this paper, we introduce MedCo, the first operational system that enables an investigator to explore sensitive medical information distributed at several sites and protected with collective homomorphic encryption. 
MedCo builds on top of established and widespread technology from the biomedical informatics community, such as i2b2 and SHRINE, and relies on state-of-the-art secure protocols for processing encrypted distributed data and complying with regulations. As such, MedCo can be easily adopted by clinical sites thus paving the way to new unexplored data-sharing use cases. We tested MedCo in a real network of three institutions (EPFL, UNIL and CHUV) by focusing on an oncology use-case with real somatic mutations and clinical tumor data. The relatively low overhead introduced by MedCo shows that it represents a concrete and scalable solution for sharing privacy-conscious medical data.}, keywords = {}, pubstate = {published}, tppubtype = {article} }
Koki Hamada (NTT Secure Platform Laboratories), Satoshi Hasegawa (NTT Secure Platform Laboratories), Kazuharu Misawa (Tohoku Medical Megabank Organization), Koji Chida (NTT Secure Platform Laboratories), Soichi Ogishima (Tohoku Medical Megabank Organization); Masao Nagasaki (Tohoku Medical Megabank Organization): Privacy-Preserving Fisher's Exact Test for Genome-Wide Association Study. GenoPri 2017, 2017. (Type: Journal Article | Abstract | BibTeX) @article{Hamada2017, title = {Privacy-Preserving Fisher's Exact Test for Genome-Wide Association Study}, author = {Koki Hamada (NTT Secure Platform Laboratories), Satoshi Hasegawa (NTT Secure Platform Laboratories), Kazuharu Misawa (Tohoku Medical Megabank Organization), Koji Chida (NTT Secure Platform Laboratories), Soichi Ogishima (Tohoku Medical Megabank Organization) and Masao Nagasaki (Tohoku Medical Megabank Organization)}, year = {2017}, date = {2017-10-15}, journal = {GenoPri 2017}, abstract = {For protecting data privacy in genomic analysis, privacy-preserving data mining for genomic analysis has been actively studied in recent years. Fisher’s exact test is known as an important method for statistical hypothesis testing in genome-wide association studies (GWAS). However, designing a practical privacy-preserving variant of the test for GWAS faces two main problems: (i) It is a difficult task to efficiently execute even a single Fisher’s exact test while preserving privacy. (ii) The entire privacy-preserving GWAS is infeasible because GWAS often involves hypothesis tests for over 1 million inputs. In this paper, we construct efficient privacy-preserving algorithms for Fisher’s exact test for GWAS. To overcome problem (i), we propose a method involving a decision tree that can efficiently conduct a single Fisher’s exact test. We empirically found that the size of the decision tree is approximately proportional to N^{1.7}, where N is the sample size in an input, whereas a naive approach requires a linear scan on a list of Ω(N^3) items. Problem (ii) is solved by filtering some inputs whose outputs are easily determined by a simplified equation that computes a lower bound instead of the original equation. This filtering drastically reduces the number of inputs that should be evaluated by the original equation. In addition, we used a parallel array access algorithm to reduce the communication cost of the filtering operation to O((M + N) log(M + N)) whereas a naive approach requires Ω(MN), where M is the number of inputs. Empirical results show the proposed method takes only 8 minutes for N = 1,000 and M = 1,000,000 cases whereas the naive method is estimated to take over 20 years.}, keywords = {}, pubstate = {published}, tppubtype = {article} }
Dixie Baker (Martin, Blanck and Associates), Bartha Knoppers (McGill University), Mark Phillips (McGill University), David van Enckevort (University Medical Center Groningen), Petra Kaufmann (National Institutes of Health), Hanns Lochmuller (Newcastle University); Domenica Taruscio (Istituto Superiore di Sanità): Privacy-Preserving Linkage of Genomic and Clinical Data Sets. GenoPri 2017, 2017. (Type: Journal Article | Abstract | BibTeX) @article{Baker2017, title = {Privacy-Preserving Linkage of Genomic and Clinical Data Sets}, author = {Dixie Baker (Martin, Blanck and Associates), Bartha Knoppers (McGill University), Mark Phillips (McGill University), David van Enckevort (University Medical Center Groningen), Petra Kaufmann (National Institutes of Health), Hanns Lochmuller (Newcastle University) and Domenica Taruscio (Istituto Superiore di Sanità)}, year = {2017}, date = {2017-10-15}, journal = {GenoPri 2017}, abstract = {A key challenge for the Global Alliance for Genomics and Health (GA4GH) is the capability to link data sets associated with the same individual. This challenge is exacerbated by the facts that the data sets may include both genomic and clinical data and may span multiple ethico-legal jurisdictions, and by the need to enable re-identification when the use of the data lead to medical conclusions that the law permits or requires be communicated back to the individual. Privacy-Preserving Record Linkage (PPRL) methods address these challenges that lie at the intersection of biomedical research and clinical practice. In 2016, the Global Alliance for Genomics and Health (GA4GH) launched a task team to explore ethical questions, regulatory requirements, and technological methods and approaches related to PPRL. The task team is a collaboration in which the GA4GH (Regulatory and Ethics Work Stream and the Data Security Work Stream) is preparing policy and technology standards, together with the Interdisciplinary Committee of the International Rare Diseases Research Consortium (IRDiRC) to enable highly reliable linking of coded data records associated with the same individual without disclosing the identity of that individual except under conditions in which the use of the data has led to information of importance to the individual’s safety or health, and applicable law allows or requires the return of results. The PPRL Task Force has examined the ethico-legal requirements, constraints, and implications of PPRL, and has applied this knowledge to the exploration of technology methods and approaches to PPRL. This paper reports and justifies the findings and recommendations thus far.}, keywords = {}, pubstate = {published}, tppubtype = {article} }
Alexander Senf (European Molecular Biology Laboratory, European Bioinformatics Institute): End-to-end Security for Local and Remote Human Genetic Data Applications at the EGA. GenoPri 2017, 2017. (Type: Journal Article | Abstract | BibTeX) @article{Senf2017, title = {End-to-end Security for Local and Remote Human Genetic Data Applications at the EGA}, author = {Alexander Senf (European Molecular Biology Laboratory, European Bioinformatics Institute)}, year = {2017}, date = {2017-10-15}, journal = {GenoPri 2017}, abstract = {Sensitive genomic data should remain secure – whether on disk for storage or analysis, or in transport. However, secure storage, delivery and usage of genomic data is complicated by the size of files, and diversity of workflows. This paper presents solutions developed by GA4GH and EGA to use customized encryption, encrypted file formats, toolchain integration, and intelligent APIs to help solve this problem.}, keywords = {}, pubstate = {published}, tppubtype = {article} }
Scott Thiebes (University of Kassel), Gregor Kleiber (University of Cologne); Ali Sunyaev (University of Kassel): Cancer Genomics Research in the Cloud: A Taxonomy of Genome Data Sets. GenoPri 2017, 2017. (Type: Journal Article | Abstract | BibTeX) @article{Thiebes2017, title = {Cancer Genomics Research in the Cloud: A Taxonomy of Genome Data Sets}, author = {Scott Thiebes (University of Kassel), Gregor Kleiber (University of Cologne) and Ali Sunyaev (University of Kassel)}, year = {2017}, date = {2017-10-15}, journal = {GenoPri 2017}, abstract = {The adoption of cloud services in genomics is often accompanied by information privacy and information security concerns. While specific information privacy and information security requirements are recognized to vary depending on the underlying genome data sets’ sensitivity, extant research has mostly taken a maximum effort approach to the protection of genome data in cloud computing environments. In this paper, we employ the method of Nickerson et al. to develop a taxonomy of genome data sets that can aid interested researchers in deciding whether to store and process their genome data in the cloud. Our taxonomy consists of the ten dimensions (1) Organism, (2) Access, (3) Identifiable, (4) File size, (5) Processing requirements, (6) Transfer requirements, (7) Mutable, (8) API access, (9) Software availability, and (10) Use restriction. Analysis of our taxonomy and data set classifications from a cloud computing perspective highlights the existence of diverse factors and contextual influences beyond just privacy and security concerns that can motivate or discourage cancer genomics researchers to move their genome data to the cloud.}, keywords = {}, pubstate = {published}, tppubtype = {article} }
Jake Weidman (The Pennsylvania State University; Technical University of Munich), William Aurite (The Pennsylvania State University); Jens Grossklags (Technical University of Munich): A Vignette Study on Personal and Interdependent Privacy Concerns, and Sharing Intentions for Genetic Data. GenoPri 2017, 2017. (Type: Journal Article | Abstract | BibTeX) @article{Weidman2017b, title = {A Vignette Study on Personal and Interdependent Privacy Concerns, and Sharing Intentions for Genetic Data}, author = {Jake Weidman (The Pennsylvania State University and Technical University of Munich), William Aurite (The Pennsylvania State University) and Jens Grossklags (Technical University of Munich)}, year = {2017}, date = {2017-10-15}, journal = {GenoPri 2017}, abstract = {Genetics and genetic data have been the subject of recent scholarly work, with significant attention paid towards understanding genetic data consent practices and genetic data security. Attitudes and perceptions concerning the trustworthiness of governmental or national institutions receiving test-taker data have been explored, with varied findings, but no robust models or deterministic relationships have been established that account for these differences. These results also do not explore in detail the perceptions regarding other types of organizations (e.g., private corporations). Further, considerations of privacy interdependence arising from blood relative relationships have been absent from the conversation regarding the sharing of genetic data. This paper reports the results from a factorial vignette survey study in which we investigate how variables of ethnicity, age, genetic markers, and association of data with the individual’s name affect the likelihood of sharing data with different types of organizations. We also investigate elements of personal and interdependent privacy concerns. We document the significant role these factors have in the decision to share or not share genetic data with a third party. We support our findings with a series of regression analyses.}, keywords = {}, pubstate = {published}, tppubtype = {article} }