1. Why is genomic data so special from a privacy perspective?
    Genomic data contains extremely sensitive information about an individual. For example, from a genome one can learn about a person’s (i) predisposition to diseases, (ii) responses to drugs, (iii) ethnicity, and (iv) relatives. In addition, the genome itself is a unique identifier for an individual. Finally, internal correlations within the genome make it difficult to protect, because information about one part of the genome can reveal information about other parts.
  2. Why focus so much on genomic data when other medical data (like hospital records) seems to contain more sensitive and straightforward information?
    It is true that, today, medical records are often the more sensitive data. Their level of privacy protection is still insufficient, but it can be improved in the future. Meanwhile, given the rapid pace of new discoveries, the huge amount of sensitive information contained in an individual’s genome is becoming interpretable. Protecting genomic data is therefore a natural driver for developing more sophisticated privacy-enhancing technologies, and these new methods can then be adapted to protect non-genomic data as well.
  3. Why is legislation (e.g., GINA, HIPAA) not enough to ensure genome privacy?
    The existence of legal protection does not by itself prevent genome privacy from being compromised, and the protection is often incomplete. For example, in the US, the Genetic Information Nondiscrimination Act (GINA) generally prohibits health insurers and employers from asking about and acting on genetic information. However, it does not prevent life, disability, or long-term care insurers from doing so. Illegal access to genomic data can also happen regardless of the law.
  4. Are standard anonymization techniques enough to protect genome privacy?
    No. Standard anonymization techniques, such as removing personal identifiers, are not enough because the genome itself is a unique identifier.
  5. What would be the main potential threats and abuses due to the leakage of genome information?
    Genetic discrimination is currently the main potential abuse resulting from the leakage of genome information. For example, health insurers could deny a policy or increase its cost based on the information contained in the genome, or employers could adjust their recruitment strategy based on the genomes of applicants. Moreover, because genetic information is complex and our understanding of it is still immature, more abuses and threats are likely to emerge in the future.
  6. Why do we need to design specialized privacy-protection methods? Couldn’t we, for example, simply apply AES encryption to genomic data, like encrypting a file or a disk?
    The answer depends on the application scenario and on the security and privacy requirements, i.e., the adversary model. For instance, if an individual’s genomic data is used by a third party (e.g., a doctor, or a DTC service such as 23andMe) but the owner does not want to reveal everything (e.g., some sensitive genetic variants), then simple AES encryption will not work, because the data must first be decrypted and is then seen in full by the third party. To address this, researchers have proposed more advanced solutions that perform computation over encrypted genomic data without decrypting it.
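    As a rough illustration of computing on data that no single party can read, here is a minimal sketch that uses additive secret sharing in place of encryption (chosen only because it fits in a few lines). The genotype values, the integer weights, and the names `share`, `local_weighted_sum`, and `reconstruct` are hypothetical and do not come from any particular published system.

```python
import secrets

# Field modulus: any prime larger than the largest possible score works.
PRIME = 2**61 - 1

def share(value, n_parties=2):
    """Split an integer into n additive shares modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

def local_weighted_sum(genotype_shares, weights):
    """Run by one party on its own shares only; the weights are public."""
    return sum(w * s for w, s in zip(weights, genotype_shares)) % PRIME

def reconstruct(partials):
    """Combine the parties' partial results into the final value."""
    return sum(partials) % PRIME

genotype = [0, 1, 2, 1]  # hypothetical minor-allele counts at four SNPs
weights = [3, 5, 2, 7]   # hypothetical public integer risk weights

per_snp = [share(g) for g in genotype]   # per_snp[i][p]: share of SNP i held by party p
party_views = list(zip(*per_snp))        # party_views[p]: everything party p sees
partials = [local_weighted_sum(view, weights) for view in party_views]
score = reconstruct(partials)
print(score)  # 16, the same as computing 0*3 + 1*5 + 2*2 + 1*7 in the clear
```

    Each party only ever sees values that look uniformly random, and only the combination of the partial results reveals the score. The sketch assumes the parties do not collude and follow the protocol honestly.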
  7. Isn’t the obsession with privacy detrimental to genomic research?
    Even if protecting genome privacy can reduce the utility of data for genomic research, it is crucial for establishing trust between researchers and donors. Medical research in general is based on trust and transparency, and privacy concerns are one of its main obstacles.
  8. What are the main privacy-enhancing techniques used to protect genome privacy?
    Most of the proposed solutions that protect genome privacy while preserving data utility rely on conventional cryptographic techniques, secure multiparty computation techniques (such as homomorphic encryption, secret sharing, and garbled circuits), or differential privacy.
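    To make the last of these concrete, here is a minimal sketch of the Laplace mechanism from differential privacy applied to a single aggregate query. The function name `dp_carrier_count`, the count of 120 carriers, and the choice of epsilon are illustrative assumptions, and the sketch assumes each individual changes the count by at most 1.

```python
import random

def dp_carrier_count(true_count, epsilon, rng=random):
    """Release a carrier count with Laplace noise calibrated for epsilon-DP.

    Assumes each individual contributes at most 1 to the count (sensitivity 1).
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    scale = 1.0 / epsilon  # Laplace scale = sensitivity / epsilon
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise

# Seeded only to make the demo repeatable; Python's `random` module is not
# a suitable noise source for a real deployment.
rng = random.Random(42)
noisy = dp_carrier_count(true_count=120, epsilon=0.5, rng=rng)
print(f"noisy carrier count: {noisy:.1f}")
```

    A smaller epsilon gives stronger privacy but noisier answers, which is the privacy versus utility trade-off in its simplest form.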
  9. Is it possible to infer the genotype from the phenotype and vice-versa?
    Potentially, yes. Recent work has shown that it is possible to re-identify a person by inferring some phenotypic traits (such as facial traits) from their genetic information and some contextual data. Similarly, some genetic information can be inferred from particular phenotypes.
  10. How much genetic information is necessary to uniquely identify an individual?
    Around 80 single nucleotide polymorphisms are enough to uniquely identify one individual [1].
    [1] Z. Lin, A. B. Owen, and R. B. Altman. "Genomic research and human subject privacy." Science, vol. 305, no. 5681, p. 183, 9 July 2004.
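    A back-of-the-envelope calculation suggests why a few dozen SNPs can be enough. The sketch below is not the method of the cited paper: it assumes independent biallelic SNPs in Hardy-Weinberg proportions and simply asks how many are needed for their combined entropy to exceed the number of bits required to index a population. The function names and the minor allele frequencies are illustrative.

```python
import math

def genotype_entropy_bits(maf):
    """Shannon entropy (bits) of one biallelic SNP genotype under Hardy-Weinberg."""
    p, q = 1.0 - maf, maf
    freqs = [p * p, 2 * p * q, q * q]  # the three possible genotypes
    return -sum(f * math.log2(f) for f in freqs if f > 0)

def snps_needed(population, maf):
    """Smallest number of independent SNPs whose total entropy covers log2(population) bits."""
    return math.ceil(math.log2(population) / genotype_entropy_bits(maf))

world = 8_000_000_000  # rough world population, an assumption for illustration
for maf in (0.5, 0.1):
    print(f"MAF {maf}: about {snps_needed(world, maf)} independent SNPs")
```

    Real SNPs are correlated and their genotype frequencies vary between populations, so this idealized estimate is only a lower bound on what is needed in practice.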
  11. Is it worth devoting so much effort to protecting genomic data when it is so easy to obtain a person’s DNA sample from shed hair, saliva, etc.?
    The main goal of research on this topic is to develop general privacy-preserving solutions that protect the genomic data of a large population. A targeted attack against a specific person is indeed hard to prevent when a DNA sample can be obtained and sequenced, but such an attack is unlikely to be replicated for everyone in a massive dataset. Privacy-preserving solutions are therefore crucial for protecting large genomic databases.