ChatGPT Genetic Raw Data Upload: Weighing Pros and Cons
Article at a Glance
- ChatGPT and other AI platforms are not professional geneticists or genetics counsellors. Limitations in the models’ training can lead to oversimplification or even hallucinations. And the results could seem so authoritative that you make major health and lifestyle decisions based on incorrect information.
- It’s also wise to consider any AI analyses of genetic data as a starting point, not the final, diagnostic report.
With so many services available to analyze your genetic data, chances are you’ve considered just asking AI to run the data for you. Is it safe to upload raw genetic data to ChatGPT and other AI systems?
Before you do so, it’s smart to consider the dangers of misinterpretation and the potential impact on the privacy of any sensitive genetic information.
The risks of sharing raw genetic data
23andMe and some other genetics services offer the option to download your raw genetic data. Once it’s in your possession, you can share it however you like. This offers great freedom, but it also puts the onus on you to share data responsibly, understand the risks of potential misinterpretation or manipulation, and safeguard that data as necessary.
Unlock Your Personalized Nutrition & Supplement Report
Gene Food uses a proprietary algorithm to divide people into one of twenty diet types based on genetics. We score for cholesterol and sterol hyperabsorption, MTHFR status, histamine clearance, carbohydrate tolerance, and more. Where do you fit?
Misinterpretation by AI
ChatGPT and other AI platforms are not professional geneticists or genetics counsellors. Limitations in the models’ training can lead to oversimplification or even hallucinations. And the results could seem so authoritative that you make major health and lifestyle decisions based on incorrect information.
Errors in ChatGPT’s analysis could even lead you to make dangerous changes to medication regimens or to forgo medical testing. Or, conversely, you may be sent down a rabbit hole of medical tests that you don’t actually need. This can be costly, invasive, and traumatic.
AI models typically summarize single nucleotide polymorphisms (SNPs) or variants, while overlooking nuances such as environmental factors, or how variants interact with each other (polygenic interactions, i.e., interactions between more than one genetic variant). The dangers may be compounded for people of color, for whom limited genetic datasets hamper meaningful analysis. Or, if you have a rare allele, ChatGPT and its counterparts may have very little, or nothing, to go on to carry out their analysis.
And, of course, AI models tend to come across as very authoritative, meaning you don’t get the same nuance you would with a qualified and experienced genetics counselling service. Real human geneticists and counsellors acknowledge the limits of current scientific knowledge and will work collaboratively with you to create a plan. AI models don’t do that.
Perhaps the most worrying risk of uploading genetic data to ChatGPT is that the AI incorrectly decides your risk of a disease, based on a single gene variant, without factoring in the actual real-life impact of that variant. The upshot is either unnecessary worry or false reassurance.
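To make the gap between a variant’s headline effect and its real-life impact concrete, here is a minimal arithmetic sketch. The baseline risk and relative risk are hypothetical numbers chosen for illustration, not figures for any real variant, and `absolute_risk` is a made-up helper. Multiplying a baseline risk by a relative risk is only a rough approximation of how risk is actually estimated.

```python
def absolute_risk(baseline_risk: float, relative_risk: float) -> float:
    """Scale a baseline risk by a relative risk, capped at 1.0.

    Simplifying assumption: the baseline is treated as the risk for people
    without the variant, and no other genes or environmental factors apply.
    """
    if not 0.0 <= baseline_risk <= 1.0:
        raise ValueError("baseline_risk must be between 0 and 1")
    if relative_risk < 0:
        raise ValueError("relative_risk must not be negative")
    return min(1.0, baseline_risk * relative_risk)


# Hypothetical values for illustration only.
baseline = 0.01        # 1% lifetime risk without the variant
relative_risk = 1.5    # "50% higher risk" for carriers

carrier_risk = absolute_risk(baseline, relative_risk)
print(f"Risk without the variant: {baseline:.1%}")
print(f"Risk with the variant:    {carrier_risk:.1%}")
```

Under these assumed numbers, a "50% higher risk" moves the absolute risk from about 1% to about 1.5%. An AI summary that reports only the relative figure can make a modest change sound alarming.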
Specific concerns with ChatGPT and other AI models
ChatGPT is impressive when it comes to pattern recognition. But it does not have domain-specific training, especially in medicine or genetics.
As such, the chances of ChatGPT misinterpreting your raw genetic data from a VCF file, or from a Promethease report, are quite high. There are user-reported instances of ChatGPT misclassifying benign variants as causing disease, or pathogenic variants as benign.
Specific genetics AI tools from Nucleus Genomics and others tend to do much better, with integrated genetics databases to call on for help with analysis. However, these still risk bias due to overrepresentation of some populations and underrepresentation of others.
A safer way to use AI for genetic analysis
AI tools have their place, and if you are planning to use them to analyze your genetic data, there are safer ways to go about it.
For instance, it’s a good idea to use an AI in collaboration with a certified genetics counsellor. Or, cross-check variants against a resource like ClinVar, a public database of variant classifications that is designed specifically for this kind of work.
It’s also wise to consider any AI analyses of genetic data as a starting point, not the final, diagnostic report. Just as you might use AI to understand a legal situation, you would still want to run any AI outputs by an actual lawyer, so they can spot errors and help you navigate the legal system as it currently stands, not as AI imagines it to be (complete with hallucinated legal cases!).
The American College of Medical Genetics and Genomics (ACMG) recommends that any AI-assisted genetic analysis be overseen by a qualified human interpreter. That way, the risk of clinical errors is much lower.
Privacy and security
The other major issue with uploading your genetic data to ChatGPT is the risk of loss of privacy.
Your genetic data is inherently identifiable and deeply personal. And, given the nature of genetics, this kind of data can also reveal information about your relatives, so any privacy concerns are not just about you.
Uploading your genetic data to ChatGPT or other AI, where there’s a risk of your account or search history being accessed, or of the platform itself sharing your data, creates concerns about:
- Discrimination
- Stigmatization
- Loss of privacy
- Misinterpretation by the AI.
It’s unlikely anyone will be able to access your genetic data through the AI model itself, but if you lose your laptop, or someone gains access to your ChatGPT or other AI account, they would be able to see your genetic analysis.
Uploading raw genetic data to ChatGPT and similar AI systems creates risks that aren’t present with platforms designed to handle sensitive genetic information. For instance, if the data isn’t properly protected, and depending on your service agreement and settings within the AI system, that data could be:
- Stored by the AI company
- Used to train AI models
- Accessed by unauthorized parties.
ChatGPT security protocols
The good news is that ChatGPT does have various security measures in place to protect user data.
ChatGPT encrypts data both in transit and at rest. Transport Layer Security (TLS) protects data in transit, preventing unauthorized users from intercepting it.
Multi-factor authentication (MFA) and zero-trust security models work to keep data out of the hands of malicious actors. ChatGPT also monitors its systems and behaviors continuously to identify anything suspicious that might indicate a data breach.
OpenAI – the company behind ChatGPT – also has strict data retention policies that limit how long the AI stores sensitive data. The shorter this time period, the lower the risk of a data leak.
You can also toggle permissions on your ChatGPT account, so that your queries and data are not used to train the model.
What about other AI systems?
ChatGPT isn’t the only AI in town, and other AI systems have similar or even better protocols for handling sensitive data. These protocols include:
- Encryption
- Zero-trust security
- Strict access controls
- Limits on how long data is stored (data retention)
- Automatic detection of anomalies and suspicious behavior.
Acknowledging the unique risks surrounding genetic data, some AI systems have specific tactics to manage sensitive data. These include:
- Adding ‘noise’ to data to allow analysis of aggregated data while masking individual identities (this is known as Differential Privacy)
- Homomorphic encryption, where the AI processes encrypted data directly without having to decrypt it first, so privacy is never an issue during the actual computation part of the process
- Training the AI model on locally stored data only, so no raw data is swirling about willy-nilly (this is known as federated learning or localized learning)
- Automatic detection when a user uploads sensitive raw data, which then triggers stronger data security protocols.
Part of the reasoning behind these tactics is to ensure the AI systems comply with privacy regulations like HIPAA and GDPR.
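Of the tactics listed above, differential privacy is the simplest to illustrate. The sketch below uses the Laplace mechanism, a standard way of adding calibrated noise to a count. It is not a description of how any particular AI platform implements it. The cohort, the `epsilon` value, and the function names are all hypothetical.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(flags, epsilon: float, rng: random.Random) -> float:
    """Release a noisy count of True values.

    Adding or removing one person changes a count by at most 1, so the
    sensitivity is 1 and the noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for flag in flags if flag)
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical cohort: 120 carriers of some variant among 1,000 people.
rng = random.Random(0)  # fixed seed so the sketch is repeatable
carriers = [True] * 120 + [False] * 880

noisy_carriers = private_count(carriers, epsilon=0.5, rng=rng)
print(f"True count: 120, released count: {noisy_carriers:.1f}")
```

A smaller `epsilon` means more noise and stronger masking of any one individual, at the cost of a less accurate aggregate. This technique protects people within aggregated statistics. It does not by itself protect a single raw file you upload.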
So, is it safe?
Even with all these protections in place, there’s no one hundred percent guarantee of safety and security when uploading raw genetic information to an AI system.
Best practice remains to avoid uploading any sensitive information to publicly accessible AI systems.
It’s also a good idea to reduce how much personally identifiable information (PII) you upload and to avoid uploading any sensitive information you would never want exposed, even to supposedly secure systems.
Another way to manage the risk is to separate different types of PII, such as your name, address, date of birth, etc., from any data files. You might also want to extract snippets of the genetic data for analysis rather than uploading your whole genome.
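As a rough sketch of extracting a snippet rather than sharing a whole file, the code below keeps only a chosen set of SNPs and drops the header comments. It assumes a tab-separated export with four columns (rsid, chromosome, position, genotype) and comment lines starting with `#`, so check your own file’s layout first. The rsIDs, positions, and name in the sample are invented placeholders, and `extract_snps` and `to_tsv` are hypothetical helper names.

```python
def extract_snps(raw_text: str, wanted_rsids: set) -> list:
    """Return only the requested SNP rows from a raw data export.

    Comment lines (starting with '#') are dropped, since export headers
    can carry metadata you may not want to share.
    """
    kept = []
    for line in raw_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 4:
            continue  # skip rows that do not match the assumed layout
        rsid, chromosome, position, genotype = fields
        if rsid in wanted_rsids:
            kept.append((rsid, chromosome, position, genotype))
    return kept


def to_tsv(rows: list) -> str:
    """Format the kept rows as a small tab-separated snippet."""
    header = "rsid\tchromosome\tposition\tgenotype"
    return "\n".join([header] + ["\t".join(row) for row in rows])


# Invented sample data: placeholder rsIDs, positions, and name.
SAMPLE_EXPORT = (
    "# Hypothetical export header: generated for Jane Doe\n"
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs0000001\t1\t1000\tAA\n"
    "rs0000002\t1\t2000\tAG\n"
    "rs0000003\t2\t3000\tCC\n"
)

snippet = extract_snps(SAMPLE_EXPORT, {"rs0000002"})
print(to_tsv(snippet))
```

Running this locally means the full file never leaves your machine, and only the rows you choose are shared. Even a handful of SNPs is still personal data, so the snippet reduces exposure rather than removing it.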
Other best practices if you’re considering uploading your genetic data include:
- Check the AI system has explicit security certifications and offers strong access controls, encryption, and data governance
- Look for platforms that enhance privacy by using homomorphic encryption or federated/localized learning for sensitive data
- Look into the AI’s data retention policy and toggle any settings to limit storage duration for genetic or other sensitive information
- Only use platforms that will sign a HIPAA Business Associate Agreement (BAA) or that comply with other regulations applicable where you live.
In general, if you have any security concerns about your genetic data being exposed, don’t upload it to ChatGPT or other AI systems.
The best AI for uploading genetic data
If you do decide to use AI to help parse your genetic data, consider the following platforms that have more robust security protocols:
- Sophia Genetics – this is a specialized genomic AI platform that boasts secure cloud infrastructure and uses continuous monitoring, homomorphic encryption, and strict regulatory compliance to keep your genomic data secure
- Lifebit AI – another genetics focused AI system that uses localized/federated learning and privacy-preserving computation to enable genetic analysis without moving raw data unnecessarily
- DeepSomatic by Google Research – Google developed this AI tool specifically to help identify genetic variants linked to cancer. It is built on Google’s secure infrastructure and has high standards for genetic data protection
- ChatGPT by OpenAI – one of the best known AI platforms, ChatGPT does a surprisingly good job with data security. It employs a high standard of encryption, zero-trust access, multi-factor authentication, and privacy-preserving measures to keep genetic data secure
- Google’s AI platforms (like Google Cloud AI and Bard, now called Gemini) – these more publicly known AI platforms also have strict data security and privacy standards, including HIPAA compliance for healthcare data, with extensive encryption and identity management controls
- Perplexity AI – beloved of coders, this AI is privacy conscious and has encrypted communications and limited data retention. Unfortunately, though, Perplexity has no public info on its specific data governance protocols for genetic info.
The chances of a data breach are very low, but not non-existent. Choosing a purpose-designed genetics analysis service remains the best option for interpreting your genetic data. And if you do use ChatGPT to analyze your genotype, be very cautious about any conclusions the AI draws and be sure to run its interpretation by an experienced genetics counsellor or service before making any major health decisions.