Hello,
I am working with HCC1395 data, with the tumor sample at 75x coverage and the normal sample at 45x coverage. I ran Clair3 on normal.bam to generate normal.vcf, used that VCF to phase and haplotag tumor.bam, and then ran a somatic mutation caller. Compared with phasing from the ClairS germline.vcf, false positives dropped from 15,001 to 11,593 while recall stayed nearly unchanged:
| VCF used for phasing and haplotagging | Precision | Recall | F1-score | TP | FP | FN |
|---|---|---|---|---|---|---|
| ClairS germline.vcf | 67.12% | 77.64% | 72.00% | 30626 | 15001 | 8821 |
| Clair3 normal.vcf | 72.50% | 77.46% | 74.90% | 30556 | 11593 | 8891 |
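
For anyone who wants to re-derive the percentages in the table above from the raw counts, here is a minimal sketch. The function name `prf` and the dictionary layout are only illustrative; the TP/FP/FN values are copied from the table.

```python
def prf(tp, fp, fn):
    """Return (precision, recall, F1) from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall, written directly in counts.
    f1 = 2 * tp / (2 * tp + fp + fn)
    return precision, recall, f1


# Counts taken from the table above: (TP, FP, FN).
runs = {
    "ClairS germline.vcf": (30626, 15001, 8821),
    "Clair3 normal.vcf": (30556, 11593, 8891),
}

for name, counts in runs.items():
    p, r, f = prf(*counts)
    print(f"{name}: precision={p:.2%} recall={r:.2%} F1={f:.2%}")
```

Both rows share the same truth-set size (TP + FN = 39447), so the precision gain comes almost entirely from the lower FP count.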
In one case where a false positive became a true negative, I observed that the variant was heterozygous in the normal sample but homozygous in the tumor sample. This suggests a loss of heterozygosity (LOH) event, so a strategy that phases and tags most reads into the same haplotype seems correct. Have you considered this method?
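
To make the LOH pattern described above concrete, here is a rough sketch of how such sites could be flagged from variant allele fractions (VAF). This is not how ClairS or Clair3 decides anything; the function name and the thresholds (`het_range`, `hom_cutoff`) are hypothetical and would need tuning for coverage and tumor purity.

```python
def is_loh_candidate(normal_vaf, tumor_vaf, het_range=(0.3, 0.7), hom_cutoff=0.1):
    """Flag a site that looks heterozygous in normal but homozygous in tumor.

    All thresholds are illustrative assumptions, not values from any tool.
    """
    # Heterozygous in normal: alt allele fraction near 0.5.
    normal_het = het_range[0] <= normal_vaf <= het_range[1]
    # Homozygous in tumor: alt allele fraction near 0 (ref retained) or near 1 (alt retained).
    tumor_hom = tumor_vaf <= hom_cutoff or tumor_vaf >= 1 - hom_cutoff
    return normal_het and tumor_hom


print(is_loh_candidate(0.48, 0.97))  # het in normal, hom-alt in tumor
print(is_loh_candidate(0.50, 0.52))  # het in both samples
```

At a site like the first one, nearly all tumor reads carry the same allele, which is consistent with most reads being tagged into a single haplotype.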
Moreover, I noted in the literature that the primary reason for choosing Longphase for phasing is its speed. We still have a speed advantage in haplotagging. ClairS parallelizes at the chromosome level, and we could add a feature to specify a range. Would this reduce the training costs for you? I also ran a haplotagging test, and the results do not show any significant difference:
| Haplotagging tool | Precision | Recall | F1-score | TP | FP | FN |
|---|---|---|---|---|---|---|
| whatshap v1.7 | 67.12% | 77.64% | 72.00% | 30626 | 15001 | 8821 |
| longphase v1.3 | 67.27% | 77.62% | 72.07% | 30617 | 14897 | 8830 |