Enhancing somatic variant calling and execution speed #22

Closed
@sloth-eat-pudding

Description

Hello,

I am working with HCC1395 data, with the tumor sample at 75x coverage and the normal sample at 45x coverage. I ran Clair3 on normal.bam to generate normal.vcf, used that VCF to phase and haplotag tumor.bam, and then ran the somatic mutation caller. Compared with phasing and haplotagging from the ClairS germline.vcf, false positives dropped from 15001 to 11593 while recall stayed nearly unchanged.

| phase and haplotag | Precision | Recall | F1-score | TP | FP | FN |
| --- | --- | --- | --- | --- | --- | --- |
| ClairS germline.vcf | 67.12% | 77.64% | 72.00% | 30626 | 15001 | 8821 |
| Clair3 normal.vcf | 72.50% | 77.46% | 74.90% | 30556 | 11593 | 8891 |
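For anyone who wants to check the table, here is a minimal sketch that recomputes precision, recall, and F1-score from the TP/FP/FN counts above. The function name `prf` is only an illustrative helper, not part of ClairS or Clair3.

```python
def prf(tp, fp, fn):
    """Return (precision, recall, f1) as fractions from raw counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts copied from the table above: (TP, FP, FN)
rows = {
    "ClairS germline.vcf": (30626, 15001, 8821),
    "Clair3 normal.vcf": (30556, 11593, 8891),
}

for name, counts in rows.items():
    p, r, f = prf(*counts)
    print(f"{name}: precision={p:.2%} recall={r:.2%} f1={f:.2%}")
```

The gain in F1-score comes almost entirely from precision, since TP changes by only 70 calls while FP changes by 3408.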

In one case where a false positive became a true negative, the variant was heterozygous in the normal sample but homozygous in the tumor sample. This suggests a loss of heterozygosity (LOH) event, so phasing and tagging most tumor reads into the same haplotype appears to be the correct behavior. Have you considered this method?
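To make the pattern concrete, here is a rough sketch of the genotype check I have in mind. It assumes diploid genotype calls are already available as allele pairs for both samples; `is_loh_candidate` is a hypothetical helper, and a real check would also need to account for tumor purity and allele fractions.

```python
def is_loh_candidate(normal_gt, tumor_gt):
    """Flag a site as a possible LOH event.

    Each genotype is a pair of alleles, e.g. ("A", "G").
    The pattern is: heterozygous in normal, homozygous in tumor,
    with the retained tumor allele being one of the normal alleles.
    """
    normal_is_het = normal_gt[0] != normal_gt[1]
    tumor_is_hom = tumor_gt[0] == tumor_gt[1]
    allele_retained = tumor_gt[0] in normal_gt
    return normal_is_het and tumor_is_hom and allele_retained


# Heterozygous A/G in normal, homozygous G/G in tumor: consistent with LOH
print(is_loh_candidate(("A", "G"), ("G", "G")))
# Homozygous in normal: not LOH, whatever the tumor shows
print(is_loh_candidate(("A", "A"), ("G", "G")))
```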

Moreover, I noted in the literature that the primary reason for choosing Longphase for phasing is its speed. We also have a speed advantage in haplotagging. ClairS parallelizes at the chromosome level, and we can introduce a feature to specify a range. Could this reduce the training costs for you? I also ran a haplotag comparison, and the results show no significant difference between whatshap and longphase.

| haplotag | Precision | Recall | F1-score | TP | FP | FN |
| --- | --- | --- | --- | --- | --- | --- |
| whatshap v1.7 | 67.12% | 77.64% | 72.00% | 30626 | 15001 | 8821 |
| longphase v1.3 | 67.27% | 77.62% | 72.07% | 30617 | 14897 | 8830 |
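On the range idea mentioned above, here is a minimal sketch of how chromosomes could be cut into fixed-size regions so that work is distributed at a finer grain than one job per chromosome. It only builds samtools-style `contig:start-end` region strings (1-based, inclusive); `split_regions`, the contig lengths, and the chunk size are illustrative assumptions, and it does not describe any existing option in longphase or ClairS.

```python
def split_regions(contig_lengths, chunk_size):
    """Split each contig into consecutive regions of at most chunk_size bases.

    contig_lengths: dict mapping contig name to length in bases.
    Returns a list of "contig:start-end" strings (1-based, inclusive).
    """
    regions = []
    for contig, length in contig_lengths.items():
        for start in range(1, length + 1, chunk_size):
            end = min(start + chunk_size - 1, length)
            regions.append(f"{contig}:{start}-{end}")
    return regions


# Toy lengths for illustration only, not real chromosome sizes
for region in split_regions({"chr1": 250, "chr2": 100}, chunk_size=100):
    print(region)
```

Each region could then be handed to a separate worker, which would balance load better than whole chromosomes of very different sizes.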
