Accuracy of programs for the determination of HLA alleles from NGS data

The human leukocyte antigen (HLA) genes code for proteins that play a central role in the function of the immune system by presenting peptide antigens to T cells. As HLA genes show extremely high genetic polymorphism, HLA typing on the allele level is demanding and is based on DNA sequencing. Determination of HLA alleles is warranted as many HLA alleles are major genetic factors that confer susceptibility to autoimmune diseases and is important for the matching of HLA alleles in transplantation. Here, we compared the accuracy of several published HLA-typing algorithms that are based on next generation sequencing (NGS) data. As genome screens are becoming increasingly routine in research, we wanted to test how well HLA alleles can be deduced from genome screens not designed for HLA typing. The accuracies were assessed using datasets consisting of NGS data produced using the ImmunoSEQ platform, including the full 4 Mbp HLA segment, from 94 stem cell transplantation patients and exome sequences from the 1000 Genomes collection. When used with the default settings none of the methods gave perfect results for all the genes and samples. However, we found that ensemble prediction of the results or modifications of the settings could be used to improve accuracy. Most of the algorithms did not perform very well for the exome-only data. The results indicate that the use of these algorithms for accurate HLA allele determination based on NGS data is not straightforward.


Introduction
Larjo et al. HLA alleles from NGS data present study is the use of the ImmunoSEQ sequencing platform (21), which primarily focuses 103 on identification of rare variants in immunologically relevant regulatory areas and includes the 104 full genomic sequencing of the MHC region. We selected both assembly-based and alignment-105 based methods, which increases methodological diversity. In addition, we used an ensemble 106 approach (12) to test whether accuracy could be increased by combining results from different 107 programs. The classifiers that are part of an ensemble should be diverse enough so that they do 108 not all produce the same erroneous result. Hence, it was essential to select programs that applied 109 different approaches, either assembly-based or alignment-based. We also describe some  The programs can be categorized into read-mapping and assembly-based approaches. Instead of 159 using a single reference, as is usually done for NGS data, the read-mapping methods are often

168
Below, we briefly describe the methods selected for comparison in this study, but refer the reader 169 to the references for further information on the programs.

171
All the methods were installed and used as instructed in their manuals. The exons of each HLA allele are then matched with the assembled contigs and the overall 186 difference (Hamming distance) is calculated as the sum of the differences of each exon in the 187 allele. Only alleles with no more than two mismatches and adequate coverage (20x read 188 coverage, minimum of 85% of exon sequence covered by best-hit contigs, more than 70% of     as "evidence" for the allele being absent. Thus, they contribute a vote for a missing allele, even 299 though such results might also be indicative of various typing problems (related to sequencing or 300 algorithms).

302
Selection of the majority-voted alleles is done by traversing V from the root node towards child 303 nodes, if there is at least one child node whose vote reaches the required threshold (i.e., more 304 than half of all the votes). If more than one child node has the same (highest) number of votes, 305 the child can be selected randomly. When it is not possible to go any deeper (due to going below 306 the voting threshold or being at a leaf node), the allele represented by this node is selected, and 307 the votes of all alleles that contributed to this node are subtracted from V. This selection is done 308 as long as there are nodes that have received more than half of all the possible votes. If there are 309 no nodes with sufficient votes to start with, the result is deemed ambiguous.

311
Depending on the threshold, the voting result can range from no alleles to several alleles.

312
Varying the threshold can be used to control the number of returned alleles. However, it is not 313 always possible to get, for example, just a pair of alleles as the result. This is due to possible ties 314 in voting. In this case, either one or three alleles may be given as the result for any gene. In our 315 comparisons, we used a threshold of 0.5. In the event of more than two alleles per gene, the 316 genes were investigated to see if there were alleles that could be combined at the level of the 317 accuracy of the reference typing. As the voting does not necessarily return only one allele pair Since not all clinical HLA typings were performed to the level of a single allele with a unique 389 amino acid sequence, we next tested whether the concordance rates would change if only those 390 cases with 4-digit allele assignments were included in the analyses. For class I the accuracies 391 were in general slightly lower, except for ATHLATES, which improved its accuracy. The 392 differences seen in class II gene results were negligible (data not shown).

394
The discordant results varied between the different software. However, some samples seemed to 395 be problematic in particular loci as the majority vote gave more than two possible assignments or 396 the consensus could be drawn only at the two-digit level. As an example, there were three 397 samples with a deviant majority vote result and a discordant HLA assignment with Omixon 398 Target ( Table 2). The discordant assignment could not be explained by low coverage or however, has its drawbacks. All methods returning the same allele may lull us into a false sense 475 of security regarding the allele call. It should be noted that even a unanimous result is not 476 necessarily the right one, but can, for example, be caused by problems in sequencing or in the 477 reference allele database. Use of sequencing methods that are based on short sequencing reads 478 results in sequence alignment problems when applied to allele interpretation of the highly 479 homologous HLA alleles and genes. We also noted that some of the problems with accuracy 480 were associated with the properties of the software. A good example is shown in Table 2 in 481 which the updated version of the Omixon program could resolve two of the three discrepancies 482 found using the older version. Furthermore, this example shows the problems related to partial 483 sequences known of some alleles: only exons 2 and 3 sequences of DQB1*03:22 are known. We 484 were also able to show that the ATHLATES program could be adjusted to give much more 485 reliable results by using modifiers as described by Lee et al. (20). We assume that similar 486 modifications could be done to other programs as well.

488
The results presented here for some of the programs were likely worse than the results that might Alternatively, even though the exons might not be fully captured, OptiType is still able to make 515 the most of the data, whereas some other programs had filtering criteria that could not be met 516 when the exons were sequenced only partly. One such program is HLAssign where the cDNA 517 allele sequences need to be fully covered in the alignment.

519
The results of the present study clearly indicate that selecting a program for HLA allele 520 determination based on NGS data that was not designed for HLA typing purposes is not simple 521 This process requires a good understanding of the type of NGS data produced and the HLA 522 frequencies in the study population. Some modifications in the programs, the adoption of an 523 ensemble approach or testing with multiple programs may be needed for accurate performance.