The Diagnostic Performance of Minimally Invasive Biopsy in Predicting Breast Pathological Complete Response After Neoadjuvant Systemic Therapy in Breast Cancer: A Meta-Analysis

Background: Neoadjuvant systemic therapy (NST) is commonly used in patients with early stage breast cancer before definitive surgery. The standard diagnostic approach for pathologic complete response (pCR) of the breast is breast surgery and pathologic examination. In recent years, several trials investigated the predictive value of image-guided minimally invasive biopsy (MIB) for breast pCR after NST. This study conducted a meta-analysis to evaluate the diagnostic accuracy of MIB. Materials and Methods: We identified relevant research reports in online databases through February 2020. The Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) tool was used to evaluate the quality of included trials. We extracted relevant data and constructed a 2 × 2 contingency table to analyze the predictive accuracy of MIB for breast pCR. Subgroup analyses and meta-regressions were also performed to investigate potential causes of heterogeneity. Results: Nine trials (with 1,030 breast cancer patients) were included in this meta-analysis. The pooled sensitivity and specificity of MIB were 0.72 [95% confidence interval (CI): 0.61–0.81] and 0.99 (95% CI: 0.89–1.00), respectively. By combining relevant data, there were no significant differences in sensitivity or specificity among different molecular subtypes of breast cancer (P > 0.05). Subgroup analyses and meta-regressions implied that trials with responses not limited to clinical complete response (cCR) had a significantly higher accuracy of MIB than those with only cCR (RDOR: 7.65; 95% CI: 1.05–55.46; P = 0.046). Conclusion: Current image-guided MIB methods are not accurate enough in terms of predicting breast pCR after NST. It is of utmost clinical importance to standardize the MIB procedure and incorporate other factors into the evaluation in order to improve the accuracy to an acceptable level.


INTRODUCTION
Neoadjuvant systemic therapy (NST) is used in approximately 30% of patients with early stage breast cancer before definitive surgery (1). Pathologic complete response (pCR) is an ideal response to NST, indicating the absence of residual cancer in a surgical specimen although it has different definitions (2,3). In recent years, with the improvement of neoadjuvant chemotherapy and targeted therapy, the pCR rates of breast cancer have increased dramatically. For triple-negative (TN) and human epidermal growth factor receptor 2-positive (HER2+) subtypes, pCR rates of up to 60-70% can be achieved with the administration of carboplatin regimens and dual HER2 blockage (4,5).
Achievement of pCR after NST is associated with less recurrence and favorable survival of breast cancer, and recent studies show that escalation of adjuvant systemic therapy could have additional benefits for patients with residual disease (non-pCR) (2,3,6,7). Currently, the standard diagnostic approach for pCR of the breast is breast surgery and pathologic examination of the specimen. As one of the main options of breast surgery, breast-conserving surgery after NST is universally performed, the oncologic safety of which has been confirmed by a series of studies (8,9). For patients with no residual cancer in the breast (breast pCR), it is reasonable to consider an omission of breast surgery (10,11). From this point of view, it is of great clinical significance to explore a less invasive method to predict breast pCR after NST.
Some studies focus on the accuracy and reliability of noninvasive imaging methods to predict breast pCR, but the results are far from satisfactory. Neither ultrasound nor mammography is reliable, with false negative rates (FNR) ranging from 9 to 70% (12-14). Similarly, magnetic resonance imaging (MRI) demonstrated an FNR of up to 30-50% in predicting residual breast tumor after NST (15-17). Imaging alone is not accurate enough to replace the pathologic examination of a surgical specimen.
In the present study, a meta-analysis is performed to assess the diagnostic accuracy of MIB in predicting breast pCR after NST. We also perform subgroup analyses and meta-regressions to identify the factors associated with its predictive capability.

Search Strategy and Study Inclusion
This review was conducted according to the guidelines stipulated in Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) for diagnostic test accuracy (28). We searched electronic databases, including PubMed, EMBASE, and the Cochrane Library, and the latest search was performed on February 25, 2020. In addition, we searched conference presentations and abstracts from the San Antonio Breast Cancer Symposium (SABCS), American Society of Clinical Oncology (ASCO) meetings, and European Society for Medical Oncology (ESMO) meetings held within the last 15 years. To identify relevant studies, the following terms were employed as queries: "breast cancer," "neoadjuvant," "biopsy," and "pathologic complete response (OR pCR)" as well as the MeSH terms "breast neoplasms" and "neoadjuvant therapy." We included clinical trials using the following criteria: (1) the investigation involved patients with early stage invasive breast cancer who received NST; (2) after completion of NST, the patients received MIB of the breast, e.g., core needle biopsy (CNB), vacuum-assisted biopsy (VAB), or fine-needle aspiration (FNA), followed by standard breast surgery (lumpectomy or mastectomy); (3) the trial reported histopathologic results of both the index test (MIB) and the reference standard (surgery) and provided measures of test accuracy (e.g., sensitivity, specificity, and false negative rate), which allowed construction of a 2 × 2 contingency table with absolute numbers of true positive (TP), true negative (TN), false positive (FP), and false negative (FN) results. Studies meeting the following criteria were excluded: (1) reviews or case reports, (2) studies focusing on non-invasive tests (e.g., imaging examinations), (3) studies focusing on axillary evaluation rather than the breast, and (4) studies without sufficient data even after attempting to contact the corresponding authors.

Data Extraction and Quality Evaluation
Data extraction and quality evaluation of the studies were performed independently by two reviewers (YL and YZ). Discrepancies were resolved by consensus. The following data were extracted: first author, country of origin, update year, study design, sample size, patient characteristics, procedures of MIB, breast pCR rate, test accuracy measures, and complications of biopsy. For investigations with more than one report, data were gathered from the most recent report. Residual tumor was defined as "positive," and the absence of residual tumor was defined as "negative" in both surgical and MIB specimens. For data synthesis, we extracted the raw cell counts for TP, TN, FP, and FN. If these counts were not available, we used the RevMan calculator to derive them from the reported test accuracy measures.
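As an illustration of how cell counts can be recovered when a report gives only accuracy measures, the following minimal sketch back-calculates a 2 × 2 table from sensitivity, specificity, and the numbers of patients with and without residual tumor. It is not the RevMan calculator itself; the function name `counts_from_accuracy` and the example numbers are hypothetical and chosen only for demonstration.

```python
def counts_from_accuracy(sensitivity, specificity, n_residual, n_pcr):
    """Back-calculate 2 x 2 cell counts from reported accuracy measures.

    'Positive' means residual tumor, 'negative' means no residual tumor,
    following the definitions used in this review.
    n_residual: patients with residual tumor in the surgical specimen.
    n_pcr: patients with breast pCR in the surgical specimen.
    """
    tp = round(sensitivity * n_residual)  # residual tumor detected by MIB
    fn = n_residual - tp                  # residual tumor missed by MIB
    tn = round(specificity * n_pcr)       # pCR correctly called negative
    fp = n_pcr - tn                       # pCR wrongly called positive
    return {"TP": tp, "FN": fn, "FP": fp, "TN": tn}


# Hypothetical study (not one of the included trials):
# 50 patients with residual tumor and 40 patients with breast pCR.
cells = counts_from_accuracy(0.80, 0.95, n_residual=50, n_pcr=40)
print(cells)
```

Rounding is needed because reported percentages are usually rounded themselves, so the reconstructed counts should be checked against the sample size given in the report.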
Eligible studies were evaluated for quality using the Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) tool.

Statistical Analysis
Inter-study heterogeneity was evaluated using Cochran's Q statistic and the I² test. Heterogeneity was considered significant if either P < 0.05 or I² > 50%. In the absence of statistically significant heterogeneity, we calculated the pooled effect using a fixed-effects model; however, with significant heterogeneity, we employed the Mantel-Haenszel random effects model.
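The heterogeneity rule described above can be sketched as follows. This is a minimal illustration under stated assumptions: study effects are combined with inverse-variance weights, only the I² > 50% part of the rule is applied (the P value of Q would require a chi-square distribution and is omitted), and the function name `cochran_q_i2` and the input values are hypothetical.

```python
def cochran_q_i2(effects, variances):
    """Return Cochran's Q, its degrees of freedom, and I-squared (%).

    effects: per-study effect estimates (e.g., log diagnostic odds ratios).
    variances: per-study variances of those estimates.
    """
    weights = [1.0 / v for v in variances]  # inverse-variance weights
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I-squared: share of total variation attributable to heterogeneity
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, df, i2


# Hypothetical effects and variances, for demonstration only
q, df, i2 = cochran_q_i2([1.0, 2.0, 3.0], [0.25, 0.25, 0.25])
model = "random-effects" if i2 > 50 else "fixed-effects"
print(f"Q = {q:.2f} (df = {df}), I2 = {i2:.1f}%, model: {model}")
```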
The measures of interest for pooled analyses included sensitivity, specificity, false negative rate (FNR = 1 − sensitivity), negative likelihood ratio (NLR), positive likelihood ratio (PLR), and diagnostic odds ratio (DOR). The sensitivity and specificity of each study were used to plot the summary receiver operating characteristic (SROC) curve, with the area under the curve (AUC) indicating test accuracy.
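For a single study, the measures listed above follow directly from the 2 × 2 table. The sketch below shows the standard definitions; the function name `accuracy_measures`, the example counts, and the 0.5 continuity correction for tables with an empty cell are assumptions made for illustration and are not taken from the methods of this review.

```python
def accuracy_measures(tp, fn, fp, tn):
    """Compute per-study diagnostic accuracy measures from a 2 x 2 table."""
    # Assumed convention: add 0.5 to every cell if any cell is zero,
    # so that the likelihood ratios and the DOR remain defined.
    if 0 in (tp, fn, fp, tn):
        tp, fn, fp, tn = (x + 0.5 for x in (tp, fn, fp, tn))
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "FNR": 1 - sensitivity,                # false negative rate
        "PLR": sensitivity / (1 - specificity),
        "NLR": (1 - sensitivity) / specificity,
        "DOR": (tp * tn) / (fp * fn),          # equals PLR / NLR
    }


# Hypothetical counts, for demonstration only
m = accuracy_measures(tp=40, fn=10, fp=2, tn=38)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```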
For those studies reporting detailed information of specific molecular subtypes of breast cancer, we performed pooled analyses according to different molecular subtypes (TNBC, HER2+, and HR+HER2−). We also conducted subgroup analyses and meta-regressions according to region, study design, tumor response to NST, and sample size in evaluating potential causes of heterogeneity. The relative DOR (RDOR) was used to assess the heterogeneity of subgroups.
The analysis was performed using Stata 12.0 software (Stata Corporation, College Station, TX, USA), RevMan 5.3 software (the Cochrane Information Management System), and Meta-Disc (XI Cochrane Colloquium; Barcelona, Spain). P < 0.05 (two-sided) was considered to be statistically significant.

Characteristics of the Studies
A total of nine clinical trials fulfilled the eligibility criteria. Figure 1 shows the search and selection process used in this study. Altogether, the nine trials described a total of 1,030 patients. Among the trials included, five had published articles with full text, and the other four were reported only as abstracts at annual conferences; five were single-center studies, and four were multicenter studies; eight were prospective, and only one was retrospective; six of them were carried out in Europe, two in America, and one in Asia (Table 1).
With regard to the response to NST of included patients, three of the trials only included patients with clinical complete response (cCR) by imaging and physical examination, and six of them included patients with responses not limited to cCR (plus other responses, e.g., clinical partial response, "cPR") (Table 1).
The procedures of MIB of all of the included studies were guided by imaging methods (ultrasound, mammography, or both). Three of the studies used CNB, three of them used VAB, two used CNB and/or VAB, and one used a combination of FNA and VAB. Standard breast surgeries (lumpectomy or mastectomy) were performed following the biopsy (Table 1).
Histopathological evaluation of both MIB and surgical specimen was performed to see whether residual tumors existed. In all of the nine included trials, breast pCR was defined as the absence of cancer (both invasive and in situ, "ypT0").
The quality assessment of included studies by the QUADAS-2 tool is shown in Figure 2. One study has a high risk of bias for patient selection due to a lack of a clear description of the inclusion criteria. Risks of bias for the index test and reference test were primarily caused by a lack of reported blinding when performing pathologic examinations.

Pooled pCR Rate According to Histopathologic Examinations of Surgical Specimens
The pCR rates confirmed by histopathologic examination of surgical specimens in the nine included studies ranged from 10.0 to 67.5%. A random-effects model was employed because of the significant heterogeneity (P < 0.01, I² = 85.5%). The pooled pCR rate was 49.0% with a 95% CI of 40.0-57.1% (Figure 3).
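To make the idea of random-effects pooling of proportions concrete, the sketch below uses the DerSimonian-Laird estimator on untransformed proportions. This estimator is swapped in here as a common textbook choice and is an assumption; the software and transformation actually used for the pooled pCR rate are not specified at this level of detail. The function name `pool_proportions_dl` and the input counts are hypothetical, and each proportion is assumed to lie strictly between 0 and 1.

```python
import math


def pool_proportions_dl(events, totals):
    """Pool raw proportions with a DerSimonian-Laird random-effects model.

    Returns the pooled proportion, an approximate 95% CI, and tau-squared.
    Assumes at least two studies and 0 < events < totals for each study.
    """
    props = [e / n for e, n in zip(events, totals)]
    variances = [p * (1 - p) / n for p, n in zip(props, totals)]
    w = [1 / v for v in variances]  # fixed-effect weights
    fixed = sum(wi * p for wi, p in zip(w, props)) / sum(w)
    q = sum(wi * (p - fixed) ** 2 for wi, p in zip(w, props))
    df = len(props) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)   # between-study variance
    w_re = [1 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * p for wi, p in zip(w_re, props)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se), tau2


# Hypothetical pCR counts from three studies, for demonstration only
pooled, ci, tau2 = pool_proportions_dl([10, 30, 45], [100, 100, 100])
print(f"pooled = {pooled:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

When the between-study variance is large, the random-effects weights become nearly equal across studies, so small studies influence the pooled rate more than they would under a fixed-effects model.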
The AUC of the SROC provides a global summary of the diagnostic performance of included studies. The AUC was 0.90 (0.87-0.93) as shown in Figure 5. An AUC score of 1.0 indicates a perfect diagnostic performance.

Subgroup Analysis and Meta-Regression
We also performed subgroup analysis and meta-regression according to region, study design, number of centers, response to NST, and sample size. In terms of response to NST, the studies with responses not limited to cCR had a significantly higher accuracy than those with only cCR (RDOR: 7.65; 95% CI, 1.05-55.46; P = 0.046). However, meta-regression analysis showed that other factors had no significant influence on the diagnostic performance of MIB (P > 0.05; Table 3).
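The reported RDOR, its confidence interval, and its P value can be cross-checked against one another on the log scale. The sketch below assumes the interval is a symmetric Wald-type 95% CI for log(RDOR) and uses a normal approximation; both are assumptions, and the original analysis may have used a different reference distribution, so the result should be close to, but not necessarily identical to, the reported P = 0.046. The function name `p_from_ratio_ci` is hypothetical.

```python
import math


def p_from_ratio_ci(ratio, lower, upper, z_crit=1.96):
    """Approximate SE, z statistic, and two-sided P from a ratio and its CI."""
    log_ratio = math.log(ratio)
    # Half-width of the CI on the log scale divided by the critical value
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = log_ratio / se
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|))
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p


# Values reported for response to NST (not limited to cCR vs. only cCR)
se, z, p = p_from_ratio_ci(7.65, 1.05, 55.46)
print(f"SE(log RDOR) = {se:.3f}, z = {z:.2f}, two-sided P = {p:.3f}")
```

The wide interval reflects the small number of trials in each subgroup, which is why the lower bound lies only just above 1.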

Complications of MIB
Only two of the studies described complications after MIB. Basik et al. reported that 7.1% of the patients had postprocedure complications, with hematoma being the most common. In contrast, Kuerer et al. reported a complication rate of 20.0%, with bleeding as the most common complication, followed by hematoma [Table 1; (19,23)].

Publication Bias Assessment
Publication bias was not analyzed because of the limited number of included studies (29).

DISCUSSION
In current clinical practice, breast surgery is an important part of standard multimodal treatment for breast cancer patients after NST, and it is also the primary approach to evaluating the pathologic response to NST. Without pathologic examination of surgical specimens, a definitive pathologic diagnosis cannot be obtained. Breast-conserving surgery, followed by radiotherapy and adjuvant systemic therapy, has been confirmed to be safe in terms of cancer recurrence and overall survival (30). The smaller the tumor size after NST, the less tissue needs to be excised as long as there is no residual tumor in the margins (8). Thus, for patients with breast pCR after NST, complete avoidance of breast surgery is a reasonable option (11,31). Therefore, it is extremely important to explore a non-surgical approach that can replace breast surgery in predicting pCR for breast cancer patients after NST.
Multiple studies evaluated the reliability of imaging approaches, including ultrasound, mammography, and MRI, in predicting breast pCR after NST. However, the accuracy of these techniques was far from satisfactory (12-17). Some early studies reported that using radiotherapy alone (without breast surgery) as the local treatment approach for breast cancer patients with cCR on imaging leads to much higher rates of relapse (31-33). In recent years, there has been increasing interest in investigating the diagnostic performance of minimally invasive approaches.
To evaluate the diagnostic accuracy of image-guided MIB in identifying residual cancer in the breast after NST, the FNR (1 − sensitivity) is a measure of paramount importance. "False negative" means that patients with residual disease are diagnosed as "pCR," which might lead to the incorrect omission of surgery and de-escalation of systemic therapy. In contrast, "false positive" is of less importance because all of the residual tumor might have been removed by MIB, so that tumor is present only in the MIB specimen but not in the surgical specimen. In addition, if the MIB yields a "positive" diagnosis, subsequent breast surgery is mandatory to remove potential residual disease. With regard to the maximum acceptable FNR, which will not translate into significantly worse survival outcomes, most of the included trials adopted 10% from the study design of sentinel lymph node trials (34,35). An FNR <10% was considered acceptable although there is a lack of evidence.
This meta-analysis showed that the pooled sensitivity of image-guided MIB for the diagnosis of residual disease in the breast was 0.72, which means that the FNR was as high as 28%. An FNR of 28% is obviously far from acceptable, which means that a large proportion of patients with residual disease would be diagnosed as "pCR." Thus, breast surgery might be incorrectly omitted. Furthermore, escalation of adjuvant systemic therapy would not be administered to the patients diagnosed as "pCR." Based on these consequences, it is reasonable to believe that missed residual tumor in the breast will eventually lead to insufficient treatment and, in the long run, to more relapses and worse survival outcomes. However, there is a lack of evidence about the recurrence and survival outcomes of breast cancer patients with MIB-confirmed "pCR" who forgo breast surgery accordingly. A clinical trial at MD Anderson is in the accrual phase, and the aim is to evaluate the survival consequence of eliminating breast cancer surgery in patients with VAB-confirmed "pCR" (36).
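The clinical meaning of a pooled sensitivity of 0.72 can be illustrated with simple arithmetic. The sketch below converts the pooled sensitivity and its confidence interval into an FNR and then estimates how many patients with residual disease would be missed in a notional cohort of 100, using the pooled pCR rate of 49.0% as the share without residual disease. Combining two separately pooled estimates in this way is an assumption made for illustration only; the resulting count is not a result reported by the included trials.

```python
# Pooled estimates reported in this meta-analysis
sensitivity = 0.72
sensitivity_ci = (0.61, 0.81)
pcr_rate = 0.49  # pooled breast pCR rate in surgical specimens

# FNR = 1 - sensitivity; the CI bounds swap when subtracted from 1
fnr = 1 - sensitivity
fnr_ci = (1 - sensitivity_ci[1], 1 - sensitivity_ci[0])

# Notional cohort of 100 patients (illustrative assumption)
cohort = 100
with_residual = cohort * (1 - pcr_rate)  # patients with residual disease
missed = with_residual * fnr             # residual disease called "pCR"

print(f"FNR = {fnr:.2f} (95% CI {fnr_ci[0]:.2f} to {fnr_ci[1]:.2f})")
print(f"Per {cohort} patients: {with_residual:.0f} with residual disease, "
      f"about {missed:.0f} missed by MIB")
```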
In terms of molecular subtypes of breast cancer, by combining relevant data, our pooled analysis showed that sensitivities, specificities, FNRs, and DORs did not differ significantly among subtypes. Different molecular subtypes of breast cancer show different patterns of tumor shrinkage in response to NST: TNBC and HER2+ subtypes mostly exhibit concentric shrinkage, and luminal types mostly exhibit "honeycomb-like" shrinkage characterized by scattered tiny foci of tumor and diffuse cell loss (37). This heterogeneous "honeycomb-like" pattern may increase the likelihood of sampling error with MIB, leading to low diagnostic accuracy. One of the trials included only TNBC and HER2+ subtypes and obtained an FNR of 5% (23). However, other trials reported contradictory results (19,24). Future studies are needed to clarify the effects of molecular subtype on the accuracy of MIB.
This study also implied that trials with only response of cCR to NST had worse diagnostic accuracy than trials with responses not limited to cCR. Breast tumor evaluated as cCR by imaging methods has no clearly visible lesions. In this circumstance, MIB is always guided by clip markers placed in the tumor location prior to NST. Nevertheless, the location of clip markers is not necessarily where the tumor used to be, especially as the local tissue has changed dramatically after NST. As a result, this guidance method inevitably increases the likelihood of sampling error. For the tumor that is visible on the imaging or even palpable, it is more likely that the biopsy catches the tumor tissue. The uncertainty of clip-guided biopsy may partly explain the finding of this study, and it is helpful for further discussing the most appropriate patient group to have MIB.
It is important to note that the MIB procedures of the included trials varied considerably, which is one of the main sources of heterogeneity. Many aspects of the MIB procedure can determine the diagnostic performance, including guiding imaging methods (ultrasound, mammography, and MRI), biopsy apparatus (CNB or VAB), size of needle, number of cores, and number and location of clip markers. Lee et al. reported that if more than five samples were obtained, the FNR would be around 10% (27). Heil et al. reported that using a 7G needle dramatically increases the accuracy of VAB (18). However, the included trials did not provide enough data for a pooled subgroup analysis in terms of the MIB procedure. It is of utmost importance for future studies to propose a standardized MIB procedure that can achieve the highest predictive accuracy.
Other factors have been reported to be associated with pCR rates after NST for breast cancer. These factors include tumor-infiltrating lymphocytes and several specific biomarkers (38-40). Moreover, prior studies have already confirmed the predictive value of some clinical/pathological characteristics, including molecular subtypes, imaging response, and chemotherapy regimens (16,17,41). In the exploratory analysis of one of the included trials, Kuerer et al. reported that combining imaging appearances and MIB results can achieve an even higher predictive accuracy for pCR (23). As more research data accumulate, it may become possible to incorporate multiple potentially relevant factors into a predictive model, which would dramatically improve the accuracy of confirming residual tumors after NST. In the construction of a predictive model with multiple parameters, artificial intelligence can play an important role (42-44). Only when a reliable non-surgical tool to rule out residual disease has been developed can omission of breast surgery be clinically feasible and safe.
The main benefits of omitting breast surgery are the avoidance of potential complications, a better aesthetic appearance, and a higher quality of life. Considering the currently unacceptable accuracy of the MIB method and the high possibility of missing residual disease, the benefit of surgery omission should not be achieved at the cost of oncologic safety. Furthermore, with the development of oncoplastic surgery, surgical complications and aesthetic deficits have decreased significantly (45,46). Breast surgery cannot be replaced in excellent responders to NST until the accuracy of MIB has improved to a reliable level.
There are some limitations to the present meta-analysis. First, the participants of included trials had various clinical and pathological characteristics, and these different characteristics, including age, molecular subtype, response to NST, and chemotherapy regimens, will inevitably contribute to heterogeneity. We performed meta-regressions to investigate the effects of some of these factors on pCR prediction. Second, our pooled analysis included trials with diverse MIB procedures, thereby causing bias. However, as mentioned above, there was not enough information about biopsy procedures from the included trials to make further subgroup analyses. Third, we extracted data from conference abstracts of some of the included trials, and thus, the information was not complete and may increase the difficulty of data extraction and quality assessment.
In conclusion, current image-guided MIB methods are not accurate enough in terms of predicting breast pCR after NST for breast cancer patients. The predictive accuracy is not significantly different among different molecular subtypes of breast cancer. Including only patients with cCR to NST may lead to worse predictive accuracy. It is of utmost clinical importance to standardize the MIB procedure and incorporate other factors into the evaluation in order to reduce the FNR to an acceptable level.

DATA AVAILABILITY STATEMENT
All datasets presented in this study are included in the article/supplementary files.

AUTHOR CONTRIBUTIONS
YLi, YZ, FM, and QS: conception and design. YLi and QS: administrative support. YLi, YZ, FM, YLin, XZ, and SS: data analysis and interpretation. All of the authors: manuscript writing and final approval of manuscript.