High Prevalence of Lactobacillus crispatus Dominated Vaginal Microbiome Among Kenyan Secondary School Girls: Negative Effects of Poor Quality Menstrual Hygiene Management and Sexual Activity

The vaginal microbiome (VMB) affects numerous health outcomes, but evaluation among adolescents is limited. We characterized the VMB using 16S rRNA gene amplicon sequencing, and assessed its association with bacterial vaginosis (BV) and sexually transmitted infections (STIs; chlamydia, gonorrhea, trichomoniasis), among 436 schoolgirls in Kenya (median age 16.9 years). BV and STI prevalence were 11.2% and 9.9%, respectively, and 17.6% of girls had any reproductive tract infection. Three community state types (CSTs) accounted for 95% of observations: CST-I, L. crispatus-dominant (N=178; BV 0%, STI 2.8%, sexually active 21%); CST-III, L. iners-dominant (N=152; BV 3.3%, STI 9.7%, sexually active 35%); and CST-IV, G. vaginalis-dominant (N=83; BV 51.8%, STI 25.3%, sexually active 43%). In multivariable adjusted analyses, sexually active girls had increased odds of CST-III and CST-IV, and girls who used cloth to manage menses had 1.72-fold increased odds of CST-IV vs. CST-I. The predominance of L. crispatus-dominated VMB, substantially higher than observed in prior studies of young adult and adult women in sub-Saharan Africa, indicates that non-optimal VMB can be an acquired state. Interventions to maintain or reconstitute L. crispatus dominance should be considered even in adolescents.


INTRODUCTION
Globally, adolescent girls and young women account for at least one-third of the 357 million curable sexually transmitted infections (STIs) occurring each year (Blum and Nelson-Mmari, 2004; Newman et al., 2015; Population Reference Bureau, 2021). STIs are syndemic with HIV. UNAIDS reports that 15% of all women living with HIV are aged 15-24 years, 80% of whom live in sub-Saharan Africa, and that 31% of new HIV infections are among adolescent girls (Amornkul et al., 2009). In western Kenya, HSV-2 prevalence increases dramatically, from 10% in 13-14 year-old girls to 28% in 15-19 year-olds (Kenya Ministry of Health National AIDS and STI Control Programme (NASCOP), 2020). HIV prevalence increases from 1.2% among 15-19 year-old girls to 3.4% among those aged 20-24, compared to 0.5% and 0.6% for males of the same ages (Amornkul et al., 2009). Among adolescent girls, the HIV/STI epidemic overlaps with broader reproductive health concerns. For example, to attend school and obtain necessities such as sanitary products, soap, and underwear, girls often engage in exchange sex (Phillips-Howard et al., 2015). Inadequate menstrual hygiene management (MHM) is a pervasive problem across low- and middle-income countries, and a lack of MHM materials negatively affects girls' health and schooling (Sommer et al., 2016). Phillips-Howard et al. conducted a cluster randomized study of 644 girls aged 14-16 years, comparing reusable menstrual cups with a control condition of menstrual hygiene counseling (Phillips-Howard et al., 2016). After one year, menstrual cup use resulted in a 35% reduction (p=0.034) in bacterial vaginosis (BV) prevalence and a 52% reduction (p=0.039) in STI prevalence compared to the control condition. This decrease in STIs and BV may have been mediated by changes in sexual practices or by the menstrual cups themselves.
Bacterial vaginosis affects 20-50% of women in the general population in sub-Saharan Africa (Torrone et al., 2018). It increases the risk of HIV acquisition and transmission (accounting for up to 15% of HIV infections) (Atashili et al., 2008) and of multiple adverse pregnancy outcomes (Eschenbach et al., 1984; Hay et al., 1994; Ralph et al., 1999), and is consistently associated with chlamydia, gonorrhea, HPV, and HSV-2 (van de Wijgert et al., 2014). BV typically represents a vaginal microbiome (VMB) that is highly diverse (i.e., contains many different types of bacteria) and depleted of Lactobacillus species (McKinnon et al., 2019). Using 16S rRNA gene amplicon sequencing, commonly occurring vaginal community state types (CSTs) have been identified (Ravel et al., 2011). CST-I (Lactobacillus crispatus-dominated) has been considered an advantageous state, both because of the demonstrated protective mechanisms of L. crispatus [e.g., maintaining acidic vaginal pH, inhibiting growth of pathogenic bacteria, activating immune cells, and producing antibacterial substances (Lewis et al., 2017; Kovachev, 2018)] and because of the consistent protective association of L. crispatus against BV, HIV, HPV, and other STIs (van de Wijgert et al., 2014; McKinnon et al., 2019). In contrast, CST-IV (a high-diversity vaginal community that is usually depleted of lactobacilli) is considered a non-optimal vaginal microbiome, or "molecular BV" (McKinnon et al., 2019), and has been associated with epithelial barrier disruption and enhanced immune activation, even in the absence of a clinical BV diagnosis (Zevin et al., 2016). The VMB in relation to menstrual hygiene practices and period characteristics (e.g., duration, cramping, flow) has not been rigorously assessed, particularly among adolescent girls.
We are currently evaluating the effect of menstrual cups on the VMB, BV, and STIs among secondary schoolgirls enrolled in a cluster randomized controlled trial in Siaya County, western Kenya. In the current analysis, we characterized the baseline VMB and examined how VMB composition relates to sexual activity, MHM practices, menstrual characteristics, and the presence of BV or STIs.

Study Setting
This study used baseline data and biological specimens from the Cups and Community Health (CaCHe, pronounced "Cash-Ay") study, a prospective cohort study of adolescent secondary school girls in Siaya County. The CaCHe study is nested in Cups or Cash for Girls (CCG), a large cluster randomized controlled trial assessing the impact of menstrual cups and cash transfer interventions on a composite outcome of school dropout, HIV, and HSV-2 […] year-old women and girls was 8.9%, compared to 3.2% among boys and men of the same age (Borgdorff et al., 2018).

Study Design and Participants
The CCG trial is an open-label, 4-arm, school-cluster randomized controlled superiority trial. Schools were allocated into 4 arms (1:1:1:1) via block randomization: (1) provision of menstrual cups with training on safe cup use and care; (2) conditional cash transfer (CCT) based on >80% school attendance in previous term; (3) menstrual cup and CCT; and (4) usual practice. All girls received puberty and hygiene education.
For the CaCHe study, nested within the CCG trial, we aimed to enroll 20% of girls in the cup-only and control arms of the CCG trial. Eligibility for CaCHe followed eligibility for CCG: attendance at a selected school, residence in the study area, provision of assent and parental/guardian consent, and self-reported established menses (more than three menstrual periods). Girls were excluded if they reported being pregnant at baseline.

Data Collection
Following written informed parental consent and assent from minors, participants self-completed a tablet-based survey in their language of choice (English or DhoLuo) to obtain sociodemographic information and to assess sexual and MHM practices. Study nurses and counsellors trained in research and survey administration provided assistance or conducted interviews as needed. Sociodemographic data included age and household amenities, including water source, light source, latrine type, and possession of a television. Household mobile phone possession was 98% and was therefore not used in the analyses. A household amenity score ranging from 0 to 4 was created, with one point each for piped water source, electricity as light source, flush toilet, and possession of a television. Ever having had sexual intercourse was assessed with two questions that differentiated forced from willing sex, and with a series of questions on exchange sex (sex in exchange for money, goods, or favors).
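The household amenity score described above is a simple sum of four binary indicators. A minimal Python sketch follows. The field names and category labels are hypothetical, because the survey's actual response codes are not reported here.

```python
from dataclasses import dataclass


@dataclass
class Household:
    # Hypothetical fields and category labels; the survey's actual codes are not given.
    water_source: str   # e.g. "piped", "surface", "borehole"
    light_source: str   # e.g. "electricity", "kerosene", "solar"
    latrine_type: str   # e.g. "flush", "traditional pit"
    has_television: bool


def amenity_score(h: Household) -> int:
    """Household amenity score (range 0-4): one point per amenity present.

    Mobile phone ownership is deliberately excluded: nearly all households
    (98%) had one, so it would not discriminate between households.
    """
    points = [
        h.water_source == "piped",
        h.light_source == "electricity",
        h.latrine_type == "flush",
        h.has_television,
    ]
    return sum(int(p) for p in points)


# Surface water, kerosene lighting, a pit latrine and no television score 0.
print(amenity_score(Household("surface", "kerosene", "traditional pit", False)))  # prints 0
```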

Sample Size
CaCHe was designed to estimate the effect of menstrual cups on girls' risk of BV, with an anticipated cumulative event rate of 30-40% among controls over 30 months. In a design of 6 repeated measurements with an AR(1) covariance structure, correlation between observations on the same subject ranging from 0.25 to 0.4, and 20% loss to follow-up, group sample sizes of 220 in the cup arm and 220 in the control arm would achieve >80% power to detect a 25% reduction in BV prevalence in the cup arm compared to the control arm when BV prevalence is 30%, and 97% power when prevalence is 40% [α=0.05, two-sided test; two proportions in a repeated measures design; PASS v15 (Hintze, 2014)].
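The logic of this calculation can be approximated by hand. The Python sketch below is not the PASS procedure the study used. It is a cruder normal approximation under stated assumptions: each girl's six AR(1)-correlated measurements are treated as worth fewer "effective" independent observations, and a two-proportion z-test is then applied. Function and parameter names are hypothetical. The results should land near the reported power but are not expected to reproduce it exactly.

```python
import math


def ar1_design_effect(n_obs: int, rho: float) -> float:
    """Variance inflation for the mean of n_obs measurements with AR(1)
    correlation (corr between visits j and j+k is rho**k), relative to
    n_obs independent measurements."""
    lag_sum = sum((n_obs - k) * rho**k for k in range(1, n_obs))
    return 1.0 + 2.0 * lag_sum / n_obs


def normal_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def approx_power(p_control: float, reduction: float, n_per_arm: int,
                 n_obs: int = 6, rho: float = 0.4, loss: float = 0.20,
                 z_crit: float = 1.959964) -> float:
    """Approximate power of a two-sided test (alpha=0.05) comparing BV
    prevalence between arms; the negligible opposite tail is ignored."""
    p_cup = p_control * (1.0 - reduction)
    retained = n_per_arm * (1.0 - loss)               # girls remaining after loss to follow-up
    n_eff = retained * n_obs / ar1_design_effect(n_obs, rho)
    se = math.sqrt((p_control * (1 - p_control) + p_cup * (1 - p_cup)) / n_eff)
    return normal_cdf(abs(p_control - p_cup) / se - z_crit)


for p0 in (0.30, 0.40):
    for rho in (0.25, 0.40):
        power = approx_power(p0, 0.25, 220, rho=rho)
        print(f"prevalence={p0:.2f} rho={rho:.2f} power={power:.2f}")
```

With 30% prevalence and ρ=0.4, this approximation gives roughly 0.80. Power rises with higher control-arm prevalence or lower within-girl correlation, which matches the direction of the PASS figures above.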

Specimen Collection
At baseline and each follow-up visit, girls were asked to self-collect four vaginal swabs. The first swab was for 16S rRNA gene amplicon sequencing (microbiome), the second for BV, the third for detection of C. trachomatis (CT) and N. gonorrhoeae (NG), and the fourth for detection of T. vaginalis (TV). Before swab collection, girls were given verbal and pictorial instructions on how to collect the swabs. Girls were instructed to insert each swab approximately 2-3 centimeters into the vaginal opening and to twirl it for 20 seconds. Each girl collected her swabs in a private, enclosed area, with a nurse or female field assistant assisting her one on one. The nurse or research assistant handed the swabs to the girl sequentially, timed each 20-second collection while the girl twirled the swab, and retrieved each swab before passing the next. Nurses and research assistants prepared smears for BV immediately, and a lab assistant checked each slide for sufficiency after air drying. Swabs for amplicon sequencing, CT/NG, and TV were immediately placed on ice packs in coolers for transport.

Detection of Bacterial Vaginosis, Sexually Transmitted Infections, and HIV
Vaginal swabs for amplicon sequencing were collected using OMNIgene Vaginal kits (OMR-130; DNA Genotek™). Upon receipt at the lab, these specimens were stored at -80°C until shipment to Chicago for processing. Swabs for CT/NG were shipped weekly for processing at the University of Nairobi Institute for Tropical and Infectious Diseases (UNITID). Following the manufacturer's protocol, vaginal swabs were tested for CT/NG using the GeneXpert (Cepheid, Sunnyvale, California, US). Swabs for TV were processed immediately upon receipt using the OSOM TV antigen detection assay (Sekisui, Lexington, MA, US). Air-dried smears prepared from self-collected vaginal swabs were Gram stained and evaluated according to Nugent's criteria within 48 hours of receipt; a score of 7-10 was defined as BV (Nugent et al., 1991). Finger-stick whole blood collected in EDTA tubes was tested for HIV according to Kenyan national guidelines (National AIDS and STI Control Programme, 2015). HIV-positive girls were linked to care.
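Nugent scoring combines semi-quantitative counts of three bacterial morphotypes on the Gram-stained smear into a 0-10 score. The Python sketch below encodes the standard Nugent (1991) point table and the cut-offs described above (7-10 defined as BV). The function names are hypothetical, and the grades are assumed to have already been read by a microscopist.

```python
# Semi-quantitative morphotype grades per oil-immersion field (Nugent et al., 1991):
# 0 = none, 1 = <1, 2 = 1-4, 3 = 5-30, 4 = >30 organisms per field.
LACTOBACILLUS_POINTS = {4: 0, 3: 1, 2: 2, 1: 3, 0: 4}  # fewer lactobacilli -> more points
GARDNERELLA_POINTS = {0: 0, 1: 1, 2: 2, 3: 3, 4: 4}    # Gardnerella/Bacteroides morphotypes
MOBILUNCUS_POINTS = {0: 0, 1: 1, 2: 1, 3: 2, 4: 2}     # curved Gram-variable rods


def nugent_score(lacto: int, gardnerella: int, mobiluncus: int) -> int:
    """Total Nugent score (0-10) from the three morphotype grades."""
    for grade in (lacto, gardnerella, mobiluncus):
        if grade not in range(5):
            raise ValueError(f"grade must be 0-4, got {grade}")
    return (LACTOBACILLUS_POINTS[lacto]
            + GARDNERELLA_POINTS[gardnerella]
            + MOBILUNCUS_POINTS[mobiluncus])


def nugent_category(score: int) -> str:
    """0-3 normal, 4-6 intermediate, 7-10 BV (the study's BV definition)."""
    if not 0 <= score <= 10:
        raise ValueError(f"score must be 0-10, got {score}")
    if score >= 7:
        return "BV"
    if score >= 4:
        return "intermediate"
    return "normal"


# Few lactobacilli, abundant Gardnerella-type cells, some curved rods:
s = nugent_score(lacto=1, gardnerella=4, mobiluncus=2)
print(s, nugent_category(s))  # prints: 8 BV
```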

STI and BV Treatment
CT, NG, and TV were treated following Kenyan national guidelines (National AIDS and STI Control Programme, 2015). BV was treated with 2 g of tinidazole once daily for two days. Although this regimen is not specified in the Kenyan national guidelines, it is an alternative treatment recommended by the U.S. Centers for Disease Control and Prevention (CDC, 2015), the British Association for Sexual Health and HIV (BASHH, 2021), and the International Union against Sexually Transmitted Infections/World Health Organization (IUSTI/WHO) (Sherrard et al., 2018). We chose it because of concerns that the longer-duration metronidazole regimens carry a greater likelihood of gastrointestinal symptoms and lower adherence. Although guidelines currently do not recommend treatment for asymptomatic BV, we treated all girls with a Nugent score of 7-10. This decision reflected variability in the recognition and reporting of symptoms (Lewis and Laurent, 2020), potential benefits reported in the BASHH and IUSTI/WHO guidelines (Sherrard et al., 2018; BASHH, 2021), and the high proportion of girls reporting vaginal discharge overall (23%), which did not differ by BV or STI status (Table 1). Treatment was documented for 48/49 (98%) girls with BV, 27/27 (100%) with CT, 6/6 (100%) with NG, and 13/14 (93%) with TV.

DNA Extraction and Amplicon Sequencing and Annotation
Genomic DNA (gDNA) was used as the template for PCR amplification of the V3-V4 variable region of the bacterial 16S rRNA gene, according to a two-stage PCR protocol using primers 341F and 806R, as described previously (Naqib et al., 2018; Mehta et al., 2020). After pooling of barcoded samples, amplicons were sequenced on an Illumina MiSeq instrument using V3 chemistry (600 cycles). DNA extraction, library preparation, and sequencing were performed by the Genome Research Core (GRC) at the University of Illinois at Chicago (UIC). Forward and reverse reads were merged using the software package PEAR (Zhang et al., 2014). Quality- and primer-trimmed sequence data were then processed using a standard bioinformatics pipeline for chimera removal, and taxonomic annotation was conducted by the University of Maryland Institute for Genome Sciences (UMD IGS) (Holm et al., 2019). Subsequently, a biological observation matrix was generated at the lowest identifiable taxonomic level. Vaginal CSTs were assigned using VALENCIA (VAginaL community state typE Nearest CentroId clAssifier), which classifies each sample by its nearest centroid in a reference dataset (France et al., 2020). Data were filtered to retain taxa contributing at least 0.05% of total sequence reads, resulting in retention of 26 vaginal taxa. Five observations with <5,000 sequence reads were excluded from analyses.
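VALENCIA's core step is nearest-centroid assignment. It scores a sample's relative-abundance profile against reference centroids using the Yue-Clayton similarity index, then assigns the most similar centroid. The NumPy sketch below illustrates that idea alongside the read-based filtering described above. The toy centroids are made up for illustration and are not the VALENCIA reference centroids. The function names are hypothetical, and the order of the two filtering steps is an assumption. Real use would require the published reference centroids, with taxon names harmonized to them.

```python
import numpy as np


def filter_counts(counts, taxa, min_taxon_frac=0.0005, min_reads=5000):
    """Keep taxa contributing >= 0.05% of all reads, then drop samples with
    fewer than min_reads reads. Returns (filtered counts, kept taxa, sample mask)."""
    counts = np.asarray(counts, dtype=float)
    taxon_frac = counts.sum(axis=0) / counts.sum()
    keep_taxa = taxon_frac >= min_taxon_frac
    counts = counts[:, keep_taxa]
    keep_samples = counts.sum(axis=1) >= min_reads
    kept_names = [t for t, keep in zip(taxa, keep_taxa) if keep]
    return counts[keep_samples], kept_names, keep_samples


def yue_clayton(p, q):
    """Yue-Clayton similarity between two relative-abundance vectors (1 = identical)."""
    pq = float(np.dot(p, q))
    return pq / (float(np.dot(p, p)) + float(np.dot(q, q)) - pq)


def assign_cst(sample_counts, centroids):
    """Assign a sample to the centroid with the highest Yue-Clayton similarity."""
    p = np.asarray(sample_counts, dtype=float)
    p = p / p.sum()
    scores = {name: yue_clayton(p, np.asarray(c, dtype=float))
              for name, c in centroids.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Toy centroids over [L. crispatus, L. iners, G. vaginalis, other]; these are
# made-up illustrative values, NOT the VALENCIA reference centroids.
TOY_CENTROIDS = {
    "CST-I": [0.90, 0.05, 0.02, 0.03],
    "CST-III": [0.05, 0.85, 0.05, 0.05],
    "CST-IV": [0.02, 0.08, 0.45, 0.45],
}
print(assign_cst([900, 40, 10, 50], TOY_CENTROIDS))  # assigned to CST-I
```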

Statistical Analysis
In this cross-sectional analysis, we examined two questions: (1) how baseline VMB composition varied by sexual activity and by BV and/or STI presence; and (2) how baseline VMB composition varied by menstrual management practices and period characteristics. Stacked bar plots summarizing the taxa with highest relative abundance were created using Stata/SE 15. Alpha diversity indices were calculated at the amplicon sequence variant level using filtered data after rarefaction to a depth of 5,000 sequences per sample (R package vegan) (Gihring et al., 2012). We tested for global differences in vaginal community composition by BV and non-ulcerative STI status using analysis of similarity (ANOSIM) of the Bray Curtis resemblance matrix. ANOSIM is a nonparametric test that assesses whether observations within a group are more similar to each other than to observations in other groups, thereby detecting differences between groups (Clarke, 1993). We visualized the relationship of global bacterial communities by BV and STI status using non-metric multidimensional scaling (NMDS) of bootstrapped averages of centroids. We used 100 replicates for each of the four outcome groups (negative for both BV and STI, positive for STI only, positive for BV only, positive for both BV and STI). Bootstrapping, a resampling procedure, was used to estimate standard errors that allowed statistical inference on differences between groups (Paliy and Shankar, 2016). ANOSIM and NMDS procedures were conducted in Primer-E, version 7 (United Kingdom). We used multinomial logistic regression to quantify associations between explanatory factors (e.g., age, material used to manage menses, sexual activity) and CST. We used Poisson regression with a robust variance estimate (Barros and Hirakata, 2003) to quantify associations between explanatory factors and BV or STI.
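The diversity and ANOSIM steps above were run in vegan and Primer-E. As a rough illustration of what those procedures compute, the following NumPy-only sketch implements four steps: rarefaction, the four alpha diversity metrics, Bray-Curtis dissimilarity, and the ANOSIM R statistic with a permutation p-value. It is a simplified stand-in under stated assumptions, and it will not reproduce Primer-E output exactly:

- The function names are hypothetical.
- Simpson is reported as 1 - sum(p^2), the form vegan's `diversity(index = "simpson")` returns.
- Evenness is taken as Pielou's J.
- No data transformation is applied before Bray-Curtis.

```python
import numpy as np


def rarefy(counts, depth=5000, seed=0):
    """Subsample each sample's reads without replacement to a common depth."""
    rng = np.random.default_rng(seed)
    return np.array([rng.multivariate_hypergeometric(row, depth)
                     for row in np.asarray(counts, dtype=np.int64)])


def alpha_diversity(row):
    """Shannon (natural log), Gini-Simpson (1 - sum p^2), richness, Pielou evenness."""
    row = np.asarray(row, dtype=float)
    p = row[row > 0] / row.sum()
    shannon = float(-np.sum(p * np.log(p)))
    richness = int(p.size)
    return {
        "shannon": shannon,
        "simpson": float(1.0 - np.sum(p**2)),
        "richness": richness,
        "evenness": shannon / np.log(richness) if richness > 1 else 0.0,
    }


def bray_curtis_matrix(X):
    """Pairwise Bray-Curtis dissimilarity: sum|x - y| / sum(x + y)."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = np.abs(X[i] - X[j]).sum() / (X[i] + X[j]).sum()
    return D


def _average_ranks(x):
    """Ranks starting at 1, with ties given their average rank."""
    order = np.argsort(x, kind="mergesort")
    xs = x[order]
    ranks = np.empty(len(x))
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and xs[j + 1] == xs[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def anosim(D, groups, permutations=999, seed=0):
    """ANOSIM R = (mean between-group rank - mean within-group rank) / (M / 2),
    with a permutation p-value from shuffled group labels."""
    groups = np.asarray(groups)
    iu = np.triu_indices(len(groups), k=1)
    ranks = _average_ranks(D[iu])
    M = len(ranks)

    def r_stat(g):
        within = g[iu[0]] == g[iu[1]]
        return (ranks[~within].mean() - ranks[within].mean()) / (M / 2.0)

    r_obs = r_stat(groups)
    rng = np.random.default_rng(seed)
    null = np.array([r_stat(rng.permutation(groups)) for _ in range(permutations)])
    p_value = (np.sum(null >= r_obs) + 1) / (permutations + 1)
    return r_obs, p_value


# Toy data: five Lactobacillus-dominated and five diverse communities.
X = np.array([[95, 3, 2], [90, 6, 4], [92, 5, 3], [97, 2, 1], [88, 7, 5],
              [10, 45, 45], [15, 40, 45], [5, 50, 45], [12, 38, 50], [8, 47, 45]])
labels = np.array(["A"] * 5 + ["B"] * 5)
print(alpha_diversity(X[0]), alpha_diversity(X[5]))
print(anosim(bray_curtis_matrix(X), labels))
```

In the toy data, every between-group dissimilarity exceeds every within-group one, so R is 1 and the permutation p-value is small.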
Because school was the unit of randomization and the distribution of sociodemographics, sexual activity, BV, and STIs differed by school (Supplementary Table 1), we included a random effect for school in models of CST, BV, and STI. Multinomial logistic regression and Poisson regression were conducted in Stata/SE 15. Explanatory variables associated with outcomes at p<0.10 were entered into multivariable regression, and those with a Wald p-value <0.05 were retained in multivariable models.
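Poisson regression with a robust (sandwich) variance, sometimes called modified Poisson regression, estimates prevalence ratios for a binary outcome such as BV. The NumPy sketch below shows the core computation: a log-link Poisson fit by Newton-Raphson, followed by the sandwich covariance A^-1 B A^-1. It is a simplification: it omits the school random effect used in the study's models and applies no small-sample correction. The function name and example data are hypothetical.

```python
import numpy as np


def modified_poisson(X, y, max_iter=50, tol=1e-10):
    """Log-link Poisson fit with a robust (HC0 sandwich) covariance.

    For a 0/1 outcome, exp(coefficient) is a prevalence ratio.
    Returns (coefficients, robust standard errors); index 0 is the intercept.
    """
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())                    # start at the overall prevalence
    for _ in range(max_iter):
        mu = np.exp(X @ beta)
        info = X.T @ (X * mu[:, None])            # Fisher information X' diag(mu) X
        step = np.linalg.solve(info, X.T @ (y - mu))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    mu = np.exp(X @ beta)
    bread = np.linalg.inv(X.T @ (X * mu[:, None]))
    meat = X.T @ (X * ((y - mu) ** 2)[:, None])   # empirical variance of the scores
    cov = bread @ meat @ bread
    return beta, np.sqrt(np.diag(cov))


# Hypothetical example: 12/40 exposed vs 9/60 unexposed girls with the outcome.
exposed = np.r_[np.ones(40), np.zeros(60)]
outcome = np.r_[np.ones(12), np.zeros(28), np.ones(9), np.zeros(51)]
beta, se = modified_poisson(exposed, outcome)
pr = np.exp(beta[1])                              # 0.30 / 0.15 = 2.00
ci = np.exp(beta[1] + np.array([-1.96, 1.96]) * se[1])
print(f"PR = {pr:.2f}, 95% CI {ci[0]:.2f}-{ci[1]:.2f}")
```

For a single binary exposure, the fitted prevalence ratio equals the ratio of the two observed proportions. The robust standard error matches the familiar delta-method standard error of a log risk ratio, which is a convenient sanity check.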
To identify specific taxa associated with Nugent BV and STIs, we used stability selection for feature selection [stabs package, implemented in R (Meinshausen and Bühlmann, 2010)]. In this approach, we applied ElasticNet regression to 250 randomly generated subsets of the vaginal microbiome data and used a cutoff of p<0.20 in combination with selection of a given taxon in at least 60% of subsets. We chose ElasticNet regression because its combination of lasso (L1) and ridge (L2) penalties maintains sparsity while allowing highly correlated variables to be included (Zou and Hastie, 2005). Prior to feature selection, data were center log-ratio transformed following geometric Bayesian multiplicative imputation of zeros [zCompositions package, implemented in R (Palarea-Albaladejo and Martin-Fernandez, 2015)], to address sparsity while maintaining read depth. As a supplementary analysis, we also identified taxa that differed by BV and STI status using similarity percentage analysis (SIMPER) (Clarke, 1993), which determines the percent contribution of individual taxa to the Bray Curtis dissimilarity between groups (Primer-E, version 7, United Kingdom).
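Two of the compositional steps above can be sketched in a few lines of NumPy:

- **Zero replacement.** For brevity, this sketch uses a simple multiplicative replacement with a fixed pseudo-count. That is an assumption, not the geometric Bayesian multiplicative method in zCompositions.
- **CLR transform.** The centered log-ratio (CLR) transform subtracts each sample's mean log abundance.
- **SIMPER.** The SIMPER-style function averages each taxon's share of the Bray-Curtis dissimilarity over all between-group sample pairs, without the additional summary statistics that Primer-E reports.

Function names and the toy counts are hypothetical.

```python
import numpy as np


def multiplicative_replacement(counts, pseudo_reads=0.5):
    """Replace zeros with a small proportion and shrink non-zero parts so each
    row still sums to 1 (simple multiplicative replacement, fixed pseudo-count)."""
    X = np.asarray(counts, dtype=float)
    totals = X.sum(axis=1, keepdims=True)
    P = X / totals
    delta = pseudo_reads / totals                 # per-sample replacement proportion
    zeros = P == 0
    n_zero = zeros.sum(axis=1, keepdims=True)
    return np.where(zeros, delta, P * (1.0 - n_zero * delta))


def clr(P):
    """Centered log-ratio: log abundance minus the sample's mean log abundance."""
    logP = np.log(P)
    return logP - logP.mean(axis=1, keepdims=True)


def simper(X, groups, a, b):
    """Mean contribution of each taxon to Bray-Curtis dissimilarity between
    samples of group a and group b, plus percentage contributions."""
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    A, B = X[groups == a], X[groups == b]
    contrib = np.zeros(X.shape[1])
    for x in A:
        for y in B:
            contrib += np.abs(x - y) / (x + y).sum()
    contrib /= len(A) * len(B)                    # sums to the mean between-group dissimilarity
    return contrib, 100.0 * contrib / contrib.sum()


counts = np.array([[900, 80, 0, 20], [850, 100, 50, 0],
                   [30, 400, 500, 70], [10, 350, 600, 40]])
Z = clr(multiplicative_replacement(counts))
rel = counts / counts.sum(axis=1, keepdims=True)
contrib, pct = simper(rel, ["neg", "neg", "pos", "pos"], "neg", "pos")
print(np.round(Z, 2))
print(np.round(pct, 1))
```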

Study Population
The median age of girls was 16.9 years (interquartile range 16.0-17.9) (Table 1). The median household amenities score (a summed score of flush toilet, piped water, electricity, and television) was zero. Girls commonly reported a traditional pit latrine (45.7%), surface water as the main water source (59.3%), and kerosene for lighting (39.7%), and 24.1% had a television. Many girls reported having visited a health facility in the past 6 months. Twenty percent (n=85) reported antibiotic use in the past 30 days, primarily for fever (n=64) and generally in combination with other symptoms (such as respiratory symptoms or diarrhea). Nearly one-third (30.2%) of girls reported any prior sexual intercourse, and of those, 54% reported having been forced or tricked into sex. Among sexually active girls, just 8.5% reported using hormonal contraception for family planning (n=6 injectable, n=4 implant, n=1 pill), and these small numbers precluded evaluating associations with BV, STIs, or CST.
The global difference in bacterial community composition by BV and STI outcome was statistically significant (ANOSIM, p=0.001; Supplementary Table 2). All pairwise comparisons were statistically significant (p=0.001 each) except for communities from participants positive for both BV and STI versus those positive for BV only (p=0.871). This difference in VMB composition is visualized in non-metric multidimensional scaling plots of the bootstrapped averages of the centroids of the four possible outcome states (Figure 3A). The distribution of CSTs differed by BV and/or STI outcome (Figure 3B). Stability selection identified specific taxa that differed between BV and STI outcomes (Table 3 and Figure 3C):
- BV: L. jensenii, Dialister succinatiphilus, Sneathia sanguinegens, Megasphaera, and Lactobacillus spp. (identified at the genus level, with species not identified).
- STI: Megasphaera, Atopobium vaginae, S. sanguinegens, and L. crispatus.

In supplementary analysis, the taxa contributing most to Bray Curtis dissimilarity by BV status (Supplementary Table 3) and by STI status (Supplementary Table 4) were similar. There were, however, notable differences from the stability selection results. G. vaginalis contributed strongly to both BV-positive and STI-positive status. ElasticNet identified L. jensenii but not L. crispatus in association with BV, whereas L. crispatus was the top differentiating taxon by Bray Curtis dissimilarity and L. jensenii was not identified as an important taxon. Girls with BV and girls with STI had higher alpha diversity metrics (Shannon, Simpson, evenness, richness) (Figures 4, 5), consistent with the greater frequency of diverse CST-IV among girls with BV and STIs.

The Distribution of Community State Type Varied by Sociodemographic and Behavioral Characteristics
Household amenities scores were higher for girls with CST-II and CST-V (p=0.041), though numbers in these CSTs are small (Table 2). Median BMI was marginally higher for girls with CST-IV (p=0.094), consistent with the association we observed between BMI and BV. There were no statistically significant differences in MHM or period characteristics by CST. However, cloth use was more common in CST-III (29.1%) and CST-IV (31.3%) than in CST-I (19.3%) (p=0.143), and when the analysis was restricted to these three CSTs, the difference was statistically significant (p=0.050). Notably, any sexual activity (willing or forced) was reported by 43.8% of girls with CST-IV, compared with 21% of girls with CST-I and 35.1% of girls with CST-III. Viewed the other way, among girls reporting never having been sexually active, 46.6% had CST-I, 32.9% CST-III, and 15.1% CST-IV, while among girls reporting having been sexually active, 28.7% had CST-I, 41.1% CST-III, and 27.1% CST-IV (Figure 2).

Prevalence of Bacterial Vaginosis and Sexually Transmitted Infections
The prevalence of STIs was 9.9% (3.0% TV, 6.2% CT, 1.4% NG), and the prevalence of BV was 11.2%. Coinfection was substantial: 31% of girls with BV had an STI, and 35% of girls with an STI also had BV (Figure 6).
Only two variables were associated with both BV and STI: increasing age and ever having had sex willingly (Table 1). Increasing BMI was associated with BV but not with STIs. Unsurprisingly, reports of willing and/or forced/tricked sexual activity were more common among girls with BV or STI, though 52% of girls with BV and 39% of girls with STI reported never having had any type of sexual intercourse. Among girls who reported any sexual exposure, the distribution of condom use, number of lifetime partners, and age of most recent male sex partner did not differ by BV or STI status (Supplementary Table 6). No individual MHM practices were associated with BV or STI.

DISCUSSION
The major findings of our analyses are: (1) L. crispatus-dominant CST-I was the most common vaginal community state type, and was more likely among girls who did not report sexual activity; (2) girls who used cloth to manage their menses were more likely to have CST-III or non-optimal CST-IV than CST-I; and (3) the prevalence of BV and STIs was high. The majority of girls had an L. crispatus-dominant (41%) or L. iners-dominant (35%) vaginal CST. This is important because numerous studies show that women of African descent are more likely to have a non-optimal CST-IV vaginal community (Ravel et al., 2011; Lewis et al., 2017). Our study of adult women (median age 23 years) in long-term sexual relationships residing in Kisumu, approximately 70 km from Siaya County, showed a very different distribution at baseline: 8.7% had L. crispatus-dominant CST-I, 42% had L. iners-dominant CST-III, and 47.2% had non-optimal CST-IV (Mehta et al., 2020). That such a high proportion of Kenyan adolescent girls in our current study had L. crispatus-dominant CST-I indicates that this is a common phenotype. It is most likely altered as girls become sexually active, as reflected by the increased odds of CST-III (aOR=2.00) and CST-IV (aOR=2.58) compared to CST-I among girls who had ever had sexual exposure, adjusted for age, socioeconomic measure, and cloth use for menses. The association between older age and CST-III and CST-IV may reflect that older girls have different types of sex partners or different sexual practices, or have been sexually active longer. As the cohort is ongoing, our longitudinal evaluation will be able to quantify this change over time as girls become sexually active. Among those becoming sexually active, it will also allow us to examine associations with sexual practices and partner characteristics. This finding has implications for the design of behavioral and biological interventions: it indicates that a non-optimal VMB composition may be preventable and that adolescence could be a critical intervention point for preventing adverse reproductive health outcomes.
Poor-quality menstrual hygiene is modifiable, through provision of cheap, accessible hygienic products in place of cloth, and could have substantial biological consequences. Cloth use may promote a non-optimal vaginal microbiome by facilitating anaerobic bacterial growth, through improperly washed fabric (i.e., direct transfer of bacteria), or by creating an occlusive environment. A district-level household survey of 577,758 women aged 15-49 years in India found that those who used cloth during menses were more likely to report vaginal discharge in the past 3 months (aOR=1.30), adjusted for age, gynecologic factors, and socioeconomic indicators (Anand et al., 2015). Cloth use for menses has also been associated with BV among women in Tanzania (Baisley et al., 2009) and India (Torondel et al., 2018), and tampon use has been associated with VMB composition among women in the United States (Noyes et al., 2018). Alterations in vaginal flora during menses may be modified by other MHM products, such as the menstrual cup. Menstrual cups are medical-grade silicone bell-shaped chambers inserted vaginally to collect menstrual flow. Among 406 U.S. women with 4,750 collective days of menstrual cup use, colonization with Lactobacillus was maintained at pre-cup levels, with no change in pH or in colonization with S. aureus, G. vaginalis, or Bacteroides spp. (North and Oldham, 2011). A systematic review with meta-analysis suggests that menstrual cups are a safe option for menstrual hygiene in low-, middle-, and high-income countries (van Eijk et al., 2019). In a cluster randomized controlled feasibility study of 644 girls aged 14-16 years, Phillips-Howard et al. randomized girls by school cluster 1:1:1 to reusable menstrual cups, disposable sanitary pads, or standard water, sanitation, and hygiene counseling (Phillips-Howard et al., 2016). The prevalence of BV (Gram stain Nugent score 7-10) was reduced by 35% (aPRR=0.65; p=0.034) among menstrual cup users (13%) compared to pad users (20%) and control participants (19%). Menstrual cup use also resulted in a 52% (p=0.039) reduction in the prevalence of STIs (composite measure of N. gonorrhoeae, C. trachomatis, and T. vaginalis).
FIGURE 4 | Distribution of alpha diversity metrics by bacterial vaginosis (BV) status. Legend: The distribution of alpha diversity metrics (Simpson, Shannon, Richness, and Evenness) is shown on the y-axis, separately for girls with Nugent score 0-6 ("BV Neg", N=382) and Nugent score 7-10 ("BV Pos", N=49) on the x-axis. Within panels, each grey dot represents a single observation. Box plots indicate the median (horizontal bar) and interquartile range (25th to 75th percentile). Below each graph, the median value of each alpha diversity metric is reported for "BV Neg" and "BV Pos" observations, with the Wilcoxon rank sum p-value for the comparison reported beneath the medians.
FIGURE 5 | Distribution of alpha diversity metrics by sexually transmitted infection (STI) status. Legend: The distribution of alpha diversity metrics (Simpson, Shannon, Richness, and Evenness) is shown on the y-axis, separately for girls testing negative for all three STIs ("STI Neg", N=388) and testing positive for any STI ("STI Pos", N=43) on the x-axis. Within panels, each grey dot represents a single observation. Box plots indicate the median (horizontal bar) and interquartile range (25th to 75th percentile). Below each graph, the median value of each alpha diversity metric is reported for "STI Neg" and "STI Pos" observations, with the Wilcoxon rank sum p-value for the comparison reported beneath the medians.
In our current analysis, cloth use was more common among girls with BV (29.2%) than without BV (24.5%), and among girls with STI (31.7%) than without STI (24.4%), though neither difference was statistically significant. However, cloth use was significantly associated with CST-III (aOR=1.59) and CST-IV (aOR=1.72). This apparent inconsistency could reflect underreporting of cloth use, which would attenuate measures of association. It could also reflect the smaller numbers of girls with BV or STI than with CST-III or CST-IV, which limits power to detect associations with BV and STI. Of note, vaginal discharge was more commonly reported by girls using cloth (28.7%) than by those not using cloth (21.1%), though the difference was not statistically significant (p=0.10; data not shown).
The prevalence of BV and STIs was high, with 9.9% of girls having an STI and 11.2% having BV. While BV is considered a sexually enhanced condition (Verstraelen et al., 2010), it has other risk factors. These include non-sexual factors such as intravaginal and vaginal hygiene practices (Low et al., 2011) and cigarette smoking (Nelson et al., 2018), as well as male sexual partners' circumcision status (Liu et al., 2015). Of girls who reported ever having had sexual activity, 37% reported not knowing their male partner's circumcision status and just 3% reported their partner as uncircumcised [it is estimated that 40% of men in Siaya County are uncircumcised (McKinnon et al., 2019)], precluding meaningful analysis of this variable. Only one girl reported smoking cigarettes. A limitation is that we did not ask about intravaginal practices or application of substances to the vagina. The local study team felt these questions would be too invasive and that girls would not answer them due to perceived stigma. Among girls who reported ever having had sexual activity, we did not find factors that differentiated girls with BV or STI. This analysis was likely biased by underreporting of sexual activity, as evidenced by 39% of girls with STI reporting never having been sexually active. Antibiotic use within the past 30 days was common (20%), and we did not find an association between recent antibiotic use and BV, STI, or CST. Several explanations are possible:
- misclassification (e.g., taking an anti-malarial and reporting it as an antibiotic);
- underreporting of antibiotic use;
- antibiotic classes, doses, or durations with little influence on the VMB;
- the mixture of antibiotic classes and indications in the sample, which may have introduced too much noise to detect a signal.
The VMB composition differed substantially by BV and/or STI status, as demonstrated by global community comparison (ANOSIM), the distribution of CSTs, and the distribution of specific taxa. These differences were consistent with previous literature. Of note, ElasticNet within stability selection did not identify G. vaginalis as one of the specific taxa discriminating between BV and STI states. This is despite G. vaginalis being considered a key taxon in BV pathogenesis (Schwebke et al., 2014) and being one of the top contributors to Bray Curtis dissimilarity. Differences between results from machine learning and ecological approaches highlight the value of using complementary analytic approaches to maximize information gain and robustness.

Limitations
There was substantial underreporting of sexual activity, as 39% of STIs occurred among girls who reported never having had sexual activity (willing or forced). Because few girls were infected with each STI, we analyzed STI as a composite of CT, NG, and TV. Data comparing VMB composition by individual pathogen are limited, and the specific taxa associated with each pathogen may differ (Masha et al., 2019). Nevertheless, despite substantial co-infection with BV and STIs, we demonstrate that taxa associated with STIs differ from those associated with BV. Longitudinal analyses will provide insight into the temporal occurrence of BV and/or STIs, and into VMB composition and taxa in relation to specific STI pathogens. HIV prevalence at baseline was 1.4%. While this is high given the young median age of the girls, the number of cases is small and we cannot relate HIV status to the VMB in this analysis. Our results may not be generalizable to girls who are not in school. In this cross-sectional analysis of baseline data, we cannot be certain that exposures preceded outcomes.

CONCLUSIONS
Nearly half of adolescent girls had an L. crispatus-dominant VMB, differing substantially from studies of young adult and adult women in Kenya and other parts of sub-Saharan Africa. This indicates that non-optimal VMB may be an acquired state for many women and girls. Interventions to maintain or reconstitute L. crispatus dominance should therefore be considered, with adolescence a potentially critical point. Menstrual cups may be an intervention for preventing the non-optimal vaginal microbiome composition associated with non-hygienic menstrual management.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: https://www.ncbi.nlm.nih.gov/bioproject/PRJNA540529.

ETHICS STATEMENT
This study was approved by the institutional review boards of the Kenya Medical Research Institute (KEMRI, SERU #3215), Liverpool School of Tropical Medicine (LSTM, #15-005), and University of Illinois at Chicago (UIC, #2017-1301). Written informed consent to participate in this study was provided by the participants' legal guardian/next of kin.

AUTHOR CONTRIBUTIONS
SM: Obtained funding, study conceptualization and design, statistical analysis and inference, data visualization, drafted manuscript. GZ: Study oversight and management to ensure integrity to protocols and integration of CaCHe within CCG trial, data management and cleaning, critical review and revision of manuscript. FO: Study oversight and management to ensure integrity to protocols of CaCHe, critical review and revision of manuscript. EN: Study oversight and management to ensure integrity to protocols of CCG trial, critical review and revision of manuscript. WA: Development, implementation, and oversight of laboratory protocols in Kenya, acquisition of data, microbiologic analyses and interpretation, critical review and revision of manuscript. RB: Design and execution of statistical analysis approaches, critical review and revision of manuscript. SG: Development and oversight of protocols for amplicon sequencing, microbiologic analyses and interpretation, critical review and revision of manuscript. AE: Data management and cleaning, critical review and revision of manuscript. DK: Study oversight and management to ensure integrity to protocols of CCG trial and regulatory integration of CaCHe, critical review and revision of manuscript. PP-H: Obtained funding, study oversight and management to ensure integrity to protocols, critical review and revision of manuscript. All authors contributed to the article and approved the submitted version.