16S rDNA Full-Length Assembly Sequencing Technology Analysis of Intestinal Microbiome in Polycystic Ovary Syndrome

Objective To study the characteristics of the gut microbiota in patients with polycystic ovary syndrome (PCOS) and its relationship with clinical features. Method We recruited 45 patients with PCOS and 37 healthy women from the Reproductive Department of Shengjing Hospital, recorded their clinical indexes, and sequenced their fecal samples using 16S rDNA full-length assembly sequencing technology (16S-FAST). Result Compared with healthy controls, patients with PCOS showed decreased α diversity (significant only in some BMI and insulin-resistance subgroups) and differences in the abundance of a series of microbial species. Among women with a Prevotella enterotype, LH and AMH were significantly higher in the PCOS group than in the control group, whereas glucose and lipid metabolism did not differ significantly; the opposite pattern was seen between PCOS and control women with a Bacteroides enterotype. Ruminococcus gnavus, Prevotella stercorea, Dialister succinatiphilus, and Bacteroides fragilis were more abundant, while Christensenellaceae spp. were less abundant, in the PCOS group. P. stercorea was significantly more abundant in PCOS patients without insulin resistance (PCOS-NIR) than in controls without insulin resistance (control-NIR), and in PCOS patients who were not overweight (PCOS-NOW) than in non-overweight controls (control-NOW). Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways related to lipopolysaccharide biosynthesis were more abundant in the PCOS group. Conclusion Our study identified gut microbiota that differ in abundance between patients with PCOS and healthy controls, and showed a close relationship between the gut microbiota and pathological changes in PCOS. We suggest that the gut microbiota be taken into consideration when treating the symptoms of PCOS with drugs and diet.


INTRODUCTION
Polycystic ovary syndrome (PCOS) is a reproductive endocrine disorder affecting 5%-10% of women of reproductive age (Aversa et al., 2020). Its pathophysiological mechanism is currently unclear. It is often characterized by polycystic ovarian changes, oligomenorrhea, and elevated androgen levels. Some patients have metabolic syndrome with various clinical manifestations, including obesity, type 2 diabetes, hyperlipidemia, insulin resistance, hypertension, and risk factors for cardiovascular and cerebrovascular disease. Some patients also have an increased risk of infertility, miscarriage, pregnancy complications, and endometrial cancer (Escobar-Morreale, 2018).
Some ten trillion bacteria live in the human intestine, and the total number of bacterial genes is about 150 times the number of human genes (Kamdar et al., 2013). The close exchange of information between intestinal microbes and the host plays an important role in regulating metabolism, immunity, and the nervous system (Rangan and Hang, 2017). The composition of the microbiome is influenced by age, ethnicity, diet, and medications, and in turn affects the host's metabolism and immune response (McPhee and Schertzer, 2015). Emerging evidence shows that many metabolic diseases, such as type 2 diabetes, nonalcoholic fatty liver disease, insulin resistance, and cardiovascular diseases, are related to changes in the gut microbiota (Qin et al., 2012; Qin et al., 2014; Jie et al., 2017; Woodhouse et al., 2018; Pierantonelli and Svegliati-Baroni, 2019).
Recently, many studies have used 16S ribosomal RNA (rRNA) sequencing to compare the intestinal microbiome of patients with PCOS and healthy people (Supplementary Table 1). Alpha diversity describes the number of species within a local, uniform habitat and is therefore also called within-habitat diversity. Some studies found that patients with PCOS have decreased α diversity compared with healthy people (Lindheim et al., 2017; Insenser et al., 2018; Torres et al., 2018), while others found no difference (Qi et al., 2019; Zeng et al., 2019). Beta diversity refers to differences in species composition between habitat communities along an environmental gradient and is also called between-habitat diversity. Studies have shown that patients with PCOS have a different β diversity from healthy people (Lindheim et al., 2017; Insenser et al., 2018; Torres et al., 2018; Qi et al., 2019; Zeng et al., 2019). Bacteroides is a major genus of the intestinal flora, and several Bacteroides species were increased or decreased in patients with PCOS compared with healthy people in studies by Torres et al., Zeng et al., Qi et al., Liu et al., Zhang et al., and Chu et al. (Liu et al., 2017; Torres et al., 2018; Qi et al., 2019; Zeng et al., 2019; Zhang et al., 2019; Chu et al., 2020). Co-abundance groups (CAGs) are clusters of bacterial species defined by their relative abundance. Liu et al. showed that some CAGs altered in PCOS correlated significantly with inflammation, hyperinsulinemia, hyperandrogenism, and obesity (Liu et al., 2017). Insulin resistance and inflammation correlated positively with the abundance of bacteria in the family Bacteroidaceae, which is increased in PCOS, whereas testosterone levels correlated negatively with the abundance of Prevotellaceae, a family found to be decreased in PCOS in the study of Zeng et al. (2019).
Short-chain fatty acids (SCFAs) are produced by gut bacteria when they ferment dietary fiber in the intestine. Two beneficial metagenomic species (MGS) that were decreased in patients with PCOS, Faecalibacterium prausnitzii and Bifidobacterium spp., correlated positively with SCFAs. Moreover, when Qi et al. transplanted the intestinal bacteria of patients with PCOS into mice, the mice developed a PCOS-like phenotype with decreases in the bile acids glycodeoxycholic acid (GDCA) and tauroursodeoxycholic acid (TUDCA) and in the intestinal immune factor interleukin (IL)-22. Treating the PCOS-like mice with GDCA or IL-22 significantly improved hormone abnormalities, estrous cycle disorders, polycystic ovaries, decreased fertility, and insulin resistance (Qi et al., 2019).
The term "16S" refers to 16S rDNA (or 16S rRNA). 16S rRNA is a component of the small subunit of the prokaryotic ribosome, and the gene that encodes it (16S rDNA) is about 1542 bp long, comprising nine variable regions and 10 conserved regions. The conserved regions reflect genetic relationships among species, while the variable regions reflect differences between species. Traditional 16S rDNA segment sequencing covers only one or a few hypervariable regions, so taxonomic identification usually reaches only the genus level. 16S rDNA full-length assembly sequencing technology (16S-FAST) sequences the full-length 16S gene, which provides more information than partial segments and allows classification down to the species level (Karst et al., 2018). We therefore used 16S-FAST to analyze, for the first time, differences in the intestinal microbiome between patients with polycystic ovary syndrome and healthy women in northeast China. On this basis, we analyzed correlations of microbial abundance with clinical indicators and enterotype. Using the gene function composition of sequenced microbial genomes, we inferred the composition of functional genes in each sample from the species composition obtained by 16S-FAST and analyzed functional differences between bacterial populations. Our work advances the elucidation of metabolic abnormalities related to PCOS symptoms and supports the idea that modifying the intestinal microbiome could improve metabolism in patients with this disease.

Participants
Study participants were women aged 18-40 years (age was calculated as the interval between the date of birth and the date of the screening survey) who attended the outpatient clinic of the Reproductive Department of Shengjing Hospital, China Medical University. Patients were diagnosed with PCOS according to the 2003 Rotterdam criteria (Group, 2004). Inclusion criteria for the control group were regular menstrual cycles, normal ovarian morphology, and normal hormone levels. Exclusion criteria were: endocrine disorders such as diabetes mellitus, impaired glucose tolerance, hyperprolactinemia, Cushing syndrome, 21-hydroxylase deficiency, thyroid disease, androgen-secreting tumors, congenital adrenal hyperplasia, or other causes of hyperandrogenemia or ovulatory dysfunction; use of antibiotics or hormone medications within the past 6 months; smoking or alcohol abuse; other diseases, or medication within the past 6 months, known to influence the composition of the intestinal flora; and any other reason considered to make a participant unsuitable for this study. Women who were pregnant or had breastfed in the past year were also excluded. In total, 82 women returned a fecal sample: 37 non-PCOS controls and 45 patients with PCOS. In the control group, 25 women were not overweight (NOW; body mass index [BMI] 18.5-23.9 kg/m²) and 12 were overweight or obese (OW; BMI ≥24 kg/m²). The PCOS group comprised 14 NOW and 31 OW women. Seventy-one women had a venous blood sample tested in routine clinical testing: 29 controls and 42 patients with PCOS. In the control group, 22 women did not have insulin resistance (IR) and seven did; in the PCOS group, 15 did not have IR and 27 did. The study protocol was approved by the Ethics Committee of Shengjing Hospital affiliated to China Medical University.
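The BMI subgroups above follow the cut-offs stated in the text, and the IR subgroups are based on HOMA-IR, which the diversity analysis uses for subgrouping. As an illustration, the grouping logic can be sketched in Python as below. The HOMA-IR formula is the standard one (fasting glucose in mmol/L × fasting insulin in µU/mL / 22.5); the IR cut-off is not given in this section, so it is left as a required argument rather than guessed. All function names are hypothetical.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m^2."""
    return weight_kg / height_m ** 2


def bmi_group(bmi_value: float) -> str:
    """Assign the BMI subgroups used in the study.

    NOW: 18.5-23.9 kg/m^2; OW (overweight or obese): >= 24 kg/m^2.
    Values below 18.5 fall outside both study subgroups.
    """
    if bmi_value >= 24.0:
        return "OW"
    if bmi_value >= 18.5:
        return "NOW"
    return "underweight"


def homa_ir(fpg_mmol_l: float, fins_uU_ml: float) -> float:
    """Standard HOMA-IR: fasting glucose (mmol/L) x fasting insulin (uU/mL) / 22.5."""
    return fpg_mmol_l * fins_uU_ml / 22.5


def ir_group(homa_value: float, cutoff: float) -> str:
    """Label IR vs NIR. The cut-off is NOT stated in the text and must be supplied."""
    return "IR" if homa_value >= cutoff else "NIR"
```

For example, a 60 kg, 1.60 m participant has a BMI of about 23.4 kg/m² and falls in the NOW subgroup.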

Sampling
Fecal samples were collected in the morning outside menstruation. Venous blood samples were collected in the morning after an overnight fast (≥8 h) on days 2-4 of a spontaneous menstrual cycle or after progestin-withdrawal bleeding. Height, weight, and waist and hip circumferences were recorded on the day of fecal sampling.

16S Full-Length Library Construction Technology
Participants were instructed to collect fecal samples into fecal DNA storage tubes (CW2654, CwBiotech, Beijing, China), and the samples were stored and shipped to the laboratory at room temperature. Bacterial DNA was extracted using a fecal DNA extraction kit (Qiagen Fecal DNA Extraction Kit, Qiagen, Hilden, Germany). The extracted DNA was then quantified, assessed qualitatively, and subjected to quality control. For samples that passed quality control, 10 ng of DNA was used to construct the full-length 16S library described below. Full-length PCR amplification system 1 was prepared, and 10 ng of DNA template was added for a first round of three cycles of full-length amplification. The amplified product was purified with magnetic beads.
A nested full-length PCR amplification system 2 was then prepared, and a second round of 32-cycle full-length amplification was performed on the bead-purified product of the first round. The full-length amplified product was purified with magnetic beads and analyzed quantitatively and qualitatively.
We then constructed a spliced library. Ten nanograms of the full-length amplicon was digested with transposase. Spliced library PCR amplification system 1 was prepared, and 10 µL of the digested product was used for PCR amplification. DNA fragments were then selected from the PCR-amplified products. Spliced library PCR amplification system 2 was prepared, and 2 µL of the product remaining after magnetic-bead selection and purification was used for a second round of spliced library PCR amplification. The amplified product, also purified with magnetic beads, constituted the spliced library.
We also constructed a ligation library. Ligation library PCR amplification system 1 was prepared, and 3 ng of DNA was used for a first round of eight cycles of PCR amplification. The amplified product was purified with magnetic beads. Ligation library PCR amplification system 2 was then prepared, and 2 µL of the purified first-round product was used for a second round of eight cycles of PCR amplification. The PCR product, purified with magnetic beads, constituted the ligation library. The spliced and ligation libraries were checked by electrophoresis and their concentrations measured with Qubit; libraries that passed quality control were sent for sequencing.

Bioinformatics Analysis
Unique molecular identifier (UMI) pairing relationships were extracted from the ligation library, and all sequences corresponding to each UMI pair were extracted from the spliced library. For each UMI pair, Cutadapt v1.2.1 was used to remove primers and UMI tags, and the reads were assembled into a full-length 16S sequence with SPAdes v3.13.1 using default parameters. Species annotation of all assembled sequences was performed with Mothur v1.42.0 against the SILVA_132_SSURef_Nr99 database using default parameters. α diversity of the gut microbiota was estimated with QIIME 1 (v1.8.0), and differences between groups were tested with the Mann-Whitney U test using the SciPy 1.3.1 package in Python 3.6. β diversity was analyzed with the R 3.6.1 package vegan 2.5-3, and the P value for partial least squares-discriminant analysis (PLS-DA) was calculated with the Adonis method. Linear discriminant analysis effect size (LEfSe) analysis was conducted with LEfSe version 1.0, using an LDA score threshold of 2, to identify the taxonomic or functional features that best explain differences between groups. Network diagrams were constructed with Cytoscape v3.8.0, and correlations and P values were calculated with the Spearman method in SparCC. For the genus level and above, correlations greater than 0.5 with P ≤ 0.05 were retained; at the species level, correlations greater than 0.6 with P ≤ 0.05 were retained. Nonparametric Mann-Whitney U and Kruskal-Wallis rank-sum tests were used to detect significant differences in clinical indexes between groups. Correlations between bacterial species and metabolic indicators were calculated with the Spearman method using the R 3.6.1 package psych v1.8.4. Heatmaps were drawn with the R package heatmap 1.0.12. PICRUSt2 was used to infer the functional gene composition of each sample and to analyze functional differences between samples and groups.
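A minimal sketch of the α diversity comparison described above, assuming per-sample ASV count vectors: the Shannon index is computed from relative abundances, and the two groups are compared with SciPy's two-sided Mann-Whitney U test. The log base (2 here) is an assumption; it rescales every value by the same factor and therefore does not change the rank-based test. The function names are illustrative and not part of the pipeline used in the study.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def shannon(counts, base=2.0):
    """Shannon index H = -sum(p_i * log(p_i)) over ASVs with nonzero counts."""
    counts = np.asarray(counts, dtype=float)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log(p)).sum() / np.log(base))


def compare_alpha(control_tables, pcos_tables):
    """Two-sided Mann-Whitney U test on per-sample Shannon indices."""
    h_control = [shannon(c) for c in control_tables]
    h_pcos = [shannon(c) for c in pcos_tables]
    _, p_value = mannwhitneyu(h_control, h_pcos, alternative="two-sided")
    return h_control, h_pcos, p_value
```

Passing `alternative="two-sided"` explicitly keeps the behaviour the same across SciPy versions, whose default for this argument has changed over time.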
Using the scikit-learn 0.23.2 package in Python 3.6, the top 10 contributing species were identified by recursive feature elimination, and a random forest was constructed to classify samples on the basis of microbiota or clinical indicators. Enterotypes were determined as follows: samples were clustered by partitioning around medoids (the "center point partition" algorithm) based on the Jensen-Shannon distance between samples, and the optimal number of clusters was determined by the Calinski-Harabasz (CH) index, taking the number of clusters with the highest CH index as optimal.
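The enterotyping procedure above can be sketched as follows, under stated assumptions. The "Center Point Partition Algorithm" is read as partitioning around medoids, implemented here as a simple alternating k-medoids with random restarts rather than the full PAM swap search. Distances are SciPy's Jensen-Shannon distance (the square root of the divergence). Because medoid clustering has no coordinate centroids, the CH index is computed directly from the distance matrix using the sum-of-squared-distances identity. Trying 2 to 5 clusters is a hypothetical choice, and all function names are illustrative.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon


def jsd_matrix(profiles):
    """Pairwise Jensen-Shannon distances between samples (rows = relative abundances)."""
    n = profiles.shape[0]
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d[i, j] = d[j, i] = jensenshannon(profiles[i], profiles[j])
    return d


def k_medoids(d, k, n_init=10, n_iter=100):
    """Alternating k-medoids on a precomputed distance matrix; best of n_init restarts."""
    n = d.shape[0]
    best_labels, best_cost = None, np.inf
    for seed in range(n_init):
        medoids = np.random.default_rng(seed).choice(n, size=k, replace=False)
        for _ in range(n_iter):
            labels = np.argmin(d[:, medoids], axis=1)
            new = medoids.copy()
            for c in range(k):
                members = np.flatnonzero(labels == c)
                if members.size:
                    # Member with the smallest summed distance to the rest of its cluster.
                    new[c] = members[np.argmin(d[np.ix_(members, members)].sum(axis=1))]
            if np.array_equal(new, medoids):
                break
            medoids = new
        labels = np.argmin(d[:, medoids], axis=1)
        cost = d[np.arange(n), medoids[labels]].sum()
        if cost < best_cost:
            best_labels, best_cost = labels, cost
    return best_labels


def ch_index(d, labels):
    """Calinski-Harabasz index from distances: (B / (k - 1)) / (W / (n - k))."""
    n = d.shape[0]
    clusters = np.unique(labels)
    k = clusters.size
    total = (d ** 2).sum() / (2 * n)  # total sum of squares
    within = sum((d[np.ix_(m, m)] ** 2).sum() / (2 * m.size)
                 for m in (np.flatnonzero(labels == c) for c in clusters))
    if within == 0:
        return np.inf
    return ((total - within) / (k - 1)) / (within / (n - k))


def enterotypes(profiles, k_values=range(2, 6)):
    """Return (k, CH index, labels) for the cluster number with the highest CH index."""
    d = jsd_matrix(np.asarray(profiles, dtype=float))
    best = None
    for k in k_values:
        labels = k_medoids(d, k)
        score = ch_index(d, labels)
        if best is None or score > best[1]:
            best = (k, score, labels)
    return best
```

With profiles dominated by two different genera (for example Bacteroides-like and Prevotella-like samples), this sketch should pick k = 2 and separate the two groups.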

Clinical Parameters Characteristic of Participants
Weight, BMI, waist circumference, hip circumference, and waist-to-hip ratio (WHR) were significantly higher in the PCOS group (P < 0.05). Among biochemical indices, TP, ALB, AST, ALT, GGT, ALP, PALB, CHR, MAO, CHOL, TG, apoB, and sd-LDL were higher, while HDL-C and DBIL were lower, in the PCOS group (P < 0.05). In addition, PRL, SHBG, and FSH were lower, while FAI, TT, LH, FT3, FINS, AMH, and IR were higher, in the PCOS group (Supplementary Table 2). Comparisons of indexes between PCOS-NIR and control-NIR, PCOS-IR and control-IR, PCOS-NOW and control-NOW, and PCOS-OW and control-OW are also shown in Supplementary

Differences in Bacterial Diversity of Gut Microbiota
After full-length 16S rRNA sequencing, samples were filtered to retain those with more than 5000 contigs; all samples met this standard. Rarefaction curves showed the number of species detected in each sample at different amounts of sequencing data (Supplementary Figure 1); a curve that flattens out indicates that the sequencing depth is sufficient. We assessed α diversity with the Shannon index based on amplicon sequence variants (ASVs). The Shannon index was higher in the control group (P = 0.11), suggesting a decrease in α diversity in women with PCOS, although the difference was not statistically significant. β diversity based on ASVs was assessed by PLS-DA, which did not separate the PCOS and control groups (P = 0.751; Figure 1).
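Rarefaction curves of this kind are typically built by subsampling reads without replacement and counting the distinct ASVs observed at each depth. The sketch below, assuming a single sample's ASV count vector, takes prefixes of one random permutation of the reads, so each depth is a uniform random subsample and the curve can only rise or stay flat. It is illustrative only; the function name and seed are hypothetical, and the study's own curves were not necessarily produced this way.

```python
import numpy as np


def rarefaction_curve(counts, depths, seed=0):
    """Observed ASVs after randomly subsampling `depth` reads without replacement."""
    counts = np.asarray(counts, dtype=int)
    rng = np.random.default_rng(seed)
    reads = np.repeat(np.arange(counts.size), counts)  # one entry per read, labelled by ASV index
    shuffled = rng.permutation(reads)
    # Each prefix of a random permutation is a uniform random subsample of that size.
    return [int(np.unique(shuffled[:d]).size) for d in depths]
```

A curve whose last few points no longer increase corresponds to the plateau described above.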
When participants were further classified according to HOMA-IR, the Shannon index differed significantly between the PCOS-IR and control-NIR groups. When both PCOS and control groups were classified according to BMI, the Shannon index differed significantly between the PCOS-OW and control-NOW groups, and between the control-OW and control-NOW groups. No significant differences in β diversity were observed between groups.

Difference in Species of Bacteria Between PCOS and Control Groups
The relative abundance of the top 10 taxa at the species level (Figure 2) showed that Bacteroides vulgatus and Prevotella copri were the two major species. LEfSe was then used to further investigate the microbiota differences between the PCOS and control groups. At the species level, Ruminococcus gnavus, Prevotella stercorea, Dialister succinatiphilus, Bacteroides fragilis, Roseburia sp. 11SE38, and Lachnospiraceae bacterium 2_1_58FAA were more abundant in the PCOS group than in the control group, with LDA scores above 2. In contrast, the following taxa were more abundant in the control group than in the PCOS group: at the family level, Christensenellaceae, Barnesiellaceae, and Pasteurellaceae; at the genus level, Fusicatenibacter, Barnesiella, and Haemophilus; and at the species level, uncultured Lachnospiraceae bacteria, uncultured Christensenellaceae bacteria, Fusicatenibacter saccharivorans, unidentified rumen bacterium 12-110, Barnesiella intestinihominis, uncultured Ruminococcaceae bacteria, uncultured bacterium adhufec108, Oscillibacter sp. ER4, Blautia sp. Marseille-P3387, Haemophilus parainfluenzae, uncultured Blautia sp., Alistipes obesi, unclassified Alistipes, and uncultured bacterium adhufec236. Moreover, P. stercorea was more abundant in the PCOS-NIR group than in the control-NIR group, and in the PCOS-NOW group than in the control-NOW group (Figure 3). The distribution of the differentially abundant species across samples is shown in Figure 4.

Correlation Between Clinical Indexes and Species
Among the species enriched in patients with PCOS, R. gnavus correlated positively with FINS, HOMA-IR, weight, BMI, TP, and TG, and P. stercorea showed a mild positive correlation with LH and AMH levels. Among the species enriched in the control group, F. saccharivorans, uncultured Christensenellaceae bacteria, and B. intestinihominis correlated positively with FINS, HOMA-IR, weight, and BMI, and negatively with SHBG (Figure 5).

Prediction of Alterations in Metabolic Pathways
Using the gene content of sequenced microbial genomes, we inferred the functional genes in each sample from the species composition obtained by 16S-FAST, in order to analyze functional differences between samples and groups. Functional differences in KEGG pathways between the PCOS and control groups are shown in Figure 6 and degradation, the superpathway of GDP-mannose-derived O-antigen building blocks biosynthesis, queuosine biosynthesis, polyisoprenoid biosynthesis (E. coli), preQ0 biosynthesis, GDP-mannose biosynthesis, and the superpathway of pyrimidine deoxyribonucleoside salvage were significantly more abundant in PCOS, while peptidoglycan maturation (meso-diaminopimelate containing), L-glutamate and L-glutamine biosynthesis, sucrose degradation III (sucrose invertase), L-arginine biosynthesis II (acetyl cycle), purine ribonucleosides degradation, superpathway of purine deoxyribonucleosides degradation, phosphatidylglycerol biosynthesis II (non-plastidic), phosphatidylglycerol biosynthesis I (plastidic), L-arginine biosynthesis I (via L-ornithine), L-arginine biosynthesis IV (archaebacteria), superpathway of pyrimidine deoxyribonucleosides degradation, galactose degradation I (Leloir pathway), glycogen degradation I (bacterial), superpathway of β-D-glucuronide and D-glucuronate degradation, superpathway of N-acetylglucosamine, N-acetylmannosamine and N-acetylneuraminate degradation, mixed acid fermentation, superpathway of hexuronide and hexuronate degradation, and myo-, chiro- and scillo-inositol degradation were significantly more abundant in the control group.
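PICRUSt2 predicts the gene content of each ASV from reference genomes and then combines it with the observed abundances. The core arithmetic step can be sketched as below, assuming the per-ASV 16S copy numbers and per-genome gene copy numbers have already been predicted (they are hypothetical inputs here): read counts are divided by 16S copy number to approximate cell counts, then multiplied by gene copies per genome and summed over ASVs. This is a conceptual sketch, not PICRUSt2's implementation; its tree-based copy-number prediction and MinPath pathway inference are not shown.

```python
import numpy as np


def predict_functions(asv_counts, copies_16s, gene_copies):
    """Predicted gene abundances per sample.

    asv_counts:  (n_samples, n_asvs) read counts
    copies_16s:  (n_asvs,) predicted 16S copies per genome (hypothetical input)
    gene_copies: (n_asvs, n_genes) predicted gene copies per genome (hypothetical input)
    Returns an (n_samples, n_genes) array.
    """
    asv_counts = np.asarray(asv_counts, dtype=float)
    # Correct for 16S copy number: reads -> approximate number of cells.
    cells = asv_counts / np.asarray(copies_16s, dtype=float)
    # Each cell contributes its genome's gene copies.
    return cells @ np.asarray(gene_copies, dtype=float)
```

Pathway-level abundances such as those compared above are then derived from these gene (enzyme) abundances.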

Network of Microbiota in Different Groups
A network of correlations between bacterial abundances was constructed at the species level. Bacteria that were correlated with each other, and may promote each other's growth, are shown in the network (R² > 0.6, P ≤ 0.05; Figure 7).
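The edge-filtering rule described in the methods (keep taxon pairs whose correlation exceeds the threshold with P ≤ 0.05) can be sketched as follows. SparCC itself is not reimplemented: plain Spearman correlation from SciPy is used as a stand-in, and treating the threshold as an absolute value (so strong negative correlations are also kept) is an assumption. The 0.6 default matches the species-level threshold stated in the methods; the function name is hypothetical.

```python
from itertools import combinations

import numpy as np
from scipy.stats import spearmanr


def correlation_edges(abundance, names, r_min=0.6, p_max=0.05):
    """Keep taxon pairs whose Spearman |rho| > r_min and P <= p_max.

    abundance: (n_samples, n_taxa) relative abundances.
    Returns (taxon_a, taxon_b, rho, p) tuples that can be loaded into a network tool.
    """
    abundance = np.asarray(abundance, dtype=float)
    edges = []
    for i, j in combinations(range(abundance.shape[1]), 2):
        rho, p = spearmanr(abundance[:, i], abundance[:, j])
        # NaN rho (constant column) fails the comparison and is dropped.
        if abs(rho) > r_min and p <= p_max:
            edges.append((names[i], names[j], float(rho), float(p)))
    return edges
```

Correlating pairs one at a time avoids a SciPy quirk: calling `spearmanr` on a two-column matrix returns scalars rather than a matrix.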

Random Forest Model Constructed to Distinguish PCOS and Control Women
A receiver operating characteristic (ROC) curve was drawn to evaluate models for distinguishing women with PCOS from healthy control women. The area under the curve (AUC) was 0.63 for a random forest model constructed with all species and increased to 0.87 when only the top 10 contributing species were used (Figure 8). The top 10 contributing species are listed in Table 2.
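The model-building steps described in the methods (recursive feature elimination to pick the top contributing species, then a random forest evaluated by ROC) can be sketched with scikit-learn as follows. The forest size, 5-fold stratified cross-validation, and seed are assumptions; the text does not state how the AUC was validated. The function name is hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict


def top_k_auc(X, y, k=10, n_trees=500, seed=0):
    """Select k features by random-forest RFE, then return (mask, cross-validated AUC)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Recursive feature elimination: refit the forest, drop the least important feature, repeat.
    rfe = RFE(RandomForestClassifier(n_estimators=n_trees, random_state=seed),
              n_features_to_select=k, step=1).fit(X, y)
    X_top = X[:, rfe.support_]
    # Out-of-fold class-1 probabilities from a fresh forest trained on the selected features.
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    proba = cross_val_predict(RandomForestClassifier(n_estimators=n_trees, random_state=seed),
                              X_top, y, cv=cv, method="predict_proba")[:, 1]
    return rfe.support_, roc_auc_score(y, proba)
```

Because the selection step here sees all samples, the AUC it returns can be optimistic. Wrapping RFE and the forest in a scikit-learn `Pipeline` that is cross-validated as a whole avoids that.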

DISCUSSION
We are the first to study the gut microbial community of patients with PCOS by sequencing the full-length 16S rDNA of the microbiota. Participants were divided into PCOS and control groups and further divided into subgroups according to IR index, BMI, and enterotype. In our analysis of clinical indexes, we noticed a trend that patients in the PCOS-IR and PCOS-OW groups showed more disturbances in biochemical indexes, FINS, FPG, HOMA-IR, and TT than those in the PCOS-NIR and PCOS-NOW groups (Supplementary Table 2), although the differences were not statistically significant. This suggests that glucose and lipid metabolism may be more disturbed in the PCOS-IR and PCOS-OW groups. LEfSe analysis of pathways (Figure 6) also showed that fatty acid elongation and lipid biosynthesis were abundant in patients with PCOS. In contrast, the PCOS-NIR and PCOS-NOW groups showed a trend toward greater hypothalamic-pituitary-ovarian axis (HPOA) dysfunction, with higher LH and AMH levels than the PCOS-IR and PCOS-OW groups. One possible explanation is reduced negative feedback of E2 on the pituitary: E2 levels in the PCOS-NIR and PCOS-NOW groups were lower than in the control-NIR and control-NOW groups, and these patients had higher LH levels than the PCOS-IR and PCOS-OW groups. We hypothesize that their pituitary is more sensitive to GnRH, or receives less negative feedback because of low E2 levels. Because of the limited sample size, more participants are needed to reach a definitive conclusion. α diversity did not differ significantly between the PCOS and control groups, but significant differences emerged when participants were subgrouped by BMI and IR (Figure 1), suggesting that metabolic dysfunction contributes to the decrease in bacterial diversity. According to LEfSe analysis, R. gnavus, Roseburia sp. 11SE38, D. succinatiphilus, and Lachnospiraceae bacterium 2_1_58FAA, all from the phylum Firmicutes, were increased in patients with PCOS.
An increased abundance of Firmicutes is associated with a high BMI (Husen et al., 2008); Firmicutes was the most abundant phylum in obese women (Chavez-Carbajal et al., 2019) and is thought to contribute to the development of obesity. In our study, women in the PCOS group had a significantly higher BMI than non-PCOS women. Lachnospiraceae spp. and Roseburia spp. were also more abundant in women with obesity and metabolic syndrome (Chavez-Carbajal et al., 2019). R. gnavus uses glycans from the intestinal mucus layer as an energy source; it was significantly more abundant in patients with inflammatory bowel disease (IBD) than in controls and increased with IBD activity, which was attributed to gene families that protect it against oxidative stress in the IBD gut (Hall et al., 2017). In a previous study, women with PCOS, whether obese or lean, showed increased levels of reactive oxygen species, which correlated positively with androgen secretion stimulated by human chorionic gonadotrophin (hCG) (González et al., 2019). R. gnavus may therefore be abundant in women with PCOS because its clade tolerates an oxidative-stress environment in the gut and carries several genes that aid adhesion and mucus utilization (Hall et al., 2017). R. gnavus has also been associated with respiratory allergies, increasing in abundance before the onset of allergic symptoms (Gupta et al., 2015). Several Roseburia and Lachnospiraceae species are butyrate-producing colonic bacteria that support colonic health, for example by improving the intestinal barrier. However, Roseburia sp. 11SE38 (an unclassified Roseburia) and Lachnospiraceae bacterium 2_1_58FAA (an unclassified Lachnospiraceae) were found mainly in patients with PCOS in the heatmap (Figure 4), suggesting that these taxa may have effects different from those of other "beneficial" Roseburia species. In a study of non-alcoholic fatty liver disease by Raman et al., the abundance of Lachnospiraceae and Roseburia was increased in patients, indicating that these bacteria may be associated with metabolic dysfunction (Raman et al., 2013).
P. stercorea was more abundant in the PCOS group than in the control group (Figures 3 and 4). Moreover, it was increased in the PCOS-NIR and PCOS-NOW groups compared with the control-NIR and control-NOW groups, respectively, suggesting that P. stercorea may play a role in the pathological changes of PCOS in patients with fewer metabolic disturbances. Pisanu et al. also showed that a hypocaloric diet increased the abundance of P. stercorea in obese and overweight patients (Pisanu et al., 2020). P. copri and Prevotellaceae spp. were correlated and appeared to promote each other's growth in the PCOS-NOW and PCOS-NIR groups (Figure 7). P. copri was more abundant in patients with rheumatoid arthritis than in those with osteoarthritis (Lee et al., 2019). Dillon et al. found that the prevalence of mucosal P. copri and P. stercorea correlated positively with CD40 expression on colonic myeloid dendritic cells (mDCs). CD40 expression is positively associated with mucosal HIV-1 viral load, with mucosal inflammatory cytokines such as IL-23, IL-1β, IL-6, and tumor necrosis factor-α, and with IL-10 levels. This indicates that Prevotella spp. may contribute to mucosal inflammation and immune activation (Dillon et al., 2016). The prevalence of the genus Prevotella increases when more fiber is added to the diet (Kovatcheva-Datchary et al., 2015). People with a Prevotella-dominated enterotype favor sugars, especially monosaccharides, and lipolytic and proteolytic fermentation are detected at lower levels in the Prevotella enterotype (Wu et al., 2011). Diet may therefore explain why these patients weighed less than those in the PCOS-OW and PCOS-IR groups. Lukens et al. (2014) showed that mice fed a low-fat diet (LFD) developed an outgrowth of Prevotella compared with mice fed a high-fat diet (HFD), and that the osteomyelitis-related inflammatory factor IL-1β was increased in LFD mice. In a previous study by Zeng et al. (2019), the Prevotellaceae family was decreased in patients with PCOS, especially in the PCOS-IR group, and healthy controls had a higher relative abundance of Prevotellaceae than Bacteroidaceae, in contrast to the PCOS group. Our data, however, showed that the genus Bacteroides (family Bacteroidaceae) dominated in both the control and PCOS groups: 26 women in the control group and 31 in the PCOS group had a Bacteroides-dominated enterotype, while only 11 women in the control group and 14 in the PCOS group had a Prevotella-dominated enterotype. This difference may reflect differences in diet between northeast China (our study) and southwest China (Zeng's study), since enterotypes are influenced by long-term dietary patterns. With a protein-rich diet, the gut microbiota tends to be dominated by Gram-positive bacteria, whereas a carbohydrate-rich diet may lead to a Gram-negative-dominated gut environment (Wu et al., 2011). In our study, PCOS patients with a Prevotella-dominated enterotype had lower E2 levels than controls with the same enterotype, and higher LH levels than PCOS patients with a Bacteroides-dominated enterotype. Meanwhile, PCOS patients of both enterotypes had higher BMI, FINS, and HOMA-IR than the corresponding control groups, and PCOS patients with a Bacteroides-dominated enterotype in particular showed significant changes in lipid metabolism (Table 1). Enterotype may not take part directly in the pathogenic mechanism; rather, it may be an external manifestation of PCOS subtypes with different pathogeneses.
We hypothesize that PCOS women with a Bacteroides enterotype in southwest China may lack fiber in their diet, which increases dysregulation of lipid metabolism and the appearance of metabolic disorders. Women with PCOS in northeast China showed an abnormally high level of P. stercorea, a member of the genus Prevotella, compared with healthy women; this may cause or accelerate the development of PCOS while lipid metabolism remains normal. P. stercorea is a Gram-negative bacterium. LEfSe analysis showed that CMP-3-deoxy-D-manno-octulosonate biosynthesis I, Kdo transfer to lipid IVA III (Chlamydia), and the superpathway of GDP-mannose-derived O-antigen building blocks biosynthesis, which participate in the biosynthesis of components of bacterial lipopolysaccharide (LPS), were enriched in women with PCOS (Figure 6 and Supplementary Figure 2). Increased LPS production by Gram-negative gut bacteria contributes to metabolic aberrations when the mucosal barrier weakens: a leaky gut allows LPS to enter the circulation, which promotes metabolic endotoxemia and accelerates obesity, insulin resistance, and other metabolic disturbances (d'Hennezel et al., 2017). Further studies are needed on the role of P. stercorea in the pathogenesis of lean PCOS and on the relationship between regional diets in China and enterotype. Such work may allow inflammatory disorders in patients with PCOS to be corrected through dietary adjustment.
Our data appeared to be similar to those of articles reported by Liu (Liu et al., 2017), Torres (Torres et al., 2018), Zeng (Zeng et al., 2019), Zhang , Qi (Qi et al., 2019), and Chu (Chu et al., 2020) in that bacteria from the Bacteroides family showed an increased abundance in patients with PCOS. Here we identified several specific species from Bacteroides that showed a difference in abundance between groups. B. fragilis was significantly increased in PCOS patients with an LDA of score more than 3. B. fragilis is a common anaerobe in extraintestinal infections (Valguarnera and Wardenburg, 2020), and enterotoxigenic B.fragilis may contribute to systemic inflammation . Bacteroides   xylanisolvens showed a greater abundance in PCOS-IR when compared to control-IR. It was found to be specialized in the degradation of xylans with low complexity (Despres et al., 2016). Christensenellaceae spp. were shown to have increased relative abundance in the control group. The relative abundance of the Christensenellaceae family has a negative relationship with BMI, LDL, and apolipoprotein B as well as features of metabolic syndrome such as obesity and hypertriglyceridemia (Waters and Ley, 2019;Li et al., 2020), and was positively associated with HDL (Fu et al., 2015;Hibberd et al., 2019). It is reported to be higher in people who have a balanced omnivorous diet than in vegetarians (De Filippis et al., 2016). Our method of library construction is suitable for the detection of bacterial colony structure of all types of samples. The 16S of bacteria has a total length of about 1500bp and contains 9 variable regions. In the past, 16S sequencing only selected one or two variable regions, and the length was only 3/ 400bp. 
Sequencing the full length of all nine variable regions raises taxonomic identification from the genus level, typical of conventional approaches, to the species level. This characterizes the ecological structure of microbial communities more accurately and facilitates in-depth research, so samples can be tested for specific bacterial species. Moreover, compared with PacBio and other third-generation sequencing technologies, Illumina sequencing produces reads of higher quality and therefore more accurate DNA sequences. Because a distinct tag sequence is added to each sample, high-throughput sequencing can analyze a large number of samples in a single run. In this way, the abundance of the detected microbiota can be measured with good fidelity.
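The sample-tagging step described above can be illustrated with a minimal demultiplexing sketch. It assumes a single, fixed-length tag at the start of each read and exact tag matching; the tag sequences, sample names, and read strings are hypothetical and do not reflect the actual 16S-FAST library design. A small helper then turns per-read taxon assignments into relative abundances for one sample.

```python
from collections import Counter

# Hypothetical 6-nt sample tags; the actual 16S-FAST tag design is not described here.
SAMPLE_TAGS = {
    "ACGTAC": "PCOS_01",
    "TGCATG": "Control_01",
}


def demultiplex(reads, sample_tags):
    """Split pooled reads by exact match of the leading tag, then strip the tag.

    Returns (reads_by_sample, unassigned_reads).
    """
    tag_lengths = {len(tag) for tag in sample_tags}
    if len(tag_lengths) != 1:
        raise ValueError("this sketch assumes all tags have the same length")
    tag_len = tag_lengths.pop()

    by_sample = {sample: [] for sample in sample_tags.values()}
    unassigned = []
    for read in reads:
        sample = sample_tags.get(read[:tag_len])
        if sample is None:
            unassigned.append(read)  # tag not recognized (e.g. a sequencing error in the tag)
        else:
            by_sample[sample].append(read[tag_len:])  # keep only the insert sequence
    return by_sample, unassigned


def relative_abundance(taxon_calls):
    """Convert per-read taxon assignments for one sample into relative abundances."""
    counts = Counter(taxon_calls)
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


if __name__ == "__main__":
    pooled = ["ACGTACGGATT", "TGCATGCCAAT", "ACGTACTTGCA", "NNNNNNAAAA"]
    by_sample, unassigned = demultiplex(pooled, SAMPLE_TAGS)
    print(by_sample)   # {'PCOS_01': ['GGATT', 'TTGCA'], 'Control_01': ['CCAAT']}
    print(unassigned)  # ['NNNNNNAAAA']
    print(relative_abundance(["taxon_A", "taxon_A", "taxon_B", "taxon_C"]))
    # {'taxon_A': 0.5, 'taxon_B': 0.25, 'taxon_C': 0.25}
```

Production demultiplexers typically tolerate a small number of tag mismatches and often use dual indexes; this sketch does neither and is meant only to show why per-sample tags let many samples share one sequencing run.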

CONCLUSION
Our research is the first to use 16S-FAST technology to study the characteristics of, and differences between, the gut microbiota of patients with PCOS and healthy controls. Several bacterial species with different abundances, as well as differences in metabolic pathways, were detected in the PCOS group. We found that levels of R. gnavus, Roseburia spp., and Lachnospiraceae spp. were higher in PCOS patients. We also found that the level of P. stercorea was significantly higher in the PCOS-NIR and PCOS-NOW groups than in the control-NIR and control-NOW groups, suggesting that it may be involved in the pathogenesis of PCOS in lean patients in northeast China. PCOS patients with the Prevotella enterotype showed clinical index levels similar to those of the PCOS-NIR and PCOS-NOW groups, whereas PCOS patients with the Bacteroides enterotype showed levels similar to those of the PCOS-IR and PCOS-OW groups. The intestinal flora may be regarded as a new therapeutic target that can be modulated by diet and drugs to prevent or slow the onset and progression of PCOS, or to improve the condition.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in the NCBI Sequence Read Archive (SRA) under accession number PRJNA694729.

ETHICS STATEMENT
This study protocol was approved by the Ethics Committee of Shengjing Hospital affiliated with China Medical University. The participants provided their written informed consent to participate in this study. Written informed consent was also obtained from the individuals for the publication of any potentially identifiable images or data included in this article.

AUTHOR CONTRIBUTIONS
JJ and XW contributed to the conception of the study. KY performed the sequencing. ZW contributed significantly to data analyses. SD performed the data analyses and wrote the manuscript. SJ, GL, WZ, CL, and DL helped perform the analysis with constructive discussions. All authors contributed to the article and approved the submitted version.

FUNDING
This work was supported by the National Natural Science Foundation of China (Nos. 81671423 and 81402130).

ACKNOWLEDGMENTS
In writing this paper, my deepest gratitude goes first to my professor, Xiuxia Wang, for her constant encouragement and guidance. She walked me through all stages of the work and made many invaluable suggestions. Without her consistent and illuminating instruction, this thesis could not have reached its present form. Second, I would like to express my heartfelt gratitude to Dr. Jiao Jiao, my supervisor, who guided me throughout the writing of this thesis. She carefully read the whole draft and offered painstaking and precious criticism. Her perceptive observations proved immensely constructive. My parents took good care of me and gave me a great deal of encouragement while I was working on this thesis. They always share my joys and sorrows, and I am deeply grateful to them and heartily owe my achievements to them. Last but not least, none of this would have been possible without the help of the individuals and organizations mentioned above.