- 1Department of Education, University of California, Santa Barbara, Santa Barbara, CA, United States
- 2Department of Research Methods and Information Science, University of Denver, Denver, CO, United States
The evolving skill demands of the data science workforce present unique challenges for individuals trained in the social science disciplines. This study examines the readiness of U.S. graduate programs in preparing social data scientists for the AI era. We collected and analyzed publicly available coursework plans (n = 97) from graduate programs at research universities in the U.S. that focus on training social data scientists. Required skills for data scientists were identified through a random sample of current job postings (n = 30) on LinkedIn and cross-validated with findings from the relevant literature. Using Python-based web scraping and text content analysis, we identified the 10 most in-demand skills within the data science industry and conducted a binary coding of whether each program offers coursework relevant to these skills. These 10 binary indicators were subsequently analyzed using Rasch modeling. The results indicate notable gaps between graduate curricula and industry expectations, and also highlight the need to reform graduate education to better prepare social data scientists for the new demands of the AI era.
Introduction
Consider this question: What skills are needed for a successful career in data science? Twenty years ago, responses would have emphasized data management, research methodology, and statistical modeling expertise. Today, while these skills remain fundamental, the list of skills in people’s responses may expand considerably to include machine learning, natural language processing, Python programming, and artificial intelligence (AI) competencies. Data science has flourished in the 21st century (Donoho, 2017; Schwab-McCoy et al., 2021). The integration of AI techniques is propelling this field to new heights, which has triggered substantial changes in various ways (Dong, 2025; Ho, 2024; Liu et al., 2024; Luan et al., 2020; Min et al., 2024). Concurrently, the skill sets required for data scientists are shifting, particularly for those trained in traditional social science disciplines (e.g., Educational Statistics, Quantitative Psychology, and Data Analytics for Social Sciences).
Data scientists represent a broad and somewhat heterogeneous population, given their diverse training in disciplines such as social science, medical science, and computer science (Donoho, 2017). Programs and curricula in computer science (or engineering schools) are more likely to keep pace with rapid advancements, because these fields originated AI techniques and were early contributors to introducing AI into data science. In contrast, social science disciplines rarely have this first-mover advantage and may lag behind new shifts (Luan et al., 2020).
In effect, training in social science disciplines is still striving to catch up with the advanced but conventional skill sets demanded prior to the AI boom in the early 2020s. For example, Everson (2022) identifies substantial statistical skills gaps for professors within schools of education, and these gaps are evident in both advanced methods (e.g., propensity score matching, structural equation modeling, and item response theory) and software packages (e.g., R, SAS, and Stata) needed for training educational data scientists. The challenges derived from these gaps could be even more significant for those in minority-serving colleges and universities, where they tend to have less federal and financial support (Brown, 2013). The constantly evolving and dynamic nature of data science has been a major hurdle for faculty teaching up-to-date content (Schwab-McCoy et al., 2021). Now, AI is reshaping the landscape in data science, as well as the needs of the associated industrial labor market (Hijazi and Alfaki, 2020; Liu et al., 2024), which may create new challenges or magnify existing ones for training data scientists in the social science fields.
Current study
Program curricula are undoubtedly a fundamental component of data science education (Gundlach and Ward, 2021; Hardin et al., 2021; Nolan and Temple Lang, 2010; Schwab-McCoy et al., 2021). The present study aims to examine the curriculum readiness of graduate programs in preparing social data scientists for the AI era, that is, to investigate whether current graduate curricula in social science disciplines adequately cover the skills required for data scientists in today’s evolving landscape. The objective of this study is not to develop a new measure of curriculum readiness, which would typically require comprehensive psychometric validation (e.g., Boateng et al., 2018). Rather, we aimed to generate complementary and additional evidence from a measurement-analog model to address our research goal. Furthermore, identifying potential gaps in existing curricula is a crucial step for advancing data science education (Everson, 2022; Hardin et al., 2021), as well as for promoting effective curriculum reforms. Thus, this research also contributes to broader discussions on how to better align graduate training with the evolving demands placed on data scientists in the AI era.
Methods
The Methods section outlines the sources of curriculum data from graduate programs, data collection procedures (e.g., web scraping), and a description of the main analyses (i.e., text analysis and Rasch modeling) employed in the current study.
Data sources and collection procedures
To address the research purpose, we collected data from two sources: (1) web-based curricula data and (2) skill requirement data for data scientists in the AI era, derived from job postings.
For the curricula data, we gathered all publicly available Coursework Plans (CWPs, n = 97) from U.S. universities’ graduate programs aimed at training social data scientists. These programs included Quantitative Psychology, Measurement and Quantitative Methods, Educational Statistics, and so forth. These social science programs were chosen for a shared goal of equipping graduates with the quantitative and methodological skills for a career in data science. Additionally, we targeted graduate-level programs at R1 and R2 universities (i.e., those with high or very high research activity). This is because research universities place greater emphasis on graduate-level education, whereas teaching universities typically focus more on undergraduate education. Although graduate training is not strictly required for being a data scientist, graduate programs often offer deeper training in areas such as programming, advanced statistics, or machine learning, which are highly valued in data science (Jiang and Chen, 2022).
Based on a recent version of the Carnegie Classification of Institutions of Higher Education (2024), we compiled a list of website URLs for the program CWPs through a manual search. Specifically, two researchers scrutinized each research university’s websites to locate qualifying programs and CWPs. After that, web scraping was performed using Python (Mitchell, 2018) to extract the content of coursework plans from each URL. The web scraping gathered coursework titles and descriptions, skill emphases, and training goals of programs from each website.
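The extraction step described above can be illustrated with a minimal sketch. The study reports only that Python was used, so the parser below, the `VisibleTextParser` and `extract_text` names, and the sample HTML are all assumptions made for illustration. The sketch relies on the standard library only and works on an inline HTML string so that it runs offline.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collect visible text, skipping <script> and <style> content."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_text(html: str) -> str:
    """Return the visible text of an HTML page as one space-joined string."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical page content (not from the study). A real run would first
# download each CWP URL, e.g. with urllib.request.urlopen, and pass the
# decoded HTML to extract_text.
SAMPLE_HTML = """
<html><head><style>h1 {color: red;}</style></head>
<body><h1>Required Courses</h1>
<ul><li>EDUC 101: Applied Statistics</li>
<li>EDUC 202: Research Design</li></ul>
<script>console.log("ignored");</script></body></html>
"""

cwp_text = extract_text(SAMPLE_HTML)
print(cwp_text)
```

The resulting plain text (course titles and descriptions) is the kind of input that a later keyword analysis would operate on.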
For the skill requirement data, we collected the essential or required skills for current data scientists by examining a sample of new job postings in 2024 on LinkedIn. The search terms for locating job postings on LinkedIn included “Data Scientist,” “AI or Artificial Intelligence Research Scientist,” and “Social Science.” In filtering search outcomes, we selected multiple work experience options (“Entry,” “Senior,” and “Manager”) to ensure that positions requiring different levels of data science skill proficiency were represented. Job locations were restricted to the U.S., given the study population. We randomly sampled 30 data scientist job postings, which included industry leaders such as Lockheed Martin, Udemy, Gusto, Deloitte, Google, DoorDash, and UC Health. This sample encompassed organizations from various sectors that may hire data scientists with a social science background, including technology, healthcare, education, and consulting. For each job posting, skill sets (e.g., SQL, Python, and machine learning) were collected and coded based on the specific requirements (e.g., required skills or qualifications) outlined in the job descriptions.
Overview of analysis
The present study primarily utilized content analysis of text through Python (Sarkar, 2016), followed by a Rasch analysis of the extracted information and recoded indicators representing curriculum readiness for training social data scientists. First, we examined the frequency of skill-related keywords in 30 online data science job postings on LinkedIn. These frequencies highlighted the skills currently demanded or preferred by employers in the U.S. data science industry. Subsequently, using the compiled list of 10 key skills, another text analysis was implemented to analyze the CWP content of the 97 graduate programs in social data science to identify gaps between industry requirements and graduate training. Each CWP was coded via Python following a dichotomous coding scheme to indicate whether each identified skill was reflected in the program’s coursework (1 = at least one course containing keywords matching the skill; 0 = the skill is not reflected in any coursework).
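The two text-analysis steps described above (counting skill keywords in job postings, then dichotomously coding each CWP) can be sketched as follows. The keyword dictionary, the whole-word matching rule, and the sample texts are hypothetical; the study's actual keyword lists and matching rules are not reproduced here.

```python
import re
from collections import Counter

# Hypothetical keyword lists, for illustration only.
SKILL_KEYWORDS = {
    "machine learning": ["machine learning", "deep learning"],
    "python": ["python"],
    "sql": ["sql"],
    "statistics": ["statistics", "statistical"],
}


def mentions(text: str, keywords: list[str]) -> bool:
    """True if any keyword occurs as a whole word or phrase (case-insensitive)."""
    return any(
        re.search(r"\b" + re.escape(kw) + r"\b", text, flags=re.IGNORECASE)
        for kw in keywords
    )


def skill_frequencies(postings: list[str]) -> Counter:
    """Count the postings that mention each skill at least once."""
    counts = Counter()
    for posting in postings:
        for skill, keywords in SKILL_KEYWORDS.items():
            if mentions(posting, keywords):
                counts[skill] += 1
    return counts


def code_program(cwp_text: str) -> dict[str, int]:
    """Dichotomous coding: 1 if the coursework text matches the skill, else 0."""
    return {
        skill: int(mentions(cwp_text, keywords))
        for skill, keywords in SKILL_KEYWORDS.items()
    }


# Made-up example texts.
postings = [
    "Requires Python, SQL, and machine learning experience.",
    "Strong statistical background; Python preferred.",
]
cwp = "EDUC 101: Applied Statistics. EDUC 202: Research Design."

freq = skill_frequencies(postings)
indicators = code_program(cwp)
print(freq.most_common())
print(indicators)
```

Counting each posting at most once per skill keeps a single long posting from dominating the ranking. Raw occurrence counts would be an alternative design.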
Next, we entered the indicators into a dichotomous Rasch model to quantify the curriculum readiness of each program and to examine the alignment between current social data science graduate curricula and industry demands. The current study does not aim to comprehensively develop or validate a measure via Rasch modeling; however, certain Rasch analysis results (e.g., Wright maps; Boone et al., 2014) can effectively reveal and visualize potential skill gaps between graduate training and industry needs. In this study, three Rasch analyses, including unidimensionality, item fit, and construct coverage (i.e., Wright map), were conducted using Winsteps 5.3.1 (Linacre, 2022). It is important to note that all CWPs were collected and analyzed by the summer of 2024, by which time all graduate programs were expected to have released their coursework for the most recent academic year (i.e., 2024–2025).
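For reference, the dichotomous Rasch model is commonly written in the following standard form. The symbols are notation introduced here for illustration: X_ni equals 1 if program n offers coursework covering skill i, theta_n is the curriculum readiness of program n, and delta_i is the difficulty of skill indicator i, both expressed in logits.

```latex
P(X_{ni} = 1 \mid \theta_n, \delta_i)
  = \frac{\exp(\theta_n - \delta_i)}{1 + \exp(\theta_n - \delta_i)}
```

Under this form, a program whose readiness equals an indicator's difficulty has a 0.5 probability of covering that skill, and the probability falls as the indicator's difficulty rises above the program's readiness.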
Results
Here, we first present descriptive findings that identify the key skills required for social data scientists in the AI age and assess how well these skills are covered in the analyzed social science graduate curricula. Rasch modeling results are then provided to further demonstrate the gaps between current industry needs and the skills taught in social science graduate programs for training data scientists.
Demanded skills of data scientists in the AI age
A total of ten key skills were identified from job postings: machine learning (including deep learning), Bayesian analysis, cloud computing, artificial intelligence, statistics, algorithms, programming, Python, SQL, and research. This list was also cross-validated with skills suggested in the literature for data science (e.g., Ismail and Abidin, 2016; Li et al., 2021) to ensure its coverage and representativeness. These skills predominantly include specialized technical and programming skills such as Python, deep learning, machine learning, and artificial intelligence, as well as general research and statistics skills valued in traditional social data scientist training. The results highlight the high demands of technical skills for social data scientists (Costa and Santos, 2017), as well as reflect the evolving nature of the data science field.
Gaps between industry needs and social science graduate curricula for data scientists
Table 1 summarizes the number and percentage of programs offering courses that cover each data science skill identified from the analyzed job posts, ranked from the lowest to the highest percentage. The majority of graduate programs offer courses covering content related to research (97.94%) and statistics (86.60%). However, beyond these traditional skills, fewer than 10% of the programs provide training in more advanced technical skills or tools (e.g., machine learning, algorithms, and cloud computing).
A total skill-coverage score was calculated for each program based on the 10 dichotomously coded skill variables (i.e., whether or not each of the 10 skills was covered by each program). The total score had a possible range from 0 to 10, with lower course-skill coverage scores indicating a more severe misalignment between program training and industrial demands. Among the 97 programs, the mean score was 2.13 out of 10 (SD = 0.78), with a median of 2, indicating that most programs’ current coursework covers only a limited number of skills (typically traditional research and statistics skills) required for data scientists in the AI era.
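The total skill-coverage score described above is a row sum over the 10 binary indicators. The sketch below uses made-up codes for three hypothetical programs, not the study data. It also assumes the sample standard deviation, since the text does not say which form of SD was reported.

```python
from statistics import mean, median, stdev

SKILLS = [
    "machine learning", "bayesian analysis", "cloud computing",
    "artificial intelligence", "statistics", "algorithms",
    "programming", "python", "sql", "research",
]

# Hypothetical 0/1 codes, one row per program, in the order of SKILLS.
coded = {
    "Program A": [0, 0, 0, 0, 1, 0, 0, 0, 0, 1],
    "Program B": [0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    "Program C": [1, 0, 0, 0, 1, 0, 1, 0, 0, 1],
}

# Total score per program: number of skills covered (possible range 0 to 10).
totals = {name: sum(row) for name, row in coded.items()}

scores = list(totals.values())
summary = {
    "mean": mean(scores),
    "sd": stdev(scores),  # sample SD (assumption)
    "median": median(scores),
}
print(totals)
print(summary)
```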
Psychometric evidence and implications from Rasch modeling
In addition to the descriptive statistics presented above, Rasch modeling was applied to quantify and investigate the curriculum readiness of each program for training social data scientists, using nine of the 10 dichotomous skill indicators. The “cloud computing” indicator was excluded from the analysis because it showed no variance: no program in this study offered coursework covering this skill.
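An indicator coded identically for every program cannot separate programs, and its Rasch difficulty has no finite maximum likelihood estimate, which is why such indicators are typically set aside before estimation. A minimal screening sketch follows, with a hypothetical function name and made-up data.

```python
def split_by_variance(data: dict[str, list[int]]):
    """Separate indicators that vary across programs from constant ones."""
    usable, constant = {}, {}
    for skill, column in data.items():
        # A column with a single distinct value has zero variance.
        target = usable if len(set(column)) > 1 else constant
        target[skill] = column
    return usable, constant


# Hypothetical codes for four programs (one list per skill indicator).
columns = {
    "research": [1, 1, 1, 0],
    "statistics": [1, 0, 1, 1],
    "cloud computing": [0, 0, 0, 0],
}

usable, constant = split_by_variance(columns)
print(sorted(usable))    # indicators retained for the Rasch analysis
print(sorted(constant))  # indicators dropped for lack of variance
```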
We first examined the dimensionality of the curriculum readiness items through a principal components analysis of residuals (PCAR). The Rasch dimension accounted for 86.1% of the variance in the observations, while the first contrast in the residuals (i.e., the largest secondary dimension) had a relatively small eigenvalue of 2.78 and explained only 4.3% of the variance. This indicates that the Rasch dimension explained the overwhelming majority of variance in the data, and including a secondary dimension would contribute minimally to explaining additional variance. Collectively, these results provide strong evidence of the unidimensionality of the measure (Linacre, 2025).
Table 2 summarizes the mean square (MNSQ) and standardized (ZSTD) item fit statistics. The “Research” indicator exhibited an outfit MNSQ of 9.9 and a ZSTD of 9.91, which substantially exceeds the typical acceptable fit range (e.g., MNSQ between 0.6 and 1.4; Wright and Linacre, 1994; ZSTD between −2 and 2; Bond and Fox, 2015). The misfit of the “Research” indicator suggests that it is not an appropriate item for representing curriculum readiness in the context of AI data science training. This is likely because nearly all collected graduate programs offer research-related coursework, making it ineffective at differentiating program-level readiness.
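To make the outfit statistic concrete, the sketch below computes an outfit MNSQ for one item as the unweighted mean of squared standardized residuals under the dichotomous Rasch model. The program measures, item difficulty, and responses are hypothetical and are taken as given rather than estimated. The sketch does not reproduce the Winsteps output, and it is not offered as a diagnosis of the “Research” item.

```python
import math


def rasch_prob(theta: float, delta: float) -> float:
    """Probability of a 1 under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - delta)))


def outfit_mnsq(responses, thetas, delta) -> float:
    """Unweighted mean of squared standardized residuals for one item."""
    z_squared = []
    for x, theta in zip(responses, thetas):
        p = rasch_prob(theta, delta)
        z_squared.append((x - p) ** 2 / (p * (1.0 - p)))
    return sum(z_squared) / len(z_squared)


# Hypothetical measures in logits: three low-readiness programs, one high.
thetas = [-3.5, -3.5, -3.5, 3.0]
easy_item = -4.0  # an indicator that almost every program should cover

expected_pattern = [1, 1, 1, 1]
surprising_pattern = [1, 1, 1, 0]  # the high-readiness program lacks the skill

print(outfit_mnsq(expected_pattern, thetas, easy_item))
print(outfit_mnsq(surprising_pattern, thetas, easy_item))
```

Because outfit is unweighted, a single highly unexpected response (here, a high-readiness program missing a very easy indicator) can push the statistic far above the usual 0.6 to 1.4 range.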
We further examined the alignment between graduate curricula and industry skill demands using a Wright map (see Figure 1). The curriculum readiness level of programs was placed on the left side of the continuum, while the difficulty levels of the nine skill indicators were plotted on the right. Within the current research context, a more difficult skill indicator generally indicates that programs are less likely to offer coursework covering that skill. From Figure 1, most skill indicators (all except “Statistics” and “Research”) cluster near the upper range of the continuum, between +1.5 and +4 logits. In contrast, programs are concentrated toward the lower end, around the −3.5 logit position. These results echo the preliminary descriptive findings and further confirm the existence of a substantial gap between the skills taught in U.S. social science graduate curricula and those demanded by data science industry jobs. Although the general misalignment was evident, several programs demonstrated better alignment. For example, the “Statistics/Machine Learning Joint PhD” program at Carnegie Mellon University (CMU) was positioned near +3 logits at the high end of the continuum. The program’s CWP covers six of the ten outlined skills and has shown the closest alignment between its curriculum and industry demands among the 97 programs analyzed. This finding suggests that a joint training model that integrates social science and computer science could be an innovative and effective strategy for preparing social data scientists in the AI era.
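The logit distances on the Wright map can be translated into probabilities with the dichotomous Rasch formula. The sketch below is a worked illustration using the approximate positions quoted above (programs near −3.5 logits, most skill indicators between +1.5 and +4 logits, and the CMU program near +3 logits). These are rounded readings of the figure, not estimated parameters.

```python
import math


def rasch_prob(theta: float, delta: float) -> float:
    """Probability that a program at theta covers a skill of difficulty delta."""
    return 1.0 / (1.0 + math.exp(-(theta - delta)))


typical_program = -3.5  # approximate position of most programs
high_program = 3.0      # approximate position of the CMU joint program
easiest_clustered_skill = 1.5
hardest_clustered_skill = 4.0

# Typical program versus the easiest of the clustered skills: about 0.007.
p_typical = rasch_prob(typical_program, easiest_clustered_skill)

# High-readiness program versus the same skill: about 0.82.
p_high_easy = rasch_prob(high_program, easiest_clustered_skill)

# High-readiness program versus the hardest clustered skill: about 0.27.
p_high_hard = rasch_prob(high_program, hardest_clustered_skill)

print(round(p_typical, 4), round(p_high_easy, 4), round(p_high_hard, 4))
```

Under these assumed positions, a typical program would have well under a 1% modeled chance of covering even the least difficult of the clustered technical skills.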
Discussion
Based on the results above, we discuss the observed misalignment between academic preparation and industry demands in social data science and elaborate on several possible pathways to narrow this gap. The limitations and future directions of the current work are also included in this section.
The growing misalignment
When we look back at the long-standing conversations around academic preparation versus industry demands (e.g., Everson, 2022; Stone et al., 2009; Trauth et al., 1993), gaps seem to be inevitable. However, the misalignment between social data scientist curricula and industrial demands observed in the current study appears to be more substantial than usual. Graduate programs are expected to provide foundational learning opportunities to help students build expertise in data science. Unfortunately, the skill gap was dramatically large and is unlikely to be bridged through work experience alone, as most programs do not offer the emerging skills (e.g., machine learning, cloud computing, and programming) required by the data science labor market in the current AI era. This might result in employment challenges for graduates, as well as raise concerns about the value of current higher education.
Moreover, the actual gap could be larger than what the present work observed for two reasons. First, it is a stringent assumption to expect all graduates to master the skills by just taking a course that covers relevant content. Curricula are fundamental but insufficient for students to build expertise in data science, especially in the age of AI (Spanjaard et al., 2018). Second, the current study investigates the graduate curricula of R1 and R2 universities. When these generally better-resourced institutions in graduate education lag behind, others might struggle even more to provide the necessary training.
The arrival of AI has clearly played a role in enlarging this gap. AI gained more popularity with the prevalent use of ChatGPT in 2022, but it has permeated data science research and practice (Dong, 2025) and impacted the industrial labor market (Liu et al., 2024) for decades. As Liu et al. (2024) highlighted, since 2010, there has been a growing emphasis on “hard” technical skills within AI-related data scientist roles. The current study echoes this finding, showing that technical skills are highly demanded. Therefore, addressing this gap, especially in the area of hard skills, is essential for graduates to land jobs in the data science industry.
To narrow the gap: curricula reform or training mode reform?
To narrow the gap, reforming graduate curricula to better align with industrial needs seems to be a necessary step. However, this task could be challenging if we only seek solutions within social science disciplines. College faculty are the main agents to deliver curricula. As Everson (2022) noted, even the current faculty in social science commonly struggle with catching up on programming skills and have high demands for related professional development. A shortcut could be hiring new faculty with developed AI expertise (or with computer science backgrounds), but the immediate hiring of new faculty does not seem to be a viable strategy for every university. The existing tenure track system makes faculty turnover slower than in industrial or corporate sectors, which means most programs could take years or decades to accomplish a faculty iteration. By that time, dynamic industrial demands would have shifted again.
Then, reforming the mode of program training could be a more promising path to narrow the identified gap (Maassen and Cloete, 2006). The direction of reform may integrate interdisciplinary and technical training by collaborating with other schools (e.g., schools of engineering) that already have experts and talents in AI-related skills. The CMU joint PhD program, which offers students in social science both traditional statistical coursework and technical machine learning content, could be a good example. More importantly, such a mode might be practically scalable because it primarily reorganizes and reconciles already existing resources within a university. In addition to formalizing a joint program, incorporating individualized elective or cognate courses from other disciplines into students’ CWPs can be an alternative but more flexible strategy. Notably, such an approach often requires more advising support to help students identify suitable courses for their skill development. Meanwhile, it is generally recommended to integrate practical or experiential learning opportunities into programs to further align academic preparation with industry demands (Kolb, 2014). Such changes may collectively improve graduates’ employability and readiness and ensure that data science programs in social science disciplines remain competitive in the AI era.
Limitations and future directions
Given the samples of graduate programs and job posts, research conclusions are limited to social science graduate programs and the social data science industry in the U.S. Future research may examine the generalizability of the study findings in a global setting where non-U.S. graduate programs and job markets are included in the analyses. In particular, it would be beneficial to increase the number of job posts analyzed. The current study used Python-based text analysis to efficiently analyze course webpages and job posts, which may limit the interpretive depth of available data. Certain research findings (e.g., the skill demands of industry and program training coverage) warrant cross-validation through further investigation, such as in-depth interviews with industry leaders or employers in the data science field regarding their specific skill demands for data scientist employees and their perspectives on how to better align graduate training in social science with industry needs. Additionally, the keyword matching approach applied in the current work may assess the course coverage of skills but could be less effective in understanding the depth of training for each skill. Some skills (e.g., AI literacy) cannot be adequately represented or captured by one or two keywords. Future research should consider developing more sophisticated coding schemes to capture related skills based on broader textual contexts.
Data availability statement
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found at: https://osf.io/vmwzq/.
Author contributions
YD: Project administration, Visualization, Formal analysis, Methodology, Resources, Investigation, Conceptualization, Writing – review & editing, Supervision, Software, Writing – original draft. DB: Investigation, Software, Writing – review & editing, Writing – original draft, Formal analysis, Validation, Data curation. KB: Software, Investigation, Writing – review & editing, Writing – original draft, Data curation, Methodology.
Funding
The author(s) declared that financial support was not received for this work and/or its publication.
Conflict of interest
The author(s) declared that this work was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Generative AI statement
The author(s) declared that Generative AI was used in the creation of this manuscript. Tools for copy-editing this paper (e.g., Grammarly) may utilize generative AI engines; however, no generative AI applications were used to produce any original content in this paper.
Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
References
Boateng, G. O., Neilands, T. B., Frongillo, E. A., Melgar-Quiñonez, H. R., and Young, S. L. (2018). Best practices for developing and validating scales for health, social, and behavioral research: a primer. Front. Public Health 6:149. doi: 10.3389/fpubh.2018.00149
Bond, T. G., and Fox, C. M. (2015). Applying the Rasch Model: Fundamental Measurement in the Human Sciences (3rd edition). Psychology Press.
Boone, W. J., Staver, J. R., and Yale, M. S. (2014). “Wright maps: first steps” in Rasch Analysis in the Human Sciences. eds. W. J. Boone, J. R. Staver, and M. S. Yale, 111–136.
Brown, M. C. II. (2013). The declining significance of historically black colleges and universities: relevance, reputation, and reality in Obamamerica. J. Negro Educ., 82, 3–19. doi: 10.7709/jnegroeducation.82.1.0003
Carnegie Classification of Institutions of Higher Education (2024). Carnegie classifications. The Carnegie Foundation for the Advancement of Teaching. Available online at: https://carnegieclassifications.acenet.edu/
Costa, C., and Santos, M. Y. (2017). The data scientist profile and its representativeness in the European e-competence framework and the skills framework for the information age. Int. J. Inf. Manag. 37, 726–734. doi: 10.1016/j.ijinfomgt.2017.07.010
Dong, Y. (2025). Pre-uniform measures in the artificial intelligence era. Curr. Psychol. 44, 7919–7933. doi: 10.1007/s12144-025-07374-1
Donoho, D. (2017). 50 years of data science. J. Comput. Graph. Stat. 26, 745–766. doi: 10.1080/10618600.2017.1384734
Everson, K. C. (2022). Statistical skills gaps of professors of education at U.S. universities and HBCUs. J. Stat. Data Sci. Educ. 30, 45–53. doi: 10.1080/26939169.2022.2034488
Gundlach, E., and Ward, M. D. (2021). The data mine: enabling data science across the curriculum. J. Stat. Data Sci. Educ. 29, S74–S82. doi: 10.1080/10691898.2020.1848484
Hardin, J., Horton, N. J., Nolan, D., and Lang, D. T. (2021). Computing in the statistics curricula: a 10-year retrospective. J. Stat. Data Sci. Educ. 29, S4–S6. doi: 10.1080/10691898.2020.1862609
Hijazi, R., and Alfaki, I. (2020). Reforming undergraduate statistics education in the Arab world in the era of information. J. Stat. Educ. 28, 75–88. doi: 10.1080/10691898.2019.1705943
Ho, A. D. (2024). Artificial intelligence and educational measurement: opportunities and threats. J. Educ. Behav. Stat. 49, 715–722. doi: 10.3102/10769986241248771
Ismail, N. A., and Abidin, W. Z. (2016). Data scientist skills. IOSR J. Mob. Comp. Appl. 3, 52–61. doi: 10.9790/0050-03045261
Jiang, H., and Chen, C. (2022). Data science skills and graduate certificates: a quantitative text analysis. J. Comput. Inf. Syst. 62, 463–479. doi: 10.1080/08874417.2020.1852628
Kolb, D. A. (2014). Experiential Learning: Experience as the Source of Learning and Development. Upper Saddle River, NJ: FT press.
Li, G., Yuan, C., Kamarthi, S., Moghaddam, M., and Jin, X. (2021). Data science skills and domain knowledge requirements in the manufacturing industry: a gap analysis. J. Manuf. Syst. 60, 692–706. doi: 10.1016/j.jmsy.2021.07.007
Liu, J., Chen, K., and Lyu, W. (2024). Embracing artificial intelligence in the labour market: the case of statistics. Humanit. Soc. Sci. Commun. 11:1112. doi: 10.1057/s41599-024-03557-6
Luan, H., Geczy, P., Lai, H., Gobert, J., Yang, S. J. H., Ogata, H., et al. (2020). Challenges and future directions of big data and artificial intelligence in education. Front. Psychol. 11:580820. doi: 10.3389/fpsyg.2020.580820
Maassen, P., and Cloete, N. (2006). Global reform trends in higher education. Transformation in higher education: Global pressures and local realities. eds. N. Cloete, P. Maassen, R. Fehnel, T. Moja, T. Gibbon, and H. Perold. (The Netherlands: Springer).
Min, J., Song, X., Zheng, S., King, C. B., Deng, X., and Hong, Y. (2024). Applied statistics in the era of artificial intelligence: a review and vision. arXiv. doi: 10.48550/arXiv.2412.10331
Mitchell, R. (2018). Web Scraping with Python: Collecting More Data from the Modern Web. Sebastopol, CA: O'Reilly Media.
Nolan, D., and Temple Lang, D. (2010). Computing in the statistics curricula. Am. Stat. 64, 97–107. doi: 10.1198/tast.2010.09132
Schwab-McCoy, A., Baker, C. M., and Gasper, R. E. (2021). Data science in 2020: computing, curricula, and challenges for the next 10 years. J. Stat. Data Sci. Educ. 29, S40–S50. doi: 10.1080/10691898.2020.1851159
Spanjaard, D., Hall, T., and Stegemann, N. (2018). Experiential learning: helping students to become ‘career-ready’. Australas. Mark. J. 26, 163–171. doi: 10.1016/j.ausmj.2018.04.003
Stone, K. B., Kaminski, K., and Gloeckner, G. (2009). Closing the gap: education requirements of the 21st century production workforce. J. Ind. Teach. Educ. 45, 5–33.
Trauth, E. M., Farwell, D. W., and Lee, D. (1993). The IS expectation gap: industry expectations versus academic preparation. MIS Q. 17:293. doi: 10.2307/249773
Keywords: artificial intelligence, coursework, curriculum gap, data science, Rasch modeling, text analysis, web-scraping
Citation: Dong Y, Baral D and Baral K (2026) Are U.S. graduate curricula ready to prepare social data scientists for the AI era? Front. Educ. 10:1657651. doi: 10.3389/feduc.2025.1657651
Edited by:
Barbara Jones, Bibliometrica Limited, United Kingdom
Reviewed by:
Francisco Rafael Trejo-Macotela, Universidad Politécnica de Pachuca, Mexico
Rany Sam, National University of Battambang, Cambodia
Copyright © 2026 Dong, Baral and Baral. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Yixiao Dong