
ORIGINAL RESEARCH article

Front. Manuf. Technol., 18 December 2025

Sec. Digital Manufacturing

Volume 5 - 2025 | https://doi.org/10.3389/fmtec.2025.1601903

A development framework for human work integrated AI systems in manufacturing

Yuji Yamamoto1*, Álvaro Aranda-Muñoz2, Kristian Sandström1
  • 1School of Innovation, Design, and Engineering, Mälardalen University, Eskilstuna, Sweden
  • 2Research Institutes of Sweden, Kista, Sweden

Integrating AI-driven applications in manufacturing processes holds immense potential for enhancing production performance. However, adopting AI technology in manufacturing presents significant challenges, including the frequent need to harmonise human and AI work and the knowledge gap among diverse stakeholders. This paper presents a development framework for human-AI systems that effectively integrate human work in manufacturing settings. The framework is specifically designed for development project leaders from the manufacturing domain, who play a critical role in integrating the diverse expertise required to realise such systems. The framework was derived from a literature review, and its practical utility was examined through empirical applications. The resulting framework contains 147 task recommendations associated with six conceptual development progression stages. The empirical studies, comprising external reviews and preliminary validation through pilot applications in two real-world projects, indicate the framework’s practical utility, such as facilitating deep multi-domain dialogue for early-phase project planning and mid-term reflective planning, and yield caveats, such as considerations for its use in iterative development environments. The present study extends existing manufacturing research, which was found to be immature in this respect, by offering deeper insights into and advancing a structured approach to developing human-integrated AI systems in industrial environments.

1 Introduction

Manufacturing companies recognise the immense potential of incorporating modern AI-driven applications using machine learning (ML) techniques into production processes. Application areas are diverse, including production planning, automation, quality inspection, process control, and predictive maintenance (Papageorgiou et al., 2022; Hadid et al., 2024).

Despite the acknowledged potential, adopting AI technology in manufacturing presents numerous challenges. The technology is often intended for high-stakes operational processes where faulty decisions or controls can severely damage operational performance (Lee, 2020). People working close to the manufacturing operations, such as operators and technicians, are often in the loop of the AI-incorporated solution systems, involved in data collection, curation, decision-making based on ML inferences, error handling, and system maintenance. This often necessitates that solution systems incorporate socio-technical perspectives to foster human-AI collaboration—human work and AI-based functions are harmonised and mutually complemented to improve operations (Bousdekis et al., 2020; Emmanouilidis et al., 2021).

Further, the knowledge gap among those involved in developing and implementing AI applications in manufacturing poses a significant challenge. Persons with diverse expertise and backgrounds, such as operators, production engineers, quality engineers, data scientists, human-machine interface designers, and software engineers, are often involved in the development (Arinez et al., 2020). An insight from a real-world AI development project in manufacturing is that even though a cross-functional team is formed, the lack of knowledge overlap—the ability of each participant to comprehend the contents and properties of the solution systems beyond those parts immediately relevant to one’s expertise—inhibits effective collaboration (Yamamoto et al., 2024). The participants experienced confusion due to the lack of shared understanding of what activities the participants should expect from others and how their activities would affect others. This observation aligns with findings from other studies indicating that the separation of concerns—experts from various domains concerning themselves with and engaging only in their own areas—is hardly effective in the development, because of the profound interdependence of critical components of the AI-incorporated solution system, such as data, ML models, and user interactions (Subramonyam et al., 2022).

With the above challenges, manufacturing scholars and practitioners increasingly observe or experience the phenomenon of so-called AI graveyards—even though early AI prototypes demonstrate inspiring potential impacts, applications rarely progress beyond the prototyping phase into real operational environments, where they could create significant value for manufacturing organisations.

Those challenges can be addressed in multiple ways. At the least, there is a strong need among manufacturing organisations for a structured and deliberate approach to integrating various expertise in the development and implementation of AI-incorporated solution systems in manufacturing. A development framework assisting in realising human-AI collaborative systems in high-stakes areas is particularly anticipated. However, developing such a framework is still in its infancy; existing frameworks in the literature fall short in comprehensiveness or granularity for practical use. For instance, Kaymakci et al. (2021) and Pokorni et al. (2021) propose staged process models for AI development in manufacturing environments. Each stage in these models contains recommended activities but lacks details for operationalisation. Emmanouilidis and Waschull (2021), Emmanouilidis et al. (2021), and Waschull and Emmanouilidis (2023) aim to derive a detailed method for realising AI solution systems in manufacturing, but their method only focuses on the requirement elicitation phase of the development process.

The study in this paper aims to fill the existing research gap by deriving a comprehensive development and implementation framework. With sufficient details in task recommendations, the framework should contribute to the practice of realising AI-incorporated solution systems that effectively integrate human work in manufacturing environments. Hereafter in this paper, ‘development’ is used as an encompassing term that includes implementation activities.

This paper uses the terms “human work integrated AI systems”, “human-AI collaborative systems”, “AI-incorporated solution systems”, and “AI solution systems” synonymously. Our understanding of such a human-AI system is based upon conceptual models suggested by Bousdekis et al. (2020) and Emmanouilidis et al. (2021), wherein human and AI actors mutually interact to enhance each other’s capabilities and to realise the overall functionality and efficiency of the system. In such a system, AI-enabled components can augment the situation awareness, data analysis, and decision-making capabilities of human actors. Conversely, human actors involved in the system, such as operators and technicians on shop floors, can support AI components by providing control and feedback on the AI components’ inference results (Bousdekis et al., 2020). These conceptual models suggest that human actors are integral components of the solution system. This paper considers that an AI solution system contains a technical and a social subsystem. The technical subsystem is an AI-enabled software or hardware system, and the social subsystem encompasses the human actors or human operational workflows interacting with the technical system.

We aim to derive a framework that encompasses the following features. First, it includes a conceptual model representing an end-to-end development process for AI solution systems. The post-deployment stage is considered beyond the scope of this study and thus not detailed in the model. Further, the framework is designed to be a determinant framework, wherein each process stage incorporates factors and tasks that are believed or have been found to influence development outcomes (Nilsen, 2015). We intend for these determinants to be sufficiently detailed and actionable for practitioners.

Second, the primary users of the framework are development project leaders with manufacturing backgrounds, such as production engineering. Such individuals driving the project are critical for successfully implementing AI solution systems, especially when front-line employees are the primary users of the systems. The project leaders should be able to deeply understand these internal customers’ operational workflows, work contexts, and cognitive processes, and effectively communicate the benefits, concerns, and limitations of the solution systems using the language of those customers.

Third, the framework addresses the lack of knowledge overlap by increasing the transparency of the development process, enabling development participants with diverse expertise and roles to understand and anticipate the activities of other participants at various stages of the process. Transparency can also contribute to effective collaboration with external parties when the manufacturing organisation needs to rely on external parties to acquire or co-develop an AI solution system (Merhi and Harfouche, 2024). With a better understanding of the overall development process, the internal team should be able to set realistic and well-scrutinised requirements for external collaborators.

While structuring such a framework is in its infancy in manufacturing research, dozens of development frameworks have been suggested in other research fields, such as AI applications in clinical settings (e.g., de Hond et al., 2022), human-AI interaction design (e.g., He et al., 2023), and software engineering (e.g., Lavin et al., 2022). Advancements in those fields can be used to derive a framework targeted at manufacturing. With this understanding, the present study endeavours to meet its aim by following three steps: 1) conduct a literature review to identify and analyse existing development frameworks for AI solution systems, which informs the state of the art, its shortcomings, and a strategy for the framework derivation; 2) derive a framework by combining and modifying the existing frameworks; 3) preliminarily validate the practical utility of the theoretically derived framework through external reviews by academic experts and industrial professionals, and pilot applications to real-world AI solution system development projects. This third step includes refinements of the framework based on the validation results.

The resulting framework comprises a six-stage process, with 147 task recommendations associated with these stages. The framework draws upon the SALIENT framework (van der Vegt et al., 2023b), proposed in clinical research, but substantially modifies it for the purpose of the study. The preliminary validation indicates that the framework offers several practical benefits. It facilitates deep and constructive dialogue among participants from multiple domains, particularly during the early stages of project planning, when teams explore solution concepts and identify potential development challenges. It also supports mid-term reflective planning by enabling teams to review project progress and align their future actions. However, the validation also revealed that the framework’s effective use depends on clearly communicating its purpose, providing informed facilitation during its application, and adapting it to the organisational context.

The remainder of the paper is organised as follows. Section 2 outlines the study method, and Section 3 presents the results of the literature review on existing AI development frameworks. Section 4 explains the framework proposed in this study and the critical considerations made during the theory-based derivation. Section 5 presents the results of the empirical studies and the refinements made based on them. Finally, Section 6 discusses the study’s contributions and limitations, as well as avenues for future research.

2 Research methods

This section explains the methods adopted for the literature review and theory-based framework derivation, as well as the empirical studies conducted for the preliminary validation of the framework.

2.1 The literature review and theory-based framework derivation

Articles published in English were searched using Web of Science (WoS), Scopus, and Google Scholar from 2013 until August 2024. Articles before 2013 were not searched because the present study concerns adopting contemporary AI technologies discussed under Industry 4.0 and 5.0; the former concept has gained broad attention since 2013. The following keywords were used in the search: 1) “AI” or “artificial intelligence”, AND 2) a combination of two words, the first from the list [development, implementation, adoption, design] and the second from the list [framework, process, methodology, method, guidance, guidelines] (for instance, “development framework” is one combination), AND 3) “manufacturing” or “production”.
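For illustration, the keyword combinations above can be enumerated programmatically. The following sketch reconstructs the boolean query structure from the two word lists; the exact syntax accepted by each search engine differs, so this is an approximation rather than the literal query strings used.

```python
# Illustrative reconstruction of the boolean search query described above.
# Assumption: simple AND/OR composition; actual database syntax varies.
from itertools import product

ai_terms = ['"AI"', '"artificial intelligence"']
first_words = ["development", "implementation", "adoption", "design"]
second_words = ["framework", "process", "methodology", "method", "guidance", "guidelines"]
domain_terms = ['"manufacturing"', '"production"']

# All 24 two-word combinations, e.g. "development framework"
combinations = [f'"{a} {b}"' for a, b in product(first_words, second_words)]

query = (
    f"({' OR '.join(ai_terms)})"
    f" AND ({' OR '.join(combinations)})"
    f" AND ({' OR '.join(domain_terms)})"
)
print(query)
```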

The keyword search found 1,037 and 965 articles from WoS and Scopus, respectively. Over 18,700 articles were found in Google Scholar. The search found articles from manufacturing and other areas, such as clinical applications, human-computer interaction (HCI), and AI ethics, attributable to keywords such as AI and development framework. As a complement, a search with the above keywords but without “manufacturing” or “production” was conducted on WoS to ensure that relevant articles from the non-manufacturing domains were found. This search found 7,231 articles.

The screening process consisted of two loops. The first loop was based on the title, abstract, and full-text scan, and the second was based on full-text analysis (Heyvaert et al., 2017).

For the first loop, the following screening strategy was adopted. First, articles presenting a process or determinant framework for AI application or AI solution system development were included. The following rating method was also applied during the first loop screening. Papers containing frameworks with the following features were rated as highly relevant: 1) targeted at the manufacturing domain, 2) targeted at human-AI collaborative system development, 3) comprehensively addressing issues and tasks across the development lifecycle, 4) containing issues and tasks that are detailed enough for practical use but not so detailed as to be limited to specific AI tasks such as AI vision.

A relevance saturation strategy was devised and applied to deal with the high number of search results on Google Scholar (over 18,700 hits) and on WoS without the search words manufacturing or production (7,231 hits). The search results were ordered by relevance, and the screening was executed starting with the most relevant one. The screening was terminated when more than 200 articles in a row were found irrelevant to the study’s purpose. We assumed that the chance of finding strongly relevant articles after the termination would be sufficiently low. Screening was terminated after 800 and 700 articles had been screened in the aforementioned result sets of over 18,700 and 7,231 records, respectively. After the first loop, 91 articles remained for the second loop screening.
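For clarity, the saturation stopping rule can be expressed as a short procedure. The sketch below is an illustrative rendering of the rule described above; is_relevant stands in for the manual screening judgement and is hypothetical.

```python
# Illustrative sketch of the relevance-saturation stopping rule.
# `records` is assumed to be sorted by relevance; `is_relevant` is a
# hypothetical stand-in for the manual screening judgement.
def screen_until_saturation(records, is_relevant, threshold=200):
    included = []
    irrelevant_streak = 0
    for record in records:
        if is_relevant(record):
            included.append(record)
            irrelevant_streak = 0  # streak resets on any relevant hit
        else:
            irrelevant_streak += 1
            if irrelevant_streak > threshold:
                break  # saturation reached; remaining records are skipped
    return included
```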

The second screening loop with full-text analysis adopted the same rating strategy to classify the articles into three classes. Class A articles were highly rated and considered to contribute significantly to the derivation of the framework. Class B articles were mildly relevant, and Class C articles were weakly relevant or irrelevant. Class C articles were excluded from the framework derivation. For Class A articles, a limited backwards and forwards snowball search (Wohlin, 2014) was conducted as a complement to identify potentially relevant articles. The same screening process was applied to those articles. After the second loop screening, 74 articles remained. The review results of those articles are presented in Section 3.

The literature review entails multiple potential limitations, such as publication, search, and selection biases, as well as the methodological quality of the included articles (Booth et al., 2022). Those limitations were mitigated by various strategies, such as conducting the literature search using multiple search engines with diverse sets of keywords, ensuring a structured and transparent screening process, prioritising the selection of peer-reviewed publications, and assessing methodological rigour during the full-text analysis.

As a result of the literature review, the SALIENT framework (van der Vegt et al., 2023b) was found to be the most relevant to the study’s aim. The theory-based derivation strategy that emerged was to adopt this framework as the base framework and reorganise it toward the study’s aim using the other reviewed articles. Section 4 provides a detailed account of the derivation strategy and the alterations to the base framework, which can be better explained after the presentation of the literature review results in Section 3.

The main author of this paper performed the derivation of the framework with the assistance of the co-authors. The authors have several years of academic and practical experience in the manufacturing sector. The methodological quality of the derivation process was further ensured by making the process transparent (Tong et al., 2012). A study protocol was created to document the changes to the base framework and their rationales. The protocol was reviewed by four research colleagues. The derivation process is detailed in Section 4.

2.2 Empirical studies for the framework validation and improvement

The empirical studies involved two steps. The first step was to conduct external reviews of the framework derived from the literature review, to initially assess the practical utility of the framework and identify improvements. The second step was to further validate the framework through pilot applications in real-world AI solution development projects. It should be noted that these were limited applications aimed at drawing preliminary validation results. Comprehensive real-world implementation across the entire development lifecycle was reserved for future research, as such validation requires longitudinal studies.

The external reviews aimed to gather feedback from the reviewers on the potential utility of the framework and identify areas for its improvement. This was considered a preparatory step for the second step of the real-world pilot applications. Two scholars and eight industry professionals, as shown in Table 1, participated in the reviews. All participants possess practical experience in developing AI solution systems in manufacturing environments. Half of the participants were manufacturing engineers and potential users of the framework, while the remaining participants were specialists in AI development, human-computer interaction design, and IT system development, providing multidisciplinary perspectives. All the participants were affiliated with research or manufacturing organisations located in Sweden. Companies A and B in Table 1 are large companies, whereas Company C is a medium-sized one.


Table 1. Participants for the external reviews.

The review sessions lasted one and a half to two hours. The main author of this paper drove these sessions. Each session began with him explaining the background and objectives of the framework development, as well as the purpose of the review. Then, the participant reviewed the framework and discussed its practical utility and suggestions for improvement. The researcher took notes during the sessions, and conversations were audio-recorded. The refinements based on the reviewers’ feedback were registered in the study protocol.

The subsequent step of the pilot applications aimed to gain further insights into the framework’s practical benefits and limitations and to support further refinement. The applications were made to projects at Companies A and B.

Company A operates a sintering process for producing fuel mass, a key component supplied to the energy industry. A process improvement team sought to apply machine learning techniques to enhance the predictability of output quality by utilising multiple process parameters. This would enable process operators and engineers to adjust parameters proactively to maintain consistent quality. During the project definition and business justification phase, task recommendations associated with Stages II and III of the framework (as presented in Section 4) were applied to help project members comprehensively understand the problem context and critically assess project scope and solution feasibility.
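As an illustration of the kind of feasibility-stage prototype such a use case might later involve (cf. Stage III of the framework), the sketch below fits a regression model to synthetic process-parameter data. The parameter names, data, and model choice are hypothetical and do not represent Company A’s actual implementation, which at the time of the study had not progressed beyond problem analysis.

```python
# Hypothetical feasibility sketch: predicting an output quality score from
# process parameters. All data here is synthetic; real retrospective process
# data would be used in an actual Stage III prototype.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 4))  # e.g. temperature, pressure, feed rate, duration (assumed)
y = X @ np.array([0.8, -0.3, 0.5, 0.1]) + rng.normal(scale=0.2, size=500)  # synthetic quality

model = RandomForestRegressor(n_estimators=200, random_state=0)
scores = cross_val_score(model, X, y, cv=5, scoring="r2")
print(f"Cross-validated R^2: {scores.mean():.2f} +/- {scores.std():.2f}")
```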

Three discussion sessions were conducted, during which project members discussed each task recommendation related to Stages II and III. Those sessions lasted a total of four and a half hours. The members included a process engineer, who also represented the process operators; an IT engineer representing the internal AI team; an improvement coordinator; and an automation engineer specialising in extracting data from the sintering equipment.

Company B, a manufacturer of large-scale power distribution equipment, had an ongoing project to apply AI’s image recognition capabilities to realise a real-time assembly quality assurance system. At the time of the empirical study, the project was approximately aligned with Stages IV and V of the framework. Two groups of project members participated in a two-hour discussion session in which they were asked to retrospectively consider how access to the framework from the early stages might have supported project execution.
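As a hedged sketch of what the inference step of such an image-based quality check could look like, the snippet below classifies an assembly image as OK or defective. The model architecture, class labels, and threshold are invented for illustration and are not Company B’s system.

```python
# Hypothetical sketch of an assembly quality check inference step;
# the model, labels, and threshold are illustrative only.
import torch
from torchvision import models, transforms
from PIL import Image

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

# Two assumed classes: 0 = OK, 1 = defect. Weights would come from training
# on the company's own image data; none are loaded here.
model = models.resnet18(weights=None, num_classes=2)
model.eval()

def check_assembly(image_path: str) -> str:
    image = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        probs = torch.softmax(model(image), dim=1)[0]
    return "defect" if probs[1] > 0.5 else "OK"  # illustrative threshold
```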

The first part of the discussion focused on reviewing the framework’s overall process, as illustrated in Figure 1. The framework’s compatibility with the agile philosophy adopted in the project was also discussed. The second part involved examining task recommendations associated with Stages II to V, as presented in Supplementary Appendix A. The participants included a project leader, a user experience designer, a software developer, three AI engineers, and an automation engineer.


Figure 1. A development framework for human work integrated AI systems in manufacturing. Each section (coloured rectangular boxes in the figure) contains task recommendations presented in Supplementary Appendix A.

In the sessions at Companies A and B, at least one of the authors was present to facilitate the discussions. All conversations were audio-recorded and subsequently analysed by the researchers.

3 AI solution system development frameworks in the literature

This section reports the literature review results. It contains three subsections corresponding to the review results in the manufacturing research domain, the clinical research domain, and the HCI, ethics, and other research domains.

3.1 Manufacturing research domain

The review identified seventeen articles relevant to development and implementation frameworks for AI solution systems.

Two articles, Kaymakci et al. (2021) and Pokorni et al. (2021), present development lifecycle process models. The former contains four stages: planning, experimentation, implementation, and operation; the latter eight: mobilise, check, understand, collect, analyse, develop, deploy, and operate. While these process models cover the development lifecycle comprehensively, the descriptions of task recommendations at each stage lack detail. For instance, at the planning stage of Kaymakci et al.’s (2021) model, the authors suggest that development stakeholders draft an initial version of an AI system, considering key elements such as input data, functionality, system performance, and usage environment. However, they do not provide further details about the actions necessary to produce such a draft. The lack of granularity in such recommendations makes it difficult for practitioners to translate them into specific interventions. Further, both models primarily focus on technical system development based on the CRISP-DM model (Getachew et al., 2024), paying limited attention to realising effective human-AI collaboration in the systems. The review also identified articles discussing enablers for AI implementation, such as assuring data quality and gaining support from vendors (Dora et al., 2022; Merhi and Harfouche, 2024). However, these enablers are too general to inform what actions should be considered or undertaken at which stage of the development process.

The reports from two preceding EU projects, ASSISTANT and STAR, were found relevant to our study. The former project aims to develop AI-integrated digital twin systems in manufacturing environments. Castañé et al. (2023) discuss a system development process that focuses primarily on technical system development. Under the same project, Buchholz et al. (2022) suggest considering responsible AI concerns in the development process by conducting an AI-incurred risk assessment.

The STAR project is closer to our study than the former, as it aims to develop an actionable methodology for human-AI collaborative systems in manufacturing (Emmanouilidis et al., 2021; Ipektsidis and Soldatos, 2021; Waschull and Emmanouilidis, 2022). The project proposes a method for structuring the requirement elicitation stage of the development process. While the coverage is limited to this stage, the recommended tasks are more detailed than those in other works in manufacturing research; the method suggests addressing various issues to foster effective human-AI collaboration, such as system trustworthiness, safety, explainability, and users’ error handling and feedback for system improvement.

Overall, the literature review indicates that research on developing a framework for AI solution systems in manufacturing is in its infancy. Previous studies either provide a comprehensive overview of the development process with insufficient detail in task recommendations or offer more detailed guidance only at a specific stage.

3.2 Clinical research domain

The review identified 24 clinical research articles relevant to the study objective. Analysis of these articles shows that methodological development is more advanced in this domain than in manufacturing.

Assadi et al. (2022) suggest a four-stage development process addressing 37 challenges along the process. van der Vegt et al. (2023b) present a five-stage implementation framework with 63 task recommendations related to those stages. de Hond et al. (2022) and Sendak et al. (2020) also present process models, albeit with fewer determinants. However, Sendak et al.’s (2020) framework is notable for being derived from the actual implementation of an AI solution system, which remains rare in the literature. Further, Nair et al. (2024) present a comprehensive overview of barriers, and preventative measures against those barriers, in the development and implementation lifecycle. Task recommendations from Tahvanainen et al. (2024), Zajac et al. (2023), and He et al. (2023) focus more on human-computer interfaces.

The review indicates that AI implementation in clinical settings shares multiple characteristics with manufacturing. First, AI applications in these settings often aim to assist clinicians and nurses in decision-making rather than supplant them. This often requires designing solution systems to enhance effective collaboration between humans and AI.

Second, several reviewed clinical research articles emphasise the importance of seamless integration of the AI system into clinical workflows (Reddy et al., 2020; Sandhu et al., 2020; Sendak et al., 2020; Fogliato et al., 2022; Gu et al., 2023; Wenderott et al., 2024). This echoes manufacturing environments, where flawless integration into operational workflows is critical. In manufacturing, workflow integration can be even more complex when multiple professionals with diverse knowledge and skills are involved in the system operation. However, the reviewed manufacturing articles scarcely addressed workflow integration in the development process.

Third, AI-incorporated clinical applications often involve high-stakes areas, such as pathology diagnosis (Gu et al., 2023) and sepsis detection (Sendak et al., 2020). In such areas, human involvement is crucial in ensuring the reliability of AI solution systems. AI solution systems are frequently applied to high-stakes areas in manufacturing (Lee, 2020).

Among the suggested frameworks, the review finds the SALIENT framework (van der Vegt et al., 2023b) to be the most relevant to our study’s purpose. This framework was developed to address the lack of a structured, end-to-end approach for implementing AI in clinical settings. It integrates established clinical study reporting standards, such as TRIPOD-AI (Collins et al., 2021) and CONSORT-AI (Liu et al., 2020), with implementation stages, bridging the gap between research and real-world deployment. The framework was informed by a scoping review and gap analysis, aiming to guide healthcare organisations through safe, effective AI adoption. It contains a process framework with five stages, from Stage 1 (preparation) to Stage 5 (clinical validation and deployment), and identifies 63 task recommendations related to those stages. Our study adopts this framework as the base framework for the following reasons.

First, van der Vegt et al. (2023b) aim to structure an end-to-end clinical AI implementation framework to be used by a broad audience of stakeholders involved in developing, testing, and deploying AI-based decision support systems in clinical settings. The framework’s end-to-end nature and its intention to integrate multiple domains in development align well with our study objective.

Second, the framework identifies four components of an AI solution system: workflow, HCI (Human-Computer Interaction), AI model, and data pipeline. The framework does not provide exact definitions of those components. Our study interprets the workflow component as standardised human-executed operational procedures that immediately interact and cooperate with the technical system. The HCI, AI model, and data pipeline components constitute a technical system. The HCI component is a human-computer interface part of the technical system. The AI model component is one or more ML-driven programs that make predictions based on data. The last component is a set of methods that collect, store, process, and present data for inputs or outputs of the AI model component. This component includes the hardware realising these methods. Identifying these four components, especially the workflow component, is relevant to manufacturing, as the importance of its integration has scarcely been argued in the manufacturing literature. Separating the four components is also beneficial as they are associated with specific roles in AI solution system development. The workflow component development is often associated with manufacturing domain experts, the HCI component with user experience or interaction designers, AI models with data scientists, and data pipelines with data engineers and software engineers.
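To make this component decomposition concrete, the sketch below models the four components and the development roles typically associated with them as simple data structures. This is our illustrative reading of the base framework, not a formalisation provided by van der Vegt et al. (2023b).

```python
# Illustrative model of the four AI solution system components and the
# development roles typically attached to each (our interpretation).
from dataclasses import dataclass

@dataclass
class Component:
    name: str           # component identifier
    subsystem: str      # "social" or "technical"
    typical_roles: tuple

SOLUTION_SYSTEM = [
    Component("workflow", "social", ("manufacturing domain experts",)),
    Component("HCI", "technical", ("user experience / interaction designers",)),
    Component("AI model", "technical", ("data scientists",)),
    Component("data pipeline", "technical", ("data engineers", "software engineers")),
]

for c in SOLUTION_SYSTEM:
    print(f"{c.name:13s} [{c.subsystem}] -> {', '.join(c.typical_roles)}")
```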

Third, this base framework identifies 63 task recommendations for these components’ development, integration, monitoring, and reporting. This adds granularity and comprehensiveness to the framework and clarifies task allocation to different roles in the development. Fourth, the framework identifies cross-stage project management tasks such as stakeholder engagement, communication, and consideration of ethics, privacy, transparency, and equality. This is relevant to our purpose because the aimed framework targets development project leaders. Fifth, the framework is validated by mapping enablers and barriers found in the literature reporting real-world implementations of AI solution systems in clinical settings (van der Vegt et al., 2023a). This study shows that the framework can account for these factors, implying some empirical validity.

Despite the advancements in this base framework, several limitations persist with respect to our study objectives. Four notable ones are as follows. First, the framework is articulated in the context and language of clinical practice. For instance, one task recommendation is to “Identify target patients and sample sizes”. Such descriptions need to be translated into more general or manufacturing-specific language. Second, task recommendations are less comprehensive at earlier stages. This is probably because the framework is derived from clinical reporting standards such as TRIPOD-AI (Collins et al., 2021) and CONSORT-AI (Liu et al., 2020), which are oriented toward system evaluation rather than system specification and development.

Third, many task recommendations are assigned over multiple stages. For instance, the task “Develop end-to-end data pipeline” is assigned across Stages II to V in the base framework. This makes it unclear when a specific task should be conducted. For better conceptual structuring, a task recommendation could be associated with a particular stage of the development process. Fourth, the exit state of each stage is unclear. For example, Stage III involves building a technical system and conducting a laboratory test. The framework, however, does not describe what level of maturity or readiness the system should achieve at the end of the stage.

The literature review indicates that these shortcomings can be addressed by drawing on other articles reviewed in the study, as described in Section 3.4.

3.3 HCI, ethics, and other domains

The literature review identifies relevant articles from HCI, AI ethics, and other domains such as software engineering.

3.3.1 HCI research

The review identifies fifteen articles related to HCI research. These articles contain process frameworks and task recommendations for AI-enabled application development targeted at consumers, public sectors, and the healthcare domain. None of them is specific to manufacturing.

Many of these articles focus on the design phase of the development life cycle, aiming to envision and specify key features of AI solution systems (Colombo and Costa, 2024; Kawakami et al., 2024). They mainly concentrate on human-machine interaction design, making their suggestions particularly relevant to the HCI component of the base framework. Some recommendations pertain to the workflow component when the articles adopt design thinking to capture user requirements and their contexts (e.g., Amershi et al., 2019; Schleith and Tsar, 2022; Kawakami et al., 2024). Several articles stress the importance of coherent interaction between human and AI-enabled functions and advocate for co-designing system features where users, interaction designers, AI developers, and other stakeholders participate in the design process (e.g., Colombo and Costa, 2024; Lee et al., 2024; Tahvanainen et al., 2024). This enhancement of collaboration aligns with our framework objective.

Although these articles mainly focus on the early stages of AI solution system development, they are rich in task recommendations. For instance, Amershi et al. (2019) suggest 18 human-AI interaction (HAI) design guidelines covering user interface communication, error handling, and system maintenance, such as “Help the user understand how often the AI system may make mistakes” and “Enable the user to access an explanation of why the AI system behaved as it did”. He et al. (2023) propose 25 HAI guidelines for AI-assisted clinical decision systems, Lee et al. (2024) suggest 20 recommendations about user interface design, and Kawakami et al. (2024) introduce over 40 deliberate questions for the design of AI applications for the public sector to facilitate early, cross-functional conversations about the goal and intended use of the system and about societal and legal considerations. Moreover, large information technology companies such as Google, Microsoft, and IBM publish HAI design guidelines. For instance, the Google People + AI (PAIR) research team proposes 23 guidelines (Google, 2024).

Overall, these suggestions and deliberate questions address human-computer interface design to enhance effective human-AI collaboration, including error handling and user feedback, system transparency, explainability, and trustworthiness. At the same time, views differ on exactly when these recommendations should be applied in the development lifecycle. For instance, PAIR notes that their design guidelines can be used for initial user requirement elicitation before prototyping or for evaluation and refinement with built prototypes (Google, 2024). The guidelines suggested by Kawakami et al. (2024) focus more on the latter.

3.3.2 AI ethics research

The thirteen reviewed articles focus on establishing trustworthy AI in system development. They seek practical means of implementing ethical concerns, such as transparency, safety, fairness, security, and privacy, in AI systems so that the resulting systems become trustworthy to stakeholders.

Many of them suggest evaluation frameworks for trustworthy AI. For instance, an expert group on AI set up by the European Commission suggests a self-assessment tool (ALTAI) that helps organisations evaluate the trustworthiness of AI systems under development (Ala-Pietilä et al., 2020). The tool poses over 60 questions across seven requirements, such as 1. Human Agency and Oversight, 2. Technical Robustness and Safety, 3. Privacy and Data Governance, and 7. Accountability. Brundage et al. (2020) propose ten recommendations to secure AI trustworthiness. Madaio et al. (2020) present a fairness checklist, and Raji et al. (2020) suggest conducting an FMEA-like process to assess potential ethical risks. Further, Zicari et al. (2021b) propose a trustworthiness audit framework for AI applications.

These articles broadly address issues and provide task recommendations to ensure trustworthy AI in system development. However, these are often presented as self-contained evaluation frameworks not deeply integrated into the AI solution development process. The lack of integration makes it unclear at which stage of the development process these trustworthiness concerns, assessments, and audit procedures should be addressed or executed.

3.3.3 Other research domains

The literature review found one article from business and organisation research (Makarius et al., 2020), which features a high-level process of employee acceptance of AI technology. However, this is an organisation-level activity rather than a project-level activity. Four articles are from software engineering. These articles discuss and suggest development processes and task recommendations but focus only on software development. Park et al. (2024) present an AI software system architecture design methodology, and Steidl et al. (2023) present an AI-integrated software development lifecycle.

Lavin et al. (2022) and Martínez-Plumed et al. (2021) integrate the Technology Readiness Level (TRL) into the AI software development process and suggest development stages and task recommendations associated with the TRL levels. We consider the TRL integrations relevant to our study objective because they would clarify expected outputs from task recommendations and development stages, which the base model discussed in Section 3.2 lacks. Clarifying the expected outputs may increase the framework’s utility by helping practitioners anticipate the resources required to achieve a specific task recommendation or development stage.

3.4 Strategy for deriving a development framework

To summarise, the literature review shows that AI solution system development frameworks in manufacturing research are in their infancy. The SALIENT framework from clinical research (van der Vegt et al., 2023b) was found to be the most relevant to the aim of the present study, despite several limitations. The review indicates that these limitations can be addressed by incorporating other reviewed frameworks.

With this understanding, a basic strategy for the theory-based framework derivation emerged as follows: 1) the SALIENT framework is used as the base framework; 2) as suggested by the base framework, the workflow, HCI, AI model, and data pipeline components are considered those constituting an AI solution system; 3) the descriptions of the process stages and task recommendations in the base framework are rewritten so that they are not specific to the clinical context; 4) the less comprehensive task recommendations at the earlier stages of the base framework are complemented by those found in HCI research (e.g., Colombo and Costa, 2024; Kawakami et al., 2024) and the STAR project reports in manufacturing (Emmanouilidis et al., 2021; Waschull and Emmanouilidis, 2022; 2023); 5) the base framework’s ambiguity in the exit state of development stages and the anticipated outputs of task recommendations is addressed by integrating the TRL-based frameworks (Martínez-Plumed et al., 2021; Lavin et al., 2022; Tahvanainen et al., 2024); and 6) trustworthy AI evaluation frameworks (e.g., Zicari et al., 2021b; Kinney et al., 2024) are referred to in the framework. However, their comprehensive integration into the AI solution system development process is deferred to future research, as it necessitates substantial additional study.

4 AI solution system development framework

This section explains the AI solution system development framework, which was initially derived from the literature review and refined based on the empirical validation. Details of how the latter contributed to minor refinements are described in Section 5.

4.1 The framework overview

Figure 1 presents an overview of the development process, comprising six conceptual progression stages. Stage I, Needs finding, involves exploring and identifying improvement opportunities in manufacturing that can potentially be resolved by AI-based systems. Stage II, Preparation and specification, involves a deeper analysis of a promising improvement idea identified in Stage I. The analysis is conducted from the perspective of the AI solution system components to establish initial user and system requirements. The outcome of this stage supports the assessment of whether the analysed use case has sufficient potential for further progression. In Stage III, Feasibility study, early AI model prototypes are built to evaluate technical feasibility and to assess the solution system’s potential value for users and other organisational stakeholders. The stage provides a basis for deciding whether the use case warrants further development and implementation, which would commit significantly more resources to its realisation.

In Stage IV, Development and evaluation at experimental environments, each AI solution system component is further developed, and the technical components (HCI, AI model, and data pipeline components) are integrated into a technical system, which is tested in experimental environments with limited integration with the workflow component. Stage V, Development and evaluation at operational environments, involves further refinement of AI solution system components, full integration of all components, pilot runs, and system monitoring and reporting at the operational environments. At Stage VI, System validation and deployment, the AI solution system is validated, audited and approved for deployment. The outcome of this last stage is the system in operation. The framework does not contain the details of the post-deployment stage, as it is considered outside of the study scope. As illustrated in Figure 1, anticipated activities in this stage include continuous monitoring of system performance, tracking data and model drifts, ML model retraining, version control, and stakeholder communication on system updates (Kawakami et al., 2024; Nair et al., 2024).

As described in Figure 1, TRLs are associated with the progression stages, such as TRL 3 for Stage III and TRLs 4 and 5 for Stage IV, referring to the TRL for AI software development (Martínez-Plumed et al., 2021). Each stage in Figure 1 contains sections depicted as coloured rectangular shapes, corresponding to the development, integration, and monitoring of the AI solution system components. For instance, the blue-coloured rectangular boxes in Figure 1 denote the development of the workflow component. Alongside those stages, there is a cross-stage project management section. Together, the sections contain a total of 147 task recommendations, as presented in the tables in Supplementary Appendix A. These tables use the same colour coding as the corresponding rectangular shapes in Figure 1 to maintain visual consistency. Examples of the task recommendations for the workflow section in Stage II are presented in Table 2. In Supplementary Appendix A, each task recommendation is accompanied by its expected outcomes and literature sources, indicated by superscript numbers. The references corresponding to these superscripts are listed in Supplementary Appendix B. For example, task recommendations marked with superscript “1” are derived from or informed by van der Vegt et al. (2023b), who proposed the base framework. Among the reviewed articles, forty-five directly contributed to adding, modifying, or strengthening task recommendations. The others contributed to increasing the researchers’ understanding of the AI solution development process and its contents. Some task recommendations were identified or refined based on the external reviews, which are also superscripted in Supplementary Appendix A.


Table 2. Examples of task recommendations for the workflow section in Stage II.
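To illustrate how such task recommendations could be operationalised, for instance as the checklist or documentation structure that reviewers later suggested (see Section 5.1), the sketch below models a single Appendix A entry as a record. The field names and example values are hypothetical and do not reproduce the appendix verbatim.

```python
# Hypothetical record structure for a task recommendation entry; the schema
# and the WF1 example values are illustrative, not the authors' actual data.
from dataclasses import dataclass, field

@dataclass
class TaskRecommendation:
    task_id: str               # e.g. "WF1" (workflow) or "PM14" (project management)
    stage: str                 # "I" .. "VI"
    section: str               # workflow / HCI / AI model / data pipeline / ...
    description: str
    expected_outcome: str
    sources: list = field(default_factory=list)  # superscript reference numbers
    outcome_notes: str = ""                      # filled in as the project progresses

example = TaskRecommendation(
    task_id="WF1", stage="II", section="workflow",
    description="Understand and document the current operational workflow (assumed wording).",
    expected_outcome="A shared, documented understanding of the as-is workflow.",
    sources=[1],
)
```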

4.2 Similarities to and differences from the SALIENT framework

The literature review-based framework is derived by applying the framework derivation strategy outlined in Section 3.4. First, the base framework’s process stage descriptions and task recommendations are changed to better suit manufacturing contexts. The base framework is then integrated with TRL for AI software development (Martínez-Plumed et al., 2021) to relate the stages with system maturity progression, identify anticipated outcomes of each task recommendation, and associate these recommendations with the stages.

The reviewed articles from the HCI research and the STAR project are used to add and refine the task recommendations, especially at Stages II and III. The remaining articles from clinical research, manufacturing, and software engineering are used to add or refine task recommendations. The trustworthy AI-related articles are used only to refine the project management section, adding that ALTAI or similar self-assessments should be conducted in the earlier and middle stages (Zicari et al., 2021a; Bousdekis et al., 2024) and an extensive trustworthiness audit at later stages (Zicari et al., 2021b).

The derived framework inherits several key features from the base framework. A notable one is the identification of sections within or across the process stages. The sections depicted in Figure 1, such as workflow, HCI, technical system integration, solution (socio-technical) system integration, and cross-stage project management, are informed by the base framework. Stages II to VI in the derived framework correspond to Stages 1–5 in the base framework. Several task recommendations in the derived framework are based on those in the base framework, as indicated in Supplementary Appendices A and B.

Beyond those inherited elements, substantial modifications have been made based on the literature review. Key differences from the base framework are described below. First, Stage I, Needs finding, is added to the base framework. This is done because we agree with a few reviewed articles addressing the criticality of “doing the right thing” before focusing on “doing things right” (Subramonyam et al., 2022; Yildirim et al., 2023). These authors argue that, even from the ideation stage, potential users and other stakeholders should understand AI capabilities, including what it can and cannot do, and have conversations with data scientists to avoid working on use cases that are found in later stages to be technically unattainable or to generate too little value for the users and stakeholders. Most reviewed articles start with Stage II of the derived framework, assuming the development team has already identified a use case to scrutinise further.

Second, the total number of task recommendations increased from 63 to 147. Task recommendations are especially enriched at Stages II and III. For instance, the derived framework contains 22 task recommendations in the workflow section at Stage II, whereas the base framework has three in the corresponding section. Stage III’s workflow and HCI sections are not in the base framework’s corresponding stage (Stage 2). The base framework identifies only the AI model and data pipeline sections, recommending building early AI model prototypes with retrospective datasets. However, the literature review indicates, and the external reviews later confirm, that early evaluation of HCI design and user value assessment based on the prototypes should be undertaken at this stage (Wiens et al., 2019; Colombo and Costa, 2024; Kawakami et al., 2024; Tahvanainen et al., 2024).

Third, the TRL integration in the derived framework clarifies the anticipated outcomes of process stages and task recommendations, which are ambiguous in the base framework. Further, the framework incorporates aspects of the design-development-evaluation iterations occurring throughout the development process, which are not explicit in the base framework. Zajac et al. (2023) use the metaphor of “growing” in AI solution system development. These authors and others (Schleith and Tsar, 2022; Colombo and Costa, 2024) recognise that the maturity of AI solution systems is incrementally increased through those iterations. In the framework, several task recommendations are repeated across multiple stages, each anticipating different outcomes and maturity levels.

4.3 Remarks about the derived framework

Several remarks about the proposed framework are warranted. As stated in Section 1, the framework is intended to assist in developing AI solution systems that are well synchronised with human actors in manufacturing environments. The primary users of the framework are production engineers driving AI solution system development. The framework should enhance the transparency of the development process and task contents among multi-disciplinary participants, enabling them to share an understanding of development progress and of each other’s expected tasks and outcomes. This may enhance deep and proactive collaboration among the development participants.

It should be emphasised that the progression stages in the framework in Figure 1 are conceptual. They are intended to provide an overall structure that helps development stakeholders anticipate high-level planning, identify key milestones and deliverables, and share an understanding of the development progress. In reality, especially in iterative development environments, these stages may overlap and iterate considerably. At the least, the literature review indicates that Stages II and III often iterate (Colombo and Costa, 2024), implying that the boundary between these stages in practice is not as explicit as it appears in the framework. Empirical findings presented in Section 5 further illustrate that some development projects may transition from use-case ideas (Stage I) to building AI prototypes (Stage III) without in-depth problem analysis and solution specification at Stage II. This may still be viable, as prototypes often act as boundary objects (Broberg, 2011) that enhance and concretise the analysis and specification of Stage II. Additionally, some projects may experience varying progression speeds across system components and their subcomponents, suggesting that progression alignment may not be as rigid as the framework implies.

Some remarks about the task recommendations are also in order. Task recommendations in different sections are written from the perspectives of different development roles. For instance, workflow sections are written from the perspective of manufacturing domain experts, HCI sections from that of interaction designers, AI model sections from that of data scientists, data pipeline and system integration sections from that of data and software engineers, and project management sections from that of project leaders.

The granularity of the recommended tasks is aligned with that of the base framework and with the derived framework’s intention that development participants with different roles can understand others’ tasks to deepen collaboration.

5 Results of the empirical studies

This section presents the results of the empirical studies. The first subsection reports the results of external reviews and refinements made based on their feedback. This is followed by the report on pilot applications of the framework in real-world projects. The third subsection discusses the practical utility of the framework implied by these empirical studies.

5.1 Initial feedback on the practical utility and improvement suggestions

Reviewers were generally positive about the framework structure, especially the separation of the sections corresponding to AI solution system components and different roles in the development. Reviewers R4 and R9 commented that since the framework contains many task recommendations, assigning them to sections within the process stages helped them comprehend the process overview. R9 noted that the separation into sections clarifies the interplay between technical system development and human system integration.

Respondents recognised the framework’s potential to facilitate a focus on internal customers (i.e., users of the AI solution system), cross-functional communication, front-loading of development concerns, and the manufacturing domain’s ownership of the development process.

R8 emphasised that candid and unpretentious dialogue among development participants, such as between AI experts and operators, was crucial for successfully developing AI solution systems. Humans’ and systems’ capabilities and limitations should be openly and constructively communicated throughout development. He, along with R3, R4, and R6, perceived that the framework structure and the richness of the task recommendations would make the development process more transparent and comprehensive, fostering such communication. R7 commented that the multitude of tasks in the workflow sections would strengthen the system users’ perspective in the development and, thus, the internal customer focus.

R5, R6, and R9 observed that the companies’ information system development processes were predominantly designed for and owned by the IT departments, making them less informative and integrative across other domains. They discussed that realising effective human-AI collaborative systems requires the deep involvement of stakeholders from multiple domains, especially manufacturing experts. R5 stated that the framework, with multiple task recommendations related to manufacturing domain experts, could foster their greater engagement, responsibility, and accountability in AI solution development.

Reviewers’ feedback also covered how the framework can be practically employed at their workplaces. They identified multiple uses for the framework: it has strong potential to be integrated into the company’s gate model for AI solution system development (R7, R8). It can be utilised as educational material for AI development stakeholders so that they understand the process overview and what tasks would be expected during the development (R9). It can be employed to identify an organisation’s weaknesses in specific stages or sections (R9). R8 said the framework could be used to prepare system design and evaluation workshops and other integration events during development; it could serve as a checklist to determine key agendas and questions to be addressed on those occasions. R7 remarked that the framework could be used as a documentation framework, wherein the outcomes of each task recommendation would be recorded. This documentation could be used to communicate project progress, key decisions, and their rationale to internal or external stakeholders. Task recommendations, especially those at Stage II, can be used to form system supplier benchmarking items for comparing suppliers’ capabilities and offerings (R9). R10 noted that while the comprehensiveness of the task recommendations was valuable, the relevance of individual recommendations could vary depending on the use case.

Table 3 summarises reviewers’ suggestions for improvement and the refinements made based on them.

Table 3. Suggestions for framework improvement and its subsequent refinement.

Overall, their inputs led to minor adjustments in the framework’s process description and task recommendations. For example, R6 suggested that the tasks in Stage II’s workflow section should be executed earlier than those in the other sections of the same stage. This would allow the development team to concentrate on understanding the problem and its context rather than immediately focusing on technical solutions. The present study deemed this feedback pertinent and updated the description of the workflow section in Stage II.

Some suggestions were deferred to future study. For example, R7 and R10 wished the framework included task recommendations for post-deployment, as they considered that a certain degree of design-development-evaluation iteration would continue in that phase. This was not reflected in the framework improvement because identifying post-deployment activities lies outside the scope of the present study, although it is acknowledged as an area for future research.

Many other suggestions concerned the granularity of task recommendations, indicating the challenge of determining an appropriate level of abstraction for their descriptions. For example, R9 noted that the risk of “knowledge silos”, where a few individuals holding critical knowledge of the AI solution system development leave the organisation, should be one of the risk assessment items. Two task recommendations (WF33 and PM14 in Supplementary Appendix A) pertain to risk assessments. However, this suggestion was not reflected in the refinement of these task recommendations, as identifying and listing all possible risk items was beyond the scope of this study. Nonetheless, the argument implies that the appropriate level of granularity depends on the organisation’s or project’s specific needs and contexts. The framework at least provides a foundation for such adaptation.

5.2 Pilot applications

As outlined in Section 2.2, the framework, refined after the external reviews, was applied to real-world projects at Companies A and B to further examine its practical utility.

5.2.1 Company A

At the company, the participants went through each task recommendation associated with Stage II, and some associated with Stage III, to gain and share a detailed understanding of the problem and its context and to assess the feasibility and viability of the potential solution. The participants referred to these discussion sessions as a ‘reality check’. Outcomes of the task reviews were documented in a spreadsheet.

Cross-domain learning was a recurring feature of these discussions. For instance, discussions of tasks such as WF1, WF6, WF7, and D4 led the process engineer to explain the details of the sintering process, operators’ routines for adjusting controllable process parameters, and methods for gauging fuel mass quality to other participants who were less familiar with the problem area. The same engineer, who had limited knowledge of ML techniques, raised questions regarding concepts such as ‘data quality’ (D3) and ‘data shift’ (AI4). This led to a discussion of the data dependency of AI models and how data shift could occur and affect system performance in this specific use case. The solution system’s possible level of automation, prompted by HC5, led to a discussion of how that level depends on ML model performance. Regarding data sources and their accessibility (D3, D6), the automation engineer explained that some process parameters had been measured and stored for years, while others were relatively new, resulting from the company’s digitisation initiatives. The IT and automation engineers explained the heterogeneity of the current information system, referring to D6: the relevant data were collected at varying frequencies across multiple databases with different security classes.

Alongside this learning, participants required additional clarification or explanation from the researchers on some task recommendations, particularly the rationales behind them. For example, the process engineer and improvement coordinator wanted to understand the reason behind the task ‘Describe how frequently the problem occurs’ (WF5). The IT engineer explained how this could affect interface design, AI model selection, and data flow design, but still required confirmation from the researchers, who understood the intention behind the task’s formulation. The researchers clarified other tasks as well, such as the level of detail appropriate for the task ‘describe workflow’ (WF7).

Participants repeatedly emphasised the value of conducting an early and thorough examination of the project. It would help them anticipate potential issues that might emerge during development, as well as the complexity, competence, and workload required to develop the solution system. It would also indicate which features or functionalities of the solution system should be deployed first and which later, as well as the potential for business justification. The participants acknowledged that the framework helped facilitate this examination process. Documenting the discussion results for each item was considered essential for maintaining project traceability and communicating effectively with those external to the project team. During the examination, the IT engineer further emphasised the importance of involving multi-domain stakeholders, citing a previous case in which a data scientist developed an ML prototype without involving key stakeholders; that project experienced a prolonged delay until a consensus was reached among them.

The iterative and evolving nature of the development was mentioned several times. The participants recognised that some task recommendations, such as “Consider why AI should be used and not other means” (WF9) and “Describe acceptable level of system performance” (WF12), could not be answered at once but rather through iterations of experimentation, development, and evaluation. They acknowledged the interdependencies among the system components and task items, and thus the necessity of revisiting those items, as design decisions and their contexts would change continuously during development.

In the post-session reflection, the participants perceived the task recommendations as a meaningful checklist, facilitating in-depth early examination and iterative dialogue among cross-domain stakeholders. They thought the checklist could also be used for mid-term project reviews to share an understanding of the project’s progression and remaining issues. The process engineer noted that reviewing task items together with other domains provided learning opportunities, especially for those with limited experience in AI system development projects.

5.2.2 Company B

The participants were asked to retrospectively consider how the framework would have supported the project had it been available from the early phases of the development. The discussion primarily focused on Stages II through IV of the framework. The identified benefits were similar to those found at Company A: the framework could be used as a checklist for the project’s early examination and mid-term reviews, and to facilitate stakeholder involvement. The project leader and software developer commented that the framework contained insightful control questions that could only be answered by specific domain experts. This would facilitate involving the right people at the right time to make proper decisions in the project. The leader further remarked that the task recommendations associated with Stage II would help validate project ideas and foster early and mature dialogue among stakeholders to identify the project scope, including the solution system’s purpose, requirements, boundary conditions, and constraints. This would reduce the late scope changes that he had experienced in the project. An AI engineer who had joined the project recently expressed that documenting the results of those task recommendations would have helped in understanding in detail why the project was initiated. The user experience (UX) designer commented that the framework could be used for project planning and mid-term reviews to understand the project’s general progression and to reflect on the differences between plans and execution outcomes. The same person further remarked that the task recommendations related to HCI and workflow would form a helpful checklist for planning and evaluating interface design workshops with assembly operators. Referring to data exploration and data in deployment (D1, D2, D5), another AI engineer reflected that the project should have explored more diverse data in different operational environments and system use scenarios to prevent the late algorithm changes they encountered. Overall, participants acknowledged that while the number of task recommendations was somewhat overwhelming, the recommendations were deemed necessary for effective project execution.

Besides the discussed benefits of the framework, participants identified several ambiguities and limitations, which led to its refinement, particularly in the process overview. As mentioned in Section 4.3, the staged process model is conceptual, representing the general development progression and the solution system’s maturity. The same section noted that multiple iterations would occur within or among stages, and that some tasks and stages might be conducted earlier or later than described in the framework. Although this was communicated during the sessions, the overview presentation before the refinement was susceptible to the misinterpretation that the stages were to be followed sequentially and that tasks were strictly confined to their respective stages. This misconception led to confusion among some participants. The reality of iterative development was frequently mentioned during the sessions. For instance, the project leader and UX designer noted that early AI model prototypes were created (associated with Stage III) before the project scope was detailed (Stage II). The software developer and automation engineer perceived that Stages IV and V were conducted nearly simultaneously, as the project sought to test an integrated solution in an operational environment as early as possible to identify potential implementation issues. This was made possible by using one of the assembly stations as a testbed. They also noted that different system components matured at varying rates and that previously closed issues often reopened due to late changes in other components. Furthermore, the project prioritised deploying a subset of quality detection features, implying that some system functionalities would be approved (Stage VI) earlier than others.

While participants acknowledged the value of the waterfall-style representation for providing a high-level structure of the development progression, the researchers concluded that the model should more explicitly convey its applicability to iterative development contexts. The following refinements were made, resulting in Figure 1. The stages in the process model are expressed as “conceptual progression stages” to emphasise that they are only a conceptual representation and do not rigidly prescribe the sequence of development project activities. A note on the model’s relevance to iterative development was added to Figure 1, and the arrows between stages were made bidirectional to emphasise the iterative nature of the development. A minor refinement of the task recommendations was also made: two task recommendations associated with the AI sections at Stages II and III were merged into others, as the AI engineers noted, and the researchers agreed, that the separation was redundant.

One limitation noted by participants was not addressed in the current refinement. The developer and automation engineer noted that while several tasks in the data pipeline sections were relevant to software developers who integrated the system into existing operational hardware and software, many of the tasks were detailed mainly for roles directly involved with AI models, such as data engineers. They discussed that software integration details, such as data exchange across hardware, software, and various system APIs, were only broadly represented under the task “fully integrating the solution system into the existing hardware and software systems in the operations” (TS11, ST3). The developer further remarked that software integration can be a challenge and a potential bottleneck, although this is not unique to AI projects. The same participant suggested that the organisation’s routines, practices, or checklists for software integration could meaningfully complement the framework. The researchers reflected that detailing software integration tasks could be a valuable extension of the framework. However, this was considered outside the scope of the present study, presuming that such knowledge is already accumulated within organisations.

5.3 Implications for the framework’s practical use

This subsection draws implications for the practical use of the framework as a summary of the empirical findings. Overall, no substantial disagreements were observed among practitioners regarding its benefits and limitations.

Several benefits were acknowledged. The framework serves as a useful checklist for planning and evaluating AI solution system development projects. In particular, tasks associated with Stages II and III facilitate early, mature, and iterative dialogue among multi-domain stakeholders to clarify the project scope and constraints and to identify potential issues in the development. Such dialogue not only offers learning opportunities but also helps to bridge knowledge gaps and fosters shared understanding and alignment among stakeholders from diverse domains. The framework was also found to support mid-term project reviews to evaluate discrepancies between plans and outcomes, examine addressed and unaddressed issues, and coordinate future actions. Documenting task review outcomes contributes to project traceability and strengthens internal and external communication.

However, the preliminary validation indicates that the current framework should be applied with careful consideration. First, the intent of the process model and the association of tasks with process stages must be communicated clearly. The model and task association are intended to depict a conceptual, high-level progression of development activities, rather than to prescribe strictly sequential project execution. Second, the effective use of the task recommendations may require guidance from individuals with an in-depth understanding of their underlying rationale and intent. The presence of such persons facilitates multidisciplinary dialogue and learning and prevents misunderstandings about the recommendations’ purpose and use. Third, while the granularity of the task recommendations is set to enhance transparency in development and collaboration among stakeholders, it can be further expanded by utilising the organisation’s existing knowledge, such as established routines or control questions for software integration.
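As an illustration of the third consideration, the sketch below shows one way an organisation might expand a framework task with its own control questions for software integration, as the Company B participants suggested. The extension mechanism and the question texts are illustrative assumptions, not part of the framework; only the task identifier TS11 and its description come from the framework.

```python
# Hypothetical sketch: complementing a framework task (TS11, system
# integration) with organisation-specific control questions. The question
# texts below are invented examples.
framework_task = {
    "id": "TS11",
    "description": ("Fully integrate the solution system into the existing "
                    "hardware and software systems in the operations"),
}

org_control_questions = {
    "TS11": [
        "Which existing system APIs will exchange data with the AI components?",
        "Are data formats and sampling frequencies aligned across databases?",
        "Who approves access to data sources with different security classes?",
    ],
}

def expanded_checklist(task: dict, extensions: dict) -> list[str]:
    """Combine a framework task with the organisation's own control questions."""
    return [task["description"], *extensions.get(task["id"], [])]

for item in expanded_checklist(framework_task, org_control_questions):
    print("-", item)
```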

6 Conclusions and discussion

The study presented in this paper aims to derive an AI solution system development framework intended for project leaders in manufacturing domains, to facilitate shared understanding, effective communication, and deep collaboration among multi-domain development participants. It also aims to preliminarily validate the framework through external reviews and pilot applications to real-world projects. The framework draws upon the SALIENT framework, but significant alterations were made to meet the study’s purpose. The preliminary validation indicates that the framework provides a structured means to support reflective, collaborative, and traceable development practices in AI solution system projects. However, its effective use is further enhanced by clear communication of its purpose, informed facilitation during application, and ongoing adaptation to organisational contexts, ensuring that teams can work cohesively toward shared goals and aligned outcomes.

6.1 Contributions

This paper makes several contributions to manufacturing research. First, it theoretically advances the structuring of AI solution system development processes, which has lagged behind other research domains. A few previous works in the manufacturing area present high-level AI solution system development processes but lack detailed task recommendations (Kaymakci et al., 2021; Pokorni et al., 2021). The works by Emmanouilidis and Waschull (2021), Emmanouilidis et al. (2021), Waschull and Emmanouilidis (2022), and Waschull and Emmanouilidis (2023) suggest more detailed tasks but focus mainly on the user-side solution system specification stage. Among the 147 task recommendations in the derived framework, only 48 have sources in the reviewed manufacturing literature. This discrepancy indicates the increased comprehensiveness of the derived framework, signifying the advancement made in manufacturing research.

Second, as evidenced in the empirical studies, the proposed framework addresses the challenge of lack of ‘knowledge overlap’—defined in this paper as the ability of development participants to comprehend components and properties of the solution systems beyond those immediately relevant to their expertise. Addressing this challenge is crucial because of the complex and deep interdependencies among the solution system components (Sendak et al., 2020; Zajac et al., 2023; Colombo and Costa, 2024). However, no particular measures have been suggested in the manufacturing literature other than the conventional ones: forming multi-functional development teams and briefly introducing AI technologies to non-AI-expert stakeholders (Kaymakci et al., 2021; Merhi and Harfouche, 2024). A previous study indicates that such measures are still insufficient to establish deep collaboration (Yamamoto et al., 2024). The collaboration is ineffective when the development participants lack a holistic understanding of how the AI solution system should be developed and what form of cooperation is anticipated from other participants from other domains (Yamamoto et al., 2024).

The derived framework identifies task recommendations for the different roles in AI solution system development. As identified in the empirical studies, this increases process transparency and enhances development participants’ proactive engagement in the process, with an increased understanding of how others are involved in the development. We believe the framework is an additional measure to strengthen knowledge overlap. Furthermore, the studies also observed that the current granularity of task recommendations facilitates detailed and nuanced discussions specific to the project and its contexts, such as the details of the manufacturing process, operators’ work procedures, data extraction methods, and the heterogeneity of existing hardware and software systems. Enhancing such dialogues and shared understanding further addresses the lack-of-knowledge-overlap challenge.

Third, the framework is designed to support the development of human-AI systems that effectively integrate human work. Previous studies have discussed the importance of such systems (Bousdekis et al., 2020; Emmanouilidis et al., 2021). However, the literature review found no previous studies addressing how such human-integrated systems can be realised in practice. This implies that their realisation presumably depends on researchers’ or practitioners’ prior experiences and less-externalised knowledge.

The present study highlights the importance of workflow component development and integration and identifies relevant task recommendations (n = 37) for which the manufacturing domain is primarily responsible. This draws explicit attention to such tasks and clarifies the domain’s role in the development. The lack of recognition of, and a structured approach to, workflow integration has been an area in which manufacturing research has lagged behind other research domains, especially clinical research.

6.2 Contributions beyond the manufacturing research

Beyond manufacturing research, the study contributes to the conceptual structuring of AI solution system development in industrial environments, particularly in areas where workflow integration is crucial. As discussed in Section 4, the framework advances the SALIENT (base) framework in several ways.

One shortcoming of the base framework is that task recommendations are assigned to multiple stages, which makes it ambiguous which specific tasks are associated with which stages and what outcomes can be anticipated from those tasks. This shortcoming was addressed by integrating the technology readiness level (TRL) concept into the base framework. TRL integration into AI-enabled software development has been proposed in the literature (Martínez-Plumed et al., 2021; Lavin et al., 2022), but integration into workflow-incorporated AI solution system development is unprecedented. While the resulting clarification and association remain conceptual, and may not always exactly mirror practice, they enhance the conceptual clarity of the development.
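To make the idea of TRL integration tangible, the sketch below pairs each conceptual progression stage with an assumed TRL band, in the spirit of the machine learning TRLs of Lavin et al. (2022). The specific band boundaries are a hypothetical example for illustration only, not the mapping derived in the framework.

```python
# Hypothetical illustration of associating conceptual progression stages
# with technology readiness level (TRL) bands. The band boundaries below
# are invented for illustration and are not the framework's actual mapping.
STAGE_TRL_BANDS = {
    "I":   (0, 0),  # idea and opportunity identification
    "II":  (1, 2),  # problem understanding and feasibility
    "III": (3, 4),  # proof of concept and early prototypes
    "IV":  (5, 6),  # integrated system tested in a relevant environment
    "V":   (7, 8),  # piloting in the operational environment
    "VI":  (9, 9),  # approved, deployed system
}

def stage_for_trl(trl: int) -> str:
    """Return the (hypothetical) stage whose TRL band contains the given TRL."""
    for stage, (low, high) in STAGE_TRL_BANDS.items():
        if low <= trl <= high:
            return stage
    raise ValueError(f"No stage covers TRL {trl}")

print(stage_for_trl(4))  # III
```

Such an association makes explicit what maturity can be anticipated from the tasks at each stage, which is the clarification the TRL integration aims to provide.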

The present study has also contributed to the task comprehensiveness of the development. Table 4 compares the number of tasks between the derived and base frameworks across the different sections. A stage-wise comparison of task counts is not meaningful except in Stage II because, in the base framework, many tasks are assigned across Stages III to IV. As mentioned in Section 5, we assess that the task granularity of the two frameworks is similar; therefore, count changes due to variations in granularity are marginal.

Table 4. Comparison of task recommendation counts between the derived framework and the SAILENT framework.

The section-wise comparison of task counts indicates a significant increase in the workflow, HCI, TS, and STS sections, a moderate increase in the AI model, data pipeline, and project sections, and a slight decrease in the monitoring and reporting section.

The task counts in the workflow and HCI sections increased notably in the earlier stages, contributed by the HCI articles (e.g., Colombo and Costa, 2024; Kawakami et al., 2024) and the articles from the STAR project (e.g., Waschull and Emmanouilidis, 2023). The lack of workflow and HCI tasks at these stages has been one of the base framework’s weaknesses. The increase supports the user domain’s early and active participation in the development process and the front-loading of development concerns from the users’ perspective, as recognised in the empirical studies.

The task increase in the TS and STS sections is primarily because only a few tasks had been identified in the base framework. Articles on TRL integration in AI software development and from clinical and HCI research enrich the tasks in these sections.

The task count is slightly smaller for the system monitoring and reporting section. Similar tasks in this section of the base framework were consolidated, and no additional tasks were identified in the study. We speculate that this section was already extensive because the base framework was derived from clinical study reporting standards (e.g., Liu et al., 2020; Collins et al., 2021).

6.3 Limitations and future work

Although the pilot applications of the proposed framework to real-world projects revealed its benefits and limitations, the empirical validation results are still preliminary. Broader applications of the framework across the entire development lifecycle are a meaningful area of future work, which may yield further insights into a structured approach to developing AI solution systems in manufacturing.

The framework has other limitations and extension opportunities that can be addressed in future studies. Although identifying task recommendations for the post-deployment stage was outside the study’s scope, its importance is recognised in a few articles (de Hond et al., 2022; Nair et al., 2024) and by some industrial reviewers. Broadly identifying and structuring post-deployment activities and integrating them into the proposed framework represents a meaningful direction for future work.

The literature study found several articles stressing the importance of considering AI ethics, safety, and security (i.e., trustworthy AI) during the development of AI systems. These works suggest self-contained risk assessment and audit frameworks, although it is unclear when in the development process such assurance activities should be prepared and executed. Consolidating the use of these assessment frameworks with the proposed framework is a worthwhile area for future research.

The task recommendations in the proposed framework deliberately refrain from referencing specific tangible techniques that might assist task execution, such as FMEA for risk assessment (Raji et al., 2020), AI model cards explaining an AI model’s critical characteristics to stakeholders (Raji et al., 2020), and Wizard-of-Oz prototyping to simulate system usage scenarios (Colombo and Costa, 2024). A study identifying such techniques and mapping them onto the framework may enhance its practical utility.

The study enriched the task recommendations for the project management section. However, through a few reviewed articles and informal conversations with practitioners during the study, we recognised the importance of aligning the project with the organisational context, such as the organisation’s vision and goals, human resource plan, and external partnership strategies (Dora et al., 2022; Merhi and Harfouche, 2024). Weber et al. (2023) refer to the ability to achieve such contextual alignment as an organisation’s AI project planning capability. Detailing tasks that ensure contextual alignment can be a valuable area for future work.

Finally, we recognise the rapidly growing interest in leveraging generative AI capabilities in manufacturing. Exploring how the proposed framework contributes to such development can be an intriguing research direction. How language models might assist the effective utilisation of the framework may also be investigated. As mentioned in Section 5.3, the current framework benefits from the assistance of those with an in-depth understanding of its underlying rationale and intent; intelligent AI agents may help provide such support.

Data availability statement

The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author.

Author contributions

YY: Methodology, Data curation, Validation, Formal Analysis, Writing – original draft, Conceptualization, Funding acquisition, Writing – review and editing, Investigation. ÁA-M: Investigation, Visualization, Writing – review and editing. KS: Investigation, Writing – review and editing, Supervision, Funding acquisition.

Funding

The authors declare that financial support was received for the research and/or publication of this article. This work was supported by Vinnova, a Swedish agency for innovation, under grant number 2022-01281.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The authors declare that no Generative AI was used in the creation of this manuscript.

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fmtec.2025.1601903/full#supplementary-material

References

Ala-Pietilä, P., Bonnet, Y., Bergmann, U., Bielikova, M., Bonefeld-Dahl, C., Bauer, W., et al. (2020). The assessment list for trustworthy artificial intelligence (ALTAI). Brussels: European Commission.

Amershi, S., Weld, D., Vorvoreanu, M., Fourney, A., Nushi, B., Collisson, P., et al. (2019). “Guidelines for Human-AI interaction,” in CHI 2019: proceedings of the 2019 CHI conference on human factors in computing systems. New York: Association for Computing Machinery (ACM).

Arinez, J. F., Chang, Q., Gao, R. X., Xu, C. Y., and Zhang, J. J. (2020). Artificial intelligence in advanced manufacturing: current status and future outlook. J. Manuf. Sci. Engineering-Transactions Asme 142 (11), 110804. doi:10.1115/1.4047855

Assadi, A., Laussen, P. C., Goodwin, A. J., Goodfellow, S., Dixon, W., Greer, R. W., et al. (2022). An integration engineering framework for machine learning in healthcare. Front. Digital Health 4, 932411. doi:10.3389/fdgth.2022.932411

Booth, A., Booth, A., Sutton, A., Clowes, M., and Martyn-St James, M. (2022). Systematic approaches to a successful literature review. Los Angeles: SAGE.

Bousdekis, A., Apostolou, D., and Mentzas, G. (2020). A human cyber physical system framework for operator 4.0 - artificial intelligence symbiosis. Manuf. Lett. 25, 10–15. doi:10.1016/j.mfglet.2020.06.001

Bousdekis, A., Mentzas, G., Apostolou, D., and Wellsandt, S. (2024). “Assessing trustworthy artificial intelligence of voice-enabled intelligent assistants for the operator 5.0,” in IFIP international conference on advances in production management systems (Cham: Springer), 220–234.

Broberg, O. (2011). “Enabling objects for participatory design of socio-technical systems,” in 18th international conference on engineering design (ICED). Glasgow: Design Society, 64–73.

Brundage, M., Avin, S., Wang, J., Belfield, H., Krueger, G., Hadfield, G., et al. (2020). Toward trustworthy AI development: mechanisms for supporting verifiable claims. arXiv Prepr. arXiv:2004.07213. doi:10.48550/arXiv.2004.07213

Buchholz, J., Lang, B., and Vyhmeister, E. (2022). The development process of responsible AI: the case of assistant. IFAC Pap. 55, 7–12. doi:10.1016/j.ifacol.2022.09.360

Castañé, G., Dolgui, A., Kousi, N., Meyers, B., Thevenin, S., Vyhmeister, E., et al. (2023). The ASSISTANT project: AI for high level decisions in manufacturing. Int. J. Prod. Res. 61 (7), 2288–2306. doi:10.1080/00207543.2022.2069525

Collins, G. S., Dhiman, P., Navarro, C. L. A., Ma, J., Hooft, L., Reitsma, J. B., et al. (2021). Protocol for development of a reporting guideline (TRIPOD-AI) and risk of bias tool (PROBAST-AI) for diagnostic and prognostic prediction model studies based on artificial intelligence. Bmj Open 11 (7), e048008. doi:10.1136/bmjopen-2020-048008

Colombo, S., and Costa, C. (2024). Can designers take the driver’s seat? A new human-centered process to design with data and machine learning. Des. J. 27 (1), 7–29. doi:10.1080/14606925.2023.2279835

de Hond, A. A. H., Leeuwenberg, A. M., Hooft, L., Kant, I. M. J., Nijman, S. W. J., van Os, H. J. A., et al. (2022). Guidelines and quality criteria for artificial intelligence-based prediction models in healthcare: a scoping review. NPJ Digit. Med. 5 (1), 2. doi:10.1038/s41746-021-00549-7

Dora, M., Kumar, A., Mangla, S. K., Pant, A., and Kamal, M. M. (2022). Critical success factors influencing artificial intelligence adoption in food supply chains. Int. J. Prod. Res. 60 (14), 4621–4640. doi:10.1080/00207543.2021.1959665

Emmanouilidis, C., and Waschull, S. (2021). “Human in the loop of AI systems in manufacturing,” in Trusted artificial intelligence in manufacturing: a review of the emerging wave of ethical and human centric AI technologies for smart production. Boston–Delft: Now Publishers Inc., 158–172.

Emmanouilidis, C., Waschull, S., Bokhorst, J. A. C., and Wortmann, J. C. (2021). “Human in the AI loop in production environments,” in Advances in production management systems: artificial intelligence for sustainable and resilient production systems, APMS 2021, pt IV. Editors A. Dolgui, A. Bernard, D. Lemoine, G. VonCieminski, and D. Romero, Cham: Springer, 331–342.

Fogliato, R., Chappidi, S., Lungren, M., Fisher, P., Wilson, D., Fitzke, M., et al. (2022). “Who goes first? Influences of human-AI workflow on decision making in clinical imaging,” in Proceedings of the 2022 ACM conference on fairness, accountability, and transparency. New York: Association for Computing Machinery (ACM), 1362–1374.

Getachew, M., Beshah, B., Mulugeta, A., and Kitaw, D. (2024). Application of artificial intelligence to enhance manufacturing quality and zero-defect using CRISP-DM framework. Int. J. Prod. Res., 1–25. doi:10.1080/00207543.2024.2407919

Google (2024). People + AI guidebook. Available online at: https://pair.withgoogle.com/guidebook (Accessed November 1, 2024).

Gu, H. Y., Liang, Y., Xu, Y. F., Williams, C. K., Magaki, S., Khanlou, N., et al. (2023). Improving workflow integration with xPath: Design and evaluation of a Human-AI diagnosis system in pathology. ACM Trans. Computer-Human Interact. 30 (2), 1–37. doi:10.1145/3577011

Hadid, W., Horii, S., and Yokota, A. (2024). Artificial intelligent technologies in Japanese manufacturing firms: an empirical survey study. Int. J. Prod. Res. 63, 193–219. doi:10.1080/00207543.2024.2358409

He, X., Zheng, X., Ding, H., Liu, Y., and Zhu, H. (2023). AI-CDSS design guidelines and practice verification. Int. J. Human-Computer Interact. 40, 5469–5492. doi:10.1080/10447318.2023.2235882

Heyvaert, M., Hannes, K., and Onghena, P. (2017). Using mixed methods research synthesis for literature reviews. Los Angeles: SAGE.

Ipektsidis, C., and Soldatos, J. (2021). Report on co-design workshops and focus groups-initial version. Available online at: https://star-ai.eu/deliverables (Accessed September 1, 2024).

Kawakami, A., Coston, A., Zhu, H., Heidari, H., and Holstein, K. (2024). “The situate AI guidebook: co-designing a toolkit to support multi-stakeholder, early-stage deliberations around public sector AI proposals,” in The CHI conference on human factors in computing systems. New York: Association for Computing Machinery (ACM), 1–22.

Kaymakci, C., Wenninger, S., and Sauer, A. (2021). “A holistic framework for AI systems in industrial applications,” in Innovation through information systems, vol ii: a collection of latest research on technology issues. Cham: Springer, 78–93. doi:10.1007/978-3-030-86797-3_6

Kinney, M., Anastasiadou, M., Naranjo-Zolotov, M., and Santos, V. (2024). Expectation management in AI: a framework for understanding stakeholder trust and acceptance of artificial intelligence systems. Heliyon 10 (7), e28562. doi:10.1016/j.heliyon.2024.e28562

Lavin, A., Gilligan-Lee, C. M., Visnjic, A., Ganju, S., Newman, D., Ganguly, S., et al. (2022). Technology readiness levels for machine learning systems. Nat. Commun. 13 (1), 6039. doi:10.1038/s41467-022-33128-9

Lee, J. (2020). Industrial AI: applications with sustainable performance. Singapore: Springer Singapore.

Lee, C. P., Lee, M. K., and Mutlu, B. (2024). “The AI-DEC: a card-based design method for user-centered AI explanations,” in Proceedings of the 2024 ACM designing interactive systems conference (DIS). New York: Association for Computing Machinery (ACM), 1010–1028.

Liu, X., Rivera, S. C., Moher, D., Calvert, M. J., Denniston, A. K., Ashrafian, H., et al. (2020). Reporting guidelines for clinical trial reports for interventions involving artificial intelligence: the CONSORT-AI extension. Lancet Digital Health 2 (10), e537–e548. doi:10.1016/s2589-7500(20)30218-1

Madaio, M. A., Stark, L., Wortman Vaughan, J., and Wallach, H. (2020). “Co-designing checklists to understand organizational challenges and opportunities around fairness in AI,” in Proceedings of the 2020 CHI conference on human factors in computing systems. New York: Association for Computing Machinery (ACM), 1–14.

Makarius, E. E., Mukherjee, D., Fox, J. D., and Fox, A. K. (2020). Rising with the machines: a sociotechnical framework for bringing artificial intelligence into the organization. J. Bus. Res. 120, 262–273. doi:10.1016/j.jbusres.2020.07.045

Martínez-Plumed, F., Gómez, E., and Hernández-Orallo, J. (2021). Futures of artificial intelligence through technology readiness levels. Telematics Inf. 58, 101525. doi:10.1016/j.tele.2020.101525

Merhi, M. I., and Harfouche, A. (2024). Enablers of artificial intelligence adoption and implementation in production systems. Int. J. Prod. Res. 62 (15), 5457–5471. doi:10.1080/00207543.2023.2167014

Nair, M., Svedberg, P., Larsson, I., and Nygren, J. M. (2024). A comprehensive overview of barriers and strategies for AI implementation in healthcare: mixed-Method design. PLoS One 19 (8), e0305949. doi:10.1371/journal.pone.0305949

Nilsen, P. (2015). Making sense of implementation theories, models and frameworks. Implement. Sci. 10, 53. doi:10.1186/s13012-015-0242-0

Papageorgiou, K., Theodosiou, T., Rapti, A., Papageorgiou, E. I., Dimitriou, N., Tzovaras, D., et al. (2022). A systematic review on machine learning methods for root cause analysis towards zero-defect manufacturing. Front. Manuf. Technol. 2, 972712. doi:10.3389/fmtec.2022.972712

Park, S., Lee, J. Y., and Lee, J. (2024). AI system architecture design methodology based on IMO (Input-AI Model-Output) structure for successful AI adoption in organizations. Data Knowl. Eng. 150, 102264. doi:10.1016/j.datak.2023.102264

Pokorni, B., Volz, F., Zwerina, J., and Hämmerle, M. (2021). “Development of a holistic method to implement artificial intelligence in manufacturing areas,” in Advances in intelligent systems and computing. Cham: Springer.

Raji, I. D., Smart, A., White, R. N., Mitchell, M., Gebru, T., Hutchinson, B., et al. (2020). “Closing the AI accountability gap: defining an end-to-end framework for internal algorithmic auditing,” in Proceedings of the 2020 conference on fairness, accountability, and transparency. New York: Association for Computing Machinery (ACM), 33–44.

Reddy, S., Allan, S., Coghlan, S., and Cooper, P. (2020). A governance model for the application of AI in health care. J. Am. Med. Inf. Assoc. 27 (3), 491–497. doi:10.1093/jamia/ocz192

Sandhu, S., Lin, A. L., Brajer, N., Sperling, J., Ratliff, W., Bedoya, A. D., et al. (2020). Integrating a machine learning system into clinical workflows: qualitative study. J. Med. INTERNET Res. 22 (11), e22421. doi:10.2196/22421

Schleith, J., and Tsar, D. (2022). “Triple diamond design process: human-Centered design for data-driven innovation,” in Lecture notes in computer science (including subseries lecture notes in artificial intelligence and lecture notes in bioinformatics). Cham: Springer.

Sendak, M. P., Ratliff, W., Sarro, D., Alderton, E., Futoma, J., Gao, M., et al. (2020). Real-World integration of a sepsis deep learning technology into routine clinical care: implementation Study. JMIR Med. Inf. 8 (7), e15182. doi:10.2196/15182

Steidl, M., Felderer, M., and Ramler, R. (2023). The pipeline for the continuous development of artificial intelligence models-current state of research and practice. J. Syst. Softw. 199, 111615. doi:10.1016/j.jss.2023.111615

Subramonyam, H., Im, J., Seifert, C., and Adar, E. (2022). Human-AI guidelines in practice: leaky abstractions as an enabler in collaborative software teams. arXiv Prepr. arXiv:2207.01749. doi:10.48550/arXiv.2207.01749

Tahvanainen, L., Tetri, B., and Ahonen, O. (2024). “Exploring and extending human-centered design to develop AI-Enabled wellbeing technology in healthcare,” in Digital health and wireless solutions, pt ii, ncdhws 2024. Cham: Springer, 288–306.

Tong, A., Flemming, K., McInnes, E., Oliver, S., and Craig, J. (2012). Enhancing transparency in reporting the synthesis of qualitative research: ENTREQ. Bmc Med. Res. Methodol. 12, 181. doi:10.1186/1471-2288-12-181

van der Vegt, A. H., Scott, I. A., Dermawan, K., Schnetler, R. J., Kalke, V. R., and Lane, P. J. (2023a). Deployment of machine learning algorithms to predict sepsis: systematic review and application of the SALIENT clinical AI implementation framework. J. Am. Med. Inf. Assoc. 30 (7), 1349–1361. doi:10.1093/jamia/ocad075

van der Vegt, A. H., Scott, I. A., Dermawan, K., Schnetler, R. J., Kalke, V. R., and Lane, P. J. (2023b). Implementation frameworks for end-to-end clinical AI: derivation of the SALIENT framework. J. Am. Med. Inf. Assoc. 30 (9), 1503–1515. doi:10.1093/jamia/ocad088

Waschull, S., and Emmanouilidis, C. (2022). “Development and application of a human-centric co-creation design method for AI-enabled systems in manufacturing,” in IFAC-PapersOnLine. Amsterdam: Elsevier, 55 (2), 516–521.

Waschull, S., and Emmanouilidis, C. (2023). “Assessing human-centricity in AI enabled manufacturing systems: a socio-technical evaluation methodology,” in IFAC-PapersOnLine. Amsterdam: Elsevier, 56 (2), 1791–1796.

Weber, M., Engert, M., Schaffer, N., Weking, J., and Krcmar, H. (2023). Organizational capabilities for ai implementation—coping with inscrutability and data dependency in ai. Inf. Syst. Front. 25 (4), 1549–1569. doi:10.1007/s10796-022-10297-y

Wenderott, K., Krups, J., Luetkens, J. A., and Weigl, M. (2024). Radiologists' perspectives on the workflow integration of an artificial intelligence-based computer-aided detection system: a qualitative study. Appl. Ergon. 117, 104243. doi:10.1016/j.apergo.2024.104243

Wiens, J., Saria, S., Sendak, M., Ghassemi, M., Liu, V. X., Doshi-Velez, F., et al. (2019). Do no harm: a roadmap for responsible machine learning for health care. Nat. Med. 25 (9), 1337–1340. doi:10.1038/s41591-019-0548-6

Wohlin, C. (2014). “Guidelines for snowballing in systematic literature studies and a replication in software engineering,” in Proceedings of the 18th international conference on evaluation and assessment in software engineering. New York: Association for Computing Machinery (ACM), 1–10.

Yamamoto, Y., Munoz, A. A., and Sandström, K. (2024). Challenges in designing a Human-centred AI system in manufacturing. Int. J. Mechatronics Manuf. Syst. 17 (4), 351–369. doi:10.1504/ijmms.2024.10068969

Yildirim, N., Oh, C., Sayar, D., Brand, K., Challa, S., Turri, V., et al. (2023). “Creating design resources to scaffold the ideation of AI concepts,” in Designing interactive systems conference, dis 2023. New York: Association for Computing Machinery (ACM).

Zajac, H. D., Li, D., Dai, X., Carlsen, J. F., Kensing, F., and Andersen, T. O. (2023). Clinician-facing AI in the wild: taking stock of the sociotechnical challenges and opportunities for HCI. ACM Trans. Computer-Human Interact. 30 (2), 1–39. doi:10.1145/3582430

Zicari, R. V., Ahmed, S., Amann, J., Braun, S. A., Brodersen, J., Bruneault, F., et al. (2021a). Co-Design of a trustworthy AI system in healthcare: deep learning based skin lesion classifier. Front. Hum. Dyn. 3, 688152. doi:10.3389/fhumd.2021.688152

Zicari, R. V., Brodersen, J., Brusseau, J., Düdder, B., Eichhorn, T., Ivanov, T., et al. (2021b). Z-Inspection®: a process to assess trustworthy AI. IEEE Trans. Technol. Soc. 2 (2), 83–97. doi:10.1109/TTS.2021.3066209

Keywords: AI system, manufacturing, human-centric AI, development framework, socio-technical systems

Citation: Yamamoto Y, Aranda-Muñoz Á and Sandström K (2025) A development framework for human work integrated AI systems in manufacturing. Front. Manuf. Technol. 5:1601903. doi: 10.3389/fmtec.2025.1601903

Received: 28 March 2025; Accepted: 31 October 2025;
Published: 18 December 2025.

Edited by:

Yee Mey Goh, Loughborough University, United Kingdom

Reviewed by:

Diana Segura, Loughborough University, United Kingdom
Ahmed Farooq, Tampere University, Finland
Devbrat Anuragi, Tampere University, Finland, in collaboration with reviewer AF

Copyright © 2025 Yamamoto, Aranda-Muñoz and Sandström. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Yuji Yamamoto, yuji.yamamoto@mdu.se
