Deep learning – enabled visual computing in construction: application and digital technology integration

Perera, Prasad; Perera, Srinath; Jin, Xiaohua; Rashidi, Maria; Nanayakkara, Samudaya; Yazbek, Gina; Yazbek, Andrew

doi:10.3389/fbuil.2025.1655847

REVIEW article

Front. Built Environ., 01 September 2025

Sec. Construction Management

Volume 11 - 2025 | https://doi.org/10.3389/fbuil.2025.1655847


Prasad Perera ¹*

Srinath Perera ¹

Xiaohua Jin ¹

Maria Rashidi ¹

Samudaya Nanayakkara ¹

Gina Yazbek²

Andrew Yazbek²

¹ Centre for Smart Modern Construction, Western Sydney University, Kingswood, NSW, Australia
² Commnia Pty Ltd., Sydney, NSW, Australia

The rapid advancement of Artificial Intelligence (AI) and the integration of digital technologies present transformative opportunities to improve productivity, safety, and efficiency in construction project management. This study follows the Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR) and reviews 144 research articles. It investigates the application of deep learning (DL)-enabled visual computing (VC) in construction and analyses both the technological applications and the DL models involved. While prior reviews surveyed computer vision in construction broadly, this review focuses exclusively on DL-enabled VC and its integration with eight digital technologies, mapping algorithm trends, application domains, and real-world integration challenges. The analysis reveals five primary application domains: Object Detection (33%), Construction Safety (28%), Damage Detection (22%), Construction Quality (9%), and Productivity Analysis (8%). Additionally, the integration of DL-enabled VC with emerging digital technologies in construction applications, namely Automatic Construction Robotics, Unmanned Ground Vehicles, Unmanned Aerial Vehicles, LiDAR, Building Information Modelling, Blockchain, Intelligent Internet of Things, and Digital Twin, is reviewed extensively. An in-depth analysis of the DL algorithms and models deployed in these applications revealed annual trends and the prominence of Convolutional Neural Networks and their derivatives, such as YOLO, R-CNN, Mask R-CNN, Faster R-CNN, SSD, U-Net, and VGG. Finally, the review identifies research gaps in areas such as real-world scalability, data quality, and ethical considerations, and proposes future work in explainable AI, edge computing, and privacy-preserving VC.

1 Introduction

The construction sector is vital for any nation: it supports the development of infrastructure, which drives overall economic development, and contributes 13% of the Gross Domestic Product (GDP) (Hoedemaekers, 2024; Nisa and Khalid, 2024). Further, it contributes 7% to job creation and the development of infrastructure (Opoku et al., 2021; Rodrigo et al., 2024). The widespread adoption of modern construction processes served as a catalyst for the development of the modern construction industry, facilitating extensive projects in housing, industry, transport, and city development (Kaya and Dikmen, 2024). However, the construction industry is plagued with multiple challenges, such as poor project performance, quality-related issues, safety concerns, project delays, and cost overruns (Yap et al., 2019). The isolation of processes, the involvement of multiple complex stakeholders, and the temporary nature of construction projects have aggravated these issues, making the delivery of large construction projects challenging (Wuni et al., 2024).

In recent years, there has been a notable surge in the utilisation of technology for construction. Digital transformation has promised a revolution in construction (Perera S. et al., 2023; Begić et al., 2022; Hewavitharana et al., 2025). Importantly, stakeholders and policymakers regard digitalisation as an important solution to the challenges in the construction industry, resulting in strong interest in supporting digitalisation within the industry (Pal et al., 2024; Kaya and Dikmen, 2024). Digital technologies such as Building Information Modelling (BIM), Virtual Reality (VR), Augmented Reality (AR), Extended Reality (XR), Blockchain, Internet of Things (IoT), Digital Twins, Artificial Intelligence (AI), Big Data (BD), Cloud Computing (CC), Geographic Information System (GIS), Unmanned Aerial Vehicles (UAV), Unmanned Ground Vehicles (UGV), Terrestrial Laser Scanning (TLS) and Robotics can be integrated to create robust solutions for the construction industry by enhancing productivity, sustainability, efficiency, and safety (Baduge et al., 2022; Perera et al., 2020; Mohammadi et al., 2023; Pan and Zhang, 2021; Sánchez et al., 2024; Musarat et al., 2024). Multiple scholars have discussed the use of Blockchain in construction for enhanced trust and transparency (Hewavitharana et al., 2023; Perera et al., 2020; Nanayakkara et al., 2021). Ali et al. (2022) and Sompolgrunk et al. (2023) presented the integration of multiple digital technologies with BIM, which is implemented in modern construction projects to achieve integration and efficiency (Greenwood et al., 2010; Cepa et al., 2023; Zhu et al., 2023; Corrado et al., 2023).

Bahoo et al. (2023) defined AI as the ability of a software system to interpret data and influence hardware to improve decision-making, problem-solving, innovativeness, and adaptation. The layered architecture of Deep Neural Networks (DNNs) simulates the feature extraction and classification capability of the human brain (Pouyanfar et al., 2018; Akinosho et al., 2020; Janiesch et al., 2021). The market for Artificial Intelligence in construction was worth 429.2 million USD in 2018 and is projected to reach 4.51 billion USD by 2026, with Machine Learning (ML) and DL algorithms/models comprising 63% of the total market share (Reports and Data, 2019). While automation and digitalisation offer undeniable benefits to the construction industry, including enhanced performance and efficiency, this technological revolution concurrently creates novel challenges and risks (Salami Pargoo and Ilbeigi, 2023; Hewavitharana et al., 2021).

The utilisation of digital technologies applies to the entire life-cycle of construction processes and has generated a substantial amount of multidimensional data, which necessitates analysis through big data analytics (Nanayakkara et al., 2015; Lu et al., 2025). Moreover, the availability of refined data volumes has facilitated the implementation of data-driven AI applications within the construction industry (Li et al., 2023). Similarly, Jan et al. (2023) and Das et al. (2023) highlighted the emergence of AI in conjunction with the vast amount of data acquired through modern digital technologies, which have emerged as crucial components of the cyberphysical systems that form the foundation of the fourth industrial revolution (I4.0). AI techniques have the potential to surpass conventional digital technologies in delivering enhanced technical solutions for complex construction industry challenges, thereby contributing to the achievement of desired sustainability goals (Collins et al., 2021; Moragane et al., 2022; Perera et al., 2025a). Along similar lines, a report from Ernst and Young (2021) presented AI as the new frontier of digitalisation in the construction industry. Kor et al. (2022), Mondal and Chen (2022) and Nyokum and Tamut (2025) strengthened the argument by stating that AI and related technologies are best suited for the resolution of uncertainties in the construction industry. AI-embedded systems are widely available and heavily used, and explainable AI technologies are making such systems trustworthy (Perera et al., 2025b). However, despite the promising capabilities of AI, the construction industry has yet to fully leverage its transformative potential. It remains hindered by existing challenges, including data silos and quality issues, workforce skills shortages, interoperability and standardisation gaps, high implementation costs, and stakeholder resistance (Abioye et al., 2021; Elghaish et al., 2022b).

This systematic review aims to analyse the application of DL-driven VC technologies in the construction industry. It is important to evaluate the impact on construction project management systematically by analysing the trends, gaps, and challenges. To evaluate the impact of DL-driven VC in the construction sector, four key performance indicators (KPIs) were identified, namely efficiency enhancement, safety enhancement, quality enhancement, and productivity enhancement, and the application categorisation was developed accordingly. While the importance of visual computing in construction is increasingly recognised, existing review literature often surveys traditional computer vision techniques or focuses on specific, isolated applications. Therefore, conducting this systematic review is essential to provide insights into the current state and future directions of DL-driven VC in the construction sector. Furthermore, the review systematically maps the evolving landscape of DL algorithms used in VC for construction, highlighting prominent models and annual trends, and identifies the specific integration points of DL-enabled VC with various digital technologies.

The objectives of the review are to investigate the current applications of deep learning–enabled visual computing in construction, to examine how these applications are integrated with digital technologies, and to explore the gaps that remain for future research and industry adoption. The paper is structured as follows. A background review of DL-enabled visual computing follows the introduction. The methodology section outlines the method used to conduct the systematic review, including the search strategy. The results section visualises the extracted findings and analysis. The discussion section presents the findings, focusing on the opportunities, trends, gaps, and challenges in the application of DL-driven VC in the construction industry.

2 Deep learning enabled visual computing

Computer Vision (CV) and Image Processing (IP) are segments of technology experiencing rapid innovation in multiple disciplines, such as medical applications, smart device applications, industrial applications, video monitoring, intelligent transportation, remote sensing, and military applications (Lepcha et al., 2023). Visual Computing (VC) comprises CV and IP techniques; that is, VC is the broad field of extracting information from visual data. Within VC, CV focuses on interpreting image and video content, while IP enhances and manipulates raw images to support analysis (Xu et al., 2021). VC applications are based on extracting information from the input images or videos for meaningful interpretation using AI (Ji et al., 2023; Xiong and Tang, 2021). Theoretically, VC is an interdisciplinary technology related to the automated extraction of suitable information from visual input data for understanding or representing the physical world, either qualitatively or quantitatively (Spencer et al., 2019; Perera P. et al., 2023).

Visual data captured on construction projects for a variety of purposes has resulted in image datasets of 400,000 images per large-scale project (Paneru and Jeelani, 2021). The high availability of visual data, coupled with advancements in AI technologies, has significantly enhanced the feasibility of computer vision-based applications and is projected to improve the accuracy of potential deployments (Ekanayake et al., 2021; Khallaf and Khallaf, 2021; Hamledari et al., 2017). DL-driven VC is transforming the technological landscape through its layered architecture for extracting visual features (Fan, 2023; O’Mahony et al., 2020; Delhi et al., 2020).

DL algorithms and models achieve higher accuracy, robustness, and scalability than traditional CV techniques in various computer vision tasks, such as image classification, object detection, image retrieval and semantic segmentation (Chai et al., 2021; Liu et al., 2021; Nomura et al., 2022). Affordable computing power and the availability of related hardware have made the use of DL for VC possible. Specific hardware devices, such as GPUs with parallel multicore systems, have made resource-intensive computer vision applications feasible (Afif et al., 2020; Inazumi et al., 2020).

Goswami (2018), Chen S. et al. (2023), Elghaish et al. (2022c) and Elghaish et al. (2022a) identified the wide use of VC in the Architecture, Engineering, and Construction (AEC) industry, while Zhao et al. (2024) highlighted its capability to deliver advanced solutions when compared to sensor-based technologies. However, Pal and Hsieh (2021), Ekanayake et al. (2022) and Hamledari et al. (2017) identified a few limitations of DL-driven VC, such as changing viewpoints, highly cluttered spaces, obstructions, and varied illumination conditions. Further, the achromatic characteristics of objects such as studs and electrical outlets can result in poor detection, and the small size of some items adds to this complexity (Mohammed Abdelkader, 2022).

The following methodology section of the study highlights the systematised approach applied to the comprehensive review of the application of DL-driven VC.

3 Methodology

The study adopts a systematic literature review (SLR) approach to provide a comprehensive understanding of the use of DL-driven VC applications in the construction industry. Further, it adopts the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) to conduct a transparent and replicable SLR from which the findings of the study can be extrapolated (Das et al., 2023).

A systematic review presents a summary of previous studies in the specific field of search and enables the identification of knowledge gaps in the published papers (Watson and Webster, 2020; Mishra and Mishra, 2023). Eriqat et al. (2024) stated that a systematic literature review should follow an explicit methodology that details the procedure used, so that the review is reproducible and methodical, and that it relies heavily on prominent research databases, namely Web of Science and Scopus. Accordingly, this systematic review adheres to the PRISMA guidelines (Hijriyah et al., 2024), which ensure the reproducibility of the analysis and the results as well as transparent reporting of the study (Mishra and Mishra, 2023).

The quality assessment of the articles filtered through the systematic review is essential to ensure transparency, methodological rigour, completeness of technical detail, and reproducibility (Mushtaha et al., 2025). A key purpose of the study is to analyse the technological significance of AI and DL in resolving the challenges faced by the construction industry, and the methodical process was designed accordingly. DL has produced specific algorithms and models that support more efficient and accurate computer vision and image processing applications, offering unprecedented capabilities for analysing and interpreting visual data in novel solutions. Hence, the significant DL algorithms and models are analysed. The final analysis covers the integration of multiple digital technologies with DL-driven visual computing.

3.1 Stage 1: literature search

The process consists of three stages of searching and filtering academic journal papers against specific search criteria. The search criteria were developed to identify and examine relevant academic content, based on keywords related to the objectives and the research questions (Ogunmakinde et al., 2024).

The literature search was performed in April 2024 on the two leading publication databases, Scopus and Web of Science. These databases were selected because of their high level of accuracy, their broad coverage of publications related to the construction industry, and their indexing quality (Das et al., 2023; Yadav et al., 2023). Moreover, Web of Science and Scopus stand as the two leading, competing citation databases (Pranckutė, 2021). In recent times, there has been a notable surge in scholarly publications within this field indexed by the Web of Science and Scopus citation databases (Zhu and Liu, 2020).

A Boolean search string was developed to meet the specific search requirements of the review. The keywords were selected through pilot searches and cross-checked against terms used in recent systematic reviews. The complete research process is illustrated in Figure 1, and the search string is given after the figure.

Figure 1

Flowchart depicting a literature review process in three stages. Stage 1, literature search, uses Scopus and Web of Science to identify 330 records, reducing to 274 after removing duplicates. Stage 2, literature selection, excludes unrelated studies, leaving 262, and further examines abstracts and keywords to select 236, ultimately resulting in 144 relevant records. Stage 3, review process, involves establishing algorithms and models, deep learning applications in visual computing for construction, and digital integrations.

Figure 1. The systematic literature review and the research process.

(((“construction” OR “building”) AND (“industry” OR “site” OR “infrastructure”)) OR (“built environment”)) AND ((“deep learning” OR “artificial intelligence”) AND (“visual computing” OR “image processing” OR “computer vision”))

A comprehensive search was performed on the two databases under the “article title/abstract/keyword” fields using the Boolean searching string. The exclusion criteria were developed to refine the outcome through the final search operation. A specific exclusion process was performed by considering only the last 5 years of research articles (2020-2024) in order to ensure the review reflects the most current and relevant research in DL-enabled VC. This approach also allows us to avoid including outdated pre-DL techniques that are less representative of contemporary practice (Palmatier et al., 2018; Paul and Criado, 2020). Further, review articles were excluded to ensure inclusion of only primary research with full methodological detail, preventing duplication of findings. The category of research was selected to be Engineering, Construction and Building-related content. The language of the articles was limited to English.
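As an illustration only, the search string and the exclusion criteria described above can be sketched as a filter over bibliographic records. This is a minimal sketch, not the databases' own query engines: the record fields (`text`, `year`, `language`, `doc_type`) and the sample records are hypothetical, and real database searches handle plurals and word stemming differently from the simple whole-phrase matching used here.

```python
import re

def contains(text, phrase):
    """Case-insensitive whole-phrase match (no stemming, unlike real databases)."""
    return re.search(r"\b" + re.escape(phrase) + r"\b", text, re.IGNORECASE) is not None

def any_of(text, phrases):
    return any(contains(text, p) for p in phrases)

def matches_search_string(text):
    """Mirror the nesting of the Boolean search string used in the review."""
    domain = (
        (any_of(text, ["construction", "building"])
         and any_of(text, ["industry", "site", "infrastructure"]))
        or contains(text, "built environment")
    )
    method = any_of(text, ["deep learning", "artificial intelligence"])
    visual = any_of(text, ["visual computing", "image processing", "computer vision"])
    return domain and method and visual

def passes_exclusion(record):
    """Exclusion criteria from the text: 2020-2024, English, no review articles."""
    return (
        2020 <= record["year"] <= 2024
        and record["language"] == "English"
        and record["doc_type"] != "review"
    )

# Hypothetical records; "text" stands in for title + abstract + keywords.
records = [
    {"text": "Deep learning for helmet detection on a construction site using computer vision",
     "year": 2022, "language": "English", "doc_type": "article"},
    {"text": "Computer vision for crop monitoring with deep learning",
     "year": 2021, "language": "English", "doc_type": "article"},
    {"text": "Artificial intelligence and image processing in the built environment: a review",
     "year": 2023, "language": "English", "doc_type": "review"},
    {"text": "Deep learning and computer vision for building infrastructure inspection",
     "year": 2018, "language": "English", "doc_type": "article"},
]

selected = [r for r in records if matches_search_string(r["text"]) and passes_exclusion(r)]
for r in selected:
    print(r["year"], r["text"])
```

Only the first record survives: the second fails the construction clause, the third is a review article, and the fourth falls outside the 2020-2024 window.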

Although there may be some overlap among the keywords in sets, including them ensures a larger number of papers for analysis compared to similar reviews. The objective was to make the keywords collectively exhaustive, even if they were not entirely mutually exclusive (Jacobsen and Teizer, 2022).

Searching for articles in the two databases resulted in 330 publications. The duplicated articles were removed after creating a single record of articles. Once the duplications were removed, the resulting record was composed of 274 articles. A primary screening was performed to confirm the relevance of the complete set of articles.
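The merge-and-deduplicate step can be sketched as follows. The review does not state how duplicates were matched, so the matching rule here (DOI when present, otherwise a normalised title) is an assumption, and the record fields and sample entries are hypothetical.

```python
import re

def record_key(record):
    """Assumed matching rule: prefer the DOI, fall back to a normalised title."""
    doi = (record.get("doi") or "").strip().lower()
    if doi:
        return ("doi", doi)
    title = re.sub(r"[^a-z0-9]+", " ", record["title"].lower()).strip()
    return ("title", title)

def merge_and_deduplicate(*exports):
    """Combine database exports, keeping the first record seen for each key."""
    seen = {}
    for export in exports:
        for record in export:
            seen.setdefault(record_key(record), record)
    return list(seen.values())

# Hypothetical exports from the two databases.
scopus = [
    {"doi": "10.1000/a1", "title": "Helmet detection with YOLO"},
    {"doi": "10.1000/b2", "title": "Crack segmentation using U-Net"},
    {"doi": "", "title": "Excavator Activity Recognition"},
]
web_of_science = [
    {"doi": "10.1000/A1", "title": "Helmet Detection with YOLO"},    # same DOI, different case
    {"doi": None, "title": "Excavator activity recognition."},       # same title, no DOI
    {"doi": "10.1000/c3", "title": "Rebar counting from site images"},
]

unique_records = merge_and_deduplicate(scopus, web_of_science)
print(len(scopus) + len(web_of_science), "records ->", len(unique_records), "unique")
```

A limitation of this simple rule is that a record carrying a DOI in one export but not in the other is not recognised as a duplicate; a manual check of near-matches would still be needed.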

3.2 Stage 2: literature selection

The second phase of the filtering process was initiated with a comprehensive examination of the abstract and the keywords of the articles. This filtering involved examining the abstracts of the publications to identify topics that fell outside the scope of the review and the removal of review articles. The process of screening involved the assessment of technological adherence to visual computing technologies and applications, namely, image and video processing, computer vision, and computer graphics. The review of keywords and abstracts was performed by two independent reviewers. Applications related to visualisation, VR, and AR were also selected due to the relevance of computer graphics for virtualisation.

3.3 Stage 3: review process

Stage 3 consists of a detailed analysis of the research articles. During the final screening, only studies with a VC application were considered, and each article was checked for whether it reported a completed study with the detailed technical application, including the algorithms used. The final screening incorporated a methodical bias elimination tool adapted from Mushtaha et al. (2024b). Furthermore, a quality assessment was performed on the final set of included articles to evaluate their methodological rigour and potential for bias. The final screening resulted in 144 articles for the detailed analysis.
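The record counts reported for the three stages can be checked with a short calculation. The counts below are taken from the Figure 1 description; the stage labels are paraphrased, and the helper name `stage_losses` is only illustrative.

```python
# Record counts at each step, as described for Figure 1.
flow = [
    ("Records identified (Scopus + Web of Science)", 330),
    ("After duplicate removal", 274),
    ("After excluding unrelated studies", 262),
    ("After abstract and keyword screening", 236),
    ("Included in the detailed analysis", 144),
]

def stage_losses(stages):
    """Number of records removed on the way into each later stage."""
    return [
        (later_label, earlier_n - later_n)
        for (_, earlier_n), (later_label, later_n) in zip(stages, stages[1:])
    ]

losses = stage_losses(flow)
retention = flow[-1][1] / flow[0][1]

for label, lost in losses:
    print(f"{label}: {lost} removed")
print(f"Retained overall: {retention:.1%}")
```

The removals (56, 12, 26, and 92 records) add up to the 186 records separating the 330 identified from the 144 included.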

4 Results

The number of annual publications in DL-based VC applications can be used as an indicator of possible technical applications and their technical feasibility. Mushtaha et al. (2024a) highlighted the importance of analysing statistics to derive outcomes of a review study; hence, a comprehensive analysis is performed. Figure 2 illustrates the annual distribution of research articles and shows significant growth and increasing research interest in this domain: the count rises every year from 2020 to 2023, while the lower 2024 figure covers only the articles indexed before the April 2024 search. This growth is attributed to several factors, including the increasing accessibility and computational power of Graphics Processing Units (GPUs), the maturation and open sourcing of powerful DL frameworks, and the growing recognition of DL’s potential to address complex visual challenges in dynamic construction environments. This trend highlights the rapid shift from traditional computer vision to DL-driven approaches within the sector.

Figure 2

Bar chart depicting the yearly distribution of research articles on deep learning-driven VC applications from 2020 to 2024. Article counts: 2020 (14), 2021 (26), 2022 (32), 2023 (48), 2024 (24).

Figure 2. Yearly distribution of Research Articles on DL-driven VC Applications.
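The yearly counts in Figure 2 can be used to check the total and to quantify the growth. The sketch below computes year-over-year growth for the complete years only; treating 2024 as a partial year is an inference from the April 2024 search date rather than a statement in the figure.

```python
# Article counts per year, as reported for Figure 2.
articles_per_year = {2020: 14, 2021: 26, 2022: 32, 2023: 48, 2024: 24}

total = sum(articles_per_year.values())

# 2024 is excluded from the growth calculation because the search ran in April 2024.
complete_years = [2020, 2021, 2022, 2023]
growth_percent = {
    year: 100.0 * (articles_per_year[year] - articles_per_year[year - 1])
    / articles_per_year[year - 1]
    for year in complete_years[1:]
}

print("Total articles:", total)
for year, pct in growth_percent.items():
    print(f"{year - 1} -> {year}: {pct:+.1f}%")
```

The total matches the 144 articles retained for detailed analysis, and the count more than triples between 2020 and 2023 (14 to 48).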

Subsequently, a final set of literature articles was investigated for two crucial parameters: the specific type of deep neural network employed by the authors and the application purpose. The determination of these parameters involved an extensive analysis of the metadata presented in the articles, along with a thorough examination of both the abstracts and full papers. Recurring themes and common objectives of the DL applications were identified and grouped. The robustness of these categories was further validated through keyword co-occurrence analysis (as depicted in Figure 3), which confirmed the strong associations between the identified themes and the research focus of the articles within each category.

Figure 3

Network visualization of interconnected terms related to

Figure 3. Keywords co-occurrence network of the Research Articles.

After a thorough investigation, it was identified that the DL-based VC applications can be separated into five main categories. The identified categories are as follows.

1. Construction Quality

2. Construction Safety

3. Damage Detection

4. Object Detection

5. Productivity Analysis

The distribution of application purposes is shown in Table 1, which presents the percentage of applications in each DL-based VC application category.

Table 1

Table 1. The purpose of the Application of DL-based VC.

Figure 4 visualises the same information to help interpret trends in application focus. Subsequently, an annual breakdown is provided to support a detailed analysis of trends across the application domains.

Figure 4

Pie chart showing the distribution of purposes for applying DL-based VC. Object Detection is 33%, Construction Safety is 28%, Damage Detection is 22%, Construction Quality is 9%, and Productivity Analysis is 8%.

Figure 4. The purpose of the Application of DL-based VC.

Figure 5 shows the annual progression of DL-based VC application categories. Examining Figures 4, 5 makes it evident that Object Detection applications and Construction Safety-focused applications are the most prominent and have grown markedly.

Figure 5

Figure 5. Annual distribution of the purpose of Applications of DL-based VC.

Keywords co-occurrence networks were constructed as part of the scientometric inquiry. Keywords serve to encapsulate the thematic essence of research articles while aiding in their indexing. By mapping all keywords, a comprehensive overview of domain-specific knowledge is obtained (Bukar et al., 2023). The selection of the VOSviewer® software tool was predicated on its proficiency in generating, visualising, and leveraging bibliometric networks (Waltman, 2023). Figure 3 illustrates the keywords co-occurrence network of the Research Articles in a graphical format.

Through keyword analysis, it becomes apparent that numerous terms contribute to comprehending the technological and application landscape of DL-driven VC in the construction industry. Examination of application types reveals the prominence of construction safety applications, as evidenced by keywords such as “personal protective equipment,” “accidents,” “safety,” “worker,” and “occupational risks.” Additionally, object detection emerges as a crucial aspect in VC applications, demonstrated by keywords like “object recognition,” “construction equipment,” “classification,” “identification,” “detection models,” “feature extraction,” “image segmentation,” and “recognition.” Moreover, damage detection garners attention through keywords such as “structural health monitoring” and “concrete.” DL-based analysis of construction productivity is delineated by keywords like “performance” and “productivity” alongside “excavation.” Other noteworthy keywords, such as “convolutional neural networks,” “BIM”, “cameras,” “neural networks,” and “detection models,” emphasise the technological integration of DL and VC. Notably, Convolutional Neural Networks (CNN) assume a pivotal role in driving VC based on DL, while BIM exhibits clear integration with DL-driven VC applications.

Analysis of keywords revealed the dominance of CNN models for VC applications. A thorough examination of the selected articles was conducted to identify the specific DL algorithms and models employed. Certain articles combine several DL algorithms into ensemble models tailored to specific applications. CNN is the most utilised DL model and is commonly optimised for the application type and the characteristics of the dataset. The possibility of adjusting the kernels, convolutional layers and pooling layers has resulted in a large number of DL algorithms optimised for particular applications (Alzubaidi et al., 2021). Figures 6, 7 were generated to illustrate and analyse the range of CNN algorithms and their frequency of application. Figure 6 was generated by integrating multiple dimensions, including the year of publication and the DL model utilised, facilitating a focused exploration of DL model trends and particulars. A further analysis was conducted on the prominence of the various DL algorithms used for the VC applications. Figure 7 presents the total number of articles using each DL model and algorithm.

Figure 6

Figure 6. Yearly Distribution of Deep Learning Algorithms used in Applications.

Figure 7

Figure 7. Distribution of Deep Learning Algorithms used in DL-based VC Applications.

CNNs are a type of feedforward neural network designed specifically to process data in the form of pixels. This architecture is well-suited for grid-like data, such as time series and image data. The primary feature that distinguishes CNNs from other types of artificial neural networks (ANNs) is the presence of convolution layers, which give them their name (Han et al., 2022). Overall, CNNs offer a powerful and flexible tool for visual data processing and analysis, which can outperform traditional ANNs in many applications (Al-Shboul et al., 2023; Thangarajan and Chokkalingam, 2021).
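To make the role of the convolution and pooling layers concrete, the following minimal sketch applies one hand-written kernel to a tiny greyscale image in plain Python. It is an illustration of the operations only, not any reviewed model: the image, the edge-detecting kernel, and the function names are assumptions, and a real CNN learns its kernel values during training instead of fixing them by hand.

```python
def conv2d(image, kernel):
    """Valid (no padding) sliding-window product, as used in CNN convolution layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def relu(feature_map):
    """Element-wise activation: negative responses are set to zero."""
    return [[max(0, v) for v in row] for row in feature_map]

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling; incomplete border windows are dropped."""
    return [
        [
            max(feature_map[i + di][j + dj] for di in range(size) for dj in range(size))
            for j in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for i in range(0, len(feature_map) - size + 1, size)
    ]

# 5x5 image with a vertical edge between the dark (0) and bright (1) columns.
image = [[0, 0, 1, 1, 1] for _ in range(5)]

# Hand-written vertical-edge kernel (a trained CNN would learn such weights).
kernel = [[-1, 0, 1] for _ in range(3)]

feature_map = relu(conv2d(image, kernel))
pooled = max_pool(feature_map, size=2)
print(feature_map)
print(pooled)
```

The feature map responds strongly where the window straddles the edge and not at all where the window is uniformly bright, which is the kind of local feature extraction that stacked convolution layers build on.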

Figure 6 illustrates the Deep Learning algorithms used for the applications per year, and Figure 7 presents a summarised visualisation.

Figures 6, 7 provide a key insight by illustrating the emergence of single-stage detector CNN algorithms such as YOLO and SSD. Two-stage detector CNN algorithms are also being utilised and improved; for example, R-CNN models have been developed further into Fast R-CNN and Faster R-CNN models. In-depth analysis also reveals the use of specific CNN architectures, such as “PointNet” and “AlexNet”, created by modifying the layers of the CNN model. Further, these two visual illustrations depict the diverse algorithms developed for specific tasks and applications.
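Detectors such as YOLO and SSD typically output many overlapping candidate boxes for the same object, which are then filtered by non-maximum suppression (NMS) based on intersection over union (IoU). The sketch below shows that generic post-processing step in plain Python. It is not taken from any reviewed article; the box format `(x1, y1, x2, y2)`, the threshold, and the sample detections are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    intersection = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0

def non_max_suppression(detections, iou_threshold=0.5):
    """Keep the highest-scoring box and drop boxes that overlap a kept box too much."""
    kept = []
    for box, score in sorted(detections, key=lambda d: d[1], reverse=True):
        if all(iou(box, kept_box) <= iou_threshold for kept_box, _ in kept):
            kept.append((box, score))
    return kept

# Hypothetical detections of hard hats: (box, confidence score).
detections = [
    ((10, 10, 50, 50), 0.90),
    ((12, 12, 52, 52), 0.80),      # near-duplicate of the first box
    ((100, 100, 140, 140), 0.70),  # a separate object
]

final = non_max_suppression(detections)
for box, score in final:
    print(box, score)
```

The near-duplicate box is suppressed because its overlap with the higher-scoring box exceeds the threshold, leaving one box per object.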

5 Findings and discussion

The primary focus of this study revolves around the utilisation of VC applications facilitated by DL algorithms/models in the construction industry. The application types of visual computing-based applications were isolated for in-depth analysis, as depicted in Figure 4. The impact on multiple construction processes was assessed in detail, encompassing aspects like efficiency, safety, quality, productivity, and sustainability. Figure 4 highlights the prevalence of construction safety, damage detection, and object detection as the predominant types of VC applications based on DL within the construction industry. Despite consistent enforcement of worksite safety protocols, the construction industry remains plagued by a disproportionately high rate of accidents and casualties, earning its infamous label as one of the world’s most hazardous sectors (Rahnamayiezekavat et al., 2024). Hence, the impact of DL-based VC applications would play a vital role in resolving key issues faced by the construction industry.

When considering the publication year of each study, as depicted in Figure 2, it becomes evident that the utilisation of DL-powered VC in the construction industry is experiencing a notable upsurge. Over the past 5 years, there has been a substantial increase in research and development, and this is attributed to advancements in relevant hardware, which have made the practical implementation of high-tech VC applications feasible. The rise in computing power, the availability of high-precision image and video-capturing cameras, and the significant refinement of DL algorithms have collectively opened new avenues for extensive research and development in this field.

The findings of the study primarily centred on the algorithms and applications related to deep learning in visual computing. It became evident that most visual computing applications employing deep learning techniques were characterised by technological integrations. Notably, research conducted by Pal et al. (2022), Wang and Hu (2022), Pizarro et al. (2022), and Yang and Cai (2023) emphasised the integration of visual computing technologies with BIM, DT, IoT, AR and Robotics.

A detailed analysis was performed on the integration of digital technologies with DL-driven VC applications. A tabular presentation is prepared as Table 2 to illustrate the depth and breadth of the feasible technological integrations.

Table 2

Table 2. Summary of integration of digital technologies and DL-driven VC.

The upward trajectory of DL-driven VC is evident, as it increasingly addresses challenges encountered in the construction industry through multiple digital technology integrations. Table 2 is crucial for understanding the multidisciplinary nature of DL-enabled VC in construction. Industry practitioners can leverage DL-VC for proactive safety monitoring, automated defect detection, and progress tracking. Policymakers can integrate these insights into regulatory frameworks for improved compliance and safety enforcement. The analysis illustrates the integration points between key digital technologies and VC. Further, it identifies the corresponding research gaps that impede their full potential and suggests future work directions for each, thereby directly addressing the complexities of real-world implementation and guiding future research efforts. However, challenges remain in scaling DL-VC: high computing demands, a lack of standardised datasets, interoperability barriers, and workforce readiness, which is one of the major challenges. Addressing these is critical for real-world adoption. Furthermore, this systematic review provides a comprehensive understanding of DL-enabled VC within the construction sector, facilitating informed decision-making and industry progress.

6 Research gaps and future directions

This review provides empirical evidence of the expansion and progression of artificial intelligence (AI)-based applications tailored to the construction sector. The emergence of DL-based VC has opened substantial opportunities in research and development, resulting in innovative solutions across diverse construction-related domains. This advancement can be credited to significant strides in related hardware technologies, which have facilitated the practical implementation of sophisticated VC applications. Increased computational capacity, the availability of high-definition image and video capture devices, and the extensive refinement of DL algorithms collectively broaden the scope for research and development in this domain.

Subsequent research should focus on key areas such as improving construction efficiency, safety, quality, and the productivity of construction methods, together with innovative and adaptive techniques for integrating digital technologies with DL-driven VC. Further research should also examine the challenges associated with integrating DL-enabled VC into construction practices. It is important to identify the most suitable DL algorithms and models for specific VC applications. A complementary area of investigation could centre on optimising application performance by fine-tuning DL algorithms through parameter and hyperparameter adjustments. Furthermore, ensemble learning, wherein two or more DL models are combined to harness the best attributes of each, holds promise for improving the accuracy of VC applications.
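
As an illustration of the ensemble idea, the following minimal sketch combines the class probabilities of two models by weighted averaging (soft voting). It is not drawn from any of the reviewed studies; the function name `soft_vote`, the class labels, and the probability values are assumptions made purely for illustration.

```python
from typing import List, Optional, Sequence


def soft_vote(
    model_probs: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> List[float]:
    """Combine per-class probabilities from several models by weighted averaging."""
    if not model_probs:
        raise ValueError("at least one model output is required")
    n_classes = len(model_probs[0])
    if any(len(p) != n_classes for p in model_probs):
        raise ValueError("all models must score the same set of classes")
    if weights is None:
        weights = [1.0] * len(model_probs)  # equal trust in every model
    if len(weights) != len(model_probs):
        raise ValueError("one weight per model is required")
    total = float(sum(weights))
    return [
        sum(w * p[c] for w, p in zip(weights, model_probs)) / total
        for c in range(n_classes)
    ]


# Hypothetical outputs of two classifiers for one image crop,
# over the illustrative classes ("helmet", "no_helmet").
model_a = [0.6, 0.4]
model_b = [0.2, 0.8]
fused = soft_vote([model_a, model_b])
# Index of the class with the highest fused probability.
predicted_class = max(range(len(fused)), key=fused.__getitem__)
```

In practice the weights could be set from each model's validation accuracy, which is itself a hyperparameter choice of the kind discussed above.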

Conversely, several challenges associated with the application of VC techniques are prominent in the construction sector. An immediate challenge is the lack of benchmark datasets tailored to construction environments, which limits DL model validation. High-quality data plays a pivotal role in the performance of DL models for visual applications. Fortunately, the construction industry has witnessed extensive digitalisation, resulting in the generation of significant amounts of visual data. However, the inherent complexity of construction sites degrades visual input data. In particular, variable lighting conditions and disturbances make it challenging to capture quality visual inputs for training and testing DL-driven VC models.

To mitigate these challenges, several approaches could be adopted in DL-driven VC for construction. Data augmentation and synthetic data generation enhance the diversity of training datasets, although they may not fully replicate the complexity of real-world construction environments. High Dynamic Range (HDR) imaging and illumination-invariant feature extraction help address variability in lighting conditions, though they often require additional hardware and preprocessing effort. Domain adaptation and transfer learning enable models trained on large generic datasets to adapt effectively to construction-specific contexts, though risks of bias transfer remain.
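
A minimal sketch of photometric and geometric data augmentation is given below, assuming a grayscale image stored as a nested list of 0 to 255 pixel values. The helper names (`adjust_gamma`, `hflip`, `augment`) and the gamma range of 0.5 to 2.0 are illustrative assumptions rather than settings reported in the reviewed literature.

```python
import random
from typing import List

Image = List[List[int]]  # grayscale image, pixel values in 0..255


def adjust_gamma(img: Image, gamma: float) -> Image:
    """Simulate a lighting change: gamma < 1 brightens, gamma > 1 darkens."""
    return [[round(255 * (px / 255) ** gamma) for px in row] for row in img]


def hflip(img: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def augment(img: Image, rng: random.Random) -> Image:
    """Apply a random lighting change and, half of the time, a horizontal flip."""
    out = adjust_gamma(img, rng.uniform(0.5, 2.0))  # assumed gamma range
    if rng.random() < 0.5:
        out = hflip(out)
    return out


rng = random.Random(0)  # fixed seed so the variants are reproducible
site_image = [[0, 64, 128], [192, 255, 32]]  # toy 2x3 image, illustrative only
variants = [augment(site_image, rng) for _ in range(4)]
```

For detection or segmentation tasks, any geometric transform such as the flip would also have to be applied to the bounding boxes or masks, which this sketch omits.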

Further, the testing and deployment of DL-VC models is constrained by hardware limitations and optimisation challenges. The willingness of the construction industry to invest substantial resources in computing infrastructure to implement DL-driven VC is a concern. As construction is among the least digitalised industries, its attitude towards this technological shift will be pivotal in determining the uptake of these novel applications. Developing lightweight DL models optimised for edge devices could be considered an immediate solution, although such models may sacrifice some accuracy compared to full-scale architectures. Limited integration between DL-VC solutions and existing digital platforms such as BIM, IoT systems, and digital twins is another key immediate challenge. The development of other digital technologies has provided multi-dimensional integrations for innovative solutions in the most challenging areas of the construction industry. Addressing these gaps is critical for enabling practical, on-site implementation in the near term.

Based on the comprehensive analysis of the current landscape of DL-driven VC in construction, several critical research gaps and promising avenues for future exploration have been identified. These concerns and directions extend beyond the immediate challenges. Long-term research should focus on the development and adaptation of explainable AI and ethical frameworks to govern data usage, algorithmic fairness, and accountability. Fostering broad industry adoption requires a human-centric approach: developing explainable AI interfaces and intuitive tools that provide transparent insights into model decisions, thereby improving trust among stakeholders.

Long-term research and development should be focused on enabling real-time, on-site analytics. Research must prioritise the development of lightweight and efficient Convolutional Neural Network models specifically optimised for deployment on edge devices. Advanced model compression techniques and hardware-aware architectures are critical to overcome current processing limitations and reduce latency. Further, addressing significant data silos necessitates dedicated efforts towards cross-technology interoperability and data integration. This involves establishing standardised Application Programming Interfaces (APIs), Common Data Environments (CDEs), and unified frameworks to create a more cohesive digital ecosystem that seamlessly integrates with existing visual computing systems.
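
One widely used form of model compression is post-training weight quantisation. The sketch below, which is an illustrative assumption rather than a method taken from the reviewed studies, maps floating-point weights to 8-bit integers with a single symmetric scale factor; the function names and the weight values are hypothetical.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Map float weights to integers in [-127, 127] using one symmetric scale."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0  # avoid division by zero
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from the quantised integers."""
    return [v * scale for v in q]


layer_weights = [0.5, -1.0, 0.25, 0.0]  # illustrative values only
q, scale = quantize_int8(layer_weights)
restored = dequantize(q, scale)
# Rounding bounds the per-weight error by half a quantisation step (scale / 2).
max_error = max(abs(a - b) for a, b in zip(layer_weights, restored))
```

Storing each weight as one byte instead of a four-byte 32-bit float reduces the weight memory by roughly a factor of four, at the cost of the small reconstruction error measured by `max_error`.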

Further, future research should develop comprehensive guidelines and best practices for ethical AI deployment within the construction industry. Privacy-preserving visual computing approaches, such as federated learning and secure data-sharing mechanisms, will be vital to protect sensitive construction data while supporting large-scale adoption. Finally, the challenge of data scarcity and generalisation for robust deep learning model training demands innovative solutions. Future studies should focus on advanced data augmentation techniques, transfer learning from diverse datasets, and the creation of large-scale, publicly accessible benchmark datasets tailored to construction environments. Additionally, research into synthetic data generation utilising advanced generative models could effectively mitigate data limitations and enhance model generalisation capabilities. Addressing these areas is essential for the transition of DL-driven VC from research prototypes to widespread, impactful industry adoption.
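
To indicate how federated learning keeps raw site imagery local, the following minimal sketch shows the aggregation step of federated averaging, a commonly used aggregation rule in which a server combines model weights trained separately at each site, weighted by local dataset size. The function name `fed_avg`, the two sites, and all numbers are hypothetical and serve only as an illustration.

```python
from typing import List, Sequence


def fed_avg(
    client_weights: Sequence[Sequence[float]],
    client_sizes: Sequence[int],
) -> List[float]:
    """Average client model weights, weighting each client by its sample count."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("one sample count per client is required")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    n_params = len(client_weights[0])
    if any(len(w) != n_params for w in client_weights):
        raise ValueError("all clients must share the same model shape")
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


# Two hypothetical construction sites share only their locally trained
# weights and the number of images used, never the images themselves.
site_a = [1.0, 2.0]
site_b = [3.0, 4.0]
global_weights = fed_avg([site_a, site_b], [100, 300])  # -> [2.5, 3.5]
```

A complete system would repeat this step over many training rounds and would typically add secure aggregation or similar protections, which are outside the scope of this sketch.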

The adoption of DL-VC technologies varies considerably across regions. Developed economies are making significant investments in robotics, BIM integration, and digital twin technologies, enabling advanced DL-VC applications to be integrated. In contrast, developing regions often face barriers such as inadequate digital infrastructure, limited financial resources, and insufficient training of construction professionals. These disparities highlight the importance of tailoring DL-VC research and deployment strategies to regional contexts, ensuring equitable global diffusion of the benefits of DL-VC technologies.

In conclusion, the integration of AI-based applications is increasingly shaping the technological transformation of human lives. It is crucial to recognise the significance of DL-based visual computing applications for the future digitalisation of the construction sector.

Author contributions

PP: Formal Analysis, Investigation, Methodology, Writing – original draft, Writing – review and editing. SP: Conceptualization, Project administration, Supervision, Writing – review and editing. XJ: Supervision, Writing – review and editing. MR: Supervision, Writing – review and editing. SN: Supervision, Writing – review and editing. GY: Resources, Writing – review and editing. AY: Resources, Writing – review and editing.

Funding

The author(s) declare that financial support was received for the research and/or publication of this article. The publication fee was provided by the School of Engineering, Design and Built Environment, Western Sydney University.

Conflict of interest

Authors GY and AY were employed by Commnia Pty Ltd.

The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declare that Generative AI was used in the creation of this manuscript. Generative AI was used solely to assist with language editing, improving clarity, and refining the structure of the text. All intellectual content, research insights, and interpretations are original and solely the work of the author(s).

Any alternative text (alt text) provided alongside figures in this article has been generated by Frontiers with the support of artificial intelligence and reasonable efforts have been made to ensure accuracy, including review by the authors wherever possible. If you identify any issues, please contact us.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

A, V. T. M., Alexander, B., Florian, N., André, B., Heiko, B., and Denis, W. (2021). Recognition of temporary vertical objects in large point clouds of construction sites. Proc. Institution Civ. Eng. - Smart Infrastructure Constr. 174, 134–149. doi:10.1680/jsmic.21.00033