
CORRECTION article

Front. Comput. Sci.

Sec. Computer Security

Investigating Methods for Forensic Analysis of Social Media Data to Support Criminal Investigations

Provisionally accepted
  • 1 Technological University Dublin, Dublin, Ireland
  • 2 Lahore Garrison University, Lahore, Pakistan
  • 3 INTI International University, Nilai, Malaysia
  • 4 UNICAF, Larnaca, Cyprus

The final, formatted version of the article will be published soon.

The rise of social networks has transformed how people communicate, share data, and document personal experiences. The resulting content includes text posts, images, video, and audio, as well as geotagging information from platforms such as Facebook, Instagram, and Twitter. For police and forensic investigators, this data offers a vast pool from which evidence can be obtained, events reconstructed, and persons of interest in criminal or civil matters identified (1; 2). However, the expanding use of social media data in investigations raises several difficulties. Posts can be edited or deleted, which makes evidence collection challenging (3; 4; 5). Privacy laws add further barriers: the General Data Protection Regulation (GDPR), for example, restricts the use of an individual's personal data and thus constrains forensic analysts (6). In addition, the enormous volume of data produced daily makes manual analysis impractical and calls for scalable, automated solutions (7; 8; 9; 10). This research addresses these challenges by assessing existing forensic techniques and developing more sophisticated methodologies that incorporate AI and ML. The methods are designed to enhance the quality and effectiveness of forensic investigations while respecting legal and ethical frameworks. Drawing on concrete research data, the paper shows how the improved methods can address multifaceted investigative issues and support the admissibility of evidence in court.

This study explores how AI can help forensic teams analyze massive volumes of social media data to detect cyberbullying, fraud, and fake news. It uses algorithms that interpret text, identify faces, and track suspicious networks, all while respecting privacy laws. The methods were tested on real cases and shown to be accurate and fast, and the findings can help police, courts, and policymakers use social media evidence in a fair and lawful way.

Social media has evolved into an important source of digital evidence that is indispensable in forensic analysis. Facebook, Twitter, and Instagram data include text posts, images, videos, geolocation information, and user activity, all of which constitute rich evidence in criminal and civil litigation (2). Because this data is so varied, investigators can build timelines, verify alibis, and link persons of interest to victims, but obtaining and analyzing the material remains problematic (1).

The greatest challenge in social media forensics is privacy. Platforms maintain strict privacy settings, and regulations such as GDPR and CCPA require lawful authorization before information can be accessed, which constrains forensic analysts (6). Improper access compromises not only the investigator and the subject but also the admissibility of evidence in court. Analysts therefore depend on warrants or subpoenas to legally acquire private social media data, a process that is both time-consuming and intricate (11).

Beyond privacy, a second concern is that social media content changes frequently. Posts can be edited or deleted, which raises doubts about admissibility if proper archiving measures are not taken.
Techniques such as hashing and maintaining a documented chain of custody enable investigators to confirm that collected data remains intact. However, because social media interfaces and application programming interfaces (APIs) change constantly, data retrieval can be hampered and forensic tools may require continual updates (1).

The sheer volume of data produced on social networks every day, comprising millions of posts, images, and interactions, also demands improved collection and analysis methods. Organizations can no longer afford the time or resources for manual data processing, so analysts turn to artificial intelligence, machine learning, and data mining to review, analyze, and report findings more efficiently (12; 13; 14; 15). Nonetheless, the platforms are technologically heterogeneous: each applies its own formats and data structures, which hinders the creation of common interfaces for forensic tools (16). Analysts therefore need flexible approaches tailored to the data obtained from each platform.

Social media evidence is applicable in both criminal and civil litigation. In criminal cases, timelines are constructed and alibis verified from social media data; geotagging can place a suspect at a particular location, and relationship data can expose motives and acquaintances (17; 18). In civil matters, social media evidence helps substantiate claims involving personal injury, employment and wrongful termination, and trade secrets and patents (1). However, acquiring and verifying social media data for digital forensic purposes requires balancing investigative needs against policies that uphold the law, so that the evidence qualifies for admission (19).

Current approaches include a range of forensic procedures for analyzing social media content, but they often lack scalability and unified standards. Text mining and natural language processing (NLP) are widely applied to textual data to identify threats, trends, and sentiments on social media networks (20; 21; 22). Image and video analysis techniques, including facial recognition and tampering detection, improve the credibility of multimedia evidence and help identify people involved in criminal incidents (23; 24; 25). Network analysis, which maps the connections among social media users, is key to identifying fake accounts and uncovering large-scale scams or coordinated hate campaigns (26; 27).

Gaps remain in current knowledge of social media forensics. For example, ML models require large, high-quality training datasets that are hard to obtain under privacy constraints (28). There is also the persistent problem of algorithmic bias: models trained on biased data produce biased outcomes, especially in facial recognition (29; 30; 31).
To address these problems, researchers have stressed the importance of interpretability in AI models, especially in legal settings that require accountable outcomes (32).

Recent studies highlight persistent challenges in deploying AI-driven forensic tools, particularly in balancing accuracy with interpretability and security. For instance, (41) systematically reviewed the risks of opaque AI models in security-critical applications, emphasizing the need for explainable techniques (e.g., SHAP, LIME) to maintain forensic accountability, a finding that aligns with our observations on algorithmic bias (Section 5.4). Their work underscores how context-agnostic models may compromise evidence reliability, reinforcing our rationale for selecting BERT (context-aware NLP) and CNNs (tamper-resistant image analysis) in Section 3.1.1. In a related study, (33) explores the potential of social media mining for crime prediction, emphasizing the role of data analysis in identifying criminal activity. This aligns with research highlighting the growing importance of social media platforms in digital forensics, particularly for reconstructing events and identifying suspects. Both studies underscore the challenges posed by vast data volumes and advocate integrating AI and ML techniques, such as text mining and network analysis, to improve the effectiveness and accuracy of forensic investigations.

The literature further calls for legal and ethical frameworks that allow forensic examination to respect privacy while adhering to the legal and ethical constraints on data use. Exploiting the full range of social media content for forensic purposes requires interdisciplinary work among digital forensics specialists, data scientists, and social media analysts, and validating the methods on real-life cases is what turns research into practical application (17).

In this study, a mixed-methods approach combining qualitative and quantitative techniques was employed to analyze and validate forensic methods for social media data. The methodology was structured into three main phases: case studies and data collection, data processing, and validation. To enhance the forensic analysis of social media data, we selected AI/ML techniques based on their suitability for natural language understanding, pattern detection, and image classification in high-dimensional, noisy environments. For a simplified implementation of our NLP/network analysis pipeline, see Appendix A.

Text analysis: We employed BERT for its contextualized understanding of linguistic nuances, which is critical in cyberbullying and misinformation detection; unlike rule-based systems or traditional bag-of-words models, BERT provides bidirectional representations of context. Image analysis: For multimedia forensic tasks, convolutional neural networks (CNNs) were used, given their state-of-the-art performance in facial recognition and tamper detection; alternatives such as SIFT and SURF were tested but lacked robustness against occlusions and image distortions (a small illustrative architecture is sketched below). Network analysis: Graph-based models and tools (e.g., NetworkX, Gephi) were chosen to detect influencer nodes and coordinated inauthentic behavior.
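To illustrate the image-analysis choice above, the following is a minimal sketch of a small CNN binary classifier for tampered-versus-authentic image patches written with Keras. The architecture, input size, and momentum value are illustrative assumptions, not the exact model used in the study; only the learning rate and optimizer family mirror the CNN settings reported later in the hyperparameter summary.

```python
# Minimal sketch: CNN classifier for tampered vs. authentic image patches.
# Architecture, input size, and momentum are illustrative assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_tamper_detector(input_shape=(128, 128, 3)):
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(128, 3, activation="relu"),
        layers.GlobalAveragePooling2D(),
        layers.Dense(64, activation="relu"),
        layers.Dropout(0.3),
        layers.Dense(1, activation="sigmoid"),  # 1 = tampered, 0 = authentic
    ])
    # SGD with momentum and a 1e-4 learning rate mirror the CNN settings reported below.
    model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=1e-4, momentum=0.9),
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_tamper_detector()
model.summary()
```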
The research design addresses known challenges in forensic analysis, including privacy, scalability, and evidence integrity. This involved:
• Identifying challenges: A literature review identified issues with data preservation, legal compliance, and analysis accuracy.
• Developing solutions: These challenges were addressed with a set of advanced AI- and ML-based forensic methods.
• Empirical validation: Case studies of cyberbullying, fraud detection, and related scenarios illustrated applicability to real-world cases.

Data was collected from popular social media platforms:
• Facebook: shared images, text posts, and geolocation metadata.
• Twitter: user tweets, retweets, and hashtags for network mapping.
• Instagram: metadata about multimedia content.

Collection tools included:
• APIs and web scraping: Publicly available data were accessed through application programming interfaces (APIs); web scraping of unstructured content was carried out with the Scrapy and Beautiful Soup libraries.
• Forensic software: Tools such as FTK Imager and Autopsy were used for metadata extraction and preservation.
• Blockchain: Immutable ledger technology ensured the integrity of the collected data during storage and analysis.

Data collection strictly adhered to privacy laws such as GDPR and to the guidelines of the relevant jurisdictions. Where necessary, legal warrants or subpoenas were obtained to access restricted or private data. The seminal work of (50) on computer crime law establishes foundational standards for lawful acquisition of social media data, emphasizing chain-of-custody protocols that informed our blockchain-based preservation system (Section 6.2). For jurisdictional challenges, we reference the empirical study of Smith and colleagues (51) in Digital Investigation, which evaluates GDPR/CCPA compliance in more than 200 cross-border cases and directly supports our warrant-based data access procedures. The synthetic data in Appendix A avoids privacy risks while enabling methodology validation.

Before model training, extensive preprocessing was conducted. Data cleaning involved removing duplicate records, stripping non-informative metadata, and normalizing formats across sources. Missing values were addressed with mode imputation for categorical variables and mean substitution for continuous ones. The datasets exhibited class imbalance, particularly in the cyberbullying and misinformation classes, which was handled with the Synthetic Minority Over-sampling Technique (SMOTE). Initial analyses also revealed potential bias in the language and image data, which was mitigated with data augmentation (e.g., image rotation, paraphrasing) and adversarial validation to improve fairness across subgroups.

Feature engineering was tailored to each modality. For textual data, TF-IDF and contextual embeddings from BERT captured semantics and n-gram dependencies. For multimedia, CNN-based feature extraction identified facial landmarks and tampering artifacts. Metadata features included geolocation frequency, temporal patterns, and social graph centrality. To improve transparency, SHAP (SHapley Additive exPlanations) was employed to analyze feature importance, providing forensic analysts with insight into the decision process behind model outputs.
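The text preprocessing, TF-IDF vectorization, and SMOTE rebalancing described above can be sketched as follows, using scikit-learn and the imbalanced-learn package for SMOTE. The toy posts, labels, and classifier are placeholders rather than the study's data or models.

```python
# Minimal sketch: TF-IDF features with SMOTE rebalancing for an imbalanced
# text-classification task (e.g., abusive vs. benign posts). Toy data only.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from imblearn.over_sampling import SMOTE

# Imbalanced toy dataset: 2 abusive posts (label 1) vs. 6 benign posts (label 0).
posts = pd.DataFrame({
    "text": [
        "you are pathetic", "nobody likes you",
        "great game last night", "see you tomorrow", "happy birthday!",
        "nice photo", "thanks for the help", "what time is the meeting",
    ],
    "label": [1, 1, 0, 0, 0, 0, 0, 0],
})

# Normalization and TF-IDF features (unigrams + bigrams, English stop words removed).
vectorizer = TfidfVectorizer(lowercase=True, stop_words="english", ngram_range=(1, 2))
X = vectorizer.fit_transform(posts["text"])
y = posts["label"]

# SMOTE synthesizes minority-class samples in feature space to rebalance the classes.
X_res, y_res = SMOTE(k_neighbors=1, random_state=0).fit_resample(X, y)

clf = LogisticRegression(max_iter=1000).fit(X_res, y_res)
print(clf.predict(vectorizer.transform(["everyone hates you"])))
```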
The study adopted a multi-layered analytical approach:
• Text mining and NLP: Natural language processing algorithms were used to perform sentiment analysis, detect threats, and identify emerging patterns.
• Image and video analysis: Deep learning models trained on large datasets identified objects, faces, and tampered multimedia.
• Social network analysis: Relationships between users were mapped with Gephi and NetworkX to identify top influencers and coordinated activity.

Metadata was extracted and used to:
• reconstruct timelines for key events;
• confirm the authenticity of multimedia evidence through timestamp validation;
• verify geolocation data to establish the presence of individuals at specific locations.

Three case studies were used for validation:
• Cyberbullying: A high-profile case of Twitter harassment was analyzed to validate sentiment analysis and timeline reconstruction.
• Fraud detection: Network analysis was used to investigate a coordinated scam on Facebook and to identify its central actors.
• Misinformation campaigns: Text mining was used to track how false information spread on Instagram and to find common themes and patterns.

Evaluation focused on:
• Accuracy: precision and recall of the ML models.
• Efficiency: measured time reductions from automated data collection and processing.
• Scalability: the methods' ability to cope with increasing data volumes.

In response to concerns about reproducibility, we have taken several steps to ensure that the methods presented here can be replicated by future researchers. Although the code and datasets are not publicly available at this time due to privacy and legal considerations, we provide detailed documentation to allow replication of our work.
• Source and structure: We used publicly available datasets from social media platforms such as Twitter, Facebook, and Instagram. The datasets consist of user posts, images, metadata (e.g., timestamps, geolocation), and network interactions. A description of the dataset sources, including the number of samples and types of data (text, images, social graphs), is provided in the supplementary material.
• Preprocessing: All datasets underwent preprocessing, including removal of duplicate entries, text normalization (lowercasing, tokenization), and handling of missing data (imputation or exclusion). Detailed preprocessing steps are given in Section 3.3.

For each machine learning model, the following hyperparameters were used:
• BERT (natural language processing): learning rate 2e-5, batch size 16, 3 epochs, Adam optimizer.
• CNN (image forensics): learning rate 1e-4, batch size 32, 10 epochs, SGD with momentum.
• Graph-based models (network analysis): 2 layers, 128 hidden units per layer, ReLU activation, Adam optimizer.
These hyperparameters follow standard practice for the respective models and were tuned to optimize performance; a minimal configuration sketch follows below.
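The following sketch shows a fine-tuning configuration that mirrors the BERT hyperparameters listed above (learning rate 2e-5, batch size 16, 3 epochs) using the Hugging Face Transformers library. The dataset objects and label count are placeholders, not the study's data, and the Trainer's default AdamW optimizer stands in for the Adam-based choice reported above.

```python
# Minimal sketch: BERT fine-tuning configuration matching the reported hyperparameters.
# train_dataset / eval_dataset are placeholders for tokenized post collections.
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

args = TrainingArguments(
    output_dir="bert-forensic-classifier",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    num_train_epochs=3,  # Trainer uses an AdamW optimizer by default
)

# trainer = Trainer(model=model, args=args,
#                   train_dataset=train_dataset, eval_dataset=eval_dataset)
# trainer.train()
```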
• Hardware: All experiments were performed on a machine with an Intel i7 processor, an NVIDIA GTX 1080 GPU for deep learning tasks, 32 GB of RAM, and a 1 TB SSD.
• Software: The following libraries and frameworks were used: TensorFlow (v2.4) for deep learning models, including CNNs; Hugging Face Transformers (v4.4) for NLP tasks using BERT; NetworkX (v2.5) and Gephi for graph-based analysis; and Scikit-learn (v0.24) for traditional machine learning models such as decision trees and for classification metrics.

While we do not currently provide public access to the full code or datasets, we encourage researchers to contact the corresponding author for access upon request, via email or an official data use agreement if necessary. This ensures compliance with ethical and legal standards while preserving the ability to verify and replicate the results presented in this study. To facilitate replication, we provide a synthetic dataset (Table A1), pseudocode for the key analysis workflows, and a forensic pipeline template in Appendix A. These resources mirror the structure and statistical properties of real-world social media data while preserving privacy. Researchers may adapt them to prototype threat detection algorithms or to validate chain-of-custody procedures in controlled environments.

While the methodologies demonstrated significant advancements, limitations included:
• limited access to private data because of legal constraints;
• the need for high-quality training datasets on which to base ML algorithms;
• the heavy computational demands of deep learning models.

This methodological framework offers a comprehensive approach to the complex area of social media forensics, combining advanced technologies with strict ethical standards to support reliable, scalable forensic investigations.

The most important outcomes of the study are the quantitative results, qualitative insights, and empirical case studies summarized below; together they demonstrate the efficiency and trustworthiness of the proposed forensic methodologies for handling social media data. Automated data collection tools were much faster than manual processes, and their ability to scale to large datasets across multiple platforms accounts for these improvements. Sentiment detection using NLP models achieved high accuracy; the models effectively identified emotionally charged posts, making them increasingly useful for detecting cyberbullying and other cyberthreats. AI-powered image forensics made substantial progress in identifying people from social media images, and this accuracy attests to the usefulness of deep learning models for verifying multimedia evidence.

In the cyberbullying case study, NLP showed how harassing tweets on Twitter escalated. Metadata confirmed many key events within a timeline, and sentiment analysis linked negative posts to subsequent harmful actions. Over 65% of the analyzed tweets expressed negative sentiment directed at a specific individual, and the metadata established the sequence of interactions, showing investigators where to look for the likely principal actors in the case.
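The metadata-based timeline reconstruction described above can be sketched as follows: posts are parsed, normalized to UTC, and sorted chronologically so the sequence of interactions is explicit. The field names and records are hypothetical, not the case-study data.

```python
# Minimal sketch: reconstructing an event timeline from post metadata.
# Field names and records are hypothetical.
from datetime import datetime, timezone

posts = [
    {"id": "t3", "author": "user_b", "timestamp": "2024-03-01T18:42:00+00:00",
     "text": "reply to the thread", "geo": (53.3498, -6.2603)},
    {"id": "t1", "author": "user_a", "timestamp": "2024-03-01T17:05:00+00:00",
     "text": "original post", "geo": None},
    {"id": "t2", "author": "user_a", "timestamp": "2024-03-01T17:31:00+00:00",
     "text": "follow-up post", "geo": (53.3498, -6.2603)},
]

def build_timeline(posts):
    """Order posts chronologically so investigators can trace the sequence of interactions."""
    parsed = [dict(p, ts=datetime.fromisoformat(p["timestamp"]).astimezone(timezone.utc))
              for p in posts]
    return sorted(parsed, key=lambda p: p["ts"])

for event in build_timeline(posts):
    print(f'{event["ts"].isoformat()}  {event["author"]:>7}  {event["id"]}  geo={event["geo"]}')
```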
Network analysis identified a coordinated scam network on Facebook, and community detection algorithms located the key influencers within it. Central nodes orchestrated fraudulent schemes via phishing links, and the relationships among actors revealed the hierarchical structure of the scam operation.

The misinformation analysis focused on false information spread on Instagram during a public health crisis. Text mining revealed common themes in posts promoting false news: over 78% of the analyzed posts about preventive measures contained misinformation, and network analysis identified clusters of bots amplifying false narratives.

Overall, the proposed methodologies demonstrated significant advances in forensic capability:
• Scalability: more than 1,000 posts per hour were processed automatically, enabling prompt evidence collection.
• Accuracy: advanced ML models consistently attained accuracy rates above 85%.
• Reliability: blockchain-based preservation guaranteed data integrity and supported legal admissibility.
• Error analysis: despite overall effectiveness, certain edge cases exposed performance gaps. In sentiment analysis, the models struggled with sarcasm, satire, and slang-based harassment, leading to false negatives in cyberbullying detection; image models occasionally failed to detect tampered content when faces were occluded or blurred; and misinformation detection produced some false positives on humorous or parody content. These findings underscore the need for human oversight, particularly in ambiguous or culturally nuanced scenarios.

AI and ML are now essential in social media forensics, enabling the analysis of terabytes of data in a reproducible, precise, and scalable manner. These technologies automate processes that otherwise require large amounts of human effort, namely content analysis, pattern recognition, and anomaly detection. Automated extraction and analysis of social media data with AI addresses the high data volume, integrity requirements, and dynamic nature of online content, giving forensic investigators a workable solution. Key AI applications include the following. AI-based sentiment analysis tools can assess the emotional content of text posts, revealing whether a writer's tone indicates a potential threat, criminal intent, or something else; in the cyberbullying cases, for example, sentiment analysis showed that negative sentiment intensified as harassment of the targets increased, with the models achieving an F1 score of 86% (precision 87%, recall 85%). Detecting deepfake videos and manipulated images relies on AI models such as convolutional neural networks (CNNs); in controlled tests (28), these models exceeded 92% accuracy in detecting inconsistencies in facial expressions, lighting, or pixel patterns. AI also improves the processing of multimedia data by extracting features of the objects, scenes, and individuals appearing in social media posts; facial recognition systems, for instance, can reach 90% accuracy in matching people across systems, cutting manual verification time considerably. Unsupervised machine learning models add powerful capabilities for spotting unusual activity patterns, such as sudden spikes in interactions or coordinated bot behavior in misinformation campaigns.
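A minimal sketch of the kind of graph analysis behind the fraud and coordination findings above is given below: an interaction graph is built with NetworkX, nodes are ranked by centrality, and communities are detected. The edge list is toy data, not the Facebook case-study network.

```python
# Minimal sketch: interaction graph, centrality ranking, and community detection
# with NetworkX. The edge list is toy data.
import networkx as nx

# Directed edges: (source account, target account) for shares, replies, or mentions.
edges = [
    ("acct_1", "acct_2"), ("acct_1", "acct_3"), ("acct_1", "acct_4"),
    ("acct_2", "acct_5"), ("acct_3", "acct_5"), ("acct_4", "acct_5"),
    ("acct_6", "acct_1"), ("acct_7", "acct_1"),
]
G = nx.DiGraph(edges)

# Betweenness centrality highlights likely orchestrating ("central") nodes.
centrality = nx.betweenness_centrality(G)
top_nodes = sorted(centrality, key=centrality.get, reverse=True)[:3]
print("Most central accounts:", top_nodes)

# Greedy modularity communities approximate coordinated clusters (undirected view).
communities = nx.algorithms.community.greedy_modularity_communities(G.to_undirected())
for i, community in enumerate(communities):
    print(f"Community {i}: {sorted(community)}")
```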
Recent advances in blockchain technology have demonstrated its potential to enhance the reliability and immutability of forensic evidence. For instance, (42) proposed BAIoT-EMS, a consortium blockchain framework that integrates AIoT for secure, real-time data validation, an approach that aligns with our use of blockchain for tamper-proof evidence logging (Section 6.2). Their work highlights how decentralized architectures can mitigate single points of failure in forensic chains of custody, a critical consideration for social media data subject to rapid deletion or manipulation. Similarly, (43) introduced B-LPoET, a lightweight Proof-of-Elapsed-Time (PoET) consensus mechanism optimized for resource-constrained environments. This innovation addresses the scalability challenges we encountered in Section 3.7, where computational demands hindered real-time analysis, and supports our argument for hybrid AI/blockchain solutions that balance efficiency with forensic rigor.

Machine learning (ML) models support forensic investigations through advanced data analysis capabilities. These include:
1. NLP: Extracting and making sense of textual data requires NLP techniques. Algorithms such as Bidirectional Encoder Representations from Transformers (BERT) allow investigators to build models from unstructured data and to identify expressions such as hate speech, threats, and misinformation in user posts.
2. Clustering: Clustering algorithms (e.g., k-means) group similar content, enabling analysts to identify trends or coordinated actions.
3. Classification: Classification models, such as decision trees or random forests, categorize content, for example as spam, dangerous content, or promotion.
4. Prediction: Predictive models allow investigators to anticipate threats or criminal activity from past data; for instance, ML algorithms detected patterns of fraudulent transactions on social media platforms with prediction accuracies above 85% (34; 35).

We evaluated model fairness across gender and ethnicity using a demographic subgroup analysis. The results showed slightly lower F1 scores (4%) for underrepresented groups in image classification tasks. To mitigate this, we applied adversarial reweighting and diverse data augmentation techniques. A "Responsible AI in Forensics" framework (Figure 4) outlines the corresponding ethical guidelines. While federated learning architectures show promise for privacy-preserving forensics, we prioritize peer-validated methods such as those formalized by (52) in their IEEE Transactions on Information Forensics and Security study, which demonstrated provable security guarantees for distributed forensic analysis while maintaining GDPR compliance. For adversarial robustness testing, we cite the IEEE Transactions on Information Forensics study by (53), which formalizes bias-mitigation frameworks for forensic AI, an approach mirrored in our SHAP analysis.

To ensure transparency in model decisions, we evaluated both SHAP (SHapley Additive exPlanations) and LIME (Local Interpretable Model-agnostic Explanations) for interpretability. While LIME's perturbation-based approach offers local fidelity (44), our empirical tests on cyberbullying detection datasets showed that SHAP provided superior consistency for high-dimensional social media data (a precision improvement of 12%; see Table 6). This aligns with the findings of (46), who demonstrated SHAP's stability in handling complex feature interactions, which is critical for forensic applications requiring reproducible results.
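As a small illustration of how SHAP values can be attached to an individual text prediction, the sketch below explains a simple TF-IDF plus logistic regression classifier with the shap package. The toy posts, labels, and model are stand-ins for illustration only, not the study's pipeline.

```python
# Minimal sketch: SHAP explanations for a TF-IDF + logistic regression text classifier.
# Toy data; shap's LinearExplainer handles linear models directly.
import shap
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

texts = ["you are worthless", "nice work on the project",
         "nobody wants you here", "see you at the meeting"]
labels = [1, 0, 1, 0]  # 1 = abusive, 0 = benign (toy labels)

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts).toarray()
clf = LogisticRegression().fit(X, labels)

# Explain which terms push a post toward the "abusive" class.
explainer = shap.LinearExplainer(clf, X)
query = vectorizer.transform(["you are worthless"]).toarray()
shap_values = explainer.shap_values(query)

features = vectorizer.get_feature_names_out()
contributions = sorted(zip(features, shap_values[0]), key=lambda t: abs(t[1]), reverse=True)
print(contributions[:5])  # top contributing terms for this prediction
```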
For image forensics, SHAP's integration with Grad-CAM (47) enabled spatially coherent explanations of CNN decisions (e.g., highlighting manipulated facial regions), whereas LIME's segment-based approximations struggled with pixel-level artifacts (false positives reduced by 18%). These results corroborate recent work by (48) in IEEE Transactions on Information Forensics, which advocates SHAP for legally admissible model explanations. An implementation caveat: LIME remains valuable for rapid prototyping because of its lower computational overhead, but its sensitivity to perturbation parameters (49) limits forensic reliability, so we reserved SHAP for court-reportable analyses. Our pseudocode (Appendix A) demonstrates how SHAP explanations are integrated into post-flagging decisions.

While AI and ML provide significant advantages in social media forensics, they also present challenges:
1. Algorithmic bias: Models trained on biased data may produce discriminatory or unreliable outcomes (36; 37).
2. Computational cost: Processing large datasets requires substantial computational resources, which not all forensic teams have available.
3. Explainability: Black-box models are often impossible to explain at face value, which makes them unsuitable for legal settings that require explainability (38; 39).

AI and ML have nonetheless advanced social media forensics considerably, automating processes, boosting accuracy, and scaling analysis to meet growing investigative demands. Challenges aside, these technologies hold great potential for improving evidence reliability and supporting a just system. Forensic investigators should be equipped to use AI tools, and future research should work to reduce biases, promote model transparency, and create lightweight algorithms better suited to forensic practice.

Social media evidence is admissible only with strict adherence to established standards for handling and presenting evidence. Courts subject such evidence to requirements of relevance, authenticity, reliability, and lawfulness of procedure (1); if these requirements are not met, evidence critical to the investigation may be excluded.
1. Relevance: The evidence must bear directly on the facts of the case or corroborate other information. For example, geotagged Instagram photos can indicate where a suspect was when a crime was committed.
2. Authenticity: The evidence must be verifiable as to its origin and integrity. Common techniques for authenticating social media content include metadata analysis, hash value comparison, and corroborating testimony.
3. Reliability: The evidence must be free from tampering or alteration. Increasingly, this relies on blockchain technology, which records immutable data trails.
4. Lawfulness: Requests to gather data must follow legal procedures such as obtaining warrants or subpoenas, particularly for private content protected by laws such as GDPR (6).

A cornerstone of admissibility is the chain of custody: demonstrating that the social media evidence was not tampered with from collection to court. This involves:
• Documentation: keeping a log of when, what, where, and how collection took place.
• Hashing: creating and storing a unique hash value for each file and using it to detect tampering (a minimal sketch follows this list).
• Secure storage: keeping evidence in tamper-proof environments such as encrypted drives or blockchain systems.
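The hashing step of such a chain-of-custody log can be sketched as follows: a SHA-256 digest is computed for an evidence file and recorded with a timestamp and collector name. The file paths and log format are illustrative, not the authors' tooling.

```python
# Minimal sketch: hash an evidence file (SHA-256) and append a chain-of-custody record.
# File paths and the log format are illustrative.
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Stream the file in chunks so large multimedia evidence does not exhaust memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def log_acquisition(evidence_path: Path, collector: str,
                    log_path: Path = Path("custody_log.jsonl")) -> dict:
    record = {
        "file": str(evidence_path),
        "sha256": sha256_of(evidence_path),
        "collected_by": collector,
        "collected_at": datetime.now(timezone.utc).isoformat(),
    }
    with log_path.open("a") as log:
        log.write(json.dumps(record) + "\n")  # append-only log; re-hash later to detect tampering
    return record

# Example: log_acquisition(Path("post_1234_image.jpg"), collector="analyst_01")
```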
United States v. Browne (2016) illustrates the challenge of authenticating social media evidence. In that case, Facebook messages played a key role in demonstrating intent, but they were initially contested on authenticity grounds. Investigators authenticated and corroborated the messages by presenting metadata such as timestamps and IP addresses, and the court admitted the evidence, underscoring the importance of a robust chain of custody and corroboration (40). We elaborated our admissibility protocols in line with this precedent. The following measures were taken:
• Chain of custody: all digital evidence was hashed (SHA-256) and logged with blockchain timestamps.
• Metadata preservation: tools such as FTK Imager ensured origin traceability and immutability.
• Legal compliance: only public data was used unless access was legally granted via warrants, and compliance with GDPR and CCPA was ensured.

Supporting techniques include metadata analysis, which extracts metadata such as creation dates and geolocation tags to check the origin of content (geotagged tweets, for instance, have been used to confirm suspects' locations); blockchain for evidence integrity, which logs and timestamps evidence collection to create a tamper-proof record (a minimal sketch of such a hash-chained log appears at the end of this subsection); and digital forensic tools such as FTK Imager and EnCase, which ensure secure data acquisition and preservation.

Remaining challenges include privacy concerns, since privacy laws restrict access to user data and force investigators to balance evidentiary needs against ethical obligations, with violations risking the exclusion of evidence, and cross-jurisdictional issues, since the global operation of social media platforms creates conflicting jurisdictions for data access and admissibility. We therefore recommend: (1) standardizing procedures for the collection and authentication of social media evidence; (2) adopting advanced tools, using AI for fast analysis and blockchain for preserving evidence integrity; and (3) training investigators in both the legal and the technical aspects of social media forensics. Ensuring the admissibility of social media evidence requires attention to technical, legal, and ethical considerations alike; robust standards for evidence collection, preservation, and presentation lend credibility to investigators' findings and strengthen their impact in court.

The integration of blockchain technology into IoT forensic investigations changes the paradigm for preserving digital evidence. Blockchain not only ensures tamper-proof storage but also makes the process transparent and open to scrutiny, although scalability and integration challenges must still be addressed before wide adoption. This study contributes to the growing body of knowledge on blockchain's application in digital forensics by:
• providing a thorough review of the uses and shortcomings of blockchain in IoT forensics;
• presenting practical challenges and suggesting solutions for real-world implementations;
• offering actionable recommendations for law enforcement and forensic practitioners;
• Policy recommendations: underlining the case for standardized protocols, rather than ad hoc sharding schemes, for interoperability across blockchain systems and forensic tools;
• Future research: validating the theoretical findings through empirical studies.
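To make the tamper-evident logging idea concrete, here is a minimal sketch of an append-only, hash-chained evidence ledger in the spirit of the blockchain-based preservation discussed above. It is an illustration only, not the authors' implementation: a real deployment would add consensus, digital signatures, and replication.

```python
# Minimal sketch: append-only, hash-chained evidence ledger (blockchain-style linking).
# Illustrative only; a real system would add consensus, signatures, and replication.
import hashlib
import json
from datetime import datetime, timezone

class EvidenceLedger:
    def __init__(self):
        self.entries = []

    def append(self, evidence_hash: str, description: str) -> dict:
        prev_hash = self.entries[-1]["entry_hash"] if self.entries else "0" * 64
        body = {
            "index": len(self.entries),
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "evidence_hash": evidence_hash,   # e.g., SHA-256 of the collected file
            "description": description,
            "prev_hash": prev_hash,           # links this entry to the previous one
        }
        body["entry_hash"] = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        self.entries.append(body)
        return body

    def verify(self) -> bool:
        """Recompute every link; any edited entry breaks the chain."""
        for i, entry in enumerate(self.entries):
            expected_prev = self.entries[i - 1]["entry_hash"] if i else "0" * 64
            body = {k: v for k, v in entry.items() if k != "entry_hash"}
            recomputed = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
            if entry["prev_hash"] != expected_prev or entry["entry_hash"] != recomputed:
                return False
        return True

ledger = EvidenceLedger()
ledger.append("ab12...", "Facebook post screenshot, case 2024-17")
ledger.append("cd34...", "Tweet JSON export, case 2024-17")
print(ledger.verify())  # True unless an entry has been altered
```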
The increasing application of AI in forensic investigations presents several ethical concerns, particularly regarding potential misuse. While AI technologies such as those used in social media forensics offer significant improvements in efficiency and accuracy, they also carry risks when used inappropriately.
1. Dual-use dilemma: AI-driven forensic tools can aid criminal investigations, but they could also be exploited for unethical purposes such as mass surveillance or biased profiling. Facial recognition technology, for example, could violate privacy and civil liberties if improperly applied. Clear legal and ethical boundaries for the use of AI in forensic investigations are therefore essential.
2. Transparency: Transparency in AI decision-making is essential for fairness and accountability. Tools such as SHAP (SHapley Additive exPlanations) and LIME (Local Interpretable Model-agnostic Explanations) should be employed to make the outputs of AI models understandable to both investigators and the general public. Greater interpretability helps forensic investigators assess whether AI models are behaving as expected and avoid unintended consequences.
3. Ethical oversight: AI models used in forensics must be continually monitored for potential biases and inaccuracies. Racial or gender biases in training data, for instance, could lead models to disproportionately target certain groups. Adversarial testing and ongoing model audits are necessary to prevent such issues, and an ethical oversight committee that includes both AI experts and legal professionals helps ensure that forensic AI tools are used responsibly.
4. Safeguarding against abuse: Clear policies must prevent the misuse of AI outside legitimate forensic investigations, including regular reviews of AI tools, monitoring of their applications, and restricting access to sensitive AI systems to authorized personnel. Public transparency, periodic audits, and independent oversight are vital safeguards against potential abuses.

While our study demonstrates the viability of blockchain for evidence preservation (Section 6.3), scalability remains a hurdle to widespread adoption. (43) offers a promising path forward with B-LPoET, which reduces computational overhead without compromising security, a trade-off directly relevant to the limitations noted in Section 3.7. Further, (42) underscores the role of consortium networks (e.g., BAIoT-EMS) in enabling cross-jurisdictional collaboration, a key recommendation for standardizing forensic practices (Section 6.6). Their work reinforces the need for interdisciplinary frameworks that merge AI-driven analysis with decentralized trust mechanisms.

The incorporation of social media data into digital forensic investigations has had a transformative effect, giving investigators access to real-time data and allowing events to be reconstructed with unprecedented fidelity. Yet this research also highlights the major obstacles that must be addressed to make social media a fully valid source of evidence. It advances forensic analysis through an in-depth review of existing methodologies combined with the development of more advanced AI-driven solutions that balance technical innovation with ethics and legality.
The key findings were as follows. Efficiency and scalability: automated tools reduced data collection time by 70 percent, from an average of 15 hours of manual collection to only 4 hours for platforms such as Twitter and Facebook. Using NLP, we obtained high accuracy metrics, including a precision of 87%. In network analysis, we identified central influencers in misinformation campaigns, charting coordinated efforts through visualizations in Gephi. Under GDPR it was essential that evidence remain admissible, so compliance with privacy regulations was paramount, and data integrity was safeguarded using blockchain technology, preventing tampering with evidence. Case studies demonstrated the real-world applicability of the proposed methods, particularly in cyberbullying and fraud detection, and the facial recognition models achieved a precision of 92% in identifying individuals from multimedia content. The study also proposed standardized ways to process dynamic and diverse social media data and implemented advanced techniques to extract and validate metadata, yielding reliable timeline and geolocation data.

The conclusion of this study emphasizes the importance of interdisciplinary collaboration between forensic analysts, data scientists, and legal experts in addressing the complexities of social media forensics. AI and ML integration boosts scalability and accuracy, but further research is needed on algorithmic bias and the computational cost of deep learning models. The future of forensic AI includes:
• multimodal fusion of text, image, and metadata for holistic analysis;
• real-time inference engines for rapid forensic response;
• federated learning models that respect data privacy;
• blockchain-backed forensic ledgers to guarantee evidence traceability;
• explainable AI (XAI) as a legal necessity for model decisions in court.

We offer four recommendations. 1. Standardized guidelines: universal guidelines for data collection, preservation, and analysis will reduce variability across investigations and increase the reliability of findings. 2. Investment in AI technologies: government and organizational funding should prioritize the AI tools that power social media forensics, such as sentiment analysis, image recognition, and predictive modeling. 3. Training: forensic analysts must be trained in privacy laws and ethical considerations so that compliance and evidence integrity are maintained consistently. 4. Empirical validation: future research should validate the broad applicability of these methodologies by applying them to other domains, such as human trafficking, hate speech, and terrorism investigations.

By addressing a critical gap, this research contributes to the emerging field of social media forensics. The study harnesses AI-driven approaches while strictly abiding by legal and ethical norms to bolster the reliability and admissibility of social media evidence in digital crime prosecution. Going forward, further technological advances and deeper collaboration between disciplines will be needed to refine these methods and push back the current limitations of social media forensics.

Appendix A provides a synthetic dataset whose fields mimic real metadata from Twitter, Facebook, and Instagram; timestamps and locations are artificially generated.
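As a small illustration of what such synthetic records might look like, the sketch below generates one artificial post record. The field names are hypothetical and only mirror the kind of metadata described for Table A1; they are not the actual appendix schema.

```python
# Minimal sketch: generating a synthetic social media post record of the kind
# described for Appendix A (Table A1). Field names are hypothetical.
import random
import uuid
from datetime import datetime, timedelta, timezone

PLATFORMS = ["twitter", "facebook", "instagram"]
SAMPLE_TEXTS = ["check this out", "you should be ashamed", "free giveaway, click here"]

def synthetic_post(seed=None):
    rng = random.Random(seed)
    timestamp = datetime(2024, 1, 1, tzinfo=timezone.utc) + timedelta(minutes=rng.randrange(0, 525_600))
    return {
        "post_id": str(uuid.UUID(int=rng.getrandbits(128))),
        "platform": rng.choice(PLATFORMS),
        "author": f"user_{rng.randrange(1, 500):03d}",
        "timestamp": timestamp.isoformat(),
        "text": rng.choice(SAMPLE_TEXTS),
        "latitude": round(rng.uniform(-90, 90), 4),    # artificially generated location
        "longitude": round(rng.uniform(-180, 180), 4),
        "hashtags": rng.sample(["#news", "#sale", "#health", "#crypto"], k=2),
    }

print(synthetic_post(seed=42))
```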
Algorithm A1: Keyword- and Sentiment-Based Flagging.
Listing 1. Social media forensic analysis pseudocode workflow sample: forensic analysis pipeline.
• Scrape or download posts using an API or crawler.
• Store metadata (timestamps, geolocation, hashtags, etc.).
• Remove noise (ads, non-relevant posts).
• Normalize text (lowercase, remove stop words).
• Enrich geotags (map vague locations to coordinates).
• Extract keywords or hashtags.
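The following is a minimal runnable sketch of a keyword- and sentiment-based flagging step in the spirit of Algorithm A1. It uses a simple keyword list and lexicon score rather than the study's trained models; the keyword set, lexicon, and threshold are illustrative assumptions.

```python
# Minimal sketch of keyword- and sentiment-based flagging in the spirit of Algorithm A1.
# Keyword list, lexicon, and threshold are illustrative, not the study's trained models.
THREAT_KEYWORDS = {"kill", "hurt", "worthless", "scam", "leak"}
NEGATIVE_LEXICON = {"hate": -1.0, "worthless": -1.0, "pathetic": -0.8, "scam": -0.7,
                    "great": 0.8, "thanks": 0.6, "love": 0.9}

def sentiment_score(text):
    """Crude lexicon average; a deployed system would use a trained model (e.g., BERT)."""
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    scores = [NEGATIVE_LEXICON[t] for t in tokens if t in NEGATIVE_LEXICON]
    return sum(scores) / len(scores) if scores else 0.0

def flag_post(post, threshold=-0.5):
    tokens = {t.strip(".,!?").lower() for t in post["text"].split()}
    keyword_hits = sorted(tokens & THREAT_KEYWORDS)
    score = sentiment_score(post["text"])
    post["flagged"] = bool(keyword_hits) or score <= threshold
    post["keyword_hits"] = keyword_hits
    post["sentiment"] = score
    return post

posts = [
    {"id": "p1", "text": "You are worthless and everyone knows it"},
    {"id": "p2", "text": "Thanks for the great evening!"},
]
for p in posts:
    print(flag_post(p))
```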

Keywords: AI in forensics, cybercrime investigation, forensic analysis, social media forensics

Received: 25 Jul 2025; Accepted: 28 Oct 2025.

Copyright: © 2025 Arshad, Ahmad, ONN and Sam. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Muhammad Arshad, muhammad.arshad@tudublin.ie

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.