Beyond-accuracy: a review on diversity, serendipity, and fairness in recommender systems based on graph neural networks

By providing personalized suggestions to users, recommender systems have become essential to numerous online platforms. Collaborative filtering, particularly graph-based approaches using Graph Neural Networks (GNNs), has demonstrated great results in terms of recommendation accuracy. However, accuracy may not always be the most important criterion for evaluating recommender systems' performance, since beyond-accuracy aspects such as recommendation diversity, serendipity, and fairness can strongly influence user engagement and satisfaction. This review paper focuses on addressing these dimensions in GNN-based recommender systems, going beyond the conventional accuracy-centric perspective. We begin by reviewing recent developments in approaches that not only improve the accuracy-diversity trade-off but also promote serendipity and fairness in GNN-based recommender systems. We discuss different stages of model development, including data preprocessing, graph construction, embedding initialization, propagation layers, embedding fusion, score computation, and training methodologies. Furthermore, we examine the practical difficulties encountered in assuring diversity, serendipity, and fairness while retaining high accuracy. Finally, we discuss potential future research directions for developing more robust GNN-based recommender systems that go beyond the unidimensional perspective of focusing solely on accuracy. This review aims to provide researchers and practitioners with an in-depth understanding of the multifaceted issues that arise when designing GNN-based recommender systems, setting our work apart by offering a comprehensive exploration of beyond-accuracy dimensions.


INTRODUCTION
With their ability to provide personalized suggestions, recommender systems have become an integral part of numerous online platforms by helping users find relevant products and content Aggarwal et al. (2016). There are various methods employed to implement recommender systems, among which collaborative filtering (CF) has proven to be particularly effective due to its ability to leverage user-item interaction data to generate personalized recommendations Koren et al. (2021). Recent advances in Graph Neural

BACKGROUND
Graph neural networks (GNNs) have recently emerged as an effective way to learn from graph-structured data by capturing complex patterns and relationships Hamilton (2020). Through the propagation and transformation of feature information among interconnected nodes in a graph, GNNs can effectively capture the local and global structure of the given graphs. Consequently, they are especially suitable for tasks involving interconnected, relational data, such as social network analysis, molecular chemistry, and recommender systems.
In recommender systems, integrating GNNs with traditional collaborative filtering techniques has been shown to be beneficial. Representing users and items as nodes in a graph, with interactions acting as edges, allows GNNs to provide more accurate personalized recommendations by discovering and utilizing intricate connections that would otherwise remain undetected Wang et al. (2019a). In particular, higher-order connectivity and transitive relationships play an essential role in extracting user preferences in certain scenarios.
GNN-based recommender systems represent an evolving field with continuous advancements and innovations. Recent research has focused on multiple aspects of GNNs in recommender systems, ranging from optimizing propagation layers to effectively managing large-scale graphs and integrating auxiliary information Zhou et al. (2022). Aside from these aspects, there is growing interest in exploring beyond-accuracy objectives for recommender systems. Such objectives include diversity, explainability/interpretability, fairness, serendipity/novelty, privacy/security, and robustness, which offer a more comprehensive evaluation of the system's performance Wu et al. (2022a); Gao et al. (2023). However, our work focuses primarily on three key aspects: diversity, serendipity, and fairness, since these aspects have a significant impact on user satisfaction and also touch on ethical concerns in the field of recommender systems. Ensuring diversity amongst recommendations minimizes over-specialization effects, benefiting users in product/content discovery and exploration Kunaver and Požrl (2017). Considering serendipity also helps to overcome the over-specialization problem by allowing the system to recommend novel, relevant, and unexpected items, thus improving user satisfaction Kaminskas and Bridge (2016). The aspect of fairness ensures that the system does not discriminate against certain users or item providers, thereby promoting equitable user experiences Deldjoo et al. (2023). Diversity, serendipity, and fairness in recommender systems are interconnected and often influence each other. For instance, increasing diversity can lead to more serendipitous recommendations, since users are exposed to a wider range of unexpected and less-known items Kotkov et al. (2020). Furthermore, focusing on diversity and serendipity can also promote fairness, since it ensures a more equitable distribution of recommendations across items and prevents the system from consistently suggesting only popular items Mansoury et al. (2020). However, it is important to note that these aspects need to be balanced with the system's accuracy and relevance to maintain user satisfaction. Considering beyond-accuracy dimensions supports the development of GNN-based recommender systems that are not only robust and accurate but also user-centric and ethically considerate.
While GNNs have seen rapid advancements, their application in recommender systems has also been the subject of several surveys. Wu et al. (2022a) and Gao et al. (2023) provide a broad overview of GNN methods in recommender systems, touching upon aspects of diversity and fairness. Dai et al. (2022) delve into fairness in graph neural networks in general, briefly discussing fairness in GNN-based recommender systems. Meanwhile, Fu et al. (2023) explore serendipity in deep learning recommender systems, with limited focus on GNN-based recommenders. Building on these insights, our review distinctively emphasizes the importance of diversity, serendipity, and fairness in GNN-based recommender systems, offering a deeper dive into these dimensions.
To conduct our review, we searched for literature on Google Scholar using keywords such as "diversity", "serendipity", "novelty", "fairness", "beyond-accuracy", "graph neural networks" or "recommender system". We manually checked the resulting papers for their relevance and retrieved 21 publications overall from relevant journals and conferences in the field (see Table 1). While re-ranking and post-processing methods are often used when optimizing beyond-accuracy metrics in recommender systems Gao et al. (2023), this paper specifically concentrates on advancements within GNN-based models, thus leaving these methods outside the discussion. Finally, it is important to highlight that diversity, serendipity, and fairness are extensively researched in recommender systems beyond GNNs. Broader literature across various architectures has provided insights into these challenges and their overarching solutions. While our paper primarily focuses on GNNs, we direct readers to consult these works for a comprehensive perspective Kaminskas and Bridge (2016); Wang et al. (2023a).

MODEL DEVELOPMENT
The construction of a GNN-based recommender system is a complex, multi-stage process that requires careful planning and execution at each step. These stages include data preprocessing (DP), graph construction (GC), embedding initialization (EI), propagation layers (PL), embedding fusion (EF), score computation (SC), and training methodologies (TM). In this section, we provide an overview of this multi-stage process, as it is crucial for understanding the specific stages at which current research has concentrated efforts to address the beyond-accuracy aspects of diversity, serendipity, and fairness in GNN-based recommender systems.
Figure 1. The simplified multi-stage process of developing a GNN-based recommender system. Each of these stages strongly impacts the resulting recommendations and can be considered when designing a model that takes beyond-accuracy objectives into account.

Data preprocessing, graph construction, embedding initialization
The initial stage of developing a GNN-based collaborative filtering model is data preprocessing, where user-item interaction data and auxiliary information such as user/item features or social connections are collected and processed Lacic et al. (2015a); Duricic et al. (2018); Fan et al. (2019a); Wang et al. (2019b); Duricic et al. (2020). Techniques like data imputation ensure that missing data is filled, providing a more complete dataset, while outlier detection helps in maintaining the data's integrity. Feature normalization ensures consistent data scales, enhancing model performance. Addressing the cold-start problem at this stage ensures that new users or items without sufficient interaction history can still receive meaningful recommendations Lacic et al. (2015b); Liu et al. (2020).
The graph construction stage is crucial, as the graph's structure directly influences the model's efficacy. Choosing the type of graph determines the nature of relationships between nodes. Adjusting edge weights can prioritize certain interactions, while adding virtual nodes/edges can introduce auxiliary information to improve recommendation quality Wang et al. (2020); Kim et al. (2022); Wang et al. (2023b).
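As a rough illustration of the graph construction step, the following sketch builds the adjacency matrix of an undirected user-item bipartite graph from a list of interactions. The toy interactions, the helper name `build_bipartite_adjacency`, and the optional per-edge weights are assumptions made for illustration; they are not taken from any of the cited models.

```python
import numpy as np

def build_bipartite_adjacency(interactions, num_users, num_items, weights=None):
    """Return the symmetric (U+I) x (U+I) adjacency matrix of a user-item graph.

    Users occupy node indices [0, num_users); items occupy
    [num_users, num_users + num_items). `weights` optionally assigns one
    edge weight per interaction (default 1.0).
    """
    n = num_users + num_items
    A = np.zeros((n, n), dtype=float)
    for idx, (u, i) in enumerate(interactions):
        w = 1.0 if weights is None else weights[idx]
        A[u, num_users + i] = w  # user -> item edge
        A[num_users + i, u] = w  # item -> user edge (graph is undirected)
    return A

# Toy data (hypothetical): 3 users, 4 items, 5 interactions.
interactions = [(0, 0), (0, 1), (1, 1), (2, 2), (2, 3)]
A = build_bipartite_adjacency(interactions, num_users=3, num_items=4)
```

A dense matrix keeps the sketch short; for realistic catalog sizes a sparse representation would be the more practical choice.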
In the embedding initialization stage, nodes are assigned low-dimensional vectors or embeddings. The choice of embedding size balances computational efficiency and representation power. Different initialization methods offer trade-offs between convergence speed and stability. Including diverse information in the embeddings can capture richer user-item relationships, enhancing recommendation quality Wang et al. (2021). This initialization can be represented as $H^{(0)} = [h^{(0)}_{\text{user}}; h^{(0)}_{\text{item}}]$, where $h^{(0)}_{\text{user}}$ and $h^{(0)}_{\text{item}}$ are the initial embeddings of the user and item nodes, respectively.
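A minimal sketch of this initialization, assuming randomly initialized user and item embeddings that are stacked row-wise into a single matrix of initial node embeddings (users first, then items). The embedding size, the scale of the normal initializer, and the seed are arbitrary choices for illustration.

```python
import numpy as np

def init_embeddings(num_users, num_items, dim, scale=0.1, seed=0):
    """Draw small random embeddings and stack them into one node matrix."""
    rng = np.random.default_rng(seed)
    h_user = rng.normal(0.0, scale, size=(num_users, dim))  # initial user embeddings
    h_item = rng.normal(0.0, scale, size=(num_items, dim))  # initial item embeddings
    H0 = np.vstack([h_user, h_item])  # rows: users, then items
    return H0, h_user, h_item

H0, h_user, h_item = init_embeddings(num_users=3, num_items=4, dim=8)
```

Auxiliary information such as item features could replace or be combined with the random draws; that variant is not shown here.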

Propagation layers, embedding fusion, score computation, training methodologies
Propagation layers in GNNs aggregate and transform features of neighboring nodes to generate node embeddings, represented as $H^{(l+1)} = \sigma\left(D^{-1/2} A D^{-1/2} H^{(l)} W^{(l)}\right)$, where $H^{(l)}$ is the matrix of node features at layer $l$, $A$ is the adjacency matrix, $D$ is the degree matrix, $W^{(l)}$ is the weight matrix at layer $l$, and $\sigma$ is the activation function Hamilton (2020). There are numerous approaches built on this concept. For instance, He et al. (2020) adopt a simplified approach, emphasizing straightforward neighborhood aggregation to enhance the quality of node embeddings, whereas Fan et al. (2019a) integrate user-item interactions with user-user and item-item relations, capturing complex interactions through a comprehensive graph structure.
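The general propagation scheme described above can be sketched in a few lines of NumPy. The sketch assumes a symmetric degree normalization of the adjacency matrix and a ReLU activation; individual models differ on both points, so this is one instance of the general scheme rather than the layer of any specific cited model. The toy graph and identity weights are hypothetical.

```python
import numpy as np

def normalize_adjacency(A):
    """Compute D^{-1/2} A D^{-1/2}; isolated nodes keep all-zero rows."""
    deg = A.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg)
    nz = deg > 0
    d_inv_sqrt[nz] = deg[nz] ** -0.5
    return d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

def propagate(A, H, W, activation=lambda x: np.maximum(x, 0.0)):
    """One propagation layer: activation(A_norm @ H @ W)."""
    return activation(normalize_adjacency(A) @ H @ W)

# Toy graph (hypothetical): a path 0 - 1 - 2.
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
H = np.eye(3)  # one-hot node features
W = np.eye(3)  # identity weights, so the output equals the normalized adjacency
H_next = propagate(A, H, W)
```

Stacking several such layers lets information flow across multi-hop neighborhoods, which is how higher-order connectivity enters the embeddings.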
Afterward, these embeddings are combined during the embedding fusion stage by applying a weighted summation, concatenation, or a more complex method of combining user and item embeddings, forming a latent user-item representation used for score computation Wang et al. (2019a); He et al. (2020).
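The two simplest fusion strategies mentioned here, weighted summation and concatenation, can be sketched as follows. The sketch assumes that fusion combines the embeddings produced by successive propagation layers, which is a common reading but not the only possible one; the uniform default weights and the function names are illustrative assumptions.

```python
import numpy as np

def fuse_weighted_sum(layer_embeddings, weights=None):
    """Weighted sum over layers; uniform weights if none are given."""
    stacked = np.stack(layer_embeddings)  # shape (L, N, d)
    if weights is None:
        weights = np.full(len(layer_embeddings), 1.0 / len(layer_embeddings))
    weights = np.asarray(weights, dtype=float)
    return np.tensordot(weights, stacked, axes=1)  # shape (N, d)

def fuse_concat(layer_embeddings):
    """Concatenate per-layer embeddings along the feature axis."""
    return np.concatenate(layer_embeddings, axis=1)  # shape (N, L * d)

# Toy per-layer embeddings (hypothetical): constant matrices 0, 1, 2.
layers = [np.ones((4, 2)) * k for k in range(3)]
summed = fuse_weighted_sum(layers)
concat = fuse_concat(layers)
```

Weighted summation keeps the embedding dimension fixed, while concatenation preserves each layer's signal at the cost of a wider representation.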
The score computation stage involves a scoring function to output a score for each user-item pair based on the fused embeddings. The scoring function can be as simple as a dot product between user and item embeddings, or it can be a more complex function that takes into account additional factors Wang et al. (2019a); He et al. (2020).
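For the simplest case, a dot-product scoring function followed by top-k selection might look like the sketch below. The masking of already-seen items and the toy embeddings are assumptions added for illustration.

```python
import numpy as np

def score_all(user_emb, item_emb):
    """Dot-product score for every user-item pair; shape (num_users, num_items)."""
    return user_emb @ item_emb.T

def top_k(scores, k, seen_mask=None):
    """Indices of the k highest-scoring items per user, best first."""
    scores = scores.astype(float)  # astype returns a copy, so the input is untouched
    if seen_mask is not None:
        scores[seen_mask] = -np.inf  # never recommend items the user already has
    return np.argsort(-scores, axis=1, kind="stable")[:, :k]

# Toy embeddings (hypothetical): 2 users, 3 items.
user_emb = np.array([[1.0, 0.0], [0.0, 1.0]])
item_emb = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
scores = score_all(user_emb, item_emb)
seen = np.array([[False, False, True], [False, False, False]])
recs = top_k(scores, k=2, seen_mask=seen)
```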
Finally, in the training methodologies stage, a suitable loss function is selected, and an optimization algorithm, typically a variant of stochastic gradient descent, is used to update model parameters Rendle et al. (2012); Fan et al. (2019b).
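One widely used loss for implicit-feedback collaborative filtering is a pairwise ranking loss of the BPR type, which encourages an observed (positive) item to score higher than a sampled unobserved (negative) one. The sketch below computes such a loss and a single plain gradient-descent update for one (user, positive, negative) triple. The choice of this particular loss, the learning rate, the absence of regularization, and the toy vectors are assumptions for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bpr_loss(u, i_pos, j_neg):
    """Pairwise loss -log(sigmoid(score_pos - score_neg)) with dot-product scores."""
    x = u @ i_pos - u @ j_neg
    return -np.log(sigmoid(x))

def bpr_sgd_step(u, i_pos, j_neg, lr=0.1):
    """One gradient-descent step on the pairwise loss for a single triple."""
    x = u @ i_pos - u @ j_neg
    g = -(1.0 - sigmoid(x))       # derivative of the loss w.r.t. the score difference
    grad_u = g * (i_pos - j_neg)
    grad_i = g * u
    grad_j = -g * u
    return u - lr * grad_u, i_pos - lr * grad_i, j_neg - lr * grad_j

# Toy vectors (hypothetical).
u = np.array([0.1, 0.2])
i_pos = np.array([0.3, -0.1])
j_neg = np.array([0.2, 0.4])
before = bpr_loss(u, i_pos, j_neg)
u2, i2, j2 = bpr_sgd_step(u, i_pos, j_neg)
after = bpr_loss(u2, i2, j2)  # expected to be smaller than `before`
```

In a GNN-based model the gradients would flow back through the propagation layers to the initial embeddings; here the vectors are updated directly to keep the example short.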
Understanding the unique strengths of each stage outlined in this section is essential, and a comparative evaluation can guide the selection of the most suitable approach for specific collaborative filtering scenarios, such as addressing the challenges associated with beyond-accuracy metrics. In Table 1, we provide a comprehensive overview of existing literature, aiding readers in navigating the diverse methodologies and findings discussed throughout this review.

Definition and importance of diversity
Diversity in recommender systems indicates how different the items suggested to a user are. It is vital for recommendation quality, preventing over-specialization, and boosting user discovery. Diverse recommendations offer users a wider item range, enhancing satisfaction and user engagement Kunaver and Požrl (2017); Duricic et al. (2021). Diversity has two types: intra-list (variety within one recommendation list) and inter-list (variety across lists for different users) Kaminskas and Bridge (2016).
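To make the two notions concrete, the sketch below computes one possible measure for each: intra-list diversity as the mean pairwise cosine distance between items in a single list, and inter-list diversity as the mean pairwise Jaccard distance between the lists of different users. Both formulas, the item feature vectors, and the function names are assumptions chosen for illustration; the literature uses several alternative definitions.

```python
from itertools import combinations
import numpy as np

def cosine_distance(a, b):
    return 1.0 - float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def intra_list_diversity(item_ids, item_features):
    """Mean pairwise cosine distance between items of one recommendation list."""
    pairs = list(combinations(item_ids, 2))
    if not pairs:
        return 0.0
    total = sum(cosine_distance(item_features[a], item_features[b]) for a, b in pairs)
    return total / len(pairs)

def inter_list_diversity(rec_lists):
    """Mean pairwise Jaccard distance between non-empty lists of different users."""
    pairs = list(combinations(rec_lists, 2))
    if not pairs:
        return 0.0
    def jaccard_distance(x, y):
        x, y = set(x), set(y)
        return 1.0 - len(x & y) / len(x | y)
    return sum(jaccard_distance(x, y) for x, y in pairs) / len(pairs)

# Toy item features (hypothetical): items 0 and 1 are identical, item 2 is orthogonal.
features = {0: np.array([1.0, 0.0]), 1: np.array([1.0, 0.0]), 2: np.array([0.0, 1.0])}
ild = intra_list_diversity([0, 1, 2], features)
```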

Review of recent developments in improving accuracy-diversity trade-off
A number of innovative approaches have emerged recently to tackle recommendation diversity using graph neural networks (GNNs). These methods can be broadly categorized based on the specific mechanisms or strategies they employ:
• Neighbor-based mechanisms¹: An approach introduced by Isufi et
• Adversarial learning⁴: To improve the accuracy-diversity trade-off in tag-aware systems, the DTGCF model utilizes personalized category-boosted negative sampling, adversarial learning for category-free embeddings, and specialized regularization techniques Zuo et al. (2023). Furthermore, the above-mentioned DGCN model also employs adversarial learning to make item representations more category-independent.
• Contrastive learning⁵: The Contrastive Co-training (CCT) method by Ma et al. (2022) employs an iterative pipeline that augments recommendation and contrastive graph views with pseudo edges, leveraging diversified contrastive learning to address popularity and category biases.
• Heterogeneous Graph Neural Networks⁶: The GraphDR approach by Xie et al. (2021) utilizes a heterogeneous graph neural network, capturing diverse interactions and prioritizing diversity in the matching module.
Each of these methods offers a unique approach to the accuracy-diversity challenge. While all aim to improve the trade-off, their strategies vary, highlighting the multifaceted nature of the challenge at hand.

Definition and importance of serendipity and novelty
Serendipity and the closely related notion of novelty are crucial in recommender systems, both aiming to boost user discovery. Serendipity refers to surprising yet relevant recommendations, promoting exploration and curiosity. Novelty suggests new or unfamiliar items, expanding user exposure. Both prevent over-specialization and encourage user curiosity Kaminskas and Bridge (2016).
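As a hedged illustration of how these two notions are sometimes quantified, the sketch below computes novelty as the mean self-information of the recommended items (rarer items count as more novel) and serendipity as the share of recommended items that are relevant but not contained in a set of "expected" items. Both formulas and all names are assumptions for illustration, not definitions taken from the cited works.

```python
import math

def novelty(rec_list, item_popularity, num_users):
    """Mean self-information -log2(p_i), with p_i the share of users who interacted with item i."""
    return sum(-math.log2(item_popularity[i] / num_users) for i in rec_list) / len(rec_list)

def serendipity(rec_list, relevant, expected):
    """Fraction of recommended items that are relevant and not in the expected set."""
    hits = [i for i in rec_list if i in relevant and i not in expected]
    return len(hits) / len(rec_list)

# Toy data (hypothetical): 8 users; item 0 is known to everyone, item 2 to one user.
popularity = {0: 8, 1: 2, 2: 1}
nov = novelty([0, 1], popularity, num_users=8)
ser = serendipity([0, 1, 2], relevant={1, 2}, expected={0, 1})
```

The "expected" set would typically come from a simple baseline, such as the most popular items; how it is built strongly affects the resulting values.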

Review of recent developments in promoting serendipity and novelty
Recent advancements in GNN-based recommender systems have shown promising results in promoting serendipity and novelty, although notably fewer efforts have been directed towards balancing the accuracy-serendipity and accuracy-novelty trade-offs in comparison to the accuracy-diversity trade-off. In our exploration, we identified several studies addressing these efforts and have categorized them based on the primary theme of their contribution:
• Neighbor-based mechanisms: The approach proposed by Boo et al. (2023) enhances session-based recommendations by incorporating serendipitous session embeddings, leveraging session data and user preferences to amplify global embedding effects, enabling users to control explore-exploit trade-offs.
• Long-tail recommendations⁷: The TailNet architecture is designed to enhance long-tail recommendation performance. It classifies items into short-head and long-tail based on click frequency and integrates a unique preference mechanism to balance between recommending niche items for serendipity and maintaining overall accuracy Liu and Zheng (2020).
• Normalization techniques⁸: Zhao et al. (2022) proposed r-AdjNorm, a simple and effective GNN improvement that can improve the accuracy-novelty trade-off by controlling the normalization strength in the neighborhood aggregation process.
• General GNN architecture enhancements⁹: Similarly to the popular LightGCN approach by He et al. (2020), the ImprovedGCN model by Dhawan et al. (2022) adapts and simplifies the graph convolution process in GCNs for item recommendation, inadvertently boosting serendipity. On the other hand, the BGCF framework by Sun et al. (2020), designed for diverse and accurate recommendations, also boosts serendipity and novelty through its joint training approach. These GNN-based models, while focusing on accuracy, inadvertently elevate recommendation serendipity and/or novelty.
These studies collectively demonstrate the potential of GNNs in enhancing the serendipity and novelty of recommender systems, while also highlighting the need for further research to address existing challenges.
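The idea of controlling the normalization strength in neighborhood aggregation, mentioned in the list above, can be illustrated with a generalized adjacency normalization. The sketch assumes the form D^{-r} A D^{-(1-r)}, where r = 0.5 recovers the usual symmetric normalization; this parameterization is an assumption for illustration and may differ from the exact formulation of r-AdjNorm in Zhao et al. (2022).

```python
import numpy as np

def r_normalize(A, r=0.5):
    """Generalized normalization D^{-r} A D^{-(1-r)} (assumed form, see lead-in).

    The left factor scales by the degree of the receiving node, the right
    factor by the degree of the sending neighbor; r shifts the penalty
    between the two.
    """
    deg = A.sum(axis=1)
    safe = np.where(deg > 0, deg, 1.0)  # avoid 0 ** negative for isolated nodes
    left = np.where(deg > 0, safe ** -r, 0.0)
    right = np.where(deg > 0, safe ** -(1.0 - r), 0.0)
    return left[:, None] * A * right[None, :]

# Toy star graph (hypothetical): node 0 is a "popular" hub linked to nodes 1, 2, 3.
A = np.zeros((4, 4))
A[0, 1:] = 1.0
A[1:, 0] = 1.0
sym = r_normalize(A, r=0.5)   # symmetric normalization
row = r_normalize(A, r=1.0)   # each row sums to 1
col = r_normalize(A, r=0.0)   # each column sums to 1
```

Under this assumed form, lowering r increases the penalty on messages sent by high-degree (popular) neighbors, which is one way such a knob could trade accuracy against novelty.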

Definition and importance of fairness
Fairness in recommender systems ensures no bias towards certain users or items. It can be divided into user fairness, which avoids algorithmic bias among users or demographics, and item fairness, which ensures equal exposure for items, countering popularity bias Leonhardt et al. (2018); Kowald et al. (2020); Abdollahpouri et al. (2021); Lacic et al. (2022); Kowald et al. (2023); Lex et al. (2020). Fairness helps to mitigate bias, supports diversity, and boosts user satisfaction. In GNN-based systems, which can amplify bias, fairness is crucial for balanced recommendations and optimal performance Ekstrand et al. (2018); Chizari et al. (2022); Chen et al. (2023); Gao et al. (2023).
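Two simple diagnostics can make these notions tangible: the Gini coefficient of item exposure as a proxy for item fairness (0 means perfectly equal exposure), and the gap in a per-user quality metric between user groups as a proxy for user fairness. Both measures, the function names, and the toy data are illustrative assumptions; the cited works use a variety of other fairness metrics.

```python
from collections import Counter

def gini(values):
    """Gini coefficient of non-negative values (0 = equal, near 1 = concentrated)."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum((k + 1) * x for k, x in enumerate(xs))
    return (2.0 * weighted) / (n * total) - (n + 1.0) / n

def item_exposure_gini(rec_lists, num_items):
    """Gini coefficient of how often each item appears across all recommendation lists."""
    counts = Counter(i for recs in rec_lists for i in recs)
    return gini([counts.get(i, 0) for i in range(num_items)])

def group_gap(metric_per_user, group_of_user):
    """Difference between the best and worst group-level mean of a per-user metric."""
    by_group = {}
    for user, value in metric_per_user.items():
        by_group.setdefault(group_of_user[user], []).append(value)
    means = [sum(v) / len(v) for v in by_group.values()]
    return max(means) - min(means)

# Toy data (hypothetical).
equal = item_exposure_gini([[0, 1], [2, 3]], num_items=4)
skewed = item_exposure_gini([[0], [0], [0], [0]], num_items=4)
gap = group_gap({"a": 0.5, "b": 0.3, "c": 0.1}, {"a": "g1", "b": "g1", "c": "g2"})
```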

Review of recent developments in promoting fairness
In the evolving landscape of GNN-based recommender systems, the pursuit of user and item fairness has become a prominent topic. Recent advancements can be broadly categorized based on the thematic emphasis of their contributions:
• Neighbor-based mechanisms: The Navip method debiases the neighbor aggregation process in GNNs using "neighbor aggregation via inverse propensity", focusing on user fairness Kim et al. (2022). Additionally, the UGRec framework by Liu et al. (2022b) employs an information aggregation component and a multihop mechanism to aggregate information from users' higher-order neighbors, ensuring user fairness with respect to discrimination between male and female users. The SKIPHOP approach focuses on user fairness by capturing both direct user-item interactions and latent knowledge graph interests, thereby covering both first-order and second-order proximity. Using fairness for regularization, it ensures balanced recommendations for users with similar profiles Wu et al. (2022b).
• Adversarial learning: The UGRec model additionally incorporates adversarial learning to eliminate gender-specific features while preserving common features.
• Contrastive learning: The DCRec model by Yang et al. (2023b) leverages debiased contrastive learning to counteract popularity bias and address the challenge of disentangling user conformity from genuine interest, focusing on user fairness. The TAGCL framework also capitalizes on the contrastive learning paradigm, ensuring item fairness by reducing biases in social tagging systems Xu et al. (2023).
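The general idea of weighting neighbors by inverse propensity during aggregation, as referenced in the neighbor-based item above, can be sketched as follows. This is not the Navip algorithm itself: the propensity estimate used here (the share of users who interacted with an item), the mean-style aggregation, and all names are assumptions made purely for illustration.

```python
import numpy as np

def ipw_aggregate(R, item_emb, propensity, eps=1e-8):
    """User embeddings as inverse-propensity-weighted means of interacted items.

    R is a binary user-item interaction matrix; each interaction is weighted
    by 1 / propensity of the item, so frequently exposed items contribute less.
    """
    W = R / np.maximum(propensity, eps)[None, :]  # per-edge weights
    row_sum = W.sum(axis=1, keepdims=True)
    row_sum[row_sum == 0] = 1.0                   # users without interactions
    return (W / row_sum) @ item_emb

# Toy data (hypothetical): 2 users, 3 items; item 0 is the popular one.
R = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0]])
propensity = R.sum(axis=0) / R.shape[0]  # assumed estimate: [1.0, 0.5, 0.5]
item_emb = np.eye(3)                     # one-hot items make the weights visible
user_emb = ipw_aggregate(R, item_emb, propensity)
```

With one-hot item embeddings, each row of `user_emb` directly shows the aggregation weights: the popular item 0 receives half the weight of the less popular item the user interacted with.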

DISCUSSION AND FUTURE DIRECTIONS
In this paper, we have conducted a comprehensive review of the literature on diversity, serendipity, and fairness in GNN-based recommender systems, with a focus on optimizing beyond-accuracy metrics. Throughout our analysis, we have explored various aspects of model development and discussed recent advancements in addressing these dimensions.
To further advance the field and guide future research, we have formulated three key questions:
Q1: What are the practical challenges in optimizing GNN-based recommender systems for beyond-accuracy metrics?
GNNs are able to capture complex relationships within graph structures. However, this sophistication can lead to overfitting, especially when prioritizing accuracy Fu et al. (2023). Data sparsity and the need for auxiliary data, such as demographic information, challenge the optimization of high-quality node representations, introducing biases Dhawan et al. (2022). An overemphasis on past preferences can limit novel discoveries Dhawan et al. (2022), and while addressing popularity bias is essential, it might inadvertently inject noise, reducing accuracy Liu and Zheng (2020). Balancing diverse objectives, like fairness, accuracy, and diversity, is nuanced, especially when optimizing one can compromise another Liu et al. (2022b). These challenges emphasize the need for targeted research on effective modeling of GNN-based recommender systems aimed at beyond-accuracy optimization.
Q2: Which model development stages of GNN-based recommender systems have seen the most innovation for tackling beyond-accuracy optimization, and which stages have been underutilized?
By conducting a thorough analysis of the reviewed papers (see Table 1), we have observed that the graph construction, propagation layer, and training methodology stages have seen significant innovation in GNN-based recommender systems. This includes advanced graph construction methods, innovative graph convolution operations, and unique training methodologies. However, stages like embedding initialization, embedding fusion, and score computation are relatively underutilized. These stages could offer potential avenues for future research and could provide novel ways to balance accuracy, fairness, diversity, novelty, and serendipity in recommendations.
Q3: What are potentially unexplored areas of beyond-accuracy optimization in GNN-based recommender systems?
A less explored aspect in GNN-based recommender systems is personalized diversity, which modifies the diversity in recommendations to match individual user preferences. Users favoring more diversity get more diverse recommendations, whereas those liking less diversity get less diverse ones Eskandanian et al. (2017). This concept of personalized diversity, currently under-researched in GNN-based systems, hints at an intriguing future research direction. It can also relate to personalized serendipity or novelty, tailoring unexpected or novel recommendations to user preferences. Thus, incorporating personalized diversity, serendipity, and novelty in GNN-based systems could enrich beyond-accuracy optimization.
Overall, this review aims to help researchers and practitioners gain a deeper understanding of the multifaceted issues and potential avenues for future research in optimizing GNN-based recommender systems beyond traditional accuracy-centric approaches. By addressing the practical challenges, identifying underutilized model development stages, and highlighting unexplored areas of optimization, we hope to contribute to the development of more robust, diverse, serendipitous, and fair recommender systems that cater to the evolving needs and expectations of users.
nearest neighbors (NN) and furthest neighbors (FN) with a joint convolutional framework. The DGRec method diversifies embedding generation through submodular neighbor selection, layer attention, and loss reweighting Yang et al. (2023a). Additionally, the DGCN model leverages graph convolutional networks for capturing collaborative effects in the user-item bipartite graph, ensuring diverse recommendations through rebalanced neighbor discovery Zheng et al. (2021).
• Disentangling mechanisms²: The DGCF framework diversifies recommendations by disentangling user intents in collaborative filtering using intent-aware graphs and a graph disentangling layer Wang et al. (2020).
• Dynamic graph construction³: The DDGraph approach involves dynamically constructing a user-item graph to capture both user-item interactions and non-interactions, and then applying a novel candidate
¹ Neighbor-based mechanisms aggregate and propagate information from neighboring nodes (users or items) to enhance the representation of a target node, capturing intricate relational patterns for improved recommendations Wu et al. (2022a).
² Disentangling mechanisms aim to separate and capture distinct factors or patterns within graph data, ensuring more interpretable and robust recommendations by reducing the entanglement of various latent factors Ma et al. (2019).
³ Dynamic graph construction involves continuously updating and evolving the graph structure to incorporate new interactions and/or entities Skarding et al. (2021).

Table 1. This table summarizes key literature on GNN-based recommender systems emphasizing beyond-accuracy metrics: Diversity, Serendipity, and Fairness. Each entry specifies the paper's publication venue/journal, targeted metric, a broad strategy categorization, and the model development stages the method utilizes or adapts to enhance the respective metric. These stages include data preprocessing (DP), graph construction (GC), embedding initialization (EI), propagation layers (PL), embedding fusion (EF), score computation (SC), and training methodologies (TM).
long-tail issue by focusing on popularity bias in session-based recommendation systems. It aims to ensure item fairness by normalizing item and session representations, thereby improving recommendations, especially for less popular items. Additionally, the above-mentioned approach by Li et al. (2019) also focuses on long-tail recommendations.
• Self-training mechanisms¹¹: The Self-Fair approach by Liu et al. (2022a) employs a self-training mechanism using unlabeled data with the goal of improving user fairness in recommendations for users of different genders. By iteratively refining predictions as pseudo-labels and incorporating fairness constraints, the model balances accuracy and fairness without relying heavily on labeled data.
In the broader context of graph neural networks, researchers have also tackled fairness in non-recommender-system tasks, such as classification Dai and Wang (2021); Ma et al. (2021); Dong et al. (2022); Zhang et al. (2022). Their insights provide valuable lessons for the future development of fair recommender systems.