Fairness in Music Recommender Systems: A Stakeholder-Centered Mini Review

The performance of recommender systems highly impacts both music streaming platform users and the artists providing music. As fairness is a fundamental value of human life, there is increasing pressure for these algorithmic decision-making processes to be fair as well. However, many factors make recommender systems prone to biases, resulting in unfair outcomes. Furthermore, several stakeholders are involved, who may all have distinct needs requiring different fairness considerations. While there is an increasing interest in research on recommender system fairness in general, the music domain has received relatively little attention. This mini review, therefore, outlines current literature on music recommender system fairness from the perspective of each relevant stakeholder and the stakeholders combined. For instance, various works address gender fairness: one line of research compares differences in recommendation quality across user gender groups, and another line focuses on the imbalanced representation of artist gender in the recommendations. In addition to gender, popularity bias is frequently addressed; yet, primarily from the user perspective and rarely addressing how it impacts the representation of artists. Overall, this narrative literature review shows that the large majority of works analyze the current situation of fairness in music recommender systems, whereas only a few works propose approaches to improve it. This is, thus, a promising direction for future research.


FACCTREC '22 ABSTRACT
Our narrative literature review [1] acknowledges that, although there is an increasing interest in recommender system fairness in general, the music domain has received relatively little attention in this regard. However, addressing fairness of music recommender systems (MRSs) is highly important because the performance of these systems considerably impacts both the users of music streaming platforms and the artists providing music to those platforms. The distinct needs that these stakeholder groups may have, and the different aspects of fairness that therefore should be considered, make for a challenging research field with ample opportunities for improvement. The review first outlines current literature on MRS fairness from the perspective of each stakeholder and the stakeholders combined, and then identifies promising directions for future research.
The two open questions arising from the review are as follows: (1) In the MRS field, only limited data is publicly available to conduct fairness research; most datasets either originate from the same source or are proprietary (and, thus, not widely accessible). How can we address this limited data availability? (2) Overall, the review shows that the large majority of works analyze the current situation of MRS fairness, whereas only a few works propose approaches to improve it. How can we move forward to a focus on improving fairness aspects in these recommender systems?
At FAccTRec '22, we emphasize the specifics of addressing RS fairness in the music domain. For example, a domain-specific challenge is the fact that the available music datasets are skewed. This

INTRODUCTION
The art of music recommendation was traditionally performed exclusively by people, such as DJs, record store owners, and friends. In the last few decades, however, this task has been partially automated using machine learning (ML) techniques, recommender systems (RSs) in particular [18]. Learning from large-scale user behavior and music features, so-called music recommender systems (MRSs) can automatically produce recommendations tailored to a specific user [27]. This is one of the reasons why music streaming platforms, which typically integrate MRSs, have become one of the main sources of music consumption [43]. Consequently, the performance of MRSs highly impacts users' overall music listening experience [48] and considerably impacts artists in terms of exposure and resulting royalty payments [35].
ML system users frequently perceive RS decisions as objective [39]. However, many factors make such systems' processes prone to biases, resulting in unfair outcomes [27]. One such factor is that ML models are created and trained by humans whose intrinsic biases may be carried over. Furthermore, the data that is used to train ML models may contain biases as well. This is problematic, as fairness is a fundamental value of human life [38,67]. Moreover, antidiscrimination regulations explicitly prohibit characteristics such as gender, age, and nationality from causing different outcomes for otherwise similar people [4,21,31,Art. 21]. It is, therefore, crucial to critically review MRSs for any form of unfairness to ensure that they do not unfairly disadvantage any user or artist.
Overall, there is an increasing interest in research on fairness in ML in general [42], and in RSs in particular [26]. One of the challenges in fairness research is that it is scattered across several disciplines [40,61]. Moreover, it concerns several stakeholders with distinct fairness needs, calling for various bias mitigation strategies [27]. Considering those needs is, thus, key to both understanding fairness in music recommendation algorithms and designing strategies to improve it. To the best of our knowledge, an overview of such needs and strategies does not yet exist for the music recommendation field specifically. Therefore, this work addresses the following research question (RQ): What is the state of the art of MRS fairness research from the various stakeholders' perspectives? To address this RQ, we conduct a narrative literature review, giving a thorough overview of works that explicitly target RS fairness in the music domain. We also include some works that are not explicitly concerned with fairness, yet address fairness as a side effect.
In Section 2, we first define each relevant stakeholder group. Then, in Sections 2.1, 2.2, and 2.3, we present our narrative literature review, in which we address each of the relevant stakeholders separately. In Section 3, we conclude this work with a discussion of the lessons learned from this overview and derive research gaps, thereby forming a solid basis for future research.

FAIRNESS FOR MULTIPLE STAKEHOLDERS IN MUSIC RECOMMENDER SYSTEMS
The digital music value chain embraces a wide set of stakeholders, who have different goals and interests regarding the music being recommended [11]. Recommender systems literature typically distinguishes three stakeholders: platform users (end consumers), item providers, and the platform itself [2,15,65]. Some variations can be found in literature; for instance, Mehrotra et al. [50] and Patro et al. [57] only consider user and item provider as stakeholders, yet not the platform; conversely, Jannach and Bauer [44] include society at large as a fourth stakeholder.
In MRSs, there are three main stakeholders. Firstly, the users (Section 2.1), also called consumers or customers, are the party consuming the music recommendations. A user may be an individual or a group of individuals, served by music streaming platforms. As individuals have different profiles containing, for instance, different characteristics, preferences, or needs, MRSs might create a better experience for some user groups than for others. Ideally, a MRS creates a good user experience for all users.
Secondly, the item providers (Section 2.2), also referred to as producers or suppliers, form the stakeholder supplying the recommended music and benefiting from it being consumed or purchased. In MRS research, the artists (including performers, music producers, and songwriters) are typically the item providers, but record companies or publishers representing several artists may also be considered item providers. Each item provider usually represents a multitude of items in the form of music tracks. A higher MRS ranking for an item implies a higher chance of exposure to users, resulting in a higher chance that users interact with the item [13,24]. This is desirable, as item interaction results in revenue [23]. Typically, item providers have little control over when and to whom their items are recommended [15,35].
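The link between ranking position and exposure described above can be made concrete with a small sketch. It aggregates a position-discounted exposure score per item provider over a set of recommendation lists. The logarithmic discount, the function name `provider_exposure_share`, and the toy data are illustrative assumptions and are not taken from the reviewed works.

```python
import math
from collections import defaultdict


def provider_exposure_share(rankings, item_provider):
    """Share of position-discounted exposure per item provider.

    rankings: list of ranked item-id lists (one per user), best item first
    item_provider: dict mapping item id -> provider id
    The 1 / log2(1 + rank) discount is an assumed exposure model.
    """
    exposure = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            # Items ranked higher (smaller rank) contribute more exposure.
            exposure[item_provider[item]] += 1.0 / math.log2(1 + rank)
    total = sum(exposure.values())
    return {provider: value / total for provider, value in exposure.items()}


# Hypothetical toy data: two users, three tracks, two artists.
toy_rankings = [["t1", "t2"], ["t1", "t3"]]
toy_providers = {"t1": "artist_A", "t2": "artist_B", "t3": "artist_B"}
shares = provider_exposure_share(toy_rankings, toy_providers)
```

In this toy example, artist_A holds only one of the three tracks but receives the larger exposure share because that track is ranked first in both lists.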
Thirdly, the platform exists at the center of the music recommender ecosystem [3,64]. Music streaming platforms (such as Apple Music, Deezer, Pandora, QQ Music, Spotify, and Tidal) act as an interface between huge repositories of music tracks and millions of music consumers. On such platforms, the interaction between users and items is facilitated by a MRS. A platform needs to attract and retain users as well as item providers and, thus, benefits from a successful match between users and items [15]. As the platforms are in control of the MRSs they embed [10] and can even significantly influence consumption decisions through functionalities such as curated playlists [5], they are typically not considered to be at risk of unfair treatment. Rather, platforms might impose fairness constraints to satisfy an organizational mission or meet demands of, e.g., government regulators or interest groups [27]. Further, there is increasing external pressure to make these platforms and their integrated MRSs fairer [11,16,35,52,57].
As multiple stakeholders with possibly diverging interests are involved in and affected by MRSs, multi-stakeholder research (Section 2.3) addresses several stakeholder groups simultaneously. Each stakeholder may have distinct fairness needs, which may further differ per context and application [15,28]. Consequently, solely optimizing RSs on metrics such as user satisfaction may be detrimental to user fairness, item provider fairness, or both [11,57]. Hence, several studies urge researchers to consider the interests of all stakeholder groups [15,50,51]. We note that studies that address fairness, for example, for item providers, while also measuring performance indicators such as user satisfaction in the evaluation, are not necessarily multi-stakeholder approaches; a multi-stakeholder perspective integrates the various stakeholders fundamentally.
Table 1 provides an overview of the papers on fairness in MRSs considered in this narrative literature review. It also includes information on the research focus, methodology, considered fairness attributes, the stakeholders in the loop, and the datasets used for conducting the research.

User Perspective
From the user perspective, fairness in MRSs is primarily studied based on distinct user groups defined by personal characteristics. In addition to groups based on protected characteristics, groups differentiated by other characteristics may experience unfairness as well.
A wealth of literature analyzes popularity bias and subsequent mitigation strategies in various application domains [e.g., 1,36,70]. It is, for instance, widely acknowledged that collaborative filtering-based recommendation approaches are prone to popularity bias [19,45]. The music domain is a well-known example of the long-tail economy [6] and popularity bias is, thus, particularly relevant. It can be considered either a problem [6] or a desired feature, as popularity in the community signifies some relevancy [18]. In general, many works address popularity bias in MRSs with various intentions. Some address the cold-start problem for items without prior user ratings to make them recommendable [e.g., 32]; others aim at increasing user satisfaction by adding novelty through recommending items from the long tail [e.g., 12]; yet other works leverage the long tail to specifically address discovery [e.g., 25]. While fairness is not always explicitly part of the investigation, this research thread does address fairness aspects.
As for insights from works that explicitly consider user fairness in MRSs, recommendation accuracy tends to be higher for 'mainstream' users, who are inclined toward what is popular, compared to 'beyond-mainstream' users who prefer less popular items [46,47]. This also holds when defining user groups based on a more fine-grained music taste level [46,59]. Some works [e.g., 9] have proposed mechanisms that better reflect the preferences of beyond-mainstream users. When defining user groups based on user country, popularity bias also negatively affects MRS performance for groups from countries with preferences beyond the global mainstream [8,55]. In a later work, Bauer and Schedl [9] propose context-prefiltering approaches to mitigate this issue.
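The kind of group comparison described above can be sketched as follows: compute an accuracy metric per user, average it within each user group, and inspect the gap. The choice of precision as the metric, the function name `precision_by_group`, and the toy data are assumptions for illustration; the reviewed studies use their own metrics and group definitions.

```python
from collections import defaultdict


def precision_by_group(recommendations, relevant, user_group):
    """Mean precision of the recommendation lists per user group.

    recommendations: dict user -> list of recommended item ids
    relevant: dict user -> set of held-out items the user actually liked
    user_group: dict user -> group label (e.g., mainstream or not)
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for user, recs in recommendations.items():
        hits = sum(1 for item in recs if item in relevant.get(user, set()))
        precision = hits / len(recs) if recs else 0.0
        totals[user_group[user]] += precision
        counts[user_group[user]] += 1
    return {group: totals[group] / counts[group] for group in totals}


# Hypothetical toy data with two user groups.
toy_recs = {"u1": ["a", "b"], "u2": ["a", "c"], "u3": ["x", "y"]}
toy_relevant = {"u1": {"a", "b"}, "u2": {"a"}, "u3": {"z"}}
toy_groups = {"u1": "mainstream", "u2": "mainstream", "u3": "beyond-mainstream"}

group_precision = precision_by_group(toy_recs, toy_relevant, toy_groups)
gap = group_precision["mainstream"] - group_precision["beyond-mainstream"]
```

A positive `gap` would indicate that, under this metric, the mainstream group receives more accurate recommendations than the beyond-mainstream group.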
Zooming in on another user characteristic, several studies investigate gender. They show that popularity bias particularly affects minority gender groups (in these studies: women), resulting in lower-quality recommendations in terms of accuracy and coverage [e.g., 49,52]. In addition to finding similar results for user gender, Ekstrand et al. [29] and the reproducibility study of that work by Neophytou et al. [55] found performance differences for different user age groups, too. Here, the older user group received lower-quality recommendations.
Lastly, on the mitigation side, Boratto et al. [14] present a reproducibility study focusing on user age and gender, applying various mitigation strategies in the music and movie domains. Different from the movie domain, the size of the user group was not indicative of the recommender accuracy in the music domain. Given their inconclusive results, it is important to look beyond popularity bias and demographic group size to understand the drivers of demographic differences.
Melchiorre et al. [53] define user groups based on personality traits. In contrast to gender, age, and country, personality traits are not among the characteristics acknowledged by antidiscrimination regulations, and fairness research is not clear about this issue either. Nonetheless, they may be a source of bias and an opportunity for MRS improvement. Melchiorre et al. [53] illustrate this by showing that scoring low on the personality traits openness, extraversion, and conscientiousness results in higher recommender performance, whereas scoring low on neuroticism or agreeableness leads to lower performance. Additionally, Htun et al. [41] study the effect of personality traits on the perception of fairness in group recommendations when creating group music playlists. Here, the personality trait openness is negatively correlated with the perception that fairness is important in groups. Given that diversity needs and personality traits correlate [20], considering those traits in user modeling may help improve MRS performance.

Item Provider Perspective
When considering harm against music providers caused by unfairness in MRSs, research mainly focuses on group fairness [63]. Item provider groups in MRS research have been primarily defined based on gender [28,34]. Several approaches are used to study and mitigate item provider gender bias, illustrating that a multifaceted approach is needed. To date, most research has focused on understanding existing gender biases [e.g., 30,69]. Epps-Darling et al. [30] analyzed a Spotify streaming sample and found a disparity between artist genders in users' listening behavior. In 'organic' streaming, such as streams originating from a user library or user's search, 21.75% of tracks were from either a woman or multi-gender formation. For streams programmed by MRSs, this number was 23.55%. This gender gap in listening behavior is further reflected in commonly used datasets such as LFM-1b and LFM-360k, in which 23% of (solo) artists are women [34]. These datasets roughly reflect the gender gap in business reality [30,71]. Overall, these percentages reflect the barriers that minority genders face when entering the field and subsequently climbing to the top. In addition, pre-existing gender biases might influence which tracks users select in a MRS. Ferraro et al. [33] and Shakespeare et al. [62] found that collaborative filtering algorithms could propagate or even amplify those biases in a MRS, thereby negatively impacting minority genders. In the latter study, no evidence was found for the algorithms introducing new gender biases, which is supported by Epps-Darling et al. [30], who found that recommendation-based streaming even contained a slightly higher proportion of tracks by women than organic listening. On the gender bias mitigation side, re-ranking is a promising method. Ferraro et al. [34] demonstrate breaking bias amplification through gradually increasing exposure for minority genders.
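To illustrate what re-ranking for exposure can look like in general, the sketch below applies a simple greedy rule: every prefix of the output list must contain at least a minimum share of items from the underrepresented group, and the original order is kept otherwise. This is a generic illustration under stated assumptions, not the procedure of Ferraro et al. [34]; the function name `rerank_min_share`, the `min_share` parameter, and the toy data are hypothetical.

```python
import math


def rerank_min_share(ranked, is_minority, min_share):
    """Greedy re-ranking with a minimum minority share in every prefix.

    ranked: list of item ids, best first
    is_minority: dict item id -> True if the artist belongs to the
        underrepresented group
    min_share: required fraction of minority items in each prefix,
        enforced as long as minority items remain
    """
    position = {item: rank for rank, item in enumerate(ranked)}
    minority = [item for item in ranked if is_minority[item]]
    majority = [item for item in ranked if not is_minority[item]]
    output, placed_minority = [], 0
    while minority or majority:
        required = math.floor(min_share * (len(output) + 1))
        must_promote = bool(minority) and placed_minority < required
        if not majority or must_promote:
            pick_minority = True
        elif not minority:
            pick_minority = False
        else:
            # No constraint is active: keep the original relative order.
            pick_minority = position[minority[0]] < position[majority[0]]
        if pick_minority:
            output.append(minority.pop(0))
            placed_minority += 1
        else:
            output.append(majority.pop(0))
    return output


# Hypothetical toy ranking: "w" items are by artists from the minority group.
toy_ranked = ["m1", "m2", "m3", "w1", "m4", "w2"]
toy_flags = {item: item.startswith("w") for item in toy_ranked}
reranked = rerank_min_share(toy_ranked, toy_flags, min_share=0.5)
```

With `min_share=0.5`, the two minority items move from positions four and six to positions two and four, while the relative order within each group is unchanged. Raising `min_share` step by step over successive recommendation rounds would be one way to mimic a gradual increase in exposure.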
In addition to gender, Oliveira et al. [56] consider genre, locality, and contemporaneity. Embracing these attributes, they introduce a multi-objective approach to diversification that addresses fairness for users and item providers alike. Ferraro et al. [33] use similar categories and add artist type (e.g., solo artist, band). Their analysis of the locality attribute indicates that group size may foster exposure: the artists from the most represented countries in the dataset (here: United Kingdom and United States) reached high exposure, while minority countries were penalized.
Defining item provider groups based on their popularity level has been investigated, too [7,19]. Although popularity bias is a frequently researched topic, fairness goals are predominantly defined for MRS users and not item providers. One exception to this is Flexer et al. [37], who study the 'hubness' phenomenon, which can occur in content-based RS models that use song similarity as their main feature. Hubness refers to some music tracks being connected to many others in the database without a clear semantic musical connection. This may introduce unfairness for tracks that are more similar semantically, but not recommended as often.
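One common way to make hubness visible is to count how often each track appears in the nearest-neighbor lists of other tracks (its k-occurrence). The sketch below does this with Euclidean distance on hypothetical feature vectors; the function name `k_occurrence`, the distance measure, and the toy data are assumptions and do not reproduce the setup of Flexer et al. [37].

```python
import math
from collections import Counter


def k_occurrence(features, k):
    """Count how often each track is among the k nearest neighbors of others.

    features: dict track id -> tuple of numeric features
    k: number of nearest neighbors retrieved per query track
    Tracks with a count far above k behave like hubs.
    """
    counts = Counter({track: 0 for track in features})
    for query, query_vec in features.items():
        neighbors = sorted(
            (math.dist(query_vec, vec), track)
            for track, vec in features.items()
            if track != query
        )
        for _, track in neighbors[:k]:
            counts[track] += 1
    return counts


# Hypothetical toy data: one central track surrounded by four others.
toy_features = {
    "hub": (0.0, 0.0),
    "a": (1.0, 0.0),
    "b": (-1.0, 0.0),
    "c": (0.0, 1.0),
    "d": (0.0, -1.0),
}
occurrences = k_occurrence(toy_features, k=1)
```

In this two-dimensional toy example, the central track is the nearest neighbor of all four other tracks, whereas most of them are never retrieved. In practice, hubness is mainly reported for high-dimensional similarity spaces, where such hub tracks need not be musically related to the tracks they are returned for.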
To date, one study directly discusses fairness in MRSs with the item providers themselves: Ferraro et al. [35] interviewed artists about their perception of fairness in MRSs, and how item provider fairness could be improved on music streaming platforms. In those interviews, the main noted fairness improvement areas relate to nurturing diversity in general, and in particular to gender representation, addressing popularity bias, and providing a better representation of genres beyond the mainstream. These topics also correspond to the aforementioned research focuses in literature.

Multi-Stakeholder Perspective
Studies may simultaneously take several different MRS stakeholder objectives (e.g., satisfaction, utility, fairness, or diversity) into account. Generally, across application domains, a trade-off between such objectives is reported [22,50,63], though it is possible that multi-stakeholder objective optimization benefits all stakeholders. Item provider fairness, for example, does not have to be detrimental to user satisfaction [50], and persuasive strategies may even be implemented to promote new and less popular artists while increasing user satisfaction [54]. Furthermore, even if users do not directly benefit from or even consider fairness for item providers, they indicate that it is important to incorporate it in RSs [65].
Overall, fairness-related multi-stakeholder MRS work mainly defines objectives and stakeholders rather than aiming to improve fairness. Mehrotra et al. [50], though, do contribute to fairness improvement by introducing a counterfactual estimation framework that balances provider fairness with user relevance and can optimize either, aiming to provide an alternative to expensive online A/B tests. In another study, Mehrotra et al. [51] use 'contextual bandits' that can optimize multiple objectives simultaneously in a fair way, this time focusing on user and platform objectives as opposed to item providers' objectives.
We might also draw inspiration from multi-stakeholder MRS research where fairness is not an explicitly defined goal. For instance, Unger et al. [68] introduce a multi-objective RS that aims to fulfill both user satisfaction (measured by saves, likes, and engagement) and item provider satisfaction (determined by, e.g., acquiring new fans). A similar approach may be taken to implement fairness objectives for multiple stakeholders. Patro et al. [57] propose FairRec, which exhibits fairness for both user and item provider while the loss in overall recommendation quality remains marginal. FairRec has, however, not been applied to the music domain yet.

DISCUSSION AND CONCLUSIONS
This literature overview demonstrates that, while there is increasing interest in research on fairness in RSs in general, comparatively little research has addressed the music domain. Below, we discuss the main findings we derive from this review.

Research focus
Contrary to what literature frequently claims [e.g., 35,57], fairness in this context has been addressed from both the user perspective and the item provider perspective. Yet, multi-stakeholder approaches to fairness are scarce. This review also shows that the large majority of MRS fairness works analyze the current situation, using existing approaches and available datasets. We, therefore, identify improvement-focused research as the main research gap. A major challenge remains here: we still need to improve our understanding of the normative nature of fairness. While an entirely fair system is likely unachievable, it is crucial to recognize RS fairness issues, mitigate them, and incrementally improve fairness over the current state.

Gender bias
Interestingly, various MRS works address gender fairness, for both users and item providers. We speculate that this focus has emerged from gender being an immutable characteristic, the wide acknowledgment that gender fairness is of societal relevance, and gender labels being available to some extent in relevant datasets. While it is a known limitation that a binary concept of gender oversimplifies gender expression, current datasets predominantly restrict the gender labels to man and woman [14,34,62]. A notable exception is the work by Epps-Darling et al. [30].

Popularity bias
While popularity bias may be considered an item provider fairness issue as the gap between popular and unpopular items increases, research frequently focuses on the user. Addressing popularity is seen as a means to provide more diverse content to increase user satisfaction. Similarly, we observe that some works do not explicitly focus on fairness, but still demonstrate fairness intentions or improvements in their research. As this review focused on works that address fairness explicitly, this overview is not intended to be exhaustive.

Data availability
As can be seen in Table 1, the most frequently used datasets originate from Last.fm: LFM-1b [58], LFM-1K, LFM-360K [both 17], and the recently added LFM-2b [60]. As a result, only a few datasets are used for research on fairness in MRSs, most of which are either based on the same or similar Last.fm data, or are proprietary and therefore not accessible to other researchers. Overall, this means that the used datasets might not be representative. Additionally, only a few open datasets in the music domain contain user interaction or preference data. They also typically include only limited fairness-related stakeholder metadata (e.g., gender, age, ethnicity), as sensitive data is often not shared [66]. For ethical reasons, it is debatable whether it should be. Lastly, a current limitation is the focus on short-term bias mitigation, while real-world systems are active over years [62]. Longitudinal data or simulation frameworks are needed to better address these temporal aspects and to study fairness in MRSs in the long run. Summing up, to achieve significant MRS fairness improvements, richer and more representative data is needed.

Table 1: Overview of literature on fairness in music recommender systems.
a Hubness can create unfairness for any attribute.
b Not transparent which fairness attributes participants were considering.