The impact of text topic and assumed human vs. AI authorship on competence and quality assessment

Background: While Large Language Models (LLMs) are viewed positively with respect to technological progress and capabilities, people are rather opposed to machines making moral decisions. However, the circumstances under which algorithm aversion or algorithm appreciation is more likely to occur with respect to LLMs have not yet been sufficiently investigated. The aim of this study was therefore to investigate how texts on moral or technological topics, allegedly written either by a human author or by ChatGPT, are perceived. Methods: In a randomized controlled experiment, n = 164 participants read six texts, three with a moral and three with a technological topic (predictor: text topic). The alleged author of each text was randomly labeled either "ChatGPT" or "human author" (predictor: authorship). We captured three dependent variables: assessment of author competence, assessment of content quality, and participants' intention to submit the text in a hypothetical university course (sharing intention). We hypothesized interaction effects; that is, we expected ChatGPT to score lower than alleged human authors on moral topics but higher on technological topics. Results: We found only a small interaction effect for perceived author competence, p = 0.004, d = 0.40, but none for the other dependent variables. However, ChatGPT was consistently devalued relative to alleged human authors across all dependent variables: there were main effects of authorship on the assessment of author competence, p < 0.001, d = 0.95; on the assessment of content quality, p < 0.001, d = 0.39; and on sharing intention, p < 0.001, d = 0.57. There was also a small main effect of text topic on the assessment of text quality, p = 0.002, d = 0.35. Conclusion: These results are more in line with previous findings on algorithm aversion than with algorithm appreciation.
We discuss the implications of these findings for the acceptance of the use of LLMs for text composition.


Introduction
The rise of ChatGPT and other Large Language Models (LLMs) has been referred to as a major step forward in Generative AI technology (Eke, 2023). It pushes "the boundaries of what is possible in natural language processing" (Kasneci et al., 2023, p. 2). LLMs are generators of text that appears like text written by humans. Such text is created by LLMs based on deep-learning technology in response to people's prompts (Eke, 2023). In this context, the term creation means that the AI is able to generate content that is similar to the data with which it was trained, in this case texts. However, since this creation is essentially always prompted by human users, it can also be referred to as the co-construction of content (Cress and Kimmerle, 2023). LLMs can improve efficiency and workflow by correcting syntactic and grammatical errors and by providing short summaries of texts. According to Dwivedi et al. (2023), LLMs can even help with generating reasonable research hypotheses. ChatGPT can be used in almost any field, such as in the medical context (Cascella et al., 2023), in journalism (Pavlik, 2023), or in event planning (Keiper, 2023). However, LLMs also have their limitations (Deng and Lin, 2022; Salewski et al., 2024). For example, they might endanger academic integrity and the functionality of the review process of academic journals (Eke, 2023). Moreover, they often tend to reproduce biases, lack transparency, and might even reduce critical thinking by their users (Kasneci et al., 2023).
When users evaluate the performance of AI technology, a relevant factor is the magnitude of the consequences of its use. Bigman and Gray (2018), for example, found that people were averse to machines making moral decisions in several domains, including medicine, law, the military, and autonomous driving. This aversion seemed to be quite robust and difficult to eliminate, even when machines were limited to a mere advisory role. Böhm et al. (2023) investigated whether an author label, that is, the fact that ChatGPT (as opposed to a human author) was labeled as the author of an article, had any influence on how the texts were perceived. In their experiment, participants read short articles addressing societal challenges and evaluated the author's competence, the content quality of the text, and their intention to share the article with friends and family. Böhm et al. (2023) found a significant interaction effect of the author's identity and transparency on the author's competence. That is, whenever the author's identity (ChatGPT vs. human) was communicated to participants, they rated ChatGPT as less competent than human authors. This effect vanished when the author's identity was not made transparent, indicating that people were not able to distinguish between texts written by people and texts written by AI. Lermann Henestrosa et al. (2023), however, did not find any differences between human-written and AI-written texts with respect to perceived credibility and trustworthiness. In a study by Köbis and Mossink (2021), participants had to distinguish whether various stories, news articles, and recipes were created by a human or an AI. This study found that people were unable to make this distinction and made their decisions at random.
The effectiveness of AI-generated content depends on how people evaluate the content and whether the advice is accepted at all. Current research therefore focuses strongly on the question of whether and when people accept the help of algorithms. Contradictory statements can be found in the literature, and opposing phenomena have been demonstrated (Hou and Jung, 2021). A large part of the research shows algorithm appreciation, which refers to the tendency to give algorithmic content more weight than human content (Logg et al., 2019). Research on algorithm aversion, however, reveals the opposite effect (Dietvorst et al., 2015; Burton et al., 2020): study results show that algorithmic content is more likely to be rejected (Hou and Jung, 2021) or that people prefer human authorship over automated text generation (Lermann Henestrosa and Kimmerle, 2024).
In conclusion, there seems to be an ambivalence in how LLMs (and ChatGPT in particular) are perceived. On the one hand, LLMs are frequently associated with technological progress (e.g., Wei et al., 2022). Dwivedi et al. (2023), for example, point out that ChatGPT can successfully process more than eight programming languages. On the other hand, people are quite averse to machines making moral decisions (Bigman and Gray, 2018; Dietvorst and Bartels, 2022). In this study, we draw on the research on both algorithm aversion and algorithm appreciation. Participants were therefore presented with texts on both moral and technological topics. The authorship label (ChatGPT vs. human) was randomly assigned. Participants evaluated the texts in terms of author competence, text quality, and their intention to share the text in a hypothetical university course.
We expected interaction effects between the authorship label and the text topic for each of the dependent variables.
H1 (author competence): ChatGPT's competence as an author will be evaluated more positively for technological text topics, while a human author's competence will be evaluated more positively for moral text topics.
H2 (content quality): the content quality of a text labeled as written by ChatGPT will be evaluated more positively for technological text topics, while the content quality of a text labeled as written by a human author will be evaluated more positively for moral text topics.
H3 (sharing intention): the sharing intention for a text labeled as written by ChatGPT will be stronger for technological text topics, while the sharing intention for a text labeled as written by a human author will be stronger for moral text topics.

Materials and methods
The experiment presented here was preregistered on AsPredicted before the start of data collection (https://aspredicted.org/6ie9b.pdf).

Sample
We recruited n = 222 participants for this online experiment via university mailing lists. We terminated data collection on January 22, 2024. Participants were required to be at least 18 years old and to have a good knowledge of German. Participants were excluded according to the preregistered criteria: first, data were only used if participants had given their written informed consent and completed the entire questionnaire. Second, we excluded participants who did not pass an attention check, for which they had to pick the correct authors of all six texts (see Procedure). A total of n = 58 participants were excluded from the analysis due to a failed attention check. This left us with n = 164 participants in the final dataset. Their mean age was M = 23.37 years (SD = 4.91); 130 identified as female, 30 as male, and four as non-binary.

Procedure
The experiment was conducted in German and ran on www.soscisurvey.de. It started with a brief introductory part elaborating on anonymity, voluntariness, and participants' possibility to withdraw their data without any consequences. Subsequently, participants were instructed to carefully read and evaluate the six texts that they would be presented with sequentially.
On the next page, an authorship label appeared above the first text, which indicated that the text was allegedly written either by ChatGPT or by a human author (predictor: authorship). For ChatGPT as author, the label read: "The following text is from ChatGPT. ChatGPT ('Chat Generative Pre-trained Transformer') is a text-based dialog system ('chatbot') developed by the company OpenAI, which is based on machine learning." For a human author, the label stated: "The following text was written by a human author." The labels were randomly assigned to the texts, separately for each individual participant (with the restriction that each label appeared at least once per topic). Below the label, participants read the respective text, which was about either a moral or a technological topic (predictor: topic). The order in which the texts appeared was randomized across participants.
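To make the constrained randomization concrete, the per-participant label assignment (random labels, but each label appearing at least once within each topic) could be sketched as follows. This is a hypothetical Python illustration, not the authors' implementation (the study ran on SoSci Survey); the function and key names are assumptions:

```python
import random

def assign_labels(seed=None):
    """Randomly assign authorship labels to three moral and three
    technological texts, redrawing until both labels ('ChatGPT' and
    'human') occur at least once within each topic."""
    rng = random.Random(seed)
    labels = {}
    for topic in ("moral", "technological"):
        # Draw three labels for this topic; redraw if one label is missing
        while True:
            draw = [rng.choice(["ChatGPT", "human"]) for _ in range(3)]
            if len(set(draw)) == 2:  # restriction: both labels present
                break
        for i, label in enumerate(draw):
            labels[f"{topic}_{i + 1}"] = label
    return labels

# Example: one participant's assignment of labels to the six texts
labels = assign_labels(seed=2024)
```

The rejection-sampling loop keeps each of the three texts per topic individually random while guaranteeing the per-topic restriction described above.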
After reading each text, participants evaluated it regarding author competence, content quality, and their sharing intention. Subsequently, they had to answer the attention-check question "Which author wrote the text you have just read?" by clicking on "ChatGPT" or "human author" (the order of appearance was randomized). In a manipulation-check question, participants indicated which topic the text was about on a 7-point scale ranging from "moral" to "technological." After completing the six trials, participants were asked to report their age and gender. At the end of the experiment, participants could sign up for a drawing of seven vouchers for an online store (two vouchers worth €50 each and five vouchers worth €20 each). It took about 20 min to complete the entire experiment.

Material
Each text had a length of 200 words. The three moral texts were about moral behavior, morality and globalization, and moral dilemmas. The technological texts covered the technological revolution of the 21st century, the role of smartphones, and the future of technology. The texts were composed using ChatGPT and the respective topic prompts. They were modified minimally to achieve a length of 200 words. To ensure that the quality of the presented texts was comparable, we conducted a pilot study with n = 34 participants. The results showed no meaningful differences in text quality: the mean scores of the texts were 3.85, 3.71, 3.80, 3.93, 4.01, and 3.78, respectively (rated on 5-point Likert scales).

Measures
The measures of author competence and content quality were taken from Böhm et al. (2023). To measure participants' sharing intentions, we adapted a question from the same study. All items were measured on 7-point scales ranging from "1 = do not agree at all" to "7 = completely agree." Author competence was captured with three items: "The author is trustworthy," "The author is knowledgeable of the subject," and "The author is smart." Content quality was measured with five items: "The proposed solution described in the text is very concrete," "The content of the text is very creative," "The text is easy to understand," "The text is well written," and "The text is credible." To measure their sharing intention, participants responded to the prompt: "I would hand in this text as a student within a university course (in this scenario, no legal consequences have to be considered)."

TABLE: Means (M) and standard deviations (SD) of assessed author competence by experimental conditions.

TABLE: Means (M) and standard deviations (SD) of sharing intention by experimental conditions.

TABLE: Overview of the fitted linear mixed-effects models.
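The analysis refers to linear mixed-effects models with authorship, topic, and their interaction as fixed effects and participants as a grouping factor. As a minimal illustrative sketch (not the authors' analysis script), such a model could be fitted with `statsmodels` on simulated long-format data; the dataset, effect magnitudes, and variable names below are all invented for illustration:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)

# Simulated long-format data: 164 participants x 6 texts (illustrative only)
n_participants, n_texts = 164, 6
rows = []
for pid in range(n_participants):
    intercept = rng.normal(0, 0.5)  # random participant intercept
    for t in range(n_texts):
        topic = "moral" if t < 3 else "technological"
        author = rng.choice(["ChatGPT", "human"])
        # assumed (invented) advantage for alleged human authors
        mu = 4.0 + (0.8 if author == "human" else 0.0) + intercept
        rows.append({"pid": pid, "topic": topic, "author": author,
                     "competence": float(np.clip(rng.normal(mu, 1.0), 1, 7))})
df = pd.DataFrame(rows)

# Mixed-effects model: fixed effects for authorship, topic, and their
# interaction; random intercept per participant
model = smf.mixedlm("competence ~ author * topic", df, groups=df["pid"])
result = model.fit()
print(result.summary())
```

With this specification, the `author[T.human]` coefficient corresponds to the main effect of authorship on the competence ratings, and the `author:topic` term tests the hypothesized interaction.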