Inclusiveness, Equity, Consistency, and Flexibility as Guiding Criteria for Enabling Transdisciplinary Collaboration: Lessons From a European Project on Nature-Based Solutions and Urban Innovation

The structural research programmes of the European Union dedicated to advancing the sustainability sciences are increasingly permeated by the notion of transdisciplinarity (TD). A growing body of literature residing at the intersection of research methodology and sustainability studies can guide researchers in adopting appropriate research approaches in their projects. However, how to implement the transdisciplinary approach in multidisciplinary and multi-stakeholder projects that develop in different countries over several years is still relatively undocumented. This study seeks to fill this gap by sharing the experience of a group of researchers and stakeholders involved in the Horizon 2020 research and innovation project Nature-Based Urban Innovation (NATURVATION). The article discusses the monitoring and evaluation strategy that employed four criteria of transdisciplinary research quality as “reflexive devices” to enable systematic reporting on the project's most important collaborative activities. By examining how the four criteria captured transdisciplinary quality, new insights were produced for improving this monitoring and evaluation strategy for future transdisciplinary research, allowing a number of concrete recommendations to be formulated.


INTRODUCTION
The structural research programs of the European Union (EU) dedicated to advancing the sustainability sciences have been increasingly influenced by the notion of transdisciplinarity (TD). A case in point is the Horizon 2020 Responsible Research and Innovation Action, which is oriented toward "promoting inter- and transdisciplinary solutions" (European Commission Horizon 2020 Programmes, 2021). One explanation for the "institutionalization" of this methodological approach is the "quest for legitimate knowledge" (Basta, 2017) that underpinned the epistemological debate of the past decades on the roles of science in and for society (e.g., Owen et al., 2012). This debate contributed to the advancement of research practices where researchers, policymakers, and stakeholders collaborate in inclusive processes of knowledge production (Jasanoff and Wynne, 1998; Hirsch-Hadorn et al., 2008; Wyborn, 2015). The assumption that motivates these research practices, and that seems to have determined their growing acceptance by the scientific community and research funding agencies worldwide, is that "transdisciplinary teams can generate new knowledge to address complex problems while integrating multiple disciplines and stakeholders" (Harris and Lyon, 2014).
As the science of "complex problems" par excellence, the sustainability sciences have been particularly receptive to relevant methods of "co-production" of scientifically grounded and, at the same time, transformative knowledge (Lemos and Morehouse, 2005; Godemann, 2008; Brandt et al., 2013; Gaziulusoy et al., 2016; Norström et al., 2020). As a consequence, at the intersection of research methodology and sustainability studies, a hybrid literature has emerged on which quality principles and operational criteria could support the design and evaluation of collaborative and integrative research practices (Schramm et al., 2005; Wickson and Carew, 2006; Pohl and Hirsch Hadorn, 2008; Carew and Wickson, 2010; Jahn et al., 2012; Belcher et al., 2016; Wall et al., 2017).
Generally accepted quality principles include relevance, credibility, legitimacy, and effectiveness of the knowledge production process (among others, Belcher et al., 2016). The challenges commonly faced by the multidisciplinary and multistakeholder teams seeking to operationalize these principles in their work are also widely studied. These include the combination of the different knowledge bases of research participants into a shared problem formulation; enabling dialog and building trust among researchers and stakeholders with different backgrounds and goals (Harris and Lyon, 2014); and more generally, minimizing the gap between the "ideal conditions" for effective knowledge co-production and synthesis, and the reality in which research projects normally develop (Lang et al., 2012; Verwoerd et al., 2020).
These practical challenges, and the guiding principles that assist in dealing with them, are indeed well-documented in the literature. However, how to tackle these challenges when the operationalization of these principles occurs in the framework of projects that develop in different countries over several years (that is, the typical setting of large EU research projects) is still relatively undocumented. In particular, there is a dearth of studies that combine the adoption of the transdisciplinary guiding principles of relevance, credibility, legitimacy, and effectiveness with systematic reporting on their operationalization in the framework of large international projects. This research seeks to fill this gap by discussing the self-assessment of a transdisciplinary monitoring and evaluation strategy developed in the Horizon 2020 research and innovation project Nature-Based Urban Innovation (NATURVATION). As participants within the project, we reflect on the operationalization of transdisciplinary research quality principles in the context of one of the project's key activities, namely, conducting various knowledge co-production events on the benefits and implementation of nature-based solutions (NBS) in local urban plans. The events were organized on the basis of the common agenda of the six Urban-Regional Innovation Partnerships (URIPs) active in NATURVATION's consortium. These local partnerships included academics, researchers, urban professionals, and stakeholders involved in the common search for nature-based solutions to pressing urban challenges. As such, the six URIPs constituted the local "transdisciplinary teams" of NATURVATION's consortium.
Our self-assessment of the monitoring and evaluation strategy was tailored to assess the transdisciplinary quality of the URIPs' knowledge co-production events. It reflects on the pathway taken from the adoption of the aforementioned transdisciplinary quality principles to the identification of four operational criteria. The latter were used as "reflexive devices" for reporting on the events and gauging their transdisciplinary quality. The objective of the article is therefore two-fold. The first objective is methodological and regards the development of our monitoring and evaluation strategy as an exemplary case of how to operationalize transdisciplinary research quality principles in large international projects where multiple transdisciplinary teams operate at a local level to advance project objectives. The second objective is reflexive and consists of self-assessing the efficacy of our strategy in informing the transdisciplinary quality of the collaboration within the URIPs.
This article is divided into four parts. The section The challenge of building transdisciplinary capacities in large European Union projects: nature-based urban innovation methodological approach, and the perspective of this study describes NATURVATION's objectives, organizational setup, and the notion of transdisciplinarity that informed the project's research design. This part provides a snapshot of the research context in which we developed and assessed the monitoring and evaluation strategy of the project's transdisciplinary quality. The section Materials and methods: literature review, Urban-Regional Innovation Partnerships' Summary Reports, and our collaborative self-assessment then describes the materials and methods that inform our study, and also provides a short account of the different proposals for implementing the TD approach in NATURVATION that led to the definition of our strategy. The section Discussion: What the Analysis of the Urban-Regional Innovation Partnerships' Summary Reports Reveals, and Our Self-Assessment of the Relevant Monitoring and Evaluation Strategy discusses the results of the self-assessment. This section anticipates some conclusions regarding the four criteria of transdisciplinary quality adopted in the reporting system, reflecting, in particular, on how the system could be improved in future applications. The section Conclusive Remarks: Transdisciplinary Research as the Art of "Bringing Order to Creative Chaos" provides our concluding remarks. These concern improving the efficacy of systematic reporting by using explicit criteria of transdisciplinarity in the framework of large international projects during their entire development. The conclusions also stress the importance of communicating the scope of reporting on transdisciplinary quality criteria clearly to all project participants. These final recommendations are also directed at the research funding agencies that promote transdisciplinary research.

THE CHALLENGE OF BUILDING TRANSDISCIPLINARY CAPACITIES IN LARGE EUROPEAN UNION PROJECTS: NATURE-BASED URBAN INNOVATION METHODOLOGICAL APPROACH, AND THE PERSPECTIVE OF THIS STUDY
NATURVATION is a European research and innovation project that aimed at advancing innovative knowledge on nature-based solutions (NBS) to urban sustainability challenges. These included climate adaptation, air quality, and the related social questions of equity and inclusiveness. NBS (for example, green urban roofs) are solutions inspired or delivered by nature that constitute sustainable alternatives to their technological counterparts (for example, air conditioning systems) (Bulkeley, 2016). Identifying cost-effective NBS that could contribute to advancing the sustainable development goals and promoting their implementation in urban and regional plans constituted the main objectives of the project.
To identify and assess the multiple benefits and potential uses of NBS, the project relied on an iterative program of activities conducted in six Urban-Regional Innovation Partnerships (URIPs) based in Utrecht (The Netherlands), Gyor (Hungary), Newcastle (UK), Leipzig (Germany), Barcelona (Spain), and Malmö (Sweden). Each URIP acted as a "transdisciplinary unit" by being co-convened by researchers from local universities or research centers, local government representatives, and stakeholders relevant to the implementation of NBS in the respective urban region. As such, the URIPs constituted the "operational units" of the research consortium.
The methodological approach that framed the collaboration among their participants is the transdisciplinary approach. In NATURVATION's project plan, the transdisciplinary approach was qualified as an "on-going and collective process of learning, where different knowledge communities are brought together" (Bulkeley, 2016, p. 27). Moreover, "the project emphasizes the importance of collaboration, co-production of knowledge, and the maximum outreach of its results" (Bulkeley, 2016, p. 31).
The concept of knowledge co-production was particularly relevant to the URIPs' program of activities. Indeed, knowledge co-production "occurs" in context-based, pluralistic, goal-oriented, and interactive settings (Norström et al., 2020). The establishment of the six local partnerships, and the definition of their program of activities, was therefore meant to create the most favorable conditions for harvesting and channeling multiple forms of local knowledge toward innovative learnings on NBS.
The URIPs' relevant activities were coordinated by ICLEI, the global network of local governments for sustainability, a boundary organization that supports local governments' sustainable development capacities (Frantzeskaki et al., 2019). The most important among these activities consisted of a set of thematic events on NBS focused on their assessment and implementation in local urban plans. Other events, like the Stakeholders Dialogues held in Utrecht and Malmö, were dedicated to advancing specific NATURVATION deliverables like the NBS integrated assessment framework. All events shared the goal of facilitating co-production of knowledge by bringing together multiple knowledge communities. Each URIP organized them autonomously on the basis of the common agenda coordinated by ICLEI. In parallel, ICLEI led an iterative program of knowledge exchange among the URIPs on the outcomes of all events, so as to make the relevant progress accessible to all project participants.
As part of NATURVATION's transdisciplinary capacity-building objectives, the Netherlands Environmental Assessment Agency (PBL), one of the project's partner institutes, conducted research on the operationalization of the transdisciplinary approach from an observer's position. This article is one of the deliverables of this research trajectory. This study therefore combines the perspective of researchers not directly involved in the coordination of the transdisciplinary process with the perspective of its coordinators and participants. In it, we also jointly reflect on how the PBL researchers' role as observers influenced the way in which the URIP members experienced the task of reporting on the transdisciplinary quality criteria. Before doing so, in the following section, we describe the theoretical premises and methodological challenges that have informed NATURVATION's transdisciplinary research design.

Fostering Transdisciplinary Co-Production of Knowledge: Epistemological Premises, Methodological Questions, and Nature-Based Urban Innovation's Relevant Challenges
At the early stage of NATURVATION, the first step taken to identify workable strategies for operationalizing the transdisciplinary approach in the research practices of URIPs and evaluating their quality consisted of conducting a literature review on different conceptions of transdisciplinary research (TDR) (Basta and Kunseler, 2018). The review, described in more detail in the section Literature Review: Transdisciplinary Quality Principles, Operational Criteria, and How "Putting Them to Work", explored different literatures, including the research methodology and sustainability literatures. This made it possible to distill the "common denominators" among different conceptions of TDR. The review also revealed how, in approaching questions of transdisciplinary research quality, the research methodology and sustainability literatures draw on the same underlying epistemological debate on the role of science in society (Owen et al., 2012; Osborne, 2015) 1.
From the prevention of natural and technological hazards (De Marchi and Ravetz, 1999; Culwick and Patel, 2016) to climate change adaptation (Gaziulusoy et al., 2016; Turnhout et al., 2016; Howarth and Monasterolo, 2017) up to nature-based solutions to sustainability challenges (Nesshöver et al., 2017; Steger et al., 2018; Hanson et al., 2020), the fields of study that have embraced this integrative conception of knowledge have steadily increased. In the field of urban sustainability studies, the most relevant to NATURVATION's objectives, this conception of integrated knowledge overlapped with the established theoretical tradition that sees the participation of different actors in knowledge production and decisional processes as instrumental to pursuing urban goals more effectively (e.g., Forester, 1999; Maiello et al., 2010) and legitimately (Healey, 2003; Muller et al., 2005).
The interrelation of this wide range of sources with the idea of transdisciplinarity as a "process of mutual learning" adopted in NATURVATION provided solid theoretical foundations for approaching the research design of the activities of the URIPs. However, at the beginning of the project, several operational questions still had to be solved. A particularly important question was how to monitor such activities against transparent transdisciplinary quality principles, with the two-fold aim of securing their methodological consistency and informing their progress accordingly 2. The relevant challenge concerned not only conceptual questions, like the identification of transdisciplinary quality criteria suited to the operational context of the URIPs, but also pragmatic barriers, like the multiple countries in which the URIPs were due to carry out their work simultaneously.
To tackle these issues, the PBL researchers who author this article tailored the research strategy described in the following section, dedicated to the materials and methods that inform this study.

1 Initiated in the second half of the 20th century, this debate was rooted in the academic rivalry between the theoretical and the applied sciences and in the "gulf of mutual incomprehension" between the two respective academic cultures (Snow, 1959). The subsequent epistemological debate, which also progressed under the influence of the French structuralist movement, developed to the point of envisioning "a superior order of knowledge" that integrates different disciplinary outlooks in the process of scientific inquiry. Such 'superior order' is what the French linguist and epistemologist Jean Piaget called transdisciplinary knowledge (Nicolescu, 2014).
2 For the PBL researchers who author this study, besides a question of research quality, this operational question was also a matter of research ethics. This matter was touched upon in the paper "Transdisciplinarity in Urban Studies: From 'preaching it' to doing it" presented at the yearly congress of the European Association of Schools of Planning in the summer of 2017 (Basta, 2017) and in a follow-up study (Basta, 2021, in progress). The ethical question regards the accountability of researchers involved in transdisciplinary projects funded by the EU structural research programs for the consistency between the methodological approach described in the respective project plans and its concrete implementation. The concern originates from the observation that the involvement of multiple stakeholders in a project's consortium, and the labeling of such involvement as "participatory" and "collaborative" research, does not guarantee that their knowledge will be integrated in the project's deliverables. Research projects funded with EU structural funds that apply participatory and collaborative approaches to the production of knowledge should therefore include transparent monitoring and evaluation mechanisms able to document the integration of the knowledges of different actors in the project's deliverables. However, as documented in the study that followed up on the cited congress paper, in the H2020 program this has rarely been the case. The study includes the review of more than 40 Final Reports of H2020 projects in the social and in the environmental sciences that adopted the transdisciplinary approach. Of them, none included robust monitoring and evaluation mechanisms dedicated to documenting the integration of different knowledges in the project's deliverables. In the view of the author, this striking finding suggests the desirability of rendering these monitoring and evaluation mechanisms pre-requisites for obtaining structural research funds.

MATERIALS AND METHODS: LITERATURE REVIEW, URBAN-REGIONAL INNOVATION PARTNERSHIPS' SUMMARY REPORTS, AND OUR COLLABORATIVE SELF-ASSESSMENT
This section reconstructs the development of the monitoring and evaluation strategy adopted for documenting the adherence of the thematic events of the URIPs to the four transdisciplinary research quality principles adopted in NATURVATION. It also describes the methods used for elaborating the subsequent self-assessment. Some of the steps described in the following subsections also constitute the background work for other studies conducted in the framework of our research on the TD approach (e.g., Basta, 2021, forthcoming). Others relate to this study only. These steps consist of:
1) A literature review on the transdisciplinary research methodology and on its implementation in research projects in the broad field of the sustainability sciences (section Literature Review: Transdisciplinary Quality Principles, Operational Criteria, and How "Putting Them to Work");
2) The identification of criteria suitable to operationalize the transdisciplinary guiding principles of relevance, credibility, legitimacy, and effectiveness in key knowledge co-production events (this study, section From transdisciplinary guiding principles to operational criteria: contextualizing transdisciplinary practices);
3) The establishment of a consistent practice of reporting on such events by the URIPs by means of the provision of template Summary Reports (this study, section From Operational Criteria to Information Gathering: Establishing a Consistent Reporting System);
4) The analysis of the reporting gathered over time by means of document analysis (this study, section The Urban-Regional Innovation Partnerships Summary Reports: A Document Analysis); and
5) A self-assessment of the efficacy of the reporting system to gather information and stimulate reflection on the transdisciplinary quality of each event (this study, section Looking Back: Shaping a Collaborative Self-Assessment Exercise).
We elaborate on each step separately in the following subsections.

Literature Review: Transdisciplinary Quality Principles, Operational Criteria, and How "Putting Them to Work"
The first step taken to operationalize the transdisciplinary approach in NATURVATION and in the URIPs' thematic events consisted of scoping relevant literature. The question that guided the literature review was how to operationalize the notion of transdisciplinarity as co-production of knowledge in a large international project of the scope and complexity of NATURVATION, with a focus on the methods for transdisciplinary knowledge co-production. Standard scientific repositories and search tools like SCOPUS and Google Scholar were employed in the search for sources. Keywords like "transdisciplinary methodology," "knowledge co-production," and "transdisciplinary operationalization," among others, were used to detect relevant studies. From an initial set of several hundred titles, 100 sources on the theory and practice of transdisciplinary research were selected. These included both primary sources and grey literature. The selection was carried out by quick-scanning abstracts and executive summaries. A second reading of the sources resulted in two subsets. One subset grouped the studies dedicated to the historical and epistemological development of the concept of transdisciplinarity from its origins to date. The other subset grouped the studies on the operationalization, monitoring, and evaluation of the practice of TD research in project-based research 3. This study is informed mostly by this latter subset.
From it, the clear prominence of the principles of relevance, credibility, legitimacy, and effectiveness as guiding principles for designing transdisciplinary investigations emerged (Belcher et al., 2016). These principles were therefore adopted as guiding principles for the research design of NATURVATION. At the same time, the review revealed the scarcity of studies on how to operationalize such principles in large projects that build upon different activities in multiple countries over several years (Hoffmann et al., 2017). Thus, rather than providing definite answers, the literature review supported the formulation of the following questions:
a) what criteria can facilitate the monitoring of the URIPs' knowledge co-production activities so as to assess their adherence to the principles of relevance, credibility, legitimacy, and effectiveness?
b) how can these criteria be operationalized in such activities so as to generate robust and consistent information and, at the same time, promote relevant reflections on the part of URIP members?
These questions are briefly discussed in the two following subsections.

From Transdisciplinary Guiding Principles to Operational Criteria: Contextualizing Transdisciplinary Practices
The literature review recalled in the previous section had made clear that the greatest challenge for monitoring and evaluating situated processes of knowledge co-production like those led by the six URIPs consists of identifying transdisciplinary quality criteria suited to their unique research contexts. To this end, between March and June 2017 the research team of PBL held several brainstorming sessions. Parts of these sessions were extended to ICLEI and to the coordinators of the project. The PBL researchers advanced multiple proposals for operationalizing the transdisciplinary guiding principles of relevance, credibility, legitimacy, and effectiveness by means of suitable criteria. The initial proposal centered on the notion of mutual learning as key to transdisciplinary work. It consisted of inviting the members of the URIPs to set individual learning goals. These goals were to cover the entire duration of the project, and their achievement was to be reported on a regular basis. This approach, inspired among others by the work of Roux et al. (2017), was meant to generate information, from the perspective of the participants in the URIPs, regarding their achieved learnings. The idea was then to evaluate such learnings against the TD guiding principles of the project. If, for example, learning new ways to minimize heat island effects in the city had been an explicit learning goal for one or more participants in the URIPs, whether or not that goal was achieved during the respective activities would have enabled the evaluation of their relevance to the desired learnings of participants. The PBL team would have then been in a position to produce robust observations on the adherence of the project's outcomes to the principle of relevance and, by replicating the approach for all principles, to those of credibility, legitimacy, and effectiveness.
In the light of the task load that this otherwise promising monitoring and evaluation method could have implied for the participants in the URIPs, the proposal to implement it was discarded. Indeed, due to the intensive project plan, at the time several URIP participants had already flagged the risk of suffering from "stakeholders' fatigue" (Baró, 2017): a risk that was not explicitly anticipated in the project proposal (Table 1), and that this approach might have exacerbated further. A subsequent proposal then consisted of identifying a set of quality criteria relevant to the four transdisciplinary guiding principles and proposing them to the participants of the URIPs as "reflexive devices" on the dynamics and results of the thematic events already on their agenda. The observations gathered would have enabled reflection on the factors that had enhanced or undermined the quality of the TD collaboration among the members of the URIPs during each event. In the light of the simpler implementation of this method, the proposal was endorsed by ICLEI and by the coordinators of the project.
The following steps consisted of identifying the most suitable criteria for reporting on the thematic events of the URIPs so as to capture their relevance, credibility, legitimacy, and effectiveness, reported in Table 2. Based on inputs from the literature review and on the subsequent brainstorming sessions, these were identified as the criteria of inclusiveness, equity, flexibility, and consistency. In the brainstorming sessions, one important input for identifying these criteria consisted of articulating questions like, "what makes the questions discussed during a URIP thematic event relevant for those participating in it?"; or, "what makes participation in the event effective for individuals?" The writing of short answers to these questions (e.g., "one's professional goals"; "one's ability to voice her opinion," etc.) provided the basis for reasoning about the criteria best suited to capturing the guiding principles of transdisciplinarity as these would "work" in the specific context of the URIP events. In the literature, an important role in their identification was played by the study of Belcher et al. (2016), where questions of inclusiveness and equity of participation of stakeholders in collaborative forms of knowledge production are explicitly addressed. The Bracken et al. (2015) study on the perspective of stakeholders involved in large transdisciplinary projects provided additional arguments for including the criterion of flexibility. The fourth criterion of consistency was added with the intent of stimulating reflection regarding the overall adherence of thematic events to the inclusive, equitable, and flexible spirit that should permeate their organization and management. Finally, the authors' comparable experiences of transdisciplinary research design led them to endorse the final set of four criteria (Kunseler et al., 2015; Wamsler, 2017). These are reported in Table 3.
How the criteria were "administered" to the URIP coordinators in such a way to gather relevant information is described below.

From Operational Criteria to Information Gathering: Establishing a Consistent Reporting System
Having identified workable criteria for generating information on the URIP thematic events relevant to the adopted guiding principles of transdisciplinary quality, the next question was how to "put them to work." A particularly sensitive dilemma for the PBL researchers was whether to opt for "intrusive" information-gathering approaches, like interviewing the members of the URIPs about the dynamics and outcomes of thematic events on the basis of the four criteria, or for approaches that would minimize their direct involvement in the gathering of information. This latter concern was corroborated by inputs provided by some URIP coordinators regarding the risk that stakeholders would feel like "guinea pigs" (Baró, 2017).
By virtue of this and other practical difficulties, including multiple language barriers, the most effective strategy seemed to be promoting systematic reporting on the four criteria by URIP coordinators, incorporating the criteria into the template Summary Report that they already used for reporting on the thematic events on their agenda. By filling in the template, URIP coordinators were required to report on "quantitative" aspects of each thematic event (like the number of participants) as well as on content-related aspects like the thematic sessions held, the information exchanged, the agreements reached, and so on. With the introduction of the four quality criteria of inclusiveness, equity, flexibility, and consistency as explicit points of reflection in the Summary Reports, starting from June 2017 the coordinators of the URIPs were also able to generate this information on each event. Table 4 reports a copy of the template used for facilitating the systematic reporting of the URIPs. In the following section, we present the method of analysis of the Summary Reports that was used for informing the self-assessment exercise discussed in the section Discussion: What the Analysis of the Urban-Regional Innovation Partnerships' Summary Reports Reveals, and Our Self-Assessment of the Relevant Monitoring and Evaluation Strategy.

The Urban-Regional Innovation Partnerships Summary Reports: A Document Analysis
To prevent any confusion, it is emphasized again that the primary scope of this study does not consist of reflecting on the examined thematic events in relation to the criteria discussed in
Belcher et al., 2016; in Basta and Kunseler, 2018).

Quality principles Research evaluation criteria
Relevance
Relevance is the importance, significance, and usefulness of the research project's objectives, process, and findings to the problem context and to society.
• The appropriateness of the timing of the research, the questions being asked, the outputs, and the scale of the research in relation to the societal problem being addressed;
• Researchers must demonstrate an in-depth knowledge of and ongoing engagement with the problem context in which their research takes place;
• From the early steps of problem formulation and research design through to the appropriate and effective communication of research findings, the applicability and relevance of the research to the societal problem must be explicitly stated and incorporated.

Credibility
Credibility refers to whether or not the research findings are robust and the knowledge produced is scientifically trustworthy.
• Clear demonstration that the data are adequate, with well-presented methods and logical interpretations of findings;
• High-quality research is authoritative, transparent, defensible, believable, and rigorous; traditional disciplinary criteria can be applied in TDR evaluation to an extent;
• Additional and modified criteria are set that address the integration of epistemologies and methodologies and the development of novel methods through collaboration, the broad preparation and competencies required to carry out the research, and the need for reflection and adaptation when operating in complex systems;
• Researchers are actively engaged in the problem context, which includes extra-scientific actors as part of the research process so that the relevance and legitimacy of the research are facilitated;
• Heightened requirements of transparency, reflection, and reflexivity to ensure objective are carried out;
• Transdisciplinary researchers must ensure they maintain a high level of objectivity and transparency while actively engaging in the problem context.

Legitimacy
Legitimacy refers to whether the research process is perceived as fair and ethical by end-users. Whereas credibility refers to technical aspects of sound research, legitimacy deals with socio-political aspects of the knowledge production process and products of research.
• Genuine and appropriate inclusion and consideration of diverse values, interests, and the ethical, and fair representation of all involved; regardless of the depth of participation, processes for effective and fair collaboration are present; • Societal actors are involved along a continuum of participation from consultation to co-creation of knowledge; • Researchers explicitly reflect on and account for their own position, potential sources of bias, and limitations throughout the process, and make the process transparent to those external to the research group who can then judge the legitimacy based on their perspective of fairness.

Effectiveness
The research contributes to positive change in the social, economic, and/or environmental problem context. Transdisciplinary inquiry must have the potential to (ex-ante) or actually (ex-post) make a difference if it is to be considered of high quality.
• Potential research effectiveness can be indicated and assessed at the proposal stage and during the research process through a clear and stated intention to address and contribute to a societal problem, the establishment of the research process and objectives in relation to the problem context, and the continuous reflection on the usefulness of the research findings and products to the problem; • Ex post research effectiveness can be measured "conventionally" (outputs such as e.g., journal articles) but require additional indicators, for example: • The contribution of the project to social learning and change (through e.g., capacity-building events); • The contributions of the project to changes in policy and practice resulting in social, economic, and environmental benefits.
the previous section. Our scope is rather self-assessing whether the identification and "administration" of the four criteria of transdisciplinary quality for reporting on such events was experienced as effective monitoring and evaluation strategy by the members of the URIPs who were involved in the reporting. That is why, for reasons of practicality, only a limited number of Summary Reports produced between June 2017 and December 2019 were included in the analysis. The URIPs of reference were reduced to three, namely, to Barcelona, Malmö, and Utrecht. For reasons of comparability, three reports per URIP were selected. The Summary Reports analyzed are, thus, nine in total 4 . All refer to events held in the same period of the year in the three respective cities. Overall, their level of elaboration and density of information is consistent. To corroborate the statements and narratives extracted from these Summary Reports, additional materials were included in the analysis, among which Barcelona's 4 A more comprehensive analysis the URIPs Summary Reports is under completion and will be collected in the Final Report on the Transdisciplinary Practice in NATURVATION (Basta and Kunseler, 2018, in progress).
URIP Yearly Report (Baró, 2017) and two narrative reports relevant to the Stakeholders Dialogues held in Utrecht and Malmö (2018 and 2019, respectively). The method used for analyzing this material is document analysis (e.g., Bowen, 2009). The main advantage of this method consists of the verifiability of the sources included in the analysis; something that other methods of information gathering like, e.g., participant observations methods, do not enable in full. One of the main disadvantages consists of the limitations intrinsic in the generation of information from the side of the document's writer, who filters it according to her subjectivity and contextual circumstances. This limitation is particularly relevant to the type of material analyzed here. A further limitation consists of the mutual subjectivity of the analyst in detecting significant statements.
For these reasons, rather than limiting the document analysis to the sections of text explicitly dedicated to commenting on the four criteria of transdisciplinary quality, the analysis included all the text and illustrations in each Summary Report. Significant and/or recurrent statements from each report were extracted without clustering them by attribute (e.g., positive/negative) or category (factual or causal statement). Data and observations, whether explicit or implicit, regarding the four quality criteria of inclusiveness, equity, flexibility, and consistency of the activities reported on were also extracted. The extraction included both quantitative (i.e., data) and qualitative statements (i.e., observations, reflections).

TABLE 3 | Overview of the quality criteria chosen for operationalizing the transdisciplinary research quality principles of relevance, credibility, and legitimacy.

Quality criteria | Indicators

Inclusiveness
How heterogeneous and representative in terms of interests, stake, and perspectives on NBS were the participants in the meeting? Were any disciplines, positions, interests, and/or cultural groups over or underrepresented? Was the overall age and/or gender diversity of participants noticeable?

Equity
Besides being present at the meeting, did all participants have equal opportunities to voice their opinions, interests, needs, and objectives? Could you give some examples? In case not all participants could be "heard" (e.g., because of lack of time, or because of the "predominance" of one or more participants over others) what changes and/or improvements could be considered for organizing future events and ensuring all can participate equally?

Flexibility
Allowing for changes, remaining open to feedback, and facilitating learning helps engage participants in the co-creation of knowledge on NBS. Was flexibility evident in the organization of the event? Can you describe how this was the case?

Consistency
Reflecting on the three criteria of inclusiveness, equity, and flexibility, and reporting on them critically and with integrity, is essential for securing consistency among and distilling "lessons learnt" from the URIP's work. What practical measures have made the process and/or event consistent with a view on the criteria inclusiveness, equity, and flexibility? What new measures and/or criteria would you recommend considering and implementing in the future?
To validate their relevance to the scope of this study, the extraction of statements was executed by one researcher and subsequently reviewed by a second researcher. Extracted statements were then submitted to the URIP members who participate in this study for further validation. Although we do not claim to have executed a rigorous triangulation, the reliability of the statements extracted can therefore be considered high. Their overview is reported in Table 5. Table 6 collects further significant statements extracted from the mentioned additional sources.
In the sections Discussion: What the Analysis of the Urban-Regional Innovation Partnerships' Summary Reports Reveals, and Our Self-Assessment of the Relevant Monitoring and Evaluation Strategy and Urban-Regional Innovation Partnerships' Summary Reports: A Closer Look, insights from these statements are briefly discussed. In the following subsection, a short description of the methods used for performing the self-assessment discussed in section The Integration of Transdisciplinary Quality Criteria in the Practice of Reporting: A Self-Assessment is reported.

Looking Back: Shaping a Collaborative Self-Assessment Exercise
The extraction of statements from the URIP Summary Reports relevant to the criteria of transdisciplinary quality of the examined events was meant to provide the basis for further discussion with the participants in this study regarding how the practice of reporting on the criteria was experienced in the course of the project. In essence, the self-assessment "looked back" at whether the four criteria were relevant to the thematic events, influenced their organization and management effectively, and enhanced the transdisciplinary quality of the URIP process.
To stimulate the discussion regarding these points, after the collection of statements reported in Table 5 was shared with the URIPs members and the ICLEI coordinators who participate in this study, some informal questions were proposed, namely:

1. Was the setup of the transdisciplinary coordination effective? Was good and sufficient guidance provided?
2. Were the four criteria proposed for capturing the adherence of the process to explicit guiding principles useful for reflecting on relevant aspects of the URIPs meetings?
3. Would you, also just "in your mind," reflect on them again in future projects?
4. What are the key learnings, positive and critical, you can derive from the transdisciplinary process that you have led/in which you have participated?
The answers to these questions were summarized in a form suitable for further discussion. Then, a first summary of the outcomes of the self-assessment was circulated by the author leading this study. All participants had an opportunity to supplement and modify its content. Its definitive version is reported in the section The Integration of Transdisciplinary Quality Criteria in the Practice of Reporting: A Self-Assessment.

DISCUSSION: WHAT THE ANALYSIS OF THE URBAN-REGIONAL INNOVATION PARTNERSHIPS' SUMMARY REPORTS REVEALS, AND OUR SELF-ASSESSMENT OF THE RELEVANT MONITORING AND EVALUATION STRATEGY
This section discusses the two questions introduced at the beginning of the article, namely, the criteria chosen for operationalizing transdisciplinary quality principles in the thematic events led by the URIPs, and the reporting system in which they were incorporated for the two-fold purpose of gathering information and generating reflections on the quality of the transdisciplinary collaboration in the URIPs. To do so, the section Urban-Regional Innovation Partnerships' Summary Reports: A Closer Look discusses what the statements reported in Table 5 reveal regarding the dynamics of the thematic events of the URIPs, and what relevant information the four criteria were able to generate. Section The Integration of Transdisciplinary Quality Criteria in the Practice of Reporting: A Self-Assessment shares the outcomes of our self-assessment exercise, which looks back at the overall monitoring and evaluation method to which the criteria have contributed.

TABLE 4 | Template Summary Report.

Meeting number and theme:
This report is authored by (name(s) and affiliation):
Host of meeting:
Place/venue of event:
Date and time of event:

(1) Description of the event
• Objective(s)
• Participants (please list the names and affiliation of each participant)
• Agenda
• Key points discussed (e.g., identified priorities, difficulties, knowledge-gaps, findings on NBS)
• Main outcomes (e.g., agreements, decisions taken, solutions found; please add timeline if applicable)

(2) Reflection on the event

General observations
• Were the objective(s) set for the meeting achieved?
• What were the main challenges faced during the event (e.g., engaging participants, coordinating the discussion, agreeing on main points, keeping track of all contributions)?
• What general "lessons learnt" could be taken into consideration for organizing future events?
• Which follow-up actions (e.g., responsibilities/roles assigned, next meetings, events) have you identified? And by when should these actions be implemented, if applicable?
• Which issues/aspects/actions will you take into the next meeting?
• What kind of inputs/support does the URIP need from NATURVATION in the near future (e.g., in terms of research work or content provided, organizational support, input during workshop, update, communication)?

Observations on the transdisciplinary practice
The four criteria listed below, which make transdisciplinarity possible, are illustrated in the URIPs guidance document. Please share your observations from the meeting.

Inclusiveness
How heterogeneous and representative in terms of interests, stake, and perspectives on NBS were the participants in the meeting? Were any disciplines, positions, interests, and/or cultural groups over or underrepresented? Was the overall age and/or gender diversity of participants noticeable?

Equity
Besides being present at the meeting, did all participants have equal opportunities to voice their opinions, interests, needs, and objectives? Could you give some examples? In case not all participants could be "heard" (e.g., because of lack of time, or because of the "predominance" of one or more participants over others) what changes and/or points of improvement could be considered for organizing future events and ensuring all can participate equally?

Flexibility
Allowing for changes, remaining open to feedback, and facilitating learning helps engage participants in the co-creation of knowledge on NBS, and acts as a motivational factor. Was flexibility evident in the organization of the event? Can you describe how this was the case?

Consistency
Reflecting on the three criteria of inclusiveness, equity, and flexibility, and reporting on them critically and with integrity, is essential for securing consistency among and distilling "lessons learnt" from the URIP. What practical measures have made the process and/or event consistent with a view on the criteria inclusiveness, equity, and flexibility? What new measures and/or criteria would you recommend considering and implementing in the future?

Urban-Regional Innovation Partnerships' Summary Reports: A Closer Look
Our document analysis and the subsequent discussion of its outcomes with the participants in this study reveal a consistent demand for flexibility regarding the organization and content of the thematic events of the URIPs examined. Other similarities among the respective dynamics have emerged. For example, more scientists and public authorities than business representatives participated in such events. This seems to show a consistent imbalance in the representation of different parties in the URIP events examined. Another similarity that emerged from the reports is the demand for relevance and applicability of the scope and outcomes of the thematic meetings to the daily practices of stakeholders. Such demand seemed particularly strong in relation to one of the most important deliverables of the project, namely, the NBS assessment framework, whose co-design involved the URIPs in Utrecht and Malmö in particular.
From the perspective of this study, what is important to note is that the narratives that can be extracted from the materials examined suggest a strong interrelation between the criterion of flexibility and the principle of relevance: at times, both conveyed the demand to align the URIPs' thematic events with the goals of stakeholders. In fact, the connotation of flexibility in a large part of the extracted text is both organizational and thematic: in other words, it conveys a demand for "adaptivity" of thematic events to the stakeholders' goals. This two-fold meaning of flexibility that emerged from the document analysis therefore suggests considering the "synthetic" criterion of adaptivity of knowledge co-production events as a possible criterion of transdisciplinary quality in future similar exercises.

TABLE 5 | Overview of the statements extracted from the URIP Summary Reports.

Barcelona

Inclusiveness
• We could reach a fair variety of stakeholders (from public authorities to community-based organizations), including representatives of four levels of public administrations (regional, provincial, metropolitan, and municipal).
• [However] SME and community or non-governmental organizations were clearly a minority.
• Our group is overrepresented by public authority and academia. The main reason for this is the scheduled time for the meetings.
• In terms of age and gender diversity we think there is an acceptable balance
• (…) sessions in breakout groups are clearly valuable because they facilitate the involvement of all participants in the discussions and allow to focus on specific topics or case studies in accordance to stakeholders' interests or expertise
• We plan to invite other stakeholders to present their initiatives, plans, or policies in relation to NBS in future meetings since the online questionnaire results showed that many URIP members are ready and happy to do that.

Equity
• The unbalanced mix of stakeholders in the meeting had a direct impact on the prioritization or "voting" process

Flexibility
• A flexible approach is adopted in the organization (…) we try to engage with ongoing local policy processes or key topics (e.g., urban resilience in this case) related to NBS in order to raise interest and involvement among key participants.

Consistency
• (…) maintaining the engagement of some stakeholders during the whole URIP process will be challenging because of stakeholders' fatigue due to participation in other research or policy processes; critical view of NBS concept; feeling of "being used" by research projects but not getting any useful output in exchange.
• (…) to ensure that forthcoming meetings are also successful, we really need to keep fostering a transdisciplinary/co-creation process in which stakeholders feel that their interests and priorities are considered
• Reaching full consistency is very challenging …especially (for) the lack of policy mandate (stakeholders' participation is only based on their own interest and willingness)
• New criteria/measures should be clearly orientated toward mitigating stakeholders' fatigue
• (…) the presentation of ongoing initiatives related to NBS and UGI by the stakeholders themselves (in this case the Barcelona Resilience Strategy) is clearly positive because: (1) it provides an opportunity to stakeholders to actively contribute to the URIP meetings; (2) it links the URIP process with policy or social initiatives that have a clear mandate or support; and (3) it has a beneficial effect in terms of mutual learning and knowledge exchange.

Utrecht

Inclusiveness
• Some sectors may have been underrepresented
• There was no noticeable underrepresentation in terms of age
• All five Dutch partners were represented
• There was a reasonable gender and age distribution among the external partners who were represented
• To improve inclusiveness, URIP Utrecht prepared posters announcing the event together with GroenMoetJeDoen!

Equity
• All participants seemed to be able to voice their concern and opinions, perhaps aided by the informal setting
• There was limited time for discussion
• A small number of people did not actively participate in the discussion. This could have been prevented with small-group discussions
• There were plenty of opportunities to ask questions following the symposium and during the informal bicycle tour

Flexibility
• There was scope for questions
• Speakers were flexible, open to questions, provided ample explanations
• The program was changed during the event to allow time for presentations
• The format for the discussion was very open
• Different ideas for the event were discussed, leading to e.g., the decision to include a mini-symposium, to invite the alderman, and to visit examples of initiatives in disadvantaged parts of the city

Consistency
• The interactive and active mode of the meeting worked well in drawing in and engaging stakeholders
• There was little time for discussion and little representation of external partners.
• Planning the event with representatives of three different organizations, visiting disadvantaged areas during the bicycle tour, inviting citizens, not only professionals, organizing it during the weekend to make it easier for citizens to participate, keeping the talks relatively short and at "introductory" level.

Malmö

Inclusiveness
• Need to increase the representatives from construction companies and business-oriented activities
• A female bias is in the group
• The possible commercial developer was represented by one actor, the public (actors) of the two cases were not represented
• Many more women attended the meeting than men
• Mainly consultants, authorities, and scientists were present at the event
• The meeting lacked the perspective of the property developer
• Seven women and four men attended the meeting

Equity
• The moderator made sure all participants who wanted to contribute had a chance to do so
• The mini workshops provided all participants the opportunity to actively reflect and discuss from the perspective of their roles and competencies

Flexibility
• Flexible agenda, no real time slots
• All presentations allowed for discussion. This was very positive for knowledge exchange
• We experienced the meeting environment as equal and flexible: no problem for anyone to ask questions, share their thoughts/ideas
• The meeting always allowed for open comments and/or questions which is positive from a learning and exchange perspective

Consistency
• We experienced the meeting environment as inclusive, equal, and flexible

Frontiers in Climate | www.frontiersin.org

TABLE 6 | Further significant statements extracted from the additional sources.

URIP | Qualitative statements

Barcelona
• Some outputs are difficult to be communicated to URIPs members because they are perceived as "too academic" or irrelevant for practice
• Inviting a wide range of stakeholders is an important aspect of the URIPs process
• [There is a] difficulty to engage grassroots/civil organizations in the process
• Flexibility measures introduced to adapt URIPs sessions to stakeholders' interest avoid stakeholders' fatigue (clearly perceived as a risk)

Utrecht
• Participants found it difficult to understand the challenges "inclusive and equitable governance" and "social justice and cohesion"
• The participants also made some critical remarks (…). In their view, the [assessment] method has no added value for practical purposes

Malmö
• Relevance and legitimacy are very important for the stakeholders.
• To ensure its legitimacy it is essential that [the tool] is made to fit into existing processes
• The developed tool has to fit existing and upcoming assessment needs, if not, there is a risk that it will not be used

The document analysis also revealed several "local specificities." For example, the criteria of inclusiveness and equity led the coordinators to signal the predominance of women among the participants in the events held in Malmö. Less effective seems to have been the criterion of consistency, which was meant to capture the "practical measures [that] have made the process and/or event consistent with a view on the criteria of inclusiveness, equity, and flexibility" (Table 4). The criterion generated very different contents in the Summary Reports analyzed and was difficult to associate with any content other than that shared in the specifically dedicated section. Although meant as a sort of "overarching criterion" for reflecting on the factors that had facilitated the satisfaction of the other three criteria, the observations generated by the criterion of consistency in fact failed to convey original content in comparison with those generated by the other three criteria. This suggests reconsidering the suitability of the criterion of consistency as a productive "reflexive device" in the framework of the systematic reporting on the transdisciplinary quality of knowledge co-production events of the type examined here.
In sum, the criteria of inclusiveness, equity, and flexibility concurred in generating valuable reflections on the thematic events examined. They were also sufficiently descriptive and comprehensive, in the sense that they did not seem to require further criteria to generate original information on the events' dynamics.
The real novelty, though, lies in the considerations engendered by the criterion of equity. Its purpose was to stimulate the coordinators of the URIPs to observe whether gender, age, cultural, and professional diversities among the attendants of knowledge co-production events were matched by equal opportunities to participate in the respective works. In other words, the criterion of equity was meant to generate considerations on whether the diversity of contributors to each event was only "formal," or also substantive. While a consistent climate of equity seems to have characterized the events examined, several Summary Reports reported on situations in which such equity of involvement was not fully achieved. The criterion of equity was therefore effective in generating observations on whether all participants in the events were concretely involved in their development.
That is precisely what this criterion of equity was "meant to do." The rationale for its inclusion in the systematic reporting was to prevent what we may call a "bureaucratic" approach to the inclusion of multiple stakeholders in the works of the URIPs. In other words, the criterion was intended to capture whether relevant organizational choices were limited to inviting stakeholders to knowledge co-production events by virtue of their "representation" of different interest groups without, at the same time, empowering them to contribute concretely to their development.
Such intent was grounded in the experiential knowledge of the researchers of the PBL. Such knowledge suggests that, when limited to securing the "representation" of different parties and stakes in collaborative knowledge production events, the criterion of inclusiveness may well secure the invitation of a heterogeneous group of participants, but not their actual participation in the generation and mediation of contents. Some, for example, may find themselves in a minority position and not feel relevant enough to voice their viewpoints; others may dominate the discussion of contents and prevent others from participating in it actively. The criterion of equity was therefore meant to draw the attention of the coordinators and moderators of the URIPs' events not only to whether their participants would be representative of different disciplinary perspectives, professions, and social stakes, but also to the fair chance that each individual would have to contribute concretely to the objectives of the event.
From the perspective of this study, having facilitated the emergence of observations relevant to the equity of participation of individuals in the events of the URIPs is one of the positive outcomes of the monitoring and evaluation strategy described here. Among the criteria of transdisciplinary quality for which there exists ample literature of reference, the criterion of equity is the least represented: thus, besides being "productive," it is the most original criterion among those that were chosen as quality criteria for one of the most significant knowledge co-production activities of NATURVATION. This latter observation introduces our final reflections regarding how the four criteria for monitoring and evaluating the quality of transdisciplinary collaboration in the framework of the events of the URIPs, and the method through which the criteria were "put to work," were experienced by the members of the URIPs who participated in this study. This is discussed in the following section, after which we present our concluding remarks.
The Integration of Transdisciplinary Quality Criteria in the Practice of Reporting: A Self-Assessment
A thorough overview of all the remarks made in response to the questions shared among the participants in this study, in order to self-assess the criteria chosen for monitoring the transdisciplinary quality of the thematic events of the URIPs, would exceed the limited space of this article. This section is therefore limited to the clear points of agreement and shared reflections that have emerged from the analysis and discussion of the examined Summary Reports.
The first agreement regards the quality of the coordination of the URIP transdisciplinary process provided by the staff of the ICLEI. Besides the thematic events discussed here and the periodic plenary sessions of knowledge exchange, the devised iterative program included regular webinars. These provided an easily accessible platform for exchanging thoughts on the transdisciplinary process under development and sharing relevant experiences. Although some participants experienced the iterative program as rather rigid ("There was too much top-down steering on the agenda and a much more open-ended reflexive approach could have been taken in which the URIPs were invited to respond more strongly to local ambitions and processes"), generally "the thorough guidance, structure and documentation of the URIP activities, which was very well coordinated by ICLEI, helped the URIPs' staff to not get lost in such a huge project."
A second important point that emerged from the self-assessment regards the four criteria. All the participants in this study experienced them as rather useful for stimulating reflection on aspects deeper than the mere heterogeneity of the participants in the URIPs events. From this viewpoint, it can be concluded that the criteria were effective in stimulating reflections on the dynamics of participation in the events. However, most of the participants revealed limited knowledge of the background work that motivated the introduction of these criteria in the reporting system. The brainstorming sessions that led to their identification, and the several proposals advanced by the PBL team for operationalizing them in the activities of the URIPs, ranged from unfamiliar to unknown for the very members of the URIPs who co-author this study.
Retrospectively, this unfamiliarity can be explained by a combination of different factors. The first factor is the different timing of the activities of different teams in the indeed "huge project" that NATURVATION has certainly been. For example, the selection of criteria for operationalizing the guiding principles of transdisciplinary quality in the operational context of the URIPs took place in the early months of 2017; at the time, however, several URIPs had not yet started to organize and communicate about their works on a regular basis. A second factor that may explain the unfamiliarity of URIPs members with the selection of the four criteria and with the scope of their "administration" by means of the reporting system is the rigid separation of functions between the coordinators and the observers of the transdisciplinary process in NATURVATION: a separation explained in section The Challenge of Building Transdisciplinary Capacities in Large European Union Projects: Nature-Based Urban Innovation Methodological Approach, and the Perspective of this Study, and to which we will return in the conclusions of the article.
What our self-assessment certainly shows is that the lack of deeper "background knowledge" regarding the use and scope of the four criteria in the reporting system led several participants in the URIPs to approach the relevant sections of the template Summary Report as a bureaucratic task of unclear added value to their learning experience. One remark captures this uncertainty well: "The systematic reporting on URIP meetings was a good initiative to show the evaluation of the knowledge exchange process over time, although it has never been completely clear to me whether this was simply to fulfil our bureaucratic duties or whether that would serve a broader learning purpose." Moreover, "These four criteria were useful to critically reflect on aspects of the process but rather broad, and therefore could easily be interpreted in a selective way." As another participant in the study recommended, "My recommendation would be that the criteria should not only become a reporting task but actually something that gets explicitly discussed in the activities themselves." While these and similar remarks mirror a general lack of clarity regarding the rationale for selecting and using the four criteria of inclusiveness, equity, flexibility, and consistency in the reporting system, a general consensus emerged on the prospect of using some of them as guiding criteria in future collaborative projects. For example, "The URIP process made me appreciate the dimension of flexibility a bit more, because [by] taking a flexible approach we managed to engage many stakeholders and (hopefully) influence decision-making processes in the city relevant to urban nature." In the words of another participant, "The criteria we used are relevant and good for operationalizing the overarching criteria... I would definitively use these criteria for future projects. I think that it is a relevant exercise to do this." What is needed though is "(...) to make people understand what is only an academic exercise, and what is actually relevant for practice." This latter advice anticipates our concluding remarks, which close our reflection on our experience of transdisciplinary collaboration in NATURVATION.

CONCLUSIVE REMARKS: TRANSDISCIPLINARY RESEARCH AS THE ART OF "BRINGING ORDER TO CREATIVE CHAOS"
At the end of this account of our experience of transdisciplinary collaboration in the framework of a large EU research and innovation project, we share some reflections on the relevant challenges, distill lessons relevant to the two research questions that motivated this study, and formulate recommendations for the monitoring and evaluation of future transdisciplinary research. With respect to the latter, we also address the research funding agencies that encourage the adoption of transdisciplinary approaches in large international projects.
A first reflection regards the overall experience of maintaining a transdisciplinary research design in NATURVATION. What started as an apparently simple method for gathering information on given aspects of a set of knowledge co-production events held in six European cities over 4 years proved to be anything but a straightforward solution to an uncomplicated problem. Indeed, the identification of transdisciplinary quality criteria that would be applicable to the work of the URIPs, and the incorporation of these criteria in the systematic reporting system, responded to the need to generate a robust evidence base on the building of transdisciplinary capacities in a project involving hundreds of participants with diverse disciplinary backgrounds, work cultures, and objectives. What may appear to be a basic information-gathering method stems from what we experienced as the effort of "bringing order to creative chaos": an expression that captures the research context in which our monitoring and evaluation strategy had to be devised. The very task of reconstructing the rationales and contexts that led to certain methodological choices and materials required a considerable collaborative effort by the authors of this study, all of whom were heavily involved in the project but represent only a fraction of its core participants.
A second general observation is that participants in such large transdisciplinary projects tend to remain divided along the lines of academic vs. non-academic participants, that is, between research-driven and practice-driven participants. In our experience, this divide tends to be collectively experienced as the boundaries within which individuals operate as "scientists" vs. "non-scientists," "researchers" vs. "stakeholders," and so on. In our view, the resulting clustering is not only simplistic, but unfounded. It is simplistic because it rests on the false assumption that "non-scientists" do not approach complex innovation questions scientifically, while scientists do not approach them practically. It is unfounded because, in large transdisciplinary projects, individuals assuming roles as researchers, policymakers, stakeholders, and professionals may have highly hybrid backgrounds, which may include having spent parts of their career in academia, public service, the private sector, or civil society. NATURVATION offered a clear example of such hybridity, and of the consequent inadequacy of dividing project participants along the lines of science, policy, and practice.
We now address the two questions that motivated this study, namely, the adoption of four transdisciplinary quality criteria for generating information regarding the collaboration within the URIPs, and their administration by means of a systematic reporting system. Regarding the former, our self-assessment showed that the criteria of inclusiveness, equity, and flexibility were able to generate sufficiently descriptive information about the dynamics of collaboration in URIP events. At the same time, they could also be used to generate insights from coordinators regarding the organization and management of these events. Not all coordinators shared an equal understanding of, and hence commitment to, the scope of reporting, however. An important lesson in this regard is that the scope of future reporting systems should be shared, and participants engaged, from the very beginning. In future transdisciplinary projects, we are therefore likely to promote periodic exchanges on the reporting system with those involved in it; this should facilitate its "adaptivity" to their feedback, in the sense of this term that emerged from the document analysis conducted as part of our self-assessment.
Positioning the PBL researchers as observers of the transdisciplinary process rather than as active participants in its coordination constituted a clear limitation for such an "adaptive" use of the monitoring and evaluation method described here. However, to come to our final remarks, this choice mirrors the plurality of roles and objectives that characterizes any transdisciplinary research endeavor. In our view, such plurality is the principal added value of research and innovation projects like NATURVATION: by bringing together diverse individuals, these projects give rise to disciplinary conflicts and to their subsequent mediation and resolution, and thus facilitate the birth of novel cross-disciplinary collaboration. Through constant interaction and comparison with others, these projects enrich and widen the perspectives of those taking part in them. In other words, by elevating the quality of the process of research to one of the explicit research objectives, these projects enable learning.
In our experience, the quality of this latter process is proportional to the commitment to create the conditions that allow all participants to reach their common, but also individual, objectives. Indeed, in the words of one participant, "transdisciplinarity should not be confounded with many-many-many being always involved in everything." We have found that working transdisciplinarily means heeding all such objectives equally (whether scientific, professional, or social; theoretical or practical; individual or collective) and encouraging collaboration among project participants without clustering them into simplistic categories: an important lesson for project coordinators, we think.
In the same vein, transdisciplinary research should not be experienced as a fluid and somehow "spontaneous" form of knowledge co-creation for which traditional research methods and rigorous monitoring and evaluation mechanisms can be abandoned. On the contrary, "bringing order to chaos" in the framework of transdisciplinary research, and with respect to the necessary teamwork, renders the robustness, replicability, and verifiability of the ways in which knowledge is co-produced and synthesized only more relevant. Such relevance will often imply adopting traditional monitoring and evaluation approaches that require some participants to acquire data and information from others. While NATURVATION showed that there is a risk of letting these "others" feel like "guinea pigs" instead of participants on an equal footing in the process of research, loosening the methodological rigor by which information relevant to a given project's objectives is gathered and analyzed is not a desirable risk-mitigation strategy. A more effective strategy consists of recognizing the importance of accurate and continuous internal communication on research methods from an early stage in the project. This will help foster a climate of goodwill among project participants at all levels.
Our self-assessment showed that our early recognition of this importance was not followed up by entirely effective internal communication. As a result, some opportunities were missed. On the one hand, besides generating valuable observations, increased involvement of URIP members in reporting would have generated more action on the proposed criteria, for example, by creating an even more flexible agenda or empowering all stakeholders to voice their opinions during each meeting. On the other hand, a stronger engagement of URIP participants with the reasons behind the use of the criteria of inclusiveness, equity, flexibility, and consistency for reporting on their events may have strengthened those "new capacities" that should be part of the ambitions of any knowledge co-production endeavor (Norström et al., 2020). Owing to our discontinuous communication on the scope of the URIP reporting system, these ambitions were only partly met. Finally, more effective communication would have made the research goals of the PBL team, who designed the transdisciplinary monitoring and evaluation strategy, more accessible to the participants who were instrumental to their achievement.
These conclusions raise again the inherent difficulties of monitoring and assessing the process and outcomes of large international projects that have to contend with different cultures, multiple ambitions, and goals. Rather than reducing such complexity, our monitoring and evaluation strategy struck a balance between the need to document the relevant process in a robust and accessible way and the need to allow the "creative chaos" of NATURVATION to generate innovation by means of unforeseeable and unconventional forms of collaboration. The importance of constant mediation is therefore the main lesson we will take forward into future transdisciplinary research.
Our final recommendation addresses the research funding agencies that support the adoption of transdisciplinary approaches. In our view, requiring the adoption of transdisciplinary monitoring and evaluation methods already at the project proposal stage may be beneficial for national and international policies on research fund acquisition. Besides enhancing the methodological quality of future transdisciplinary projects with regard to their scientific credibility and social relevance, the requirement of adequate transdisciplinary monitoring and evaluation methods at an early stage would help build a knowledge base on the different interpretations and applications of the TD approach in the context of structural research. This would facilitate parallel methodological research aimed at raising those methods to the highest standards of robustness and replicability. We hope that the experience documented in this article will contribute to these and other desirable developments of the practice of transdisciplinary research in the sustainability sciences.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/supplementary material. Further inquiries can be directed to the corresponding author.

AUTHOR CONTRIBUTIONS
CB has designed and led the execution of the study documented in the article as well as all background studies cited in it. EK has acted as internal peer reviewer of the study, including the executed document analysis. As a member of the PBL team responsible for the respective research, she has been involved in the study from its conception up to delivery. CW has thoroughly reviewed multiple versions of the article and partaken in the self-assessment exercise discussed in it. AJ and FB have contributed to the exercise of self-assessment. They have also provided valuable comments on the latest versions of the manuscript. IB and MB from the ICLEI Secretariat have contributed to the exercise of self-assessment. MB has also reviewed the last version of the manuscript prior to its re-submission, providing valuable inputs on the research-practice interface enabled by TD teams. BW has actively partaken in the online discussions that have informed the execution of the self-assessment discussed in the study. All authors contributed to the article and approved the submitted version.

FUNDING
The NATURVATION project, which is the subject of this article, was funded by the European Union's Horizon 2020 research and innovation programme under grant agreement No. 730243.