Formal Learning in Informal Settings—Increased Physics Content Knowledge After a Science Centre Visit

Over the past 50 years, the prevalence of interactives in museums and science centres has increased dramatically, with interactive learning proliferating around the world. With a current estimated visitation of 300 million people each year, free-choice learning through museums and related venues has become a major source of human learning over the course of a lifetime. While many studies of visitor experience have examined positive changes in affective components of learning, fewer have examined whether specific scientific content knowledge is included in what is learnt. This research investigated gains in content knowledge through informal science learning. Three surveys were conducted at the Otago Museum’s science centre (Dunedin, New Zealand) with visitors eight years and older. The main component of the survey included a brief “formal” content knowledge assessment in the form of a pre-post multiple-choice test, with a focus on physics concepts illustrated in the science centre. Self-reported examples of science learned during the visit and selected items from the Modes of Learning Inventory complement the data. In the pre-post test, prior knowledge was age and gender dependent, with younger visitors and females getting significantly lower scores. Notwithstanding, visitors to the science centre had an overall average of 13% more correct answers in the test after visiting, independent of age and gender. A learning flow diagram was created to visualise learning in the presence or absence of interactivity. As expected, interactivity was found to increase learning.


INTRODUCTION

Science Learning at Science Centres
Learning is one of the most sought-after visitor-related outcomes by museums, second only to revenue (Jacobsen, 2016). This research studied learning in a science centre embedded within a museum.
Stemming from the still-discussed deficit model, where knowledge flows from experts to novices (Cortassa, 2016), learning has traditionally been defined in terms of knowledge acquisition (Illeris, 2018). However, it is not knowledge alone that determines what people will do with information, but also their personal values, beliefs and attitudes (Kahan et al., 2012; Cortassa, 2016).
Instead of the deficit model, we have used the Koru Model of Science Communication (Longnecker, 2016) in which informal education is part of a learning ecosystem where facts are transformed into coherent information that can, in turn, be transformed into knowledge when individuals engage with it. Accordingly, we consider science learning to be the structured updating of scientific literacy based on processing new information that challenges a prior state, as described by Barron et al. (2015). In turn, while scientific literacy is a contested construct (Linder et al., 2010), it can be considered to encompass multiple concepts such as attitudes, understanding of the scientific method and engagement with science-related issues (Organisation for Economic Cooperation and Development, 2016). A broad approach to the study of science learning can be found in Solis (2020).
Although not the only component, scientific knowledge is commonly placed at the core of what scientific literacy implies (National Academies of Sciences Engineering and Medicine, 2016). While personal values, beliefs and attitudes need to be considered when speaking of learning, knowledge needs to be considered as well.
This study focuses specifically on learning related to content knowledge, defined as the "knowledge of facts, concepts, ideas, and theories about the natural world that science has established" (Organisation for Economic Cooperation and Development, 2016). Though limited in scope, content and procedural knowledge are reasonable indicators of science knowledge (National Academies of Sciences Engineering and Medicine, 2016). Thus, for the purposes of this research only, science learning is operationalized as a change in visitors' scientific content knowledge before and after visiting the science centre.

Value of Formal Assessment
Free-choice learning refers to learning that is up to the individual (Jacobsen, 2016). Many studies have shown evidence of increasing scientific knowledge at science centres (e.g., National Research Council, 2009; Martin et al., 2016), with some estimates stating that informal learning makes up as much as 70-90% of a person's learning (Latchem, 2014). However, assessment in informal environments has typically relied on self-reporting (National Research Council, 2009). Employing self-reporting techniques to assess learning of content knowledge has advantages, but it assumes that an honest respondent is enough for an accurate self-report (Paulhus and Vazire, 2007), and this may not always be so. The "familiarity hypothesis" considers that an individual's familiarity with a science topic is a good reflection of their actual factual science knowledge 1 (Ladwig et al., 2012). However, a respondent's confidence is based on the ease with which potential answers come to mind, making people genuinely believe their knowledge or understanding is correct if they feel familiar with it, irrespective of whether it is actually right (Mbewe et al., 2010; Wang et al., 2016).
Using formal testing to measure knowledge in informal settings can detract from the visitor experience and some researchers consider it inappropriate (e.g. National Research Council, 2009;Fenichel and Schweingruber, 2010). Nonetheless, self-reports are biased by personal judgements; assessing content knowledge objectively can be a valuable complement to self-reports and indirect measures.
We conducted an exhaustive literature review for articles, books and reports where formal content knowledge was assessed in informal environments. In total, only six manuscripts included results of knowledge being objectively tested when related to learning experiences in an informal environment (e.g. Mbewe et al., 2010;Salmi et al., 2015;Martin et al., 2016). A discussion of these studies can be found in Solis (2020). While some of those studies conducted a test in the pre-post manner we did, they tended to focus on school students, and none of them was conducted on a wider range of visitors to science centres.
This research included formal testing of scientific knowledge with visitors to a science centre; the drawbacks of such an assessment were considered, the risk of alienation was taken seriously, and the research test was designed to be user-friendly and minimize alienation.

The Otago Museum's Science Centre
The Otago Museum is located in the city of Dunedin and is named after the Otago Region in the South Island of New Zealand. The importance of this museum to the community is reflected in it regularly having more than 350,000 annual visitors (Otago Museum, 2018), a substantial proportion of whom are local residents.
This study was conducted at the Otago Museum's science centre in two steps. Piloting happened in early 2017 at Discovery World, the museum's science centre before it underwent a major redevelopment. Surveys were conducted in 2018 at Tūhura, the redeveloped and renamed science centre. The area dedicated to science exhibits increased from 393 sq. m. in Discovery World to 654 sq. m. in Tūhura, with both including a warm and humid enclosure called the Tropical Forest (215 sq. m.). The Tropical Forest is full of greenery and butterflies fly freely throughout. The science centre is a favourite of small children, with one third of Tūhura visitors being under 7 years old (Table 3). Tūhura is also popular with adults, some of whom visit without children. For example, the Museum runs occasional "after-dark" sessions without children and these usually sell out.

Approach
This study triangulated measurement of informal science learning using three approaches: objective testing of scientific content knowledge, self-reporting of learning, and open questions which asked for specific examples of learning. A survey was piloted in 2017 and then three surveys were conducted in 2018, administered by the first author, using the same surveying methodology for all. Surveys were created and hosted in SurveyGizmo™. The study was approved by the Human Research Ethics Committee at the University of Otago (17/062). The sections of the surveys that provided data analysed in this manuscript are attached as Supplementary Material.

Formal Assessment Questionnaire
The research instrument comprised five multiple-choice questions focused on light and electromagnetism, key topics showcased in Tūhura, plus a control question that was not included in the exhibits (Table 1). Multiple-choice questionnaires can be used to assess content knowledge (Brady, 2005; Kahan et al., 2012). All the items had one answer that was right, two that were wrong, and an extra "I don't know". The questionnaire was created by the authors and was iteratively reviewed by a panel of experts in science communication.
The score of scientific content knowledge in light and electromagnetism was calculated as the sum of right answers (1 for each right answer, 0 for incorrect answers, not including the control question). "Don't know" options were counted as incorrect (Salmi et al., 2015).
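The scoring rule described above can be illustrated with a minimal sketch. The item identifiers (`q1` to `q5`, `control`), the option labels and the answer key below are hypothetical placeholders, not the actual questionnaire content; only the rule itself (one point per right answer, control question excluded, blanks and "I don't know" counted as incorrect) comes from the text.

```python
# Hypothetical identifier for the control question (not the paper's naming).
CONTROL_ITEM = "control"


def content_knowledge_score(responses, answer_key):
    """Return the number of right answers, ignoring the control question.

    responses:  dict mapping item id -> chosen option (None for a blank).
    answer_key: dict mapping item id -> correct option.
    Blanks and "I don't know" never match the key, so they score 0.
    """
    score = 0
    for item, correct in answer_key.items():
        if item == CONTROL_ITEM:
            continue  # the control question does not count towards the score
        if responses.get(item) == correct:
            score += 1
    return score


# Illustrative (made-up) answer key and one visitor's responses.
answer_key = {"q1": "a", "q2": "c", "q3": "b", "q4": "a", "q5": "b", "control": "c"}
visitor = {"q1": "a", "q2": "dont_know", "q3": "b", "q4": None, "q5": "c", "control": "c"}

print(content_knowledge_score(visitor, answer_key))  # 2 (q1 and q3 are right)
```

With this rule the score ranges from 0 to 5 for the final questionnaire, and a right answer to the control question leaves it unchanged.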
A short two-item test (plus a control question) was piloted in 2017 at Discovery World. The number of right answers increased significantly (Wilcoxon Signed Rank Test, Z = 5.816, p < 0.001, r = 0.389, 89 discordant pairs of 224) from a median of 0 before the visit to 1 (out of 2) after the visit.
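The text does not state how the effect size r for the pilot was obtained. A common convention for the Wilcoxon signed-rank test is r = Z / √N, and the sketch below assumes that convention with N taken as the 224 paired responses. Under that assumption the calculation reproduces the reported value.

```python
import math


def wilcoxon_effect_size_r(z, n_pairs):
    """Effect size r = Z / sqrt(N) for a Wilcoxon signed-rank test.

    Assumption: N is the total number of paired observations (224 here),
    not only the discordant pairs.
    """
    return z / math.sqrt(n_pairs)


r = wilcoxon_effect_size_r(5.816, 224)
print(round(r, 3))  # 0.389
```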
Given the formal nature of this instrument in an informal setting, alienation could be a concern. To minimize alienation, a number of approaches were taken: 1) Questions were selected such that the risk of conflict between the questions and visitor worldviews was minimal 2 , 2) The survey was as short as possible, 3) The person administering the survey welcomed visitors and was friendly and respectful when asking for participation, responding to all questions from parents and children, 4) Respondents were given enough space to fill out the survey without feeling observed or pressured, 5) Places to sit were provided, 6) iPads were used to survey (Section 2.2.3), 7) A token was given to respondents on completion, as a sign of appreciation (Section 2.2.4).
There were few signs of bias (e.g. children feeling everything is five stars) or visitor alienation (e.g. skipping questions in the survey), giving confidence to add more questions to a final five-item (plus control) questionnaire that was then conducted in 2018 at Tūhura. The questions from the pilot were included in the final version of the test. The control question in the pilot became an actual question in the final version of the test, as its topic was not covered in an exhibit in Discovery World, but it was in Tūhura. A new control question was added to the final version. This questionnaire (Table 1) was asked in what hereafter is called Survey A.
Table 1 notes: a "I don't know" was an optional answer for each question. Questions and their correct answers are greyed out. b Control question in Discovery World. c Control question in Tūhura. The museum has a planetarium and one of the five shows had a short mention of auroras. However, only one in five of the visitors reported going to the planetarium. Since the particular show was not popular, it is expected that very few visitors had access to that information.
Although all items were related to light and electromagnetism, they cover multiple subtopics and it cannot be expected that someone who learns about one knows about the others. In other words, the multiple-choice test is not a scale and scientific knowledge is not necessarily mathematically unidimensional, nor a concrete construct. The Kuder-Richardson Formula 20 (KR-20) coefficient (equivalent to Cronbach's alpha for dichotomous values, such as right/wrong) was 0.506 before and 0.542 after the visit.
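For readers unfamiliar with KR-20, the coefficient can be sketched as KR-20 = k/(k − 1) × (1 − Σ p_i q_i / σ²), where k is the number of items, p_i is the proportion of right answers to item i, q_i = 1 − p_i, and σ² is the variance of the total scores. The sketch below is an illustration on made-up data, not the study's analysis. It assumes the population variance of the total scores; some software uses the sample variance, which changes the value slightly.

```python
from statistics import pvariance


def kr20(item_matrix):
    """Kuder-Richardson Formula 20 for dichotomous (0/1) items.

    item_matrix: list of respondents, each a list of 0/1 item scores.
    Assumption: population variance is used for the total scores.
    """
    n_items = len(item_matrix[0])
    n_resp = len(item_matrix)
    totals = [sum(row) for row in item_matrix]
    total_var = pvariance(totals)
    # Sum of p_i * q_i across items.
    pq_sum = 0.0
    for i in range(n_items):
        p = sum(row[i] for row in item_matrix) / n_resp
        pq_sum += p * (1 - p)
    return (n_items / (n_items - 1)) * (1 - pq_sum / total_var)


# Made-up responses: 4 respondents x 3 items (1 = right, 0 = wrong).
toy = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(kr20(toy))  # 0.75
```

In the toy data, Σ p_i q_i = 0.625 and the total-score variance is 1.25, so KR-20 = 1.5 × (1 − 0.5) = 0.75.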

Modes of Learning Inventory (Selected Items)
The scientific content knowledge test is able to quantify learning, but does not capture non-content learning, or content learning outside the specific questions asked. To provide a measure of whether visitors themselves believe they have learned and how they learned, Environmetrics Pty Ltd. created the Modes of Learning Inventory (MOLI), a 10-item, five-point, Likert-type scale developed by Griffin et al. (2005). MOLI was designed to be conducted only once, after a visit. For the present research, reversed items and those considered complicated for children were dropped. The remaining six items ( Table 2) were included in the after-the-visit Survey B. As expected, the subset was still unidimensional 3

Direct Self-Report
Since cognitive changes are highly individual and difficult to assess in a standardized way, outcomes need to be assessed in a variety of ways (National Research Council, 2009). Individuals are capable of understanding and self-reporting their own learning 4 (National Research Council, 2009;Falk and Needham, 2013;Colliver and Fleer, 2016). Directly asking a visitor if they learnt something new is one way to assess changed knowledge and understanding (Longnecker et al., 2014). In Survey C, visitors were asked "Do you consider you learnt something at Tūhura's exhibits that you did not know before? (including any previous visits)" (Yes/No/I haven't interacted with Tūhura's exhibits). In total, 276 said Yes and 78 said No 5 . Those who said Yes were asked "Can you give an example of something you learnt?". Examples were given by 196 respondents. In addition, "It was cool learning about . . . " was an open question included in Survey A, answered by 394 participants. Qualitative responses from both surveys are provided as examples of learning.

Variables Involved in Learning
To combat the view of some young people that science is boring (Linder et al., 2010), the first generation of interactive museums started in 1969 with the Exploratorium in San Francisco and the Ontario Science Centre in Canada (Patiño, 2013). Since then, interactivity has been expected to be a key variable in learning science at a science centre, as interactive elements are more attractive to visitors (McKenna-Cress and Kamien, 2013), promote learning (Fenichel and Schweingruber, 2010), and make the experience more memorable (Maxwell and Evans, 2002).
It is important to define what is meant here by interactivity. Hands-on interactives are those where the user interacts with their hands, but interactivity is a much broader concept, as broad as the ways a visitor can influence an exhibit's functioning. For example, Tūhura showcased an infrared camera. To interact with it, visitors do not need to touch anything. The simple act of standing in front of the camera makes the exhibit change what is displayed on the screen (the temperatures of the visitor's body). However, interactivity does not occur until the user completes the cycle of interaction; in this example, the cycle is complete when the visitor pays attention to what the screen is displaying.
Learning is a complex process that is influenced by a multitude of factors, such as age and gender (Wehmeyer et al., 2011). However, conclusions about the relationship between these variables and learning vary. For example, Ramey-Gassert (1997) concluded that both children and adults learn science at science centres, but Allen (1997) found a very different result. Allen interviewed visitors who interacted with a "coloured shadows" exhibit 6 to see if they provided more correct answers to questions about the nature of those shadows (asked during the interview and later assessed). The success rate in getting the correct answers after an intervention was null for visitors under 12 years old, very small for those between 13 and 15 years old, and only considerable for those 16 and above (Allen, 1997).
Since learning occurs more readily if there is some prior knowledge and the topic resonates with the visitor (Krajcik and Sutherland, 2010; Falk and Dierking, 2016; Mattar, 2018), prior knowledge (operationalized in this research as the score in the pre-knowledge test) was another variable to study. Comparison of results of pre and post answers to survey questions with answers to a control question which asked about information that was not included in the science centre exhibits provides greater confidence that differences observed after the visit were indeed indications of learning. Even if science learning is one of a venue's primary objectives, it is not necessarily on a visitor's free-time radar (Burns and Medvecky, 2016). The three surveys also asked why visitors came to Tūhura (pre-visit) and what they actually did during their visit (post-visit).
3 A single factor explains 50% of the variance (Bartlett's test of sphericity χ²(198, 15) = 319, p < 0.001, KMO = 0.816); the internal consistency was acceptable (α = 0.784).
4 Notice that the need for proving there is knowledge gain objectively does not discredit the supposition of self-reporting validity. To the contrary, a positive gain objectively measured can strengthen the self-reporting assumption.
5 Also, 17 people skipped the question and 15 people responded that they did not interact with the exhibits. These respondents are not included in the calculation of percentage of visitors learning after interacting with the exhibits.
6 In this exhibit, lights of different colors shine on the same spot. Objects blocking these lights produce colored shadows. A very similar exhibit is on display at Tūhura.
Frontiers in Education | www.frontiersin.org August 2021 | Volume 6 | Article 698691
The option "Interact with the exhibits" appeared in the pre- and post-test surveys to measure how many originally disengaged visitors became engaged with the exhibits. Complementing, but not paired between surveys, "Learn some science" was a pre-visit option and "Read some panels" was a post-visit option. These questions were added to potentially explain other results as complementary factors to interactivity.

Target Population
Visitors of all ages come to Tūhura, but given that survey questions require a certain maturity to be answered correctly, it was decided to limit participants to those over a minimum respondent age. According to the National Research Council (2009), children older than seven years are able to respond to questionnaires, but the age limit for this study was increased to eight years old, as seven to eight is the age when children enter the "concrete operational stage" in Piaget's theory of cognitive development (Piaget, 1968). We acknowledge that emotional reactivity and regulation are age-related (Silvers et al., 2012), but as mentioned in Section 2.1.2, questions were designed to minimize alienation. The format was explicitly designed and tested for being easy for younger respondents while not being patronizing for adults. This allows use of age as a variable in statistical comparison of changes in content knowledge. More detail about development of instruments directed at children as well as adults can be found in Solis (2020).

Pre-test/post-test Design
The MOLI instrument and the open questions do not require a comparison between two points of time and were only asked in the corresponding post-survey. The knowledge questionnaire matched participants' pre-test and post-test responses, allowing true comparison (Friedman, 2008;Hernández et al., 2014) to assess changes in scientific literacy of visitors.

Use of iPads and Visit Time
A strategy used to avoid alienation involved the use of iPads to administer the surveys. The pilot assessed the formal test and compared the use of iPads versus paper. Visitors commented that the use of iPads was "cool", and paper surveys were kept only for emergencies (e.g., if the internet was down) or for visitors who requested one. Neither scenario occurred; all data analyzed in this study were collected on iPads.
Although sometimes younger visitors needed to instruct older relatives on iPad use, the appeal to use iPads in this informal setting was independent of age, gender and group composition. Compared to paper, electronic surveying produces equivalent results in terms of missing data, item means, and internal consistencies (Giduthuri et al., 2014;Ravert et al., 2015), response rates (Ravert et al., 2015;Shah et al., 2016), and time spent completing the survey (Shah et al., 2016). Moreover, using iPads instead of pencil and paper has advantages such as saving time in responding to closed questions (Giduthuri et al., 2014), presenting a more attractive and uncluttered questionnaire (Fowler, 2013), and allowing randomized presentation of items, which increases reliability of the instrument (Fowler, 2013). Lastly, using the iPads allowed visit time to be recorded. However, this information was of limited use, as the time spent at the relevant exhibits could not be separated from time spent in the Tropical Forest.

Non-monetary Incentive
As an incentive for answering a formal questionnaire, respondents were given a small token of appreciation: a small glow-in-the-dark item or a magnetic butterfly. The token was attached to a piece of paper with a scientific fact and was given after completing the post-survey.

Sampling and Demographics
All Tūhura visitors were asked to participate in the surveys provided they were at least 8 years old (with consent of the carer), there were at least two iPads available, and there were enough caretakers in a group to look after the youngest children while other members of the group filled out the survey.
Survey A was conducted from May to August 2018, Survey B in September and October 2018, and Survey C from July to September 2018. Piloting at Discovery World happened in June and July 2017. Table 3 shows respondent demographics. For ease of interpretation, age was divided into groups: Children (8-12 years old), Adolescents (13-18), Young Adults (19-40) and Mature Adults (41+). Visitors came mainly in family groups (75%), their ethnicity was mainly European (87%) and most (78%) agreed to participate. Response rate was calculated by dividing the number of groups that accepted by the number of groups that were asked.
To be able to compare respondent demographics to those of the general visitor population, visitors (respondents and non-
Table 3 notes: a Gender and age sample sizes may be smaller than the total sample size due to missing values. Gender was not assessed in the visual count for visitors less than two years old. b "Other" gender responses were counted, but are not displayed due to very small numbers (≤1%). c Survey C demographics do not include visitors who skipped the question about whether they had learned something at the exhibits, nor those who did not interact with the exhibits.
The sampling method affected the group distribution because of exclusion of visitors under seven years old.

Data Pre-processing
Ideally, data should be correct, unambiguous and complete (Kimball and Caserta, 2004), but real world data are often inaccurate and need to be cleaned (pre-processed). For example, data quality can be improved by removing survey responses that exceed an acceptable number of missing attributes (Kimball and Caserta, 2004). A method to detect these invalid responses was devised (for a full description, see Solis, 2020) and data reported in this paper were cleaned. To comply with ethics recommendations by the institutions involved, no questions were forced and respondents were allowed to skip any they wished. As a result, sample sizes vary for different questions. Of the 198 valid MOLI responses, 23 included up to two missing values (pre and post counted separately), either blanks or I Don't Know 9 . After determining data were missing at random (MAR), missing values were imputed with Expectation Maximization in SPSS v25. Cronbach's alpha before and after imputation changed minimally from 0.788 to 0.784. The multiple-choice questionnaire does not form a scale and therefore it is not imputable. Blanks and I Don't Know responses were counted as incorrect.
Quotes are shown verbatim, with clarifications signaled in brackets. Respondent gender and age in years are reported in brackets after each quote.

Scientific Content Learning
Scientific content knowledge about light and electromagnetism increased significantly (N = 456, t(455) = 11.9, p < 0.001) from a mean score of 1.96 correct answers (out of five) before a visit to the Tūhura science centre, to 2.61 after a single visit. Length of visits varied 10 from 8 min to 3 h:31 min with an average stay of 1 h:52 min. The control question added confidence to this result as there was no change in proportion of right or wrong answers after the visit.
The effect size (d = 0.560, d CI = 0.068) 11 falls in what Hattie (2009) catalogues as the "zone of desired effects learning", i.e., learning surpassed what is expected from formal schooling. Although formal education may produce deeper learning than a one-off visit to a science centre, Hattie's interpretation of Cohen's d reinforces that informal education can be a powerful ally to formal education.
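The text does not spell out how Cohen's d was computed for the paired pre-post comparison. One common choice for paired designs is d = t / √n, and the sketch below assumes that formula. With the rounded t values reported in this paper it gives about 0.557 for the full sample (close to the reported 0.560; the small gap is consistent with t having been rounded to 11.9) and matches the values reported for males and females in the Gender section to three decimals.

```python
import math


def paired_cohens_d_from_t(t, n):
    """Cohen's d for a paired design, assuming d = t / sqrt(n).

    Assumption: this is one common convention; the paper does not state
    which formula was used.
    """
    return t / math.sqrt(n)


# (label, t statistic, number of paired respondents) as reported in the text.
reported = [("all", 11.9, 456), ("males", 7.13, 185), ("females", 9.51, 267)]

for label, t, n in reported:
    print(label, round(paired_cohens_d_from_t(t, n), 3))
```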

Self-reported Learning
Based on the selected items from the Modes of Learning Inventory (MOLI), 86% (n = 170) of visitors reported their visit resulted in high or very high learning 12 . While only 36% (n = 128, N = 356) of Tūhura visitors specifically said that they came to the science centre to learn some science in the pre-visit survey, 78% (n = 276, N = 354) reported in the post-visit survey that they had learned something they didn't know before. Those who responded yes were asked to give an example.
"Plasma the fourth form of matter was something I knew but almost forgot previously" (F, 33). "Recalling torque and inertia was leanring (a learning) event-need to go back to my physics texts of 40 years ago!" (M, 58). Remembering something we have forgotten or strengthening existing knowledge can be considered learning (Falk and Dierking, 2016). These quotes are evidence that formal and informal education can work together to help people learn and consolidate their learning.
The following two responses exemplify that learning is an individual process: "That you can balance an object on the tourqe (torque) board if you get the object to have a matched tourqe (torque)" (F, 19). "That if you spin the ball in the opposite direction that the disc is spinning, it stays on there longer" (F, 52). These two visitors both caught what the Torque Table exhibit 13 was trying to convey. The response of the former appears more conceptual, and she is using the terminology displayed at the panel. The second visitor's explanation is practical and direct, and her learning may have occurred primarily by experimentation rather than reading the panel.
Any doubt of whether children can learn science by visiting a science centre should consider the following self-reported example of learning: "1. I have learned how to make still objects move at the animation station 2. Through an experiment I have learned how humans conduct electricity 3. I learned that white has many different colours" (F, 9).
The effect of the science centre does not stop with learning science content; visitors can develop a sense of inquiry, as can be appreciated from the following quote: "How you could create
7 In Survey B, the MOLI questions were not included at first and 44 of the visitors who left valid responses filled out the survey without the instrument. Only seven of those who had the complete version and left a valid response did not have enough answers in the instrument.
8 In Survey C, valid responses with not enough answers comprise those who skipped the direct question (n = 17) and those who did not interact with the exhibits (n = 15).
9 Missing values not only come from blanks, but also from I Don't Know responses (Kimball and Caserta, 2004).
10 These calculations come from all available data of visit time (N = 1,090), all coming from the three surveys, but with no restrictions of other types of data availability (for instance, visit time of those who skipped any of the questions or instruments here discussed are still counted).
11 d CI is the confidence interval of the reported Cohen's d.
12 MOLI scores range from 6 to 30. Results were recoded as Very Low (6-10 points), Low (11-15), Medium (16-20), High (21-25) and Very High (26-30). Descriptives were rescaled to values from 1 to 5.
13 The Torque Table is a turning disc where you can roll objects over the disc to discover how they react to circular motion.
0.023, p = 0.622, N = 451).
To further consider how age relates to learning content knowledge, a LOESS fit 15 was done on a scientific content knowledge scatter plot before and after the visit to Tūhura (Figure 1) against the independent variable of age. While a LOESS fit does not produce correlation coefficients, it allows us to see two clear sections with roughly linear relationships between scientific content knowledge and age, but with different slopes. The domain of one of the relationships includes Children and Adolescents, while the domain of the other one includes Young Adults and Mature Adults. The independence of age and learning can be visually appreciated in Figure 1 as shapes from before and after are similar, regardless of the age group, with both shifting upwards after the visit.
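To make the idea of a LOESS fit concrete, the following is a simplified sketch of locally weighted linear regression using only the standard library. It is not the software used for Figure 1, which the paper does not name. The tricube weight function, the nearest-neighbour bandwidth and the absence of robustness iterations are all assumptions, and `frac` plays the role of the smoothing parameter (0.70 in Figure 1).

```python
import math


def tricube(u):
    """Tricube kernel: weight decreases smoothly to 0 at |u| = 1."""
    u = abs(u)
    return (1 - u ** 3) ** 3 if u < 1 else 0.0


def loess(xs, ys, frac=0.7):
    """Return fitted values from local weighted linear regressions.

    For each x0, the bandwidth is the distance to its k-th nearest point
    (k = ceil(frac * n)); a weighted straight line is fitted and evaluated
    at x0. Simplified sketch: no robustness iterations.
    """
    n = len(xs)
    k = max(2, math.ceil(frac * n))
    fitted = []
    for x0 in xs:
        dists = sorted(abs(x - x0) for x in xs)
        h = dists[k - 1] or 1.0  # avoid a zero bandwidth when x values repeat
        w = [tricube((x - x0) / h) for x in xs]
        sw = sum(w)  # always > 0: the point at x0 itself has weight 1
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


# Sanity check on made-up, perfectly linear data: the fit returns the line.
ages = list(range(8, 30))
scores = [2 * a + 1 for a in ages]
smooth = loess(ages, scores, frac=0.7)
print(all(abs(f - s) < 1e-9 for f, s in zip(smooth, scores)))  # True
```

Because each fitted value comes from its own local regression, the curve can follow different slopes in different age ranges, which is what allows the two roughly linear sections to be seen.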
In contrast, Allen (1997) found considerable science learning from a science exhibit only in visitors 16 years and older. However, that result may be due to the nature of the exhibit that was studied. In "coloured shadows", how shadows get their colour is counterintuitive and requires a good deal of abstraction, something that does not start to develop until adolescence (Piaget, 1968). Also, prior knowledge is important for learning abstract concepts (Krajcik and Sutherland, 2010). Figure 1 shows how Tūhura visitors' prior scientific content knowledge in the topic of this study depended on their age in the range from eight to 22 years old 16 (r(237) = 0.440, p < 0.001). From the age of 23 there was no further age-related increase in prior scientific knowledge (r(214) = 0.005, p = 0.938). This finding agrees with Lindon (1996), in that knowledge is accumulated with age, especially in young people. The ages where knowledge increased rapidly are consistent with the typical age of formal schooling. "From eight to 18 years there is great potential for children and young people to extend their knowledge tremendously" (Lindon, 1996). Notwithstanding, the parallel upwards shift of curves from pre to post in Figure 1 also demonstrates that the influence of informal learning can be important, even when compared to that of traditional schooling, as has been suggested by Falk and Needham (2013). The increase in scores from pre to post-test at all ages demonstrates that adults continue to learn when provided opportunities outside of school.

Prior Knowledge
The flatter section (from 23 years old) does not mean adults learn less, but that their priorities may tilt their learning to other subjects (Flynn, 2012), not assessed with this instrument (which only measured the topic of light and electromagnetism). Instead of being generalists, adults tend to develop expertise in specific domains (Fenichel and Schweingruber, 2010).

Gender
The prior scientific knowledge of males (M = 2.23, SD = 1.40, CI = 0.20) was significantly higher (t(345) = 3.69, p < 0.001, n_m = 185, n_f = 267, d = 0.359, d CI = 0.096) than that of females (M = 1.77, SD = 1.15, CI = 0.14). Females scoring lower than males in prior scientific knowledge about physics (Figure 2) is consistent with other reports showing a gender gap in scientific literacy unfavourable to females (e.g. Allen, 1997; Skaalvik and Skaalvik, 2004; Kurtz-Costes et al., 2008). A multitude of reasons have been proposed to explain this gap, including low self-esteem in science (Bamberger, 2014), stereotype-related issues (Bian et al., 2017) and lack of opportunities (Aikman and Unterhalter, 2007). We agree with the reasons above and discuss another factor.
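The gender comparison above can be approximately reproduced from the reported summary statistics. The paper does not say which test variant or effect-size formula was used, so the sketch below rests on three assumptions: an unequal-variance (Welch) t statistic, Cohen's d standardized by the root of the average of the two variances, and CI half-widths computed as 1.96 × SD / √n. Under these assumptions the rounded results agree with the reported t, d and CI values.

```python
import math


def ci_half_width(sd, n, z=1.96):
    """Half-width of a normal-approximation 95% CI for a mean (assumed form)."""
    return z * sd / math.sqrt(n)


def welch_t(m1, sd1, n1, m2, sd2, n2):
    """t statistic for two independent groups without assuming equal variances."""
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (m1 - m2) / se


def cohens_d_avg_var(m1, sd1, m2, sd2):
    """Cohen's d using the root of the mean of the two variances (assumed form)."""
    return (m1 - m2) / math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)


# Summary statistics as reported: males first, then females.
m_m, sd_m, n_m = 2.23, 1.40, 185
m_f, sd_f, n_f = 1.77, 1.15, 267

print(round(welch_t(m_m, sd_m, n_m, m_f, sd_f, n_f), 2))   # 3.69
print(round(cohens_d_avg_var(m_m, sd_m, m_f, sd_f), 3))    # 0.359
print(round(ci_half_width(sd_m, n_m), 2), round(ci_half_width(sd_f, n_f), 2))
```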
FIGURE 1 | Scatter plot with LOESS regressions (smoothing parameter α = 0.70) for scientific content knowledge as a function of age, before and after the visit (N = 451) at Tūhura.
14 Scientific inquiry is a desired outcome, but it can lead to misinterpretations if not correctly guided. This topic will be covered elsewhere.
15 A LOESS fit (Locally Estimated Scatterplot Smoothing, a.k.a. LOWESS, Locally Weighted Scatterplot Smoothing) is similar in nature to a linear regression, but instead of producing a single linear regression from all data points, it creates multiple weighted local linear regressions around each point by using a subset of n neighbouring points. Although the LOESS fit is merely descriptive and does not produce a correlation coefficient as a linear regression would, it is useful for detecting relationships by zones, as will become clearer below.
16 The age dependent group was extended beyond Adolescents because the plot and Pearson correlations showed the dependence was still high until 22 years old.
Frontiers in Education | www.frontiersin.org August 2021 | Volume 6 | Article 698691
Table 4 shows no prior knowledge gap in Children; the gender gap starts from adolescence onwards. This difference does not necessarily come from some sort of discouragement. On the one hand, engagement is a cornerstone that supports effective science learning and interest in learning more (Krapp and Prenzel, 2011). On the other hand, career choices are influenced not only by confidence and interest in science, but also by relative academic strengths (Stoet and Geary, 2018), and the 2015 PISA results showed that boys had a significantly larger rescaled intra-strength in Science, while girls' intra-strength was in Reading (Stoet and Geary, 2018; see footnote 17). STEM careers can be divided into two broad categories, physical STEM careers and life sciences STEM careers (Mohtar et al., 2019). It is well documented that girls tend to have less interest in physical sciences than boys (Krapp and Prenzel, 2011). More specifically, females tend to be more attracted to biology and males to physics (Akarsu and Kariper, 2013). An important factor in women's underrepresentation in physics may be their own choices, which start at a young age (Williams and Ceci, 2012) and are based on having more areas where they feel they can succeed (Mostafa, 2019). A deeper discussion of the gender gap and the relation between content knowledge and self-concept will be presented elsewhere.
Regardless of the gap, it is interesting to note that both genders increased their content knowledge significantly, males going up from M = 2.23 to M = 2.84 (t(184) = 7.13, p < 0.001, d = 0.524, n = 185, d CI = 0.106) and females from M = 1.77 to M = 2.45 (t(266) = 9.51, p < 0.001, d = 0.582, n = 267, d CI = 0.088). If we take the pre-post difference in right answers (ΔM) as a measure of content knowledge learning, females (n_females = 259, ΔM = 0.68, CI = 0.14) are not significantly different (t(423) = 0.180, p = 0.857) from males (n_males = 175, ΔM = 0.66, CI = 0.17). This agrees with Piraksa et al. (2014), who found that gender did not influence scientific reasoning in students in Thailand.
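The paired effect sizes quoted for males and females are consistent with one common convention for pre-post designs, d = t / sqrt(n). The source does not say which formula was used, so the following short check is an assumption about the convention rather than a description of the authors' calculation. The function name is illustrative.

```python
import math


def paired_d_from_t(t, n):
    """Effect size for a paired design, computed as d = t / sqrt(n)."""
    return t / math.sqrt(n)


# Reported paired t statistics and sample sizes for the pre-post comparison.
d_males = paired_d_from_t(7.13, 185)
d_females = paired_d_from_t(9.51, 267)
print(f"{d_males:.3f} {d_females:.3f}")  # 0.524 0.582
```

Under this convention, d expresses the mean pre-post gain in units of the standard deviation of the individual gains, which is why it can be recovered from t and n alone.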
Self-reports are also interesting in this regard. The MOLI responses for males (n = 78, Mdn = 4.08, IQR = 0.83, CI = 0.08) and females (n = 117, Mdn = 4.00, IQR = 0.83, CI = 0.17) were not statistically different (Mann-Whitney U = 4,342, p = 0.564, r = 0.041), but the percentage of females reporting new learning when asked "Do you consider you learnt something at Tūhura's Exhibits that you did not know before?" (82%, n = 213) was significantly higher (χ²(1) = 6.37, p = 0.012) than that of males (72%, n = 138). Due to the small sample size of the subgroups, medians were used instead of means. Table 4 complements Figure 2 by showing the results of testing for statistical differences in these subgroups. While adult female visitors increased their test scores more than adult male visitors, no statistical difference was found in children.

Interactivity
Tūhura visitors who interacted with exhibits changed their answers significantly between the pre and post-test surveys (McNemar-Bowker test, χ²(3, n' = 1973) = 166, p_asym < 0.001, DPRS = 14.0). The non-interacting group did not (χ²(3, n' = 127) = 3.628, p_asym = 0.305, DRPS = 0.007). Figure 3 shows this graphically (see footnote 18). The proportion of answers that changed (see footnote 19) was the same in both groups (33%). However, those who interacted with the exhibits show a large net flow towards the right answer, while the changes among those who did not interact are more random.
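The McNemar-Bowker test used above checks whether a square pre-by-post table of answers is symmetric, that is, whether as many answers move from one category to another as move back. A minimal sketch follows, assuming a 3 x 3 table with the categories Right, Wrong and I Don't Know. The counts in `flows` are hypothetical and chosen only for illustration, and the function names are not from the source. The closed-form tail probability is valid only for three degrees of freedom.

```python
import math


def bowker_test(table):
    """McNemar-Bowker symmetry statistic for a square pre (rows) x post (columns) table."""
    k = len(table)
    stat, df = 0.0, 0
    for i in range(k):
        for j in range(i + 1, k):
            pair_total = table[i][j] + table[j][i]
            if pair_total > 0:  # empty off-diagonal pairs carry no information
                stat += (table[i][j] - table[j][i]) ** 2 / pair_total
                df += 1
    return stat, df


def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with exactly 3 degrees of freedom."""
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


# Hypothetical counts (not study data); order of categories: Right, Wrong, IDK.
flows = [
    [300, 40, 10],   # pre Right -> post Right / Wrong / IDK
    [120, 200, 20],  # pre Wrong -> ...
    [90, 30, 60],    # pre IDK   -> ...
]
stat, df = bowker_test(flows)
p_value = chi2_sf_df3(stat)  # appropriate here because df == 3
```

As a check against the text, `chi2_sf_df3(3.628)` evaluates to about 0.30, in line with the asymptotic p-value reported for the non-interacting group.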
It is important to acknowledge that interactivity is not a factor that works alone. Engagement with the exhibits translates into more time playing with them, and more time at the exhibits means more opportunities for learning (Serrell, 1997). As expected, visitors who interacted with the exhibits (n = 692, t = 67 m 09 s, SD = 25 m 02 s, CI = 1 m 52 s) stayed significantly longer at Tūhura (t(742) = 3.542, p < 0.001, d = 0.516, F = 0.144) than those who did not interact with the exhibits (n = 52, t = 54 m 26 s, SD = 24 m 14 s, CI = 6 m 34 s). Unfortunately, time spent exclusively at the exhibits could not be isolated from the total, which could include time spent in the Tropical Forest.
Another indirect factor that could account for the increased learning by those interacting is the possibility that those interacting also read the panels. But the difference in means of right answers from pre to post in panel readers (ΔM = 0.60) and non-readers (ΔM = 0.60) was not significant (t(425) = 0.544, p = 0.587, n_NR = 115, n_PR = 312, d = 0.061, d CI = 0.109), meaning that those who did not read the panels were as likely to provide correct answers as those who did. This is predictable to some extent, given the interactive nature of the exhibits, which were designed to be self-explanatory. Another possible factor is that visitors who came with the intention of learning science worked hard towards their aim, and that their increase in science knowledge was so high that it influenced the results of the entire interacting group. However, the amount learned by those who said they came to learn some science (n = 295, ΔM = 0.64, CI = 1.14) was not statistically different (t(425) = 0.183, p = 0.855, d = 0.03, F = 0.322) from that of those who stated no intention to learn science in the pre-visit survey (n = 132, ΔM = 0.67, CI = 0.20).
FIGURE 2 | Medians of correct answers before (pre) and after (post) visiting Tūhura for male (M) and female (F) visitors: male children (n = 56), female children (n = 55), male adolescents (n = 21), female adolescents (n = 55), male young adults (n = 57), female young adults (n = 96), male mature adults (n = 50) and female mature adults (n = 57). Children comprised visitors from 8 to 12 years old, Adolescents from 13 to 18, Young Adults from 19 to 40 and Mature Adults from 41.

LIMITATIONS AND FUTURE WORK
It is acknowledged that pre-testing may have "cued" (presensitized) visitors (Friedman, 2008), affecting the outcome. However, matching pre and post responses is a widely used experimental design that allows for changes to be detected in the same population (Friedman, 2008; Hernández et al., 2014). Feedback, worked examples, scaffolding, and elicited explanations play a big role in learning (Honomichl and Chen, 2012). Therefore, an extraneous variable that might have influenced the children's results is the involvement of parents, as they and others in mentoring roles play a critical role in supporting science learning (Fenichel and Schweingruber, 2010). The role of parents or carers was not determined in this study.
Very little research has been done on formal assessment of content knowledge in informal settings. More research is needed to confirm the results found in this study, especially considering science learning is a much broader concept whose study requires considering other areas.
It would be interesting to investigate whether visit time at specific exhibits is correlated with learning, as has been suggested by Serrell (1997). Unfortunately, in this study the recorded visit time could not be split into time at the exhibits and time at the Tropical Forest. For that reason, how experiencing the Tropical Forest influenced learning could not be isolated.
Why there is a gender difference in prior knowledge among older visitors but not among younger visitors also warrants further study.

CONCLUSION
TABLE 4 | Statistical significance of differences of correct answers (medians) in scientific content knowledge before (B) and after (A) the visit by gender and age group in Tūhura.
FIGURE 3 | Learning flow diagrams for Tūhura visitors who interacted with the exhibits (A; n = 409, n' = 1913) and those who did not interact with the exhibits (B, right; n = 26, n' = 127). n stands for the number of respondents, n' for the total number of responses. Answers to the scientific content knowledge test were recoded as Right, Wrong and I Don't Know (IDK). All of the items (except the control question) were pooled together (see footnote 20). Responses were split into groups of visitors who interacted with the exhibits and visitors who did not.
This research focused on the fundamental question of whether a single visit to a science centre results in science learning. As discussed earlier, in addition to content knowledge, learning comprises a rainbow of constructs, such as attitudes and engagement (Organisation for Economic Cooperation and Development, 2016). While all types of learning are valuable and contribute to an individual's cognitive, emotional, and social growth (Eaton, 2010), this study examined scientific knowledge. This construct is a core concept of scientific literacy (National Academies of Sciences Engineering and Medicine, 2016) that can be reliably assessed with multiple-choice questionnaires (Brady, 2005). However, objective testing methods are commonly considered inappropriate in informal venues (e.g. National Research Council, 2009; Fenichel and Schweingruber, 2010), where assessment relies mainly on self-reporting (National Research Council, 2009). The issue is testing in informal environments without alienating visitors. Our recommendations for researchers who wish to use a formal test in an informal setting are listed below. The first three recommendations are especially important when surveying young children.
1) Provide visitors with a friendly environment for testing.
2) Word questions such that they are clear, non-threatening, short and unambiguous.
3) Keep the survey as short as possible, with the formal test in the middle.
4) Pilot the survey and pay attention to any discomfort of visitors; discard the method if signs of discomfort are detected.
5) Modify the questionnaire if needed.
6) Consider whether to match pre-post responses (having the same set of questions before and after with the same respondents): matching allows for direct pre-post comparison, but may also "cue" visitors. Depending on available time, number of respondents and needs, consider alternatives, such as splitting samples.
Using the guidelines above, we managed to reliably assess content knowledge while minimizing the bias of self-reporting. Unsurprisingly, prior scientific content knowledge, as measured by this study's instrument, increases with age during childhood and adolescence (during the years of formal schooling). It then reaches a plateau in adulthood. An important finding in this study was that learning content knowledge at the science centre was independent of age. When exhibits are engaging for people of different ages, nobody is too young or too old to learn from a visit to the science centre.
Gender did not play a role in prior content knowledge of young children, but adult females in this study showed significantly lower scientific content knowledge for these physics-related questions than adult males. Expanding on the multiple reasons that can cause a gender gap goes beyond the goals of this study, but one of the reasons may arise from personal choices related to females having less interest in physical sciences than males (Osborne and Dillon, 2008; Krapp and Prenzel, 2011). A deeper discussion will be presented elsewhere.
Interactivity is another factor that heavily influences learning in science centres. A learning flow diagram helped visualize how answers move among the right answer, the wrong answers and the I Don't Know option after the visit. Visitors who interacted with the exhibits were more likely to provide correct answers after the visit, while answers of non-interacting visitors moved randomly among the options.
Although analyzing the full spectrum of what learning science entails was not part of this study's aim, the content knowledge test was complemented by qualitative and quantitative data collected through three surveys using the same data collection methodology by the same researcher in the same year (2018).
These data helped triangulate the results, providing evidence of learning. While only one third of visitors reported coming to the science centre to learn some science, most of them reported learning as a result of their visit, as measured by the MOLI instrument (86%), the direct question (78%) and the scientific content knowledge questions. In the latter, mean scores of correct answers increased from 1.96 to 2.61.
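As a consistency check, the overall mean scores quoted above can be recovered as sample-size-weighted averages of the male and female means reported in the Gender section. This assumes that the overall means are based on the same 452 respondents (185 males, 267 females) as the gender comparison, which the text does not state explicitly. The function name is illustrative.

```python
def pooled_mean(groups):
    """Weighted mean of group means; groups is a list of (n, mean) pairs."""
    total_n = sum(n for n, _ in groups)
    return sum(n * m for n, m in groups) / total_n


# Reported group sizes and means: (n_males, M_males), (n_females, M_females).
pre = pooled_mean([(185, 2.23), (267, 1.77)])
post = pooled_mean([(185, 2.84), (267, 2.45)])
print(f"{pre:.2f} -> {post:.2f}")  # 1.96 -> 2.61
```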
Some of the quotes provided by visitors clearly show learning of physics content knowledge, either by learning something new or by refreshing older memories. This learning occurred at all ages, including among very young visitors. In addition, some quotes show visitors were able to take what they experienced at the science centre and extrapolate it to personally relevant contexts.
The combined use of different items and qualitative responses makes a strong case that visitors learned formal physics content knowledge in a single visit to the informal setting of this case study. It could be said that the MOLI instrument provided a quantitative measure of the breadth, the multiple-choice questionnaire provided quantitative depth, and the open question added qualitative breadth and depth.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusion of this article will be made available by the authors, without undue reservation.

AUTHOR CONTRIBUTIONS
DS, DH, and NL conceived the project of which this study is part. DS and NL designed the study. DS collected and analysed all data. DH provided financial support for conference presentations of the project where feedback was provided. DS wrote the first draft of the manuscript. DS and NL contributed to manuscript extension and revisions. DS, DH, and NL read and approved the submitted manuscript.

FUNDING
This study was supported by the University of Otago through a Doctoral Scholarship for the first author, the Dodd-Walls Centre for Photonic and Quantum Technologies through direct funding, and the Instituto Politécnico Nacional and the Unidad Profesional Interdisciplinaria de Ingeniería Campus Zacatecas through the Licencias con Goce de Sueldo CPE/COTEBAL/105/2016, CPE/COTEBAL/100/2017, CPE/COTEBAL/71/2018 and CPE/COTEBAL/67/2019.

ACKNOWLEDGMENTS
Interdisciplinaria de Ingeniería Campus Zacatecas, for their support. Especially, we would like to thank the staff at Otago Museum for their logistical assistance and the visitors who generously gave their time to provide responses and feedback.