Looking Back to Think Ahead: Reflections on Science Festival Evaluation and Research

This methodological review considers science festival evaluation and research studies that have been published in the peer-reviewed literature since 2011, when modern-day science festivals were defined formally. Since that time, the number of science festivals around the world has increased dramatically. The methods and results used to study science festivals are summarized in order to reflect on existing work within this growing sector. The existing literature base is then positioned in relation to recent recommendations for visitor studies research on informal science learning overall, to provide suggestions for expanding current practices to include new methods that have the potential to support continued learning and fill key gaps in the literature.


Introduction
Over the past decade, evaluators and researchers have spent significant resources developing instruments and processes to understand science festivals, the audiences who attend them, and the short-term outcomes associated with these types of events.Science festivals are defined as informal science communication events occurring over a short period of time to engage visitors with contemporary science issues and research, usually via personal interactions with scientists and engineers (Bultitude et al., 2011).In recent years, the number of science festivals across the globe has grown significantly (Bultitude et al., 2011;Canovan, 2019;Fooshee, 2019).
Science festivals are recognized as highly influential science communication events and have been described by the Wellcome Trust as "very much at the hub" of informal science learning and public engagement initiatives (Wellcome Trust, 2008).Like institutions from the broader field of informal learning, festivals vary in their operating budgets, number of paid staff (if any at all), and types of science activities (Jensen & Buckley, 2014;Wiehe, 2014).Some festivals are one day or one weekend in length, while others span multiple weeks.Festival events provide wide variability in the programing offered, as they are designed to embody the characteristics and values of their local community.Most include one or more expo events, which have been defined as "large, free, open events in which scientists exhibit their research at booths visited by public audiences" (Peterman & Young, 2015, p. 85).
Scholarship about science festivals is still in its infancy.Modern-day science festivals were defined in the literature in 2011 (Bultitude et al., 2011), via a review of festivals in the U.K. that demonstrated the variability in both programing and evaluation efforts of the 56 festivals surveyed at that time.Regarding evaluation, Bultitude and colleagues found that the majority of science festivals surveyed evaluated their festival in some way, and approximately half of those made their evaluation results public by sharing them through their web site or by request.Jensen and Buckley (2014) posited that early evaluation efforts may not have been of high quality given that none were published in peer-reviewed journals at that time.
Since these initial publications, science festival evaluations and research studies have appeared in the literature. A small number of studies were published between 2014 and 2017 (Bevc et al., 2016; Dippel et al., 2016; Fogg-Rogers et al., 2015; Illingworth et al., 2015; Jensen & Buckley, 2014; Pearce et al., 2015; Peterman & Young, 2015). Scholarship in this sector seems to be on the rise, with the total number of festival studies in the literature almost doubling in the past three years alone (Adhikari et al., 2019; Boyette & Ramsey, 2019; Canovan, 2019; Davies, 2019; Idema & Patrick, 2019; Kennedy et al., 2018; Munn et al., 2018; Nielsen et al., 2019; Robertson Evia & Peterman, 2020; van Beynen & Burress, 2018; van Beynen et al., in revision). This increase seems to be the result of the propagation of science festivals and their related evaluation efforts, as well as coordinated multisite studies.¹ Given this trajectory, we believe the time is right to consider the strengths and limitations of this research within the larger context of the visitor studies literature.
The purpose of this paper is to reflect on the methods and results published since 2011, and to use the strengths and limitations of existing work to suggest an agenda for future evaluation and visitor research.The summaries presented below are based on studies that included research questions or conclusions that centered on science festivals as their unit of analysis.The analysis does not include studies that utilize science festivals as the platform for other studies on science engagement (e.g., Iachini et al., 2019;Rose et al., 2017).Similarly, research focused on specific activities within the context of science festivals was not included.The next section considers the methods and findings from studies of individual festivals and is followed by a review of studies that included multiple science festivals.These results are then used to reflect on the state of the field and suggest next steps in science festival evaluation and research that have the potential to generate deeper learning about festivals as an informal learning mechanism.

Methods and findings from studies of individual science festivals
Studies of individual science festivals have been used almost exclusively to document short-term outcomes, with 11 of 14 studies using self-report methods to gather feedback from visitors (see Table 1 for a full list). The majority of this work has used intercept survey methods. Adults have been the primary audience studied (Canovan, 2019; Jensen & Buckley, 2014; Pearce et al., 2015), with more recent work expanding to include multiple audiences: all ages (Fogg-Rogers et al., 2015); students and teachers (Illingworth et al., 2015); parents, students, and teachers (Munn et al., 2018); parents and children (Idema & Patrick, 2019); and a blend of adult audience members, event organizers, and speakers (Adhikari et al., 2019).
¹ It is important to acknowledge that, while a significant portion of this recent work was contributed by one or more of the current authors, a similar portion has been contributed by others.
Just as the number of audiences studied has expanded recently, so too have the number of and type of methods used.While almost all studies included intercept surveys, more recent work has included interviews and focus groups, with seven out of eight studies published since 2016 using both survey and qualitative methods.Follow-up surveys and interviews have also been used after a festival concludes to gather additional perspectives on the event (Canovan, 2019;Jensen & Buckley, 2014).A recent study, from Idema and Patrick (2019), provides the only known study of longitudinal festival impacts to date.
Collectively, the results of these self-report studies present a consistent picture of positive short-term outcomes fostered among festival visitors.Jensen and Buckley (2014), for example, found that adult visitors at a large festival believed the event was effective at increasing their interest in and curiosity about science.Similarly, students who attended a large-scale festival event reported high levels of learning and increased interest afterwards (Illingworth et al., 2015).Events have also been shown to increase students' interest in and understanding of science careers (Munn et al., 2018), and parents' perceptions of science and related career opportunities for their children (Canovan, 2019).
Fewer studies have used unobtrusive measures to capture the transactional nature of festival engagement. Unobtrusive measures are often described as performance-based activities that can be embedded within a program. For example, mystery shopper protocols have been developed to document scientists' "performance" at expo booths, by recording the extent to which best practices in booth logistics and science communication are used (Peterman & Young, 2015). Meanwhile, timing and tracking has also been used to document engagement. van Beynen and Burress (2018) utilized timing and tracking protocols to understand how primary school children interacted with friends, family, and scientists at expo booths. Similarly, festivals have explored the use of sentiment capture technology to track "smiles" in real time and display results alongside, but not connected to, visitor feedback (Jensen, 2015).
A final example from the literature focused on the contributions that festivals make to the local STEM learning ecosystem.In the only study found to date that does not focus on visitors, at least in part, Bevc et al. (2016), found that a science festival was an effective mechanism for fostering new partnerships between informal learning institutions.By gathering retrospective data before the festival and data on partnerships after the festival, this study demonstrated that partnerships were sustained both within and outside the festival context.
In summary, much of the existing literature on science festivals has utilized intercept surveys to focus on a single event and at a single time point; multi-method approaches have also been used to provide qualitative nuance to support survey results in some cases.Observational methods have been introduced to the field and provide perspectives on how families and scientists interact in festival contexts, and some systems-level results exist to begin to document the influence of festivals in the larger learning ecosystem.

Methods and findings from studies of multiple science festivals
Compared to the number of studies cited above, fewer have been published that include data from multiple festivals, with five known examples (see Table 2).Current contributions to the literature have focused on the types of visitors at festivals, engagement patterns during festivals, and short-term outcomes.Each of the studies in this section benefited from the use of similar metrics in some cases and shared measures in others.Shared measures are those that are created using rigorous methods and with the goal of applying the measure across multiple programs that address the same construct or outcome (Grack Nelson et al., 2019).
Studies of multiple science festivals have utilized self-report surveys as the predominant research method. Of those, two utilized a shared measure to gather data about visitor demographics, satisfaction, and engagement across festivals (https://evalfest.org/wp-content/uploads/2018/12/Attendee-Survey-Core-Questions.pdf; Boyette & Ramsey, 2019; Nielsen et al., 2019). Meanwhile, Kennedy et al. (2018) utilized a mix of similar and shared measures across participating sites. The most recent of these studies (van Beynen et al., in revision) is the only multisite study to utilize multiple methods, including timing and tracking.
Regarding audience, multisite studies have focused exclusively on explaining the visitor experience. These studies used survey methods to identify a common visitor profile of those who attend science festivals (Kennedy et al., 2018; Nielsen et al., 2019). Like research on other informal learning contexts (Falk & Needham, 2011; Kato-Nitta et al., 2018; Martin, 2017), the results indicate that festival visitors tend to be more affluent and educated than the general population (Kennedy et al., 2018; Nielsen et al., 2019). Visitors in the U.K. also have higher self-reported levels of science interest and fluency than the general population (Kennedy et al., 2018). There are also nuances to this group that may deserve further study. Robertson Evia and Peterman (2020) found different engagement patterns among potential festival-goers, including an Uninterested group who reported little interest in science even though they attended the most recent science festival in their local area.
As with the literature on individual festivals, studies that include multiple festivals have also focused on interactions during festival events and short-term outcomes. van Beynen et al. (in revision) compared timing and tracking data across two festivals to study children's engagement with expo booths led by adults at one festival and youth-led booth experiences at another. Regarding outcomes, Boyette and Ramsey (2019) used results from an intercept survey to demonstrate that adult visitors were significantly more likely to give the highest ratings possible when they reported they had interacted with a scientist during the event.
In summary, the existing literature on multisite studies has provided the field a solid foundation for considering how we might utilize shared measures to understand the visitor experience.The current studies make use of surveys as the primary tool to document visitors and their experiences at events.However, multisite studies are not yet routinely utilizing multi-methods to understand the visitor experience, nor is there published literature on the comparative experiences of staff, scientists, and other stakeholders.

Looking ahead: A possible agenda for future research and evaluation
The published literature on science festivals to date mirrors the types of methods that have been critiqued by recent studies of informal learning evaluation (Fu et al., 2016; Rowe & Frewer, 2000). We believe that existing research on science festivals has played a critical and necessary role in establishing a literature for this relatively new method of informal learning. However, like Fu and her colleagues, we also believe that the field would now benefit from studies of science festivals that use a wider variety of rigorous methods. Science festival evaluators and researchers seem poised both to apply tried-and-true measurement strategies from the visitor studies literature and to contribute new methods and approaches.
This section offers a number of suggestions to illustrate possible future directions.These suggestions are not intended to be prescriptive or to limit the creativity of those who conduct evaluation and research in this space.Instead, we hope that they spark the imagination and set the stage for evaluators and researchers to make significant contributions to the field of visitor studies moving forward, while also addressing key gaps in informal science learning research.

Moving beyond self-report measures
Most studies of science festivals to date have utilized self-report survey measures. Fu et al. (2016) note the pervasive use of self-report measures in informal science learning evaluation and research, and suggest that unobtrusive, direct, and common measures each hold promise as ways to improve measurement across the field. The science festival community has taken initial steps in these directions, providing context-specific examples to build from as the field moves forward. The broader visitor studies field also includes a wide range of potential building blocks. The recommendations from Fu and colleagues are discussed below to reflect further on existing research and to suggest possible next steps for the field.
The mystery shopping study cited earlier (Peterman & Young, 2015) provides an example of an unobtrusive measure that was designed to gather performance-based data in situ during festival expo events. This example is also considered an embedded assessment because it is integrated seamlessly into the learning experience (Becker-Klein et al., 2016). Other possibilities for embedded assessment in this context include integrating the data collection process into a festival event or exhibit itself. The Wisconsin Science Festival recently pilot-tested the use of LEGO blocks to collect demographic and attitudinal information from visitors. The experience, which was created in partnership with artist-in-residence Stuart Flack, resulted in LEGO towers that represented both a person and their data (Thomas, 2018). Similarly, competitions such as soap box derbies, or any products created by visitors for or during a festival event, might be used to demonstrate the skills deployed and mastered by participants. Though the use of embedded assessments is cited as an ideal fit for informal learning environments, few examples exist in the literature (see Fu et al., 2019, for a review). Given their experiential nature, science festivals have the potential to make significant contributions to the use and study of these methods.
Additional direct measures from the visitor studies literature might also be deployed to support both local evaluation needs and contribute new research to the field.The timing and tracking studies cited earlier provide initial data to document how children engage with expo booths (van Beynen & Burress, 2018;van Beynen et al., in revision).
Little is known about how festival visitors navigate expo events overall. Recent research has used Bluetooth technology to track visitor behavior in closed spaces such as the Louvre, and open spaces such as a pedestrian shopping mall (Yoshimura et al., 2014, 2017). Another recent initiative uses biometrics and mobile eye tracking data to understand engagement with and reactions to museum exhibit components (Asher, 2019).
Deploying these methods has the potential to glean new understanding about event design.Some festivals cluster booths by subject matter, for example, while others intentionally avoid this strategy.Which of these approaches results in longer engagement by visitors?What is the ideal positioning of family-friendly versus adult-oriented content across a festival venue?Do the answers to these questions replicate similar findings from museums, zoos, and other informal learning spaces?If not, what results are unique to festivals and how might those findings inform the development of other types of informal learning events?Answers to these and similar questions would provide valuable data to support event design within the context of festivals and beyond.
As noted earlier, visitor studies related to festivals have benefited from the use of shared measures. Indeed, four of the five multisite studies cited above were generated as the result of a National Science Foundation-funded initiative called EvalFest that was designed to develop and use shared measures to catalyze learning about science festivals. We believe that intentional choices in the use of shared measures by evaluators, researchers, and communities continue to hold promise. The EvalFest project is just one example of an informal learning community joining forces to deploy shared measures in an effort to catalyze learning across their sector. Other similar communities, such as the COVES project (http://www.understandingvisitors.org), also benefit from the use of shared measures to gather comparable data that allow for comparison studies within the sector and field-wide learning beyond the sector. Similar potential exists for internal evaluation and research teams that have the opportunity to use a shared measure across multiple programs. Deploying small- or larger-scale collaborations that extend beyond the use of self-report measures has the potential to fill gaps in existing approaches and to generate data that provide a more nuanced understanding of science festival interactions and impacts.
Additional opportunities exist in how the data from shared measures are analyzed.Existing research has aggregated data across festivals for analysis purposes, rather than using the data to compare results across contexts.Though cross-project analyses are not used often, exploring the similarities and differences that emerge across projects has the potential to streamline data-driven learning and decision-making across a sector.The informal STEM learning community currently includes highly collaborative practitioners who are interested in working together to propel research and evaluation efforts (Allen & Peterman, 2019).We hope that the approaches used here can serve as an example of how the use of shared measures can foster learning about visitors.We believe these results have the potential to serve as benchmarks for others in the field, following Allen's (2008) recommendation to use such findings to gauge the success of local efforts and to support continued learning.

Integrating systems perspectives and studying inclusive practices
We believe science festival evaluation and research can make unique and generative contributions to fill other known gaps in the visitor studies literature, namely by increasing how the field understands the broader learning ecosystem and uses that knowledge to broaden participation in informal science learning (Center for the Advancement of Informal Science Learning [CAISE], 2019).Specifically, future evaluation and research efforts might benefit from using a systems perspective to guide broader understanding of the multiple stakeholders, events, and ecosystems that interact during festival planning and events.Existing evaluation and research efforts have focused almost exclusively on visitors, even though the stakeholders involved in festivals include sponsors, exhibitors, scientists, host institutions, and other partners.One of the hallmarks of science festivals is that they provide opportunities for scientists and other researchers to connect with the public (Boyette & Ramsey, 2019), yet there has been little research on scientists' involvement in festival events.What is the impact of a festival on the scientists, engineers, and others in the STEM workforce who participate?What do they gain from their participation?What do they believe is the impact of their efforts?With increasing calls in the U.K. and the U.S. for scientists to share their work with the public, furthering our understanding of the impact of these engagements is critical.Science festivals provide a natural laboratory for answering these questions.
Current research has often focused narrowly on individual events, which seems a logical and necessary starting place.Even so, many festivals span multiple days or weeks, and include a broad range of programing.The systematic study of these events with regards to engagement and learning outcomes would provide valuable data to individual festivals, as well as information that could inform event development across the field.At the individual level, for example, how do visitors, scientists, and partner institutions navigate the range of opportunities provided by festivals across days and weeks?Are there typologies for the ways that visitors choose to navigate a festival's program, such that there are common constellations of events that draw particular audiences?How do these types of patterns relate to learning outcomes for visitors, scientists, and partners?Answers to any of these questions would provide valuable information to begin to fill the gap in our current understanding of how people connect learning across settings (CAISE, 2019).
Research on science festival events and outcomes might be uniquely positioned to answer questions about impact, both within and across festival seasons, providing two perspectives from which to fill the current gap in research related to longer-term outcomes (CAISE, 2019).For example, are there differential learning outcomes associated with participating in one versus many events in a calendar year?Are there differential learning outcomes associated with participating year after year?Epidemiological approaches similar to those used by Falk et al. (2016) seem particularly suited to answering these types of research questions.The study from Idema and Patrick (2019) cited earlier provides a smaller-scale effort, and serves as an example of the types of methodological and design recommendations that can be generated through this type of research.
Though science festivals are defined as time-bound events, they often leverage and contribute to their local STEM learning ecosystems.Recent publications have called for additional scholarship to document and understand STEM learning ecosystems, including the roles that out-of-school-time opportunities play in people's lives (National Research Council, 2015).The gaps identified by CAISE reiterate the need for scholarship in this area.As noted earlier, we are aware of only one study that has investigated the contributions that festivals make to the larger, local informal learning network (Bevc et al., 2016).Little is known about the ways that festivals are situated within their local STEM learning landscape, or the unique and supportive contributions they might make as an informal learning mechanism within that landscape.
Evaluation and research related to two specific recommendations from the NRC report have the potential to provide findings that would benefit the larger informal learning community. The first is the recommendation to "build a map and bridge the gaps" (p. 44) to both document strengths and connections between existing STEM learning opportunities and actively collaborate to fill the gaps. Festivals provide a fixed, seasonal infrastructure for STEM learning in their local areas. Learning more about the successful strategies that have been used by informal learning practitioners to leverage this infrastructure has the potential to inform both festival directors and the wider field alike, while also providing data to strengthen the STEM learning ecosystem locally. The NRC also recommended investing in research to explore how STEM learning ecosystems work. The role that science festivals and other event-based STEM learning opportunities play within a community, and the ways that these events support and detract from learning opportunities that are constant fixtures in the learning landscape, seem particularly fruitful areas for future study.
Both recommendations from the NRC have the potential to provide structures for studying effective strategies to broaden participation in festivals and events.In her recent book, Emily Dawson (2019) notes that "publics are brought into being in the light or shadow of specific practices."She goes on to share interviews with youth who describe the systemic ways that informal learning institutions and practices exclude their participation.Canovan's (2019) study of families from backgrounds underrepresented in STEM who attended a science festival focuses on the barriers to science that are perceived by parents and the ways that festivals can alleviate some of those concerns.We believe that the study of STEM learning ecosystems will be most generative if it includes an equal focus on mapping specific types of inclusive learning opportunities, the people who are engaged, and those who are missing.
With regards to mapping opportunities, the field would benefit from documenting the integration of inclusive practices into the co-development and implementation of festival programs.This recommendation echoes that from Dawson (2014), who found a lack of scholarship regarding equity and the use of inclusive practices in informal science learning and recommended additional study of programs and practices that explore potentially inclusive activities.Feinstein (2017) challenged science museums to "reimagine museum science in the image of the underserved and invest in new programs that are grounded in the cultures and concerns of the very people who currently avoid science museums (p.536)."He goes on to describe the creative and special programs that many museums offer-garden programs, comedy shows, water quality monitoring programs-that are hosted in a variety of venues to move programing outside the museum context.Because many of these events are considered special programs, Feinstein notes that we do not yet know what would happen to our current notions of visitor engagement if they became the norm.While these programs are not the norm for science centers, they are standard programs for science festivals.In particular, larger festivals that take place over many days or weeks might provide the opportunity for rapid prototyping and testing of inclusive practices.
With regard to the publics who are missing from science festivals, we encourage evaluators and researchers to assume that groups who are not represented among current audiences are choosing not to come. While this perspective may or may not be accurate, it necessitates moving beyond basic demographic studies to think critically about the structures and systems of our institutions and programs that prevent us from being successful in broadening participation in our events. In some cases, as demonstrated by Munn et al. (2018), the limiting factor might be geographical in nature and thus alleviated by "taking the show on the road." In other cases, the challenges are likely to be rooted in deeper historical and societal prejudices that have fostered inequities in participation over time (Dawson, 2014, 2019; Feinstein, 2017; Garibay & Teasdale, 2019). Given their relatively new history and short duration, festivals and other public science events may be positioned to identify ways to approach new relationships to co-create programs with communities rather than for communities. Successful strategies might then be applied and studied in relation to programs and institutions.

Conclusion
Evaluation and research studies focused on science festivals have increased in recent years, though scholarship on this form of informal learning is still in its infancy. Like other visitor studies research, existing studies have relied heavily on self-report measures and intercept surveys, in particular. Even so, a number of multi-method studies already exist and single studies can be found that include both unobtrusive and direct measures, as well as data collection efforts with audiences beyond festival visitors themselves. Initial scholarship in this sector was critical of early evaluation efforts. Bultitude et al. (2011) noted the need for greater transparency and communication regarding festival-based studies, and Jensen and Buckley (2014) posited that evaluations at that time were not of the quality expected by peer-reviewed journals. We are pleased that the community has responded to these early criticisms by beginning to build a peer-reviewed literature base that can serve as building blocks for those of us who work in this sector. There is still much to learn about festivals as an informal learning mechanism, and the ways festival evaluation and research might make unique contributions to the visitor studies literature. The strengths and limitations of the current literature on science festivals seem to indicate that this sector is a microcosm for the larger field of informal science learning. We hope the suggestions provided herein will catalyze new approaches for expanding the study of science festivals to support continued learning through evaluation and scholarship, not just for those of us who study science festivals, but for the field of informal learning overall.

Table 1.
Summary of methods used in studies of individual science festivals.

Table 2.
Summary of methods used in multisite studies of science festivals.
van Beynen et al. (in revision) is the only multisite study to utilize multiple methods, including timing and tracking.