AUTHOR=Korb Kevin B., Nyberg Erik P., Oshni Alvandi Abraham, Thakur Shreshth, Ozmen Mehmet, Li Yang, Pearson Ross, Nicholson Ann E.
TITLE=Individuals vs. BARD: Experimental Evaluation of an Online System for Structured, Collaborative Bayesian Reasoning
JOURNAL=Frontiers in Psychology
VOLUME=11
YEAR=2020
URL=https://www.frontiersin.org/journals/psychology/articles/10.3389/fpsyg.2020.01054
DOI=10.3389/fpsyg.2020.01054
ISSN=1664-1078
ABSTRACT=US intelligence analysts must weigh up relevant evidence to assess the probability of their conclusions, and express this reasoning clearly in written reports for decision-makers. Typically, they work alone with no special tools, and sometimes succumb to the same probabilistic and causal reasoning errors and limitations as everyone else. So, the US government funded a major research program (CREATE) for four large academic teams to develop four new structured, collaborative, software-based methods that might achieve better results. Our team's method (BARD) is the first to combine two key techniques: constructing causal Bayesian network models (BNs) to represent analyst knowledge, and small-group collaboration via the Delphi technique. BARD also incorporates compressed, high-quality online training that allows novices to use it, and checklist-inspired report templates with a rudimentary AI tool for generating corresponding text explanations from analysts' BNs. In two prior experiments, our team showed that BARD's BN-building assists probabilistic reasoning even when used by individuals, with a large effect (Glass' ∆ 0.8) (Cruz et al., 2019), and that even minimal Delphi-style interactions improve the BN structures individuals produce, with medium to very large effects (Glass' ∆ 0.5–1.3) (Bolger et al., 2019). This experiment is the critical test of BARD as an integrated system, and a possible alternative to business-as-usual for intelligence analysis.
Participants were asked to solve three probabilistic reasoning problems spread over five weeks, developed by our team to test both quantitative accuracy and susceptibility to tempting qualitative fallacies. Experimental participants were randomly assigned to 27 teams of 6–8 using the BARD tool, while 44 control participants worked individually using Google Suite and (if desired) the best pen-and-paper techniques. On each problem, BARD consistently outperformed the control with very large to huge effects (Glass' ∆ 1.4–2.2), greatly exceeding CREATE's initial target. We conclude that BARD's BN-building and collaboration each contributed beneficially and cumulatively, and that, for suitable problems, BARD already offers significant advantages over both business-as-usual and existing BN software. BARD also has enormous potential for further development and testing, both of specific components and on more complex problems, and has many potential applications outside intelligence analysis that involve similar reasoning and decision-making under uncertainty.
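The effect sizes quoted throughout the abstract use Glass' ∆, a standardized mean difference that divides the gap between group means by the control group's standard deviation (rather than a pooled one). A minimal sketch of the standard definition is below; the scores used are purely hypothetical illustrations, not data from the study.

```python
import statistics

def glass_delta(treatment, control):
    """Glass' delta: (mean_T - mean_C) / sd_C, where sd_C is the
    sample standard deviation of the control group only."""
    diff = statistics.mean(treatment) - statistics.mean(control)
    return diff / statistics.stdev(control)

# Hypothetical accuracy scores for illustration only:
bard_scores = [78, 82, 85, 90, 88]
control_scores = [60, 65, 70, 62, 68]
print(round(glass_delta(bard_scores, control_scores), 2))
```

By convention (Cohen's benchmarks, loosely applied to Glass' ∆), values around 0.8 are read as "large", so the 1.4–2.2 range reported for BARD versus control is unusually strong.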