Friday, May 12, 2017

Measuring Perceptions of Uncertainty

An interesting study came across my desk recently.  The study reported the results of an experiment in which event likelihoods were classified by textual descriptors rather than by explicit probabilities.  After reading the study, my thoughts turned towards the application of these results to wargame design.  Yes, wargaming never seems too far out of mind.

The study, discussed in the book Psychology of Intelligence Analysis (Heuer, 1999), details the results of an experiment in which 23 NATO military officers accustomed to reading intelligence reports were asked to assign a probability of occurrence to the "verbal expressions of uncertainty" typically found in an intelligence report.

Having recently read Michael Lewis' latest book, The Undoing Project, I found that Heuer's work struck a familiar chord.  Lewis retells the careers of noted Israeli psychologists, Amos Tversky and Daniel Kahneman, and their groundbreaking work on decision analysis and bias.  It turns out that Heuer cites Tversky and Kahneman in his work.

The results of this study are summarized in Figure 18 from Heuer Chapter 12.

Each analyst was given a number of sentences typically found in intelligence reports and asked to assign a probability to each.  Sentences were the same with the exception of the verbal expression of uncertainty.  For example, a sentence might begin, "There is a very good chance that..."

The descriptors are based on Kent's 1964 work (see Kent), which suggested standardizing phrases and their associated probability ranges.

A similar experiment was repeated in 2015 based on input from 46 Reddit users (see Zonination).


Since a number of the descriptors resulted in either overlapping ranges or the same median in the Zonination experiment, I reduced the number of classifications to a more manageable level.  My results below:
What do these results suggest?  To me, the rows suggest an odds ratio expressed in verbal expressions of uncertainty rather than precise odds.  Also, the distribution of responses varies depending on the row of the table in which one finds oneself.  Some responses have narrow ranges with few outliers; others have wider ranges with more outliers.  Think of this table as a CRT with Combat Odds (Statements) along the Y-axis and Combat Results spread out along the X-axis.
What does this have to do with wargame design?  Maybe something; maybe nothing.  I could envision situational modifiers moving a result up or down the rows, either increasing or decreasing the probability of success.
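
As a rough illustration of that row-shifting idea, here is a minimal Python sketch, not a finished design. The row names, the percentile targets, and the functions shift_row and resolve are all hypothetical placeholders chosen for illustration; the targets are not the survey medians. A situational modifier moves the starting row up or down the table, and percentile dice are then rolled against the target for the row reached.

```python
import random

# Hypothetical rows, ordered from least to most likely. The percentile
# targets are illustrative placeholders, not values from either survey.
ROWS = [
    ("Almost No Chance", 5),
    ("Probably Not", 25),
    ("About Even", 50),
    ("Very Good Chance", 80),
    ("Almost Certain", 95),
]


def shift_row(start, modifier):
    """Move `modifier` rows up (+) or down (-) the table, clamped to its ends."""
    names = [name for name, _ in ROWS]
    index = names.index(start) + modifier
    index = max(0, min(len(ROWS) - 1, index))
    return ROWS[index]


def resolve(start, modifier=0, rng=random):
    """Roll percentile dice (1-100) against the shifted row's target."""
    name, target = shift_row(start, modifier)
    roll = rng.randint(1, 100)
    return name, roll, roll <= target
```

For example, resolve("About Even", modifier=1) rolls against the "Very Good Chance" row, so under these placeholder values the attack succeeds on a roll of 80 or less. Clamping at the table ends is one design choice among several; a design could just as well treat shifts beyond the last row as automatic success or failure.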

In most games, the exact probability (or range of probabilities) of a particular outcome is known or can be computed with relative certainty.  Did our historical counterparts have that precision in results?  Not likely.  They assessed the outcome based on limited evidence at hand.  In their own decision-making process, commanders assessed and concluded that an attack had a "very good chance" of succeeding.

A few questions for thought until the next time I pick up this topic:
  1. What if we approach game design from this mindset?
  2. If this experiment were repeated with the survey given to wargamers, would the probability assignments look similar?
  3. With our experience in games and game theory, do we assign probabilities differently from intelligence analysts?
  4. Is there interest in conducting a similar experiment? 
If this topic has stirred up some interest, I encourage reading Heuer (especially Part III) and Kent.  Both have links to the original texts.

16 comments:

  1. Very interesting, Jonathan! Matrix games take an approach in which arguments are rated by an umpire for strength and then diced against, with a stronger argument more likely to succeed. Might that be a useful parallel?

    I'll read up a little more as you suggest. I think you might be on to something!

    Cheers,
    Aaron

    Replies
    1. Hi Aaron, Matrix games are very different from this proposal. The study above parallels the use of a CRT in many board wargames. Rather than an odds ratio for the column (or row) headings, the headers are "Measures of Perception" that equate to the more precise odds ratio. Within a particular heading, outcomes or responses are still variable, decided by a die roll.

      The supporting documentation for the original study is interesting.

  2. I think within our games, it is the dice that provide the most narrative and are perhaps the most underestimated aspect of a system. Gamers curse throwing a 1 and shout hoorah at a 6, just thinking in terms of chance and probability, whereas if the imagination wanders, the 1 is representing lost orders, a broken (by artillery perhaps) comms cable or an otherwise distracted commander. A six might just mean someone was looking in the right direction at the right time, that an inspirational speech has been given or a timely delivery of ammunition has arrived at the battery.

    Other factors such as the CRT and combat modifiers are fixed, so without the dice, the attack uphill would always be the CRT modified by say a -1; we need the dice to add the rest of the story. We can even influence that with a D6, an Average dice or a D10 or opposed die rolls etc.

    In some cases the reliability of what we see is modified by what we see, so for example, Awesome must be the most hijacked word in present circulation, to the effect that when I read awesome, I know the chances are that it probably isn't.

    Likewise if steady Eddie tells me around 150 enemy have gathered at a jump-off point, he is probably accurate to within 20 either side of that. The fact that something is rare is not of itself enough to discount it, but with the bombardment of data and the need to respond quickly, it probably would be discounted and therefore all the more deadly when the rare becomes a reality.

    An interesting interface is where computers use models to ascertain probability, while humans use judgement, which differs between us anyway.

    There is one tactical boardgame that I have played which uses a D10 for combat against a CRT with 10 results, but the results in each column are too wide ranging, so roll a 1 and you get no effect, roll a 10 and you get elimination, even though all other factors are equal. A dampening down could be done with a basic CRT giving a crass result and then a sub-set CRT giving a more nuanced result. So roll heavy casualties on the 1st CRT and then go to the Heavy Casualties sub CRT for a more subtle range of potential outcomes of a unit taking heavy casualties and it could be this second table that morale modifiers are applied to.

    I think in reality, that all becomes too much die rolling and faffing for most gamers.

    Replies
    1. Norm, first thank you for your detailed response.
      Second, this proposal does not eliminate the random outcomes provided by a die roll. A randomized response or outcome component is still present.

      Let me begin at the anchor point of "About Even" in the table where the outcome is a toss up. If the player finds himself on that row (equivalent to an odds ratio of 1:1), the probability of a success has a median of 50%. The player rolls percentile dice. A result of 50 or less results in a successful outcome. A roll of greater than 50 ends in a failure.

      Now, assume that modifiers are present that increase the odds, shifting the target from the "About Even" line to the "Very Good Chance" line. The player has now increased the odds such that a success can be obtained with a percentile dice roll of 80 or less. As noted in the study, Chapter 12, a number of factors work to set these probabilities or perceptions.

      By the way, your observation of,

      "An interesting interface is where computers use models to ascertain probability, while humans use judgement, which differs between us anyway."

      is a good one and well summarized. Are human judgments not based on models as well? Perhaps not as explicitly defined and formed as by computer code but models, still. Remember that computer models are instructed by humans and their perceptions and judgments. But I digress...

      Requiring too much die rolling is not good game design just as too few randomizers might lead to too many pre-determined results. No fun in that. Well, Diplomacy can be pretty fun!

      We tend to like what we like.

      As always, I enjoy your thoughtful and thought provoking comments.

    2. And I strongly agree with your statement that the die rolling adds the narrative. When I roll an extreme result, I justify it by saying that something interesting must have happened on the field beyond the control or purview of the commander.

  3. Thanks for the extra detail. I think human judgement is largely based on personal experience, which can be brought closer to a 'model' by good training - though good old Mother Nature has hard wired us to rely more on our instinct, which is often instant and gives us our first impression - then again, perhaps stereotyping of anything is a model and we all do that as a matter of course .... even those that say they don't :-)

    Anyway, back to your point, coming from a different direction, but perhaps ending up with a similar objective, perhaps combat results could involve the enemy player more, i.e. if you get a retreat result, it is the enemy player who MIGHT move you or they MIGHT get a chance to select your actual casualties. This would break the behaviour of the gamer being tied in to some actions that they do automatically and often in their own favour.

    Replies
    1. In one of the rulesets I use ("Risorgimento 1859" to be precise), I allow the owning player to control rearward direction for withdrawals from a losing combat but a rout result forces a random directional move to the rear. When a unit routs, the resulting damage to supporting or nearby units is beyond the owning player's control.

  4. Also ..... I game against someone who always plays naturally cautiously - He is sub-consciously using your percentile chart and I am consciously operating in the likelihood that he will do that! His caution actually encourages me to take greater (all or nothing) risks.

    Replies
    1. That is great, Norm! Applying game theory against an opponent who is a minimaxer! Learning and responding to a long-term opponent's tendencies can be fun and challenging.

      Sometimes, I force players to take on the position in a game that goes against those "natural" tendencies and watch the fun. Forcing an ALWAYS aggressive player into the role of a defender is a treat to watch.

  5. I'm wondering if the range of results is based on precision and perception of language.

    Maybe on Vulcan the meaning of words is precise, but English, even with its enormous lexicon, is a language known for its lack of exactness.

    As an example, what are the odds that you can define Cornflower Blue? Modifiers: how long exposed to the elements? What dye lot? I give up?

    Or President Clinton agonizing over the word "Is".

    Legalese vs street slang.

    I guess I will just roll the dice and hope for a 6 but not boxcars.

    Replies
    1. Bill, when I read the study, the goal was to attempt to quantify these difficult-to-quantify (or qualify) expressions. Perception and precision of language are exactly what produced the range bands among the survey respondents. The sample size of 23 in the first study and 46 in the second seems small to me.

      You are right that words have different meanings to different people. With training and especially among intelligence analysts, I expect those range bands of uncertainty to contract towards the "true" measure of central tendency. As we see, wide ranges with outliers exist even for those with specific training.

      As for Cornflower Blue, I know it when I see it!

      Thanks for your comments!

  6. Excellent points. I am still working on the next part of my own Operational Design series which is "What is in a roll?" for just this reason.

    I concur with Norm's point that we often overlook the fact that the die roll is actually what creates the narrative. I think that is why we get so wrapped up in die modifiers: we forget that a '1' is a nice way to symbolize that a regiment didn't form square, or that they went low on ammunition.

    I was listening to an episode of Meeples and Miniatures awhile back where they were playing a Star Wars Role Playing Game. The die rolls actually gave results like advantage, disadvantage, critical success, etc. that the GM used to create a narrative arc.

    As for the baseline study, I think it might also be useful to see if there were cultural differences among the "NATO Officers"; it might shed some light on interpreting historical records from different sources.

    Replies
    1. I am looking forward to your next wargame design philosophy installment.

      The die roll does provide the narrative for those unmeasurable and unforeseen factors experienced below the level of a commander's control. Die roll modifiers account for the known situational factors that steer the narrative in a particular direction.

      The sample size of only 23 in the original seemed low for a study of this nature. More respondents would equate to more confidence in the results. Still, a very interesting study.

  7. Jonathan, the end of your first paragraph, "...my thoughts turned towards the application of these results to wargame design.  Yes, wargaming never seems too far out of mind" had me laughing out loud. I can't count the number of times that I have seen something and immediately thought of how it could be applied to wargaming.
