Theory and Practice of Artificial IntelligenceHeuristic Search and GamesDaniel PolaniSchool of Physics, Engineering and Computer ScienceUniversity of HertfordshireMarch 21, 2021All rights reserved. Permission is granted to copy and distribute these slides in full or in part for purposes ofresearch, education as well as private use, provided that author, affiliation and this notice is retained.Use as part of home- and coursework is only allowed with express permission by the responsible tutor and, inthis case, is to be appropriately referenced.Theory and Practice of Artificial Intelligence 57 / 125Heuristic SearchConsider: heuristic estimator ff (n): estimates cost” of n, i.e.the cost of best solutionpathfrom start node sto some goal node, say t,but provided that pathgoes via nf (n) = g(n)|{z}actual cost from s to nnot necessarily optimal,estimate of minimal costfrom s to n +h(n)|{z}t guesswork!no universal solutiong(n)h(n)nsTheory and Practice of Artificial Intelligence 58 / 125Best-FirstIdeaCompeting Subtrees:among competitors, only onesubtree is active at a time:the most promising, i.e. thatwith the lowest f-valueswitch to alternative, if thatchangesExample: f (X) = g(X) + h(X)Budget of subtree spentuntil exhausted, switching toother tree2 25222323a 5c 4b 4e 7d 3g 2f 4s startt goalThe full A∗ includes retroactively optimizing the path to the current node. We have left that out here.Theory and Practice of Artificial Intelligence 59 / 125Best-FirstCandidates(current best is underlined)2 25222323a 5c 4b 4e 7d 3g 2f 4s startt goalThe full A∗ includes retroactively optimizing the path to the current node. We have left that out here.Theory and Practice of Artificial Intelligence 59 / 125Best-FirstCandidates(current best is underlined)1 [s]2 25222323a 5c 4b 4e 7d 3g 2f 4s startt goalThe full A∗ includes retroactively optimizing the path to the current node. We have left that out here.Theory and Practice of Artificial Intelligence 59 / 125Best-FirstCandidates(current best is underlined)1 [s]2 [s,a], [s,e]2 25222323a 5c 4b 4e 7d 3g 2f 4s startt goalThe full A∗ includes retroactively optimizing the path to the current node. We have left that out here.Theory and Practice of Artificial Intelligence 59 / 125Best-FirstCandidates(current best is underlined)1 [s]2 [s,a], [s,e]3 [s,a,b], [s,e]2 25222323a 5c 4b 4e 7d 3g 2f 4s startt goalThe full A∗ includes retroactively optimizing the path to the current node. We have left that out here.Theory and Practice of Artificial Intelligence 59 / 125Best-FirstCandidates(current best is underlined)1 [s]2 [s,a], [s,e]3 [s,a,b], [s,e]4 [s,e], [s,a,b,c]2 25222323a 5c 4b 4e 7d 3g 2f 4s startt goalThe full A∗ includes retroactively optimizing the path to the current node. We have left that out here.Theory and Practice of Artificial Intelligence 59 / 125Best-FirstCandidates(current best is underlined)1 [s]2 [s,a], [s,e]3 [s,a,b], [s,e]4 [s,e], [s,a,b,c]5 [s,a,b,c], [s,e,f]2 25222323a 5c 4b 4e 7d 3g 2f 4s startt goalThe full A∗ includes retroactively optimizing the path to the current node. We have left that out here.Theory and Practice of Artificial Intelligence 59 / 125Best-FirstCandidates(current best is underlined)1 [s]2 [s,a], [s,e]3 [s,a,b], [s,e]4 [s,e], [s,a,b,c]5 [s,a,b,c], [s,e,f]6 [s,e,f], [s,a,b,c,d]2 25222323a 5c 4b 4e 7d 3g 2f 4s startt goalThe full A∗ includes retroactively optimizing the path to the current node. We have left that out here.Theory and Practice of Artificial Intelligence 59 / 125Best-FirstCandidates(current best is underlined)1 [s]2 [s,a], [s,e]3 [s,a,b], [s,e]4 [s,e], [s,a,b,c]5 [s,a,b,c], [s,e,f]6 [s,e,f], [s,a,b,c,d]7 [s,e,f,g], [s,a,b,c,d]2 25222323a 5c 4b 4e 7d 3g 2f 4s startt goalThe full A∗ includes retroactively optimizing the path to the current node. We have left that out here.Theory and Practice of Artificial Intelligence 59 / 125Best-FirstCandidates(current best is underlined)1 [s]2 [s,a], [s,e]3 [s,a,b], [s,e]4 [s,e], [s,a,b,c]5 [s,a,b,c], [s,e,f]6 [s,e,f], [s,a,b,c,d]7 [s,e,f,g], [s,a,b,c,d]8 [s,e,f,g,t], [s,a,b,c,d]2 25222323a 5c 4b 4e 7d 3g 2f 4s startt goalThe full A∗ includes retroactively optimizing the path to the current node. We have left that out here.Theory and Practice of Artificial Intelligence 59 / 125Best-FirstCandidates(current best is underlined)1 [s]2 [s,a], [s,e]3 [s,a,b], [s,e]4 [s,e], [s,a,b,c]5 [s,a,b,c], [s,e,f]6 [s,e,f], [s,a,b,c,d]7 [s,e,f,g], [s,a,b,c,d]8 [s,e,f,g,t], [s,a,b,c,d]9 best candidate reached goal | return2 25222323a 5c 4b 4e 7d 3g 2f 4s startt goalThe full A∗ includes retroactively optimizing the path to the current node. We have left that out here.Theory and Practice of Artificial Intelligence 59 / 125AdmissibilityDefinitionsDef.: A search algorithm is admissible if it always produces an optimalsolution.Note: Above algorithm immediately produces an optimal solution if foreach node n, h∗(n) is the cost of an optimal path from n to somegoal node.Note: The Best-First algorithm is a variant of the noted A∗ algorithm.TheoremBest-First is admissible if it uses an optimistic heuristic h, i.e. h withh(n) ≤ h∗(n)DefaultSetting maximally optimistic h(n) := 0 is always admissible (givesBreadth-First-Search), but has no predictive power.Theory and Practice of Artificial Intelligence 60 / 125Heuristic ConstructionNotesin the formalization of a search problem, often one has a setof constraints that one has to fulfil and that make calculatingthe optimal moves difficult.By releasing one or more of these constraints, the searchproblem often becomes much easier. The ensuing h0 for theless (un-)constrained problem is typically admissible for theoriginal problem since removing the constraint does notincrease the path length to the solution.Thus, releasing constraints on the original problem does leadto admissible heuristics for it (read up details on the heuristicsfor the 8-puzzle in (Pearl 1984; Russell and Norvig 2002)).Theory and Practice of Artificial Intelligence 61 / 125Heuristic ConstructionConsiderationsConstruction of Heuristics:Given admissible heuristicsh1, h2, h3, . . . , hn, is it possible toconstruct a heuristic that is asleast as good as any of theabove?Theory and Practice of Artificial Intelligence 62 / 125Heuristic ConstructionConsiderationsConstruction of Heuristics:Given admissible heuristicsh1, h2, h3, . . . , hn, is it possible toconstruct a heuristic that is asleast as good as any of theabove?Consideration: first, what is a goodheuristics?Note: clearly a maximally admissibleh = 0 is a bad heuristics, nothelping at allErgo: a heuristic is mostexpressive/helpful/good thelarger it is!Theory and Practice of Artificial Intelligence 62 / 125Heuristic Construction ConsiderationsBottom Line Construction of Heuristics:Given admissible heuristicsh1, h2, h3, . . . , hn, is it possible toconstruct a heuristic that is asleast as good as any of theabove?Consideration: first, what is a goodheuristics?Note: clearly a maximally admissibleh = 0 is a bad heuristics, nothelping at allErgo: a heuristic is mostexpressive/helpful/good thelarger it is!The heuristicshmax := max(h1, h2, h3, . . . , hn)is at least as good as any of thehi.Theory and Practice of Artificial Intelligence 62 / 125Heuristic Construction ConsiderationsBottom Line Construction of Heuristics:Given admissible heuristicsh1, h2, h3, . . . , hn, is it possible toconstruct a heuristic that is asleast as good as any of theabove?Consideration: first, what is a goodheuristics?Note: clearly a maximally admissibleh = 0 is a bad heuristics, nothelping at allErgo: a heuristic is mostexpressive/helpful/good thelarger it is!The heuristicshmax := max(h1, h2, h3, . . . , hn)is at least as good as any of thehi.InsightThe best possible heuristic werethe true value of the cost to agoal if it were known (which, ingeneral, is difficult).Theory and Practice of Artificial Intelligence 62 / 125GamesAssumptionsIn the following, we consider exclusivelytwo-person (not multi-person; no gang-ups)perfect information (no card games)deterministic (no backgammon)alternating moves (no rock/scissors/paper)zero-sum (no prisoner’s dilemma)games.Theory and Practice of Artificial Intelligence 63 / 125Games IIIn a game, each player to find to each configuration the best possible response. Best in the sense that it allows thebest possible payoff under antagonistic response by the other player. DefinitionA game is defined by State/Node Set: a set N of nodes (also called states), each of whichdescribes a particular possible situation; includes all informationnecessary for the next move, including whose turn it is, or anyrelevant historical information (e.g. in chess information about themove of a rook or king for castling).Successor Rule: for each node n 2 N , the function S generating the setS(n) ⊆ N of successors of n, including turn-taking, where applicable;Start Node: one node s, at which the game starts;Terminal State Set: a set G ⊆ N of one or more terminal nodes ofthe game, i.e. nodes without successors.Payoff: For each terminal node g 2 G, a payoff (utility) U(g) 2 R isdefined for the player who moves first. Since considering onlyZero-Sum games, utility for second player is implied. First player,Max wants to maximize utility, second player, Min wants to minimizeit.Theory and Practice of Artificial Intelligence 64 / 125UtilitiesNotesin typical games, a terminal state defines an outcome such aswin/draw/loss for each player. For player Max, the correspondencebetween outcome and utility could be: OutcomeUtilitywindrawloss10-1 Other scalings are possible (i.e. 100 instead of 1, -100 instead of-1).for player Min, the outcome/utility correspondence is exactlyreversed.Theory and Practice of Artificial Intelligence 65 / 125Aim of the Game IAimEach player aims to move as to win the game. More precisely, interms of outcomes:1 a player has a winning strategy if he can choose moves thatreach a winning terminal state, no matter what the otherplayer is doing;2 a player has a drawing strategy if he can choose moves thatreach a drawing terminal state, no matter what the otherplayer is doing. Of course, he will adopt a drawing strategyonly if there is no winning strategy.3 in all other cases, the player can only draw or win if the otherplayer does not play perfectly.Theory and Practice of Artificial Intelligence 66 / 125Aim of the Game IIAim in Terms of UtilityEach player aims to optimize (maximize for Max, minimize forMin) the utility achieved in the terminal state.Max will aim to achieve the terminal node with the highestutility that any play of Min will allow it to reach.Min will aim to achieve the terminal node with the lowestutility that any play of Max will allow it to reach.Theory and Practice of Artificial Intelligence 67 / 125Utility of a General Node IGeneralizing UtilityFor a terminal state, the utility is defined per default;for a general state, we define the utility as the best outcomethat can be achieved against any possible play of theopponent.Note that in general games, the opponent may not beantagonistic; these cases are complex to consider (read up onNash-equilibria). Zero-Sum games are simpler, the opponentwill indeed aim to choose the moves that are the worst for theplayer.Theory and Practice of Artificial Intelligence 68 / 125Utility of a General Node II Formally: The Minimax PrincipleConsider: U(n), the utility of a node Let: S(n) = fn1, n2, . . . , nkg be the set of successors for node nMinimax Utility: defineU(n) =8>:Udefault(n) if n terminal, i.e. S(n) = fgmaxni2S(n)U(ni) if in n it is Max’s turnminni2S(n)U(ni) if in n is is Min’s turnTheory and Practice of Artificial Intelligence 69 / 125Minimax ExampleMAXMINMAXstatic values41 4 5 6 2 1 1 14114 6 2Theory and Practice of Artificial Intelligence 70 / 125Minimax Example (Main Variation)MAXMINMAXstatic valuesmain variation41 4 5 6 2 1 1 14114 6 2Theory and Practice of Artificial Intelligence 71 / 125
