- Normal Form Games
- Strategies
- Best Response
- Nash Equilibrium
- Computing Nash Equilibria
- Complexity of Finding Nash Equilibria
- Domination
- Pareto Optimality
- Iterated Removal of Strictly Dominated Strategies
- Maxmin & Minmax Strategies
- The Minimax Theorem
- Correlated Equilibrium
- Extensive Form Games
- Perfect Information Extensive Form Games
- Pure Strategies in Perfect Information Extensive Form Games
- Subgame Perfection
- Computing Subgame Perfect Equilibria
- Imperfect Information Extensive Form Games
- Pure Strategies in Imperfect Information Extensive Form Games
- Induced Normal Form
- Perfect Recall
- Behavioral Strategies
- Beyond Subgame Perfection
- Repeated Games
- Utility in Infinitely Repeated Games
- Stochastic Games
- Fictitious Play
- No-regret Learning
- Regret Matching
- Pure Strategies and Equilibria in Repeated Games
- Payoff Profiles
- Folk Theorem for IR games with Average Rewards
- Games with Discounted Rewards
- Folk Theorem for IR games with Discounted Rewards
- Bayesian Games

These are my lecture notes on the Stanford Coursera course on Game Theory.

A (finite, \(n\)-person) **normal form game** is a tuple \((N, A, u)\) where

- \(N=\{1, \ldots, n\}\) is a finite set denoting the players (indexed by \(i\))
- \(A=A_1 \times \ldots \times A_n\) denotes the set of **action profiles**, where \(A_i\) denotes the **action set** for player \(i\). Each action \(a_i \in A_i\) is referred to as a pure strategy.
- \(u=(u_1, \ldots, u_n)\) is a profile of **utility functions**, where \(u_i\colon A → \mathbb{R}\) is the **utility/payoff function** for player \(i\)

A game can be classified by how the players' interests relate to one another:

A game of two players is called a game of pure competition when both players have exactly opposed interests, i.e. \(\forall a \in A, u_1(a)+u_2(a) = c\) for some constant \(c\) (the special case \(c = 0\) gives zero-sum games).

A game is referred to as a game of cooperation when all players have the exact same interests, i.e. \(\forall a\in A, \forall i, j, u_i(a) = u_j(a)\).

A strategy is called **pure** if it deterministically selects a single action \(a_i \in A_i\) and **mixed** if it is a probability distribution over actions, \(s_i \in \Delta(A_i)\).

Let the set of all strategies for \(i\) be denoted \(S_i\) and the set of all strategy profiles be denoted \(S=S_1\times \ldots \times S_n\).

For players following mixed strategies the payoff is then defined in terms of an expectation: $$ u_i(s) = \sum_{a\in A} u_i(a) \Pr(a|s)\\ \Pr(a|s) = \prod_{j\in N} s_j(a_j) $$ where \(s \in S\).
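As a concrete check of this formula, here is a small computation (the Battle of the Sexes payoffs below are standard textbook numbers, assumed for illustration):

```python
# Expected utility of a mixed-strategy profile in a 2-player game:
# u_i(s) = sum over pure profiles a of u_i(a) * Pr(a|s),
# with Pr(a|s) = prod_j s_j(a_j).
from itertools import product

def expected_utility(u, s):
    """u[i][a] is player i's payoff for pure profile a; s[j][aj] is the
    probability player j assigns to action aj."""
    total = [0.0] * len(s)
    for a in product(*(range(len(sj)) for sj in s)):
        pr = 1.0
        for j, aj in enumerate(a):
            pr *= s[j][aj]              # Pr(a|s) = prod_j s_j(a_j)
        for i in range(len(s)):
            total[i] += u[i][a] * pr
    return total

# Battle of the Sexes payoffs (assumed): actions 0 = Ballet, 1 = Football.
u1 = {(0, 0): 2, (0, 1): 0, (1, 0): 0, (1, 1): 1}
u2 = {(0, 0): 1, (0, 1): 0, (1, 0): 0, (1, 1): 2}
print(expected_utility([u1, u2], [[0.5, 0.5], [0.5, 0.5]]))  # [0.75, 0.75]
```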

When should one play mixed strategies? To confuse the opponent (as in Matching Pennies), or when uncertain about the other player's action (as in Battle of the Sexes).

\(s=\langle s_1, \ldots, s_n\rangle\) is a Nash equilibrium \(\iff \forall i, s_i \in BR(s_{-i})\).

\(a=\langle a_1, \ldots, a_n\rangle\) is a pure strategy Nash equilibrium \(\iff \forall i, a_i \in BR(a_{-i})\).
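Pure strategy Nash equilibria can be found by brute force: check the best-response condition at every action profile. A minimal sketch for two players (the prisoners' dilemma payoffs are assumed, standard textbook values):

```python
from itertools import product

def pure_nash(u1, u2, n1, n2):
    """All pure profiles (a1, a2) where each action is a best response
    to the other; u1, u2 map profiles to payoffs."""
    eqs = []
    for a1, a2 in product(range(n1), range(n2)):
        br1 = all(u1[(a1, a2)] >= u1[(b, a2)] for b in range(n1))
        br2 = all(u2[(a1, a2)] >= u2[(a1, b)] for b in range(n2))
        if br1 and br2:
            eqs.append((a1, a2))
    return eqs

# Prisoners' dilemma (assumed payoffs): action 0 = cooperate, 1 = defect.
u1 = {(0, 0): -1, (0, 1): -4, (1, 0): 0, (1, 1): -3}
u2 = {(0, 0): -1, (0, 1): 0, (1, 0): -4, (1, 1): -3}
print(pure_nash(u1, u2, 2, 2))  # [(1, 1)] -- mutual defection
```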

A Nash equilibrium in strictly dominant strategies is unique. The prisoners' dilemma therefore has exactly one pure strategy NE and no mixed strategy NEs.

Note: if the results of the above computation were not valid probabilities (in the range \([0, 1]\)), we would know that no equilibrium with the given support exists.

**LCP (Linear Complementarity)** formulation [Lemke-Howson 1964] (\(s\) are strategies, \(r\) are slack variables).

**Support Enumeration Method** [Porter et al. 2004]: enumerate supports using clever heuristics (to curb the exponential number of supports) and try to find an equilibrium for the given supports by formulating the problem as a linear program. A useful heuristic is to start with small supports and to bias the search towards supports of similar size for the two players. (\(\sigma_i\) is the set of actions in the support for player \(i\))
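For the 2x2 case with full support, the linear program reduces to two indifference equations that can be solved directly. A sketch of this special case (not the general Porter et al. method; Battle of the Sexes payoffs are assumed):

```python
def mixed_eq_2x2(u1, u2):
    """Mixed equilibrium with full support in a 2x2 game: each player's
    mixture makes the opponent indifferent between their two actions.
    Returns (p, q) = (Pr(player 1 plays 0), Pr(player 2 plays 0)),
    or None if the solution is not a valid probability."""
    # q solves: q*u1(0,0) + (1-q)*u1(0,1) = q*u1(1,0) + (1-q)*u1(1,1)
    denom1 = (u1[(0, 0)] - u1[(0, 1)]) - (u1[(1, 0)] - u1[(1, 1)])
    denom2 = (u2[(0, 0)] - u2[(1, 0)]) - (u2[(0, 1)] - u2[(1, 1)])
    if denom1 == 0 or denom2 == 0:
        return None
    q = (u1[(1, 1)] - u1[(0, 1)]) / denom1
    p = (u2[(1, 1)] - u2[(1, 0)]) / denom2
    if 0 < p < 1 and 0 < q < 1:
        return p, q
    return None  # no equilibrium with this (full) support

# Battle of the Sexes (assumed payoffs): the mixed NE is p = 2/3, q = 1/3.
u1 = {(0, 0): 2, (0, 1): 0, (1, 0): 0, (1, 1): 1}
u2 = {(0, 0): 1, (0, 1): 0, (1, 0): 0, (1, 1): 2}
print(mixed_eq_2x2(u1, u2))
```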

- Does a unique NE exist?
- Does a strictly Pareto efficient NE exist?
- Does an NE exist with a guaranteed payoff or guaranteed social welfare?
- Does an NE exist that includes/excludes specific actions?

The complexity of finding even a single Nash equilibrium is captured by the complexity class **PPAD**:

- FNP problems are constructive versions of NP problems (F stands for functional)
- TFNP is a subclass of FNP for problems for which a solution is guaranteed to exist (T stands for "total")
- PPAD is a subclass of TFNP where the proofs are based on parity arguments in directed graphs

Theorem: Finding a Nash equilibrium is **PPAD-complete** [Daskalakis, Goldberg & Papadimitriou 2006; Chen & Deng 2006].

Then \(s_i\) **strictly dominates** \(s_i'\) if \(\forall s_{-i},\ u_i(s_i, s_{-i}) > u_i(s_i', s_{-i})\).

Furthermore \(s_i\) **weakly dominates** \(s_i'\) if \(\forall s_{-i},\ u_i(s_i, s_{-i}) \ge u_i(s_i', s_{-i})\), with strict inequality for at least one \(s_{-i}\).

A strategy that dominates all others is called **dominant**.

"Dominant strategies are powerful from both an analytical point of view and a player’s perspective. An individual does not have to make any predictions about what other players might do, and still has a well-defined best strategy. [..] A basic lesson from the prisoners’ dilemma is that individual incentives and overall welfare need not coincide."

An outcome is **Pareto optimal** if no other outcome makes some player better off without making another player worse off (i.e. no other outcome Pareto-dominates it).

The prisoners' dilemma is a dilemma precisely because the Nash equilibrium is the only non-Pareto-optimal outcome. (The outcome most likely to occur is the socially worst one.)

Performing this iterative algorithm preserves Nash equilibria since in a Nash equilibrium everybody plays a best reply (and we don't remove best replies). It can therefore be used as a preprocessing step before computing an equilibrium.

If after the procedure only a single action profile remains, that profile is the unique NE of the game and the game is called **dominance solvable**.
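The removal procedure itself is straightforward to sketch (payoffs below are assumed, standard prisoners' dilemma values):

```python
def iesds(u1, u2, A1, A2):
    """Iterated elimination of strictly dominated pure strategies.
    u1, u2 map (a1, a2) profiles to payoffs; A1, A2 are action lists."""
    A1, A2 = list(A1), list(A2)
    changed = True
    while changed:
        changed = False
        for a in list(A1):  # remove a if some b strictly dominates it
            if any(all(u1[(b, c)] > u1[(a, c)] for c in A2) for b in A1 if b != a):
                A1.remove(a); changed = True
        for a in list(A2):
            if any(all(u2[(c, b)] > u2[(c, a)] for c in A1) for b in A2 if b != a):
                A2.remove(a); changed = True
    return A1, A2

# Prisoners' dilemma (assumed): cooperate (0) is strictly dominated by defect (1).
u1 = {(0, 0): -1, (0, 1): -4, (1, 0): 0, (1, 1): -3}
u2 = {(0, 0): -1, (0, 1): 0, (1, 0): -4, (1, 1): -3}
print(iesds(u1, u2, [0, 1], [0, 1]))  # ([1], [1]) -- dominance solvable
```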

Note: we can also iteratively remove **weakly** dominated strategies, but this can eliminate Nash equilibria, and the result then depends on the order of removal.

The corresponding **maxmin value** for player \(i\) is \(\max_{s_i} \min_{s_{-i}} u_i(s_i, s_{-i})\).

The corresponding **minmax value** for player \(i\) is \(\min_{s_{-i}} \max_{s_i} u_i(s_i, s_{-i})\).

The minimax theorem shows us that we can easily find NE for two-player zero-sum games by solving the corresponding minmax problem.

The minmax problem for 2x2 games can be solved by writing down and solving the corresponding maxmin value equations for each player.

The general minmax problem for two players is easily solvable with LP. \(U_1^\ast\) is the value of the game (the payoff to player 1 in equilibrium), \(s_2\) is the mixed strategy for player 2 that we want to find, \(k\) is a pure strategy.

The minmax problem for matching pennies:
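Assuming no pure saddle point exists (true for matching pennies), the row player's maxmin mixture can be computed from the indifference condition instead of the full LP. A sketch:

```python
def maxmin_2x2_zero_sum(u):
    """Maxmin mixture for the row player in a 2x2 zero-sum game with
    row-player payoff matrix u. Assumes an interior solution (no pure
    saddle point), so the mixture equalizes the payoff against both
    opponent actions."""
    # p solves: p*u[0][0] + (1-p)*u[1][0] = p*u[0][1] + (1-p)*u[1][1]
    denom = (u[0][0] - u[1][0]) - (u[0][1] - u[1][1])
    p = (u[1][1] - u[1][0]) / denom
    value = p * u[0][0] + (1 - p) * u[1][0]  # the value of the game
    return p, value

# Matching pennies: play heads with probability 1/2; the game value is 0.
print(maxmin_2x2_zero_sum([[1, -1], [-1, 1]]))  # (0.5, 0.0)
```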

A correlated equilibrium is a generalization of Nash equilibria since any Nash equilibrium can be expressed as a correlated equilibrium where action recommendations are not correlated.

A CE can for example be used to achieve fair and optimal outcomes in the Battle of the Sexes game.
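The CE condition can be checked directly: for every recommendation, following it must be a best response given the conditional distribution over the opponent's recommendations. A sketch for two players (Battle of the Sexes payoffs are assumed):

```python
def is_correlated_eq(dist, u, n_actions):
    """Check whether a distribution over joint actions is a correlated
    equilibrium of a 2-player game: no player gains by deviating from
    any recommendation."""
    for i in (0, 1):
        for rec in range(n_actions[i]):
            # joint outcomes in which player i is recommended `rec`
            cond = {a: p for a, p in dist.items() if a[i] == rec and p > 0}
            if not cond:
                continue
            def payoff(dev):
                total = 0.0
                for a, p in cond.items():
                    b = list(a); b[i] = dev
                    total += p * u[i][tuple(b)]
                return total
            if any(payoff(dev) > payoff(rec) + 1e-12 for dev in range(n_actions[i])):
                return False
    return True

# Battle of the Sexes (assumed payoffs); fair coin flip over the two pure NEs.
u1 = {(0, 0): 2, (0, 1): 0, (1, 0): 0, (1, 1): 1}
u2 = {(0, 0): 1, (0, 1): 0, (1, 0): 0, (1, 1): 2}
dist = {(0, 0): 0.5, (1, 1): 0.5}
print(is_correlated_eq(dist, [u1, u2], (2, 2)))  # True
```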

- \(N\) is a set of \(n\) players.
- \(A\) is a set of actions (not action profiles like in normal form games)
- \(H\) is a set of choice nodes
- \(\chi\) is the action function \(\chi\colon H \rightarrow 2^A\) that assigns a set of possible actions to each choice node
- \(\rho\) is the player function \(\rho \colon H → N\) that assigns a player \(i \in N\) to each choice node (the player who moves at that node)
- \(Z\) is a set of terminal nodes (disjoint from \(H\))
- \(\sigma\) is the successor function \(\sigma \colon H \times A → H \cup Z\) that assigns a next choice or terminal node to each choice-and-action pair (such that the node graph is a tree)
- \(u\) is a utility function \((u_1, \ldots, u_n)\) where \(u_i \colon Z → \mathbb{R}\) is a utility function for player \(i\) on the terminal nodes \(Z\)

Theorem: Every finite perfect information extensive form game has a pure strategy Nash equilibrium.

The **subgame** of \(G\) rooted at node \(h\) is the restriction of \(G\) to \(h\) and its descendants.

A Nash equilibrium \(s\) is a **subgame perfect equilibrium** if the restriction of \(s\) to any subgame of \(G\) is a Nash equilibrium of that subgame.

Every finite extensive form game with perfect recall has a subgame perfect equilibrium.

The algorithm below simply computes the payoff under the equilibrium strategy but can be extended to compute the strategy itself. (The function labels each node with a utility vector, the labeling can be seen as an extension of the game's utility function to the non-terminal nodes)
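A minimal sketch of this labeling (the tree representation and payoff numbers are assumptions for illustration):

```python
def backward_induction(node):
    """Label each node with the utility vector obtained under the subgame
    perfect equilibrium. Terminal nodes are payoff tuples; choice nodes
    are lists [player, child, child, ...]."""
    if isinstance(node, tuple):          # terminal node: utility vector
        return node
    player, *children = node
    best = None
    for child in children:
        util = backward_induction(child)
        if best is None or util[player] > best[player]:
            best = util                  # mover picks the child maximizing own payoff
    return best

# A toy two-stage game (hypothetical payoffs): player 0 moves, then player 1.
tree = [0, [1, (3, 1), (0, 0)],
           [1, (1, 2), (2, 1)]]
print(backward_induction(tree))  # (3, 1)
```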

For zero-sum games backwards induction is referred to as the **minimax algorithm**.

Not all normal form games can be converted to perfect information extensive form games due to their nonsequential nature (e.g. matching pennies) but they can always be converted to imperfect information extensive form games.

Performing the conversion NF → IIEF → NF may not return the same game but always returns an equivalent game with the same strategy spaces and equilibria.

A mixed strategy in imperfect information games is a probability distribution over pure strategies. Considering a single information set, a mixed strategy assigns a probability to every action available at nodes in that set. Before the game is played a concrete pure strategy is sampled from the mixed strategy and then executed. Notice that this entails the constraint that the agent plays the same action at every node in an information set (actions are not resampled when moving from one node to another within the same information set).

Behavioral strategies also randomize, but they assign an independent probability distribution over actions at each information set and resample from it every time the agent moves. Behavioral strategies can have equilibria that differ from mixed strategy equilibria (although by Kuhn's theorem the two coincide in games with perfect recall).

Note: The N choice at the top stands for nature and injects randomization into the game

Here \((S→N, W→N; \,F)\) is an equilibrium even though the off-the-equilibrium-path action \(F\) for player two is a non-credible threat. (It is still a Nash equilibrium because if player 1 truly believes player 2 will play \(F\), and player 2 truly believes player 1 will play \(S→N, W→N\), neither player has an incentive to change their action.)

\((S→E, W→N; A)\) is another (a more credible) Nash equilibrium.

Other equilibrium concepts (sequential equilibrium, perfect Bayesian equilibrium) explicitly model players' beliefs about the current state of the game and may be better suited for these kinds of games.

The **average reward** of an infinite reward sequence \(r_1, r_2, \ldots\) is \(\lim_{k→\infty} \frac{1}{k}\sum_{j=1}^k r_j\) and the **future discounted reward** (with discount factor \(\beta \in (0, 1)\)) is

$$ \sum_{j=1}^\infty \beta^jr_j $$ The discount factor can be interpreted as expressing that the agent prefers present rewards over future rewards. The future discounted reward is equivalent to the expected reward in a game where the agent cares equally about present and future rewards but there is a probability of \(1-\beta\) that the game ends in any given round.
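For a constant reward \(r\) this sum is a geometric series with closed form \(r\beta/(1-\beta)\) (with the index starting at \(j=1\) as above); a quick numeric check:

```python
# Truncated discounted sum vs. the geometric-series closed form
# sum_{j=1}^inf beta^j * r = r * beta / (1 - beta).
beta, r = 0.9, 1.0
partial = sum(beta**j * r for j in range(1, 200))  # truncation error ~ beta^200
closed_form = r * beta / (1 - beta)
print(round(partial, 6), round(closed_form, 6))  # 9.0 9.0
```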

A stochastic game is a tuple \((Q, N, A, P, R)\) where

- \(Q\) is a finite set of states (these implicitly define multiple games)
- \(N\) is a finite set of \(n\) players
- \(A=A_1 \times \ldots \times A_n\) is the set of action profiles, where \(A_i\) is the set of actions available to player \(i\)
- \(P\colon Q \times A \times Q \rightarrow [0, 1]\) is a transition probability function where \(P(q, a, \hat q)\) is the probability of transitioning from state \(q\) to state \(\hat q\) after executing action profile \(a\)
- \(R=r_1, \ldots, r_n\) where \(r_i\colon Q \times A → \mathbb{R}\) is a real valued payoff function for player \(i\)

This definition assumes that all games have the same strategy space (dropping this assumption just requires more notation).

Stochastic games generalize Markov Decision Processes since an MDP is a single-agent stochastic game.

Note that the strategies played by each player may not converge to a single strategy.

Each of the following are sufficient conditions for the empirical frequencies of play to converge in fictitious play:

- The game is zero-sum
- The game is dominance-solvable
- The game is a potential game
- The game is \(2\times n\) and has generic payoffs
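A sketch of fictitious play for matching pennies (a zero-sum game, so convergence is guaranteed; the payoff numbers are the standard ones, assumed here). The empirical frequencies approach the mixed equilibrium \((1/2, 1/2)\):

```python
def fictitious_play(u1, u2, rounds=10000):
    """Each player best-responds to the empirical frequency of the
    opponent's past actions; returns both empirical distributions."""
    counts = [[1, 0], [0, 1]]   # initial (arbitrary) action counts / beliefs
    for _ in range(rounds):
        # best response to the opponent's empirical mixture
        a1 = max((0, 1), key=lambda a: sum(counts[1][b] * u1[(a, b)] for b in (0, 1)))
        a2 = max((0, 1), key=lambda b: sum(counts[0][a] * u2[(a, b)] for a in (0, 1)))
        counts[0][a1] += 1
        counts[1][a2] += 1
    total = rounds + 1
    return [c / total for c in counts[0]], [c / total for c in counts[1]]

# Matching pennies: player 1 wins on a match, player 2 on a mismatch.
u1 = {(0, 0): 1, (0, 1): -1, (1, 0): -1, (1, 1): 1}
u2 = {a: -v for a, v in u1.items()}
f1, f2 = fictitious_play(u1, u2)
print([round(x, 2) for x in f1], [round(x, 2) for x in f2])  # both near [0.5, 0.5]
```

Note that the actions played keep cycling; only the empirical frequencies converge, illustrating the remark above.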

A learning rule exhibits no regret if for any pure strategy \(s\) it holds that \(\Pr\big(\liminf_{t→\infty} R^t(s) \le 0\big) = 1\) (in the limit, the regret against \(s\) is non-positive almost surely).


Regret matching converges to a correlated equilibrium for finite games.
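A sketch of regret matching using unconditional (external) regrets; note that the Hart & Mas-Colell procedure behind the convergence result uses conditional regrets, so this is a simplification. For matching pennies the unique correlated equilibrium puts probability 1/4 on each profile:

```python
import random

def regret_matching(u, rounds=20000, seed=0):
    """Both players play proportionally to their positive cumulative
    regrets; returns the empirical distribution over joint profiles."""
    rng = random.Random(seed)
    regrets = [[0.0, 0.0], [0.0, 0.0]]
    counts = {}
    for _ in range(rounds):
        actions = []
        for i in (0, 1):
            pos = [max(r, 0.0) for r in regrets[i]]
            s = sum(pos)
            a = rng.randrange(2)        # uniform if no positive regret
            if s > 0:                   # else sample proportionally to regret
                x, acc = rng.random() * s, 0.0
                for a, p in enumerate(pos):
                    acc += p
                    if x <= acc:
                        break
            actions.append(a)
        a1, a2 = actions
        counts[(a1, a2)] = counts.get((a1, a2), 0) + 1
        # accumulate regret: payoff of each alternative minus realized payoff
        for alt in (0, 1):
            regrets[0][alt] += u[0][(alt, a2)] - u[0][(a1, a2)]
            regrets[1][alt] += u[1][(a1, alt)] - u[1][(a1, a2)]
    return {a: c / rounds for a, c in counts.items()}

u1 = {(0, 0): 1, (0, 1): -1, (1, 0): -1, (1, 1): 1}  # matching pennies
u2 = {a: -v for a, v in u1.items()}
dist = regret_matching([u1, u2])
print({a: round(p, 2) for a, p in sorted(dist.items())})  # each near 0.25
```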

Since Nash's theorem only applies to finite games, there may not be any Nash equilibria in infinitely repeated games (IRGs).

Finding Nash equilibria in IRGs is difficult but the Folk theorem tells us a way to identify which payoff profiles are generated by a Nash equilibrium.

A payoff profile \(r\) is **feasible** if there exist rational, non-negative values \(\alpha_a\) with \(\sum_{a \in A} \alpha_a = 1\) such that \(r_i = \sum_{a \in A} \alpha_a u_i(a)\) for all \(i\).

A payoff profile \(r\) is **enforceable** if \(r_i \ge v_i\) for all \(i\), where \(v_i\) is player \(i\)'s minmax value.

- If \(r\) is the payoff in any Nash equilibrium then it is enforceable.
- If \(r\) is both feasible and enforceable it is the payoff in some Nash equilibrium.

Note that the above theorem does not state that a payoff profile in an equilibrium is necessarily feasible. This is because it may not be expressible as a rational convex combination of the outcomes' payoffs.
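Enforceability can be checked by computing minmax values. The sketch below uses pure-strategy minmax (an upper bound on the true minmax value, which allows mixed punishment strategies); payoffs are the assumed prisoners' dilemma values:

```python
def pure_minmax(u, i, n):
    """Pure-strategy minmax value for player i in a 2-player game:
    the opponent picks the action minimizing i's best-response payoff."""
    if i == 0:
        return min(max(u[(a, b)] for a in range(n[0])) for b in range(n[1]))
    return min(max(u[(a, b)] for b in range(n[1])) for a in range(n[0]))

# Prisoners' dilemma (assumed payoffs): 0 = cooperate, 1 = defect.
u1 = {(0, 0): -1, (0, 1): -4, (1, 0): 0, (1, 1): -3}
u2 = {(0, 0): -1, (0, 1): 0, (1, 0): -4, (1, 1): -3}
v1, v2 = pure_minmax(u1, 0, (2, 2)), pure_minmax(u2, 1, (2, 2))
print(v1, v2)  # -3 -3: mutual cooperation's profile (-1, -1) is enforceable
```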

See the lecture for the full proof. A short summary of the proof:

- Proof by contradiction: if some player received less than their minmax value in a Nash equilibrium, best-responding in every round would guarantee them at least that value and thus be a profitable deviation (contradiction!), so every Nash equilibrium's reward profile is enforceable.
- By exploiting the rationality in the definition of feasibility we can construct simple strategies for all players that lead exactly to the given reward profile. Furthermore we make these trigger strategies: when any player deviates, the others punish them so that they receive at most their minmax value. Because the reward profile is enforceable, deviating is not profitable for any player, and the strategy profile is therefore a Nash equilibrium.

The set of finite histories is \(H=\bigcup_tH^t\) where \(H^t = \{h^t : h^t \in A^t\}\) is the set of histories of length \(t\). A strategy is a function \(s_i \colon H → \Delta(A_i)\).

A strategy profile is a subgame perfect Nash equilibrium if it is Nash in every subgame where a subgame in infinitely repeated games is defined as the game starting at any specific round.

Repeatedly playing the stage game NE is always a subgame perfect equilibrium (by backward induction) and is the unique subgame perfect equilibrium if the stage game has a unique NE.

Let \(a\) be a Nash equilibrium of the stage game. If there exists an \(a' = (a'_1, \ldots, a'_n)\) such that \(u_i(a') > u_i(a)\) for all \(i\), then there exists a discount factor \(\beta < 1\), such that if all \(\beta_i \ge \beta\), there exists a subgame perfect equilibrium of the IR game in which \(a'\) is played in every period (as long as play stays on the equilibrium path).

Note: the equilibrium strategy is a grim trigger strategy, meaning it plays \(a'\) as long as no one deviates and switches to forever playing \(a\) otherwise. Not playing \(a'\) is then never a profitable deviation for any player as long as \(\beta\) is large enough (if players don't care about future rewards enough, they don't care if we punish them by playing \(a\) instead of \(a'\)).
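The threshold on \(\beta\) can be computed explicitly for the prisoners' dilemma (the payoff numbers below are assumed, standard textbook values): cooperating forever yields \(c/(1-\beta)\), while deviating yields \(d\) once and the punishment payoff \(p\) forever after, i.e. \(d + \beta p/(1-\beta)\). Solving \(c/(1-\beta) \ge d + \beta p/(1-\beta)\) for \(\beta\):

```python
# Grim trigger sustainability in the prisoners' dilemma (assumed payoffs):
# c = mutual cooperation, d = one-shot deviation temptation, p = punishment.
c, d, p = 3.0, 5.0, 1.0
beta_min = (d - c) / (d - p)   # from c/(1-b) >= d + b*p/(1-b)
print(beta_min)  # 0.5

beta = 0.9                     # patient players: cooperation is sustainable
cooperate = c / (1 - beta)
deviate = d + beta * p / (1 - beta)
print(cooperate >= deviate)    # True
```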

- \(N\) is a set of players
- \(G\) is a set of games with identical strategy spaces and players \(N\) (they differ only in their utility functions)
- \(\Pi(G)\) is the set of probability distributions over games in \(G\) and \(P \in \Pi(G)\) is a common prior
- \(I=(I_1, \ldots, I_n)\) is a set of partitions of \(G\), one for each agent

where games in the same information set \(I_{i, j}\) of \(I_i\) are indistinguishable for agent \(i\). Note that players have access to all information in the \((N,G,P,I)\) tuple, they just don't know which game is being played.

Agents' beliefs are posteriors, obtained by conditioning the common prior \(P\) on individual private signals.

- \(N\) is a set of agents
- \(A=(A_1, \ldots, A_n)\), where \(A_i\) is the set of actions for player \(i\)
- \(\Theta = (\Theta_1, \ldots, \Theta_n)\), where \(\Theta_i\) is the type space of player \(i\)
- \(p\colon \Theta → [0, 1]\) is a common prior
- \(u=(u_1, \ldots, u_n)\), where \(u_i\colon A \times \Theta → \mathbb{R}\) is the utility function for player \(i\)

An agent's type consists of all the information and beliefs the agent has that are not common knowledge.

Kinds of expected utility in Bayesian games:

- **ex-ante:** the agent knows nothing about anyone's actual type
- **interim:** the agent knows only their own actual type
- **ex-post:** the agent knows all agents' actual types
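These differ in what the expectation is taken over. A tiny numeric sketch (the types, actions, and payoff numbers are hypothetical):

```python
# Ex-ante vs. interim expected utility in a toy Bayesian game.
# Player 1 has two possible types; player 2 has a single type 'x'.
prior = {('strong', 'x'): 0.3, ('weak', 'x'): 0.7}   # common prior over type profiles
u1 = {  # u1[(action profile, type profile)]
    (('fight', 'yield'), ('strong', 'x')): 2,
    (('fight', 'yield'), ('weak', 'x')): -1,
}
a = ('fight', 'yield')

# ex-ante: expectation over the full prior, before learning one's own type
ex_ante = sum(pr * u1[(a, theta)] for theta, pr in prior.items())
print(round(ex_ante, 2))  # -0.1

# interim: condition on one's own type (here 'strong', which pins the
# type profile down completely since player 2 has only one type)
interim = u1[(a, ('strong', 'x'))]
print(interim)  # 2
```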

The **Bayes-Nash equilibrium** is a strategy profile in which every agent maximizes their expected utility, given their beliefs and the other agents' strategies.
