<h1>The Build Your Own Open Games Engine Bootcamp — Part I: Lenses</h1>
<p><em>Daniele Palombi, 2024-04-22</em></p>
<p>Cross-posted from the <a href="https://blog.20squares.xyz/open-games-bootcamp-i/">20[ ] blog</a></p>
<p>Welcome to part I of the Build Your Own Open Games Engine Bootcamp, where we’ll be learning the inner workings of the Open Games Engine and Compositional Game Theory in general, while implementing a super-simple Haskell version of the engine along the way.</p>
<p>In this episode we will learn about <strong>Lenses</strong>, how to compose them and how they can be implemented in Haskell. But first, let’s set the context for this whole series.</p>
<h2>How to scale classical Game Theory</h2>
<p>In classical Game Theory, the definitions for (deterministic) <a href="https://en.wikipedia.org/wiki/Normal-form_game">Normal-form</a> and <a href="https://en.wikipedia.org/wiki/Extensive-form_game">Extensive-form</a> games have undoubtedly proved successful as mathematical tools for studying strategic interactions between rational agents. Despite this, the monolithic nature of these definitions becomes apparent over time, eventually leading to a complexity wall in one’s game theoretic modelling career. This limitation arises as games become more intricate, and the rigid structure of these definitions gets in the way of modelling, similar to how maintaining a large codebase written in <a href="https://en.wikipedia.org/wiki/X86_assembly_language">x86 assembly</a> quickly becomes a superhuman feat.</p>
<p>Compositional Game Theory solves this exact problem: By turning games into composable open processes, one can build up a library of reusable components and approach the problem compositionally™, in a divide-et-impera fashion. To keep the programming language analogy going: Programming in a high-level language like Haskell or Rust is way easier than programming in straight assembly. The ability to modularize code by breaking it up into modules and functions, which are predictably<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup> composable and reusable, helps tame the mental overhead of complex programs. It also saves the programmer tons of time and keystrokes that would otherwise be spent re-writing the same chunk of boilerplate code with minor modifications over and over.</p>
<p>The primary goal of this series is to introduce Compositional Game Theory and provide readers with a practical understanding of Open Games. This includes a very simple Haskell implementation of Open Games for readers to play with and test their intuitions against. By the end of this series, you will have the knowledge and tools to start modelling simple deterministic games. Additionally, you’ll be equipped to start exploring the <a href="https://github.com/CyberCat-Institute/open-game-engine">Open Game Engine</a> codebase and see how Open Games are applied in real-world modeling.</p>
<h2>What is an Open Game?</h2>
<p>In the following posts, we’re going to break down and understand the following definition:</p>
<div class="definition">
<p>An <strong>Open Game</strong> is a pair $(A,\varepsilon)$, where $A$ is a <strong>Parametrized Lens</strong> with co/parameters $P$ and $Q$ and $\varepsilon$ is a <strong>Selection Function</strong> on $P \to Q$.</p>
</div>
<p>Moreover, we will learn about how Open Games can be composed both sequentially and in parallel, and hopefully some extra cool stuff along the way.</p>
<h2>(Parametrized) Lenses</h2>
<p>The first and most important component of an Open Game is the arena, i.e. the “playing field” where all the dynamics happen and with which the players can interface. The arena is a <strong>parametrized lens</strong>, a composable, typed, bidirectional process.</p>
<div class="definition">
<p>A <strong>Parametrized Lens</strong> from a pair of sets $\binom{X}{S}$ to a pair of sets $\binom{Y}{R}$ with <strong>Parameters</strong> $\binom{P}{Q}$ is a pair of functions $\mathsf{get}: P\times X \to Y$ and $\mathsf{put}:P\times X\times R \to S\times Q$.</p>
</div>
<p>This can be implemented in Haskell in the following manner, making use of <a href="https://en.wikipedia.org/wiki/Currying">currying</a>:</p>
<div class="language-haskell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kr">data</span> <span class="kt">ParaLens</span> <span class="n">p</span> <span class="n">q</span> <span class="n">x</span> <span class="n">s</span> <span class="n">y</span> <span class="n">r</span> <span class="kr">where</span>
<span class="c1">-- get put</span>
<span class="kt">MkLens</span> <span class="o">::</span> <span class="p">(</span><span class="n">p</span> <span class="o">-></span> <span class="n">x</span> <span class="o">-></span> <span class="n">y</span><span class="p">)</span> <span class="o">-></span> <span class="p">(</span><span class="n">p</span> <span class="o">-></span> <span class="n">x</span> <span class="o">-></span> <span class="n">r</span> <span class="o">-></span> <span class="p">(</span><span class="n">s</span><span class="p">,</span> <span class="n">q</span><span class="p">))</span> <span class="o">-></span> <span class="kt">ParaLens</span> <span class="n">p</span> <span class="n">q</span> <span class="n">x</span> <span class="n">s</span> <span class="n">y</span> <span class="n">r</span>
</code></pre></div></div>
<p>Diagrammatically speaking, a parametrized lens can be represented as a box with six typed wires, which under the lens (pun intended) of compositional game theory are interpreted as follows:</p>
<ul>
<li>$\mathsf{x}$ is the type of <strong>game states</strong> that can be observed by the player prior to making a move.</li>
<li>$\mathsf{p}$ is the type of <strong>strategies</strong> a player can adopt.</li>
<li>$\mathsf{y}$ is the type of <strong>game states</strong> that can be observed after the player made its move.</li>
<li>$\mathsf{r}$ is the type of <strong>utilities</strong>/<strong>payoffs</strong> the player can receive after making its move.</li>
<li>$\mathsf{s}$ is the type of <strong>back-propagated utilities</strong> a player can send to players that moved before it.</li>
<li>$\mathsf{q}$ is the type of <strong>rewards</strong> representing the player’s intrinsic utility.</li>
</ul>
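<p>To make the six wires concrete, here is a toy single-player decision written against the <code>ParaLens</code> type above. The strategy type, the payoff rule, and the helper names (<code>runGet</code>, <code>runPut</code>) are ours, purely for illustration:</p>

```haskell
{-# LANGUAGE GADTs #-}

-- The ParaLens type from the snippet above, repeated so this sketch is
-- self-contained.
data ParaLens p q x s y r where
  MkLens :: (p -> x -> y) -> (p -> x -> r -> (s, q)) -> ParaLens p q x s y r

-- Projections for running the two passes of a lens.
runGet :: ParaLens p q x s y r -> p -> x -> y
runGet (MkLens get _) = get

runPut :: ParaLens p q x s y r -> p -> x -> r -> (s, q)
runPut (MkLens _ put) = put

-- A toy single decision: the player observes an Int state, a strategy of
-- type Int -> Bool picks a move, and the backward pass hands the received
-- payoff both backwards (as s) and to the player itself (as q).
decision :: ParaLens (Int -> Bool) Double Int Double Bool Double
decision = MkLens
  (\strategy x -> strategy x)  -- get: observe the state, play the strategy
  (\_ _ r -> (r, r))           -- put: forward the payoff back and up
```

For instance, <code>runGet decision (&gt; 0) 5</code> plays the strategy “move iff the state is positive” in state <code>5</code>, while <code>runPut</code> threads a payoff through the backward pass.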
<div class="tikz"><script type="text/tikz">
\begin{tikzpicture}
\draw [line width=1.5pt, rounded corners] (0,0) rectangle (8,5) node[pos=0.5] {$\mathsf A$};
\draw [-stealth, line width=1.5pt] (-3,4) -- (0,4) node[pos=0.1, above] {$\mathsf x$};;
\draw [-stealth, line width=1.5pt] (0,1) -- (-3,1) node[pos=0.9, above] {$\mathsf s$};
\draw [-stealth, line width=1.5pt] (8,4) -- (11,4) node[pos=0.9, above] {$\mathsf y$};
\draw [-stealth, line width=1.5pt] (11,1) -- (8,1) node[pos=0.1, above] {$\mathsf r$};
\draw [-stealth, line width=1.5pt] (2,8) -- (2,5) node[pos=0.1, right] {$\mathsf p$};
\draw [-stealth, line width=1.5pt] (6,5) -- (6,8) node[pos=0.9, right] {$\mathsf q$};
\end{tikzpicture}
</script></div>
<p>With this in mind, we can open the box in the previous diagram and have a look at the internals of a parametrized lens<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">2</a></sup>:</p>
<p><img src="/assetsPosts/2024-04-15-open-games-bootcamp-i/exploded_lens.png" alt="&quot;exploded&quot; internals of a parametrized lens" /></p>
<p>By looking at the internals of a lens and the direction of the arrows, it becomes clear that data flows in two different directions:</p>
<ul>
<li>The <strong>forward</strong> pass, i.e. the <code class="language-plaintext highlighter-rouge">get</code> function, happens at the time a player observes the state, before interacting with the game.</li>
<li>The <strong>backward</strong> pass, i.e. the <code class="language-plaintext highlighter-rouge">put</code> function, happens “in the future”, after all players have made their moves, and represents the stage in which payoffs are computed and passed around.</li>
</ul>
<p>To limit mental overload, the following definition of non-parametrized lens will also come in handy later:</p>
<div class="definition">
<p>A <strong>(non-parametrized) Lens</strong> is a parametrized lens with parameters $\binom{\mathbf{1}}{\mathbf{1}}$, where $\mathbf{1}$ is the singleton set.</p>
</div>
<div class="language-haskell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- A (non-parametrized) `Lens` is a `ParaLens` with trivial parameters</span>
<span class="kr">type</span> <span class="kt">Lens</span> <span class="o">=</span> <span class="kt">ParaLens</span> <span class="nb">()</span> <span class="nb">()</span>
<span class="c1">-- Non-parametrized Lens constructor</span>
<span class="n">nonPara</span> <span class="o">::</span> <span class="p">(</span><span class="n">x</span> <span class="o">-></span> <span class="n">y</span><span class="p">)</span> <span class="o">-></span> <span class="p">(</span><span class="n">x</span> <span class="o">-></span> <span class="n">r</span> <span class="o">-></span> <span class="n">s</span><span class="p">)</span> <span class="o">-></span> <span class="kt">Lens</span> <span class="n">x</span> <span class="n">s</span> <span class="n">y</span> <span class="n">r</span>
<span class="n">nonPara</span> <span class="n">get</span> <span class="n">put</span> <span class="o">=</span> <span class="kt">MkLens</span> <span class="p">(</span><span class="nf">\</span><span class="kr">_</span> <span class="n">x</span> <span class="o">-></span> <span class="n">get</span> <span class="n">x</span><span class="p">)</span> <span class="p">(</span><span class="nf">\</span><span class="kr">_</span> <span class="n">x</span> <span class="n">r</span> <span class="o">-></span> <span class="p">(</span><span class="n">put</span> <span class="n">x</span> <span class="n">r</span><span class="p">,</span> <span class="nb">()</span><span class="p">))</span>
</code></pre></div></div>
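<p>To connect this with the familiar functional-programming notion of a lens, here is the classic “view the first component of a pair / update it” lens, built with <code>nonPara</code> (the example and the <code>view</code>/<code>update</code> helper names are ours):</p>

```haskell
{-# LANGUAGE GADTs #-}

-- Definitions from above, repeated so this sketch is self-contained.
data ParaLens p q x s y r where
  MkLens :: (p -> x -> y) -> (p -> x -> r -> (s, q)) -> ParaLens p q x s y r

type Lens = ParaLens () ()

nonPara :: (x -> y) -> (x -> r -> s) -> Lens x s y r
nonPara get put = MkLens (\_ x -> get x) (\_ x r -> (put x r, ()))

-- The classic "view/update" lens is an instance of this definition:
-- view the first component of a pair, and update it while leaving the
-- second component untouched.
fstLens :: Lens (a, b) (a, b) a a
fstLens = nonPara fst (\(_, b) a' -> (a', b))

-- Run the forward pass of a non-parametrized lens.
view :: Lens x s y r -> x -> y
view (MkLens get _) = get ()

-- Run the backward pass, discarding the trivial coparameter.
update :: Lens x s y r -> x -> r -> s
update (MkLens _ put) x r = fst (put () x r)
```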
<p>Diagrammatically, we will represent wires of type <code class="language-plaintext highlighter-rouge">()</code> (the singleton type) as no wires at all. This will also come in handy later to simplify some definitions and diagrams. For example, here’s a representation of the flow of data in a non-parametrized lens, courtesy of <a href="https://www.brunogavranovic.com">Bruno Gavranović</a>:</p>
<p><img src="/assetsPosts/2024-04-15-open-games-bootcamp-i/lens_traces.gif" alt="Representation of the flow of data in a non-parametrized lens, courtesy of Bruno Gavranović" /></p>
<h3>Composing Lenses two ways</h3>
<p>What makes Compositional Game Theory compositional is (unsurprisingly) the fact that parametrized lenses are closed under two different kinds of composition operators, one behaving like <strong>sequential composition</strong> of pure functions and one behaving like <strong>parallel</strong> execution of programs, or more or less like a tensor product<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">3</a></sup>.</p>
<h4>Sequential Composition</h4>
<p>Let’s start with sequential composition: When the right boundary types of $\mathsf A:\binom{X}{S}\to\binom{Y}{R}$ match the left boundary types of $\mathsf B:\binom{Y}{R}\to\binom{Z}{T}$, we should be able to build another lens that amounts to running what happens in $\mathsf A$ first and then running what happens in $\mathsf B$, while taking into account the parameters of both lenses:</p>
<p>By trying to code this up in a type-directed way in Haskell, the only sensible definition that can possibly come out is the following:</p>
<div class="language-haskell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kr">infixr</span> <span class="mi">4</span> <span class="o">>>>></span>
<span class="p">(</span><span class="o">>>>></span><span class="p">)</span> <span class="o">::</span> <span class="kt">ParaLens</span> <span class="n">p</span> <span class="n">q</span> <span class="n">x</span> <span class="n">s</span> <span class="n">y</span> <span class="n">r</span> <span class="o">-></span> <span class="kt">ParaLens</span> <span class="n">p'</span> <span class="n">q'</span> <span class="n">y</span> <span class="n">r</span> <span class="n">z</span> <span class="n">t</span> <span class="o">-></span> <span class="kt">ParaLens</span> <span class="p">(</span><span class="n">p</span><span class="p">,</span> <span class="n">p'</span><span class="p">)</span> <span class="p">(</span><span class="n">q</span><span class="p">,</span> <span class="n">q'</span><span class="p">)</span> <span class="n">x</span> <span class="n">s</span> <span class="n">z</span> <span class="n">t</span>
<span class="p">(</span><span class="kt">MkLens</span> <span class="n">get</span> <span class="n">put</span><span class="p">)</span> <span class="o">>>>></span> <span class="p">(</span><span class="kt">MkLens</span> <span class="n">get'</span> <span class="n">put'</span><span class="p">)</span> <span class="o">=</span>
<span class="kt">MkLens</span>
<span class="p">(</span><span class="nf">\</span><span class="p">(</span><span class="n">p</span><span class="p">,</span> <span class="n">p'</span><span class="p">)</span> <span class="n">x</span> <span class="o">-></span> <span class="n">get'</span> <span class="n">p'</span> <span class="p">(</span><span class="n">get</span> <span class="n">p</span> <span class="n">x</span><span class="p">))</span>
<span class="p">(</span><span class="nf">\</span><span class="p">(</span><span class="n">p</span><span class="p">,</span> <span class="n">p'</span><span class="p">)</span> <span class="n">x</span> <span class="n">t</span> <span class="o">-></span>
<span class="kr">let</span> <span class="p">(</span><span class="n">r</span><span class="p">,</span> <span class="n">q'</span><span class="p">)</span> <span class="o">=</span> <span class="n">put'</span> <span class="n">p'</span> <span class="p">(</span><span class="n">get</span> <span class="n">p</span> <span class="n">x</span><span class="p">)</span> <span class="n">t</span>
<span class="p">(</span><span class="n">s</span><span class="p">,</span> <span class="n">q</span><span class="p">)</span> <span class="o">=</span> <span class="n">put</span> <span class="n">p</span> <span class="n">x</span> <span class="n">r</span>
<span class="kr">in</span> <span class="p">(</span><span class="n">s</span><span class="p">,</span> <span class="p">(</span><span class="n">q</span><span class="p">,</span> <span class="n">q'</span><span class="p">))</span>
<span class="p">)</span>
</code></pre></div></div>
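<p>To see the threading at work, here is a small sketch (the toy <code>stage</code> lens and the helpers are ours) composing two copies of a stage whose forward pass adds its parameter to the state and whose backward pass forwards the incoming utility while also reporting it as its own reward:</p>

```haskell
{-# LANGUAGE GADTs #-}

-- Definitions from above, repeated so this sketch is self-contained.
data ParaLens p q x s y r where
  MkLens :: (p -> x -> y) -> (p -> x -> r -> (s, q)) -> ParaLens p q x s y r

infixr 4 >>>>
(>>>>) :: ParaLens p q x s y r -> ParaLens p' q' y r z t
       -> ParaLens (p, p') (q, q') x s z t
(MkLens get put) >>>> (MkLens get' put') =
  MkLens
    (\(p, p') x -> get' p' (get p x))
    (\(p, p') x t ->
      let (r, q') = put' p' (get p x) t
          (s, q)  = put p x r
      in  (s, (q, q')))

-- A toy stage: the forward pass adds the parameter to the state; the
-- backward pass forwards the incoming utility unchanged and also reports
-- it as the stage's own reward.
stage :: ParaLens Double Double Double Double Double Double
stage = MkLens (\p x -> p + x) (\_ _ r -> (r, r))

-- Composing two stages pairs up their parameters and coparameters.
twoStages :: ParaLens (Double, Double) (Double, Double) Double Double Double Double
twoStages = stage >>>> stage

runGet :: ParaLens p q x s y r -> p -> x -> y
runGet (MkLens get _) = get

runPut :: ParaLens p q x s y r -> p -> x -> r -> (s, q)
runPut (MkLens _ put) = put
```

Running the forward pass with parameters <code>(1, 2)</code> on state <code>10</code> yields <code>(10 + 1) + 2 = 13</code>, and the backward pass returns the back-propagated utility together with the pair of rewards.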
<p>From the Haskell implementation we can see that composing two lenses, parametrized or not, isn’t as simple as plugging one end into another, merging the parameter wires and calling it a day<sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">4</a></sup>. Something a bit more involved is happening:</p>
<p><img src="/assetsPosts/2024-04-15-open-games-bootcamp-i/exploded_comp.png" alt="&quot;exploded&quot; lens composition" /></p>
<p>Mathematically, this amounts to the following compositions:</p>
<ul>
<li>For the <code class="language-plaintext highlighter-rouge">get</code> part: $P'\times P\times X\xrightarrow{\mathsf{id}\times\mathsf{get}}P'\times Y\xrightarrow{\mathsf{get}'} Z$</li>
<li>
<p>For the <code class="language-plaintext highlighter-rouge">put</code> part:
\(\begin{align*}
P'\times P\times X \times T
&\xrightarrow{\mathsf{id}\times \Delta_{P}\times \Delta_{X}\times\mathsf{id}} P'\times P\times P\times X \times X \times T\\
&\xrightarrow{\mathsf{sym}\times \mathsf{get}\times \mathsf{sym}} P\times P'\times Y \times T \times X\\
&\xrightarrow{\mathsf{id}\times \mathsf{put}'\times \mathsf{id}} P\times R\times Q'\times X\\
&\xrightarrow{\mathsf{rearrange}} P\times X\times R\times Q'\\
&\xrightarrow{\mathsf{put}\times\mathsf{id}} S\times Q\times Q'
\end{align*}\)</p>
<p>Where $\Delta(x) = (x,x)$, $\mathsf{sym}(x,y)=(y,x)$ and $\mathsf{rearrange}$ is a suitable composition of $\mathsf{sym}$s.</p>
</li>
</ul>
<h4>Parallel Composition</h4>
<p>Luckily, parallel composition is way easier than the sequential one: In fact, parallel composition of $\mathsf{A}:\binom{X}{S}\to\binom{Y}{R}$ with parameters $\binom{P}{Q}$ and $\mathsf{B}:\binom{X'}{S'}\to\binom{Y'}{R'}$ with parameters $\binom{P'}{Q'}$, amounts to a lens $\mathsf{A}\times\mathsf{B}:\binom{X\times X'}{S\times S'}\to\binom{Y \times Y'}{R \times R'}$ with parameters $\binom{P\times P'}{Q \times Q'}$, such that \(\mathsf{put}_{\mathsf{A}\times\mathsf{B}}\) and \(\mathsf{get}_{\mathsf{A}\times\mathsf{B}}\) are respectively the cartesian product of the <code class="language-plaintext highlighter-rouge">put</code> and <code class="language-plaintext highlighter-rouge">get</code> functions from $\mathsf{A}$ and $\mathsf{B}$, modulo some rearrangement of inputs and outputs.</p>
<p>This is even clearer from the Haskell implementation:</p>
<div class="language-haskell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kr">infixr</span> <span class="mi">4</span> <span class="o">####</span>
<span class="p">(</span><span class="o">####</span><span class="p">)</span> <span class="o">::</span> <span class="kt">ParaLens</span> <span class="n">p</span> <span class="n">q</span> <span class="n">x</span> <span class="n">s</span> <span class="n">y</span> <span class="n">r</span> <span class="o">-></span> <span class="kt">ParaLens</span> <span class="n">p'</span> <span class="n">q'</span> <span class="n">x'</span> <span class="n">s'</span> <span class="n">y'</span> <span class="n">r'</span> <span class="o">-></span> <span class="kt">ParaLens</span> <span class="p">(</span><span class="n">p</span><span class="p">,</span> <span class="n">p'</span><span class="p">)</span> <span class="p">(</span><span class="n">q</span><span class="p">,</span> <span class="n">q'</span><span class="p">)</span> <span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">x'</span><span class="p">)</span> <span class="p">(</span><span class="n">s</span><span class="p">,</span> <span class="n">s'</span><span class="p">)</span> <span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="n">y'</span><span class="p">)</span> <span class="p">(</span><span class="n">r</span><span class="p">,</span> <span class="n">r'</span><span class="p">)</span>
<span class="p">(</span><span class="kt">MkLens</span> <span class="n">get</span> <span class="n">put</span><span class="p">)</span> <span class="o">####</span> <span class="p">(</span><span class="kt">MkLens</span> <span class="n">get'</span> <span class="n">put'</span><span class="p">)</span> <span class="o">=</span>
<span class="kt">MkLens</span>
<span class="p">(</span><span class="nf">\</span><span class="p">(</span><span class="n">p</span><span class="p">,</span> <span class="n">p'</span><span class="p">)</span> <span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">x'</span><span class="p">)</span> <span class="o">-></span> <span class="p">(</span><span class="n">get</span> <span class="n">p</span> <span class="n">x</span><span class="p">,</span> <span class="n">get'</span> <span class="n">p'</span> <span class="n">x'</span><span class="p">))</span>
<span class="p">(</span><span class="nf">\</span><span class="p">(</span><span class="n">p</span><span class="p">,</span> <span class="n">p'</span><span class="p">)</span> <span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">x'</span><span class="p">)</span> <span class="p">(</span><span class="n">r</span><span class="p">,</span> <span class="n">r'</span><span class="p">)</span> <span class="o">-></span>
<span class="kr">let</span> <span class="p">(</span><span class="n">s</span><span class="p">,</span> <span class="n">q</span><span class="p">)</span> <span class="o">=</span> <span class="n">put</span> <span class="n">p</span> <span class="n">x</span> <span class="n">r</span>
<span class="p">(</span><span class="n">s'</span><span class="p">,</span> <span class="n">q'</span><span class="p">)</span> <span class="o">=</span> <span class="n">put'</span> <span class="n">p'</span> <span class="n">x'</span> <span class="n">r'</span>
<span class="kr">in</span> <span class="p">((</span><span class="n">s</span><span class="p">,</span> <span class="n">s'</span><span class="p">),</span> <span class="p">(</span><span class="n">q</span><span class="p">,</span> <span class="n">q'</span><span class="p">))</span>
<span class="p">)</span>
</code></pre></div></div>
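<p>A quick sketch (the toy stages are ours) showing two independent lenses run side by side with <code>####</code>:</p>

```haskell
{-# LANGUAGE GADTs #-}

-- Definitions from above, repeated so this sketch is self-contained.
data ParaLens p q x s y r where
  MkLens :: (p -> x -> y) -> (p -> x -> r -> (s, q)) -> ParaLens p q x s y r

infixr 4 ####
(####) :: ParaLens p q x s y r -> ParaLens p' q' x' s' y' r'
       -> ParaLens (p, p') (q, q') (x, x') (s, s') (y, y') (r, r')
(MkLens get put) #### (MkLens get' put') =
  MkLens
    (\(p, p') (x, x') -> (get p x, get' p' x'))
    (\(p, p') (x, x') (r, r') ->
      let (s, q)   = put p x r
          (s', q') = put' p' x' r'
      in  ((s, s'), (q, q')))

-- Two independent toy stages running side by side: one adds its parameter
-- to its state, the other multiplies.
addStage, mulStage :: ParaLens Double Double Double Double Double Double
addStage = MkLens (\p x -> p + x) (\_ _ r -> (r, r))
mulStage = MkLens (\p x -> p * x) (\_ _ r -> (r, r))

sideBySide :: ParaLens (Double, Double) (Double, Double)
                       (Double, Double) (Double, Double)
                       (Double, Double) (Double, Double)
sideBySide = addStage #### mulStage

runGet :: ParaLens p q x s y r -> p -> x -> y
runGet (MkLens get _) = get
```

Each component acts on its own half of the paired state: with parameters <code>(1, 2)</code> and states <code>(10, 20)</code>, the forward pass yields <code>(1 + 10, 2 * 20)</code>.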
<p>Diagrammatically, this amounts to just putting the two lenses next to each other.</p>
<p><img src="/assetsPosts/2024-04-15-open-games-bootcamp-i/parallel_comp.png" alt="parallel lens composition" /></p>
<h2>Building Concrete Lenses</h2>
<p>Now that we have laid all the groundwork, let’s have a look at a couple of concrete examples of lenses.</p>
<h3>Lenses from Functions</h3>
<p>Our first source of lenses will be functions: For each function $f: X\to S$ there is a non-parametrized lens $\mathsf{F}:\binom{X}{S}\to\binom{\mathbf{1}}{\mathbf{1}}$ such that $\mathsf{get}(*,x)=*$ and $\mathsf{put}(*,x,*)=(f(x),*)$. Vice-versa, we can always extract a unique function from non-parametrized lenses of this kind.</p>
<div class="language-haskell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">funToCostate</span> <span class="o">::</span> <span class="p">(</span><span class="n">x</span> <span class="o">-></span> <span class="n">s</span><span class="p">)</span> <span class="o">-></span> <span class="kt">Lens</span> <span class="n">x</span> <span class="n">s</span> <span class="nb">()</span> <span class="nb">()</span>
<span class="n">funToCostate</span> <span class="n">f</span> <span class="o">=</span> <span class="n">nonPara</span> <span class="p">(</span><span class="n">const</span> <span class="nb">()</span><span class="p">)</span> <span class="p">(</span><span class="nf">\</span><span class="n">x</span> <span class="kr">_</span> <span class="o">-></span> <span class="n">f</span> <span class="n">x</span><span class="p">)</span>
<span class="n">costateToFun</span> <span class="o">::</span> <span class="kt">Lens</span> <span class="n">x</span> <span class="n">s</span> <span class="nb">()</span> <span class="nb">()</span> <span class="o">-></span> <span class="p">(</span><span class="n">x</span> <span class="o">-></span> <span class="n">s</span><span class="p">)</span>
<span class="n">costateToFun</span> <span class="p">(</span><span class="kt">MkLens</span> <span class="kr">_</span> <span class="n">f</span><span class="p">)</span> <span class="n">x</span> <span class="o">=</span> <span class="n">fst</span> <span class="o">$</span> <span class="n">f</span> <span class="nb">()</span> <span class="n">x</span> <span class="nb">()</span>
</code></pre></div></div>
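<p>As a quick sanity check (ours, not from the post), the two conversions are mutually inverse, pointwise:</p>

```haskell
{-# LANGUAGE GADTs #-}

-- Definitions from above, repeated so this sketch is self-contained.
data ParaLens p q x s y r where
  MkLens :: (p -> x -> y) -> (p -> x -> r -> (s, q)) -> ParaLens p q x s y r

type Lens = ParaLens () ()

nonPara :: (x -> y) -> (x -> r -> s) -> Lens x s y r
nonPara get put = MkLens (\_ x -> get x) (\_ x r -> (put x r, ()))

funToCostate :: (x -> s) -> Lens x s () ()
funToCostate f = nonPara (const ()) (\x _ -> f x)

costateToFun :: Lens x s () () -> (x -> s)
costateToFun (MkLens _ f) x = fst $ f () x ()

-- Converting a function to a costate lens and back gives the function we
-- started with.
roundTrip :: (x -> s) -> x -> s
roundTrip = costateToFun . funToCostate
```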
<p>Similarly, for each function $f: P\to Q$ there is a parametrized lens \(\bar{\mathsf{F}}:\binom{\mathbf{1}}{\mathbf{1}}\to\binom{\mathbf{1}}{\mathbf{1}}\) with parameters \(\binom{P}{Q}\), such that $\mathsf{get}(*,*)=*$ and $\mathsf{put}(p,*,*)=(*,f(p))$. Likewise, we can always extract a unique function from parametrized lenses of this kind.</p>
<div class="language-haskell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">funToParaState</span> <span class="o">::</span> <span class="p">(</span><span class="n">p</span> <span class="o">-></span> <span class="n">q</span><span class="p">)</span> <span class="o">-></span> <span class="kt">ParaLens</span> <span class="n">p</span> <span class="n">q</span> <span class="nb">()</span> <span class="nb">()</span> <span class="nb">()</span> <span class="nb">()</span>
<span class="n">funToParaState</span> <span class="n">f</span> <span class="o">=</span> <span class="kt">MkLens</span> <span class="p">(</span><span class="nf">\</span><span class="kr">_</span> <span class="kr">_</span> <span class="o">-></span> <span class="nb">()</span><span class="p">)</span> <span class="p">(</span><span class="nf">\</span><span class="n">p</span> <span class="kr">_</span> <span class="kr">_</span> <span class="o">-></span> <span class="p">(</span><span class="nb">()</span><span class="p">,</span> <span class="n">f</span> <span class="n">p</span><span class="p">))</span>
<span class="n">paraStateToFun</span> <span class="o">::</span> <span class="kt">ParaLens</span> <span class="n">p</span> <span class="n">q</span> <span class="nb">()</span> <span class="nb">()</span> <span class="nb">()</span> <span class="nb">()</span> <span class="o">-></span> <span class="p">(</span><span class="n">p</span> <span class="o">-></span> <span class="n">q</span><span class="p">)</span>
<span class="n">paraStateToFun</span> <span class="p">(</span><span class="kt">MkLens</span> <span class="kr">_</span> <span class="n">coplay</span><span class="p">)</span> <span class="n">p</span> <span class="o">=</span> <span class="n">snd</span> <span class="o">$</span> <span class="n">coplay</span> <span class="n">p</span> <span class="nb">()</span> <span class="nb">()</span>
</code></pre></div></div>
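<p>The same pointwise round-trip check works here (the sketch is ours, with the definitions re-declared locally and the converter spelled <code>paraStateToFun</code>):</p>

```haskell
{-# LANGUAGE GADTs #-}

-- Definitions from above, repeated so this sketch is self-contained.
data ParaLens p q x s y r where
  MkLens :: (p -> x -> y) -> (p -> x -> r -> (s, q)) -> ParaLens p q x s y r

funToParaState :: (p -> q) -> ParaLens p q () () () ()
funToParaState f = MkLens (\_ _ -> ()) (\p _ _ -> ((), f p))

paraStateToFun :: ParaLens p q () () () () -> (p -> q)
paraStateToFun (MkLens _ coplay) p = snd $ coplay p () ()

-- Converting a function to a parametrized state lens and back gives the
-- function we started with.
roundTrip :: (p -> q) -> p -> q
roundTrip = paraStateToFun . funToParaState
```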
<h3>Lenses from Scalars</h3>
<p>For each value $\bar{y}\in Y$ and for any set $R$ we can build a non-parametrized lens \(\mathcal{S}_\bar{y}:\binom{\mathbf{1}}{\mathbf{1}}\to\binom{Y}{R}\) such that \(\mathsf{get}(*,*)=\bar{y}\) and \(\mathsf{put}(*,*,r)=(*,*)\).</p>
<div class="language-haskell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">scalarToState</span> <span class="o">::</span> <span class="n">y</span> <span class="o">-></span> <span class="kt">Lens</span> <span class="nb">()</span> <span class="nb">()</span> <span class="n">y</span> <span class="n">r</span>
<span class="n">scalarToState</span> <span class="n">y</span> <span class="o">=</span> <span class="n">nonPara</span> <span class="p">(</span><span class="n">const</span> <span class="n">y</span><span class="p">)</span> <span class="n">const</span>
<span class="n">stateToScalar</span> <span class="o">::</span> <span class="kt">Lens</span> <span class="nb">()</span> <span class="nb">()</span> <span class="n">y</span> <span class="n">r</span> <span class="o">-></span> <span class="n">y</span>
<span class="n">stateToScalar</span> <span class="p">(</span><span class="kt">MkLens</span> <span class="n">get</span> <span class="kr">_</span><span class="p">)</span> <span class="o">=</span> <span class="n">get</span> <span class="nb">()</span> <span class="nb">()</span>
</code></pre></div></div>
<h3>The Identity Lens</h3>
<p>The <strong>Identity Lens</strong> is a non-parametrized lens of type \(\binom{X}{S}\to\binom{X}{S}\) that serves as the identity morphism for parametrized lenses, i.e. pre-/post-composing a lens $\mathsf{A}$ with the identity lens gives you back $\mathsf{A}$ modulo readjusting the parameters (we will see how to do that in the next post). In Haskell:</p>
<div class="language-haskell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">idLens</span> <span class="o">::</span> <span class="kt">Lens</span> <span class="n">x</span> <span class="n">s</span> <span class="n">x</span> <span class="n">s</span>
<span class="n">idLens</span> <span class="o">=</span> <span class="n">nonPara</span> <span class="n">id</span> <span class="p">(</span><span class="nf">\</span><span class="kr">_</span> <span class="n">x</span> <span class="o">-></span> <span class="n">x</span><span class="p">)</span>
</code></pre></div></div>
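<p>A quick check (ours) that <code>idLens</code> is indeed inert on both passes, at some concrete monomorphic types:</p>

```haskell
{-# LANGUAGE GADTs #-}

-- Definitions from above, repeated so this sketch is self-contained.
data ParaLens p q x s y r where
  MkLens :: (p -> x -> y) -> (p -> x -> r -> (s, q)) -> ParaLens p q x s y r

type Lens = ParaLens () ()

nonPara :: (x -> y) -> (x -> r -> s) -> Lens x s y r
nonPara get put = MkLens (\_ x -> get x) (\_ x r -> (put x r, ()))

idLens :: Lens x s x s
idLens = nonPara id (\_ x -> x)

-- The forward pass returns the state unchanged.
forwardId :: Int -> Int
forwardId x = case idLens :: Lens Int () Int () of
  MkLens get _ -> get () x

-- The backward pass returns the utility unchanged.
backwardId :: String -> String
backwardId r = case idLens :: Lens () String () String of
  MkLens _ put -> fst (put () () r)
```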
<h3>Corners</h3>
<p>(Right) <strong>Corners</strong> are parametrized lenses of type \(\binom{\mathbf{1}}{\mathbf{1}}\to\binom{Y}{R}\) with parameters $\binom{Y}{R}$ that bend parameter wires into right wires, such that \(\mathsf{get}(y,*)=y\) and \(\mathsf{put}(y,*,r)=(*,r)\).</p>
<div class="language-haskell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">corner</span> <span class="o">::</span> <span class="kt">ParaLens</span> <span class="n">y</span> <span class="n">r</span> <span class="nb">()</span> <span class="nb">()</span> <span class="n">y</span> <span class="n">r</span>
<span class="n">corner</span> <span class="o">=</span> <span class="kt">MkLens</span> <span class="n">const</span> <span class="p">(</span><span class="nf">\</span><span class="kr">_</span> <span class="kr">_</span> <span class="n">r</span> <span class="o">-></span> <span class="p">(</span><span class="nb">()</span><span class="p">,</span> <span class="n">r</span><span class="p">))</span>
</code></pre></div></div>
<p>And diagrammatically:
<img src="/assetsPosts/2024-04-15-open-games-bootcamp-i/corner.png" alt="corner lens" /></p>
<p>As we will see in later posts, corners are an important component of bimatrix games.</p>
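<p>To see the wire-bending concretely, here is a small check (ours): post-composing a corner with a costate lens built from a function <code>f</code> makes <code>f</code>’s output reappear on the coparameter wire of the composite:</p>

```haskell
{-# LANGUAGE GADTs #-}

-- Definitions from above, repeated so this sketch is self-contained.
data ParaLens p q x s y r where
  MkLens :: (p -> x -> y) -> (p -> x -> r -> (s, q)) -> ParaLens p q x s y r

type Lens = ParaLens () ()

nonPara :: (x -> y) -> (x -> r -> s) -> Lens x s y r
nonPara get put = MkLens (\_ x -> get x) (\_ x r -> (put x r, ()))

funToCostate :: (x -> s) -> Lens x s () ()
funToCostate f = nonPara (const ()) (\x _ -> f x)

corner :: ParaLens y r () () y r
corner = MkLens const (\_ _ r -> ((), r))

infixr 4 >>>>
(>>>>) :: ParaLens p q x s y r -> ParaLens p' q' y r z t
       -> ParaLens (p, p') (q, q') x s z t
(MkLens get put) >>>> (MkLens get' put') =
  MkLens
    (\(p, p') x -> get' p' (get p x))
    (\(p, p') x t ->
      let (r, q') = put' p' (get p x) t
          (s, q)  = put p x r
      in  (s, (q, q')))

-- The composite has type ParaLens (y, ()) (s, ()) () () () (): running its
-- put with parameter y delivers f y in the q slot of the coparameter pair.
bentApply :: (y -> s) -> y -> s
bentApply f y = case corner >>>> funToCostate f of
  MkLens _ put -> fst (snd (put (y, ()) () ()))
```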
<h2>Final Remarks</h2>
<p>Parametrized lenses are not only useful for reasoning about Open Games, but also serve as the base of <a href="https://arxiv.org/abs/2105.06332">a whole categorical framework</a> for reasoning about complex multi-agent systems, which has also been applied to <a href="https://arxiv.org/abs/2103.01931">gradient-based learning</a>, <a href="https://arxiv.org/abs/2206.04547">dynamic programming</a>, <a href="https://arxiv.org/abs/2404.02688">reinforcement learning</a>, <a href="https://arxiv.org/abs/2305.06112">Bayesian inference</a> and <a href="https://arxiv.org/abs/2203.15633">servers</a>, on top of various flavors of game theory (e.g. <a href="https://arxiv.org/abs/2105.06763">[2105.06763]</a>). Indeed, this categorical framework is so general and promising that we spawned an entire <a href="https://cybercat.institute">research institute</a> dedicated to it.</p>
<p>Phew! That’s all for today. I hope this introduction to the world of parametrized lenses has left you wanting more! I’ll see you in the next post, where we will explore how to handle spurious parameters with reparametrizations and how to model players and their agency with selection functions.</p>
<hr />
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>Without side-effects and/or emergent behavior. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p>Sometimes it will be useful to represent certain lenses in their unboxed form, with product-type wires decoupled, when reasoning pictorially; luckily, this approach to reasoning with lenses is still completely formal. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:4" role="doc-endnote">
<p>In mathematical lingo, one would say that parametrized lenses can be organized as the morphisms of a somewhat complicated <a href="https://en.wikipedia.org/wiki/Monoidal_category"><strong>monoidal category</strong></a>-like structure called a <a href="https://ncatlab.org/nlab/show/monoidal+bicategory"><strong>symmetric monoidal bicategory</strong></a>. This is not a 1-category on the nose, since there are some issues with the bracketing of parameters after sequential composition that make associativity hold only up to isomorphism. <a href="#fnref:4" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:5" role="doc-endnote">
<p>Actually, there is a useful generalization of the (parametrized) lens definition, called (parametrized) optics, which allows exactly this; on top of other operational advantages over the lens definition, optics also allow expanding the “classical” definition of Open Games to Bayesian Game Theory and more. <a href="#fnref:5" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>Daniele PalombiThe first installment of a multi-part series demystifying the underlying mechanics of the open games engine in a simple manner.Building a Neural Network from First Principles using Free Categories and Para(Optic)2024-04-15T00:00:00+00:002024-04-15T00:00:00+00:00https://cybercat-institute.github.io//2024/04/15/neural-network-first-principles<h2>Introduction</h2>
<p>Category theory for machine learning has been a big topic recently, both with <a href="https://arxiv.org/abs/2403.13001">Bruno’s thesis</a> dropping, and the <a href="https://arxiv.org/abs/2402.15332">paper on using the Para construction for deep learning</a>.</p>
<p>In this post we will look at how dependent types can allow us to almost effortlessly implement the category theory directly, opening up a path to new generalisations.</p>
<p>I will be making heavy use of Tatsuya Hirose’s <a href="https://zenn.dev/lotz/articles/14458f024674e14f4134">code that implements the Para(Optic) construction in Haskell</a>. Our goal here is to show that when we make the category theory in the code explicit, it becomes a powerful scaffolding that lets us structure our program.</p>
<p>All in all, our goal is to formulate this: a simple neural network with static types enforcing the parameter, input, and output dimensions.</p>
<div class="language-idris highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kr">import</span> <span class="nn">Data.Fin</span>
<span class="kr">import</span> <span class="nn">Data.Vect</span>
<span class="nf">model :</span> <span class="kt">GPath</span> <span class="kt">ParaLensTensor</span> <span class="p">[</span><span class="o"><</span> <span class="p">[</span><span class="mf">4</span><span class="p">,</span> <span class="mf">2</span><span class="p">],</span> <span class="p">[</span><span class="mf">4</span><span class="p">],</span> <span class="p">[</span><span class="mf">0</span><span class="p">],</span> <span class="p">[</span><span class="mf">2</span><span class="p">,</span> <span class="mf">4</span><span class="p">],</span> <span class="p">[</span><span class="mf">2</span><span class="p">],</span> <span class="p">[</span><span class="mf">0</span><span class="p">]]</span> <span class="p">[</span><span class="mf">2</span><span class="p">]</span> <span class="p">[</span><span class="mf">2</span><span class="p">]</span>
<span class="n">model</span> <span class="o">=</span> <span class="p">[</span><span class="o"><</span> <span class="n">linear</span><span class="p">,</span> <span class="n">bias</span><span class="p">,</span> <span class="n">relu</span><span class="p">,</span> <span class="n">linear</span><span class="p">,</span> <span class="n">bias</span><span class="p">,</span> <span class="n">relu</span><span class="p">]</span>
</code></pre></div></div>
<p>The crucial part is the $\mathbf{Para}$ construction, which lets us accumulate parameters along the composition of edges. This lets us state the parameters of each edge separately, and then compose them into a larger whole as we go along.</p>
<h2>Graded monoids</h2>
<p>$\mathbf{Para}$ forms a graded category, and in order to understand what this is we will start with a graded monoid first.</p>
<div class="language-idris highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kr">namespace</span> <span class="nc">Monoid
</span> <span class="kr">data</span> <span class="kt">Env</span> <span class="o">:</span> <span class="p">(</span><span class="n">par</span> <span class="o">-></span> <span class="kt">Type</span><span class="p">)</span> <span class="o">-></span> <span class="kt">List </span><span class="n">par</span> <span class="o">-></span> <span class="kt">Type </span><span class="kr">where</span>
<span class="c1">-- Empty list</span>
<span class="kt">Nil</span> <span class="o">:</span> <span class="kt">Env</span> <span class="n">f</span> <span class="kt">[]</span>
<span class="c1">-- Add an element to the list, and accumulate its parameter</span>
<span class="p">(</span><span class="o">::</span><span class="p">)</span> <span class="o">:</span> <span class="p">{</span><span class="n">f</span> <span class="o">:</span> <span class="n">par</span> <span class="o">-></span> <span class="kt">Type</span><span class="p">}</span> <span class="o">-></span> <span class="n">f</span> <span class="n">n</span> <span class="o">-></span> <span class="kt">Env</span> <span class="n">f</span> <span class="n">ns</span> <span class="o">-></span> <span class="kt">Env</span> <span class="n">f</span> <span class="p">(</span><span class="n">n</span><span class="o">::</span><span class="n">ns</span><span class="p">)</span>
<span class="c1">-- Compare this to the standard free monoid </span>
<span class="c1">-- data List : Type -> Type where </span>
<span class="c1">-- Nil : List a </span>
<span class="c1">-- (::) : a -> List a -> List a </span>
</code></pre></div></div>
<p>I used this datatype in a <a href="https://zanzix.github.io/posts/stlc-idris.html">previous blog post</a> where it is used to represent variable environments.</p>
<p>We can use it for much more, though. For instance, let’s say that we want to aggregate a series of vectors, and later perform some computation on them.</p>
<p>Our free graded monoid lets us accumulate a list of vectors, while keeping their sizes in a type-level list.</p>
<div class="language-idris highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="kt">Vec</span> <span class="o">:</span> <span class="kt">Nat </span><span class="o">-></span> <span class="kt">Type </span>
<span class="kt">Vec</span> <span class="n">n</span> <span class="o">=</span> <span class="kt">Fin</span> <span class="n">n</span> <span class="o">-></span> <span class="kt">Double</span>
<span class="n">f1</span> <span class="o">:</span> <span class="kt">Vec</span> <span class="mf">1</span>
<span class="n">f2</span> <span class="o">:</span> <span class="kt">Vec</span> <span class="mf">2</span>
<span class="n">f3</span> <span class="o">:</span> <span class="kt">Vec</span> <span class="mf">3</span>
<span class="n">fins</span> <span class="o">:</span> <span class="kt">Env</span> <span class="kt">Vec</span> <span class="p">[</span><span class="mf">1</span><span class="p">,</span> <span class="mf">2</span><span class="p">,</span> <span class="mf">3</span><span class="p">]</span>
<span class="n">fins</span> <span class="o">=</span> <span class="p">[</span><span class="n">f1</span><span class="p">,</span> <span class="n">f2</span><span class="p">,</span> <span class="n">f3</span><span class="p">]</span>
</code></pre></div></div>
<p>As we will soon see, $\mathbf{Para}$ works the same way, but instead of forming a graded monoid, it forms a graded category.</p>
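<p>To make this parallel concrete, note that graded lists compose by concatenation, accumulating their grades at the type level. A minimal sketch, assuming the <code>Env</code> datatype above (the name <code>append</code> is mine and does not appear in the original code):</p>

```idris
-- Concatenating two graded lists appends their type-level grades.
-- (Illustrative sketch; `append` is a hypothetical helper.)
append : Env f ns -> Env f ms -> Env f (ns ++ ms)
append Nil       ys = ys
append (x :: xs) ys = x :: append xs ys
```

<p>This is exactly the shape that a graded category generalises: sequential composition of morphisms concatenates their parameter lists.</p>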
<h2>Free categories</h2>
<p>Before we look at free graded categories, let’s first look at how to work with a plain free category. I’ve used them in another <a href="https://zanzix.github.io/posts/bcc.html">previous blog post</a>.
A nice trick that I’ve learned from André Videla is that we can use Idris notation for lists with free categories too; we just need to name the constructors appropriately.</p>
<div class="language-idris highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">Graph :</span> <span class="kt">Type </span><span class="o">-></span> <span class="kt">Type </span>
<span class="kt">Graph</span> <span class="n">obj</span> <span class="o">=</span> <span class="n">obj</span> <span class="o">-></span> <span class="n">obj</span> <span class="o">-></span> <span class="kt">Type </span>
<span class="c1">-- The category of types and functions</span>
<span class="nf">Set :</span> <span class="kt">Graph</span> <span class="kt">Type
Set</span> <span class="n">a</span> <span class="n">b</span> <span class="o">=</span> <span class="n">a</span> <span class="o">-></span> <span class="n">b</span>
<span class="kr">namespace</span> <span class="kt">Cat</span>
<span class="kr">data</span> <span class="kt">Path</span> <span class="o">:</span> <span class="kt">Graph</span> <span class="n">obj</span> <span class="o">-></span> <span class="kt">Graph</span> <span class="n">obj</span> <span class="kr">where</span>
<span class="c1">-- Empty path</span>
<span class="kt">Nil</span> <span class="o">:</span> <span class="kt">Path</span> <span class="n">g</span> <span class="n">a</span> <span class="n">a</span>
<span class="c1">-- Add an edge to the path </span>
<span class="p">(</span><span class="o">::</span><span class="p">)</span> <span class="o">:</span> <span class="n">g</span> <span class="n">a</span> <span class="n">b</span> <span class="o">-></span> <span class="kt">Path</span> <span class="n">g</span> <span class="n">b</span> <span class="n">c</span> <span class="o">-></span> <span class="kt">Path</span> <span class="n">g</span> <span class="n">a</span> <span class="n">c</span>
</code></pre></div></div>
<p>While vectors form graded monoids, matrices form categories.</p>
<div class="language-idris highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="kt">Matrix</span> <span class="o">:</span> <span class="kt">Graph</span> <span class="kt">Nat </span>
<span class="kt">Matrix</span> <span class="n">n</span> <span class="n">m</span> <span class="o">=</span> <span class="kt">Fin</span> <span class="n">n</span> <span class="o">-></span> <span class="kt">Fin</span> <span class="n">m</span> <span class="o">-></span> <span class="kt">Double</span>
<span class="n">mat1</span> <span class="o">:</span> <span class="kt">Matrix</span> <span class="mf">2</span> <span class="mf">3</span>
<span class="n">mat2</span> <span class="o">:</span> <span class="kt">Matrix</span> <span class="mf">3</span> <span class="mf">1</span>
<span class="n">matrixPath</span> <span class="o">:</span> <span class="kt">Path</span> <span class="kt">Matrix</span> <span class="mf">2</span> <span class="mf">1</span>
<span class="n">matrixPath</span> <span class="o">=</span> <span class="p">[</span><span class="n">mat1</span><span class="p">,</span> <span class="n">mat2</span><span class="p">]</span>
<span class="c1">-- matrixPath = mat1 :: mat2 :: Nil</span>
</code></pre></div></div>
<p>Just as we did at the start of the blog post, we are using the inbuilt syntactic sugar to represent a list of edges. We will now generalise from free paths to their parameterised variant!</p>
<h2>Free graded categories</h2>
<p>A free graded category looks not unlike a free category, except now we are accumulating an additional parameter, just as we did with graded monoids:</p>
<div class="language-idris highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">ParGraph :</span> <span class="kt">Type </span><span class="o">-></span> <span class="kt">Type </span><span class="o">-></span> <span class="kt">Type </span>
<span class="kt">ParGraph</span> <span class="n">par</span> <span class="n">obj</span> <span class="o">=</span> <span class="n">par</span> <span class="o">-></span> <span class="n">obj</span> <span class="o">-></span> <span class="n">obj</span> <span class="o">-></span> <span class="kt">Type </span>
<span class="c1">-- A free graded path over a parameterised graph</span>
<span class="kr">data</span> <span class="kt">GPath</span> <span class="o">:</span> <span class="kt">ParGraph</span> <span class="n">par</span> <span class="n">obj</span> <span class="o">-></span> <span class="kt">ParGraph</span> <span class="p">(</span><span class="kt">List </span><span class="n">par</span><span class="p">)</span> <span class="n">obj</span> <span class="kr">where</span>
<span class="c1">-- Empty path, with an empty list of grades</span>
<span class="kt">Nil</span> <span class="o">:</span> <span class="kt">GPath</span> <span class="n">g</span> <span class="kt">[]</span> <span class="n">a</span> <span class="n">a</span>
<span class="c1">-- Add an edge to the path, and accumulate its parameter</span>
<span class="p">(</span><span class="o">::</span><span class="p">)</span> <span class="o">:</span> <span class="p">{</span><span class="n">g</span> <span class="o">:</span> <span class="n">par</span> <span class="o">-></span> <span class="n">obj</span> <span class="o">-></span> <span class="n">obj</span> <span class="o">-></span> <span class="kt">Type</span><span class="p">}</span> <span class="o">-></span> <span class="p">{</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span> <span class="o">:</span> <span class="n">obj</span><span class="p">}</span>
<span class="o">-></span> <span class="n">g</span> <span class="n">p</span> <span class="n">a</span> <span class="n">b</span> <span class="o">-></span> <span class="kt">GPath</span> <span class="n">g</span> <span class="n">ps</span> <span class="n">b</span> <span class="n">c</span> <span class="o">-></span> <span class="kt">GPath</span> <span class="n">g</span> <span class="p">(</span><span class="n">p</span> <span class="o">::</span> <span class="n">ps</span><span class="p">)</span> <span class="n">a</span> <span class="n">c</span>
</code></pre></div></div>
<p>So a graded path will take in a parameterised graph, and give back a path of edges with an accumulated parameter.
Where could we find such parameterised graphs? This is where the $\mathbf{Para}$ construction comes in.
$\mathbf{Para}$ takes a category $\mathcal C$ together with an action of a monoidal category $\mathcal M$ on it, i.e. a functor $\mathcal M \times \mathcal C \to \mathcal C$, and gives us a parameterised category over $\mathcal C$.</p>
<div class="language-idris highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- Para over a monoidal category C </span>
<span class="nf">Para :</span> <span class="p">(</span><span class="n">c</span> <span class="o">:</span> <span class="kt">Graph</span> <span class="n">obj</span><span class="p">)</span> <span class="o">-></span> <span class="p">(</span><span class="n">act</span> <span class="o">:</span> <span class="n">par</span> <span class="o">-></span> <span class="n">obj</span> <span class="o">-></span> <span class="n">obj</span><span class="p">)</span> <span class="o">-></span> <span class="kt">ParGraph</span> <span class="n">par</span> <span class="n">obj</span>
<span class="kt">Para</span> <span class="n">c</span> <span class="n">act</span> <span class="n">p</span> <span class="n">x</span> <span class="n">y</span> <span class="o">=</span> <span class="p">(</span><span class="n">p</span> <span class="p">`</span><span class="n">act</span><span class="p">`</span> <span class="n">x</span><span class="p">)</span> <span class="p">`</span><span class="n">c</span><span class="p">`</span> <span class="n">y</span>
</code></pre></div></div>
<p>In other words, we have morphisms and an accumulating parameter.
A simple example is the graded co-reader comonad, also known as the pair comonad.</p>
<div class="language-idris highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">ParaSet :</span> <span class="kt">ParGraph</span> <span class="kt">Type Type </span>
<span class="kt">ParaSet</span> <span class="n">p</span> <span class="n">a</span> <span class="n">b</span> <span class="o">=</span> <span class="kt">Para</span> <span class="kt">Set</span> <span class="kt">Pair</span> <span class="n">p</span> <span class="n">a</span> <span class="n">b</span>
<span class="c1">-- A function Nat -> Double, parameterised by Nat</span>
<span class="nf">pair1 :</span> <span class="kt">ParaSet</span> <span class="kt">Nat Nat Double</span>
<span class="c1">-- A function Double -> Int, parameterised by String</span>
<span class="nf">pair2 :</span> <span class="kt">ParaSet</span> <span class="kt">String</span> <span class="kt">Double</span> <span class="kt">Int</span>
<span class="c1">-- A function Nat -> Int, parameterised by [Nat, String]</span>
<span class="nf">ex :</span> <span class="kt">GPath</span> <span class="kt">ParaSet</span> <span class="p">[</span><span class="kt">Nat</span><span class="p">,</span> <span class="kt">String</span><span class="p">]</span> <span class="kt">Nat Int</span>
<span class="n">ex</span> <span class="o">=</span> <span class="p">[</span><span class="n">pair1</span><span class="p">,</span> <span class="n">pair2</span><span class="p">]</span>
</code></pre></div></div>
<p>It works a lot like the standard co-reader comonad, but it now accumulates parameters as we compose functions.</p>
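<p>To see the accumulation explicitly, we can compose two parameterised functions by hand. A sketch, assuming the definitions above (the name <code>compPara</code> and the choice of pairing the parameters are my own):</p>

```idris
-- Composing two Para morphisms by hand pairs up their parameters.
-- ParaSet p a b unfolds to (p, a) -> b under the definitions above.
compPara : ParaSet p a b -> ParaSet q b c -> ParaSet (p, q) a c
compPara f g ((p, q), a) = g (q, f (p, a))
```

<p>The free graded path <code>GPath</code> does the same thing, except that it accumulates the parameters into a type-level list rather than nested pairs.</p>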
<h2>The category of lenses</h2>
<p>Functional programmers tend to be familiar with lenses. They are often presented as coalgebras of the costate comonad, and their links to automatic differentiation <a href="https://www.philipzucker.com/reverse-mode-differentiation-is-kind-of-like-a-lens-ii/">are now well known</a>.</p>
<p>Monomorphic lenses correspond to the plain costate comonad, and polymorphic lenses correspond to the indexed version.</p>
<div class="language-idris highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- Monomorphic Lens</span>
<span class="nf">MLens :</span> <span class="kt">Type </span><span class="o">-></span> <span class="kt">Type </span><span class="o">-></span> <span class="kt">Type </span>
<span class="kt">MLens</span> <span class="n">s</span> <span class="n">a</span> <span class="o">=</span> <span class="p">(</span><span class="n">s</span> <span class="o">-></span> <span class="n">a</span><span class="p">,</span> <span class="p">(</span><span class="n">s</span><span class="p">,</span> <span class="n">a</span><span class="p">)</span> <span class="o">-></span> <span class="n">s</span><span class="p">)</span>
<span class="c1">-- Polymorphic Lens, Haskell-style</span>
<span class="nf">Lens :</span> <span class="kt">Type </span><span class="o">-></span> <span class="kt">Type </span><span class="o">-></span> <span class="kt">Type </span><span class="o">-></span> <span class="kt">Type </span><span class="o">-></span> <span class="kt">Type
Lens</span> <span class="n">s</span> <span class="n">t</span> <span class="n">a</span> <span class="n">b</span> <span class="o">=</span> <span class="p">(</span><span class="n">s</span> <span class="o">-></span> <span class="n">a</span><span class="p">,</span> <span class="p">(</span><span class="n">s</span><span class="p">,</span> <span class="n">b</span><span class="p">)</span> <span class="o">-></span> <span class="n">t</span><span class="p">)</span>
</code></pre></div></div>
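<p>For reference, here is how the category structure on polymorphic lenses looks when written out directly. This is a sketch under the pair representation above (the names <code>idLens</code> and <code>compLens</code> are mine):</p>

```idris
-- Identity lens: get is the identity, put returns the new value.
idLens : Lens s t s t
idLens = (id, \(_, b) => b)

-- Sequential composition: get through both lenses;
-- put updates the inner focus first, then the outer one.
compLens : Lens s t a b -> Lens a b x y -> Lens s t x y
compLens (get1, put1) (get2, put2) =
  (get2 . get1, \(s, y) => put1 (s, put2 (get1 s, y)))
```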
<p>Idris allows us to bundle up the arguments for a polymorphic lens into a pair, sometimes called a boundary. This will help us form the category of parametric lenses more cleanly, as well as cut down on the number of types that we need to wrangle.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Boundary : Type
Boundary = (Type, Type)
-- Polymorphic lenses are morphisms of boundaries
Lens : Boundary -> Boundary -> Type
Lens (s, t) (a, b) = (s -> a, (s, b) -> t)
</code></pre></div></div>
<p>Both monomorphic and polymorphic lenses form categories. But before we look at them, let’s generalise our notion of lens away from $\mathbf{Set}$ and towards arbitrary (cartesian) monoidal categories.</p>
<p>In other words, given a cartesian monoidal category $\mathcal C$, we want to form the category $\mathbf{Lens} (\mathcal C)$ of lenses over $\mathcal C$.</p>
<div class="language-idris highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- take a category C, and a cartesian monoidal product, to give back the category Lens(C) </span>
<span class="nf">LensC :</span> <span class="p">(</span><span class="n">c</span> <span class="o">:</span> <span class="kt">Graph</span> <span class="n">obj</span><span class="p">)</span> <span class="o">-></span> <span class="p">(</span><span class="n">ten</span><span class="o">:</span> <span class="n">obj</span> <span class="o">-></span> <span class="n">obj</span> <span class="o">-></span> <span class="n">obj</span><span class="p">)</span> <span class="o">-></span> <span class="kt">Graph</span> <span class="n">obj</span>
<span class="kt">LensC</span> <span class="n">c</span> <span class="n">ten</span> <span class="n">s</span> <span class="n">a</span> <span class="o">=</span> <span class="p">(</span><span class="n">s</span> <span class="p">`</span><span class="n">c</span><span class="p">`</span> <span class="n">a</span><span class="p">,</span> <span class="p">(</span><span class="n">s</span> <span class="p">`</span><span class="n">ten</span><span class="p">`</span> <span class="n">a</span><span class="p">)</span> <span class="p">`</span><span class="n">c</span><span class="p">`</span> <span class="n">s</span><span class="p">)</span>
</code></pre></div></div>
<p>We then take $\mathbf{Para}$ of this construction, giving us the category $\mathbf{Para} (\mathbf{Lens} (\mathcal C))$.</p>
<div class="language-idris highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">ParaLensSet :</span> <span class="kt">ParGraph</span> <span class="kt">Type Type </span>
<span class="kt">ParaLensSet</span> <span class="n">p</span> <span class="n">s</span> <span class="n">t</span> <span class="o">=</span> <span class="kt">Para</span> <span class="p">(</span><span class="kt">LensC</span> <span class="kt">Set</span> <span class="kt">Pair</span><span class="p">)</span> <span class="kt">Pair</span> <span class="n">p</span> <span class="n">s</span> <span class="n">t</span>
</code></pre></div></div>
<p>We now have all the theoretical pieces together. At this point, we could simply implement $\mathbf{Para} (\mathbf{Lens} (\mathbf{Set}))$, which would give us the morphisms of our neural network. However, there is one more trick up our sleeve: rather than working in the category of sets, we would like to work in the category of vector spaces.</p>
<p>This means that we will parameterise the above construction to work over some monoidal functor $\mathcal C \to \mathbf{Set}$.</p>
<div class="language-idris highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">ParaLensF :</span> <span class="p">(</span><span class="n">f</span> <span class="o">:</span> <span class="n">k</span> <span class="o">-></span> <span class="kt">Type</span><span class="p">)</span> <span class="o">-></span> <span class="kt">ParGraph</span> <span class="n">k</span> <span class="n">k</span>
<span class="kt">ParaLensF</span> <span class="n">f</span> <span class="n">p</span> <span class="n">m</span> <span class="n">n</span> <span class="o">=</span> <span class="kt">ParaLensSet</span> <span class="p">(</span><span class="n">f</span> <span class="n">p</span><span class="p">)</span> <span class="p">(</span><span class="n">f</span> <span class="n">m</span><span class="p">)</span> <span class="p">(</span><span class="n">f</span> <span class="n">n</span><span class="p">)</span>
</code></pre></div></div>
<p>And now, let us proceed to do machine learning.</p>
<h2>Tensor algebra from first principles</h2>
<p>First we will introduce the type of tensors of arbitrary rank. Our first instinct would be to do this with a function</p>
<div class="language-idris highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">Tensor' :</span> <span class="kt">List Nat </span><span class="o">-></span> <span class="kt">Type </span>
<span class="kt">Tensor'</span> <span class="kt">[]</span> <span class="o">=</span> <span class="kt">Double</span>
<span class="kt">Tensor'</span> <span class="p">(</span><span class="n">n</span> <span class="o">::</span> <span class="n">ns</span><span class="p">)</span> <span class="o">=</span> <span class="kt">Fin</span> <span class="n">n</span> <span class="o">-></span> <span class="kt">Tensor'</span> <span class="n">ns</span>
</code></pre></div></div>
<p>But unfortunately this will mess up type inference down the line. Dependent types tend to struggle when it comes to inferring types whose codomain contains arbitrary computation. This is what Conor McBride calls “green slime”, and it is one of the major pitfalls that functional programmers encounter when they try to make the jump to dependent types.</p>
<p>For this reason, we will represent our rank-n tensors using a datatype, which will allow Idris to infer the types much more easily. Luckily, tensors are easily represented using an alternative datatype that’s popular in Haskell.</p>
<div class="language-idris highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kr">data</span> <span class="kt">Tensor</span> <span class="o">:</span> <span class="kt">List Nat </span><span class="o">-></span> <span class="kt">Type </span><span class="kr">where</span>
<span class="kt">Scalar</span> <span class="o">:</span> <span class="kt">Double</span> <span class="o">-></span> <span class="kt">Tensor</span> <span class="kt">Nil</span>
<span class="kt">Dim</span> <span class="o">:</span> <span class="kt">Vect</span> <span class="n">n</span> <span class="p">(</span><span class="kt">Tensor</span> <span class="n">ns</span><span class="p">)</span> <span class="o">-></span> <span class="kt">Tensor</span> <span class="p">(</span><span class="n">n</span> <span class="o">::</span> <span class="n">ns</span><span class="p">)</span>
</code></pre></div></div>
<p>This is essentially a nesting of vectors, which accumulates their sizes.</p>
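<p>For instance, a 2×3 matrix is written as a vector of two rows, each a vector of three scalars (an illustrative value, not from the original post):</p>

```idris
t : Tensor [2, 3]
t = Dim [ Dim [Scalar 1, Scalar 2, Scalar 3]
        , Dim [Scalar 4, Scalar 5, Scalar 6] ]
```

<p>Because the sizes are accumulated in the index, a row of the wrong length simply fails to type-check.</p>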
<p>All together, our datatype of parameterised lenses over tensors becomes</p>
<div class="language-idris highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">ParaLensTensor :</span> <span class="kt">ParGraph</span> <span class="p">(</span><span class="kt">List Nat</span><span class="p">)</span> <span class="p">(</span><span class="kt">List Nat</span><span class="p">)</span>
<span class="kt">ParaLensTensor</span> <span class="n">pars</span> <span class="n">ms</span> <span class="n">ns</span> <span class="o">=</span> <span class="kt">ParaLensF</span> <span class="kt">Tensor</span> <span class="n">pars</span> <span class="n">ms</span> <span class="n">ns</span>
</code></pre></div></div>
<p>We can now start writing neural networks. I’ll be mostly adapting <a href="https://zenn.dev/lotz/articles/14458f024674e14f4134">Tatsuya’s code</a> in the following section. The full code for our project can be found <a href="https://github.com/zanzix/idris-neural-net">here</a>, and I’ll only include the most interesting bits.</p>
<p>Unlike the original code, we will be using a heterogeneous list (rather than nested tuples) to keep track of all of our parameters, which makes the resulting dimensions much easier to track.</p>
<div class="language-idris highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">linear :</span> <span class="p">{</span><span class="n">n</span><span class="p">,</span> <span class="n">m</span> <span class="o">:</span> <span class="kt">Nat</span><span class="p">}</span> <span class="o">-></span> <span class="kt">ParaLensTensor</span> <span class="p">[</span><span class="n">m</span><span class="p">,</span> <span class="n">n</span><span class="p">]</span> <span class="p">[</span><span class="n">n</span><span class="p">]</span> <span class="p">[</span><span class="n">m</span><span class="p">]</span>
<span class="n">linear</span> <span class="o">=</span> <span class="p">(</span><span class="n">getter</span><span class="p">,</span> <span class="n">setter</span><span class="p">)</span> <span class="kr">where</span>
<span class="n">getter</span> <span class="o">:</span> <span class="p">(</span><span class="kt">Tensor</span> <span class="p">[</span><span class="n">m</span><span class="p">,</span> <span class="n">n</span><span class="p">],</span> <span class="kt">Tensor</span> <span class="p">[</span><span class="n">n</span><span class="p">])</span> <span class="o">-></span> <span class="kt">Tensor</span> <span class="p">[</span><span class="n">m</span><span class="p">]</span>
<span class="n">getter</span> <span class="p">(</span><span class="n">w</span><span class="p">,</span> <span class="n">x</span><span class="p">)</span> <span class="o">=</span> <span class="n">joinM</span> <span class="n">w</span> <span class="n">x</span>
<span class="n">setter</span> <span class="o">:</span> <span class="p">((</span><span class="kt">Tensor</span> <span class="p">[</span><span class="n">m</span><span class="p">,</span> <span class="n">n</span><span class="p">],</span> <span class="kt">Tensor</span> <span class="p">[</span><span class="n">n</span><span class="p">]),</span> <span class="kt">Tensor</span> <span class="p">[</span><span class="n">m</span><span class="p">])</span> <span class="o">-></span> <span class="p">(</span><span class="kt">Tensor</span> <span class="p">[</span><span class="n">m</span><span class="p">,</span> <span class="n">n</span><span class="p">],</span> <span class="kt">Tensor</span> <span class="p">[</span><span class="n">n</span><span class="p">])</span>
<span class="n">setter</span> <span class="p">((</span><span class="n">w</span><span class="p">,</span> <span class="n">x</span><span class="p">),</span> <span class="n">y</span><span class="p">)</span> <span class="o">=</span> <span class="p">(</span><span class="n">outer</span> <span class="n">y</span> <span class="n">x</span><span class="p">,</span> <span class="n">joinM</span> <span class="p">(</span><span class="n">dist</span> <span class="n">w</span><span class="p">)</span> <span class="n">y</span><span class="p">)</span>
<span class="nf">bias :</span> <span class="p">{</span><span class="n">n</span> <span class="o">:</span> <span class="kt">Nat</span><span class="p">}</span> <span class="o">-></span> <span class="kt">ParaLensTensor</span> <span class="p">[</span><span class="n">n</span><span class="p">]</span> <span class="p">[</span><span class="n">n</span><span class="p">]</span> <span class="p">[</span><span class="n">n</span><span class="p">]</span>
<span class="n">bias</span> <span class="o">=</span> <span class="p">(</span><span class="n">getter</span><span class="p">,</span> <span class="n">setter</span><span class="p">)</span> <span class="kr">where</span>
<span class="n">getter</span> <span class="o">:</span> <span class="p">(</span><span class="kt">Tensor</span> <span class="p">[</span><span class="n">n</span><span class="p">],</span> <span class="kt">Tensor</span> <span class="p">[</span><span class="n">n</span><span class="p">])</span> <span class="o">-></span> <span class="kt">Tensor</span> <span class="p">[</span><span class="n">n</span><span class="p">]</span>
<span class="n">getter</span> <span class="p">(</span><span class="n">b</span><span class="p">,</span> <span class="n">x</span><span class="p">)</span> <span class="o">=</span> <span class="n">pointwise</span> <span class="p">(</span><span class="o">+</span><span class="p">)</span> <span class="n">x</span> <span class="n">b</span>
<span class="n">setter</span> <span class="o">:</span> <span class="p">((</span><span class="kt">Tensor</span> <span class="p">[</span><span class="n">n</span><span class="p">],</span> <span class="kt">Tensor</span> <span class="p">[</span><span class="n">n</span><span class="p">]),</span> <span class="kt">Tensor</span> <span class="p">[</span><span class="n">n</span><span class="p">])</span> <span class="o">-></span> <span class="p">(</span><span class="kt">Tensor</span> <span class="p">[</span><span class="n">n</span><span class="p">],</span> <span class="kt">Tensor</span> <span class="p">[</span><span class="n">n</span><span class="p">])</span>
<span class="n">setter</span> <span class="p">((</span><span class="n">b</span><span class="p">,</span> <span class="n">x</span><span class="p">),</span> <span class="n">y</span><span class="p">)</span> <span class="o">=</span> <span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span>
<span class="nf">relu :</span> <span class="kt">ParaLensTensor</span> <span class="p">[</span><span class="mf">0</span><span class="p">]</span> <span class="p">[</span><span class="n">n</span><span class="p">]</span> <span class="p">[</span><span class="n">n</span><span class="p">]</span>
<span class="n">relu</span> <span class="o">=</span> <span class="p">(</span><span class="n">getter</span><span class="p">,</span> <span class="n">setter</span><span class="p">)</span> <span class="kr">where</span>
<span class="n">getter</span> <span class="o">:</span> <span class="p">(</span><span class="kt">Tensor</span> <span class="p">[</span><span class="mf">0</span><span class="p">],</span> <span class="kt">Tensor</span> <span class="p">[</span><span class="n">n</span><span class="p">])</span> <span class="o">-></span> <span class="kt">Tensor</span> <span class="p">[</span><span class="n">n</span><span class="p">]</span>
<span class="n">getter</span> <span class="p">(</span><span class="kr">_</span><span class="p">,</span> <span class="n">x</span><span class="p">)</span> <span class="o">=</span> <span class="n">dvmap</span> <span class="p">(</span><span class="nb">max </span><span class="mf">0.0</span><span class="p">)</span> <span class="n">x</span>
<span class="n">setter</span> <span class="o">:</span> <span class="p">((</span><span class="kt">Tensor</span> <span class="p">[</span><span class="mf">0</span><span class="p">],</span> <span class="kt">Tensor</span> <span class="p">[</span><span class="n">n</span><span class="p">]),</span> <span class="kt">Tensor</span> <span class="p">[</span><span class="n">n</span><span class="p">])</span> <span class="o">-></span> <span class="p">(</span><span class="kt">Tensor</span> <span class="p">[</span><span class="mf">0</span><span class="p">],</span> <span class="kt">Tensor</span> <span class="p">[</span><span class="n">n</span><span class="p">])</span>
<span class="n">setter</span> <span class="p">((</span><span class="kt">Dim</span> <span class="kt">[]</span><span class="p">,</span> <span class="n">x</span><span class="p">),</span> <span class="n">y</span><span class="p">)</span> <span class="o">=</span> <span class="p">(</span><span class="kt">Dim</span> <span class="kt">[]</span><span class="p">,</span> <span class="n">pointwise</span> <span class="p">(</span><span class="o">*</span><span class="p">)</span> <span class="n">y</span> <span class="p">(</span><span class="n">dvmap</span> <span class="n">step</span> <span class="n">x</span><span class="p">))</span> <span class="kr">where</span>
<span class="n">step</span> <span class="o">:</span> <span class="kt">Double</span> <span class="o">-></span> <span class="kt">Double</span>
<span class="n">step</span> <span class="n">x</span> <span class="o">=</span> <span class="kr">if</span> <span class="n">x</span> <span class="o">></span> <span class="mf">0</span> <span class="kr">then</span> <span class="mf">1</span> <span class="kr">else</span> <span class="mf">0</span>
<span class="nf">learningRate :</span> <span class="kt">ParaLensTensor</span> <span class="p">[</span><span class="mf">0</span><span class="p">]</span> <span class="kt">[]</span> <span class="p">[</span><span class="mf">0</span><span class="p">]</span>
<span class="n">learningRate</span> <span class="o">=</span> <span class="p">(</span><span class="nb">const </span><span class="p">(</span><span class="kt">Dim</span> <span class="kt">[]</span><span class="p">),</span> <span class="n">setter</span><span class="p">)</span> <span class="kr">where</span>
<span class="n">setter</span> <span class="o">:</span> <span class="p">((</span><span class="kt">Tensor</span> <span class="p">[</span><span class="mf">0</span><span class="p">],</span> <span class="kt">Tensor</span> <span class="kt">[]</span><span class="p">),</span> <span class="kt">Tensor</span> <span class="p">[</span><span class="mf">0</span><span class="p">])</span> <span class="o">-></span> <span class="p">(</span><span class="kt">Tensor</span> <span class="p">[</span><span class="mf">0</span><span class="p">],</span> <span class="kt">Tensor</span> <span class="kt">[]</span><span class="p">)</span>
<span class="n">setter</span> <span class="p">((</span><span class="kr">_</span><span class="p">,</span> <span class="p">(</span><span class="kt">Scalar</span> <span class="n">loss</span><span class="p">)),</span> <span class="kr">_</span><span class="p">)</span> <span class="o">=</span> <span class="p">(</span><span class="kt">Dim</span> <span class="kt">[]</span><span class="p">,</span> <span class="kt">Scalar</span> <span class="p">(</span><span class="o">-</span><span class="mf">0.2</span> <span class="o">*</span> <span class="n">loss</span><span class="p">))</span>
<span class="nf">crossEntropyLoss :</span> <span class="kt">ParaLensTensor</span> <span class="p">[</span><span class="n">n</span><span class="p">]</span> <span class="p">[</span><span class="n">n</span><span class="p">]</span> <span class="kt">[]</span>
<span class="n">crossEntropyLoss</span> <span class="o">=</span> <span class="p">(</span><span class="n">getter</span><span class="p">,</span> <span class="n">setter</span><span class="p">)</span> <span class="kr">where</span>
<span class="n">getter</span> <span class="o">:</span> <span class="p">(</span><span class="kt">Tensor</span> <span class="p">[</span><span class="n">n</span><span class="p">],</span> <span class="kt">Tensor</span> <span class="p">[</span><span class="n">n</span><span class="p">])</span> <span class="o">-></span> <span class="kt">Tensor</span> <span class="kt">[]</span>
<span class="n">getter</span> <span class="p">(</span><span class="n">y'</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span> <span class="o">=</span>
<span class="kr">let</span> <span class="kt">Scalar</span> <span class="n">dot'</span> <span class="o">=</span> <span class="n">dot</span> <span class="n">y'</span> <span class="n">y</span> <span class="kr">in</span>
<span class="kt">Scalar</span> <span class="p">(</span><span class="nb">log </span><span class="p">(</span><span class="n">sumElem</span> <span class="p">(</span><span class="n">dvmap</span> <span class="nb">exp </span><span class="n">y</span><span class="p">))</span> <span class="o">-</span> <span class="n">dot'</span><span class="p">)</span>
<span class="n">setter</span> <span class="o">:</span> <span class="p">((</span><span class="kt">Tensor</span> <span class="p">[</span><span class="n">n</span><span class="p">],</span> <span class="kt">Tensor</span> <span class="p">[</span><span class="n">n</span><span class="p">]),</span> <span class="kt">Tensor</span> <span class="kt">[]</span><span class="p">)</span> <span class="o">-></span> <span class="p">(</span><span class="kt">Tensor</span> <span class="p">[</span><span class="n">n</span><span class="p">],</span> <span class="kt">Tensor</span> <span class="p">[</span><span class="n">n</span><span class="p">])</span>
<span class="n">setter</span> <span class="p">((</span><span class="n">y'</span><span class="p">,</span> <span class="n">y</span><span class="p">),</span> <span class="p">(</span><span class="kt">Scalar</span> <span class="n">z</span><span class="p">))</span> <span class="o">=</span> <span class="kr">let</span>
<span class="n">expY</span> <span class="o">=</span> <span class="n">dvmap</span> <span class="nb">exp </span><span class="n">y</span>
<span class="n">sumExpY</span> <span class="o">=</span> <span class="n">sumElem</span> <span class="n">expY</span> <span class="kr">in</span>
<span class="p">(</span><span class="n">dvmap</span> <span class="p">(</span><span class="o">*</span> <span class="p">(</span><span class="o">-</span><span class="n">z</span><span class="p">))</span> <span class="n">y</span><span class="p">,</span>
<span class="n">dvmap</span> <span class="p">(</span><span class="o">*</span> <span class="n">z</span><span class="p">)</span> <span class="p">(</span>
<span class="p">((</span><span class="n">pointwise</span> <span class="p">(</span><span class="o">-</span><span class="p">)</span> <span class="p">(</span><span class="n">dvmap</span> <span class="p">(</span><span class="o">/</span><span class="n">sumExpY</span><span class="p">)</span> <span class="n">expY</span><span class="p">)</span> <span class="n">y'</span><span class="p">))))</span>
<span class="c1">-- Our final model: parameters source target</span>
<span class="nf">model :</span> <span class="kt">GPath</span> <span class="kt">ParaLensTensor</span> <span class="p">[</span><span class="o"><</span> <span class="p">[</span><span class="mf">4</span><span class="p">,</span> <span class="mf">2</span><span class="p">],</span> <span class="p">[</span><span class="mf">4</span><span class="p">],</span> <span class="p">[</span><span class="mf">0</span><span class="p">],</span> <span class="p">[</span><span class="mf">2</span><span class="p">,</span> <span class="mf">4</span><span class="p">],</span> <span class="p">[</span><span class="mf">2</span><span class="p">],</span> <span class="p">[</span><span class="mf">0</span><span class="p">]]</span> <span class="p">[</span><span class="mf">2</span><span class="p">]</span> <span class="p">[</span><span class="mf">2</span><span class="p">]</span>
<span class="n">model</span> <span class="o">=</span> <span class="p">[</span><span class="o"><</span> <span class="n">linear</span><span class="p">,</span> <span class="n">bias</span><span class="p">,</span> <span class="n">relu</span><span class="p">,</span> <span class="n">linear</span><span class="p">,</span> <span class="n">bias</span><span class="p">,</span> <span class="n">relu</span><span class="p">]</span>
</code></pre></div></div>
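<p>As an aside, the shape of these parametrised lenses is easy to play with outside Idris. Below is a minimal Python sketch of the <code class="language-plaintext highlighter-rouge">relu</code> lens above (an illustration only: tensors are modelled as plain lists of floats, and the trivial parameter as the empty tuple):</p>

```python
# A parametrised lens as a (getter, setter) pair of plain functions.
# relu takes no parameters, so its parameter is the empty tuple ().

def relu_get(params, x):
    # Forward pass: clip negative entries to zero
    return [max(0.0, v) for v in x]

def relu_set(params, x, dy):
    # Backward pass: the gradient flows only where the input was positive
    step = [1.0 if v > 0 else 0.0 for v in x]
    return (), [g * s for g, s in zip(dy, step)]

relu_get((), [-1.0, 2.0, 3.0])                   # [0.0, 2.0, 3.0]
relu_set((), [-1.0, 2.0, 3.0], [1.0, 1.0, 1.0])  # ((), [0.0, 1.0, 1.0])
```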
<p>All that remains is to implement an algebra for this structure. Normally we would use the generic recursion schemes machinery to do this, but for now we will implement a one-off fold specialized to graded paths.</p>
<div class="language-idris highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- Evaluate the free graded category over ParaLensTensor</span>
<span class="nf">eval :</span> <span class="kt">GPath</span> <span class="kt">ParaLensTensor</span> <span class="n">ps</span> <span class="n">s</span> <span class="n">t</span> <span class="o">-></span> <span class="kt">ParaLensTensorEnvS</span> <span class="n">ps</span> <span class="n">s</span> <span class="n">t</span>
<span class="n">eval</span> <span class="p">[</span><span class="o"><</span><span class="p">]</span> <span class="o">=</span> <span class="p">(</span><span class="nf">\</span><span class="p">(</span><span class="kr">_</span><span class="p">,</span> <span class="n">s</span><span class="p">)</span> <span class="o">=></span> <span class="n">s</span><span class="p">,</span> <span class="nf">\</span><span class="p">((</span><span class="n">l</span><span class="p">,</span> <span class="n">s'</span><span class="p">),</span> <span class="n">s</span><span class="p">)</span> <span class="o">=></span> <span class="p">([</span><span class="o"><</span><span class="p">],</span> <span class="n">s</span><span class="p">))</span>
<span class="n">eval</span> <span class="p">(</span><span class="n">es</span> <span class="o">:<</span> <span class="p">(</span><span class="n">fw</span><span class="p">,</span> <span class="n">bw</span><span class="p">))</span> <span class="o">=</span> <span class="kr">let</span> <span class="p">(</span><span class="n">fw'</span><span class="p">,</span> <span class="n">bw'</span><span class="p">)</span> <span class="o">=</span> <span class="n">eval</span> <span class="n">es</span> <span class="kr">in</span>
<span class="p">(</span><span class="nf">\</span><span class="p">((</span><span class="n">ps</span> <span class="o">:<</span> <span class="n">p</span><span class="p">),</span> <span class="n">s</span><span class="p">)</span> <span class="o">=></span> <span class="kr">let</span> <span class="n">b</span> <span class="o">=</span> <span class="n">fw'</span> <span class="p">(</span><span class="n">ps</span><span class="p">,</span> <span class="n">s</span><span class="p">)</span> <span class="kr">in</span> <span class="n">fw</span> <span class="p">(</span><span class="n">p</span><span class="p">,</span> <span class="n">b</span><span class="p">),</span>
<span class="p">(</span><span class="nf">\</span><span class="p">(((</span><span class="n">ps</span> <span class="o">:<</span> <span class="n">p</span><span class="p">),</span> <span class="n">s</span><span class="p">),</span> <span class="n">dt</span><span class="p">)</span> <span class="o">=></span> <span class="kr">let</span>
<span class="n">b</span> <span class="o">=</span> <span class="n">fw'</span> <span class="p">(</span><span class="n">ps</span><span class="p">,</span> <span class="n">s</span><span class="p">)</span>
<span class="p">(</span><span class="n">p'</span><span class="p">,</span> <span class="n">b'</span><span class="p">)</span> <span class="o">=</span> <span class="n">bw</span> <span class="p">((</span><span class="n">p</span><span class="p">,</span> <span class="n">b</span><span class="p">),</span> <span class="n">dt</span><span class="p">)</span>
<span class="p">(</span><span class="n">ps'</span><span class="p">,</span> <span class="n">s'</span><span class="p">)</span> <span class="o">=</span> <span class="n">bw'</span> <span class="p">((</span><span class="n">ps</span><span class="p">,</span> <span class="n">s</span><span class="p">),</span> <span class="n">b'</span><span class="p">)</span>
<span class="kr">in</span> <span class="p">(</span><span class="n">ps'</span> <span class="o">:<</span> <span class="n">p'</span><span class="p">,</span> <span class="n">s'</span><span class="p">)))</span>
</code></pre></div></div>
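<p>The shape of this fold also reads naturally in an untyped sketch. Here is a hypothetical Python version, with each lens a (getter, setter) pair and a list of parameters playing the role of the graded path:</p>

```python
# Evaluate a chain of parametrised lenses: thread the input forward through
# every getter, then push the feedback backward through every setter,
# collecting updated parameters along the way.

def eval_chain(lenses, params, x, feedback):
    inputs = []
    for (get, _), p in zip(lenses, params):
        inputs.append(x)        # setters need the original inputs
        x = get(p, x)
    output = x
    new_params = []
    for (_, set_), p, s in zip(reversed(lenses), reversed(params), reversed(inputs)):
        p2, feedback = set_(p, s, feedback)
        new_params.append(p2)
    new_params.reverse()
    return output, new_params, feedback

# Example: two "scale" lenses, p * x forwards, the product rule backwards
scale = (lambda p, x: p * x,
         lambda p, x, dy: (dy * x, dy * p))
eval_chain([scale, scale], [2.0, 3.0], 1.0, 1.0)   # (6.0, [3.0, 2.0], 6.0)
```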
<p>It would actually be possible to write individual algebras for $\mathbf{Lens} (\mathcal C)$ and $\mathbf{Para} (\mathcal C)$ and then compose them into an algebra for $\mathbf{Para} (\mathbf{Lens} (\mathcal C))$, but we can leave that for a future blog post.</p>
<h2>Defunctionalizing and working with the FFI</h2>
<p>Running a neural network in Idris is obviously going to be slow compared to NumPy. However, since we’re working entirely with free categories, we don’t actually have to evaluate our functions in Idris!</p>
<p>What we can do is organise all of our functions into a signature, where each constructor corresponds to a primitive function in the target language. We could then use the FFI to interpret them, allowing us to get both the static guarantees of Idris and the performance of NumPy.</p>
<div class="language-idris highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kr">data</span> <span class="kt">TensorSig</span> <span class="o">:</span> <span class="kt">ParGraph</span> <span class="p">(</span><span class="kt">List Nat</span><span class="p">)</span> <span class="p">(</span><span class="kt">List Nat</span><span class="p">)</span> <span class="kr">where</span>
<span class="kt">Linear</span> <span class="o">:</span> <span class="kt">TensorSig</span> <span class="p">[</span><span class="n">m</span><span class="p">,</span> <span class="n">n</span><span class="p">]</span> <span class="p">[</span><span class="n">n</span><span class="p">]</span> <span class="p">[</span><span class="n">m</span><span class="p">]</span>
<span class="kt">Bias</span> <span class="o">:</span> <span class="kt">TensorSig</span> <span class="p">[</span><span class="n">n</span><span class="p">]</span> <span class="p">[</span><span class="n">n</span><span class="p">]</span> <span class="p">[</span><span class="n">n</span><span class="p">]</span>
  <span class="kt">Relu</span> <span class="o">:</span> <span class="kt">TensorSig</span> <span class="p">[</span><span class="mf">0</span><span class="p">]</span> <span class="p">[</span><span class="n">n</span><span class="p">]</span> <span class="p">[</span><span class="n">n</span><span class="p">]</span>
  <span class="kt">LearningRate</span> <span class="o">:</span> <span class="kt">TensorSig</span> <span class="p">[</span><span class="mf">0</span><span class="p">]</span> <span class="kt">[]</span> <span class="p">[</span><span class="mf">0</span><span class="p">]</span>
<span class="kt">CrossEntropyLoss</span> <span class="o">:</span> <span class="kt">TensorSig</span> <span class="p">[</span><span class="n">n</span><span class="p">]</span> <span class="p">[</span><span class="n">n</span><span class="p">]</span> <span class="kt">[]</span>
<span class="kt">SoftMax</span> <span class="o">:</span> <span class="kt">TensorSig</span> <span class="p">[</span><span class="mf">0</span><span class="p">]</span> <span class="p">[</span><span class="n">n</span><span class="p">]</span> <span class="p">[</span><span class="n">n</span><span class="p">]</span>
<span class="nf">model' :</span> <span class="kt">GPath</span> <span class="kt">TensorSig</span> <span class="p">[</span><span class="o"><</span> <span class="p">[</span><span class="mf">4</span><span class="p">,</span> <span class="mf">2</span><span class="p">],</span> <span class="p">[</span><span class="mf">4</span><span class="p">],</span> <span class="p">[</span><span class="mf">0</span><span class="p">],</span> <span class="p">[</span><span class="mf">2</span><span class="p">,</span> <span class="mf">4</span><span class="p">],</span> <span class="p">[</span><span class="mf">2</span><span class="p">],</span> <span class="p">[</span><span class="mf">0</span><span class="p">]]</span> <span class="p">[</span><span class="mf">2</span><span class="p">]</span> <span class="p">[</span><span class="mf">2</span><span class="p">]</span>
<span class="n">model'</span> <span class="o">=</span> <span class="p">[</span><span class="o"><</span> <span class="kt">Linear</span><span class="p">,</span> <span class="kt">Bias</span><span class="p">,</span> <span class="kt">Relu</span><span class="p">,</span> <span class="kt">Linear</span><span class="p">,</span> <span class="kt">Bias</span><span class="p">,</span> <span class="kt">Relu</span><span class="p">]</span>
</code></pre></div></div>
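<p>An interpreter for such a signature is then just a mapping from constructors to functions. A toy Python sketch of the idea (hypothetical names — a real implementation would dispatch to NumPy through the FFI):</p>

```python
# Each constructor of the signature is a tag; the interpreter maps tags to
# actual forward functions in the target language.

def interp_forward(op, params, x):
    if op == "Bias":
        return [a + b for a, b in zip(x, params)]
    if op == "Relu":
        return [max(0.0, v) for v in x]
    raise ValueError(f"unknown primitive: {op}")

# A model is a list of (tag, parameters) pairs, run left to right
def run(model, x):
    for op, params in model:
        x = interp_forward(op, params, x)
    return x

run([("Bias", [1.0, -5.0]), ("Relu", None)], [2.0, 3.0])   # [3.0, 0.0]
```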
<p>We’ve also only sketched out the tensor operations here, but we could take this a step further and develop a proper tensor library in Idris.</p>
<p>In a future post, we will see how to enhance the above with auto-diff, meaning that the user only needs to supply the getter, and the setter will be derived automatically.</p>Zanzi MihejevsIn this post we will look at how dependent types can allow us to effortlessly implement the category theory of machine learning directly, opening up a path to new generalisations.Enriched Closed Lenses2024-04-12T00:00:00+00:002024-04-12T00:00:00+00:00https://cybercat-institute.github.io//2024/04/12/enriched-closed-lenses<p>I’m going to record something that I think is known to everyone doing research on categorical cybernetics, but I don’t think it has been written down anywhere: an even more general version of mixed optics that replaces the backwards actegory with an enrichment. With it, I’ll make sense of a curious definition appearing in <a href="https://homepages.inf.ed.ac.uk/gdp/publications/compiler-forest.pdf">The Compiler Forest</a>.</p>
<h1>Actegories and enrichments</h1>
<p>An <strong>actegory</strong> consists of a monoidal category $\mathcal M$, a category $\mathcal C$ and a functor $\bullet : \mathcal M \times \mathcal C \to \mathcal C$ that behaves like an “external product”: namely that it’s equipped with coherent isomorphisms $I \bullet X \cong X$ and $(M \otimes N) \bullet X \cong M \bullet (N \bullet X)$.</p>
<p>An <strong>enriched category</strong> consists of a category $\mathcal C$, a monoidal category $\mathcal M$ and a functor $[-, -] : \mathcal C^\mathrm{op} \times \mathcal C \to \mathcal M$ that behaves like an “external hom” (I’m not going to write down what this means because it’s more complicated).</p>
<p>There’s a very close relationship between actegories and enrichments, to the point that I consider them different perspectives on the same idea. This is the <em>final form</em> of the famous tensor-hom adjunction, a.k.a. currying. (I learned this incredible fact from Matteo Capucci, and I have no idea where it’s written down, although it’s definitely written down somewhere.)</p>
<p>A <strong>tensored enrichment</strong> is one where every $[Z, -] : \mathcal C \to \mathcal M$ has a left adjoint $- \bullet X : \mathcal M \to \mathcal C$. Allowing $Z$ to vary results in a functor $\bullet$ which (nontrivial theorem) is always an actegory.</p>
<p>A <strong>closed actegory</strong> is one where every $- \bullet Z : \mathcal M \to \mathcal C$ has a right adjoint $[Z, -] : \mathcal C \to \mathcal M$. Allowing $Z$ to vary results in a functor $[-, -]$ which (nontrivial theorem) is always an enrichment.</p>
<p>So, closed actegories and tensored enrichments are equivalent ways of defining the same thing, namely a monoidal category $\mathcal M$ and category $\mathcal C$ equipped with $\bullet$ and $[-, -]$ related by a tensor-hom adjunction $\mathcal C (X \bullet Z, Y) \cong \mathcal M (Z, [X, Y])$.</p>
<h1>Parametrisation</h1>
<p>Given an actegory, we can define a bicategory \(\mathbf{Para}_\mathcal M (\mathcal C)\), whose objects are objects of $\mathcal C$ and 1-cells are pairs of $M : \mathcal M$ and $f : \mathcal C (M \bullet X, Y)$. We can also define a bicategory \(\mathbf{Copara}_\mathcal M (\mathcal C)\), whose objects are objects of $\mathcal C$ and 1-cells are pairs of $M : \mathcal M$ and $f : \mathcal C (X, M \bullet Y)$.</p>
<p>Given an enriched category, we can define a bicategory \(\mathbf{Para}_\mathcal M (\mathcal C)\), whose objects are objects of $\mathcal C$ and 1-cells are pairs of $M : \mathcal M$ and $f : \mathcal M (M, [X, Y])$. If this is a tensored enrichment then the two definitions of \(\mathbf{Para}_\mathcal M (\mathcal C)\) are equivalent.</p>
<p>In all of these cases we are locally fibred over $\mathcal M$, and I will write \(\mathbf{Para}_\mathcal M (\mathcal C) (X, Y) (M)\), \(\mathbf{Copara}_\mathcal M (\mathcal C) (X, Y) (M)\) for the set of co/parametrised morphisms with a fixed parameter type.</p>
<p>It’s not possible to define $\mathbf{Copara}_\mathcal M (\mathcal C)$ for an enrichment. There’s a very slick common generalisation of actegories and enrichments called a <a href="https://ncatlab.org/nlab/show/locally+graded+category">locally graded category</a>, which is a category enriched in presheaves with Day convolution. There’s also a very slick definition of $\mathbf{Para}$ for a locally graded category. I’d like to know: for exactly which locally graded categories is it possible to define $\mathbf{Copara}$?</p>
<h1>Mixed optics</h1>
<p>If we have two actegories $\mathcal C, \mathcal D$ that share the same acting category $\mathcal M$ then we can define <strong>mixed optics</strong>, which first appeared in <a href="https://compositionality-journal.org/papers/compositionality-6-1/">Profunctor Optics: A Categorical Update</a>. This is a 1-category \(\mathbf{Optic}_\mathcal M (\mathcal C, \mathcal D)\) whose objects are pairs $\binom{X}{X’}$ of an object of $\mathcal C$ and an object of $\mathcal D$, and a morphism $\binom{X}{X’} \to \binom{Y}{Y’}$ is an element of the coend</p>
\[\int^{M : \mathcal M} \mathbf{Copara}_\mathcal M (\mathcal C) (X, Y) (M) \times \mathbf{Para}_\mathcal M (\mathcal D) (Y', X') (M)\]
<p>There’s a slightly more general definition called “weighted optics” that appears in <a href="https://arxiv.org/abs/2403.13001">Bruno’s thesis</a> and was used very productively there, which replaces $\mathcal M$ with two monoidal categories related by a Tambara module. I think that it’s an orthogonal generalisation to the one I’m about to do here.</p>
<h1>Enriched closed lenses</h1>
<p>Putting together everything I’ve just said, the next step is clear. If we have categories $\mathcal C, \mathcal D$ and a monoidal category $\mathcal M$, with $\mathcal M$ acting on $\mathcal C$ and $\mathcal D$ enriched in $\mathcal M$, then we can still define \(\mathbf{Optic}_\mathcal M (\mathcal C, \mathcal D)\) in exactly the same way, replacing \(\mathbf{Para}_\mathcal M (\mathcal D)\) with its enriched version. But now, unlike before, we can use the ninja Yoneda lemma to eliminate the coend and get</p>
\[\mathbf{Optic}_\mathcal M (\mathcal C, \mathcal D) \left( \binom{X}{X'}, \binom{Y}{Y'} \right) \cong \mathcal C (X, [Y', X'] \bullet Y)\]
<p>In general I refer to optics that can be defined without type quantification as <em>lenses</em>, and so this is an <strong>enriched closed lens</strong>. It’s the <em>final form</em> of “linear lenses”, the version of lenses that is defined like <code class="language-plaintext highlighter-rouge">Lens s t a b = s -> (a, b -> t)</code>.</p>
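<p>To see why no coend is needed, here is a sketch of linear lenses and their composition in Python (standing in for the Haskell type above — an illustration, not a library definition):</p>

```python
# A linear lens is a single function s -> (a, b -> t): it returns the focus
# together with a continuation that rebuilds the source from a new focus.

def compose(outer, inner):
    def lens(s):
        a, put_outer = outer(s)
        b, put_inner = inner(a)
        # Continuations chain in the opposite order to the forward passes
        return b, lambda new_b: put_outer(put_inner(new_b))
    return lens

# Lens onto the first component of a pair
fst_lens = lambda pair: (pair[0], lambda new: (new, pair[1]))

focus, put = compose(fst_lens, fst_lens)(((1, 2), 3))
# focus == 1, put(10) == ((10, 2), 3)
```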
<h1>Into the compiler forest</h1>
<p>Section 5 of <a href="https://homepages.inf.ed.ac.uk/gdp/publications/compiler-forest.pdf">The Compiler Forest</a> by Budiu, Galenson and Plotkin has a <em>very</em> interesting definition in it. They have a cartesian closed category $\mathcal C$ (whose internal hom I’ll write as $\to$) and a strong monad $T$ on it, and they define a category whose objects are pairs of objects of $\mathcal C$ and whose morphisms $f : \binom{X}{X’} \to \binom{Y}{Y’}$ are morphisms $f : X \to T (Y \times (Y’ \to T X’))$ of $\mathcal C$.</p>
<p>They also nail an intuition for lenses that I use constantly and I haven’t seen written down anywhere else: problems go forwards, solutions go backwards.</p>
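<p>Instantiating $T$ as Python’s option type makes the shape concrete: a morphism is a function <code class="language-plaintext highlighter-rouge">X -> Optional[(Y, Y' -> Optional[X'])]</code>. The following is a hedged sketch of how such morphisms compose, not code from the paper:</p>

```python
# Compiler-Forest-style morphisms with T = Optional: the forward direction
# may fail (None), and so may the backward continuation it returns.

def compose(f, g):
    def h(x):
        rf = f(x)
        if rf is None:
            return None
        y, back_f = rf
        rg = g(y)
        if rg is None:
            return None
        z, back_g = rg
        # Solutions go backwards: first undo g, then undo f
        def back(z1):
            y1 = back_g(z1)
            return None if y1 is None else back_f(y1)
        return z, back
    return h

# Two toy morphisms: halving always succeeds, reciprocal fails at 0
half = lambda x: (x / 2, lambda y: y * 2)
recip = lambda x: None if x == 0 else (1 / x, lambda y: None if y == 0 else 1 / y)

z, back = compose(half, half)(8)   # forwards: 8 -> 4 -> 2
compose(recip, half)(0)            # None: failure propagates
```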
<p>Me and this definition have quite a history. It came to my attention while polishing <a href="https://compositionality-journal.org/papers/compositionality-5-9/">Bayesian Open Games</a> for submission. For a while, I thought that it was equivalent to optics in the kleisli category of $T$, and that we’d wasted years of our lives trying to understand optics (this being around 2018, when optics were still a niche idea). Then, for a while I thought that the paper made a mistake and these things don’t compose associatively. Now I’ve made peace: I think their definition is <em>conceptually</em> subtly wrong in a way that makes no difference in practice, and I can say very precisely how it relates to kleisli optics.</p>
<p>There is an action of $\mathcal C$ on $\mathrm{Kl} (T)$ given by $M \bullet X = M \otimes X$, where $\otimes$ is the tensor product of $\mathrm{Kl} (T)$ which on objects is given by the product $\times$ of $\mathcal C$. That’s the actegory generated by the strong monoidal embedding $\mathcal C \hookrightarrow \mathrm{Kl} (T)$. There is also an enrichment of $\mathrm{Kl} (T)$ in $\mathcal C$, given by $[X, Y] = X \to T Y$. This action and enrichment are adjoint to each other: $\mathrm{Kl} (T) (M \otimes X, Y) \cong \mathcal C (X, M \to TY)$.</p>
<p>The category defined in Compiler Forest turns out to be equivalent to</p>
\[\mathrm{Optic}_\mathcal C (\mathrm{Kl} (T), \mathrm{Kl} (T))\]
<p>whose forwards pass is given by the action of $\mathcal C$ on $\mathrm{Kl} (T)$ and whose backwards pass is given by the enrichment of $\mathrm{Kl} (T)$ in $\mathcal C$. Its hom-sets are given by</p>
\[\mathrm{Optic}_\mathcal C (\mathrm{Kl} (T), \mathrm{Kl} (T)) \left( \binom{X}{X'}, \binom{Y}{Y'} \right)\]
\[= \int^{M : \mathcal C} \mathcal C (X, T (M \times Y)) \times \mathcal C (M, Y' \to T X')\]
<p>which Yoneda-reduces to the definition in the paper.</p>
<p>Even though the action and enrichment are adjoint, this is <em>not</em> the same as optics in the kleisli category:</p>
\[\mathrm{Optic}_\mathcal C (\mathrm{Kl} (T), \mathrm{Kl} (T)) \not\cong \mathrm{Optic}_{\mathrm{Kl} (T)} (\mathrm{Kl} (T), \mathrm{Kl} (T))\]
<p>where the hom-sets of the latter are defined by</p>
\[\mathrm{Optic}_{\mathrm{Kl} (T)} (\mathrm{Kl} (T), \mathrm{Kl} (T)) \left( \binom{X}{X'}, \binom{Y}{Y'} \right)\]
\[= \int^{M : \mathrm{Kl} (T)} \mathcal C (X, T (M \times Y)) \times \mathcal C (M \times Y', T X')\]
<p>This equivalence, between optics whose backwards passes are an adjoint action or enrichment, would be a completely reasonable-looking lemma but it just isn’t true!</p>
<p>The difference between them is extremely subtle, though. The “proper” definition of kleisli optics identifies morphisms that agree up to sliding any kleisli morphism, whereas the definition in Compiler Forest only identifies morphisms that agree up to sliding pure morphisms of $\mathcal C$. So hom-sets of coend optics are a quotient of the hom-sets defined in Compiler Forest. While writing this up, I realised that most of this conclusion actually appears in section 4.9 of <a href="https://arxiv.org/abs/1809.00738">Riley’s original paper</a>.</p>
<p>As long as you don’t care about equality of morphisms - which in practice is never, because they are made of functions - the difference between them can be safely ignored. The only genuine reason to prefer kleisli optics is <a href="https://arxiv.org/abs/2209.09351">their better runtime performance</a>.</p>Jules HedgesI'm going to record something that I think is known to everyone doing research on categorical cybernetics, but I don't think has been written down somewhere: an even more general version of mixed optics that replaces the backwards actegory with an enrichment. With it, I'll make sense of a curious definition appearing in The Compiler Forest.Modular Error Reporting with Dependent Lenses2024-04-08T00:00:00+00:002024-04-08T00:00:00+00:00https://cybercat-institute.github.io//2024/04/08/modular-error-reporting<p>A big part of programming language design is in feedback delivery. One aspect of feedback is parse errors. Parsing is a very large area of research and there are new developments from industry that make it easier and faster than ever to parse files. This post is about an application of dependent lenses that facilitate the job of reporting error location from a parsing pipeline.</p>
<h2>What is parsing & error reporting</h2>
<p>A simple parser could be seen as a function with the signature</p>
<div class="language-idris highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">parse :</span> <span class="kt">String</span> <span class="o">-></span> <span class="kt">Maybe </span><span class="n">output</span>
</code></pre></div></div>
<p>where <code class="language-plaintext highlighter-rouge">output</code> is a parsed value.</p>
<p>In that context, an error is represented with a value of <code class="language-plaintext highlighter-rouge">Nothing</code>, and a successful value is represented with <code class="language-plaintext highlighter-rouge">Just</code>. However, in the error case we don’t have enough information to create a helpful diagnostic: we can only say “parse failed”, but not why or where the error came from. One way to help with that is to make the type aware of its context and carry the error location in the type:</p>
<div class="language-idris highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">parseLoc :</span> <span class="kt">String</span> <span class="o">-></span> <span class="kt">Either Loc</span> <span class="n">output</span>
</code></pre></div></div>
<p>where <code class="language-plaintext highlighter-rouge">Loc</code> holds the file, line, and column of the state of the parser.
This is a very successful way to implement a parser with locations, and many languages deployed today use a similar architecture: the parser and its error-reporting mechanism keep track of the context in which they are parsing files and use it to produce helpful diagnostics.</p>
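<p>For concreteness, a location-tracking parser of this shape might look as follows in Python (a hypothetical toy, with <code class="language-plaintext highlighter-rouge">Loc</code> reduced to line and column):</p>

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Loc:
    line: int
    col: int

def parse_nat(src: str) -> Union[Loc, int]:
    # Single-line sketch: report the position of the first non-digit
    for col, ch in enumerate(src, start=1):
        if not ch.isdigit():
            return Loc(line=1, col=col)
    if not src:
        return Loc(line=1, col=1)
    return int(src)

parse_nat("123")    # 123
parse_nat("12x3")   # Loc(line=1, col=3)
```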
<p>I believe that there is a better way, one that does not require a tight integration between the error-generating process (here parsing) and the error-reporting process (here, location tracking). For this, I will be using container morphisms, or dependent lenses, to represent parsing and error reporting.</p>
<h2>Dependent lenses</h2>
<p>Dependent lenses are a generalisation of lenses where the backward part makes use of dependent types to keep track of the origin and destination of arguments. For reference, the type of a lens <code class="language-plaintext highlighter-rouge">Lens a a' b b'</code> is given by the two functions:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">get : a -> b</code></li>
<li><code class="language-plaintext highlighter-rouge">set : a -> b' -> a'</code></li>
</ul>
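<p>For reference, this non-dependent lens can be packaged as a record (a hypothetical encoding, shown only to set up the comparison with the dependent version):</p>
<div class="language-idris highlighter-rouge"><div class="highlight"><pre class="highlight"><code>record Lens (a : Type) (a' : Type) (b : Type) (b' : Type) where
  constructor MkLens
  get : a -> b
  set : a -> b' -> a'
</code></pre></div></div>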
<p>Dependent lenses follow the same pattern, but their types are indexed:</p>
<div class="language-idris highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">record</span> <span class="kt">DLens</span> <span class="o">:</span> <span class="p">(</span><span class="n">a</span> <span class="o">:</span> <span class="kt">Type</span><span class="p">)</span> <span class="o">-></span> <span class="p">(</span><span class="n">a'</span> <span class="o">:</span> <span class="n">a</span> <span class="o">-></span> <span class="kt">Type</span><span class="p">)</span> <span class="o">-></span> <span class="p">(</span><span class="n">b</span> <span class="o">:</span> <span class="kt">Type</span><span class="p">)</span> <span class="o">-></span> <span class="p">(</span><span class="n">b'</span> <span class="o">:</span> <span class="n">b</span> <span class="o">-></span> <span class="kt">Type</span><span class="p">)</span> <span class="kr">where</span>
<span class="n">get</span> <span class="o">:</span> <span class="n">a</span> <span class="o">-></span> <span class="n">b</span>
<span class="n">set</span> <span class="o">:</span> <span class="p">(</span><span class="n">x</span> <span class="o">:</span> <span class="n">a</span><span class="p">)</span> <span class="o">-></span> <span class="n">b'</span> <span class="p">(</span><span class="n">get</span> <span class="n">x</span><span class="p">)</span> <span class="o">-></span> <span class="n">a'</span> <span class="n">x</span>
</code></pre></div></div>
<p>The biggest difference with lenses is the second argument of <code class="language-plaintext highlighter-rouge">set</code>: <code class="language-plaintext highlighter-rouge">b' (get x)</code>. It means that we always get a <code class="language-plaintext highlighter-rouge">b'</code> that is indexed over the result of <code class="language-plaintext highlighter-rouge">get</code>; for this to typecheck, we <em>must know</em> the result of <code class="language-plaintext highlighter-rouge">get</code>.</p>
<p>This change in types allows a change in perspective. Instead of treating lenses as ways to convert between data types, we use lenses to convert between query/response APIs.</p>
<p><img src="/assetsPosts/2024-04-08-modular-error-reporting/lens2.png" alt="Lens" /></p>
<p>On each side <code class="language-plaintext highlighter-rouge">A</code> and <code class="language-plaintext highlighter-rouge">B</code> are queries and <code class="language-plaintext highlighter-rouge">A'</code> and <code class="language-plaintext highlighter-rouge">B'</code> are <em>corresponding responses</em>. The two functions defining the lens have type <code class="language-plaintext highlighter-rouge">get : A -> B</code> and <code class="language-plaintext highlighter-rouge">set : (x : A) -> B' (get x) -> A' x</code>, that is, a way to translate queries, and a way to <em>rebuild</em> responses given a query. A lens is therefore a mechanism to map from one API to another.</p>
<p>If the goal is to find out on which line an error occurs, then what the <code class="language-plaintext highlighter-rouge">get</code> function can do is split our string into multiple lines, each of which will be parsed separately.</p>
<div class="language-idris highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">splitLines :</span> <span class="kt">String</span> <span class="o">-></span> <span class="kt">List String</span>
</code></pre></div></div>
<p>Once we have a list of strings, we can call a parser on each line; this will be a function like the one above, <code class="language-plaintext highlighter-rouge">parseLine : String -> Maybe output</code>. Composing those two functions gives the signature <code class="language-plaintext highlighter-rouge">String -> List (Maybe output)</code>. This hints at what the response for <code class="language-plaintext highlighter-rouge">splitLines</code> should be: a list of potential outputs. If we draw our lens again, we have the following types:</p>
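<p>The forward direction of this pipeline is plain function composition; as a sketch (reusing the signatures above, with a hypothetical name for the composite):</p>
<div class="language-idris highlighter-rouge"><div class="highlight"><pre class="highlight"><code>parseLines : String -> List (Maybe output)
parseLines = map parseLine . splitLines
</code></pre></div></div>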
<p><img src="/assetsPosts/2024-04-08-modular-error-reporting/lens.png" alt="Lens" /></p>
<p>We are using <code class="language-plaintext highlighter-rouge">(String, String)</code> on the left to represent “files as inputs” and “messages as outputs” both of which are plain strings.</p>
<p>There is a slight problem with this: given a <code class="language-plaintext highlighter-rouge">List (Maybe output)</code>, we have no way to know which value refers to which line. For example, suppose the outputs are numbers and we know the input is the file</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>23

24
3
</code></pre></div></div>
<p>and we are given the output <code class="language-plaintext highlighter-rouge">[Nothing, Nothing, Just 3]</code>, we have no clue how to interpret each <code class="language-plaintext highlighter-rouge">Nothing</code> or how it relates to the result of splitting the lines; the two lists are not even the same size. We can “guess” some behaviors, but that’s really flimsy reasoning; ideally the API translation system should keep track of the correspondence so that we don’t have to guess what the correct behavior is. And really, it should be telling us what the relationship is; we shouldn’t even be thinking about this.</p>
<p>So instead of using plain lists, we are going to keep the information <em>in the type</em> by using dependent types. The following type keeps track of an “origin” list; its constructors store values that fulfill a predicate on elements of the origin list, along with their positions in that list:</p>
<div class="language-idris highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kr">data</span> <span class="kt">Some</span> <span class="o">:</span> <span class="p">(</span><span class="n">a</span> <span class="o">-></span> <span class="kt">Type</span><span class="p">)</span> <span class="o">-></span> <span class="kt">List </span><span class="n">a</span> <span class="o">-></span> <span class="kt">Type </span><span class="kr">where</span>
<span class="kt">None</span> <span class="o">:</span> <span class="kt">Some</span> <span class="n">p</span> <span class="n">xs</span>
<span class="kt">This</span> <span class="o">:</span> <span class="n">p</span> <span class="n">x</span> <span class="o">-></span> <span class="kt">Some</span> <span class="n">p</span> <span class="n">xs</span> <span class="o">-></span> <span class="kt">Some</span> <span class="n">p</span> <span class="p">(</span><span class="n">x</span> <span class="o">::</span> <span class="n">xs</span><span class="p">)</span>
<span class="kt">Skip</span> <span class="o">:</span> <span class="kt">Some</span> <span class="n">p</span> <span class="n">xs</span> <span class="o">-></span> <span class="kt">Some</span> <span class="n">p</span> <span class="p">(</span><span class="n">x</span> <span class="o">::</span> <span class="n">xs</span><span class="p">)</span>
</code></pre></div></div>
<p>We can now write the above situation with the type <code class="language-plaintext highlighter-rouge">Some (const Unit) ["23", "", "24", "3"]</code> which is inhabited by the value <code class="language-plaintext highlighter-rouge">Skip $ Skip $ Skip $ This () None</code> to represent the fact that only the last element is relevant to us. This ensures that the response always matches the query.</p>
<p>Once we are given a value like the above we can convert our response into a string that says <code class="language-plaintext highlighter-rouge">"only 3 parsed correctly"</code>.</p>
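<p>That conversion relies on a helper that reads the skipped positions off such a proof. Its signature could look like this (a hypothetical sketch; the actual definition lives in the repository linked at the end):</p>
<div class="language-idris highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- collect the positions of the origin list for which no value is present
getMissing : {xs : List a} -> Some p xs -> List (Fin (length xs))
</code></pre></div></div>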
<h2>A simple parser</h2>
<p>Equipped with dependent lenses, and a type to keep track of partial errors, we can start writing a parsing pipeline that keeps track of locations without interfering with the actual parsing. For this, we start with a simple parsing function:</p>
<div class="language-idris highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">containsEven :</span> <span class="kt">String</span> <span class="o">-></span> <span class="kt">Maybe Int</span>
<span class="n">containsEven</span> <span class="n">str</span> <span class="o">=</span> <span class="n">parseInteger</span> <span class="n">str</span> <span class="o">>>=</span> <span class="p">(</span><span class="nf">\</span><span class="n">i</span> <span class="o">:</span> <span class="kt">Int</span> <span class="o">=></span> <span class="n">toMaybe</span> <span class="p">(</span><span class="n">even</span> <span class="n">i</span><span class="p">)</span> <span class="n">i</span><span class="p">)</span>
</code></pre></div></div>
<p>This will return a number if it’s even; otherwise it will fail. From this we want to write a parser that parses an entire file and reports errors where the file does not parse. We do this by writing a lens that splits a file into lines and then rebuilds responses into a string that contains the line numbers.</p>
<div class="language-idris highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">splitFile :</span> <span class="p">(</span><span class="kt">String</span> <span class="o">:-</span> <span class="kt">String</span><span class="p">)</span> <span class="o">=%></span> <span class="kt">SomeC</span> <span class="p">(</span><span class="kt">String</span> <span class="o">:-</span> <span class="n">output</span><span class="p">)</span>
<span class="n">splitFile</span> <span class="o">=</span> <span class="kt">MkMorphism</span> <span class="nb">lines </span><span class="n">printErrors</span>
<span class="kr">where</span>
<span class="n">printError</span> <span class="o">:</span> <span class="p">(</span><span class="n">orig</span> <span class="o">:</span> <span class="kt">List String</span><span class="p">)</span> <span class="o">-></span> <span class="p">(</span><span class="n">i</span> <span class="o">:</span> <span class="kt">Fin</span> <span class="p">(</span><span class="nb">length </span><span class="n">orig</span><span class="p">))</span> <span class="o">-></span> <span class="kt">String</span>
<span class="n">printError</span> <span class="n">orig</span> <span class="n">i</span> <span class="o">=</span> <span class="s">"At line </span><span class="se">\</span><span class="err">{show (c</span><span class="se">a</span><span class="s">st {to = Nat} i)}: Could not parse </span><span class="se">\"\</span><span class="err">{i</span><span class="se">n</span><span class="s">dex' orig i}</span><span class="se">\"</span><span class="s">"</span>
<span class="n">printErrors</span> <span class="o">:</span> <span class="p">(</span><span class="n">input</span> <span class="o">:</span> <span class="kt">String</span><span class="p">)</span> <span class="o">-></span> <span class="kt">Some</span> <span class="p">(</span><span class="nb">const </span><span class="n">error</span><span class="p">)</span> <span class="p">(</span><span class="nb">lines </span><span class="n">input</span><span class="p">)</span> <span class="o">-></span> <span class="kt">String</span>
<span class="n">printErrors</span> <span class="n">input</span> <span class="n">x</span> <span class="o">=</span> <span class="nb">unlines </span><span class="p">(</span><span class="nb">map </span><span class="p">(</span><span class="n">printError</span> <span class="p">(</span><span class="nb">lines </span><span class="n">input</span><span class="p">))</span> <span class="p">(</span><span class="n">getMissing</span> <span class="n">x</span><span class="p">))</span>
</code></pre></div></div>
<p>Some notation: <code class="language-plaintext highlighter-rouge">=%></code> is the binary operator for dependent lenses, and <code class="language-plaintext highlighter-rouge">:-</code> is the binary operator for non-dependent boundaries. Later <code class="language-plaintext highlighter-rouge">!></code> will be used for dependent boundaries.</p>
<p><code class="language-plaintext highlighter-rouge">printErrors</code> builds an error message by collecting the numbers of the lines that failed to parse. We interpret the values missing from <code class="language-plaintext highlighter-rouge">Some</code> as failed parses. Equipped with this program, we should be able to generate an error message that looks like this:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>At line 3: Could not parse "test"
At line 10: Could not parse "-0.012"
At line 12: Could not parse ""
</code></pre></div></div>
<p>The only thing left is to put together the parser and the line splitter. We do this by composing them into a larger lens via lens composition and then extracting the procedure from the larger lens. First we need to convert our parser into a lens.</p>
<p>Any function <code class="language-plaintext highlighter-rouge">a -> b</code> can also be written as <code class="language-plaintext highlighter-rouge">a -> () -> b</code> and any function of that type can be embedded in a lens <code class="language-plaintext highlighter-rouge">(a :- b) =%> (() :- ())</code>. That’s what we do with our parser and we end up with this lens:</p>
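<p>Such an embedding could be implemented along these lines (a hypothetical definition; the post assumes <code class="language-plaintext highlighter-rouge">embed</code> is provided by the lens library):</p>
<div class="language-idris highlighter-rouge"><div class="highlight"><pre class="highlight"><code>embed : (a -> b) -> (a :- b) =%> CUnit
embed f = MkMorphism (\_ => ()) (\x, _ => f x)
-- forward: discard the query; backward: answer the original query with f
</code></pre></div></div>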
<div class="language-idris highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">parserLens :</span> <span class="p">(</span><span class="kt">String</span> <span class="o">:-</span> <span class="kt">Maybe Int</span><span class="p">)</span> <span class="o">=%></span> <span class="kt">CUnit</span> <span class="c1">-- this is the unit boundary () :- ()</span>
<span class="n">parserLens</span> <span class="o">=</span> <span class="n">embed</span> <span class="n">containsEven</span>
</code></pre></div></div>
<p>We can lift any lens with a failable result into one that keeps track of the origin of the failure:</p>
<div class="language-idris highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">lineParser :</span> <span class="kt">SomeC</span> <span class="p">(</span><span class="kt">String</span> <span class="o">:-</span> <span class="kt">Int</span><span class="p">)</span> <span class="o">=%></span> <span class="kt">CUnit</span>
<span class="n">lineParser</span> <span class="o">=</span> <span class="n">someToAll</span> <span class="o">|></span> <span class="kt">AllListMap</span> <span class="n">parserLens</span> <span class="o">|></span> <span class="n">close</span>
</code></pre></div></div>
<p>We can now compose this lens with the one above that adjusts the error message using the line number:</p>
<div class="language-idris highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">composedParser :</span> <span class="p">(</span><span class="kt">String</span> <span class="o">:-</span> <span class="kt">String</span><span class="p">)</span> <span class="o">=%></span> <span class="kt">CUnit</span>
<span class="n">composedParser</span> <span class="o">=</span> <span class="n">splitFile</span> <span class="o">|></span> <span class="n">lineParser</span>
</code></pre></div></div>
<p>Knowing that a function <code class="language-plaintext highlighter-rouge">a -> b</code> can be converted into a lens <code class="language-plaintext highlighter-rouge">(a :- b) =%> CUnit</code>, we can also go the other way: any lens with a unit codomain can be converted back into a plain function, which gives us a very simple <code class="language-plaintext highlighter-rouge">String -> String</code> program:</p>
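<p>This opposite direction can be sketched as follows (a hypothetical definition; the real <code class="language-plaintext highlighter-rouge">extract</code> ships with the lens library):</p>
<div class="language-idris highlighter-rouge"><div class="highlight"><pre class="highlight"><code>extract : ((a :- b) =%> CUnit) -> a -> b
extract (MkMorphism fwd bwd) x = bwd x ()
-- the forward part is discarded: the answer is rebuilt by the backward part
</code></pre></div></div>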
<div class="language-idris highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">mainProgram :</span> <span class="kt">String</span> <span class="o">-></span> <span class="kt">String</span>
<span class="n">mainProgram</span> <span class="o">=</span> <span class="n">extract</span> <span class="n">composedParser</span>
</code></pre></div></div>
<p>We can run this as part of a command-line program:</p>
<div class="language-idris highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">main :</span> <span class="kt">IO </span><span class="nb">()</span>
<span class="n">main</span> <span class="o">=</span> <span class="kr">do</span> <span class="nb">putStrLn </span><span class="s">"give me a file name"</span>
<span class="n">fn</span> <span class="o"><-</span> <span class="n">getLine</span>
<span class="kc">Right</span> <span class="n">fileContent</span> <span class="o"><-</span> <span class="nb">readFile </span><span class="n">fn</span>
<span class="o">|</span> <span class="kc">Left</span> <span class="n">err</span> <span class="o">=></span> <span class="n">printLn</span> <span class="n">err</span>
<span class="kr">let</span> <span class="n">output</span> <span class="o">=</span> <span class="n">mainProgram</span> <span class="n">fileContent</span>
<span class="nb">putStrLn </span><span class="n">output</span>
<span class="n">main</span>
</code></pre></div></div>
<p>And given the file:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0
2

-3
20
04
1.2
</code></pre></div></div>
<p>We see:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>At line 2: Could not parse ""
At line 3: Could not parse "-3"
At line 6: Could not parse "1.2"
</code></pre></div></div>
<h2>Handling multiple files</h2>
<p>The program we’ve seen is great, but it’s not clear why we would bother with this level of complexity just to keep track of line numbers. That is why I will now show how to use the same approach to keep track of the originating file without touching the existing program.</p>
<p>To achieve that, we need a lens that will take a list of files, and their content, and keep track of where errors emerged using the same infrastructure as above.</p>
<p>First, we define a filesystem as a mapping from file names to file contents:</p>
<div class="language-idris highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">Filename</span> <span class="o">=</span> <span class="kt">String</span>
<span class="kt">Content</span> <span class="o">=</span> <span class="kt">String</span>
<span class="kt">Filesystem</span> <span class="o">=</span> <span class="kt">List </span><span class="p">(</span><span class="kt">Filename</span> <span class="o">*</span> <span class="kt">Content</span><span class="p">)</span>
</code></pre></div></div>
<p>A lens that splits problems into files and rebuilds errors from them will have the following type:</p>
<div class="language-idris highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">handleFiles :</span> <span class="kt">Interpolation</span> <span class="n">error</span> <span class="o">=></span>
<span class="p">(</span><span class="kt">Filesystem</span> <span class="o">:-</span> <span class="kt">String</span><span class="p">)</span> <span class="o">=%></span> <span class="kt">SomeC</span> <span class="p">(</span><span class="kt">String</span> <span class="o">:-</span> <span class="n">error</span><span class="p">)</span>
<span class="n">handleFiles</span> <span class="o">=</span> <span class="kt">MkMorphism</span> <span class="p">(</span><span class="nb">map </span><span class="err">π</span><span class="mf">2</span><span class="p">)</span> <span class="n">matchErrors</span>
<span class="kr">where</span>
<span class="n">matchErrors</span> <span class="o">:</span> <span class="p">(</span><span class="n">files</span> <span class="o">:</span> <span class="kt">List </span><span class="p">(</span><span class="kt">String</span> <span class="o">*</span> <span class="kt">String</span><span class="p">))</span> <span class="o">-></span>
<span class="kt">Some</span> <span class="p">(</span><span class="nb">const </span><span class="n">error</span><span class="p">)</span> <span class="p">(</span><span class="nb">map </span><span class="err">π</span><span class="mf">2</span> <span class="n">files</span><span class="p">)</span> <span class="o">-></span>
<span class="kt">String</span>
<span class="n">matchErrors</span> <span class="n">files</span> <span class="n">x</span> <span class="o">=</span> <span class="nb">unlines </span><span class="p">(</span><span class="nb">map </span><span class="p">(</span><span class="nf">\</span><span class="p">(</span><span class="n">path</span> <span class="o">&&</span> <span class="n">err</span><span class="p">)</span> <span class="o">=></span> <span class="s">"In file </span><span class="se">\</span><span class="err">{p</span><span class="se">a</span><span class="s">th}:</span><span class="se">\n\</span><span class="err">{e</span><span class="se">r</span><span class="s">r}"</span><span class="p">)</span> <span class="p">(</span><span class="n">zipWithPath</span> <span class="n">files</span> <span class="n">x</span><span class="p">))</span>
</code></pre></div></div>
<p>This time I’m representing failures with the <em>presence</em> of a value in <code class="language-plaintext highlighter-rouge">Some</code> rather than its absence. The rest of the logic is similar: we reconstruct the data from the values we get back in the backward part and return a flat <code class="language-plaintext highlighter-rouge">String</code> as our error message.</p>
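<p>The helper <code class="language-plaintext highlighter-rouge">zipWithPath</code> is not shown here; its job can be summed up by a signature like this (a hypothetical sketch; the real definition is in the linked repository):</p>
<div class="language-idris highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- pair each error stored in the proof with the name of the file it came from
zipWithPath : (files : List (Filename * Content)) ->
              Some (const error) (map π2 files) -> List (Filename * error)
</code></pre></div></div>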
<p>Combining this lens with the previous parser is as easy as before:</p>
<div class="language-idris highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">filesystemParser :</span> <span class="p">(</span><span class="kt">Filesystem</span> <span class="o">:-</span> <span class="kt">String</span><span class="p">)</span> <span class="o">=%></span> <span class="kt">CUnit</span>
<span class="n">filesystemParser</span> <span class="o">=</span> <span class="n">handleFiles</span> <span class="o">|></span> <span class="nb">map </span><span class="n">splitFile</span> <span class="o">|></span> <span class="n">join</span> <span class="p">{</span><span class="n">a</span> <span class="o">=</span> <span class="kt">String</span> <span class="o">:-</span> <span class="kt">Int</span><span class="p">}</span> <span class="o">|></span> <span class="n">lineParser</span>
<span class="nf">fsProgram :</span> <span class="kt">Filesystem</span> <span class="o">-></span> <span class="kt">String</span>
<span class="n">fsProgram</span> <span class="o">=</span> <span class="n">extract</span> <span class="n">filesystemParser</span>
</code></pre></div></div>
<p>We can now write a new main function that will take a list of files and return the errors for each file:</p>
<div class="language-idris highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">main2 :</span> <span class="kt">IO </span><span class="nb">()</span>
<span class="n">main2</span> <span class="o">=</span> <span class="kr">do</span> <span class="n">files</span> <span class="o"><-</span> <span class="n">askList</span> <span class="kt">[]</span>
<span class="n">filesAndContent</span> <span class="o"><-</span> <span class="n">traverse</span> <span class="p">(</span><span class="nf">\</span><span class="n">fn</span> <span class="o">=></span> <span class="nb">map </span><span class="p">(</span><span class="n">fn</span> <span class="o">&&</span><span class="p">)</span> <span class="o"><$></span> <span class="nb">readFile </span><span class="n">fn</span><span class="p">)</span> <span class="p">(</span><span class="nb">reverse </span><span class="n">files</span><span class="p">)</span>
<span class="kr">let</span> <span class="kc">Right</span> <span class="n">contents</span> <span class="o">=</span> <span class="nb">sequence </span><span class="n">filesAndContent</span>
<span class="o">|</span> <span class="kc">Left</span> <span class="n">err</span> <span class="o">=></span> <span class="n">printLn</span> <span class="n">err</span>
<span class="kr">let</span> <span class="n">result</span> <span class="o">=</span> <span class="n">fsProgram</span> <span class="n">contents</span>
<span class="nb">putStrLn </span><span class="n">result</span>
</code></pre></div></div>
<p>We can now write two files.
file1:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0
2

-3
20
04
1.2
</code></pre></div></div>
<p>file2:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>7
77
8
</code></pre></div></div>
<p>And obtain the error message:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>In file 'file1':
At line 2: Could not parse ""
At line 3: Could not parse "-3"
At line 6: Could not parse "1.2"
In file 'file2':
At line 0: Could not parse "7"
At line 1: Could not parse "77"
</code></pre></div></div>
<p>All that without touching our original parser, or our line tracking system.</p>
<h2>Conclusion</h2>
<p>With this toy example we’ve only touched the surface of what dependent lenses can do for software engineering. Yet the example is simple enough to be introduced and resolved in one post, while showing a solution to a complex problem that affects parsers and compilers across the spectrum of programming languages. In truth, dependent lenses can do much more than what is presented here: they can deal with effects, non-deterministic systems, machine learning, and more. One of the biggest barriers to mainstream adoption is the availability of dependent types in programming languages. The above was written in <a href="https://www.idris-lang.org/">Idris</a>, a language with dependent types, but if your language of choice adopts dependent types one day, you should be able to write the same program as we did just now, but for large-scale production software.</p>
<p>The program is available on <a href="https://gitlab.com/avidela/types-laboratory/-/blob/main/src/Interactive/Parsing.idr?ref_type=heads">gitlab</a>.</p>Andre VidelaDependent lenses are useful for general-purpose programming, but in which way exactly? This post demonstrates the use of dependent lenses as input/output-conversion processes, using parsing and error location reporting as a driving example.Value Chain Integrity2024-04-04T00:00:00+00:002024-04-04T00:00:00+00:00https://cybercat-institute.github.io//2024/04/04/value-chain-integrity<p>Cross-posted from <a href="https://econpatterns.substack.com/p/value-chain-integrity">Oliver’s Substack blog, EconPatterns</a></p>
<p>In the first four posts, I tried to map out an economy structured around the need to find out. This didn’t happen by accident, but is the result of spending a couple of decades in a realm where academic economic knowledge is held in little regard in no small part because its gatekeepers like to give off an air of having it already figured out, even if from the circumstances it’s clear that is rarely ever the case.</p>
<p>It doesn’t match my own opinion, but I perfectly understand when, say, the founders of a three-person startup bid adieu to their knowledge of academic economics when they learn that there is no such thing as a demand curve unless they put in the effort to assemble it piece by piece, transaction by transaction, price change by price change.</p>
<p>Most of them stop at this point and direct their attention to other, more pressing concerns, and I can’t blame them for it. The “need to find out” gets short shrift in most economics classes since economic instruction at universities generally starts from a vantage point where the groundwork has already been laid by wizards behind the curtain, and all that’s needed for mere mortals is to fine-tune the preconceived machinery.</p>
<p>That’s also a major reason why economists find employment in government, big banks, and, increasingly publicly listed tech firms, within large machineries, but are rarely ever in demand as one of the three founders of a recently minted startup with more enthusiasm than cash — or data.</p>
<p>This series tries to remedy that situation, and I could have subtitled it either “economics for startuppers” or “startup thinking for economists”, except the intended scope — and my intended audience — is a bit wider than that.</p>
<h1>Use of decentralized knowledge in society</h1>
<p>The underlying idea of “finding out” pursued in EconPatterns is ultimately derived from Adam Smith’s gains from specialization that drives specialization of labor, and that ultimately influenced another key contribution to economic lore, Friedrich Hayek’s <a href="https://www.econlib.org/library/Essays/hykKnw.html">Use of Knowledge in Society</a>.</p>
<p>Hayek’s point was that there’s no point trying to steer the whole economy from a central vantage point because there is always someone somewhere closer to the ground, steeped in operational detail, who knows better, and can put that knowledge to better use than the central planner.</p>
<p>This idea that there is always local knowledge that is more detailed than the aggregated knowledge on the macro level, that there is knowledge that is nested, and that all participants have a mental map of the economy that is most detailed in their own vicinity and that degrades in detail, certainty, or precision, that resorts to using coarse-grained models, aggregates, or even stereotypes, the further one moves away from one’s own location, is deeply embedded in EconPatterns.</p>
<p>And this isn’t only true for the physical dimensions, it’s also true for the temporal dimension. Both the past and the future get hazy very quickly, and we resort to increasingly coarse-grained knowledge the further we go in each direction: Hayek’s “knowledge of the particular circumstances of time and place”.</p>
<p>There is an inevitable urge to remedy this shortcoming with the magic potion of “more transparency”. Every time we hear news about another supply chain pile-up, there is the inevitable stratum of pundits opining that this (the negative surprise, that is) could have been avoided if we just magically gave every participant a detailed map of the whole economy, or at least the whole chain of events — network really — leading to the participant’s problems stemming from the unexpected supply chain outage.</p>
<p>This is illusory of course to anyone attuned to the operational details of supply chain, not only because these pundits habitually underestimate by several orders of magnitude just how much operational raw data is out there, most of which is of no use to anyone but the data owner, but also because the countervailing demands of privacy and transparency (usually leading to the conundrum of each side demanding transparency from the other party but insisting on privacy for oneself) will inevitably lead to privacy winning out, except in those cases where the more powerful actor can compel less powerful actors to disclose their secrets.</p>
<p><img src="/assetsPosts/2024-04-04-value-chain-integrity/img1.webp" alt="Container ship" /></p>
<h1>Supply chain and value chain integrity</h1>
<p>Designing a mechanism that orchestrates the conflicting information needs of the participants in a value chain or its mapping into the physical realm, the supply chain, is still a holy grail in operations and in economic trade, but in no small part because the reasons why such a governance mechanism is hard to come by are still poorly understood.</p>
<p>Finding this holy grail, and mapping out the path to its discovery, is of course the goal of this series. A starting point is to arrive at a better understanding of how knowledge disseminates thru an economy, where, when, and why it forms clusters (especially in the form of belief clusters), and how to interfere in that flow in a structured, goal-oriented way.</p>
<p>Just to offer a simple example, novices in the field of supply chain are often surprised to learn that the bill of lading, one of the crucial documents ensuring integrity of a product thruout its transit along ports, flights, shipments, loading and unloading, handovers and often rough handling, is still legally required to be in paper form, sent by courier from station to station.</p>
<p>A simple impulse is to blame an overbearing bureaucracy or an industry staunchly resistant to organizational change and technological progress, but an alternative and more plausible explanation is that paper satisfies a few integrity requirements that electronic communication still has a hard time meeting.</p>
<p>When handovers and handshakes are still the literal thing involving actual hands, and signatures are still done by hand in the presence of the counterparty, we are solving a few problems about identity that become exceedingly tricky once we try to shift them online, into the digital domain, where ascertaining that an individual is who they claim to be is anything but easy.</p>
<p>It turns out this simple example is repeated all over the place, in all kinds of domains and scenarios, with a number of idiosyncratic details added but the underlying pattern staying the same.</p>
<p>This is why I will come back to that example again and again. Because that is what EconPatterns is about.</p>Oliver BeigeIn which we discuss how knowledge travels thru the economy, and how, when and where it forms clusters.Colimits of Selection Functions2024-04-01T00:00:00+00:002024-04-01T00:00:00+00:00https://cybercat-institute.github.io//2024/04/01/colimits-selection-functions<p>In <a href="https://arxiv.org/abs/2105.06332">Towards Foundations of Categorical Cybernetics</a> we built a category whose objects are selection functions and whose morphisms are lenses. It was a key step in how we <em>justified</em> open games in that paper: they’re <em>just</em> parametrised lenses “weighted” by selection functions. In this post I’ll show that by adding dependent types and stirring, we can get a nicer category that does the same job but has all colimits, and comes extremely close to having all limits. Fair warning: this post assumes quite a bit of category-theoretic background.</p>
<p>Besides being a nice thing to do in itself, we have a very specific motivation for this. The recently released paper <a href="https://arxiv.org/abs/2402.15332">Categorical deep learning: An algebraic theory of architectures</a> proposed using initial algebras and final coalgebras in categories of parametrised morphisms to build neural networks with learning invariants designed to operate on complex data structures, in a huge generalisation of <a href="https://geometricdeeplearning.com/">geometric deep learning</a>. This post is the first step to replicating the same structure in compositional game theory, and is probably the first case where a class of deep learning architectures has a game-theoretic analogue right from the beginning (ok, the first other than <a href="https://en.wikipedia.org/wiki/Generative_adversarial_network">GANs</a>) - something that is absolutely key to our vision of AI safety, as I described in <a href="https://cybercat.institute/2024/03/18/learning-invariant-preferences/">this previous post</a>.</p>
<h1>Dependent lenses</h1>
<p>In this post I’m going to work over the category of sets, to make my life easy. A <strong>container</strong> (also known as a <strong>polynomial functor</strong>) is a pair $\binom{X}{X’}$ where $X$ is a set and $X’$ is an $X$-indexed family of sets.</p>
<p>Given a pair of containers, a <strong>dependent lens</strong> $f : \binom{X}{X’} \to \binom{Y}{Y’}$ is a pair of a function $f : X \to Y$ and a function $f’ : (x : X) \times Y’ (f (x)) \to X’ (x)$. There’s a category $\mathbf{DLens}$ whose objects are containers and whose morphisms are dependent lenses (also known as the <em>category of containers</em> $\mathbf{Cont}$ and the <em>category of polynomial functors</em> $\mathbf{Poly}$ by different authors).</p>
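Haskell lacks full dependent types, but the simply-typed special case, where the families $X’$ and $Y’$ are constant, can be sketched directly; this is an illustration of the definition above, not code from the actual engine:

```haskell
-- Simply-typed lenses: the special case where X' x = s and Y' y = r for
-- all x and y, so the backwards pass no longer depends on the index.
data Lens x s y r = Lens
  { view   :: x -> y        -- forwards pass f
  , update :: x -> r -> s   -- backwards pass f'
  }

-- Lens composition: forwards passes compose left to right, and the
-- backwards pass threads the result back through the original input.
compose :: Lens x s y r -> Lens y r z q -> Lens x s z q
compose (Lens f f') (Lens g g') = Lens (g . f) (\x q -> f' x (g' (f x) q))
```

Recovering the dependent version in Haskell would require simulating the indexing with GADTs or type families, which is why the post works in the category of sets instead.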
<p>The category $\mathbf{DLens}$ has all limits and colimits, distinguishing it from the category of simply-typed lenses which is missing many of both (see my old paper <a href="https://arxiv.org/abs/1711.07059">Morphisms of Open Games</a>). In this post I want to just take that as a given fact, because calculating them is not always so easy. The slick way to prove it is by constructing $\mathbf{DLens}$ as a fibration $\int_{X : \mathbf{Set}} \left( \mathbf{Set} / X \right)^\mathrm {op}$, and using the fact that a fibred category has all co/limits if every fibre does and reindexing preserves them (a fact that we’ll be seeing again later).</p>
<h1>Dependent selection functions</h1>
<p>Write $I$ for the tensor unit of dependent lenses: it’s made of the set $1 = \{ * \}$ and the $1$-indexed set $* \mapsto 1$. A dependent lens $I \to \binom{X}{X’}$ is an element of $X$, and a dependent lens $\binom{X}{X’} \to I$ is a <em>section</em> of the container: a function $k : (x : X) \to X’ (x)$. For shorthand I’ll write $H = \mathbf{DLens} (I, -) : \mathbf{DLens} \to \mathbf{Set}$ and $K = \mathbf{DLens} (-, I) : \mathbf{DLens}^\mathrm{op} \to \mathbf{Set}$ for these representable functors.</p>
<p>By analogy to <a href="https://julesh.com/2021/03/30/selection-functions-and-lenses/">what happens in the simply-typed case</a>, a <strong>dependent selection function</strong> for a container $\binom{X}{X’}$ should be a function $\varepsilon : K \binom{X}{X’} \to H \binom{X}{X’}$ - that is, a thing that turns costates into states.</p>
<p>But I think we’re going to need things to be multi-valued in order to get all colimits (and we need it to do much game theory anyway), so let’s immediately forget that and define a <strong>dependent multi-valued selection function</strong> of type $\binom{X}{X’}$ to be a binary relation $\varepsilon \subseteq H \binom{X}{X’} \times K \binom{X}{X’}$.</p>
<p>To be honest, I don’t really have any serious examples of these things to hand; I think they’ll arise from taking colimits of things that are simply-typed. For game theory the main one we care about is still $\arg\max$, which <em>is</em> a “dependent” multi-valued selection function but only in a boring way that doesn’t use the dependent types - it’s a binary relation $\arg\max \subseteq H \binom{X}{\mathbb R} \times K \binom{X}{\mathbb R}$, where $\mathbb R$ here means the $X$-indexed set that is constantly the real numbers.</p>
<p>For each container $\binom{X}{X’}$, write $E \binom{X}{X’} = \mathcal P \left( H \binom{X}{X’} \times K \binom{X}{X’} \right)$ for the set of multi-valued selection functions for it. Since it’s a powerset it inherits a posetal structure from subset inclusion, which is a boolean algebra. That means that as a thin category, it has all limits and colimits, something that will come in useful later.</p>
<p>Given $\varepsilon \subseteq H \binom{X}{X’} \times K \binom{X}{X’}$ and a dependent lens $f : \binom{X}{X’} \to \binom{Y}{Y’}$ we can define a “pushforward” selection function $f_* (\varepsilon) \subseteq H \binom{Y}{Y’} \times K \binom{Y}{Y’}$ by $f_* (\varepsilon) = \{ (hf, k) \mid (h, fk) \in \varepsilon \}$. Defining it this way means we get functoriality for free, and it’s also monotone, so we have a functor $E : \mathbf{DLens} \to \mathbf{Pos}$.</p>
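In the simply-typed special case, these definitions can be sketched in Haskell. Finite domain lists are assumed so that the existential in $f_*$ can be checked by enumeration; the names here are illustrative, not from any library:

```haskell
-- Simply-typed special case: a lens is a forwards and a backwards pass,
-- and a multi-valued selection function is a relation between states
-- (elements of x) and costates (functions x -> s), encoded as Bool-valued.
type Lens x s y r = (x -> y, x -> r -> s)
type Sel  x s     = x -> (x -> s) -> Bool

-- argmax over a finite domain: x is selected for costate k iff no
-- other point scores strictly higher.
argMax :: Ord s => [x] -> Sel x s
argMax dom x k = all (\x' -> k x' <= k x) dom

-- The pushforward f_*(eps) relates (f h, k) exactly when (h, f;k) is in
-- eps; the existential over h is decided by enumerating the domain.
pushforward :: Eq y => [x] -> Lens x s y r -> Sel x s -> Sel y r
pushforward dom (v, u) eps y k =
  any (\h -> v h == y && eps h (\x -> u x (k (v x)))) dom
```

`pushforward` transcribes $f_* (\varepsilon) = \{ (hf, k) \mid (h, fk) \in \varepsilon \}$ term by term: `v h` is $hf$ and `\x -> u x (k (v x))` is the costate $fk$.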
<p>The fact that we could just as easily have defined a contravariant action on dependent lenses means that the fibration we’re about to get is a bifibration, something that will <em>definitely</em> come in useful one day, but not today.</p>
<h1>Colimits of selection functions</h1>
<p>The next thing we do is take the category of elements of $E$. Objects of $\int E$ are pairs $\left( \binom{X}{X’}, \varepsilon \right)$ of a container and a selection function for it. A morphism $f : \left( \binom{X}{X’}, \varepsilon \right) \to \left( \binom{Y}{Y’}, \delta \right)$ is a dependent lens $f : \binom{X}{X’} \to \binom{Y}{Y’}$ with the property that $f_* (\varepsilon) \leq \delta$ - which is to say, any $h : H \binom{X}{X’}$ and $k : K \binom{Y}{Y’}$ satisfying $(h, fk) \in \varepsilon$ must also satisfy $(hf, k) \in \delta$.</p>
<p>So, $\int E$ is a category whose objects are dependent multi-valued selection functions and whose morphisms are dependent lenses. The only difference to the original category of selection functions from <a href="https://arxiv.org/abs/2105.06332">Towards Foundations</a> is that we replaced simply typed lenses with dependent lenses. This is enough to get all colimits, and I’d call $\int E$ a “nice category of selection functions”.</p>
<p>The good way to prove that a fibred category has all co/limits (see <a href="https://arxiv.org/abs/1801.02927">this paper</a>) is to show that (1) the base category has all co/limits, (2) every fibre has all co/limits, and (3) reindexing preserves co/limits. We already know (1) and (2) (remember the fibres are all boolean algebras), so we just need to prove (3). Since limits and colimits in the fibres are unions and intersections, this should not be too hard.</p>
<p>For some container $\binom{X}{X’}$, suppose we have some family $\varepsilon_i \in E \binom{X}{X’}$ indexed by $i : I$. We can define the meet $\bigwedge_{i : I} \varepsilon_i$ and join $\bigvee_{i : I} \varepsilon_i : E \binom{X}{X’}$ by intersection and union. To get all colimits in $\int E$, what we need to prove is that for any dependent lens $f : \binom{X}{X’} \to \binom{Y}{Y’}$, $f_* \left( \bigvee_{i : I} \varepsilon_i \right) = \bigvee_{i : I} f_* (\varepsilon_i)$. Let’s do it:</p>
<p>Going forwards, suppose $(h, k) \in f_* \left( \bigvee_i \varepsilon_i \right)$, so by definition of $f_* $ there must be $h’$ such that $h = h’f$ and $(h’, fk) \in \bigvee_i \varepsilon_i$. So there is some $i : I$ such that $(h’, fk) \in \varepsilon_i$, so $(h’f, k) = (h, k) \in f_* (\varepsilon_i)$, therefore $(h, k) \in \bigvee_i f_* (\varepsilon_i)$.</p>
<p>In the other direction, suppose $(h, k) \in \bigvee_i f_* (\varepsilon_i)$, so $(h, k) \in f_* (\varepsilon_i)$ for some $i : I$. So we must have $h’$ such that $h = h’f$ and $(h’, fk) \in \varepsilon_i$. So $(h’, fk) \in \bigvee_i \varepsilon_i$, therefore $(h’f, k) = (h, k) \in f_* \left( \bigvee_i \varepsilon_i \right)$.</p>
<p>Note, this is intentionally a pure existence proof. Actually calculating these things can be quite a pain, and I’m going to put it off until later, specifically until a paper we’re cooking up on <em>branching</em> open games.</p>
<h1>Limits of selection functions</h1>
<p>If we also had $f_* \left( \bigwedge_{i : I} \varepsilon_i \right) = \bigwedge_{i : I} f_* (\varepsilon_i)$ then $\int E$ would also have all limits, but sadly in general the best we can do is $f_* \left( \bigwedge_{i : I} \varepsilon_i \right) \subseteq \bigwedge_{i : I} f_* (\varepsilon_i)$. I’d guess this probably means that $\int E$ has some kind of lax limits or something, but I’ll deal with that another day.</p>
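Both facts can be spot-checked on a finite example, representing simply-typed multi-valued selection functions as Bool-valued relations as before (the code is self-contained; all names and the finite domains are illustrative):

```haskell
-- Selection functions as relations, simply-typed lenses as pairs of passes.
type Lens x s y r = (x -> y, x -> r -> s)
type Sel  x s     = x -> (x -> s) -> Bool

-- Pointwise join and meet of selection functions (union and intersection).
vee, wedge :: Sel x s -> Sel x s -> Sel x s
vee   e1 e2 x k = e1 x k || e2 x k
wedge e1 e2 x k = e1 x k && e2 x k

-- Pushforward along a lens, with the existential decided by enumeration.
pushforward :: Eq y => [x] -> Lens x s y r -> Sel x s -> Sel y r
pushforward dom (v, u) eps y k =
  any (\h -> v h == y && eps h (\x -> u x (k (v x)))) dom

-- Extensional agreement of two selection functions on finite test data.
agreeOn :: [y] -> [y -> r] -> Sel y r -> Sel y r -> Bool
agreeOn ys ks a b = and [ a y k == b y k | y <- ys, k <- ks ]
```

On a forwards pass that identifies two states (here `div` by 2, sending both 2 and 3 to 1), the join equation holds on the nose while the meet equation fails strictly, exactly as argued above.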
<p>It’s instructive to look at what goes wrong. If $(h, k) \in \bigwedge_i f_* (\varepsilon_i)$, then for all $i : I$ we have $(h, k) \in f_* (\varepsilon_i)$. So, for every $i$ we have $h’_i$ such that $h = h’_i f$ and $(h’_i, fk) \in \varepsilon_i$. We can make progress if $f$ is a monomorphism, in which case all of the $h’_i$ are equal because $h’_i f = h = h’_j f$ implies $h’_i = h’_j$. In fact, while I don’t know what general monomorphisms in $\mathbf{DLens}$ look like, in this case it’s enough that the forwards pass of $f$ is an injective function. This probably gives us a decent subcategory of $\int E$ that has all limits as well as all colimits, but I don’t know whether that category will be useful for anything.</p>Jules HedgesIn Towards Foundations of Categorical Cybernetics we built a category whose objects are selection functions and whose morphisms are lenses. It was a key step in how we justified open games in that paper: they're just parametrised lenses weighted by selection functions. In this post I'll show that by adding dependent types and stirring, we can get a nicer category that does the same job but has all colimits, and comes extremely close to having all limits. Fair warning: this post assumes quite a bit of category-theoretic background.On Organization2024-03-22T00:00:00+00:002024-03-22T00:00:00+00:00https://cybercat-institute.github.io//2024/03/22/on-organization<p>Cross-posted from <a href="https://econpatterns.substack.com/p/on-organization">Oliver’s Substack blog, EconPatterns</a></p>
<p>Leonard Read wrote his 1958 essay, <a href="https://oll.libertyfund.org/titles/read-i-pencil-my-family-tree-as-told-to-leonard-e-read-dec-1958">I, Pencil</a>, to drive home in dramatic prose the point that even a contraption as humble as the eponymous writing utensil requires a wide variety of raw materials, production processes, labor, and technological advancements to come together — all coordinated by the marvel of the market system.</p>
<p>“<em>Consider the millwork in San Leandro. The cedar logs are cut into small, pencil-length slats less than one-fourth of an inch in thickness. These are kiln dried and then tinted for the same reason women put rouge on their faces. People prefer that I look pretty, not a pallid white. The slats are waxed and kiln dried again. How many skills went into the making of the tint and the kilns, into supplying the heat, the light and power, the belts, motors, and all the other things a mill requires? Sweepers in the mill among my ancestors? Yes, and included are the men who poured the concrete for the dam of a Pacific Gas & Electric Company hydroplant which supplies the mill’s power!</em>”</p>
<p>Milton Friedman made a truncated version of Read’s pencil story famous in a 1980 <a href="https://www.youtube.com/watch?v=67tHtpac5ws">television special</a>, and in the process connected it to another famous story from economic history about the production of a seemingly simple object: Adam Smith’s parable of the pin factory.</p>
<p>But one detail eluded Friedman: the step-by-step process of putting a pin together, which opens Adam Smith’s magnum opus <a href="https://archive.org/details/bim_eighteenth-century_an-inquiry-into-the-natu_smith-adam_1785_1/page/6/mode/2up">The Wealth of Nations</a> to explain the division of labor, happens entirely under one roof. No handover across markets is mentioned until, presumably, the finished product is sold in bulk.</p>
<p><img src="/assetsPosts/2024-03-22-on-organization/img1.jpg" alt="Wealth of Nations" /></p>
<p>The same, we could surmise, might perfectly well hold true for Read’s story. There’s no reason why a pencil maker should not also mine the graphite needed for their one marketable product, harvest the rubber, or produce the electricity.</p>
<p>All of this has happened in the history of industrial production. But we can extrapolate from these stories and wonder if Smith’s pin maker also mills its raw material, iron or steel of a given quality, or if Read’s resource-conscious pencil maker would go as far as producing the mining machinery in-house, or maybe that’s the point where it’s willing to hand over the reins to someone more qualified.</p>
<h1>Organizations as tectonic plates</h1>
<p>Abstracting away from these two stories, we can ask where in a complex production process we should put the handovers. The unremarkable-sounding name for this question is the make-or-buy decision, or, if a more academic term is needed, the degree of vertical (dis)integration. In operations, we speak of production depth.</p>
<p>Abstracting even further away, we can also ask where, within any larger network of interactions, social or economic, we should draw the boundaries.</p>
<p>This question, on multiple layers, will occupy us quite a bit.</p>
<p>We can think of it popping up in the context of industrial production and market exchange: the economic sphere, in the context of public goods and livelihood risks: the political sphere, or even in the context of language, religion, and shared expressions of ideas and beliefs: the social sphere.</p>
<p>The laws by which we draw these boundaries, consciously or habitually, share some commonalities, while other rules hold only for one of these layers. Capturing them in design patterns is what this post is about.</p>
<p>For the economic sphere, which forms our primary concern, Oliver Williamson has established the fundamental dichotomy in the title of his first book: <a href="https://archive.org/details/marketshierarchi00will">markets vs hierarchies</a>, as shorthand for activities across firms vs activities within firms.</p>
<p>But in both governance mechanisms (to appropriate the title of Williamson’s <a href="https://archive.org/details/mechanismsofgove0000will">final book</a>), these labels hide some intricate machinery under the hood: “hierarchy” might be the canonical form of structuring interactions within, and “market” for interactions across organizations. But these terms contain a multitude of moving parts, all of which are subject to a myriad of design decisions.</p>
<p>Hierarchy is the canonical reporting structure for any larger organization. It’s ubiquitous enough that we can use the two terms synonymously, even if they’re not perfectly identical. It takes on the form of an upside-down tree (in the <a href="https://en.wikipedia.org/wiki/Tree_(graph_theory)">graph-theoretic sense</a>), with its root node at the top.</p>
<p>The branches in a hierarchy describe vertical relationships, usually one-to-many, also known as command-and-control. The superior defines tasks for the subordinates to undertake, provides the necessary resources, monitors, evaluates, and recompenses the work effort — at least in theory.</p>
<p>In theory hierarchy is a sorting mechanism by seniority, a catch-all term that encompasses more experience, more ability to put individual tasks in context and orchestrate them: more ability to manage. In practice, the lofty goal of sorting by superior skill is at best approximated but rarely reached.</p>
<p>In practice, hierarchies take many forms based on and sometimes even deviating from this fundamental design pattern. They can be steeper or flatter, they can incorporate matrix elements, they can be stiff or flexible.</p>
<p>“Reorganization” is a popular game played in the higher echelons of most corporate hierarchies and a neverending income stream for consultants, usually deeply unpopular among those manning the trenches.</p>
<p>This just shows that finding the perfect organizational structure is elusive for all but the simplest organizations.</p>
<p><img src="/assetsPosts/2024-03-22-on-organization/img2.jpg" alt="Acropolis" /></p>
<p>Market is the catch-all term for all economic interactions that happen between organizations. But typically we think of a market more narrowly as a central place where many buyers meet many sellers: an agora.</p>
<p>In reality most economic interactions are of the few-to-few, few-to-one, or one-to-one variety, shaped by relational rather than market interaction. The key ingredient that the economic abstraction of a many-to-many market requires is the “coincidence of wants”: buyers and sellers wanting to trade the very same thing at a price they can both agree upon have to come together at the same place and the same time.</p>
<p>This is often tricky to achieve, and might require two steps mentioned in the first newsletter: displacement in space or time, transportation or storage, to bridge the gap between producer and consumer. Even the advent of online marketplaces did little to change this.</p>
<p>Beyond the recurring reorganizations typically triggered by underperformance, companies have also been known to first outsource their entire distribution network just to reverse course and bring it back in-house. So the make-or-buy label hides a non-trivial problem with massive costs but no obvious solution.</p>
<p>But other than the recognition that markets, organizations, and the boundary in between are subject to design choices which ostensibly influence performance, can we offer another explanation for how to split a supply network into its constituent parts other than the Coasean “<a href="https://onlinelibrary.wiley.com/doi/full/10.1111/j.1468-0335.1937.tb00002.x">costs of carrying out the exchange transactions in the open market</a>”?</p>
<p>To reduce the work of half a dozen Nobelists including Ronald Coase and Oliver Williamson to a tweet-length statement (which might itself evolve into a pattern), the make-or-buy decision boils down to the choice between cost and control.</p>
<p>Expressed another way, more exactly in accounting terms: the cost of holding control over the production process vs the cost of losing control over it.</p>
<p>This is the point where we can bring in the patterns from the first two newsletters. Assuming under Adam Smith’s division of labor that there is another producer who can produce our part cheaper than we could do it in-house, what are the costs of losing control?</p>
<p>They are the costs of negative surprise.</p>
<p>While in Read’s essay it’s perfectly within the supplier’s self-interest to ship us the part in the volume ordered, there are two reasons why the shipment might stall: accidentally or deliberately.</p>
<p>Accidental production stops, or more exactly fluctuations between demand and supply that trigger stock-outs, are fairly common occurrences and the daily bread of supply bottleneck managers. The risk that a stock-out can trigger massive knock-on costs (the aforementioned missing five-dollar part that can stop a ten-million-dollars-per-hour production line and turn finished products into 99.9%-finished unsellable inventory) drives the decision to increase production depth even if there is no ill will by the supplier.</p>
<p>But the supplier knows this and can withhold deliveries strategically, essentially holding them hostage in order to negotiate better terms. The world of procurement is even in normal times rougher than portrayed by Read. Add an external shock to the supply infrastructure and planning cycles, inventory costs, and strategic maneuvering can explode.</p>
<h1>Organizations as belief structures</h1>
<p>But can we abstract away from the purely economic — most organizations are not economic in intent — and express this avoidance of negative surprise as a design pattern for drawing the boundaries around organizations, or, viewed from the other end: how to split a network of interactions, social, political, or economic, into coherent clusters which we might want to call organizations or, more specifically, companies, parties, states, religious communities?</p>
<p>In the world I’ve drawn so far, the need for design (and the need to capture them in design patterns) arises from the myriad of moving parts that require design choices on multiple levels: we have to choose between market and hierarchy, once we choose hierarchy we have to choose the structure of the hierarchy, including its governance structure, and another level down we have to decide on the shape of each reporting relationship, including who gets to sit on each end.</p>
<p>This is a world of high uncertainty and while we would like to resolve each design question empirically, empiricism is costly, so we will ultimately end up with goal conflict.</p>
<p>This goal conflict, which shapes the boundary of the organization, can be expressed in three patterns.</p>
<p>The first fundamental problem of organization is resolving the conflict between moving forward and staying together.</p>
<p>The second fundamental problem of organization is resolving the conflict between moving forward and staying put.</p>
<p>The third fundamental problem of organization is to decide which direction is forward.</p>
<p>All organizations have to resolve this goal conflict — literally “where do we go from here?” — or risk breaking up. Or expressed differently, the tectonic rifts between factions occur where these goal conflicts are unresolvable.</p>
<p>Using another pattern, the paradigm of traditional industrial organization introduced by Ed Chamberlin and applied by Joe Bain, “<a href="https://archive.org/details/industrialorgani00bain">structure, conduct, performance</a>” is a more general translation as “given the situation we’re in, of all the options available, which courses of action are the ones that promise the most success?”</p>
<p>Applying this paradigm requires coming to an agreement on a mapping between actions (conduct) and future outcomes (performance) given a set of starting conditions both internal and external (structure). This mapping requires expressing and ranking subjective expectations of conditional futures: beliefs (as opposed to objective probabilities in the statistical nomenclature). Where these beliefs diverge sufficiently, coordinating efforts within an organization is no longer feasible.</p>
<p>Competition is not only, as economic textbooks imply, competition between like products, but competition between differing courses of action based on differing beliefs about their feasibility.</p>
<p>This underlying idea, that boundaries emerge where coherence of beliefs breaks down between participants, will come up repeatedly in the future, and it is the main reason why organization rather than market exchange takes pride of place in this discussion — very simply because from a design perspective, market exchange is simply a special form of organization.</p>
<p><img src="/assetsPosts/2024-03-22-on-organization/img3.jpg" alt="Landscape" /></p>Oliver BeigeIn which we describe organization and organizations as tectonic plates shaped by clashing beliefs.Learning with Invariant Preferences2024-03-18T00:00:00+00:002024-03-18T00:00:00+00:00https://cybercat-institute.github.io//2024/03/18/learning-invariant-preferences<p>It’s been a busy few weeks in the world of category theory for deep learning. First of all come the preprint <a href="https://arxiv.org/abs/2402.15332">Categorical Deep Learning: An Algebraic Theory of Architectures</a> from authors at <a href="https://www.symbolica.ai/">Symbolica</a> and <a href="https://deepmind.google/">DeepMind</a>, including our friend <a href="https://www.brunogavranovic.com/">Bruno</a>. And then hot on the heels of the paper, Symbolica raised a <em>big</em> investment round based largely on applications of the ideas in the paper.</p>
<p>The paper is about <em>structured learning</em> and it proposes a big generalisation of geometric deep learning, which is itself a big generalisation of convolutional networks. The general idea is that the data processed by a neural network is not just random data but is the vectorisation of data coming from some real world domain. If your vectors encode an image then there is implicit geometry inherited from the physical world. Geometric deep learning is all about designing architectures that encode <em>geometric</em> invariants of data, specifically in the form of invariant <em>group actions</em> a la <a href="https://en.wikipedia.org/wiki/Erlangen_program">Klein’s Erlangenprogramm</a>.</p>
<p>What the paper points out is that the whole of geometric deep learning can be massively generalised from group actions to arbitrary (co)algebras of functors and (co)monads. From there you can easily re-specialise for specific applications. For example, if your training data is vectorisation of source code of a programming language, you can encode the structure of that language’s source grammar into your architecture in a virtually mechanical way.</p>
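As a hint of what “(co)algebras of functors” means concretely, here is the textbook Haskell presentation: an algebra of a functor determines a fold (structural recursion over a data structure), and dually a coalgebra determines an unfold. This is standard material rather than code from the paper:

```haskell
-- Fixpoint of a functor: the data structure built from repeated layers of f.
newtype Fix f = In (f (Fix f))

type Algebra   f a = f a -> a   -- collapses one layer: gives folds
type Coalgebra f a = a -> f a   -- grows one layer: gives unfolds

-- Catamorphism: fold a whole structure using an algebra for one layer.
cata :: Functor f => Algebra f a -> Fix f -> a
cata alg (In t) = alg (fmap (cata alg) t)

-- Example: lists of Ints as the fixpoint of a base functor.
data ListF a r = NilF | ConsF a r
instance Functor (ListF a) where
  fmap _ NilF        = NilF
  fmap g (ConsF a r) = ConsF a (g r)

-- Summing a list is the fold of this one-layer algebra.
sumAlg :: Algebra (ListF Int) Int
sumAlg NilF        = 0
sumAlg (ConsF a r) = a + r
```

Group actions, the invariants of geometric deep learning, are recovered as algebras of particular functors, which is the sense in which the paper's framework generalises them.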
<p>Suffice to say, I’m <em>very</em> excited about this idea. This could be a watershed moment for applied category theory in general, and it happens to be something that’s right next door to us - the paper heavily uses categories of parametrised morphisms, one of the two building blocks of categorical cybernetics.</p>
<p><img src="/assetsPosts/2024-03-18-learning-invariant-preferences/eugenio-mazzone-6ywyo2qtaZ8.jpg" alt="Books" /></p>
<h1>Invariant preferences</h1>
<p>The first thought I had when I read the paper was <em>invariant preferences</em>. A real AI system is not something that exists in isolation but is something that interacts in some way with the world around it. Even if it’s not a direct “intentional” action such as a robot actuator, the information flow from the AI to the outside world is some kind of <em>action</em>, making the AI an <em>agent</em>. For example, ChatGPT is an agent that acts by responding to user prompts.</p>
<p>Intelligent agents who act can have <em>preferences</em>, the most fundamental structure of <em>decision theory</em> and perhaps also <em>microeconomics</em>. In full generality, “having preferences” means selecting actions in order to bring about certain states of the world and avoid others. Philosophical intention is not strictly required: preferences could have been imposed by the system’s designer or user, one extreme case being a thermostat. AI systems that act on an external world are the general topic of <em>reinforcement learning</em> (although some definitions of RL are too strict for our purposes here).</p>
<p>This gave me a future vision of AI safety where neural network architectures have been designed upfront to <em>statically guarantee</em> (ie. in a way that can be mathematically proven) that the learned system will act in a way that conforms to preferences chosen by the system designer. This is in contrast to, and in practice complements, most approaches to AI safety that involve supervision, interpretation, or “dynamic constraint” of a deployed system - making it the very first line of an overall <em>defense in depth</em> strategy.</p>
<p>A system whose architecture has invariant preferences will act in a way to bring about or avoid certain states of the world, <em>no matter what it learns</em>. A lot of people have already put a lot of thought into the issue of “good and bad world-states”, including very gnarly issues of how to agree on what they should be - what I’m proposing is a technological missing link, how to bridge from that level of abstraction to low-level neural network architectures.</p>
<p>This post is essentially a pitch for this research project, which as of right now we don’t have funding to do. We would have to begin with a deep study of the relationship between <em>preference</em> (the thing that actions optimise) and <em>loss</em> (the thing that machine learning optimises). This is a crossover that already exists: for example in the connection between softmax and Boltzmann distributions, where thermodynamics and entropy enter the picture uninvited yet again. But going forward I expect that categorical cybernetics, which has already built multiple new bridges between all of the involved fields (see this picture that I sketched a year ago), is going to have a lot to say about this, and we’re going to listen carefully to it.</p>
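To make the softmax/Boltzmann connection mentioned above concrete: softmax over scores $s_i$ at temperature $t$ is exactly the Boltzmann distribution over energies $e_i = -s_i$, with $p_i \propto e^{s_i / t}$. A minimal numeric sketch, purely illustrative:

```haskell
-- Softmax at temperature t: p_i = exp (s_i / t) / sum_j exp (s_j / t).
-- As t grows the distribution flattens toward uniform; as t shrinks it
-- concentrates on the argmax, linking loss-based and preference-based
-- selection.
softmax :: Double -> [Double] -> [Double]
softmax t ss = map (/ z) ws
  where
    ws = map (\s -> exp (s / t)) ss
    z  = sum ws
```

The low-temperature limit is where the decision-theoretic notion of preference (pick the argmax) and the machine-learning notion of loss minimisation meet.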
<p><img src="/assetsPosts/2024-03-18-learning-invariant-preferences/img1.jpg" alt="Mind map" /></p>
<p>There are a few category-theoretic things I already have to say, but this post isn’t the best place for it. To give a hint: I suspect that preferences should be <em>coalgebraic</em> rather than algebraic according to the structural invariant learning machinery, because they describe the <em>output</em> of a neural network, as opposed to things like geometric invariants, which describe the <em>input</em>.</p>
<h1>World-models</h1>
<p>The thing that will stop this being easy is that in a world of <a href="https://en.wikipedia.org/wiki/Complete_information">incomplete information</a>, such as the real world, agents with preferences can only act with respect to their <em>internal model</em> of the outside world. If we’re relying on invariant preferences for safety, they can only be as safe as the agent’s internal model is accurate. We would also have to worry about things like the agent systematically deceiving itself for long-term gain, as many humans do. The good news is that practitioners of RL have spent a long time working on the exact issue of accurately learning world-models, the first step being off-policy algorithms that decouple <em>exploration</em> (ie. world-model learning) from <em>exploitation</em> (ie. optimisation of rewards).</p>
<p>There is also an alternative possibility of <em>manually</em> imposing a human-engineered world-model rather than allowing the agent to learn it. This would be an absolutely monumental task of industrial-scale ontology, but it’s a big part of what <a href="https://www.aria.org.uk/what-were-working-on/#davidad">Davidad’s project</a> at the UK’s new ARIA agency aims to do. Personally I’m more bullish on learning world-models by provably-accurate RL at the required scale, but your mileage may vary, and in any case invariant preferences will be needed either way.</p>
<p>To wrap up: this is a project we’re thinking about and pursuing funding to actively work on. The “Algebraic Theory of Architecture” paper only dropped a few weeks ago as I’m writing this and opens up a whole world of new possibilities, of which invariant preferences is only one, and we want to strike while the iron is still hot.</p>Jules HedgesA system whose architecture has invariant preferences will act in a way to bring about or avoid certain states of the world, no matter what it learns. A lot of people have already put a lot of thought into the issue of good and bad world-states, including very gnarly issues of how to agree on what they should be - what I'm proposing is a technological missing link, how to bridge from that level of abstraction to low-level neural network architectures.The Attention-Seeking Rational Actor2024-03-15T00:00:00+00:002024-03-15T00:00:00+00:00https://cybercat-institute.github.io//2024/03/15/attention-seeking-rational-actor<p>Cross-posted from <a href="https://econpatterns.substack.com/p/the-attention-seeking-rational-actor">Oliver’s Substack blog, EconPatterns</a></p>
<p><em>The fundamental economic exchange is surprises for eyeballs.</em></p>
<p>Modern economics is built around understanding the mechanics of market exchange, but it hasn’t always been that way. The etymological root of economics, the Greek <a href="https://www.etymonline.com/word/economy">oikonomia</a>, points toward household management, or husbandry of the (largely self-sufficient) estate, the oikos. Today we would call it home economics.</p>
<p>After discussing the fundamental grid of the economy in the <a href="2024-03-08-stocks-flows-transformations">last post</a>, it makes sense to lay out the underlying assumptions of human behavior within that economy in some detail — and both the title and the introductory statement (possibly the first pattern introduced) should make it clear that these assumptions differ somewhat from the traditional textbook treatment of economic agents.</p>
<p>But they also differ from the various attempts to bound the rationality assumptions of textbook economics in some way, be it in the Carnegie “<a href="https://en.wikipedia.org/wiki/Satisficing">satisficing</a>” or in the Berkeley “<a href="https://en.wikipedia.org/wiki/Behavioral_economics">behavioral</a>” tradition. It nevertheless incorporates both, in addition to a variety of other behavioral quirks which we might not associate with the economic realm.</p>
<p>The major reason to tweak our behavioral assumptions is that to design economic structures we need a coherent framework for dealing with a variety of settings in which we need to be able to apply a varying set of behavioral assumptions while still trying to stay coherent.</p>
<p>So it’s not so much a behavioral assumption but a template for developing context-specific behavioral assumptions — or in other words, a design pattern. Humans behave differently in different social settings, and we should be able to pick the right model for the right circumstances, but still be able to treat it as a special instantiation of a shared underlying pattern.</p>
<p>This explicitly includes using the assumption of perfect rationality wherever it is warranted.</p>
<p>So let’s grab our opening statement and take it apart.</p>
<p><img src="/assetsPosts/2024-03-15-attention-seeking-rational-actor/img1.jpg" alt="Woman's face" /></p>
<h1>Eyeballs</h1>
<p>“Eyeballs” is marketing vernacular for attention. The term can be taken quite literally — there are devices that track eyeball movement to find out how much screentime is spent staring at ads. But for the most part I will use it metaphorically as the cognitive effort devoted to a task.</p>
<p>It is perfectly fine to assume away cognitive limitations in a wide variety of circumstances. It simplifies our model significantly. It deflects accusations that a given policy claim is the outcome of an opportunistically chosen (boundedly rational) behavioral model rather than an underlying economic force. And in many scenarios it creates good-enough predictions for the task at hand.</p>
<p>Assumptions are simplifications that ideally give us more gain in parsimony than loss in predictive accuracy. As long as that’s what they do, they do their job.</p>
<p>But there are also situations where such a simplifying assumption produces results that stray too far from observable reality, and we need to have a plan for how we want to adjust the behavioral model in those situations.</p>
<p>A fair starting assumption is to expect that the economic actor will allocate cognitive resources economically and allocate the most attention to those tasks where she expects the most bang for the buck. And that brings us to the other part of the statement.</p>
<h1>Surprises</h1>
<p>The economic expression for “expects most bang for the buck” is “maximum expected utility”, but this requires a lot of foreknowledge where we can’t simply assume under all circumstances that our economic actor already possesses it. Every time you see an economics paper assuming that our actor knows something about the distribution of a random variable you know we’re on shaky ground.</p>
<p>So the next level is to assume that our actor will venture to find out and acquire this knowledge step-by-step in what we can call a process of discovery — which usually means a sequence of failures that terminates either with a moment of success or the decision to call it off. In econspeak, this discovery process is known as tâtonnement.</p>
<p>But we shouldn’t assume that our agent just wanders around in the desert aimlessly hoping to find an oasis — a stark example of such a discovery process with a life-or-death ending — but that there should be a plan behind those wanderings.</p>
<p>That plan is usually to devote the existing resources, cognitive and physical, in a way that maximizes the knowledge gained about the terrain. In our desert scenario this might translate to climbing to the top of a ridge to survey the territory, or alternatively to stay near the valley floor to limit exposure to sunlight.</p>
<p>We can describe this process in two ways: uncovering secrets — where a secret is anything that wasn’t known before but is known after — or hunting for surprises.</p>
<p>Surprise expresses the same thing — some difference between what was known before vs what is known after — but it also gives us the opportunity to express it in two ways: positive surprise and negative surprise.</p>
<h1>The fundamental economic exchange is surprise for eyeballs</h1>
<p>Loosely translated, positive surprise is beneficial — something worth seeking out — and negative surprise is harmful — something to be avoided. On this single dimension we can build a (surprisingly) wide range of behavioral models, including differentiating individuals by their propensity to seek out positive surprise and accept negative surprise in the process, in other words by their affinity for disorder.</p>
<p>This has clear connections to the behavioral assumption of risk preference, and this connection definitely warrants further attention — risk is a transferable economic commodity — but it also gives us the additional angle that planning is a vehicle to mitigate negative surprise for individual actors, and contracting is a vehicle to mitigate negative surprise for collective action, including the canonical form of collective action: the organization (which will be at the center of next week’s post).</p>
<p>A lot of this will be fleshed out in the weeks to come, and some of the jumping-off points should already be apparent. Surprise gives us the opportunity to invoke both information entropy and ultimately thermodynamic entropy. But as already mentioned, this series will only use these ideas conceptually, and point towards formal treatments in their respective literatures.</p>
<p>Design is a guided trial-and-error process where judgment calls have to be made about the structure of the problem, about splitting it into its constituent parts and putting the parts back together in the hope that no unwanted interaction effects emerge, about taking requirements and putting them in an order, about defining and resolving contingencies and dependencies, about the level of detail at which a problem needs to be resolved, at which precision, and how far into the future.</p>
<p>For this we need a flexible model of behavioral assumptions that can be adjusted to fit the task at hand, that can be experimented with. “Surprises for eyeballs”, or in other words, “secrets for attention”, gives us exactly that.</p>
<h1>The good old-fashioned attention economy</h1>
<p>There’s an obvious objection to this treatment, and it’s a fair one. “Surprise for eyeballs” is most obviously suited to the information economy, or maybe more aptly: the attention economy, and in the trad economy we might be better off dealing with the canonical exchange of supply vs demand in its trad form of an effort (a product or service) vs a payment.</p>
<p>Let me use George Akerlof’s famous essay on the <a href="https://www.jstor.org/stable/1879431">market for lemons</a> to show why even a one-off transfer of a physical object against a simultaneous transfer of its monetary equivalent is still a special case of an attention economy full of surprises.</p>
<p>Akerlof’s paper kicked off the field of information economics, and is most widely associated with introducing the concept of asymmetric information. But as the second half of its title suggests, it’s actually about quality competition (a “lemon” being a colloquial term for a used car of poor quality), and the information angle is about the inability to convey this quality — especially the inability of an owner of a high-quality car to establish that his car is not a lemon.</p>
<p>But how do we find out if a car is a lemon? And how do we insure ourselves against the risk of acquiring a lemon? By finding out.</p>
<p>In the same sense of the stranded-in-the-desert example above, the process of finding out is a discovery process except with opposite signs. It’s a sequence of successes terminated by a failure — which is true for all machines: they run until they break down.</p>
<p>But there’s an inevitable random element to this process, and even if we can assume that lemon-ness correlates negatively with longevity, that relationship is far from deterministic. We cannot conclude with certainty from the time of failure whether the car was a lemon — even if the prior owner knew about its lemon-ness.</p>
<p>This simple recognition has a wide array of ramifications worth taking apart in detail, because most of them are central to economic design — not only of economic engines like markets, auctions, recommenders or reputation engines, but also to the design of economic institutions. Notoriously, the business model of the Roman Catholic Church is that of a certifier of good conduct: a good old-fashioned reputation engine.</p>
<p>The tl;dr of this excursion is that almost all goods are experience goods in that their value only becomes apparent when they are consumed, and the consumption harbors the possibility for surprise, positive or negative.</p>
<p>Whether this happens over a longer time span, as with driving a car, or immediately, as with eating ice cream, or whether immediate consumption triggers belated effects like a toothache, depends on the circumstances.</p>
<p>But the canonical economic trade of a perfectly substitutable commodity of perfectly equal quality is a simplifying assumption resting on a lot of institutional underpinnings. Almost all trades, in the trad economy or the digital economy, contain an element of surprise, and in turn engage our propensity to shield ourselves from it, or to embrace it.</p>Oliver BeigeIn which we establish an underlying model for human behavior and claim that all economies are just a variation of the attention economy.Stocks, Flows, Transformations: The Cybernetic Economy2024-03-08T00:00:00+00:002024-03-08T00:00:00+00:00https://cybercat-institute.github.io//2024/03/08/stocks-flows-transformations<p>Cross-posted from <a href="https://econpatterns.substack.com/p/stocks-flows-transformations-the">Oliver’s Substack blog, EconPatterns</a></p>
<p>On a certain level of abstraction, an economy can be described as a network of stocks, flows, and transformations. Let’s call this level the cybernetic economy.</p>
<h1>Stocks, flows, transformations</h1>
<p>Stocks and flows are two fundamental forms of displacement: in time and space respectively, and they are typically restricted by upper and lower capacity constraints: overstock vs stockout, overflow vs desiccation.</p>
<p>Transformation in the usual sense of industrial production means the recombination of inputs to produce new outputs, but we can also include creation and consumption as starting and endpoints of network flow. In the case of natural resources, creation often takes the form of extraction.</p>
<p>The stocks and flows usually come in the form of information, materials, effort, payments, equipment, and on a more abstract level, risks, beliefs, rights, and commitments. Risk is just as much an economic good as any physical material: it can be transformed, bundled, disassembled, and transported.</p>
<p>Most of these objects should sound familiar from economic textbooks, especially macroeconomic textbooks. The cybernetic economy differs from this textbook treatment mostly by explicitly highlighting the network of interactions, and by stressing the global ramifications of local interactions.</p>
<p>This network view of the economy on the other hand should be familiar to anyone with a background in industrial production, where orchestrating multi-step processes on shop floors densely packed with machines, pathways, buffers, and assembly stations is a major part of the job description, and where stockouts of five-dollar parts can stop ten-million-an-hour assembly lines — as can pathways congested by improvised material buffer overflows.</p>
<p><img src="/assetsPosts/2024-03-08-stocks-flows-transformations/img1.jpg" alt="Shipping containers" /></p>
<p>Economics, especially macroeconomics, usually skips this operational layer for the sake of expositional expediency, and for the most part it does ok doing so. As long as the operational friction stays within bounds, no stocks and flows pushing against their upper or lower capacity limits, no production schedules foiled by unobtainable five-dollar components, we can safely assume a frictionless world and focus on the established gears and levers central to macroeconomic inquiry.</p>
<p>In other words, as long as there is only a modicum of disorder in the economy, it’s perfectly fine to assume a well-ordered economy.</p>
<p>Which underlines a key principle: the right level of aggregation matters. A map is not the territory, but we might need different maps to do different things within the territory. In the same sense we can drop operational details and aggregate activity on a high level as long as we can be sure that the loss of realism — the loss of predictability — is inconsequential for the task at hand.</p>
<p>But we should have a more fine-grained map at the ready just in case our survey map fails to capture the finer points.</p>
<h1>The cybernetic economy</h1>
<p>The economy we’re looking at is an economy that can be disaggregated and disassembled to the individual component, the individual participant, the individual activity, just as needed whenever it is needed.</p>
<p>I’m resurrecting the somewhat outmoded term “cybernetic” for it because it conveys the focus on flows, on routing, buffering, concatenating, on orchestrating activities and resources.</p>
<p>Routing, network flow, buffering, job shop scheduling, machine replacement models are all standard tools of the trade in operations research. They are no longer, or not yet again, standard tools in economics, but in order to describe the economic activities as intended, and to couch them in a wider social and political context, they should become economic tools again.</p>
<p>EconPatterns intends to bring them back together under the same motivation that it intends to bring mathematical, statistical and computational tools together: to build up a toolset which we can use to design economic objects.</p>
<p>But, and this is the conjurer’s trick, it’ll do so almost entirely without resorting to formal modeling or even mathematical notation. This is not out of nostalgia for an era where political economy was a branch of the philosophical faculties. The economy is as data rich as any field of inquiry and we seem to have just enough recognizable, repeating and generalizable patterns to give the scientific method a try.</p>
<p>But the point of the exercise is to develop an economic design language, to establish a conceptual foundation, rather than to rephrase current economic knowledge. This is why it invokes the famous Bauhaus Vorkurs, the foundational course that gave the Bauhaus students a starting point from which to branch out into their respective workshops.</p>
<p>The things for which economics, mathematics, statistics, operations research, computer science, and other fields have developed very intricate formal mechanisms will pop up mostly as pointers. The question which sorting, filtering, or separating algorithm to use is relevant and often decisive to the success of an economic activity, but it is secondary to the question when to sort, filter or separate — and what.</p>
<p>Instead it will take very close looks — some might think unreasonably close looks but my hope is the reasons for doing so will reveal themselves in due time — at existing economic artifices and their constituent parts. One of the motivations is to show that the Grand Bazaar in Istanbul and an online e-commerce platform have surprisingly many things in common, and there’s a reason for it.</p>
<h1>An economic pattern language</h1>
<p>To this end, EconPatterns — and I believe this is the defining novelty — will borrow liberally from design theory and practice, as well as from architecture. The chosen container for this endeavor is Christopher Alexander’s design pattern. There are many reasons for this choice, not the least of which is that design patterns have successfully been translated from architecture to software design.</p>
<p>The in-depth discussion of “why design patterns?” surely deserves its own article, but it also introduces an interesting tension. As design philosophies go, Alexander and the Bauhaus stalwarts are certainly at opposing ends of the spectrum, A to B, organic to geometric, habitable spaces to machines for living.</p>
<p>I’m hoping to put this tension to good use. Designing economic contraptions poses relevant questions beyond their productivity and efficiency. Which is a major reason why I am not trying to resolve that conflict or take sides.</p>
<p>Admittedly, the whole endeavor is open-ended, and the crucial question if the patterns sketched out so far will ultimately come together as a coherent whole is still unresolved. This is why the blog format is the right one at this juncture: to put the question out in the open while I present the first pieces of the puzzle.</p>
<p>EconPatterns will inevitably be shaped by my own background and my own particular interests, which is one reason why economic organization will be the initial focus. The fundamental model of the economy is different, as is the underlying concept of human behavior (as next week’s entry will show). I’m somewhat inclined to say that there are not that many people out there with a background both in design and economics, so I’m quite comfortable in claiming that the exercise should offer sufficient novelty.</p>
<p>I’m also very clear that I don’t hold exclusive rights to the very concept of design patterns — if anything I might be the first practitioner to apply them to economic design problems — but the ultimate defining characteristic of design patterns that sets them apart from economic laws is that they’re entirely voluntary. They are simply proposals of how to look at, structure, and solve a certain design problem, and the ultimate arbiter for their success is if enough practitioners will find them useful enough to apply them to express their ideas.</p>
<p>Which in itself should hopefully take much of the pedantry out of economic debates.</p>Oliver BeigeAn Economic Pattern Language (@econpatterns for short) takes the economy and disassembles it into its constituent parts. But first, this blog post describes the economy as a whole.Iteration with Optics2024-02-22T00:00:00+00:002024-02-22T00:00:00+00:00https://cybercat-institute.github.io//2024/02/22/iteration-optics<p>In this post I’ll describe the theory of how to add iteration to categories of optics. Iteration is required for almost all applications of categorical cybernetics beyond game theory, and is something we’ve been handling only semi-formally for some time. The only tool we need is already one we have inside the categorical cybernetics framework: parametrisation weighted by a lax monoidal functor. I’ll end with a conjecture that this is an instance of a general procedure to force states in a symmetric monoidal category.</p>
<p>This post is strongly inspired by the account of Moore machines in <a href="http://davidjaz.com/">David Jaz Myers</a>’ book <a href="http://davidjaz.com/Papers/DynamicalBook.pdf">Categorical Systems Theory</a>, and <a href="https://matteocapucci.wordpress.com/">Matteo</a>’s enthusiasm for it. There’s probably a big connection to things like <a href="https://arxiv.org/abs/1903.01093">Delayed trace categories</a>, but I don’t understand it yet.</p>
<p>The diagrams in this post are made with <a href="https://q.uiver.app/">Quiver</a> and <a href="https://varkor.github.io/tangle/">Tangle</a>.</p>
<h1>The iteration functor</h1>
<p>For the purposes of this post, we’ll be working with a symmetric monoidal category $\mathcal C$, and the category $\mathbf{Optic} (\mathcal C)$ of monoidal optics over it. Objects of $\mathbf{Optic} (\mathcal C)$ are pairs of objects of $\mathcal C$, and morphisms are given by the coend formula</p>
\[\mathbf{Optic} (\mathcal C) \left( \binom{X}{X'}, \binom{Y}{Y'} \right) = \int_{M : \mathcal C} \mathcal C (X, M \otimes Y) \times \mathcal C (M \otimes Y', X')\]
<p>which amounts to saying that an optic $\binom{X}{X’} \to \binom{Y}{Y’}$ is an equivalence class of triples</p>
\[(M : \mathcal C, f : X \to M \otimes Y, f' : M \otimes Y' \to X')\]
<p>I’m pretty sure everything in this post works for other categories of bidirectional processes such as mixed optics and dependent lenses; this is just a setting for writing it down that is both convenient and not at all obvious.</p>
<p>The <strong>iteration functor</strong> is a functor $\mathrm{Iter} : \mathbf{Optic} (\mathcal C) \to \mathbf{Set}$ defined on objects by</p>
\[\mathrm{Iter} \binom{X}{X'} = \int_{M : \mathcal C} \mathcal C (I, M \otimes X) \times \mathcal C (M \otimes X', M \otimes X)\]
<p>We refer to elements of $\mathrm{Iter} \binom{X}{X’}$ as <em>iteration data</em> for $\binom{X}{X’}$. We call the object $M$ the <em>state space</em>, the morphism $x_0 : I \to M \otimes X$ the <em>initial state</em> and the morphism $i : M \otimes X’ \to M \otimes X$ the <em>iterator</em>.</p>
<p>Note that in the common case that $\mathcal C$ is cartesian monoidal, we can eliminate the coend to obtain a simpler characterisation:</p>
\[\mathrm{Iter} \binom{X}{X'} = \mathcal C (1, X) \times \mathcal C (X', X)\]
<p>Given an optic $f : \binom{X}{X’} \to \binom{Y}{Y’}$ given by $f = (N, f : X \to N \otimes Y, f’ : N \otimes Y’ \to X’)$, we get a function</p>
\[\mathrm{Iter} (f) : \mathrm{Iter} \binom{X}{X'} \to \mathrm{Iter} \binom{Y}{Y'}\]
<p>Namely, the state space is $M \otimes N$, the initial state is</p>
\[I \overset{x_0}\longrightarrow M \otimes X \xrightarrow{M \otimes f} M \otimes N \otimes Y\]
<p>and the iterator is</p>
\[M \otimes N \otimes Y' \xrightarrow{M \otimes f'} M \otimes X' \overset{i}\longrightarrow M \otimes X \xrightarrow{M \otimes f} M \otimes N \otimes Y\]
<p>This is evidently functorial. Funnily enough, although the action of $\mathrm{Iter}$ on objects when $\mathcal C$ is cartesian is easier to understand, its action on morphisms is less obvious and is not <em>evidently</em> functorial, instead demanding a small proof.</p>
<h1>Pairing iterators and continuations</h1>
<p>We have an existing functor $K : \mathbf{Optic} (\mathcal C)^{\mathrm{op}} \to \mathbf{Set}$, given on objects by $K \binom{X}{X’} = \mathcal C (X, X’)$. This is the <em>continuation functor</em>, and it is the contravariant functor represented by the monoidal unit $\binom{I}{I}$. (This functor first appeared in <a href="https://arxiv.org/abs/1711.07059">Morphisms of Open Games</a>.)</p>
<p>For the remainder of this section I’ll specialise to the case $\mathcal C = \mathbf{Set}$, in which case an optic $\binom{X}{X’} \to \binom{Y}{Y’}$ is determined by a pair of functions $f : X \to Y$ and $f’ : X \times Y’ \to X’$, and iteration data $i : \mathrm{Iter} \binom{X}{X’}$ is determined by an initial value $x_0 : X$ and a function $i : X’ \to X$.</p>
<p>Given iteration data and a continuation that agree on their common boundary, we know enough to run the iteration and produce an infinite stream of values:</p>
\[\left< - | - \right> : \mathrm{Iter} \binom{X}{X'} \times K \binom{X}{X'} \to X^\omega\]
<p>Namely, this stream is defined corecursively by</p>
\[\left< x_0, i | k \right> = x_0 : \left< i (k (x_0)), i | k \right>\]
<p>This operation is natural (technically, <em>dinatural</em>): for any iteration data $i : \mathrm{Iter} \binom{X}{X’}$, optic $f : \binom{X}{X’} \to \binom{Y}{Y’}$ and continuation $k : K \binom{Y}{Y’}$, we have</p>
\[\left< i | K (f) (k) \right> = f^\omega \left( \left< \mathrm{Iter} (f) (i) | k \right> \right)\]
<p>where $f^\omega (-) : X^\omega \to Y^\omega$ means applying the forwards pass of $f$ to every element of the stream. As a commuting diagram,</p>
<p><img src="/assetsPosts/2024-02-20-iteration-optics/dinaturality.png" alt="Dinaturality" /></p>
<p>Here’s a tiny implementation of the iteration functor and the pairing operator in Haskell:</p>
<div class="language-haskell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kr">data</span> <span class="kt">Iterator</span> <span class="n">s</span> <span class="n">t</span> <span class="o">=</span> <span class="kt">Iterator</span> <span class="p">{</span>
<span class="n">initialState</span> <span class="o">::</span> <span class="n">s</span><span class="p">,</span>
<span class="n">updateState</span> <span class="o">::</span> <span class="n">t</span> <span class="o">-></span> <span class="n">s</span>
<span class="p">}</span>
<span class="n">mapIterator</span> <span class="o">::</span> <span class="kt">Lens</span> <span class="n">s</span> <span class="n">t</span> <span class="n">a</span> <span class="n">b</span> <span class="o">-></span> <span class="kt">Iterator</span> <span class="n">s</span> <span class="n">t</span> <span class="o">-></span> <span class="kt">Iterator</span> <span class="n">a</span> <span class="n">b</span>
<span class="n">mapIterator</span> <span class="n">l</span> <span class="p">(</span><span class="kt">Iterator</span> <span class="n">s</span> <span class="n">f</span><span class="p">)</span> <span class="o">=</span> <span class="kt">Iterator</span> <span class="p">(</span><span class="n">s</span> <span class="o">^#</span> <span class="n">l</span><span class="p">)</span> <span class="p">(</span><span class="nf">\</span><span class="n">b</span> <span class="o">-></span> <span class="p">(</span><span class="n">f</span> <span class="p">(</span><span class="n">s</span> <span class="o">&</span> <span class="n">l</span> <span class="o">.~</span> <span class="n">b</span><span class="p">))</span> <span class="o">^#</span> <span class="n">l</span><span class="p">)</span>
<span class="n">runIterator</span> <span class="o">::</span> <span class="kt">Iterator</span> <span class="n">s</span> <span class="n">t</span> <span class="o">-></span> <span class="kt">Lens</span> <span class="n">s</span> <span class="n">t</span> <span class="nb">()</span> <span class="nb">()</span> <span class="o">-></span> <span class="p">[</span><span class="n">s</span><span class="p">]</span>
<span class="n">runIterator</span> <span class="p">(</span><span class="kt">Iterator</span> <span class="n">s</span> <span class="n">f</span><span class="p">)</span> <span class="n">l</span> <span class="o">=</span> <span class="n">s</span> <span class="o">:</span> <span class="n">runIterator</span> <span class="p">(</span><span class="kt">Iterator</span> <span class="p">(</span><span class="n">f</span> <span class="p">(</span><span class="n">s</span> <span class="o">&</span> <span class="n">l</span> <span class="o">.~</span> <span class="nb">()</span><span class="p">))</span> <span class="n">f</span> <span class="p">)</span> <span class="n">l</span>
</code></pre></div></div>
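<p>For concreteness, here is a self-contained variant of the snippet above that replaces the lens-library encoding with a plain function for the continuation $k : X \to X'$, together with an illustrative example (the names <code>counter</code> and <code>naturals</code> are my own, not from the post):</p>

```haskell
-- Iteration data in Set: an initial value and an iterator X' -> X.
data Iterator x x' = Iterator
  { initialValue :: x
  , iterator     :: x' -> x
  }

-- Pair iteration data with a continuation k : X -> X', producing the
-- stream  <x0, i | k> = x0 : <i (k x0), i | k>  defined corecursively
-- in the text.
runIterator :: Iterator x x' -> (x -> x') -> [x]
runIterator (Iterator x0 i) k = x0 : runIterator (Iterator (i (k x0)) i) k

-- Illustrative example: counting upwards from 0, with the identity
-- continuation.
counter :: Iterator Int Int
counter = Iterator 0 (+ 1)

naturals :: [Int]
naturals = runIterator counter id  -- 0, 1, 2, 3, ...
```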
<h1>The category of elements of Iterator</h1>
<p>The next step is to form the category of elements $\int \mathrm{Iter}$, also known as the discrete Grothendieck construction. This is a category whose objects are tuples $\left( \binom{X}{X’}, i \right)$ of an object $\binom{X}{X’}$ of $\mathbf{Optic} (\mathcal C)$ and a choice of iteration data $i : \mathrm{Iter} \binom{X}{X’}$. A morphism $\left( \binom{X}{X’}, i \right) \to \left( \binom{Y}{Y’}, j \right)$ is an optic $f : \binom{X}{X’} \to \binom{Y}{Y’}$ with the property that $\mathrm{Iter} (f) (i) = j$, that is to say, the iteration data on the left and right boundary have to agree.</p>
<p>The functor $\mathrm{Iter} : \mathbf{Optic} (\mathcal C) \to \mathbf{Set}$ is lax monoidal: there is an evident natural way to combine pairs of iteration data into iteration data for pairs:</p>
\[\nabla : \mathrm{Iter} \binom{X}{X'} \times \mathrm{Iter} \binom{Y}{Y'} \to \mathrm{Iter} \binom{X \otimes Y}{X' \otimes Y'}\]
<p>This means that the tensor product of $\mathbf{Optic} (\mathcal C)$ lifts to $\int \mathrm{Iter}$, by</p>
\[\left( \binom{X}{X'}, i \right) \otimes \left( \binom{Y}{Y'}, j \right) = \left( \binom{X \otimes Y}{X' \otimes Y'}, i \nabla j \right)\]
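<p>In the cartesian case, where iteration data is just an initial value and an iterator, the laxator $\nabla$ is componentwise pairing. A minimal sketch (the <code>Iterator</code> record mirrors the Haskell snippet earlier in the post; <code>pairIterator</code> is an illustrative name):</p>

```haskell
-- Iteration data in Set, as in the snippet earlier in the post.
data Iterator x x' = Iterator
  { initialValue :: x
  , iterator     :: x' -> x
  }

-- The laxator: combine iteration data for (X, X') and (Y, Y') into
-- iteration data for the tensor (X x Y, X' x Y').
pairIterator :: Iterator x x' -> Iterator y y' -> Iterator (x, y) (x', y')
pairIterator (Iterator x0 i) (Iterator y0 j) =
  Iterator (x0, y0) (\(x', y') -> (i x', j y'))
```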
<p>The category $\int \mathrm{Iter}$ can essentially already describe iteration with optics, although in a slightly awkward way. Suppose we draw a string diagram that not coincidentally resembles a control loop:</p>
<p><img src="/assetsPosts/2024-02-20-iteration-optics/closed-control-loop.png" alt="Control loop" /></p>
<p>Here, $f$ and $f’$ denote some morphisms $f : X \to Y$ and $f’ : Y \to X$ in our underlying category, and $x_0$ represents an initial state $x_0 : I \to X$.</p>
<p>Normally string diagrams denote morphisms of a monoidal category, but we make a cut just to the right of the backwards-to-forwards turning point, and consider that everything left of that is describing a boundary object. Namely in this case, we have the object $\left( \binom{X}{X}, i \right)$ where the iteration data $i : \mathrm{Iter} \binom{X}{X}$ is given by the state space $I$, the initial state $x_0 : I \to I \otimes X$ and the iterator $\mathrm{id} : I \otimes X \to I \otimes X$.</p>
<p><img src="/assetsPosts/2024-02-20-iteration-optics/cut-control-loop.png" alt="Control loop" /></p>
<p>The remainder of the string diagram to the right of the cut denotes an ordinary optic $f : \binom{X}{X} \to \binom{I}{I}$, namely the one given by $f = (Y, f, f’)$, with forwards pass $f : X \to Y \otimes I$ and backwards pass $f’ : Y \otimes I \to X$. This boils down to describing the composite morphism $f; f’ : X \to X$.</p>
<p>Overall, we can read this diagram as denoting a morphism $f$ in $\int \mathrm{Iter}$ of type $f : \left( \binom{X}{X}, i \right) \to \left( \binom{I}{I}, \mathrm{Iter} (f) (i) \right)$. The iteration data on the right boundary is $\mathrm{Iter} (f) (i) : \mathrm{Iter} \binom{I}{I}$, which concretely has state space $Y$, the initial state $x_0; f : I \to Y$ and iterator $f’; f : Y \to Y$.</p>
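<p>To make this concrete, here is a minimal Python sketch (my own illustrative code, not part of the formal development) that unrolls the closed control loop: the right-boundary iteration data has initial state $x_0; f$ and iterator $f’; f$, so running the loop just iterates the composite.</p>

```python
# Unrolling the closed control loop (illustrative sketch with a toy example).
# The right boundary's iteration data has state space Y, initial state
# x0 ; f and iterator f' ; f, so running the loop iterates the composite.

def iterate_loop(f, f_back, x0, steps):
    y = f(x0)              # initial state x0 ; f
    trajectory = [y]
    for _ in range(steps):
        y = f(f_back(y))   # one application of the iterator f' ; f
        trajectory.append(y)
    return trajectory

# Toy instance: X = Y = numbers, forwards pass doubles, backwards pass
# subtracts one.
print(iterate_loop(lambda x: 2 * x, lambda y: y - 1, 2.0, 3))  # [4.0, 6.0, 10.0, 18.0]
```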
<p>This works in principle, but splitting the diagram between denoting an object and denoting a morphism is very non-standard. So far, this amounts to doing for the iteration functor what we did for the selection functions functor in section 6 of <a href="https://arxiv.org/abs/2105.06332">Towards Foundations of Categorical Cybernetics</a>.</p>
<h1>The full theory of iteration</h1>
<p>Now we take the final step to fix the slight clunkiness of using $\int \mathrm{Iter}$ as a model of iteration. This continues the firmly established pattern that categorical cybernetics contains only two ideas that get combined in more and more intricate ways: optics and parametrisation.</p>
<p>There is a strong monoidal functor $\pi : \int \mathrm{Iter} \to \mathbf{Optic} (\mathcal C)$ that forgets the iteration data, namely the discrete fibration $\pi \left( \binom{X}{X’}, i \right) = \binom{X}{X’}$. This functor generates an action of the monoidal category $\int \mathrm{Iter}$ on $\mathbf{Optic} (\mathcal C)$, namely</p>
\[\left( \binom{X}{X'}, i \right) \bullet \binom{Y}{Y'} = \binom{X \otimes Y}{X' \otimes Y'}\]
<p>See section 5.5 of <a href="https://arxiv.org/abs/2203.16351">Actegories for the Working Mathematician</a> for far too much information about actegories of this form.</p>
<p>We now take the category $\mathbf{Para}_{\int \mathrm{Iter}} (\mathbf{Optic} (\mathcal C))$ of parametrised morphisms generated by this action. We also refer to this kind of thing (parametrisation for the action generated by a discrete fibration) as the Para construction <em>weighted</em> by $\mathrm{Iter}$, $\mathbf{Para}^\mathrm{Iter} (\mathbf{Optic} (\mathcal C))$ - the name comes from it being a kind of <a href="https://ncatlab.org/nlab/show/weighted+limit">weighted limit</a> and I think the reference for this is <a href="https://www.brunogavranovic.com/">Bruno</a>’s PhD thesis, which is sadly unreleased as I’m writing this.</p>
<p>Working things through: an object of $\mathbf{Para}^\mathrm{Iter} (\mathbf{Optic} (\mathcal C))$ is still a pair $\binom{X}{X’}$, but a morphism $\binom{X}{X’} \to \binom{Y}{Y’}$ consists of three things: another pair of objects $\binom{Z}{Z’}$, iteration data $i : \mathrm{Iter} \binom{Z}{Z’}$, and an optic $\binom{X \otimes Z}{X’ \otimes Z’} \to \binom{Y}{Y’}$.</p>
<p>Now suppose we have a diagram of an open control loop, that is to say, a control loop that is open-as-in-systems (not to be confused with an <a href="https://en.wikipedia.org/wiki/Open-loop_controller">open loop controller</a>, which is unrelated):</p>
<p><img src="/assetsPosts/2024-02-20-iteration-optics/open-control-loop.png" alt="Open control loop" /></p>
<p>Here the primitive morphisms in the diagram are $f : A \otimes X \to B \otimes Y$, $f’ : B’ \otimes Y \to A’ \otimes X$, and an initial state $x_0 : I \to X$. The idea is that $f$ is the forwards pass, $f’$ is the backwards pass, and after the backwards pass comes another forwards pass but one time step in the future.</p>
<p>To make formal sense of this diagram, we imagine that we deform the backwards-to-forwards bend upwards, treating the state as a parameter, and then cut the diagram as we did before:</p>
<p><img src="/assetsPosts/2024-02-20-iteration-optics/cut-open-control-loop.png" alt="Cut open control loop" /></p>
<p>Now we can read this off as a morphism $\binom{X}{X’} \to \binom{Y}{Y’}$ in $\mathbf{Para}^\mathrm{Iter} (\mathbf{Optic} (\mathcal C))$. The (weighted) Para construction makes everything go smoothly, so this is an entirely standard string diagram with no funny stuff.</p>
<p>Technically categories of parametrised morphisms are always bicategories (or better, double categories), and I think this is a rare case where we actually want to quotient out all morphisms in the vertical direction, i.e. identify $\left( f : \binom{X \otimes Z}{X’ \otimes Z’} \to \binom{Y}{Y’}, i : \mathrm{Iter} \binom{Z}{Z’} \right)$ with $\left( g : \binom{X \otimes W}{X’ \otimes W’} \to \binom{Y}{Y’}, j : \mathrm{Iter} \binom{W}{W’} \right)$ whenever there is <em>any</em> optic $h : \binom{Z}{Z’} \to \binom{W}{W’}$ making $\mathrm{Iter} (h) (i) = j$ and commuting with $f$ and $g$. Coming back to our earlier picture of cutting a string diagram, this exactly says that we identify all of the different ways we could make the cut. In order to do this we change the base of enrichment along the functor $\pi_0 : \mathbf{Cat} \to \mathbf{Set}$ taking each category to its set of connected components.</p>
<p>One final note: Almost everything in this post used nothing but the fact that $\mathrm{Iter}$ is a lax monoidal functor $\mathbf{Optic} (\mathcal C) \to \mathbf{Set}$. With minimal translation, I think the entire thing works as a story about “forcing states in a symmetric monoidal category”: given any symmetric monoidal category $\mathcal C$ and a lax monoidal functor $F : \mathcal C \to \mathbf{Set}$, the category $\mathbf{Para}^F (\mathcal C)$ is equivalently described as $\mathcal C$ freely extended with a morphism $x : I \to X$ for every $x : F (X)$. I’ll leave this as a conjecture for somebody else to prove.</p>Jules HedgesIn this post I'll describe the theory of how to add iteration to categories of optics. Iteration is required for almost all applications of categorical cybernetics beyond game theory, and is something we've been handling only semi-formally for some time. The only tool we need is already one we have inside the categorical cybernetics framework: parametrisation weighted by a lax monoidal functor. I'll end with a conjecture that this is an instance of a general procedure to force states in a symmetric monoidal category.Passive Inference is Compositional, Active Inference is Emergent2024-02-06T00:00:00+00:002024-02-06T00:00:00+00:00https://cybercat-institute.github.io//2024/02/06/passive-inference-compositional<p>This post is a writeup of a talk I gave at the <a href="https://amcs-community.org/events/causal-cognition-humans-machines/">Causal Cognition in Humans and Machines</a> workshop in Oxford, about some work in progress I have with <a href="https://tsmithe.net/">Toby Smithe</a>. To a large extent this is my take on the theoretical work in Toby’s PhD thesis, with the emphasis shifted from category theory and neuroscience to numerical computation and AI. In the last section I will outline my proposal for how to build AGI.</p>
<h2>Markov kernels</h2>
<p>The starting point is the concept of a <a href="https://en.wikipedia.org/wiki/Markov_kernel">Markov kernel</a>, which is a synonym for <a href="https://en.wikipedia.org/wiki/Conditional_probability_distribution">conditional probability distribution</a> that sounds unnecessarily fancy but, crucially, contains only 30% as many syllables. If $X$ and $Y$ are some sets then a Markov kernel $\varphi$ from $X$ to $Y$ is a conditional probability distribution $\mathbb P_\varphi [y \mid x]$. Most of this post will be agnostic to what exactly “probability distribution” can mean, but in practice it will <em>probably</em> eventually mean “Gaussian”, in order to <a href="https://knowyourmeme.com/memes/money-printer-go-brrr">go <em>brrr</em></a>, by which I mean <em>effective in practice at the expense of theoretical compromise</em>. (I blatantly stole this usage of that meme from <a href="https://www.brunogavranovic.com/">Bruno</a>.)</p>
<p>There are two different perspectives on how Markov kernels can be implemented. They could be <em>exact</em>: for example, represented as a stochastic matrix (in the finite support case) or as a tensor containing a mean vector and covariance matrix for each input (in the Gaussian case). Alternatively they could be <a href="https://en.wikipedia.org/wiki/Monte_Carlo_method">Monte Carlo</a>, that is, implemented as a function from $X$ to $Y$ that may call a pseudorandom number generator. If we send the same input repeatedly then the outputs are samples from the distribution we want. Importantly these functions satisfy the <a href="https://en.wikipedia.org/wiki/Markov_property">Markov property</a>: the distribution on the output depends only on the current input and not on any internal state.</p>
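<p>As a toy illustration of the two representations (the names and numbers below are my own, not from any real system), the same finite kernel can be stored exactly as a table of probability rows, or implemented Monte Carlo style as a sampling function:</p>

```python
import random

# Two implementations of the same Markov kernel X -> Y (toy example).
# Exact: a table of probability rows, one per input.
kernel_matrix = {
    "rain": {"wet": 0.9, "dry": 0.1},
    "sun":  {"wet": 0.2, "dry": 0.8},
}

# Monte Carlo: a function from X to Y that may call a PRNG. The Markov
# property holds by construction: the output distribution depends only on
# the current input, with no internal state.
def kernel_sample(x):
    return "wet" if random.random() < kernel_matrix[x]["wet"] else "dry"

# Repeated calls on the same input draw samples from that input's row.
samples = [kernel_sample("rain") for _ in range(10_000)]
print(samples.count("wet") / len(samples))  # close to 0.9
```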
<p>An important fact about Markov kernels is that they can be composed. Given a Markov kernel $\mathbb P_\varphi [y \mid x]$ and another $\mathbb P_\psi [z \mid y]$, there is a composite kernel $\mathbb P_{\varphi; \psi} [z \mid x]$ obtained by integrating out $y$:</p>
\[\mathbb P_{\varphi; \psi} [z \mid x] = \int \mathbb P_\varphi [y \mid x] \cdot \mathbb P_\psi [z \mid y] \, dy\]
<p>This formula is sometimes given the unnecessarily fancy name <a href="https://en.wikipedia.org/wiki/Chapman%E2%80%93Kolmogorov_equation">Chapman-Kolmogorov equation</a>. If we represent kernels by stochastic matrices, then this is exactly matrix multiplication; if they are Gaussian tensors, then it’s a similar but slightly more complicated operation. Doing exact probability for anything more complicated is extremely hard in practice because of the <a href="https://en.wikipedia.org/wiki/Curse_of_dimensionality">curse of dimensionality</a>.</p>
<p>If we represent kernels by Monte Carlo functions, then composition is literally just function composition, which is extremely convenient. That is, we can just send particles through a chain of functions and they’ll come out with the right distribution - this fact is basically what the term “Monte Carlo” actually means.</p>
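<p>A minimal sketch of both composition recipes, using hypothetical 2-state kernels of my own (exact composition is matrix multiplication; Monte Carlo composition is function composition):</p>

```python
import random

# Composing two finite Markov kernels (toy 2-state example).
# Exact representation: row-stochastic matrices as nested lists, with
# phi[x][y] = P_phi[y | x].
phi = [[0.7, 0.3],
       [0.1, 0.9]]   # kernel X -> Y
psi = [[0.5, 0.5],
       [0.2, 0.8]]   # kernel Y -> Z

def compose(k1, k2):
    """Chapman-Kolmogorov: integrate out the middle variable. For
    stochastic matrices this is exactly matrix multiplication."""
    return [[sum(k1[x][y] * k2[y][z] for y in range(len(k2)))
             for z in range(len(k2[0]))]
            for x in range(len(k1))]

composite = compose(phi, psi)   # P_{phi;psi}[z | x]; rows still sum to 1

# Monte Carlo representation: composition is literally function
# composition, i.e. sending a particle through the chain.
def sample(kernel, x):
    return 0 if random.random() < kernel[x][0] else 1

def composite_sample(x):
    return sample(psi, sample(phi, x))
```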
<p>A special case of this is an ordinary (non-conditional) probability distribution, which can be usefully thought of as a Markov kernel whose domain is a single point. Given a distribution $\mathbb P_\pi [x]$ and a kernel $\mathbb P_\varphi [y \mid x]$ we can obtain a distribution $\pi; \varphi$ on $y$, known as the <em>pushforward distribution</em>, by integrating out $x$:</p>
\[\mathbb P_{\pi; \varphi} [y] = \int \mathbb P_\pi [x] \cdot \mathbb P_\varphi [y \mid x] \, dx\]
<h2>Bayesian inversion</h2>
<p>Suppose we have a Markov kernel $\mathbb P_\varphi [y \mid x]$ and we are shown a sample of its output, but we can’t see what the input was. What can we say about the input? To do this, we must start from some initial belief about how the input was distributed: a <em>prior</em> $\mathbb P_\pi [x]$. After observing $y$, <a href="https://en.wikipedia.org/wiki/Bayes%27_theorem">Bayes’ law</a> tells us how we should modify our belief to a <em>posterior distribution</em> that accounts for the new evidence. The formula is</p>
\[\mathbb P [x \mid y] = \frac{\mathbb P_\varphi [y \mid x] \cdot \mathbb P_\pi [x]}{\mathbb P_{\pi; \varphi} [y]}\]
<p>The problem of computing posterior distributions in practice is called <a href="https://en.wikipedia.org/wiki/Bayesian_inference">Bayesian inference</a>, and is very hard and very well studied.</p>
<p>If we fix $\pi$, it turns out that the previous formula for $\mathbb P [x \mid y]$ defines a Markov kernel from $Y$ to $X$, giving the posterior distribution for each possible observation. We call this the <em>Bayesian inverse</em> of $\varphi$ with respect to $\pi$, and write $\mathbb P_{\varphi^\dagger_\pi} [x \mid y]$.</p>
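<p>In the finite-support case the inverse kernel can be computed directly from Bayes’ law. The following Python sketch (toy numbers of my own) builds $\varphi^\dagger_\pi$ as a matrix indexed by the observation $y$:</p>

```python
# Bayesian inversion of a finite Markov kernel (toy numbers).
# Bayes' law: P[x | y] = P_phi[y | x] * P_pi[x] / P_{pi;phi}[y].

def pushforward(pi, phi):
    """The pushforward distribution P_{pi;phi}[y]."""
    return [sum(pi[x] * phi[x][y] for x in range(len(pi)))
            for y in range(len(phi[0]))]

def bayesian_inverse(phi, pi):
    """The inverse kernel phi^dagger_pi, with inverse[y][x] = P[x | y]."""
    evidence = pushforward(pi, phi)
    return [[phi[x][y] * pi[x] / evidence[y] for x in range(len(pi))]
            for y in range(len(evidence))]

pi  = [0.5, 0.5]
phi = [[0.9, 0.1],
       [0.2, 0.8]]
print(bayesian_inverse(phi, pi))  # each row is a posterior, summing to 1
```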
<p>The reason we can have $y$ as the input of the kernel but we had to pull out $\pi$ as a parameter is that the formula for Bayes’ law is <em>linear</em> in $y$ but <em>nonlinear</em> in $\pi$. This nonlinearity is really the thing that makes Bayesian inference hard.</p>
<p>Technically, Bayes’ law only considers <em>sharp</em> evidence, that is, we observe a particular point $y$. Considering inverse Markov kernels also gives us a way of handling <em>noisy</em> evidence, such as stochastic uncertainty in a measurement, by pushing forward a distribution $\mathbb P_\rho [y]$ to obtain $\mathbb P_{\rho; \varphi^\dagger_\pi} [x]$. This way of handling noisy evidence is sometimes called a <em>Jeffreys update</em>, and contrasted with a different formula called a <em>Pearl update</em> - see <a href="https://arxiv.org/abs/1807.05609">this paper</a> by <a href="https://www.cs.ru.nl/B.Jacobs/">Bart Jacobs</a>. Pearl updates have very different properties and I don’t know how they fit into this story, if at all. Provisionally, I consider the story of this post as evidence that Jeffreys updates are “right” in some sense.</p>
<h2>Deep inference</h2>
<p>So far we’ve introduced 2 operations on Markov kernels: composition and Bayesian inversion. Are they related to each other? The answer is a resounding <em>yes</em>: they are related by the formula</p>
\[(\varphi; \psi)^\dagger_\pi = \psi^\dagger_{\pi; \varphi}; \varphi^\dagger_\pi\]
<p>We call this the <em>chain rule</em> for Bayesian inversion, because of its extremely close resemblance to the chain rule for transpose Jacobians that underlies backpropagation in neural networks and differentiable programming:</p>
\[J^\top_x (f; g) = J^\top_{f (x)} (g) \cdot J^\top_x (f)\]
<p>The Bayesian chain rule is <em>extremely</em> folkloric. I conjectured it in 2019 while talking to Toby, and he proved it a few months later, writing it down in his unpublished preprint <a href="https://arxiv.org/abs/2006.01631">Bayesian Updates Compose Optically</a>. It’s definitely not new - <em>some</em> people already know this fact - but extremely few, and we failed to find it written down in a single place. (I feel like it should have been known by the 1950s at the latest, when things like dynamic programming were being worked out. Perhaps it’s one of the things that was well known in the Soviet Union but wasn’t discovered in the West until much later.) The first place Toby <em>published</em> this fact was in <a href="https://arxiv.org/abs/2305.06112">The Compositional Structure of Bayesian Inference</a> with <a href="https://dylanbraithwaite.github.io/about.html">Dylan Braithwaite</a> and me, which fixed a minor problem to do with zero-probability observations in a nice way.</p>
<p>What this formula tells us is that if we have a Markov kernel with a known factorisation, we can compute Bayesian posteriors efficiently if we already know the Bayesian inverse of each factor. Since this is exactly the same form as differentiable programming, we have good evidence that it can go <em>brrr</em>. At first I thought it was completely obvious that this must be how compilers for probabilistic programming languages work, but it turns out this is not the case at all: probabilistic programming languages are monolithic. I’ve given this general methodology for computing posteriors compositionally the catchy name <em>deep inference</em>, owing to its very close structural resemblance to deep learning.</p>
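<p>The chain rule is easy to check numerically in the finite case. The self-contained sketch below (toy kernels and prior of my own choosing) compares the two sides entry by entry:</p>

```python
# Numerically checking the chain rule (phi;psi)^dagger_pi =
# psi^dagger_{pi;phi} ; phi^dagger_pi on small stochastic matrices.

def compose(k1, k2):
    return [[sum(k1[x][y] * k2[y][z] for y in range(len(k2)))
             for z in range(len(k2[0]))]
            for x in range(len(k1))]

def pushforward(pi, phi):
    return [sum(pi[x] * phi[x][y] for x in range(len(pi)))
            for y in range(len(phi[0]))]

def inverse(phi, pi):
    evidence = pushforward(pi, phi)
    return [[phi[x][y] * pi[x] / evidence[y] for x in range(len(pi))]
            for y in range(len(evidence))]

pi  = [0.3, 0.7]
phi = [[0.6, 0.4], [0.1, 0.9]]
psi = [[0.5, 0.5], [0.8, 0.2]]

# Left-hand side: invert the composite kernel directly.
lhs = inverse(compose(phi, psi), pi)
# Right-hand side: invert each factor, pushing the prior forward
# through phi to get the prior for psi, then compose the inverses.
rhs = compose(inverse(psi, pushforward(pi, phi)), inverse(phi, pi))

print(all(abs(a - b) < 1e-12
          for row_l, row_r in zip(lhs, rhs)
          for a, b in zip(row_l, row_r)))  # True
```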
<h2>Variational inference</h2>
<p>I wrote “we can compute Bayesian posteriors efficiently if we already know the Bayesian inverse of each factor”, but this is still a big <em>if</em>: computing posteriors even of simple functions is still hard if the dimensionality is high. Numerical methods are used in practice to approximate the posterior, and we would like to make use of these while still exploiting compositional structure.</p>
<p>The usual way of approximating a Bayesian inverse $\varphi^\dagger_\pi$ is to cook up a functional form $\varphi^\prime_\pi (p)$ that depends on some parameters $p \in \mathbb R^N$. Then we find a loss function on the parameters with the property that minimising it causes the approximate inverse to converge to the exact inverse, ie. $\varphi^\prime_\pi (p) \longrightarrow \varphi^\dagger_\pi$. This is called <em>variational inference</em>.</p>
<p>There are many ways to do this. Probably the most common loss function in practice is <a href="https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence">KL divergence</a> (aka <em>relative entropy</em>),</p>
\[\mathbf{KL} (\varphi^\dagger_\pi, \varphi^\prime_\pi (p)) = \int \mathbb P_{\varphi^\dagger_\pi} [x \mid y] \log \frac{\mathbb P_{\varphi^\dagger_\pi} [x \mid y]}{\mathbb P_{\varphi^\prime_\pi (p)} [x \mid y]} \, dx\]
<p>This expression is a function of $y$, which can optionally also be integrated over (but the next paragraph reveals a better way to use it). A closely related alternative is <a href="https://en.wikipedia.org/wiki/Evidence_lower_bound">variational free energy</a>, which despite being more complicated to define is more computationally tractable.</p>
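<p>In the finite case the integral in the KL divergence becomes a sum, and its defining property - that the loss vanishes exactly when the approximation matches the exact inverse - is easy to see. A small sketch, with made-up stand-in distributions:</p>

```python
import math

# Discrete KL divergence (illustrative sketch; the distributions below
# are made-up stand-ins for an exact posterior and its approximation at
# a fixed observation y).

def kl(p, q):
    """KL(p || q) for discrete distributions given as probability lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

exact  = [0.8, 0.2]   # stands in for P_{phi^dagger_pi}[x | y]
approx = [0.6, 0.4]   # stands in for the parametrised approximation

print(kl(exact, exact))   # 0.0: the loss vanishes at the exact inverse
print(kl(exact, approx))  # strictly positive when the distributions differ
```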
<p>Ideally, we would like to use a functional form for which we can derive an analytic formula that tells us exactly how we should update our parameters to decrease the loss, given (possibly batched) Monte Carlo samples that are assumed to be drawn from a distribution in a certain class, such as Gaussians.</p>
<p>Of course in 2024 if you are <em>serious</em> then the functional form you use is a deep neural network, and you replace your favourite loss function by its derivative. I refer to this version as <em>deep variational inference</em>. There is no fundamental difference in theory, but in practice deep variational inference is necessary in order to go <em>brrr</em>.</p>
<h2>Passive inference is compositional</h2>
<p>Now, suppose we have two Markov kernels $\mathbb P_\varphi [y \mid x]$ and $\mathbb P_\psi [z \mid y]$ which we compose. Suppose we have a prior $\mathbb P_\pi [x]$ for $\varphi$, which pushes forward to a prior $\mathbb P_{\pi; \varphi} [y]$ for $\psi$. We pick a functional form for approximating each Bayesian inverse, which we call $\mathbb P_{\varphi^\prime_\pi (p)} [x \mid y]$ and $\mathbb P_{\psi^\prime_{\pi; \varphi} (q)} [y \mid z]$.</p>
<p>Doing this requires a major generalisation of our loss function. This was found by Toby Smithe in <a href="https://arxiv.org/abs/2109.04461">Compositional active inference 1</a>. The method he developed comes straight from <a href="https://arxiv.org/abs/1603.04641">compositional game theory</a>, and this appearance of virtually identical structure in game theory and Bayesian inference is absolutely one of the most core ideas of <a href="https://cybercat-institute.github.io/2022/05/29/what-is-categorical-cybernetics/">categorical cybernetics</a> as I envision it.</p>
<p>The idea is to define the loss of an approximate inverse to a kernel $\varphi : X \to Y$ in a <em>context</em> that includes not only a prior distribution on $X$, but also a (generally nonlinear) function $k$ called the <em>continuation</em>, that transforms probability distributions on $Y$. The continuation is a black box that describes how predictions transform into observations. Then when $y$ appears free in the expressions for KL divergence and variational free energy, we integrate it over the distribution $k (\pi; \varphi)$.</p>
<p>So for our composite kernel $\varphi; \psi$, as well as the prior $\pi$ on $X$ we also have a continuation $k$ that transforms distributions on $Z$. In order to optimise the parameters $(p, q)$ in this context, we divide them into two sub-problems:</p>
<ul>
<li>Optimise the parameters $p$ for $\varphi$ in the context given by the prior $\pi$ on $X$ and the continuation $k’$ on $Y$ given by $k’ (\sigma) = k (\sigma; \psi); \psi’_\sigma (q)$</li>
<li>Optimise the parameters $q$ for $\psi$ in the context given by the prior $\pi; \varphi$ on $Y$ and the continuation $k$ on $Z$</li>
</ul>
<p>Notice that the optimisation step for $p$ involves the current value of $q$, but not vice versa. It is easy to prove that this method correctly converges to the total Bayesian inverse by a dynamic programming argument, if we first optimise $q$ to convergence and then optimise $p$. However, Toby and I conjecture that this procedure also converges if $p$ and $q$ are optimised asynchronously, which means the procedure can be parallelised.</p>
<h2>Active inference is emergent</h2>
<p>The convergence conjecture in the previous section crucially relies on the fact that the prediction kernels $\varphi$ and $\psi$ are fixed, and we are only trying to approximate their Bayesian inverses. That is why I referred to it as <em>passive inference</em>. The term <em>active inference</em> means several different things (more on this in the next section) but one thing it should mean is that we simultaneously learn to do both prediction and inference.</p>
<p>Toby and I think that if we do this, the compositionality result breaks. In particular, if we also have a parametrised family of prediction kernels $\varphi (p)$ which converge to our original kernel $\varphi$, it is <em>not</em> the case that</p>
\[\psi^\prime_{\pi; \varphi (p)} (q); \varphi^\prime_\pi (p) \longrightarrow (\varphi; \psi)^\dagger_\pi\]
<p>Specifically, we think that the nonlinear dependency of $\psi^\prime_{\pi; \varphi (p)} (q)$ on $\varphi^\prime (p)$ causes things to go wrong.</p>
<p>One way of saying this negative conjecture is: <em>compositional active inference can fail to converge to true beliefs, even in a stationary environment</em>. The main reason you’d want to do this anyway, even at the expense of getting the wrong answer, is that it might go <em>brrr</em> - but whether this is really true remains to be seen.</p>
<p>We can, however, put a positive spin on this negative result. I am known for the idea that <em>the opposite of compositionality is emergence</em>, from <a href="https://julesh.com/2017/04/22/on-compositionality/">this blog post</a>. A compositional active inference system does not behave like the sum of its parts. The interaction between components can prevent them from learning true beliefs, but can it do anything positive for us? So far we know nothing about how this emergent learning dynamics behaves, but our optimistic hope is that it could be responsible for what is normally called things like <em>intelligence</em> and <em>creativity</em> - on the basis that there aren’t many other places that they could be hiding.</p>
<h2>How to build a brain</h2>
<p>Boosted by the last paragraph, we now fully depart the realm of mathematical conjecture and enter the outer wilds of hot takes, increasing in temperature towards the end.</p>
<p>So far I’ve talked about active inference but not mentioned what is probably the most important thing in the cloud of ideas around the term: conflating <em>prediction</em> and <em>control</em>. Ordinarily, we would think of $\mathbb P_{\pi; \varphi} [y]$ as <em>prediction</em> and $\mathbb P_{\varphi^\dagger_\pi} [x \mid y]$ as <em>inference</em>. However it has been proposed (I believe the idea is due to <a href="https://www.fil.ion.ucl.ac.uk/~karl/">Karl Friston</a>) that in the end $\mathbb P_{\pi; \varphi} [y]$ is interpreted as a command: at the end of a chain of prediction-inference devices comes an actuator designed to act on the external environment in order to (try to) make the prediction true. That is, a prediction like “my arm will rise” is <em>the same thing</em> as the command “lift my arm” when connected to my arm muscles.</p>
<p>This lets us add one more piece to the puzzle, namely <em>reinforcement learning</em>. A deep active inference system can interact with an environment (either the real world or a simulated environment), by interpreting its ultimate predictions as commands, effecting those commands into the environment, and responding with fresh observations. Over time, the system should learn to predict the response of the environment, that is to say, it will learn an <em>internal model</em> of its environment. If several different active inference systems interact with the same environment, then we should consider the environment of each to contain the others, and expect each to learn a model of the others, recursively.</p>
<p>I am not a neuroscientist, but I understand it is at least plausible that the compositional structure of the mammalian cortex exactly reflects the compositional structure of deep active inference. The cortex is shaped (in the sense of connectivity) approximately like a pyramid, with both sensory and motor areas at the bottom. In particular, the brain is <em>not</em> a <a href="https://en.wikipedia.org/wiki/Series_of_tubes">series of tubes</a> with sensory signals going in at one end and motor signals coming out at the other end. Obviously the basic pyramid shape is overlaid with endless ad-hoc modifications at every scale, developed by evolution for various tasks. So following Hofstadter’s <a href="http://bert.stuy.edu/pbrooks/fall2014/materials/HumanReasoning/Hofstadter-PreludeAntFugue.pdf">Ant Fugue</a>, I claim <em>the cortex is shaped like an anthill</em>.</p>
<p>The idea is that the hierarchical structure is roughly an <em>abstraction</em> hierarchy. Predictions (aka commands) $\mathbb P_\varphi [y \mid x]$ travel down the hierarchy (towards sensorimotor areas), transforming predictions at a higher level of abstraction $\mathbb P_\pi [x]$ into predictions at a lower level of abstraction $\mathbb P_{\pi; \varphi} [y]$. Inferences $\mathbb P_{\varphi^\dagger_\pi} [x \mid y]$ travel up the hierarchy (away from sensorimotor areas), transforming observations at a lower level of abstraction $\mathbb P_\rho [y]$ into observations at a higher level of abstraction $\mathbb P_{\rho; \varphi^\dagger_\pi} [x]$.</p>
<p>Given this circularity, with observations depending on predictions recursively through many layers, I expect that the system will learn to predict <em>sequences</em> of inputs (as any recurrent neural network does, and notably <em>transformers</em> do extremely successfully) - and also <em>sequences of sequences</em> and so on. I predict that stability will increase up the hierarchy - that is, updates will usually be smaller at higher levels - so that at least conceptually, higher levels run on a slower timescale than lower levels. This comes back to ideas I first read almost 15 years ago in the book <a href="https://us.macmillan.com/books/9780805078534/onintelligence">On Intelligence</a> by Jeff Hawkins and Sandra Blakeslee.</p>
<p>Conceptually, this is exactly the same idea I wrote about in <a href="https://link.springer.com/chapter/10.1007/978-3-031-08020-3_9">chapter 9</a> of <a href="https://link.springer.com/book/10.1007/978-3-031-08020-3">The Road to General Intelligence</a> - the main difference is that now I think I have a good idea how to actually compute commands and observations in practice, whereas back then I hand-crafted a toy proof of concept.</p>
<p>If both sensory and motor areas are at the bottom of the hierarchy, this raises the obvious question of what is at the <em>top</em>. It probably has something to do with long term memory formation, but it is almost impossible to not be thinking about <em>consciousness</em> at this point. I’m going to step back from this so that the hot takes in this post don’t reach their ignition temperature before the next paragraph.</p>
<p>The single hottest take that I genuinely believe is that <em>deep variational reinforcement learning is all you need</em>, and is the only conceptually plausible route to what is sometimes sloppily called “AGI” and what I refer to in private as “true intelligence”.</p>
<p>I should mention that none of my collaborators is as optimistic as me that <em>deep variational reinforcement sequence learning is all you need</em>. Uniquely among my collaborators, I am a hardcore connectionist and I believe good old fashioned symbolic methods have no essential role to play. Time will tell.</p>
<p>My long term goal is <em>obviously</em> to build this, if it works. My short term goal is to build some baby prototypes starting with passive inference, to verify and demonstrate that what works in theory also works in practice. So watch this space, because the future might be wild…</p>Jules HedgesThis post is a writeup of a talk I gave at the Causal Cognition in Humans and Machines workshop in Oxford, about some work in progress I have with Toby Smithe. To a large extent this is my take on the theoretical work in Toby's PhD thesis, with the emphasis shifted from category theory and neuroscience to numerical computation and AI. In the last section I will outline my proposal for how to build AGI.How to Stay Locally Safe in a Global World2024-01-16T00:00:00+00:002024-01-16T00:00:00+00:00https://cybercat-institute.github.io//2024/01/16/How%20to%20Stay%20Locally%20Safe%20in%20a%20Global%20World<p>Cross-posted from <a href="https://jadeedenstarmaster.wordpress.com/">Jade’s blog</a>: parts <a href="https://jadeedenstarmaster.wordpress.com/2023/12/06/how-to-stay-locally-safe-in-a-global-world/">1</a>, <a href="https://jadeedenstarmaster.wordpress.com/2023/12/17/how-to-stay-locally-safe-in-a-global-world-part-ii-defining-a-world-and-stating-the-problem/">2</a>, <a href="https://jadeedenstarmaster.wordpress.com/2023/12/17/how-to-stay-locally-safe-in-a-global-world-part-iii-the-global-safety-poset/">3</a></p>
<h2>Introduction</h2>
<p>Suppose your name is $x$ and you have a very important state machine $N_x : S_x \times \Sigma \to \mathcal{P}(S_x)$ that you cherish with all your heart. Because you love this state machine so much, you don’t want it to malfunction, and you have a subset $P \subseteq S_x$ which you consider to be safe. If your state machine ever leaves this safe space you are in big trouble, so you ask the following question: if you start in some subset $I \subseteq P$, will your state machine $N_x$ ever leave $P$? In math, you ask if</p>
\[\mu (\blacksquare(-) \cup I) \subseteq P\]
<p>where $\mu$ is the least fixed point and $\blacksquare(-)$ indicates the next-time operator of the cherished state machine. What is the next-time operator?</p>
<p>Definition: For a function $N : X \times \Sigma \to \mathcal{P}(Y)$ there is a monotone function $\blacksquare_N : \mathcal{P}(X) \to \mathcal{P}(Y)$ given by</p>
\[\blacksquare_N(A) = \bigcup_{a \in A} \bigcup_{s \in \Sigma} N(a,s)\]
<p>In layspeak the next-time operator sends a set of states to the set of all possible successors of those states.</p>
<p>In a perfect world you could use these definitions to ensure safety using the formula</p>
\[\mu (\blacksquare(-) \cup I) = \bigcup_{n=0}^{\infty} (\blacksquare ( - ) \cup I)^n (\emptyset)\]
<p>or at least check safety up to an arbitrary time-step $n$ by computing this infinite union one step at a time.</p>
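<p>In the finite case this iteration is directly implementable. The following Python sketch (toy machine of my own) computes the least fixed point by iterating $\blacksquare(-) \cup I$ and checks containment in $P$ along the way:</p>

```python
# Safety checking by fixed-point iteration (toy machine). The machine N
# maps (state, letter) to the set of possible successor states.

def next_time(N, alphabet, A):
    """The next-time operator: all possible successors of the states in A."""
    return {t for a in A for s in alphabet for t in N.get((a, s), set())}

def is_safe(N, alphabet, I, P):
    """Iterate A |-> next_time(A) | I from I until a fixed point is
    reached, checking that the reachable set never leaves P."""
    reach = set(I)
    while True:
        if not reach <= P:
            return False
        bigger = reach | next_time(N, alphabet, reach)
        if bigger == reach:
            return True
        reach = bigger

# One-letter machine over states {0, 1, 2, 3}; state 3 is unreachable from 0.
N = {(0, 'a'): {1}, (1, 'a'): {2}, (2, 'a'): {1}, (3, 'a'): {3}}
print(is_safe(N, {'a'}, I={0}, P={0, 1, 2}))  # True
print(is_safe(N, {'a'}, I={0}, P={0, 1}))     # False: state 2 is reachable
```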
<p>Unfortunately there is a big problem with this method! Your state machine does not exist in isolation. You have a friend whose name is $y$ with their own state machine $N_y : S_y \times \Sigma \to \mathcal{P} (S_y)$. $y$ has the personal freedom to run their state machine how they like but there are functions</p>
\[N_{xy} : S_x \times \Sigma \to \mathcal{P}(S_y)\]
<p>and</p>
\[N_{yx} : S_y \times \Sigma \to \mathcal{P}(S_x)\]
<p>which allow states of your friend’s machine to change the states of your own and vice-versa. Making matters worse, there is a whole graph $G$ whose vertices are your friends and whose edges indicate that the corresponding state machines may affect each other. How can you be expected to ensure safety under these conditions?</p>
<p>But don’t worry, category theory comes to the rescue. In the next sections I will:</p>
<ul>
<li>State my model of the world and the local-to-global safety problem for this model (Part II)</li>
<li>Propose a solution to the local-to-global safety problem based on an enriched version of the Grothendieck construction (Part III)</li>
</ul>
<h2>Defining a World and Stating the Problem</h2>
<p>Suppose we have a directed graph $G=(V(G),E(G))$ representing our world. The vertices of this graph are the different agents in our world and an edge represents a connection between these agents. The semantics of this graph will be the following:</p>
<p>Definition: Let $\mathsf{Mach}$ be the directed graph whose vertices are sets and where there is an edge $e : X \to Y$ for every function</p>
\[e : X \times \Sigma \to \mathcal{P}(Y)\]
<p>A world is a morphism of directed graphs $W : G \to \mathsf{Mach}$.</p>
<p>A world has a set $S_x$ for each vertex $x$ called the local state over $\mathbf{x}$ and for each edge $e :x \to y$ a function $W(e) : S_x \times \Sigma_e \to \mathcal{P}(S_y)$ representing the state machine connecting the local state over $x$ to the local state over $y$. Note that self edges are ordinary state machines from a local state to itself. An example world may be drawn as follows:</p>
<p><img src="/assetsPosts/2023-12-18-How to Stay Locally Safe in a Global World/World.png" alt="Example World" /></p>
<p>Definition: Given a world $W: G \to \mathsf{Mach}$, the total machine of $W$ is the state machine</p>
\[\int W : \sum_{x \in V(G)} S_x \times \sum_{e \in E(G)} \Sigma_e \to \mathcal{P}\left( \sum_{x \in V(G)} S_x \right)\]
<p>given by</p>
\[((s,x),(\tau,e)) \mapsto \bigcup_{e: x \to y} W(e) (s, \tau)\]
<p>The notation $\int$ is used based on the belief that this is some version of the Grothendieck construction. Exactly which flavor will be left to future work. The transition function of this state machine is the union of the transition functions of all the state machines associated to edges originating at a given vertex.</p>
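<p>A hypothetical Python encoding of the total machine (not from the post): a total state is a pair of a vertex and a local state, and for simplicity the per-edge alphabets $\Sigma_e$ are flattened into one shared symbol set.</p>

```python
def total_step(edges):
    """Transition function of the total machine: union W(e) over every
    edge e leaving the current vertex, as in the formula above."""
    def step(state, sig):
        x, s = state
        out = set()
        for (src, tgt), Ne in edges.items():
            if src == x:
                out |= {(tgt, t) for t in Ne(s, sig)}
        return out
    return step

# A world with two agents: x increments its own state and resets y to 0.
edges = {("x", "x"): lambda s, sig: {s + 1},
         ("x", "y"): lambda s, sig: {0}}
step = total_step(edges)

assert step(("x", 0), "t") == {("x", 1), ("y", 0)}
assert step(("y", 5), "t") == set()   # no edges leave y
```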
<p>Definition: Given a world $W : G \to \mathsf{Mach}$, a vertex $x \in V(G)$, and subsets $I,P \subset S_x$, we say that $I$ is locally safe in a global context if</p>
\[\mu (\blacksquare_{\int W} (-) \cup I) \subseteq P\]
<p>where $\blacksquare_{\int W}$ is the next-time operator of the state machine $\int W$.</p>
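<p>For a world small enough to enumerate, the condition can be checked by brute force. A Python sketch under my own encoding: total states are vertex/state pairs, $I$ is read as $\{x\} \times I$, and safety means every reachable total state sitting over $x$ lies in $P$.</p>

```python
def locally_safe(edges, Sigma, x, I, P):
    """Brute-force check of local safety in a global context: compute all
    reachable (vertex, state) pairs starting from {x} x I, then test that
    every reachable state over x lies in P."""
    reach = {(x, s) for s in I}
    frontier = set(reach)
    while frontier:
        nxt = set()
        for (v, s) in frontier:
            for (src, tgt), Ne in edges.items():
                if src == v:
                    for sig in Sigma:
                        nxt |= {(tgt, t) for t in Ne(s, sig)}
        frontier = nxt - reach
        reach |= nxt
    return all(s in P for (v, s) in reach if v == x)

# x drifts upward through states 0, 1, 2 and can also hand state 0 to y.
edges = {("x", "x"): lambda s, sig: {min(s + 1, 2)},
         ("x", "y"): lambda s, sig: {0}}

assert locally_safe(edges, {"t"}, "x", {0}, {0, 1, 2})
assert not locally_safe(edges, {"t"}, "x", {0}, {0, 1})
```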
<p>The state machine $\int W$ may be large enough to make computing this least fixed point by brute force impractical. Therefore, we must leverage the compositional structure of $W$. We will see how to do this in the next post.</p>
<h2>The Global Safety Poset</h2>
<p>In this section we will give a compositional solution to the local safety problem in a global context in two steps:</p>
<ul>
<li>First by turning the world into a functor $\hat{W} : FG \to \mathsf{Poset}$</li>
<li>Then by gluing this functor into a single poset $\int \hat{W}$ whose inequalities solve the problem of interest.</li>
</ul>
<p>First we define the functor.</p>
<p>Given a world $W : G \to \mathsf{Mach}$, there is a functor</p>
\[\hat{W} : FG \to \mathsf{Poset}\]
<p>where</p>
<ul>
<li>$FG$ is the free category on the graph $G$,</li>
<li>$\mathsf{Poset}$ is the category whose objects are posets and whose morphisms are monotone functions.</li>
</ul>
<p>Functors from a free category are uniquely defined by their image on vertices and generating edges.</p>
<ul>
<li>For a vertex $x \in V(G)$, $\hat{W}(x) = \mathcal{P}(S_x)$,</li>
<li>for an edge $e : x \to y$, we define $\hat{W}(e): \mathcal{P}(S_x) \to \mathcal{P}(S_y)$ by $A \mapsto \blacksquare_{W(e)}(A)$</li>
</ul>
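<p>Concretely, $\hat{W}(e)$ is just the per-edge next-time operator, which is monotone (and in fact preserves unions) because it is defined elementwise. A minimal Python sketch, with names of my own choosing:</p>

```python
def box_e(Ne, Sigma, A):
    """The monotone map P(S_x) -> P(S_y) assigned to an edge e:
    the per-edge next-time operator."""
    return {t for s in A for sig in Sigma for t in Ne(s, sig)}

# Toy edge machine: increment the state on every input symbol.
Ne = lambda s, sig: {s + 1}

assert box_e(Ne, {"t"}, {0, 1}) == {1, 2}
# Monotonicity: a larger input set yields a larger output set.
assert box_e(Ne, {"t"}, {0}).issubset(box_e(Ne, {"t"}, {0, 1}))
```

<p>Functoriality on a composite path then amounts to composing these operators.</p>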
<p>Now for step two.</p>
<p>Given a functor $\hat{W} : FG \to \mathsf{Poset}$ defined from a world $W$, the <strong>global safety poset</strong> is a poset $\int \hat{W}$ where</p>
<ul>
<li>elements are pairs $(x \in V(G), A \subseteq S_x)$,</li>
<li>$(x, A) \leq (y, B) \iff \bigwedge_{f: x \to y \in FG} \hat{W} (f) (A) \subseteq B$</li>
</ul>
<p>Given a world $W : G \to \mathsf{Mach}$, a vertex $x \in V(G)$, and subsets $I,P \subseteq S_x$, the set $I$ is locally safe in a global context if and only if there is an inequality $(x,I) \leq (x,P)$ in the global safety poset $\int \hat{W}$.</p>
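<p>A hypothetical Python sketch of this inequality check (encoding mine). Because each edge operator preserves unions, checking $\hat{W}(f)(A) \subseteq B$ for every path $f$ separately is equivalent to comparing $B$ against the union of the images of $A$ along all paths, which can be computed as a least fixed point:</p>

```python
def poset_leq(edges, Sigma, x, A, y, B):
    """Decide (x, A) <= (y, B) in the global safety poset by accumulating,
    at each vertex, the union of the images of A along all paths from x."""
    R = {x: frozenset(A)}          # the identity path contributes A itself
    changed = True
    while changed:
        changed = False
        for (src, tgt), Ne in edges.items():
            if src in R:
                img = frozenset(t for s in R[src] for sig in Sigma
                                for t in Ne(s, sig))
                if not img.issubset(R.get(tgt, frozenset())):
                    R[tgt] = R.get(tgt, frozenset()) | img
                    changed = True
    return R.get(y, frozenset()).issubset(B)

# One agent with a self-loop that drifts upward through states 0, 1, 2.
edges = {("x", "x"): lambda s, sig: {min(s + 1, 2)}}

assert poset_leq(edges, {"t"}, "x", {0}, "x", {0, 1, 2})   # locally safe
assert not poset_leq(edges, {"t"}, "x", {0}, "x", {0, 1})
```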
<p>My half-completed proof of this theorem involves a square of functors</p>
<p><img src="/assetsPosts/2023-12-18-How to Stay Locally Safe in a Global World/commsquare.png" alt="Correctness Square" /></p>
<p>Going right and then down, the first functor uses a Grothendieck construction to turn a world into a total state machine and then turns that state machine into its global safety poset. Going down and then right follows the construction detailed in the last two sections. The commutativity of this diagram should verify correctness. I will explain all of this in more detail later. Thanks for tuning in today!</p>Jade MasterSuppose your name is x and you have a very important state machine that you cherish with all your heart. Because you love this state machine so much, you don't want it to malfunction and you have a subset which you consider to be safe. If your state machine ever leaves this safe space you are in big trouble so you ask the following question.AI Safety Meets Value Chain Integrity2023-12-11T00:00:00+00:002023-12-11T00:00:00+00:00https://cybercat-institute.github.io//2023/12/11/ai-safety-meets-value-chain-integrity<p><strong>tl;dr - Advanced AI making economic decisions in supply chains and markets creates poorly-understood risks, especially by undermining the fundamental concept of individuality of agents. We propose to research these risks by building and simulating models.</strong></p>
<p>For many years, AI has been routinely used for economic decision making. Two major roles it has traditionally played are high frequency trading and algorithmic pricing. Traditionally these are quite simple, at the level of tabular Q-learning agents. Even these comparatively simple algorithms can behave in unexpected ways due to emergent interactions in an economic environment. Probably the most infamous of these events was the <a href="https://en.wikipedia.org/wiki/2010_flash_crash">flash crash</a>, for which algorithmic high speed trading was a major contributing cause. Much less well known is the subtle issue of <em>implicit collusion</em> in pricing algorithms, which are ubiquitous in several markets such as airline tickets and Amazon: <a href="https://www.aeaweb.org/articles?id=10.1257/aer.20190623">a widely cited 2020 paper</a> found that even very simple tabular Q-learning will converge to prices higher than the Nash equilibrium price - but <a href="https://arxiv.org/abs/2201.00345">our research</a> found that this depends sensitively on the exact method of training, and the effect vanishes when the algorithms are trained independently in simulated markets.</p>
<p>Besides markets, AI is also already used for making decisions in supply chains (see for example [<a href="https://www.thomsonreuters.com/en-us/posts/technology/ai-supply-chains/">1</a> <a href="https://www.mckinsey.com/capabilities/operations/our-insights/autonomous-supply-chain-planning-for-consumer-goods-companies">2</a> <a href="https://www.forbes.com/sites/forbestechcouncil/2023/08/08/ais-role-in-supply-chain-management-and-how-organizations-can-get-started/">3</a> <a href="https://www.accenture.com/us-en/blogs/business-functions-blog/generative-ai-why-smarter-supply-chains-are-here">4</a>]), and surely will be more so in the future. Contemporary supply chains are extraordinarily complex. A typical modern technology product can have hundreds of thousands of components sourced from ten thousand suppliers across half a dozen tiers which need to be shipped across the globe to the final assembly. A single five-dollar part can stop an assembly line, which in the case of industries like automotive can cost millions per hour of downtime. The worst type of inventory a company can carry is a 99.9% finished product it cannot sell. Over time, supply chains have been hyper-optimised at the expense of integrity, so that a metaphorical perfect storm in the shape of an <a href="https://en.wikipedia.org/wiki/2010_eruptions_of_Eyjafjallaj%C3%B6kull">Icelandic volcano named Eyjafjallajökull erupting</a> or a <a href="https://en.wikipedia.org/wiki/2021_Suez_Canal_obstruction">container ship named <em>Ever Given</em> getting stuck in the Suez Canal</a> caused massive disruption that inevitably led to delayed goods, spoiled perishables, lawsuits and contested insurance claims easily running into ten digits.
The <a href="https://www.ey.com/en_gl/supply-chain/how-covid-19-impacted-supply-chains-and-what-comes-next">COVID-19 pandemic</a> was a business-school case study in all the types of havoc supply chain disruptions can wreak, oscillating wildly from not enough containers to too many containers in port, obstructing the handling of cargo, from COVID-related work shutdowns in China to sudden shifts in consumer behavior in Western countries, leading to layoffs in hospitality industries and labour shortages in production and transportation. Beyond these knock-on effects that can explode planning horizons for procurement and shift the delicate power balance from buyer to supplier, another major problem in supply chains is the knock-off effect: fashion brands and pharmaceutical companies alike fight the problem of counterfeit products being introduced into the supply chain when no one is looking, leading to multi-million dollar losses along with the reputational damage, and, especially in pharmaceuticals, posing a hazard to health and life for many. Supply chain integrity depends crucially on transparency across a multitude of participants who are typically less than eager to share confidential data.</p>
<p>Moving forward from these events, the delicate trade-off between efficiency and integrity is a perfect use-case for the integrated and inter-connected decision-making that is afforded by AI.</p>
<p>This brings us to the issue of economic decisions being deferred to large language models such as GPT-4. The well-known examples are not “natively economic”, but many people are adapting transformer architectures to operate on various types of data besides linguistic data, and it is only a matter of time before there are “economics LLMs”. In the meantime, GPT is entirely capable of making economic decisions with the right prompting - although virtually nothing is known about its performance on this type of task. We do not recommend using GPT to make investment decisions for you, but we expect it to become widespread anyway, if it isn’t already. Similarly, we expect large parts of complex supply chains to be almost entirely deferred to AI, extending the existing automation and its associated benefits and risks.</p>
<h2>AI undermines individuality in economics</h2>
<p>The traditional (tabular Q-learning) and contemporary (LLMs) situations are very different in many ways, but they have a subtle and crucial point in common. This is that decisions that look independent are secretly connected. There are two ways this could happen: one is that human decision-makers defer to off-the-shelf software that comes from the same upstream supplier - as is the case for algorithmic pricing in the airline industry for example. The other is that there really is a single instance of the AI system in the world and everybody is calling into it - as is the case with GPT.</p>
<p>For off-the-shelf implementations of tabular Q-learning for algorithmic pricing, there is some evidence that having a single upstream supplier has a significant impact on the behaviour of the market, and this is something that regulators are actively investigating. For LLMs virtually nothing is known, but we expect that the situation is worse. At the very least, the situation will certainly be more unpredictable, and we expect the compounding of implicit biases to be worse as these systems become ubiquitous and deeply embedded into decision-making. We plan to research this, by building economic simulations where decisions are made by advanced AIs and studying their behaviour.</p>
<p>A further possibility is more hypothetical, but we expect it to become a reality within the next few years. Right now the technology behind large language models - generative transformers - mainly operates on textual data, but it is actively being adapted for other types of data, and for other tasks besides text generation. Making economic decisions is very similar to playing games, and so there is an obvious analogy to the wildly successful application of deep reinforcement learning to strategically complex game playing tasks such as Go and StarCraft 2 by DeepMind. Combining this with generative transformer architectures could be immensely powerful, and it is not hard to believe such a system could surpass human performance on the task of economic decision-making.</p>
<h2>Modelling for harm prevention</h2>
<p>Compositional game theory - a technology that we <a href="https://arxiv.org/abs/1603.04641">developed</a> and <a href="https://github.com/CyberCat-Institute/open-game-engine">implemented</a> - is currently the state of the art for implementing complex meso-scale microeconomic models. The way things are traditionally done, models are written first in mathematics and are later converted into computational models in general purpose languages (traditionally Fortran, but increasingly in modern languages such as Python), a process that is very slow and very prone to introducing hard-to-detect errors. We use a <em>model is code</em> paradigm, where both the mathematical and computational languages are modified to bring them very close to each other - most commonly we build our models directly in code, with a clean separation of concerns between the economic and computational parts. Our models are not inherently more accurate, but they are 2 orders of magnitude faster and cheaper to build, and this unlocks our secret weapon: <em>rapid prototyping models</em>. By iterating quickly, and continuously obtaining feedback from data and stakeholders, we reach a better model than could be built monolithically.</p>
<p>Why do we want to build these models? The bigger picture is, we want to inform the discussion about regulation of AI. This discussion is already widespread at the highest level of governments around the world, but is currently heavily lacking in evidence one way or the other. There’s a good reason for this: the domain of LLMs is language, and it is extremely difficult to make convincing predictions about the possible harms that can happen mediated by linguistic communication. More restricted domains, such as the behaviours of API bots, are easier to reason about. We have identified the general realm of economic decision-making as a critically under-explored part of the general AI safety question, which our tools are well-placed to explore through modelling and simulations.</p>
<p>Our implementation of compositional game theory allows modularly switching the algorithm that each player uses for making decisions. Normally when doing applied game theory we use a Monte Carlo optimiser for every player. But we also have <a href="https://github.com/CyberCat-Institute/open-games-RLib">a version</a> that calls a Python implementation of Q-learning over a web socket. We could also easily switch it to calls to an open source LLM, or API calls to a GPT API bot or similar.</p>
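<p>To illustrate the kind of modularity meant here with a hypothetical Python sketch (emphatically not the engine’s actual Haskell API): if each player is represented by a plain decision function, swapping a built-in optimiser for an external decision-maker is a one-line change.</p>

```python
ACTIONS = [0, 1, 2]

def payoff(obs, action):
    # Toy payoff: actions closer to the observed state score higher.
    return -(action - obs) ** 2

def greedy(obs):
    """Stand-in for an optimiser-backed player (e.g. Monte Carlo search)."""
    return max(ACTIONS, key=lambda a: payoff(obs, a))

def constant(obs):
    """Stand-in for an externally supplied decision-maker, e.g. a wrapper
    around a Q-learner over a web socket or an LLM API call."""
    return 1

def play_round(players, obs):
    """Each player independently chooses an action for the same observation."""
    return [decide(obs) for decide in players]

assert play_round([greedy, constant], 2) == [2, 1]
```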
<p>What’s more, this is emphatically <em>not</em> a mere hack that we bolt on top of game theory. At the core of our whole approach is our discovery, as seen in <a href="https://arxiv.org/abs/2105.06332">this paper</a>, that the foundations of compositional game theory and several branches of machine learning are extremely closely related - this foundation is what we call <a href="https://cybercat.institute/2022/05/29/what-is-categorical-cybernetics/">categorical cybernetics</a>. This foundation guides us and tells us that what we are doing is really meaningful. More than that, though, it opens a realistic possibility that we can know <em>qualitative</em> things about the behaviour of AIs making economic decisions, a much higher level of confidence than making inferences from simulation results. And when it comes to informing the discussion on regulation when the stakes are as high as they are, more certainty is always better.</p>
<h2>What if?</h2>
<p>So far we have focussed on the negative <em>accidental</em> impacts AI is likely to have on markets and supply chains, where they perform their intended purpose locally but interact in unforeseen ways. This is already concerning, but there is another side to the issue. What if decisions that should be independent are made by a single AI that has “gone rogue”, i.e. has a goal that is not the intended one? Depending on your personal assessment of the likelihood of this situation you could read this section as a fun thought experiment or a warning.</p>
<p>Being handed direct control of markets and supply chains gives perhaps the most powerful leverage over the physical world that an AI could have. Since it can <em>collude with itself</em>, it can easily create behaviours that would never be possible when decisions are made by agents that are independent and at least somewhat rational.</p>
<p>By far the most straightforward outcome of this situation is chaos. Markets and supply chains are so deeply interconnected that it would take very little intentional damage to create a recession deep enough to bring society to its knees. However, because it would virtually destroy the institutions that it controls, this would be a one-time event which, while extremely bad, would be easily recoverable for humanity as a whole.</p>
<p>Much worse would be the ability of a rogue AI to subtly direct real-world resources towards a secret goal of its own over a long period of time. It isn’t a hypothetical that complex supply chains can easily hide parts of themselves: consider how widespread modern slavery is in the supply chains of consumer electronics, or how the US government secretly procured the resources needed to build the first nuclear weapons at a time when supply chains were much simpler.</p>
<h2>Conclusion</h2>
<p>It is arguable exactly how extensive the risks associated with allowing AIs to interact with economic systems are, with the scenarios described in the previous section being hypothetical. However, it is undeniable that some serious risks do exist, including already-observed events such as flash crashes and implicit collusion. We have identified that the specific factor of decision-makers using the same upstream provider of decision-making software leads to poorly-understood emergent behaviours of supply chains and markets.</p>
<p>Our theoretical framework, compositional game theory, and our implementation of it, the open game engine, are the perfect tools for building and simulating models of economic situations with AI decision-makers. The goal of creating these models is to produce evidence leading to a better-informed debate on issues around the regulation of AI.</p>Jules HedgesAdvanced AI making economic decisions in supply chains and markets creates poorly-understood risks, especially by undermining the fundamental concept of individuality of agents. We propose to research these risks by building and simulating models.About the CyberCat Institute blog2023-11-26T00:00:00+00:002023-11-26T00:00:00+00:00https://cybercat-institute.github.io//2023/11/26/test-post<p>The Cybercat blog website is based on the <a href="https://jekyllthemes.io/theme/whiteglass">Whiteglass</a> theme.</p>
<h2>TOC <!-- omit in toc --></h2>
<ul>
<li><a href="#workflow">Workflow</a>
<ul>
<li><a href="#previewing">Previewing</a></li>
</ul>
</li>
<li><a href="#post-preamble">Post preamble</a></li>
<li><a href="#latex">Latex</a>
<ul>
<li><a href="#theorem-environments">Theorem environments</a>
<ul>
<li><a href="#referencing">Referencing</a></li>
</ul>
</li>
<li><a href="#typesetting-diagrams">Typesetting diagrams</a>
<ul>
<li><a href="#quiver">Quiver</a></li>
<li><a href="#tikz">Tikz</a></li>
<li><a href="#referencing-1">Referencing</a></li>
</ul>
</li>
</ul>
</li>
<li><a href="#images">Images</a>
<ul>
<li><a href="#referencing-2">Referencing</a></li>
</ul>
</li>
<li><a href="#code">Code</a></li>
</ul>
<h2>Workflow</h2>
<p>Standard github workflow:</p>
<ul>
<li>Clone this repo</li>
<li>Create a branch</li>
<li>Write your post</li>
<li>Make a PR</li>
<li>Wait for approval</li>
</ul>
<p>The blog will be automatically rebuilt once your PR is merged.</p>
<h3>Previewing</h3>
<p>Since the blog uses Jekyll, you will need to <a href="https://jekyllrb.com/docs/installation/">install it</a> or use the included nix flake devshell (just run <code class="language-plaintext highlighter-rouge">nix develop</code> with flakes-enabled nix installed) to be able to preview your contents. Once the installation is complete, just navigate to the repo folder and run <code class="language-plaintext highlighter-rouge">bundle exec jekyll serve</code>. Jekyll will spawn a local server (usually at <code class="language-plaintext highlighter-rouge">127.0.0.1:4000</code>) that will allow you to preview the blog locally.</p>
<h2>Post preamble</h2>
<p>Posts must be placed in the <code class="language-plaintext highlighter-rouge">_posts</code> folder. Post titles follow the convention <code class="language-plaintext highlighter-rouge">yyyy-mm-dd-title.md</code>. Post assets (such as images) go in the folder <code class="language-plaintext highlighter-rouge">assetsPosts</code>, where you should create a folder with the same name as the post.</p>
<p>Each post should start with the following preamble:</p>
<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nn">---</span>
<span class="na">layout</span><span class="pi">:</span> <span class="s">post</span>
<span class="na">title</span><span class="pi">:</span> <span class="s">the title of your post</span>
<span class="na">author</span><span class="pi">:</span> <span class="s">your name</span>
<span class="na">categories</span><span class="pi">:</span> <span class="s">keyword or a list of keywords [keyword1, keyword2, keyword3]</span>
<span class="na">excerpt</span><span class="pi">:</span> <span class="s">A short summary of your post</span>
<span class="na">image</span><span class="pi">:</span> <span class="s">assetsPosts/yourPostFolder/imageToBeUsedAsThumbnails.png This is optional, but useful if e.g. you share the post on Twitter.</span>
<span class="na">usemathjax</span><span class="pi">:</span> <span class="kc">true</span><span class="s"> (omit this line if you don't need to typeset math)</span>
<span class="na">thanks</span><span class="pi">:</span> <span class="s">A short acknowledgements message. It will be shown immediately above the content of your post.</span>
<span class="nn">---</span>
</code></pre></div></div>
<p>As for the content of the post, it should be typeset in markdown.</p>
<h2>Latex</h2>
<ul>
<li>Inline math is shown by using <code class="language-plaintext highlighter-rouge">$ ... $</code>. Notice that some expressions such as <code class="language-plaintext highlighter-rouge">a_b</code> typeset correctly, while expressions like <code class="language-plaintext highlighter-rouge">a_{b}</code> or <code class="language-plaintext highlighter-rouge">a_\command</code> sometimes do not. I guess this is because mathjax expects <code class="language-plaintext highlighter-rouge">_</code> to be followed by a literal.</li>
<li>Display math is shown by using <code class="language-plaintext highlighter-rouge">$$ ... $$</code>. The problem above doesn’t show up in this case, but you gotta be careful:
<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code> text
$$ ... $$
text
</code></pre></div> </div>
<p>does not typeset correctly, whereas:</p>
<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code> text
$$
...
$$
text
</code></pre></div> </div>
<p>does. You can also use environments, as in:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> $$
\begin{align*}
...
\end{align*}
$$
</code></pre></div> </div>
</li>
</ul>
<h3>Theorem environments</h3>
<p>We provide the following theorem environments: Definition, Proposition, Lemma, Theorem and Corollary. Numbering is automatic. If you need others, just ask. The way these work is as follows:</p>
<div class="language-latex highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
<span class="p">{</span><span class="c">% def %}</span>
A *definition* is a blabla, such that: <span class="p">$</span><span class="nb">...</span><span class="p">$</span>. Furthermore, it is:
<span class="p">$$</span><span class="nb">
...
</span><span class="p">$$</span>
<span class="p">{</span><span class="c">% enddef %}</span>
</code></pre></div></div>
<p>This gets rendered as follows:</p>
<div class="definition">
<p>A <em>definition</em> is a blabla, such that: $…$. Furthermore, it is:</p>
\[...\]
</div>
<p>Numbering is automatic. Use the tags:</p>
<div class="language-latex highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
<span class="p">{</span><span class="c">% def %}</span>
For your definitions
<span class="p">{</span><span class="c">% enddef %}</span>
<span class="p">{</span><span class="c">% not %}</span>
For your notations
<span class="p">{</span><span class="c">% endnot %}</span>
<span class="p">{</span><span class="c">% ex %}</span>
For your examples
<span class="p">{</span><span class="c">% endex %}</span>
<span class="p">{</span><span class="c">% diag %}</span>
For your diagrams
<span class="p">{</span><span class="c">% enddiag %}</span>
<span class="p">{</span><span class="c">% prop %}</span>
For your propositions
<span class="p">{</span><span class="c">% endprop %}</span>
<span class="p">{</span><span class="c">% lem %}</span>
For your lemmas
<span class="p">{</span><span class="c">% endlem %}</span>
<span class="p">{</span><span class="c">% thm %}</span>
For your theorems
<span class="p">{</span><span class="c">% endthm %}</span>
<span class="p">{</span><span class="c">% cor %}</span>
For your corollaries
<span class="p">{</span><span class="c">% endcor %}</span>
</code></pre></div></div>
<h4>Referencing</h4>
<p>If you need to reference results, just append a <code class="language-plaintext highlighter-rouge">{"id":"your_reference_tag"}</code> after the tag, where <code class="language-plaintext highlighter-rouge">your_reference_tag</code> is the same as a LaTex label. For example:</p>
<div class="language-latex highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
<span class="p">{</span><span class="c">% def {"id":"your_reference_tag"} %}</span>
A *definition* is a blabla, such that: <span class="p">$</span><span class="nb">...</span><span class="p">$</span>. Furthermore, it is:
<span class="p">$$</span><span class="nb">
...
</span><span class="p">$$</span>
<span class="p">{</span><span class="c">% enddef %}</span>
</code></pre></div></div>
<p>Then you can reference this by doing:</p>
<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code>As we remarked in <span class="p">[</span><span class="nv">Reference description</span><span class="p">](</span><span class="sx">#your_reference_tag</span><span class="p">)</span>, we are awesome...
</code></pre></div></div>
<h3>Typesetting diagrams</h3>
<p>We support two types of diagrams: quiver and TikZ.</p>
<h4>Quiver</h4>
<p>You can render <a href="https://q.uiver.app/">quiver</a> diagrams by enclosing quiver-exported iframes between <code class="language-plaintext highlighter-rouge">quiver</code> tags:</p>
<ul>
<li>On <a href="https://q.uiver.app/">quiver</a>, click on <code class="language-plaintext highlighter-rouge">Export: Embed code</code></li>
<li>Copy the code</li>
<li>In the blog, put it between delimiters as follows:</li>
</ul>
<div class="language-html highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
{% quiver %}
<span class="c"><!-- https://q.uiver.app/codecodecode--></span>
<span class="nt"><iframe</span> <span class="na">codecodecode</span><span class="nt">></iframe></span>
{% endquiver %}
</code></pre></div></div>
<p>They get rendered as follows:</p>
<div class="quiver">
<!-- https://q.uiver.app/#q=WzAsMyxbMCwwLCJYIl0sWzEsMiwiQiJdLFsyLDAsIkEiXSxbMCwxLCJnIiwxXSxbMiwxLCJmIiwxXSxbMCwyLCJoIiwxXV0= -->
<iframe class="quiver-embed" src="https://q.uiver.app/#q=WzAsMyxbMCwwLCJYIl0sWzEsMiwiQiJdLFsyLDAsIkEiXSxbMCwxLCJnIiwxXSxbMiwxLCJmIiwxXSxbMCwyLCJoIiwxXV0=&embed" width="432" height="432" style="border-radius: 8px; border: none;"></iframe>
</div>
<p><strong>Should the picture come out cropped, select <code class="language-plaintext highlighter-rouge">fixed size</code> when exporting the quiver diagram, and choose some suitable parameters.</strong></p>
<h4>Tikz</h4>
<p>You can render tikz diagrams by enclosing tikz code between <code class="language-plaintext highlighter-rouge">tikz</code> tags, as follows:</p>
<div class="language-latex highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
<span class="p">{</span><span class="c">% tikz %}</span>
<span class="nt">\begin{tikzpicture}</span>
<span class="k">\draw</span> (0,0) circle (1in);
<span class="nt">\end{tikzpicture}</span>
<span class="p">{</span><span class="c">% endtikz %}</span>
</code></pre></div></div>
<p>Tikz renders as follows:</p>
<div class="tikz"><script type="text/tikz">
\rotatebox{0}{
\scalebox{1}{
\begin{tikzpicture}
\node[circle, fill, minimum size=5pt, inner sep=0pt, label=left:{$1$}] (al1) at (-2,0) {};
\node[circle, fill, minimum size=5pt, inner sep=0pt, label=right:{$1$}] (ar1) at (0,0) {};
\node[circle, fill, minimum size=5pt, inner sep=0pt, label=right:{$2$}] (ar2) at (0,-1) {};
\node[circle, fill, minimum size=5pt, inner sep=0pt, label=right:{$3$}] (ar3) at (0,-2) {};
\draw[thick] (al1) to (ar1);
\draw[thick, out=180, in=180, looseness=2] (ar2) to (ar3);
\end{tikzpicture}
}
}
</script></div>
<p>Notice that at the moment tikz rendering:</p>
<ul>
<li>Supports any option you put after <code class="language-plaintext highlighter-rouge">\begin{document}</code> in a <code class="language-plaintext highlighter-rouge">.tex</code> file. So you can use this to include any stuff you’d typeset with LaTex (but we STRONGLY advise against it).</li>
<li>Does not support usage of anything that should go in the LaTex preamble, that is, before <code class="language-plaintext highlighter-rouge">\begin{document}</code>. This includes external tikz libraries such as <code class="language-plaintext highlighter-rouge">calc</code>, <code class="language-plaintext highlighter-rouge">arrows</code>, etc; and packages such as <code class="language-plaintext highlighter-rouge">tikz-cd</code>. Should you need <code class="language-plaintext highlighter-rouge">tikz-cd</code>, use quiver as explained above. If you need fancier stuff, you’ll have to render the tikz diagrams by yourself and import them as images (see below).</li>
</ul>
<h4>Referencing</h4>
<p>Referencing works also for the quiver and tikz tags, as in:</p>
<div class="language-latex highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
<span class="p">{</span><span class="c">% tikz {"id":"your_reference_tag"} %}</span>
...
<span class="p">{</span><span class="c">% endtikz %}</span>
</code></pre></div></div>
<p>This automatically creates a numbered ‘Figure’ caption under the figure, as in:</p>
<div class="quiverCaption" id="example"><div class="quiver">
<!-- https://q.uiver.app/#q=WzAsMyxbMCwwLCJYIl0sWzEsMiwiQiJdLFsyLDAsIkEiXSxbMCwxLCJnIiwxXSxbMiwxLCJmIiwxXSxbMCwyLCJoIiwxXV0= -->
<iframe class="quiver-embed" src="https://q.uiver.app/#q=WzAsMyxbMCwwLCJYIl0sWzEsMiwiQiJdLFsyLDAsIkEiXSxbMCwxLCJnIiwxXSxbMiwxLCJmIiwxXSxbMCwyLCJoIiwxXV0=&embed" width="432" height="432" style="border-radius: 8px; border: none;"></iframe>
</div></div>
<p>Whenever possible, we encourage you to enclose diagrams into definitions/propositions/etc should you need to reference them.</p>
<h2>Images</h2>
<p>Images are included via standard markdown syntax:</p>
<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">![</span><span class="nv">image description</span><span class="p">](</span><span class="sx">image_path</span><span class="p">)</span>
</code></pre></div></div>
<p><code class="language-plaintext highlighter-rouge">image_path</code> can be a remote link. Should you need to upload images to this blog post, do as follows:</p>
<ul>
<li>Create a folder in <code class="language-plaintext highlighter-rouge">assetsPosts</code> with the same title of the blog post file. So if the blogpost file is <code class="language-plaintext highlighter-rouge">yyyy-mm-dd-title.md</code>, create the folder <code class="language-plaintext highlighter-rouge">assetsPosts/yyyy-mm-dd-title</code></li>
<li>Place your images there</li>
<li>Reference the images by doing:
<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code> !<span class="p">[</span><span class="nv">image description</span><span class="p">](</span><span class="sx">../assetsPosts/yyyy-mm-dd-title/image</span><span class="p">)</span>
</code></pre></div> </div>
</li>
</ul>
<p>Whenever possible, we recommend that images be in <code class="language-plaintext highlighter-rouge">.png</code> format and <code class="language-plaintext highlighter-rouge">800</code> pixels wide, with a <strong>transparent</strong> background. Ideally, these should be easily readable on the light gray background of the blog website. You can stray from these guidelines if you have no alternative, but our definition and your definition of ‘I had no alternative’ may be different, and <em>we may complain</em>.</p>
<h4>Referencing</h4>
<p>Referencing works exactly as for diagrams:</p>
<div class="language-latex highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
<span class="p">{</span><span class="c">% figure {"id":"your_reference_tag"} %}</span>
![image description](image<span class="p">_</span>path)
<span class="p">{</span><span class="c">% endfigure %}</span>
</code></pre></div></div>
<h2>Code</h2>
<p>CyberCat blog offers support for code snippets:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">print_hi</span><span class="p">(</span><span class="nb">name</span><span class="p">)</span>
<span class="nb">puts</span> <span class="s2">"Hi, </span><span class="si">#{</span><span class="nb">name</span><span class="si">}</span><span class="s2">"</span>
<span class="k">end</span>
<span class="n">print_hi</span><span class="p">(</span><span class="s1">'Tom'</span><span class="p">)</span>
<span class="c1">#=> prints 'Hi, Tom' to STDOUT.</span>
</code></pre></div></div>
<p>To include a code snippet, just give:</p>
<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">```</span><span class="nl">language the snippet is written in
</span><span class="sb">your code</span>
<span class="p">```</span>
</code></pre></div></div>
<p>Check out the <a href="https://jekyllrb.com/docs/home">Jekyll docs</a> for more info on how to get the most out of Jekyll. File all bugs/feature requests at <a href="https://github.com/jekyll/jekyll">Jekyll’s GitHub repo</a>. If you have questions, you can ask them on <a href="https://talk.jekyllrb.com/">Jekyll Talk</a>.</p>Fabrizio GenoveseThis is a short summary of the post. It is meant to explain how to write for our blog.A Software Engine For Game Theoretic Modelling - Part 22022-06-24T00:00:00+00:002022-06-24T00:00:00+00:00https://cybercat-institute.github.io//2022/06/24/a-software-engine-for-game-theoretic-modelling-part-2<h2>Introduction</h2>
<p>Some time ago, in a <a href="https://statebox.org/blog/compositional-game-engine/">previous blog post</a>, we introduced our software engine for game theoretic modelling. In this post, we expand more on how to apply the engine to use cases relevant for the Ethereum ecosystem. We will consider an analysis of a simplified staking protocol. Our focus will be on compositionality – what this means from the perspective of representing protocols and from the perspective of analyzing protocols.</p>
<p>We end with an outlook on the further development of the engine, what its current limitations are and how we work on overcoming them.</p>
<p>The codebase of the example discussed can be found <a href="https://github.com/20squares/block-validation">here</a>. If you have never seen the engine before, we advise you to go back to our earlier post. Also note that there exists a basic <a href="https://github.com/philipp-zahn/open-games-engine/blob/master/Tutorial/TUTORIAL.md">tutorial</a> that explains how the engine works. Lastly, here is a recent <a href="https://www.youtube.com/watch?v=fucygCyCyo8">presentation</a> Philipp gave at the <a href="https://ef-events.notion.site/ETHconomics-Devconnect-676d73f791684e18bfae35bbc9e1fa90">Ethconomics workshop at DevConnect Amsterdam</a>.</p>
<h2>Preliminaries</h2>
<p>Consider a simplified model of a staking protocol. The staking protocol is motivated by <a href="https://ethereum.org/en/developers/docs/consensus-mechanisms/pos/">Ethereum proof of stake</a>. The model we introduce is relevant as, even though simple, it shines a light on how a previous version of the staking protocol was subject to reorg attacks as discussed in this <a href="https://arxiv.org/abs/2110.10086">paper</a>. We thank Barnabé Monnot for pointing us to the problem in the first place and helping us with the specification and modelling.</p>
<p>In what follows, we give a short verbal summary of the protocol.</p>
<p>To begin with, we model a chain as a (compositional) relation. The chain contains blocks with unique identifiers as well as voting weights. The weights correspond to votes by validators on the specific blocks contained in the chain. Here is an example of such a chain in the case of two validators:</p>
<p><img src="/assetsPosts/2022-06-24-a-software-engine-for-game-theoretic-modelling-part-2/chain.png" alt="Example chain for two validators" /></p>
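<p>To fix intuition, here is a minimal sketch of how such a chain might be encoded in plain Haskell. The names and types are our assumptions for illustration only; the actual codebase represents chains with a graph library.</p>

```haskell
-- Illustrative only: a block is a unique id paired with its current
-- vote count, and a chain is a list of parent-to-child edges.
type Id    = Int
type Votes = Int
type Node  = (Id, Votes)
type Chain = [(Node, Node)]

-- A two-block chain in which both blocks carry two votes, roughly in
-- the spirit of the picture above.
exampleChain :: Chain
exampleChain = [((1, 2), (2, 2))]

main :: IO ()
main = print exampleChain
```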
<p>The staking protocol consists of episodes. Within each episode, which lasts for several time steps, a <em>proposer</em> decides whether to extend the chain by a further block. If the proposer extends the chain, he chooses which block to build on. Consider the following example, in which the proposer extends the above chain:</p>
<p><img src="/assetsPosts/2022-06-24-a-software-engine-for-game-theoretic-modelling-part-2/extendedchain.png" alt="Example chain for two validators and a new block proposed" /></p>
<p>The new block he generates initially has no votes attesting to it being the legitimate successor. This assessment is conducted by two validators.</p>
<p>These two validators observe the last stage of the chain before their episode starts and they observe a possible change to the chain made by the proposer within their episode. The validators can then vote on the block which they view as the legitimate successor. Here is the continued example from above:</p>
<p><img src="/assetsPosts/2022-06-24-a-software-engine-for-game-theoretic-modelling-part-2/extendedandvoted.png" alt="Example chain for two validators, new block proposed, and voted on" /></p>
<p>Both the proposer’s and the validators’ choices are evaluated in the next episode. If the decisions they made, i.e. the proposer’s choice of which block to build on and the validators’ votes, lie on the path to the longest weighted chain, they receive a reward.</p>
<p>From a modelling perspective, this is an important feature. The agents’ remuneration in episode $t$ will be determined in episode $(t+1)$. We will come back to this feature.</p>
<p>So far, the setup seems simple enough. However, the picture is complicated by possible network issues. Messages may be delayed. For instance, the two validators might not observe a message by the proposer in their episode simply due to the network being partitioned.</p>
<p>Hence, in this specific case, when a message does <em>not</em> reach them, the validators cannot be sure whether the message was never sent or whether they simply have not received it yet.</p>
<p>Real-world network issues like delay complicate the incentives. They also open avenues for malicious agents. Modelling the arising incentive problems in game-theoretic terms is a formidable challenge, as the timing of moves and information is itself affected by the moves of players. For instance, in the reorg attack mentioned at the beginning, a malicious proposer might want to hold back information until the next episode has started. In that way he might draw validators away from the honest proposer of that episode and instead have them vote on the block that he created late.</p>
<p>The practical modelling of such interactions is not obvious (and in fact motivated a new research project on our end). Here, we dramatically simplify the problem. We get rid of time completely. Instead, we leverage a key feature of our approach: games are defined as open systems, open to their environment and waiting for information.</p>
<p>Through the environment we can feed in specific information we want. Concretely, we can expose the proposer and validators in a given episode exactly to the kind of reorg scenario mentioned above: Proposer and validators are facing differing information regarding the state of the chain.</p>
<p>Besides simplifying the model, proceeding in this way has a further advantage. The analysis of optimal moves is static and only relative to the context. It thereby becomes much simpler.<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup></p>
<h2>Representing the protocol as a compositional game</h2>
<p>In order to construct a game-theoretic model of the protocol, we will build up the protocol from the bottom up using building blocks.</p>
<h3>Building blocks</h3>
<p>We begin with the boring but necessary parts that describe the mechanics of the protocol. These components are mostly functions lifted into games as computations. In order not to introduce too much clutter in this post, we focus on the open games representations and hide the details of the auxiliary function implementations. These functions are straightforward, and it should hopefully be clear from the context what they do.</p>
<h4>Auxiliary components</h4>
<p>Given a chain, <code class="language-plaintext highlighter-rouge">determineHeadOfChain</code> produces the head of the current chain:</p>
<div class="language-haskell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">determineHeadOfChain</span> <span class="o">=</span> <span class="o">[</span><span class="n">opengame</span><span class="o">|</span>
inputs : chain ;
feedback : ;
:-----:
inputs : chain ;
feedback : ;
operation : forwardFunction $ determineHead ;
outputs : head ;
returns : ;
:-----:
outputs : head ;
returns : ;
<span class="o">|]</span>
</code></pre></div></div>
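<p>As a hedged sketch of the kind of computation being lifted here: on a simplified list-of-edges chain, the head can be found among the blocks that have no successor, breaking ties by vote weight. The list representation and the tie-breaking rule are our assumptions; the engine works on a proper graph type.</p>

```haskell
import Data.List (maximumBy)
import Data.Ord  (comparing)

type Node  = (Int, Int)      -- (block id, accumulated votes)
type Chain = [(Node, Node)]  -- parent-to-child edges

-- Among the blocks with no successor ("tips"), pick the one with the
-- most votes. A forked chain produces several tips, which is exactly
-- where the weights matter.
determineHead :: Chain -> Node
determineHead chain = maximumBy (comparing snd) tips
  where
    parents = map fst chain
    tips    = [ c | (_, c) <- chain, c `notElem` parents ]

main :: IO ()
main = print (determineHead [((1,2),(2,2)), ((2,2),(3,1)), ((2,2),(4,3))])
```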
<p>Given the old chain from $(t-1)$ and the head of the chain from $(t-2)$, <code class="language-plaintext highlighter-rouge">oldProposerAddedBlock</code> determines whether the proposer actually did send a new block in $(t-1)$. It also outputs the head of the chain for period $(t-1)$ - as this is needed in the next period.</p>
<div class="language-haskell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">oldProposerAddedBlock</span> <span class="o">=</span> <span class="o">[</span><span class="n">opengame</span><span class="o">|</span>
inputs : chainOld, headOfChainIdT2 ;
feedback : ;
:-----:
inputs : chainOld, headOfChainIdT2 ;
feedback : ;
operation : forwardFunction $ uncurry wasBlockSent ;
outputs : correctSent, headOfChainIdT1 ;
returns : ;
:-----:
outputs : correctSent, headOfChainIdT1 ;
returns : ;
<span class="o">|]</span>
</code></pre></div></div>
<p>Given the decision by the proposer to either wait or to send a head, <code class="language-plaintext highlighter-rouge">addBlock</code> creates a new chain. This means that either the old chain is copied unchanged or a new block is actually appended to it.</p>
<div class="language-haskell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">addBlock</span> <span class="o">=</span> <span class="o">[</span><span class="n">opengame</span><span class="o">|</span>
inputs : chainOld, chosenIdOrWait ;
feedback : ;
:-----:
inputs : chainOld, chosenIdOrWait ;
feedback : ;
operation : forwardFunction $
uncurry addToChainWait ;
outputs : chainNew ;
returns : ;
:-----:
outputs : chainNew ;
returns : ;
<span class="o">|]</span>
</code></pre></div></div>
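<p>A plausible shape for the lifted function, again on our simplified chain representation (the decision encoding via <code class="language-plaintext highlighter-rouge">Maybe</code> and all names are assumptions): waiting returns the chain unchanged, while choosing a block id appends a fresh block with zero votes on top of it.</p>

```haskell
type Id    = Int
type Node  = (Id, Int)
type Chain = [(Node, Node)]

-- Nothing = wait; Just i = build on the block with id i. The new
-- block starts with zero votes and receives the next unused id.
addToChainWait :: Chain -> Maybe Id -> Chain
addToChainWait chain Nothing  = chain
addToChainWait chain (Just i) =
  case [ n | n@(j, _) <- nodes, j == i ] of
    (parent : _) -> chain ++ [(parent, (freshId, 0))]
    []           -> chain                -- unknown id: leave chain as-is
  where
    nodes   = concatMap (\(p, c) -> [p, c]) chain
    freshId = 1 + maximum (map fst nodes)

main :: IO ()
main = print (addToChainWait [((1,2),(2,2))] (Just 2))
```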
<p>The following diagram summarizes the information flow in these building blocks.</p>
<p><img src="/assetsPosts/2022-06-24-a-software-engine-for-game-theoretic-modelling-part-2/auxiliary.png" alt="Information flow in the building blocks" /></p>
<h4>Decisions</h4>
<p>Given the old chain from $(t-1)$, the proposer decides whether or not to append a new block to a node. Conditional on that decision, a new chain is created.</p>
<div class="language-haskell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">proposer</span> <span class="n">name</span> <span class="o">=</span> <span class="o">[</span><span class="n">opengame</span><span class="o">|</span>
inputs : chainOld;
feedback : ;
:-----:
inputs : chainOld ;
feedback : ;
operation : dependentDecision name
alternativesProposer;
outputs : decisionProposer ;
returns : 0;
inputs : chainOld, decisionProposer ;
feedback : ;
operation : addBlock ;
outputs : chainNew;
returns : ;
//
:-----:
outputs : chainNew ;
returns : ;
<span class="o">|]</span>
</code></pre></div></div>
<p>Given a newly proposed chain and the old chain from $(t-1)$, a validator then decides which node to attest as the head.</p>
<div class="language-haskell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">validator</span> <span class="n">name</span> <span class="o">=</span> <span class="o">[</span><span class="n">opengame</span><span class="o">|</span>
inputs : chainNew,chainOld ;
feedback : ;
:-----:
inputs : chainNew,chainOld ;
feedback : ;
operation : dependentDecision name
(\(chainNew, chainOld) ->
[1, vertexCount chainNew]) ;
outputs : attestedIndex ;
returns : 0 ;
// ^ NOTE the payoff for the validator comes from the next period
:-----:
outputs : attestedIndex ;
returns : ;
<span class="o">|]</span>
</code></pre></div></div>
<p>This open game is parameterized by a specific player (<code class="language-plaintext highlighter-rouge">name</code>). The information flow of the decision open games is depicted in the next diagram:</p>
<p><img src="/assetsPosts/2022-06-24-a-software-engine-for-game-theoretic-modelling-part-2/decisions.png" alt="Add block flow" /></p>
<h4>Payoffs</h4>
<p>The central aspect of the protocol is how the payoffs of the different players are determined. For both proposers and validators we split the payoff components into two parts. First, we create open games which are mere accounting devices, i.e. they just update a player’s payoff.</p>
<p><code class="language-plaintext highlighter-rouge">updatePayoffValidator</code>:</p>
<ol>
<li>determines the value that a validator should receive conditional on his action being assessed as correct and</li>
<li>updates the value for a specific validator. This open game is parameterized by a specific player (<code class="language-plaintext highlighter-rouge">name</code>).</li>
</ol>
<div class="language-haskell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">updatePayoffValidator</span> <span class="n">name</span> <span class="n">fee</span> <span class="o">=</span> <span class="o">[</span><span class="n">opengame</span><span class="o">|</span>
inputs : bool ;
feedback : ;
:-----:
inputs : bool ;
feedback : ;
operation : forwardFunction $ validatorPayoff fee ;
outputs : value ;
returns : ;
// ^ Determines the value
inputs : value ;
feedback : ;
operation : addPayoffs name ;
outputs : ;
returns : ;
:-----:
outputs : ;
returns : ;
<span class="o">|]</span>
</code></pre></div></div>
<p><code class="language-plaintext highlighter-rouge">updatePayoffProposer</code> works analogously to the validators’. First, determine the value the proposer should receive depending on his action. Second, do the book-keeping and add the payoff to <code class="language-plaintext highlighter-rouge">name</code>’s account.</p>
<div class="language-haskell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">updatePayoffProposer</span> <span class="n">name</span> <span class="n">reward</span> <span class="o">=</span> <span class="o">[</span><span class="n">opengame</span><span class="o">|</span>
inputs : bool ;
feedback : ;
:-----:
inputs : bool ;
feedback : ;
operation : forwardFunction $ proposerPayoff reward;
outputs : value ;
returns : ;
// ^ Determines the value
inputs : value ;
feedback : ;
operation : addPayoffs name ;
outputs : ;
returns : ;
:-----:
outputs : ;
returns : ;
<span class="o">|]</span>
</code></pre></div></div>
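<p>The two payoff functions lifted in these games are, plausibly, simple indicator payments. The exact signatures are our guess, but the logic follows directly from the description above: a correct action earns the fee or reward, an incorrect one earns nothing.</p>

```haskell
-- Hypothetical spellings of the payoff functions used above.
validatorPayoff :: Double -> Bool -> Double
validatorPayoff fee correct = if correct then fee else 0

proposerPayoff :: Double -> Bool -> Double
proposerPayoff reward correct = if correct then reward else 0

main :: IO ()
main = print (validatorPayoff 2 True, proposerPayoff 2 False)
```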
<p><code class="language-plaintext highlighter-rouge">proposerPayment</code> embeds <code class="language-plaintext highlighter-rouge">updatePayoffProposer</code> into a larger game whose first stage includes a function, <code class="language-plaintext highlighter-rouge">proposedCorrect</code>, lifted into the open game. That function does what its name suggests: given the latest chain and a Boolean value indicating whether the proposer actually added a block, it determines whether the proposer proposed correctly according to the protocol.</p>
<div class="language-haskell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">proposerPayment</span> <span class="n">name</span> <span class="n">reward</span> <span class="o">=</span> <span class="o">[</span><span class="n">opengame</span><span class="o">|</span>
inputs : blockAddedInT1, chainNew ;
feedback : ;
:-----:
inputs : blockAddedInT1, chainNew ;
feedback : ;
operation : forwardFunction $ uncurry
proposedCorrect ;
outputs : correctSent ;
returns : ;
// ^ This determines whether the proposer was
correct in period (t-1)
inputs : correctSent ;
feedback : ;
operation : updatePayoffProposer name reward;
outputs : ;
returns : ;
// ^ Updates the payoff of the proposer given
decision in period (t-1)
:-----:
outputs : ;
returns : ;
<span class="o">|]</span>
</code></pre></div></div>
<p>This last game already showcases a pattern that we will see repeatedly from now on: using the primitive components, we build up larger games. All the necessary building blocks are now on the table; everything that follows is about composing these elements.</p>
<p>Let us consider another example for composition.</p>
<p><code class="language-plaintext highlighter-rouge">validatorsPayment</code> groups the payments for the validators included (here, two) into one game.</p>
<div class="language-haskell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">validatorsPayment</span> <span class="n">name1</span> <span class="n">name2</span> <span class="n">fee</span> <span class="o">=</span> <span class="o">[</span><span class="n">opengame</span><span class="o">|</span>
inputs : validatorHashMap, chainNew, headId;
feedback : ;
:-----:
inputs : validatorHashMap, chainNew, headId ;
feedback : ;
operation : forwardFunction $ uncurry3 $
attestedCorrect name1 ;
outputs : correctAttested1 ;
returns : ;
// ^ This determines whether validator 1 was
correct in period (t-1) using the latest
hash and the old information
inputs : validatorHashMap, chainNew, headId ;
feedback : ;
operation : forwardFunction $ uncurry3 $
attestedCorrect name2 ;
outputs : correctAttested2 ;
returns : ;
// ^ This determines whether validator 2 was
correct in period (t-1)
inputs : correctAttested1 ;
feedback : ;
operation : updatePayoffValidator name1 fee ;
outputs : ;
returns : ;
// ^ Updates the payoff of validator 1 given
decision in period (t-1)
inputs : correctAttested2 ;
feedback : ;
operation : updatePayoffValidator name2 fee ;
outputs : ;
returns : ;
// ^ Updates the payoff of validator 2 given
decision in period (t-1)
:-----:
outputs : ;
returns : ;
<span class="o">|]</span>
</code></pre></div></div>
<p>This concludes the blocks for generating payments. The information flow of these components is depicted in the following diagram:</p>
<p><img src="/assetsPosts/2022-06-24-a-software-engine-for-game-theoretic-modelling-part-2/payments.png" alt="Grouping of validators" /></p>
<p><code class="language-plaintext highlighter-rouge">validatorsGroupDecision</code> groups all the validators’ decisions considered into one game. The output of this game is a map (in the programming sense) connecting the name of each validator with her/his decision.</p>
<div class="language-haskell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">validatorsGroupDecision</span> <span class="n">name1</span> <span class="n">name2</span> <span class="o">=</span> <span class="o">[</span><span class="n">opengame</span><span class="o">|</span>
inputs : chainNew,chainOld, validatorsHashMapOld ;
feedback : ;
:-----:
inputs : chainNew, chainOld ;
feedback : ;
operation : validator name1 ;
outputs : attested1 ;
returns : ;
// ^ Validator1 makes a decision
inputs : chainNew, chainOld ;
feedback : ;
operation : validator name2 ;
outputs : attested2 ;
returns : ;
// ^ Validator2 makes a decision
inputs : [(name1,attested1),(name2,attested2)],
validatorsHashMapOld ;
feedback : ;
operation : forwardFunction $ uncurry
newValidatorMap ;
outputs : validatorHashMap ;
returns : ;
// ^ Creates a map of which validator voted for
which index
inputs : chainNew, [attested1,attested2] ;
feedback : ;
operation : forwardFunction $ uncurry updateVotes ;
outputs : chainNewUpdated;
returns : ;
// ^ Updates the chain with the relevant votes
:-----:
outputs : validatorHashMap, chainNewUpdated;
returns : ;
<span class="o">|]</span>
</code></pre></div></div>
<p>Grouping the validators is not in itself particularly exciting, but it serves to illustrate a general point. Nesting smaller games into larger games is mostly about establishing clear interfaces. As long as we do not change the interfaces, we can change the internal behavior. This is very helpful when we build our model in several steps and refine it over time. Here, for instance, the payment for an individual validator might change. But such a change is only required in one place - assuming the interaction with the outside world does not change - and will not affect the wider construction of the game. In other words, it reduces the effort of rewriting games.</p>
<p>Similarly, we chose the output type of the grouped validators with the intention that it would be easy to add more validators while keeping the interface, the mapping of validators to their decisions, intact.</p>
<p>The next diagram illustrates the composition of components and the information flow.</p>
<p><img src="/assetsPosts/2022-06-24-a-software-engine-for-game-theoretic-modelling-part-2/groupdecision.png" alt="Information flow in the validators' group decision" /></p>
<h3>Integrating the components towards one episode</h3>
<p>Having assembled all the necessary components, we can now turn to a model of an episode of the complete protocol.</p>
<p>Given the previous chain $(t-1)$, the block which was the head of the chain in $(t-2)$, and the voting decisions of the previous validators, this game puts all the decisions together.</p>
<div class="language-haskell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">oneEpisode</span> <span class="n">p0</span> <span class="n">p1</span> <span class="n">a10</span> <span class="n">a20</span> <span class="n">a11</span> <span class="n">a21</span> <span class="n">reward</span> <span class="n">fee</span> <span class="o">=</span> <span class="o">[</span><span class="n">opengame</span><span class="o">|</span>
inputs : chainOld, headOfChainIdT2,
validatorsHashMapOld ;
// ^ chainOld is the old hash
feedback : ;
:-----:
inputs : chainOld ;
feedback : ;
operation : proposer p1 ;
outputs : chainNew ;
returns : ;
// ^ Proposer makes a decision, a new hash is
proposed
inputs : chainNew,chainOld, validatorsHashMapOld;
feedback : ;
operation : validatorsGroupDecision a11 a21 ;
outputs : validatorHashMapNew, chainNewUpdated ;
returns : ;
// ^ Validators make a decision
inputs : chainNewUpdated ;
feedback : ;
operation : determineHeadOfChain ;
outputs : headOfChainId ;
returns : ;
// ^ Determines the head of the chain
inputs : validatorsHashMapOld, chainNewUpdated,
headOfChainId ;
feedback : ;
operation : validatorsPayment a10 a20 fee ;
outputs : ;
returns : ;
// ^ Determines whether validators from period (t-1)
were correct and get rewarded
inputs : chainOld, headOfChainIdT2 ;
feedback : ;
operation : oldProposerAddedBlock ;
outputs : blockAddedInT1, headOfChainIdT1;
returns : ;
// ^ This determines whether the proposer from
period (t-1) did actually add a block or not
inputs : blockAddedInT1, chainNewUpdated ;
feedback : ;
operation : proposerPayment p0 reward ;
outputs : ;
returns : ;
// ^ This determines whether the proposer from
period (t-1) was correct and triggers payments
accordingly
:-----:
outputs : chainNewUpdated, headOfChainIdT1,
validatorHashMapNew ;
returns : ;
<span class="o">|]</span>
</code></pre></div></div>
<p>For clarity, the diagram below illustrates the interaction of the different components and their information flow.</p>
<p><img src="/assetsPosts/2022-06-24-a-software-engine-for-game-theoretic-modelling-part-2/oneepisode.png" alt="Information flow" /></p>
<p>One important thing to note is that this game representation has no inherent dynamics. This is due to a general principle behind the theory of open games: it has no built-in notion of time.<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup></p>
<p>This is a limitation in the sense that we cannot see the dynamics unfold. It also has advantages, though: the incentive analysis has no side effects; in functional programming terms, it acts like a pure function and is referentially transparent relative to a given state of the game.</p>
<h3>More models from here on</h3>
<p>Once we have represented the one-episode model, we have choices. We can directly work with that model. And we will do that in the next section. But we can also construct “larger models”: Either by manually combining several episodes into a new multi-episode model or by embedding the single episode into a Markov game structure.</p>
<p>We do not cover the construction proper or the analysis of the Markov game in this post. But the idea is simple: the stage game is a state in a Markov game where the state is fully captured by the inputs to the stage game. A Markov strategy then determines the move in the stage game. This, in turn, allows us to derive the next state of the Markov game. To analyze such a game, we can approximate the future payoff from the point of view of a single player under the assumption that the other players keep playing their strategies. In that way we can also assess unilateral deviations for the player in focus.</p>
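<p>The approximation idea can be made concrete with a small numeric sketch: simulate the stage game forward for a fixed horizon under the other players’ fixed strategies, collect the player’s stage payoffs, and discount them. The discounting scheme and names here are our assumptions, not the engine’s API.</p>

```haskell
-- Discounted sum of a finite stream of stage payoffs, as one might
-- use to approximate a continuation value in the Markov game.
discountedPayoff :: Double -> [Double] -> Double
discountedPayoff delta payoffs =
  sum (zipWith (*) (iterate (* delta) 1) payoffs)

main :: IO ()
main = print (discountedPayoff 0.5 [2, 2, 2])
```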
<h2>Analysis</h2>
<p>After having established a model of the protocol, let us turn to its analysis. The whole point is of course not to represent the games but to learn something about the incentives of the agents involved.</p>
<p>It is important to note, here, that the model we arrived at above is just <em>one</em> possible way to represent the situation. Obviously, the engine cannot guarantee that you end up with a useful model. But what it should guarantee is that you can adapt the model quickly and iterate through a range of models. <em>The</em> “one true model” rarely exists. Instead, being able to adapt the model and consider many different scenarios is the default.</p>
<p>We will illustrate two analyses. The first shows that, as the protocol intends, agents who follow it truthfully end up in an equilibrium. The second shows that, in its current form, the protocol runs into problems if a proposer chooses to delay his message strategically, which makes it susceptible to attacks.</p>
<h3>Honest behavior</h3>
<p>We will first illustrate that the protocol works as intended if all agents involved are honest. They all observe the current head of the chain. Proposers then build a new block on top of that head; validators validate that head. The analysis can be found in <code class="language-plaintext highlighter-rouge">HonestBehavior.hs</code>.</p>
<p>We use the <code class="language-plaintext highlighter-rouge">oneEpisode</code> model. That is, we slice the protocol into one period and supply the initial information with which that round begins and a continuation describing how the game continues in the next round. Recall that the rewards for proposer and validators in period $t$ are determined in period $(t+1)$. This information, the initialization and the continuation, is fed in through <code class="language-plaintext highlighter-rouge">initialContextLinear</code>, where <em>linear</em> signifies that we consider a non-forked chain.</p>
<div class="language-haskell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">initialContextLinear</span> <span class="n">p</span> <span class="n">a1</span> <span class="n">a2</span> <span class="n">reward</span> <span class="n">successFee</span> <span class="o">=</span>
<span class="kt">StochasticStatefulContext</span>
<span class="p">(</span><span class="n">pure</span> <span class="p">(</span><span class="nb">()</span><span class="p">,(</span><span class="n">initialChainLinear</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="n">initialMap</span><span class="p">)))</span>
<span class="p">(</span><span class="nf">\</span><span class="kr">_</span> <span class="n">x</span> <span class="o">-></span> <span class="n">feedPayoffs</span> <span class="n">p</span> <span class="n">a1</span> <span class="n">a2</span> <span class="n">reward</span> <span class="n">successFee</span> <span class="n">x</span><span class="p">)</span>
</code></pre></div></div>
<p>This expression looks more complicated than it actually is. The first part, <code class="language-plaintext highlighter-rouge">(pure ((),(initialChainLinear, 3, initialMap)))</code>, determines the starting conditions of the situation we consider. That is, we provide the input parameters which <code class="language-plaintext highlighter-rouge">oneEpisode</code> expects from us. Among other things, this contains the initial chain we start with. Here it is, replicated from above as a reminder:</p>
<p><img src="/assetsPosts/2022-06-24-a-software-engine-for-game-theoretic-modelling-part-2/chain.png" alt="Example chain for two validators" /></p>
<p>The second part, <code class="language-plaintext highlighter-rouge">(\_ x -> feedPayoffs p a1 a2 reward successFee x)</code> describes a function which computes the payoff from the current action in the next period. Details of how this payoff is determined can be found under the implementation of <code class="language-plaintext highlighter-rouge">feedPayoffs</code>.</p>
<p>Again, the way we approach this problem is by exploiting the key feature of open games: the one-episode model is like a pipe expecting some inflows and outflows. Once we have them defined, we can analyze what is going on inside of that “pipe”.</p>
<p>The last element needed for our analysis are the strategies. We define “honest” strategies for both proposer and validators: <code class="language-plaintext highlighter-rouge">strategyProposer</code> and <code class="language-plaintext highlighter-rouge">strategyValidator</code>.</p>
<p>Both types of agents observe past information, for instance the previous chain, and then build on the head of the chain (proposer) or attest to the head of the chain (validators).</p>
<p>Note that we include a condition in the strategies that deals with the scenario where there is not a unique head of the chain. In the analysis we focus on here, where everyone behaves honestly, we will never reach this case. However, once not all agents are honest, there might be scenarios where the head is not unique. This will be important in the second case we analyze below.</p>
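As a rough sketch of the shape of such a strategy (the list-of-(node, votes) chain representation, the head rule, and the tie-breaking choice are all simplifying assumptions here; the actual <code class="language-plaintext highlighter-rouge">strategyValidator</code> operates on the engine's own types):

```haskell
type NodeId = Int
type Votes  = Int

-- Hypothetical honest validator strategy on a simplified chain,
-- a list of (node, votes) pairs: attest to the unique head, and
-- fall back to the newest node when several nodes tie for the head.
honestValidator :: [(NodeId, Votes)] -> NodeId
honestValidator chain =
  case [n | (n, v) <- chain, v == best] of
    [unique] -> unique          -- unique head: attest to it
    ties     -> maximum ties    -- no unique head: tie-breaking rule
  where best = maximum [v | (_, v) <- chain]

main :: IO ()
main = print (honestValidator [(3, 1), (4, 0), (5, 0)])  -- prints 3
```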
<p>Once we have defined the strategies, there is only one thing left to do: Initialize the game with some parameters, specifically rewards and fees for the proposer and the validators, respectively.</p>
<p>In the file <code class="language-plaintext highlighter-rouge">HonestBehavior.hs</code> you can find one such parameterization, <code class="language-plaintext highlighter-rouge">analyzeScenario</code>:</p>
<div class="language-haskell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">analyzeScenario</span> <span class="o">=</span> <span class="n">eqOneEpisodeGame</span> <span class="s">"p0"</span> <span class="s">"p1"</span> <span class="s">"a10"</span> <span class="s">"a20"</span> <span class="s">"a11"</span> <span class="s">"a21"</span> <span class="mi">2</span> <span class="mi">2</span> <span class="n">strategyOneEpisode</span> <span class="p">(</span><span class="n">initialContextLinear</span> <span class="s">"p1"</span> <span class="s">"a11"</span> <span class="s">"a21"</span> <span class="mi">2</span> <span class="mi">2</span><span class="p">)</span>
</code></pre></div></div>
<p>This game employs the honest strategies. If we query it, we see that the proposer as well as the validators have no incentive to deviate. These strategies form an equilibrium, as intended in the design of the protocol.</p>
<h3>Identifying attacks - zooming in</h3>
<p>Let us turn to a second analysis. This analysis can be found in <code class="language-plaintext highlighter-rouge">Attacker.hs</code>.</p>
<p>In that episode we continue to consider the behavior of honest agents. However, these agents will start out on a chain that we assume has been intentionally delayed by the proposer in the episode before. This is achieved by adding an additional input, <code class="language-plaintext highlighter-rouge">chainManipulated</code>, to <code class="language-plaintext highlighter-rouge">oneEpisodeAttack</code>, which is otherwise equivalent to <code class="language-plaintext highlighter-rouge">oneEpisode</code>: as analysts, we can manipulate the chain that the proposer and the validators see.</p>
<div class="language-haskell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">oneEpisodeAttack</span> <span class="n">p0</span> <span class="n">p1</span> <span class="n">a10</span> <span class="n">a20</span> <span class="n">a11</span> <span class="n">a21</span> <span class="n">reward</span> <span class="n">fee</span> <span class="o">=</span> <span class="o">[</span><span class="n">opengame</span><span class="o">|</span>
inputs : chainOld, headOfChainIdT2, validatorsHashMapOld, chainManipulated ;
// ^ chainOld is the old hash
feedback : ;
:-----:
inputs : chainOld ;
feedback : ;
operation : proposer p1 ;
outputs : chainNew ;
returns : ;
// ^ Proposer makes a decision, a new hash is proposed
inputs : chainNew, chainManipulated ;
feedback : ;
operation : mergeChain ;
outputs : mergedChain ;
returns : ;
// ^ Merges the two chains into a new chain for the validators
inputs : mergedChain, chainOld, validatorsHashMapOld;
feedback : ;
operation : validatorsGroupDecision a11 a21 ;
outputs : validatorHashMapNew, chainNewUpdated ;
returns : ;
// ^ Validators make a decision
inputs : chainNewUpdated ;
feedback : ;
operation : determineHeadOfChain ;
outputs : headOfChainId ;
returns : ;
// ^ Determines the head of the chain
inputs : validatorsHashMapOld, chainNewUpdated, headOfChainId ;
feedback : ;
operation : validatorsPayment a10 a20 fee ;
outputs : ;
returns : ;
// ^ Determines whether validators from period (t-1) were correct and get rewarded
inputs : chainOld, headOfChainIdT2 ;
feedback : ;
operation : oldProposerAddedBlock ;
outputs : blockAddedInT1, headOfChainIdT1;
returns : ;
// ^ This determines whether the proposer from period (t-1) did actually add a block or not
inputs : blockAddedInT1, chainNewUpdated ;
feedback : ;
operation : proposerPayment p0 reward ;
outputs : ;
returns : ;
// ^ This determines whether the proposer from period (t-1) was correct and triggers payments accordingly
:-----:
outputs : chainNewUpdated, headOfChainIdT1, validatorHashMapNew ;
returns : ;
<span class="o">|]</span>
</code></pre></div></div>
<p>This simulates the situation where the malicious proposer from the episode before sends a block after the honest proposer from this episode has added his own block. As a result there are now two nodes in the chain with 0 votes on them. In other words, there are two contenders for the head of the chain. The chain at this point in time looks like this:</p>
<p><img src="/assetsPosts/2022-06-24-a-software-engine-for-game-theoretic-modelling-part-2/attackchain.png" alt="Forked chain for two validators" /></p>
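A hypothetical <code class="language-plaintext highlighter-rouge">mergeChain</code> along these lines, using a simplified list-of-(node, votes) chain representation (the engine's actual chain type and merge logic differ):

```haskell
import Data.List (nubBy)
import Data.Function (on)

type NodeId = Int
type Votes  = Int

-- Union of the honest proposer's chain and the manipulated chain,
-- keeping one entry per node id. After the merge, nodes 4 and 5 both
-- sit at the tip with 0 votes: two contenders for the head.
mergeChain :: [(NodeId, Votes)] -> [(NodeId, Votes)] -> [(NodeId, Votes)]
mergeChain honest manipulated = nubBy ((==) `on` fst) (honest ++ manipulated)

main :: IO ()
main = print (mergeChain [(3, 1), (4, 0)] [(3, 1), (5, 0)])
-- prints [(3,1),(4,0),(5,0)]
```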
<p>The next steps are analogous to the analysis before: we define the inputs and how the game continues. Lastly, we need to define strategies.</p>
<p>We consider two strategies by the validators adapted to this specific scenario: Either they vote with the honest proposer, i.e. vote for node 4 (<code class="language-plaintext highlighter-rouge">strategyValidator4</code>), or they vote with the attacker, i.e. vote for node 5 (<code class="language-plaintext highlighter-rouge">strategyValidator5</code>). We assume the proposer behaves honestly as before.</p>
<p>If we run the equilibrium check on these two scenarios, <code class="language-plaintext highlighter-rouge">analyzeScenario4</code> and <code class="language-plaintext highlighter-rouge">analyzeScenario5</code>, we see that <em>both</em> constitute an equilibrium. That is, in both cases none of the players has an incentive to deviate. Obviously, the scenario where the validators vote for the malicious proposer is not an equilibrium we want from the design perspective of the protocol.</p>
<p>We can shed further light on what is going on here. So far we assumed that the validators coordinate on one node: they either both choose node 4 or both choose node 5. The key issue is that they observe two candidate nodes for the new head of the chain. We can also consider the case where the validators randomize when facing a tie (<code class="language-plaintext highlighter-rouge">analyzeScenarioRandom</code>). In that case, we see that the result is a non-equilibrium state: both validators would profit from voting for another block. The reason is simple: they are not coordinated. If each randomly draws one of the two heads, there is the possibility that the validators output mutually contradictory information, in which case they will not be rewarded.</p>
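The expected-payoff arithmetic behind this coordination failure can be sketched in a few lines (the numbers are made up; only the structure, a reward on agreement and nothing on disagreement, mirrors the protocol):

```haskell
-- Back-of-the-envelope illustration: two validators independently
-- vote for node 4 or node 5 with equal probability, and a validator
-- is only rewarded (here by a made-up `fee`) when the votes agree.
expectedPayoff :: Double -> Double
expectedPayoff fee =
  sum [ p1 * p2 * reward c1 c2 | (c1, p1) <- mixed, (c2, p2) <- mixed ]
  where
    mixed = [(4 :: Int, 0.5), (5, 0.5)]
    reward c1 c2 = if c1 == c2 then fee else 0

main :: IO ()
main = print (expectedPayoff 2)  -- prints 1.0, half the coordinated payoff
```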
<h2>Outlook</h2>
<p>The development of the engine is ongoing. Protocols which involve a timing choice, such as a proposer waiting to send information and thereby potentially learning something about the validators’ behavior in the meantime, pose a challenge for the current implementation. One should add that they also pose a challenge for classical game representations such as the extensive form. As we have shown, it is still entirely possible to represent such games in the engine. However, such modelling puts the burden on the modeller to make reasonable choices. It would be nice to start with an actual protocol and extract a game-theoretic model out of it. Extending the underlying theory and the engine to better accommodate such scenarios is at the top of our to-do list.</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>This is not the only way to model the protocol in the current implementation. It is also possible to consider a timer explicitly as a state variable. This <a href="https://github.com/20squares/block-validation/tree/timer">branch</a> contains such a model. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>We should be more precise: In the current theory of open games there is always a clear notion of causality - who moves when and what is observed when by whom. The relevant “events” can be organized in a relation. This follows the overall categorical structure in which open games are embedded. We are working on a version of the theory where time - or other underlying structures like networks - are what open games are based on. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>

<p>Philipp Zahn</p>

<h1>What is Categorical Cybernetics?</h1>

<p>2022-05-29</p>

<p><strong>Categorical cybernetics</strong>, or <strong>CyberCat</strong> to its friends, is – no surprise – the application of methods of (applied) category theory to cybernetics. The “<strong>category theory</strong>” part is clear enough, but the term “<strong>cybernetics</strong>” is notoriously fluid, and throughout history has meant more or less whatever the writer wanted it to mean. So, let’s lay down some boundaries.</p>
<p>I first proposed CyberCat, both as a field and as a term, in <a href="https://julesh.com/2019/11/27/categorical-cybernetics-a-manifesto/">this 2019 blog post</a> (for which this one is partly an update). There I fixed a definition that I still like: <strong>cybernetics is the control theory of complex systems</strong>. That is, cybernetics is the interaction of control theory and systems theory.</p>
<p>We add to this <a href="https://www.appliedcategorytheory.org/">applied category theory</a>, which has some generic benefits. Most importantly we have <a href="https://julesh.com/2017/04/22/on-compositionality/">compositionality</a> by default, and a more precise way of talking about it than in fields like machine learning where it is present but informal. Compositionality also gets us half way to computer implementation by default, by making our models similar to programs. Finally category theory gives us a disciplined way to talk about interaction between models in different fields.</p>
<p>It turns out - and this fact is at the heart of CyberCat - that the category-theoretic study of control has a huge amount of overlap with things like <strong>learning</strong> and <strong>strategic analysis</strong>. Those were also historically part of cybernetics, and can be seen as aspects of control theory with a certain amount of squinting, so we also include them.</p>
<p>On top of that definition, a cultural aspect of the historical cybernetics movement that we want to retain is that <strong>cybernetics is inherently interdisciplinary</strong>. Cybernetics is not just the theory but the practice: in engineering, artificial intelligence, economics, ecology, political science, and anywhere else where it might be useful. (Part of the reason we created the Institute – more on that in a future post – is to make this cross-cutting collaboration easier than in a university.)</p>
<p>Cybernetics has been an academic dirty word for many decades now: in the 60s and 70s it went through a hype cycle, things were over-claimed and the field eventually fell apart. As founders of the CyberCat Institute we believe that <strong>the time is right to reclaim the word cybernetics</strong>. Apart from anything else, the word is just too cool to not use. More importantly, the objects of study – and the interdisciplinary approach to studying them – are even more important now than 50 years ago.</p>
<p>Having laid out what CyberCat could potentially be, I will now narrow the scope. At the Institute we are focussing on not just any applications of category theory to cybernetics, but to a small set of very closely interrelated tools. These are, roughly, things that have a family resemblance to <strong>open games</strong>.</p>
<p>This post isn’t the place to go into technical details, but what these things have in common is that they model <strong>bidirectional processes</strong>: they are processes (that is, they have an extent in time) in which some information appears to flow backwards (I described the idea in more detail in <a href="https://julesh.com/2017/09/29/a-first-look-at-open-games/">this post</a>). The best known of these is <strong>backpropagation</strong>, where the backwards pass goes backwards. A key technical idea behind CyberCat is the observation that many other important processes in cybernetics have a lot in common with backprop, once you take the right perspective. The category-theoretic tool used to model these processes is <strong>optics</strong>.</p>
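For a taste of what this looks like, here is the textbook concrete lens in Haskell, the simplest optic: a forward pass extracting a view from a state, and a backward pass propagating an update back into it (a minimal sketch, not any particular library's API):

```haskell
-- A concrete lens: `view` is the forward pass, `update` the backward
-- pass that pushes information back into the original state.
data Lens s t a b = Lens
  { view   :: s -> a
  , update :: (s, b) -> t
  }

-- Lenses compose: the forward passes compose left to right, the
-- backward passes right to left, the same shape as backpropagation.
compose :: Lens s t a b -> Lens a b c d -> Lens s t c d
compose outer inner = Lens
  { view   = view inner . view outer
  , update = \(s, d) -> update outer (s, update inner (view outer s, d))
  }

-- Example: focus on the first component of a pair.
fstLens :: Lens (a, c) (b, c) a b
fstLens = Lens fst (\((_, c), b) -> (b, c))

main :: IO ()
main = print (update (compose fstLens fstLens) (((1, 2), 3), 9))
-- prints ((9,2),3)
```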
<p>Besides backprop, the things we have put on a uniform mathematical foundation using optics are value iteration, Bayesian inference, filtering, and the unnamed process that is the secret sauce of compositional game theory.</p>
<p>This is the academic foundation that we start from. The question that comes next is, so what? How can this knowledge be exploited to solve actual problems? This is where the CyberCat Institute comes in, but I want to leave that for a future post. In the meantime, you can look at our <a href="/projects">projects page</a> to see the kinds of things we are working on right now.</p>

<p>Jules Hedges</p>