<h1>Compositionality and the Mass Customization of Economic Models</h1>
<p><em>2024-07-15</em></p>
<p>I thank Oliver Beige for many helpful comments.</p>
<h1>Fables or algorithms?</h1>
<blockquote>
<p>Economic theory formulates thoughts via what we call “models.” The
word model sounds more scientific than the word fable or tale, but I
think we are talking about the same thing.
<em>(Ariel Rubinstein)</em><sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup></p>
</blockquote>
<p>Are economic models useful for making decisions? One might expect that
there is a clear answer to this simple question. But in fact opinions on
the usefulness or non-usefulness of models as well as what exactly makes
models useful vary widely - within the economic profession and of course
even more so beyond. Sometimes the question feels like a Rorschach
test - telling more about the person than about the subject.</p>
<p>In this post, I want to explore the question of usefulness. Even more
so, I want to explore how the usefulness ties into the modelling
process. The reason for doing so is simple: Part of our efforts at
CyberCat is to build software tools to improve and accelerate the
modelling process.</p>
<p>The importance of this is also evident: If models are useful, and we
improve the process generating useful models, we improve our
decision-making. And in so far as these improvements tie into computing
technology, as they do in our opinion, improvements could be
significant.</p>
<h1>Economic models</h1>
<p>My question, "are economic models useful", is quite lofty. So, let's
first do some unpacking.</p>
<p>What do I mean by economic model? A mathematical, formal model which
relates to the domain of decision-making at hand. A prototypical example
is a model that tells us how to bid in an auction. Such models are often
classified as applied economic models.<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup></p>
<p>Why do I emphasize "economic"? If my question were "are mathematical
models useful for decision-making?", the answer would be a simple yes and
we could call it a day. Operations research models are in production for
a multitude of tasks (job scheduling, inventory management, revenue
management, etc.). In fact, many of these models are so pervasive that it
is easy to forget them. Just think about the business models that have
been built on the navigation and prediction functionalities of Google
Maps.</p>
<p>The distinction between operations research and economics is obviously
blurry and more due to artificial academic barriers than fundamental
differences (<a href="https://cybercat.institute/2024/05/17/economics-operations-research/">check out Oliver's post on
this</a>).
I am making the crude distinction that economic models are about several
agents interacting - most often strategically - whereas traditional
operations research models are focused on single decision-makers.</p>
<p>Now, this is crude because obviously operations research by now also
includes auctions and other models that are interactive in this way.
Moreover, as <a href="https://econpatterns.substack.com/p/designing-economic-mechanisms-the">Oliver pointed out in another
post</a>,
several leading economists who advanced the practical use of economic
models (more on this below) have an operations research background.</p>
<p>It is, I think, also not a coincidence that operations research has
moved into the realm of interactive agents: Due to globalization and in
particular the internet, companies have become more interconnected and
also have much more technical leverage. 50 years ago, the idea that a
regular company could be designing their own market probably would have
been quite a thing. Today, it is part of the standard startup toolkit.</p>
<p>Technology and interconnectedness are driving the need for models that
help decide in such a world as well as design the frameworks and
protocols in which decisions take place. Economic models are the natural
candidate for this task.</p>
<h1>Useful?</h1>
<p>Let's turn to the central part of my question. What do I mean by
useful? Opinions on this vary widely. According to Rubinstein, the
question of how a model can be useful is already ill-posed. Models are not
useful. Models might carry a lesson and can transform our thinking. But
they are of little value for concrete decisions.</p>
<p>In economics, Rubinstein's position is an extreme point. On the other
side of the extreme, economists and even more importantly computer
scientists are working on market design and mechanism design models.<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>
Models in this spirit are "very" practical: they do affect decisions
in a concrete sense - they get implemented in the form of algorithms and
are embedded in software systems.</p>
<p>We can think of fables and algorithms as two ends of a spectrum - from
basically irrelevant to decisive for a choice we have to make. While it
is hard to precisely locate a given model on this "usefulness" line,
we can consider how a model can become more useful when moving along the
spectrum. Of course, what constitutes value and who benefits how from a
model changes along this path as well. The usefulness of a model is a
matter of degree and not an absolute.</p>
<p>Let's begin at the fable end and start moving inward. How can a model
produce value? If we are faced with infinitely many ways to think about
a situation, even a simplistic model can be valuable. It helps to focus
and to select a few key dimensions. This aspect becomes even more
important in an organizational context where people have to collaborate
and it is very easy to get lost in a myriad of possibilities and different
interpretations.</p>
<p>Many classic games (in the game theory sense) like the Battle of the
Sexes, Matching Pennies, and of course the Prisoners' Dilemma help to
focus on key issues - for instance the interdependency between actions
and their consequences. To be clear, the connection between such a model
and a concrete decision is very loose in this case, and the value of the
model lies in the eye of the analyst.</p>
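<p>To make this concrete, the Prisoners' Dilemma can be written down as a tiny formal object. This is a minimal sketch with the textbook payoff numbers (not taken from this post); the point is only that even this much formalism pins down the strategic structure:</p>

```python
# Minimal sketch of the Prisoners' Dilemma as data. Payoff numbers are the
# textbook ones (higher is better), chosen purely for illustration.
MOVES = ["cooperate", "defect"]

# PAYOFF[(my_move, their_move)] = my payoff
PAYOFF = {
    ("cooperate", "cooperate"): 3,
    ("cooperate", "defect"): 0,
    ("defect", "cooperate"): 5,
    ("defect", "defect"): 1,
}

def is_dominant(move: str) -> bool:
    """A move is dominant if it is a best response to every opponent move."""
    return all(
        PAYOFF[(move, opp)] >= PAYOFF[(alt, opp)]
        for opp in MOVES
        for alt in MOVES
    )

print([m for m in MOVES if is_dominant(m)])  # ['defect']
```

<p>Exhaustive checking shows that defection dominates - the interdependency of actions and consequences is exactly the "lesson" this fable carries.</p>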
<p>These games often focus on a few actions ("defect" or "cooperate").
Moreover, agents have perfect information about the consequences of
their actions and the actions of others. In many situations, e.g. in
business contexts, choices are more fine-grained and information is not
perfect. Models in Industrial Organization routinely incorporate these
aspects, for instance analyzing competition between companies. From a
practical perspective, these models often resemble the following
pattern: If we had information X, the model would help us make a
decision. Consider strategic pricing: It is standard in these models to
assume demand to be known or at least drawn from a known distribution.
The demand curve will then typically be a smooth, mathematically
well-behaved object. Such models can produce insights - no doubt about it.</p>
<p>But they rarely help to make a concrete decision, e.g. what prices to
charge. There are many reasons for this but let me just give an obvious
one as a co-founder of a startup: I would love to maximize a demand
curve and price our services accordingly. But the reality is: I do not
have a curve. Hell, if I am lucky I observe a handful of points
(price-quantity combinations). But these points might not even be on any
actual demand curve in the model's sense. So, while useful for
structuring discussions around pricing, in the actual decision to set
prices, the model is only one (possibly small) input. And this is very
typical. Such models provide insights and do help to inform decisions.
But they are only part of a collage of inputs into a decision.</p>
<p>There are economic models which do play a more dominant role in shaping
decisions. Consider auctions. There is a rich theory that helps to
choose a specific auction format to solve a given allocation problem.
Still, even in this case, there are gaps between the model and the
actual implementation, for instance when it comes to multi-unit
auctions.</p>
<p>The examples I gave are obviously not meant to be exhaustive. There are
other ways in which a model can be useful. But this is not so important. The
main point is that, all along the usefulness line, economic models can
produce value. The question is not whether a model produces a choice but
whether, at the margin, it helps us make better decisions. And this can
happen all along the spectrum. Moreover, ceteris paribus, the further we
move along the path towards the algorithm end, the more influence the
economic model gains relative to other inputs into a decision and the
more value it produces.</p>
<p>If we accept this, then an immediate question comes up: How can we push
models from the fable side more towards the algorithm side? Let's
explore this.</p>
<h1>The process of modelling and the library of models</h1>
<p>I first need to discuss how models get located on a specific point on
the usefulness line in the first place. But this requires digging into
the actual modelling process. Note again that I am only interested in
"instrumental" modelling - models that are useful for a specific
decision at hand. My exposition will be simplistic and subjective. I will
neither cover the full range of opinions nor be grounded in any
philosophical discussions of economics. This is just me describing how I
see this (and also how I have used models in my work at
<a href="https://20squares.xyz/">20squares</a>).</p>
<p>Applied models in economics are a mixture of mathematical formalism and
interpretative mapping connecting the internals of the model to the
outside world. Mappings are not exclusive: The same formal structure can
be mapped to different domains. The Prisoner's Dilemma is such an
example. It has various interpretations from two prisoners in separate
cells to nuclear powers facing each other.</p>
<p>The formal, inner workings of models are "closed" objects. What do I
mean by that? Each model describes a typically isolated mechanism, e.g.
connecting a specific market design with some desirable properties. The
formal model has no interfaces to the outside world. And therefore it
cannot be connected to other models at the formal level. In that sense a
model is a self-contained story.</p>
<p>Let me contrast this with a completely different domain: If one thinks
about functional programming, then everything is about the composability
of functions (modulo types). The whole point of programming is that one
program (which is a function) can be composed with another program
(which is a function) to produce a new program (which is a
function).<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup></p>
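<p>The composability described here can be made concrete in a few lines. A minimal sketch (the function names are illustrative):</p>

```python
# Composition of programs, the modularity the post contrasts with closed
# economic models: two small functions combine into a new one, without
# reopening either component.
def compose(g, f):
    return lambda x: g(f(x))

def normalize(xs):
    """Rescale a list so that it sums to 1."""
    total = sum(xs)
    return [x / total for x in xs]

def cumulative(xs):
    """Running sums of a list."""
    out, acc = [], 0.0
    for x in xs:
        acc += x
        out.append(acc)
    return out

# A new program, built purely by plugging components together.
cdf = compose(cumulative, normalize)

print(cdf([1, 1, 2]))  # [0.25, 0.5, 1.0]
```

<p>Nothing inside <code>normalize</code> or <code>cumulative</code> needs to change to build <code>cdf</code>; matching interfaces alone guarantee the composite is a valid program.</p>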
<p>Back to economic models. When it comes to applications, the "right"
model is not God-given. So what does the process of modelling real-world
phenomena look like?</p>
<p>As observed by Dani Rodrik<sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">5</a></sup>, the evolution of applied models in
economics is different from the evolution of theories in physics. In
physics one theory regularly supersedes another theory. In economics,
the same rarely happens. The practice of modelling is rather about
developing new models, like new stories, that then get added to the
canon.</p>
<p>One can compare this to a library where each book stands for a model
that has been added at some point. Applied modelling then means mapping
a concrete problem into a model from the existing stock or, if
something is missing, developing a new model and adding it to the canon.</p>
<p>Inherent in this process is the positioning of a model on a specific
point in the spectrum between fables and algorithms. Models mostly take
on a fixed position on the line and will stay there. There are exogenous
factors that influence the positioning and that can change over time.
For instance, the domain matters. If you build a model of an
intergalactic trading institution, it is safe to assume that this model
will not be directly useful. Of course, this might change.</p>
<p>Like stories, certain models do get less fashionable over time, others
become prominent for a while, and a select few stay evergreens.
Economists studying financial crises in 2006 were not really standing in
the spotlight of attention. That changed radically one year later.<sup id="fnref:6" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">6</a></sup></p>
<p>Let me emphasize another aspect. I depicted applied models as packages
of internal, formal structure and interpretative maps connecting the
internals with some outside phenomenon. This interpretative mapping is
subjective. And indeed discussions in economic policy often do not focus
on the internal consistency of models but instead are more about the
adequacy of the model's mapping (and its assumptions) for the
question at hand. Ultimately, this discourse is verbal and it is
structurally not that different from deciding which story in the bible
(or piece of literature, or movie) is the best representation of a
specific decision problem.</p>
<p>The more a model leans towards the fable side, the more it will be
just one piece in a larger puzzle and the more other sources of
information a decision-maker will seek. This might include other
economic models but of course also sources outside. Different models and
other sources of information need to be integrated.</p>
<p>As a consequence, whatever power we gain through the formal model,
much of it is lost the moment we move beyond the model's inner
workings and need to compare and select between different models as well
as integrate with other sources. A synthesis at the formal level is not
feasible.</p>
<p>Let me summarize so far: A model's position on the spectrum of fable to
algorithm is mostly given. There is not much we can do to push a single
model along. Moreover, we have no systematic way of synthesizing
different models - which would be another possibility to advance along
the spectrum.</p>
<p>We have been mostly concerned with the type of output the modelling
process generates. Let's also briefly turn to the inputs. Modelling today
is by and large not that different from 50 years ago. Sure,
co-authorships have increased, computers are used, and papers circulate
online. But in the end, the modelling process is still a slow,
labor-intensive craft and demands a lot from the modeller. He or she
needs knowledge in the domain, must be familiar with the canon of
models, needs judgment to weigh the tradeoffs involved in
different models, etc.</p>
<p>This makes the modelling process costly. And it means we cannot brute
force our way to push models from fable to algorithm. In fact, in the
context of policy questions many economists like Dani Rodrik<sup id="fnref:7" role="doc-noteref"><a href="#fn:7" class="footnote" rel="footnote">7</a></sup>
criticize the fact that discussions focus on a single model whereas a
discussion would be more robust if it could be grounded in a collage of
different models. But generating an adequate model is just very
costly.<sup id="fnref:8" role="doc-noteref"><a href="#fn:8" class="footnote" rel="footnote">8</a></sup></p>
<p>Taken together, the nature of the model generating process as well as
its cost function, are bottlenecks that we need to overcome if we want
to transform the modelling process.</p>
<p>Let's go back to our (functional) programming domain to see an
alternative paradigm. Here, we are also relying on libraries. But the
process of using them is markedly different. Sure, one can simply
choose a program from a library and apply it. But one can also compose
programs and form new, more powerful programs. One can synthesize
different programs; and one can find better abstractions through the
patterns of multiple programs which do similar things. Lastly, one can
refine a program by adding details. And of course, if you consider
statistical modelling, this modularity is already present in many
software packages.</p>
<p>It is modularity which gives computing scalability. And it is this
missing modularity which severely limits the scalability of economic
modelling.</p>
<p>Consider the startup pricing example I gave before. Say, I thought about
using a pricing model to compute prices but I am lacking the demand
information. What am I supposed to do? Right now, I am most likely
forced to abandon the model altogether and choose a different framework
instead.</p>
<p>What I would like to do instead is to have my model in a modular shape
so that I could add a "demand" module and combine it with my pricing
optimization - maybe a sampling procedure or even just a heuristic. The
feature I want is that I have a coherent path from low to higher
resolution.</p>
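<p>A hedged sketch of what such a modular shape could look like - all names here are illustrative, not from any real library. The price optimizer is written against an abstract demand module, so a smooth textbook curve and a crude step function through a handful of observed points are interchangeable:</p>

```python
def best_price(demand, grid):
    """Revenue-maximizing price over a grid, for any demand module."""
    return max(grid, key=lambda p: p * demand(p))

# Module 1: a smooth linear demand curve, as in the textbook model.
def linear_demand(p):
    return max(0.0, 100 - 2 * p)

# Module 2: a step function through observed (price, quantity) points --
# the sparse data a startup actually has.
def observed_demand(observations):
    def demand(p):
        quantities = [q for (p_obs, q) in observations if p <= p_obs]
        return quantities[0] if quantities else 0.0
    return demand

grid = range(1, 51)
print(best_price(linear_demand, grid))                                    # 25
print(best_price(observed_demand([(10, 80), (20, 50), (30, 20)]), grid))  # 20
```

<p>Swapping the demand module changes the resolution of the model without touching the pricing logic - a coherent path from low to higher resolution.</p>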
<p>The goal behind our research and engineering efforts is to lift economic
modelling to this paradigm. Yet, we do not just want to compose software
packages. We want an actual composition of economic models AND the
software built on top.</p>
<h1>How to get there? Compositionality!</h1>
<p>Say, we want to turn the manual modelling process, which mostly relies
on craft, experience and judgement, into a software engineering process.
But not only that. We are aiming for a framework of synthesis in which
formal mathematical models can be composed.</p>
<p>How should we go about this? This is totally unclear! Even more, the
question does not even make sense. This is a bit like asking how to
multiply a story by Hemingway with a story by Marquez.<sup id="fnref:9" role="doc-noteref"><a href="#fn:9" class="footnote" rel="footnote">9</a></sup></p>
<p>Similarly, models in economics are independent and closed objects and
generally do not compose. It is here where the "Cat" in CyberCat comes
in. Category theory gives us a way to consider open systems and model
them by default relative to an environment. It is this feature which
allows us to even consider the composition of models - for instance the
composition of game theoretic models we developed.</p>
<p>Another central feature that is enabled through category theory is the
following paradigm:</p>
<blockquote>
<p>model == code</p>
</blockquote>
<p>That is, the formalism can be seamlessly translated back and forth
between model and an actual (software) implementation. Instead
of modelling with pen and paper, modelling itself becomes programming. It
is important to note that we do not just want to translate mathematical
models into simulations; the code actually represents
mathematical statements symbolically.</p>
<p>To summarize, category theory gives us a formal language of composable
economic models which can be directly implemented.</p>
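<p>As a toy illustration of what "model == code" can buy, here is a drastically simplified sketch: a decision stage is an ordinary value, and composing two stages yields a model of the composite interaction that is solved by the same code. This is joint maximization rather than equilibrium search, so it is emphatically not the open-games formalism itself; all names are illustrative.</p>

```python
from itertools import product

class Stage:
    """A decision stage: available actions plus a state transition."""
    def __init__(self, moves, step):
        self.moves = moves    # available actions
        self.step = step      # (state, action) -> new state

def compose2(g1, g2, utility, state):
    """Best joint play of stage g1 followed by stage g2."""
    return max(
        (((a, b), utility(g2.step(g1.step(state, a), b)))
         for a, b in product(g1.moves, g2.moves)),
        key=lambda pair: pair[1],
    )

# Two identical quantity-setting stages, composed into one model.
quantity = Stage([1, 2, 3], lambda s, a: s + a)
utility = lambda total: 10 * total - total ** 2

print(compose2(quantity, quantity, utility, 0))  # ((2, 3), 25)
```

<p>The point is that <code>compose2</code> did not have to be rewritten for the composite: the two-stage model is built from the one-stage pieces, and the same solver code runs on it.</p>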
<p>Equipped with this foundation, we can turn to the programming language
design task to turn the modelling process into a process of software
engineering.</p>
<h1>Industrial mass customization of economic models</h1>
<p>Modelling as programming enables the iterative refinement of models.
Whereas in the traditional sense, models are not only closed but also
dead wood (written on paper), under this paradigm models are more like
living objects which can be (automatically) updated over time.</p>
<p>Instead of building a library of books, in our case the models are part
of a software library, which means the overall environment becomes far
more powerful over time as the ecosystem grows.</p>
<p>Composition also means division of labor. We can build models where
parts are treated superficially at first but then details get filled in
later. This can mean more complexity but most importantly means that we
can build consistent models that are extended, refined, and updated over
time.</p>
<p>These aspects resemble similar attempts in mathematics and the use of
proof assistants and verification systems more generally. Here is
Terence Tao on these efforts<sup id="fnref:10" role="doc-noteref"><a href="#fn:10" class="footnote" rel="footnote">10</a></sup>:</p>
<blockquote>
<p>One thing that changed is the development of standard math libraries.
Lean, in particular, has this massive project called mathlib. All the
basic theorems of undergraduate mathematics, such as calculus and
topology, and so forth, have one by one been put in this library. So
people have already put in the work to get from the axioms to a
reasonably high level. And the dream is to actually get [the
libraries] to a graduate level of education. Then it will be much
easier to formalize new fields [of mathematics]. There are also
better ways to search because if you want to prove something, you have
to be able to find the things that it already has confirmed to be
true. So also the development of really smart search engines has been
a major new development.</p>
</blockquote>
<p>It also means different forms of collaboration between field experts and
across traditional boundaries. Need a financial component in that
traditional IO model? No problem, get a finance expert to write this
part - a modern pin factory equivalent. See again Terence Tao<sup id="fnref:11" role="doc-noteref"><a href="#fn:11" class="footnote" rel="footnote">11</a></sup>:</p>
<blockquote>
<p>With formalization projects, what we’ve noticed is that you can
collaborate with people who don’t understand the entire mathematics of
the entire project, but they understand one tiny little piece. It’s
like any modern device. No single person can build a computer on their
own, mine all the metals and refine them, and then create the hardware
and the software. We have all these specialists, and we have a big
logistics supply chain, and eventually we can create a smartphone or
whatever. Right now, in a mathematical collaboration, everyone has to
know pretty much all the mathematics, and that is a stumbling block,
as [Scholze] mentioned. But with these formalizations, it is
possible to compartmentalize and contribute to a project only knowing
a piece of it.</p>
</blockquote>
<p>Lastly, the current developments of ML and AI favor the setup of our
system. We can leverage the rapid development of ML and AI to improve
the tooling on both ends of the pipeline: users are supported in setting
up models, and solving and analysing models becomes easier.</p>
<p>The common thread behind all of our efforts is to boost the modelling
process. The traditional process is manual, slow, and limited by domain
expertise - in other words very expensive.</p>
<p>Our goal is to turn manual work into mass customizable production.</p>
<h1>Closing remarks</h1>
<p>What I described so far is narrowly limited to economic modelling. Where
is the "Cybernetics"?</p>
<p>First, I focused on the composability of economic models. But the
principles of the categorical approach extend beyond this domain. This
includes understanding how apparently distinct approaches share
commonality (e.g. game theory and learning) and how different structures
can be composed (building game-theoretic models on top of some underlying
structure like networks). In short, we work towards a whole "theory
stack".</p>
<p>Second, the software engineering process depicted above focuses very
narrowly on extending the economic modelling process itself. But the
same approach will mirror the theory stack with software enabling
analyses along each level.</p>
<p>Third, once we are operating software, we gain the ability to
leverage other software to support the modelling process. This follows
pragmatic needs and can range from data analytics to LLMs.</p>
<p>A general challenge to decision-making is the hyper-specialization of
expert knowledge. But as decisions are more and more interconnected,
what is lacking is the ability to synthesize this knowledge. Just
consider the decision-making of governments during the Covid pandemic.
For instance, in the decision to close schools, one cannot simply rely
on a single group of domain experts (say physicians). One needs to
synthesize the outcomes of different models following different
methodologies from different domains. We want to develop frameworks in
which these tradeoffs can be articulated.</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>Ariel Rubinstein. Economic fables. Open book publishers, 2012,
p.16 <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>I will focus on micro-economic models. They are simply closest to
my home base and relevant for my daily work. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p>The view on what economists do there is markedly different from
Rubinstein's. Prominently Al Roth: <a href="https://onlinelibrary.wiley.com/doi/abs/10.1111/1468-0262.00335">The Economist as Engineer: Game
Theory, Experimentation, and Computation as Tools for Design
Economics</a>. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:4" role="doc-endnote">
<p>And probably most importantly, functions themselves can be input
to other functions. <a href="#fnref:4" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:5" role="doc-endnote">
<p>Economics Rules: The Rights and Wrongs of The Dismal Science. New
York: W.W. Norton; 2015 <a href="#fnref:5" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:6" role="doc-endnote">
<p>Of course, the classification of practical and non-practical is
not exclusive to economics. Mathematics is full of examples of
domains that are initially seen as without any practical use and
then turned out to be important later on. <a href="#fnref:6" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:7" role="doc-endnote">
<p>Ibid. <a href="#fnref:7" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:8" role="doc-endnote">
<p>In addition, if the modelling falls to academics, then also their
incentives kick in. The chances for publishing a model on a subject
that has already been tackled by a prominent model can be very low -
in particular in the case of a null-result. <a href="#fnref:8" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:9" role="doc-endnote">
<p>We might of course come up with a way how these two stories can be
combined or compared. But this requires extra work; there is no
operation to achieve this generically. These days we might ask an
LLM to do so. And indeed this might be a useful direction for the
future to support this process. <a href="#fnref:9" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:10" role="doc-endnote">
<p>Quoted from <a href="https://www.scientificamerican.com/article/ai-will-become-mathematicians-co-pilot/">this
interview</a> <a href="#fnref:10" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:11" role="doc-endnote">
<p>Ibid. <a href="#fnref:11" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
<p>Philipp Zahn</p>

<h1>The Yoga of Contexts I</h1>
<p><em>2024-06-28</em></p>
<p>Suppose we have some category $\mathcal C$, whose morphisms are some kind of <em>processes</em> or <em>systems</em> that we care about. We would like to be able to talk about <em>contexts</em> (or <em>environments</em>) in which these processes or systems can be located.</p>
<p>This post finally writes down part of the lore of categorical cybernetics that I’ve been working out on the backburner for a few years, and that I’ve talked about in front of various audiences a few times. I never thought it was quite compelling enough to write a paper about it, but it’s been part of my bag of tricks for a while, for example playing a central role in my <a href="https://julesh.com/videos/">lecture series on compositional game theory</a>. In the meantime, similar ideas have been invented a few times in applied category theory, most notably being taken further for talking about <a href="https://arxiv.org/abs/2402.02997">quantum supermaps</a>.</p>
<h2>Contexts in a category</h2>
<p>Topologically, we draw morphisms of our category as nodes, which have a hole <em>outside</em> but no hole <em>inside</em> (that is to say they are really point-like, despite how we conventionally draw them) - and dually, we draw contexts as diagram elements that have a hole <em>inside</em> but no hole <em>outside</em>.</p>
<p><img src="/assetsPosts/2024-06-28-yoga-contexts/img1.png" alt="String diagram" /></p>
<p>Being good category theorists, we choose not to say what a context <em>is</em> but how it <em>transforms</em>, which will lead to being able to define them via additional structure we can equip our categories with. If we have a context for morphisms $X \to Y$, and we have morphisms $f : X \to X’$ and $g : Y’ \to Y$, we should be able to <em>demote</em> these morphisms into being part of an extended environment for morphisms $X’ \to Y’$:</p>
<p><img src="/assetsPosts/2024-06-28-yoga-contexts/img2.png" alt="String diagram" /></p>
<p>By asking that demoting twice gives the same result as demoting a composite, and the order of demoting on the domain and codomain doesn’t matter, we end up inventing the following definition: A <em>system of contexts</em> for a category $\mathcal C$ is a functor $\overline{\mathcal C} : \mathcal C \times \mathcal C^{\mathrm{op}} \to \mathbf{Set}$, and a context for morphisms $X \to Y$ is an element of $\overline{\mathcal C} (X, Y)$.</p>
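<p>A hedged rendering of this definition in code may help (Haskell programmers will recognize a profunctor with its arguments swapped). The concrete system of contexts below is purely illustrative: a context for ordinary functions is an input point together with a continuation consuming the output.</p>

```python
class FnContext:
    """An illustrative system of contexts for plain functions x -> y."""
    def __init__(self, point, consume):
        self.point = point        # the input the environment supplies
        self.consume = consume    # how the environment uses the output

    def demote(self, f, g):
        """Absorb f : X -> X' at the front and g : Y' -> Y at the back,
        giving a context for morphisms X' -> Y' (functoriality in
        C x C^op)."""
        return FnContext(f(self.point), lambda y: self.consume(g(y)))

    def run(self, h):
        """Place a morphism h in the hole of the context."""
        return self.consume(h(self.point))

ctx = FnContext(2, float)
ctx2 = ctx.demote(lambda x: x + 1, lambda y: 2 * y)

print(ctx2.run(lambda x: x * x))  # float(((2 + 1) ** 2) * 2) = 18.0
```

<p>Running a morphism in the demoted context agrees with pre- and post-composing it with the demoted morphisms and running that in the original context, which is exactly the compatibility the functoriality conditions encode.</p>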
<p>Things get much more interesting when $\mathcal C$ is not just a category but a symmetric monoidal category, as is virtually always the case in any applied domain. Our first guess might be to replace the functor $\overline{\mathcal C}$ with some kind of monoidal functor. <em>Lax</em> monoidal (for the cartesian monoidal product on $\mathbf{Set}$) turns out to be probably what we want - this says that if we have a context for morphisms $X \to Y$ and one for morphisms $X’ \to Y’$ we can compose them to get a context for morphisms $X \otimes X’ \to Y \otimes Y’$, but this operation is not necessarily reversible. Topologically this is a bit subtle, and says we can <em>bridge</em> 2 holes with a single morphism:</p>
<p><img src="/assetsPosts/2024-06-28-yoga-contexts/img3.png" alt="String diagram" /></p>
<p>We probably get away with this because we are assuming everything is symmetric monoidal. I sometimes think of holes as <em>anti-nodes</em> that we can slide around as though they are nodes. This part of the definition has an odd status right now: it seems that we can virtually always get it in practice, and it plays a role in the theory, but I have never actually deployed the lax monoidal structure of contexts while doing any applied work.</p>
<p>In any case, this is not enough to describe contexts in a symmetric monoidal category, so we need to go back to first principles.</p>
<h2>The yoga of contexts</h2>
<p>Suppose we have a symmetric monoidal category and we have a context for morphisms $X \otimes X’ \to Y \otimes Y’$, and suppose we have a morphism $f : X \to Y$. Similarly to before, we should be able to <em>demote</em> $f$ into the context, obtaining a context for morphisms $X’ \to Y’$:</p>
<p><img src="/assetsPosts/2024-06-28-yoga-contexts/img4.png" alt="String diagram" /></p>
<p>I wrote this definition in section 9 of <a href="https://arxiv.org/abs/1904.11287">The Game Semantics of Game Theory</a>. But it turns out this isn’t the best way to write it: it’s enough to be able to demote an identity morphism, with an operation $\overline{\mathcal C} (Z \otimes X, Z \otimes Y) \to \overline{\mathcal C} (X, Y)$:</p>
<p><img src="/assetsPosts/2024-06-28-yoga-contexts/img5.png" alt="String diagram" /></p>
<p>A category theorist would call this a (monoidal) <em>costrength</em> for $\overline{\mathcal C}$, although I find it useful to think of it as a kind of <em>tensor contraction</em>.</p>
<p>But there’s another way to think about this whole thing. Given a symmetric monoidal category $\mathcal C$, a <em>comb</em> in $\mathcal C$ is a diagram element with 1 hole on the inside and 1 hole on the outside:</p>
<p><img src="/assetsPosts/2024-06-28-yoga-contexts/img6.png" alt="String diagram" /></p>
<p>(Note, drawing them with this “comb” shape is enough because our ambient category is symmetric. In a planar setting, we would actually have to puncture a box with a hole.)</p>
<p>Concretely, a comb consists of a pair of morphisms coupled through a “residual” wire - but by drawing a box around it, we lose the ability to distinguish combs that differ by sliding a morphism between the front and back along the residual wire:</p>
<p><img src="/assetsPosts/2024-06-28-yoga-contexts/img7.png" alt="String diagram" /></p>
<p>This turns out to be exactly the definition of an <em>optic</em> in $\mathcal C$ - I think of combs as one <em>syntactic</em> presentation (among several others) of the <em>semantic</em> concept of an optic in a category. There is a category $\mathbf{Optic} (\mathcal C)$ whose objects are pairs of objects of $\mathcal C$, and whose morphisms are combs. Whereas string diagrams in $\mathcal C$ compose left-to-right, these “comb diagrams” in $\mathcal C$ compose <em>outside-in</em>, like an operad:</p>
<p><img src="/assetsPosts/2024-06-28-yoga-contexts/img8.png" alt="String diagram" /></p>
<p>We also get a symmetric monoidal product on $\mathbf{Optic} (\mathcal C)$ that encompasses what I said earlier about sliding holes around. Now we get an alternative definition of context: it’s a <em>generalised state</em> of optics. That is to say, it’s an <em>ultimate outside</em>, which can be transformed by attaching a comb to the inside of the hole:</p>
<p><img src="/assetsPosts/2024-06-28-yoga-contexts/img9.png" alt="String diagram" /></p>
<p>If we do this, the properties we had to demand of the co-strength map get absorbed into the quotient defining optics.</p>
<p>What is a “generalised state”? A <em>state</em> in a monoidal category $\mathcal C$ is a morphism from the monoidal unit, and a <em>generalised</em> state is something that transforms like a state: an element of some lax monoidal functor $\mathcal C \to \mathbf{Set}$. That is to say: if we have a generalised state $x$ of $X$ and a morphism $f : X \to Y$, we get a pushforward state $f_* (x)$; and if we have generalised states $x$ of $X$ and $y$ of $Y$, we get a state $x \otimes y$ of $X \otimes Y$.</p>
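<p>A minimal sketch of generalised states (my own example): take the lax monoidal functor sending $X$ to lists of elements of $X$, so a generalised state is a finite set of candidate values. Pushforward is mapping, and the lax structure is the cartesian product:</p>

```python
from itertools import product

def pushforward(f, xs):
    """Given f : X -> Y and a generalised state xs of X, form f_*(xs)."""
    return [f(x) for x in xs]

def tensor(xs, ys):
    """Lax monoidal structure: states of X and Y give a state of X (x) Y."""
    return [(x, y) for x, y in product(xs, ys)]

xs = [1, 2]          # a generalised state of int
ys = ['a']           # a generalised state of str
assert pushforward(lambda x: x * 10, xs) == [10, 20]
assert tensor(xs, ys) == [(1, 'a'), (2, 'a')]
```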
<p>So now we have 2 different definitions of a system of contexts: as a lax monoidal functor $\mathcal C \times \mathcal C^{\mathrm{op}} \to \mathbf{Set}$ equipped with a co-strength map, or as a lax monoidal functor $\mathbf{Optic} (\mathcal C) \to \mathbf{Set}$. Fortunately, these definitions turn out to be equivalent: it’s a dual of the <a href="https://arxiv.org/abs/2001.07488">profunctor representation theorem</a>. The normal version of this theorem says that <em>Tambara modules</em> - endo-profunctors on $\mathcal C$ equipped with a strength map - are equivalent to functors $\mathbf{Optic} (\mathcal C)^{\mathrm{op}} \to \mathbf{Set}$. It turns out that a Tambara module on $\mathcal C^{\mathrm{op}}$ is the same thing as a Tambara module, which conveniently frees up the name <em>Tambara co-module</em> to be used for this thing.</p>
<p>(A word of warning: the paper I linked defines “$\mathbf{Optic} (\mathcal C)$” to be $\mathbf{Optic} (\mathcal C)^\mathrm{op}$, which means they say $\mathbf{Optic} (\mathcal C) \to \mathbf{Set}$ when they mean $\mathbf{Optic} (\mathcal C)^{\mathrm{op}} \to \mathbf{Set}$ and vice versa.)</p>
<p>As a personal anecdote, at different points I’ve convinced myself that both of these definitions were the correct definition of “system of contexts”, before realising that they were equivalent by the profunctor representation theorem - this led to me getting some quite good, graphical intuition for this otherwise notoriously abstract theorem.</p>
<p>Some time after working out the last part of this, I learned about the existence of <a href="https://www.sciencedirect.com/science/article/pii/S0304397512000163">this paper</a> by Hermida and Tennent, which finally backed up my intuition behind my definition of generalised states by formulating a universal construction forcing them to become actual states. Incredibly this construction itself also falls squarely in the small cluster of methods we call categorical cybernetics, which caps off the whole thing very nicely. I touched on this construction in <a href="https://cybercat.institute/2024/02/22/iteration-optics/">this blog post</a>, and perhaps I’ll have more to say about it later too.</p>
<h2>Conclusion</h2>
<p>Often we don’t need generalised states, and ordinary states are enough: that’s when we take the representable functor $\mathcal C (I, -) : \mathcal C \to \mathbf{Set}$, which is indeed lax monoidal. (General representable functors on a monoidal category are <em>not</em> lax monoidal in general!)</p>
<p>This leads to what I call the “representable system of contexts” for a symmetric monoidal category $\mathcal C$: it’s the one described by $\mathbf{Optic} (\mathcal C) (I, -)$, where the monoidal unit of $\mathbf{Optic} (\mathcal C)$ is $(I, I)$. What this ends up saying is that a context for morphisms $X \to Y$ in $\mathcal C$ is an equivalence class of pairs of a state and a costate in $\mathcal C$, coupled through a residual:</p>
<p><img src="/assetsPosts/2024-06-28-yoga-contexts/img10.png" alt="String diagram" /></p>
<p>This turns out (in a non-trivial way) to be equivalent to the definition of context used for both <a href="https://arxiv.org/abs/1603.04641">deterministic</a> and <a href="https://compositionality-journal.org/papers/compositionality-5-9/">Bayesian</a> open games. In those cases, $\mathcal C$ is itself a category of optics, making systems of contexts examples of <em>double optics</em>. Iterating the $\mathbf{Optic} (-)$ construction can be usefully depicted in 2 different ways: as 1-hole combs in a bidirectional category:</p>
<p><img src="/assetsPosts/2024-06-28-yoga-contexts/img11.png" alt="String diagram" /></p>
<p>or as 3-hole combs:</p>
<p><img src="/assetsPosts/2024-06-28-yoga-contexts/img12.png" alt="String diagram" /></p>
<p>Moving back and forth between these equivalent views of the iterated optic construction is a key part of the yoga of contexts as it applies to categorical cybernetics.</p>
<p>An example of a non-representable system of contexts is the “iteration functor” I talked about in <a href="https://cybercat.institute/2024/02/22/iteration-optics/">this post</a>. It’s closely related to the <em>algebra of Moore machines</em> which plays a major role in David Jaz Myers’ book on <a href="http://davidjaz.com/Papers/DynamicalBook.pdf">categorical systems theory</a>.</p>
<p>But, the actual reason this is a blog post and not a paper is that I don’t have any really compelling examples outside of categorical cybernetics. But I’ll talk more about my struggles with that in part II, where I’ll build a category of “behaviours in context” given a system of contexts, generalising the construction of open games.</p>Jules HedgesSuppose we have some category, whose morphisms are some kind of processes or systems that we care about. We would like to be able to talk about contexts (or environments) in which these processes or systems can be located.Reinforcement Learning through the Lens of Categorical Cybernetics2024-05-29T00:00:00+00:002024-05-29T00:00:00+00:00https://cybercat-institute.github.io//2024/05/29/reinforcement-learning-in-cat-cyb<p>Cross-posted from <a href="https://riurodsak.github.io/posts/2024/05/rl_cat_cyb/">Riu’s blog</a>.</p>
<p>In modelling disciplines, one often faces the challenge of balancing three conflicting aspects: representational elegance, the breadth of examples to capture, and the depth or specificity in capturing those examples of interest.
In the context of reinforcement learning theory, this raises the question: what is an adequate ontology for the techniques involved in agents learning from interaction with an environment?</p>
<p>Here we make a structural approach to the above dilemma, both in the sense of structural realism and <a href="https://ncatlab.org/nlab/show/stuff%2C+structure%2C+property">stuff, structure, property</a>.
The characteristics of RL algorithms that we capture are their modularity and specification via typed interfaces.</p>
<p>To keep this exposition grounded in something practical, we will follow an example, <a href="https://en.wikipedia.org/wiki/Q-learning">Q-learning</a>, which from this point of view captures the essence of reinforcement learning.
It is an algorithm that finds an optimal policy in an MDP by keeping an estimate of the value of taking a certain action in a certain state, encoded as a table $Q:S\times A\to R$, and updating it from previous estimates (<em>bootstrapping</em>) and from samples obtained by interacting with an environment.
This is the content of the following equation (we’ll give the precise type for it later):</p>
\[\begin{equation}
Q(s,a) \gets (1-\alpha) Q(s,a) + \alpha [r + \gamma \max_{a':A}Q(s',a') ]
\end{equation}\]
<p>One also has a policy derived from the $Q$ table, usually an $\varepsilon$-greedy policy that, for a state $s$, selects with probability $1-\varepsilon$ the action maximizing the estimated value, $\arg\max_{a:A}Q(s,a)$, and a uniformly sampled action with probability $\varepsilon$.
This choice helps to manage the exploration-exploitation trade-off.</p>
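<p>As a concrete sketch (mine, with made-up state and action spaces and hyperparameters), the $\varepsilon$-greedy policy and the tabular update of equation (1) can be written as:</p>

```python
import random

def epsilon_greedy(Q, s, actions, eps=0.1):
    """With probability 1 - eps pick an action maximizing Q[s, a],
    otherwise a uniformly sampled action."""
    if random.random() < eps:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(s, a)])

def q_update(Q, s, a, r, s_next, actions, alpha=0.5, gamma=0.9):
    """Pointwise Q-learning update at the visited pair (s, a)."""
    target = r + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * target
    return Q

# Tiny example: 2 states, 2 actions, Q initialised to zero.
actions = [0, 1]
Q = {(s, a): 0.0 for s in (0, 1) for a in actions}
q_update(Q, s=0, a=1, r=1.0, s_next=1, actions=actions)
assert Q[(0, 1)] == 0.5      # (1 - 0.5) * 0 + 0.5 * (1 + 0.9 * 0)
assert epsilon_greedy(Q, 0, actions, eps=0.0) == 1
```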
<p>Ablating either component produces other existing algorithms, which is reassuring:</p>
<ul>
<li>If we remove the bootstrapping component, one recovers a (model-free) one-step Monte Carlo algorithm.</li>
<li>If we remove the samples, one recovers classical Dynamic Programming methods<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup> such as Value Iteration. We’ll come back to these sample-free algorithms <a href="#continuations-function-spaces-and-lenses-in-lenses">later</a>.</li>
</ul>
<h1>The RL lens</h1>
<p>Q-learning as we’ve just described, and other major RL algorithms, can be captured as lenses; the forward map is the policy deployment from the model’s parameters, and the backward map is the update function.</p>
<p><img src="/assetsPosts/2024-05-28-reinforcement-learning-in-cat-cyb/generic_model.png" alt="Generic model lens" /></p>
<p>The interface types vary from algorithm to algorithm.
In the case of Q-learning, the forward map $P$ is of type $R^{S\times A}\to (DA)^S$ (where $D$ is the distribution monad). It takes the current $Q$-table $Q:S\times A\to R$ and outputs a policy $S\to DA$. This is our $\varepsilon$-greedy policy defined earlier.
The backward map $G$ has the following type (we define $\tilde{Q}$ in (2)):</p>
\[\begin{align*}
R^{S\times A}\times (S\times A\times R\times S) &\to T_{(s,a)}^{*}(S\times A) \newline
Q, (s,a,r,s') &\mapsto \tilde{Q}
\end{align*}\]
<p><img src="/assetsPosts/2024-05-28-reinforcement-learning-in-cat-cyb/Q_learning_model.png" alt="Q-learning model lens" /></p>
<p>The type of model parameter change $\Delta(R^{S\times A})=T_{(s,a)}^{*}(S\times A)$ has as elements cotangent vectors to the base space $S\times A$ (not to $R^{S\times A}$).
This technicality allows us to define the pointwise update of equation (1) as $((s,a),g)$, where $g=(r + \gamma\max_{a’:A}Q(s’,a’))\in R$ is the <em>update target</em> of our model.
The new $Q$ function then is defined as:</p>
\[\begin{equation}
\tilde{Q}(\tilde{s},\tilde{a}) = \begin{cases}
(1-\alpha)Q(s,a) + \alpha [r + \gamma \max_{a'} Q(s',a')] & (\tilde{s},\tilde{a})=(s,a) \newline
Q(s,a) & o/w
\end{cases} \end{equation}\]
<p>The quotes in the diagram above reflect that showing the $S$ and $A$ wires below explicitly loses the dependency of the type $R^{S\times A}$ on them.
This is why in the paper we prefer to write the backward map as a single box $G$ with inputs $R^{S\times A}\times (S\times A\times R)$ and output $T_{(s,a)}^{*}(S\times A)$.</p>
<h1>From Q-learning to Deep Q-networks</h1>
<p>Writing the change type as a cotangent space allows us to bridge the gap to Deep Learning methods.
In our running example, we can do the standard transformation of the Bellman update to a Bellman error to decompose $G$ into two consecutive steps:</p>
<ul>
<li>
<p>Backward map:</p>
\[\begin{align*}
G:R^{S\times A} \times (S\times A\times R\times S') &\to S\times A\times R \newline
Q, (s,a,r,s') &\mapsto (s,a,\mathcal{L})
\end{align*}\]
<p>The loss $\mathcal{L}$ is defined as the MSE between the current $Q$-value and the update target $g$:</p>
\[\mathcal{L} = \left(Q(s,a) - g\right)^2 = \left(Q(s,a) - (r + \gamma \max_{a'} \bar{Q}(s',a')) \right)^2\]
<p>We treat $\bar{Q}(s’,a’)$ ($Q$ bar) as a constant value, so that the (semi-)gradient of $\mathcal{L}$ wrt. the $Q$-matrix <em>is</em> the Bellman Q-update, as we show next.</p>
</li>
<li>
<p>Feedback unit (Bellman update):</p>
\[\begin{align*}
(1+S\times A\times R)\times R^{S\times A} \to& R^{S\times A} \newline
*, Q \mapsto& Q \newline
(s,a,\mathcal{L}), Q \mapsto& \tilde{Q} \newline
=& Q - {\alpha\over 2}{\partial\mathcal{L}\over\partial Q} \newline
=& \forall (\tilde{s},\tilde{a}). \begin{cases}
Q(s,a) - \alpha(Q(s,a) - g) & (\tilde{s},\tilde{a}) = (s,a) \newline
Q(s,a) & o/w
\end{cases} \newline
=& \forall (\tilde{s},\tilde{a}).\begin{cases}
(1-\alpha) Q(s,a) + \alpha g & (\tilde{s},\tilde{a}) = (s,a) \newline
Q(s,a) & o/w
\end{cases}
\end{align*}\]
<p>This recovers (2), so we can say that the backwards map is doing <em>pointwise</em> gradient descent, by only updating the $(s,a)$ indexed $Q$-value.</p>
</li>
</ul>
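<p>A quick numerical check of this claim (a sketch with invented numbers): the semi-gradient step $Q \gets Q - \frac{\alpha}{2}\frac{\partial\mathcal L}{\partial Q}$ on $\mathcal L = (Q(s,a) - g)^2$ agrees with the Bellman update $(1-\alpha)Q(s,a) + \alpha g$ at the visited entry:</p>

```python
alpha, gamma = 0.5, 0.9

# A small Q-table and one sampled transition (s, a, r, s').
Q = {(0, 0): 0.2, (0, 1): 0.4, (1, 0): 0.1, (1, 1): 0.3}
s, a, r, s_next = 0, 1, 1.0, 1

# Update target g, treating Q-bar as a constant (no gradient through it).
g = r + gamma * max(Q[(s_next, b)] for b in (0, 1))

# Semi-gradient of L = (Q(s,a) - g)^2 with respect to the (s, a) entry.
dL_dQ = 2 * (Q[(s, a)] - g)

# Gradient step Q <- Q - (alpha / 2) * dL/dQ ...
grad_step = Q[(s, a)] - (alpha / 2) * dL_dQ
# ... coincides with the Bellman update (1 - alpha) Q(s,a) + alpha g.
bellman = (1 - alpha) * Q[(s, a)] + alpha * g
assert abs(grad_step - bellman) < 1e-9
```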
<h1>Continuations, function spaces, and lenses in lenses</h1>
<p>Focusing now on sample-free algorithms, the model’s parameter update is an operator $(X\to R)\to (X\to R)$ between function spaces.
State value methods for example update value functions $S\to R$, whereas state-action value methods update functions $S\times A\to R$ (the $Q$-functions).
More concretely, the updates of function spaces that appear in RL are known as Bellman operators.
It turns out that a certain subclass which we call <em>linear Bellman operators</em> can be obtained functorially from lenses as well!</p>
<p>The idea is to employ the continuation functor<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup> which is the following representable functor:</p>
\[\begin{align*}
\mathbb{K} =\mathbf{Lens}(-,1) : \mathbf{Lens}^\mathrm{op} &\to \mathbf{Set} \newline
{X\choose R} &\mapsto R^X \newline
{X\choose R}\rightleftarrows {X'\choose R'} &\mapsto R'^{X'} \to R^X
\end{align*}\]
<p>The contravariance hints already at the corecursive nature of these operators:
They take as input a value function of states <em>in the future</em>, and return a value function of states <em>in the present</em>.
The subclass of Bellman operators that we obtain this way is linear in the sense that it uses the domain function in $R’^{X’}$ only once.</p>
<p>An example of this is the value improvement operator from dynamic programming.
This operator improves the value function $V:S\to R$ to give a better approximation of the long-term value of a policy $\pi:S\to A$, and is given by</p>
\[V(s) \gets \mathbb{E}_{\mkern-14mu\substack{a\sim \pi(s)\newline (s',r)\sim t(s,a)}}[r+\gamma V(s')] = \sum _{a\in A}\pi(a\mid s) \sum _{\substack{s'\in S\newline r\in R}}t(s',r\mid s, a) (r + \gamma V(s'))\]
<p>This is the image under $\mathbb{K}$ of a lens <sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup> whose forward and backward maps are the transition function $\mathrm{pr}_1(t(-,\pi(-))):S \to S$ under a fixed policy $\pi:S\to A$, and the update target computation $(-)+\gamma\cdot(=):\mathbb{R}\times \mathbb{R}\to \mathbb{R}$ respectively, as shown below.</p>
<p><img src="/assetsPosts/2024-05-28-reinforcement-learning-in-cat-cyb/VI_lens_to_set.png" alt="Value Improvement lens to function" /></p>
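<p>A deterministic instance (my own toy chain MDP, with an invented transition and reward): the forward map steps the state under a fixed policy, the backward map forms the update target $r + \gamma v$, and $\mathbb{K}$ glues them into an operator on value functions:</p>

```python
gamma = 0.9

# Toy deterministic MDP under a fixed policy: from state s we move to
# step[s] and receive reward rew[s]; state 2 is absorbing.
step = {0: 1, 1: 2, 2: 2}
rew = {0: 0.0, 1: 1.0, 2: 0.0}

def value_improvement(V):
    """K applied to the lens (forward: s |-> step[s], backward: r + gamma*v):
    turns a value function 'in the future' into one 'in the present'."""
    return {s: rew[s] + gamma * V[step[s]] for s in step}

# Iterating the operator converges to the value of the fixed policy.
V = {s: 0.0 for s in step}
for _ in range(100):
    V = value_improvement(V)

assert abs(V[1] - 1.0) < 1e-9    # one step from the reward
assert abs(V[0] - 0.9) < 1e-9    # the same reward, discounted once
```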
<p>If you want to read more about this “optics perspective” on Value Iteration and its relation with problems like the control of an inverted pendulum, the savings problem in economics and more, check out our previous <a href="https://arxiv.org/abs/2206.04547">ACT2022 paper</a>.</p>
<p>Once we have transformed the Bellman operator into a function using $\mathbb{K}$, this embeds into the backward map of the RL lens.</p>
<p><img src="/assetsPosts/2024-05-28-reinforcement-learning-in-cat-cyb/Bellman_embedding_into_lens.png" alt="Embedding of the Bellman operator into the backward pass of the RL lens" /></p>
<p>It is then natural to ask what a backward map that does not ignore the sample input might look like, and these are what we call <em>parametrised</em> Bellman operators.
These are obtained by lifting $\mathbb{K}$ to the (externally parametrised) functor $\mathrm{Para}(\mathbb{K}):\mathrm{Para}(\mathrm{Lens}^\mathrm{op})\to\mathrm{Set}$, and capture exactly what algorithms like <a href="https://en.wikipedia.org/wiki/State%E2%80%93action%E2%80%93reward%E2%80%93state%E2%80%93action">SARSA</a> are doing in terms of usage of both bootstrapping and sampling.</p>
<h1>Outlook</h1>
<p>We talked about learning from bootstrapping and from sampling as two distinct processes that fit into the lens structure. While the difference between these two is usually not emphasized enough, we believe that it is useful for understanding the structure of novel algorithms by making the information flow explicit.
You can find more details, along with a discussion on Bellman operators, the <a href="https://cybercat.institute/2024/02/22/iteration-optics/">iteration functor</a> used to model stateful environments, prediction and bandit problems as nice corner cases of our framework, and more on our recent <a href="https://arxiv.org/abs/2404.02688">submission</a> to ACT2024.</p>
<p>Moreover, this opens up the study of stateful models: multi-step methods like $n$-temporal difference or Monte Carlo Tree Search (used e.g. in AlphaZero), which we will leave for a future post, so stay tuned!</p>
<hr />
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>This is sometimes denoted “offline” RL, but one should note that offline methods include learning from a constant dataset and learning by updating one’s estimates only at the end of episodes too. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>In general, the continuation functor is defined for any optic as $\mathbb{K}=\mathbf{Optic}(\mathcal{C})(-,I):\mathbf{Optic}(\mathcal{C})^\mathrm{op}\to\mathbf{Set}$, represented by the monoidal unit $I$. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p>Ok I lied here a little: To be precise, the equation shown arises as the continuation of a <em>mixed</em> optic, where the forwards category is $\mathrm{Kl}(D)$ for a probability monad $D$, and the backwards category is $\mathrm{EM}(D)$. The value improvement operator that arises from the continuation of a lens is a deterministic version of this, where there’s no expectation taken in the backwards pass because we fix the policy and the transition functions to be deterministic. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>Riu Rodriguez SakamotoThis is an overview of the 'RL lens', a construction that we recently introduced to understand some reinforcement learning algorithms like Q-learningThe Blurry Boundary between Economics and Operations Research2024-05-17T00:00:00+00:002024-05-17T00:00:00+00:00https://cybercat-institute.github.io//2024/05/17/economics-operations-research<p>Cross-posted from <a href="https://econpatterns.substack.com/p/machine-replacement-machine-scheduling">Oliver’s EconPatterns blog</a></p>
<p>The <a href="https://econpatterns.substack.com/p/designing-economic-mechanisms-the">last EconPatterns post</a> traced the history of economic design, focusing on the operations research group at Stanford’s business school and its role in developing auction design and market design. In this post I want to take this a bit further and describe the overlapping roles of operations research and economic design in more detail, anchoring on typical “operations research” domains, and how they quickly cross over into “economic” domains.</p>
<p>But a short definition of terms first: operations research is an applied branch of mathematics, mostly focusing on optimization (or “programming” in the original sense of linear programming, dynamic programming, combinatorial programming, etc.): orchestrating inputs to optimize (minimize or maximize) an output in the form of an objective variable.</p>
<p>The canonical result is a constrained programming or optimization problem expressed as one objective function and any number of inequalities expressing constraints or limits.</p>
<p>In the <a href="https://cybercat.institute/2024/03/08/stocks-flows-transformations/">first post</a> on the cybernetic economy I already stressed the role of these limits: available stocks can run out or warehouses can overflow, machines can only transform so many pieces in an hour, and pipelines, roads, and conveyor belts can reach their capacity and get congested.</p>
<p>“Orchestration” means putting tasks into their correct order, balancing loads and flows, minimizing stocks without risking stockouts, avoiding congestions, disruptions, or volatility in the flow of goods, people, information, targeting objectives like fastest time to completion, minimal slack times, or lowest cost of inventory.</p>
<p>Operations research is the formal mathematical tool used by industrial engineers to design production plants, supply chains, workforce deployment plans, transport schedules and sundry other things that require juggling many parts under tight and often volatile conditions.</p>
<p>Even if it’s steeped in industrial (and military) lore, it’s also used in areas like microchip design, financial engineering, and all over the place in the digital economy. It’s pretty much everywhere in those parts of the economy that typically remain invisible to the casual observer: the engine room of a modern economy.</p>
<p>The hallmark of operations research is that it’s set up to serve one principal, focusing mostly on operations within an organization. This distinguishes it from economics proper, which focuses on exchange between and the resulting tension in objectives, motivations, and desires of multiple principals.</p>
<p>Operations research cuts over to (mathematical) economics at the same juncture where decision theory crosses over to game theory: when the diverging interests of the participants move to the forefront of the analysis.</p>
<p>EconPatterns deliberately straddles the boundary between the disciplines for a number of reasons: operations research has much closer ties to both computer science and industrial design, offers a much richer toolset to aggregate and disaggregate processes within a hierarchical structure, has a closer connection between theory and practice, is a much better design paradigm to model complex longitudinal interactions with many specialized components, and ultimately has more tangible and straightforward objectives, typically those that can be measured with a stopwatch or a yardstick, rather than abstractions such as the idea of an equilibrium as a stable state where conflicts are resolved.</p>
<p>On the other hand operations research works from a paradigm of central planning, a paradigm that is losing analytical heft the more the connected process under scrutiny — the value chain — involves interaction, goal and resource conflicts between principals rather than between machines, tools, parts, information, and labor.</p>
<p>So roughly speaking, as soon as the tension between principals becomes the driving factor, we cross over to economics. As soon as the need to concatenate activities or to disaggregate higher-level processes into tasks and subtasks dominates, we’ll lean more on operations research.</p>
<p>But the core message is still that, from the EconPatterns vantage point, where the value chain is the analytical starting point for any design endeavor, all but the most trivial value chains have multiple crossings not only between machines but also between organizations, jurisdictions, and even belief systems, and that, because not only efficiency but also accountability is relevant to the integrity of that value chain, the formal aspects of economic design will inevitably sit on the cusp between the disciplines.</p>
<p>Let’s put this to use in two examples.</p>
<h2>Machine replacement and coordinated machine replacement</h2>
<p>Machine replacement is one of the core problems in industrial engineering. In its simplest form, it means finding the ideal time of putting an existing machine or process out of use and replacing it with another, presumably superior one.</p>
<p>The calculus, easy enough for bachelor-level exams, requires comparing the cost of the new machine (minus scrap value of the old machine) to the performance differential, most likely in a net present value calculation. If the performance benefit is higher than the cost of replacement, it’s a go.</p>
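<p>A back-of-the-envelope version of that exam question (a sketch; all numbers, the horizon, and the discount rate are invented):</p>

```python
def replace_now(cost_new, scrap_old, annual_gain, years, rate):
    """Replace if the net present value of the performance differential
    exceeds the net cost of the new machine (price minus scrap value)."""
    npv_gain = sum(annual_gain / (1 + rate) ** t for t in range(1, years + 1))
    return npv_gain > cost_new - scrap_old

# New machine: costs 100, the old one scraps for 10, and the replacement
# saves 25 per year for 5 years at a 7% discount rate.
decision = replace_now(cost_new=100, scrap_old=10, annual_gain=25,
                       years=5, rate=0.07)
assert decision is True      # NPV of gains (about 102.5) > net cost 90
```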
<p>From this starting point we can make the problem arbitrarily complex. What if the performances of the machines are not constant over time? What if the old one becomes gradually less efficient, with lower throughput and more frequent outages, or the new machine needs time to ramp up? What if the replacement itself doesn’t only include the purchasing cost of the new machine but also a work stoppage? What if the machine is part of a production facility? Does the whole production line have to be closed down, or are other similar machines on a different line able to take on the shortfall? What if the new machine is only able to perform better in conjunction with other replacements? What if uncertainty is involved?</p>
<p>Even if it’s useful to think of industrial engineering in terms of real industrial machines for milling, turning, or drilling, in an industrial machine shop, these “machines” could be pretty much everything. If a bank considers a new process for checking creditworthiness or if a college department contemplates restructuring its degree curriculum, they encounter similar planning and orchestrating problems. The introduction of video assist in sports is an example of machine replacement.</p>
<p>Today, in most cases the “machine” is simply a computer. More abstractly, a “machine” is simply any workplace where a defined transformation is taking place.</p>
<p>All of this happens, if nothing goes wrong, according to a meticulously planned program, and if something goes wrong, hopefully according to a meticulously crafted contingency plan — the hallmark of central planning.</p>
<p>If we remove that requirement, and allow two new stakeholders in — competitors and customers — we get something we might call coordinated machine replacement, or in a more succinct and better known term: innovation.</p>
<p>Innovation in its most technical definition is the increase in <a href="https://en.wikipedia.org/wiki/Total_factor_productivity">total factor productivity</a>, or aggregate outputs produced by aggregate input factors (in economics, famously labor, capital, and soil, but I’ll devote another post to that). In other words: the collective replacement of machines, processes, activities, to make resource use more efficient, in a (more or less) competitive economy.</p>
<p>In a “textbook” model of the economy, where firms are seen as singular and solitary production functions, replacement happens by Schumpeterian competition: companies which improve their efficiency by optimizing their production will gain a competitive advantage which lets them capture value, “Schumpeterian rents” in the innovation economics nomenclature, for as long as that efficiency advantage persists.</p>
<p>This economic pressure (technologically disadvantaged competitors see their margins evaporate until they either catch up or give up and leave the industry) is the driver of economic innovation and, in turn, of economic growth.</p>
<p>So much for the textbook treatment.</p>
<p>With the era of Henry Ford shutting down production for five months to replace the Model T with the Model A decisively over, the problem of keeping interdependencies uninterrupted while interrupting a single step in a complex value chain moves to the forefront.</p>
<p>Research and development for new car generations now starts long before the existing car generation gets taken off the market. There is no more reason to lay off workers, cancel orders for parts, keep dealerships waiting for new vehicles, or hope that customers are willing to wait a few months rather than wandering off to the competition.</p>
<p>Some of this still falls under a competitive bracket. Laid-off workers and stranded dealers might also defect to the competition. In other cases, outright coordination might become necessary, such as in the adoption of shared technology standards or auditing rules. Value chains might become reintegrated, as when electric vehicle manufacturers, recognizing that market competition does not supply enough charging stations, reluctantly enter the market for charging infrastructure.</p>
<p>The less we think about technological disruption of a value chain as a purely competitive event of isolated actors, the more we need to reach into the toolbox of operations research methods.</p>
<h2>Machine scheduling and coordinated machine scheduling</h2>
<p>Machine scheduling is at the heart of operations research, and even if one of its synonyms, “<a href="https://en.wikipedia.org/wiki/Job-shop_scheduling">job shop scheduling</a>”, betrays its origin on the shop floors of the industrial era, it’s still at the heart of most algorithmic processes that try to direct inputs toward productive outputs.</p>
<p>The underlying idea is that jobs have to be allocated to machines on which they can be performed. In its simplest form, these jobs consist of a sequence of steps, similar to Adam Smith’s pin factory, where a prior step has to be finished before a subsequent job can be started.</p>
<p>This setup can be made more complicated in many ways. Machines (and their operators) might be specialized to perform only certain tasks. Jobs might require setup times which either have to wait until prior jobs are finished or can be started while the prior job is still running. Uncertainty can come into play in many ways.</p>
<p>The objective is typically to minimize time to completion, maximize machine utilization, or some related measure.</p>
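<p>To give a feel for the flavor of these problems (a toy sketch with invented job lengths), here is the classic greedy list-scheduling heuristic, which hands each job to the currently least-loaded machine to keep the time to completion down:</p>

```python
import heapq

def list_schedule(job_lengths, n_machines):
    """Greedy list scheduling: assign each job to the least-loaded machine.
    Returns the makespan (completion time of the last job)."""
    loads = [0.0] * n_machines
    heapq.heapify(loads)
    for length in job_lengths:
        lightest = heapq.heappop(loads)
        heapq.heappush(loads, lightest + length)
    return max(loads)

# Five jobs on two machines: greedy gives makespan 7, while the optimum
# is 6 (3+3 on one machine, 2+2+2 on the other) - a heuristic, not exact.
assert list_schedule([3, 3, 2, 2, 2], n_machines=2) == 7.0
```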
<p>Machine scheduling has successfully crossed over from the shop floor to the digital economy, especially when it comes to platform operations where the “machines” can be vehicles: taxicabs, scooters, coaches, and the “jobs” can be passengers trying to get from A to B in a timely, cheap, and secure manner.</p>
<p>This is again a scenario where the worlds of economics and operations research intersect. We can think of a platform as a central conductor trying to move people from A to B, which inevitably requires operations research knowledge, but we also have passengers (and in some cases, drivers) as participants with diverging interests, which requires economic and especially game theoretic knowledge.</p>
<p>The boundary is blurry and the scale might tip whenever we realize that we’re better off assigning a modicum of autonomy to the many interlocking parts, that the machines might find a better solution if we let them compete for scarce resources and avoid congestion rather than insisting that coordination requires central control.</p>
<p>But it also helps to think of operations research as the discipline that operates bottom-up, assembling economic engines from universal elementary operations, while economics tends to operate top-down, from a highly aggregated macroeconomic perspective down to individual microperspectives. It also helps to think of operations research as the discipline that moved from the shop floors to academia, while economics is still trying to move in the opposite direction.</p>
<p><img src="/assetsPosts/2024-05-17-economics-operations-research/img1.webp" alt="Trees" /></p>
<h2>The blurry boundary between economics and operations research</h2>
<p>Design is ultimately about breaking complex problems down into their constituent parts, solving them in isolation and reassembling them in the hope that the partial solutions fit together. This requires, almost regardless of the application domain, that we start with a rough outline of the potential solution and decide step by step which partial problems require particular attention to detail.</p>
<p>This can be done in a methodical or in a haphazard fashion. In particular, the opposing risks of not enough attention to detail or too much attention to detail loom large over failed design projects. This is certainly not restricted to economic design, but economics as a discipline suffers from a lack of conceptual rigor and increasingly an overflow of formal rigor.</p>
<p>This isn’t only the case for the part of the design process where we go from a “rough outline” (a conceptual understanding of the overall problem) to a fully fleshed out formal model, but also, once we understand that we need to apply a formal toolset, a lack of understanding which toolset applies to the problem at hand.</p>
<p>In this post, we’re in the latter part of that process. Both operations research and mathematical economics are highly formalized frameworks which share a common history in the evolution of constrained optimization but which for at least two generations (roughly from the inception of the Econ Nobel and the deliberate choice by the Nobel committee to reward the economists but not the operations researchers working on the same problem) barely talked to each other.</p>
<p>Over the last ten years or so, we’ve seen a gradual rapprochement between the disciplines, in large part because the new players of the digital economy started to realize that their machinery is often economic in nature — auctions, matching markets, information and risk aggregators — even if they deal in abstract information goods rather than in physical objects assembled on the shop floors of the industrial economy.</p>
<p>In the process they’ve also recognized that the academic paper exercises which constitute the main output of modern economics aren’t sufficient to assemble production-ready economic engines. For this you also need scalability, modularity, interoperability, and an understanding of human interaction that bows as much to drab realism as it does to formal aesthetics.</p>
<p>To offer a simple example: in the push to succeed in the global coordinated machine replacement problem known as the transition from fossil to renewable sources of energy, we can’t just assume that we’re fine and markets clear if aggregate supply matches aggregate demand.</p>
<p>We also have to take into account that energy is rarely ever produced where and when it’s needed, neither in place nor in time. So we have to apply a model of an energy economy that pays attention to stocks and flows — in other words, a <a href="https://cybercat.institute/2024/03/08/stocks-flows-transformations/">cybernetic economy</a>.</p>
<p><em>Oliver Beige: In which we bring back together the estranged fraternal disciplines of economics and operations research and map out how we can combine them to design cybernetic economies.</em></p>
<h1>Exploring best response dynamics</h1>
<p><em><a href="https://cybercat-institute.github.io//2024/05/09/exploiring-best-response-dynamics">2024-05-09</a></em></p>
<p>I have been playing around with Nash equilibrium search in random normal form games by following best response dynamics - partly because I was curious, and partly as a learning exercise in NumPy.</p>
<p>Here are my preliminary conclusions:</p>
<ul>
<li>The key to success is replacing argmax with softmax for a sufficiently high temperature. For low temperatures, nothing I could do would make it converge.</li>
<li>Provided the temperature is high enough, the learning rate is irrelevant, with a learning rate of 1 (i.e. literally overriding each strategy with its softmax best response every stage) converging in just a few steps.</li>
<li>Normal form games are big. Like, really big. For 9 players and 9 moves, the payoff matrix no longer fits in my VRAM. For 10 players and 10 moves (which I still consider absolutely tiny) the payoff matrix contains exactly $10^{11}$ = 100 billion parameters, making it large by the standards of an LLM at the time of writing.</li>
</ul>
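<p>The size claim is easy to sanity-check (my arithmetic, not code from the post): the payoff tensor for $N$ players with $M$ moves each has $M^N \cdot N$ entries.</p>

```python
# Sanity check of the size claim: the payoff tensor has M**N * N entries.
def payoff_tensor_entries(num_players, num_moves):
    return num_moves ** num_players * num_players

print(payoff_tensor_entries(10, 10))  # -> 100000000000, i.e. 10**11
# At 8 bytes per float64 entry that is ~800 GB; even the 9-player,
# 9-move case (9**10 entries) needs roughly 28 GB, beyond most GPUs' VRAM.
print(payoff_tensor_entries(9, 9) * 8 / 1e9)
```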
<p>My understanding of the theory is that this type of iterative method will not in general converge to a Nash equilibrium, but it will converge to an $\varepsilon$-equilibrium for <em>some</em> $\varepsilon$. What I don’t know is how the error $\varepsilon$ can depend on the game and the learning algorithm. That’s something I’ll look into in some follow-up work, presumably by comparing my results to what <a href="https://nashpy.readthedocs.io/en/stable/">NashPy</a> finds.</p>
<h2>Payoff tensors</h2>
<p>For a normal form game with $N$ players and $M$ moves, the payoff matrix is a tensor of rank $N + 1$ with $M^N \cdot N$ entries. Each player gets one tensor rank of dimension $M$ for their move, and then there is one more rank of dimension $N$ to assign a payoff to each player.</p>
<p>Here is my incantation for initialising a random payoff tensor, with payoffs drawn uniformly between 0 and 1:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">gen</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="nf">default_rng</span><span class="p">()</span>
<span class="n">shape</span> <span class="o">=</span> <span class="nf">tuple</span><span class="p">(</span><span class="n">numMoves</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nf">range</span><span class="p">(</span><span class="n">numPlayers</span><span class="p">))</span> <span class="o">+</span> <span class="p">(</span><span class="n">numPlayers</span><span class="p">,)</span>
<span class="n">payoffMatrix</span> <span class="o">=</span> <span class="n">gen</span><span class="p">.</span><span class="nf">random</span><span class="p">(</span><span class="n">shape</span><span class="p">)</span>
</code></pre></div></div>
<p>It turns out that with this distribution of payoffs, the law of large numbers kicks in and payoffs for any mixed strategy profile are extremely close to 0.5, and the more players there are the closer to 0.5 they are. Normal form games are defined up to arbitrary positive affine transformations of the payoffs, so I ended up going with a sort-of exponential distribution of payoffs, so that much higher payoffs could sometimes happen. This made very little difference but made me feel happier:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">payoffMatrix</span> <span class="o">=</span> <span class="n">gen</span><span class="p">.</span><span class="nf">exponential</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="n">shape</span><span class="p">)</span> <span class="o">*</span> <span class="n">gen</span><span class="p">.</span><span class="nf">choice</span><span class="p">((</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">),</span> <span class="n">shape</span><span class="p">)</span>
</code></pre></div></div>
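<p>The concentration effect mentioned above is easy to see directly. Here is a quick check (my addition; the function name is mine) that under uniform payoffs, the payoff of the uniform mixed profile piles up ever more tightly around 0.5 as the game grows:</p>

```python
# Demonstration of the law-of-large-numbers effect described above:
# under Uniform(0, 1) payoffs, a player's expected payoff at the uniform
# mixed profile is a mean of numMoves**numPlayers entries, so it
# concentrates around 0.5 as the number of players grows.
import numpy as np

def uniform_profile_deviation(numPlayers, numMoves, seed=0):
    gen = np.random.default_rng(seed)
    payoffMatrix = gen.random((numMoves,) * numPlayers + (numPlayers,))
    # The uniform mixed profile just averages over every move axis at once.
    payoffs = payoffMatrix.mean(axis=tuple(range(numPlayers)))
    return np.abs(payoffs - 0.5).max()

for n in (2, 6, 10):
    print(n, uniform_profile_deviation(n, numMoves=4))
```

<p>The printed deviations shrink roughly like $1/\sqrt{M^N}$, which matches the observation that more players means payoffs closer to 0.5.</p>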
<h2>Computing payoffs</h2>
<p>A mixed strategy profile is an $N \times M$ row-stochastic matrix, with one row of move probabilities per player:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">strategies</span> <span class="o">=</span> <span class="n">gen</span><span class="p">.</span><span class="nf">random</span><span class="p">((</span><span class="n">numPlayers</span><span class="p">,</span> <span class="n">numMoves</span><span class="p">))</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nf">range</span><span class="p">(</span><span class="n">numPlayers</span><span class="p">):</span>
<span class="n">strategies</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">strategies</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">/</span> <span class="nf">sum</span><span class="p">(</span><span class="n">strategies</span><span class="p">[</span><span class="n">i</span><span class="p">])</span>
</code></pre></div></div>
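<p>As an aside (my suggestion, not the original code), the same normalization can be done without the Python loop, using one broadcasted division:</p>

```python
# Loop-free construction of a row-stochastic strategy profile:
# divide each row by its own sum in a single broadcasted step.
import numpy as np

gen = np.random.default_rng()
numPlayers, numMoves = 4, 3
strategies = gen.random((numPlayers, numMoves))
strategies /= strategies.sum(axis=1, keepdims=True)
print(strategies.sum(axis=1))  # each row now sums to 1
```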
<p>Here is the best incantation I could come up with for computing the resulting payoffs:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">payoffs</span> <span class="o">=</span> <span class="n">payoffMatrix</span>
<span class="k">for</span> <span class="n">player</span> <span class="ow">in</span> <span class="nf">range</span><span class="p">(</span><span class="n">numPlayers</span><span class="p">):</span>
<span class="n">payoffs</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">tensordot</span><span class="p">(</span><span class="n">strategies</span><span class="p">[</span><span class="n">player</span><span class="p">],</span> <span class="n">payoffs</span><span class="p">,</span> <span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">))</span>
</code></pre></div></div>
<p>(For 9 players this is already too much for my poor laptop.)</p>
<p>I wish I could find a more declarative way to do this. For a small and fixed number of players, this kind of thing works, but I didn’t want to mess with the stringly-typed incantation that would be necessary to do it for $N$ players:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">payoffs</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">einsum</span><span class="p">(</span><span class="sh">'</span><span class="s">abcu,a,b,c</span><span class="sh">'</span><span class="p">,</span> <span class="n">payoffMatrix</span><span class="p">,</span> <span class="n">strategies</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">strategies</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="n">strategies</span><span class="p">[</span><span class="mi">2</span><span class="p">])</span>
</code></pre></div></div>
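<p>For what it’s worth, the stringly-typed incantation can be generated programmatically. Here is a sketch (mine, not from the post) that builds the subscript string for any number of players and cross-checks it against the tensordot loop:</p>

```python
# Generating the einsum subscript string for an arbitrary number of players
# (fine for fewer than 25 players, since 'z' is reserved for the payoff axis).
import numpy as np
from string import ascii_lowercase

def expected_payoffs(payoffMatrix, strategies):
    numPlayers = strategies.shape[0]
    axes = ascii_lowercase[:numPlayers]        # one index letter per move axis
    subscripts = axes + 'z,' + ','.join(axes)  # e.g. 'abcz,a,b,c'
    return np.einsum(subscripts, payoffMatrix, *strategies)

# Cross-check against the tensordot loop used above:
gen = np.random.default_rng(1)
numPlayers, numMoves = 3, 4
payoffMatrix = gen.random((numMoves,) * numPlayers + (numPlayers,))
strategies = gen.random((numPlayers, numMoves))
strategies /= strategies.sum(axis=1, keepdims=True)

payoffs = payoffMatrix
for player in range(numPlayers):
    payoffs = np.tensordot(strategies[player], payoffs, (0, 0))
print(np.allclose(expected_payoffs(payoffMatrix, strategies), payoffs))  # -> True
```

<p>In einsum’s implicit mode, the only index that appears once is the payoff axis, so the contraction automatically returns the per-player payoff vector.</p>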
<h2>Deviations</h2>
<p>For best response dynamics, the key step is: for each player, compute the payoffs that player can obtain by a unilateral deviation to each of their moves.</p>
<p>Here is the very unpleasant incantation I came up with to do this:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">deviations</span> <span class="o">=</span> <span class="n">payoffMatrix</span><span class="p">[...,</span> <span class="n">player</span><span class="p">]</span>
<span class="k">for</span> <span class="n">opponent</span> <span class="ow">in</span> <span class="nf">range</span><span class="p">(</span><span class="n">player</span><span class="p">):</span>
<span class="n">deviations</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">tensordot</span><span class="p">(</span><span class="n">strategies</span><span class="p">[</span><span class="n">opponent</span><span class="p">],</span> <span class="n">deviations</span><span class="p">,</span> <span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">))</span>
<span class="k">for</span> <span class="n">opponent</span> <span class="ow">in</span> <span class="nf">range</span><span class="p">(</span><span class="n">player</span> <span class="o">+</span> <span class="mi">1</span><span class="p">,</span> <span class="n">numPlayers</span><span class="p">):</span>
<span class="n">deviations</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">tensordot</span><span class="p">(</span><span class="n">strategies</span><span class="p">[</span><span class="n">opponent</span><span class="p">],</span> <span class="n">deviations</span><span class="p">,</span> <span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">))</span>
</code></pre></div></div>
<p>First we slice the payoff tensor so we only have the payoffs for the player in question. Then for opponents of index lower than the player, we contract their strategy against the lowest tensor dimension. After that loop, the lowest tensor dimension corresponds to the current player’s move. Then for opponents of index higher than the player, we contract their strategy but skipping the first tensor dimension. At the end we’re left with just a vector, giving the payoff of each of the player’s moves when each opponent plays their current strategy.</p>
<p>As a functional programmer, I find all of this in very bad taste.</p>
<h2>Learning</h2>
<p>The rest is the easy part. We can use softmax:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">softmax</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">temperature</span><span class="p">):</span>
<span class="n">exps</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">exp</span><span class="p">(</span><span class="n">x</span> <span class="o">/</span> <span class="n">temperature</span><span class="p">)</span>
<span class="k">return</span> <span class="n">exps</span> <span class="o">/</span> <span class="nf">sum</span><span class="p">(</span><span class="n">exps</span><span class="p">)</span>
<span class="n">newStrategy</span> <span class="o">=</span> <span class="nf">softmax</span><span class="p">(</span><span class="n">deviations</span><span class="p">,</span> <span class="n">temperature</span><span class="p">)</span>
</code></pre></div></div>
<p>or, in the zero-temperature limit, take a Dirac distribution at the argmax using this little incantation:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">newStrategy</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">identity</span><span class="p">(</span><span class="n">numMoves</span><span class="p">)[</span><span class="n">np</span><span class="p">.</span><span class="nf">argmax</span><span class="p">(</span><span class="n">deviations</span><span class="p">)]</span>
</code></pre></div></div>
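<p>One caveat (my observation, not in the post): the softmax above computes <code class="language-plaintext highlighter-rouge">exp(x / temperature)</code> directly, which overflows for small temperatures. A numerically stable drop-in variant subtracts the maximum first, which leaves the result mathematically unchanged:</p>

```python
# Numerically stable softmax: shifting by the max keeps every exponent <= 0,
# avoiding overflow at low temperatures without changing the result.
import numpy as np

def softmax(x, temperature):
    z = np.asarray(x) / temperature
    exps = np.exp(z - z.max())
    return exps / exps.sum()

p = softmax(np.array([1.0, 2.0, 3.0]), 0.001)
print(p)  # essentially a Dirac distribution at the argmax
```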
<p>Then apply the learning rate:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">delta</span> <span class="o">=</span> <span class="n">newStrategy</span> <span class="o">-</span> <span class="n">strategies</span><span class="p">[</span><span class="n">player</span><span class="p">]</span>
<span class="n">strategies</span><span class="p">[</span><span class="n">player</span><span class="p">]</span> <span class="o">=</span> <span class="n">strategies</span><span class="p">[</span><span class="n">player</span><span class="p">]</span> <span class="o">+</span> <span class="n">learningRate</span><span class="o">*</span><span class="n">delta</span>
</code></pre></div></div>
<p>All of this is looped over each player, and then over a number of learning stages, plus logging each player’s payoff and the maximum delta:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for</span> <span class="n">stage</span> <span class="ow">in</span> <span class="nf">range</span><span class="p">(</span><span class="n">numStages</span><span class="p">):</span>
<span class="c1"># Compute expected payoffs
</span> <span class="n">tempPayoffs</span> <span class="o">=</span> <span class="n">payoffMatrix</span>
<span class="k">for</span> <span class="n">player</span> <span class="ow">in</span> <span class="nf">range</span><span class="p">(</span><span class="n">numPlayers</span><span class="p">):</span>
<span class="n">tempPayoffs</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">tensordot</span><span class="p">(</span><span class="n">strategies</span><span class="p">[</span><span class="n">player</span><span class="p">],</span> <span class="n">tempPayoffs</span><span class="p">,</span> <span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">))</span>
<span class="n">payoffs</span><span class="p">[</span><span class="n">stage</span><span class="p">]</span> <span class="o">=</span> <span class="n">tempPayoffs</span>
<span class="k">for</span> <span class="n">player</span> <span class="ow">in</span> <span class="nf">range</span><span class="p">(</span><span class="n">numPlayers</span><span class="p">):</span>
<span class="c1"># Compute deviation payoffs
</span> <span class="n">deviations</span> <span class="o">=</span> <span class="n">payoffMatrix</span><span class="p">[...,</span> <span class="n">player</span><span class="p">]</span>
<span class="k">for</span> <span class="n">opponent</span> <span class="ow">in</span> <span class="nf">range</span><span class="p">(</span><span class="n">player</span><span class="p">):</span>
<span class="n">deviations</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">tensordot</span><span class="p">(</span><span class="n">strategies</span><span class="p">[</span><span class="n">opponent</span><span class="p">],</span> <span class="n">deviations</span><span class="p">,</span> <span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">))</span>
<span class="k">for</span> <span class="n">opponent</span> <span class="ow">in</span> <span class="nf">range</span><span class="p">(</span><span class="n">player</span> <span class="o">+</span> <span class="mi">1</span><span class="p">,</span> <span class="n">numPlayers</span><span class="p">):</span>
<span class="n">deviations</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">tensordot</span><span class="p">(</span><span class="n">strategies</span><span class="p">[</span><span class="n">opponent</span><span class="p">],</span> <span class="n">deviations</span><span class="p">,</span> <span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">))</span>
<span class="c1"># Update strategy
</span> <span class="n">newStrategy</span> <span class="o">=</span> <span class="nf">softmax</span><span class="p">(</span><span class="n">deviations</span><span class="p">,</span> <span class="n">temperature</span><span class="p">)</span>
<span class="n">delta</span> <span class="o">=</span> <span class="n">newStrategy</span> <span class="o">-</span> <span class="n">strategies</span><span class="p">[</span><span class="n">player</span><span class="p">]</span>
<span class="n">strategies</span><span class="p">[</span><span class="n">player</span><span class="p">]</span> <span class="o">=</span> <span class="n">strategies</span><span class="p">[</span><span class="n">player</span><span class="p">]</span> <span class="o">+</span> <span class="n">learningRate</span><span class="o">*</span><span class="n">delta</span>
<span class="c1"># Log errors
</span> <span class="n">errors</span><span class="p">[</span><span class="n">stage</span><span class="p">]</span> <span class="o">=</span> <span class="nf">max</span><span class="p">(</span><span class="n">errors</span><span class="p">[</span><span class="n">stage</span><span class="p">],</span> <span class="nf">max</span><span class="p">(</span><span class="n">delta</span><span class="p">))</span>
</code></pre></div></div>
<h2>Results</h2>
<p>Everything in this section is for 8 players and 8 moves, which is the largest that my laptop can handle.</p>
<p>Here is a typical plot of each player’s payoff over 100 stages of learning, with a temperature of 0.01 and a learning rate of 0.1:</p>
<p><img src="/assetsPosts/2024-05-09-exploring-best-response-dynamics/img1.png" alt="Graph" /></p>
<p>With this temperature, the learning rate can be increased all the way to 1, and the dynamics visibly converges in just a few stages:</p>
<p><img src="/assetsPosts/2024-05-09-exploring-best-response-dynamics/img2.png" alt="Graph" /></p>
<p>In fact, this is so robust that it makes me wonder whether there could be a good proof of the constructive Brouwer theorem using statistical physics methods.</p>
<p>If the temperature is decreased further to 0.001, we lose convergence:</p>
<p><img src="/assetsPosts/2024-05-09-exploring-best-response-dynamics/img3.png" alt="Graph" /></p>
<p>Although I haven’t confirmed it, my assumption is that a lower temperature will converge to an $\varepsilon$-equilibrium for a smaller $\varepsilon$, so we want the temperature to be as low as possible while still converging.</p>
<p>Worst of all, if we decrease the learning rate to compensate we can get a sudden destabilisation after hundreds of stages:</p>
<p><img src="/assetsPosts/2024-05-09-exploring-best-response-dynamics/img4.png" alt="Graph" /></p>
<p>That’s all for now. I’ll come back to this once I’ve figured out how to calculate the Nash error, which is the next thing I’m interested in finding out.</p>
<p><em>Jules Hedges: I explore the effect of players following their best response dynamics in large random normal form games.</em></p>
<h1>The Build Your Own Open Games Engine Bootcamp — Part I: Lenses</h1>
<p><em><a href="https://cybercat-institute.github.io//2024/04/22/open-games-bootcamp-i">2024-04-22</a></em></p>
<p>Cross-posted from the <a href="https://blog.20squares.xyz/open-games-bootcamp-i/">20[ ] blog</a></p>
<p>Welcome to part I of the Build Your Own Open Games Engine Bootcamp, where we’ll be learning the inner workings of the Open Games Engine and Compositional Game Theory in general, while implementing a super-simple Haskell version of the engine along the way.</p>
<p>In this episode we will learn about <strong>Lenses</strong>, how to compose them and how they can be implemented in Haskell. But first, let’s set the context for this whole series.</p>
<h2>How to scale classical Game Theory</h2>
<p>In classical Game Theory, the definitions for (deterministic) <a href="https://en.wikipedia.org/wiki/Normal-form_game">Normal-form</a> and <a href="https://en.wikipedia.org/wiki/Extensive-form_game">Extensive-form</a> games have undoubtedly proved successful as mathematical tools for studying strategic interactions between rational agents. Despite this, the monolithic nature of these definitions becomes apparent over time, eventually leading to a complexity wall in one’s game theoretic modelling career. This limitation arises as games become more intricate, and the rigid structure of these definitions gets in the way of modelling, similar to how maintaining a large codebase written in <a href="https://en.wikipedia.org/wiki/X86_assembly_language">x86 assembly</a> quickly becomes a superhuman feat.</p>
<p>Compositional Game Theory solves this exact problem: By turning games into composable open processes, one can build up a library of reusable components and approach the problem compositionally™, in a divide-et-impera fashion. To keep the programming language analogy going: Programming in a high-level language like Haskell or Rust is way easier than programming in straight assembly. The ability to modularize code by breaking it up into modules and functions, which are predictably<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup> composable and reusable, helps tame the mental overhead of complex programs. It also saves the programmer tons of time and keystrokes that would otherwise be spent re-writing the same chunk of boilerplate code with minor modifications over and over.</p>
<p>The primary goal of this series is to introduce Compositional Game Theory and provide readers with a practical understanding of Open Games. This includes a very simple Haskell implementation of Open Games for readers to play with and test their intuitions against. By the end of this series, you will have the knowledge and tools to start modelling simple deterministic games. Additionally, you’ll be equipped to start exploring the <a href="https://github.com/CyberCat-Institute/open-game-engine">Open Game Engine</a> codebase and see how Open Games are applied in real-world modeling.</p>
<h2>What is an Open Game?</h2>
<p>In the following posts, we’re going to break down and understand the following definition:</p>
<div class="definition">
<p>An <strong>Open Game</strong> is a pair $(A,\varepsilon)$, where $A$ is a <strong>Parametrized Lens</strong> with co/parameters $P$ and $Q$ and $\varepsilon$ is a <strong>Selection Function</strong> on $P \to Q$.</p>
</div>
<p>Moreover, we will learn about how Open Games can be composed both sequentially and in parallel, and hopefully some extra cool stuff along the way.</p>
<h2>(Parametrized) Lenses</h2>
<p>The first and most important component of an Open Game is the arena, i.e. the “playing field” where all the dynamics happens and with which the players interface. The arena is a <strong>parametrized lens</strong>, a composable typed bidirectional process.</p>
<div class="definition">
<p>A <strong>Parametrized Lens</strong> from a pair of sets $\binom{X}{S}$ to a pair of sets $\binom{Y}{R}$ with <strong>Parameters</strong> $\binom{P}{Q}$ is a pair of functions $\mathsf{get}: P\times X \to Y$ and $\mathsf{put}:P\times X\times R \to S\times Q$.</p>
</div>
<p>Which can be implemented in the following manner in Haskell by making use of <a href="https://en.wikipedia.org/wiki/Currying">currying</a>:</p>
<div class="language-haskell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kr">data</span> <span class="kt">ParaLens</span> <span class="n">p</span> <span class="n">q</span> <span class="n">x</span> <span class="n">s</span> <span class="n">y</span> <span class="n">r</span> <span class="kr">where</span>
<span class="c1">-- get put</span>
<span class="kt">MkLens</span> <span class="o">::</span> <span class="p">(</span><span class="n">p</span> <span class="o">-></span> <span class="n">x</span> <span class="o">-></span> <span class="n">y</span><span class="p">)</span> <span class="o">-></span> <span class="p">(</span><span class="n">p</span> <span class="o">-></span> <span class="n">x</span> <span class="o">-></span> <span class="n">r</span> <span class="o">-></span> <span class="p">(</span><span class="n">s</span><span class="p">,</span> <span class="n">q</span><span class="p">))</span> <span class="o">-></span> <span class="kt">ParaLens</span> <span class="n">p</span> <span class="n">q</span> <span class="n">x</span> <span class="n">s</span> <span class="n">y</span> <span class="n">r</span>
</code></pre></div></div>
<p>Diagrammatically speaking, a parametrized lens can be represented as a box with 6 typed wires, which under the lens (pun intended) of compositional game theory are interpreted as the following:</p>
<ul>
<li>$\mathsf{x}$ is the type of <strong>game states</strong> that can be observed by the player prior to making a move.</li>
<li>$\mathsf{p}$ is the type of <strong>strategies</strong> a player can adopt.</li>
<li>$\mathsf{y}$ is the type of <strong>game states</strong> that can be observed after the player made its move.</li>
<li>$\mathsf{r}$ is the type of <strong>utilities</strong>/<strong>payoffs</strong> the player can receive after making its move.</li>
<li>$\mathsf{s}$ is the type of <strong>back-propagated utilities</strong> a player can send to players that moved before it.</li>
<li>$\mathsf{q}$ is the type of <strong>rewards</strong> representing the player’s intrinsic utility.</li>
</ul>
<div class="tikz"><script type="text/tikz">
\begin{tikzpicture}
\draw [line width=1.5pt, rounded corners] (0,0) rectangle (8,5) node[pos=0.5] {$\mathsf A$};
\draw [-stealth, line width=1.5pt] (-3,4) -- (0,4) node[pos=0.1, above] {$\mathsf x$};;
\draw [-stealth, line width=1.5pt] (0,1) -- (-3,1) node[pos=0.9, above] {$\mathsf s$};
\draw [-stealth, line width=1.5pt] (8,4) -- (11,4) node[pos=0.9, above] {$\mathsf y$};
\draw [-stealth, line width=1.5pt] (11,1) -- (8,1) node[pos=0.1, above] {$\mathsf r$};
\draw [-stealth, line width=1.5pt] (2,8) -- (2,5) node[pos=0.1, right] {$\mathsf p$};
\draw [-stealth, line width=1.5pt] (6,5) -- (6,8) node[pos=0.9, right] {$\mathsf q$};
\end{tikzpicture}
</script></div>
<p>With this in mind, we can open the box in the previous diagram and have a look at the internals of a parametrized lens<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">2</a></sup>:</p>
<p><img src="/assetsPosts/2024-04-15-open-games-bootcamp-i/exploded_lens.png" alt="&quot;exploded&quot; internals of a parametrized lens" /></p>
<p>By looking at the internals of a lens and the direction of the arrows, it becomes clear that data flows in two different directions:</p>
<ul>
<li>The <strong>forward</strong> pass, i.e. the <code class="language-plaintext highlighter-rouge">get</code> function, is happening at the time a player can observe the state before interacting with the game.</li>
<li>The <strong>backward</strong> pass, i.e. the <code class="language-plaintext highlighter-rouge">put</code> function, is happening “in the future”, after all players did their moves, and represents the stage in which payoffs are being computed and passed around.</li>
</ul>
<p>To limit mental overload, the following definition of a non-parametrized lens will also come in handy later:</p>
<div class="definition">
<p>A <strong>(non-parametrized) Lens</strong> is a parametrized lens with parameters $\binom{\mathbf{1}}{\mathbf{1}}$, where $\mathbf{1}$ is the singleton set.</p>
</div>
<div class="language-haskell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- A (non-parametrized) `Lens` is a `ParaLens` with trivial parameters</span>
<span class="kr">type</span> <span class="kt">Lens</span> <span class="o">=</span> <span class="kt">ParaLens</span> <span class="nb">()</span> <span class="nb">()</span>
<span class="c1">-- Non-parametrized Lens constructor</span>
<span class="n">nonPara</span> <span class="o">::</span> <span class="p">(</span><span class="n">x</span> <span class="o">-></span> <span class="n">y</span><span class="p">)</span> <span class="o">-></span> <span class="p">(</span><span class="n">x</span> <span class="o">-></span> <span class="n">r</span> <span class="o">-></span> <span class="n">s</span><span class="p">)</span> <span class="o">-></span> <span class="kt">Lens</span> <span class="n">x</span> <span class="n">s</span> <span class="n">y</span> <span class="n">r</span>
<span class="n">nonPara</span> <span class="n">get</span> <span class="n">put</span> <span class="o">=</span> <span class="kt">MkLens</span> <span class="p">(</span><span class="nf">\</span><span class="kr">_</span> <span class="n">x</span> <span class="o">-></span> <span class="n">get</span> <span class="n">x</span><span class="p">)</span> <span class="p">(</span><span class="nf">\</span><span class="kr">_</span> <span class="n">x</span> <span class="n">r</span> <span class="o">-></span> <span class="p">(</span><span class="n">put</span> <span class="n">x</span> <span class="n">r</span><span class="p">,</span> <span class="nb">()</span><span class="p">))</span>
</code></pre></div></div>
<p>Diagrammatically, we will represent wires of type <code class="language-plaintext highlighter-rouge">()</code> (the singleton type) as no wires at all. This will also help us simplify some definitions and diagrams later. For example, here’s a representation of the flow of data in a non-parametrized lens, courtesy of <a href="https://www.brunogavranovic.com">Bruno Gavranović</a>:</p>
<p><img src="/assetsPosts/2024-04-15-open-games-bootcamp-i/lens_traces.gif" alt="Representation of the flow of data in a non-parametrized lens, courtesy of Bruno Gavranović" /></p>
<h3>Composing Lenses two ways</h3>
<p>What makes Compositional Game Theory compositional is (unsurprisingly) the fact that parametrized lenses are closed under two different kinds of composition operators, one behaving like <strong>sequential composition</strong> of pure functions and one behaving like <strong>parallel</strong> execution of programs, or more or less like a tensor product<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">3</a></sup>.</p>
<h4>Sequential Composition</h4>
<p>Let’s start with sequential composition: When the right boundary types of $\mathsf A:\binom{X}{S}\to\binom{Y}{R}$ match the left boundary types of $\mathsf B:\binom{Y}{R}\to\binom{Z}{T}$, we should be able to build another lens out of them that amounts to running what happens in $\mathsf A$ first and then running what happens in $\mathsf B$, while taking into account the parameters of both lenses:</p>
<p>If we code this up in a type-directed way in Haskell, the only sensible definition that can possibly come out is the following:</p>
<div class="language-haskell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kr">infixr</span> <span class="mi">4</span> <span class="o">>>>></span>
<span class="p">(</span><span class="o">>>>></span><span class="p">)</span> <span class="o">::</span> <span class="kt">ParaLens</span> <span class="n">p</span> <span class="n">q</span> <span class="n">x</span> <span class="n">s</span> <span class="n">y</span> <span class="n">r</span> <span class="o">-></span> <span class="kt">ParaLens</span> <span class="n">p'</span> <span class="n">q'</span> <span class="n">y</span> <span class="n">r</span> <span class="n">z</span> <span class="n">t</span> <span class="o">-></span> <span class="kt">ParaLens</span> <span class="p">(</span><span class="n">p</span><span class="p">,</span> <span class="n">p'</span><span class="p">)</span> <span class="p">(</span><span class="n">q</span><span class="p">,</span> <span class="n">q'</span><span class="p">)</span> <span class="n">x</span> <span class="n">s</span> <span class="n">z</span> <span class="n">t</span>
<span class="p">(</span><span class="kt">MkLens</span> <span class="n">get</span> <span class="n">put</span><span class="p">)</span> <span class="o">>>>></span> <span class="p">(</span><span class="kt">MkLens</span> <span class="n">get'</span> <span class="n">put'</span><span class="p">)</span> <span class="o">=</span>
<span class="kt">MkLens</span>
<span class="p">(</span><span class="nf">\</span><span class="p">(</span><span class="n">p</span><span class="p">,</span> <span class="n">p'</span><span class="p">)</span> <span class="n">x</span> <span class="o">-></span> <span class="n">get'</span> <span class="n">p'</span> <span class="p">(</span><span class="n">get</span> <span class="n">p</span> <span class="n">x</span><span class="p">))</span>
<span class="p">(</span><span class="nf">\</span><span class="p">(</span><span class="n">p</span><span class="p">,</span> <span class="n">p'</span><span class="p">)</span> <span class="n">x</span> <span class="n">t</span> <span class="o">-></span>
<span class="kr">let</span> <span class="p">(</span><span class="n">r</span><span class="p">,</span> <span class="n">q'</span><span class="p">)</span> <span class="o">=</span> <span class="n">put'</span> <span class="n">p'</span> <span class="p">(</span><span class="n">get</span> <span class="n">p</span> <span class="n">x</span><span class="p">)</span> <span class="n">t</span>
<span class="p">(</span><span class="n">s</span><span class="p">,</span> <span class="n">q</span><span class="p">)</span> <span class="o">=</span> <span class="n">put</span> <span class="n">p</span> <span class="n">x</span> <span class="n">r</span>
<span class="kr">in</span> <span class="p">(</span><span class="n">s</span><span class="p">,</span> <span class="p">(</span><span class="n">q</span><span class="p">,</span> <span class="n">q'</span><span class="p">))</span>
<span class="p">)</span>
</code></pre></div></div>
<p>From the Haskell implementation we can see that composing two lenses, parametrized or not, isn’t as simple as plugging one end into another, merging the parameter wires and calling it a day<sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">4</a></sup>. Something a bit more articulate is happening:</p>
<p><img src="/assetsPosts/2024-04-15-open-games-bootcamp-i/exploded_comp.png" alt="Exploded lens composition" /></p>
<p>Mathematically, this amounts to the following compositions:</p>
<ul>
<li>For the <code class="language-plaintext highlighter-rouge">get</code> part: $P'\times P\times X\xrightarrow{\mathsf{id}\times\mathsf{get}}P'\times Y\xrightarrow{\mathsf{get}'} Z$</li>
<li>
<p>For the <code class="language-plaintext highlighter-rouge">put</code> part:
\(\begin{align*}
P'\times P\times X \times T
&\xrightarrow{\mathsf{id}\times \Delta_{P}\times \Delta_{X}\times\mathsf{id}} P'\times P\times P\times X \times X \times T\\
&\xrightarrow{\mathsf{sym}\times \mathsf{get}\times \mathsf{sym}} P\times P'\times Y \times T \times X\\
&\xrightarrow{\mathsf{id}\times \mathsf{put}'\times \mathsf{id}} P\times R\times Q'\times X\\
&\xrightarrow{\mathsf{rearrange}} P\times X\times R\times Q'\\
&\xrightarrow{\mathsf{put}\times\mathsf{id}} S\times Q\times Q'
\end{align*}\)</p>
<p>where $\Delta(x) = (x,x)$, $\mathsf{sym}(x,y)=(y,x)$, and $\mathsf{rearrange}$ is a suitable composition of $\mathsf{sym}$s.</p>
</li>
</ul>
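<p>As a quick sanity check of the definition, here is a self-contained sketch (the <code class="language-plaintext highlighter-rouge">ParaLens</code> type and <code class="language-plaintext highlighter-rouge">&gt;&gt;&gt;&gt;</code> are reproduced from above; the <code class="language-plaintext highlighter-rouge">scale</code> lens is a hypothetical example whose backward pass acts like the reverse-mode derivative of $x \mapsto p\cdot x$). Sequentially composing two such lenses then reproduces the chain rule.</p>

```haskell
data ParaLens p q x s y r = MkLens (p -> x -> y) (p -> x -> r -> (s, q))

infixr 4 >>>>
(>>>>) :: ParaLens p q x s y r -> ParaLens p' q' y r z t
       -> ParaLens (p, p') (q, q') x s z t
(MkLens get put) >>>> (MkLens get' put') =
  MkLens
    (\(p, p') x -> get' p' (get p x))
    (\(p, p') x t ->
      let (r, q') = put' p' (get p x) t
          (s, q)  = put p x r
      in (s, (q, q')))

-- Hypothetical example: forward multiplies by the parameter; backward
-- multiplies the incoming signal by the partial derivatives w.r.t. x and p.
scale :: ParaLens Double Double Double Double Double Double
scale = MkLens (*) (\p x dy -> (p * dy, x * dy))

chain :: ParaLens (Double, Double) (Double, Double) Double Double Double Double
chain = scale >>>> scale
```

<p>With parameters $(2, 3)$ and input $4$, the forward pass gives $3 \cdot (2 \cdot 4) = 24$; feeding $1$ backwards returns the three partial derivatives $6$, $12$ and $8$, exactly as the chain rule predicts.</p>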
<h4>Parallel Composition</h4>
<p>Luckily, parallel composition is much easier than sequential composition: the parallel composition of $\mathsf{A}:\binom{X}{S}\to\binom{Y}{R}$ with parameters $\binom{P}{Q}$ and $\mathsf{B}:\binom{X'}{S'}\to\binom{Y'}{R'}$ with parameters $\binom{P'}{Q'}$ amounts to a lens $\mathsf{A}\times\mathsf{B}:\binom{X\times X'}{S\times S'}\to\binom{Y \times Y'}{R \times R'}$ with parameters $\binom{P\times P'}{Q \times Q'}$, such that \(\mathsf{put}_{\mathsf{A}\times\mathsf{B}}\) and \(\mathsf{get}_{\mathsf{A}\times\mathsf{B}}\) are respectively the cartesian products of the <code class="language-plaintext highlighter-rouge">put</code> and <code class="language-plaintext highlighter-rouge">get</code> functions of $\mathsf{A}$ and $\mathsf{B}$, modulo some rearrangement of inputs and outputs.</p>
<p>This is even clearer from the Haskell implementation:</p>
<div class="language-haskell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kr">infixr</span> <span class="mi">4</span> <span class="o">####</span>
<span class="p">(</span><span class="o">####</span><span class="p">)</span> <span class="o">::</span> <span class="kt">ParaLens</span> <span class="n">p</span> <span class="n">q</span> <span class="n">x</span> <span class="n">s</span> <span class="n">y</span> <span class="n">r</span> <span class="o">-></span> <span class="kt">ParaLens</span> <span class="n">p'</span> <span class="n">q'</span> <span class="n">x'</span> <span class="n">s'</span> <span class="n">y'</span> <span class="n">r'</span> <span class="o">-></span> <span class="kt">ParaLens</span> <span class="p">(</span><span class="n">p</span><span class="p">,</span> <span class="n">p'</span><span class="p">)</span> <span class="p">(</span><span class="n">q</span><span class="p">,</span> <span class="n">q'</span><span class="p">)</span> <span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">x'</span><span class="p">)</span> <span class="p">(</span><span class="n">s</span><span class="p">,</span> <span class="n">s'</span><span class="p">)</span> <span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="n">y'</span><span class="p">)</span> <span class="p">(</span><span class="n">r</span><span class="p">,</span> <span class="n">r'</span><span class="p">)</span>
<span class="p">(</span><span class="kt">MkLens</span> <span class="n">get</span> <span class="n">put</span><span class="p">)</span> <span class="o">####</span> <span class="p">(</span><span class="kt">MkLens</span> <span class="n">get'</span> <span class="n">put'</span><span class="p">)</span> <span class="o">=</span>
<span class="kt">MkLens</span>
<span class="p">(</span><span class="nf">\</span><span class="p">(</span><span class="n">p</span><span class="p">,</span> <span class="n">p'</span><span class="p">)</span> <span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">x'</span><span class="p">)</span> <span class="o">-></span> <span class="p">(</span><span class="n">get</span> <span class="n">p</span> <span class="n">x</span><span class="p">,</span> <span class="n">get'</span> <span class="n">p'</span> <span class="n">x'</span><span class="p">))</span>
<span class="p">(</span><span class="nf">\</span><span class="p">(</span><span class="n">p</span><span class="p">,</span> <span class="n">p'</span><span class="p">)</span> <span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">x'</span><span class="p">)</span> <span class="p">(</span><span class="n">r</span><span class="p">,</span> <span class="n">r'</span><span class="p">)</span> <span class="o">-></span>
<span class="kr">let</span> <span class="p">(</span><span class="n">s</span><span class="p">,</span> <span class="n">q</span><span class="p">)</span> <span class="o">=</span> <span class="n">put</span> <span class="n">p</span> <span class="n">x</span> <span class="n">r</span>
<span class="p">(</span><span class="n">s'</span><span class="p">,</span> <span class="n">q'</span><span class="p">)</span> <span class="o">=</span> <span class="n">put'</span> <span class="n">p'</span> <span class="n">x'</span> <span class="n">r'</span>
<span class="kr">in</span> <span class="p">((</span><span class="n">s</span><span class="p">,</span> <span class="n">s'</span><span class="p">),</span> <span class="p">(</span><span class="n">q</span><span class="p">,</span> <span class="n">q'</span><span class="p">))</span>
<span class="p">)</span>
</code></pre></div></div>
<p>Diagrammatically, this amounts to just placing the two lenses next to each other.</p>
<p><img src="/assetsPosts/2024-04-15-open-games-bootcamp-i/parallel_comp.png" alt="parallel lens composition" /></p>
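<p>Here is a quick check of the product structure (a sketch, with <code class="language-plaintext highlighter-rouge">ParaLens</code> and <code class="language-plaintext highlighter-rouge">####</code> reproduced from above; <code class="language-plaintext highlighter-rouge">scale</code> is a hypothetical example lens that multiplies by its parameter in both directions):</p>

```haskell
data ParaLens p q x s y r = MkLens (p -> x -> y) (p -> x -> r -> (s, q))

infixr 4 ####
(####) :: ParaLens p q x s y r -> ParaLens p' q' x' s' y' r'
       -> ParaLens (p, p') (q, q') (x, x') (s, s') (y, y') (r, r')
(MkLens get put) #### (MkLens get' put') =
  MkLens
    (\(p, p') (x, x') -> (get p x, get' p' x'))
    (\(p, p') (x, x') (r, r') ->
      let (s, q)   = put p x r
          (s', q') = put' p' x' r'
      in ((s, s'), (q, q')))

-- Hypothetical example lens: multiply by the parameter in both passes.
scale :: ParaLens Double Double Double Double Double Double
scale = MkLens (*) (\p x dy -> (p * dy, x * dy))

pair :: ParaLens (Double, Double) (Double, Double)
                 (Double, Double) (Double, Double)
                 (Double, Double) (Double, Double)
pair = scale #### scale
```

<p>The two components never interact: forward with parameters $(2,3)$ on inputs $(4,5)$ gives $(8,15)$, each coordinate computed independently.</p>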
<h2>Building Concrete Lenses</h2>
<p>Now that we have laid all the groundwork, let’s have a look at a couple of concrete examples of lenses.</p>
<h3>Lenses from Functions</h3>
<p>Our first source of lenses will be functions: For each function $f: X\to S$ there is a non-parametrized lens $\mathsf{F}:\binom{X}{S}\to\binom{\mathbf{1}}{\mathbf{1}}$ such that $\mathsf{get}(*,x)=*$ and $\mathsf{put}(*,x,*)=(f(x),*)$. Vice versa, we can always extract a unique function from a non-parametrized lens of this kind.</p>
<div class="language-haskell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">funToCostate</span> <span class="o">::</span> <span class="p">(</span><span class="n">x</span> <span class="o">-></span> <span class="n">s</span><span class="p">)</span> <span class="o">-></span> <span class="kt">Lens</span> <span class="n">x</span> <span class="n">s</span> <span class="nb">()</span> <span class="nb">()</span>
<span class="n">funToCostate</span> <span class="n">f</span> <span class="o">=</span> <span class="n">nonPara</span> <span class="p">(</span><span class="n">const</span> <span class="nb">()</span><span class="p">)</span> <span class="p">(</span><span class="nf">\</span><span class="n">x</span> <span class="kr">_</span> <span class="o">-></span> <span class="n">f</span> <span class="n">x</span><span class="p">)</span>
<span class="n">costateToFun</span> <span class="o">::</span> <span class="kt">Lens</span> <span class="n">x</span> <span class="n">s</span> <span class="nb">()</span> <span class="nb">()</span> <span class="o">-></span> <span class="p">(</span><span class="n">x</span> <span class="o">-></span> <span class="n">s</span><span class="p">)</span>
<span class="n">costateToFun</span> <span class="p">(</span><span class="kt">MkLens</span> <span class="kr">_</span> <span class="n">f</span><span class="p">)</span> <span class="n">x</span> <span class="o">=</span> <span class="n">fst</span> <span class="o">$</span> <span class="n">f</span> <span class="nb">()</span> <span class="n">x</span> <span class="nb">()</span>
</code></pre></div></div>
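<p>As a quick sanity check (a sketch reproducing the definitions above), the two conversions are mutually inverse: packing a function into a costate lens and unpacking it again yields the original function.</p>

```haskell
data ParaLens p q x s y r = MkLens (p -> x -> y) (p -> x -> r -> (s, q))

type Lens = ParaLens () ()

nonPara :: (x -> y) -> (x -> r -> s) -> Lens x s y r
nonPara get put = MkLens (\_ x -> get x) (\_ x r -> (put x r, ()))

funToCostate :: (x -> s) -> Lens x s () ()
funToCostate f = nonPara (const ()) (\x _ -> f x)

costateToFun :: Lens x s () () -> (x -> s)
costateToFun (MkLens _ put) x = fst (put () x ())

-- Round trip: recover (* 2) after packing it into a costate lens.
roundTrip :: Int -> Int
roundTrip = costateToFun (funToCostate (* 2))
```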
<p>Similarly, for each function $f: P\to Q$ there is a parametrized lens \(\bar{\mathsf{F}}:\binom{\mathbf{1}}{\mathbf{1}}\to\binom{\mathbf{1}}{\mathbf{1}}\) with parameters \(\binom{P}{Q}\), such that $\mathsf{get}(p,*)=*$ and $\mathsf{put}(p,*,*)=(*,f(p))$. Likewise, we can always extract a unique function from parametrized lenses of this kind.</p>
<div class="language-haskell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">funToParaState</span> <span class="o">::</span> <span class="p">(</span><span class="n">p</span> <span class="o">-></span> <span class="n">q</span><span class="p">)</span> <span class="o">-></span> <span class="kt">ParaLens</span> <span class="n">p</span> <span class="n">q</span> <span class="nb">()</span> <span class="nb">()</span> <span class="nb">()</span> <span class="nb">()</span>
<span class="n">funToParaState</span> <span class="n">f</span> <span class="o">=</span> <span class="kt">MkLens</span> <span class="p">(</span><span class="nf">\</span><span class="kr">_</span> <span class="kr">_</span> <span class="o">-></span> <span class="nb">()</span><span class="p">)</span> <span class="p">(</span><span class="nf">\</span><span class="n">p</span> <span class="kr">_</span> <span class="kr">_</span> <span class="o">-></span> <span class="p">(</span><span class="nb">()</span><span class="p">,</span> <span class="n">f</span> <span class="n">p</span><span class="p">))</span>
<span class="n">paraStateTofun</span> <span class="o">::</span> <span class="kt">ParaLens</span> <span class="n">p</span> <span class="n">q</span> <span class="nb">()</span> <span class="nb">()</span> <span class="nb">()</span> <span class="nb">()</span> <span class="o">-></span> <span class="p">(</span><span class="n">p</span> <span class="o">-></span> <span class="n">q</span><span class="p">)</span>
<span class="n">paraStateTofun</span> <span class="p">(</span><span class="kt">MkLens</span> <span class="kr">_</span> <span class="n">coplay</span><span class="p">)</span> <span class="n">p</span> <span class="o">=</span> <span class="n">snd</span> <span class="o">$</span> <span class="n">coplay</span> <span class="n">p</span> <span class="nb">()</span> <span class="nb">()</span>
</code></pre></div></div>
<h3>Lenses from Scalars</h3>
<p>For each value $\bar{y}\in Y$ and for any set $R$ we can build a non-parametrized lens \(\mathcal{S}_\bar{y}:\binom{\mathbf{1}}{\mathbf{1}}\to\binom{Y}{R}\) such that \(\mathsf{get}(*,*)=\bar{y}\) and \(\mathsf{put}(*,*,r)=(*,*)\).</p>
<div class="language-haskell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">scalarToState</span> <span class="o">::</span> <span class="n">y</span> <span class="o">-></span> <span class="kt">Lens</span> <span class="nb">()</span> <span class="nb">()</span> <span class="n">y</span> <span class="n">r</span>
<span class="n">scalarToState</span> <span class="n">y</span> <span class="o">=</span> <span class="n">nonPara</span> <span class="p">(</span><span class="n">const</span> <span class="n">y</span><span class="p">)</span> <span class="n">const</span>
<span class="n">stateToScalar</span> <span class="o">::</span> <span class="kt">Lens</span> <span class="nb">()</span> <span class="nb">()</span> <span class="n">y</span> <span class="n">r</span> <span class="o">-></span> <span class="n">y</span>
<span class="n">stateToScalar</span> <span class="p">(</span><span class="kt">MkLens</span> <span class="n">get</span> <span class="kr">_</span><span class="p">)</span> <span class="o">=</span> <span class="n">get</span> <span class="nb">()</span> <span class="nb">()</span>
</code></pre></div></div>
<h3>The Identity Lens</h3>
<p>The <strong>Identity Lens</strong> is a non-parametrized lens of type \(\binom{X}{S}\to\binom{X}{S}\) that serves as the identity morphism for parametrized lenses, i.e. pre-/post-composing a lens $\mathsf{A}$ with the identity lens gives you back $\mathsf{A}$ modulo readjusting the parameters (we will see how to do that in the next post). In Haskell:</p>
<div class="language-haskell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">idLens</span> <span class="o">::</span> <span class="kt">Lens</span> <span class="n">x</span> <span class="n">s</span> <span class="n">x</span> <span class="n">s</span>
<span class="n">idLens</span> <span class="o">=</span> <span class="n">nonPara</span> <span class="n">id</span> <span class="p">(</span><span class="nf">\</span><span class="kr">_</span> <span class="n">x</span> <span class="o">-></span> <span class="n">x</span><span class="p">)</span>
</code></pre></div></div>
<h3>Corners</h3>
<p>(Right) <strong>Corners</strong> are parametrized lenses of type \(\binom{\mathbf{1}}{\mathbf{1}}\to\binom{Y}{R}\) with parameters $\binom{Y}{R}$ that bend parameter wires into right wires, such that \(\mathsf{get}(y,*)=y\) and \(\mathsf{put}(y,*,r)=(*,r)\).</p>
<div class="language-haskell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">corner</span> <span class="o">::</span> <span class="kt">ParaLens</span> <span class="n">y</span> <span class="n">r</span> <span class="nb">()</span> <span class="nb">()</span> <span class="n">y</span> <span class="n">r</span>
<span class="n">corner</span> <span class="o">=</span> <span class="kt">MkLens</span> <span class="n">const</span> <span class="p">(</span><span class="nf">\</span><span class="kr">_</span> <span class="kr">_</span> <span class="n">r</span> <span class="o">-></span> <span class="p">(</span><span class="nb">()</span><span class="p">,</span> <span class="n">r</span><span class="p">))</span>
</code></pre></div></div>
<p>And diagrammatically:
<img src="/assetsPosts/2024-04-15-open-games-bootcamp-i/corner.png" alt="corner lens" /></p>
<p>As we will see in later posts, corners are an important component of bimatrix games.</p>
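<p>To get a feel for why, here is a sketch of the pattern corners enable (<code class="language-plaintext highlighter-rouge">ParaLens</code>, <code class="language-plaintext highlighter-rouge">&gt;&gt;&gt;&gt;</code>, <code class="language-plaintext highlighter-rouge">funToCostate</code> and <code class="language-plaintext highlighter-rouge">corner</code> are reproduced from this post; <code class="language-plaintext highlighter-rouge">playRound</code> is a hypothetical helper of my own): a move enters the diagram as a parameter, flows forward through the corner into a payoff function, and the payoff flows back out as the parameter update.</p>

```haskell
data ParaLens p q x s y r = MkLens (p -> x -> y) (p -> x -> r -> (s, q))

infixr 4 >>>>
(>>>>) :: ParaLens p q x s y r -> ParaLens p' q' y r z t
       -> ParaLens (p, p') (q, q') x s z t
(MkLens get put) >>>> (MkLens get' put') =
  MkLens
    (\(p, p') x -> get' p' (get p x))
    (\(p, p') x t ->
      let (r, q') = put' p' (get p x) t
          (s, q)  = put p x r
      in (s, (q, q')))

-- A costate lens built from a function, as defined earlier in the post.
funToCostate :: (x -> s) -> ParaLens () () x s () ()
funToCostate f = MkLens (\_ _ -> ()) (\_ x _ -> (f x, ()))

-- The corner lens from above.
corner :: ParaLens y r () () y r
corner = MkLens const (\_ _ r -> ((), r))

-- Hypothetical helper: the move enters as a parameter and the payoff
-- computed by the costate comes back as the parameter update.
playRound :: (y -> r) -> ParaLens (y, ()) (r, ()) () () () ()
playRound payoff = corner >>>> funToCostate payoff
```

<p>Running the backward pass with a move of $7$ against the payoff function $m \mapsto m^2$ returns $49$ on the parameter boundary, mirroring how a player’s payoff reaches their strategy.</p>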
<h2>Final Remarks</h2>
<p>Parametrized lenses are not only useful for reasoning about Open Games, but also serve as the base of <a href="https://arxiv.org/abs/2105.06332">a whole categorical framework</a> for reasoning about complex multi-agent systems, which has also been applied to <a href="https://arxiv.org/abs/2103.01931">gradient-based learning</a>, <a href="https://arxiv.org/abs/2206.04547">dynamic programming</a>, <a href="https://arxiv.org/abs/2404.02688">reinforcement learning</a>, <a href="https://arxiv.org/abs/2305.06112">Bayesian inference</a> and <a href="https://arxiv.org/abs/2203.15633">servers</a>, on top of various flavors of game theory (e.g. <a href="https://arxiv.org/abs/2105.06763">[2105.06763]</a>). Indeed, this categorical framework is so general and promising that we spawned an entire <a href="https://cybercat.institute">research institute</a> dedicated to it.</p>
<p>Phew! That’s all for today. I hope that this introduction to the world of parametrized lenses has left you wanting more! I’ll see you in the next post, where we will explore how to handle spurious parameters with reparametrizations and model players and their agency with selection functions.</p>
<hr />
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>Without side-effects and/or emergent behavior. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p>Sometimes it will be useful to represent certain lenses in their unboxed form, with product-type wires decoupled, when reasoning pictorially; luckily, this approach to reasoning with lenses is still completely formal. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:4" role="doc-endnote">
<p>In mathematical lingo, one would say that parametrized lenses can be organized as the morphisms of a somewhat complicated <a href="https://en.wikipedia.org/wiki/Monoidal_category"><strong>monoidal category</strong></a>-like structure called a <a href="https://ncatlab.org/nlab/show/monoidal+bicategory"><strong>symmetric monoidal bicategory</strong></a>. This is not a 1-category on the nose, since there are some issues with the bracketing of parameters after sequential composition that make associativity hold only up to isomorphism. <a href="#fnref:4" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:5" role="doc-endnote">
<p>Actually, there’s a useful generalization of the (parametrized) lens definition, called (parametrized) optics, which allows this. Optics also have other operational advantages over lenses and make it possible to expand the “classical” definition of Open Games to Bayesian Game Theory and more. <a href="#fnref:5" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>Daniele PalombiThe first installment of a multi-part series demystifying the underlying mechanics of the open games engine in a simple manner.Building a Neural Network from First Principles using Free Categories and Para(Optic)2024-04-15T00:00:00+00:002024-04-15T00:00:00+00:00https://cybercat-institute.github.io//2024/04/15/neural-network-first-principles<h2>Introduction</h2>
<p>Category theory for machine learning has been a big topic recently, both with <a href="https://arxiv.org/abs/2403.13001">Bruno’s thesis</a> dropping, and the <a href="https://arxiv.org/abs/2402.15332">paper on using the Para construction for deep learning</a>.</p>
<p>In this post we will look at how dependent types can allow us to almost effortlessly implement the category theory directly, opening up a path to new generalisations.</p>
<p>I will be making heavy use of Tatsuya Hirose’s <a href="https://zenn.dev/lotz/articles/14458f024674e14f4134">code that implements the Para(Optic) construction in Haskell</a>. Our goal here is to show that when we make the category theory in the code explicit, it becomes a powerful scaffolding that lets us structure our program.</p>
<p>All in all, our goal is to formulate this: a simple neural network with static types enforcing the parameters and the input and output dimensions.</p>
<div class="language-idris highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kr">import</span> <span class="nn">Data.Fin</span>
<span class="kr">import</span> <span class="nn">Data.Vect</span>
<span class="nf">model :</span> <span class="kt">GPath</span> <span class="kt">ParaLensTensor</span> <span class="p">[</span><span class="o"><</span> <span class="p">[</span><span class="mf">4</span><span class="p">,</span> <span class="mf">2</span><span class="p">],</span> <span class="p">[</span><span class="mf">4</span><span class="p">],</span> <span class="p">[</span><span class="mf">0</span><span class="p">],</span> <span class="p">[</span><span class="mf">2</span><span class="p">,</span> <span class="mf">4</span><span class="p">],</span> <span class="p">[</span><span class="mf">2</span><span class="p">],</span> <span class="p">[</span><span class="mf">0</span><span class="p">]]</span> <span class="p">[</span><span class="mf">2</span><span class="p">]</span> <span class="p">[</span><span class="mf">2</span><span class="p">]</span>
<span class="n">model</span> <span class="o">=</span> <span class="p">[</span><span class="o"><</span> <span class="n">linear</span><span class="p">,</span> <span class="n">bias</span><span class="p">,</span> <span class="n">relu</span><span class="p">,</span> <span class="n">linear</span><span class="p">,</span> <span class="n">bias</span><span class="p">,</span> <span class="n">relu</span><span class="p">]</span>
</code></pre></div></div>
<p>The crucial part is the $\mathbf{Para}$ construction, which lets us accumulate parameters along the composition of edges. This lets us state the parameters of each edge separately, and then compose them into a larger whole as we go along.</p>
<h2>Graded monoids</h2>
<p>$\mathbf{Para}$ forms a graded category, and in order to understand what this is, we will start with a graded monoid first.</p>
<div class="language-idris highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kr">namespace</span> <span class="nc">Monoid
</span> <span class="kr">data</span> <span class="kt">Env</span> <span class="o">:</span> <span class="p">(</span><span class="n">par</span> <span class="o">-></span> <span class="kt">Type</span><span class="p">)</span> <span class="o">-></span> <span class="kt">List </span><span class="n">par</span> <span class="o">-></span> <span class="kt">Type </span><span class="kr">where</span>
<span class="c1">-- Empty list</span>
<span class="kt">Nil</span> <span class="o">:</span> <span class="kt">Env</span> <span class="n">f</span> <span class="kt">[]</span>
<span class="c1">-- Add an element to the list, and accumulate its parameter</span>
<span class="p">(</span><span class="o">::</span><span class="p">)</span> <span class="o">:</span> <span class="p">{</span><span class="n">f</span> <span class="o">:</span> <span class="n">par</span> <span class="o">-></span> <span class="kt">Type</span><span class="p">}</span> <span class="o">-></span> <span class="n">f</span> <span class="n">n</span> <span class="o">-></span> <span class="kt">Env</span> <span class="n">f</span> <span class="n">ns</span> <span class="o">-></span> <span class="kt">Env</span> <span class="n">f</span> <span class="p">(</span><span class="n">n</span><span class="o">::</span><span class="n">ns</span><span class="p">)</span>
<span class="c1">-- Compare this to the standard free monoid </span>
<span class="c1">-- data List : Type -> Type where </span>
<span class="c1">-- Nil : List a </span>
<span class="c1">-- (::) : a -> List a -> List a </span>
</code></pre></div></div>
<p>I used this datatype in a <a href="https://zanzix.github.io/posts/stlc-idris.html">previous blog post</a> where it is used to represent variable environments.</p>
<p>We can use it for much more, though. For instance, let’s say that we want to aggregate a series of vectors, and later perform some computation on them.</p>
<p>Our free graded monoid lets us accumulate a list of vectors, while keeping their sizes in a type-level list.</p>
<div class="language-idris highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="kt">Vec</span> <span class="o">:</span> <span class="kt">Nat </span><span class="o">-></span> <span class="kt">Type </span>
<span class="kt">Vec</span> <span class="n">n</span> <span class="o">=</span> <span class="kt">Fin</span> <span class="n">n</span> <span class="o">-></span> <span class="kt">Double</span>
<span class="n">f1</span> <span class="o">:</span> <span class="kt">Vec</span> <span class="mf">1</span>
<span class="n">f2</span> <span class="o">:</span> <span class="kt">Vec</span> <span class="mf">2</span>
<span class="n">f3</span> <span class="o">:</span> <span class="kt">Vec</span> <span class="mf">3</span>
<span class="n">fins</span> <span class="o">:</span> <span class="kt">Env</span> <span class="kt">Vec</span> <span class="p">[</span><span class="mf">1</span><span class="p">,</span> <span class="mf">2</span><span class="p">,</span> <span class="mf">3</span><span class="p">]</span>
<span class="n">fins</span> <span class="o">=</span> <span class="p">[</span><span class="n">f1</span><span class="p">,</span> <span class="n">f2</span><span class="p">,</span> <span class="n">f3</span><span class="p">]</span>
</code></pre></div></div>
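<p>For comparison (my own sketch, not from the post), the same free graded monoid can be written in Haskell with <code class="language-plaintext highlighter-rouge">DataKinds</code> and GADTs; <code class="language-plaintext highlighter-rouge">Proxy</code> stands in for the <code class="language-plaintext highlighter-rouge">Vec</code> family, since the point here is only the type-level list of grades:</p>

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures, PolyKinds, TypeOperators #-}

import Data.Kind (Type)
import Data.Proxy (Proxy (..))

-- A free graded monoid: a list whose elements' grades accumulate
-- in a type-level list, mirroring the Idris Env above.
data Env (f :: k -> Type) (ns :: [k]) where
  ENil  :: Env f '[]
  ECons :: f n -> Env f ns -> Env f (n ': ns)

envLength :: Env f ns -> Int
envLength ENil         = 0
envLength (ECons _ es) = 1 + envLength es

-- Grades 1, 2, 3 tracked in the type, as in the Idris `fins` example.
grades :: Env Proxy '[1, 2, 3]
grades = ECons Proxy (ECons Proxy (ECons Proxy ENil))
```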
<p>As we will soon see, $\mathbf{Para}$ works the same way, but instead of forming a graded monoid, it forms a graded category.</p>
<h2>Free categories</h2>
<p>Before we look at free graded categories, let’s first look at how to work with a plain free category. I’ve used them in a <a href="https://zanzix.github.io/posts/bcc.html">previous blog post</a>.
A nice trick that I’ve learned from André Videla is that we can use Idris list notation with free categories too; we just need to name the constructors appropriately.</p>
<div class="language-idris highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">Graph :</span> <span class="kt">Type </span><span class="o">-></span> <span class="kt">Type </span>
<span class="kt">Graph</span> <span class="n">obj</span> <span class="o">=</span> <span class="n">obj</span> <span class="o">-></span> <span class="n">obj</span> <span class="o">-></span> <span class="kt">Type </span>
<span class="c1">-- The category of types and functions</span>
<span class="nf">Set :</span> <span class="kt">Graph</span> <span class="kt">Type
Set</span> <span class="n">a</span> <span class="n">b</span> <span class="o">=</span> <span class="n">a</span> <span class="o">-></span> <span class="n">b</span>
<span class="kr">namespace</span> <span class="kt">Cat</span>
<span class="kr">data</span> <span class="kt">Path</span> <span class="o">:</span> <span class="kt">Graph</span> <span class="n">obj</span> <span class="o">-></span> <span class="kt">Graph</span> <span class="n">obj</span> <span class="kr">where</span>
<span class="c1">-- Empty path</span>
<span class="kt">Nil</span> <span class="o">:</span> <span class="kt">Path</span> <span class="n">g</span> <span class="n">a</span> <span class="n">a</span>
<span class="c1">-- Add an edge to the path </span>
<span class="p">(</span><span class="o">::</span><span class="p">)</span> <span class="o">:</span> <span class="n">g</span> <span class="n">a</span> <span class="n">b</span> <span class="o">-></span> <span class="kt">Path</span> <span class="n">g</span> <span class="n">b</span> <span class="n">c</span> <span class="o">-></span> <span class="kt">Path</span> <span class="n">g</span> <span class="n">a</span> <span class="n">c</span>
</code></pre></div></div>
<p>While vectors form graded monoids, matrices form categories.</p>
<div class="language-idris highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="kt">Matrix</span> <span class="o">:</span> <span class="kt">Graph</span> <span class="kt">Nat </span>
<span class="kt">Matrix</span> <span class="n">n</span> <span class="n">m</span> <span class="o">=</span> <span class="kt">Fin</span> <span class="n">n</span> <span class="o">-></span> <span class="kt">Fin</span> <span class="n">m</span> <span class="o">-></span> <span class="kt">Double</span>
<span class="n">mat1</span> <span class="o">:</span> <span class="kt">Matrix</span> <span class="mf">2</span> <span class="mf">3</span>
<span class="n">mat2</span> <span class="o">:</span> <span class="kt">Matrix</span> <span class="mf">3</span> <span class="mf">1</span>
<span class="n">matrixPath</span> <span class="o">:</span> <span class="kt">Path</span> <span class="kt">Matrix</span> <span class="mf">2</span> <span class="mf">1</span>
<span class="n">matrixPath</span> <span class="o">=</span> <span class="p">[</span><span class="n">mat1</span><span class="p">,</span> <span class="n">mat2</span><span class="p">]</span>
<span class="c1">-- matrixPath = mat1 :: mat2 :: Nil</span>
</code></pre></div></div>
<p>Just as we did at the start of the blog post, we are using the inbuilt syntactic sugar to represent a list of edges. We will now generalise from free paths to their parameterised variant!</p>
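<p>As a quick sanity check, here is a path in the free category over <code>Set</code>, together with a fold that composes it back down to an ordinary function. (The names <code>double</code>, <code>render</code>, <code>funPath</code> and <code>evalPath</code> are our own illustrations, not part of the original code.)</p>
<div class="language-idris highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- Two edges in the graph of types and functions
double : Set Nat Nat
double n = n + n

render : Set Nat String
render n = show n

-- Their composite, as a path of edges
funPath : Path Set Nat String
funPath = [double, render]

-- Folding a path back into a single function
evalPath : Path Set a b -> a -> b
evalPath [] x = x
evalPath (f :: fs) x = evalPath fs (f x)
</code></pre></div></div>
<p>So <code>evalPath funPath</code> composes <code>double</code> and then <code>render</code>, left to right along the path.</p>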
<h2>Free graded categories</h2>
<p>A free graded category looks not unlike a free category, except now we are accumulating an additional parameter, just as we did with graded monoids:</p>
<div class="language-idris highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">ParGraph :</span> <span class="kt">Type </span><span class="o">-></span> <span class="kt">Type </span><span class="o">-></span> <span class="kt">Type </span>
<span class="kt">ParGraph</span> <span class="n">par</span> <span class="n">obj</span> <span class="o">=</span> <span class="n">par</span> <span class="o">-></span> <span class="n">obj</span> <span class="o">-></span> <span class="n">obj</span> <span class="o">-></span> <span class="kt">Type </span>
<span class="c1">-- A free graded path over a parameterised graph</span>
<span class="kr">data</span> <span class="kt">GPath</span> <span class="o">:</span> <span class="kt">ParGraph</span> <span class="n">par</span> <span class="n">obj</span> <span class="o">-></span> <span class="kt">ParGraph</span> <span class="p">(</span><span class="kt">List </span><span class="n">par</span><span class="p">)</span> <span class="n">obj</span> <span class="kr">where</span>
<span class="c1">-- Empty path, with an empty list of grades</span>
<span class="kt">Nil</span> <span class="o">:</span> <span class="kt">GPath</span> <span class="n">g</span> <span class="kt">[]</span> <span class="n">a</span> <span class="n">a</span>
<span class="c1">-- Add an edge to the path, and accumulate its parameter</span>
<span class="p">(</span><span class="o">::</span><span class="p">)</span> <span class="o">:</span> <span class="p">{</span><span class="n">g</span> <span class="o">:</span> <span class="n">par</span> <span class="o">-></span> <span class="n">obj</span> <span class="o">-></span> <span class="n">obj</span> <span class="o">-></span> <span class="kt">Type</span><span class="p">}</span> <span class="o">-></span> <span class="p">{</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span> <span class="o">:</span> <span class="n">obj</span><span class="p">}</span>
<span class="o">-></span> <span class="n">g</span> <span class="n">p</span> <span class="n">a</span> <span class="n">b</span> <span class="o">-></span> <span class="kt">GPath</span> <span class="n">g</span> <span class="n">ps</span> <span class="n">b</span> <span class="n">c</span> <span class="o">-></span> <span class="kt">GPath</span> <span class="n">g</span> <span class="p">(</span><span class="n">p</span> <span class="o">::</span> <span class="n">ps</span><span class="p">)</span> <span class="n">a</span> <span class="n">c</span>
</code></pre></div></div>
<p>So a graded path will take in a parameterised graph, and give back a path of edges with an accumulated parameter.
Where could we find such parameterised graphs? This is where the Para construction comes in.
Para takes a category $\mathcal C$ together with an action of a monoidal category $\mathcal M$ on it, i.e. a functor $\mathcal M \times \mathcal C \to \mathcal C$, and gives us a parameterised category over $\mathcal C$.</p>
<div class="language-idris highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- Para over a monoidal category C </span>
<span class="nf">Para :</span> <span class="p">(</span><span class="n">c</span> <span class="o">:</span> <span class="kt">Graph</span> <span class="n">obj</span><span class="p">)</span> <span class="o">-></span> <span class="p">(</span><span class="n">act</span> <span class="o">:</span> <span class="n">par</span> <span class="o">-></span> <span class="n">obj</span> <span class="o">-></span> <span class="n">obj</span><span class="p">)</span> <span class="o">-></span> <span class="kt">ParGraph</span> <span class="n">par</span> <span class="n">obj</span>
<span class="kt">Para</span> <span class="n">c</span> <span class="n">act</span> <span class="n">p</span> <span class="n">x</span> <span class="n">y</span> <span class="o">=</span> <span class="p">(</span><span class="n">p</span> <span class="p">`</span><span class="n">act</span><span class="p">`</span> <span class="n">x</span><span class="p">)</span> <span class="p">`</span><span class="n">c</span><span class="p">`</span> <span class="n">y</span>
</code></pre></div></div>
<p>In other words, we have morphisms and an accumulating parameter.
A simple example is the graded co-reader comonad, also known as the pair comonad.</p>
<div class="language-idris highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">ParaSet :</span> <span class="kt">ParGraph</span> <span class="kt">Type Type </span>
<span class="kt">ParaSet</span> <span class="n">p</span> <span class="n">a</span> <span class="n">b</span> <span class="o">=</span> <span class="kt">Para</span> <span class="kt">Set</span> <span class="kt">Pair</span> <span class="n">p</span> <span class="n">a</span> <span class="n">b</span>
<span class="c1">-- A function Nat -> Double, parameterised by Nat</span>
<span class="nf">pair1 :</span> <span class="kt">ParaSet</span> <span class="kt">Nat Nat Double</span>
<span class="c1">-- A function Double -> Int, parameterised by String</span>
<span class="nf">pair2 :</span> <span class="kt">ParaSet</span> <span class="kt">String</span> <span class="kt">Double</span> <span class="kt">Int</span>
<span class="c1">-- A function Nat -> Int, parameterised by [Nat, String]</span>
<span class="nf">ex :</span> <span class="kt">GPath</span> <span class="kt">ParaSet</span> <span class="p">[</span><span class="kt">Nat</span><span class="p">,</span> <span class="kt">String</span><span class="p">]</span> <span class="kt">Nat Int</span>
<span class="n">ex</span> <span class="o">=</span> <span class="p">[</span><span class="n">pair1</span><span class="p">,</span> <span class="n">pair2</span><span class="p">]</span>
</code></pre></div></div>
<p>It works a lot like the standard co-reader comonad, but it now accumulates parameters as we compose functions.</p>
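<p>To see the accumulation concretely, here is a sketch of an evaluator for such graded paths: given one parameter per edge, collected in a heterogeneous list, it runs the whole path as a single function. (Both <code>HList</code> and <code>runPara</code> are our own hypothetical helpers, and the overloaded <code>Nil</code>/<code>(::)</code> constructors rely on Idris’s namespace disambiguation.)</p>
<div class="language-idris highlighter-rouge"><div class="highlight"><pre class="highlight"><code>namespace HList
  -- A heterogeneous list, indexed by the list of its element types
  public export
  data HList : List Type -> Type where
    Nil  : HList []
    (::) : t -> HList ts -> HList (t :: ts)

-- Run a graded path of coreader maps, consuming one parameter per edge
runPara : GPath ParaSet ps a b -> HList ps -> a -> b
runPara [] [] x = x
runPara (f :: fs) (p :: ps) x = runPara fs ps (f (p, x))
</code></pre></div></div>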
<h2>The category of lenses</h2>
<p>Functional programmers tend to be familiar with lenses. They are often presented as coalgebras of the costate comonad, and their links to automatic differentiation <a href="https://www.philipzucker.com/reverse-mode-differentiation-is-kind-of-like-a-lens-ii/">are now well known</a>.</p>
<p>Monomorphic lenses correspond to the plain costate comonad, and polymorphic lenses correspond to the indexed version.</p>
<div class="language-idris highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- Monomorphic Lens</span>
<span class="nf">MLens :</span> <span class="kt">Type </span><span class="o">-></span> <span class="kt">Type </span><span class="o">-></span> <span class="kt">Type </span>
<span class="kt">MLens</span> <span class="n">s</span> <span class="n">a</span> <span class="o">=</span> <span class="p">(</span><span class="n">s</span> <span class="o">-></span> <span class="n">a</span><span class="p">,</span> <span class="p">(</span><span class="n">s</span><span class="p">,</span> <span class="n">a</span><span class="p">)</span> <span class="o">-></span> <span class="n">s</span><span class="p">)</span>
<span class="c1">-- Polymorphic Lens, Haskell-style</span>
<span class="nf">Lens :</span> <span class="kt">Type </span><span class="o">-></span> <span class="kt">Type </span><span class="o">-></span> <span class="kt">Type </span><span class="o">-></span> <span class="kt">Type </span><span class="o">-></span> <span class="kt">Type
Lens</span> <span class="n">s</span> <span class="n">t</span> <span class="n">a</span> <span class="n">b</span> <span class="o">=</span> <span class="p">(</span><span class="n">s</span> <span class="o">-></span> <span class="n">a</span><span class="p">,</span> <span class="p">(</span><span class="n">s</span><span class="p">,</span> <span class="n">b</span><span class="p">)</span> <span class="o">-></span> <span class="n">t</span><span class="p">)</span>
</code></pre></div></div>
<p>Idris allows us to bundle up the arguments for a polymorphic lens into a pair, sometimes called a boundary. This will help us form the category of parametric lenses more cleanly, as well as cut down on the number of types that we need to wrangle.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Boundary : Type
Boundary = (Type, Type)
-- Polymorphic lenses are morphisms of boundaries
Lens : Boundary -> Boundary -> Type
Lens (s, t) (a, b) = (s -> a, (s, b) -> t)
</code></pre></div></div>
<p>Both monomorphic and polymorphic lenses form categories. But before we look at them, let’s generalise our notion of lens away from $\mathbf{Set}$ and towards arbitrary (cartesian) monoidal categories.</p>
<p>In other words, given a cartesian monoidal category $\mathcal C$, we want to form the category $\mathbf{Lens} (\mathcal C)$ of lenses over $\mathcal C$.</p>
<div class="language-idris highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- take a category C, and a cartesian monoidal product, to give back the category Lens(C) </span>
<span class="nf">LensC :</span> <span class="p">(</span><span class="n">c</span> <span class="o">:</span> <span class="kt">Graph</span> <span class="n">obj</span><span class="p">)</span> <span class="o">-></span> <span class="p">(</span><span class="n">ten</span><span class="o">:</span> <span class="n">obj</span> <span class="o">-></span> <span class="n">obj</span> <span class="o">-></span> <span class="n">obj</span><span class="p">)</span> <span class="o">-></span> <span class="kt">Graph</span> <span class="n">obj</span>
<span class="kt">LensC</span> <span class="n">c</span> <span class="n">ten</span> <span class="n">s</span> <span class="n">a</span> <span class="o">=</span> <span class="p">(</span><span class="n">s</span> <span class="p">`</span><span class="n">c</span><span class="p">`</span> <span class="n">a</span><span class="p">,</span> <span class="p">(</span><span class="n">s</span> <span class="p">`</span><span class="n">ten</span><span class="p">`</span> <span class="n">a</span><span class="p">)</span> <span class="p">`</span><span class="n">c</span><span class="p">`</span> <span class="n">s</span><span class="p">)</span>
</code></pre></div></div>
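<p>As a sanity check that lenses really do compose categorically, here is what identity and composition look like for the monomorphic case, written against the <code>MLens</code> definition above (the names <code>idLens</code> and <code>composeLens</code> are ours, a sketch rather than the post’s implementation):</p>
<div class="language-idris highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- Identity lens: get is the identity, put returns the new value
idLens : MLens a a
idLens = (id, \(s, a) => a)

-- Compose two lenses: get runs forwards, put runs backwards
composeLens : MLens a b -> MLens b c -> MLens a c
composeLens (get1, put1) (get2, put2) =
  (get2 . get1, \(a, c) => put1 (a, put2 (get1 a, c)))
</code></pre></div></div>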
<p>We then take $\mathbf{Para}$ of this construction, giving us the category $\mathbf{Para} (\mathbf{Lens} (\mathcal C))$.</p>
<div class="language-idris highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">ParaLensSet :</span> <span class="kt">ParGraph</span> <span class="kt">Type Type </span>
<span class="kt">ParaLensSet</span> <span class="n">p</span> <span class="n">s</span> <span class="n">t</span> <span class="o">=</span> <span class="kt">Para</span> <span class="p">(</span><span class="kt">LensC</span> <span class="kt">Set</span> <span class="kt">Pair</span><span class="p">)</span> <span class="kt">Pair</span> <span class="n">p</span> <span class="n">s</span> <span class="n">t</span>
</code></pre></div></div>
<p>We now have all the theoretical pieces together. At this point, we could simply implement $\mathbf{Para} (\mathbf{Lens} (\mathbf{Set}))$, which would give us the morphisms of our neural network. However, there is one more trick up our sleeve - rather than working in the category of sets, we would like to work in the category of vector spaces.</p>
<p>This means that we will parameterise the above construction to work over some monoidal functor $\mathcal C \to \mathbf{Set}$.</p>
<div class="language-idris highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">ParaLensF :</span> <span class="p">(</span><span class="n">f</span> <span class="o">:</span> <span class="n">k</span> <span class="o">-></span> <span class="kt">Type</span><span class="p">)</span> <span class="o">-></span> <span class="kt">ParGraph</span> <span class="n">k</span> <span class="n">k</span>
<span class="kt">ParaLensF</span> <span class="n">f</span> <span class="n">p</span> <span class="n">m</span> <span class="n">n</span> <span class="o">=</span> <span class="kt">ParaLensSet</span> <span class="p">(</span><span class="n">f</span> <span class="n">p</span><span class="p">)</span> <span class="p">(</span><span class="n">f</span> <span class="n">m</span><span class="p">)</span> <span class="p">(</span><span class="n">f</span> <span class="n">n</span><span class="p">)</span>
</code></pre></div></div>
<p>And now, let us proceed to do machine learning.</p>
<h2>Tensor algebra from first principles</h2>
<p>First we will introduce the type of tensors of arbitrary rank. Our first instinct would be to do this with a function</p>
<div class="language-idris highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">Tensor' :</span> <span class="kt">List Nat </span><span class="o">-></span> <span class="kt">Type </span>
<span class="kt">Tensor'</span> <span class="kt">[]</span> <span class="o">=</span> <span class="kt">Double</span>
<span class="kt">Tensor'</span> <span class="p">(</span><span class="n">n</span> <span class="o">::</span> <span class="n">ns</span><span class="p">)</span> <span class="o">=</span> <span class="kt">Fin</span> <span class="n">n</span> <span class="o">-></span> <span class="kt">Tensor'</span> <span class="n">ns</span>
</code></pre></div></div>
<p>But unfortunately this would interfere with type inference down the line. Dependently typed languages tend to struggle to infer types whose codomain contains arbitrary computation. This is what Conor McBride calls “green slime”, and it is one of the major pitfalls that functional programmers encounter when they make the jump to dependent types.</p>
<p>For this reason, we will represent our rank-n tensors using a datatype, which will allow Idris to infer the types much more easily. Luckily, tensors are easily represented using an alternative datatype that’s popular in Haskell.</p>
<div class="language-idris highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kr">data</span> <span class="kt">Tensor</span> <span class="o">:</span> <span class="kt">List Nat </span><span class="o">-></span> <span class="kt">Type </span><span class="kr">where</span>
<span class="kt">Scalar</span> <span class="o">:</span> <span class="kt">Double</span> <span class="o">-></span> <span class="kt">Tensor</span> <span class="kt">Nil</span>
<span class="kt">Dim</span> <span class="o">:</span> <span class="kt">Vect</span> <span class="n">n</span> <span class="p">(</span><span class="kt">Tensor</span> <span class="n">ns</span><span class="p">)</span> <span class="o">-></span> <span class="kt">Tensor</span> <span class="p">(</span><span class="n">n</span> <span class="o">::</span> <span class="n">ns</span><span class="p">)</span>
</code></pre></div></div>
<p>This is essentially a nesting of vectors, which accumulates their sizes.</p>
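<p>For instance, a 2×3 matrix is a rank-2 tensor: a vector of two vectors of three scalars each. (The particular values here are just an illustration.)</p>
<div class="language-idris highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- A 2x3 matrix as a nesting of vectors
mat : Tensor [2, 3]
mat = Dim [ Dim [Scalar 1, Scalar 2, Scalar 3]
          , Dim [Scalar 4, Scalar 5, Scalar 6] ]
</code></pre></div></div>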
<p>All together, our datatype of parameterised lenses over tensors becomes</p>
<div class="language-idris highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">ParaLensTensor :</span> <span class="kt">ParGraph</span> <span class="p">(</span><span class="kt">List Nat</span><span class="p">)</span> <span class="p">(</span><span class="kt">List Nat</span><span class="p">)</span>
<span class="kt">ParaLensTensor</span> <span class="n">pars</span> <span class="n">ms</span> <span class="n">ns</span> <span class="o">=</span> <span class="kt">ParaLensF</span> <span class="kt">Tensor</span> <span class="n">pars</span> <span class="n">ms</span> <span class="n">ns</span>
</code></pre></div></div>
<p>We can now start writing neural networks. I’ll be mostly adapting <a href="https://zenn.dev/lotz/articles/14458f024674e14f4134">Tatsuya’s code</a> in the following section. The full code for our project can be found <a href="https://github.com/zanzix/idris-neural-net">here</a>, and I’ll only include the most interesting bits.</p>
<p>Unlike the original code, we will be using a heterogeneous list - rather than nested tuples - to keep track of all of our parameters, which makes the resulting dimensions much easier to track.</p>
<div class="language-idris highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">linear :</span> <span class="p">{</span><span class="n">n</span><span class="p">,</span> <span class="n">m</span> <span class="o">:</span> <span class="kt">Nat</span><span class="p">}</span> <span class="o">-></span> <span class="kt">ParaLensTensor</span> <span class="p">[</span><span class="n">m</span><span class="p">,</span> <span class="n">n</span><span class="p">]</span> <span class="p">[</span><span class="n">n</span><span class="p">]</span> <span class="p">[</span><span class="n">m</span><span class="p">]</span>
<span class="n">linear</span> <span class="o">=</span> <span class="p">(</span><span class="n">getter</span><span class="p">,</span> <span class="n">setter</span><span class="p">)</span> <span class="kr">where</span>
<span class="n">getter</span> <span class="o">:</span> <span class="p">(</span><span class="kt">Tensor</span> <span class="p">[</span><span class="n">m</span><span class="p">,</span> <span class="n">n</span><span class="p">],</span> <span class="kt">Tensor</span> <span class="p">[</span><span class="n">n</span><span class="p">])</span> <span class="o">-></span> <span class="kt">Tensor</span> <span class="p">[</span><span class="n">m</span><span class="p">]</span>
<span class="n">getter</span> <span class="p">(</span><span class="n">w</span><span class="p">,</span> <span class="n">x</span><span class="p">)</span> <span class="o">=</span> <span class="n">joinM</span> <span class="n">w</span> <span class="n">x</span>
<span class="n">setter</span> <span class="o">:</span> <span class="p">((</span><span class="kt">Tensor</span> <span class="p">[</span><span class="n">m</span><span class="p">,</span> <span class="n">n</span><span class="p">],</span> <span class="kt">Tensor</span> <span class="p">[</span><span class="n">n</span><span class="p">]),</span> <span class="kt">Tensor</span> <span class="p">[</span><span class="n">m</span><span class="p">])</span> <span class="o">-></span> <span class="p">(</span><span class="kt">Tensor</span> <span class="p">[</span><span class="n">m</span><span class="p">,</span> <span class="n">n</span><span class="p">],</span> <span class="kt">Tensor</span> <span class="p">[</span><span class="n">n</span><span class="p">])</span>
<span class="n">setter</span> <span class="p">((</span><span class="n">w</span><span class="p">,</span> <span class="n">x</span><span class="p">),</span> <span class="n">y</span><span class="p">)</span> <span class="o">=</span> <span class="p">(</span><span class="n">outer</span> <span class="n">y</span> <span class="n">x</span><span class="p">,</span> <span class="n">joinM</span> <span class="p">(</span><span class="n">dist</span> <span class="n">w</span><span class="p">)</span> <span class="n">y</span><span class="p">)</span>
<span class="nf">bias :</span> <span class="p">{</span><span class="n">n</span> <span class="o">:</span> <span class="kt">Nat</span><span class="p">}</span> <span class="o">-></span> <span class="kt">ParaLensTensor</span> <span class="p">[</span><span class="n">n</span><span class="p">]</span> <span class="p">[</span><span class="n">n</span><span class="p">]</span> <span class="p">[</span><span class="n">n</span><span class="p">]</span>
<span class="n">bias</span> <span class="o">=</span> <span class="p">(</span><span class="n">getter</span><span class="p">,</span> <span class="n">setter</span><span class="p">)</span> <span class="kr">where</span>
<span class="n">getter</span> <span class="o">:</span> <span class="p">(</span><span class="kt">Tensor</span> <span class="p">[</span><span class="n">n</span><span class="p">],</span> <span class="kt">Tensor</span> <span class="p">[</span><span class="n">n</span><span class="p">])</span> <span class="o">-></span> <span class="kt">Tensor</span> <span class="p">[</span><span class="n">n</span><span class="p">]</span>
<span class="n">getter</span> <span class="p">(</span><span class="n">b</span><span class="p">,</span> <span class="n">x</span><span class="p">)</span> <span class="o">=</span> <span class="n">pointwise</span> <span class="p">(</span><span class="o">+</span><span class="p">)</span> <span class="n">x</span> <span class="n">b</span>
<span class="n">setter</span> <span class="o">:</span> <span class="p">((</span><span class="kt">Tensor</span> <span class="p">[</span><span class="n">n</span><span class="p">],</span> <span class="kt">Tensor</span> <span class="p">[</span><span class="n">n</span><span class="p">]),</span> <span class="kt">Tensor</span> <span class="p">[</span><span class="n">n</span><span class="p">])</span> <span class="o">-></span> <span class="p">(</span><span class="kt">Tensor</span> <span class="p">[</span><span class="n">n</span><span class="p">],</span> <span class="kt">Tensor</span> <span class="p">[</span><span class="n">n</span><span class="p">])</span>
<span class="n">setter</span> <span class="p">((</span><span class="n">b</span><span class="p">,</span> <span class="n">x</span><span class="p">),</span> <span class="n">y</span><span class="p">)</span> <span class="o">=</span> <span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span>
<span class="nf">relu :</span> <span class="kt">ParaLensTensor</span> <span class="p">[</span><span class="mf">0</span><span class="p">]</span> <span class="p">[</span><span class="n">n</span><span class="p">]</span> <span class="p">[</span><span class="n">n</span><span class="p">]</span>
<span class="n">relu</span> <span class="o">=</span> <span class="p">(</span><span class="n">getter</span><span class="p">,</span> <span class="n">setter</span><span class="p">)</span> <span class="kr">where</span>
<span class="n">getter</span> <span class="o">:</span> <span class="p">(</span><span class="kt">Tensor</span> <span class="p">[</span><span class="mf">0</span><span class="p">],</span> <span class="kt">Tensor</span> <span class="p">[</span><span class="n">n</span><span class="p">])</span> <span class="o">-></span> <span class="kt">Tensor</span> <span class="p">[</span><span class="n">n</span><span class="p">]</span>
<span class="n">getter</span> <span class="p">(</span><span class="kr">_</span><span class="p">,</span> <span class="n">x</span><span class="p">)</span> <span class="o">=</span> <span class="n">dvmap</span> <span class="p">(</span><span class="nb">max </span><span class="mf">0.0</span><span class="p">)</span> <span class="n">x</span>
<span class="n">setter</span> <span class="o">:</span> <span class="p">((</span><span class="kt">Tensor</span> <span class="p">[</span><span class="mf">0</span><span class="p">],</span> <span class="kt">Tensor</span> <span class="p">[</span><span class="n">n</span><span class="p">]),</span> <span class="kt">Tensor</span> <span class="p">[</span><span class="n">n</span><span class="p">])</span> <span class="o">-></span> <span class="p">(</span><span class="kt">Tensor</span> <span class="p">[</span><span class="mf">0</span><span class="p">],</span> <span class="kt">Tensor</span> <span class="p">[</span><span class="n">n</span><span class="p">])</span>
<span class="n">setter</span> <span class="p">((</span><span class="kt">Dim</span> <span class="kt">[]</span><span class="p">,</span> <span class="n">x</span><span class="p">),</span> <span class="n">y</span><span class="p">)</span> <span class="o">=</span> <span class="p">(</span><span class="kt">Dim</span> <span class="kt">[]</span><span class="p">,</span> <span class="n">pointwise</span> <span class="p">(</span><span class="o">*</span><span class="p">)</span> <span class="n">y</span> <span class="p">(</span><span class="n">dvmap</span> <span class="n">step</span> <span class="n">x</span><span class="p">))</span> <span class="kr">where</span>
<span class="n">step</span> <span class="o">:</span> <span class="kt">Double</span> <span class="o">-></span> <span class="kt">Double</span>
<span class="n">step</span> <span class="n">x</span> <span class="o">=</span> <span class="kr">if</span> <span class="n">x</span> <span class="o">></span> <span class="mf">0</span> <span class="kr">then</span> <span class="mf">1</span> <span class="kr">else</span> <span class="mf">0</span>
<span class="nf">learningRate :</span> <span class="kt">ParaLensTensor</span> <span class="p">[</span><span class="mf">0</span><span class="p">]</span> <span class="kt">[]</span> <span class="p">[</span><span class="mf">0</span><span class="p">]</span>
<span class="n">learningRate</span> <span class="o">=</span> <span class="p">(</span><span class="nb">const </span><span class="p">(</span><span class="kt">Dim</span> <span class="kt">[]</span><span class="p">),</span> <span class="n">setter</span><span class="p">)</span> <span class="kr">where</span>
<span class="n">setter</span> <span class="o">:</span> <span class="p">((</span><span class="kt">Tensor</span> <span class="p">[</span><span class="mf">0</span><span class="p">],</span> <span class="kt">Tensor</span> <span class="kt">[]</span><span class="p">),</span> <span class="kt">Tensor</span> <span class="p">[</span><span class="mf">0</span><span class="p">])</span> <span class="o">-></span> <span class="p">(</span><span class="kt">Tensor</span> <span class="p">[</span><span class="mf">0</span><span class="p">],</span> <span class="kt">Tensor</span> <span class="kt">[]</span><span class="p">)</span>
<span class="n">setter</span> <span class="p">((</span><span class="kr">_</span><span class="p">,</span> <span class="p">(</span><span class="kt">Scalar</span> <span class="n">loss</span><span class="p">)),</span> <span class="kr">_</span><span class="p">)</span> <span class="o">=</span> <span class="p">(</span><span class="kt">Dim</span> <span class="kt">[]</span><span class="p">,</span> <span class="kt">Scalar</span> <span class="p">(</span><span class="o">-</span><span class="mf">0.2</span> <span class="o">*</span> <span class="n">loss</span><span class="p">))</span>
<span class="nf">crossEntropyLoss :</span> <span class="kt">ParaLensTensor</span> <span class="p">[</span><span class="n">n</span><span class="p">]</span> <span class="p">[</span><span class="n">n</span><span class="p">]</span> <span class="kt">[]</span>
<span class="n">crossEntropyLoss</span> <span class="o">=</span> <span class="p">(</span><span class="n">getter</span><span class="p">,</span> <span class="n">setter</span><span class="p">)</span> <span class="kr">where</span>
<span class="n">getter</span> <span class="o">:</span> <span class="p">(</span><span class="kt">Tensor</span> <span class="p">[</span><span class="n">n</span><span class="p">],</span> <span class="kt">Tensor</span> <span class="p">[</span><span class="n">n</span><span class="p">])</span> <span class="o">-></span> <span class="kt">Tensor</span> <span class="kt">[]</span>
<span class="n">getter</span> <span class="p">(</span><span class="n">y'</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span> <span class="o">=</span>
<span class="kr">let</span> <span class="kt">Scalar</span> <span class="n">dot'</span> <span class="o">=</span> <span class="n">dot</span> <span class="n">y'</span> <span class="n">y</span> <span class="kr">in</span>
<span class="kt">Scalar</span> <span class="p">(</span><span class="nb">log </span><span class="p">(</span><span class="n">sumElem</span> <span class="p">(</span><span class="n">dvmap</span> <span class="nb">exp </span><span class="n">y</span><span class="p">))</span> <span class="o">-</span> <span class="n">dot'</span><span class="p">)</span>
<span class="n">setter</span> <span class="o">:</span> <span class="p">((</span><span class="kt">Tensor</span> <span class="p">[</span><span class="n">n</span><span class="p">],</span> <span class="kt">Tensor</span> <span class="p">[</span><span class="n">n</span><span class="p">]),</span> <span class="kt">Tensor</span> <span class="kt">[]</span><span class="p">)</span> <span class="o">-></span> <span class="p">(</span><span class="kt">Tensor</span> <span class="p">[</span><span class="n">n</span><span class="p">],</span> <span class="kt">Tensor</span> <span class="p">[</span><span class="n">n</span><span class="p">])</span>
<span class="n">setter</span> <span class="p">((</span><span class="n">y'</span><span class="p">,</span> <span class="n">y</span><span class="p">),</span> <span class="p">(</span><span class="kt">Scalar</span> <span class="n">z</span><span class="p">))</span> <span class="o">=</span> <span class="kr">let</span>
<span class="n">expY</span> <span class="o">=</span> <span class="n">dvmap</span> <span class="nb">exp </span><span class="n">y</span>
<span class="n">sumExpY</span> <span class="o">=</span> <span class="n">sumElem</span> <span class="n">expY</span> <span class="kr">in</span>
<span class="p">(</span><span class="n">dvmap</span> <span class="p">(</span><span class="o">*</span> <span class="p">(</span><span class="o">-</span><span class="n">z</span><span class="p">))</span> <span class="n">y</span><span class="p">,</span>
<span class="n">dvmap</span> <span class="p">(</span><span class="o">*</span> <span class="n">z</span><span class="p">)</span> <span class="p">(</span>
<span class="p">((</span><span class="n">pointwise</span> <span class="p">(</span><span class="o">-</span><span class="p">)</span> <span class="p">(</span><span class="n">dvmap</span> <span class="p">(</span><span class="o">/</span><span class="n">sumExpY</span><span class="p">)</span> <span class="n">expY</span><span class="p">)</span> <span class="n">y'</span><span class="p">))))</span>
<span class="c1">-- Our final model: parameters source target</span>
<span class="nf">model :</span> <span class="kt">GPath</span> <span class="kt">ParaLensTensor</span> <span class="p">[</span><span class="o"><</span> <span class="p">[</span><span class="mf">4</span><span class="p">,</span> <span class="mf">2</span><span class="p">],</span> <span class="p">[</span><span class="mf">4</span><span class="p">],</span> <span class="p">[</span><span class="mf">0</span><span class="p">],</span> <span class="p">[</span><span class="mf">2</span><span class="p">,</span> <span class="mf">4</span><span class="p">],</span> <span class="p">[</span><span class="mf">2</span><span class="p">],</span> <span class="p">[</span><span class="mf">0</span><span class="p">]]</span> <span class="p">[</span><span class="mf">2</span><span class="p">]</span> <span class="p">[</span><span class="mf">2</span><span class="p">]</span>
<span class="n">model</span> <span class="o">=</span> <span class="p">[</span><span class="o"><</span> <span class="n">linear</span><span class="p">,</span> <span class="n">bias</span><span class="p">,</span> <span class="n">relu</span><span class="p">,</span> <span class="n">linear</span><span class="p">,</span> <span class="n">bias</span><span class="p">,</span> <span class="n">relu</span><span class="p">]</span>
</code></pre></div></div>
<p>All that remains is to implement an algebra for this structure. Normally we would use the generic recursion schemes machinery to do this, but for now we will implement a one-off fold specialized to graded paths.</p>
<div class="language-idris highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- Evaluate the free graded category over ParaLensTensor</span>
<span class="nf">eval :</span> <span class="kt">GPath</span> <span class="kt">ParaLensTensor</span> <span class="n">ps</span> <span class="n">s</span> <span class="n">t</span> <span class="o">-></span> <span class="kt">ParaLensTensorEnvS</span> <span class="n">ps</span> <span class="n">s</span> <span class="n">t</span>
<span class="n">eval</span> <span class="p">[</span><span class="o"><</span><span class="p">]</span> <span class="o">=</span> <span class="p">(</span><span class="nf">\</span><span class="p">(</span><span class="kr">_</span><span class="p">,</span> <span class="n">s</span><span class="p">)</span> <span class="o">=></span> <span class="n">s</span><span class="p">,</span> <span class="nf">\</span><span class="p">((</span><span class="n">l</span><span class="p">,</span> <span class="n">s'</span><span class="p">),</span> <span class="n">s</span><span class="p">)</span> <span class="o">=></span> <span class="p">([</span><span class="o"><</span><span class="p">],</span> <span class="n">s</span><span class="p">))</span>
<span class="n">eval</span> <span class="p">(</span><span class="n">es</span> <span class="o">:<</span> <span class="p">(</span><span class="n">fw</span><span class="p">,</span> <span class="n">bw</span><span class="p">))</span> <span class="o">=</span> <span class="kr">let</span> <span class="p">(</span><span class="n">fw'</span><span class="p">,</span> <span class="n">bw'</span><span class="p">)</span> <span class="o">=</span> <span class="n">eval</span> <span class="n">es</span> <span class="kr">in</span>
<span class="p">(</span><span class="nf">\</span><span class="p">((</span><span class="n">ps</span> <span class="o">:<</span> <span class="n">p</span><span class="p">),</span> <span class="n">s</span><span class="p">)</span> <span class="o">=></span> <span class="kr">let</span> <span class="n">b</span> <span class="o">=</span> <span class="n">fw'</span> <span class="p">(</span><span class="n">ps</span><span class="p">,</span> <span class="n">s</span><span class="p">)</span> <span class="kr">in</span> <span class="n">fw</span> <span class="p">(</span><span class="n">p</span><span class="p">,</span> <span class="n">b</span><span class="p">),</span>
<span class="p">(</span><span class="nf">\</span><span class="p">(((</span><span class="n">ps</span> <span class="o">:<</span> <span class="n">p</span><span class="p">),</span> <span class="n">s</span><span class="p">),</span> <span class="n">dt</span><span class="p">)</span> <span class="o">=></span> <span class="kr">let</span>
<span class="n">b</span> <span class="o">=</span> <span class="n">fw'</span> <span class="p">(</span><span class="n">ps</span><span class="p">,</span> <span class="n">s</span><span class="p">)</span>
<span class="p">(</span><span class="n">p'</span><span class="p">,</span> <span class="n">b'</span><span class="p">)</span> <span class="o">=</span> <span class="n">bw</span> <span class="p">((</span><span class="n">p</span><span class="p">,</span> <span class="n">b</span><span class="p">),</span> <span class="n">dt</span><span class="p">)</span>
<span class="p">(</span><span class="n">ps'</span><span class="p">,</span> <span class="n">s'</span><span class="p">)</span> <span class="o">=</span> <span class="n">bw'</span> <span class="p">((</span><span class="n">ps</span><span class="p">,</span> <span class="n">s</span><span class="p">),</span> <span class="n">b'</span><span class="p">)</span>
<span class="kr">in</span> <span class="p">(</span><span class="n">ps'</span> <span class="o">:<</span> <span class="n">p'</span><span class="p">,</span> <span class="n">s'</span><span class="p">)))</span>
</code></pre></div></div>
<p>It would actually be possible to write an individual algebra for $\mathbf{Lens} (\mathcal C)$ and $\mathbf{Para} (\mathcal C)$ and then compose them into an algebra $\mathbf{Para} (\mathbf{Lens} (\mathcal C))$, but we can leave that for a future blog post.</p>
<h2>Defunctionalizing and working with the FFI</h2>
<p>Running a neural network in Idris is obviously going to be slow compared to NumPy. However, since we’re working entirely with free categories, we don’t have to actually evaluate our functions in Idris!</p>
<p>What we can do is organise all of our functions into a signature, where each constructor corresponds to a primitive function in the target language. We could then use the FFI to interpret them, allowing us to get both the static guarantees of Idris and the performance of NumPy.</p>
<div class="language-idris highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kr">data</span> <span class="kt">TensorSig</span> <span class="o">:</span> <span class="kt">ParGraph</span> <span class="p">(</span><span class="kt">List Nat</span><span class="p">)</span> <span class="p">(</span><span class="kt">List Nat</span><span class="p">)</span> <span class="kr">where</span>
<span class="kt">Linear</span> <span class="o">:</span> <span class="kt">TensorSig</span> <span class="p">[</span><span class="n">m</span><span class="p">,</span> <span class="n">n</span><span class="p">]</span> <span class="p">[</span><span class="n">n</span><span class="p">]</span> <span class="p">[</span><span class="n">m</span><span class="p">]</span>
<span class="kt">Bias</span> <span class="o">:</span> <span class="kt">TensorSig</span> <span class="p">[</span><span class="n">n</span><span class="p">]</span> <span class="p">[</span><span class="n">n</span><span class="p">]</span> <span class="p">[</span><span class="n">n</span><span class="p">]</span>
<span class="kt">Relu</span> <span class="o">:</span> <span class="kt">TensorSig</span> <span class="p">[</span><span class="mf">0</span><span class="p">]</span> <span class="p">[</span><span class="n">n</span><span class="p">]</span> <span class="p">[</span><span class="n">n</span><span class="p">]</span>
<span class="kt">LearningRate</span> <span class="o">:</span> <span class="kt">TensorSig</span> <span class="p">[</span><span class="mf">0</span><span class="p">]</span> <span class="p">[</span><span class="mf">1</span><span class="p">]</span> <span class="p">[</span><span class="mf">0</span><span class="p">]</span>
<span class="kt">CrossEntropyLoss</span> <span class="o">:</span> <span class="kt">TensorSig</span> <span class="p">[</span><span class="n">n</span><span class="p">]</span> <span class="p">[</span><span class="n">n</span><span class="p">]</span> <span class="kt">[]</span>
<span class="kt">SoftMax</span> <span class="o">:</span> <span class="kt">TensorSig</span> <span class="p">[</span><span class="mf">0</span><span class="p">]</span> <span class="p">[</span><span class="n">n</span><span class="p">]</span> <span class="p">[</span><span class="n">n</span><span class="p">]</span>
<span class="nf">model' :</span> <span class="kt">GPath</span> <span class="kt">TensorSig</span> <span class="p">[</span><span class="o"><</span> <span class="p">[</span><span class="mf">4</span><span class="p">,</span> <span class="mf">2</span><span class="p">],</span> <span class="p">[</span><span class="mf">4</span><span class="p">],</span> <span class="p">[</span><span class="mf">0</span><span class="p">],</span> <span class="p">[</span><span class="mf">2</span><span class="p">,</span> <span class="mf">4</span><span class="p">],</span> <span class="p">[</span><span class="mf">2</span><span class="p">],</span> <span class="p">[</span><span class="mf">0</span><span class="p">]]</span> <span class="p">[</span><span class="mf">2</span><span class="p">]</span> <span class="p">[</span><span class="mf">2</span><span class="p">]</span>
<span class="n">model'</span> <span class="o">=</span> <span class="p">[</span><span class="o"><</span> <span class="kt">Linear</span><span class="p">,</span> <span class="kt">Bias</span><span class="p">,</span> <span class="kt">Relu</span><span class="p">,</span> <span class="kt">Linear</span><span class="p">,</span> <span class="kt">Bias</span><span class="p">,</span> <span class="kt">Relu</span><span class="p">]</span>
</code></pre></div></div>
<p>We’ve also only sketched out the tensor operations, but we could take this a step further and develop a proper tensor library in Idris.</p>
<p>In a future post, we will see how to enhance the above with auto-diff: meaning that the user needs to only supply the getter, and the setter will be derived automatically.</p>Zanzi MihejevsIn this post we will look at how dependent types can allow us to effortlessly implement the category theory of machine learning directly, opening up a path to new generalisations.Enriched Closed Lenses2024-04-12T00:00:00+00:002024-04-12T00:00:00+00:00https://cybercat-institute.github.io//2024/04/12/enriched-closed-lenses<p>I’m going to record something that I think is known to everyone doing research on categorical cybernetics, but I don’t think has been written down somewhere: an even more general version of mixed optics that replaces the backwards actegory with an enrichment. With it, I’ll make sense of a curious definition appearing in <a href="https://homepages.inf.ed.ac.uk/gdp/publications/compiler-forest.pdf">The Compiler Forest</a>.</p>
<h1>Actegories and enrichments</h1>
<p>An <strong>actegory</strong> consists of a monoidal category $\mathcal M$, a category $\mathcal C$ and a functor $\bullet : \mathcal M \times \mathcal C \to \mathcal C$ that behaves like an “external product”: namely that it’s equipped with coherent isomorphisms $I \bullet X \cong X$ and $(M \otimes N) \bullet X \cong M \bullet (N \bullet X)$.</p>
<p>An <strong>enriched category</strong> consists of a category $\mathcal C$, a monoidal category $\mathcal M$ and a functor $[-, -] : \mathcal C^\mathrm{op} \times \mathcal C \to \mathcal M$ that behaves like an “external hom” (I’m not going to write down what this means because it’s more complicated).</p>
<p>There’s a very close relationship between actegories and enrichments, to the point that I consider them different perspectives on the same idea. This is the <em>final form</em> of the famous tensor-hom adjunction, a.k.a. currying. (I learned this incredible fact from Matteo Capucci, and I have no idea where it’s written down, although it’s definitely written down somewhere.)</p>
<p>A <strong>tensored enrichment</strong> is one where every $[Z, -] : \mathcal C \to \mathcal M$ has a left adjoint $- \bullet X : \mathcal M \to \mathcal C$. Allowing $Z$ to vary results in a functor $\bullet$ which (nontrivial theorem) is always an actegory.</p>
<p>A <strong>closed actegory</strong> is one where every $- \bullet Z : \mathcal M \to \mathcal C$ has a right adjoint $[Z, -] : \mathcal C \to \mathcal M$. Allowing $Z$ to vary results in a functor $[-, -]$ which (nontrivial theorem) is always an enrichment.</p>
<p>So, closed actegories and tensored enrichments are equivalent ways of defining the same thing, namely a monoidal category $\mathcal M$ and category $\mathcal C$ equipped with $\bullet$ and $[-, -]$ related by a tensor-hom adjunction $\mathcal C (X \bullet Z, Y) \cong \mathcal M (Z, [X, Y])$.</p>
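<p>The most familiar instance of this is $\mathcal M = \mathcal C = \mathbf{Set}$, with the action given by the cartesian product $\bullet = \times$ and the enrichment given by function sets $[X, Y] = Y^X$. The tensor-hom adjunction then specialises to ordinary currying:</p>

\[\mathbf{Set} (X \times Z, Y) \cong \mathbf{Set} (Z, Y^X)\]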
<h1>Parametrisation</h1>
<p>Given an actegory, we can define a bicategory \(\mathbf{Para}_\mathcal M (\mathcal C)\), whose objects are objects of $\mathcal C$ and 1-cells are pairs of $M : \mathcal M$ and $f : \mathcal C (M \bullet X, Y)$. We can also define a bicategory \(\mathbf{Copara}_\mathcal M (\mathcal C)\), whose objects are objects of $\mathcal C$ and 1-cells are pairs of $M : \mathcal M$ and $f : \mathcal C (X, M \bullet Y)$.</p>
<p>Given an enriched category, we can define a bicategory \(\mathbf{Para}_\mathcal M (\mathcal C)\), whose objects are objects of $\mathcal C$ and morphisms are pairs of $M : \mathcal M$ and $f : \mathcal M (M, [X, Y])$. If this is a tensored enrichment then the two definitions of \(\mathbf{Para}_\mathcal M (\mathcal C)\) are equivalent.</p>
<p>In all of these cases we are locally fibred over $\mathcal M$, and I will write \(\mathbf{Para}_\mathcal M (\mathcal C) (X, Y) (M)\), \(\mathbf{Copara}_\mathcal M (\mathcal C) (X, Y) (M)\) for the set of co/parametrised morphisms with a fixed parameter type.</p>
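<p>To see the shape of composition in $\mathbf{Para}$ concretely, here is a small Python sketch (my own illustration, with invented names, in the cartesian self-action of sets): a parametrised morphism is a pair of a parameter and a binary function, and composition tuples the parameters.</p>

```python
# A parametrised morphism X -> Y with parameter type M is modelled as a
# pair (m, f) where f(m, x) = y.  Composition pairs up the parameters.

def compose_para(g, f):
    """Compose parametrised morphisms f : X -> Y and g : Y -> Z.

    f = (m, f_fun) with f_fun(m, x) = y
    g = (n, g_fun) with g_fun(n, y) = z
    The composite has parameter (m, n).
    """
    (m, f_fun) = f
    (n, g_fun) = g
    return ((m, n), lambda p, x: g_fun(p[1], f_fun(p[0], x)))

# Example: a scaling x -> w*x followed by a bias y -> y + b
linear = (3, lambda w, x: w * x)  # parameter w = 3
bias   = (1, lambda b, y: y + b)  # parameter b = 1

params, run = compose_para(bias, linear)
assert params == (3, 1)
assert run(params, 2) == 7  # 3*2 + 1
```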
<p>It’s not possible to define $\mathbf{Copara}_\mathcal M (\mathcal C)$ for an enrichment. There’s a very slick common generalisation of actegories and enrichments called a <a href="https://ncatlab.org/nlab/show/locally+graded+category">locally graded category</a>, which is a category enriched in presheaves with Day convolution. There’s also a very slick definition of $\mathbf{Para}$ for a locally graded category. I’d like to know: for exactly which locally graded categories is it possible to define $\mathbf{Copara}$?</p>
<h1>Mixed optics</h1>
<p>If we have two actegories $\mathcal C, \mathcal D$ that share the same acting category $\mathcal M$ then we can define <strong>mixed optics</strong>, which first appeared in <a href="https://compositionality-journal.org/papers/compositionality-6-1/">Profunctor Optics: A Categorical Update</a>. This is a 1-category \(\mathbf{Optic}_\mathcal M (\mathcal C, \mathcal D)\) whose objects are pairs $\binom{X}{X’}$ of an object of $\mathcal C$ and an object of $\mathcal D$, and a morphism $\binom{X}{X’} \to \binom{Y}{Y’}$ is an element of the coend</p>
\[\int^{M : \mathcal M} \mathbf{Copara}_\mathcal M (\mathcal C) (X, Y) (M) \times \mathbf{Para}_\mathcal M (\mathcal D) (Y', X') (M)\]
<p>There’s a slightly more general definition called “weighted optics” that appears in <a href="https://arxiv.org/abs/2403.13001">Bruno’s thesis</a> and was used very productively there, which replaces $\mathcal M$ with two monoidal categories related by a Tambara module. I think that it’s an orthogonal generalisation to the one I’m about to do here.</p>
<h1>Enriched closed lenses</h1>
<p>Putting together everything I’ve just said, the next step is clear. If we have categories $\mathcal C, \mathcal D$ and a monoidal category $\mathcal M$, with $\mathcal M$ acting on $\mathcal C$ and $\mathcal D$ enriched in $\mathcal M$, then we can still define \(\mathbf{Optic}_\mathcal M (\mathcal C, \mathcal D)\) in exactly the same way, replacing \(\mathbf{Para}_\mathcal M (\mathcal D)\) with its enriched version. But now, unlike before, we can use the ninja Yoneda lemma to eliminate the coend and get</p>
\[\mathbf{Optic}_\mathcal M (\mathcal C, \mathcal D) \left( \binom{X}{X'}, \binom{Y}{Y'} \right) \cong \mathcal C (X, [Y', X'] \bullet Y)\]
<p>In general I refer to optics that can be defined without type quantification as <em>lenses</em>, and so this is an <strong>enriched closed lens</strong>. It’s the <em>final form</em> of “linear lenses”, the version of lenses that is defined like <code class="language-plaintext highlighter-rouge">Lens s t a b = s -> (a, b -> t)</code>.</p>
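<p>The linear-lens representation is easy to play with directly. As a sketch (mine, not from any of the papers mentioned, with names of my own choosing): in Python, a lens is a function from a source to a pair of a focus and a continuation that rebuilds the source, and composition chains foci forwards and continuations backwards.</p>

```python
# Linear lens: s -> (a, b -> t).  A lens returns a pair (focus, update),
# where update rebuilds the source from a new focus.

def compose_lens(outer, inner):
    """Compose lenses outer : s -> (a, b -> t) and inner : a -> (c, d -> b)."""
    def composed(s):
        a, put_b = outer(s)
        c, put_d = inner(a)
        # Forward passes chain; backward passes run in reverse order.
        return c, lambda d: put_b(put_d(d))
    return composed

# A lens focusing on the first component of a pair
fst_lens = lambda pair: (pair[0], lambda new: (new, pair[1]))

# Composing it with itself focuses on the first of the first
ffst = compose_lens(fst_lens, fst_lens)
focus, update = ffst(((1, 2), 3))
assert focus == 1
assert update(10) == ((10, 2), 3)
```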
<h1>Into the compiler forest</h1>
<p>Section 5 of <a href="https://homepages.inf.ed.ac.uk/gdp/publications/compiler-forest.pdf">The Compiler Forest</a> by Budiu, Galenson and Plotkin has a <em>very</em> interesting definition in it. They have a cartesian closed category $\mathcal C$ (whose internal hom I’ll write as $\to$) and a strong monad $T$ on it, and they define a category whose objects are pairs of objects of $\mathcal C$ and whose morphisms $f : \binom{X}{X’} \to \binom{Y}{Y’}$ are morphisms $f : X \to T (Y \times (Y’ \to T X’))$ of $\mathcal C$.</p>
<p>They also nail an intuition for lenses that I use constantly and I haven’t seen written down anywhere else: problems go forwards, solutions go backwards.</p>
<p>Me and this definition have quite a history. It came to my attention while polishing <a href="https://compositionality-journal.org/papers/compositionality-5-9/">Bayesian Open Games</a> for submission. For a while, I thought that it was equivalent to optics in the kleisli category of $T$, and we’d wasted a year of our lives trying to understand optics (this being around 2018, when optics were still a niche idea). Then, for a while I thought that the paper made a mistake and these things don’t compose associatively. Now I’ve made peace: I think their definition is <em>conceptually</em> subtly wrong in a way that makes no difference in practice, and I can say very precisely how it relates to kleisli optics.</p>
<p>There is an action of $\mathcal C$ on $\mathrm{Kl} (T)$ given by $M \bullet X = M \otimes X$, where $\otimes$ is the tensor product of $\mathrm{Kl} (T)$ which on objects is given by the product $\times$ of $\mathcal C$. That’s the actegory generated by the strong monoidal embedding $\mathcal C \hookrightarrow \mathrm{Kl} (T)$. There is also an enrichment of $\mathrm{Kl} (T)$ in $\mathcal C$, given by $[X, Y] = X \to T Y$. This action and enrichment are adjoint to each other: $\mathrm{Kl} (T) (M \otimes X, Y) \cong \mathcal C (X, M \to TY)$.</p>
<p>The category defined in Compiler Forest turns out to be equivalent to</p>
\[\mathrm{Optic}_\mathcal C (\mathrm{Kl} (T), \mathrm{Kl} (T))\]
<p>whose forwards pass is given by the action of $\mathcal C$ on $\mathrm{Kl} (T)$ and whose backwards pass is given by the enrichment of $\mathrm{Kl} (T)$ in $\mathcal C$. Its hom-sets are given by</p>
\[\mathrm{Optic}_\mathcal C (\mathrm{Kl} (T), \mathrm{Kl} (T)) \left( \binom{X}{X'}, \binom{Y}{Y'} \right)\]
\[= \int^{M : \mathcal C} \mathcal C (X, T (M \times Y)) \times \mathcal C (M, Y' \to T X')\]
<p>which Yoneda-reduces to the definition in the paper.</p>
<p>Even though the action and enrichment are adjoint, this is <em>not</em> the same as optics in the kleisli category:</p>
\[\mathrm{Optic}_\mathcal C (\mathrm{Kl} (T), \mathrm{Kl} (T)) \not\cong \mathrm{Optic}_{\mathrm{Kl} (T)} (\mathrm{Kl} (T), \mathrm{Kl} (T))\]
<p>where the hom-sets of the latter are defined by</p>
\[\mathrm{Optic}_{\mathrm{Kl} (T)} (\mathrm{Kl} (T), \mathrm{Kl} (T)) \left( \binom{X}{X'}, \binom{Y}{Y'} \right)\]
\[= \int^{M : \mathrm{Kl} (T)} \mathcal C (X, T (M \times Y)) \times \mathcal C (M \times Y', T X')\]
<p>This equivalence, between optics whose backwards passes are an adjoint action or enrichment, would be a completely reasonable-looking lemma but it just isn’t true!</p>
<p>The difference between them is extremely subtle, though. The “proper” definition of kleisli optics identifies morphisms that agree up to sliding any kleisli morphism, whereas the definition in Compiler Forest only identifies morphisms that agree up to sliding pure morphisms of $\mathcal C$. So hom-sets of coend optics are a quotient of the hom-sets defined in Compiler Forest. While writing this up, I realised that most of this conclusion actually appears in section 4.9 of <a href="https://arxiv.org/abs/1809.00738">Riley’s original paper</a>.</p>
<p>As long as you don’t care about equality of morphisms - which in practice is never, because they are made of functions - the difference between them can be safely ignored. The only genuine reason to prefer kleisli optics is <a href="https://arxiv.org/abs/2209.09351">their better runtime performance</a>.</p>Jules HedgesI'm going to record something that I think is known to everyone doing research on categorical cybernetics, but I don't think has been written down somewhere: an even more general version of mixed optics that replaces the backwards actegory with an enrichment. With it, I'll make sense of a curious definition appearing in The Compiler Forest.Modular Error Reporting with Dependent Lenses2024-04-08T00:00:00+00:002024-04-08T00:00:00+00:00https://cybercat-institute.github.io//2024/04/08/modular-error-reporting<p>A big part of programming language design is in feedback delivery. One aspect of feedback is parse errors. Parsing is a very large area of research and there are new developments from industry that make it easier and faster than ever to parse files. This post is about an application of dependent lenses that facilitate the job of reporting error location from a parsing pipeline.</p>
<h2>What is parsing & error reporting</h2>
<p>A simple parser could be seen as a function with the signature</p>
<div class="language-idris highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">parse :</span> <span class="kt">String</span> <span class="o">-></span> <span class="kt">Maybe </span><span class="n">output</span>
</code></pre></div></div>
<p>where <code class="language-plaintext highlighter-rouge">output</code> is a parsed value.</p>
<p>In that context, an error is represented with a value of <code class="language-plaintext highlighter-rouge">Nothing</code>, and a successful value is represented with <code class="language-plaintext highlighter-rouge">Just</code>. However, in the error case, we don’t have enough information to create a helpful diagnostic, we can only say “parse failed” but we cannot say why or where the error came from. One way to help with that is to make the type aware of its context and carry the error location in the type:</p>
<div class="language-idris highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">parseLoc :</span> <span class="n">string</span> <span class="o">-></span> <span class="kt">Either Loc</span> <span class="n">output</span>
</code></pre></div></div>
<p>where <code class="language-plaintext highlighter-rouge">Loc</code> holds the file, line, and column of the state of the parser.
This is a very successful implementation of a parser with locations and many languages deployed today use a similar architecture where the parser, and its error-reporting mechanism, keep track of the context in which they are parsing files and use it to produce helpful diagnostics.</p>
<p>I believe that there is a better way, one that does not require a tight integration between the error-generating process (here parsing) and the error-reporting process (here, location tracking). For this, I will be using container morphisms, or dependent lenses, to represent parsing and error reporting.</p>
<h2>Dependent lenses</h2>
<p>Dependent lenses are a generalisation of lenses where the backward part makes use of dependent types to keep track of the origin and destination of arguments. For reference the type of a lens <code class="language-plaintext highlighter-rouge">Lens a a' b b'</code> is given by the two functions:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">get : a -> b</code></li>
<li><code class="language-plaintext highlighter-rouge">set : a -> b' -> a'</code></li>
</ul>
<p>Dependent lenses follow the same pattern, but their types are indexed:</p>
<div class="language-idris highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">record</span> <span class="kt">DLens</span> <span class="o">:</span> <span class="p">(</span><span class="n">a</span> <span class="o">:</span> <span class="kt">Type</span><span class="p">)</span> <span class="o">-></span> <span class="p">(</span><span class="n">a'</span> <span class="o">:</span> <span class="n">a</span> <span class="o">-></span> <span class="kt">Type</span><span class="p">)</span> <span class="o">-></span> <span class="p">(</span><span class="n">b</span> <span class="o">:</span> <span class="kt">Type</span><span class="p">)</span> <span class="o">-></span> <span class="p">(</span><span class="n">b'</span> <span class="o">:</span> <span class="n">b</span> <span class="o">-></span> <span class="kt">Type</span><span class="p">)</span> <span class="kr">where</span>
<span class="n">get</span> <span class="o">:</span> <span class="n">a</span> <span class="o">-></span> <span class="n">b</span>
<span class="n">set</span> <span class="o">:</span> <span class="p">(</span><span class="n">x</span> <span class="o">:</span> <span class="n">a</span><span class="p">)</span> <span class="o">-></span> <span class="n">b'</span> <span class="p">(</span><span class="n">get</span> <span class="n">x</span><span class="p">)</span> <span class="o">-></span> <span class="n">a'</span> <span class="n">x</span>
</code></pre></div></div>
<p>The biggest difference with lenses is the second argument of <code class="language-plaintext highlighter-rouge">set</code>: <code class="language-plaintext highlighter-rouge">b' (get x)</code>. It means that we always get a <code class="language-plaintext highlighter-rouge">b'</code> that is indexed over the result of <code class="language-plaintext highlighter-rouge">get</code>; for this to typecheck, we <em>must know</em> the result of <code class="language-plaintext highlighter-rouge">get</code>.</p>
<p>This change in types allows a change in perspective. Instead of treating lenses as ways to convert between data types, we use lenses to convert between query/response APIs.</p>
<p><img src="/assetsPosts/2024-04-08-modular-error-reporting/lens2.png" alt="Lens" /></p>
<p>On each side <code class="language-plaintext highlighter-rouge">A</code> and <code class="language-plaintext highlighter-rouge">B</code> are queries and <code class="language-plaintext highlighter-rouge">A'</code> and <code class="language-plaintext highlighter-rouge">B'</code> are <em>corresponding responses</em>. The two functions defining the lens have type <code class="language-plaintext highlighter-rouge">get : A -> B</code>, and <code class="language-plaintext highlighter-rouge">set : (x : A) -> B' (get x) -> A' x</code>, that is, a way to translate queries, and a way to <em>rebuild</em> responses given a query. A lens is therefore a mechanism to map one API to another.</p>
<p>If the goal is to find on what line an error occurs, then what the <code class="language-plaintext highlighter-rouge">get</code> function can do is split our string into multiple lines, each of which will be parsed separately.</p>
<div class="language-idris highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">splitLines :</span> <span class="kt">String</span> <span class="o">-></span> <span class="kt">List String</span>
</code></pre></div></div>
<p>Once we have a list of strings, we can call a parser on each line; this will be a function like the one above, <code class="language-plaintext highlighter-rouge">parseLine : String -> Maybe output</code>. By composing those two functions we have the signature <code class="language-plaintext highlighter-rouge">String -> List (Maybe output)</code>. This gives us a hint as to what the response for <code class="language-plaintext highlighter-rouge">splitLines</code> should be: a list of potential outputs. If we draw our lens again we have the following types:</p>
<p><img src="/assetsPosts/2024-04-08-modular-error-reporting/lens.png" alt="Lens" /></p>
<p>We are using <code class="language-plaintext highlighter-rouge">(String, String)</code> on the left to represent “files as inputs” and “messages as outputs” both of which are plain strings.</p>
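<p>The forward direction of this pipeline is easy to sketch in plain Python (an illustration of mine, not code from the post): split the input into lines and attempt to parse each one, producing a list of optional results.</p>

```python
# Forward pass of the pipeline: String -> List (Maybe output).
# None plays the role of Nothing; a parsed int plays the role of Just.

def parse_line(line):
    """Parse a single line as an integer, returning None on failure."""
    try:
        return int(line)
    except ValueError:
        return None

def parse_file(contents):
    """Split into lines and parse each one independently."""
    return [parse_line(line) for line in contents.splitlines()]

results = parse_file("23\nx\n3")
assert results == [23, None, 3]
```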
<p>There is a slight problem with this, given a <code class="language-plaintext highlighter-rouge">List (Maybe output)</code> we actually have no way to know which of the values refer to which line. For example, if the outputs are numbers and we know the input is the file</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>23

24
3
</code></pre></div></div>
<p>and we are given the output <code class="language-plaintext highlighter-rouge">[Nothing, Nothing, Just 3]</code>, we have no clue how each <code class="language-plaintext highlighter-rouge">Nothing</code> relates to the result of splitting the lines: the two lists are not even the same size. We can “guess” a behaviour, but that is flimsy reasoning; ideally the API translation system should keep track of the correspondence so that we don’t have to guess what the correct behaviour is. Really, it should be telling us what the relationship is; we shouldn’t even be thinking about this.</p>
<p>So instead of using plain lists, we are going to keep the information <em>in the type</em> by using dependent types. The following type keeps track of an “origin” list and its constructors store values that fulfill a predicate in the origin list along with their position in the list:</p>
<div class="language-idris highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kr">data</span> <span class="kt">Some</span> <span class="o">:</span> <span class="p">(</span><span class="n">a</span> <span class="o">-></span> <span class="kt">Type</span><span class="p">)</span> <span class="o">-></span> <span class="kt">List </span><span class="n">a</span> <span class="o">-></span> <span class="kt">Type </span><span class="kr">where</span>
<span class="kt">None</span> <span class="o">:</span> <span class="kt">Some</span> <span class="n">p</span> <span class="n">xs</span>
<span class="kt">This</span> <span class="o">:</span> <span class="n">p</span> <span class="n">x</span> <span class="o">-></span> <span class="kt">Some</span> <span class="n">p</span> <span class="n">xs</span> <span class="o">-></span> <span class="kt">Some</span> <span class="n">p</span> <span class="p">(</span><span class="n">x</span> <span class="o">::</span> <span class="n">xs</span><span class="p">)</span>
<span class="kt">Skip</span> <span class="o">:</span> <span class="kt">Some</span> <span class="n">p</span> <span class="n">xs</span> <span class="o">-></span> <span class="kt">Some</span> <span class="n">p</span> <span class="p">(</span><span class="n">x</span> <span class="o">::</span> <span class="n">xs</span><span class="p">)</span>
</code></pre></div></div>
<p>We can now write the above situation with the type <code class="language-plaintext highlighter-rouge">Some (const Unit) ["23", "", "24", "3"]</code> which is inhabited by the value <code class="language-plaintext highlighter-rouge">Skip $ Skip $ Skip $ This () None</code> to represent the fact that only the last element is relevant to us. This ensures that the response always matches the query.</p>
<p>Once we are given a value like the above we can convert our response into a string that says <code class="language-plaintext highlighter-rouge">"only 3 parsed correctly"</code>.</p>
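<p>The role of <code class="language-plaintext highlighter-rouge">Some</code> can be approximated in a non-dependently-typed language by tagging each successful result with its index into the original list. The following Python sketch (my own, and much weaker than the Idris version, since nothing enforces statically that the indices are in range) shows the idea:</p>

```python
# Approximate `Some p xs` as a list of (index, value) pairs into the
# original list.  Unlike the Idris type, an out-of-range index is not
# ruled out here; the dependent type excludes it statically.

def successes(parsed):
    """Keep only successful parses, tagged with their position."""
    return [(i, v) for i, v in enumerate(parsed) if v is not None]

def report(origin, tagged):
    """Render the tagged responses against the original lines."""
    return [f"line {i}: {origin[i]!r} parsed as {v}" for i, v in tagged]

origin = ["23", "", "24", "3"]
parsed = [None, None, None, 3]
tagged = successes(parsed)
assert tagged == [(3, 3)]
assert report(origin, tagged) == ["line 3: '3' parsed as 3"]
```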
<h2>A simple parser</h2>
<p>Equipped with dependent lenses, and a type to keep track of partial errors, we can start writing a parsing pipeline that keeps track of locations without interfering with the actual parsing. For this, we start with a simple parsing function:</p>
<div class="language-idris highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">containsEven :</span> <span class="kt">String</span> <span class="o">-></span> <span class="kt">Maybe Int</span>
<span class="n">containsEven</span> <span class="n">str</span> <span class="o">=</span> <span class="n">parseInteger</span> <span class="n">str</span> <span class="o">>>=</span> <span class="p">(</span><span class="nf">\</span><span class="n">i</span> <span class="o">:</span> <span class="kt">Int</span> <span class="o">=></span> <span class="n">toMaybe</span> <span class="p">(</span><span class="n">even</span> <span class="n">i</span><span class="p">)</span> <span class="n">i</span><span class="p">)</span>
</code></pre></div></div>
<p>This will return a number if it’s even; otherwise it will fail. From this we want to write a parser that will parse an entire file, and return errors where the file does not parse. We do this by writing a lens that will split a file into lines and then rebuild responses into a string such that the string contains the line number.</p>
<div class="language-idris highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">splitFile :</span> <span class="p">(</span><span class="kt">String</span> <span class="o">:-</span> <span class="kt">String</span><span class="p">)</span> <span class="o">=%></span> <span class="kt">SomeC</span> <span class="p">(</span><span class="kt">String</span> <span class="o">:-</span> <span class="n">error</span><span class="p">)</span>
<span class="n">splitFile</span> <span class="o">=</span> <span class="kt">MkMorphism</span> <span class="nb">lines </span><span class="n">printErrors</span>
<span class="kr">where</span>
<span class="n">printError</span> <span class="o">:</span> <span class="p">(</span><span class="n">orig</span> <span class="o">:</span> <span class="kt">List String</span><span class="p">)</span> <span class="o">-></span> <span class="p">(</span><span class="n">i</span> <span class="o">:</span> <span class="kt">Fin</span> <span class="p">(</span><span class="nb">length </span><span class="n">orig</span><span class="p">))</span> <span class="o">-></span> <span class="kt">String</span>
<span class="n">printError</span> <span class="n">orig</span> <span class="n">i</span> <span class="o">=</span> <span class="s">"At line </span><span class="se">\</span><span class="err">{show (c</span><span class="se">a</span><span class="s">st {to = Nat} i)}: Could not parse </span><span class="se">\"\</span><span class="err">{i</span><span class="se">n</span><span class="s">dex' orig i}</span><span class="se">\"</span><span class="s">"</span>
<span class="n">printErrors</span> <span class="o">:</span> <span class="p">(</span><span class="n">input</span> <span class="o">:</span> <span class="kt">String</span><span class="p">)</span> <span class="o">-></span> <span class="kt">Some</span> <span class="p">(</span><span class="nb">const </span><span class="n">error</span><span class="p">)</span> <span class="p">(</span><span class="nb">lines </span><span class="n">input</span><span class="p">)</span> <span class="o">-></span> <span class="kt">String</span>
<span class="n">printErrors</span> <span class="n">input</span> <span class="n">x</span> <span class="o">=</span> <span class="nb">unlines </span><span class="p">(</span><span class="nb">map </span><span class="p">(</span><span class="n">printError</span> <span class="p">(</span><span class="nb">lines </span><span class="n">input</span><span class="p">))</span> <span class="p">(</span><span class="n">getMissing</span> <span class="n">x</span><span class="p">))</span>
</code></pre></div></div>
<p>Some notation: <code class="language-plaintext highlighter-rouge">=%></code> is the binary operator for dependent lenses, and <code class="language-plaintext highlighter-rouge">:-</code> is the binary operator for non-dependent boundaries. Later <code class="language-plaintext highlighter-rouge">!></code> will be used for dependent boundaries.</p>
<p><code class="language-plaintext highlighter-rouge">printErrors</code> builds an error message by collecting the line number that failed. We use the missing values from <code class="language-plaintext highlighter-rouge">Some</code> as failed parses. Equipped with this program, we should be able to generate an error message that looks like this:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>At line 3: Could not parse "test"
At line 10: Could not parse "-0.012"
At line 12: Could not parse ""
</code></pre></div></div>
<p>The only thing left is to put together the parser and the line splitter. We do this by composing them into a larger lens via lens composition and then extracting the procedure from the larger lens. First we need to convert our parser into a lens.</p>
<p>Any function <code class="language-plaintext highlighter-rouge">a -> b</code> can also be written as <code class="language-plaintext highlighter-rouge">a -> () -> b</code> and any function of that type can be embedded in a lens <code class="language-plaintext highlighter-rouge">(a :- b) =%> (() :- ())</code>. That’s what we do with our parser and we end up with this lens:</p>
<div class="language-idris highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">parserLens :</span> <span class="p">(</span><span class="kt">String</span> <span class="o">:-</span> <span class="kt">Maybe Int</span><span class="p">)</span> <span class="o">=%></span> <span class="kt">CUnit</span> <span class="c1">-- this is the unit boundary () :- ()</span>
<span class="n">parserLens</span> <span class="o">=</span> <span class="n">embed</span> <span class="n">containsEven</span>
</code></pre></div></div>
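<p>The <code class="language-plaintext highlighter-rouge">embed</code> function itself is not spelled out here, but from the definition of dependent lenses a plausible sketch is the following (hypothetical: the actual library definition and the field order of <code class="language-plaintext highlighter-rouge">MkMorphism</code> may differ):</p>
<div class="language-idris highlighter-rouge"><div class="highlight"><pre class="highlight"><code>embed : (a -> b) -> (a :- b) =%> CUnit
embed f = MkMorphism
  (\_ => ())      -- forward part: map everything to the unit boundary
  (\x, _ => f x)  -- backward part: run the function on the stored input
</code></pre></div></div>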
<p>We can lift any lens with a failable result into one that keeps track of the origin of the failure:</p>
<div class="language-idris highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">lineParser :</span> <span class="kt">SomeC</span> <span class="p">(</span><span class="kt">String</span> <span class="o">:-</span> <span class="kt">Int</span><span class="p">)</span> <span class="o">=%></span> <span class="kt">CUnit</span>
<span class="n">lineParser</span> <span class="o">=</span> <span class="n">someToAll</span> <span class="o">|></span> <span class="kt">AllListMap</span> <span class="n">parserLens</span> <span class="o">|></span> <span class="n">close</span>
</code></pre></div></div>
<p>We can now compose this lens with the one above that adjusts the error message using the line number:</p>
<div class="language-idris highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">composedParser :</span> <span class="p">(</span><span class="kt">String</span> <span class="o">:-</span> <span class="kt">String</span><span class="p">)</span> <span class="o">=%></span> <span class="kt">CUnit</span>
<span class="n">composedParser</span> <span class="o">=</span> <span class="n">splitFile</span> <span class="o">|></span> <span class="n">lineParser</span>
</code></pre></div></div>
<p>Knowing that a function <code class="language-plaintext highlighter-rouge">a -> b</code> can be converted into a lens <code class="language-plaintext highlighter-rouge">(a :- b) =%> CUnit</code>, we can do the opposite: convert any lens with a unit codomain into a simple function, which gives us a very simple <code class="language-plaintext highlighter-rouge">String -> String</code> program:</p>
<div class="language-idris highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">mainProgram :</span> <span class="kt">String</span> <span class="o">-></span> <span class="kt">String</span>
<span class="n">mainProgram</span> <span class="o">=</span> <span class="n">extract</span> <span class="n">composedParser</span>
</code></pre></div></div>
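<p>Dually, <code class="language-plaintext highlighter-rouge">extract</code> can be sketched as running the lens's backward part against the trivial unit response (again hypothetical, with assumed field order for <code class="language-plaintext highlighter-rouge">MkMorphism</code>):</p>
<div class="language-idris highlighter-rouge"><div class="highlight"><pre class="highlight"><code>extract : ((a :- b) =%> CUnit) -> a -> b
extract (MkMorphism fwd bwd) x = bwd x ()
</code></pre></div></div>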
<p>We can run this as part of a command-line program:</p>
<div class="language-idris highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">main :</span> <span class="kt">IO </span><span class="nb">()</span>
<span class="n">main</span> <span class="o">=</span> <span class="kr">do</span> <span class="nb">putStrLn </span><span class="s">"give me a file name"</span>
<span class="n">fn</span> <span class="o"><-</span> <span class="n">getLine</span>
<span class="kc">Right</span> <span class="n">fileContent</span> <span class="o"><-</span> <span class="nb">readFile </span><span class="n">fn</span>
<span class="o">|</span> <span class="kc">Left</span> <span class="n">err</span> <span class="o">=></span> <span class="n">printLn</span> <span class="n">err</span>
<span class="kr">let</span> <span class="n">output</span> <span class="o">=</span> <span class="n">mainProgram</span> <span class="n">fileContent</span>
<span class="nb">putStrLn </span><span class="n">output</span>
<span class="n">main</span>
</code></pre></div></div>
<p>And given the file:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0
2

-3
20
04
1.2
</code></pre></div></div>
<p>We see:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>At line 2: Could not parse ""
At line 3: Could not parse "-3"
At line 6: Could not parse "1.2"
</code></pre></div></div>
<h2>Handling multiple files</h2>
<p>The program we’ve seen is great, but it’s not super clear why we would bother with such a level of complexity if we just want to keep track of line numbers. That is why I will now show how to use the same approach to keep track of file origin without touching the existing program.</p>
<p>To achieve that, we need a lens that will take a list of files, and their content, and keep track of where errors emerged using the same infrastructure as above.</p>
<p>First, we define a filesystem as a mapping of file names to file contents:</p>
<div class="language-idris highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">Filename</span> <span class="o">=</span> <span class="kt">String</span>
<span class="kt">Content</span> <span class="o">=</span> <span class="kt">String</span>
<span class="kt">Filesystem</span> <span class="o">=</span> <span class="kt">List </span><span class="p">(</span><span class="kt">Filename</span> <span class="o">*</span> <span class="kt">Content</span><span class="p">)</span>
</code></pre></div></div>
<p>A lens that splits problems into files and rebuilds errors from them will have the following type:</p>
<div class="language-idris highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">handleFiles :</span> <span class="kt">Interpolation</span> <span class="n">error</span> <span class="o">=></span>
<span class="p">(</span><span class="kt">Filesystem</span> <span class="o">:-</span> <span class="kt">String</span><span class="p">)</span> <span class="o">=%></span> <span class="kt">SomeC</span> <span class="p">(</span><span class="kt">String</span> <span class="o">:-</span> <span class="n">error</span><span class="p">)</span>
<span class="n">handleFiles</span> <span class="o">=</span> <span class="kt">MkMorphism</span> <span class="p">(</span><span class="nb">map </span><span class="err">π</span><span class="mf">2</span><span class="p">)</span> <span class="n">matchErrors</span>
<span class="kr">where</span>
<span class="n">matchErrors</span> <span class="o">:</span> <span class="p">(</span><span class="n">files</span> <span class="o">:</span> <span class="kt">List </span><span class="p">(</span><span class="kt">String</span> <span class="o">*</span> <span class="kt">String</span><span class="p">))</span> <span class="o">-></span>
<span class="kt">Some</span> <span class="p">(</span><span class="nb">const </span><span class="n">error</span><span class="p">)</span> <span class="p">(</span><span class="nb">map </span><span class="err">π</span><span class="mf">2</span> <span class="n">files</span><span class="p">)</span> <span class="o">-></span>
<span class="kt">String</span>
<span class="n">matchErrors</span> <span class="n">files</span> <span class="n">x</span> <span class="o">=</span> <span class="nb">unlines </span><span class="p">(</span><span class="nb">map </span><span class="p">(</span><span class="nf">\</span><span class="p">(</span><span class="n">path</span> <span class="o">&&</span> <span class="n">err</span><span class="p">)</span> <span class="o">=></span> <span class="s">"In file </span><span class="se">\</span><span class="err">{p</span><span class="se">a</span><span class="s">th}:</span><span class="se">\n\</span><span class="err">{e</span><span class="se">r</span><span class="s">r}"</span><span class="p">)</span> <span class="p">(</span><span class="n">zipWithPath</span> <span class="n">files</span> <span class="n">x</span><span class="p">))</span>
</code></pre></div></div>
<p>This time I’m representing failures with the <em>presence</em> of a value in <code class="language-plaintext highlighter-rouge">Some</code> rather than its absence. The rest of the logic is similar: we reconstruct the data from the values we get back in the backward part and return a flat <code class="language-plaintext highlighter-rouge">String</code> as our error message.</p>
<p>Combining this lens with the previous parser is as easy as before:</p>
<div class="language-idris highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">filesystemParser :</span> <span class="p">(</span><span class="kt">Filesystem</span> <span class="o">:-</span> <span class="kt">String</span><span class="p">)</span> <span class="o">=%></span> <span class="kt">CUnit</span>
<span class="n">filesystemParser</span> <span class="o">=</span> <span class="n">handleFiles</span> <span class="o">|></span> <span class="nb">map </span><span class="n">splitFile</span> <span class="o">|></span> <span class="n">join</span> <span class="p">{</span><span class="n">a</span> <span class="o">=</span> <span class="kt">String</span> <span class="o">:-</span> <span class="kt">Int</span><span class="p">}</span> <span class="o">|></span> <span class="n">lineParser</span>
<span class="nf">fsProgram :</span> <span class="kt">Filesystem</span> <span class="o">-></span> <span class="kt">String</span>
<span class="n">fsProgram</span> <span class="o">=</span> <span class="n">extract</span> <span class="n">filesystemParser</span>
</code></pre></div></div>
<p>We can now write a new main function that will take a list of files and return the errors for each file:</p>
<div class="language-idris highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">main2 :</span> <span class="kt">IO </span><span class="nb">()</span>
<span class="n">main2</span> <span class="o">=</span> <span class="kr">do</span> <span class="n">files</span> <span class="o"><-</span> <span class="n">askList</span> <span class="kt">[]</span>
<span class="n">filesAndContent</span> <span class="o"><-</span> <span class="n">traverse</span> <span class="p">(</span><span class="nf">\</span><span class="n">fn</span> <span class="o">=></span> <span class="nb">map </span><span class="p">(</span><span class="n">fn</span> <span class="o">&&</span><span class="p">)</span> <span class="o"><$></span> <span class="nb">readFile </span><span class="n">fn</span><span class="p">)</span> <span class="p">(</span><span class="nb">reverse </span><span class="n">files</span><span class="p">)</span>
<span class="kr">let</span> <span class="kc">Right</span> <span class="n">contents</span> <span class="o">=</span> <span class="nb">sequence </span><span class="n">filesAndContent</span>
<span class="o">|</span> <span class="kc">Left</span> <span class="n">err</span> <span class="o">=></span> <span class="n">printLn</span> <span class="n">err</span>
<span class="kr">let</span> <span class="n">result</span> <span class="o">=</span> <span class="n">fsProgram</span> <span class="n">contents</span>
<span class="nb">putStrLn </span><span class="n">result</span>
</code></pre></div></div>
<p>We can now write two files.
file1:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0
2

-3
20
04
1.2
</code></pre></div></div>
<p>file2:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>7
77
8
</code></pre></div></div>
<p>And obtain the error message:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>In file 'file1':
At line 2: Could not parse ""
At line 3: Could not parse "-3"
At line 6: Could not parse "1.2"
In file 'file2':
At line 0: Could not parse "7"
At line 1: Could not parse "77"
</code></pre></div></div>
<p>All that without touching our original parser, or our line tracking system.</p>
<h2>Conclusion</h2>
<p>With this toy example, we’ve only touched the surface of what dependent lenses can do for software engineering. Yet the example is simple enough to be introduced and resolved in one post, while still showing a solution to a complex problem that affects parsers and compilers across the spectrum of programming languages. In truth, dependent lenses can do much more than what is presented here: they can deal with effects, non-deterministic systems, machine learning, and more. One of the biggest barriers to mainstream adoption is the availability of dependent types in programming languages. The above was written in <a href="https://www.idris-lang.org/">Idris</a>, a language with dependent types, but if your language of choice adopts dependent types one day, you should be able to write the same program as we did just now, but for large-scale production software.</p>
<p>The program is available on <a href="https://gitlab.com/avidela/types-laboratory/-/blob/main/src/Interactive/Parsing.idr?ref_type=heads">gitlab</a>.</p>Andre VidelaDependent lenses are useful for general-purpose programming, but in which way exactly? This post demonstrates the use of dependent lenses as input/output-conversion processes, using parsing and error location reporting as a driving example.Value Chain Integrity2024-04-04T00:00:00+00:002024-04-04T00:00:00+00:00https://cybercat-institute.github.io//2024/04/04/value-chain-integrity<p>Cross-posted from <a href="https://econpatterns.substack.com/p/value-chain-integrity">Oliver’s Substack blog, EconPatterns</a></p>
<p>In the first four posts, I tried to map out an economy structured around the need to find out. This didn’t happen by accident, but is the result of spending a couple of decades in a realm where academic economic knowledge is held in little regard in no small part because its gatekeepers like to give off an air of having it already figured out, even if from the circumstances it’s clear that is rarely ever the case.</p>
<p>It doesn’t match my own opinion, but I perfectly understand when, say, the founders of a three-person startup bid adieu to their knowledge of academic economics when they learn that there is no such thing as a demand curve unless they put in the effort to assemble it piece by piece, transaction by transaction, price change by price change.</p>
<p>Most of them stop at this point and direct their attention to other, more pressing concerns, and I can’t blame them for it. The “need to find out” gets short shrift in most economics classes since economic instruction at universities generally starts from a vantage point where the groundwork has already been laid by wizards behind the curtain, and all that’s needed for mere mortals is to fine-tune the preconceived machinery.</p>
<p>That’s also a major reason why economists find employment in government, big banks, and, increasingly, publicly listed tech firms, within large machineries, but are rarely ever in demand as one of the three founders of a recently minted startup with more enthusiasm than cash — or data.</p>
<p>This series tries to remedy that situation, and I could have subtitled it either “economics for startuppers” or “startup thinking for economists”, except the intended scope — and my intended audience — is a bit wider than that.</p>
<h1>Use of decentralized knowledge in society</h1>
<p>The underlying idea of “finding out” pursued in EconPatterns is ultimately derived from Adam Smith’s gains from specialization that drive the specialization of labor, and that in turn influenced another key contribution to economic lore, Friedrich Hayek’s <a href="https://www.econlib.org/library/Essays/hykKnw.html">Use of Knowledge in Society</a>.</p>
<p>Hayek’s point was that there’s no point trying to steer the whole economy from a central vantage point because there is always someone somewhere closer to the ground, steeped in operational detail, who knows better, and can put that knowledge to better use than the central planner.</p>
<p>This idea that there is always local knowledge that is more detailed than the aggregated knowledge on the macro level, that there is knowledge that is nested, and that all participants have a mental map of the economy that is most detailed in their own vicinity and that degrades in detail, certainty, or precision, that resorts to using coarse-grained models, aggregates, or even stereotypes, the further one moves away from one’s own location, is deeply embedded in EconPatterns.</p>
<p>And this isn’t only true for the physical dimensions, it’s also true for the temporal dimension. Both the past and the future get hazy very quickly, and we resort to increasingly coarse-grained knowledge the further we go in each direction: Hayek’s “knowledge of the particular circumstances of time and place”.</p>
<p>There is an inevitable urge to remedy this shortcoming with the magic potion of “more transparency”. Every time we hear news about another supply chain pile-up, there is the inevitable stratum of pundits opining that this (the negative surprise, that is) could have been avoided if we just magically gave every participant a detailed map of the whole economy, or at least the whole chain of events — network really — leading to the participant’s problems stemming from the unexpected supply chain outage.</p>
<p>This is illusory of course to anyone attuned to the operational details of supply chain, not only because these pundits habitually underestimate by several orders of magnitude just how much operational raw data is out there, most of which is of no use to anyone but the data owner, but also because the countervailing demands of privacy and transparency (usually leading to the conundrum of each side demanding transparency from the other party but insisting on privacy for oneself) will inevitably lead to privacy winning out, except in those cases where the more powerful actor can compel less powerful actors to disclose their secrets.</p>
<p><img src="/assetsPosts/2024-04-04-value-chain-integrity/img1.webp" alt="Container ship" /></p>
<h1>Supply chain and value chain integrity</h1>
<p>Designing a mechanism that orchestrates the conflicting information needs of the participants in a value chain or its mapping into the physical realm, the supply chain, is still a holy grail in operations and in economic trade, in no small part because the reasons why such a governance mechanism is hard to come by are still poorly understood.</p>
<p>Finding this holy grail, and mapping out the path to its discovery, is of course the goal of this series. A starting point is to arrive at a better understanding of how knowledge disseminates thru an economy, where, when, and why it forms clusters (especially in the form of belief clusters), and how to interfere in that flow in a structured, goal-oriented way.</p>
<p>Just to offer a simple example, novices in the field of supply chain are often surprised to learn that the bill of lading, one of the crucial documents ensuring integrity of a product thruout its transit along ports, flights, shipments, loading and unloading, handovers and often rough handling, is still legally required to be in paper form, sent by courier from station to station.</p>
<p>A simple impulse is to blame an overbearing bureaucracy or an industry staunchly resistant to organizational change and technological progress, but an alternative and more plausible explanation is that paper solves a few integrity requirements that electronic communication still has a hard time solving.</p>
<p>When handovers and handshakes are still the literal thing involving actual hands, if signatures are still done by hand in the presence of the counterparty, we are solving a few problems about identity that turn out to be quite tricky once we try to shift them online, into the digital domain where ascertaining that an individual is who they claim to be can be exceedingly difficult.</p>
<p>Turns out this simple example is repeated all over the place, in all kinds of domains and scenarios, with a number of idiosyncratic details added, but the underlying pattern still the same.</p>
<p>This is why I will come back to that example again and again. Because that is what EconPatterns is about.</p>Oliver BeigeIn which we discuss how knowledge travels thru the economy, and how, when and where it forms clusters.Colimits of Selection Functions2024-04-01T00:00:00+00:002024-04-01T00:00:00+00:00https://cybercat-institute.github.io//2024/04/01/colimits-selection-functions<p>In <a href="https://arxiv.org/abs/2105.06332">Towards Foundations of Categorical Cybernetics</a> we built a category whose objects are selection functions and whose morphisms are lenses. It was a key step in how we <em>justified</em> open games in that paper: they’re <em>just</em> parametrised lenses “weighted” by selection functions. In this post I’ll show that by adding dependent types and stirring, we can get a nicer category that does the same job but has all colimits, and comes extremely close to having all limits. Fair warning: this post assumes quite a bit of category-theoretic background.</p>
<p>Besides being a nice thing to do in itself, we have a very specific motivation for this. The recently released paper <a href="https://arxiv.org/abs/2402.15332">Categorical deep learning: An algebraic theory of architectures</a> proposed using initial algebras and final coalgebras in categories of parametrised morphisms to build neural networks with learning invariants designed to operate on complex data structures, in a huge generalisation of <a href="https://geometricdeeplearning.com/">geometric deep learning</a>. This post is the first step to replicating the same structure in compositional game theory, and is probably the first case where a class of deep learning architectures has a game-theoretic analogue right from the beginning (ok, the first other than <a href="https://en.wikipedia.org/wiki/Generative_adversarial_network">GANs</a>) - something that is absolutely key to our vision of AI safety, as I described in <a href="https://cybercat.institute/2024/03/18/learning-invariant-preferences/">this previous post</a>.</p>
<h1>Dependent lenses</h1>
<p>In this post I’m going to work over the category of sets, to make my life easy. A <strong>container</strong> (also known as a <strong>polynomial functor</strong>) is a pair $\binom{X}{X’}$ where $X$ is a set and $X’$ is an $X$-indexed family of sets.</p>
<p>Given a pair of containers, a <strong>dependent lens</strong> $f : \binom{X}{X’} \to \binom{Y}{Y’}$ is a pair of a function $f : X \to Y$ and a function $f’ : (x : X) \times Y’ (f (x)) \to X’ (x)$. There’s a category $\mathbf{DLens}$ whose objects are containers and whose morphisms are dependent lenses (also known as the <em>category of containers</em> $\mathbf{Cont}$ and the <em>category of polynomial functors</em> $\mathbf{Poly}$ by different authors).</p>
<p>The category $\mathbf{DLens}$ has all limits and colimits, distinguishing it from the category of simply-typed lenses which is missing many of both (see my old paper <a href="https://arxiv.org/abs/1711.07059">Morphisms of Open Games</a>). In this post I want to just take that as a given fact, because calculating them is not always so easy. The slick way to prove it is by constructing $\mathbf{DLens}$ as a fibration $\int_{X : \mathbf{Set}} \left( \mathbf{Set} / X \right)^\mathrm {op}$, and using the fact that a fibred category has all co/limits if every fibre does and reindexing preserves them (a fact that we’ll be seeing again later).</p>
<h1>Dependent selection functions</h1>
<p>Write $I$ for the tensor unit of dependent lenses: it’s made of the set $1 = \{ * \}$ and the $1$-indexed set $* \mapsto 1$. A dependent lens $I \to \binom{X}{X’}$ is an element of $X$, and a dependent lens $\binom{X}{X’} \to I$ is a <em>section</em> of the container: a function $k : (x : X) \to X’ (x)$. For shorthand I’ll write $H = \mathbf{DLens} (I, -) : \mathbf{DLens} \to \mathbf{Set}$ and $K = \mathbf{DLens} (-, I) : \mathbf{DLens}^\mathrm{op} \to \mathbf{Set}$ for these representable functors.</p>
<p>By analogy to <a href="https://julesh.com/2021/03/30/selection-functions-and-lenses/">what happens in the simply-typed case</a>, a <strong>dependent selection function</strong> for a container $\binom{X}{X’}$ should be a function $\varepsilon : K \binom{X}{X’} \to H \binom{X}{X’}$ - that is, a thing that turns costates into states.</p>
<p>But I think we’re going to need things to be multi-valued in order to get all colimits (and we need it to do much game theory anyway), so let’s immediately forget that and define a <strong>dependent multi-valued selection function</strong> of type $\binom{X}{X’}$ to be a binary relation $\varepsilon \subseteq H \binom{X}{X’} \times K \binom{X}{X’}$.</p>
<p>To be honest, I don’t really have any serious examples of these things to hand, I think they’ll arise from taking colimits of things that are simply-typed. For game theory the main one we care about is still $\arg\max$, which <em>is</em> a “dependent” multi-valued selection function but only in a boring way that doesn’t use the dependent types - it’s a binary relation $\arg\max \subseteq H \binom{X}{\mathbb R} \times K \binom{X}{\mathbb R}$, where $\mathbb R$ here means the $X$-indexed set that is constantly the real numbers.</p>
<p>For each container $\binom{X}{X’}$, write $E \binom{X}{X’} = \mathcal P \left( H \binom{X}{X’} \times K \binom{X}{X’} \right)$ for the set of multi-valued selection functions for it. Since it’s a powerset it inherits a posetal structure from subset inclusion, which is a boolean algebra. That means that as a thin category, it has all limits and colimits, something that will come in useful later.</p>
<p>Given $\varepsilon \subseteq H \binom{X}{X’} \times K \binom{X}{X’}$ and a dependent lens $f : \binom{X}{X’} \to \binom{Y}{Y’}$ we can define a “pushforward” selection function $f_* (\varepsilon) \subseteq H \binom{Y}{Y’} \times K \binom{Y}{Y’}$ by $f_* (\varepsilon) = \{ (hf, k) \mid (h, fk) \in \varepsilon \}$. Defining it this way means we get functoriality for free, and it’s also monotone, so we have a functor $E : \mathbf{DLens} \to \mathbf{Pos}$.</p>
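<p>Unfolding the definitions confirms the functoriality claim: for composable lenses $f : \binom{X}{X’} \to \binom{Y}{Y’}$ and $g : \binom{Y}{Y’} \to \binom{Z}{Z’}$ (with composition written in diagrammatic order, as in $hf$), we get $g_* (f_* (\varepsilon)) = \{ (hfg, k) \mid (h, fgk) \in \varepsilon \} = (fg)_* (\varepsilon)$.</p>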
<p>The fact that we could just as easily have defined a contravariant action on dependent lenses means that the fibration we’re about to get is a bifibration, something that will <em>definitely</em> come in useful one day, but not today.</p>
<h1>Colimits of selection functions</h1>
<p>The next thing we do is take the category of elements of $E$. Objects of $\int E$ are pairs $\left( \binom{X}{X’}, \varepsilon \right)$ of a container and a selection function for it. A morphism $f : \left( \binom{X}{X’}, \varepsilon \right) \to \left( \binom{Y}{Y’}, \delta \right)$ is a dependent lens $f : \binom{X}{X’} \to \binom{Y}{Y’}$ with the property that $f_* (\varepsilon) \leq \delta$ - which is to say, any $h : H \binom{X}{X’}$ and $k : K \binom{Y}{Y’}$ satisfying $(h, fk) \in \varepsilon$ must also satisfy $(hf, k) \in \delta$.</p>
<p>So, $\int E$ is a category whose objects are dependent multi-valued selection functions and morphisms are dependent lenses. The only difference to the original category of selection functions from <a href="https://arxiv.org/abs/2105.06332">Towards Foundations</a> is that we replaced simply typed lenses with dependent lenses. This is enough to get all colimits, and I’d call $\int E$ a “nice category of selection functions”.</p>
<p>The good way to prove that a fibred category has all co/limits (see <a href="https://arxiv.org/abs/1801.02927">this paper</a>) is to show that (1) the base category has all co/limits, (2) every fibre has all co/limits, and (3) reindexing preserves co/limits. We already know (1) and (2) (remember the fibres are all boolean algebras), so we just need to prove (3). Since limits and colimits in the fibres are unions and intersections, this should not be too hard.</p>
<p>For some container $\binom{X}{X’}$, suppose we have some family $\varepsilon_i : E \binom{X}{X’}$ indexed by $i : I$. We can define the meet $\bigwedge_{i : I} \varepsilon_i$ and join $\bigvee_{i : I} \varepsilon_i : E \binom{X}{X’}$ by intersection and union. To get all colimits in $\int E$, what we need to prove is that for any dependent lens $f : \binom{X}{X’} \to \binom{Y}{Y’}$, $f_* \left( \bigvee_{i : I} \varepsilon_i \right) = \bigvee_{i : I} f_* (\varepsilon_i)$. Let’s do it:</p>
<p>Going forwards, suppose $(h, k) \in f_* \left( \bigvee_i \varepsilon_i \right)$, so by definition of $f_* $ there must be $h’$ such that $h = h’f$ and $(h’, fk) \in \bigvee_i \varepsilon_i$. So there is some $i : I$ such that $(h’, fk) \in \varepsilon_i$, so $(h’f, k) = (h, k) \in f_* (\varepsilon_i)$, therefore $(h, k) \in \bigvee_i f_* (\varepsilon_i)$.</p>
<p>The other direction, suppose $(h, k) \in \bigvee_i f_* (\varepsilon_i)$, so $(h, k) \in f_* (\varepsilon_i)$ for some $i : I$. So we must have $h’$ such that $h = h’f$ and $(h’, fk) \in \varepsilon_i$. So $(h’, fk) \in \bigvee_i \varepsilon_i$, therefore $(h’f, k) = (h, k) \in f_* \left( \bigvee_i \varepsilon_i \right)$.</p>
<p>Note, this is intentionally a pure existence proof. Actually calculating these things can be quite a pain, and I’m going to put it off until later, specifically until a paper we’re cooking up on <em>branching</em> open games.</p>
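<p>On tiny finite examples, though, the pushforward is directly computable. The sketch below is my own simplification, not the paper’s formalism: it flattens a lens $f$ into two plain functions, <code>phi</code> for the action $h’ \mapsto h’f$ and <code>psi</code> for $k \mapsto fk$, models a selection function as a bare set of pairs, and checks the join-preservation equation from the proof above.</p>

```python
from itertools import product

# Finite sketch of the pushforward: a lens f is flattened into two plain
# functions, phi for the action h' |-> h'f and psi for k |-> fk.  A selection
# function is just a set of pairs, and f_* is the formula from the proof:
# (h, k) is in f_*(eps) iff h = phi(h') for some h' with (h', psi(k)) in eps.

def pushforward(phi, psi, eps, hs, ks):
    return {(h, k) for h, k in product(hs, ks)
            for hp in phi                      # hp ranges over h' values
            if phi[hp] == h and (hp, psi[k]) in eps}

H_tgt = {'a', 'b'}                # h values at the target
K_tgt = {'x', 'y'}                # k values at the target

phi = {0: 'a', 1: 'a', 2: 'b'}    # h' |-> h'f
psi = {'x': 'u', 'y': 'v'}        # k  |-> fk

eps1 = {(0, 'u'), (2, 'v')}       # two selection functions at the source
eps2 = {(1, 'u'), (0, 'v')}

# Join preservation: f_*(eps1 v eps2) = f_*(eps1) v f_*(eps2)
lhs = pushforward(phi, psi, eps1 | eps2, H_tgt, K_tgt)
rhs = (pushforward(phi, psi, eps1, H_tgt, K_tgt)
       | pushforward(phi, psi, eps2, H_tgt, K_tgt))
assert lhs == rhs
```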
<h1>Limits of selection functions</h1>
<p>If we also had $f_* \left( \bigwedge_{i : I} \varepsilon_i \right) = \bigwedge_{i : I} f_* (\varepsilon_i)$ then $\int E$ would also have all limits, but sadly in general the best we can do is $f_* \left( \bigwedge_{i : I} \varepsilon_i \right) \subseteq \bigwedge_{i : I} f_* (\varepsilon_i)$. I’d guess this probably means that $\int E$ has some kind of lax limits or something, but I’ll deal with that another day.</p>
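<p>The failure is easy to exhibit concretely. The sketch below is my own simplification, not the paper’s formalism: it flattens a lens $f$ into two plain functions <code>phi</code> ($h’ \mapsto h’f$) and <code>psi</code> ($k \mapsto fk$), shows the meet inclusion being strict when <code>phi</code> is not injective, and shows equality restored when it is.</p>

```python
from itertools import product

# Finite sketch: f is flattened into phi (h' |-> h'f) and psi (k |-> fk),
# selection functions are sets of pairs, and f_* follows the proof's formula.

def pushforward(phi, psi, eps, hs, ks):
    return {(h, k) for h, k in product(hs, ks)
            for hp in phi
            if phi[hp] == h and (hp, psi[k]) in eps}

K_tgt = {'x', 'y'}
psi = {'x': 'u', 'y': 'v'}
eps1 = {(0, 'u'), (2, 'v')}
eps2 = {(1, 'u'), (0, 'v')}

# Non-injective phi: 0 and 1 both map to 'a', giving two different witnesses
# h' for the same h, so the meet inclusion is strict.
phi = {0: 'a', 1: 'a', 2: 'b'}
H_tgt = {'a', 'b'}
lhs = pushforward(phi, psi, eps1 & eps2, H_tgt, K_tgt)   # f_*(eps1 ^ eps2)
rhs = (pushforward(phi, psi, eps1, H_tgt, K_tgt)
       & pushforward(phi, psi, eps2, H_tgt, K_tgt))
assert lhs < rhs          # strict inclusion: lhs is empty, rhs is not

# Injective phi: the witness h' is unique, and meets are preserved.
phi_inj = {0: 'a', 1: 'b', 2: 'c'}
H_inj = {'a', 'b', 'c'}
lhs_inj = pushforward(phi_inj, psi, eps1 & eps2, H_inj, K_tgt)
rhs_inj = (pushforward(phi_inj, psi, eps1, H_inj, K_tgt)
           & pushforward(phi_inj, psi, eps2, H_inj, K_tgt))
assert lhs_inj == rhs_inj
```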
<p>It’s instructive to look at what goes wrong. If $(h, k) \in \bigwedge_i f_* (\varepsilon_i)$, then for all $i : I$ we have $(h, k) \in f_* (\varepsilon_i)$. So, for every $i$ we have $h’_i$ such that $h = h’_i f$ and $(h’_i, fk) \in \varepsilon_i$. We can make progress if $f$ is a monomorphism, in which case all of the $h’_i$ are equal because $h’_i f = h = h’_j f$ implies $h’_i = h’_j$. In fact, while I don’t know what general monomorphisms in $\mathbf{DLens}$ look like, in this case it’s enough that the forwards pass of $f$ is an injective function. This probably gives us a decent subcategory of $\int E$ that has all limits as well as all colimits, but I don’t know whether that category will be useful for anything.</p>Jules HedgesIn Towards Foundations of Categorical Cybernetics we built a category whose objects are selection functions and whose morphisms are lenses. It was a key step in how we justified open games in that paper: they're just parametrised lenses weighted by selection functions. In this post I'll show that by adding dependent types and stirring, we can get a nicer category that does the same job but has all colimits, and comes extremely close to having all limits. Fair warning: this post assumes quite a bit of category-theoretic background.On Organization2024-03-22T00:00:00+00:002024-03-22T00:00:00+00:00https://cybercat-institute.github.io//2024/03/22/on-organization<p>Cross-posted from <a href="https://econpatterns.substack.com/p/on-organization">Oliver’s Substack blog, EconPatterns</a></p>
<p>Leonard Read wrote his 1958 essay, <a href="https://oll.libertyfund.org/titles/read-i-pencil-my-family-tree-as-told-to-leonard-e-read-dec-1958">I, Pencil</a>, to drive the point home in dramatic prose that even a contraption as humble as the eponymous writing utensil depends on a wide variety of raw materials, production processes, labor, and technological advancements, to come together — all coordinated by the marvel of the market system.</p>
<p>“<em>Consider the millwork in San Leandro. The cedar logs are cut into small, pencil-length slats less than one-fourth of an inch in thickness. These are kiln dried and then tinted for the same reason women put rouge on their faces. People prefer that I look pretty, not a pallid white. The slats are waxed and kiln dried again. How many skills went into the making of the tint and the kilns, into supplying the heat, the light and power, the belts, motors, and all the other things a mill requires? Sweepers in the mill among my ancestors? Yes, and included are the men who poured the concrete for the dam of a Pacific Gas & Electric Company hydroplant which supplies the mill’s power!</em>”</p>
<p>Milton Friedman made a truncated version of Read’s pencil story famous in a 1980 <a href="https://www.youtube.com/watch?v=67tHtpac5ws">television special</a>, and in the process connected it to another famous story from economic history about the production of a seemingly simple object: Adam Smith’s parable of the pin factory.</p>
<p>But one detail eluded Friedman: the step-by-step process of putting a pin together that opens Adam Smith’s magnum opus <a href="https://archive.org/details/bim_eighteenth-century_an-inquiry-into-the-natu_smith-adam_1785_1/page/6/mode/2up">The Wealth of Nations</a> to explain the division of labor happens entirely under one roof. No handover across markets is mentioned until, presumably, the finished product is sold in bulk.</p>
<p><img src="/assetsPosts/2024-03-22-on-organization/img1.jpg" alt="Wealth of Nations" /></p>
<p>The same, we could surmise, might perfectly well hold true for Read’s story. There’s no reason why a pencil maker should not also mine the graphite needed for their one marketable product, harvest the rubber, or produce the electricity.</p>
<p>All of this has happened in the history of industrial production. But we can extrapolate from these stories and wonder if Smith’s pin maker also mills its raw material, iron or steel of a given quality, or if Read’s resource-conscious pencil maker would go as far as producing the mining machinery in-house, or maybe that’s the point where it’s willing to hand over the reins to someone more qualified.</p>
<h1>Organizations as tectonic plates</h1>
<p>Abstracting away from these two stories, we can ask: where in a complex production process should we put the handovers? The unremarkable-sounding name for this question is the make-or-buy decision, or, if a more academic term is needed, the degree of vertical (dis)integration. In operations, we speak of production depth.</p>
<p>Abstracting even further away, we can also ask where within any larger network of interactions, social or economic, should we draw the boundaries?</p>
<p>This question, on multiple layers, will occupy us quite a bit.</p>
<p>We can think of it popping up in the context of industrial production and market exchange: the economic sphere, in the context of public goods and livelihood risks: the political sphere, or even in the context of language, religion, and shared expressions of ideas and beliefs: the social sphere.</p>
<p>The laws by which we draw these boundaries, consciously or habitually, share some commonalities while there are also rules that hold only for one of these layers. Capturing them in design patterns is what this post is about.</p>
<p>For the economic sphere, which forms our primary concern, Oliver Williamson has established the fundamental dichotomy in the title of his first book: <a href="https://archive.org/details/marketshierarchi00will">markets vs hierarchies</a>, as shorthand for activities across firms vs activities within firms.</p>
<p>But in both governance mechanisms (to appropriate the title of Williamson’s <a href="https://archive.org/details/mechanismsofgove0000will">final book</a>), these labels hide some intricate machinery under the hood: “hierarchy” might be the canonical form of structuring interactions within, and “market” for interactions across organizations. But these terms contain a multitude of moving parts, all of which are subject to a myriad of design decisions.</p>
<p>Hierarchy is the canonical reporting structure for any larger organization. It’s ubiquitous enough that we can use the two terms synonymously, even if they’re not perfectly identical. It takes on the form of an upside-down tree (in the <a href="https://en.wikipedia.org/wiki/Tree_(graph_theory)">graph-theoretic sense</a>), with its root node at the top.</p>
<p>The branches in a hierarchy describe vertical relationships, usually one-to-many, also known as command-and-control. The superior defines tasks for the subordinates to undertake, provides the necessary resources, monitors, evaluates, and recompenses the work effort — at least in theory.</p>
<p>In theory hierarchy is a sorting mechanism by seniority, a catch-all term that encompasses more experience, more ability to put individual tasks in context and orchestrate them: more ability to manage. In practice, the lofty goal of sorting by superior skill is at best approximated but rarely reached.</p>
<p>In practice, hierarchies take many forms based on and sometimes even deviating from this fundamental design pattern. They can be steeper or flatter, they can incorporate matrix elements, they can be stiff or flexible.</p>
<p>“Reorganization” is a popular game played in the higher echelons of most corporate hierarchies and a neverending income stream for consultants, usually deeply unpopular among those manning the trenches.</p>
<p>This just shows that finding the perfect organizational structure is elusive for all but the simplest organizations.</p>
<p><img src="/assetsPosts/2024-03-22-on-organization/img2.jpg" alt="Acropolis" /></p>
<p>Market is the catch-all term for all economic interactions that happen between organizations. But typically we think of a market more narrowly as a central place where many buyers meet many sellers: an agora.</p>
<p>In reality most economic interactions are of the few-to-few, few-to-one, or one-to-one variety, shaped by relational rather than market interaction. The key ingredient that the economic abstraction of a many-to-many market requires is the “coincidence of wants”: buyers and sellers wanting to trade the very same thing at a price they can both agree upon have to come together at the same place and the same time.</p>
<p>This is often tricky to achieve, and might require two steps mentioned in the first newsletter: displacement in space or time, transportation or storage, to bridge the gap between producer and consumer. Even the advent of online marketplaces did little to change this.</p>
<p>Beyond the recurring reorganizations typically triggered by underperformance, companies have also been known to first outsource their entire distribution network just to reverse course and bring it back in-house. So the make-or-buy label hides a non-trivial problem with massive costs but no obvious solution.</p>
<p>But other than the recognition that markets, organizations, and the boundary in between are subject to design choices which ostensibly influence performance, can we offer another explanation for how to split a supply network into its constituent parts other than the Coasean “<a href="https://onlinelibrary.wiley.com/doi/full/10.1111/j.1468-0335.1937.tb00002.x">costs of carrying out the exchange transactions in the open market</a>”?</p>
<p>To reduce the work of half a dozen Nobelists including Ronald Coase and Oliver Williamson to a tweet-length statement (which might itself evolve into a pattern), the make-or-buy decision boils down to the choice between cost and control.</p>
<p>Expressed another way, more exactly in accounting terms: the cost of holding control over the production process vs the cost of losing control over it.</p>
<p>This is the point where we can bring in the patterns from the first two newsletters. Assuming, under Adam Smith’s division of labor, that there is another producer who can produce our part cheaper than we could in-house, what are the costs of losing control?</p>
<p>They are the costs of negative surprise.</p>
<p>While in Read’s essay it is perfectly within the supplier’s self-interest to ship us the part in the volume ordered, there are two reasons why the shipment might stall: accidentally or deliberately.</p>
<p>Accidental production stops, or more exactly fluctuations between demand and supply that trigger stock-outs, are fairly common occurrences and the daily business of supply bottleneck managers. The risk that a stock-out triggers massive knock-on costs (the aforementioned missing five-dollar part that can stop a ten-million-dollars-per-hour production line and reduce finished products to 99.9%-finished unsellable inventory) drives the decision to increase production depth even if there is no ill will on the supplier’s side.</p>
<p>But the supplier knows this and can withhold deliveries strategically, essentially holding them hostage in order to negotiate better terms. The world of procurement is even in normal times rougher than portrayed by Read. Add an external shock to the supply infrastructure and planning cycles, inventory costs, and strategic maneuvering can explode.</p>
<h1>Organizations as belief structures</h1>
<p>But can we abstract away from the purely economic — most organizations are not economic in intent — and express this avoidance of negative surprise as a design pattern for drawing the boundaries around organizations? Or, viewed from the other end: how do we split a network of interactions, social, political, or economic, into coherent clusters which we might want to call organizations or, more specifically, companies, parties, states, religious communities?</p>
<p>In the world I’ve drawn so far, the need for design (and the need to capture them in design patterns) arises from the myriad of moving parts that require design choices on multiple levels: we have to choose between market and hierarchy, once we choose hierarchy we have to choose the structure of the hierarchy, including its governance structure, and another level down we have to decide on the shape of each reporting relationship, including who gets to sit on each end.</p>
<p>This is a world of high uncertainty and while we would like to resolve each design question empirically, empiricism is costly, so we will ultimately end up with goal conflict.</p>
<p>This goal conflict, which shapes the boundary of the organization, can be expressed in three patterns.</p>
<p>The first fundamental problem of organization is resolving the conflict between moving forward and staying together.</p>
<p>The second fundamental problem of organization is resolving the conflict between moving forward and staying put.</p>
<p>The third fundamental problem of organization is to decide which direction is forward.</p>
<p>All organizations have to resolve this goal conflict — literally “where do we go from here?” — or risk breaking up. Or expressed differently, the tectonic rifts between factions occur where these goal conflicts are unresolvable.</p>
<p>Using another pattern, the paradigm of traditional industrial organization introduced by Ed Chamberlin and applied by Joe Bain, “<a href="https://archive.org/details/industrialorgani00bain">structure, conduct, performance</a>” is a more general translation as “given the situation we’re in, of all the options available, which courses of action are the ones that promise the most success?”</p>
<p>Applying this paradigm requires coming to an agreement on a mapping between actions (conduct) and future outcomes (performance) given a set of starting conditions both internal and external (structure). This mapping requires expressing and ranking subjective expectations of conditional futures: beliefs (as opposed to objective probabilities in the statistical nomenclature). Where these beliefs diverge sufficiently, coordinating efforts within an organization is no longer feasible.</p>
<p>Competition is not only, as economic textbooks imply, competition between like products, but competition between differing courses of action based on differing beliefs about their feasibility.</p>
<p>This underlying idea, that boundaries emerge where coherence of beliefs breaks down between participants, will come up repeatedly in the future, and it is the main reason why organization rather than market exchange takes pride of place in this discussion — very simply because from a design perspective, market exchange is simply a special form of organization.</p>
<p><img src="/assetsPosts/2024-03-22-on-organization/img3.jpg" alt="Landscape" /></p>Oliver BeigeIn which we describe organization and organizations as tectonic plates shaped by clashing beliefs.Learning with Invariant Preferences2024-03-18T00:00:00+00:002024-03-18T00:00:00+00:00https://cybercat-institute.github.io//2024/03/18/learning-invariant-preferences<p>It’s been a busy few weeks in the world of category theory for deep learning. First of all come the preprint <a href="https://arxiv.org/abs/2402.15332">Categorical Deep Learning: An Algebraic Theory of Architectures</a> from authors at <a href="https://www.symbolica.ai/">Symbolica</a> and <a href="https://deepmind.google/">DeepMind</a>, including our friend <a href="https://www.brunogavranovic.com/">Bruno</a>. And then hot on the heels of the paper, Symbolica raised a <em>big</em> investment round based largely on applications of the ideas in the paper.</p>
<p>The paper is about <em>structured learning</em> and it proposes a big generalisation of geometric deep learning, which is itself a big generalisation of convolutional networks. The general idea is that the data processed by a neural network is not just random data but is the vectorisation of data coming from some real world domain. If your vectors encode an image then there is implicit geometry inherited from the physical world. Geometric deep learning is all about designing architectures that encode <em>geometric</em> invariants of data, specifically in the form of invariant <em>group actions</em> a la <a href="https://en.wikipedia.org/wiki/Erlangen_program">Klein’s Erlangenprogramm</a>.</p>
<p>What the paper points out is that the whole of geometric deep learning can be massively generalised from group actions to arbitrary (co)algebras of functors and (co)monads. From there you can easily re-specialise for specific applications. For example, if your training data is vectorisation of source code of a programming language, you can encode the structure of that language’s source grammar into your architecture in a virtually mechanical way.</p>
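<p>The geometric special case is easy to see in miniature. Here is a toy check of my own devising (not from the paper): a circular convolution layer is equivariant under the cyclic group acting by shifts, and composing it with sum-pooling yields a shift-invariant network.</p>

```python
import random

# Toy demo of group-equivariance: a circular convolution on signals over Z_n
# commutes with cyclic shifts, and sum-pooling afterwards gives invariance.

random.seed(0)
n = 8
x = [random.gauss(0, 1) for _ in range(n)]   # a "signal" on Z_8
w = [random.gauss(0, 1) for _ in range(n)]   # convolution kernel

def conv(x, w):
    # circular convolution: (x * w)[i] = sum_j x[j] * w[(i - j) mod n]
    n = len(x)
    return [sum(x[j] * w[(i - j) % n] for j in range(n)) for i in range(n)]

def shift(v, s):
    # the action of s in Z_n: shift(v, s)[i] = v[(i - s) mod n]
    n = len(v)
    return [v[(i - s) % n] for i in range(n)]

# Equivariance: conv(shift(x)) == shift(conv(x))
assert all(abs(a - b) < 1e-9
           for a, b in zip(conv(shift(x, 3), w), shift(conv(x, w), 3)))

# Invariance after pooling: sum(conv(shift(x))) == sum(conv(x))
assert abs(sum(conv(shift(x, 3), w)) - sum(conv(x, w))) < 1e-9
```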
<p>Suffice to say, I’m <em>very</em> excited about this idea. This could be a watershed moment for applied category theory in general, and it happens to be something that’s right next door to us - the paper heavily uses categories of parametrised morphisms, one of the two building blocks of categorical cybernetics.</p>
<p><img src="/assetsPosts/2024-03-18-learning-invariant-preferences/eugenio-mazzone-6ywyo2qtaZ8.jpg" alt="Books" /></p>
<h1>Invariant preferences</h1>
<p>The first thought I had when I read the paper was <em>invariant preferences</em>. A real AI system is not something that exists in isolation but is something that interacts in some way with the world around it. Even if it’s not a direct “intentional” action such as a robot actuator, the information flow from the AI to the outside world is some kind of <em>action</em>, making the AI an <em>agent</em>. For example, ChatGPT is an agent that acts by responding to user prompts.</p>
<p>Intelligent agents who act can have <em>preferences</em>, the most fundamental structure of <em>decision theory</em> and perhaps also <em>microeconomics</em>. In full generality, “having preferences” means selecting actions in order to bring about certain states of the world and avoid others. Philosophical intention is not strictly required: preferences could have been imposed by the system’s designer or user, one extreme case being a thermostat. AI systems that act on an external world are the general topic of <em>reinforcement learning</em> (although some definitions of RL are too strict for our purposes here).</p>
<p>This gave me a future vision of AI safety where neural network architectures have been designed upfront to <em>statically guarantee</em> (ie. in a way that can be mathematically proven) that the learned system will act in a way that conforms to preferences chosen by the system designer. This is in contrast to, and in practice complements, most approaches to AI safety that involve supervision, interpretation, or “dynamic constraint” of a deployed system - making it the very first line of an overall <em>defense in depth</em> strategy.</p>
<p>A system whose architecture has invariant preferences will act in a way to bring about or avoid certain states of the world, <em>no matter what it learns</em>. A lot of people have already put a lot of thought into the issue of “good and bad world-states”, including very gnarly issues of how to agree on what they should be - what I’m proposing is a technological missing link, how to bridge from that level of abstraction to low-level neural network architectures.</p>
<p>This post is essentially a pitch for this research project, which as of right now we don’t have funding to do. We would have to begin with a deep study of the relationship between <em>preference</em> (the thing that actions optimise) and <em>loss</em> (the thing that machine learning optimises). This is a crossover that already exists: for example in the connection between softmax and Boltzmann distributions, where thermodynamics and entropy enter the picture uninvited yet again. But going forward I expect that categorical cybernetics, which has already built multiple new bridges between all of the involved fields (see this picture that I sketched a year ago), is going to have a lot to say about this, and we’re going to listen carefully to it.</p>
<p><img src="/assetsPosts/2024-03-18-learning-invariant-preferences/img1.jpg" alt="Mind map" /></p>
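<p>The softmax/Boltzmann connection mentioned above is a literal identity: softmax over utilities $u$ at temperature $T$ is the Boltzmann distribution with energies $E = -u$. A minimal numeric illustration of my own (the variable names are mine):</p>

```python
import math

# Softmax at temperature T over utilities u is a Boltzmann distribution
# with energies E = -u: p_i proportional to exp(-E_i / T) = exp(u_i / T).

def softmax(us, T=1.0):
    zs = [math.exp(u / T) for u in us]
    s = sum(zs)
    return [z / s for z in zs]

def boltzmann(energies, T=1.0):
    zs = [math.exp(-e / T) for e in energies]
    s = sum(zs)
    return [z / s for z in zs]

u = [1.0, 2.0, 0.5]
for T in (0.1, 1.0, 10.0):
    p = softmax(u, T)
    q = boltzmann([-ui for ui in u], T)   # same distribution, term by term
    assert all(abs(a - b) < 1e-12 for a, b in zip(p, q))

# Low temperature concentrates on the argmax (hard optimisation);
# high temperature flattens towards uniform (pure exploration).
assert softmax(u, T=0.01)[1] > 0.999
```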
<p>There are a few category-theoretic things I already have to say, but this post isn’t the best place for them. To give a hint: I suspect that preferences should be <em>coalgebraic</em> rather than algebraic according to the structural invariant learning machinery, because they describe the <em>output</em> of a neural network, as opposed to things like geometric invariants, which describe the <em>input</em>.</p>
<h1>World-models</h1>
<p>The thing that will stop this being easy is that in a world of <a href="https://en.wikipedia.org/wiki/Complete_information">incomplete information</a>, such as the real world, agents with preferences can only act with respect to their <em>internal model</em> of the outside world. If we’re relying on invariant preferences for safety, they can only be as safe as the agent’s internal model is accurate. We would also have to worry about things like the agent systematically deceiving itself for long-term gain, as many humans do. The good news is that practitioners of RL have spent a long time working on the exact issue of accurately learning world-models, the first step being off-policy algorithms that decouple <em>exploration</em> (ie. world-model learning) from <em>exploitation</em> (ie. optimisation of rewards).</p>
<p>There is also an alternative possibility of <em>manually</em> imposing a human-engineered world-model rather than allowing the agent to learn it. This would be an absolutely monumental task of industrial-scale ontology, but it’s a big part of what <a href="https://www.aria.org.uk/what-were-working-on/#davidad">Davidad’s project</a> at the UK’s new ARIA agency aims to do. Personally I’m more bullish on learning world-models by provably-accurate RL at the required scale, but your mileage may vary, and in any case invariant preferences will be needed either way.</p>
<p>To wrap up: this is a project we’re thinking about and pursuing funding to actively work on. The “Algebraic Theory of Architecture” paper only dropped a few weeks ago as I’m writing this and opens up a whole world of new possibilities, of which invariant preferences is only one, and we want to strike while the iron is still hot.</p>Jules HedgesA system whose architecture has invariant preferences will act in a way to bring about or avoid certain states of the world, no matter what it learns. A lot of people have already put a lot of thought into the issue of good and bad world-states, including very gnarly issues of how to agree on what they should be - what I'm proposing is a technological missing link, how to bridge from that level of abstraction to low-level neural network architectures.The Attention-Seeking Rational Actor2024-03-15T00:00:00+00:002024-03-15T00:00:00+00:00https://cybercat-institute.github.io//2024/03/15/attention-seeking-rational-actor<p>Cross-posted from <a href="https://econpatterns.substack.com/p/the-attention-seeking-rational-actor">Oliver’s Substack blog, EconPatterns</a></p>
<p><em>The fundamental economic exchange is surprises for eyeballs.</em></p>
<p>Modern economics is built around understanding the mechanics of market exchange, but it hasn’t always been that way. The etymological root of economics, the Greek <a href="https://www.etymonline.com/word/economy">oikonomia</a> points toward household management, or husbandry of the (largely self-sufficient) estate, the oikos. Today we would call it home economics.</p>
<p>After discussing the fundamental grid of the economy in the <a href="2024-03-18-stocks-flows-transformations">last post</a>, it makes sense to lay out the underlying assumptions of human behavior within that economy in some detail — and both the title and the introductory statement (possibly the first pattern introduced) should make it clear that these assumptions differ somewhat from the traditional textbook treatment of economic agents.</p>
<p>But they also differ from the various attempts to bound the rationality assumptions of textbook economics in some way, be it in the Carnegie “<a href="https://en.wikipedia.org/wiki/Satisficing">satisficing</a>” or in the Berkeley “<a href="https://en.wikipedia.org/wiki/Behavioral_economics">behavioral</a>” tradition. It nevertheless incorporates both, in addition to a variety of other behavioral quirks which we might not associate with the economic realm.</p>
<p>The major reason to tweak our behavioral assumptions is that to design economic structures we need a coherent framework for dealing with a variety of settings in which we need to be able to apply a varying set of behavioral assumptions while still trying to stay coherent.</p>
<p>So it’s not so much a behavioral assumption but a template for developing context-specific behavioral assumptions — or in other words, a design pattern. Humans behave differently in different social settings, and we should be able to pick the right model for the right circumstances, but still be able to treat it as a special instantiation of a shared underlying pattern.</p>
<p>This explicitly includes using the assumption of perfect rationality wherever it is warranted.</p>
<p>So let’s grab our opening statement and take it apart.</p>
<p><img src="/assetsPosts/2024-03-15-attention-seeking-rational-actor/img1.jpg" alt="Woman's face" /></p>
<h1>Eyeballs</h1>
<p>“Eyeballs” is marketing vernacular for attention. The term can be taken quite literally — there are devices that track eyeball movement to find out how much screentime is spent staring at ads. But for the most part I will use it metaphorically as the cognitive effort devoted to a task.</p>
<p>It is perfectly fine to assume away cognitive limitations in a wide variety of circumstances. It simplifies our model significantly. It deflects accusations that a given policy claim is the outcome of an opportunistically chosen (boundedly rational) behavioral model rather than an underlying economic force. And in many scenarios it creates good-enough predictions for the task at hand.</p>
<p>Assumptions are simplifications that ideally give us more gain in parsimony than loss in predictive accuracy. As long as that’s what they do, they do their job.</p>
<p>But there are also situations where such a simplifying assumption produces results that stray too far from observable reality, and we need a plan for how to adjust the behavioral model in those situations.</p>
<p>A fair starting assumption is to expect that the economic actor will allocate cognitive resources economically and allocate the most attention to those tasks where she expects the most bang for the buck. And that brings us to the other part of the statement.</p>
<h1>Surprises</h1>
<p>The economic expression for “expects most bang for the buck” is “maximum expected utility”, but this requires a lot of foreknowledge, and we can’t simply assume under all circumstances that our economic actor already possesses it. Every time you see an economics paper assuming that our actor knows something about the distribution of a random variable, you know we’re on shaky ground.</p>
<p>So the next level is to assume that our actor will venture to find out and acquire this knowledge step-by-step in what we can call a process of discovery — which usually means a sequence of failures that terminates either with a moment of success or the decision to call it off. In econspeak, this discovery process is known as tâtonnement.</p>
<p>But we shouldn’t assume that our agent just wanders around in the desert aimlessly hoping to find an oasis — a stark example of such a discovery process with a life-or-death ending — but that there should be a plan behind those wanderings.</p>
<p>That plan is usually to devote the existing resources, cognitive and physical, in a way that maximizes the knowledge gained about the terrain. In our desert scenario this might translate to climbing to the top of a ridge to survey the territory, or alternatively to stay near the valley floor to limit exposure to sunlight.</p>
<p>We can describe this process in two ways: as uncovering secrets — where a secret is anything that wasn’t known before but is known after — or as hunting for surprises.</p>
<p>Surprise expresses the same thing — some difference between what was known before vs what is known after — but it also gives us the opportunity to express it in two ways: positive surprise and negative surprise.</p>
<h1>The fundamental economic exchange is surprise for eyeballs</h1>
<p>Loosely translated, positive surprise is beneficial — something worth seeking out — and negative surprise is harmful — something to be avoided. On this single dimension we can build a (surprisingly) wide range of behavioral models, including differentiating individuals by their propensity to seek out positive surprise and accept negative surprise in the process, in other words by their affinity for disorder.</p>
<p>This has clear connections to the behavioral assumption of risk preference, and this connection definitely warrants further attention — risk is a transferable economic commodity — but it also gives us the additional angle that planning is a vehicle to mitigate negative surprise for individual actors, and contracting is a vehicle to mitigate negative surprise for collective action, including the canonical form of collective action: the organization (which will be at the center of next week’s post).</p>
<p>A lot of this will be fleshed out in the weeks to come, and some of the jumping-off points should already be apparent. Surprise gives us the opportunity to invoke both information entropy and ultimately thermodynamic entropy. But as already mentioned, this series will only use these ideas conceptually, and point towards formal treatments in their respective literatures.</p>
<p>Design is a guided trial-and-error process where judgment calls have to be made about the structure of the problem, about splitting it into its constituent parts and putting the parts back together in the hope that no unwanted interaction effects emerge, about taking requirements and putting them in an order, about defining and resolving contingencies and dependencies, about the level of detail at which a problem needs to be resolved, at which precision, and how far into the future.</p>
<p>For this we need a flexible model of behavioral assumptions that can be adjusted to fit the task at hand, that can be experimented with. “Surprises for eyeballs”, or in other words, “secrets for attention”, gives us exactly that.</p>
<h1>The good old-fashioned attention economy</h1>
<p>There’s an obvious objection to this treatment, and it’s a fair one. “Surprise for eyeballs” is most obviously suited to the information economy, or maybe more aptly: the attention economy, and in the trad economy we might be better off dealing with the canonical exchange of supply vs demand in its trad form of an effort (a product or service) vs a payment.</p>
<p>Let me use George Akerlof’s famous essay on the <a href="https://www.jstor.org/stable/1879431">market for lemons</a> to show why even a world of one-off transfers of physical objects for simultaneous transfers of monetary equivalents is still a special case of an attention economy full of surprises.</p>
<p>Akerlof’s paper kicked off the field of information economics, and is most widely associated with introducing the concept of asymmetric information. But as the second half of its title suggests, it’s actually about quality competition (a “lemon” being a colloquial term for a used car of poor quality), and the information angle is about the inability to convey this quality — especially the inability of the owner of a high-quality car to establish that his car is not a lemon.</p>
<p>But how do we find out if a car is a lemon? And how do we insure ourselves against the risk of acquiring a lemon? By finding out.</p>
<p>In the same sense as the stranded-in-the-desert example above, the process of finding out is a discovery process, except with opposite signs. It’s a sequence of successes terminated by a failure — which is true for all machines: they run until they break down.</p>
<p>But there’s an inevitable random element to this process, and even if we can assume that lemon-ness correlates negatively with longevity, that relationship is far from deterministic. We cannot conclude with certainty from the time of failure whether the car was a lemon — even if the prior owner knew about its lemon-ness.</p>
<p>This simple recognition has a wide array of ramifications worth taking apart in detail, because most of them are central to economic design — not only of economic engines like markets, auctions, recommenders or reputation engines, but also to the design of economic institutions. Notoriously, the business model of the Roman Catholic Church is that of a certifier of good conduct: a good old-fashioned reputation engine.</p>
<p>The tl;dr of this excursion is that almost all goods are experience goods in that their value only becomes apparent when they are consumed, and the consumption harbors the possibility for surprise, positive or negative.</p>
<p>Whether this happens over a longer time span, as with driving a car, happens immediately, as with eating ice cream, or whether immediate consumption triggers belated effects like a toothache, depends on the circumstances.</p>
<p>But the canonical economic trade of a perfectly substitutable commodity of perfectly equal quality is a simplifying assumption resting on a lot of institutional underpinnings. Almost all trades, in the trad economy or the digital economy, contain an element of surprise, and in turn engage our propensity to shield ourselves from it, or to embrace it.</p>Oliver BeigeIn which we establish an underlying model for human behavior and claim that all economies are just a variation of the attention economy.Stocks, Flows, Transformations: The Cybernetic Economy2024-03-08T00:00:00+00:002024-03-08T00:00:00+00:00https://cybercat-institute.github.io//2024/03/08/stocks-flows-transformations<p>Cross-posted from <a href="https://econpatterns.substack.com/p/stocks-flows-transformations-the">Oliver’s Substack blog, EconPatterns</a></p>
<p>On a certain level of abstraction, an economy can be described as a network of stocks, flows, and transformations. Let’s call this level the cybernetic economy.</p>
<h1>Stocks, flows, transformations</h1>
<p>Stocks and flows are two fundamental forms of displacement: in time and space respectively, and they are typically restricted by upper and lower capacity constraints: overstock vs stockout, overflow vs desiccation.</p>
<p>Transformation in the usual sense of industrial production means the recombination of inputs to produce new outputs, but we can also include creation and consumption as the starting and ending points of network flows. In the case of natural resources, creation often takes the form of extraction.</p>
<p>The stocks and flows usually come in the form of information, materials, effort, payments, equipment, and on a more abstract level, risks, beliefs, rights, and commitments. Risk is just as much an economic good as any physical material: it can be transformed, bundled, disassembled, and transported.</p>
<p>Most of these objects should sound familiar from economic textbooks, especially macroeconomic textbooks. The cybernetic economy differs from this textbook treatment mostly by explicitly highlighting the network of interactions, and by stressing the global ramifications of local interactions.</p>
<p>This network view of the economy on the other hand should be familiar to anyone with a background in industrial production, where orchestrating multi-step processes on shop floors densely packed with machines, pathways, buffers, and assembly stations is a major part of the job description, and where stockouts of five-dollar parts can stop ten-million-an-hour assembly lines — as can pathways congested by improvised material buffer overflows.</p>
<p><img src="/assetsPosts/2024-03-08-stocks-flows-transformations/img1.jpg" alt="Shipping containers" /></p>
<p>Economics, especially macroeconomics, usually skips this operational layer for the sake of expositional expediency, and for the most part it does ok doing so. As long as the operational friction stays within bounds, no stocks and flows pushing against their upper or lower capacity limits, no production schedules foiled by unobtainable five-dollar components, we can safely assume a frictionless world and focus on the established gears and levers central to macroeconomic inquiry.</p>
<p>In other words, as long as there is only a modicum of disorder in the economy, it’s perfectly fine to assume a well-ordered economy.</p>
<p>Which underlines a key principle: the right level of aggregation matters. A map is not the territory, but we might need different maps to do different things within the territory. In the same sense we can drop operational details and aggregate activity on a high level as long as we can be sure that the loss of realism — the loss of predictability — is inconsequential for the task at hand.</p>
<p>But we should have a more fine-grained map at the ready just in case our survey map fails to capture the finer points.</p>
<h1>The cybernetic economy</h1>
<p>The economy we’re looking at is an economy that can be disaggregated and disassembled to the individual component, the individual participant, the individual activity, just as needed whenever it is needed.</p>
<p>I’m resurrecting the somewhat outmoded term “cybernetic” for it because it conveys the focus on flows, on routing, buffering, concatenating, on orchestrating activities and resources.</p>
<p>Routing, network flow, buffering, job shop scheduling, and machine replacement models are all standard tools of the trade in operations research. They are no longer, or not yet again, standard tools in economics, but in order to describe the economic activities as intended, and to couch them in a wider social and political context, they should become economic tools again.</p>
<p>EconPatterns intends to bring them back together under the same motivation that it intends to bring mathematical, statistical and computational tools together: to build up a toolset which we can use to design economic objects.</p>
<p>But, and this is the conjurer’s trick, it’ll do so almost entirely without resorting to formal modeling or even mathematical notation. This is not out of nostalgia for an era where political economy was a branch of the philosophical faculties. The economy is as data rich as any field of inquiry and we seem to have just enough recognizable, repeating and generalizable patterns to give the scientific method a try.</p>
<p>But the point of the exercise is to develop an economic design language, to establish a conceptual foundation, rather than to rephrase current economic knowledge. This is why it invokes the famous Bauhaus Vorkurs, the foundational course that gave the Bauhaus students a starting point from which to branch out into their respective workshops.</p>
<p>The things for which economics, mathematics, statistics, operations research, computer science, and other fields have developed very intricate formal mechanisms will pop up mostly as pointers. The question of which sorting, filtering, or separating algorithm to use is relevant and often decisive to the success of an economic activity, but it is secondary to the question of when to sort, filter, or separate — and what.</p>
<p>Instead it will take very close looks — some might think unreasonably close looks but my hope is the reasons for doing so will reveal themselves in due time — at existing economic artifices and their constituent parts. One of the motivations is to show that the Grand Bazaar in Istanbul and an online e-commerce platform have surprisingly many things in common, and there’s a reason for it.</p>
<h1>An economic pattern language</h1>
<p>To this end, EconPatterns — and I believe this is the defining novelty — will borrow liberally from design theory and practice, as well as from architecture. The chosen container for this endeavor is Christopher Alexander’s design pattern. There are many reasons for this choice, not the least of which is that design patterns have successfully been translated from architecture to software design.</p>
<p>The in-depth discussion of “why design patterns?” surely deserves its own article, but it also introduces an interesting tension. As design philosophies go, Alexander and the Bauhaus stalwarts are certainly at opposing ends of the spectrum, A to B, organic to geometric, habitable spaces to machines for living.</p>
<p>I’m hoping to put this tension to good use. Designing economic contraptions poses relevant questions beyond their productivity and efficiency. Which is a major reason why I am not trying to resolve that conflict or take sides.</p>
<p>Admittedly, the whole endeavor is open-ended, and the crucial question if the patterns sketched out so far will ultimately come together as a coherent whole is still unresolved. This is why the blog format is the right one at this juncture: to put the question out in the open while I present the first pieces of the puzzle.</p>
<p>EconPatterns will inevitably be shaped by my own background and my own particular interests, which is one reason why economic organization will be the initial focus. The fundamental model of the economy is different, as is the underlying concept of human behavior (as next week’s entry will show). I’m somewhat inclined to say that there are not that many people out there with a background both in design and economics, so I’m quite comfortable in claiming that the exercise should offer sufficient novelty.</p>
<p>I’m also very clear that I don’t hold exclusive rights to the very concept of design patterns — if anything I might be the first practitioner to apply them to economic design problems — but the ultimate defining characteristic of design patterns that sets them apart from economic laws is that they’re entirely voluntary. They are simply proposals of how to look at, structure, and solve a certain design problem, and the ultimate arbiter of their success is whether enough practitioners find them useful enough to apply them to express their ideas.</p>
<p>Which in itself should hopefully take much of the pedantry out of economic debates.</p>Oliver BeigeAn Economic Pattern Language (@econpatterns for short) takes the economy and disassembles it into its constituent parts. But first, this blog post describes the economy as a whole.Iteration with Optics2024-02-22T00:00:00+00:002024-02-22T00:00:00+00:00https://cybercat-institute.github.io//2024/02/22/iteration-optics<p>In this post I’ll describe the theory of how to add iteration to categories of optics. Iteration is required for almost all applications of categorical cybernetics beyond game theory, and is something we’ve been handling only semi-formally for some time. The only tool we need is already one we have inside the categorical cybernetics framework: parametrisation weighted by a lax monoidal functor. I’ll end with a conjecture that this is an instance of a general procedure to force states in a symmetric monoidal category.</p>
<p>This post is strongly inspired by the account of Moore machines in <a href="http://davidjaz.com/">David Jaz Myers</a>’ book <a href="http://davidjaz.com/Papers/DynamicalBook.pdf">Categorical Systems Theory</a>, and <a href="https://matteocapucci.wordpress.com/">Matteo</a>’s enthusiasm for it. There’s probably a big connection to things like <a href="https://arxiv.org/abs/1903.01093">Delayed trace categories</a>, but I don’t understand it yet.</p>
<p>The diagrams in this post are made with <a href="https://q.uiver.app/">Quiver</a> and <a href="https://varkor.github.io/tangle/">Tangle</a>.</p>
<h1>The iteration functor</h1>
<p>For the purposes of this post, we’ll be working with a symmetric monoidal category $\mathcal C$, and the category $\mathbf{Optic} (\mathcal C)$ of monoidal optics over it. Objects of $\mathbf{Optic} (\mathcal C)$ are pairs of objects of $\mathcal C$, and morphisms are given by the coend formula</p>
\[\mathbf{Optic} (\mathcal C) \left( \binom{X}{X'}, \binom{Y}{Y'} \right) = \int_{M : \mathcal C} \mathcal C (X, M \otimes Y) \times \mathcal C (M \otimes Y', X')\]
<p>which amounts to saying that an optic $\binom{X}{X'} \to \binom{Y}{Y'}$ is an equivalence class of triples</p>
\[(M : \mathcal C, f : X \to M \otimes Y, f' : M \otimes Y' \to X')\]
<p>I’m pretty sure everything in this post works for other categories of bidirectional processes such as mixed optics and dependent lenses; this is just a setting to write it down which is both convenient and not at all obvious.</p>
<p>The <strong>iteration functor</strong> is a functor $\mathrm{Iter} : \mathbf{Optic} (\mathcal C) \to \mathbf{Set}$ defined on objects by</p>
\[\mathrm{Iter} \binom{X}{X'} = \int_{M : \mathcal C} \mathcal C (I, M \otimes X) \times \mathcal C (M \otimes X', M \otimes X)\]
<p>We refer to elements of $\mathrm{Iter} \binom{X}{X'}$ as <em>iteration data</em> for $\binom{X}{X'}$. We call the object $M$ the <em>state space</em>, the morphism $x_0 : I \to M \otimes X$ the <em>initial state</em> and the morphism $i : M \otimes X' \to M \otimes X$ the <em>iterator</em>.</p>
<p>Note that in the common case that $\mathcal C$ is cartesian monoidal, we can eliminate the coend to obtain a simpler characterisation:</p>
\[\mathrm{Iter} \binom{X}{X'} = \mathcal C (1, X) \times \mathcal C (X', X)\]
<p>Given an optic $f : \binom{X}{X'} \to \binom{Y}{Y'}$ represented by $f = (N, f : X \to N \otimes Y, f' : N \otimes Y' \to X')$, we get a function</p>
\[\mathrm{Iter} (f) : \mathrm{Iter} \binom{X}{X'} \to \mathrm{Iter} \binom{Y}{Y'}\]
<p>Namely, the state space is $M \otimes N$, the initial state is</p>
\[I \overset{x_0}\longrightarrow M \otimes X \xrightarrow{M \otimes f} M \otimes N \otimes Y\]
<p>and the iterator is</p>
\[M \otimes N \otimes Y' \xrightarrow{M \otimes f'} M \otimes X' \overset{i}\longrightarrow M \otimes X \xrightarrow{M \otimes f} M \otimes N \otimes Y\]
<p>This is evidently functorial. Funnily enough, although the action of $\mathrm{Iter}$ on objects when $\mathcal C$ is cartesian is easier to understand, its action on morphisms is less obvious and is not <em>evidently</em> functorial, instead demanding a small proof.</p>
<h1>Pairing iterators and continuations</h1>
<p>We have an existing functor $K : \mathbf{Optic} (\mathcal C)^{\mathrm{op}} \to \mathbf{Set}$, given on objects by $K \binom{X}{X'} = \mathcal C (X, X')$. This is the <em>continuation functor</em>, and it is the contravariant functor represented by the monoidal unit $\binom{I}{I}$. (This functor first appeared in <a href="https://arxiv.org/abs/1711.07059">Morphisms of Open Games</a>.)</p>
<p>For the remainder of this section I’ll specialise to the case $\mathcal C = \mathbf{Set}$, in which case an optic $\binom{X}{X'} \to \binom{Y}{Y'}$ is determined by a pair of functions $f : X \to Y$ and $f' : X \times Y' \to X'$, and iteration data $i : \mathrm{Iter} \binom{X}{X'}$ is determined by an initial value $x_0 : X$ and a function $i : X' \to X$.</p>
<p>Given iteration data and a continuation that agree on their common boundary, we know enough to run the iteration and produce an infinite stream of values:</p>
\[\left< - | - \right> : \mathrm{Iter} \binom{X}{X'} \times K \binom{X}{X'} \to X^\omega\]
<p>Namely, this stream is defined corecursively by</p>
\[\left< x_0, i | k \right> = x_0 : \left< i (k (x_0)), i | k \right>\]
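<p>In this cartesian setting the corecursive definition can be transcribed directly. Here is a minimal Haskell sketch (the name <code>pair</code> is mine): iteration data is an initial value together with an iterator, a continuation is a function from states to feedback, and pairing them unfolds the stream of visited states.</p>

```haskell
-- Pairing iteration data with a continuation, specialised to Set:
--   <x0, i | k> = x0 : <i (k x0), i | k>
pair :: (x, x' -> x) -> (x -> x') -> [x]
pair (x0, i) k = x0 : pair (i (k x0), i) k
```

<p>For example, <code>take 5 (pair (0, (+ 1)) id)</code> evaluates to <code>[0,1,2,3,4]</code>.</p>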
<p>This operation is natural (technically, <em>dinatural</em>): for any iteration data $i : \mathrm{Iter} \binom{X}{X'}$, optic $f : \binom{X}{X'} \to \binom{Y}{Y'}$ and continuation $k : K \binom{Y}{Y'}$, we have</p>
\[\left< i | K (f) (k) \right> = f^\omega \left( \left< \mathrm{Iter} (f) (i) | k \right> \right)\]
<p>where $f^\omega (-) : X^\omega \to Y^\omega$ means applying the forwards pass of $f$ to every element of the stream. As a commuting diagram,</p>
<p><img src="/assetsPosts/2024-02-20-iteration-optics/dinaturality.png" alt="Dinaturality" /></p>
<p>Here’s a tiny implementation of the iteration functor and the pairing operator in Haskell:</p>
<div class="language-haskell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kr">data</span> <span class="kt">Iterator</span> <span class="n">s</span> <span class="n">t</span> <span class="o">=</span> <span class="kt">Iterator</span> <span class="p">{</span>
<span class="n">initialState</span> <span class="o">::</span> <span class="n">s</span><span class="p">,</span>
<span class="n">updateState</span> <span class="o">::</span> <span class="n">t</span> <span class="o">-></span> <span class="n">s</span>
<span class="p">}</span>
<span class="n">mapIterator</span> <span class="o">::</span> <span class="kt">Lens</span> <span class="n">s</span> <span class="n">t</span> <span class="n">a</span> <span class="n">b</span> <span class="o">-></span> <span class="kt">Iterator</span> <span class="n">s</span> <span class="n">t</span> <span class="o">-></span> <span class="kt">Iterator</span> <span class="n">a</span> <span class="n">b</span>
<span class="n">mapIterator</span> <span class="n">l</span> <span class="p">(</span><span class="kt">Iterator</span> <span class="n">s</span> <span class="n">f</span><span class="p">)</span> <span class="o">=</span> <span class="kt">Iterator</span> <span class="p">(</span><span class="n">s</span> <span class="o">^#</span> <span class="n">l</span><span class="p">)</span> <span class="p">(</span><span class="nf">\</span><span class="n">b</span> <span class="o">-></span> <span class="p">(</span><span class="n">f</span> <span class="p">(</span><span class="n">s</span> <span class="o">&</span> <span class="n">l</span> <span class="o">.~</span> <span class="n">b</span><span class="p">))</span> <span class="o">^#</span> <span class="n">l</span><span class="p">)</span>
<span class="n">runIterator</span> <span class="o">::</span> <span class="kt">Iterator</span> <span class="n">s</span> <span class="n">t</span> <span class="o">-></span> <span class="kt">Lens</span> <span class="n">s</span> <span class="n">t</span> <span class="nb">()</span> <span class="nb">()</span> <span class="o">-></span> <span class="p">[</span><span class="n">s</span><span class="p">]</span>
<span class="n">runIterator</span> <span class="p">(</span><span class="kt">Iterator</span> <span class="n">s</span> <span class="n">f</span><span class="p">)</span> <span class="n">l</span> <span class="o">=</span> <span class="n">s</span> <span class="o">:</span> <span class="n">runIterator</span> <span class="p">(</span><span class="kt">Iterator</span> <span class="p">(</span><span class="n">f</span> <span class="p">(</span><span class="n">s</span> <span class="o">&</span> <span class="n">l</span> <span class="o">.~</span> <span class="nb">()</span><span class="p">))</span> <span class="n">f</span> <span class="p">)</span> <span class="n">l</span>
</code></pre></div></div>
<h1>The category of elements of Iterator</h1>
<p>The next step is to form the category of elements $\int \mathrm{Iter}$, also known as the discrete Grothendieck construction. This is a category whose objects are tuples $\left( \binom{X}{X'}, i \right)$ of an object $\binom{X}{X'}$ of $\mathbf{Optic} (\mathcal C)$ and a choice of iteration data $i : \mathrm{Iter} \binom{X}{X'}$. A morphism $\left( \binom{X}{X'}, i \right) \to \left( \binom{Y}{Y'}, j \right)$ is an optic $f : \binom{X}{X'} \to \binom{Y}{Y'}$ with the property that $\mathrm{Iter} (f) (i) = j$, that is to say, the iteration data on the left and right boundary have to agree.</p>
<p>The functor $\mathrm{Iter} : \mathbf{Optic} (\mathcal C) \to \mathbf{Set}$ is lax monoidal: there is an evident natural way to combine pairs of iteration data into iteration data for pairs:</p>
\[\nabla : \mathrm{Iter} \binom{X}{X'} \times \mathrm{Iter} \binom{Y}{Y'} \to \mathrm{Iter} \binom{X \otimes Y}{X' \otimes Y'}\]
<p>This means that the tensor product of $\mathbf{Optic} (\mathcal C)$ lifts to $\int \mathrm{Iter}$, by</p>
\[\left( \binom{X}{X'}, i \right) \otimes \left( \binom{Y}{Y'}, j \right) = \left( \binom{X \otimes Y}{X' \otimes Y'}, i \nabla j \right)\]
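<p>In the cartesian case this lax monoidal structure is easy to write down. A minimal Haskell sketch (the name <code>nabla</code> is mine, and the <code>Iterator</code> type from the earlier snippet is redeclared here so the example is self-contained): initial states pair up, and the iterators act componentwise.</p>

```haskell
data Iterator s t = Iterator
  { initialState :: s
  , updateState  :: t -> s
  }

-- Combining two pieces of iteration data into iteration data for the pair.
nabla :: Iterator s t -> Iterator u v -> Iterator (s, u) (t, v)
nabla (Iterator s0 f) (Iterator u0 g) =
  Iterator (s0, u0) (\(t, v) -> (f t, g v))
```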
<p>The category $\int \mathrm{Iter}$ can essentially already describe iteration with optics, although in a slightly awkward way. Suppose we draw a string diagram that not coincidentally resembles a control loop:</p>
<p><img src="/assetsPosts/2024-02-20-iteration-optics/closed-control-loop.png" alt="Control loop" /></p>
<p>Here, $f$ and $f'$ denote some morphisms $f : X \to Y$ and $f' : Y \to X$ in our underlying category, and $x_0$ represents an initial state $x_0 : I \to X$.</p>
<p>Normally string diagrams denote morphisms of a monoidal category, but we make a cut just to the right of the backwards-to-forwards turning point, and consider that everything left of that is describing a boundary object. Namely in this case, we have the object $\left( \binom{X}{X}, i \right)$ where the iteration data $i : \mathrm{Iter} \binom{X}{X}$ is given by the state space $I$, the initial state $x_0 : I \to I \otimes X$ and the iterator $\mathrm{id} : I \otimes X \to I \otimes X$.</p>
<p><img src="/assetsPosts/2024-02-20-iteration-optics/cut-control-loop.png" alt="Control loop" /></p>
<p>The remainder of the string diagram to the right of the cut denotes an ordinary optic $f : \binom{X}{X} \to \binom{I}{I}$, namely the one given by $f = (Y, f, f')$, with forwards pass $f : X \to Y \otimes I$ and backwards pass $f' : Y \otimes I \to X$. This boils down to describing the composite morphism $f; f' : X \to X$.</p>
<p>Overall, we can read this diagram as denoting a morphism $f$ in $\int \mathrm{Iter}$ of type $f : \left( \binom{X}{X}, i \right) \to \left( \binom{I}{I}, \mathrm{Iter} (f) (i) \right)$. The iteration data on the right boundary is $\mathrm{Iter} (f) (i) : \mathrm{Iter} \binom{I}{I}$, which concretely has state space $Y$, the initial state $x_0; f : I \to Y$ and iterator $f'; f : Y \to Y$.</p>
<p>This works in principle, but splitting the diagram between denoting an object and denoting a morphism is very non-standard. So far, this amounts to doing for the iteration functor what we did for the selection functions functor in section 6 of <a href="https://arxiv.org/abs/2105.06332">Towards Foundations of Categorical Cybernetics</a>.</p>
<h1>The full theory of iteration</h1>
<p>Now we take the final step to fix the slight clunkiness of using $\int \mathrm{Iter}$ as a model of iteration. This continues the firmly established pattern that categorical cybernetics contains only two ideas that get combined in more and more intricate ways: optics and parametrisation.</p>
<p>There is a strong monoidal functor $\pi : \int \mathrm{Iter} \to \mathbf{Optic} (\mathcal C)$ that forgets the iteration data, namely the discrete fibration $\pi \left( \binom{X}{X'}, i \right) = \binom{X}{X'}$. This functor generates an action of the monoidal category $\int \mathrm{Iter}$ on $\mathbf{Optic} (\mathcal C)$, namely</p>
\[\left( \binom{X}{X'}, i \right) \bullet \binom{Y}{Y'} = \binom{X \otimes Y}{X' \otimes Y'}\]
<p>See section 5.5 of <a href="https://arxiv.org/abs/2203.16351">Actegories for the Working Mathematician</a> for far too much information about actegories of this form.</p>
<p>We now take the category $\mathbf{Para}_{\int \mathrm{Iter}} (\mathbf{Optic} (\mathcal C))$ of parametrised morphisms generated by this action. We also refer to this kind of thing (parametrisation for the action generated by a discrete fibration) as the Para construction <em>weighted</em> by $\mathrm{Iter}$, $\mathbf{Para}^\mathrm{Iter} (\mathbf{Optic} (\mathcal C))$ - the name comes from it being a kind of <a href="https://ncatlab.org/nlab/show/weighted+limit">weighted limit</a> and I think the reference for this is <a href="https://www.brunogavranovic.com/">Bruno</a>’s PhD thesis, which is sadly unreleased as I’m writing this.</p>
<p>Working things through: an object of $\mathbf{Para}^\mathrm{Iter} (\mathbf{Optic} (\mathcal C))$ is still a pair $\binom{X}{X'}$, but a morphism $\binom{X}{X'} \to \binom{Y}{Y'}$ consists of three things: another pair of objects $\binom{Z}{Z'}$, iteration data $i : \mathrm{Iter} \binom{Z}{Z'}$, and an optic $\binom{X \otimes Z}{X' \otimes Z'} \to \binom{Y}{Y'}$.</p>
<p>Now suppose we have a diagram of an open control loop, that is to say, a control loop that is open-as-in-systems (not to be confused with an <a href="https://en.wikipedia.org/wiki/Open-loop_controller">open loop controller</a>, which is unrelated):</p>
<p><img src="/assetsPosts/2024-02-20-iteration-optics/open-control-loop.png" alt="Open control loop" /></p>
<p>Here the primitive morphisms in the diagram are $f : A \otimes X \to B \otimes Y$, $f' : B' \otimes Y \to A' \otimes X$, and an initial state $x_0 : I \to X$. The idea is that $f$ is the forwards pass, $f'$ is the backwards pass, and after the backwards pass comes another forwards pass but one time step in the future.</p>
<p>To make formal sense of this diagram, we imagine that we deform the backwards-to-forwards bend upwards, treating the state as a parameter, and then cut the diagram as we did before:</p>
<p><img src="/assetsPosts/2024-02-20-iteration-optics/cut-open-control-loop.png" alt="Cut open control loop" /></p>
<p>Now we can read this off as a morphism $\binom{X}{X'} \to \binom{Y}{Y'}$ in $\mathbf{Para}^\mathrm{Iter} (\mathbf{Optic} (\mathcal C))$. The (weighted) Para construction makes everything go smoothly, so this is an entirely standard string diagram with no funny stuff.</p>
<p>Technically categories of parametrised morphisms are always bicategories (or better, double categories), and I think this is a rare case where we actually want to quotient out all morphisms in the vertical direction, i.e. identify $\left( f : \binom{X \otimes Z}{X' \otimes Z'} \to \binom{Y}{Y'}, i : \mathrm{Iter} \binom{Z}{Z'} \right)$ with $\left( g : \binom{X \otimes W}{X' \otimes W'} \to \binom{Y}{Y'}, j : \mathrm{Iter} \binom{W}{W'} \right)$ whenever there is <em>any</em> optic $h : \binom{Z}{Z'} \to \binom{W}{W'}$ making $\mathrm{Iter} (h) (i) = j$ and commuting with $f$ and $g$. Coming back to our earlier picture of cutting a string diagram, this exactly says that we identify all of the different ways we could make the cut. In order to do this we change the base of enrichment along the functor $\pi_0 : \mathbf{Cat} \to \mathbf{Set}$ taking each category to its set of connected components.</p>
<p>One final note: Almost everything in this post used nothing but the fact that $\mathrm{Iter}$ is a lax monoidal functor $\mathbf{Optic} (\mathcal C) \to \mathbf{Set}$. With minimal translation, I think the entire thing works as a story about “forcing states in a symmetric monoidal category”: given any symmetric monoidal category $\mathcal C$ and a lax monoidal functor $F : \mathcal C \to \mathbf{Set}$, the category $\mathbf{Para}^F (\mathcal C)$ is equivalently described as $\mathcal C$ freely extended with a morphism $x : I \to X$ for every $x : F (X)$. I’ll leave this as a conjecture for somebody else to prove.</p>Jules HedgesIn this post I'll describe the theory of how to add iteration to categories of optics. Iteration is required for almost all applications of categorical cybernetics beyond game theory, and is something we've been handling only semi-formally for some time. The only tool we need is already one we have inside the categorical cybernetics framework: parametrisation weighted by a lax monoidal functor. I'll end with a conjecture that this is an instance of a general procedure to force states in a symmetric monoidal category.Passive Inference is Compositional, Active Inference is Emergent2024-02-06T00:00:00+00:002024-02-06T00:00:00+00:00https://cybercat-institute.github.io//2024/02/06/passive-inference-compositional<p>This post is a writeup of a talk I gave at the <a href="https://amcs-community.org/events/causal-cognition-humans-machines/">Causal Cognition in Humans and Machines</a> workshop in Oxford, about some work in progress I have with <a href="https://tsmithe.net/">Toby Smithe</a>. To a large extent this is my take on the theoretical work in Toby’s PhD thesis, with the emphasis shifted from category theory and neuroscience to numerical computation and AI. In the last section I will outline my proposal for how to build AGI.</p>
<h2>Markov kernels</h2>
<p>The starting point is the concept of a <a href="https://en.wikipedia.org/wiki/Markov_kernel">Markov kernel</a>, which is a synonym for <a href="https://en.wikipedia.org/wiki/Conditional_probability_distribution">conditional probability distribution</a> that sounds unnecessarily fancy but, crucially, contains only 30% as many syllables. If $X$ and $Y$ are some sets then a Markov kernel $\varphi$ from $X$ to $Y$ is a conditional probability distribution $\mathbb P_\varphi [y \mid x]$. Most of this post will be agnostic to what exactly “probability distribution” can mean, but in practice it will <em>probably</em> eventually mean “Gaussian”, in order to <a href="https://knowyourmeme.com/memes/money-printer-go-brrr">go <em>brrr</em></a>, by which I mean <em>effective in practice at the expense of theoretical compromise</em>. (I blatantly stole this usage of that meme from <a href="https://www.brunogavranovic.com/">Bruno</a>.)</p>
<p>There are two different perspectives on how Markov kernels can be implemented. They could be <em>exact</em>, for example, they could be represented as a stochastic matrix (in the finite support case) or as a tensor containing a mean vector and covariance matrix for each input (in the Gaussian case). Alternatively they could be <a href="https://en.wikipedia.org/wiki/Monte_Carlo_method">Monte Carlo</a>, that is, implemented as a function from $X$ to $Y$ that may call a pseudorandom number generator. If we send the same input repeatedly then the outputs are samples from the distribution we want. Importantly these functions satisfy the <a href="https://en.wikipedia.org/wiki/Markov_property">Markov property</a>: the distribution on the output depends only on the current input and not on any internal state.</p>
<p>An important fact about Markov kernels is that they can be composed. Given a Markov kernel $\mathbb P_\varphi [y \mid x]$ and another $\mathbb P_\psi [z \mid y]$, there is a composite kernel $\mathbb P_{\varphi; \psi} [z \mid x]$ obtained by integrating out $y$:</p>
\[\mathbb P_{\varphi; \psi} [z \mid x] = \int \mathbb P_\varphi [y \mid x] \cdot \mathbb P_\psi [z \mid y] \, dy\]
<p>This formula is sometimes given the unnecessarily fancy name <a href="https://en.wikipedia.org/wiki/Chapman%E2%80%93Kolmogorov_equation">Chapman-Kolmogorov equation</a>. If we represent kernels by stochastic matrices, then this is exactly matrix multiplication; if they are Gaussian tensors, then it’s a similar but slightly more complicated operation. Doing exact probability for anything more complicated is extremely hard in practice because of the <a href="https://en.wikipedia.org/wiki/Curse_of_dimensionality">curse of dimensionality</a>.</p>
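<p>As a concrete sketch of the finite-support case (with made-up numbers, not from the post), composing kernels represented as row-stochastic matrices really is just matrix multiplication:</p>

```python
import numpy as np

# Two Markov kernels on finite sets, as row-stochastic matrices:
# phi[i, j] = P[y = j | x = i],  psi[j, k] = P[z = k | y = j].
phi = np.array([[0.9, 0.1],
                [0.2, 0.8]])
psi = np.array([[0.7, 0.3],
                [0.4, 0.6]])

# Chapman-Kolmogorov: integrating out y is exactly matrix multiplication.
composite = phi @ psi

# The composite kernel is again row-stochastic.
assert np.allclose(composite.sum(axis=1), 1.0)
```

Here `composite[0, 0]` works out to $0.9 \cdot 0.7 + 0.1 \cdot 0.4 = 0.67$, the probability of reaching the first $z$-state from the first $x$-state through either intermediate $y$-state.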
<p>If we represent kernels by Monte Carlo functions, then composition is literally just function composition, which is extremely convenient. That is, we can just send particles through a chain of functions and they’ll come out with the right distribution - this fact is basically what the term “Monte Carlo” actually means.</p>
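<p>A minimal sketch of the Monte Carlo perspective (a hypothetical two-state example): each kernel is a sampling function with no hidden internal state, and composing kernels is ordinary function composition.</p>

```python
import random

random.seed(0)

# Kernel X -> Y: keep the bit with probability 0.9, flip it otherwise.
def phi(x):
    return x if random.random() < 0.9 else 1 - x

# Kernel Y -> Z: keep the bit with probability 0.7, flip it otherwise.
def psi(y):
    return y if random.random() < 0.7 else 1 - y

# Composition of Monte Carlo kernels is literally function composition.
def composite(x):
    return psi(phi(x))

# Sending many particles through the chain gives samples from the composite
# distribution; exactly, P[z = 1 | x = 0] = 0.9*0.3 + 0.1*0.7 = 0.34.
freq = sum(composite(0) for _ in range(100_000)) / 100_000
```

The empirical frequency converges to the value the Chapman-Kolmogorov integral would give, without ever representing a distribution explicitly.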
<p>A special case of this is an ordinary (non-conditional) probability distribution, which can be usefully thought of as a Markov kernel whose domain is a single point. Given a distribution $\mathbb P_\pi [x]$ and a kernel $\mathbb P_\varphi [y \mid x]$ we can obtain a distribution $\pi; \varphi$ on $y$, known as the <em>pushforward distribution</em>, by integrating out $x$:</p>
\[\mathbb P_{\pi; \varphi} [y] = \int \mathbb P_\pi [x] \cdot \mathbb P_\varphi [y \mid x] \, dx\]
<h2>Bayesian inversion</h2>
<p>Suppose we have a Markov kernel $\mathbb P_\varphi [y \mid x]$ and we are shown a sample of its output, but we can’t see what the input was. What can we say about the input? To do this, we must start from some initial belief about how the input was distributed: a <em>prior</em> $\mathbb P_\pi [x]$. After observing $y$, <a href="https://en.wikipedia.org/wiki/Bayes%27_theorem">Bayes’ law</a> tells us how we should modify our belief to a <em>posterior distribution</em> that accounts for the new evidence. The formula is</p>
\[\mathbb P [x \mid y] = \frac{\mathbb P_\varphi [y \mid x] \cdot \mathbb P_\pi [x]}{\mathbb P_{\pi; \varphi} [y]}\]
<p>The problem of computing posterior distributions in practice is called <a href="https://en.wikipedia.org/wiki/Bayesian_inference">Bayesian inference</a>, and is very hard and very well studied.</p>
<p>If we fix $\pi$, it turns out that the previous formula for $\mathbb P [x \mid y]$ defines a Markov kernel from $Y$ to $X$, giving the posterior distribution for each possible observation. We call this the <em>Bayesian inverse</em> of $\varphi$ with respect to $\pi$, and write $\mathbb P_{\varphi^\dagger_\pi} [x \mid y]$.</p>
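<p>In the finite case the Bayesian inverse kernel can be computed directly from Bayes’ law - a small sketch with made-up numbers:</p>

```python
import numpy as np

# Likelihood kernel phi (row-stochastic: phi[x, y] = P[y | x]) and a prior pi.
phi = np.array([[0.9, 0.1],
                [0.2, 0.8]])
pi = np.array([0.5, 0.5])

# Evidence: the pushforward distribution P_{pi;phi}[y].
evidence = pi @ phi

# Bayesian inverse: a kernel from Y to X, one posterior per observation y.
# posterior[y, x] = phi[x, y] * pi[x] / evidence[y]
posterior = (phi * pi[:, None]).T / evidence[:, None]

# Each row is a probability distribution over X.
assert np.allclose(posterior.sum(axis=1), 1.0)
```

For instance `posterior[0, 0]` is $0.9 \cdot 0.5 \,/\, 0.55 \approx 0.818$: observing the first $y$-state makes the first $x$-state much more likely than under the prior.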
<p>The reason we can have $y$ as the input of the kernel but we had to pull out $\pi$ as a parameter is that the formula for Bayes’ law is <em>linear</em> in $y$ but <em>nonlinear</em> in $\pi$. This nonlinearity is really the thing that makes Bayesian inference hard.</p>
<p>Technically, Bayes’ law only considers <em>sharp</em> evidence, that is, we observe a particular point $y$. Considering inverse Markov kernels also gives us a way of handling <em>noisy</em> evidence, such as stochastic uncertainty in a measurement, by pushing forward a distribution $\mathbb P_\rho [y]$ to obtain $\mathbb P_{\rho; \varphi^\dagger_\pi} [x]$. This way of handling noisy evidence is sometimes called a <em>Jeffreys update</em>, and contrasted with a different formula called a <em>Pearl update</em> - see <a href="https://arxiv.org/abs/1807.05609">this paper</a> by <a href="https://www.cs.ru.nl/B.Jacobs/">Bart Jacobs</a>. Pearl updates have very different properties and I don’t know how they fit into this story, if at all. Provisionally, I consider the story of this post as evidence that Jeffreys updates are “right” in some sense.</p>
<h2>Deep inference</h2>
<p>So far we’ve introduced 2 operations on Markov kernels: composition and Bayesian inversion. Are they related to each other? The answer is a resounding <em>yes</em>: they are related by the formula</p>
\[(\varphi; \psi)^\dagger_\pi = \psi^\dagger_{\pi; \varphi}; \varphi^\dagger_\pi\]
<p>We call this the <em>chain rule</em> for Bayesian inversion, because of its extremely close resemblance to the chain rule for transpose Jacobians that underlies backpropagation in neural networks and differentiable programming:</p>
\[J^\top_x (f; g) = J^\top_{f (x)} (g) \cdot J^\top_x (f)\]
<p>The Bayesian chain rule is <em>extremely</em> folkloric. I conjectured it in 2019 while talking to Toby, and he proved it a few months later, writing it down in his unpublished preprint <a href="https://arxiv.org/abs/2006.01631">Bayesian Updates Compose Optically</a>. It’s definitely not new - <em>some</em> people already know this fact - but extremely few, and we failed to find it written down in a single place. (I feel like it should have been known by the 1950s at the latest, when things like dynamic programming were being worked out. Perhaps it’s one of the things that was well known in the Soviet Union but wasn’t discovered in the West until much later.) The first place Toby <em>published</em> this fact was in <a href="https://arxiv.org/abs/2305.06112">The Compositional Structure of Bayesian Inference</a> with <a href="https://dylanbraithwaite.github.io/about.html">Dylan Braithwaite</a> and me, which fixed a minor problem to do with zero-probability observations in a nice way.</p>
<p>What this formula tells us is that if we have a Markov kernel with a known factorisation, we can compute Bayesian posteriors efficiently if we already know the Bayesian inverse of each factor. Since this is exactly the same form as differentiable programming, we have good evidence that it can go <em>brrr</em>. At first I thought it was completely obvious that this must be how compilers for probabilistic programming languages work, but it turns out this is not the case at all: probabilistic programming languages are monolithic. I’ve given this general methodology for computing posteriors compositionally the catchy name <em>deep inference</em>, because of its very close structural resemblance to deep learning.</p>
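<p>In the finite-support case the chain rule can be checked numerically - a sketch with arbitrary row-stochastic matrices and a made-up prior:</p>

```python
import numpy as np

def invert(phi, pi):
    """Bayesian inverse of a row-stochastic kernel with respect to a prior."""
    evidence = pi @ phi
    return (phi * pi[:, None]).T / evidence[:, None]

phi = np.array([[0.9, 0.1], [0.2, 0.8]])
psi = np.array([[0.7, 0.3], [0.4, 0.6]])
pi = np.array([0.3, 0.7])

# Left-hand side: invert the composite kernel directly.
lhs = invert(phi @ psi, pi)

# Right-hand side: invert each factor separately and compose in reverse,
# using the pushed-forward prior pi;phi for psi.
rhs = invert(psi, pi @ phi) @ invert(phi, pi)

assert np.allclose(lhs, rhs)
```

The agreement holds for any choice of kernels and prior (away from zero-probability observations, the caveat the paper with Dylan Braithwaite addresses).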
<h2>Variational inference</h2>
<p>I wrote “we can compute Bayesian posteriors efficiently if we already know the Bayesian inverse of each factor”, but this is still a big <em>if</em>: computing posteriors even of simple functions is still hard if the dimensionality is high. Numerical methods are used in practice to approximate the posterior, and we would like to make use of these while still exploiting compositional structure.</p>
<p>The usual way of approximating a Bayesian inverse $\varphi^\dagger_\pi$ is to cook up a functional form $\varphi^\prime_\pi (p)$ that depends on some parameters $p \in \mathbb R^N$. Then we find a loss function on the parameters with the property that minimising it causes the approximate inverse to converge to the exact inverse, ie. $\varphi^\prime_\pi (p) \longrightarrow \varphi^\dagger_\pi$. This is called <em>variational inference</em>.</p>
<p>There are many ways to do this. Probably the most common loss function in practice is <a href="https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence">KL divergence</a> (aka <em>relative entropy</em>),</p>
\[\mathbf{KL} (\varphi^\dagger_\pi, \varphi^\prime_\pi (p)) = \int \mathbb P_{\varphi^\dagger_\pi} [x \mid y] \log \frac{\mathbb P_{\varphi^\dagger_\pi} [x \mid y]}{\mathbb P_{\varphi^\prime_\pi (p)} [x \mid y]} \, dx\]
<p>This expression is a function of $y$, which can optionally also be integrated over (but the next paragraph reveals a better way to use it). A closely related alternative is <a href="https://en.wikipedia.org/wiki/Evidence_lower_bound">variational free energy</a>, which despite being more complicated to define is more computationally tractable.</p>
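<p>As a toy finite-support illustration (hypothetical numbers) of KL divergence as a variational loss: it vanishes exactly when the approximation matches the posterior, and shrinks as the parameter approaches the true value.</p>

```python
import numpy as np

def kl(p, q):
    """KL divergence between two finite probability distributions."""
    return float(np.sum(p * np.log(p / q)))

# Exact posterior for one fixed observation y, and a one-parameter
# family of approximations.
exact = np.array([0.8, 0.2])

def approx(t):
    return np.array([t, 1.0 - t])

# Zero loss exactly at the true posterior; monotone improvement towards it.
assert kl(exact, approx(0.8)) < 1e-12
assert kl(exact, approx(0.6)) < kl(exact, approx(0.4))
```

Minimising this loss over the parameter is the simplest possible instance of variational inference; in practice the family would be something like a parametrised Gaussian or a neural network.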
<p>Ideally, we would like to use a functional form for which we can derive an analytic formula that tells us exactly how we should update our parameters to decrease the loss, given (possibly batched) Monte Carlo samples that are assumed to be drawn from a distribution in a certain class, such as Gaussians.</p>
<p>Of course in 2024 if you are <em>serious</em> then the functional form you use is a deep neural network, and you replace your favourite loss function by its derivative. I refer to this version as <em>deep variational inference</em>. There is no fundamental difference in theory, but in practice deep variational inference is necessary in order to go <em>brrr</em>.</p>
<h2>Passive inference is compositional</h2>
<p>Now, suppose we have two Markov kernels $\mathbb P_\varphi [y \mid x]$ and $\mathbb P_\psi [z \mid y]$ which we compose. Suppose we have a prior $\mathbb P_\pi [x]$ for $\varphi$, which pushes forward to a prior $\mathbb P_{\pi; \varphi} [y]$ for $\psi$. We pick a functional form for approximating each Bayesian inverse, which we call $\mathbb P_{\varphi^\prime_\pi (p)} [x \mid y]$ and $\mathbb P_{\psi^\prime_{\pi; \varphi} (q)} [y \mid z]$.</p>
<p>Doing this requires a major generalisation of our loss function. This was found by Toby Smithe in <a href="https://arxiv.org/abs/2109.04461">Compositional active inference 1</a>. The method he developed comes straight from <a href="https://arxiv.org/abs/1603.04641">compositional game theory</a>, and this appearance of virtually identical structure in game theory and Bayesian inference is absolutely one of the most core ideas of <a href="https://cybercat-institute.github.io/2022/05/29/what-is-categorical-cybernetics/">categorical cybernetics</a> as I envision it.</p>
<p>The idea is to define the loss of an approximate inverse to a kernel $\varphi : X \to Y$ in a <em>context</em> that includes not only a prior distribution on $X$, but also a (generally nonlinear) function $k$ called the <em>continuation</em>, that transforms probability distributions on $Y$. The continuation is a black box that describes how predictions transform into observations. Then when $y$ appears free in the expressions for KL divergence and variational free energy, we integrate it over the distribution $k (\pi; \varphi)$.</p>
<p>So for our composite kernel $\varphi; \psi$, as well as the prior $\pi$ on $X$ we also have a continuation $k$ that transforms distributions on $Z$. In order to optimise the parameters $(p, q)$ in this context, we divide them into two sub-problems:</p>
<ul>
<li>Optimise the parameters $p$ for $\varphi$ in the context given by the prior $\pi$ on $X$ and the continuation $k'$ on $Y$ given by $k' (\sigma) = k (\sigma; \psi); \psi'_\sigma (q)$</li>
<li>Optimise the parameters $q$ for $\psi$ in the context given by the prior $\pi; \varphi$ on $Y$ and the continuation $k$ on $Z$</li>
</ul>
<p>Notice that the optimisation step for $p$ involves the current value of $q$, but not vice versa. It is easy to prove that this method correctly converges to the total Bayesian inverse by a dynamic programming argument, if we first optimise $q$ to convergence and then optimise $p$. However, Toby and I conjecture that this procedure also converges if $p$ and $q$ are optimised asynchronously, which means the procedure can be parallelised.</p>
<h2>Active inference is emergent</h2>
<p>The convergence conjecture in the previous section crucially relies on the fact that the prediction kernels $\varphi$ and $\psi$ are fixed, and we are only trying to approximate their Bayesian inverses. That is why I referred to it as <em>passive inference</em>. The term <em>active inference</em> means several different things (more on this in the next section) but one thing it should mean is that we simultaneously learn to do both prediction and inference.</p>
<p>Toby and I think that if we do this, the compositionality result breaks. In particular, if we also have a parametrised family of prediction kernels $\varphi (p)$ which converge to our original kernel $\varphi$, it is <em>not</em> the case that</p>
\[\psi^\prime_{\pi; \varphi (p)} (q); \varphi^\prime_\pi (p) \longrightarrow (\varphi; \psi)^\dagger_\pi\]
<p>Specifically, we think that the nonlinear dependency of $\psi^\prime_{\pi; \varphi (p)} (q)$ on $\varphi (p)$ causes things to go wrong.</p>
<p>One way of saying this negative conjecture is: <em>compositional active inference can fail to converge to true beliefs, even in a stationary environment</em>. The main reason you’d want to do this anyway, even at the expense of getting the wrong answer, is that it might go <em>brrr</em> - but whether this is really true remains to be seen.</p>
<p>We can, however, put a positive spin on this negative result. I am known for the idea that <em>the opposite of compositionality is emergence</em>, from <a href="https://julesh.com/2017/04/22/on-compositionality/">this blog post</a>. A compositional active inference system does not behave like the sum of its parts. The interaction between components can prevent them from learning true beliefs, but can it do anything positive for us? So far we know nothing about how this emergent learning dynamics behaves, but our optimistic hope is that it could be responsible for what is normally called things like <em>intelligence</em> and <em>creativity</em> - on the basis that there aren’t many other places that they could be hiding.</p>
<h2>How to build a brain</h2>
<p>Boosted by the last paragraph, we now fully depart the realm of mathematical conjecture and enter the outer wilds of hot takes, increasing in temperature towards the end.</p>
<p>So far I’ve talked about active inference but not mentioned what is probably the most important thing in the cloud of ideas around the term: conflating <em>prediction</em> and <em>control</em>. Ordinarily, we would think of $\mathbb P_{\pi; \varphi} [y]$ as <em>prediction</em> and $\mathbb P_{\varphi^\dagger_\pi} [x \mid y]$ as <em>inference</em>. However it has been proposed (I believe the idea is due to <a href="https://www.fil.ion.ucl.ac.uk/~karl/">Karl Friston</a>) that in the end $\mathbb P_{\pi; \varphi} [y]$ is interpreted as a command: at the end of a chain of prediction-inference devices comes an actuator designed to act on the external environment in order to (try to) make the prediction true. That is, a prediction like “my arm will rise” is <em>the same thing</em> as the command “lift my arm” when connected to my arm muscles.</p>
<p>This lets us add one more piece to the puzzle, namely <em>reinforcement learning</em>. A deep active inference system can interact with an environment (either the real world or a simulated environment), by interpreting its ultimate predictions as commands, effecting those commands into the environment, and responding with fresh observations. Over time, the system should learn to predict the response of the environment, that is to say, it will learn an <em>internal model</em> of its environment. If several different active inference systems interact with the same environment, then we should consider the environment of each to contain the others, and expect each to learn a model of the others, recursively.</p>
<p>I am not a neuroscientist, but I understand it is at least plausible that the compositional structure of the mammalian cortex exactly reflects the compositional structure of deep active inference. The cortex is shaped (in the sense of connectivity) approximately like a pyramid, with both sensory and motor areas at the bottom. In particular, the brain is <em>not</em> a <a href="https://en.wikipedia.org/wiki/Series_of_tubes">series of tubes</a> with sensory signals going in at one end and motor signals coming out at the other end. Obviously the basic pyramid shape must be modified with endless ad-hoc modifications at every scale developed by evolution for various tasks. So following Hofstadter’s <a href="http://bert.stuy.edu/pbrooks/fall2014/materials/HumanReasoning/Hofstadter-PreludeAntFugue.pdf">Ant Fugue</a>, I claim <em>the cortex is shaped like an anthill</em>.</p>
<p>The idea is that the hierarchical structure is roughly an <em>abstraction</em> hierarchy. Predictions (aka commands) $\mathbb P_\varphi [y \mid x]$ travel down the hierarchy (towards sensorimotor areas), transforming predictions at a higher level of abstraction $\mathbb P_\pi [x]$ into predictions at a lower level of abstraction $\mathbb P_{\pi; \varphi} [y]$. Inferences $\mathbb P_{\varphi^\dagger_\pi} [x \mid y]$ travel up the hierarchy (away from sensorimotor areas), transforming observations at a lower level of abstraction $\mathbb P_\rho [y]$ into observations at a higher level of abstraction $\mathbb P_{\rho; \varphi^\dagger_\pi} [x]$.</p>
<p>Given this circularity, with observations depending on predictions recursively through many layers, I expect that the system will learn to predict <em>sequences</em> of inputs (as any recurrent neural network does, and notably <em>transformers</em> do extremely successfully) - and also <em>sequences of sequences</em> and so on. I predict that stability will increase up the hierarchy - that is, updates will usually be smaller at higher levels - so that at least conceptually, higher levels run on a slower timescale than lower levels. This comes back to ideas I first read almost 15 years ago in the book <a href="https://us.macmillan.com/books/9780805078534/onintelligence">On Intelligence</a> by Jeff Hawkins and Sandra Blakeslee.</p>
<p>Conceptually, this is exactly the same idea I wrote about in <a href="https://link.springer.com/chapter/10.1007/978-3-031-08020-3_9">chapter 9</a> of <a href="https://link.springer.com/book/10.1007/978-3-031-08020-3">The Road to General Intelligence</a> - the main difference is that now I think I have a good idea how to actually compute commands and observations in practice, whereas back then I hand-crafted a toy proof of concept.</p>
<p>If both sensory and motor areas are at the bottom of the hierarchy, this raises the obvious question of what is at the <em>top</em>. It probably has something to do with long term memory formation, but it is almost impossible to not be thinking about <em>consciousness</em> at this point. I’m going to step back from this so that the hot takes in this post don’t reach their ignition temperature before the next paragraph.</p>
<p>The single hottest take that I genuinely believe is that <em>deep variational reinforcement learning is all you need</em>, and is the only conceptually plausible route to what is sometimes sloppily called “AGI” and what I refer to in private as “true intelligence”.</p>
<p>I should mention that none of my collaborators is as optimistic as me that <em>deep variational reinforcement sequence learning is all you need</em>. Uniquely among my collaborators, I am a hardcore connectionist and I believe good old fashioned symbolic methods have no essential role to play. Time will tell.</p>
<p>My long term goal is <em>obviously</em> to build this, if it works. My short term goal is to build some baby prototypes starting with passive inference, to verify and demonstrate that what works in theory also works in practice. So watch this space, because the future might be wild…</p>
<p><em>Jules Hedges</em></p>
<p><em>This post is a writeup of a talk I gave at the Causal Cognition in Humans and Machines workshop in Oxford, about some work in progress I have with Toby Smithe. To a large extent this is my take on the theoretical work in Toby's PhD thesis, with the emphasis shifted from category theory and neuroscience to numerical computation and AI. In the last section I will outline my proposal for how to build AGI.</em></p>
<h1>How to Stay Locally Safe in a Global World</h1>
<p><em>2024-01-16 - <a href="https://cybercat-institute.github.io//2024/01/16/How%20to%20Stay%20Locally%20Safe%20in%20a%20Global%20World">https://cybercat-institute.github.io//2024/01/16/How%20to%20Stay%20Locally%20Safe%20in%20a%20Global%20World</a></em></p>
<p>Cross-posted from <a href="https://jadeedenstarmaster.wordpress.com/">Jade’s blog</a>: parts <a href="https://jadeedenstarmaster.wordpress.com/2023/12/06/how-to-stay-locally-safe-in-a-global-world/">1</a>, <a href="https://jadeedenstarmaster.wordpress.com/2023/12/17/how-to-stay-locally-safe-in-a-global-world-part-ii-defining-a-world-and-stating-the-problem/">2</a>, <a href="https://jadeedenstarmaster.wordpress.com/2023/12/17/how-to-stay-locally-safe-in-a-global-world-part-iii-the-global-safety-poset/">3</a></p>
<h2>Introduction</h2>
<p>Suppose your name is $x$ and you have a very important state machine $N_x : S_x \times \Sigma \to \mathcal{P}(S_x)$ that you cherish with all your heart. Because you love this state machine so much, you don’t want it to malfunction and you have a subset $P \subseteq S_x$ which you consider to be safe. If your state machine ever leaves this safe space you are in big trouble so you ask the following question. If you start in some subset $I \subseteq P$ will your state machine $N_x$ ever leave $P$? In math, you ask if</p>
\[\mu (\blacksquare(-) \cup I) \subseteq P\]
<p>where $\mu$ is the least fixed point and $\blacksquare(-)$ indicates the next-time operator of the cherished state machine. What is the next-time operator?</p>
<p>Definition: For a function $N : X \times \Sigma \to \mathcal{P}(Y)$ there is a monotone function $\blacksquare_N : \mathcal{P}(X) \to \mathcal{P}(Y)$ given by</p>
\[\blacksquare_N(A) = \bigcup_{a \in A} \bigcup_{s \in \Sigma} N(a,s)\]
<p>In layspeak, the next-time operator sends a set of states to the set of all possible successors of those states.</p>
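<p>A minimal sketch in Python (a hypothetical three-state machine), representing $N$ as a dictionary from (state, input) pairs to sets of successors:</p>

```python
# A finite nondeterministic state machine N : S x Sigma -> P(S),
# given as a dict mapping (state, input) pairs to sets of successor states.
N = {
    ("s0", "a"): {"s0", "s1"},
    ("s0", "b"): {"s0"},
    ("s1", "a"): {"s2"},
    ("s1", "b"): {"s0"},
    ("s2", "a"): {"s2"},
    ("s2", "b"): {"s2"},
}
SIGMA = {"a", "b"}

def next_time(A):
    """The next-time operator: all possible successors of a set of states."""
    return {t for s in A for i in SIGMA for t in N.get((s, i), set())}

assert next_time({"s0"}) == {"s0", "s1"}
```

Note that the operator is monotone by construction: enlarging $A$ can only enlarge the union.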
<p>In a perfect world you could use these definitions to ensure safety using the formula</p>
\[\mu (\blacksquare(-) \cup I) = \bigcup_{n=0}^{\infty} (\blacksquare ( - ) \cup I)^n (\emptyset)\]
<p>or at least check safety up to an arbitrary time-step $n$ by computing this infinite union one step at a time.</p>
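<p>For a finite state machine this one-step-at-a-time computation does terminate - a sketch with a hypothetical machine, iterating $A \mapsto \blacksquare(A) \cup I$ from the empty set until nothing new is added:</p>

```python
# A hypothetical machine in which s2 is unreachable from s0.
N = {
    ("s0", "a"): {"s0", "s1"},
    ("s1", "a"): {"s1"},
    ("s2", "a"): {"s2"},
}
SIGMA = {"a"}

def next_time(A):
    return {t for s in A for i in SIGMA for t in N.get((s, i), set())}

def reachable(I):
    """Least fixed point of A |-> next_time(A) | I, by Kleene iteration."""
    A = set()
    while True:
        B = next_time(A) | I
        if B == A:
            return A
        A = B

I, P = {"s0"}, {"s0", "s1"}
safe = reachable(I) <= P  # the machine never leaves the safe set P
```

Each pass adds at least one state or stops, so the loop runs at most $|S|$ times; here the fixed point is $\{s_0, s_1\} \subseteq P$, so the machine is safe.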
<p>Unfortunately there is a big problem with this method! Your state machine does not exist in isolation. You have a friend whose name is $y$ with their own state machine $N_y : S_y \times \Sigma \to \mathcal{P} (S_y)$. $y$ has the personal freedom to run their state machine how they like but there are functions</p>
\[N_{xy} : S_x \times \Sigma \to \mathcal{P}(S_y)\]
<p>and</p>
\[N_{yx} : S_y \times \Sigma \to \mathcal{P}(S_x)\]
<p>which allow states of your friend’s machine to change the states of your own and vice-versa. Making matters worse, there is a whole graph $G$ whose vertices are your friends and whose edges indicate that the corresponding state machines may affect each other. How can you be expected to ensure safety under these conditions?</p>
<p>But don’t worry, category theory comes to the rescue. In the next sections I will:</p>
<ul>
<li>State my model of the world and the local-to-global safety problem for this model (Part II)</li>
<li>Propose a solution to the local-to-global safety problem based on an enriched version of the Grothendieck construction (Part III)</li>
</ul>
<h2>Defining a World and Stating the Problem</h2>
<p>Suppose we have a directed graph $G=(V(G),E(G))$ representing our world. The vertices of this graph are the different agents in our world and an edge represents a connection between these agents. The semantics of this graph will be the following:</p>
<p>Definition: Let $\mathsf{Mach}$ be the directed graph whose vertices are sets and where there is an edge $e : X \to Y$ for every function</p>
\[e : X \times \Sigma \to \mathcal{P}(Y)\]
<p>A world is a morphism of directed graphs $W : G \to \mathsf{Mach}$.</p>
<p>A world has a set $S_x$ for each vertex $x$ called the local state over $\mathbf{x}$ and for each edge $e :x \to y$ a function $W(e) : S_x \times \Sigma_e \to \mathcal{P}(S_y)$ representing the state machine connecting the local state over $x$ to the local state over $y$. Note that self edges are ordinary state machines from a local state to itself. An example world may be drawn as follows:</p>
<p><img src="/assetsPosts/2023-12-18-How to Stay Locally Safe in a Global World/World.png" alt="Example World" /></p>
<p>Definition: Given a world $W: G \to \mathsf{Mach}$, the total machine of $W$ is the state machine
$\int W : \sum_{x \in V(G)} S_x \times \sum_{e \in E(G)} \Sigma_e \to \mathcal{P}( \sum_{x \in V(G)} S_x )$</p>
<p>given by</p>
\[( (s,x),(\tau,d)) \mapsto \bigcup_{e: x \to y} W(e) (s, \tau)\]
<p>The notation $\int$ is used based on the belief that this is some version of the Grothendieck construction. Exactly which flavor will be left to future work. The transition function of this state machine comes from unioning the transition functions of all the state machines associated to edges originating in a vertex.</p>
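<p>A small sketch of the total machine for a hypothetical two-agent world, reading the definition as: from a local state over $x$, take the union over the edges out of $x$ whose transition table accepts the given letter.</p>

```python
# A tiny hypothetical world: two agents x and y, a self-edge on x, and an
# edge x -> y. Each edge carries a transition function
# S_src x Sigma_e -> P(S_tgt), given here as a dict.
edges = {
    "loop_x": ("x", "x", {("s0", "t"): {"s0", "s1"}}),
    "x_to_y": ("x", "y", {("s1", "u"): {"r0"}}),
}

def total_machine(state, letter):
    """Transition function of the total machine: states are pairs (vertex,
    local state), and transitions union over every edge out of the vertex."""
    vertex, s = state
    out = set()
    for src, tgt, table in edges.values():
        if src == vertex:
            for t in table.get((s, letter), set()):
                out.add((tgt, t))
    return out

assert total_machine(("x", "s1"), "u") == {("y", "r0")}
```

The state space is the disjoint union of the local state spaces, which is why brute-force analysis of $\int W$ quickly becomes impractical as the world grows.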
<p>Definition: Given a world $W : G \to \mathsf{Mach}$, a vertex $x \in V(G)$, and subsets $I,P \subset S_x$, we say that $I$ is locally safe in a global context if</p>
\[\mu (\blacksquare_{\int W} (-) \cup I) \subseteq P\]
<p>where $\blacksquare_{\int W}$ is the next-time operator of the state machine $\int W$.</p>
<p>The state machine $\int W$ may be large enough to make computing this least fixed point by brute force impractical. Therefore, we must leverage the compositional structure of $W$. We will see how to do this in the next post.</p>
<h2>The Global Safety Poset</h2>
<p>In this section we will give a compositional solution to the local safety problem in a global context in two steps:</p>
<ul>
<li>First by turning the world into a functor $\hat{W} : FG \to \mathsf{Poset}$</li>
<li>Then by gluing this functor into a single poset $\int \hat{W}$ whose inequalities solve the problem of interest.</li>
</ul>
<p>First we define the functor.</p>
<p>Given a world $W : G \to \mathsf{Mach}$, there is a functor</p>
\[\hat{W} : FG \to \mathsf{Poset}\]
<p>where</p>
<ul>
<li>$FG$ is the free category on the graph $G$,</li>
<li>$\mathsf{Poset}$ is the category whose objects are posets and whose morphisms are monotone functions.</li>
</ul>
<p>Functors from a free category are uniquely defined by their image on vertices and generating edges.</p>
<ul>
<li>For a vertex $x \in V(G)$, $\hat{W}(x) = \mathcal{P}(S_x)$,</li>
<li>for an edge $e : x \to y$, we define $\hat{W}(e): \mathcal{P}(S_x) \to \mathcal{P}(S_y)$ by $A \mapsto \blacksquare_{W(e)}(A)$</li>
</ul>
<p>Now for step two.</p>
<p>Given a functor $\hat{W} : FG \to \mathsf{Poset}$ defined from a world $W$, the <strong>global safety poset</strong> is a poset $\int \hat{W}$ where</p>
<ul>
<li>elements are pairs $(x \in V(G), A \subseteq S_x)$,</li>
<li>$(x, A) \leq (y, B) \iff \bigwedge_{f: x \to y \in FG} \hat{W} (f) (A) \subseteq B$</li>
</ul>
<p>Given a world $W : G \to \mathsf{Mach}$, a vertex $x \in V(G)$, and subsets $I,P \subseteq S_x$, the set $I$ is locally safe in a global context if and only if there is an inequality $(x,I) \leq (x,P)$ in the global safety poset $\int \hat{W}$.</p>
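<p>Assuming finite local state sets, the inequality can be checked by brute force: saturate the set of images of $A$ under all paths out of $x$, and test every image that lands over $y$ (a hypothetical sketch, not the compositional algorithm this construction is ultimately aiming at):</p>

```python
from collections import deque

# Edges of a hypothetical world, as (src, tgt, next-time function on subsets):
# a self-edge on x that can increment states below 3, and an edge x -> y.
edges = [
    ("x", "x", lambda A: A | {s + 1 for s in A if s + 1 < 3}),
    ("x", "y", lambda A: {10 + s for s in A}),
]

def leq(x, A, y, B):
    """(x, A) <= (y, B) iff every path x -> y in FG sends A inside B."""
    A, B = frozenset(A), frozenset(B)
    seen = {(x, A)}
    queue = deque(seen)
    while queue:
        v, S = queue.popleft()
        if v == y and not S <= B:  # some path's image escapes B
            return False
        for src, tgt, f in edges:
            if src == v:
                nxt = (tgt, frozenset(f(S)))
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return True

assert leq("x", {0}, "y", {10, 11, 12})
```

Termination holds because there are only finitely many (vertex, subset) pairs, even though the free category has infinitely many paths; the identity path is covered by the initial pair $(x, A)$.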
<p>My half-completed proof of this theorem involves a square of functors</p>
<p><img src="/assetsPosts/2023-12-18-How to Stay Locally Safe in a Global World/commsquare.png" alt="Correctness Square" /></p>
<p>Going from right and then down, the first functor uses a Grothendieck construction to turn a world into a total state machine and then turns that state machine into its global safety poset. Going down and then right follows the construction detailed in the last two sections. The commutativity of this diagram should verify correctness. I will explain all of this in more detail later. Thanks for tuning in today!</p>
<p><em>Jade Master</em></p>
<h1>AI Safety Meets Value Chain Integrity</h1>
<p><em>2023-12-11 - <a href="https://cybercat-institute.github.io//2023/12/11/ai-safety-meets-value-chain-integrity">https://cybercat-institute.github.io//2023/12/11/ai-safety-meets-value-chain-integrity</a></em></p>
<p><strong>tl;dr - Advanced AI making economic decisions in supply chains and markets creates poorly-understood risks, especially by undermining the fundamental concept of individuality of agents. We propose to research these risks by building and simulating models.</strong></p>
<p>For many years, AI has been routinely used for economic decision making. Two major roles it has traditionally played are high frequency trading and algorithmic pricing. Traditionally these are quite simple, at the level of tabular Q-learning agents. Even these comparatively simple algorithms can behave in unexpected ways due to emergent interactions in an economic environment. Probably the most infamous of these events was the <a href="https://en.wikipedia.org/wiki/2010_flash_crash">flash crash</a>, for which algorithmic high speed trading was a major contributing cause. Much less well known is the subtle issue of <em>implicit collusion</em> in pricing algorithms, which are ubiquitous in several markets such as airline tickets and Amazon: <a href="https://www.aeaweb.org/articles?id=10.1257/aer.20190623">a widely cited 2020 paper</a> found that even very simple tabular Q-learning will converge to prices higher than the Nash equilibrium price - but <a href="https://arxiv.org/abs/2201.00345">our research</a> found that this depends sensitively on the exact method of training, and the effect vanishes when the algorithms are trained independently in simulated markets.</p>
<p>Besides markets, AI is also already used for making decisions in supply chains (see for example [<a href="https://www.thomsonreuters.com/en-us/posts/technology/ai-supply-chains/">1</a> <a href="https://www.mckinsey.com/capabilities/operations/our-insights/autonomous-supply-chain-planning-for-consumer-goods-companies">2</a> <a href="https://www.forbes.com/sites/forbestechcouncil/2023/08/08/ais-role-in-supply-chain-management-and-how-organizations-can-get-started/">3</a> <a href="https://www.accenture.com/us-en/blogs/business-functions-blog/generative-ai-why-smarter-supply-chains-are-here">4</a>]), and surely will be more so in the future. Contemporary supply chains are extraordinarily complex. A typical modern technology product can have hundreds of thousands of components sourced from ten thousand suppliers across half a dozen tiers, which need to be shipped across the globe for final assembly. A single five-dollar part can stop an assembly line, which in industries like automotive can cost millions per hour of downtime. The worst type of inventory a company can carry is a 99.9% finished product it cannot sell. Over time, supply chains have been hyper-optimised at the expense of integrity, so that a metaphorical perfect storm in the shape of an <a href="https://en.wikipedia.org/wiki/2010_eruptions_of_Eyjafjallaj%C3%B6kull">Icelandic volcano named Eyjafjallajökull erupting</a> or a <a href="https://en.wikipedia.org/wiki/2021_Suez_Canal_obstruction">container ship named <em>Ever Given</em> getting stuck in the Suez Canal</a> can cause massive disruption that inevitably leads to delayed goods, spoiled perishables, lawsuits and contested insurance claims easily in the ten digits.
The <a href="https://www.ey.com/en_gl/supply-chain/how-covid-19-impacted-supply-chains-and-what-comes-next">COVID-19 pandemic</a> was a business school case study in all the types of havoc supply chain disruptions can wreak: oscillating wildly from not enough containers to too many containers in port, obstructing the handling of cargo; from COVID-related work shutdowns in China to sudden shifts in consumer behaviour in Western countries, leading to layoffs in hospitality industries and labour shortages in production and transportation. Beyond these knock-on effects, which can explode planning horizons for procurement and shift the delicate power balance from buyer to supplier, another major problem in supply chains is the knock-off effect: fashion brands and pharmaceutical companies alike fight the problem of counterfeit products being introduced into the supply chain when no one is looking, leading to multi-million dollar losses along with reputational damage and, especially in pharmaceuticals, posing a hazard to health and life for many. Supply chain integrity depends crucially on transparency across a multitude of participants who are typically less than eager to share confidential data.</p>
<p>Moving forward from these events, the delicate tradeoff between efficiency and integrity is a perfect use case for the integrated and inter-connected decision-making that is afforded by AI.</p>
<p>This brings us to the issue of economic decisions being deferred to large language models such as GPT-4. The well-known examples are not “natively economic”, but many people are adapting transformer architectures to operate on various types of data besides linguistic data, and it is only a matter of time before there are “economics LLMs”. In the meantime, GPT is entirely capable of making economic decisions with the right prompting, although virtually nothing is known about its performance on these types of task. We do not recommend using GPT to make investment decisions for you, but we expect it to become widespread anyway, if it isn’t already. Similarly, we expect large parts of complex supply chains to be almost entirely deferred to AI, extending the existing automation and its associated benefits and risks.</p>
<h2>AI undermines individuality in economics</h2>
<p>The traditional (tabular Q-learning) and contemporary (LLM) situations are very different in many ways, but they have a subtle and crucial point in common: decisions that look independent are secretly connected. There are two ways this can happen. One is that human decision-makers defer to off-the-shelf software that comes from the same upstream supplier, as is the case for algorithmic pricing in the airline industry. The other is that there really is a single instance of the AI system in the world and everybody is calling into it, as is the case with GPT.</p>
<p>For off-the-shelf implementations of tabular Q-learning for algorithmic pricing, there is some evidence that having a single upstream supplier has a significant impact on the behaviour of the market, and this is something that regulators are actively investigating. For LLMs virtually nothing is known, but we expect that the situation is worse. At the very least, the situation will certainly be more unpredictable, and we expect the compounding of implicit biases to be worse as these systems become ubiquitous and deeply embedded into decision-making. We plan to research this, by building economic simulations where decisions are made by advanced AIs and studying their behaviour.</p>
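<p>The shared-supplier mechanism is easy to see in miniature. The following is a hedged sketch (not our engine, and not evidence by itself): two sellers in a toy Bertrand-style market both run the same off-the-shelf stateless tabular Q-learning pricer, so their "independent" pricing decisions come from one shared algorithm with shared hyperparameters.</p>

```python
import random

random.seed(0)

PRICES = [1, 2, 3]  # discrete price levels

def demand(my_price, other_price):
    """Toy Bertrand demand: the cheaper seller captures the market."""
    if my_price < other_price:
        return 10
    if my_price == other_price:
        return 5
    return 0

def profit(my_price, other_price):
    return my_price * demand(my_price, other_price)

def train(episodes=5000, alpha=0.1, eps=0.1):
    # Both sellers run the *same* off-the-shelf learner: identical
    # update rule and identical hyperparameters -- the "single upstream
    # supplier" situation discussed in the text.
    q = [{p: 0.0 for p in PRICES} for _ in range(2)]
    for _ in range(episodes):
        acts = []
        for i in range(2):
            if random.random() < eps:
                acts.append(random.choice(PRICES))  # explore
            else:
                acts.append(max(q[i], key=q[i].get))  # exploit
        for i in range(2):
            r = profit(acts[i], acts[1 - i])
            # stateless (bandit-style) update: no next-state term
            q[i][acts[i]] += alpha * (r - q[i][acts[i]])
    return [max(qi, key=qi.get) for qi in q]

prices = train()
print(prices)
```

<p>Even in this tiny model the two sellers' behaviour is coupled through the shared learning dynamics; the market parameters here are invented for illustration only.</p>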
<p>A further possibility is more hypothetical, but we expect it to become a reality within the next few years. Right now the technology behind large language models - generative transformers - mainly operates on textual data, but it is actively being adapted for other types of data, and for other tasks besides text generation. Making economic decisions is very similar to playing games, and so there is an obvious analogy to the wildly successful application of deep reinforcement learning to strategically complex game playing tasks such as Go and StarCraft 2 by DeepMind. Combining this with generative transformer architectures could be immensely powerful, and it is not hard to believe such a system could surpass human performance on the task of economic decision-making.</p>
<h2>Modelling for harm prevention</h2>
<p>Compositional game theory - a technology that we <a href="https://arxiv.org/abs/1603.04641">developed</a> and <a href="https://github.com/CyberCat-Institute/open-game-engine">implemented</a> - is currently the state of the art for implementing complex meso-scale microeconomic models. The way things are traditionally done, models are written first in mathematics and are later converted into computational models in general-purpose languages (traditionally Fortran, but increasingly in modern languages such as Python), a process that is very slow and very prone to introducing hard-to-detect errors. We use a <em>model is code</em> paradigm, where both the mathematical and computational languages are modified to bring them very close to each other - most commonly we build our models directly in code, with a clean separation of concerns between the economic and computational parts. Our models are not inherently more accurate, but they are two orders of magnitude faster and cheaper to build, and this unlocks our secret weapon: <em>rapid prototyping of models</em>. By iterating quickly, and continuously obtaining feedback from data and stakeholders, we reach a better model than could be built monolithically.</p>
<p>Why do we want to build these models? The bigger picture is, we want to inform the discussion about regulation of AI. This discussion is already widespread at the highest level of governments around the world, but is currently heavily lacking in evidence one way or the other. There’s a good reason for this: the domain of LLMs is language, and it is extremely difficult to make convincing predictions about the possible harms that can happen mediated by linguistic communication. More restricted domains, such as the behaviours of API bots, are easier to reason about. We have identified the general realm of economic decision-making as a critically under-explored part of the general AI safety question, which our tools are well-placed to explore through modelling and simulations.</p>
<p>Our implementation of compositional game theory allows modularly switching the algorithm that each player uses for making decisions. Normally when doing applied game theory we use a Monte Carlo optimiser for every player. But we also have <a href="https://github.com/CyberCat-Institute/open-games-RLib">a version</a> that calls a Python implementation of Q-learning over a web socket. We could also easily switch it to calls to an open source LLM, or API calls to a GPT API bot or similar.</p>
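<p>To illustrate what "modularly switching the algorithm" means, here is a minimal sketch in plain Python rather than our Haskell engine; the player names and the <code>monte_carlo_player</code>/<code>llm_player</code> stubs are purely illustrative placeholders, not real backends.</p>

```python
from typing import Callable, Dict

# A "decision backend" maps an observation to an action. Swapping
# backends -- Monte Carlo optimiser, Q-learner over a socket, LLM API --
# is just swapping this function. (Illustrative sketch only.)
Decision = Callable[[str], str]

def monte_carlo_player(observation: str) -> str:
    # stand-in for a Monte Carlo best-response search
    return "cooperate"

def llm_player(observation: str) -> str:
    # stand-in for an API call to a hosted language model
    return "defect"

def play_game(players: Dict[str, Decision], observation: str) -> Dict[str, str]:
    # each player decides from the same observation; the game logic
    # never needs to know which backend is behind each player
    return {name: decide(observation) for name, decide in players.items()}

result = play_game({"alice": monte_carlo_player, "bob": llm_player}, "round 1")
print(result)  # → {'alice': 'cooperate', 'bob': 'defect'}
```

<p>The design point is that the game definition and the decision procedure are decoupled, which is what makes the substitution of AI decision-makers a one-line change rather than a rewrite.</p>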
<p>What’s more, this is emphatically <em>not</em> a mere hack that we bolt on top of game theory. At the core of our whole approach is our discovery, as seen in <a href="https://arxiv.org/abs/2105.06332">this paper</a>, that the foundations of compositional game theory and several branches of machine learning are extremely closely related - a foundation we call <a href="https://cybercat.institute/2022/05/29/what-is-categorical-cybernetics/">categorical cybernetics</a>. This foundation guides us and tells us that what we are doing is really meaningful. More than that, though, it opens a realistic possibility that we can know <em>qualitative</em> things about the behaviour of AIs making economic decisions - a much higher level of confidence than making inferences from simulation results. And when it comes to informing the discussion on regulation, when the stakes are as high as they are, more certainty is always better.</p>
<h2>What if?</h2>
<p>So far we have focussed on the negative <em>accidental</em> impacts AI is likely to have on markets and supply chains, where AI systems perform their intended purpose locally but interact in unforeseen ways. This is already concerning, but there is another side to the issue. What if decisions that should be independent are made by a single AI that has “gone rogue”, i.e. has a goal that is not the intended one? Depending on your personal assessment of the likelihood of this situation you could read this section as a fun thought experiment or a warning.</p>
<p>Being handed direct control of markets and supply chains gives perhaps the most powerful leverage over the physical world that an AI could have. Since it can <em>collude with itself</em>, it can easily create behaviours that would never be possible when decisions are made by agents that are independent and at least somewhat rational.</p>
<p>By far the most straightforward outcome of this situation is chaos. Markets and supply chains are so deeply interconnected that it would take very little intentional damage to create a recession deep enough to bring society to its knees. However, by virtually destroying the institutions that it controls, this makes it a one-time event which, while extremely bad, would be easily recoverable for humanity as a whole.</p>
<p>Much worse would be the ability of a rogue AI to subtly direct real-world resources towards a secret goal of its own over a long period of time. It is no hypothetical that complex supply chains can easily hide parts of themselves: consider how widespread modern slavery is in the supply chains of consumer electronics, or how the US government secretly procured the resources needed to build the first nuclear weapons at a time when supply chains were much simpler.</p>
<h2>Conclusion</h2>
<p>It is arguable exactly how extensive the risks associated with allowing AIs to interact with economic systems are, with the scenarios described in the previous section being hypothetical. However, it is undeniable that some serious risks do exist, including already-observed events such as flash crashes and implicit collusion. We have identified that the specific factor of decision-makers using the same upstream provider of decision-making software leads to poorly-understood emergent behaviours of supply chains and markets.</p>
<p>Our theoretical framework, compositional game theory, and our implementation of it, the open game engine, are the perfect tools for building and simulating models of economic situations with AI decision-makers. The goal of creating these models is to produce evidence leading to a better-informed debate on issues around the regulation of AI.</p>
<p><em>Jules Hedges</em></p>
<h1>About the CyberCat Institute blog (2023-11-26)</h1>
<p>The Cybercat blog website is based on the <a href="https://jekyllthemes.io/theme/whiteglass">Whiteglass</a> theme.</p>
<h2>TOC</h2>
<ul>
<li><a href="#workflow">Workflow</a>
<ul>
<li><a href="#previewing">Previewing</a></li>
</ul>
</li>
<li><a href="#post-preamble">Post preamble</a></li>
<li><a href="#latex">Latex</a>
<ul>
<li><a href="#theorem-environments">Theorem environments</a>
<ul>
<li><a href="#referencing">Referencing</a></li>
</ul>
</li>
<li><a href="#typesetting-diagrams">Typesetting diagrams</a>
<ul>
<li><a href="#quiver">Quiver</a></li>
<li><a href="#tikz">Tikz</a></li>
<li><a href="#referencing-1">Referencing</a></li>
</ul>
</li>
</ul>
</li>
<li><a href="#images">Images</a>
<ul>
<li><a href="#referencing-2">Referencing</a></li>
</ul>
</li>
<li><a href="#code">Code</a></li>
</ul>
<h2>Workflow</h2>
<p>Standard github workflow:</p>
<ul>
<li>Clone this repo</li>
<li>Create a branch</li>
<li>Write your post</li>
<li>Make a PR</li>
<li>Wait for approval</li>
</ul>
<p>The blog will be automatically rebuilt once your PR is merged.</p>
<h3>Previewing</h3>
<p>Since the blog uses Jekyll, you will need to <a href="https://jekyllrb.com/docs/installation/">install it</a> or use the included nix flake devshell (just run <code class="language-plaintext highlighter-rouge">nix develop</code> with flakes-enabled nix installed) to be able to preview your content. Once the installation is complete, navigate to the repo folder and run <code class="language-plaintext highlighter-rouge">bundle exec jekyll serve</code>. Jekyll will spawn a local server (usually at <code class="language-plaintext highlighter-rouge">127.0.0.1:4000</code>) that will allow you to preview the blog locally.</p>
<h2>Post preamble</h2>
<p>Posts must be placed in the <code class="language-plaintext highlighter-rouge">_posts</code> folder. Post filenames follow the convention <code class="language-plaintext highlighter-rouge">yyyy-mm-dd-title.md</code>. Post assets (such as images) go in the <code class="language-plaintext highlighter-rouge">assetsPosts</code> folder, where you should create a folder with the same name as the post.</p>
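<p>If you want to check a filename against this convention automatically, here is a small sketch (the allowed slug characters are an assumption on our part, not an official rule):</p>

```python
import re

# yyyy-mm-dd- prefix followed by a lowercase slug, ending in .md
# (the slug character set here is assumed, not mandated by Jekyll)
POST_NAME = re.compile(r"^\d{4}-\d{2}-\d{2}-[a-z0-9-]+\.md$")

def valid_post_name(filename: str) -> bool:
    return POST_NAME.fullmatch(filename) is not None

print(valid_post_name("2023-11-26-test-post.md"))  # → True
print(valid_post_name("test-post.md"))             # → False
```
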
<p>Each post should start with the following preamble:</p>
<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nn">---</span>
<span class="na">layout</span><span class="pi">:</span> <span class="s">post</span>
<span class="na">title</span><span class="pi">:</span> <span class="s">the title of your post</span>
<span class="na">author</span><span class="pi">:</span> <span class="s">your name</span>
<span class="na">categories</span><span class="pi">:</span> <span class="s">keyword or a list of keywords [keyword1, keyword2, keyword3]</span>
<span class="na">excerpt</span><span class="pi">:</span> <span class="s">A short summary of your post</span>
<span class="na">image</span><span class="pi">:</span> <span class="s">assetsPosts/yourPostFolder/imageToBeUsedAsThumbnails.png This is optional, but useful if e.g. you share the post on Twitter.</span>
<span class="na">usemathjax</span><span class="pi">:</span> <span class="kc">true</span><span class="s"> (omit this line if you don't need to typeset math)</span>
<span class="na">thanks</span><span class="pi">:</span> <span class="s">A short acknowledgement message. It will be shown immediately above the content of your post.</span>
<span class="nn">---</span>
</code></pre></div></div>
<p>As for the content of the post, it should be typeset in markdown.</p>
<h2>Latex</h2>
<ul>
<li>Inline math is shown by using <code class="language-plaintext highlighter-rouge">$ ... $</code>. Notice that some expressions such as <code class="language-plaintext highlighter-rouge">a_b</code> typeset correctly, while expressions like <code class="language-plaintext highlighter-rouge">a_{b}</code> or <code class="language-plaintext highlighter-rouge">a_\command</code> sometimes do not. I guess this is because MathJax expects <code class="language-plaintext highlighter-rouge">_</code> to be followed by a literal.</li>
<li>Display math is shown by using <code class="language-plaintext highlighter-rouge">$$ ... $$</code>. The problem above doesn’t show up in this case, but you gotta be careful:
<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code> text
$$ ... $$
text
</code></pre></div> </div>
<p>does not typeset correctly, whereas:</p>
<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code> text
$$
...
$$
text
</code></pre></div> </div>
<p>does. You can also use environments, as in:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> $$
\begin{align*}
...
\end{align*}
$$
</code></pre></div> </div>
</li>
</ul>
<h3>Theorem environments</h3>
<p>We provide the following theorem environments: Definition, Proposition, Lemma, Theorem and Corollary. Numbering is automatic. If you need others, just ask. These work as follows:</p>
<div class="language-latex highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
<span class="p">{</span><span class="c">% def %}</span>
A *definition* is a blabla, such that: <span class="p">$</span><span class="nb">...</span><span class="p">$</span>. Furthermore, it is:
<span class="p">$$</span><span class="nb">
...
</span><span class="p">$$</span>
<span class="p">{</span><span class="c">% enddef %}</span>
</code></pre></div></div>
<p>This gets rendered as follows:</p>
<div class="definition">
<p>A <em>definition</em> is a blabla, such that: $…$. Furthermore, it is:</p>
\[...\]
</div>
<p>Numbering is automatic. Use the tags:</p>
<div class="language-latex highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
<span class="p">{</span><span class="c">% def %}</span>
For your definitions
<span class="p">{</span><span class="c">% enddef %}</span>
<span class="p">{</span><span class="c">% not %}</span>
For your notations
<span class="p">{</span><span class="c">% endnot %}</span>
<span class="p">{</span><span class="c">% ex %}</span>
For your examples
<span class="p">{</span><span class="c">% endex %}</span>
<span class="p">{</span><span class="c">% diag %}</span>
For your diagrams
<span class="p">{</span><span class="c">% enddiag %}</span>
<span class="p">{</span><span class="c">% prop %}</span>
For your propositions
<span class="p">{</span><span class="c">% endprop %}</span>
<span class="p">{</span><span class="c">% lem %}</span>
For your lemmas
<span class="p">{</span><span class="c">% endlem %}</span>
<span class="p">{</span><span class="c">% thm %}</span>
For your theorems
<span class="p">{</span><span class="c">% endthm %}</span>
<span class="p">{</span><span class="c">% cor %}</span>
For your corollaries
<span class="p">{</span><span class="c">% endcor %}</span>
</code></pre></div></div>
<h4>Referencing</h4>
<p>If you need to reference results, just append a <code class="language-plaintext highlighter-rouge">{"id":"your_reference_tag"}</code> after the tag, where <code class="language-plaintext highlighter-rouge">your_reference_tag</code> works like a LaTeX label. For example:</p>
<div class="language-latex highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
<span class="p">{</span><span class="c">% def {"id":"your_reference_tag"} %}</span>
A *definition* is a blabla, such that: <span class="p">$</span><span class="nb">...</span><span class="p">$</span>. Furthermore, it is:
<span class="p">$$</span><span class="nb">
...
</span><span class="p">$$</span>
<span class="p">{</span><span class="c">% enddef %}</span>
</code></pre></div></div>
<p>Then you can reference this by doing:</p>
<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code>As we remarked in <span class="p">[</span><span class="nv">Reference description</span><span class="p">](</span><span class="sx">#your_reference_tag</span><span class="p">)</span>, we are awesome...
</code></pre></div></div>
<h3>Typesetting diagrams</h3>
<p>We support two types of diagrams: quiver and TikZ.</p>
<h4>Quiver</h4>
<p>You can render <a href="https://q.uiver.app/">quiver</a> diagrams by enclosing quiver-exported iframes between <code class="language-plaintext highlighter-rouge">quiver</code> tags:</p>
<ul>
<li>On <a href="https://q.uiver.app/">quiver</a>, click on <code class="language-plaintext highlighter-rouge">Export: Embed code</code></li>
<li>Copy the code</li>
<li>In the blog, put it between delimiters as follows:</li>
</ul>
<div class="language-html highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
{% quiver %}
<span class="c"><!-- https://q.uiver.app/codecodecode--></span>
<span class="nt"><iframe</span> <span class="na">codecodecode</span><span class="nt">></iframe></span>
{% endquiver %}
</code></pre></div></div>
<p>They get rendered as follows:</p>
<div class="quiver">
<!-- https://q.uiver.app/#q=WzAsMyxbMCwwLCJYIl0sWzEsMiwiQiJdLFsyLDAsIkEiXSxbMCwxLCJnIiwxXSxbMiwxLCJmIiwxXSxbMCwyLCJoIiwxXV0= -->
<iframe class="quiver-embed" src="https://q.uiver.app/#q=WzAsMyxbMCwwLCJYIl0sWzEsMiwiQiJdLFsyLDAsIkEiXSxbMCwxLCJnIiwxXSxbMiwxLCJmIiwxXSxbMCwyLCJoIiwxXV0=&embed" width="432" height="432" style="border-radius: 8px; border: none;"></iframe>
</div>
<p><strong>Should the picture come out cropped, select <code class="language-plaintext highlighter-rouge">fixed size</code> when exporting the quiver diagram, and choose some suitable parameters.</strong></p>
<h4>Tikz</h4>
<p>You can render tikz diagrams by enclosing tikz code between <code class="language-plaintext highlighter-rouge">tikz</code> tags, as follows:</p>
<div class="language-latex highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
<span class="p">{</span><span class="c">% tikz %}</span>
<span class="nt">\begin{tikzpicture}</span>
<span class="k">\draw</span> (0,0) circle (1in);
<span class="nt">\end{tikzpicture}</span>
<span class="p">{</span><span class="c">% endtikz %}</span>
</code></pre></div></div>
<p>Tikz renders as follows:</p>
<div class="tikz"><script type="text/tikz">
\rotatebox{0}{
\scalebox{1}{
\begin{tikzpicture}
\node[circle, fill, minimum size=5pt, inner sep=0pt, label=left:{$1$}] (al1) at (-2,0) {};
\node[circle, fill, minimum size=5pt, inner sep=0pt, label=right:{$1$}] (ar1) at (0,0) {};
\node[circle, fill, minimum size=5pt, inner sep=0pt, label=right:{$2$}] (ar2) at (0,-1) {};
\node[circle, fill, minimum size=5pt, inner sep=0pt, label=right:{$3$}] (ar3) at (0,-2) {};
\draw[thick] (al1) to (ar1);
\draw[thick, out=180, in=180, looseness=2] (ar2) to (ar3);
\end{tikzpicture}
}
}
</script></div>
<p>Notice that at the moment tikz rendering:</p>
<ul>
<li>Supports anything you put after <code class="language-plaintext highlighter-rouge">\begin{document}</code> in a <code class="language-plaintext highlighter-rouge">.tex</code> file. So you can use this to include anything you’d typeset with LaTeX (but we STRONGLY advise against it).</li>
<li>Does not support usage of anything that should go in the LaTeX preamble, that is, before <code class="language-plaintext highlighter-rouge">\begin{document}</code>. This includes external tikz libraries such as <code class="language-plaintext highlighter-rouge">calc</code>, <code class="language-plaintext highlighter-rouge">arrows</code>, etc., and packages such as <code class="language-plaintext highlighter-rouge">tikz-cd</code>. Should you need <code class="language-plaintext highlighter-rouge">tikz-cd</code>, use quiver as explained above. If you need fancier stuff, you’ll have to render the tikz diagrams yourself and import them as images (see below).</li>
</ul>
<h4>Referencing</h4>
<p>Referencing also works for the quiver and tikz tags, as in:</p>
<div class="language-latex highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
<span class="p">{</span><span class="c">% tikz {"id":"your_reference_tag"} %}</span>
...
<span class="p">{</span><span class="c">% endtikz %}</span>
</code></pre></div></div>
<p>This automatically creates a numbered ‘Figure’ caption under the figure, as in:</p>
<div class="quiverCaption" id="example"><div class="quiver">
<!-- https://q.uiver.app/#q=WzAsMyxbMCwwLCJYIl0sWzEsMiwiQiJdLFsyLDAsIkEiXSxbMCwxLCJnIiwxXSxbMiwxLCJmIiwxXSxbMCwyLCJoIiwxXV0= -->
<iframe class="quiver-embed" src="https://q.uiver.app/#q=WzAsMyxbMCwwLCJYIl0sWzEsMiwiQiJdLFsyLDAsIkEiXSxbMCwxLCJnIiwxXSxbMiwxLCJmIiwxXSxbMCwyLCJoIiwxXV0=&embed" width="432" height="432" style="border-radius: 8px; border: none;"></iframe>
</div></div>
<p>Whenever possible, we encourage you to enclose diagrams into definitions/propositions/etc should you need to reference them.</p>
<h2>Images</h2>
<p>Images are included via standard markdown syntax:</p>
<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">![</span><span class="nv">image description</span><span class="p">](</span><span class="sx">image_path</span><span class="p">)</span>
</code></pre></div></div>
<p><code class="language-plaintext highlighter-rouge">image_path</code> can be a remote link. Should you need to upload images to the blog, do as follows:</p>
<ul>
<li>Create a folder in <code class="language-plaintext highlighter-rouge">assetsPosts</code> with the same name as the blog post file. So if the blog post file is <code class="language-plaintext highlighter-rouge">yyyy-mm-dd-title.md</code>, create the folder <code class="language-plaintext highlighter-rouge">assetsPosts/yyyy-mm-dd-title</code></li>
<li>Place your images there</li>
<li>Reference the images by doing:
<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code> !<span class="p">[</span><span class="nv">image description</span><span class="p">](</span><span class="sx">../assetsPosts/yyyy-mm-dd-title/image</span><span class="p">)</span>
</code></pre></div> </div>
</li>
</ul>
<p>Whenever possible, we recommend images to be in the format <code class="language-plaintext highlighter-rouge">.png</code>, and to be <code class="language-plaintext highlighter-rouge">800</code> pixels in width, with a <strong>transparent</strong> background. Ideally, these should be easily readable on the light gray background of the blog website. You can stray from these guidelines if you have no alternative, but our definition and your definition of ‘I had no alternative’ may be different, and <em>we may complain</em>.</p>
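<p>If you want to check an image against the width guideline without extra dependencies, you can read the size directly from the PNG header. This is a sketch using only the standard library; <code>png_size</code> is a hypothetical helper, not part of the blog tooling:</p>

```python
import struct

def png_size(data: bytes):
    """Read width/height from a PNG's IHDR chunk.

    Layout: 8-byte signature, 4-byte chunk length, 4-byte "IHDR" tag,
    then big-endian width and height at byte offsets 16-24.
    """
    assert data[:8] == b"\x89PNG\r\n\x1a\n", "not a PNG file"
    width, height = struct.unpack(">II", data[16:24])
    return width, height

# minimal fake header for demonstration (not a complete, valid PNG)
header = b"\x89PNG\r\n\x1a\n" + b"\x00\x00\x00\rIHDR" + struct.pack(">II", 800, 600)
print(png_size(header))  # → (800, 600)
```
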
<h4>Referencing</h4>
<p>Referencing works exactly as for diagrams:</p>
<div class="language-latex highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
<span class="p">{</span><span class="c">% figure {"id":"your_reference_tag"} %}</span>
![image description](image<span class="p">_</span>path)
<span class="p">{</span><span class="c">% endfigure %}</span>
</code></pre></div></div>
<h2>Code</h2>
<p>The CyberCat blog offers support for code snippets:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">print_hi</span><span class="p">(</span><span class="nb">name</span><span class="p">)</span>
<span class="nb">puts</span> <span class="s2">"Hi, </span><span class="si">#{</span><span class="nb">name</span><span class="si">}</span><span class="s2">"</span>
<span class="k">end</span>
<span class="n">print_hi</span><span class="p">(</span><span class="s1">'Tom'</span><span class="p">)</span>
<span class="c1">#=> prints 'Hi, Tom' to STDOUT.</span>
</code></pre></div></div>
<p>To include a code snippet, just write:</p>
<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">```</span><span class="nl">language the snippet is written in
</span><span class="sb">your code</span>
<span class="p">```</span>
</code></pre></div></div>
<p>Check out the <a href="https://jekyllrb.com/docs/home">Jekyll docs</a> for more info on how to get the most out of Jekyll. File all bugs/feature requests at <a href="https://github.com/jekyll/jekyll">Jekyll’s GitHub repo</a>. If you have questions, you can ask them on <a href="https://talk.jekyllrb.com/">Jekyll Talk</a>.</p>
<p><em>Fabrizio Genovese</em></p>