{
    "version": "https://jsonfeed.org/version/1",
    "title": "fleetwood.dev",
    "home_page_url": "https://fleetwood.dev",
    "feed_url": "https://fleetwood.dev/feed.json",
    "description": "Christopher Fleetwood's personal blog",
    "author": {
        "name": "Christopher Fleetwood",
        "url": "https://fleetwood.dev"
    },
    "items": [
        {
            "id": "https://fleetwood.dev/posts/virtual-cell-challenge",
            "content_html": "<p><a href=\"https://arcinstitute.org/\" rel=\"nofollow\" target=\"_blank\">Arc Institute</a> recently unveiled the <a href=\"https://virtualcellchallenge.org/\" rel=\"nofollow\" target=\"_blank\">Virtual Cell Challenge</a>. Participants are required to train a model capable of predicting the effect of silencing a gene in a (partially) unseen cell type, a task they term <em>context generalization</em>.\nFor ML engineers with little to no biology background, the jargon and required context can seem quite daunting. To encourage participation, we recapitulate the challenge in a form better suited to engineers from other disciplines.</p>\n<h1>Background</h1>\n<blockquote>\n<p><strong>Goal</strong> <br/>\n<!-- -->Train a model to predict the effect on a cell of silencing a gene using CRISPR.</p>\n</blockquote>\n<p>Doing things in the world of atoms is expensive, laborious and error prone. What if we could test thousands of drug candidates without ever touching a petri dish?\nThis is the goal of the virtual cell challenge — a model (most likely a neural network) that can simulate exactly what happens\nto a cell when we change some parameter. Given that tightening your feedback loop is often the best way to speed up progress,\na model capable of doing this accurately would have significant impact.</p>\n<p>To train this neural network, we will need data. For the challenge, Arc has curated a dataset of ~300k single-cell RNA sequencing profiles. It may be worthwhile to revisit the <a href=\"https://www.khanacademy.org/science/biology/gene-expression-central-dogma/central-dogma-transcription/v/rna-transcription-and-translation\" rel=\"nofollow\" target=\"_blank\">Central Dogma</a> before continuing. This essay will build off of this to provide the ~minimum biology knowledge you&#x27;ll need for the challenge.</p>\n<h2>Training data</h2>\n<p>The training set consists of a sparse matrix and some associated metadata. 
More specifically, we have 220k cells, and\nfor each cell we have a <a href=\"https://en.wikipedia.org/wiki/Transcriptome\" rel=\"nofollow\" target=\"_blank\">transcriptome</a>. This transcriptome is a sparse row vector, where each\nentry is the <strong>raw count of RNA molecules</strong> (transcripts) detected for the corresponding gene (our column). Of the 220k cells,\n~38k are <em>unperturbed</em>, meaning no gene has been silenced using CRISPR. These control cells are crucial, as we will see shortly.</p>\n<p>To understand the dataset more concretely, let&#x27;s select a gene, TMSB4X (the most frequently silenced gene in the dataset), and compare the number of RNA molecules detected for a control cell and a\nperturbed cell.</p>\n<p><img src=\"/avcc/TMSB4X.png\" alt=\"TMSB4X\"/></p>\n<p>We can see that the cell with TMSB4X silenced has a greatly reduced number of transcripts compared with the control\ncells.</p>\n<h3>Modelling the challenge</h3>\n<p>The astute among you may be wondering why we don&#x27;t just count the RNA molecules in the same cell before and after\nsilencing the gene. Why do we need the control cells at all? Unfortunately, <strong>reading the transcriptome destroys the cell</strong>, a problem reminiscent of the <a href=\"https://en.wikipedia.org/wiki/Observer_effect_(physics)\" rel=\"nofollow\" target=\"_blank\">observer effect</a>.</p>\n<p>This inability to measure the same cell before and after perturbation introduces many issues, as we are forced to use a population of <strong>basal</strong>\n(a.k.a. control, unperturbed) cells as a reference point. The control cells and perturbed cells are not entirely\nhomogeneous even prior to the perturbation. 
This means that we have to now separate out our true signal, the perturbation, from\nnoise induced by the heterogeneity.</p>\n<p>More formally, we can model observed gene expression in perturbed cells as:</p>\n<span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><msub><mover accent=\"true\"><mi>X</mi><mo>^</mo></mover><mi>p</mi></msub><mo>∼</mo><msub><mover accent=\"true\"><mi>T</mi><mo>^</mo></mover><mi>p</mi></msub><mo stretchy=\"false\">(</mo><msub><mi mathvariant=\"script\">D</mi><mtext>basal</mtext></msub><mo stretchy=\"false\">)</mo><mo>+</mo><mi>H</mi><mo stretchy=\"false\">(</mo><msub><mi mathvariant=\"script\">D</mi><mtext>basal</mtext></msub><mo stretchy=\"false\">)</mo><mo>+</mo><mi>ε</mi><mo separator=\"true\">,</mo><mspace width=\"1em\"></mspace><mi>ε</mi><mo>∼</mo><msub><mi>P</mi><mi>ε</mi></msub></mrow><annotation encoding=\"application/x-tex\">\\hat{X}_p \\sim \\hat{T}_p(\\mathcal{D}_{\\text{basal}}) + H(\\mathcal{D}_{\\text{basal}}) + \\varepsilon, \\quad \\varepsilon \\sim P_\\varepsilon </annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1.2329em;vertical-align:-0.2861em\"></span><span class=\"mord\"><span class=\"mord accent\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9468em\"><span style=\"top:-3em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.07847em\">X</span></span><span style=\"top:-3.2523em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"accent-body\" style=\"left:-0.1667em\"><span class=\"mord\">^</span></span></span></span></span></span></span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em\"><span 
style=\"top:-2.55em;margin-left:-0.0785em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">p</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2861em\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mrel\">∼</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1.2329em;vertical-align:-0.2861em\"></span><span class=\"mord\"><span class=\"mord accent\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9468em\"><span style=\"top:-3em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em\">T</span></span><span style=\"top:-3.2523em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"accent-body\" style=\"left:-0.1667em\"><span class=\"mord\">^</span></span></span></span></span></span></span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">p</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2861em\"><span></span></span></span></span></span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathcal\" style=\"margin-right:0.02778em\">D</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em\"><span 
style=\"top:-2.55em;margin-left:-0.0278em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord text mtight\"><span class=\"mord mtight\">basal</span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.08125em\">H</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathcal\" style=\"margin-right:0.02778em\">D</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em\"><span style=\"top:-2.55em;margin-left:-0.0278em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord text mtight\"><span class=\"mord mtight\">basal</span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.625em;vertical-align:-0.1944em\"></span><span class=\"mord mathnormal\">ε</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:1em\"></span><span class=\"mspace\" 
style=\"margin-right:0.1667em\"></span><span class=\"mord mathnormal\">ε</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mrel\">∼</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em\">P</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">ε</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span></span></span></span></span>\n<p>where:</p>\n<ul>\n<li><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mover accent=\"true\"><mi>X</mi><mo>^</mo></mover><mi>p</mi></msub></mrow><annotation encoding=\"application/x-tex\">\\hat{X}_p</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1.2329em;vertical-align:-0.2861em\"></span><span class=\"mord\"><span class=\"mord accent\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9468em\"><span style=\"top:-3em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.07847em\">X</span></span><span style=\"top:-3.2523em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"accent-body\" style=\"left:-0.1667em\"><span class=\"mord\">^</span></span></span></span></span></span></span><span class=\"msupsub\"><span 
class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em\"><span style=\"top:-2.55em;margin-left:-0.0785em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">p</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2861em\"><span></span></span></span></span></span></span></span></span></span>: The observed gene expression measurements in cells with perturbation <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>p</mi></mrow><annotation encoding=\"application/x-tex\">p</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.625em;vertical-align:-0.1944em\"></span><span class=\"mord mathnormal\">p</span></span></span></span></li>\n<li><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi mathvariant=\"script\">D</mi><mtext>basal</mtext></msub></mrow><annotation encoding=\"application/x-tex\">\\mathcal{D}_{\\text{basal}}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em\"></span><span class=\"mord\"><span class=\"mord mathcal\" style=\"margin-right:0.02778em\">D</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em\"><span style=\"top:-2.55em;margin-left:-0.0278em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord text mtight\"><span class=\"mord 
mtight\">basal</span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span></span></span></span>: The distribution of the unperturbed, baseline cell population.</li>\n<li><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mover accent=\"true\"><mi>T</mi><mo>^</mo></mover><mi>p</mi></msub><mo stretchy=\"false\">(</mo><msub><mi mathvariant=\"script\">D</mi><mtext>basal</mtext></msub><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">\\hat{T}_p(\\mathcal{D}_{\\text{basal}})</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1.2329em;vertical-align:-0.2861em\"></span><span class=\"mord\"><span class=\"mord accent\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9468em\"><span style=\"top:-3em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em\">T</span></span><span style=\"top:-3.2523em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"accent-body\" style=\"left:-0.1667em\"><span class=\"mord\">^</span></span></span></span></span></span></span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">p</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2861em\"><span></span></span></span></span></span></span><span class=\"mopen\">(</span><span class=\"mord\"><span 
class=\"mord mathcal\" style=\"margin-right:0.02778em\">D</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em\"><span style=\"top:-2.55em;margin-left:-0.0278em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord text mtight\"><span class=\"mord mtight\">basal</span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span></span></span></span>: True effect caused by perturbation <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>p</mi></mrow><annotation encoding=\"application/x-tex\">p</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.625em;vertical-align:-0.1944em\"></span><span class=\"mord mathnormal\">p</span></span></span></span> on the population.</li>\n<li><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>H</mi><mo stretchy=\"false\">(</mo><msub><mi mathvariant=\"script\">D</mi><mtext>basal</mtext></msub><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">H(\\mathcal{D}_{\\text{basal}})</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.08125em\">H</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathcal\" style=\"margin-right:0.02778em\">D</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span 
class=\"vlist\" style=\"height:0.3361em\"><span style=\"top:-2.55em;margin-left:-0.0278em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord text mtight\"><span class=\"mord mtight\">basal</span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span></span></span></span>: Biological heterogeneity of the baseline population.</li>\n<li><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>ε</mi></mrow><annotation encoding=\"application/x-tex\">\\varepsilon</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.4306em\"></span><span class=\"mord mathnormal\">ε</span></span></span></span>: Experiment-specific technical noise, assumed independent of the unperturbed cell state and <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi mathvariant=\"script\">D</mi><mtext>basal</mtext></msub></mrow><annotation encoding=\"application/x-tex\">\\mathcal{D}_{\\text{basal}}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em\"></span><span class=\"mord\"><span class=\"mord mathcal\" style=\"margin-right:0.02778em\">D</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em\"><span style=\"top:-2.55em;margin-left:-0.0278em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord 
mtight\"><span class=\"mord text mtight\"><span class=\"mord mtight\">basal</span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span></span></span></span>.</li>\n</ul>\n<h1>STATE: The baseline from Arc</h1>\n<p>Prior to the Virtual Cell Challenge, Arc released <a href=\"https://arcinstitute.org/manuscripts/State\" rel=\"nofollow\" target=\"_blank\">STATE</a>, their own attempt to solve the challenge\nusing a pair of transformer-based models. This serves as a strong baseline for participants to start with, so we will\nexplore it in detail.</p>\n<p>STATE consists of two models, the <strong>State Transition Model</strong> (ST) and the <strong>State Embedding Model</strong> (SE). SE is designed to produce rich semantic embeddings of cells in an effort to improve generalization across cell types. ST is the &quot;cell simulator&quot;: it takes in either the transcriptome of a control cell or an embedding of that cell produced by SE, along with a one-hot encoded vector representing the perturbation of interest, and outputs the perturbed transcriptome.</p>\n<h2>State Transition Model (ST)</h2>\n<p><img src=\"/avcc/ST.png\" alt=\"ST\"/></p>\n<p>The State Transition Model is a relatively simple transformer with a Llama backbone that operates on the following:</p>\n<ol>\n<li>A set of transcriptomes (or SE embeddings) for covariate-matched basal cells.</li>\n<li>A set of one-hot vectors representing our gene perturbation for each cell.</li>\n</ol>\n<p>Using a covariate-matched set of control cells with paired target cells should help the model discern the\nactual effect of our intended perturbation. 
Both the control set tensor and the perturbation tensor are fed through independent encoders, which are simply 4-layer MLPs with GELU activations.\nIf working directly in gene expression space (i.e. producing a full transcriptome), the output is passed through a learned\ndecoder.</p>\n<p>ST is trained using <a href=\"https://en.wikipedia.org/wiki/Kernel_embedding_of_distributions\" rel=\"nofollow\" target=\"_blank\">Maximum Mean Discrepancy</a>. Put simply, the model learns to minimize the difference between two probability distributions: the perturbed cells it predicts and the perturbed cells that were actually observed.</p>\n<h2>State Embedding Model (SE)</h2>\n<p><img src=\"/avcc/SE.png\" alt=\"SE\"/></p>\n<p>The State Embedding Model is a BERT-like model trained using a masked prediction task. To understand this more deeply, we first have to\ntake a little detour for some more biological grounding.</p>\n<h3>A little biological detour</h3>\n<p><img src=\"/avcc/alt_splicing.gif\" alt=\"Alternative Splicing\"/></p>\n<p>A gene consists of <em>exons</em> (protein coding sections) and <em>introns</em> (non-protein coding sections). DNA is first <em>transcribed</em> into pre-mRNA, as shown above. The cell then performs <a href=\"https://en.wikipedia.org/wiki/Alternative_splicing\" rel=\"nofollow\" target=\"_blank\">Alternative Splicing</a>: it cuts out all the introns and picks and chooses which exons to keep. You can think of the gene as an IKEA manual for making a table. By leaving out some parts, one could also construct a three-legged table or, with some effort, an odd bookshelf. These different objects are analogous to <strong>protein isoforms</strong>, different proteins encoded by the same gene.</p>\n<h3>Back to the model</h3>\n<p>With this basic understanding, we can move on to how the SE model works. Remember, our core goal for SE is to create <strong>meaningful\ncell embeddings</strong>. 
To do this, we must first create meaningful gene embeddings.</p>\n<p>To produce a single gene embedding, we first obtain the amino acid sequences (e.g. <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mtext mathvariant=\"monospace\">SDKPDMAEI</mtext></mrow><annotation encoding=\"application/x-tex\">\\texttt{SDKPDMAEI}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6111em\"></span><span class=\"mord text\"><span class=\"mord texttt\">SDKPDMAEI</span></span></span></span></span>... for TMSB4X) of all the different protein isoforms encoded by the gene in question. We then feed these sequences to <a href=\"https://huggingface.co/facebook/esm2_t48_15B_UR50D\" rel=\"nofollow\" target=\"_blank\">ESM2</a>, a 15B parameter Protein Language Model from FAIR. ESM2 produces an embedding <em>per amino acid</em>, and we mean pool these to obtain a &quot;transcript&quot; (a.k.a. protein isoform) embedding.</p>\n<p>Now that we have all of these protein isoform embeddings, we mean pool them in turn to get the gene embedding. 
Next, we project these gene embeddings to our model dimension using a learned encoder as follows:</p>\n<span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><msub><mover accent=\"true\"><mi>g</mi><mo>~</mo></mover><mi>j</mi></msub><mo>=</mo><mtext>SiLU</mtext><mo stretchy=\"false\">(</mo><mtext>LayerNorm</mtext><mo stretchy=\"false\">(</mo><msub><mi>g</mi><mi>j</mi></msub><msub><mi mathvariant=\"bold\">W</mi><mi>g</mi></msub><mo>+</mo><msub><mi mathvariant=\"bold\">b</mi><mi>g</mi></msub><mo stretchy=\"false\">)</mo><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">\\tilde{g}_j = \\text{SiLU}(\\text{LayerNorm}(g_j \\mathbf{W}_g + \\mathbf{b}_g))</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.954em;vertical-align:-0.2861em\"></span><span class=\"mord\"><span class=\"mord accent\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.6679em\"><span style=\"top:-3em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">g</span></span><span style=\"top:-3.35em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"accent-body\" style=\"left:-0.2222em\"><span class=\"mord\">~</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1944em\"><span></span></span></span></span></span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" 
style=\"margin-right:0.05724em\">j</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2861em\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1.0361em;vertical-align:-0.2861em\"></span><span class=\"mord text\"><span class=\"mord\">SiLU</span></span><span class=\"mopen\">(</span><span class=\"mord text\"><span class=\"mord\">LayerNorm</span></span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">g</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.05724em\">j</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2861em\"><span></span></span></span></span></span></span><span class=\"mord\"><span class=\"mord mathbf\" style=\"margin-right:0.01597em\">W</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em\"><span style=\"top:-2.55em;margin-left:-0.016em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.03588em\">g</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2861em\"><span></span></span></span></span></span></span><span 
class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1.0361em;vertical-align:-0.2861em\"></span><span class=\"mord\"><span class=\"mord mathbf\">b</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.03588em\">g</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2861em\"><span></span></span></span></span></span></span><span class=\"mclose\">))</span></span></span></span></span>\n<p>We&#x27;ve now obtained a gene embedding, but what we really want is a <em>cell embedding</em>. 
To do this, Arc represents each cell\nas the top 2048 genes ranked by <a href=\"https://en.wikipedia.org/wiki/Fold_change#Fold_changes_in_genomics_and_bioinformatics\" rel=\"nofollow\" target=\"_blank\">log fold expression level</a>.</p>\n<p>We then construct a &quot;cell sentence&quot; from our 2048 gene embeddings as follows:</p>\n<span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><msup><mover accent=\"true\"><mi mathvariant=\"bold\">c</mi><mo>~</mo></mover><mrow><mo stretchy=\"false\">(</mo><mi>i</mi><mo stretchy=\"false\">)</mo></mrow></msup><mo>=</mo><mrow><mo fence=\"true\">[</mo><msub><mi mathvariant=\"bold\">z</mi><mtext>cls</mtext></msub><mo separator=\"true\">,</mo><msubsup><mover accent=\"true\"><mi mathvariant=\"bold\">g</mi><mo>~</mo></mover><mn>1</mn><mrow><mo stretchy=\"false\">(</mo><mi>i</mi><mo stretchy=\"false\">)</mo></mrow></msubsup><mo separator=\"true\">,</mo><msubsup><mover accent=\"true\"><mi mathvariant=\"bold\">g</mi><mo>~</mo></mover><mn>2</mn><mrow><mo stretchy=\"false\">(</mo><mi>i</mi><mo stretchy=\"false\">)</mo></mrow></msubsup><mo separator=\"true\">,</mo><mo>…</mo><mo separator=\"true\">,</mo><msubsup><mover accent=\"true\"><mi mathvariant=\"bold\">g</mi><mo>~</mo></mover><mi>L</mi><mrow><mo stretchy=\"false\">(</mo><mi>i</mi><mo stretchy=\"false\">)</mo></mrow></msubsup><mo separator=\"true\">,</mo><msub><mi mathvariant=\"bold\">z</mi><mtext>ds</mtext></msub><mo fence=\"true\">]</mo></mrow><mo>∈</mo><msup><mi mathvariant=\"double-struck\">R</mi><mrow><mo stretchy=\"false\">(</mo><mi>L</mi><mo>+</mo><mn>2</mn><mo stretchy=\"false\">)</mo><mo>×</mo><mi>h</mi></mrow></msup></mrow><annotation encoding=\"application/x-tex\">\\tilde{\\mathbf{c}}^{(i)} = \\left[\\mathbf{z}_{\\text{cls}}, \\tilde{\\mathbf{g}}_1^{(i)}, \\tilde{\\mathbf{g}}_2^{(i)}, \\ldots, \\tilde{\\mathbf{g}}_L^{(i)}, \\mathbf{z}_{\\text{ds}}\\right] \\in 
\\mathbb{R}^{(L+2) \\times h}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.938em\"></span><span class=\"mord\"><span class=\"mord accent\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.6813em\"><span style=\"top:-3em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord mathbf\">c</span></span><span style=\"top:-3.3634em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"accent-body\" style=\"left:-0.1944em\"><span class=\"mord\">~</span></span></span></span></span></span></span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.938em\"><span style=\"top:-3.113em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mopen mtight\">(</span><span class=\"mord mathnormal mtight\">i</span><span class=\"mclose mtight\">)</span></span></span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1.8em;vertical-align:-0.65em\"></span><span class=\"minner\"><span class=\"mopen delimcenter\" style=\"top:0em\"><span class=\"delimsizing size2\">[</span></span><span class=\"mord\"><span class=\"mord mathbf\">z</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord text mtight\"><span class=\"mord 
mtight\">cls</span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span class=\"mord\"><span class=\"mord accent\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.6813em\"><span style=\"top:-3em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord mathbf\" style=\"margin-right:0.01597em\">g</span></span><span style=\"top:-3.3634em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"accent-body\" style=\"left:-0.2222em\"><span class=\"mord\">~</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1944em\"><span></span></span></span></span></span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.0448em\"><span style=\"top:-2.4337em;margin-left:-0.016em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">1</span></span></span><span style=\"top:-3.2198em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mopen mtight\">(</span><span class=\"mord mathnormal mtight\">i</span><span class=\"mclose mtight\">)</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2663em\"><span></span></span></span></span></span></span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span class=\"mord\"><span class=\"mord accent\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span 
class=\"vlist\" style=\"height:0.6813em\"><span style=\"top:-3em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord mathbf\" style=\"margin-right:0.01597em\">g</span></span><span style=\"top:-3.3634em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"accent-body\" style=\"left:-0.2222em\"><span class=\"mord\">~</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1944em\"><span></span></span></span></span></span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.0448em\"><span style=\"top:-2.4337em;margin-left:-0.016em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span><span style=\"top:-3.2198em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mopen mtight\">(</span><span class=\"mord mathnormal mtight\">i</span><span class=\"mclose mtight\">)</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2663em\"><span></span></span></span></span></span></span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span class=\"minner\">…</span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span class=\"mord\"><span class=\"mord accent\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.6813em\"><span style=\"top:-3em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord mathbf\" style=\"margin-right:0.01597em\">g</span></span><span style=\"top:-3.3634em\"><span 
class=\"pstrut\" style=\"height:3em\"></span><span class=\"accent-body\" style=\"left:-0.2222em\"><span class=\"mord\">~</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1944em\"><span></span></span></span></span></span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.0448em\"><span style=\"top:-2.4065em;margin-left:-0.016em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">L</span></span></span><span style=\"top:-3.2198em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mopen mtight\">(</span><span class=\"mord mathnormal mtight\">i</span><span class=\"mclose mtight\">)</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2935em\"><span></span></span></span></span></span></span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span class=\"mord\"><span class=\"mord mathbf\">z</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord text mtight\"><span class=\"mord mtight\">ds</span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mclose delimcenter\" style=\"top:0em\"><span class=\"delimsizing size2\">]</span></span></span><span 
class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mrel\">∈</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.938em\"></span><span class=\"mord\"><span class=\"mord mathbb\">R</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.938em\"><span style=\"top:-3.113em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mopen mtight\">(</span><span class=\"mord mathnormal mtight\">L</span><span class=\"mbin mtight\">+</span><span class=\"mord mtight\">2</span><span class=\"mclose mtight\">)</span><span class=\"mbin mtight\">×</span><span class=\"mord mathnormal mtight\">h</span></span></span></span></span></span></span></span></span></span></span></span></span>\n<p>We add a <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mtext mathvariant=\"monospace\">[CLS]</mtext></mrow><annotation encoding=\"application/x-tex\">\\texttt{[CLS]}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7778em;vertical-align:-0.0833em\"></span><span class=\"mord text\"><span class=\"mord texttt\">[CLS]</span></span></span></span></span> token and <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mtext mathvariant=\"monospace\">[DS]</mtext></mrow><annotation encoding=\"application/x-tex\">\\texttt{[DS]}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7778em;vertical-align:-0.0833em\"></span><span class=\"mord text\"><span class=\"mord texttt\">[DS]</span></span></span></span></span> token to 
our sentence. The <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mtext mathvariant=\"monospace\">[CLS]</mtext></mrow><annotation encoding=\"application/x-tex\">\\texttt{[CLS]}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7778em;vertical-align:-0.0833em\"></span><span class=\"mord text\"><span class=\"mord texttt\">[CLS]</span></span></span></span></span> token ends up being used as our &quot;cell embedding&quot; (very BERT-like)\nand the <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mtext mathvariant=\"monospace\">[DS]</mtext></mrow><annotation encoding=\"application/x-tex\">\\texttt{[DS]}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7778em;vertical-align:-0.0833em\"></span><span class=\"mord text\"><span class=\"mord texttt\">[DS]</span></span></span></span></span> token is used to &quot;disentangle dataset-specific effects&quot;. Although the genes are sorted by log fold\nexpression level, Arc further enforces the magnitude of each gene&#x27;s expression by incorporating the transcriptome in a\nfashion analogous to positional embeddings. Through an odd <a href=\"https://github.com/ArcInstitute/state/blob/main/src/state/emb/nn/model.py#L374\" rel=\"nofollow\" target=\"_blank\">&quot;soft binning&quot; algorithm</a> and 2 MLPs, they create some\n&quot;expression encodings&quot; which they then add to each gene embedding. This should modulate the magnitude of each gene\nembedding by how intensely it is expressed in the transcriptome.</p>\n<p>To train the model, they mask 1280 genes per cell, and the model is tasked with predicting them. The 1280 genes are\nselected such that they have a wide range of expression intensities. 
For the graphically inclined, the figure below\nshows the construction of the cell sentence.</p>\n<p><img src=\"/avcc/SE_path.png\" alt=\"SE PATH\"/></p>\n<h1>Evaluations </h1>\n<p>Understanding how your submission will be evaluated is key to success. The 3 evaluation metrics chosen by Arc are <strong>Perturbation Discrimination</strong>, <strong>Differential Expression</strong> and <strong>Mean Absolute Error</strong>. Given that Mean Absolute Error is simple and exactly as it sounds, we will omit it from our analysis.</p>\n<h2>Perturbation Discrimination</h2>\n<p><img src=\"/avcc/pert_disc.png\" alt=\"Perturbation Discrimination\" title=\"Perturbation Discrimination\"/></p>\n<p>Perturbation Discrimination intends to evaluate how well your model can uncover <em>relative differences</em> between\nperturbations. To do this, we compute the Manhattan distances for all the measured perturbed transcriptomes in the test set (the ground\ntruth we are trying to predict, <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>y</mi><mi>t</mi></msub></mrow><annotation encoding=\"application/x-tex\">y_t</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.625em;vertical-align:-0.1944em\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">y</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2806em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:0.15em\"><span></span></span></span></span></span></span></span></span></span> and all other perturbed transcriptomes, <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msubsup><mi>y</mi><mi>p</mi><mi>n</mi></msubsup></mrow><annotation encoding=\"application/x-tex\">y_p^n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1.0475em;vertical-align:-0.3831em\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">y</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.6644em\"><span style=\"top:-2.453em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">p</span></span></span><span style=\"top:-3.063em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">n</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3831em\"><span></span></span></span></span></span></span></span></span></span>) to our predicted transcriptome <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mover accent=\"true\"><mi>y</mi><mo>^</mo></mover><mi>t</mi></msub></mrow><annotation encoding=\"application/x-tex\">\\hat{y}_t</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8889em;vertical-align:-0.1944em\"></span><span class=\"mord\"><span class=\"mord accent\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span 
class=\"vlist\" style=\"height:0.6944em\"><span style=\"top:-3em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">y</span></span><span style=\"top:-3em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"accent-body\" style=\"left:-0.1944em\"><span class=\"mord\">^</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1944em\"><span></span></span></span></span></span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2806em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span></span></span></span>. 
We then rank where the\nground truth lands with respect to all transcriptomes as follows:</p>\n<span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><msub><mi>r</mi><mi>t</mi></msub><mo>=</mo><munder><mo>∑</mo><mrow><mi>p</mi><mo mathvariant=\"normal\">≠</mo><mi>t</mi></mrow></munder><mn mathvariant=\"bold\">1</mn><mo stretchy=\"false\">{</mo><mi>d</mi><mo stretchy=\"false\">(</mo><msub><mover accent=\"true\"><mi>y</mi><mo>^</mo></mover><mi>t</mi></msub><mo separator=\"true\">,</mo><msub><mi>y</mi><mi>p</mi></msub><mo stretchy=\"false\">)</mo><mo>&lt;</mo><mi>d</mi><mo stretchy=\"false\">(</mo><msub><mover accent=\"true\"><mi>y</mi><mo>^</mo></mover><mi>t</mi></msub><mo separator=\"true\">,</mo><msub><mi>y</mi><mi>t</mi></msub><mo stretchy=\"false\">)</mo><mo stretchy=\"false\">}</mo></mrow><annotation encoding=\"application/x-tex\">r_t = \\sum_{p \\neq t} \\mathbf{1}\\{d(\\hat{y}_t, y_p) &lt; d(\\hat{y}_t, y_t)\\}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.5806em;vertical-align:-0.15em\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02778em\">r</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2806em\"><span style=\"top:-2.55em;margin-left:-0.0278em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mrel\">=</span><span class=\"mspace\" 
style=\"margin-right:0.2778em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.4882em;vertical-align:-1.4382em\"></span><span class=\"mop op-limits\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.05em\"><span style=\"top:-1.8479em;margin-left:0em\"><span class=\"pstrut\" style=\"height:3.05em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mrel mtight\"><span class=\"mrel mtight\"><span class=\"mord vbox mtight\"><span class=\"thinbox mtight\"><span class=\"rlap mtight\"><span class=\"strut\" style=\"height:0.8889em;vertical-align:-0.1944em\"></span><span class=\"inner\"><span class=\"mord mtight\"><span class=\"mrel mtight\"></span></span></span><span class=\"fix\"></span></span></span></span></span><span class=\"mrel mtight\">=</span></span><span class=\"mord mathnormal mtight\">t</span></span></span></span><span style=\"top:-3.05em\"><span class=\"pstrut\" style=\"height:3.05em\"></span><span><span class=\"mop op-symbol large-op\">∑</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.4382em\"><span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span class=\"mord mathbf\">1</span><span class=\"mopen\">{</span><span class=\"mord mathnormal\">d</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord accent\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.6944em\"><span style=\"top:-3em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">y</span></span><span style=\"top:-3em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"accent-body\" style=\"left:-0.1944em\"><span class=\"mord\">^</span></span></span></span><span 
class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1944em\"><span></span></span></span></span></span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2806em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">y</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">p</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2861em\"><span></span></span></span></span></span></span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mrel\">&lt;</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em\"></span><span class=\"mord mathnormal\">d</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord accent\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.6944em\"><span style=\"top:-3em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord 
mathnormal\" style=\"margin-right:0.03588em\">y</span></span><span style=\"top:-3em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"accent-body\" style=\"left:-0.1944em\"><span class=\"mord\">^</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1944em\"><span></span></span></span></span></span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2806em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">y</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2806em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mclose\">)}</span></span></span></span></span>\n<p>After, we normalize by the total number of transcriptomes:</p>\n<span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" 
display=\"block\"><semantics><mrow><msub><mtext>PDisc</mtext><mi>t</mi></msub><mo>=</mo><mfrac><msub><mi>r</mi><mi>t</mi></msub><mi>T</mi></mfrac></mrow><annotation encoding=\"application/x-tex\">\\text{PDisc}_t = \\frac{r_t}{T}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em\"></span><span class=\"mord\"><span class=\"mord text\"><span class=\"mord\">PDisc</span></span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2806em\"><span style=\"top:-2.55em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1.7936em;vertical-align:-0.686em\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.1076em\"><span style=\"top:-2.314em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em\">T</span></span></span><span style=\"top:-3.23em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em\"></span></span><span style=\"top:-3.677em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02778em\">r</span><span 
class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2806em\"><span style=\"top:-2.55em;margin-left:-0.0278em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.686em\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span></span>\n<p>Where <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mn>0</mn></mrow><annotation encoding=\"application/x-tex\">0</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em\"></span><span class=\"mord\">0</span></span></span></span> would be a perfect match. 
The overall score for your predictions is the mean of all <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mtext>PDisc</mtext><mi>t</mi></msub></mrow><annotation encoding=\"application/x-tex\">\\text{PDisc}_t</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em\"></span><span class=\"mord\"><span class=\"mord text\"><span class=\"mord\">PDisc</span></span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2806em\"><span style=\"top:-2.55em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">t</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span></span></span></span>. 
This is then normalized to:</p>\n<span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mtext>PDiscNorm</mtext><mo>=</mo><mn>1</mn><mo>−</mo><mn>2</mn><mtext>PDisc</mtext></mrow><annotation encoding=\"application/x-tex\">\\text{PDiscNorm} = 1 - 2\\text{PDisc}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em\"></span><span class=\"mord text\"><span class=\"mord\">PDiscNorm</span></span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.7278em;vertical-align:-0.0833em\"></span><span class=\"mord\">1</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em\"></span><span class=\"mord\">2</span><span class=\"mord text\"><span class=\"mord\">PDisc</span></span></span></span></span></span>\n<p>We multiply by 2 because, for a random prediction, ~half of the other transcriptomes would be closer and half would be further away, so a random guess scores roughly 0 and a perfect prediction scores 1.</p>\n<h2>Differential Expression</h2>\n<p>Differential Expression evaluates what fraction of the truly affected genes you correctly identified as significantly affected. 
First, for each gene we compute a <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>p</mi></mrow><annotation encoding=\"application/x-tex\">p</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.625em;vertical-align:-0.1944em\"></span><span class=\"mord mathnormal\">p</span></span></span></span>-value using a <a href=\"https://en.wikipedia.org/wiki/Mann%E2%80%93Whitney_U_test\" rel=\"nofollow\" target=\"_blank\">Wilcoxon rank-sum test with tie correction</a>. We do this for both our predicted perturbation distribution and the ground truth perturbation distribution.</p>\n<p>Next, we apply the <a href=\"https://en.wikipedia.org/wiki/False_discovery_rate#Benjamini%E2%80%93Hochberg_procedure\" rel=\"nofollow\" target=\"_blank\">Benjamini-Hochberg procedure</a>, which adjusts the <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>p</mi></mrow><annotation encoding=\"application/x-tex\">p</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.625em;vertical-align:-0.1944em\"></span><span class=\"mord mathnormal\">p</span></span></span></span>-values to control the false discovery rate, since with <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mn>20</mn><mo separator=\"true\">,</mo><mn>000</mn></mrow><annotation encoding=\"application/x-tex\">20,000</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8389em;vertical-align:-0.1944em\"></span><span class=\"mord\">20</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span 
class=\"mord\">000</span></span></span></span> genes and a <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>p</mi></mrow><annotation encoding=\"application/x-tex\">p</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.625em;vertical-align:-0.1944em\"></span><span class=\"mord mathnormal\">p</span></span></span></span>-value threshold of <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mn>0.05</mn></mrow><annotation encoding=\"application/x-tex\">0.05</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em\"></span><span class=\"mord\">0.05</span></span></span></span>, you&#x27;d expect <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mn>1</mn><mo separator=\"true\">,</mo><mn>000</mn></mrow><annotation encoding=\"application/x-tex\">1,000</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8389em;vertical-align:-0.1944em\"></span><span class=\"mord\">1</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span class=\"mord\">000</span></span></span></span> false positives. 
We denote our set of predicted differentially expressed genes <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>G</mi><mrow><mi>p</mi><mo separator=\"true\">,</mo><mi>p</mi><mi>r</mi><mi>e</mi><mi>d</mi></mrow></msub></mrow><annotation encoding=\"application/x-tex\">G_{p,pred}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.9694em;vertical-align:-0.2861em\"></span><span class=\"mord\"><span class=\"mord mathnormal\">G</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mpunct mtight\">,</span><span class=\"mord mathnormal mtight\">p</span><span class=\"mord mathnormal mtight\">re</span><span class=\"mord mathnormal mtight\">d</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2861em\"><span></span></span></span></span></span></span></span></span></span>, and the ground truth set of differentially expressed genes <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>G</mi><mrow><mi>p</mi><mo separator=\"true\">,</mo><mi>t</mi><mi>r</mi><mi>u</mi><mi>e</mi></mrow></msub></mrow><annotation encoding=\"application/x-tex\">G_{p,true}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.9694em;vertical-align:-0.2861em\"></span><span class=\"mord\"><span class=\"mord mathnormal\">G</span><span class=\"msupsub\"><span 
class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2806em\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mpunct mtight\">,</span><span class=\"mord mathnormal mtight\">t</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.02778em\">r</span><span class=\"mord mathnormal mtight\">u</span><span class=\"mord mathnormal mtight\">e</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2861em\"><span></span></span></span></span></span></span></span></span></span>.</p>\n<p>If the size of our set is less than the ground truth set size, take the intersection of the sets, and divide by the true number of differentially expressed genes as follows:</p>\n<span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mi>D</mi><msub><mi>E</mi><mi>p</mi></msub><mo>=</mo><mfrac><mrow><msub><mi>G</mi><mrow><mi>p</mi><mo separator=\"true\">,</mo><mi>p</mi><mi>r</mi><mi>e</mi><mi>d</mi></mrow></msub><mo>∩</mo><msub><mi>G</mi><mrow><mi>p</mi><mo separator=\"true\">,</mo><mi>t</mi><mi>r</mi><mi>u</mi><mi>e</mi></mrow></msub></mrow><msub><mi>n</mi><mrow><mi>p</mi><mo separator=\"true\">,</mo><mi>t</mi><mi>r</mi><mi>u</mi><mi>e</mi></mrow></msub></mfrac></mrow><annotation encoding=\"application/x-tex\">DE_p = \\frac{G_{p,pred} \\cap G_{p,true}}{n_{p,true}}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.9694em;vertical-align:-0.2861em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.02778em\">D</span><span class=\"mord\"><span 
class=\"mord mathnormal\" style=\"margin-right:0.05764em\">E</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em\"><span style=\"top:-2.55em;margin-left:-0.0576em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">p</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2861em\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.3324em;vertical-align:-0.9721em\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3603em\"><span style=\"top:-2.314em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\">n</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2806em\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mpunct mtight\">,</span><span class=\"mord mathnormal mtight\">t</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.02778em\">r</span><span class=\"mord mathnormal mtight\">u</span><span class=\"mord mathnormal mtight\">e</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:0.2861em\"><span></span></span></span></span></span></span></span></span><span style=\"top:-3.23em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em\"></span></span><span style=\"top:-3.677em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\">G</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mpunct mtight\">,</span><span class=\"mord mathnormal mtight\">p</span><span class=\"mord mathnormal mtight\">re</span><span class=\"mord mathnormal mtight\">d</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2861em\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mbin\">∩</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mord\"><span class=\"mord mathnormal\">G</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2806em\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mpunct mtight\">,</span><span class=\"mord mathnormal mtight\">t</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.02778em\">r</span><span class=\"mord mathnormal mtight\">u</span><span class=\"mord mathnormal 
mtight\">e</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2861em\"><span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9721em\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span></span>\n<p>If the size of our set is greater than the ground truth set size, select the subset we predict are most differentially expressed (our &quot;most confident&quot; predictions, denoted <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mover accent=\"true\"><mi>G</mi><mo>~</mo></mover><mrow><mi>p</mi><mo separator=\"true\">,</mo><mi>p</mi><mi>r</mi><mi>e</mi><mi>d</mi></mrow></msub></mrow><annotation encoding=\"application/x-tex\">\\tilde{G}_{p,pred}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1.2063em;vertical-align:-0.2861em\"></span><span class=\"mord\"><span class=\"mord accent\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9202em\"><span style=\"top:-3em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord mathnormal\">G</span></span><span style=\"top:-3.6023em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"accent-body\" style=\"left:-0.1667em\"><span class=\"mord\">~</span></span></span></span></span></span></span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal 
mtight\">p</span><span class=\"mpunct mtight\">,</span><span class=\"mord mathnormal mtight\">p</span><span class=\"mord mathnormal mtight\">re</span><span class=\"mord mathnormal mtight\">d</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2861em\"><span></span></span></span></span></span></span></span></span></span>), take the intersection with the ground truth set, and then divide by the true number.</p>\n<span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mi>D</mi><msub><mi>E</mi><mi>p</mi></msub><mo>=</mo><mfrac><mrow><msub><mover accent=\"true\"><mi>G</mi><mo>~</mo></mover><mrow><mi>p</mi><mo separator=\"true\">,</mo><mi>p</mi><mi>r</mi><mi>e</mi><mi>d</mi></mrow></msub><mo>∩</mo><msub><mi>G</mi><mrow><mi>p</mi><mo separator=\"true\">,</mo><mi>t</mi><mi>r</mi><mi>u</mi><mi>e</mi></mrow></msub></mrow><msub><mi>n</mi><mrow><mi>p</mi><mo separator=\"true\">,</mo><mi>t</mi><mi>r</mi><mi>u</mi><mi>e</mi></mrow></msub></mfrac></mrow><annotation encoding=\"application/x-tex\">DE_p = \\frac{\\tilde{G}_{p,pred} \\cap G_{p,true}}{n_{p,true}}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.9694em;vertical-align:-0.2861em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.02778em\">D</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.05764em\">E</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em\"><span style=\"top:-2.55em;margin-left:-0.0576em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">p</span></span></span></span><span 
class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2861em\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.5693em;vertical-align:-0.9721em\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.5972em\"><span style=\"top:-2.314em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\">n</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2806em\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mpunct mtight\">,</span><span class=\"mord mathnormal mtight\">t</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.02778em\">r</span><span class=\"mord mathnormal mtight\">u</span><span class=\"mord mathnormal mtight\">e</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2861em\"><span></span></span></span></span></span></span></span></span><span style=\"top:-3.23em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em\"></span></span><span style=\"top:-3.677em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord accent\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:0.9202em\"><span style=\"top:-3em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord mathnormal\">G</span></span><span style=\"top:-3.6023em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"accent-body\" style=\"left:-0.1667em\"><span class=\"mord\">~</span></span></span></span></span></span></span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mpunct mtight\">,</span><span class=\"mord mathnormal mtight\">p</span><span class=\"mord mathnormal mtight\">re</span><span class=\"mord mathnormal mtight\">d</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2861em\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mbin\">∩</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mord\"><span class=\"mord mathnormal\">G</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2806em\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">p</span><span class=\"mpunct mtight\">,</span><span class=\"mord mathnormal mtight\">t</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.02778em\">r</span><span class=\"mord mathnormal mtight\">u</span><span class=\"mord mathnormal mtight\">e</span></span></span></span></span><span 
class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2861em\"><span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9721em\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span></span>\n<p>Do this for all predicted perturbations and take the mean to obtain the final score.</p>\n<h1>Conclusion</h1>\n<p>If a virtual cell can accurately model the change in a cell&#x27;s state in response to perturbations, we can look forward to a\nvery interesting time in pharma. I hope this post accelerated your understanding of the challenge.</p>\n<p>May the best team win.</p>",
            "url": "https://fleetwood.dev/posts/virtual-cell-challenge",
            "title": "Arc Virtual Cell Challenge: A Primer",
            "summary": "Arc recently unveiled the Virtual Cell Challenge - we explore the challenge from a non-biologist's perspective in the hope of encouraging participation from a range of disciplines.",
            "date_modified": "2025-07-05T00:00:00.000Z",
            "author": {
                "name": "Christopher Fleetwood",
                "url": "https://fleetwood.dev"
            }
        },
        {
            "id": "https://fleetwood.dev/posts/domain-specific-architectures",
            "content_html": "<p><em>Billions</em> of people may be continuously running AI inference throughout their waking hours in the near future. Satisfying this demand requires relentless focus on efficiency to reduce the required quantities of two key inputs: <em>energy</em> and <em>capital</em>. The constraints on these inputs, in conjunction with the\nslowing and/or stagnation of both <a href=\"https://en.wikipedia.org/wiki/Moore%27s_law\" rel=\"nofollow\" target=\"_blank\">Moore&#x27;s Law</a> and <a href=\"https://en.wikipedia.org/wiki/Dennard_scaling\" rel=\"nofollow\" target=\"_blank\">Dennard Scaling</a>, have left hardware architects no choice but to pursue <a href=\"https://en.wikipedia.org/wiki/Domain-specific_architecture\" rel=\"nofollow\" target=\"_blank\">Domain Specific Architectures</a> (DSAs) - architectures tailored to the task at hand.</p>\n<p>The current dominance of GPUs in modern deep learning is largely accidental - it was pure\nserendipity that the computational workloads of graphics and deep learning were\nsimilar. Remnants of their graphical heritage still persist in GPU architectures today.\nWhat would AI inference hardware look like if it were redesigned with carte blanche? By working backwards from the AI inference workload, we can determine some optimal properties these DSAs should have. 
Furthermore, we will attempt to\npredict the direction in which the inference paradigm will shift over time - a crucial exercise for\nhardware architects and engineers alike to ensure return on investment.</p>\n<h1>It&#x27;s memory, not compute that matters</h1>\n<blockquote>\n<p>&gt;90% of the total system energy is spent on memory in large ML models\n<br/>\n<!-- -->- <a href=\"https://people.inf.ethz.ch/omutlu/pub/onur-DAC-LightningTalk-MemoryCentricComputing-13-July-2023.pdf\" rel=\"nofollow\" target=\"_blank\">Onur Mutlu</a></p>\n</blockquote>\n<p>Oftentimes it feels like many people view computing\nsystems much as natural philosophers viewed the cosmos prior to Copernicus: compute is like\nthe earth, at the centre of the universe. This is a <strong>bad</strong> model for understanding\nAI inference.</p>\n<p>Progress in memory latency and throughput has lagged behind Moore&#x27;s law since\nthe beginning<sup><a href=\"#user-content-fn-2\" id=\"user-content-fnref-2\" data-footnote-ref=\"true\" aria-describedby=\"footnote-label\">1</a></sup>. Reading from and writing to memory is <strong>extraordinarily slow</strong> when compared to computation. Below is one of my favourite animations,\nadapted from <a href=\"https://x.com/vrushankdes\" rel=\"nofollow\" target=\"_blank\">Vrushank Desai</a>, demonstrating the\nlatency of different operations on an A100 GPU<sup><a href=\"#user-content-fn-3\" id=\"user-content-fnref-3\" data-footnote-ref=\"true\" aria-describedby=\"footnote-label\">2</a></sup>.</p>\n<video src=\"/npu/latency.mp4\"></video>\n<p>Latency and throughput aren&#x27;t the only issues with memory. The one true optimization objective for any accelerator is <strong>performance per dollar</strong>, and reducing power consumption is one key way of reducing the Total Cost of Ownership (TCO). 
The graph below compares the energy cost of different operations on different\ndatatypes for the Google TPU v4i<sup><a href=\"#user-content-fn-1\" id=\"user-content-fnref-1\" data-footnote-ref=\"true\" aria-describedby=\"footnote-label\">3</a></sup>.</p>\n<p><img src=\"/npu/energy_consumption.png\" alt=\"Energy Consumption\"/></p>\n<p>Note the <strong>log</strong> y-axis: it highlights how <strong>incredibly energy intensive reading from and writing to DRAM\nis</strong>, regardless of the memory standard, when compared with actual computation.</p>\n<p>All of this together allows us to define one key property that any DSA\nfor AI inference needs in order to decrease power consumption and improve performance: <strong>minimize data movement</strong>.</p>\n<p>We will use this as an axiom for the remainder of the essay.</p>\n<h1>A simplified model of Transformer inference</h1>\n<p>As always, we must begin with the problem we are trying to solve, and work\nbackwards towards our solution. We will iteratively build up an increasingly\ndetailed picture of inference in an attempt to derive optimal hardware\nprinciples.</p>\n<div></div>\n<p>The above animation intends to give a very simple visual model of performing work\non a traditional accelerator (e.g. a GPU) with different bottlenecks - try clicking on the different bound\ntypes! This simple model affords us some intuition about performing a\nforward pass of a transformer. A simplified forward pass consists solely of moving <strong>extremely large, contiguous weight\ntensors</strong> from Global Memory (DRAM) to Shared Memory (L1 / SRAM), performing some number of operations on them\nin combination with our inputs, then storing the results back in DRAM.</p>\n<p>When memory bound, we can see there are instances when the\nProcessing Elements (PEs) are doing nothing (a.k.a. stalling). When compute bound, we see the PEs cannot\nkeep up with the data flowing through memory. 
As compute progress has outpaced\nmemory progress, keeping our compute units fed is <a href=\"https://horace.io/brrr_intro.html\" rel=\"nofollow\" target=\"_blank\">one of the core optimization principles</a> in deep learning.\nTherefore, optimizing the process of moving these weight tensors to and from memory should be <strong>the underlying principle</strong> behind many of\nour hardware design decisions.</p>\n<h3>Lower precision</h3>\n<p>The first port of call when optimizing a piece of work is to simply try to <strong>avoid doing the work at all</strong>. One obvious way to do this is reducing the size of our weight tensors by using smaller data types.\nWe can cut our work in half, into quarters or more by representing our weights with\nfewer bits. All modern accelerators support FP8 inference, with the latest generation pushing into\nFP4 territory. In order to maintain numerical accuracy with lower bit widths, hardware support\nfor high-precision accumulation and online quantization is necessary. 
There is however another less obvious incentive for reducing the bit width of our data types.</p>\n<p>The number of <a href=\"https://en.wikipedia.org/wiki/Adder_(electronics)\" rel=\"nofollow\" target=\"_blank\">full adders</a> that are required for multiplying two floating point\nnumbers together is <strong>a quadratic function of the number of mantissa bits</strong>.\nFor two floating point numbers with <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>M</mi></mrow><annotation encoding=\"application/x-tex\">M</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.10903em\">M</span></span></span></span> mantissa bits, the multiplication requires an\n<span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mo stretchy=\"false\">(</mo><mi>M</mi><mo>+</mo><mn>1</mn><mo stretchy=\"false\">)</mo><mo>⋅</mo><mo stretchy=\"false\">(</mo><mi>M</mi><mo>+</mo><mn>1</mn><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">(M+1) \\cdot (M+1)</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em\"></span><span class=\"mopen\">(</span><span class=\"mord mathnormal\" style=\"margin-right:0.10903em\">M</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em\"></span><span class=\"mord\">1</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" 
style=\"margin-right:0.2222em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em\"></span><span class=\"mopen\">(</span><span class=\"mord mathnormal\" style=\"margin-right:0.10903em\">M</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em\"></span><span class=\"mord\">1</span><span class=\"mclose\">)</span></span></span></span> array of full adders. This translates to 576 full adders for FP32 (23 mantissa bits), 121 for FP16 (10 bits), 64 for BF16 (7 bits) and 16 for FP8 E4M3 (3 bits)<sup><a href=\"#user-content-fn-4\" id=\"user-content-fnref-4\" data-footnote-ref=\"true\" aria-describedby=\"footnote-label\">4</a></sup>.\nFewer adders per multiplier means we can fit more compute per unit area, or trade the die area\nfor L1 cache.</p>\n<p>These two forces together are a powerful incentive to reduce the precision of our\nmodel. Unfortunately, there are limits: how many bits does a transformer really\nneed? Given that intelligence and compression are deeply intertwined<sup><a href=\"#user-content-fn-17\" id=\"user-content-fnref-17\" data-footnote-ref=\"true\" aria-describedby=\"footnote-label\">5</a></sup>, we cannot\nexpect to continue to reduce the precision of our datatypes without compromising\nmodel performance. 
Scaling Laws for Precision<sup><a href=\"#user-content-fn-5\" id=\"user-content-fnref-5\" data-footnote-ref=\"true\" aria-describedby=\"footnote-label\">6</a></sup> explores this in depth, and it\nseems that the optimal region sits somewhere between 7 and 8 bits.</p>\n<blockquote>\n<p><strong>Principle 1: Hardware support for low precision data types.</strong></p>\n</blockquote>\n<h3>First class asynchronicity</h3>\n<p>One simple way to reduce the opportunity cost of moving a tensor from Global\nMemory to Shared Memory is to introduce asynchronous data movement. With\n<a href=\"https://en.wikipedia.org/wiki/Multiple_buffering\" rel=\"nofollow\" target=\"_blank\">double buffering</a>, the compute unit can work on one buffer while the other is being filled (skip ahead to <a href=\"#google-tpu\">the TPU animation</a> below for a\ndemonstration). The overlap of computation, memory transfers, and inter-accelerator communication is a prerequisite for optimal hardware utilization. This has been the de facto standard since the H100,\nand we should expect any self-respecting DSA to be designed &quot;asynchronous-first&quot;.</p>\n<blockquote>\n<p><strong>Principle 2: Design for asynchronous transfers from day 1.</strong></p>\n</blockquote>\n<h3>Dedicated hardware for memory transfers</h3>\n<p>We know that moving tensors from Global Memory to Shared Memory is a frequent\noperation in our forward pass. Therefore, it would be prudent\nto create an optimized subsystem within our hardware to expedite these transfers.\n<a href=\"https://en.wikipedia.org/wiki/Direct_memory_access\" rel=\"nofollow\" target=\"_blank\">Direct Memory Access</a> allows us to move our tensors between memory spaces (be it Global Memory to Shared Memory within one accelerator, or even to\nthe Shared Memory of another accelerator) without involving the\nhost (read: CPU) at all. 
Nvidia terms their subsystem the Tensor Memory Accelerator (TMA).</p>\n<p>It may sound ridiculous, but prior to the introduction of the TMA with the H100, Nvidia GPUs required <a href=\"https://pytorch.org/blog/hopper-tma-unit/\" rel=\"nofollow\" target=\"_blank\"><strong>all data to go via the\nregisters</strong></a> in the journey from Global Memory to Shared Memory, like forcing a river through a hosepipe. DMA that is aware of tensor layouts and can copy both within a node and between nodes (termed Remote Direct Memory Access or RDMA) should be a principal concern for any future DSA.</p>\n<blockquote>\n<p><strong>Principle 3: Dedicated hardware for tensor-aware memory transfers.</strong></p>\n</blockquote>\n<h3>Optimal memory hierarchy for AI</h3>\n<p>Typically, when we move data within a computing system, it travels through a\nmulti-level cache hierarchy, with each level of the hierarchy trading off latency/throughput for size or the inverse. This is analogous to the human memory hierarchy (<a href=\"https://en.wikipedia.org/wiki/Memory#Sensory_memory\" rel=\"nofollow\" target=\"_blank\">sensory memory</a>, working memory and long-term memory).</p>\n<p>The number of levels in your hierarchy and their sizes are a function of the expected workload with respect to two dimensions:</p>\n<ol>\n<li><strong>Temporal Locality</strong>: If you&#x27;ve used a piece of data, you&#x27;re likely to reuse it soon.</li>\n<li><strong>Spatial Locality</strong>: If you&#x27;ve used a piece of data, you&#x27;re likely to access neighbouring elements.</li>\n</ol>\n<p>In multiprocessor architectures, privacy and data consistency are also key considerations in cache design. 
For example, the H100 features a 50MB shared L2 cache that is accessible by multiple Streaming Multiprocessors (SMs), requiring sophisticated synchronization and coherency checks to maintain data consistency, which is inherently slow.</p>\n<p><strong>Utilizing a memory hierarchy designed for one workload will be inefficient for\nanother.</strong> CPUs are designed for very unpredictable workloads, with random access\npatterns being the norm. GPUs, on the other hand, have threads that frequently reuse\ndata, and in larger contiguous chunks. These different access patterns lend\nthemselves to different designs, with CPUs benefiting from multiple smaller cache levels to expedite random accesses at different temporal and spatial localities. GPUs have more predictable and bulk-oriented access patterns, meaning they can effectively utilize fewer, larger cache levels. We should now intuitively understand why CPUs often have a 3 or 4 tier cache hierarchy whereas GPUs typically have only 2 tiers.</p>\n<p>Let&#x27;s return to our simplified model and derive the optimal hierarchy for AI\ninference. We move weight tensors from DRAM into L1, using the whole weight tensor at ~the same time. Once we are done with that tensor, <strong>we will not use it again until the subsequent forward pass</strong>.\nTherefore, the idea of caching (storing data nearby for reuse) falls apart altogether. If we use a typical\ncache hierarchy for AI inference, we will be filling our L1/L2 with data that won&#x27;t be reused any time soon.</p>\n<p>What we really want is a high-bandwidth, low-latency storage space, much like L1/SRAM, but outside of the usual caching\nparadigm. This storage space can store chunks of our weight tensors whilst we\noperate on them. This is known as a <a href=\"https://en.wikipedia.org/wiki/Scratchpad_memory\" rel=\"nofollow\" target=\"_blank\">scratchpad</a>. 
A scratchpad avoids all of the usual overheads that come with cache\nmanagement, and gives us the fast working memory we need! We have now arrived at our next principle: <strong>replace your cache hierarchy with an outsized scratchpad</strong>.</p>\n<p>In theory, if our scratchpad were large enough, we could store entire intermediate tensors between kernel invocations! This would completely remove roundtrips to\nGlobal Memory. Unfortunately, L1 is located on-chip and is implemented using SRAM, which requires <a href=\"https://en.wikipedia.org/wiki/Static_random-access_memory#Design\" rel=\"nofollow\" target=\"_blank\"><strong>6(!)</strong> transistors per bit of storage</a>, meaning even small amounts\nof storage (read: kilobytes) take an enormous amount of die area which could\notherwise be used for compute. This hasn&#x27;t stopped many hardware startups like <a href=\"https://cerebras.ai/\" rel=\"nofollow\" target=\"_blank\">Cerebras</a>, <a href=\"https://groq.com/\" rel=\"nofollow\" target=\"_blank\">Groq</a> and <a href=\"https://tenstorrent.com/en\" rel=\"nofollow\" target=\"_blank\">Tenstorrent</a> from trading off some amount of compute for on-chip SRAM.</p>\n<blockquote>\n<p><strong>Principle 4: Replace your cache hierarchy with an outsized scratchpad.</strong></p>\n</blockquote>\n<h1>Anatomy of AI Inference</h1>\n<p>We will now focus on an algorithmic analysis of Transformer inference from a single accelerator perspective. This section is largely derived from Chapter 7 of <em>How to scale your model</em><sup><a href=\"#user-content-fn-6\" id=\"user-content-fnref-6\" data-footnote-ref=\"true\" aria-describedby=\"footnote-label\">7</a></sup> and <em>Transformer Inference Arithmetic</em><sup><a href=\"#user-content-fn-7\" id=\"user-content-fnref-7\" data-footnote-ref=\"true\" aria-describedby=\"footnote-label\">8</a></sup>. 
I would recommend studying them both for a comprehensive understanding.</p>\n<p>A modern Transformer consists of only 6 operations:</p>\n<ol>\n<li>Some form of Attention (Multi Head Attention (MHA), Grouped Query Attention (GQA) etc.)</li>\n<li>A Feedforward Network (Mixture of Experts for frontier models)</li>\n<li>Root Mean Square Normalization</li>\n<li>An activation function (typically Swish + GLU within the FFN)</li>\n<li>Rotary Position Embeddings</li>\n<li>Addition to the residual stream</li>\n</ol>\n<p>The Attention and Feedforward blocks are the most relevant when discussing\nperformance. In an ideal world, the other operations can be fused into the aforementioned\noperations and thus incur ~little cost.</p>\n<p>Inference consists of two distinct phases with <strong>very</strong> different computational requirements:</p>\n<ol>\n<li><strong>Prefill</strong>: Processing a long initial sequence (system/user prompt) in parallel, populating the KV Cache.</li>\n<li><strong>Decode</strong>: Autoregressively generating tokens using the KV Cache.</li>\n</ol>\n<p>Every algorithm has a property known as the <a href=\"https://en.wikipedia.org/wiki/Roofline_model#Arithmetic_intensity\" rel=\"nofollow\" target=\"_blank\">Arithmetic Intensity</a>, which is the\nratio of operations performed to bytes accessed. The counterpart to this is\nthe <code>ops:byte</code> ratio of the hardware accelerator, which is the ratio of the\naccelerator&#x27;s &quot;math&quot; bandwidth to its memory bandwidth. An algorithm is compute\nbound if the arithmetic intensity is higher than the accelerator&#x27;s <code>ops:byte</code>\nratio, and memory bandwidth bound otherwise. We should endeavour to be compute bound so that our hardware is always doing useful work. 
Understanding the interplay between these ratios is key for performance.</p>\n<p>Below is a table of key metrics for the most popular hardware accelerators as of\nFebruary 2025:</p>\n<div id=\"table-container\"><table><thead><tr><th>Hardware</th><th>FP16 TFLOPs</th><th colspan=\"3\">DRAM</th><th colspan=\"3\">SRAM</th></tr><tr><th></th><th></th><th>Size (GB)</th><th>BW (TB/s)</th><th>Ops:byte</th><th>Size (MB)</th><th>BW (TB/s)</th><th>Ops:byte</th></tr></thead><tbody><tr><td>NVIDIA H100-SXM5</td><td>989</td><td>80</td><td>3.35</td><td>295.2</td><td>30.8<sup>1</sup></td><td>31</td><td>31.9</td></tr><tr><td>NVIDIA A100</td><td>312</td><td>40</td><td>2.0</td><td>156.0</td><td>20.7<sup>2</sup></td><td>~19</td><td>16.4</td></tr><tr><td>Tenstorrent Blackhole</td><td>387</td><td>32</td><td>0.51</td><td>758.8</td><td>204.9<sup>3</sup></td><td>46.6</td><td>8.3</td></tr><tr><td>Google TPU v4i</td><td>138</td><td>8</td><td>0.61</td><td>226.2</td><td>144</td><td>~13.5<sup>4</sup></td><td>10.2</td></tr><tr><td colspan=\"8\"><sup>1</sup> 228KiB * 132 SMs, <sup>2</sup> 192KB * 108 SMs, <sup>3</sup> 1464KB * 140 Tensix Cores, <sup>4</sup>Estimated</td></tr></tbody></table></div>\n<p>As a mental reference point for these capacities, a <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mo fence=\"true\">[</mo><mn>4096</mn><mo separator=\"true\">,</mo><mn>4096</mn><mo fence=\"true\">]</mo></mrow><annotation encoding=\"application/x-tex\">\\left[4096, 4096\\right]</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em\"></span><span class=\"minner\"><span class=\"mopen delimcenter\" style=\"top:0em\">[</span><span class=\"mord\">4096</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span class=\"mord\">4096</span><span class=\"mclose delimcenter\" 
style=\"top:0em\">]</span></span></span></span></span> matrix (e.g. Llama2\nattention projections) in BF16 is 32MiB (~33.6MB). Determining if an operation is compute or memory bandwidth bound becomes quite complex, given the capacity constraints and throughput variation between the different memory spaces. Despite the difficulty, we will attempt to determine the typical boundedness of our two core operations in a transformer forward pass. This should inform the optimal balance between memory bandwidth and compute capacity.</p>\n<h2>Matrix Multiplications</h2>\n<p>Almost the entirety of the FLOPs spent on our model will be in matrix\nmultiplications. Matrix multiplication has a very interesting property that bears repeating: the computational cost grows with the cube of the dimension <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi mathvariant=\"script\">O</mi><mo stretchy=\"false\">(</mo><msup><mi>n</mi><mn>3</mn></msup><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">\\mathcal{O}(n^3)</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1.0641em;vertical-align:-0.25em\"></span><span class=\"mord mathcal\" style=\"margin-right:0.02778em\">O</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\">n</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8141em\"><span style=\"top:-3.063em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">3</span></span></span></span></span></span></span></span><span class=\"mclose\">)</span></span></span></span>, whereas the number of memory accesses grows with the square of the dimension <span class=\"katex\"><span class=\"katex-mathml\"><math 
xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi mathvariant=\"script\">O</mi><mo stretchy=\"false\">(</mo><msup><mi>n</mi><mn>2</mn></msup><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">\\mathcal{O}(n^2)</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1.0641em;vertical-align:-0.25em\"></span><span class=\"mord mathcal\" style=\"margin-right:0.02778em\">O</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\">n</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8141em\"><span style=\"top:-3.063em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span></span></span></span></span><span class=\"mclose\">)</span></span></span></span>. 
This means as the dimension grows the ratio of compute to data <em>improves</em> (from a hardware perspective).</p>\n<p>In a transformer, matmuls occur primarily in two places:</p>\n<ol>\n<li><strong>Attention Projections</strong>: Computing the <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>q</mi></mrow><annotation encoding=\"application/x-tex\">q</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.625em;vertical-align:-0.1944em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">q</span></span></span></span>, <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>k</mi></mrow><annotation encoding=\"application/x-tex\">k</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6944em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.03148em\">k</span></span></span></span>, <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>v</mi></mrow><annotation encoding=\"application/x-tex\">v</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.4306em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">v</span></span></span></span> and <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>o</mi></mrow><annotation encoding=\"application/x-tex\">o</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.4306em\"></span><span class=\"mord mathnormal\">o</span></span></span></span> 
projections.</li>\n<li><strong>Feedforward Networks</strong>: The large linear transformations in each FFN block.</li>\n</ol>\n<p>The most common form of matmul we will perform in a forward pass is <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mrow><mo fence=\"true\">[</mo><mi>B</mi><mo separator=\"true\">,</mo><msub><mi>D</mi><mrow><mi>m</mi><mi>o</mi><mi>d</mi><mi>e</mi><mi>l</mi></mrow></msub><mo fence=\"true\">]</mo></mrow><mo>⋅</mo><mrow><mo fence=\"true\">[</mo><msub><mi>D</mi><mrow><mi>m</mi><mi>o</mi><mi>d</mi><mi>e</mi><mi>l</mi></mrow></msub><mo separator=\"true\">,</mo><mi>F</mi><mo fence=\"true\">]</mo></mrow></mrow><annotation encoding=\"application/x-tex\">\\left[B, D_{model}\\right] \\cdot \\left[D_{model}, F\\right]</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em\"></span><span class=\"minner\"><span class=\"mopen delimcenter\" style=\"top:0em\">[</span><span class=\"mord mathnormal\" style=\"margin-right:0.05017em\">B</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02778em\">D</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em\"><span style=\"top:-2.55em;margin-left:-0.0278em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">m</span><span class=\"mord mathnormal mtight\">o</span><span class=\"mord mathnormal mtight\">d</span><span class=\"mord mathnormal mtight\">e</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em\">l</span></span></span></span></span><span 
class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mclose delimcenter\" style=\"top:0em\">]</span></span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em\"></span><span class=\"minner\"><span class=\"mopen delimcenter\" style=\"top:0em\">[</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02778em\">D</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em\"><span style=\"top:-2.55em;margin-left:-0.0278em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">m</span><span class=\"mord mathnormal mtight\">o</span><span class=\"mord mathnormal mtight\">d</span><span class=\"mord mathnormal mtight\">e</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em\">l</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em\">F</span><span class=\"mclose delimcenter\" style=\"top:0em\">]</span></span></span></span></span>, where <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>B</mi></mrow><annotation encoding=\"application/x-tex\">B</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span 
class=\"strut\" style=\"height:0.6833em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.05017em\">B</span></span></span></span> is the batch size, <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>D</mi><mrow><mi>m</mi><mi>o</mi><mi>d</mi><mi>e</mi><mi>l</mi></mrow></msub></mrow><annotation encoding=\"application/x-tex\">D_{model}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02778em\">D</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em\"><span style=\"top:-2.55em;margin-left:-0.0278em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">m</span><span class=\"mord mathnormal mtight\">o</span><span class=\"mord mathnormal mtight\">d</span><span class=\"mord mathnormal mtight\">e</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em\">l</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span></span></span></span> is our model dimension and <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>F</mi></mrow><annotation encoding=\"application/x-tex\">F</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em\">F</span></span></span></span> is the outer dimension 
of the weight tensor in question.\nAt what point does this operation become compute bound on a given accelerator?</p>\n<p>We are looking to determine when <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>T</mi><mrow><mi>m</mi><mi>a</mi><mi>t</mi><mi>h</mi></mrow></msub><mo>≥</mo><msub><mi>T</mi><mrow><mi>m</mi><mi>e</mi><mi>m</mi></mrow></msub></mrow><annotation encoding=\"application/x-tex\">T_{math} \\ge T_{mem}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em\">T</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">ma</span><span class=\"mord mathnormal mtight\">t</span><span class=\"mord mathnormal mtight\">h</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mrel\">≥</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em\">T</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em\"><span 
class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">m</span><span class=\"mord mathnormal mtight\">e</span><span class=\"mord mathnormal mtight\">m</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span></span></span></span> for the following:</p>\n<span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><msub><mi>T</mi><mtext>math</mtext></msub><mo>=</mo><mfrac><mtext>Total FLOPs</mtext><mtext>Accelerator FLOPs/s</mtext></mfrac><mo>=</mo><mfrac><mrow><mn>2</mn><mi>B</mi><mi>D</mi><mi>F</mi></mrow><mtext>Accelerator FLOPs/s</mtext></mfrac></mrow><annotation encoding=\"application/x-tex\">T_\\text{math} = \\frac{\\text{Total FLOPs}}{\\text{Accelerator FLOPs/s}} = \\frac{2BDF}{\\text{Accelerator FLOPs/s}}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em\">T</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord text mtight\"><span class=\"mord mtight\">math</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span 
class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.3074em;vertical-align:-0.936em\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3714em\"><span style=\"top:-2.314em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord text\"><span class=\"mord\">Accelerator FLOPs/s</span></span></span></span><span style=\"top:-3.23em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em\"></span></span><span style=\"top:-3.677em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord text\"><span class=\"mord\">Total FLOPs</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.936em\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.2963em;vertical-align:-0.936em\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3603em\"><span style=\"top:-2.314em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord text\"><span class=\"mord\">Accelerator FLOPs/s</span></span></span></span><span style=\"top:-3.23em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em\"></span></span><span style=\"top:-3.677em\"><span class=\"pstrut\" 
style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord\">2</span><span class=\"mord mathnormal\" style=\"margin-right:0.05017em\">B</span><span class=\"mord mathnormal\" style=\"margin-right:0.02778em\">D</span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em\">F</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.936em\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span></span>\n<span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><msub><mi>T</mi><mtext>mem</mtext></msub><mo>=</mo><mfrac><mtext>Total Bytes</mtext><mtext>DRAM Bandwidth</mtext></mfrac><mo>=</mo><mfrac><mrow><mn>2</mn><mi>B</mi><mi>D</mi><mo>+</mo><mn>2</mn><mi>F</mi><mi>D</mi><mo>+</mo><mn>2</mn><mi>B</mi><mi>F</mi></mrow><mtext>DRAM Bandwidth</mtext></mfrac></mrow><annotation encoding=\"application/x-tex\">T_\\text{mem} = \\frac{\\text{Total Bytes}}{\\text{DRAM Bandwidth}} = \\frac{2BD + 2FD + 2BF}{\\text{DRAM Bandwidth}}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em\">T</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord text mtight\"><span class=\"mord mtight\">mem</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.0574em;vertical-align:-0.686em\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3714em\"><span style=\"top:-2.314em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord text\"><span class=\"mord\">DRAM Bandwidth</span></span></span></span><span style=\"top:-3.23em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em\"></span></span><span style=\"top:-3.677em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord text\"><span class=\"mord\">Total Bytes</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.686em\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.0463em;vertical-align:-0.686em\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3603em\"><span style=\"top:-2.314em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord text\"><span class=\"mord\">DRAM Bandwidth</span></span></span></span><span style=\"top:-3.23em\"><span class=\"pstrut\" style=\"height:3em\"></span><span 
class=\"frac-line\" style=\"border-bottom-width:0.04em\"></span></span><span style=\"top:-3.677em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord\">2</span><span class=\"mord mathnormal\" style=\"margin-right:0.05017em\">B</span><span class=\"mord mathnormal\" style=\"margin-right:0.02778em\">D</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mord\">2</span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em\">F</span><span class=\"mord mathnormal\" style=\"margin-right:0.02778em\">D</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mord\">2</span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em\">BF</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.686em\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span></span>\n<p>For our loads, we multiply each term by a factor of two as our values are 2 bytes. 
In order for the matmul to become compute bound on a given accelerator, the arithmetic intensity must exceed the <code>ops:byte</code> ratio:</p>\n<span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mfrac><mrow><mn>2</mn><mi>B</mi><mi>D</mi><mi>F</mi></mrow><mrow><mn>2</mn><mi>B</mi><mi>D</mi><mo>+</mo><mn>2</mn><mi>D</mi><mi>F</mi><mo>+</mo><mn>2</mn><mi>B</mi><mi>F</mi></mrow></mfrac><mo>≥</mo><mfrac><mtext>Accelerator FLOPs/s</mtext><mtext>DRAM Bandwidth</mtext></mfrac></mrow><annotation encoding=\"application/x-tex\">\\frac{2BDF}{2BD + 2DF + 2BF} \\geq \\frac{\\text{Accelerator FLOPs/s}}{\\text{DRAM Bandwidth}} </annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:2.1297em;vertical-align:-0.7693em\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3603em\"><span style=\"top:-2.314em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord\">2</span><span class=\"mord mathnormal\" style=\"margin-right:0.05017em\">B</span><span class=\"mord mathnormal\" style=\"margin-right:0.02778em\">D</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mord\">2</span><span class=\"mord mathnormal\" style=\"margin-right:0.02778em\">D</span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em\">F</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mord\">2</span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em\">BF</span></span></span><span 
style=\"top:-3.23em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em\"></span></span><span style=\"top:-3.677em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord\">2</span><span class=\"mord mathnormal\" style=\"margin-right:0.05017em\">B</span><span class=\"mord mathnormal\" style=\"margin-right:0.02778em\">D</span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em\">F</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7693em\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mrel\">≥</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.113em;vertical-align:-0.686em\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.427em\"><span style=\"top:-2.314em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord text\"><span class=\"mord\">DRAM Bandwidth</span></span></span></span><span style=\"top:-3.23em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em\"></span></span><span style=\"top:-3.677em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord text\"><span class=\"mord\">Accelerator FLOPs/s</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.686em\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span></span>\n<p>As demonstrated in <em>How to scale your 
model</em><sup><a href=\"#user-content-fn-6\" id=\"user-content-fnref-6-2\" data-footnote-ref=\"true\" aria-describedby=\"footnote-label\">7</a></sup>, we can simplify the denominator\nto <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mn>2</mn><mi>D</mi><mi>F</mi></mrow><annotation encoding=\"application/x-tex\">2DF</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em\"></span><span class=\"mord\">2</span><span class=\"mord mathnormal\" style=\"margin-right:0.02778em\">D</span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em\">F</span></span></span></span>, as both <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>D</mi></mrow><annotation encoding=\"application/x-tex\">D</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.02778em\">D</span></span></span></span> and <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>F</mi></mrow><annotation encoding=\"application/x-tex\">F</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em\">F</span></span></span></span> are typically much larger than <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>B</mi></mrow><annotation encoding=\"application/x-tex\">B</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" 
style=\"height:0.6833em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.05017em\">B</span></span></span></span>. Using an Nvidia H100 as our reference, capable of 989 TFLOPs of BF16 compute with memory bandwidth of 3.35TB/s, we get the following:</p>\n<span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mtable rowspacing=\"0.25em\" columnalign=\"right\" columnspacing=\"\"><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mfrac><mrow><mn>2</mn><mi>B</mi><mi>D</mi><mi>F</mi></mrow><mrow><mn>2</mn><mi>B</mi><mi>D</mi><mo>+</mo><mn>2</mn><mi>D</mi><mi>F</mi><mo>+</mo><mi>B</mi><mi>F</mi></mrow></mfrac><mo>≊</mo><mfrac><mrow><mn>2</mn><mi>B</mi><mi>D</mi><mi>F</mi></mrow><mrow><mn>2</mn><mi>D</mi><mi>F</mi></mrow></mfrac><mo>≥</mo><mfrac><mtext>Accelerator FLOPs/s</mtext><mtext>DRAM Bandwidth</mtext></mfrac></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mo>=</mo><mfrac><mrow><mn>9.89</mn><mo>×</mo><msup><mn>10</mn><mn>14</mn></msup></mrow><mrow><mn>3.35</mn><mo>×</mo><msup><mn>10</mn><mn>12</mn></msup></mrow></mfrac><mtext>  </mtext><mo>⟹</mo><mtext>  </mtext><mi>B</mi><mo>≥</mo><mn>295</mn><mo>=</mo><msub><mi>B</mi><mtext>crit</mtext></msub></mrow></mstyle></mtd></mtr></mtable><annotation encoding=\"application/x-tex\">\\begin{align*} \n\\frac{2BDF}{2BD + 2DF + BF} \\approxeq \\frac{2BDF}{2DF} \\geq \\frac{\\text{Accelerator FLOPs/s}}{\\text{DRAM Bandwidth}} \\\\[0.9em] \n= \\frac{9.89 \\times 10^{14}}{3.35 \\times 10^{12}} \\implies B \\geq 295 = B_{\\text{crit}} \n\\end{align*}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:5.5474em;vertical-align:-2.5237em\"></span><span class=\"mord\"><span class=\"mtable\"><span class=\"col-align-r\"><span class=\"vlist-t vlist-t2\"><span 
class=\"vlist-r\"><span class=\"vlist\" style=\"height:3.0237em\"><span style=\"top:-5.0878em\"><span class=\"pstrut\" style=\"height:3.4911em\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3603em\"><span style=\"top:-2.314em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord\">2</span><span class=\"mord mathnormal\" style=\"margin-right:0.05017em\">B</span><span class=\"mord mathnormal\" style=\"margin-right:0.02778em\">D</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mord\">2</span><span class=\"mord mathnormal\" style=\"margin-right:0.02778em\">D</span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em\">F</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em\">BF</span></span></span><span style=\"top:-3.23em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em\"></span></span><span style=\"top:-3.677em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord\">2</span><span class=\"mord mathnormal\" style=\"margin-right:0.05017em\">B</span><span class=\"mord mathnormal\" style=\"margin-right:0.02778em\">D</span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em\">F</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7693em\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mspace\" 
style=\"margin-right:0.2778em\"></span><span class=\"mrel amsrm\">≊</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3603em\"><span style=\"top:-2.314em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord\">2</span><span class=\"mord mathnormal\" style=\"margin-right:0.02778em\">D</span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em\">F</span></span></span><span style=\"top:-3.23em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em\"></span></span><span style=\"top:-3.677em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord\">2</span><span class=\"mord mathnormal\" style=\"margin-right:0.05017em\">B</span><span class=\"mord mathnormal\" style=\"margin-right:0.02778em\">D</span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em\">F</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.686em\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mrel\">≥</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.427em\"><span style=\"top:-2.314em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord text\"><span class=\"mord\">DRAM Bandwidth</span></span></span></span><span style=\"top:-3.23em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"frac-line\" 
style=\"border-bottom-width:0.04em\"></span></span><span style=\"top:-3.677em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord text\"><span class=\"mord\">Accelerator FLOPs/s</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.686em\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span><span style=\"top:-2.0367em\"><span class=\"pstrut\" style=\"height:3.4911em\"></span><span class=\"mord\"><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.4911em\"><span style=\"top:-2.314em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord\">3.35</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mord\">1</span><span class=\"mord\"><span class=\"mord\">0</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7401em\"><span style=\"top:-2.989em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">12</span></span></span></span></span></span></span></span></span></span></span><span style=\"top:-3.23em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em\"></span></span><span style=\"top:-3.677em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord\">9.89</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span 
class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mord\">1</span><span class=\"mord\"><span class=\"mord\">0</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8141em\"><span style=\"top:-3.063em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">14</span></span></span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7693em\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mrel\">⟹</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.05017em\">B</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mrel\">≥</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mord\">295</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.05017em\">B</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3175em\"><span style=\"top:-2.55em;margin-left:-0.0502em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord text mtight\"><span class=\"mord 
mtight\">crit</span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:2.5237em\"><span></span></span></span></span></span></span></span></span></span></span></span>\n<p>This means our matmuls will only become compute bound when our <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>B</mi><mo>≥</mo><mn>295</mn></mrow><annotation encoding=\"application/x-tex\">B \\geq 295</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8193em;vertical-align:-0.136em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.05017em\">B</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mrel\">≥</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em\"></span><span class=\"mord\">295</span></span></span></span>. During\ninference, this will almost certainly occur during prefill (most system + user\nprompts exceed this (ignoring prompt caching)). 
However, during decoding it will\nbe quite a challenge to reach this batch size.</p>\n<p>One might think that quantizing our weights to <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>F</mi><mi>P</mi><mn>8</mn></mrow><annotation encoding=\"application/x-tex\">FP8</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em\">FP</span><span class=\"mord\">8</span></span></span></span> would cut our <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>B</mi><mrow><mi>c</mi><mi>r</mi><mi>i</mi><mi>t</mi></mrow></msub></mrow><annotation encoding=\"application/x-tex\">B_{crit}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.05017em\">B</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em\"><span style=\"top:-2.55em;margin-left:-0.0502em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.02778em\">cr</span><span class=\"mord mathnormal mtight\">i</span><span class=\"mord mathnormal mtight\">t</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span></span></span></span> in\nhalf, but the H100 can also perform 2x the FLOPs/s in FP8, so we are\nback where we started. 
If we wanted to lower our <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>B</mi><mrow><mi>c</mi><mi>r</mi><mi>i</mi><mi>t</mi></mrow></msub></mrow><annotation encoding=\"application/x-tex\">B_{crit}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.05017em\">B</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em\"><span style=\"top:-2.55em;margin-left:-0.0502em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.02778em\">cr</span><span class=\"mord mathnormal mtight\">i</span><span class=\"mord mathnormal mtight\">t</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span></span></span></span>, what hardware changes could we make? The obvious one is\nto simply increase our memory bandwidth by stumping up for more advanced High Bandwidth\nMemory (HBM). However, this has diminishing returns with respect to our true optimization objective of performance /\ndollar. Another approach could be to trade off compute for SRAM as discussed\nearlier. 
In our previous table we can see that <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>B</mi><mrow><mi>c</mi><mi>r</mi><mi>i</mi><mi>t</mi></mrow></msub><mo>=</mo><mn>32</mn></mrow><annotation encoding=\"application/x-tex\">B_{crit} = 32</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.05017em\">B</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em\"><span style=\"top:-2.55em;margin-left:-0.0502em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.02778em\">cr</span><span class=\"mord mathnormal mtight\">i</span><span class=\"mord mathnormal mtight\">t</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em\"></span><span class=\"mord\">32</span></span></span></span> for the H100,\nmeaning we can accommodate algorithms with much lower intensity whilst still\nsaturating our compute.</p>\n<h2>Attention</h2>\n<p><em>How to scale your model</em><sup><a href=\"#user-content-fn-6\" id=\"user-content-fnref-6-3\" data-footnote-ref=\"true\" aria-describedby=\"footnote-label\">7</a></sup> rigorously analyses attention to determine its Arithmetic Intensity. 
We summarize it here, then extend it with recent advancements and their implications for hardware.</p>\n<p>Using FlashAttention<sup><a href=\"#user-content-fn-8\" id=\"user-content-fnref-8\" data-footnote-ref=\"true\" aria-describedby=\"footnote-label\">9</a></sup>, the Arithmetic Intensity of a single head of (multi-headed) attention is as follows:</p>\n<span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mtext>Attention Intensity</mtext><mo>=</mo><mfrac><mrow><mn>4</mn><mi>B</mi><mi>S</mi><mi>T</mi><mi>D</mi></mrow><mrow><mn>4</mn><mi>B</mi><mi>S</mi><mi>D</mi><mo>+</mo><mn>4</mn><mi>B</mi><mi>T</mi><mi>D</mi></mrow></mfrac><mo>=</mo><mfrac><mrow><mi>S</mi><mi>T</mi></mrow><mrow><mi>S</mi><mo>+</mo><mi>T</mi></mrow></mfrac></mrow><annotation encoding=\"application/x-tex\">\\text{Attention Intensity} = \\frac{4BSTD}{4BSD + 4BTD} = \\frac{ST}{S+T}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8778em;vertical-align:-0.1944em\"></span><span class=\"mord text\"><span class=\"mord\">Attention Intensity</span></span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.1297em;vertical-align:-0.7693em\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3603em\"><span style=\"top:-2.314em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord\">4</span><span class=\"mord mathnormal\" style=\"margin-right:0.05764em\">BS</span><span class=\"mord mathnormal\" style=\"margin-right:0.02778em\">D</span><span class=\"mspace\" 
style=\"margin-right:0.2222em\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mord\">4</span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em\">BT</span><span class=\"mord mathnormal\" style=\"margin-right:0.02778em\">D</span></span></span><span style=\"top:-3.23em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em\"></span></span><span style=\"top:-3.677em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord\">4</span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em\">BST</span><span class=\"mord mathnormal\" style=\"margin-right:0.02778em\">D</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7693em\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.1297em;vertical-align:-0.7693em\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3603em\"><span style=\"top:-2.314em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.05764em\">S</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em\">T</span></span></span><span style=\"top:-3.23em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"frac-line\" 
style=\"border-bottom-width:0.04em\"></span></span><span style=\"top:-3.677em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em\">ST</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7693em\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span></span>\n<p>Where <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>S</mi></mrow><annotation encoding=\"application/x-tex\">S</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.05764em\">S</span></span></span></span> is the (existing) sequence length, <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>T</mi></mrow><annotation encoding=\"application/x-tex\">T</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em\">T</span></span></span></span> is the number of tokens we are\ncurrently sampling, <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>B</mi></mrow><annotation encoding=\"application/x-tex\">B</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.05017em\">B</span></span></span></span> is the batch size and <span class=\"katex\"><span class=\"katex-mathml\"><math 
xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>D</mi></mrow><annotation encoding=\"application/x-tex\">D</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.02778em\">D</span></span></span></span> is our <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>D</mi><mrow><mi>m</mi><mi>o</mi><mi>d</mi><mi>e</mi><mi>l</mi></mrow></msub></mrow><annotation encoding=\"application/x-tex\">D_{model}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02778em\">D</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em\"><span style=\"top:-2.55em;margin-left:-0.0278em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">m</span><span class=\"mord mathnormal mtight\">o</span><span class=\"mord mathnormal mtight\">d</span><span class=\"mord mathnormal mtight\">e</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em\">l</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span></span></span></span>. 
During prefill <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>S</mi><mo>=</mo><mi>T</mi></mrow><annotation encoding=\"application/x-tex\">S = T</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.05764em\">S</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em\">T</span></span></span></span>, so the expression simplifies to <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msup><mi>T</mi><mn>2</mn></msup><mi mathvariant=\"normal\">/</mi><mn>2</mn><mi>T</mi><mo>=</mo><mi>T</mi><mi mathvariant=\"normal\">/</mi><mn>2</mn></mrow><annotation encoding=\"application/x-tex\">T^2 /2T = T/2</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1.0641em;vertical-align:-0.25em\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em\">T</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8141em\"><span style=\"top:-3.063em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span></span></span></span></span><span class=\"mord\">/2</span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em\">T</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mrel\">=</span><span 
class=\"mspace\" style=\"margin-right:0.2778em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em\">T</span><span class=\"mord\">/2</span></span></span></span>. This is quite favourable for hardware utilization! For our H100, this means that we are compute bound if <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>T</mi><mo>≳</mo><mn>590</mn></mrow><annotation encoding=\"application/x-tex\">T \\gtrsim 590</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.9592em;vertical-align:-0.2296em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em\">T</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mrel amsrm\">≳</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em\"></span><span class=\"mord\">590</span></span></span></span> — which\nshould be very common during prefill (e.g. a standard system prompt). 
Decoding, however, is a different story.</p>\n<p>During decode <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>T</mi><mo>=</mo><mn>1</mn></mrow><annotation encoding=\"application/x-tex\">T = 1</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em\">T</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em\"></span><span class=\"mord\">1</span></span></span></span>, and <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>S</mi><mo>≫</mo><mi>T</mi></mrow><annotation encoding=\"application/x-tex\">S \\gg T</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7224em;vertical-align:-0.0391em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.05764em\">S</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mrel\">≫</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em\">T</span></span></span></span>, which allows us to approximate the original expression as:</p>\n<span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mfrac><mrow><mi>S</mi><mi>T</mi></mrow><mrow><mi>S</mi><mo>+</mo><mi>T</mi></mrow></mfrac><mo>≈</mo><mn>1</mn></mrow><annotation 
encoding=\"application/x-tex\">\\frac{ST}{S+T} \\approx 1</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:2.1297em;vertical-align:-0.7693em\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3603em\"><span style=\"top:-2.314em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.05764em\">S</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em\">T</span></span></span><span style=\"top:-3.23em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em\"></span></span><span style=\"top:-3.677em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em\">ST</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.7693em\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mrel\">≈</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em\"></span><span class=\"mord\">1</span></span></span></span></span>\n<p>Arithmetic intensity this low means <strong>decode is fundamentally memory bound</strong>. 
This should make intuitive sense: we load a (potentially large) KV cache but perform only a small number of operations on it, because the input is a single token (<span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mo fence=\"true\">[</mo><mn>1</mn><mo separator=\"true\">,</mo><msub><mi>D</mi><mrow><mi>m</mi><mi>o</mi><mi>d</mi><mi>e</mi><mi>l</mi></mrow></msub><mo fence=\"true\">]</mo></mrow><annotation encoding=\"application/x-tex\">\\left[1, D_{model}\\right]</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em\"></span><span class=\"minner\"><span class=\"mopen delimcenter\" style=\"top:0em\">[</span><span class=\"mord\">1</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02778em\">D</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em\"><span style=\"top:-2.55em;margin-left:-0.0278em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">m</span><span class=\"mord mathnormal mtight\">o</span><span class=\"mord mathnormal mtight\">d</span><span class=\"mord mathnormal mtight\">e</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.01968em\">l</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mclose delimcenter\" style=\"top:0em\">]</span></span></span></span></span>). Without some architectural modifications (e.g. 
MQA, GQA, MLA) to reduce the size of the cache, there isn&#x27;t much we can do to improve this situation.</p>\n<p>This makes decreasing the size of our KV cache one of our primary concerns,\nnot only for inference performance, but also for <em>capacity</em>. Earlier, our simplified model assumed that only weight tensors consume DRAM capacity. However, the size of our cache quickly becomes considerable. Let&#x27;s do some napkin math using DeepSeek V3<sup><a href=\"#user-content-fn-15\" id=\"user-content-fnref-15\" data-footnote-ref=\"true\" aria-describedby=\"footnote-label\">10</a></sup> 671B&#x27;s hyperparameters to demonstrate, where <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>n</mi><mtext>layers</mtext></msub><mo>=</mo><mn>61</mn></mrow><annotation encoding=\"application/x-tex\">n_{\\text{layers}} = 61</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7167em;vertical-align:-0.2861em\"></span><span class=\"mord\"><span class=\"mord mathnormal\">n</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord text mtight\"><span class=\"mord mtight\">layers</span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2861em\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em\"></span><span 
class=\"mord\">61</span></span></span></span> and <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>d</mi><mtext>model</mtext></msub><mo>=</mo><mn>7168</mn></mrow><annotation encoding=\"application/x-tex\">d_{\\text{model}} = 7168</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8444em;vertical-align:-0.15em\"></span><span class=\"mord\"><span class=\"mord mathnormal\">d</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord text mtight\"><span class=\"mord mtight\">model</span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em\"></span><span class=\"mord\">7168</span></span></span></span>. 
V3 didn&#x27;t use MHA, so we will use a plausible value of <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>d</mi><mtext>head</mtext></msub><mo>=</mo><mn>128</mn></mrow><annotation encoding=\"application/x-tex\">d_{\\text{head}} = 128</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8444em;vertical-align:-0.15em\"></span><span class=\"mord\"><span class=\"mord mathnormal\">d</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord text mtight\"><span class=\"mord mtight\">head</span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em\"></span><span class=\"mord\">128</span></span></span></span>.</p>\n<p>In the MHA case, we store both <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi mathvariant=\"bold\">k</mi><mo separator=\"true\">,</mo><mi mathvariant=\"bold\">v</mi><mo>∈</mo><msup><mi mathvariant=\"double-struck\">R</mi><msub><mi>d</mi><mtext>model</mtext></msub></msup></mrow><annotation encoding=\"application/x-tex\">\\mathbf{k}, \\mathbf{v} \\in \\mathbb{R}^{d_{\\text{model}}}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span 
class=\"base\"><span class=\"strut\" style=\"height:0.8889em;vertical-align:-0.1944em\"></span><span class=\"mord mathbf\">k</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span class=\"mord mathbf\" style=\"margin-right:0.01597em\">v</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mrel\">∈</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8491em\"></span><span class=\"mord\"><span class=\"mord mathbb\">R</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8491em\"><span style=\"top:-3.063em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">d</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3448em\"><span style=\"top:-2.3488em;margin-left:0em;margin-right:0.0714em\"><span class=\"pstrut\" style=\"height:2.5em\"></span><span class=\"sizing reset-size3 size1 mtight\"><span class=\"mord mtight\"><span class=\"mord text mtight\"><span class=\"mord mtight\">model</span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1512em\"><span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span> for each layer using 2 bytes per value, meaning our cache size is as follows:</p>\n<span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mtable rowspacing=\"0.25em\" columnalign=\"right left\" columnspacing=\"0em\"><mtr><mtd><mstyle scriptlevel=\"0\" 
displaystyle=\"true\"><mtext>KV Cache Size</mtext></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mrow></mrow><mo>=</mo><mn>2</mn><mo>⋅</mo><mn>2</mn><mo>⋅</mo><msub><mi>n</mi><mtext>layers</mtext></msub><mo>⋅</mo><msub><mi>n</mi><mtext>heads</mtext></msub><mo>⋅</mo><msub><mi>d</mi><mtext>head</mtext></msub></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mrow></mrow><mo>=</mo><mn>2</mn><mo>⋅</mo><mn>2</mn><mo>⋅</mo><mn>61</mn><mo>⋅</mo><mn>56</mn><mo>⋅</mo><mn>128</mn></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mrow></mrow><mo>=</mo><mn>1</mn><mo separator=\"true\">,</mo><mn>749</mn><mtext> KB/token</mtext></mrow></mstyle></mtd></mtr></mtable><annotation encoding=\"application/x-tex\">\\begin{aligned}\n\\text{KV Cache Size} &amp;= 2 \\cdot 2 \\cdot n_{\\text{layers}} \\cdot n_{\\text{heads}} \\cdot d_{\\text{head}} \\\\\n&amp;= 2 \\cdot 2 \\cdot 61 \\cdot 56 \\cdot 128  \\\\\n&amp;= 1,749 \\text{ KB/token}\n\\end{aligned}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:4.5em;vertical-align:-2em\"></span><span class=\"mord\"><span class=\"mtable\"><span class=\"col-align-r\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:2.5em\"><span style=\"top:-4.66em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord text\"><span class=\"mord\">KV Cache Size</span></span></span></span><span style=\"top:-3.16em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"></span></span><span style=\"top:-1.66em\"><span class=\"pstrut\" style=\"height:3em\"></span><span 
class=\"mord\"></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:2em\"><span></span></span></span></span></span><span class=\"col-align-l\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:2.5em\"><span style=\"top:-4.66em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord\"></span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mord\">2</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mord\">2</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mord\"><span class=\"mord mathnormal\">n</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord text mtight\"><span class=\"mord mtight\">layers</span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2861em\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mord\"><span class=\"mord mathnormal\">n</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em\"><span 
style=\"top:-2.55em;margin-left:0em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord text mtight\"><span class=\"mord mtight\">heads</span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mord\"><span class=\"mord mathnormal\">d</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord text mtight\"><span class=\"mord mtight\">head</span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span></span></span><span style=\"top:-3.16em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord\"></span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mord\">2</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mord\">2</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mord\">61</span><span class=\"mspace\" 
style=\"margin-right:0.2222em\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mord\">56</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mord\">128</span></span></span><span style=\"top:-1.66em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord\"></span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mord\">1</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span class=\"mord\">749</span><span class=\"mord text\"><span class=\"mord\"> KB/token</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:2em\"><span></span></span></span></span></span></span></span></span></span></span></span>\n<p>Given that an H100 has 80GB of DRAM, the cache alone will exhaust the entire storage capacity once it\nreaches ~46,000 tokens! This means naïve MHA fundamentally prohibits long context. We can see now why there\nhas been intense optimization pressure applied to reducing the size of the KV cache: it is key for both long-context conversations <strong>and</strong> inference performance.\nThese optimizations include Multi-Query Attention and Grouped-Query Attention, but both methods trade model quality for cache size.</p>\n<p>More recently, DeepSeek introduced Multi-head Latent Attention<sup><a href=\"#user-content-fn-16\" id=\"user-content-fnref-16\" data-footnote-ref=\"true\" aria-describedby=\"footnote-label\">11</a></sup> (MLA), with the intention of compressing the KV cache by performing a low-rank decomposition. 
This works well as data and computation are two sides of the same coin, and here we trade data for compute in the form of our up and down projection matrices.</p>\n<p>For MLA, we simply store a single compressed latent <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>d</mi><mtext>c</mtext></msub></mrow><annotation encoding=\"application/x-tex\">d_{\\text{c}}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8444em;vertical-align:-0.15em\"></span><span class=\"mord\"><span class=\"mord mathnormal\">d</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord text mtight\"><span class=\"mord mtight\">c</span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span></span></span></span> for both\n<span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi mathvariant=\"bold\">k</mi></mrow><annotation encoding=\"application/x-tex\">\\mathbf{k}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6944em\"></span><span class=\"mord mathbf\">k</span></span></span></span> and <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi mathvariant=\"bold\">v</mi></mrow><annotation encoding=\"application/x-tex\">\\mathbf{v}</annotation></semantics></math></span><span 
class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.4444em\"></span><span class=\"mord mathbf\" style=\"margin-right:0.01597em\">v</span></span></span></span>. Let&#x27;s redo our math:</p>\n<span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mtable rowspacing=\"0.25em\" columnalign=\"right left\" columnspacing=\"0em\"><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mtext>MLA KV Cache Size</mtext></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mrow></mrow><mo>=</mo><mn>2</mn><mo>⋅</mo><msub><mi>n</mi><mtext>layers</mtext></msub><mo>⋅</mo><msub><mi>d</mi><mtext>c</mtext></msub></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mrow></mrow><mo>=</mo><mn>2</mn><mo>⋅</mo><mn>61</mn><mo>⋅</mo><mn>512</mn></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mrow></mrow><mo>=</mo><mn>62.5</mn><mtext> KB/token</mtext></mrow></mstyle></mtd></mtr></mtable><annotation encoding=\"application/x-tex\">\\begin{aligned}\n\\text{MLA KV Cache Size} &amp;= 2 \\cdot n_{\\text{layers}} \\cdot d_{\\text{c}} \\\\\n&amp;= 2 \\cdot 61 \\cdot 512 \\\\\n&amp;= 62.5 \\text{ KB/token}\n\\end{aligned}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:4.5em;vertical-align:-2em\"></span><span class=\"mord\"><span class=\"mtable\"><span class=\"col-align-r\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:2.5em\"><span style=\"top:-4.66em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span 
class=\"mord text\"><span class=\"mord\">MLA KV Cache Size</span></span></span></span><span style=\"top:-3.16em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"></span></span><span style=\"top:-1.66em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:2em\"><span></span></span></span></span></span><span class=\"col-align-l\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:2.5em\"><span style=\"top:-4.66em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord\"></span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mord\">2</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mord\"><span class=\"mord mathnormal\">n</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord text mtight\"><span class=\"mord mtight\">layers</span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2861em\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mord\"><span class=\"mord mathnormal\">d</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span 
class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord text mtight\"><span class=\"mord mtight\">c</span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span></span></span><span style=\"top:-3.16em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord\"></span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mord\">2</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mord\">61</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mord\">512</span></span></span><span style=\"top:-1.66em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord\"></span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mord\">62.5</span><span class=\"mord text\"><span class=\"mord\"> KB/token</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:2em\"><span></span></span></span></span></span></span></span></span></span></span></span>\n<p>This means our KV cache is <strong>28x</strong> smaller using MLA. Our context would need to be ~1.3M (4x the length of Ulysses) tokens before\nexceeding our H100 capacity. 
A difference of more than an order of magnitude has implications\nfor hardware. Using MHA, we could only store 17 tokens&#x27; worth of cache in SRAM on the H100, whereas with MLA we can store <strong>492</strong> tokens&#x27; worth of cache. This means we can\nstore entire pages of attention on-chip when using something like PagedAttention<sup><a href=\"#user-content-fn-9\" id=\"user-content-fnref-9\" data-footnote-ref=\"true\" aria-describedby=\"footnote-label\">12</a></sup>. As with reducing the precision of our data types, there is a lower limit to this compression. However, I wouldn&#x27;t be surprised to see inter-layer projections compressing the total cache size further.</p>\n<p>Despite recent optimizations, we can clearly see that no single accelerator will be able\nto accommodate a frontier model. Therefore, we must spill over from one accelerator to many.</p>\n<blockquote>\n<p><strong>Principle 5: For a single accelerator, turn the memory bandwidth up to 11.</strong></p>\n</blockquote>\n<h1>Scaling out</h1>\n<p>All models being served by the frontier labs far exceed the memory capacity of a single accelerator.\nThis necessitates <em>sharding</em> our model across multiple accelerators and\ncommunicating between them. Adding more chips creates a tradeoff: while it distributes both the compute and memory burden, it also increases communication overhead between chips, reducing the amount of computation available per chip to hide the communication cost. At a certain point, adding chips no longer increases performance.</p>\n<p>Given that we want to use computation to hide communication cost, the ratio\nof the number of operations we can perform to the number of bytes we can send\nbetween accelerators is important. 
This is known as the <code>ops:comms</code> ratio.</p>\n<p>Below is a table of the <code>ops:comms</code> ratio for the most popular accelerators\ntoday:</p>\n<div id=\"table-container\"><table><thead><tr><th>Hardware</th><th>FP16 TFLOP/s</th><th>Interconnect</th><th>Bidirectional Bandwidth (GB/s)</th><th>Ops:Comms Ratio (FLOPs/byte)</th></tr></thead><tbody><tr><td>NVIDIA H100-SXM5</td><td>989</td><td>NVLink 4.0</td><td>900</td><td>1099</td></tr><tr><td>NVIDIA A100</td><td>312</td><td>NVLink 3.0</td><td>600</td><td>520</td></tr><tr><td>Tenstorrent Blackhole</td><td>387</td><td>Ethernet</td><td>800</td><td>484</td></tr><tr><td>Google TPU v5p</td><td>459</td><td>ICI</td><td>600</td><td>765</td></tr></tbody></table></div>\n<p>Different sharding strategies incur different communication costs. Determining the point at which an accelerator becomes communication bound for a\ngiven strategy is <a href=\"https://jax-ml.github.io/scaling-book/inference/#distributing-inference-over-multiple-accelerators\" rel=\"nofollow\" target=\"_blank\">quite involved</a>. Luckily for us, <em>How to scale your model</em><sup><a href=\"#user-content-fn-6\" id=\"user-content-fnref-6-4\" data-footnote-ref=\"true\" aria-describedby=\"footnote-label\">7</a></sup> derives useful heuristics that we can use to estimate when modern transformers become\ncommunication bound, and to inform our hardware design.</p>\n<p>We will restrict our analysis to the decoding phase for brevity. During decode we can only use <a href=\"https://huggingface.co/docs/transformers/v4.49.0/en/perf_train_gpu_many#tensor-parallelism\" rel=\"nofollow\" target=\"_blank\">Model Parallelism</a> for sharding our model. At what\npoint do we become bottlenecked by communication? To simplify the problem, we\nreduce our forward pass to a stack of <a href=\"https://en.wikipedia.org/wiki/Multilayer_perceptron\" rel=\"nofollow\" target=\"_blank\">MLPs</a>. 
We consider 1D Model Parallelism for an MLP layer where we perform an <strong>AllGather</strong> on the input activations and a <strong>ReduceScatter</strong> on\nthe output activations as shown <a href=\"https://jax-ml.github.io/scaling-book/training/#tensor-parallelism\" rel=\"nofollow\" target=\"_blank\">here</a>. We want to understand at what point <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>T</mi><mrow><mi>m</mi><mi>a</mi><mi>t</mi><mi>h</mi></mrow></msub><mo>≥</mo><msub><mi>T</mi><mrow><mi>c</mi><mi>o</mi><mi>m</mi><mi>m</mi><mi>s</mi></mrow></msub></mrow><annotation encoding=\"application/x-tex\">T_{math} \\ge T_{comms}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em\">T</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">ma</span><span class=\"mord mathnormal mtight\">t</span><span class=\"mord mathnormal mtight\">h</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mrel\">≥</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em\"></span><span class=\"mord\"><span class=\"mord mathnormal\" 
style=\"margin-right:0.13889em\">T</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">co</span><span class=\"mord mathnormal mtight\">mm</span><span class=\"mord mathnormal mtight\">s</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span></span></span></span>, for our MLP performing\nthe following:</p>\n<span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><msub><mtext>A</mtext><mtext>in</mtext></msub><mrow><mo fence=\"true\">[</mo><mi>B</mi><mo separator=\"true\">,</mo><msub><mi>D</mi><mi>s</mi></msub><mo fence=\"true\">]</mo></mrow><mo>⋅</mo><msub><mtext>W</mtext><mtext>in</mtext></msub><mrow><mo fence=\"true\">[</mo><mi>D</mi><mo separator=\"true\">,</mo><msub><mi>F</mi><mi>s</mi></msub><mo fence=\"true\">]</mo></mrow><mo>→</mo><mtext>tmp</mtext><mrow><mo fence=\"true\">[</mo><mi>B</mi><mo separator=\"true\">,</mo><msub><mi>F</mi><mi>s</mi></msub><mo fence=\"true\">]</mo></mrow></mrow><annotation encoding=\"application/x-tex\">\\text{A}_{\\text{in}}\\left[B, D_{s}\\right] \\cdot \\text{W}_{\\text{in}}\\left[D, F_{s}\\right] \\rightarrow \\text{tmp}\\left[B, F_{s}\\right]</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em\"></span><span class=\"mord\"><span class=\"mord text\"><span class=\"mord\">A</span></span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span 
class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3175em\"><span style=\"top:-2.55em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord text mtight\"><span class=\"mord mtight\">in</span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span class=\"minner\"><span class=\"mopen delimcenter\" style=\"top:0em\">[</span><span class=\"mord mathnormal\" style=\"margin-right:0.05017em\">B</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02778em\">D</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em\"><span style=\"top:-2.55em;margin-left:-0.0278em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">s</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mclose delimcenter\" style=\"top:0em\">]</span></span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em\"></span><span class=\"mord\"><span class=\"mord text\"><span class=\"mord\">W</span></span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:0.3175em\"><span style=\"top:-2.55em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord text mtight\"><span class=\"mord mtight\">in</span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span class=\"minner\"><span class=\"mopen delimcenter\" style=\"top:0em\">[</span><span class=\"mord mathnormal\" style=\"margin-right:0.02778em\">D</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em\">F</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">s</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mclose delimcenter\" style=\"top:0em\">]</span></span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mrel\">→</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em\"></span><span class=\"mord text\"><span class=\"mord\">tmp</span></span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span class=\"minner\"><span class=\"mopen delimcenter\" style=\"top:0em\">[</span><span class=\"mord mathnormal\" 
style=\"margin-right:0.05017em\">B</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em\">F</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">s</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mclose delimcenter\" style=\"top:0em\">]</span></span></span></span></span></span>\n<span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mtext>tmp</mtext><mrow><mo fence=\"true\">[</mo><mi>B</mi><mo separator=\"true\">,</mo><msub><mi>F</mi><mi>s</mi></msub><mo fence=\"true\">]</mo></mrow><mo>⋅</mo><msub><mtext>W</mtext><mtext>out</mtext></msub><mrow><mo fence=\"true\">[</mo><msub><mi>F</mi><mi>s</mi></msub><mo separator=\"true\">,</mo><mi>D</mi><mo fence=\"true\">]</mo></mrow><mo>→</mo><msub><mtext>A</mtext><mtext>out</mtext></msub><mrow><mo fence=\"true\">[</mo><mi>B</mi><mo separator=\"true\">,</mo><msub><mi>D</mi><mi>s</mi></msub><mo fence=\"true\">]</mo></mrow></mrow><annotation encoding=\"application/x-tex\">\\text{tmp}\\left[B, F_{s}\\right] \\cdot \\text{W}_{\\text{out}}\\left[F_{s}, D\\right] \\rightarrow \\text{A}_{\\text{out}}\\left[B, D_{s}\\right]</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em\"></span><span class=\"mord 
text\"><span class=\"mord\">tmp</span></span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span class=\"minner\"><span class=\"mopen delimcenter\" style=\"top:0em\">[</span><span class=\"mord mathnormal\" style=\"margin-right:0.05017em\">B</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em\">F</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">s</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mclose delimcenter\" style=\"top:0em\">]</span></span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em\"></span><span class=\"mord\"><span class=\"mord text\"><span class=\"mord\">W</span></span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2806em\"><span style=\"top:-2.55em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord text mtight\"><span class=\"mord mtight\">out</span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span 
class=\"mspace\" style=\"margin-right:0.1667em\"></span><span class=\"minner\"><span class=\"mopen delimcenter\" style=\"top:0em\">[</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em\">F</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">s</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.02778em\">D</span><span class=\"mclose delimcenter\" style=\"top:0em\">]</span></span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mrel\">→</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em\"></span><span class=\"mord\"><span class=\"mord text\"><span class=\"mord\">A</span></span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2806em\"><span style=\"top:-2.55em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord text mtight\"><span class=\"mord mtight\">out</span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mspace\" 
style=\"margin-right:0.1667em\"></span><span class=\"minner\"><span class=\"mopen delimcenter\" style=\"top:0em\">[</span><span class=\"mord mathnormal\" style=\"margin-right:0.05017em\">B</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02778em\">D</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em\"><span style=\"top:-2.55em;margin-left:-0.0278em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">s</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mclose delimcenter\" style=\"top:0em\">]</span></span></span></span></span></span>\n<p>Where <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>B</mi></mrow><annotation encoding=\"application/x-tex\">B</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.05017em\">B</span></span></span></span> is our batch dimension, <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>D</mi></mrow><annotation encoding=\"application/x-tex\">D</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.02778em\">D</span></span></span></span> is our model dimension, <span 
class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>F</mi></mrow><annotation encoding=\"application/x-tex\">F</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em\">F</span></span></span></span> is our FFN\ndimension and <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>s</mi></mrow><annotation encoding=\"application/x-tex\">s</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.4306em\"></span><span class=\"mord mathnormal\">s</span></span></span></span> denotes sharding.</p>\n<p>Given that we perform 2 matmuls, each costing <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mn>2</mn><mi>B</mi><mi>D</mi><mi>F</mi></mrow><annotation encoding=\"application/x-tex\">2BDF</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em\"></span><span class=\"mord\">2</span><span class=\"mord mathnormal\" style=\"margin-right:0.05017em\">B</span><span class=\"mord mathnormal\" style=\"margin-right:0.02778em\">D</span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em\">F</span></span></span></span> FLOPs, and we\ncommunicate 2 matrices (at 2 bytes per element, assuming bf16) of size <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mo fence=\"true\">[</mo><mi>B</mi><mo separator=\"true\">,</mo><mi>D</mi><mo fence=\"true\">]</mo></mrow><annotation encoding=\"application/x-tex\">\\left[B, D\\right]</annotation></semantics></math></span><span class=\"katex-html\" 
aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em\"></span><span class=\"minner\"><span class=\"mopen delimcenter\" style=\"top:0em\">[</span><span class=\"mord mathnormal\" style=\"margin-right:0.05017em\">B</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.02778em\">D</span><span class=\"mclose delimcenter\" style=\"top:0em\">]</span></span></span></span></span>, our <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>T</mi><mtext>math</mtext></msub></mrow><annotation encoding=\"application/x-tex\">T_{\\text{math}}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em\">T</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord text mtight\"><span class=\"mord mtight\">math</span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span></span></span></span> and <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>T</mi><mtext>comms</mtext></msub></mrow><annotation encoding=\"application/x-tex\">T_{\\text{comms}}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span 
class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em\">T</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord text mtight\"><span class=\"mord mtight\">comms</span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span></span></span></span> are as follows:</p>\n<span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mtable rowspacing=\"0.25em\" columnalign=\"right left right\" columnspacing=\"0em 1em\"><mtr><mtd class=\"mtr-glue\"></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><msub><mi>T</mi><mtext>math</mtext></msub><mo>=</mo><mfrac><mrow><mn>4</mn><mo>⋅</mo><mi>B</mi><mo>⋅</mo><mi>D</mi><mo>⋅</mo><mi>F</mi></mrow><mrow><mi>Y</mi><mo>⋅</mo><mi>C</mi></mrow></mfrac></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><msub><mi>T</mi><mtext>comms</mtext></msub><mo>=</mo><mfrac><mrow><mn>4</mn><mo>⋅</mo><mi>B</mi><mo>⋅</mo><mi>D</mi></mrow><mi>W</mi></mfrac></mrow></mstyle></mtd><mtd class=\"mtr-glue\"></mtd><mtd class=\"mml-eqn-num\"></mtd></mtr></mtable><annotation encoding=\"application/x-tex\">\\begin{align} \nT_{\\text{math}} = \\frac{4 \\cdot B \\cdot D \\cdot F}{Y \\cdot C} &amp;&amp; T_{\\text{comms}} = \\frac{4 \\cdot B \\cdot D}{W}\n\\end{align}</annotation></semantics></math></span><span 
class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:2.3463em;vertical-align:-0.9232em\"></span><span class=\"mtable\"><span class=\"col-align-r\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.4232em\"><span style=\"top:-3.4232em\"><span class=\"pstrut\" style=\"height:3.3603em\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em\">T</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord text mtight\"><span class=\"mord mtight\">math</span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3603em\"><span style=\"top:-2.314em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.22222em\">Y</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.07153em\">C</span></span></span><span style=\"top:-3.23em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"frac-line\" 
style=\"border-bottom-width:0.04em\"></span></span><span style=\"top:-3.677em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord\">4</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.05017em\">B</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.02778em\">D</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em\">F</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.686em\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9232em\"><span></span></span></span></span></span><span class=\"col-align-l\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.4232em\"><span style=\"top:-3.4232em\"><span class=\"pstrut\" style=\"height:3.3603em\"></span><span class=\"mord\"><span class=\"mord\"></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9232em\"><span></span></span></span></span></span><span class=\"arraycolsep\" style=\"width:1em\"></span><span class=\"col-align-r\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.4232em\"><span style=\"top:-3.4232em\"><span class=\"pstrut\" style=\"height:3.3603em\"></span><span class=\"mord\"><span 
class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em\">T</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord text mtight\"><span class=\"mord mtight\">comms</span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3603em\"><span style=\"top:-2.314em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em\">W</span></span></span><span style=\"top:-3.23em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em\"></span></span><span style=\"top:-3.677em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord\">4</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.05017em\">B</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.02778em\">D</span></span></span></span><span 
class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.686em\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9232em\"><span></span></span></span></span></span></span></span><span class=\"tag\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.4232em\"><span style=\"top:-3.4232em\"><span class=\"pstrut\" style=\"height:3.3603em\"></span><span class=\"eqn-num\"></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9232em\"><span></span></span></span></span></span></span></span></span>\n<p>Where <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>Y</mi></mrow><annotation encoding=\"application/x-tex\">Y</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.22222em\">Y</span></span></span></span> is the number of accelerators, <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>C</mi></mrow><annotation encoding=\"application/x-tex\">C</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.07153em\">C</span></span></span></span> is the per-accelerator FLOPs/s and <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>W</mi></mrow><annotation encoding=\"application/x-tex\">W</annotation></semantics></math></span><span 
class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em\">W</span></span></span></span> is our band<strong>w</strong>idth. As we want to know when <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>T</mi><mrow><mi>m</mi><mi>a</mi><mi>t</mi><mi>h</mi></mrow></msub><mo>≥</mo><msub><mi>T</mi><mrow><mi>c</mi><mi>o</mi><mi>m</mi><mi>m</mi><mi>s</mi></mrow></msub></mrow><annotation encoding=\"application/x-tex\">T_{math} \\ge T_{comms}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em\">T</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">ma</span><span class=\"mord mathnormal mtight\">t</span><span class=\"mord mathnormal mtight\">h</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mrel\">≥</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em\">T</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span 
class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">co</span><span class=\"mord mathnormal mtight\">mm</span><span class=\"mord mathnormal mtight\">s</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span></span></span></span>, this simplifies down to:</p>\n<span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mtable rowspacing=\"0.25em\" columnalign=\"right\" columnspacing=\"\"><mtr><mtd class=\"mtr-glue\"></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mi>F</mi><mo>≥</mo><mi>Y</mi><mo>⋅</mo><mfrac><mi>C</mi><mi>W</mi></mfrac></mrow></mstyle></mtd><mtd class=\"mtr-glue\"></mtd><mtd class=\"mml-eqn-num\"></mtd></mtr></mtable><annotation encoding=\"application/x-tex\">\\begin{align} F \\ge Y \\cdot \\frac{C}{W} \\end{align}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:2.3463em;vertical-align:-0.9232em\"></span><span class=\"mtable\"><span class=\"col-align-r\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.4232em\"><span style=\"top:-3.4232em\"><span class=\"pstrut\" style=\"height:3.3603em\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em\">F</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mrel\">≥</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mord mathnormal\" 
style=\"margin-right:0.22222em\">Y</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3603em\"><span style=\"top:-2.314em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em\">W</span></span></span><span style=\"top:-3.23em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em\"></span></span><span style=\"top:-3.677em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.07153em\">C</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.686em\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9232em\"><span></span></span></span></span></span></span></span><span class=\"tag\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.4232em\"><span style=\"top:-3.4232em\"><span class=\"pstrut\" style=\"height:3.3603em\"></span><span class=\"eqn-num\"></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9232em\"><span></span></span></span></span></span></span></span></span>\n<p>Astute readers will notice that <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mfrac><mi>C</mi><mi>W</mi></mfrac></mrow><annotation 
encoding=\"application/x-tex\">\\frac{C}{W}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1.2173em;vertical-align:-0.345em\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8723em\"><span style=\"top:-2.655em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.13889em\">W</span></span></span></span><span style=\"top:-3.23em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em\"></span></span><span style=\"top:-3.394em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.07153em\">C</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.345em\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span> is our <code>ops:comms</code> ratio! This formula is quite handy. 
Using <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>F</mi><mo>=</mo><mn>18432</mn></mrow><annotation encoding=\"application/x-tex\">F = 18432</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em\">F</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em\"></span><span class=\"mord\">18432</span></span></span></span> <a href=\"https://github.com/deepseek-ai/DeepSeek-V3/blob/592fd5daf8177b205af11651bbb31a1834a8b0e0/inference/configs/config_671B.json#L4\" rel=\"nofollow\" target=\"_blank\">from DeepSeek V3</a>, we can see that we become communication-bound on an H100 at ~16 GPUs. If we had a more\nfavourable <code>ops:comms</code> ratio, we could scale our tensor parallelism further. This analysis neglects latency, which can dominate communication time when the volume of data transferred is low; please refer to <em>Transformer Inference Arithmetic</em><sup><a href=\"#user-content-fn-7\" id=\"user-content-fnref-7-2\" data-footnote-ref=\"true\" aria-describedby=\"footnote-label\">8</a></sup> for a more thorough\ntreatment.</p>\n<p>We will now briefly cover Expert Parallelism (EP), as Mixture of Experts (MoE)\nmodels are the standard for frontier models today and EP has implications\nfor hardware design. In Expert Parallelism, we dedicate an accelerator to hosting\none or more experts (a specialized FFN). When processing a batch of tokens, we\nmust send each of the tokens to a different subset of experts, depending on the\nresult of our routing mechanism. The results from our subset of activated experts\nmust then be aggregated. 
This warrants an <strong>AllToAll</strong> communication\n&quot;primitive&quot;:</p>\n<span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><msub><mtext mathvariant=\"bold\">AllToAll</mtext><mrow><mi>X</mi><mo separator=\"true\">,</mo><mi>J</mi></mrow></msub><mi>A</mi><mo stretchy=\"false\">[</mo><msub><mi>I</mi><mi>X</mi></msub><mo separator=\"true\">,</mo><mi>J</mi><mo stretchy=\"false\">]</mo><mo>→</mo><mi>A</mi><mo stretchy=\"false\">[</mo><mi>I</mi><mo separator=\"true\">,</mo><msub><mi>J</mi><mi>X</mi></msub><mo stretchy=\"false\">]</mo></mrow><annotation encoding=\"application/x-tex\">\\textbf{AllToAll}_{X, J} A[I_X, J] \\rightarrow A[I, J_X]</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1.0361em;vertical-align:-0.2861em\"></span><span class=\"mord\"><span class=\"mord text\"><span class=\"mord textbf\">AllToAll</span></span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em\"><span style=\"top:-2.55em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.07847em\">X</span><span class=\"mpunct mtight\">,</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.09618em\">J</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2861em\"><span></span></span></span></span></span></span><span class=\"mord mathnormal\">A</span><span class=\"mopen\">[</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.07847em\">I</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span 
class=\"vlist\" style=\"height:0.3283em\"><span style=\"top:-2.55em;margin-left:-0.0785em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.07847em\">X</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.09618em\">J</span><span class=\"mclose\">]</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mrel\">→</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em\"></span><span class=\"mord mathnormal\">A</span><span class=\"mopen\">[</span><span class=\"mord mathnormal\" style=\"margin-right:0.07847em\">I</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.09618em\">J</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em\"><span style=\"top:-2.55em;margin-left:-0.0962em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.07847em\">X</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mclose\">]</span></span></span></span></span>\n<p><strong>AllToAll</strong> is also known as a <em>resharding</em> operation, as we have a batch\nof tokens sharded in the 
batch dimension (our batch entries are entirely\nindependent which is good for sharding), and we <em>reshard</em> them based on the result of our routing mechanism. You can imagine this as matrices flying all over to different accelerators, which necessitates a low latency, high throughput communication fabric.\nGiven that we are performing the same communication primitives repeatedly, we should explore specialized hardware subsystems to optimize their performance.</p>\n<p>The <a href=\"https://arxiv.org/pdf/2412.19437v1\" rel=\"nofollow\" target=\"_blank\">Deepseek V3 paper</a> highlights the need\nfor hardware optimizations for communications. In lieu of vendor provided hardware optimizations, Deepseek &quot;specialized&quot; 20/132 SMs specifically for communications. This includes performing reduce operations for the MOE <strong>AllToAll</strong> aggregation, managing memory layout of tensors being transferred between experts etc. These SMs are designed to do computation, not network management, and by specializing them for communications, not matrix multiplication, 15% of all Tensor Cores are wasted.</p>\n<p>We should expect any performant domain specific architecture to have dedicated hardware for managing communications. It would be ideal if during data transfers the data could be transformed (e.g. arriving at the destination reduced, transposed etc). 
Letting Remote Direct Memory Access (RDMA) perform some primitive operations would be suitably biomimetic: axons in the brain aren&#x27;t just passive transmission channels, but perform &quot;diverse functional operations&quot; <sup><a href=\"#user-content-fn-10\" id=\"user-content-fnref-10\" data-footnote-ref=\"true\" aria-describedby=\"footnote-label\">13</a></sup> beyond mere signal propagation.</p>\n<blockquote>\n<p><strong>Principle 6: Design for scale-out from day 1.</strong> <br/>\n<strong>Principle 7: Dedicated communication hardware should complement compute hardware.</strong></p>\n</blockquote>\n<h1>Implications of test time compute scaling</h1>\n<p>By now we have analysed today&#x27;s state of the art. Unfortunately, the paradigm does not stand still whilst we are\ndesigning hardware. Recent advancements in test time compute scaling have been called &quot;the next\nscaling law&quot;, as demonstrated by the <a href=\"https://openai.com/index/learning-to-reason-with-llms/\" rel=\"nofollow\" target=\"_blank\">OpenAI O series</a>. There are 2 dimensions\nin which we can scale inference compute: <em>serially</em> and <em>in parallel</em>.</p>\n<p>Scaling inference compute <em>serially</em> manifests as extended multi-step reasoning chains, analogous to Kahneman&#x27;s <a href=\"https://en.wikipedia.org/wiki/Thinking,_Fast_and_Slow\" rel=\"nofollow\" target=\"_blank\">System 2</a> thinking.\nThis is opposed to saying the first thing off the top of your head (System 1). 
In\ncontrast, scaling inference computation in <em>parallel</em> is like having <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>n</mi></mrow><annotation encoding=\"application/x-tex\">n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.4306em\"></span><span class=\"mord mathnormal\">n</span></span></span></span> clones\napproach the same problem, with each potentially landing at different places in the solution\nspace.</p>\n<p>Scaling in the serial dimension (massively) increases demand for inference, in\nparticular the decoding phase. As we determined earlier, attention during decode is inherently\nmemory bound (MLA notwithstanding) without high batch sizes. This may contribute to something like &quot;economies\nof scale&quot; for providers that can hit high numbers of concurrent users.</p>\n<p>Scaling in parallel is more nuanced. 
As we stated earlier, we can only reach high utilization of our\nhardware during decode when our batch size is high (<span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>B</mi><mo>≳</mo><mn>295</mn></mrow><annotation encoding=\"application/x-tex\">B \\gtrsim 295</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.9592em;vertical-align:-0.2296em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.05017em\">B</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mrel amsrm\">≳</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em\"></span><span class=\"mord\">295</span></span></span></span>), which is not always easy to\nachieve even in a serving setting using Continuous Batching<sup><a href=\"#user-content-fn-18\" id=\"user-content-fnref-18\" data-footnote-ref=\"true\" aria-describedby=\"footnote-label\">14</a></sup>. To reiterate: given that we <strong>must pay the fixed cost of moving the weights from\nmemory</strong>, low batch sizes have low Arithmetic Intensity and therefore do not\nutilize our hardware effectively. Now that scaling in parallel has been shown to\nincrease response quality, we can run our serving infrastructure and achieve\nhigher utilization by simply padding our batches with more parallel chains of thought. The\nusers get an improved experience that can be delivered for free.</p>\n<p>I expect research into the optimal trade off between these 2 scaling dimensions to be a\nhot topic this year, resulting in &quot;token optimal inference time scaling&quot;. 
This\nclosely mirrors the <a href=\"https://en.wikipedia.org/wiki/Exploration%E2%80%93exploitation_dilemma\" rel=\"nofollow\" target=\"_blank\">exploration-exploitation dilemma</a>.</p>\n<blockquote>\n<h2>Summary</h2>\n<p>Using multiple different approaches, we&#x27;ve derived a (non-exhaustive) list of design decisions\nthat should hold for any AI inference accelerator, namely:</p>\n<ol>\n<li>Hardware support for low precision data types</li>\n<li>Design for asynchronous transfers from day 1</li>\n<li>Dedicated hardware for tensor aware memory transfers</li>\n<li>Replace your cache hierarchy with an outsized scratchpad</li>\n<li>For a single accelerator, turn the memory bandwidth up to 11</li>\n<li>Design for scale-out from day 1</li>\n<li>Dedicated communication hardware should complement compute hardware</li>\n</ol>\n</blockquote>\n<h1>Domain Specific Architectures</h1>\n<p>With an understanding of the inference workload, we can now explore some domain\nspecific architectures and determine if they adhere to the design principles we derived.</p>\n<h2>Google TPU</h2>\n<video src=\"/npu/tpu.mp4\" caption=\"Source: How to scale your model\" captionLink=\"https://jax-ml.github.io/scaling-book/tpus/#what-is-a-tpu\"></video>\n<p>The Google TPU is one of the most successful DSAs in\nthe history of computing. It would be derivative to explore the architecture in\ndetail, particularly as Google has been open about the design across the generations<sup><a href=\"#user-content-fn-11\" id=\"user-content-fnref-11\" data-footnote-ref=\"true\" aria-describedby=\"footnote-label\">15</a></sup><sup>, </sup><sup><a href=\"#user-content-fn-12\" id=\"user-content-fnref-12\" data-footnote-ref=\"true\" aria-describedby=\"footnote-label\">16</a></sup>. 
In brief, the TPU adheres to all of the\nprinciples we have previously outlined, which is why it is currently the leading\naccelerator on a <a href=\"https://semianalysis.com/2023/09/01/tpuv5e-the-new-benchmark-in-cost/\" rel=\"nofollow\" target=\"_blank\">performance / dollar basis</a>, our one true optimization objective.</p>\n<p>Instead, we will focus on one of the most interesting components of the TPU: the <a href=\"https://en.wikipedia.org/wiki/Systolic_array\" rel=\"nofollow\" target=\"_blank\">systolic array</a>. A systolic array is a special purpose compute unit designed to accelerate certain regular computations, popularized by H. T. Kung<sup><a href=\"#user-content-fn-13\" id=\"user-content-fnref-13\" data-footnote-ref=\"true\" aria-describedby=\"footnote-label\">17</a></sup>, who <a href=\"https://en.wikipedia.org/wiki/Colossus_computer#cite_note-64\" rel=\"nofollow\" target=\"_blank\">rediscovered</a> systolic arrays in 1982 (<a href=\"https://gwern.net/timing#go-to-the-ant-thou-sluggard\" rel=\"nofollow\" target=\"_blank\">It’s important to be the last person to discover something</a>). Its ability to accelerate matrix multiplications makes it a popular choice for AI inference accelerators.</p>\n<p>The TPUv2 onwards features one or more 128x128 weight stationary systolic arrays. Systolic arrays take their fitting name from their similarity to the human circulatory system. 
Data (blood) passes through many processing elements (cells) before returning to memory (heart).</p>\n<div></div>\n<p>The above animation shows a 4x4 weight stationary systolic array, a scaled down\nversion of the unit in the TPU, performing a matrix multiplication (<span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>Y</mi><mo>=</mo><mi>W</mi><mi>X</mi></mrow><annotation encoding=\"application/x-tex\">Y=WX</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.22222em\">Y</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em\">W</span><span class=\"mord mathnormal\" style=\"margin-right:0.07847em\">X</span></span></span></span>). Weights are preloaded into the processing\nelements, and the output is accumulated from left to right.</p>\n<p>The principal benefit of these units is <strong>minimizing data movement to the theoretical limit</strong>. 
A naïve square matrix multiplication performs <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msup><mi>n</mi><mn>3</mn></msup></mrow><annotation encoding=\"application/x-tex\">n^3</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8141em\"></span><span class=\"mord\"><span class=\"mord mathnormal\">n</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8141em\"><span style=\"top:-3.063em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">3</span></span></span></span></span></span></span></span></span></span></span> memory reads, and has a space complexity of <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mn>3</mn><msup><mi>n</mi><mn>2</mn></msup></mrow><annotation encoding=\"application/x-tex\">3n^2</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8141em\"></span><span class=\"mord\">3</span><span class=\"mord\"><span class=\"mord mathnormal\">n</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8141em\"><span style=\"top:-3.063em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span></span></span></span></span></span></span></span>. 
This can be improved with tiling, using tiles of size <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msqrt><mi>M</mi></msqrt></mrow><annotation encoding=\"application/x-tex\">\\sqrt{M}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1.04em;vertical-align:-0.1133em\"></span><span class=\"mord sqrt\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9267em\"><span class=\"svg-align\" style=\"top:-3em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\" style=\"padding-left:0.833em\"><span class=\"mord mathnormal\" style=\"margin-right:0.10903em\">M</span></span></span><span style=\"top:-2.8867em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"hide-tail\" style=\"min-width:0.853em;height:1.08em\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"400em\" height=\"1.08em\" viewBox=\"0 0 400000 1080\" preserveAspectRatio=\"xMinYMin slice\"><path d=\"M95,702\nc-2.7,0,-7.17,-2.7,-13.5,-8c-5.8,-5.3,-9.5,-10,-9.5,-14\nc0,-2,0.3,-3.3,1,-4c1.3,-2.7,23.83,-20.7,67.5,-54\nc44.2,-33.3,65.8,-50.3,66.5,-51c1.3,-1.3,3,-2,5,-2c4.7,0,8.7,3.3,12,10\ns173,378,173,378c0.7,0,35.3,-71,104,-213c68.7,-142,137.5,-285,206.5,-429\nc69,-144,104.5,-217.7,106.5,-221\nl0 -0\nc5.3,-9.3,12,-14,20,-14\nH400000v40H845.2724\ns-225.272,467,-225.272,467s-235,486,-235,486c-2.7,4.7,-9,7,-19,7\nc-6,0,-10,-1,-12,-3s-194,-422,-194,-422s-65,47,-65,47z\nM834 80h400000v40h-400000z\"></path></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1133em\"><span></span></span></span></span></span></span></span></span> to <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi mathvariant=\"normal\">Θ</mi><mrow><mo 
fence=\"true\">(</mo><mfrac><msup><mi>n</mi><mn>3</mn></msup><mrow><mi>b</mi><msqrt><mi>M</mi></msqrt></mrow></mfrac><mo fence=\"true\">)</mo></mrow></mrow><annotation encoding=\"application/x-tex\">\\Theta\\left(\\frac{n^3}{b\\sqrt{M}}\\right)</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1.8em;vertical-align:-0.65em\"></span><span class=\"mord\">Θ</span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span class=\"minner\"><span class=\"mopen delimcenter\" style=\"top:0em\"><span class=\"delimsizing size2\">(</span></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.0179em\"><span style=\"top:-2.5374em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">b</span><span class=\"mord sqrt mtight\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9323em\"><span class=\"svg-align\" style=\"top:-3em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord mtight\" style=\"padding-left:0.833em\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.10903em\">M</span></span></span><span style=\"top:-2.8923em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"hide-tail mtight\" style=\"min-width:0.853em;height:1.08em\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"400em\" height=\"1.08em\" viewBox=\"0 0 400000 1080\" preserveAspectRatio=\"xMinYMin slice\"><path 
d=\"M95,702\nc-2.7,0,-7.17,-2.7,-13.5,-8c-5.8,-5.3,-9.5,-10,-9.5,-14\nc0,-2,0.3,-3.3,1,-4c1.3,-2.7,23.83,-20.7,67.5,-54\nc44.2,-33.3,65.8,-50.3,66.5,-51c1.3,-1.3,3,-2,5,-2c4.7,0,8.7,3.3,12,10\ns173,378,173,378c0.7,0,35.3,-71,104,-213c68.7,-142,137.5,-285,206.5,-429\nc69,-144,104.5,-217.7,106.5,-221\nl0 -0\nc5.3,-9.3,12,-14,20,-14\nH400000v40H845.2724\ns-225.272,467,-225.272,467s-235,486,-235,486c-2.7,4.7,-9,7,-19,7\nc-6,0,-10,-1,-12,-3s-194,-422,-194,-422s-65,47,-65,47z\nM834 80h400000v40h-400000z\"></path></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1077em\"><span></span></span></span></span></span></span></span></span><span style=\"top:-3.23em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em\"></span></span><span style=\"top:-3.394em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">n</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8913em\"><span style=\"top:-2.931em;margin-right:0.0714em\"><span class=\"pstrut\" style=\"height:2.5em\"></span><span class=\"sizing reset-size3 size1 mtight\"><span class=\"mord mtight\">3</span></span></span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.538em\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mclose delimcenter\" style=\"top:0em\"><span class=\"delimsizing size2\">)</span></span></span></span></span></span>, where <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>b</mi></mrow><annotation 
encoding=\"application/x-tex\">b</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6944em\"></span><span class=\"mord mathnormal\">b</span></span></span></span> is the size of the cache line and <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>M</mi></mrow><annotation encoding=\"application/x-tex\">M</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.10903em\">M</span></span></span></span> is the total size of the cache in bytes. A systolic array further improves this to the optimal number of memory reads of <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mn>2</mn><msup><mi>n</mi><mn>2</mn></msup></mrow><annotation encoding=\"application/x-tex\">2n^2</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8141em\"></span><span class=\"mord\">2</span><span class=\"mord\"><span class=\"mord mathnormal\">n</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8141em\"><span style=\"top:-3.063em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span></span></span></span></span></span></span></span>. In theory, each element of the input and weights is <strong>read only once</strong>, and is used many times as it <em>flows</em> through the array. In practice this is not the case, given that you have a fixed dimension for your systolic array and a dynamic workload consisting of many problem sizes. 
Even if the array is only partially utilised, the entire unit must do work, which can harm power efficiency. Furthermore, keeping the systolic array fed requires complex hardware and software infrastructure.</p>\n<p>Do we see the theoretical efficiency gains from a systolic array play out in practice? The TPUv4 <a href=\"https://arxiv.org/pdf/2304.01433\" rel=\"nofollow\" target=\"_blank\">uses 1.3x-1.9x less power</a> than its generational counterpart the A100. The systolic array is a staple for any hardware startup looking to build an AI accelerator, including OpenAI, who are <a href=\"https://www.reuters.com/technology/openai-set-finalize-first-custom-chip-design-this-year-2025-02-10/\" rel=\"nofollow\" target=\"_blank\">reported</a> to be taping out their new systolic array based accelerator this year.</p>\n<h2>Tenstorrent</h2>\n<p><a href=\"https://tenstorrent.com/\" rel=\"nofollow\" target=\"_blank\">Tenstorrent</a> is a startup founded in 2016 with the ambitious goal of creating an AI\naccelerator and supporting software stack <em>from scratch</em> capable of competing\nwith Nvidia. Their upcoming accelerator, Blackhole, offers an excellent case study in how the principles we&#x27;ve derived might be implemented in practice.</p>\n<h3>Architecture</h3>\n<p><img src=\"/npu/Tensix.png\" alt=\"Tensix Core\"/></p>\n<p>Blackhole&#x27;s layout should feel familiar to those who have seen that of a GPU, with a grid of independent cores. However, the resemblance is only surface level. We can see that the chip features families of specialized cores, namely <strong>D</strong> for\ndata movement, <strong>T</strong> for Tensix (compute), <strong>E</strong> for Ethernet, <strong>CPU</strong> cores for\nrunning Linux, and some additional cores for board management. 
This specialization embodies our principle of optimized hardware subsystems for their intended functions.</p>\n<p>The Tensix core, analogous to Nvidia&#x27;s Streaming Multiprocessor, serves as the fundamental compute unit. Each Tensix core contains:</p>\n<ul>\n<li>Two data movement cores, designed for initiating and managing asynchronous transactions on\nthe Network on Chip (NoC) - directly implementing our &quot;asynchronous first&quot; principle.</li>\n<li>1.5MB of fast local SRAM - adhering to our scratchpad principle.</li>\n<li>A 32×32 (reminiscent of <a href=\"https://openai.com/index/triton/\" rel=\"nofollow\" target=\"_blank\">OpenAI&#x27;s Triton</a>) matrix tile engine, analogous to Nvidia&#x27;s Tensor Cores.</li>\n<li>Three compute cores, two specialized for packing and unpacking data types - aligning with our &quot;hardware support for low precision datatypes&quot; principle.</li>\n</ul>\n<p>The NoC connects the entire grid of cores together, allowing cores to copy data\nbetween each other and across the ethernet network to any other\naccelerator. This is the underlying thesis behind Tenstorrent: cost-effective\naccelerators with massive high bandwidth scale out.</p>\n<p>Supporting this architecture is a novel software stack that represents a departure from traditional programming models. Unlike CUDA, where Shared Memory/L1 is undefined between kernel invocations, Tenstorrent&#x27;s architecture allows intermediate results to remain in L1 and undergo successive transformations. This approach is almost systolic in nature: designed for repeated transformations before returning to memory — strongly adhering to our core principle of minimizing data movement.</p>\n<h3>The GDDR6 Gambit</h3>\n<p>The most controversial design choice for Blackhole is the refusal to use\nHigh Bandwidth Memory. 
Tenstorrent has opted to use GDDR6 for Blackhole with a bandwidth of only 512GB/s; contrast this with the H100&#x27;s 3.35TB/s.\nThe CEO, <a href=\"https://en.wikipedia.org/wiki/Jim_Keller_(engineer)\" rel=\"nofollow\" target=\"_blank\">Jim Keller</a>, <a href=\"https://www.youtube.com/watch?v=lPX1H3jW8ZQ\" rel=\"nofollow\" target=\"_blank\">has stated</a> that HBM is too expensive for the accelerators to hit commodity pricing. How accurate is this claim?</p>\n<p>According to <a href=\"https://www.barrons.com/authors/tae-kim\" rel=\"nofollow\" target=\"_blank\">Tae Kim of Barron&#x27;s</a>, the total bill of materials (BOM) for an H100\nwas <a href=\"https://x.com/firstadopter/status/1691877797487165443\" rel=\"nofollow\" target=\"_blank\">$3320</a> in 2023. Combining this with the figure of <a href=\"https://semianalysis.com/2023/05/29/ai-server-cost-analysis-memory-is/\" rel=\"nofollow\" target=\"_blank\">$15/GB</a> from SemiAnalysis for the HBM3 memory and the H100&#x27;s 80GB capacity (roughly $1,200 of HBM3), the memory\ncontributes ~36% of the overall BOM of the H100. GDDR6 is <a href=\"https://www.dramexchange.com/\" rel=\"nofollow\" target=\"_blank\">far cheaper</a> than\nHBM.</p>\n<p>By leveraging an outsized L1, asynchronous transfers, the NoC, and a more\nflexible programming model, Tenstorrent is hoping that they don&#x27;t need HBM. The\nwinds of AI architecture do seem to be blowing in their favour. 
The introduction of MLA has shown that attention may not be forever memory bound.\nFurthermore, if they can crack low latency, high bandwidth interconnect, then\nthe burden of loading model parameters can be distributed across accelerators as\nfollows:</p>\n<span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><msub><mi>T</mi><mtext>mem</mtext></msub><mo>=</mo><mfrac><mrow><mn>2</mn><mo>⋅</mo><mi>P</mi></mrow><mrow><mi>N</mi><mo>⋅</mo><msub><mi>A</mi><mtext>bw</mtext></msub></mrow></mfrac></mrow><annotation encoding=\"application/x-tex\">T_{\\text{mem}} = \\frac{2 \\cdot P}{N \\cdot A_{\\text{bw}}}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.13889em\">T</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em\"><span style=\"top:-2.55em;margin-left:-0.1389em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord text mtight\"><span class=\"mord mtight\">mem</span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.1963em;vertical-align:-0.836em\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span 
class=\"vlist\" style=\"height:1.3603em\"><span style=\"top:-2.314em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.10903em\">N</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mord\"><span class=\"mord mathnormal\">A</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord text mtight\"><span class=\"mord mtight\">bw</span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span></span></span><span style=\"top:-3.23em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em\"></span></span><span style=\"top:-3.677em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord\">2</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em\">P</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.836em\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span></span>\n<p>Where <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>P</mi></mrow><annotation 
encoding=\"application/x-tex\">P</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em\">P</span></span></span></span> is our model parameters, <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>N</mi></mrow><annotation encoding=\"application/x-tex\">N</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.10903em\">N</span></span></span></span> is our number of accelerators and\n<span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>A</mi><mtext>bw</mtext></msub></mrow><annotation encoding=\"application/x-tex\">A_{\\text{bw}}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em\"></span><span class=\"mord\"><span class=\"mord mathnormal\">A</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord text mtight\"><span class=\"mord mtight\">bw</span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span></span></span></span> is our memory bandwidth.</p>\n<h3>Challenges</h3>\n<p>Tenstorrent still has huge challenges ahead of them, the largest of which 
is\nsoftware. The &quot;CUDA moat&quot; is usually the first topic of discussion when disparaging any hardware upstart. The moat as\nit&#x27;s commonly understood is the maturity and (relative) robustness of the CUDA software stack. However, the <em>real</em> moat is the 1000s of engineers worldwide who have internalized the SIMT paradigm. Developing an entire software stack from zero, <strong>transparently</strong> scaling it out to 1000s of accelerators <strong>and</strong> forcing engineers to trade <a href=\"https://en.wikipedia.org/wiki/Flynn%27s_taxonomy\" rel=\"nofollow\" target=\"_blank\">SIMT for MIMD</a> may be a bridge too far.</p>\n<p>Furthermore, Nvidia is now <a href=\"https://s201.q4cdn.com/141608511/files/doc_financials/2025/q3/ed2a395c-5e9b-4411-8b4a-a718d192155a.pdf\" rel=\"nofollow\" target=\"_blank\">royally flush with cash</a>, sitting with ~$38B of cash equivalents as of Q3 2024, and this industry is <strong>extremely capital intensive</strong>. This is primarily a problem for two reasons:</p>\n<ol>\n<li>Nvidia is capable of reserving the cream of the crop when it comes to process nodes.</li>\n<li>If the paradigm trends more memory bound, not less, Nvidia is first in line for HBM4e onwards.</li>\n</ol>\n<p>However, if Tenstorrent can pull off a solid software stack in the next ~24 months and AI\narchitecture trends continue to move in their favour, they&#x27;ll be a strong\ncontender.</p>\n<h2>Skate to where the puck is going</h2>\n<p>The iteration time for designing and fabricating a new chip is typically 2-3 years. The AI paradigm has\nbeen moving <strong>much faster</strong> than this, with new architectures and datatypes being\ncreated at an increasing cadence. This is a <strong>huge problem</strong> for hardware architects, as you may end up with hardware optimized for\nthe models of yesteryear. 
If you&#x27;re a hyperscaler <a href=\"https://blogs.microsoft.com/on-the-issues/2025/01/03/the-golden-opportunity-for-american-ai/\" rel=\"nofollow\" target=\"_blank\">spending <strong>$80B</strong></a> on hardware and the paradigm shifts dramatically before the 3 years\nrequired to recoup your costs<sup><a href=\"#user-content-fn-1\" id=\"user-content-fnref-1-2\" data-footnote-ref=\"true\" aria-describedby=\"footnote-label\">3</a></sup>, well,\nyou&#x27;re out of luck! Whilst no one can concretely predict future architectural advances, is there any way to get a sense of where the paradigm is heading, in order to derisk our investments?</p>\n<p>If we believe that the search space of possible methods for constructing AGI is <a href=\"https://youtu.be/Qgd3OK5DZWI?si=PRlwlb8AWfkQ0MvW&amp;t=288\" rel=\"nofollow\" target=\"_blank\">&quot;large and\nsparse&quot;</a>, then it would be prudent to take inspiration from our only existence proof: the\nhuman brain. In his remarkably prescient <a href=\"https://www.youtube.com/watch?v=Qgd3OK5DZWI\" rel=\"nofollow\" target=\"_blank\">talk</a> from 2010, Demis Hassabis\noutlines an iterative approach to developing the components of AGI via <a href=\"https://en.wikipedia.org/wiki/Systems_neuroscience\" rel=\"nofollow\" target=\"_blank\">Systems Neuroscience</a>.</p>\n<p>Systems neuroscience does not mindlessly mimic the brain, but attempts to understand and extract the underlying algorithms and <em>creatively</em> and <em>pragmatically</em> reimplement them in software and hardware. 
Hassabis\nfurther explores this argument in a 2017 paper<sup><a href=\"#user-content-fn-14\" id=\"user-content-fnref-14\" data-footnote-ref=\"true\" aria-describedby=\"footnote-label\">18</a></sup>, highlighting parallels between previous AI innovations and their counterparts in the brain.\nThese aren&#x27;t coincidental similarities, but rather convergent evolution toward optimal solutions in the landscape of intelligence.</p>\n<p>Hassabis&#x27; thesis has continued to hold true. Frontier models are now all Mixture of Experts, which exhibit the same regional specialization we see in the brain. Looking forward through this lens, what other biological analogues might we see implemented in silico? I would hazard a guess at episodic memory systems and something similar to the <a href=\"https://youtu.be/Qgd3OK5DZWI?si=PDlMbShiOG4hH1bX&amp;t=1809\" rel=\"nofollow\" target=\"_blank\">hippocampal-neocortical consolidation system</a>.</p>\n<h2><strong>Hard</strong>ware startups</h2>\n<p>The growth in demand for AI inference will not abate for the foreseeable future. Hardware architects attempting to hedge their bets when designing inference hardware in this market will almost certainly fail. Given that from specialization comes efficiency, and flexibility and specialization are diametrically opposed, one must be bold and bet on the way the winds of AI architecture will blow. Luckily, systems neuroscience can give us a lens through which to attempt to peer into the future.</p>\n<p>This principle of bold specialization extends beyond just startups. Every programmer is effectively an investor: their time and expertise are their capital. 
Being an early adopter of a technology that goes on to succeed can pay huge dividends, but this requires the same willingness to make bold, calculated bets on specific technological directions rather than hedging as a <em>generalist</em>.</p>\n<p>The principles we&#x27;ve derived (minimizing data movement, optimizing memory hierarchies, and designing for scale-out) provide a roadmap for those willing to make such bets in the inference hardware space. It remains to be seen if there exists a David to take down Goliath.</p>\n<p>Thanks to Erik Kaunismäki, Madeline Ephgrave, Luca Peric and Amine Dirhoussi for their insightful feedback on this post. Thanks to Felix LeClair and Martin Chang for their corrections.</p>\n<h2>Resources</h2>\n<ul>\n<li><a href=\"https://www.youtube.com/watch?v=MC223HlPdK0\" rel=\"nofollow\" target=\"_blank\">Stanford Seminar on H100</a></li>\n<li><a href=\"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&amp;arnumber=9373921\" rel=\"nofollow\" target=\"_blank\">Compute Substrate for Software 2.0 (2021)</a></li>\n<li><a href=\"https://arxiv.org/pdf/2109.14320\" rel=\"nofollow\" target=\"_blank\">Mitigating bottlenecks on Coral TPU</a></li>\n<li><a href=\"https://arxiv.org/pdf/2007.00072\" rel=\"nofollow\" target=\"_blank\">Data movement is all you need</a></li>\n<li><a href=\"https://www.youtube.com/watch?v=nHHsYp7ZkHQ\" rel=\"nofollow\" target=\"_blank\">VLIW lecture @ ETH Zurich</a></li>\n<li><a href=\"https://www.youtube.com/watch?v=XkgtANeDrm8\" rel=\"nofollow\" target=\"_blank\">Systolic lecture @ ETH Zurich</a></li>\n<li><a href=\"https://docs.nvidia.com/deeplearning/performance/dl-performance-gpu-background/index.html#gpu-perf\" rel=\"nofollow\" target=\"_blank\">NVIDIA Deep Learning Performance guide</a></li>\n<li><a href=\"https://www.telesens.co/2018/07/26/understanding-roofline-charts/\" rel=\"nofollow\" target=\"_blank\">Roofline model</a></li>\n<li><a href=\"https://www.telesens.co/2018/07/30/systolic-architectures/\" 
rel=\"nofollow\" target=\"_blank\">Systolic arrays</a></li>\n<li><a href=\"https://x.com/Tim_Dettmers/status/1856338240099221674\" rel=\"nofollow\" target=\"_blank\">Tim Dettmers on overtraining and data types</a></li>\n<li><a href=\"https://www.corsix.org/\" rel=\"nofollow\" target=\"_blank\">Corsix blog on Tenstorrent</a></li>\n<li><a href=\"https://developer.nvidia.com/blog/nvidia-hopper-architecture-in-depth/\" rel=\"nofollow\" target=\"_blank\">Hopper architecture in depth</a></li>\n<li><a href=\"https://docs.nvidia.com/deeplearning/performance/pdf/GPU-Performance-Background-User-Guide.pdf\" rel=\"nofollow\" target=\"_blank\">Nvidia GPU performance guide</a></li>\n<li><a href=\"https://jax.readthedocs.io/en/latest/pallas/tpu/pipelining.html\" rel=\"nofollow\" target=\"_blank\">TPU pipelining</a></li>\n<li><a href=\"https://iitd-plos.github.io/col729/lec/matrix_multiplication.html\" rel=\"nofollow\" target=\"_blank\">Cache <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi mathvariant=\"script\">O</mi></mrow><annotation encoding=\"application/x-tex\">\\mathcal{O}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em\"></span><span class=\"mord mathcal\" style=\"margin-right:0.02778em\">O</span></span></span></span> complexity of matrix multiplication</a></li>\n<li><a href=\"https://arxiv.org/pdf/2411.04330\" rel=\"nofollow\" target=\"_blank\">Serve in what you train</a></li>\n<li><a href=\"https://www.nextplatform.com/2024/02/27/he-who-can-pay-top-dollar-for-hbm-memory-controls-ai-training/\" rel=\"nofollow\" target=\"_blank\">HBM cost analysis</a></li>\n<li><a href=\"https://old.hotchips.org/hc31/HC31_T3_Cloud_TPU_Codesign.pdf\" rel=\"nofollow\" target=\"_blank\">TPU Hotchips (systolic array)</a></li>\n<li><a 
href=\"https://epoch.ai/gradient-updates/how-has-deepseek-improved-the-transformer-architecture\" rel=\"nofollow\" target=\"_blank\">MLA from Epoch</a></li>\n<li><a href=\"https://le.qun.ch/en/blog/2023/05/13/transformer-batching/\" rel=\"nofollow\" target=\"_blank\">Batching effects on GPT</a></li>\n<li><a href=\"https://arxiv.org/pdf/2205.14135\" rel=\"nofollow\" target=\"_blank\">Flash Attention 1</a></li>\n<li><a href=\"https://arxiv.org/ftp/arxiv/papers/2304/2304.01433.pdf\" rel=\"nofollow\" target=\"_blank\">TPUv4i SRAM</a></li>\n</ul>\n<section data-footnotes=\"true\" class=\"footnotes\"><h2 class=\"sr-only\" id=\"footnote-label\">Footnotes</h2>\n<ol>\n<li id=\"user-content-fn-2\">\n<p>D Patterson. (2004), &quot;Latency lags bandwidth&quot;, <a href=\"https://dl.acm.org/doi/10.1145/1022594.1022596\" rel=\"nofollow\" target=\"_blank\">https://dl.acm.org/doi/10.1145/1022594.1022596</a> <a href=\"#user-content-fnref-2\" data-footnote-backref=\"\" aria-label=\"Back to reference 1\" class=\"data-footnote-backref\">↩</a></p>\n</li>\n<li id=\"user-content-fn-3\">\n<p>H Abdelkhalik et al. (2022), &quot;Demystifying the Nvidia Ampere Architecture through Microbenchmarking and Instruction-level Analysis&quot;, <a href=\"https://arxiv.org/pdf/2208.11174\" rel=\"nofollow\" target=\"_blank\">https://arxiv.org/pdf/2208.11174</a> <a href=\"#user-content-fnref-3\" data-footnote-backref=\"\" aria-label=\"Back to reference 2\" class=\"data-footnote-backref\">↩</a></p>\n</li>\n<li id=\"user-content-fn-1\">\n<p>Jouppi et al. 
(2021), &quot;Ten Lessons From Three Generations Shaped Google’s TPUv4i&quot;, <a href=\"https://gwern.net/doc/ai/scaling/hardware/2021-jouppi.pdf\" rel=\"nofollow\" target=\"_blank\">https://gwern.net/doc/ai/scaling/hardware/2021-jouppi.pdf</a> <a href=\"#user-content-fnref-1\" data-footnote-backref=\"\" aria-label=\"Back to reference 3\" class=\"data-footnote-backref\">↩</a> <a href=\"#user-content-fnref-1-2\" data-footnote-backref=\"\" aria-label=\"Back to reference 3-2\" class=\"data-footnote-backref\">↩<sup>2</sup></a></p>\n</li>\n<li id=\"user-content-fn-4\">\n<p>J Dean. (2019), &quot;The Deep Learning Revolution and Its Implications for Computer Architecture and Chip Design&quot;, <a href=\"https://arxiv.org/pdf/1911.05289\" rel=\"nofollow\" target=\"_blank\">https://arxiv.org/pdf/1911.05289</a> <a href=\"#user-content-fnref-4\" data-footnote-backref=\"\" aria-label=\"Back to reference 4\" class=\"data-footnote-backref\">↩</a></p>\n</li>\n<li id=\"user-content-fn-17\">\n<p>Legg, S et al. &quot;Universal intelligence: A definition of machine intelligence.&quot;, <a href=\"https://arxiv.org/pdf/0712.3329\" rel=\"nofollow\" target=\"_blank\">https://arxiv.org/pdf/0712.3329</a> <a href=\"#user-content-fnref-17\" data-footnote-backref=\"\" aria-label=\"Back to reference 5\" class=\"data-footnote-backref\">↩</a></p>\n</li>\n<li id=\"user-content-fn-5\">\n<p>Kumar et al. (2024), &quot;Scaling Laws for Precision&quot;, <a href=\"https://arxiv.org/pdf/2411.04330\" rel=\"nofollow\" target=\"_blank\">https://arxiv.org/pdf/2411.04330</a> <a href=\"#user-content-fnref-5\" data-footnote-backref=\"\" aria-label=\"Back to reference 6\" class=\"data-footnote-backref\">↩</a></p>\n</li>\n<li id=\"user-content-fn-6\">\n<p>Austin et al. 
(2025), &quot;How to Scale Your Model&quot;, <a href=\"https://jax-ml.github.io/scaling-book\" rel=\"nofollow\" target=\"_blank\">https://jax-ml.github.io/scaling-book</a> <a href=\"#user-content-fnref-6\" data-footnote-backref=\"\" aria-label=\"Back to reference 7\" class=\"data-footnote-backref\">↩</a> <a href=\"#user-content-fnref-6-2\" data-footnote-backref=\"\" aria-label=\"Back to reference 7-2\" class=\"data-footnote-backref\">↩<sup>2</sup></a> <a href=\"#user-content-fnref-6-3\" data-footnote-backref=\"\" aria-label=\"Back to reference 7-3\" class=\"data-footnote-backref\">↩<sup>3</sup></a> <a href=\"#user-content-fnref-6-4\" data-footnote-backref=\"\" aria-label=\"Back to reference 7-4\" class=\"data-footnote-backref\">↩<sup>4</sup></a></p>\n</li>\n<li id=\"user-content-fn-7\">\n<p>Chen, Carol. (2022), &quot;Transformer Inference Arithmetic&quot;, <a href=\"https://kipp.ly/blog/transformer-inference-arithmetic/\" rel=\"nofollow\" target=\"_blank\">https://kipp.ly/blog/transformer-inference-arithmetic/</a> <a href=\"#user-content-fnref-7\" data-footnote-backref=\"\" aria-label=\"Back to reference 8\" class=\"data-footnote-backref\">↩</a> <a href=\"#user-content-fnref-7-2\" data-footnote-backref=\"\" aria-label=\"Back to reference 8-2\" class=\"data-footnote-backref\">↩<sup>2</sup></a></p>\n</li>\n<li id=\"user-content-fn-8\">\n<p>T Dao et al. (2022), &quot;Flashattention: Fast and memory-efficient exact attention with io-awareness&quot;, <a href=\"https://arxiv.org/pdf/2205.14135\" rel=\"nofollow\" target=\"_blank\">https://arxiv.org/pdf/2205.14135</a> <a href=\"#user-content-fnref-8\" data-footnote-backref=\"\" aria-label=\"Back to reference 9\" class=\"data-footnote-backref\">↩</a></p>\n</li>\n<li id=\"user-content-fn-15\">\n<p>Liu, Aixin, et al. 
&quot;Deepseek-v3 technical report.&quot;, <a href=\"https://arxiv.org/abs/2412.19437\" rel=\"nofollow\" target=\"_blank\">https://arxiv.org/abs/2412.19437</a> <a href=\"#user-content-fnref-15\" data-footnote-backref=\"\" aria-label=\"Back to reference 10\" class=\"data-footnote-backref\">↩</a></p>\n</li>\n<li id=\"user-content-fn-16\">\n<p>Liu, Aixin, et al. &quot;Deepseek-v2: A strong, economical, and efficient mixture-of-experts language model.&quot;, <a href=\"https://arxiv.org/pdf/2405.04434\" rel=\"nofollow\" target=\"_blank\">https://arxiv.org/pdf/2405.04434</a> <a href=\"#user-content-fnref-16\" data-footnote-backref=\"\" aria-label=\"Back to reference 11\" class=\"data-footnote-backref\">↩</a></p>\n</li>\n<li id=\"user-content-fn-9\">\n<p>W Kwon et al. (2023), &quot;Efficient Memory Management for Large Language Model Serving with PagedAttention&quot;, <a href=\"https://arxiv.org/pdf/2309.06180\" rel=\"nofollow\" target=\"_blank\">https://arxiv.org/pdf/2309.06180</a> <a href=\"#user-content-fnref-9\" data-footnote-backref=\"\" aria-label=\"Back to reference 12\" class=\"data-footnote-backref\">↩</a></p>\n</li>\n<li id=\"user-content-fn-10\">\n<p>Sasaki, T. (2013), &quot;The axon as a unique computational unit in neurons&quot;, <a href=\"https://pubmed.ncbi.nlm.nih.gov/23298528/\" rel=\"nofollow\" target=\"_blank\">https://pubmed.ncbi.nlm.nih.gov/23298528/</a> <a href=\"#user-content-fnref-10\" data-footnote-backref=\"\" aria-label=\"Back to reference 13\" class=\"data-footnote-backref\">↩</a></p>\n</li>\n<li id=\"user-content-fn-18\">\n<p>Yu, Gyeong-In, et al. 
&quot;Orca: A distributed serving system for Transformer-Based generative models.&quot;, <a href=\"https://www.usenix.org/system/files/osdi22-yu.pdf\" rel=\"nofollow\" target=\"_blank\">https://www.usenix.org/system/files/osdi22-yu.pdf</a> <a href=\"#user-content-fnref-18\" data-footnote-backref=\"\" aria-label=\"Back to reference 14\" class=\"data-footnote-backref\">↩</a></p>\n</li>\n<li id=\"user-content-fn-11\">\n<p>Jouppi et al. (2017), &quot;In-Datacenter Performance Analysis of a Tensor Processing Unit&quot;, <a href=\"https://arxiv.org/abs/1704.04760\" rel=\"nofollow\" target=\"_blank\">https://arxiv.org/abs/1704.04760</a> <a href=\"#user-content-fnref-11\" data-footnote-backref=\"\" aria-label=\"Back to reference 15\" class=\"data-footnote-backref\">↩</a></p>\n</li>\n<li id=\"user-content-fn-12\">\n<p>Jouppi et al. (2021), &quot;Ten Lessons From Three Generations Shaped Google’s TPUv4i&quot;, <a href=\"https://gwern.net/doc/ai/scaling/hardware/2021-jouppi.pdf\" rel=\"nofollow\" target=\"_blank\">https://gwern.net/doc/ai/scaling/hardware/2021-jouppi.pdf</a> <a href=\"#user-content-fnref-12\" data-footnote-backref=\"\" aria-label=\"Back to reference 16\" class=\"data-footnote-backref\">↩</a></p>\n</li>\n<li id=\"user-content-fn-13\">\n<p>H.T Kung. (1982), &quot;Why Systolic Architectures&quot;, <a href=\"https://www.eecs.harvard.edu/~htk/publication/1982-kung-why-systolic-architecture.pdf\" rel=\"nofollow\" target=\"_blank\">https://www.eecs.harvard.edu/~htk/publication/1982-kung-why-systolic-architecture.pdf</a> <a href=\"#user-content-fnref-13\" data-footnote-backref=\"\" aria-label=\"Back to reference 17\" class=\"data-footnote-backref\">↩</a></p>\n</li>\n<li id=\"user-content-fn-14\">\n<p>Hassabis et al. 
&quot;Neuroscience-inspired artificial intelligence.&quot;, <a href=\"https://pubmed.ncbi.nlm.nih.gov/28728020/\" rel=\"nofollow\" target=\"_blank\">https://pubmed.ncbi.nlm.nih.gov/28728020/</a> <a href=\"#user-content-fnref-14\" data-footnote-backref=\"\" aria-label=\"Back to reference 18\" class=\"data-footnote-backref\">↩</a></p>\n</li>\n</ol>\n</section>",
            "url": "https://fleetwood.dev/posts/domain-specific-architectures",
            "title": "Domain specific architectures for AI inference",
            "summary": "With exploding demand for AI inference, many hardware startups are designing\nDomain Specific Architectures. Working backwards from the Transformer workload, we will identify optimal design\nchoices, promising hardware and how to predict the future of inference.\n",
            "date_modified": "2025-03-08T00:00:00.000Z",
            "author": {
                "name": "Christopher Fleetwood",
                "url": "https://fleetwood.dev"
            }
        },
        {
            "id": "https://fleetwood.dev/posts/you-could-have-designed-SOTA-positional-encoding",
            "content_html": "<blockquote>\n<p><strong>Gall&#x27;s Law</strong> <br/>\n<!-- -->A complex system that works is invariably found to have evolved from a simple\nsystem that worked <br/>\n<!-- -->— John Gall</p>\n</blockquote>\n<p>This post walks you through the step-by-step discovery of state-of-the-art positional encoding in transformer models. We will achieve\nthis by iteratively improving our approach to encoding position, arriving at <strong>Ro</strong>tary <strong>P</strong>ositional <strong>E</strong>ncoding (RoPE) used in the latest <a href=\"https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices/\" rel=\"nofollow\" target=\"_blank\">Llama 3.2</a> release and most modern transformers. This post intends to limit the mathematical knowledge required to follow along, but some basic linear algebra, trigonometry and understanding of self attention is expected.</p>\n<h2>Problem Statement</h2>\n<blockquote>\n<p>You shall know a word by the company it keeps <br/>\n<!-- -->— John Rupert Firth</p>\n</blockquote>\n<p>As with all problems, it is best to first start with understanding <strong>exactly</strong> what we are trying to achieve. The self attention mechanism in transformers is utilized to understand relationships\nbetween tokens in a sequence. Self attention is a <strong>set</strong> operation, which\nmeans it is <strong>permutation equivariant</strong>. 
If we do not\nenrich self attention with positional information, many important relationships are\n<strong>incapable of being determined</strong>.</p>\n<p>This is best demonstrated by example.</p>\n<h2>Motivating Example</h2>\n<p>Consider this sentence with the same word in different positions:</p>\n<span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mtext>The dog chased another dog</mtext></mrow><annotation encoding=\"application/x-tex\">\\text{The dog chased another dog}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8889em;vertical-align:-0.1944em\"></span><span class=\"mord text\"><span class=\"mord\">The dog chased another dog</span></span></span></span></span></span>\n<p>Intuitively, &quot;dog&quot; refers to two different entities. Let&#x27;s see what happens if we first tokenize them, map to the real token embeddings of <strong>Llama 3.2 1B</strong> and pass them through <a href=\"https://pytorch.org/docs/stable/generated/torch.nn.MultiheadAttention.html\" rel=\"nofollow\" target=\"_blank\">torch.nn.MultiheadAttention</a>.</p>\n<pre class=\"language-python\"><code class=\"language-python code-highlight\"><span class=\"code-line line-number\" line=\"1\"><span class=\"token keyword\">import</span> torch\n</span><span class=\"code-line line-number\" line=\"2\"><span class=\"token keyword\">import</span> torch<span class=\"token punctuation\">.</span>nn <span class=\"token keyword\">as</span> nn\n</span><span class=\"code-line line-number\" line=\"3\"><span class=\"token keyword\">from</span> transformers <span class=\"token keyword\">import</span> AutoTokenizer<span class=\"token punctuation\">,</span> AutoModel\n</span><span class=\"code-line line-number\" line=\"4\">\n</span><span class=\"code-line line-number\" line=\"5\">model_id <span 
class=\"token operator\">=</span> <span class=\"token string\">&quot;meta-llama/Llama-3.2-1B&quot;</span>\n</span><span class=\"code-line line-number\" line=\"6\">tok <span class=\"token operator\">=</span> AutoTokenizer<span class=\"token punctuation\">.</span>from_pretrained<span class=\"token punctuation\">(</span>model_id<span class=\"token punctuation\">)</span>\n</span><span class=\"code-line line-number\" line=\"7\">model <span class=\"token operator\">=</span> AutoModel<span class=\"token punctuation\">.</span>from_pretrained<span class=\"token punctuation\">(</span>model_id<span class=\"token punctuation\">)</span>\n</span><span class=\"code-line line-number\" line=\"8\">\n</span><span class=\"code-line line-number\" line=\"9\">text <span class=\"token operator\">=</span> <span class=\"token string\">&quot;The dog chased another dog&quot;</span>\n</span><span class=\"code-line line-number\" line=\"10\">tokens <span class=\"token operator\">=</span> tok<span class=\"token punctuation\">(</span>text<span class=\"token punctuation\">,</span> return_tensors<span class=\"token operator\">=</span><span class=\"token string\">&quot;pt&quot;</span><span class=\"token punctuation\">)</span><span class=\"token punctuation\">[</span><span class=\"token string\">&quot;input_ids&quot;</span><span class=\"token punctuation\">]</span>\n</span><span class=\"code-line line-number\" line=\"11\">embeddings <span class=\"token operator\">=</span> model<span class=\"token punctuation\">.</span>embed_tokens<span class=\"token punctuation\">(</span>tokens<span class=\"token punctuation\">)</span>\n</span><span class=\"code-line line-number\" line=\"12\">hdim <span class=\"token operator\">=</span> embeddings<span class=\"token punctuation\">.</span>shape<span class=\"token punctuation\">[</span><span class=\"token operator\">-</span><span class=\"token number\">1</span><span class=\"token punctuation\">]</span>\n</span><span class=\"code-line line-number\" 
line=\"13\">\n</span><span class=\"code-line line-number\" line=\"14\">W_q <span class=\"token operator\">=</span> nn<span class=\"token punctuation\">.</span>Linear<span class=\"token punctuation\">(</span>hdim<span class=\"token punctuation\">,</span> hdim<span class=\"token punctuation\">,</span> bias<span class=\"token operator\">=</span><span class=\"token boolean\">False</span><span class=\"token punctuation\">)</span>\n</span><span class=\"code-line line-number\" line=\"15\">W_k <span class=\"token operator\">=</span> nn<span class=\"token punctuation\">.</span>Linear<span class=\"token punctuation\">(</span>hdim<span class=\"token punctuation\">,</span> hdim<span class=\"token punctuation\">,</span> bias<span class=\"token operator\">=</span><span class=\"token boolean\">False</span><span class=\"token punctuation\">)</span>\n</span><span class=\"code-line line-number\" line=\"16\">W_v <span class=\"token operator\">=</span> nn<span class=\"token punctuation\">.</span>Linear<span class=\"token punctuation\">(</span>hdim<span class=\"token punctuation\">,</span> hdim<span class=\"token punctuation\">,</span> bias<span class=\"token operator\">=</span><span class=\"token boolean\">False</span><span class=\"token punctuation\">)</span>\n</span><span class=\"code-line line-number\" line=\"17\">mha <span class=\"token operator\">=</span> nn<span class=\"token punctuation\">.</span>MultiheadAttention<span class=\"token punctuation\">(</span>embed_dim<span class=\"token operator\">=</span>hdim<span class=\"token punctuation\">,</span> num_heads<span class=\"token operator\">=</span><span class=\"token number\">4</span><span class=\"token punctuation\">,</span> batch_first<span class=\"token operator\">=</span><span class=\"token boolean\">True</span><span class=\"token punctuation\">)</span>\n</span><span class=\"code-line line-number\" line=\"18\">\n</span><span class=\"code-line line-number\" line=\"19\"><span class=\"token keyword\">with</span> torch<span 
class=\"token punctuation\">.</span>no_grad<span class=\"token punctuation\">(</span><span class=\"token punctuation\">)</span><span class=\"token punctuation\">:</span>\n</span><span class=\"code-line line-number\" line=\"20\">    <span class=\"token keyword\">for</span> param <span class=\"token keyword\">in</span> mha<span class=\"token punctuation\">.</span>parameters<span class=\"token punctuation\">(</span><span class=\"token punctuation\">)</span><span class=\"token punctuation\">:</span>\n</span><span class=\"code-line line-number\" line=\"21\">        nn<span class=\"token punctuation\">.</span>init<span class=\"token punctuation\">.</span>normal_<span class=\"token punctuation\">(</span>param<span class=\"token punctuation\">,</span> std<span class=\"token operator\">=</span><span class=\"token number\">0.1</span><span class=\"token punctuation\">)</span> <span class=\"token comment\"># Initialize weights to be non-negligible</span>\n</span><span class=\"code-line line-number\" line=\"22\">\n</span><span class=\"code-line line-number\" line=\"23\">output<span class=\"token punctuation\">,</span> _ <span class=\"token operator\">=</span> mha<span class=\"token punctuation\">(</span>W_q<span class=\"token punctuation\">(</span>embeddings<span class=\"token punctuation\">)</span><span class=\"token punctuation\">,</span> W_k<span class=\"token punctuation\">(</span>embeddings<span class=\"token punctuation\">)</span><span class=\"token punctuation\">,</span> W_v<span class=\"token punctuation\">(</span>embeddings<span class=\"token punctuation\">)</span><span class=\"token punctuation\">)</span>\n</span><span class=\"code-line line-number\" line=\"24\">\n</span><span class=\"code-line line-number\" line=\"25\">dog1_out <span class=\"token operator\">=</span> output<span class=\"token punctuation\">[</span><span class=\"token number\">0</span><span class=\"token punctuation\">,</span> <span class=\"token number\">2</span><span class=\"token 
punctuation\">]</span>\n</span><span class=\"code-line line-number\" line=\"26\">dog2_out <span class=\"token operator\">=</span> output<span class=\"token punctuation\">[</span><span class=\"token number\">0</span><span class=\"token punctuation\">,</span> <span class=\"token number\">5</span><span class=\"token punctuation\">]</span>\n</span><span class=\"code-line line-number\" line=\"27\"><span class=\"token keyword\">print</span><span class=\"token punctuation\">(</span><span class=\"token string-interpolation\"><span class=\"token string\">f&quot;Dog output identical?: </span><span class=\"token interpolation\"><span class=\"token punctuation\">{</span>torch<span class=\"token punctuation\">.</span>allclose<span class=\"token punctuation\">(</span>dog1_out<span class=\"token punctuation\">,</span> dog2_out<span class=\"token punctuation\">,</span> atol<span class=\"token operator\">=</span><span class=\"token number\">1e-6</span><span class=\"token punctuation\">)</span><span class=\"token punctuation\">}</span></span><span class=\"token string\">&quot;</span></span><span class=\"token punctuation\">)</span> <span class=\"token comment\">#True</span>\n</span></code></pre>\n<p>As we can see, without any positional information, the output of a (multi\nheaded) self attention operation is <strong>identical for the same token in\ndifferent positions</strong>, despite the tokens clearly representing distinct entities. 
Let&#x27;s begin designing a method of enhancing self attention with positional information, such that it can determine relationships between words encoded by\ntheir positions.</p>\n<p>To understand and design an optimal encoding scheme, let&#x27;s explore some desirable properties such a scheme should have.</p>\n<h2>Desirable Properties</h2>\n<p>Let&#x27;s try and define some desirable properties that will make the optimization\nprocess as easy as possible.</p>\n<h4>Property 1 - Unique encoding for each position (across sequences)</h4>\n<p>Each position needs a unique encoding that remains consistent regardless of sequence length - a token at position 5 should have the same encoding whether the current sequence is of length 10 or 10,000.</p>\n<h4>Property 2 - Linear relation between two encoded positions</h4>\n<p>The relationship between positions should be mathematically simple. If we know the encoding for position <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>p</mi></mrow><annotation encoding=\"application/x-tex\">p</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.625em;vertical-align:-0.1944em\"></span><span class=\"mord mathnormal\">p</span></span></span></span>, it should be straightforward to compute the encoding for position <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>p</mi><mo>+</mo><mi>k</mi></mrow><annotation encoding=\"application/x-tex\">p+k</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7778em;vertical-align:-0.1944em\"></span><span class=\"mord mathnormal\">p</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mbin\">+</span><span class=\"mspace\" 
style=\"margin-right:0.2222em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6944em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.03148em\">k</span></span></span></span>, making it easier for the model to learn positional patterns.</p>\n<p>If you think about how we represent numbers on a number line, it&#x27;s easy to understand that 5 is 2 steps away from 3, or that 10 is 5 steps from 15. The same intuitive relationship should exist in our encodings.</p>\n<h4>Property 3 - Generalizes to longer sequences than those encountered in training</h4>\n<p>To increase our models&#x27; utility in the real world, they should generalize outside\ntheir training distribution. Therefore, our encoding scheme needs to be\nadaptable enough to handle unexpected input lengths, without\nviolating any of our other desirable properties.</p>\n<h4>Property 4 - Generated by a deterministic process the model can learn</h4>\n<p>It would be ideal if our positional encodings could be drawn from a\ndeterministic process. 
This should allow the model to learn the mechanism\nbehind our encoding scheme efficiently.</p>\n<h4>Property 5 - Extensible to multiple dimensions</h4>\n<p>With multimodal models becoming the norm, it is crucial that our positional\nencoding scheme can naturally extend from <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mn>1</mn><mi>D</mi></mrow><annotation encoding=\"application/x-tex\">1D</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em\"></span><span class=\"mord\">1</span><span class=\"mord mathnormal\" style=\"margin-right:0.02778em\">D</span></span></span></span> to <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>n</mi><mi>D</mi></mrow><annotation encoding=\"application/x-tex\">nD</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em\"></span><span class=\"mord mathnormal\">n</span><span class=\"mord mathnormal\" style=\"margin-right:0.02778em\">D</span></span></span></span>. 
This will allow models to consume data like images or brain scans, which are <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mn>2</mn><mi>D</mi></mrow><annotation encoding=\"application/x-tex\">2D</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em\"></span><span class=\"mord\">2</span><span class=\"mord mathnormal\" style=\"margin-right:0.02778em\">D</span></span></span></span> and <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mn>4</mn><mi>D</mi></mrow><annotation encoding=\"application/x-tex\">4D</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em\"></span><span class=\"mord\">4</span><span class=\"mord mathnormal\" style=\"margin-right:0.02778em\">D</span></span></span></span>\nrespectively.</p>\n<p>Now we know the ideal properties (henceforth referred to as <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>P</mi><msub><mi>r</mi><mi>n</mi></msub></mrow><annotation encoding=\"application/x-tex\">Pr_n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em\">P</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02778em\">r</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1514em\"><span style=\"top:-2.55em;margin-left:-0.0278em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span 
class=\"mord mathnormal mtight\">n</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span></span></span></span>), let&#x27;s start designing and iterating on our encoding scheme.</p>\n<h2>Integer Position Encoding</h2>\n<p>The first approach that may jump to mind is simply to add the integer value of the token position to each component of the token embedding, with values ranging from <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mn>0</mn><mo>→</mo><mi>L</mi></mrow><annotation encoding=\"application/x-tex\">0 \\rightarrow L</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em\"></span><span class=\"mord\">0</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mrel\">→</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em\"></span><span class=\"mord mathnormal\">L</span></span></span></span> where <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>L</mi></mrow><annotation encoding=\"application/x-tex\">L</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em\"></span><span class=\"mord mathnormal\">L</span></span></span></span> is the\nlength of our current sequence.</p>\n<video src=\"/positional-encoding/IntegerEncoding.mp4\"></video>\n<p>In the above animation, we create our positional encoding vector for the token <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mstyle 
mathcolor=\"#699C52\"><mtext>chased</mtext></mstyle></mrow><annotation encoding=\"application/x-tex\">\\color{#699C52}\\text{chased}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6944em\"></span><span class=\"mord text\" style=\"color:#699C52\"><span class=\"mord\" style=\"color:#699C52\">chased</span></span></span></span></span> from the index and add it to our token embedding. The embedding values here are a subset of the real values from <strong>Llama 3.2 1B</strong>. We can observe that they&#x27;re clustered around 0. This\nis desirable to avoid <a href=\"https://www.cs.toronto.edu/~rgrosse/courses/csc321_2017/readings/L15%20Exploding%20and%20Vanishing%20Gradients.pdf\" rel=\"nofollow\" target=\"_blank\">vanishing or exploding gradients</a> during training and therefore is something we&#x27;d like to maintain throughout the model.</p>\n<p>It&#x27;s clear that our current naïve approach is going to cause problems. The magnitude of the position value\nvastly exceeds the actual values of our input. 
This means the signal-to-noise\nratio is very low, and it&#x27;s hard for the model to separate the semantic\ninformation from the positional information.</p>\n<p>With this new knowledge, a natural follow on might be to normalize the position value by <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mfrac><mn>1</mn><mi>N</mi></mfrac></mrow><annotation encoding=\"application/x-tex\">\\frac{1}{N}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1.1901em;vertical-align:-0.345em\"></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8451em\"><span style=\"top:-2.655em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.10903em\">N</span></span></span></span><span style=\"top:-3.23em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em\"></span></span><span style=\"top:-3.394em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">1</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.345em\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span>. This constrains the values between 0 and 1, but introduces another problem. 
If we choose <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>N</mi></mrow><annotation encoding=\"application/x-tex\">N</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.10903em\">N</span></span></span></span> to be the length of the current sequence, then the position values will be completely different for each sequence of differing lengths, violating <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>P</mi><msub><mi>r</mi><mn>1</mn></msub></mrow><annotation encoding=\"application/x-tex\">Pr_1</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em\">P</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02778em\">r</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em\"><span style=\"top:-2.55em;margin-left:-0.0278em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span></span></span></span>.</p>\n<p>Is there a better way to ensure our numbers are between 0 and 1?\nIf we thought really hard about this for a while, we might come up with switching\nfrom decimal to binary numbers.</p>\n<h2>Binary Position Encoding</h2>\n<p>Instead of adding our (potentially normalized) integer position 
to each\ncomponent of the embedding, we could instead convert it into its binary\nrepresentation and <em>s t r e t c h</em> our value out to match our embedding dimension, as demonstrated below.</p>\n<video src=\"/positional-encoding/BinaryEncoding.mp4\"></video>\n<p>We&#x27;ve converted the position of interest (252) into its binary representation\n(11111100) and added each bit to the corresponding component of the\ntoken embedding. The least significant bit (LSB) will cycle between 0 and 1 for every\nsubsequent token, whilst the most significant bit (MSB) will cycle every\n<span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msup><mn>2</mn><mrow><mi>n</mi><mo>−</mo><mn>1</mn></mrow></msup></mrow><annotation encoding=\"application/x-tex\">2^{n-1}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8141em\"></span><span class=\"mord\"><span class=\"mord\">2</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8141em\"><span style=\"top:-3.063em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">n</span><span class=\"mbin mtight\">−</span><span class=\"mord mtight\">1</span></span></span></span></span></span></span></span></span></span></span></span> tokens where <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>n</mi></mrow><annotation encoding=\"application/x-tex\">n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.4306em\"></span><span class=\"mord mathnormal\">n</span></span></span></span> is the number of bits.\nYou can see the 
positional encoding vector for different indices in the animation below <sup><a href=\"#user-content-fn-1\" id=\"user-content-fnref-1\" data-footnote-ref=\"true\" aria-describedby=\"footnote-label\">1</a></sup>.</p>\n<video src=\"/positional-encoding/BinaryPositionalEncodingPlot.mp4\"></video>\n<p>We&#x27;ve solved the value range problem, and we now have unique encodings that are\nconsistent across different sequence lengths. What happens if we plot a low-dimensional version of our token embedding and visualize the addition of our binary positional vector for different values?</p>\n<video src=\"/positional-encoding/BinaryVector3D.mp4\"></video>\n<p>We can see that the result is very &quot;jumpy&quot; (as we might expect from the\ndiscrete nature of binary). The optimization process likes smooth, continuous and\npredictable changes. Do we know any functions with similar value ranges that are smooth and continuous?</p>\n<p>If we looked around a little, we might notice that both <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>sin</mi><mo>⁡</mo></mrow><annotation encoding=\"application/x-tex\">\\sin</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6679em\"></span><span class=\"mop\">sin</span></span></span></span> and <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>cos</mi><mo>⁡</mo></mrow><annotation encoding=\"application/x-tex\">\\cos</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.4306em\"></span><span class=\"mop\">cos</span></span></span></span> fit the bill!</p>\n<h2>Sinusoidal positional encoding</h2>\n<video src=\"/positional-encoding/SteppedPositionalEncodingPlot.mp4\"></video>\n<p>The above animation visualizes our 
position embedding if each component is\nalternately drawn from <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>sin</mi><mo>⁡</mo></mrow><annotation encoding=\"application/x-tex\">\\sin</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6679em\"></span><span class=\"mop\">sin</span></span></span></span> and <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>cos</mi><mo>⁡</mo></mrow><annotation encoding=\"application/x-tex\">\\cos</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.4306em\"></span><span class=\"mop\">cos</span></span></span></span> with gradually increasing\nwavelengths. If you compare it with the previous animation, you&#x27;ll notice a striking similarity!</p>\n<p>We&#x27;ve now arrived at sinusoidal embeddings, originally defined in the <a href=\"https://arxiv.org/abs/1706.03762\" rel=\"nofollow\" target=\"_blank\">Attention Is All You Need</a> paper.\nLet&#x27;s look at the equations:</p>\n<span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mi>P</mi><msub><mi>E</mi><mrow><mo stretchy=\"false\">(</mo><mi>p</mi><mi>o</mi><mi>s</mi><mo separator=\"true\">,</mo><mn>2</mn><mi>i</mi><mo stretchy=\"false\">)</mo></mrow></msub><mo>=</mo><mstyle mathcolor=\"#58C4DD\"><mi>sin</mi><mo>⁡</mo><mrow><mo fence=\"true\">(</mo><mstyle mathcolor=\"black\"><mfrac><mrow><mi>p</mi><mi>o</mi><mi>s</mi></mrow><msup><mn>10000</mn><mrow><mn>2</mn><mi>i</mi><mi mathvariant=\"normal\">/</mi><mi>d</mi></mrow></msup></mfrac><mstyle mathcolor=\"#58C4DD\"></mstyle></mstyle><mo fence=\"true\" 
mathcolor=\"#58C4DD\">)</mo></mrow><mstyle mathcolor=\"black\"><mspace linebreak=\"newline\"></mspace><mspace width=\"1em\"></mspace><mspace linebreak=\"newline\"></mspace><mi>P</mi><msub><mi>E</mi><mrow><mo stretchy=\"false\">(</mo><mi>p</mi><mi>o</mi><mi>s</mi><mo separator=\"true\">,</mo><mn>2</mn><mi>i</mi><mo>+</mo><mn>1</mn><mo stretchy=\"false\">)</mo></mrow></msub><mo>=</mo><mstyle mathcolor=\"#FC6255\"><mi>cos</mi><mo>⁡</mo><mrow><mo fence=\"true\">(</mo><mstyle mathcolor=\"black\"><mfrac><mrow><mi>p</mi><mi>o</mi><mi>s</mi></mrow><msup><mn>10000</mn><mrow><mn>2</mn><mi>i</mi><mi mathvariant=\"normal\">/</mi><mi>d</mi></mrow></msup></mfrac><mstyle mathcolor=\"#FC6255\"></mstyle></mstyle><mo fence=\"true\" mathcolor=\"#FC6255\">)</mo></mrow><mstyle mathcolor=\"black\"><mspace linebreak=\"newline\"></mspace></mstyle></mstyle></mstyle></mstyle></mrow><annotation encoding=\"application/x-tex\">PE_{(pos,2i)} = \\color{#58C4DD}\\sin\\left(\\color{black}\\frac{pos}{10000^{2i/d}}\\color{#58C4DD}\\right)\\color{black} \\\\ \n\\quad \\\\\nPE_{(pos,2i+1)} = \\color{#FC6255}\\cos\\left(\\color{black}\\frac{pos}{10000^{2i/d}}\\color{#FC6255}\\right)\\color{black} \\\\ </annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1.0385em;vertical-align:-0.3552em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em\">P</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.05764em\">E</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3448em\"><span style=\"top:-2.5198em;margin-left:-0.0576em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mopen mtight\">(</span><span class=\"mord mathnormal mtight\">p</span><span class=\"mord mathnormal 
mtight\">os</span><span class=\"mpunct mtight\">,</span><span class=\"mord mtight\">2</span><span class=\"mord mathnormal mtight\">i</span><span class=\"mclose mtight\">)</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3552em\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1.854em;vertical-align:-0.704em\"></span><span class=\"mop\" style=\"color:#58C4DD\"><span style=\"color:#58C4DD\">s</span><span style=\"color:#58C4DD\">i</span><span style=\"color:#58C4DD\">n</span></span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span class=\"minner\" style=\"color:#58C4DD\"><span class=\"mopen delimcenter\" style=\"color:#58C4DD;top:0em\"><span class=\"delimsizing size2\" style=\"color:#58C4DD\"><span style=\"color:#58C4DD\">(</span></span></span><span class=\"mord\" style=\"color:black\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.1076em\"><span style=\"top:-2.296em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\" style=\"color:black\"><span class=\"mord\" style=\"color:black\">1000</span><span class=\"mord\" style=\"color:black\"><span class=\"mord\" style=\"color:black\">0</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.814em\"><span style=\"top:-2.989em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\" style=\"color:black\"><span class=\"mord mtight\" style=\"color:black\"><span class=\"mord mtight\" style=\"color:black\">2</span><span class=\"mord mathnormal mtight\" 
style=\"color:black\">i</span><span class=\"mord mtight\" style=\"color:black\">/</span><span class=\"mord mathnormal mtight\" style=\"color:black\">d</span></span></span></span></span></span></span></span></span></span></span><span style=\"top:-3.23em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"frac-line\" style=\"color:black;border-bottom-width:0.04em\"></span></span><span style=\"top:-3.677em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\" style=\"color:black\"><span class=\"mord mathnormal\" style=\"color:black\">p</span><span class=\"mord mathnormal\" style=\"color:black\">os</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.704em\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mclose delimcenter\" style=\"color:#58C4DD;top:0em\"><span class=\"delimsizing size2\" style=\"color:#58C4DD\"><span style=\"color:#58C4DD\">)</span></span></span></span></span><span class=\"mspace newline\" style=\"color:black\"></span><span class=\"base\"><span class=\"strut\" style=\"height:0em\"></span><span class=\"mspace\" style=\"color:black;margin-right:1em\"></span></span><span class=\"mspace newline\" style=\"color:black\"></span><span class=\"base\"><span class=\"strut\" style=\"height:1.0385em;vertical-align:-0.3552em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em;color:black\">P</span><span class=\"mord\" style=\"color:black\"><span class=\"mord mathnormal\" style=\"margin-right:0.05764em;color:black\">E</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3448em\"><span style=\"top:-2.5198em;margin-left:-0.0576em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\" style=\"color:black\"><span class=\"mord mtight\" 
style=\"color:black\"><span class=\"mopen mtight\" style=\"color:black\">(</span><span class=\"mord mathnormal mtight\" style=\"color:black\">p</span><span class=\"mord mathnormal mtight\" style=\"color:black\">os</span><span class=\"mpunct mtight\" style=\"color:black\">,</span><span class=\"mord mtight\" style=\"color:black\">2</span><span class=\"mord mathnormal mtight\" style=\"color:black\">i</span><span class=\"mbin mtight\" style=\"color:black\">+</span><span class=\"mord mtight\" style=\"color:black\">1</span><span class=\"mclose mtight\" style=\"color:black\">)</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3552em\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mrel\" style=\"color:black\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1.854em;vertical-align:-0.704em\"></span><span class=\"mop\" style=\"color:#FC6255\"><span style=\"color:#FC6255\">c</span><span style=\"color:#FC6255\">o</span><span style=\"color:#FC6255\">s</span></span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span class=\"minner\" style=\"color:#FC6255\"><span class=\"mopen delimcenter\" style=\"color:#FC6255;top:0em\"><span class=\"delimsizing size2\" style=\"color:#FC6255\"><span style=\"color:#FC6255\">(</span></span></span><span class=\"mord\" style=\"color:black\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.1076em\"><span style=\"top:-2.296em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\" style=\"color:black\"><span class=\"mord\" style=\"color:black\">1000</span><span class=\"mord\" style=\"color:black\"><span class=\"mord\" style=\"color:black\">0</span><span 
class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.814em\"><span style=\"top:-2.989em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\" style=\"color:black\"><span class=\"mord mtight\" style=\"color:black\"><span class=\"mord mtight\" style=\"color:black\">2</span><span class=\"mord mathnormal mtight\" style=\"color:black\">i</span><span class=\"mord mtight\" style=\"color:black\">/</span><span class=\"mord mathnormal mtight\" style=\"color:black\">d</span></span></span></span></span></span></span></span></span></span></span><span style=\"top:-3.23em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"frac-line\" style=\"color:black;border-bottom-width:0.04em\"></span></span><span style=\"top:-3.677em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\" style=\"color:black\"><span class=\"mord mathnormal\" style=\"color:black\">p</span><span class=\"mord mathnormal\" style=\"color:black\">os</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.704em\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mclose delimcenter\" style=\"color:#FC6255;top:0em\"><span class=\"delimsizing size2\" style=\"color:#FC6255\"><span style=\"color:#FC6255\">)</span></span></span></span></span><span class=\"mspace newline\" style=\"color:black\"></span></span></span></span>\n<p>where <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>p</mi><mi>o</mi><mi>s</mi></mrow><annotation encoding=\"application/x-tex\">pos</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.625em;vertical-align:-0.1944em\"></span><span class=\"mord 
p</span>">
mathnormal\">p</span><span class=\"mord mathnormal\">os</span></span></span></span> is the token&#x27;s position index, <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>i</mi></mrow><annotation encoding=\"application/x-tex\">i</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6595em\"></span><span class=\"mord mathnormal\">i</span></span></span></span> is the index of the component pair\nin the positional encoding vector, and <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>d</mi></mrow><annotation encoding=\"application/x-tex\">d</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6944em\"></span><span class=\"mord mathnormal\">d</span></span></span></span> is the model dimension. 
<span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mn>10</mn><mo separator=\"true\">,</mo><mn>000</mn></mrow><annotation encoding=\"application/x-tex\">10,000</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8389em;vertical-align:-0.1944em\"></span><span class=\"mord\">10</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span class=\"mord\">000</span></span></span></span> is the <strong>base wavelength</strong> (henceforth referred to as\n<span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>θ</mi></mrow><annotation encoding=\"application/x-tex\">\\theta</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6944em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.02778em\">θ</span></span></span></span>), which we stretch or compress as a function of the component index. I encourage you to plug in some realistic values to get a feel for this\ngeometric progression.</p>\n<p>There&#x27;s a few parts of this equation that are confusing at first glance. How did the\nauthors choose <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mn>10</mn><mo separator=\"true\">,</mo><mn>000</mn></mrow><annotation encoding=\"application/x-tex\">10,000</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8389em;vertical-align:-0.1944em\"></span><span class=\"mord\">10</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span class=\"mord\">000</span></span></span></span>? 
Why are we using <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>sin</mi><mo>⁡</mo></mrow><annotation encoding=\"application/x-tex\">\\sin</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6679em\"></span><span class=\"mop\">sin</span></span></span></span> <strong>and</strong> <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>cos</mi><mo>⁡</mo></mrow><annotation encoding=\"application/x-tex\">\\cos</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.4306em\"></span><span class=\"mop\">cos</span></span></span></span> for even and odd positions respectively?</p>\n<p>It seems that using <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mn>10</mn><mo separator=\"true\">,</mo><mn>000</mn></mrow><annotation encoding=\"application/x-tex\">10,000</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8389em;vertical-align:-0.1944em\"></span><span class=\"mord\">10</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span class=\"mord\">000</span></span></span></span> for the base wavelength was determined experimentally <sup><a href=\"#user-content-fn-2\" id=\"user-content-fnref-2\" data-footnote-ref=\"true\" aria-describedby=\"footnote-label\">2</a></sup>. 
Deciphering the usage of both <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>sin</mi><mo>⁡</mo></mrow><annotation encoding=\"application/x-tex\">\\sin</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6679em\"></span><span class=\"mop\">sin</span></span></span></span> and <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>cos</mi><mo>⁡</mo></mrow><annotation encoding=\"application/x-tex\">\\cos</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.4306em\"></span><span class=\"mop\">cos</span></span></span></span> is more involved, but crucial\nfor our iterative approach to understanding. The key here is our desire for a linear relation between two encoded positions (<span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>P</mi><msub><mi>r</mi><mn>2</mn></msub></mrow><annotation encoding=\"application/x-tex\">Pr_2</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8333em;vertical-align:-0.15em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.13889em\">P</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02778em\">r</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em\"><span style=\"top:-2.55em;margin-left:-0.0278em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span 
class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span></span></span></span>). To understand how using <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>sin</mi><mo>⁡</mo></mrow><annotation encoding=\"application/x-tex\">\\sin</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6679em\"></span><span class=\"mop\">sin</span></span></span></span> and <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>cos</mi><mo>⁡</mo></mrow><annotation encoding=\"application/x-tex\">\\cos</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.4306em\"></span><span class=\"mop\">cos</span></span></span></span> in tandem produces this linear relation, we will have to dive into some trigonometry.</p>\n<p>Consider a sequence of sine and cosine pairs, each associated with a frequency <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>ω</mi><mi>i</mi></msub></mrow><annotation encoding=\"application/x-tex\">\\omega_i</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.5806em;vertical-align:-0.15em\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">ω</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal 
mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span></span></span></span>. Our goal is to find a linear transformation matrix <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi mathvariant=\"bold\">M</mi></mrow><annotation encoding=\"application/x-tex\">\\mathbf{M}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6861em\"></span><span class=\"mord mathbf\">M</span></span></span></span> that can shift these sinusoidal functions by a fixed offset <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>k</mi></mrow><annotation encoding=\"application/x-tex\">k</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6944em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.03148em\">k</span></span></span></span>:</p>\n<span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mi mathvariant=\"bold\">M</mi><mo>⋅</mo><mrow><mo fence=\"true\">[</mo><mtable rowspacing=\"0.16em\" columnalign=\"center\" columnspacing=\"1em\"><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mrow><mi>sin</mi><mo>⁡</mo><mo stretchy=\"false\">(</mo><msub><mi>ω</mi><mi>i</mi></msub><mi>p</mi><mo stretchy=\"false\">)</mo></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mrow><mi>cos</mi><mo>⁡</mo><mo stretchy=\"false\">(</mo><msub><mi>ω</mi><mi>i</mi></msub><mi>p</mi><mo stretchy=\"false\">)</mo></mrow></mstyle></mtd></mtr></mtable><mo 
fence=\"true\">]</mo></mrow><mo>=</mo><mrow><mo fence=\"true\">[</mo><mtable rowspacing=\"0.16em\" columnalign=\"center\" columnspacing=\"1em\"><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mrow><mi>sin</mi><mo>⁡</mo><mo stretchy=\"false\">(</mo><msub><mi>ω</mi><mi>i</mi></msub><mo stretchy=\"false\">(</mo><mi>p</mi><mo>+</mo><mi>k</mi><mo stretchy=\"false\">)</mo><mo stretchy=\"false\">)</mo></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mrow><mi>cos</mi><mo>⁡</mo><mo stretchy=\"false\">(</mo><msub><mi>ω</mi><mi>i</mi></msub><mo stretchy=\"false\">(</mo><mi>p</mi><mo>+</mo><mi>k</mi><mo stretchy=\"false\">)</mo><mo stretchy=\"false\">)</mo></mrow></mstyle></mtd></mtr></mtable><mo fence=\"true\">]</mo></mrow></mrow><annotation encoding=\"application/x-tex\">\\mathbf{M} \\cdot \\begin{bmatrix} \\sin(\\omega_i p) \\\\ \\cos(\\omega_i p) \\end{bmatrix} = \\begin{bmatrix} \\sin(\\omega_i(p + k)) \\\\ \\cos(\\omega_i(p + k)) \\end{bmatrix}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6861em\"></span><span class=\"mord mathbf\">M</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.4em;vertical-align:-0.95em\"></span><span class=\"minner\"><span class=\"mopen delimcenter\" style=\"top:0em\"><span class=\"delimsizing size3\">[</span></span><span class=\"mord\"><span class=\"mtable\"><span class=\"col-align-c\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.45em\"><span style=\"top:-3.61em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mop\">sin</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\" 
style=\"margin-right:0.03588em\">ω</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mord mathnormal\">p</span><span class=\"mclose\">)</span></span></span><span style=\"top:-2.41em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mop\">cos</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">ω</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mord mathnormal\">p</span><span class=\"mclose\">)</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.95em\"><span></span></span></span></span></span></span></span><span class=\"mclose delimcenter\" style=\"top:0em\"><span class=\"delimsizing size3\">]</span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span></span><span class=\"base\"><span class=\"strut\" 
style=\"height:2.4em;vertical-align:-0.95em\"></span><span class=\"minner\"><span class=\"mopen delimcenter\" style=\"top:0em\"><span class=\"delimsizing size3\">[</span></span><span class=\"mord\"><span class=\"mtable\"><span class=\"col-align-c\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.45em\"><span style=\"top:-3.61em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mop\">sin</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">ω</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">p</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.03148em\">k</span><span class=\"mclose\">))</span></span></span><span style=\"top:-2.41em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mop\">cos</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">ω</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing 
reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">p</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.03148em\">k</span><span class=\"mclose\">))</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.95em\"><span></span></span></span></span></span></span></span><span class=\"mclose delimcenter\" style=\"top:0em\"><span class=\"delimsizing size3\">]</span></span></span></span></span></span></span>\n<p>The frequencies <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>ω</mi><mi>i</mi></msub></mrow><annotation encoding=\"application/x-tex\">\\omega_i</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.5806em;vertical-align:-0.15em\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">ω</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span></span></span></span> follow a geometric progression that 
decreases with dimension index <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>i</mi></mrow><annotation encoding=\"application/x-tex\">i</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6595em\"></span><span class=\"mord mathnormal\">i</span></span></span></span>, defined as:</p>\n<span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><msub><mi>ω</mi><mi>i</mi></msub><mo>=</mo><mfrac><mn>1</mn><msup><mn>10000</mn><mrow><mn>2</mn><mi>i</mi><mi mathvariant=\"normal\">/</mi><mi>d</mi></mrow></msup></mfrac></mrow><annotation encoding=\"application/x-tex\">\\omega_i = \\frac{1}{10000^{2i/d}}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.5806em;vertical-align:-0.15em\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">ω</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.0254em;vertical-align:-0.704em\"></span><span class=\"mord\"><span class=\"mopen 
nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.3214em\"><span style=\"top:-2.296em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord\">1000</span><span class=\"mord\"><span class=\"mord\">0</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.814em\"><span style=\"top:-2.989em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">2</span><span class=\"mord mathnormal mtight\">i</span><span class=\"mord mtight\">/</span><span class=\"mord mathnormal mtight\">d</span></span></span></span></span></span></span></span></span></span></span><span style=\"top:-3.23em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em\"></span></span><span style=\"top:-3.677em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.704em\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span></span>\n<p>To find this transformation matrix, we can express it as a general 2×2 matrix with unknown coefficients <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>u</mi><mn>1</mn></msub></mrow><annotation encoding=\"application/x-tex\">u_1</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.5806em;vertical-align:-0.15em\"></span><span class=\"mord\"><span class=\"mord mathnormal\">u</span><span class=\"msupsub\"><span 
class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span></span></span></span>, <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>v</mi><mn>1</mn></msub></mrow><annotation encoding=\"application/x-tex\">v_1</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.5806em;vertical-align:-0.15em\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">v</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span></span></span></span>, <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>u</mi><mn>2</mn></msub></mrow><annotation encoding=\"application/x-tex\">u_2</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.5806em;vertical-align:-0.15em\"></span><span class=\"mord\"><span class=\"mord mathnormal\">u</span><span 
class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span></span></span></span>, and <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi>v</mi><mn>2</mn></msub></mrow><annotation encoding=\"application/x-tex\">v_2</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.5806em;vertical-align:-0.15em\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">v</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span></span></span></span>:</p>\n<span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mrow><mo fence=\"true\">[</mo><mtable rowspacing=\"0.16em\" columnalign=\"center center\" columnspacing=\"1em\"><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><msub><mi>u</mi><mn>1</mn></msub></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" 
displaystyle=\"false\"><msub><mi>v</mi><mn>1</mn></msub></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><msub><mi>u</mi><mn>2</mn></msub></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><msub><mi>v</mi><mn>2</mn></msub></mstyle></mtd></mtr></mtable><mo fence=\"true\">]</mo></mrow><mo>⋅</mo><mrow><mo fence=\"true\">[</mo><mtable rowspacing=\"0.16em\" columnalign=\"center\" columnspacing=\"1em\"><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mrow><mi>sin</mi><mo>⁡</mo><mo stretchy=\"false\">(</mo><msub><mi>ω</mi><mi>i</mi></msub><mi>p</mi><mo stretchy=\"false\">)</mo></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mrow><mi>cos</mi><mo>⁡</mo><mo stretchy=\"false\">(</mo><msub><mi>ω</mi><mi>i</mi></msub><mi>p</mi><mo stretchy=\"false\">)</mo></mrow></mstyle></mtd></mtr></mtable><mo fence=\"true\">]</mo></mrow><mo>=</mo><mrow><mo fence=\"true\">[</mo><mtable rowspacing=\"0.16em\" columnalign=\"center\" columnspacing=\"1em\"><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mrow><mi>sin</mi><mo>⁡</mo><mo stretchy=\"false\">(</mo><msub><mi>ω</mi><mi>i</mi></msub><mo stretchy=\"false\">(</mo><mi>p</mi><mo>+</mo><mi>k</mi><mo stretchy=\"false\">)</mo><mo stretchy=\"false\">)</mo></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mrow><mi>cos</mi><mo>⁡</mo><mo stretchy=\"false\">(</mo><msub><mi>ω</mi><mi>i</mi></msub><mo stretchy=\"false\">(</mo><mi>p</mi><mo>+</mo><mi>k</mi><mo stretchy=\"false\">)</mo><mo stretchy=\"false\">)</mo></mrow></mstyle></mtd></mtr></mtable><mo fence=\"true\">]</mo></mrow></mrow><annotation encoding=\"application/x-tex\">\\begin{bmatrix} u_1 &amp; v_1 \\\\ u_2 &amp; v_2 \\end{bmatrix} \\cdot \\begin{bmatrix} \\sin(\\omega_i p) \\\\ \\cos(\\omega_i p) \\end{bmatrix} = \\begin{bmatrix} \\sin(\\omega_i(p+k)) \\\\ \\cos(\\omega_i(p+k)) \\end{bmatrix}</annotation></semantics></math></span><span 
class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:2.4em;vertical-align:-0.95em\"></span><span class=\"minner\"><span class=\"mopen delimcenter\" style=\"top:0em\"><span class=\"delimsizing size3\">[</span></span><span class=\"mord\"><span class=\"mtable\"><span class=\"col-align-c\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.45em\"><span style=\"top:-3.61em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\">u</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span></span></span><span style=\"top:-2.41em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\">u</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.95em\"><span></span></span></span></span></span><span class=\"arraycolsep\" 
style=\"width:0.5em\"></span><span class=\"arraycolsep\" style=\"width:0.5em\"></span><span class=\"col-align-c\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.45em\"><span style=\"top:-3.61em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">v</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span></span></span><span style=\"top:-2.41em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">v</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.95em\"><span></span></span></span></span></span></span></span><span class=\"mclose delimcenter\" style=\"top:0em\"><span class=\"delimsizing size3\">]</span></span></span><span class=\"mspace\" 
style=\"margin-right:0.2222em\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.4em;vertical-align:-0.95em\"></span><span class=\"minner\"><span class=\"mopen delimcenter\" style=\"top:0em\"><span class=\"delimsizing size3\">[</span></span><span class=\"mord\"><span class=\"mtable\"><span class=\"col-align-c\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.45em\"><span style=\"top:-3.61em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mop\">sin</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">ω</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mord mathnormal\">p</span><span class=\"mclose\">)</span></span></span><span style=\"top:-2.41em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mop\">cos</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">ω</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal 
mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mord mathnormal\">p</span><span class=\"mclose\">)</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.95em\"><span></span></span></span></span></span></span></span><span class=\"mclose delimcenter\" style=\"top:0em\"><span class=\"delimsizing size3\">]</span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.4em;vertical-align:-0.95em\"></span><span class=\"minner\"><span class=\"mopen delimcenter\" style=\"top:0em\"><span class=\"delimsizing size3\">[</span></span><span class=\"mord\"><span class=\"mtable\"><span class=\"col-align-c\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.45em\"><span style=\"top:-3.61em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mop\">sin</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">ω</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">p</span><span class=\"mspace\" 
style=\"margin-right:0.2222em\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.03148em\">k</span><span class=\"mclose\">))</span></span></span><span style=\"top:-2.41em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mop\">cos</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">ω</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">p</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.03148em\">k</span><span class=\"mclose\">))</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.95em\"><span></span></span></span></span></span></span></span><span class=\"mclose delimcenter\" style=\"top:0em\"><span class=\"delimsizing size3\">]</span></span></span></span></span></span></span>\n<p>By applying the trigonometric addition theorem to the right-hand side, we can expand this into:</p>\n<span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mrow><mo fence=\"true\">[</mo><mtable 
rowspacing=\"0.16em\" columnalign=\"center center\" columnspacing=\"1em\"><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><msub><mi>u</mi><mn>1</mn></msub></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><msub><mi>v</mi><mn>1</mn></msub></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><msub><mi>u</mi><mn>2</mn></msub></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><msub><mi>v</mi><mn>2</mn></msub></mstyle></mtd></mtr></mtable><mo fence=\"true\">]</mo></mrow><mo>⋅</mo><mrow><mo fence=\"true\">[</mo><mtable rowspacing=\"0.16em\" columnalign=\"center\" columnspacing=\"1em\"><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mrow><mi>sin</mi><mo>⁡</mo><mo stretchy=\"false\">(</mo><msub><mi>ω</mi><mi>i</mi></msub><mi>p</mi><mo stretchy=\"false\">)</mo></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mrow><mi>cos</mi><mo>⁡</mo><mo stretchy=\"false\">(</mo><msub><mi>ω</mi><mi>i</mi></msub><mi>p</mi><mo stretchy=\"false\">)</mo></mrow></mstyle></mtd></mtr></mtable><mo fence=\"true\">]</mo></mrow><mo>=</mo><mrow><mo fence=\"true\">[</mo><mtable rowspacing=\"0.16em\" columnalign=\"center\" columnspacing=\"1em\"><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mrow><mi>sin</mi><mo>⁡</mo><mo stretchy=\"false\">(</mo><msub><mi>ω</mi><mi>i</mi></msub><mi>p</mi><mo stretchy=\"false\">)</mo><mi>cos</mi><mo>⁡</mo><mo stretchy=\"false\">(</mo><msub><mi>ω</mi><mi>i</mi></msub><mi>k</mi><mo stretchy=\"false\">)</mo><mo>+</mo><mi>cos</mi><mo>⁡</mo><mo stretchy=\"false\">(</mo><msub><mi>ω</mi><mi>i</mi></msub><mi>p</mi><mo stretchy=\"false\">)</mo><mi>sin</mi><mo>⁡</mo><mo stretchy=\"false\">(</mo><msub><mi>ω</mi><mi>i</mi></msub><mi>k</mi><mo stretchy=\"false\">)</mo></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mrow><mi>cos</mi><mo>⁡</mo><mo 
stretchy=\"false\">(</mo><msub><mi>ω</mi><mi>i</mi></msub><mi>p</mi><mo stretchy=\"false\">)</mo><mi>cos</mi><mo>⁡</mo><mo stretchy=\"false\">(</mo><msub><mi>ω</mi><mi>i</mi></msub><mi>k</mi><mo stretchy=\"false\">)</mo><mo>−</mo><mi>sin</mi><mo>⁡</mo><mo stretchy=\"false\">(</mo><msub><mi>ω</mi><mi>i</mi></msub><mi>p</mi><mo stretchy=\"false\">)</mo><mi>sin</mi><mo>⁡</mo><mo stretchy=\"false\">(</mo><msub><mi>ω</mi><mi>i</mi></msub><mi>k</mi><mo stretchy=\"false\">)</mo></mrow></mstyle></mtd></mtr></mtable><mo fence=\"true\">]</mo></mrow></mrow><annotation encoding=\"application/x-tex\">\\begin{bmatrix} u_1 &amp; v_1 \\\\ u_2 &amp; v_2 \\end{bmatrix} \\cdot \\begin{bmatrix} \\sin(\\omega_i p) \\\\ \\cos(\\omega_i p) \\end{bmatrix} = \\begin{bmatrix} \\sin(\\omega_i p)\\cos(\\omega_i k) + \\cos(\\omega_i p)\\sin(\\omega_i k) \\\\ \\cos(\\omega_i p)\\cos(\\omega_i k) - \\sin(\\omega_i p)\\sin(\\omega_i k) \\end{bmatrix}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:2.4em;vertical-align:-0.95em\"></span><span class=\"minner\"><span class=\"mopen delimcenter\" style=\"top:0em\"><span class=\"delimsizing size3\">[</span></span><span class=\"mord\"><span class=\"mtable\"><span class=\"col-align-c\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.45em\"><span style=\"top:-3.61em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\">u</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span 
class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span></span></span><span style=\"top:-2.41em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\">u</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.95em\"><span></span></span></span></span></span><span class=\"arraycolsep\" style=\"width:0.5em\"></span><span class=\"arraycolsep\" style=\"width:0.5em\"></span><span class=\"col-align-c\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.45em\"><span style=\"top:-3.61em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">v</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span></span></span><span style=\"top:-2.41em\"><span class=\"pstrut\" style=\"height:3em\"></span><span 
class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">v</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.95em\"><span></span></span></span></span></span></span></span><span class=\"mclose delimcenter\" style=\"top:0em\"><span class=\"delimsizing size3\">]</span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.4em;vertical-align:-0.95em\"></span><span class=\"minner\"><span class=\"mopen delimcenter\" style=\"top:0em\"><span class=\"delimsizing size3\">[</span></span><span class=\"mord\"><span class=\"mtable\"><span class=\"col-align-c\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.45em\"><span style=\"top:-3.61em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mop\">sin</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">ω</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing 
reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mord mathnormal\">p</span><span class=\"mclose\">)</span></span></span><span style=\"top:-2.41em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mop\">cos</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">ω</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mord mathnormal\">p</span><span class=\"mclose\">)</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.95em\"><span></span></span></span></span></span></span></span><span class=\"mclose delimcenter\" style=\"top:0em\"><span class=\"delimsizing size3\">]</span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.4em;vertical-align:-0.95em\"></span><span class=\"minner\"><span class=\"mopen delimcenter\" style=\"top:0em\"><span class=\"delimsizing size3\">[</span></span><span class=\"mord\"><span class=\"mtable\"><span class=\"col-align-c\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span 
class=\"vlist\" style=\"height:1.45em\"><span style=\"top:-3.61em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mop\">sin</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">ω</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mord mathnormal\">p</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span class=\"mop\">cos</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">ω</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mord mathnormal\" style=\"margin-right:0.03148em\">k</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mop\">cos</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord 
mathnormal\" style=\"margin-right:0.03588em\">ω</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mord mathnormal\">p</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span class=\"mop\">sin</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">ω</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mord mathnormal\" style=\"margin-right:0.03148em\">k</span><span class=\"mclose\">)</span></span></span><span style=\"top:-2.41em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mop\">cos</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">ω</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" 
style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mord mathnormal\">p</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span class=\"mop\">cos</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">ω</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mord mathnormal\" style=\"margin-right:0.03148em\">k</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mop\">sin</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">ω</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mord mathnormal\">p</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span class=\"mop\">sin</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">ω</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mord mathnormal\" style=\"margin-right:0.03148em\">k</span><span class=\"mclose\">)</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.95em\"><span></span></span></span></span></span></span></span><span class=\"mclose delimcenter\" style=\"top:0em\"><span class=\"delimsizing size3\">]</span></span></span></span></span></span></span>\n<p>This expansion gives us a system of two equations by matching coefficients:</p>\n<span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mtable rowspacing=\"0.25em\" columnalign=\"right left\" columnspacing=\"0em\"><mtr><mtd class=\"mtr-glue\"></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><msub><mi>u</mi><mn>1</mn></msub><mi>sin</mi><mo>⁡</mo><mo stretchy=\"false\">(</mo><msub><mi>ω</mi><mi>i</mi></msub><mi>p</mi><mo stretchy=\"false\">)</mo><mo>+</mo><msub><mi>v</mi><mn>1</mn></msub><mi>cos</mi><mo>⁡</mo><mo 
stretchy=\"false\">(</mo><msub><mi>ω</mi><mi>i</mi></msub><mi>p</mi><mo stretchy=\"false\">)</mo></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mrow></mrow><mo>=</mo><mi>cos</mi><mo>⁡</mo><mo stretchy=\"false\">(</mo><msub><mi>ω</mi><mi>i</mi></msub><mi>k</mi><mo stretchy=\"false\">)</mo><mi>sin</mi><mo>⁡</mo><mo stretchy=\"false\">(</mo><msub><mi>ω</mi><mi>i</mi></msub><mi>p</mi><mo stretchy=\"false\">)</mo><mo>+</mo><mi>sin</mi><mo>⁡</mo><mo stretchy=\"false\">(</mo><msub><mi>ω</mi><mi>i</mi></msub><mi>k</mi><mo stretchy=\"false\">)</mo><mi>cos</mi><mo>⁡</mo><mo stretchy=\"false\">(</mo><msub><mi>ω</mi><mi>i</mi></msub><mi>p</mi><mo stretchy=\"false\">)</mo></mrow></mstyle></mtd><mtd class=\"mtr-glue\"></mtd><mtd class=\"mml-eqn-num\"></mtd></mtr><mtr><mtd class=\"mtr-glue\"></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><msub><mi>u</mi><mn>2</mn></msub><mi>sin</mi><mo>⁡</mo><mo stretchy=\"false\">(</mo><msub><mi>ω</mi><mi>i</mi></msub><mi>p</mi><mo stretchy=\"false\">)</mo><mo>+</mo><msub><mi>v</mi><mn>2</mn></msub><mi>cos</mi><mo>⁡</mo><mo stretchy=\"false\">(</mo><msub><mi>ω</mi><mi>i</mi></msub><mi>p</mi><mo stretchy=\"false\">)</mo></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mrow></mrow><mo>=</mo><mo>−</mo><mi>sin</mi><mo>⁡</mo><mo stretchy=\"false\">(</mo><msub><mi>ω</mi><mi>i</mi></msub><mi>k</mi><mo stretchy=\"false\">)</mo><mi>sin</mi><mo>⁡</mo><mo stretchy=\"false\">(</mo><msub><mi>ω</mi><mi>i</mi></msub><mi>p</mi><mo stretchy=\"false\">)</mo><mo>+</mo><mi>cos</mi><mo>⁡</mo><mo stretchy=\"false\">(</mo><msub><mi>ω</mi><mi>i</mi></msub><mi>k</mi><mo stretchy=\"false\">)</mo><mi>cos</mi><mo>⁡</mo><mo stretchy=\"false\">(</mo><msub><mi>ω</mi><mi>i</mi></msub><mi>p</mi><mo stretchy=\"false\">)</mo></mrow></mstyle></mtd><mtd class=\"mtr-glue\"></mtd><mtd class=\"mml-eqn-num\"></mtd></mtr></mtable><annotation 
encoding=\"application/x-tex\">\\begin{align}\nu_1\\sin(\\omega_i p) + v_1\\cos(\\omega_i p) &amp;= \\cos(\\omega_i k)\\sin(\\omega_i p) + \\sin(\\omega_i k)\\cos(\\omega_i p) \\\\\nu_2\\sin(\\omega_i p) + v_2\\cos(\\omega_i p) &amp;= -\\sin(\\omega_i k)\\sin(\\omega_i p) + \\cos(\\omega_i k)\\cos(\\omega_i p)\n\\end{align}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:3em;vertical-align:-1.25em\"></span><span class=\"mtable\"><span class=\"col-align-r\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.75em\"><span style=\"top:-3.91em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\">u</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span class=\"mop\">sin</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">ω</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mord mathnormal\">p</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">v</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span class=\"mop\">cos</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">ω</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mord mathnormal\">p</span><span class=\"mclose\">)</span></span></span><span style=\"top:-2.41em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\">u</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:0.3011em\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span class=\"mop\">sin</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">ω</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mord mathnormal\">p</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">v</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span 
class=\"mspace\" style=\"margin-right:0.1667em\"></span><span class=\"mop\">cos</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">ω</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mord mathnormal\">p</span><span class=\"mclose\">)</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.25em\"><span></span></span></span></span></span><span class=\"col-align-l\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.75em\"><span style=\"top:-3.91em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord\"></span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mop\">cos</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">ω</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mord mathnormal\" style=\"margin-right:0.03148em\">k</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span class=\"mop\">sin</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">ω</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mord mathnormal\">p</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mop\">sin</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">ω</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mord mathnormal\" style=\"margin-right:0.03148em\">k</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span 
class=\"mop\">cos</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">ω</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mord mathnormal\">p</span><span class=\"mclose\">)</span></span></span><span style=\"top:-2.41em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord\"></span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mord\">−</span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span class=\"mop\">sin</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">ω</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mord mathnormal\" style=\"margin-right:0.03148em\">k</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span 
class=\"mop\">sin</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">ω</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mord mathnormal\">p</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mop\">cos</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">ω</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mord mathnormal\" style=\"margin-right:0.03148em\">k</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span class=\"mop\">cos</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">ω</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:0.3117em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mord mathnormal\">p</span><span class=\"mclose\">)</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.25em\"><span></span></span></span></span></span></span></span><span class=\"tag\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.75em\"><span style=\"top:-3.75em\"><span class=\"pstrut\" style=\"height:2.84em\"></span><span class=\"eqn-num\"></span></span><span style=\"top:-2.25em\"><span class=\"pstrut\" style=\"height:2.84em\"></span><span class=\"eqn-num\"></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.25em\"><span></span></span></span></span></span></span></span></span>\n<p>By comparing terms with <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>sin</mi><mo>⁡</mo><mo stretchy=\"false\">(</mo><msub><mi>ω</mi><mi>i</mi></msub><mi>p</mi><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">\\sin(\\omega_i p)</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em\"></span><span class=\"mop\">sin</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">ω</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:0.3117em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mord mathnormal\">p</span><span class=\"mclose\">)</span></span></span></span> and <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>cos</mi><mo>⁡</mo><mo stretchy=\"false\">(</mo><msub><mi>ω</mi><mi>i</mi></msub><mi>p</mi><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">\\cos(\\omega_i p)</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em\"></span><span class=\"mop\">cos</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">ω</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mord mathnormal\">p</span><span class=\"mclose\">)</span></span></span></span> on both sides, we can solve for the unknown coefficients:</p>\n<span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" 
display=\"block\"><semantics><mtable rowspacing=\"0.25em\" columnalign=\"right left right left\" columnspacing=\"0em 1em 0em\"><mtr><mtd class=\"mtr-glue\"></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><msub><mi>u</mi><mn>1</mn></msub></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mrow></mrow><mo>=</mo><mi>cos</mi><mo>⁡</mo><mo stretchy=\"false\">(</mo><msub><mi>ω</mi><mi>i</mi></msub><mi>k</mi><mo stretchy=\"false\">)</mo></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><msub><mi>v</mi><mn>1</mn></msub></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mrow></mrow><mo>=</mo><mi>sin</mi><mo>⁡</mo><mo stretchy=\"false\">(</mo><msub><mi>ω</mi><mi>i</mi></msub><mi>k</mi><mo stretchy=\"false\">)</mo></mrow></mstyle></mtd><mtd class=\"mtr-glue\"></mtd><mtd class=\"mml-eqn-num\"></mtd></mtr><mtr><mtd class=\"mtr-glue\"></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><msub><mi>u</mi><mn>2</mn></msub></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mrow></mrow><mo>=</mo><mo>−</mo><mi>sin</mi><mo>⁡</mo><mo stretchy=\"false\">(</mo><msub><mi>ω</mi><mi>i</mi></msub><mi>k</mi><mo stretchy=\"false\">)</mo></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><msub><mi>v</mi><mn>2</mn></msub></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mrow></mrow><mo>=</mo><mi>cos</mi><mo>⁡</mo><mo stretchy=\"false\">(</mo><msub><mi>ω</mi><mi>i</mi></msub><mi>k</mi><mo stretchy=\"false\">)</mo></mrow></mstyle></mtd><mtd class=\"mtr-glue\"></mtd><mtd class=\"mml-eqn-num\"></mtd></mtr></mtable><annotation encoding=\"application/x-tex\">\\begin{align}\nu_1 &amp;= \\cos(\\omega_i k) &amp; v_1 &amp;= \\sin(\\omega_i k) \\\\\nu_2 &amp;= -\\sin(\\omega_i k) &amp; v_2 &amp;= \\cos(\\omega_i k)\n\\end{align}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span 
class=\"strut\" style=\"height:3em;vertical-align:-1.25em\"></span><span class=\"mtable\"><span class=\"col-align-r\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.75em\"><span style=\"top:-3.91em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\">u</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span></span></span><span style=\"top:-2.41em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\">u</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.25em\"><span></span></span></span></span></span><span class=\"col-align-l\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.75em\"><span style=\"top:-3.91em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span 
class=\"mord\"></span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mop\">cos</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">ω</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mord mathnormal\" style=\"margin-right:0.03148em\">k</span><span class=\"mclose\">)</span></span></span><span style=\"top:-2.41em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord\"></span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mord\">−</span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span class=\"mop\">sin</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">ω</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mord mathnormal\" style=\"margin-right:0.03148em\">k</span><span class=\"mclose\">)</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.25em\"><span></span></span></span></span></span><span class=\"arraycolsep\" style=\"width:1em\"></span><span class=\"col-align-r\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.75em\"><span style=\"top:-3.91em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">v</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span></span></span><span style=\"top:-2.41em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">v</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span></span></span></span><span 
class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.25em\"><span></span></span></span></span></span><span class=\"col-align-l\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.75em\"><span style=\"top:-3.91em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord\"></span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mop\">sin</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">ω</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mord mathnormal\" style=\"margin-right:0.03148em\">k</span><span class=\"mclose\">)</span></span></span><span style=\"top:-2.41em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord\"></span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mop\">cos</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">ω</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span 
class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mord mathnormal\" style=\"margin-right:0.03148em\">k</span><span class=\"mclose\">)</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.25em\"><span></span></span></span></span></span></span></span><span class=\"tag\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.75em\"><span style=\"top:-3.75em\"><span class=\"pstrut\" style=\"height:2.84em\"></span><span class=\"eqn-num\"></span></span><span style=\"top:-2.25em\"><span class=\"pstrut\" style=\"height:2.84em\"></span><span class=\"eqn-num\"></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.25em\"><span></span></span></span></span></span></span></span></span>\n<p>These solutions give us our final transformation matrix <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi mathvariant=\"bold\">M</mi><mi mathvariant=\"bold\">k</mi></msub></mrow><annotation encoding=\"application/x-tex\">\\mathbf{M_k}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8361em;vertical-align:-0.15em\"></span><span class=\"mord\"><span class=\"mord mathbf\">M</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 
mtight\"><span class=\"mord mathbf mtight\">k</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span></span></span></span>:</p>\n<span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><msub><mi mathvariant=\"bold\">M</mi><mi mathvariant=\"bold\">k</mi></msub><mo>=</mo><mrow><mo fence=\"true\">[</mo><mtable rowspacing=\"0.16em\" columnalign=\"center center\" columnspacing=\"1em\"><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mrow><mi>cos</mi><mo>⁡</mo><mo stretchy=\"false\">(</mo><msub><mi>ω</mi><mi>i</mi></msub><mi>k</mi><mo stretchy=\"false\">)</mo></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mrow><mi>sin</mi><mo>⁡</mo><mo stretchy=\"false\">(</mo><msub><mi>ω</mi><mi>i</mi></msub><mi>k</mi><mo stretchy=\"false\">)</mo></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mrow><mo>−</mo><mi>sin</mi><mo>⁡</mo><mo stretchy=\"false\">(</mo><msub><mi>ω</mi><mi>i</mi></msub><mi>k</mi><mo stretchy=\"false\">)</mo></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mrow><mi>cos</mi><mo>⁡</mo><mo stretchy=\"false\">(</mo><msub><mi>ω</mi><mi>i</mi></msub><mi>k</mi><mo stretchy=\"false\">)</mo></mrow></mstyle></mtd></mtr></mtable><mo fence=\"true\">]</mo></mrow></mrow><annotation encoding=\"application/x-tex\">\\mathbf{M_k} = \\begin{bmatrix} \\cos(\\omega_i k) &amp; \\sin(\\omega_i k) \\\\ -\\sin(\\omega_i k) &amp; \\cos(\\omega_i k) \\end{bmatrix}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8361em;vertical-align:-0.15em\"></span><span class=\"mord\"><span class=\"mord mathbf\">M</span><span class=\"msupsub\"><span class=\"vlist-t 
vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathbf mtight\">k</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.4em;vertical-align:-0.95em\"></span><span class=\"minner\"><span class=\"mopen delimcenter\" style=\"top:0em\"><span class=\"delimsizing size3\">[</span></span><span class=\"mord\"><span class=\"mtable\"><span class=\"col-align-c\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.45em\"><span style=\"top:-3.61em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mop\">cos</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">ω</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mord mathnormal\" style=\"margin-right:0.03148em\">k</span><span class=\"mclose\">)</span></span></span><span style=\"top:-2.41em\"><span class=\"pstrut\" 
style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord\">−</span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span class=\"mop\">sin</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">ω</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mord mathnormal\" style=\"margin-right:0.03148em\">k</span><span class=\"mclose\">)</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.95em\"><span></span></span></span></span></span><span class=\"arraycolsep\" style=\"width:0.5em\"></span><span class=\"arraycolsep\" style=\"width:0.5em\"></span><span class=\"col-align-c\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.45em\"><span style=\"top:-3.61em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mop\">sin</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">ω</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span 
class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mord mathnormal\" style=\"margin-right:0.03148em\">k</span><span class=\"mclose\">)</span></span></span><span style=\"top:-2.41em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mop\">cos</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">ω</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mord mathnormal\" style=\"margin-right:0.03148em\">k</span><span class=\"mclose\">)</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.95em\"><span></span></span></span></span></span></span></span><span class=\"mclose delimcenter\" style=\"top:0em\"><span class=\"delimsizing size3\">]</span></span></span></span></span></span></span>\n<p>If you&#x27;ve done any game programming before, you might notice that the\nresult of our derivation is oddly familiar. 
That&#x27;s right, it&#x27;s the <a href=\"https://en.wikipedia.org/wiki/Rotation_matrix\" rel=\"nofollow\" target=\"_blank\">Rotation Matrix</a>!<sup><a href=\"#user-content-fn-3\" id=\"user-content-fnref-3\" data-footnote-ref=\"true\" aria-describedby=\"footnote-label\">3</a></sup></p>\n<p>So the encoding scheme designed by <a href=\"https://en.wikipedia.org/wiki/Noam_Shazeer\" rel=\"nofollow\" target=\"_blank\">Noam Shazeer</a> in <a href=\"https://arxiv.org/abs/1706.03762\" rel=\"nofollow\" target=\"_blank\">Attention Is All You Need</a> was already encoding relative position as a rotation back in 2017! It took another <strong>4 years</strong> to go from Sinusoidal Encoding to RoPE, despite rotations already being on the table...</p>\n<h2>Absolute vs Relative Position Encoding</h2>\n<p>Knowing that rotations are important here, let&#x27;s\nreturn to our motivating example and try to build some intuition for our next iteration.</p>\n<span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mtable rowspacing=\"0.25em\" columnalign=\"right left\" columnspacing=\"0em\"><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mrow></mrow><mspace width=\"0.7em\"></mspace><mn>0</mn><mspace width=\"1.4em\"></mspace><mn>1</mn><mspace width=\"2em\"></mspace><mn>2</mn><mspace width=\"2.6em\"></mspace><mn>3</mn><mspace width=\"2.4em\"></mspace><mn>4</mn></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mrow></mrow><mtext>The dog chased another dog</mtext></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" 
displaystyle=\"true\"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mrow></mrow><mspace width=\"0.3em\"></mspace><mtext>-2</mtext><mspace width=\"1.4em\"></mspace><mtext>-1</mtext><mspace width=\"1.7em\"></mspace><mn>0</mn><mspace width=\"2.6em\"></mspace><mn>1</mn><mspace width=\"2.4em\"></mspace><mn>2</mn></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"true\"><mrow><mrow></mrow><mrow><mtext>The dog </mtext><mstyle mathcolor=\"#699C52\"><mtext>chased </mtext><mstyle mathcolor=\"black\"><mtext>another dog</mtext></mstyle></mstyle></mrow></mrow></mstyle></mtd></mtr></mtable><annotation encoding=\"application/x-tex\">\\begin{align*}\n&amp;\\hspace{0.7em}0 \\hspace{1.4em} 1 \\hspace{2em} 2 \\hspace{2.6em} 3 \\hspace{2.4em} 4\\\\\n&amp;\\text{The dog chased another dog} \\\\\n\\\\\n&amp;\\hspace{0.3em}\\text{-2} \\hspace{1.4em} \\text{-1} \\hspace{1.7em} 0 \\hspace{2.6em} 1 \\hspace{2.4em} 2\\\\\n&amp;\\text{The dog \\color{#699C52}chased \\color{black}another dog}\n\\end{align*}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:7.5em;vertical-align:-3.5em\"></span><span class=\"mord\"><span class=\"mtable\"><span class=\"col-align-r\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:4em\"><span style=\"top:-6em\"><span class=\"pstrut\" style=\"height:2.84em\"></span><span class=\"mord\"></span></span><span style=\"top:-4.5em\"><span class=\"pstrut\" style=\"height:2.84em\"></span><span class=\"mord\"></span></span><span style=\"top:-3em\"><span class=\"pstrut\" style=\"height:2.84em\"></span><span class=\"mord\"></span></span><span style=\"top:-1.5em\"><span class=\"pstrut\" style=\"height:2.84em\"></span><span class=\"mord\"></span></span><span style=\"top:0em\"><span 
class=\"pstrut\" style=\"height:2.84em\"></span><span class=\"mord\"></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:3.5em\"><span></span></span></span></span></span><span class=\"col-align-l\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:4em\"><span style=\"top:-6.16em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord\"></span><span class=\"mspace\" style=\"margin-right:0.7em\"></span><span class=\"mord\">0</span><span class=\"mspace\" style=\"margin-right:1.4em\"></span><span class=\"mord\">1</span><span class=\"mspace\" style=\"margin-right:2em\"></span><span class=\"mord\">2</span><span class=\"mspace\" style=\"margin-right:2.6em\"></span><span class=\"mord\">3</span><span class=\"mspace\" style=\"margin-right:2.4em\"></span><span class=\"mord\">4</span></span></span><span style=\"top:-4.66em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord\"></span><span class=\"mord text\"><span class=\"mord\">The dog chased another dog</span></span></span></span><span style=\"top:-1.66em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord\"></span><span class=\"mspace\" style=\"margin-right:0.3em\"></span><span class=\"mord text\"><span class=\"mord\">-2</span></span><span class=\"mspace\" style=\"margin-right:1.4em\"></span><span class=\"mord text\"><span class=\"mord\">-1</span></span><span class=\"mspace\" style=\"margin-right:1.7em\"></span><span class=\"mord\">0</span><span class=\"mspace\" style=\"margin-right:2.6em\"></span><span class=\"mord\">1</span><span class=\"mspace\" style=\"margin-right:2.4em\"></span><span class=\"mord\">2</span></span></span><span style=\"top:-0.16em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord\"></span><span class=\"mord text\"><span 
class=\"mord\">The dog </span><span class=\"mord\" style=\"color:#699C52\">chased </span><span class=\"mord\" style=\"color:black\">another dog</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:3.5em\"><span></span></span></span></span></span></span></span></span></span></span></span>\n<p>Above we can see the absolute positions of our tokens, and the relative\npositions from <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mstyle mathcolor=\"#699C52\"><mtext>chased</mtext></mstyle></mrow><annotation encoding=\"application/x-tex\">\\color{#699C52}\\text{chased}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6944em\"></span><span class=\"mord text\" style=\"color:#699C52\"><span class=\"mord\" style=\"color:#699C52\">chased</span></span></span></span></span> to every other token. With Sinusoidal Encoding, we\ngenerated a separate vector that represents the absolute position,\nand using some trigonometric trickery we were able to encode relative positions.</p>\n<p>When we&#x27;re trying to understand these sentences, does it matter that <em>this</em> word is the 2157th word in this blog post? Or do we care about its relationship to the words around it? The absolute position of a word rarely matters for meaning; what matters is how words relate to each other.</p>\n<h2>Positional encoding in context</h2>\n<p>From this point on, it&#x27;s key to consider positional encoding <strong>in the context of</strong>\nself-attention. 
To reiterate, the self-attention mechanism enables the model to weigh the importance of different elements in an input sequence and dynamically adjust their influence on the output.</p>\n<span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mtext>Attn</mtext><mo stretchy=\"false\">(</mo><mi>Q</mi><mo separator=\"true\">,</mo><mi>K</mi><mo separator=\"true\">,</mo><mi>V</mi><mo stretchy=\"false\">)</mo><mo>=</mo><mtext>softmax</mtext><mrow><mo fence=\"true\">(</mo><mfrac><mrow><mi>Q</mi><msup><mi>K</mi><mi>T</mi></msup></mrow><msqrt><msub><mi>d</mi><mi>k</mi></msub></msqrt></mfrac><mo fence=\"true\">)</mo></mrow><mi>V</mi></mrow><annotation encoding=\"application/x-tex\">\\text{Attn}(Q, K, V) = \\text{softmax}\\left(\\frac{QK^T}{\\sqrt{d_k}}\\right)V</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em\"></span><span class=\"mord text\"><span class=\"mord\">Attn</span></span><span class=\"mopen\">(</span><span class=\"mord mathnormal\">Q</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.07153em\">K</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.22222em\">V</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.4684em;vertical-align:-0.95em\"></span><span class=\"mord text\"><span class=\"mord\">softmax</span></span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span class=\"minner\"><span class=\"mopen delimcenter\" 
style=\"top:0em\"><span class=\"delimsizing size3\">(</span></span><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.5183em\"><span style=\"top:-2.2528em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord sqrt\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8572em\"><span class=\"svg-align\" style=\"top:-3em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\" style=\"padding-left:0.833em\"><span class=\"mord\"><span class=\"mord mathnormal\">d</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.03148em\">k</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span></span></span><span style=\"top:-2.8172em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"hide-tail\" style=\"min-width:0.853em;height:1.08em\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"400em\" height=\"1.08em\" viewBox=\"0 0 400000 1080\" preserveAspectRatio=\"xMinYMin slice\"><path d=\"M95,702\nc-2.7,0,-7.17,-2.7,-13.5,-8c-5.8,-5.3,-9.5,-10,-9.5,-14\nc0,-2,0.3,-3.3,1,-4c1.3,-2.7,23.83,-20.7,67.5,-54\nc44.2,-33.3,65.8,-50.3,66.5,-51c1.3,-1.3,3,-2,5,-2c4.7,0,8.7,3.3,12,10\ns173,378,173,378c0.7,0,35.3,-71,104,-213c68.7,-142,137.5,-285,206.5,-429\nc69,-144,104.5,-217.7,106.5,-221\nl0 
-0\nc5.3,-9.3,12,-14,20,-14\nH400000v40H845.2724\ns-225.272,467,-225.272,467s-235,486,-235,486c-2.7,4.7,-9,7,-19,7\nc-6,0,-10,-1,-12,-3s-194,-422,-194,-422s-65,47,-65,47z\nM834 80h400000v40h-400000z\"></path></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.1828em\"><span></span></span></span></span></span></span></span><span style=\"top:-3.23em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em\"></span></span><span style=\"top:-3.677em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord mathnormal\">Q</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.07153em\">K</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8413em\"><span style=\"top:-3.063em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.13889em\">T</span></span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.93em\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span><span class=\"mclose delimcenter\" style=\"top:0em\"><span class=\"delimsizing size3\">)</span></span></span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.22222em\">V</span></span></span></span></span>\n<p>In all our previous iterations, we&#x27;ve generated a separate positional encoding\nvector and <strong>added</strong> it to our token embedding prior to our <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>Q</mi></mrow><annotation 
encoding=\"application/x-tex\">Q</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8778em;vertical-align:-0.1944em\"></span><span class=\"mord mathnormal\">Q</span></span></span></span>, <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>K</mi></mrow><annotation encoding=\"application/x-tex\">K</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.07153em\">K</span></span></span></span> and <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>V</mi></mrow><annotation encoding=\"application/x-tex\">V</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.22222em\">V</span></span></span></span> projections.\nBy adding the positional information directly to our token embedding, we are\n<strong>polluting</strong> the semantic information with the positional information. We should\nbe attempting to encode the information without modifying the norm. Shifting to multiplicative is the\nkey.</p>\n<p>Using the dictionary analogy, when looking up a word (query) in our dictionary (keys), nearby words should have more influence than distant ones. 
The influence of one token upon another is determined by the <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>Q</mi><msup><mi>K</mi><mi>T</mi></msup></mrow><annotation encoding=\"application/x-tex\">QK^T</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1.0358em;vertical-align:-0.1944em\"></span><span class=\"mord mathnormal\">Q</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.07153em\">K</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8413em\"><span style=\"top:-3.063em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.13889em\">T</span></span></span></span></span></span></span></span></span></span></span> dot product - so that&#x27;s exactly where we should focus our positional encoding!</p>\n<span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mover accent=\"true\"><mi>a</mi><mo>⃗</mo></mover><mo>⋅</mo><mover accent=\"true\"><mi>b</mi><mo>⃗</mo></mover><mo>=</mo><mi mathvariant=\"normal\">∣</mi><mover accent=\"true\"><mi>a</mi><mo>⃗</mo></mover><mi mathvariant=\"normal\">∣</mi><mi mathvariant=\"normal\">∣</mi><mover accent=\"true\"><mi>b</mi><mo>⃗</mo></mover><mi mathvariant=\"normal\">∣</mi><mi>cos</mi><mo>⁡</mo><mi>θ</mi></mrow><annotation encoding=\"application/x-tex\">\\vec{a} \\cdot \\vec{b} = |\\vec{a}| |\\vec{b}| \\cos \\theta</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.714em\"></span><span class=\"mord accent\"><span class=\"vlist-t\"><span 
class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.714em\"><span style=\"top:-3em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord mathnormal\">a</span></span><span style=\"top:-3em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"accent-body\" style=\"left:-0.2355em\"><span class=\"overlay\" style=\"height:0.714em;width:0.471em\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"0.471em\" height=\"0.714em\" style=\"width:0.471em\" viewBox=\"0 0 471 714\" preserveAspectRatio=\"xMinYMin\"><path d=\"M377 20c0-5.333 1.833-10 5.5-14S391 0 397 0c4.667 0 8.667 1.667 12 5\n3.333 2.667 6.667 9 10 19 6.667 24.667 20.333 43.667 41 57 7.333 4.667 11\n10.667 11 18 0 6-1 10-3 12s-6.667 5-14 9c-28.667 14.667-53.667 35.667-75 63\n-1.333 1.333-3.167 3.5-5.5 6.5s-4 4.833-5 5.5c-1 .667-2.5 1.333-4.5 2s-4.333 1\n-7 1c-4.667 0-9.167-1.833-13.5-5.5S337 184 337 178c0-12.667 15.667-32.333 47-59\nH213l-171-1c-8.667-6-13-12.333-13-19 0-4.667 4.333-11.333 13-20h359\nc-16-25.333-24-45-24-59z\"></path></svg></span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.9774em\"></span><span class=\"mord accent\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9774em\"><span style=\"top:-3em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord mathnormal\">b</span></span><span style=\"top:-3.2634em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"accent-body\" style=\"left:-0.2355em\"><span class=\"overlay\" style=\"height:0.714em;width:0.471em\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"0.471em\" height=\"0.714em\" style=\"width:0.471em\" viewBox=\"0 0 471 714\" preserveAspectRatio=\"xMinYMin\"><path d=\"M377 20c0-5.333 1.833-10 5.5-14S391 0 397 0c4.667 0 8.667 
1.667 12 5\n3.333 2.667 6.667 9 10 19 6.667 24.667 20.333 43.667 41 57 7.333 4.667 11\n10.667 11 18 0 6-1 10-3 12s-6.667 5-14 9c-28.667 14.667-53.667 35.667-75 63\n-1.333 1.333-3.167 3.5-5.5 6.5s-4 4.833-5 5.5c-1 .667-2.5 1.333-4.5 2s-4.333 1\n-7 1c-4.667 0-9.167-1.833-13.5-5.5S337 184 337 178c0-12.667 15.667-32.333 47-59\nH213l-171-1c-8.667-6-13-12.333-13-19 0-4.667 4.333-11.333 13-20h359\nc-16-25.333-24-45-24-59z\"></path></svg></span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1.2274em;vertical-align:-0.25em\"></span><span class=\"mord\">∣</span><span class=\"mord accent\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.714em\"><span style=\"top:-3em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord mathnormal\">a</span></span><span style=\"top:-3em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"accent-body\" style=\"left:-0.2355em\"><span class=\"overlay\" style=\"height:0.714em;width:0.471em\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"0.471em\" height=\"0.714em\" style=\"width:0.471em\" viewBox=\"0 0 471 714\" preserveAspectRatio=\"xMinYMin\"><path d=\"M377 20c0-5.333 1.833-10 5.5-14S391 0 397 0c4.667 0 8.667 1.667 12 5\n3.333 2.667 6.667 9 10 19 6.667 24.667 20.333 43.667 41 57 7.333 4.667 11\n10.667 11 18 0 6-1 10-3 12s-6.667 5-14 9c-28.667 14.667-53.667 35.667-75 63\n-1.333 1.333-3.167 3.5-5.5 6.5s-4 4.833-5 5.5c-1 .667-2.5 1.333-4.5 2s-4.333 1\n-7 1c-4.667 0-9.167-1.833-13.5-5.5S337 184 337 178c0-12.667 15.667-32.333 47-59\nH213l-171-1c-8.667-6-13-12.333-13-19 0-4.667 4.333-11.333 13-20h359\nc-16-25.333-24-45-24-59z\"></path></svg></span></span></span></span></span></span></span><span class=\"mord\">∣∣</span><span class=\"mord accent\"><span 
class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9774em\"><span style=\"top:-3em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord mathnormal\">b</span></span><span style=\"top:-3.2634em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"accent-body\" style=\"left:-0.2355em\"><span class=\"overlay\" style=\"height:0.714em;width:0.471em\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"0.471em\" height=\"0.714em\" style=\"width:0.471em\" viewBox=\"0 0 471 714\" preserveAspectRatio=\"xMinYMin\"><path d=\"M377 20c0-5.333 1.833-10 5.5-14S391 0 397 0c4.667 0 8.667 1.667 12 5\n3.333 2.667 6.667 9 10 19 6.667 24.667 20.333 43.667 41 57 7.333 4.667 11\n10.667 11 18 0 6-1 10-3 12s-6.667 5-14 9c-28.667 14.667-53.667 35.667-75 63\n-1.333 1.333-3.167 3.5-5.5 6.5s-4 4.833-5 5.5c-1 .667-2.5 1.333-4.5 2s-4.333 1\n-7 1c-4.667 0-9.167-1.833-13.5-5.5S337 184 337 178c0-12.667 15.667-32.333 47-59\nH213l-171-1c-8.667-6-13-12.333-13-19 0-4.667 4.333-11.333 13-20h359\nc-16-25.333-24-45-24-59z\"></path></svg></span></span></span></span></span></span></span><span class=\"mord\">∣</span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span class=\"mop\">cos</span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.02778em\">θ</span></span></span></span></span>\n<p>The geometric interpretation of the dot product shown above gives us a magnificent insight.\nWe can modulate the result of our dot product of two vectors purely by\nincreasing or decreasing the angle between them. 
Furthermore, by rotating the\nvector, we have absolutely zero impact on the norm of the vector, which encodes\nthe semantic information of our token.</p>\n<p>Now that we know where to focus our <em>attention</em>, and have seen from another <em>angle</em> why a\nrotation might be a sensible &quot;channel&quot; in which to encode our positional\ninformation, let&#x27;s put it all together!</p>\n<h2><strong>Ro</strong>tary <strong>P</strong>ositional <strong>E</strong>mbedding</h2>\n<p><strong>Ro</strong>tary <strong>P</strong>ositional <strong>E</strong>mbedding or RoPE was defined in the\n<a href=\"https://arxiv.org/pdf/2104.09864\" rel=\"nofollow\" target=\"_blank\">RoFormer paper</a> (<a href=\"https://x.com/bojone1993\" rel=\"nofollow\" target=\"_blank\">Jianlin Su</a> designed it independently on his blog <a href=\"https://kexue.fm/archives/8130\" rel=\"nofollow\" target=\"_blank\">here</a> and <a href=\"https://kexue.fm/archives/8265\" rel=\"nofollow\" target=\"_blank\">here</a>).\nWhile it may seem like voodoo if you skip to the end result, by thinking about Sinusoidal Encoding in the\ncontext of self-attention (and more specifically dot products), we can see how\nit all comes together.</p>\n<p>Much like in Sinusoidal Encoding, we decompose our vectors (<span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi mathvariant=\"bold\">q</mi></mrow><annotation encoding=\"application/x-tex\">\\mathbf{q}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6389em;vertical-align:-0.1944em\"></span><span class=\"mord mathbf\">q</span></span></span></span> or <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi mathvariant=\"bold\">k</mi></mrow><annotation encoding=\"application/x-tex\">\\mathbf{k}</annotation></semantics></math></span><span 
class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6944em\"></span><span class=\"mord mathbf\">k</span></span></span></span>, instead of pre-projection <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi mathvariant=\"bold\">x</mi></mrow><annotation encoding=\"application/x-tex\">\\mathbf{x}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.4444em\"></span><span class=\"mord mathbf\">x</span></span></span></span>) into 2D pairs/chunks. Rather than encoding <em>absolute</em> position directly by adding a vector we drew from sinusoidal functions of slowly decreasing frequencies, we cut to the chase and encode <em>relative</em> position by <strong>multiplying each pair with the rotation matrix</strong>.</p>\n<p>Let <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi mathvariant=\"bold\">q</mi></mrow><annotation encoding=\"application/x-tex\">\\mathbf{q}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6389em;vertical-align:-0.1944em\"></span><span class=\"mord mathbf\">q</span></span></span></span> or <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi mathvariant=\"bold\">k</mi></mrow><annotation encoding=\"application/x-tex\">\\mathbf{k}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6944em\"></span><span class=\"mord mathbf\">k</span></span></span></span> be our input vector at position <span class=\"katex\"><span class=\"katex-mathml\"><math 
xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>p</mi></mrow><annotation encoding=\"application/x-tex\">p</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.625em;vertical-align:-0.1944em\"></span><span class=\"mord mathnormal\">p</span></span></span></span>. We create a block diagonal matrix\nwhere <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi mathvariant=\"bold\">M</mi><mi mathvariant=\"bold\">i</mi></msub></mrow><annotation encoding=\"application/x-tex\">\\mathbf{M_i}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8361em;vertical-align:-0.15em\"></span><span class=\"mord\"><span class=\"mord mathbf\">M</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathbf mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span></span></span></span> is the corresponding rotation matrix for that component\npair&#x27;s desired rotation:</p>\n<span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mi>R</mi><mo stretchy=\"false\">(</mo><mi mathvariant=\"bold\">q</mi><mo separator=\"true\">,</mo><mi>p</mi><mo stretchy=\"false\">)</mo><mo>=</mo><mrow><mo fence=\"true\">(</mo><mtable rowspacing=\"0.16em\" columnalign=\"center center center center\" columnspacing=\"1em\"><mtr><mtd><mstyle 
scriptlevel=\"0\" displaystyle=\"false\"><msub><mi mathvariant=\"bold\">M</mi><mn mathvariant=\"bold\">1</mn></msub></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mrow></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><msub><mi mathvariant=\"bold\">M</mi><mn mathvariant=\"bold\">2</mn></msub></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mrow></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mo lspace=\"0em\" rspace=\"0em\">⋱</mo></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mrow></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><msub><mi mathvariant=\"bold\">M</mi><mrow><mi mathvariant=\"bold\">d</mi><mi mathvariant=\"bold\">/</mi><mn mathvariant=\"bold\">2</mn></mrow></msub></mstyle></mtd></mtr></mtable><mo fence=\"true\">)</mo></mrow><mrow><mo fence=\"true\">(</mo><mtable rowspacing=\"0.16em\" columnalign=\"center\" columnspacing=\"1em\"><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><msub><mi>q</mi><mn>1</mn></msub></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" 
displaystyle=\"false\"><msub><mi>q</mi><mn>2</mn></msub></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mrow><mi mathvariant=\"normal\">⋮</mi><mpadded height=\"0em\" voffset=\"0em\"><mspace mathbackground=\"black\" width=\"0em\" height=\"1.5em\"></mspace></mpadded></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><msub><mi>q</mi><mi>d</mi></msub></mstyle></mtd></mtr></mtable><mo fence=\"true\">)</mo></mrow></mrow><annotation encoding=\"application/x-tex\">R(\\mathbf{q}, p) = \\begin{pmatrix} \\mathbf{M_1} &amp; &amp; &amp; \\\\ &amp; \\mathbf{M_2} &amp; &amp; \\\\ &amp; &amp; \\ddots &amp; \\\\ &amp; &amp; &amp; \\mathbf{M_{d/2}} \\end{pmatrix} \\begin{pmatrix} q_1 \\\\ q_2 \\\\ \\vdots \\\\ q_d \\end{pmatrix} </annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.00773em\">R</span><span class=\"mopen\">(</span><span class=\"mord mathbf\">q</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span class=\"mord mathnormal\">p</span><span class=\"mclose\">)</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:5.46em;vertical-align:-2.48em\"></span><span class=\"minner\"><span class=\"mopen\"><span class=\"delimsizing mult\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:2.65em\"><span style=\"top:-4.65em\"><span class=\"pstrut\" style=\"height:6.8em\"></span><span style=\"width:0.875em;height:4.800em\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"0.875em\" height=\"4.800em\" viewBox=\"0 0 875 4800\"><path 
d=\"M863,9c0,-2,-2,-5,-6,-9c0,0,-17,0,-17,0c-12.7,0,-19.3,0.3,-20,1\nc-5.3,5.3,-10.3,11,-15,17c-242.7,294.7,-395.3,682,-458,1162c-21.3,163.3,-33.3,349,\n-36,557 l0,1284c0.2,6,0,26,0,60c2,159.3,10,310.7,24,454c53.3,528,210,\n949.7,470,1265c4.7,6,9.7,11.7,15,17c0.7,0.7,7,1,19,1c0,0,18,0,18,0c4,-4,6,-7,6,-9\nc0,-2.7,-3.3,-8.7,-10,-18c-135.3,-192.7,-235.5,-414.3,-300.5,-665c-65,-250.7,-102.5,\n-544.7,-112.5,-882c-2,-104,-3,-167,-3,-189\nl0,-1292c0,-162.7,5.7,-314,17,-454c20.7,-272,63.7,-513,129,-723c65.3,\n-210,155.3,-396.3,270,-559c6.7,-9.3,10,-15.3,10,-18z\"></path></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:2.15em\"><span></span></span></span></span></span></span><span class=\"mord\"><span class=\"mtable\"><span class=\"col-align-c\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:2.65em\"><span style=\"top:-4.81em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathbf\">M</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathbf mtight\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span></span></span><span style=\"top:-3.61em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"></span></span><span style=\"top:-2.41em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"></span></span><span style=\"top:-1.21em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"></span></span></span><span 
class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:2.15em\"><span></span></span></span></span></span><span class=\"arraycolsep\" style=\"width:0.5em\"></span><span class=\"arraycolsep\" style=\"width:0.5em\"></span><span class=\"col-align-c\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:2.65em\"><span style=\"top:-4.81em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"></span></span><span style=\"top:-3.61em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathbf\">M</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathbf mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span></span></span><span style=\"top:-2.41em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"></span></span><span style=\"top:-1.21em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:2.15em\"><span></span></span></span></span></span><span class=\"arraycolsep\" style=\"width:0.5em\"></span><span class=\"arraycolsep\" style=\"width:0.5em\"></span><span class=\"col-align-c\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:2.65em\"><span style=\"top:-4.81em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"></span></span><span style=\"top:-3.61em\"><span class=\"pstrut\" 
style=\"height:3em\"></span><span class=\"mord\"></span></span><span style=\"top:-2.41em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"minner\">⋱</span></span></span><span style=\"top:-1.21em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:2.15em\"><span></span></span></span></span></span><span class=\"arraycolsep\" style=\"width:0.5em\"></span><span class=\"arraycolsep\" style=\"width:0.5em\"></span><span class=\"col-align-c\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:2.65em\"><span style=\"top:-4.81em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"></span></span><span style=\"top:-3.61em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"></span></span><span style=\"top:-2.41em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"></span></span><span style=\"top:-1.21em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathbf\">M</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3448em\"><span style=\"top:-2.5198em;margin-left:0em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathbf mtight\">d/2</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3552em\"><span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:2.15em\"><span></span></span></span></span></span></span></span><span class=\"mclose\"><span 
class=\"delimsizing mult\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:2.65em\"><span style=\"top:-4.65em\"><span class=\"pstrut\" style=\"height:6.8em\"></span><span style=\"width:0.875em;height:4.800em\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"0.875em\" height=\"4.800em\" viewBox=\"0 0 875 4800\"><path d=\"M76,0c-16.7,0,-25,3,-25,9c0,2,2,6.3,6,13c21.3,28.7,42.3,60.3,\n63,95c96.7,156.7,172.8,332.5,228.5,527.5c55.7,195,92.8,416.5,111.5,664.5\nc11.3,139.3,17,290.7,17,454c0,28,1.7,43,3.3,45l0,1209\nc-3,4,-3.3,16.7,-3.3,38c0,162,-5.7,313.7,-17,455c-18.7,248,-55.8,469.3,-111.5,664\nc-55.7,194.7,-131.8,370.3,-228.5,527c-20.7,34.7,-41.7,66.3,-63,95c-2,3.3,-4,7,-6,11\nc0,7.3,5.7,11,17,11c0,0,11,0,11,0c9.3,0,14.3,-0.3,15,-1c5.3,-5.3,10.3,-11,15,-17\nc242.7,-294.7,395.3,-681.7,458,-1161c21.3,-164.7,33.3,-350.7,36,-558\nl0,-1344c-2,-159.3,-10,-310.7,-24,-454c-53.3,-528,-210,-949.7,\n-470,-1265c-4.7,-6,-9.7,-11.7,-15,-17c-0.7,-0.7,-6.7,-1,-18,-1z\"></path></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:2.15em\"><span></span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span class=\"minner\"><span class=\"mopen\"><span class=\"delimsizing mult\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:2.95em\"><span style=\"top:-4.95em\"><span class=\"pstrut\" style=\"height:7.4em\"></span><span style=\"width:0.875em;height:5.400em\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"0.875em\" height=\"5.400em\" viewBox=\"0 0 875 5400\"><path d=\"M863,9c0,-2,-2,-5,-6,-9c0,0,-17,0,-17,0c-12.7,0,-19.3,0.3,-20,1\nc-5.3,5.3,-10.3,11,-15,17c-242.7,294.7,-395.3,682,-458,1162c-21.3,163.3,-33.3,349,\n-36,557 
l0,1884c0.2,6,0,26,0,60c2,159.3,10,310.7,24,454c53.3,528,210,\n949.7,470,1265c4.7,6,9.7,11.7,15,17c0.7,0.7,7,1,19,1c0,0,18,0,18,0c4,-4,6,-7,6,-9\nc0,-2.7,-3.3,-8.7,-10,-18c-135.3,-192.7,-235.5,-414.3,-300.5,-665c-65,-250.7,-102.5,\n-544.7,-112.5,-882c-2,-104,-3,-167,-3,-189\nl0,-1892c0,-162.7,5.7,-314,17,-454c20.7,-272,63.7,-513,129,-723c65.3,\n-210,155.3,-396.3,270,-559c6.7,-9.3,10,-15.3,10,-18z\"></path></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:2.45em\"><span></span></span></span></span></span></span><span class=\"mord\"><span class=\"mtable\"><span class=\"col-align-c\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:2.98em\"><span style=\"top:-5.8275em\"><span class=\"pstrut\" style=\"height:3.6875em\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">q</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span></span></span><span style=\"top:-4.6275em\"><span class=\"pstrut\" style=\"height:3.6875em\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">q</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 
mtight\"><span class=\"mord mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span></span></span><span style=\"top:-2.7675em\"><span class=\"pstrut\" style=\"height:3.6875em\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord\">⋮</span><span class=\"mord rule\" style=\"border-right-width:0em;border-top-width:1.5em;bottom:0em\"></span></span></span></span><span style=\"top:-1.5675em\"><span class=\"pstrut\" style=\"height:3.6875em\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">q</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">d</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:2.48em\"><span></span></span></span></span></span></span></span><span class=\"mclose\"><span class=\"delimsizing mult\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:2.95em\"><span style=\"top:-4.95em\"><span class=\"pstrut\" style=\"height:7.4em\"></span><span style=\"width:0.875em;height:5.400em\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"0.875em\" height=\"5.400em\" viewBox=\"0 0 875 5400\"><path 
d=\"M76,0c-16.7,0,-25,3,-25,9c0,2,2,6.3,6,13c21.3,28.7,42.3,60.3,\n63,95c96.7,156.7,172.8,332.5,228.5,527.5c55.7,195,92.8,416.5,111.5,664.5\nc11.3,139.3,17,290.7,17,454c0,28,1.7,43,3.3,45l0,1809\nc-3,4,-3.3,16.7,-3.3,38c0,162,-5.7,313.7,-17,455c-18.7,248,-55.8,469.3,-111.5,664\nc-55.7,194.7,-131.8,370.3,-228.5,527c-20.7,34.7,-41.7,66.3,-63,95c-2,3.3,-4,7,-6,11\nc0,7.3,5.7,11,17,11c0,0,11,0,11,0c9.3,0,14.3,-0.3,15,-1c5.3,-5.3,10.3,-11,15,-17\nc242.7,-294.7,395.3,-681.7,458,-1161c21.3,-164.7,33.3,-350.7,36,-558\nl0,-1944c-2,-159.3,-10,-310.7,-24,-454c-53.3,-528,-210,-949.7,\n-470,-1265c-4.7,-6,-9.7,-11.7,-15,-17c-0.7,-0.7,-6.7,-1,-18,-1z\"></path></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:2.45em\"><span></span></span></span></span></span></span></span></span></span></span></span>\n<p>Much the same as Sinusoidal Encoding, <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><msub><mi mathvariant=\"bold\">M</mi><mi mathvariant=\"bold\">i</mi></msub></mrow><annotation encoding=\"application/x-tex\">\\mathbf{M_i}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8361em;vertical-align:-0.15em\"></span><span class=\"mord\"><span class=\"mord mathbf\">M</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathbf mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span></span></span></span> is simply:</p>\n<span 
class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><msub><mi mathvariant=\"bold\">M</mi><mi mathvariant=\"bold\">i</mi></msub><mo>=</mo><mrow><mo fence=\"true\">[</mo><mtable rowspacing=\"0.16em\" columnalign=\"center center\" columnspacing=\"1em\"><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mrow><mi>cos</mi><mo>⁡</mo><mo stretchy=\"false\">(</mo><msub><mi>ω</mi><mi>i</mi></msub><mi>p</mi><mo stretchy=\"false\">)</mo></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mrow><mi>sin</mi><mo>⁡</mo><mo stretchy=\"false\">(</mo><msub><mi>ω</mi><mi>i</mi></msub><mi>p</mi><mo stretchy=\"false\">)</mo></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mrow><mo>−</mo><mi>sin</mi><mo>⁡</mo><mo stretchy=\"false\">(</mo><msub><mi>ω</mi><mi>i</mi></msub><mi>p</mi><mo stretchy=\"false\">)</mo></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mrow><mi>cos</mi><mo>⁡</mo><mo stretchy=\"false\">(</mo><msub><mi>ω</mi><mi>i</mi></msub><mi>p</mi><mo stretchy=\"false\">)</mo></mrow></mstyle></mtd></mtr></mtable><mo fence=\"true\">]</mo></mrow></mrow><annotation encoding=\"application/x-tex\">\\mathbf{M_i} = \\begin{bmatrix} \\cos(\\omega_i p) &amp; \\sin(\\omega_i p) \\\\ -\\sin(\\omega_i p) &amp; \\cos(\\omega_i p) \\end{bmatrix}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8361em;vertical-align:-0.15em\"></span><span class=\"mord\"><span class=\"mord mathbf\">M</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord 
mathbf mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.4em;vertical-align:-0.95em\"></span><span class=\"minner\"><span class=\"mopen delimcenter\" style=\"top:0em\"><span class=\"delimsizing size3\">[</span></span><span class=\"mord\"><span class=\"mtable\"><span class=\"col-align-c\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.45em\"><span style=\"top:-3.61em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mop\">cos</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">ω</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mord mathnormal\">p</span><span class=\"mclose\">)</span></span></span><span style=\"top:-2.41em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord\">−</span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span class=\"mop\">sin</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">ω</span><span class=\"msupsub\"><span 
class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mord mathnormal\">p</span><span class=\"mclose\">)</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.95em\"><span></span></span></span></span></span><span class=\"arraycolsep\" style=\"width:0.5em\"></span><span class=\"arraycolsep\" style=\"width:0.5em\"></span><span class=\"col-align-c\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.45em\"><span style=\"top:-3.61em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mop\">sin</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">ω</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mord mathnormal\">p</span><span class=\"mclose\">)</span></span></span><span style=\"top:-2.41em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mop\">cos</span><span 
class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">ω</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">i</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mord mathnormal\">p</span><span class=\"mclose\">)</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.95em\"><span></span></span></span></span></span></span></span><span class=\"mclose delimcenter\" style=\"top:0em\"><span class=\"delimsizing size3\">]</span></span></span></span></span></span></span>\n<video src=\"/positional-encoding/RopeEncoding.mp4\"></video>\n<p>In practice, we don&#x27;t use a matrix multiplication to compute RoPE as it would be\ncomputationally inefficient with such a sparse matrix. 
Instead, we can directly apply the rotations to pairs of elements independently, taking advantage of the regular pattern in the computation:</p>\n<span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><msubsup><mi>R</mi><mrow><mi mathvariant=\"normal\">Θ</mi><mo separator=\"true\">,</mo><mi>p</mi></mrow><mi>d</mi></msubsup><mi>q</mi><mo>=</mo><mrow><mo fence=\"true\">(</mo><mtable rowspacing=\"0.16em\" columnalign=\"center\" columnspacing=\"1em\"><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><msub><mi>q</mi><mn>1</mn></msub></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><msub><mi>q</mi><mn>2</mn></msub></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><msub><mi>q</mi><mn>3</mn></msub></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><msub><mi>q</mi><mn>4</mn></msub></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mrow><mi mathvariant=\"normal\">⋮</mi><mpadded height=\"0em\" voffset=\"0em\"><mspace mathbackground=\"black\" width=\"0em\" height=\"1.5em\"></mspace></mpadded></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><msub><mi>q</mi><mrow><mi>d</mi><mo>−</mo><mn>1</mn></mrow></msub></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><msub><mi>q</mi><mi>d</mi></msub></mstyle></mtd></mtr></mtable><mo fence=\"true\">)</mo></mrow><mo>⊙</mo><mrow><mo fence=\"true\">(</mo><mtable rowspacing=\"0.16em\" columnalign=\"center\" columnspacing=\"1em\"><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mrow><mi>cos</mi><mo>⁡</mo><mi>p</mi><msub><mi>θ</mi><mn>1</mn></msub></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" 
displaystyle=\"false\"><mrow><mi>cos</mi><mo>⁡</mo><mi>p</mi><msub><mi>θ</mi><mn>1</mn></msub></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mrow><mi>cos</mi><mo>⁡</mo><mi>p</mi><msub><mi>θ</mi><mn>2</mn></msub></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mrow><mi>cos</mi><mo>⁡</mo><mi>p</mi><msub><mi>θ</mi><mn>2</mn></msub></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mrow><mi mathvariant=\"normal\">⋮</mi><mpadded height=\"0em\" voffset=\"0em\"><mspace mathbackground=\"black\" width=\"0em\" height=\"1.5em\"></mspace></mpadded></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mrow><mi>cos</mi><mo>⁡</mo><mi>p</mi><msub><mi>θ</mi><mrow><mi>d</mi><mi mathvariant=\"normal\">/</mi><mn>2</mn></mrow></msub></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mrow><mi>cos</mi><mo>⁡</mo><mi>p</mi><msub><mi>θ</mi><mrow><mi>d</mi><mi mathvariant=\"normal\">/</mi><mn>2</mn></mrow></msub></mrow></mstyle></mtd></mtr></mtable><mo fence=\"true\">)</mo></mrow><mo>+</mo><mrow><mo fence=\"true\">(</mo><mtable rowspacing=\"0.16em\" columnalign=\"center\" columnspacing=\"1em\"><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mrow><mo>−</mo><msub><mi>q</mi><mn>2</mn></msub></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><msub><mi>q</mi><mn>1</mn></msub></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mrow><mo>−</mo><msub><mi>q</mi><mn>4</mn></msub></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><msub><mi>q</mi><mn>3</mn></msub></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mrow><mi mathvariant=\"normal\">⋮</mi><mpadded height=\"0em\" voffset=\"0em\"><mspace mathbackground=\"black\" width=\"0em\" 
height=\"1.5em\"></mspace></mpadded></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mrow><mo>−</mo><msub><mi>q</mi><mi>d</mi></msub></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><msub><mi>q</mi><mrow><mi>d</mi><mo>−</mo><mn>1</mn></mrow></msub></mstyle></mtd></mtr></mtable><mo fence=\"true\">)</mo></mrow><mo>⊙</mo><mrow><mo fence=\"true\">(</mo><mtable rowspacing=\"0.16em\" columnalign=\"center\" columnspacing=\"1em\"><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mrow><mi>sin</mi><mo>⁡</mo><mi>p</mi><msub><mi>θ</mi><mn>1</mn></msub></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mrow><mi>sin</mi><mo>⁡</mo><mi>p</mi><msub><mi>θ</mi><mn>1</mn></msub></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mrow><mi>sin</mi><mo>⁡</mo><mi>p</mi><msub><mi>θ</mi><mn>2</mn></msub></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mrow><mi>sin</mi><mo>⁡</mo><mi>p</mi><msub><mi>θ</mi><mn>2</mn></msub></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mrow><mi mathvariant=\"normal\">⋮</mi><mpadded height=\"0em\" voffset=\"0em\"><mspace mathbackground=\"black\" width=\"0em\" height=\"1.5em\"></mspace></mpadded></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mrow><mi>sin</mi><mo>⁡</mo><mi>p</mi><msub><mi>θ</mi><mrow><mi>d</mi><mi mathvariant=\"normal\">/</mi><mn>2</mn></mrow></msub></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mrow><mi>sin</mi><mo>⁡</mo><mi>p</mi><msub><mi>θ</mi><mrow><mi>d</mi><mi mathvariant=\"normal\">/</mi><mn>2</mn></mrow></msub></mrow></mstyle></mtd></mtr></mtable><mo fence=\"true\">)</mo></mrow></mrow><annotation encoding=\"application/x-tex\">R_{\\Theta,p}^d q = \\begin{pmatrix} \nq_1 \\\\\nq_2 \\\\\nq_3 \\\\\nq_4 \\\\\n\\vdots 
\\\\\nq_{d-1} \\\\\nq_d\n\\end{pmatrix} \\odot \\begin{pmatrix}\n\\cos p\\theta_1 \\\\\n\\cos p\\theta_1 \\\\\n\\cos p\\theta_2 \\\\\n\\cos p\\theta_2 \\\\\n\\vdots \\\\\n\\cos p\\theta_{d/2} \\\\\n\\cos p\\theta_{d/2}\n\\end{pmatrix} + \\begin{pmatrix}\n-q_2 \\\\\nq_1 \\\\\n-q_4 \\\\\nq_3 \\\\\n\\vdots \\\\\n-q_d \\\\\nq_{d-1}\n\\end{pmatrix} \\odot \\begin{pmatrix}\n\\sin p\\theta_1 \\\\\n\\sin p\\theta_1 \\\\\n\\sin p\\theta_2 \\\\\n\\sin p\\theta_2 \\\\\n\\vdots \\\\\n\\sin p\\theta_{d/2} \\\\\n\\sin p\\theta_{d/2}\n\\end{pmatrix}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1.2822em;vertical-align:-0.3831em\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.00773em\">R</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8991em\"><span style=\"top:-2.453em;margin-left:-0.0077em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mtight\">Θ</span><span class=\"mpunct mtight\">,</span><span class=\"mord mathnormal mtight\">p</span></span></span></span><span style=\"top:-3.113em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">d</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3831em\"><span></span></span></span></span></span></span><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">q</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span></span><span class=\"base\"><span class=\"strut\" 
style=\"height:9.06em;vertical-align:-4.28em\"></span><span class=\"minner\"><span class=\"mopen\"><span class=\"delimsizing mult\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:4.7499em\"><span style=\"top:-6.7499em\"><span class=\"pstrut\" style=\"height:11em\"></span><span style=\"width:0.875em;height:9.000em\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"0.875em\" height=\"9.000em\" viewBox=\"0 0 875 9000\"><path d=\"M863,9c0,-2,-2,-5,-6,-9c0,0,-17,0,-17,0c-12.7,0,-19.3,0.3,-20,1\nc-5.3,5.3,-10.3,11,-15,17c-242.7,294.7,-395.3,682,-458,1162c-21.3,163.3,-33.3,349,\n-36,557 l0,5484c0.2,6,0,26,0,60c2,159.3,10,310.7,24,454c53.3,528,210,\n949.7,470,1265c4.7,6,9.7,11.7,15,17c0.7,0.7,7,1,19,1c0,0,18,0,18,0c4,-4,6,-7,6,-9\nc0,-2.7,-3.3,-8.7,-10,-18c-135.3,-192.7,-235.5,-414.3,-300.5,-665c-65,-250.7,-102.5,\n-544.7,-112.5,-882c-2,-104,-3,-167,-3,-189\nl0,-5492c0,-162.7,5.7,-314,17,-454c20.7,-272,63.7,-513,129,-723c65.3,\n-210,155.3,-396.3,270,-559c6.7,-9.3,10,-15.3,10,-18z\"></path></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:4.2501em\"><span></span></span></span></span></span></span><span class=\"mord\"><span class=\"mtable\"><span class=\"col-align-c\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:4.78em\"><span style=\"top:-7.6275em\"><span class=\"pstrut\" style=\"height:3.6875em\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">q</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">1</span></span></span></span><span 
class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span></span></span><span style=\"top:-6.4275em\"><span class=\"pstrut\" style=\"height:3.6875em\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">q</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span></span></span><span style=\"top:-5.2275em\"><span class=\"pstrut\" style=\"height:3.6875em\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">q</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">3</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span></span></span><span style=\"top:-4.0275em\"><span class=\"pstrut\" style=\"height:3.6875em\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">q</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em\"><span 
style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">4</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span></span></span><span style=\"top:-2.1675em\"><span class=\"pstrut\" style=\"height:3.6875em\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord\">⋮</span><span class=\"mord rule\" style=\"border-right-width:0em;border-top-width:1.5em;bottom:0em\"></span></span></span></span><span style=\"top:-0.9675em\"><span class=\"pstrut\" style=\"height:3.6875em\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">q</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">d</span><span class=\"mbin mtight\">−</span><span class=\"mord mtight\">1</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2083em\"><span></span></span></span></span></span></span></span></span><span style=\"top:0.2325em\"><span class=\"pstrut\" style=\"height:3.6875em\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">q</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing 
reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">d</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:4.28em\"><span></span></span></span></span></span></span></span><span class=\"mclose\"><span class=\"delimsizing mult\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:4.7499em\"><span style=\"top:-6.7499em\"><span class=\"pstrut\" style=\"height:11em\"></span><span style=\"width:0.875em;height:9.000em\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"0.875em\" height=\"9.000em\" viewBox=\"0 0 875 9000\"><path d=\"M76,0c-16.7,0,-25,3,-25,9c0,2,2,6.3,6,13c21.3,28.7,42.3,60.3,\n63,95c96.7,156.7,172.8,332.5,228.5,527.5c55.7,195,92.8,416.5,111.5,664.5\nc11.3,139.3,17,290.7,17,454c0,28,1.7,43,3.3,45l0,5409\nc-3,4,-3.3,16.7,-3.3,38c0,162,-5.7,313.7,-17,455c-18.7,248,-55.8,469.3,-111.5,664\nc-55.7,194.7,-131.8,370.3,-228.5,527c-20.7,34.7,-41.7,66.3,-63,95c-2,3.3,-4,7,-6,11\nc0,7.3,5.7,11,17,11c0,0,11,0,11,0c9.3,0,14.3,-0.3,15,-1c5.3,-5.3,10.3,-11,15,-17\nc242.7,-294.7,395.3,-681.7,458,-1161c21.3,-164.7,33.3,-350.7,36,-558\nl0,-5544c-2,-159.3,-10,-310.7,-24,-454c-53.3,-528,-210,-949.7,\n-470,-1265c-4.7,-6,-9.7,-11.7,-15,-17c-0.7,-0.7,-6.7,-1,-18,-1z\"></path></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:4.2501em\"><span></span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mbin\">⊙</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:9.06em;vertical-align:-4.28em\"></span><span class=\"minner\"><span class=\"mopen\"><span 
class=\"delimsizing mult\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:4.7499em\"><span style=\"top:-6.7499em\"><span class=\"pstrut\" style=\"height:11em\"></span><span style=\"width:0.875em;height:9.000em\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"0.875em\" height=\"9.000em\" viewBox=\"0 0 875 9000\"><path d=\"M863,9c0,-2,-2,-5,-6,-9c0,0,-17,0,-17,0c-12.7,0,-19.3,0.3,-20,1\nc-5.3,5.3,-10.3,11,-15,17c-242.7,294.7,-395.3,682,-458,1162c-21.3,163.3,-33.3,349,\n-36,557 l0,5484c0.2,6,0,26,0,60c2,159.3,10,310.7,24,454c53.3,528,210,\n949.7,470,1265c4.7,6,9.7,11.7,15,17c0.7,0.7,7,1,19,1c0,0,18,0,18,0c4,-4,6,-7,6,-9\nc0,-2.7,-3.3,-8.7,-10,-18c-135.3,-192.7,-235.5,-414.3,-300.5,-665c-65,-250.7,-102.5,\n-544.7,-112.5,-882c-2,-104,-3,-167,-3,-189\nl0,-5492c0,-162.7,5.7,-314,17,-454c20.7,-272,63.7,-513,129,-723c65.3,\n-210,155.3,-396.3,270,-559c6.7,-9.3,10,-15.3,10,-18z\"></path></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:4.2501em\"><span></span></span></span></span></span></span><span class=\"mord\"><span class=\"mtable\"><span class=\"col-align-c\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:4.78em\"><span style=\"top:-7.6275em\"><span class=\"pstrut\" style=\"height:3.6875em\"></span><span class=\"mord\"><span class=\"mop\">cos</span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span class=\"mord mathnormal\">p</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02778em\">θ</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em\"><span style=\"top:-2.55em;margin-left:-0.0278em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">1</span></span></span></span><span 
class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span></span></span><span style=\"top:-6.4275em\"><span class=\"pstrut\" style=\"height:3.6875em\"></span><span class=\"mord\"><span class=\"mop\">cos</span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span class=\"mord mathnormal\">p</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02778em\">θ</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em\"><span style=\"top:-2.55em;margin-left:-0.0278em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span></span></span><span style=\"top:-5.2275em\"><span class=\"pstrut\" style=\"height:3.6875em\"></span><span class=\"mord\"><span class=\"mop\">cos</span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span class=\"mord mathnormal\">p</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02778em\">θ</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em\"><span style=\"top:-2.55em;margin-left:-0.0278em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span></span></span><span style=\"top:-4.0275em\"><span class=\"pstrut\" style=\"height:3.6875em\"></span><span class=\"mord\"><span 
class=\"mop\">cos</span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span class=\"mord mathnormal\">p</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02778em\">θ</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em\"><span style=\"top:-2.55em;margin-left:-0.0278em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span></span></span><span style=\"top:-2.1675em\"><span class=\"pstrut\" style=\"height:3.6875em\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord\">⋮</span><span class=\"mord rule\" style=\"border-right-width:0em;border-top-width:1.5em;bottom:0em\"></span></span></span></span><span style=\"top:-0.9675em\"><span class=\"pstrut\" style=\"height:3.6875em\"></span><span class=\"mord\"><span class=\"mop\">cos</span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span class=\"mord mathnormal\">p</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02778em\">θ</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3448em\"><span style=\"top:-2.5198em;margin-left:-0.0278em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">d</span><span class=\"mord mtight\">/2</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3552em\"><span></span></span></span></span></span></span></span></span><span 
style=\"top:0.2325em\"><span class=\"pstrut\" style=\"height:3.6875em\"></span><span class=\"mord\"><span class=\"mop\">cos</span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span class=\"mord mathnormal\">p</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02778em\">θ</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3448em\"><span style=\"top:-2.5198em;margin-left:-0.0278em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">d</span><span class=\"mord mtight\">/2</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3552em\"><span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:4.28em\"><span></span></span></span></span></span></span></span><span class=\"mclose\"><span class=\"delimsizing mult\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:4.7499em\"><span style=\"top:-6.7499em\"><span class=\"pstrut\" style=\"height:11em\"></span><span style=\"width:0.875em;height:9.000em\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"0.875em\" height=\"9.000em\" viewBox=\"0 0 875 9000\"><path 
d=\"M76,0c-16.7,0,-25,3,-25,9c0,2,2,6.3,6,13c21.3,28.7,42.3,60.3,\n63,95c96.7,156.7,172.8,332.5,228.5,527.5c55.7,195,92.8,416.5,111.5,664.5\nc11.3,139.3,17,290.7,17,454c0,28,1.7,43,3.3,45l0,5409\nc-3,4,-3.3,16.7,-3.3,38c0,162,-5.7,313.7,-17,455c-18.7,248,-55.8,469.3,-111.5,664\nc-55.7,194.7,-131.8,370.3,-228.5,527c-20.7,34.7,-41.7,66.3,-63,95c-2,3.3,-4,7,-6,11\nc0,7.3,5.7,11,17,11c0,0,11,0,11,0c9.3,0,14.3,-0.3,15,-1c5.3,-5.3,10.3,-11,15,-17\nc242.7,-294.7,395.3,-681.7,458,-1161c21.3,-164.7,33.3,-350.7,36,-558\nl0,-5544c-2,-159.3,-10,-310.7,-24,-454c-53.3,-528,-210,-949.7,\n-470,-1265c-4.7,-6,-9.7,-11.7,-15,-17c-0.7,-0.7,-6.7,-1,-18,-1z\"></path></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:4.2501em\"><span></span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mbin\">+</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:9.06em;vertical-align:-4.28em\"></span><span class=\"minner\"><span class=\"mopen\"><span class=\"delimsizing mult\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:4.7499em\"><span style=\"top:-6.7499em\"><span class=\"pstrut\" style=\"height:11em\"></span><span style=\"width:0.875em;height:9.000em\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"0.875em\" height=\"9.000em\" viewBox=\"0 0 875 9000\"><path d=\"M863,9c0,-2,-2,-5,-6,-9c0,0,-17,0,-17,0c-12.7,0,-19.3,0.3,-20,1\nc-5.3,5.3,-10.3,11,-15,17c-242.7,294.7,-395.3,682,-458,1162c-21.3,163.3,-33.3,349,\n-36,557 
l0,5484c0.2,6,0,26,0,60c2,159.3,10,310.7,24,454c53.3,528,210,\n949.7,470,1265c4.7,6,9.7,11.7,15,17c0.7,0.7,7,1,19,1c0,0,18,0,18,0c4,-4,6,-7,6,-9\nc0,-2.7,-3.3,-8.7,-10,-18c-135.3,-192.7,-235.5,-414.3,-300.5,-665c-65,-250.7,-102.5,\n-544.7,-112.5,-882c-2,-104,-3,-167,-3,-189\nl0,-5492c0,-162.7,5.7,-314,17,-454c20.7,-272,63.7,-513,129,-723c65.3,\n-210,155.3,-396.3,270,-559c6.7,-9.3,10,-15.3,10,-18z\"></path></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:4.2501em\"><span></span></span></span></span></span></span><span class=\"mord\"><span class=\"mtable\"><span class=\"col-align-c\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:4.78em\"><span style=\"top:-7.6275em\"><span class=\"pstrut\" style=\"height:3.6875em\"></span><span class=\"mord\"><span class=\"mord\">−</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">q</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span></span></span><span style=\"top:-6.4275em\"><span class=\"pstrut\" style=\"height:3.6875em\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">q</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span 
class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span></span></span><span style=\"top:-5.2275em\"><span class=\"pstrut\" style=\"height:3.6875em\"></span><span class=\"mord\"><span class=\"mord\">−</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">q</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">4</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span></span></span><span style=\"top:-4.0275em\"><span class=\"pstrut\" style=\"height:3.6875em\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">q</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">3</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span></span></span><span style=\"top:-2.1675em\"><span class=\"pstrut\" style=\"height:3.6875em\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord\">⋮</span><span class=\"mord rule\" 
style=\"border-right-width:0em;border-top-width:1.5em;bottom:0em\"></span></span></span></span><span style=\"top:-0.9675em\"><span class=\"pstrut\" style=\"height:3.6875em\"></span><span class=\"mord\"><span class=\"mord\">−</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">q</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathnormal mtight\">d</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span></span></span><span style=\"top:0.2325em\"><span class=\"pstrut\" style=\"height:3.6875em\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">q</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3361em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">d</span><span class=\"mbin mtight\">−</span><span class=\"mord mtight\">1</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2083em\"><span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:4.28em\"><span></span></span></span></span></span></span></span><span class=\"mclose\"><span class=\"delimsizing mult\"><span class=\"vlist-t vlist-t2\"><span 
class=\"vlist-r\"><span class=\"vlist\" style=\"height:4.7499em\"><span style=\"top:-6.7499em\"><span class=\"pstrut\" style=\"height:11em\"></span><span style=\"width:0.875em;height:9.000em\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"0.875em\" height=\"9.000em\" viewBox=\"0 0 875 9000\"><path d=\"M76,0c-16.7,0,-25,3,-25,9c0,2,2,6.3,6,13c21.3,28.7,42.3,60.3,\n63,95c96.7,156.7,172.8,332.5,228.5,527.5c55.7,195,92.8,416.5,111.5,664.5\nc11.3,139.3,17,290.7,17,454c0,28,1.7,43,3.3,45l0,5409\nc-3,4,-3.3,16.7,-3.3,38c0,162,-5.7,313.7,-17,455c-18.7,248,-55.8,469.3,-111.5,664\nc-55.7,194.7,-131.8,370.3,-228.5,527c-20.7,34.7,-41.7,66.3,-63,95c-2,3.3,-4,7,-6,11\nc0,7.3,5.7,11,17,11c0,0,11,0,11,0c9.3,0,14.3,-0.3,15,-1c5.3,-5.3,10.3,-11,15,-17\nc242.7,-294.7,395.3,-681.7,458,-1161c21.3,-164.7,33.3,-350.7,36,-558\nl0,-5544c-2,-159.3,-10,-310.7,-24,-454c-53.3,-528,-210,-949.7,\n-470,-1265c-4.7,-6,-9.7,-11.7,-15,-17c-0.7,-0.7,-6.7,-1,-18,-1z\"></path></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:4.2501em\"><span></span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mbin\">⊙</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:9.06em;vertical-align:-4.28em\"></span><span class=\"minner\"><span class=\"mopen\"><span class=\"delimsizing mult\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:4.7499em\"><span style=\"top:-6.7499em\"><span class=\"pstrut\" style=\"height:11em\"></span><span style=\"width:0.875em;height:9.000em\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"0.875em\" height=\"9.000em\" viewBox=\"0 0 875 9000\"><path d=\"M863,9c0,-2,-2,-5,-6,-9c0,0,-17,0,-17,0c-12.7,0,-19.3,0.3,-20,1\nc-5.3,5.3,-10.3,11,-15,17c-242.7,294.7,-395.3,682,-458,1162c-21.3,163.3,-33.3,349,\n-36,557 
l0,5484c0.2,6,0,26,0,60c2,159.3,10,310.7,24,454c53.3,528,210,\n949.7,470,1265c4.7,6,9.7,11.7,15,17c0.7,0.7,7,1,19,1c0,0,18,0,18,0c4,-4,6,-7,6,-9\nc0,-2.7,-3.3,-8.7,-10,-18c-135.3,-192.7,-235.5,-414.3,-300.5,-665c-65,-250.7,-102.5,\n-544.7,-112.5,-882c-2,-104,-3,-167,-3,-189\nl0,-5492c0,-162.7,5.7,-314,17,-454c20.7,-272,63.7,-513,129,-723c65.3,\n-210,155.3,-396.3,270,-559c6.7,-9.3,10,-15.3,10,-18z\"></path></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:4.2501em\"><span></span></span></span></span></span></span><span class=\"mord\"><span class=\"mtable\"><span class=\"col-align-c\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:4.78em\"><span style=\"top:-7.6275em\"><span class=\"pstrut\" style=\"height:3.6875em\"></span><span class=\"mord\"><span class=\"mop\">sin</span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span class=\"mord mathnormal\">p</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02778em\">θ</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em\"><span style=\"top:-2.55em;margin-left:-0.0278em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span></span></span><span style=\"top:-6.4275em\"><span class=\"pstrut\" style=\"height:3.6875em\"></span><span class=\"mord\"><span class=\"mop\">sin</span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span class=\"mord mathnormal\">p</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02778em\">θ</span><span class=\"msupsub\"><span 
class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em\"><span style=\"top:-2.55em;margin-left:-0.0278em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">1</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span></span></span><span style=\"top:-5.2275em\"><span class=\"pstrut\" style=\"height:3.6875em\"></span><span class=\"mord\"><span class=\"mop\">sin</span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span class=\"mord mathnormal\">p</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02778em\">θ</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em\"><span style=\"top:-2.55em;margin-left:-0.0278em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span></span></span><span style=\"top:-4.0275em\"><span class=\"pstrut\" style=\"height:3.6875em\"></span><span class=\"mord\"><span class=\"mop\">sin</span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span class=\"mord mathnormal\">p</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02778em\">θ</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3011em\"><span style=\"top:-2.55em;margin-left:-0.0278em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span 
class=\"mord mtight\">2</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span></span></span><span style=\"top:-2.1675em\"><span class=\"pstrut\" style=\"height:3.6875em\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord\">⋮</span><span class=\"mord rule\" style=\"border-right-width:0em;border-top-width:1.5em;bottom:0em\"></span></span></span></span><span style=\"top:-0.9675em\"><span class=\"pstrut\" style=\"height:3.6875em\"></span><span class=\"mord\"><span class=\"mop\">sin</span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span class=\"mord mathnormal\">p</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02778em\">θ</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3448em\"><span style=\"top:-2.5198em;margin-left:-0.0278em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">d</span><span class=\"mord mtight\">/2</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3552em\"><span></span></span></span></span></span></span></span></span><span style=\"top:0.2325em\"><span class=\"pstrut\" style=\"height:3.6875em\"></span><span class=\"mord\"><span class=\"mop\">sin</span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span class=\"mord mathnormal\">p</span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.02778em\">θ</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3448em\"><span style=\"top:-2.5198em;margin-left:-0.0278em;margin-right:0.05em\"><span 
class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">d</span><span class=\"mord mtight\">/2</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3552em\"><span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:4.28em\"><span></span></span></span></span></span></span></span><span class=\"mclose\"><span class=\"delimsizing mult\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:4.7499em\"><span style=\"top:-6.7499em\"><span class=\"pstrut\" style=\"height:11em\"></span><span style=\"width:0.875em;height:9.000em\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"0.875em\" height=\"9.000em\" viewBox=\"0 0 875 9000\"><path d=\"M76,0c-16.7,0,-25,3,-25,9c0,2,2,6.3,6,13c21.3,28.7,42.3,60.3,\n63,95c96.7,156.7,172.8,332.5,228.5,527.5c55.7,195,92.8,416.5,111.5,664.5\nc11.3,139.3,17,290.7,17,454c0,28,1.7,43,3.3,45l0,5409\nc-3,4,-3.3,16.7,-3.3,38c0,162,-5.7,313.7,-17,455c-18.7,248,-55.8,469.3,-111.5,664\nc-55.7,194.7,-131.8,370.3,-228.5,527c-20.7,34.7,-41.7,66.3,-63,95c-2,3.3,-4,7,-6,11\nc0,7.3,5.7,11,17,11c0,0,11,0,11,0c9.3,0,14.3,-0.3,15,-1c5.3,-5.3,10.3,-11,15,-17\nc242.7,-294.7,395.3,-681.7,458,-1161c21.3,-164.7,33.3,-350.7,36,-558\nl0,-5544c-2,-159.3,-10,-310.7,-24,-454c-53.3,-528,-210,-949.7,\n-470,-1265c-4.7,-6,-9.7,-11.7,-15,-17c-0.7,-0.7,-6.7,-1,-18,-1z\"></path></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:4.2501em\"><span></span></span></span></span></span></span></span></span></span></span></span>\n<p>That&#x27;s all there is to it! 
By artfully applying our rotations to 2D chunks of <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi mathvariant=\"bold\">q</mi></mrow><annotation encoding=\"application/x-tex\">\\mathbf{q}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6389em;vertical-align:-0.1944em\"></span><span class=\"mord mathbf\">q</span></span></span></span> and\n<span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi mathvariant=\"bold\">k</mi></mrow><annotation encoding=\"application/x-tex\">\\mathbf{k}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6944em\"></span><span class=\"mord mathbf\">k</span></span></span></span> prior to their dot product, and switching from an additive to a\nmultiplicative scheme, we can gain a big performance boost in evaluations<sup><a href=\"#user-content-fn-4\" id=\"user-content-fnref-4\" data-footnote-ref=\"true\" aria-describedby=\"footnote-label\">4</a></sup>.</p>\n<h2>Extending RoPE to <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>n</mi></mrow><annotation encoding=\"application/x-tex\">n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.4306em\"></span><span class=\"mord mathnormal\">n</span></span></span></span>-Dimensions</h2>\n<p>We&#x27;ve explored the <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mn>1</mn><mi>D</mi></mrow><annotation encoding=\"application/x-tex\">1D</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span 
class=\"strut\" style=\"height:0.6833em\"></span><span class=\"mord\">1</span><span class=\"mord mathnormal\" style=\"margin-right:0.02778em\">D</span></span></span></span> case for RoPE and by this point I hope you&#x27;ve gained an\nintuitive understanding of an admittedly unintuitive component of transformers.\nFinally, let&#x27;s explore extending it to higher dimensions, such as images.</p>\n<p>A natural first intuition could be to directly use the <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mo fence=\"true\">[</mo><mtable rowspacing=\"0.16em\" columnalign=\"center\" columnspacing=\"1em\"><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mi>x</mi></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mi>y</mi></mstyle></mtd></mtr></mtable><mo fence=\"true\">]</mo></mrow><annotation encoding=\"application/x-tex\"> \\begin{bmatrix} x \\\\ y \\end{bmatrix}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:2.4em;vertical-align:-0.95em\"></span><span class=\"minner\"><span class=\"mopen delimcenter\" style=\"top:0em\"><span class=\"delimsizing size3\">[</span></span><span class=\"mord\"><span class=\"mtable\"><span class=\"col-align-c\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.45em\"><span style=\"top:-3.61em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord mathnormal\">x</span></span></span><span style=\"top:-2.41em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">y</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.95em\"><span></span></span></span></span></span></span></span><span class=\"mclose 
delimcenter\" style=\"top:0em\"><span class=\"delimsizing size3\">]</span></span></span></span></span></span> coordinate pairs from the image. This might seem reasonable: after all, we were almost arbitrarily pairing up our components previously. However, this would be a mistake!</p>\n<p>In the <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mn>1</mn><mi>D</mi></mrow><annotation encoding=\"application/x-tex\">1D</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em\"></span><span class=\"mord\">1</span><span class=\"mord mathnormal\" style=\"margin-right:0.02778em\">D</span></span></span></span> case, we encode the relative position <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>m</mi><mo>−</mo><mi>n</mi></mrow><annotation encoding=\"application/x-tex\">m - n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6667em;vertical-align:-0.0833em\"></span><span class=\"mord mathnormal\">m</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.4306em\"></span><span class=\"mord mathnormal\">n</span></span></span></span> through a rotation of pairs\nof values from our input vector. 
For <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mn>2</mn><mi>D</mi></mrow><annotation encoding=\"application/x-tex\">2D</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em\"></span><span class=\"mord\">2</span><span class=\"mord mathnormal\" style=\"margin-right:0.02778em\">D</span></span></span></span> data, we need to encode both horizontal and vertical relative positions, say <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>m</mi><mo>−</mo><mi>n</mi></mrow><annotation encoding=\"application/x-tex\">m - n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6667em;vertical-align:-0.0833em\"></span><span class=\"mord mathnormal\">m</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.4306em\"></span><span class=\"mord mathnormal\">n</span></span></span></span> and <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>i</mi><mo>−</mo><mi>j</mi></mrow><annotation encoding=\"application/x-tex\">i - j</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.7429em;vertical-align:-0.0833em\"></span><span class=\"mord mathnormal\">i</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.854em;vertical-align:-0.1944em\"></span><span 
class=\"mord mathnormal\" style=\"margin-right:0.05724em\">j</span></span></span></span> independently. RoPE&#x27;s brilliance lies in how it handles multiple dimensions. Instead of trying to encode all positional information in a single rotation, we pair components <strong>within the same dimension</strong> and rotate those. Otherwise, we would be intermixing the <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>x</mi></mrow><annotation encoding=\"application/x-tex\">x</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.4306em\"></span><span class=\"mord mathnormal\">x</span></span></span></span> and <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>y</mi></mrow><annotation encoding=\"application/x-tex\">y</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.625em;vertical-align:-0.1944em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">y</span></span></span></span> offset information. By handling each dimension independently (for instance, by reserving one part of the vector for horizontal offsets and another for vertical offsets), we maintain the natural structure of the space. This generalizes to as many dimensions as required!</p>\n<h2>The future of positional encoding</h2>\n<p>Is RoPE the final incarnation of positional encoding? This <a href=\"https://arxiv.org/pdf/2410.06205\" rel=\"nofollow\" target=\"_blank\">recent paper</a> from DeepMind analyses RoPE in depth and highlights some fundamental problems.</p>\n<p>I anticipate some future breakthroughs, perhaps taking inspiration from\nsignal processing with ideas like wavelets or hierarchical implementations. 
As models\nare increasingly quantized for deployment, I&#x27;d also expect to see some\ninnovation in encoding schemes that remain robust under low-precision arithmetic.</p>\n<h2>Conclusion</h2>\n<p>Positional encoding has been, and continues to be, treated as an afterthought in\ntransformers. I believe we should view it differently: self-attention has an\nAchilles heel that has been repeatedly patched.</p>\n<p>I hope this blog post showed you that you too could have discovered state of the\nart positional encoding, despite it being unintuitive at first. In a follow-up\npost I&#x27;d love to explore practical implementation details for RoPE in order to\nmaximise performance.</p>\n<p>Thanks to Madeline Ephgrave for proofreading this, and thanks to <a href=\"http://www.jmmcd.net/\" rel=\"nofollow\" target=\"_blank\">James\nMcDermott</a> for his insightful corrections.</p>\n<h2>References</h2>\n<ul>\n<li><a href=\"https://kazemnejad.com/blog/transformer_architecture_positional_encoding/\" rel=\"nofollow\" target=\"_blank\">https://kazemnejad.com/blog/transformer_architecture_positional_encoding/</a></li>\n<li><a href=\"https://blog.eleuther.ai/rotary-embeddings/\" rel=\"nofollow\" target=\"_blank\">https://blog.eleuther.ai/rotary-embeddings/</a></li>\n<li><a href=\"https://www.youtube.com/watch?v=T3OT8kqoqjc\" rel=\"nofollow\" target=\"_blank\">https://www.youtube.com/watch?v=T3OT8kqoqjc</a></li>\n<li><a href=\"https://arxiv.org/pdf/1706.03762\" rel=\"nofollow\" target=\"_blank\">https://arxiv.org/pdf/1706.03762</a></li>\n<li><a href=\"https://arxiv.org/pdf/2410.06205\" rel=\"nofollow\" target=\"_blank\">https://arxiv.org/pdf/2410.06205</a></li>\n<li><a href=\"https://arxiv.org/pdf/2104.09864\" rel=\"nofollow\" target=\"_blank\">https://arxiv.org/pdf/2104.09864</a></li>\n</ul>\n<section data-footnotes=\"true\" class=\"footnotes\"><h2 class=\"sr-only\" id=\"footnote-label\">Footnotes</h2>\n<ol>\n<li id=\"user-content-fn-1\">\n<p>Binary and Sinusoidal animations are 
reproductions of animations contained\nin <a href=\"https://www.youtube.com/watch?v=T3OT8kqoqjc\" rel=\"nofollow\" target=\"_blank\">this</a> video. <a href=\"#user-content-fnref-1\" data-footnote-backref=\"\" aria-label=\"Back to reference 1\" class=\"data-footnote-backref\">↩</a></p>\n</li>\n<li id=\"user-content-fn-2\">\n<p>Using <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>θ</mi><mo>=</mo><mn>10000</mn></mrow><annotation encoding=\"application/x-tex\">\\theta = 10000</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6944em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.02778em\">θ</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6444em\"></span><span class=\"mord\">10000</span></span></span></span> gives us <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mo stretchy=\"false\">⌊</mo><mn>2</mn><mi>π</mi><mo>⋅</mo><mn>10000</mn><mo stretchy=\"false\">⌋</mo></mrow><annotation encoding=\"application/x-tex\">\\lfloor 2 \\pi \\cdot 10000 \\rfloor</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em\"></span><span class=\"mopen\">⌊</span><span class=\"mord\">2</span><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">π</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mbin\">⋅</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em\"></span><span 
class=\"mord\">10000</span><span class=\"mclose\">⌋</span></span></span></span> unique positions, or a\ntheoretical upper bound on the context length at ~63,000. <a href=\"#user-content-fnref-2\" data-footnote-backref=\"\" aria-label=\"Back to reference 2\" class=\"data-footnote-backref\">↩</a></p>\n</li>\n<li id=\"user-content-fn-3\">\n<p>Pieces of this post are based on <a href=\"https://kazemnejad.com/blog/transformer_architecture_positional_encoding/\" rel=\"nofollow\" target=\"_blank\">this fantastic\npost</a>\nby <a href=\"https://kazemnejad.com/\" rel=\"nofollow\" target=\"_blank\">Amirhossein Kazemnejad</a>. <a href=\"#user-content-fnref-3\" data-footnote-backref=\"\" aria-label=\"Back to reference 3\" class=\"data-footnote-backref\">↩</a></p>\n</li>\n<li id=\"user-content-fn-4\">\n<p>For empirical evidence, see <a href=\"https://blog.eleuther.ai/rotary-embeddings/\" rel=\"nofollow\" target=\"_blank\">this</a> great post by EleutherAI. <a href=\"#user-content-fnref-4\" data-footnote-backref=\"\" aria-label=\"Back to reference 4\" class=\"data-footnote-backref\">↩</a></p>\n</li>\n</ol>\n</section>",
            "url": "https://fleetwood.dev/posts/you-could-have-designed-SOTA-positional-encoding",
            "title": "You could have designed state of the art Positional Encoding",
            "summary": "Rotary Position Encoding (RoPE) can seem intimidating at first. This post endeavours to rediscover RoPE by iteratively improving upon proposed positional encoding schemes.",
            "date_modified": "2024-11-17T00:00:00.000Z",
            "author": {
                "name": "Christopher Fleetwood",
                "url": "https://fleetwood.dev"
            }
        },
        {
            "id": "https://fleetwood.dev/posts/a-first-principles-analysis-of-consumer-smart-glasses",
            "content_html": "<p>In the novel <a href=\"https://garethstack.com/wp-content/uploads/2013/08/accelerando.pdf\" rel=\"nofollow\" target=\"_blank\">Accelerando by Charles Stross</a>, the protagonist <em>Manfred Macx</em> is a power user of smartglasses. These glasses dramatically enhance his ability to interact with the world. Over the course of the book, we learn about the glasses&#x27; incredible capabilities, from real-time language translation to dispatching agents to perform complex tasks across the internet. At one point, Manfred has the glasses stolen from him, and it becomes clear that they form a significant part of his cognition.</p>\n<p>Offloading parts of your cognition to a device is already commonplace today. Many of us already rely on our smartphones for upcoming events, directions, reminders etc. If you take the idea of having a &quot;silicon lobe&quot; to its logical conclusion, you end up at... a computer being physically integrated into your brain aka <a href=\"https://neuralink.com/\" rel=\"nofollow\" target=\"_blank\">Neuralink</a>. This is true in the novel too, with the glasses being &quot;jacked in&quot; to the brain directly, allowing for a higher bandwidth interface and direct sensory manipulation.</p>\n<p>In the interim between the iPhone and Neuralink, I predict we will see some kind of Augmented Reality (AR) glasses hit the mainstream for consumers, similar to those in the book (minus the direct neural interface). 
This device may become ~as integral to your life as Manfred&#x27;s, where losing it is equivalent to a lobotomy!</p>\n<p>This post is a deep dive into the following:</p>\n<ul>\n<li>What are the concrete use cases for AR glasses?</li>\n<li>What are the key challenges facing AR glasses?</li>\n<li>What will the first successful consumer product look like, and when will it arrive?</li>\n<li>What are the key innovations required for the ultimate consumer smartglasses?</li>\n</ul>\n<h1>What are AR glasses?</h1>\n<p>AR glasses are a form of wearable technology that superimposes digital information onto the real world. While all AR glasses are smartglasses, not all smartglasses are AR glasses.</p>\n<p>Below you can see the <a href=\"https://www.ray-ban.com/uk/ray-ban-meta-smart-glasses\" rel=\"nofollow\" target=\"_blank\">2024 Ray-Ban + Meta smartglasses</a>, one of the first forays into the consumer market. I&#x27;ll be using them as a reference point throughout this post.</p>\n<div></div>\n<p>Notice how the arms are quite a bit thicker than standard sunglasses from Ray-Ban. These glasses pack in a huge amount of electronics to provide users with a rich &quot;AI + Audio&quot; experience. Using the integrated camera and spatial audio, you can ask questions, listen to music and use AI to perform other tasks. However, they&#x27;re lacking the crucial component of AR glasses: <strong>a display</strong>. 
Meta isn&#x27;t stopping there though, and are expected to announce <a href=\"https://www.theverge.com/23022611/meta-facebook-nazare-ar-glasses-roadmap-2024\" rel=\"nofollow\" target=\"_blank\">Orion and Hypernova</a>, their true AR glasses, at Meta Connect on September 25 - 26, 2024 <sup><a href=\"#user-content-fn-1\" id=\"user-content-fnref-1\" data-footnote-ref=\"true\" aria-describedby=\"footnote-label\">1</a></sup>.</p>\n<h1>Use Cases</h1>\n<p>In order to usurp the smartphone and become the primary interface with the digital world, AR glasses must provide a significant boost in utility. The use case that everyone is talking about today is AI integration, and for good reason. In my opinion, every human could get a staggering amount of utility from having <em>insert SOTA LLM</em> deeply integrated into their lives. These LLMs are already proving incredibly useful in chatbot form, however they&#x27;ve quickly become bottlenecked - not by IQ, but by <strong>context</strong> and <strong>bandwidth</strong>.</p>\n<ul>\n<li><strong>Context</strong>: See and hear everything the user does (privacy and regulation notwithstanding, Zuckerberg highlighted this use case at Meta Connect 2023 <sup><a href=\"#user-content-fn-2\" id=\"user-content-fnref-2\" data-footnote-ref=\"true\" aria-describedby=\"footnote-label\">2</a></sup>).</li>\n<li><strong>Bandwidth</strong>: Higher information transfer between the user and the computer (the average person types at a shockingly slow ~40wpm <sup><a href=\"#user-content-fn-3\" id=\"user-content-fnref-3\" data-footnote-ref=\"true\" aria-describedby=\"footnote-label\">3</a></sup>).</li>\n</ul>\n<p>Perhaps the latest wave of AI innovation will be the catalyst for AR glasses to hit the mainstream. 
To see what the glasses might be capable of, let&#x27;s analyse some use cases.</p>\n<h3>1 - Contextual Overlay</h3>\n<p>The obvious use case for AR glasses is providing updates and real-time information in a timely and <strong>contextual</strong> manner. Microsoft has achieved a modicum of success in the enterprise AR market with their <a href=\"https://www.microsoft.com/en-gb/hololens/buy\" rel=\"nofollow\" target=\"_blank\">HoloLens 2</a>, which is primarily targeting manufacturing and military applications. The glasses provide real-time, hands-free instructions to workers, allowing them to carry out routine tasks more efficiently. Despite being relatively absent from the public consciousness, the HoloLens line has sold over half a million units since its release <sup><a href=\"#user-content-fn-4\" id=\"user-content-fnref-4\" data-footnote-ref=\"true\" aria-describedby=\"footnote-label\">4</a></sup>.</p>\n<h3>2 - Sensory Control and Manipulation</h3>\n<p>Whilst we won&#x27;t be able to directly interface with our auditory and visual systems for at least the next 5 years, that doesn&#x27;t mean to say that AR glasses can&#x27;t give users a new found control over their senses. One of the most liberating things about the proliferation of high quality Active Noise Cancelling (ANC) headphones is the ability for an individual to filter their audio input. The market has responded well to this ability, and in 2020, Apple&#x27;s revenue from AirPods sales was larger than the entirety of Twitter, Spotify and Square combined <sup><a href=\"#user-content-fn-5\" id=\"user-content-fnref-5\" data-footnote-ref=\"true\" aria-describedby=\"footnote-label\">5</a></sup>.</p>\n<p>What would this look like for visual data? 
Perhaps something reminiscent of &quot;Times Square with Adblock on&quot;.</p>\n<p><img src=\"/ar-glasses/ts-adblock.gif\" alt=\"Times Square Adblock\"/></p>\n<p>The ability to modify your visual input is a powerful one, and it could be used for more than just blocking ads. For example, during a conversation with a person speaking a different language, you could use a diffusion model to modify their lip movements to match your language. This, plus ANC and a speech-to-speech model, could be an excellent start to the dissolution of language barriers. In order to achieve both of these use cases, the AR would need to be extremely high-resolution and low-latency, which is a significant challenge we&#x27;ll discuss later.</p>\n<h3>3 - Personal Assistant Functions</h3>\n<blockquote>\n<p>A person&#x27;s name is to that person, the sweetest, most important sound in any language.</p>\n<p>— Dale Carnegie</p>\n</blockquote>\n<p>Whilst this is perhaps a subset of &quot;Contextual Overlay&quot;, it&#x27;s worth highlighting, as it is a compelling use case far beyond what smartphones can provide today. By providing pieces of data throughout the course of a day, everyone could become more charismatic and <em>thoughtful</em>. You&#x27;ll never have to forget a name or a face ever again.</p>\n<p>You can take this further too. According to Carnegie <sup><a href=\"#user-content-fn-6\" id=\"user-content-fnref-6\" data-footnote-ref=\"true\" aria-describedby=\"footnote-label\">6</a></sup>, Abraham Lincoln would dedicate considerable time to learning about the interests and backgrounds of people he was scheduled to meet. This allowed him to engage them in conversations about topics they were passionate about, greatly boosting his likeability. With AR glasses, everyone could do this effortlessly (much to the chagrin of those of us who have expended considerable effort practicing this).\nThis is on top of the more obvious personal assistant utilities! 
However, for this to be totally effective, the glasses would need to serialize everything you ever see or hear, which is quite the social and regulatory challenge.</p>\n<h3>Other Uses</h3>\n<p>The outlined use cases above are just the tip of the iceberg. In Accelerando, the glasses also perform a variety of futuristic tasks, such as:</p>\n<ul>\n<li>Dispatching AI agents to perform complex analysis across the internet</li>\n<li>Direct neural interfacing for higher bandwidth communication</li>\n<li>Personalized vital sign monitoring</li>\n</ul>\n<p>Even without these capabilities, by simply combining contextual overlay, sensory control and personal assistant functions, we can already see the immense value that AR glasses could provide to users within the next ~3 years. Let&#x27;s dive into the technical challenges that must be overcome to make this vision a reality.</p>\n<h1>Physical Constraints</h1>\n<p>In order to better understand the technical challenges, let&#x27;s first establish some axioms based on the physical constraints of the glasses.</p>\n<h3>Weight</h3>\n<p>There is surprisingly little literature on the acceptable weight of consumer glasses, so let&#x27;s look at some current products on the market.</p>\n<p><img src=\"/ar-glasses/weight_plot.png\" alt=\"AR Glasses weight\"/></p>\n<p>Anecdotally, it seems like the Meta smartglasses are light enough for all day wear, with the <a href=\"https://uk.shop.xreal.com/products/xreal-air-2-ultra\" rel=\"nofollow\" target=\"_blank\">XREAL Air 2 Ultra</a> not being too far off. Given the potential utility of the glasses, I think using 75g as an upper bound is reasonable. This also aligns with Meta&#x27;s rumoured Hypernova glasses we mentioned earlier, which reportedly sit at ~70g <sup><a href=\"#user-content-fn-7\" id=\"user-content-fnref-7\" data-footnote-ref=\"true\" aria-describedby=\"footnote-label\">7</a></sup>.</p>\n<h3>Volume &amp; Battery Life</h3>\n<p>Another concrete axiom we can build off of is volume. 
The shape of the human head is going to stay consistent for the foreseeable future, and whilst there is some leeway to be found in the flexibility of social dynamics, we can expect the form factor (and therefore volume) of the glasses to be broadly similar to that of glasses today. To calculate a baseline volume, I went to the closest demo location and measured the Meta smartglasses with calipers.</p>\n<p>From my measurements, the volume of each arm is approximately <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mn>10</mn><mi>c</mi><msup><mi>m</mi><mn>3</mn></msup></mrow><annotation encoding=\"application/x-tex\">10cm^3</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8141em\"></span><span class=\"mord\">10</span><span class=\"mord mathnormal\">c</span><span class=\"mord\"><span class=\"mord mathnormal\">m</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8141em\"><span style=\"top:-3.063em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">3</span></span></span></span></span></span></span></span></span></span></span>, so <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mn>20</mn><mi>c</mi><msup><mi>m</mi><mn>3</mn></msup></mrow><annotation encoding=\"application/x-tex\">20cm^3</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.8141em\"></span><span class=\"mord\">20</span><span class=\"mord mathnormal\">c</span><span class=\"mord\"><span class=\"mord mathnormal\">m</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" 
style=\"height:0.8141em\"><span style=\"top:-3.063em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">3</span></span></span></span></span></span></span></span></span></span></span> for both. If we naïvely used the entire volume of a single arm as a battery (and allocate the rest to electronics), and assuming we use a SOTA battery with <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mn>800</mn><mtext>Wh/L</mtext></mrow><annotation encoding=\"application/x-tex\">800\\text{Wh/L}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em\"></span><span class=\"mord\">800</span><span class=\"mord text\"><span class=\"mord\">Wh/L</span></span></span></span></span> at a voltage of <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mn>3.7</mn><mi>V</mi></mrow><annotation encoding=\"application/x-tex\">3.7V</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em\"></span><span class=\"mord\">3.7</span><span class=\"mord mathnormal\" style=\"margin-right:0.22222em\">V</span></span></span></span>, we can calculate the maximum possible battery life of <strong>any</strong> smartglasses as follows:</p>\n<span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mn>800</mn><mtext>Wh/L</mtext><mo>×</mo><mn>0.01</mn><mtext>L</mtext><mo>=</mo><mn>8</mn><mi>W</mi><mi>h</mi><mspace 
linebreak=\"newline\"></mspace><mn>8</mn><mi>W</mi><mi>h</mi><mo>÷</mo><mn>3.7</mn><mi>V</mi><mo>=</mo><mn>2162</mn><mi>m</mi><mi>A</mi><mi>h</mi></mrow><annotation encoding=\"application/x-tex\">800 \\text{Wh/L} \\times 0.01 \\text{L} = 8Wh \\\\\n8Wh \\div 3.7V = 2162mAh</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em\"></span><span class=\"mord\">800</span><span class=\"mord text\"><span class=\"mord\">Wh/L</span></span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mbin\">×</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em\"></span><span class=\"mord\">0.01</span><span class=\"mord text\"><span class=\"mord\">L</span></span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6944em\"></span><span class=\"mord\">8</span><span class=\"mord mathnormal\">Wh</span></span><span class=\"mspace newline\"></span><span class=\"base\"><span class=\"strut\" style=\"height:0.7778em;vertical-align:-0.0833em\"></span><span class=\"mord\">8</span><span class=\"mord mathnormal\">Wh</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mbin\">÷</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em\"></span><span class=\"mord\">3.7</span><span class=\"mord mathnormal\" style=\"margin-right:0.22222em\">V</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6944em\"></span><span 
class=\"mord\">2162</span><span class=\"mord mathnormal\">m</span><span class=\"mord mathnormal\">A</span><span class=\"mord mathnormal\">h</span></span></span></span></span>\n<p><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mn>2162</mn><mi>m</mi><mi>A</mi><mi>h</mi></mrow><annotation encoding=\"application/x-tex\">2162mAh</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6944em\"></span><span class=\"mord\">2162</span><span class=\"mord mathnormal\">m</span><span class=\"mord mathnormal\">A</span><span class=\"mord mathnormal\">h</span></span></span></span> is approximately half the battery capacity of a modern smartphone, but is an order of magnitude larger than the battery that currently ships in the Meta smartglasses highlighted below (<span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mn>154</mn><mi>m</mi><mi>A</mi><mi>h</mi></mrow><annotation encoding=\"application/x-tex\">154mAh</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6944em\"></span><span class=\"mord\">154</span><span class=\"mord mathnormal\">m</span><span class=\"mord mathnormal\">A</span><span class=\"mord mathnormal\">h</span></span></span></span>). 
To put it simply, if we want 8 hours of battery life, the glasses are limited to <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mn>3600</mn><mi>J</mi><mi mathvariant=\"normal\">/</mi><mi>h</mi></mrow><annotation encoding=\"application/x-tex\">3600J/h</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1em;vertical-align:-0.25em\"></span><span class=\"mord\">3600</span><span class=\"mord mathnormal\" style=\"margin-right:0.09618em\">J</span><span class=\"mord\">/</span><span class=\"mord mathnormal\">h</span></span></span></span> (<span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mn>1</mn><mi>W</mi><mi>h</mi><mo>=</mo><mo>=</mo><mn>3600</mn><mi>J</mi></mrow><annotation encoding=\"application/x-tex\">1Wh == 3600J</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6944em\"></span><span class=\"mord\">1</span><span class=\"mord mathnormal\">Wh</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mrel\">==</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em\"></span><span class=\"mord\">3600</span><span class=\"mord mathnormal\" style=\"margin-right:0.09618em\">J</span></span></span></span>).</p>\n<p><img src=\"/ar-glasses/mrb-battery.png\" alt=\"Glasses teardown\"/></p>\n<p>Maximising battery life is absolutely critical for AR glasses, and it&#x27;s part of what makes them such an enticing challenge. It requires a full stack solution, from efficient software to efficient hardware to efficient power management. 
From the above teardown, we can see that Meta hasn&#x27;t exactly maximised volume usage, so I expect that later generations of glasses could have significantly longer battery life. Perhaps like in aeroplanes and electric cars, we will see power sources being used as structural components in the glasses to increase volume utilisation.</p>\n<h1>Technical Challenges</h1>\n<p>On top of the physical constraints, we have to deal with the following technical challenges:</p>\n<ul>\n<li>Near Eye Optics</li>\n<li>Form factor</li>\n<li>Compute requirements</li>\n<li>Heat dissipation</li>\n<li>What does it look like when it&#x27;s off?</li>\n<li>Eye glow</li>\n<li>Efficient SLAM</li>\n<li>Accurate eye, facial &amp; hand tracking</li>\n</ul>\n<p>We won&#x27;t explore all of these challenges in this post, but progress is being made everywhere! Let&#x27;s take a look at the most significant challenge: near eye optics.</p>\n<h2>Near Eye Optics - The Fundamental Challenge</h2>\n<blockquote>\n<p>Success of smartglasses in a consumer acceptable form factor begins &amp; ends with near eye optics.</p>\n<p>— Christopher Grayson, 2016</p>\n</blockquote>\n<p>The above quote seems to be the underlying truth behind the lack of true consumer AR glasses. Therefore, we will devote significant time and energy to understanding the two key components of near eye optical systems and the advances required.</p>\n<p><img src=\"/ar-glasses/compare-vr-ar.png\" alt=\"VR AR Comparison\"/></p>\n<p>The above diagram from Radiant Vision Systems <sup><a href=\"#user-content-fn-8\" id=\"user-content-fnref-8\" data-footnote-ref=\"true\" aria-describedby=\"footnote-label\">8</a></sup> demonstrates why AR optics are much more challenging than VR. On the left hand side, you can see the simplicity of a VR optical system - the light from a display is fed directly into the eye with lenses for magnification. On the right hand side, you can see the AR optical system, with a compact display offset from the eye. 
This requires a complex optical combiner to overlay the digital image onto the real world.</p>\n<h3>Why are the optics so challenging?</h3>\n<p><a href=\"https://kguttag.com/\" rel=\"nofollow\" target=\"_blank\">Karl Guttag</a> (a legend in the AR space) frequently comments that AR is many orders of magnitude more difficult than VR. The two major bottlenecks in AR optics that require significant advances are the underlying display technology for image generation, and the optical combiner for merging real-world and digital imagery into a single view.</p>\n<p>To start to get a sense of the challenges, let&#x27;s break down the requirements for the display:</p>\n<ul>\n<li>Small enough to fit into the glasses&#x27; form factor (order of magnitude smaller than a VR display)</li>\n<li>Not placed directly in front of the eye, but offset with the light routed to the eye</li>\n<li>Positioned close to the eye, so the light must be heavily modified for the eye to focus on it</li>\n<li>Extremely bright (<span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mn>10</mn><mo separator=\"true\">,</mo><mn>000</mn><mi>c</mi><mi>d</mi><mi mathvariant=\"normal\">/</mi><msup><mi>m</mi><mn>2</mn></msup></mrow><annotation encoding=\"application/x-tex\">10,000 cd/m^2</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1.0641em;vertical-align:-0.25em\"></span><span class=\"mord\">10</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span class=\"mord\">000</span><span class=\"mord mathnormal\">c</span><span class=\"mord mathnormal\">d</span><span class=\"mord\">/</span><span class=\"mord\"><span class=\"mord mathnormal\">m</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8141em\"><span 
style=\"top:-3.063em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\">2</span></span></span></span></span></span></span></span></span></span></span> (nits) to the eye, ideally), to compete with sunlight and massive losses during transmission</li>\n<li>High-resolution (to cover a ~100° field of view (FoV) with human eye acuity of 1 arcminute, ideally 6K×6K resolution is required <strong>for each eye</strong> <sup><a href=\"#user-content-fn-9\" id=\"user-content-fnref-9\" data-footnote-ref=\"true\" aria-describedby=\"footnote-label\">9</a></sup>. This equates to roughly 1µm pixel pitch)</li>\n<li>Low power consumption</li>\n</ul>\n<p>These stringent requirements are compounded by the fact the optical combiner will inherently degrade the light from both the real world and the display!</p>\n<h3>Optical Combiner</h3>\n<p>How do we actually combine the light from the two sources? Below you can see a map from IDTechEx of all optical combiner methods:</p>\n<p><img src=\"/ar-glasses/optical-combiners.png\" alt=\"Optical Combiners\"/></p>\n<p>As you can see, optical combiners are divided into waveguides and non-waveguides. I&#x27;m not going to discuss non-waveguides in this post, as from my research it seems like waveguides are going to be the technology of choice for the first generation of consumer AR glasses.</p>\n<h3>Waveguides</h3>\n<p>The function of a waveguide is fundamentally quite simple - guide the light from the display into the viewer&#x27;s eye. Waveguides are usually defined by their underlying mechanism: <strong>diffraction</strong> or <strong>reflection</strong>. Whilst diffractive waveguides have been the dominant technology up until now, I believe reflective waveguides are the future of AR glasses due to their superior efficiency and image quality. 
For more information on waveguides, I highly recommend reading this post by Optofidelity <sup><a href=\"#user-content-fn-10\" id=\"user-content-fnref-10\" data-footnote-ref=\"true\" aria-describedby=\"footnote-label\">10</a></sup>.</p>\n<img src=\"/ar-glasses/lumus.png\" alt=\"Lumus Waveguide\" maxWidth=\"70%\"/>\n<p>Above you can see a diagram of a reflective waveguide from <a href=\"https://lumusvision.com/\" rel=\"nofollow\" target=\"_blank\">Lumus Vision</a>. Lumus Vision is an extremely interesting company that has pioneered reflective waveguides. The light is generated by the micro-display, and expanded and directed into the eye by the waveguide through a series of transflective mirrors. Lumus has been working on reflective waveguides for many years, but they have proved to be extremely difficult to manufacture at scale. However, <a href=\"https://www.schott.com/en-no/news-and-media/media-releases/2024/lumus-and-schott-strengthen-manufacturing-partnership\" rel=\"nofollow\" target=\"_blank\">Lumus has recently renewed its partnership with the glass manufacturing giant Schott</a> to manufacture their waveguides at a new facility in Malaysia. This is a significant step forward for the technology, and could pave the way for the first generation of consumer AR glasses.</p>\n<h3>Display Technology</h3>\n<p>There are a plethora of display technologies available, however only a limited subset are applicable to AR glasses. This is because the losses in a waveguide based system are astronomical, as demonstrated in the diagram below from Karl Guttag <sup><a href=\"#user-content-fn-11\" id=\"user-content-fnref-11\" data-footnote-ref=\"true\" aria-describedby=\"footnote-label\">11</a></sup>.</p>\n<img src=\"/ar-glasses/waveguide-efficiency.webp\" alt=\"Diffractive Waveguides\" maxWidth=\"70%\"/>\n<p>In order for a suitable amount of nits to reach the eye, the display must be capable of generating a huge amount of light. 
For this reason, Liquid Crystal on Silicon (LCOS) and MicroLED are the most promising display technologies for AR glasses. Many industry experts see MicroLED as the future of AR displays, with companies like Apple and Meta investing heavily in the technology. Read more about MicroLED in this excellent article by Karl Guttag <sup><a href=\"#user-content-fn-11\" id=\"user-content-fnref-11-2\" data-footnote-ref=\"true\" aria-describedby=\"footnote-label\">11</a></sup>.</p>\n<h3>Compute</h3>\n<p>AR glasses are a full stack problem, and the compute requirements are variable and intense. Let&#x27;s break down the compute workload into its core components:</p>\n<ul>\n<li><strong>Simultaneous Localization and Mapping (SLAM)</strong>: Understanding the environment</li>\n<li><strong>Neural Networks</strong>: Running a multitude of AI models for different tasks</li>\n<li><strong>Rendering</strong>: Meshing the real world and digital world together</li>\n<li><strong>Transmission</strong>: Funneling data to and from the glasses</li>\n<li><strong>Other</strong>: Everything else a standard smartphone does</li>\n</ul>\n<p>Offloading as much compute as possible to auxiliary devices is key to the success of AR glasses, for both power and performance reasons. Most humans carry around a supercomputer in their pocket, and we can push even more intensive workloads to the cloud, much like Meta is already doing for their current generation of smartglasses.</p>\n<img src=\"/ar-glasses/compute-move.png\" alt=\"Compute Offloading\" maxWidth=\"70%\"/>\n<p>Each task has to be carefully placed somewhere on this unconjoined triangle of success, trading off latency, power usage and compute requirements. 
For the neural network use case, using a Router LLM <sup><a href=\"#user-content-fn-12\" id=\"user-content-fnref-12\" data-footnote-ref=\"true\" aria-describedby=\"footnote-label\">12</a></sup> to evaluate query difficulty and select the most appropriate device for the workload seems like an interesting avenue.</p>\n<h3>Transmission</h3>\n<p>In order to leverage the two external compute sources, you will need to have a high bandwidth communication channel. In the cloud case, this will obviously be done via WiFi or cellular networks. However, for the glasses-to-phone connection, more care must be taken in selecting the communication protocol. Bluetooth 5.0, whilst ubiquitous and low power, has an extremely limited bandwidth of 2Mbps, which is far from sufficient for AR glasses. Ultra Wideband (UWB) <sup><a href=\"#user-content-fn-13\" id=\"user-content-fnref-13\" data-footnote-ref=\"true\" aria-describedby=\"footnote-label\">13</a></sup> could be the answer, and has been shipping in the iPhone since 2019, albeit purely for real-time location, not data transfer. With a theoretical transfer rate of 1Gbps, it may be the solution that AR and other high bandwidth peripherals are looking for. Meta has been filing numerous patents relating to UWB (e.g. <a href=\"https://image-ppubs.uspto.gov/dirsearch-public/print/downloadPdf/20240235606\" rel=\"nofollow\" target=\"_blank\">20240235606</a>), so we may see this technology in their glasses soon.</p>\n<h1>Tomorrow&#x27;s AR Glasses</h1>\n<img src=\"/ar-glasses/tomorrows-glasses.png\" alt=\"Tomorrow&#x27;s AR\" maxWidth=\"75%\"/>\n<p>The AR glasses we get in the next ~3 years will not be the ultimate form. In order to rival Manfred&#x27;s glasses in Accelerando, we will need better specs. 
The above schematic is taken from one of Meta&#x27;s most recent patents <sup><a href=\"#user-content-fn-14\" id=\"user-content-fnref-14\" data-footnote-ref=\"true\" aria-describedby=\"footnote-label\">14</a></sup>, and demonstrates some interesting innovations such as transducers in both the nose bridge and arms for more immersive spatial audio.</p>\n<p>In my opinion, the most salient difference between the first and final generations of AR glasses will be the display technology, particularly the resolution and FOV. As the visual fidelity of the display approaches that of the human eye, the line between the real and the virtual will become blurred, then vanish entirely. This may arrive sooner than you think, with commercial micro displays already hitting 4K×4K resolution @ ~3µm pixel pitch <sup><a href=\"#user-content-fn-15\" id=\"user-content-fnref-15\" data-footnote-ref=\"true\" aria-describedby=\"footnote-label\">15</a></sup>.</p>\n<h1>Assembling a team</h1>\n<p>If we were to assemble a team to build the ultimate AR glasses, it would look something like this:</p>\n<ul>\n<li><a href=\"https://www.kguttag.com\" rel=\"nofollow\" target=\"_blank\">Karl Guttag</a>: AR industry veteran, from whom much of my knowledge is derived</li>\n<li><a href=\"https://scholar.google.com/citations?user=jNiCmBQAAAAJ&amp;hl=en\" rel=\"nofollow\" target=\"_blank\">Shin-Tson Wu</a>: Dr. Wu 
and his progeny will be the creators of the ultimate AR display</li>\n<li><a href=\"https://library.ucdavis.edu/person/oliver-kreylos/\" rel=\"nofollow\" target=\"_blank\">Oliver Kreylos</a>: Who thoroughly dashed my &quot;lasers are the answer&quot; theory</li>\n<li><a href=\"https://sites.cc.gatech.edu/home/thad/\" rel=\"nofollow\" target=\"_blank\">Thad Starner</a>: Original Google Glass team</li>\n<li><a href=\"https://spie.org/profile/Bernard.Kress-16356\" rel=\"nofollow\" target=\"_blank\">Bernard Kress</a>: Author of the book on AR optics <sup><a href=\"#user-content-fn-16\" id=\"user-content-fnref-16\" data-footnote-ref=\"true\" aria-describedby=\"footnote-label\">16</a></sup>, currently at Google</li>\n</ul>\n<h1>Conclusion</h1>\n<p>From all my research, it seems like Meta is well positioned in this market. I would not be surprised to see exponential adoption if they can deliver on their 2028 roadmap. With the recent advancements in AI, the value proposition for AR glasses has gone from exciting to essential. With developments from Lumus and Meta&#x27;s recent patent filings, it seems like this technology may finally be approaching its &quot;iPhone moment&quot; - provided the near eye optics problem can be solved.</p>\n<p>Once the &quot;iPhone moment&quot; is reached, there will be a plethora of regulatory and social challenges to overcome, as highlighted in this <a href=\"https://www.smbc-comics.com/comic/augmented-3\" rel=\"nofollow\" target=\"_blank\">excellent SMBC comic</a>. 
In spite of these challenges, I&#x27;m looking forward to the future of AR glasses, and the startups and innovations that will come with them.</p>\n<p>TLDR: Long $META.</p>\n<p>I&#x27;m grateful to Mithun Hunsur, Madeline Ephgrave &amp; Benjamin Perkins for their feedback on this post.</p>\n<section data-footnotes=\"true\" class=\"footnotes\"><h2 class=\"sr-only\" id=\"footnote-label\">Footnotes</h2>\n<ol>\n<li id=\"user-content-fn-1\">\n<p><a href=\"https://www.roadtovr.com/report-meta-ar-glasses-orion-connect-2024/\" rel=\"nofollow\" target=\"_blank\">https://www.roadtovr.com/report-meta-ar-glasses-orion-connect-2024/</a> <a href=\"#user-content-fnref-1\" data-footnote-backref=\"\" aria-label=\"Back to reference 1\" class=\"data-footnote-backref\">↩</a></p>\n</li>\n<li id=\"user-content-fn-2\">\n<p><a href=\"https://x.com/fleetwood___/status/1811386711953358927\" rel=\"nofollow\" target=\"_blank\">https://x.com/fleetwood___/status/1811386711953358927</a> <a href=\"#user-content-fnref-2\" data-footnote-backref=\"\" aria-label=\"Back to reference 2\" class=\"data-footnote-backref\">↩</a></p>\n</li>\n<li id=\"user-content-fn-3\">\n<p><a href=\"https://en.wikipedia.org/wiki/Words_per_minute\" rel=\"nofollow\" target=\"_blank\">https://en.wikipedia.org/wiki/Words_per_minute</a> <a href=\"#user-content-fnref-3\" data-footnote-backref=\"\" aria-label=\"Back to reference 3\" class=\"data-footnote-backref\">↩</a></p>\n</li>\n<li id=\"user-content-fn-4\">\n<p><a href=\"https://www.buildwagon.com/What-happened-to-the-Hololens.html\" rel=\"nofollow\" target=\"_blank\">https://www.buildwagon.com/What-happened-to-the-Hololens.html</a> <a href=\"#user-content-fnref-4\" data-footnote-backref=\"\" aria-label=\"Back to reference 4\" class=\"data-footnote-backref\">↩</a></p>\n</li>\n<li id=\"user-content-fn-5\">\n<p><a href=\"https://www.acumenfinancial.co.uk/advice/airpods-revenue-vs-top-tech-companies/\" rel=\"nofollow\" 
target=\"_blank\">https://www.acumenfinancial.co.uk/advice/airpods-revenue-vs-top-tech-companies/</a> <a href=\"#user-content-fnref-5\" data-footnote-backref=\"\" aria-label=\"Back to reference 5\" class=\"data-footnote-backref\">↩</a></p>\n</li>\n<li id=\"user-content-fn-6\">\n<p><a href=\"https://www.rfpmm.org/pdf/how-to-win-friends-and-influence-people.pdf\" rel=\"nofollow\" target=\"_blank\">https://www.rfpmm.org/pdf/how-to-win-friends-and-influence-people.pdf</a> <a href=\"#user-content-fnref-6\" data-footnote-backref=\"\" aria-label=\"Back to reference 6\" class=\"data-footnote-backref\">↩</a></p>\n</li>\n<li id=\"user-content-fn-7\">\n<p><a href=\"https://www.uploadvr.com/meta-hud-glasses-wont-be-ray-bans/\" rel=\"nofollow\" target=\"_blank\">https://www.uploadvr.com/meta-hud-glasses-wont-be-ray-bans/</a> <a href=\"#user-content-fnref-7\" data-footnote-backref=\"\" aria-label=\"Back to reference 7\" class=\"data-footnote-backref\">↩</a></p>\n</li>\n<li id=\"user-content-fn-8\">\n<p><a href=\"https://www.radiantvisionsystems.com/blog/ride-wave-augmented-reality-devices-rely-waveguides\" rel=\"nofollow\" target=\"_blank\">https://www.radiantvisionsystems.com/blog/ride-wave-augmented-reality-devices-rely-waveguides</a> <a href=\"#user-content-fnref-8\" data-footnote-backref=\"\" aria-label=\"Back to reference 8\" class=\"data-footnote-backref\">↩</a></p>\n</li>\n<li id=\"user-content-fn-9\">\n<p><a href=\"https://www.mdpi.com/2076-3417/8/12/2366\" rel=\"nofollow\" target=\"_blank\">https://www.mdpi.com/2076-3417/8/12/2366</a> <a href=\"#user-content-fnref-9\" data-footnote-backref=\"\" aria-label=\"Back to reference 9\" class=\"data-footnote-backref\">↩</a></p>\n</li>\n<li id=\"user-content-fn-10\">\n<p><a href=\"https://www.optofidelity.com/insights/blogs/comparing-and-contrasting-different-waveguide-technologies-diffractive-reflective-and-holographic-waveguides\" rel=\"nofollow\" 
target=\"_blank\">https://www.optofidelity.com/insights/blogs/comparing-and-contrasting-different-waveguide-technologies-diffractive-reflective-and-holographic-waveguides</a> <a href=\"#user-content-fnref-10\" data-footnote-backref=\"\" aria-label=\"Back to reference 10\" class=\"data-footnote-backref\">↩</a></p>\n</li>\n<li id=\"user-content-fn-11\">\n<p><a href=\"https://kguttag.com/2023/03/12/microleds-with-waveguides-ces-ar-vr-mr-2023-pt-7/\" rel=\"nofollow\" target=\"_blank\">https://kguttag.com/2023/03/12/microleds-with-waveguides-ces-ar-vr-mr-2023-pt-7/</a> <a href=\"#user-content-fnref-11\" data-footnote-backref=\"\" aria-label=\"Back to reference 11\" class=\"data-footnote-backref\">↩</a> <a href=\"#user-content-fnref-11-2\" data-footnote-backref=\"\" aria-label=\"Back to reference 11-2\" class=\"data-footnote-backref\">↩<sup>2</sup></a></p>\n</li>\n<li id=\"user-content-fn-12\">\n<p><a href=\"https://arxiv.org/abs/2406.18665\" rel=\"nofollow\" target=\"_blank\">https://arxiv.org/abs/2406.18665</a> <a href=\"#user-content-fnref-12\" data-footnote-backref=\"\" aria-label=\"Back to reference 12\" class=\"data-footnote-backref\">↩</a></p>\n</li>\n<li id=\"user-content-fn-13\">\n<p><a href=\"https://en.wikipedia.org/wiki/Ultra-wideband\" rel=\"nofollow\" target=\"_blank\">https://en.wikipedia.org/wiki/Ultra-wideband</a> <a href=\"#user-content-fnref-13\" data-footnote-backref=\"\" aria-label=\"Back to reference 13\" class=\"data-footnote-backref\">↩</a></p>\n</li>\n<li id=\"user-content-fn-14\">\n<p><a href=\"https://image-ppubs.uspto.gov/dirsearch-public/print/downloadPdf/20240118423\" rel=\"nofollow\" target=\"_blank\">https://image-ppubs.uspto.gov/dirsearch-public/print/downloadPdf/20240118423</a> <a href=\"#user-content-fnref-14\" data-footnote-backref=\"\" aria-label=\"Back to reference 14\" class=\"data-footnote-backref\">↩</a></p>\n</li>\n<li id=\"user-content-fn-15\">\n<p><a 
href=\"https://kguttag.com/2024/04/20/mixed-reality-at-ces-ar-vr-mr-2024-part-3-display-devices/\" rel=\"nofollow\" target=\"_blank\">https://kguttag.com/2024/04/20/mixed-reality-at-ces-ar-vr-mr-2024-part-3-display-devices/</a> <a href=\"#user-content-fnref-15\" data-footnote-backref=\"\" aria-label=\"Back to reference 15\" class=\"data-footnote-backref\">↩</a></p>\n</li>\n<li id=\"user-content-fn-16\">\n<p><a href=\"https://spie.org/Publications/Book/2559303\" rel=\"nofollow\" target=\"_blank\">https://spie.org/Publications/Book/2559303</a> <a href=\"#user-content-fnref-16\" data-footnote-backref=\"\" aria-label=\"Back to reference 16\" class=\"data-footnote-backref\">↩</a></p>\n</li>\n</ol>\n</section>",
            "url": "https://fleetwood.dev/posts/a-first-principles-analysis-of-consumer-smart-glasses",
            "title": "A first principles analysis of Smartglasses",
            "summary": "Smartglasses are a form of Augmented Reality (AR) providing significant compute power. What are the current fundamental limitations prohibiting the consumer adoption of smartglasses, and when can we expect their widespread adoption?",
            "date_modified": "2024-07-19T00:00:00.000Z",
            "author": {
                "name": "Christopher Fleetwood",
                "url": "https://fleetwood.dev"
            }
        },
        {
            "id": "https://fleetwood.dev/posts/fmri-timeseries-similarity",
            "content_html": "<p>In order to perform classification on a functional brain scan it first\nundergoes many preprocessing steps. One of these steps is the transformation from\nthe timeseries output of the fMRI scan, transforming a <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>m</mi><mo lspace=\"0em\" rspace=\"0em\">×</mo><mi>n</mi></mrow><annotation encoding=\"application/x-tex\">m{\\times}n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6667em;vertical-align:-0.0833em\"></span><span class=\"mord mathnormal\">m</span><span class=\"mord\"><span class=\"mord\">×</span></span><span class=\"mord mathnormal\">n</span></span></span></span> (where <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>m</mi></mrow><annotation encoding=\"application/x-tex\">m</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.4306em\"></span><span class=\"mord mathnormal\">m</span></span></span></span> is\nthe number of timepoints recorded and <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>n</mi></mrow><annotation encoding=\"application/x-tex\">n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.4306em\"></span><span class=\"mord mathnormal\">n</span></span></span></span> is the number of brain regions used)\nto an <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>n</mi><mo lspace=\"0em\" rspace=\"0em\">×</mo><mi>n</mi></mrow><annotation 
encoding=\"application/x-tex\">n{\\times}n</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6667em;vertical-align:-0.0833em\"></span><span class=\"mord mathnormal\">n</span><span class=\"mord\"><span class=\"mord\">×</span></span><span class=\"mord mathnormal\">n</span></span></span></span> matrix of similarity values (called a connectome). Each\nsimilarity value is a measure of the neural synchronization between a pair of regions.</p>\n<p>So how do we quantify the similarity of 2 different timeseries? This blog post will\nexplore the common ways of quantifying time series similarity in a neuroscientific\nsetting. Before we get into the actual methods used to calculate time series similarity,\nwe need to cover the cornerstone of almost all of the methods we are about to\nexplore - covariance.</p>\n<h2>Covariance</h2>\n<p><strong>Covariance</strong> is simply a measure of how two random variables change together.\nBelow is the formula for calculating covariance:</p>\n<span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mi mathvariant=\"normal\">Σ</mi><mo>⁡</mo><mo>=</mo><mi mathvariant=\"normal\">E</mi><mo>⁡</mo><mrow><mo fence=\"true\">[</mo><mrow><mo fence=\"true\">(</mo><mi>X</mi><mo>−</mo><mi mathvariant=\"normal\">E</mi><mo>⁡</mo><mo stretchy=\"false\">[</mo><mi>X</mi><mo stretchy=\"false\">]</mo><mo fence=\"true\">)</mo></mrow><msup><mrow><mo fence=\"true\">(</mo><mi>X</mi><mo>−</mo><mi mathvariant=\"normal\">E</mi><mo>⁡</mo><mo stretchy=\"false\">[</mo><mi>X</mi><mo stretchy=\"false\">]</mo><mo fence=\"true\">)</mo></mrow><mi mathvariant=\"normal\">T</mi></msup><mo fence=\"true\">]</mo></mrow></mrow><annotation encoding=\"application/x-tex\">\\operatorname{\\Sigma} = \\operatorname{E}\\left[\\left(X-\\operatorname{E}[X]\\right ) \\left 
(X-\\operatorname{E}[X]\\right)^\\mathrm{T}\\right]</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em\"></span><span class=\"mop\"><span class=\"mord mathrm\">Σ</span></span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:1.8em;vertical-align:-0.65em\"></span><span class=\"mop\"><span class=\"mord mathrm\">E</span></span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span class=\"minner\"><span class=\"mopen delimcenter\" style=\"top:0em\"><span class=\"delimsizing size2\">[</span></span><span class=\"minner\"><span class=\"mopen delimcenter\" style=\"top:0em\">(</span><span class=\"mord mathnormal\" style=\"margin-right:0.07847em\">X</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mop\"><span class=\"mord mathrm\">E</span></span><span class=\"mopen\">[</span><span class=\"mord mathnormal\" style=\"margin-right:0.07847em\">X</span><span class=\"mclose\">]</span><span class=\"mclose delimcenter\" style=\"top:0em\">)</span></span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span class=\"minner\"><span class=\"minner\"><span class=\"mopen delimcenter\" style=\"top:0em\">(</span><span class=\"mord mathnormal\" style=\"margin-right:0.07847em\">X</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mbin\">−</span><span class=\"mspace\" style=\"margin-right:0.2222em\"></span><span class=\"mop\"><span class=\"mord mathrm\">E</span></span><span class=\"mopen\">[</span><span class=\"mord mathnormal\" style=\"margin-right:0.07847em\">X</span><span class=\"mclose\">]</span><span class=\"mclose delimcenter\" 
style=\"top:0em\">)</span></span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.9812em\"><span style=\"top:-3.2029em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mathrm mtight\">T</span></span></span></span></span></span></span></span><span class=\"mclose delimcenter\" style=\"top:0em\"><span class=\"delimsizing size2\">]</span></span></span></span></span></span></span>\n<p>In the case of fMRI, we have a multivariate random variable, allowing us to use\nMaximum Likelihood Estimation to estimate the covariance matrix. Below is a toy\nexample of our estimated covariance matrix.</p>\n<span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mo fence=\"true\">[</mo><mtable rowspacing=\"0.16em\" columnalign=\"center center center\" columnspacing=\"1em\"><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mrow><mi>v</mi><mi>a</mi><mi>r</mi><mo stretchy=\"false\">(</mo><mi>X</mi><mo stretchy=\"false\">)</mo></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mrow><mi>c</mi><mi>o</mi><mi>v</mi><mo stretchy=\"false\">(</mo><mi>X</mi><mo separator=\"true\">,</mo><mi>Y</mi><mo stretchy=\"false\">)</mo></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mrow><mi>c</mi><mi>o</mi><mi>v</mi><mo stretchy=\"false\">(</mo><mi>X</mi><mo separator=\"true\">,</mo><mi>Z</mi><mo stretchy=\"false\">)</mo></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mrow><mi>c</mi><mi>o</mi><mi>v</mi><mo stretchy=\"false\">(</mo><mi>X</mi><mo separator=\"true\">,</mo><mi>Y</mi><mo stretchy=\"false\">)</mo></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mrow><mi>v</mi><mi>a</mi><mi>r</mi><mo 
stretchy=\"false\">(</mo><mi>Y</mi><mo stretchy=\"false\">)</mo></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mrow><mi>c</mi><mi>o</mi><mi>v</mi><mo stretchy=\"false\">(</mo><mi>Y</mi><mo separator=\"true\">,</mo><mi>Z</mi><mo stretchy=\"false\">)</mo></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mrow><mi>c</mi><mi>o</mi><mi>v</mi><mo stretchy=\"false\">(</mo><mi>X</mi><mo separator=\"true\">,</mo><mi>Z</mi><mo stretchy=\"false\">)</mo></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mrow><mi>c</mi><mi>o</mi><mi>v</mi><mo stretchy=\"false\">(</mo><mi>Y</mi><mo separator=\"true\">,</mo><mi>Z</mi><mo stretchy=\"false\">)</mo></mrow></mstyle></mtd><mtd><mstyle scriptlevel=\"0\" displaystyle=\"false\"><mrow><mi>v</mi><mi>a</mi><mi>r</mi><mo stretchy=\"false\">(</mo><mi>Z</mi><mo stretchy=\"false\">)</mo></mrow></mstyle></mtd></mtr></mtable><mo fence=\"true\">]</mo></mrow><annotation encoding=\"application/x-tex\">\\begin{bmatrix}\nvar(X) &amp; cov(X,Y) &amp; cov(X,Z) \\\\\ncov(X,Y) &amp; var(Y) &amp; cov(Y,Z) \\\\\ncov(X,Z) &amp; cov(Y,Z) &amp; var(Z)\n\\end{bmatrix}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:3.6em;vertical-align:-1.55em\"></span><span class=\"minner\"><span class=\"mopen\"><span class=\"delimsizing mult\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:2.05em\"><span style=\"top:-4.05em\"><span class=\"pstrut\" style=\"height:5.6em\"></span><span style=\"width:0.667em;height:3.600em\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"0.667em\" height=\"3.600em\" viewBox=\"0 0 667 3600\"><path d=\"M403 1759 V84 H666 V0 H319 V1759 v0 v1759 h347 v-84\nH403z M403 1759 V0 H319 V1759 v0 v1759 h84z\"></path></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span 
class=\"vlist\" style=\"height:1.55em\"><span></span></span></span></span></span></span><span class=\"mord\"><span class=\"mtable\"><span class=\"col-align-c\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:2.05em\"><span style=\"top:-4.21em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">v</span><span class=\"mord mathnormal\">a</span><span class=\"mord mathnormal\" style=\"margin-right:0.02778em\">r</span><span class=\"mopen\">(</span><span class=\"mord mathnormal\" style=\"margin-right:0.07847em\">X</span><span class=\"mclose\">)</span></span></span><span style=\"top:-3.01em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord mathnormal\">co</span><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">v</span><span class=\"mopen\">(</span><span class=\"mord mathnormal\" style=\"margin-right:0.07847em\">X</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.22222em\">Y</span><span class=\"mclose\">)</span></span></span><span style=\"top:-1.81em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord mathnormal\">co</span><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">v</span><span class=\"mopen\">(</span><span class=\"mord mathnormal\" style=\"margin-right:0.07847em\">X</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.07153em\">Z</span><span class=\"mclose\">)</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.55em\"><span></span></span></span></span></span><span class=\"arraycolsep\" style=\"width:0.5em\"></span><span class=\"arraycolsep\" 
style=\"width:0.5em\"></span><span class=\"col-align-c\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:2.05em\"><span style=\"top:-4.21em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord mathnormal\">co</span><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">v</span><span class=\"mopen\">(</span><span class=\"mord mathnormal\" style=\"margin-right:0.07847em\">X</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.22222em\">Y</span><span class=\"mclose\">)</span></span></span><span style=\"top:-3.01em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">v</span><span class=\"mord mathnormal\">a</span><span class=\"mord mathnormal\" style=\"margin-right:0.02778em\">r</span><span class=\"mopen\">(</span><span class=\"mord mathnormal\" style=\"margin-right:0.22222em\">Y</span><span class=\"mclose\">)</span></span></span><span style=\"top:-1.81em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord mathnormal\">co</span><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">v</span><span class=\"mopen\">(</span><span class=\"mord mathnormal\" style=\"margin-right:0.22222em\">Y</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.07153em\">Z</span><span class=\"mclose\">)</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.55em\"><span></span></span></span></span></span><span class=\"arraycolsep\" style=\"width:0.5em\"></span><span class=\"arraycolsep\" style=\"width:0.5em\"></span><span class=\"col-align-c\"><span class=\"vlist-t vlist-t2\"><span 
class=\"vlist-r\"><span class=\"vlist\" style=\"height:2.05em\"><span style=\"top:-4.21em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord mathnormal\">co</span><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">v</span><span class=\"mopen\">(</span><span class=\"mord mathnormal\" style=\"margin-right:0.07847em\">X</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.07153em\">Z</span><span class=\"mclose\">)</span></span></span><span style=\"top:-3.01em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord mathnormal\">co</span><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">v</span><span class=\"mopen\">(</span><span class=\"mord mathnormal\" style=\"margin-right:0.22222em\">Y</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.07153em\">Z</span><span class=\"mclose\">)</span></span></span><span style=\"top:-1.81em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">v</span><span class=\"mord mathnormal\">a</span><span class=\"mord mathnormal\" style=\"margin-right:0.02778em\">r</span><span class=\"mopen\">(</span><span class=\"mord mathnormal\" style=\"margin-right:0.07153em\">Z</span><span class=\"mclose\">)</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.55em\"><span></span></span></span></span></span></span></span><span class=\"mclose\"><span class=\"delimsizing mult\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:2.05em\"><span style=\"top:-4.05em\"><span class=\"pstrut\" style=\"height:5.6em\"></span><span 
style=\"width:0.667em;height:3.600em\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"0.667em\" height=\"3.600em\" viewBox=\"0 0 667 3600\"><path d=\"M347 1759 V0 H0 V84 H263 V1759 v0 v1759 H0 v84 H347z\nM347 1759 V0 H263 V1759 v0 v1759 h84z\"></path></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.55em\"><span></span></span></span></span></span></span></span></span></span></span></span>\n<p>This estimated covariance matrix is called the empirical covariance matrix. In\npractice we don&#x27;t use the covariance matrix for our fMRI analysis.\nThis is for a number of reasons:</p>\n<ul>\n<li>Covariance is unbounded, ranging from <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mo>−</mo><mi mathvariant=\"normal\">∞</mi></mrow><annotation encoding=\"application/x-tex\">-\\infty</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6667em;vertical-align:-0.0833em\"></span><span class=\"mord\">−</span><span class=\"mord\">∞</span></span></span></span> to <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi mathvariant=\"normal\">∞</mi></mrow><annotation encoding=\"application/x-tex\">\\infty</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.4306em\"></span><span class=\"mord\">∞</span></span></span></span>, making it less\nsuitable for downstream classifiers.</li>\n<li>Covariance coefficients are not standardized and cannot be used to quantify the\nstrength of the relationship.</li>\n<li>Symmetric Positive semi-definite (SPD) matrices do not naturally form a Euclidean space (this will be\nimportant later).</li>\n<li>When the number of features is large relative to the number of 
observations,\nthe sample/empirical covariance matrix has excessive estimation error.</li>\n</ul>\n<p>Many of the following approaches will aim to address some or all of the above\ndrawbacks with empirical covariance.</p>\n<p>To address the excessive estimation error, it is common to apply a\ntransformation to the covariance coefficients, known as <strong>&quot;shrinkage&quot;</strong>. In\ntheir seminal paper, &quot;Honey, I Shrunk the Sample Covariance Matrix&quot; [<a href=\"http://www.ledoit.net/honey.pdf\" rel=\"nofollow\" target=\"_blank\">1</a>], Ledoit &amp; Wolf\nproposed using shrinkage to regularize the sample covariance matrix.\n&quot;Shrinkage&quot;, as the name implies, pulls the most extreme covariance coefficients\ntowards more central values. This not only reduces our excessive estimation\nerror, it can also make the matrix easier to invert by improving numerical\nstability. (The interested reader should consult the scikit-learn docs\n[<a href=\"https://scikit-learn.org/stable/auto_examples/covariance/plot_covariance_estimation.html\" rel=\"nofollow\" target=\"_blank\">2</a>].)</p>\n<p>Now that we have a well-conditioned covariance matrix, we can attempt to address\nsome of the other identified drawbacks.</p>\n<h2>Canonical Approaches</h2>\n<p><strong>Pearson&#x27;s correlation coefficient</strong> (Pearson&#x27;s <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>R</mi></mrow><annotation encoding=\"application/x-tex\">R</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.00773em\">R</span></span></span></span>), or simply correlation, is\nthe most commonly used method to quantify similarity between 2 fMRI timeseries.\nCorrelation is a linear metric computed from the covariance of the 2 timeseries.\nBelow is the 
mathematical formula to compute correlation for a pair of random\nvariables:</p>\n<span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mstyle scriptlevel=\"0\" displaystyle=\"true\"><msub><mi>ρ</mi><mrow><mi>X</mi><mo separator=\"true\">,</mo><mi>Y</mi></mrow></msub><mo>=</mo><mfrac><mrow><mi mathvariant=\"normal\">cov</mi><mo>⁡</mo><mo stretchy=\"false\">(</mo><mi>X</mi><mo separator=\"true\">,</mo><mi>Y</mi><mo stretchy=\"false\">)</mo></mrow><mrow><msub><mi>σ</mi><mi>X</mi></msub><msub><mi>σ</mi><mi>Y</mi></msub></mrow></mfrac></mstyle></mrow><annotation encoding=\"application/x-tex\">{\\displaystyle \\rho _{X,Y}={\\frac {\\operatorname {cov} (X,Y)}{\\sigma _{X}\\sigma _{Y}}}}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:2.263em;vertical-align:-0.836em\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\">ρ</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.07847em\">X</span><span class=\"mpunct mtight\">,</span><span class=\"mord mathnormal mtight\" style=\"margin-right:0.22222em\">Y</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2861em\"><span></span></span></span></span></span></span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mopen 
nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.427em\"><span style=\"top:-2.314em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">σ</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.07847em\">X</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mord\"><span class=\"mord mathnormal\" style=\"margin-right:0.03588em\">σ</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3283em\"><span style=\"top:-2.55em;margin-left:-0.0359em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.22222em\">Y</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span></span></span><span style=\"top:-3.23em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em\"></span></span><span style=\"top:-3.677em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mop\"><span class=\"mord mathrm\" style=\"margin-right:0.01389em\">cov</span></span><span 
class=\"mopen\">(</span><span class=\"mord mathnormal\" style=\"margin-right:0.07847em\">X</span><span class=\"mpunct\">,</span><span class=\"mspace\" style=\"margin-right:0.1667em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.22222em\">Y</span><span class=\"mclose\">)</span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.836em\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span></span></span></span>\n<p>Correlation is widely used in neuroscience as it has a long statistical history\nand is bounded between -1 and 1. However, correlation does have some disadvantages.\nThe figure below demonstrates one clearly:</p>\n<p><img src=\"/corr.gif\" alt=\"Correlation Decay\" title=\"Correlation Decay\"/></p>\n<p>Due to correlation&#x27;s linear nature, the same timeseries being slightly out of\nphase causes a huge decrease in the correlation value. Additionally, correlation\ndoes not distinguish between two regions that are directly\nconnected and two that are indirectly connected via another region. To account for this, we\ncan use partial correlation!</p>\n<p><strong>Partial correlation</strong> is a variant of PCC that attempts to\ndistinguish between direct and indirect connections. 
This is done by\ncomputing the correlation between two regions after regressing out all other timeseries.\nPartial correlation is computed from the inverse of the covariance matrix,\nalso known as the precision matrix (this is where the shrinkage comes in handy).\nBelow is the mathematical formula for computing partial correlation for a pair\nof random variables:</p>\n<span class=\"katex-display\"><span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\" display=\"block\"><semantics><mrow><mi>ρ</mi><mo>=</mo><mo>−</mo><mfrac><msub><mi>p</mi><mrow><mi>i</mi><mi>j</mi></mrow></msub><msqrt><mrow><msub><mi>p</mi><mrow><mi>i</mi><mi>i</mi></mrow></msub><msub><mi>p</mi><mrow><mi>j</mi><mi>j</mi></mrow></msub></mrow></msqrt></mfrac></mrow><annotation encoding=\"application/x-tex\">\\rho = -{\\frac {p_{ij}}{\\sqrt {p_{ii} p_{jj}}}}</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.625em;vertical-align:-0.1944em\"></span><span class=\"mord mathnormal\">ρ</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span><span class=\"mrel\">=</span><span class=\"mspace\" style=\"margin-right:0.2778em\"></span></span><span class=\"base\"><span class=\"strut\" style=\"height:2.1763em;vertical-align:-1.0688em\"></span><span class=\"mord\">−</span><span class=\"mord\"><span class=\"mord\"><span class=\"mopen nulldelimiter\"></span><span class=\"mfrac\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.1076em\"><span style=\"top:-2.314em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord sqrt\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.6572em\"><span class=\"svg-align\" style=\"top:-3em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\" style=\"padding-left:0.833em\"><span 
class=\"mord\"><span class=\"mord mathnormal\">p</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\">ii</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.15em\"><span></span></span></span></span></span></span><span class=\"mord\"><span class=\"mord mathnormal\">p</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.05724em\">jj</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2861em\"><span></span></span></span></span></span></span></span></span><span style=\"top:-2.6172em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"hide-tail\" style=\"min-width:0.853em;height:1.08em\"><svg xmlns=\"http://www.w3.org/2000/svg\" width=\"400em\" height=\"1.08em\" viewBox=\"0 0 400000 1080\" preserveAspectRatio=\"xMinYMin slice\"><path d=\"M95,702\nc-2.7,0,-7.17,-2.7,-13.5,-8c-5.8,-5.3,-9.5,-10,-9.5,-14\nc0,-2,0.3,-3.3,1,-4c1.3,-2.7,23.83,-20.7,67.5,-54\nc44.2,-33.3,65.8,-50.3,66.5,-51c1.3,-1.3,3,-2,5,-2c4.7,0,8.7,3.3,12,10\ns173,378,173,378c0.7,0,35.3,-71,104,-213c68.7,-142,137.5,-285,206.5,-429\nc69,-144,104.5,-217.7,106.5,-221\nl0 
-0\nc5.3,-9.3,12,-14,20,-14\nH400000v40H845.2724\ns-225.272,467,-225.272,467s-235,486,-235,486c-2.7,4.7,-9,7,-19,7\nc-6,0,-10,-1,-12,-3s-194,-422,-194,-422s-65,47,-65,47z\nM834 80h400000v40h-400000z\"></path></svg></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3828em\"><span></span></span></span></span></span></span></span><span style=\"top:-3.23em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"frac-line\" style=\"border-bottom-width:0.04em\"></span></span><span style=\"top:-3.677em\"><span class=\"pstrut\" style=\"height:3em\"></span><span class=\"mord\"><span class=\"mord\"><span class=\"mord mathnormal\">p</span><span class=\"msupsub\"><span class=\"vlist-t vlist-t2\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.3117em\"><span style=\"top:-2.55em;margin-left:0em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord mtight\"><span class=\"mord mathnormal mtight\" style=\"margin-right:0.05724em\">ij</span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.2861em\"><span></span></span></span></span></span></span></span></span></span><span class=\"vlist-s\">​</span></span><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:1.0688em\"><span></span></span></span></span></span><span class=\"mclose nulldelimiter\"></span></span></span></span></span></span></span>\n<p>where <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>p</mi></mrow><annotation encoding=\"application/x-tex\">p</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.625em;vertical-align:-0.1944em\"></span><span class=\"mord 
mathnormal\">p</span></span></span></span> is the precision matrix (obtained by inverting our\nwell-conditioned covariance matrix).</p>\n<h2>Novel Approaches</h2>\n<p><strong>Tangent space parameterization</strong> is a recently proposed solution\n(Varoquaux et al. [<a href=\"https://arxiv.org/abs/1008.5070\" rel=\"nofollow\" target=\"_blank\">3</a>]) to the problematic\nnature of SPD matrices: they don&#x27;t naturally form a Euclidean space.\nWhilst I won&#x27;t profess to be an expert on differential\nmanifolds or Riemannian geometry, here is how I understand it.</p>\n<p>Any covariance matrix is a symmetric positive semi-definite matrix. To\npreserve the geometry during our transformations, we need to\nproject the matrices into a space where that geometry still holds.\nAs SPD matrices form a differentiable manifold, we can use\nRiemannian geometry to keep our geometric intuitions intact.</p>\n<p><img src=\"/manifold.jpg\" alt=\"Manifold\" title=\"Manifold\"/></p>\n<a href=\"https://juliahub.com/docs/PosDefManifold/vb3YZ/0.4.8/introToRiemannianGeometry/#Intro-to-Riemannian-Geometry-1\" styles=\"text-align: center;\"><p>Image source</p></a>\n<p>The above image makes it clear that if we perform Euclidean operations like\nsubtraction on the manifold, the distances and therefore the geometry will break\ndown. 
By defining a homomorphic (i.e. structure-preserving) tangent space, we can\napproximate the distance on the manifold using Euclidean distance in the tangent\nspace!</p>\n<p>But we aren&#x27;t quite out of the woods yet, since we still need to define the\nreference point <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>G</mi></mrow><annotation encoding=\"application/x-tex\">G</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:0.6833em\"></span><span class=\"mord mathnormal\">G</span></span></span></span> on the manifold. As all of our covariance matrices need\nto be projected into the same tangent space, they must share one appropriately chosen\nreference point. To do this, Varoquaux et al. compute the geometric mean of the\nmatrices and use it as the reference point.</p>\n<p>In their paper, Dadi et al. [<a href=\"https://hal.inria.fr/hal-01824205v3\" rel=\"nofollow\" target=\"_blank\">4</a>] showed\nthat across many fMRI classification tasks, tangent space parameterization\nshould be preferred over both correlation and partial correlation.</p>\n<h3>Dynamic Time Warping</h3>\n<p><strong>Dynamic Time Warping</strong> (DTW) is unlike the previously discussed methods, as it does\nnot rely on covariance matrices. DTW is one of the canonical algorithms used for measuring\nsimilarity between timeseries. It is commonly used in scenarios where timeseries\nmay have different lengths or be temporally offset. DTW is a non-linear measure\nthat calculates the optimal match between two given sequences under certain\nrestrictions.</p>\n<p>DTW was first proposed for usage with fMRI timeseries by Meszlényi et al. 
[<a href=\"https://www.frontiersin.org/articles/10.3389/fnins.2017.00075/full\" rel=\"nofollow\" target=\"_blank\">5</a>].\nThey argue for its value in fMRI timeseries analysis, as it has been noted that a\n&quot;dynamic switch of brain states&quot; can cause non-stationary time lags. If there\nare mechanisms in the brain, known or unknown, that cause regional time differences\nin hemodynamic response, DTW could be a great way to account for them.</p>\n<p>Below is the core of a Rust implementation of windowed DTW.\nThe window refers to the maximal warping distance.</p>\n<pre class=\"language-rust\"><code class=\"language-rust code-highlight\"><span class=\"code-line line-number\" line=\"1\"><span class=\"token keyword\">let</span> m <span class=\"token operator\">=</span> s<span class=\"token punctuation\">.</span><span class=\"token function\">len</span><span class=\"token punctuation\">(</span><span class=\"token punctuation\">)</span> <span class=\"token operator\">+</span> <span class=\"token number\">1</span><span class=\"token punctuation\">;</span>\n</span><span class=\"code-line line-number\" line=\"2\"><span class=\"token keyword\">let</span> n <span class=\"token operator\">=</span> t<span class=\"token punctuation\">.</span><span class=\"token function\">len</span><span class=\"token punctuation\">(</span><span class=\"token punctuation\">)</span> <span class=\"token operator\">+</span> <span class=\"token number\">1</span><span class=\"token punctuation\">;</span>\n</span><span class=\"code-line line-number\" line=\"3\"><span class=\"token keyword\">let</span> <span class=\"token keyword\">mut</span> dtw <span class=\"token operator\">=</span> <span class=\"token class-name\">Array</span><span class=\"token punctuation\">::</span><span class=\"token function\">from_elem</span><span class=\"token punctuation\">(</span><span class=\"token punctuation\">(</span>m<span class=\"token punctuation\">,</span> n<span class=\"token punctuation\">)</span><span class=\"token 
punctuation\">,</span> <span class=\"token keyword\">f64</span><span class=\"token punctuation\">::</span><span class=\"token constant\">MAX</span><span class=\"token punctuation\">)</span><span class=\"token punctuation\">;</span>\n</span><span class=\"code-line line-number\" line=\"4\">\n</span><span class=\"code-line line-number\" line=\"5\">dtw<span class=\"token punctuation\">[</span><span class=\"token punctuation\">[</span><span class=\"token number\">0</span><span class=\"token punctuation\">,</span> <span class=\"token number\">0</span><span class=\"token punctuation\">]</span><span class=\"token punctuation\">]</span> <span class=\"token operator\">=</span> <span class=\"token number\">0</span><span class=\"token punctuation\">.</span><span class=\"token punctuation\">;</span>\n</span><span class=\"code-line line-number\" line=\"6\">\n</span><span class=\"code-line line-number\" line=\"7\"><span class=\"token keyword\">let</span> max_window <span class=\"token operator\">=</span> <span class=\"token keyword\">i32</span><span class=\"token punctuation\">::</span><span class=\"token function\">max</span><span class=\"token punctuation\">(</span><span class=\"token operator\">*</span>window<span class=\"token punctuation\">,</span> <span class=\"token keyword\">i32</span><span class=\"token punctuation\">::</span><span class=\"token function\">abs</span><span class=\"token punctuation\">(</span>n <span class=\"token keyword\">as</span> <span class=\"token keyword\">i32</span> <span class=\"token operator\">-</span> m <span class=\"token keyword\">as</span> <span class=\"token keyword\">i32</span><span class=\"token punctuation\">)</span><span class=\"token punctuation\">)</span><span class=\"token punctuation\">;</span>\n</span><span class=\"code-line line-number\" line=\"8\"><span class=\"token keyword\">for</span> si <span class=\"token keyword\">in</span> <span class=\"token number\">1</span><span class=\"token punctuation\">..</span>m <span 
class=\"token punctuation\">{</span>\n</span><span class=\"code-line line-number\" line=\"9\">    <span class=\"token keyword\">let</span> lower_bound <span class=\"token operator\">=</span> <span class=\"token keyword\">i32</span><span class=\"token punctuation\">::</span><span class=\"token function\">max</span><span class=\"token punctuation\">(</span><span class=\"token number\">1</span><span class=\"token punctuation\">,</span> si <span class=\"token keyword\">as</span> <span class=\"token keyword\">i32</span> <span class=\"token operator\">-</span> max_window<span class=\"token punctuation\">)</span><span class=\"token punctuation\">;</span>\n</span><span class=\"code-line line-number\" line=\"10\">    <span class=\"token keyword\">let</span> upper_bound <span class=\"token operator\">=</span> <span class=\"token keyword\">i32</span><span class=\"token punctuation\">::</span><span class=\"token function\">min</span><span class=\"token punctuation\">(</span>n <span class=\"token keyword\">as</span> <span class=\"token keyword\">i32</span><span class=\"token punctuation\">,</span> si <span class=\"token keyword\">as</span> <span class=\"token keyword\">i32</span> <span class=\"token operator\">+</span> max_window <span class=\"token operator\">+</span> <span class=\"token number\">1</span><span class=\"token punctuation\">)</span><span class=\"token punctuation\">;</span>\n</span><span class=\"code-line line-number\" line=\"11\">    <span class=\"token keyword\">for</span> ti <span class=\"token keyword\">in</span> lower_bound <span class=\"token keyword\">as</span> <span class=\"token keyword\">usize</span><span class=\"token punctuation\">..</span>upper_bound <span class=\"token keyword\">as</span> <span class=\"token keyword\">usize</span> <span class=\"token punctuation\">{</span>\n</span><span class=\"code-line line-number\" line=\"12\">        <span class=\"token keyword\">let</span> cost <span class=\"token operator\">=</span> <span class=\"token function\">distance_fn</span><span class=\"token punctuation\">(</span><span 
class=\"token operator\">&amp;</span>s<span class=\"token punctuation\">[</span>si <span class=\"token operator\">-</span> <span class=\"token number\">1</span><span class=\"token punctuation\">]</span><span class=\"token punctuation\">,</span> <span class=\"token operator\">&amp;</span>t<span class=\"token punctuation\">[</span>ti <span class=\"token operator\">-</span> <span class=\"token number\">1</span><span class=\"token punctuation\">]</span><span class=\"token punctuation\">)</span><span class=\"token punctuation\">;</span>\n</span><span class=\"code-line line-number\" line=\"13\">        dtw<span class=\"token punctuation\">[</span><span class=\"token punctuation\">[</span>si<span class=\"token punctuation\">,</span> ti<span class=\"token punctuation\">]</span><span class=\"token punctuation\">]</span> <span class=\"token operator\">=</span> cost <span class=\"token operator\">+</span> \n</span><span class=\"code-line line-number\" line=\"14\">            <span class=\"token keyword\">f64</span><span class=\"token punctuation\">::</span><span class=\"token function\">min</span><span class=\"token punctuation\">(</span>\n</span><span class=\"code-line line-number\" line=\"15\">                <span class=\"token keyword\">f64</span><span class=\"token punctuation\">::</span><span class=\"token function\">min</span><span class=\"token punctuation\">(</span>dtw<span class=\"token punctuation\">[</span><span class=\"token punctuation\">[</span>si <span class=\"token operator\">-</span> <span class=\"token number\">1</span><span class=\"token punctuation\">,</span> ti<span class=\"token punctuation\">]</span><span class=\"token punctuation\">]</span><span class=\"token punctuation\">,</span> dtw<span class=\"token punctuation\">[</span><span class=\"token punctuation\">[</span>si<span class=\"token punctuation\">,</span> ti <span class=\"token operator\">-</span> <span class=\"token number\">1</span><span class=\"token punctuation\">]</span><span class=\"token 
punctuation\">]</span><span class=\"token punctuation\">)</span><span class=\"token punctuation\">,</span> \n</span><span class=\"code-line line-number\" line=\"16\">                dtw<span class=\"token punctuation\">[</span><span class=\"token punctuation\">[</span>si <span class=\"token operator\">-</span> <span class=\"token number\">1</span><span class=\"token punctuation\">,</span> ti <span class=\"token operator\">-</span> <span class=\"token number\">1</span><span class=\"token punctuation\">]</span><span class=\"token punctuation\">]</span>\n</span><span class=\"code-line line-number\" line=\"17\">            <span class=\"token punctuation\">)</span><span class=\"token punctuation\">;</span>\n</span><span class=\"code-line line-number\" line=\"18\">    <span class=\"token punctuation\">}</span>\n</span><span class=\"code-line line-number\" line=\"19\"><span class=\"token punctuation\">}</span>\n</span></code></pre>\n<p>Dynamic Time Warping does have some disadvantages. A particularly\ntroublesome one is its computational complexity of <span class=\"katex\"><span class=\"katex-mathml\"><math xmlns=\"http://www.w3.org/1998/Math/MathML\"><semantics><mrow><mi>O</mi><mo stretchy=\"false\">(</mo><msup><mi>n</mi><mn>2</mn></msup><mo stretchy=\"false\">)</mo></mrow><annotation encoding=\"application/x-tex\">O(n^2)</annotation></semantics></math></span><span class=\"katex-html\" aria-hidden=\"true\"><span class=\"base\"><span class=\"strut\" style=\"height:1.0641em;vertical-align:-0.25em\"></span><span class=\"mord mathnormal\" style=\"margin-right:0.02778em\">O</span><span class=\"mopen\">(</span><span class=\"mord\"><span class=\"mord mathnormal\">n</span><span class=\"msupsub\"><span class=\"vlist-t\"><span class=\"vlist-r\"><span class=\"vlist\" style=\"height:0.8141em\"><span style=\"top:-3.063em;margin-right:0.05em\"><span class=\"pstrut\" style=\"height:2.7em\"></span><span class=\"sizing reset-size6 size3 mtight\"><span class=\"mord 
mtight\">2</span></span></span></span></span></span></span></span><span class=\"mclose\">)</span></span></span></span>. The paper from\nMeszlényi et al. does come with a DTW implementation for fMRI data on GitHub;\nhowever, it is extremely cumbersome and difficult to get running (particularly\non HPC systems where you don&#x27;t have root access).</p>\n<p>To address this I wrote RustDTW, a Python module backed by Rust code that works\nnatively within Python on NumPy matrices. Check out an example of using DTW\nfor fMRI classification <a href=\"https://github.com/FL33TW00D/rustDTW/blob/master/examples/classification/ABIDE_classification.ipynb\" rel=\"nofollow\" target=\"_blank\">here</a>.\nMy basic testing suggests that it is nearly 10x faster than a multithreaded\nPython implementation.</p>\n<h2>Classification Performance</h2>\n<p>Whilst this is not meant to be an exhaustive comparison of the connectivity\nmetrics explored above (which are themselves not an exhaustive list of available\nconnectivity metrics; for more, please see Pervaiz et al. [<a href=\"https://www.biorxiv.org/content/10.1101/741595v1.full.pdf\" rel=\"nofollow\" target=\"_blank\">6</a>]),\nI thought I would compare them using a large open source dataset.\nYou can check out the inner workings and run it for yourself on Colab <a href=\"https://colab.research.google.com/github/FL33TW00D/rustDTW/blob/master/examples/classification/ABIDE_classification.ipynb\" rel=\"nofollow\" target=\"_blank\">here</a>!</p>\n<p>For those just interested in the results, the graph below summarizes the\nperformance of each of the similarity metrics.</p>\n<p><img src=\"/accuracy.png\" alt=\"Accuracy\" title=\"Accuracy\"/></p>\n<p>The results are pretty interesting! We can see that both non-linear matching and\ntransforming the covariance matrices into a natural space can provide improvements\nover standard correlation. 
Perhaps a future metric will incorporate\ncharacteristics from both tangent space parameterization and DTW to improve\nperformance.</p>",
            "url": "https://fleetwood.dev/posts/fmri-timeseries-similarity",
            "title": "An exploration of fMRI timeseries similarity metrics",
            "summary": "A key step in diagnosing pathology using functional magnetic resonance imaging (fMRI) data is computing the similarity of timeseries, as this is what determines if 2 brain regions are \"connected\" or not. This post explores different methods of computing timeseries similarity",
            "date_modified": "2021-07-10T00:00:00.000Z",
            "author": {
                "name": "Christopher Fleetwood",
                "url": "https://fleetwood.dev"
            }
        }
    ]
}