<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://learn.tenzin.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://learn.tenzin.io/" rel="alternate" type="text/html" /><updated>2026-06-30T16:14:06+00:00</updated><id>https://learn.tenzin.io/feed.xml</id><title type="html">Notebook</title><subtitle>Notes and references as I learn new things.</subtitle><author><name>Tenzin Lhakhang</name></author><entry><title type="html">kubeadm Command Reference</title><link href="https://learn.tenzin.io/2026/06/30/kubeadm.html" rel="alternate" type="text/html" title="kubeadm Command Reference" /><published>2026-06-30T00:00:00+00:00</published><updated>2026-06-30T00:00:00+00:00</updated><id>https://learn.tenzin.io/2026/06/30/kubeadm</id><content type="html" xml:base="https://learn.tenzin.io/2026/06/30/kubeadm.html"><![CDATA[<h2 id="summary">Summary</h2>

<p>kubeadm is a <em>setup tool</em>, not a daemon. It generates config and hands off to the kubelet. On the CKA it shows up most in Cluster Architecture, Installation &amp; Configuration (25%) — upgrade workflow and token regeneration are near-certain tasks.</p>

<h2 id="key-concepts">Key concepts</h2>

<p>The mental model: kubeadm writes files; the kubelet acts on them. Every control-plane component (apiserver, etcd, scheduler, controller-manager) is run by the kubelet as a static pod, from manifests it watches in <code class="language-plaintext highlighter-rouge">/etc/kubernetes/manifests/</code>. kubeadm writes those manifests and the PKI certs they reference, then exits. This explains most of kubeadm’s behavior: why bouncing static pods means touching the manifests directory, why cert renewal doesn’t restart anything automatically, and why no kubeadm process is left running after init.</p>
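<p>To see the static-pod mechanism directly, here is a minimal sketch that drops a throwaway manifest into the watched directory. The helper <code class="language-plaintext highlighter-rouge">write_static_pod</code>, the pod name <code class="language-plaintext highlighter-rouge">static-demo</code>, and the <code class="language-plaintext highlighter-rouge">busybox:1.36</code> image are illustrative choices, not anything kubeadm creates; it assumes the default manifests path.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical demo: a static pod the kubelet runs straight from disk, no API call involved.
write_static_pod() {
  local dir=${1:-/etc/kubernetes/manifests}   # directory the kubelet watches
  printf '%s\n' \
'apiVersion: v1
kind: Pod
metadata:
  name: static-demo
spec:
  containers:
  - name: sleeper
    image: busybox:1.36
    command: ["sleep", "3600"]' | sudo tee "$dir/static-demo.yaml"
}

# write_static_pod
# kubectl get pods                                     # mirror pod: static-demo-NODENAME
# sudo rm /etc/kubernetes/manifests/static-demo.yaml   # removing the file stops the pod
</code></pre></div></div>

<p>The pod shown by <code class="language-plaintext highlighter-rouge">kubectl</code> is a read-only mirror; deleting it does not stop the container, because the file on disk is the source of truth.</p>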

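<p>The workflow subcommands, including <code class="language-plaintext highlighter-rouge">init</code>, <code class="language-plaintext highlighter-rouge">join</code>, <code class="language-plaintext highlighter-rouge">reset</code>, and <code class="language-plaintext highlighter-rouge">upgrade node</code>, support phase-level execution (<code class="language-plaintext highlighter-rouge">kubeadm &lt;cmd&gt; phase --help</code>), which lets you run one step in isolation. This is useful when something fails mid-init or mid-upgrade.</p>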

<h3 id="kubeadm-init">kubeadm init</h3>

<p>Bootstraps the first control-plane node. Under the hood it generates the CA and all component certs under <code class="language-plaintext highlighter-rouge">/etc/kubernetes/pki/</code>, writes kubeconfig files such as <code class="language-plaintext highlighter-rouge">admin.conf</code> to <code class="language-plaintext highlighter-rouge">/etc/kubernetes/</code>, writes static pod manifests to <code class="language-plaintext highlighter-rouge">/etc/kubernetes/manifests/</code> (the kubelet picks these up immediately), creates the <code class="language-plaintext highlighter-rouge">kubelet-config</code> ConfigMap in <code class="language-plaintext highlighter-rouge">kube-system</code> and a bootstrap token, installs the CoreDNS and kube-proxy addons, and prints a join command.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo </span>kubeadm init <span class="se">\</span>
  <span class="nt">--pod-network-cidr</span><span class="o">=</span>10.244.0.0/16 <span class="se">\</span>
  <span class="nt">--apiserver-advertise-address</span><span class="o">=</span>10.0.0.10 <span class="se">\</span>
  <span class="nt">--control-plane-endpoint</span><span class="o">=</span>10.0.0.10:6443 <span class="se">\</span>
  <span class="nt">--kubernetes-version</span><span class="o">=</span>v1.35.0 <span class="se">\</span>
  <span class="nt">--upload-certs</span> <span class="se">\</span>
  <span class="nt">--cri-socket</span><span class="o">=</span>unix:///var/run/containerd/containerd.sock
</code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">--control-plane-endpoint</code> should point to a load balancer or stable DNS — not the node IP — if there’s any chance of adding more control-plane nodes later. It can’t be cleanly retrofitted after init.</p>

<p><code class="language-plaintext highlighter-rouge">--upload-certs</code> stores the control-plane certs encrypted in a Secret in <code class="language-plaintext highlighter-rouge">kube-system</code> so additional control-plane nodes can pull them during <code class="language-plaintext highlighter-rouge">join --control-plane</code>. The encryption key expires in ~2h.</p>

<p>Post-init, copy the kubeconfig so kubectl works as your user:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">mkdir</span> <span class="nt">-p</span> <span class="nv">$HOME</span>/.kube
<span class="nb">sudo cp</span> <span class="nt">-i</span> /etc/kubernetes/admin.conf <span class="nv">$HOME</span>/.kube/config
<span class="nb">sudo chown</span> <span class="si">$(</span><span class="nb">id</span> <span class="nt">-u</span><span class="si">)</span>:<span class="si">$(</span><span class="nb">id</span> <span class="nt">-g</span><span class="si">)</span> <span class="nv">$HOME</span>/.kube/config
</code></pre></div></div>

<h3 id="kubeadm-join">kubeadm join</h3>

<p>Joins a node to the cluster. The mechanism differs between worker and control-plane nodes. For a worker: the kubelet presents the bootstrap token to the apiserver, gets a CSR signed (TLS bootstrapping), and from that point on authenticates with its own client cert; the token is scaffolding it no longer needs. For an additional control-plane node: same flow, plus it downloads the shared CA and service-account keys from the Secret that <code class="language-plaintext highlighter-rouge">--upload-certs</code> created and uses them to sign its own component certs, including its apiserver serving cert, with the same CA.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Worker</span>
<span class="nb">sudo </span>kubeadm <span class="nb">join </span>10.0.0.10:6443 <span class="se">\</span>
  <span class="nt">--token</span> abcdef.0123456789abcdef <span class="se">\</span>
  <span class="nt">--discovery-token-ca-cert-hash</span> sha256:&lt;<span class="nb">hash</span><span class="o">&gt;</span>

<span class="c"># Additional control-plane node</span>
<span class="nb">sudo </span>kubeadm <span class="nb">join </span>10.0.0.10:6443 <span class="se">\</span>
  <span class="nt">--token</span> &lt;token&gt; <span class="se">\</span>
  <span class="nt">--discovery-token-ca-cert-hash</span> sha256:&lt;<span class="nb">hash</span><span class="o">&gt;</span> <span class="se">\</span>
  <span class="nt">--control-plane</span> <span class="se">\</span>
  <span class="nt">--certificate-key</span> &lt;cert-key&gt;
</code></pre></div></div>

<p>The CA cert hash is a SHA-256 digest of the cluster CA’s public key. It pins which cluster the joining node will trust, which prevents a rogue apiserver from hijacking the join. Compute it manually if you lost the init output:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>openssl x509 <span class="nt">-pubkey</span> <span class="nt">-in</span> /etc/kubernetes/pki/ca.crt <span class="se">\</span>
  | openssl rsa <span class="nt">-pubin</span> <span class="nt">-outform</span> der 2&gt;/dev/null <span class="se">\</span>
  | openssl dgst <span class="nt">-sha256</span> <span class="nt">-hex</span> | <span class="nb">sed</span> <span class="s1">'s/^.* //'</span>
</code></pre></div></div>
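<p>The <code class="language-plaintext highlighter-rouge">openssl rsa</code> step above only works when the CA key is RSA. A key-type-agnostic variant, sketched here on the basis that the hash covers the DER-encoded SubjectPublicKeyInfo of the CA cert, swaps in <code class="language-plaintext highlighter-rouge">openssl pkey</code>, which also handles ECDSA keys. <code class="language-plaintext highlighter-rouge">ca_cert_hash</code> is a hypothetical helper name.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical helper: discovery hash for any CA key type (RSA or ECDSA).
# Hashes the DER-encoded SubjectPublicKeyInfo extracted from the CA cert.
ca_cert_hash() {
  openssl x509 -in "$1" -noout -pubkey \
    | openssl pkey -pubin -outform DER \
    | openssl dgst -sha256 -hex \
    | sed 's/^.* //'
}

# echo "sha256:$(ca_cert_hash /etc/kubernetes/pki/ca.crt)"
</code></pre></div></div>

<p>For an RSA CA it yields the same value as the pipeline above; prefix it with <code class="language-plaintext highlighter-rouge">sha256:</code> for <code class="language-plaintext highlighter-rouge">--discovery-token-ca-cert-hash</code>.</p>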

<h3 id="kubeadm-token">kubeadm token</h3>

<p>Bootstrap tokens are Kubernetes Secrets in <code class="language-plaintext highlighter-rouge">kube-system</code> named <code class="language-plaintext highlighter-rouge">bootstrap-token-&lt;id&gt;</code>. The apiserver’s bootstrap token authenticator (enabled by kubeadm via <code class="language-plaintext highlighter-rouge">--enable-bootstrap-token-auth</code>) reads these Secrets directly; there is no separate token service. Tokens exist only to get a new node through discovery and TLS bootstrapping; once its kubelet has a client cert, the token is irrelevant. That’s why the 24h default TTL is sensible: long enough to bootstrap, short enough to limit the exposure window.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>kubeadm token list
<span class="nb">sudo </span>kubeadm token create <span class="nt">--ttl</span> 2h
<span class="nb">sudo </span>kubeadm token delete &lt;token&gt;
</code></pre></div></div>

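<p>The one to memorize. It prints a complete, ready-to-paste worker join command (for a control-plane node, add <code class="language-plaintext highlighter-rouge">--control-plane --certificate-key &lt;key&gt;</code>):</p>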

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo </span>kubeadm token create <span class="nt">--print-join-command</span>
</code></pre></div></div>
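<p>Bootstrap tokens have a fixed shape: a 6-character ID and a 16-character secret, both lowercase letters or digits, joined by a dot. Only the ID appears in the backing Secret’s name. A small sketch (the helper <code class="language-plaintext highlighter-rouge">token_secret_name</code> is hypothetical) that validates a token and derives that Secret name:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical helper: validate a bootstrap token and print its Secret name.
# Format: 6-char ID, a dot, 16-char secret, all lowercase letters or digits.
token_secret_name() {
  local token=$1
  if [[ ! $token =~ ^[a-z0-9]{6}\.[a-z0-9]{16}$ ]]; then
    echo "not a bootstrap token: $token"
    return 1
  fi
  echo "bootstrap-token-${token%%.*}"   # the Secret is named after the ID only
}

# kubectl -n kube-system get secret "$(token_secret_name abcdef.0123456789abcdef)" -o yaml
</code></pre></div></div>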

<h3 id="kubeadm-upgrade">kubeadm upgrade</h3>

<p>The upgrade order follows the version-skew policy: a kubelet must never be newer than the kube-apiserver it talks to, so the control plane goes first, then workers. The <code class="language-plaintext highlighter-rouge">kubeadm</code> binary must be upgraded on each node before you run it there, because kubeadm carries the manifests and defaults for its own release and won’t upgrade the cluster to a version newer than itself.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo </span>kubeadm upgrade plan                 <span class="c"># read-only; shows current vs available</span>
<span class="nb">sudo </span>kubeadm upgrade apply v1.35.1        <span class="c"># first control-plane node only — upgrades static pod manifests</span>
<span class="nb">sudo </span>kubeadm upgrade node                 <span class="c"># every other node (additional control-plane nodes + workers)</span>
</code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">upgrade apply</code> rewrites the static pod manifests in <code class="language-plaintext highlighter-rouge">/etc/kubernetes/manifests/</code>. The kubelet notices and restarts the control-plane pods. <code class="language-plaintext highlighter-rouge">upgrade node</code> does the same for secondary control-plane nodes, plus syncs the <code class="language-plaintext highlighter-rouge">kubelet-config</code> ConfigMap down to <code class="language-plaintext highlighter-rouge">/var/lib/kubelet/config.yaml</code> (which <code class="language-plaintext highlighter-rouge">upgrade apply</code> doesn’t do on the first node, because it knows it’s writing that ConfigMap as part of the upgrade and will sync it on the next <code class="language-plaintext highlighter-rouge">upgrade node</code> call on workers).</p>

<p>Full upgrade sequence (memorize this rhythm):</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># === First control-plane node ===</span>

<span class="c"># Upgrade the kubeadm binary first</span>
<span class="nb">sudo </span>apt-mark unhold kubeadm
<span class="nb">sudo </span>apt-get update <span class="o">&amp;&amp;</span> <span class="nb">sudo </span>apt-get <span class="nb">install</span> <span class="nt">-y</span> <span class="nv">kubeadm</span><span class="o">=</span>1.35.1-<span class="k">*</span>
<span class="nb">sudo </span>apt-mark hold kubeadm

<span class="nb">sudo </span>kubeadm upgrade plan
<span class="nb">sudo </span>kubeadm upgrade apply v1.35.1

<span class="c"># Drain, upgrade kubelet, uncordon</span>
kubectl drain &lt;cp-node&gt; <span class="nt">--ignore-daemonsets</span>
<span class="nb">sudo </span>apt-mark unhold kubelet kubectl
<span class="nb">sudo </span>apt-get update <span class="o">&amp;&amp;</span> <span class="nb">sudo </span>apt-get <span class="nb">install</span> <span class="nt">-y</span> <span class="nv">kubelet</span><span class="o">=</span>1.35.1-<span class="k">*</span> <span class="nv">kubectl</span><span class="o">=</span>1.35.1-<span class="k">*</span>
<span class="nb">sudo </span>apt-mark hold kubelet kubectl
<span class="nb">sudo </span>systemctl daemon-reload <span class="o">&amp;&amp;</span> <span class="nb">sudo </span>systemctl restart kubelet
kubectl uncordon &lt;cp-node&gt;

<span class="c"># === Each worker node (SSH into each) ===</span>

<span class="nb">sudo </span>apt-mark unhold kubeadm
<span class="nb">sudo </span>apt-get update <span class="o">&amp;&amp;</span> <span class="nb">sudo </span>apt-get <span class="nb">install</span> <span class="nt">-y</span> <span class="nv">kubeadm</span><span class="o">=</span>1.35.1-<span class="k">*</span>
<span class="nb">sudo </span>apt-mark hold kubeadm

<span class="nb">sudo </span>kubeadm upgrade node   <span class="c"># syncs kubelet config, upgrades local control-plane components if present</span>

<span class="c"># Back on the control-plane node (or wherever kubectl is configured):</span>
kubectl drain &lt;worker&gt; <span class="nt">--ignore-daemonsets</span>

<span class="c"># Back on the worker:</span>
<span class="nb">sudo </span>apt-mark unhold kubelet kubectl
<span class="nb">sudo </span>apt-get update <span class="o">&amp;&amp;</span> <span class="nb">sudo </span>apt-get <span class="nb">install</span> <span class="nt">-y</span> <span class="nv">kubelet</span><span class="o">=</span>1.35.1-<span class="k">*</span> <span class="nv">kubectl</span><span class="o">=</span>1.35.1-<span class="k">*</span>
<span class="nb">sudo </span>apt-mark hold kubelet kubectl
<span class="nb">sudo </span>systemctl daemon-reload <span class="o">&amp;&amp;</span> <span class="nb">sudo </span>systemctl restart kubelet

<span class="c"># Back on control-plane:</span>
kubectl uncordon &lt;worker&gt;
</code></pre></div></div>
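<p>The sequence above is a patch upgrade within v1.35. With packages from pkgs.k8s.io, each minor version has its own apt repository. For a minor upgrade, the repo line must therefore point at the new minor before <code class="language-plaintext highlighter-rouge">apt-get install</code> can see the new packages. This sketch assumes the repo file name and line format from the upstream install docs; <code class="language-plaintext highlighter-rouge">switch_k8s_repo</code> and the v1.36 target are illustrative.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical helper: point the pkgs.k8s.io apt repo at another minor version.
# Assumes a repo line like:
#   deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.35/deb/ /
switch_k8s_repo() {
  local new_minor=$1                                        # e.g. v1.36
  local list=${2:-/etc/apt/sources.list.d/kubernetes.list}
  sudo sed -i -E "s#(pkgs\.k8s\.io/core:/stable:/)v[0-9]+\.[0-9]+#\1${new_minor}#" "$list"
}

# For a hypothetical v1.35 to v1.36 upgrade:
#   switch_k8s_repo v1.36
#   sudo apt-get update
#   apt-cache madison kubeadm   # lists the exact package versions now available
</code></pre></div></div>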

<h3 id="kubeadm-certs">kubeadm certs</h3>

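<p>kubeadm generated the cluster CA (<code class="language-plaintext highlighter-rouge">/etc/kubernetes/pki/ca.crt</code> and <code class="language-plaintext highlighter-rouge">ca.key</code>) and signed all component certs from it. <code class="language-plaintext highlighter-rouge">certs renew</code> re-signs certs against the same CA (which has a 10-year lifetime; component certs default to 1 year). kubeadm also renews them automatically during a control-plane upgrade (<code class="language-plaintext highlighter-rouge">upgrade apply</code>, and <code class="language-plaintext highlighter-rouge">upgrade node</code> on control-plane nodes), so a cluster upgraded at least once a year rarely hits expiry.</p>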

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo </span>kubeadm certs check-expiration       <span class="c"># shows kubeadm-managed PKI only</span>
<span class="nb">sudo </span>kubeadm certs renew all
<span class="nb">sudo </span>kubeadm certs renew apiserver        <span class="c"># renew a specific cert</span>
</code></pre></div></div>

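<p>After renewing, you must restart the static-pod control plane. The components read their cert files at startup, and not all of them reload renewed files dynamically. <code class="language-plaintext highlighter-rouge">certs renew all</code> also renews the client certs embedded in <code class="language-plaintext highlighter-rouge">admin.conf</code> and the other kubeconfigs, so re-copy <code class="language-plaintext highlighter-rouge">admin.conf</code> to <code class="language-plaintext highlighter-rouge">$HOME/.kube/config</code>. The simplest restart is to move the manifests out and back:</p>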

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">cd</span> /etc/kubernetes/manifests
<span class="nb">sudo mkdir</span> <span class="nt">-p</span> /tmp/m <span class="o">&amp;&amp;</span> <span class="nb">sudo mv</span> <span class="k">*</span>.yaml /tmp/m/ <span class="o">&amp;&amp;</span> <span class="nb">sleep </span>20 <span class="o">&amp;&amp;</span> <span class="nb">sudo mv</span> /tmp/m/<span class="k">*</span>.yaml <span class="nb">.</span>
</code></pre></div></div>

<p>Note: <code class="language-plaintext highlighter-rouge">check-expiration</code> only shows the kubeadm-managed PKI. Kubelet client/serving certs rotate separately via the CSR mechanism and won’t appear here.</p>

<h3 id="kubeadm-config">kubeadm config</h3>

<p>Mostly useful for templating and pre-pulling images before an air-gapped init:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>kubeadm config print init-defaults        <span class="c"># print a full InitConfiguration/ClusterConfiguration template</span>
<span class="nb">sudo </span>kubeadm config images list           <span class="c"># images kubeadm needs for this version</span>
<span class="nb">sudo </span>kubeadm config images pull           <span class="c"># pre-pull them</span>
</code></pre></div></div>
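<p>On the templating side, you can put the same settings in a config file and pass <code class="language-plaintext highlighter-rouge">--config</code> instead of a long flag list. This sketch writes a file equivalent to the <code class="language-plaintext highlighter-rouge">kubeadm init</code> flags shown earlier. It assumes the <code class="language-plaintext highlighter-rouge">kubeadm.k8s.io/v1beta4</code> API used by recent releases, so compare it against <code class="language-plaintext highlighter-rouge">kubeadm config print init-defaults</code> on your version. <code class="language-plaintext highlighter-rouge">write_kubeadm_config</code> is a hypothetical helper.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical helper: config-file form of the init flags (v1beta4 API assumed).
write_kubeadm_config() {
  local out=${1:-kubeadm-config.yaml}
  printf '%s\n' \
'apiVersion: kubeadm.k8s.io/v1beta4
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 10.0.0.10
  bindPort: 6443
nodeRegistration:
  criSocket: unix:///var/run/containerd/containerd.sock
---
apiVersion: kubeadm.k8s.io/v1beta4
kind: ClusterConfiguration
kubernetesVersion: v1.35.0
controlPlaneEndpoint: 10.0.0.10:6443
networking:
  podSubnet: 10.244.0.0/16' | tee "$out"
}

# write_kubeadm_config
# kubeadm config validate --config kubeadm-config.yaml
# sudo kubeadm config images pull --config kubeadm-config.yaml
# sudo kubeadm init --config kubeadm-config.yaml --upload-certs
</code></pre></div></div>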

<h3 id="kubeadm-reset">kubeadm reset</h3>

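<p>Undoes what init/join did. It removes the static pod manifests, certs, and kubeconfigs under <code class="language-plaintext highlighter-rouge">/etc/kubernetes/</code>. On a control-plane node it also removes this node’s etcd member and local etcd data, and it cleans up kubelet state. Use it before re-joining a broken node. It does not clean CNI config, iptables/IPVS rules, or your <code class="language-plaintext highlighter-rouge">$HOME/.kube/config</code>, so do that manually:</p>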

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo </span>kubeadm reset
<span class="nb">sudo rm</span> <span class="nt">-rf</span> /etc/cni/net.d <span class="nv">$HOME</span>/.kube/config
<span class="nb">sudo </span>iptables <span class="nt">-F</span> <span class="o">&amp;&amp;</span> <span class="nb">sudo </span>iptables <span class="nt">-t</span> nat <span class="nt">-F</span> <span class="o">&amp;&amp;</span> <span class="nb">sudo </span>iptables <span class="nt">-t</span> mangle <span class="nt">-F</span> <span class="o">&amp;&amp;</span> <span class="nb">sudo </span>iptables <span class="nt">-X</span>
</code></pre></div></div>
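<p>When a node is leaving for good rather than being rebuilt, the usual order is drain, reset on the node, then delete the Node object. This sketch assumes kubectl is configured locally and the node is reachable over SSH by its node name (both assumptions). <code class="language-plaintext highlighter-rouge">remove_node</code> is hypothetical.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical helper: permanently remove a node from the cluster.
remove_node() {
  local node=$1
  kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data   # evict workloads first
  ssh "$node" sudo kubeadm reset -f   # -f skips the prompt; CNI/iptables cleanup is still manual
  kubectl delete node "$node"         # finally drop the Node object
}

# remove_node worker-1
</code></pre></div></div>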

<h3 id="etcd-backup--restore">etcd backup &amp; restore</h3>

<p>Not a kubeadm subcommand, but kubeadm owns the etcd PKI, so the certs are in a known place. etcdctl needs TLS creds to talk to etcd; on a kubeadm cluster these are the default paths:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Backup</span>
<span class="nb">sudo </span><span class="nv">ETCDCTL_API</span><span class="o">=</span>3 etcdctl snapshot save /opt/snapshot.db <span class="se">\</span>
  <span class="nt">--endpoints</span><span class="o">=</span>https://127.0.0.1:2379 <span class="se">\</span>
  <span class="nt">--cacert</span><span class="o">=</span>/etc/kubernetes/pki/etcd/ca.crt <span class="se">\</span>
  <span class="nt">--cert</span><span class="o">=</span>/etc/kubernetes/pki/etcd/server.crt <span class="se">\</span>
  <span class="nt">--key</span><span class="o">=</span>/etc/kubernetes/pki/etcd/server.key

<span class="c"># Verify</span>
<span class="nb">sudo </span><span class="nv">ETCDCTL_API</span><span class="o">=</span>3 etcdctl snapshot status /opt/snapshot.db <span class="nt">--write-out</span><span class="o">=</span>table

<span class="c"># Restore to a new data dir</span>
<span class="nb">sudo </span><span class="nv">ETCDCTL_API</span><span class="o">=</span>3 etcdctl snapshot restore /opt/snapshot.db <span class="se">\</span>
  <span class="nt">--data-dir</span><span class="o">=</span>/var/lib/etcd-restore
</code></pre></div></div>
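<p>etcdctl also reads each flag from an <code class="language-plaintext highlighter-rouge">ETCDCTL_*</code> environment variable, which saves retyping the TLS flags during a backup or restore task. This sketch assumes a root shell on the control-plane node (for example after <code class="language-plaintext highlighter-rouge">sudo -i</code>) and the default kubeadm paths.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># etcdctl maps each flag to an ETCDCTL_* variable (--cacert becomes ETCDCTL_CACERT, and so on).
export ETCDCTL_API=3
export ETCDCTL_ENDPOINTS=https://127.0.0.1:2379
export ETCDCTL_CACERT=/etc/kubernetes/pki/etcd/ca.crt
export ETCDCTL_CERT=/etc/kubernetes/pki/etcd/server.crt
export ETCDCTL_KEY=/etc/kubernetes/pki/etcd/server.key

# etcdctl endpoint health
# etcdctl snapshot save /opt/snapshot.db
</code></pre></div></div>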

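<p>After restore, point the etcd static pod at the new data dir. Edit <code class="language-plaintext highlighter-rouge">/etc/kubernetes/manifests/etcd.yaml</code> and change the <code class="language-plaintext highlighter-rouge">hostPath</code> of the <code class="language-plaintext highlighter-rouge">etcd-data</code> volume to <code class="language-plaintext highlighter-rouge">/var/lib/etcd-restore</code>. The container still mounts that volume at <code class="language-plaintext highlighter-rouge">/var/lib/etcd</code>, so <code class="language-plaintext highlighter-rouge">--data-dir</code> can stay as is. If you change <code class="language-plaintext highlighter-rouge">--data-dir</code> instead, change the <code class="language-plaintext highlighter-rouge">etcd-data</code> <code class="language-plaintext highlighter-rouge">mountPath</code> to match. The kubelet then restarts etcd on the restored data.</p>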
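<p>The <code class="language-plaintext highlighter-rouge">hostPath</code> edit can be scripted. This sketch assumes the kubeadm-generated <code class="language-plaintext highlighter-rouge">etcd.yaml</code> layout, in which the only line of the form <code class="language-plaintext highlighter-rouge">path: /var/lib/etcd</code> is the <code class="language-plaintext highlighter-rouge">etcd-data</code> hostPath. <code class="language-plaintext highlighter-rouge">point_etcd_at</code> is hypothetical. It keeps its backup outside the manifests directory so the kubelet does not pick the copy up as another manifest.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical helper: repoint the etcd-data hostPath after a snapshot restore.
point_etcd_at() {
  local newdir=$1
  local manifest=${2:-/etc/kubernetes/manifests/etcd.yaml}
  sudo cp "$manifest" /tmp/etcd.yaml.bak   # backup lives outside the watched directory
  # Only the hostPath line matches "path: /var/lib/etcd"; mountPath and --data-dir are left alone.
  sudo sed -i -E "s#^( *)path: /var/lib/etcd/?\$#\1path: ${newdir}#" "$manifest"
}

# point_etcd_at /var/lib/etcd-restore
# sudo crictl ps --name etcd   # wait for a fresh etcd container
</code></pre></div></div>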

<h3 id="paths-reference">Paths reference</h3>

<table>
  <thead>
    <tr>
      <th>Path</th>
      <th>What it is</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">/etc/kubernetes/manifests/</code></td>
      <td>Static pod manifests — kubelet watches this</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">/etc/kubernetes/pki/</code></td>
      <td>Control-plane certs and keys (kubeadm’s CA lives here)</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">/etc/kubernetes/pki/etcd/</code></td>
      <td>etcd-specific certs</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">/etc/kubernetes/admin.conf</code></td>
      <td>Cluster-admin kubeconfig</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">/var/lib/kubelet/config.yaml</code></td>
      <td>kubelet runtime config (synced from <code class="language-plaintext highlighter-rouge">kubelet-config</code> ConfigMap)</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">/var/lib/etcd/</code></td>
      <td>etcd data directory (default)</td>
    </tr>
  </tbody>
</table>

<h2 id="gotchas">Gotchas</h2>

<p><strong>Upgrade the kubeadm binary before running upgrade commands on each node.</strong> kubeadm refuses to upgrade a cluster to a version newer than itself; it carries the manifests and defaults for its own release.</p>

<p><strong><code class="language-plaintext highlighter-rouge">upgrade apply</code> is first-control-plane-node-only; every subsequent node uses <code class="language-plaintext highlighter-rouge">upgrade node</code>.</strong> <code class="language-plaintext highlighter-rouge">apply</code> writes the new <code class="language-plaintext highlighter-rouge">kubelet-config</code> ConfigMap; <code class="language-plaintext highlighter-rouge">upgrade node</code> reads it. Running <code class="language-plaintext highlighter-rouge">apply</code> on a second control-plane node targets the wrong entrypoint and will either fail or mismatch state.</p>

<p><strong>Drain before upgrading the kubelet, uncordon after.</strong> Restarting the kubelet disrupts pods on that node; draining first evicts them so controller-managed workloads are recreated on other nodes.</p>

<p><strong>After <code class="language-plaintext highlighter-rouge">certs renew</code>, you must bounce the static pods.</strong> Renewing writes new files to disk. The running processes still have old file descriptors. They don’t notice the new files until they restart. A kubelet restart alone doesn’t help — the static pods are child processes of the kubelet that need their own restart.</p>

<p><strong><code class="language-plaintext highlighter-rouge">--control-plane-endpoint</code> must be set at init time.</strong> There’s no clean way to add it after the fact because it’s baked into multiple kubeconfigs and the apiserver cert SAN list.</p>

<p><strong>Local edits to <code class="language-plaintext highlighter-rouge">/var/lib/kubelet/config.yaml</code> don’t survive an upgrade.</strong> The <code class="language-plaintext highlighter-rouge">kubelet-config</code> phase overwrites it from the ConfigMap. The ConfigMap is the source of truth.</p>

<h2 id="open-questions">Open questions</h2>

<ul>
  <li>Kubelet serving certs rotate via CSR — what controls the rotation interval, and how does the kubelet signal readiness to rotate?</li>
  <li>In air-gapped clusters, <code class="language-plaintext highlighter-rouge">kubeadm config images list</code> covers control-plane images. What about CNI and CoreDNS — are those included, or is that a separate pre-pull step?</li>
</ul>

<h2 id="references">References</h2>

<ul>
  <li><a href="https://kubernetes.io/docs/reference/setup-tools/kubeadm/">kubeadm reference</a></li>
  <li><a href="https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/">Upgrading kubeadm clusters</a></li>
  <li><a href="https://kubernetes.io/docs/setup/best-practices/certificates/">PKI certificates and requirements</a></li>
  <li><a href="https://kubernetes.io/docs/tasks/administer-cluster/configure-upgrade-etcd/#backing-up-an-etcd-cluster">Backing up etcd</a></li>
</ul>]]></content><author><name>Tenzin Lhakhang</name></author><category term="kubernetes" /><category term="kubeadm" /><category term="cka" /><category term="cluster-administration" /><summary type="html"><![CDATA[kubeadm subcommands, flags, and workflows — upgrade sequence, token regeneration, and cert management.]]></summary></entry><entry><title type="html">CKA Exam Preparation — 7-Day Plan &amp;amp; Reference</title><link href="https://learn.tenzin.io/2026/06/24/cka-exam-prep.html" rel="alternate" type="text/html" title="CKA Exam Preparation — 7-Day Plan &amp;amp; Reference" /><published>2026-06-24T00:00:00+00:00</published><updated>2026-06-24T00:00:00+00:00</updated><id>https://learn.tenzin.io/2026/06/24/cka-exam-prep</id><content type="html" xml:base="https://learn.tenzin.io/2026/06/24/cka-exam-prep.html"><![CDATA[<h2 id="summary">Summary</h2>

<p>A tailored 7-day study plan for the CKA exam (Kubernetes v1.35, target date July 9 2026), plus an exam-day reference covering allowed resources, domain weights, and doc links you can open during the exam. Assumes solid prior knowledge — focused on exam mechanics and muscle memory, not concept introductions.</p>

<h2 id="exam-setup-at-a-glance">Exam setup at a glance</h2>

<table>
  <thead>
    <tr>
      <th>Item</th>
      <th>Detail</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>K8s version</td>
      <td>v1.35</td>
    </tr>
    <tr>
      <td>Time</td>
      <td>120 minutes</td>
    </tr>
    <tr>
      <td>Tasks</td>
      <td>~15–25, each weighted (do high-weight first)</td>
    </tr>
    <tr>
      <td>Pass</td>
      <td>66% (partial credit)</td>
    </tr>
    <tr>
      <td>Clusters</td>
      <td>~6 clusters, each with its own nodes; switch with <code class="language-plaintext highlighter-rouge">kubectl config use-context</code></td>
    </tr>
    <tr>
      <td>Node access</td>
      <td>SSH to control-plane/worker nodes, with sudo</td>
    </tr>
    <tr>
      <td>Cluster tooling</td>
      <td>kubeadm, etcd, CoreDNS, a CNI plugin; Helm + Kustomize expected</td>
    </tr>
    <tr>
      <td>Allowed docs</td>
      <td>kubernetes.io/docs, kubernetes.io/blog, helm.sh/docs, gateway-api.sigs.k8s.io (one browser tab)</td>
    </tr>
    <tr>
      <td>Simulator</td>
      <td>2x Killer.sh sessions (CKA-A, CKA-B), 17 questions each, 36h access per session</td>
    </tr>
  </tbody>
</table>

<p>Re-verify the K8s version at the <a href="https://docs.linuxfoundation.org/tc-docs/certification/faq-cka-ckad-cks">LF candidate FAQ</a> a few days before the exam — it realigns 4–8 weeks after a K8s release.</p>

<p>Critical exam habit: every task starts with a context-switch command. Run it. A correct answer in the wrong cluster scores zero.</p>

<p>This plan assumes solid knowledge of Kubernetes internals (kubeadm, etcd/PKI, CNI, networking). It is tuned for exam speed, imperative fluency, and the highest-weight domains — not for learning concepts from scratch.</p>

<h2 id="allowed-resources-during-the-exam">Allowed resources during the exam</h2>

<p>Open-book, but restricted to these sites through the browser inside the exam VM — no Google, Stack Overflow, GitHub, or personal notes. The docs search bar is fine, but you must not open external search results.</p>

<ul>
  <li><a href="https://kubernetes.io/docs">Kubernetes Docs</a></li>
  <li><a href="https://kubernetes.io/blog/">Kubernetes Blog</a></li>
  <li><a href="https://helm.sh/docs">Helm Docs</a></li>
  <li><a href="https://gateway-api.sigs.k8s.io">Gateway API Docs</a> (CKA-only)</li>
  <li>Quick Reference links shown in a given task’s info box</li>
</ul>

<p>Every per-domain link below resolves to one of these allowed domains. The <a href="https://docs.linuxfoundation.org/tc-docs/certification/faq-cka-ckad-cks">LF candidate FAQ</a> and the <a href="https://github.com/cncf/curriculum">CNCF curriculum</a> are pre-exam reference only — they will not open during the exam.</p>

<h2 id="grading-breakdown--doc-links">Grading breakdown + doc links</h2>

<p>Five domains. <strong>Troubleshooting (30%) and Cluster Architecture (25%) are 55% of the exam</strong> — spend your time accordingly.</p>

<h3 id="1-cluster-architecture-installation--configuration--25">1. Cluster Architecture, Installation &amp; Configuration — 25%</h3>

<p>RBAC, kubeadm cluster create/upgrade, HA control plane, Helm, Kustomize, extension interfaces (CNI/CSI/CRI), CRDs, Operators, etcd backup/restore.</p>

<ul>
  <li><a href="https://kubernetes.io/docs/reference/access-authn-authz/rbac/">RBAC</a></li>
  <li><a href="https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/">kubeadm: create cluster</a></li>
  <li><a href="https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/">kubeadm: upgrade</a></li>
  <li><a href="https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/high-availability/">HA topology</a></li>
  <li><a href="https://kubernetes.io/docs/tasks/administer-cluster/configure-upgrade-etcd/">etcd backup/restore</a></li>
  <li><a href="https://kubernetes.io/docs/tasks/manage-kubernetes-objects/kustomization/">Kustomize</a></li>
  <li><a href="https://helm.sh/docs/">Helm</a></li>
  <li><a href="https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/">CRDs</a></li>
  <li><a href="https://kubernetes.io/docs/concepts/extend-kubernetes/operator/">Operators</a></li>
</ul>

<h3 id="2-workloads--scheduling--15">2. Workloads &amp; Scheduling — 15%</h3>

<p>Deployments &amp; rollouts, ConfigMaps/Secrets, HPA, scheduling (affinity, taints/tolerations, nodeSelector).</p>

<ul>
  <li><a href="https://kubernetes.io/docs/concepts/workloads/controllers/deployment/">Deployments</a></li>
  <li><a href="https://kubernetes.io/docs/concepts/configuration/configmap/">ConfigMaps</a></li>
  <li><a href="https://kubernetes.io/docs/concepts/configuration/secret/">Secrets</a></li>
  <li><a href="https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/">HPA</a></li>
  <li><a href="https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/">Assigning Pods to nodes</a></li>
  <li><a href="https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/">Taints &amp; tolerations</a></li>
</ul>

<h3 id="3-storage--10">3. Storage — 10%</h3>

<p>StorageClasses, dynamic provisioning, PV/PVC, access modes, reclaim policies.</p>

<ul>
  <li><a href="https://kubernetes.io/docs/concepts/storage/storage-classes/">Storage classes</a></li>
  <li><a href="https://kubernetes.io/docs/concepts/storage/persistent-volumes/">Persistent volumes</a></li>
  <li><a href="https://kubernetes.io/docs/concepts/storage/dynamic-provisioning/">Dynamic provisioning</a></li>
</ul>

<h3 id="4-services--networking--20">4. Services &amp; Networking — 20%</h3>

<p>Pod connectivity, NetworkPolicies, Service types &amp; endpoints, Gateway API, Ingress, CoreDNS.</p>

<ul>
  <li><a href="https://kubernetes.io/docs/concepts/services-networking/service/">Services</a></li>
  <li><a href="https://kubernetes.io/docs/concepts/services-networking/network-policies/">Network policies</a></li>
  <li><a href="https://kubernetes.io/docs/concepts/services-networking/ingress/">Ingress</a></li>
  <li><a href="https://kubernetes.io/docs/concepts/services-networking/gateway/">Gateway API (concepts)</a></li>
  <li><a href="https://gateway-api.sigs.k8s.io">Gateway API (full reference)</a></li>
  <li><a href="https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/">DNS for services/pods</a></li>
</ul>

<h3 id="5-troubleshooting--30--highest-weight">5. Troubleshooting — 30% ← highest weight</h3>

<p>Cluster/node failures, control-plane components, kubelet, networking, app failures, resource monitoring, container logs.</p>

<ul>
  <li><a href="https://kubernetes.io/docs/tasks/debug/debug-cluster/">Debug a cluster</a></li>
  <li><a href="https://kubernetes.io/docs/tasks/debug/debug-application/debug-running-pod/">Debug running Pods</a></li>
  <li><a href="https://kubernetes.io/docs/tasks/debug/debug-application/debug-service/">Debug services</a></li>
  <li><a href="https://kubernetes.io/docs/tasks/debug/debug-cluster/kubelet/">kubelet troubleshooting</a></li>
  <li><a href="https://kubernetes.io/docs/tasks/debug/debug-cluster/resource-usage-monitoring/">Resource usage monitoring</a></li>
</ul>

<h2 id="day-0-before-day-1">Day 0 (before Day 1)</h2>

<p>Save one Killer.sh session for the final dry run. Each session lasts only 36h once activated, so activate the first one now. Set up a local 3-node kubeadm sandbox (1 control-plane + 2 workers on throwaway Ubuntu 24.04 VMs) so you have a cluster you can break and rebuild fast.</p>

<p>Lock in your exam shell setup so it’s automatic on exam day:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">alias </span><span class="nv">k</span><span class="o">=</span>kubectl
<span class="nb">export </span><span class="k">do</span><span class="o">=</span><span class="s2">"--dry-run=client -o yaml"</span>
<span class="nb">export </span><span class="nv">now</span><span class="o">=</span><span class="s2">"--force --grace-period=0"</span>
<span class="nb">source</span> &lt;<span class="o">(</span>kubectl completion bash<span class="o">)</span>
<span class="nb">complete</span> <span class="nt">-o</span> default <span class="nt">-F</span> __start_kubectl k
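<span class="nb">echo</span> <span class="s1">'set ts=2 sw=2 et'</span> <span class="o">&gt;&gt;</span> ~/.vimrc  <span class="c"># 2-space indents, spaces instead of tabs</span>
<span class="c"># pasting YAML in vim: ':set paste' first so autoindent doesn't stair-step it, ':set nopaste' after</span>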
</code></pre></div></div>

<h2 id="the-7-day-plan">The 7-day plan</h2>

<h3 id="day-1--troubleshooting-i-30-domain-part-1">Day 1 — Troubleshooting I (30% domain, part 1)</h3>

<p>Highest weight, so it goes first and gets two days.</p>

<ul>
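  <li>Broken kubelet: stop it, misconfigure <code class="language-plaintext highlighter-rouge">/var/lib/kubelet/config.yaml</code> or the <code class="language-plaintext highlighter-rouge">--kubeconfig</code> path in the kubelet’s systemd drop-in (<code class="language-plaintext highlighter-rouge">10-kubeadm.conf</code>), then fix it using <code class="language-plaintext highlighter-rouge">systemctl status kubelet</code> and <code class="language-plaintext highlighter-rouge">journalctl -u kubelet</code>. After editing the drop-in, run <code class="language-plaintext highlighter-rouge">systemctl daemon-reload</code> before restarting the kubelet.</li>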
  <li>Static pod failures: break a control-plane manifest in <code class="language-plaintext highlighter-rouge">/etc/kubernetes/manifests/</code>, watch the API server / scheduler / controller-manager go down, recover.</li>
  <li><code class="language-plaintext highlighter-rouge">kubectl get nodes</code> NotReady scenarios: CNI down, kubelet down, cert expiry, disk pressure.</li>
  <li>Drill: <code class="language-plaintext highlighter-rouge">crictl ps</code>, <code class="language-plaintext highlighter-rouge">crictl logs</code>, <code class="language-plaintext highlighter-rouge">crictl inspect</code> on a node where kubectl can’t reach a pod.</li>
  <li><strong>Reps target:</strong> rebuild a NotReady node to Ready 5+ times until it’s reflexive.</li>
</ul>
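<p>The static pod drill can be scripted as a few shell helpers. This is a minimal sketch that assumes a kubeadm node whose manifests live in <code class="language-plaintext highlighter-rouge">/etc/kubernetes/manifests/</code>. The function names and the backup directory are hypothetical. Backups go outside the manifests directory because the kubelet tries to run whatever it finds there. <code class="language-plaintext highlighter-rouge">break_manifest</code> assumes the manifest contains <code class="language-plaintext highlighter-rouge">--bind-address=</code>, which kubeadm’s scheduler and controller-manager manifests do.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#!/usr/bin/env bash
# Static pod drill helpers (hypothetical names). Override MANIFEST_DIR and
# BACKUP_DIR to rehearse against scratch directories instead of a real node.
MANIFEST_DIR="${MANIFEST_DIR:-/etc/kubernetes/manifests}"
# Keep backups OUTSIDE the watched directory so the kubelet never sees a second copy.
BACKUP_DIR="${BACKUP_DIR:-/root/manifest-backups}"

backup_manifest() {   # $1: file name, e.g. kube-scheduler.yaml
  mkdir -p "$BACKUP_DIR"
  cp -p "$MANIFEST_DIR/$1" "$BACKUP_DIR/$1"
}

break_manifest() {    # $1: file name; misspells a flag so the component fails to start
  sed -i 's/--bind-address=/--bind-adress=/' "$MANIFEST_DIR/$1"
}

restore_manifest() {  # $1: file name; the kubelet sees the change and recreates the pod
  cp -p "$BACKUP_DIR/$1" "$MANIFEST_DIR/$1"
}

# Drill on a control-plane node:
#   backup_manifest kube-scheduler.yaml
#   break_manifest kube-scheduler.yaml
#   crictl ps -a --name kube-scheduler     # watch the container exit and restart
#   restore_manifest kube-scheduler.yaml
</code></pre></div></div>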

<h3 id="day-2--troubleshooting-ii--etcd">Day 2 — Troubleshooting II + etcd</h3>

<ul>
  <li>App-level debugging: CrashLoopBackOff, ImagePullBackOff, wrong probes, OOMKills, pending pods (taints, resources, affinity). Read events first: <code class="language-plaintext highlighter-rouge">kubectl get events --sort-by=.metadata.creationTimestamp</code>.</li>
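  <li>Service/networking debug: endpoints empty, wrong selector, DNS resolution (<code class="language-plaintext highlighter-rouge">kubectl run tmp --rm -it --restart=Never --image=busybox:1.28 -- nslookup svc</code>; pin an older busybox because <code class="language-plaintext highlighter-rouge">nslookup</code> in newer busybox images is unreliable against cluster DNS).</li>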
  <li>
    <p><strong>etcd snapshot + restore</strong> end-to-end:</p>

    <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">ETCDCTL_API</span><span class="o">=</span>3 etcdctl snapshot save /tmp/snap.db <span class="se">\</span>
  <span class="nt">--cacert</span><span class="o">=</span>... <span class="nt">--cert</span><span class="o">=</span>... <span class="nt">--key</span><span class="o">=</span>... <span class="nt">--endpoints</span><span class="o">=</span>https://127.0.0.1:2379
<span class="nv">ETCDCTL_API</span><span class="o">=</span>3 etcdctl snapshot restore /tmp/snap.db <span class="nt">--data-dir</span><span class="o">=</span>/var/lib/etcd-restore
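<span class="c"># then edit the etcd-data hostPath in /etc/kubernetes/manifests/etcd.yaml to /var/lib/etcd-restore;</span>
<span class="c"># the kubelet recreates the etcd pod when the manifest changes, so no manual restart is needed</span>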
</code></pre></div>    </div>
  </li>
  <li><strong>Reps target:</strong> full backup→destroy→restore→verify cycle 3 times under 10 min each.</li>
</ul>
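<p>The final restore step is repointing the manifest. Below is a sketch of a hypothetical helper that rewrites only the <code class="language-plaintext highlighter-rouge">etcd-data</code> hostPath in a kubeadm-generated <code class="language-plaintext highlighter-rouge">etcd.yaml</code>. It assumes kubeadm’s default layout, where that volume’s host directory sits on its own <code class="language-plaintext highlighter-rouge">path: /var/lib/etcd</code> line. The <code class="language-plaintext highlighter-rouge">mountPath</code> and <code class="language-plaintext highlighter-rouge">--data-dir</code> values are deliberately left alone. They are paths inside the container, and the container keeps reading the same path while the host directory behind it changes.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#!/usr/bin/env bash
# Hypothetical helper: repoint the etcd-data hostPath volume after a restore.
# $1: manifest (normally /etc/kubernetes/manifests/etcd.yaml)
# $2: restored data dir on the host, e.g. /var/lib/etcd-restore
set_etcd_host_dir() {
  # Matches only an indented "path: /var/lib/etcd" line, i.e. the hostPath.
  # "- mountPath: /var/lib/etcd" and "--data-dir=/var/lib/etcd" do not match.
  sed -i "s#^\([[:space:]]*path:\) /var/lib/etcd\$#\1 $2#" "$1"
}

# On the control-plane node:
#   set_etcd_host_dir /etc/kubernetes/manifests/etcd.yaml /var/lib/etcd-restore
#   crictl ps --name etcd      # wait for a fresh etcd container
#   kubectl get pods -A        # the API server answers again once etcd is back
</code></pre></div></div>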

<h3 id="day-3--cluster-architecture-i-25-domain">Day 3 — Cluster Architecture I (25% domain)</h3>

<ul>
  <li><code class="language-plaintext highlighter-rouge">kubeadm init</code> from scratch + join a worker. Time yourself.</li>
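  <li><strong>kubeadm upgrade</strong> of a control-plane node and a worker. Control plane: upgrade the kubeadm package, <code class="language-plaintext highlighter-rouge">kubeadm upgrade plan</code>, <code class="language-plaintext highlighter-rouge">kubeadm upgrade apply</code>, drain, <code class="language-plaintext highlighter-rouge">apt</code> the kubelet/kubectl, restart the kubelet, uncordon. Worker: the same, but with <code class="language-plaintext highlighter-rouge">kubeadm upgrade node</code> instead of plan/apply. Near-guaranteed exam task, so make it boring.</li>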
  <li>RBAC: create a Role/ClusterRole + binding + ServiceAccount, then <strong>verify</strong> with <code class="language-plaintext highlighter-rouge">kubectl auth can-i list pods -n ns --as=system:serviceaccount:ns:sa</code> (swap in the verb and resource you granted).</li>
  <li><strong>Reps target:</strong> one clean upgrade + one RBAC grant verified by <code class="language-plaintext highlighter-rouge">auth can-i</code>.</li>
</ul>
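<p>The upgrade sequence can be written as shell functions. This sketch assumes Debian/Ubuntu nodes installed from the <code class="language-plaintext highlighter-rouge">pkgs.k8s.io</code> apt repository, with the packages held via <code class="language-plaintext highlighter-rouge">apt-mark hold</code>. The function names and the <code class="language-plaintext highlighter-rouge">1.31.2</code>-style version argument are hypothetical, so use whatever version the task asks for. A minor-version bump may also require switching the apt repository line to the new minor first, because that repository is published per minor version.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#!/usr/bin/env bash
# Hypothetical helpers; run as root. $1 = target version without the "v", e.g. 1.31.2.

upgrade_kubeadm_pkg() {
  apt-mark unhold kubeadm
  apt-get update
  apt-get install -y "kubeadm=$1-*"     # the glob matches the package revision suffix
  apt-mark hold kubeadm
}

upgrade_kubelet_pkgs() {
  apt-mark unhold kubelet kubectl
  apt-get install -y "kubelet=$1-*" "kubectl=$1-*"
  apt-mark hold kubelet kubectl
  systemctl daemon-reload
  systemctl restart kubelet
}

# First control-plane node. $2 = this node's name.
upgrade_control_plane() {
  upgrade_kubeadm_pkg "$1"
  kubeadm upgrade plan
  kubeadm upgrade apply "v$1"
  kubectl drain "$2" --ignore-daemonsets
  upgrade_kubelet_pkgs "$1"
  kubectl uncordon "$2"
}

# Worker: run ON the worker. Drain it from the control plane first
# (kubectl drain WORKER --ignore-daemonsets) and uncordon it afterwards.
upgrade_worker() {
  upgrade_kubeadm_pkg "$1"
  kubeadm upgrade node
  upgrade_kubelet_pkgs "$1"
}
</code></pre></div></div>

<p>Confirm the result with <code class="language-plaintext highlighter-rouge">kubectl get nodes</code>. The VERSION column reports each node’s kubelet version, so a node that still shows the old version has not had its kubelet upgraded and restarted yet.</p>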

<h3 id="day-4--cluster-architecture-ii-helm-kustomize-crds">Day 4 — Cluster Architecture II: Helm, Kustomize, CRDs</h3>

<ul>
  <li>Helm: <code class="language-plaintext highlighter-rouge">repo add</code>, <code class="language-plaintext highlighter-rouge">search</code>, <code class="language-plaintext highlighter-rouge">install</code>, <code class="language-plaintext highlighter-rouge">upgrade</code>, <code class="language-plaintext highlighter-rouge">--set</code>, <code class="language-plaintext highlighter-rouge">template</code>, <code class="language-plaintext highlighter-rouge">uninstall</code>, list releases. Know <code class="language-plaintext highlighter-rouge">helm install --dry-run</code>.</li>
  <li>Kustomize: base + overlay, <code class="language-plaintext highlighter-rouge">kubectl apply -k</code>, patches, <code class="language-plaintext highlighter-rouge">images:</code> and <code class="language-plaintext highlighter-rouge">replicas:</code> transforms.</li>
  <li>CRDs/Operators: apply a CRD, create a CR instance, install a simple operator. Understand the apply order (CRD before CR).</li>
  <li><strong>Reps target:</strong> install one chart via Helm and one app via Kustomize overlay, both from docs only.</li>
</ul>
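<p>Here is a sketch of the base + overlay layout, written as a hypothetical <code class="language-plaintext highlighter-rouge">kustomize_scaffold</code> function so both <code class="language-plaintext highlighter-rouge">kustomization.yaml</code> files are visible in one place. It assumes the base Deployment is named <code class="language-plaintext highlighter-rouge">web</code>, runs an <code class="language-plaintext highlighter-rouge">nginx</code> image, and is saved as <code class="language-plaintext highlighter-rouge">base/deployment.yaml</code> (for example via <code class="language-plaintext highlighter-rouge">k create deploy web --image=nginx $do &gt; base/deployment.yaml</code>). The overlay’s <code class="language-plaintext highlighter-rouge">images:</code> entry matches on the image name, and its <code class="language-plaintext highlighter-rouge">replicas:</code> entry matches on the Deployment name. The <code class="language-plaintext highlighter-rouge">1.27</code> tag is an arbitrary example. <code class="language-plaintext highlighter-rouge">tee</code> echoes each file as it writes it, which doubles as a quick review.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#!/usr/bin/env bash
# Hypothetical scaffold: $1 = project root. Writes base/ and overlays/prod/.
kustomize_scaffold() {
  local root=$1
  mkdir -p "$root/base" "$root/overlays/prod"

  printf '%s\n' \
    'apiVersion: kustomize.config.k8s.io/v1beta1' \
    'kind: Kustomization' \
    'resources:' \
    '- deployment.yaml' | tee "$root/base/kustomization.yaml"

  # The overlay pulls in the base, then retags the image and rescales "web".
  printf '%s\n' \
    'apiVersion: kustomize.config.k8s.io/v1beta1' \
    'kind: Kustomization' \
    'resources:' \
    '- ../../base' \
    'images:' \
    '- name: nginx' \
    '  newTag: "1.27"' \
    'replicas:' \
    '- name: web' \
    '  count: 3' | tee "$root/overlays/prod/kustomization.yaml"
}
</code></pre></div></div>

<p>From the project root, <code class="language-plaintext highlighter-rouge">kubectl kustomize overlays/prod</code> renders the merged manifests without applying them, and <code class="language-plaintext highlighter-rouge">kubectl apply -k overlays/prod</code> applies them.</p>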

<h3 id="day-5--services--networking-20--storage-10">Day 5 — Services &amp; Networking (20%) + Storage (10%)</h3>

<ul>
  <li>Services: ClusterIP/NodePort/LoadBalancer, multi-port, <code class="language-plaintext highlighter-rouge">kubectl expose</code>, endpoints inspection.</li>
  <li><strong>Gateway API</strong> (GA, on the exam): GatewayClass → Gateway → HTTPRoute. Practice from the docs — the YAML is less reflexive than Ingress.</li>
  <li>Ingress + ingress controller routing rules.</li>
  <li>NetworkPolicy: default-deny, then allow-by-label/namespace/port — both ingress and egress.</li>
  <li>CoreDNS: inspect the ConfigMap, understand cluster DNS resolution paths.</li>
  <li>Storage: StorageClass + dynamic PVC, manual PV/PVC bind, access modes, reclaim policy, resize.</li>
  <li><strong>Reps target:</strong> a default-deny NetworkPolicy + one allow rule, and a dynamically-provisioned PVC bound to a pod.</li>
</ul>
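<p>The NetworkPolicy rep can be sketched as hypothetical shell functions that print YAML for the namespace passed as <code class="language-plaintext highlighter-rouge">$1</code>, so you can read the YAML before piping it into <code class="language-plaintext highlighter-rouge">kubectl apply -f -</code>. The first policy denies all ingress and egress for every pod in the namespace. The second is an allow rule by namespace and port. Once egress is denied, pods can no longer reach CoreDNS, so DNS (port 53 into <code class="language-plaintext highlighter-rouge">kube-system</code>) is usually the first thing to reopen. It relies on the <code class="language-plaintext highlighter-rouge">kubernetes.io/metadata.name</code> label, which Kubernetes sets on every namespace automatically.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#!/usr/bin/env bash
# Hypothetical helpers; $1 = target namespace.
netpol_default_deny() {
  printf '%s\n' \
    'apiVersion: networking.k8s.io/v1' \
    'kind: NetworkPolicy' \
    'metadata:' \
    '  name: default-deny-all' \
    "  namespace: $1" \
    'spec:' \
    '  podSelector: {}          # empty selector = every pod in the namespace' \
    '  policyTypes:' \
    '  - Ingress' \
    '  - Egress'
}

netpol_allow_dns() {
  printf '%s\n' \
    'apiVersion: networking.k8s.io/v1' \
    'kind: NetworkPolicy' \
    'metadata:' \
    '  name: allow-dns-egress' \
    "  namespace: $1" \
    'spec:' \
    '  podSelector: {}' \
    '  policyTypes:' \
    '  - Egress' \
    '  egress:' \
    '  - to:' \
    '    - namespaceSelector:' \
    '        matchLabels:' \
    '          kubernetes.io/metadata.name: kube-system' \
    '    ports:' \
    '    - protocol: UDP' \
    '      port: 53' \
    '    - protocol: TCP' \
    '      port: 53'
}

# Usage:
#   netpol_default_deny web | kubectl apply -f -
#   netpol_allow_dns web | kubectl apply -f -
</code></pre></div></div>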

<h3 id="day-6--workloads--scheduling-15--speed-pass">Day 6 — Workloads &amp; Scheduling (15%) + speed pass</h3>

<ul>
  <li>Deployments: rollout, <code class="language-plaintext highlighter-rouge">set image</code>, <code class="language-plaintext highlighter-rouge">rollout undo</code>, <code class="language-plaintext highlighter-rouge">rollout status</code>, scaling, revision history.</li>
  <li>ConfigMaps/Secrets: create imperatively, mount as env and as volume.</li>
  <li>Scheduling: nodeSelector, affinity/anti-affinity, taints/tolerations, manual scheduling via <code class="language-plaintext highlighter-rouge">nodeName</code>, static pods.</li>
  <li>HPA: create against a Deployment with metrics-server present.</li>
  <li><strong>Speed pass:</strong> 10–12 mixed imperative tasks against the clock. Everything via <code class="language-plaintext highlighter-rouge">k ... $do &gt; x.yaml &amp;&amp; vim x.yaml &amp;&amp; k apply -f x.yaml</code>. No hand-writing YAML from a blank file.</li>
</ul>

<h3 id="day-7--full-dry-run--cleanup">Day 7 — Full dry run + cleanup</h3>

<p>Use your second Killer.sh session as a timed 2-hour mock. Treat it as the real thing: context-switch every task, flag-and-skip hard ones, high-weight first. Review only the questions you missed and re-drill those exact mechanics. Re-read the <a href="https://docs.linuxfoundation.org/tc-docs/certification/faq-cka-ckad-cks">LF candidate FAQ</a> and confirm the K8s version hasn’t shifted. Keep it a light day otherwise — don’t cram new material the night before.</p>

<h2 id="exam-day-tactics">Exam-day tactics</h2>

<ul>
  <li>First 60 seconds: set aliases, <code class="language-plaintext highlighter-rouge">$do</code>/<code class="language-plaintext highlighter-rouge">$now</code>, completion, vim config.</li>
  <li>Read each task’s <strong>weight</strong>. Do the heavy ones first; flag and skip anything that stalls you &gt;~8 min.</li>
  <li>Always run the provided <code class="language-plaintext highlighter-rouge">kubectl config use-context</code> line first. Confirm with <code class="language-plaintext highlighter-rouge">k config current-context</code>.</li>
  <li>Bookmark the doc pages above; navigate by docs search, don’t browse.</li>
  <li>Imperative-first: generate YAML with <code class="language-plaintext highlighter-rouge">$do</code>, edit, apply. Hand-write only what you must (PV, NetworkPolicy, Gateway).</li>
  <li>Partial credit is real — get the easy 70% of a task done rather than perfecting one.</li>
  <li>Verify your work: <code class="language-plaintext highlighter-rouge">k get</code>, <code class="language-plaintext highlighter-rouge">k describe</code>, <code class="language-plaintext highlighter-rouge">auth can-i</code>, <code class="language-plaintext highlighter-rouge">nslookup</code>, <code class="language-plaintext highlighter-rouge">curl</code> from a temp pod.</li>
</ul>

<h2 id="quick-command-reference">Quick command reference</h2>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>k run nginx <span class="nt">--image</span><span class="o">=</span>nginx <span class="nv">$do</span>
k create deploy web <span class="nt">--image</span><span class="o">=</span>nginx <span class="nt">--replicas</span><span class="o">=</span>3 <span class="nv">$do</span>
k expose deploy web <span class="nt">--port</span><span class="o">=</span>80 <span class="nt">--target-port</span><span class="o">=</span>8080 <span class="nt">--type</span><span class="o">=</span>NodePort <span class="nv">$do</span>
k create cm app <span class="nt">--from-literal</span><span class="o">=</span><span class="nv">KEY</span><span class="o">=</span>val <span class="nv">$do</span>
k create secret generic db <span class="nt">--from-literal</span><span class="o">=</span><span class="nv">pw</span><span class="o">=</span>s3cr3t <span class="nv">$do</span>
k create role r <span class="nt">--verb</span><span class="o">=</span>get,list <span class="nt">--resource</span><span class="o">=</span>pods <span class="nv">$do</span>
k create rolebinding rb <span class="nt">--role</span><span class="o">=</span>r <span class="nt">--serviceaccount</span><span class="o">=</span>ns:sa <span class="nv">$do</span>
k get events <span class="nt">--sort-by</span><span class="o">=</span>.metadata.creationTimestamp
k get pods <span class="nt">-A</span> <span class="nt">-o</span> wide <span class="nt">--field-selector</span> status.phase!<span class="o">=</span>Running
</code></pre></div></div>

<h2 id="open-questions">Open questions</h2>

<ul>
  <li>Which specific kubeadm upgrade scenario is most likely — minor version bump or patch? Worth timing a minor bump end-to-end.</li>
  <li>Gateway API: confirm whether HTTPRoute hostnames or path matching is more commonly tested.</li>
</ul>

<h2 id="references">References</h2>

<ul>
  <li><a href="https://docs.linuxfoundation.org/tc-docs/certification/faq-cka-ckad-cks">LF candidate FAQ</a> — exam format, policies, retake rules</li>
  <li><a href="https://github.com/cncf/curriculum">CNCF CKA curriculum</a> — official domain/task list (pre-exam reference only)</li>
  <li><a href="https://killer.sh">Killer.sh CKA simulator</a> — 2 sessions included with exam purchase</li>
</ul>]]></content><author><name>Tenzin Lhakhang</name></author><category term="kubernetes" /><category term="cka" /><category term="certification" /><category term="study-plan" /><category term="devops" /><summary type="html"><![CDATA[7-day study schedule and exam-day reference for the Certified Kubernetes Administrator (CKA) exam, tuned for exam speed and the highest-weight domains.]]></summary></entry><entry><title type="html">Problem Solving — S. Ian Robertson</title><link href="https://learn.tenzin.io/2025/11/16/problem-solving-robertson.html" rel="alternate" type="text/html" title="Problem Solving — S. Ian Robertson" /><published>2025-11-16T00:00:00+00:00</published><updated>2025-11-16T00:00:00+00:00</updated><id>https://learn.tenzin.io/2025/11/16/problem-solving-robertson</id><content type="html" xml:base="https://learn.tenzin.io/2025/11/16/problem-solving-robertson.html"><![CDATA[<h2 id="summary">Summary</h2>

<p>At its core, problem solving is navigating toward a goal — but the goal itself isn’t always clear in advance. Sometimes it’s precisely defined; sometimes you’ll only recognize it when you see it. That second case turns out to be surprisingly common in real work, and it means that goal ambiguity is often the actual difficulty, not the path-finding.</p>

<p>The field has been studied from several distinct traditions that each illuminated a different piece. <strong>Behaviourists</strong> (Thorndike, Watson, Skinner) explained problem solving as trial-and-error learning shaped by reinforcement — useful for describing how organisms adapt, but silent on the internal structures that make complex reasoning possible. <strong>Gestalt psychologists</strong> (Wertheimer, Duncker, Koffka) shifted focus to perception and restructuring: insight — the sudden “Aha!” — wasn’t random but arose from a reorganization of how the problem was seen. <strong>Cognitive psychologists</strong> (Newell, Simon, Anderson) formalized this into the problem space framework: a problem is defined by an initial state, a goal state, and operators that move between them; solving it is search through that space. <strong>Educationalists</strong> (Polya, Schoenfeld) brought the question down to practice — what heuristics and metacognitive habits actually help people solve problems they’ve never seen before?</p>

<p>Robertson’s book draws on all four traditions. The cognitive framework is central, but the Gestalt insight into representation and the educationalist emphasis on deliberate strategy both run through the treatment.</p>

<p><strong>Problem taxonomy</strong> — what kind of problem you have determines which approaches apply:</p>

<table>
  <thead>
    <tr>
      <th>Axis</th>
      <th>Poles</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Knowledge requirement</td>
      <td><strong>Knowledge-lean</strong> (domain-general strategies suffice) ↔ <strong>Knowledge-rich</strong> (requires domain-specific expertise)</td>
    </tr>
    <tr>
      <td>Goal specification</td>
      <td><strong>Well-defined</strong> (all needed info is given or inferable) ↔ <strong>Ill-defined</strong> (goal or constraints are vague)</td>
    </tr>
    <tr>
      <td>Solver experience</td>
      <td><strong>Semantically lean</strong> (little relevant experience) ↔ <strong>Semantically rich</strong> (deep experience of this problem type)</td>
    </tr>
    <tr>
      <td>Solution character</td>
      <td><strong>Standard</strong> (deliberate search through operators) ↔ <strong>Insight</strong> (solution preceded by sudden restructuring)</td>
    </tr>
  </tbody>
</table>

<p>A knowledge-lean, well-defined problem — a logic puzzle, a river-crossing problem — calls for very different approaches than an ill-defined, knowledge-rich one like debugging a production incident or designing a system under unclear requirements. Identifying which kind of problem you’re facing before starting is itself a meta-skill.</p>

<p>Schemas are the memory structures that make this classification fast in practice: organized representations of problem types in long-term memory that let you recognize “this is a problem of type X” and retrieve the relevant approach without having to reason from scratch. Building them deliberately is most of what expertise development looks like.</p>

<h2 id="key-concepts">Key concepts</h2>

<h3 id="representation-determines-whats-reachable">Representation determines what’s reachable</h3>

<p>Newell and Simon’s foundational work frames every problem as a <strong>problem space</strong>: an initial state, a goal state, a set of operators that move between states, and a search process that navigates the space. This formalism makes representation precise: it isn’t just “how you think about the problem,” it’s the structure that determines which states exist, which operators are legal, and therefore which solutions are reachable at all. A representation that omits a relevant variable or encodes a constraint incorrectly doesn’t make the problem harder; it makes the correct solution unreachable by definition, however hard or cleverly you search.</p>

<p>The <strong>Hobbits and Orcs</strong> problem makes this concrete. Three hobbits and three orcs must cross a river using a boat that holds at most two; orcs can never outnumber hobbits on either bank (a bank with no hobbits on it is safe). Greeno (1974) and Thomas (1974) both studied how solvers approach this. Many people represent the state informally, as a vague mental image of who is where, and quickly run into what feel like dead ends or illegal moves. The difficulty isn’t the logic; it’s the representation. A precise tuple <code class="language-plaintext highlighter-rouge">(hobbits_left, orcs_left, boat_side)</code> makes every legal operator explicit: enumerate the moves that keep both banks safe, and the solution path becomes a straightforward search. Solvers who represented the state space this way made far fewer errors and reached the solution faster. The problem didn’t change; the representation did.</p>
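<p>The tuple representation is precise enough to hand to a program. Below is a minimal breadth-first search over <code class="language-plaintext highlighter-rouge">(hobbits_left, orcs_left, boat_side)</code> states, written in bash (version 4 or later, for associative arrays). The encoding details are assumptions for this sketch, not Robertson’s. States are comma-joined strings, and <code class="language-plaintext highlighter-rouge">boat_side</code> is 1 while the boat is on the starting bank and 0 once it is on the far bank. A bank counts as safe when it has no hobbits or at least as many hobbits as orcs. Because breadth-first search explores states in order of distance, the path it prints is a shortest one.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#!/usr/bin/env bash
# Breadth-first search over Hobbits and Orcs states (requires bash 4+).
# A state is "hobbits_left,orcs_left,boat_side"; boat_side 1 = starting bank.
N=3                          # three hobbits and three orcs
MOVES="1,0 2,0 0,1 0,2 1,1"  # (hobbits, orcs) in a boat that holds at most two

# A bank is safe if it has no hobbits, or at least as many hobbits as orcs.
bank_ok() {
  [[ $1 -eq 0 || $1 -ge $2 ]]
}

# A state is legal if the counts are in range and both banks are safe.
legal() {
  local h=$1 o=$2
  if [[ $h -lt 0 || $o -lt 0 || $h -gt $N || $o -gt $N ]]; then return 1; fi
  bank_ok "$h" "$o" || return 1
  bank_ok $((N - h)) $((N - o))
}

solve() {
  local start="$N,$N,1" goal="0,0,0"
  local -A parent=(["$start"]=none)   # state -> the state it was reached from
  local queue=("$start") head=0
  local cur h o b rest sign nb m dh mo nh no next
  while [[ $head -lt ${#queue[@]} ]]; do
    cur=${queue[$head]}
    head=$((head + 1))
    if [[ $cur == "$goal" ]]; then break; fi
    h=${cur%%,*}; rest=${cur#*,}; o=${rest%%,*}; b=${rest#*,}
    # The boat carries people away from whichever bank it is on.
    if [[ $b -eq 1 ]]; then sign=1; nb=0; else sign=-1; nb=1; fi
    for m in $MOVES; do
      dh=${m%,*}; mo=${m#*,}
      nh=$((h - sign * dh)); no=$((o - sign * mo))
      legal "$nh" "$no" || continue
      next="$nh,$no,$nb"
      if [[ -n ${parent[$next]} ]]; then continue; fi   # already visited
      parent[$next]=$cur
      queue+=("$next")
    done
  done
  if [[ -z ${parent[$goal]} ]]; then echo "no solution"; return 1; fi
  # Follow parent links back from the goal, prepending to get start-to-goal order.
  local path=() s=$goal
  while [[ $s != none ]]; do
    path=("$s" "${path[@]}")
    s=${parent[$s]}
  done
  printf '%s\n' "${path[@]}"
}

solve
</code></pre></div></div>

<p>It prints 12 states, from <code class="language-plaintext highlighter-rouge">3,3,1</code> to <code class="language-plaintext highlighter-rouge">0,0,0</code>, which is the familiar 11-crossing solution. Once the operators are written down, the search itself is mechanical. All the difficulty was in choosing the representation.</p>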

<p>The <strong>nine-dot problem</strong> is a starker example. Connect nine dots arranged in a 3×3 grid using four straight lines without lifting the pen. Most people fail because they self-impose a square boundary around the dots — a constraint the problem statement never stated. The search space they’re exploring doesn’t contain the solution. The boundary isn’t in the problem; it’s in the representation they constructed when they read it.</p>

<p>This last point is what makes representation especially tricky. Bransford, Barclay, and Franks (1972) showed that people don’t encode problem statements neutrally — they construct an interpretation, and that interpretation can include inferences and assumptions the statement never made. You read “nine dots in a grid” and your encoding includes an imaginary boundary. You read “the server is timing out” and your encoding includes an implicit assumption about which server. The construction happens automatically; catching it requires deliberate effort.</p>

<p>A practical version: if you represent a failing Kubernetes pod as “the image is broken,” your search focuses on the registry and build pipeline. Represent it as “the pod can’t reach a ready state” and you open the search to readiness probes, resource limits, scheduling constraints, and network policy. The operators available to you are determined by the representation. Choosing the frame is the highest-leverage step, so take it before anything else.</p>

<h3 id="heuristics-are-useful-but-bounded">Heuristics are useful but bounded</h3>

<p>Heuristics are mental shortcuts that make search tractable — without them, the space of possible moves in almost any real problem is too large to explore exhaustively. Robertson focuses on three: hill climbing (always take the step that moves closest to the goal), means-end analysis (identify the gap between current state and goal state, then select an operator that reduces it), and working backward (start from the goal and ask what would need to be true immediately before it).</p>

<p>Each is genuinely useful, and each has a characteristic failure mode. Hill climbing fails when you have to move <em>away</em> from the goal to ultimately reach it — a maze with dead ends is the simple version, but software dependency resolution is a real one where installing the right package requires temporarily removing a conflicting one. Means-end analysis can get locked onto a subgoal that isn’t actually necessary, spending effort closing a gap the solution doesn’t require; a classic example is a student who rewrites a proof step to get closer to a known lemma, not realizing the proof doesn’t need that lemma at all. Working backward is powerful for debugging (trace from the error back through the call stack to the source) but doesn’t generalize well to open-ended design problems where the goal state is fuzzy.</p>
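<p>Hill climbing’s failure mode can be seen mechanically in a minimal steepest-ascent climber. It runs over a made-up one-dimensional landscape, where each position’s value stands in for “closeness to the goal”. The landscape and the function name are hypothetical. The climber only ever moves to a strictly better neighbour, so starting from position 0 it stops on the local peak at position 2. It never finds the higher peak at position 6, because reaching that peak means going downhill first.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#!/usr/bin/env bash
# Steepest-ascent hill climbing on a toy 1-D landscape (values are made up).
landscape=(1 3 5 4 2 6 9 7)

# $1: starting index. Prints "index value" of the peak where the climb stops.
hill_climb() {
  local i=$1 n=${#landscape[@]} best j
  while true; do
    best=$i
    for j in $((i - 1)) $((i + 1)); do
      if [[ $j -lt 0 || $j -ge $n ]]; then continue; fi       # off the edge
      if [[ ${landscape[$j]} -gt ${landscape[$best]} ]]; then best=$j; fi
    done
    if [[ $best -eq $i ]]; then break; fi   # no better neighbour: a peak, maybe not the highest
    i=$best
  done
  echo "$i ${landscape[$i]}"
}

hill_climb 0   # stops on the local peak: 2 5
hill_climb 4   # a different start happens to reach the global peak: 6 9
</code></pre></div></div>

<p>The only difference between the two runs is the starting point, not the rule. That is why hill climbing on its own offers no way out of a local peak.</p>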

<p>The Gestalt tradition identified two mechanisms that explain why heuristics break down even when the solver has all the information they need.</p>

<p><strong>Functional fixedness</strong> (Duncker 1945): solvers encode objects in their conventional role and fail to see them as candidates for other uses. The thumbtack box in the candle problem never enters the search as a shelf because the initial encoding locked it out as a container. This isn’t a reasoning error — it’s a representational one. A real-world version: a developer who can’t see a config file as a substitute for a database entry because “config files don’t work that way” is imposing a constraint that exists only in the encoding, not in the problem.</p>

<p><strong>Set effects / Einstellung</strong> (Luchins 1942): solving a class of problems with a particular method creates a persistent bias toward that method even when simpler alternatives exist. Luchins’s water jug problems show this directly: participants who learned a complex filling procedure applied it to problems with a trivial direct solution, often without noticing the shortcut was available. The set operates below awareness, biasing which operators the solver considers at all.</p>
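<p>The arithmetic behind the set effect is easy to check. The jug sizes below follow the style of Luchins’s problems, but treat the specific numbers as illustrative. The learned method fills jug B, then pours from it into A once and into C twice, leaving B − A − 2C. The direct method fills A and pours into C once, leaving A − C. The hypothetical <code class="language-plaintext highlighter-rouge">check_jugs</code> function reports which methods hit the target.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#!/usr/bin/env bash
# Water jug problems: capacities A, B, C and a target amount (numbers illustrative).
# Learned method: fill B, pour into A once and into C twice, leaving B - A - 2C.
# Direct method:  fill A, pour into C once, leaving A - C.
check_jugs() {
  local a=$1 b=$2 c=$3 target=$4
  local learned=$((b - a - 2 * c)) direct=$((a - c)) verdict
  if [[ $learned -eq $target ]]; then
    if [[ $direct -eq $target ]]; then verdict="both work; A-C is shorter"
    else verdict="only B-A-2C works"; fi
  elif [[ $direct -eq $target ]]; then verdict="only A-C works"
  else verdict="neither works"; fi
  echo "A=$a B=$b C=$c target=$target: B-A-2C=$learned, A-C=$direct ($verdict)"
}

check_jugs 21 127 3 100   # training-style problem: only the long method works
check_jugs 23 49 3 20     # critical problem: the set hides the shorter route
check_jugs 28 76 3 25     # the learned method no longer works at all
</code></pre></div></div>

<p>In the second problem, both formulas give 20. A solver under the set usually sees only the first formula.</p>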

<p>Wertheimer’s distinction between <strong>reproductive thinking</strong> (applying learned procedures regardless of fit) and <strong>productive thinking</strong> (reasoning from the actual structure of the current problem) covers both mechanisms: functional fixedness and set effects are forms of structural blindness. This is why strategy rotation works: it isn’t just a general fallback but a specific way of countering the implicit exclusions that set and fixedness introduce.</p>

<h3 id="insight-requires-leaving-the-current-search-space">Insight requires leaving the current search space</h3>

<p>Insight — the sudden shift that makes a previously intractable problem feel obvious — isn’t a random gift. The key theoretical reframe comes from Kaplan and Simon (1990): conventional problem solving is searching <strong>in</strong> a representation, navigating the problem space you already have. Insight is searching <strong>for</strong> a representation — finding a better problem space first. This isn’t a categorically different cognitive process; it’s search operating over the space of possible framings rather than the space of moves within a given framing. When you’re stuck, you aren’t following the wrong path — you’re on the wrong map.</p>

<p>Ohlsson’s <strong>Representational Change Theory (RCT)</strong> explains why impasse happens and how the brain escapes it. The initial representation is assembled from salient features and whatever long-term memory retrieves first. If the solution isn’t reachable from that representation, you hit impasse — forward progress stops regardless of effort. Four escape mechanisms:</p>

<ul>
  <li><strong>Re-encoding</strong>: attention shifts to different features; activation is subtracted from elements that produced dead ends and redistributed across others</li>
  <li><strong>Constraint relaxation</strong>: drop constraints inadvertently imposed — the nine-dot imaginary boundary, the tack box as container-only</li>
  <li><strong>Elaboration</strong>: add information that wasn’t in the initial encoding</li>
  <li><strong>Chunk decomposition</strong>: break elements into sub-elements, exposing structure hidden at the chunk level (matchstick arithmetic is the canonical case)</li>
</ul>

<p>These are largely unconscious — they happen through spreading activation and cued retrieval, not deliberate reasoning.</p>

<p>MacGregor et al.’s <strong>Criterion for Satisfactory Progress Theory (CSPT)</strong> adds the conscious side. Solvers use hill climbing while monitoring progress. When expected progress isn’t achieved — criterion failure — they look for other states to expand. Working memory capacity matters: solvers who can look further ahead before hitting criterion failure are more likely to recognize that a solution requires an initially unpromising move. In the nine-dot problem, people who could anticipate further consequences of each line were more likely to discover that lines must extend outside the imaginary boundary.</p>

<p>RCT and CSPT aren’t competing — they’re complementary. Gilhooly and Murphy (2005) found evidence for both unconscious (Type 1) and conscious (Type 2) processes in insight, with the balance depending on problem type. Perceptual and spatial insight problems draw more on unconscious chunk decomposition; verbal and logical ones involve more conscious monitoring and search. Incubation — stepping away from a stuck problem — works because it allows unconscious re-encoding to run without conscious hill climbing suppressing the process by reinforcing the failed frame.</p>

<p>Norman Maier’s <strong>two-string problem</strong> demonstrates the unconscious mechanism directly: participants must tie together two strings that are too far apart to reach simultaneously. Most keep trying variations on walking between them. The solution requires reframing one string as a pendulum. Solvers who were casually nudged toward pendulum-like motion by the experimenter solved it faster without realizing why — the nudge triggered re-encoding below awareness. A software equivalent: the developer who’s spent an hour convinced the bug is a logic error in a function, takes a five-minute walk, and immediately wonders whether it’s an encoding issue upstream — and it is. The information didn’t change; the representation it was organized into did.</p>

<h3 id="expertise-is-schema-density-not-raw-speed">Expertise is schema density, not raw speed</h3>

<p>Expert performance has three characteristic signatures. First, <strong>fast categorization</strong>: experts classify problems by deep structure, not surface features. Chi, Feltovich, and Glaser (1981) found that novice physics students grouped problems by surface appearance (“this is an inclined plane problem”) while experts grouped them by underlying principle (“this is a conservation of energy problem”) — a fundamentally different representation that immediately implies a solution strategy. Second, <strong>perceptual chunking</strong>: experts encode larger meaningful configurations as single units, letting them process more information per cognitive step. Third, <strong>long-term working memory</strong>: Ericsson and Kintsch (1995) documented that experts develop domain-specific encoding strategies that effectively extend working memory into long-term memory — they can retrieve and manipulate information that novices would need to hold in limited-capacity short-term memory.</p>

<p>The chess evidence established all three. De Groot (1965) first showed that grandmasters didn’t examine more moves than masters — they examined better moves, because they recognized meaningful board configurations immediately. Chase and Simon (1973) made the mechanism explicit: expert players reconstructed realistic mid-game positions from five-second exposures; when pieces were placed randomly, the advantage disappeared entirely. The skill was in pattern recognition applied to meaningful chess structure, not memory capacity or processing speed. The same signatures appear in radiology (Lesgold et al. 1988: experts see clinically meaningful regions; novices see individual features), programming (McKeithen et al. 1981: expert programmers chunk code into functional units), and physics problem solving.</p>

<p><strong>How expertise develops.</strong> Three models from the chapter each illuminate a different dimension. Fitts and Posner (1967) describe three stages: a <em>cognitive</em> stage (resource-intensive, reliant on declarative knowledge, slow and error-prone), an <em>associative</em> stage (rules are compiled, performance becomes more automatic), and an <em>autonomous</em> stage (highly proficient, no longer reliant on conscious control). Dreyfus (1997) extends this to five stages — novice, advanced beginner, competent, proficient, expert — and makes a point often missed: rule-based systems cannot account for expert intuition. Experts don’t follow rules; they perceive situations holistically and respond based on pattern recognition accumulated over thousands of exposures. This is why experts are often poor at explaining what they do. Glaser (1996) focuses on the changing locus of control: from <em>external support</em> (dependent on teachers and coaches), through <em>transitional</em> (support gradually withdrawn), to <em>self-regulatory</em> (learning fully under the learner’s own direction).</p>

<p>The <strong>Power Law of Practice</strong> describes the shape of skill acquisition: rapid improvement early, dramatically diminishing returns as practice accumulates (Newell &amp; Rosenbloom 1981). This is consistent across domains — it’s a structural feature of skill acquisition, not a motivation problem. It explains why early study of a domain has very high return and why pushing past intermediate competence requires disproportionate effort.</p>
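<p>One common way to write the law is T(N) = T(1) · N^(−b): the time for the Nth trial is the first trial’s time scaled by a power of N. The parameters below are hypothetical (T(1) = 60 s, b = 0.4) and were chosen only to show the shape. Each tenfold increase in practice multiplies the time by the same factor, 10^(−0.4) ≈ 0.4. As a result, the absolute saving per tenfold step shrinks from about 36 s to about 14 s to about 6 s.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#!/usr/bin/env bash
# Power law of practice, T(N) = T1 * N^(-b), with hypothetical parameters.
T1=60   # seconds on the first trial (hypothetical)
B=0.4   # learning-rate exponent (hypothetical)

practice_curve() {
  local n
  for n in 1 10 100 1000; do
    # bash has no floating point, so awk does the power and the rounding.
    awk -v n="$n" -v t1="$T1" -v b="$B" \
      'BEGIN { printf "trial %4d: %5.1f s\n", n, t1 * n ^ (-b) }'
  done
}

practice_curve
</code></pre></div></div>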

<p><strong>Deliberate practice</strong> (Ericsson, Krampe &amp; Tesch-Römer 1993) is what determines how fast you move through those stages. Not all practice is equal. Deliberate practice is effortful, aimed at the edges of current ability, with immediate and accurate feedback, focused on specific components that need improvement. Comfortable practice within well-established skill produces much less improvement than shorter periods of targeted, difficult work at the boundary of competence. Ericsson and Charness (1994) extend this: expertise involves continuous effortful adaptation to domain demands, not consolidation of learned routines.</p>

<p><strong>The intermediate effect.</strong> Boshuizen and Schmidt (1990, 1992) documented a striking pattern in medical expertise: people with intermediate knowledge can <em>outperform</em> both novices and experts on some tasks, and be <em>outperformed by novices</em> on others. Intermediates encode too much detail — they “process too much garbage” — and so recall more clinical propositions than experts, but make worse diagnostic decisions because the detail obscures the pattern. Experts have compiled their knowledge into higher-level scripts that suppress irrelevant detail. The implication: accumulating more knowledge at the same representational level isn’t always progress; what matters is whether knowledge is compiled into higher-level schemas.</p>

<p><strong>Domain specificity.</strong> Expertise doesn’t transfer across domains. Content knowledge is domain-specific (deep physics knowledge doesn’t help with medical diagnosis). Task knowledge — procedural and strategic knowledge — transfers only to closely related domains that share a genuine skill overlap. There is no such thing as a general expert reasoner in the meaningful sense; there are domain experts and the domains may or may not share transferable skills.</p>

<p><strong>The automaticity paradox.</strong> Automatisation ought to make expert performance brittle — if procedures run without conscious access, how do experts handle novel problems? Hatano and Inagaki (1986) resolve this with the distinction between <em>routine expertise</em> (fast, efficient, handles typical problems with compiled procedures) and <em>adaptive expertise</em> (flexible, handles novel and exceptional cases through schemas that explicitly represent exceptions and edge cases). Expert schemas aren’t just libraries of common cases — they include strategies and knowledge structures for when the common case doesn’t apply. This is what distinguishes a senior engineer who can debug an unfamiliar system from one who can only optimize systems they built.</p>

<p><strong>Downsides of expertise.</strong> Two documented failure modes. Ottati et al. (2015) demonstrated the <em>earned dogmatism effect</em>: people who perceive themselves as expert become more closed-minded, treating expertise as a license to dismiss alternatives without engaging with them. Fisher and Keil (2015) showed that expertise can worsen the <em>illusion of explanatory depth</em>: experts may have forgotten how much they’ve forgotten, so they report more explanatory competence than they can demonstrate when pressed to generate detailed mechanistic explanations. Both are calibration failures rather than competence failures: the expertise is real, but the metacognitive monitoring of its limits is broken.</p>

<h3 id="memory-is-reconstructive-not-retrieval">Memory is reconstructive, not retrieval</h3>

<p>A common mental model of memory is a recording — you store an experience, and later you play it back. The cognitive science is very different: memory is reconstructive. When you remember something, the brain assembles the recollection from fragments, filling gaps with inference, prior expectations, and current context. This makes memory fast and flexible but also systematically unreliable in specific ways.</p>

<p>Elizabeth Loftus’s research on eyewitness testimony is the most studied example. In her experiments, the phrasing of a post-event question changed what participants later remembered: asking how fast the cars were going when they <em>smashed</em> into each other, rather than when they <em>contacted</em> each other, raised speed estimates, and “smashed” rather than “hit” made participants more likely to report, a week later, broken glass that was never in the film. The memory wasn’t retrieved; it was reconstructed around the implicit suggestion. A more technical analogue: write a complex module, set it aside for six months, then come back to modify it. You’ll feel confident about how it works and still be wrong about specific details in ways that cause bugs. The confidence is real, but it comes from the reconstruction feeling fluent, not from the reconstruction being accurate.</p>

<p>The consequence for schema building: a schema that gets reinforced through repeated reconstruction of a slightly wrong version becomes a well-practiced error. Schema quality — whether the pattern you’ve chunked is actually correct — matters as much as schema quantity.</p>

<p><strong>Retrieval is surface-driven, not structure-driven.</strong> When you encounter a problem and search memory for a relevant analogue, retrieval is largely governed by surface similarity — whether the objects, vocabulary, and perceptual features of the new situation match something stored. Structural similarity, which is what actually determines whether the analogue will help, doesn’t drive retrieval as reliably. Gentner, Rattermann, and Forbus (1993) demonstrated this directly: people were more likely to spontaneously retrieve surface-similar analogues even when the structurally similar analogue was the one that would actually solve the problem. The two properties — <em>retrievability</em> and <em>inferential soundness</em> — come apart in ways that matter enormously.</p>

<p>This is the mechanism behind the Gick &amp; Holyoak fortress/tumor finding. Participants had the relevant structural analogue in memory — they’d just read the fortress convergence story — but didn’t spontaneously apply it to the radiation problem because the surface domains are completely different. The analogue was there; the retrieval process didn’t surface it. Only when participants were explicitly prompted to use the story as an analogy did transfer occur at meaningful rates.</p>

<p>Chen, Mo, and Honomichl (2004) showed that even analogues learned long ago can be retrieved effectively when structural features were encoded strongly at learning time. The practical implication: encoding structural labels matters. Not just “I solved a radiation problem” but “I solved a convergence problem,” a label that primes retrieval when the next convergence problem appears, whatever domain it’s in. Encoding strategies like elaboration and dual coding work partly because they build richer reconstructive cues, not because they improve storage of raw content.</p>

<h3 id="transfer-requires-structural-similarity">Transfer requires structural similarity</h3>

<p>The hope that becoming good at hard problems in one domain makes you better at hard problems in general is mostly false. Transfer of skill happens when two problems share underlying abstract structure — not surface features, not domain vocabulary, not difficulty level.</p>

<p><strong>Transfer types.</strong> Transfer can be positive (prior experience helps), negative (prior experience interferes — what the Einstellung effect produces when a learned procedure is applied where it no longer fits), near (similar domain, slightly different problem), or far (different domain, same underlying structure). Far positive transfer is what people usually mean when they talk about generalizable problem-solving skill, and it’s also the hardest to reliably produce.</p>

<p>Problems with identical underlying structure but different surface content are called <strong>isomorphs</strong>. Simon and Hayes (1976) demonstrated that isomorphic problems can vary dramatically in difficulty: Tower of Hanoi isomorphs with identical state-space structure, recast as puzzles about monsters and globes, differ in difficulty because the surface framing changes which operators feel natural to try. <strong>Homomorphs</strong> are structurally similar but not identical: the same template with slight variations in constraints. Both transfer when solvers notice the structural match; neither transfers reliably when they don’t.</p>
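<p>To make “identical state-space structure” concrete, here is a minimal Python sketch of the classic river-crossing problem (Hobbits and Orcs in Greeno’s 1974 version, missionaries and cannibals in others), solved by breadth-first search. The state encoding, the function names, and the three-and-three, two-seat-boat setup are illustrative assumptions. Swapping the cover story only renames the pieces; <code>moves()</code> and <code>safe()</code> stay the same, which is exactly what makes the two versions isomorphs.</p>

```python
from collections import deque

# 3 hobbits and 3 orcs start on the left bank; the boat carries 1 or 2.
# Orcs may never outnumber hobbits on a bank where any hobbits are present.
N, BOAT = 3, 2

def safe(hobbits, orcs):
    # a bank is safe if it has no hobbits, or at least as many hobbits as orcs
    return hobbits == 0 or hobbits >= orcs

def moves(state):
    # state = (hobbits on left bank, orcs on left bank, boat is on left bank)
    h, o, boat_left = state
    sign = -1 if boat_left else 1  # crossing from the left removes people from it
    for dh in range(BOAT + 1):
        for do in range(BOAT + 1 - dh):
            if dh + do == 0:
                continue  # the boat cannot cross empty
            nh, no = h + sign * dh, o + sign * do
            if 0 <= nh <= N and 0 <= no <= N and safe(nh, no) and safe(N - nh, N - no):
                yield (nh, no, not boat_left)

def solve(start=(N, N, True), goal=(0, 0, False)):
    # breadth-first search returns a shortest path through the state space
    parent = {start: None}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        if s == goal:
            path = []
            while s is not None:
                path.append(s)
                s = parent[s]
            return path[::-1]
        for t in moves(s):
            if t not in parent:
                parent[t] = s
                queue.append(t)
    return None

path = solve()
print(len(path) - 1)  # 11 crossings in a shortest solution
```

<p>Whatever the cover story, the shortest solution takes the same 11 crossings; what differs between isomorphs is how easily people see the legal moves.</p>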

<p><strong>Three kinds of similarity</strong> determine whether transfer occurs. <em>Surface similarity</em> means the objects or perceptual features of the two problems resemble each other. <em>Relational similarity</em> (also called structural similarity) means the relations between objects are the same — the underlying structure maps. <em>Procedural similarity</em> means the same solution method applies even if surface and structure differ. Surface similarity drives what gets <em>retrieved</em> from memory; relational similarity determines whether the analogue is actually <em>useful</em>. These two come apart: you can retrieve a surface-similar analogue that gives you nothing, or fail to retrieve a structurally identical one because it looks nothing like the current problem.</p>

<p><strong>Successful analogical transfer requires three things in sequence</strong>: retrieval of a relevant analogue from long-term memory, mapping of corresponding roles across source and target (which objects in the source play the same role as which objects in the target), and adaptation of the source solution to the target context. Failure at any step blocks transfer. Most naturally occurring transfer failure is a retrieval failure — the right analogue is in memory but wasn’t retrieved because it doesn’t look like the current problem on the surface.</p>

<p>Gick and Holyoak’s (1983) radiation/fortress study is the canonical demonstration. A doctor must destroy a tumor using radiation, but a strong-enough ray damages surrounding tissue; the solution is multiple weak rays converging from different directions. Most participants fail. Participants who had previously read a story about a general who converged small army groups along separate roads to capture a fortress solved it at much higher rates — the underlying structure is identical. But participants who read the story without being prompted to use it as an analogy transferred poorly anyway. The structural mapping existed; retrieval didn’t surface it without explicit prompting.</p>

<p><strong>Gentner’s Structure Mapping Theory (SMT)</strong> formalizes what makes an analogy work. The central claim: analogy is a mapping of relational structure from a source to a target, independent of the objects involved. What gets mapped isn’t attributes of objects (“both involve radiation” / “both involve armies”) but the <em>relations between objects</em> — the structural skeleton. The <strong>principle of systematicity</strong> says that mappings which preserve higher-order relations (relations between relations) are preferred over mappings that only match first-order features. “The atom is like a solar system” works because the relational structure maps: electrons orbit the nucleus as planets orbit the sun, and the force-governs-orbit relationship holds in both — even though the objects share no surface features. A mapping based only on “both are round” would score low on systematicity and license few useful inferences.</p>
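<p>The systematicity principle can be illustrated with a toy scorer. This Python sketch is not Gentner’s Structure-Mapping Engine; it only scores a hand-supplied object mapping. Relations are tuples, a relation’s order is one more than the highest order among its arguments, and each source relation that survives the mapping into the target earns its order as points, so the higher-order <code>cause</code> relation counts double. One-place attributes such as “round” earn nothing, following SMT’s claim that object attributes are not what gets mapped. For brevity both domains are built from the same template; all names and the exact weighting are assumptions.</p>

```python
# Toy illustration of SMT's systematicity principle (not the Structure-Mapping Engine).
# Objects are strings; relations are tuples: (predicate, arg1, arg2, ...).

def order(expr):
    # objects have order 0; a relation is one order above its highest-order argument
    if isinstance(expr, str):
        return 0
    return 1 + max(order(arg) for arg in expr[1:])

def weight(expr):
    # one-place predicates on objects (attributes such as "round") get no credit
    if len(expr) == 2 and isinstance(expr[1], str):
        return 0
    return order(expr)

def translate(expr, mapping):
    # rewrite source objects as their mapped target objects, recursing into relations
    if isinstance(expr, str):
        return mapping.get(expr, expr)
    return (expr[0],) + tuple(translate(arg, mapping) for arg in expr[1:])

def systematicity(source, target, mapping):
    # sum the weights of source expressions whose translation exists in the target
    return sum(weight(e) for e in source if translate(e, mapping) in target)

def domain(big, small):
    attracts = ("attracts", big, small)
    orbits = ("revolves_around", small, big)
    return {
        ("round", big), ("round", small),
        attracts, orbits,
        ("more_massive", big, small),
        ("cause", attracts, orbits),  # higher-order: a relation between relations
    }

solar_system = domain("sun", "planet")
atom = domain("nucleus", "electron")

relational = {"sun": "nucleus", "planet": "electron"}
swapped = {"sun": "electron", "planet": "nucleus"}  # still matches "round", nothing else

print(systematicity(solar_system, atom, relational))  # 5
print(systematicity(solar_system, atom, swapped))     # 0
```

<p>Both mappings match “both are round”, but only the relational mapping preserves the causal structure, and that is where its score comes from.</p>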

<p>Gentner and Gentner (1983) showed that the choice of source analogy materially affects what inferences people draw. Participants who used a “flowing water” analogy for electricity reasoned better about combinations of batteries; those who used a “teeming crowds” analogy reasoned better about combinations of resistors. The metaphor wasn’t decorative; it determined which structural inferences were made.</p>

<p><strong>Metaphors are compressed analogies.</strong> Novel metaphors require active structural comparison; as they become conventional, they become category assertions and lose their live analogical force. Bowdle and Gentner (2005) call this the “career of metaphor”: “electricity flows” was once a live structural analogy and is now nearly literal. Fresh analogies are often the most useful, because their structural mapping has not yet been compressed into a label that no longer prompts active reasoning.</p>

<p>In practice: the habit worth building is asking “what underlying pattern does this resemble?” — not “have I seen something like this before?” A DNS caching bug and a memory interference problem look nothing alike on the surface but share the same structure of stale state contaminating fresh queries. Recognizing that is transfer. Recognizing that “both involve the network” is not.</p>

<h3 id="metacognition-is-the-compounding-lever">Metacognition is the compounding lever</h3>

<p>Metacognition is cognition about cognition — monitoring and regulating your own thinking processes as you work. It’s distinct from domain knowledge and distinct from raw ability. Two people with equal skill can have very different outcomes if one actively tracks what strategy they’re using, whether it’s working, and when to switch, while the other just pushes forward.</p>

<p>Alan Schoenfeld’s research on mathematical problem solving makes this concrete. He compared novice and expert mathematicians working through unfamiliar problems in think-aloud protocols. Novices almost never stopped to evaluate whether what they were doing was working — they’d invest ten or fifteen minutes in a fruitless approach without questioning the approach itself. Expert mathematicians regularly paused: “Is this going anywhere? What else could I try? Have I seen this structure before?” This meta-monitoring wasn’t slower; it was faster overall, because it cut off long detours early. The software equivalent is the developer who rubber-ducks early (forcing articulation of the problem often reveals the bug immediately) versus the one who keeps reading the same function expecting a different insight.</p>

<p>Glaser’s (1996) change-of-agency model frames this as a developmental arc. In the early stage of any domain, monitoring is externalized: teachers, coaches, and rubrics provide the feedback loop that the learner can’t yet run internally. In the transitional stage, the learner begins developing self-monitoring — identifying criteria for high performance and tracking their own progress against them. The final, self-regulatory stage is the metacognitive ideal: the learner designs their own practice environment, identifies their own gaps, and decides when external input is worth seeking. The stages are not just a progression of skill — they’re a progression of metacognitive internalization. Chi, Glaser, and Rees (1983) found a concrete instance of this in physics: experts were better than novices not only at solving problems but at assessing a problem’s difficulty in advance and knowing which schemas applied. Knowing what you don’t yet know — and being right about it — is a metacognitive skill as much as a domain knowledge skill.</p>

<p>Metacognition is also the mechanism that makes everything else in this post actually work. You can only rotate strategies deliberately if you’re monitoring which strategy you’re using. You can only trigger diffuse mode at the right moment if you’re tracking how long you’ve been stuck. You can only build schemas from experience if you reflect on what happened rather than just moving on. Externalizing the monitoring loop — logging representations, strategies, and outcomes — gives you data about your own thinking patterns that you can’t access from inside a stuck problem.</p>

<h2 id="practical-takeaways">Practical takeaways</h2>

<p>These are reference items — templates and habits that apply the mechanisms above.</p>

<h3 id="before-starting-build-the-representation">Before starting: build the representation</h3>

<ol>
  <li><strong>Classify the problem type</strong> — knowledge-lean or knowledge-rich? Well-defined or ill-defined? This determines which approaches apply and whether you have a schema to retrieve.</li>
  <li><strong>Name the structural pattern</strong> — “Is this a convergence problem? A search problem? A constraint satisfaction problem?” The structural label primes retrieval of relevant analogues across domains, regardless of surface domain.</li>
  <li><strong>Goal state</strong> — what exactly must be achieved?</li>
  <li><strong>Current state</strong> — what is known and observed?</li>
  <li><strong>Operators</strong> — what actions, techniques, or tools are available?</li>
  <li><strong>Constraints</strong> — which are real? which are assumed?</li>
  <li><strong>Unknowns</strong> — what data is missing?</li>
  <li><strong>Alternative forms</strong> — can this be restated visually, mathematically, or stepwise?</li>
</ol>

<p>For each constraint, explicitly label it: real (externally imposed), assumed (inferred but unverified), negotiable, or unnecessary. Most breakthroughs come from discovering a constraint was assumed, not real.</p>
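<p>The checklist and constraint labels above can be kept as a small data structure, so the assumed constraints are listed explicitly rather than carried in your head. A minimal Python sketch; <code>Representation</code>, <code>ConstraintKind</code>, and <code>relaxation_candidates</code> are hypothetical names, and the nine-dot entries are an illustrative encoding of that puzzle.</p>

```python
from dataclasses import dataclass, field
from enum import Enum

class ConstraintKind(Enum):
    REAL = "real"                # externally imposed
    ASSUMED = "assumed"          # inferred but unverified
    NEGOTIABLE = "negotiable"
    UNNECESSARY = "unnecessary"

@dataclass
class Representation:
    goal: str
    current_state: str
    operators: list[str] = field(default_factory=list)
    constraints: dict[str, ConstraintKind] = field(default_factory=dict)
    unknowns: list[str] = field(default_factory=list)

    def relaxation_candidates(self) -> list[str]:
        # anything not labelled REAL is worth questioning before pushing harder
        return [c for c, kind in self.constraints.items()
                if kind is not ConstraintKind.REAL]

nine_dot = Representation(
    goal="connect all nine dots",
    current_state="3x3 grid of dots, pen not yet placed",
    operators=["draw a straight line segment"],
    constraints={
        "at most four straight lines": ConstraintKind.REAL,
        "pen never leaves the paper": ConstraintKind.REAL,
        "lines stay inside the square": ConstraintKind.ASSUMED,
    },
)

print(nine_dot.relaxation_candidates())  # ['lines stay inside the square']
```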

<h3 id="when-stuck-rotate-strategies">When stuck: rotate strategies</h3>

<p>If progress has stopped, the strategy is probably the problem. Before rotating, ask: is the approach I’m reaching for there because the surface of this problem matches a prior one — not because the structure does? Surface similarity drives retrieval, and the Einstellung effect means the most available strategy may be the wrong one. Then rotate:</p>

<ul>
  <li><strong>Means-end analysis</strong> — identify gaps, reduce them systematically</li>
  <li><strong>Working backward</strong> — start from goal state, reverse-engineer; good for debugging</li>
  <li><strong>Generate-and-test</strong> — brainstorm multiple paths, test small steps quickly</li>
  <li><strong>Constraint relaxation</strong> — what if X isn’t actually required?</li>
  <li><strong>Structural analogy</strong> — what shares the same relational structure as this problem? (not “have I seen something like this before?”)</li>
  <li><strong>Divide-and-conquer</strong> — split into smaller subproblems</li>
  <li><strong>Diagram/visualization</strong> — turn the abstract into shapes or flow</li>
  <li><strong>Counterfactual</strong> — what would I do if I had to solve this in 10 minutes?</li>
</ul>

<h3 id="after-812-minutes-stuck-trigger-diffuse-mode">After 8–12 minutes stuck: trigger diffuse mode</h3>

<ul>
  <li>Walk, switch context, sketch abstractly, restate with different constraints</li>
  <li>“How would a complete beginner see this?”</li>
  <li>“What problem would have this as a solution?” (reverse the problem)</li>
  <li>Re-label the structural pattern — “Is this actually a decomposition problem? A convergence problem?” Shifting the label can trigger re-encoding without leaving the desk.</li>
  <li>Change modality: text → diagram → equation → pseudocode</li>
</ul>

<h3 id="building-schemas">Building schemas</h3>

<p>A schema entry needs: the <strong>structural label</strong> (the relational pattern, not the surface domain), conditions of applicability, failure modes, and a worked example. The structural label is what makes the schema retrievable across domains — not “Kubernetes networking problem” but “multi-hop dependency with invisible intermediary.”</p>

<p>Intermediate effect warning: accumulating more detail at the same representational level isn’t progress. The goal is compilation into higher-level patterns that suppress irrelevant detail. If you’re recalling more propositions but solving problems no faster, you’re on the intermediate plateau, not advancing toward expertise.</p>

<p>Deliberate practice means drilling at the boundary of current competence — problems that feel effortful, not fluent. Comfortable repetition within established skill produces little schema improvement.</p>

<p>Domain examples:</p>
<ul>
  <li>Kubernetes: bootstrap steps, PV/PVC patterns, CNI configs — structural label: ordered dependency chains with external state</li>
  <li>Networking: DHCP/DNS/mDNS resolution chains — structural label: hierarchical lookup with caching and staleness</li>
  <li>Math: factoring, substitution, completing the square — structural label: reformulation to expose exploitable structure</li>
  <li>Programming: container setup patterns, FP combinators, OOP encapsulation — structural label: abstraction of shared behavior behind stable interfaces</li>
</ul>

<h3 id="encoding-new-material">Encoding new material</h3>

<ul>
  <li><strong>Structural labels first</strong> — encode not just what you did, but what structural pattern it was. “I fixed a DNS caching bug” is a surface label. “I solved a stale-state contamination problem” is a structural label that will retrieve when a memory interference problem or cache invalidation problem appears in a different domain.</li>
  <li><strong>Relational skeleton</strong> — encode what roles the elements play relative to each other, not just what they are. The relations are what transfers; the objects don’t.</li>
  <li><strong>Elaborative encoding</strong> — link to an existing mental model, personal experience, or analogy; don’t just restate the definition</li>
  <li><strong>Dual coding</strong> — pair verbal with visual (diagram, boxes-and-arrows, timeline)</li>
  <li><strong>Spaced repetition</strong> — review sequence: day 1 → 3 → 7 → 14 → 30</li>
</ul>
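<p>The review sequence above is simple date arithmetic. A minimal Python sketch, assuming the day numbers count from the day the material was first studied (the list doesn’t say whether day 1 is the study day itself); <code>review_dates</code> and <code>due_today</code> are hypothetical helpers.</p>

```python
from datetime import date, timedelta

# Offsets in days from first study: an assumed reading of "day 1, 3, 7, 14, 30"
REVIEW_OFFSETS = (1, 3, 7, 14, 30)

def review_dates(studied_on: date, offsets=REVIEW_OFFSETS) -> list[date]:
    return [studied_on + timedelta(days=d) for d in offsets]

def due_today(first_studied: dict[str, date], today: date) -> list[str]:
    # topics whose review schedule includes today, in alphabetical order
    return sorted(topic for topic, studied in first_studied.items()
                  if today in review_dates(studied))

print([d.isoformat() for d in review_dates(date(2026, 6, 1))])
# ['2026-06-02', '2026-06-04', '2026-06-08', '2026-06-15', '2026-07-01']

topics = {
    "structure mapping": date(2026, 6, 1),
    "Einstellung effect": date(2026, 6, 6),
}
print(due_today(topics, date(2026, 6, 8)))  # ['structure mapping']
```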

<h3 id="finding-structural-transfers">Finding structural transfers</h3>

<p>Start by asking “What relational structure does this problem have?” and only then ask what else shares it. Not “have I seen something like this before?” (surface-driven) but “what structural pattern is this?” (structure-driven).</p>

<p>Three-step transfer checklist:</p>
<ol>
  <li><strong>Retrieve</strong> — find a structural analogue, not a surface match. If nothing comes to mind, label the structure more abstractly until something in memory responds.</li>
  <li><strong>Map</strong> — identify which objects in the source play the same role as which objects in the target.</li>
  <li><strong>Adapt</strong> — modify the source solution to fit the target’s constraints.</li>
</ol>

<p>If step 1 fails, it’s a retrieval failure, not a knowledge failure — the analogue may be there but stored under the wrong label. Build a personal analogy library: for each solved problem, record the structural label. These become retrieval hooks for future problems across all domains.</p>
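<p>A personal analogy library can be as simple as an index keyed by structural label. The Python sketch below is hypothetical (the class and method names are illustrative) and contrasts the two kinds of lookup described above: a surface lookup matches words in the problem description, so a memory interference problem never finds the DNS entry, while a lookup by structural label does.</p>

```python
from collections import defaultdict

class AnalogyLibrary:
    """Solved problems indexed by structural label (hypothetical sketch)."""

    def __init__(self):
        self._by_label = defaultdict(list)  # structural label -> problem summaries
        self._entries = []                  # (summary, structural label) pairs

    def record(self, summary: str, structural_label: str) -> None:
        self._by_label[structural_label].append(summary)
        self._entries.append((summary, structural_label))

    def by_structure(self, structural_label: str) -> list[str]:
        # structure-driven retrieval: exact match on the label you encoded
        return list(self._by_label.get(structural_label, []))

    def by_surface(self, keyword: str) -> list[str]:
        # surface-driven retrieval: substring match on the description only
        return [s for s, _ in self._entries if keyword.lower() in s.lower()]

library = AnalogyLibrary()
library.record("DNS resolver kept serving an old record after a migration",
               "stale-state contamination")
library.record("general captured a fortress with small groups on separate roads",
               "convergence")

# New problem: items from an old list intrude on recall of a new list.
print(library.by_surface("memory"))  # []
print(library.by_structure("stale-state contamination"))
# ['DNS resolver kept serving an old record after a migration']
```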

<p>Structural mappings worth building:</p>
<ul>
  <li>State machine ↔ Kubernetes node join flow (ordered transitions with guard conditions)</li>
  <li>Stale cache ↔ memory interference (prior state contaminating current lookup)</li>
  <li>Convergence ↔ radiation/fortress problem (distributed resources applied simultaneously to a single point)</li>
  <li>Graph search ↔ network troubleshooting (traversal with cost or constraint per edge)</li>
  <li>Abstraction barrier ↔ Terraform module / OOP encapsulation (internal structure hidden behind stable interface)</li>
  <li>Substitution ↔ code refactoring (replace one form with a structure-preserving equivalent)</li>
</ul>

<h3 id="problem-solving-journal">Problem-solving journal</h3>

<p>For each non-trivial problem, log:</p>
<ul>
  <li>The original representation and how it changed</li>
  <li>Strategies tried and where each stalled</li>
  <li>What finally worked</li>
  <li><strong>Structural label</strong> — what pattern was this really? (convergence, search, decomposition, etc.)</li>
  <li><strong>Negative transfer check</strong> — did a prior method bias the search based on surface similarity?</li>
  <li><strong>Transfer source</strong> — if an analogue helped, what made it findable — surface or structural recognition?</li>
  <li>What the experience revealed about your thinking patterns</li>
</ul>

<p>Over time this becomes a personal structural pattern library — more useful than a log of domain-specific events.</p>

<h3 id="daily-loop">Daily loop</h3>

<ol>
  <li>Represent the problem (5 min)</li>
  <li>Classify the problem type and name the structural pattern (1–2 min)</li>
  <li>Pick a strategy appropriate to that type (1–2 min)</li>
  <li>Try for 8–12 min</li>
  <li>If stuck → diffuse mode or rotate strategy</li>
  <li>Solve or decompose further</li>
  <li>Reflect: log the structural label, any negative transfer, and what finally worked</li>
</ol>
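<p>Steps 4 and 5 of the loop amount to a timebox plus a rotation rule. A minimal Python sketch of one possible policy, assuming an 8-minute timebox (the low end of the 8–12 minute window) and assuming you rotate through the strategy list from the “When stuck” section before falling back to diffuse mode; <code>next_step</code> and the policy itself are illustrative.</p>

```python
# Strategy list from "When stuck: rotate strategies", in the order given there
STRATEGIES = [
    "means-end analysis", "working backward", "generate-and-test",
    "constraint relaxation", "structural analogy", "divide-and-conquer",
    "diagram/visualization", "counterfactual",
]
TIMEBOX_MINUTES = 8  # assumed: low end of the 8-12 minute window

def next_step(minutes_without_progress: float, tried: list[str]) -> str:
    """Decide what to do after time on the current strategy (hypothetical policy)."""
    if minutes_without_progress < TIMEBOX_MINUTES:
        return "continue"
    untried = [s for s in STRATEGIES if s not in tried]
    # past the timebox: rotate to the next untried strategy, else step away
    return untried[0] if untried else "diffuse mode"

print(next_step(5, ["means-end analysis"]))   # continue
print(next_step(9, ["means-end analysis"]))   # working backward
print(next_step(11, STRATEGIES))              # diffuse mode
```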

<h3 id="anchoring-learning-to-real-projects">Anchoring learning to real projects</h3>

<p>Abstract practice rarely transfers. Attach new concepts to a real system, and when you do, explicitly name the structural mapping — not just “I applied X to Y” but “X and Y share structure Z, so the solution transferred.”</p>

<ul>
  <li>New networking concept → test it in the homelab; label the structural pattern</li>
  <li>New math technique → apply to a simulation or procedural graphic; identify the relational skeleton</li>
  <li>New OOP pattern → build a real CLI or module; name what abstraction it implements</li>
  <li>New Linux internals → reproduce the behavior in a Kubernetes context; map the structural equivalence</li>
</ul>

<p>The anchor is transferable only if the relational structure is explicit — otherwise it stays a domain-specific fact.</p>

<h3 id="deliberate-creativity-practice">Deliberate creativity practice</h3>

<p>This is the practice of building adaptive expertise — the schemas that cover exceptions, not just typical cases. Routine expertise handles familiar problems efficiently; adaptive expertise handles novel ones. The gap is trained by working on problems that force re-encoding, constraint relaxation, and chunk decomposition (Ohlsson’s RCT escape mechanisms).</p>

<p>Insight-heavy problems serve this function: math puzzles requiring non-obvious transformations, lateral thinking puzzles, spatial problems, short creative coding exercises (fractals, simulations, animations). Not a break from analytical work — targeted training for the parts of problem-solving that deliberate domain practice doesn’t reach.</p>

<h2 id="gotchas">Gotchas</h2>

<ul>
  <li><strong>Assuming constraints are real.</strong> The nine-dot and candle problems hinge on constraints the solver invented. Labeling them explicitly is the forcing function.</li>
  <li><strong>Hill climbing into a local minimum.</strong> Feels productive right up until it doesn’t. If progress has stopped, the strategy is probably the problem.</li>
  <li><strong>Mistaking familiarity for understanding.</strong> Memory reconstructs — you can feel confident in a recalled solution that’s actually wrong. Schema quality matters as much as quantity.</li>
  <li><strong>Transfer doesn’t happen automatically.</strong> Solving lots of coding problems doesn’t make you better at math. Structure must be made explicit and encoded by structural label, not surface domain.</li>
  <li><strong>Pushing harder when stuck blocks insight.</strong> Diffuse mode is the mechanism, not a break.</li>
  <li><strong>Negative transfer is invisible.</strong> The Einstellung effect operates below awareness — you apply a method because the surface matched a prior problem, without realizing the structure doesn’t. It doesn’t feel like a mistake; it feels like experience.</li>
  <li><strong>Retrieval feels right even when it’s wrong.</strong> Surface similarity drives what gets retrieved from memory. You’ll feel like you remembered the relevant analogue when you actually retrieved a surface match that doesn’t structurally help. Confidence in retrieval is not evidence of structural fitness.</li>
  <li><strong>More knowledge isn’t always progress.</strong> The intermediate effect: accumulating propositions at the same representational level can worsen performance on some tasks relative to novices. Progress requires compilation into higher-level schemas, not just accumulation of facts.</li>
  <li><strong>Expertise breeds dogmatism.</strong> The earned dogmatism effect: perceiving yourself as expert creates a license to dismiss alternatives without engaging. Expertise can narrow search as much as deepen it.</li>
</ul>

<h2 id="open-questions">Open questions</h2>

<ul>
  <li>How much of metacognitive skill is domain-general vs. domain-specific?</li>
  <li>What’s the minimum effective chunk — how explicit does a schema need to be documented before it becomes overhead rather than leverage?</li>
  <li>Does deliberate structural labeling at encoding actually improve far-transfer retrieval in naturalistic (non-lab) settings, or only in prompted conditions like Gick &amp; Holyoak?</li>
  <li>When does negative transfer override positive, and is there a reliable real-time signal that surface has overridden structure in your own search?</li>
  <li>Does the intermediate-effect plateau have to be endured, or can deliberate practice accelerate compilation into higher-level schemas?</li>
  <li>Is adaptive expertise trainable directly, or is it a byproduct of sufficient breadth of deliberate practice across varied problem types?</li>
</ul>

<h2 id="references">References</h2>

<ul>
  <li>Boshuizen, H.P.A., &amp; Schmidt, H. G. (1992). <a href="https://doi.org/10.1207/s15516709cog1602_1">On the role of biomedical knowledge in clinical reasoning by experts, intermediates and novices</a>. <em>Cognitive Science</em>, 16(2), 153–184</li>
  <li>Bowdle, B. F., &amp; Gentner, D. (2005). <a href="https://doi.org/10.1037/0033-295X.112.1.193">The career of metaphor</a>. <em>Psychological Review</em>, 112(1), 193–216</li>
  <li>Bransford, J. D., Barclay, J. R., &amp; Franks, J. J. (1972). Sentence memory: A constructive versus interpretative approach. <em>Cognitive Psychology</em>, 3, 193–209</li>
  <li>Chase, W. G., &amp; Simon, H. A. (1973). <em>The Mind’s Eye in Chess</em>. Academic Press</li>
  <li>Chen, Z., Mo, L., &amp; Honomichl, R. (2004). <a href="https://doi.org/10.1037/0096-3445.133.3.415">Having the memory of an elephant: Long-term retrieval and the use of analogues in problem solving</a>. <em>Journal of Experimental Psychology: General</em>, 133(3), 415–433</li>
  <li>Chi, M.T.H., Feltovich, P. J., &amp; Glaser, R. (1981). Categorization and representation of physics problems by experts and novices. <em>Cognitive Science</em>, 5(2), 121–152</li>
  <li>Chi, M.T.H., Glaser, R., &amp; Rees, E. (1983). Expertise in problem solving. In R. J. Sternberg (Ed.), <em>Advances in the Psychology of Human Intelligence</em> (Vol. 2, pp. 7–75). Erlbaum</li>
  <li>De Groot, A. D. (1965). <em>Thought and Choice in Chess</em>. Mouton</li>
  <li>Dreyfus, H. L. (1997). Intuitive, deliberative, and calculative models of expert performance. In C. E. Zsambok &amp; G. Klein (Eds.), <em>Naturalistic Decision Making</em> (pp. 17–28). Lawrence Erlbaum</li>
  <li>Duncker, K. (1945). On problem solving. <em>Psychological Monographs</em>, 58</li>
  <li>Ericsson, K. A., &amp; Charness, N. (1994). <a href="https://doi.org/10.1037/0003-066X.49.8.725">Expert performance: Its structure and acquisition</a>. <em>American Psychologist</em>, 49(8), 725–747</li>
  <li>Ericsson, K. A., &amp; Kintsch, W. (1995). Long-term working memory. <em>Psychological Review</em>, 102, 211–245</li>
  <li>Ericsson, K. A., Krampe, R. T., &amp; Tesch-Römer, C. (1993). <a href="https://doi.org/10.1037/0033-295X.100.3.363">The role of deliberate practice in the acquisition of expert performance</a>. <em>Psychological Review</em>, 100(3), 363–406</li>
  <li>Fisher, M., &amp; Keil, F. C. (2015). <a href="https://doi.org/10.1111/cogs.12280">The curse of expertise</a>. <em>Cognitive Science</em>, 40(5), 1251–1269</li>
  <li>Fitts, P. M., &amp; Posner, M. I. (1967). <em>Human Performance</em>. Brooks/Cole</li>
  <li>Flavell, J. H. (1979). <a href="https://doi.org/10.1037/0003-066X.34.10.906">Metacognition and cognitive monitoring</a>. <em>American Psychologist</em>, 34(10), 906–911</li>
  <li>Gentner, D. (1983). Structure-mapping: A theoretical framework for analogy. <em>Cognitive Science</em>, 7, 155–170</li>
  <li>Gentner, D., &amp; Gentner, D. R. (1983). Flowing waters or teeming crowds: Mental models of electricity. In <em>Mental Models</em>. Lawrence Erlbaum</li>
  <li>Gentner, D., Rattermann, M. J., &amp; Forbus, K. D. (1993). The roles of similarity in transfer: Separating retrievability from inferential soundness. <em>Cognitive Psychology</em>, 25, 524–575</li>
  <li>Gick, M. L., &amp; Holyoak, K. J. (1983). <a href="https://doi.org/10.1016/0010-0285(83)90002-6">Schema induction and analogical transfer</a>. <em>Cognitive Psychology</em>, 15(1), 1–38</li>
  <li>Gilhooly, K. J., &amp; Murphy, P. (2005). Differentiating insight from non-insight problems. <em>Thinking &amp; Reasoning</em>, 11(3), 279–302</li>
  <li>Glaser, R. (1996). Changing the agency for learning: Acquiring expert performance. In K. A. Ericsson (Ed.), <em>The Road to Excellence</em> (pp. 303–311). Lawrence Erlbaum</li>
  <li>Greeno, J. G. (1974). Hobbits and Orcs: Acquisition of a sequential concept. <em>Cognitive Psychology</em>, 6, 270–292</li>
  <li>Hatano, G., &amp; Inagaki, K. (1986). Two courses of expertise. In H. Stevenson, H. Azuma, &amp; K. Hakuta (Eds.), <em>Child Development in Japan</em> (pp. 262–272). Freeman</li>
  <li>Holyoak, K. J., &amp; Koh, K. (1987). Surface and structural similarity in analogical transfer. <em>Memory and Cognition</em>, 15(4), 332–340</li>
  <li>Jonassen, D. H. (1997). Instructional design models for well-structured and ill-structured problem-solving learning outcomes. <em>Educational Technology Research and Development</em>, 45(1), 65–94</li>
  <li>Kaplan, C. A., &amp; Simon, H. A. (1990). <a href="https://doi.org/10.1016/0010-0285(90)90008-R">In search of insight</a>. <em>Cognitive Psychology</em>, 22(3), 374–419</li>
  <li>Knoblich, G., Ohlsson, S., Haider, H., &amp; Rhenius, D. (1999). Constraint relaxation and chunk decomposition in insight problem solving. <em>Journal of Experimental Psychology: Learning, Memory, and Cognition</em>, 25(6), 1534–1556</li>
  <li>Loftus, E. F. (1975). <a href="https://doi.org/10.1016/0010-0285(75)90023-7">Leading questions and the eyewitness report</a>. <em>Cognitive Psychology</em>, 7(4), 560–572</li>
  <li>Luchins, A. S. (1942). Mechanization in problem solving: The effect of Einstellung. <em>Psychological Monographs</em>, 54(248)</li>
  <li>MacGregor, J. N., Ormerod, T. C., &amp; Chronicle, E. P. (2001). <a href="https://doi.org/10.1037/0278-7393.27.1.176">Information processing and insight</a>. <em>Journal of Experimental Psychology: Learning, Memory, and Cognition</em>, 27(1), 176–201</li>
  <li>Newell, A., &amp; Rosenbloom, P. S. (1981). Mechanisms of skill acquisition and the law of practice. In J. R. Anderson (Ed.), <em>Cognitive Skills and their Acquisition</em> (pp. 1–56). Erlbaum</li>
  <li>Newell, A., &amp; Simon, H. A. (1972). <em>Human Problem Solving</em>. Prentice-Hall</li>
  <li>Ohlsson, S. (1992). Information processing explanations of insight and related phenomena. In M. T. Keane &amp; K. J. Gilhooly (Eds.), <em>Advances in the Psychology of Thinking</em> (pp. 1–43). Harvester-Wheatsheaf</li>
  <li>Ottati, V., Price, E. D., Wilson, C., &amp; Sumaktoyo, N. (2015). When self-perceptions of expertise increase closed-minded cognition: The earned dogmatism effect. <em>Journal of Experimental Social Psychology</em>, 61(1), 131–138</li>
  <li>Polya, G. (1957). <em>How to Solve It</em>. Princeton University Press</li>
  <li>Robertson, S. I. <a href="https://learning.oreilly.com/library/view/problem-solving-2nd/9781317496007/"><em>Problem Solving</em></a> (2nd ed.). Psychology Press</li>
  <li>Simon, H. A. (1973). The structure of ill-structured problems. <em>Artificial Intelligence</em>, 4, 181–201</li>
  <li>Simon, H. A., &amp; Hayes, J. R. (1976). <a href="https://doi.org/10.1016/0010-0285(76)90022-0">The understanding process: Problem isomorphs</a>. <em>Cognitive Psychology</em>, 8, 165–190</li>
  <li>Simon, H. A., &amp; Newell, A. (1971). <a href="https://doi.org/10.1037/h0030806">Human problem solving: The state of the theory in 1970</a>. <em>American Psychologist</em>, 26(2), 145–159</li>
  <li>Sio, U. N., &amp; Ormerod, T. C. (2009). <a href="https://doi.org/10.1037/a0014212">Does incubation enhance problem solving? A meta-analytic review</a>. <em>Psychological Bulletin</em>, 135(1), 94–120</li>
  <li>Wertheimer, M. (1945). <em>Productive Thinking</em>. Harper &amp; Brothers</li>
  <li><a href="https://en.wikipedia.org/wiki/Metacognition">Metacognition</a> — Wikipedia</li>
  <li><a href="https://en.wikipedia.org/wiki/Reconstructive_memory">Reconstructive memory</a> — Wikipedia</li>
  <li><a href="https://en.wikipedia.org/wiki/Transfer_of_learning">Transfer of learning</a> — Wikipedia</li>
</ul>]]></content><author><name>Tenzin Lhakhang</name></author><category term="cognitive-science" /><category term="problem-solving" /><category term="learning" /><category term="books" /><summary type="html"><![CDATA[Notes on Robertson's cognitive framework for problem solving — representation, heuristics, insight, schemas, and metacognition.]]></summary></entry></feed>