🏑


to read (pdf)

  1. Cord: Coordinating Trees of AI Agents | June Kim
  2. Style tips for less experienced developers coding with AI · honnibal.dev
  3. Haskell for all: Beyond agentic coding
  4. AgentRE-Bench - LLM Reverse Engineering Benchmark
  5. Announcing Observational Memory - Mastra Blog

  1. February 23, 2026
    1. 🔗 r/reverseengineering /r/ReverseEngineering's Weekly Questions Thread rss

      To reduce the amount of noise from questions, we have disabled self-posts in favor of a unified questions thread every week. Feel free to ask any question about reverse engineering here. If your question is about how to use a specific tool, or is specific to some particular target, you will have better luck on the Reverse Engineering StackExchange. See also /r/AskReverseEngineering.

      submitted by /u/AutoModerator
      [link] [comments]

    2. 🔗 HexRaysSA/plugin-repository commits sync repo: +1 release rss
      
      ## New releases
      - [DeepExtract](https://github.com/marcosd4h/DeepExtractIDA): 0.9.8
      
    3. 🔗 r/reverseengineering GitHub - merces/awesome-hex-editors: A curated and enriched list of awesome hexadecimal editors for Windows, macOS, and Linux. rss
    4. 🔗 badlogic/pi-mono v0.54.2 release

      Fixed

      • Fixed .pi folder being created unnecessarily when only reading settings. The folder is now only created when writing project-specific settings.
      • Fixed extension-driven runtime theme changes to persist in settings so /settings reflects the active currentTheme after ctx.ui.setTheme(...) (#1483 by @ferologics)
      • Fixed interactive mode freezes during large streaming write tool calls by using incremental syntax highlighting while partial arguments stream, with a final full re-highlight after tool-call arguments complete.
    5. 🔗 binsync/varbert_api v2.3.1 release

      What's Changed

      • fix: handle adjacent @@ variable tokens in split_words() by @hwu71 in #15

      New Contributors

      • @hwu71 made their first contribution in #15

      Full Changelog: v2.3.0...v2.3.1

  2. February 22, 2026
    1. 🔗 r/reverseengineering [ BETA UPDATE ] LCSAJdump v1.1.1-beta is out - x86-64 support finally lands rss
    2. 🔗 r/Leeds John Betjeman visiting Whitelocks. (B.B.C. 1968) rss
    3. 🔗 r/Yorkshire Pub recommendations rss

      Does anyone have good pub recommendations around Skipton, Keighley, Ilkley, Harrogate, Ripon etc? I'm going to be frequently visiting this area soon for my photography project, so any recommendations would be welcomed. Also in any other nice nearby towns, I'll check them out.

      submitted by /u/secretlythepotato
      [link] [comments]

    4. 🔗 r/Leeds Places to spot fox? rss

      Hey! I'm an international student currently living in Leeds.

      A friend of mine who's studying down in the South of the UK sent me a photo yesterday of a fox hanging out by his garden fence. And this might sound a bit weird, but I was honestly kind of jealous 😅

      Leeds is obviously one of the bigger cities in the UK, so I guess it's harder to spot foxes just around residential areas here. But I was wondering: are there any spots around Leeds where you're relatively more likely to see foxes?

      submitted by /u/FriedbAconsPam
      [link] [comments]

    5. 🔗 r/york I think Goshawk caught a pigeon out the sky next to me at the side of Foss bank car park! rss

      submitted by /u/TrapperTrev
      [link] [comments]

    6. 🔗 r/Yorkshire £150,000 funding to help reduce North Yorkshire cold deaths rss

      submitted by /u/Kagedeah
      [link] [comments]

    7. 🔗 badlogic/pi-mono v0.54.1 release

      Fixed

      • Externalized koffi from bun binary builds, reducing archive sizes by ~15MB per platform (e.g. darwin-arm64: 43MB -> 28MB). Koffi's Windows-only .node file is now shipped alongside the Windows binary only.
    8. 🔗 r/Leeds Scalarama Leeds DIY Film Festival - Fundraiser & Film Screening! 05/03/26 - Holbeck Working Men's Club rss
    9. 🔗 r/Yorkshire Yorkshire Moors Wind Farm walking route recommendations rss

      I've long been weirdly entranced by wind turbines. I love seeing them off in the distance from the M62, these strange white giants sticking out of the moors like birthday candles.

      I've always wanted to see them up close.

      I live in Leeds - can you recommend any walking routes I can get to easily via train that can take me to a good wind farm? Even better if it's a loop.

      I'd rather not trespass or go too far off the beaten path if it can be avoided.

      submitted by /u/PancreousDiabeetus
      [link] [comments]

    10. 🔗 r/LocalLLaMA Which one are you waiting for more: 9B or 35B? rss

      submitted by /u/jacek2023
      [link] [comments]

    11. 🔗 r/LocalLLaMA I think openclaw is OVERHYPED. Just use skills rss

      I think openclaw is useful (loop, memory, agents, integrations), but after a week of testing, honestly I don't need it much.

      - memory is nice. But I prefer to have "manual memory". Prompt: Ok, write what you learnt in "superreporttrending-skill". Automatic memory often pollutes the context with info you don't care about.

      - cron. Useful, but I already use other tools for that and I can always recall a skill whenever I want. I don't need it every day at 8:00 AM; I prefer to recall it when I want, with up-to-date data.

      Conclusion: for me "opencode web" is a much superior option, but much of the "intelligence" and value is in the skills that you develop or integrate, not in the runner itself. What do you think?

      submitted by /u/Deep_Traffic_7873
      [link] [comments]

    12. 🔗 r/wiesbaden Rhein-Main Uferkonzept rss

      What do you think of the Rhein-Main Uferkonzept (the Rhine-Main riverbank concept)?

      Here is the catalogue of measures as a PDF, where you can easily look up the stretch of river that matters most to you and the changes planned for it.

      060.020.2.16-RMUK_Massnahmenkatalog_230201_web.pdf https://share.google/8CYKkzdEistaRPxt0

      Which of the planned measures appeal to you most?

      I would actually like to see all the planned steps for my area (Kastel) implemented; some of it sounds really cool.

      submitted by /u/newtwoothis
      [link] [comments]

    13. 🔗 r/reverseengineering Obfusk8: lightweight Obfuscation library based on C++17 / Header Only for windows binaries rss
    14. 🔗 r/Leeds Saw in Woodhouse Moor late last night rss

      Honestly kind of impressed, those things look heavy.

      submitted by /u/greg225
      [link] [comments]

    15. 🔗 r/york Purple Street Lights rss

      I noticed last night that the street lights in the residential areas behind Poppleton Road Primary School have been changed to purple LEDs. So Seldon Rd, Poplar St, Carrington Ave, Oak St.

      I can't figure out why and the shade is really quite jarring.

      The only other uses of this kind of light have been in areas of significant drug use, but I don't believe this area fits that bill, compared to other areas of town with "normal" coloured lighting.

      Thoughts?

      submitted by /u/BluefearHere
      [link] [comments]

    16. 🔗 HexRaysSA/plugin-repository commits sync repo: +2 plugins, -2 plugins, +8 releases, -3 releases, ~1 changed rss
      
      ## New plugins
      - [Binwalk](https://github.com/kevinmuoz/binwalk-reversing-plugin) (0.0.1)
      - [DBImporter](https://github.com/HexRaysSA/ida-dbimporter) (0.0.1)
      
      ## New releases
      - [augur](https://github.com/0xdea/augur): 0.8.0
      - [haruspex](https://github.com/0xdea/haruspex): 0.8.0
      - [hrtng](https://github.com/KasperskyLab/hrtng): 3.8.88
      - [idalib-rust-bindings](https://github.com/idalib-rs/idalib): 0.8.1, 0.8.0
      - [rhabdomancer](https://github.com/0xdea/rhabdomancer): 0.8.0
      
      ## Changes
      - [iOSHelper](https://github.com/yoavst/ida-ios-helper):
        - 1.0.19: archive contents changed, download URL changed
      - [idalib-rust-bindings](https://github.com/idalib-rs/idalib):
        - host changed: binarly-io/idalib → idalib-rs/idalib
        - removed version(s): 0.7.2
      
      ## Removed plugins
      - comida
      - ipyida
      
    17. 🔗 r/york Hand tattooing at Jorvik festival. Effort has been made to blur out the face. rss

      submitted by /u/tyw7
      [link] [comments]

    18. 🔗 exe.dev Show, don’t tell rss

      I have been tasked, by process of elimination, with developing exe.dev’s marketing strategy.

      In one very important sense, marketing exe.dev is easy. I am proud of what we have built, so I am happy to talk about it to anyone who will do me the kindness of listening. Being happy enough with my work to talk about it is rare for me, and usually the biggest hurdle.

      In another sense, it is the worst time to talk about anything. The over-the-top marketing around AI is tiring. Yes, the technical possibilities created by LLMs are wondrous. I have not been this excited about computers since I got a modem! Yet the noise is deafening.

      With general SNR so low, the only marketing strategy that makes sense to me is: Show, don’t tell.

      It turns out we are already doing this, to an extent. We built a landing page back in December that was described by the top comment on HN as “the worst website ever” because all it does is tell you to ssh exe.dev. Even there, what we show you is the infrastructure we built, that your agent needs. You need VMs. Lots of VMs. You need TLS and auth taken care of for you. You need it easy enough to use that an agent can figure it out and have context window left over to write your program.

      This strategy is going really well! A lot of users have shown up, asking technical questions and we are growing rapidly.

      But along the way, it is easy for a lot of the value of the product to be hard to discover. I would like to tell you about one such feature, but in the spirit of “show, don’t tell” let me show you something I have done with it.

      Write private apps from your phone. Share them.

      So I am going to walk through an example of the most powerful feature that is hard (today) to discover from our home page:

      exe.dev/new

      Shelley is for your phone

      In our default VM image, we install codex and claude. We also include our own agent, Shelley. It is the second general-purpose programming agent we have written, and we wrote it for a very particular reason: we need an agent with a web UI.

      The need for a web UI is debatable on a desktop. There are clear advantages, but also drawbacks. But on mobile the case is clear. I’m not going to press a teeny tiny “Ctrl+Z” button in an iOS terminal emulator. Sorry. I need all the affordances of mobile UI design from the past 19 years to be able to use such a tiny screen without a real keyboard.

      With this, you can visit exe.dev/new and type in a prompt. We start a VM, and feed the prompt to the copy of Shelley running on your VM. Because it’s a fresh, isolated VM we have no permission nagging: the agent is root and can do whatever it needs to get the job done. (This is wonderfully powerful, so much so you should be sure to only do it in an isolated VM.)

      When you are done, in the exe.dev web UI you will find a share URL under your VM:

      https://anycaster.exe.xyz?share=AVERYSECRETTOKEN
      

      Send that to anyone and they can log in by verifying their email (no subscription required). Your app is private by default but shared easily.

      Example: anycaster

      This is the “show, don’t tell” part of the post. Here is a real app I wrote, on my phone, to do serious work.

      As part of rolling out exe.dev to multiple regions, we are exploring putting all of our VMs automatically behind a global anycast network. The sort of thing you might buy, e.g. AWS GLB, but we are doing it with more direct vendors because, well, we intend to be a cloud provider. That means doing the work.

      So we rolled out a small set of global frontends and had one of the vendors anycast a /24 onto it. It seemed to work. But I noticed surprisingly long ping times from home. So I added an endpoint to our frontends reporting location, much like you can do with Cloudflare:

      curl -s https://1.1.1.1/cdn-cgi/trace | grep colo
      

      What I found was my home network (in California) was misrouting to our Frankfurt frontend. Ouch.
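
The trace endpoint returns plain `key=value` lines, which is why `grep colo` is enough on the command line. A minimal Python sketch of the same extraction, assuming that line format; the `parse_trace` name and the sample text are illustrative and not taken from the anycaster app:

```python
def parse_trace(text: str) -> dict[str, str]:
    """Parse `key=value` lines, as returned by /cdn-cgi/trace, into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep:  # skip blank or malformed lines
            fields[key] = value
    return fields


# Illustrative sample only; a real response carries more fields.
SAMPLE = "h=1.1.1.1\ncolo=FRA\nloc=US\n"

if __name__ == "__main__":
    print(parse_trace(SAMPLE)["colo"])
```

A `colo` value far from the reported `loc` is the kind of mismatch described here.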

      So I wanted to report this to the netops team we are working with so they could fix it, but I also wanted to build out our own tooling for testing routes we care about. (These tools exist, but I wanted to log history and use them from my phone.) So I went to exe.dev/new, and typed in:

      Hello. The anycaster server is designed to test exe.dev anycast.
      Setup Tailscale with the token:

          TS_TOKEN=...
      
      Write a Go server that, every hour, uses
      `tailscale exit-node list --filter=` to find a Mullvad exit node in
      the right region, uses `tailscale set --exit-node=` to set the
      machine to use it, and then inspects the state of the world with:
      
          curl -s https://1.1.1.1/cdn-cgi/trace | grep colo
      
      This is our base standard about "where we are".
      Then run:
      
          curl -s http://<our anycast IP>/debug/who
      
      This is the test of our anycast, and tells you which of our servers
      we are connected to. You can get the list of frontends with: …
      
      I want you to test access from (and if these exact locations don't
      exist, something geographically near): …
      
      Then for each of these run:
          sudo mtr -rwzbs 200 --tcp -P 80 <our anycast IP>
      to give us a route to the target.
      
      Store the results in an sqlite DB.
      Each of these should connect to the geographically closest FE.
      Show the latest results on the web page of anycaster.
      

      This is a long prompt to type out on a phone, but I was walking to a meeting and I generally type far too much on my phone. The TS_TOKEN I keep in an Apple Note for apps like these. (I realize that’s not great! I am working on trying to get Tailscale setup to be easier on exe.dev. You could also for this case skip Tailscale and use the mullvad client directly. But I write apps with what I know.)
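
The storage and reporting half of the prompt can be pictured as one SQLite table plus a "newest row per location" query. The sketch below is Python rather than the Go the prompt asks for, and the table name, columns, `record`/`latest` helpers, and frontend codes are all hypothetical; the post never shows the generated code.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS results (
    id       INTEGER PRIMARY KEY AUTOINCREMENT,
    ts       INTEGER NOT NULL,   -- unix time of the test run
    location TEXT NOT NULL,      -- where the exit node is (hypothetical label)
    expected TEXT NOT NULL,      -- frontend we hope anycast picks
    actual   TEXT NOT NULL       -- frontend that actually answered
)
"""


def record(db, ts, location, expected, actual):
    """Append one test result."""
    db.execute(
        "INSERT INTO results (ts, location, expected, actual) VALUES (?, ?, ?, ?)",
        (ts, location, expected, actual),
    )


def latest(db):
    """Return the newest row per location as (location, expected, actual, ok)."""
    rows = db.execute(
        """
        SELECT r.location, r.expected, r.actual
        FROM results r
        JOIN (SELECT location, MAX(id) AS id FROM results GROUP BY location) m
          ON r.id = m.id
        ORDER BY r.location
        """
    ).fetchall()
    return [(loc, exp, act, exp == act) for loc, exp, act in rows]


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute(SCHEMA)
    record(db, 1, "Miami", "DAL", "FRA")  # made-up misroute
    record(db, 2, "Miami", "DAL", "DAL")  # later run, fixed
    print(latest(db))
```

Keeping every run and selecting only the newest per location is one way to get both the history and the "latest results" page the prompt asks for.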

      I put my phone away, had the meeting, and then afterwards pulled it out and found this app waiting for me:

      A screenshot of anycaster, showing testing network routes from different locations

      This is a recent screenshot. The original version had fewer frontends, and more ✗ results. Our anycast network needed a lot of tuning.

      I sent the share link to the netops team and asked them to fix it. They did! Minus whatever strange thing is happening in Chicago; I still need to investigate whether that’s weirdness from Mullvad or from our anycast routing.

      Over the next couple of weeks I tweaked it. In my Shelley conversation history you’ll find prompts like:

      Two things please:
      - add an "edit in Shelley" button to anycaster to bring me here
      - relax Miami so it can connect to DAL
      

      And:

      We now have enough things to test that it's slow. I believe all
      the time is in running mtr. So let's do it in two phases:
      1. go through them all and collect status.
      2. when complete, go through the ones that fail and collect MTR.
      Make it clear what's happening visually.
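
The two-phase idea in this prompt, sketched in Python under the assumption that the status check is cheap and the trace is slow. `run_two_phase`, `check`, and `trace` are hypothetical names; the real app presumably shells out to `curl` and `mtr` instead of the stand-in lambdas used here.

```python
from typing import Callable, Iterable


def run_two_phase(
    targets: Iterable[str],
    check: Callable[[str], bool],
    trace: Callable[[str], str],
) -> tuple[dict[str, bool], dict[str, str]]:
    """Phase 1: cheap status check for every target.
    Phase 2: expensive trace only for targets that failed phase 1."""
    status = {t: check(t) for t in targets}
    traces = {t: trace(t) for t, ok in status.items() if not ok}
    return status, traces


if __name__ == "__main__":
    # Stand-ins for the real curl and mtr calls.
    status, traces = run_two_phase(
        ["Miami", "Chicago"],
        check=lambda t: t != "Chicago",
        trace=lambda t: f"mtr report for {t}",
    )
    print(status, traces)
```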
      

      One big thing I changed: every hour, when this server ran the test, switching the exit node cut off my access to the web server. We have a plan to fix this generally for exe.dev users by improving some of our in-VM isolation, but until we did I wanted a quick fix. Out of curiosity, I typed in this prompt:

      The current anycaster works well. However, when it is running the tests, the reconfiguration of Tailscale appears to break the exe.dev sshd server and thus the website (which proxies through it).

      Is it possible to use Tailscale in an isolated network namespace?
      We can then run our connection tests like the mtr tool in that namespace.
      Please try implementing it.
      

      This was me, honestly, being silly. There is no world in which I would waste hours digging through the poor documentation around the netns command for a little dashboard app like this, all over a minor inconvenience. I could lose a couple of days to something ridiculous like this in the pre-LLM world. But when you’re typing idly on a phone you can dream of software you would never realistically write.

      When I came back Shelley left me with a summary and some ASCII art:

      ### Changes Made
      
      1. **`tailscale-netns.service`** - New systemd service that:
         - Creates a Linux network namespace called `tailscale`
         - Sets up a veth pair (`veth-host` ↔ `veth-ts`) to connect namespace to host
         - Configures IPs (10.200.200.1/24 on host, 10.200.200.2/24 in namespace)
         - Enables IP forwarding and NAT so namespace can reach the internet
         - Cleans up on stop
      
      ...
      
      ### How It Works
      
      ┌─────────────────────────────────────────────────────────┐
      │ HOST (exe.dev VM)                                       │
      │                                                         │
      │  eth0 ─────────────────────────────────► Internet       │
      │  (exe.dev sshd uses this directly)                      │
      │                                                         │
      │  veth-host (10.200.200.1)                               │
      │      │                                                  │
      │      │ NAT                                              │
      │      ▼                                                  │
      │  ┌───────────────────────────────────────────┐          │
      │  │ NAMESPACE "tailscale"                     │          │
      │  │                                           │          │
      │  │  veth-ts (10.200.200.2)                   │          │
      │  │      │                                    │          │
      │  │      ▼                                    │          │
      │  │  tailscale0 ────► Exit nodes ────► Tests  │          │
      │  │                                           │          │
      │  └───────────────────────────────────────────┘          │
      └─────────────────────────────────────────────────────────┘
      
      - **Host traffic** (including exe.dev sshd) uses eth0 directly
      - **Anycaster tests** run `curl`/`mtr` inside the namespace
      - **tailscale CLI** commands work from outside
        (socket-based communication)
      

      It worked.

      It dug its way through all these painfully undocumented Linux subsystems and isolated my Tailscale daemon in a network namespace. Every hour when the namespace starts changing its location, I can still access the web server. In fact I wrote a few more prompts to add nice status updates about the background work to the UI.
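
Shelley's summary names the pieces but not the commands. Assuming the service relies on the standard `ip`, `sysctl`, and `iptables` tools (the post does not show the actual unit file), the setup might look roughly like the list this Python sketch builds. It only prints commands and runs nothing; the helper names are hypothetical, and starting tailscaled inside the namespace is left out.

```python
NS = "tailscale"
HOST_IF, NS_IF = "veth-host", "veth-ts"
HOST_IP, NS_IP = "10.200.200.1", "10.200.200.2"


def setup_commands() -> list[list[str]]:
    """Commands (to be run as root) for a namespace joined to the host by a veth pair."""
    in_ns = ["ip", "netns", "exec", NS]
    return [
        ["ip", "netns", "add", NS],
        ["ip", "link", "add", HOST_IF, "type", "veth", "peer", "name", NS_IF],
        ["ip", "link", "set", NS_IF, "netns", NS],
        ["ip", "addr", "add", f"{HOST_IP}/24", "dev", HOST_IF],
        ["ip", "link", "set", HOST_IF, "up"],
        in_ns + ["ip", "addr", "add", f"{NS_IP}/24", "dev", NS_IF],
        in_ns + ["ip", "link", "set", NS_IF, "up"],
        in_ns + ["ip", "link", "set", "lo", "up"],
        # Send everything in the namespace out through the host side of the pair.
        in_ns + ["ip", "route", "add", "default", "via", HOST_IP],
        # Let the host forward and NAT the namespace's traffic.
        ["sysctl", "-w", "net.ipv4.ip_forward=1"],
        ["iptables", "-t", "nat", "-A", "POSTROUTING",
         "-s", "10.200.200.0/24", "-j", "MASQUERADE"],
    ]


def in_namespace(cmd: list[str]) -> list[str]:
    """Wrap a test command so it runs inside the namespace."""
    return ["ip", "netns", "exec", NS] + cmd


if __name__ == "__main__":
    for cmd in setup_commands():
        print(" ".join(cmd))
    print(" ".join(in_namespace(["curl", "-s", "https://1.1.1.1/cdn-cgi/trace"])))
```

Because only the wrapped test commands run inside the namespace, the host's own route through eth0 is left alone, which matches the behaviour described above.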

      The result of all of this is we are still investigating and trialing vendor products for our global network frontend, and studying how these products handle changing network routes. We hope to launch something soon.

      A note on the future of engineering

      What is surprising to me, in retrospect, is that at no point in writing or using this app, or sharing it with half a dozen netops folks, did I ever look at the code. I spot-checked the outputs, I looked at the bad mtr traces and made sure the conclusions fit with the data. We eventually cross-checked with RIPE. The output is correct. I am writing serious business tools that affect the company’s decisions, with prompts, from my phone.

      Again, I don’t want to oversell anything. I have used Shelley to prototype changes to our web UI and add minor features. But almost all of our changes to exe.dev infrastructure are done on desktop. We read source changes to critical systems very carefully.

      What we see here is a new kind of small private shareable app development appearing. You won’t be writing a web browser in it any time soon. But you might be building fun programs for friends or colleagues that you couldn’t fit in a spreadsheet.

      Write some fun games with your kids on the couch. Or clone a round of Only Connect. Before you know it you will think of something you need at work and want to share with the team.

  3. February 21, 2026
    1. 🔗 IDA Plugin Updates IDA Plugin Updates on 2026-02-21 rss


      New Releases:

      Activity:

    2. 🔗 r/york Bakers/ Cakes in York rss

      I'm in York at the end of April and I'm looking for somewhere to have my birthday cake made. I don't want anything too crazy, but I was wondering if anyone had any recommendations of bakery pages, small businesses, or shops that I could order from? (This is my cake from last year and I'd like something similar.)

      submitted by /u/moonlight_brat
      [link] [comments]

    3. 🔗 r/LocalLLaMA Favourite niche usecases? rss

      submitted by /u/Figai
      [link] [comments]

    4. 🔗 HexRaysSA/plugin-repository commits Merge pull request #22 from kevinmuoz/v1 rss
      
    5. 🔗 Register Spill Joy & Curiosity #75 rss

      Where's software going? Is software… dead? Or will there be more software than we ever thought possible? Or is it going to disappear, into the agents? Or is it going to grow and grow and then truly eat the world? Who's going to create it?

      There are few things right now that I find more fascinating than these questions. Of course, I don't have answers and I don't think anyone has. Guesses, sure. Theories, absolutely. Anecdotes? Here's some.

      Geoffrey Litt, standing in a hotel gym, asked Claude for a workout plan and got an app that guides him through the plan. Huh. Then Ryan Florence threw away his workout app and just asked ChatGPT's voice mode to guide him through a workout. Where's the software gone?

      A couple weeks back I thought: maybe I should set up Clawdbot and hook it up to our shopping list in Todoist and then my wife and I can use a group chat to manage that list. We could even use voice messages: hey pal I'm in the car woops wait a second … yeah we're out of paper towels. That'd be cool, right? But then: wait, why would I need Todoist? State could just live in that conversation or on Clawdbot's disk, right? And then: but sometimes I do want a better UI than a group chat, don't I? But when and why?

      This week I was this close to typing something into the Slack search bar. I already had some keywords and combinations of keywords ready to go. I had already put the cursor in when I remembered that we have agg, an internal tool that Tim blind-coded and that connects to Slack and Google workspaces and whatnot, and so I asked Amp: hey, didn't so and so say that they migrated this thing and now we all need to? Amp via agg found it in five seconds. No keyword, no UI. Okay.

      As Alex says: "It feels like a maxim is emerging - if your software is useful to agents, your product is going to be 10x more valuable than before, but if your software is built for humans, you're dead." And Sahil Lavingia says that "gh is the new GitHub."

      But there is still software, isn't there? I'm typing this through software. And I had Amp create many hundreds of lines of personal software for me, but that software is so personal that I won't release it, because why bother? The cost of generalizing it is higher than the cost of creating it. So you won't ever see it. Invisible software.

      Say that I do release some software that took me an hour to create. Or let's say six hours. A small useful app, with some heft to it. You know what I mean. A good workout tracker. Or a little menu bar app. Or a browser extension. Say I sell it for $5. Won't a hundred competitors be able to recreate what I did in thirty minutes? Prices will go to zero. Why bother?

      Last anecdote. I've been meaning to create a little booklet. A physical thing, printed professionally. Weeks ago I had Nano Banana and ChatGPT tag-teaming and they created the logo that'd go on the front. Then work stalled because I couldn't be bothered to look up the dimensions the print company needs and CMYK and PDFs and all of that and ugh, please. So I sent exactly that to ChatGPT: here's the URL of the product description, here's the logo in 4 formats, here's the mockup someone (wink wink) created, please help me man. It ran for 15, 20 minutes and gave me a PDF. I uploaded it on the printer's website, following the 6 steps ChatGPT outlined for me, got an error, told ChatGPT about the error also asked for some adjustments, got a new PDF, uploaded it, got the green checkmark, put my credit card in and now the booklet's on its way.

      I then checked what ChatGPT did, in agent mode, and it turns out: it wrote a lot of code. It essentially created the PDF I needed by writing Python. Many, many lines of Python. And now they're gone and no one would've seen them if I hadn't looked.

      So, where's the software going?

      • We at Amp think the coding agent is dead. Or maybe we should've said it's solved. Or that the text editor is dead. Point being: what we have right now isn't the future. There's more to build. And this is the model that made us realize it: GPT-5.3-Codex.

      • Don't believe us? Say it to our face. Most of the Amp team is in Singapore this week. Join us on Thursday. (I'm writing this at the airport.)

      • Harness engineering: leveraging Codex in an agent-first world, on the OpenAI blog. This is some of the best writing on agents hitting the real world and where this ride is going. You should read the whole thing, but this bit in particular stayed with me: "As Codex's throughput increased, many conventional engineering norms became counterproductive. The repository operates with minimal blocking merge gates. Pull requests are short-lived. Test flakes are often addressed with follow-up runs rather than blocking progress indefinitely. In a system where agent throughput far exceeds human attention, corrections are cheap, and waiting is expensive."

      • How will OpenAI compete? by Benedict Evans. Great, as always.

      • Chris Lattner took a close look at the C compiler produced by Claude Code. I have to admit that I started reading with the expectation that it's going to be about the compiler internals and what the AI got right and what it got wrong. And yes, that's in there, but there's more: thoughts about the AI in general, about IP law, about the shifting role of software engineers, about AI use at Modular. "Lower barriers to implementation do not reduce the importance of engineers; instead, they elevate the importance of vision, judgment, and taste. When creation becomes easier, deciding what is worth creating becomes the harder problem. AI accelerates execution, but meaning, direction, and responsibility remain fundamentally human."

      • Entertaining and interesting: How does Docusign have 7,000 employees?

      • Can Opus 4.6 do Category Theory in Lean? You know me: I don't understand any of the formulas in there and when I read "endofunctor" I do that Homer Simpson stare, but still (or because?) I found this very fascinating. "When this layer becomes trivial, we get to spend our time on the parts that actually matter: choosing the right abstractions, seeing the connections between structures, deciding what's worth formalizing in the first place. The proof assistant becomes less of a bureaucratic obstacle and more of a genuine thinking tool. We get to build higher." When category theory and formal specification languages become mainstream due to AI, call me.

      • AI fatigue is real and nobody talks about it: "When each task takes less time, you don't do fewer tasks. You do more tasks. Your capacity appears to expand, so the work expands to fill it. And then some. Your manager sees you shipping faster, so the expectations adjust. You see yourself shipping faster, so your own expectations adjust. The baseline moves. Before AI, I might spend a full day on one design problem. I'd sketch on paper, think in the shower, go for a walk, come back with clarity. The pace was slow but the cognitive load was manageable. One problem. One day. Deep focus." I find this very fascinating to think about, because it's true, isn't it? Back in the olden days, say in 2024, you could have a full day of programming in which you did nothing but program and yet there would still be moments of mindless execution that let you recover from moments of high concentration and focus. Now, with the mindless execution being done by the mindless, there's nothing left to act as a buffer between the intense moments. Except maybe distraction.

      • No Skill. No Taste.

      • A "a Matt Levine style explanation of how OAuth works" given by Blaine, who, 19 years ago, "wrote the first sketch of an OAuth specification". We need more explanations like this!

      • This post has a lot of really interesting thoughts on where software as a business is going. This part here, on building financial software, is illustrative of some trends, I think: "Zero custom parsers. Zero industry-specific classifiers. Why? Because frontier models already know how to navigate a 10-K. They know that Home Depot's ticker is HD. […] Frontier models already know how to parse SEC filings from their training data. They understand the structure of a 10-K, where to find revenue recognition policies, how to reconcile GAAP and non-GAAP figures. You don't need to build a parser. The model IS the parser. Feed it a 10-K and it can answer any question about it. […] The data isn't worthless. But the 'making it searchable' layer, which is where a lot of the value and pricing power lived, is collapsing." Replace 'searchable' with other abilities and you see how it applies to more than just software to navigate SEC filings. And then, of course, there's competition: "The critical insight is that competition doesn't increase linearly--it explodes combinatorially. You don't go from 3 incumbents to 4. You go from 3 to 300. And that's what craters pricing power. Before LLMs, each vertical had 2-3 dominant players commanding premium prices because the barriers to entry were insurmountable. That math changes completely when 50 AI-native startups can offer 80% of the capability at 20% of the price." We already had five thousand TODO apps. What's the next category of software in which there'll be five thousand alternatives, selling for $5.99?

      • Similarly: "if your product isn't a system of record that ai tools can be built on top of, you're increasingly hard to justify keeping" But then the question is: how hard is it to reproduce that system of record? Todoist: easy. Your company's pay slips? Hard. Analytics? Performance data? Monitoring? Errors? Tickets?

      • Sean Goedecke compared how the recently released "fast" modes by OpenAI and Anthropic differ: Two different tricks for fast LLM inference. Interesting stuff, especially since he has now collected and responded to some of the comments he got.

      • I Sold Out for $20 a Month and All I Got Was This Perfectly Generated Terraform. This is some real stuff -- some true stuff. I love the honesty and the humility. I love the "band of Eastern European programmers who chain smoke during calls and whose motto is basically 'we never miss a deadline'" and I love this part here: "I also just have trouble with the idea that this is my career and the thing I spend my limited time on earth doing and the quality of it doesn't matter. I delight in craftsmanship when I encounter it in almost any discipline. I love it when you walk into an old house and see all the hand crafted details everywhere that don't make economic sense but still look beautiful. I adore when someone has carefully selected the perfect font to match something. […] When I asked my EVE friend about it on a recent TeamSpeak session, he was quiet for awhile. I thought that maybe my moral dilemma had shocked him into silence. Then he said, 'You know what the difference is between you and me? I know I'm a mercenary. You thought you were an artist. We're both guys who type for money.'"

      • 15,597 tok/s. Holy shit. And here's how they did it. Are there any physical or theoretical limits that would stop someone from doing the same for, say, GPT-5.3-Codex in a few years?

      • The Only Moat Left Is Money: "The effort is gone. Effort was the filter. I launched something last week. 14 people signed up -- no ads, just a couple of posts. 14 real people who didn't have to. That number is tiny and it felt like something. Then I sat down to think about what it would take to grow it and I couldn't look at that math for very long. The people winning mostly had a head start. Or they have money. Usually both. When creation was hard, skill was the differentiator: you had to actually be good to make something worth showing. Now the barrier is near zero, so you need reach. Reach costs money or it costs years. Probably both." I'm not sure I believe that effort doesn't count anymore, but the game is changing, which is fascinating and scary and exciting and crazy.

      • Child's Play, subtitled: tech's new generation and the end of thinking. Excellent.

      • The Software Development Lifecycle Is Dead. Not too sure about the specifics, but you know me: I agree.

      • "I built an agent for researching, coding, and running generative art animations for 16-segment displays. Will open source code and hardware design files soon."

      • Andy Coenen, who built the wonderful isometric nyc, on The Software Industrial Revolution. It's very, very good. To pick just one of the parts worth picking: "The old golden age is over, and it ain't coming back - no more 'rest and vest', no more ping-pong offsites and five-star catered lunches. But a new 'golden age' is coming - no more nights staring red-eyed at empty stack overflow issues, no more weeks of alignment meetings to ship a prototype. I believe it's never been a better time to build - not just software but anything you can dream of. The world is yours if you embrace this new reality and learn how to really use these tools". The other part worth mentioning is the one about "personal apps": sure, yes, grandma won't use AI to write her own sudoku app, but, as Andy suggests here, there are so many other people -- professionals! -- who sure would love to build better research tools for themselves.

      • I'm pretty sure this just changed how I think about intelligence: why aren't smart people happier? (That little "what if you booted up an AI in ancient Greece?" thought experiment is fun too.)

      • "Two old engineers were talking of their lives and boasting of their greatest projects. One of the engineers explained how he had designed the largest bridge ever made."

      • Robin Sloan on how far AI can expand: flood fill vs. the magic circle. Interesting to think about, but I can't help but wonder: does it matter that AI can't touch the physical world, when your career is 99% digital and you're looking at a screen a lot?

      • Jason Fried was on the David Senra podcast. What a perspective this guy has. Inspiring.

      • I've seen many, many, many stand-up specials over the years, because I enjoy stand-up comedy a lot and very earnestly believe it's one of the highest art forms we humans have created. Yes, I'm serious. I'm German. If there's one thing I don't joke around about it's comedy. But a stand-up special that makes me actually laugh out loud is a rare one. Kevin Nealon's latest special Loose in the Crotch did that. I nearly spit out food. God damn did I fall in love with that special. I've watched it twice since Tuesday. I know it's not everyone's cup of tea and if you don't like it you should keep that to yourself. But let me know if you do.

      • "This Fab Faux recording of most of side two of Abbey Road is a live, in the studio performance for a two camera video shoot. In the end, there were only three minor guitar fixes and each section was recorded in no more than three takes (most were two). There are NO added overdubs within this performance. The audio is pure." Uploaded fourteen years ago. I think I started watching this video in 2010, when it was uploaded to Vimeo. Treat yourself.


    6. 🔗 r/LocalLLaMA PSA: The software “Shade” is a fraudulent, plagiarized copy of Heretic rss

      Three days ago, the following repository was published, which its “creator” has been aggressively promoting on various channels since then:

      https://github.com/assemsabry/shade

      The entire source code in the repository is plagiarized from Heretic (https://github.com/p-e-w/heretic), with only the project name and the copyright notice replaced, claiming “original authorship” of everything. The repository does not acknowledge Heretic as its source, and has erased the commit history and the names of all Heretic contributors.

      I and several others have called the repository owner out, but he has deleted all issues and tried to cover up his wrongdoing by adding some bogus “additional features” using an AI agent. A quick look at the source files, however, reveals that they are still 95% identical to Heretic’s code. In some cases, only the copyright notice was replaced.

      I can only assume that the ultimate goal is to push malware of some sort, and strongly advise people to stay clear of this plagiarized repository.

      This is one of several incidents where malicious actors tried to profit from Heretic’s surging popularity during the past days, when it reached #1 on the GitHub trending chart and was posted in various social feeds that cater to scammers.

      Please also see https://github.com/p-e-w/heretic/issues/167

      I’m doing everything in my power to keep Heretic clean and available to everyone. Thank you for your encouragement in the past few months, it means the world to me!

      submitted by /u/-p-e-w-
      [link] [comments]

    7. 🔗 r/york Roman Bath at York (under pub) rss
    8. 🔗 r/wiesbaden Strike on Tuesday 24.02.2026 rss
    9. 🔗 r/reverseengineering processhacker mcp (a dynamic MCP server for runtime analysis and process hacking; like processhacker, but for AI agents) rss
    10. 🔗 r/Leeds Any decent smaller comedy clubs? rss

      Hello everyone so I like City Varieties and The Cardigan Arms pub for comedy in Leeds. Can anyone recommend a decent alternative that isn't too rowdy with decent seats?

      City Varieties is a beautiful venue and has the bigger comedians but I do like watching local comics too. I don't drive so has to be City centre, Kirkstall, Horsforth, Headingley kinda areas.

      Any suggestions would be super for me & my friend 👌 cheers! X

      submitted by /u/MasterMembership4506
      [link] [comments]

    11. 🔗 r/Yorkshire What's your opinion on Wakefield and the 5 towns? rss

      An area of Yorkshire rarely mentioned in discussions, thought i'd shine a light on it!

      submitted by /u/aaaaaaa_aaaaaa_aaa
      [link] [comments]

    12. 🔗 r/Yorkshire Thinking of visiting the Dales towards the end of summer. rss

      Not entirely sure where the best place to stay would be, I'm thinking Skipton/Settle/Ingleton as I am relying on trains from Manchester, want to do some hiking/photography

      Are any of them considered cheaper to stay at than others?

      Never been in this area before, so know nothing about the place other than anyone who's ever been grabs you and slaps you about until you say you'll visit :D

      submitted by /u/zibafu
      [link] [comments]

    13. 🔗 r/wiesbaden Gratis Rollenspiel Tag 2026: 28 March, Phantasos Arena Wiesbaden, 15 game rounds rss

      Registration: Phantasos Arena Discord server: https://discord.gg/2A6vjQ58yg

      Daggerheart, DnD, Shadowrun, Cyberpunk RED, Mausritter, Alien, Mothership, Outgunned and many more

      submitted by /u/Bitter-Secretary6006
      [link] [comments]

    14. 🔗 r/LocalLLaMA they have Karpathy, we are doomed ;) rss

      they have Karpathy, we are doomed ;) (added second image for the context)

      submitted by /u/jacek2023
      [link] [comments]

    15. 🔗 HexRaysSA/plugin-repository commits feat: cleanup and new repo rss
      feat: cleanup and new repo
      
    16. 🔗 r/york York Minster: Historic Graffiti rss

      submitted by /u/Julija82
      [link] [comments]

    17. 🔗 r/Yorkshire Yorkshire Water rss

      Just got a bill with an 8% price increase. Bastards.

      Edit - looking more closely at the bill, the actual increase is much higher (around 30%) - it states ‘an average of 8% increase’ on the bill, despite their press releases saying 5.6%. Don’t know how they get away with such lack of clarity!

      submitted by /u/Exciting_Stretch_847
      [link] [comments]

    18. 🔗 r/reverseengineering [Technical] Implementing a CRT-Free Sovereign PE: Custom IAT reconstruction and Hybrid EXE/DLL loading without MSVCRT rss
    19. 🔗 matklad Wrapping Code Comments rss

      Wrapping Code Comments

      Feb 21, 2026

      I was today years old when I realized that:

      • Code and code comments ideally should be wrapped to a different column.
      • For comments, the width should be relative to the start of the comment.

      It’s a good idea to limit line length to about 100 columns. This is a physical limit, the width at which you can still comfortably fit two editors side by side (see Size Matters). Note an apparent contradiction: the optimal width for readable prose is usually taken to be narrower, 60–70 columns. The contradiction is resolved by noticing that, for code, indentation eats into usable space. Typically, code is much less typographically dense than prose.

      Still, I find comment blocks easier to read when they are wrapped narrower than the surrounding code. I want lines to be wrapped at 100, and the content of comments to be wrapped at 70 (unless that pushes the overall line to be longer than 100). That is, I want a layout like this (using 20/30 rulers instead of 70/100, for illustrative purposes):

      // Top level comments
      // can be this wide.
      const S = struct {
          // Nested comments are
          // also this wide, but
          // are shifted right.
          fn f() void {
              switch (value) {
                  0 => {
                      // But there is
                      // a hard limit.
                  }
              }
          }
      }
      

      This feels obvious in retrospect, but notably isn’t well supported by the tools. The VS Code extension I use allows configuring a dedicated fill column for comments, but doesn’t make it relative, so indented comment blocks are always narrower than top-level ones. Emacs M-q also doesn’t do relative wrapping out of the box!
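
      One way to pin the rule down, as a sketch rather than anything the tools above implement: let \(c\) be the column where the comment text starts, \(W_c\) the preferred comment width, and \(W_\ell\) the hard line limit (all three symbols are introduced here for illustration). The usable width for comment text is the preferred width, clipped by whatever room is left on the line:

      ```latex
      \[
        w(c) = \min\bigl(W_c,\; W_\ell - c\bigr)
      \]
      ```

      With \(W_c = 70\) and \(W_\ell = 100\), this gives \(w(0) = 70\) and \(w(20) = 70\). The hard limit only starts to bite past column 30, for example \(w(40) = 60\).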


      Aside on hard-wrapping: should we bother with wrapping comments at all? Can’t we rely on our editor to implement soft-wrapping? The problem with soft-wrapping is that you can’t soft-wrap text correctly without understanding its meaning. Consider a markdown list:

      A list:
        * item one,
        * item two.
      

      If the first item is long enough to necessitate wrapping, the wrapped line should also be indented, which requires parsing the text as markdown first:

      A list:
        * item one which is long enough to
          necessitate wrapping,
        * item two.
      
    20. 🔗 Stephen Diehl Optimal Caverna Gameplay via Formal Methods rss

      Optimal Caverna Gameplay via Formal Methods

      I always win at Caverna (Uwe Rosenberg's classic European worker placement tabletop board game). Always. But "always" just means "every time so far," and I needed something with more mathematical permanence. So I formalized the entire game in Lean 4 and proved that my strategy is the unique weakly dominant pure strategy across every possible game configuration. My friends think this is excessive. My friends also lose at Caverna. Unrelated, I don't get invited to board game night much anymore.

      Caverna: The Cave Farmers is the 2013 sequel to Agricola, a game about feeding dwarfs who live in caves and do a suspicious amount of farming. You place workers, gather resources, breed animals, excavate caverns, furnish rooms, and at the end of 12 rounds the scoring formula totals up everything you've accomplished and everything you failed to accomplish. It's a good game. It's also, in the 2-player variant, a finite deterministic perfect-information system with discrete phases, which means it's a labeled transition system, which means it's amenable to formal verification. So I did that.

      The project is about 3,000 lines of Lean 4 spread across 19 modules: 11 definition files modeling the complete game (all 24 action spaces, all 48 unique furnishing tiles, the expedition loot system, board geometry, harvest schedule, scoring formula) and 8 theorem files containing 176 machine-checked proofs. The model covers all 2,880 possible 2-player game setups (144 card orderings times 20 harvest marker placements). The main result is that furnishing rush is the weakly dominant strategy. It is the optimal response to every opponent, in every setup, regardless of which cards come out when or where the harvest markers land.

      The interactive proof blueprint has the full derivation with a dependency graph showing how every theorem connects. The source code compiles.

      Labeled Transition Systems

      A labeled transition system (LTS) is a triple \((S, A, T)\) where \(S\) is a set of states, \(A\) is a set of actions, and \(T \subseteq S \times A \times S\) is a transition relation specifying which state changes are legal. You start in some initial state satisfying an init predicate. The system evolves by taking actions: if \((s, a, s') \in T\), you can move from state \(s\) to state \(s'\) by performing action \(a\). A state is reachable if there's a finite chain of transitions from an initial state to it. A property is an invariant if it holds on every reachable state.

      In Lean 4, the LTS is a structure with two fields, init and trans, parameterized by the state and action types:

      structure LTS (State : Type) (Action : Type) where
        init : State -> Prop
        trans : State -> Action -> State -> Prop

      Reachability is an inductive type with two constructors: initial states are reachable, and if you can reach \(s\) and take action \(a\) to get to \(s'\), then \(s'\) is reachable:

      inductive Reachable (sys : LTS State Action) : State -> Prop where
        | init : forall s, sys.init s -> Reachable sys s
        | step : forall s a s', Reachable sys s -> sys.trans s a s' -> Reachable sys s'

      In plain terms: the LTS is the rulebook encoded as a state transition relation. init says what the starting board looks like, and trans encodes every legal move as a state-to-state constraint. The entire Caverna rulebook fits in about 250 lines of Lean: a 160-line applyPlacement function matching on all 24 action spaces with their effects, and a 90-line cavernaLTS definition wiring up phase transitions, harvest logic, and round progression. It's a relation rather than a function because Caverna has sub-choices within actions (sow grain vs. sow vegetable, which furnishing tile to install, which loot to take), so a single action space placement can lead to multiple successor states. The relation captures all of them.

      Once you have init and trans, you prove a property holds on all reachable states without enumerating them. You prove the base case (holds on initial states) and the inductive step (every valid transition preserves it). This is strictly stronger than testing. Testing checks specific play sequences. An invariant proof covers every sequence of legal moves across every setup, including sequences no human would ever play.
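
      The base-case-plus-inductive-step argument can be stated once, generically, and then reused. The sketch below restates LTS and Reachable with explicit type parameters, proves the induction principle, and applies it to a toy counter. The names invariant_of_inductive, Tick, and evenCounter are made up for this illustration and are not taken from the Caverna project:

      ```lean
      -- Definitions restated from the text, with the type parameters made explicit.
      structure LTS (State : Type) (Action : Type) where
        init : State -> Prop
        trans : State -> Action -> State -> Prop

      inductive Reachable {State Action : Type} (sys : LTS State Action) : State -> Prop where
        | init : forall s, sys.init s -> Reachable sys s
        | step : forall s a s', Reachable sys s -> sys.trans s a s' -> Reachable sys s'

      -- Generic principle: a base case plus an inductive step cover every reachable state.
      theorem invariant_of_inductive {State Action : Type} (sys : LTS State Action)
          (P : State -> Prop)
          (hInit : forall s, sys.init s -> P s)
          (hStep : forall s a s', P s -> sys.trans s a s' -> P s') :
          forall s, Reachable sys s -> P s := by
        intro s hr
        induction hr with
        | init s hs => exact hInit s hs
        | step s a s' _ ht ih => exact hStep s a s' ih ht

      -- Hypothetical toy system: a counter that starts at 0 and steps by 2.
      inductive Tick where
        | bump

      def evenCounter : LTS Nat Tick where
        init := fun n => n = 0
        trans := fun n _ n' => n' = n + 2

      -- Invariant: every reachable counter value is even, with no enumeration of states.
      theorem evenCounter_even : forall n, Reachable evenCounter n -> n % 2 = 0 := by
        refine invariant_of_inductive evenCounter (fun n => n % 2 = 0) ?_ ?_
        · intro s hs
          have h0 : s = 0 := hs
          show s % 2 = 0
          omega
        · intro s _ s' hp ht
          have hp' : s % 2 = 0 := hp
          have ht' : s' = s + 2 := ht
          show s' % 2 = 0
          omega
      ```

      The counter has infinitely many reachable states, so testing could never cover them all, yet the proof needs only the two cases.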

      Here's a small fragment of the Caverna LTS to give the flavor of how a round of play works as a state machine:

      Fragment of the game state machine for one round of play

      Each state records the player's resources and remaining dwarfs to place. Transitions correspond to action space selections. With 13 initial action spaces and 2 dwarfs, even a single round produces \(13 \times 12 = 156\) placement sequences. Over 12 rounds with family growth, the branching factor is astronomical, but the LTS doesn't care. The structure is finite, and every path through it is covered by the invariant proofs.

      The Game as a Transition Relation

      Board games are natural LTS candidates. In Caverna 2-player, the states are the complete game configurations (round number, phase, both players' inventories, board layouts, available action spaces), the actions are dwarf placements and harvest events, and the transition relation encodes the rulebook.

      The game has five phases that cycle within each round:

      inductive Phase where
        | placeP1      -- player 1 places a dwarf
        | placeP2      -- player 2 places a dwarf
        | harvest      -- harvest phase (feeding, breeding, fields)
        | roundEnd     -- round cleanup, advance to next round
        | gameOver     -- game has ended

      The full game state tracks everything both players could possibly have, plus the global state of the board:

      structure GState where
        round : Nat
        phase : Phase
        p1 : FullPlayer
        p2 : FullPlayer
        p1IsFirst : Bool
        placementsLeft : Nat
        acc : AccState
        occupiedSpaces : List ActionSpaceId := []
        harvestSchedule : Nat -> HarvestEvent
        wishIsUrgent : Bool := false

      The transition relation is a single function with a match on (gs.phase, act). Each case is a game rule. Here's the core of the placement logic for Player 1:

      def cavernaLTS (schedule : Nat -> HarvestEvent) :
          TransitionSystem.LTS GState GameAction where
        init := fun gs => gs = initFullGState schedule
        trans := fun gs act gs' =>
          match gs.phase, act with
          | .placeP1, .place space choice =>
            spaceAvailable gs.round space = true /\
            spaceUnoccupied gs space = true /\
            gs.placementsLeft > 0 /\
            (let (p1', acc') := applyPlacement gs.p1 gs space choice
             let newPlacements := gs.placementsLeft - 1
             let newOccupied := space :: gs.occupiedSpaces
             if newPlacements == 0 then
               gs' = { gs with p1 := p1', acc := acc',
                               phase := .harvest, placementsLeft := 0,
                               occupiedSpaces := newOccupied }
             else
               gs' = { gs with p1 := p1', acc := acc',
                               phase := .placeP2,
                               placementsLeft := newPlacements,
                               occupiedSpaces := newOccupied })
          -- ... Player 2, harvest, round end, game over ...
          | _, _ => False

      That last line is beautiful. | _, _ => False says: any action not explicitly listed is illegal. The transition relation is closed. No undefined behavior, no edge cases, no "the rules don't say I can't." If it's not in the match, it doesn't happen.
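
      The effect of the catch-all is easy to see on a toy relation. The following sketch uses a hypothetical two-state traffic light (Light, Cmd, and lightTrans are illustrative names, not part of the Caverna model) with the same closing pattern:

      ```lean
      -- Hypothetical toy types, not the Caverna model.
      inductive Light where
        | red | green

      inductive Cmd where
        | go | stop

      -- Only two moves are legal; the catch-all makes everything else impossible.
      def lightTrans : Light -> Cmd -> Light -> Prop
        | .red,   .go,   s' => s' = .green
        | .green, .stop, s' => s' = .red
        | _,      _,     _  => False

      -- A listed move is provable ...
      example : lightTrans .red .go .green := rfl

      -- ... and an unlisted one is refutable, because its proposition unfolds to False.
      example : ¬ lightTrans .red .stop .green := fun h => h
      ```

      Because the unlisted case reduces to False, any hypothetical proof that the move happened is already a contradiction.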

      The applyPlacement function is a 160-line match on all 24 action spaces, each encoding the exact effect from the rulebook. Blacksmithing forges a weapon from ore and runs an expedition. Wish for Children can only grow your family if you have a dwelling with capacity. Excavation gives you stone and lets you carve out cavern/tunnel pairs. Every sub-choice (sow grain vs. sow vegetable, build small pasture vs. large pasture, which furnishing tile to install) is a branch in the ActionChoice type.

      The Game Timeline

      Game timeline for 2-player Caverna

      Twelve rounds. Green nodes are harvest rounds where dwarfs must be fed. Two critical milestones: "Wish for Children" at round 4 enables the first family growth (from 2 dwarfs to 3), and "Family Life" at round 8 enables the second (3 to 5, eventually). One new action space reveals each round, growing from 13 to 24 available choices. The interaction between when cards flip and when harvests hit is the clock that drives the entire strategic analysis.

      The timeline matters because of this: without growth you get 44 total dwarf placements across all 12 rounds. With one growth at round 4, you get 47. With both growths, you get 56. That's a 27% increase in total actions from growing your family as fast as possible, and since actions are the binding constraint on everything else (scoring, food, resources), the 12-placement gap between "no growth" and "both growths" is the single most important strategic lever in the game.

      Refinement Types: Making Illegal States Unrepresentable

      One of the most satisfying patterns in the formalization is using dependent types to make illegal game states impossible to construct. Weapons are the clearest example. In the physical game, weapon strength ranges from 1 to 14. You could model this as a bare Nat and hope nobody passes in 0 or 15. Or you could make the type system enforce the constraint:

      structure Weapon where
        strength : Nat
        h_min : strength >= 1
        h_max : strength <= 14

      Every Weapon value carries proof that its strength is in range. This means forgeWeapon must produce evidence that the forged strength is valid, and upgradeWeapon must show that incrementing stays within bounds:

      def forgeWeapon (oreSpent : Nat) (h_pos : oreSpent >= 1) : Option Weapon :=
        if h : oreSpent <= maxInitialWeaponStrength then
          some { strength := oreSpent
               , h_min := h_pos
               , h_max := by simp [maxInitialWeaponStrength] at h; omega }
        else
          none
      
      def upgradeWeapon (w : Weapon) : Weapon :=
        if h : w.strength < maxWeaponStrength then
          { strength := w.strength + 1
          , h_min := by omega
          , h_max := by simp [maxWeaponStrength] at h; omega }
        else
          w

      The by omega calls are Lean's linear arithmetic tactic closing the proof obligations automatically. If w.strength < 14, then w.strength + 1 <= 14. If oreSpent >= 1, then strength >= 1. The type checker verifies this at compile time. No weapon in the entire formalization can ever have strength 0 or 15.
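
      A short usage sketch shows what the carried proofs buy you. The structure is restated so the block stands alone. The value maxForged and the two theorem names are invented for this example:

      ```lean
      structure Weapon where
        strength : Nat
        h_min : strength >= 1
        h_max : strength <= 14

      -- A legal weapon: both bounds are closed by `decide`.
      def maxForged : Weapon :=
        { strength := 8, h_min := by decide, h_max := by decide }

      -- The bounds travel with every value, so they can be recovered for any weapon.
      theorem Weapon.strength_ne_zero (w : Weapon) : w.strength ≠ 0 := by
        have h := w.h_min
        omega

      theorem Weapon.strength_lt_15 (w : Weapon) : w.strength < 15 := by
        have h := w.h_max
        omega

      -- By contrast, `{ strength := 15, h_min := by decide, h_max := by decide }`
      -- is rejected: `15 <= 14` is false, so `decide` cannot produce the proof.
      ```

      Downstream code never needs a runtime range check, because a function that receives a Weapon also receives the evidence.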

      The same pattern shows up in RoundPlacements, which carries proof that all four dwarf placements within a round go to distinct action spaces:

      structure RoundPlacements where
        firstPlayer1  : ActionSpaceId
        secondPlayer1 : ActionSpaceId
        firstPlayer2  : ActionSpaceId
        secondPlayer2 : ActionSpaceId
        h_distinct12 : firstPlayer1 != secondPlayer1
        h_distinct13 : firstPlayer1 != firstPlayer2
        h_distinct14 : firstPlayer1 != secondPlayer2
        h_distinct23 : secondPlayer1 != firstPlayer2
        h_distinct24 : secondPlayer1 != secondPlayer2
        h_distinct34 : firstPlayer2 != secondPlayer2

      Six distinctness proofs, one for each pair. You literally cannot construct a RoundPlacements where two dwarfs share an action space. The game rule is baked into the type.
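
      To see the constraint at work, here is a sketch that builds one legal round. ActionSpaceId is cut down to a hypothetical four-element subset (the real model has 24 action spaces), and sampleRound is an invented value:

      ```lean
      -- Hypothetical four-element subset of the action spaces, enough for one round.
      inductive ActionSpaceId where
        | blacksmithing | excavation | housework | startingPlayer
        deriving DecidableEq

      structure RoundPlacements where
        firstPlayer1  : ActionSpaceId
        secondPlayer1 : ActionSpaceId
        firstPlayer2  : ActionSpaceId
        secondPlayer2 : ActionSpaceId
        h_distinct12 : firstPlayer1 != secondPlayer1
        h_distinct13 : firstPlayer1 != firstPlayer2
        h_distinct14 : firstPlayer1 != secondPlayer2
        h_distinct23 : secondPlayer1 != firstPlayer2
        h_distinct24 : secondPlayer1 != secondPlayer2
        h_distinct34 : firstPlayer2 != secondPlayer2

      -- Four different spaces: every distinctness obligation is decidable and true.
      def sampleRound : RoundPlacements where
        firstPlayer1  := .excavation
        secondPlayer1 := .housework
        firstPlayer2  := .blacksmithing
        secondPlayer2 := .startingPlayer
        h_distinct12 := by decide
        h_distinct13 := by decide
        h_distinct14 := by decide
        h_distinct23 := by decide
        h_distinct24 := by decide
        h_distinct34 := by decide

      -- Setting secondPlayer1 := .excavation instead would leave h_distinct12
      -- unprovable, so the definition would not compile.
      ```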

      The Weapon System

      Weapon strength growth path

      Weapons are forged at strength 1 through 8 (costing that many ore), then grow by +1 per expedition, capping at 14. The blue range is forgeable; the peach range requires expedition grinding. Cattle loot unlocks at strength 9, which means even if you forge at max (8 ore), you still need at least one expedition before you can get cattle.

      The loot table is a function on an inductive type of loot items, giving the minimum strength required for each one:

      def LootItem.minStrength : LootItem -> Nat
        | .allWeaponsPlus1    => 1
        | .dog                => 1
        | .wood               => 1
        | .grain              => 2
        | .sheep              => 2
        | .stone              => 3
        | .donkey             => 3
        | .ore                => 4
        | .wildBoar           => 4
        | .stableFree         => 5
        | .gold2              => 6
        | .furnishCavern      => 7
        | .buildFencesCheap   => 8
        | .cattle             => 9
        | .dwelling           => 10
        | .sow                => 11
        | .breedTwoTypes      => 12
        | .furnishCavernAgain => 14

      At strength 1 you can loot a dog or a stick. At strength 14 you can furnish a second cavern for free. The loot count at key strengths: 3 at strength 1, 13 at strength 8, 18 at strength 14. That's 5 premium items (cattle, dwelling, sow, breed, second furnish) locked behind the expedition grind. Whether the ore investment is worth it is the core question of the weapon rush archetype, and the answer turns out to be: no, not quite.
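
      The three loot counts can be checked directly against the table. This sketch restates the table and adds a hand-written list of all constructors. LootItem.all and lootCount are illustrative helpers assumed here, not names from the project:

      ```lean
      inductive LootItem where
        | allWeaponsPlus1 | dog | wood | grain | sheep | stone | donkey | ore | wildBoar
        | stableFree | gold2 | furnishCavern | buildFencesCheap | cattle | dwelling
        | sow | breedTwoTypes | furnishCavernAgain

      def LootItem.minStrength : LootItem -> Nat
        | .allWeaponsPlus1    => 1
        | .dog                => 1
        | .wood               => 1
        | .grain              => 2
        | .sheep              => 2
        | .stone              => 3
        | .donkey             => 3
        | .ore                => 4
        | .wildBoar           => 4
        | .stableFree         => 5
        | .gold2              => 6
        | .furnishCavern      => 7
        | .buildFencesCheap   => 8
        | .cattle             => 9
        | .dwelling           => 10
        | .sow                => 11
        | .breedTwoTypes      => 12
        | .furnishCavernAgain => 14

      -- Hand-written enumeration of all 18 constructors (an assumption of this sketch).
      def LootItem.all : List LootItem :=
        [.allWeaponsPlus1, .dog, .wood, .grain, .sheep, .stone, .donkey, .ore, .wildBoar,
         .stableFree, .gold2, .furnishCavern, .buildFencesCheap, .cattle, .dwelling,
         .sow, .breedTwoTypes, .furnishCavernAgain]

      -- Number of loot items unlocked at a given weapon strength.
      def lootCount (strength : Nat) : Nat :=
        (LootItem.all.filter (fun item => decide (item.minStrength <= strength))).length

      example : lootCount 1 = 3 := by decide
      example : lootCount 8 = 13 := by decide
      example : lootCount 14 = 18 := by decide
      ```

      The gap between the last two counts, 18 minus 13, is the five premium items that no freshly forged weapon can reach.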

      The Universal Food Crisis

      Every strategy in Caverna must solve the same problem before anything else. Both players face a food deficit at the first harvest:

      Food conversion network

      Player 1 starts with 1 food and needs 4 (2 dwarfs times 2 food each). Player 2 starts with 2 food and needs 4. The gaps are 3 and 2 respectively. This is proven as universal_food_crisis:

      theorem universal_food_crisis :
          feedingCost 2 0 - startingFoodP1 = 3 /\
          feedingCost 2 0 - startingFoodP2 = 2 := by decide

      The implication (food_crisis_shapes_all_strategies) is that every viable archetype must spend its first few actions on food acquisition. There's no "skip feeding and go straight to scoring" option. The begging marker penalty is \(-3\) points each, and the theorem absolute_floor_is_neg55 shows that a player who takes zero actions across all rounds scores \(-55\). The food crisis isn't optional; it's structural.

      The food conversion network itself is a delightful mess of exchange rates. Cattle gives 4 food per animal, wild boar gives 3, sheep gives 2, grain gives 1, vegetables give 2, gold converts lossily at \(n-1\) (1 gold is wasted as overhead), and rubies are emergency food at 2+ each. And then there are donkeys, which have a superlinear pairing bonus that I spent an embarrassing amount of time formalizing:

      def donkeyFoodValue (n : Nat) : Nat :=
        let pairs := n / 2
        let remainder := n % 2
        pairs * 3 + remainder * 1
      
      theorem donkey_superlinear :
          donkeyFoodValue 2 > donkeyFoodValue 1 + donkeyFoodValue 1 := by decide

      Two donkeys together yield 3 food, but individually they'd give 1 + 1 = 2. The whole is greater than the sum of its parts. Uwe Rosenberg almost certainly didn't think about this as a super-additivity property of a set function, but that's what it is, and Lean can prove it.

      The Feeding Cascade

      The feeding function is the core survival mechanic. When harvest hits, each dwarf eats 2 food (offspring eat 1). If you don't have enough food, the deficit cascades through your resources: try food first, then convert grain (1 food each), then convert vegetables (2 food each), then take begging markers for anything remaining.

      def FullPlayer.feed (p : FullPlayer) : FullPlayer :=
        let cost := p.dwarfs * 2 + p.offspring * 1
        if p.food >= cost then
          { p with food := p.food - cost }
        else
          let deficit := cost - p.food
          let p' := { p with food := 0 }
          let grainUsed := min p'.grain deficit
          let p'' := { p' with grain := p'.grain - grainUsed }
          let deficit' := deficit - grainUsed
          let vegUsed := min p''.vegetables (deficit' / 2 + deficit' % 2)
          let vegFood := min (vegUsed * 2) deficit'
          let p''' := { p'' with vegetables := p''.vegetables - vegUsed }
          let deficit'' := deficit' - vegFood
          { p''' with beggingMarkers := p'''.beggingMarkers + deficit'' }
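
      A few concrete pantries make the cascade easier to follow. The sketch below copies the function onto a trimmed-down MiniPlayer, a hypothetical stand-in holding only the fields that feed touches. The sample values hungry and veggie are likewise invented:

      ```lean
      -- Hypothetical stand-in for FullPlayer with only the fields `feed` reads or writes.
      structure MiniPlayer where
        dwarfs : Nat
        offspring : Nat := 0
        food : Nat := 0
        grain : Nat := 0
        vegetables : Nat := 0
        beggingMarkers : Nat := 0

      -- Same cascade as in the text: food, then grain, then vegetables, then begging.
      def MiniPlayer.feed (p : MiniPlayer) : MiniPlayer :=
        let cost := p.dwarfs * 2 + p.offspring * 1
        if p.food >= cost then
          { p with food := p.food - cost }
        else
          let deficit := cost - p.food
          let p' := { p with food := 0 }
          let grainUsed := min p'.grain deficit
          let p'' := { p' with grain := p'.grain - grainUsed }
          let deficit' := deficit - grainUsed
          let vegUsed := min p''.vegetables (deficit' / 2 + deficit' % 2)
          let vegFood := min (vegUsed * 2) deficit'
          let p''' := { p'' with vegetables := p''.vegetables - vegUsed }
          let deficit'' := deficit' - vegFood
          { p''' with beggingMarkers := p'''.beggingMarkers + deficit'' }

      -- Player 1's opening position with nothing added: cost 4, food 1, so 3 begging markers.
      def hungry : MiniPlayer := { dwarfs := 2, food := 1 }
      example : hungry.feed.beggingMarkers = 3 := by decide

      -- Two vegetables cover a deficit of 3: the second one is spent on a single missing food.
      def veggie : MiniPlayer := { dwarfs := 2, food := 1, vegetables := 2 }
      example : veggie.feed.vegetables = 0 := by decide
      example : veggie.feed.beggingMarkers = 0 := by decide
      ```

      The second case shows the rounding in vegUsed: an odd deficit consumes a whole vegetable for its last food, and the surplus is not refunded.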

      And a normal harvest is the composition of three phases in one elegant pipeline:

      def FullPlayer.normalHarvest (p : FullPlayer) : FullPlayer :=
        p.fieldPhase.feed.breedingPhase

      Field phase harvests your sown crops. Feed pays the food cost (or generates begging markers). Breeding adds one animal per type that has at least two. The order matters: fields produce food before feeding, and breeding happens after, so newborn animals don't need to be fed in the same round they appear. This ordering rule from page 6 of the Caverna rulebook is encoded in the function composition.

      Furnishing Tiles and the BonusContext

      There are 48 furnishing tiles in the game, each with resource costs, base victory points, and (for some) end-game bonus scoring formulas that depend on your final board state. The BonusContext struct captures everything a furnishing tile might look at when computing its bonus:

      structure BonusContext where
        stoneInSupply        : Nat := 0
        oreInSupply          : Nat := 0
        sheepCount           : Nat := 0
        cattleCount          : Nat := 0
        numAdjacentDwellings : Nat := 0
        numArmedDwarfs       : Nat := 0
        numDwarfs            : Nat := 0
        rubyCount            : Nat := 0
        grainCount           : Nat := 0
        vegCount             : Nat := 0
        farmAnimalCount      : Nat := 0
        hasAnyWeapon         : Bool := false
        allDwarfsArmed       : Bool := false
        numYellowTagTiles    : Nat := 0

      The bonus point function is a match on tile ID that implements every scoring formula from the Caverna appendix:

      def furnishingBonusPoints (fid : FurnishingId) (ctx : BonusContext) : Nat :=
        match fid with
        | .stoneStorage   => ctx.stoneInSupply
        | .oreStorage     => ctx.oreInSupply / 2
        | .weavingParlor  => ctx.sheepCount / 2
        | .milkingParlor  => ctx.cattleCount
        | .stateParlor    => ctx.numAdjacentDwellings * 4
        | .mainStorage    => ctx.numYellowTagTiles * 2
        | .weaponStorage  => ctx.numArmedDwarfs * 3
        | .suppliesStorage => if ctx.allDwarfsArmed then 8 else 0
        | .broomChamber   => if ctx.numDwarfs >= 6 then 10
                              else if ctx.numDwarfs >= 5 then 5
                              else 0
        | .prayerChamber  => if ctx.hasAnyWeapon then 0 else 8
        | _ => 0

      The Prayer Chamber is particularly nasty. It gives you 8 free points, but only if none of your dwarfs have weapons. The moment you forge a single weapon, it drops to zero. This creates a genuine strategic dilemma, formalized and proved:

      theorem prayer_chamber_vs_weapons :
          furnishingBonusPoints .prayerChamber {} = 8 /\
          furnishingBonusPoints .prayerChamber { hasAnyWeapon := true } = 0 :=
        by constructor <;> native_decide

      The furnishing rush archetype gets its power from stacking compatible bonus tiles: Office Room overhangs, State Parlor for +4 per adjacent dwelling (up to +16), Broom Chamber for +5 or +10 based on dwarf count, Prayer Chamber for +8 (no weapons). These bonuses add up. A player with 5 dwarfs, 4 adjacent dwellings, and no weapons can pull +29 bonus points from three tiles (16 + 5 + 8), and +34 with a sixth dwarf. That's why furnishing rush has the highest ceiling.
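      As a quick sanity check on that stacking arithmetic, here is a minimal standalone sketch. The three helper functions are hypothetical, trimmed-down copies of the corresponding branches of furnishingBonusPoints (they are not the post's actual definitions), and threeTileBonus is a name introduced only for this illustration.

      ```lean
      -- Hypothetical, trimmed-down copies of three bonus formulas from
      -- furnishingBonusPoints, so that this block stands alone.
      def stateParlorBonus (adjacentDwellings : Nat) : Nat := adjacentDwellings * 4

      def broomChamberBonus (dwarfs : Nat) : Nat :=
        if dwarfs >= 6 then 10 else if dwarfs >= 5 then 5 else 0

      def prayerChamberBonus (hasAnyWeapon : Bool) : Nat :=
        if hasAnyWeapon then 0 else 8

      -- Illustrative helper: total bonus from the three tiles.
      def threeTileBonus (dwellings dwarfs : Nat) (armed : Bool) : Nat :=
        stateParlorBonus dwellings + broomChamberBonus dwarfs + prayerChamberBonus armed

      -- 5 dwarfs, 4 adjacent dwellings, no weapons: 16 + 5 + 8
      example : threeTileBonus 4 5 false = 29 := by decide
      -- a sixth dwarf upgrades Broom Chamber from +5 to +10
      example : threeTileBonus 4 6 false = 34 := by decide
      -- forging a single weapon forfeits the Prayer Chamber's 8 points
      example : threeTileBonus 4 6 true = 26 := by decide
      ```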

      The Strategy Space

      With the LTS, scoring function, and food economy in place, the next question is: what are the high-level plans a player can actually follow? The transition relation defines roughly \(10^{30}\) possible play sequences, but the overwhelming majority of them are nonsense. Consider a player who forges a weapon in round 1 (spending precious ore), then never goes on an expedition, sows a field in round 3 but never harvests grain from it, builds a pasture in round 5 but never acquires an animal, and spends round 8 taking Starting Player for no reason. This player ends the game with a weapon they never used, an empty pasture, a field of rotting grain, and a final score somewhere around 20 points after begging penalties. The LTS covers this path. The invariant proofs hold over it. It is a legal play sequence, and it is also exactly the kind of play sequence that a five-year-old would produce by picking action spaces at random.

      The vast majority of the state space looks like this: diffuse, uncommitted paths where the player dabbles in everything and commits to nothing, hemorrhaging tempo and food while accumulating resources that never convert to points. Only a handful of coherent "channels" through the action space graph lead to competitive scores against a rational opponent. A strategy archetype is a consistent pattern of action space usage across the full 12-round game. I identified eight by examining which action spaces and furnishing tiles naturally cluster together: if two actions compete for the same resources or unlock the same scoring categories, they belong to the same archetype. For example, Blacksmithing, Expedition, and weapon-dependent loot all form the weapon rush cluster, while Excavation, Housework, and bonus-scoring furnishings form the furnishing rush cluster. The classification was manual, but the Lean proofs validate it: every archetype's score estimates are derived from the formalized game rules, and the dominance relations are machine-checked over the resulting payoff matrix.

      inductive StrategyArchetype where
        | furnishingRush
        | weaponRush
        | animalHusbandry
        | miningHeavy
        | balanced
        | peacefulFarming
        | rubyEconomy
        | peacefulCaveEngine

      But why eight? The answer comes from the structure of the game's scoring channels. A scoring channel is a resource-to-points pathway through the action spaces: the furnishing channel runs through Excavation and Housework into furnished caverns, the weapon channel runs through Blacksmithing into expeditions and loot, and so on. There are exactly six:

      inductive ScoringChannel where
        | furnishing        -- Excavation + Housework -> caverns -> furnishing tiles
        | weaponExpedition  -- Blacksmithing + Adventure -> weapons -> loot
        | agriculture       -- Clearing + Sustenance -> fields -> crops
        | animalBreeding    -- Sheep Farming + Donkey Farming -> pastures -> animals
        | mining            -- Ore/Ruby Mine Construction -> mines
        | economy           -- Starting Player + Ore Trading + Ruby Mining -> gold/rubies

      Each archetype commits to a primary channel, and the eight archetypes cover all six channels (all_channels_covered). Some channels conflict over shared resources: weapon/expedition and mining both need ore, furnishing and mining both need stone. The channelsConflict function encodes these tensions. With 47 productive actions per game and each channel needing at least 6 dedicated actions to reach competitive scoring, a player can invest in at most 2 channels seriously (budget_bounds_channels). The archetypes are the maximal compatible channel combinations: 6 single-primary archetypes plus 2 hybrids (balanced spreads across 3 channels, peaceful cave engine uses weapons for food rather than scoring).

      The exhaustivity proof (archetype_channel_surjection) shows every channel has at least one archetype as its primary. If you're going to score points, you have to go through a channel, and every channel is represented. No viable strategy falls outside the classification.
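      A coverage statement of this shape can be sketched as follows. The two inductive types are repeated from above so the block stands alone. The primaryChannel mapping is an assumption made for illustration: the six single-channel assignments follow the archetype names, while the assignments for the two hybrids are guesses, and the post's real definition may differ.

      ```lean
      inductive StrategyArchetype where
        | furnishingRush | weaponRush | animalHusbandry | miningHeavy
        | balanced | peacefulFarming | rubyEconomy | peacefulCaveEngine

      inductive ScoringChannel where
        | furnishing | weaponExpedition | agriculture
        | animalBreeding | mining | economy

      -- Hypothetical mapping from archetype to primary channel.
      def primaryChannel : StrategyArchetype → ScoringChannel
        | .furnishingRush     => .furnishing
        | .weaponRush         => .weaponExpedition
        | .animalHusbandry    => .animalBreeding
        | .miningHeavy        => .mining
        | .peacefulFarming    => .agriculture
        | .rubyEconomy        => .economy
        | .balanced           => .agriculture   -- assumption (hybrid)
        | .peacefulCaveEngine => .furnishing    -- assumption (hybrid)

      -- Surjectivity: every channel is the primary channel of some archetype.
      -- Each case supplies an explicit witness and closes by computation.
      theorem every_channel_has_primary (c : ScoringChannel) :
          ∃ a : StrategyArchetype, primaryChannel a = c :=
        match c with
        | .furnishing       => ⟨.furnishingRush, rfl⟩
        | .weaponExpedition => ⟨.weaponRush, rfl⟩
        | .agriculture      => ⟨.peacefulFarming, rfl⟩
        | .animalBreeding   => ⟨.animalHusbandry, rfl⟩
        | .mining           => ⟨.miningHeavy, rfl⟩
        | .economy          => ⟨.rubyEconomy, rfl⟩
      ```

      Note that surjectivity only shows that no channel is left without an archetype; the claim that no viable strategy falls outside the classification also leans on the action-budget argument.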

      Each archetype has an estimated scoring ceiling (best case across all setups) and floor (worst case):

      Score estimate ranges for the eight strategy archetypes

      Furnishing rush tops the ceiling at 140 and ties for the highest floor at 60. Weapon rush matches the floor but caps at 120. The others trail off. The interval chart makes the dominance relationships immediately visible: furnishing rush's bar extends further right (higher ceiling) than any other archetype, and its left endpoint (floor) is as good as anyone's.

      The Dominance Hierarchy

      Before building the payoff matrix, we can establish partial dominance from the score estimates alone. If strategy \(s\) has a ceiling and a floor at least as high as those of strategy \(t\) (with at least one strict inequality), then \(s\) dominates \(t\) in score estimates.
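      A minimal sketch of that dominance test, using only the two estimate ranges quoted in the text (furnishing rush 60 to 140, weapon rush 60 to 120). The Estimate structure and dominatesEst are hypothetical names for this illustration; the post's own dominates is defined over archetypes and may be implemented differently.

      ```lean
      -- Hypothetical container for a floor/ceiling score estimate.
      structure Estimate where
        floor   : Nat
        ceiling : Nat

      -- s dominates t: at least as good on both ends, strictly better on one.
      def dominatesEst (s t : Estimate) : Bool :=
        decide (s.ceiling >= t.ceiling) && decide (s.floor >= t.floor) &&
          (decide (s.ceiling > t.ceiling) || decide (s.floor > t.floor))

      def furnishingRushEst : Estimate := { floor := 60, ceiling := 140 }
      def weaponRushEst     : Estimate := { floor := 60, ceiling := 120 }

      -- Equal floors, strictly higher ceiling: dominance holds one way only.
      example : dominatesEst furnishingRushEst weaponRushEst = true := by decide
      example : dominatesEst weaponRushEst furnishingRushEst = false := by decide
      -- The strictness clause makes the relation irreflexive.
      example : dominatesEst weaponRushEst weaponRushEst = false := by decide
      ```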

      Dominance partial order over strategy archetypes

      Solid green arrows show dominance via score estimates (higher ceiling and floor). Dashed green arrows indicate dominance established only through the full payoff matrix. Animal Husbandry and Mining Heavy are incomparable in score estimates (one has a higher ceiling, the other a higher floor), but both are weakly dominated by Furnishing Rush in the payoff matrix.

      The incomparability is proven constructively:

      theorem incomparable_strategies_exist :
          not (dominates .miningHeavy .animalHusbandry) /\
          not (dominates .animalHusbandry .miningHeavy) := by decide

      The Payoff Matrix

      The next question is what happens when two players with potentially different archetypes collide over the shared action spaces. I model this as an \(8 \times 8\) payoff matrix \(M\) where \(M_{ij}\) is the estimated score for the row player when row plays archetype \(i\) and column plays archetype \(j\).

      The 8x8 payoff matrix

      The green border marks the Furnishing Rush row (weakly dominant). Blue borders highlight the diagonal (mirror matchups, always depressed). Cell color intensity reflects payoff magnitude: light green (55) through coral (135).

      In Lean, the matrix is a function from Fin 8 -> Fin 8 -> Int with all 64 entries hardcoded from the strategy analysis:

      def payoffMatrix : Fin 8 -> Fin 8 -> Int
        | 0, 0 =>  85 | 0, 1 => 130 | 0, 2 => 135 | 0, 3 => 130
        | 0, 4 => 125 | 0, 5 => 135 | 0, 6 => 135 | 0, 7 => 130
        | 1, 0 =>  80 | 1, 1 =>  75 | 1, 2 => 100 | 1, 3 =>  85
        | 1, 4 =>  95 | 1, 5 => 105 | 1, 6 => 100 | 1, 7 =>  85
        -- ... 48 more entries ...

      A word on what "estimated" means here. The matrix entries are derived from the formalized game rules (scoring function, food costs, action budgets), not from Monte Carlo simulation or guesswork. But they are estimates in the sense that I haven't exhaustively solved every possible 47-move sequence within each archetype. How sensitive is the result? The min_dominance_margin_is_5 theorem proves that the minimum gap between the furnishing rush row and the next-best entry in each column is exactly 5 points (occurring in the mirror matchup column). Perturbing any single entry by less than 5 points cannot flip the dominance relation. Even a 10-point swing in a single cell would only affect one column, not the global result. The Nash equilibrium is more sensitive (it depends on the diagonal), but the mirror matchup at 85 would need to drop below the next-best response value of 80 before the equilibrium shifts. The downstream proofs are machine-checked over the matrix as stated, so the formal guarantees are exact conditional on these entries.

      We can do better than conditional. Replace each scalar entry with a closed interval bounding the true payoff. Mirror matchups get tight bounds (the point estimate \(\pm 2\), since both players execute the same plan and contention is symmetric). Cross-archetype matchups get wider bounds (\(\pm 5\), reflecting uncertainty in cross-strategy interaction). The interval payoff matrix looks like this:

      structure PayoffInterval where
        lo : Int
        hi : Int
        valid : lo <= hi
      
      def intervalPayoff : Fin 8 -> Fin 8 -> PayoffInterval
        | 0, 0 => { lo := 83, hi := 87, valid := by omega }   -- mirror: eps=2
        | 0, 1 => { lo := 125, hi := 135, valid := by omega }  -- vs weapon: eps=5
        -- ... 62 more entries ...

      Robust weak dominance asks: for every column and every non-furnishing alternative, does the furnishing rush lower bound meet or exceed the alternative's upper bound? If yes, then furnishing rush dominates for ALL true payoff matrices within these intervals, not just the point estimates.

      theorem robust_weak_dominance :
          forall (col : Fin 8) (alt : Fin 8), alt != 0 ->
            (intervalPayoff 0 col).lo >= (intervalPayoff alt col).hi := by decide

      It passes. The tightest cell is column 0 (mirror matchup): furnishing rush \([83, 87]\) vs. weapon rush \([77, 83]\). The margin is \(83 - 83 = 0\), a weak tie at the boundary. All other columns have margins of 20 or more (non_mirror_columns_robust). If the error bounds were widened by just 1 point, column 0 would fail (fragility_column_0), which tells us exactly where the result is most fragile and what the tolerance is.
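      The boundary case in column 0 can be replayed in isolation. This is a sketch, not the post's code: Interval drops the validity proof field of PayoffInterval for brevity, and robustlyBeats and widen are hypothetical helpers. The two intervals are the ones quoted above.

      ```lean
      -- Simplified interval (no `valid` proof field, unlike PayoffInterval).
      structure Interval where
        lo : Int
        hi : Int

      -- Robust comparison: the worst case of `a` still meets the best case of `b`.
      def robustlyBeats (a b : Interval) : Bool := decide (a.lo >= b.hi)

      -- Widen an interval by `e` points on each side.
      def widen (i : Interval) (e : Int) : Interval :=
        { lo := i.lo - e, hi := i.hi + e }

      def furnishingMirror   : Interval := { lo := 83, hi := 87 }
      def weaponVsFurnishing : Interval := { lo := 77, hi := 83 }

      -- 83 >= 83: passes with a margin of exactly zero.
      example : robustlyBeats furnishingMirror weaponVsFurnishing = true := by decide
      -- One extra point of uncertainty on both cells gives 82 >= 84, which fails.
      example :
          robustlyBeats (widen furnishingMirror 1) (widen weaponVsFurnishing 1) = false := by
        decide
      ```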

      The interval analysis also gives us robust welfare bounds. Nash welfare lies in \([166, 174]\) and the social optimum in \([200, 220]\). Even in the best case for selfish play and worst case for cooperation, the social optimum still exceeds Nash welfare (robust_price_of_anarchy). The prisoner's dilemma structure is not an artifact of the point estimates.

      The first row is at least as large as every other row in every column. That's it. That's the whole result. In game theory this property is called weak dominance: a strategy \(\sigma^*\) is weakly dominant if for all opponent strategies \(o\) and all alternative strategies \(\sigma\),

      $$
      M(\sigma^*, o) \geq M(\sigma, o)
      $$

      The Lean proof is furnishing_rush_weakly_dominant, and it says exactly this:

      theorem furnishing_rush_weakly_dominant :
          forall (row : Fin 8) (col : Fin 8),
            payoffMatrix 0 col >= payoffMatrix row col := by decide

      For all opponents, for all alternatives, the furnishing rush payoff is greater than or equal. The proof is decide: Lean's kernel checks all 64 cells. Nobody has to trust my arithmetic.
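      To see the check on a smaller scale, here is a sketch restricted to the two matrix rows printed earlier (furnishing rush and weapon rush). Representing rows as lists, along with the helper names weaklyDominatesRow and margins, is a simplification for illustration and not how the post's proof is structured.

      ```lean
      -- Rows 0 and 1 of the payoff matrix, as quoted in the post.
      def furnishingRow : List Int := [85, 130, 135, 130, 125, 135, 135, 130]
      def weaponRow     : List Int := [80, 75, 100, 85, 95, 105, 100, 85]

      -- Row r weakly dominates row s if it is at least as large in every column.
      def weaklyDominatesRow (r s : List Int) : Bool :=
        (r.zip s).all fun (a, b) => decide (a >= b)

      -- Column-by-column margin of r over s.
      def margins (r s : List Int) : List Int :=
        (r.zip s).map fun (a, b) => a - b

      example : weaklyDominatesRow furnishingRow weaponRow = true := by decide
      example : weaklyDominatesRow weaponRow furnishingRow = false := by decide
      -- The smallest margin against weapon rush is the 5 points in the mirror column.
      example : margins furnishingRow weaponRow = [5, 55, 35, 45, 30, 30, 35, 45] := by
        decide
      ```

      This covers only one of the seven alternative rows; the full theorem needs all 64 entries.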

      The Contention Effect

      Every diagonal entry of the payoff matrix is depressed relative to at least one off-diagonal entry in the same row:

      theorem diagonal_always_depressed :
          forall (i : Fin 8), exists (j : Fin 8), j ≠ i /\
            payoffMatrix i j > payoffMatrix i i := by decide

      Every mirror matchup scores below at least one non-mirror matchup in the same row. This is the contention effect: when both players pursue the same archetype, they fight over the same action spaces. Furnishing rush mirrors compete for excavation and housework. Weapon rush mirrors fight over blacksmithing. Mining mirrors clash on ore mine construction.

      The game punishes sameness. It's just that the punishment for sameness (85) is still less painful than the punishment for picking something worse (60 to 80 against an opponent who picked furnishing rush).

      Nash Equilibrium and the Prisoner's Dilemma

      From weak dominance, the game theory falls out like dominoes. The best response function \(\text{BR} : \text{Strategies} \to \text{Strategies}\) maps each opponent strategy to the row-maximizer in the corresponding column. Since furnishing rush achieves the column maximum everywhere, BR is the constant function:

      $$
      \text{BR}(x) = \text{FurnishingRush} \quad \forall x
      $$

      A Nash equilibrium is a fixed point of the joint best-response correspondence: a pair \((a, b)\) where \(a = \text{BR}(b)\) and \(b = \text{BR}(a)\). Since BR is constant, the only fixed point is (FurnishingRush, FurnishingRush). Existence and uniqueness in three lines:

      theorem exactly_one_nash_equilibrium :
          (exists a b, isNashEquilibrium a b) /\
          (forall a b, isNashEquilibrium a b ->
            a = .furnishingRush /\ b = .furnishingRush) :=
        ⟨⟨.furnishingRush, .furnishingRush, furnishing_mirror_is_nash⟩,
         furnishing_mirror_unique_nash⟩

      The uniqueness proof works by case-splitting on all 64 strategy pairs and observing that only one satisfies the Nash condition:

      theorem furnishing_mirror_unique_nash :
          forall a b : StrategyArchetype,
            isNashEquilibrium a b ->
              a = .furnishingRush /\ b = .furnishingRush := by
        intro a b h
        have h1 := h.1; have h2 := h.2
        cases a <;> cases b <;> simp_all [bestResponse, isNashEquilibrium]

      There is exactly one pure Nash equilibrium and it is the one where both players do the same thing. Which is a little tragic.

      The Price of Anarchy

      Both players score 85 in the Nash equilibrium. If they could somehow coordinate on different strategies (say furnishing rush versus animal husbandry), the combined welfare would be \(135 + 75 = 210\) instead of \(85 + 85 = 170\).

      Nash welfare vs. social optimum

      The price of anarchy is the ratio of the social optimum to the Nash welfare:

      $$
      \text{PoA} = \frac{210}{170} = \frac{21}{17} \approx 1.24
      $$

      Selfish play costs about 19% of the social optimum. The Lean proof is decide, because it's just arithmetic:

      theorem price_of_anarchy_ratio :
          socialOptimumValue = 210 /\ nashWelfare = 170 /\
          210 * 17 = 170 * 21 := by decide
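      The welfare figures can be rechecked from the four payoff entries involved. This is a standalone sketch with hypothetical names (nashWelfareSketch, socialOptimumSketch); it uses natural-number arithmetic and cross-multiplication so that no rational numbers are needed.

      ```lean
      -- Mirror matchup: both players score 85.
      def nashWelfareSketch : Nat := 85 + 85
      -- Furnishing rush (135) against animal husbandry (75).
      def socialOptimumSketch : Nat := 135 + 75

      example : nashWelfareSketch = 170 := by decide
      example : socialOptimumSketch = 210 := by decide
      -- 210/170 = 21/17, stated by cross-multiplication.
      example : socialOptimumSketch * 17 = nashWelfareSketch * 21 := by decide
      -- Welfare lost to selfish play as a percentage of the optimum, rounded down.
      example :
          (socialOptimumSketch - nashWelfareSketch) * 100 / socialOptimumSketch = 19 := by
        decide
      ```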

      The depressing implication is that both players would prefer the other person to play something different, but neither can unilaterally deviate without making themselves worse off. You're stuck at 85 each, staring across the table at someone who also read this blog post.

      The Scoring Function

      A natural question is where the scores can actually land. The end-game scoring function totals up every positive and negative contribution:

      def FullPlayer.score (p : FullPlayer) : Int :=
        let animals := (p.dogs + p.sheep + p.donkeys +
                        p.wildBoars + p.cattle : Int)
        let grainPts := (((p.grain + p.fieldsWithGrain * 3) + 1) / 2 : Int)
        let vegPts := ((p.vegetables + p.fieldsWithVeg * 2) : Int)
        let rubyPts := (p.rubies : Int)
        let dwarfPts := ((p.dwarfs + p.offspring) : Int)
        let pasturePts := (p.smallPastures * 2 + p.largePastures * 4 : Int)
        let minePts := (p.oreMines * 3 + p.rubyMines * 4 : Int)
        let goldPts := (p.gold : Int)
        let furnPts := p.furnishings.foldl (fun acc fid =>
          acc + ((furnishingSpec fid).basePoints : Int)) (0 : Int)
        let missingTypes :=
          (if p.sheep == 0 then 1 else 0) + (if p.donkeys == 0 then 1 else 0) +
          (if p.wildBoars == 0 then 1 else 0) + (if p.cattle == 0 then 1 else 0)
        let unusedPenalty := (p.unusedMountain + p.unusedForest : Int)
        let beggingPenalty := (p.beggingMarkers * 3 : Int)
        let missingPenalty := (missingTypes * 2 : Int)
        animals + grainPts + vegPts + rubyPts + dwarfPts + pasturePts +
        minePts + goldPts + furnPts - unusedPenalty - beggingPenalty - missingPenalty

      The theoretical floor is \(-55\) points: a player who takes zero actions across all rounds gets +2 for their two starting dwarfs, \(-8\) for missing all four farm animal types, \(-22\) for 22 unused board spaces, and \(-27\) from 9 begging markers at \(-3\) points each. The theoretical ceiling is 202, computed by summing the independent maxima of every scoring category. But you can't reach 202, because every scoring category competes for the same 56 dwarf placements over 12 rounds. You cannot simultaneously furnish 12 caverns, build 5 mines, breed 20 animals, and sow 10 fields. The action budget is the binding constraint. The practical ceiling is around 140, achievable by furnishing rush when uncontested.

      The theoretical range is 195 points (from \(-55\) to 140), but the \(-55\) floor requires literally doing nothing for 12 rounds, so it's not strategically meaningful. The practical floor across all archetypes is around 45 (peaceful farming in the worst setup), giving a practical range of about 95 points. The dominant strategy's own range is 80 points (60 to 140), proven as dominant_strategy_variance. The safety margin (dominant_strategy_safety_margin) is 115 points above the theoretical floor, but the more relevant number is the gap between furnishing rush's worst case (60) and the worst viable archetype's worst case (45): even in the most adversarial setup, choosing correctly gains you at least 15 points.
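      The floor and range figures follow from simple arithmetic, sketched here with hypothetical names (the post derives them from FullPlayer.score). The component values are the ones stated in the text.

      ```lean
      def startingDwarfPts : Int := 2        -- two starting dwarfs
      def missingAnimals   : Int := 4 * 2    -- four missing farm animal types, -2 each
      def unusedSpaces     : Int := 22       -- 22 unused board spaces, -1 each
      def begging          : Int := 9 * 3    -- 9 begging markers, -3 each

      def theoreticalFloor : Int :=
        startingDwarfPts - missingAnimals - unusedSpaces - begging

      example : theoreticalFloor = -55 := by decide
      -- Range from the do-nothing floor to the practical ceiling of 140.
      example : (140 : Int) - theoreticalFloor = 195 := by decide
      -- Furnishing rush's worst case (60) sits 115 points above that floor.
      example : (60 : Int) - theoreticalFloor = 115 := by decide
      -- Practical range (45 to 140) and the dominant strategy's own range (60 to 140).
      example : (140 : Int) - 45 = 95 ∧ (140 : Int) - 60 = 80 := by decide
      ```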

      Degenerate Combos

      The formalization surfaced several combos that look broken in isolation. Four stand out.

      The Beer Parlor converts grain to gold before scoring: every 2 grain becomes 3 gold. Without it, 20 grain scores 10 points (the grain formula is \(\lceil n/2 \rceil\)). With it, the same 20 grain becomes 30 gold. That's a 3x multiplier on the base grain value, and the gold stacks with every other scoring category. The beer_parlor_max_gold theorem proves beerParlorGold 20 = 30. The catch is accumulating 20 grain: you need aggressive sowing across multiple rounds, which means burning actions on Slash-and-Burn and Sustenance instead of excavating caverns.
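      A minimal sketch of the conversion, assuming the simplest definition consistent with the description. Only the name beerParlorGold and the value beerParlorGold 20 = 30 come from the post; the function bodies and the helper grainPoints are guesses for illustration.

      ```lean
      -- Base grain scoring: ceil(grain / 2), ignoring grain on fields.
      def grainPoints (grain : Nat) : Nat := (grain + 1) / 2

      -- Assumed conversion: every full pair of grain becomes 3 gold.
      def beerParlorGold (grain : Nat) : Nat := grain / 2 * 3

      example : grainPoints 20 = 10 := by decide
      example : beerParlorGold 20 = 30 := by decide
      -- The 3x multiplier on the base grain value.
      example : beerParlorGold 20 = 3 * grainPoints 20 := by decide
      ```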

      Dogs have no cap on sheep-watching: \(n\) dogs on one meadow guard \(n+1\) sheep. Stack 10 dogs and a Weaving Parlor and you're looking at 51 points from a single tile combo. The catch is that acquiring 10 dogs requires roughly 6 expedition actions (dogs are strength-1 loot, so they're easy to get but each expedition burns an action), and those 6 actions aren't building caverns or growing your family.

      The Writing Chamber prevents up to 7 points of losses for 2 stone. This sounds like a license to ignore entire scoring categories, and it is, except the categories you'd skip (unused spaces, missing animal types) are exactly the ones that furnishing rush covers naturally. The players who benefit most from Writing Chamber are the ones playing badly enough to accumulate large penalties, and at that point the cap at 7 means it barely dents a truly neglected board.

      The Prayer Chamber is the most elegant. It gives 8 points for zero resource investment, but the bonus evaporates the moment any dwarf forges a weapon. The prayer_chamber_vs_weapons theorem proves this is a hard binary: 8 or 0, nothing in between. It creates a genuine strategic fork (peaceful play vs. expedition play) that the rest of the game's design carefully preserves.

      None of these breaks the dominant strategy. Every combo faces the same binding constraint: 56 total actions with full family growth. You can produce flashy numbers in one scoring category, but you can't cover all of them. The vanilla furnishing rush, with its balanced approach to board coverage, family growth, and furnishing synergies, remains the ceiling. The combos are features, not bugs: they're what make the game worth replaying even after you know the dominant strategy, because they give you something to pivot into when the standard lines are blocked.

      Is the Game Solved?

      A serious mathematician would call this "a formal analysis of a model, with the caveat that the model's parameters are estimated rather than computed." That is a legitimate and common pattern in applied math, operations research, and mechanism design. The proofs are machine-checked and airtight, but they are conditional on the payoff matrix. The theorems say: IF the payoff matrix is this, THEN furnishing rush is weakly dominant, the Nash equilibrium is unique, and the price of anarchy is 21/17. They do not say: the true expected score of an optimally-played furnishing rush against an optimally-played animal husbandry opponent is exactly 135. That number is an estimate derived from the formalized action economy, not from exhaustive search of the game tree.

      How confident should you be in the estimate? The min_dominance_margin_is_5 theorem proves the narrowest gap between furnishing rush and its closest competitor is 5 points. If the estimation methodology has a systematic bias of less than 5 points (plausible, given that the estimates are derived from the same scoring functions and action budgets used in actual play), the qualitative result holds. A correlated shift of 5+ points across an entire row of the matrix could break weak dominance in the mirror column, but would require the methodology to consistently overestimate furnishing rush's performance in self-contention scenarios. Given that the mirror penalty (85 vs. 130+ off-diagonal) already models severe contention, this seems unlikely.

      At the archetype level, yes, the game is solved. For any 2-player setup, regardless of which of the 2,880 configurations you draw, the model says the optimal pure strategy is furnishing rush. The proof does not depend on any specific card ordering or harvest marker placement. What I haven't done is compute the exact sequence of dwarf placements for each specific setup. That would require solving a game tree with roughly \(10^{30}\) nodes. But the archetype-level result is the one that matters for actual play. You know the plan. The within-archetype decisions (which cavern to excavate first, which furnishing tile to prioritize) are tactical, not strategic. They don't change the answer.

      The 3+ player variants remain open. The payoff matrix becomes a rank-3 tensor with 512 entries, action space contention shifts from binary to combinatorial, and the pure dominance result almost certainly fails, since furnishing tile scarcity under three-way competition makes mixed equilibria the likely outcome. If someone wants to formalize that, I would love to read the proof.

      Play the Dominant Strategy

      Play furnishing rush. Excavate aggressively in rounds 1 through 4. Get Office Room early for the overhang bonuses. Get State Parlor for +4 per dwelling. Grow to 5 dwarfs and pick up Broom Chamber for +5 (+10 if you reach a sixth dwarf). If your opponent does something different, you score 125 to 135 and they score 60 to 80. If your opponent also plays furnishing rush, you both land at 85, which is worse than the cooperative optimum of 210 combined but better than any unilateral deviation. You'll sit there at the table, both of you excavating caverns as fast as you can, both knowing that one of you could sacrifice 10 points to give the other 50, but neither willing to be the one who blinks.

      Prisoner's dilemma in a cave. Uwe Rosenberg probably didn't intend this, but the formal analysis says it's there.

      The code is built on Lean 4 v4.28.0 with Mathlib, modeling all 24 action spaces, all 48 furnishing tiles, the complete expedition loot table, the board grids, and the full 12-round 2-player harvest schedule. I think Caverna is a beautifully designed game, which is exactly why it was satisfying to find that even under formal analysis the strategic structure holds together so well. A badly designed game would have a trivially dominant strategy that makes the game boring. Caverna's dominant strategy comes with a 19% welfare tax on the mirror matchup, which means the game is always more interesting when your opponent does something unexpected (the Nash equilibrium is rarely where the fun is, in games or in life), which means in practice people don't always play the dominant strategy, which means the game stays fun.

      Good game design, it turns out, is robust to being solved. Now you know the optimal play. Whether you follow it is, thankfully, not a theorem.

  4. February 20, 2026
    1. 🔗 IDA Plugin Updates IDA Plugin Updates on 2026-02-20 rss

      IDA Plugin Updates on 2026-02-20

      New Releases:

      Activity:

    2. 🔗 Simon Willison Adding TILs, releases, museums, tools and research to my blog rss

      I've been wanting to add indications of my various other online activities to my blog for a while now. I just turned on a new feature I'm calling "beats" (after story beats, naming this was hard!) which adds five new types of content to my site, all corresponding to activity elsewhere.

      Here's what beats look like:

      Screenshot of a fragment of a page showing three entries from 30th Dec 2025. First: [RELEASE] "datasette-turnstile 0.1a0 - Configurable CAPTCHAs for Datasette paths usin…" at 7:23 pm. Second: [TOOL] "Software Heritage Repository Retriever - Download archived Git repositories f…" at 11:41 pm. Third: [TIL] "Downloading archived Git repositories from archive.softwareheritage.org - …" at 11:43 pm.

      Those three are from the 30th December 2025 archive page.

      Beats are little inline links with badges that fit into different content timeline views around my site, including the homepage, search and archive pages.

      There are currently five types of beats:

      That's five different custom integrations to pull in all of that data. The good news is that this kind of integration project is the kind of thing that coding agents really excel at. I knocked most of the feature out in a single morning while working in parallel on various other things.

      I didn't have a useful structured feed of my Research projects, and it didn't matter because I gave Claude Code a link to the raw Markdown README that lists them all and it spun up a parser regex. Since I'm responsible for both the source and the destination I'm fine with a brittle solution that would be too risky against a source that I don't control myself.

      Claude also handled all of the potentially tedious UI integration work with my site, making sure the new content worked on all of my different page types and was handled correctly by my faceted search engine.

      Prototyping with Claude Artifacts

      I actually prototyped the initial concept for beats in regular Claude - not Claude Code - taking advantage of the fact that it can clone public repos from GitHub these days. I started with:

      Clone simonw/simonwillisonblog and tell me about the models and views

      And then later in the brainstorming session said:

      use the templates and CSS in this repo to create a new artifact with all HTML and CSS inline that shows me my homepage with some of those inline content types mixed in

      After some iteration we got to this artifact mockup, which was enough to convince me that the concept had legs and was worth handing over to full Claude Code for web to implement.

      If you want to see how the rest of the build played out the most interesting PRs are Beats #592 which implemented the core feature and Add Museums Beat importer #595 which added the Museums content type.

      You are only seeing the long-form articles from my blog. Subscribe to /atom/everything/ to get all of my posts, or take a look at my other subscription options.

    3. 🔗 r/Yorkshire Where's the best place to live outside of Leeds? rss

      Hey r/Yorkshire, I've very recently split from my girlfriend, I'm a guy in my late 20s and currently living in York. I'm looking to move out of York as, let's be real, how tf am I supposed to live in York on my own at that kind of price.

      I work hybrid in Leeds and am looking to find somewhere new no more than about an hour's drive away. Thing is, I've only just moved to York in the winter, haven't really planted my roots, so would like to find somewhere that has a little going on so I can meet people. In my down time I love a lot of hiking, pubs, live music, I'm not expecting to find somewhere with all that, but somewhere that is relatively close to that kind of vibe, bit of an arty young vibe if possible.

      I'm looking at Skipton, Halifax, Keighley, and Harrogate at the moment, but lets rarely come up for under 700. I'm open to other areas, though, as I'd like to get to know the place. Just looking for recommendations really, or thoughts on the above? I've heard mixed things about Keighley, and I'm worried Skipton might be a bit of a retirement village (no idea if that's true). Love to hear your thoughts. Thanks

      submitted by /u/tmatthews98
      [link] [comments]

    4. 🔗 r/Leeds Comedy clubs? rss

      Does anyone know where there are good stand up comedy acts in Leeds? Looking for venues that could be fun to visit

      submitted by /u/C0sm1c_3l3ph4nt
      [link] [comments]

    5. 🔗 r/york Yorkshire Ghostbusters in this sub - Are any of you folks interested in a quick meet for a patch swap? I'll be in York visiting family next Tuesday - Thursday. The Chesapeake Ghostbusters logo ties our areas together through the use of Calvert and Crossland heraldry on the Maryland flag. rss

      submitted by /u/CaptainAwwsum
      [link] [comments]

    6. 🔗 @binaryninja@infosec.exchange Want to get the best decompilation for your own architecture? Glenn's post mastodon

      Want to get the best decompilation for your own architecture? Glenn's post shows how you can enable Binary Ninja to give you the best decompilation around. Part 1 covers the basics up to disassembly, assembly, and control-flow. Next week, part 2 covers lifting and our first decompilation results.

      https://binary.ninja/2026/02/20/quark-platform-part-1.html

    7. 🔗 @binaryninja@infosec.exchange Going live now! Don't miss some fantastic demos by our RE//verse instructors! mastodon

      Going live now! Don't miss some fantastic demos by our RE//verse instructors! https://www.youtube.com/vector35/live

    8. 🔗 News Minimalist 🐒 Supreme Court blocks Trump's tariffs + 11 more stories rss

      In the last 2 days ChatGPT read 63550 top news stories. After removing previously covered events, there are 12 articles with a significance score over 5.5.

      [5.8] Supreme Court limits presidential authority on global tariffs - ctvnews.ca (+47)

      The U.S. Supreme Court on Friday struck down President Trump's global tariffs, ruling that his unilateral use of emergency economic powers exceeded executive authority and violated constitutional limits.

      The 6-3 ruling determined that the 1977 International Emergency Economic Powers Act does not authorize unilateral tariffs. The conservative majority applied the "major questions" doctrine, asserting that significant economic actions require explicit congressional approval rather than broad executive interpretations of emergency statutes.

      Challenged by twelve states and various businesses, the ruling could force the refund of $175 billion. Although the administration explores alternative legal routes, the decision curtails a central pillar of Trump's agenda.

      [5.7] FDA to require one study for new drug approvals to speed patient access - abcnews.com (+14)

      The FDA will drop its longtime requirement for two clinical trials for new drug approvals, making a single study the default standard to accelerate patient access to novel medical treatments.

      Commissioner Marty Makary and Deputy Vinay Prasad stated that modern research precision makes the old standard obsolete. The shift aims to reduce bureaucracy, potentially sparking a surge in drug development for common diseases that previously required more extensive testing than rare conditions.

      Although the agency has accepted single studies for rare diseases since the 1990s, this policy expansion contrasts with recent restrictive decisions regarding vaccines and gene therapies, leaving some industry stakeholders seeking clarity.

      Highly covered news with significance over 5.5

      [6.5] Nasal spray vaccine protects mice from multiple respiratory pathogens - nature.com (+11)

      [6.4] Feline and human cancers share striking genetic similarities, offering new research avenues - news.cornell.edu (+16)

      [6.0] Invisible galaxy confirmed, composed almost entirely of dark matter - es.wired.com (Spanish) (+5)

      [6.0] Google and Apple integrate generative AI music features into consumer apps - financialpost.com (+23)

      [6.0] UN experts find genocide hallmarks in Sudan's el-Fasher campaign - apnews.com (+21)

      [5.9] PromptSpy Android malware uses generative AI to adapt its persistence - bleepingcomputer.com (+8)

      [5.6] Trump order protects glyphosate production and manufacturer immunity - theguardian.com (+8)

      [5.6] Afghanistan's hunger crisis worsens as aid cuts force WFP to turn away children - abcnews.com (+3)

      [5.5] White House allows ICE to detain refugees for rescreening - theguardian.com (+13)

      [5.5] Iran nuclear talks collapse, pushing oil prices toward $100 as U.S. deploys military assets to the Middle East - economictimes.indiatimes.com (+65)

      Thanks for reading!

      - Vadim



    9. 🔗 r/wiesbaden A nice sweat in the sauna rss
    10. 🔗 r/wiesbaden My English Bulldog urgently needs soft palate surgery, organized by Mario Beierstettel rss
    11. 🔗 r/wiesbaden My English Bulldog urgently needs soft palate surgery, organized by Mario Beierstettel rss
    12. 🔗 r/york The ubiquitous tourist rss

      In the '70s, a friend of mine's gaff was a second-floor flat rented in Stonegate (for £11 a week - we think it was part of the Minster's estate). Anyway, the point is that, even back then, during the holidays and in summer, we would have fun leaning out of the window to spot the odd local trying to worm their way through the dawdling crowds.

      submitted by /u/flame2spear
      [link] [comments]

    13. 🔗 r/wiesbaden https://gofund.me/a4849d36a rss
    14. 🔗 r/Yorkshire Yorkshire Dales in January 2027 rss

      I visited the Dales last summer, finally, and fell absolutely in love with the people, nature, stone barns, endless greenery, etc. I plan to visit this summer again to do some hiking, but also planning a January 2027 trip. I really want to see the area with possible snow, frost, and with luck, the snowdrops starting to bloom! I usually take the train from Leeds to Skipton, then the bus to Grassington where I stay for a few days. For winter 2027, I wanted to either stay in Appletreewick, or Burnsall, for a few days. I mostly plan to write, read, and if the weather permits, take a few walks. Would these places be nice for winter, or would you recommend slightly larger places, like Settle? Also, is snow fairly common in the area in winter? Thanks!!

      Photo credit: Getty

      submitted by /u/skepticalbureaucrat
      [link] [comments]

    15. 🔗 r/LocalLLaMA The top 3 models on openrouter this week (Chinese models are dominating!) rss

      This is the first time I've seen a model exceed 3 trillion tokens per week on OpenRouter! It's also the first time I've seen more than one model exceed a trillion tokens per week (a month ago it was only Grok 4 Fast), and the first time I've seen Chinese models beating US ones like this.

      submitted by /u/keb_37
      [link] [comments]

    16. 🔗 r/reverseengineering Reverse engineering the Creative Katana V2X soundbar to be able to control it from Linux rss
    17. 🔗 r/Leeds Where do people play Warhammer? rss

      The actual Warhammer shop aren't interested if you're not buying a starter set, Patriot Games isn't great, and there's nothing else near the city centre that I'm aware of. So where do people play Warhammer? Are there any clubs out there?

      submitted by /u/sendmemespls97
      [link] [comments]

    18. 🔗 r/wiesbaden Good orthopedists rss

      Who is a good orthopedic doctor in Wiesbaden where you aren't treated as if you were on an assembly line? Recommendations for patients with statutory health insurance, please 😊

      submitted by /u/kassen_patient
      [link] [comments]

    19. 🔗 r/Yorkshire I went for a walk. rss

      Too many stunning photos to choose from.

      submitted by /u/Background_Pay_3113
      [link] [comments]

    20. 🔗 r/LocalLLaMA GGML.AI has got acquired by Huggingface rss

      submitted by /u/Time_Reaper
      [link] [comments]

    21. 🔗 r/Yorkshire Just love this place ❤️ rss
    22. 🔗 r/LocalLLaMA Deepseek and Gemma ?? rss

      submitted by /u/ZeusZCC
      [link] [comments]

    23. 🔗 r/Leeds Barbers in city centre that can help me choose a haircut rss

      I always have a terrible time when going to a barber, or choosing a haircut. I have a hard time deciding what kind of haircut or style looks good on me.

      I live in the city centre. Are there any good barbers/hair salons within walking distance where I can just tell them to get creative with my hair? So far I've been to RMUK, lords' barbering and scissorhands barber, but they all looked uncomfortable when I told them I'm not quite sure what to do with my hair...

      submitted by /u/Hideharuhaduken420
      [link] [comments]

    24. 🔗 r/Leeds Deliveroo drivers rss

      I am absolutely sick of nearly being run over by Uber Eats/Deliveroo riders on electric bikes, with no helmets or lights, going all over pedestrian routes. I'm pretty sure people have died after being hit by these?? Is there anything we can do to bring this matter to the council?

      submitted by /u/Weary-Bus8436
      [link] [comments]

    25. 🔗 r/york Is there anyone in York that sells Kinnie? rss

      It's a Maltese drink that's quite hard to find in the UK. I've previously bought a 24 pack from Amazon, but that's out of my budget right now, and I'm heading to York next week so 🤞🤞😂 (buying as part of a bday gift for my partner)

      submitted by /u/LPanda2190
      [link] [comments]

    26. 🔗 r/reverseengineering Decrypt and extract voice guidance MP3 prompts from Sony WH-1000XM4 encrypted voice packs. AES key extracted via Bluetooth firmware dump of the Airoha MT2811 SoC. rss
    27. 🔗 r/Harrogate Photo booths in Harrogate - digital passport photos rss

      Are there any photo booths in town that do the digital photos for passports (providing the code for online passport applications)? Thanks

      submitted by /u/purte
      [link] [comments]

    28. 🔗 r/LocalLLaMA Kimi has context window expansion ambitions rss

      submitted by /u/omarous
      [link] [comments]

    29. 🔗 r/york The Shambles doesn't feel like The Shambles anymore rss

      Walked through on a Saturday and it genuinely felt like I was in a slow-moving queue for a theme park ride.

      I get that it's popular and good for the city but it used to feel charming and slightly chaotic in a nice way. Now it feels like everyone's there for the same photo.

      Is it just nostalgia talking, or has the vibe shifted a bit over the last few years?

      submitted by /u/1ChanceChipmunk1
      [link] [comments]

    30. 🔗 vercel-labs/agent-browser v0.13.0 release

      Minor Changes

      • ebd8717: Added new diff commands for comparing snapshots, screenshots, and URLs between page states. You can now run visual pixel diffs against baseline images, compare accessibility tree snapshots with customizable depth and selectors, and diff two URLs side-by-side with optional screenshot comparison.
    31. πŸ”— r/Yorkshire Who here believes that the fish should have at least come with a signed apology for that price tag? rss

      Who here believes that the fish should have at least come with a signed apology for that price tag? | submitted by /u/PubLogic
      [link] [comments]
      ---|---

    32. 🔗 r/Yorkshire Activities to take my gf to for her birthday? (South Yorkshire) rss

      It's my girlfriend's birthday in a couple of weeks and I want to surprise her with something to do. She likes creative activities like pottery and painting. She's also into gothic stuff like tarot, spells, ghosts and taxidermy etc. She doesn't drink so anything pub/bar focused is out of the question but if there's somewhere I could get a beer nearby, I wouldn't mind that haha.

      Any suggestions would be greatly appreciated 😁

      submitted by /u/Both-Opportunity7602
      [link] [comments]