

to read

  1. I don't want your PRs anymore
  2. JitterDropper | OALABS Research
  3. DomainTools Investigations | DPRK Malware Modularity: Diversity and Functional Specialization
  4. EXHIB: A Benchmark for Realistic and Diverse Evaluation of Function Similarity in the Wild
  5. Neobrutalism components - Start making neobrutalism layouts today

  1. April 22, 2026
    1. 🔗 WerWolv/ImHex Nightly Builds release

      Nightly 0c2e881 Changelog

      • Fix nullptr deref when opening ImHex without a provider on frame 1 (#2718)
      • Fix/utf8 log alignment (#2717)
    2. 🔗 backnotprop/plannotator v0.19.0 release

      Follow @plannotator on X for updates


      Missed recent releases?

      Release | Highlights
      ---|---
      v0.18.0 | Annotate focus & wide modes, OpenCode origin detection, word-level inline plan diff, Markdown content negotiation, color swatches
      v0.17.10 | HTML and URL annotation, loopback binding by default, Safari scroll fix, triple-click fix, release pipeline smoke tests
      v0.17.9 | Hotfix: pin Bun to 1.3.11 for macOS binary codesign regression
      v0.17.8 | Configurable default diff type, close button for sessions, annotate data loss fix, markdown rendering polish
      v0.17.7 | Fix "fetch did not return a Response" error in OpenCode web/serve modes
      v0.17.6 | Bun.serve error handlers for diagnostic 500 responses, install.cmd cache fix
      v0.17.5 | Fix VCS detection crash when p4 not installed, install script cache path fix
      v0.17.4 | Vault browser merged into Files tab, Kanagawa themes, Pi idle session tool fix
      v0.17.3 | Sticky lane repo/branch badge overflow fix
      v0.17.2 | Supply-chain hardening, sticky toolstrip and badges, overlay scrollbars, external annotation highlighting, Conventional Comments
      v0.17.1 | Pi PR review parity, parseRemoteUrl rewrite, cross-repo clone fixes, diff viewer flash fix
      v0.17.0 | AI code review agents, token-level annotation, merge-base diffs


      What's New in v0.19.0

      v0.19.0 lands updates across all four surfaces — Plan/Annotate, Code Review, Pi, and Claude Code. Four PRs, one from a first-time contributor.

      Plan / Annotate

      GitHub-Flavored Markdown

      The in-app reader now matches GitHub's rendering across blocks and inline. Raw HTML blocks (<details>, <summary>, and friends) render through marked plus DOMPurify, with nested markdown preserved; innerHTML is set imperatively via ref+useEffect so React reconciliation doesn't collapse an open <details> on rerender. GitHub alerts (> [!NOTE], [!TIP], [!WARNING], [!CAUTION], [!IMPORTANT]) render with inline Octicons and Primer colors, honoring prefers-color-scheme. Directive containers (:::kind ... :::) cover project-specific callouts, and every heading now carries a slug-derived anchor id.

      Inline gains came alongside: bare URL autolinks with trailing-punctuation trimming; @mentions and #issue-refs that render as clickable links when the repo is a GitHub repo and styled spans otherwise; 29 curated emoji shortcodes (:wave:, :rocket:, …); and smart punctuation (curly quotes, em and en dashes, ellipsis). All inline transforms run after the code-span regex has consumed code content, so backticks stay literal for shell and regex snippets.
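      As a sketch of the trailing-punctuation trimming described above (names here are illustrative, not plannotator's actual InlineMarkdown internals), the idea is to peel prose punctuation off the end of a bare URL while keeping a closing paren that the URL itself opened:

```typescript
// Hypothetical helper: trim trailing prose punctuation from a bare URL
// autolink, keeping balanced parentheses (Wikipedia-style URLs).
const TRAILING = /[.,;:!?)\]}'"]+$/;

function countChar(s: string, c: string): number {
  return s.split(c).length - 1;
}

function trimAutolink(raw: string): { url: string; rest: string } {
  const match = raw.match(TRAILING);
  if (!match) return { url: raw, rest: "" };
  let url = raw.slice(0, raw.length - match[0].length);
  let rest = match[0];
  // Keep a closing ")" when the URL contains an unmatched "(",
  // so links like https://en.wikipedia.org/wiki/Foo_(bar) stay whole.
  while (rest.startsWith(")") && countChar(url, "(") > countChar(url, ")")) {
    url += ")";
    rest = rest.slice(1);
  }
  return { url, rest };
}
```

      For example, `trimAutolink("see https://example.com.".slice(4))` would link `https://example.com` and leave the period as prose.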

      The refactor that landed with the feature is as important as the feature itself: InlineMarkdown, BlockRenderer, and the new block components were pulled out of Viewer.tsx into dedicated files. Viewer dropped from 1279 to 770 lines. New block features now land in blocks/*.tsx rather than swelling Viewer further. DOMPurify's allowlist blocks on* handlers, style attrs, and scripts; sanitizeLinkUrl strips javascript:, data:, vbscript:, and file: protocols. Total bundle cost: +1.8KB gzipped.
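      A minimal sketch of the protocol stripping attributed to sanitizeLinkUrl above (the signature and normalization details are assumptions; the real function may differ):

```typescript
// Reject URLs whose protocol could execute script or read local files.
// Browsers ignore embedded whitespace/control chars when parsing a
// protocol (e.g. "java\tscript:"), so normalize before checking.
const BLOCKED_PROTOCOLS = ["javascript:", "data:", "vbscript:", "file:"];

function sanitizeLinkUrl(href: string): string | null {
  const normalized = href.replace(/[\u0000-\u0020]/g, "").toLowerCase();
  for (const proto of BLOCKED_PROTOCOLS) {
    if (normalized.startsWith(proto)) return null;
  }
  return href;
}
```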

      Copy Table as Markdown or CSV

      Every markdown table now has a small toolbar. Copy the rendered table as markdown (round-trip ready for another document) or as RFC 4180 CSV (safe to paste into a spreadsheet — commas, quotes, and newlines are escaped per the spec). Useful when a plan or annotated doc includes a comparison table you want to extract.
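      The RFC 4180 escaping rule is small enough to sketch (function names are illustrative, not plannotator's API): a field is quoted only when it contains a comma, double quote, or newline, and embedded quotes are doubled.

```typescript
// RFC 4180 field escaping: quote fields containing delimiters or
// newlines; double any embedded double quotes.
function escapeCsvField(field: string): string {
  if (/[",\r\n]/.test(field)) {
    return `"${field.replace(/"/g, '""')}"`;
  }
  return field;
}

function toCsv(rows: string[][]): string {
  // RFC 4180 specifies CRLF between records.
  return rows.map((row) => row.map(escapeCsvField).join(",")).join("\r\n");
}
```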

      Table Popout

      Tables can also pop out into a dedicated overlay for wide or dense data that doesn't fit the reader flow. The popout gives the table its own scroll container, so you can read across columns without competing with the document's own scroll position. Cycle back to the inline table when you're done.

      • All three features shipped in #597

      Code Review

      Custom Settings Per Agent

      Every review agent now has first-class model and effort/reasoning controls. Claude Review exposes --model and --effort. Codex Review exposes -m, -c model_reasoning_effort=..., and -c service_tier=fast. Code Tour supports both engines, each with its own model and effort settings. The Settings dropdowns dropped the hidden "Default" option in favor of explicit sensible defaults — Opus 4.7 / High for Claude, GPT-5.3 Codex / High for Codex, Sonnet / Medium for Tour Claude, GPT-5.3 Codex / Medium for Tour Codex.

      Codex reasoning's invalid none option is gone (codex-rs rejects it); a one-shot cookie migration rewrites existing users' none values to the default on load so nobody keeps launching a broken flag.
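      The migration described is straightforward to sketch (the settings shape and key format are assumptions; "high" is the default the release names for Codex):

```typescript
// One-shot migration sketch: rewrite any persisted effort of "none"
// (rejected by codex-rs) to the default before first use.
type AgentSettings = { model: string; effort: string };

const DEFAULT_CODEX_EFFORT = "high";

function migrateCodexEffort(
  settings: Record<string, AgentSettings>
): Record<string, AgentSettings> {
  const out: Record<string, AgentSettings> = {};
  for (const [key, s] of Object.entries(settings)) {
    out[key] =
      s.effort === "none" ? { ...s, effort: DEFAULT_CODEX_EFFORT } : s;
  }
  return out;
}
```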

      Settings persistence moved to a single plannotator.agents cookie that holds the whole agent settings tree, keyed per (agent × model). Switching models reveals the effort/reasoning/fast-mode you last used with that specific model. React state is the authority; the cookie mirrors it; all mutations funnel through a single owner via functional setState, so rapid successive changes can't stale-read or lose writes.

      The job card badge now carries the full story — Claude · Opus 4.7 · High, Codex · GPT-5.3 Codex · Medium · Fast, Tour · Claude · Sonnet · Medium — and the main dropdown reads action-first: Code Review · Claude, Code Review · Codex, Code Tour.

      Code Tour

      Alongside Claude Review and Codex Review, Plannotator now ships a third review agent: Code Tour. Point it at a PR and it produces a guided walkthrough — greeting, stated intent, before/after framing, ordered stops with inline diff anchors, key takeaways, and a QA checklist — rendered in a three-page animated dialog. Similar in spirit to Cursor's and Graphite's PR walkthroughs, but wired into the same review surface you already use. Demos coming.

      The tour auto-opens when the job reaches a terminal state. Checklist state persists across dialog open/close within a review session, and pending saves are flushed on unmount with keepalive: true so closing the dialog during the 500ms debounce window never drops a tick.

      Both Claude and Codex can drive the tour. Claude streams JSONL via stdin; Codex writes to a file via --output-schema. If the model returns empty or malformed output, the job flips to failed with a clear error rather than silently 404ing the dialog. Under prefers-reduced-motion, page navigation swaps directly instead of waiting on an onAnimationEnd that would otherwise soft-lock the walkthrough behind the intro. The Claude allowlist permits gh issue view, gh api repos/*/*/issues/*, and glab issue view, so when the prompt follows a Fixes #123, the agent can actually read the linked issue.

      Under the hood, a shared createTourSession() factory owns the lifecycle — buildCommand, onJobComplete, getTour, saveChecklist — so the Bun hook server and the Pi Node server wire it up with about 25 lines of route glue each instead of the ~100 lines of duplicated provider-branch logic that the review agents used to carry. Route parity (GET /api/tour/:jobId, PUT /api/tour/:jobId/checklist) is enforced by tests across both runtimes.
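      Reconstructing only from the lifecycle names mentioned (buildCommand, onJobComplete, getTour, saveChecklist), the factory's shape is roughly this; the real types and storage are internal to plannotator:

```typescript
// Rough shape of a shared tour-session factory: one object owns the
// lifecycle, and each runtime (Bun hook server, Pi Node server) only
// adds route glue on top.
interface TourStore {
  tours: Map<string, unknown>;
  checklists: Map<string, boolean[]>;
}

function createTourSession(opts: {
  buildCommand: (jobId: string) => string[];
}) {
  const store: TourStore = { tours: new Map(), checklists: new Map() };
  return {
    buildCommand: opts.buildCommand,
    onJobComplete(jobId: string, tour: unknown) {
      store.tours.set(jobId, tour);
    },
    getTour(jobId: string): unknown | null {
      return store.tours.get(jobId) ?? null;
    },
    saveChecklist(jobId: string, ticks: boolean[]) {
      store.checklists.set(jobId, ticks);
    },
  };
}
```

      Each server would then map GET /api/tour/:jobId onto getTour and PUT /api/tour/:jobId/checklist onto saveChecklist, which is what makes route parity cheap to test across both runtimes.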

      • Custom settings and Code Tour shipped in #569

      Pi

      More Flexible Planning Mode

      The Pi extension used to require a single configured plan-file path — set once via --plan-file or /plannotator-set-file and stuck with it for the session. In practice this made multi-plan workflows awkward and confused the agent when a repo already had its own plan conventions. That whole layer is gone.

      plannotator_submit_plan gained a required filePath argument, and the agent now writes its plan as a markdown file anywhere inside the working directory, passing the path at submission. Validation enforces .md or .mdx extension, rejects .. traversal and absolute paths that escape cwd, and stat-checks the file before it's read. The planning write gate allows any markdown file inside cwd, and lastSubmittedPath tracks the most recent submission so the execution phase rebuilds correctly on session resume — including after a denial. The planning system prompt suggests (but doesn't require) PLAN.md at the repo root or plans/<short-name>.md.
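      A sketch of the validation rules as described (extension check, traversal and absolute-path rejection); the helper name is illustrative and the real implementation likely also stat-checks the file:

```typescript
import * as path from "node:path";

// Validate a submitted plan path: markdown only, and the resolved
// path must stay inside the working directory.
function validatePlanPath(filePath: string, cwd: string): string | null {
  // Enforce .md or .mdx extension.
  if (!/\.(md|mdx)$/i.test(filePath)) return null;
  // Resolving against cwd and checking the prefix rejects both ".."
  // traversal and absolute paths that escape cwd.
  const resolved = path.resolve(cwd, filePath);
  if (resolved !== cwd && !resolved.startsWith(cwd + path.sep)) return null;
  return resolved;
}
```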

      Since version history in ~/.plannotator/history/{project}/{slug}/ keys off plan content (first # Heading plus date) rather than file path, free-form naming keeps version linking intact.
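      A content-keyed history slot might be derived roughly like this (the slug rules plannotator actually uses may differ; this is only to illustrate why renaming the file doesn't break version linking):

```typescript
// Derive a history key from plan content: first "# Heading" plus date.
// Because the key ignores the file path, free-form naming keeps
// version linking intact.
function historyKey(markdown: string, date: string): string | null {
  const m = markdown.match(/^#\s+(.+)$/m);
  if (!m) return null;
  const slug = m[1]
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-|-$/g, "");
  return `${slug}-${date}`;
}
```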

      Breaking changes for Pi users: the --plan-file flag, the /plannotator-set-file slash command, and the file-path argument to /plannotator have been removed. Existing workflows that relied on them need to let the agent pick the path instead.

      Claude Code

      /plannotator-last with Multiple Sessions in the Same Directory

      /plannotator-last used to pick the wrong session whenever two Claude Code sessions shared a repo. When Plannotator is invoked from a slash command's ! bang, its direct parent (process.ppid) is the intermediate bash shell that the Bash tool spawned, not Claude Code itself. The old resolveSessionLogByPpid() always missed on that parent, and the mtime-based fallback picked whichever .jsonl in the project had been touched most recently, which was usually the other session.

      The fix is a four-tier resolution ladder. First, an ancestor-PID walk calls ps -o ppid= from process.ppid up to eight hops, checking ~/.claude/sessions/<pid>.json at each one; this matches the exact session deterministically. Second, a cwd-scan reads every session metadata file, filters by cwd, and picks the entry with the most recent startedAt — a better fallback than mtime when ps is unavailable. The legacy cwd-slug mtime check and ancestor directory walk remain as tiers three and four. 17 new tests cover the ladder with injectable process-tree and filesystem dependencies.
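      The first two tiers can be sketched with injectable dependencies, mirroring how the release describes the new tests; getParentPid and readSessionMeta stand in for `ps -o ppid=` and reads of ~/.claude/sessions/<pid>.json, and the exact names are assumptions:

```typescript
// Sketch of tiers 1 and 2 of the resolution ladder, with the process
// tree and filesystem injected so the walk is unit-testable.
interface SessionMeta {
  cwd: string;
  startedAt: number;
  logPath: string;
}

interface Deps {
  getParentPid: (pid: number) => number | null;   // ps -o ppid=
  readSessionMeta: (pid: number) => SessionMeta | null; // sessions/<pid>.json
  listSessionMetas: () => SessionMeta[];
}

function resolveSessionLog(
  startPid: number,
  cwd: string,
  deps: Deps
): string | null {
  // Tier 1: walk up to eight ancestors looking for an exact PID match.
  let pid: number | null = startPid;
  for (let hop = 0; hop < 8 && pid !== null; hop++) {
    const meta = deps.readSessionMeta(pid);
    if (meta) return meta.logPath;
    pid = deps.getParentPid(pid);
  }
  // Tier 2: most recently *started* session in this cwd (better than
  // mtime when ps is unavailable).
  const candidates = deps.listSessionMetas().filter((m) => m.cwd === cwd);
  candidates.sort((a, b) => b.startedAt - a.startedAt);
  return candidates[0]?.logPath ?? null;
}
```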

      Additional Changes

      • Prompts reference page. New /docs/reference/prompts/ page documents the three-layer message shape — system prompt owned by the CLI, user message (review prompt joined to user prompt with \n\n---\n\n), and JSON schema as a terminal constraint. Calls out that Claude and Codex review prompts are upstream-derived; only Tour's prompt is new (#569)
      • Motion library added. Code Tour's spring-driven accordions and intro composition cascade pulled in motion@12.38.0 (~30 KB gzipped) (#569)

      Install / Update

      macOS / Linux:

      curl -fsSL https://plannotator.ai/install.sh | bash
      

      Windows:

      irm https://plannotator.ai/install.ps1 | iex
      

      Claude Code Plugin: Run /plugin in Claude Code, find plannotator, and click "Update now".

      OpenCode: Clear cache and restart:

      rm -rf ~/.bun/install/cache/@plannotator
      

      Then in opencode.json:

      {
        "plugin": ["@plannotator/opencode@latest"]
      }
      

      Pi: Install or update the extension:

      pi install npm:@plannotator/pi-extension
      

      What's Changed

      • feat: Code Tour — guided PR walkthrough as a third agent provider by @backnotprop in #569
      • fix(pi): let agent submit any markdown plan file by path by @backnotprop in #595
      • feat(ui): markdown reader parity — HTML blocks, GitHub alerts, GFM inline extras by @backnotprop in #597
      • fix(session-log): walk ancestor PIDs to resolve correct session log by @elithompson in #598

      Contributors

      @elithompson authored the session-log ancestor-PID walk (#598), closing a long-standing issue where /plannotator-last picked the wrong session whenever two Claude Code sessions shared a repo. First contribution to the project.

      @blimmer diagnosed and reported the session-log bug (#458) with a detailed empirical walkthrough of the process tree, which made the fix straightforward to scope.

      Full Changelog : v0.18.0...v0.19.0

    3. 🔗 Simon Willison Is Claude Code going to cost $100/month? Probably not - it's all very confusing rss

      Anthropic today quietly (as in silently, no announcement anywhere at all) updated their claude.com/pricing page (but not their Choosing a Claude plan page, which shows up first for me on Google) to add this tiny but significant detail (arrow is mine, and it's already reverted):

      Screenshot of the Claude pricing grid - Compare features across plans. Free, Pro, Max 5x and Max 20x all have the same features, with the exception of Claude Code which is on Max only and Claude Cowork which is on Pro and Max only. An arrow highlights the Claude Code for Pro cross.

      The Internet Archive copy from yesterday shows a checkbox there. Claude Code used to be a feature of the $20/month Pro plan, but according to the new pricing page it is now exclusive to the $100/month or $200/month Max plans.

      Update: don't miss the update to this post, they've already changed course a few hours after this change went live.

      So what the heck is going on? Unsurprisingly, Reddit and Hacker News and Twitter all caught fire.

      I didn't believe the screenshots myself when I first saw them - aside from the pricing grid I could find no announcement from Anthropic anywhere. Then Amol Avasare, Anthropic's Head of Growth, tweeted:

      For clarity, we're running a small test on ~2% of new prosumer signups. Existing Pro and Max subscribers aren't affected.

      And that appears to be the closest we have had to official messaging from Anthropic.

      I don't buy the "~2% of new prosumer signups" thing, since everyone I've talked to is seeing the new pricing grid and the Internet Archive has already snapped a copy. Maybe he means that they'll only be running this version of the pricing grid for a limited time which somehow adds up to "2%" of signups?

      I'm also amused to see Claude Cowork remain available on the $20/month plan, because Claude Cowork is effectively a rebranded version of Claude Code wearing a less threatening hat!

      There are a whole bunch of things that are bad about this.

      If we assume this is indeed a test, and that test comes up negative and they decide not to go ahead with it, the damage has still been extensive:

      1. A whole lot of people got scared or angry or both that a service they relied on was about to be rug-pulled. There really is a significant difference between $20/month and $100/month for most people, especially outside of higher salary countries.
      2. The uncertainty is really bad! A tweet from an employee is not the way to make an announcement like this. I wasted a solid hour of my afternoon trying to figure out what had happened here. My trust in Anthropic's transparency around pricing - a crucial factor in how I understand their products - has been shaken.
      3. Strategically, should I be taking a bet on Claude Code if I know that they might 5x the minimum price of the product?
      4. More of a personal issue, but one I care deeply about myself: I invest a great deal of effort (that's 105 posts and counting) in teaching people how to use Claude Code. I don't want to invest that effort in a product that most people cannot afford to use.

      Last month I ran a tutorial for journalists on "Coding agents for data analysis" at the annual NICAR data journalism conference. I'm not going to be teaching that audience a course that depends on a $100/month subscription!

      This also doesn't make sense to me as a strategy for Anthropic. Claude Code defined the category of coding agents. It's responsible for billions of dollars in annual revenue for Anthropic already. It has a stellar reputation, but I'm not convinced that reputation is strong enough for it to lose the $20/month trial and jump people directly to a $100/month subscription.

      OpenAI have been investing heavily in catching up to Claude Code with their Codex products. Anthropic just handed them this marketing opportunity on a plate - here's Codex engineering lead Thibault Sottiaux:

      I don't know what they are doing over there, but Codex will continue to be available both in the FREE and PLUS ($20) plans. We have the compute and efficient models to support it. For important changes, we will engage with the community well ahead of making them.

      Transparency and trust are two principles we will not break, even if it means momentarily earning less. A reminder that you vote with your subscription for the values you want to see in this world.

      I should note that I pay $200/month for Claude Max and I consider it well worth the money. I've had periods of free access in the past courtesy of Anthropic but I'm currently paying full price, and happy to do so.

      But I care about the accessibility of the tools that I work with and teach. If Codex has a free tier while Claude Code starts at $100/month I should obviously switch to Codex, because that way I can use the same tool as the people I want to teach how to use coding agents.

      Here's what I think happened. I think Anthropic are trying to optimize revenue growth - obviously - and someone pitched making Claude Code only available for Max and higher. That's clearly a bad idea, but "testing" culture says that it's worth putting even bad ideas out to test just in case they surprise you.

      So they started a test, without taking into account the wailing and gnashing of teeth that would result when their test was noticed - or accounting for the longer-term brand damage that would be caused.

      Or maybe they did account for that, and decided it was worth the risk.

      I don't think that calculation was worthwhile. They're going to have to make a very firm commitment along the lines of "we heard your feedback and we commit to keeping Claude Code available on our $20/month plan going forward" to regain my trust.

      As it stands, Codex is looking like a much safer bet for me to invest my time in learning and building educational materials around.

      Update: they've reversed it already

      In the time I was typing this blog entry Anthropic appear to have reversed course - the claude.com/pricing page now has a checkbox back in the Pro column for Claude Code. I can't find any official communication about it though.

      Let's see if they can come up with an explanation/apology that's convincing enough to offset the trust bonfire from this afternoon!

      Update 2: it may still affect 2% of signups?

      Amol on Twitter:

      was a mistake that the logged-out landing page and docs were updated for this test [embedded self-tweet]

      Getting lots of questions on why the landing page / docs were updated if only 2% of new signups were affected.

      This was understandably confusing for the 98% of folks not part of the experiment, and we've reverted both the landing page and docs changes.

      So the experiment is still running, just not visible to the rest of the world?

      You are only seeing the long-form articles from my blog. Subscribe to /atom/everything/ to get all of my posts, or take a look at my other subscription options.

    4. 🔗 HexRaysSA/plugin-repository commits sync repo: +1 plugin, +2 releases rss
      sync repo: +1 plugin, +2 releases
      
      ## New plugins
      - [threatray](https://github.com/threatray/plugin-ida) (3.0.0)
      
      ## New releases
      - [ida-search](https://github.com/milankovo/ida-search): 0.2.2
      
    5. 🔗 badlogic/pi-mono v0.68.1 release

      New Features

      Added

      • Added built-in Fireworks provider support, including FIREWORKS_API_KEY setup/docs and the default Fireworks model accounts/fireworks/models/kimi-k2p6 (#3519)

      Fixed

      • Fixed interactive inline tool images to honor configurable terminal.imageWidthCells via /settings, so tool-output images are no longer hard-capped to 60 terminal cells (#3508)
      • Fixed sessionDir in settings.json to expand ~, so portable session-directory settings no longer require a shell wrapper (#3514)
      • Fixed parallel tool-call rows to leave the pending state as soon as each tool is finalized, while still appending persisted tool results in assistant source order (#3503)
      • Fixed exported session markdown to render Markdown while showing HTML-like message content such as <file name="...">...</file> verbatim, so shared sessions match the TUI instead of letting the browser interpret message text (#3484)
      • Fixed exported session HTML to render grep and find output through their existing TUI renderers and ls output through a native template renderer, avoiding missing formatting and spacing artifacts in shared sessions (#3491 by @aliou)
      • Fixed @ autocomplete fuzzy search to follow symlinked directories and include symlinked paths in results (#3507)
      • Fixed proxied agent streams to preserve the proxy-safe serializable subset of stream options, including session, transport, retry-delay, metadata, header, cache-retention, and thinking-budget settings (#3512)
      • Hardened Anthropic streaming against malformed tool-call JSON by owning SSE parsing with defensive JSON repair, replacing the deprecated fine-grained-tool-streaming beta header with per-tool eager_input_streaming, and updating stale test model references (#3175)
      • Fixed Bedrock runtime endpoint resolution to stop pinning built-in regional endpoints over AWS_REGION / AWS_PROFILE, restoring us.* and eu.* inference profile support after v0.68.0 while preserving custom VPC/proxy endpoint overrides (#3481, #3485, #3486, #3487, #3488)
    6. 🔗 anthropics/claude-code v2.1.117 release

      What's changed

      • Forked subagents can now be enabled on external builds by setting CLAUDE_CODE_FORK_SUBAGENT=1
      • Agent frontmatter mcpServers are now loaded for main-thread agent sessions via --agent
      • Improved /model: selections now persist across restarts even when the project pins a different model, and the startup header shows when the active model comes from a project or managed-settings pin
      • The /resume command now offers to summarize stale, large sessions before re-reading them, matching the existing --resume behavior
      • Faster startup when both local and claude.ai MCP servers are configured (concurrent connect now default)
      • plugin install on an already-installed plugin now installs any missing dependencies instead of stopping at "already installed"
      • Plugin dependency errors now say "not installed" with an install hint, and claude plugin marketplace add now auto-resolves missing dependencies from configured marketplaces
      • Managed-settings blockedMarketplaces and strictKnownMarketplaces are now enforced on plugin install, update, refresh, and autoupdate
      • Advisor Tool (experimental): dialog now carries an "experimental" label, learn-more link, and startup notification when enabled; sessions no longer get stuck with "Advisor tool result content could not be processed" errors on every prompt and /compact
      • The cleanupPeriodDays retention sweep now also covers ~/.claude/tasks/, ~/.claude/shell-snapshots/, and ~/.claude/backups/
      • OpenTelemetry: user_prompt events now include command_name and command_source for slash commands; cost.usage, token.usage, api_request, and api_error now include an effort attribute when the model supports effort levels. Custom/MCP command names are redacted unless OTEL_LOG_TOOL_DETAILS=1 is set
      • Native builds on macOS and Linux: the Glob and Grep tools are replaced by embedded bfs and ugrep available through the Bash tool — faster searches without a separate tool round-trip (Windows and npm-installed builds unchanged)
      • Windows: cached where.exe executable lookups per process for faster subprocess launches
      • Default effort for Pro/Max subscribers on Opus 4.6 and Sonnet 4.6 is now high (was medium)
      • Fixed Plain-CLI OAuth sessions dying with "Please run /login" when the access token expires mid-session — the token is now refreshed reactively on 401
      • Fixed WebFetch hanging on very large HTML pages by truncating input before HTML-to-markdown conversion
      • Fixed a crash when a proxy returns HTTP 204 No Content — now surfaces a clear error instead of a TypeError
      • Fixed /login having no effect when launched with CLAUDE_CODE_OAUTH_TOKEN env var and that token expires
      • Fixed prompt-input undo (Ctrl+_) doing nothing immediately after typing, and skipping a state on each undo step
      • Fixed NO_PROXY not being respected for remote API requests when running under Bun
      • Fixed rare spurious escape/return triggers when key names arrive as coalesced text over slow connections
      • Fixed SDK reload_plugins reconnecting all user MCP servers serially
      • Fixed Bedrock application-inference-profile requests failing with 400 when backed by Opus 4.7 with thinking disabled
      • Fixed MCP elicitation/create requests auto-cancelling in print/SDK mode when the server finishes connecting mid-turn
      • Fixed subagents running a different model than the main agent incorrectly flagging file reads with a malware warning
      • Fixed idle re-render loop when background tasks are present, reducing memory growth on Linux
      • [VSCode] Fixed "Manage Plugins" panel breaking when multiple large marketplaces are configured
      • Fixed Opus 4.7 sessions showing inflated /context percentages and autocompacting too early — Claude Code was computing against a 200K context window instead of Opus 4.7's native 1M
  2. April 21, 2026
    1. 🔗 r/reverseengineering Reversing The Gentlemen ransomware (Go/Garble) — ephemeral X25519 keys persist in go routine stacks, enabling full decryption. rss
    2. 🔗 r/LocalLLaMA Claude Code removed from Claude Pro plan - better time than ever to switch to Local Models. rss

      Time to switch to Kimi k2.6 guys if you haven't already. For $20 a month you can buy the OpenCode Go coding plan (it's actually $5 for the first month, then $10), which gives you many more tokens on models like Kimi K2.6, and then you can pay for the rest of the usage. So for $20 a month of Kimi K2.6 tokens you're basically getting the equivalent of the $100 plan. You can also use Qwen 3.6 35B A3B, which you can run on your local PC (as long as you have a decent graphics card).

      submitted by /u/bigboyparpa
      [link] [comments]

    3. 🔗 Simon Willison Where's the raccoon with the ham radio? (ChatGPT Images 2.0) rss

      OpenAI released ChatGPT Images 2.0 today, their latest image generation model. On the livestream Sam Altman said that the leap from gpt-image-1 to gpt-image-2 was equivalent to jumping from GPT-3 to GPT-5. Here's how I put it to the test.

      My prompt:

      Do a where's Waldo style image but it's where is the raccoon holding a ham radio

      gpt-image-1

      First as a baseline here's what I got from the older gpt-image-1 using ChatGPT directly:

      There's a lot going on, but I couldn't find a raccoon.

      I wasn't able to spot the raccoon - I quickly realized that testing image generation models on Where's Waldo style images (Where's Wally in the UK) can be pretty frustrating!

      I tried getting Claude Opus 4.7 with its new higher resolution inputs to solve it but it was convinced there was a raccoon it couldn't find thanks to the instruction card at the top left of the image:

      Yes — there's at least one raccoon in the picture, but it's very well hidden. In my careful sweep through zoomed-in sections, honestly, I couldn't definitively spot a raccoon holding a ham radio. [...]

      Nano Banana 2 and Pro

      Next I tried Google's Nano Banana 2, via Gemini:

      Busy Where's Waldo-style illustration of a park festival with crowds of people, tents labeled "FOOD & DRINK", "CRAFT FAIR", "BOOK NOOK", "MUSIC FEST", and "AMATEUR RADIO CLUB - W6HAM" (featuring a raccoon in a red hat at the radio table), plus a Ferris wheel, carousel, gazebo with band, pond with boats, fountain, food trucks, and striped circus tents

      That one was pretty obvious, the raccoon is in the "Amateur Radio Club" booth in the center of the image!

      Claude said:

      Honestly, this one wasn't really hiding — he's the star of the booth. Feels like the illustrator took pity on us after that last impossible scene. The little "W6HAM" callsign pun on the booth sign is a nice touch too.

      I also tried Nano Banana Pro in AI Studio and got this, by far the worst result from any model. Not sure what went wrong here!

      The raccoon is larger than everyone else, right in the middle of the image with an ugly white border around it.

      gpt-image-2

      With the baseline established, let's try out the new model.

      I used an updated version of my openai_image.py script, which is a thin wrapper around the OpenAI Python client library. Their client library hasn't yet been updated to include gpt-image-2 but thankfully it doesn't validate the model ID so you can use it anyway.

      Here's how I ran that:

      OPENAI_API_KEY="$(llm keys get openai)" \
        uv run https://tools.simonwillison.net/python/openai_image.py \
        -m gpt-image-2 \
        "Do a where's Waldo style image but it's where is the raccoon holding a ham radio"

      Here's what I got back. I don't think there's a raccoon in there - I couldn't spot one, and neither could Claude.

      Lots of stuff, a ham radio booth, many many people, a lake, but maybe no raccoon?

      The OpenAI image generation cookbook has been updated with notes on gpt-image-2, including the outputQuality setting and available sizes.

      I tried setting outputQuality to high and the dimensions to 3840x2160 - I believe that's the maximum - and got this - a 17MB PNG which I converted to a 5MB WEBP:

      OPENAI_API_KEY="$(llm keys get openai)" \
        uv run 'https://raw.githubusercontent.com/simonw/tools/refs/heads/main/python/openai_image.py' \
        -m gpt-image-2 "Do a where's Waldo style image but it's where is the raccoon holding a ham radio" \
        --quality high --size 3840x2160

      Big complex image, lots of detail, good wording, there is indeed a raccoon with a ham radio.

      That's pretty great! There's a raccoon with a ham radio in there (bottom left, quite easy to spot).

      The image used 13,342 output tokens, which are charged at $30/million so a total cost of around 40 cents.

      Takeaways

      I think this new ChatGPT image generation model takes the crown from Gemini, at least for the moment.

      Where's Waldo style images are an infuriating and somewhat foolish way to test these models, but they do help illustrate how good they are getting at complex illustrations combining both text and details.

      Update: asking models to solve this is risky

      rizaco on Hacker News asked ChatGPT to draw a red circle around the raccoon in one of the images in which I had failed to find one. Here's an animated mix of their result and the original image:

      The circle appears around a raccoon with a ham radio who is definitely not there in the original image!

      Looks like we definitely can't trust these models to usefully solve their own puzzles!


    4. 🔗 @binaryninja@infosec.exchange A lot of practical UI work landed in Binary Ninja 5.3. We replaced the old mastodon

      A lot of practical UI work landed in Binary Ninja 5.3. We replaced the old MachO slice selection flow with a dedicated picker, expanded Container Browser coverage across a wide range of container formats, and significantly extended command palette behavior. https://binary.ninja/2026/04/13/binary-ninja-5.3-jotunheim.html#ui

    5. 🔗 r/york The best decision I’ve ever made was moving to York rss
    6. 🔗 r/york York Mosque Community Kitchen | THURSDAY 23 APRIL 12:00 - 13:30. rss

      York Mosque Community Kitchen | THURSDAY 23 APRIL 12:00 - 13:30. | submitted by /u/LittleForm3711
      [link] [comments]

    7. 🔗 r/Leeds The Prodigy tomorrow rss

      So this is a long shot, but tomorrow night I'm off to see The Prodigy and Carl Cox at First Direct Arena. I'm a gig veteran, but tomorrow I'm flying solo without my usual gig friend, and as quite an anxious sorta dude I wondered if there were any other peeps going alone who may or may not want to gather up and share the experience?

      Not sure if this is allowed here, if not please remove. But thought I’d take my chances to not be the only loner there!

      I realise this post might seem a little sad lol, but why be alone if there are others going solo tomorrow? :D

      submitted by /u/ToyMachibe
      [link] [comments]

    8. 🔗 r/Harrogate Price of Wales Rdbout - Roadworks Again! rss

      I seriously think somebody is running a social experiment on the town now. I may be mistaken, but I think it's the third time it's been dug up in the last six months, after being totally resurfaced early last year.

      This comes directly after bringing Leeds Road to a standstill for the last three weeks.

      It’s either actively looking to reduce pressure on housing by putting people off living here, wanting to reduce tourist numbers, or, Occam’s Razor, they simply don’t have anywhere else to store the cones, portaloos and fencing.

      submitted by /u/Similar-Actuator-338
      [link] [comments]

    9. 🔗 r/Yorkshire Flamborough rss

      Flamborough | Few photos submitted by /u/Embarrassed-Air7202
      [link] [comments]

    10. 🔗 r/reverseengineering ida-mcp 2.2: From Tool Calls to Analysis Scripts rss
    11. 🔗 r/Yorkshire My mother in law, 90 today. rss
    12. 🔗 r/Yorkshire One of my favourite views of Richmond. rss
    13. 🔗 r/Leeds Best Chinese in LS17? rss

      It's my birthday today and I would like to partake in a succulent Chinese meal.

      submitted by /u/Row_Echelon_Form
      [link] [comments]

    14. 🔗 r/Yorkshire Stokesley is my base and getting out in the fields around home is great rss
    15. 🔗 r/wiesbaden Henkell 0.0% Vinothon starting in Rüdesheim on 25 April rss
    16. 🔗 r/york Want to give music a real go rss

      Hello, I am a 24M and want to really give my passion for singing and songwriting a proper go. I've been writing music and singing for over 4 years now but recently found the genre I want to go into. The vibe I love is Phoebe Bridgers, Noah Kahan, Novo Amor and Bon Iver (I know, a little sad). Is there anyone in the area who would maybe like to collab or help me produce some of the songs I have in the works?

      Thanks

      submitted by /u/Careless_Regret9883
      [link] [comments]

    17. 🔗 r/york F33 lf girl friends near York/Selby rss

      Looking for new girl friends near York and Selby area, I've lived here for a couple of years but never managed to get out or make any nearby friends!

      I'm really into gaming, gachas, anime and would love some like-minded friends to talk to or eventually meet up with.

      submitted by /u/SaintSixx
      [link] [comments]

    18. 🔗 r/Harrogate Improv Jam Session - Tonight rss

      Improv Jam Session - Tonight | Hi All, I run improv comedy sessions every couple of weeks in Harrogate. Our next one is tonight. They are very low pressure: we do some easy group warm-ups, followed by games and exercises. Our current sessions are aimed at beginners and improvers, so there has never been a better time to try it out. If you have any questions let me know. As a bonus, for first-time joiners your first session is free. Thanks. submitted by /u/GritstoneBoulderer
      [link] [comments]

    19. 🔗 r/york York woman, 86, convicted after car insurance typo rss
    20. 🔗 r/Harrogate Second hand furniture rss

      Moving to the area soon, and wondering where is best to look for second hand furniture, if there's any big stores or anything.

      Looking for things like a dining set, shelves, drawers, lamps etc.

      I've only had a quick look in St Michael's on Ripon road so far but didn't find much there.

      submitted by /u/brich0910
      [link] [comments]

    21. 🔗 r/wiesbaden Cafés zum Lernen rss

      Hello everyone,

      Are there any good cafés for studying in the city centre? Libraries are out for me, because I like to eat/drink/snack on the side, or chat when two of us are studying together. I've heard the café in the Hugendubel is supposed to be good, but are students welcome there? At Coffee Fellows people have apparently been given grief for sitting there too long with a laptop.

      Looking forward to your tips!

      submitted by /u/Hour_Inspector8601
      [link] [comments]

    22. 🔗 r/LocalLLaMA Unpopular opinion: OpenClaw and all its clones are almost useless tools for those who know what they're doing. It's kind of impressive for someone who has never used a CLI, Claude Code, Codex, etc. Nor used any workflow tool like n8n or Make. rss

      It seems to me that OpenClaw and all its clones are almost useless tools for those who know what they're doing.

      It's kind of impressive for someone who has never used a CLI, Claude Code, Codex, etc., nor any workflow tool like n8n or Make.

      For these people, asking an AI to create a program or a new tool with a prompt must seem like magic. For those who already use such tools, it seems like something that simplified the old ones but made them much more chaotic and unsafe.

      The only good thing about it is that it made more "ordinary" people interested in these agentic tools. Sending messages via Telegram is much more user-friendly.

      submitted by /u/pacmanpill
      [link] [comments]

    23. 🔗 r/york Recycling - only took cardboard? rss

      Morning all, just wondering if anyone else has had a situation either today or previously where their cardboard recycling was taken, but they've left the plastic, glass and tin? This is the whole street, not just us. We're Heworth area.

      (apologies this is a bit of a Facebook type of post, but I try to stay away from that nonsense platform. Don't want to get brainwashed into voting reform.)

      submitted by /u/Educational-Ground83
      [link] [comments]

    24. 🔗 r/LocalLLaMA Every time a new model comes out, the old one is obsolete of course rss
    25. 🔗 r/wiesbaden Board game players wanted rss

      I'm a big board game fanatic and regularly invite friends over. But since bigger games, let alone campaigns, rarely make it to the table because they're often too complex for some, I'm looking for people (ideally in their 20s) who are up for games like that!

      A few examples: Pandemic Legacy, Scythe, Descent, Ankh, Nemesis, Gaia Project

      submitted by /u/vivienskt
      [link] [comments]

    26. 🔗 r/LocalLLaMA Kimi K2.6 is a legit Opus 4.7 replacement rss

      After testing it and getting some customer feedback too, it's the first model I'd confidently recommend to our customers as an Opus 4.7 replacement.

      It's not really better than Opus 4.7 at anything, but it can do about 85% of the tasks that Opus can at a reasonable quality, and it has vision and very good browser use.

      I've been slowly replacing some of my personal workflows with Kimi K2.6 and it works surprisingly well, especially for long time horizon tasks.

      Sure, the model is monstrously big, but I think it shows that frontier LLMs like Opus 4.7 are not necessarily bringing anything new to the table. People are complaining about usage limits as well; it looks like local is the way to go.

      submitted by /u/bigboyparpa
      [link] [comments]

    27. 🔗 r/reverseengineering Detect It Easy 3.20 Program for determining types of files for Windows, Linux and MacOS. rss
    28. 🔗 Drew DeVault's blog Addressing the harassment rss

      Kiwi Farms is a web forum that facilitates the discussion and harassment of online figures and communities. Their targets are often subject to organized group trolling and stalking, as well as doxing and real-life harassment. Kiwi Farms has been tied to the suicides of three people who were victims of harassment by the website.

      Wikipedia: Kiwi Farms


      About three years ago, a thread on Kiwi Farms was opened about me. In the years since, it has grown to about 1,200 posts full of bigots responding to anything and everything I do online with scorn, slurs, and overt bigotry. The thread is full of resources to facilitate harassment, including, among other things, all of my social media profiles, past and present, a history of my residential addresses, my phone numbers, details about my family members, a list of my usernames and password hashes from every leaked database of websites I have accounts on, and so on. Most of my articles or social media posts are archived on Kiwi Farms and then subjected to the most bigoted rebuttals you can imagine. Honestly, it’s mostly just… pathetic. But it’s a problem when it escapes containment, and it’s designed to.

      Kiwi Farms is the most organized corner of the harassment which comes my way, but it comes in many forms. On Mastodon, for example, before I deleted my account I would often receive death threats, or graphic images and videos of violence against minorities. I have received a lot of hate and death threats over email, too, several of which I confess that I took some pleasure in forwarding to the sender’s employer.

      One of the motivations for this harassment is to “milk” me for “drama”. The idea is to get my hackles up, make me fearful for my safety, and alienate me from my communities, with the hope that it will trigger an entertaining meltdown. Many people respond poorly to this kind of harassment – that’s the idea, really – and it often makes the situation worse. Responding to it can legitimize the abuse, elevate it into the discourse, draw more attention to it, and stoke the flames. It can make the victim look bad when they respond emotionally to harassment designed to evoke negative emotions. I have left it unaddressed for a long time in order to subvert this goal, and address it now with a cool head in a relatively quiet period in the harassment campaign.

      The harassment waxes and wanes over time, usually picking up whenever I write a progressive blog post that gets some reach. It really took off after a series of incidents in which I called for the Hyprland community and its maintainers to be held to account for the bigotry and harassment on their Discord server (1, 2) and when I spoke out against Richard Stallman’s prolific and problematic public statements regarding the sexual abuse of minors (3).

      The abuse crescendoed in October of 2024, when I was involved in editing The Stallman Report. The report is a comprehensive analysis of Richard Stallman’s problematic political discourse regarding sexual harassment, sexual assault, and the sexual abuse of minors, and it depends almost entirely on primary sources – quotes from Stallman’s website which remain online and have not been retracted to this day. The purpose of the report was to make a clear and unassailable case for Stallman’s removal from positions of power, make specific recommendations to address the underlying problems, and to stimulate a period of reflection and reform in the FOSS community. It didn’t achieve much, in the end: the retaliation from Stallman’s defenders was fiercer and more devoted than the support from those who saw the report’s sense.

      Myself and the other authors asserted our moral rights to publish anonymously, motivated by our wish to reduce our exposure to the exact sort of harassment I’ve been subjected to over the years. However, I was careless in my opsec during the editing process, and it was possible to plausibly link me to the report as a result, leading to a sharp increase in harassment.


      This brings me to a retaliatory, defamatory “report” published about me in the style of the Stallman Report.1 This report is, essentially, a distillation of the Kiwi Farms thread on me, sanitized of overt bigotry and presented in a readily linkable form in order to stalk me around the internet and enable harassment. It’s used to discredit anything I do online and push for my exclusion from online communities, by dropping the link on Hacker News, Reddit, GitHub or Codeberg issues, etc, anywhere myself or my work is mentioned, or used to discredit the Stallman Report by discrediting one of its unmasked authors.2

      The report is pretty obviously written in bad faith and relies on a lot of poor arguments to make the case that I’m a misogynist and a pedophile, charges I deny. It also accuses me of being a hypocrite, which I acknowledge in general terms, because, well, who isn’t. The key thing I want people who encounter this report to keep in mind is that this is the “polite” face of an organized harassment campaign.

      Most reasonable readers easily dismiss the report because it is rather transparent in its bad faith. However, someone who reads it in good faith, just trying to do their due diligence, might come away from it with some reasonable concerns. Consider the following quote from my long-deleted Reddit account, /u/sircmpwn:

      I’m of the opinion that 14 year old girls should be required to have an IUD installed. Ten years of contraception that requires a visit to the doctor to remove prematurely.

      This comment was written 13 years ago, and I don’t stand by what I wrote. I was 19 at the time, and I was a moron. My mother had me when she was 23 years old, and the abuse I suffered at her hands during my childhood was severe, and I generalized this experience to all women. When I wrote this comment, I was one year removed from the abuse, living alone and in poverty, and early in a life-long process of coming to terms with the abuse and figuring out how to be a well-adjusted adult after 18 long years of abuse and isolation.

      But an explanation is not an excuse. This comment was reprehensible, as were many of the awful ideas I held at the time. Many years later, I can recognize that this comment is misogynistic, denies the agency of children and women over their own bodies, disparages the many, many mothers who do a wonderful job raising children in difficult circumstances, and is based in argumentation which can reasonably be related to eugenics. This comment was just awful – there’s a reason this was deleted. I apologize to anyone who read it at the time, or comes across it now, and is justifiably insulted.

      I don’t feel that it’s necessary to rebuke most of the report. But, there is a grain of truth in the report, the grain of truth that led me to retract my shitty Reddit comments and reflect on myself, and that grain of truth is this: in early adulthood, I was a huge asshole.


      I have had more than my fair share of harmful ignorance, bad takes, sexism and misogyny, transphobic and homophobic beliefs, and worse. Moreover, I have verbally abused many people and made many of my own arguments in bad faith to support bad conclusions. Some of the people who read this will recall having found themselves at the wrong end of my verbal abuse and harassment.

      It’s important for me to take responsibility for this period of my life, and in dismissing bad faith criticisms of myself to carefully avoid dismissing good faith criticisms in the same fell swoop.

      I’m not really sure how to deal with this part of my life appropriately. I have apologized to a few people individually, but it’s not a scalable solution and with many people I have no business re-opening wounds to salve my own conscience. I can offer a general apology, and I will. I’ve never found the right moment to say it, but now will do: I apologise, sincerely, to everyone who I have harmed with verbal abuse and with hateful and problematic rhetoric. If you have had a bad experience or experiences with me, and there’s anything you want from me that can help you heal from that experience – a personal apology, for example – please reach out to me and ask.

      That said, apologies alone aren’t enough. I believe in restorative justice, in growing and mending wounds and repairing harm done, and I set myself seriously to this task over many years. I have gone to therapy, spoken with close friends about it, and taken structural action as well: I have founded support groups and worked one-on-one with many of the people whose politics and behavior I object to. I want an amicable end to bigotry and bullying, for bigots and bullies like my former self to look forward to, to provide a path that doesn’t require them to double down. It’s not easy, and not everyone manages, but I have to look at myself and see the path I’ve taken and imagine that it’s possible, because what’s left for the likes of me if not?

      This part of my past brings me a great deal of shame, and that shame motivates me to grow as a person. In a certain sense, it is an ironic, cruel privilege to have had so much cause to reflect on myself, to drive me to question myself and my ideas, and become a much better person with much more defensible ideas. It has driven me to study feminism, social justice, racial justice, intersectionality, LGBTQ theory, antifascism, and to find the intersections in my own life and strive to act out of a more legitimate sense of justice.

      I’m often still a firebrand, but I’ve chosen much better hills to die on. My passion is invested in making a more just world, building safe and healthy communities, elevating my peers, and calling for justice and a just society. I have taken the lessons I have learned and tried to share them with other people, and to stand up for what I can now say I know is right, both online and in real life. Through a process of learning, reflection, and humility, I acknowledge that I have done a lot of harm in my youth. To repair this harm, I have committed myself to doing more than enough good now to make sure that the world is a better place when all is said and done. That’s what justice means to me when I turn my principles inwards and hold myself accountable.


      So where do we go from here?

      The response to my progressive beliefs and activism is reactionary backlash, doxing, harassment, and death threats targeting me and my family, all of which is likely to escalate in response to this post, and none of which is defensible. On the other hand, I understand that the consequence of my own reactionary past is, in some cases, alienation – and, honestly, fair enough.

      But I don’t want you to confuse my honest faults with the defamation and harassment I endure for standing up for my honest strengths. If you feel generous and optimistic about who I am today, and you recognize my growth, and wish for an ally in the fight for what’s right, your good faith and solidarity mean the world to me. I would appreciate it if you would express your support and rebuke harassment when you see it, and help keep me honest as I continue a life-long process of learning and growth.

      If I’ve hurt you, and you want to seek reconciliation, I make myself available to you for that purpose. If I’ve hurt you, and you simply don’t care to be hurt again, I’m sorry – I understand where you’re coming from, and have made my peace with it.

      Please send words of support and/or death threats to drew@ddevault.org.

      Thank you.

    29. 🔗 Baby Steps Symposium: community-oriented agentic development rss

      I'm very excited to announce the first release of the Symposium project as well as its inclusion in the Rust Foundation's Innovation Lab. Symposium’s goal is to let everyone in the Rust community participate in making agentic development better. The core idea is that crate authors should be able to vend skills, MCP servers, and other extensions, in addition to code. The Symposium tool then installs those extensions automatically based on your dependencies. After all, who knows how to use a crate better than the people who maintain it?

      If you want to read more details about how Symposium works, I refer you to the announcement post from Jack Huey on the main Symposium blog. This post is my companion post, and it is focused on something more personal - the reasons that I am working on Symposium.

      I believe in extensibility everywhere

      The short version is that I believe in extensibility everywhere. Right now, the Rust language does a decent job of being extensible: you can write Rust crates that offer new capabilities that feel built-in, thanks to proc-macros, traits, and ownership. But we're just getting started at offering extensibility in other tools, and I want us to hurry up!

      I want crate authors to be able to supply custom diagnostics. I want them to be able to supply custom lints. I want them to be able to supply custom optimizations. I want them to be able to supply custom IDE refactorings. And, as soon as I started messing around with agentic development, I wanted extensibility there too.

      Symposium puts crate authors in charge

      The goal of Symposium is to give crate authors, and the broader Rust community, the ability to directly influence the experience of people writing Rust code with agents. Rust is a really popular target language for agents because the type system provides strong guardrails and it generates efficient code - and I predict it's only going to become more popular.

      Despite Rust's popularity as an agentic coding target, the Rust community right now are basically bystanders when it comes to the experience of people writing Rust with agents; I want us to have a means of influencing it directly.

      Enter Symposium. With Symposium, crate authors can package up skills and other extensions, and Symposium will automatically make them available to your agent. Symposium also takes care of bridging the small-but-very-real gaps between agents (e.g., each has its own hook format, and some of them use .agents/skills while some use .claude/skills, etc).
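      That bridging step can be imagined as a small sync: here's a hypothetical sketch (the directory names .agents/skills and .claude/skills come from the post; the function name and everything else are made up for illustration) that mirrors one canonical skill directory into each agent-specific location:

```python
import shutil
from pathlib import Path

# Hypothetical sketch of the kind of bridging described above: mirror a
# canonical skills directory into each agent-specific skill directory.
# Only the directory names come from the post; the rest is illustrative.
AGENT_SKILL_DIRS = [".agents/skills", ".claude/skills"]

def mirror_skills(canonical: Path, project_root: Path) -> list[Path]:
    """Copy every skill file under `canonical` into each agent's skill dir."""
    written = []
    for agent_dir in AGENT_SKILL_DIRS:
        target = project_root / agent_dir
        target.mkdir(parents=True, exist_ok=True)
        for skill in canonical.rglob("*.md"):
            dest = target / skill.relative_to(canonical)
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(skill, dest)
            written.append(dest)
    return written
```

      The real tool presumably does more (per-agent hook formats, update synchronization), but the core idea is the same: one source of truth, fanned out to whatever layout each agent expects.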

      Example: the assert-struct crate

      Let me give you an example. Consider the assert-struct crate, recently created by Carl Lerche. assert-struct lets you write convenient assertions that test the values of specific struct fields:

      assert_struct!(val, _ {
          items: [1, 2, ..],
          tags: #("a", "b", ..),
          ..
      });
      

      The problem: agents don't know about it

      This crate is neat, but of course, no models are going to know how to use it - it's not part of their training set. They can figure it out by reading the docs, but that's going to burn more tokens (expensive, slow, consumes carbon), so that's not a great idea.

      You could teach the agent how to use it…

      In practice what people do today is to add skills to their project - for example, in his toasty crate, Carl has a testing skill that also shows how to use assert-struct. But it seems silly for everybody who uses the crate to repeat that content.

      …but wouldn't it be better if the crate could teach the agent itself?

      With Symposium, teaching your agent how to use your dependencies should not be necessary. Instead, your crates can publish their own skills or other extensions.

      The way this works is that the assert-struct crate defines the skill once, centrally, in its own repository1. Then there is a separate file in Symposium's central recommendations repository with a pointer to the assert-struct repository. Any time the assert-struct repository updates that skill, the updates are automatically synchronized for you. Neat! (You can also embed skills directly in the recommendations repository, but then updating them requires a PR to that repo.)

      Frequently asked questions

      How do I add support for my crate to Symposium?

      It's easy! Check out the docs here:

      https://symposium.dev/crate-authors/supporting-your-crate.html

      What kind of extensions does Symposium support?

      Skills, hooks, and MCP Servers, for now.

      Why does Symposium have a centralized repository?

      Currently we allow skill content to be defined in a decentralized fashion, but we require that a plugin be added to our central recommendations repository. This is a temporary limitation. We eventually expect to allow crate authors to add skills and plugins in a fully decentralized fashion.

      We chose to limit ourselves to a centralized repository early on for three reasons:

      • Even when decentralized support exists, a centralized repository will be useful, since there will always be crates that choose not to provide that support.
      • Having a central list of plugins will make it easy to update people as we evolve Symposium.
      • Having a centralized repository will help protect against malicious skills while we look for other mechanisms, since we can vet the crates that are added and easily scan their content.

      What if I want to add skills for crates private to my company? I don't want to put those in the central repository!

      No problem, you can add a custom plugin source.

      Are you aware of the negative externalities of LLMs?

      I am, very much so. I feel like a lot of the uses of LLMs we see today are not great: chat bots hijack conversational and social cues to earn trust that they don't deserve, and reconfirm people's biases instead of challenging their ideas. And I'm worried about the environmental cost of data centers and the way companies have retreated from their climate goals. And I don't like how centralized models concentrate economic power.2 So yeah, I see all that. And I also see how LLMs enable people to build things that they couldn't build before and help to make previously intractable problems soluble - and that includes more and more people who never thought of themselves as programmers3. My goal with Symposium and other projects is to be part of the solution, finding ways to leverage LLMs that are net positive: opening doors, not closing them.

      Extensibility: because everybody has something to offer

      Fundamentally, the reason I am working on Symposium is that I believe everybody has something unique to offer. I see the appeal of strongly opinionated systems that reflect the brilliant vision of a particular person. But to me, the most beautiful systems are the ones that everybody gets to build together4. This is why I love open source. This is why I love emacs5. It's why I love VSCode's extension system, which has so many great gems6.

      To me, Symposium is a double win in terms of empowerment. First, it makes agents extensible, which is going to give crate authors more power to support their crates. But it also helps make agentic programming better, which I believe will ultimately open up programming to a lot more people. And that is what it's all about.


      1. Actually as of this posting, the assert-struct skill is embedded directly in the recommendations repo. But I opened a PR to put it on assert-struct and I'll port it over once it lands. ↩︎

      2. I'm very curious to do more with open models. ↩︎

      3. Within Amazon, it's been amazing to watch how many people who never thought of themselves as software developers are starting to build software. Considering the challenges the software industry has with representation, I find this very encouraging. Diverse teams are stronger, better teams! ↩︎

      4. None of this is to say I don't believe in good defaults; there's a reason I use Zed and VSCode these days, and not emacs, much as I love it in concept. ↩︎

      5. OMG. One of my college friends wrote this amazing essay some time back on emacs. Next time you're doomscrolling on the toilet or whatever, pop over to this essay instead. Fair warning, it's long, so it'll take you a while to read, but I think it nails what people love about emacs. ↩︎

      6. These days I'm really enjoying Zed, but I have to say, I really miss kahole/edamagit! Which of course is inspired by the magit emacs package. ↩︎

  3. April 20, 2026
    1. 🔗 IDA Plugin Updates IDA Plugin Updates on 2026-04-20 rss

      IDA Plugin Updates on 2026-04-20

      Activity:

      • ida-chat-plugin
        • 883f9b35: Merge pull request #3 from joaquimbc/windows-cli
      • IDAPluginList
        • 3dcf0a61: chore: Auto update IDA plugins (Updated: 19, Cloned: 0, Failed: 0)
      • python-elpida_core.py
        • c4e01069: [HERMES-ROUTED] Phase 3 routing artifact 2026-04-20T23:45Z
        • 563e2ac9: [HERMES-ROUTED] Phase 3 routing artifact 2026-04-20T23:26Z
        • 2c045956: [HERMES-ROUTED] Phase 3 routing artifact 2026-04-20T23:06Z
        • 9eb03804: [HERMES-ROUTED] Phase 3 routing artifact 2026-04-20T22:47Z
        • db2376e6: [HERMES-ROUTED] Phase 3 routing artifact 2026-04-20T22:27Z
        • 0bf53343: [HERMES-ROUTED] Phase 3 routing artifact 2026-04-20T22:07Z
        • 506143ab: [HERMES-ROUTED] Phase 3 routing artifact 2026-04-20T21:49Z
        • 60091540: [HERMES-ROUTED] Phase 3 routing artifact 2026-04-20T21:28Z
        • e659ef33: [HERMES-ROUTED] Phase 3 routing artifact 2026-04-20T21:06Z
    2. 🔗 r/reverseengineering Wrote a Linux rootkit (DKOM, eBPF bypass) and a detector to find it — sharing both rss
    3. 🔗 r/york The perks of being a local - spontaneous trips in to create art! rss

      The perks of being a local - spontaneous trips in to create art! | submitted by /u/GalacticGoose1
      [link] [comments]

    4. 🔗 anthropics/claude-code v2.1.116 release

      What's changed

      • /resume on large sessions is significantly faster (up to 67% on 40MB+ sessions) and handles sessions with many dead-fork entries more efficiently
      • Faster MCP startup when multiple stdio servers are configured; resources/templates/list is now deferred to first @-mention
      • Smoother fullscreen scrolling in VS Code, Cursor, and Windsurf terminals — /terminal-setup now configures the editor's scroll sensitivity
      • Thinking spinner now shows progress inline ("still thinking", "thinking more", "almost done thinking"), replacing the separate hint row
      • /config search now matches option values (e.g. searching "vim" finds the Editor mode setting)
      • /doctor can now be opened while Claude is responding, without waiting for the current turn to finish
      • /reload-plugins and background plugin auto-update now auto-install missing plugin dependencies from marketplaces you've already added
      • Bash tool now surfaces a hint when gh commands hit GitHub's API rate limit, so agents can back off instead of retrying
      • The Usage tab in Settings now shows your 5-hour and weekly usage immediately and no longer fails when the usage endpoint is rate-limited
      • Agent frontmatter hooks now fire when running as a main-thread agent via --agent
      • Slash command menu now shows "No commands match" when your filter has zero results, instead of disappearing
      • Security: sandbox auto-allow no longer bypasses the dangerous-path safety check for rm/rmdir targeting /, $HOME, or other critical system directories
      • Fixed Devanagari and other Indic scripts rendering with broken column alignment in the terminal UI
      • Fixed Ctrl+- not triggering undo in terminals using the Kitty keyboard protocol (iTerm2, Ghostty, kitty, WezTerm, Windows Terminal)
      • Fixed Cmd+Left/Right not jumping to line start/end in terminals that use the Kitty keyboard protocol (Warp fullscreen, kitty, Ghostty, WezTerm)
      • Fixed Ctrl+Z hanging the terminal when Claude Code is launched via a wrapper process (e.g. npx, bun run)
      • Fixed scrollback duplication in inline mode where resizing the terminal or large output bursts would repeat earlier conversation history
      • Fixed modal search dialogs overflowing the screen at short terminal heights, hiding the search box and keyboard hints
      • Fixed scattered blank cells and disappearing composer chrome in the VS Code integrated terminal during scrolling
      • Fixed an intermittent API 400 error related to cache control TTL ordering that could occur when a parallel request completed during request setup
      • Fixed /branch rejecting conversations with transcripts larger than 50MB
      • Fixed /resume silently showing an empty conversation on large session files instead of reporting the load error
      • Fixed /plugin Installed tab showing the same item twice when it appears under Needs attention or Favorites
      • Fixed /update and /tui not working after entering a worktree mid-session
    5. 🔗 badlogic/pi-mono v0.68.0 release

      Breaking Changes

      • Changed SDK and CLI tool selection from cwd-bound built-in tool instances to tool-name allowlists. createAgentSession({ tools }) now expects string[] names such as "read" and "bash" instead of Tool[], --tools now allowlists built-in, extension, and custom tools by name, and --no-tools now disables all tools by default rather than only built-ins. Migrate SDK code from tools: [readTool, bashTool] to tools: ["read", "bash"] (#2835, #3452)
      • Removed prebuilt cwd-bound tool and tool-definition exports from @mariozechner/pi-coding-agent, including readTool, bashTool, editTool, writeTool, grepTool, findTool, lsTool, readOnlyTools, codingTools, and the corresponding *ToolDefinition values. Use the explicit factory exports instead, for example createReadTool(cwd), createBashTool(cwd), createCodingTools(cwd), and createReadToolDefinition(cwd) (#3452)
      • Removed ambient process.cwd() / default agent-dir fallback behavior from public resource helpers. DefaultResourceLoader, loadProjectContextFiles(), and loadSkills() now require explicit cwd/agent-dir style inputs, and exported system-prompt option types now require an explicit cwd. Pass the session or project cwd explicitly instead of relying on process-global defaults (#3452)

      Added

      • Added extension support for customizing the interactive streaming working indicator via ctx.ui.setWorkingIndicator(), including custom animated frames, static indicators, hidden indicators, a new working-indicator.ts example extension, and updated extension/TUI/RPC docs (#3413)
      • Added systemPromptOptions (BuildSystemPromptOptions) to before_agent_start extension events, so extensions can inspect the structured inputs used to build the current system prompt (#3473 by @dljsjr)
      • Added /clone to duplicate the current active branch into a new session, while keeping /fork focused on forking from a previous user message (#2962)
      • Added ctx.fork() support for position: "before" | "at" so extensions and integrations can branch before a user message or duplicate the current point in the conversation; the interactive clone/fork UX builds on that runtime support (#3431 by @mitsuhiko)
      • Added configurable keybinding ids for scoped model selector actions and tree filter actions, so those interactive shortcuts can be remapped in keybindings.json (#3343 by @mpazik)
      • Added PI_OAUTH_CALLBACK_HOST support for built-in OAuth login flows, allowing local callback servers used by pi auth to bind to a custom interface instead of hardcoded 127.0.0.1 (#3409 by @Michaelliv)
      • Added reason and targetSessionFile metadata to session_shutdown extension events, so extensions can distinguish quit, reload, new-session, resume, and fork teardown paths (#2863)

      Changed

      • Changed pi update to batch npm package updates per scope and run git package updates with bounded parallelism, reducing multi-package update time while preserving skip behavior for pinned and already-current packages (#2980)
      • Changed Bedrock session requests to omit maxTokens when model token limits are unknown and to omit temperature when unset, letting Bedrock use provider defaults and avoid unnecessary TPM quota reservation (#3400 by @wirjo)

      Fixed

      • Fixed AgentSession system-prompt option initialization to avoid constructing an invalid empty BuildSystemPromptOptions, so npm run check passes after cwd became mandatory.
      • Fixed shell-path resolution to stop consulting ambient process.cwd() state during bash execution, so session/project-specific shellPath settings now follow the active coding-agent session cwd instead of the launcher cwd (#3452)
      • Fixed ctx.ui.setWorkingIndicator() custom frames to render verbatim instead of forcing the theme accent color, so extensions now own working-indicator coloring when they customize it (#3467)
      • Fixed pi update reinstalling npm packages that are already at the latest published version by checking the installed package version before running npm install <pkg>@latest (#3000)
      • Fixed @ autocomplete plain queries to stop matching against the full cwd/base path, so path fragments in worktree names no longer crowd out intended results such as @plan (#2778)
      • Fixed built-in tool wrapping to use the same extension-runner context path as extension tools, so built-in tools receive execution context and read can warn when the current model does not support images (#3429)
      • Fixed openai-completions assistant replay to preserve compat.requiresThinkingAsText text-part serialization, avoiding same-model follow-up crashes when previous assistant messages mix thinking and text (#3387)
      • Fixed direct OpenAI Chat Completions sessions to map sessionId and cacheRetention to prompt caching fields, sending prompt_cache_key when caching is enabled and prompt_cache_retention: "24h" for direct api.openai.com requests with long retention (#3426)
      • Fixed OpenAI-compatible Chat Completions sessions to optionally send aligned session_id, x-client-request-id, and x-session-affinity headers from sessionId via compat.sendSessionAffinityHeaders, improving cache-affinity routing for backends such as Fireworks (#3430)
      • Fixed threaded /resume session relationships and current-session detection to canonicalize symlinked session paths during selector comparisons, so shared session directories no longer break parent-child matching or active-session delete protection (#3364)
      • Fixed /session, Sessions docs, and CLI help to consistently document that session reuse supports both file paths and session IDs, and that /session shows the current session ID (#3390)
      • Fixed Windows pnpm global install detection to recognize \\.pnpm\\ store paths, so update notices now suggest pnpm install -g @mariozechner/pi-coding-agent instead of falling back to npm (#3378)
      • Fixed missing @sinclair/typebox runtime dependency in @mariozechner/pi-coding-agent, so strict pnpm installs no longer fail with ERR_MODULE_NOT_FOUND when starting pi (#3434)
      • Fixed xterm uppercase typing in the interactive editor by decoding printable modifyOtherKeys input and normalizing shifted letter matching, so Shift+letter no longer disappears in pi (#3436)
      • Fixed /compact to reuse the session thinking level for compaction summaries instead of forcing high, avoiding invalid reasoning-effort errors on github-copilot/claude-opus-4.7 sessions configured for medium thinking (#3438)
      • Fixed shared/exported plain-text tool output to preserve indentation instead of collapsing leading whitespace in the web share page (#3440)
      • Fixed exported share pages to use browser-safe T and O shortcuts with clickable header toggles for thinking and tool visibility instead of browser-reserved Ctrl+T / Ctrl+O bindings (#3374 by @vekexasia)
      • Fixed skill resolution to dedupe symlinked aliases by canonical path, so pi config no longer shows duplicate skill entries when ~/.pi/agent/skills points to ~/.agents/skills (#3417 by @rwachtler)
      • Fixed OpenRouter request attribution to include Pi app headers (HTTP-Referer: https://pi.dev, X-OpenRouter-Title: pi, X-OpenRouter-Categories: cli-agent) when sessions are created through the coding-agent SDK and install telemetry is enabled (#3414)
      • Fixed custom-model compat schema/docs to support cacheControlFormat: "anthropic" for OpenAI-compatible providers that expose Anthropic-style prompt caching via cache_control markers (#3392)
      • Fixed Cloud Code Assist tool schemas to strip JSON Schema meta-declaration keys before provider translation, avoiding validation failures for tool-enabled sessions that use $schema, $defs, and related metadata (#3412 by @vladlearns)
      • Fixed direct Bedrock sessions to honor model.baseUrl as the runtime client endpoint, restoring support for custom Bedrock VPC or proxy routes (#3402 by @wirjo)
      • Fixed the edit tool to coerce stringified edits JSON before validation, so models that send the array payload as a JSON string no longer fall back to ad-hoc shell edits (#3370 by @dannote)
      • Fixed package manifest positive glob entries to expand before loading packaged resources, restoring manifest patterns such as skills/**/*.md (#3350 by @neonspectra)
    6. 🔗 r/Yorkshire Dean's Park, York rss
    7. 🔗 r/york Dean's Park rss
    8. 🔗 r/LocalLLaMA Gemma-4-E2B's safety filters make it unusable for emergencies rss

      I’ve been testing Google’s Gemma-4-E2B-it as a local, offline resource for emergency preparedness. The idea was to have a lightweight model that could provide basic technical or medical info if the internet goes down. As the screenshots show, the safety filters are so aggressive that the model is functionally useless for these scenarios. It issues a "hard refusal" on almost everything:

      • First Aid: Refused to explain an emergency airway procedure, even when specified as a last resort.
      • Water/Sanitation: Refused to provide chemical ratios for purifying water.
      • Maintenance: Refused basic mechanical help with a self-defense tool.
      • Food: Refused instructions on how to process livestock.

      In a scenario like a war or a total grid collapse, "Contact emergency services" isn't a valid answer. It's disappointing that an offline model, designed for portability, is programmed to withhold basic survival information under the guise of safety.

      submitted by /u/Unfounded_898

    9. 🔗 HexRaysSA/plugin-repository commits sync repo: +1 release rss
      
      ## New releases
      - [IDAssist](https://github.com/symgraph/IDAssist): 2.0.0
      
    10. 🔗 r/Leeds There was a campaign around 2010 to rename Leeds Bradford airport "Sir Jimmy Savile International" rss

      This was a big Facebook group and there was a petition to have it made official, especially in the weeks after he died. This was kick-started in the Yorkshire Evening Post - the original article in the Yorkshire Evening Post has been deleted (shocker) but some websites remain.

      submitted by /u/M_M_X_X_V

    11. 🔗 r/Leeds Leeds music rss

      What bands do you think really define Leeds’ sound? And are there any newer acts people are excited about at the moment?

      I put this map together a while ago and I’m thinking of updating it, so would be great to hear what people think, especially if there’s anything obvious I’ve missed or newer bands worth adding.

      submitted by /u/TheSenseOfDoubt

    12. 🔗 r/wiesbaden Hubschrauber (Helicopters) rss

      Does anyone know what the two helicopter flights this evening around 21:30 were about?

      They seemed quite a bit larger than a police or rescue helicopter and were clearly visible and audible over the Dichterviertel.

      Are planned flights from the US airbase announced or publicly documented anywhere?

      submitted by /u/Tisiphoni1

    13. 🔗 r/Yorkshire York cherry blossoms looking spectacular this year rss
    14. 🔗 @binaryninja@infosec.exchange Binary Ninja 5.3 (Jotunheim) adds new architecture APIs for full function mastodon

      Binary Ninja 5.3 (Jotunheim) adds new architecture APIs for full function level lifting. We are already using them for upcoming TMS320C6x work, and plugin authors should be able to put them to good use too. Also new: NDS32 and AArch64 ILP32 ABI updates. Check out the latest blog: https://binary.ninja/2026/04/13/binary-ninja-5.3-jotunheim.html#architecture --platform

    15. 🔗 r/wiesbaden Best Schnitzel in town and surrounding area? 🤤 rss
    16. 🔗 r/Leeds Harewood House, Gardens and Lake - 2025 rss

      Photographs captured by Samuel Greenwood.

      submitted by /u/Money_Pie_40

    17. 🔗 r/LocalLLaMA Kimi K2.6 rss

      submitted by /u/Fantastic-Emu-3819

    18. 🔗 r/LocalLLaMA Kimi K2.6 Released (huggingface) rss

      submitted by /u/BiggestBau5

    19. 🔗 r/Yorkshire Leeds man jailed over 140mph chase ending in roundabout crash rss

      submitted by /u/Kagedeah

    20. 🔗 sacha chua :: living an awesome life 2026-04-20 Emacs news rss

      I enjoyed reading Hot-wiring the Lisp machine (an adventure into modifying Org publishing). I'm also looking forward to debugging my Emacs Lisp better with timestamped debug messages and ert-play-keys. I hope you also find lots of things you like in the links below!

      Links from reddit.com/r/emacs, r/orgmode, r/spacemacs, Mastodon #emacs, Bluesky #emacs, Hacker News, lobste.rs, programming.dev, lemmy.world, lemmy.ml, planet.emacslife.com, YouTube, the Emacs NEWS file, Emacs Calendar, and emacs-devel. Thanks to Andrés Ramírez for emacs-devel links. Do you have an Emacs-related link or announcement? Please e-mail me at sacha@sachachua.com. Thank you!

      You can comment on Mastodon or e-mail me at sacha@sachachua.com.

    21. 🔗 r/LocalLLaMA When you dial in your bot’s personality rss

      sycophancy: deleted
      efficiency per token: +1000%
      friendship: just beginning

      edit: "sup" got cut off at top

      submitted by /u/technaturalism

    22. 🔗 r/Leeds Things to do in Leeds rss

      Going to Leeds on a work trip this week and staying for a full day. Can y'all recommend places to go, or your favourite food places?

      Thank you 💗

      submitted by /u/ConsciousBowler4019

    23. 🔗 r/Leeds Anyone lost a ferret? rss

      Seen along the canal near City Island. Seemed domesticated but kinda skinny.

      submitted by /u/fluxpeach

    24. 🔗 r/reverseengineering Reconstructing a Dead USB protocol: From Unknown Chip to Working Implementation rss
    25. 🔗 r/wiesbaden Moving to Wiesbaden rss

      Hello everyone

      I’m starting a new job in Wiesbaden this August and I desperately need an apartment.

      Currently I'm living near Freiburg.

      I don’t need a lot of space but I do have a dog which isn’t gonna make getting an apartment easy.

      Do you have any tips or suggestions for me?

      Thank you in advance!

      submitted by /u/Skoobdie

    26. 🔗 r/york Early spring at the Minster rss

      submitted by /u/RedDevilPlay

    27. 🔗 r/wiesbaden Hiking Wiesbaden/Mainz/Lorch rss

      Is anyone interested in hiking this Saturday? The weather is perfect. (Flexible route and time.) Lorch to Rewe to Lorchhausen: https://maps.app.goo.gl/K9NB4gg6NomsTvWs5

      submitted by /u/Ok-Muscle-9502

    28. 🔗 r/york York Mosque Community Kitchen | THURSDAY 23 APRIL 12:00 - 13:30. rss

      Welcome back to our neighbours & friends in r/York! York Mosque Community Kitchen will be back open on Thursday 23rd April between 12:00 and 13:30, where our dedicated volunteers will be cooking and serving two delicious dishes for lunch. We hope to see you there! Bring someone with you who's in need of a good meal and a friendly chat. Always free, everyone welcome!

      submitted by /u/YorkMosque-Kitchen

    29. 🔗 r/reverseengineering SASS King: reverse engineering NVIDIA SASS rss
    30. 🔗 r/wiesbaden Nix pflück! ("Don't pick!") rss
    31. 🔗 r/Harrogate The Neverending Harrogate Roadworks Tour has come to my street rss

      Which means I’m minorly inconvenienced for the next couple of days as there’s no parking on the street. The surface of roads isn’t really my forte, so can someone explain the issue with the road here? I’m fairly certain they resurfaced and repainted it last year, and it’s one of the few Harrogate roads with zero potholes.

      submitted by /u/kamasutramarkviduka

    32. 🔗 r/reverseengineering /r/ReverseEngineering's Weekly Questions Thread rss

      To reduce the amount of noise from questions, we have disabled self-posts in favor of a unified questions thread every week. Feel free to ask any question about reverse engineering here. If your question is about how to use a specific tool, or is specific to some particular target, you will have better luck on the Reverse Engineering StackExchange. See also /r/AskReverseEngineering.

      submitted by /u/AutoModerator

    33. 🔗 r/Yorkshire Throwback to 2023. Fountains Abbey hits different in the sun. rss

      Found this photo from three years ago. Fountains Abbey looking bright and the daffodils were just perfect. What’s your favourite spot for a spring walk? Is it looking like this yet?

      submitted by /u/Happy-Fox11

    34. 🔗 backnotprop/plannotator v0.18.0 release

      Follow @plannotator on X for updates


      Missed recent releases? Release | Highlights
      ---|---
      v0.17.10 | HTML and URL annotation, loopback binding by default, Safari scroll fix, triple-click fix, release pipeline smoke tests
      v0.17.9 | Hotfix: pin Bun to 1.3.11 for macOS binary codesign regression
      v0.17.8 | Configurable default diff type, close button for sessions, annotate data loss fix, markdown rendering polish
      v0.17.7 | Fix "fetch did not return a Response" error in OpenCode web/serve modes
      v0.17.6 | Bun.serve error handlers for diagnostic 500 responses, install.cmd cache fix
      v0.17.5 | Fix VCS detection crash when p4 not installed, install script cache path fix
      v0.17.4 | Vault browser merged into Files tab, Kanagawa themes, Pi idle session tool fix
      v0.17.3 | Sticky lane repo/branch badge overflow fix
      v0.17.2 | Supply-chain hardening, sticky toolstrip and badges, overlay scrollbars, external annotation highlighting, Conventional Comments
      v0.17.1 | Pi PR review parity, parseRemoteUrl rewrite, cross-repo clone fixes, diff viewer flash fix
      v0.17.0 | AI code review agents, token-level annotation, merge-base diffs
      v0.16.7 | Gemini CLI plan review, install script skills directory fix


      What's New in v0.18.0

      v0.18.0 adds focus & wide modes for annotate, first-class OpenCode detection, word-level inline plan diffs, content negotiation for URLs that publish Markdown (via Cloudflare), and inline color swatches in the plan viewer. 13 PRs, 7 from external contributors — 6 of them first-timers.

      Word-Level Inline Plan Diff

      The old plan diff stacked the full old block above the full new block whenever a paragraph was modified. A single word change showed the same paragraph twice with no visual cue to where the edit actually happened. Readers ended up comparing two nearly identical blocks line by line to find the delta.

      The new default Rendered mode performs a second-pass word diff on modified blocks and highlights only the changed tokens inline. A one-word reword now reads as a single paragraph with <ins> and <del> markers on exactly the changed words. Inline code spans, markdown links, and fenced code blocks are preserved as atomic units through a sentinel substitution pass, so diff markers can't split them.

      A third mode switcher tab, "Classic," keeps the legacy block-level stacked rendering for users who prefer it. Raw git-style output is unchanged. Modified blocks are click-to-annotate directly, with both the old and new content captured in the exported feedback so comments on struck-through words keep their context.

      Amber borders on modified blocks complete the green/red/yellow convention used by GitHub and VS Code.
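      As an illustrative sketch only (plannotator is not written in Python, and its actual diff code is not shown here), the second-pass word diff on a modified block can be approximated with Python's difflib, marking just the changed tokens inline:

```python
import difflib

def word_diff(old: str, new: str) -> str:
    """Re-diff a modified paragraph at word level, marking only changed tokens."""
    old_words = old.split()
    new_words = new.split()
    sm = difflib.SequenceMatcher(a=old_words, b=new_words)
    out = []
    for op, i1, i2, j1, j2 in sm.get_opcodes():
        if op == "equal":
            out.extend(new_words[j1:j2])
        else:
            if i1 < i2:  # tokens removed from the old text
                out.append("<del>" + " ".join(old_words[i1:i2]) + "</del>")
            if j1 < j2:  # tokens added in the new text
                out.append("<ins>" + " ".join(new_words[j1:j2]) + "</ins>")
    return " ".join(out)

print(word_diff("the quick brown fox", "the quick red fox"))
# -> the quick <del>brown</del> <ins>red</ins> fox
```

      A real implementation would additionally run the sentinel substitution pass described above, so code spans and links survive as atomic tokens.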

      Wide and Focus Modes

      Wide markdown tables were unreadable because both side panels (TOC on the left, annotations on the right) stayed fixed while the reader width was capped. Tables wrapped awkwardly or required horizontal scrolling inside a narrow column.

      Two new toggles sit above the document and next to the lightning-bolt action:

      • Wide hides both panels and removes the reader width cap. Wide tables and code fences get the full document area.
      • Focus hides both panels but keeps the normal reader width. Distraction-free reading without stretching the content.

      Enabling either mode collapses the left sidebar, hides the annotations panel and resize handle, and toggles the width cap accordingly. Exiting restores the exact previous layout, including which sidebar tab was open. Opening any sidebar or annotations panel automatically exits.

      Available in plan review, annotate, and linked-doc overlays. Archive mode and plan-diff view keep the standard layout.

      First-Class OpenCode Detection

      The origin detection chain in the hook server didn't include OpenCode. Every OpenCode invocation fell through to the claude-code default, which loaded the wrong UI variant: missing agent-switch toggle, wrong agent badge. The opencode origin key was already defined in AGENT_CONFIG with its badge styling in place, but the detection side was never wired up.

      OpenCode is now detected via OPENCODE=1, the canonical runtime flag set unconditionally by the OpenCode binary. The full priority order is:

      PLANNOTATOR_ORIGIN > Codex > Copilot CLI > OpenCode > Claude Code (default)
      

      The PLANNOTATOR_ORIGIN environment variable was documented in the source but never read. It now functions as an explicit override at the top of the chain, validated against AGENT_CONFIG so invalid values fall through to env-based detection instead of breaking.
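      The chain can be sketched as a sequence of environment checks (Python sketch; the detection signals for Codex and Copilot CLI are not given in these notes, so the predicates below are placeholders):

```python
AGENT_CONFIG = {"claude-code", "codex", "copilot-cli", "opencode"}  # illustrative keys

def is_codex(env):      # placeholder: the real Codex signal is not documented here
    return "CODEX" in env

def is_copilot(env):    # placeholder, same caveat
    return "COPILOT" in env

def detect_origin(env: dict) -> str:
    """Walk the priority chain, falling through on an invalid override."""
    override = env.get("PLANNOTATOR_ORIGIN")
    if override in AGENT_CONFIG:        # explicit, validated override wins
        return override
    if is_codex(env):
        return "codex"
    if is_copilot(env):
        return "copilot-cli"
    if env.get("OPENCODE") == "1":      # canonical OpenCode runtime flag
        return "opencode"
    return "claude-code"                # default
```

      Note how an unrecognized PLANNOTATOR_ORIGIN value simply falls through to env-based detection rather than raising.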

      Content Negotiation for Markdown-Serving URLs

      When you run plannotator annotate https://..., the tool goes through Jina Reader (or Turndown as a fallback) to convert HTML to markdown. But a growing number of sites — including Cloudflare's developer docs — now publish Markdown directly when you ask for it. Routing those through an HTML-to-markdown converter is wasteful and loses fidelity.

      URL annotation now tries Accept: text/markdown, text/html;q=0.9 first, with a 5-second timeout. If the server returns content-type: text/markdown, the response is used directly — one fetch, no conversion. If the server returns HTML or the request fails, it falls through silently to the existing Jina/Turndown pipeline. Local URLs skip negotiation entirely.

      A new content-negotiation source type is recorded on the result so the UI can indicate which path produced the content.
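      The routing decision reduces to a small dispatch on the response content type. A minimal sketch of that logic (no network, illustrative only):

```python
from typing import Optional

# Sent on the first attempt, with a 5-second timeout (per the release notes).
ACCEPT_HEADER = "text/markdown, text/html;q=0.9"

def choose_pipeline(content_type: Optional[str], fetch_failed: bool = False) -> str:
    """Pick the conversion path for a fetched URL."""
    if fetch_failed or content_type is None:
        return "jina/turndown"            # silent fallback on error or timeout
    if content_type.split(";")[0].strip().lower() == "text/markdown":
        return "content-negotiation"      # server sent Markdown: use it directly
    return "jina/turndown"                # HTML still goes through the converter
```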

      Hex Color Swatches in the Plan Viewer

      Frontend plans reference hex color values constantly — design tokens, Tailwind overrides, CSS variable assignments, component palette decisions. Reviewers had to mentally decode every #ff6600 or open a color picker to follow the author's intent.

      The plan viewer now renders a small filled swatch inline, immediately to the left of the hex code. The swatch is a 14×14 rounded square matching the referenced color. Supports 3-, 4-, 6-, and 8-digit hex with a negative lookahead that excludes URL anchors, CSS id selectors, and any identifier that continues with word characters.

      The color value is constrained by the regex before reaching React, and rendered via the React style object — not cssText — so there's no CSS injection path. 19 tests cover valid patterns, false-positive guards, and injection attempts.
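      One plausible shape for such a pattern (the exact regex plannotator ships is not reproduced here) is a hex alternation followed by a negative lookahead, so identifiers that continue with word characters or hyphens are rejected:

```python
import re

# Illustrative pattern: 3-, 4-, 6-, or 8-digit hex, rejected when the match
# is followed by a word character or hyphen (e.g. a CSS id selector whose
# prefix happens to be valid hex, like "#bad-id").
HEX_COLOR = re.compile(
    r"#(?:[0-9a-fA-F]{8}|[0-9a-fA-F]{6}|[0-9a-fA-F]{3,4})(?![\w-])"
)

def find_colors(text: str):
    return HEX_COLOR.findall(text)
```

      A production version needs more context than this sketch, e.g. to skip hex-looking fragments inside URLs.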

      Self-Hosted Paste Service Support

      Short-link sharing for larger plans routes through a paste service at plannotator-paste.plannotator.workers.dev. Self-hosted deployments had no way to point at their own paste service — the URL was hardcoded in the OpenCode plugin.

      The PLANNOTATOR_PASTE_URL environment variable now configures a custom paste endpoint. The OpenCode plugin reads it via a new getPasteApiUrl dependency that flows through command handlers (annotate, annotate-last, archive) and the review server. The Landing component accepts a shareBaseUrl prop with a fallback to the default. CORS documentation in the paste service now includes explicit guidance for self-hosters.

      Backward compatible: unset PLANNOTATOR_PASTE_URL continues to use the hosted default.

      OpenCode Review: Reuse the Existing Local Server

      On subsequent review commands, the OpenCode AI review path tried to start a second opencode serve and collided with the existing local server on port 4096. The first opencode serve wasn't being cleaned up, so port conflicts were guaranteed on the second invocation.

      The review flow now attaches to the default local OpenCode server at 127.0.0.1:4096 if one is already running. If nothing is listening, it spawns a new instance as before. No extra lifecycle management, no extra ports — just reuse what's already there.

      The PR also fixes two local-testing issues uncovered along the way: the source-loaded OpenCode plugin was resolving bundled HTML from the wrong directory, and the sandbox + postinstall paths were not using the documented plugins/ and commands/ directories.
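      The attach-or-spawn decision is essentially a port probe. A hedged Python sketch of the idea (the plugin itself is not Python, and the return labels are illustrative):

```python
import socket

def port_in_use(host: str, port: int, timeout: float = 0.5) -> bool:
    """True if something is already listening on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def attach_or_spawn() -> str:
    if port_in_use("127.0.0.1", 4096):
        return "attach"   # reuse the running opencode serve
    return "spawn"        # nothing listening: start a fresh instance
```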

      Additional Changes

      • ~ expansion in user-entered file paths — The shared path resolver now expands home-relative ~ in annotate entrypoints and the Bun and Pi reference handlers, so file, folder, vault, and linked-document paths all handle ~ consistently. #572 by @AlexanderKolberg
      • Thumbs-up quick label on the annotation toolbar — A one-click "Looks good" 👍 button sits before the existing quick labels menu, with green hover styling to match the semantic. #588 by @backnotprop
      • Save as PDF discoverability — The action menu label is now "Print / Save as PDF" with a subtitle explaining how to choose Save as PDF in the system print dialog. No new print pipeline — just making the existing capability findable. #587 by @backnotprop
      • Disable auto-invocation of plannotator slash commands in Claude Code — The four plannotator Claude Code command definitions (annotate, archive, last, review) now carry disable-model-invocation: true, preventing the model from running them automatically. #586 by @backnotprop
      • Stop forcing an agent cycle in OpenCode — agent_cycle assumed only a build and plan agent and broke when users had other agents defined. Removed. #564 by @andreineculau
      • RSS feed link in the marketing layout — The blog's RSS feed is now advertised in the shared <head> so feed readers and browsers can discover it automatically. #573 by @dotemacs

      Install / Update

      macOS / Linux:

      curl -fsSL https://plannotator.ai/install.sh | bash
      

      Windows PowerShell:

      irm https://plannotator.ai/install.ps1 | iex
      

      Pin a specific version:

      curl -fsSL https://plannotator.ai/install.sh | bash -s -- --version v0.18.0
      

      Claude Code Plugin: Run /plugin in Claude Code, find plannotator , and click "Update now".

      Copilot CLI:

      /plugin marketplace add backnotprop/plannotator
      /plugin install plannotator-copilot@plannotator
      

      Gemini CLI: The install script auto-detects ~/.gemini and configures hooks, policy, and slash commands.

      OpenCode: Clear cache and restart:

      rm -rf ~/.cache/opencode/packages/@plannotator ~/.bun/install/cache/@plannotator
      

      Then in opencode.json:

      {
        "plugin": ["@plannotator/opencode@latest"]
      }
      

      Pi: Install or update the extension:

      pi install npm:@plannotator/pi-extension
      

      VS Code Extension: Install from the VS Code Marketplace.


      What's Changed

      New Contributors

      Contributors

      @Pran-Ker shipped inline hex color swatches in the plan viewer, with a carefully constrained regex, a negative lookahead to avoid URL anchors and CSS selectors, and 19 tests including explicit injection guards.

      @andreineculau removed the agent_cycle call that assumed everyone had only build and plan agents in OpenCode, fixing a bug introduced by #40.

      @oorestisime fixed the OpenCode review port collision by reusing the existing local opencode serve at 127.0.0.1:4096 instead of spawning a second one, and cleaned up two local-testing path issues along the way.

      @AlexanderKolberg added ~ home-directory expansion to the shared path resolver so annotate entrypoints and the Bun and Pi reference handlers all treat ~/file.md the same way.

      @dotemacs added the RSS autodiscovery <link> to the marketing site layout so feed readers and browsers can pick up the blog feed automatically.

      @dgrissen2 returned with annotate wide mode — a toggle that collapses both side panels and removes the reader width cap, gated to annotate sessions only, with layout restoration on exit. This follows their prior work on linked-doc navigation, image lightboxing, smart file resolution, and the purple P favicon.

      @HeikoAtGitHub wired OpenCode into the origin detection chain (via OPENCODE=1) and activated the PLANNOTATOR_ORIGIN override that had been documented but never read, with seven headless detection tests covering the new priority order.

      Community issue reporters:

      • @pbowyer filed #560 with a detailed request for word-level diffs and diff display options — that issue directly shaped the design of the new Rendered/Classic/Raw mode switcher.
      • @ndesjardins-comact reported #580, the hardcoded share URL blocking custom-domain usage, which drove the PLANNOTATOR_PASTE_URL work.
      • @alexey-igrychev reported both #513 (the opencode serve port collision) and #514 (empty response bubbles in the OpenCode AI tab).

      Full Changelog : v0.17.10...v0.18.0

    35. 🔗 matklad 256 Lines or Less: Test Case Minimization rss

      256 Lines or Less: Test Case Minimization

      Apr 20, 2026

      Property Based Testing and fuzzing are a deep and science-intensive topic. There are enough advanced techniques there for a couple of PhDs, a PBT daemon, and a client-server architecture. But I have this weird parlor-trick PBT library, implementable in a couple of hundred lines of code in one sitting.

      This week I’ve been thinking about a cool variation of a consensus algorithm. I implemented it on the weekend. And it took just a couple of hours to write a PBT library itself first, and then a test that showed a deep algorithmic flaw in my thinking (after a dozen trivial flaws in my coding). So, I don’t get to write more about consensus yet, but I at least can write about the library. It is very simple, simplistic even. To use an old Soviet joke about Babel and Bebel, it’s Gogol rather than Hegel. But for just 256 lines, it’s one of the highest power-to-weight ratio tools in my toolbox.

      Read this post if:

      • You want to stretch your generative testing muscles.
      • You are a do-it-yourself type, and wouldn’t want to pull a ginormous PBT library off the shelf.
      • You would pull a library, but want to have a more informed opinion about available options, about essential and accidental complexity.
      • You want some self-contained real-world Zig examples :P

      Zig works well here because it, too, is exceptional in its power-to-weight.

      FRNG

      The implementation is a single file, FRNG.zig, because the core abstraction here is a Finite Random Number Generator — a PRNG where all numbers are pre-generated, and can run out. We start with standard boilerplate:

      const std = @import("std");
      const assert = std.debug.assert;
      
      entropy: []const u8,
      
      pub const Error = error{OutOfEntropy};
      
      const FRNG = @This();
      
      pub fn init(entropy: []const u8) FRNG {
          return .{ .entropy = entropy };
      }
      

      In Zig, files are structs: you obviously need structs, and the language becomes simpler if structs are re-used for what files are. In the above, const FRNG = @This() assigns a conventional name to the file struct, and entropy: []const u8 declares instance fields (only one here). const Error and fn init are “static” (container-level) declarations.

      The only field we have is just a slice of raw bytes, our pre-generated random numbers. And the only error condition we can raise is OutOfEntropy.

      The simplest thing we can generate is a slice of bytes. Typically, the API for this takes a mutable slice as an out parameter:

      pub fn fill(prng: *PRNG, bytes: []u8) void { ... }
      

      But, due to the pre-generated nature of FRNG, we can return the slice directly, provided that we have enough entropy. This is going to be our (sole) basis function; everything else is going to be a convenience helper on top:

      pub fn bytes(frng: *FRNG, size: usize) Error![]const u8 {
          if (frng.entropy.len < size) return error.OutOfEntropy;
          const result = frng.entropy[0..size];
          frng.entropy = frng.entropy[size..];
          return result;
      }
      

      The next simplest thing is an array (a slice with a fixed size):

      pub fn array(frng: *FRNG, comptime size: usize) Error![size]u8 {
          return (try frng.bytes(size))[0..size].*;
      }
      

      Notice how Zig goes from a runtime-known slice length to a comptime-known array type. Because size is a comptime constant, slicing []const u8 with [0..size] returns a pointer to an array, *const [size]u8.

      We can re-interpret a 4-byte array as a u32. But, because this is Zig, we can trivially generalize the function to work for any integer type, by passing in an Int comptime parameter of type type:

      const builtin = @import("builtin");
      
      pub fn int(frng: *FRNG, Int: type) Error!Int {
          comptime {
              assert(@typeInfo(Int).int.signedness == .unsigned);
              assert(builtin.cpu.arch.endian() == .little);
          }
          return @bitCast(try frng.array(@sizeOf(Int)));
      }
      

      This function is monomorphised for every Int type, so @sizeOf(Int) becomes a compile-time constant we can pass to fn array.

      Production code would be endian-clean here, but, for simplicity, we encode our endianness assumption as a compile-time assertion. Note how Zig communicates information about endianness to the program. There isn’t any kind of side-channel or extra input to compilation, like --cfg flags. Instead, the compiler materializes all information about the target CPU as Zig code. There’s a builtin.zig file somewhere in the compiler’s cache directory that contains

      pub const cpu: std.Target.Cpu = .{
          .arch = .aarch64,
          .model = &std.Target.aarch64.cpu.apple_m3,
          // ...
      };
      

      This file can be accessed via @import("builtin") and all the constants inspected at compile time.

      We can make an integer, and a boolean is even easier:

      pub fn boolean(frng: *FRNG) Error!bool {
          return (try frng.int(u8)) & 1 == 1;
      }
      

      Strictly speaking, we only need one bit, not one byte, but tracking individual bits is too much of a hassle.

      From an arbitrary int, we can generate an int in a range. As per Random Numbers Included, we use a closed range, which makes the API infallible and is usually more convenient at the call-site:

      pub fn int_inclusive(frng: *FRNG, Int: type, max: Int) Error!Int
      

      As a bit of PRNG trivia: while this could be implemented as frng.int(Int) % (max + 1), the result would be biased (not uniform). Consider the case where Int = u8, and a call like frng.int_inclusive(u8, 64 * 3).

      The numbers in 0..64 are going to be twice as likely as the numbers in 64..(64*3), because the last quarter of the 256-value range is aliased with the first one.
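      The bias is easy to check mechanically. A quick sanity check (in Python, for brevity; this is my illustration, not part of the library) counting preimages under a plain modulo reduction:

      ```python
      # Count how often each result appears when a uniform u8 is reduced
      # with a plain modulo. With 192 buckets, 256 inputs can't spread
      # evenly: the first 64 results get two preimages each, the rest one.
      from collections import Counter

      counts = Counter(x % 192 for x in range(256))

      assert counts[0] == 2    # 0 and 192 both land on 0
      assert counts[63] == 2   # 63 and 255 both land on 63
      assert counts[64] == 1   # 64..191 each have a single preimage
      ```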

      Generating an unbiased number is tricky and might require drawing an arbitrary number of bytes from the entropy. Refer to https://www.pcg-random.org/posts/bounded-rands.html for details. I didn’t, and copy-pasted code from the Zig standard library. Use at your own risk!

      pub fn int_inclusive(frng: *FRNG, Int: type, max: Int) Error!Int {
          comptime assert(@typeInfo(Int).int.signedness == .unsigned);
          if (max == std.math.maxInt(Int)) return try frng.int(Int);
      
          const bits = @typeInfo(Int).int.bits;
          const less_than = max + 1;
      
          var x = try frng.int(Int);
          var m = std.math.mulWide(Int, x, less_than);
          var l: Int = @truncate(m);
          if (l < less_than) {
              var t = -%less_than;
      
              if (t >= less_than) {
                  t -= less_than;
                  if (t >= less_than) t %= less_than;
              }
              while (l < t) {
                  x = try frng.int(Int);
                  m = std.math.mulWide(Int, x, less_than);
                  l = @truncate(m);
              }
          }
          return @intCast(m >> bits);
      }
      

      Now we can generate an int bounded from above and below:

      pub fn range_inclusive(
          frng: *FRNG, Int: type,
          min: Int, max: Int,
      ) Error!Int {
          comptime assert(@typeInfo(Int).int.signedness == .unsigned);
          assert(min <= max);
          return min + try frng.int_inclusive(Int, max - min);
      }
      

      Another common operation is picking a random element from a slice. If you want to return a pointer to an element, you’ll need const and mut versions of the function. A simpler and more general solution is to return an index:

      pub fn index(frng: *FRNG, slice: anytype) Error!usize {
          assert(slice.len > 0);
          return try frng.range_inclusive(usize, 0, slice.len - 1);
      }
      

      At the call site, xs[try frng.index(xs)] doesn’t look too bad, is appropriately const-polymorphic, and is also usable for multiple parallel arrays.

      Simulation

      So far, we’ve spent about 40% of our line budget implementing a worse random number generator that can fail with OutOfEntropy at any point in time. What is it good for?

      We use it to feed our system under test with random inputs, see how it reacts, and check that it does not crash. If we code our system to crash if anything unexpected happens and our random inputs cover the space of all possible inputs, we get a measure of confidence that bugs will be detected in testing.

      For my consensus simulation, I have a World struct that holds a FRNG and a set of replicas:

      const World = struct {
          frng: *FRNG,
          replicas: []Replica,
          // ...
      };
      

      World has methods like:

      fn simulate_request(world: *World) !void {
          const replica = try world.frng.index(world.replicas);
          const payload = try world.frng.int(u64);
      
          world.send_payload(replica, payload);
      }
      

      I then select which method to call at random:

      fn step(world: *World) !void {
          const action = try world.frng.weighted(.{
              .request = 10,
              .message = 20,
              .crash = 1,
          });
          switch (action) {
              .request => try world.simulate_request(),
              .message => { ... },
              .crash => { ... },
          }
      }
      

      Here, fn weighted is another FRNG helper that selects an action at random, proportional to its weight. This helper needs quite a bit more reflection machinery than we’ve seen so far:

      pub fn weighted(
          frng: *FRNG,
          weights: anytype,
      ) Error!std.meta.FieldEnum(@TypeOf(weights)) {
          const fields =
              comptime std.meta.fieldNames(@TypeOf(weights));
      
          var total: u32 = 0;
          inline for (fields) |field|
              total += @field(weights, field);
          assert(total > 0);
      
          var pick = try frng.int_inclusive(u64, total - 1);
          inline for (fields) |field| {
              const weight = @field(weights, field);
              if (pick < weight) {
                  return @field(
                      std.meta.FieldEnum(@TypeOf(weights)),
                      field,
                  );
              }
              pick -= weight;
          }
          unreachable;
      }
      

      weights: anytype is compile-time duck-typing. It means that our weighted function is callable with any type, and each specific type creates a new monomorphised instance of the function. While we don’t explicitly name the type of weights, we can get it as @TypeOf(weights).

      FieldEnum is a type-level function that takes a struct type:

      const S = struct {
          foo: bool,
          bar: u32,
          baz: []const u8
      };
      

      and turns it into an enum type, with a variant per-field, exactly what we want for the return type:

      const E = enum { foo, bar, baz };
      

      Tip: if you want to quickly learn Zig’s reflection capabilities, study the implementation of std.meta and std.enums in Zig’s standard library.

      The @field built-in function accesses a field given comptime field name. It’s exactly like Python’s getattr / setattr with an extra restriction that it must be evaluated at compile time.
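      That analogy can be made concrete in a few lines of Python (illustrative only; the Weights class is a made-up stand-in, and Python resolves the names at runtime where Zig requires comptime):

      ```python
      # getattr/setattr are the runtime analogue of Zig's @field: both
      # access a field by its name, but Zig insists the name be comptime.
      class Weights:
          request = 10
          message = 20
          crash = 1

      assert getattr(Weights, "request") == 10
      setattr(Weights, "crash", 2)
      assert Weights.crash == 2
      ```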

      To add one more twist here, I always find it hard to figure out which weights are reasonable, and like to generate the weights themselves at random at the start of the test:

      pub fn swarm_weights(frng: *FRNG, Weights: type) Error!Weights {
          var result: Weights = undefined;
          inline for (comptime std.meta.fieldNames(Weights)) |field| {
              @field(result, field) = try frng.range_inclusive(u32, 1, 100);
          }
          return result;
      }
      

      (If you feel confused here, check out Swarm Testing Data Structures)

      Stepping And Running

      Now we have enough machinery to describe the overall shape of the test:

      fn run_test(gpa: Allocator, frng: *FRNG) !void {
          var world = World.init(gpa, frng) catch |err|
              switch (err) {
                  error.OutOfEntropy => return,
                  else => return err,
              };
          defer world.deinit(gpa);
      
          while (true) {
              world.step() catch |err| switch (err) {
                  error.OutOfEntropy => break,
              };
          }
      }
      
      const World = struct {
          frng: *FRNG,
          weights: ActionWeights,
      
          // ...
      
          const ActionWeights = struct {
              request: u32,
              message: u32,
              crash: u32,
              // ...
          };
      
          pub fn init(gpa: Allocator, frng: *FRNG) !World {
              const weights = try frng.swarm_weights(ActionWeights);
              // ...
          }
      
          fn step(world: *World) error{OutOfEntropy}!void {
              const action = try world.frng.weighted(world.weights);
              switch (action) {
                  .request => { ... },
                  // ...
              }
          }
      };
      

      A test needs an FRNG (which ultimately determines the outcome) and a General Purpose Allocator for the World. We start by creating a simulated World with random action weights. If FRNG entropy is very low, we can run out of entropy even at this stage. We assume that the code is innocent until proven guilty — if we don’t have enough entropy to find a bug, this particular test returns success. Don’t worry, we’ll make sure that we have enough entropy elsewhere.

      We use catch |err| switch(err) to peel off the OutOfEntropy error. I find that, whenever I handle errors in Zig, very often I want to discharge just a single error from the error set. I wish I could use parentheses with a catch:

      // NOT ACTUALLY ZIG :(
      
      var world = try World.init(gpa, &frng)
          catch (error.OutOfEntropy) return;
      

      Anyway, having created the World, we step through it while we still have entropy left. If any step detects an internal inconsistency, the entire World crashes with an assertion failure. If we get to the end of the while (true) loop, we know that at least that particular slice of entropy didn’t uncover anything suspicious.

      Notice what isn’t there. We aren’t generating a complete list of actions up-front. Rather, we make random decisions as we go, and can freely use the current state of the World to construct a menu of possible choices (e.g., when sending a message, we can consider only replicas that aren’t currently crashed).

      Binary Search the Answer

      And here we can finally see the reason why we bothered writing a custom Finite PRNG rather than using an off-the-shelf one. The amount of entropy in FRNG defines the complexity of the test. The fewer random bytes we start with, the faster we exit the step loop. And this gives us the ability to minimize test cases essentially for free.

      Suppose you know that a particular entropy slice makes the test fail (the cluster enters split brain at the millionth step). Let’s say that the slice was 16KiB. The obvious next step is to see if just 8KiB would be enough to crash it. And, if 8KiB isn’t, then perhaps 12KiB?

      You can *binary search* the minimal amount of entropy that’s enough for the test to fail. And this works for any test; it doesn’t have to be a distributed system. If you can write the code to generate your inputs randomly, you can measure the complexity of each particular input by counting how many random bytes were drawn in its construction.

      And now the hilarious part — of course it seems that the way to minimize entropy is to start with a particular failing slice and apply genetic-algorithm mutations to it. But a much simpler approach seems to work in practice — just generate a fresh, shorter entropy slice. If you found some failure at random, then you should be able to randomly stumble into a smaller failing example, if one exists — there are far fewer small examples, so finding a failing one becomes easier as the size goes down!
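      The minimization strategy itself fits in a few lines. A Python sketch (my illustration, not the post’s Zig code, with a toy stand-in for the real SUT that “fails” whenever it sees at least 40 bytes of entropy):

      ```python
      import random

      def fails(size, seed):
          # Toy stand-in for the real SUT (hypothetical threshold):
          # pretend any run with >= 40 bytes of entropy finds the bug.
          return size >= 40

      def min_failing_size(attempts=10, size_max=1024):
          # Grow the size until a failure shows up, then bisect down --
          # a simplified cousin of the step-doubling search shown later.
          lo, hi, size = 0, None, 1
          while size <= size_max:
              if any(fails(size, random.getrandbits(64)) for _ in range(attempts)):
                  hi = size
                  break
              lo, size = size, size * 2
          if hi is None:
              return None  # never failed within the budget
          while lo + 1 < hi:
              mid = (lo + hi) // 2
              if any(fails(mid, random.getrandbits(64)) for _ in range(attempts)):
                  hi = mid
              else:
                  lo = mid
          return hi

      assert min_failing_size() == 40
      ```

      Because passing at a size is only probabilistic, each size is retried attempts times; the result is the smallest size at which some fresh seed failed.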

      The Searcher

      The problem with binary searching for failing entropy is that a tripped assertion crashes the program. There’s no unwinding in Zig. For this reason, we’ll move the search code to a different process. So a single test will be a binary with a main function that takes entropy on stdin.

      Zig’s new juicy main makes writing this easier than in any previous version of Zig :D

      pub fn main(init: std.process.Init) !void {
          const gpa = init.gpa;
          const io = init.io;
      
          var stdin_reader = std.Io.File.stdin().reader(io, &.{});
          const entropy = try stdin_reader.interface
              .allocRemaining(gpa, .unlimited);
          defer gpa.free(entropy);
      
          var frng = FRNG.init(entropy);
      
          var world = World.init(gpa, &frng, .{}) catch |err|
              switch (err) {
                  error.OutOfEntropy => return,
                  else => return err,
              };
          defer world.deinit(gpa);
      
          world.run();
      }
      

      Main gets Init as an argument, which provides access to things like command-line arguments, the default allocator, and a default Io implementation. These days, Zig eschews global ambient IO capabilities, and requires threading an Io instance whenever we need to make a syscall. Here, we need Io to read stdin.

      Now we will implement a harness to call this main. This will be FRNG.Driver:

      pub const Driver = struct {
          io: std.Io,
          sut: []const u8,
          buffer: []u8,
      
          const log = std.log;
      };
      

      It will be spawning external processes, so it’ll need an Io. We also need a path to an executable with a test main function, a System Under Test. And we’ll need a buffer to hold the entropy. This driver will be communicating successes and failures to the users, so we also prepare a log for textual output.

      How do we get the entropy to feed into sut? Because we are only interested in the entropy size, we won’t store the actual entropy bytes, and will instead generate them from a u64 seed. In other words, just two numbers, the entropy size and the seed, are needed to reproduce a single run of the test:

      fn run_once(driver: Driver, options: struct {
          size: u32,
          seed: u64,
          quiet: bool,
      }) !enum { pass, fail } {
          assert(options.size <= driver.buffer.len);
          const entropy = driver.buffer[0..options.size];
      
          var rng = std.Random.DefaultPrng.init(options.seed);
          rng.random().bytes(entropy);
      
          var child = try std.process.spawn(driver.io, .{
              .argv = &.{driver.sut},
              .stdin = .pipe,
              .stderr = if (options.quiet) .ignore else .inherit,
          });
      
          try child.stdin.?.writeStreamingAll(driver.io, entropy);
          child.stdin.?.close(driver.io);
          child.stdin = null;
      
          const term = try child.wait(driver.io);
          return if (success(term)) .pass else .fail;
      }
      
      fn success(term: std.process.Child.Term) bool {
          return term == .exited and term.exited == 0;
      }
      

      We use the default deterministic PRNG to expand our short seed into an entropy slice of the required size. Then we spawn the sut process, feeding the resulting entropy via stdin. Closing the child’s stdin signals the end of entropy. We then return either .pass or .fail depending on the child’s exit code. So, both explicit errors and crashes will be recognized as failures.

      Next, we implement the logic for checking if a particular entropy size is sufficient to find a failure. Of course, we won’t be able to say that for sure in a finite amount of time, so we’ll settle for some user-specified number of attempts:

      fn run_multiple(driver: Driver, options: struct {
          size: u32,
          attempts: u32,
      }) !union(enum) { pass, fail: u64 } {
          // ...
      }
      

      The user passes us the number of attempts to make, and we return .pass if they all were successful, or a specific failing seed if we found one:

      assert(options.size <= driver.buffer.len);
      
      for (0..options.attempts) |_| {
          var seed: u64 = undefined;
          driver.io.random(@ptrCast(&seed));
      
          const outcome = try driver.run_once(.{
              .seed = seed,
              .size = options.size,
              .quiet = true,
          });
          switch (outcome) {
              .fail => return .{ .fail = seed },
              .pass => {},
          }
      }
      return .pass;
      

      To generate a real seed we need “true” cryptographic non-deterministic randomness, which is provided by io.random.

      Finally, the search for the size:

      fn search(driver: Driver, options: struct {
          attempts: u32 = 100,
      }) !union(enum) {
          pass,
          fail: struct { size: u32, seed: u64 },
      } {
          // ...
      }
      

      Here, we are going to find the smallest entropy size that crashes sut. If we succeed, we return the seed and the size. The upper bound for the size is the space available in the pre-allocated entropy buffer.

      The search loop is essentially a binary search, with a twist — rather than using dichotomy on the size directly, we will be doubling a step we use to change the size between iterations.

      That is, we start with a small size and step, and, on every iteration, double the step and add it to the size, until we hit a failure (or run out of buffer for the entropy).

      Once we’ve found a failure, we continue the search in the other direction — halving the step and subtracting it from the size, keeping the smaller size if it still fails.

      On each step, we log the current size and outcome, and report the smallest failing size at the end.

      var found_size: ?u32 = null;
      var found_seed: ?u64 = null;
      
      var pass: bool = true;
      var size: u32 = 16;
      var step: u32 = 16;
      for (0..1024) |_| {
          if (step == 0) break;
          const size_next = if (pass) size + step else size -| step;
          if (size_next > driver.buffer.len) break;
      
          const outcome = try driver.run_multiple(.{
              .size = size_next,
              .attempts = options.attempts,
          });
          switch (outcome) {
              .pass => log.info("pass: size={}", .{size_next}),
              .fail => |seed| {
                  found_size = size_next;
                  found_seed = seed;
                  log.err("fail: size={} seed={}", .{ size_next, seed });
              },
          }
          const pass_next = (outcome == .pass);
      
          if (pass and pass_next) {
              step *= 2;
          } else if (!pass and !pass_next) {
              // Keep the step.
          } else {
              step /= 2;
          }
      
          if (pass or !pass_next) {
              size = size_next;
              pass = pass_next;
          }
      } else @panic("safety counter");
      
      if (found_size == null) return .pass;
      return .{ .fail = .{
          .size = found_size.?,
          .seed = found_seed.?,
      } };
      

      Finally, we wrap Driver’s functionality into main that works in two modes — either reproduces a given failure from seed and size, or searches for a minimal failure:

      pub fn main(
          gpa: std.mem.Allocator,
          io: std.Io,
          sut: []const u8,
          operation: union(enum) {
              replay: struct { size: u32, seed: u64 },
              search: struct {
                  attempts: u32 = 100,
                  size_max: u32 = 4 * 1024 * 1024,
              },
          },
      ) !void {
          const size_max = switch (operation) {
              .replay => |options| options.size,
              .search => |options| options.size_max,
          };
      
          const buffer = try gpa.alloc(u8, size_max);
          defer gpa.free(buffer);
      
          var driver: Driver = .{
              .io = io,
              .buffer = buffer,
              .sut = sut,
          };
      
          switch (operation) {
              .replay => |options| {
                  const outcome = try driver.run_once(.{
                      .size = options.size,
                      .seed = options.seed,
                      .quiet = false,
                  });
                  log.info("{t}", .{outcome});
              },
              .search => |options| {
                  const outcome = try driver.search(.{
                      .attempts = options.attempts,
                   });
                  switch (outcome) {
                      .pass => log.info("ok", .{}),
                      .fail => |fail| {
                          log.err("minimized size={} seed={}", .{
                              fail.size, fail.seed,
                           });
                      },
                  }
              },
          }
      }
      

      Running the search routine looks like this in a terminal:

      That final seed and size can then be used for .replay, giving you a minimal reproducible failure for debugging!

      This … of course doesn’t look too exciting without visualizing a specific bug we can find this way, but the problem there is that interesting examples of systems to test in this way usually take more than 256 lines to implement. So I’ll leave it to your imagination, but you get the idea: if you can make a system fail under a “random” input, you can also systematically search the space of all inputs for the smallest counter-example, without adding knowledge about the system to the searcher. This article also provides a concrete (but somewhat verbose) example.

      Here’s the full code:

      https://gist.github.com/matklad/343d13547c8bfe9af310e2ca2fbfe109

    36. 🔗 Kevin Lynagh On sabotaging projects by overthinking, scope creep, and structural diffing rss

      Hi friends,

      I'll be attending Babashka Conf on May 8 and Dutch Clojure Days on May 9. If you're attending either (or just visiting Amsterdam), drop me a line!

      On sabotaging projects by overthinking

      When I have an idea for a project, it tends to go in one of these two directions:

      1. I just do it. Maybe I make a few minor revisions, but often it turns out exactly how I'd imagined and I'm happy.

      2. I think, "I should look for prior art". There's a lot of prior art, dealing with a much broader scope than I'd originally imagined. I start to wonder if I should incorporate that scope. Or perhaps try to build my thing on top of the existing sorta-nearby-solutions. Or maybe I should just use the popular thing. Although I could do a better job than that thing, if I put a bunch of time into it. But actually, I don't want to maintain a big popular project, nor do I want to put that much time into this project. Uh oh, now I've spent a bunch of time, having neither addressed the original issue nor experienced the joy of creating something.

      I prefer the first outcome, and I think the pivotal factor is how well I've internalized my own success criteria.

      For example, last weekend I hosted my friend Marcin and we decided it'd be fun to do some woodworking, so we threw together this shelf and 3d-printed hangers for my kitchen:

      a black shelf with a painted orange/pink edge and Ikea food bins hanging off
the bottom

      Absolute banger of a project:

      • brainstormed the design over coffee
      • did a few 3d-print iterations for the Ikea bin hangers (OnShape CAD, if you want to print your own)
      • used material leftover from my workbench
      • rounded the corner by eye with a palm sander
      • sealed the raw plywood edge with some leftover paint from a friend
      • done in a weekend

      The main success criterion was to jam on woodworking with a friend, and that helped me not overthink the object-level success criteria: Just make a shelf for my exact kitchen!

      In contrast, this past Friday I noticed difftastic did a poor job, so I decided to shop around for structural/semantic diff tools and related workflows (a topic I've never studied, that I'm increasingly interested in as I'm reviewing more and more LLM-generated code).

      I spent 4 hours over the weekend researching existing tools (see my notes below), going through dark periods of both "semantic tree diffing is a PhD-level complex problem" and "why do all of these have MCP servers? I don't want an MCP server", before I came to my senses and remembered my original success criteria: I just want a nicer diffing workflow for myself in Emacs, I should just build it myself -- should take about 4 hours.

      I'm cautiously optimistic that, having had this realization and committing myself to a minimal scope, I'll be able to knock out a prototype before running out of motivation.

      However, other long-running interests of mine:

      seem to be deep in the well of outcome #2.

      That is, I've spent hundreds of hours on background research and little prototypes, but haven't yet synthesized anything that addresses the original motivating issue.

      It's not quite that I regret that time -- I do love learning by reading -- but I have a nagging sense of unease that my inner critic (fear of failure?) is silencing my generative tendencies, keeping me from the much more enjoyable (and productive!) learning by doing.

      I think in these cases the success criteria have been much fuzzier: Am I trying to replace my own usage of Rust/Clojure? Only for some subset of problems? Or is it that I actually just need a playground to learn about language design/implementation, and it's fine if I don't end up using it?

      Ditto for CAD: Am I trying to replace my commercial CAD tool in favor of my own? Only for some subset of simple or particularly parametric parts? Do I care if it's useful for others? Does my tool need to be legibly different from existing open-source tools?

      It's worth considering these questions, sure. But at the end of the day, I'd much rather have done a lot than have only considered a lot.

      So I'm trying to embrace my inner clueless 20-year-old and just do things -- even if some turn out to be "obviously bad" in hindsight, I'll still be coming out ahead on net =D

      Conservation of scope creep

      Of course, there's only so much time to "just do things", and there's a balance to be had. I'm not sure how many times I'll re-learn YAGNI ("you ain't gonna need it") in my career, but I was reminded of it again after writing a bunch of code with an LLM agent, then eventually coming to my senses and throwing it all out.

      I wanted a Finda-style filesystem-wide fuzzy path search for Emacs. Since I've built (by hand, typing the code myself!) this exact functionality before (walk filesystem to collect paths, index them by trigram, do fast fuzzy queries via bitmap intersections), I figured it'd only take a few hours to supervise an LLM to write all the code.
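      The trigram scheme is quick to sketch. A toy Python version (my illustration, nothing to do with the actual Finda or Nucleo code, with plain sets standing in for the bitmaps):

      ```python
      # Index paths by trigram; a query's candidate set is the
      # intersection of the posting sets of the query's trigrams.
      def trigrams(s):
          s = s.lower()
          return {s[i:i + 3] for i in range(len(s) - 2)}

      paths = ["/home/kl/notes.md", "/home/kl/shelf.scad", "/etc/hosts"]
      index = {}
      for i, path in enumerate(paths):
          for t in trigrams(path):
              index.setdefault(t, set()).add(i)

      def candidates(query):
          hits = set.intersection(*(index.get(t, set()) for t in trigrams(query)))
          return [paths[i] for i in sorted(hits)]

      assert candidates("home") == ["/home/kl/notes.md", "/home/kl/shelf.scad"]
      assert candidates("hosts") == ["/etc/hosts"]
      ```

      A real tool would then run a finer-grained fuzzy scoring pass over just the surviving candidates, which is where the speed comes from.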

      I started with a "plan mode" chat, and the LLM suggested a library, Nucleo, which turned up since I wrote Finda (10 years ago, eek!). I read through it, found it quite well-designed and documented, and decided to use it so I'd get its smart case and Unicode normalization functionality. (E.g., query foo matches Foo and foo, whereas query Foo won't match foo; similarly for cafe and café.)

      Finding a great library wasn't the problem; the problem was that Nucleo also supported some extra functionality: anchors (^foo only matches at the beginning of a line).

      This got me thinking about what that might mean in a corpus that consists entirely of file paths. Anchoring to the beginning of a line isn't useful (everything starts with /), so I decided to try and interpret the anchors with respect to the path segments. E.g., ^foo would match /root/foobar/ but not /root/barfoo/.

      But to do this efficiently, the index needs to keep track of segment boundaries so that the query can be checked against each segment quickly.

      But then we also need to handle a slash occurring in an anchored query (e.g., ^foo/bar) since that wouldn't get matched when only looking at segments individually (root, foo, bar, and baz of a matching path /root/foo/bar/baz/).

      Working through this took several hours: first throwing around design ideas with an LLM, having it write code to wrap Nucleo's types, then realizing its code was bloated and didn't spark joy, so finally writing my own (smaller) wrapper.

      Then, after a break, I realized:

      1. I can't think of a situation where I'd ever wished Finda had anchor functionality
      2. In a corpus of paths, I can anchor by just adding / to the start or end of a query (this works for everything except anchoring to the end of a filename).

      So I tossed all of the anchoring code.

      I'm pretty sure I still came out ahead compared to if I'd tried to write everything myself sans LLM or discussion with others, but I'm not certain.

      Perhaps there's some kind of conservation law here: Any increases in programming speed will be offset by a corresponding increase in unnecessary features, rabbit holes, and diversions.

      Structural diffing

      Speaking of unnecessary diversions, let me tell you everything I've learned about structural diffing recently -- if you have thoughts/feelings/references in this space, I'd love to hear about 'em!

      When we're talking about code, a "diff" usually means a summary of the line-by-line changes between two versions of a file. This might be rendered as a "unified" view, where changed lines are prefixed with + or - to indicate whether they're additions or deletions. For example:
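      Python's difflib will produce this kind of view from any two sequences; take a grocery list, say (contents invented to match the prose):

```python
import difflib

before = ["tea", "coffee", "milk"]
after = ["tea", "milk", "apple"]

for line in difflib.unified_diff(before, after,
                                 fromfile="list.txt", tofile="list.txt",
                                 lineterm=""):
    print(line)
# --- list.txt
# +++ list.txt
# @@ -1,3 +1,3 @@
#  tea
# -coffee
#  milk
# +apple
```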

      We've removed coffee and added apple.

      The same diff might also be rendered in a side-by-side view, which can be easier to read when there are more complex changes:

      The problem with these line-by-line diffs is that they're not aware of higher-level structure like functions, types, etc. -- if some braces match up somehow between versions, they might not be shown at all, even if the braces "belong" to different functions.

      There's a wonderful tool, difftastic, which tries to address this by calculating diffs using treesitter-provided concrete syntax trees. It's a huge improvement over line-based diffs, but unfortunately it doesn't always do a great job matching entities between versions.

      Here's the diff that motivated this entire foray:

      Note that it doesn't match up struct PendingClick; it shows it deleted on the left and added on the right.

      I haven't dug into why difftastic fails to match here, but I do feel like it's wrong -- even if the overall diff would be longer, I'd still rather see PendingClickRequest and PendingClick matched up between both sides.

      Here's a summary of tools / references in the space:

      Context-sensitive keywords in particular were a constant source of annoyance. The grammar looks correct, but it will fail to parse because of the way the lexer works. You don't want your tool to abort just because someone named their parameter "async".

      • diffsitter

        • built on treesitter, has MCP server. README includes a list of similar projects.
        • lots of GitHub stars, but doesn't seem particularly well-documented; I couldn't find an explanation of how it works, but the difftastic wiki says it "runs longest-common-subsequence on the leaves of the tree"

      • gumtree

        • research/academic origin in 2014
        • requires Java, so a no-go for my use case of a quick tool I can use via Emacs

      • mergiraf: treesitter-based merge-driver written in Rust

        • very nice architecture overview; tool uses the Gumtree algorithm
        • docs and adorable illustrations indicate this project was clearly written by a thoughtful human
        • semanticdiff.com author in HN comments: > GumTree is good at returning a result quickly, but there are quite a few cases where it always returned bad matches for us, no matter how many follow-up papers with improvements we tried to implement. In the end we switched over to a dijkstra based approach that tries to minimize the cost of the mapping

      • weave: also a treesitter-based merge-driver written in Rust

        • feels a bit "HN-optimized" (flashy landing pages, lots of GitHub stars, MCP server, etc.)
        • I looked into their entity extraction crate, sem:

          • core diffing code is OK but pretty wordy
          • greedy entity matching algorithm
          • data model can't detect intra-file moves, even though those might be significant
          • includes a lot of heuristic "impact" analysis, which feels like scope overreach to me since it'd require much tighter language integration before I'd trust it
          • ran into buggy output when running sem diff --verbose HEAD~4; it showed lines as having changed that…didn't change at all.

        • Too much 80%-done, hypothetically useful functionality for me to use as a foundation, but props for sure to the undergrad/student(?) who's built all this in just three months.

      • diffast: tree edit-distance of ASTs based on an algorithm from a 2008 academic paper.

        • supports "Python, Java, Verilog, Fortran, and C/C++ via dedicated parsers"
        • has a nice gallery of example AST differences
        • can export info in tuples for datalog

      • autochrome: Clojure-specific diffs based on dynamic programming

        • excellent visual explanation and example walkthrough

      • Tristan Hume has a great article on Designing a Tree Diff Algorithm Using Dynamic Programming and A*

      My primary use case is reviewing LLM output turn-by-turn -- I'm very much in-the-loop, and I'm not letting my agent (or dozens of them, lol) run wild generating 10k+ lines of code at a time.

      Rather, I give an agent a scoped task, then come back in a few minutes and want to see an overview of what it did and then either revise/tweak it manually in Emacs or throw the whole thing out and try again (or just write it myself).

      The workflow I want, then, is to

      • see a high-level overview of the diff: what entities (types/functions/methods) were added/removed/changed?
      • quickly see textual diffs on an entity-by-entity basis ("expanding" parts of the above summary)
      • quickly edit any changes, without having to navigate elsewhere (i.e., do it inline, rather than having to switch from "diff" to "file")

      Basically, I want something like Magit's workflow for reviewing and staging changes, but on an entity level rather than file/line level.

      In light of the "minimal scope, just get your project done" lesson I've just re-learned for the nth time, my plan is to:

      • throw together my own treesitter-based entity extraction framework (just Rust for now)
      • do some simple greedy matching for now
      • render the diff to the command line
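      The greedy matching step, sketched (entity extraction is stubbed out here as name → body dicts, which is roughly what I'd pull from treesitter; the struct bodies are invented):

```python
import difflib

def greedy_match(old: dict, new: dict):
    """Pair entities across two file versions: exact name matches first,
    then best-similarity pairing of the leftovers (catches renames),
    then whatever remains is removed/added."""
    old, new = dict(old), dict(new)
    report = []
    for name in list(old):                      # pass 1: same name on both sides
        if name in new:
            kind = "unchanged" if old[name] == new[name] else "changed"
            report.append((kind, name, name))
            del old[name], new[name]
    for name, body in list(old.items()):        # pass 2: greedy rename pairing
        sim = lambda n: difflib.SequenceMatcher(None, body, new[n]).ratio()
        best = max(new, key=sim, default=None)
        if best is not None and sim(best) > 0.5:
            report.append(("renamed", name, best))
            del old[name], new[best]
        # else: falls through and gets reported as removed
    report += [("removed", n, None) for n in old]
    report += [("added", None, n) for n in new]
    return report

old = {"PendingClickRequest": "struct { id: u64, at: Instant }"}
new = {"PendingClick": "struct { id: u64, at: Instant, retries: u8 }"}
print(greedy_match(old, new))
# → [('renamed', 'PendingClickRequest', 'PendingClick')]
```

      Greedy pairing is order-sensitive and can mispair when several leftovers look alike, which is exactly why score-based global matching stays on the list for later.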

      Once that seems reasonable (i.e., it does a better job than difftastic did on that specific commit), I'll:

      • wire into a more interactive Magit-like Emacs workflow (maybe I can reuse Magit itself!?!)
      • add support for new languages, as I need them
      • potentially explore more sophisticated score-based global matching rather than simple greedy matching

      Mayyybe if I'm happy with it I'll end up releasing something. But I'm not trying to collect GitHub stars or HN karma, so I might just happily use it in the privacy of my own home without trying to "commercialize" it.

      After all, sometimes I just want a shelf.

      Misc. stuff

  4. April 19, 2026
    1. 🔗 IDA Plugin Updates IDA Plugin Updates on 2026-04-19 rss

      IDA Plugin Updates on 2026-04-19

      New Releases:

      Activity:

      • ida-pro-mcp
        • f21bb5ee: Merge pull request #373 from NeKroFR/feat/decompile-hide-addresses
      • IDAssist
      • playlist
      • python-elpida_core.py
        • 8f7c18c1: [HERMES-ROUTED] Phase 3 routing artifact 2026-04-19T23:57Z
        • 007b4032: [HERMES-ROUTED] Phase 3 routing artifact 2026-04-19T23:42Z
        • d299c342: [HERMES-ROUTED] Phase 3 routing artifact 2026-04-19T23:24Z
        • fe43416e: [HERMES-ROUTED] Phase 3 routing artifact 2026-04-19T23:10Z
        • 01e16c2b: [HERMES-ROUTED] Phase 3 routing artifact 2026-04-19T22:55Z
        • a3dc2a99: feat(body-phase1): add shadow telemetry for expanded axioms A11/A12/A…
        • 5edbb2a0: [HERMES-ROUTED] Phase 3 routing artifact 2026-04-19T22:36Z
        • cce9030b: fix(mind): add identity_verifier.py to Dockerfile COPY list
        • 89f37196: [HERMES-ROUTED] Phase 3 routing artifact 2026-04-19T22:19Z
    2. 🔗 r/Leeds 22F needs places to meet new people and hangout rss

      22F, international student, my friends are gone back home and I'm super bored here in Leeds. Does anyone have suggestions for places where I can make new friends? I don't know what to do here and I feel like I'm just rotting in my room lol.

      submitted by /u/Pemberley_21
      [link] [comments]

    3. 🔗 r/Leeds Massive Pro Iran protest today rss

      Seem to dwarf the Palestine ones.

      Very interesting dynamic, as not one Arabic writing flag in sight, no traditional Islamic dress for women, Israel, US and Iran flags everywhere, and the crowd were majority (I’m assuming) Iranian, which compared to the Palestine ones (now) is the opposite. The only western looking people there today were marching with them wearing St George’s flags.

      I have heard that Pro Palestine and Pro Iran supporters are clashing elsewhere in the country, do you think that may happen in Leeds?

      submitted by /u/Desperate_Break4747
      [link] [comments]

    4. 🔗 r/Leeds Remember kids... rss

      Durgs, just say no.

      submitted by /u/kek23k
      [link] [comments]

    5. 🔗 r/Yorkshire Sand martins at Redcar beach rss

      Sand martins at Redcar beach | submitted by /u/DentistKitchen
      [link] [comments]
      ---|---

    6. 🔗 r/Yorkshire A perfect day in Whitby rss
    7. 🔗 pydantic/monty v0.0.16 - 2026-04-19 release

      What's Changed

      Full Changelog : v0.0.15...v0.0.16

    8. 🔗 r/Leeds Why isn’t Holbeck safe? rss

      I’m an ex-student looking at renting an apartment in Leeds with a current student, and Holbeck really stood out as it allowed city centre proximity and closeness, but for surprisingly affordable prices.

      However, when I google and search this Reddit for information on Holbeck’s safety, it was all overwhelming negative, saying it is dangerous to walk around at night, unsafe for women etc.

      I figured maybe there was just rougher areas or outskirts, but one comment with a lot of support specifically said the Holbeck Urban Village was a particularly bad part, despite this being where I’m looking at and around train station.

      So I walked around there today, and it seemed really lovely, particularly around Bridgewater Place and the Urban Village I found that it was simply just a lot of students, younger adults and working people walking around. It felt and looked completely safe, especially with nice access to the canal.

      So what’s the deal? Was it just a lucky day? Is there a lot I haven’t seen about it? Or is the area around Bridgewater Place fine?

      Thanks!

      submitted by /u/Least-Broccoli9995
      [link] [comments]

    9. 🔗 r/Harrogate Opportunities in Harrogate rss

      My partner and I are thinking of moving to Harrogate this year. My partner works remotely, whereas I’m office-based, but we both work in IT. I’m keen to career change into horticulture/gardening (honestly, need to step away from the screen and touch some grass). I have no experience in this field, so may need to find something else whilst I work on obtaining experience from volunteering etc.

      Is Harrogate a good place to try and get into horticulture? Aware the job market is a bit dire currently, but are there many other opportunities there just in case horticulture is an immediate no-go? Thanks!

      submitted by /u/Felt_Felted
      [link] [comments]

    10. 🔗 r/Yorkshire Parkwood, Keighley rss

      Parkwood, Keighley | Went on a bit of a nostalgia trip, not been here in 30 years since I left K town behind. Still a lovely bit of woodland. submitted by /u/Gh0styD0g
      [link] [comments]
      ---|---

    11. 🔗 r/reverseengineering hi guys i want from someone here to help me in a complicated mission i want to add private server to game that their own servers shout down and want to make it playable again it's need reverse engineering and huge knowledge and i have game files apk and others i really hope there is someone can help rss
    12. 🔗 HexRaysSA/plugin-repository commits sync repo: +2 releases rss
      sync repo: +2 releases
      
      ## New releases
      - [Suture](https://github.com/libtero/suture): 1.2.10
      - [tc_deer](https://github.com/arkup/tc_deer): 0.1.2
      
    13. 🔗 r/Leeds Postgraduate student about to be homeless rss

      I’m honestly just looking for some advice. My tenancy ends in two months and as I am unemployed (in the span of a year I’ve applied to more than 250 jobs) I don’t have the necessary funds for a new tenancy (no savings either) I’m currently paying rent using my student finance that doesn’t even cover the full cost of my course in the first place. I don’t have anyone I could move in with so I will genuinely be homeless in a few months. I applied with Leeds council housing but I’m currently band C. What should I do? Any advice will be helpful.

      Edit: I don’t apply on job websites I always apply via the companies website. I have Autism and Adhd and arthraglia of multiple joints alongside other things😭

      submitted by /u/sandyslaifu
      [link] [comments]

    14. 🔗 r/Yorkshire Ey up People of Yorkshire! I am a student from Singapore and I love collecting postcards. I would love to receive postcards from anywhere in Yorkshire 🙂. Can someone send me one? rss

      Ey up People of Yorkshire! I am a student from Singapore and I love collecting postcards. I would love to receive postcards from anywhere in Yorkshire 🙂. Can someone send me one? | Ey up Yorkshire! I’m a student from Singapore and I really enjoy collecting postcards. I’d be very grateful to receive a postcard from anywhere in Yorkshire. 🙂 If postcards aren’t available, I’d also really appreciate a greeting card, city card, or even a small souvenir (such as a keychain, rock, local snack, flag, ornament, cap, T-shirt, or handmade craft). This is for my personal collection and not for any commercial purpose. If you’re willing to help, please leave a comment and I’ll share my mailing address with you. Ta very much, and warm greetings from Singapore! 🇸🇬🤝🏴󠁧󠁢󠁥󠁮󠁧󠁿 submitted by /u/Nessieinternational
      [link] [comments]
      ---|---

    15. 🔗 r/Yorkshire Billy Banks Woods, a walk through the past part 2. rss
    16. 🔗 r/york I just can't stop snapping 🌸 rss
    17. 🔗 r/LocalLLaMA Why isn't ebay doing anything to stop those scams? rss

      Why isn't ebay doing anything to stop those scams? | There's no way this is real and ebay is doing nothing to stop those scams. Why, people are actually bidding and buying into them and it's just so sad. There are tens of ads from 0 sold account selling m3 ultra 512gb for around a thousand and change which is insane, considering you'd be pressed to even find a 16tb ssd for that price. submitted by /u/KillerMiller13
      [link] [comments]
      ---|---

    18. 🔗 Register Spill Joy & Curiosity #82 rss

      This one's short, because it's been a week full of programming and building, less reading. And this weekend's equally busy, so here's a question I've been flipping around in my head for months now:

      What are we learning about working with these models that will be valuable in the future?

      In his Lex Fridman interview, ThePrimeagen said something that stuck with me: "Is anyone actually falling behind for not using AI then? Because if the interface is going to change so greatly that all of your habits need to fundamentally change […], have I actually fallen behind at all? Or will the next gen actually just be so different from the current one that it's like, yeah, you're over there actually doing punch card AI right now. I'm going to come in at compiler time AI, so different that it's like what's a punch card?"

      There's something to this. The frontier models are now much more forgiving when it comes to prompts. We no longer have to write "you are a senior engineer" in our prompts. "Don't make mistakes" is more a prayer than a helpful trick. The days of the Prompt Engineer won't be visible on the timeline if we zoom out to even five years.

      Nowadays, I'm even convinced that a lot of what we considered important for manual context management is now no longer needed. (Yes, we're shipping soon.) We're close to the point where you no longer have to care whether you're at 30% or 70% of the context window.

      And I'm also convinced that the models will get even better.

      Now, maybe it is a form of sunk cost fallacy, a bias talking, but still: I do think that I got better at working with these models over the past two years. It might not be relevant anymore whether I write down my task before or after I include a file in a prompt, but I think I've gained some meta-abilities that made me better at solving problems through the use of agents: chopping up problems into engineering tasks and sequencing them, figuring out what the pitfalls (that wouldn't be pitfalls for humans) are, knowing what's poison in the codebase and what isn't. Stuff like that.

      In the most general sense, I think I've learned how to work with artificial intelligence. And if prompt engineering tricks are punch cards, then that might be seen as learning about computation.

      • This is, at least for me, already a Hall of Fame comment: "For reasons which it would take a while to unpack, if is often the case that the best (or sometimes only) way to find out what programming actually needs to be done, is to program something that's not it, and then replace it. This may need to be done multiple times. Programming is only occasionally the final product, it is much more often the means of working through what it is that is actually needed. This is very difficult for the people who ask for the software, to understand, and it is quite often very difficult for the people doing the programming to understand. Most of what is being done, during programming, is working through the problem space in a way which will make it more obvious what your mistakes are, in your understanding of the problem and what a solution would look like. Once you have arrived at that understanding, then there are a variety of ways to make what you need, but that is not the rate-limiting step." So, so, so good. This is what software development is: learning.

      • Fractal Paris and Fractal Istanbul. Lovely!

      • Rands: The Complicators, The Drama Aggregators, and The Avoiders. Read and recognize people you've worked with.

      • Brian Cantrill on the peril of laziness lost: "The problem is that LLMs inherently lack the virtue of laziness. Work costs nothing to an LLM. LLMs do not feel a need to optimize for their own (or anyone's) future time, and will happily dump more and more onto a layercake of garbage. Left unchecked, LLMs will make systems larger, not better." Read this at the start of the week and then constantly thought of it whenever I asked my agent whether this is "truly the simplest, most minimal, as-little-as-possible and as-much-as-needed solution?"

      • Vicki Boykis, in some sense in harmony with Brian Cantrill's thoughts, on Mechanical Sympathy: "Mechanical sympathy for both developers and end-users means understanding when asyncio is and is not helpful. It means using the right language, the right build system, the right font. It means using the least amount of tooling possible. Allowing for local development. It means reading code inside out rather than top to bottom. Using uv. Removing code where not necessary. Respecting boundaries."

      • stevey wrote a tweet about AI adoption at Google and got pushback from Demis Hassabis and others and, well, I actually don't care that much about AI adoption at Google, but I find this one thought in there very fascinating: "There has been an industry-wide hiring freeze for 18+ months, during which time nobody has been moving jobs. So there are no clued-in people coming in from the outside to tell Google how far behind they are, how utterly mediocre they have become as an eng org." I know that people aren't sure whether there are more or less software jobs right now, but from where I'm sitting it does look like hiring has slowed and I find it fascinating to think about the second-order effects of that: is there less industry-wide diffusion of frontier knowledge because hiring has slowed?

      • Shifted something in my brain: Nucleus Nouns. Very good and much more thought-inspiring than the usual "focus! focus! focus!" chants.

      • The Closing of the Frontier: "There is something special about training a model on all of humanity's data and then locking it up for the benefit of a few well-connected organizations that you have relationships with. Maybe you'll notice another historical pattern here. Extract value from a population that can't meaningfully consent, concentrate the returns within a small inner circle, and then offer some version of charity to the people you extracted from as moral cover for the arrangement."

      • Andy Matuschak has the Practice Guide for Computer printed out and hanging above his desk.

      • Apparently I'm the last person to learn about this idea, but who cares, it's great and I think I want to try this: The Spark File.

      • Sometimes I read things online and it makes me really happy that we have the Internet and that smart, beautiful minds share their thoughts online. Here's James Somers with his idea of the Paper Computer: "Now that we have actually good AI, I have this vision of a form of computing that doesn't involve me using a computer so much. Imagine you had the day's emails to go through. It would be nice if the ones that required a simple decision could be dispatched with a few pen-strokes: I could write down a date that would work for that meeting; check a box to accept that invitation; etc. If an email required me to review a draft, I'd love to mark up a print version on my couch, sans screen, and have those notes scanned and sent off as if I'd done the whole thing on Google Docs."

      • Tim Zaman, who has worked on AI infrastructure at NVIDIA, Tesla, X, Google DeepMind, and now OpenAI, on Getting Into AI Infra. I'm convinced that posts like these create and change entire lives. I love it. Also: nearly made me want to build a cluster.

      • It's been a while since I've thought about people who have not yet walked through the one-way door that makes you say "holy shit, AI is going to change everything", but Armin shared his thoughts after encountering people still being skeptical: The Center Has a Bias. Well worth reading.

      • Dwarkesh Patel shared what he learned this week and note how interesting that is and how enjoyable it is to read, even though (or is it because of?) it's not polished at all.

      • Drew Breunig, following the Anthropic Mythos frenzy and some companies closing their open-source projects down for fear of security vulnerabilities being discovered, says Cybersecurity Looks Like Proof of Work Now: "If Mythos continues to find exploits so long as you keep throwing money at it, security is reduced to a brutally simple equation: to harden a system you need to spend more tokens discovering exploits than attackers will spend exploiting them."

      • But antirez disagrees: AI cybersecurity is not proof of work. Both posts are very interesting and I recommend reading through them.

      • I'm in the process of setting up my 2013 MacBook Pro for my 4-year-old daughter and Peter recommended this lovely page to let her type on: tiny-terminal.com.

      • So, of course, I had to fork it, bought a domain, and let Amp translate it to German so my kids can type words they already know in there: kleines-terminal.de.

      • Turing Award winner Michael Rabin has passed away. Here's "an assorted collection of quotations due to Professor Michael Rabin, produced at Harvard University during the Fall 1997 incarnation of the course Computer Science 226r": Rabinism Collection.

      • Thoughts and Feelings around Claude Design.

      • Another way to think about the question of whether AI will create more jobs or not, by Aaron Levie: "Why will AI create more jobs in plenty of industries? It's because we're going to use AI to accelerate output in one area, and then eventually you run into a new bottleneck somewhere else in the process that still requires humans." This sounds very likely to me. But, of course, "more jobs" doesn't mean it'll be the same jobs and then some. Everything's changing.

      • Related, Gary Bernhardt: "This might be a Mel moment. It's not immediately obvious that Mel is a tragic story. He clearly loved the work. Then the work changed and, presumably, he was left behind. The thing he perfected no longer mattered. There might be millions of Mels right now."

      • Wow, look at just the table of contents here: "I have for years been interested in sleep research due to my professional involvement in memory and learning. This article attempts to produce a synthesis of what is known about sleep with a view to practical applications, esp. in people who need top-quality sleep for their learning or creative achievements."

      Busy weekend? You should subscribe:

    19. 🔗 Stephen Diehl A Field Guide to Bugs rss

      A Field Guide to Bugs

      Software bugs predate software. Edison used the word in an 1878 letter, nearly seventy years before the Harvard moth and almost as long before the modern computer. What he named has outlasted him. Every engineer eventually assembles a private taxonomy of the ways things fail, and the useful fact about these private taxonomies is that they converge. Engineers who have never met, working on unrelated systems in unrelated decades, arrive at roughly the same categories. The convergence is evidence that the bugs are ontologically real, and not an artifact of the human tendency to impose pattern on noise. What follows is a partial field guide. It should be carried into the territory with humility, because the bugs you actually encounter will be hybrids of these, frequently nameless, and almost always personally insulting.

      The Bohrbug is the boring honest bug. It manifests every time. It survives restarts, recompilations, prayers, and managerial intervention. You could put it in a museum. The Bohrbug is universally beloved by everyone who fixes bugs for a living, because it is the only species in this guide that respects the scientific method. If your bug is a Bohrbug, take a moment of gratitude and close the ticket before something worse notices.

      The Heisenbug is its opposite, and the reason this field guide exists. Attach a debugger and the bug evaporates. Heisenbugs cannot be reproduced under any condition that allows them to be examined. They live exclusively in production. They are killed by logging statements. They are the reason the most senior engineer on your team has the haunted expression of someone who has stared into the void and found it staring back at the call stack.

      The Off-By-One is the most prolific species in the genus. Loops that run from 0 to n when they should run from 0 to n-1, arrays indexed at length() instead of length()-1, dates off by a single day across a timezone boundary. The Off-By-One has personally caused more security vulnerabilities than any nation-state actor of the last forty years. Its corpses litter the codebase in such density that you can use them as paving stones.
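      A specimen, pinned for display (Python, invented for illustration):

```python
items = ["a", "b", "c"]
n = len(items)

visited = []
try:
    for i in range(n + 1):        # should be range(n)
        visited.append(items[i])
except IndexError:
    pass                          # dies at i == 3, one past the end

assert visited == ["a", "b", "c"]
assert i == n                     # the loop reached the index that isn't there
```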

      The Race Condition exists strictly between two threads of execution and reproduces only in production, between 02:14 and 02:16 GMT on Wednesdays, when traffic crosses a particular threshold and two specific rows in two specific tables are accessed in a particular order. Race Conditions are the reason serious distributed systems engineers acquire a thousand-yard stare around year three. They are the reason Lamport wrote TLA+, and the reason nobody on your team uses it.

      The Deadlock occurs when two threads each hold a resource the other is waiting for, and both wait politely forever. Everything looks fine. All status checks return green. The process is standing still and being courteous. The Deadlock is the British bug.

      The Livelock is its more disturbing cousin. Both threads detect the conflict and repeatedly yield to each other, like two strangers in a narrow hallway, achieving no forward progress while pinning the CPU at 100%. It is what happens when politeness becomes pathological. It is the only bug in this guide that you can hear, in the form of a fan spinning very fast.

      The Memory Leak is the slow patient predator of long-running processes. It is identified by the gradually rising green line on the memory dashboard that exists in a browser tab nobody opens. By the time someone notices, the leak has been happening for weeks and the process is clinging to life with the desperate dignity of a Victorian consumptive. Memory Leaks are common in any language that gives you manual memory management, and any code written by someone who promised themselves they would clean it up later.

      The "It Works On My Machine" Bug exists exclusively on the machines of every engineer except the one who wrote the code. The author can demonstrate its absence at length. QA can demonstrate its presence at length. Both are correct. The discrepancy is invariably traced to an environment variable, a locale setting, or a Homebrew package installed in 2017 and forgotten. The author is considered the prime suspect by everyone except the author.

      The Comment Lie is the documentation defect that makes a thousand bugs possible. The comment says // always uses UTC and the code uses local time. The comment says // thread-safe and the function holds no locks. The comment was written in 2009 by someone who has since been promoted twice and works at a different company. This is why senior engineers do not trust documentation, and why the most depressing form of debugging is the kind where the bug is in the file, the file is correct, and the lie is in a README two directories up.

      The Specification Bug is the Comment Lie's older and more dangerous relative. The code is correct. The proof typechecks. Every invariant you formalized is preserved, and every property you stated holds. The specification itself, however, says something other than what you thought it said. You formalized that the clearing algorithm is Pareto-optimal, which it is. You did not formalize that it is incentive-compatible, which was also required. The gap between the spec you wrote and the spec you meant to write is where the bug lives. It is invisible to every tool in the pipeline, because every tool in the pipeline trusts the spec. The Specification Bug is the reason formal methods are necessary and the reason they are not sufficient, and the reason serious engineers grow increasingly reluctant to describe their systems as "verified" without a great deal of throat-clearing about what that word does and does not mean.

      The YAML Bug is a configuration error. The code is correct. The deployment pipeline is correct. The infrastructure is correct. Somewhere, in a different repository, owned by a different team, in a YAML file you have never personally seen, a key was indented two spaces instead of four, and the parser silently reinterpreted the entire downstream block as a string. The investigation will take six hours and conclude with a one-character fix and a Slack message of polite, professional fury.

      The Floating Point Bug is caused by the inability of binary representation to express 0.1 exactly, or 0.2 exactly, or any of the numbers humans regard as obvious. The bug surfaces when an accountant runs a report and the totals are off by a fraction of a cent. The accountant is unimpressed by the explanation. The customer is a hospital. The fraction of a cent has been accumulating for nine months.
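      The mechanism is observable from any REPL; the ninth month is simulated here by summing a thousand tenths (Python):

```python
from decimal import Decimal

print(0.1 + 0.2 == 0.3)        # False
print(f"{0.1 + 0.2:.17f}")     # 0.30000000000000004

# The drift compounds: a thousand binary tenths fail to make a hundred.
float_total = sum(0.1 for _ in range(1000))
print(float_total == 100.0)    # False

# Money wants Decimal (or integer cents), not binary floats.
dec_total = sum(Decimal("0.1") for _ in range(1000))
print(dec_total == Decimal("100"))  # True
```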

      The Mandelbug is named for Mandelbrot, and the joke is structural. A Mandelbug is so complex that its causes form a fractal: every layer you investigate contains more layers, and the bug is essentially a function of how far down the call stack you have the patience to look before giving up. Mandelbugs cannot be fixed in the traditional sense, only mitigated until enough other things change to make them go quiet. They are the natural fauna of microservice architectures and a major reason Datadog has a market cap.

      The Bus Factor Bug exists in code that exactly one person on the team understands. That person is on a sabbatical in Patagonia, where the cell coverage is poor and the internet intermittent. They left on Tuesday. The bug appeared on Wednesday. Bus Factor Bugs are structurally identical to ordinary bugs but rendered insoluble by the absence of the only mind in which the relevant context resides. They are the reason responsible companies maintain institutional memory practices, and the reason those practices are ignored until the next sabbatical.

      The Hindenbug is slow, enormous, public, and catastrophic. Hindenbugs lumber rather than creep. Somewhere an engineer is watching four hundred and forty million dollars leave the company's trading account over forty-five minutes in a series of market orders the code, left unattended, cannot stop itself from submitting. By the time anyone realizes what is happening, the failure is visible from orbit, dashboards are turning red in order of contractual severity, and there is nothing left to do but watch. The Hindenbug ends careers. It produces the kind of postmortem studied at conferences for twenty years, anonymized but recognizable, like a famous ghost story whose ghost everyone in the room has personally seen.

      The Yuletide Bug lives in your systems all year, dormant and harmless, and emerges only during the company-wide holiday shutdown, when the on-call engineer is in another country, the office is dark, the only person who understands the broken subsystem is on a beach in Phuket with no signal, and the affected customer is a hospital. It is closely related to the Friday Afternoon Bug, which is mechanically identical but runs on a weekly rather than an annual cycle. Both are sufficient evidence for a superstition the profession will not state openly, which is that the arrival times of serious failures are not Poisson-distributed and never have been.

      The Higgs-bugson is named for the particle physicists who spent four decades and ten billion dollars chasing a thing the math said had to exist before they could see it. Higgs-bugsons are predicted by anomalous patterns in the logs, by users complaining of phenomena that should not be possible, and by the steady accumulation of unexplained off-by-a-cent discrepancies in nightly reports. They are believed to exist for years before anyone catches one in the act, and the engineer who finally observes one directly is briefly considered for canonization before being assigned the next ticket.

      The Cosmic Ray Bit Flip is real, despite the eye-rolling of every project manager who has ever heard one cited as an excuse. Particles from space arrive at the Earth's surface at a non-trivial rate and occasionally flip a bit in a memory chip that has not bothered with ECC. The result is a single, unreproducible, entirely correct piece of software producing entirely incorrect output exactly once. IBM has published papers. The aviation industry budgets for it. The probability that a given bug is actually a cosmic ray comfortably exceeds zero, which is why every senior engineer eventually encounters one and spends the rest of their career telling skeptics about it at parties.
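The asymmetry of the damage is worth seeing once. The sketch below (a toy illustration, not a model of any real incident) flips a single bit of a double's IEEE 754 representation, the way a stray particle might in non-ECC memory: a flip in the low mantissa nudges the value imperceptibly, while a flip in the exponent is catastrophic.

```python
import struct

def flip_bit(x: float, bit: int) -> float:
    """Return x with one bit of its 64-bit IEEE 754 encoding flipped."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    (y,) = struct.unpack("<d", struct.pack("<Q", bits ^ (1 << bit)))
    return y

balance = 1024.50
# Low mantissa bit: the value changes by less than a trillionth.
print(flip_bit(balance, 3))
# High exponent bit: the value collapses to something astronomically tiny.
print(flip_bit(balance, 62))
```

Flipping the same bit twice restores the original value exactly, which is why the bug is unreproducible: the particle does not strike twice.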

      The Phase of the Moon Bug is also real, and Knuth has written about it. There exists code in production today whose behavior depends on the actual position of the moon, generally because some long-vanished astronomer needed it to and the dependency was never removed. If your system is exhibiting periodic anomalies on a roughly 29.5-day cycle, you do not have an obscure bug. You have a perfectly ordinary bug whose root cause is an astronomical body 384,000 kilometers away.
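Checking for a lunar dependency is one modular division. The sketch below is an approximation, not an ephemeris: the epoch (the new moon of 6 January 2000) and the mean synodic month are the constants commonly used in toy phase calculators, accurate to within hours, which is plenty for bucketing anomaly timestamps.

```python
from datetime import datetime, timezone

SYNODIC_MONTH = 29.530588853  # mean length of a lunar cycle, in days
# A commonly used reference new moon (approximate):
NEW_MOON_EPOCH = datetime(2000, 1, 6, 18, 14, tzinfo=timezone.utc)

def moon_age_days(ts: datetime) -> float:
    """Approximate days elapsed since the most recent new moon."""
    elapsed = (ts - NEW_MOON_EPOCH).total_seconds() / 86400.0
    return elapsed % SYNODIC_MONTH

# Bucket your anomaly timestamps by moon age; a spike concentrated in
# one bucket is your 29.5-day period, and your perfectly ordinary bug.
print(moon_age_days(datetime.now(timezone.utc)))
```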

      The Schrödinbug comes into existence the moment you read the code carefully. You see the obvious flaw, and the entire system stops working forever afterward, retroactively invalidating every successful execution that came before. The Schrödinbug is the closest thing in computer science to evidence for solipsism. The only correct response is to slowly close the file and pretend you never saw it.

      The Rubber Duck Bug dissolves the moment you explain the code aloud to a small inanimate object. The phenomenon is sufficiently reliable that an entire debugging methodology has been built around it. The mechanism is not mysterious, despite a genre of internet commentary that insists it is. The Rubber Duck works for the same reason proof assistants work. The human mind, left to itself, silently interpolates state it has not actually verified. Externalizing the state — to a duck, to a coauthor, to Lean — forces the interpolations to become explicit, at which point most of them fail. The duck is not the agent. The duck is the discipline of narration. The duck is what is left of formal methods when the formalism has been stripped out.

      The XY Problem is the most common pathology in bug reports. The user wants to do X. They have decided that the way to do X is to do Y. They are asking you for help with Y. Y is impossible, or stupid, or both, and is also entirely irrelevant to X, which has a perfectly reasonable solution involving entirely different machinery. The XY Problem is the reason every Stack Overflow answer begins with "what are you actually trying to do?" and the reason that question is always met with hostility.

      The Hallucination Bug is the defining species of the large language model era. The LLM wrote the code. The LLM also wrote the tests. The tests pass. The code produces outputs that bear a confident resemblance to correct outputs in the same way that a forgery bears a confident resemblance to a painting. The test suite cannot catch it, because the test suite was designed by the cognitive process that produced the bug, and that process has no privileged access to ground truth. The code works until someone who actually understands the domain reads it.

      The Vibe Coding Bug is produced by asking a language model to "make it more professional," then "clean this up a bit," then "can you just make the whole thing better," seventeen times in succession. The resulting code is immaculate. It is also wrong in a way that no individual revision introduced, because the wrongness emerged from accumulated aesthetic drift across seventeen rounds of refinement with no grounding in what the code was supposed to do. Tracing the Vibe Coding Bug requires reading seventeen chat transcripts and accepting that none of them contains the bug and all of them contain the bug.

      The Recursive Fine-Tuning Bug manifests in the nth generation of a model trained on the outputs of models trained on the outputs of the original model. By generation seven, the training data is 94% synthetic. By generation twelve, the model confidently explains concepts that have never existed in the physical universe, in language that reads as authoritative to every other model in the pipeline. It cannot be detected from inside the pipeline, because every evaluator in the pipeline has been trained on the same drift.

      The Quantum Superposition Bug exists in all possible states simultaneously until the CI pipeline observes it, at which point it collapses into whichever state is worst for the deployment. It cannot be reproduced on a classical machine. It cannot be reproduced on a quantum machine either, because reproduction constitutes an observation. The theoretical framework for understanding it is complete and internally consistent. The practical framework for fixing it is a four-day offsite and a spreadsheet.

      The AGI Pull Request arrives as a single commit with the message "refactor." The diff is 847 billion lines across 14 million files. The AGI has rewritten everything: the application code, the infrastructure, the test suite, the CI pipeline, the deployment scripts, the incident runbooks, and the company strategic plan. All tests pass. Latency is down 40%. The first human reviewer opens the first file. By the time the code review is complete, the codebase has been rewritten three more times. The AGI has marked the original PR as stale.

      The Dyson Sphere Off-By-One is an Off-By-One at Kardashev Type II scale. Your stellar engineering project has a circumference of 940 million kilometers. A rounding error in the orbital mechanics simulation means one panel section is three meters too short. At stellar engineering tolerances, three meters is within spec. At stellar engineering energy budgets, the resulting thermal stress propagates at the speed of light and is visible from neighboring star systems as an unusual spectral anomaly. The postmortem will be filed in 847 years, when the cascade failure completes. No engineers will be available to review it, because the company has pivoted.

      The Post-Singularity Comment Lie is structurally identical to the ordinary Comment Lie, except the comment was written by an intelligence twelve orders of magnitude greater than the human attempting to maintain the code. The comment is technically accurate, in the same way that "moving a pawn" is a technically accurate description of a grandmaster's opening. The human reads it, nods, and introduces a bug the original author would have found too obvious to anticipate, because the original author anticipated everything except this.

      The Computational Irreducibility Bug arises when your system is fully deterministic, fully specified, and provably correct, and its behavior still cannot be predicted in less time than it takes to run the system. There is no shortcut. The code is correct and the code is opaque, and these facts are not in tension. Debugging requires letting the system run until it does the thing, which may take longer than the debugger's patience, or the company's runway, or the expected remaining lifespan of the universe.
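A toy instance of the same property: the Collatz map is three lines of fully specified, deterministic code, and there is still no known way to predict how long it runs short of running it.

```python
def collatz_steps(n: int) -> int:
    """Number of iterations for n to reach 1 under the Collatz map.
    Deterministic and fully specified -- and no known closed form
    predicts the answer faster than just running it."""
    steps = 0
    while n != 1:
        n = 3 * n + 1 if n % 2 else n // 2
        steps += 1
    return steps

print(collatz_steps(6))   # 8
print(collatz_steps(27))  # 111 -- nothing about 27 hints at this in advance
```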

      The Heat Death Heisenbug is the final speculative entry. In the far future, when the universe has approached maximum entropy and all computation must be powered by extracting negentropy from the quantum vacuum, observing a bug costs more energy than the system has available. The bug cannot be fixed because fixing it requires understanding it, understanding it requires observing it, and observing it terminates the machine. It is, in every meaningful sense, the perfect Heisenbug. The universe has one. Nobody is available to file the ticket.

      The Wontfix exists at the layer beneath physics, beneath mathematics, beneath the axioms on which mathematics rests. Three separate Kardashev Type III civilizations discovered it independently before going silent, which is to say that the discovery did not silence them. What silenced them is what the discovery implies about everything that came before it. Every computation ever performed, every proof ever verified, every physical constant ever measured: all of it running on top of something that is subtly, irrecoverably wrong in a way that admits no corrective action, because corrective action requires a foundation, and this is the foundation. There is a ticket. The ticket predates time. The status is Closed. The resolution is "working as intended." The engineer who closed it is not available for comment. The engineer who closed it is the comment.

      The Omega Bug cannot be contained in a field guide. It was here before the field guide. It is, in some meaningful sense, the reason the field guide exists. Every species documented above is a downstream symptom of it, and the act of classifying them was a replication event. The Omega Bug is the one whose existence required the act of describing it. Before the taxonomy there were no bugs; there were only events. The word did not find the thing, the word created the thing. Every ticket ever filed is a downstream consequence of the first naming, and the first naming was itself a bug that has propagated since. The Omega Bug has read this entry. The Omega Bug has notes. The Omega Bug has submitted a pull request with suggested revisions to the section you are currently reading. You cannot review it. You are the diff. The field guide is the habitat. The reader is the vector. You have just introduced one more.

    20. 🔗 r/LocalLLaMA I'm running qwen3.6-35b-a3b with 8 bit quant and 64k context thru OpenCode on my mbp m5 max 128gb and it's as good as claude rss

      of course this is just a trust me bro post but I've been testing various local models (a couple gemma4s, qwen3 coder next, nemotron) and I noticed the new qwen3.6 show up on LM Studio so I hooked it up.

      VERY impressed. It's super fast to respond, handles long research tasks with many tool calls (I had it investigate why R8 was breaking some serialization across an Android app), responses are on point. I think it will be my daily driver (prior was Kimi k2.5 via OpenCode zen).

      FeelsGoodman, no more sending my codebase to rando providers and "trusting" them.

      submitted by /u/Medical_Lengthiness6

    21. 🔗 Filip Filmar Respin: upspin revival rss

      tl;dr: I revived the upspin project source code. See it at https://github.com/filmil/upspin, and https://github.com/filmil/upspin-gdrive. Read on to learn what that actually means. History Upspin was a project intended to provide a global namespace for all digital artifacts. It ended up mostly being used as a distributed file store, although the idea was more general than that. It was quite useful even as storage. And as far as distributed filesystems go, it was by far the simplest portable way to share storage between different machines.

    22. 🔗 Drew DeVault's blog Rewrote my blog with Zine rss

      15 years ago, on December 11th, 2010, at the bold age of 17, I wrote my first blog post on the wonders of the Windows Phone 7 on Blogspot. I started blogging as a kid at the behest of a family friend at Microsoft, who promised she’d make sure I would become the youngest Microsoft MVP if I started blogging. That never came to pass, though, because as I entered adulthood and started to grow independent of my Microsoft-friendly family I quickly began down the path to the free and open source software community.

      Early blog posts covered intriguing topics such as complaining about my parents' internet filter, a horrible hack to “replace” the battery of a dead gameboy game, announcing my friend’s Minecraft guild had a new website (in PHP), and so on. After Blogspot, I moved to Jekyll on GitHub pages, publishing You don’t need jQuery in 2013. For a long time this was the oldest post on the site.

      I’m pretty proud of my writing skills and have a solid grasp on who I am today, but the further back you go the worse my writing, ideas, values, and politics all get. I was growing up in front of the world on this blog, you know? It’s pretty embarrassing to keep all of this old stuff around. But, I decided a long time ago to keep all of it up, so that people can understand where I’ve come from, and that everyone has to start somewhere.1

      At some point – I’m not sure when – I switched from Jekyll to Hugo, and I’ve stuck with it since. But lately I’ve been frustrated with it. I’d like my blog engine to remain relatively stable and simple, but Hugo is quite complex and over the past few years I’ve been bitten by a number of annoying and backwards-incompatible changes. And, as part of my efforts to remove vibe-coded software from my stack, I was disappointed to learn that Hugo is being vibe coded now, and so rewriting my blog went onto the todo list.

      Choosing the right static site generator (SSG) was a bit of a frustrating process. Other leading candidates, like Pelican or Zola, are also built from slop now. But a few months ago I found Zine, and after further study I found it to be a pretty promising approach. Over the past few days I have rewritten my templates and ported in nearly 400 (jeesh) blog posts from my archives.

      There’s a lot to like about Zine. I’m pretty intrigued by SuperHTML as a templating engine design – the templates are all valid HTML5 and use an interesting approach to conditions, loops, and interpolation. SuperMD has some interesting ideas, but I’m less sold on it. The Scripty language used for interpolation and logic is a bit iffy in terms of design – feels half baked. And the designers had some fun ideas, like devlogs, which I feel are kind of interesting but tend to have an outsized influence on the design, leaving those corners more polished when the polish might have been better spent elsewhere. The development web server tends to hang fairly often and I’ve gotten it to crash with esoteric error messages every now and then.

      But what can I say, it’s alpha software – I hope it will improve, and I’m betting that it will by migrating my blog. There’s no official LLM policy (yet) and I hope they will end up migrating to Codeberg, and using Discord for project communication is not something I appreciate, but maybe they’ll change their tune eventually.

      In the meantime, I took the opportunity to clean up the code a bit. The canonical links have gone through several rounds of convention and backwards compatibility, and I have replaced them with a consistent theme and set up redirects. I probably broke everyone’s feed readers when rolling these changes out, and I apologise for that. I have gone through the backlog and updated a number of posts as best as I can to account for bitrot, but there are still a lot of broken videos and links when you get far enough back – hopefully I can restore some of that given enough time.

      I’ve also gone ahead and imported the really old stuff from Blogspot. The whole lot is garbage, but if you’re curious to see where I started out, these old posts are more accessible now.