🏑


  1. July 05, 2026
    1. πŸ”— HexRaysSA/plugin-repository commits sync repo: +1 plugin, +1 release rss
      sync repo: +1 plugin, +1 release
      
      ## New plugins
      - [IDA-Discord-RPC](https://github.com/reversedcodes/ida-rpc) (1.1.0)
      
    2. πŸ”— Simon Willison sqlite-utils 4.0rc2, mostly written by Claude Fable (for about $149.25) rss

      I wrote about the sqlite-utils 4.0rc1 release a couple of weeks ago. Since we only have Claude Fable on our Max subscriptions for a few more days, I decided to see if it could help me get to a 4.0 stable release that I felt truly comfortable about, since I try to keep to SemVer and like my incompatible major versions to be as rare as possible.

      I started with this prompt, in Claude Code for web on my iPhone:

      Final review before shipping a stable 4.0 release - very important to spot any last minute things that would be a breaking change if we fix them later

      Here's that initial report it created for me. There were some significant problems that I hadn't myself encountered yet - 5 that Fable categorized as "release blockers". Here's the worst of the bunch:

      1. delete_where() never commits and poisons the connection (data loss)

      Table.delete_where() (sqlite_utils/db.py:2948) runs its DELETE via a bare self.db.execute() with no atomic() wrapper β€” compare Table.delete() at db.py:2944, which wraps correctly. The connection is left in_transaction=True, so every subsequent atomic() call takes the savepoint branch (db.py:430-440) and never commits either.

      Reproduced end-to-end:

      db = sqlite_utils.Database("dw.db")
      db["t"].insert_all([{"id": i} for i in range(3)], pk="id")
      db["t"].delete_where("id = ?", [0])   # conn.in_transaction is now True
      db["t"].insert({"id": 50})
      db["u"].insert({"a": 1})
      db.close()
      # Reopen: rows are [0, 1, 2] β€” the delete, row 50, AND table u are all gone.

      That's a really bad bug! Very glad I didn't ship that, although at least it would have been a bug I could fix in a 4.0.1 point release, not a design flaw that would force a 5.0.

      Over the course of 37 prompts, 34 commits and +1,321 -190 code changes over 30 separate files, we worked through the entire set of feedback in turn, making several other design improvements along the way.

      A weird thing about coding agents is that harder tasks like this one actually provide more opportunity to do other things at the same time, since the agent sometimes needs 10-15 minutes to churn away on a new task. I went out to enjoy the Half Moon Bay 4th of July parade, occasionally checking in and prompting the next step for Fable from my phone.

      Full details in the PR and this shared transcript. I switched to my laptop for the final review, which I conducted through GitHub's PR interface.

      The most significant changes relate to transaction handling, which was the signature new feature in the earlier RC. The new RC now includes comprehensive documentation on the new transaction model, the intro to which I'll quote here in full:

      Every method in this library that writes to the database - insert(), upsert(), update(), delete(), delete_where(), transform(), create_table(), create_index(), enable_fts() and the rest - runs inside its own transaction and commits it before returning. Your changes are saved to disk as soon as the method call finishes:

      db = Database("data.db")
      db.table("news").insert({"headline": "Dog wins award"})
      # The new row is already saved - no commit() required

      The same applies to raw SQL executed with db.execute() - a write statement is committed as soon as it has run.

      You never need to call commit(), and you do not need to close the database to persist your changes. There are exactly two situations where you need to think about transactions:

      1. You want to group several write operations together, so they either all succeed or all fail - use db.atomic().

      2. You are managing a transaction yourself with db.begin(), in which case nothing is committed until you commit - the library will never commit a transaction you opened.

      In reviewing Fable's documentation - I find that reviewing the documentation edits first is an excellent way to build an initial understanding of what has changed - I spotted this detail:

      db.atomic() and the automatic per-method transactions are designed for connections in Python's default transaction handling mode. Connections created with the Python 3.12+ sqlite3.connect(..., autocommit=True) or autocommit=False options are not supported, because commit() and rollback() behave differently on those connections.

      I admit I hadn't thought about how sqlite-utils would react to the more recent autocommit setting, added in Python 3.12. It turns out "behave differently on those connections" equated to almost the entire test suite failing, so I worked with the model to ensure that this difference would not break how the library works.

      And a final review by GPT-5.5

      I used to think that the idea of having one model review the work of another was somewhat absurd - it felt weirdly superstitious. The problem is it really does work - I've started habitually having Anthropic's best model review OpenAI's work and vice versa, because I've had that turn up interesting results often enough to be valuable.

      I prompted Codex Desktop and GPT-5.5 xhigh with the following:

      Review changes since the last RC. Also confirm that the changelog is up-to-date.

      Which was enough to turn up two issues worth investigating:

      Findings

      • [P1] sqlite_utils/db.py:663 db.query() now rejects non-row statements only after calling db.execute(), and sqlite_utils/db.py:705 auto-commits those writes first. So db.query("update ...") raises ValueError but the update is already committed. That is a surprising side effect for a method documented as β€œcan only be used with SQL that returns rows.”
      • [P1] sqlite_utils/db.py:672 INSERT ... RETURNING through db.query() only commits after the returned generator is fully exhausted. db.query("insert ... returning ...") without iteration, or common next(db.query(...)) usage, leaves the transaction open and the write can be rolled back on close. This contradicts docs/changelog.rst:15 and docs/python-api.rst:232, which say it takes effect without iteration.

      I pasted that into a fresh Fable session, which ran some experiments to confirm the problem:

      Both findings were confirmed. db.query() called self.execute() first, which auto-commits writes, and only then checked cursor.description β€” so db.query("update ...") committed the update before raising ValueError. And the INSERT ... RETURNING commit lived at the end of the returned generator, so it never fired unless you exhausted the iterator β€” next(db.query(...)) or an un-iterated call left the transaction open, contradicting what the changelog and docs promise.

      Here's the PR with the fix, and the full Claude Code transcript. Reviewing this code helped me build a better mental model of the edge cases of SQLite transaction semantics!

      For an estimated (unsubsidized) cost of $149.25

      I upgraded to the Claude Max $200/month plan (I was previously on $100/month) to increase my Fable allowance for the remaining time until the July 7th Fablepocalypse, when even Claude Max subscribers will have to pay full API cost for the model.

      I was curious as to how much this would have cost me if I had been paying those costs directly. At first I thought those numbers weren't available to me since I had run the work remotely using Claude Code for web, and then I realized I could run AgentsView inside that existing session to get that cost estimate!

      Run "uvx agentsview --help"​ and then use that tool to calculate the cost of this session

      Claude figured out how to use the session list --include-children command and came out with the following:

      Transcript Model Cost
      Main session claude-fable-5 $141.02
      API-surface sweep agent claude-fable-5 $2.40
      Transactions/atomic review agent claude-fable-5 $2.39
      Post-rc1 commits review agent claude-fable-5 $1.72
      Migrations review agent claude-fable-5 $1.40
      Prompt-counting agent claude-opus-4-8 $0.32
      Total $149.25

      I'm very glad I'm on that subscription! I really should have followed my own advice and leaned more heavily into subagents with cheaper models.

      Here's what claude.ai/settings/usage is showing me right now:

      Screenshot of a Claude plan usage limits panel: "Plan usage limits Max (20x)"; "Current session" with "Resets in 3 hr 52 min" showing a progress bar at "7% used"; "Weekly limits" heading with a "Learn more about usage limits" link; "All models" with "Resets Wed 12:00 PM" showing a progress bar at "32% used"; "Fable" with "Resets Wed 12:00 PM" showing a progress bar at "63% used".

      I have several other major Fable-driven projects on the go right now as well, with the goal of hitting 100% on that Fable bar just in time for the price increase.

      The full release notes for sqlite-utils 4.0rc2

      Here are the full release notes for the RC. I had Fable add these to an "Unreleased" section of the changelog as each change landed, reviewing them as it went. This has the neat side effect that the commit history of the changelog acts as a concise summary of each of the changes that went into the release.

      In the past I've had a policy of writing release notes by hand, but honestly these are better than I would have created myself. Release notes are a great example of writing that I'm OK to outsource to agents because they need to be boring, predictable and accurate.

      Breaking changes:

      • Write statements executed with db.execute() are now committed automatically, unless a transaction is already open in which case they join it. Previously they opened an implicit transaction that stayed open until something committed it - writes appeared to work when read on the same connection but were silently rolled back when the connection closed. Code that relied on rolling back uncommitted db.execute() writes should use the new db.begin() method to open an explicit transaction first. The transaction model is documented in full at Transactions and saving your changes.
      • db.query() now executes its SQL as soon as it is called, rather than waiting until the returned generator is first iterated. Rows are still fetched lazily during iteration. SQL errors are now raised at the call site, statements such as INSERT ... RETURNING are executed and committed immediately without needing to iterate over their results, and passing a statement that returns no rows - previously a silent no-op - now raises a ValueError recommending db.execute() instead. A statement rejected this way is rolled back before the error is raised, so it has no effect on the database.
      • Python API validation errors now raise ValueError instead of AssertionError. Previously invalid arguments - such as create_table() with no columns, transform() on a table that does not exist, or passing both ignore=True and replace=True - were rejected using bare assert statements, which are silently skipped when Python runs with the -O flag. Code that caught AssertionError for these cases should catch ValueError instead.
      • table.upsert() and table.upsert_all() now raise PrimaryKeyRequired if a record is missing a value for any primary key column, or has a value of None for one. Previously such records - which can never match an existing row - were quietly inserted as brand new rows, or triggered a confusing KeyError after the insert had already taken place.
      • db.enable_wal() and db.disable_wal() now raise a sqlite_utils.db.TransactionError if called while a transaction is open. Previously they would silently commit the open transaction as a side effect of changing the journal mode, breaking the rollback guarantee of db.atomic() and of user-managed transactions.
      • The View class no longer has an enable_fts() method. It existed only to raise NotImplementedError, since full-text search is not supported for views - calling it now raises AttributeError instead, and the method no longer appears in the API reference. The sqlite-utils enable-fts command shows a clean error when pointed at a view.
      • The no-op -d/--detect-types flag has been removed from the insert and upsert commands. Type detection has been the default for CSV/TSV data since 4.0a1, so the flag did nothing - invocations using it should simply drop it. --no-detect-types remains available to disable detection.
      • Database() now raises a sqlite_utils.db.TransactionError if passed a connection created with the Python 3.12+ sqlite3.connect(..., autocommit=True) or autocommit=False options. commit() and rollback() behave differently on those connections, which previously caused every write made by the library to be silently discarded when the connection closed.

      Everything else:

      • Fixed a bug where table.delete_where(), table.optimize() and table.rebuild_fts() did not commit their changes, leaving the connection inside an open transaction. Their work - and any subsequent writes - could then be silently rolled back when the connection was closed. All three now use db.atomic(), consistent with the other write methods.
      • The sqlite-utils drop-table command now refuses to drop a view, and drop-view refuses to drop a table. Previously each would silently drop the wrong type of object if the name matched. Both now exit with an error suggesting the correct command to use.
      • Migrations applied by the new migrations system now run inside a transaction, together with the record of the migration having been applied. If a migration raises an exception its changes are rolled back and it stays pending, so it can be safely re-applied after the error is fixed. Migrations that cannot run inside a transaction, such as those executing VACUUM, can opt out using @migrations(transactional=False) - see Migrations and transactions.
      • table.upsert() and table.upsert_all() now detect the primary key or compound primary key of an existing table, so the pk= argument is no longer required when upserting into a table that already has a primary key.
      • db.table(table_name).insert({}) can now be used to insert a row consisting entirely of default values into an existing table, using INSERT INTO ... DEFAULT VALUES. (#759)
      • Improvements to the sqlite-utils migrate command: --stop-before values that do not match any known migration are now an error instead of being silently ignored, --stop-before now works correctly with migration files that still use the older sqlite_migrate.Migrations class, and --list is now a read-only operation that no longer creates the database file or the migrations tracking table. migrations.applied() now returns migrations in the order they were applied.
      • New db.begin(), db.commit() and db.rollback() methods for taking manual control of transactions, as an alternative to the db.atomic() context manager.
      • New documentation: Transactions and saving your changes describes how transactions work and when changes are committed, and a new Upgrading page details the changes needed to move between major versions.

      You are only seeing the long-form articles from my blog. Subscribe to /atom/everything/ to get all of my posts, or take a look at my other subscription options.

  2. July 04, 2026
    1. πŸ”— idursun/jjui v0.10.8 release

      A release with a new revision insertion workflow and fixes for command error output and inline describe editing.

      Features

      Create Changes Between Revisions

      Added a new revisions.open_new_between action, bound by default to alt+n, for creating a new change between revisions.

      The operation uses the selected revisions as --insert-after targets and the current revision as the default --insert-before target. You can press space to pin one or more explicit insert-before revisions before applying with enter.

      For example, if A and B are selected as insert-after revisions and C is the insert-before revision, applying the operation runs:

      jj new --insert-after A --insert-after B --insert-before C
      

      When the same single revision is both the insert-after and insert-before target, jjui falls back to:

      jj new A
      

      Fixes

      Command Error Output Is Preserved

      Fixed failed command handling so non-zero command output is preserved and shown correctly after removing the confusing generic ssh.hijack_askpass hint.

      (#704)

      Inline Describe Cursor Movement

      Fixed inline describe rendering so the editor synchronizes its size before handling wrapped cursor movement. This prevents incorrect cursor/line behavior when editing descriptions in constrained layouts.

      Full Changelog : v0.10.7...v0.10.8

    2. πŸ”— smol-machines/smolvm smolvm v1.4.5 release

      What's Changed

      • fix(registry): never evict or size-count in-flight .partial downloads by @BinSquare in #558
      • Add a node serve route to export a stopped machine to a .smolmachine by @BinSquare in #556
      • release: smolvm 1.4.5 by @BinSquare in #559

      Full Changelog : v1.4.4...v1.4.5

    3. πŸ”— smol-machines/smolvm smolvm v1.4.4 release

      What's Changed

      • Add a workflow to publish the library crates to crates.io by @BinSquare in #555
      • Pass the real disk format to libkrun in the dynamic launcher instead of assuming raw by @BinSquare in #546
      • Release smolvm 1.4.4 by @BinSquare in #557

      Full Changelog : v1.4.3...v1.4.4

    4. πŸ”— r/reverseengineering BareMetal RAM Dumper β€” Bare-metal x86 tool for Cold Boot Attack experiments rss
    5. πŸ”— BarutSRB/OmniWM OmniWM v0.5.3.2 release

      What's New Since 0.5.3.1

      • Fixed a launch crash on macOS 26.x caused by the issue reporter's FoundationModels integration binding to a macOS 27-only symbol before OmniWM could start.
      • Kept manual issue reporting available on macOS 26.x while AI issue rewriting now appears only on macOS versions with the required FoundationModels runtime.
      • Cleaned up System Stats view formatting.

      Release Integrity

      • OmniWM-v0.5.3.2.zip contains the Developer ID signed, notarized, and stapled OmniWM app.
      • OmniWM-v0.5.3.2.zip SHA-256: 1642c3053db962efcf4728fcd9a33c797b36ba750816653d61a132c66f889192
      • GhosttyKit.xcframework-v0.5.3.2.zip SHA-256: fabf2ddb360a51e5908306f4ac2d426a73329b8723784255c40ea77963435f4f
    6. πŸ”— r/reverseengineering My IDA ghost stopped just reading walls and started walking through them it now emulates/hooks/fuzzes what it names [SpectrIDA] SELF PROMO (again, but i promise its worth it) rss
    7. πŸ”— Register Spill Joy & Curiosity #90 rss

      … aaaaand we're back, ladies and gentlemen! That was a nice three week break and I've thought quite a bit about what to do with this newsletter when I come back to it.

      The result: I'm going to double down on having fun with it. I think I've gotten into a bit of a rut, after trying to follow the recipe every week for years.

      I don't know what that means exactly, but hey, you've been warned.

      • We released Agents in Orbs this week. This has been a long time in the making and now it's finally out and people are already starting to say "in an orb" as if it's a common phrase, I love it. I truly believe that remote agent-- excuse me: agents in orbs, will play a big role in the future. Why? One reason is that these models are incredible when thrown into a sandbo-- I'm sorry: an orb. I mean, look at this. There aren't any magic strings being pulled behind the curtain. If there's ffmpeg, a model will find a way. These agents need less and less handholding and that includes the handholding by a bespoke development setup. They're productive in these remote machines. The second reason is that something changes when you can start many agents in many different orbs in parallel. I tried to articulate that in that post up there but based on the conversations I had in response, I think it's something you really have to try for yourself. But I can add that the more I use agents in orbs, the more I believe that thinking of remote agents as "agents that I can remote control on a different machine that's similar to my local machine" is the wrong way to look at it. The fact that the orbs are ephemeral changes what you do and how you do it. Just like switching from a single build server to a build system with VMs changes things. State is no longer an issue. Resources and runtime is no longer an issue. These agents in orbs now look like async functions to me, less like remote controlled agents. Async is the point. I now often end prompts with "… and now run all the tests, fix all the bugs you run into, then push" and then switch to another agent. It's very, very interesting and exciting to try to get them to do more and more in orbs and see how it changes your interaction with agents.

      • I also wrote about how we made our codebase work with agents and how we made it work with agents in orbs so that they can do a full end-to-end runthrough of our core workflow inside an orb and then present a screenshot: Putting an Agent in an Orb.

      • At Amp we want Freedom of Intelligence.

      • The results of the Twenty Ninth International Obfuscated C Code Contest are in and interesting as always. Take a look at the Hacker News comments. They contain some gems. This one here, for example, a comment by the author of the "GameBoy emulator's code [that] also looks like the GameBoy": "I first wrote a full Gameboy emulator in C. It started out at about 6000 non white space characters. I then spent about about 100 hours work trying to get it to fit into the 2503 limit. For a long time I wasn't sure it was going to fit."

      • Long and very good post about Turbopuffer: Inside the fastest-growing Canadian AI startup you've never heard of. More of this! The form already exists, of course: new startup, short profile of founders, how they grew, how they blew up, etc. But this feels more nuanced and deeper. Good stuff.

      • Ethan Mollick on working with Mythos: "Last year I called this working with a wizard: you chant the spell and something happens. With Fable the spell has gotten powerful enough that I am no longer sure I am the wizard. I am closer to a patron. I describe what I want, I pay for it, and I judge the result. The conjuring happens somewhere I cannot watch, in hundreds of small choices I never get a vote on. The work has shifted from process to outcome. I no longer steer; I commission."

      • Very thorough post on agents and how models have been trained to be agents: Agents need Work Data. Yours truly makes an appearance in a quote.

      • The End of Determinism. I like the phrase The End of Determinism.

      • Definitely not in the Joy column, but it's a good post: LLMs are eroding my software engineering career and I do not know what to do.

      • 19min talk by Tyler Cowen that neatly summarizes a lot of things I've heard him say or write in other forms: AI will improve our economy, but will we let it?

      • Tim Ferriss on whether AI has already killed how-to nonfiction: "My position--and I'd genuinely love to be wrong--is yes, prescriptive nonfiction is the canary in the coal mine, and the coal mine is enormous. I believe LLMs become the interface to everything: search and purchasing, obviously, but also surfing video, summarizing podcasts, navigating courses, even browsing books. The original content doesn't exactly disappear; it just becomes raw material that most people never touch directly." But the post is deeper than this quote might suggest. Highly recommend reading it. It's great.

      • Very, very, very, very good: Why I Stopped Arguing With People. Read this and if you find it even slightly interesting or even irritating, I recommend reading How to Win Friends and Influence People. Being right isn't valuable most of the time.

      • Giant Banana Pulled Over in Montana. I wish I had a giant banana car.

      • Valve announced the launch of the Steam Machine and here's what I wrote in our internal Slack: "I'm 100% serious with this: This is incredible writing. This is perfect. This is how to talk to technical customers." And after that I wrote ten more lines. I also had a fever at the time, but I think that's unrelated. It's fantastic writing.

      • "Something that keeps me up at night is the amount of miscompiled software running in the wild. There's a famous story, 'the Core 59 problem' from Facebook where seemingly random files were missing in one of their Spark databases. After herculean levels of debugging, they narrowed the problem down to a single worker box, on a single CPU core, that was literally doing math wrong. The initial bug-reproducer was 430k(!) lines of code. Eventually they managed to create a 60-line snippet of assembly that reproduced the issue 100% of the time. […]" And now it'll keep us awake at night too.

      • "Elderflower is an experimental, open source, independent single-user desktop

      OS. Linux kernel, musl libc, no existing distribution underneath. The userland is TypeScript. Apps are written in TypeScript and WASM, no native binaries. The system shell is a TypeScript REPL. All system APIs (filesystem, network, audio, app UI toolkit) are TypeScript." Watch the video here.

      • Voytek Pitula's Fintech Engineering Handbook. Lovely resource.

      • "During the first two centuries of the Roman Empire, there spread a practice known as hypomnΔ“mata, a type of notetaking system, used as a tool for meditation, in which the writer would store quotes from books they had read. Each day, often in the morning, the notetaker would open their notebook and look for a passage relevant to something they were struggling with, and then they would meditate on that--unpacking it, making the idea top of mind, ensuring it was alive in them." I'd love to do this, so I set myself the goal to collect these quotes (or videos, or tweets, or articles) and then realized: wait, that's this newsletter, isn't it?

      • This was fantastic and I wish it had been twice as long: Stop eating Lady Gaga's Oreos. It's very good. I just went through it again, trying to pick out a single quote to convince you to read it, but it's very hard. Go read it, please.

      • The Original Computer Art? Lissajous Figures Created with Oscilloscopes

      • Nabeel S. Qureshi's Principles: "A cursed fact of the world is that the most important life lessons you learn are the hardest to communicate to others. They always sound like cliches. In any case, these are a few things I've learned from experience and that I try and keep in mind." Great.

      • As someone who also loves performance optimizations, I enjoyed this very much: When Impressive Performance Gains Do Not Matter.

      • Here's a very nice pairing for you. First, watch this interview with F1 world champion Nico Rosberg. It's very honest and deep and reflective. Fascinating to hear Rosberg talk about enduring (or not) the pressure that F1 drivers are under, how he handled (or didn't) the expectations, how even when you're one of the world's best race car drivers you don't think "life is amazing!" but instead you might only feel anxiety and loneliness and not good enough. It's great and it made me wish I could talk to Rosberg and ask him some things.

      • And then watch this interview with Tom Brady. It's a fascinating conversation, both because I find what they talk about interesting (how to set a high bar in a team, "do your fucking job", "you work Monday to Saturday really, really hard, so Sunday is easy") but also because Brady is incredibly eloquent and the meta-questions of how and why he answers specific questions in certain ways are interesting. But it pairs nicely with the Rosberg interview, because Brady seems to be the complete opposite of Rosberg. Brady is incredibly confident, his mental game is his advantage; the guy sounds like he never once doubted whether he can be the greatest of all time. Rosberg on the other hand, who also became a world champion, talks about how he's very defensive, often has doubts, struggles with anxiety, and so on. And yet he also was world champion.

      • After 9 years, I changed my avatar. Strange feeling. I've followed many, many people over multiple decades on different platforms and some of them I can't picture any other way than their avatar. They are their avatar in my mind. But I felt I was kinda cheating by not weaing glasses in mine (I started wearing them in the year the original avatar photo was taken), so I recreated it and here we are. Also: wow, so many platform where one uploads their avatar.

      Like the avatar? Subscribe here:

    8. πŸ”— smol-machines/smolvm smolvm v1.4.3 release

      What's Changed

      New Contributors

      Full Changelog : v1.4.2...v1.4.3

    9. πŸ”— HexRaysSA/plugin-repository commits sync repo: +1 release rss
      sync repo: +1 release
      
      ## New releases
      - [DriverBuddyReloaded](https://github.com/voidsec/driverbuddyreloaded): 2.4.0
      
    10. πŸ”— Armin Ronacher Better Models: Worse Tools rss

      A very strange Pi issue sent me down a rabbit hole over the last two days. The short version is that newer Claude models sometimes call Pi's edit tool with extra, invented fields in the nested edits[] array. And not Haiku or some small model: Opus 4.8. The edit itself is usually correct but the arguments do not match the schema as the model invents made-up keys and Pi thus rejects the tool call and asks to try again.

      That alone is not too surprising as models emit malformed tool calls sometimes. Particularly small ones. What surprised me is that this is getting worse with newer Anthropic models as both Opus 4.8 and Sonnet 5 show it but none of the older models. In other words, the SOTA models of the family are worse at this specific tool schema than their older siblings.

      In case you are curious about Fable: I intentionally did not test it because I was not sure if the classifiers they are running might downgrade me to Opus silently.

      Tool Calls Are Text

      If you have not spent too much time looking at LLM tool calling internals, the important thing to understand is that tool calls are not magic and use some rather crude in-band signalling. The model receives a transcript, a system prompt and a list of available tools. The server munches that into a large prompt with special marker tokens. Because the model was trained and reinforced on examples of that format, at some point during generation it emits something that the API or client interprets as "call this tool with these arguments".

      For a file edit tool, the intended invocation payload might say something like this:

      {
        "path": "some/file.py",
        "edits": [
          {
            "oldText": "text to replace",
            "newText": "replacement text"
          }
        ]
      }
      

      A harness then validates the arguments, performs the edit, and feeds the result back into the model. If validation fails, the model sees an error and usually tries again.

      How exactly that formatting happens is not known for the Anthropic models, but some people have gotten out "ANTML" markers and they at times do leak also into public communications. To the best of my knowledge, the call above would come out serialized like this from the model:

      <antml:function_calls>
        <antml:invoke name="edit">
          <antml:parameter name="path">some/file.py</antml:parameter>
          <antml:parameter name="edits">
      [
        {
          "oldText": "text to replace",
          "newText": "replacement text"
        }
      ]
          </antml:parameter>
        </antml:invoke>
      </antml:function_calls>
      

      An important thing to note here is that this thing, while looking like XML, is not really XML. It's just a thing they found convenient to tokenize and train on. The other thing to note is that a basic top-level string parameter appears in-line whereas an array of objects is implemented via JSON serialization. While I'm not entirely sure that this is how it works, there are some indications that this is not too far off. This will become relevant later.

      There are two very different ways to make the model produce a structure like this:

      1. You can ask the model to produce valid JSON matching a schema and then validate it afterwards.
      2. You can constrain the sampler so that invalid JSON, or even invalid schema shapes, cannot be sampled in the first place.

      The second approach is what people usually refer to as grammar-aware or constrained decoding. The sampler masks out tokens that would violate the grammar. If the model is currently inside a JSON object and the schema says only oldText and newText are allowed, the sampler can prevent it from emitting "in_file" or "type". Grammar-aware decoding can be used both to constrain something to be syntactically valid JSON and also to enforce specific enum values or keys.

      Without any form of constraints the model is merely following a learned convention.

      The Failure

      Pi's edit tool supports multiple exact string replacements in one call. That is why the arguments contain an edits array. In the failing cases the model produces entries like this:

      {
        "oldText": "...",
        "newText": "...",
        "requireUnique": true
      }
      

      or this:

      {
        "oldText": "...",
        "newText": "...",
        "oldText2": "",
        "newText2": ""
      }
      

      Across repeated trials I saw a whole zoo of invented trailing keys: type, id, kind, unique, requireUnique, matchCase, in_file, forceMatchCount, children, notes, cost, oldText2, newText2, oldText_2, newText_2, and even an event.0.additionalProperties key inside the edit object itself.

      The most annoying part is that the actual oldText and newText payloads were byte-correct in the invalid calls I inspected. The model had in fact produced the right invocation but then added nonsense at the end of the object.

      The failure is also heavily context-dependent. A fresh single-turn prompt like "edit this file" did not reproduce it at all for me. An agentic history where the model had read files, diagnosed a problem and then composed a multi-line edit could reproduce it. And more annoyingly, not all transcripts will show that behavior. In fact, I needed Petr Baudis's transcripts to reproduce this for me at all! In that user's session continuing the session caused Opus 4.8 to fail around 20% of the time. Stripping thinking blocks from history reduced the failure rate by half. Turning on strict tool invocation eliminated it in my runs.

      Why It's Getting Worse

      My strongest hypothesis is that this is not random deterioration but a training artifact.

      When older Anthropic models were trained, they were trained on some tools (some of which were documented). But that training did not yet have a user- shipped harness like Claude Code as the obvious target. Modern Anthropic models are most likely different because their post-training includes Claude Code or a harness that looks very similar. The model learns what a successful tool call looks like in that environment. It also learns what mistakes are tolerated by that environment.

      Claude Code's own tools are comparatively flat. The ordinary edit tool is not Pi's nested edits[] shape; it is closer to file_path, old_string, new_string, and an optional flag (replace_all). Looking at Claude Code's client is very instructive: it contains retry paths for malformed tool use, parameter aliases, type coercions, Unicode repairs and filtering of unknown keys. In other words, Anthropic's own client appears to expect and accept a fair amount of slop and repairs it, mostly silently.

      If reinforcement learning happens in a harness like that, or a simulation of one, then slightly malformed tool calls can still complete the task and receive reward. The harness fully absorbs the error and there is little gradient against inventing an alias, adding a stray field or using a nearby parameter name.

      Worse, the model may become very strongly adapted to the canonical Claude Code edit tool shape. A different harness can present a tool with the same semantic intent but a different schema. Such a tool can increasingly be off- distribution. The better-trained model might actually fight you harder because its prior is stronger.

      This is not too surprising, but it is a change from how this was a few months ago. When Opus 4.5 launched, it adapted to other edit tools exceptionally well. In fact, I was pretty convinced that we're on a good path where the models are more likely to adapt to any sort of tool shape that comes around for as long as the instructions are good.

      Now I'm somewhat worried about the track we're on here. Alternative tool schemas might not just be unfamiliar. They might be implicitly punished by post-training that optimizes for one particular, forgiving tool ecology. And that ecology is not documented. While there is a text editor tool that is documented, you will see that this format is in fact not followed by Claude Code. What Claude Code does internally (which is a closed- source harness) is hidden from you.

      The Slop Harness

      Claude Code is obviously closed-source but we can look at the minified code and get some idea of what it does. And honestly, it's very forgiving of incoming data.

      For a start, Claude Code checks the model's visible text for leaked <invoke markup. It also emits some telemetry when that happens and then it has its own state machine to retry such bad calls by pushing back to the model.

      It has explicit Unicode escape repair which fixes broken \uXXXX sequences and lone surrogates in string values. It also has per-tool aliases for parameters. For instance, Edit accepts old_str (presumably from the times when the models were trained on the officially documented text editor tool), the newer old_string from the schema, new_str/new_string, path as an alias for file_path, and some more.

      It also silently filters out unexpected keys and it does not use strict mode either. The issue with strict mode is that Anthropic applies complexity limits to the tool definitions that cause API requests to fail, so presumably that's why Claude Code does not attempt to use it.

      Strictness

      Will this problem be with us in other harnesses too? One huge issue with Anthropic is that the models are completely closed, and so is the harness. Codex models are also closed, but at least the harness is not. We also have gpt-oss which is at least a bit interesting. The models are explicitly trained to use OpenAI's harmony response format and there is a lot of documentation that at least tells us how OpenAI people think about this.

      Harmony makes channels and tool-call content types part of the prompt format. A function call can look like this:

      <|start|>assistant<|channel|>commentary to=functions.get_weather
      <|constrain|>json<|message|>{"location":"San Francisco"}<|call|>
      

      The important bit is <|constrain|>json. The model can express in-band that this message body is JSON, and an inference stack can use that boundary to switch into JSON-constrained sampling for the body of the tool call. Presumably a bit of this also happens in Anthropic's models, at least in strict mode I would imagine.

      The marker in harmony helps the sampler to detect when it needs to sample with a specific grammar, and because it is part of the transcript, it makes that rather easy to do. For hosted GPT models, there is also an option to provide a LARK grammar for custom tools that need to adhere to something like this.

      Anthropic appears different from that, though maybe not entirely. If an array of objects is represented as JSON, as it appears to be, then the model has to write JSON inside the tool parameter. There is probably basic grammar- constrained sampling going on, and that may partly explain the extra keys. For a nested array parameter, that JSON includes escaped multi-line file content inside string literals, inside one tag. The unexpected, made-up keys appear exactly at the highest-entropy point of that task: after closing a several- hundred-token escaped newText string, where the model must decide } vs , "...".

      Opus 4.8 and Sonnet 5 seem to have much stronger priors about what an edit tool call should look like and that prior appears to be Claude Code's edit schema: a flat old/new string pair, plus the optional replace_all flag. My guess is that Opus has learned that an edit operation may have one extra optional field, but under Pi's nested oldText/newText shape it has no trained name for that field. So it samples a plausible name fresh each time, which is why the failures produce dozens of random keys rather than one stable alias.

      As strict mode in Anthropic appears to fix this, I presume that on the server side they are refusing to sample a key that is not permitted by the JSON schema structure. That would also explain why they have limits to the complexity of the tool definitions when strict mode is enabled.

      So far, the Codex models I tested did not show this type of regression. I tested all available ones except 5.6, which I do not have access to yet.

      What This Means For Harnesses

      The uncomfortable lesson is that tool schemas are not neutral, at least not on Anthropic models. We like to pretend that a schema is an abstract contract and the model is a general reasoner that will follow it, but that might no longer be the case for some of the tools.

      Tool schemas are somewhere in the distribution and some shapes are close to what the model saw during post-training and some are far away. Some are easy for the provider's hidden encoding (e.g. top-level attributes in ANTML), whereas some require the model to write large escaped JSON objects inside nested arrays after long multiline strings. The model may be smart enough to understand the schema and still be bad at sampling the exact shape under pressure.

      If this type of model behavior continues, I wonder what the implications for harnesses are. Obviously one could turn on strict sampling in Anthropic and the problem should go away. On the other hand, that the model has this behavior shows the impact that reinforcement learning has on them. Fighting that prior is probably futile if you want to get the best model performance.

      Right now the reality is that Claude Code is not open source and we cannot really know what they are doing in their RL environments either. We cannot assume Claude-Code-trained behavior will transfer cleanly to your tools unless they are a close match. The more post-training happens inside one dominant harness, the more every other harness will have to inherit its quirks.

      I used to be more skeptical of strict grammar-constrained tool invocation because constrained decoding can have quality tradeoffs. I still think that can be true in general, but this bug moved my priors significantly. If the newest models get better at solving the task while getting worse at faithfully emitting an alternative tool schema, then the harness needs stronger guarantees somewhere.

      If you want to find out more, or you want to discuss this, consider reading the issue on the Pi tracker.

  3. July 03, 2026
    1. πŸ”— IDA Plugin Updates IDA Plugin Updates on 2026-07-03 rss

      IDA Plugin Updates on 2026-07-03

      New Releases:

      Activity:

      • disrobe
        • ccd453f4: fix #33: harden py-decompile against out-of-range jump targets from a…
        • a5e75e06: recover utf-16le decoded strings from the emulator write-log (both al…
        • 78e6592e: condense multi-line prose doc comments to terse single-line contracts…
        • 43ecb16f: recover dense-switch jump tables via a relocation-aware object reader…
        • 0f53c9c1: recover clang branchless absdiff via a pre-sub flag snapshot for cmov…
        • 1014e9e2: harvest decoded strings from the whole emulator write-log, sealing tr…
        • a1d90592: add a whole-program differential oracle for native decompile and reco…
        • 904249ca: grade native decompile multi-block control flow and broaden the pseud…
        • 9b589e80: add an about/cli reference page to the playground, remove the native-…
        • ca0b784d: native decompile recovers direct calls now, modeling callee(args) fro…
        • 21a8af96: add an x86-64 -> rust pseudo-rust backend to native decompile, rustc-…
        • b1648cb6: fix the cross-platform native-decompile ci test, preserve nested f-st…
        • 1323b375: lift roundsd/roundss to c __builtin round and regenerate the stale de…
      • GhidraDec
        • 9dabfb2e: Fix MacOS on IDA 8.3-pro, 8.4-pro
        • 6629cc31: More careful MacOS fix
        • 0e49fd13: Fix MacOS on 7.0 to 7.4 builds
        • 92baabc3: Fix broken 6.8 and enhanced platform selection
        • 4126ee13: Make build skip existing release assets
        • 90670c71: Fix package names, now fully tested and working with old IDA SDKs
        • cdb74d31: Make publish testing possible
        • 73980b4b: Prepare for self-hosted SDK builds in 1.1 for full platform coverage
        • 57d396f1: Automate release for older IDA versions
        • e7550570: Final preparation for release
        • ed4f3bc3: Release change log
        • 6a39f3b7: Full Ghidra 12.1 decompiler options now user configurable
        • 9d78092e: All new Ghidra decompiler options added, more fixes to difficult targ…
        • e03ba067: Final test targets needed for release, better noreturn functionality
      • hrtng
        • 3dc942f1: - Automatically detect and set base class member inside derived class…
      • Kodak-DCS-Tools
      • rikugan
        • e35f1a15: chore(release): bump version to 1.7.0
    2. πŸ”— smol-machines/smolvm smolvm v1.4.2 release

      What's Changed

      • Surface crane's stderr when a layer blob fails to pull by @BinSquare in #544
      • Propagate host file changes into the guest as fsnotify events for -v mounts by @BinSquare in #547
      • Bundle the libkrunfw guest kernel with /proc/smolvm-fsnotify by @BinSquare in #548
      • Bump smolvm to 1.4.2 by @BinSquare in #551

      Full Changelog : v1.4.1...v1.4.2

    3. πŸ”— r/reverseengineering I built an open-source Chromium fork that compiles fingerprint spoofing into the C++ instead of injecting JS rss
    4. πŸ”— BarutSRB/OmniWM OmniWM v0.5.3 release

      Warning

      OmniWM 0.5.3 is superseded by 0.5.3.1. The 0.5.3 app bundle accidentally advertised LSMinimumSystemVersion as 27.0; 0.5.3.1 restores the intended macOS 26.0 floor.

      What's New Since 0.5.2.1

      Status Menu and Workspace Bar

      • Added an inline workspace stats button and system stats popup.
      • Rebuilt the status menu around hosted SwiftUI content, shared model state, and reusable menu components.
      • Registered the secure input indicator with the surface coordinator so it behaves like the rest of OmniWM's overlay surfaces.
      • Consolidated utility-window hosting through HostedWindowPresenter.

      Runtime and Window Management

      • Reworked create placement, column parking, border handling, and reconcile persistence.
      • Made offscreen parks durable and added park visibility diagnostics.
      • Fixed Quick Look placement and focus behavior on multi-monitor setups.
      • Fixed tiled windows dropping to floating during transient AX subrole misreads.
      • Fixed manual float placement using stale tiled frames.
      • Reduced hot-path waste by caching screen coordinate transforms, skipping frame-echo window-server queries during tile animations, avoiding repeated title queries during workspace bar reconcile, and preventing unchanged tracked windows from being re-admitted during rescans.

      Platform and Packaging

      • Raised the supported baseline to macOS 26+ on Apple Silicon.
      • Dropped Intel Mac support and universal builds; OmniWM now ships arm64-only.
      • Aligned GhosttyKit build wiring and release packaging with the arm64 artifact.
      • Removed obsolete config migration machinery and dead legacy paths.
      • Routed border color conversion through SettingsColor and migrated command palette state to Observable.

      Release Integrity

      • OmniWM-v0.5.3.zip contains the Developer ID signed, notarized, and stapled OmniWM app.
      • OmniWM-v0.5.3.zip SHA-256: 77fe911bc493621f67e32c931eab269b8dc1835654f50682bcd474699c4489fe
      • GhosttyKit.xcframework-v0.5.3.zip SHA-256: 9c02f380089e22fdd2eaba2f18e757c886d38b4b0f57de2162c5b1e4394fe830
    5. πŸ”— BarutSRB/OmniWM OmniWM v0.5.3.1 release

      What's New Since 0.5.3

      • Fixed the app bundle minimum system version so OmniWM 0.5.3.1 launches on macOS 26 Tahoe as intended.
      • Supersedes 0.5.3, whose app bundle accidentally advertised macOS 27.0 as the minimum system version.

      Release Integrity

      • OmniWM-v0.5.3.1.zip contains the Developer ID signed, notarized, and stapled OmniWM app.
      • OmniWM-v0.5.3.1.zip SHA-256: 51d7672270843ba8d48138a99deb059bafa493b7fae6453b01fb0f76cbc5140e
      • GhosttyKit.xcframework-v0.5.3.1.zip SHA-256: d5f6f073678dd817a0f9cd849b39c21396833f50c48f1ab93f8df26ee76da011
    6. πŸ”— r/reverseengineering 2000s ringtones ran on an undocumented FM chip. nobody reverse-engineered it until now. rss
    7. πŸ”— smol-machines/smolvm smolvm v1.4.1 release

      What's Changed

      • Route non-interactive execs through the keep-alive container so backgrounded processes persist by @BinSquare in #542
      • Bump the engine to 1.4.1 by @BinSquare in #543

      Full Changelog : v1.4.0...v1.4.1

    8. πŸ”— smol-machines/smolvm smolvm v1.4.0 release

      What's Changed

      Full Changelog : v1.3.9...v1.4.0

    9. πŸ”— smol-machines/smolvm smolvm v1.3.9 release

      What's Changed

      • fix(fork): regenerate per-machine on-disk secrets and fail closed on clone rejuvenation by @BinSquare in #531
      • Run streamed exec inside the persistent container overlay on image machines so SDK streamed changes survive by @BinSquare in #534
      • Bump libkrun to the virtio-fs/gpu mapping-bounds-hardened build and refresh the bundled linux library by @BinSquare in #526
      • Mark the aarch64 seccomp allowlist validated for enforce by @BinSquare in #535
      • Fail closed to a strict egress floor on serve nodes by @BinSquare in #536
      • Validate content digests before they become filesystem paths by @BinSquare in #537
      • Add /dev/kmsg to the container device set so nested Kubernetes works out of the box by @BinSquare in #538
      • Bump the engine to 1.3.9 by @BinSquare in #539

      Full Changelog : v1.3.8...v1.3.9

    10. πŸ”— exe.dev Connect Your ChatGPT Subscription to exe.dev rss

      The Edit LLM integration dialog in exe.dev, with OpenAI set to ChatGPT
subscription

      We're pleased to announce that you can connect your ChatGPT subscription to exe.dev and use our coding agent Shelley with the OpenAI models you're already paying for!

      To enable this, go to exe.dev/integrations, click on the "LLM" integration tile, click "Edit", choose "ChatGPT subscription" and do the authentication dance. All your VMs will then be configured to use your subscription for OpenAI models.

      Under the hood, this composes several exe.dev features. Integrations expose additional functionality or connections to VMs with a tag-based configuration system. The reflection integration tells a VM which other integrations are attached to it, the LLM integration lets the VM make LLM requests and discover available models, and the OpenAI integration provides the authentication. When it boots up, Shelley offers the models available to the VM. If you prefer a terminal based agent, pi is pre-configured the same way. Codex can be configured with exeuntu configure codex. As with many of our platform features, you can use the tools we've pre-configuredβ€”and anything you build yourself can also tap the LLM integration to call models.

      How Shelley uses the reflection, LLM, and OpenAI integrations: it discovers
integrations via the reflection integration, lists models and makes chat-
completion requests through the LLM integration, and the OpenAI integration
injects the upstream auth.

    11. πŸ”— Ampcode News More Orb Sizes rss

      You can now pick the size of the orbs used to run Amp agents remotely:

      • a0.tiny: 1 CPU, 2GB memory, 40GB disk ($0.10/hour)
      • a0.small: 2 CPUs, 4GB memory, 40GB disk ($0.21/hour)
      • a0.medium: 8 CPUs, 16GB memory, 40GB disk ($0.83/hour)
      • a0.large: 16 CPUs, 32GB memory, 40GB disk ($1.66/hour) β€” default

      Go to project settings to change the size of a project's orbs.

      We also doubled orb storage from 20GB to 40GB, at no additional cost to you.

      See Orbs in the Amp Owner's Manual for more information.

  4. July 02, 2026
    1. πŸ”— IDA Plugin Updates IDA Plugin Updates on 2026-07-02 rss

      IDA Plugin Updates on 2026-07-02

      New Releases:

      Activity:

      • GhidraDec
        • 1ce45660: Fix to JVM/Dalvik and PPC64EL
        • 4298dec8: Expand corpus, improve testing coverage
        • 8bb96074: Finish support of m68k/sh4, HPPA near complete
        • 1dfb5dd0: More stability fixes for MIPS BE
        • 78b08a48: Better support for runtime/loader functions
        • be7d4590: Improve namespace support and fix the flush timing to be in sync with…
        • eddd8331: Audit and CMake resilience on latest builds
        • 4eef1e72: Source code color fixed, graph on previous/next fixed
        • 96be700c: Several GUI mode fixes
      • ida-domain
        • 6e181525: Update test matrix to use IDA 9.4 beta 2 (#91)
      • ida-settings
        • 5462ad1b: plugin: remove demo settings
      • plugin-ida
        • 50244f76: Merge pull request #145 from RevEngAI/feat-PLU-296
        • 5a16e256: fix: auth boot loop:
        • 13d6f1b4: feat(PLU-296): remove legacy auto-unstrip impl
      • rikugan
        • ea1f1ba2: Merge branch 'feat/naming-convention'
        • 412b0d80: fix(review): resolve whole-branch findings β€” tighten triggers, add to…
        • 52bd5de5: docs(changelog): note naming-convention unification + bulk_renamer Pa…
        • 99e24af0: docs(skills): sync malware-analysis + generic-re naming sections to 6…
        • 651ca0af: fix(bulk_renamer): switch Quick/Deep prompts from snake_case to Pasca…
        • 206b721e: feat(prompt): expand RENAMING_SECTION to 6 object types + skill refer…
        • a4614093: feat(skills): add naming-convention skill with full standard + escala…
        • ebd72e3c: docs(plan): naming convention implementation plan β€” 5 TDD tasks
        • 6ddb1624: docs(spec): self-review round 2 β€” fix rename_multi_variables ghost to…
        • 08e2a701: docs(spec): naming convention design β€” hybrid 3-tier standard
        • 7249707c: fix(deps): restore html2text as required dependency across all manifests
        • 31bad52f: chore(deps): bump the actions group across 1 directory with 5 updates…
        • afc1a776: docs(a2a): document authentication model for outbound A2A HTTP requests
        • bef2f336: fix(review): resolve 2 subjective T2 items (orchestra-error + provide…
        • 15f45dc7: fix(review): resolve 2 subjective T2 items (dep-manifest + cross-module)
        • 1fa2e17e: fix(types): replace 33 'parent: QWidget = None' with '| None' annotation
        • c003f60e: fix(review): resolve 3 subjective T1/T2 review items
        • c74da3a7: fix(review): resolve 4 subjective T1 issues + dead panel.py shim
        • 665e8652: chore(release): bump version to 1.6.1
    2. πŸ”— HexRaysSA/plugin-repository commits sync repo: +1 release rss
      sync repo: +1 release
      
      ## New releases
      - [ida-settings-editor](https://github.com/williballenthin/ida-settings): 1.2.3
      
    3. πŸ”— r/reverseengineering I built a fast, terminal-based static analyzer for quick malware triage (Rust) rss
    4. πŸ”— r/reverseengineering How we broke Rhysida ransomware encryption rss
    5. πŸ”— The Pragmatic Engineer The Pulse: a new trend, smart model routing rss

      Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. In every issue, I cover Big Tech and startups through the lens of senior engineers and engineering leaders. Today, we cover one out of four topics from a previous The Pulse issue . Full subscribers received the article below three weeks ago. If you 've been forwarded this email, you can subscribe here .

      Two weeks ago, I covered a trend of companies trying to reduce spending on AI within their engineering departments. While talking to my sources about this, one head of engineering at a larger company told me that they wished there was an 'intelligent' router that picks the right model for the right task.

      The reason for such a wish is clear; prices for tokens vary greatly per model, and there can easily be a 10-20x difference between a cheap, average model, and a state-of-the-art one.

      I did some digging into whether any solutions like this currently exist because the benefits look obvious, and what I found is listed below. Usual disclaimer: I have no affiliation with these vendors, and have not been paid to mention any of them!

      Vendors:

      • Factory Router: automatically selecting the right model per session, claiming 20-25% cost savings. More details.
      • Not Diamond: auto-selection of coding models, claiming around 30% cost savings. Used by OpenRouter, under the hood. More details.
      • Vercel AI gateway. Hundreds of AI models, smart routing and billing in one place. More details.
      • Prism**** by Augment Code. Choosing the "best" model automatically for coding tasks. More details.
      • Model Router by Morph. An API to suggest model selection for a prompt, based on a list of models. More details
      • Weave router: a token router that works inside Codex, Claude Code and Cursor. "Hard" requests stay on frontier models, while "easy" ones go to open source ones. More details

      AI gateways with routing built in. API gateways are popular ways to use LLMs in workplaces.

      • OpenRouter: comes with "auto router" functionality where, after analyzing the prompt, the best one is selected. Uses Not Diamond under the hood.****More details
      • Kilo Gateway: route requests the model considered the best price-per-value. Supports using your own model keys, and using the service only as a router. More details
      • Requestly.ai: automatically route requests to the right model based on cost, latency, and availability, and tons of configuration. More details
      • LiteLLM: define routing rules that automatically select the best model, based on input content with the "auto routing" functionality. The setup is more manual, but you get more control than with many other AI gateways. More details
      • Envoy AI Gateway: an open source gateway that offers some routing configuration, though it feels that the routing engine focuses more on availability, not cost optimization and smart model routing. More details

      Cursor and GitHub Copilot also have an "Auto" model selection that does automatic model selection. For Cursor, it's a fixed-price model where any savings made are for Cursor: they are not passed on to customers, but the model is cheaper than most others. For Copilot, the Auto mode results in intelligent model selection - but I've not heard much positive feedback about this mode from the few devs I asked about it. For Pro plans, Copilot supports pretty old models: GPT-5.5 and Opus 4.8 are not available. These are, however, available on the Pro+ and above plans.

      Demand seems to be extremely high for intelligent routing. I asked Matan Grinberg, cofounder and CEO at Factory AI, who told me:

      "Demand has been off the charts, especially from the enterprise [from large companies.] I've met with practically every bank CEO since we launched this offering, because they want a layer to control spend, while still generating high-quality code.

      Pretty much everyone in tech is starting to see that open models are often sufficient. We're seeing open model usage strictly increasing the last six months. My guess is that hosted open models are sufficient in performance for around 60% of coding-related work, in terms of token spend."

      It feels to me that "intelligent routing" will become table stakes, and so we can expect pretty much all AI vendors to build some version of it, and many new vendors to offer this kind of functionality.

      If you know of any additional vendors not listed, you can add a comment on the original The Pulse article, and see more options there.

      Read the full issue The Pulse that this excerpt was from , or check out all The Pulse issues.

    6. πŸ”— r/reverseengineering Reverse Engineering Warframe's usermode anti-cheat rss
    7. πŸ”— Kagi release notes July 2nd, 2026 - Heads, tails, and an AI toggle rss

      Kagi Search

      New controls to completely turn off AI-based features in search

      We've added an option to disable access to AI features in search, under settings/ai.

      We're also planning to add this option to onboarding, so new users can personalise their Kagi experience from the start.

      It's finally here! We believe that Kagi's application of AI should always be useful - there when you'd like it, and never when you don't, and always respecting your privacy.

      This took us some time to navigate the right way to communicate this option. We did not want to create a confusing narrative as a company adding a toggle while continuing to invest in AI features elsewhere in our portfolio. But in the end, we want to stay true to putting you in control of your search engine - so here you are!

      We deeply thank the community for their feedback and patience.

      Flip coins and more sports widgets

      By popular demand, our dice widget has gained the ability to roll dice with any number of sides. We're not sure what kind of games you're playing that need d7s, but we support them now.

      We also added support for flipping coins, which are really just two-sided dice when you think about it:

      Kagi search results page for the query 'flip 2 coins' displaying a widget
that reads '1 head' alongside two illustrated
coins.

      We've added a set of switches on https://kagi.com/settings/more_search so you can disable any of our widgets you don't want to see. The toggle descriptions include links illustrating the widgets' capabilities so you understand what you're turning on or off; go check it out!

      Settings page titled 'Widgets' listing three enabled toggle options:
Calculator, Time & Date, and Package
Tracking.

      Orion browser ✴︎

      This week, we’re launching Orion 1.1 for macOS, one of the most significant updates in our history. This version is built around three major new features (in addition to 170+ smaller improvements and bug fixes).

      A New Interface ✴︎
      When Apple released LiquidGlass , the reception was mixedβ€”even within our own team. The demand was there, but we weren't ready to just copy-paste what Safari had done. They had even removed compact tabs!
      So, we created our own implementation.

      Containers ✴︎
      Just like Firefox, we now offer containers. What are they? Each tab becomes completely isolated from the others: total privacy and the ability to log into multiple accounts on the same site from the very same window!

      Browser window showing the Facebook login page with a container menu open offering options to open a new tab in different containers such as Shopping, Social media, or Flights.
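      Conceptually, a container keys site storage by container as well as by site, which is what lets two logins to the same site coexist in one window. Below is a toy Python model of that idea using only cookies and the container names from the screenshot. `ContainerCookieJar` is hypothetical, not Orion's (or Firefox's) implementation, and a real browser would also have to partition other storage such as localStorage and caches.

```python
from typing import Dict, Tuple


class ContainerCookieJar:
    """Toy model: cookies are keyed by (container, site), never by site alone."""

    def __init__(self) -> None:
        self._stores: Dict[Tuple[str, str], Dict[str, str]] = {}

    def set_cookie(self, container: str, site: str, name: str, value: str) -> None:
        self._stores.setdefault((container, site), {})[name] = value

    def cookies_for(self, container: str, site: str) -> Dict[str, str]:
        # A tab only ever sees the store that belongs to its own container.
        return dict(self._stores.get((container, site), {}))


jar = ContainerCookieJar()
# Two different accounts on the same site, one per container.
jar.set_cookie("Shopping", "facebook.com", "session", "alice")
jar.set_cookie("Social media", "facebook.com", "session", "bob")
print(jar.cookies_for("Shopping", "facebook.com"))
print(jar.cookies_for("Flights", "facebook.com"))
```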

      A Personalized Browser Border ✴︎
      The current trend is an elegant, transparent border seen on many browsers. The problem is, they don't match Apple's design language. So, what did we do?

      As we usually do: we made it an option! And we took it even further: transparency, solid colors, gradients, and even an automatic color-match with the website for total immersion.

      This option is exclusively available to Orion+ subscribers.

      Orion+
      Orion is your free browser, but we offer a support plan to maintain the independence that guarantees your data is not, and will never be, sold to advertisers, or worse.

      We have a dedicated website where you can download all the versions we currently support, as well as any we may support in the future (macOS, iOS, iPadOS, Windows, and Linux): https://orionbrowser.com

      Kagi News & Kagi Translate

      Kagi News and Kagi Translate have both been successes that took us by surprise.

      Kagi News users from all over the world loved being able to read their news in the language of their choice, stress-free, and even add new topics.

      Kagi Translate users loved the contextual features that provide spectacular translation quality, far beyond what typical machine translation offers.

      But these unexpected successes led to a massive spike in our costs for applications offered for free.

      As a result, we have temporarily removed translations from Kagi News, leaving articles available in their original languages and in English. Kagi Translate will be back in the coming days as a subscription-based service.

      Thank you for your patience and your trust 🙏 We hope to have everything up and running again very soon!

      Other improvements and bug fixes

      Kagi Search

      Kagi Assistant

    8. 🔗 r/reverseengineering cfgrip - PE/ELF x86/x64 CFG extractor rss
    9. 🔗 r/reverseengineering Transformers Forged To Fight Offline Version Reverse Engineering rss
    10. 🔗 Hex-Rays Blog IDA 9.4: Improved analysis of compiled Swift binaries rss

      IDA 9.4 is the first step toward better handling of Swift binaries. Perhaps unsurprisingly, our focus is on Swift for ARM64 Mach-O files, but we also aim to improve Swift support in less common settings, such as other architectures (e.g. x86-64) and file formats (e.g. ELF). For 9.4, we want to highlight two improvements: proper modelling of the Swift ABI, and proper typing of Swift runtime functions.

    11. 🔗 r/reverseengineering Redundancy seen in AAA game engines (Game Engine Reversing) rss
    12. 🔗 Jeremy Fielding (YouTube) Engineer Vs Bee: Round 1 rss

      This work was supported by the Alfred P. Sloan Foundation, enhancing public understanding of science and technology in the modern era, in partnership with IMI: watch what matters. https://www.theimi.co/ & https://sloan.org/programs/public-understanding

      Order custom parts from Send Cut Send 👉 http://sendcutsend.com/jeremyfielding

      Engineers, get up to 6 months of Pro 👉 https://onshape.pro/JeremyFielding

      If you want to join my community of makers and tinkers, consider getting a YouTube membership 👉 https://www.youtube.com/@JeremyFieldingSr/join

      If you want to chip in a few bucks to support these projects and teaching videos, please visit my Patreon page or Buy Me a Coffee. 👉 https://www.patreon.com/jeremyfieldingsr 👉 https://www.buymeacoffee.com/jeremyfielding

      Social media, websites, and other channels

      • Instagram 👉 https://www.instagram.com/jeremy_fielding/?hl=en
      • Twitter 👉 https://twitter.com/jeremy_fielding
      • TikTok 👉 https://www.tiktok.com/@jeremy_fielding0
      • LinkedIn 👉 https://www.linkedin.com/in/jeremy-fielding-749b55250/
      • My websites 👉 https://www.jeremyfielding.com 👉 https://www.fatherhoodengineered.com
      • My other channel, Fatherhood Engineered 👉 https://www.youtube.com/channel/UC_jX1r7deAcCJ_fTtM9x8ZA

      Notes:

      Chapters:

      • 0:00 The engineering problem
      • 05:09 Biology & photography problems
      • 11:42 Trying without a robot
      • 17:40 Let's talk to a scientist about the problem
      • 46:09 The first prototype design
      • 51:43 Building the bee chaser

      Technical corrections

      Nothing yet

    13. 🔗 r/reverseengineering PE structural validation notes (four format ambiguities) + IOCX v0.7.5 release rss
    14. 🔗 smol-machines/smolvm smolvm v1.3.8 release

      What's Changed

      • fix(exec): bound streaming exec output to prevent host OOM; reach aarch64 seccomp call site by @BinSquare in #529
      • fix(pack): block symlink-parent host-escape in sparse tar extraction by @BinSquare in #530
      • Kill a wedged VM identified by its unique boot-config argv so teardown never leaks an untracked live orphan by @BinSquare in #532
      • Bump the workspace to 1.3.8 by @BinSquare in #533

      Full Changelog: v1.3.7...v1.3.8
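      The symlink-parent escape fixed in #530 deserves a closer look: an archive first plants a symlink such as `a -> /etc`, then ships a regular file named `a/passwd`, and a naive extractor writes that file through the link onto the host. Below is a minimal Python sketch of a guard using the standard `tarfile` module. `safe_extract` is a hypothetical helper that illustrates the class of bug; it is not smolvm's fix (the changelog does not show its code), and it ignores the sparse-file details mentioned in the PR title.

```python
import os
import tarfile


def _inside(root: str, path: str) -> bool:
    """True if `path` is `root` itself or lies beneath it."""
    return os.path.commonpath([root, path]) == root


def safe_extract(archive: str, dest: str) -> None:
    """Extract `archive` into `dest`, refusing anything that resolves outside it."""
    root = os.path.realpath(dest)
    with tarfile.open(archive) as tar:
        for member in tar:
            # Resolve against what is already on disk, so a symlink planted by an
            # earlier member (e.g. "a -> /etc") is followed when checking "a/passwd".
            target = os.path.realpath(os.path.join(root, member.name))
            if not _inside(root, target):
                raise ValueError(f"blocked path escape: {member.name!r}")
            if member.issym() or member.islnk():
                # Symlink targets are relative to the link's own directory;
                # hard-link targets name another member, relative to the archive root.
                base = os.path.dirname(target) if member.issym() else root
                link = os.path.realpath(os.path.join(base, member.linkname))
                if not _inside(root, link):
                    raise ValueError(
                        f"blocked link escape: {member.name!r} -> {member.linkname!r}"
                    )
            tar.extract(member, root)
```

      On Python 3.12+ (and the security backports to earlier releases), `tarfile` also ships extraction filters; `filter="data"` already refuses links that point outside the destination, so new code should prefer it over a hand-rolled check like this one.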

    15. 🔗 r/reverseengineering FSR 4.1.0 static RE evidence bundle - looking for native D3D12 validators rss
    16. 🔗 r/reverseengineering Claude Code / Codex Skill for Reverse Engineering rss
    17. 🔗 smol-machines/smolvm smolvm v1.3.7 release

      What's Changed

      • Enforce read-only host mounts at the virtiofs device instead of only in the guest by @BinSquare in #524
      • fix: offload DNS off the poll loop and copy-truncate live log rotation by @BinSquare in #528
      • fix(dist): ship the Windows disk templates at their real 512 MiB size by @BinSquare in #527
      • fix(agent,disk): bound stalled-body reads with a deadline; guard disk-resize overflow by @BinSquare in #525

      Full Changelog: v1.3.6...v1.3.7
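      The "bound stalled-body reads with a deadline" change in #525 likely targets a familiar pitfall: a per-`recv` timeout alone lets a peer that trickles one byte just before each timeout keep a reader blocked indefinitely. Below is a minimal Python sketch of an overall deadline, assuming a blocking socket and a known body length. `read_exact_with_deadline` is a hypothetical name, not smolvm's agent code, which the changelog does not show.

```python
import socket
import time


def read_exact_with_deadline(sock: socket.socket, n: int, deadline_s: float) -> bytes:
    """Read exactly n bytes, giving up if the whole body takes longer than deadline_s."""
    deadline = time.monotonic() + deadline_s
    chunks = []
    remaining = n
    while remaining > 0:
        left = deadline - time.monotonic()
        if left <= 0:
            raise TimeoutError(f"body stalled at {n - remaining}/{n} bytes")
        # Shrink each individual wait to the time left on the overall deadline.
        sock.settimeout(left)
        chunk = sock.recv(min(remaining, 65536))  # raises socket.timeout on expiry
        if not chunk:
            raise ConnectionError("peer closed before the body was complete")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```

      The key design choice is that the deadline is computed once, before the loop, so progress on earlier chunks never buys the peer extra time.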

    18. 🔗 Console.dev newsletter Oak rss

      Description: Version control and storage layer.

      What we like: Unit of work is a branch, which is cloned instantly using content-addressed storage; never push to main directly. Changes are based around a branch description rather than commit messages. Create a working tree from a remote repo without requiring a full clone. Spaces combine repos for large cross-repo tasks. Import/export from git.

      What we dislike: Only supports macOS (Apple Silicon) and Linux (x86) due to underlying filesystem requirements for content-addressable storage.
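      Why content addressing makes branch clones cheap: if every blob is stored under the hash of its bytes, a branch can be just a map from paths to hashes, and cloning copies that map rather than any file contents. Oak's actual storage layout is not described here; this minimal Python sketch assumes SHA-256 keys and an in-memory dict, and `BlobStore`/`clone_branch` are hypothetical names.

```python
import hashlib
from typing import Dict


class BlobStore:
    """In-memory content-addressed store: a blob's key is the hash of its bytes."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        # Identical content always maps to the same key, so it is stored only once.
        self.blobs.setdefault(digest, data)
        return digest

    def get(self, digest: str) -> bytes:
        return self.blobs[digest]


def clone_branch(tree: Dict[str, str]) -> Dict[str, str]:
    """A branch here is a path -> digest map; cloning copies keys, never file bytes."""
    return dict(tree)


store = BlobStore()
main = {"README.md": store.put(b"hello\n"), "src/app.py": store.put(b"print('hi')\n")}
feature = clone_branch(main)  # cost grows with the number of paths, not their size
feature["README.md"] = store.put(b"hello, branch\n")
print(len(store.blobs))
```

      A real system would typically hash directory trees as well, so a clone can shrink to copying a single root hash.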

    19. 🔗 Console.dev newsletter cmux rss

      Description: Multi-workspace terminal.

      What we like: Vertical tabs with individual notifications, working directories, and ports. Split panes within each tab with in-app browser support. Fully macOS native (Swift + libghostty). Includes a CLI and socket API for automation. iOS app syncs in real time to continue on mobile. Good keyboard shortcuts.

      What we dislike: Currently macOS only, but there is a waitlist for other platforms.

    20. 🔗 Ampcode News Read Bigger Threads rss

      Threads outgrew read_thread, so we rewrote it.

      read_thread is the tool that lets Amp pull context out of other Amp threads when you mention them. Before the rewrite, it would fetch the whole thread and extract the relevant parts in a single call to another LLM.

      That worked when threads were shorter and fit within a single context window. Then we added compaction, and now a single thread can run for weeks. Our longest thread has been compacted more than 68 times; without compaction, it would be over 21 million tokens long.

      A 21-million-token thread doesn't fit into a single context window, so asking another LLM to extract the relevant parts no longer works. Even 1-million-token threads that did fit produced bad answers: one giant prompt over-weights whatever the thread started or ended with and neglects the information in the middle.

      read_thread is now a subagent tuned to extract information from long threads. The subagent takes a thread and a question, searches the thread, reads the messages, and checks whether later work revised or reverted what it found.

      Our first version of the read_thread subagent answered from the first plausible hit. In a long thread, the first hit is often an attempt that was later revised or reverted. We switched the model from Gemini 3.5 Flash to GLM 5.2 and tuned its prompt to optimize for correctness over speed:

      • "Do not stop at the first relevant hit; check for newer messages that revise, supersede, revert, or contradict it."
      • "Tool calls record attempted actions, not outcomes." It checks whether an edit actually succeeded before believing it.
      • "Use compactions for orientation, but inspect original messages when exact requirements, wording, code, commands, chronology, edits, or verification matter."
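      The first-hit failure mode, and the "check for newer messages" fix, can be shown with a toy model. Here each message carries an explicit `revises` link, which real threads lack (the subagent has to infer revisions by reading), and `Message` and `resolve` are hypothetical names. This is not Amp's implementation, which is a prompted LLM subagent.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Message:
    idx: int                       # position in the thread, oldest first
    text: str
    revises: Optional[int] = None  # idx of an earlier message this one revises or reverts


def resolve(messages: List[Message], keyword: str) -> Optional[Message]:
    """Start from the first hit, then keep following later revisions of it."""
    hits = [m for m in messages if keyword in m.text]
    if not hits:
        return None
    current = hits[0]  # where a first-plausible-hit reader would stop
    while True:
        newer = [m for m in messages if m.idx > current.idx and m.revises == current.idx]
        if not newer:
            return current
        current = newer[-1]  # the latest revision of the current answer wins


thread = [
    Message(0, "plan: store sessions in sqlite"),
    Message(1, "fix flaky test in auth module"),
    Message(2, "move sessions from sqlite to redis", revises=0),
    Message(3, "revert redis change, sessions stay in sqlite", revises=2),
]
print(resolve(thread, "sessions").idx)
```

      Stopping at `hits[0]` would report the original plan; following the revision chain lands on the revert, which is the state the thread actually ended in.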

      It also works on the thread you're in. When the agent needs something from three weeks ago (a decision, an error, the original plan), it goes back and looks instead of trusting the compaction.

      Nothing changes on your end. Either tell Amp what you're looking for and let it find the thread, or give it a thread explicitly: paste a URL, or @-mention it. And when you open a new thread, hit Enter twice to reference the thread you just left.

      Mention a thread and ask a question, just like before, except it now works with big threads too: