to read (pdf)
- Cord: Coordinating Trees of AI Agents | June Kim
- Style tips for less experienced developers coding with AI · honnibal.dev
- Haskell for all: Beyond agentic coding
- AgentRE-Bench — LLM Reverse Engineering Benchmark
- Announcing Observational Memory - Mastra Blog
- February 23, 2026
-
🔗 @HexRaysSA@infosec.exchange We're heading to RE//verse (March 5-7) and would love to see all you IDA users - mastodon
We're heading to RE//verse (March 5-7) and would love to see all you IDA users - past and present. As usual, we are on the hunt for product feedback. Give us the good, the bad and your feature requests. In exchange, we'll give you some swag.
Members from Dev, Product Evangelism & Marketing will be roaming and ready to hear it all.
To get yourself some swag, you'll need to:
1️⃣ Find us.
2️⃣ Show us you subscribe to our newsletter or follow us on social.
3️⃣ Whisper the code word: Glitter Burrito.
Swag is limited… come prepared.
You can also skip the hunt and book a chat with us here: https://meetings-eu1.hubspot.com/justine-benjamin/reverse-2026
-
🔗 Simon Willison Writing about Agentic Engineering Patterns rss
I've started a new project to collect and document Agentic Engineering Patterns - coding practices and patterns to help get the best results out of this new era of coding agent development we find ourselves entering.
I'm using Agentic Engineering to refer to building software using coding agents - tools like Claude Code and OpenAI Codex, where the defining feature is that they can both generate and execute code - allowing them to test that code and iterate on it independently of turn-by-turn guidance from their human supervisor.
I think of vibe coding using its original definition of coding where you pay no attention to the code at all, which today is often associated with non-programmers using LLMs to write code.
Agentic Engineering represents the other end of the scale: professional software engineers using coding agents to improve and accelerate their work by amplifying their existing expertise.
There is so much to learn and explore about this new discipline! I've already published a lot under my ai-assisted-programming tag (345 posts and counting) but that's been relatively unstructured. My new goal is to produce something that helps answer the question "how do I get good results out of this stuff" all in one place.
I'll be developing and growing this project here on my blog as a series of chapter-shaped patterns, loosely inspired by the format popularized by Design Patterns: Elements of Reusable Object-Oriented Software back in 1994.
I published the first two chapters today:
- Writing code is cheap now talks about the central challenge of agentic engineering: the cost to churn out initial working code has dropped to almost nothing, how does that impact our existing intuitions about how we work, both individually and as a team?
- Red/green TDD describes how test-first development helps agents write more succinct and reliable code with minimal extra prompting.
I hope to add more chapters at a rate of 1-2 a week. I don't really know when I'll stop, there's a lot to cover!
Written by me, not by an LLM
I have a strong personal policy of not publishing AI-generated writing under my own name. That policy will hold true for Agentic Engineering Patterns as well. I'll be using LLMs for proofreading and fleshing out example code and all manner of other side-tasks, but the words you read here will be my own.
Chapters and Guides
Agentic Engineering Patterns isn't exactly a book, but it's kind of a book-shaped blog. I'll be publishing them on my site using a new shape of content I'm calling a guide. A guide is a collection of chapters, where each chapter is effectively a blog post with a less prominent date that's designed to be updated over time, not frozen at the point of first publication.
Guides and chapters are my answer to the challenge of publishing "evergreen" content on a blog. I've been trying to find a way to do this for a while now. This feels like a format that might stick.
-
🔗 r/Leeds What is / was this building on Burley Road? rss
Seemingly derelict except for gloomy lights on in a couple of rooms…
submitted by /u/Sea_Antelope_9767
-
🔗 vercel-labs/agent-browser v0.14.0 release
Minor Changes
b7665e5:
- Added keyboard command for raw keyboard input -- type with real keystrokes, insert text, and press shortcuts at the currently focused element without needing a selector.
- Added --color-scheme flag and AGENT_BROWSER_COLOR_SCHEME env var for persistent dark/light mode preference across browser sessions.
- Fixed IPC EAGAIN errors (os error 35/11) by adding backpressure-aware socket writes, command serialization, and lowering the default Playwright timeout to 25s (configurable via AGENT_BROWSER_DEFAULT_TIMEOUT).
- Fixed remote debugging (CDP) reconnection.
- Fixed state load failing when no browser is running.
- Fixed --annotate flag warning appearing when not explicitly passed via CLI.
-
🔗 HexRaysSA/plugin-repository commits sync repo: +1 plugin, +1 release rss
sync repo: +1 plugin, +1 release
## New plugins
- [bitopt](https://github.com/teflate/bitopt) (1.0.0)
-
🔗 r/york Mario Kart & a live band! 🏎️🎸 rss
Hi everyone! I’m organising something a bit different for York this summer!
We’re hosting a Mario Kart Tournament with a twist: a live band playing the soundtrack in real-time as we race. When the final lap hits and the music speeds up... things are going to get chaotic 🥵
There are only 80 tickets available for this. I’ve just put the preview page live with a countdown timer so you can see the details and venue. Check it out here: https://www.tickettailor.com/events/mariokartyork/2072886
Do us a favour and pass this on to that friend who reckons they're the best at Mario Kart - let's find out.
https://preview.redd.it/l4qzg8dpy9lg1.png?width=1131&format=png&auto=webp&s=e9fcd01ce70c863d8fc5032f4765db82382a66fb
submitted by /u/Zatheos
-
🔗 r/york York Restaurant Week - The Ultimate Spreadsheet! rss
Hello everyone! With the March York Restaurant Week upcoming and all the offers posted to the website, I have lovingly and painstakingly added every single restaurant, bar and place with offers to a spreadsheet.
I have added columns that you can filter by for style of food/place (e.g. Italian, Chinese), and whether the offer includes breakfast, lunch, dinner, drinks (separately; drinks with food don’t count), snacks and/or an experience (experience would include, for example, City Cruises). You can filter by each of these individually so you can plan your meals and days.
I have added the Google Reviews, in case that helps your thinking, and the location, and then I have added all offers except for those which are children-specific (e.g. kids meal £10), since I made this list for personal use and don’t have any! I have put the prices of the offers at the start, so you can filter by price as well when deciding where to go.
I willingly sacrificed a day of hard work to do this work instead. Not as in I took the day off work, just that I procrastinated massively. The only thing I think is missing is when the offers apply, as some are exclusive to certain times and days - I will keep editing the spreadsheet as I go.
Hope it is a useful tool! There is probably room for improvement, as this is just a first draft, so I welcome views on how to make it more functional 😃
submitted by /u/TheGent_88
-
🔗 r/wiesbaden "Wahl-o-Mat" for the local election rss
Wahl-o-Mat for the 2026 local election in Wiesbaden
https://app.voto.vote/de/elections/7867403
submitted by /u/Tiny-Spread-6326
-
🔗 r/Yorkshire Which Yorkshire town has the best proper chippy, no debate? rss
Everyone claims their local is the best, but let’s settle it properly.
Crispy batter, decent portion size, chips that aren’t soggy, and proper gravy or curry sauce.
Which Yorkshire town genuinely does it best? No politeness, just honest answers.
submitted by /u/FeistyPrice29
-
🔗 r/Yorkshire TRU work between Leeds and Huddersfield this April (weekends), May and June rss
Network Rail is warning passengers of major disruptions to service on the line between Huddersfield and Leeds this May and June, due to engineering work involving the Transpennine Route Upgrade (TRU).
Additionally, trains will not stop at Huddersfield station at weekends during April.
The planned works are necessary to carry out essential infrastructure improvements, including track and drainage renewals, bridge work, signalling upgrades, and station enhancements along the Huddersfield to Leeds corridor.
Passengers are being strongly advised to plan ahead of travelling. Where possible, passengers will remain on trains using diversionary routes or services running as close as possible to affected areas. However, journey times may be increased, and some passengers will benefit from the use of rail replacement buses.
During the disruption, Brighouse Station will serve as a combined rail and bus interchange for those travelling to or from Huddersfield.
“We understand the ongoing work will cause some disruption, and we’d like to thank everybody for their patience throughout this period.
“The Transpennine Route Upgrade will bring major benefits, including more trains, more seats and faster, more reliable journeys across the North. These improvements, from electrifying the route to upgrading stations and modernising signalling, will help create a greener, more dependable railway that better serves local communities for years to come.”
Paul Sumner, TRU Senior Sponsor.
“TRU has made a lot of progress on the route between Huddersfield and Leeds over the last few years, and will continue to do so over the next few months, delivering once in a generation upgrades for our customers that will improve accessibility at our stations, as well as improving the reliability of the railway and enabling future upgrade works to take place along that corridor.
“While these works are being carried out over the coming months, our services will be operating to an amended timetable and in some instances, rail replacement buses will be in operation. Due to the scale of the upcoming work, we would advise that our customers check before they travel at www.nationrail.co.uk.”
Andrew Allwright, TRU Programme Delivery Lead for Northern.
submitted by /u/CaptainYorkie1
-
🔗 r/Leeds TRU work between Leeds and Huddersfield during April (weekends), May and June rss
Network Rail is warning passengers of major disruptions to service on the line between Huddersfield and Leeds this May and June, due to engineering work involving the Transpennine Route Upgrade (TRU).
Additionally, trains will not stop at Huddersfield station at weekends during April.
The planned works are necessary to carry out essential infrastructure improvements, including track and drainage renewals, bridge work, signalling upgrades, and station enhancements along the Huddersfield to Leeds corridor.
Passengers are being strongly advised to plan ahead of travelling. Where possible, passengers will remain on trains using diversionary routes or services running as close as possible to affected areas. However, journey times may be increased, and some passengers will benefit from the use of rail replacement buses.
During the disruption, Brighouse Station will serve as a combined rail and bus interchange for those travelling to or from Huddersfield.
“We understand the ongoing work will cause some disruption, and we’d like to thank everybody for their patience throughout this period.
“The Transpennine Route Upgrade will bring major benefits, including more trains, more seats and faster, more reliable journeys across the North. These improvements, from electrifying the route to upgrading stations and modernising signalling, will help create a greener, more dependable railway that better serves local communities for years to come.”
Paul Sumner, TRU Senior Sponsor.
“TRU has made a lot of progress on the route between Huddersfield and Leeds over the last few years, and will continue to do so over the next few months, delivering once in a generation upgrades for our customers that will improve accessibility at our stations, as well as improving the reliability of the railway and enabling future upgrade works to take place along that corridor.
“While these works are being carried out over the coming months, our services will be operating to an amended timetable and in some instances, rail replacement buses will be in operation. Due to the scale of the upcoming work, we would advise that our customers check before they travel at www.nationrail.co.uk.”
Andrew Allwright, TRU Programme Delivery Lead for Northern.
submitted by /u/CaptainYorkie1
-
🔗 r/Yorkshire Robin Hood's Bay ♥️ Have you ever been? rss
@northyorkshire
submitted by /u/Additional_Fly_6603
-
🔗 r/Yorkshire Yorkshire you beauty 😍🤌 rss
@theworldinframes
submitted by /u/Mundane-Temporary426
-
🔗 r/reverseengineering /r/ReverseEngineering's Weekly Questions Thread rss
To reduce the amount of noise from questions, we have disabled self-posts in favor of a unified questions thread every week. Feel free to ask any question about reverse engineering here. If your question is about how to use a specific tool, or is specific to some particular target, you will have better luck on the Reverse Engineering StackExchange. See also /r/AskReverseEngineering.
submitted by /u/AutoModerator
-
🔗 HexRaysSA/plugin-repository commits sync repo: +1 release rss
sync repo: +1 release
## New releases
- [DeepExtract](https://github.com/marcosd4h/DeepExtractIDA): 0.9.8
-
🔗 r/reverseengineering GitHub - merces/awesome-hex-editors: A curated and enriched list of awesome hexadecimal editors for Windows, macOS, and Linux. rss
submitted by /u/nandu88
-
🔗 r/LocalLLaMA Feels like magic. A local gpt-oss 20B is capable of agentic work rss
I gave the zeroclaw agent a try (instead of the bloated and overhyped one). After a few hours of fuckery with configs it's finally useful. Both the main and embedding models are running locally.
I carefully read what it's trying to execute in shell, and permit only [relatively] safe tools in config.
So far it can interact with macOS apps, web pages, and local files while keeping all my data private.
gpt-oss 20B has its limits though: it loses focus after 15-20 steps and often needs direct instructions to use persistent memory. It also starts behaving weirdly if tool access has been denied or a tool has returned an error.
submitted by /u/Vaddieg
-
🔗 r/LocalLLaMA Qwen3's most underrated feature: Voice embeddings rss
Did you know that Qwen3 TTS utilizes voice embedding for voice cloning?
Your voice is turned into a vector of 1024 dimensions (or 2048 for 1.7b), and based on this vector alone you can get your custom voice. But the coolest part is that this means that you can use math to modify voices and average voices. You can swap gender, pitch, mix and match voices, and even create an emotion space! This also enables semantic voice search!
The voice embedding model is actually just a tiny encoder with just a few million parameters. I've ripped it out of the voice embedding model so you can use the embedding model standalone. Check out my collection! :D I also have onnx models for optimized web / front-end inference.
https://huggingface.co/collections/marksverdhei/qwen3-voice-embedding
Voice embeddings can be used for inference in my vllm-omni fork until it is supported in upstream: https://github.com/heiervang-technologies/ht-vllm-omni
submitted by /u/k_means_clusterfuck
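The "use math to modify voices" idea in the post can be sketched with plain vector arithmetic. This is only a minimal illustration under stated assumptions, not the Qwen3 TTS or vllm-omni API: the 4-dimensional vectors are toy stand-ins for the real 1024-dimensional embeddings, and the helper names (`blend`, `cosine_similarity`, `nearest_voice`) are hypothetical.

```python
import math

def blend(a, b, weight=0.5):
    """Linearly interpolate two embeddings (weight=0 gives a, weight=1 gives b)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    return [(1.0 - weight) * x + weight * y for x, y in zip(a, b)]

def cosine_similarity(a, b):
    """Cosine of the angle between two embeddings; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

def nearest_voice(query, library):
    """Return the name of the stored voice most similar to the query embedding."""
    return max(library, key=lambda name: cosine_similarity(query, library[name]))

# Toy 4-dimensional stand-ins for real voice embeddings (hypothetical data).
voices = {
    "alice": [1.0, 0.0, 0.0, 0.0],
    "bob": [0.0, 1.0, 0.0, 0.0],
    "carol": [0.0, 0.0, 1.0, 0.0],
}

average_voice = blend(voices["alice"], voices["bob"])             # equal mix
mostly_alice = blend(voices["alice"], voices["bob"], weight=0.2)  # 80% alice
closest = nearest_voice(mostly_alice, voices)                     # voice search
```

Averaging is just `blend` with the default weight, and voice search reduces to a nearest-neighbour lookup by cosine similarity. Whether a blended vector still produces a natural-sounding voice depends on the model, which this sketch does not cover.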
-
🔗 Rust Blog Rust debugging survey 2026 rss
We're launching a Rust Debugging Survey.
Various issues with debugging Rust code are often mentioned as one of the biggest challenges that annoy Rust developers. While it is definitely possible to debug Rust code today, there are situations where it does not work well enough, and the quality of debugging support also varies a lot across different debuggers and operating systems.
In order for Rust to have truly stellar debugging support, it should ideally:
- Support (several versions!) of different debuggers (such as GDB, LLDB or CDB) across multiple operating systems.
- Implement debugger visualizers that are able to produce quality presentation of most Rust types.
- Provide first-class support for debugging async code.
- Allow evaluating Rust expressions in the debugger.
Rust is not quite there yet, and it will take a lot of work to reach that level of debugger support. Furthermore, it is also challenging to ensure that debugging Rust code keeps working well, across newly released debugger versions, changes to internal representation of Rust data structures in the standard library and other things that can break the debugging experience.
We already have some plans to start improving debugging support in Rust, but it would also be useful to understand the current debugging struggles of Rust developers. That is why we have prepared the Rust Debugging Survey, which should help us find specific challenges with debugging Rust code.
You can fill out the survey here.
Filling the survey should take you approximately 5 minutes, and the survey is fully anonymous. We will accept submissions until Friday, March 13th, 2026. After the survey ends, we will evaluate the results and post key insights on this blog.
We would like to thank Sam Kellam (@hashcatHitman) who did a lot of great work to prepare this survey.
We invite you to fill the survey, as your responses will help us improve the Rust debugging experience. Thank you!
-
🔗 Baby Steps What it means that Ubuntu is using Rust rss
Righty-ho, I'm back from Rust Nation, and busily horrifying my teenage daughter with my (admittedly atrocious) attempts at doing an English accent[1]. It was a great trip with a lot of good conversations and some interesting observations. I am going to try to blog about some of them, starting with some thoughts spurred by Jon Seager's closing keynote, "Rust Adoption At Scale with Ubuntu".
There are many chasms out there
For some time now I've been debating with myself, has Rust "crossed the chasm"? If you're not familiar with that term, it comes from a book that gives a kind of "pop-sci" introduction to the Technology Adoption Life Cycle.
The answer, of course, is it depends on who you ask. Within Amazon, where I have the closest view, the answer is that we are "most of the way across": Rust is squarely established as the right way to build at-scale data planes or resource-aware agents and it is increasingly seen as the right choice for low-level code in devices and robotics as well - but there remains a lingering perception that Rust is useful for "those fancy pants developers at S3" (or wherever) but a bit overkill for more average development[3].
On the other hand, within the realm of Safety Critical Software, as Pete LeVasseur wrote in a recent rust-lang blog post, Rust is still scrabbling for a foothold. There are a number of successful products but most of the industry is in a "wait and see" mode, letting the early adopters pave the path.
"Crossing the chasm" means finding "reference customers"
The big idea that I at least took away from reading Crossing the Chasm and other references on the technology adoption life cycle is the need for "reference customers". When you first start out with something new, you are looking for pioneers and early adopters that are drawn to new things:
What an early adopter is buying [..] is some kind of change agent. By being the first to implement this change in the industry, the early adopters expect to get a jump on the competition. - from Crossing the Chasm
But as your technology matures, you have to convince people with a lower and lower tolerance for risk:
The early majority want to buy a productivity improvement for existing operations. They are looking to minimize discontinuity with the old ways. They want evolution, not revolution. - from Crossing the Chasm
So what is most convincing to people to try something new? The answer is seeing that others like them have succeeded.
You can see this at play in both the Amazon example and the Safety Critical Software example. Clearly seeing Rust used for network services doesn't mean it's ready to be used in your car's steering column[4]. And even within network services, seeing a group like S3 succeed with Rust may convince other groups building at-scale services to try Rust, but doesn't necessarily persuade a team to use Rust for their next CRUD service. And frankly, it shouldn't! They are likely to hit obstacles.
Ubuntu is helping Rust "cross the (user-land linux) chasm"
All of this was on my mind as I watched the keynote by Jon Seager, the VP of Engineering at Canonical, which is the company behind Ubuntu. Similar to Lars Bergstrom's epic keynote from years past on Rust adoption within Google, Jon laid out a pitch for why Canonical is adopting Rust that was at once visionary and yet deeply practical.
"Visionary and yet deeply practical" is pretty much the textbook description of what we need to cross from early adopters to early majority. We need folks who care first and foremost about delivering the right results, but are open to new ideas that might help them do that better; folks who can stand on both sides of the chasm at once.
Jon described how Canonical focuses their own development on a small set of languages: Python, C/C++, and Go, and how they had recently brought in Rust and were using it as the language of choice for new foundational efforts, replacing C, C++, and (some uses of) Python.
Ubuntu is building the bridge across the chasm
Jon talked about how he sees it as part of Ubuntu's job to "pay it forward" by supporting the construction of memory-safe foundational utilities. Jon meant support both in terms of finances - Canonical is sponsoring the Trifecta Tech Foundation to develop sudo-rs and ntpd-rs and sponsoring the uutils org's work on coreutils - and in terms of reputation. Ubuntu can take on the risk of doing something new, prove that it works, and then let others benefit.
Remember how the Crossing the Chasm book described early majority people? They are "looking to minimize discontinuity with the old ways". And what better way to do that than to have drop-in utilities that fit within their existing workflows.
The challenge for Rust: listening to these new adopters
With new adoption comes new perspectives. On Thursday night I was at dinner[5] organized by Ernest Kissiedu[6]. Jon Seager was there along with some other Rust adopters from various industries, as were a few others from the Rust Foundation and the open-source project.
Ernest asked them to give us their unvarnished takes on Rust. Jon made the provocative comment that we needed to revisit our policy around having a small standard library. He's not the first to say something like that, it's something we've been hearing for years and years - and I think he's right! Though I don't think the answer is just to ship a big standard library. In fact, it's kind of a perfect lead-in to (what I hope will be) my next blog post, which is about a project I call "battery packs"[7].
To grow, you have to change
The broader point though is that shifting from targeting "pioneers" and "early adopters" to targeting "early majority" sometimes involves some uncomfortable changes:
Transition between any two adoption segments is normally excruciatingly awkward because you must adopt new strategies just at the time you have become most comfortable with the old ones. [..] The situation can be further complicated if the high-tech company, fresh from its marketing success with visionaries, neglects to change its sales pitch. [..] The company may be saying "state-of-the-art" when the pragmatist wants to hear "industry standard". - Crossing the Chasm (emphasis mine)
Not everybody will remember it, but in 2016 there was a proposal called the Rust Platform. The idea was to bring in some crates and bless them as a kind of "extended standard library". People hated it. After all, they said, why not just add dependencies to your Cargo.toml? It's easy enough. And to be honest, they were right - at least at the time.
I think the Rust Platform is a good example of something that was a poor fit for early adopters, who want the newest thing and don't mind finding the best crates, but which could be a great fit for the Early Majority.[8]
Anyway, I'm not here to argue for one thing or another in this post, but more for the concept that we have to be open to adapting our learned wisdom to new circumstances. In the past, we were trying to bootstrap Rust into the industry's consciousness - and we have succeeded.
The task before us now is different: we need to make Rust the best option not just in terms of "what it could be" but in terms of "what it actually is" - and sometimes those are in tension.
Another challenge for Rust: turning adoption into investment
Later in the dinner, the talk turned, as it often does, to money. Growing Rust adoption also comes with growing needs placed on the Rust project and its ecosystem. How can we connect the dots? This has been a big item on my mind, and I realize in writing this paragraph how many blog posts I have yet to write on the topic, but let me lay out a few interesting points that came up over this dinner and at other recent points.
Investment can mean contribution, particularly for open-source orgs
First, there are more ways to offer support than $$. For Canonical specifically, as they are an open-source organization through-and-through, what I would most want is to build stronger relationships between our organizations. With the Rust for Linux developers, early on Rust maintainers were prioritizing and fixing bugs on behalf of RfL devs, but more and more, RfL devs are fixing things themselves, with Rust maintainers serving as mentors. This is awesome!
Money often comes before a company has adopted Rust, not after
Second, there's an interesting trend about $$ that I've seen crop up in a few places. We often think of companies investing in the open-source dependencies that they rely upon. But there's an entirely different source of funding, and one that might be even easier to tap, which is to look at companies that are considering Rust but haven't adopted it yet.
For those "would be" adopters, there are often individuals in the org who are trying to make the case for Rust adoption - these individuals are early adopters, people with a vision for how things could be, but they are trying to sell to their early majority company. And to do that, they often have a list of "table stakes" features that need to be supported; what's more, they often have access to some budget to make these things happen.
This came up when I was talking to Alexandru Radovici, the Foundation's Silver Member Director, who said that many safety critical companies have money they'd like to spend to close various gaps in Rust, but they don't know how to spend it. Jon's investments in Trifecta Tech and the uutils org have the same character: he is looking to close the gaps that block Ubuntu from using Rust more.
Conclusions…?
Well, first of all, you should watch Jon's talk. "Brilliant", as the Brits have it.
But my other big thought is that this is a crucial time for Rust. We are clearly transitioning in a number of areas from visionaries and early adopters towards that pragmatic majority, and we need to be mindful that doing so may require us to change some of the way that we've always done things. I liked this paragraph from Crossing the Chasm:
To market successfully to pragmatists, one does not have to be one - just understand their values and work to serve them. To look more closely into these values, if the goal of visionaries is to take a quantum leap forward, the goal of pragmatists is to make a percentage improvement-incremental, measurable, predictable progress. [..] To market to pragmatists, you must be patient. You need to be conversant with the issues that dominate their particular business. You need to show up at the industry-specific conferences and trade shows they attend.
Re-reading Crossing the Chasm as part of writing this blog post has really helped me square where Rust is - for the most part, I think we are still crossing the chasm, but we are well on our way. I think what we see is a consistent trend now where we have Rust champions who fit the "visionary" profile of early adopters successfully advocating for Rust within companies that fit the pragmatist, early majority profile.
Open source can be a great enabler to cross the chasm…
It strikes me that open-source is just an amazing platform for doing this kind of marketing. Unlike a company, we don't have to do everything ourselves. We have to leverage the fact that open source helps those who help themselves - find those visionary folks in industries that could really benefit from Rust, bring them into the Rust orbit, and then (most important!) support and empower them to adapt Rust to their needs.
…but only if we don't get too "middle school" about it
This last part may sound obvious, but it's harder than it sounds. When you're embedded in open source, it seems like a friendly place where everyone is welcome. But the reality is that it can be a place full of cliques and "oral traditions" that "everybody knows"[9]. People coming with an idea can get shut down for using the wrong word. They can readily mistake the, um, "impassioned" comments from a random contributor (or perhaps just a troll…) for the official word from project leadership. It only takes one rude response to turn somebody away.
What Rust needs most is empathy
So what will ultimately help Rust the most to succeed? Empathy in Open Source. Let's get out there, find out where Rust can help people, and make it happen. Exciting times!
1. I am famously bad at accents. My best attempt at posh British sounds more like Apu from the Simpsons. I really wish I could pull off a convincing Greek accent, but sadly no.
2. Another of my pearls of wisdom is "there is nothing more permanent than temporary code". I used to say that back at the startup I worked at after college, but years of experience have only proven it more and more true.
3. Russel Cohen and Jess Izen gave a great talk at last year's RustConf about what our team is doing to help teams decide if Rust is viable for them. But since then another thing having a big impact is AI, which is bringing previously unthinkable projects, like rewriting older systems, within reach.
4. I have no idea if there is code in a car's steering column, for the record. I assume so by now? For power steering or some shit?
5. Or am I supposed to call it "tea"? Or maybe "supper"? I can't get a handle on British mealtimes.
6. Ernest is such a joy to be around. He's quiet, but he's got a lot of insights if you can convince him to share them. If you get the chance to meet him, take it! If you live in London, go to the London Rust meetup! Find Ernest and introduce yourself. Tell him Niko sent you and that you are supposed to say how great he is and how you want to learn from the wisdom he's accrued over the years. Then watch him blush. What a doll.
7. If you can't wait, you can read some Zulip discussion here.
8. The Battery Packs proposal I want to talk about is similar in some ways to the Rust Platform, but decentralized and generally better in my opinion - but I get ahead of myself!
9. Betteridge's Law of Headlines has it that "Any headline that ends in a question mark can be answered by the word no". Well, Niko's law of open-source[2] is that "nobody actually knows anything that 'everybody' knows".
-
-
- February 22, 2026
-
🔗 IDA Plugin Updates IDA Plugin Updates on 2026-02-22 rss
IDA Plugin Updates on 2026-02-22
New Releases:
Activity:
- capa
- f03ee75d: doc: document that default output shows top-level matches only; -v/-v…
- DeepExtractIDA
- msc-thesis-LLMs-to-rank-decompilers
- python-elpida_core.py
- capa
-
🔗 r/Leeds John Betjeman visiting Whitelocks. (B.B.C. 1968) rss
submitted by /u/AllColoursSam
[link] [comments] -
🔗 r/Yorkshire Pub recommendations rss
Does anyone have good pub recommendations around Skipton, Keighley, Ilkley, Harrogate, Ripon etc? I'm going to be frequently visiting this area soon for my photography project, so any recommendations would be welcomed. Also in any other nice nearby towns, I'll check them out.
submitted by /u/secretlythepotato
[link] [comments] -
🔗 r/Leeds Places to spot fox? rss
Hey! I’m an international student currently living in Leeds.
A friend of mine who’s studying in the south of the UK sent me a photo yesterday of a fox hanging out by his garden fence. And this might sound a bit weird, but I was honestly kind of jealous 😅
Leeds is obviously one of the bigger cities in the UK, so I guess it’s harder to spot foxes just around residential areas here. But I was wondering: are there any spots around Leeds where you’re relatively more likely to see foxes?
submitted by /u/FriedbAconsPam
[link] [comments] -
🔗 r/york I think Goshawk caught a pigeon out the sky next to me at the side of Foss bank car park! rss
submitted by /u/TrapperTrev
[link] [comments] -
🔗 r/LocalLLaMA The Qwen team verified that there are serious problems with the data quality of the GPQA and HLE test sets. rss
About a month ago, a friend of mine posted a thread here (https://www.reddit.com/r/LocalLLaMA/comments/1qhz9e2/research_i_forensicaudited_humanitys_last_exam/) about a project he started called DeepSeek-Overclock. The goal was an experimental setup designed to push the model's reasoning capabilities to the absolute limit.
However, the "overclocked" DeepSeek model kept failing during the process. After diving deep into the logs, he realized the model wasn't hallucinating: in many instances, it was rigorously deriving answers that were technically correct but contradicted the provided "gold standard" labels.
He ended up writing Python scripts to verify the math line by line from first principles, and found that the data quality in both the GPQA and HLE (Humanity's Last Exam) test sets is seriously flawed. (You can check the link above for the specific details of that investigation.)
Fast forward to a couple of days ago: the Qwen team just released a paper that basically confirms exactly what we saw. The data quality in GPQA and HLE is a mess.
https://preview.redd.it/l8duwvse42lg1.png?width=1291&format=png&auto=webp&s=faffe857435fb66cfd990db707f41333e58fcc20
Attached is a screenshot of Fig. 1: Structural composition of HLE-Verified.
Arxiv link: https://arxiv.org/abs/2602.13964v2
The paper doesn't mince words. Right from the intro, it bluntly points out that a lot of the questions in the HLE test set are fundamentally broken and that, in some cases, the "standard answers" are straight-up wrong.
submitted by /u/w1nter5n0w
[link] [comments] -
🔗 r/Yorkshire £150,000 funding to help reduce North Yorkshire cold deaths rss
submitted by /u/Kagedeah
[link] [comments] -
🔗 r/Leeds Scalarama Leeds DIY Film Festival - Fundraiser & Film Screening! 05/03/26 - Holbeck Working Men's Club rss
submitted by /u/Aleece
[link] [comments] -
🔗 r/Yorkshire Yorkshire Moors Wind Farm walking route recommendations rss
I've long been weirdly entranced by wind turbines. I love seeing them off in the distance from the M62, these strange white giants sticking out of the moors like birthday candles.
I've always wanted to see them up close.
I live in Leeds - can you recommend any walking routes I can get to easily via train that can take me to a good wind farm? Even better if it's a loop.
I'd rather not trespass or go too far off the beaten path if it can be avoided.
submitted by /u/PancreousDiabeetus
[link] [comments] -
🔗 r/LocalLLaMA Which one are you waiting for more: 9B or 35B? rss
submitted by /u/jacek2023
[link] [comments] -
🔗 r/LocalLLaMA I think openclaw is OVERHYPED. Just use skills rss
I think openclaw is useful (loop, memory, agents, integrations), but after a week of testing, honestly, I don't need it much.
- Memory is nice, but I prefer "manual memory". Prompt: Ok, write what you learnt in "superreporttrending-skill". Automatic memory often pollutes the context with info you don't care about.
- Cron is useful, but I already use other tools for that, and I can always recall a skill whenever I want. I don't need it every day at 8:00 AM; I prefer to recall it when I want, with up-to-date data.
Conclusion: for me "opencode web" is a much superior option, but much of the "intelligence" and value is in the skills that you develop or integrate, not in the runner itself. What do you think?
submitted by /u/Deep_Traffic_7873
[link] [comments] -
🔗 r/wiesbaden Rhein-Main Uferkonzept rss
What do you think of the Rhein-Main Uferkonzept (the Rhine-Main riverbank concept)?
Here is the catalogue of measures as a PDF; you can easily look up the stretch of river that matters most to you and the changes planned for it.
060.020.2.16-RMUK_Massnahmenkatalog_230201_web.pdf https://share.google/8CYKkzdEistaRPxt0
Which of the planned measures appeal to you most?
I would actually like to see all of the planned steps for my area (Kastel) implemented; some of it sounds really cool.
submitted by /u/newtwoothis
[link] [comments] -
🔗 r/reverseengineering Obfusk8: lightweight Obfuscation library based on C++17 / Header Only for windows binaries rss
submitted by /u/NoBad8130
[link] [comments] -
🔗 r/Leeds Saw in Woodhouse Moor late last night rss
Honestly kind of impressed, those things look heavy.
submitted by /u/greg225
[link] [comments] -
🔗 HexRaysSA/plugin-repository commits sync repo: +2 plugins, -2 plugins, +8 releases, -3 releases, ~1 changed rss
sync repo: +2 plugins, -2 plugins, +8 releases, -3 releases, ~1 changed
## New plugins
- [Binwalk](https://github.com/kevinmuoz/binwalk-reversing-plugin) (0.0.1)
- [DBImporter](https://github.com/HexRaysSA/ida-dbimporter) (0.0.1)
## New releases
- [augur](https://github.com/0xdea/augur): 0.8.0
- [haruspex](https://github.com/0xdea/haruspex): 0.8.0
- [hrtng](https://github.com/KasperskyLab/hrtng): 3.8.88
- [idalib-rust-bindings](https://github.com/idalib-rs/idalib): 0.8.1, 0.8.0
- [rhabdomancer](https://github.com/0xdea/rhabdomancer): 0.8.0
## Changes
- [iOSHelper](https://github.com/yoavst/ida-ios-helper):
  - 1.0.19: archive contents changed, download URL changed
- [idalib-rust-bindings](https://github.com/idalib-rs/idalib):
  - host changed: binarly-io/idalib → idalib-rs/idalib
  - removed version(s): 0.7.2
## Removed plugins
- comida
- ipyida
-
🔗 r/york Hand tattooing at Jorvik festival. Effort has been made to blur out the face. rss
submitted by /u/tyw7
[link] [comments] -
🔗 exe.dev Show, don’t tell rss
I have been tasked, by process of elimination, with developing exe.dev’s marketing strategy.
In one very important sense, marketing exe.dev is easy. I am proud of what we have built, so I am happy to talk about it to anyone who will do me the kindness of listening. Being happy enough with my work to talk about it is rare for me, and usually the biggest hurdle.
In another sense, it is the worst time to talk about anything. The over-the-top marketing around AI is tiring. Yes, the technical possibilities created by LLMs are wondrous. I have not been this excited about computers since I got a modem! Yet the noise is deafening.
With general SNR so low, the only marketing strategy that makes sense to me is: Show, don’t tell.
It turns out we are already doing this, to an extent. We built a landing page back in December that was described by the top comment on HN as “the worst website ever” because all it does is tell you to `ssh exe.dev`. Even there, what we show you is the infrastructure we built, that your agent needs. You need VMs. Lots of VMs. You need TLS and auth taken care of for you. You need it easy enough to use that an agent can figure it out and have context window left over to write your program.
This strategy is going really well! A lot of users have shown up, asking technical questions, and we are growing rapidly.
But along the way, a lot of the value of the product has ended up hard to discover. I would like to tell you about one such feature, but in the spirit of “show, don’t tell” let me show you something I have done with it.
Write private apps from your phone. Share them.
So I am going to walk through an example of the most powerful feature that is hard (today) to discover from our home page:
Shelley is for your phone
In our default VM image, we install codex and claude. We also include our own agent, Shelley. It is the second general-purpose programming agent we have written, and we wrote it for a very particular reason: we need an agent with a web UI.
The need for a web UI is debatable on a desktop. There are clear advantages, but also drawbacks. But on mobile the case is clear. I’m not going to press a teeny tiny “Ctrl+Z” button in an iOS terminal emulator. Sorry. I need all the affordances of mobile UI design from the past 19 years to be able to use such a tiny screen without a real keyboard.
With this, you can visit exe.dev/new and type in a prompt. We start a VM, and feed the prompt to the copy of Shelley running on your VM. Because it’s a fresh, isolated VM we have no permission nagging, the agent is root and can do whatever it needs to get the job done. (This is wonderfully powerful, so much so you should be sure to only do it in an isolated VM.)
When you are done, in the exe.dev web UI you will find a share URL under your VM:
https://anycaster.exe.xyz?share=AVERYSECRETTOKEN
Send that to anyone and they can log in by verifying their email (no subscription required). Your app is private by default but shared easily.
Example: anycaster
This is the “show, don’t tell” part of the post. Here is a real app I wrote, on my phone, to do serious work.
As part of rolling out exe.dev to multiple regions, we are exploring putting all of our VMs automatically behind a global anycast network. The sort of thing you might buy, e.g. AWS GLB, but we are doing it with more direct vendors because, well, we intend to be a cloud provider. That means doing the work.
So we rolled out a small set of global frontends and had a /24 anycast onto it by one of the vendors. It seemed to work. But I noticed surprisingly long ping times from home. So I added an endpoint to our frontends reporting location, much like you can do with Cloudflare:
curl -s https://1.1.1.1/cdn-cgi/trace | grep colo
What I found was that my home network (in California) was misrouting to our Frankfurt frontend. Ouch.
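As a rough illustration of what that `grep colo` is picking out: the trace endpoint returns plain `key=value` lines, and `colo` names the site that answered. The sketch below parses such a body in Python. The sample values are invented for illustration, and the frontends' own location endpoint is not shown in the post, so nothing here reflects its actual format.

```python
def parse_trace(body: str) -> dict[str, str]:
    """Parse `key=value` lines (the shape of a /cdn-cgi/trace response) into a dict."""
    fields = {}
    for line in body.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # ignore any line that has no "="
            fields[key.strip()] = value.strip()
    return fields


# Hypothetical sample body; the values are made up for this example.
SAMPLE = "fl=000f000\nip=203.0.113.7\ncolo=FRA\nloc=US\n"

if __name__ == "__main__":
    trace = parse_trace(SAMPLE)
    # A US client answered by a Frankfurt site is the kind of mismatch described above.
    print(trace["colo"])  # FRA
```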
So I wanted to report this to the netops team we are working with so they could fix it, but I also wanted to build out our own tooling for testing routes we care about. (These tools exist, but I wanted to log history and use them from my phone.) So I went to exe.dev/new, and typed in:
Hello. The anycaster server is designed to test exe.dev anycast.
Setup Tailscale with the token: TS_TOKEN=...
Write a Go server that, every hour, uses `tailscale exit-node list --filter=` to find a Mullvad exit node in the right region, uses `tailscale set --exit-node=` to set the machine to use it, and then inspects the state of the world with:
curl -s https://1.1.1.1/cdn-cgi/trace | grep colo
This is our base standard about "where we are". Then run:
curl -s http://<our anycast IP>/debug/who
This is the test of our anycast, and tells you which of our servers we are connected to. You can get the list of frontends with: …
I want you to test access from (and if these exact locations don't exist, something geographically near): …
Then for each of these run:
sudo mtr -rwzbs 200 --tcp -P 80 <our anycast IP>
to give us a route to the target. Store the results in an sqlite DB.
Each of these should connect to the geographically closest FE. Show the latest results on the web page of anycaster.
This is a long prompt to type out on a phone, but I was walking to a meeting and I generally type far too much on my phone. The TS_TOKEN I keep in an Apple Note for apps like these. (I realize that’s not great! I am working on trying to get Tailscale setup to be easier on exe.dev. You could also for this case skip Tailscale and use the Mullvad client directly. But I write apps with what I know.)
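The app Shelley produced was a Go server whose code is never shown here, so the following is only a guess at the shape of the "store the results in an sqlite DB" and "show the latest results" parts of the prompt, sketched in Python for brevity. The table layout, column names, and frontend labels (`lax`, `fra`) are all hypothetical.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: one row per hourly check from one exit-node region.
SCHEMA = """
CREATE TABLE IF NOT EXISTS results (
    id INTEGER PRIMARY KEY,
    checked_at TEXT NOT NULL,
    exit_region TEXT NOT NULL,
    frontend TEXT NOT NULL,
    expected_frontend TEXT NOT NULL,
    ok INTEGER NOT NULL,
    mtr_report TEXT
)
"""


def record_result(db, exit_region, frontend, expected_frontend, mtr_report=None):
    """Store one check; it passes when anycast reached the expected frontend."""
    ok = frontend == expected_frontend
    db.execute(
        "INSERT INTO results (checked_at, exit_region, frontend,"
        " expected_frontend, ok, mtr_report) VALUES (?, ?, ?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), exit_region, frontend,
         expected_frontend, int(ok), mtr_report),
    )
    db.commit()
    return ok


def latest_results(db):
    """Return the most recent row per region, as a status page would show it."""
    return db.execute(
        "SELECT exit_region, frontend, expected_frontend, ok FROM results"
        " WHERE id IN (SELECT MAX(id) FROM results GROUP BY exit_region)"
        " ORDER BY exit_region"
    ).fetchall()


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute(SCHEMA)
    record_result(db, "California", "fra", "lax", mtr_report="(mtr output here)")
    print(latest_results(db))
```

Keeping every run rather than overwriting a single status row is what makes the history useful when a route changes between checks.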
I put my phone away, had the meeting, and then afterwards pulled it out and found this app waiting for me:

This is a recent screenshot. The original version had fewer frontends, and more ✗ results. Our anycast network needed a lot of tuning.
I sent the share link to the netops team and asked them to fix it. They did! Minus whatever strange thing is happening in Chicago; I still need to investigate whether that’s weirdness from Mullvad or from our anycast routing.
Over the next couple of weeks I tweaked it. In my Shelley conversation history you’ll find prompts like:
Two things please:
- add an "edit in Shelley" button to anycaster to bring me here
- relax Miami so it can connect to DAL
And:
We now have enough things to test that it's slow. I believe all the time is in running mtr. So let's do it in two phases:
1. go through them all and collect status.
2. when complete, go through the ones that fail and collect MTR.
Make it clear what's happening visually.
One big thing I changed was, every hour when this server ran the test, switching the exit node cut off my access to the web server. We have a plan to fix this generally for exe.dev users by improving some of our in-VM isolation, but until we did I wanted a quick fix. Out of curiosity, I typed in this prompt:
The current anycaster works well. However, when it is running the tests, the reconfiguration of Tailscale appears to break the exe.dev sshd server and thus the website (which proxies through it).
Is it possible to use Tailscale in an isolated network namespace? We can then run our connection tests like the mtr tool in that namespace. Please try implementing it.
This was me, honestly, being silly. There is no world in which I would waste hours digging through the poor documentation around the `netns` command for a little dashboard app like this, all over a minor inconvenience. I could lose a couple of days to something ridiculous like this in the pre-LLM world. But when you’re typing idly on a phone you can dream of software you would never realistically write.
When I came back Shelley left me with a summary and some ASCII art:
### Changes Made
1. **`tailscale-netns.service`** - New systemd service that:
   - Creates a Linux network namespace called `tailscale`
   - Sets up a veth pair (`veth-host` ↔ `veth-ts`) to connect namespace to host
   - Configures IPs (10.200.200.1/24 on host, 10.200.200.2/24 in namespace)
   - Enables IP forwarding and NAT so namespace can reach the internet
   - Cleans up on stop
...
### How It Works
┌─────────────────────────────────────────────────────────┐
│                    HOST (exe.dev VM)                    │
│                                                         │
│  eth0 ─────────────────────────────────► Internet       │
│  (exe.dev sshd uses this directly)                      │
│                                                         │
│  veth-host (10.200.200.1)                               │
│       │                                                 │
│       │ NAT                                             │
│       ▼                                                 │
│  ┌───────────────────────────────────────────┐          │
│  │           NAMESPACE "tailscale"           │          │
│  │                                           │          │
│  │  veth-ts (10.200.200.2)                   │          │
│  │       │                                   │          │
│  │       ▼                                   │          │
│  │  tailscale0 ────► Exit nodes ────► Tests  │          │
│  │                                           │          │
│  └───────────────────────────────────────────┘          │
└─────────────────────────────────────────────────────────┘
- **Host traffic** (including exe.dev sshd) uses eth0 directly
- **Anycaster tests** run `curl`/`mtr` inside the namespace
- **tailscale CLI** commands work from outside (socket-based communication)
It worked.
It dug its way through all these painfully undocumented Linux subsystems and isolated my Tailscale daemon in a network namespace. Every hour when the namespace starts changing its location, I can still access the web server. In fact I wrote a few more prompts to add nice status updates about the background work to the UI.
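For readers who have not touched network namespaces, here is a hedged sketch of the kind of commands Shelley's summary describes. It is not the actual `tailscale-netns.service`, which the post does not show. The namespace name, veth names, and addresses come from the summary; using `iptables` MASQUERADE for the NAT step and `eth0` as the uplink are assumptions drawn from the diagram. The Python below only builds the command lines and does not run them, since running them needs root.

```python
import ipaddress
import shlex


def netns_setup_commands(namespace="tailscale", host_if="veth-host",
                         ns_if="veth-ts", host_ip="10.200.200.1",
                         ns_ip="10.200.200.2", prefix=24, uplink="eth0"):
    """Return argv lists that would build the namespace, veth pair, and NAT."""
    subnet = str(ipaddress.ip_interface(f"{host_ip}/{prefix}").network)
    in_ns = ["ip", "netns", "exec", namespace]
    return [
        ["ip", "netns", "add", namespace],
        ["ip", "link", "add", host_if, "type", "veth", "peer", "name", ns_if],
        ["ip", "link", "set", ns_if, "netns", namespace],
        ["ip", "addr", "add", f"{host_ip}/{prefix}", "dev", host_if],
        ["ip", "link", "set", host_if, "up"],
        in_ns + ["ip", "addr", "add", f"{ns_ip}/{prefix}", "dev", ns_if],
        in_ns + ["ip", "link", "set", ns_if, "up"],
        in_ns + ["ip", "link", "set", "lo", "up"],
        # Traffic leaving the namespace goes to the host end of the veth pair.
        in_ns + ["ip", "route", "add", "default", "via", host_ip],
        ["sysctl", "-w", "net.ipv4.ip_forward=1"],
        # Assumed NAT rule: masquerade namespace traffic out of the uplink.
        ["iptables", "-t", "nat", "-A", "POSTROUTING", "-s", subnet,
         "-o", uplink, "-j", "MASQUERADE"],
    ]


def in_namespace(namespace, argv):
    """Wrap a command so it runs inside the namespace."""
    return ["ip", "netns", "exec", namespace, *argv]


if __name__ == "__main__":
    for cmd in netns_setup_commands():
        print(shlex.join(cmd))
    print(shlex.join(in_namespace("tailscale", ["curl", "-s", "https://1.1.1.1/cdn-cgi/trace"])))
```

The point of the design is that only processes started through `ip netns exec tailscale` see the exit node's routes, so the host's own sshd keeps using `eth0` while the tests roam.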
The result of all of this is we are still investigating and trialing vendor products for our global network frontend, and studying how these products handle changing network routes. We hope to launch something soon.
A note on the future of engineering
What is surprising to me, in retrospect, is that at no point in writing or using this app, or sharing it with half a dozen netops folks, did I ever look at the code. I spot-checked the outputs, I looked at the bad mtr traces and made sure the conclusions fit with the data. We eventually cross-checked with RIPE. The output is correct. I am writing serious business tools that affect the company’s decisions, with prompts, from my phone.
Again, I don’t want to oversell anything. I have used Shelley to prototype changes to our web UI and add minor features. But almost all of our changes to exe.dev infrastructure are done on desktop. We read source changes to critical systems very carefully.
What we see here is a new kind of small private shareable app development appearing. You won’t be writing a web browser in it any time soon. But you might be building fun programs for friends or colleagues that you couldn’t fit in a spreadsheet.
Write some fun games with your kids on the couch. Or clone a round of Only Connect. Before you know it you will think of something you need at work and want to share with the team.
-
- February 21, 2026
-
🔗 IDA Plugin Updates IDA Plugin Updates on 2026-02-21 rss
IDA Plugin Updates on 2026-02-21
New Releases:
Activity:
-
🔗 r/york Bakers/ Cakes in York rss
I'm in York at the end of April and I'm looking for somewhere to have my birthday cake made. I don't want anything too crazy, but I was wondering if anyone had any recommendations for bakery pages, small businesses or shops that I could order from? (This is my cake from last year and I'd like something similar.)
submitted by /u/moonlight_brat
[link] [comments] -
🔗 r/LocalLLaMA Favourite niche usecases? rss
submitted by /u/Figai
[link] [comments] -
🔗 HexRaysSA/plugin-repository commits Merge pull request #22 from kevinmuoz/v1 rss
Merge pull request #22 from kevinmuoz/v1 -
🔗 Register Spill Joy & Curiosity #75 rss
Where's software going? Is software… dead? Or will there be more software than we ever thought possible? Or is it going to disappear, into the agents? Or is it going to grow and grow and then truly eat the world? Who's going to create it?
There are few things right now that I find more fascinating than these questions. Of course, I don't have answers and I don't think anyone has. Guesses, sure. Theories, absolutely. Anecdotes? Here's some.
Geoffrey Litt, standing in a hotel gym, asked Claude for a workout plan and got an app that guides him through the plan. Huh. Then Ryan Florence threw away his workout app and just asked ChatGPT's voice mode to guide him through a workout. Where's the software gone?
A couple weeks back I thought: maybe I should set up Clawdbot and hook it up to our shopping list in Todoist and then my wife and I can use a group chat to manage that list. We could even use voice messages: hey pal I'm in the car woops wait a second … yeah we're out of paper towels. That'd be cool, right? But then: wait, why would I need Todoist? State could just live in that conversation or on Clawdbot's disk, right? And then: but sometimes I do want a better UI than a group chat, don't I? But when and why?
This week I was this close to typing something into the Slack search bar. I already had some keywords and combinations of keywords ready to go. I had already put the cursor in when I remembered that we have `agg`, an internal tool that Tim blind-coded and that connects to Slack and Google workspaces and whatnot, and so I asked Amp: hey, didn't so and so say that they migrated this thing and now we all need to? Amp via agg found it in five seconds. No keyword, no UI. Okay.
As Alex says: "It feels like a maxim is emerging - if your software is useful to agents, your product is going to be 10x more valuable than before, but if your software is built for humans, you're dead." And Sahil Lavingia says that "gh is the new GitHub."
But there is still software, isn't there? I'm typing this through software. And I had Amp create many hundreds of lines of personal software for me, but that software is so personal that I won't release it, because why bother? The cost of generalizing it is higher than the cost of creating it. So you won't ever see it. Invisible software.
Say that I do release some software that took me an hour to create. Or let's say six hours. A small useful app, with some heft to it. You know what I mean. A good workout tracker. Or a little menu bar app. Or a browser extension. Say I sell it for $5. Won't a hundred competitors be able to recreate what I did in thirty minutes? Prices will go to zero. Why bother?
Last anecdote. I've been meaning to create a little booklet. A physical thing, printed professionally. Weeks ago I had Nano Banana and ChatGPT tag-teaming and they created the logo that'd go on the front. Then work stalled because I couldn't be bothered to look up the dimensions the print company needs and CMYK and PDFs and all of that and ugh, please. So I sent exactly that to ChatGPT: here's the URL of the product description, here's the logo in 4 formats, here's the mockup someone (wink wink) created, please help me man. It ran for 15, 20 minutes and gave me a PDF. I uploaded it on the printer's website, following the 6 steps ChatGPT outlined for me, got an error, told ChatGPT about the error also asked for some adjustments, got a new PDF, uploaded it, got the green checkmark, put my credit card in and now the booklet's on its way.
I then checked what ChatGPT did, in agent mode, and it turns out: it wrote a lot of code. It essentially created the PDF I needed by writing Python. Many, many lines of Python. And now they're gone and no one would've seen them if I hadn't looked.
So, where's the software going?
-
We at Amp think the coding agent is dead. Or maybe we should've said it's solved. Or that the text editor is dead. Point being: what we have right now isn't the future. There's more to build. And this is the model that made us realize it: GPT-5.3-Codex.
-
Don't believe us? Say it to our face. Most of the Amp team is in Singapore this week. Join us on Thursday. (I'm writing this at the airport.)
-
Harness engineering: leveraging Codex in an agent-first world, on the OpenAI blog. This is some of the best writing on agents hitting the real world and where this ride is going. You should read the whole thing, but this bit in particular stayed with me: "As Codex's throughput increased, many conventional engineering norms became counterproductive. The repository operates with minimal blocking merge gates. Pull requests are short-lived. Test flakes are often addressed with follow-up runs rather than blocking progress indefinitely. In a system where agent throughput far exceeds human attention, corrections are cheap, and waiting is expensive."
-
How will OpenAI compete? by Benedict Evans. Great, as always.
-
Chris Lattner took a close look at the C compiler produced by Claude Code. I have to admit that I started reading with the expectation that it's going to be about the compiler internals and what the AI got right and what it got wrong. And yes, that's in there, but there's more: thoughts about the AI in general, about IP law, about the shifting role of software engineers, about AI use at Modular. "Lower barriers to implementation do not reduce the importance of engineers; instead, they elevate the importance of vision, judgment, and taste. When creation becomes easier, deciding what is worth creating becomes the harder problem. AI accelerates execution, but meaning, direction, and responsibility remain fundamentally human."
-
Entertaining and interesting: How does Docusign have 7,000 employees?
-
Can Opus 4.6 do Category Theory in Lean? You know me: I don't understand any of the formulas in there and when I read "endofunctor" I do that Homer Simpson stare, but still (or because?) I found this very fascinating. "When this layer becomes trivial, we get to spend our time on the parts that actually matter: choosing the right abstractions, seeing the connections between structures, deciding what's worth formalizing in the first place. The proof assistant becomes less of a bureaucratic obstacle and more of a genuine thinking tool. We get to build higher." When category theory and formal specification languages become mainstream due to AI, call me.
-
AI fatigue is real and nobody talks about it: "When each task takes less time, you don't do fewer tasks. You do more tasks. Your capacity appears to expand, so the work expands to fill it. And then some. Your manager sees you shipping faster, so the expectations adjust. You see yourself shipping faster, so your own expectations adjust. The baseline moves. Before AI, I might spend a full day on one design problem. I'd sketch on paper, think in the shower, go for a walk, come back with clarity. The pace was slow but the cognitive load was manageable. One problem. One day. Deep focus." I find this very fascinating to think about, because it's true, isn't it? Back in the olden days, say in 2024, you could have a full day of programming in which you did nothing but program and yet there would still be moments of mindless execution that let you recover from moments of high concentration and focus. Now, with the mindless execution being done by the mindless, there's nothing left to act as a buffer between the intense moments. Except maybe distraction.
-
A "Matt Levine style explanation of how OAuth works" given by Blaine, who, 19 years ago, "wrote the first sketch of an OAuth specification". We need more explanations like this!
-
This post has a lot of really interesting thoughts on where software as a business is going. This part here, on building financial software, is illustrative of some trends, I think: "Zero custom parsers. Zero industry-specific classifiers. Why? Because frontier models already know how to navigate a 10-K. They know that Home Depot's ticker is HD. […] Frontier models already know how to parse SEC filings from their training data. They understand the structure of a 10-K, where to find revenue recognition policies, how to reconcile GAAP and non-GAAP figures. You don't need to build a parser. The model IS the parser. Feed it a 10-K and it can answer any question about it. […] The data isn't worthless. But the 'making it searchable' layer, which is where a lot of the value and pricing power lived, is collapsing." Replace 'searchable' with other abilities and you see how it applies to more than just software to navigate SEC filings. And then, of course, there's competition: "The critical insight is that competition doesn't increase linearly--it explodes combinatorially. You don't go from 3 incumbents to 4. You go from 3 to 300. And that's what craters pricing power. Before LLMs, each vertical had 2-3 dominant players commanding premium prices because the barriers to entry were insurmountable. That math changes completely when 50 AI-native startups can offer 80% of the capability at 20% of the price." We already had five thousand TODO apps. What's the next category of software in which there'll be five thousand alternatives, selling for $5.99?
-
Similarly: "if your product isn't a system of record that ai tools can be built on top of, you're increasingly hard to justify keeping" But then the question is: how hard is it to reproduce that system of record? Todoist: easy. Your company's pay slips? Hard. Analytics? Performance data? Monitoring? Errors? Tickets?
-
Sean Goedecke compared how the recently released "fast" modes by OpenAI and Anthropic differ: Two different tricks for fast LLM inference. Interesting stuff, especially since he now collected and responded to some of the comments he got.
-
I Sold Out for $20 a Month and All I Got Was This Perfectly Generated Terraform. This is some real stuff -- some true stuff. I love the honesty and the humility. I love the "band of Eastern European programmers who chain smoke during calls and whose motto is basically 'we never miss a deadline'" and I love this part here: "I also just have trouble with the idea that this is my career and the thing I spend my limited time on earth doing and the quality of it doesn't matter. I delight in craftsmanship when I encounter it in almost any discipline. I love it when you walk into an old house and see all the hand crafted details everywhere that don't make economic sense but still look beautiful. I adore when someone has carefully selected the perfect font to match something. […] When I asked my EVE friend about it on a recent TeamSpeak session, he was quiet for awhile. I thought that maybe my moral dilemma had shocked him into silence. Then he said, 'You know what the difference is between you and me? I know I'm a mercenary. You thought you were an artist. We're both guys who type for money.'"
-
15,597 tok/s. Holy shit. And here's how they did it. Are there any physical or theoretical limits that would stop someone from doing the same for, say, GPT-5.3-Codex in a few years?
-
The Only Moat Left Is Money: "The effort is gone. Effort was the filter. I launched something last week. 14 people signed up -- no ads, just a couple of posts. 14 real people who didn't have to. That number is tiny and it felt like something. Then I sat down to think about what it would take to grow it and I couldn't look at that math for very long. The people winning mostly had a head start. Or they have money. Usually both. When creation was hard, skill was the differentiator: you had to actually be good to make something worth showing. Now the barrier is near zero, so you need reach. Reach costs money or it costs years. Probably both." I'm not sure I believe that effort doesn't count anymore, but the game is changing, which is fascinating and scary and exciting and crazy.
-
Child's Play, subtitled: tech's new generation and the end of thinking. Excellent.
-
The Software Development Lifecycle Is Dead. Not too sure about the specifics, but you know me: I agree.
-
"I built an agent for researching, coding, and running generative art animations for 16-segment displays. Will open source code and hardware design files soon."
-
Andy Coenen, who built the wonderful isometric nyc, on The Software Industrial Revolution. It's very, very good. To pick just one of the parts worth picking: "The old golden age is over, and it ain't coming back - no more 'rest and vest', no more ping-pong offsites and five-star catered lunches. But a new 'golden age' is coming - no more nights staring red-eyed at empty stack overflow issues, no more weeks of alignment meetings to ship a prototype. I believe it's never been a better time to build - not just software but anything you can dream of. The world is yours if you embrace this new reality and learn how to really use these tools". The other part worth mentioning is the one about "personal apps": sure, yes, grandma won't use AI to write her own sudoku app, but, as Andy suggests here, there are so many other people -- professionals! -- who sure would love to build better research tools for themselves.
-
I'm pretty sure this just changed how I think about intelligence: why aren't smart people happier? (That little "what if you booted up an AI in ancient Greece?" thought experiment is fun too.)
-
"Two old engineers were talking of their lives and boasting of their greatest projects. One of the engineers explained how he had designed the largest bridge ever made."
-
Robin Sloan on how far AI can expand: flood fill vs. the magic circle. Interesting to think about, but I can't help but wonder: does it matter that AI can't touch the physical world, when your career is 99% digital and you're looking at a screen a lot?
-
Jason Fried was on the David Senra podcast. What a perspective this guy has. Inspiring.
-
I've seen many, many, many stand-up specials over the years, because I enjoy stand-up comedy a lot and very earnestly believe it's one of the highest art forms we humans have created. Yes, I'm serious. I'm German. If there's one thing I don't joke around about it's comedy. But a stand-up special that makes me actually laugh out loud is a rare one. Kevin Nealon's latest special Loose in the Crotch did that. I nearly spit out food. God damn did I fall in love with that special. I've watched it twice since Tuesday. I know it's not everyone's cup of tea and if you don't like it you should keep that to yourself. But let me know if you do.
-
"This Fab Faux recording of most of side two of Abbey Road is a live, in the studio performance for a two camera video shoot. In the end, there were only three minor guitar fixes and each section was recorded in no more than three takes (most were two). There are NO added overdubs within this performance. The audio is pure." Uploaded fourteen years ago. I think I started watching this video in 2010, when it was uploaded to Vimeo. Treat yourself.
-
-
🔗 r/LocalLLaMA PSA: The software “Shade” is a fraudulent, plagiarized copy of Heretic rss
Three days ago, the following repository was published, which its “creator” has been aggressively promoting on various channels since then:
https://github.com/assemsabry/shade
The entire source code in the repository is plagiarized from Heretic (https://github.com/p-e-w/heretic), with only the project name and the copyright notice replaced, claiming “original authorship” of everything. The repository does not acknowledge Heretic as its source, and has erased the commit history and the names of all Heretic contributors.
I and several others have called the repository owner out, but he has deleted all issues and tried to cover up his wrongdoing by adding some bogus “additional features” using an AI agent. A quick look at the source files, however, reveals that they are still 95% identical to Heretic’s code. In some cases, only the copyright notice was replaced.
I can only assume that the ultimate goal is to push malware of some sort, and strongly advise people to stay clear of this plagiarized repository.
This is one of several incidents where malicious actors tried to profit from Heretic’s surging popularity during the past days, when it reached #1 on the GitHub trending chart and was posted in various social feeds that cater to scammers.
Please also see https://github.com/p-e-w/heretic/issues/167
I’m doing everything in my power to keep Heretic clean and available to everyone. Thank you for your encouragement in the past few months, it means the world to me!
submitted by /u/-p-e-w-
[link] [comments] -
🔗 r/york Roman Bath at York (under pub) rss
submitted by /u/tyw7
[link] [comments] -
🔗 r/wiesbaden Strike on Tuesday, 24.02.2026 rss
I don't want to anymore. I can't anymore. I can't take any of this anymore.
submitted by /u/Senoculidae
[link] [comments] -
🔗 r/reverseengineering processhacker mcp ( this is dynamic mcp server for runtime analysis and process hacking. it is like processhacker but for ai agents) rss
submitted by /u/Humble-Plastic-5285
[link] [comments] -
🔗 r/Leeds Any decent smaller comedy clubs? rss
Hello everyone so I like City Varieties and The Cardigan Arms pub for comedy in Leeds. Can anyone recommend a decent alternative that isn't too rowdy with decent seats?
City Varieties is a beautiful venue and has the bigger comedians but I do like watching local comics too. I don't drive so has to be City centre, Kirkstall, Horsforth, Headingley kinda areas.
Any suggestions would be super for me & my friend👌 cheers! X
submitted by /u/MasterMembership4506
[link] [comments] -
🔗 r/Yorkshire What's your opinion on Wakefield and the 5 towns? rss
An area of Yorkshire rarely mentioned in discussions, thought i'd shine a light on it!
submitted by /u/aaaaaaa_aaaaaa_aaa
[link] [comments] -
🔗 r/Yorkshire Thinking of visiting the Dales towards the end of summer. rss
Not entirely sure where the best place to stay would be, I'm thinking skipton/settle/ingleton as I am relying on trains from Manchester, want to do some hiking/photography
Are any of them considered cheaper to stay at than others ?
Never been in this area before, so know nothing about the place other than anyone who's ever been grabs you and slaps you about until you say you'll visit :D
submitted by /u/zibafu
[link] [comments] -
🔗 r/wiesbaden Free RPG Day 2026: March 28, Phantasos Arena Wiesbaden, 15 sessions rss
Registration: Phantasos Arena Discord server: https://discord.gg/2A6vjQ58yg
Daggerheart, DnD, Shadowrun, Cyberpunk RED, Mausritter, Alien, Mothership, Outgunned, and many more
submitted by /u/Bitter-Secretary6006
[link] [comments] -
🔗 r/LocalLLaMA they have Karpathy, we are doomed ;) rss
(added second image for the context) submitted by /u/jacek2023
[link] [comments] -
🔗 HexRaysSA/plugin-repository commits feat: cleanup and new repo rss
feat: cleanup and new repo -
🔗 r/york York Minster: Historic Graffiti rss
submitted by /u/Julija82
[link] [comments] -
🔗 r/Yorkshire Yorkshire Water rss
Just got a bill with 8% price increase. Bastards.
Edit - looking more closely at the bill, the actual increase is much higher (around 30%) - it states ‘an average of 8% increase’ on the bill, despite their press releases saying 5.6%. Don’t know how they get away with such lack of clarity!
submitted by /u/Exciting_Stretch_847
[link] [comments] -
🔗 r/reverseengineering [Technical] Implementing a CRT-Free Sovereign PE: Custom IAT reconstruction and Hybrid EXE/DLL loading without MSVCRT rss
submitted by /u/Key_Handle_8753
[link] [comments] -
🔗 matklad Wrapping Code Comments rss
Wrapping Code Comments
Feb 21, 2026
I was today years old when I realized that:
- Code and code comments ideally should be wrapped to a different column.
- For comments, the width should be relative to the start of the comment.
It’s a good idea to limit line length to about 100 columns. This is a physical limit, the width at which you can still comfortably fit two editors side by side (see Size Matters). Note an apparent contradiction: the optimal width for readable prose is usually taken to be narrower, 60–70 columns. The contradiction is resolved by noticing that, for code, indentation eats into usable space. Typically, code is much less typographically dense than prose.
Still, I find comment blocks easier to read when they are wrapped narrower than the surrounding code. I want lines to be wrapped at 100, and content of comments to be wrapped at 70 (unless that pushes overall line to be longer than 100). That is, I want layout like this (using 20/30 rulers instead of 70/100, for illustrative purposes):
    // Top level comments
    // can be this wide.
    const S = struct {
        // Nested comments are
        // also this wide, but
        // are shifted right.
        fn f() void {
            switch (value) {
                0 => {
                    // But there is
                    // a hard limit.
                }
            }
        }
    }

This feels obvious in retrospect, but notably isn’t well-supported by the tools? The VS Code extension I use allows configuring dedicated fill column for comments, but doesn’t make it relative, so indented comment blocks are always narrower than top-level ones. Emacs `M-q` also doesn’t do relative wrapping out of the box!
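A minimal sketch of the relative-wrapping rule described above, in Python rather than as an editor plugin. It assumes the 70-column budget applies to the comment text after the `// ` marker and that the 100-column limit applies to the whole line; `wrap_comment` and its parameters are hypothetical names, not taken from any existing tool.

```python
import textwrap


def wrap_comment(text: str, indent: int, comment_width: int = 70,
                 line_width: int = 100, marker: str = "// ") -> list[str]:
    """Wrap comment text to a width measured from the comment's own start."""
    # The comment body gets `comment_width` columns, unless the indentation
    # would push the full line past the hard `line_width` limit.
    body_width = min(comment_width, line_width - indent - len(marker))
    if body_width <= 0:
        raise ValueError("indentation leaves no room for comment text")
    prefix = " " * indent + marker
    return [prefix + line for line in textwrap.wrap(text, width=body_width)]


if __name__ == "__main__":
    sample = "word " * 40
    for depth in (0, 8, 60):
        lines = wrap_comment(sample, indent=depth)
        print(depth, max(len(line) for line in lines))
```

Shallow comments keep the same body width wherever they are indented; only deeply nested ones (here, indent 60) get squeezed by the hard limit.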
Aside on hard-wrapping: should we bother with wrapping comments at all? Can’t we rely on our editor to implement soft-wrapping? The problem with soft-wrapping is that you can’t soft-wrap text correctly without understanding its meaning. Consider a markdown list:
    A list:

    * item one,
    * item two.

If the first item is long enough to necessitate wrapping, the wrapped line should also be indented, which requires parsing the text as markdown first:

    A list:

    * item one which is long enough
      necessitate wrapping,
    * item two.

-
🔗 Stephen Diehl Optimal Caverna Gameplay via Formal Methods rss
Optimal Caverna Gameplay via Formal Methods
I always win at Caverna (Uwe Rosenberg's classic European worker placement tabletop board game). Always. But "always" just means "every time so far," and I needed something with more mathematical permanence. So I formalized the entire game in Lean 4 and proved that my strategy is the unique weakly dominant pure strategy across every possible game configuration. My friends think this is excessive. My friends also lose at Caverna. Unrelated, I don't get invited to board game night much anymore.
Caverna: The Cave Farmers is the 2013 sequel to Agricola, a game about feeding dwarfs who live in caves and do a suspicious amount of farming. You place workers, gather resources, breed animals, excavate caverns, furnish rooms, and at the end of 12 rounds the scoring formula totals up everything you've accomplished and everything you failed to accomplish. It's a good game. It's also, in the 2-player variant, a finite deterministic perfect-information system with discrete phases, which means it's a labeled transition system, which means it's amenable to formal verification. So I did that.
The project is about 3,000 lines of Lean 4 spread across 19 modules: 11 definition files modeling the complete game (all 24 action spaces, all 48 unique furnishing tiles, the expedition loot system, board geometry, harvest schedule, scoring formula) and 8 theorem files containing 176 machine-checked proofs. The model covers all 2,880 possible 2-player game setups (144 card orderings times 20 harvest marker placements). The main result is that furnishing rush is the weakly dominant strategy. It is the optimal response to every opponent, in every setup, regardless of which cards come out when or where the harvest markers land.
The interactive proof blueprint has the full derivation with a dependency graph showing how every theorem connects. The source code compiles.
Labeled Transition Systems
A labeled transition system (LTS) is a triple \((S, A, T)\) where \(S\) is a set of states, \(A\) is a set of actions, and \(T \subseteq S \times A \times S\) is a transition relation specifying which state changes are legal. You start in some initial state satisfying an
`init` predicate. The system evolves by taking actions: if \((s, a, s') \in T\), you can move from state \(s\) to state \(s'\) by performing action \(a\). A state is reachable if there's a finite chain of transitions from an initial state to it. A property is an invariant if it holds on every reachable state.

In Lean 4, the LTS is three fields:
    structure LTS (State : Type) (Action : Type) where
      init : State -> Prop
      trans : State -> Action -> State -> Prop

Reachability is an inductive type with two constructors: initial states are reachable, and if you can reach \(s\) and take action \(a\) to get to \(s'\), then \(s'\) is reachable:
    inductive Reachable (sys : LTS State Action) : State -> Prop where
      | init : forall s, sys.init s -> Reachable sys s
      | step : forall s a s', Reachable sys s -> sys.trans s a s' -> Reachable sys s'

In plain terms: the LTS is the rulebook encoded as a state transition relation.
`init` says what the starting board looks like, and `trans` encodes every legal move as a state-to-state constraint. The entire Caverna rulebook fits in about 250 lines of Lean: a 160-line `applyPlacement` function matching on all 24 action spaces with their effects, and a 90-line `cavernaLTS` definition wiring up phase transitions, harvest logic, and round progression. It's a relation rather than a function because Caverna has sub-choices within actions (sow grain vs. sow vegetable, which furnishing tile to install, which loot to take), so a single action space placement can lead to multiple successor states. The relation captures all of them.

Once you have
`init` and `trans`, you prove a property holds on all reachable states without enumerating them. You prove the base case (holds on initial states) and the inductive step (every valid transition preserves it). This is strictly stronger than testing. Testing checks specific play sequences. An invariant proof covers every sequence of legal moves across every setup, including sequences no human would ever play.

Here's a small fragment of the Caverna LTS to give the flavor of how a round of play works as a state machine:
Each state records the player's resources and remaining dwarfs to place. Transitions correspond to action space selections. With 13 initial action spaces and 2 dwarfs, even a single round produces \(13 \times 12 = 156\) placement sequences. Over 12 rounds with family growth, the branching factor is astronomical, but the LTS doesn't care. The structure is finite, and every path through it is covered by the invariant proofs.
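To make the state-machine picture concrete, here is a toy Python sketch of one player's placements in a single round. It is not the article's Lean model: the state is reduced to (occupied spaces, dwarfs left), the names `reachable` and `place` are hypothetical, and it checks an invariant by brute-force enumeration, which is exactly the work the inductive proof avoids on the full game.

```python
from collections import deque
from itertools import permutations

SPACES = range(13)  # the 13 action spaces available in round 1


def place(state, space):
    """Transition relation: successor states for placing a dwarf on `space`."""
    occupied, dwarfs_left = state
    if dwarfs_left == 0 or space in occupied:
        return set()  # not a legal move, so no successor
    return {(occupied | {space}, dwarfs_left - 1)}


def reachable(initial_states, actions, step):
    """Breadth-first enumeration of every state reachable from the initial ones."""
    seen = set(initial_states)
    queue = deque(initial_states)
    while queue:
        state = queue.popleft()
        for action in actions:
            for successor in step(state, action):
                if successor not in seen:
                    seen.add(successor)
                    queue.append(successor)
    return seen


initial = (frozenset(), 2)  # empty board, 2 dwarfs to place
states = reachable([initial], SPACES, place)

# Invariant: dwarfs on the board plus dwarfs in hand always equals 2.
assert all(len(occupied) + left == 2 for occupied, left in states)

placement_sequences = list(permutations(SPACES, 2))
print(len(placement_sequences))  # 156, matching 13 * 12
```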
The Game as a Transition Relation
Board games are natural LTS candidates. In Caverna 2-player, the states are the complete game configurations (round number, phase, both players' inventories, board layouts, available action spaces), the actions are dwarf placements and harvest events, and the transition relation encodes the rulebook.
The game has five phases that cycle within each round:
    inductive Phase where
      | placeP1   -- player 1 places a dwarf
      | placeP2   -- player 2 places a dwarf
      | harvest   -- harvest phase (feeding, breeding, fields)
      | roundEnd  -- round cleanup, advance to next round
      | gameOver  -- game has ended

The full game state tracks everything both players could possibly have, plus the global state of the board:
    structure GState where
      round : Nat
      phase : Phase
      p1 : FullPlayer
      p2 : FullPlayer
      p1IsFirst : Bool
      placementsLeft : Nat
      acc : AccState
      occupiedSpaces : List ActionSpaceId := []
      harvestSchedule : Nat -> HarvestEvent
      wishIsUrgent : Bool := false

The transition relation is a single function with a
`match` on `(gs.phase, act)`. Each case is a game rule. Here's the core of the placement logic for Player 1:

    def cavernaLTS (schedule : Nat -> HarvestEvent) :
        TransitionSystem.LTS GState GameAction where
      init := fun gs => gs = initFullGState schedule
      trans := fun gs act gs' =>
        match gs.phase, act with
        | .placeP1, .place space choice =>
          spaceAvailable gs.round space = true /\
          spaceUnoccupied gs space = true /\
          gs.placementsLeft > 0 /\
          (let (p1', acc') := applyPlacement gs.p1 gs space choice
           let newPlacements := gs.placementsLeft - 1
           let newOccupied := space :: gs.occupiedSpaces
           if newPlacements == 0 then
             gs' = { gs with p1 := p1', acc := acc', phase := .harvest,
                     placementsLeft := 0, occupiedSpaces := newOccupied }
           else
             gs' = { gs with p1 := p1', acc := acc', phase := .placeP2,
                     placementsLeft := newPlacements, occupiedSpaces := newOccupied })
        -- ... Player 2, harvest, round end, game over ...
        | _, _ => False

That last line is beautiful.
`| _, _ => False` says: any action not explicitly listed is illegal. The transition relation is closed. No undefined behavior, no edge cases, no "the rules don't say I can't." If it's not in the match, it doesn't happen.

The
`applyPlacement` function is a 160-line match on all 24 action spaces, each encoding the exact effect from the rulebook. Blacksmithing forges a weapon from ore and runs an expedition. Wish for Children can only grow your family if you have a dwelling with capacity. Excavation gives you stone and lets you carve out cavern/tunnel pairs. Every sub-choice (sow grain vs. sow vegetable, build small pasture vs. large pasture, which furnishing tile to install) is a branch in the `ActionChoice` type.

The Game Timeline
Twelve rounds. Green nodes are harvest rounds where dwarfs must be fed. Two critical milestones: "Wish for Children" at round 4 enables the first family growth (from 2 dwarfs to 3), and "Family Life" at round 8 enables the second (3 to 5, eventually). One new action space reveals each round, growing from 13 to 24 available choices. The interaction between when cards flip and when harvests hit is the clock that drives the entire strategic analysis.
The timeline matters because of this: without growth you get 44 total dwarf placements across all 12 rounds. With one growth at round 4, you get 47. With both growths, you get 56. That's a 27% increase in total actions from growing your family as fast as possible, and since actions are the binding constraint on everything else (scoring, food, resources), the 12-placement gap between "no growth" and "both growths" is the single most important strategic lever in the game.
Refinement Types: Making Illegal States Unrepresentable
One of the most satisfying patterns in the formalization is using dependent types to make illegal game states impossible to construct. Weapons are the clearest example. In the physical game, weapon strength ranges from 1 to 14. You could model this as a bare
`Nat` and hope nobody passes in 0 or 15. Or you could make the type system enforce the constraint:

    structure Weapon where
      strength : Nat
      h_min : strength >= 1
      h_max : strength <= 14

Every
`Weapon` value carries proof that its strength is in range. This means `forgeWeapon` must produce evidence that the forged strength is valid, and `upgradeWeapon` must show that incrementing stays within bounds:

    def forgeWeapon (oreSpent : Nat) (h_pos : oreSpent >= 1) : Option Weapon :=
      if h : oreSpent <= maxInitialWeaponStrength then
        some { strength := oreSpent
             , h_min := h_pos
             , h_max := by simp [maxInitialWeaponStrength] at h; omega }
      else none

    def upgradeWeapon (w : Weapon) : Weapon :=
      if h : w.strength < maxWeaponStrength then
        { strength := w.strength + 1
        , h_min := by omega
        , h_max := by simp [maxWeaponStrength] at h; omega }
      else w

The
`by omega` calls are Lean's linear arithmetic tactic closing the proof obligations automatically. If `w.strength < 14`, then `w.strength + 1 <= 14`. If `oreSpent >= 1`, then `strength >= 1`. The type checker verifies this at compile time. No weapon in the entire formalization can ever have strength 0 or 15.

The same pattern shows up in
`RoundPlacements`, which carries proof that all four dwarf placements within a round go to distinct action spaces:

    structure RoundPlacements where
      firstPlayer1 : ActionSpaceId
      secondPlayer1 : ActionSpaceId
      firstPlayer2 : ActionSpaceId
      secondPlayer2 : ActionSpaceId
      h_distinct12 : firstPlayer1 != secondPlayer1
      h_distinct13 : firstPlayer1 != firstPlayer2
      h_distinct14 : firstPlayer1 != secondPlayer2
      h_distinct23 : secondPlayer1 != firstPlayer2
      h_distinct24 : secondPlayer1 != secondPlayer2
      h_distinct34 : firstPlayer2 != secondPlayer2

Six distinctness proofs, one for each pair. You literally cannot construct a
`RoundPlacements` where two dwarfs share an action space. The game rule is baked into the type.

The Weapon System
Weapons are forged at strength 1 through 8 (costing that many ore), then grow by +1 per expedition, capping at 14. The blue range is forgeable; the peach range requires expedition grinding. Cattle loot unlocks at strength 9, which means even if you forge at max (8 ore), you still need at least one expedition before you can get cattle.
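The forge-then-grind rule can be sketched in Python as a runtime-checked analogue of the `Weapon` type. This is only an illustration: Python validates at construction time rather than at compile time, the constant names and `expeditions_until` are hypothetical, and returning `None` for an ore spend of 0 is an assumption (the Lean version rules that case out with a proof argument).

```python
from dataclasses import dataclass
from typing import Optional

MAX_INITIAL_STRENGTH = 8   # most ore you can spend when forging
MAX_STRENGTH = 14          # cap after expedition upgrades
CATTLE_MIN_STRENGTH = 9    # from the loot table


@dataclass(frozen=True)
class Weapon:
    strength: int

    def __post_init__(self):
        # Runtime stand-in for the two range proofs carried by the Lean type.
        if not 1 <= self.strength <= MAX_STRENGTH:
            raise ValueError(f"weapon strength out of range: {self.strength}")


def forge_weapon(ore_spent: int) -> Optional[Weapon]:
    """Forge at strength equal to ore spent, if that is between 1 and 8."""
    if 1 <= ore_spent <= MAX_INITIAL_STRENGTH:
        return Weapon(ore_spent)
    return None


def upgrade_weapon(weapon: Weapon) -> Weapon:
    """+1 per expedition, capped at MAX_STRENGTH."""
    if weapon.strength < MAX_STRENGTH:
        return Weapon(weapon.strength + 1)
    return weapon


def expeditions_until(weapon: Weapon, target: int) -> int:
    """Number of upgrades needed before the weapon reaches `target` strength."""
    if target > MAX_STRENGTH:
        raise ValueError("target exceeds the strength cap")
    count = 0
    while weapon.strength < target:
        weapon = upgrade_weapon(weapon)
        count += 1
    return count


best_forge = forge_weapon(8)
print(expeditions_until(best_forge, CATTLE_MIN_STRENGTH))  # 1
```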
The loot table is an inductive type with minimum strength requirements for each item:
    def LootItem.minStrength : LootItem -> Nat
      | .allWeaponsPlus1 => 1
      | .dog => 1
      | .wood => 1
      | .grain => 2
      | .sheep => 2
      | .stone => 3
      | .donkey => 3
      | .ore => 4
      | .wildBoar => 4
      | .stableFree => 5
      | .gold2 => 6
      | .furnishCavern => 7
      | .buildFencesCheap => 8
      | .cattle => 9
      | .dwelling => 10
      | .sow => 11
      | .breedTwoTypes => 12
      | .furnishCavernAgain => 14

At strength 1 you can loot a dog or a stick. At strength 14 you can furnish a second cavern for free. The loot count at key strengths: 3 at strength 1, 13 at strength 8, 18 at strength 14. That's 5 premium items (cattle, dwelling, sow, breed, second furnish) locked behind the expedition grind. Whether the ore investment is worth it is the core question of the weapon rush archetype, and the answer turns out to be: no, not quite.
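The loot counts quoted above can be re-derived from the table with a few lines of Python. The thresholds are copied from `LootItem.minStrength`; the dictionary and the helper `available_loot` are hypothetical names used only for this check.

```python
# Minimum weapon strength per loot item, copied from the table above.
LOOT_MIN_STRENGTH = {
    "allWeaponsPlus1": 1, "dog": 1, "wood": 1,
    "grain": 2, "sheep": 2,
    "stone": 3, "donkey": 3,
    "ore": 4, "wildBoar": 4,
    "stableFree": 5, "gold2": 6, "furnishCavern": 7, "buildFencesCheap": 8,
    "cattle": 9, "dwelling": 10, "sow": 11, "breedTwoTypes": 12,
    "furnishCavernAgain": 14,
}


def available_loot(strength: int) -> list[str]:
    """Items a weapon of the given strength is allowed to pick."""
    return [item for item, needed in LOOT_MIN_STRENGTH.items() if needed <= strength]


for strength in (1, 8, 14):
    print(strength, len(available_loot(strength)))  # 3, 13, 18

# Items that cannot be reached by forging alone (forging tops out at 8).
premium = [item for item, needed in LOOT_MIN_STRENGTH.items() if needed > 8]
print(premium)
```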
The Universal Food Crisis
Every strategy in Caverna must solve the same problem before anything else. Both players face a food deficit at the first harvest:
Player 1 starts with 1 food and needs 4 (2 dwarfs times 2 food each). Player 2 starts with 2 food and needs 4. The gaps are 3 and 2 respectively. This is proven as
`universal_food_crisis`:

    theorem universal_food_crisis :
        feedingCost 2 0 - startingFoodP1 = 3 /\
        feedingCost 2 0 - startingFoodP2 = 2 := by
      decide

The implication (
`food_crisis_shapes_all_strategies`) is that every viable archetype must spend its first few actions on food acquisition. There's no "skip feeding and go straight to scoring" option. The begging marker penalty is \(-3\) points each, and the theorem `absolute_floor_is_neg55` shows that a player who takes zero actions across all rounds scores \(-55\). The food crisis isn't optional; it's structural.

The food conversion network itself is a delightful mess of exchange rates. Cattle gives 4 food per animal, wild boar gives 3, sheep gives 2, grain gives 1, vegetables give 2, gold converts lossily at \(n-1\) (1 gold is wasted as overhead), and rubies are emergency food at 2+ each. And then there are donkeys, which have a superlinear pairing bonus that I spent an embarrassing amount of time formalizing:
    def donkeyFoodValue (n : Nat) : Nat :=
      let pairs := n / 2
      let remainder := n % 2
      pairs * 3 + remainder * 1

    theorem donkey_superlinear :
        donkeyFoodValue 2 > donkeyFoodValue 1 + donkeyFoodValue 1 := by
      decide

Two donkeys together yield 3 food, but individually they'd give 1 + 1 = 2. The whole is greater than the sum of its parts. Uwe Rosenberg almost certainly didn't think about this as a super-additivity property of a set function, but that's what it is, and Lean can prove it.
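For readers who want to poke at the numbers, here is a small Python transcription of the first-harvest gap and the donkey conversion. The function names mirror the Lean ones, but the Python versions are illustrative and carry none of the formal guarantees.

```python
STARTING_FOOD_P1 = 1
STARTING_FOOD_P2 = 2


def feeding_cost(dwarfs: int, offspring: int) -> int:
    """Each dwarf eats 2 food, each offspring eats 1."""
    return dwarfs * 2 + offspring * 1


def donkey_food_value(n: int) -> int:
    """Donkeys convert in pairs: 3 food per pair, 1 food for a leftover donkey."""
    pairs, remainder = divmod(n, 2)
    return pairs * 3 + remainder * 1


gap_p1 = feeding_cost(2, 0) - STARTING_FOOD_P1
gap_p2 = feeding_cost(2, 0) - STARTING_FOOD_P2
print(gap_p1, gap_p2)  # 3 2

# Super-additivity: converting two donkeys together beats converting them one at a time.
print(donkey_food_value(2), donkey_food_value(1) + donkey_food_value(1))  # 3 2
```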
The Feeding Cascade
The feeding function is the core survival mechanic. When harvest hits, each dwarf eats 2 food (offspring eat 1). If you don't have enough food, the deficit cascades through your resources: try food first, then convert grain (1 food each), then convert vegetables (2 food each), then take begging markers for anything remaining.
    def FullPlayer.feed (p : FullPlayer) : FullPlayer :=
      let cost := p.dwarfs * 2 + p.offspring * 1
      if p.food >= cost then
        { p with food := p.food - cost }
      else
        let deficit := cost - p.food
        let p' := { p with food := 0 }
        let grainUsed := min p'.grain deficit
        let p'' := { p' with grain := p'.grain - grainUsed }
        let deficit' := deficit - grainUsed
        let vegUsed := min p''.vegetables (deficit' / 2 + deficit' % 2)
        let vegFood := min (vegUsed * 2) deficit'
        let p''' := { p'' with vegetables := p''.vegetables - vegUsed }
        let deficit'' := deficit' - vegFood
        { p''' with beggingMarkers := p'''.beggingMarkers + deficit'' }

And a normal harvest is the composition of three phases in one elegant pipeline:
    def FullPlayer.normalHarvest (p : FullPlayer) : FullPlayer :=
      p.fieldPhase.feed.breedingPhase

Field phase harvests your sown crops. Feed pays the food cost (or generates begging markers). Breeding adds one animal per type that has at least two. The order matters: fields produce food before feeding, and breeding happens after, so newborn animals don't need to be fed in the same round they appear. This ordering rule from page 6 of the Caverna rulebook is encoded in the function composition.
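A worked run makes the cascade easier to follow. The sketch below is a line-by-line Python port of the feeding step only, using a cut-down, hypothetical `Player` record with just the fields the cascade touches; the field and breeding phases are left out.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Player:
    dwarfs: int = 2
    offspring: int = 0
    food: int = 0
    grain: int = 0
    vegetables: int = 0
    begging_markers: int = 0


def feed(p: Player) -> Player:
    """Pay the feeding cost: food first, then grain, then vegetables, then beg."""
    cost = p.dwarfs * 2 + p.offspring * 1
    if p.food >= cost:
        return replace(p, food=p.food - cost)

    deficit = cost - p.food
    grain_used = min(p.grain, deficit)          # 1 food per grain
    deficit -= grain_used
    # Each vegetable covers 2 food; round up so an odd deficit still gets covered.
    veg_used = min(p.vegetables, deficit // 2 + deficit % 2)
    veg_food = min(veg_used * 2, deficit)
    deficit -= veg_food
    return replace(
        p,
        food=0,
        grain=p.grain - grain_used,
        vegetables=p.vegetables - veg_used,
        begging_markers=p.begging_markers + deficit,
    )


# 2 dwarfs need 4 food: 1 food + 1 grain + 1 vegetable (worth 2) exactly covers it.
print(feed(Player(food=1, grain=1, vegetables=1)))
# With only the starting food of player 1, the 3-food gap becomes 3 begging markers.
print(feed(Player(food=1)).begging_markers)  # 3
```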
Furnishing Tiles and the BonusContext
There are 48 furnishing tiles in the game, each with resource costs, base victory points, and (for some) end-game bonus scoring formulas that depend on your final board state. The
`BonusContext` struct captures everything a furnishing tile might look at when computing its bonus:

    structure BonusContext where
      stoneInSupply : Nat := 0
      oreInSupply : Nat := 0
      sheepCount : Nat := 0
      cattleCount : Nat := 0
      numAdjacentDwellings : Nat := 0
      numArmedDwarfs : Nat := 0
      numDwarfs : Nat := 0
      rubyCount : Nat := 0
      grainCount : Nat := 0
      vegCount : Nat := 0
      farmAnimalCount : Nat := 0
      hasAnyWeapon : Bool := false
      allDwarfsArmed : Bool := false
      numYellowTagTiles : Nat := 0

The bonus point function is a match on tile ID that implements every scoring formula from the Caverna appendix:
    def furnishingBonusPoints (fid : FurnishingId) (ctx : BonusContext) : Nat :=
      match fid with
      | .stoneStorage    => ctx.stoneInSupply
      | .oreStorage      => ctx.oreInSupply / 2
      | .weavingParlor   => ctx.sheepCount / 2
      | .milkingParlor   => ctx.cattleCount
      | .stateParlor     => ctx.numAdjacentDwellings * 4
      | .mainStorage     => ctx.numYellowTagTiles * 2
      | .weaponStorage   => ctx.numArmedDwarfs * 3
      | .suppliesStorage => if ctx.allDwarfsArmed then 8 else 0
      | .broomChamber    => if ctx.numDwarfs >= 6 then 10
                            else if ctx.numDwarfs >= 5 then 5 else 0
      | .prayerChamber   => if ctx.hasAnyWeapon then 0 else 8
      | _                => 0

The Prayer Chamber is particularly nasty. It gives you 8 free points, but only if none of your dwarfs have weapons. The moment you forge a single weapon, it drops to zero. This creates a genuine strategic dilemma, formalized and proved:
    theorem prayer_chamber_vs_weapons :
        furnishingBonusPoints .prayerChamber {} = 8 /\
        furnishingBonusPoints .prayerChamber { hasAnyWeapon := true } = 0 := by
      constructor <;> native_decide

The furnishing rush archetype gets its power from stacking compatible bonus tiles: Office Room overhangs, State Parlor for +4 per adjacent dwelling (up to +16), Broom Chamber for +5 or +10 based on dwarf count, Prayer Chamber for +8 (no weapons). These bonuses compound. A player with 5 dwarfs, 4 adjacent dwellings, and no weapons can pull +29 bonus points from three tiles (16 from State Parlor, 5 from Broom Chamber, 8 from Prayer Chamber). That's why furnishing rush has the highest ceiling.
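The stacking arithmetic is easy to check by hand. The Python sketch below re-implements just the three formulas involved, as given in `furnishingBonusPoints`; the function names are hypothetical. For the example board (5 dwarfs, 4 adjacent dwellings, no weapons) the three tiles sum to 16 + 5 + 8 = 29, and a sixth dwarf would lift that to 34.

```python
def state_parlor(adjacent_dwellings: int) -> int:
    return adjacent_dwellings * 4


def broom_chamber(dwarfs: int) -> int:
    if dwarfs >= 6:
        return 10
    return 5 if dwarfs >= 5 else 0


def prayer_chamber(has_any_weapon: bool) -> int:
    return 0 if has_any_weapon else 8


def stacked_bonus(dwarfs: int, adjacent_dwellings: int, has_any_weapon: bool) -> int:
    """Total bonus from State Parlor + Broom Chamber + Prayer Chamber."""
    return (state_parlor(adjacent_dwellings)
            + broom_chamber(dwarfs)
            + prayer_chamber(has_any_weapon))


print(stacked_bonus(dwarfs=5, adjacent_dwellings=4, has_any_weapon=False))  # 29
print(stacked_bonus(dwarfs=6, adjacent_dwellings=4, has_any_weapon=False))  # 34
print(stacked_bonus(dwarfs=5, adjacent_dwellings=4, has_any_weapon=True))   # 21
```

The last line shows the Prayer Chamber dilemma in numbers: forging a single weapon costs this board 8 points.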
The Strategy Space
With the LTS, scoring function, and food economy in place, the next question is: what are the high-level plans a player can actually follow? The transition relation defines roughly \(10^{30}\) possible play sequences, but the overwhelming majority of them are nonsense. Consider a player who forges a weapon in round 1 (spending precious ore), then never goes on an expedition, sows a field in round 3 but never harvests grain from it, builds a pasture in round 5 but never acquires an animal, and spends round 8 taking Starting Player for no reason. This player ends the game with a weapon they never used, an empty pasture, a field of rotting grain, and a final score somewhere around 20 points after begging penalties. The LTS covers this path. The invariant proofs hold over it. It is a legal play sequence, and it is also exactly the kind of play sequence that a five-year-old would produce by picking action spaces at random.
The vast majority of the state space looks like this: diffuse, uncommitted paths where the player dabbles in everything and commits to nothing, hemorrhaging tempo and food while accumulating resources that never convert to points. Only a handful of coherent "channels" through the action space graph lead to competitive scores against a rational opponent. A strategy archetype is a consistent pattern of action space usage across the full 12-round game. I identified eight by examining which action spaces and furnishing tiles naturally cluster together: if two actions compete for the same resources or unlock the same scoring categories, they belong to the same archetype. For example, Blacksmithing, Expedition, and weapon-dependent loot all form the weapon rush cluster, while Excavation, Housework, and bonus-scoring furnishings form the furnishing rush cluster. The classification was manual, but the Lean proofs validate it: every archetype's score estimates are derived from the formalized game rules, and the dominance relations are machine-checked over the resulting payoff matrix.
    inductive StrategyArchetype where
      | furnishingRush
      | weaponRush
      | animalHusbandry
      | miningHeavy
      | balanced
      | peacefulFarming
      | rubyEconomy
      | peacefulCaveEngine

But why eight? The answer comes from the structure of the game's scoring channels. A scoring channel is a resource-to-points pathway through the action spaces: the furnishing channel runs through Excavation and Housework into furnished caverns, the weapon channel runs through Blacksmithing into expeditions and loot, and so on. There are exactly six:
    inductive ScoringChannel where
      | furnishing        -- Excavation + Housework -> caverns -> furnishing tiles
      | weaponExpedition  -- Blacksmithing + Adventure -> weapons -> loot
      | agriculture       -- Clearing + Sustenance -> fields -> crops
      | animalBreeding    -- Sheep Farming + Donkey Farming -> pastures -> animals
      | mining            -- Ore/Ruby Mine Construction -> mines
      | economy           -- Starting Player + Ore Trading + Ruby Mining -> gold/rubies

Each archetype commits to a primary channel, and the eight archetypes cover all six channels (
`all_channels_covered`). Some channels conflict over shared resources: weapon/expedition and mining both need ore, furnishing and mining both need stone. The `channelsConflict` function encodes these tensions. With 47 productive actions per game and each channel needing at least 6 dedicated actions to reach competitive scoring, a player can invest in at most 2 channels seriously (`budget_bounds_channels`). The archetypes are the maximal compatible channel combinations: 6 single-primary archetypes plus 2 hybrids (balanced spreads across 3 channels, peaceful cave engine uses weapons for food rather than scoring).

The exhaustivity proof (
`archetype_channel_surjection`) shows every channel has at least one archetype as its primary. If you're going to score points, you have to go through a channel, and every channel is represented. No viable strategy falls outside the classification.

Each archetype has an estimated scoring ceiling (best case across all setups) and floor (worst case):
Furnishing rush tops the ceiling at 140 and ties for the highest floor at 60. Weapon rush matches the floor but caps at 120. The others trail off. The interval chart makes the dominance relationships immediately visible: furnishing rush's bar extends further right (higher ceiling) than any other archetype, and its left endpoint (floor) is as good as anyone's.
The Dominance Hierarchy
Before building the payoff matrix, we can establish partial dominance from the score estimates alone. If strategy \(s\) has a ceiling and a floor that are both at least as high as those of strategy \(t\) (with at least one strict inequality), then \(s\) dominates \(t\) in score estimates.
Solid green arrows show dominance via score estimates (higher ceiling and floor). Dashed green arrows indicate dominance established only through the full payoff matrix. Animal Husbandry and Mining Heavy are incomparable in score estimates (one has a higher ceiling, the other a higher floor), but both are weakly dominated by Furnishing Rush in the payoff matrix.
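As a sketch of how this partial order behaves, here is the score-estimate dominance test in Python. The furnishing rush and weapon rush intervals (floor 60, ceilings 140 and 120) come from the text; the animal husbandry and mining heavy numbers are made-up placeholders, chosen only so that one has the higher ceiling and the other the higher floor, as the text describes. They are not the article's values.

```python
# (floor, ceiling) score estimates per archetype.
ESTIMATES = {
    "furnishingRush": (60, 140),   # from the text
    "weaponRush": (60, 120),       # from the text
    "animalHusbandry": (40, 110),  # HYPOTHETICAL placeholder values
    "miningHeavy": (45, 100),      # HYPOTHETICAL placeholder values
}


def dominates(s: str, t: str) -> bool:
    """s dominates t if it is at least as good on both ends and strictly better on one."""
    floor_s, ceiling_s = ESTIMATES[s]
    floor_t, ceiling_t = ESTIMATES[t]
    at_least_as_good = floor_s >= floor_t and ceiling_s >= ceiling_t
    strictly_better = floor_s > floor_t or ceiling_s > ceiling_t
    return at_least_as_good and strictly_better


print(dominates("furnishingRush", "weaponRush"))   # True: equal floor, higher ceiling
print(dominates("animalHusbandry", "miningHeavy")) # False: lower floor
print(dominates("miningHeavy", "animalHusbandry")) # False: lower ceiling
```

Because neither of the last two calls returns `True`, the two archetypes are incomparable under this relation, which is why the full payoff matrix is needed to rank them.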
The incomparability is proven constructively:
    theorem incomparable_strategies_exist :
        not (dominates .miningHeavy .animalHusbandry) /\
        not (dominates .animalHusbandry .miningHeavy) := by
      decide

The Payoff Matrix
The next question is what happens when two players with potentially different archetypes collide over the shared action spaces. You model this as an \(8 \times 8\) payoff matrix \(M\) where \(M_{ij}\) is the estimated score for the row player when row plays archetype \(i\) and column plays archetype \(j\).
The green border marks the Furnishing Rush row (weakly dominant). Blue borders highlight the diagonal (mirror matchups, always depressed). Cell color intensity reflects payoff magnitude: light green (55) through coral (135).
In Lean, the matrix is a function from
`Fin 8 -> Fin 8 -> Int` with all 64 entries hardcoded from the strategy analysis:

    def payoffMatrix : Fin 8 -> Fin 8 -> Int
      | 0, 0 => 85  | 0, 1 => 130 | 0, 2 => 135 | 0, 3 => 130
      | 0, 4 => 125 | 0, 5 => 135 | 0, 6 => 135 | 0, 7 => 130
      | 1, 0 => 80  | 1, 1 => 75  | 1, 2 => 100 | 1, 3 => 85
      | 1, 4 => 95  | 1, 5 => 105 | 1, 6 => 100 | 1, 7 => 85
      -- ... 48 more entries ...

A word on what "estimated" means here. The matrix entries are derived from the formalized game rules (scoring function, food costs, action budgets), not from Monte Carlo simulation or guesswork. But they are estimates in the sense that I haven't exhaustively solved every possible 47-move sequence within each archetype. How sensitive is the result? The
`min_dominance_margin_is_5` theorem proves that the minimum gap between the furnishing rush row and the next-best entry in each column is exactly 5 points (occurring in the mirror matchup column). Perturbing any single entry by less than 5 points cannot flip the dominance relation. Even a 10-point swing in a single cell would only affect one column, not the global result. The Nash equilibrium is more sensitive (it depends on the diagonal), but the mirror matchup at 85 would need to drop below the next-best response value of 80 before the equilibrium shifts. The downstream proofs are machine-checked over the matrix as stated, so the formal guarantees are exact conditional on these entries.

We can do better than conditional. Replace each scalar entry with a closed interval bounding the true payoff. Mirror matchups get tight bounds (the point estimate \(\pm 2\), since both players execute the same plan and contention is symmetric). Cross-archetype matchups get wider bounds (\(\pm 5\), reflecting uncertainty in cross-strategy interaction). The interval payoff matrix looks like this:
structure PayoffInterval where lo : Int hi : Int valid : lo <= hi def intervalPayoff : Fin 8 -> Fin 8 -> PayoffInterval | 0, 0 => { lo := 83, hi := 87, valid := by omega } -- mirror: eps=2 | 0, 1 => { lo := 125, hi := 135, valid := by omega } -- vs weapon: eps=5 -- ... 62 more entries ...Robust weak dominance asks: for every column and every non-furnishing alternative, does the furnishing rush lower bound meet or exceed the alternative's upper bound? If yes, then furnishing rush dominates for ALL true payoff matrices within these intervals, not just the point estimates.
theorem robust_weak_dominance : forall (col : Fin 8) (alt : Fin 8), alt != 0 -> (intervalPayoff 0 col).lo >= (intervalPayoff alt col).hi := by decideIt passes. The tightest cell is column 0 (mirror matchup): furnishing rush \([83, 87]\) vs. weapon rush \([77, 83]\). The margin is \(83 - 83 = 0\), a weak tie at the boundary. All other columns have margins of 20 or more (
non__mirror__columns__robust). If the error bounds were widened by just 1 point, column 0 would fail (fragility__column__0), which tells us exactly where the result is most fragile and what the tolerance is.The interval analysis also gives us robust welfare bounds. Nash welfare lies in \([166, 174]\) and the social optimum in \([200, 220]\). Even in the best case for selfish play and worst case for cooperation, the social optimum still exceeds Nash welfare (
robust__price__of__anarchy). The prisoner's dilemma structure is not an artifact of the point estimates.The first row is at least as large as every other row in every column. That's it. That's the whole result. In game theory this property is called weak dominance: a strategy \(\sigma^*\) is weakly dominant if for all opponent strategies \(o\) and all alternative strategies \(\sigma\),
$$
M(\sigma^*, o) \geq M(\sigma, o)
$$
The Lean proof is `furnishing_rush_weakly_dominant`, and it says exactly this:

    theorem furnishing_rush_weakly_dominant :
        forall (row : Fin 8) (col : Fin 8),
          payoffMatrix 0 col >= payoffMatrix row col := by
      decide

For all opponents, for all alternatives, the furnishing rush payoff is greater than or equal. The proof is `decide`: Lean's kernel checks all 64 cells. Nobody has to trust my arithmetic.
The Contention Effect
The diagonal of the payoff matrix is always depressed relative to the off-diagonal entries:

    theorem diagonal_always_depressed :
        forall (i : Fin 8), exists (j : Fin 8),
          j ≠ i /\ payoffMatrix i j > payoffMatrix i i := by
      decide

Every mirror matchup scores below at least one non-mirror matchup in the same row. This is the contention effect: when both players pursue the same archetype, they fight over the same action spaces. Furnishing rush mirrors compete for excavation and housework. Weapon rush mirrors fight over blacksmithing. Mining mirrors clash on ore mine construction.
The game punishes sameness. It's just that the punishment for sameness (85) is still less painful than the punishment for picking something worse (60 to 80 against an opponent who picked furnishing rush).
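Both checks can be tried on a scaled-down matrix without the full 64-entry table. The sketch below keeps only the four entries quoted earlier for furnishing rush (index 0) and weapon rush (index 1). The names `ToyDominance.payoff`, `toy_weakly_dominant`, and `toy_diagonal_depressed` are made up for illustration, and the block assumes the same Lean 4 plus Mathlib setup the post describes.

```lean
import Mathlib

namespace ToyDominance

-- 2x2 restriction of the payoff matrix: 0 = furnishing rush, 1 = weapon rush.
-- Only the four entries quoted in the post are used.
def payoff : Fin 2 -> Fin 2 -> Int
  | 0, 0 => 85
  | 0, 1 => 130
  | 1, 0 => 80
  | 1, 1 => 75

-- Row 0 is at least as large as every row in every column.
theorem toy_weakly_dominant :
    forall (row : Fin 2) (col : Fin 2), payoff 0 col >= payoff row col := by
  decide

-- Each diagonal entry is beaten by some off-diagonal entry in its own row.
theorem toy_diagonal_depressed :
    forall (i : Fin 2), exists (j : Fin 2), j ≠ i /\ payoff i j > payoff i i := by
  decide

end ToyDominance
```

In this restriction the mirror cells are 85 and 75, each below the off-diagonal cell in its own row (130 and 80), so the contention effect is already visible with two archetypes.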
Nash Equilibrium and the Prisoner's Dilemma
From weak dominance, the game theory falls out like dominoes. The best response function \(\text{BR} : \text{Strategies} \to \text{Strategies}\) maps each opponent strategy to the row-maximizer in the corresponding column. Since furnishing rush achieves the column maximum everywhere (strictly, given the 5-point minimum margin), BR is the constant function:
$$
\text{BR}(x) = \text{FurnishingRush} \quad \forall x
$$
A Nash equilibrium is a fixed point of the joint best-response correspondence: a pair \((a, b)\) where \(a = \text{BR}(b)\) and \(b = \text{BR}(a)\). Since BR is constant, the only fixed point is (FurnishingRush, FurnishingRush). Existence and uniqueness in three lines:

    theorem exactly_one_nash_equilibrium :
        (exists a b, isNashEquilibrium a b) /\
        (forall a b, isNashEquilibrium a b ->
          a = .furnishingRush /\ b = .furnishingRush) :=
      ⟨⟨.furnishingRush, .furnishingRush, furnishing_mirror_is_nash⟩,
       furnishing_mirror_unique_nash⟩

The uniqueness proof works by case-splitting on all 64 strategy pairs and observing that only one satisfies the Nash condition:

    theorem furnishing_mirror_unique_nash :
        forall a b : StrategyArchetype,
          isNashEquilibrium a b ->
            a = .furnishingRush /\ b = .furnishingRush := by
      intro a b h
      have h1 := h.1; have h2 := h.2
      cases a <;> cases b <;> simp_all [bestResponse, isNashEquilibrium]

There is exactly one pure Nash equilibrium and it is the one where both players do the same thing. Which is a little tragic.
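The existence-and-uniqueness argument can be replayed on a two-archetype toy game. This is a sketch under stated assumptions rather than the post's own development: it inlines the best-response condition instead of using `bestResponse` and `isNashEquilibrium` (whose definitions are not shown), uses the four matrix entries quoted earlier for furnishing rush (0) and weapon rush (1), and assumes Lean 4 with Mathlib. All names in the block are hypothetical.

```lean
import Mathlib

namespace ToyNash

-- 2x2 restriction of the post's matrix: 0 = furnishing rush, 1 = weapon rush.
-- `payoff a b` is the score of a player choosing `a` against an opponent choosing `b`.
def payoff : Fin 2 -> Fin 2 -> Int
  | 0, 0 => 85
  | 0, 1 => 130
  | 1, 0 => 80
  | 1, 1 => 75

-- Existence: against strategy 0, no row beats strategy 0.
-- The game is symmetric, so the same fact covers both players at (0, 0).
theorem toy_mirror_is_nash :
    forall a' : Fin 2, payoff 0 0 >= payoff a' 0 := by
  decide

-- Uniqueness: if each player is best-responding to the other, the pair is (0, 0).
theorem toy_unique_nash :
    forall a b : Fin 2,
      (forall a' : Fin 2, payoff a b >= payoff a' b) ->
      (forall b' : Fin 2, payoff b a >= payoff b' a) ->
      a = 0 /\ b = 0 := by
  decide

end ToyNash
```

With only four strategy pairs, `decide` can enumerate them directly; the post's `cases a <;> cases b <;> simp_all` does the equivalent case split over the named archetypes.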
The Price of Anarchy
Both players score 85 in the Nash equilibrium. If they could somehow coordinate on different strategies (say furnishing rush versus animal husbandry), the combined welfare would be \(135 + 75 = 210\) instead of \(85 + 85 = 170\).
The price of anarchy is the ratio of the social optimum to the Nash welfare:
$$
\text{PoA} = \frac{210}{170} = \frac{21}{17} \approx 1.24
$$
Selfish play costs about 19% of the social optimum. The Lean proof is `decide`, because it's just arithmetic:

    theorem price_of_anarchy_ratio :
        socialOptimumValue = 210 /\ nashWelfare = 170 /\ 210 * 17 = 170 * 21 := by
      decide

The depressing implication is that both players would prefer the other person to play something different, but neither can unilaterally deviate without making themselves worse off. You're stuck at 85 each, staring across the table at someone who also read this blog post.
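The 19% figure is the welfare lost at equilibrium as a share of the social optimum, which is a different normalization from the 1.24 ratio. Spelled out with the same two numbers:

$$
1 - \frac{170}{210} = \frac{40}{210} = \frac{4}{21} \approx 0.19
$$

Read the other way, the cooperative outcome is worth about 24% more than the equilibrium, since \(21/17 - 1 = 4/17 \approx 0.235\).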
The Scoring Function
A natural question is where the scores can actually land. The end-game scoring function totals up every positive and negative contribution:

    def FullPlayer.score (p : FullPlayer) : Int :=
      let animals := (p.dogs + p.sheep + p.donkeys + p.wildBoars + p.cattle : Int)
      let grainPts := (((p.grain + p.fieldsWithGrain * 3) + 1) / 2 : Int)
      let vegPts := ((p.vegetables + p.fieldsWithVeg * 2) : Int)
      let rubyPts := (p.rubies : Int)
      let dwarfPts := ((p.dwarfs + p.offspring) : Int)
      let pasturePts := (p.smallPastures * 2 + p.largePastures * 4 : Int)
      let minePts := (p.oreMines * 3 + p.rubyMines * 4 : Int)
      let goldPts := (p.gold : Int)
      let furnPts := p.furnishings.foldl
        (fun acc fid => acc + ((furnishingSpec fid).basePoints : Int)) (0 : Int)
      let missingTypes :=
        (if p.sheep == 0 then 1 else 0) + (if p.donkeys == 0 then 1 else 0) +
        (if p.wildBoars == 0 then 1 else 0) + (if p.cattle == 0 then 1 else 0)
      let unusedPenalty := (p.unusedMountain + p.unusedForest : Int)
      let beggingPenalty := (p.beggingMarkers * 3 : Int)
      let missingPenalty := (missingTypes * 2 : Int)
      animals + grainPts + vegPts + rubyPts + dwarfPts + pasturePts + minePts
        + goldPts + furnPts - unusedPenalty - beggingPenalty - missingPenalty

The theoretical floor is \(-55\) points: a player who takes zero actions across all rounds gets +2 for their two starting dwarfs, \(-8\) for missing all four farm animal types, \(-22\) for 22 unused board spaces, and \(-27\) from 9 begging markers at \(-3\) points each. The theoretical ceiling is 202, computed by summing the independent maxima of every scoring category. But you can't reach 202, because every scoring category competes for the same 56 dwarf placements over 12 rounds. You cannot simultaneously furnish 12 caverns, build 5 mines, breed 20 animals, and sow 10 fields. The action budget is the binding constraint. The practical ceiling is around 140, achievable by furnishing rush when uncontested.
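The \(-55\) floor can be reproduced with a cut-down scorer. The sketch below is not the post's `FullPlayer`: `MiniPlayer`, its fields, and `idle` are hypothetical names, and only the terms that are nonzero for a do-nothing player are kept (dwarfs, animals, unused spaces, begging markers, missing animal types).

```lean
namespace ScoreFloor

-- Hypothetical trimmed-down player: only the fields that matter for the floor.
structure MiniPlayer where
  dwarfs : Nat
  sheep : Nat
  donkeys : Nat
  wildBoars : Nat
  cattle : Nat
  unusedSpaces : Nat
  beggingMarkers : Nat

def MiniPlayer.score (p : MiniPlayer) : Int :=
  let animals := (p.sheep + p.donkeys + p.wildBoars + p.cattle : Int)
  let dwarfPts := (p.dwarfs : Int)
  let missingTypes : Int :=
    (if p.sheep == 0 then 1 else 0) + (if p.donkeys == 0 then 1 else 0) +
    (if p.wildBoars == 0 then 1 else 0) + (if p.cattle == 0 then 1 else 0)
  let unusedPenalty := (p.unusedSpaces : Int)
  let beggingPenalty := (p.beggingMarkers * 3 : Int)
  let missingPenalty := missingTypes * 2
  animals + dwarfPts - unusedPenalty - beggingPenalty - missingPenalty

-- A player who never acts: 2 dwarfs, no animals, 22 unused spaces, 9 begging markers.
def idle : MiniPlayer :=
  { dwarfs := 2, sheep := 0, donkeys := 0, wildBoars := 0, cattle := 0,
    unusedSpaces := 22, beggingMarkers := 9 }

-- 2 - 22 - 27 - 8 = -55
example : idle.score = -55 := by decide

end ScoreFloor
```

Changing a single field of `idle` (say, one sheep) and re-running the `example` is a quick way to see how each penalty contributes to the floor.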
The full range is 195 points (from the theoretical floor of \(-55\) to the practical ceiling of 140), but the \(-55\) floor requires literally doing nothing for 12 rounds, so it's not strategically meaningful. The practical floor across all archetypes is around 45 (peaceful farming in the worst setup), giving a practical range of about 95 points. The dominant strategy's own range is 80 points (60 to 140), proven as `dominant_strategy_variance`. The safety margin (`dominant_strategy_safety_margin`) is 115 points above the theoretical floor, but the more relevant number is the gap between furnishing rush's worst case (60) and the worst viable archetype's worst case (45): even in the most adversarial setup, choosing correctly gains you at least 15 points.
Degenerate Combos
The formalization surfaced several combos that look broken in isolation. Four stand out.
The Beer Parlor converts grain to gold at a 3:2 ratio before scoring: every 2 grain becomes 3 gold. Without it, 20 grain scores 10 points (the grain formula is \(\lceil n/2 \rceil\)). With it, the same 20 grain becomes 30 gold. That's a 3x multiplier on the base grain value, and the gold stacks with every other scoring category. The `beer_parlor_max_gold` theorem proves `beerParlorGold 20 = 30`. The catch is accumulating 20 grain: you need aggressive sowing across multiple rounds, which means burning actions on Slash-and-Burn and Sustenance instead of excavating caverns.
Dogs have no cap on sheep-watching: \(n\) dogs on one meadow guard \(n+1\) sheep. Stack 10 dogs and a Weaving Parlor and you're looking at 51 points from a single tile combo. The catch is that acquiring 10 dogs requires roughly 6 expedition actions (dogs are strength-1 loot, so they're easy to get but each expedition burns an action), and those 6 actions aren't building caverns or growing your family.
The Writing Chamber prevents up to 7 points of losses for 2 stone. This sounds like a license to ignore entire scoring categories, and it is, except the categories you'd skip (unused spaces, missing animal types) are exactly the ones that furnishing rush covers naturally. The players who benefit most from Writing Chamber are the ones playing badly enough to accumulate large penalties, and at that point the cap at 7 means it barely dents a truly neglected board.
The Prayer Chamber is the most elegant. It gives 8 points for zero resource investment, but the bonus evaporates the moment any dwarf forges a weapon. The `prayer_chamber_vs_weapons` theorem proves this is a hard binary: 8 or 0, nothing in between. It creates a genuine strategic fork (peaceful play vs. expedition play) that the rest of the game's design carefully preserves.
None of these breaks the dominant strategy. Every combo faces the same binding constraint: 56 total actions with full family growth. You can produce flashy numbers in one scoring category, but you can't cover all of them. The vanilla furnishing rush, with its balanced approach to board coverage, family growth, and furnishing synergies, remains the ceiling. The combos are features, not bugs: they're what make the game worth replaying even after you know the dominant strategy, because they give you something to pivot into when the standard lines are blocked.
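Two of the combos above reduce to arithmetic that is easy to check. The sketch below uses hypothetical stand-ins, since the post's own `beerParlorGold` and Prayer Chamber definitions are not shown: `beerParlorGoldSketch` assumes each full pair of grain becomes 3 gold and ignores an odd leftover, and `prayerChamberBonus` models the bonus as a function of a single "weapon forged" flag.

```lean
namespace ComboSketch

-- Grain scoring as in the post's formula: ceil(n / 2).
def grainPoints (grain : Nat) : Nat := (grain + 1) / 2

-- Assumed conversion rule: every full pair of grain becomes 3 gold.
-- (How the real `beerParlorGold` treats an odd leftover grain is not shown.)
def beerParlorGoldSketch (grain : Nat) : Nat := grain / 2 * 3

example : grainPoints 20 = 10 := by decide
example : beerParlorGoldSketch 20 = 30 := by decide
-- 30 gold is three times the 10 points the grain would score on its own.
example : beerParlorGoldSketch 20 = 3 * grainPoints 20 := by decide

-- Prayer Chamber: 8 points unless a weapon has been forged.
def prayerChamberBonus (anyWeaponForged : Bool) : Nat :=
  if anyWeaponForged then 0 else 8

-- The bonus is a hard binary: 8 or 0, nothing in between.
example : forall b : Bool, prayerChamberBonus b = 8 \/ prayerChamberBonus b = 0 := by
  intro b
  cases b <;> decide

end ComboSketch
```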
Is the Game Solved?
A serious mathematician would call this "a formal analysis of a model, with the caveat that the model's parameters are estimated rather than computed." That is a legitimate and common pattern in applied math, operations research, and mechanism design. The proofs are machine-checked and airtight, but they are conditional on the payoff matrix. The theorems say: IF the payoff matrix is this, THEN furnishing rush is weakly dominant, the Nash equilibrium is unique, and the price of anarchy is 21/17. They do not say: the true expected score of an optimally-played furnishing rush against an optimally-played animal husbandry opponent is exactly 135. That number is an estimate derived from the formalized action economy, not from exhaustive search of the game tree.
How confident should you be in the estimate? The `min_dominance_margin_is_5` theorem proves the narrowest gap between furnishing rush and its closest competitor is 5 points. If the estimation methodology has a systematic bias of less than 5 points (plausible, given that the estimates are derived from the same scoring functions and action budgets used in actual play), the qualitative result holds. A correlated shift of 5+ points across an entire row of the matrix could break weak dominance in the mirror column, but would require the methodology to consistently overestimate furnishing rush's performance in self-contention scenarios. Given that the mirror penalty (85 vs. 130+ off-diagonal) already models severe contention, this seems unlikely.
At the archetype level, yes, the game is solved. For any 2-player setup, regardless of which of the 2,880 configurations you draw, the model says the optimal pure strategy is furnishing rush. The proof does not depend on any specific card ordering or harvest marker placement. What I haven't done is compute the exact sequence of dwarf placements for each specific setup. That would require solving a game tree with roughly \(10^{30}\) nodes. But the archetype-level result is the one that matters for actual play. You know the plan. The within-archetype decisions (which cavern to excavate first, which furnishing tile to prioritize) are tactical, not strategic. They don't change the answer.
The 3+ player variants remain open. The payoff matrix becomes a rank-3 tensor with 512 entries, action space contention shifts from binary to combinatorial, and the pure dominance result almost certainly fails, since furnishing tile scarcity under three-way competition makes mixed equilibria the likely outcome. If someone wants to formalize that, I would love to read the proof.
Play the Dominant Strategy
Play furnishing rush. Excavate aggressively in rounds 1 through 4. Get Office Room early for the overhang bonuses. Get State Parlor for +4 per dwelling. Grow to 5 dwarfs and pick up Broom Chamber for +10 bonus. If your opponent does something different, you score 125 to 135 and they score 60 to 105. If your opponent also plays furnishing rush, you both land at 85, which is worse than the cooperative optimum of 210 but better than any unilateral deviation. You'll sit there at the table, both of you excavating caverns as fast as you can, both knowing that one of you could sacrifice 10 points to give the other 50, but neither willing to be the one who blinks.
Prisoner's dilemma in a cave. Uwe Rosenberg probably didn't intend this, but the formal analysis says it's there.
The code is built on Lean 4 v4.28.0 with Mathlib, modeling all 24 action spaces, all 48 furnishing tiles, the complete expedition loot table, the board grids, and the full 12-round 2-player harvest schedule. I think Caverna is a beautifully designed game, which is exactly why it was satisfying to find that even under formal analysis the strategic structure holds together so well. A badly designed game would have a trivially dominant strategy that makes the game boring. Caverna's dominant strategy comes with a 19% welfare tax on the mirror matchup, which means the game is always more interesting when your opponent does something unexpected (the Nash equilibrium is rarely where the fun is, in games or in life), which means in practice people don't always play the dominant strategy, which means the game stays fun.
Good game design, it turns out, is robust to being solved. Now you know the optimal play. Whether you follow it is, thankfully, not a theorem.
-
- February 20, 2026
-
🔗 IDA Plugin Updates IDA Plugin Updates on 2026-02-20 rss
-
🔗 anthropics/claude-code v2.1.50 release
What's changed
- Added support for `startupTimeout` configuration for LSP servers
- Added `WorktreeCreate` and `WorktreeRemove` hook events, enabling custom VCS setup and teardown when agent worktree isolation creates or removes worktrees.
- Fixed a bug where resumed sessions could be invisible when the working directory involved symlinks, because the session storage path was resolved at different times during startup. Also fixed session data loss on SSH disconnect by flushing session data before hooks and analytics in the graceful shutdown sequence.
- Linux: Fixed native modules not loading on systems with glibc older than 2.30 (e.g., RHEL 8)
- Fixed memory leak in agent teams where completed teammate tasks were never garbage collected from session state
- Fixed `CLAUDE_CODE_SIMPLE` to fully strip down skills, session memory, custom agents, and CLAUDE.md token counting
- Fixed `/mcp reconnect` freezing the CLI when given a server name that doesn't exist
- Fixed memory leak where completed task state objects were never removed from AppState
- Added support for `isolation: worktree` in agent definitions, allowing agents to declaratively run in isolated git worktrees.
- `CLAUDE_CODE_SIMPLE` mode now also disables MCP tools, attachments, hooks, and CLAUDE.md file loading for a fully minimal experience.
- Fixed bug where MCP tools were not discovered when tool search is enabled and a prompt is passed in as a launch argument
- Improved memory usage during long sessions by clearing internal caches after compaction
- Added `claude agents` CLI command to list all configured agents
- Improved memory usage during long sessions by clearing large tool results after they have been processed
- Fixed a memory leak where LSP diagnostic data was never cleaned up after delivery, causing unbounded memory growth in long sessions
- Fixed a memory leak where completed task output was not freed from memory, reducing memory usage in long sessions with many tasks
- Improved startup performance for headless mode (`-p` flag) by deferring Yoga WASM and UI component imports
- Fixed prompt suggestion cache regression that reduced cache hit rates
- Fixed unbounded memory growth in long sessions by capping file history snapshots
- Added `CLAUDE_CODE_DISABLE_1M_CONTEXT` environment variable to disable 1M context window support
- Opus 4.6 (fast mode) now includes the full 1M context window
- VSCode: Added `/extra-usage` command support in VS Code sessions
- Fixed memory leak where TaskOutput retained recent lines after cleanup
- Fixed memory leak in CircularBuffer where cleared items were retained in the backing array
- Fixed memory leak in shell command execution where ChildProcess and AbortController references were retained after cleanup
- Added support for
-
🔗 Simon Willison Adding TILs, releases, museums, tools and research to my blog rss
I've been wanting to add indications of my various other online activities to my blog for a while now. I just turned on a new feature I'm calling "beats" (after story beats, naming this was hard!) which adds five new types of content to my site, all corresponding to activity elsewhere.
Here's what beats look like:
![Screenshot of a fragment of a page showing three entries from 30th Dec 2025. First: [RELEASE] "datasette-turnstile 0.1a0 — Configurable CAPTCHAs for Datasette paths usin…" at 7:23 pm. Second: [TOOL] "Software Heritage Repository Retriever — Download archived Git repositories f…" at 11:41 pm. Third: [TIL] "Downloading archived Git repositories from archive.softwareheritage.org — …" at 11:43 pm.](https://static.simonwillison.net/static/2026/three-beats.jpg)
Those three are from the 30th December 2025 archive page.
Beats are little inline links with badges that fit into different content timeline views around my site, including the homepage, search and archive pages.
There are currently five types of beats:
- Releases are GitHub releases of my many different open source projects, imported from this JSON file that was constructed by GitHub Actions.
- TILs are the posts from my TIL blog, imported using a SQL query over JSON and HTTP against the Datasette instance powering that site.
- Museums are new posts on my niche-museums.com blog, imported from this custom JSON feed.
- Tools are HTML and JavaScript tools I've vibe-coded on my tools.simonwillison.net site, as described in Useful patterns for building HTML tools.
- Research is for AI-generated research projects, hosted in my simonw/research repo and described in Code research projects with async coding agents like Claude Code and Codex.
That's five different custom integrations to pull in all of that data. The good news is that this kind of integration project is the kind of thing that coding agents really excel at. I knocked most of the feature out in a single morning while working in parallel on various other things.
I didn't have a useful structured feed of my Research projects, and it didn't matter because I gave Claude Code a link to the raw Markdown README that lists them all and it spun up a parser regex. Since I'm responsible for both the source and the destination I'm fine with a brittle solution that would be too risky against a source that I don't control myself.
Claude also handled all of the potentially tedious UI integration work with my site, making sure the new content worked on all of my different page types and was handled correctly by my faceted search engine.
Prototyping with Claude Artifacts
I actually prototyped the initial concept for beats in regular Claude - not Claude Code - taking advantage of the fact that it can clone public repos from GitHub these days. I started with:

> Clone simonw/simonwillisonblog and tell me about the models and views

And then later in the brainstorming session said:

> use the templates and CSS in this repo to create a new artifact with all HTML and CSS inline that shows me my homepage with some of those inline content types mixed in

After some iteration we got to this artifact mockup, which was enough to convince me that the concept had legs and was worth handing over to full Claude Code for web to implement.
If you want to see how the rest of the build played out the most interesting PRs are Beats #592 which implemented the core feature and Add Museums Beat importer #595 which added the Museums content type.
-
🔗 r/Yorkshire Where's the best place to live outside of Leeds? rss
Hey r/Yorkshire, I've very recently split from my girlfriend, I'm a guy in my late 20s and currently living in York. I'm looking to move out of York as, let's be real, how tf am I supposed to live in York on my own at that kind of price.
I work hybrid in Leeds and am looking to find somewhere new no more than about an hour's drive away. Thing is, I've only just moved to York in the winter, haven't really planted my roots, so would like to find somewhere that has a little going on so I can meet people. In my down time I love a lot of hiking, pubs, live music, I'm not expecting to find somewhere with all that, but somewhere that is relatively close to that kind of vibe, bit of an arty young vibe if possible.
I'm looking at Skipton, Halifax, Keighley, and Harrogate at the moment, but lets rarely come up for under 700, but I'm open to other areas as I'd like to get to know the place. Just looking for recommendations really, or thoughts on the above? I've heard mixed things on Keighley, and worried Skipton might be a bit of a retirement village (no idea if that's true). Love to hear your thoughts. Thanks
submitted by /u/tmatthews98
[link] [comments] -
🔗 r/Leeds Comedy clubs? rss
Does anyone know where there are good stand up comedy acts in Leeds? Looking for venues that could be fun to visit
submitted by /u/C0sm1c_3l3ph4nt
[link] [comments] -
🔗 r/york Yorkshire Ghostbusters in this sub - Are any of you folks interested in a quick meet for a patch swap? I'll be in York visiting family next Tuesday - Thursday. The Chesapeake Ghostbusters logo ties our areas together through the use of Calvert and Crossland heraldry on the Maryland flag. rss
submitted by /u/CaptainAwwsum
[link] [comments] -
🔗 @binaryninja@infosec.exchange Want to get the best decompilation for your own architecture? Glenn's post mastodon
Want to get the best decompilation for your own architecture? Glenn's post shows how you can enable Binary Ninja to give you the best decompilation around. Part 1 covers the basics up to disassembly, assembly, and control-flow. Next week, part 2 covers lifting and our first decompilation results.
-
🔗 @binaryninja@infosec.exchange Going live now! Don't miss some fantastic demos by our RE//verse instructors! mastodon
Going live now! Don't miss some fantastic demos by our RE//verse instructors! https://www.youtube.com/vector35/live
-
🔗 News Minimalist 🐢 Supreme Court blocks Trump's tariffs + 11 more stories rss
In the last 2 days ChatGPT read 63550 top news stories. After removing previously covered events, there are 12 articles with a significance score over 5.5.

[5.8] Supreme Court limits presidential authority on global tariffs — ctvnews.ca (+47)
The U.S. Supreme Court on Friday struck down President Trump’s global tariffs, ruling that his unilateral use of emergency economic powers exceeded executive authority and violated constitutional limits.
The 6-3 ruling determined that the 1977 International Emergency Economic Powers Act does not authorize unilateral tariffs. The conservative majority applied the "major questions" doctrine, asserting that significant economic actions require explicit congressional approval rather than broad executive interpretations of emergency statutes.
Challenged by twelve states and various businesses, the ruling could force the refund of $175 billion. Although the administration explores alternative legal routes, the decision curtails a central pillar of Trump's agenda.
[5.7] FDA to require one study for new drug approvals to speed patient access — abcnews.com (+14)
The FDA will drop its longtime requirement for two clinical trials for new drug approvals, making a single study the default standard to accelerate patient access to novel medical treatments.
Commissioner Marty Makary and Deputy Vinay Prasad stated that modern research precision makes the old standard obsolete. The shift aims to reduce bureaucracy, potentially sparking a surge in drug development for common diseases that previously required more extensive testing than rare conditions.
Although the agency has accepted single studies for rare diseases since the 1990s, this policy expansion contrasts with recent restrictive decisions regarding vaccines and gene therapies, leaving some industry stakeholders seeking clarity.
Highly covered news with significance over 5.5
[6.5] Nasal spray vaccine protects mice from multiple respiratory pathogens — nature.com (+11)
[6.4] Feline and human cancers share striking genetic similarities, offering new research avenues — news.cornell.edu (+16)
[6.0] Invisible galaxy confirmed, composed almost entirely of dark matter — es.wired.com (Spanish) (+5)
[6.0] Google and Apple integrate generative AI music features into consumer apps — financialpost.com (+23)
[6.0] UN experts find genocide hallmarks in Sudan's el-Fasher campaign — apnews.com (+21)
[5.9] PromptSpy Android malware uses generative AI to adapt its persistence — bleepingcomputer.com (+8)
[5.6] Trump order protects glyphosate production and manufacturer immunity — theguardian.com (+8)
[5.6] Afghanistan's hunger crisis worsens as aid cuts force WFP to turn away children — abcnews.com (+3)
[5.5] White House allows ICE to detain refugees for rescreening — theguardian.com (+13)
[5.5] Iran nuclear talks collapse, pushing oil prices toward $100 as U.S. deploys military assets to the Middle East — economictimes.indiatimes.com (+65)
Thanks for reading!
— Vadim
-
🔗 r/wiesbaden Having a nice sweat in the sauna rss
submitted by /u/Z_smasher_Z
[link] [comments] -
🔗 r/wiesbaden My English Bulldog urgently needs soft palate surgery, organized by Mario Beierstettel rss
submitted by /u/Slight_Today7434
[link] [comments] -
🔗 r/york The ubiquitous tourist rss
In the '70s, a friend of mine's gaff was a second floor flat rented in Stonegate (for £11 a week - we think it was part of the Minster's estate). Anyway, the point is that, even back then, during the holidays and in summer, we would have fun leaning out of the window to spot the odd local trying to worm their way through the dawdling crowds.
submitted by /u/flame2spear
[link] [comments] -
🔗 r/wiesbaden https://gofund.me/a4849d36a rss
submitted by /u/Slight_Today7434
[link] [comments] -
🔗 r/Yorkshire Yorkshire Dales in January 2027 rss
I visited the Dales last summer, finally, and fell absolutely in love with the people, nature, stone barns, endless greenery, etc. I plan to visit this summer again to do some hiking, but also planning a January 2027 trip. I really want to see the area with possible snow, frost, and with luck, the snowdrops starting to bloom! I usually take the train from Leeds to Skipton, then the bus to Grassington where I stay for a few days. For winter 2027, I wanted to either stay in Appletreewick, or Burnsall, for a few days. I mostly plan to write, read, and if the weather permits, take a few walks. Would these places be nice for winter, or would you recommend slightly larger places, like Settle? Also, is snow fairly common in the area in winter? Thanks!! Photo credit: Getty
submitted by /u/skepticalbureaucrat
[link] [comments] -
🔗 r/LocalLLaMA The top 3 models on openrouter this week ( Chinese models are dominating!) rss
the first time i see a model exceed 3 trillion tokens per week on openrouter! the first time i see more than one model exceed a trillion tokens per week (it was only Grok 4 Fast a month ago) the first time i see chinese models destroying US ones like this
submitted by /u/keb_37
[link] [comments] -
🔗 r/reverseengineering Reverse engineering the Creative Katana V2X soundbar to be able to control it from Linux rss
submitted by /u/crower
[link] [comments] -
🔗 r/Leeds Where do people play Warhammer? rss
The actual Warhammer shop aren't interested if you're not buying a starter set, patriot games isn't great and there's nothing else near the city centre I'm aware of. So where do people play Warhammer? Are there any clubs out there?
submitted by /u/sendmemespls97
[link] [comments] -
🔗 r/wiesbaden Good orthopedists rss
Who is a good orthopedic doctor in Wiesbaden where you aren't treated as if on an assembly line? Recommendations for patients with statutory health insurance, please 😊
submitted by /u/kassen_patient
[link] [comments] -
🔗 r/Yorkshire I went for a walk. rss
Too many stunning photos to choose from.
submitted by /u/Background_Pay_3113
[link] [comments] -
🔗 r/LocalLLaMA GGML.AI has got acquired by Huggingface rss
submitted by /u/Time_Reaper
[link] [comments] -
🔗 r/Yorkshire Just love this place❤️ rss
submitted by /u/Terrible_Passion6178
[link] [comments] -
🔗 r/LocalLLaMA Deepseek and Gemma ?? rss
submitted by /u/ZeusZCC
[link] [comments] -
🔗 r/Leeds Deliveroo drivers rss
I am absolutely sick of being nearly run over by uber eats/Deliveroo bikers on electric bikes with no helmets or lights going all over pedestrian routes. I’m pretty sure people have died being hit by these?? Is there anything we can do to bring this matter to the council?
submitted by /u/Weary-Bus8436
[link] [comments] -
🔗 r/york Is there anyone in York that sells Kinnie? rss
It's a Maltese drink that's quite hard to find in the UK. I've previously bought a 24 pack from Amazon, but that's out of my budget right now, and I'm heading to York next week so 🤞🤞😂 (buying as part of a bday gift for my partner)
submitted by /u/LPanda2190
[link] [comments] -
🔗 r/reverseengineering Decrypt and extract voice guidance MP3 prompts from Sony WH-1000XM4 encrypted voice packs. AES key extracted via Bluetooth firmware dump of the Airoha MT2811 SoC. rss
submitted by /u/TheHelgeSverre
[link] [comments] -
🔗 r/Harrogate Photo booths in Harrogate - digital passport photos rss
Are there any photo booths in town that do the digital photos for passports (providing the code for online passport applications)? Thanks
submitted by /u/purte
[link] [comments] -
🔗 r/LocalLLaMA Kimi has context window expansion ambitions rss
submitted by /u/omarous
[link] [comments] -
🔗 r/york The Shambles doesn’t feel like The Shambles anymore rss
Walked through on a Saturday and it genuinely felt like I was in a slow-moving queue for a theme park ride.
I get that it’s popular and good for the city but it used to feel charming and slightly chaotic in a nice way. Now it feels like everyone’s there for the same photo.
Is it just nostalgia talking, or has the vibe shifted a bit over the last few years?
submitted by /u/1ChanceChipmunk1
[link] [comments] -
🔗 vercel-labs/agent-browser v0.13.0 release
Minor Changes
ebd8717: Added new diff commands for comparing snapshots, screenshots, and URLs between page states. You can now run visual pixel diffs against baseline images, compare accessibility tree snapshots with customizable depth and selectors, and diff two URLs side-by-side with optional screenshot comparison.
-
🔗 r/Yorkshire Who here believes that the fish should have at least come with a signed apology for that price tag? rss
submitted by /u/PubLogic
[link] [comments] -
🔗 r/Yorkshire Activities to take my gf to for her birthday? (South Yorkshire) rss
It’s my girlfriend’s birthday in a couple of weeks and I want to surprise her with something to do. She likes creative activities like pottery and painting. She’s also into gothic stuff like tarot, spells, ghosts and taxidermy etc. She doesn’t drink so anything pub/bar focused is out of the question but if there’s somewhere I could get a beer nearby, I wouldn’t mind that haha.
Any suggestions would be greatly appreciated 😁
submitted by /u/Both-Opportunity7602
[link] [comments]
-