🏡


  1. Optimize for momentum
  2. Nviso vshell report
  3. jj auto-track none
  4. Transparent Leadership Beats Servant Leadership
  5. Writing a good CLAUDE.md | HumanLayer Blog

  1. December 19, 2025
    1. 🔗 Locklin on science Don’t go to physics grad school and other cautionary tales rss

        Contra Professor Katz, I have known more people ruined by drugs. Mostly because I didn’t spend my life as a physics professor as he did. But I can see why he said it, because you’d see a lot of ruined lives in gradual school. It is an essay that should occur to people in […]

    2. 🔗 facebookresearch/faiss v1.13.2 release

      [1.13.2] - 2025-12-19

      Added

      • 033e6ac Add RaBitQStats for tracking two-stage search filtering effectiveness (#4723)
      • 64a2367 Add multi-bit support for IndexIVFRaBitQFastScan (#4722)
      • cd2af8b Implement IndexRefinePanorama (#4683)
      • 18d20fe Add multi-bit support for IndexRaBitQFastScan (#4721)
      • 7372bc7 Reapply IndexHNSWFlatPanorama with backward compatible serialization (#4692)
      • 1721ebf Also add backwards compatible check for binary (#4714)
      • 98bf8b3 Implement remaining IndexFlatPanorama functions (#4694)
      • a695814 Enable Intel ScalableVectorSearch support (#4548)
      • d81a08e Index serialization backward compatibility test (#4706)

      Changed

      Fixed

      • 5b19fca Allow over-writing centroid files to mitigate [S603653] (#4725)
      • 337dfe8 Fix typos in demos, benchs, and other directories (#4719)
      • aea2b6b fix(docs): broken link to SVS in INSTALL.md (#4724)
      • 4627695 Fix SVS Python tutorial (#4720)
      • ac2e3ab Update c_api install docs for CMake build system (#4702)
      • abc2944 fix broken test due to renaming to avoid lint (#4712)
      • 3d4d59f Fix typos in demos, benchs, and other directories (#4709)
    3. 🔗 anthropics/claude-code v2.0.74 release

      What's changed

      • Added LSP (Language Server Protocol) tool for code intelligence features like go-to-definition, find references, and hover documentation
      • Added /terminal-setup support for Kitty, Alacritty, Zed, and Warp terminals
      • Added ctrl+t shortcut in /theme to toggle syntax highlighting on/off
      • Added syntax highlighting info to theme picker
      • Added guidance for macOS users when Alt shortcuts fail due to terminal configuration
      • Fixed skill allowed-tools not being applied to tools invoked by the skill
      • Fixed Opus 4.5 tip incorrectly showing when user was already using Opus
      • Fixed a potential crash when syntax highlighting isn't initialized correctly
      • Fixed visual bug in /plugins discover where list selection indicator showed while search box was focused
      • Fixed macOS keyboard shortcuts to display 'opt' instead of 'alt'
      • Improved /context command visualization with grouped skills and agents by source, slash commands, and sorted token count
      • [Windows] Fixed issue with improper rendering
      • [VSCode] Added gift tag pictogram for year-end promotion message
    4. 🔗 oxigraph/oxigraph v0.5.3-post.1 release

      Retrigger the publishing CI after fixing an auth error

    5. 🔗 r/wiesbaden MTG Commander group rss

      Hi, looking for people to play Commander with in Wiesbaden.

      submitted by /u/WretchedIEgg

    6. 🔗 Bits About Money The gift card accountability sink rss

      The gift card accountability sink

      Programming note: Merry Christmas! There will likely be another Bits about Money after the holiday but before New Year.

      Bits about Money is supported by our readers. If your education budget or business can underwrite the coming year of public goods in financial-infrastructure education, commentary, and policy analysis, please consider supporting it. I'm told this is particularly helpful for policymakers and others who cannot easily expense a subscription, and who benefit from all issues remaining publicly available with no paywall.

      The American Association of Retired Persons (AARP, an advocacy non-profit for older adults) has paid for ads on podcasts I listen to. The ad made a claim which felt raspberry-worthy (in service of an important public service announcement), which they repeat in writing: Asking to be paid by gift card is always a scam.

      Of course it isn't. Gift cards are a payments rail, and an enormous business independently of being a payments rail. Hundreds of firms will indeed ask you to pay them on gift cards! They also exist, and are marketed, explicitly to do the thing that the AARP implicitly asserts no business or government entity will ever do: provide a method for transacting for people who do not have a banked method of transacting. [0]

      Gift card scams are also enormous. The FBI's Internet Crime Complaint Center received reports of $16.6 billion in losses in 2024 across several payment methods; this is just from those consumers who bothered reporting it, in spite of the extremely real received wisdom that reporting is unlikely to improve one's direct situation.

      The flavor texts of scams vary wildly, but in substance they'll attempt to convince someone, often someone socially vulnerable, to part with sometimes very large sums of money by buying gift cards and conveying card information (card number and PIN number, both printed on the card) to the scammer. The scammer will then use the fraud supply chain, generally to swap the value on the card to another actor in return for value unconnected to the card. This can be delivered in many ways: cash, crypto, products and services in the scamming economy (such as purloined credit cards or even "lead lists" of vulnerable people to run more scams on), or laundered funds within regulated financial institutions which obscure the link between the crime and the funds (layering, in the parlance of AML professionals). A huge portion of running a gift card marketplace is trying to prevent yourself from being exploited or made into an instrumentality in exploiting others.

      It surprises many people to learn that the United States aggressively defends customers from fraud over some payment methods, via a liability transfer to their financial institution, which transfers it to intermediaries, who largely transfer it to payment-accepting businesses. Many people think the U.S. can't make large, effective, pro-consumer regulatory regimes. They are straightforwardly wrong… some of the time.

      But the AARP, the FBI, and your friendly local payments nerd will all tell you that if you're abused on your debit card you are quite likely to be made whole, and if you're abused via purchasing gift cards, it is unlikely any deep pockets will cover for you. The difference in treatment is partly regulatory carveouts, partly organized political pressure, and partly a side effect of an accountability sink specific to the industrial organization of gift cards.

      Most businesses do not run their own gift card programs

      There exists an ecosystem of gift card program managers, who are essentially financial services businesses with a sideline in software. (I should probably mention that I previously worked for and am currently an advisor to Stripe, whose self conception would not be precisely that, but which a) supports many ways for people to pay money for things and b) does not necessarily endorse what I say in my personal spaces.)

      Why does the program manager exist? Why not simply have the retailer keep some internal database of who the retailer owes money to, updating this when someone buys or loads a gift card and when they spend the balance at the store? Because this implies many capabilities that retailers do not necessarily have, such as software development teams.

      There is also a large regulatory component to running a gift card program, despite gift cards' relatively lax regulatory drag (we'll return to that in a moment). Card programs are regulated at both the federal and state levels. One frequent requirement in several states is escheatment. (Essentially all states have a requirement for escheatment; many but not all exempt gift cards from it.)

      As discussed previously in Bits about Money, a major component of the gift card business model is abandonment ("breakage"). Consumer advocates felt this was unfair to consumers, bordering on fraudulent really. They convinced states to take the money that retailers were keeping for themselves. (Many states didn't take all that much convincing.)

      In theory, and sometimes even in practice, a consumer can convince a state treasurer's office of unclaimed property (e.g. Illinois') that the $24.37 that Target remitted as part of its quarterly escheatment payment for an unused gift card 13 years ago was actually theirs. A consumer who succeeds at this, which is neither easy nor particularly inexpensive to do, will receive a $24.37 check in the mail. The state keeps the interest income; call it a fee for service. It also keeps the interest income of the tens of billions of dollars of accumulated unclaimed property, which it generally promises to dutifully custody awaiting a legitimate claim for as long as the United States shall exist.

      And so if you are a regional or national retailer who wants to offer gift cards, you have a choice. You can dedicate a team of internal lawyers and operations specialists to understanding both what the laws of the several states require with respect to gift cards, which are a tiny portion of your total operations, not merely today but as a result of the next legislative session in Honolulu, because you absolutely must order the software written to calculate the payment to remit accurately several quarters in advance of the legal requirement becoming effective. Or you can make the much more common choice, and outsource this to a specialist.

      That specialist, the gift card program manager, will sell you a Solution™ which integrates across all the surfaces you need: your point-of-sale systems, your website, your accounting software, the 1-800 number and website for customers to check balances, ongoing escheatment calculation and remittance, cash flow management, carefully titrated amounts of attention to other legal obligations like AML compliance, etc. Two representative examples: Blackhawk Network and InComm Payments. You've likely never heard of them, even if you have their product on your person right now. Their real customer has the title Director of Payments at e.g. a Fortune 500 company.

      And here begins the accountability sink: by standard practice and contract, when an unsophisticated customer is abused by being asked to buy a BigCo gift card, BigCo will say, truthfully and unhelpfully, that BigCo does not issue BigCo gift cards. It sells them. It accepts them. But it does not issue them. Your princess is in another castle.

      BigCo may very well have a large, well-staffed fraud department. But, not due to any sort of malfeasance whatsoever, that fraud department may consider BigCo gift cards entirely out of their own scope. They physically cannot access the database with the cards. Their security teams, sensitive that gift card numbers are dangerous to keep lying around, very likely made it impossible for anyone at BigCo to reconstruct what happened to a particular gift card between checkout and most recent use. "Your privacy is important to us!" they will say, and they are not cynically invoking it in this case.

      Gift cards are not regulated like other electronic payments instruments

      As mentioned above, Regulation E is the primary driver for the private enforcement edifice that makes scarily smart professionals (and their attached balance sheets) swing into action on behalf of consumers. Reg E has a carveout for certain prepaid payments. Per most recent guidance, that includes prepaid gift cards, gift certificates, and similar.

      And so, if you call your bank and say, "I was defrauded! Someone called me and pretended to be the IRS, and I read them my debit card number, and now I've lost money," the state machine obligates the financial institution to have the customer service representative click a very prominent button on their interface. This will restore your funds very quickly and have some side effects you probably care about much less keenly. One of those is an "investigation," which is not really an investigation in the commanding majority of cases.

      And if you call the program manager and say, "I was defrauded! Someone called me and pretended to be the IRS, and I read them a gift card number, and now I've lost money," there is… no state machine. There is no legal requirement to respond with alacrity, no statutorily imposed deadline, no button for a CS rep to push, and no investigation to launch. You will likely be told by a low-paid employee that this is unfortunate and that you should file a police report. The dominant reason for this is that suggesting a concrete action to you gets you off the phone faster, and the call center aggressively minimizes time to resolution of calls and recidivism, where you call back because your problem is not solved. Filing a police report will, in most cases, not restore your money--but if it causes you not to call the 1-800 number again, then from the card program manager's perspective this issue has been closed successfully.

      Why do we choose this difference in regulation?

      The people of the United States, through their elected representatives and the civil servants who labor on their behalf, intentionally exempt gift cards from the Reg E regime in the interest of facilitating commerce.

      It is the ordinary and appropriate work of a democracy to include input from citizens in the rulemaking process. The Retail Industry Leaders Association participated, explaining to FinCEN that it would be quite burdensome for retailers to fall into KYC scope, etc etc. Many other lobbyists and industry associations made directionally similar comments.

      The Financial Crimes Enforcement Network, for example, has an explicit carveout in its regulations: while FinCEN will aggressively police rogue bodegas, it has no interest in you if you sell closed-loop gift cards of less than $2,000 face value. This is explicitly to balance the state's interest in law enforcement against, quote, preserving innovation and the many legitimate uses and societal benefits offered by prepaid access, endquote.

      FinCEN's rules clarify that higher-value activity--such as selling more than $10,000 in gift cards to a single individual in a day--brings sellers back into scope. Given the relatively lax enforcement environment for selling a $500 gift card, you very likely will not build out systems that successfully track customer identities and determine that the same customer has purchased twenty-one $500 gift cards in three transactions. That likely doesn't rate as a hugely important priority for Q3.

      And so the fraud supply chain comes to learn which firms haven't done that investment, and preferentially suggests those gift cards to their launderers, mules, brick movers, and scam victims.

      And that's why the AARP tells fibs about gift cards: we have, with largely positive intentions and for good reasons, exposed them to less regulation than most formal payment systems in the United States received. That decision has a cost. Grandma sometimes pays it.

      [0] Indeed, there are entire companies which exist to turn gift cards into an alternate financial services platform, explicitly to give unbanked and underbanked customers a payments rail. Paysafe, for example, is a publicly traded company with thousands of employees, the constellation of regulatory supervision you'd expect, and a subsidiary Openbucks which is designed to give businesses the ability to embed Pay Us With A Cash Voucher in their websites/invoices/telephone collection workflows. This is exactly the behavior that "never happens from a legitimate business" except when it does by the tens of billions of dollars.

      As Bits about Money has frequently observed, people who write professionally about money--including professional advocates for financially vulnerable populations--often misunderstand alternative financial services, largely because those services are designed to serve a social class that professionals themselves do not belong to, rarely interact with directly, and do not habitually ask how they pay rent, utilities, or phone bills.

    7. 🔗 oxigraph/oxigraph v0.5.3 release

      Three SPARQL changes:

      • Support for the VERSION declaration.
      • Fix for parsing of HAVING when there are multiple conditions.
      • Ordering values for ORDER BY are now computed only once (allowing ORDER BY RAND() to work properly).
    8. 🔗 libtero/idaguides IDA Guides v1.0.0 release

      No content.

    9. 🔗 r/wiesbaden Specialist lawyer for tenancy law rss

      Hello.

      Does anyone know a good (ruthless) lawyer for tenancy law?

      submitted by /u/Best_Ad3170

    10. 🔗 @cxiao@infosec.exchange RE: mastodon

      RE: https://infosec.exchange/@decoderloop/115746825926307965

      I'm happy to announce that I'll be teaching 2 Rust reverse engineering trainings in 2026!

      1) Deconstructing Rust Binaries at @ringzer0 COUNTERMEASURE, March 23-26 2026, 16 hours, Remote: https://ringzer0.training/countermeasure-spring-2026-deconstructing-rust-binaries/

      2) Deconstructing Rust Binaries at @NorthSec, May 11-13 2026, 24 hours, Onsite in Montréal, Canada and Remote: https://nsec.io/training/2026-deconstructing-rust-binaries/

      No previous experience with reversing Rust binaries, or writing Rust code, is required, and we'll be using Binary Ninja in the course! (A Binary Ninja student license is provided!)

    11. 🔗 r/reverseengineering #ScanOfTheYear2025 Week 1 – Reverse Engineering: Submit Your Project for a $300 Gift Card! rss
    12. 🔗 anthropics/claude-code v2.0.73 release

      What's changed

      • Added clickable [Image #N] links that open attached images in the default viewer
      • Added alt-y yank-pop to cycle through kill ring history after ctrl-y yank
      • Added search filtering to the plugin discover screen (type to filter by name, description, or marketplace)
      • Added support for custom session IDs when forking sessions with --session-id combined with --resume or --continue and --fork-session
      • Fixed slow input history cycling and race condition that could overwrite text after message submission
      • Improved /theme command to open theme picker directly
      • Improved theme picker UI
      • Improved search UX across resume session, permissions, and plugins screens with a unified SearchBox component
      • [VSCode] Added tab icon badges showing pending permissions (blue) and unread completions (orange)
    13. 🔗 @cxiao@infosec.exchange RE: mastodon

      RE: https://mastodon.social/@KristopherWells/115743294984047151

      really great read on the background of skate canada's decision

    14. 🔗 Rust Blog What do people love about Rust? rss

      Rust has been named Stack Overflow's Most Loved (now called Most Admired) language every year since our 1.0 release in 2015. That means people who use Rust want to keep using Rust [1]--and not just for performance-heavy stuff or embedded development, but for shell scripts, web apps, and all kinds of things you wouldn't expect. One of our participants captured it well when they said, "At this point, I don't want to write code in any other language but Rust."

      When we sat down to crunch the vision doc data, one of the things we really wanted to explain was: What is it that inspires that strong loyalty to Rust? [2] Based on the interviews, the answer is at once simple and complicated. The short version is that Rust empowers its users to write reliable and efficient software. If that sounds familiar, it should: it's the slogan that we have right there on our web page. The more interesting question is how that empowerment comes about, and what it implies for how we evolve Rust.

      What do people appreciate about Rust?

      The first thing we noticed is that, throughout every conversation, no matter whether someone is writing their first Rust program or has been using it for years, no matter whether they're building massive data clusters or embedded devices or just messing around, there is a consistent set of things that they say they like about Rust.

      The first is reliability. People love that "if it compiles, it works" feeling:

      "What I really love about Rust is that if it compiles it usually runs. That is fantastic, and that is something that I'm not used to in Java." -- Senior software engineer working in automotive embedded systems

      "Rust is one of those languages that has just got your back. You will have a lot more sleep and you actually have to be less clever." -- Rust consultant and open source framework developer

      Another, of course, is efficiency. This comes up in particular at the extremes, both very large scale (data centers) and very small scale (embedded):

      "I want to keep the machine resources there for the [main] computation. Not stealing resources for a watchdog." -- Software engineer working on data science platforms

      "You also get a speed benefit from using Rust. For example, [..] just the fact that we changed from this Python component to a Rust component gave us a 100fold speed increase." -- Rust developer at a medical device startup

      Efficiency comes up particularly often when talking to customers running "at-scale" workloads, where even small performance wins can translate into big cost savings:

      "We have a library -- effectively it's like an embedded database -- that we deploy on lots of machines. It was written in Java and we recently rewrote it from Java to Rust and we got close to I think 9x to 10x performance wins." -- Distinguished engineer working on cloud infrastructure services

      "I'm seeing 4x efficiency in the same module between Java code that loads a VM and Rust. That's a lot of money you save in data center cost." -- Backend engineering company founder specializing in financial services

      At the other end of the spectrum, people doing embedded development or working at low levels of abstraction highlight Rust's ability to give low-level control and access to system details:

      "Rust was that replacement for C I'd been looking for forever." -- Backend engineering company founder specializing in financial services

      "If you're going to write something new and you do kind of low-level systemsy stuff, I think Rust is honestly the only real choice." -- Distinguished engineer

      Many people cite the importance of Rust's supportive tooling, which helps them get up and going quickly, and in particular the compiler's error messages:

      "I think a big part of why I was able to succeed at learning Rust is the tooling. For me, getting started with Rust, the language was challenging, but the tooling was incredibly easy." -- Executive at a developer tools company

      "The tooling really works for me and works for us. The number one way that I think I engage with Rust is through its tooling ecosystem. I build my code through Cargo. I test it through Cargo. We rely on Clippy for everything." -- Embedded systems engineer working on safety-critical robotics

      "I think the error messages and suggestions from the Rust compiler are super helpful also." -- Professor specializing in formal verification

      Finally, one of Rust's most important virtues is its extensibility. Both in the language itself and through the crates.io ecosystem, Rust is designed to let end-users create libraries and abstractions that meet their needs:

      "The crate ecosystem combined with the stability guarantees and the semantic versioning mean that it's the best grab and go ecosystem I've ever seen." -- Computer science professor and programming language designer

      "I think proc macros are a really big superpower for Rust." -- Creator and maintainer of Rust networking libraries

      "Rust is incredibly good at making it very very easy to get started, to reuse things, just to experiment quickly with new tools, new libraries, all the rest of it... so for me, as an experimentation platform, it's great." -- Rust expert and consultant focused on embedded and real-time systems

      But what they love is the sense of empowerment and versatility

      Reliability, efficiency, tooling, ecosystem—these are all things that people appreciate about Rust. But what they love isn't any one of those things. It's the way the combination makes Rust a trusted, versatile tool that you can bring to virtually any problem:

      "When I got to know about it, I was like 'yeah this is the language I've been looking for'. This is the language that will just make me stop thinking about using C and Python. So I just have to use Rust because then I can go as low as possible as high as possible." -- Software engineer and community organizer in Africa

      "I wanted a language that works well from top to bottom in a stacking all the way from embedded to very fancy applications" -- Computer science professor and programming language designer

      "If [Rust] is going to try and sort of sell itself more in any particular way, I would probably be saying high performance, highly expressive, general purpose language, with the great aspect that you can write everything from the top to the bottom of your stack in it." -- Rust expert and consultant focused on embedded and real-time systems

      Each piece is necessary for the whole to work

      Take away the reliability, and you don't trust it: you're second-guessing every deployment, afraid to refactor, hesitant to let junior developers touch the critical paths.

      "Rust just lowers that bar. It's a lot easier to write correct Rust code. As a leader on the team, I feel a lot safer when we have less experienced engineers contributing to these critical applications." -- Distinguished engineer working on cloud infrastructure services

      "My experience with writing Rust software tends to be once you've got it working, it stays working. That's a combination of a lot of care taken in terms of backwards compatibility with the language and a lot of care taken around the general ecosystem." -- Rust expert and consultant focused on embedded and real-time systems

      Reliability also provides guardrails that help people enter new domains—whether you're a beginner learning the ropes or an expert venturing into unfamiliar territory:

      "Rust introduces you to all these things, like match and all these really nice functional programming methods." -- Software engineer with production Rust experience

      "I think Rust ownership discipline is useful both for regular Rust programmers and also for verification. I think it allows you to within the scope of your function to know very clearly what you're modifying, what's not being modified, what's aliased and what's not aliased." -- Professor specializing in formal verification

      "I discovered Rust... and was basically using it just to give myself a little bit more confidence being like a solo firmware developer" -- Software engineer working on automotive digital cockpit systems

      Take away the efficiency and low-level control, and there are places you can't go: embedded systems, real-time applications, anywhere that cost-per-cycle matters.

      "The performance in Rust is nutty. It is so much better and it's safe. When we rewrote C++ and C libraries or C applications into Rust, they would end up being faster because Rust was better at laying out memory." -- Senior Principal Engineer leading consumer shopping experiences

      "9 times out of 10, I write microcontroller code and I only test it through unit testing. I put it on real hardware and it just works the first time." -- Embedded systems engineer working on safety-critical robotics

      "I can confidently build systems that scale." -- Engineering manager with 20 years experience in media and streaming platforms

      Take away the tooling and ecosystem, and you can't get started: or you can, but it's a slog, and you never feel productive.

      "For me, getting started with Rust, the language was challenging, but the tooling was incredibly easy... I could just start writing code and it would build and run, and that to me made a huge difference." -- Founder and CEO of company creating developer tools

      "Cargo is an amazing package manager. It is probably the best one I've ever worked with. I don't think I ever run into issues with Cargo. It just works." -- Software engineer with production Rust experience

      "The Rust compiler is fantastic at kind of the errors it gives you. It's tremendously helpful in the type of errors it produces for it. But not just errors, but the fact it also catches the errors that other languages may not catch." -- Distinguished engineer working on cloud infrastructure services

      The result: Rust as a gateway into new domains

      When all these pieces come together, something interesting happens: Rust becomes a gateway into domains that would otherwise be inaccessible. We heard story after story of people whose careers changed because Rust gave them confidence to tackle things they couldn't before:

      "I was civil engineering and I studied front-end development on my own, self taught. I had no computer background. I got interested in Rust and distributed systems and designs and systems around it. I changed my major, I studied CS and Rust at the same time." -- Software engineer transitioning to cryptography research

      "I've been working with arbitrary subsidiaries of [a multinational engineering and technology company] for the last 25 years. Always doing software development mostly in the Java space... two years ago I started peeking into the automotive sector. In that context it was a natural consequence to either start working with C++ (which I did not want to do) or take the opportunity to dive into the newly established Rust ecosystem." -- Senior software engineer working in automotive embedded systems

      "I started in blockchain. Currently I'm doing something else at my day job. Rust actually gave me the way to get into that domain." -- Rust developer and aerospace community leader

      "Before that, I had 10 years of programming on some dynamic programming languages, especially Ruby, to develop web applications. I wanted to choose some language which focuses on system programming, so I chose Rust as my new choice. It is a change of my career." -- Rust consultant and author working in automotive systems and blockchain infrastructure

      But the balance is crucial

      Each of Rust's attributes is necessary for versatility across domains. But when taken too far, or when other attributes are missing, any one of them can become an obstacle.

      Example: Complex APIs and type complexity

      One of the most powerful aspects of Rust is the way that its type system allows modeling aspects of the application domain. This prevents bugs and also makes it easier for noobs to get started [3]:

      "Instead of using just a raw bit field, somebody encoded it into the type system. So when you'd have a function like 'open door', you can't pass an 'open door' if the door's already open. The type system will just kick that out and reject it." -- Software engineer working on automotive digital cockpit systems

      "You can create contracts. For example, when you are allowed to use locks in which order." -- Senior embedded systems engineer working on automotive middleware development

      The problem though is that sometimes the work to encode those invariants in types can create something that feels more complex than the problem itself:

      "When you got Rust that's both async and generic and has lifetimes, then those types become so complicated that you basically have to be some sort of Rust god in order to even understand this code or be able to do it." -- Software engineer with production Rust experience

      "Instead of spaghetti code, you have spaghetti typing" -- Platform architect at automotive semiconductor company

      "I find it more opaque, harder to get my head around it. The types describe not just the interface of the thing but also the lifetime and how you are accessing it, whether it's on the stack or the heap, there's a lot of stuff packed into them." -- Software engineer working on data science platforms

      This leads some to advocate for not using some of Rust's more complex features unless they are truly needed:

      "My argument is that the hard parts of Rust -- traits, lifetimes, etc -- are not actually fundamental for being productive. There's a way to set up the learning curve and libraries to onboard people a lot faster." -- Creator and maintainer of Rust networking libraries

      Example: Async ecosystem is performant but doesn't meet the bar for supportiveness

      Async Rust has fueled a huge jump in using Rust to build network systems. But many commenters talked about the sense that "async Rust" was something altogether more difficult than sync Rust:

      "I feel like there's a ramp in learning and then there's a jump and then there's async over here. And so the goal is to get enough excitement about Rust to where you can jump the chasm of sadness and land on the async Rust side." -- Software engineer working on automotive digital cockpit systems

      "My general impression is actually pretty negative. It feels unbaked... there is a lot of arcane knowledge that you need in order to use it effectively, like Pin---like I could not tell you how Pin works, right?" -- Research software engineer with Rust expertise

      For Rust to provide that "trusted tool that will help you tackle new domains" experience, people need to be able to leverage their expectations and knowledge of Rust in that new domain. With async, not only are there missing language features (e.g., async fn in traits only became available last year, and still has gaps), but the supportive tooling and ecosystem that users count on to "bridge the gap" elsewhere works less well:

      "I was in favor of not using async, because the error messages were so hard to deal with." -- Desktop application developer

      "The fact that there are still plenty of situations where you go that library looks useful, I want to use that library and then that immediately locks you into one of tokio-rs or one of the other runtimes, and you're like that's a bit disappointing because I was trying to write a library as well and now I'm locked into a runtime." -- Safety systems engineer working on functional safety for Linux

      "We generally use Rust for services, and we use async a lot because a lot of libraries to interact with databases and other things are async. The times when we've had problems with this is like, um, unexplained high CPU usage, for example. The only really direct way to try to troubleshoot that or diagnose it is like, OK, I'm going to attach GDB and I'm gonna try to see what all of the threads are doing. GDB is -- I mean, this is not Rust's fault obviously -- but GDB is not a very easy to use tool, especially in a larger application. [..] And with async, it's, more difficult, because you don't see your code running, it's actually just sitting on the heap right now. Early on, I didn't actually realize that that was the case." -- Experienced Rust developer at a company using Rust and Python

      Async is important enough that it merits a deep dive. Our research revealed a lot of frustration but we didn't go deep enough to give more specific insights. This would be a good task to be undertaken by the future User Research team (as proposed in our first post).
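      As a point of reference for the language-feature gap mentioned above, here is a minimal sketch of async fn in traits (stable since Rust 1.75, still with gaps around trait objects); the trait and types here are hypothetical, and actually driving the future still requires an executor such as tokio:

      ```rust
      // Minimal sketch of `async fn` in a trait. Defining and implementing the
      // trait compiles on stable Rust; running the returned future still needs
      // an async runtime (e.g. tokio), which is omitted here.
      trait Fetcher {
          async fn fetch(&self, key: &str) -> Option<String>;
      }

      struct InMemory;

      impl Fetcher for InMemory {
          async fn fetch(&self, key: &str) -> Option<String> {
              Some(format!("value for {key}"))
          }
      }

      fn main() {
          // Construct the future without awaiting it; an executor would poll it.
          let _future = InMemory.fetch("example-key");
      }
      ```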

      Example: The wealth of crates on crates.io is a key enabler but can be an obstacle

      We mentioned earlier how Rust's extensibility is part of how it achieves versatility. Mechanisms like overloadable operators, traits, and macros let libraries create rich experiences for developers; a minimal standard library combined with easy package management encourages the creation of a rich ecosystem of crates covering needs both common and niche. However, particularly when people are first getting started, that extensibility can come at the cost of supportiveness, when the "tyranny of choice" becomes overwhelming:

      "The crates to use are sort of undiscoverable. There's a layer of tacit knowledge about what crates to use for specific things that you kind of gather through experience and through difficulty. Everyone's doing all of their research." -- Web developer and conference speaker working on developer frameworks

      "Crates.io gives you some of the metadata that you need to make those decisions, but it's not like a one stop shop, right? It's not like you go to crates.io and ask 'what I want to accomplish X, what library do I use'---it doesn't just answer that." -- Research software engineer

      The Rust org has historically been reluctant to "bless" particular crates in the ecosystem. But the reality is that some crates are omnipresent. This is particularly challenging for new users to navigate:

      "The tutorial uses Result<Box<dyn Error>> -- but nobody else does. Everybody uses anyhow-result... I started off using the result thing but all the information I found has example code using anyhow. It was a bit of a mismatch and I didn't know what I should do." -- Software engineer working on data science platforms

      "There is no clear recorded consensus on which 3P crates to use. [..] Sometimes it's really not clear---which CBOR crate do you use?[..] It's not easy to see which crates are still actively maintained. [..] The fact that there are so many crates on crates.io makes that a little bit of a risk." -- Rust team from a large technology company

      Recommendations

      Enumerate Rust's design goals and integrate them into our processes

      We recommend creating an RFC that defines the goals we are shooting for as we work on Rust. The RFC should cover the experience of using Rust in total (language, tools, and libraries). This RFC could be authored by the proposed User Research team, though it's not clear who should accept it — perhaps the User Research team itself, or perhaps the leadership council.

      This post identified how the real "empowering magic" of Rust arises from achieving a number of different attributes all at once -- reliability, efficiency, low-level control, supportiveness, and so forth. It would be valuable to have a canonical list of those values that we could collectively refer to as a community and that we could use when evaluating RFCs or other proposed designs.

      There have been a number of prior approaches to this work that we could build on (e.g., this post from Tyler Mandry, the Rustacean Principles, or the Rust Design Axioms). One insight from our research is that we don't need to define which values are "most important". We've seen that for Rust to truly work, it must achieve all the factors at once. Instead of ranking, it may help to describe how it feels when you:

      • Don't achieve it (too little)
      • Get it right (the sweet spot)
      • Go overboard (too much)

      This "goldilocks" framing helps people recognize where they are and course- correct, without creating false hierarchies.

      Double down on extensibility

      We recommend doubling down on extensibility as a core strategy. Rust's extensibility — traits, macros, operator overloading — has been key to its versatility. But that extensibility is currently concentrated in certain areas: the type system and early-stage proc macros. We should expand it to cover supportive interfaces (better diagnostics and guidance from crates) and compilation workflow (letting crates integrate at more stages of the build process).

      Rust's extensibility is a big part of how Rust achieves versatility, and that versatility is a big part of what people love about Rust. Leveraging mechanisms like proc macros, the trait system, and the borrow checker, Rust crates are able to expose high-level, elegant interfaces that compile down to efficient machine code. At its best, it can feel a bit like magic.

      Unfortunately, while Rust gives crates good tools for building safe, efficient abstractions, we don't provide tools to enable supportive ones. Within built-in Rust language concepts, we have worked hard to create effective error messages that help steer users to success; we ship the compiler with lints that catch common mistakes or enforce important conventions. But crates benefit from none of this. With RFCs like RFC #3368, which introduced the diagnostic namespace and #[diagnostic::on_unimplemented], Rust has already begun moving in this direction. We should continue and look for opportunities to go further, particularly for proc macros, which often create DSL-like interfaces.
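      For illustration, here is roughly what the #[diagnostic::on_unimplemented] attribute from RFC #3368 looks like in use (stable since Rust 1.78); the Handler trait and the wording are made up:

      ```rust
      // Sketch of the diagnostic namespace in action: a crate author customizes
      // the error a user sees when a required trait is not implemented.
      #[diagnostic::on_unimplemented(
          message = "`{Self}` cannot be used as a request handler",
          label = "not a handler",
          note = "implement `Handler` for this type or wrap it in a supported adapter"
      )]
      trait Handler {
          fn call(&self);
      }

      struct MyHandler;

      impl Handler for MyHandler {
          fn call(&self) {}
      }

      fn route(handler: impl Handler) {
          handler.call();
      }

      fn main() {
          route(MyHandler);
          // route(42u32); // uncommenting this produces the customized
          //               // diagnostic instead of the generic "trait not
          //               // implemented" message
      }
      ```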

      The other major challenge for extensibility is concerned with the build system and backend. Rust's current extensibility mechanisms (e.g., build.rs, proc macros) are focused on the early stages of the compilation process. But many extensions to Rust, ranging from interop to theorem proving to GPU programming to distributed systems, would benefit from being able to integrate into other stages of the compilation process. The Stable MIR project and the build-std project goal are two examples of this sort of work.

      Doubling down on extensibility will not only make current Rust easier to use, it will enable and support Rust's use in new domains. Safety Critical applications in particular require a host of custom lints and tooling to support the associated standards. Compiler extensibility allows Rust to support those niche needs in a more general way.

      Help users get oriented in the Rust ecosystem

      We recommend finding ways to help users navigate the crates.io ecosystem. Idiomatic Rust today relies on custom crates for everything from error handling to async runtimes. Leaning on the ecosystem helps Rust to scale to more domains and allows for innovative new approaches to be discovered. But finding which crates to use presents a real obstacle when people are getting started. The Rust org maintains a carefully neutral stance, which is good, but also means that people don't have anywhere to go for advice on a good "starter set" of crates.

      The right solution here is not obvious. Expanding the standard library could cut off further experimentation; "blessing" crates carries risks of politics. But just because the right solution is difficult doesn't mean we should ignore the problem. Rust has a history of exploring creative solutions to old tradeoffs, and we should turn that energy to this problem as well.

      Part of the solution is enabling better interop between libraries. This could come in the form of adding key interop traits (particularly for async) or by blessing standard building blocks (e.g., the http crate, which provides type definitions for HTTP libraries). Changes to coherence rules can also help, as the current rules do not permit a new interop trait to be introduced in the ecosystem and incrementally adopted.
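      To make the "standard building blocks" idea concrete, here is a small sketch using the http crate's shared types (assumes http as a dependency); the handler itself is hypothetical, but the point is that code written against these types is not tied to any one HTTP client or server implementation:

      ```rust
      // The `http` crate provides shared Request/Response types that many HTTP
      // libraries agree on, so this handler is framework-agnostic.
      use http::{Request, Response, StatusCode};

      fn handle(req: Request<()>) -> Response<String> {
          Response::builder()
              .status(StatusCode::OK)
              .body(format!("you asked for {}", req.uri().path()))
              .unwrap()
      }

      fn main() {
          let req = Request::builder()
              .uri("https://example.com/hello")
              .body(())
              .unwrap();
          let resp = handle(req);
          println!("{} {}", resp.status(), resp.body());
      }
      ```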

      Conclusion

      To sum up the main points in this post:

      • What people love about Rust is the way it empowers them to tackle tough problems and new domains. This is not the result of any one attribute but rather a careful balancing act between many; if any of them are compromised, the language suffers significantly.
      • We make three recommendations to help Rust continue to scale across domains and usage levels

        • Enumerate and describe Rust's design goals and integrate them into our processes, helping to ensure they are observed by future language designers and the broader ecosystem.
        • Double down on extensibility, introducing the ability for crates to influence the developer experience and the compilation pipeline.
        • Help users to navigate the crates.io ecosystem and enable smoother interop
      [1] In 2025, 72% of Rust users said they wanted to keep using it. In the past, Rust had a way higher score than any other language, but this year, Gleam came awfully close, with 70%! Good for them! Gleam looks awesome--and hey, good choice on the fn keyword. ;)

      [2] And, uh, how can we be sure not to mess it up?

      [3] ...for experienced devs operating on less sleep, who do tend to act a lot like noobs.

  2. December 18, 2025
    1. 🔗 IDA Plugin Updates IDA Plugin Updates on 2025-12-18 rss

      IDA Plugin Updates on 2025-12-18

      New Releases:

      Activity:

    2. 🔗 r/LocalLLaMA Kimi K2 Thinking at 28.3 t/s on 4x Mac Studio cluster rss

      I was testing llama.cpp RPC vs Exo's new RDMA Tensor setting on a cluster of 4x Mac Studios (2x 512GB and 2x 256GB) that Apple loaned me until February. Would love to do more testing between now and returning it. A lot of the earlier testing was debugging stuff since the RDMA support was very new for the past few weeks... now that it's somewhat stable I can do more. The annoying thing is there's nothing nice like llama-bench in Exo, so I can't give as direct comparisons with context sizes, prompt processing speeds, etc. (it takes a lot more fuss to do that, at least).

      submitted by /u/geerlingguy

    3. 🔗 News Minimalist 🐢 C-sections overtake natural births in England + 14 more stories rss

      In the last 4 days ChatGPT read 119,121 top news stories. After removing previously covered events, there are 15 articles with a significance score over 5.5.

      [5.7] Caesarean sections overtake natural vaginal births for the first time in England — bbc.com (+5)

      For the first time in England, Caesarean sections have become more common than unassisted natural vaginal births, according to the latest NHS data for 2024-25.

      The data shows C-sections accounted for 45% of births, narrowly surpassing spontaneous vaginal births at 44%. Officials attribute the rise to factors including maternal choice and pre-existing health conditions.

      The number of C-sections has doubled over the past decade. Experts also note that women may choose surgery due to concerns about the quality of maternity care for natural labor.

      [5.6] US President Trump orders naval blockade of oil tankers near Venezuela — zeit.de (German) (+338)

      U.S. President Donald Trump announced a naval blockade targeting sanctioned oil tankers off Venezuela's coast, causing an immediate increase in oil prices.

      President Trump accused Venezuela’s government of funding drug trafficking with oil sales, calling it a “terrorist organization.” Venezuela rejected the move as a threat and will appeal to the United Nations.

      [5.5] UK to rejoin EU's student exchange program in a step toward closer ties after Brexit — abcnews.go.com (+45)

      The United Kingdom announced it will rejoin the European Union's Erasmus student exchange program, a significant step in repairing post-Brexit relations with the bloc.

      Beginning January 2027, UK and EU students can study abroad without extra fees. The agreement covers various learners and will cost the UK about £570 million for the first year.

      The move reverses the post-Brexit withdrawal by the previous government, which cited high costs. It is part of the current administration’s broader effort to improve relations with the European Union.

      Highly covered news with significance over 5.5

      [6.2] India's Parliament approves bill to open civil nuclear power sector to private firms — abcnews.go.com (+16)

      [6.1] Israel and Egypt sign 30 billion euro gas deal — zeit.de (German) (+12)

      [6.1] EU Parliament approves plan to process asylum claims in third countries — zeit.de (German) (+5)

      [6.1] Europe establishes commission to assess Ukraine war damages — dw.com (+17)

      [6.1] US approves $10 billion arms sale to Taiwan — npr.org (+43)

      [6.0] US sanctions International Criminal Court judges over Israel investigation — kathimerini.gr (Greek) (+10)

      [5.8] European Commission launches first plan for affordable housing — germany.representation.ec.europa.eu (German) (+8)

      [5.7] Trump administration reclassifies cannabis to Schedule III, expanding research access — bbc.com (+7)

      [5.5] European Parliament approves safeguards for Mercosur trade deal — bbc.com (Portuguese) (+147)

      [5.5] Chile elects José Antonio Kast, a right-wing leader with Pinochet sympathies, as president — dn.se (Swedish) (+78)

      [6.4] Researchers create programmable robots smaller than a millimeter — futurity.org (+2)

      [6.5] Brazil eliminates mother-to-child HIV transmission — paho.org (+2)

      Thanks for reading!

      — Vadim



    4. 🔗 r/reverseengineering Decompiling the Synergy: An Empirical Study of Human–LLM Teaming in Software Reverse Engineering rss
    5. 🔗 r/reverseengineering Firmware extractor for CH55x microprocessors rss
    6. 🔗 r/wiesbaden 2 cinema tickets available for Avatar rss

      I accidentally bought 2 tickets for the wrong showing. These are 2 recliner seats, centrally located, at the Cinestar in Frankfurt, for Avatar: Fire & Ash in IMAX 3D in German on 25.12 at 15:45. They normally cost €26.40 per ticket; I'd of course sell them for less. Feel free to DM.

      submitted by /u/keinsportdochichball

    7. 🔗 r/reverseengineering Reconstructed MS-DOS Commander Keen 1-3 Source Code rss
    8. 🔗 The Pragmatic Engineer The Pulse: Cloudflare’s latest outage proves dangers of global configuration changes (again) rss

      Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. In every issue, I cover Big Tech and startups through the lens of senior engineers and engineering leaders. Today, we cover one out of four topics from last week's The Pulse issue. Full subscribers received the below article seven days ago. If you've been forwarded this email, you can subscribe here.

      A mere two weeks after Cloudflare suffered a major outage and took down half the internet, the same thing has happened again. Last Friday, 5th December, thousands of sites went down or partially down once more, in a global Cloudflare outage lasting 25 minutes.

      As per last time, Cloudflare was speedy to share a full postmortem on the same day. It estimated that 28% of Cloudflare's HTTP traffic was impacted. The cause of this latest outage was Cloudflare making a seemingly innocent - but global - configuration change that went on to take out a good portion of Cloudflare, globally, until being reverted. Here's what happened:

      • Cloudflare was rolling out a fix for a nasty React security vulnerability
      • The fix caused an error in an internal testing tool
      • The Cloudflare team disabled the testing tool with a global killswitch
      • As this global configuration change was made, the killswitch unexpectedly caused a bug that resulted in HTTP 500 errors across Cloudflare's network

      In this latest outage, Cloudflare was burnt by yet another global configuration change. The previous outage in November happened thanks to a global database permissions change. In the postmortem of that incident, the Cloudflare team closed with this action item:

      "Hardening ingestion of Cloudflare-generated configuration files in the same way we would for user-generated input"

      This change would make it so that Cloudflare's configuration files do not propagate immediately to the full network, as they still do now. But making all global configuration files have staged rollouts is a large piece of work that could take months. Evidently, there wasn't time to implement it yet, and it has come back to bite Cloudflare.
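      To make "staged rollouts" concrete, here is a toy sketch of a percentage-based rollout gate; this is purely illustrative and not Cloudflare's actual mechanism:

      ```rust
      // Toy staged configuration rollout: each machine hashes itself into a
      // stable bucket (0-99) and only picks up the new config version once the
      // rollout percentage has reached its bucket.
      use std::collections::hash_map::DefaultHasher;
      use std::hash::{Hash, Hasher};

      fn bucket(machine_id: &str) -> u64 {
          let mut h = DefaultHasher::new();
          machine_id.hash(&mut h);
          h.finish() % 100
      }

      fn config_version(machine_id: &str, rollout_percent: u64) -> &'static str {
          if bucket(machine_id) < rollout_percent {
              "v2-candidate" // new config, only on a slice of the fleet
          } else {
              "v1-known-good" // everyone else stays on the proven config
          }
      }

      fn main() {
          for pct in [1, 10, 50, 100] {
              let on_new: usize = (0..1000)
                  .filter(|i| config_version(&format!("machine-{i}"), pct) == "v2-candidate")
                  .count();
              println!("at {pct}% rollout, {on_new} of 1000 machines run the new config");
          }
      }
      ```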

      Unfortunately for Cloudflare, customers are unlikely to accept a second outage with similar causes only weeks after the previous one. If Cloudflare proves unreliable, customers should plan to onboard backup CDNs at the very least, and a backup CDN vendor will do its best to convince new customers to use it as the primary CDN.

      Cloudflare's value-add rests on rock-solid reliability without customers needing to budget for a backup CDN. Yes, publishing postmortems on the same day as an outage occurs helps restore trust, but that will crumble anyway with repeated large outages.

      To be fair, the company is doubling down on implementing staged configuration rollouts. In its postmortem, Cloudflare is its own biggest critic. CTO Dane Knecht reflected:

      "[Global configuration changes rolling out globally] remains our first priority across the organization. In particular, the projects outlined below should help contain the impact of these kinds of changes:Enhanced Rollouts & Versioning: Similar to how we slowly deploy software with strict health validation, data used for rapid threat response and general configuration needs to have the same safety and blast mitigation features. This includes health validation and quick rollback capabilities among other things.Streamlined break glass capabilities: Ensure that critical operations can still be achieved in the face of additional types of failures. This applies to internal services as well as all standard methods of interaction with the Cloudflare control plane used by all Cloudflare customers." Fail-Open" Error Handling: As part of the resilience effort, we are replacing the incorrectly applied hard-fail logic across all critical Cloudflare data-plane components. If a configuration file is corrupt or out- of-range (e.g., exceeding feature caps), the system will log the error and default to a known-good state or pass traffic without scoring, rather than dropping requests. Some services will likely give the customer the option to fail open or closed in certain scenarios. This will include drift-prevention capabilities to ensure this is enforced continuously.
      These kinds of incidents, and how closely they are clustered together, are not acceptable for a network like ours".
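      Here is a minimal sketch of the "fail open" error handling described in the quote above: if a configuration file is corrupt or over a cap, log it and keep serving with the last known-good configuration instead of dropping requests. The Config shape and the cap of 500 are invented for illustration:

      ```rust
      // Fail-open config loading: reject bad input, but fall back to the last
      // known-good configuration rather than failing the data plane.
      #[derive(Clone, Debug)]
      struct Config {
          feature_count: usize,
      }

      fn parse_config(raw: &str) -> Result<Config, String> {
          let feature_count: usize = raw
              .trim()
              .parse()
              .map_err(|e| format!("corrupt config: {e}"))?;
          if feature_count > 500 {
              return Err(format!("feature_count {feature_count} exceeds cap of 500"));
          }
          Ok(Config { feature_count })
      }

      fn load_config(raw: &str, last_known_good: &Config) -> Config {
          match parse_config(raw) {
              Ok(cfg) => cfg,
              Err(err) => {
                  // Fail open: log and fall back rather than dropping traffic.
                  eprintln!("config rejected ({err}); keeping known-good config");
                  last_known_good.clone()
              }
          }
      }

      fn main() {
          let known_good = Config { feature_count: 200 };
          println!("{:?}", load_config("250", &known_good)); // accepted
          println!("{:?}", load_config("9999", &known_good)); // over cap -> fallback
          println!("{:?}", load_config("not a number", &known_good)); // corrupt -> fallback
      }
      ```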

      Global configuration errors often trigger large outages

      There's a pattern of implicit or explicit global configuration errors causing large outages, and some of the biggest ones in recent years were caused by a single change being rolled out to a whole network of machines:

      • DNS and DNS-related systems like BGP: DNS changes are global by default, so it's no wonder that DNS changes can cause global outages. Meta's 7-hour outage in 2021 was related to DNS changes (more specifically, Border Gateway Protocol changes.) Meanwhile, the AWS outage in October started with the internal DNS system.
      • OS updates happening at the same time, globally: Datadog's 2023 outage cost the company $5M and was caused by Datadog's Ubuntu machines executing an OS update within the same time window, globally. It caused issues with networking, and it didn't help that Datadog ran its infra on 3 different cloud providers across 3 networks. The same kind of Ubuntu update also caused a global outage for Heroku in 2024.

      • Globally replicating configs: in 2024, a configuration policy change was rolled out globally and crashed every Spanner database node straight away. As Google concluded in its postmortem: "Given the global nature of quota management, this metadata was replicated globally within seconds".

      [Image: Step 2 - replicating a configuration file globally across GCP - caused a global outage in 2024]

      Implementing gradual rollouts for all configuration files is a lot of work. It's also invisible labor: when done well, the benefit shows up only as an absence of incidents, thanks to better infrastructure!

      The largest systems in the world will likely have to implement safer ways to roll out configs - but not everybody needs to. Staged configuration rollout doesn't make much sense for smaller companies and products because this infra work slows down product development.

      It doesn't just slow down building, but every deployment, too, and this friction makes everything slower by design. As such, staged rollouts don't make much sense unless the stability of mature systems is more important than fast iteration.

      Software engineering is a field where tradeoffs are a fact of life, and universal solutions don't exist. The development approach that worked for a system with 1/100th of the load and users a year ago may not make sense today.

This was one of the four topics covered in this week's The Pulse. The full edition additionally covers:

      1. Industry Pulse. Poor capacity planning at AWS, Meta moves to a "closed AI" approach, a looming RAM shortage, early-stage startups hiring slower than before, how long it takes to earn $600K at Amazon and Meta, Apple loses execs to Meta, and more
      2. How the engineering team at Oxide uses LLMs. They find LLMs great for reading documents and lightweight research, mixed for coding and code review, and a poor choice for writing documents - or any kind of writing, really!
      3. Linux officially supports Rust in the kernel. Rust is now a first-class language inside the Linux kernel, eight months after a Linux Foundation Fellow predicted more support for Rust. A summary of the pros and cons of Rust support for Linux

      Read the full The Pulse issue.

    9. 🔗 r/reverseengineering Bridging the Gap between Real-World and Formal Binary Lifting through Filtered-Simulation (OOPSLA 2025) rss
    10. 🔗 r/LocalLLaMA Google's Gemma models family rss

submitted by /u/jacek2023
[link] [comments]

    11. 🔗 r/wiesbaden Café Klatsch eröffnet am 20.12 rss
    12. 🔗 batrachianai/toad v0.5.1 release

      Fixed missing "-g" in Claude installer. Added a "Launch" entry to the actions selection.

    13. 🔗 Simon Willison Your job is to deliver code you have proven to work rss

      In all of the debates about the value of AI-assistance in software development there's one depressing anecdote that I keep on seeing: the junior engineer, empowered by some class of LLM tool, who deposits giant, untested PRs on their coworkers - or open source maintainers - and expects the "code review" process to handle the rest.

      This is rude, a waste of other people's time, and is honestly a dereliction of duty as a software developer.

      Your job is to deliver code you have proven to work.

      As software engineers we don't just crank out code - in fact these days you could argue that's what the LLMs are for. We need to deliver code that works - and we need to include proof that it works as well. Not doing that directly shifts the burden of the actual work to whoever is expected to review our code.

      How to prove it works

      There are two steps to proving a piece of code works. Neither is optional.

      The first is manual testing. If you haven't seen the code do the right thing yourself, that code doesn't work. If it does turn out to work, that's honestly just pure chance.

      Manual testing skills are genuine skills that you need to develop. You need to be able to get the system into an initial state that demonstrates your change, then exercise the change, then check and demonstrate that it has the desired effect.

      If possible I like to reduce these steps to a sequence of terminal commands which I can paste, along with their output, into a comment in the code review. Here's a recent example.

      Some changes are harder to demonstrate. It's still your job to demonstrate them! Record a screen capture video and add that to the PR. Show your reviewers that the change you made actually works.

      Once you've tested the happy path where everything works you can start trying the edge cases. Manual testing is a skill, and finding the things that break is the next level of that skill that helps define a senior engineer.

      The second step in proving a change works is automated testing. This is so much easier now that we have LLM tooling, which means there's no excuse at all for skipping this step.

      Your contribution should bundle the change with an automated test that proves the change works. That test should fail if you revert the implementation.

      The process for writing a test mirrors that of manual testing: get the system into an initial known state, exercise the change, assert that it worked correctly. Integrating a test harness to productively facilitate this is another key skill worth investing in.

      Don't be tempted to skip the manual test because you think the automated test has you covered already! Almost every time I've done this myself I've quickly regretted it.

      Make your coding agent prove it first

      The most important trend in LLMs in 2025 has been the explosive growth of coding agents - tools like Claude Code and Codex CLI that can actively execute the code they are working on to check that it works and further iterate on any problems.

      To master these tools you need to learn how to get them to prove their changes work as well.

      This looks exactly the same as the process I described above: they need to be able to manually test their changes as they work, and they need to be able to build automated tests that guarantee the change will continue to work in the future.

      Since they're robots, automated tests and manual tests are effectively the same thing.

They do feel a little different though. When I'm working on CLI tools I'll usually teach Claude Code how to run them itself so it can do one-off tests, even though the eventual automated tests will use a system like Click's CliRunner.

      When working on CSS changes I'll often encourage my coding agent to take screenshots when it needs to check if the change it made had the desired effect.

      The good news about automated tests is that coding agents need very little encouragement to write them. If your project has tests already most agents will extend that test suite without you even telling them to do so. They'll also reuse patterns from existing tests, so keeping your test code well organized and populated with patterns you like is a great way to help your agent build testing code to your taste.

      Developing good taste in testing code is another of those skills that differentiates a senior engineer.

      The human provides the accountability

      A computer can never be held accountable. That's your job as the human in the loop.

      Almost anyone can prompt an LLM to generate a thousand-line patch and submit it for code review. That's no longer valuable. What's valuable is contributing code that is proven to work.

      Next time you submit a PR, make sure you've included your evidence that it works as it should.


    14. 🔗 Anton Zhiyanov Detecting goroutine leaks with synctest/pprof rss

      Deadlocks, race conditions, and goroutine leaks are probably the three most common problems in concurrent Go programming. Deadlocks usually cause panics, so they're easier to spot. The race detector can help find data races (although it doesn't catch everything and doesn't help with other types of race conditions). As for goroutine leaks, Go's tooling did not address them for a long time.

      A leak occurs when one or more goroutines are indefinitely blocked on synchronization primitives like channels, while other goroutines continue running and the program as a whole keeps functioning. We'll look at some examples shortly.

      Things started to change in Go 1.24 with the introduction of the synctest package. There will be even bigger changes in Go 1.26, which adds a new experimental goroutineleak profile that reports leaked goroutines. Let's take a look!

A simple leak · Detection: goleak · Detection: synctest · Detection: pprof · Algorithm · Range over channel · Double send · Early return · Take first · Cancel/timeout · Orphans · Final thoughts

      A simple leak

      Let's say there's a function that runs the given functions concurrently and sends their results to an output channel:

      // Gather runs the given functions concurrently
      // and collects the results.
      func Gather(funcs ...func() int) <-chan int {
          out := make(chan int)
          for _, f := range funcs {
              go func() {
                  out <- f()
              }()
          }
          return out
      }
      

      And a simple test:

      func Test(t *testing.T) {
          out := Gather(
              func() int { return 11 },
              func() int { return 22 },
              func() int { return 33 },
          )
      
          total := 0
          for range 3 {
              total += <-out
          }
      
          if total != 66 {
              t.Errorf("got %v, want 66", total)
          }
      }
      
      
      
      PASS
      

      Send three functions to be executed and collect the results from the output channel. The test passed, so the function works correctly. But does it really?

      Let's pass three functions to Gather without collecting the results, and count the goroutines:

      func main() {
          Gather(
              func() int { return 11 },
              func() int { return 22 },
              func() int { return 33 },
          )
      
          time.Sleep(50 * time.Millisecond)
          nGoro := runtime.NumGoroutine() - 1 // minus the main goroutine
          fmt.Println("nGoro =", nGoro)
      }
      
      
      
      nGoro = 3
      

      After 50 ms — when all the functions should definitely have finished — there are still three running goroutines (runtime.NumGoroutine). In other words, all the goroutines are stuck.

      The reason is that the out channel is unbuffered. If the client doesn't read from it, or doesn't read all the results, the goroutines inside Gather get blocked on sending the f() result to out.
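For reference, here is a minimal sketch of one possible fix (the article focuses on detection rather than the fix): buffer the output channel so each goroutine can send its result even if the caller never reads it.

// Sketch of a fix: a buffered channel lets every send complete
// even if the caller reads nothing.
func Gather(funcs ...func() int) <-chan int {
    out := make(chan int, len(funcs))
    for _, f := range funcs {
        go func() {
            out <- f()
        }()
    }
    return out
}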

      Let's modify the test to catch the leak.

      Detecting the leak: goleak

Obviously, we don't want to rely on runtime.NumGoroutine in tests; such a check is too fragile. Let's use a third-party goleak package instead:

      // Gather runs the given functions concurrently
      // and collects the results.
      func Gather(funcs ...func() int) <-chan int {
          out := make(chan int)
          for _, f := range funcs {
              go func() {
                  out <- f()
              }()
          }
          return out
      }
      
      func Test(t *testing.T) {
          defer goleak.VerifyNone(t)
      
          Gather(
              func() int { return 11 },
              func() int { return 22 },
              func() int { return 33 },
          )
      }
      

      playground ▶

      --- FAIL: Test (0.44s)
      goleak_test.go:28: found unexpected goroutines:
      
      Goroutine 8 in state chan send, with play.Gather.func1 on top of the stack:
      play.Gather.func1()
          /tmp/sandbox4216740326/prog_test.go:16 +0x37
      created by play.Gather in goroutine 7
          /tmp/sandbox4216740326/prog_test.go:15 +0x45
      
      Goroutine 9 in state chan send, with play.Gather.func1 on top of the stack:
      play.Gather.func1()
          /tmp/sandbox4216740326/prog_test.go:16 +0x37
      created by play.Gather in goroutine 7
          /tmp/sandbox4216740326/prog_test.go:15 +0x45
      
      Goroutine 10 in state chan send, with play.Gather.func1 on top of the stack:
      play.Gather.func1()
          /tmp/sandbox4216740326/prog_test.go:16 +0x37
      created by play.Gather in goroutine 7
          /tmp/sandbox4216740326/prog_test.go:15 +0x45
      

      The test output clearly shows where the leak occurs.

      Goleak uses time.Sleep internally, but it does so quite efficiently. It inspects the stack for unexpected goroutines up to 20 times, with the wait time between checks increasing exponentially, starting at 1 microsecond and going up to 100 milliseconds. This way, the test runs almost instantly.
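Roughly, the retry loop described above could look like this sketch (an illustration of the strategy, not goleak's actual code; the doubling factor is an assumption, and time is the standard library package):

// waitForNoLeaks polls the provided check up to 20 times, increasing the
// wait between attempts from 1 microsecond up to a 100 ms cap.
func waitForNoLeaks(noLeaks func() bool) bool {
    delay := time.Microsecond
    for range 20 {
        if noLeaks() {
            return true
        }
        time.Sleep(delay)
        if delay *= 2; delay > 100*time.Millisecond {
            delay = 100 * time.Millisecond
        }
    }
    return false
}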

      Still, I'd prefer not to use third-party packages and time.Sleep.

      Detecting the leak: synctest

      Let's check for leaks without any third-party packages by using the synctest package (experimental in Go 1.24, production-ready in Go 1.25+):

      // Gather runs the given functions concurrently
      // and collects the results.
      func Gather(funcs ...func() int) <-chan int {
          out := make(chan int)
          for _, f := range funcs {
              go func() {
                  out <- f()
              }()
          }
          return out
      }
      
      func Test(t *testing.T) {
          synctest.Test(t, func(t *testing.T) {
              Gather(
                  func() int { return 11 },
                  func() int { return 22 },
                  func() int { return 33 },
              )
              synctest.Wait()
          })
      }
      
      
      
      --- FAIL: Test (0.00s)
      panic: deadlock: main bubble goroutine has exited but blocked goroutines remain [recovered, repanicked]
      
      goroutine 10 [chan send (durable), synctest bubble 1]:
      sandbox.Gather.func1()
          /tmp/sandbox/main_test.go:34 +0x37
      created by sandbox.Gather in goroutine 9
          /tmp/sandbox/main_test.go:33 +0x45
      
      goroutine 11 [chan send (durable), synctest bubble 1]:
      sandbox.Gather.func1()
          /tmp/sandbox/main_test.go:34 +0x37
      created by sandbox.Gather in goroutine 9
          /tmp/sandbox/main_test.go:33 +0x45
      
      goroutine 12 [chan send (durable), synctest bubble 1]:
      sandbox.Gather.func1()
          /tmp/sandbox/main_test.go:34 +0x37
      created by sandbox.Gather in goroutine 9
          /tmp/sandbox/main_test.go:33 +0x45
      

      I'll keep this explanation short since synctest isn't the main focus of this article. If you want to learn more about it, check out the Concurrency testing guide. I highly recommend it — synctest is super useful!

      Here's what happens:

      1. The call to synctest.Test starts a testing bubble in a separate goroutine.
      2. The call to Gather starts three goroutines.
      3. The call to synctest.Wait blocks the root bubble goroutine.
      4. One of the goroutines executes f, tries to write to out, and gets blocked (because no one is reading from out).
      5. The same thing happens to the other two goroutines.
      6. synctest.Wait sees that all the child goroutines in the bubble are durably blocked, so it unblocks the root goroutine.
      7. The inner test function finishes.

      Next, synctest.Test comes into play. It tries to wait for all child goroutines to finish before it returns. But if it sees that some goroutines are durably blocked (in our case, all three are blocked trying to send to the channel), it panics:

      main bubble goroutine has exited but blocked goroutines remain

      So, here we found the leak without using time.Sleep or goleak. Pretty useful!

      Detecting the leak: pprof

      Let's check for leaks using the new profile type goroutineleak (experimental in Go 1.26). We'll use a helper function to run the profiled code and print the results when the profile is ready:

      func printLeaks(f func()) {
          prof := pprof.Lookup("goroutineleak")
      
          defer func() {
              time.Sleep(50 * time.Millisecond)
              var content strings.Builder
              prof.WriteTo(&content, 2)
              // Print only the leaked goroutines.
              goros := strings.Split(content.String(), "\n\n")
              for _, goro := range goros {
                  if strings.Contains(goro, "(leaked)") {
                      fmt.Println(goro + "\n")
                  }
              }
          }()
      
          f()
      }
      

      (If you try this locally, don't forget to set the GOEXPERIMENT=goroutineleakprofile environment variable.)

      Call Gather with three functions and observe all three leaks:

      func main() {
          printLeaks(func() {
              Gather(
                  func() int { return 11 },
                  func() int { return 22 },
                  func() int { return 33 },
              )
          })
      }
      
      
      
      goroutine 5 [chan send (leaked)]:
      main.Gather.func1()
          /tmp/sandbox/main.go:35 +0x37
      created by main.Gather in goroutine 1
          /tmp/sandbox/main.go:34 +0x45
      
      goroutine 6 [chan send (leaked)]:
      main.Gather.func1()
          /tmp/sandbox/main.go:35 +0x37
      created by main.Gather in goroutine 1
          /tmp/sandbox/main.go:34 +0x45
      
      goroutine 7 [chan send (leaked)]:
      main.Gather.func1()
          /tmp/sandbox/main.go:35 +0x37
      created by main.Gather in goroutine 1
          /tmp/sandbox/main.go:34 +0x45
      

      We have a nice goroutine stack trace that shows exactly where the leak happens. Unfortunately, we had to use time.Sleep again, so this probably isn't the best way to test — unless we combine it with synctest to use the fake clock.

      On the other hand, we can collect a profile from a running program, which makes it really useful for finding leaks in production systems (unlike synctest). Pretty neat.

      Leak detection algorithm

      This goroutineleak profile uses the garbage collector's marking phase to find goroutines that are permanently blocked (leaked). The approach is explained in detail in the proposal and the paper by Saioc et al. — check it out if you're interested.

      Here's the gist of it:

         [ Start: GC mark phase ]
                   │
                   │ 1. Collect live goroutines
                   v
         ┌───────────────────────┐
         │   Initial roots       │ <────────────────┐
         │ (runnable goroutines) │                  │
         └───────────────────────┘                  │
                   │                                │
                   │ 2. Mark reachable memory       │
                   v                                │
         ┌───────────────────────┐                  │
         │   Reachable objects   │                  │
         │  (channels, mutexes)  │                  │
         └───────────────────────┘                  │
                   │                                │
                   │ 3a. Check blocked goroutines   │
                   v                                │
         ┌───────────────────────┐          (Yes)   │
         │ Is blocked G waiting  │ ─────────────────┘
         │ on a reachable obj?   │ 3b. Add G to roots
         └───────────────────────┘
                   │
                   │ (No - repeat until no new Gs found)
                   v
         ┌───────────────────────┐
         │   Remaining blocked   │
         │      goroutines       │
         └───────────────────────┘
                   │
                   │ 5. Report the leaks
                   v
            [   LEAKED!   ]
       (Blocked on unreachable
        synchronization objects)
      
      1. Collect live goroutines. Start with currently active (runnable or running) goroutines as roots. Ignore blocked goroutines for now.
      2. Mark reachable memory. Trace pointers from roots to find which memory objects (like channels or mutexes) are currently reachable by these roots.
      3. Resurrect blocked goroutines. Check all currently blocked goroutines. If a blocked goroutine is waiting for a synchronization resource that was just marked as reachable — add that goroutine to the roots.
      4. Iterate. Repeat steps 2 and 3 until there are no more new goroutines blocked on reachable objects.
      5. Report the leaks. Any goroutines left in the blocked state are waiting for resources that no active part of the program can access. They're considered leaked.
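To make the fixpoint concrete, here's a toy Go model of the idea (purely illustrative; the real implementation runs inside the garbage collector, not in user code):

package main

import "fmt"

// Toy model: a sync object (channel, mutex, ...) and a goroutine that is
// either runnable or blocked on an object, and that references some objects.
type object struct{ name string }

type goroutine struct {
    name      string
    runnable  bool
    blockedOn *object   // nil if not blocked
    refs      []*object // objects reachable from this goroutine
}

// findLeaks mimics the marking fixpoint: start from runnable goroutines,
// mark the objects they can reach, resurrect blocked goroutines waiting on
// marked objects, and repeat. Whatever is still blocked at the end is leaked.
func findLeaks(gs []*goroutine) []*goroutine {
    reachable := map[*object]bool{}
    marked := map[*goroutine]bool{}
    var roots []*goroutine
    for _, g := range gs {
        if g.runnable {
            roots = append(roots, g)
            marked[g] = true
        }
    }
    for changed := true; changed; {
        changed = false
        for _, g := range roots {
            for _, o := range g.refs {
                if !reachable[o] {
                    reachable[o] = true
                    changed = true
                }
            }
        }
        for _, g := range gs {
            if !marked[g] && g.blockedOn != nil && reachable[g.blockedOn] {
                marked[g] = true
                roots = append(roots, g)
                changed = true
            }
        }
    }
    var leaked []*goroutine
    for _, g := range gs {
        if !marked[g] && g.blockedOn != nil {
            leaked = append(leaked, g)
        }
    }
    return leaked
}

func main() {
    out := &object{name: "out channel"}
    mainG := &goroutine{name: "main", runnable: true} // dropped its reference to out
    worker := &goroutine{name: "Gather worker", blockedOn: out, refs: []*object{out}}
    for _, g := range findLeaks([]*goroutine{mainG, worker}) {
        fmt.Println("leaked:", g.name) // leaked: Gather worker
    }
}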

      In the rest of the article, we'll review the different types of leaks often observed in production and see whether synctest and goroutineleak are able to detect each of them (spoiler: they are).

Based on the code examples from the common-goroutine-leak-patterns repository by Georgian-Vlad Saioc, licensed under the Apache-2.0 license.

      Range over channel

      One or more goroutines receive from a channel using range, but the sender never closes the channel, so all the receivers eventually leak:

      func RangeOverChan(list []any, workers int) {
          ch := make(chan any)
      
          // Launch workers.
          for range workers {
              go func() {
                  // Each worker processes items one by one.
                  // The channel is never closed, so every worker leaks
                  // once there are no more items left to process.
                  for item := range ch {
                      _ = item
                  }
              }()
          }
      
          // Send items for processing.
          for _, item := range list {
              ch <- item
          }
      
          // close(ch) // (X) uncomment to fix
      }
      

      Using synctest:

      func Test(t *testing.T) {
          synctest.Test(t, func(t *testing.T) {
              RangeOverChan([]any{11, 22, 33, 44}, 2)
              synctest.Wait()
          })
      }
      
      
      
      panic: deadlock: main bubble goroutine has exited but blocked goroutines remain
      
      goroutine 10 [chan receive (durable), synctest bubble 1]:
      sandbox.RangeOverChan.func1()
          /tmp/sandbox/main_test.go:36 +0x34
      created by sandbox.RangeOverChan in goroutine 9
          /tmp/sandbox/main_test.go:34 +0x45
      
      goroutine 11 [chan receive (durable), synctest bubble 1]:
      sandbox.RangeOverChan.func1()
          /tmp/sandbox/main_test.go:36 +0x34
      created by sandbox.RangeOverChan in goroutine 9
          /tmp/sandbox/main_test.go:34 +0x45
      

      Using goroutineleak:

      func main() {
          printLeaks(func() {
              RangeOverChan([]any{11, 22, 33, 44}, 2)
          })
      }
      
      
      
      goroutine 19 [chan receive (leaked)]:
      main.RangeOverChan.func1()
          /tmp/sandbox/main.go:36 +0x34
      created by main.RangeOverChan in goroutine 1
          /tmp/sandbox/main.go:34 +0x45
      
      goroutine 20 [chan receive (leaked)]:
      main.RangeOverChan.func1()
          /tmp/sandbox/main.go:36 +0x34
      created by main.RangeOverChan in goroutine 1
          /tmp/sandbox/main.go:34 +0x45
      

      Notice how synctest and goroutineleak give almost the same stack traces, clearly showing the root cause of the problem. You'll see this in the next examples as well.

      Fix: The sender should close the channel after it finishes sending.

      Try uncommenting the ⓧ line and see if both checks pass.

      Double send

      The sender accidentally sends more values to a channel than intended, and leaks:

      func DoubleSend() <-chan any {
          ch := make(chan any)
      
          go func() {
              res, err := work(13)
              if err != nil {
                  // In case of an error, send nil.
                  ch <- nil
                  // return // (X) uncomment to fix
              }
              // Otherwise, continue with normal behaviour.
              // This leaks if err != nil.
              ch <- res
          }()
      
          return ch
      }
      

      Using synctest:

      func Test(t *testing.T) {
          synctest.Test(t, func(t *testing.T) {
              <-DoubleSend()
              synctest.Wait()
          })
      }
      
      
      
      panic: deadlock: main bubble goroutine has exited but blocked goroutines remain
      
      goroutine 22 [chan send (durable), synctest bubble 1]:
      sandbox.DoubleSend.func1()
          /tmp/sandbox/main_test.go:42 +0x4c
      created by sandbox.DoubleSend in goroutine 21
          /tmp/sandbox/main_test.go:32 +0x5f
      

      Using goroutineleak:

      func main() {
          printLeaks(func() {
              <-DoubleSend()
          })
      }
      
      
      
      goroutine 19 [chan send (leaked)]:
      main.DoubleSend.func1()
          /tmp/sandbox/main.go:42 +0x4c
      created by main.DoubleSend in goroutine 1
          /tmp/sandbox/main.go:32 +0x67
      

      Fix: Make sure that each possible path in the code sends to the channel no more times than the receiver is ready for. Alternatively, make the channel's buffer large enough to handle all possible sends.

      Try uncommenting the ⓧ line and see if both checks pass.

      Early return

      The parent goroutine exits without receiving a value from the child goroutine, so the child leaks:

      func EarlyReturn() {
          ch := make(chan any) // (X) should be buffered
      
          go func() {
              res, _ := work(42)
              // Leaks if the parent goroutine terminates early.
              ch <- res
          }()
      
          _, err := work(13)
          if err != nil {
              // Early return in case of error.
        // The child goroutine leaks.
              return
          }
      
          // Only receive if there is no error.
          <-ch
      }
      

      Using synctest:

      func Test(t *testing.T) {
          synctest.Test(t, func(t *testing.T) {
              EarlyReturn()
              synctest.Wait()
          })
      }
      
      
      
      panic: deadlock: main bubble goroutine has exited but blocked goroutines remain
      
      goroutine 22 [chan send (durable), synctest bubble 1]:
      sandbox.EarlyReturn.func1()
          /tmp/sandbox/main_test.go:35 +0x45
      created by sandbox.EarlyReturn in goroutine 21
          /tmp/sandbox/main_test.go:32 +0x5f
      

      Using goroutineleak:

      func main() {
          printLeaks(func() {
              EarlyReturn()
          })
      }
      
      
      
      goroutine 7 [chan send (leaked)]:
      main.EarlyReturn.func1()
          /tmp/sandbox/main.go:35 +0x45
      created by main.EarlyReturn in goroutine 1
          /tmp/sandbox/main.go:32 +0x67
      

      Fix: Make the channel buffered so the child goroutine doesn't get blocked when sending.

      Try making the channel buffered at line ⓧ and see if both checks pass.

      Cancel/timeout

      Similar to "early return". If the parent is canceled before receiving a value from the child goroutine, the child leaks:

      func Canceled(ctx context.Context) {
          ch := make(chan any) // (X) should be buffered
      
          go func() {
              res, _ := work(100)
              // Leaks if the parent goroutine gets canceled.
              ch <- res
          }()
      
          // Wait for the result or for cancellation.
          select {
          case <-ctx.Done():
              // The child goroutine leaks.
              return
          case res := <-ch:
              // Process the result.
              _ = res
          }
      }
      

      Using synctest:

      func Test(t *testing.T) {
          synctest.Test(t, func(t *testing.T) {
              ctx, cancel := context.WithCancel(t.Context())
              cancel()
              Canceled(ctx)
      
              time.Sleep(time.Second)
              synctest.Wait()
          })
      }
      
      
      
      panic: deadlock: main bubble goroutine has exited but blocked goroutines remain
      
      goroutine 22 [chan send (durable), synctest bubble 1]:
      sandbox.Canceled.func1()
          /tmp/sandbox/main_test.go:35 +0x45
      created by sandbox.Canceled in goroutine 21
          /tmp/sandbox/main_test.go:32 +0x76
      

      Using goroutineleak:

      func main() {
          printLeaks(func() {
              ctx, cancel := context.WithCancel(context.Background())
              cancel()
              Canceled(ctx)
          })
      }
      
      
      
      goroutine 19 [chan send (leaked)]:
      main.Canceled.func1()
          /tmp/sandbox/main.go:35 +0x45
      created by main.Canceled in goroutine 1
          /tmp/sandbox/main.go:32 +0x7b
      

      Fix: Make the channel buffered so the child goroutine doesn't get blocked when sending.

      Try making the channel buffered at line ⓧ and see if both checks pass.

      Take first

The parent launches N child goroutines, but is only interested in the first result. The remaining N-1 children leak:

      func TakeFirst(items []any) {
          ch := make(chan any)
      
          // Iterate over every item.
          for _, item := range items {
              go func() {
                  ch <- process(item)
              }()
          }
      
          // Retrieve the first result. All other children leak.
          // Also, the parent leaks if len(items) == 0.
          <-ch
      }
      

      Using synctest (zero items, the parent leaks):

      func Test(t *testing.T) {
          synctest.Test(t, func(t *testing.T) {
              go TakeFirst(nil)
              synctest.Wait()
          })
      }
      
      
      
      panic: deadlock: main bubble goroutine has exited but blocked goroutines remain
      
      goroutine 22 [chan receive (durable), synctest bubble 1]:
      sandbox.TakeFirst({0x0, 0x0, 0x0?})
          /tmp/sandbox/main_test.go:40 +0xdd
      created by sandbox.Test.func1 in goroutine 21
          /tmp/sandbox/main_test.go:44 +0x1a
      

      Using synctest (multiple items, children leak):

      func Test(t *testing.T) {
          synctest.Test(t, func(t *testing.T) {
              go TakeFirst([]any{11, 22, 33})
              synctest.Wait()
          })
      }
      
      
      
      panic: deadlock: main bubble goroutine has exited but blocked goroutines remain
      
      goroutine 10 [chan send (durable), synctest bubble 1]:
      sandbox.TakeFirst.func1()
          /tmp/sandbox/main_test.go:35 +0x2e
      created by sandbox.TakeFirst in goroutine 9
          /tmp/sandbox/main_test.go:34 +0x51
      
      goroutine 11 [chan send (durable), synctest bubble 1]:
      sandbox.TakeFirst.func1()
          /tmp/sandbox/main_test.go:35 +0x2e
      created by sandbox.TakeFirst in goroutine 9
          /tmp/sandbox/main_test.go:34 +0x51
      

      Using goroutineleak (zero items, the parent leaks):

      func main() {
          printLeaks(func() {
              go TakeFirst(nil)
          })
      }
      
      
      
      goroutine 19 [chan receive (leaked)]:
      main.TakeFirst({0x0, 0x0, 0x0?})
          /tmp/sandbox/main.go:40 +0xeb
      created by main.main.func1 in goroutine 1
          /tmp/sandbox/main.go:44 +0x1a
      

      Using goroutineleak (multiple items, children leak):

      func main() {
          printLeaks(func() {
              go TakeFirst([]any{11, 22, 33})
          })
      }
      
      
      
      goroutine 20 [chan send (leaked)]:
      main.TakeFirst.func1()
          /tmp/sandbox/main.go:35 +0x2e
      created by main.TakeFirst in goroutine 19
          /tmp/sandbox/main.go:34 +0x51
      
      goroutine 21 [chan send (leaked)]:
      main.TakeFirst.func1()
          /tmp/sandbox/main.go:35 +0x2e
      created by main.TakeFirst in goroutine 19
          /tmp/sandbox/main.go:34 +0x51
      

      Fix: Make the channel's buffer large enough to hold values from all child goroutines. Also, return early if the source collection is empty.

      Try changing the TakeFirst implementation as follows and see if both checks pass:

      func TakeFirst(items []any) {
          if len(items) == 0 {
              // Return early if the source collection is empty.
              return
          }
          // Make the channel's buffer large enough.
          ch := make(chan any, len(items))
      
          // Iterate over every item
          for _, item := range items {
              go func() {
                  ch <- process(item)
              }()
          }
      
          // Retrieve first result.
          <-ch
      }
      

      Orphans

      Inner goroutines leak because the client doesn't follow the contract described in the type's interface and documentation.

      Let's say we have a Worker type with the following contract:

      // A worker processes a queue of items one by one in the background.
      // A started worker must eventually be stopped.
      // Failing to stop a worker results in a goroutine leak.
      type Worker struct {
          // ...
      }
      
      // NewWorker creates a new worker.
      func NewWorker() *Worker
      
      // Start starts the processing.
      func (w *Worker) Start()
      
// Stop stops the processing.
      func (w *Worker) Stop()
      
      // Push adds an item to the processing queue.
      func (w *Worker) Push(item any)
      

      The implementation isn't particularly important — what really matters is the public contract.
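For the sake of having something concrete, here is one possible minimal implementation (an assumption for illustration, not the author's code): a queue channel plus a stop channel, where the background goroutine blocks in a select forever if Stop is never called.

type Worker struct {
    queue chan any
    stop  chan struct{}
}

func NewWorker() *Worker {
    return &Worker{queue: make(chan any), stop: make(chan struct{})}
}

// Start launches the background processing goroutine.
func (w *Worker) Start() {
    go w.run()
}

// Stop signals the background goroutine to exit.
func (w *Worker) Stop() {
    close(w.stop)
}

// Push adds an item to the processing queue (or gives up if stopped).
func (w *Worker) Push(item any) {
    select {
    case w.queue <- item:
    case <-w.stop:
    }
}

func (w *Worker) run() {
    for {
        select {
        case item := <-w.queue:
            _ = item // process the item
        case <-w.stop:
            return // without Stop, this goroutine is blocked forever
        }
    }
}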

      Let's say the client breaks the contract and doesn't stop the worker:

      func Orphans() {
          w := NewWorker()
          w.Start()
          // defer w.Stop() // (X) uncomment to fix
      
          items := make([]any, 10)
          for _, item := range items {
              w.Push(item)
          }
      }
      

      Then the worker goroutines will leak, just like the documentation says.

      Using synctest:

      func Test(t *testing.T) {
          synctest.Test(t, func(t *testing.T) {
              Orphans()
              synctest.Wait()
          })
      }
      
      
      
      panic: deadlock: main bubble goroutine has exited but blocked goroutines remain
      
      goroutine 10 [select (durable), synctest bubble 1]:
      sandbox.(*Worker).run(0xc00009c190)
          /tmp/sandbox/main_test.go:113 +0xcc
      created by sandbox.(*Worker).Start.func1 in goroutine 9
          /tmp/sandbox/main_test.go:89 +0xb6
      
      goroutine 11 [select (durable), synctest bubble 1]:
      sandbox.(*Worker).run(0xc00009c190)
          /tmp/sandbox/main_test.go:113 +0xcc
      created by sandbox.(*Worker).Start.func1 in goroutine 9
          /tmp/sandbox/main_test.go:90 +0xf6
      

      Using goroutineleak:

      func main() {
          printLeaks(func() {
              Orphans()
          })
      }
      
      
      
      goroutine 19 [select (leaked)]:
      main.(*Worker).run(0x147fe4630000)
          /tmp/sandbox/main.go:112 +0xce
      created by main.(*Worker).Start.func1 in goroutine 1
          /tmp/sandbox/main.go:88 +0xba
      
      goroutine 20 [select (leaked)]:
      main.(*Worker).run(0x147fe4630000)
          /tmp/sandbox/main.go:112 +0xce
      created by main.(*Worker).Start.func1 in goroutine 1
          /tmp/sandbox/main.go:89 +0x105
      

      Fix: Follow the contract and stop the worker to make sure all goroutines are stopped.

      Try uncommenting the ⓧ line and see if both checks pass.

      Final thoughts

      Thanks to improvements in Go 1.24-1.26, it's now much easier to catch goroutine leaks, both during testing and in production.

      The synctest package is available in 1.24 (experimental) and 1.25+ (production-ready). If you're interested, I have a detailed interactive guide on it.

      The goroutineleak profile will be available in 1.26 (experimental). According to the authors, the implementation is already production-ready. It's only marked as experimental so they can get feedback on the API, especially about making it a new profile.

Check the proposal and the commits for more details on goroutineleak.

      P.S. If you are into concurrency, check out my interactive book.

    15. 🔗 Kagi release notes Dec 18th, 2025 - Popular areas land in Kagi Maps rss

      Kagi Maps

      We're continuously improving Kagi Maps, and with the latest release we've added a new data layer: Popular Areas. It highlights the busiest and most frequented spots when you're exploring a new city.

      Two side-by-side mobile screenshots displaying map features on kagi.com. The
left screen shows a map view with arrows highlighting a layers icon and a
'Popular Areas' option within the layer menu; the caption reads 'Addition of
Popular Areas'. The right screen displays business details for Black Crown
Coffee Company, with arrows pointing to a menu button and a 'Report an issue'
tooltip; the caption reads 'Additional Data & Ability to Report An
Issue'

      New Global Map Layer:

      • Highlights most popular areas where people congregate near Cafes/Restaurants/Shops/Cultural-Centers

      POI Infoboxes have more 3rd party external links:

      • OpenStreetMap, Wikipedia, Google Maps, Apple Maps
      • Reviews on Yelp and TripAdvisor
      • Social media profiles (Facebook, Instagram, Twitter)
      • Reservations via OpenTable
      • easier-to-read opening hours with weekly schedules
      • Direct links to restaurant menus when available

      Strengthening ties to OpenStreetMap Community:

• Ability to report map issues to OpenStreetMap directly. A new "Report an issue" option in the Infobox connects you to OpenStreetMap's note system, where you can flag errors or suggest improvements to the underlying map data.

      Additional Map Data:

      • POI data now preloads in the background for faster navigation when clicking markers or search results
      • Mobile-optimized zoom controls for smoother touch interaction
      • Sorting preferences (distance, rating, price) now persist across sessions
• Faster POI load times on click, using short-term caching
      • Middle-click support on search results and sorting buttons

      Various ad-hoc bug fixes and database improvements:

      • Improved caching system for POI data reducing redundant API calls
      • Better location cookie handling using kagi_precise_location
      • Various improvements to our POI-matching algorithms
      • UI rendering fixes

      Kagi Search

      • Location management is now available in settings, where you can view and update your location at any time. Kagi uses either a coarse location estimated from your IP address or, if you opt in, your device's precise location. This is stored only on your device as a cookie. It supports local-intent searches (e.g. "petrol stations near me") and sets the initial map position in Kagi Maps.
        screenshot of the Kagi settings interface with the 'Search' and 'Privacy'
tabs selected. A red box highlights the 'Location' section, which contains a
description of how location is determined, a 'Fetch precise location' button,
and text indicating the current IP-based
location

      • Incorrect geoip location #9194 @klandarey

      • Searching for 'Pop! OS (System76)' redirects to incendar.com #9053 @gigabit-jack
      • Summarizer fails on all YouTube videos, "Sorry, no transcript could be found for this video." #9278 @urrlich
      • Kid accounts cannot select a companion. #9246 @leuchtthurm
      • Quick answer responds in Indonesian, despite results being English #9237 @zq40000
      • Can't get to the consumption page from a team plan account #7265 @Thibaultmol-kagi
      • Add an indicator to the shield for websites marked by Surveillance Watch #8912 @pma_snek
      • !tr as the regional bang for Turkey #6376 @GERGE
      • Quick Answer shifts layout on mobile #9171 @hmnd
      • Context menus for inline news and videos are stuck inside the frame #9127 @pma_snek

      Kagi Assistant

      You can now effortlessly navigate your threads and jump to specific messages with our new thread scrollbar.

      • We've made the following model upgrades:
        • Grok 4 Fast and GPT 5 have been updated to their latest versions
        • Retired Mistral Medium in favor of Mistral Large
      • Add a column to the Custom Assistants settings table that displays each assistant's associated bang #7440 @jogojapan
      • Kagi Mobile Assistant: Tapping or holding a model name should prompt the model info box #8023 @__
      • Claude output cut off around 6500 tokens #9265 @igakagi
      • When using Web Access, Kagi Assistant searches too few sources #6149 @Mar
      • Buttons to quickly jump between chats in an Assistant thread #9232 @brrrendan
      • Right click on highlighted text cause thinking, search, plan and etc to expend #9117 @rxzlion
      • Assistant using 2024 as the search year in 2025 #8350 @blackbird2150
      • Update Grok Fast to 4.1 #9190 @ldmitch
      • Sharing page for Assistant broken #9176 @catwars
      • Research Assistant image generation should allow you to specify higher resolution than 1024x1024 #9156 @jmp242
      • Navigating between versions of the same prompt is broken with 3 prompts after page reload in Kagi Assistant #7134 @bsamek

      Post of the week

      Here is this week's featured social media mention:

      Mastodon post from Gonzalo Fernandez Gomez stating: I did it. I am
officially a member of the Kagi family. I upgraded as soon as I used up my
trial. People said they would never pay for TV. Now everybody does. I'm pretty
confident the same will happen with search. It's whether you control your
search experience or advertisers do. You
choose.

      We truly appreciate your support in spreading the word, so be sure to follow us and tag us in your comments!

      Is your browser a rat?

      Check out this fun video we made for Orion. We also made this comic in collaboration with artist Chaz Hutton to show why we built Orion to be your trusted daily companion for the web:

      Stick figure illustrations showing six internet activities with Orion
Browser: web exploration, password security, data privacy, ad blocking, and
tab
management.

      End-of-Year Community Event

      Join us tomorrow, December 19, at 09:00 PST (convert to local time) for Kagi's annual community event, covering major updates, launches, and what's next. Plus live Q&A with the Kagi team. Register via Zoom. Looking forward to seeing you there!

    16. 🔗 r/wiesbaden Yours Sportsbar hat geschlossen rss
    17. 🔗 batrachianai/toad The Toad is out of the bag release

      This is the first public release of Toad.

      Still plenty to do, but it is quite usable. Hope you enjoy it!

    18. 🔗 Confessions of a Code Addict How PyTorch Generates Random Numbers in Parallel on the GPU rss

GPUs power modern deep learning models because these models rely on tensor operations, which can be efficiently parallelized on GPUs with their thousands of cores. However, apart from tensor computations, these models also rely on random numbers: for example, to initialize model weights, apply dropout, sample data, and drive stochastic gradient descent.

      So, the question arises: how do frameworks like PyTorch generate random numbers in parallel on GPU devices? Because if random number generation becomes a bottleneck, it can significantly slow down the entire training or inference pipeline.

The answer lies in a clever algorithm called Philox, a counter-based parallel random number generator. In this article, we'll explore:

      1. Why traditional random number generators don't parallelize well

      2. How Philox works and what makes it different

      3. How to parallelize random number generation using Philox

      4. PyTorch's implementation of Philox by dissecting its C++ and CUDA code

      By the end, you'll understand how that simple torch.randn() call efficiently generates millions of random numbers in parallel on your GPU while maintaining perfect reproducibility.


      Cut Code Review Time & Bugs in Half (Sponsored)

Code reviews are critical but time-consuming. CodeRabbit acts as your AI co-pilot, providing instant code review comments and the potential impacts of every pull request.

      Beyond just flagging issues, CodeRabbit provides one-click fix suggestions and lets you define custom code quality rules using AST Grep patterns, catching subtle issues that traditional static analysis tools might miss.

CodeRabbit has so far reviewed more than 10 million PRs, is installed on 2 million repositories, and is used by 100 thousand open-source projects. CodeRabbit is free for all open-source repos.

      Get Started Today


      Problem with Traditional PRNGs

      Let's start by developing an intuition about why traditional pseudo random number generators (PRNGs) are sequential and not suitable for parallel hardware, such as GPUs.

      A PRNG needs to be able to reproduce the same sequence of random numbers when initialized with a specific seed. A natural way of achieving this is through a state transformation function that takes the current state of the generator as input and produces a new state. As long as the function is deterministic, it is guaranteed that we can reproduce the exact same sequence of numbers starting from the same initial state. Mathematically, it can be expressed like this:

s_{n+1} = f(s_n)

Here, the next state s_{n+1} is derived by applying the function f to the current state s_n. As you can see, this is a sequential model: you can't jump ahead arbitrarily without computing all the previous states, and you can't shard the generation of random numbers by distributing the work across threads.

      To parallelize the generation of random numbers, we need a different model where we can directly generate the nth random number without having to go through the generation of all the previous n-1 numbers. Mathematically, it should look like this:

x_n = b(n)

Here, x_n is the nth random number, generated by applying a function b directly to the counter n. We can think of the input n as an integer counter, which is why PRNGs that follow this model are called counter-based random number generators. One such counter-based PRNG is Philox, used widely in frameworks such as PyTorch for parallel random number generation on GPUs.

      Let's understand how Philox works.

      Writing these deep dives takes 100+ hours of work. If you find this valuable and insightful, please consider upgrading to a paid subscription to keep this work alive.


      How Philox Works

The Philox algorithm, short for "Product HI, LOw, with XOR", is a counter-based PRNG that was designed specifically for parallel computation. It was introduced by Salmon et al. in 2011 as part of the Random123 library. The key insight behind Philox is that we can use a cryptographic-like construction to transform a counter into a pseudorandom number.

      The Core Idea: Treating RNG as Encryption

We can think of the counter-based RNG problem this way: we want to take a sequence of integers (0, 1, 2, 3, …) and scramble them so thoroughly that they appear random. This is conceptually similar to what a block cipher does in cryptography: it takes a plaintext message and a key, then produces a ciphertext that looks random.

      In Philox's case:

      • The counter (n) acts like the plaintext

      • The seed acts like the encryption key

      • The output is our pseudorandom number

Philox takes a counter and a key (derived from the seed) as its input and produces a random number as its output

      The beauty of this approach is that any thread can independently compute its random number by knowing just two things: which counter value it needs (its position in the sequence) and the seed. No synchronization or communication with other threads is needed.

      The Philox Construction

Philox operates on fixed-size inputs and outputs. The most common variant is Philox-4x32, which means:

      • 4 : Works with 4 32-bit integers at a time

      • 32 : Each integer is 32 bits wide

      So Philox-4x32 takes a 128-bit counter (represented as four 32-bit integers) and produces a 128-bit output (four 32-bit random numbers). This is perfect for generating multiple random numbers at once, which is common in GPU workloads.

      The algorithm consists of applying multiple rounds of a transformation function. Each round performs these operations:

      1. Multiplication and splitting : Multiply pairs of the input integers and split the results into high and low parts

      2. XOR with keys : XOR certain parts with key-derived values

      3. Permutation : Shuffle the positions of the integers

Let's break down a single round in detail. Philox-4x32 works with four 32-bit values, which we'll call (c0, c1, c2, c3). Each round transforms these values through the following steps:

      Step 1: Multiply and Split

Take the first pair (c0, c1) and the second pair (c2, c3). From each pair, one value is multiplied by a carefully chosen constant: prod0 = c0 × M0 and prod1 = c2 × M1.

      For Philox-4x32, these constants are:

• M0 = 0xD2511F53

• M1 = 0xCD9E8D57

      These constants were chosen through careful analysis to ensure good statistical properties. When we multiply two 32-bit numbers, we get a 64-bit result. We split this into:

      • High 32 bits : hi(prod)

      • Low 32 bits : lo(prod)

The multiplication of two 32-bit values c0 and M0 produces a 64-bit result which is split into hi and lo parts
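In Go-style pseudocode (an illustration, not PyTorch's actual code), the multiply-and-split step is just:

// mulHiLo32 multiplies two 32-bit values into a 64-bit product and
// splits it into the high and low 32-bit halves.
func mulHiLo32(a, b uint32) (hi, lo uint32) {
    prod := uint64(a) * uint64(b)
    return uint32(prod >> 32), uint32(prod)
}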

      Step 2: XOR with Keys

      The high parts are XORed with round-specific keys derived from the seed, and with the other input values:

Here, k0 and k1 are the key values (derived from the seed), and ⊕ represents the XOR operation.

      Step 3: Permutation

      Finally, we rearrange the values for the next round. The output of one round becomes:

      Notice how the values are shuffled: the low parts of the products go to positions 0 and 2, while the XORed high parts are swapped and go to positions 1 and 3.

      Multiple Rounds

      To achieve good randomness, Philox-4x32 typically applies 10 rounds of this transformation. After each round except the last, the keys are also updated:

k0 ← k0 + w0, k1 ← k1 + w1

where w0 = 0x9E3779B9 and w1 = 0xBB67AE85 are the "Weyl sequence" constants derived from the golden ratio. This ensures that each round uses different key material, increasing the mixing of the input bits.

      Visualizing a Complete Philox Transformation

      The following diagram shows the complete flow through multiple rounds:

The complete Philox transformation across multiple rounds producing four 32-bit random integers

      Why This Works

      The Philox algorithm achieves good randomness through several mechanisms:

      1. Multiplication is a non-linear operation that mixes bits effectively. Small changes in input lead to large changes in output.

      2. High-low splitting ensures we use all 64 bits of the multiplication result, not just the lower 32 bits.

      3. XOR operations combine different data streams (keys, previous values) in a way that's invertible but unpredictable without knowing the key.

      4. Permutation ensures that the mixing effect propagates to all output positions across rounds.

      5. Multiple rounds compound these effects, ensuring that every output bit depends on every input bit in a complex way.

      The algorithm has been extensively tested and passes standard statistical tests for randomness like the TestU01 suite, making it suitable for scientific computing and machine learning applications.

      Properties of Philox

      Before we dive into PyTorch's implementation, let's summarize the key properties that make Philox attractive:

      • Parallel-friendly : A GPU with thousands of cores can generate thousands of random numbers simultaneously, each using a different counter value.

      • Deterministic : Given the same seed and counter, you always get the same output.

• Long period : With a 128-bit counter, you can generate 2^128 random numbers before the sequence repeats, which is more than enough for any practical application.

      • Fast : The operations (multiplication, XOR, bit shifting) are primitive operations that run very efficiently on modern CPUs and GPUs.

      • Memory efficient : The generator state is just the counter and key, requiring minimal storage per thread.

      Next, let's understand how Philox can be parallelized.


      Parallelizing Philox: Subsequences and Offsets

      Now that we understand how the Philox algorithm works, let's explore what makes it particularly powerful for parallel computing: the ability to generate random numbers across thousands of threads simultaneously without any coordination.

      The Random Number Space

Recall that Philox is a counter-based PRNG. At its core, it's a function that maps a 128-bit counter (under a fixed key) to a 128-bit random output.

Given a fixed key (derived from the seed), each unique counter value produces a unique set of random numbers. Since the counter is 128 bits wide, there are 2^128 possible counter values.

      Each counter value produces 4 random 32-bit numbers (since 128 bits = 4 × 32 bits), giving us an enormous space of random numbers. We can visualize this as a huge one-dimensional array:

      Counter: 0 1 2 3 ... 2^128-1
      
      ↓ ↓ ↓ ↓ ↓
      
      Output: [r₀,r₁,r₂,r₃][r₄,r₅,r₆,r₇][r₈,r₉,r₁₀,r₁₁][r₁₂,...]...[...]
      

      How do we partition this massive space across parallel threads? One approach is to split the counter space between the threads.

      Partitioning the Counter Space

      The key insight is that we can split the 128-bit counter into two parts and use them to create a 2D address space. Think of the counter as having 4 components of 32 bits each: (c 0​,c 1​,c 2​,c 3​).

      We can partition this as:

      • Upper 64 bits : Which thread's region we're in

      • Lower 64 bits : The position within a thread's assigned region

      This partitioning scheme gives each thread its own "slice" of the random number space:

• Thread 0 gets counters: (∗,∗,0,0) where ∗ can be any value

      • counter = (0,0,0,0) -> first 4 random numbers for thread 0

      • counter = (1,0,0,0) -> next 4 random numbers for thread 0

      • counter = (2,0,0,0) -> next 4 random numbers for thread 0

      • Thread 1 gets counters: (∗,∗,1,0)

      • counter = (0,0,1,0) -> first 4 random numbers for thread 1

      • counter = (1,0,1,0) -> next 4 random numbers for thread 1

      • counter = (2,0,1,0) -> next 4 random numbers for thread 1

      • Thread 2 gets counters: (∗,∗,2,0)

      • counter = (0,0,2,0) -> first 4 random numbers for thread 2

      • And so on…

      Terminology: Subsequence and Offset

      We now give names to these two parts:

      Subsequence : The upper 64 bits of the counter. This identifies which parallel thread or stream we're referring to. We can have up to 2^64 different subsequences running in parallel.

      Offset : The lower 64 bits of the counter. This identifies the position within a subsequence. Each subsequence can generate up to 2^64 sets of random numbers.

Together, they form a coordinate system (s, o) where:

      • s is the subsequence (which parallel stream)

      • o is the offset (position in that stream)

The total capacity is 2^64 subsequences × 2^64 offsets = 2^128 counter values.

This matches exactly the size of our original counter space; we've simply reorganized it into a 2D structure that's easy to partition across threads.
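As a rough Go-style sketch (illustrative only, not PyTorch's code), packing these two coordinates back into the 128-bit counter looks like this:

// makeCounter packs a 64-bit subsequence (upper half) and a 64-bit offset
// (lower half) into the 128-bit Philox counter, stored as four 32-bit words.
func makeCounter(subsequence, offset uint64) [4]uint32 {
    return [4]uint32{
        uint32(offset),            // c0: low 32 bits of the offset
        uint32(offset >> 32),      // c1: high 32 bits of the offset
        uint32(subsequence),       // c2: low 32 bits of the subsequence
        uint32(subsequence >> 32), // c3: high 32 bits of the subsequence
    }
}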

      How Offsets Increment

      When a thread generates more random numbers, it increments the offset portion of the counter. Since Philox generates 4 random numbers at once, we typically increment by 1 each time (remembering that each offset value produces 4 numbers):

      Thread 0 subsequence = 0:
      
      offset=0: counter=[0,0,0,0] -> Philox -> [rand₀, rand₁, rand₂, rand₃]
      
      offset=1: counter=[1,0,0,0] -> Philox -> [rand₄, rand₅, rand₆, rand₇]
      
      offset=2: counter=[2,0,0,0] -> Philox -> [rand₈, rand₉, rand₁₀, rand₁₁]
      
      ...
      

      The offset is really tracking "which batch of 4" we're on. If we need the 10th random number (index 9, counting from 0):

      • Offset = ⌊9/4⌋=2

• Position within batch = 9 mod 4 = 1

      • So we use counter [2,0,0,0] and take the second output (index 1)
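The same index arithmetic as a tiny sketch (illustrative, not library code):

// philoxPosition maps a linear random-number index to the Philox offset
// (which batch of four) and the lane within that batch. For example,
// philoxPosition(9) returns offset 2 and lane 1: counter [2,0,0,0]
// produces the batch, and we take its second output.
func philoxPosition(index uint64) (offset, lane uint64) {
    return index / 4, index % 4
}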

      The Power of Skip-Ahead

      One powerful consequence of this design is skip-ahead : a thread can jump directly to any offset without computing intermediate values.

      Thread 0:
      
      - Jump to offset 1,000,000: counter = [1000000, 0, 0, 0]
      
      - Generate random numbers at this position
      
      - Jump to offset 5,000,000: counter = [5000000, 0, 0, 0]
      
      - No need to compute offsets 1 through 4,999,999!
      

This is impossible with traditional sequential PRNGs where state n+1 depends on state n.

      Setting Up for PyTorch

      Now that we understand how the counter space is partitioned, we can see how PyTorch uses this:

      When PyTorch generates random numbers on a GPU:

      1. It launches many threads (e.g., 1024 threads)

      2. Each thread is assigned a unique subsequence number (typically its thread ID)

      3. Each thread starts at offset 0 within its subsequence

      4. As each thread generates random numbers, it increments its offset

      5. PyTorch tracks the global offset to ensure future operations don't reuse the same counters

      With this foundation, let's now explore how PyTorch implements these concepts in its Philox engine.


      Philox Implementation in PyTorch

      PyTorch uses Philox-4x32-10 (4 values of 32 bits, 10 rounds) as its primary PRNG for CUDA operations. The implementation lives in aten/src/ATen/core/PhiloxRNGEngine.h and is designed to work on both CPU and GPU (via CUDA). Let's dissect this implementation to understand how the theoretical concepts we discussed earlier translate into actual code.

      Core Data Structures

      The implementation starts by defining some type aliases for clarity:

      typedef std::array<uint32_t, 4> UINT4; // Four 32-bit integers
      
      typedef std::array<uint32_t, 2> UINT2; // Two 32-bit integers
      
      typedef std::array<double, 2> DOUBLE2; // Two doubles
      
      typedef std::array<float, 2> FLOAT2; // Two floats
      

      These typedefs make the code more readable. UINT4 represents the 128-bit counter or output (4 × 32 bits = 128 bits), while UINT2 represents the 64-bit key (2 × 32 bits = 64 bits).

      The PhiloxEngine Class Structure

      The philox_engine class maintains four critical pieces of state:

      private:
      
      detail::UINT4 counter_; // 128-bit counter (c₀, c₁, c₂, c₃)
      detail::UINT4 output_; // Cached output from last round
      detail::UINT2 key_; // 64-bit key derived from seed (k₀, k₁)
      uint32_t STATE; // Position in current output (0-3)
      

      Let's understand each field:

      counter_: This is the 128-bit counter that gets incremented and transformed through the Philox rounds. It's divided into four 32-bit components:

      • counter_[0] and counter_[1]: Lower 64 bits represent the offset (which random number in the subsequence)

      • counter_[2] and counter_[3]: Upper 64 bits represent the subsequence (which parallel stream)

      key_: The 64-bit key derived from the seed. This remains constant for a given seed and is used in the XOR operations during each round.

      output_: Philox generates 4 random 32-bit numbers at once. This field caches those numbers so we don't have to recompute them for every call.

      STATE: A simple counter (0-3) that tracks which of the four cached output values to return next. This is an optimization to avoid regenerating when we have unused random numbers.

      Initialization and State Management

      The constructor initializes the engine with a seed, subsequence, and offset:

The philox_engine constructor definition

The C10_HOST_DEVICE macro is crucial here: it tells the compiler that this function can run on both the CPU (host) and the GPU (device). This allows the same code to be used in both contexts.

      Let's look at how reset_state sets up the initial state:

The reset_state function that resets the state of the philox_engine

      This initialization strategy is clever:

      1. The seed is split into the two key components key_[0] and key_[1]

      2. The subsequence goes into the upper half of the counter (counter_[2] and counter_[3])

      3. The offset (lower half of counter) starts at zero but can be set later via incr_n(offset)
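Since the source itself isn't reproduced above, here is a hedged sketch of what reset_state does, written purely from this description (the real function in PhiloxRNGEngine.h may differ in details; the typedefs and member fields are the ones shown earlier):

    void reset_state(uint64_t seed, uint64_t subsequence) {
        key_[0] = static_cast<uint32_t>(seed);                    // low 32 bits of the seed
        key_[1] = static_cast<uint32_t>(seed >> 32);              // high 32 bits of the seed
        counter_[0] = 0;                                          // offset starts at zero
        counter_[1] = 0;
        counter_[2] = static_cast<uint32_t>(subsequence);         // subsequence goes into
        counter_[3] = static_cast<uint32_t>(subsequence >> 32);   // the upper 64 bits
        STATE = 0;                                                // no cached outputs yet
    }
    // The constructor then calls incr_n(offset) to start at a nonzero offset.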

      This design allows for massive parallelism. Imagine running 1024 CUDA threads simultaneously:

      Thread 0: subsequence=0, offset=0 -> counter = [0, 0, 0, 0]
      
      Thread 1: subsequence=1, offset=0 -> counter = [0, 0, 1, 0]
      
      Thread 2: subsequence=2, offset=0 -> counter = [0, 0, 2, 0]
      
      ...
      
      Thread 1023: subsequence=1023, offset=0 -> counter = [0, 0, 1023, 0]
      

      Each thread has a unique counter value from the start, so they all generate independent random sequences without any coordination.

      The Core Algorithm: Single Round

      Now let's examine the heart of the Philox algorithm--the single_round function:

The single_round function that implements one round of Philox

      Let's break this down step by step, mapping it to our earlier theoretical description:

      Step 1: Multiply and Split

      uint32_t lo0 = mulhilo32(kPhiloxSA, ctr[0], &hi0);
      uint32_t lo1 = mulhilo32(kPhiloxSB, ctr[2], &hi1);
      

      Here we multiply:

      • ctr[0] by kPhiloxSA (the constant 0xD2511F53)

      • ctr[2] by kPhiloxSB (the constant 0xCD9E8D57)

      The mulhilo32 function performs the multiplication and splits the 64-bit result:

      • Returns the low 32 bits (lo0 or lo1)

      • Stores the high 32 bits in the passed pointer (hi0 or hi1)

      Let's look at mulhilo32 itself:

The definition of the mulhilo32 function

      This function has two implementations:

      On CUDA (GPU) : Uses the intrinsic __umulhi which directly computes the high 32 bits of a multiplication. This is extremely fast on GPU hardware.

      On CPU : Promotes both operands to 64 bits, multiplies them, then extracts high and low parts manually via shifting and casting.

Here's what happens mathematically: the full 64-bit product a × b is split into a high half, hi = ⌊(a × b) / 2^32⌋, and a low half, lo = (a × b) mod 2^32.
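A sketch that captures both paths (not the verbatim PyTorch source; __umulhi is the CUDA intrinsic mentioned above, and C10_HOST_DEVICE is the host/device macro discussed earlier):

    C10_HOST_DEVICE inline uint32_t mulhilo32(uint32_t a, uint32_t b, uint32_t* result_high) {
    #ifdef __CUDA_ARCH__
        *result_high = __umulhi(a, b);       // GPU: intrinsic returns the high 32 bits
        return a * b;                        // low 32 bits (unsigned wrap-around)
    #else
        const uint64_t product = static_cast<uint64_t>(a) * b;   // CPU: full 64-bit product
        *result_high = static_cast<uint32_t>(product >> 32);     // high half
        return static_cast<uint32_t>(product);                   // low half
    #endif
    }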

      Step 2: XOR and Permute

      ret[0] = hi1 ^ ctr[1] ^ in_key[0];
      ret[1] = lo1;
      ret[2] = hi0 ^ ctr[3] ^ in_key[1];
      ret[3] = lo0;
      

      Notice the pattern:

      • ret[0]: Takes hi1 (high bits from second multiplication), XORs with ctr[1] and in_key[0]

      • ret[1]: Simply uses lo1 (low bits from second multiplication)

      • ret[2]: Takes hi0 (high bits from first multiplication), XORs with ctr[3] and in_key[1]

      • ret[3]: Simply uses lo0 (low bits from first multiplication)

      Let us visualize this transformation:

Visualization of the operations performed during a single round of Philox

      This permutation ensures that bits from different positions get mixed together in subsequent rounds.
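Putting the two steps together, a full round is only a handful of lines. The following is assembled from the snippets above (using the typedefs and constants introduced in this article), not copied verbatim from the PyTorch source:

    static inline detail::UINT4 single_round(detail::UINT4 ctr, detail::UINT2 in_key) {
        uint32_t hi0 = 0, hi1 = 0;
        // Step 1: multiply by the Philox constants and split into high/low halves
        uint32_t lo0 = mulhilo32(kPhiloxSA, ctr[0], &hi0);
        uint32_t lo1 = mulhilo32(kPhiloxSB, ctr[2], &hi1);

        // Step 2: XOR with the remaining counter words and the key, then permute
        detail::UINT4 ret;
        ret[0] = hi1 ^ ctr[1] ^ in_key[0];
        ret[1] = lo1;
        ret[2] = hi0 ^ ctr[3] ^ in_key[1];
        ret[3] = lo0;
        return ret;
    }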

      Constants: The Magic Numbers

      You might wonder where these constants come from:

      static const uint32_t kPhilox10A = 0x9E3779B9; // Weyl sequence
      static const uint32_t kPhilox10B = 0xBB67AE85; // Weyl sequence
      static const uint32_t kPhiloxSA = 0xD2511F53; // Multiplier
      static const uint32_t kPhiloxSB = 0xCD9E8D57; // Multiplier
      

Weyl sequence constants (kPhilox10A and kPhilox10B): These are fractional parts of well-known irrational numbers scaled to 32 bits: 0x9E3779B9 ≈ 2^32 × (φ − 1), where φ ≈ 1.618 is the golden ratio, and 0xBB67AE85 ≈ 2^32 × (√3 − 1).

      The golden ratio has special properties that make it useful for distributing values uniformly. These constants are added to the key after each round to ensure different key material is used.

      Multiplier constants (kPhiloxSA and kPhiloxSB): These were carefully chosen through empirical testing to maximize statistical quality. They need to have good bit-mixing properties when multiplied with typical counter values.

      Running Multiple Rounds

      The rand function orchestrates running all rounds:

Definition of the rand function that applies multiple rounds of Philox to produce random numbers

      This is straightforward:

      1. Run n_rounds - 1 iterations where we:

        1. Apply single_round to transform the counter

        2. Update the key by adding the Weyl constants

      2. Apply one final round without updating the key

      By default, PyTorch uses 10 rounds (n_rounds = 10), which provides a good balance between performance and statistical quality.
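Sketched from that description (a paraphrase, not the literal source):

    static inline detail::UINT4 rand(detail::UINT4 counter, detail::UINT2 key, uint32_t n_rounds) {
        for (uint32_t round = 0; round + 1 < n_rounds; ++round) {
            counter = single_round(counter, key);   // mix the counter
            key[0] += kPhilox10A;                   // bump the key with the Weyl constants
            key[1] += kPhilox10B;
        }
        return single_round(counter, key);          // final round; the key is left unchanged
    }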

      Generating Random Numbers: The Operator

      The operator () is what users call to get random numbers:

Definition of the operator() that is called by users to generate random numbers

      This function is clever in its efficiency:

      Check if we need new random numbers : if(STATE == 0) checks if we've exhausted the previous batch. Remember, STATE cycles through 0, 1, 2, 3.

      Generate a batch : When needed, it:

      • Runs the full Philox algorithm via rand(counter, key, n_rounds)

      • Stores the result in output_ (four 32-bit random numbers)

      • Increments the counter for next time via incr()

      Return next value : Grab the current position from output_, then advance STATE.

      The line STATE = (STATE + 1) & 3 is a bit trick equivalent to STATE = (STATE + 1) % 4, using bitwise AND since 3 is binary 11.

      This batching strategy is a significant performance optimization. Instead of running Philox for every random number, we run it once per four random numbers.
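The whole operator can be sketched in a few lines (a hedged reconstruction from the description above, assuming the member fields and the rand/incr functions shown in this article):

    uint32_t operator()(uint32_t n_rounds = 10) {
        if (STATE == 0) {                              // previous batch of 4 is exhausted
            output_ = rand(counter_, key_, n_rounds);  // generate 4 fresh 32-bit values
            incr();                                    // advance the counter for the next batch
        }
        uint32_t value = output_[STATE];               // hand out the next cached value
        STATE = (STATE + 1) & 3;                       // cycle 0 -> 1 -> 2 -> 3 -> 0
        return value;
    }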

      Counter Increment Logic

      The counter increment operations deserve special attention because they handle the 128-bit arithmetic correctly. Let's start with the simple case:

Definition of the incr function that increments the counter

      This increments the 128-bit counter by 1. The logic is:

      1. Increment counter_[0] (least significant 32 bits)

      2. If it's non-zero after increment, we're done (no overflow)

      3. If it overflowed to zero, carry to counter_[1]

      4. Continue propagating carries until we find a non-zero result
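In code, the carry chain is a cascade of increments (a sketch of the logic just described, not the verbatim source):

    void incr() {
        if (++counter_[0]) return;   // no wrap in the least significant word: done
        if (++counter_[1]) return;   // it wrapped to zero, so carry into the next word
        if (++counter_[2]) return;   // keep propagating the carry upward
        ++counter_[3];               // most significant word (wraps only after 2^128 steps)
    }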

      The more complex function is incr_n, which increments by an arbitrary 64-bit value:

Definition of the incr_n function that increments the counter by an arbitrary 64-bit value

      This function is more intricate because it needs to:

      1. Split the 64-bit increment n into nlo and nhi

      2. Add nlo to counter_[0]

      3. Detect overflow by checking if counter_[0] < nlo (if the result is less than what we added, overflow occurred)

      4. If overflow, increment nhi to carry over

      5. Add nhi to counter_[1] and check for overflow again

      6. If still overflowing, propagate to the upper 64 bits

The overflow detection counter_[0] < nlo is a standard technique in multi-precision arithmetic. After adding, if the result is less than one of the operands, an overflow must have occurred since we're working with unsigned integers.
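Here is a sketch that follows those six steps (the real incr_n in PhiloxRNGEngine.h is organized slightly differently, but the carry handling is the same idea):

    void incr_n(uint64_t n) {
        uint32_t nlo = static_cast<uint32_t>(n);         // low 32 bits of the increment
        uint32_t nhi = static_cast<uint32_t>(n >> 32);   // high 32 bits of the increment

        counter_[0] += nlo;
        if (counter_[0] < nlo) {            // result smaller than an operand => overflow
            if (++nhi == 0) {               // the carry made nhi itself wrap to zero
                if (++counter_[2] == 0) ++counter_[3];   // carry straight into the upper 64 bits
                return;                     // adding nhi == 0 to counter_[1] is a no-op
            }
        }
        counter_[1] += nhi;
        if (nhi != 0 && counter_[1] < nhi) {             // overflow out of the lower 64 bits
            if (++counter_[2] == 0) ++counter_[3];
        }
    }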

      Converting to Floating Point

      For machine learning applications, we often need floating-point random numbers in the range [0, 1), while Philox gives us integers. So, PyTorch applies a conversion function:

Definition of the uint32_to_uniform_float function that converts a 32-bit integer to a float value in the range [0, 1)

      This function is carefully designed:

      Mask off sign bit : value & 0x7FFFFFFF clears the highest bit, giving us values from 0 to 2^31−1

      Scale down : Multiplying by scale = 4.6566127342e-10 maps these integers to floats in [0, 1).

The scale factor is approximately 1/2^31: the constant 4.6566127342 × 10^-10 sits just below 2^-31 ≈ 4.6566128731 × 10^-10, so even the largest masked input, 2^31 − 1, maps strictly below 1.0.

      Why use only 31 bits instead of all 32? Because:

      1. We want only positive values (for [0, 1) range)

      2. The highest representable float less than 1.0 needs careful handling

      3. Using 31 bits avoids potential rounding issues near 1.0
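Given that, the conversion is essentially one mask and one multiply (an illustrative sketch consistent with the description above, not the literal PyTorch source):

    float uint32_to_uniform_float(uint32_t value) {
        constexpr float kScale = 4.6566127342e-10f;          // just below 2^-31
        // Mask off the sign bit (keep 31 bits) and scale into [0, 1).
        return static_cast<float>(value & 0x7FFFFFFFu) * kScale;
    }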

      Normal Distribution Generation

      The randn function generates normally distributed random numbers using the Box-Muller transform:

Definition of the randn function that generates random numbers from a normal distribution

The Box-Muller transform converts two uniform random variables U₁, U₂ ~ Uniform(0, 1) into standard normal random variables Z ~ N(0, 1): Z₀ = √(−2 ln U₁) · cos(2π U₂) and Z₁ = √(−2 ln U₁) · sin(2π U₂).
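A self-contained sketch of one Box-Muller step (not the randn member itself, which also has to fetch the two uniforms from the engine):

    #include <cmath>

    // Box-Muller: turn two uniform samples u1, u2 in (0, 1] into one N(0, 1) sample.
    inline float box_muller(float u1, float u2) {
        constexpr float kTwoPi = 6.283185307f;
        float radius = std::sqrt(-2.0f * std::log(u1));   // u1 must be > 0 to avoid log(0)
        float angle  = kTwoPi * u2;
        return radius * std::cos(angle);                   // the matching sine gives a second sample
    }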

      Memory Layout and Efficiency

      One of the beauties of this implementation is how compact the state is. Each philox_engine instance requires:

      counter_: 4 × 4 bytes = 16 bytes
      
      output_: 4 × 4 bytes = 16 bytes
      
      key_: 2 × 4 bytes = 8 bytes
      
      STATE: 4 bytes = 4 bytes
      
      Total = 44 bytes
      

This is tiny! On a GPU, you could have millions of these generators running in parallel, each consuming only 44 bytes. In comparison, traditional RNGs can take kilobytes of state per instance.


      Summary

      In this article, we explored Philox, a counter-based PRNG designed for parallel computing environments. We learned:

1. Why traditional PRNGs don't parallelize well : Sequential state dependencies create bottlenecks on parallel hardware like GPUs.

      2. How Philox works : By treating random number generation as a function f(counter, key), Philox allows direct computation of any random number without computing predecessors.

3. The algorithm's core operations : Multiplication with carefully chosen constants, high-low splitting, XOR with key material, and permutation, repeated for 10 rounds to ensure statistical quality.

4. Parallelization through counter partitioning : The 128-bit counter space is split into subsequences (upper 64 bits) and offsets (lower 64 bits), allowing up to 2^64 parallel threads, each generating up to 2^64 batches of random numbers.

5. PyTorch's implementation : A compact 44-byte state per engine instance, efficient batching of 4 numbers at a time, and careful handling of counter arithmetic for both CPU and GPU execution.


Articles like this take time and research to get right. If you'd like to support more deep dives into CPU internals and performance engineering, you can upgrade to a paid subscription and help keep this work sustainable.


    19. 🔗 HexRaysSA/plugin-repository commits sync repo: +1 release, -1 release rss
      sync repo: +1 release, -1 release
      
      ## New releases
      - [SuperHint](https://github.com/p05wn/SuperHint): 1.2.2
      
      ## Changes
      - [SuperHint](https://github.com/p05wn/SuperHint):
        - removed version(s): 1.2.0
      
    20. 🔗 HexRaysSA/plugin-repository commits sync repo: +1 release, ~1 changed rss
      sync repo: +1 release, ~1 changed
      
      ## New releases
      - [SuperHint](https://github.com/p05wn/SuperHint): 1.2.0
      
      ## Changes
      - [SuperHint](https://github.com/p05wn/SuperHint):
        - 1.0.0: archive contents changed, download URL changed
      
    21. 🔗 Console.dev newsletter Runme rss

      Description: Devops notebooks.

      What we like: Like a lightweight Jupyter notebook for devops. Run inside your IDE e.g. VS Code, on Codespaces, via a CLI, or by launching a web UI. Integrates with cloud environments to launch resources from the notebook. Everything is based on Markdown so can be committed to source control.

      What we dislike: No way to determine how errors are detected or handled - commands aren’t idempotent.

    22. 🔗 Console.dev newsletter Tangled rss

      Description: Decentralized Git hosting.

      What we like: Host on your own infra using headless servers for both the repo and CI/CD. Issue tracking with threaded comments and pull requests which support Jujutsu change IDs and stacking. Uses the AT Protocol so you can own all your data.

What we dislike: No search for issues or pull requests. Requires using an existing Bluesky or other federated server account (they run their own that you can sign up with).

    23. 🔗 Will McGugan Toad is a unified experience for AI in the terminal rss

      My startup for terminals wrapped up mid-2025 when the funding ran dry. So I don’t have money, but what I do have are a very particular set of skills. Skills I have acquired over a very long career convincing terminals they are actually GUIs.

      Skills which I have used to create a terminal app that offers a more pleasant experience for agentic coding. Toad (a play on Textual Code) is a front-end for AI tools such as OpenHands, Claude Code, Gemini CLI, and many more. All of which run seamlessly under a single terminal UI, thanks to the ACP protocol.

      At the time of writing, Toad supports 12 agent CLIs, and I expect many more to come online soon.

      Here’s a screenshot:

      Toad UI

      So what does Toad offer over the CLI apps from big tech?

It has most of the UI interactions users have come to expect from agentic coding, but hopefully more refined. For instance, the “@” character to bring files into the context. Here’s Toad’s implementation:

      Toad fuzzy files

      A snappy fuzzy search which filters patterns from the project’s .gitignore (if there is one).

The prompt editor offers an experience which you might be surprised to find in a terminal. You can navigate and select with the keyboard and mouse, cut, copy, paste, etc. The prompt will highlight Markdown as you type (even syntax highlighting code fences before you close them).

      Toad prompt

      Toad has really nice Markdown streaming, based on the techniques I described here. It remains fast with large documents, and renders everything from tables to syntax highlighted code fences.

      Toad Markdown streaming

      Many other tools either don’t bother to render the Markdown, or they do a fairly half-hearted job.

      Another goal I had for Toad was to integrate a shell. I wanted the conversation with AI to feel like a natural extension of a traditional terminal based workflow.

      Most tools stop at displaying monochrome output from commands. Some will break if you run something interactive, like a TUI. Toad doesn’t have this limitation, and will let you run all your CLI apps with full color, interactivity, and mouse support.

      At the time of writing the only terminal based agentic coding tool I know of that runs dynamic commands inline is Gemini.

      Toad running htop

      Toad adopts the convention of using a ! character to introduce a shell command. There is also a list of commands in settings which will automatically trigger shell mode. In practice, this means that you rarely need to explicitly introduce shell commands—just type what’s on your mind.

      Toad borrows tab completion from the shell. You’ll appreciate this if you have worked in the terminal long enough to commit this interaction to muscle memory. Hit tab to complete the command or path. If there is more than one possibility you can hit tab again to cycle through them, and enter to accept.

      Toad tab complete

In addition to the shell, Toad implements a few concepts from Jupyter notebooks. You can cursor through the previous conversation, moving a logical block at a time, and interact with it again. At the moment that feature is used as a convenience to copy content to the clipboard or prompt, and a few other niceties like exporting an SVG.

      Cursor block

Toad will lean more heavily into this kind of interaction in the future.

      Friends of Toad

      I was very fortunate to collaborate with OpenHands, who are doing some amazing work in this space. Check out their blog post on Toad!

      I also collaborated with Hugging Face on this release. Check out their blog post on their inference explorers!

      Try Toad

      When this post is live you will be able to install Toad yourself.

      The work is ongoing: a few missing features and interface improvements to be done, but Toad is solid enough to use as your daily driver for AI. I used it to create batrachian.ai, where you will find install instructions.

      For more details, see the Toad repository.

I need a break (sabbaticals are tiring), but I’ll be picking things up in 2026. I’m hoping that by the time my year off ends, Toad could become my full-time gig. If you want to help make that happen, consider sponsoring my work.

    24. 🔗 Ampcode News Agentic Review rss

      Amp has a new agent and it specializes in code review.

      In the VS Code extension, you can use this agent by going to the review panel. Start by dragging the selection of changes you want to review. Sometimes, you want to review a single commit; other times you want to review all outstanding changes on your branch:

      Amp will pre-scan the diff and recommend an order in which to review the files. It will also provide a summary of the changes in each file and the changeset overall:

      This addresses a key difficulty in reviewing large changesets, as it's often difficult to know where to start.

      Clicking on a file will open the full file diff, which is editable and has code navigation if your diff includes the current working changes:

      Review Agent

      The review agent lives in a separate panel below. It analyzes the changes and posts a list of actionable improvements, which can then be fed back into the main Amp agent to close the feedback loop:

      There's a big improvement in review quality over the first version of the review panel, which used a single-shot LLM request. The new review agent uses Gemini 3 Pro and a review-oriented toolset to perform a much deeper analysis that surfaces more bugs and actionable feedback while filtering out noise.

To get to the review panel, click the button in the navbar. We've also added a ⌘ ; keybinding to make it easy to toggle in and out of review mode.

      An Opinionated Read-Write Loop

      If you're wondering how best to incorporate this into your day-to-day workflow, here's our opinionated loop for agentic coding in the editor:

1. Write code with the agent
2. Open the review panel (⌘ ;)
3. Drag your target diff range
4. Request agentic review and read summaries + diffs while waiting
5. Feed comments back into the agent

      Open Questions

We think the review agent and UI help substantially with the bottleneck of reviewing code written by an agent. But we're still pondering some more open questions:

      • How do reviews map to threads? It's not 1-1, since you can review the output of multiple threads at once.
      • How do we incorporate review feedback into long-term memory? When you accept or reject review comments, should Amp learn from that? Should it incorporate feedback into AGENTS.md?
      • What does the TUI version of this review interface look like? Should there exist an editable review interface in the TUI? Or should we integrate with existing terminal-based editors and diff viewers?
    25. 🔗 Ampcode News A Codebase by an Agent for an Agent rss
      Human stepping back while orb takes over the drafting table

      When Tim started to use Amp to build the TUI framework for our CLI, he often interrupted the agent. "No, this function should be called that," he'd say, or "this type should go into this file."

      But then he realized: wait, what I'm telling the agent to do - it's not statistically likely. It's not what an agent would do.

      So he stopped and instead let the agent build the framework and the codebase for itself.

      And now the agent rips through it.

  3. December 17, 2025
    1. 🔗 IDA Plugin Updates IDA Plugin Updates on 2025-12-17 rss

      IDA Plugin Updates on 2025-12-17

      New Releases:

      Activity:

    2. 🔗 Simon Willison Gemini 3 Flash rss

      It continues to be a busy December, if not quite as busy as last year. Today's big news is Gemini 3 Flash, the latest in Google's "Flash" line of faster and less expensive models.

      Google are emphasizing the comparison between the new Flash and their previous generation's top model Gemini 2.5 Pro:

      Building on 3 Pro’s strong multimodal, coding and agentic features, 3 Flash offers powerful performance at less than a quarter the cost of 3 Pro, along with higher rate limits. The new 3 Flash model surpasses 2.5 Pro across many benchmarks while delivering faster speeds.

      Gemini 3 Flash's characteristics are almost identical to Gemini 3 Pro: it accepts text, image, video, audio, and PDF, outputs only text, handles 1,048,576 maximum input tokens and up to 65,536 output tokens, and has the same knowledge cut-off date of January 2025 (also shared with the Gemini 2.5 series).

      The benchmarks look good. The cost is appealing: 1/4 the price of Gemini 3 Pro ≤200k and 1/8 the price of Gemini 3 Pro >200k, and it's nice not to have a price increase for the new Flash at larger token lengths.

      It's a little more expensive than previous Flash models - Gemini 2.5 Flash was $0.30/million input tokens and $2.50/million on output, Gemini 3 Flash is $0.50/million and $3/million respectively.

      Google claim it may still end up cheaper though, due to more efficient output token usage:

      > Gemini 3 Flash is able to modulate how much it thinks. It may think longer for more complex use cases, but it also uses 30% fewer tokens on average than 2.5 Pro.

      Here's a more extensive price comparison on my llm-prices.com site.

      Generating some SVGs of pelicans

      I released llm-gemini 0.28 this morning with support for the new model. You can try it out like this:

      llm install -U llm-gemini
      llm keys set gemini # paste in key
      llm -m gemini-3-flash-preview "Generate an SVG of a pelican riding a bicycle"
      

      According to the developer docs the new model supports four different thinking level options: minimal, low, medium, and high. This is different from Gemini 3 Pro, which only supported low and high.

      You can run those like this:

      llm -m gemini-3-flash-preview --thinking-level minimal "Generate an SVG of a pelican riding a bicycle"
      

      Here are four pelicans, for thinking levels minimal, low, medium, and high:

      <img alt="A minimalist vector illustration of a stylized white bird with a long orange beak and a red cap riding a dark blue bicycle on a single grey ground line against a plain white background." src="https://static.simonwillison.net/static/2025/gemini-3-flash-preview-thinking-level-minimal-pelican-svg.jpg" />
      <img alt="Minimalist illustration: A stylized white bird with a large, wedge-shaped orange beak and a single black dot for an eye rides a red bicycle with black wheels and a yellow pedal against a solid light blue background." src="https://static.simonwillison.net/static/2025/gemini-3-flash-preview-thinking-level-low-pelican-svg.jpg" />
      <img alt="A minimalist illustration of a stylized white bird with a large yellow beak riding a red road bicycle in a racing position on a light blue background." src="https://static.simonwillison.net/static/2025/gemini-3-flash-preview-thinking-level-medium-pelican-svg.jpg" />
      <img alt="Minimalist line-art illustration of a stylized white bird with a large orange beak riding a simple black bicycle with one orange pedal, centered against a light blue circular background." src="https://static.simonwillison.net/static/2025/gemini-3-flash-preview-thinking-level-high-pelican-svg.jpg" />
      

      The gallery above uses a new Web Component which I built using Gemini 3 Flash to try out its coding abilities. The code on the page looks like this:

      <image-gallery width="4">
          <img src="https://static.simonwillison.net/static/2025/gemini-3-flash-preview-thinking-level-minimal-pelican-svg.jpg" alt="A minimalist vector illustration of a stylized white bird with a long orange beak and a red cap riding a dark blue bicycle on a single grey ground line against a plain white background." />
          <img src="https://static.simonwillison.net/static/2025/gemini-3-flash-preview-thinking-level-low-pelican-svg.jpg" alt="Minimalist illustration: A stylized white bird with a large, wedge-shaped orange beak and a single black dot for an eye rides a red bicycle with black wheels and a yellow pedal against a solid light blue background." />
          <img src="https://static.simonwillison.net/static/2025/gemini-3-flash-preview-thinking-level-medium-pelican-svg.jpg" alt="A minimalist illustration of a stylized white bird with a large yellow beak riding a red road bicycle in a racing position on a light blue background." />
          <img src="https://static.simonwillison.net/static/2025/gemini-3-flash-preview-thinking-level-high-pelican-svg.jpg" alt="Minimalist line-art illustration of a stylized white bird with a large orange beak riding a simple black bicycle with one orange pedal, centered against a light blue circular background." />
      </image-gallery>

      Those alt attributes are all generated by Gemini 3 Flash as well, using this recipe:

      llm -m gemini-3-flash-preview --system '
      You write alt text for any image pasted in by the user. Alt text is always presented in a
      fenced code block to make it easy to copy and paste out. It is always presented on a single
      line so it can be used easily in Markdown images. All text on the image (for screenshots etc)
      must be exactly included. A short note describing the nature of the image itself should go first.' \
      -a https://static.simonwillison.net/static/2025/gemini-3-flash-preview-thinking-level-high-pelican-svg.jpg

      You can see the code that powers the image gallery Web Component here on GitHub. I built it by prompting Gemini 3 Flash via LLM like this:

      llm -m gemini-3-flash-preview '
      Build a Web Component that implements a simple image gallery. Usage is like this:
      
      <image-gallery width="5">
        <img src="image1.jpg" alt="Image 1">
        <img src="image2.jpg" alt="Image 2" data-thumb="image2-thumb.jpg">
        <img src="image3.jpg" alt="Image 3">
      </image-gallery>
      
      If an image has a data-thumb= attribute that one is used instead, other images are scaled down. 
      
      The image gallery always takes up 100% of available width. The width="5" attribute means that five images will be shown next to each other in each row. The default is 3. There are gaps between the images. When an image is clicked it opens a modal dialog with the full size image.
      
      Return a complete HTML file with both the implementation of the Web Component several example uses of it. Use https://picsum.photos/300/200 URLs for those example images.'

      It took a few follow-up prompts using llm -c:

      llm -c 'Use a real modal such that keyboard shortcuts and accessibility features work without extra JS'
      
      llm -c 'Use X for the close icon and make it a bit more subtle'
      
      llm -c 'remove the hover effect entirely'
      
      llm -c 'I want no border on the close icon even when it is focused'

      Here's the full transcript, exported using llm logs -cue.

      Those five prompts took:

      • 225 input, 3,269 output
      • 2,243 input, 2,908 output
      • 4,319 input, 2,516 output
      • 6,376 input, 2,094 output
      • 8,151 input, 1,806 output

      Added together that's 21,314 input and 12,593 output for a grand total of 4.8436 cents.

      The guide to migrating from Gemini 2.5 reveals one disappointment:

      Image segmentation: Image segmentation capabilities (returning pixel-level masks for objects) are not supported in Gemini 3 Pro or Gemini 3 Flash. For workloads requiring native image segmentation, we recommend continuing to utilize Gemini 2.5 Flash with thinking turned off or Gemini Robotics-ER 1.5.

      I wrote about this capability in Gemini 2.5 back in April. I hope they come back in future models - they're a really neat capability that is unique to Gemini.

      You are only seeing the long-form articles from my blog. Subscribe to /atom/everything/ to get all of my posts, or take a look at my other subscription options.

    3. 🔗 r/wiesbaden Friseur Empfehlungen rss

Hey friends,

      I've let my hair grow long, which is why I haven't been to a hairdresser in years. Now I'm thinking about getting it cut short or medium-length. Can you recommend a good hairdresser? Mainz would also be fine, but I'd like something more than a plain short-on-the-sides, lightly-styled-on-top cut, so ideally someone who knows a bit about longer hair.

      Thanks in advance!

      submitted by /u/PsyDubLukere369
      [link] [comments]

    4. 🔗 r/LocalLLaMA Nvidia plans heavy cuts to GPU supply in early 2026 rss
    5. 🔗 r/LocalLLaMA Hey, LocalLLaMa. We need to talk... rss

      I look on the front page and I see people who have spent time and effort to make something, and they share it willingly. They are getting no upvotes.

We are here because we are local and we are open source. Those things depend on people who give us things, and they don't ask for anything in return, but they need something in return or they will stop.

      Pop your head into the smaller posts where someone is showing work they have done. Give honest and constructive feedback. UPVOTE IT.

      The project may be terrible -- encourage them to grow by telling them how they can make it better.

      The project may be awesome. They would love to hear how awesome it is. But if you use it, then they would love 100 times more to hear how you use it and how it helps you.

      Engage with the people who share their things, and not just with the entertainment.

It takes so little effort but it makes so much difference.

      submitted by /u/Eisenstein
      [link] [comments]

    6. 🔗 r/reverseengineering Decompiling the Synergy: An Empirical Study of Human–LLM Teaming in Software Reverse Engineering rss
    7. 🔗 r/wiesbaden Kreuzung Luisenstraße Ecke Bahnhofstraße rss

What has actually been going on there lately? As a pedestrian, I was nearly run over twice within a single week. The first time it was a left-turning driver who accelerated and could only just swerve away. Just now it was a right-turning driver who made no attempt to brake at all. Luckily I saw it in time and stopped.

      Has anyone else had experiences like this in the city recently? I've lived here forever and never had this happen.

      submitted by /u/sleepless_92
      [link] [comments]

    8. 🔗 r/LocalLLaMA Apple introduces SHARP, a model that generates a photorealistic 3D Gaussian representation from a single image in seconds. rss
    9. 🔗 r/reverseengineering Be Careful About Your Data on the Internet (Reverse Engineering a Dating App) rss
    10. 🔗 Evan Schwartz Short-Circuiting Correlated Subqueries in SQLite rss

      I recently added domain exclusion lists and paywalled content filtering to Scour. This blog post describes a small but useful SQL(ite) query optimization I came across between the first and final drafts of these features: using an uncorrelated scalar subquery to skip a correlated subquery (if you don't know what that means, I'll explain it below).

      Scour searches noisy sources for content related to users' interests. At the time of writing, it ingests between 1 and 3 million pieces of content from over 15,000 sources each month. For better and for worse, Scour does ranking on the fly, so the performance of the ranking database query directly translates to page load time.

      The Ranking SQL Query

      The main SQL query Scour uses for ranking applies a number of filters and streams the item embeddings through the application code for scoring.

      Scour uses brute force search rather than a vector database, which works well enough for now because of three factors:

      1. Scour uses SQLite, so the data is colocated with the application code.
      2. It uses binary-quantized vector embeddings with Hamming Distance comparisons, which only take ~5 nanoseconds each.
      3. We care most about recent posts so we can significantly narrow the search set by publish date.

      A simplified version of the query looks something like:

      SELECT *
      FROM items i
      WHERE i.lang IN (SELECT lang FROM user_languages WHERE user_id = ?1)
      AND i.published BETWEEN ?2 AND ?3
      AND ...(more filters)...
      

      The query plan shows that this makes good use of indexes:

      QUERY PLAN
         |--SEARCH i USING INDEX idx_items_lang_published (lang=? AND published>? AND published<?)
         `--LIST SUBQUERY 1
            `--SEARCH user_languages USING COVERING INDEX sqlite_autoindex_user_languages_1 (user_id=?)
      

      Domain Filters Using Correlated Subqueries

      To add user-specified domain blocklists, I created the user_excluded_domains table and added this filter clause to the main ranking query:

      AND NOT EXISTS (
          SELECT 1
          FROM user_excluded_domains ued
          WHERE user_id = ?1
          AND ued.domain = i.domain
      )
      

      The domain exclusion table uses (user_id, domain) as a primary key, so the lookup is efficient. However, this lookup is done for every row returned from the first part of the query. This is a correlated subquery :

      QUERY PLAN
         |--SEARCH i USING INDEX idx_items_lang_published (lang=? AND published>? AND published<?)
         |--LIST SUBQUERY 1
         |  `--SEARCH user_languages USING COVERING INDEX sqlite_autoindex_user_languages_1 (user_id=?)
         `--CORRELATED SCALAR SUBQUERY 2
            `--SEARCH ued USING COVERING INDEX sqlite_autoindex_user_excluded_domains_1 (user_id=? AND domain=?)
      

      Short-Circuiting Correlated Subqueries

      A problem with the way we just added this feature is that most users don't exclude any domains, but we've added a check that is run for every row anyway.

      To speed up the queries for users who aren't using the feature, we could first check the user's settings and then dynamically build the query. But we don't have to, because we can accomplish the same effect within one static query.

      We can change our domain exclusion filter to first check whether the user has any excluded domains:

      AND (
          NOT EXISTS (
              SELECT 1
              FROM user_excluded_domains
              WHERE user_id = ?1
          )
          OR NOT EXISTS (
                 SELECT 1
                 FROM user_excluded_domains ued
                 WHERE user_id = ?1
                 AND ued.domain = i.domain
          )
      )
      

      Since the OR short-circuits, if the first NOT EXISTS returns true (when the user has no excluded domains), SQLite never evaluates the correlated subquery at all.

      The first NOT EXISTS clause does not reference any column in items, so SQLite can evaluate it once and reuse the boolean result for all of the rows. This "uncorrelated scalar subquery" is extremely cheap to evaluate and, when it returns true, lets us short-circuit and skip the more expensive correlated subquery that checks each item's domain against the exclusion list.

      Here is the query plan for this updated query. Note how the second subquery says SCALAR SUBQUERY, whereas the third one is a CORRELATED SCALAR SUBQUERY. The latter is the per-row check, but it can be skipped by the second subquery.

      QUERY PLAN
         |--SEARCH i USING INDEX idx_items_lang_published (lang=? AND published>? AND published<?)
         |--LIST SUBQUERY 1
         |  `--SEARCH user_languages USING COVERING INDEX sqlite_autoindex_user_languages_1 (user_id=?)
         |--SCALAR SUBQUERY 2
         |  `--SEARCH user_excluded_domains USING COVERING INDEX sqlite_autoindex_user_excluded_domains_1 (user_id=?)
         `--CORRELATED SCALAR SUBQUERY 3
            `--SEARCH ued USING COVERING INDEX sqlite_autoindex_user_excluded_domains_1 (user_id=? AND domain=?)
      

      Benchmarking

      To test the performance of each of these queries, I replaced the SELECT * with SELECT COUNT(*) and used a simple bash script to invoke the sqlite3 binary 100 times for each query on my laptop. Starting up the sqlite3 process each time adds overhead, but we're comparing relative differences.

      At the time of this benchmark, the last week had 235,975 items, 144,229 of which were in English. The two example users I tested this for below only look for English content.

      User Without Excluded Domains

      This test represents most users, who have not configured any excluded domains:

      Approach | Min (ms) | Max (ms) | Avg (ms) | Stddev (ms) | Diff (ms) | Diff (%)
      ---|---|---|---|---|---|---
      Baseline (no filter) | 67 | 91 | 72.7 | 4.7 | — | —
      Correlated Subquery | 80 | 108 | 85.2 | 5.5 | +12.5 | +17.1%
      With Short-Circuit | 69 | 91 | 72.7 | 3.8 | +0 | +0%

      This shows that the short-circuit query adds practically no overhead for users without excluded domains, whereas the correlated subquery alone makes queries 17% slower for these users.

      User with Excluded Domains

      This test uses an example user that has excluded content from 2 domains:

      Approach | Min (ms) | Max (ms) | Avg (ms) | Stddev (ms) | Diff (ms) | Diff (%)
      ---|---|---|---|---|---|---
      Baseline (no filter) | 68 | 99 | 76.2 | 7.6 | — | —
      Correlated Subquery | 84 | 112 | 90.5 | 6.8 | +14.3 | +18.7%
      With Short-Circuit | 82 | 109 | 88.5 | 8.1 | +12.3 | +16.1%

      In this case, we do need to check each row against the domain filter. But this shows that the short-circuit still adds no overhead on top of the query.

      Conclusion

      When using SQL subqueries to filter down result sets, it's worth thinking about whether each subquery is really needed for most users or most queries. If the check is needed most of the time, this approach won't help. However if the per-row check isn't always needed, using an uncorrelated scalar subquery to short-circuit a condition can dramatically speed up the average case with practically zero overhead.

      This is extra important because the slow-down from each additional subquery compounds. In this blog post, I described and benchmarked a single additional filter. But this is only one of multiple subquery filters.

      Earlier, I also mentioned that users had asked for a way to filter out paywalled content. This works similarly to filtering out content from excluded domains. Some users opt-in to hiding paywalled content. For those users, we check if each item is paywalled. If so, we check if it comes from a site the user has specifically allowed paywalled content from (because they have a subscription). I used the same uncorrelated subquery approach to first check if the feature is enabled for the user and, only then, does SQLite need to check each row.

      Concretely, the paywalled content filter subquery looks like:

      AND (
          (
              SELECT COALESCE(hide_paywalled_content, 0) = 0
              FROM users
              WHERE user_id = ?1
          ) -- note these parentheses are needed so SQLite doesn't mistakenly think this query is correlated with `items`
          OR COALESCE(i.is_paywalled, 0) = 0
          OR i.domain IN (
              SELECT domain
              FROM user_paywall_allowed_domains
              WHERE user_id = ?1
          )
      )
      

      In short, a trivial uncorrelated scalar subquery can help us short-circuit and avoid a more expensive per-row check when we don't need it.

      Appendix: NOT EXISTS vs NOT IN vs LEFT JOIN

      There are multiple ways to exclude rows from an SQL query.

      Here are the results from the same benchmark I ran above, but with two other ways of checking for whether an item comes from an excluded domain.

      The NULL-safe NOT IN version of the query uses the subquery:

      ...
      AND (
          i.domain IS NULL
          OR i.domain NOT IN (
              SELECT domain
              FROM user_excluded_domains
              WHERE user_id = ?1
          )
      )
      

      The LEFT JOIN variation joins items with user_excluded_domains and then checks for NULL:

      SELECT *
      FROM items i
      LEFT JOIN user_excluded_domains ued on ued.user_id = ?1 AND ued.domain = i.domain
      WHERE i.lang IN (SELECT lang FROM user_languages WHERE user_id = ?1)
      AND i.published BETWEEN ?2 AND ?3
      AND ued.domain IS NULL
      

      And here are the full benchmarks:

      User Without Excluded Domains

      Approach | Min (ms) | Max (ms) | Avg (ms) | Stddev (ms) | Diff (ms) | Diff (%)
      ---|---|---|---|---|---|---
      Baseline (no filter) | 67 | 91 | 72.7 | 4.7 | — | —
      NOT EXISTS (no short-circuit) | 80 | 108 | 85.2 | 5.5 | +12.5 | +17.1%
      NOT EXISTS + short-circuit | 69 | 91 | 72.7 | 3.8 | +0 | +0%
      NULL-safe NOT IN (no short-circuit) | 75 | 111 | 79.5 | 7.1 | +6.8 | +9.3%
      NULL-safe NOT IN + short-circuit | 69 | 103 | 74.8 | 6.6 | +2.1 | +2.8%
      LEFT JOIN (no short-circuit) | 74 | 100 | 79.1 | 5.1 | +6.4 | +8.8%
      LEFT JOIN + short-circuit | 76 | 103 | 84.4 | 7.4 | +11.7 | +16.0%

      For users without excluded domains, we can see that the NOT EXISTS query using the short-circuit wins and adds no overhead.

      User With Excluded Domains

      Approach | Min (ms) | Max (ms) | Avg (ms) | Stddev (ms) | Diff (ms) | Diff (%)
      ---|---|---|---|---|---|---
      Baseline (no filter) | 68 | 99 | 76.2 | 7.6 | — | —
      NOT EXISTS (no short-circuit) | 84 | 112 | 90.5 | 6.8 | +14.3 | +18.7%
      NOT EXISTS + short-circuit | 82 | 109 | 88.5 | 8.1 | +12.3 | +16.1%
      NULL-safe NOT IN (no short-circuit) | 83 | 112 | 89.7 | 8.4 | +13.5 | +17.7%
      NULL-safe NOT IN + short-circuit | 84 | 112 | 91.3 | 8.2 | +15.1 | +19.8%
      LEFT JOIN (no short-circuit) | 81 | 107 | 86.3 | 6.7 | +10.1 | +13.2%
      LEFT JOIN + short-circuit | 82 | 126 | 89.8 | 7.7 | +13.6 | +17.8%

      For users who do have excluded domains, the LEFT JOIN is faster than the NOT EXISTS version. However, this version raises the exact problem this whole blog post is designed to address. Since joins happen no matter what, we cannot use the short-circuit to avoid the overhead for users without excluded domains. At least for now, this is why I've gone with the NOT EXISTS subquery using the short-circuit.


      Discuss on Hacker News, Lobsters, r/programming, r/sqlite.


    11. 🔗 r/LocalLLaMA Microsoft's TRELLIS 2-4B, An Open-Source Image-to-3D Model rss

Microsoft's TRELLIS 2-4B, An Open-Source Image-to-3D Model

      Model Details:

      • Model Type: Flow-Matching Transformers with Sparse Voxel based 3D VAE
      • Parameters: 4 Billion
      • Input: Single Image
      • Output: 3D Asset

Model - https://huggingface.co/microsoft/TRELLIS.2-4B
      Demo - https://huggingface.co/spaces/microsoft/TRELLIS.2
      Blog post - https://microsoft.github.io/TRELLIS.2/

      submitted by /u/Dear-Success-1441
      [link] [comments]

    12. 🔗 @cxiao@infosec.exchange clip credit to StitchAndRollCrits on reddit! mastodon
    13. 🔗 @cxiao@infosec.exchange mentally im here mastodon

      mentally im here
      (goldeneyes vs charge game tonight was absolute peak tho)

      #pwhl #hnom

    14. 🔗 r/reverseengineering Released an update to my Mach-O triage tool for macOS (REPL, strings, hexdump) rss
    15. 🔗 Ampcode News Gemini 3 Flash in the Search Subagent rss

      Amp's codebase search subagent now uses Gemini 3 Flash instead of Haiku 4.5. It's 3x faster for the same quality.

      Gemini 3 Flash is better at parallel tool calls, explores with diverse queries, and concludes early. Where Haiku 4.5 averaged ~2.5 parallel calls per iteration, Gemini 3 Flash fires off ~8. It wraps up in ~3 turns instead of ~9.

      Performance vs Latency chart showing Gemini 3 Flash in the optimal zone: high F1 score, low latency