The Hunt for Midori
"My goal is to eliminate every line of C and C++ from Microsoft by 2030," Microsoft Distinguished Engineer Galen Hunt writes in a post on LinkedIn. "Our strategy is to combine AI and Algorithms to rewrite Microsoft’s largest codebases. Our North Star is ‘1 engineer, 1 month, 1 million lines of code.’ To accomplish this previously unimaginable task, we’ve built a powerful code processing infrastructure. Our algorithmic infrastructure creates a scalable graph over source code at scale."
The post was later retracted, in that most telltale gesture of our times: the reluctant "I'm sorry you fuckers misrepresented what I thought by reporting literally what I said".
I work with .NET in my day job, almost exclusively on non-Windows platforms these days. .NET, for all its flaws, has been an island of relatively open thinking within the Microsoft bubble, which is otherwise just as prone, if not more so, to hivemind company-think as its acolytes regularly accuse Apple enthusiasts of being. (For a recent example of Apple users reacting to changes in their platform, look no further than Nikita Prokopov's vicious-in-its-incisiveness, but entirely justified, takedown of everything-having-an-icon in macOS Tahoe.)
Hype and corporate arrogance aside, having been conversant in .NET for a significant portion of my life, my thoughts go to Midori. Midori was a legendary ground-up implementation of an operating system, with an object capability model and asynchronous programming, in pure managed, memory-safe code, and it went as far as powering production workloads. It directly birthed the concepts behind async and await, which have spread to pretty much every language in the decade since their introduction, and it brought the concept of contiguous memory-safe slices, christened Span<T>, to C# and .NET, where it now permeates all levels of the stack, cutting memory allocations and, by extension, garbage collection pressure.
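For the unfamiliar, the core idea behind Span<T> can be sketched in Rust, whose borrowed slices play the same role: a bounds-checked view into memory you already own, so a sub-range costs a pointer and a length rather than an allocated copy. (The function below is my own illustration, not code from either ecosystem.)

```rust
// Sketch of the slice idea: return a borrowed, bounds-checked view into
// existing memory instead of allocating a copy. Rust's &[T] fills the same
// niche as C#'s ReadOnlySpan<T>.
fn middle_third(data: &[u8]) -> &[u8] {
    let third = data.len() / 3;
    // No allocation here: the return value is a fat pointer
    // (address + length) into `data`, checked against its bounds.
    &data[third..2 * third]
}

fn main() {
    let buffer: Vec<u8> = (0..9).collect();
    let view = middle_third(&buffer);
    assert_eq!(view, &[3, 4, 5]);
}
```

Because the view borrows rather than copies, hot paths like parsers and serializers can pass sub-ranges around freely without touching the allocator, which is exactly the effect Span<T> had on the .NET stack.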
I don't know what Mr Hunt is up to, but it does have the ring of a similar project. Putting its fate in the hands of the stochastic parrot is somewhat worrying, but I do at least trust the people conversant in these three languages (and Microsoft's dialects of at least C and C++) to be competent enough to evaluate its successes and build guard rails around it, given that most naively converted codebases will either fail to compile or misbehave in use. (I'd put substantial odds on the outcome being that the technology to do this safely and competently, at the scale of those ambitions, simply does not exist yet.)
Where all this leads is unknown. Rust is helpful, but legendarily uningratiating. The Rust community seems to have developed a blind spot for the consequences of algorithms that work around the borrow checker. Yes, encapsulation means there are 20-30 lines of unsafe code to audit instead of the whole codebase. But there's a next step missing: making that unsafe code as expressible as ordinary Rust code, where its reasoning can be similarly encoded - as in: in what way can I teach the checker about the semantics of the concurrent or self-referential data structure that I had to jump through the escape hatch to actually implement? This is not easy, and it not being done is down to neither laziness nor lack of ambition. It is as unbounded in scope as it is just plain difficult to solve - the borrow checker and the current memory model are already so much of a miracle that they have barely been replicated anywhere else yet.
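To make that encapsulation pattern concrete, here is a minimal sketch: a hand-rolled version of the standard library's split_at_mut, one of the canonical cases where the invariant (the two halves never overlap) is obvious to a human auditor but inexpressible to the borrow checker, so a few audited unsafe lines hide behind a safe signature.

```rust
// A safe wrapper around a small unsafe core. The borrow checker cannot
// prove that two &mut borrows into the same slice are disjoint, so the
// proof lives in the SAFETY comment and the assert, not in the type
// system. This mirrors how std::slice::split_at_mut works internally.
fn split_halves(v: &mut [i32], mid: usize) -> (&mut [i32], &mut [i32]) {
    let len = v.len();
    assert!(mid <= len, "mid out of bounds");
    let ptr = v.as_mut_ptr();
    // SAFETY: the ranges [0, mid) and [mid, len) are disjoint and both lie
    // within the original allocation, so the two &mut slices cannot alias.
    unsafe {
        (
            std::slice::from_raw_parts_mut(ptr, mid),
            std::slice::from_raw_parts_mut(ptr.add(mid), len - mid),
        )
    }
}

fn main() {
    let mut data = [1, 2, 3, 4];
    let (a, b) = split_halves(&mut data, 2);
    a[0] = 10;
    b[1] = 40;
    assert_eq!(data, [10, 2, 3, 40]);
}
```

Callers only ever see the safe signature; the audit surface is those few unsafe lines. The missing next step the paragraph above laments is being able to state the "disjoint ranges" argument to the checker itself, rather than in a comment.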
But there is still a series of next steps. My hope is that this project, alongside the current effort to allow only Rust for new code in the Windows kernel, helps push the state of the art by doing what research projects do best: start with an oft-absurd idea and then take it, over time, with purpose - and still with a connection to what the real world wants to accomplish - to its logical conclusion.