Design of the type system, particularly defining local variables.
I just built a treeshaker for SubX programs. To my pleasant surprise it took just about 2 hours.
https://github.com/akkartik/mu/blob/master/tools/treeshake.cc
I don't trust it yet, but it does yield some stats:
https://github.com/akkartik/mu/blob/master/linux/stats.txt
Notes:
a) Binary sizes, in spite of being just tens of KB, are quite bloated by tests and unused library functions. They can usually be shrunk to ~20% their original size. Programming in machine code naturally focuses on the essence.
b) 75% LoC in sources are tests. Verbose!
In the last 2 weeks I went from being able to translate:
fn foo {
}
to:
fn foo n: int {
increment n
}
That required 2k LoC. So it seems fair to say that things are still in the "black triangles" phase. And there are still gaping holes. Variable declarations don't really exist yet. (I can just parse them.)
One voice in my head (the one often active when interacting in this forum) whispers that if only I had better tools the process could have been shortened.
Another voice in my head whispers that I'm stupid for taking so long to figure out something some putative body else would find obvious. ("If deleting no-op nodes in a dependency graph causes nodes to fire before they're ready, that means some edges are being spuriously cut.") Or maybe I'm rusty, because I don't work anymore with graphs 12 years after finishing grad school.
But the dominant voice in my head is just elation, the flush of insight, of having tamed a small portion of the wilderness around me and inside my own head. And it wouldn't have happened without struggling for a while with the wilderness, no matter what tools I had. A big portion of today was spent trying to visualize graphs and finding them too large for my tools to handle. So I had to resort to progressively more and more precise tools. Text-mode scalpels over graphical assistants. And that process of going beyond what my regular tools can handle is a key characteristic of going out into the wilderness. When tools fail, the only thing left is to try something, see what happens, and think. No improvement in tools can substitute for the experience of having gone beyond your tools, over and over again.
There's a famous saying that insights come to the prepared mind. It's easy to read and watch Bret Victor and imagine that we are in the insight delivery business. But we're really in the mind preparing business.
Not much to report this week. Last week I implemented the instruction increment x
when x
is on the stack. This week I did x <- increment
when x is a register.
(In Mu, registers written to are return values, while memory locations written to are just passed in by reference.)
The good news: I now have a core data structure in place for code-generating different instructions, and this includes static dispatch.
http://akkartik.name/post/mu-2019-2
First statements now translating!
http://akkartik.github.io/mu/html/linux/mu.subx.html
fn foo {}
=> function prologue and epilogue
increment x
when x is on the stack.
foo x
when x is on the stack and foo is a user-defined function.
Seems like small progress, but.. http://rampantgames.com/blog/?p=7745
I feel like I'm finally starting to get closure on https://news.ycombinator.com/item?id=13608810#13610366
I wrote about it last week, but already that post is growing out of date. Here's an initial sketch.
So far all that works is function definitions with no arguments or body. They emit just the prologue and epilogue code sequences.
But I have a sketch of the next few steps of the design in there.
The design has changed in 2 ways.
#1: no more uninitialized variables. There's just no safe way to avoid initialization when structs can contain pointers.
This adds overheads to local variables in two ways: performance, and translator complexity for length-prefixed arrays. We can't just zero out everything, we also have to write array lengths in. Since structs may contain arrays, initializing a variable may involve writing lengths to multiple locations.
#2: unwinding the stack when exiting a scope. I can't just increment the stack pointer, I need to also restore any necessary registers along the way.
Another complication is that I need Java's labeled break
s. Since break
does double-duty for both conditionals and loops, it's too onerous to only be able to break
out of one scope at a time. But labeled break
s increase the frequency with which I need to unwind the stack. More overhead.
The common theme here seems to be that the translator needs to maintain 'symbolic' representations of contiguous memory regions, tracking either values or register bindings for offsets.
I'm not sure how I feel about these choices. Perhaps I should give up on this whole design and switch to something more traditional. A memory-safe C-like language. So I'd love to hear feedback and suggestions.
Since I'm building it out of machine code, memory management is a perennial concern. My parsing in level 1 has mostly used static arrays. But now I think I'm going to switch to dynamic linked lists and trees. Leak some memory.
I just wrote up a summary of the state of Mu, in two parts.
Part 1 summarizes the past year as a sequence of major design decisions:
http://akkartik.name/post/mu-2019-1
Part 2 is a sketch of what I plan to build next, again structured as a sequence of design decisions:
http://akkartik.name/post/mu-2019-2
(The flow from design constraints to decisions is inspired by Christopher Alexander.)
Any and all feedback appreciated. I'd like it to be clear to any programmer.