The enum member name case was changed in
ziglang/zig@099f3c4039. This appears to
be the only change required to bring us up to compatibility with zig
0.12.0-dev.3561+f45ba7d0c.
This is the first cut at providing human-readable context for command
line parsing failures. Since these failures are due to incorrect
input (normally produced by a human), closing the information loop at
the human layer makes a hell of a lot more sense than dumping an error
traceback with a (possibly cryptic) error name and calling it a day.
This approach doesn't print anything out by default and still depends
on the user to choose exactly how to handle and print the error
message. Errors are propagated from subcommands, though they end up
being copied, which shouldn't be strictly necessary. Maybe this can be
improved in the future. OutOfMemory has been added to ParseError to
simplify the code a bit.
The demo has been updated with a simplistic example of what presenting
error messages to the user may look like. I don't know that this
produces useful messages for every possible failure scenario, but it
does for the most common ones.
This changes the parse/callback flow so that subcommands are run
iteratively from the base command rather than recursively. The primary
advantages of this approach are some stack space savings and much less
convoluted backtraces for deeply nested command hierarchies. The
overall order of operations has not changed, i.e. the full command
line is parsed before command callback dispatch starts.
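As a rough illustration of the iterative flow (these are not the
library's actual types: `Command`, `Trace`, and `dispatch` are
hypothetical names, and parsing is assumed to have already recorded
which subcommand was selected at each level), dispatch can be a plain
loop starting from the base command:

```zig
const std = @import("std");

// Hypothetical recorder so the callback order can be inspected.
const Trace = struct {
    names: [8][]const u8 = undefined,
    len: usize = 0,
};

// Hypothetical stand-in for an already-parsed command.
const Command = struct {
    name: []const u8,
    // The subcommand chosen on the command line, if any.
    selected: ?*const Command = null,

    fn callback(self: *const Command, trace: *Trace) void {
        trace.names[trace.len] = self.name;
        trace.len += 1;
    }
};

// Walk from the base command down the chain of selected subcommands.
// Each callback returns before the next one starts, so the stack depth
// stays flat no matter how deep the hierarchy is.
fn dispatch(base: *const Command, trace: *Trace) void {
    var current: ?*const Command = base;
    while (current) |cmd| : (current = cmd.selected) {
        cmd.callback(trace);
    }
}

test "callbacks run base-first without recursion" {
    const leaf = Command{ .name = "leaf" };
    const mid = Command{ .name = "mid", .selected = &leaf };
    const base = Command{ .name = "base", .selected = &mid };

    var trace = Trace{};
    dispatch(&base, &trace);

    try std.testing.expectEqual(@as(usize, 3), trace.len);
    try std.testing.expectEqualStrings("base", trace.names[0]);
    try std.testing.expectEqualStrings("mid", trace.names[1]);
    try std.testing.expectEqualStrings("leaf", trace.names[2]);
}
```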
Fixes: #12
I may move this commit to a separate branch, since there are a variety
of improvements that I think I want to get applied to the
0.11.x-compatible codebase still. However, I have also not been
motivated to work on those fixes, since this codebase is kind of
crusty due to being the first thing I ever wrote in zig. Doing a
bigger rewrite might supply the motivation to make those improvements.
I will have to think about it. For now, I am going to focus elsewhere.
ziglang/zig@a02bd81760 changed
builtin.StructField to necessitate this, but this should also be
backwards compatible since these should decay to plain slices just
fine.
This works, but there are probably interesting edge cases around how it
works when directly adjacent to other paragraphs. I will have to think
about it a bit. This wrapping code in general would benefit from term
queries.
Perhaps violating the principle of least astonishment quite severely is
the fact that "> a" and ">" are detected as preformatted, but ">a" is
normal wrapped text. Supporting both ">a" and "> a" leads to
nonobvious whitespace behavior, but this code should not be able to
runtime error outside of the writer dying. This may be reevaluated in
the future, but I will leave it as-is for now.
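The detection rule described above boils down to something like the
following sketch. `isPreformatted` is a hypothetical name, and the real
wrapping code may be structured differently:

```zig
const std = @import("std");

// A line counts as preformatted when it is exactly ">" or starts
// with "> ". A ">" followed directly by other text is left to the
// normal wrapping path.
fn isPreformatted(line: []const u8) bool {
    if (line.len == 0 or line[0] != '>') return false;
    return line.len == 1 or line[1] == ' ';
}

test "only a bare '>' or a '> ' prefix marks a preformatted line" {
    try std.testing.expect(isPreformatted("> a"));
    try std.testing.expect(isPreformatted(">"));
    try std.testing.expect(!isPreformatted(">a"));
}
```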
This just creates an empty command with an auto-assigned noop callback.
This is useful sugar for creating a group of commands under a common
name because previously the user would have to define their own noop
callback and bind it. This just takes a description string
(and, optionally, a help flag override).
This is a feeble attempt to unify some logic, as I realized that
Command.createInterface had different logic for handling the user
context than Parser did, which broke certain use cases (using a slice
as the context for example).
I'm not convinced this really unifies the logic as much as wraps it in
another layer of indirection, but at least the core problem is solved.
I think I had initially intended 0-length descriptions to be "hidden"
options, but this doesn't really work well with arguments, and it also
doesn't make intention clear. Perhaps an additional field should be
added to the parameter specification to support hiding options
(this does not make sense for non-named options).
This was kind of an annoying change to make since 0.11.0 has issues
where it will point to the wrong srcloc on compile errors in generic
code (which this 100% is); fortunately, that is fixed in master. The motivation
for this change is that the arg vector already contains 0-terminated
strings, so we can avoid a lot of copies. This makes forwarding
command-line arguments to C functions that expect zero-terminated
strings much more straightforward, and they also automatically decay
to normal slices.
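A small sketch of what that buys us. `cStyleLength` is a hypothetical
stand-in for a real C function taking a `const char *`, and
`sliceLength` stands in for any ordinary slice-consuming Zig code:

```zig
const std = @import("std");

// Stand-in for a C function; a real binding would come from @cImport
// or an extern declaration.
fn cStyleLength(s: [*:0]const u8) usize {
    var n: usize = 0;
    while (s[n] != 0) : (n += 1) {}
    return n;
}

// Ordinary Zig code that only knows about plain slices.
fn sliceLength(s: []const u8) usize {
    return s.len;
}

test "a zero-terminated argument serves both kinds of consumers" {
    const arg: [:0]const u8 = "--verbose";

    // No copy needed for the C-style call: the terminator is already there.
    try std.testing.expectEqual(@as(usize, 9), cStyleLength(arg.ptr));

    // [:0]const u8 coerces implicitly to []const u8.
    try std.testing.expectEqual(@as(usize, 9), sliceLength(arg));
}
```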
Unfortunately, environment variable values are NOT zero-terminated, so
they are currently copied with zero-termination. This seems to be the
fault of Windows/WASI, both of which already are performing
allocations (Windows to convert from UTF-16 to UTF-8, and WASI to get
a copy of the environment). By duplicating the std EnvMap
implementation, we could make a version that generates 0-terminated
env vars without extra copies, but I'll skip on doing that for now.
This means that ParserInterface can do all of the important things that
Parser can do, which makes Command.createInterface a lot more useful
(there wasn't previously a way to add subcommands to an interface
created that way without a mass of extremely suspect casting).
This commit also makes the language around this consistent. They're
not children, and they have names, not verbs, associated with them.
Glad we could clear that up.
I think I still prefer snake_case stylistically, but this style fits in
a lot better with other zig code and is the officially endorsed style,
so it makes sense to use it. Since I don't have good test coverage,
some conversions may have been missed, but the demo still builds,
which is a decent amount of coverage. This was pretty easy to do with
a find and replace, so it should be reasonably thorough.
The main value of this method is that it allows runtime access to the
help description of the subcommand. This could allow implementation of
a help flag that takes the name of a subcommand to print help for or
something. Anyway, it's probably useful.
This allocates the interface with its own arena allocator, allowing it
to live beyond its stack lifetime. This enables some useful patterns
for composing a CLI from multiple functions or files. This is actually
probably the preferred method over `create_parser` in most
circumstances.
There are a couple of other places where []u8 is treated implicitly
like a string, which isn't strictly correct. Ultimately, some kind of
metasignal will be required to make this type truly unambiguous in
interpretation.
Stay a while and listen to my story.
Due to the design of the parser execution flow, the only reasonable way
to avoid leaking memory in the parser is to use an arena allocator
because the parser itself doesn't have direct access to everything it
allocates, and everything it allocates needs to live for the duration
of whatever callbacks are called.
Now, you might say, if the items it allocates are stored for the
lifetime of whatever callbacks, then that means that the items it
allocates stay allocated for effectively the entire life of the
program. In which case there's really not much point in freeing them
at all, as it's just extra work on exit that the OS should normally
clean up. And you'd be right, except for two details. The first is
that if the user uses the current GeneralPurposeAllocator, it will
complain about leaks when deinitialized, which simply isn't groovy.
The other detail is that
technically the user can run any code they want after the parser
execution finishes, so forcing the user to leak memory by having an
incomplete API is rude.
The other option would be, as before, forcing the user to supply their
own arena allocator if they don't want to leak, but that's kind of a
rude thing to do and goes against the "all allocators look the same"
design of the standard library, which is what makes it so easy to use
and create allocators with advanced functionality. That seems like an
ugly thing to do, so, instead, each parser gets to eat the memory cost
of storing a pointer to its arena allocator (and the heap cost of the
arena allocator itself).
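Roughly, the ownership pattern looks like the sketch below. `Parser`
here is a hypothetical stand-in with a single field, not the real
parser type, and `create`/`destroy` are illustrative names:

```zig
const std = @import("std");

const Parser = struct {
    // The pointer each parser pays for; the arena itself lives on the heap.
    arena: *std.heap.ArenaAllocator,
    description: []const u8,

    fn create(backing: std.mem.Allocator, description: []const u8) !*Parser {
        const arena = try backing.create(std.heap.ArenaAllocator);
        errdefer backing.destroy(arena);
        arena.* = std.heap.ArenaAllocator.init(backing);
        errdefer arena.deinit();

        // Everything the parser owns, including itself, comes from the arena.
        const alloc = arena.allocator();
        const self = try alloc.create(Parser);
        self.* = .{
            .arena = arena,
            .description = try alloc.dupe(u8, description),
        };
        return self;
    }

    fn destroy(self: *Parser) void {
        // Copy out what is needed before the arena frees `self`.
        const arena = self.arena;
        const backing = arena.child_allocator;
        arena.deinit();
        backing.destroy(arena);
    }
};

test "the caller passes any allocator and nothing leaks" {
    // std.testing.allocator fails the test if anything is leaked.
    const parser = try Parser.create(std.testing.allocator, "demo");
    defer parser.destroy();
    try std.testing.expectEqualStrings("demo", parser.description);
}
```

The caller never sees the arena; it hands over whatever allocator it
has and calls a single cleanup function when done.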
In theory, subcommands could borrow the arena allocator of their parent
command to save a bit of heap space, but that would make a variety of
creation and cleanup-related tasks less isomorphic between the parents
and the subcommands. I like the current design where commands and
subcommands are the same thing, and I'm not in a rush to disturb that.
I don't think the overhead cost of the arena allocator itself, which
can be measured in double digit bytes, is a particularly steep price
to pay.
enumToInt changed to intFromEnum, and the casting builtins figured out
how to automagically infer the cast type. This results in some minor
simplification, which is nice.
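For reference, a minimal sketch of the new spellings (the `Color` enum
is made up for illustration, and the "formerly" comments show the
pre-rename forms):

```zig
const std = @import("std");

const Color = enum(u8) { red, green, blue };

test "renamed builtins with inferred cast types" {
    // Formerly @enumToInt(Color.blue).
    const raw = @intFromEnum(Color.blue);
    try std.testing.expectEqual(@as(u8, 2), raw);

    // Formerly @intCast(u4, wide). The destination type now comes from
    // the result location instead of an explicit first argument.
    const wide: u32 = 2;
    const narrow: u4 = @intCast(wide);
    try std.testing.expectEqual(@as(u4, 2), narrow);

    // Formerly @intToEnum(Color, raw).
    const color: Color = @enumFromInt(raw);
    try std.testing.expectEqual(Color.blue, color);
}
```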
At some point (probably during the llvm 16 upgrade, though I haven't
done the legwork to actually narrow it down), zig developed a crash
around the way inline was used here. Since using `inline` was an air
quotes optimization, we can just chuck the designation for the time
being so that compilation will succeed. This may remove more inlines
than is strictly necessary, but I am bravely willing to make that
sacrifice.
See: ziglang/zig#15668
Why am I inventing my own documentation format? It's hard to say, but
it's probably because I Am Stupid. The main problem here will be lack
of automatic hyperlinks generated in the documentation. Oh well, it's
experimental.
This is an interesting change. While I think generally passing in
constant userdata is not terribly useful, the previous implementation
precluded it entirely. Interface types, for example, are often passed
directly and stored as constants (they hold pointers to their mutable
state).
Since we type erase this so it can be bound to the generic interface
object, non-pointer objects must be passed by reference to avoid
binding the parser interface to a temporary stack copy of the object.
This means we have to handle these cases slightly differently. Also,
while technically being classified as pointers, slices don't really
behave like pointers, which is understandable but annoying. There's a
bit of asymmetry here, as CommandBuilder(*u32) and CommandBuilder(u32)
both require an *u32 when binding the parser interface. This is
of course because pointers do not need to be rewrapped to be type
erased. The same code path could be used for both cases, but then the
user would have to pass in a pointer to a pointer, which actually
looks a bit silly because then it potentially means having to
do &&my_var.
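A stripped-down sketch of the asymmetry, using the 0.11-style cast
builtins. `recoverValue` and `recoverPointer` are hypothetical helpers
standing in for the two code paths. Both start from the same erased
`*u32`:

```zig
const std = @import("std");

// Context type is `u32`: the interface stores a pointer to the caller's
// value, and the callback receives a copy read through that pointer.
fn recoverValue(erased: *anyopaque) u32 {
    const ptr: *u32 = @ptrCast(@alignCast(erased));
    return ptr.*;
}

// Context type is `*u32`: the pointer is the context, so it is handed
// back as-is with no extra level of indirection.
fn recoverPointer(erased: *anyopaque) *u32 {
    return @ptrCast(@alignCast(erased));
}

test "both context types bind from a *u32" {
    var counter: u32 = 41;

    // The same expression, &counter, is what gets bound in either case.
    const erased: *anyopaque = &counter;

    recoverPointer(erased).* += 1;
    try std.testing.expectEqual(@as(u32, 42), recoverValue(erased));
}
```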
This recognizes block labels. Actually implementing this successfully
took more attempts than I'd like to admit. I originally had a
streaming version using a tail queue, but it had problems that seemed
to be intractable. So instead, everything is just jammed in an
arraylist and processed as a whole once the tokenizing is complete.
This increases peak memory usage, since all the intermediates for the
whole file are stored during tokenizing, but it does work, and frankly
I'm ok
with it using a few MB of memory. It can tokenize itself.
Even though the goal is for this to be run with an arena allocator,
nothing is currently enforcing that, so we should try to keep tidy.
This still leaks memory like crazy without an arena allocator, though.
Turns out the line wrapping logic is kind of ugly. I think probably a
couple of helper functions would make a big difference. But it appears
to work and even handles the edge cases I've currently encountered.