12 Commits

Author SHA1 Message Date
21a9753d46
parser: change omitted value behavior to work with all default values
Special casing optional values was a little odd before. Now, the user
can supply a default value for any field that may be omitted from the
serialized data. This behaves the same way as the stdlib JSON parser
as well.
2023-11-23 17:47:21 -08:00
1c5d7af552
parser: don't leak on parseTo error
A good idea.
2023-10-22 16:49:12 -07:00
f371aa281c
parser: default expect enum values with leading .
I prefer this, personally. And this is all about personal preference.
2023-10-22 16:48:45 -07:00
f371f16e2f
slam dunk that minimum viable product vibe 2023-10-22 16:16:57 -07:00
4690f0b808
parser: add option for case-insensitive scalar comparison
This does not support unicode case folding, which is very much a
sorry-not-sorry situation because unicode is a disgusting labyrinthine
chaotic hellformat. Actually, our unicode support isn't very good from
the standpoint that we don't do any form of normalization, so
specifying non-ASCII values for scalar comparisons is probably asking
for trouble.
2023-10-18 21:34:07 -07:00
34ec58e0d2
value: implement parsing to objects
There are still some untested codepaths here, but this does seem to
work for nontrivial objects, so, woohoo. It's worth noting that this
is a recursive implementation (which seems silly after I hand-rolled
the non-recursive main parser). The thinking is that if you have a
deeply-enough nested object that you run out of stack space here, you
probably shouldn't be converting it directly to an object.

I may revisit this, though I am still not 100% certain how
straightforward it would be to make this nonrecursive with all the
weird comptime objects. Basically the "parse stack" would have to be
created at comptime.
2023-10-03 23:17:37 -07:00
01f98f9aff
parser: start the arduous journey of hooking up diagnostics
The errors in the line buffer and tokenizer now have diagnostics. The
line number is trivial to keep track of due to the line buffer, but
the column index requires quite a bit of juggling, as we pass
successively trimmed down buffers to the internals of the parser.
There will probably be some column index counting problems in the
future. Also, handling the diagnostics is a bit awkward, since it's a
mandatory out-parameter of the parse functions now. The user must
provide a valid diagnostics object that survives for the life of the
parser.
2023-09-27 23:44:06 -07:00
3258e7fdb5
tokenizer: add finish function to check if there is trailing data
Since the tokenizer is decoupled from the parser, there's no good way
to do this. Also without attempting to parse the last line, it's
impossible to say if it is junk data or simply a missing trailing new
line.
2023-09-27 23:35:24 -07:00
0e60719c85
linebuffer: add strictness options
When the buffer was separated from the tokenizer, we lost some
validation, including really aggressive carriage return detection.
This brings this back in full force and adds some additional
validation on top of it.
2023-09-26 00:06:39 -07:00
7f82c24584
parser: implement streaming parser
With my pathological 50MiB 10_000 line nested list test, this is
definitely slower than the one shot parser, but it has peak memory
usage of 5MiB compared to the 120MiB of the one-shot parsing. Not bad.
Obviously this result is largely dependent on the fact that this
particular benchmark is 99% whitespace, which does not get copied into
the resulting document. A (significantly) smaller improvement will be
observed in files that are mostly data with little indentation or
empty lines.

But a win is a win.
2023-09-25 01:18:09 -07:00
1d65b072ee
parser: stateful reentrancy
finally the flow parser has been "integrated" with the main parser in
that they now share a stack. The bigger thing is that the parsing has
been decoupled from the tokenization, which will allow parsing
documents without loading them fully into memory first.

I've been calling this the streaming parser, but it's worth noting that
I am referring to streaming input, not streaming output. It would
certainly be possible to do streaming output, but I am not interested
in that at the moment (it would be the lowest-memory-overhead
approach, but it's a lot of work for little gain, and it is less
flexible for converting input to objects).
2023-09-24 22:24:33 -07:00
38e47b39dc
all: do some restructuring
I don't like big monolithic source files, so let's restructure a bit.
parser.zig is still bigger than I would like it to be, but there isn't
a good way to break up the two state machine parsers, which take up
most of the space. This is the last junk commit before I am seriously
going to implement the "streaming" parser. Which is the last change
before implementing deserialization to object. I am definitely not
just spinning my wheels here.
2023-09-24 18:22:12 -07:00