torque/lore - lore - epicyclic.dev

7 Commits 1 Branch 0 Tags

Author	SHA1	Message	Date
torque	ab580fa80a	config: differentiate fields in Value This makes handling Value very slightly more work, but it provides useful metadata that can be used to perform better conversion and serialization. The motivation behind the "scalar" type is that in general, only scalars can be coerced to other types. For example, a scalar `null` and a string `> null` have the same in-memory representation. If they are treated identically, this precludes unambiguously converting an optional string whose contents are "null". With the two disambiguated, we can choose to convert `null` to the null object and `> null` to a string of contents "null". This ambiguity does not necessarily exist for the standard boolean values `true` and `false`, but it does allow the conversion to be more strict, and it will theoretically result in documents that read more naturally. The motivation behind exposing flow_list and flow_map is that it will allow preserving document formatting round trip (well, this isn't strictly true: single line explicit strings remember neither whether they were line strings or space strings nor whether they were indented. However, that is much less information to lose). The following formulations will parse to the same indistinguishable value: key: > value key: > value key: | value key: | value I think that's okay. It's a lot easier to choose a canonical form for this case than it is for a map/list without any hints regarding its origin.	2023-09-22 01:00:17 -07:00
torque	b18326a07a	config: start doing some code cleanup I was pretty sloppy with the code organization while writing out the state machines because my focus was on thinking through the parsing process and logic there. However, the code was not in good shape to continue implementing code features (not document features). This is the first of probably several commits that will work on cleaning up some things. Value has been promoted to the top level namespace, and Document has an initializer function. Referencing Value.List and Value.Map is much cleaner now. Type aliases are good. For the flow parser, `popStack` does not have to access anything except the current stack. This can be passed in as a parameter. This means that `parse` is ready to be refactored to take a buffer and an allocator. The main next steps for code improvement are: 1. reentrant/streaming parser. I am planning to leave it as line-buffered, though I could go further. Line-buffered has two main benefits: the tokenizer doesn't need to be refactored significantly, and the flow parser doesn't need to be made reentrant. I may reevaluate this as I am implementing it, however, as those changes may be simpler than I think. 2. Actually implement the error diagnostics info. I have some skeleton structure in place for this, so it should just be doing the work of getting it hooked up. 3. Parse into object. Metaprogramming, let's go. It will be interesting to try to do this non-recursively, as well (curious to see if it results in code bloat). 4. Object to Document. This is probably going to be annoying, since there are a variety of edge cases that will have to be handled. And lots of objects that cannot be represented as documents. 5. Serialize Document. One thing the parser does not preserve is whether a Value was flow-style or not, so it will be impossible to do round-trip formatting preservation. That's currently a non-goal, and I haven't decided yet if flow-style output should be based on some heuristic (number/length of values in container) or just never emitted. Lack of round-trip preservation does make using this as a general purpose config format a lot more dubious, so I will have to think about this some more. 6. Document to JSON. Why not? I will hand roll this and it will suck. And then everything will be perfect and never need to be touched again.	2023-09-22 00:53:26 -07:00
torque	cd05097a78	config: add terminated strings This was the final feature I wanted to add to the format. Also some other things have been cleaned up a little bit (for example, the inline parser does not need the dangling key to be attached to each stack level just like the normal parser doesn't). There was also an off-by-one error that bugged out detecting the pathological case of a flow list consisting of only an empty string (`[ ]`, not to be mistaken for the empty list `[]`). Mixed multiline strings are a bit confusing but internally consistent. > what character does this string end with? | ends with a newline character because that's the style of the second-to-last line. However, seeing | last makes my brain think it should end with a space. The reason it ends with a newline is because our concatenation strategy consists of appending to the string early (as soon as a line is added) rather than lazily. This is a tradeoff, though. While lazy appending would make this result more intuitive (the string would end with a space) and it would allow us to remove the self-proclaimed cheesy hack, it would make the opposite boundary condition confusing: > | what character does this string start with? With lazy appending, this string would start with a space (despite > making it look like it should have a leading newline). While both of these are likely to be uncommon edge cases, it doesn't seem we can have it both ways. Of the two options, I think the current logic is a little bit more clear.	2023-09-22 00:53:26 -07:00
torque	8b5a0114ef	config: allow nested flow structures This was kind of a pain in the butt to implement because it basically required a second full state machine parser (though this one is a bit simpler since there are fewer possible value types). It seems likely to me that I will probably shove this directly into the main parser struct at some point in the near future.	2023-09-17 19:47:18 -07:00
torque	ec875ef1f7	config: fix several things There was no actual check that lines weren't being indented too far. Inline strings weren't having their trailing newline get chopped. Printing is still janky, but it's better than it was.	2023-09-17 19:30:58 -07:00
torque	c8375d6d3a	mostly functioning config parser This hand-rolled wonder of switch statements is capable of parsing a 5 byte document in less than a gigasecond. This was an interesting exercise in writing a non-recursive parser for a nested structure format. There's a lot of very slightly different repetition, which I'm not wild about, but it can handle deeply nested documents. I tested with a 50 MB indented list tree document (10_000 lines of nesting) and a ReleaseFast build was able to parse it in approximately 50 ms with a peak memory footprint of about 100 MB (of which, half was the contents of the document itself, as the file is read into a single allocated buffer that does not get freed until program exit). I don't consider myself to be someone who writes high performance software, but I think those results are quite acceptable, and I doubt any recursive implementation would even be able to parse that document at all (the Python NestedText implementation smashes directly into a RecursionError, unsurprisingly). Anyway, let's call this a success. I will actually probably export this to a separate project soon. The main problem is coming up with a name. I also strongly suspect there are some lurking bugs still, and I think I do want to add nested inline map/list support (and also parsing directly into objects).	2023-09-14 23:42:31 -07:00
torque	bff1fa95c9	ignominious dump of creation it's beautiful. and broken	2023-09-13 00:11:45 -07:00
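
The early versus lazy appending tradeoff described in commit cd05097a78 ("config: add terminated strings") can be sketched roughly as follows. This is an illustration in Python, not the parser's actual code: the function names `join_early` and `join_lazy` and the `(style, text)` line representation are hypothetical. It assumes, based on the commit message, that a `>` line is terminated by a newline and a `|` line by a space.

```python
# Hypothetical sketch of the two concatenation strategies; not the real parser.
# Assumption from the commit message: ">" lines end in "\n", "|" lines in " ".
TERMINATOR = {">": "\n", "|": " "}


def join_early(lines):
    """Append each line with its own terminator as soon as it is seen,
    then drop the final terminator once the string is complete."""
    out = ""
    for style, text in lines:
        out += text + TERMINATOR[style]
    return out[:-1] if lines else out


def join_lazy(lines):
    """Defer the separator until the next line arrives, so the separator
    is chosen by the style of the line that follows it."""
    out = ""
    for index, (style, text) in enumerate(lines):
        if index > 0:
            out += TERMINATOR[style]
        out += text
    return out


# The two boundary cases from the commit message, as (style, text) pairs.
ending = [(">", "what character does this string end with?"), ("|", "")]
starting = [(">", ""), ("|", "what character does this string start with?")]

print(repr(join_early(ending)[-1]), repr(join_lazy(ending)[-1]))
print(repr(join_early(starting)[0]), repr(join_lazy(starting)[0]))
```

Under these assumptions, early appending ends the first string with a newline and starts the second with a newline, while lazy appending gives a space in both positions, which matches the behavior the commit message describes for each strategy.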