nice-data

Author	SHA1	Message	Date
torque	7db6094dd5	state/tokenizer: go completely the opposite direction re: whitespace. This commit makes both the parser and the tokenizer a lot more willing to accept whitespace in places where it previously caused strange behavior. Whitespace is now also ignored before and after all values and keys in flow-style objects (in regular objects, trailing whitespace is an error, and it is also an error for non-flow map keys to have whitespace before the colon). Tabs are no longer allowed as whitespace in the line. They can appear inside scalar values, though, including map keys, and strings may contain tabs as well. The primary motivation here is to apply the principle of least astonishment. For example, "-  [hello, there]" would previously have been parsed as the scalar " [hello, there]" because of the additional space after the "-" list item indicator. This obviously looks like a flow list, and the way it was previously parsed was very visually confusing (this change does mean that scalars cannot start with "[", but strings can, so this is not a real limitation). Note that strings still allow leading whitespace, so ">  hello" will produce the string " hello" because of the additional space after the string designator. For flow lists, "[ a, b ]" would have been parsed as ["a", "b "], which was obviously confusing. The previous commit addressed this by making the whitespace rules stricter; this commit addresses it by making them more relaxed. In particular, all whitespace preceding and following flow items is now stripped. The main motivation for going in this direction is to allow visually aligning list items across multiple lines, which can make data much easier for people to read, an explicit design goal. For example, "key: [ 1, 2, 3 ]" followed by "other: [ 10, 20, 30 ]" on the next line is now allowed. The indentation rules do not allow right-aligning "key" to "other", but I think that is acceptable (if we forced the use of tabs for indentation, we could actually allow this, which I think is at least worth considering). Flow maps are more generous: "foo: { bar: baz }" followed by "fooq: { barq: bazq }" is allowed because flow maps do not use whitespace as a structural designator. These changes do affect how some things can be represented. Scalar values can no longer contain leading or trailing whitespace (previously they could contain leading whitespace). Map keys cannot contain trailing whitespace (they could before); this also means that keys consisting only of whitespace cannot be represented at all. Ultimately, given the other restrictions the format imposes on keys and values, I find these acceptable and consistent with the goal of the format.	2023-10-04 22:54:53 -07:00
torque	01f98f9aff	parser: start the arduous journey of hooking up diagnostics. Errors in the line buffer and tokenizer now produce diagnostics. The line number is trivial to keep track of thanks to the line buffer, but the column index requires quite a bit of juggling, because we pass successively trimmed-down buffers to the internals of the parser. There will probably be some column index counting problems in the future. Handling the diagnostics is also a bit awkward, since the diagnostics object is now a mandatory out-parameter of the parse functions: the user must provide a valid diagnostics object that lives as long as the parser.	2023-09-27 23:44:06 -07:00
torque	3258e7fdb5	tokenizer: add finish function to check if there is trailing data. Since the tokenizer is decoupled from the parser, there is otherwise no good way to do this. Also, without attempting to parse the last line, it is impossible to say whether it is junk data or simply a line missing its trailing newline.	2023-09-27 23:35:24 -07:00
torque	0e60719c85	linebuffer: add strictness options. When the buffer was separated from the tokenizer, we lost some validation, including really aggressive carriage return detection. This commit brings that back in full force and adds some additional validation on top of it.	2023-09-26 00:06:39 -07:00
torque	38e47b39dc	all: do some restructuring. I don't like big monolithic source files, so let's restructure a bit. parser.zig is still bigger than I would like, but there isn't a good way to break up the two state machine parsers, which take up most of the space. This is the last junk commit before I seriously implement the "streaming" parser, which is the last change before implementing deserialization to objects. I am definitely not just spinning my wheels here.	2023-09-24 18:22:12 -07:00

5 Commits
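Commit 7db6094dd5 says that all whitespace before and after flow items is now stripped, so "[ a, b ]" yields ["a", "b"] rather than ["a", "b "]. What follows is a minimal Python sketch of that rule. It covers only a flat, single-line flow list. The function name parse_flat_flow_list is hypothetical, and this does not reproduce the repository's own parser, which parser.zig suggests is written in Zig. The sketch handles no nesting, quoting, or escapes. Its rejection of empty items is an assumption.

```python
def parse_flat_flow_list(text: str) -> list[str]:
    """Split a flat flow list such as "[ a, b ]" into ["a", "b"].

    Hypothetical sketch: nested lists, flow maps, strings, and escapes
    are not handled.
    """
    # Spaces around the whole list are insignificant.
    stripped = text.strip(" ")
    if not (stripped.startswith("[") and stripped.endswith("]")):
        raise ValueError(f"not a flow list: {text!r}")
    body = stripped[1:-1]
    if body.strip(" ") == "":
        return []  # "[]" or "[   ]" is an empty list
    items = []
    for raw in body.split(","):
        # Strip the spaces preceding and following each item.
        item = raw.strip(" ")
        # Tabs are not accepted as whitespace, so tab padding is an error,
        # but a tab inside a value is kept as part of the value.
        if item.startswith("\t") or item.endswith("\t"):
            raise ValueError(f"tab used as whitespace in {raw!r}")
        # Assumption: an empty item, as in "[a, , b]", is treated as an error.
        if item == "":
            raise ValueError(f"empty item in {text!r}")
        items.append(item)
    return items


print(parse_flat_flow_list("[ a, b ]"))        # ['a', 'b']
print(parse_flat_flow_list("[  1,  2,  3 ]"))  # ['1', '2', '3']
```

Only spaces are stripped. Padding such as "[  1,  2,  3 ]" can therefore line up visually with "[ 10, 20, 30 ]" on the next line, while a tab used as padding is rejected.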
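Commit 01f98f9aff notes two things. The column index takes juggling because successively trimmed buffers are handed to the parser internals. A caller-provided diagnostics object is also a mandatory out-parameter. One common way to handle both is to carry each trimmed slice's starting column with it and write errors into a record the caller passes in. The Python sketch below shows that approach under assumptions. The names Diagnostics, Span, fail, parse_key, and ParseError are hypothetical and are not the project's API. The whitespace-before-colon check follows the rule stated in commit 7db6094dd5.

```python
from dataclasses import dataclass


@dataclass
class Diagnostics:
    """Hypothetical caller-owned error record, filled in by the parse functions."""

    line: int = 0
    column: int = 0
    message: str = ""


@dataclass(frozen=True)
class Span:
    """A trimmed-down view of one source line that remembers where it starts."""

    text: str
    line: int    # 1-based line number, easy to track in a line buffer
    column: int  # 0-based column of text[0] within the original line

    def trim_left(self) -> "Span":
        """Drop leading spaces, advancing the column by the number removed."""
        trimmed = self.text.lstrip(" ")
        return Span(trimmed, self.line, self.column + len(self.text) - len(trimmed))

    def advance(self, count: int) -> "Span":
        """Drop the first `count` characters, advancing the column to match."""
        return Span(self.text[count:], self.line, self.column + count)


class ParseError(Exception):
    pass


def fail(diag: Diagnostics, span: Span, offset: int, message: str) -> ParseError:
    """Record an error at `offset` within `span`, in original-line coordinates."""
    diag.line = span.line
    diag.column = span.column + offset
    diag.message = message
    return ParseError(message)


def parse_key(span: Span, diag: Diagnostics) -> tuple[str, Span]:
    """Read "key:" from the start of `span`; return the key and the rest."""
    colon = span.text.find(":")
    if colon <= 0:
        raise fail(diag, span, 0, "expected a map key followed by ':'")
    if span.text[colon - 1] == " ":
        # Whitespace before the colon is an error for non-flow map keys.
        raise fail(diag, span, colon - 1, "whitespace before ':'")
    return span.text[:colon], span.advance(colon + 1).trim_left()


diag = Diagnostics()
line = Span("  key  : value", line=3, column=0).trim_left()  # starts at column 2
try:
    parse_key(line, diag)
except ParseError:
    print(diag)  # Diagnostics(line=3, column=6, message="whitespace before ':'")
```

Take "  key  : value" on line 3 as an example. Trimming the indentation yields a span that starts at column 2. The reported 0-based column is 6, the space just before the colon, even though parse_key only ever sees the trimmed text.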
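Commits 0e60719c85 and 3258e7fdb5 describe two related features. The first adds line-buffer strictness options with aggressive carriage return detection. The second adds a finish function that checks for trailing data. The Python sketch below folds both ideas into one hypothetical LineBuffer class. The allow_crlf flag is an assumed example of a strictness setting, because the commits do not name the actual options. Here finish() belongs to the buffer, whereas the commit places it on the tokenizer. finish() only returns the unterminated remainder. As the commit explains, deciding whether that remainder is junk or a final line missing its newline requires trying to parse it.

```python
class LineBufferError(ValueError):
    pass


class LineBuffer:
    """Hypothetical incremental line splitter with one strictness option."""

    def __init__(self, allow_crlf: bool = False) -> None:
        # allow_crlf is an assumed example option: when False, any carriage
        # return is rejected; when True, only a CR directly before LF is allowed.
        self.allow_crlf = allow_crlf
        self._pending = ""
        self._line_number = 0

    def _check(self, line: str) -> str:
        """Validate one complete line (its LF already removed)."""
        if self.allow_crlf and line.endswith("\r"):
            line = line[:-1]  # strip the CR of a CRLF terminator
        if "\r" in line:
            raise LineBufferError(f"line {self._line_number}: stray carriage return")
        return line

    def feed(self, chunk: str) -> list[str]:
        """Append a chunk and return every line that is now complete."""
        self._pending += chunk
        # Everything before the last LF is complete; the tail stays pending.
        *complete, self._pending = self._pending.split("\n")
        lines = []
        for raw in complete:
            self._line_number += 1
            lines.append(self._check(raw))
        return lines

    def finish(self) -> str:
        """Return trailing data never terminated by a newline ("" if none)."""
        leftover, self._pending = self._pending, ""
        if "\r" in leftover:
            raise LineBufferError("stray carriage return in trailing data")
        return leftover


buf = LineBuffer()
print(buf.feed("a: 1\nb:"))  # ['a: 1']
print(buf.feed(" 2\nc: 3"))  # ['b: 2']
print(buf.finish())          # c: 3
```

The carriage return check only runs on complete lines. A CRLF split across two feed() calls, such as "x\r" followed by "\n", is therefore still accepted when allow_crlf is set.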