Compare commits


3 Commits

Author SHA1 Message Date
6c1eb176be config: differentiate fields in Value
This makes handling Value very slightly more work, but it provides
useful metadata that can be used to perform better conversion and
serialization.

The motivation behind the "scalar" type is that in general, only
scalars can be coerced to other types. For example, a scalar `null`
and a string `> null` have the same in-memory representation. If they
are treated identically, this precludes unambiguously converting an
optional string whose contents are "null". With the two disambiguated,
we can choose to convert `null` to the null object and `> null` to a
string of contents "null". This ambiguity does not necessarily exist for
the standard boolean values `true` and `false`, but it does allow the
conversion to be more strict, and it will theoretically result in
documents that read more naturally.
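
As a rough illustration of the disambiguation described above, the sketch below converts the two leaf kinds to an optional string. `Leaf` and `toOptionalString` are hypothetical names made up for this example (they are not part of this change), and the slices are borrowed rather than owned to keep it short.

```zig
const std = @import("std");

// Hypothetical, simplified stand-in for the parser's Value: only the two
// leaf variants, holding borrowed slices instead of ArrayLists.
const Leaf = union(enum) {
    scalar: []const u8,
    string: []const u8,
};

// Hypothetical conversion helper: only a scalar `null` becomes the null
// object; an explicit string keeps its contents verbatim.
fn toOptionalString(leaf: Leaf) ?[]const u8 {
    return switch (leaf) {
        .scalar => |text| if (std.mem.eql(u8, text, "null")) null else text,
        .string => |text| text,
    };
}

test "scalar null and string null convert differently" {
    try std.testing.expect(toOptionalString(.{ .scalar = "null" }) == null);
    const kept = toOptionalString(.{ .string = "null" }) orelse return error.UnexpectedNull;
    try std.testing.expectEqualStrings("null", kept);
}
```

Without the scalar/string distinction, both inputs would have to take the same branch, which is the ambiguity the new tags are meant to remove.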

The motivation behind exposing flow_list and flow_map is that it will
allow preserving document formatting round trip (well, this isn't
strictly true: single-line explicit strings remember neither whether
they were line strings or space strings nor whether they were
indented. However, that is much less information to lose).

The following formulations will parse to the same indistinguishable
value:

  key: > value
  key:
    > value
  key: | value
  key:
    | value

I think that's okay. It's a lot easier to choose a canonical form for
this case than it is for a map/list without any hints regarding its
origin.
2023-09-19 00:25:47 -07:00
db8f6b6345 config: start doing some code cleanup
I was pretty sloppy with the code organization while writing out the
state machines because my focus was on thinking through the parsing
process and logic there. However, the code was not in good shape to
continue implementing code features (not document features). This is
the first of probably several commits that will work on cleaning up
some things.

Value has been promoted to the top level namespace, and Document has an
initializer function. Referencing Value.List and Value.Map is much
cleaner now. Type aliases are good.

For the flow parser, `popStack` does not have to access anything except
the current stack. This can be passed in as a parameter. This means
that `parse` is ready to be refactored to take a buffer and an
allocator.

The main next steps for code improvement are:

1. reentrant/streaming parser. I am planning to leave it as
   line-buffered, though I could go further. Line-buffered has two main
   benefits: the tokenizer doesn't need to be refactored significantly,
   and the flow parser doesn't need to be made reentrant. I may
   reevaluate this as I am implementing it, however, as those changes
   may be simpler than I think.

2. Actually implement the error diagnostics info. I have some skeleton
   structure in place for this, so it should just be doing the work of
   getting it hooked up.

3. Parse into object. Metaprogramming, let's go. It will be interesting
   to try to do this non-recursively, as well (curious to see if it
   results in code bloat).

4. Object to Document. This is probably going to be annoying, since
   there are a variety of edge cases that will have to be handled. And
   lots of objects that cannot be represented as documents.

5. Serialize Document. One thing the parser does not preserve is
   whether a Value was flow-style or not, so it will be impossible to
   do round-trip formatting preservation. That's currently a non-goal,
   and I haven't decided yet if flow-style output should be based on
   some heuristic (number/length of values in container) or just never
   emitted. Lack of round-trip preservation does make using this as a
   general purpose config format a lot more dubious, so I will have to
   think about this some more.

6. Document to JSON. Why not? I will hand roll this and it will suck.

And then everything will be perfect and never need to be touched again.
2023-09-18 00:01:36 -07:00
3b68f1dc7a config: add terminated strings
This was the final feature I wanted to add to the format. Also some
other things have been cleaned up a little bit (for example, the
inline parser does not need the dangling key to be attached to each
stack level just like the normal parser doesn't). There was also an
off-by-one error that bugged out detecting the pathological case of a
flow list consisting of only an empty string (`[ ]`, not to be
mistaken for the empty list `[]`).
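
A minimal sketch of how the terminated-string slicing in the tokenizer behaves, pulled out into a standalone function. `stringContents` is a hypothetical name used only for this example, and it assumes the caller has already established that the buffer is non-empty and begins with `>` or `|`.

```zig
const std = @import("std");

// Hypothetical standalone helper mirroring the tokenizer's slicing logic.
// Assumes `buf` is non-empty and begins with '>' or '|'.
fn stringContents(buf: []const u8) error{ BadToken, TrailingWhitespace }![]const u8 {
    std.debug.assert(buf.len > 0);
    // the leader must be followed by a space unless it is alone on the line
    if (buf.len > 1 and buf[1] != ' ') return error.BadToken;
    const start = @min(2, buf.len);
    return switch (buf[buf.len - 1]) {
        ' ', '\t' => error.TrailingWhitespace,
        // drop the terminator, unless the '|' is the lone leader character
        '|' => buf[start .. buf.len - @intFromBool(buf.len > 1)],
        else => buf[start..],
    };
}

test "terminator preserves trailing whitespace" {
    try std.testing.expectEqualStrings(
        "this string has trailing whitespace ",
        try stringContents("| this string has trailing whitespace |"),
    );
    try std.testing.expectError(error.TrailingWhitespace, stringContents("> oops "));
}
```

The terminator is the only way to keep trailing whitespace, since unterminated trailing whitespace is rejected outright.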

Mixed multiline strings are a bit confusing but internally consistent.

    > what character does this string end with?
    |

ends with a newline character because that's the style of the
second-to-last line. However, seeing | last makes my brain think it
should end with a space. The reason it ends with a newline is because
our concatenation strategy consists of appending to the string early
(as soon as a line is added) rather than lazily. This is a tradeoff,
though: while lazy appending would make this result more intuitive
(the string would end with a space) and it would allow us to remove
the self-proclaimed cheesy hack, it would make the opposite boundary
condition confusing:

    >
    | what character does this string start with?

With lazy appending, this string would start with a space
(despite > making it look like it should have a leading
newline). While both of these are likely to be uncommon edge cases, it
doesn't seem we can have it both ways. Of the two options, I think the
current logic is a little bit more clear.
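
The two strategies can be compared with a small sketch. This is not the parser's code: `Style`, `Line`, `joinEager` and `joinLazy` are hypothetical names for this example, the input is pre-tokenized, and the caller-supplied buffer is assumed to be large enough for all text plus one ending byte per line.

```zig
const std = @import("std");

const Style = enum { line, space }; // '>' and '|' respectively
const Line = struct { style: Style, text: []const u8 };

fn lineEnding(style: Style) u8 {
    return switch (style) {
        .line => '\n',
        .space => ' ',
    };
}

// Eager: every line is followed at once by its own ending; the final
// ending is dropped when the string is complete.
fn joinEager(buf: []u8, lines: []const Line) []const u8 {
    var len: usize = 0;
    for (lines) |line| {
        @memcpy(buf[len..][0..line.text.len], line.text);
        len += line.text.len;
        buf[len] = lineEnding(line.style);
        len += 1;
    }
    return buf[0 .. len -| 1];
}

// Lazy: the separator is only written once the next line arrives, so it
// takes the style of that next line.
fn joinLazy(buf: []u8, lines: []const Line) []const u8 {
    var len: usize = 0;
    for (lines, 0..) |line, idx| {
        if (idx > 0) {
            buf[len] = lineEnding(line.style);
            len += 1;
        }
        @memcpy(buf[len..][0..line.text.len], line.text);
        len += line.text.len;
    }
    return buf[0..len];
}

test "the two strategies disagree at opposite boundaries" {
    var buf: [128]u8 = undefined;
    const ends = [_]Line{
        .{ .style = .line, .text = "what character does this string end with?" },
        .{ .style = .space, .text = "" },
    };
    try std.testing.expect(std.mem.endsWith(u8, joinEager(&buf, &ends), "?\n"));
    try std.testing.expect(std.mem.endsWith(u8, joinLazy(&buf, &ends), "? "));

    const starts = [_]Line{
        .{ .style = .line, .text = "" },
        .{ .style = .space, .text = "what character does this string start with?" },
    };
    try std.testing.expect(std.mem.startsWith(u8, joinEager(&buf, &starts), "\nwhat"));
    try std.testing.expect(std.mem.startsWith(u8, joinLazy(&buf, &starts), " what"));
}
```

Each strategy gives the intuitive answer at one boundary and the surprising one at the other, which is the tradeoff described above.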
2023-09-17 23:09:26 -07:00


@@ -54,6 +54,9 @@
 // Like multiline strings, the final space is stripped (I guess this is a very
 // janky way to add trailing whitespace to a string).
 //
+// - terminated strings to allow trailing whitespace:
+// | this string has trailing whitespace |
+// > and so does this one |
 // - The parser is both strict and probably sloppy and may have weird edge
 // cases since I'm slinging code, not writing a spec. For example, tabs are
 // not trimmed from the values of inline lists/maps
@@ -94,10 +97,19 @@ pub const LineTokenizer = struct {
 const InlineItem = union(enum) {
 empty: void,
 scalar: []const u8,
-string: []const u8,
+line_string: []const u8,
+space_string: []const u8,
 flow_list: []const u8,
 flow_map: []const u8,
+fn lineEnding(self: InlineItem) u8 {
+return switch (self) {
+.line_string => '\n',
+.space_string => ' ',
+else => unreachable,
+};
+}
 };
 const LineContents = union(enum) {
@@ -288,26 +300,38 @@ pub const LineTokenizer = struct {
 if (buf.len == 0) return .empty;
 switch (buf[0]) {
-'|', '>' => |char| {
+'>', '|' => |char| {
 if (buf.len > 1 and buf[1] != ' ') return error.BadToken;
-return .{
-.string = buf.ptr[@min(2, buf.len) .. buf.len + @intFromBool(char == '|')],
+const slice: []const u8 = switch (buf[buf.len - 1]) {
+' ', '\t' => return error.TrailingWhitespace,
+'|' => buf[@min(2, buf.len) .. buf.len - @intFromBool(buf.len > 1)],
+else => buf[@min(2, buf.len)..buf.len],
 };
+return if (char == '>')
+.{ .line_string = slice }
+else
+.{ .space_string = slice };
 },
 '[' => {
-if (buf.len < 2 or buf[buf.len - 1] != ']') return error.BadToken;
+if (buf.len < 2 or buf[buf.len - 1] != ']')
+return error.BadToken;
 // keep the closing ] for the flow parser
 return .{ .flow_list = buf[1..] };
 },
 '{' => {
-if (buf.len < 2 or buf[buf.len - 1] != '}') return error.BadToken;
+if (buf.len < 2 or buf[buf.len - 1] != '}')
+return error.BadToken;
 // keep the closing } fpr the flow parser
 return .{ .flow_map = buf[1..] };
 },
 else => {
+if (buf[buf.len - 1] == ' ' or buf[buf.len - 1] == '\t')
+return error.TrailingWhitespace;
 return .{ .scalar = buf };
 },
 }
@@ -326,6 +350,127 @@ pub const LineTokenizer = struct {
 }
 };
+pub const Value = union(enum) {
+pub const String = std.ArrayList(u8);
+pub const Map = std.StringHashMap(Value);
+pub const List = std.ArrayList(Value);
+pub const TagType = @typeInfo(Value).Union.tag_type.?;
+scalar: String,
+string: String,
+list: List,
+flow_list: List,
+map: Map,
+flow_map: Map,
+pub inline fn fromScalar(alloc: std.mem.Allocator, input: []const u8) !Value {
+return try _fromScalarOrString(alloc, .scalar, input);
+}
+pub inline fn fromString(alloc: std.mem.Allocator, input: []const u8) !Value {
+return try _fromScalarOrString(alloc, .string, input);
+}
+inline fn _fromScalarOrString(alloc: std.mem.Allocator, comptime classification: TagType, input: []const u8) !Value {
+var res = @unionInit(Value, @tagName(classification), try String.initCapacity(alloc, input.len));
+@field(res, @tagName(classification)).appendSliceAssumeCapacity(input);
+return res;
+}
+pub inline fn newScalar(alloc: std.mem.Allocator) Value {
+return .{ .scalar = String.init(alloc) };
+}
+pub inline fn newString(alloc: std.mem.Allocator) Value {
+return .{ .string = String.init(alloc) };
+}
+pub inline fn newList(alloc: std.mem.Allocator) Value {
+return .{ .list = List.init(alloc) };
+}
+pub inline fn newFlowList(alloc: std.mem.Allocator) Value {
+return .{ .flow_list = List.init(alloc) };
+}
+pub inline fn newMap(alloc: std.mem.Allocator) Value {
+return .{ .map = Map.init(alloc) };
+}
+pub inline fn newFlowMap(alloc: std.mem.Allocator) Value {
+return .{ .flow_map = Map.init(alloc) };
+}
+pub fn printDebug(self: Value) void {
+self.printRecursive(0);
+std.debug.print("\n", .{});
+}
+fn printRecursive(self: Value, indent: usize) void {
+switch (self) {
+.scalar, .string => |str| {
+if (std.mem.indexOfScalar(u8, str.items, '\n')) |_| {
+var lines = std.mem.splitScalar(u8, str.items, '\n');
+std.debug.print("\n", .{});
+while (lines.next()) |line| {
+std.debug.print(
+"{[empty]s: >[indent]}{[line]s}{[nl]s}",
+.{
+.empty = "",
+.indent = indent,
+.line = line,
+.nl = if (lines.peek() == null) "" else "\n",
+},
+);
+}
+} else {
+std.debug.print("{s}", .{str.items});
+}
+},
+.list, .flow_list => |list| {
+if (list.items.len == 0) {
+std.debug.print("[]", .{});
+return;
+}
+std.debug.print("[\n", .{});
+for (list.items, 0..) |value, idx| {
+std.debug.print("{[empty]s: >[indent]}[{[idx]d}] = ", .{ .empty = "", .indent = indent, .idx = idx });
+value.printRecursive(indent + 2);
+std.debug.print(",\n", .{});
+}
+std.debug.print(
+"{[empty]s: >[indent]}]",
+.{ .empty = "", .indent = indent },
+);
+},
+.map, .flow_map => |map| {
+if (map.count() == 0) {
+std.debug.print("{{}}", .{});
+return;
+}
+std.debug.print("{{\n", .{});
+var iter = map.iterator();
+while (iter.next()) |entry| {
+std.debug.print(
+"{[empty]s: >[indent]}{[key]s}: ",
+.{ .empty = "", .indent = indent + 2, .key = entry.key_ptr.* },
+);
+entry.value_ptr.printRecursive(indent + 4);
+std.debug.print(",\n", .{});
+}
+std.debug.print(
+"{[empty]s: >[indent]}}}",
+.{ .empty = "", .indent = indent },
+);
+},
+}
+}
+};
 pub const Parser = struct {
 allocator: std.mem.Allocator,
 dupe_behavior: DuplicateKeyBehavior = .fail,
@@ -359,102 +504,6 @@ pub const Parser = struct {
 fail,
 };
-pub const Map = std.StringHashMap;
-pub const List = std.ArrayList;
-pub const Value = union(enum) {
-string: std.ArrayList(u8),
-list: List(Value),
-map: Map(Value),
-pub inline fn fromString(alloc: std.mem.Allocator, input: []const u8) !Value {
-var res: Value = .{ .string = try std.ArrayList(u8).initCapacity(alloc, input.len) };
-res.string.appendSliceAssumeCapacity(input);
-return res;
-}
-pub inline fn newString(alloc: std.mem.Allocator) Value {
-return .{ .string = std.ArrayList(u8).init(alloc) };
-}
-pub inline fn newList(alloc: std.mem.Allocator) Value {
-return .{ .list = List(Value).init(alloc) };
-}
-pub inline fn newMap(alloc: std.mem.Allocator) Value {
-return .{ .map = Map(Value).init(alloc) };
-}
-pub fn printDebug(self: Value) void {
-self.printRecursive(0);
-std.debug.print("\n", .{});
-}
-fn printRecursive(self: Value, indent: usize) void {
-switch (self) {
-.string => |str| {
-if (std.mem.indexOfScalar(u8, str.items, '\n')) |_| {
-var lines = std.mem.splitScalar(u8, str.items, '\n');
-std.debug.print("\n", .{});
-while (lines.next()) |line| {
-std.debug.print(
-"{[empty]s: >[indent]}{[line]s}{[nl]s}",
-.{
-.empty = "",
-.indent = indent,
-.line = line,
-.nl = if (lines.peek() == null) "" else "\n",
-},
-);
-}
-} else {
-std.debug.print("{s}", .{str.items});
-}
-},
-.list => |list| {
-if (list.items.len == 0) {
-std.debug.print("[]", .{});
-return;
-}
-std.debug.print("[\n", .{});
-for (list.items, 0..) |value, idx| {
-std.debug.print("{[empty]s: >[indent]}[{[idx]d}] = ", .{ .empty = "", .indent = indent, .idx = idx });
-value.printRecursive(indent + 2);
-std.debug.print(",\n", .{});
-}
-std.debug.print(
-"{[empty]s: >[indent]}]",
-.{ .empty = "", .indent = indent },
-);
-},
-.map => |map| {
-if (map.count() == 0) {
-std.debug.print("{{}}", .{});
-return;
-}
-std.debug.print("{{\n", .{});
-var iter = map.iterator();
-while (iter.next()) |entry| {
-std.debug.print(
-"{[empty]s: >[indent]}{[key]s}: ",
-.{ .empty = "", .indent = indent + 2, .key = entry.key_ptr.* },
-);
-entry.value_ptr.printRecursive(indent + 4);
-std.debug.print(",\n", .{});
-}
-std.debug.print(
-"{[empty]s: >[indent]}}}",
-.{ .empty = "", .indent = indent },
-);
-},
-}
-}
-};
 pub const ParseState = enum {
 initial,
 value,
@@ -465,6 +514,13 @@ pub const Parser = struct {
 arena: std.heap.ArenaAllocator,
 root: Value,
+pub fn init(alloc: std.mem.Allocator) Document {
+return .{
+.arena = std.heap.ArenaAllocator.init(alloc),
+.root = undefined,
+};
+}
 pub fn printDebug(self: Document) void {
 return self.root.printDebug();
 }
@@ -474,11 +530,29 @@ pub const Parser = struct {
 }
 };
+pub const State = struct {
+pub const Stack = std.ArrayList(*Value);
+document: Document,
+value_stack: Stack,
+state: ParseState = .initial,
+expect_shift: LineTokenizer.ShiftDirection = .none,
+dangling_key: ?[]const u8 = null,
+pub fn init(alloc: std.mem.Allocator) State {
+return .{
+.document = Document.init(alloc),
+.value_stack = Stack.init(alloc),
+};
+}
+pub fn deinit(self: State) void {
+self.value_stack.deinit();
+}
+};
 pub fn parseBuffer(self: *Parser, buffer: []const u8) Error!Document {
-var document: Document = .{
-.arena = std.heap.ArenaAllocator.init(self.allocator),
-.root = undefined,
-};
+var document = Document.init(self.allocator);
 errdefer document.deinit();
 const arena_alloc = document.arena.allocator();
@@ -507,13 +581,18 @@ pub const Parser = struct {
 // empty scalars are only emitted for a list_item or a map_item
 .empty => unreachable,
 .scalar => |str| {
-document.root = try valueFromString(arena_alloc, str);
+document.root = try Value.fromScalar(arena_alloc, str);
+// this is a cheesy hack. If the document consists
+// solely of a scalar, the finalizer will try to
+// chop a line ending off of it, so we need to add
+// a sacrificial padding character to avoid
+// chopping off something that matters.
+try document.root.string.append(' ');
 state = .done;
 },
-.string => |str| {
-document.root = try valueFromString(arena_alloc, str);
-// cheesy technique for differentiating the different string types
-if (str[str.len - 1] != '\n') try document.root.string.append(' ');
+.line_string, .space_string => |str| {
+document.root = try Value.fromString(arena_alloc, str);
+try document.root.string.append(in_line.lineEnding());
 try stack.append(&document.root);
 state = .value;
 },
@@ -527,7 +606,7 @@ pub const Parser = struct {
 },
 },
 .list_item => |value| {
-document.root = .{ .list = List(Value).init(arena_alloc) };
+document.root = Value.newList(arena_alloc);
 try stack.append(&document.root);
 switch (value) {
@@ -535,8 +614,12 @@ pub const Parser = struct {
 expect_shift = .indent;
 state = .value;
 },
-.string, .scalar => |str| {
-try document.root.list.append(try valueFromString(arena_alloc, chopNewline(str)));
+.scalar => |str| {
+try document.root.list.append(try Value.fromScalar(arena_alloc, str));
+state = .value;
+},
+.line_string, .space_string => |str| {
+try document.root.list.append(try Value.fromString(arena_alloc, str));
 state = .value;
 },
 .flow_list => |str| {
@@ -550,7 +633,7 @@ pub const Parser = struct {
 }
 },
 .map_item => |pair| {
-document.root = .{ .map = Map(Value).init(arena_alloc) };
+document.root = Value.newMap(arena_alloc);
 try stack.append(&document.root);
 switch (pair.val) {
@@ -565,10 +648,16 @@ pub const Parser = struct {
 dangling_key = pair.key;
 state = .value;
 },
-.string, .scalar => |str| {
+.scalar => |str| {
 // we can do direct puts here because this is
 // the very first line of the document
-try document.root.map.put(pair.key, try valueFromString(arena_alloc, chopNewline(str)));
+try document.root.map.put(pair.key, try Value.fromScalar(arena_alloc, str));
+state = .value;
+},
+.line_string, .space_string => |str| {
+// we can do direct puts here because this is
+// the very first line of the document
+try document.root.map.put(pair.key, try Value.fromString(arena_alloc, str));
 state = .value;
 },
 .flow_list => |str| {
@@ -584,9 +673,20 @@ pub const Parser = struct {
 }
 },
 .value => switch (stack.getLast().*) {
+// these three states are never reachable here. flow_list and
+// flow_map are parsed with a separate state machine. These
+// value tyeps can only be present by themselves as the first
+// line of the document, in which case the document consists
+// only of that single line: this parser jumps immediately into
+// the .done state, bypassing the .value state in which this
+// switch is embedded.
+.scalar, .flow_list, .flow_map => unreachable,
 .string => |*string| {
-if (line.indent == .indent) return error.UnexpectedIndent;
+if (line.indent == .indent)
+return error.UnexpectedIndent;
 if (!flop and line.indent == .dedent) {
+// kick off the last trailing space or newline
 _ = string.pop();
 var dedent_depth = line.indent.dedent;
@@ -600,9 +700,9 @@ pub const Parser = struct {
 .comment => unreachable,
 .in_line => |in_line| switch (in_line) {
 .empty => unreachable,
-.string => |str| {
+.line_string, .space_string => |str| {
 try string.appendSlice(str);
-if (str[str.len - 1] != '\n') try string.append(' ');
+try string.append(in_line.lineEnding());
 },
 else => return error.UnexpectedValue,
 },
@@ -610,14 +710,21 @@ pub const Parser = struct {
 }
 },
 .list => |*list| {
+// detect that the previous item was actually empty
+//
+// -
+// - something
+//
+// the first line here creates the expect_shift, but the second line
+// is a valid continuation of the list despite not being indented
 if (expect_shift == .indent and line.indent != .indent)
-try list.append(try valueFromString(arena_alloc, ""));
+try list.append(Value.newScalar(arena_alloc));
 // Consider:
 //
 // -
 // own-line scalar
 // - inline scalar
 //
 // the own-line scalar will not push the stack but the next list item will be a dedent
 if (!flop and line.indent == .dedent) {
@@ -643,15 +750,14 @@ pub const Parser = struct {
 expect_shift = .dedent;
 switch (in_line) {
 .empty => unreachable,
-.scalar => |str| try list.append(try valueFromString(arena_alloc, str)),
+.scalar => |str| try list.append(try Value.fromScalar(arena_alloc, str)),
 .flow_list => |str| try list.append(try parseFlowList(arena_alloc, str, self.dupe_behavior)),
 .flow_map => |str| try list.append(try parseFlowMap(arena_alloc, str, self.dupe_behavior)),
-.string => |str| {
+.line_string, .space_string => |str| {
 // string pushes the stack
-const new_string = try appendListGetValue(list, try valueFromString(arena_alloc, str));
-if (str[str.len - 1] != '\n')
-try new_string.string.append(' ');
+const new_string = try appendListGetValue(list, try Value.fromString(arena_alloc, str));
+try new_string.string.append(in_line.lineEnding());
 try stack.append(new_string);
 expect_shift = .none;
@@ -665,7 +771,8 @@ pub const Parser = struct {
 expect_shift = .none;
 switch (value) {
 .empty => expect_shift = .indent,
-.scalar, .string => |str| try list.append(try valueFromString(arena_alloc, chopNewline(str))),
+.scalar => |str| try list.append(try Value.fromScalar(arena_alloc, str)),
+.line_string, .space_string => |str| try list.append(try Value.fromString(arena_alloc, str)),
 .flow_list => |str| try list.append(try parseFlowList(arena_alloc, str, self.dupe_behavior)),
 .flow_map => |str| try list.append(try parseFlowMap(arena_alloc, str, self.dupe_behavior)),
 }
@@ -675,13 +782,14 @@ pub const Parser = struct {
 if (expect_shift != .indent)
 return error.UnexpectedIndent;
-const new_list = try appendListGetValue(list, .{ .list = List(Value).init(arena_alloc) });
+const new_list = try appendListGetValue(list, Value.newList(arena_alloc));
 try stack.append(new_list);
 expect_shift = .none;
 switch (value) {
 .empty => expect_shift = .indent,
-.scalar, .string => |str| try new_list.list.append(try valueFromString(arena_alloc, chopNewline(str))),
+.scalar => |str| try new_list.list.append(try Value.fromScalar(arena_alloc, str)),
+.line_string, .space_string => |str| try new_list.list.append(try Value.fromString(arena_alloc, str)),
 .flow_list => |str| try new_list.list.append(try parseFlowList(arena_alloc, str, self.dupe_behavior)),
 .flow_map => |str| try new_list.list.append(try parseFlowMap(arena_alloc, str, self.dupe_behavior)),
 }
@@ -701,7 +809,7 @@ pub const Parser = struct {
 if (line.indent != .indent)
 return error.UnexpectedValue;
-const new_map = try appendListGetValue(list, .{ .map = Map(Value).init(arena_alloc) });
+const new_map = try appendListGetValue(list, Value.newMap(arena_alloc));
 try stack.append(new_map);
 expect_shift = .none;
@@ -710,7 +818,8 @@ pub const Parser = struct {
 dangling_key = pair.key;
 expect_shift = .indent;
 },
-.scalar, .string => |str| try new_map.map.put(pair.key, try valueFromString(arena_alloc, chopNewline(str))),
+.scalar => |str| try new_map.map.put(pair.key, try Value.fromScalar(arena_alloc, str)),
+.line_string, .space_string => |str| try new_map.map.put(pair.key, try Value.fromString(arena_alloc, str)),
 .flow_list => |str| try new_map.map.put(pair.key, try parseFlowList(arena_alloc, str, self.dupe_behavior)),
 .flow_map => |str| try new_map.map.put(pair.key, try parseFlowMap(arena_alloc, str, self.dupe_behavior)),
 }
@@ -718,11 +827,18 @@ pub const Parser = struct {
 }
 },
 .map => |*map| {
+// detect that the previous item was actually empty
+//
+// foo:
+// bar: baz
+//
+// the first line here creates the expect_shift, but the second line
+// is a valid continuation of the map despite not being indented
 if (expect_shift == .indent and line.indent != .indent) {
 try putMap(
 map,
 dangling_key orelse return error.Fail,
-try valueFromString(arena_alloc, ""),
+Value.newScalar(arena_alloc),
 self.dupe_behavior,
 );
 dangling_key = null;
@@ -749,15 +865,15 @@ pub const Parser = struct {
 switch (in_line) {
 .empty => unreachable,
-.scalar => |str| try putMap(map, dangling_key.?, try valueFromString(arena_alloc, str), self.dupe_behavior),
+.scalar => |str| try putMap(map, dangling_key.?, try Value.fromScalar(arena_alloc, str), self.dupe_behavior),
 .flow_list => |str| try putMap(map, dangling_key.?, try parseFlowList(arena_alloc, str, self.dupe_behavior), self.dupe_behavior),
 .flow_map => |str| {
 try putMap(map, dangling_key.?, try parseFlowMap(arena_alloc, str, self.dupe_behavior), self.dupe_behavior);
 },
-.string => |str| {
+.line_string, .space_string => |str| {
 // string pushes the stack
-const new_string = try putMapGetValue(map, dangling_key.?, try valueFromString(arena_alloc, str), self.dupe_behavior);
-if (str[str.len - 1] != '\n') try new_string.string.append(' ');
+const new_string = try putMapGetValue(map, dangling_key.?, try Value.fromString(arena_alloc, str), self.dupe_behavior);
+try new_string.string.append(in_line.lineEnding());
 try stack.append(new_string);
 expect_shift = .none;
 },
@@ -777,14 +893,15 @@ pub const Parser = struct {
 if (expect_shift != .indent or line.indent != .indent or dangling_key == null)
 return error.UnexpectedValue;
-const new_list = try putMapGetValue(map, dangling_key.?, .{ .list = List(Value).init(arena_alloc) }, self.dupe_behavior);
+const new_list = try putMapGetValue(map, dangling_key.?, Value.newList(arena_alloc), self.dupe_behavior);
 try stack.append(new_list);
 dangling_key = null;
 expect_shift = .none;
 switch (value) {
 .empty => expect_shift = .indent,
-.scalar, .string => |str| try new_list.list.append(try valueFromString(arena_alloc, chopNewline(str))),
+.scalar => |str| try new_list.list.append(try Value.fromScalar(arena_alloc, str)),
+.line_string, .space_string => |str| try new_list.list.append(try Value.fromString(arena_alloc, str)),
 .flow_list => |str| try new_list.list.append(try parseFlowList(arena_alloc, str, self.dupe_behavior)),
 .flow_map => |str| try new_list.list.append(try parseFlowMap(arena_alloc, str, self.dupe_behavior)),
 }
@@ -798,7 +915,8 @@ pub const Parser = struct {
 expect_shift = .indent;
 dangling_key = pair.key;
 },
-.scalar, .string => |str| try putMap(map, pair.key, try valueFromString(arena_alloc, chopNewline(str)), self.dupe_behavior),
+.scalar => |str| try putMap(map, pair.key, try Value.fromScalar(arena_alloc, str), self.dupe_behavior),
+.line_string, .space_string => |str| try putMap(map, pair.key, try Value.fromString(arena_alloc, str), self.dupe_behavior),
 .flow_list => |str| try putMap(map, pair.key, try parseFlowList(arena_alloc, str, self.dupe_behavior), self.dupe_behavior),
 .flow_map => |str| try putMap(map, pair.key, try parseFlowMap(arena_alloc, str, self.dupe_behavior), self.dupe_behavior),
 },
@@ -806,7 +924,7 @@ pub const Parser = struct {
 .indent => {
 if (expect_shift != .indent or dangling_key == null) return error.UnexpectedValue;
-const new_map = try putMapGetValue(map, dangling_key.?, .{ .map = Map(Value).init(arena_alloc) }, self.dupe_behavior);
+const new_map = try putMapGetValue(map, dangling_key.?, Value.newMap(arena_alloc), self.dupe_behavior);
 try stack.append(new_map);
 dangling_key = null;
@@ -815,7 +933,8 @@ pub const Parser = struct {
 expect_shift = .indent;
 dangling_key = pair.key;
 },
-.scalar, .string => |str| try new_map.map.put(pair.key, try valueFromString(arena_alloc, chopNewline(str))),
+.scalar => |str| try new_map.map.put(pair.key, try Value.fromScalar(arena_alloc, str)),
+.line_string, .space_string => |str| try new_map.map.put(pair.key, try Value.fromString(arena_alloc, str)),
 .flow_list => |str| try new_map.map.put(pair.key, try parseFlowList(arena_alloc, str, self.dupe_behavior)),
 .flow_map => |str| try new_map.map.put(pair.key, try parseFlowMap(arena_alloc, str, self.dupe_behavior)),
 }
@@ -837,17 +956,18 @@ pub const Parser = struct {
 switch (state) {
 .initial => switch (self.default_object) {
 .string => document.root = .{ .string = std.ArrayList(u8).init(arena_alloc) },
-.list => document.root = .{ .list = List(Value).init(arena_alloc) },
-.map => document.root = .{ .map = Map(Value).init(arena_alloc) },
+.list => document.root = Value.newList(arena_alloc),
+.map => document.root = Value.newMap(arena_alloc),
 .fail => return error.EmptyDocument,
 },
 .value => switch (stack.getLast().*) {
 // remove the final trailing newline or space
-.string => |*string| _ = string.popOrNull(),
+.scalar, .string => |*string| _ = string.popOrNull(),
 // if we have a dangling -, attach an empty string to it
-.list => |*list| if (expect_shift == .indent) try list.append(try valueFromString(arena_alloc, "")),
+.list => |*list| if (expect_shift == .indent) try list.append(Value.newScalar(arena_alloc)),
 // if we have a dangling "key:", attach an empty string to it
-.map => |*map| if (dangling_key) |dk| try putMap(map, dk, try valueFromString(arena_alloc, ""), self.dupe_behavior),
+.map => |*map| if (dangling_key) |dk| try putMap(map, dk, Value.newScalar(arena_alloc), self.dupe_behavior),
+.flow_list, .flow_map => {},
 },
 .done => {},
 }
@@ -855,16 +975,6 @@ pub const Parser = struct {
  return document;
  }
- inline fn chopNewline(buf: []const u8) []const u8 {
- return if (buf.len > 0 and buf[buf.len - 1] == '\n') buf[0 .. buf.len - 1] else buf;
- }
- fn valueFromString(alloc: std.mem.Allocator, buffer: []const u8) Error!Value {
- var result: Value = .{ .string = try std.ArrayList(u8).initCapacity(alloc, buffer.len) };
- result.string.appendSliceAssumeCapacity(buffer);
- return result;
- }
  fn parseFlowList(alloc: std.mem.Allocator, contents: []const u8, dupe_behavior: DuplicateKeyBehavior) Error!Value {
  var parser = try FlowParser.initList(alloc, contents);
  defer parser.deinit();
@@ -879,16 +989,16 @@ pub const Parser = struct {
  return try parser.parse(dupe_behavior);
  }
- inline fn appendListGetValue(list: *List(Value), value: Value) Error!*Value {
+ inline fn appendListGetValue(list: *Value.List, value: Value) Error!*Value {
  try list.append(value);
  return &list.items[list.items.len - 1];
  }
- inline fn putMap(map: *Map(Value), key: []const u8, value: Value, dupe_behavior: DuplicateKeyBehavior) Error!void {
+ inline fn putMap(map: *Value.Map, key: []const u8, value: Value, dupe_behavior: DuplicateKeyBehavior) Error!void {
  _ = try putMapGetValue(map, key, value, dupe_behavior);
  }
- inline fn putMapGetValue(map: *Map(Value), key: []const u8, value: Value, dupe_behavior: DuplicateKeyBehavior) Error!*Value {
+ inline fn putMapGetValue(map: *Value.Map, key: []const u8, value: Value, dupe_behavior: DuplicateKeyBehavior) Error!*Value {
  const gop = try map.getOrPut(key);
  if (gop.found_existing)
@@ -948,13 +1058,10 @@ pub const Parser = struct {
  };
  pub const FlowParser = struct {
- pub const Value = Parser.Value;
  const FlowStackItem = struct {
  value: *Value,
  // lists need this. maps do also for keys and values.
  item_start: usize = 0,
- dangling_key: ?[]const u8 = null,
  };
  const FlowStack: type = std.ArrayList(FlowStackItem);
@@ -1017,52 +1124,29 @@ pub const FlowParser = struct {
  stack.items[stack.items.len - 1].item_start = start;
  }
- inline fn setStackDanglingKey(stack: FlowStack, key: []const u8) Error!void {
- if (stack.items.len == 0) return error.BadState;
- stack.items[stack.items.len - 1].dangling_key = key;
- }
- inline fn popStack(self: *FlowParser, idx: usize) Parser.Error!void {
- const finished = self.stack.popOrNull() orelse return error.BadState;
- if (finished.value.* == .list) {
- // this is not valid if we are in the want_list_separator state because
- // there is no trailing comma in that state
- if (self.state == .want_list_item and (finished.value.list.items.len > 0 or idx > finished.item_start + 1))
- try finished.value.list.append(
- try Parser.valueFromString(self.alloc, ""),
- )
- else if (self.state == .consuming_list_item)
- try finished.value.list.append(
- try Parser.valueFromString(
- self.alloc,
- self.buffer[finished.item_start..idx],
- ),
- );
- }
- const parent = self.stack.getLastOrNull() orelse {
- self.state = .done;
- return;
- };
- switch (parent.value.*) {
- .list => self.state = .want_list_separator,
- .map => self.state = .want_map_separator,
+ inline fn popStack(self: *FlowParser) Parser.Error!ParseState {
+ if (self.stack.popOrNull() == null)
+ return error.BadState;
+ const parent = self.stack.getLastOrNull() orelse return .done;
+ return switch (parent.value.*) {
+ .flow_list => .want_list_separator,
+ .flow_map => .want_map_separator,
  else => return error.BadState,
- }
+ };
  }
  pub fn parse(self: *FlowParser, dupe_behavior: Parser.DuplicateKeyBehavior) Parser.Error!Value {
  // prime the stack:
  switch (self.state) {
  .want_list_item => {
- self.root = Value.newList(self.alloc);
+ self.root = Value.newFlowList(self.alloc);
  self.stack = try FlowStack.initCapacity(self.alloc, 1);
  self.stack.appendAssumeCapacity(.{ .value = &self.root });
  },
  .want_map_key => {
- self.root = Value.newMap(self.alloc);
+ self.root = Value.newFlowMap(self.alloc);
  self.stack = try FlowStack.initCapacity(self.alloc, 1);
  self.stack.appendAssumeCapacity(.{ .value = &self.root });
  },
@@ -1071,6 +1155,8 @@ pub const FlowParser = struct {
  },
  }
+ var dangling_key: ?[]const u8 = null;
  charloop: for (self.buffer, 0..) |char, idx| {
  // std.debug.print("{s} => {c}\n", .{ @tagName(self.state), char });
  switch (self.state) {
@@ -1079,15 +1165,15 @@ pub const FlowParser = struct {
  ',' => {
  // empty value
  const tip = try getStackTip(self.stack);
- try tip.value.list.append(try Value.fromString(self.alloc, ""));
+ try tip.value.flow_list.append(Value.newScalar(self.alloc));
  tip.item_start = idx + 1;
  },
  '{' => {
  const tip = try getStackTip(self.stack);
  const new_map = try Parser.appendListGetValue(
- &tip.value.list,
- Value.newMap(self.alloc),
+ &tip.value.flow_list,
+ Value.newFlowMap(self.alloc),
  );
  tip.item_start = idx;
@@ -1098,15 +1184,20 @@ pub const FlowParser = struct {
  const tip = try getStackTip(self.stack);
  const new_list = try Parser.appendListGetValue(
- &tip.value.list,
- Value.newList(self.alloc),
+ &tip.value.flow_list,
+ Value.newFlowList(self.alloc),
  );
  tip.item_start = idx;
  try self.stack.append(.{ .value = new_list, .item_start = idx + 1 });
  self.state = .want_list_item;
  },
- ']' => try self.popStack(idx),
+ ']' => {
+ const finished = self.stack.getLastOrNull() orelse return error.BadState;
+ if (finished.value.flow_list.items.len > 0 or idx > finished.item_start)
+ try finished.value.flow_list.append(Value.newScalar(self.alloc));
+ self.state = try self.popStack();
+ },
  else => {
  try setStackItemStart(self.stack, idx);
  self.state = .consuming_list_item;
@@ -1116,14 +1207,20 @@ pub const FlowParser = struct {
  ',' => {
  const tip = try getStackTip(self.stack);
- try tip.value.list.append(
- try Value.fromString(self.alloc, self.buffer[tip.item_start..idx]),
+ try tip.value.flow_list.append(
+ try Value.fromScalar(self.alloc, self.buffer[tip.item_start..idx]),
  );
  tip.item_start = idx + 1;
  self.state = .want_list_item;
  },
- ']' => try self.popStack(idx),
+ ']' => {
+ const finished = self.stack.getLastOrNull() orelse return error.BadState;
+ try finished.value.flow_list.append(
+ try Value.fromScalar(self.alloc, self.buffer[finished.item_start..idx]),
+ );
+ self.state = try self.popStack();
+ },
  else => continue :charloop,
  },
  .want_list_separator => switch (char) {
@@ -1132,7 +1229,7 @@ pub const FlowParser = struct {
  try setStackItemStart(self.stack, idx);
  self.state = .want_list_item;
  },
- ']' => try self.popStack(idx),
+ ']' => self.state = try self.popStack(),
  else => return error.BadToken,
  },
  .want_map_key => switch (char) {
@@ -1143,10 +1240,10 @@ pub const FlowParser = struct {
  '{', '[', '#', '>', '|', ',' => return error.BadToken,
  ':' => {
  // we have an empty map key
- try setStackDanglingKey(self.stack, "");
+ dangling_key = "";
  self.state = .want_map_value;
  },
- '}' => try self.popStack(idx),
+ '}' => self.state = try self.popStack(),
  else => {
  try setStackItemStart(self.stack, idx);
  self.state = .consuming_map_key;
@@ -1155,7 +1252,7 @@ pub const FlowParser = struct {
  .consuming_map_key => switch (char) {
  ':' => {
  const tip = try getStackTip(self.stack);
- tip.dangling_key = self.buffer[tip.item_start..idx];
+ dangling_key = self.buffer[tip.item_start..idx];
  self.state = .want_map_value;
  },
@@ -1166,51 +1263,55 @@ pub const FlowParser = struct {
  ',' => {
  const tip = try getStackTip(self.stack);
  try Parser.putMap(
- &tip.value.map,
- tip.dangling_key.?,
- try Parser.valueFromString(self.alloc, ""),
+ &tip.value.flow_map,
+ dangling_key.?,
+ Value.newScalar(self.alloc),
  dupe_behavior,
  );
+ dangling_key = null;
  self.state = .want_map_key;
  },
  '[' => {
  const tip = try getStackTip(self.stack);
  const new_list = try Parser.putMapGetValue(
- &tip.value.map,
- tip.dangling_key.?,
- Value.newList(self.alloc),
+ &tip.value.flow_map,
+ dangling_key.?,
+ Value.newFlowList(self.alloc),
  dupe_behavior,
  );
  try self.stack.append(.{ .value = new_list, .item_start = idx + 1 });
+ dangling_key = null;
  self.state = .want_list_item;
  },
  '{' => {
  const tip = try getStackTip(self.stack);
  const new_map = try Parser.putMapGetValue(
- &tip.value.map,
- tip.dangling_key.?,
- Value.newMap(self.alloc),
+ &tip.value.flow_map,
+ dangling_key.?,
+ Value.newFlowMap(self.alloc),
  dupe_behavior,
  );
  try self.stack.append(.{ .value = new_map });
+ dangling_key = null;
  self.state = .want_map_key;
  },
  '}' => {
  // the value is an empty string and this map is closed
  const tip = try getStackTip(self.stack);
  try Parser.putMap(
- &tip.value.map,
- tip.dangling_key.?,
- try Parser.valueFromString(self.alloc, ""),
+ &tip.value.flow_map,
+ dangling_key.?,
+ Value.newScalar(self.alloc),
  dupe_behavior,
  );
- try self.popStack(idx);
+ dangling_key = null;
+ self.state = try self.popStack();
  },
  else => {
  try setStackItemStart(self.stack, idx);
@@ -1221,21 +1322,21 @@ pub const FlowParser = struct {
  ',', '}' => |term| {
  const tip = try getStackTip(self.stack);
  try Parser.putMap(
- &tip.value.map,
- tip.dangling_key.?,
- try Parser.valueFromString(self.alloc, self.buffer[tip.item_start..idx]),
+ &tip.value.flow_map,
+ dangling_key.?,
+ try Value.fromScalar(self.alloc, self.buffer[tip.item_start..idx]),
  dupe_behavior,
  );
+ dangling_key = null;
  self.state = .want_map_key;
- if (term == '}') try self.popStack(idx);
+ if (term == '}') self.state = try self.popStack();
  },
  else => continue :charloop,
  },
  .want_map_separator => switch (char) {
  ' ', '\t' => continue :charloop,
  ',' => self.state = .want_map_key,
- '}' => try self.popStack(idx),
+ '}' => self.state = try self.popStack(),
  else => return error.BadToken,
  },
  // the root value was closed but there are characters remaining