Compare commits

...

3 Commits

Author SHA1 Message Date
d64bf4e402
start adding the readme 2023-10-18 00:20:42 -07:00
258cf2ae83
parser: reintroduce space strings and change token parsing strategy
I don't think I have the wherewithal to write this full commit message
right now, since it should be a long one.

Basically, `+ ` is now the space-concatenation operator for strings,
because that is a very common use case. It's essentially the soft-wrap
character.
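
As an illustrative sketch (the values here are invented), a wrapped
value like

    description:
        | A sentence that has been soft-wrapped
        + across two lines.

now parses to the single string "A sentence that has been
soft-wrapped across two lines."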

Also, lines starting with -, +, >, and | will now try to tokenize as
map keys if the leading character is not followed by a space. The
motivation here is numeric map keys; specifically, + and - are numeric
leaders.
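
As an illustrative sketch (the keys here are invented), lines such as

    -40: lowest supported temperature
    +12: offset from UTC

now tokenize as map items with the keys `-40` and `+12`, because the
leading character is not followed by a space.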

To facilitate this change, own-line scalars are now prohibited. So, for
example:

    key: -1000

is still fine, but

    key:
        -1000

is no longer accepted.
2023-10-18 00:20:19 -07:00
25386ac87a
rename flow_(list|map) to inline_(list|map)
This is simply better word choice.
2023-10-18 00:07:12 -07:00
4 changed files with 278 additions and 183 deletions

readme.md (new file, 60 lines added)
View File

@ -0,0 +1,60 @@
Have you ever wished someone would walk up to you and say, in a tremendously exaggerated, stereotypical surfer voice, "nice data, dude"? Well, wish no longer because now your data can be nice by definition, due to our patented Manipulative Marketing Naming technique. Introducing!
# Nice Data: There's no Escape
```nice
# this is an example of some nice data.
project:
name: nice data
description:
| A file format for storing structured data. Nice uses syntactic whitespace
+ to represent the data structure. It defines two types of data, scalars and
+ strings, which can be used to compose its two data structures, lists and maps.
>
> Nice to write, nice to read.
inspiration:
- { name: NestedText, url: https://nestedtext.org }
- { name: YAML, url: https://yaml.org }
- A fervent dislike of TOML
non-goals: [ general-purpose data serialization, world domination ]
epic freaking funny number lol: 42069580089001421337666
```
Nice Data is a format for storing structured data in a file. It's pleasant to read and adheres to the philosophy that form should match structure. It's heavily inspired by [NestedText], though it also looks similar to [YAML].
## Syntax
- structured indentation using tabs or spaces
- scalars
- strings
- line, space, and concat strings (see the sketch after this list)
- lists
- inline lists
- maps
- inline maps
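The three string types differ only in how consecutive lines are joined: `>` lines are joined with a newline, `+ ` lines with a space, and `| ` lines with nothing. A small sketch (the document below is illustrative):
```nice
poem:
    > roses are red
    > violets are blue
wrapped:
    | this reads as one
    + long line
parts:
    | abc
    | def
```
Here `poem` holds two lines separated by a newline, `wrapped` holds "this reads as one long line", and `parts` holds "abcdef".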
## Restrictions
- Nice documents must be encoded in valid UTF-8.
- Newlines must be `LF`-only (`CR` characters are forbidden).
- Tabs and spaces cannot be mixed for indentation.
- Indentation *must* adhere to a consistent quantum.
- Nonprinting ASCII characters are forbidden.
- Trailing whitespace, including lines consisting only of whitespace, is forbidden, although empty lines are permitted.
- Some keys and values cannot be represented.
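A sketch of the indentation rule, assuming a quantum of four spaces (any consistent quantum is allowed, but it cannot change within a document):
```nice
config:
    name: example
    values:
        - 1
        - 2
```
Each nesting level is indented by exactly one additional quantum; indenting `- 1` by three or six spaces instead would not adhere to the quantum and should be rejected.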
## Philosophy
### Let the Application Interpret Data Types (Bring Your Own Schema)
An arbitrarily structured data format with strict types is more work to parse, and it still cannot cover every application's needs. Numbers in JSON are represented by a sequence of ASCII characters, but the format defines them to be double-precision floating-point numbers. Of course, it is possible to write a numeric ASCII sequence that does not fit in a double-precision float. If you want to represent a 128-bit integer in JSON, you have to encode it as a string and decode it in your application, because the format cannot accommodate it as a direct numeric value. The same is true of an RFC 3339 datetime. It's not possible for a format to account for every data type an application may need, so don't bother. Users are encouraged to parse Nice documents directly into well-defined, typed structures.
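For instance, an integer too large for a double can be stored as a bare scalar and interpreted by the application itself. A minimal Zig sketch of that interpretation step, assuming only that the application has the scalar's bytes in hand:
```zig
const std = @import("std");

/// "Bring your own schema": the format hands the application raw bytes,
/// and the application decides they are a 128-bit integer.
fn parseBigId(scalar: []const u8) !i128 {
    return std.fmt.parseInt(i128, scalar, 10);
}

test "interpret a scalar as a 128-bit integer" {
    const id = try parseBigId("42069580089001421337666");
    try std.testing.expect(id > std.math.maxInt(u64));
}
```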
Nice explicitly differentiates between bare scalars and strings so that `null` may be disambiguated and interpreted differently from `| null`.
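A sketch of the distinction (keys invented for illustration):
```nice
maybe-missing: null
definitely-text: | null
```
An application is free to map the bare scalar `null` onto its own null or optional type, while `| null` always produces the four-character string.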
### Simplicity over Flexibility. What You See is What You Get
Nice is not, and does not try to be, a general-purpose data serialization format. There are, in fact, many values that simply cannot be represented in Nice. For example, map keys cannot start with a variety of characters, including `#`, `-`, `>`, `+`, `|`, or whitespace, and this is a conscious design choice. In general, Nice is not designed with being produced by a computer in mind. While programmatic serialization is certainly possible, this reference implementation has no functionality to do so.
### There's No Need to Conquer the World
Nice has no exhaustive specification or formal grammar. The parser is handwritten, and there are pretty much guaranteed to be some strange edge cases that weren't considered when writing it. Standardization is a good thing, generally speaking, but it's not a goal here.
# The Implementation
[NestedText]: https://nestedtext.org
[YAML]: https://yaml.org

View File

@ -59,7 +59,7 @@ pub const State = struct {
},
},
.value => switch (state.value_stack.getLast().*) {
// remove the final trailing newline or space
// we have an in-progress string, finish it.
.string => |*string| string.* = try state.string_builder.toOwnedSlice(arena_alloc),
// if we have a dangling -, attach an empty scalar to it
.list => |*list| if (state.expect_shift == .indent) try list.append(Value.emptyScalar()),
@ -70,7 +70,7 @@ pub const State = struct {
Value.emptyScalar(),
options.duplicate_key_behavior,
),
.scalar, .flow_list, .flow_map => {},
.scalar, .inline_list, .inline_map => {},
},
.done => {},
}
@ -104,18 +104,18 @@ pub const State = struct {
state.document.root = try Value.fromScalar(arena_alloc, str);
state.mode = .done;
},
.line_string, .concat_string => |str| {
.line_string, .space_string, .concat_string => |str| {
state.document.root = Value.emptyString();
try state.string_builder.appendSlice(arena_alloc, str);
try state.value_stack.append(&state.document.root);
state.mode = .value;
},
.flow_list => |str| {
state.document.root = try state.parseFlow(str, .flow_list, dkb);
.inline_list => |str| {
state.document.root = try state.parseFlow(str, .inline_list, dkb);
state.mode = .done;
},
.flow_map => |str| {
state.document.root = try state.parseFlow(str, .flow_map, dkb);
.inline_map => |str| {
state.document.root = try state.parseFlow(str, .inline_map, dkb);
state.mode = .done;
},
},
@ -128,9 +128,9 @@ pub const State = struct {
switch (value) {
.empty => state.expect_shift = .indent,
.scalar => |str| try rootlist.append(try Value.fromScalar(arena_alloc, str)),
.line_string, .concat_string => |str| try rootlist.append(try Value.fromString(arena_alloc, str)),
.flow_list => |str| try rootlist.append(try state.parseFlow(str, .flow_list, dkb)),
.flow_map => |str| try rootlist.append(try state.parseFlow(str, .flow_map, dkb)),
.line_string, .space_string, .concat_string => |str| try rootlist.append(try Value.fromString(arena_alloc, str)),
.inline_list => |str| try rootlist.append(try state.parseFlow(str, .inline_list, dkb)),
.inline_map => |str| try rootlist.append(try state.parseFlow(str, .inline_map, dkb)),
}
},
.map_item => |pair| {
@ -146,22 +146,26 @@ pub const State = struct {
state.dangling_key = dupekey;
},
.scalar => |str| try rootmap.put(dupekey, try Value.fromScalar(arena_alloc, str)),
.line_string, .concat_string => |str| try rootmap.put(dupekey, try Value.fromString(arena_alloc, str)),
.flow_list => |str| try rootmap.put(dupekey, try state.parseFlow(str, .flow_list, dkb)),
.flow_map => |str| try rootmap.put(dupekey, try state.parseFlow(str, .flow_map, dkb)),
.line_string, .space_string, .concat_string => |str| try rootmap.put(dupekey, try Value.fromString(arena_alloc, str)),
.inline_list => |str| try rootmap.put(dupekey, try state.parseFlow(str, .inline_list, dkb)),
.inline_map => |str| try rootmap.put(dupekey, try state.parseFlow(str, .inline_map, dkb)),
}
},
}
},
.value => switch (state.value_stack.getLast().*) {
// these three states are never reachable here. flow_list and
// flow_map are parsed with a separate state machine. These
// these three states are never reachable here. inline_list and
// inline_map are parsed with a separate state machine. These
// value types can only be present by themselves as the first
// line of the document, in which case the document consists
// only of that single line: this parser jumps immediately into
// the .done state, bypassing the .value state in which this
// switch is embedded.
.scalar, .flow_list, .flow_map => return error.Fail,
.scalar, .inline_list, .inline_map => {
state.diagnostics.length = 1;
state.diagnostics.message = "the document contains invalid data following a single-line value";
return error.Fail;
},
.string => |*string| {
if (line.shift == .indent) {
state.diagnostics.length = 1;
@ -184,9 +188,11 @@ pub const State = struct {
.comment => unreachable,
.in_line => |in_line| switch (in_line) {
.empty => unreachable,
inline .line_string, .concat_string => |str, tag| {
inline .line_string, .space_string, .concat_string => |str, tag| {
if (tag == .line_string)
try state.string_builder.append(arena_alloc, '\n');
if (tag == .space_string)
try state.string_builder.append(arena_alloc, ' ');
try state.string_builder.appendSlice(arena_alloc, str);
},
else => {
@ -245,10 +251,14 @@ pub const State = struct {
state.expect_shift = .dedent;
switch (in_line) {
.empty => unreachable,
.scalar => |str| try list.append(try Value.fromScalar(arena_alloc, str)),
.flow_list => |str| try list.append(try state.parseFlow(str, .flow_list, dkb)),
.flow_map => |str| try list.append(try state.parseFlow(str, .flow_map, dkb)),
.line_string, .concat_string => |str| {
.scalar => {
state.diagnostics.length = 1;
state.diagnostics.message = "the document may not contain a scalar value on its own line";
return error.UnexpectedValue;
},
.inline_list => |str| try list.append(try state.parseFlow(str, .inline_list, dkb)),
.inline_map => |str| try list.append(try state.parseFlow(str, .inline_map, dkb)),
.line_string, .space_string, .concat_string => |str| {
const new_string = try appendListGetValue(list, Value.emptyString());
try state.string_builder.appendSlice(arena_alloc, str);
try state.value_stack.append(new_string);
@ -262,9 +272,9 @@ pub const State = struct {
switch (value) {
.empty => state.expect_shift = .indent,
.scalar => |str| try list.append(try Value.fromScalar(arena_alloc, str)),
.line_string, .concat_string => |str| try list.append(try Value.fromString(arena_alloc, str)),
.flow_list => |str| try list.append(try state.parseFlow(str, .flow_list, dkb)),
.flow_map => |str| try list.append(try state.parseFlow(str, .flow_map, dkb)),
.line_string, .space_string, .concat_string => |str| try list.append(try Value.fromString(arena_alloc, str)),
.inline_list => |str| try list.append(try state.parseFlow(str, .inline_list, dkb)),
.inline_map => |str| try list.append(try state.parseFlow(str, .inline_map, dkb)),
}
} else if (line.shift == .indent) {
if (state.expect_shift != .indent) return error.UnexpectedIndent;
@ -287,7 +297,7 @@ pub const State = struct {
if (state.expect_shift != .indent or line.shift != .indent) {
state.diagnostics.length = 1;
state.diagnostics.message = "the document contains an invalid map key in a list";
state.diagnostics.message = "the document contains a map item where a list item is expected";
return error.UnexpectedValue;
}
@ -344,12 +354,16 @@ pub const State = struct {
switch (in_line) {
.empty => unreachable,
.scalar => |str| try state.putMap(map, state.dangling_key.?, try Value.fromScalar(arena_alloc, str), dkb),
.flow_list => |str| try state.putMap(map, state.dangling_key.?, try state.parseFlow(str, .flow_list, dkb), dkb),
.flow_map => |str| {
try state.putMap(map, state.dangling_key.?, try state.parseFlow(str, .flow_map, dkb), dkb);
.scalar => {
state.diagnostics.length = 1;
state.diagnostics.message = "the document may not contain a scalar value on its own line";
return error.UnexpectedValue;
},
.line_string, .concat_string => |str| {
.inline_list => |str| try state.putMap(map, state.dangling_key.?, try state.parseFlow(str, .inline_list, dkb), dkb),
.inline_map => |str| {
try state.putMap(map, state.dangling_key.?, try state.parseFlow(str, .inline_map, dkb), dkb);
},
.line_string, .space_string, .concat_string => |str| {
// string pushes the stack
const new_string = try state.putMapGetValue(map, state.dangling_key.?, Value.emptyString(), dkb);
try state.string_builder.appendSlice(arena_alloc, str);
@ -371,7 +385,7 @@ pub const State = struct {
if (state.expect_shift != .indent or line.shift != .indent or state.dangling_key == null) {
state.diagnostics.length = 1;
state.diagnostics.message = "the document contains an invalid list item in a map";
state.diagnostics.message = "the document contains a list item where a map item is expected";
return error.UnexpectedValue;
}
@ -391,9 +405,9 @@ pub const State = struct {
state.dangling_key = dupekey;
},
.scalar => |str| try state.putMap(map, dupekey, try Value.fromScalar(arena_alloc, str), dkb),
.line_string, .concat_string => |str| try state.putMap(map, dupekey, try Value.fromString(arena_alloc, str), dkb),
.flow_list => |str| try state.putMap(map, dupekey, try state.parseFlow(str, .flow_list, dkb), dkb),
.flow_map => |str| try state.putMap(map, dupekey, try state.parseFlow(str, .flow_map, dkb), dkb),
.line_string, .space_string, .concat_string => |str| try state.putMap(map, dupekey, try Value.fromString(arena_alloc, str), dkb),
.inline_list => |str| try state.putMap(map, dupekey, try state.parseFlow(str, .inline_list, dkb), dkb),
.inline_map => |str| try state.putMap(map, dupekey, try state.parseFlow(str, .inline_map, dkb), dkb),
}
} else if (line.shift == .indent) {
if (state.expect_shift != .indent or state.dangling_key == null) {
@ -432,17 +446,17 @@ pub const State = struct {
const arena_alloc = state.document.arena.allocator();
var root: Value = switch (root_type) {
.flow_list => Value.newFlowList(arena_alloc),
.flow_map => Value.newFlowMap(arena_alloc),
.inline_list => Value.newFlowList(arena_alloc),
.inline_map => Value.newFlowMap(arena_alloc),
else => {
state.diagnostics.length = 1;
state.diagnostics.message = "the flow item was closed too many times";
state.diagnostics.message = "the inline map or list was closed too many times";
return error.BadState;
},
};
var pstate: FlowParseState = switch (root_type) {
.flow_list => .want_list_item,
.flow_map => .want_map_key,
.inline_list => .want_list_item,
.inline_map => .want_map_key,
else => unreachable,
};
@ -460,14 +474,14 @@ pub const State = struct {
',' => {
// empty value
const tip = try state.getStackTip();
try tip.flow_list.append(Value.emptyScalar());
try tip.inline_list.append(Value.emptyScalar());
item_start = idx + 1;
},
'{' => {
const tip = try state.getStackTip();
const new_map = try appendListGetValue(
&tip.flow_list,
&tip.inline_list,
Value.newFlowMap(arena_alloc),
);
@ -479,7 +493,7 @@ pub const State = struct {
const tip = try state.getStackTip();
const new_list = try appendListGetValue(
&tip.flow_list,
&tip.inline_list,
Value.newFlowList(arena_alloc),
);
@ -490,11 +504,11 @@ pub const State = struct {
']' => {
const finished = state.value_stack.getLastOrNull() orelse {
state.diagnostics.length = 1;
state.diagnostics.message = "the flow list was closed too many times";
state.diagnostics.message = "the inline list was closed too many times";
return error.BadState;
};
if (finished.flow_list.items.len > 0 or idx > item_start)
try finished.flow_list.append(Value.emptyScalar());
if (finished.inline_list.items.len > 0 or idx > item_start)
try finished.inline_list.append(Value.emptyScalar());
pstate = try state.popFlowStack();
},
else => {
@ -514,7 +528,7 @@ pub const State = struct {
};
const tip = try state.getStackTip();
try tip.flow_list.append(
try tip.inline_list.append(
try Value.fromScalar(arena_alloc, contents[item_start..end]),
);
item_start = idx + 1;
@ -533,10 +547,10 @@ pub const State = struct {
const finished = state.value_stack.getLastOrNull() orelse {
state.diagnostics.length = 1;
state.diagnostics.message = "the flow list was closed too many times";
state.diagnostics.message = "the inline list was closed too many times";
return error.BadState;
};
try finished.flow_list.append(
try finished.inline_list.append(
try Value.fromScalar(arena_alloc, contents[item_start..end]),
);
pstate = try state.popFlowStack();
@ -553,19 +567,24 @@ pub const State = struct {
']' => pstate = try state.popFlowStack(),
else => return {
state.diagnostics.length = 1;
state.diagnostics.message = "the document contains an invalid flow list separator";
state.diagnostics.message = "the document contains an invalid inline list separator";
return error.BadToken;
},
},
.want_map_key => switch (char) {
' ' => continue :charloop,
'\t' => return error.IllegalTabWhitespaceInLine,
// forbid these characters so that flow dictionary keys cannot start
// forbid these characters so that inline dictionary keys cannot start
// with characters that regular dictionary keys cannot start with
// (even though they're unambiguous in this specific context).
'{', '[', '#', '-', '>', '|', ',' => return {
'{', '[', '#', ',' => return {
state.diagnostics.length = 1;
state.diagnostics.message = "this document contains a flow map key that starts with an invalid character";
state.diagnostics.message = "this document contains a inline map key that starts with an invalid character";
return error.BadToken;
},
'-', '>', '+', '|' => if ((idx + 1) < contents.len and contents[idx + 1] == ' ') {
state.diagnostics.length = 1;
state.diagnostics.message = "this document contains a inline map key that starts with an invalid sequence";
return error.BadToken;
},
':' => {
@ -600,7 +619,7 @@ pub const State = struct {
',' => {
const tip = try state.getStackTip();
try state.putMap(
&tip.flow_map,
&tip.inline_map,
dangling_key.?,
Value.emptyScalar(),
dkb,
@ -613,7 +632,7 @@ pub const State = struct {
const tip = try state.getStackTip();
const new_list = try state.putMapGetValue(
&tip.flow_map,
&tip.inline_map,
dangling_key.?,
Value.newFlowList(arena_alloc),
dkb,
@ -628,7 +647,7 @@ pub const State = struct {
const tip = try state.getStackTip();
const new_map = try state.putMapGetValue(
&tip.flow_map,
&tip.inline_map,
dangling_key.?,
Value.newFlowMap(arena_alloc),
dkb,
@ -642,7 +661,7 @@ pub const State = struct {
// the value is an empty string and this map is closed
const tip = try state.getStackTip();
try state.putMap(
&tip.flow_map,
&tip.inline_map,
dangling_key.?,
Value.emptyScalar(),
dkb,
@ -669,7 +688,7 @@ pub const State = struct {
const tip = try state.getStackTip();
try state.putMap(
&tip.flow_map,
&tip.inline_map,
dangling_key.?,
try Value.fromScalar(arena_alloc, contents[item_start..end]),
dkb,
@ -689,7 +708,7 @@ pub const State = struct {
const tip = try state.getStackTip();
try state.putMap(
&tip.flow_map,
&tip.inline_map,
dangling_key.?,
try Value.fromScalar(arena_alloc, contents[item_start..end]),
dkb,
@ -706,7 +725,7 @@ pub const State = struct {
'}' => pstate = try state.popFlowStack(),
else => return {
state.diagnostics.length = 1;
state.diagnostics.message = "this document contains an invalid character instead of a flow map separator";
state.diagnostics.message = "this document contains an invalid character instead of a inline map separator";
return error.BadToken;
},
},
@ -722,7 +741,7 @@ pub const State = struct {
// we ran out of characters while still in the middle of an object
if (pstate != .done) return {
state.diagnostics.length = 1;
state.diagnostics.message = "this document contains an unterminated flow item";
state.diagnostics.message = "this document contains an unterminated inline map or list";
return error.BadState;
};
@ -747,8 +766,8 @@ pub const State = struct {
const parent = state.value_stack.getLastOrNull() orelse return .done;
return switch (parent.*) {
.flow_list => .want_list_separator,
.flow_map => .want_map_separator,
.inline_list => .want_list_separator,
.inline_map => .want_map_separator,
else => .done,
};
}

View File

@ -49,9 +49,9 @@ pub const Value = union(enum) {
scalar: String,
string: String,
list: List,
flow_list: List,
inline_list: List,
map: Map,
flow_map: Map,
inline_map: Map,
pub fn convertTo(self: Value, comptime T: type, allocator: std.mem.Allocator, options: Options) !T {
switch (@typeInfo(T)) {
@ -104,7 +104,7 @@ pub const Value = union(enum) {
// TODO: This also doesn't handle sentinels properly.
switch (self) {
.scalar, .string => |str| return if (ptr.child == u8) str else error.BadValue,
.list, .flow_list => |lst| {
.list, .inline_list => |lst| {
var result = try std.ArrayList(ptr.child).initCapacity(allocator, lst.items.len);
errdefer result.deinit();
for (lst.items) |item| {
@ -138,7 +138,7 @@ pub const Value = union(enum) {
return result;
} else return error.BadValue;
},
.list, .flow_list => |lst| {
.list, .inline_list => |lst| {
var storage = try std.ArrayList(arr.child).initCapacity(allocator, arr.len);
defer storage.deinit();
for (lst.items) |item| {
@ -158,7 +158,7 @@ pub const Value = union(enum) {
if (stt.is_tuple) {
switch (self) {
.list, .flow_list => |list| {
.list, .inline_list => |list| {
if (list.items.len != stt.fields.len) return error.BadValue;
var result: T = undefined;
inline for (stt.fields, 0..) |field, idx| {
@ -171,7 +171,7 @@ pub const Value = union(enum) {
}
switch (self) {
.map, .flow_map => |map| {
.map, .inline_map => |map| {
var result: T = undefined;
if (options.ignore_extra_fields) {
@ -232,7 +232,7 @@ pub const Value = union(enum) {
if (unn.tag_type == null) @compileError("Cannot deserialize into untagged union " ++ @typeName(T));
switch (self) {
.map, .flow_map => |map| {
.map, .inline_map => |map| {
// a union may not ever be deserialized from a map with more than one value
if (map.count() != 1) return error.BadValue;
const key = map.keys()[0];
@ -289,7 +289,7 @@ pub const Value = union(enum) {
}
pub inline fn newFlowList(alloc: std.mem.Allocator) Value {
return .{ .flow_list = List.init(alloc) };
return .{ .inline_list = List.init(alloc) };
}
pub inline fn newMap(alloc: std.mem.Allocator) Value {
@ -297,21 +297,21 @@ pub const Value = union(enum) {
}
pub inline fn newFlowMap(alloc: std.mem.Allocator) Value {
return .{ .flow_map = Map.init(alloc) };
return .{ .inline_map = Map.init(alloc) };
}
pub fn recursiveEqualsExact(self: Value, other: Value) bool {
if (@as(TagType, self) != other) return false;
switch (self) {
inline .scalar, .string => |str, tag| return std.mem.eql(u8, str, @field(other, @tagName(tag))),
inline .list, .flow_list => |lst, tag| {
inline .list, .inline_list => |lst, tag| {
const olst = @field(other, @tagName(tag));
if (lst.items.len != olst.items.len) return false;
for (lst.items, olst.items) |this, that| if (!this.recursiveEqualsExact(that)) return false;
return true;
},
inline .map, .flow_map => |map, tag| {
inline .map, .inline_map => |map, tag| {
const omap = @field(other, @tagName(tag));
if (map.count() != omap.count()) return false;
@ -355,7 +355,7 @@ pub const Value = union(enum) {
std.debug.print("{s}", .{str});
}
},
.list, .flow_list => |list| {
.list, .inline_list => |list| {
if (list.items.len == 0) {
std.debug.print("[]", .{});
return;
@ -372,7 +372,7 @@ pub const Value = union(enum) {
.{ .empty = "", .indent = indent },
);
},
.map, .flow_map => |map| {
.map, .inline_map => |map| {
if (map.count() == 0) {
std.debug.print("{{}}", .{});
return;

View File

@ -23,10 +23,11 @@ pub const InlineItem = union(enum) {
empty: void,
scalar: []const u8,
line_string: []const u8,
space_string: []const u8,
concat_string: []const u8,
flow_list: []const u8,
flow_map: []const u8,
inline_list: []const u8,
inline_map: []const u8,
};
pub const LineContents = union(enum) {
@ -162,104 +163,113 @@ pub fn LineTokenizer(comptime Buffer: type) type {
// this should not be possible, as empty lines are caught earlier.
if (line.len == 0) return error.Impossible;
switch (line[0]) {
'#' => {
// force comments to be followed by a space. This makes them
// behave the same way as strings, actually.
if (line.len > 1 and line[1] != ' ') {
self.buffer.diag().line_offset += 1;
self.buffer.diag().length = 1;
self.buffer.diag().message = "this line is missing a space after the start of comment character '#'";
return error.BadToken;
}
// simply lie about indentation when the line is a comment.
quantized = self.last_indent;
return .{
.shift = .none,
.contents = .{ .comment = line[1..] },
.raw = line,
};
},
'|', '>', '[', '{' => {
return .{
.shift = shift,
.contents = .{ .in_line = try self.detectInlineItem(line) },
.raw = line,
};
},
'-' => {
if (line.len > 1 and line[1] != ' ') {
self.buffer.diag().line_offset += 1;
self.buffer.diag().length = 1;
self.buffer.diag().message = "this line is missing a space after the list entry character '-'";
return error.BadToken;
}
// blindly add 2 here because an empty item cannot fail in
// the value, only if a bogus dedent has occurred
self.buffer.diag().line_offset += 2;
return if (line.len == 1) .{
.shift = shift,
.contents = .{ .list_item = .empty },
.raw = line,
} else .{
.shift = shift,
.contents = .{ .list_item = try self.detectInlineItem(line[2..]) },
.raw = line,
};
},
else => {
for (line, 0..) |char, idx| {
if (char == ':') {
if (idx > 0 and (line[idx - 1] == ' ' or line[idx - 1] == '\t')) {
self.buffer.diag().line_offset += idx - 1;
self.buffer.diag().length = 1;
self.buffer.diag().message = "this line contains space before the map key-value separator character ':'";
return error.TrailingWhitespace;
}
if (idx + 1 == line.len) {
self.buffer.diag().line_offset += idx + 1;
return .{
.shift = shift,
.contents = .{ .map_item = .{ .key = line[0..idx], .val = .empty } },
.raw = line,
};
}
if (line[idx + 1] != ' ') {
self.buffer.diag().line_offset += idx + 1;
self.buffer.diag().length = 1;
self.buffer.diag().message = "this line is missing a space after the map key-value separator character ':'";
return error.BadToken;
}
return .{
.shift = shift,
.contents = .{ .map_item = .{
.key = line[0..idx],
.val = try self.detectInlineItem(line[idx + 2 ..]),
} },
.raw = line,
};
sigil: {
switch (line[0]) {
'#' => {
// Force comments to be followed by a space. We could
// allow #: to be interpreted as a map key, but I'm going
// to specifically forbid it instead.
if (line.len > 1 and line[1] != ' ') {
self.buffer.diag().line_offset += 1;
self.buffer.diag().length = 1;
self.buffer.diag().message = "this line is missing a space after the start of comment character '#'";
return error.BadToken;
}
}
return .{
.shift = shift,
.contents = .{ .in_line = .{ .scalar = line } },
.raw = line,
};
},
// simply lie about indentation when the line is a comment.
quantized = self.last_indent;
return .{
.shift = .none,
.contents = .{ .comment = line[1..] },
.raw = line,
};
},
'|', '>', '+' => {
if (line.len > 1 and line[1] != ' ') {
// we want to try parsing this as a map key
break :sigil;
}
return .{
.shift = shift,
.contents = .{ .in_line = try self.detectInlineItem(line) },
.raw = line,
};
},
'[', '{' => {
// these don't require being followed by a space, so they
// cannot be interpreted as starting a map key in any way.
return .{
.shift = shift,
.contents = .{ .in_line = try self.detectInlineItem(line) },
.raw = line,
};
},
'-' => {
if (line.len > 1 and line[1] != ' ') {
// we want to try parsing this as a map key
break :sigil;
}
// blindly add 2 here because an empty item cannot fail in
// the value, only if a bogus dedent has occurred
self.buffer.diag().line_offset += 2;
return if (line.len == 1) .{
.shift = shift,
.contents = .{ .list_item = .empty },
.raw = line,
} else .{
.shift = shift,
.contents = .{ .list_item = try self.detectInlineItem(line[2..]) },
.raw = line,
};
},
else => break :sigil,
}
}
// somehow everything else has failed
self.buffer.diag().line_offset = 0;
self.buffer.diag().length = raw_line.len;
self.buffer.diag().message = "this document contains an unknown error. Please report this.";
return error.Impossible;
for (line, 0..) |char, idx| {
if (char == ':') {
if (idx > 0 and (line[idx - 1] == ' ' or line[idx - 1] == '\t')) {
self.buffer.diag().line_offset += idx - 1;
self.buffer.diag().length = 1;
self.buffer.diag().message = "this line contains space before the map key-value separator character ':'";
return error.TrailingWhitespace;
}
if (idx + 1 == line.len) {
self.buffer.diag().line_offset += idx + 1;
return .{
.shift = shift,
.contents = .{ .map_item = .{ .key = line[0..idx], .val = .empty } },
.raw = line,
};
}
if (line[idx + 1] != ' ') {
self.buffer.diag().line_offset += idx + 1;
self.buffer.diag().length = 1;
self.buffer.diag().message = "this line is missing a space after the map key-value separator character ':'";
return error.BadToken;
}
return .{
.shift = shift,
.contents = .{ .map_item = .{
.key = line[0..idx],
.val = try self.detectInlineItem(line[idx + 2 ..]),
} },
.raw = line,
};
}
}
return .{
.shift = shift,
.contents = .{ .in_line = .{ .scalar = line } },
.raw = line,
};
}
return null;
}
@ -281,8 +291,12 @@ pub fn LineTokenizer(comptime Buffer: type) type {
};
switch (buf[start]) {
'>', '|' => |char| {
if (buf.len - start > 1 and buf[start + 1] != ' ') return error.BadToken;
'>', '|', '+' => |char| {
if (buf.len - start > 1 and buf[start + 1] != ' ') {
self.buffer.diag().length = 1;
self.buffer.diag().message = "this line is missing a space after the string start character";
return error.BadToken;
}
const slice: []const u8 = switch (buf[buf.len - 1]) {
' ', '\t' => {
@ -295,32 +309,34 @@ pub fn LineTokenizer(comptime Buffer: type) type {
else => buf[start + @min(2, buf.len - start) .. buf.len],
};
return if (char == '>')
.{ .line_string = slice }
else
.{ .concat_string = slice };
return switch (char) {
'>' => .{ .line_string = slice },
'+' => .{ .space_string = slice },
'|' => .{ .concat_string = slice },
else => unreachable,
};
},
'[' => {
if (buf.len - start < 2 or buf[buf.len - 1] != ']') {
self.buffer.diag().line_offset = 0;
self.buffer.diag().length = 1;
self.buffer.diag().message = "this line contains a flow-style list but does not end with the closing character ']'";
self.buffer.diag().message = "this line contains a inline list but does not end with the closing character ']'";
return error.BadToken;
}
// keep the closing ] for the flow parser
return .{ .flow_list = buf[start + 1 ..] };
// keep the closing ] for the inline parser
return .{ .inline_list = buf[start + 1 ..] };
},
'{' => {
if (buf.len - start < 2 or buf[buf.len - 1] != '}') {
self.buffer.diag().line_offset = 0;
self.buffer.diag().length = 1;
self.buffer.diag().message = "this line contains a flow-style map but does not end with the closing character '}'";
self.buffer.diag().message = "this line contains a inline map but does not end with the closing character '}'";
return error.BadToken;
}
// keep the closing } fpr the flow parser
return .{ .flow_map = buf[start + 1 ..] };
// keep the closing } for the inline parser
return .{ .inline_map = buf[start + 1 ..] };
},
else => {
if (buf[buf.len - 1] == ' ' or buf[buf.len - 1] == '\t') {