This changes the parse/callback flow so that subcommands are run iteratively from the base command rather than recursively.

The primary advantages of this approach are some stack space savings and much less convoluted backtraces for deeply nested command hierarchies.

The overall order of operations has not changed, i.e. the full command line is parsed before command callback dispatch starts.

Fixes: #12
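
For reviewers, the shape of the new flow can be sketched roughly as follows. This is an illustrative Python sketch only, not the code in this change: the `Command` class, `parse`, and `run` are hypothetical names, and option handling is reduced to collecting plain tokens. It shows the two properties described above: the whole command line is parsed first, and callbacks are then dispatched from a flat loop driven by the base command.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Command:
    # Hypothetical stand-in for a command node; not the project's real type.
    name: str
    callback: Callable[[List[str]], None]
    subcommands: Dict[str, "Command"] = field(default_factory=dict)


def parse(base: Command, argv: List[str]) -> List[Tuple[Command, List[str]]]:
    """Parse the full command line and return (command, args) pairs in order."""
    chain: List[Tuple[Command, List[str]]] = []
    current, args = base, []
    for token in argv:
        sub = current.subcommands.get(token)
        if sub is not None:
            # A subcommand name closes the current command's argument list.
            chain.append((current, args))
            current, args = sub, []
        else:
            args.append(token)
    chain.append((current, args))
    return chain


def run(base: Command, argv: List[str]) -> None:
    """Dispatch callbacks iteratively, only after parsing has finished."""
    chain = parse(base, argv)
    for command, args in chain:
        # Every callback is invoked from this one frame, so the call stack
        # does not grow with the nesting depth of the command hierarchy.
        command.callback(args)


log = []
leaf = Command("add", lambda a: log.append(("add", a)))
mid = Command("remote", lambda a: log.append(("remote", a)), {"add": leaf})
base = Command("tool", lambda a: log.append(("tool", a)), {"remote": mid})
run(base, ["-v", "remote", "add", "origin"])
```

In a recursive design each command would invoke its selected subcommand from inside its own dispatch call, so a failure in the innermost callback would show one extra frame per nesting level in the backtrace.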