Relative tokenizer #14270

Draft: wants to merge 18 commits into base: main

lukaszsamson
Contributor

This is an attempt at making the tokenizer produce relative tokens. My intention is to enable building incremental parsers better suited to the LSP use case. With relative tokenization and parsing, it may be possible to build a parser that does not need to rebuild the entire AST on each document edit.

Design choices:

  • In relative mode, a token's line and column represent the difference from the previous token's position
  • For interpolated binaries, charlists and sigils, the parts are relative to the position of the token itself
  • Inside an interpolation, positions are relative to the beginning of the interpolation #{
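
As a minimal sketch of the first rule only, the snippet below accumulates deltas over a flat token list to recover absolute positions. The module RelativeTokensSketch and its to_absolute/3 function are hypothetical names for illustration and are not the :elixir_tokenizer.to_absolute_tokens implementation. It assumes that both line and column are plain differences from the previous token, including across line breaks, and it does not descend into the nested parts of interpolated tokens.

```elixir
defmodule RelativeTokensSketch do
  # Hypothetical helper for illustration; not part of :elixir_tokenizer.
  # Assumes every token carries its position tuple {line, column, extra}
  # as its second element, and that deltas are plain differences.
  def to_absolute(tokens, start_line \\ 1, start_column \\ 1) do
    {absolute, _last_position} =
      Enum.map_reduce(tokens, {start_line, start_column}, fn token, {line, column} ->
        {delta_line, delta_column, extra} = elem(token, 1)
        abs_line = line + delta_line
        abs_column = column + delta_column
        # Replace the relative position with the absolute one and carry
        # the absolute position forward as the new reference point.
        {put_elem(token, 1, {abs_line, abs_column, extra}), {abs_line, abs_column}}
      end)

    absolute
  end
end

relative = [
  {:paren_identifier, {0, 0, ~c"fun"}, :fun},
  {:"(", {0, 3, nil}},
  {:identifier, {0, 1, ~c"x"}, :x},
  {:dual_op, {0, 2, nil}, :+},
  {:int, {0, 2, 1}, ~c"1"},
  {:")", {0, 1, nil}}
]

IO.inspect(RelativeTokensSketch.to_absolute(relative, 1, 1))
```

The input here is the relative output for fun(x + 1) from the examples in this description, with the first token measured from the starting position {1, 1}.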

The current state:

  • relative mode produces valid relative tokens that, after conversion to absolute positions via :elixir_tokenizer.to_absolute_tokens, are identical to the ones produced by absolute mode. I verified this over the Elixir source as well as a number of other projects.
  • all tests pass
  • the parser and error tests return the same tokens and errors/warnings

Examples:

iex(2)> :elixir_tokenizer.tokenize(~c'fun(x + 1)', 1, 1, [mode: :absolute]) |> elem(4) |> Enum.reverse
[
  {:paren_identifier, {1, 1, ~c"fun"}, :fun},
  {:"(", {1, 4, nil}},
  {:identifier, {1, 5, ~c"x"}, :x},
  {:dual_op, {1, 7, nil}, :+},
  {:int, {1, 9, 1}, ~c"1"},
  {:")", {1, 10, nil}}
]
iex(3)> :elixir_tokenizer.tokenize(~c'fun(x + 1)', 1, 1, [mode: :relative]) |> elem(4) |> Enum.reverse
[
  {:paren_identifier, {0, 0, ~c"fun"}, :fun},
  {:"(", {0, 3, nil}},
  {:identifier, {0, 1, ~c"x"}, :x},
  {:dual_op, {0, 2, nil}, :+},
  {:int, {0, 2, 1}, ~c"1"},
  {:")", {0, 1, nil}}
]
iex(7)> :elixir_tokenizer.tokenize(~c'"\#{fun(x + 1)}" <> ""', 1, 1, [mode: :absolute]) |> elem(4) |> Enum.reverse
[
  {:bin_string, {1, 1, nil},
   [
     {{1, 2, nil}, {1, 14, nil},
      [
        {:paren_identifier, {1, 4, ~c"fun"}, :fun},
        {:"(", {1, 7, nil}},
        {:identifier, {1, 8, ~c"x"}, :x},
        {:dual_op, {1, 10, nil}, :+},
        {:int, {1, 12, 1}, ~c"1"},
        {:")", {1, 13, nil}}
      ]}
   ]},
  {:concat_op, {1, 17, nil}, :<>},
  {:bin_string, {1, 20, nil}, [""]}
]
iex(8)> :elixir_tokenizer.tokenize(~c'"\#{fun(x + 1)}" <> ""', 1, 1, [mode: :relative]) |> elem(4) |> Enum.reverse
[
  {:bin_string, {0, 0, nil},
   [
     {{0, 1, nil}, {0, 13, nil},
      [
        {:paren_identifier, {0, 3, ~c"fun"}, :fun},
        {:"(", {0, 3, nil}},
        {:identifier, {0, 1, ~c"x"}, :x},
        {:dual_op, {0, 2, nil}, :+},
        {:int, {0, 2, 1}, ~c"1"},
        {:")", {0, 1, nil}}
      ]}
   ]},
  {:concat_op, {0, 16, nil}, :<>},
  {:bin_string, {0, 3, nil}, [""]}
]

@josevalim
Member

I have been thinking about this. Couldn't this be implemented by doing a later pass on the tokens or the AST that computes the difference and relative positions? Or perhaps we include more information on the metadata so it can be done by a later pass?
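
For flat tokens, such a later pass might look roughly like the sketch below, which walks absolute tokens and rewrites each position as a difference from the previous one. The module RelativePassSketch and its to_relative/3 function are hypothetical names, not existing compiler APIs. The sketch assumes plain line and column differences and leaves the nested parts of interpolated tokens untouched.

```elixir
defmodule RelativePassSketch do
  # Hypothetical post-processing pass for illustration only.
  # Assumes every token carries {line, column, extra} as its second element.
  def to_relative(tokens, start_line \\ 1, start_column \\ 1) do
    {relative, _last_position} =
      Enum.map_reduce(tokens, {start_line, start_column}, fn token, {prev_line, prev_column} ->
        {line, column, extra} = elem(token, 1)
        # Store the difference from the previous position; the current
        # absolute position becomes the reference for the next token.
        delta = {line - prev_line, column - prev_column, extra}
        {put_elem(token, 1, delta), {line, column}}
      end)

    relative
  end
end

absolute = [
  {:paren_identifier, {1, 1, ~c"fun"}, :fun},
  {:"(", {1, 4, nil}},
  {:identifier, {1, 5, ~c"x"}, :x},
  {:dual_op, {1, 7, nil}, :+},
  {:int, {1, 9, 1}, ~c"1"},
  {:")", {1, 10, nil}}
]

IO.inspect(RelativePassSketch.to_relative(absolute, 1, 1))
```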

2 participants