Implement greedy name parsing by skaupper · Pull Request #2 · Schottkyc137/rust_hdl

skaupper · 2025-01-20T08:35:31Z

I implemented a first version of name parsing.

As already discussed through Matrix chat, this first implementation cannot fully differentiate some constructs and therefore clumps some tokens together in a generic Internal(...Tokens) node. This can be improved upon, but for simple names (especially selected names) it should work just fine.

Also, I added some test to show how the resulting structure may look like. The resulting SyntaxNode structure and naming is up for debate.

I implemented some (but definitely not all) of the name related production rules, like association_list, range and signature.

The big TODO is parsing of expression and subtype_indication. If there is a way to differentiate between these two on the fly, we could maybe avoid doing a second pass to parse Internal(...) nodes.

Schottkyc137

This looks really good! I feel like we're moving big steps into the right direction with this PR.
I have left some comments about what I think about some stuff you implemented, but most is up for discussion. Just let me know what you think.

skaupper · 2025-01-29T13:12:39Z

I think I got all your comments. To better support bounded lookaheads I introduced a token_index field to the parser.

Calling the unbounded Parser::actual_part now expects the whole token stream to be part of the production, which is not the case if it is followed by a token like RightPar. But I guess that is to be expected, when you call a normally bounded function with usize::MAX 🤷

Let me know what you think.

(We can also move the token_index field to the NodeBuilder. Other than saving a few self.token_index += 1; I don't see the benefit, though.)

skaupper · 2025-02-03T10:29:09Z

@Schottkyc137 Not sure if you missed it, or just didn't have time yet but here is a ping anyway :)

Schottkyc137 · 2025-02-03T10:30:47Z

Sorry about the delay, it's the latter :/
I'll try to review your changes today.

skaupper · 2025-02-03T10:32:47Z

Oh okay. Sorry for the ping then 😅
Take your time.

skaupper · 2025-02-04T10:12:39Z

I haven't really considered it until now, but providing a bounded variant of Parser::name is probably a good idea as well. That way we could work around its greediness in some situations.

I scanned through the LRM and found a few grammar rules which might benefit from it: entity_aspect, function_call, procedure_call, instantiated_unit, actual_part, formal_part, group_declaration

While extracting the trailing parenthesized list from a name might not be a lot of trouble during analyzation, Parser::name wouldn't really parse it, but pack it in a RawTokens node and that wouldn't be convenient at all.

Do you see anything against it?

Schottkyc137 · 2025-02-04T22:05:50Z

I think I got all your comments. To better support bounded lookaheads I introduced a token_index field to the parser.

Calling the unbounded Parser::actual_part now expects the whole token stream to be part of the production, which is not the case if it is followed by a token like RightPar. But I guess that is to be expected, when you call a normally bounded function with usize::MAX 🤷

Let me know what you think.

(We can also move the token_index field to the NodeBuilder. Other than saving a few self.token_index += 1; I don't see the benefit, though.)

Could you motivate for me why a separate token_index is needed and cannot be replaced with the amount of lookahead needed? I.e., why can't you just say 'lookahead for N tokens' instead of saying 'lookahead until index x'?
I think I'm sure I'm missing something obvious here, but couldn't find it.

Schottkyc137 · 2025-02-04T22:09:50Z

I haven't really considered it until now, but providing a bounded variant of Parser::name is probably a good idea as well. That way we could work around its greediness in some situations.

I scanned through the LRM and found a few grammar rules which might benefit from it: entity_aspect, function_call, procedure_call, instantiated_unit, actual_part, formal_part, group_declaration

While extracting the trailing parenthesized list from a name might not be a lot of trouble during analyzation, Parser::name wouldn't really parse it, but pack it in a RawTokens node and that wouldn't be convenient at all.

Do you see anything against it?

So far I don't see anything against it. It should also be easy enough to revert, if the benefit isn't there. But for the examples you provided, this could make sense.

skaupper · 2025-02-05T07:19:08Z

Could you motivate for me why a separate token_index is needed and cannot be replaced with the amount of lookahead needed?

It could be done either way, really. Using an absolute token index as bound instead of a relative token length is less housekeeping if you need to use the bound more than once.

While implementing association_list_bounded, association_element_bounded and actual_part_bounded I ran into the problem, that the caller of association_list_bounded determines the bound (the index/distance to a trailing RightPar or something) and every time a token is parsed (Parser::expect_token etc.) this bound has to be corrected.

If that's not clear enough, take a look at association_element_bounded. If there is a formal_part, the bound for the subsequent actual_part_bounded call will have to be corrected by the amount of tokens, this formal_part consists of (+1 because of the right arrow).

I figured using absolute indices avoids a lot of headache :)

So far I don't see anything against it.

Perfect. I will give it a go then.

Schottkyc137 · 2025-02-05T16:40:25Z

Could you motivate for me why a separate token_index is needed and cannot be replaced with the amount of lookahead needed?

It could be done either way, really. Using an absolute token index as bound instead of a relative token length is less housekeeping if you need to use the bound more than once.

I see, this makes sense to me now. Generally, I prefer not adding more fields to the Parser, but in this case it should be fine. Still, I'd prefer if this was in the builder, since this is also where the rel_offset and text_offset reside, which serve a similar purpose. The interface is also a lot clearer, so the chance of accidentally forgetting to add a token_index + 1 or similar in some function is reduced.

…ead functions

skaupper · 2025-02-13T15:08:29Z

I think I got all your comments worked in and I am happy with it.
Do you see anything blocking a merge?

Schottkyc137 · 2025-02-13T20:55:27Z

I think I got all your comments worked in and I am happy with it. Do you see anything blocking a merge?

Nope, I think this is in a pretty good state now. This truly brings this project forward; name parsing was one of the things I was a bit worried about. Thanks a lot for putting in the work!

Implement greedy name parsing

221349e

skaupper commented Jan 20, 2025

View reviewed changes

Comment thread vhdl_syntax/src/parser/productions/names.rs Outdated

skaupper commented Jan 20, 2025

View reviewed changes

Comment thread vhdl_syntax/src/parser/productions/names.rs

skaupper mentioned this pull request Jan 24, 2025

Type declaration parsing #3

Merged

Schottkyc137 reviewed Jan 25, 2025

View reviewed changes

skaupper force-pushed the name-parsing branch from 4c60dbb to 566c862 Compare January 29, 2025 13:10

Improve (bounded) lookahead and simplify parsing of ambiguous names

15b2594

skaupper force-pushed the name-parsing branch from 566c862 to 15b2594 Compare February 3, 2025 13:34

Schottkyc137 reviewed Feb 4, 2025

View reviewed changes

Comment thread vhdl_syntax/src/parser/productions/names.rs

Schottkyc137 reviewed Feb 4, 2025

View reviewed changes

Comment thread vhdl_syntax/src/parser/productions/names.rs Outdated

Move token_index to NodeBuilder and improve error handling for lookah…

90ddc81

…ead functions

Schottkyc137 merged commit 7034996 into Schottkyc137:syntax-crate Feb 13, 2025

Conversation

skaupper commented Jan 20, 2025

Uh oh!

Uh oh!

Uh oh!

Schottkyc137 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

skaupper commented Jan 29, 2025

Uh oh!

skaupper commented Feb 3, 2025

Uh oh!

Schottkyc137 commented Feb 3, 2025

Uh oh!

skaupper commented Feb 3, 2025

Uh oh!

skaupper commented Feb 4, 2025

Uh oh!

Uh oh!

Uh oh!

Schottkyc137 commented Feb 4, 2025

Uh oh!

Schottkyc137 commented Feb 4, 2025

Uh oh!

skaupper commented Feb 5, 2025

Uh oh!

Schottkyc137 commented Feb 5, 2025

Uh oh!

skaupper commented Feb 13, 2025

Uh oh!

Schottkyc137 commented Feb 13, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants