
Initial draft of a public API to navigate and manipulate VHDL syntax#1

Closed
Schottkyc137 wants to merge 106 commits into master from syntax-crate

Conversation

@Schottkyc137
Owner

No description provided.

Schottkyc137 and others added 27 commits January 3, 2025 14:54
This is opposed to the previous implementation, where the leading trivia would
only return the immediate leading trivia stored in the token, which could
be confusing. To get the trivia between two tokens, the user would have to
manually extract the leading and trailing trivia of the current token and of
the previous or next token, respectively.
Implement greedy name parsing
@Schottkyc137
Owner Author

Finally had some more time to work on this and I think it's looking quite good. The current status is that the new vhdl_syntax crate is now capable of parsing and re-emitting every VHDL file in the example project (except for one file that uses PSL syntax which is still not supported).

There are also a bunch of other changes - like a new snapshot-based testing system which simplifies writing tests a bunch, optional serde integration for AST dumping (there is now also the vhdl-dump-ast crate + binary that uses these capabilities) and a working parse -> syntax tree -> dump roundtrip that preserves all nodes.

I think that this might be ready to be merged into the main branch quite soon. There is still a bit of clean-up to do and of course lots of testing.

@skaupper do you have any further comments on this? I think requesting a full review from you is unreasonable as this is a gigantic PR with > 300 files changed - but if you have any high-level comments I'd appreciate them!

@skaupper

skaupper commented Jan 9, 2026

Being able to parse the example project is already pretty good.

If I see that correctly, the parser still uses the hand-coded productions, with the generated syntax nodes (together with the AstNode trait) being the interface to the user. I like how these user-facing nodes are generated from a far more readable format.

One thing I am curious about: The generated nodes do not distinguish between mandatory and optional tokens or am I missing something? E.g. ContextDeclarationSyntax::context_token returns an Option<SyntaxToken> which does not feel right. I think a return type of SyntaxToken with a panic (and a proper error message) if the token is not found feels better. In the end the underlying productions must assure that the mandatory tokens are there, otherwise it would be a bug anyways.
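To make the contrast concrete, here is a minimal sketch of the two accessor shapes being discussed. The types are simplified stand-ins, not the crate's actual definitions, and `context_token_required` is a hypothetical name for the panicking variant:

```rust
// Simplified stand-ins for the real crate's types.
#[derive(Debug, Clone, PartialEq)]
struct SyntaxToken(&'static str);

struct ContextDeclarationSyntax {
    tokens: Vec<SyntaxToken>,
}

impl ContextDeclarationSyntax {
    // Current shape: every caller must handle absence, even for
    // tokens the grammar says are mandatory.
    fn context_token(&self) -> Option<SyntaxToken> {
        self.tokens.iter().find(|t| t.0 == "context").cloned()
    }

    // Suggested alternative: panic with a clear message, since the
    // underlying production is supposed to guarantee the token exists.
    fn context_token_required(&self) -> SyntaxToken {
        self.context_token()
            .expect("ContextDeclarationSyntax is missing its mandatory `context` token; parser bug")
    }
}

fn main() {
    let node = ContextDeclarationSyntax {
        tokens: vec![SyntaxToken("context"), SyntaxToken("ident")],
    };
    assert_eq!(node.context_token(), Some(SyntaxToken("context")));
    assert_eq!(node.context_token_required().0, "context");
}
```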

As far as I see error reporting is supported in a basic form. Do you have plans for error recovery as well?

In some of the examples you mention that the parser API is not finalized. Do you want to further improve that after merging it or is that part of the cleanup?

@Schottkyc137
Owner Author

Thanks for the feedback! I really appreciate it!

The fact that all AstNodes return an Option is on purpose, and it's related to error-tolerant parsing.
One of the goals of this library is to represent source trees in a lossless form - no matter the errors they contain. As a consequence, the AST must support this as well. The fact that the vhdl_lang implementation is not capable of doing this to the extent that this crate can is a pain point for lots of applications, like completions or fault-tolerant analysis.

That being said, I agree that the current design makes certain applications very un-ergonomic, especially if a consumer only wants to accept correct source code.
I have thought about how to handle this and have come up with two solutions (that other, similar frameworks also use):

  1. Have a secondary "Correct" API (called CorrectAst for the sake of simplicity). You could transform an AstNode into a Result<CorrectAstNode, ErrorKind> where all fields are checked recursively.

  2. Make all Tokens and non-optional fields actually non-optional and insert fake tokens instead. This is what SwiftSyntax uses if I'm not mistaken and it has some advantages. Each "virtual token" could be tagged whether it is actually present in the source code or synthetic and consumers can deal with that information however they like.
    The disadvantage imo is from an API perspective. Conceptually, there is not much of a difference between a "missing token" (i.e., Option<SyntaxToken>) and a "token that is declared missing" (i.e., a token where the presence is tagged), except that the former declares the intent more directly. If you don't know anything about the API, the functions will at least tell you that a token or node can be missing. A tagged token has a high chance of misuse.

So I think I prefer option 1, if any. It's a you-get-what-you-pay-for solution, can easily be generated from the YAML files and has an intuitive API.
But imo it still remains to be seen whether it's even needed. With the current node API, lots of use-cases that I can think of can already be accomplished (linter, analyzer, formatter). Other use-cases are a bit more annoying, but also feasible.
I'm absolutely not sure about this point, so I'm happy to hear your opinion! Do you prefer Option 1, 2, or any other option?
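For reference, option 1 could look roughly like this. Everything here (the node shapes, the field names, the error type) is hypothetical and only illustrates the recursive check-once conversion:

```rust
#[derive(Debug, Clone, PartialEq)]
struct SyntaxToken(String);

// Error-tolerant node: every field may be absent.
struct ContextDeclarationSyntax {
    context_token: Option<SyntaxToken>,
    identifier: Option<SyntaxToken>,
}

// Hypothetical "correct" counterpart: all mandatory fields are plain values.
#[derive(Debug)]
struct CorrectContextDeclaration {
    context_token: SyntaxToken,
    identifier: SyntaxToken,
}

#[derive(Debug, PartialEq)]
enum ErrorKind {
    MissingToken(&'static str),
}

impl TryFrom<ContextDeclarationSyntax> for CorrectContextDeclaration {
    type Error = ErrorKind;

    // Check every mandatory field once, up front; downstream code
    // then never sees an Option again.
    fn try_from(node: ContextDeclarationSyntax) -> Result<Self, ErrorKind> {
        Ok(CorrectContextDeclaration {
            context_token: node
                .context_token
                .ok_or(ErrorKind::MissingToken("context"))?,
            identifier: node.identifier.ok_or(ErrorKind::MissingToken("identifier"))?,
        })
    }
}

fn main() {
    let ok = ContextDeclarationSyntax {
        context_token: Some(SyntaxToken("context".into())),
        identifier: Some(SyntaxToken("ctx".into())),
    };
    assert!(CorrectContextDeclaration::try_from(ok).is_ok());

    let broken = ContextDeclarationSyntax {
        context_token: None,
        identifier: Some(SyntaxToken("ctx".into())),
    };
    assert_eq!(
        CorrectContextDeclaration::try_from(broken).unwrap_err(),
        ErrorKind::MissingToken("context")
    );
}
```

Consumers that only accept correct code pay the conversion cost once and get an Option-free API afterwards; error-tolerant consumers keep using the original nodes.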

Concerning the finalization of the parser API: This is what I plan to do after merging. I think that at some point done is good enough and a 40k merge request is already tons of changes. The only thing I want to avoid is merging something this big and after a couple of weeks / months decide that it needs to be refactored. That would defeat the whole point.

The only thing I'm currently considering is whether the actual AST nodes need a little refactoring. I have started to build a linter as a use-case for the library because it is something I had not thought about in the first place (and I'd also appreciate a good VHDL linter;). Refactoring ArchitectureBody from Keyword(Architecture), Identifier, Keyword(Of), Name, Keyword(Is), ... into ArchitectureHeader, ArchitectureDeclarativePart, ... has made the job easier and I hypothesize other tools would appreciate that as well. But I'm still not sure if it's worth the effort. If you're interested I can push the branch - but it's quite bare-bones at the moment.
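The refactoring could roughly turn the flat token sequence into grouped child nodes, something like the following. All types here are sketches for illustration, not the crate's actual definitions:

```rust
// Flat shape: the node exposes its raw token sequence directly,
// e.g. Keyword(Architecture), Identifier, Keyword(Of), Name, Keyword(Is), ...
struct FlatArchitectureBody {
    tokens: Vec<String>,
}

// Grouped shape: intermediate nodes give tools natural anchor points.
struct ArchitectureHeader {
    identifier: String,
    entity_name: String,
}

struct ArchitectureDeclarativePart {
    declarations: Vec<String>,
}

struct ArchitectureBody {
    header: ArchitectureHeader,
    declarative_part: ArchitectureDeclarativePart,
    statements: Vec<String>,
}

fn main() {
    let flat = FlatArchitectureBody {
        tokens: vec![
            "architecture".into(), "rtl".into(), "of".into(),
            "counter".into(), "is".into(),
        ],
    };
    assert_eq!(flat.tokens.len(), 5);

    // A linter can now ask for "the declarative part" directly instead of
    // scanning the token list for the region between `is` and `begin`.
    let body = ArchitectureBody {
        header: ArchitectureHeader {
            identifier: "rtl".into(),
            entity_name: "counter".into(),
        },
        declarative_part: ArchitectureDeclarativePart {
            declarations: vec!["signal count : natural;".into()],
        },
        statements: vec!["count <= count + 1;".into()],
    };
    assert_eq!(body.header.identifier, "rtl");
    assert_eq!(body.declarative_part.declarations.len(), 1);
}
```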

@skaupper

Okay, I get where you want to go with the AST now. But even if you would like to accept incorrect code, how should that work? I've toyed around with the example given in linting.rs, and by changing the VHDL code slightly the parser fell apart (notice the missing is keyword in the first line):

entity baz
    port (
        a: in std_ulogic
    );
end entity baz;

entity foo is
end entity bar;

To be fair, error recovery is still a TODO according to Parser::expect_token.

Let's say the parser can recover from an error by synchronizing to an end; or something. You'd have at least one SyntaxNode which is at best incomplete and at worst completely erroneous. Isn't it sufficient to flag the node itself as erroneous instead of the tokens returned by the query functions? The information about errors shouldn't be too hard to add, since Parser::expect_token already generates diagnostics. Users can then either ignore the node altogether or try making sense of it with .raw().tokens() etc. I would even consider calling query functions on an erroneous node an error in itself, because the results can be misleading in the worst case. This approach would probably also benefit from the introduction of further intermediate nodes, like the example you mentioned with ArchitectureHeader, ArchitectureDeclarativePart, ....
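The node-level flag being suggested could be sketched like this. The types and method names are invented for illustration and do not match the crate's actual API:

```rust
#[derive(Debug, Clone, PartialEq)]
struct SyntaxToken(String);

// Hypothetical node that records whether its production had to
// recover from an error while building it.
struct SyntaxNode {
    tokens: Vec<SyntaxToken>,
    erroneous: bool,
}

impl SyntaxNode {
    fn is_erroneous(&self) -> bool {
        self.erroneous
    }

    // Query functions could refuse to answer on erroneous nodes,
    // since their results can be misleading.
    fn first_identifier(&self) -> Option<&SyntaxToken> {
        assert!(
            !self.erroneous,
            "called a query function on an erroneous node"
        );
        self.tokens.first()
    }

    // Escape hatch: raw token access always works, mirroring the
    // `.raw().tokens()` idea from the comment above.
    fn raw_tokens(&self) -> &[SyntaxToken] {
        &self.tokens
    }
}

fn main() {
    let good = SyntaxNode {
        tokens: vec![SyntaxToken("baz".into())],
        erroneous: false,
    };
    assert_eq!(good.first_identifier().unwrap().0, "baz");

    let bad = SyntaxNode {
        tokens: vec![SyntaxToken("baz".into())],
        erroneous: true,
    };
    assert!(bad.is_erroneous());
    assert_eq!(bad.raw_tokens().len(), 1); // still inspectable raw
}
```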

@Schottkyc137
Owner Author

Even if you would like to accept incorrect code, how should that work?

Currently, the scope of this library is to parse correct code (tbh that's hard enough :D).
But eventually, the parser should be capable of error handling (I'd like to say "better error handling", but at the moment errors like the one you mention will break the entire parse). I will attempt to lay out what that could look like:
For your example, say that the is token is missing. One possible error recovery strategy is to check the next token; if it is the port or generic token – good, just do not emit the is token. If not, eat the token, hoping that the user just inserted some bogus input.
There are other, simple cases that the parser currently is not capable of dealing with, like seeing a : token instead of a ; token, an identifier archtecture instead of architecture, etc.
In general, the idea is that the parser should attempt to find the best match for the user's input.
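The recovery strategy for the missing is token can be sketched roughly like this. The token kinds and parser shape are invented for illustration; the real Parser::expect_token differs:

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum TokenKind {
    Is,
    Port,
    Generic,
    Identifier,
}

struct Parser {
    tokens: Vec<TokenKind>,
    pos: usize,
}

impl Parser {
    fn peek(&self) -> Option<TokenKind> {
        self.tokens.get(self.pos).copied()
    }

    fn bump(&mut self) {
        self.pos += 1;
    }

    // Expect `is`. If it is missing but the next token already starts the
    // entity header (`port`/`generic`), report the missing token and keep
    // going without consuming anything. Otherwise, assume bogus input and
    // skip one token.
    fn expect_is(&mut self) -> Result<(), &'static str> {
        match self.peek() {
            Some(TokenKind::Is) => {
                self.bump();
                Ok(())
            }
            Some(TokenKind::Port) | Some(TokenKind::Generic) => {
                Err("missing `is`; continuing at entity header")
            }
            _ => {
                self.bump(); // eat the bogus token and retry from there
                Err("unexpected token; skipped")
            }
        }
    }
}

fn main() {
    // `entity baz is port (...)`: `is` present, consumed normally.
    let mut p = Parser { tokens: vec![TokenKind::Is, TokenKind::Port], pos: 0 };
    assert!(p.expect_is().is_ok());
    assert_eq!(p.pos, 1);

    // `entity baz port (...)`: `is` missing; recover without consuming `port`.
    let mut p2 = Parser { tokens: vec![TokenKind::Port], pos: 0 };
    assert!(p2.expect_is().is_err());
    assert_eq!(p2.pos, 0);

    // Bogus token before `is`: skip it.
    let mut p3 = Parser { tokens: vec![TokenKind::Identifier, TokenKind::Is], pos: 0 };
    assert!(p3.expect_is().is_err());
    assert_eq!(p3.pos, 1);
}
```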

You'd have at least one SyntaxNode which is at best incomplete and at worst completely erroneous

I will try to motivate the need for a structure (AST / CST / SyntaxNodes) that supports incomplete or erroneous nodes and tokens; the motivation lies in the applications:

  1. A language server must be able to deal with broken trees. At the point where you are typing code, the tree is broken:
    a <= b and c.
    From the standpoint of the language, the statement above is horribly malformed. There is at least a semicolon missing, and nothing follows the dot (one would expect a selected name). But this is how you usually type code, and users expect language servers to generate suggestions at this position. To enable this use-case, it's extremely valuable to have a parser that can parse most of this statement into some kind of SignalAssignment.
  2. Good formatters can also deal with incorrect and incomplete code. This can be valuable to pin-point where an error is when the issue is obscured, e.g., in copy-pasted code.
  3. Linters should also have certain error recovery capabilities. You wouldn't want a linter that stops at the first error because you would need to run it again and again, once for every error in the source file.

Users then can either ignore the node altogether or try making sense of it with .raw().tokens() etc.

This raises the issue that consumers would have to re-implement complex recovery logic instead of having it in one central place.
Even if one provides a secondary API to do error recovery after emitting an erroneous token, the parser itself will still need to implement some kind of error recovery if it is to be used for any of the previous use-cases, so you can't really outsource that entirely to some other API.

Isn't it sufficient to flag the node itself as erroneous instead of the tokens returned by the query functions?

I believe that this is the critical point: At what granularity does the AST allow erroneous nodes? I believe that the answer should be either at every possible position or at none. The case "at every possible position" is used for the examples above and the case "none" is used for documentation generators, test bench extractors, and in general code that expects to work with correct VHDL code.
What you describe seems to me like a chimera between the two worlds: Some kind of errors are allowed in the AST, but not everywhere.

Currently implemented is the "errors are possible everywhere" API whereas what's still missing is the "everything is correct (or there are diagnostics)" API.

@Schottkyc137
Owner Author

Now part of the main repo
