Faulty code improve scanner on error tokens by privat · Pull Request #13135 · pharo-project/pharo

privat · 2023-03-24T02:13:24Z

Error positions on unknown characters and the content of bad ## literals were off.
The culprit is RBScanner, so this PR tries to fix the handling of error content and position in the scanner.

The issue is that scanError:from: and scanError: (and scanBackFrom:) included the current character in their string.
This is ok at the end of the stream, but problematic inside the stream, because the current character is often the problematic one that ends the previous token.
Therefore, clients used to call scanError: before consuming the last character, or just hard-coded the error token contents. But the hard-coded contents were off for a bare # when multiple # are present, and the stop position was wrong for unknown characters because the error was produced before consuming the bad character.

All the PR basically does is stop including the current character in scanError: and co., and update clients to call it at the right time (after consuming the last character) or stop hard-coding the text, because it is now the correct one.
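To make the off-by-one concrete, here is a minimal sketch in Python. It is a toy model, not the RBScanner code: the class ToyScanner, the helpers error_including_current and error_excluding_current, and the 0-based inclusive positions are all assumptions made for illustration (Pharo positions are 1-based).

```python
from dataclasses import dataclass


@dataclass
class ErrorToken:
    text: str
    start: int  # index of the first character of the token
    stop: int   # index of the last character of the token (inclusive)


class ToyScanner:
    """Hypothetical scanner: `pos` is the index of the current,
    not yet consumed, character."""

    def __init__(self, source):
        self.source = source
        self.pos = 0

    def at_end(self):
        return self.pos >= len(self.source)

    def step(self):
        # Consume the current character.
        self.pos += 1

    def error_including_current(self, start):
        # Old behaviour (as described in the PR): the error text runs up to
        # and including the current character, clamped at end of stream.
        stop = min(self.pos, len(self.source) - 1)
        return ErrorToken(self.source[start:stop + 1], start, stop)

    def error_excluding_current(self, start):
        # New behaviour: the error text covers only consumed characters.
        return ErrorToken(self.source[start:self.pos], start, self.pos - 1)


def scan_bad_literal(source, exclusive):
    # Consume a run of '#'; the current character is then the one that
    # ended the run (or we are at the end of the stream).
    scanner = ToyScanner(source)
    start = scanner.pos
    while not scanner.at_end() and scanner.source[scanner.pos] == "#":
        scanner.step()
    if exclusive:
        return scanner.error_excluding_current(start)
    return scanner.error_including_current(start)


def scan_unknown_character(source, at):
    # Consume the bad character first, then build the error token.
    scanner = ToyScanner(source)
    scanner.pos = at
    start = scanner.pos
    scanner.step()
    return scanner.error_excluding_current(start)


if __name__ == "__main__":
    print(scan_bad_literal("##1 foo", exclusive=False))
    print(scan_bad_literal("##1 foo", exclusive=True))
    print(scan_unknown_character("a ¿ b", 2))
```

In this model the two variants agree at the end of the stream, and differ inside the stream, where the inclusive variant swallows the character that ended the token.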

Related question, but not modified by the PR: currently the error for a bad # literal (e.g. #1 or # at the end of file) is Literal expected, with the location (where the cursor is put, in front of the problematic character) after the #. I'm not sure that it is the best message or location.

Maybe Invalid literal (or something) could be used instead, with the cursor in front of #?

Also, #1 does not consume the 1, so you have Literal expected before the 1 (that can be misleading, because 1 is a kind of literal) and the 1 will be used for the next token. Perhaps #1 should be scanned as a single invalid literal token instead?

privat · 2023-03-24T02:53:21Z

I tried to see what kind of syntax errors other languages give when scanning/parsing invalid symbols. But there are not a lot of languages with literal symbols.

Scheme considers the "quote" of an integer to be the integer, not a symbol, and symbol contents are otherwise quite liberal in the accepted characters (tested with Racket):

> (symbol? '1)
#f
> (= '1 1)
#t
> (symbol? '¿→☭)
#t

Ruby is drunk

$ ruby -e ':1'
-e:1: syntax error, unexpected integer literal, expecting literal content or terminator or tSTRING_DBEG or tSTRING_DVAR

Erlang has atoms that are equivalent, but the syntax does not use a prefix character, so the scanner does not have the same issue.

Maybe try with other Smalltalk dialects?

jecisc

Seems good to me when I read the code. I like the duplication removal a lot :)
But I don't know that part of the system much

MarcusDenker · 2023-03-28T11:51:54Z

Related to multiple ###, I really wonder about it.

####literal

is the same as #literal... odd.

As for #1... yes, if we keep it as an error, the position of the error message looks wrong. But I even wonder why this has to be an error...

privat added 7 commits March 23, 2023 21:18

RBScanner: handle eof in scanToken instead of next

4e8718d

RBScanner>>#scanBackFrom: do not consume the current character

b0ccd5d

RBScanner>>#scanLiteral error include the source of the full token (or else it messes stop location)

0d62acd

RBParser>>#parseErrorNode: use the correct stop location of token (now that they are reliable)

8bcc12d

RBCodeSnippet class>>#badExpressions update to fix correct location (and content) of unexpecter character and bad literal

9c08c3d

RBScanner: simplify the logic of scanner errors

8026184

RBScanner improve method comments

73a595a

privat added the Status: Tests passed please review! label Mar 24, 2023

jecisc approved these changes Mar 24, 2023

privat changed the title ~~Faulty code improve scanner~~ Faulty code improve scanner on error tokens Mar 24, 2023

privat mentioned this pull request Mar 24, 2023

[Meta-issue] improve faulty compiling #12883

Open

78 tasks

MarcusDenker approved these changes Mar 28, 2023

MarcusDenker merged commit 1b2c77a into pharo-project:Pharo12 Mar 28, 2023

MarcusDenker merged 7 commits into pharo-project:Pharo12 from privat:faulty-code-improve-scanner
