
Faulty code improve scanner on error tokens #13135

Merged
MarcusDenker merged 7 commits into pharo-project:Pharo12 from
privat:faulty-code-improve-scanner
Mar 28, 2023

Conversation

@privat
Contributor

@privat privat commented Mar 24, 2023

Error positions on unknown characters and the content of bad ## literals were off.
The culprit is RBScanner, so this PR tries to fix the handling of error content and position in the scanner.

The issue is that scanError:from: and scanError: (and scanBackFrom:) included the current character in their string.
This is OK at the end of the stream, but problematic inside the stream, because the current character is often the problematic one that ends the previous token.
Therefore, clients used to call scanError: before consuming the last character, or just hard-coded the error token contents. But the hard-coded text was off for a bare # when multiple # are present, and the stop position was wrong for an unknown character because the error was produced before consuming the bad character.

All the PR basically does is stop including the current character in scanError: and co., and update the clients to call it at the right time (after consuming the last character) or to stop hard-coding the text, because the scanned text is now the correct one.
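To illustrate the convention described above, here is a minimal toy sketch in Python. It is not the Pharo RBScanner code: the names ToyScanner, scan_error and scan_all are hypothetical, and positions are 0-based with an exclusive stop, which is an assumption made for this example only. The point it shows is that the error text covers only the consumed characters, so callers consume the bad character first and then build the error.

```python
from dataclasses import dataclass


@dataclass
class Token:
    kind: str
    text: str
    start: int  # 0-based index of the first character (assumption of this sketch)
    stop: int   # index one past the last character
    message: str = ""


class ToyScanner:
    """Hypothetical toy scanner; it only models the error-token convention."""

    def __init__(self, source):
        self.source = source
        self.pos = 0  # index of the current (not yet consumed) character

    def scan_error(self, start, message):
        # The error text is source[start:pos]: everything consumed so far,
        # excluding the current character at source[pos].
        return Token("error", self.source[start:self.pos], start, self.pos, message)

    def _consume_while(self, predicate):
        while self.pos < len(self.source) and predicate(self.source[self.pos]):
            self.pos += 1

    def next_token(self):
        self._consume_while(str.isspace)
        if self.pos >= len(self.source):
            return None
        start = self.pos
        ch = self.source[self.pos]
        if ch.isalpha():
            self._consume_while(str.isalnum)
            return Token("identifier", self.source[start:self.pos], start, self.pos)
        if ch.isdigit():
            self._consume_while(str.isdigit)
            return Token("number", self.source[start:self.pos], start, self.pos)
        if ch == "#":
            self._consume_while(lambda c: c == "#")
            if self.pos < len(self.source) and self.source[self.pos].isalpha():
                self._consume_while(str.isalnum)
                return Token("symbol", self.source[start:self.pos], start, self.pos)
            # Bare run of '#': no hard-coded text, the consumed run is the content.
            return self.scan_error(start, "Literal expected")
        # Unknown character: consume it first, then produce the error,
        # so that the stop position includes the bad character.
        self.pos += 1
        return self.scan_error(start, "Unknown character")


def scan_all(source):
    scanner = ToyScanner(source)
    tokens = []
    while (token := scanner.next_token()) is not None:
        tokens.append(token)
    return tokens
```

With this toy model, scanning "##1" yields an error token whose text is "##" (both # characters, not a hard-coded single one) followed by a number token for 1, and an unknown character yields an error token spanning exactly that character.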

Related question, but unmodified by the PR: currently the error for a bad # literal (e.g. #1 or # at the end of file) is Literal expected with the location (where to put the cursor in front of the problematic character) after the #. I'm not sure that it is the best message nor location.

Maybe Invalid literal (or something) could be used instead, with the cursor in front of #?

Also, #1 does not consume the 1, so you have Literal expected before the 1 (that can be misleading, because 1 is a kind of literal) and the 1 will be used for the next token. Perhaps #1 should be scanned as a single invalid literal token instead?
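The two options for #1 can be compared with a small toy sketch in Python. This is only an illustration of the question, not what the PR or RBScanner does: the function name scan_hash_literal and its greedy flag are hypothetical, and positions are 0-based with an exclusive stop.

```python
def scan_hash_literal(source, start, greedy):
    """Scan a '#'-prefixed token starting at source[start].

    Returns (kind, text, stop). Hypothetical helper for illustration only.
    With greedy=False, an invalid literal stops right after the '#' run and
    the following characters are left for the next token. With greedy=True,
    the following alphanumeric run is absorbed into the error token.
    """
    pos = start
    while pos < len(source) and source[pos] == "#":
        pos += 1
    if pos < len(source) and source[pos].isalpha():
        while pos < len(source) and source[pos].isalnum():
            pos += 1
        return ("symbol", source[start:pos], pos)
    if greedy:
        # Alternative raised in the comment: treat "#1" as one invalid literal.
        while pos < len(source) and source[pos].isalnum():
            pos += 1
    return ("error", source[start:pos], pos)


print(scan_hash_literal("#1 foo", 0, greedy=False))  # ('error', '#', 1)
print(scan_hash_literal("#1 foo", 0, greedy=True))   # ('error', '#1', 2)
```

In the non-greedy case the 1 becomes the next token, which is where the possibly misleading "Literal expected" in front of a literal comes from; in the greedy case the whole #1 is reported as one error.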

@privat
Contributor Author

privat commented Mar 24, 2023

I tried to see what kind of syntax errors other languages give for invalid symbols when scanning/parsing. But there are not many languages with literal symbols.

Scheme considers the "quote" of an integer to be the integer itself, not a symbol, and symbol contents are otherwise quite liberal about the accepted characters (tested with Racket)

> (symbol? '1)
#f
> (= '1 1)
#t
> (symbol? '¿→☭)
#t

Ruby is drunk

$ ruby -e ':1'
-e:1: syntax error, unexpected integer literal, expecting literal content or terminator or tSTRING_DBEG or tSTRING_DVAR

Erlang has atoms, which are equivalent, but their syntax does not use a prefix character, so the scanner does not have the same issue.

Maybe try with other Smalltalk dialects?

Member

@jecisc jecisc left a comment

Seems good to me when I read the code. I like the duplication removal a lot :)
But I don't know that part of the system well

@privat privat changed the title Faulty code improve scanner Faulty code improve scanner on error tokens Mar 24, 2023
@MarcusDenker
Member

Related to multiple ###, I really wonder about it.

####literal 

is the same as #literal... odd.

As for #1... yes, if we keep it as an error, the position of the error message looks wrong. But I even wonder why this has to be an error...

@MarcusDenker MarcusDenker merged commit 1b2c77a into pharo-project:Pharo12 Mar 28, 2023