Faulty code improve scanner on error tokens #13135
MarcusDenker merged 7 commits into pharo-project:Pharo12
…r else it messes stop location)
…w that they are reliable)
…and content) of unexpected character and bad literal
I tried to see what kind of syntax errors other languages give for invalid symbols when scanning/parsing, but not many languages have literal symbols. Scheme considers the "quote" of an integer to be the integer, not a symbol, and is otherwise quite liberal about the characters accepted in symbols (tested with Racket). Ruby is drunk. Erlang has atoms, which are equivalent, but their syntax does not use a prefix character, so the scanner does not have the same issue. Maybe try with other Smalltalk dialects?
jecisc left a comment
Seems good to me when I read the code. I like the duplication removal a lot :)
But I don't know that part of the system much
Related to multiple `###`, I really wonder about it. `##literal` is the same as `#literal`... odd. As for `#1`... yes, if we keep it as an error, the position of the error message looks wrong. But I even wonder why this has to be an error...
Error positions on unknown characters and the content of bad `##literal` tokens were off. The culprit is RBScanner, so this PR tries to fix the handling of error content and position in the scanner.
The issue is that `scanError:from:` and `scanError:` (and `scanBackFrom:`) included the current character in their string. This is ok at the end of the stream, but problematic inside the stream, because the current character is often the problematic one that ends the previous token.
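To make the off-by-one concrete, here is a toy Python model of an error-token helper that either includes or excludes the current, not-yet-consumed character. It is only a sketch: it is not RBScanner, and `ErrorToken`, `error_token` and `include_current` are hypothetical names made up for the illustration.

```python
# Toy model only: this is not RBScanner, and all names are hypothetical.
from dataclasses import dataclass


@dataclass
class ErrorToken:
    text: str
    start: int  # index of the first character of the token
    stop: int   # index of the last character of the token (inclusive)


def error_token(source, start, current, include_current):
    """Build an error token beginning at `start`.

    `current` is the index of the character the scanner is looking at,
    i.e. the next character that has not been consumed yet.
    """
    end = current + 1 if include_current else current
    end = min(end, len(source))  # at end of stream there is no current character
    return ErrorToken(source[start:end], start, end - 1)


source = "##1 foo"
# The scanner has consumed both '#' (indices 0 and 1) and now looks at '1'.
old = error_token(source, 0, 2, include_current=True)
new = error_token(source, 0, 2, include_current=False)
print(old)  # ErrorToken(text='##1', start=0, stop=2)
print(new)  # ErrorToken(text='##', start=0, stop=1)
```

In this model the two variants agree at the end of the stream (for the source `"##"` both yield the text `##`), and differ only when another character follows.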
Therefore, clients used to call `scanError:` before consuming the last character, or just hard-coded the error token contents. But the hard-coded text was wrong for a bare `#` when multiple `#` are present, and the stop position was wrong for unknown characters because the error was produced before consuming the bad character.
All the PR basically does is stop including the current character in `scanError:` and co., and update clients to call it at the right time (after consuming the last character) or to stop hard-coding the text, because the computed text is now correct.
Related question, but unmodified by the PR: currently the error for a bad `#` literal (e.g. `#1`, or `#` at the end of the file) is `Literal expected`, with the location (where to put the cursor, in front of the problematic character) after the `#`. I'm not sure that it is the best message or location.
Maybe `Invalid literal` (or something) could be used instead, with the cursor in front of the `#`?
Also, `#1` does not consume the `1`, so you get `Literal expected` before the `1` (which can be misleading, because `1` is a kind of literal) and the `1` will be used for the next token. Perhaps `#1` should be scanned as a single invalid literal token instead?
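The two options for `#1` can be compared with a small Python sketch. This is a toy scanner, not the Pharo implementation: `scan_hash` and `consume_bad_literal` are hypothetical names, and the symbol syntax is deliberately simplified to "a letter followed by letters or digits" (no colons, binary selectors, quoted symbols or literal arrays).

```python
# Toy model only: hypothetical names, heavily simplified symbol syntax.
def scan_hash(source, pos, consume_bad_literal):
    """Scan one token starting at the '#' at source[pos].

    Returns (kind, text, next_pos), where next_pos is the index of the
    first character left for the next token.
    """
    i = pos
    while i < len(source) and source[i] == "#":  # accept '#', '##', ...
        i += 1
    j = i
    # Simplified symbol body: a letter, then letters or digits.
    while j < len(source) and (
        source[j].isalpha() or (j > i and source[j].isalnum())
    ):
        j += 1
    if j > i:
        return ("symbol", source[pos:j], j)
    if consume_bad_literal:
        # Proposed alternative: swallow the offending characters as well.
        while j < len(source) and not source[j].isspace():
            j += 1
    return ("error", source[pos:j], j)


print(scan_hash("#1 foo", 0, consume_bad_literal=False))  # ('error', '#', 1)
print(scan_hash("#1 foo", 0, consume_bad_literal=True))   # ('error', '#1', 2)
```

With `consume_bad_literal=False` the `1` is left in the stream and becomes the next token, which mirrors the behaviour described above. With `True` the whole `#1` ends up in one error token.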