Before I start fiddling with this, what is the desired output? Should the scanner error still be raised, but without failing the run? Do I need to add a special case for semicolons in my scanner, so that unterminated strings are fine within statements?
Here are my logs:
[tester::#OE4] [test-4] [test.lox] // This program prints the result of a comparison operation
[tester::#OE4] [test-4] [test.lox] // It also tests multi-line strings and non-ASCII characters
[tester::#OE4] [test-4] [test.lox] print false != true;
[tester::#OE4] [test-4] [test.lox]
[tester::#OE4] [test-4] [test.lox] print "74
[tester::#OE4] [test-4] [test.lox] 11
[tester::#OE4] [test-4] [test.lox] 89
[tester::#OE4] [test-4] [test.lox] ";
[tester::#OE4] [test-4] [test.lox]
[tester::#OE4] [test-4] [test.lox] print "There should be an empty line above this.";
[tester::#OE4] [test-4] [test.lox]
[tester::#OE4] [test-4] [test.lox] print "(" + "" + ")";
[tester::#OE4] [test-4] [test.lox]
[tester::#OE4] [test-4] [test.lox] print "non-ascii: ॐ";
[tester::#OE4] [test-4] [test.lox]
[tester::#OE4] [test-4] $ ./your_program.sh run test.lox
[your_program] [line 5] Error: Unterminated string.
[your_program] [line 8] Error: Unterminated string.
[tester::#OE4] [test-4] expected no error (exit code 0), got exit code 65
Here’s the scanner logic:
# Handle strings
elif char.value == '"':
    self.advance()  # Consume the opening quote
    string_value = ""
    while True:
        next_char = self.peek()
        if not next_char:  # Handle unexpected None
            print(f"[line {current_line}] Error: Unterminated string.", file=sys.stderr)
            self.lex_errors = True
            break
        if next_char.value == '"':  # Found the closing quote
            self.advance()  # Consume the closing quote
            tokens.append(Token(TokenType.STRING, f'"{string_value}"', string_value, current_line))
            break
        if next_char.value == '\n' or self.is_at_end_of_line():  # Unterminated string due to newline
            print(f"[line {current_line}] Error: Unterminated string.", file=sys.stderr)
            self.lex_errors = True
            break
        string_value += self.advance().value  # Safe to add
For no particular reason, Lox supports multi-line strings. There are pros and cons to that, but prohibiting them was a little more complex than allowing them, so I left them in. That does mean we also need to update line when we hit a newline inside a string.
So the string that starts on line 5 is terminated on line 8.
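To make that concrete, here is a minimal sketch of the idea, assuming the source is held as one flat Python string rather than the line/Character structure used in this thread. The names `scan_string` and `UnterminatedString` are made up for illustration and are not from the book or the tester. A newline inside the quotes is just another character in the value and only bumps the line counter; the string is unterminated only if the input runs out.

```python
class UnterminatedString(Exception):
    """Raised when the source ends before the closing quote (hypothetical name)."""


def scan_string(source, start, line):
    """Scan a string literal whose opening quote is at source[start].

    Returns (value, next_index, line), where line is the line number
    after the literal. Newlines inside the literal stay in the value
    and bump the line counter; they never end the string.
    """
    i = start + 1  # Skip the opening quote
    chars = []
    while i < len(source) and source[i] != '"':
        if source[i] == "\n":
            line += 1  # Keep line numbers right for later tokens
        chars.append(source[i])
        i += 1
    if i >= len(source):
        # Only running out of input makes the string unterminated
        raise UnterminatedString(f"[line {line}] Error: Unterminated string.")
    return "".join(chars), i + 1, line  # i + 1 skips the closing quote


# The multi-line print statement from the test, flattened into one string
source = 'print "74\n11\n89\n";'
value, next_index, line = scan_string(source, source.index('"'), 1)
```

Here `value` keeps all three newlines, `next_index` points at the `;`, and `line` has advanced by three, so no separate newline token is needed.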
Thanks, that helps. I must have missed it because there’s no newline token, so the tokenizer output would be the same whether I was handling newlines in strings correctly or not. Also, why did I not have elifs??
Corrected code:
while True:
    next_char = self.peek()
    if not next_char or self.is_at_end():  # Handle unexpected None
        print(f"[line {current_line}] Error: Unterminated string.", file=sys.stderr)
        self.lex_errors = True
        break
    elif next_char.value == '"':  # Found the closing quote
        self.advance()  # Consume the closing quote
        tokens.append(Token(TokenType.STRING, f'"{string_value}"', string_value, current_line))
        break
    elif next_char.value == '\n':  # Unterminated string due to newline
        self.advance()
        string_value += '\n'
        break
    else:
        string_value += self.advance().value  # Safe to add
As it happens, I now have a new parser error, which I assume is to do with newline and whitespace issues. I added line numbers to the parser error output to help, but it hasn’t helped.
[tester::#OE4] [test-4] [test.lox] // This program prints the result of a comparison operation
[tester::#OE4] [test-4] [test.lox] // It also tests multi-line strings and non-ASCII characters
[tester::#OE4] [test-4] [test.lox] print true != false;
[tester::#OE4] [test-4] [test.lox]
[tester::#OE4] [test-4] [test.lox] print "48
[tester::#OE4] [test-4] [test.lox] 82
[tester::#OE4] [test-4] [test.lox] 66
[tester::#OE4] [test-4] [test.lox] ";
[tester::#OE4] [test-4] [test.lox]
[tester::#OE4] [test-4] [test.lox] print "There should be an empty line above this.";
[tester::#OE4] [test-4] [test.lox]
[tester::#OE4] [test-4] [test.lox] print "(" + "" + ")";
[tester::#OE4] [test-4] [test.lox]
[tester::#OE4] [test-4] [test.lox] print "non-ascii: ॐ";
[tester::#OE4] [test-4] [test.lox]
[tester::#OE4] [test-4] $ ./your_program.sh run test.lox
[your_program] Parser error: [line 7] Expect ';' after value.
[tester::#OE4] [test-4] expected no error (exit code 0), got exit code 65
Never mind, it’s still to do with me not handling multi-line strings correctly, but now in a different way. Do I need a kind of “invisible” token for newlines? I can’t see where this is mentioned in the book.
EDIT (SOLVED):
Basically, I forgot how I had implemented my advance method for the scanner, which did a lot of the work for me.
self.advance (note: this returns a Character object, a class I defined earlier to keep track of the position of tokens in files; it can be converted to a Python string with str(character) because the Character class defines a __str__ method):
def advance(self):
    """Consume the current character and return it"""
    if self.is_at_end():
        return None
    current_line = self.file_content.lines[self.current_line_index]
    if self.is_at_end_of_line():
        # Move to the next line
        self.current_line_index += 1
        self.current_char_index = 0
        return '\n'  # Return newline as a special character
    char = current_line.characters[self.current_char_index]
    self.current_char_index += 1
    return char
Then the snippet with the fix:
elif next_char.value == '"':  # Found the closing quote
    self.advance()  # Consume the closing quote
    tokens.append(Token(TokenType.STRING, f'"{string_value}"', string_value, current_line))
    break
else:
    # hackery going on here self.advance() actually does things as well as returning the character
    string_value += str(self.advance())
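For anyone who wants to avoid the str() workaround, one possible alternative is to have advance return the same type in every case, so that `.value` works for newlines too. The sketch below is only an illustration under my own assumptions: `Character` here is a simplified stand-in for the class described above, and `LineCursor` is a hypothetical reduction of the scanner to its position-tracking part, with lines stored without their trailing newlines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Character:
    """Simplified, hypothetical stand-in for the Character class in this thread."""
    value: str
    line: int    # 1-based line number
    column: int  # 0-based column within the line

    def __str__(self):
        return self.value


class LineCursor:
    """Hypothetical cursor over a list of lines (stored without their newlines)."""

    def __init__(self, lines):
        self.lines = lines
        self.line_index = 0
        self.char_index = 0

    def is_at_end(self):
        return self.line_index >= len(self.lines)

    def advance(self):
        """Consume one character; returns a Character, or None at end of input."""
        if self.is_at_end():
            return None
        text = self.lines[self.line_index]
        line_no, column = self.line_index + 1, self.char_index
        if self.char_index >= len(text):
            # End of line: step to the next line and hand back a real
            # Character wrapping the newline instead of a bare str
            self.line_index += 1
            self.char_index = 0
            return Character("\n", line_no, column)
        self.char_index += 1
        return Character(text[column], line_no, column)


# A two-line string literal, split into lines the way the scanner stores them
cursor = LineCursor(['"ab', 'c"'])
consumed = []
while (ch := cursor.advance()) is not None:
    consumed.append(ch.value)  # .value works for every character, newlines included
text = "".join(consumed)
```

With this shape, the string branch could keep using `self.advance().value` everywhere, and the newline between the two lines still ends up in the string value. Note that, like the advance method above, this sketch also yields a newline at the end of the last line.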