Port of MicroPython to the mbed platform. See micropython-repl for an interactive program.

Dependents:   micropython-repl

This is a port of MicroPython to the mbed Classic platform.

This provides an interpreter running on the board's USB serial connection.

Getting Started

Import the micropython-repl program into your IDE workspace on developer.mbed.org. Compile and download to your board. Connect to the USB serial port in your usual manner. You should get a startup message similar to the following:

  MicroPython v1.7-155-gdddcdd8 on 2016-04-23; K64F with ARM
  Type "help()" for more information.
  >>>

Then you can start using micropython. For example:

  >>> from mbed import DigitalOut
  >>> from pins import LED1
  >>> led = DigitalOut(LED1)
  >>> led.write(1)

Requirements

You need approximately 100 KB of flash memory, so this port is unsuitable for boards with less storage.

Caveats

This can be considered an alpha release of the port: things may not work, and APIs may change in later releases. It is NOT an official part of the micropython project, so if anything doesn't work, blame me. If it does work, most of the credit is due to micropython.

  • Only a few of the mbed classes are available in micropython so far, and not all methods of those classes are implemented.
  • Only a few boards have their full range of pin names available; for others, only a few standard ones (USBTX, USBRX, LED1) are implemented.
  • The garbage collector is not yet implemented. The interpreter will gradually consume memory and then fail.
  • Exceptions from the mbed classes are not yet handled.
  • Asynchronous processing (e.g. events on inputs) is not supported.

Credits

  • Damien P. George and other contributors who created micropython.
  • Colin Hogben, author of this port.

Committer: Colin Hogben
Date: Wed Apr 27 22:11:29 2016 +0100
Revision: 10:33521d742af1
Parent: 2:c89e95946844

Update README and version

Who changed what in which revision?

User  Revision  Line number  New contents of line
pythontech 0:5868e8752d44 1 /*
pythontech 0:5868e8752d44 2 * This file is part of the Micro Python project, http://micropython.org/
pythontech 0:5868e8752d44 3 *
pythontech 0:5868e8752d44 4 * The MIT License (MIT)
pythontech 0:5868e8752d44 5 *
pythontech 0:5868e8752d44 6 * Copyright (c) 2013, 2014 Damien P. George
pythontech 0:5868e8752d44 7 *
pythontech 0:5868e8752d44 8 * Permission is hereby granted, free of charge, to any person obtaining a copy
pythontech 0:5868e8752d44 9 * of this software and associated documentation files (the "Software"), to deal
pythontech 0:5868e8752d44 10 * in the Software without restriction, including without limitation the rights
pythontech 0:5868e8752d44 11 * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
pythontech 0:5868e8752d44 12 * copies of the Software, and to permit persons to whom the Software is
pythontech 0:5868e8752d44 13 * furnished to do so, subject to the following conditions:
pythontech 0:5868e8752d44 14 *
pythontech 0:5868e8752d44 15 * The above copyright notice and this permission notice shall be included in
pythontech 0:5868e8752d44 16 * all copies or substantial portions of the Software.
pythontech 0:5868e8752d44 17 *
pythontech 0:5868e8752d44 18 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
pythontech 0:5868e8752d44 19 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
pythontech 0:5868e8752d44 20 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
pythontech 0:5868e8752d44 21 * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
pythontech 0:5868e8752d44 22 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
pythontech 0:5868e8752d44 23 * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
pythontech 0:5868e8752d44 24 * THE SOFTWARE.
pythontech 0:5868e8752d44 25 */
pythontech 0:5868e8752d44 26
pythontech 0:5868e8752d44 27 #include <stdio.h>
pythontech 0:5868e8752d44 28 #include <assert.h>
pythontech 0:5868e8752d44 29
pythontech 0:5868e8752d44 30 #include "py/mpstate.h"
pythontech 0:5868e8752d44 31 #include "py/lexer.h"
pythontech 0:5868e8752d44 32 #include "py/runtime.h"
pythontech 0:5868e8752d44 33
pythontech 0:5868e8752d44 34 #if MICROPY_ENABLE_COMPILER
pythontech 0:5868e8752d44 35
pythontech 0:5868e8752d44 36 #define TAB_SIZE (8)
pythontech 0:5868e8752d44 37
pythontech 0:5868e8752d44 38 // TODO seems that CPython allows NULL byte in the input stream
pythontech 0:5868e8752d44 39 // don't know if that's intentional or not, but we don't allow it
pythontech 0:5868e8752d44 40
pythontech 0:5868e8752d44 41 // TODO replace with a call to a standard function
pythontech 0:5868e8752d44 42 STATIC bool str_strn_equal(const char *str, const char *strn, mp_uint_t len) {
pythontech 0:5868e8752d44 43 mp_uint_t i = 0;
pythontech 0:5868e8752d44 44
pythontech 0:5868e8752d44 45 while (i < len && *str == *strn) {
pythontech 0:5868e8752d44 46 ++i;
pythontech 0:5868e8752d44 47 ++str;
pythontech 0:5868e8752d44 48 ++strn;
pythontech 0:5868e8752d44 49 }
pythontech 0:5868e8752d44 50
pythontech 0:5868e8752d44 51 return i == len && *str == 0;
pythontech 0:5868e8752d44 52 }
pythontech 0:5868e8752d44 53
pythontech 0:5868e8752d44 54 #define CUR_CHAR(lex) ((lex)->chr0)
pythontech 0:5868e8752d44 55
pythontech 0:5868e8752d44 56 STATIC bool is_end(mp_lexer_t *lex) {
pythontech 0:5868e8752d44 57 return lex->chr0 == MP_LEXER_EOF;
pythontech 0:5868e8752d44 58 }
pythontech 0:5868e8752d44 59
pythontech 0:5868e8752d44 60 STATIC bool is_physical_newline(mp_lexer_t *lex) {
pythontech 0:5868e8752d44 61 return lex->chr0 == '\n';
pythontech 0:5868e8752d44 62 }
pythontech 0:5868e8752d44 63
pythontech 0:5868e8752d44 64 STATIC bool is_char(mp_lexer_t *lex, byte c) {
pythontech 0:5868e8752d44 65 return lex->chr0 == c;
pythontech 0:5868e8752d44 66 }
pythontech 0:5868e8752d44 67
pythontech 0:5868e8752d44 68 STATIC bool is_char_or(mp_lexer_t *lex, byte c1, byte c2) {
pythontech 0:5868e8752d44 69 return lex->chr0 == c1 || lex->chr0 == c2;
pythontech 0:5868e8752d44 70 }
pythontech 0:5868e8752d44 71
pythontech 0:5868e8752d44 72 STATIC bool is_char_or3(mp_lexer_t *lex, byte c1, byte c2, byte c3) {
pythontech 0:5868e8752d44 73 return lex->chr0 == c1 || lex->chr0 == c2 || lex->chr0 == c3;
pythontech 0:5868e8752d44 74 }
pythontech 0:5868e8752d44 75
pythontech 0:5868e8752d44 76 /*
pythontech 0:5868e8752d44 77 STATIC bool is_char_following(mp_lexer_t *lex, byte c) {
pythontech 0:5868e8752d44 78 return lex->chr1 == c;
pythontech 0:5868e8752d44 79 }
pythontech 0:5868e8752d44 80 */
pythontech 0:5868e8752d44 81
pythontech 0:5868e8752d44 82 STATIC bool is_char_following_or(mp_lexer_t *lex, byte c1, byte c2) {
pythontech 0:5868e8752d44 83 return lex->chr1 == c1 || lex->chr1 == c2;
pythontech 0:5868e8752d44 84 }
pythontech 0:5868e8752d44 85
pythontech 0:5868e8752d44 86 STATIC bool is_char_following_following_or(mp_lexer_t *lex, byte c1, byte c2) {
pythontech 0:5868e8752d44 87 return lex->chr2 == c1 || lex->chr2 == c2;
pythontech 0:5868e8752d44 88 }
pythontech 0:5868e8752d44 89
pythontech 0:5868e8752d44 90 STATIC bool is_char_and(mp_lexer_t *lex, byte c1, byte c2) {
pythontech 0:5868e8752d44 91 return lex->chr0 == c1 && lex->chr1 == c2;
pythontech 0:5868e8752d44 92 }
pythontech 0:5868e8752d44 93
pythontech 0:5868e8752d44 94 STATIC bool is_whitespace(mp_lexer_t *lex) {
pythontech 0:5868e8752d44 95 return unichar_isspace(lex->chr0);
pythontech 0:5868e8752d44 96 }
pythontech 0:5868e8752d44 97
pythontech 0:5868e8752d44 98 STATIC bool is_letter(mp_lexer_t *lex) {
pythontech 0:5868e8752d44 99 return unichar_isalpha(lex->chr0);
pythontech 0:5868e8752d44 100 }
pythontech 0:5868e8752d44 101
pythontech 0:5868e8752d44 102 STATIC bool is_digit(mp_lexer_t *lex) {
pythontech 0:5868e8752d44 103 return unichar_isdigit(lex->chr0);
pythontech 0:5868e8752d44 104 }
pythontech 0:5868e8752d44 105
pythontech 0:5868e8752d44 106 STATIC bool is_following_digit(mp_lexer_t *lex) {
pythontech 0:5868e8752d44 107 return unichar_isdigit(lex->chr1);
pythontech 0:5868e8752d44 108 }
pythontech 0:5868e8752d44 109
pythontech 0:5868e8752d44 110 STATIC bool is_following_base_char(mp_lexer_t *lex) {
pythontech 0:5868e8752d44 111 const unichar chr1 = lex->chr1 | 0x20;
pythontech 0:5868e8752d44 112 return chr1 == 'b' || chr1 == 'o' || chr1 == 'x';
pythontech 0:5868e8752d44 113 }
pythontech 0:5868e8752d44 114
pythontech 0:5868e8752d44 115 STATIC bool is_following_odigit(mp_lexer_t *lex) {
pythontech 0:5868e8752d44 116 return lex->chr1 >= '0' && lex->chr1 <= '7';
pythontech 0:5868e8752d44 117 }
pythontech 0:5868e8752d44 118
pythontech 0:5868e8752d44 119 // to easily parse utf-8 identifiers we allow any raw byte with high bit set
pythontech 0:5868e8752d44 120 STATIC bool is_head_of_identifier(mp_lexer_t *lex) {
pythontech 0:5868e8752d44 121 return is_letter(lex) || lex->chr0 == '_' || lex->chr0 >= 0x80;
pythontech 0:5868e8752d44 122 }
pythontech 0:5868e8752d44 123
pythontech 0:5868e8752d44 124 STATIC bool is_tail_of_identifier(mp_lexer_t *lex) {
pythontech 0:5868e8752d44 125 return is_head_of_identifier(lex) || is_digit(lex);
pythontech 0:5868e8752d44 126 }
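
Taken together, the two predicates above accept an identifier as one head character followed by any number of tail characters, and they let every byte of 0x80 or above through so that UTF-8 names pass unchanged. A minimal sketch of that rule applied to a plain C string, assuming ASCII classification from <ctype.h> in the default C locale. The names identifier_length, is_ident_head and is_ident_tail are hypothetical and are not part of lexer.c:

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for is_head_of_identifier and is_tail_of_identifier
 * that inspect a single byte instead of the lexer state. */
static bool is_ident_head(unsigned char c) {
    /* letters, underscore, or any raw byte with the high bit set */
    return isalpha(c) || c == '_' || c >= 0x80;
}

static bool is_ident_tail(unsigned char c) {
    return is_ident_head(c) || isdigit(c);
}

/* Returns the number of leading bytes of s that form an identifier,
 * or 0 if s does not start with one. */
static size_t identifier_length(const char *s) {
    assert(s != NULL);
    if (!is_ident_head((unsigned char)s[0])) {
        return 0;
    }
    size_t n = 1;
    while (s[n] != '\0' && is_ident_tail((unsigned char)s[n])) {
        ++n;
    }
    return n;
}
```

Under these assumptions identifier_length("foo_1+2") stops at the '+' and a string starting with a digit yields 0. The count is in bytes, not characters, which mirrors the byte-wise approach of the listing.
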
pythontech 0:5868e8752d44 127
pythontech 0:5868e8752d44 128 STATIC void next_char(mp_lexer_t *lex) {
pythontech 0:5868e8752d44 129 if (lex->chr0 == MP_LEXER_EOF) {
pythontech 0:5868e8752d44 130 return;
pythontech 0:5868e8752d44 131 }
pythontech 0:5868e8752d44 132
pythontech 0:5868e8752d44 133 if (lex->chr0 == '\n') {
pythontech 0:5868e8752d44 134 // a new line
pythontech 0:5868e8752d44 135 ++lex->line;
pythontech 0:5868e8752d44 136 lex->column = 1;
pythontech 0:5868e8752d44 137 } else if (lex->chr0 == '\t') {
pythontech 0:5868e8752d44 138 // a tab
pythontech 0:5868e8752d44 139 lex->column = (((lex->column - 1 + TAB_SIZE) / TAB_SIZE) * TAB_SIZE) + 1;
pythontech 0:5868e8752d44 140 } else {
pythontech 0:5868e8752d44 141 // a character worth one column
pythontech 0:5868e8752d44 142 ++lex->column;
pythontech 0:5868e8752d44 143 }
pythontech 0:5868e8752d44 144
pythontech 0:5868e8752d44 145 lex->chr0 = lex->chr1;
pythontech 0:5868e8752d44 146 lex->chr1 = lex->chr2;
pythontech 0:5868e8752d44 147 lex->chr2 = lex->stream_next_byte(lex->stream_data);
pythontech 0:5868e8752d44 148
pythontech 0:5868e8752d44 149 if (lex->chr0 == '\r') {
pythontech 0:5868e8752d44 150 // CR is a new line, converted to LF
pythontech 0:5868e8752d44 151 lex->chr0 = '\n';
pythontech 0:5868e8752d44 152 if (lex->chr1 == '\n') {
pythontech 0:5868e8752d44 153 // CR LF is a single new line
pythontech 0:5868e8752d44 154 lex->chr1 = lex->chr2;
pythontech 0:5868e8752d44 155 lex->chr2 = lex->stream_next_byte(lex->stream_data);
pythontech 0:5868e8752d44 156 }
pythontech 0:5868e8752d44 157 }
pythontech 0:5868e8752d44 158
pythontech 0:5868e8752d44 159 if (lex->chr2 == MP_LEXER_EOF) {
pythontech 0:5868e8752d44 160 // EOF, check if we need to insert a newline at end of file
pythontech 0:5868e8752d44 161 if (lex->chr1 != MP_LEXER_EOF && lex->chr1 != '\n') {
pythontech 0:5868e8752d44 162 // if lex->chr1 == '\r' then this makes a CR LF which will be converted to LF above
pythontech 0:5868e8752d44 163 // otherwise it just inserts a LF
pythontech 0:5868e8752d44 164 lex->chr2 = '\n';
pythontech 0:5868e8752d44 165 }
pythontech 0:5868e8752d44 166 }
pythontech 0:5868e8752d44 167 }
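
Two pieces of next_char may be easier to follow in isolation: the tab branch, which moves the column to the next tab stop, and the newline handling, which turns CR and CR LF into a single LF and makes sure the input ends with a newline. The following is a rough standalone sketch, assuming 1-based columns and a tab size of 8 as in the listing. It works on a whole NUL-terminated buffer instead of the three-character lookahead, and next_tab_column and normalize_newlines are hypothetical names, not functions from lexer.c:

```c
#include <assert.h>
#include <stddef.h>

#define TAB_SIZE (8)

/* Column reached after a tab typed at the given 1-based column. */
static unsigned next_tab_column(unsigned column) {
    assert(column >= 1);
    return (((column - 1 + TAB_SIZE) / TAB_SIZE) * TAB_SIZE) + 1;
}

/* Copies in to out with CR and CR LF converted to LF, appending a final
 * LF when non-empty input does not already end with a newline.
 * out must have room for strlen(in) + 2 bytes. Returns the output length. */
static size_t normalize_newlines(const char *in, char *out) {
    assert(in != NULL && out != NULL);
    size_t n = 0;
    for (size_t i = 0; in[i] != '\0'; ++i) {
        if (in[i] == '\r') {
            out[n++] = '\n';
            if (in[i + 1] == '\n') {
                ++i; /* CR LF is a single new line */
            }
        } else {
            out[n++] = in[i];
        }
    }
    if (n > 0 && out[n - 1] != '\n') {
        out[n++] = '\n'; /* newline inserted at end of input */
    }
    out[n] = '\0';
    return n;
}
```

With these definitions a tab at column 1, 5 or 8 lands on column 9, and a tab at column 9 lands on column 17. The buffer-based form is a simplification chosen for the example; the listing does the same work one character at a time as the stream is read.
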
pythontech 0:5868e8752d44 168
pythontech 0:5868e8752d44 169 STATIC void indent_push(mp_lexer_t *lex, mp_uint_t indent) {
pythontech 0:5868e8752d44 170 if (lex->num_indent_level >= lex->alloc_indent_level) {
pythontech 0:5868e8752d44 171 // TODO use m_renew_maybe and somehow indicate an error if it fails... probably by using MP_TOKEN_MEMORY_ERROR
pythontech 0:5868e8752d44 172 lex->indent_level = m_renew(uint16_t, lex->indent_level, lex->alloc_indent_level, lex->alloc_indent_level + MICROPY_ALLOC_LEXEL_INDENT_INC);
pythontech 0:5868e8752d44 173 lex->alloc_indent_level += MICROPY_ALLOC_LEXEL_INDENT_INC;
pythontech 0:5868e8752d44 174 }
pythontech 0:5868e8752d44 175 lex->indent_level[lex->num_indent_level++] = indent;
pythontech 0:5868e8752d44 176 }
pythontech 0:5868e8752d44 177
pythontech 0:5868e8752d44 178 STATIC mp_uint_t indent_top(mp_lexer_t *lex) {
pythontech 0:5868e8752d44 179 return lex->indent_level[lex->num_indent_level - 1];
pythontech 0:5868e8752d44 180 }
pythontech 0:5868e8752d44 181
pythontech 0:5868e8752d44 182 STATIC void indent_pop(mp_lexer_t *lex) {
pythontech 0:5868e8752d44 183 lex->num_indent_level -= 1;
pythontech 0:5868e8752d44 184 }
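
The three helpers above maintain a stack of indentation widths, which the tokenizer later compares with the width of each new logical line to decide how many INDENT or DEDENT tokens to emit. A minimal sketch of that comparison follows, with two assumptions that the listing does not confirm: the stack has a fixed capacity instead of growing through m_renew, and it starts with a single level of 0. The names indent_stack_t, indent_init and indent_update are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_INDENT_LEVELS 16 /* assumption: fixed capacity for the sketch */

typedef struct {
    unsigned level[MAX_INDENT_LEVELS];
    size_t count;
} indent_stack_t;

/* Assumption: the stack starts with one level of width 0. */
static void indent_init(indent_stack_t *st) {
    st->level[0] = 0;
    st->count = 1;
}

/* Compares num_spaces with the top of the stack. On success *emit_dent is
 * +1 for one INDENT, 0 for no change, or -k for k DEDENTs. Returns false
 * when a dedent matches no enclosing level or the stack is full. */
static bool indent_update(indent_stack_t *st, unsigned num_spaces, int *emit_dent) {
    assert(st != NULL && emit_dent != NULL && st->count >= 1);
    *emit_dent = 0;
    if (num_spaces > st->level[st->count - 1]) {
        if (st->count == MAX_INDENT_LEVELS) {
            return false;
        }
        st->level[st->count++] = num_spaces;
        *emit_dent = 1;
        return true;
    }
    /* level[0] is 0, so this loop always stops before the stack empties */
    while (num_spaces < st->level[st->count - 1]) {
        --st->count;
        *emit_dent -= 1;
    }
    return num_spaces == st->level[st->count - 1];
}
```

For widths 4, 8, 8, 0 in sequence the sketch reports +1, +1, 0 and -2. Dropping from 4 to 2 returns false, which corresponds to the dedent mismatch case.
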
pythontech 0:5868e8752d44 185
pythontech 0:5868e8752d44 186 // some tricky operator encoding:
pythontech 0:5868e8752d44 187 // <op> = begin with <op>, if this opchar matches then begin here
pythontech 0:5868e8752d44 188 // e<op> = end with <op>, if this opchar matches then end
pythontech 0:5868e8752d44 189 // E<op> = mandatory end with <op>, this opchar must match, then end
pythontech 0:5868e8752d44 190 // c<op> = continue with <op>, if this opchar matches then continue matching
pythontech 0:5868e8752d44 191 // this means if the start of two ops are the same then they are equal til the last char
pythontech 0:5868e8752d44 192
pythontech 0:5868e8752d44 193 STATIC const char *tok_enc =
pythontech 0:5868e8752d44 194 "()[]{},:;@~" // singles
pythontech 0:5868e8752d44 195 "<e=c<e=" // < <= << <<=
pythontech 0:5868e8752d44 196 ">e=c>e=" // > >= >> >>=
pythontech 0:5868e8752d44 197 "*e=c*e=" // * *= ** **=
pythontech 0:5868e8752d44 198 "+e=" // + +=
pythontech 0:5868e8752d44 199 "-e=e>" // - -= ->
pythontech 0:5868e8752d44 200 "&e=" // & &=
pythontech 0:5868e8752d44 201 "|e=" // | |=
pythontech 0:5868e8752d44 202 "/e=c/e=" // / /= // //=
pythontech 0:5868e8752d44 203 "%e=" // % %=
pythontech 0:5868e8752d44 204 "^e=" // ^ ^=
pythontech 0:5868e8752d44 205 "=e=" // = ==
pythontech 0:5868e8752d44 206 "!E="; // !=
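
The encoding described in the comment above is compact but not obvious to read. The sketch below is one possible way to walk the table, written for illustration only. It is not the decoder used by lexer.c, which operates on the lexer state and is not reproduced here. The function match_operator is a hypothetical name; it returns the position that the matched operator would have in a parallel table such as tok_enc_kind, or -1 when nothing matches:

```c
#include <assert.h>
#include <stddef.h>

static const char *const tok_enc =
    "()[]{},:;@~" "<e=c<e=" ">e=c>e=" "*e=c*e=" "+e=" "-e=e>"
    "&e=" "|e=" "/e=c/e=" "%e=" "^e=" "=e=" "!E=";

/* Hypothetical decoder: matches the longest operator at the start of s.
 * Stores the number of characters consumed in *len (0 on failure). */
static int match_operator(const char *s, size_t *len) {
    assert(s != NULL && len != NULL);
    const char *t = tok_enc;
    int index = 0;

    /* Phase 1: find the entry whose first character is s[0]. Every plain
     * character, 'e' pair and 'c' pair stands for one table entry; an 'E'
     * pair shares the entry of the character before it. */
    while (*t != '\0') {
        if (*t == 'e' || *t == 'c') {
            t += 2;
            index += 1;
        } else if (*t == 'E') {
            t += 2;
        } else if (*t == s[0]) {
            break;
        } else {
            t += 1;
            index += 1;
        }
    }
    *len = 0;
    if (*t == '\0') {
        return -1;
    }

    /* Phase 2: extend the match as far as the encoding allows. */
    t += 1;
    *len = 1;
    int t_index = index;
    for (;;) {
        while (*t == 'e') { /* optional final character */
            t_index += 1;
            if (s[*len] == t[1]) {
                *len += 1;
                return t_index;
            }
            t += 2;
        }
        if (*t == 'E') { /* mandatory final character */
            if (s[*len] == t[1]) {
                *len += 1;
                return index;
            }
            *len = 0;
            return -1;
        }
        if (*t == 'c' && s[*len] == t[1]) { /* continue with a longer form */
            t_index += 1;
            index = t_index;
            *len += 1;
            t += 2;
        } else {
            return index;
        }
    }
}
```

Counting entries in the order of tok_enc_kind, "<" is entry 11, "**=" is entry 22, "->" is entry 27 and "!=" is entry 42. A lone "!" has no entry of its own, so the sketch rejects it.
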
pythontech 0:5868e8752d44 207
pythontech 0:5868e8752d44 208 // TODO static assert that number of tokens is less than 256 so we can safely make this table with byte sized entries
pythontech 0:5868e8752d44 209 STATIC const uint8_t tok_enc_kind[] = {
pythontech 0:5868e8752d44 210 MP_TOKEN_DEL_PAREN_OPEN, MP_TOKEN_DEL_PAREN_CLOSE,
pythontech 0:5868e8752d44 211 MP_TOKEN_DEL_BRACKET_OPEN, MP_TOKEN_DEL_BRACKET_CLOSE,
pythontech 0:5868e8752d44 212 MP_TOKEN_DEL_BRACE_OPEN, MP_TOKEN_DEL_BRACE_CLOSE,
pythontech 0:5868e8752d44 213 MP_TOKEN_DEL_COMMA, MP_TOKEN_DEL_COLON, MP_TOKEN_DEL_SEMICOLON, MP_TOKEN_DEL_AT, MP_TOKEN_OP_TILDE,
pythontech 0:5868e8752d44 214
pythontech 0:5868e8752d44 215 MP_TOKEN_OP_LESS, MP_TOKEN_OP_LESS_EQUAL, MP_TOKEN_OP_DBL_LESS, MP_TOKEN_DEL_DBL_LESS_EQUAL,
pythontech 0:5868e8752d44 216 MP_TOKEN_OP_MORE, MP_TOKEN_OP_MORE_EQUAL, MP_TOKEN_OP_DBL_MORE, MP_TOKEN_DEL_DBL_MORE_EQUAL,
pythontech 0:5868e8752d44 217 MP_TOKEN_OP_STAR, MP_TOKEN_DEL_STAR_EQUAL, MP_TOKEN_OP_DBL_STAR, MP_TOKEN_DEL_DBL_STAR_EQUAL,
pythontech 0:5868e8752d44 218 MP_TOKEN_OP_PLUS, MP_TOKEN_DEL_PLUS_EQUAL,
pythontech 0:5868e8752d44 219 MP_TOKEN_OP_MINUS, MP_TOKEN_DEL_MINUS_EQUAL, MP_TOKEN_DEL_MINUS_MORE,
pythontech 0:5868e8752d44 220 MP_TOKEN_OP_AMPERSAND, MP_TOKEN_DEL_AMPERSAND_EQUAL,
pythontech 0:5868e8752d44 221 MP_TOKEN_OP_PIPE, MP_TOKEN_DEL_PIPE_EQUAL,
pythontech 0:5868e8752d44 222 MP_TOKEN_OP_SLASH, MP_TOKEN_DEL_SLASH_EQUAL, MP_TOKEN_OP_DBL_SLASH, MP_TOKEN_DEL_DBL_SLASH_EQUAL,
pythontech 0:5868e8752d44 223 MP_TOKEN_OP_PERCENT, MP_TOKEN_DEL_PERCENT_EQUAL,
pythontech 0:5868e8752d44 224 MP_TOKEN_OP_CARET, MP_TOKEN_DEL_CARET_EQUAL,
pythontech 0:5868e8752d44 225 MP_TOKEN_DEL_EQUAL, MP_TOKEN_OP_DBL_EQUAL,
pythontech 0:5868e8752d44 226 MP_TOKEN_OP_NOT_EQUAL,
pythontech 0:5868e8752d44 227 };
pythontech 0:5868e8752d44 228
pythontech 0:5868e8752d44 229 // must have the same order as enum in lexer.h
pythontech 0:5868e8752d44 230 STATIC const char *tok_kw[] = {
pythontech 0:5868e8752d44 231 "False",
pythontech 0:5868e8752d44 232 "None",
pythontech 0:5868e8752d44 233 "True",
pythontech 0:5868e8752d44 234 "and",
pythontech 0:5868e8752d44 235 "as",
pythontech 0:5868e8752d44 236 "assert",
Colin Hogben 2:c89e95946844 237 #if MICROPY_PY_ASYNC_AWAIT
Colin Hogben 2:c89e95946844 238 "async",
Colin Hogben 2:c89e95946844 239 "await",
Colin Hogben 2:c89e95946844 240 #endif
pythontech 0:5868e8752d44 241 "break",
pythontech 0:5868e8752d44 242 "class",
pythontech 0:5868e8752d44 243 "continue",
pythontech 0:5868e8752d44 244 "def",
pythontech 0:5868e8752d44 245 "del",
pythontech 0:5868e8752d44 246 "elif",
pythontech 0:5868e8752d44 247 "else",
pythontech 0:5868e8752d44 248 "except",
pythontech 0:5868e8752d44 249 "finally",
pythontech 0:5868e8752d44 250 "for",
pythontech 0:5868e8752d44 251 "from",
pythontech 0:5868e8752d44 252 "global",
pythontech 0:5868e8752d44 253 "if",
pythontech 0:5868e8752d44 254 "import",
pythontech 0:5868e8752d44 255 "in",
pythontech 0:5868e8752d44 256 "is",
pythontech 0:5868e8752d44 257 "lambda",
pythontech 0:5868e8752d44 258 "nonlocal",
pythontech 0:5868e8752d44 259 "not",
pythontech 0:5868e8752d44 260 "or",
pythontech 0:5868e8752d44 261 "pass",
pythontech 0:5868e8752d44 262 "raise",
pythontech 0:5868e8752d44 263 "return",
pythontech 0:5868e8752d44 264 "try",
pythontech 0:5868e8752d44 265 "while",
pythontech 0:5868e8752d44 266 "with",
pythontech 0:5868e8752d44 267 "yield",
pythontech 0:5868e8752d44 268 "__debug__",
pythontech 0:5868e8752d44 269 };
pythontech 0:5868e8752d44 270
pythontech 0:5868e8752d44 271 // This is called with CUR_CHAR() before first hex digit, and should return with
pythontech 0:5868e8752d44 272 // it pointing to last hex digit
pythontech 0:5868e8752d44 273 // num_digits must be greater than zero
pythontech 0:5868e8752d44 274 STATIC bool get_hex(mp_lexer_t *lex, mp_uint_t num_digits, mp_uint_t *result) {
pythontech 0:5868e8752d44 275 mp_uint_t num = 0;
pythontech 0:5868e8752d44 276 while (num_digits-- != 0) {
pythontech 0:5868e8752d44 277 next_char(lex);
pythontech 0:5868e8752d44 278 unichar c = CUR_CHAR(lex);
pythontech 0:5868e8752d44 279 if (!unichar_isxdigit(c)) {
pythontech 0:5868e8752d44 280 return false;
pythontech 0:5868e8752d44 281 }
pythontech 0:5868e8752d44 282 num = (num << 4) + unichar_xdigit_value(c);
pythontech 0:5868e8752d44 283 }
pythontech 0:5868e8752d44 284 *result = num;
pythontech 0:5868e8752d44 285 return true;
pythontech 0:5868e8752d44 286 }
pythontech 0:5868e8752d44 287
pythontech 0:5868e8752d44 288 STATIC void mp_lexer_next_token_into(mp_lexer_t *lex, bool first_token) {
pythontech 0:5868e8752d44 289 // start new token text
pythontech 0:5868e8752d44 290 vstr_reset(&lex->vstr);
pythontech 0:5868e8752d44 291
pythontech 0:5868e8752d44 292 // skip white space and comments
pythontech 0:5868e8752d44 293 bool had_physical_newline = false;
pythontech 0:5868e8752d44 294 while (!is_end(lex)) {
pythontech 0:5868e8752d44 295 if (is_physical_newline(lex)) {
pythontech 0:5868e8752d44 296 had_physical_newline = true;
pythontech 0:5868e8752d44 297 next_char(lex);
pythontech 0:5868e8752d44 298 } else if (is_whitespace(lex)) {
pythontech 0:5868e8752d44 299 next_char(lex);
pythontech 0:5868e8752d44 300 } else if (is_char(lex, '#')) {
pythontech 0:5868e8752d44 301 next_char(lex);
pythontech 0:5868e8752d44 302 while (!is_end(lex) && !is_physical_newline(lex)) {
pythontech 0:5868e8752d44 303 next_char(lex);
pythontech 0:5868e8752d44 304 }
pythontech 0:5868e8752d44 305 // had_physical_newline will be set on next loop
pythontech 0:5868e8752d44 306 } else if (is_char(lex, '\\')) {
pythontech 0:5868e8752d44 307 // backslash (outside string literals) must appear just before a physical newline
pythontech 0:5868e8752d44 308 next_char(lex);
pythontech 0:5868e8752d44 309 if (!is_physical_newline(lex)) {
pythontech 0:5868e8752d44 310 // SyntaxError: unexpected character after line continuation character
pythontech 0:5868e8752d44 311 lex->tok_line = lex->line;
pythontech 0:5868e8752d44 312 lex->tok_column = lex->column;
pythontech 0:5868e8752d44 313 lex->tok_kind = MP_TOKEN_BAD_LINE_CONTINUATION;
pythontech 0:5868e8752d44 314 return;
pythontech 0:5868e8752d44 315 } else {
pythontech 0:5868e8752d44 316 next_char(lex);
pythontech 0:5868e8752d44 317 }
pythontech 0:5868e8752d44 318 } else {
pythontech 0:5868e8752d44 319 break;
pythontech 0:5868e8752d44 320 }
pythontech 0:5868e8752d44 321 }
pythontech 0:5868e8752d44 322
pythontech 0:5868e8752d44 323 // set token source information
pythontech 0:5868e8752d44 324 lex->tok_line = lex->line;
pythontech 0:5868e8752d44 325 lex->tok_column = lex->column;
pythontech 0:5868e8752d44 326
pythontech 0:5868e8752d44 327 if (first_token && lex->line == 1 && lex->column != 1) {
pythontech 0:5868e8752d44 328 // check that the first token is in the first column
pythontech 0:5868e8752d44 329 // if first token is not on first line, we get a physical newline and
pythontech 0:5868e8752d44 330 // this check is done as part of normal indent/dedent checking below
pythontech 0:5868e8752d44 331 // (done to get equivalence with CPython)
pythontech 0:5868e8752d44 332 lex->tok_kind = MP_TOKEN_INDENT;
pythontech 0:5868e8752d44 333
pythontech 0:5868e8752d44 334 } else if (lex->emit_dent < 0) {
pythontech 0:5868e8752d44 335 lex->tok_kind = MP_TOKEN_DEDENT;
pythontech 0:5868e8752d44 336 lex->emit_dent += 1;
pythontech 0:5868e8752d44 337
pythontech 0:5868e8752d44 338 } else if (lex->emit_dent > 0) {
pythontech 0:5868e8752d44 339 lex->tok_kind = MP_TOKEN_INDENT;
pythontech 0:5868e8752d44 340 lex->emit_dent -= 1;
pythontech 0:5868e8752d44 341
pythontech 0:5868e8752d44 342 } else if (had_physical_newline && lex->nested_bracket_level == 0) {
pythontech 0:5868e8752d44 343 lex->tok_kind = MP_TOKEN_NEWLINE;
pythontech 0:5868e8752d44 344
pythontech 0:5868e8752d44 345 mp_uint_t num_spaces = lex->column - 1;
pythontech 0:5868e8752d44 346 lex->emit_dent = 0;
pythontech 0:5868e8752d44 347 if (num_spaces == indent_top(lex)) {
pythontech 0:5868e8752d44 348 } else if (num_spaces > indent_top(lex)) {
pythontech 0:5868e8752d44 349 indent_push(lex, num_spaces);
pythontech 0:5868e8752d44 350 lex->emit_dent += 1;
pythontech 0:5868e8752d44 351 } else {
pythontech 0:5868e8752d44 352 while (num_spaces < indent_top(lex)) {
pythontech 0:5868e8752d44 353 indent_pop(lex);
pythontech 0:5868e8752d44 354 lex->emit_dent -= 1;
pythontech 0:5868e8752d44 355 }
pythontech 0:5868e8752d44 356 if (num_spaces != indent_top(lex)) {
pythontech 0:5868e8752d44 357 lex->tok_kind = MP_TOKEN_DEDENT_MISMATCH;
pythontech 0:5868e8752d44 358 }
pythontech 0:5868e8752d44 359 }
pythontech 0:5868e8752d44 360
pythontech 0:5868e8752d44 361 } else if (is_end(lex)) {
pythontech 0:5868e8752d44 362 if (indent_top(lex) > 0) {
pythontech 0:5868e8752d44 363 lex->tok_kind = MP_TOKEN_NEWLINE;
pythontech 0:5868e8752d44 364 lex->emit_dent = 0;
pythontech 0:5868e8752d44 365 while (indent_top(lex) > 0) {
pythontech 0:5868e8752d44 366 indent_pop(lex);
pythontech 0:5868e8752d44 367 lex->emit_dent -= 1;
pythontech 0:5868e8752d44 368 }
pythontech 0:5868e8752d44 369 } else {
pythontech 0:5868e8752d44 370 lex->tok_kind = MP_TOKEN_END;
pythontech 0:5868e8752d44 371 }
pythontech 0:5868e8752d44 372
pythontech 0:5868e8752d44 373 } else if (is_char_or(lex, '\'', '\"')
pythontech 0:5868e8752d44 374 || (is_char_or3(lex, 'r', 'u', 'b') && is_char_following_or(lex, '\'', '\"'))
pythontech 0:5868e8752d44 375 || ((is_char_and(lex, 'r', 'b') || is_char_and(lex, 'b', 'r')) && is_char_following_following_or(lex, '\'', '\"'))) {
pythontech 0:5868e8752d44 376 // a string or bytes literal
pythontech 0:5868e8752d44 377
pythontech 0:5868e8752d44 378 // parse type codes
pythontech 0:5868e8752d44 379 bool is_raw = false;
pythontech 0:5868e8752d44 380 bool is_bytes = false;
pythontech 0:5868e8752d44 381 if (is_char(lex, 'u')) {
pythontech 0:5868e8752d44 382 next_char(lex);
pythontech 0:5868e8752d44 383 } else if (is_char(lex, 'b')) {
pythontech 0:5868e8752d44 384 is_bytes = true;
pythontech 0:5868e8752d44 385 next_char(lex);
pythontech 0:5868e8752d44 386 if (is_char(lex, 'r')) {
pythontech 0:5868e8752d44 387 is_raw = true;
pythontech 0:5868e8752d44 388 next_char(lex);
pythontech 0:5868e8752d44 389 }
pythontech 0:5868e8752d44 390 } else if (is_char(lex, 'r')) {
pythontech 0:5868e8752d44 391 is_raw = true;
pythontech 0:5868e8752d44 392 next_char(lex);
pythontech 0:5868e8752d44 393 if (is_char(lex, 'b')) {
pythontech 0:5868e8752d44 394 is_bytes = true;
pythontech 0:5868e8752d44 395 next_char(lex);
pythontech 0:5868e8752d44 396 }
pythontech 0:5868e8752d44 397 }
pythontech 0:5868e8752d44 398
pythontech 0:5868e8752d44 399 // set token kind
pythontech 0:5868e8752d44 400 if (is_bytes) {
pythontech 0:5868e8752d44 401 lex->tok_kind = MP_TOKEN_BYTES;
pythontech 0:5868e8752d44 402 } else {
pythontech 0:5868e8752d44 403 lex->tok_kind = MP_TOKEN_STRING;
pythontech 0:5868e8752d44 404 }
pythontech 0:5868e8752d44 405
pythontech 0:5868e8752d44 406 // get first quoting character
pythontech 0:5868e8752d44 407 char quote_char = '\'';
pythontech 0:5868e8752d44 408 if (is_char(lex, '\"')) {
pythontech 0:5868e8752d44 409 quote_char = '\"';
pythontech 0:5868e8752d44 410 }
pythontech 0:5868e8752d44 411 next_char(lex);
pythontech 0:5868e8752d44 412
pythontech 0:5868e8752d44 413 // work out if it's a single or triple quoted literal
pythontech 0:5868e8752d44 414 mp_uint_t num_quotes;
pythontech 0:5868e8752d44 415 if (is_char_and(lex, quote_char, quote_char)) {
pythontech 0:5868e8752d44 416 // triple quotes
pythontech 0:5868e8752d44 417 next_char(lex);
pythontech 0:5868e8752d44 418 next_char(lex);
pythontech 0:5868e8752d44 419 num_quotes = 3;
pythontech 0:5868e8752d44 420 } else {
pythontech 0:5868e8752d44 421 // single quotes
pythontech 0:5868e8752d44 422 num_quotes = 1;
pythontech 0:5868e8752d44 423 }
pythontech 0:5868e8752d44 424
pythontech 0:5868e8752d44 425 // parse the literal
pythontech 0:5868e8752d44 426 mp_uint_t n_closing = 0;
pythontech 0:5868e8752d44 427 while (!is_end(lex) && (num_quotes > 1 || !is_char(lex, '\n')) && n_closing < num_quotes) {
pythontech 0:5868e8752d44 428 if (is_char(lex, quote_char)) {
pythontech 0:5868e8752d44 429 n_closing += 1;
pythontech 0:5868e8752d44 430 vstr_add_char(&lex->vstr, CUR_CHAR(lex));
pythontech 0:5868e8752d44 431 } else {
pythontech 0:5868e8752d44 432 n_closing = 0;
pythontech 0:5868e8752d44 433 if (is_char(lex, '\\')) {
pythontech 0:5868e8752d44 434 next_char(lex);
pythontech 0:5868e8752d44 435 unichar c = CUR_CHAR(lex);
pythontech 0:5868e8752d44 436 if (is_raw) {
pythontech 0:5868e8752d44 437 // raw strings allow escaping of quotes, but the backslash is also emitted
pythontech 0:5868e8752d44 438 vstr_add_char(&lex->vstr, '\\');
pythontech 0:5868e8752d44 439 } else {
pythontech 0:5868e8752d44 440 switch (c) {
pythontech 0:5868e8752d44 441 case MP_LEXER_EOF: break; // TODO a proper error message?
pythontech 0:5868e8752d44 442 case '\n': c = MP_LEXER_EOF; break; // TODO check this works correctly (we are supposed to ignore it)
pythontech 0:5868e8752d44 443 case '\\': break;
pythontech 0:5868e8752d44 444 case '\'': break;
pythontech 0:5868e8752d44 445 case '"': break;
pythontech 0:5868e8752d44 446 case 'a': c = 0x07; break;
pythontech 0:5868e8752d44 447 case 'b': c = 0x08; break;
pythontech 0:5868e8752d44 448 case 't': c = 0x09; break;
pythontech 0:5868e8752d44 449 case 'n': c = 0x0a; break;
pythontech 0:5868e8752d44 450 case 'v': c = 0x0b; break;
pythontech 0:5868e8752d44 451 case 'f': c = 0x0c; break;
pythontech 0:5868e8752d44 452 case 'r': c = 0x0d; break;
pythontech 0:5868e8752d44 453 case 'u':
pythontech 0:5868e8752d44 454 case 'U':
pythontech 0:5868e8752d44 455 if (is_bytes) {
pythontech 0:5868e8752d44 456 // b'\u1234' == b'\\u1234'
pythontech 0:5868e8752d44 457 vstr_add_char(&lex->vstr, '\\');
pythontech 0:5868e8752d44 458 break;
pythontech 0:5868e8752d44 459 }
pythontech 0:5868e8752d44 460 // Otherwise fall through.
pythontech 0:5868e8752d44 461 case 'x':
pythontech 0:5868e8752d44 462 {
pythontech 0:5868e8752d44 463 mp_uint_t num = 0;
pythontech 0:5868e8752d44 464 if (!get_hex(lex, (c == 'x' ? 2 : c == 'u' ? 4 : 8), &num)) {
pythontech 0:5868e8752d44 465 // not enough hex chars for escape sequence
pythontech 0:5868e8752d44 466 lex->tok_kind = MP_TOKEN_INVALID;
pythontech 0:5868e8752d44 467 }
pythontech 0:5868e8752d44 468 c = num;
pythontech 0:5868e8752d44 469 break;
pythontech 0:5868e8752d44 470 }
pythontech 0:5868e8752d44 471 case 'N':
pythontech 0:5868e8752d44 472 // Supporting '\N{LATIN SMALL LETTER A}' == 'a' would require keeping the
pythontech 0:5868e8752d44 473 // entire Unicode name table in the core. As of Unicode 6.3.0, that's nearly
pythontech 0:5868e8752d44 474 // 3MB of text; even gzip-compressed and with minimal structure, it'll take
pythontech 0:5868e8752d44 475 // roughly half a meg of storage. This form of Unicode escape may be added
pythontech 0:5868e8752d44 476 // later on, but it's definitely not a priority right now. -- CJA 20140607
pythontech 0:5868e8752d44 477 mp_not_implemented("unicode name escapes");
pythontech 0:5868e8752d44 478 break;
pythontech 0:5868e8752d44 479 default:
pythontech 0:5868e8752d44 480 if (c >= '0' && c <= '7') {
pythontech 0:5868e8752d44 481 // Octal sequence, 1-3 chars
pythontech 0:5868e8752d44 482 mp_uint_t digits = 3;
pythontech 0:5868e8752d44 483 mp_uint_t num = c - '0';
pythontech 0:5868e8752d44 484 while (is_following_odigit(lex) && --digits != 0) {
pythontech 0:5868e8752d44 485 next_char(lex);
pythontech 0:5868e8752d44 486 num = num * 8 + (CUR_CHAR(lex) - '0');
pythontech 0:5868e8752d44 487 }
pythontech 0:5868e8752d44 488 c = num;
pythontech 0:5868e8752d44 489 } else {
pythontech 0:5868e8752d44 490 // unrecognised escape character; CPython lets this through verbatim as '\' and then the character
pythontech 0:5868e8752d44 491 vstr_add_char(&lex->vstr, '\\');
pythontech 0:5868e8752d44 492 }
pythontech 0:5868e8752d44 493 break;
pythontech 0:5868e8752d44 494 }
pythontech 0:5868e8752d44 495 }
pythontech 0:5868e8752d44 496 if (c != MP_LEXER_EOF) {
pythontech 0:5868e8752d44 497 if (MICROPY_PY_BUILTINS_STR_UNICODE_DYNAMIC) {
pythontech 0:5868e8752d44 498 if (c < 0x110000 && !is_bytes) {
pythontech 0:5868e8752d44 499 vstr_add_char(&lex->vstr, c);
pythontech 0:5868e8752d44 500 } else if (c < 0x100 && is_bytes) {
pythontech 0:5868e8752d44 501 vstr_add_byte(&lex->vstr, c);
pythontech 0:5868e8752d44 502 } else {
pythontech 0:5868e8752d44 503 // unicode character out of range
pythontech 0:5868e8752d44 504 // this raises a generic SyntaxError; could provide more info
pythontech 0:5868e8752d44 505 lex->tok_kind = MP_TOKEN_INVALID;
pythontech 0:5868e8752d44 506 }
pythontech 0:5868e8752d44 507 } else {
pythontech 0:5868e8752d44 508 // without unicode everything is just added as an 8-bit byte
pythontech 0:5868e8752d44 509 if (c < 0x100) {
pythontech 0:5868e8752d44 510 vstr_add_byte(&lex->vstr, c);
pythontech 0:5868e8752d44 511 } else {
pythontech 0:5868e8752d44 512 // 8-bit character out of range
pythontech 0:5868e8752d44 513 // this raises a generic SyntaxError; could provide more info
pythontech 0:5868e8752d44 514 lex->tok_kind = MP_TOKEN_INVALID;
pythontech 0:5868e8752d44 515 }
pythontech 0:5868e8752d44 516 }
pythontech 0:5868e8752d44 517 }
pythontech 0:5868e8752d44 518 } else {
pythontech 0:5868e8752d44 519 // Add the "character" as a byte so that we remain 8-bit clean.
pythontech 0:5868e8752d44 520 // This way, strings are parsed correctly whether or not they contain utf-8 chars.
pythontech 0:5868e8752d44 521 vstr_add_byte(&lex->vstr, CUR_CHAR(lex));
pythontech 0:5868e8752d44 522 }
pythontech 0:5868e8752d44 523 }
pythontech 0:5868e8752d44 524 next_char(lex);
pythontech 0:5868e8752d44 525 }
pythontech 0:5868e8752d44 526
pythontech 0:5868e8752d44 527 // check we got the required end quotes
pythontech 0:5868e8752d44 528 if (n_closing < num_quotes) {
pythontech 0:5868e8752d44 529 lex->tok_kind = MP_TOKEN_LONELY_STRING_OPEN;
pythontech 0:5868e8752d44 530 }
pythontech 0:5868e8752d44 531
pythontech 0:5868e8752d44 532 // cut off the end quotes from the token text
pythontech 0:5868e8752d44 533 vstr_cut_tail_bytes(&lex->vstr, n_closing);
pythontech 0:5868e8752d44 534
pythontech 0:5868e8752d44 535 } else if (is_head_of_identifier(lex)) {
pythontech 0:5868e8752d44 536 lex->tok_kind = MP_TOKEN_NAME;
pythontech 0:5868e8752d44 537
pythontech 0:5868e8752d44 538 // get first char (add as byte to remain 8-bit clean and support utf-8)
pythontech 0:5868e8752d44 539 vstr_add_byte(&lex->vstr, CUR_CHAR(lex));
pythontech 0:5868e8752d44 540 next_char(lex);
pythontech 0:5868e8752d44 541
pythontech 0:5868e8752d44 542 // get tail chars
pythontech 0:5868e8752d44 543 while (!is_end(lex) && is_tail_of_identifier(lex)) {
pythontech 0:5868e8752d44 544 vstr_add_byte(&lex->vstr, CUR_CHAR(lex));
pythontech 0:5868e8752d44 545 next_char(lex);
pythontech 0:5868e8752d44 546 }
pythontech 0:5868e8752d44 547
pythontech 0:5868e8752d44 548 } else if (is_digit(lex) || (is_char(lex, '.') && is_following_digit(lex))) {
pythontech 0:5868e8752d44 549 bool forced_integer = false;
pythontech 0:5868e8752d44 550 if (is_char(lex, '.')) {
pythontech 0:5868e8752d44 551 lex->tok_kind = MP_TOKEN_FLOAT_OR_IMAG;
pythontech 0:5868e8752d44 552 } else {
pythontech 0:5868e8752d44 553 lex->tok_kind = MP_TOKEN_INTEGER;
pythontech 0:5868e8752d44 554 if (is_char(lex, '0') && is_following_base_char(lex)) {
pythontech 0:5868e8752d44 555 forced_integer = true;
pythontech 0:5868e8752d44 556 }
pythontech 0:5868e8752d44 557 }
pythontech 0:5868e8752d44 558
pythontech 0:5868e8752d44 559 // get first char
pythontech 0:5868e8752d44 560 vstr_add_char(&lex->vstr, CUR_CHAR(lex));
pythontech 0:5868e8752d44 561 next_char(lex);
pythontech 0:5868e8752d44 562
pythontech 0:5868e8752d44 563 // get tail chars
pythontech 0:5868e8752d44 564 while (!is_end(lex)) {
pythontech 0:5868e8752d44 565 if (!forced_integer && is_char_or(lex, 'e', 'E')) {
pythontech 0:5868e8752d44 566 lex->tok_kind = MP_TOKEN_FLOAT_OR_IMAG;
pythontech 0:5868e8752d44 567 vstr_add_char(&lex->vstr, 'e');
pythontech 0:5868e8752d44 568 next_char(lex);
pythontech 0:5868e8752d44 569 if (is_char(lex, '+') || is_char(lex, '-')) {
pythontech 0:5868e8752d44 570 vstr_add_char(&lex->vstr, CUR_CHAR(lex));
pythontech 0:5868e8752d44 571 next_char(lex);
pythontech 0:5868e8752d44 572 }
pythontech 0:5868e8752d44 573 } else if (is_letter(lex) || is_digit(lex) || is_char(lex, '.')) {
pythontech 0:5868e8752d44 574 if (is_char_or3(lex, '.', 'j', 'J')) {
pythontech 0:5868e8752d44 575 lex->tok_kind = MP_TOKEN_FLOAT_OR_IMAG;
pythontech 0:5868e8752d44 576 }
pythontech 0:5868e8752d44 577 vstr_add_char(&lex->vstr, CUR_CHAR(lex));
pythontech 0:5868e8752d44 578 next_char(lex);
pythontech 0:5868e8752d44 579 } else {
pythontech 0:5868e8752d44 580 break;
pythontech 0:5868e8752d44 581 }
pythontech 0:5868e8752d44 582 }
pythontech 0:5868e8752d44 583
pythontech 0:5868e8752d44 584 } else if (is_char(lex, '.')) {
pythontech 0:5868e8752d44 585 // special handling for . and ... operators, because .. is not a valid operator
pythontech 0:5868e8752d44 586
pythontech 0:5868e8752d44 587 // get first char
pythontech 0:5868e8752d44 588 vstr_add_char(&lex->vstr, '.');
pythontech 0:5868e8752d44 589 next_char(lex);
pythontech 0:5868e8752d44 590
pythontech 0:5868e8752d44 591 if (is_char_and(lex, '.', '.')) {
pythontech 0:5868e8752d44 592 vstr_add_char(&lex->vstr, '.');
pythontech 0:5868e8752d44 593 vstr_add_char(&lex->vstr, '.');
pythontech 0:5868e8752d44 594 next_char(lex);
pythontech 0:5868e8752d44 595 next_char(lex);
pythontech 0:5868e8752d44 596 lex->tok_kind = MP_TOKEN_ELLIPSIS;
pythontech 0:5868e8752d44 597 } else {
pythontech 0:5868e8752d44 598 lex->tok_kind = MP_TOKEN_DEL_PERIOD;
pythontech 0:5868e8752d44 599 }
pythontech 0:5868e8752d44 600
pythontech 0:5868e8752d44 601 } else {
pythontech 0:5868e8752d44 602 // search for encoded delimiter or operator
pythontech 0:5868e8752d44 603
pythontech 0:5868e8752d44 604 const char *t = tok_enc;
pythontech 0:5868e8752d44 605 mp_uint_t tok_enc_index = 0;
pythontech 0:5868e8752d44 606 for (; *t != 0 && !is_char(lex, *t); t += 1) {
pythontech 0:5868e8752d44 607 if (*t == 'e' || *t == 'c') {
pythontech 0:5868e8752d44 608 t += 1;
pythontech 0:5868e8752d44 609 } else if (*t == 'E') {
pythontech 0:5868e8752d44 610 tok_enc_index -= 1;
pythontech 0:5868e8752d44 611 t += 1;
pythontech 0:5868e8752d44 612 }
pythontech 0:5868e8752d44 613 tok_enc_index += 1;
pythontech 0:5868e8752d44 614 }
pythontech 0:5868e8752d44 615
pythontech 0:5868e8752d44 616 next_char(lex);
pythontech 0:5868e8752d44 617
pythontech 0:5868e8752d44 618 if (*t == 0) {
pythontech 0:5868e8752d44 619 // didn't match any delimiter or operator characters
pythontech 0:5868e8752d44 620 lex->tok_kind = MP_TOKEN_INVALID;
pythontech 0:5868e8752d44 621
pythontech 0:5868e8752d44 622 } else {
pythontech 0:5868e8752d44 623 // matched a delimiter or operator character
pythontech 0:5868e8752d44 624
pythontech 0:5868e8752d44 625 // consume as many further characters as still form a valid token (longest match)
pythontech 0:5868e8752d44 626 t += 1;
pythontech 0:5868e8752d44 627 mp_uint_t t_index = tok_enc_index;
pythontech 0:5868e8752d44 628 for (;;) {
pythontech 0:5868e8752d44 629 for (; *t == 'e'; t += 1) {
pythontech 0:5868e8752d44 630 t += 1;
pythontech 0:5868e8752d44 631 t_index += 1;
pythontech 0:5868e8752d44 632 if (is_char(lex, *t)) {
pythontech 0:5868e8752d44 633 next_char(lex);
pythontech 0:5868e8752d44 634 tok_enc_index = t_index;
pythontech 0:5868e8752d44 635 break;
pythontech 0:5868e8752d44 636 }
pythontech 0:5868e8752d44 637 }
pythontech 0:5868e8752d44 638
pythontech 0:5868e8752d44 639 if (*t == 'E') {
pythontech 0:5868e8752d44 640 t += 1;
pythontech 0:5868e8752d44 641 if (is_char(lex, *t)) {
pythontech 0:5868e8752d44 642 next_char(lex);
pythontech 0:5868e8752d44 643 tok_enc_index = t_index;
pythontech 0:5868e8752d44 644 } else {
pythontech 0:5868e8752d44 645 lex->tok_kind = MP_TOKEN_INVALID;
pythontech 0:5868e8752d44 646 goto tok_enc_no_match;
pythontech 0:5868e8752d44 647 }
pythontech 0:5868e8752d44 648 break;
pythontech 0:5868e8752d44 649 }
pythontech 0:5868e8752d44 650
pythontech 0:5868e8752d44 651 if (*t == 'c') {
pythontech 0:5868e8752d44 652 t += 1;
pythontech 0:5868e8752d44 653 t_index += 1;
pythontech 0:5868e8752d44 654 if (is_char(lex, *t)) {
pythontech 0:5868e8752d44 655 next_char(lex);
pythontech 0:5868e8752d44 656 tok_enc_index = t_index;
pythontech 0:5868e8752d44 657 t += 1;
pythontech 0:5868e8752d44 658 } else {
pythontech 0:5868e8752d44 659 break;
pythontech 0:5868e8752d44 660 }
pythontech 0:5868e8752d44 661 } else {
pythontech 0:5868e8752d44 662 break;
pythontech 0:5868e8752d44 663 }
pythontech 0:5868e8752d44 664 }
pythontech 0:5868e8752d44 665
pythontech 0:5868e8752d44 666 // set token kind
pythontech 0:5868e8752d44 667 lex->tok_kind = tok_enc_kind[tok_enc_index];
pythontech 0:5868e8752d44 668
pythontech 0:5868e8752d44 669 tok_enc_no_match:
pythontech 0:5868e8752d44 670
pythontech 0:5868e8752d44 671 // compute bracket level for implicit line joining
pythontech 0:5868e8752d44 672 if (lex->tok_kind == MP_TOKEN_DEL_PAREN_OPEN || lex->tok_kind == MP_TOKEN_DEL_BRACKET_OPEN || lex->tok_kind == MP_TOKEN_DEL_BRACE_OPEN) {
pythontech 0:5868e8752d44 673 lex->nested_bracket_level += 1;
pythontech 0:5868e8752d44 674 } else if (lex->tok_kind == MP_TOKEN_DEL_PAREN_CLOSE || lex->tok_kind == MP_TOKEN_DEL_BRACKET_CLOSE || lex->tok_kind == MP_TOKEN_DEL_BRACE_CLOSE) {
pythontech 0:5868e8752d44 675 lex->nested_bracket_level -= 1;
pythontech 0:5868e8752d44 676 }
pythontech 0:5868e8752d44 677 }
pythontech 0:5868e8752d44 678 }
pythontech 0:5868e8752d44 679
pythontech 0:5868e8752d44 680 // check for keywords
pythontech 0:5868e8752d44 681 if (lex->tok_kind == MP_TOKEN_NAME) {
pythontech 0:5868e8752d44 682 // We check for __debug__ here and convert it to its value. This is so
pythontech 0:5868e8752d44 683 // the parser gives a syntax error on, e.g., x.__debug__. Otherwise, we
pythontech 0:5868e8752d44 684 // need to check for this special token in many places in the compiler.
pythontech 0:5868e8752d44 685 // TODO improve speed of these string comparisons
pythontech 0:5868e8752d44 686 //for (mp_int_t i = 0; tok_kw[i] != NULL; i++) {
pythontech 0:5868e8752d44 687 for (size_t i = 0; i < MP_ARRAY_SIZE(tok_kw); i++) {
pythontech 0:5868e8752d44 688 if (str_strn_equal(tok_kw[i], lex->vstr.buf, lex->vstr.len)) {
pythontech 0:5868e8752d44 689 if (i == MP_ARRAY_SIZE(tok_kw) - 1) {
pythontech 0:5868e8752d44 690 // tok_kw[MP_ARRAY_SIZE(tok_kw) - 1] == "__debug__"
pythontech 0:5868e8752d44 691 lex->tok_kind = (MP_STATE_VM(mp_optimise_value) == 0 ? MP_TOKEN_KW_TRUE : MP_TOKEN_KW_FALSE);
pythontech 0:5868e8752d44 692 } else {
pythontech 0:5868e8752d44 693 lex->tok_kind = MP_TOKEN_KW_FALSE + i;
pythontech 0:5868e8752d44 694 }
pythontech 0:5868e8752d44 695 break;
pythontech 0:5868e8752d44 696 }
pythontech 0:5868e8752d44 697 }
pythontech 0:5868e8752d44 698 }
pythontech 0:5868e8752d44 699 }
pythontech 0:5868e8752d44 700
pythontech 0:5868e8752d44 701 mp_lexer_t *mp_lexer_new(qstr src_name, void *stream_data, mp_lexer_stream_next_byte_t stream_next_byte, mp_lexer_stream_close_t stream_close) {
pythontech 0:5868e8752d44 702 mp_lexer_t *lex = m_new_obj_maybe(mp_lexer_t);
pythontech 0:5868e8752d44 703
pythontech 0:5868e8752d44 704 // check for memory allocation error
pythontech 0:5868e8752d44 705 if (lex == NULL) {
pythontech 0:5868e8752d44 706 if (stream_close) {
pythontech 0:5868e8752d44 707 stream_close(stream_data);
pythontech 0:5868e8752d44 708 }
pythontech 0:5868e8752d44 709 return NULL;
pythontech 0:5868e8752d44 710 }
pythontech 0:5868e8752d44 711
pythontech 0:5868e8752d44 712 lex->source_name = src_name;
pythontech 0:5868e8752d44 713 lex->stream_data = stream_data;
pythontech 0:5868e8752d44 714 lex->stream_next_byte = stream_next_byte;
pythontech 0:5868e8752d44 715 lex->stream_close = stream_close;
pythontech 0:5868e8752d44 716 lex->line = 1;
pythontech 0:5868e8752d44 717 lex->column = 1;
pythontech 0:5868e8752d44 718 lex->emit_dent = 0;
pythontech 0:5868e8752d44 719 lex->nested_bracket_level = 0;
pythontech 0:5868e8752d44 720 lex->alloc_indent_level = MICROPY_ALLOC_LEXER_INDENT_INIT;
pythontech 0:5868e8752d44 721 lex->num_indent_level = 1;
pythontech 0:5868e8752d44 722 lex->indent_level = m_new_maybe(uint16_t, lex->alloc_indent_level);
pythontech 0:5868e8752d44 723 vstr_init(&lex->vstr, 32);
pythontech 0:5868e8752d44 724
pythontech 0:5868e8752d44 725 // check for memory allocation error
pythontech 0:5868e8752d44 726 if (lex->indent_level == NULL || vstr_had_error(&lex->vstr)) {
pythontech 0:5868e8752d44 727 mp_lexer_free(lex);
pythontech 0:5868e8752d44 728 return NULL;
pythontech 0:5868e8752d44 729 }
pythontech 0:5868e8752d44 730
pythontech 0:5868e8752d44 731 // store sentinel for first indentation level
pythontech 0:5868e8752d44 732 lex->indent_level[0] = 0;
pythontech 0:5868e8752d44 733
pythontech 0:5868e8752d44 734 // preload characters
pythontech 0:5868e8752d44 735 lex->chr0 = stream_next_byte(stream_data);
pythontech 0:5868e8752d44 736 lex->chr1 = stream_next_byte(stream_data);
pythontech 0:5868e8752d44 737 lex->chr2 = stream_next_byte(stream_data);
pythontech 0:5868e8752d44 738
pythontech 0:5868e8752d44 739 // if input stream is 0, 1 or 2 characters long and doesn't end in a newline, then insert a newline at the end
pythontech 0:5868e8752d44 740 if (lex->chr0 == MP_LEXER_EOF) {
pythontech 0:5868e8752d44 741 lex->chr0 = '\n';
pythontech 0:5868e8752d44 742 } else if (lex->chr1 == MP_LEXER_EOF) {
pythontech 0:5868e8752d44 743 if (lex->chr0 == '\r') {
pythontech 0:5868e8752d44 744 lex->chr0 = '\n';
pythontech 0:5868e8752d44 745 } else if (lex->chr0 != '\n') {
pythontech 0:5868e8752d44 746 lex->chr1 = '\n';
pythontech 0:5868e8752d44 747 }
pythontech 0:5868e8752d44 748 } else if (lex->chr2 == MP_LEXER_EOF) {
pythontech 0:5868e8752d44 749 if (lex->chr1 == '\r') {
pythontech 0:5868e8752d44 750 lex->chr1 = '\n';
pythontech 0:5868e8752d44 751 } else if (lex->chr1 != '\n') {
pythontech 0:5868e8752d44 752 lex->chr2 = '\n';
pythontech 0:5868e8752d44 753 }
pythontech 0:5868e8752d44 754 }
pythontech 0:5868e8752d44 755
pythontech 0:5868e8752d44 756 // preload first token
pythontech 0:5868e8752d44 757 mp_lexer_next_token_into(lex, true);
pythontech 0:5868e8752d44 758
pythontech 0:5868e8752d44 759 return lex;
pythontech 0:5868e8752d44 760 }
pythontech 0:5868e8752d44 761
pythontech 0:5868e8752d44 762 void mp_lexer_free(mp_lexer_t *lex) {
pythontech 0:5868e8752d44 763 if (lex) {
pythontech 0:5868e8752d44 764 if (lex->stream_close) {
pythontech 0:5868e8752d44 765 lex->stream_close(lex->stream_data);
pythontech 0:5868e8752d44 766 }
pythontech 0:5868e8752d44 767 vstr_clear(&lex->vstr);
pythontech 0:5868e8752d44 768 m_del(uint16_t, lex->indent_level, lex->alloc_indent_level);
pythontech 0:5868e8752d44 769 m_del_obj(mp_lexer_t, lex);
pythontech 0:5868e8752d44 770 }
pythontech 0:5868e8752d44 771 }
pythontech 0:5868e8752d44 772
pythontech 0:5868e8752d44 773 void mp_lexer_to_next(mp_lexer_t *lex) {
pythontech 0:5868e8752d44 774 mp_lexer_next_token_into(lex, false);
pythontech 0:5868e8752d44 775 }
pythontech 0:5868e8752d44 776
pythontech 0:5868e8752d44 777 #if MICROPY_DEBUG_PRINTERS
pythontech 0:5868e8752d44 778 void mp_lexer_show_token(const mp_lexer_t *lex) {
pythontech 0:5868e8752d44 779 printf("(" UINT_FMT ":" UINT_FMT ") kind:%u str:%p len:%zu", lex->tok_line, lex->tok_column, lex->tok_kind, lex->vstr.buf, lex->vstr.len);
pythontech 0:5868e8752d44 780 if (lex->vstr.len > 0) {
pythontech 0:5868e8752d44 781 const byte *i = (const byte *)lex->vstr.buf;
pythontech 0:5868e8752d44 782 const byte *j = (const byte *)i + lex->vstr.len;
pythontech 0:5868e8752d44 783 printf(" ");
pythontech 0:5868e8752d44 784 while (i < j) {
pythontech 0:5868e8752d44 785 unichar c = utf8_get_char(i);
pythontech 0:5868e8752d44 786 i = utf8_next_char(i);
pythontech 0:5868e8752d44 787 if (unichar_isprint(c)) {
pythontech 0:5868e8752d44 788 printf("%c", (int)c);
pythontech 0:5868e8752d44 789 } else {
pythontech 0:5868e8752d44 790 printf("?");
pythontech 0:5868e8752d44 791 }
pythontech 0:5868e8752d44 792 }
pythontech 0:5868e8752d44 793 }
pythontech 0:5868e8752d44 794 printf("\n");
pythontech 0:5868e8752d44 795 }
pythontech 0:5868e8752d44 796 #endif
pythontech 0:5868e8752d44 797
pythontech 0:5868e8752d44 798 #endif // MICROPY_ENABLE_COMPILER
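
The end-of-input handling in mp_lexer_new above (the chr0/chr1/chr2 preload followed by the "insert a newline at the end" step) can be tried in isolation. The following is a minimal sketch, not code from this port: SKETCH_EOF, lookahead_t, preload and fix_short_input are hypothetical names introduced here for illustration, and a NUL-terminated string stands in for the real byte stream.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for MP_LEXER_EOF; the real value is defined elsewhere. */
#define SKETCH_EOF (-1)

/* Hypothetical three-character lookahead window, mirroring chr0..chr2. */
typedef struct {
    int chr0;
    int chr1;
    int chr2;
} lookahead_t;

/* Preload up to three characters from a NUL-terminated string.
 * Positions past the end of the string are reported as SKETCH_EOF. */
static lookahead_t preload(const char *src) {
    assert(src != NULL);
    lookahead_t w = { SKETCH_EOF, SKETCH_EOF, SKETCH_EOF };
    if (src[0] != '\0') {
        w.chr0 = (unsigned char)src[0];
        if (src[1] != '\0') {
            w.chr1 = (unsigned char)src[1];
            if (src[2] != '\0') {
                w.chr2 = (unsigned char)src[2];
            }
        }
    }
    return w;
}

/* Same branch structure as the listing: if the input is 0, 1 or 2
 * characters long and does not end in a newline, make it end in one.
 * A trailing '\r' is rewritten to '\n' rather than followed by one. */
static void fix_short_input(lookahead_t *w) {
    if (w->chr0 == SKETCH_EOF) {
        w->chr0 = '\n';
    } else if (w->chr1 == SKETCH_EOF) {
        if (w->chr0 == '\r') {
            w->chr0 = '\n';
        } else if (w->chr0 != '\n') {
            w->chr1 = '\n';
        }
    } else if (w->chr2 == SKETCH_EOF) {
        if (w->chr1 == '\r') {
            w->chr1 = '\n';
        } else if (w->chr1 != '\n') {
            w->chr2 = '\n';
        }
    }
}
```

Calling preload("x") and then fix_short_input on the result leaves the window holding 'x' followed by '\n'. Input of three or more characters is left untouched by this step, which is why the listing only needs to special-case streams shorter than the lookahead window.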