micropython - Port of MicroPython to the mbed platform. See mi…


Port of MicroPython to the mbed platform. See micropython-repl for an interactive program.

This is a port of MicroPython to the mbed Classic platform.

It provides an interpreter that runs over the board's USB serial connection.

Getting Started

Import the micropython-repl program into your IDE workspace on developer.mbed.org, compile it and download it to your board, then connect to the board's USB serial port in your usual manner. You should see a startup message similar to the following:

  MicroPython v1.7-155-gdddcdd8 on 2016-04-23; K64F with ARM
  Type "help()" for more information.
  >>>

You can then start using MicroPython. For example:

  >>> from mbed import DigitalOut
  >>> from pins import LED1
  >>> led = DigitalOut(LED1)
  >>> led.write(1)

Requirements

The port needs approximately 100 KB of flash memory, so it is unsuitable for boards with less storage.

Caveats

This can be considered an alpha release of the port: things may not work, and APIs may change in later releases. It is NOT an official part of the MicroPython project, so if anything doesn't work, blame me. If it does work, most of the credit is due to MicroPython.

Only a few of the mbed classes are available in MicroPython so far, and not all of their methods are implemented.

Only a few boards have their full range of pin names available; for the others, only a few standard names (USBTX, USBRX, LED1) are implemented.

The garbage collector is not yet implemented, so the interpreter gradually consumes memory and eventually fails.

Exceptions from the mbed classes are not yet handled.

Asynchronous processing (e.g. events on inputs) is not supported.

Credits

Damien P. George and the other contributors who created MicroPython.

Colin Hogben, author of this port.

py/lexer.c@10:33521d742af1, 2016-04-27 (annotated)

Committer: Colin Hogben
Date: Wed Apr 27 22:11:29 2016 +0100
Revision: 10:33521d742af1
Parent: 2:c89e95946844

Update README and version

Who changed what in which revision?

User	Revision	Line number	New contents of line
pythontech	0:5868e8752d44	1	/*
pythontech	0:5868e8752d44	2	* This file is part of the Micro Python project, http://micropython.org/
pythontech	0:5868e8752d44	3	*
pythontech	0:5868e8752d44	4	* The MIT License (MIT)
pythontech	0:5868e8752d44	5	*
pythontech	0:5868e8752d44	6	* Copyright (c) 2013, 2014 Damien P. George
pythontech	0:5868e8752d44	7	*
pythontech	0:5868e8752d44	8	* Permission is hereby granted, free of charge, to any person obtaining a copy
pythontech	0:5868e8752d44	9	* of this software and associated documentation files (the "Software"), to deal
pythontech	0:5868e8752d44	10	* in the Software without restriction, including without limitation the rights
pythontech	0:5868e8752d44	11	* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
pythontech	0:5868e8752d44	12	* copies of the Software, and to permit persons to whom the Software is
pythontech	0:5868e8752d44	13	* furnished to do so, subject to the following conditions:
pythontech	0:5868e8752d44	14	*
pythontech	0:5868e8752d44	15	* The above copyright notice and this permission notice shall be included in
pythontech	0:5868e8752d44	16	* all copies or substantial portions of the Software.
pythontech	0:5868e8752d44	17	*
pythontech	0:5868e8752d44	18	* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
pythontech	0:5868e8752d44	19	* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
pythontech	0:5868e8752d44	20	* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
pythontech	0:5868e8752d44	21	* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
pythontech	0:5868e8752d44	22	* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
pythontech	0:5868e8752d44	23	* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
pythontech	0:5868e8752d44	24	* THE SOFTWARE.
pythontech	0:5868e8752d44	25	*/
pythontech	0:5868e8752d44	26
pythontech	0:5868e8752d44	27	#include <stdio.h>
pythontech	0:5868e8752d44	28	#include <assert.h>
pythontech	0:5868e8752d44	29
pythontech	0:5868e8752d44	30	#include "py/mpstate.h"
pythontech	0:5868e8752d44	31	#include "py/lexer.h"
pythontech	0:5868e8752d44	32	#include "py/runtime.h"
pythontech	0:5868e8752d44	33
pythontech	0:5868e8752d44	34	#if MICROPY_ENABLE_COMPILER
pythontech	0:5868e8752d44	35
pythontech	0:5868e8752d44	36	#define TAB_SIZE (8)
pythontech	0:5868e8752d44	37
pythontech	0:5868e8752d44	38	// TODO seems that CPython allows NULL byte in the input stream
pythontech	0:5868e8752d44	39	// don't know if that's intentional or not, but we don't allow it
pythontech	0:5868e8752d44	40
pythontech	0:5868e8752d44	41	// TODO replace with a call to a standard function
pythontech	0:5868e8752d44	42	STATIC bool str_strn_equal(const char *str, const char *strn, mp_uint_t len) {
pythontech	0:5868e8752d44	43	mp_uint_t i = 0;
pythontech	0:5868e8752d44	44
pythontech	0:5868e8752d44	45	while (i < len && *str == *strn) {
pythontech	0:5868e8752d44	46	++i;
pythontech	0:5868e8752d44	47	++str;
pythontech	0:5868e8752d44	48	++strn;
pythontech	0:5868e8752d44	49	}
pythontech	0:5868e8752d44	50
pythontech	0:5868e8752d44	51	return i == len && *str == 0;
pythontech	0:5868e8752d44	52	}
pythontech	0:5868e8752d44	53
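The TODO above suggests replacing `str_strn_equal` with a standard function. As a sketch only, assuming `strn` contains no NUL byte in its first `len` characters (as with token text read from source), the loop can be expressed with `strncmp` plus a check that `str` ends exactly at position `len`. The names `str_strn_equal_loop` and `str_strn_equal_std` are hypothetical, and `size_t` stands in for `mp_uint_t`, so the two versions can be compared side by side.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Loop version, equivalent to str_strn_equal in lexer.c. */
static bool str_strn_equal_loop(const char *str, const char *strn, size_t len) {
    size_t i = 0;
    while (i < len && *str == *strn) {
        ++i;
        ++str;
        ++strn;
    }
    /* all len chars matched and str has no further characters */
    return i == len && *str == '\0';
}

/* Hypothetical standard-library version; assumes strn has no NUL in its first len bytes. */
static bool str_strn_equal_std(const char *str, const char *strn, size_t len) {
    assert(str != NULL && strn != NULL);
    /* if strncmp succeeds, str has at least len non-NUL chars, so str[len] is in bounds */
    return strncmp(str, strn, len) == 0 && str[len] == '\0';
}
```

The extra `str[len] == '\0'` test is what makes `"def"` match only a three-character token and not the first three characters of `"define"`.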
pythontech	0:5868e8752d44	54	#define CUR_CHAR(lex) ((lex)->chr0)
pythontech	0:5868e8752d44	55
pythontech	0:5868e8752d44	56	STATIC bool is_end(mp_lexer_t *lex) {
pythontech	0:5868e8752d44	57	return lex->chr0 == MP_LEXER_EOF;
pythontech	0:5868e8752d44	58	}
pythontech	0:5868e8752d44	59
pythontech	0:5868e8752d44	60	STATIC bool is_physical_newline(mp_lexer_t *lex) {
pythontech	0:5868e8752d44	61	return lex->chr0 == '\n';
pythontech	0:5868e8752d44	62	}
pythontech	0:5868e8752d44	63
pythontech	0:5868e8752d44	64	STATIC bool is_char(mp_lexer_t *lex, byte c) {
pythontech	0:5868e8752d44	65	return lex->chr0 == c;
pythontech	0:5868e8752d44	66	}
pythontech	0:5868e8752d44	67
pythontech	0:5868e8752d44	68	STATIC bool is_char_or(mp_lexer_t *lex, byte c1, byte c2) {
pythontech	0:5868e8752d44	69	return lex->chr0 == c1 || lex->chr0 == c2;
pythontech	0:5868e8752d44	70	}
pythontech	0:5868e8752d44	71
pythontech	0:5868e8752d44	72	STATIC bool is_char_or3(mp_lexer_t *lex, byte c1, byte c2, byte c3) {
pythontech	0:5868e8752d44	73	return lex->chr0 == c1 || lex->chr0 == c2 || lex->chr0 == c3;
pythontech	0:5868e8752d44	74	}
pythontech	0:5868e8752d44	75
pythontech	0:5868e8752d44	76	/*
pythontech	0:5868e8752d44	77	STATIC bool is_char_following(mp_lexer_t *lex, byte c) {
pythontech	0:5868e8752d44	78	return lex->chr1 == c;
pythontech	0:5868e8752d44	79	}
pythontech	0:5868e8752d44	80	*/
pythontech	0:5868e8752d44	81
pythontech	0:5868e8752d44	82	STATIC bool is_char_following_or(mp_lexer_t *lex, byte c1, byte c2) {
pythontech	0:5868e8752d44	83	return lex->chr1 == c1 || lex->chr1 == c2;
pythontech	0:5868e8752d44	84	}
pythontech	0:5868e8752d44	85
pythontech	0:5868e8752d44	86	STATIC bool is_char_following_following_or(mp_lexer_t *lex, byte c1, byte c2) {
pythontech	0:5868e8752d44	87	return lex->chr2 == c1 || lex->chr2 == c2;
pythontech	0:5868e8752d44	88	}
pythontech	0:5868e8752d44	89
pythontech	0:5868e8752d44	90	STATIC bool is_char_and(mp_lexer_t *lex, byte c1, byte c2) {
pythontech	0:5868e8752d44	91	return lex->chr0 == c1 && lex->chr1 == c2;
pythontech	0:5868e8752d44	92	}
pythontech	0:5868e8752d44	93
pythontech	0:5868e8752d44	94	STATIC bool is_whitespace(mp_lexer_t *lex) {
pythontech	0:5868e8752d44	95	return unichar_isspace(lex->chr0);
pythontech	0:5868e8752d44	96	}
pythontech	0:5868e8752d44	97
pythontech	0:5868e8752d44	98	STATIC bool is_letter(mp_lexer_t *lex) {
pythontech	0:5868e8752d44	99	return unichar_isalpha(lex->chr0);
pythontech	0:5868e8752d44	100	}
pythontech	0:5868e8752d44	101
pythontech	0:5868e8752d44	102	STATIC bool is_digit(mp_lexer_t *lex) {
pythontech	0:5868e8752d44	103	return unichar_isdigit(lex->chr0);
pythontech	0:5868e8752d44	104	}
pythontech	0:5868e8752d44	105
pythontech	0:5868e8752d44	106	STATIC bool is_following_digit(mp_lexer_t *lex) {
pythontech	0:5868e8752d44	107	return unichar_isdigit(lex->chr1);
pythontech	0:5868e8752d44	108	}
pythontech	0:5868e8752d44	109
pythontech	0:5868e8752d44	110	STATIC bool is_following_base_char(mp_lexer_t *lex) {
pythontech	0:5868e8752d44	111	const unichar chr1 = lex->chr1 | 0x20;
pythontech	0:5868e8752d44	112	return chr1 == 'b' || chr1 == 'o' || chr1 == 'x';
pythontech	0:5868e8752d44	113	}
pythontech	0:5868e8752d44	114
pythontech	0:5868e8752d44	115	STATIC bool is_following_odigit(mp_lexer_t *lex) {
pythontech	0:5868e8752d44	116	return lex->chr1 >= '0' && lex->chr1 <= '7';
pythontech	0:5868e8752d44	117	}
pythontech	0:5868e8752d44	118
pythontech	0:5868e8752d44	119	// to easily parse utf-8 identifiers we allow any raw byte with high bit set
pythontech	0:5868e8752d44	120	STATIC bool is_head_of_identifier(mp_lexer_t *lex) {
pythontech	0:5868e8752d44	121	return is_letter(lex) || lex->chr0 == '_' || lex->chr0 >= 0x80;
pythontech	0:5868e8752d44	122	}
pythontech	0:5868e8752d44	123
pythontech	0:5868e8752d44	124	STATIC bool is_tail_of_identifier(mp_lexer_t *lex) {
pythontech	0:5868e8752d44	125	return is_head_of_identifier(lex) || is_digit(lex);
pythontech	0:5868e8752d44	126	}
pythontech	0:5868e8752d44	127
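The predicates above work directly on byte values. Two details are easy to miss: `is_following_base_char` folds case by OR-ing `0x20`, which maps `'B'`, `'O'` and `'X'` onto their lower-case forms (among byte values, only the upper- and lower-case letter map to each of `'b'`, `'o'`, `'x'`), and identifiers accept any byte `>= 0x80`, so UTF-8 names pass through as raw bytes. The standalone sketch below restates both rules on plain `unsigned` characters with ASCII-only letter tests; the function names are hypothetical, and the real code uses MicroPython's `unichar_*` helpers instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Same idea as is_following_base_char: OR-ing 0x20 maps 'B', 'O', 'X' to 'b', 'o', 'x'. */
static bool is_base_char(unsigned c) {
    unsigned lc = c | 0x20;
    return lc == 'b' || lc == 'o' || lc == 'x';
}

/* Head of identifier: ASCII letter, underscore, or any byte with the high bit set. */
static bool is_ident_head(unsigned c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_' || c >= 0x80;
}

/* Tail of identifier: a head character or an ASCII digit. */
static bool is_ident_tail(unsigned c) {
    return is_ident_head(c) || (c >= '0' && c <= '9');
}

/* Length in bytes of the identifier at the start of s, or 0 if there is none. */
static size_t ident_len(const char *s) {
    assert(s != NULL);
    const unsigned char *p = (const unsigned char *)s;
    if (!is_ident_head(p[0])) {
        return 0;
    }
    size_t n = 1;
    while (is_ident_tail(p[n])) { /* stops at the terminating NUL */
        ++n;
    }
    return n;
}
```

Because the high-bit rule never decodes UTF-8, a name such as `café` is simply counted as five bytes.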
pythontech	0:5868e8752d44	128	STATIC void next_char(mp_lexer_t *lex) {
pythontech	0:5868e8752d44	129	if (lex->chr0 == MP_LEXER_EOF) {
pythontech	0:5868e8752d44	130	return;
pythontech	0:5868e8752d44	131	}
pythontech	0:5868e8752d44	132
pythontech	0:5868e8752d44	133	if (lex->chr0 == '\n') {
pythontech	0:5868e8752d44	134	// a new line
pythontech	0:5868e8752d44	135	++lex->line;
pythontech	0:5868e8752d44	136	lex->column = 1;
pythontech	0:5868e8752d44	137	} else if (lex->chr0 == '\t') {
pythontech	0:5868e8752d44	138	// a tab
pythontech	0:5868e8752d44	139	lex->column = (((lex->column - 1 + TAB_SIZE) / TAB_SIZE) * TAB_SIZE) + 1;
pythontech	0:5868e8752d44	140	} else {
pythontech	0:5868e8752d44	141	// a character worth one column
pythontech	0:5868e8752d44	142	++lex->column;
pythontech	0:5868e8752d44	143	}
pythontech	0:5868e8752d44	144
pythontech	0:5868e8752d44	145	lex->chr0 = lex->chr1;
pythontech	0:5868e8752d44	146	lex->chr1 = lex->chr2;
pythontech	0:5868e8752d44	147	lex->chr2 = lex->stream_next_byte(lex->stream_data);
pythontech	0:5868e8752d44	148
pythontech	0:5868e8752d44	149	if (lex->chr0 == '\r') {
pythontech	0:5868e8752d44	150	// CR is a new line, converted to LF
pythontech	0:5868e8752d44	151	lex->chr0 = '\n';
pythontech	0:5868e8752d44	152	if (lex->chr1 == '\n') {
pythontech	0:5868e8752d44	153	// CR LF is a single new line
pythontech	0:5868e8752d44	154	lex->chr1 = lex->chr2;
pythontech	0:5868e8752d44	155	lex->chr2 = lex->stream_next_byte(lex->stream_data);
pythontech	0:5868e8752d44	156	}
pythontech	0:5868e8752d44	157	}
pythontech	0:5868e8752d44	158
pythontech	0:5868e8752d44	159	if (lex->chr2 == MP_LEXER_EOF) {
pythontech	0:5868e8752d44	160	// EOF, check if we need to insert a newline at end of file
pythontech	0:5868e8752d44	161	if (lex->chr1 != MP_LEXER_EOF && lex->chr1 != '\n') {
pythontech	0:5868e8752d44	162	// if lex->chr1 == '\r' then this makes a CR LF which will be converted to LF above
pythontech	0:5868e8752d44	163	// otherwise it just inserts a LF
pythontech	0:5868e8752d44	164	lex->chr2 = '\n';
pythontech	0:5868e8752d44	165	}
pythontech	0:5868e8752d44	166	}
pythontech	0:5868e8752d44	167	}
pythontech	0:5868e8752d44	168
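`next_char` does two separate jobs: it advances the 1-based column (a tab jumps to the next multiple of `TAB_SIZE`, plus one), and it normalises line endings so that CR and CR LF both become LF, inserting a final LF when non-empty input does not end with one. A minimal sketch of both rules on an in-memory string follows. `tab_column` and `normalize_newlines` are hypothetical helpers; the real lexer applies these rules incrementally through its three-character lookahead (`chr0`, `chr1`, `chr2`) rather than over a whole buffer.

```c
#include <assert.h>
#include <stddef.h>

#define TAB_SIZE (8)

/* Column reached after a tab at 1-based column `column` (formula from next_char). */
static unsigned tab_column(unsigned column) {
    assert(column >= 1);
    return (((column - 1 + TAB_SIZE) / TAB_SIZE) * TAB_SIZE) + 1;
}

/* Copy `in` to `out`, turning CR and CR LF into LF and appending a final LF
 * to non-empty input that lacks one. `out` needs room for strlen(in) + 2 bytes.
 * Returns the length of the result. */
static size_t normalize_newlines(const char *in, char *out) {
    assert(in != NULL && out != NULL);
    size_t n = 0;
    for (size_t i = 0; in[i] != '\0'; ++i) {
        if (in[i] == '\r') {
            out[n++] = '\n';
            if (in[i + 1] == '\n') {
                ++i; /* CR LF counts as a single newline */
            }
        } else {
            out[n++] = in[i];
        }
    }
    if (n > 0 && out[n - 1] != '\n') {
        out[n++] = '\n'; /* same effect as the EOF check at the end of next_char */
    }
    out[n] = '\0';
    return n;
}
```

With `TAB_SIZE` of 8, a tab anywhere in columns 1 to 8 moves to column 9, and one in columns 9 to 16 moves to column 17.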
pythontech	0:5868e8752d44	169	STATIC void indent_push(mp_lexer_t *lex, mp_uint_t indent) {
pythontech	0:5868e8752d44	170	if (lex->num_indent_level >= lex->alloc_indent_level) {
pythontech	0:5868e8752d44	171	// TODO use m_renew_maybe and somehow indicate an error if it fails... probably by using MP_TOKEN_MEMORY_ERROR
pythontech	0:5868e8752d44	172	lex->indent_level = m_renew(uint16_t, lex->indent_level, lex->alloc_indent_level, lex->alloc_indent_level + MICROPY_ALLOC_LEXEL_INDENT_INC);
pythontech	0:5868e8752d44	173	lex->alloc_indent_level += MICROPY_ALLOC_LEXEL_INDENT_INC;
pythontech	0:5868e8752d44	174	}
pythontech	0:5868e8752d44	175	lex->indent_level[lex->num_indent_level++] = indent;
pythontech	0:5868e8752d44	176	}
pythontech	0:5868e8752d44	177
pythontech	0:5868e8752d44	178	STATIC mp_uint_t indent_top(mp_lexer_t *lex) {
pythontech	0:5868e8752d44	179	return lex->indent_level[lex->num_indent_level - 1];
pythontech	0:5868e8752d44	180	}
pythontech	0:5868e8752d44	181
pythontech	0:5868e8752d44	182	STATIC void indent_pop(mp_lexer_t *lex) {
pythontech	0:5868e8752d44	183	lex->num_indent_level -= 1;
pythontech	0:5868e8752d44	184	}
pythontech	0:5868e8752d44	185
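`indent_push`, `indent_top` and `indent_pop` keep a stack of indentation widths. The newline branch of `mp_lexer_next_token_into` further down compares each logical line's leading spaces with the top of that stack: a wider line pushes one level and yields one INDENT, a narrower line pops until the widths agree and yields one DEDENT per pop, and a width that matches no enclosing level is reported as `MP_TOKEN_DEDENT_MISMATCH`. The sketch below reproduces that logic with a fixed-size array instead of `m_renew`. The type and function names, the capacity of 16, and the stack starting with a single zero-width level are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_INDENT_LEVELS (16) /* hypothetical fixed capacity; lexer.c grows its array instead */

typedef struct {
    unsigned level[MAX_INDENT_LEVELS];
    unsigned num;
} indent_stack_t;

/* Start with one level of width 0 (code in the first column). */
static void indent_init(indent_stack_t *s) {
    s->level[0] = 0;
    s->num = 1;
}

/* Returns +1 for one INDENT, -k for k DEDENTs, or 0; sets *mismatch when the
 * new width does not equal any enclosing level. */
static int indent_update(indent_stack_t *s, unsigned num_spaces, bool *mismatch) {
    int dent = 0;
    *mismatch = false;
    unsigned top = s->level[s->num - 1];
    if (num_spaces > top) {
        assert(s->num < MAX_INDENT_LEVELS);
        s->level[s->num++] = num_spaces;
        dent = 1;
    } else if (num_spaces < top) {
        /* never pops the base level, since num_spaces >= 0 == level[0] */
        while (num_spaces < s->level[s->num - 1]) {
            s->num--;
            dent--;
        }
        if (num_spaces != s->level[s->num - 1]) {
            *mismatch = true;
        }
    }
    return dent;
}
```

For example, returning from 8 spaces to 0 pops two levels at once, while dropping from 4 spaces to 2 leaves no matching level and is flagged as a mismatch.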
pythontech	0:5868e8752d44	186	// some tricky operator encoding:
pythontech	0:5868e8752d44	187	// <op> = begin with <op>, if this opchar matches then begin here
pythontech	0:5868e8752d44	188	// e<op> = end with <op>, if this opchar matches then end
pythontech	0:5868e8752d44	189	// E<op> = mandatory end with <op>, this opchar must match, then end
pythontech	0:5868e8752d44	190	// c<op> = continue with <op>, if this opchar matches then continue matching
pythontech	0:5868e8752d44	191	// this means if the start of two ops are the same then they are equal til the last char
pythontech	0:5868e8752d44	192
pythontech	0:5868e8752d44	193	STATIC const char *tok_enc =
pythontech	0:5868e8752d44	194	"()[]{},:;@~" // singles
pythontech	0:5868e8752d44	195	"<e=c<e=" // < <= << <<=
pythontech	0:5868e8752d44	196	">e=c>e=" // > >= >> >>=
pythontech	0:5868e8752d44	197	"*e=c*e=" // * *= ** **=
pythontech	0:5868e8752d44	198	"+e=" // + +=
pythontech	0:5868e8752d44	199	"-e=e>" // - -= ->
pythontech	0:5868e8752d44	200	"&e=" // & &=
pythontech	0:5868e8752d44	201	"|e=" // | |=
pythontech	0:5868e8752d44	202	"/e=c/e=" // / /= // //=
pythontech	0:5868e8752d44	203	"%e=" // % %=
pythontech	0:5868e8752d44	204	"^e=" // ^ ^=
pythontech	0:5868e8752d44	205	"=e=" // = ==
pythontech	0:5868e8752d44	206	"!E="; // !=
pythontech	0:5868e8752d44	207
pythontech	0:5868e8752d44	208	// TODO static assert that number of tokens is less than 256 so we can safely make this table with byte sized entries
pythontech	0:5868e8752d44	209	STATIC const uint8_t tok_enc_kind[] = {
pythontech	0:5868e8752d44	210	MP_TOKEN_DEL_PAREN_OPEN, MP_TOKEN_DEL_PAREN_CLOSE,
pythontech	0:5868e8752d44	211	MP_TOKEN_DEL_BRACKET_OPEN, MP_TOKEN_DEL_BRACKET_CLOSE,
pythontech	0:5868e8752d44	212	MP_TOKEN_DEL_BRACE_OPEN, MP_TOKEN_DEL_BRACE_CLOSE,
pythontech	0:5868e8752d44	213	MP_TOKEN_DEL_COMMA, MP_TOKEN_DEL_COLON, MP_TOKEN_DEL_SEMICOLON, MP_TOKEN_DEL_AT, MP_TOKEN_OP_TILDE,
pythontech	0:5868e8752d44	214
pythontech	0:5868e8752d44	215	MP_TOKEN_OP_LESS, MP_TOKEN_OP_LESS_EQUAL, MP_TOKEN_OP_DBL_LESS, MP_TOKEN_DEL_DBL_LESS_EQUAL,
pythontech	0:5868e8752d44	216	MP_TOKEN_OP_MORE, MP_TOKEN_OP_MORE_EQUAL, MP_TOKEN_OP_DBL_MORE, MP_TOKEN_DEL_DBL_MORE_EQUAL,
pythontech	0:5868e8752d44	217	MP_TOKEN_OP_STAR, MP_TOKEN_DEL_STAR_EQUAL, MP_TOKEN_OP_DBL_STAR, MP_TOKEN_DEL_DBL_STAR_EQUAL,
pythontech	0:5868e8752d44	218	MP_TOKEN_OP_PLUS, MP_TOKEN_DEL_PLUS_EQUAL,
pythontech	0:5868e8752d44	219	MP_TOKEN_OP_MINUS, MP_TOKEN_DEL_MINUS_EQUAL, MP_TOKEN_DEL_MINUS_MORE,
pythontech	0:5868e8752d44	220	MP_TOKEN_OP_AMPERSAND, MP_TOKEN_DEL_AMPERSAND_EQUAL,
pythontech	0:5868e8752d44	221	MP_TOKEN_OP_PIPE, MP_TOKEN_DEL_PIPE_EQUAL,
pythontech	0:5868e8752d44	222	MP_TOKEN_OP_SLASH, MP_TOKEN_DEL_SLASH_EQUAL, MP_TOKEN_OP_DBL_SLASH, MP_TOKEN_DEL_DBL_SLASH_EQUAL,
pythontech	0:5868e8752d44	223	MP_TOKEN_OP_PERCENT, MP_TOKEN_DEL_PERCENT_EQUAL,
pythontech	0:5868e8752d44	224	MP_TOKEN_OP_CARET, MP_TOKEN_DEL_CARET_EQUAL,
pythontech	0:5868e8752d44	225	MP_TOKEN_DEL_EQUAL, MP_TOKEN_OP_DBL_EQUAL,
pythontech	0:5868e8752d44	226	MP_TOKEN_OP_NOT_EQUAL,
pythontech	0:5868e8752d44	227	};
pythontech	0:5868e8752d44	228
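The encoding comment and the two tables above are easiest to follow by decoding them. The sketch below is one way to read `tok_enc`, written for this note rather than taken from the lexer. It first finds the group whose begin character matches, counting table entries as it skips earlier groups (one per begin character and per `e` or `c` entry, with `E` taking over its begin character's entry), then extends the match as far as the encoding allows. The returned number indexes the 43-entry `tok_enc_kind` table, so 14 is `MP_TOKEN_DEL_DBL_LESS_EQUAL` and 42 is `MP_TOKEN_OP_NOT_EQUAL`. `match_op` and `TOK_ENC` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Operator encoding as in lexer.c. */
static const char TOK_ENC[] =
    "()[]{},:;@~" // singles
    "<e=c<e="     // < <= << <<=
    ">e=c>e="     // > >= >> >>=
    "*e=c*e="     // * *= ** **=
    "+e="         // + +=
    "-e=e>"       // - -= ->
    "&e="         // & &=
    "|e="         // | |=
    "/e=c/e="     // / /= // //=
    "%e="         // % %=
    "^e="         // ^ ^=
    "=e="         // = ==
    "!E=";        // !=

/* Returns the tok_enc_kind index of the longest operator at the start of s and
 * stores its length in *len, or returns -1 if s does not start with one. */
static int match_op(const char *enc, const char *s, int *len) {
    assert(enc != NULL && s != NULL && len != NULL);
    const char *t = enc;
    int index = 0;

    /* Phase 1: find the matching begin character, counting skipped entries. */
    while (*t != '\0') {
        if (*t == 'e' || *t == 'c') {
            t += 2;     /* marker plus its opchar: one table entry */
            index += 1;
        } else if (*t == 'E') {
            t += 2;     /* mandatory end reuses the begin character's entry */
        } else if (*t == s[0]) {
            break;
        } else {
            t += 1;     /* a begin character that did not match */
            index += 1;
        }
    }
    if (*t == '\0') {
        return -1;
    }

    /* Phase 2: extend the match as far as the encoding allows. */
    t += 1;
    int pos = 1;        /* characters of s consumed */
    int best = index;   /* entry for the longest match so far */
    int cur = index;    /* entry number of the marker being examined */
    for (;;) {
        if (*t == 'e') {
            cur += 1;
            if (t[1] == s[pos]) {
                best = cur;
                pos += 1;
                break;  /* 'e' ends the token */
            }
            t += 2;
        } else if (*t == 'E') {
            if (t[1] != s[pos]) {
                return -1; /* e.g. a lone '!' */
            }
            pos += 1;
            break;
        } else if (*t == 'c') {
            cur += 1;
            if (t[1] != s[pos]) {
                break;
            }
            best = cur;
            pos += 1;
            t += 2;     /* keep matching the longer operator */
        } else {
            break;      /* reached the next group's begin character */
        }
    }
    *len = pos;
    return best;
}
```

Because `e` stops at its first match while `c` keeps going, `"<<"` gives 13 (`MP_TOKEN_OP_DBL_LESS`) but `"<<="` gives 14.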
pythontech	0:5868e8752d44	229	// must have the same order as enum in lexer.h
pythontech	0:5868e8752d44	230	STATIC const char *tok_kw[] = {
pythontech	0:5868e8752d44	231	"False",
pythontech	0:5868e8752d44	232	"None",
pythontech	0:5868e8752d44	233	"True",
pythontech	0:5868e8752d44	234	"and",
pythontech	0:5868e8752d44	235	"as",
pythontech	0:5868e8752d44	236	"assert",
Colin Hogben	2:c89e95946844	237	#if MICROPY_PY_ASYNC_AWAIT
Colin Hogben	2:c89e95946844	238	"async",
Colin Hogben	2:c89e95946844	239	"await",
Colin Hogben	2:c89e95946844	240	#endif
pythontech	0:5868e8752d44	241	"break",
pythontech	0:5868e8752d44	242	"class",
pythontech	0:5868e8752d44	243	"continue",
pythontech	0:5868e8752d44	244	"def",
pythontech	0:5868e8752d44	245	"del",
pythontech	0:5868e8752d44	246	"elif",
pythontech	0:5868e8752d44	247	"else",
pythontech	0:5868e8752d44	248	"except",
pythontech	0:5868e8752d44	249	"finally",
pythontech	0:5868e8752d44	250	"for",
pythontech	0:5868e8752d44	251	"from",
pythontech	0:5868e8752d44	252	"global",
pythontech	0:5868e8752d44	253	"if",
pythontech	0:5868e8752d44	254	"import",
pythontech	0:5868e8752d44	255	"in",
pythontech	0:5868e8752d44	256	"is",
pythontech	0:5868e8752d44	257	"lambda",
pythontech	0:5868e8752d44	258	"nonlocal",
pythontech	0:5868e8752d44	259	"not",
pythontech	0:5868e8752d44	260	"or",
pythontech	0:5868e8752d44	261	"pass",
pythontech	0:5868e8752d44	262	"raise",
pythontech	0:5868e8752d44	263	"return",
pythontech	0:5868e8752d44	264	"try",
pythontech	0:5868e8752d44	265	"while",
pythontech	0:5868e8752d44	266	"with",
pythontech	0:5868e8752d44	267	"yield",
pythontech	0:5868e8752d44	268	"__debug__",
pythontech	0:5868e8752d44	269	};
pythontech	0:5868e8752d44	270
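The comment on `tok_kw` requires the strings to be in the same order as the keyword token kinds in `lexer.h` (not shown here), including the `#if MICROPY_PY_ASYNC_AWAIT` entries, which must be conditional in both places. An ordering requirement like this usually means the index of a matched string is added to the first keyword kind, so any mismatch silently yields the wrong token. The sketch below shows that pattern with a hypothetical cut-down enum and table; none of these names come from `lexer.h`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token kinds: keyword kinds appear in the same order as kw_table. */
typedef enum {
    TOK_NAME,
    TOK_KW_FALSE,
    TOK_KW_NONE,
    TOK_KW_TRUE,
    TOK_KW_AND,
    TOK_KW_DEF,
    TOK_KW_IF,
} tok_kind_t;

static const char *const kw_table[] = { "False", "None", "True", "and", "def", "if" };

/* Map a name to its keyword kind by table index, or TOK_NAME if it is not a keyword. */
static tok_kind_t classify_name(const char *name) {
    assert(name != NULL);
    for (size_t i = 0; i < sizeof(kw_table) / sizeof(kw_table[0]); ++i) {
        if (strcmp(name, kw_table[i]) == 0) {
            return (tok_kind_t)(TOK_KW_FALSE + (int)i); /* relies on matching order */
        }
    }
    return TOK_NAME;
}
```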
pythontech	0:5868e8752d44	271	// This is called with CUR_CHAR() before first hex digit, and should return with
pythontech	0:5868e8752d44	272	// it pointing to last hex digit
pythontech	0:5868e8752d44	273	// num_digits must be greater than zero
pythontech	0:5868e8752d44	274	STATIC bool get_hex(mp_lexer_t *lex, mp_uint_t num_digits, mp_uint_t *result) {
pythontech	0:5868e8752d44	275	mp_uint_t num = 0;
pythontech	0:5868e8752d44	276	while (num_digits-- != 0) {
pythontech	0:5868e8752d44	277	next_char(lex);
pythontech	0:5868e8752d44	278	unichar c = CUR_CHAR(lex);
pythontech	0:5868e8752d44	279	if (!unichar_isxdigit(c)) {
pythontech	0:5868e8752d44	280	return false;
pythontech	0:5868e8752d44	281	}
pythontech	0:5868e8752d44	282	num = (num << 4) + unichar_xdigit_value(c);
pythontech	0:5868e8752d44	283	}
pythontech	0:5868e8752d44	284	*result = num;
pythontech	0:5868e8752d44	285	return true;
pythontech	0:5868e8752d44	286	}
pythontech	0:5868e8752d44	287
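`get_hex` requires exactly `num_digits` hex digits (2 for `\x`, 4 for `\u`, 8 for `\U`, as chosen in the escape switch below), while the octal branch of that switch accepts one to three digits and stops at the first non-octal character. The sketch below applies the same two rules to a plain C string that starts at the first digit (whereas `get_hex` starts one character before it). `parse_hex_exact` and `parse_octal_escape` are hypothetical names, and `<ctype.h>` stands in for MicroPython's `unichar_isxdigit` and `unichar_xdigit_value`.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>

/* Read exactly num_digits hex digits from s; returns false if any is missing. */
static bool parse_hex_exact(const char *s, unsigned num_digits, unsigned long *result) {
    assert(num_digits > 0);
    unsigned long num = 0;
    for (unsigned i = 0; i < num_digits; ++i) {
        unsigned char c = (unsigned char)s[i];
        if (!isxdigit(c)) {
            return false; /* also stops safely at the terminating NUL */
        }
        num = (num << 4) + (unsigned long)(isdigit(c) ? c - '0' : tolower(c) - 'a' + 10);
    }
    *result = num;
    return true;
}

/* Decode a 1-3 digit octal escape; s[0] must be '0'..'7'. *used receives the digit count. */
static unsigned parse_octal_escape(const char *s, unsigned *used) {
    assert(s[0] >= '0' && s[0] <= '7');
    unsigned num = (unsigned)(s[0] - '0');
    unsigned n = 1;
    while (n < 3 && s[n] >= '0' && s[n] <= '7') {
        num = num * 8 + (unsigned)(s[n] - '0');
        ++n;
    }
    *used = n;
    return num;
}
```

So `"\101"` decodes to 65 (`'A'`), and in `"\1234"` only the first three digits belong to the escape.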
pythontech	0:5868e8752d44	288	STATIC void mp_lexer_next_token_into(mp_lexer_t *lex, bool first_token) {
pythontech	0:5868e8752d44	289	// start new token text
pythontech	0:5868e8752d44	290	vstr_reset(&lex->vstr);
pythontech	0:5868e8752d44	291
pythontech	0:5868e8752d44	292	// skip white space and comments
pythontech	0:5868e8752d44	293	bool had_physical_newline = false;
pythontech	0:5868e8752d44	294	while (!is_end(lex)) {
pythontech	0:5868e8752d44	295	if (is_physical_newline(lex)) {
pythontech	0:5868e8752d44	296	had_physical_newline = true;
pythontech	0:5868e8752d44	297	next_char(lex);
pythontech	0:5868e8752d44	298	} else if (is_whitespace(lex)) {
pythontech	0:5868e8752d44	299	next_char(lex);
pythontech	0:5868e8752d44	300	} else if (is_char(lex, '#')) {
pythontech	0:5868e8752d44	301	next_char(lex);
pythontech	0:5868e8752d44	302	while (!is_end(lex) && !is_physical_newline(lex)) {
pythontech	0:5868e8752d44	303	next_char(lex);
pythontech	0:5868e8752d44	304	}
pythontech	0:5868e8752d44	305	// had_physical_newline will be set on next loop
pythontech	0:5868e8752d44	306	} else if (is_char(lex, '\\')) {
pythontech	0:5868e8752d44	307	// backslash (outside string literals) must appear just before a physical newline
pythontech	0:5868e8752d44	308	next_char(lex);
pythontech	0:5868e8752d44	309	if (!is_physical_newline(lex)) {
pythontech	0:5868e8752d44	310	// SyntaxError: unexpected character after line continuation character
pythontech	0:5868e8752d44	311	lex->tok_line = lex->line;
pythontech	0:5868e8752d44	312	lex->tok_column = lex->column;
pythontech	0:5868e8752d44	313	lex->tok_kind = MP_TOKEN_BAD_LINE_CONTINUATION;
pythontech	0:5868e8752d44	314	return;
pythontech	0:5868e8752d44	315	} else {
pythontech	0:5868e8752d44	316	next_char(lex);
pythontech	0:5868e8752d44	317	}
pythontech	0:5868e8752d44	318	} else {
pythontech	0:5868e8752d44	319	break;
pythontech	0:5868e8752d44	320	}
pythontech	0:5868e8752d44	321	}
pythontech	0:5868e8752d44	322
pythontech	0:5868e8752d44	323	// set token source information
pythontech	0:5868e8752d44	324	lex->tok_line = lex->line;
pythontech	0:5868e8752d44	325	lex->tok_column = lex->column;
pythontech	0:5868e8752d44	326
pythontech	0:5868e8752d44	327	if (first_token && lex->line == 1 && lex->column != 1) {
pythontech	0:5868e8752d44	328	// check that the first token is in the first column
pythontech	0:5868e8752d44	329	// if first token is not on first line, we get a physical newline and
pythontech	0:5868e8752d44	330	// this check is done as part of normal indent/dedent checking below
pythontech	0:5868e8752d44	331	// (done to get equivalence with CPython)
pythontech	0:5868e8752d44	332	lex->tok_kind = MP_TOKEN_INDENT;
pythontech	0:5868e8752d44	333
pythontech	0:5868e8752d44	334	} else if (lex->emit_dent < 0) {
pythontech	0:5868e8752d44	335	lex->tok_kind = MP_TOKEN_DEDENT;
pythontech	0:5868e8752d44	336	lex->emit_dent += 1;
pythontech	0:5868e8752d44	337
pythontech	0:5868e8752d44	338	} else if (lex->emit_dent > 0) {
pythontech	0:5868e8752d44	339	lex->tok_kind = MP_TOKEN_INDENT;
pythontech	0:5868e8752d44	340	lex->emit_dent -= 1;
pythontech	0:5868e8752d44	341
pythontech	0:5868e8752d44	342	} else if (had_physical_newline && lex->nested_bracket_level == 0) {
pythontech	0:5868e8752d44	343	lex->tok_kind = MP_TOKEN_NEWLINE;
pythontech	0:5868e8752d44	344
pythontech	0:5868e8752d44	345	mp_uint_t num_spaces = lex->column - 1;
pythontech	0:5868e8752d44	346	lex->emit_dent = 0;
pythontech	0:5868e8752d44	347	if (num_spaces == indent_top(lex)) {
pythontech	0:5868e8752d44	348	} else if (num_spaces > indent_top(lex)) {
pythontech	0:5868e8752d44	349	indent_push(lex, num_spaces);
pythontech	0:5868e8752d44	350	lex->emit_dent += 1;
pythontech	0:5868e8752d44	351	} else {
pythontech	0:5868e8752d44	352	while (num_spaces < indent_top(lex)) {
pythontech	0:5868e8752d44	353	indent_pop(lex);
pythontech	0:5868e8752d44	354	lex->emit_dent -= 1;
pythontech	0:5868e8752d44	355	}
pythontech	0:5868e8752d44	356	if (num_spaces != indent_top(lex)) {
pythontech	0:5868e8752d44	357	lex->tok_kind = MP_TOKEN_DEDENT_MISMATCH;
pythontech	0:5868e8752d44	358	}
pythontech	0:5868e8752d44	359	}
pythontech	0:5868e8752d44	360
pythontech	0:5868e8752d44	361	} else if (is_end(lex)) {
pythontech	0:5868e8752d44	362	if (indent_top(lex) > 0) {
pythontech	0:5868e8752d44	363	lex->tok_kind = MP_TOKEN_NEWLINE;
pythontech	0:5868e8752d44	364	lex->emit_dent = 0;
pythontech	0:5868e8752d44	365	while (indent_top(lex) > 0) {
pythontech	0:5868e8752d44	366	indent_pop(lex);
pythontech	0:5868e8752d44	367	lex->emit_dent -= 1;
pythontech	0:5868e8752d44	368	}
pythontech	0:5868e8752d44	369	} else {
pythontech	0:5868e8752d44	370	lex->tok_kind = MP_TOKEN_END;
pythontech	0:5868e8752d44	371	}
pythontech	0:5868e8752d44	372
pythontech	0:5868e8752d44	373	} else if (is_char_or(lex, '\'', '\"')
pythontech	0:5868e8752d44	374	|| (is_char_or3(lex, 'r', 'u', 'b') && is_char_following_or(lex, '\'', '\"'))
pythontech	0:5868e8752d44	375	|| ((is_char_and(lex, 'r', 'b') || is_char_and(lex, 'b', 'r')) && is_char_following_following_or(lex, '\'', '\"'))) {
pythontech	0:5868e8752d44	376	// a string or bytes literal
pythontech	0:5868e8752d44	377
pythontech	0:5868e8752d44	378	// parse type codes
pythontech	0:5868e8752d44	379	bool is_raw = false;
pythontech	0:5868e8752d44	380	bool is_bytes = false;
pythontech	0:5868e8752d44	381	if (is_char(lex, 'u')) {
pythontech	0:5868e8752d44	382	next_char(lex);
pythontech	0:5868e8752d44	383	} else if (is_char(lex, 'b')) {
pythontech	0:5868e8752d44	384	is_bytes = true;
pythontech	0:5868e8752d44	385	next_char(lex);
pythontech	0:5868e8752d44	386	if (is_char(lex, 'r')) {
pythontech	0:5868e8752d44	387	is_raw = true;
pythontech	0:5868e8752d44	388	next_char(lex);
pythontech	0:5868e8752d44	389	}
pythontech	0:5868e8752d44	390	} else if (is_char(lex, 'r')) {
pythontech	0:5868e8752d44	391	is_raw = true;
pythontech	0:5868e8752d44	392	next_char(lex);
pythontech	0:5868e8752d44	393	if (is_char(lex, 'b')) {
pythontech	0:5868e8752d44	394	is_bytes = true;
pythontech	0:5868e8752d44	395	next_char(lex);
pythontech	0:5868e8752d44	396	}
pythontech	0:5868e8752d44	397	}
pythontech	0:5868e8752d44	398
pythontech	0:5868e8752d44	399	// set token kind
pythontech	0:5868e8752d44	400	if (is_bytes) {
pythontech	0:5868e8752d44	401	lex->tok_kind = MP_TOKEN_BYTES;
pythontech	0:5868e8752d44	402	} else {
pythontech	0:5868e8752d44	403	lex->tok_kind = MP_TOKEN_STRING;
pythontech	0:5868e8752d44	404	}
pythontech	0:5868e8752d44	405
pythontech	0:5868e8752d44	406	// get first quoting character
pythontech	0:5868e8752d44	407	char quote_char = '\'';
pythontech	0:5868e8752d44	408	if (is_char(lex, '\"')) {
pythontech	0:5868e8752d44	409	quote_char = '\"';
pythontech	0:5868e8752d44	410	}
pythontech	0:5868e8752d44	411	next_char(lex);
pythontech	0:5868e8752d44	412
pythontech	0:5868e8752d44	413	// work out if it's a single or triple quoted literal
pythontech	0:5868e8752d44	414	mp_uint_t num_quotes;
pythontech	0:5868e8752d44	415	if (is_char_and(lex, quote_char, quote_char)) {
pythontech	0:5868e8752d44	416	// triple quotes
pythontech	0:5868e8752d44	417	next_char(lex);
pythontech	0:5868e8752d44	418	next_char(lex);
pythontech	0:5868e8752d44	419	num_quotes = 3;
pythontech	0:5868e8752d44	420	} else {
pythontech	0:5868e8752d44	421	// single quotes
pythontech	0:5868e8752d44	422	num_quotes = 1;
pythontech	0:5868e8752d44	423	}
pythontech	0:5868e8752d44	424
pythontech	0:5868e8752d44	425	// parse the literal
pythontech	0:5868e8752d44	426	mp_uint_t n_closing = 0;
pythontech	0:5868e8752d44	427	while (!is_end(lex) && (num_quotes > 1 || !is_char(lex, '\n')) && n_closing < num_quotes) {
pythontech	0:5868e8752d44	428	if (is_char(lex, quote_char)) {
pythontech	0:5868e8752d44	429	n_closing += 1;
pythontech	0:5868e8752d44	430	vstr_add_char(&lex->vstr, CUR_CHAR(lex));
pythontech	0:5868e8752d44	431	} else {
pythontech	0:5868e8752d44	432	n_closing = 0;
pythontech	0:5868e8752d44	433	if (is_char(lex, '\\')) {
pythontech	0:5868e8752d44	434	next_char(lex);
pythontech	0:5868e8752d44	435	unichar c = CUR_CHAR(lex);
pythontech	0:5868e8752d44	436	if (is_raw) {
pythontech	0:5868e8752d44	437	// raw strings allow escaping of quotes, but the backslash is also emitted
pythontech	0:5868e8752d44	438	vstr_add_char(&lex->vstr, '\\');
pythontech	0:5868e8752d44	439	} else {
pythontech	0:5868e8752d44	440	switch (c) {
pythontech	0:5868e8752d44	441	case MP_LEXER_EOF: break; // TODO a proper error message?
pythontech	0:5868e8752d44	442	case '\n': c = MP_LEXER_EOF; break; // TODO check this works correctly (we are supposed to ignore it
pythontech	0:5868e8752d44	443	case '\\': break;
pythontech	0:5868e8752d44	444	case '\'': break;
pythontech	0:5868e8752d44	445	case '"': break;
pythontech	0:5868e8752d44	446	case 'a': c = 0x07; break;
pythontech	0:5868e8752d44	447	case 'b': c = 0x08; break;
pythontech	0:5868e8752d44	448	case 't': c = 0x09; break;
pythontech	0:5868e8752d44	449	case 'n': c = 0x0a; break;
pythontech	0:5868e8752d44	450	case 'v': c = 0x0b; break;
pythontech	0:5868e8752d44	451	case 'f': c = 0x0c; break;
pythontech	0:5868e8752d44	452	case 'r': c = 0x0d; break;
pythontech	0:5868e8752d44	453	case 'u':
pythontech	0:5868e8752d44	454	case 'U':
pythontech	0:5868e8752d44	455	if (is_bytes) {
pythontech	0:5868e8752d44	456	// b'\u1234' == b'\\u1234'
pythontech	0:5868e8752d44	457	vstr_add_char(&lex->vstr, '\\');
pythontech	0:5868e8752d44	458	break;
pythontech	0:5868e8752d44	459	}
pythontech	0:5868e8752d44	460	// Otherwise fall through.
pythontech	0:5868e8752d44	461	case 'x':
pythontech	0:5868e8752d44	462	{
pythontech	0:5868e8752d44	463	mp_uint_t num = 0;
pythontech	0:5868e8752d44	464	if (!get_hex(lex, (c == 'x' ? 2 : c == 'u' ? 4 : 8), &num)) {
pythontech	0:5868e8752d44	465	// not enough hex chars for escape sequence
pythontech	0:5868e8752d44	466	lex->tok_kind = MP_TOKEN_INVALID;
pythontech	0:5868e8752d44	467	}
pythontech	0:5868e8752d44	468	c = num;
pythontech	0:5868e8752d44	469	break;
pythontech	0:5868e8752d44	470	}
pythontech	0:5868e8752d44	471	case 'N':
pythontech	0:5868e8752d44	472	// Supporting '\N{LATIN SMALL LETTER A}' == 'a' would require keeping the
pythontech	0:5868e8752d44	473	// entire Unicode name table in the core. As of Unicode 6.3.0, that's nearly
pythontech	0:5868e8752d44	474	// 3MB of text; even gzip-compressed and with minimal structure, it'll take
pythontech	0:5868e8752d44	475	// roughly half a meg of storage. This form of Unicode escape may be added
pythontech	0:5868e8752d44	476	// later on, but it's definitely not a priority right now. -- CJA 20140607
pythontech	0:5868e8752d44	477	mp_not_implemented("unicode name escapes");
pythontech	0:5868e8752d44	478	break;
pythontech	0:5868e8752d44	479	default:
pythontech	0:5868e8752d44	480	if (c >= '0' && c <= '7') {
pythontech	0:5868e8752d44	481	// Octal sequence, 1-3 chars
pythontech	0:5868e8752d44	482	mp_uint_t digits = 3;
pythontech	0:5868e8752d44	483	mp_uint_t num = c - '0';
pythontech	0:5868e8752d44	484	while (is_following_odigit(lex) && --digits != 0) {
pythontech	0:5868e8752d44	485	next_char(lex);
pythontech	0:5868e8752d44	486	num = num * 8 + (CUR_CHAR(lex) - '0');
pythontech	0:5868e8752d44	487	}
pythontech	0:5868e8752d44	488	c = num;
pythontech	0:5868e8752d44	489	} else {
pythontech	0:5868e8752d44	490	// unrecognised escape character; CPython lets this through verbatim as '\' and then the character
pythontech	0:5868e8752d44	491	vstr_add_char(&lex->vstr, '\\');
pythontech	0:5868e8752d44	492	}
pythontech	0:5868e8752d44	493	break;
pythontech	0:5868e8752d44	494	}
pythontech	0:5868e8752d44	495	}
pythontech	0:5868e8752d44	496	if (c != MP_LEXER_EOF) {
pythontech	0:5868e8752d44	497	if (MICROPY_PY_BUILTINS_STR_UNICODE_DYNAMIC) {
pythontech	0:5868e8752d44	498	if (c < 0x110000 && !is_bytes) {
pythontech	0:5868e8752d44	499	vstr_add_char(&lex->vstr, c);
pythontech	0:5868e8752d44	500	} else if (c < 0x100 && is_bytes) {
pythontech	0:5868e8752d44	501	vstr_add_byte(&lex->vstr, c);
pythontech	0:5868e8752d44	502	} else {
pythontech	0:5868e8752d44	503	// unicode character out of range
pythontech	0:5868e8752d44	504	// this raises a generic SyntaxError; could provide more info
pythontech	0:5868e8752d44	505	lex->tok_kind = MP_TOKEN_INVALID;
pythontech	0:5868e8752d44	506	}
pythontech	0:5868e8752d44	507	} else {
pythontech	0:5868e8752d44	508	// without unicode everything is just added as an 8-bit byte
pythontech	0:5868e8752d44	509	if (c < 0x100) {
pythontech	0:5868e8752d44	510	vstr_add_byte(&lex->vstr, c);
pythontech	0:5868e8752d44	511	} else {
pythontech	0:5868e8752d44	512	// 8-bit character out of range
pythontech	0:5868e8752d44	513	// this raises a generic SyntaxError; could provide more info
pythontech	0:5868e8752d44	514	lex->tok_kind = MP_TOKEN_INVALID;
pythontech	0:5868e8752d44	515	}
pythontech	0:5868e8752d44	516	}
pythontech	0:5868e8752d44	517	}
pythontech	0:5868e8752d44	518	} else {
pythontech	0:5868e8752d44	519	// Add the "character" as a byte so that we remain 8-bit clean.
pythontech	0:5868e8752d44	520	// This way, strings are parsed correctly whether or not they contain utf-8 chars.
pythontech	0:5868e8752d44	521	vstr_add_byte(&lex->vstr, CUR_CHAR(lex));
pythontech	0:5868e8752d44	522	}
pythontech	0:5868e8752d44	523	}
pythontech	0:5868e8752d44	524	next_char(lex);
pythontech	0:5868e8752d44	525	}
pythontech	0:5868e8752d44	526
pythontech	0:5868e8752d44	527	// check we got the required end quotes
pythontech	0:5868e8752d44	528	if (n_closing < num_quotes) {
pythontech	0:5868e8752d44	529	lex->tok_kind = MP_TOKEN_LONELY_STRING_OPEN;
pythontech	0:5868e8752d44	530	}
pythontech	0:5868e8752d44	531
pythontech	0:5868e8752d44	532	// cut off the end quotes from the token text
pythontech	0:5868e8752d44	533	vstr_cut_tail_bytes(&lex->vstr, n_closing);
pythontech	0:5868e8752d44	534
pythontech	0:5868e8752d44	535	} else if (is_head_of_identifier(lex)) {
pythontech	0:5868e8752d44	536	lex->tok_kind = MP_TOKEN_NAME;
pythontech	0:5868e8752d44	537
pythontech	0:5868e8752d44	538	// get first char (add as byte to remain 8-bit clean and support utf-8)
pythontech	0:5868e8752d44	539	vstr_add_byte(&lex->vstr, CUR_CHAR(lex));
pythontech	0:5868e8752d44	540	next_char(lex);
pythontech	0:5868e8752d44	541
pythontech	0:5868e8752d44	542	// get tail chars
pythontech	0:5868e8752d44	543	while (!is_end(lex) && is_tail_of_identifier(lex)) {
pythontech	0:5868e8752d44	544	vstr_add_byte(&lex->vstr, CUR_CHAR(lex));
pythontech	0:5868e8752d44	545	next_char(lex);
pythontech	0:5868e8752d44	546	}
pythontech	0:5868e8752d44	547
pythontech	0:5868e8752d44	548	} else if (is_digit(lex) || (is_char(lex, '.') && is_following_digit(lex))) {
pythontech	0:5868e8752d44	549	bool forced_integer = false;
pythontech	0:5868e8752d44	550	if (is_char(lex, '.')) {
pythontech	0:5868e8752d44	551	lex->tok_kind = MP_TOKEN_FLOAT_OR_IMAG;
pythontech	0:5868e8752d44	552	} else {
pythontech	0:5868e8752d44	553	lex->tok_kind = MP_TOKEN_INTEGER;
pythontech	0:5868e8752d44	554	if (is_char(lex, '0') && is_following_base_char(lex)) {
pythontech	0:5868e8752d44	555	forced_integer = true;
pythontech	0:5868e8752d44	556	}
pythontech	0:5868e8752d44	557	}
pythontech	0:5868e8752d44	558
pythontech	0:5868e8752d44	559	// get first char
pythontech	0:5868e8752d44	560	vstr_add_char(&lex->vstr, CUR_CHAR(lex));
pythontech	0:5868e8752d44	561	next_char(lex);
pythontech	0:5868e8752d44	562
pythontech	0:5868e8752d44	563	// get tail chars
pythontech	0:5868e8752d44	564	while (!is_end(lex)) {
pythontech	0:5868e8752d44	565	if (!forced_integer && is_char_or(lex, 'e', 'E')) {
pythontech	0:5868e8752d44	566	lex->tok_kind = MP_TOKEN_FLOAT_OR_IMAG;
pythontech	0:5868e8752d44	567	vstr_add_char(&lex->vstr, 'e');
pythontech	0:5868e8752d44	568	next_char(lex);
pythontech	0:5868e8752d44	569	if (is_char(lex, '+') || is_char(lex, '-')) {
pythontech	0:5868e8752d44	570	vstr_add_char(&lex->vstr, CUR_CHAR(lex));
pythontech	0:5868e8752d44	571	next_char(lex);
pythontech	0:5868e8752d44	572	}
pythontech	0:5868e8752d44	573	} else if (is_letter(lex) || is_digit(lex) || is_char(lex, '.')) {
pythontech	0:5868e8752d44	574	if (is_char_or3(lex, '.', 'j', 'J')) {
pythontech	0:5868e8752d44	575	lex->tok_kind = MP_TOKEN_FLOAT_OR_IMAG;
pythontech	0:5868e8752d44	576	}
pythontech	0:5868e8752d44	577	vstr_add_char(&lex->vstr, CUR_CHAR(lex));
pythontech	0:5868e8752d44	578	next_char(lex);
pythontech	0:5868e8752d44	579	} else {
pythontech	0:5868e8752d44	580	break;
pythontech	0:5868e8752d44	581	}
pythontech	0:5868e8752d44	582	}
pythontech	0:5868e8752d44	583
pythontech	0:5868e8752d44	584	} else if (is_char(lex, '.')) {
pythontech	0:5868e8752d44	585	// special handling for . and ... operators, because .. is not a valid operator
pythontech	0:5868e8752d44	586
pythontech	0:5868e8752d44	587	// get first char
pythontech	0:5868e8752d44	588	vstr_add_char(&lex->vstr, '.');
pythontech	0:5868e8752d44	589	next_char(lex);
pythontech	0:5868e8752d44	590
pythontech	0:5868e8752d44	591	if (is_char_and(lex, '.', '.')) {
pythontech	0:5868e8752d44	592	vstr_add_char(&lex->vstr, '.');
pythontech	0:5868e8752d44	593	vstr_add_char(&lex->vstr, '.');
pythontech	0:5868e8752d44	594	next_char(lex);
pythontech	0:5868e8752d44	595	next_char(lex);
pythontech	0:5868e8752d44	596	lex->tok_kind = MP_TOKEN_ELLIPSIS;
pythontech	0:5868e8752d44	597	} else {
pythontech	0:5868e8752d44	598	lex->tok_kind = MP_TOKEN_DEL_PERIOD;
pythontech	0:5868e8752d44	599	}
pythontech	0:5868e8752d44	600
pythontech	0:5868e8752d44	601	} else {
pythontech	0:5868e8752d44	602	// search for encoded delimiter or operator
pythontech	0:5868e8752d44	603
pythontech	0:5868e8752d44	604	const char *t = tok_enc;
pythontech	0:5868e8752d44	605	mp_uint_t tok_enc_index = 0;
pythontech	0:5868e8752d44	606	for (; *t != 0 && !is_char(lex, *t); t += 1) {
pythontech	0:5868e8752d44	607	if (*t == 'e' || *t == 'c') {
pythontech	0:5868e8752d44	608	t += 1;
pythontech	0:5868e8752d44	609	} else if (*t == 'E') {
pythontech	0:5868e8752d44	610	tok_enc_index -= 1;
pythontech	0:5868e8752d44	611	t += 1;
pythontech	0:5868e8752d44	612	}
pythontech	0:5868e8752d44	613	tok_enc_index += 1;
pythontech	0:5868e8752d44	614	}
pythontech	0:5868e8752d44	615
pythontech	0:5868e8752d44	616	next_char(lex);
pythontech	0:5868e8752d44	617
pythontech	0:5868e8752d44	618	if (*t == 0) {
pythontech	0:5868e8752d44	619	// didn't match any delimiter or operator characters
pythontech	0:5868e8752d44	620	lex->tok_kind = MP_TOKEN_INVALID;
pythontech	0:5868e8752d44	621
pythontech	0:5868e8752d44	622	} else {
pythontech	0:5868e8752d44	623	// matched a delimiter or operator character
pythontech	0:5868e8752d44	624
pythontech	0:5868e8752d44	625	// get the maximum characters for a valid token
pythontech	0:5868e8752d44	626	t += 1;
pythontech	0:5868e8752d44	627	mp_uint_t t_index = tok_enc_index;
pythontech	0:5868e8752d44	628	for (;;) {
pythontech	0:5868e8752d44	629	for (; *t == 'e'; t += 1) {
pythontech	0:5868e8752d44	630	t += 1;
pythontech	0:5868e8752d44	631	t_index += 1;
pythontech	0:5868e8752d44	632	if (is_char(lex, *t)) {
pythontech	0:5868e8752d44	633	next_char(lex);
pythontech	0:5868e8752d44	634	tok_enc_index = t_index;
pythontech	0:5868e8752d44	635	break;
pythontech	0:5868e8752d44	636	}
pythontech	0:5868e8752d44	637	}
pythontech	0:5868e8752d44	638
pythontech	0:5868e8752d44	639	if (*t == 'E') {
pythontech	0:5868e8752d44	640	t += 1;
pythontech	0:5868e8752d44	641	if (is_char(lex, *t)) {
pythontech	0:5868e8752d44	642	next_char(lex);
pythontech	0:5868e8752d44	643	tok_enc_index = t_index;
pythontech	0:5868e8752d44	644	} else {
pythontech	0:5868e8752d44	645	lex->tok_kind = MP_TOKEN_INVALID;
pythontech	0:5868e8752d44	646	goto tok_enc_no_match;
pythontech	0:5868e8752d44	647	}
pythontech	0:5868e8752d44	648	break;
pythontech	0:5868e8752d44	649	}
pythontech	0:5868e8752d44	650
pythontech	0:5868e8752d44	651	if (*t == 'c') {
pythontech	0:5868e8752d44	652	t += 1;
pythontech	0:5868e8752d44	653	t_index += 1;
pythontech	0:5868e8752d44	654	if (is_char(lex, *t)) {
pythontech	0:5868e8752d44	655	next_char(lex);
pythontech	0:5868e8752d44	656	tok_enc_index = t_index;
pythontech	0:5868e8752d44	657	t += 1;
pythontech	0:5868e8752d44	658	} else {
pythontech	0:5868e8752d44	659	break;
pythontech	0:5868e8752d44	660	}
pythontech	0:5868e8752d44	661	} else {
pythontech	0:5868e8752d44	662	break;
pythontech	0:5868e8752d44	663	}
pythontech	0:5868e8752d44	664	}
pythontech	0:5868e8752d44	665
pythontech	0:5868e8752d44	666	// set token kind
pythontech	0:5868e8752d44	667	lex->tok_kind = tok_enc_kind[tok_enc_index];
pythontech	0:5868e8752d44	668
pythontech	0:5868e8752d44	669	tok_enc_no_match:
pythontech	0:5868e8752d44	670
pythontech	0:5868e8752d44	671	// compute bracket level for implicit line joining
pythontech	0:5868e8752d44	672	if (lex->tok_kind == MP_TOKEN_DEL_PAREN_OPEN || lex->tok_kind == MP_TOKEN_DEL_BRACKET_OPEN || lex->tok_kind == MP_TOKEN_DEL_BRACE_OPEN) {
pythontech	0:5868e8752d44	673	lex->nested_bracket_level += 1;
pythontech	0:5868e8752d44	674	} else if (lex->tok_kind == MP_TOKEN_DEL_PAREN_CLOSE || lex->tok_kind == MP_TOKEN_DEL_BRACKET_CLOSE || lex->tok_kind == MP_TOKEN_DEL_BRACE_CLOSE) {
pythontech	0:5868e8752d44	675	lex->nested_bracket_level -= 1;
pythontech	0:5868e8752d44	676	}
pythontech	0:5868e8752d44	677	}
pythontech	0:5868e8752d44	678	}
pythontech	0:5868e8752d44	679
pythontech	0:5868e8752d44	680	// check for keywords
pythontech	0:5868e8752d44	681	if (lex->tok_kind == MP_TOKEN_NAME) {
pythontech	0:5868e8752d44	682	// We check for __debug__ here and convert it to its value. This is so
pythontech	0:5868e8752d44	683	// the parser gives a syntax error on, eg, x.__debug__. Otherwise, we
pythontech	0:5868e8752d44	684	// need to check for this special token in many places in the compiler.
pythontech	0:5868e8752d44	685	// TODO improve speed of these string comparisons
pythontech	0:5868e8752d44	686	//for (mp_int_t i = 0; tok_kw[i] != NULL; i++) {
pythontech	0:5868e8752d44	687	for (size_t i = 0; i < MP_ARRAY_SIZE(tok_kw); i++) {
pythontech	0:5868e8752d44	688	if (str_strn_equal(tok_kw[i], lex->vstr.buf, lex->vstr.len)) {
pythontech	0:5868e8752d44	689	if (i == MP_ARRAY_SIZE(tok_kw) - 1) {
pythontech	0:5868e8752d44	690	// tok_kw[MP_ARRAY_SIZE(tok_kw) - 1] == "__debug__"
pythontech	0:5868e8752d44	691	lex->tok_kind = (MP_STATE_VM(mp_optimise_value) == 0 ? MP_TOKEN_KW_TRUE : MP_TOKEN_KW_FALSE);
pythontech	0:5868e8752d44	692	} else {
pythontech	0:5868e8752d44	693	lex->tok_kind = MP_TOKEN_KW_FALSE + i;
pythontech	0:5868e8752d44	694	}
pythontech	0:5868e8752d44	695	break;
pythontech	0:5868e8752d44	696	}
pythontech	0:5868e8752d44	697	}
pythontech	0:5868e8752d44	698	}
pythontech	0:5868e8752d44	699	}
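The numeric-literal branch of `mp_lexer_next_token_into` (the `is_digit(lex) || ...` case) only classifies and collects characters; validating the literal is left to a later stage. The following is a minimal standalone sketch of that classification over a NUL-terminated C string. `classify_number` and `num_kind_t` are hypothetical names, and the sketch assumes that `is_following_base_char` (not shown in this excerpt) accepts `b`, `o` or `x` in either case.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result kinds standing in for MP_TOKEN_INTEGER and
   MP_TOKEN_FLOAT_OR_IMAG. */
typedef enum { NUM_INTEGER, NUM_FLOAT_OR_IMAG } num_kind_t;

/* Scan a numeric literal at s, which must start with a digit or with '.'
   followed by a digit. Stores the number of characters consumed in *len.
   Like the lexer, this only classifies; it does not validate. */
static num_kind_t classify_number(const char *s, size_t *len) {
    bool forced_integer = false;
    num_kind_t kind;
    if (s[0] == '.') {
        kind = NUM_FLOAT_OR_IMAG;
    } else {
        kind = NUM_INTEGER;
        /* Assumed base prefixes: after 0x/0o/0b an 'e' is a digit, not an exponent */
        char b = (char)(s[1] | 0x20);
        if (s[0] == '0' && (b == 'x' || b == 'o' || b == 'b')) {
            forced_integer = true;
        }
    }
    size_t i = 1; /* first character already classified */
    while (s[i] != '\0') {
        unsigned char c = (unsigned char)s[i];
        if (!forced_integer && (c == 'e' || c == 'E')) {
            /* exponent marker, optionally followed by a sign */
            kind = NUM_FLOAT_OR_IMAG;
            i++;
            if (s[i] == '+' || s[i] == '-') {
                i++;
            }
        } else if (isalpha(c) || isdigit(c) || c == '.') {
            /* a '.' or an imaginary suffix makes it non-integer */
            if (c == '.' || c == 'j' || c == 'J') {
                kind = NUM_FLOAT_OR_IMAG;
            }
            i++;
        } else {
            break;
        }
    }
    *len = i;
    return kind;
}
```

For example, `0x1e` stays an integer because the base prefix sets `forced_integer`, so the `e` is collected as a hex digit, whereas `1e-3` becomes a float with the sign included in the token.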
pythontech	0:5868e8752d44	700
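The delimiter/operator search walks the compact `tok_enc` string, whose encoding (plain characters plus `e`, `c` and `E` markers) is defined earlier in the file and not shown here. Setting that encoding aside, the effect appears to be a greedy longest match: once a first character matches, optional continuation characters are consumed while they extend a valid token, and a character such as `!` that is only valid as part of `!=` yields `MP_TOKEN_INVALID` on its own. A minimal sketch of the same longest-match idea with a plain string table; the table contents are an assumption modeled on Python's operators rather than a copy of `tok_enc`, and `ops`/`match_operator` are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified operator table using plain strings instead of
   the compact encoding used by tok_enc. */
static const char *const ops[] = {
    "(", ")", "[", "]", "{", "}", ",", ":", ";", "@", "~",
    "<", "<=", "<<", "<<=",
    ">", ">=", ">>", ">>=",
    "*", "*=", "**", "**=",
    "+", "+=",
    "-", "-=", "->",
    "&", "&=",
    "|", "|=",
    "/", "/=", "//", "//=",
    "%", "%=",
    "^", "^=",
    "=", "==",
    "!=", /* '!' on its own is not a token */
};

/* Return the index in ops[] of the longest operator that is a prefix of s
   and store its length in *len; return -1 (and *len = 0) if none matches. */
static int match_operator(const char *s, size_t *len) {
    int best = -1;
    size_t best_len = 0;
    for (size_t i = 0; i < sizeof(ops) / sizeof(ops[0]); i++) {
        size_t n = strlen(ops[i]);
        /* strncmp stops at s's terminator, so short inputs never over-match */
        if (n > best_len && strncmp(s, ops[i], n) == 0) {
            best = (int)i;
            best_len = n;
        }
    }
    *len = best_len;
    return best;
}
```

The plain table is easier to read but stores every operator string in full; packing the table into one encoded string, as the lexer does, presumably trades readability for a smaller footprint on flash-constrained boards.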
pythontech	0:5868e8752d44	701	mp_lexer_t *mp_lexer_new(qstr src_name, void *stream_data, mp_lexer_stream_next_byte_t stream_next_byte, mp_lexer_stream_close_t stream_close) {
pythontech	0:5868e8752d44	702	mp_lexer_t *lex = m_new_obj_maybe(mp_lexer_t);
pythontech	0:5868e8752d44	703
pythontech	0:5868e8752d44	704	// check for memory allocation error
pythontech	0:5868e8752d44	705	if (lex == NULL) {
pythontech	0:5868e8752d44	706	if (stream_close) {
pythontech	0:5868e8752d44	707	stream_close(stream_data);
pythontech	0:5868e8752d44	708	}
pythontech	0:5868e8752d44	709	return NULL;
pythontech	0:5868e8752d44	710	}
pythontech	0:5868e8752d44	711
pythontech	0:5868e8752d44	712	lex->source_name = src_name;
pythontech	0:5868e8752d44	713	lex->stream_data = stream_data;
pythontech	0:5868e8752d44	714	lex->stream_next_byte = stream_next_byte;
pythontech	0:5868e8752d44	715	lex->stream_close = stream_close;
pythontech	0:5868e8752d44	716	lex->line = 1;
pythontech	0:5868e8752d44	717	lex->column = 1;
pythontech	0:5868e8752d44	718	lex->emit_dent = 0;
pythontech	0:5868e8752d44	719	lex->nested_bracket_level = 0;
pythontech	0:5868e8752d44	720	lex->alloc_indent_level = MICROPY_ALLOC_LEXER_INDENT_INIT;
pythontech	0:5868e8752d44	721	lex->num_indent_level = 1;
pythontech	0:5868e8752d44	722	lex->indent_level = m_new_maybe(uint16_t, lex->alloc_indent_level);
pythontech	0:5868e8752d44	723	vstr_init(&lex->vstr, 32);
pythontech	0:5868e8752d44	724
pythontech	0:5868e8752d44	725	// check for memory allocation error
pythontech	0:5868e8752d44	726	if (lex->indent_level == NULL || vstr_had_error(&lex->vstr)) {
pythontech	0:5868e8752d44	727	mp_lexer_free(lex);
pythontech	0:5868e8752d44	728	return NULL;
pythontech	0:5868e8752d44	729	}
pythontech	0:5868e8752d44	730
pythontech	0:5868e8752d44	731	// store sentinel for first indentation level
pythontech	0:5868e8752d44	732	lex->indent_level[0] = 0;
pythontech	0:5868e8752d44	733
pythontech	0:5868e8752d44	734	// preload characters
pythontech	0:5868e8752d44	735	lex->chr0 = stream_next_byte(stream_data);
pythontech	0:5868e8752d44	736	lex->chr1 = stream_next_byte(stream_data);
pythontech	0:5868e8752d44	737	lex->chr2 = stream_next_byte(stream_data);
pythontech	0:5868e8752d44	738
pythontech	0:5868e8752d44	739	// if input stream is 0, 1 or 2 characters long and doesn't end in a newline, then insert a newline at the end
pythontech	0:5868e8752d44	740	if (lex->chr0 == MP_LEXER_EOF) {
pythontech	0:5868e8752d44	741	lex->chr0 = '\n';
pythontech	0:5868e8752d44	742	} else if (lex->chr1 == MP_LEXER_EOF) {
pythontech	0:5868e8752d44	743	if (lex->chr0 == '\r') {
pythontech	0:5868e8752d44	744	lex->chr0 = '\n';
pythontech	0:5868e8752d44	745	} else if (lex->chr0 != '\n') {
pythontech	0:5868e8752d44	746	lex->chr1 = '\n';
pythontech	0:5868e8752d44	747	}
pythontech	0:5868e8752d44	748	} else if (lex->chr2 == MP_LEXER_EOF) {
pythontech	0:5868e8752d44	749	if (lex->chr1 == '\r') {
pythontech	0:5868e8752d44	750	lex->chr1 = '\n';
pythontech	0:5868e8752d44	751	} else if (lex->chr1 != '\n') {
pythontech	0:5868e8752d44	752	lex->chr2 = '\n';
pythontech	0:5868e8752d44	753	}
pythontech	0:5868e8752d44	754	}
pythontech	0:5868e8752d44	755
pythontech	0:5868e8752d44	756	// preload first token
pythontech	0:5868e8752d44	757	mp_lexer_next_token_into(lex, true);
pythontech	0:5868e8752d44	758
pythontech	0:5868e8752d44	759	return lex;
pythontech	0:5868e8752d44	760	}
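`mp_lexer_new` does not read a file itself: it pulls bytes through the `stream_next_byte` callback, preloads three characters of lookahead (`chr0` to `chr2`), and calls `stream_close` both on its allocation-failure path and from `mp_lexer_free`, so it appears to take ownership of the stream. The sketch below shows a string-backed stream in that callback style. The callback types, `SKETCH_EOF` and every function name are hypothetical stand-ins, not MicroPython's own declarations.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_EOF (-1) /* hypothetical stand-in for MP_LEXER_EOF */

/* Hypothetical callback types shaped like the stream_next_byte and
   stream_close parameters of mp_lexer_new. */
typedef int (*next_byte_fn)(void *data);
typedef void (*close_fn)(void *data);

/* A stream that hands out bytes from an in-memory string. */
typedef struct {
    const char *src;
    size_t pos;
    size_t len;
    bool closed;
} str_stream_t;

static int str_stream_next_byte(void *data) {
    str_stream_t *s = data;
    if (s->pos >= s->len) {
        return SKETCH_EOF; /* keeps returning EOF once exhausted */
    }
    return (unsigned char)s->src[s->pos++];
}

static void str_stream_close(void *data) {
    str_stream_t *s = data;
    s->closed = true;
}

/* Preload three lookahead characters, as mp_lexer_new does for chr0..chr2. */
static void preload(void *data, next_byte_fn next_byte, int chr[3]) {
    for (int i = 0; i < 3; i++) {
        chr[i] = next_byte(data);
    }
}

/* Mirror of the cleanup pattern: close the stream only if a close
   callback was supplied. */
static void release_stream(void *data, close_fn close) {
    if (close != NULL) {
        close(data);
    }
}
```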
pythontech	0:5868e8752d44	761
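After preloading, `mp_lexer_new` patches inputs of zero, one or two characters so that they end in a newline, turning a lone trailing `\r` into `\n` rather than adding a second line end. A standalone sketch of just that fix-up on a three-element lookahead array; `SKETCH_EOF` and `ensure_trailing_newline` are hypothetical names, and how longer inputs get their final newline is not shown in this excerpt.

```c
#define SKETCH_EOF (-1) /* hypothetical stand-in for MP_LEXER_EOF */

/* chr[] holds the preloaded chr0, chr1, chr2. If the stream ended within
   those three characters without a newline, make the last one a newline. */
static void ensure_trailing_newline(int chr[3]) {
    if (chr[0] == SKETCH_EOF) {
        /* empty input: a single newline */
        chr[0] = '\n';
    } else if (chr[1] == SKETCH_EOF) {
        if (chr[0] == '\r') {
            chr[0] = '\n';      /* lone CR becomes LF */
        } else if (chr[0] != '\n') {
            chr[1] = '\n';      /* append LF */
        }
    } else if (chr[2] == SKETCH_EOF) {
        if (chr[1] == '\r') {
            chr[1] = '\n';
        } else if (chr[1] != '\n') {
            chr[2] = '\n';
        }
    }
    /* three or more characters: left unchanged here */
}
```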
pythontech	0:5868e8752d44	762	void mp_lexer_free(mp_lexer_t *lex) {
pythontech	0:5868e8752d44	763	if (lex) {
pythontech	0:5868e8752d44	764	if (lex->stream_close) {
pythontech	0:5868e8752d44	765	lex->stream_close(lex->stream_data);
pythontech	0:5868e8752d44	766	}
pythontech	0:5868e8752d44	767	vstr_clear(&lex->vstr);
pythontech	0:5868e8752d44	768	m_del(uint16_t, lex->indent_level, lex->alloc_indent_level);
pythontech	0:5868e8752d44	769	m_del_obj(mp_lexer_t, lex);
pythontech	0:5868e8752d44	770	}
pythontech	0:5868e8752d44	771	}
pythontech	0:5868e8752d44	772
pythontech	0:5868e8752d44	773	void mp_lexer_to_next(mp_lexer_t *lex) {
pythontech	0:5868e8752d44	774	mp_lexer_next_token_into(lex, false);
pythontech	0:5868e8752d44	775	}
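Each call to `mp_lexer_to_next` runs `mp_lexer_next_token_into`, whose last step turns an `MP_TOKEN_NAME` into a keyword token by a linear scan of `tok_kw`, relying on the keyword token kinds being consecutive from `MP_TOKEN_KW_FALSE`. `__debug__` sits at the end of the table and is replaced by `True` or `False` depending on `MP_STATE_VM(mp_optimise_value)`. A minimal sketch of that lookup; the shortened keyword table, the `TOK_*` values and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token kinds; keyword kinds are consecutive, in table order. */
enum { TOK_NAME = 0, TOK_KW_FALSE = 1, TOK_KW_NONE, TOK_KW_TRUE, TOK_KW_AND };

/* Shortened, hypothetical keyword table; "__debug__" is kept last so it can
   be special-cased. */
static const char *const kw_table[] = { "False", "None", "True", "and", "__debug__" };
#define KW_COUNT (sizeof(kw_table) / sizeof(kw_table[0]))

/* Compare a NUL-terminated keyword with a length-delimited token buffer. */
static bool strn_equal(const char *str, const char *buf, size_t len) {
    return strlen(str) == len && memcmp(str, buf, len) == 0;
}

/* Map a name token to a keyword kind, or leave it as TOK_NAME. */
static int classify_name(const char *buf, size_t len, int optimise_value) {
    for (size_t i = 0; i < KW_COUNT; i++) {
        if (strn_equal(kw_table[i], buf, len)) {
            if (i == KW_COUNT - 1) {
                /* __debug__ is True unless optimisation is enabled */
                return optimise_value == 0 ? TOK_KW_TRUE : TOK_KW_FALSE;
            }
            return TOK_KW_FALSE + (int)i;
        }
    }
    return TOK_NAME;
}
```

Folding `__debug__` into a constant at this stage means, as the source comment notes, the parser rejects forms such as `x.__debug__` without the compiler having to special-case the name.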
pythontech	0:5868e8752d44	776
pythontech	0:5868e8752d44	777	#if MICROPY_DEBUG_PRINTERS
pythontech	0:5868e8752d44	778	void mp_lexer_show_token(const mp_lexer_t *lex) {
pythontech	0:5868e8752d44	779	printf("(" UINT_FMT ":" UINT_FMT ") kind:%u str:%p len:%zu", lex->tok_line, lex->tok_column, lex->tok_kind, lex->vstr.buf, lex->vstr.len);
pythontech	0:5868e8752d44	780	if (lex->vstr.len > 0) {
pythontech	0:5868e8752d44	781	const byte *i = (const byte *)lex->vstr.buf;
pythontech	0:5868e8752d44	782	const byte *j = (const byte *)i + lex->vstr.len;
pythontech	0:5868e8752d44	783	printf(" ");
pythontech	0:5868e8752d44	784	while (i < j) {
pythontech	0:5868e8752d44	785	unichar c = utf8_get_char(i);
pythontech	0:5868e8752d44	786	i = utf8_next_char(i);
pythontech	0:5868e8752d44	787	if (unichar_isprint(c)) {
pythontech	0:5868e8752d44	788	printf("%c", (int)c);
pythontech	0:5868e8752d44	789	} else {
pythontech	0:5868e8752d44	790	printf("?");
pythontech	0:5868e8752d44	791	}
pythontech	0:5868e8752d44	792	}
pythontech	0:5868e8752d44	793	}
pythontech	0:5868e8752d44	794	printf("\n");
pythontech	0:5868e8752d44	795	}
pythontech	0:5868e8752d44	796	#endif
pythontech	0:5868e8752d44	797
pythontech	0:5868e8752d44	798	#endif // MICROPY_ENABLE_COMPILER
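`mp_lexer_show_token`, compiled only with `MICROPY_DEBUG_PRINTERS`, walks the token text as UTF-8 with `utf8_get_char`/`utf8_next_char` and prints `?` for anything `unichar_isprint` rejects. The sketch below reproduces that rendering into a buffer so it can be checked. It assumes well-formed UTF-8 and assumes `unichar_isprint` accepts only printable ASCII (0x20 to 0x7E); `utf8_decode` and `render_token` are hypothetical names.

```c
#include <stddef.h>

/* Decode one UTF-8 sequence at *pp (not reading past end), advance *pp
   past it and return the code point. */
static unsigned long utf8_decode(const unsigned char **pp, const unsigned char *end) {
    const unsigned char *p = *pp;
    unsigned long c = *p++;
    if (c >= 0xc0) {
        /* lead byte: number of continuation bytes that follow */
        int extra = c >= 0xf0 ? 3 : c >= 0xe0 ? 2 : 1;
        c &= 0x3fu >> extra; /* keep the payload bits of the lead byte */
        while (extra-- > 0 && p < end && (*p & 0xc0) == 0x80) {
            c = (c << 6) | (*p++ & 0x3f);
        }
    }
    *pp = p;
    return c;
}

/* Render token text as show_token does: printable ASCII as-is, any other
   code point as '?'. out needs room for len + 1 bytes. */
static void render_token(const char *buf, size_t len, char *out) {
    const unsigned char *p = (const unsigned char *)buf;
    const unsigned char *end = p + len;
    size_t n = 0;
    while (p < end) {
        unsigned long c = utf8_decode(&p, end);
        out[n++] = (c >= 0x20 && c < 0x7f) ? (char)c : '?';
    }
    out[n] = '\0';
}
```

Decoding before testing printability means a multi-byte character such as `é` produces a single `?` rather than one per byte.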

Repository details

Type:	Library
Created:	27 Apr 2016
Imports:	279
Forks:	0
Commits:	11
Dependents:	1
Dependencies:	0
Followers:	18

The code in this repository is MIT licensed.