Port of MicroPython to the mbed platform. See micropython-repl for an interactive program.

Dependents:   micropython-repl

This is a port of MicroPython to the mbed Classic platform.

This provides an interpreter running on the board's USB serial connection.

Getting Started

Import the micropython-repl program into your IDE workspace on developer.mbed.org. Compile and download to your board. Connect to the USB serial port in your usual manner. You should get a startup message similar to the following:

  MicroPython v1.7-155-gdddcdd8 on 2016-04-23; K64F with ARM
  Type "help()" for more information.
  >>>

Then you can start using micropython. For example:

  >>> from mbed import DigitalOut
  >>> from pins import LED1
  >>> led = DigitalOut(LED1)
  >>> led.write(1)
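
As a slightly longer example, the following is a minimal sketch (not part of the port) that toggles LED1 a few times. It assumes only the DigitalOut class and the write method shown above. busy_wait and blink are hypothetical helper names, the delay is a crude counting loop because no sleep function is assumed to exist in this port, and the fallback class exists only so the sketch can be tried off the board.

```python
# Sketch only: toggle LED1 a few times from MicroPython on mbed.
try:
    from mbed import DigitalOut  # on the board, as in the session above
    from pins import LED1
except ImportError:
    # Hypothetical desktop stand-ins (assumption, not part of the port).
    LED1 = "LED1"

    class DigitalOut:
        def __init__(self, pin):
            self.pin = pin
            self.value = 0

        def write(self, value):
            self.value = value


def busy_wait(count):
    # Crude delay: count down in a loop (assumes no sleep function exists).
    while count > 0:
        count -= 1


def blink(led, times, delay_count=100000):
    # Write 1 then 0 to the output `times` times; the output ends at 0.
    for _ in range(times):
        led.write(1)
        busy_wait(delay_count)
        led.write(0)
        busy_wait(delay_count)


led = DigitalOut(LED1)
blink(led, 3)
```

Whether writing 1 lights the LED depends on how the LED is wired on the board, so the sketch only makes claims about the value written.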

Requirements

The port needs approximately 100 KB of flash memory, so it is not suitable for boards with less storage.

Caveats

This can be considered an alpha release of the port; things may not work, and APIs may change in later releases. It is NOT an official part of the micropython project, so if anything doesn't work, blame me. If it does work, most of the credit is due to micropython.

  • Only a few of the mbed classes are available in micropython so far, and not all methods of the available classes are implemented.
  • Only a few boards have their full range of pin names available; for others, only a few standard ones (USBTX, USBRX, LED1) are implemented.
  • The garbage collector is not yet implemented. The interpreter will gradually consume memory and then fail.
  • Exceptions from the mbed classes are not yet handled.
  • Asynchronous processing (e.g. events on inputs) is not supported.
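
Because the set of pin names depends on the board, it can help to check which names exist before using them. The following is a sketch, assuming that pins is an ordinary module whose attributes are the pin names (as the import in Getting Started suggests). available_pins is a hypothetical helper, and the fallback class is a desktop stand-in that lists only the standard names mentioned above.

```python
# Sketch only: report which of the requested pin names this build defines.
try:
    import pins  # on the board
except ImportError:
    # Hypothetical desktop stand-in with only the standard names.
    class pins:
        USBTX = "USBTX"
        USBRX = "USBRX"
        LED1 = "LED1"


def available_pins(names):
    # Keep only the names that exist as attributes of the pins module.
    return [name for name in names if hasattr(pins, name)]


print(available_pins(["LED1", "LED2", "USBTX", "USBRX", "D0"]))
```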

Credits

  • Damien P. George and other contributors who created micropython.
  • Colin Hogben, author of this port.

Committer: Colin Hogben
Date: Wed Apr 27 22:11:29 2016 +0100
Revision: 10:33521d742af1
Parent: 2:c89e95946844

Update README and version

Who changed what in which revision?

User  Revision  Line number  New contents of line
pythontech 0:5868e8752d44 1 /*
pythontech 0:5868e8752d44 2 * This file is part of the Micro Python project, http://micropython.org/
pythontech 0:5868e8752d44 3 *
pythontech 0:5868e8752d44 4 * The MIT License (MIT)
pythontech 0:5868e8752d44 5 *
pythontech 0:5868e8752d44 6 * Copyright (c) 2013, 2014 Damien P. George
pythontech 0:5868e8752d44 7 * Copyright (c) 2014 Paul Sokolovsky
pythontech 0:5868e8752d44 8 *
pythontech 0:5868e8752d44 9 * Permission is hereby granted, free of charge, to any person obtaining a copy
pythontech 0:5868e8752d44 10 * of this software and associated documentation files (the "Software"), to deal
pythontech 0:5868e8752d44 11 * in the Software without restriction, including without limitation the rights
pythontech 0:5868e8752d44 12 * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
pythontech 0:5868e8752d44 13 * copies of the Software, and to permit persons to whom the Software is
pythontech 0:5868e8752d44 14 * furnished to do so, subject to the following conditions:
pythontech 0:5868e8752d44 15 *
pythontech 0:5868e8752d44 16 * The above copyright notice and this permission notice shall be included in
pythontech 0:5868e8752d44 17 * all copies or substantial portions of the Software.
pythontech 0:5868e8752d44 18 *
pythontech 0:5868e8752d44 19 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
pythontech 0:5868e8752d44 20 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
pythontech 0:5868e8752d44 21 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
pythontech 0:5868e8752d44 22 * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
pythontech 0:5868e8752d44 23 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
pythontech 0:5868e8752d44 24 * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
pythontech 0:5868e8752d44 25 * THE SOFTWARE.
pythontech 0:5868e8752d44 26 */
pythontech 0:5868e8752d44 27
pythontech 0:5868e8752d44 28 #include <string.h>
pythontech 0:5868e8752d44 29 #include <assert.h>
pythontech 0:5868e8752d44 30
pythontech 0:5868e8752d44 31 #include "py/nlr.h"
pythontech 0:5868e8752d44 32 #include "py/unicode.h"
pythontech 0:5868e8752d44 33 #include "py/objstr.h"
pythontech 0:5868e8752d44 34 #include "py/objlist.h"
pythontech 0:5868e8752d44 35 #include "py/runtime0.h"
pythontech 0:5868e8752d44 36 #include "py/runtime.h"
pythontech 0:5868e8752d44 37 #include "py/stackctrl.h"
pythontech 0:5868e8752d44 38
pythontech 0:5868e8752d44 39 STATIC mp_obj_t str_modulo_format(mp_obj_t pattern, mp_uint_t n_args, const mp_obj_t *args, mp_obj_t dict);
pythontech 0:5868e8752d44 40
pythontech 0:5868e8752d44 41 STATIC mp_obj_t mp_obj_new_bytes_iterator(mp_obj_t str);
pythontech 0:5868e8752d44 42 STATIC NORETURN void bad_implicit_conversion(mp_obj_t self_in);
pythontech 0:5868e8752d44 43
pythontech 0:5868e8752d44 44 /******************************************************************************/
pythontech 0:5868e8752d44 45 /* str */
pythontech 0:5868e8752d44 46
pythontech 0:5868e8752d44 47 void mp_str_print_quoted(const mp_print_t *print, const byte *str_data, mp_uint_t str_len, bool is_bytes) {
pythontech 0:5868e8752d44 48 // this escapes characters, but it will be very slow to print (calling print many times)
pythontech 0:5868e8752d44 49 bool has_single_quote = false;
pythontech 0:5868e8752d44 50 bool has_double_quote = false;
pythontech 0:5868e8752d44 51 for (const byte *s = str_data, *top = str_data + str_len; !has_double_quote && s < top; s++) {
pythontech 0:5868e8752d44 52 if (*s == '\'') {
pythontech 0:5868e8752d44 53 has_single_quote = true;
pythontech 0:5868e8752d44 54 } else if (*s == '"') {
pythontech 0:5868e8752d44 55 has_double_quote = true;
pythontech 0:5868e8752d44 56 }
pythontech 0:5868e8752d44 57 }
pythontech 0:5868e8752d44 58 int quote_char = '\'';
pythontech 0:5868e8752d44 59 if (has_single_quote && !has_double_quote) {
pythontech 0:5868e8752d44 60 quote_char = '"';
pythontech 0:5868e8752d44 61 }
pythontech 0:5868e8752d44 62 mp_printf(print, "%c", quote_char);
pythontech 0:5868e8752d44 63 for (const byte *s = str_data, *top = str_data + str_len; s < top; s++) {
pythontech 0:5868e8752d44 64 if (*s == quote_char) {
pythontech 0:5868e8752d44 65 mp_printf(print, "\\%c", quote_char);
pythontech 0:5868e8752d44 66 } else if (*s == '\\') {
pythontech 0:5868e8752d44 67 mp_print_str(print, "\\\\");
pythontech 0:5868e8752d44 68 } else if (*s >= 0x20 && *s != 0x7f && (!is_bytes || *s < 0x80)) {
pythontech 0:5868e8752d44 69 // In strings, anything which is not ascii control character
pythontech 0:5868e8752d44 70 // is printed as is, this includes characters in range 0x80-0xff
pythontech 0:5868e8752d44 71 // (which can be non-Latin letters, etc.)
pythontech 0:5868e8752d44 72 mp_printf(print, "%c", *s);
pythontech 0:5868e8752d44 73 } else if (*s == '\n') {
pythontech 0:5868e8752d44 74 mp_print_str(print, "\\n");
pythontech 0:5868e8752d44 75 } else if (*s == '\r') {
pythontech 0:5868e8752d44 76 mp_print_str(print, "\\r");
pythontech 0:5868e8752d44 77 } else if (*s == '\t') {
pythontech 0:5868e8752d44 78 mp_print_str(print, "\\t");
pythontech 0:5868e8752d44 79 } else {
pythontech 0:5868e8752d44 80 mp_printf(print, "\\x%02x", *s);
pythontech 0:5868e8752d44 81 }
pythontech 0:5868e8752d44 82 }
pythontech 0:5868e8752d44 83 mp_printf(print, "%c", quote_char);
pythontech 0:5868e8752d44 84 }
pythontech 0:5868e8752d44 85
pythontech 0:5868e8752d44 86 #if MICROPY_PY_UJSON
pythontech 0:5868e8752d44 87 void mp_str_print_json(const mp_print_t *print, const byte *str_data, size_t str_len) {
pythontech 0:5868e8752d44 88 // for JSON spec, see http://www.ietf.org/rfc/rfc4627.txt
pythontech 0:5868e8752d44 89 // if we are given a valid utf8-encoded string, we will print it in a JSON-conforming way
pythontech 0:5868e8752d44 90 mp_print_str(print, "\"");
pythontech 0:5868e8752d44 91 for (const byte *s = str_data, *top = str_data + str_len; s < top; s++) {
pythontech 0:5868e8752d44 92 if (*s == '"' || *s == '\\') {
pythontech 0:5868e8752d44 93 mp_printf(print, "\\%c", *s);
pythontech 0:5868e8752d44 94 } else if (*s >= 32) {
pythontech 0:5868e8752d44 95 // this will handle normal and utf-8 encoded chars
pythontech 0:5868e8752d44 96 mp_printf(print, "%c", *s);
pythontech 0:5868e8752d44 97 } else if (*s == '\n') {
pythontech 0:5868e8752d44 98 mp_print_str(print, "\\n");
pythontech 0:5868e8752d44 99 } else if (*s == '\r') {
pythontech 0:5868e8752d44 100 mp_print_str(print, "\\r");
pythontech 0:5868e8752d44 101 } else if (*s == '\t') {
pythontech 0:5868e8752d44 102 mp_print_str(print, "\\t");
pythontech 0:5868e8752d44 103 } else {
pythontech 0:5868e8752d44 104 // this will handle control chars
pythontech 0:5868e8752d44 105 mp_printf(print, "\\u%04x", *s);
pythontech 0:5868e8752d44 106 }
pythontech 0:5868e8752d44 107 }
pythontech 0:5868e8752d44 108 mp_print_str(print, "\"");
pythontech 0:5868e8752d44 109 }
pythontech 0:5868e8752d44 110 #endif
pythontech 0:5868e8752d44 111
pythontech 0:5868e8752d44 112 STATIC void str_print(const mp_print_t *print, mp_obj_t self_in, mp_print_kind_t kind) {
pythontech 0:5868e8752d44 113 GET_STR_DATA_LEN(self_in, str_data, str_len);
pythontech 0:5868e8752d44 114 #if MICROPY_PY_UJSON
pythontech 0:5868e8752d44 115 if (kind == PRINT_JSON) {
pythontech 0:5868e8752d44 116 mp_str_print_json(print, str_data, str_len);
pythontech 0:5868e8752d44 117 return;
pythontech 0:5868e8752d44 118 }
pythontech 0:5868e8752d44 119 #endif
pythontech 0:5868e8752d44 120 #if !MICROPY_PY_BUILTINS_STR_UNICODE
pythontech 0:5868e8752d44 121 bool is_bytes = MP_OBJ_IS_TYPE(self_in, &mp_type_bytes);
pythontech 0:5868e8752d44 122 #else
pythontech 0:5868e8752d44 123 bool is_bytes = true;
pythontech 0:5868e8752d44 124 #endif
pythontech 0:5868e8752d44 125 if (kind == PRINT_RAW || (!MICROPY_PY_BUILTINS_STR_UNICODE && kind == PRINT_STR && !is_bytes)) {
pythontech 0:5868e8752d44 126 mp_printf(print, "%.*s", str_len, str_data);
pythontech 0:5868e8752d44 127 } else {
pythontech 0:5868e8752d44 128 if (is_bytes) {
pythontech 0:5868e8752d44 129 mp_print_str(print, "b");
pythontech 0:5868e8752d44 130 }
pythontech 0:5868e8752d44 131 mp_str_print_quoted(print, str_data, str_len, is_bytes);
pythontech 0:5868e8752d44 132 }
pythontech 0:5868e8752d44 133 }
pythontech 0:5868e8752d44 134
pythontech 0:5868e8752d44 135 mp_obj_t mp_obj_str_make_new(const mp_obj_type_t *type, size_t n_args, size_t n_kw, const mp_obj_t *args) {
pythontech 0:5868e8752d44 136 #if MICROPY_CPYTHON_COMPAT
pythontech 0:5868e8752d44 137 if (n_kw != 0) {
pythontech 0:5868e8752d44 138 mp_arg_error_unimpl_kw();
pythontech 0:5868e8752d44 139 }
pythontech 0:5868e8752d44 140 #endif
pythontech 0:5868e8752d44 141
pythontech 0:5868e8752d44 142 mp_arg_check_num(n_args, n_kw, 0, 3, false);
pythontech 0:5868e8752d44 143
pythontech 0:5868e8752d44 144 switch (n_args) {
pythontech 0:5868e8752d44 145 case 0:
pythontech 0:5868e8752d44 146 return MP_OBJ_NEW_QSTR(MP_QSTR_);
pythontech 0:5868e8752d44 147
pythontech 0:5868e8752d44 148 case 1: {
pythontech 0:5868e8752d44 149 vstr_t vstr;
pythontech 0:5868e8752d44 150 mp_print_t print;
pythontech 0:5868e8752d44 151 vstr_init_print(&vstr, 16, &print);
pythontech 0:5868e8752d44 152 mp_obj_print_helper(&print, args[0], PRINT_STR);
pythontech 0:5868e8752d44 153 return mp_obj_new_str_from_vstr(type, &vstr);
pythontech 0:5868e8752d44 154 }
pythontech 0:5868e8752d44 155
pythontech 0:5868e8752d44 156 default: // 2 or 3 args
pythontech 0:5868e8752d44 157 // TODO: validate 2nd/3rd args
pythontech 0:5868e8752d44 158 if (MP_OBJ_IS_TYPE(args[0], &mp_type_bytes)) {
pythontech 0:5868e8752d44 159 GET_STR_DATA_LEN(args[0], str_data, str_len);
pythontech 0:5868e8752d44 160 GET_STR_HASH(args[0], str_hash);
pythontech 0:5868e8752d44 161 mp_obj_str_t *o = MP_OBJ_TO_PTR(mp_obj_new_str_of_type(type, NULL, str_len));
pythontech 0:5868e8752d44 162 o->data = str_data;
pythontech 0:5868e8752d44 163 o->hash = str_hash;
pythontech 0:5868e8752d44 164 return MP_OBJ_FROM_PTR(o);
pythontech 0:5868e8752d44 165 } else {
pythontech 0:5868e8752d44 166 mp_buffer_info_t bufinfo;
pythontech 0:5868e8752d44 167 mp_get_buffer_raise(args[0], &bufinfo, MP_BUFFER_READ);
pythontech 0:5868e8752d44 168 return mp_obj_new_str(bufinfo.buf, bufinfo.len, false);
pythontech 0:5868e8752d44 169 }
pythontech 0:5868e8752d44 170 }
pythontech 0:5868e8752d44 171 }
pythontech 0:5868e8752d44 172
pythontech 0:5868e8752d44 173 STATIC mp_obj_t bytes_make_new(const mp_obj_type_t *type_in, size_t n_args, size_t n_kw, const mp_obj_t *args) {
pythontech 0:5868e8752d44 174 (void)type_in;
pythontech 0:5868e8752d44 175
pythontech 0:5868e8752d44 176 #if MICROPY_CPYTHON_COMPAT
pythontech 0:5868e8752d44 177 if (n_kw != 0) {
pythontech 0:5868e8752d44 178 mp_arg_error_unimpl_kw();
pythontech 0:5868e8752d44 179 }
pythontech 0:5868e8752d44 180 #else
pythontech 0:5868e8752d44 181 (void)n_kw;
pythontech 0:5868e8752d44 182 #endif
pythontech 0:5868e8752d44 183
pythontech 0:5868e8752d44 184 if (n_args == 0) {
pythontech 0:5868e8752d44 185 return mp_const_empty_bytes;
pythontech 0:5868e8752d44 186 }
pythontech 0:5868e8752d44 187
pythontech 0:5868e8752d44 188 if (MP_OBJ_IS_STR(args[0])) {
pythontech 0:5868e8752d44 189 if (n_args < 2 || n_args > 3) {
pythontech 0:5868e8752d44 190 goto wrong_args;
pythontech 0:5868e8752d44 191 }
pythontech 0:5868e8752d44 192 GET_STR_DATA_LEN(args[0], str_data, str_len);
pythontech 0:5868e8752d44 193 GET_STR_HASH(args[0], str_hash);
pythontech 0:5868e8752d44 194 mp_obj_str_t *o = MP_OBJ_TO_PTR(mp_obj_new_str_of_type(&mp_type_bytes, NULL, str_len));
pythontech 0:5868e8752d44 195 o->data = str_data;
pythontech 0:5868e8752d44 196 o->hash = str_hash;
pythontech 0:5868e8752d44 197 return MP_OBJ_FROM_PTR(o);
pythontech 0:5868e8752d44 198 }
pythontech 0:5868e8752d44 199
pythontech 0:5868e8752d44 200 if (n_args > 1) {
pythontech 0:5868e8752d44 201 goto wrong_args;
pythontech 0:5868e8752d44 202 }
pythontech 0:5868e8752d44 203
pythontech 0:5868e8752d44 204 if (MP_OBJ_IS_SMALL_INT(args[0])) {
pythontech 0:5868e8752d44 205 uint len = MP_OBJ_SMALL_INT_VALUE(args[0]);
pythontech 0:5868e8752d44 206 vstr_t vstr;
pythontech 0:5868e8752d44 207 vstr_init_len(&vstr, len);
pythontech 0:5868e8752d44 208 memset(vstr.buf, 0, len);
pythontech 0:5868e8752d44 209 return mp_obj_new_str_from_vstr(&mp_type_bytes, &vstr);
pythontech 0:5868e8752d44 210 }
pythontech 0:5868e8752d44 211
pythontech 0:5868e8752d44 212 // check if argument has the buffer protocol
pythontech 0:5868e8752d44 213 mp_buffer_info_t bufinfo;
pythontech 0:5868e8752d44 214 if (mp_get_buffer(args[0], &bufinfo, MP_BUFFER_READ)) {
pythontech 0:5868e8752d44 215 return mp_obj_new_str_of_type(&mp_type_bytes, bufinfo.buf, bufinfo.len);
pythontech 0:5868e8752d44 216 }
pythontech 0:5868e8752d44 217
pythontech 0:5868e8752d44 218 vstr_t vstr;
pythontech 0:5868e8752d44 219 // Try to create array of exact len if initializer len is known
pythontech 0:5868e8752d44 220 mp_obj_t len_in = mp_obj_len_maybe(args[0]);
pythontech 0:5868e8752d44 221 if (len_in == MP_OBJ_NULL) {
pythontech 0:5868e8752d44 222 vstr_init(&vstr, 16);
pythontech 0:5868e8752d44 223 } else {
pythontech 0:5868e8752d44 224 mp_int_t len = MP_OBJ_SMALL_INT_VALUE(len_in);
pythontech 0:5868e8752d44 225 vstr_init(&vstr, len);
pythontech 0:5868e8752d44 226 }
pythontech 0:5868e8752d44 227
pythontech 0:5868e8752d44 228 mp_obj_t iterable = mp_getiter(args[0]);
pythontech 0:5868e8752d44 229 mp_obj_t item;
pythontech 0:5868e8752d44 230 while ((item = mp_iternext(iterable)) != MP_OBJ_STOP_ITERATION) {
pythontech 0:5868e8752d44 231 mp_int_t val = mp_obj_get_int(item);
pythontech 0:5868e8752d44 232 #if MICROPY_CPYTHON_COMPAT
pythontech 0:5868e8752d44 233 if (val < 0 || val > 255) {
pythontech 0:5868e8752d44 234 nlr_raise(mp_obj_new_exception_msg(&mp_type_ValueError, "bytes value out of range"));
pythontech 0:5868e8752d44 235 }
pythontech 0:5868e8752d44 236 #endif
pythontech 0:5868e8752d44 237 vstr_add_byte(&vstr, val);
pythontech 0:5868e8752d44 238 }
pythontech 0:5868e8752d44 239
pythontech 0:5868e8752d44 240 return mp_obj_new_str_from_vstr(&mp_type_bytes, &vstr);
pythontech 0:5868e8752d44 241
pythontech 0:5868e8752d44 242 wrong_args:
pythontech 0:5868e8752d44 243 nlr_raise(mp_obj_new_exception_msg(&mp_type_TypeError, "wrong number of arguments"));
pythontech 0:5868e8752d44 244 }
pythontech 0:5868e8752d44 245
pythontech 0:5868e8752d44 246 // like strstr but with specified length and allows \0 bytes
pythontech 0:5868e8752d44 247 // TODO replace with something more efficient/standard
pythontech 0:5868e8752d44 248 const byte *find_subbytes(const byte *haystack, mp_uint_t hlen, const byte *needle, mp_uint_t nlen, mp_int_t direction) {
pythontech 0:5868e8752d44 249 if (hlen >= nlen) {
pythontech 0:5868e8752d44 250 mp_uint_t str_index, str_index_end;
pythontech 0:5868e8752d44 251 if (direction > 0) {
pythontech 0:5868e8752d44 252 str_index = 0;
pythontech 0:5868e8752d44 253 str_index_end = hlen - nlen;
pythontech 0:5868e8752d44 254 } else {
pythontech 0:5868e8752d44 255 str_index = hlen - nlen;
pythontech 0:5868e8752d44 256 str_index_end = 0;
pythontech 0:5868e8752d44 257 }
pythontech 0:5868e8752d44 258 for (;;) {
pythontech 0:5868e8752d44 259 if (memcmp(&haystack[str_index], needle, nlen) == 0) {
pythontech 0:5868e8752d44 260 //found
pythontech 0:5868e8752d44 261 return haystack + str_index;
pythontech 0:5868e8752d44 262 }
pythontech 0:5868e8752d44 263 if (str_index == str_index_end) {
pythontech 0:5868e8752d44 264 //not found
pythontech 0:5868e8752d44 265 break;
pythontech 0:5868e8752d44 266 }
pythontech 0:5868e8752d44 267 str_index += direction;
pythontech 0:5868e8752d44 268 }
pythontech 0:5868e8752d44 269 }
pythontech 0:5868e8752d44 270 return NULL;
pythontech 0:5868e8752d44 271 }
pythontech 0:5868e8752d44 272
pythontech 0:5868e8752d44 273 // Note: this function is used to check if an object is a str or bytes, which
pythontech 0:5868e8752d44 274 // works because both those types use it as their binary_op method. Revisit
pythontech 0:5868e8752d44 275 // MP_OBJ_IS_STR_OR_BYTES if this fact changes.
pythontech 0:5868e8752d44 276 mp_obj_t mp_obj_str_binary_op(mp_uint_t op, mp_obj_t lhs_in, mp_obj_t rhs_in) {
pythontech 0:5868e8752d44 277 // check for modulo
pythontech 0:5868e8752d44 278 if (op == MP_BINARY_OP_MODULO) {
pythontech 0:5868e8752d44 279 mp_obj_t *args;
pythontech 0:5868e8752d44 280 mp_uint_t n_args;
pythontech 0:5868e8752d44 281 mp_obj_t dict = MP_OBJ_NULL;
pythontech 0:5868e8752d44 282 if (MP_OBJ_IS_TYPE(rhs_in, &mp_type_tuple)) {
pythontech 0:5868e8752d44 283 // TODO: Support tuple subclasses?
pythontech 0:5868e8752d44 284 mp_obj_tuple_get(rhs_in, &n_args, &args);
pythontech 0:5868e8752d44 285 } else if (MP_OBJ_IS_TYPE(rhs_in, &mp_type_dict)) {
pythontech 0:5868e8752d44 286 args = NULL;
pythontech 0:5868e8752d44 287 n_args = 0;
pythontech 0:5868e8752d44 288 dict = rhs_in;
pythontech 0:5868e8752d44 289 } else {
pythontech 0:5868e8752d44 290 args = &rhs_in;
pythontech 0:5868e8752d44 291 n_args = 1;
pythontech 0:5868e8752d44 292 }
pythontech 0:5868e8752d44 293 return str_modulo_format(lhs_in, n_args, args, dict);
pythontech 0:5868e8752d44 294 }
pythontech 0:5868e8752d44 295
pythontech 0:5868e8752d44 296 // from now on we need lhs type and data, so extract them
pythontech 0:5868e8752d44 297 mp_obj_type_t *lhs_type = mp_obj_get_type(lhs_in);
pythontech 0:5868e8752d44 298 GET_STR_DATA_LEN(lhs_in, lhs_data, lhs_len);
pythontech 0:5868e8752d44 299
pythontech 0:5868e8752d44 300 // check for multiply
pythontech 0:5868e8752d44 301 if (op == MP_BINARY_OP_MULTIPLY) {
pythontech 0:5868e8752d44 302 mp_int_t n;
pythontech 0:5868e8752d44 303 if (!mp_obj_get_int_maybe(rhs_in, &n)) {
pythontech 0:5868e8752d44 304 return MP_OBJ_NULL; // op not supported
pythontech 0:5868e8752d44 305 }
pythontech 0:5868e8752d44 306 if (n <= 0) {
pythontech 0:5868e8752d44 307 if (lhs_type == &mp_type_str) {
pythontech 0:5868e8752d44 308 return MP_OBJ_NEW_QSTR(MP_QSTR_); // empty str
pythontech 0:5868e8752d44 309 } else {
pythontech 0:5868e8752d44 310 return mp_const_empty_bytes;
pythontech 0:5868e8752d44 311 }
pythontech 0:5868e8752d44 312 }
pythontech 0:5868e8752d44 313 vstr_t vstr;
pythontech 0:5868e8752d44 314 vstr_init_len(&vstr, lhs_len * n);
pythontech 0:5868e8752d44 315 mp_seq_multiply(lhs_data, sizeof(*lhs_data), lhs_len, n, vstr.buf);
pythontech 0:5868e8752d44 316 return mp_obj_new_str_from_vstr(lhs_type, &vstr);
pythontech 0:5868e8752d44 317 }
pythontech 0:5868e8752d44 318
pythontech 0:5868e8752d44 319 // From now on all operations allow:
pythontech 0:5868e8752d44 320 // - str with str
pythontech 0:5868e8752d44 321 // - bytes with bytes
pythontech 0:5868e8752d44 322 // - bytes with bytearray
pythontech 0:5868e8752d44 323 // - bytes with array.array
pythontech 0:5868e8752d44 324 // To do this efficiently we use the buffer protocol to extract the raw
pythontech 0:5868e8752d44 325 // data for the rhs, but only if the lhs is a bytes object.
pythontech 0:5868e8752d44 326 //
pythontech 0:5868e8752d44 327 // NOTE: CPython does not allow comparison between bytes and array.array
pythontech 0:5868e8752d44 328 // (even if the array is of type 'b'), even though it allows addition of
pythontech 0:5868e8752d44 329 // such types. We are not compatible with this (we do allow comparison
pythontech 0:5868e8752d44 330 // of bytes with anything that has the buffer protocol). It would be
pythontech 0:5868e8752d44 331 // easy to "fix" this with a bit of extra logic below, but it costs code
pythontech 0:5868e8752d44 332 // size and execution time so we don't.
pythontech 0:5868e8752d44 333
pythontech 0:5868e8752d44 334 const byte *rhs_data;
pythontech 0:5868e8752d44 335 mp_uint_t rhs_len;
pythontech 0:5868e8752d44 336 if (lhs_type == mp_obj_get_type(rhs_in)) {
pythontech 0:5868e8752d44 337 GET_STR_DATA_LEN(rhs_in, rhs_data_, rhs_len_);
pythontech 0:5868e8752d44 338 rhs_data = rhs_data_;
pythontech 0:5868e8752d44 339 rhs_len = rhs_len_;
pythontech 0:5868e8752d44 340 } else if (lhs_type == &mp_type_bytes) {
pythontech 0:5868e8752d44 341 mp_buffer_info_t bufinfo;
pythontech 0:5868e8752d44 342 if (!mp_get_buffer(rhs_in, &bufinfo, MP_BUFFER_READ)) {
pythontech 0:5868e8752d44 343 return MP_OBJ_NULL; // op not supported
pythontech 0:5868e8752d44 344 }
pythontech 0:5868e8752d44 345 rhs_data = bufinfo.buf;
pythontech 0:5868e8752d44 346 rhs_len = bufinfo.len;
pythontech 0:5868e8752d44 347 } else {
pythontech 0:5868e8752d44 348 // incompatible types
pythontech 0:5868e8752d44 349 return MP_OBJ_NULL; // op not supported
pythontech 0:5868e8752d44 350 }
pythontech 0:5868e8752d44 351
pythontech 0:5868e8752d44 352 switch (op) {
pythontech 0:5868e8752d44 353 case MP_BINARY_OP_ADD:
pythontech 0:5868e8752d44 354 case MP_BINARY_OP_INPLACE_ADD: {
pythontech 0:5868e8752d44 355 vstr_t vstr;
pythontech 0:5868e8752d44 356 vstr_init_len(&vstr, lhs_len + rhs_len);
pythontech 0:5868e8752d44 357 memcpy(vstr.buf, lhs_data, lhs_len);
pythontech 0:5868e8752d44 358 memcpy(vstr.buf + lhs_len, rhs_data, rhs_len);
pythontech 0:5868e8752d44 359 return mp_obj_new_str_from_vstr(lhs_type, &vstr);
pythontech 0:5868e8752d44 360 }
pythontech 0:5868e8752d44 361
pythontech 0:5868e8752d44 362 case MP_BINARY_OP_IN:
pythontech 0:5868e8752d44 363 /* NOTE `a in b` is `b.__contains__(a)` */
pythontech 0:5868e8752d44 364 return mp_obj_new_bool(find_subbytes(lhs_data, lhs_len, rhs_data, rhs_len, 1) != NULL);
pythontech 0:5868e8752d44 365
pythontech 0:5868e8752d44 366 //case MP_BINARY_OP_NOT_EQUAL: // This is never passed here
pythontech 0:5868e8752d44 367 case MP_BINARY_OP_EQUAL: // This will be passed only for bytes, str is dealt with in mp_obj_equal()
pythontech 0:5868e8752d44 368 case MP_BINARY_OP_LESS:
pythontech 0:5868e8752d44 369 case MP_BINARY_OP_LESS_EQUAL:
pythontech 0:5868e8752d44 370 case MP_BINARY_OP_MORE:
pythontech 0:5868e8752d44 371 case MP_BINARY_OP_MORE_EQUAL:
pythontech 0:5868e8752d44 372 return mp_obj_new_bool(mp_seq_cmp_bytes(op, lhs_data, lhs_len, rhs_data, rhs_len));
pythontech 0:5868e8752d44 373 }
pythontech 0:5868e8752d44 374
pythontech 0:5868e8752d44 375 return MP_OBJ_NULL; // op not supported
pythontech 0:5868e8752d44 376 }
pythontech 0:5868e8752d44 377
pythontech 0:5868e8752d44 378 #if !MICROPY_PY_BUILTINS_STR_UNICODE
pythontech 0:5868e8752d44 379 // objstrunicode defines own version
pythontech 0:5868e8752d44 380 const byte *str_index_to_ptr(const mp_obj_type_t *type, const byte *self_data, size_t self_len,
pythontech 0:5868e8752d44 381 mp_obj_t index, bool is_slice) {
pythontech 0:5868e8752d44 382 mp_uint_t index_val = mp_get_index(type, self_len, index, is_slice);
pythontech 0:5868e8752d44 383 return self_data + index_val;
pythontech 0:5868e8752d44 384 }
pythontech 0:5868e8752d44 385 #endif
pythontech 0:5868e8752d44 386
pythontech 0:5868e8752d44 387 // This is used for both bytes and 8-bit strings. This is not used for unicode strings.
pythontech 0:5868e8752d44 388 STATIC mp_obj_t bytes_subscr(mp_obj_t self_in, mp_obj_t index, mp_obj_t value) {
pythontech 0:5868e8752d44 389 mp_obj_type_t *type = mp_obj_get_type(self_in);
pythontech 0:5868e8752d44 390 GET_STR_DATA_LEN(self_in, self_data, self_len);
pythontech 0:5868e8752d44 391 if (value == MP_OBJ_SENTINEL) {
pythontech 0:5868e8752d44 392 // load
pythontech 0:5868e8752d44 393 #if MICROPY_PY_BUILTINS_SLICE
pythontech 0:5868e8752d44 394 if (MP_OBJ_IS_TYPE(index, &mp_type_slice)) {
pythontech 0:5868e8752d44 395 mp_bound_slice_t slice;
pythontech 0:5868e8752d44 396 if (!mp_seq_get_fast_slice_indexes(self_len, index, &slice)) {
pythontech 0:5868e8752d44 397 mp_not_implemented("only slices with step=1 (aka None) are supported");
pythontech 0:5868e8752d44 398 }
pythontech 0:5868e8752d44 399 return mp_obj_new_str_of_type(type, self_data + slice.start, slice.stop - slice.start);
pythontech 0:5868e8752d44 400 }
pythontech 0:5868e8752d44 401 #endif
pythontech 0:5868e8752d44 402 mp_uint_t index_val = mp_get_index(type, self_len, index, false);
pythontech 0:5868e8752d44 403 // If we have unicode enabled the type will always be bytes, so take the short cut.
pythontech 0:5868e8752d44 404 if (MICROPY_PY_BUILTINS_STR_UNICODE || type == &mp_type_bytes) {
pythontech 0:5868e8752d44 405 return MP_OBJ_NEW_SMALL_INT(self_data[index_val]);
pythontech 0:5868e8752d44 406 } else {
pythontech 0:5868e8752d44 407 return mp_obj_new_str((char*)&self_data[index_val], 1, true);
pythontech 0:5868e8752d44 408 }
pythontech 0:5868e8752d44 409 } else {
pythontech 0:5868e8752d44 410 return MP_OBJ_NULL; // op not supported
pythontech 0:5868e8752d44 411 }
pythontech 0:5868e8752d44 412 }
pythontech 0:5868e8752d44 413
pythontech 0:5868e8752d44 414 STATIC mp_obj_t str_join(mp_obj_t self_in, mp_obj_t arg) {
pythontech 0:5868e8752d44 415 assert(MP_OBJ_IS_STR_OR_BYTES(self_in));
pythontech 0:5868e8752d44 416 const mp_obj_type_t *self_type = mp_obj_get_type(self_in);
pythontech 0:5868e8752d44 417
pythontech 0:5868e8752d44 418 // get separation string
pythontech 0:5868e8752d44 419 GET_STR_DATA_LEN(self_in, sep_str, sep_len);
pythontech 0:5868e8752d44 420
pythontech 0:5868e8752d44 421 // process args
pythontech 0:5868e8752d44 422 mp_uint_t seq_len;
pythontech 0:5868e8752d44 423 mp_obj_t *seq_items;
pythontech 0:5868e8752d44 424 if (MP_OBJ_IS_TYPE(arg, &mp_type_tuple)) {
pythontech 0:5868e8752d44 425 mp_obj_tuple_get(arg, &seq_len, &seq_items);
pythontech 0:5868e8752d44 426 } else {
pythontech 0:5868e8752d44 427 if (!MP_OBJ_IS_TYPE(arg, &mp_type_list)) {
pythontech 0:5868e8752d44 428 // arg is not a list, try to convert it to one
pythontech 0:5868e8752d44 429 // TODO: Try to optimize?
pythontech 0:5868e8752d44 430 arg = mp_type_list.make_new(&mp_type_list, 1, 0, &arg);
pythontech 0:5868e8752d44 431 }
pythontech 0:5868e8752d44 432 mp_obj_list_get(arg, &seq_len, &seq_items);
pythontech 0:5868e8752d44 433 }
pythontech 0:5868e8752d44 434
pythontech 0:5868e8752d44 435 // count required length
pythontech 0:5868e8752d44 436 mp_uint_t required_len = 0;
pythontech 0:5868e8752d44 437 for (mp_uint_t i = 0; i < seq_len; i++) {
pythontech 0:5868e8752d44 438 if (mp_obj_get_type(seq_items[i]) != self_type) {
pythontech 0:5868e8752d44 439 nlr_raise(mp_obj_new_exception_msg(&mp_type_TypeError,
pythontech 0:5868e8752d44 440 "join expects a list of str/bytes objects consistent with self object"));
pythontech 0:5868e8752d44 441 }
pythontech 0:5868e8752d44 442 if (i > 0) {
pythontech 0:5868e8752d44 443 required_len += sep_len;
pythontech 0:5868e8752d44 444 }
pythontech 0:5868e8752d44 445 GET_STR_LEN(seq_items[i], l);
pythontech 0:5868e8752d44 446 required_len += l;
pythontech 0:5868e8752d44 447 }
pythontech 0:5868e8752d44 448
pythontech 0:5868e8752d44 449 // make joined string
pythontech 0:5868e8752d44 450 vstr_t vstr;
pythontech 0:5868e8752d44 451 vstr_init_len(&vstr, required_len);
pythontech 0:5868e8752d44 452 byte *data = (byte*)vstr.buf;
pythontech 0:5868e8752d44 453 for (mp_uint_t i = 0; i < seq_len; i++) {
pythontech 0:5868e8752d44 454 if (i > 0) {
pythontech 0:5868e8752d44 455 memcpy(data, sep_str, sep_len);
pythontech 0:5868e8752d44 456 data += sep_len;
pythontech 0:5868e8752d44 457 }
pythontech 0:5868e8752d44 458 GET_STR_DATA_LEN(seq_items[i], s, l);
pythontech 0:5868e8752d44 459 memcpy(data, s, l);
pythontech 0:5868e8752d44 460 data += l;
pythontech 0:5868e8752d44 461 }
pythontech 0:5868e8752d44 462
pythontech 0:5868e8752d44 463 // return joined string
pythontech 0:5868e8752d44 464 return mp_obj_new_str_from_vstr(self_type, &vstr);
pythontech 0:5868e8752d44 465 }
pythontech 0:5868e8752d44 466
pythontech 0:5868e8752d44 467 enum {SPLIT = 0, KEEP = 1, SPLITLINES = 2};
pythontech 0:5868e8752d44 468
pythontech 0:5868e8752d44 469 STATIC inline mp_obj_t str_split_internal(mp_uint_t n_args, const mp_obj_t *args, int type) {
pythontech 0:5868e8752d44 470 const mp_obj_type_t *self_type = mp_obj_get_type(args[0]);
pythontech 0:5868e8752d44 471 mp_int_t splits = -1;
pythontech 0:5868e8752d44 472 mp_obj_t sep = mp_const_none;
pythontech 0:5868e8752d44 473 if (n_args > 1) {
pythontech 0:5868e8752d44 474 sep = args[1];
pythontech 0:5868e8752d44 475 if (n_args > 2) {
pythontech 0:5868e8752d44 476 splits = mp_obj_get_int(args[2]);
pythontech 0:5868e8752d44 477 }
pythontech 0:5868e8752d44 478 }
pythontech 0:5868e8752d44 479
pythontech 0:5868e8752d44 480 mp_obj_t res = mp_obj_new_list(0, NULL);
pythontech 0:5868e8752d44 481 GET_STR_DATA_LEN(args[0], s, len);
pythontech 0:5868e8752d44 482 const byte *top = s + len;
pythontech 0:5868e8752d44 483
pythontech 0:5868e8752d44 484 if (sep == mp_const_none) {
pythontech 0:5868e8752d44 485 // sep not given, so separate on whitespace
pythontech 0:5868e8752d44 486
pythontech 0:5868e8752d44 487 // Initial whitespace is not counted as split, so we pre-do it
pythontech 0:5868e8752d44 488 while (s < top && unichar_isspace(*s)) s++;
pythontech 0:5868e8752d44 489 while (s < top && splits != 0) {
pythontech 0:5868e8752d44 490 const byte *start = s;
pythontech 0:5868e8752d44 491 while (s < top && !unichar_isspace(*s)) s++;
pythontech 0:5868e8752d44 492 mp_obj_list_append(res, mp_obj_new_str_of_type(self_type, start, s - start));
pythontech 0:5868e8752d44 493 if (s >= top) {
pythontech 0:5868e8752d44 494 break;
pythontech 0:5868e8752d44 495 }
pythontech 0:5868e8752d44 496 while (s < top && unichar_isspace(*s)) s++;
pythontech 0:5868e8752d44 497 if (splits > 0) {
pythontech 0:5868e8752d44 498 splits--;
pythontech 0:5868e8752d44 499 }
pythontech 0:5868e8752d44 500 }
pythontech 0:5868e8752d44 501
pythontech 0:5868e8752d44 502 if (s < top) {
pythontech 0:5868e8752d44 503 mp_obj_list_append(res, mp_obj_new_str_of_type(self_type, s, top - s));
pythontech 0:5868e8752d44 504 }
pythontech 0:5868e8752d44 505
pythontech 0:5868e8752d44 506 } else {
pythontech 0:5868e8752d44 507 // sep given
pythontech 0:5868e8752d44 508 if (mp_obj_get_type(sep) != self_type) {
pythontech 0:5868e8752d44 509 bad_implicit_conversion(sep);
pythontech 0:5868e8752d44 510 }
pythontech 0:5868e8752d44 511
pythontech 0:5868e8752d44 512 mp_uint_t sep_len;
pythontech 0:5868e8752d44 513 const char *sep_str = mp_obj_str_get_data(sep, &sep_len);
pythontech 0:5868e8752d44 514
pythontech 0:5868e8752d44 515 if (sep_len == 0) {
pythontech 0:5868e8752d44 516 nlr_raise(mp_obj_new_exception_msg(&mp_type_ValueError, "empty separator"));
pythontech 0:5868e8752d44 517 }
pythontech 0:5868e8752d44 518
pythontech 0:5868e8752d44 519 for (;;) {
pythontech 0:5868e8752d44 520 const byte *start = s;
pythontech 0:5868e8752d44 521 for (;;) {
pythontech 0:5868e8752d44 522 if (splits == 0 || s + sep_len > top) {
pythontech 0:5868e8752d44 523 s = top;
pythontech 0:5868e8752d44 524 break;
pythontech 0:5868e8752d44 525 } else if (memcmp(s, sep_str, sep_len) == 0) {
pythontech 0:5868e8752d44 526 break;
pythontech 0:5868e8752d44 527 }
pythontech 0:5868e8752d44 528 s++;
pythontech 0:5868e8752d44 529 }
pythontech 0:5868e8752d44 530 mp_uint_t sub_len = s - start;
pythontech 0:5868e8752d44 531 if (MP_LIKELY(!(sub_len == 0 && s == top && (type & SPLITLINES)))) {
pythontech 0:5868e8752d44 532 if (start + sub_len != top && (type & KEEP)) {
pythontech 0:5868e8752d44 533 sub_len++;
pythontech 0:5868e8752d44 534 }
pythontech 0:5868e8752d44 535 mp_obj_list_append(res, mp_obj_new_str_of_type(self_type, start, sub_len));
pythontech 0:5868e8752d44 536 }
pythontech 0:5868e8752d44 537 if (s >= top) {
pythontech 0:5868e8752d44 538 break;
pythontech 0:5868e8752d44 539 }
pythontech 0:5868e8752d44 540 s += sep_len;
pythontech 0:5868e8752d44 541 if (splits > 0) {
pythontech 0:5868e8752d44 542 splits--;
pythontech 0:5868e8752d44 543 }
pythontech 0:5868e8752d44 544 }
pythontech 0:5868e8752d44 545 }
pythontech 0:5868e8752d44 546
pythontech 0:5868e8752d44 547 return res;
pythontech 0:5868e8752d44 548 }
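The whitespace branch of the split routine above can be sketched in pure Python. This is only an illustration of the loop structure, not the port's code: the name `split_whitespace` is hypothetical, and Python's `str.isspace` stands in for `unichar_isspace`, which may classify fewer characters as whitespace.

```python
def split_whitespace(text, maxsplit=-1):
    """Split on runs of whitespace, following the loop structure of the C code."""
    result = []
    pos, top = 0, len(text)
    # Leading whitespace never counts as a split, so skip it up front.
    while pos < top and text[pos].isspace():
        pos += 1
    splits = maxsplit  # a negative value means "no limit"
    while pos < top and splits != 0:
        start = pos
        while pos < top and not text[pos].isspace():
            pos += 1
        result.append(text[start:pos])
        if pos >= top:
            break
        # Consume the whole run of whitespace before the next word.
        while pos < top and text[pos].isspace():
            pos += 1
        if splits > 0:
            splits -= 1
    # Once the limit is reached, the remainder is kept as one final item.
    if pos < top:
        result.append(text[pos:])
    return result
```

With a limit, the remainder keeps its internal and trailing whitespace, which matches what CPython's `str.split(None, n)` does.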
pythontech 0:5868e8752d44 549
pythontech 0:5868e8752d44 550 mp_obj_t mp_obj_str_split(size_t n_args, const mp_obj_t *args) {
pythontech 0:5868e8752d44 551 return str_split_internal(n_args, args, SPLIT);
pythontech 0:5868e8752d44 552 }
pythontech 0:5868e8752d44 553
pythontech 0:5868e8752d44 554 #if MICROPY_PY_BUILTINS_STR_SPLITLINES
pythontech 0:5868e8752d44 555 STATIC mp_obj_t str_splitlines(size_t n_args, const mp_obj_t *pos_args, mp_map_t *kw_args) {
pythontech 0:5868e8752d44 556 static const mp_arg_t allowed_args[] = {
pythontech 0:5868e8752d44 557 { MP_QSTR_keepends, MP_ARG_BOOL, {.u_bool = false} },
pythontech 0:5868e8752d44 558 };
pythontech 0:5868e8752d44 559
pythontech 0:5868e8752d44 560 // parse args
pythontech 0:5868e8752d44 561 struct {
pythontech 0:5868e8752d44 562 mp_arg_val_t keepends;
pythontech 0:5868e8752d44 563 } args;
pythontech 0:5868e8752d44 564 mp_arg_parse_all(n_args - 1, pos_args + 1, kw_args,
pythontech 0:5868e8752d44 565 MP_ARRAY_SIZE(allowed_args), allowed_args, (mp_arg_val_t*)&args);
pythontech 0:5868e8752d44 566
Colin Hogben 2:c89e95946844 567 mp_obj_t new_args[2] = {pos_args[0], MP_OBJ_NEW_QSTR(MP_QSTR__0x0a_)};
pythontech 0:5868e8752d44 568 return str_split_internal(2, new_args, SPLITLINES | (args.keepends.u_bool ? KEEP : 0));
pythontech 0:5868e8752d44 569 }
pythontech 0:5868e8752d44 570 #endif
pythontech 0:5868e8752d44 571
pythontech 0:5868e8752d44 572 STATIC mp_obj_t str_rsplit(size_t n_args, const mp_obj_t *args) {
pythontech 0:5868e8752d44 573 if (n_args < 3) {
pythontech 0:5868e8752d44 574 // Without a split limit, it doesn't matter from which side
pythontech 0:5868e8752d44 575 // we split, so reuse the forward split.
pythontech 0:5868e8752d44 576 return mp_obj_str_split(n_args, args);
pythontech 0:5868e8752d44 577 }
pythontech 0:5868e8752d44 578 const mp_obj_type_t *self_type = mp_obj_get_type(args[0]);
pythontech 0:5868e8752d44 579 mp_obj_t sep = args[1];
pythontech 0:5868e8752d44 580 GET_STR_DATA_LEN(args[0], s, len);
pythontech 0:5868e8752d44 581
pythontech 0:5868e8752d44 582 mp_int_t splits = mp_obj_get_int(args[2]);
pythontech 0:5868e8752d44 583 mp_int_t org_splits = splits;
pythontech 0:5868e8752d44 584 // Preallocate list to the max expected # of elements, as we
pythontech 0:5868e8752d44 585 // will fill it from the end.
pythontech 0:5868e8752d44 586 mp_obj_list_t *res = MP_OBJ_TO_PTR(mp_obj_new_list(splits + 1, NULL));
pythontech 0:5868e8752d44 587 mp_int_t idx = splits;
pythontech 0:5868e8752d44 588
pythontech 0:5868e8752d44 589 if (sep == mp_const_none) {
pythontech 0:5868e8752d44 590 mp_not_implemented("rsplit(None,n)");
pythontech 0:5868e8752d44 591 } else {
pythontech 0:5868e8752d44 592 mp_uint_t sep_len;
pythontech 0:5868e8752d44 593 const char *sep_str = mp_obj_str_get_data(sep, &sep_len);
pythontech 0:5868e8752d44 594
pythontech 0:5868e8752d44 595 if (sep_len == 0) {
pythontech 0:5868e8752d44 596 nlr_raise(mp_obj_new_exception_msg(&mp_type_ValueError, "empty separator"));
pythontech 0:5868e8752d44 597 }
pythontech 0:5868e8752d44 598
pythontech 0:5868e8752d44 599 const byte *beg = s;
pythontech 0:5868e8752d44 600 const byte *last = s + len;
pythontech 0:5868e8752d44 601 for (;;) {
pythontech 0:5868e8752d44 602 s = last - sep_len;
pythontech 0:5868e8752d44 603 for (;;) {
pythontech 0:5868e8752d44 604 if (splits == 0 || s < beg) {
pythontech 0:5868e8752d44 605 break;
pythontech 0:5868e8752d44 606 } else if (memcmp(s, sep_str, sep_len) == 0) {
pythontech 0:5868e8752d44 607 break;
pythontech 0:5868e8752d44 608 }
pythontech 0:5868e8752d44 609 s--;
pythontech 0:5868e8752d44 610 }
pythontech 0:5868e8752d44 611 if (s < beg || splits == 0) {
pythontech 0:5868e8752d44 612 res->items[idx] = mp_obj_new_str_of_type(self_type, beg, last - beg);
pythontech 0:5868e8752d44 613 break;
pythontech 0:5868e8752d44 614 }
pythontech 0:5868e8752d44 615 res->items[idx--] = mp_obj_new_str_of_type(self_type, s + sep_len, last - s - sep_len);
pythontech 0:5868e8752d44 616 last = s;
pythontech 0:5868e8752d44 617 if (splits > 0) {
pythontech 0:5868e8752d44 618 splits--;
pythontech 0:5868e8752d44 619 }
pythontech 0:5868e8752d44 620 }
pythontech 0:5868e8752d44 621 if (idx != 0) {
pythontech 0:5868e8752d44 622 // We produced fewer parts than the split limit allows, so clean up the surplus slots
pythontech 0:5868e8752d44 623 mp_int_t used = org_splits + 1 - idx;
pythontech 0:5868e8752d44 624 memmove(res->items, &res->items[idx], used * sizeof(mp_obj_t));
pythontech 0:5868e8752d44 625 mp_seq_clear(res->items, used, res->alloc, sizeof(*res->items));
pythontech 0:5868e8752d44 626 res->len = used;
pythontech 0:5868e8752d44 627 }
pythontech 0:5868e8752d44 628 }
pythontech 0:5868e8752d44 629
pythontech 0:5868e8752d44 630 return MP_OBJ_FROM_PTR(res);
pythontech 0:5868e8752d44 631 }
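The fill-from-the-end strategy used by the right-hand split above can be sketched as follows. This is a minimal illustration under stated assumptions: `rsplit_limited` is a hypothetical name, the sketch only accepts a non-negative limit, and slicing replaces the `memmove` that compacts the preallocated list.

```python
def rsplit_limited(text, sep, maxsplit):
    """Split from the right with a limit, filling a preallocated list backwards."""
    if maxsplit < 0:
        raise ValueError("this sketch only covers a non-negative limit")
    if not sep:
        raise ValueError("empty separator")
    # Preallocate the maximum number of items; slots are filled from the end.
    items = [None] * (maxsplit + 1)
    idx = maxsplit
    splits = maxsplit
    last = len(text)
    while True:
        pos = last - len(sep)
        # Scan leftwards for the next separator, unless the limit is used up.
        while splits != 0 and pos >= 0 and text[pos:pos + len(sep)] != sep:
            pos -= 1
        if pos < 0 or splits == 0:
            # No more splits: everything left of 'last' is the first item.
            items[idx] = text[:last]
            break
        items[idx] = text[pos + len(sep):last]
        idx -= 1
        last = pos
        splits -= 1
    # Drop the unused leading slots (the C code moves the items down instead).
    return items[idx:]
```

Preallocating avoids repeated insertion at the front of the list, at the cost of a final compaction step when fewer separators are found than the limit allows.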
pythontech 0:5868e8752d44 632
pythontech 0:5868e8752d44 633 STATIC mp_obj_t str_finder(mp_uint_t n_args, const mp_obj_t *args, mp_int_t direction, bool is_index) {
pythontech 0:5868e8752d44 634 const mp_obj_type_t *self_type = mp_obj_get_type(args[0]);
pythontech 0:5868e8752d44 635 assert(2 <= n_args && n_args <= 4);
pythontech 0:5868e8752d44 636 assert(MP_OBJ_IS_STR_OR_BYTES(args[0]));
pythontech 0:5868e8752d44 637
pythontech 0:5868e8752d44 638 // check argument type
pythontech 0:5868e8752d44 639 if (mp_obj_get_type(args[1]) != self_type) {
pythontech 0:5868e8752d44 640 bad_implicit_conversion(args[1]);
pythontech 0:5868e8752d44 641 }
pythontech 0:5868e8752d44 642
pythontech 0:5868e8752d44 643 GET_STR_DATA_LEN(args[0], haystack, haystack_len);
pythontech 0:5868e8752d44 644 GET_STR_DATA_LEN(args[1], needle, needle_len);
pythontech 0:5868e8752d44 645
pythontech 0:5868e8752d44 646 const byte *start = haystack;
pythontech 0:5868e8752d44 647 const byte *end = haystack + haystack_len;
pythontech 0:5868e8752d44 648 if (n_args >= 3 && args[2] != mp_const_none) {
pythontech 0:5868e8752d44 649 start = str_index_to_ptr(self_type, haystack, haystack_len, args[2], true);
pythontech 0:5868e8752d44 650 }
pythontech 0:5868e8752d44 651 if (n_args >= 4 && args[3] != mp_const_none) {
pythontech 0:5868e8752d44 652 end = str_index_to_ptr(self_type, haystack, haystack_len, args[3], true);
pythontech 0:5868e8752d44 653 }
pythontech 0:5868e8752d44 654
pythontech 0:5868e8752d44 655 const byte *p = find_subbytes(start, end - start, needle, needle_len, direction);
pythontech 0:5868e8752d44 656 if (p == NULL) {
pythontech 0:5868e8752d44 657 // not found
pythontech 0:5868e8752d44 658 if (is_index) {
pythontech 0:5868e8752d44 659 nlr_raise(mp_obj_new_exception_msg(&mp_type_ValueError, "substring not found"));
pythontech 0:5868e8752d44 660 } else {
pythontech 0:5868e8752d44 661 return MP_OBJ_NEW_SMALL_INT(-1);
pythontech 0:5868e8752d44 662 }
pythontech 0:5868e8752d44 663 } else {
pythontech 0:5868e8752d44 664 // found
pythontech 0:5868e8752d44 665 #if MICROPY_PY_BUILTINS_STR_UNICODE
pythontech 0:5868e8752d44 666 if (self_type == &mp_type_str) {
pythontech 0:5868e8752d44 667 return MP_OBJ_NEW_SMALL_INT(utf8_ptr_to_index(haystack, p));
pythontech 0:5868e8752d44 668 }
pythontech 0:5868e8752d44 669 #endif
pythontech 0:5868e8752d44 670 return MP_OBJ_NEW_SMALL_INT(p - haystack);
pythontech 0:5868e8752d44 671 }
pythontech 0:5868e8752d44 672 }
pythontech 0:5868e8752d44 673
pythontech 0:5868e8752d44 674 STATIC mp_obj_t str_find(size_t n_args, const mp_obj_t *args) {
pythontech 0:5868e8752d44 675 return str_finder(n_args, args, 1, false);
pythontech 0:5868e8752d44 676 }
pythontech 0:5868e8752d44 677
pythontech 0:5868e8752d44 678 STATIC mp_obj_t str_rfind(size_t n_args, const mp_obj_t *args) {
pythontech 0:5868e8752d44 679 return str_finder(n_args, args, -1, false);
pythontech 0:5868e8752d44 680 }
pythontech 0:5868e8752d44 681
pythontech 0:5868e8752d44 682 STATIC mp_obj_t str_index(size_t n_args, const mp_obj_t *args) {
pythontech 0:5868e8752d44 683 return str_finder(n_args, args, 1, true);
pythontech 0:5868e8752d44 684 }
pythontech 0:5868e8752d44 685
pythontech 0:5868e8752d44 686 STATIC mp_obj_t str_rindex(size_t n_args, const mp_obj_t *args) {
pythontech 0:5868e8752d44 687 return str_finder(n_args, args, -1, true);
pythontech 0:5868e8752d44 688 }
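The four search methods above share one helper and differ only in direction and in how a miss is reported. The sketch below illustrates that split, along with why a `str` result needs converting from a byte offset to a character index when strings are stored as UTF-8. Both function names are hypothetical, and counting non-continuation bytes is an assumption about how such a conversion can work, not a description of `utf8_ptr_to_index`.

```python
def finder(haystack, needle, direction, is_index):
    """One search routine, two failure modes (return -1 or raise)."""
    pos = haystack.find(needle) if direction > 0 else haystack.rfind(needle)
    if pos < 0 and is_index:
        raise ValueError("substring not found")
    return pos


def utf8_offset_to_index(data, offset):
    """Count the characters that start before a byte offset in UTF-8 data."""
    # Continuation bytes look like 0b10xxxxxx and do not start a character.
    return sum(1 for b in data[:offset] if (b & 0xC0) != 0x80)


text = "h\u00e9llo"               # the second character takes two bytes in UTF-8
raw = text.encode("utf-8")
byte_offset = finder(raw, b"l", 1, False)
char_index = utf8_offset_to_index(raw, byte_offset)
```

For `bytes` objects the byte offset is already the right answer, which is why only the `str` case goes through the conversion.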
pythontech 0:5868e8752d44 689
pythontech 0:5868e8752d44 690 // TODO: (Much) more variety in args
pythontech 0:5868e8752d44 691 STATIC mp_obj_t str_startswith(size_t n_args, const mp_obj_t *args) {
pythontech 0:5868e8752d44 692 const mp_obj_type_t *self_type = mp_obj_get_type(args[0]);
pythontech 0:5868e8752d44 693 GET_STR_DATA_LEN(args[0], str, str_len);
pythontech 0:5868e8752d44 694 GET_STR_DATA_LEN(args[1], prefix, prefix_len);
pythontech 0:5868e8752d44 695 const byte *start = str;
pythontech 0:5868e8752d44 696 if (n_args > 2) {
pythontech 0:5868e8752d44 697 start = str_index_to_ptr(self_type, str, str_len, args[2], true);
pythontech 0:5868e8752d44 698 }
pythontech 0:5868e8752d44 699 if (prefix_len + (start - str) > str_len) {
pythontech 0:5868e8752d44 700 return mp_const_false;
pythontech 0:5868e8752d44 701 }
pythontech 0:5868e8752d44 702 return mp_obj_new_bool(memcmp(start, prefix, prefix_len) == 0);
pythontech 0:5868e8752d44 703 }
pythontech 0:5868e8752d44 704
pythontech 0:5868e8752d44 705 STATIC mp_obj_t str_endswith(size_t n_args, const mp_obj_t *args) {
pythontech 0:5868e8752d44 706 GET_STR_DATA_LEN(args[0], str, str_len);
pythontech 0:5868e8752d44 707 GET_STR_DATA_LEN(args[1], suffix, suffix_len);
pythontech 0:5868e8752d44 708 if (n_args > 2) {
pythontech 0:5868e8752d44 709 mp_not_implemented("start/end indices");
pythontech 0:5868e8752d44 710 }
pythontech 0:5868e8752d44 711
pythontech 0:5868e8752d44 712 if (suffix_len > str_len) {
pythontech 0:5868e8752d44 713 return mp_const_false;
pythontech 0:5868e8752d44 714 }
pythontech 0:5868e8752d44 715 return mp_obj_new_bool(memcmp(str + (str_len - suffix_len), suffix, suffix_len) == 0);
pythontech 0:5868e8752d44 716 }
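Both prefix and suffix tests above reduce to a length check followed by a fixed-size comparison. A minimal sketch, assuming `start` has already been clamped to the range 0..len(text) (the C code delegates that to `str_index_to_ptr`); the function names are hypothetical:

```python
def startswith_at(text, prefix, start=0):
    """Prefix test at an offset; 'start' is assumed to be within 0..len(text)."""
    # If the prefix cannot fit in what remains, there is nothing to compare.
    if len(prefix) + start > len(text):
        return False
    return text[start:start + len(prefix)] == prefix


def endswith_simple(text, suffix):
    """Suffix test without start/end indices, as in the listing above."""
    if len(suffix) > len(text):
        return False
    return text[len(text) - len(suffix):] == suffix
```

The length check matters in C because `memcmp` would otherwise read past the end of the string; in Python it only saves a slice.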
pythontech 0:5868e8752d44 717
pythontech 0:5868e8752d44 718 enum { LSTRIP, RSTRIP, STRIP };
pythontech 0:5868e8752d44 719
pythontech 0:5868e8752d44 720 STATIC mp_obj_t str_uni_strip(int type, mp_uint_t n_args, const mp_obj_t *args) {
pythontech 0:5868e8752d44 721 assert(1 <= n_args && n_args <= 2);
pythontech 0:5868e8752d44 722 assert(MP_OBJ_IS_STR_OR_BYTES(args[0]));
pythontech 0:5868e8752d44 723 const mp_obj_type_t *self_type = mp_obj_get_type(args[0]);
pythontech 0:5868e8752d44 724
pythontech 0:5868e8752d44 725 const byte *chars_to_del;
pythontech 0:5868e8752d44 726 uint chars_to_del_len;
pythontech 0:5868e8752d44 727 static const byte whitespace[] = " \t\n\r\v\f";
pythontech 0:5868e8752d44 728
pythontech 0:5868e8752d44 729 if (n_args == 1) {
pythontech 0:5868e8752d44 730 chars_to_del = whitespace;
pythontech 0:5868e8752d44 731 chars_to_del_len = sizeof(whitespace) - 1; // exclude the terminating NUL so '\0' is not stripped
pythontech 0:5868e8752d44 732 } else {
pythontech 0:5868e8752d44 733 if (mp_obj_get_type(args[1]) != self_type) {
pythontech 0:5868e8752d44 734 bad_implicit_conversion(args[1]);
pythontech 0:5868e8752d44 735 }
pythontech 0:5868e8752d44 736 GET_STR_DATA_LEN(args[1], s, l);
pythontech 0:5868e8752d44 737 chars_to_del = s;
pythontech 0:5868e8752d44 738 chars_to_del_len = l;
pythontech 0:5868e8752d44 739 }
pythontech 0:5868e8752d44 740
pythontech 0:5868e8752d44 741 GET_STR_DATA_LEN(args[0], orig_str, orig_str_len);
pythontech 0:5868e8752d44 742
pythontech 0:5868e8752d44 743 mp_uint_t first_good_char_pos = 0;
pythontech 0:5868e8752d44 744 bool first_good_char_pos_set = false;
pythontech 0:5868e8752d44 745 mp_uint_t last_good_char_pos = 0;
pythontech 0:5868e8752d44 746 mp_uint_t i = 0;
pythontech 0:5868e8752d44 747 mp_int_t delta = 1;
pythontech 0:5868e8752d44 748 if (type == RSTRIP) {
pythontech 0:5868e8752d44 749 i = orig_str_len - 1;
pythontech 0:5868e8752d44 750 delta = -1;
pythontech 0:5868e8752d44 751 }
pythontech 0:5868e8752d44 752 for (mp_uint_t len = orig_str_len; len > 0; len--) {
pythontech 0:5868e8752d44 753 if (find_subbytes(chars_to_del, chars_to_del_len, &orig_str[i], 1, 1) == NULL) {
pythontech 0:5868e8752d44 754 if (!first_good_char_pos_set) {
pythontech 0:5868e8752d44 755 first_good_char_pos_set = true;
pythontech 0:5868e8752d44 756 first_good_char_pos = i;
pythontech 0:5868e8752d44 757 if (type == LSTRIP) {
pythontech 0:5868e8752d44 758 last_good_char_pos = orig_str_len - 1;
pythontech 0:5868e8752d44 759 break;
pythontech 0:5868e8752d44 760 } else if (type == RSTRIP) {
pythontech 0:5868e8752d44 761 first_good_char_pos = 0;
pythontech 0:5868e8752d44 762 last_good_char_pos = i;
pythontech 0:5868e8752d44 763 break;
pythontech 0:5868e8752d44 764 }
pythontech 0:5868e8752d44 765 }
pythontech 0:5868e8752d44 766 last_good_char_pos = i;
pythontech 0:5868e8752d44 767 }
pythontech 0:5868e8752d44 768 i += delta;
pythontech 0:5868e8752d44 769 }
pythontech 0:5868e8752d44 770
pythontech 0:5868e8752d44 771 if (!first_good_char_pos_set) {
pythontech 0:5868e8752d44 772 // string is all whitespace, return ''
pythontech 0:5868e8752d44 773 if (self_type == &mp_type_str) {
pythontech 0:5868e8752d44 774 return MP_OBJ_NEW_QSTR(MP_QSTR_);
pythontech 0:5868e8752d44 775 } else {
pythontech 0:5868e8752d44 776 return mp_const_empty_bytes;
pythontech 0:5868e8752d44 777 }
pythontech 0:5868e8752d44 778 }
pythontech 0:5868e8752d44 779
pythontech 0:5868e8752d44 780 assert(last_good_char_pos >= first_good_char_pos);
pythontech 0:5868e8752d44 781 // +1 to accommodate the last character
pythontech 0:5868e8752d44 782 mp_uint_t stripped_len = last_good_char_pos - first_good_char_pos + 1;
pythontech 0:5868e8752d44 783 if (stripped_len == orig_str_len) {
pythontech 0:5868e8752d44 784 // If nothing was stripped, don't bother to dup original string
pythontech 0:5868e8752d44 785 // TODO: watch out for this case when we'll get to bytearray.strip()
pythontech 0:5868e8752d44 786 assert(first_good_char_pos == 0);
pythontech 0:5868e8752d44 787 return args[0];
pythontech 0:5868e8752d44 788 }
pythontech 0:5868e8752d44 789 return mp_obj_new_str_of_type(self_type, orig_str + first_good_char_pos, stripped_len);
pythontech 0:5868e8752d44 790 }
pythontech 0:5868e8752d44 791
pythontech 0:5868e8752d44 792 STATIC mp_obj_t str_strip(size_t n_args, const mp_obj_t *args) {
pythontech 0:5868e8752d44 793 return str_uni_strip(STRIP, n_args, args);
pythontech 0:5868e8752d44 794 }
pythontech 0:5868e8752d44 795
pythontech 0:5868e8752d44 796 STATIC mp_obj_t str_lstrip(size_t n_args, const mp_obj_t *args) {
pythontech 0:5868e8752d44 797 return str_uni_strip(LSTRIP, n_args, args);
pythontech 0:5868e8752d44 798 }
pythontech 0:5868e8752d44 799
pythontech 0:5868e8752d44 800 STATIC mp_obj_t str_rstrip(size_t n_args, const mp_obj_t *args) {
pythontech 0:5868e8752d44 801 return str_uni_strip(RSTRIP, n_args, args);
pythontech 0:5868e8752d44 802 }
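The three strip variants above all look for the first and last character that is not in the removal set. The sketch below is a simplified equivalent rather than a line-by-line translation: it scans inwards from each end instead of tracking positions in a single pass, and `strip_chars` and its `mode` parameter are hypothetical names.

```python
WHITESPACE = " \t\n\r\v\f"


def strip_chars(text, chars=WHITESPACE, mode="both"):
    """Remove characters in 'chars' from the left, right, or both ends."""
    first, last = 0, len(text) - 1
    if mode in ("both", "left"):
        # Advance to the first character that must be kept.
        while first <= last and text[first] in chars:
            first += 1
    if mode in ("both", "right"):
        # Retreat to the last character that must be kept.
        while last >= first and text[last] in chars:
            last -= 1
    # An all-strippable string leaves first > last, giving an empty result.
    return text[first:last + 1]
```

Note that `chars` is treated as a set of individual characters, not as a substring to remove.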
pythontech 0:5868e8752d44 803
pythontech 0:5868e8752d44 804 // Takes an int arg, but only parses unsigned numbers, and only changes
pythontech 0:5868e8752d44 805 // *num if at least one digit was parsed.
pythontech 0:5868e8752d44 806 STATIC const char *str_to_int(const char *str, const char *top, int *num) {
pythontech 0:5868e8752d44 807 if (str < top && '0' <= *str && *str <= '9') {
pythontech 0:5868e8752d44 808 *num = 0;
pythontech 0:5868e8752d44 809 do {
pythontech 0:5868e8752d44 810 *num = *num * 10 + (*str - '0');
pythontech 0:5868e8752d44 811 str++;
pythontech 0:5868e8752d44 812 }
pythontech 0:5868e8752d44 813 while (str < top && '0' <= *str && *str <= '9');
pythontech 0:5868e8752d44 814 }
pythontech 0:5868e8752d44 815 return str;
pythontech 0:5868e8752d44 816 }
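The digit parser above has a useful contract: it returns the position after the digits and leaves the output untouched when no digit is present, so callers can pre-load a default such as -1. A sketch of that contract in Python, where returning a tuple replaces the C out-parameter and `parse_unsigned` is a hypothetical name:

```python
def parse_unsigned(text, pos, default):
    """Parse decimal digits at 'pos'; return (new_pos, value or default)."""
    if pos < len(text) and "0" <= text[pos] <= "9":
        value = 0
        while pos < len(text) and "0" <= text[pos] <= "9":
            value = value * 10 + (ord(text[pos]) - ord("0"))
            pos += 1
        return pos, value
    # No digit at this position: report the caller's default unchanged.
    return pos, default
```

This is how a format specification like `12.5f` can be consumed piece by piece: width first, then the precision after the dot.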
pythontech 0:5868e8752d44 817
pythontech 0:5868e8752d44 818 STATIC bool isalignment(char ch) {
pythontech 0:5868e8752d44 819 return ch && strchr("<>=^", ch) != NULL;
pythontech 0:5868e8752d44 820 }
pythontech 0:5868e8752d44 821
pythontech 0:5868e8752d44 822 STATIC bool istype(char ch) {
pythontech 0:5868e8752d44 823 return ch && strchr("bcdeEfFgGnosxX%", ch) != NULL;
pythontech 0:5868e8752d44 824 }
pythontech 0:5868e8752d44 825
pythontech 0:5868e8752d44 826 STATIC bool arg_looks_integer(mp_obj_t arg) {
pythontech 0:5868e8752d44 827 return MP_OBJ_IS_TYPE(arg, &mp_type_bool) || MP_OBJ_IS_INT(arg);
pythontech 0:5868e8752d44 828 }
pythontech 0:5868e8752d44 829
pythontech 0:5868e8752d44 830 STATIC bool arg_looks_numeric(mp_obj_t arg) {
pythontech 0:5868e8752d44 831 return arg_looks_integer(arg)
pythontech 0:5868e8752d44 832 #if MICROPY_PY_BUILTINS_FLOAT
pythontech 0:5868e8752d44 833 || mp_obj_is_float(arg)
pythontech 0:5868e8752d44 834 #endif
pythontech 0:5868e8752d44 835 ;
pythontech 0:5868e8752d44 836 }
pythontech 0:5868e8752d44 837
pythontech 0:5868e8752d44 838 STATIC mp_obj_t arg_as_int(mp_obj_t arg) {
pythontech 0:5868e8752d44 839 #if MICROPY_PY_BUILTINS_FLOAT
pythontech 0:5868e8752d44 840 if (mp_obj_is_float(arg)) {
pythontech 0:5868e8752d44 841 return mp_obj_new_int_from_float(mp_obj_float_get(arg));
pythontech 0:5868e8752d44 842 }
pythontech 0:5868e8752d44 843 #endif
pythontech 0:5868e8752d44 844 return arg;
pythontech 0:5868e8752d44 845 }
pythontech 0:5868e8752d44 846
pythontech 0:5868e8752d44 847 STATIC NORETURN void terse_str_format_value_error(void) {
pythontech 0:5868e8752d44 848 nlr_raise(mp_obj_new_exception_msg(&mp_type_ValueError, "bad format string"));
pythontech 0:5868e8752d44 849 }
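The format helper that follows handles brace escaping, automatic versus manual field numbering, conversions, and nested format specifications. The behaviours it targets can be checked against standard Python, as in the sketch below. The results shown here are CPython's; whether this port matches in every case is an assumption, and `mixes_numbering` is a hypothetical helper for illustration.

```python
# Doubled braces are literal braces, not replacement fields.
escaped = "{{}}".format()

# With no format spec the value goes through str(); with ':d' a bool
# is formatted as the integer it is.
as_str = "{}".format(True)
as_int = "{:d}".format(True)

# A nested field inside the format spec supplies the width.
nested = "{:>{}}".format("x", 3)


def mixes_numbering(fmt, *args):
    """Return True if formatting fails with ValueError, for example because
    automatic and manual field numbering are mixed."""
    try:
        fmt.format(*args)
    except ValueError:
        return True
    return False
```

Mixing `{}` and `{0}` in one format string is rejected in either order, which is what the sign of `*arg_i` tracks in the C code.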
pythontech 0:5868e8752d44 850
pythontech 0:5868e8752d44 851 STATIC vstr_t mp_obj_str_format_helper(const char *str, const char *top, int *arg_i, mp_uint_t n_args, const mp_obj_t *args, mp_map_t *kwargs) {
pythontech 0:5868e8752d44 852 vstr_t vstr;
pythontech 0:5868e8752d44 853 mp_print_t print;
pythontech 0:5868e8752d44 854 vstr_init_print(&vstr, 16, &print);
pythontech 0:5868e8752d44 855
pythontech 0:5868e8752d44 856 for (; str < top; str++) {
pythontech 0:5868e8752d44 857 if (*str == '}') {
pythontech 0:5868e8752d44 858 str++;
pythontech 0:5868e8752d44 859 if (str < top && *str == '}') {
pythontech 0:5868e8752d44 860 vstr_add_byte(&vstr, '}');
pythontech 0:5868e8752d44 861 continue;
pythontech 0:5868e8752d44 862 }
pythontech 0:5868e8752d44 863 if (MICROPY_ERROR_REPORTING == MICROPY_ERROR_REPORTING_TERSE) {
pythontech 0:5868e8752d44 864 terse_str_format_value_error();
pythontech 0:5868e8752d44 865 } else {
pythontech 0:5868e8752d44 866 nlr_raise(mp_obj_new_exception_msg(&mp_type_ValueError,
pythontech 0:5868e8752d44 867 "single '}' encountered in format string"));
pythontech 0:5868e8752d44 868 }
pythontech 0:5868e8752d44 869 }
pythontech 0:5868e8752d44 870 if (*str != '{') {
pythontech 0:5868e8752d44 871 vstr_add_byte(&vstr, *str);
pythontech 0:5868e8752d44 872 continue;
pythontech 0:5868e8752d44 873 }
pythontech 0:5868e8752d44 874
pythontech 0:5868e8752d44 875 str++;
pythontech 0:5868e8752d44 876 if (str < top && *str == '{') {
pythontech 0:5868e8752d44 877 vstr_add_byte(&vstr, '{');
pythontech 0:5868e8752d44 878 continue;
pythontech 0:5868e8752d44 879 }
pythontech 0:5868e8752d44 880
pythontech 0:5868e8752d44 881 // replacement_field ::= "{" [field_name] ["!" conversion] [":" format_spec] "}"
pythontech 0:5868e8752d44 882
pythontech 0:5868e8752d44 883 const char *field_name = NULL;
pythontech 0:5868e8752d44 884 const char *field_name_top = NULL;
pythontech 0:5868e8752d44 885 char conversion = '\0';
pythontech 0:5868e8752d44 886 const char *format_spec = NULL;
pythontech 0:5868e8752d44 887
pythontech 0:5868e8752d44 888 if (str < top && *str != '}' && *str != '!' && *str != ':') {
pythontech 0:5868e8752d44 889 field_name = (const char *)str;
pythontech 0:5868e8752d44 890 while (str < top && *str != '}' && *str != '!' && *str != ':') {
pythontech 0:5868e8752d44 891 ++str;
pythontech 0:5868e8752d44 892 }
pythontech 0:5868e8752d44 893 field_name_top = (const char *)str;
pythontech 0:5868e8752d44 894 }
pythontech 0:5868e8752d44 895
pythontech 0:5868e8752d44 896 // conversion ::= "r" | "s"
pythontech 0:5868e8752d44 897
pythontech 0:5868e8752d44 898 if (str < top && *str == '!') {
pythontech 0:5868e8752d44 899 str++;
pythontech 0:5868e8752d44 900 if (str < top && (*str == 'r' || *str == 's')) {
pythontech 0:5868e8752d44 901 conversion = *str++;
pythontech 0:5868e8752d44 902 } else {
pythontech 0:5868e8752d44 903 if (MICROPY_ERROR_REPORTING == MICROPY_ERROR_REPORTING_TERSE) {
pythontech 0:5868e8752d44 904 terse_str_format_value_error();
pythontech 0:5868e8752d44 905 } else if (MICROPY_ERROR_REPORTING == MICROPY_ERROR_REPORTING_NORMAL) {
pythontech 0:5868e8752d44 906 nlr_raise(mp_obj_new_exception_msg(&mp_type_ValueError,
pythontech 0:5868e8752d44 907 "bad conversion specifier"));
pythontech 0:5868e8752d44 908 } else {
pythontech 0:5868e8752d44 909 if (str >= top) {
pythontech 0:5868e8752d44 910 nlr_raise(mp_obj_new_exception_msg(&mp_type_ValueError,
pythontech 0:5868e8752d44 911 "end of format while looking for conversion specifier"));
pythontech 0:5868e8752d44 912 } else {
pythontech 0:5868e8752d44 913 nlr_raise(mp_obj_new_exception_msg_varg(&mp_type_ValueError,
pythontech 0:5868e8752d44 914 "unknown conversion specifier %c", *str));
pythontech 0:5868e8752d44 915 }
pythontech 0:5868e8752d44 916 }
pythontech 0:5868e8752d44 917 }
pythontech 0:5868e8752d44 918 }
pythontech 0:5868e8752d44 919
pythontech 0:5868e8752d44 920 if (str < top && *str == ':') {
pythontech 0:5868e8752d44 921 str++;
pythontech 0:5868e8752d44 922 // {:} is the same as {}, which is the same as {!s}
pythontech 0:5868e8752d44 923 // This makes a difference when passing in a True or False
pythontech 0:5868e8752d44 924 // '{}'.format(True) returns 'True'
pythontech 0:5868e8752d44 925 // '{:d}'.format(True) returns '1'
pythontech 0:5868e8752d44 926 // So we treat {:} as {}, which is later treated as {!s}
pythontech 0:5868e8752d44 927 if (*str != '}') {
pythontech 0:5868e8752d44 928 format_spec = str;
pythontech 0:5868e8752d44 929 for (int nest = 1; str < top;) {
pythontech 0:5868e8752d44 930 if (*str == '{') {
pythontech 0:5868e8752d44 931 ++nest;
pythontech 0:5868e8752d44 932 } else if (*str == '}') {
pythontech 0:5868e8752d44 933 if (--nest == 0) {
pythontech 0:5868e8752d44 934 break;
pythontech 0:5868e8752d44 935 }
pythontech 0:5868e8752d44 936 }
pythontech 0:5868e8752d44 937 ++str;
pythontech 0:5868e8752d44 938 }
pythontech 0:5868e8752d44 939 }
pythontech 0:5868e8752d44 940 }
pythontech 0:5868e8752d44 941 if (str >= top) {
pythontech 0:5868e8752d44 942 if (MICROPY_ERROR_REPORTING == MICROPY_ERROR_REPORTING_TERSE) {
pythontech 0:5868e8752d44 943 terse_str_format_value_error();
pythontech 0:5868e8752d44 944 } else {
pythontech 0:5868e8752d44 945 nlr_raise(mp_obj_new_exception_msg(&mp_type_ValueError,
pythontech 0:5868e8752d44 946 "unmatched '{' in format"));
pythontech 0:5868e8752d44 947 }
pythontech 0:5868e8752d44 948 }
pythontech 0:5868e8752d44 949 if (*str != '}') {
pythontech 0:5868e8752d44 950 if (MICROPY_ERROR_REPORTING == MICROPY_ERROR_REPORTING_TERSE) {
pythontech 0:5868e8752d44 951 terse_str_format_value_error();
pythontech 0:5868e8752d44 952 } else {
pythontech 0:5868e8752d44 953 nlr_raise(mp_obj_new_exception_msg(&mp_type_ValueError,
pythontech 0:5868e8752d44 954 "expected ':' after format specifier"));
pythontech 0:5868e8752d44 955 }
pythontech 0:5868e8752d44 956 }
pythontech 0:5868e8752d44 957
pythontech 0:5868e8752d44 958 mp_obj_t arg = mp_const_none;
pythontech 0:5868e8752d44 959
pythontech 0:5868e8752d44 960 if (field_name) {
pythontech 0:5868e8752d44 961 int index = 0;
pythontech 0:5868e8752d44 962 if (MP_LIKELY(unichar_isdigit(*field_name))) {
pythontech 0:5868e8752d44 963 if (*arg_i > 0) {
pythontech 0:5868e8752d44 964 if (MICROPY_ERROR_REPORTING == MICROPY_ERROR_REPORTING_TERSE) {
pythontech 0:5868e8752d44 965 terse_str_format_value_error();
pythontech 0:5868e8752d44 966 } else {
pythontech 0:5868e8752d44 967 nlr_raise(mp_obj_new_exception_msg(&mp_type_ValueError,
pythontech 0:5868e8752d44 968 "can't switch from automatic field numbering to manual field specification"));
pythontech 0:5868e8752d44 969 }
pythontech 0:5868e8752d44 970 }
pythontech 0:5868e8752d44 971 field_name = str_to_int(field_name, field_name_top, &index);
pythontech 0:5868e8752d44 972 if ((uint)index >= n_args - 1) {
pythontech 0:5868e8752d44 973 nlr_raise(mp_obj_new_exception_msg(&mp_type_IndexError, "tuple index out of range"));
pythontech 0:5868e8752d44 974 }
pythontech 0:5868e8752d44 975 arg = args[index + 1];
pythontech 0:5868e8752d44 976 *arg_i = -1;
pythontech 0:5868e8752d44 977 } else {
pythontech 0:5868e8752d44 978 const char *lookup;
pythontech 0:5868e8752d44 979 for (lookup = field_name; lookup < field_name_top && *lookup != '.' && *lookup != '['; lookup++);
pythontech 0:5868e8752d44 980 mp_obj_t field_q = mp_obj_new_str(field_name, lookup - field_name, true/*?*/);
pythontech 0:5868e8752d44 981 field_name = lookup;
pythontech 0:5868e8752d44 982 mp_map_elem_t *key_elem = mp_map_lookup(kwargs, field_q, MP_MAP_LOOKUP);
pythontech 0:5868e8752d44 983 if (key_elem == NULL) {
pythontech 0:5868e8752d44 984 nlr_raise(mp_obj_new_exception_arg1(&mp_type_KeyError, field_q));
pythontech 0:5868e8752d44 985 }
pythontech 0:5868e8752d44 986 arg = key_elem->value;
pythontech 0:5868e8752d44 987 }
pythontech 0:5868e8752d44 988 if (field_name < field_name_top) {
pythontech 0:5868e8752d44 989 mp_not_implemented("attributes not supported yet");
pythontech 0:5868e8752d44 990 }
pythontech 0:5868e8752d44 991 } else {
pythontech 0:5868e8752d44 992 if (*arg_i < 0) {
pythontech 0:5868e8752d44 993 if (MICROPY_ERROR_REPORTING == MICROPY_ERROR_REPORTING_TERSE) {
pythontech 0:5868e8752d44 994 terse_str_format_value_error();
pythontech 0:5868e8752d44 995 } else {
pythontech 0:5868e8752d44 996 nlr_raise(mp_obj_new_exception_msg(&mp_type_ValueError,
pythontech 0:5868e8752d44 997 "can't switch from manual field specification to automatic field numbering"));
pythontech 0:5868e8752d44 998 }
pythontech 0:5868e8752d44 999 }
pythontech 0:5868e8752d44 1000 if ((uint)*arg_i >= n_args - 1) {
pythontech 0:5868e8752d44 1001 nlr_raise(mp_obj_new_exception_msg(&mp_type_IndexError, "tuple index out of range"));
pythontech 0:5868e8752d44 1002 }
pythontech 0:5868e8752d44 1003 arg = args[(*arg_i) + 1];
pythontech 0:5868e8752d44 1004 (*arg_i)++;
pythontech 0:5868e8752d44 1005 }
pythontech 0:5868e8752d44 1006 if (!format_spec && !conversion) {
pythontech 0:5868e8752d44 1007 conversion = 's';
pythontech 0:5868e8752d44 1008 }
pythontech 0:5868e8752d44 1009 if (conversion) {
pythontech 0:5868e8752d44 1010 mp_print_kind_t print_kind;
pythontech 0:5868e8752d44 1011 if (conversion == 's') {
pythontech 0:5868e8752d44 1012 print_kind = PRINT_STR;
pythontech 0:5868e8752d44 1013 } else {
pythontech 0:5868e8752d44 1014 assert(conversion == 'r');
pythontech 0:5868e8752d44 1015 print_kind = PRINT_REPR;
pythontech 0:5868e8752d44 1016 }
pythontech 0:5868e8752d44 1017 vstr_t arg_vstr;
pythontech 0:5868e8752d44 1018 mp_print_t arg_print;
pythontech 0:5868e8752d44 1019 vstr_init_print(&arg_vstr, 16, &arg_print);
pythontech 0:5868e8752d44 1020 mp_obj_print_helper(&arg_print, arg, print_kind);
pythontech 0:5868e8752d44 1021 arg = mp_obj_new_str_from_vstr(&mp_type_str, &arg_vstr);
pythontech 0:5868e8752d44 1022 }
pythontech 0:5868e8752d44 1023
pythontech 0:5868e8752d44 1024 char sign = '\0';
pythontech 0:5868e8752d44 1025 char fill = '\0';
pythontech 0:5868e8752d44 1026 char align = '\0';
pythontech 0:5868e8752d44 1027 int width = -1;
pythontech 0:5868e8752d44 1028 int precision = -1;
pythontech 0:5868e8752d44 1029 char type = '\0';
pythontech 0:5868e8752d44 1030 int flags = 0;
pythontech 0:5868e8752d44 1031
pythontech 0:5868e8752d44 1032 if (format_spec) {
pythontech 0:5868e8752d44 1033 // The format specifier (from http://docs.python.org/2/library/string.html#formatspec)
pythontech 0:5868e8752d44 1034 //
pythontech 0:5868e8752d44 1035 // [[fill]align][sign][#][0][width][,][.precision][type]
pythontech 0:5868e8752d44 1036 // fill ::= <any character>
pythontech 0:5868e8752d44 1037 // align ::= "<" | ">" | "=" | "^"
pythontech 0:5868e8752d44 1038 // sign ::= "+" | "-" | " "
pythontech 0:5868e8752d44 1039 // width ::= integer
pythontech 0:5868e8752d44 1040 // precision ::= integer
pythontech 0:5868e8752d44 1041 // type ::= "b" | "c" | "d" | "e" | "E" | "f" | "F" | "g" | "G" | "n" | "o" | "s" | "x" | "X" | "%"
pythontech 0:5868e8752d44 1042
pythontech 0:5868e8752d44 1043 // recursively call the formatter to format any nested specifiers
pythontech 0:5868e8752d44 1044 MP_STACK_CHECK();
pythontech 0:5868e8752d44 1045 vstr_t format_spec_vstr = mp_obj_str_format_helper(format_spec, str, arg_i, n_args, args, kwargs);
pythontech 0:5868e8752d44 1046 const char *s = vstr_null_terminated_str(&format_spec_vstr);
pythontech 0:5868e8752d44 1047 const char *stop = s + format_spec_vstr.len;
pythontech 0:5868e8752d44 1048 if (isalignment(*s)) {
pythontech 0:5868e8752d44 1049 align = *s++;
pythontech 0:5868e8752d44 1050 } else if (*s && isalignment(s[1])) {
pythontech 0:5868e8752d44 1051 fill = *s++;
pythontech 0:5868e8752d44 1052 align = *s++;
pythontech 0:5868e8752d44 1053 }
pythontech 0:5868e8752d44 1054 if (*s == '+' || *s == '-' || *s == ' ') {
pythontech 0:5868e8752d44 1055 if (*s == '+') {
pythontech 0:5868e8752d44 1056 flags |= PF_FLAG_SHOW_SIGN;
pythontech 0:5868e8752d44 1057 } else if (*s == ' ') {
pythontech 0:5868e8752d44 1058 flags |= PF_FLAG_SPACE_SIGN;
pythontech 0:5868e8752d44 1059 }
pythontech 0:5868e8752d44 1060 sign = *s++;
pythontech 0:5868e8752d44 1061 }
pythontech 0:5868e8752d44 1062 if (*s == '#') {
pythontech 0:5868e8752d44 1063 flags |= PF_FLAG_SHOW_PREFIX;
pythontech 0:5868e8752d44 1064 s++;
pythontech 0:5868e8752d44 1065 }
pythontech 0:5868e8752d44 1066 if (*s == '0') {
pythontech 0:5868e8752d44 1067 if (!align) {
pythontech 0:5868e8752d44 1068 align = '=';
pythontech 0:5868e8752d44 1069 }
pythontech 0:5868e8752d44 1070 if (!fill) {
pythontech 0:5868e8752d44 1071 fill = '0';
pythontech 0:5868e8752d44 1072 }
pythontech 0:5868e8752d44 1073 }
pythontech 0:5868e8752d44 1074 s = str_to_int(s, stop, &width);
pythontech 0:5868e8752d44 1075 if (*s == ',') {
pythontech 0:5868e8752d44 1076 flags |= PF_FLAG_SHOW_COMMA;
pythontech 0:5868e8752d44 1077 s++;
pythontech 0:5868e8752d44 1078 }
pythontech 0:5868e8752d44 1079 if (*s == '.') {
pythontech 0:5868e8752d44 1080 s++;
pythontech 0:5868e8752d44 1081 s = str_to_int(s, stop, &precision);
pythontech 0:5868e8752d44 1082 }
pythontech 0:5868e8752d44 1083 if (istype(*s)) {
pythontech 0:5868e8752d44 1084 type = *s++;
pythontech 0:5868e8752d44 1085 }
pythontech 0:5868e8752d44 1086 if (*s) {
pythontech 0:5868e8752d44 1087 if (MICROPY_ERROR_REPORTING == MICROPY_ERROR_REPORTING_TERSE) {
pythontech 0:5868e8752d44 1088 terse_str_format_value_error();
pythontech 0:5868e8752d44 1089 } else {
pythontech 0:5868e8752d44 1090 nlr_raise(mp_obj_new_exception_msg(&mp_type_ValueError,
pythontech 0:5868e8752d44 1091 "invalid format specifier"));
pythontech 0:5868e8752d44 1092 }
pythontech 0:5868e8752d44 1093 }
pythontech 0:5868e8752d44 1094 vstr_clear(&format_spec_vstr);
pythontech 0:5868e8752d44 1095 }
pythontech 0:5868e8752d44 1096 if (!align) {
pythontech 0:5868e8752d44 1097 if (arg_looks_numeric(arg)) {
pythontech 0:5868e8752d44 1098 align = '>';
pythontech 0:5868e8752d44 1099 } else {
pythontech 0:5868e8752d44 1100 align = '<';
pythontech 0:5868e8752d44 1101 }
pythontech 0:5868e8752d44 1102 }
pythontech 0:5868e8752d44 1103 if (!fill) {
pythontech 0:5868e8752d44 1104 fill = ' ';
pythontech 0:5868e8752d44 1105 }
pythontech 0:5868e8752d44 1106
pythontech 0:5868e8752d44 1107 if (sign) {
pythontech 0:5868e8752d44 1108 if (type == 's') {
pythontech 0:5868e8752d44 1109 if (MICROPY_ERROR_REPORTING == MICROPY_ERROR_REPORTING_TERSE) {
pythontech 0:5868e8752d44 1110 terse_str_format_value_error();
pythontech 0:5868e8752d44 1111 } else {
pythontech 0:5868e8752d44 1112 nlr_raise(mp_obj_new_exception_msg(&mp_type_ValueError,
pythontech 0:5868e8752d44 1113 "sign not allowed in string format specifier"));
pythontech 0:5868e8752d44 1114 }
pythontech 0:5868e8752d44 1115 }
pythontech 0:5868e8752d44 1116 if (type == 'c') {
pythontech 0:5868e8752d44 1117 if (MICROPY_ERROR_REPORTING == MICROPY_ERROR_REPORTING_TERSE) {
pythontech 0:5868e8752d44 1118 terse_str_format_value_error();
pythontech 0:5868e8752d44 1119 } else {
pythontech 0:5868e8752d44 1120 nlr_raise(mp_obj_new_exception_msg(&mp_type_ValueError,
pythontech 0:5868e8752d44 1121 "sign not allowed with integer format specifier 'c'"));
pythontech 0:5868e8752d44 1122 }
pythontech 0:5868e8752d44 1123 }
pythontech 0:5868e8752d44 1124 } else {
pythontech 0:5868e8752d44 1125 sign = '-';
pythontech 0:5868e8752d44 1126 }
pythontech 0:5868e8752d44 1127
pythontech 0:5868e8752d44 1128 switch (align) {
pythontech 0:5868e8752d44 1129 case '<': flags |= PF_FLAG_LEFT_ADJUST; break;
pythontech 0:5868e8752d44 1130 case '=': flags |= PF_FLAG_PAD_AFTER_SIGN; break;
pythontech 0:5868e8752d44 1131 case '^': flags |= PF_FLAG_CENTER_ADJUST; break;
pythontech 0:5868e8752d44 1132 }
pythontech 0:5868e8752d44 1133
pythontech 0:5868e8752d44 1134 if (arg_looks_integer(arg)) {
pythontech 0:5868e8752d44 1135 switch (type) {
pythontech 0:5868e8752d44 1136 case 'b':
pythontech 0:5868e8752d44 1137 mp_print_mp_int(&print, arg, 2, 'a', flags, fill, width, 0);
pythontech 0:5868e8752d44 1138 continue;
pythontech 0:5868e8752d44 1139
pythontech 0:5868e8752d44 1140 case 'c':
pythontech 0:5868e8752d44 1141 {
pythontech 0:5868e8752d44 1142 char ch = mp_obj_get_int(arg);
pythontech 0:5868e8752d44 1143 mp_print_strn(&print, &ch, 1, flags, fill, width);
pythontech 0:5868e8752d44 1144 continue;
pythontech 0:5868e8752d44 1145 }
pythontech 0:5868e8752d44 1146
pythontech 0:5868e8752d44 1147 case '\0': // No explicit format type implies 'd'
pythontech 0:5868e8752d44 1148 case 'n': // I don't think we support locales in uPy so use 'd'
pythontech 0:5868e8752d44 1149 case 'd':
pythontech 0:5868e8752d44 1150 mp_print_mp_int(&print, arg, 10, 'a', flags, fill, width, 0);
pythontech 0:5868e8752d44 1151 continue;
pythontech 0:5868e8752d44 1152
pythontech 0:5868e8752d44 1153 case 'o':
pythontech 0:5868e8752d44 1154 if (flags & PF_FLAG_SHOW_PREFIX) {
pythontech 0:5868e8752d44 1155 flags |= PF_FLAG_SHOW_OCTAL_LETTER;
pythontech 0:5868e8752d44 1156 }
pythontech 0:5868e8752d44 1157
pythontech 0:5868e8752d44 1158 mp_print_mp_int(&print, arg, 8, 'a', flags, fill, width, 0);
pythontech 0:5868e8752d44 1159 continue;
pythontech 0:5868e8752d44 1160
pythontech 0:5868e8752d44 1161 case 'X':
pythontech 0:5868e8752d44 1162 case 'x':
pythontech 0:5868e8752d44 1163 mp_print_mp_int(&print, arg, 16, type - ('X' - 'A'), flags, fill, width, 0);
pythontech 0:5868e8752d44 1164 continue;
pythontech 0:5868e8752d44 1165
pythontech 0:5868e8752d44 1166 case 'e':
pythontech 0:5868e8752d44 1167 case 'E':
pythontech 0:5868e8752d44 1168 case 'f':
pythontech 0:5868e8752d44 1169 case 'F':
pythontech 0:5868e8752d44 1170 case 'g':
pythontech 0:5868e8752d44 1171 case 'G':
pythontech 0:5868e8752d44 1172 case '%':
pythontech 0:5868e8752d44 1173 // The floating point formatters all work with anything that
pythontech 0:5868e8752d44 1174 // looks like an integer
pythontech 0:5868e8752d44 1175 break;
pythontech 0:5868e8752d44 1176
pythontech 0:5868e8752d44 1177 default:
pythontech 0:5868e8752d44 1178 if (MICROPY_ERROR_REPORTING == MICROPY_ERROR_REPORTING_TERSE) {
pythontech 0:5868e8752d44 1179 terse_str_format_value_error();
pythontech 0:5868e8752d44 1180 } else {
pythontech 0:5868e8752d44 1181 nlr_raise(mp_obj_new_exception_msg_varg(&mp_type_ValueError,
pythontech 0:5868e8752d44 1182 "unknown format code '%c' for object of type '%s'",
pythontech 0:5868e8752d44 1183 type, mp_obj_get_type_str(arg)));
pythontech 0:5868e8752d44 1184 }
pythontech 0:5868e8752d44 1185 }
pythontech 0:5868e8752d44 1186 }
pythontech 0:5868e8752d44 1187
pythontech 0:5868e8752d44 1188 // NOTE: no else here. Integer arguments with an e, f, g, etc. format
pythontech 0:5868e8752d44 1189 // break out of the switch in the if above and must enter this if.
pythontech 0:5868e8752d44 1190 if (arg_looks_numeric(arg)) {
pythontech 0:5868e8752d44 1191 if (!type) {
pythontech 0:5868e8752d44 1192
pythontech 0:5868e8752d44 1193 // Even though the docs say that an unspecified type is the same
pythontech 0:5868e8752d44 1194 // as 'g', there is one subtle difference, when the exponent
pythontech 0:5868e8752d44 1195 // is one less than the precision.
pythontech 0:5868e8752d44 1196 //
pythontech 0:5868e8752d44 1197 // '{:10.1}'.format(0.0) ==> '0e+00'
pythontech 0:5868e8752d44 1198 // '{:10.1g}'.format(0.0) ==> '0'
pythontech 0:5868e8752d44 1199 //
pythontech 0:5868e8752d44 1200 // TODO: Figure out how to deal with this.
pythontech 0:5868e8752d44 1201 //
pythontech 0:5868e8752d44 1202 // A proper solution would involve adding a special flag
pythontech 0:5868e8752d44 1203 // or something to format_float, and create a format_double
pythontech 0:5868e8752d44 1204 // to deal with doubles. In order to fix this when using
pythontech 0:5868e8752d44 1205 // sprintf, we'd need to use the e format and tweak the
pythontech 0:5868e8752d44 1206 // returned result to strip trailing zeros like the g format
pythontech 0:5868e8752d44 1207 // does.
pythontech 0:5868e8752d44 1208 //
pythontech 0:5868e8752d44 1209 // {:10.3} and {:10.2e} with 1.23e2 both produce 1.23e+02
pythontech 0:5868e8752d44 1210 // but with 1.e2 you get 1e+02 and 1.00e+02
pythontech 0:5868e8752d44 1211 //
pythontech 0:5868e8752d44 1212 // Stripping the trailing 0's (like g does) would make the
pythontech 0:5868e8752d44 1213 // e format give us the right format.
pythontech 0:5868e8752d44 1214 //
pythontech 0:5868e8752d44 1215 // CPython sources say:
pythontech 0:5868e8752d44 1216 // Omitted type specifier. Behaves in the same way as repr(x)
pythontech 0:5868e8752d44 1217 // and str(x) if no precision is given, else like 'g', but with
pythontech 0:5868e8752d44 1218 // at least one digit after the decimal point.
pythontech 0:5868e8752d44 1219
pythontech 0:5868e8752d44 1220 type = 'g';
pythontech 0:5868e8752d44 1221 }
pythontech 0:5868e8752d44 1222 if (type == 'n') {
pythontech 0:5868e8752d44 1223 type = 'g';
pythontech 0:5868e8752d44 1224 }
pythontech 0:5868e8752d44 1225
pythontech 0:5868e8752d44 1226 switch (type) {
pythontech 0:5868e8752d44 1227 #if MICROPY_PY_BUILTINS_FLOAT
pythontech 0:5868e8752d44 1228 case 'e':
pythontech 0:5868e8752d44 1229 case 'E':
pythontech 0:5868e8752d44 1230 case 'f':
pythontech 0:5868e8752d44 1231 case 'F':
pythontech 0:5868e8752d44 1232 case 'g':
pythontech 0:5868e8752d44 1233 case 'G':
pythontech 0:5868e8752d44 1234 mp_print_float(&print, mp_obj_get_float(arg), type, flags, fill, width, precision);
pythontech 0:5868e8752d44 1235 break;
pythontech 0:5868e8752d44 1236
pythontech 0:5868e8752d44 1237 case '%':
pythontech 0:5868e8752d44 1238 flags |= PF_FLAG_ADD_PERCENT;
pythontech 0:5868e8752d44 1239 #if MICROPY_FLOAT_IMPL == MICROPY_FLOAT_IMPL_FLOAT
pythontech 0:5868e8752d44 1240 #define F100 100.0F
pythontech 0:5868e8752d44 1241 #else
pythontech 0:5868e8752d44 1242 #define F100 100.0
pythontech 0:5868e8752d44 1243 #endif
pythontech 0:5868e8752d44 1244 mp_print_float(&print, mp_obj_get_float(arg) * F100, 'f', flags, fill, width, precision);
pythontech 0:5868e8752d44 1245 #undef F100
pythontech 0:5868e8752d44 1246 break;
pythontech 0:5868e8752d44 1247 #endif
pythontech 0:5868e8752d44 1248
pythontech 0:5868e8752d44 1249 default:
pythontech 0:5868e8752d44 1250 if (MICROPY_ERROR_REPORTING == MICROPY_ERROR_REPORTING_TERSE) {
pythontech 0:5868e8752d44 1251 terse_str_format_value_error();
pythontech 0:5868e8752d44 1252 } else {
pythontech 0:5868e8752d44 1253 nlr_raise(mp_obj_new_exception_msg_varg(&mp_type_ValueError,
pythontech 0:5868e8752d44 1254 "unknown format code '%c' for object of type '%s'",
pythontech 0:5868e8752d44 1255 type, mp_obj_get_type_str(arg)));
pythontech 0:5868e8752d44 1256 }
pythontech 0:5868e8752d44 1257 }
pythontech 0:5868e8752d44 1258 } else {
pythontech 0:5868e8752d44 1259 // arg doesn't look like a number
pythontech 0:5868e8752d44 1260
pythontech 0:5868e8752d44 1261 if (align == '=') {
pythontech 0:5868e8752d44 1262 if (MICROPY_ERROR_REPORTING == MICROPY_ERROR_REPORTING_TERSE) {
pythontech 0:5868e8752d44 1263 terse_str_format_value_error();
pythontech 0:5868e8752d44 1264 } else {
pythontech 0:5868e8752d44 1265 nlr_raise(mp_obj_new_exception_msg(&mp_type_ValueError,
pythontech 0:5868e8752d44 1266 "'=' alignment not allowed in string format specifier"));
pythontech 0:5868e8752d44 1267 }
pythontech 0:5868e8752d44 1268 }
pythontech 0:5868e8752d44 1269
pythontech 0:5868e8752d44 1270 switch (type) {
pythontech 0:5868e8752d44 1271 case '\0': // no explicit format type implies 's'
pythontech 0:5868e8752d44 1272 case 's': {
pythontech 0:5868e8752d44 1273 mp_uint_t slen;
pythontech 0:5868e8752d44 1274 const char *s = mp_obj_str_get_data(arg, &slen);
pythontech 0:5868e8752d44 1275 if (precision < 0) {
pythontech 0:5868e8752d44 1276 precision = slen;
pythontech 0:5868e8752d44 1277 }
pythontech 0:5868e8752d44 1278 if (slen > (mp_uint_t)precision) {
pythontech 0:5868e8752d44 1279 slen = precision;
pythontech 0:5868e8752d44 1280 }
pythontech 0:5868e8752d44 1281 mp_print_strn(&print, s, slen, flags, fill, width);
pythontech 0:5868e8752d44 1282 break;
pythontech 0:5868e8752d44 1283 }
pythontech 0:5868e8752d44 1284
pythontech 0:5868e8752d44 1285 default:
pythontech 0:5868e8752d44 1286 if (MICROPY_ERROR_REPORTING == MICROPY_ERROR_REPORTING_TERSE) {
pythontech 0:5868e8752d44 1287 terse_str_format_value_error();
pythontech 0:5868e8752d44 1288 } else {
pythontech 0:5868e8752d44 1289 nlr_raise(mp_obj_new_exception_msg_varg(&mp_type_ValueError,
pythontech 0:5868e8752d44 1290 "unknown format code '%c' for object of type '%s'",
pythontech 0:5868e8752d44 1291 type, mp_obj_get_type_str(arg)));
pythontech 0:5868e8752d44 1292 }
pythontech 0:5868e8752d44 1293 }
pythontech 0:5868e8752d44 1294 }
pythontech 0:5868e8752d44 1295 }
pythontech 0:5868e8752d44 1296
pythontech 0:5868e8752d44 1297 return vstr;
pythontech 0:5868e8752d44 1298 }
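
The format-specifier parsing in the helper above follows the grammar `[[fill]align][sign][#][0][width][,][.precision][type]`. The following is a minimal standalone sketch of that parse order, not the MicroPython implementation: `spec_t`, `parse_spec`, `parse_int`, `is_align` and `is_type` are hypothetical names introduced for illustration, and the use of -1 for "not given" is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// Hypothetical container for a parsed spec; not a MicroPython type.
typedef struct {
    char fill;      // '\0' if not given
    char align;     // '<', '>', '=', '^' or '\0'
    char sign;      // '+', '-', ' ' or '\0'
    bool alt;       // '#' seen
    bool comma;     // ',' seen
    int width;      // -1 if not given (assumption of this sketch)
    int precision;  // -1 if not given (assumption of this sketch)
    char type;      // '\0' if not given
} spec_t;

// The c != '\0' guard matters: strchr would otherwise match the terminator.
static bool is_align(char c) { return c != '\0' && strchr("<>=^", c) != NULL; }
static bool is_type(char c) { return c != '\0' && strchr("bcdeEfFgGnosxX%", c) != NULL; }

// Parse a run of decimal digits; leaves *out untouched if there are none.
static const char *parse_int(const char *s, int *out) {
    if (*s < '0' || *s > '9') {
        return s;
    }
    int v = 0;
    while (*s >= '0' && *s <= '9') {
        v = v * 10 + (*s++ - '0');
    }
    *out = v;
    return s;
}

// Returns true if the whole string was consumed as a valid spec.
static bool parse_spec(const char *s, spec_t *out) {
    spec_t r = {0};
    r.width = -1;
    r.precision = -1;
    if (is_align(s[0])) {
        r.align = *s++;
    } else if (s[0] != '\0' && is_align(s[1])) {
        r.fill = *s++;   // any character may be the fill if an align follows
        r.align = *s++;
    }
    if (*s == '+' || *s == '-' || *s == ' ') {
        r.sign = *s++;
    }
    if (*s == '#') {
        r.alt = true;
        s++;
    }
    if (*s == '0') {
        // A leading zero implies '=' alignment and '0' fill unless already
        // set; it is not consumed here and is parsed as part of the width.
        if (!r.align) r.align = '=';
        if (!r.fill) r.fill = '0';
    }
    s = parse_int(s, &r.width);
    if (*s == ',') {
        r.comma = true;
        s++;
    }
    if (*s == '.') {
        s++;
        s = parse_int(s, &r.precision);
    }
    if (is_type(*s)) {
        r.type = *s++;
    }
    *out = r;
    return *s == '\0';  // leftover characters mean an invalid specifier
}
```

For example, `parse_spec("08d", &sp)` leaves `sp.fill == '0'`, `sp.align == '='`, `sp.width == 8` and `sp.type == 'd'`, while a spec with trailing junk such as `"10q"` returns false, which corresponds to the "invalid format specifier" error path above.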
pythontech 0:5868e8752d44 1299
pythontech 0:5868e8752d44 1300 mp_obj_t mp_obj_str_format(size_t n_args, const mp_obj_t *args, mp_map_t *kwargs) {
pythontech 0:5868e8752d44 1301 assert(MP_OBJ_IS_STR_OR_BYTES(args[0]));
pythontech 0:5868e8752d44 1302
pythontech 0:5868e8752d44 1303 GET_STR_DATA_LEN(args[0], str, len);
pythontech 0:5868e8752d44 1304 int arg_i = 0;
pythontech 0:5868e8752d44 1305 vstr_t vstr = mp_obj_str_format_helper((const char*)str, (const char*)str + len, &arg_i, n_args, args, kwargs);
pythontech 0:5868e8752d44 1306 return mp_obj_new_str_from_vstr(&mp_type_str, &vstr);
pythontech 0:5868e8752d44 1307 }
pythontech 0:5868e8752d44 1308
pythontech 0:5868e8752d44 1309 STATIC mp_obj_t str_modulo_format(mp_obj_t pattern, mp_uint_t n_args, const mp_obj_t *args, mp_obj_t dict) {
pythontech 0:5868e8752d44 1310 assert(MP_OBJ_IS_STR_OR_BYTES(pattern));
pythontech 0:5868e8752d44 1311
pythontech 0:5868e8752d44 1312 GET_STR_DATA_LEN(pattern, str, len);
pythontech 0:5868e8752d44 1313 const byte *start_str = str;
pythontech 0:5868e8752d44 1314 bool is_bytes = MP_OBJ_IS_TYPE(pattern, &mp_type_bytes);
pythontech 0:5868e8752d44 1315 int arg_i = 0;
pythontech 0:5868e8752d44 1316 vstr_t vstr;
pythontech 0:5868e8752d44 1317 mp_print_t print;
pythontech 0:5868e8752d44 1318 vstr_init_print(&vstr, 16, &print);
pythontech 0:5868e8752d44 1319
pythontech 0:5868e8752d44 1320 for (const byte *top = str + len; str < top; str++) {
pythontech 0:5868e8752d44 1321 mp_obj_t arg = MP_OBJ_NULL;
pythontech 0:5868e8752d44 1322 if (*str != '%') {
pythontech 0:5868e8752d44 1323 vstr_add_byte(&vstr, *str);
pythontech 0:5868e8752d44 1324 continue;
pythontech 0:5868e8752d44 1325 }
pythontech 0:5868e8752d44 1326 if (++str >= top) {
pythontech 0:5868e8752d44 1327 goto incomplete_format;
pythontech 0:5868e8752d44 1328 }
pythontech 0:5868e8752d44 1329 if (*str == '%') {
pythontech 0:5868e8752d44 1330 vstr_add_byte(&vstr, '%');
pythontech 0:5868e8752d44 1331 continue;
pythontech 0:5868e8752d44 1332 }
pythontech 0:5868e8752d44 1333
pythontech 0:5868e8752d44 1334 // Dictionary value lookup
pythontech 0:5868e8752d44 1335 if (*str == '(') {
pythontech 0:5868e8752d44 1336 const byte *key = ++str;
pythontech 0:5868e8752d44 1337 while (*str != ')') {
pythontech 0:5868e8752d44 1338 if (str >= top) {
pythontech 0:5868e8752d44 1339 if (MICROPY_ERROR_REPORTING == MICROPY_ERROR_REPORTING_TERSE) {
pythontech 0:5868e8752d44 1340 terse_str_format_value_error();
pythontech 0:5868e8752d44 1341 } else {
pythontech 0:5868e8752d44 1342 nlr_raise(mp_obj_new_exception_msg(&mp_type_ValueError,
pythontech 0:5868e8752d44 1343 "incomplete format key"));
pythontech 0:5868e8752d44 1344 }
pythontech 0:5868e8752d44 1345 }
pythontech 0:5868e8752d44 1346 ++str;
pythontech 0:5868e8752d44 1347 }
pythontech 0:5868e8752d44 1348 mp_obj_t k_obj = mp_obj_new_str((const char*)key, str - key, true);
pythontech 0:5868e8752d44 1349 arg = mp_obj_dict_get(dict, k_obj);
pythontech 0:5868e8752d44 1350 str++;
pythontech 0:5868e8752d44 1351 }
pythontech 0:5868e8752d44 1352
pythontech 0:5868e8752d44 1353 int flags = 0;
pythontech 0:5868e8752d44 1354 char fill = ' ';
pythontech 0:5868e8752d44 1355 int alt = 0;
pythontech 0:5868e8752d44 1356 while (str < top) {
pythontech 0:5868e8752d44 1357 if (*str == '-') flags |= PF_FLAG_LEFT_ADJUST;
pythontech 0:5868e8752d44 1358 else if (*str == '+') flags |= PF_FLAG_SHOW_SIGN;
pythontech 0:5868e8752d44 1359 else if (*str == ' ') flags |= PF_FLAG_SPACE_SIGN;
pythontech 0:5868e8752d44 1360 else if (*str == '#') alt = PF_FLAG_SHOW_PREFIX;
pythontech 0:5868e8752d44 1361 else if (*str == '0') {
pythontech 0:5868e8752d44 1362 flags |= PF_FLAG_PAD_AFTER_SIGN;
pythontech 0:5868e8752d44 1363 fill = '0';
pythontech 0:5868e8752d44 1364 } else break;
pythontech 0:5868e8752d44 1365 str++;
pythontech 0:5868e8752d44 1366 }
pythontech 0:5868e8752d44 1367 // parse width, if it exists
pythontech 0:5868e8752d44 1368 int width = 0;
pythontech 0:5868e8752d44 1369 if (str < top) {
pythontech 0:5868e8752d44 1370 if (*str == '*') {
pythontech 0:5868e8752d44 1371 if ((uint)arg_i >= n_args) {
pythontech 0:5868e8752d44 1372 goto not_enough_args;
pythontech 0:5868e8752d44 1373 }
pythontech 0:5868e8752d44 1374 width = mp_obj_get_int(args[arg_i++]);
pythontech 0:5868e8752d44 1375 str++;
pythontech 0:5868e8752d44 1376 } else {
pythontech 0:5868e8752d44 1377 str = (const byte*)str_to_int((const char*)str, (const char*)top, &width);
pythontech 0:5868e8752d44 1378 }
pythontech 0:5868e8752d44 1379 }
pythontech 0:5868e8752d44 1380 int prec = -1;
pythontech 0:5868e8752d44 1381 if (str < top && *str == '.') {
pythontech 0:5868e8752d44 1382 if (++str < top) {
pythontech 0:5868e8752d44 1383 if (*str == '*') {
pythontech 0:5868e8752d44 1384 if ((uint)arg_i >= n_args) {
pythontech 0:5868e8752d44 1385 goto not_enough_args;
pythontech 0:5868e8752d44 1386 }
pythontech 0:5868e8752d44 1387 prec = mp_obj_get_int(args[arg_i++]);
pythontech 0:5868e8752d44 1388 str++;
pythontech 0:5868e8752d44 1389 } else {
pythontech 0:5868e8752d44 1390 prec = 0;
pythontech 0:5868e8752d44 1391 str = (const byte*)str_to_int((const char*)str, (const char*)top, &prec);
pythontech 0:5868e8752d44 1392 }
pythontech 0:5868e8752d44 1393 }
pythontech 0:5868e8752d44 1394 }
pythontech 0:5868e8752d44 1395
pythontech 0:5868e8752d44 1396 if (str >= top) {
pythontech 0:5868e8752d44 1397 incomplete_format:
pythontech 0:5868e8752d44 1398 if (MICROPY_ERROR_REPORTING == MICROPY_ERROR_REPORTING_TERSE) {
pythontech 0:5868e8752d44 1399 terse_str_format_value_error();
pythontech 0:5868e8752d44 1400 } else {
pythontech 0:5868e8752d44 1401 nlr_raise(mp_obj_new_exception_msg(&mp_type_ValueError,
pythontech 0:5868e8752d44 1402 "incomplete format"));
pythontech 0:5868e8752d44 1403 }
pythontech 0:5868e8752d44 1404 }
pythontech 0:5868e8752d44 1405
pythontech 0:5868e8752d44 1406 // Tuple value lookup
pythontech 0:5868e8752d44 1407 if (arg == MP_OBJ_NULL) {
pythontech 0:5868e8752d44 1408 if ((uint)arg_i >= n_args) {
pythontech 0:5868e8752d44 1409 not_enough_args:
pythontech 0:5868e8752d44 1410 nlr_raise(mp_obj_new_exception_msg(&mp_type_TypeError, "not enough arguments for format string"));
pythontech 0:5868e8752d44 1411 }
pythontech 0:5868e8752d44 1412 arg = args[arg_i++];
pythontech 0:5868e8752d44 1413 }
pythontech 0:5868e8752d44 1414 switch (*str) {
pythontech 0:5868e8752d44 1415 case 'c':
pythontech 0:5868e8752d44 1416 if (MP_OBJ_IS_STR(arg)) {
pythontech 0:5868e8752d44 1417 mp_uint_t slen;
pythontech 0:5868e8752d44 1418 const char *s = mp_obj_str_get_data(arg, &slen);
pythontech 0:5868e8752d44 1419 if (slen != 1) {
pythontech 0:5868e8752d44 1420 nlr_raise(mp_obj_new_exception_msg(&mp_type_TypeError,
pythontech 0:5868e8752d44 1421 "%%c requires int or char"));
pythontech 0:5868e8752d44 1422 }
pythontech 0:5868e8752d44 1423 mp_print_strn(&print, s, 1, flags, ' ', width);
pythontech 0:5868e8752d44 1424 } else if (arg_looks_integer(arg)) {
pythontech 0:5868e8752d44 1425 char ch = mp_obj_get_int(arg);
pythontech 0:5868e8752d44 1426 mp_print_strn(&print, &ch, 1, flags, ' ', width);
pythontech 0:5868e8752d44 1427 } else {
pythontech 0:5868e8752d44 1428 nlr_raise(mp_obj_new_exception_msg(&mp_type_TypeError,
pythontech 0:5868e8752d44 1429 "integer required"));
pythontech 0:5868e8752d44 1430 }
pythontech 0:5868e8752d44 1431 break;
pythontech 0:5868e8752d44 1432
pythontech 0:5868e8752d44 1433 case 'd':
pythontech 0:5868e8752d44 1434 case 'i':
pythontech 0:5868e8752d44 1435 case 'u':
pythontech 0:5868e8752d44 1436 mp_print_mp_int(&print, arg_as_int(arg), 10, 'a', flags, fill, width, prec);
pythontech 0:5868e8752d44 1437 break;
pythontech 0:5868e8752d44 1438
pythontech 0:5868e8752d44 1439 #if MICROPY_PY_BUILTINS_FLOAT
pythontech 0:5868e8752d44 1440 case 'e':
pythontech 0:5868e8752d44 1441 case 'E':
pythontech 0:5868e8752d44 1442 case 'f':
pythontech 0:5868e8752d44 1443 case 'F':
pythontech 0:5868e8752d44 1444 case 'g':
pythontech 0:5868e8752d44 1445 case 'G':
pythontech 0:5868e8752d44 1446 mp_print_float(&print, mp_obj_get_float(arg), *str, flags, fill, width, prec);
pythontech 0:5868e8752d44 1447 break;
pythontech 0:5868e8752d44 1448 #endif
pythontech 0:5868e8752d44 1449
pythontech 0:5868e8752d44 1450 case 'o':
pythontech 0:5868e8752d44 1451 if (alt) {
pythontech 0:5868e8752d44 1452 flags |= (PF_FLAG_SHOW_PREFIX | PF_FLAG_SHOW_OCTAL_LETTER);
pythontech 0:5868e8752d44 1453 }
pythontech 0:5868e8752d44 1454 mp_print_mp_int(&print, arg, 8, 'a', flags, fill, width, prec);
pythontech 0:5868e8752d44 1455 break;
pythontech 0:5868e8752d44 1456
pythontech 0:5868e8752d44 1457 case 'r':
pythontech 0:5868e8752d44 1458 case 's':
pythontech 0:5868e8752d44 1459 {
pythontech 0:5868e8752d44 1460 vstr_t arg_vstr;
pythontech 0:5868e8752d44 1461 mp_print_t arg_print;
pythontech 0:5868e8752d44 1462 vstr_init_print(&arg_vstr, 16, &arg_print);
pythontech 0:5868e8752d44 1463 mp_print_kind_t print_kind = (*str == 'r' ? PRINT_REPR : PRINT_STR);
pythontech 0:5868e8752d44 1464 if (print_kind == PRINT_STR && is_bytes && MP_OBJ_IS_TYPE(arg, &mp_type_bytes)) {
pythontech 0:5868e8752d44 1465 // If we have something like b"%s" % b"1", bytes arg should be
pythontech 0:5868e8752d44 1466 // printed undecorated.
pythontech 0:5868e8752d44 1467 print_kind = PRINT_RAW;
pythontech 0:5868e8752d44 1468 }
pythontech 0:5868e8752d44 1469 mp_obj_print_helper(&arg_print, arg, print_kind);
pythontech 0:5868e8752d44 1470 uint vlen = arg_vstr.len;
pythontech 0:5868e8752d44 1471 if (prec < 0) {
pythontech 0:5868e8752d44 1472 prec = vlen;
pythontech 0:5868e8752d44 1473 }
pythontech 0:5868e8752d44 1474 if (vlen > (uint)prec) {
pythontech 0:5868e8752d44 1475 vlen = prec;
pythontech 0:5868e8752d44 1476 }
pythontech 0:5868e8752d44 1477 mp_print_strn(&print, arg_vstr.buf, vlen, flags, ' ', width);
pythontech 0:5868e8752d44 1478 vstr_clear(&arg_vstr);
pythontech 0:5868e8752d44 1479 break;
pythontech 0:5868e8752d44 1480 }
pythontech 0:5868e8752d44 1481
pythontech 0:5868e8752d44 1482 case 'X':
pythontech 0:5868e8752d44 1483 case 'x':
pythontech 0:5868e8752d44 1484 mp_print_mp_int(&print, arg, 16, *str - ('X' - 'A'), flags | alt, fill, width, prec);
pythontech 0:5868e8752d44 1485 break;
pythontech 0:5868e8752d44 1486
pythontech 0:5868e8752d44 1487 default:
pythontech 0:5868e8752d44 1488 if (MICROPY_ERROR_REPORTING == MICROPY_ERROR_REPORTING_TERSE) {
pythontech 0:5868e8752d44 1489 terse_str_format_value_error();
pythontech 0:5868e8752d44 1490 } else {
pythontech 0:5868e8752d44 1491 nlr_raise(mp_obj_new_exception_msg_varg(&mp_type_ValueError,
pythontech 0:5868e8752d44 1492 "unsupported format character '%c' (0x%x) at index %d",
pythontech 0:5868e8752d44 1493 *str, *str, str - start_str));
pythontech 0:5868e8752d44 1494 }
pythontech 0:5868e8752d44 1495 }
pythontech 0:5868e8752d44 1496 }
pythontech 0:5868e8752d44 1497
pythontech 0:5868e8752d44 1498 if ((uint)arg_i != n_args) {
pythontech 0:5868e8752d44 1499 nlr_raise(mp_obj_new_exception_msg(&mp_type_TypeError, "not all arguments converted during string formatting"));
pythontech 0:5868e8752d44 1500 }
pythontech 0:5868e8752d44 1501
pythontech 0:5868e8752d44 1502 return mp_obj_new_str_from_vstr(is_bytes ? &mp_type_bytes : &mp_type_str, &vstr);
pythontech 0:5868e8752d44 1503 }
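
The 'x' and 'X' cases above pass `*str - ('X' - 'A')` as the base character, which maps 'X' to 'A' and 'x' to 'a' so that one code path can emit either case of hex digit. A minimal sketch of that idea follows, assuming an ASCII-compatible character set; `format_hex` is a hypothetical helper and not part of MicroPython.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Write value in hexadecimal into buf (NUL-terminated). type must be 'x' or
// 'X' and selects the digit case. Returns the number of digits written, or 0
// if buf is too small.
static size_t format_hex(unsigned long value, char type, char *buf, size_t buf_size) {
    assert(type == 'x' || type == 'X');
    // 'X' - 'A' is the distance from a type letter to its base letter, so
    // subtracting it turns 'X' into 'A' and 'x' into 'a'.
    char base_char = (char)(type - ('X' - 'A'));
    char tmp[2 * sizeof(unsigned long)];  // two hex digits per byte
    size_t n = 0;
    do {
        unsigned digit = (unsigned)(value & 0xF);
        tmp[n++] = (char)(digit < 10 ? '0' + digit : base_char + (digit - 10));
        value >>= 4;
    } while (value != 0);
    if (n + 1 > buf_size) {
        return 0;
    }
    // Digits were produced least significant first; reverse them.
    for (size_t i = 0; i < n; i++) {
        buf[i] = tmp[n - 1 - i];
    }
    buf[n] = '\0';
    return n;
}
```

Calling `format_hex(255, 'x', buf, sizeof buf)` fills `buf` with `ff`, and the same call with `'X'` gives `FF`.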
pythontech 0:5868e8752d44 1504
pythontech 0:5868e8752d44 1505 // The implementation is optimized, returning the original string if there's
pythontech 0:5868e8752d44 1506 // nothing to replace.
pythontech 0:5868e8752d44 1507 STATIC mp_obj_t str_replace(size_t n_args, const mp_obj_t *args) {
pythontech 0:5868e8752d44 1508 assert(MP_OBJ_IS_STR_OR_BYTES(args[0]));
pythontech 0:5868e8752d44 1509
pythontech 0:5868e8752d44 1510 mp_int_t max_rep = -1;
pythontech 0:5868e8752d44 1511 if (n_args == 4) {
pythontech 0:5868e8752d44 1512 max_rep = mp_obj_get_int(args[3]);
pythontech 0:5868e8752d44 1513 if (max_rep == 0) {
pythontech 0:5868e8752d44 1514 return args[0];
pythontech 0:5868e8752d44 1515 } else if (max_rep < 0) {
pythontech 0:5868e8752d44 1516 max_rep = -1;
pythontech 0:5868e8752d44 1517 }
pythontech 0:5868e8752d44 1518 }
pythontech 0:5868e8752d44 1519
pythontech 0:5868e8752d44 1520 // if max_rep is still -1 by this point we will need to do all possible replacements
pythontech 0:5868e8752d44 1521
pythontech 0:5868e8752d44 1522 // check argument types
pythontech 0:5868e8752d44 1523
pythontech 0:5868e8752d44 1524 const mp_obj_type_t *self_type = mp_obj_get_type(args[0]);
pythontech 0:5868e8752d44 1525
pythontech 0:5868e8752d44 1526 if (mp_obj_get_type(args[1]) != self_type) {
pythontech 0:5868e8752d44 1527 bad_implicit_conversion(args[1]);
pythontech 0:5868e8752d44 1528 }
pythontech 0:5868e8752d44 1529
pythontech 0:5868e8752d44 1530 if (mp_obj_get_type(args[2]) != self_type) {
pythontech 0:5868e8752d44 1531 bad_implicit_conversion(args[2]);
pythontech 0:5868e8752d44 1532 }
pythontech 0:5868e8752d44 1533
pythontech 0:5868e8752d44 1534 // extract string data
pythontech 0:5868e8752d44 1535
pythontech 0:5868e8752d44 1536 GET_STR_DATA_LEN(args[0], str, str_len);
pythontech 0:5868e8752d44 1537 GET_STR_DATA_LEN(args[1], old, old_len);
pythontech 0:5868e8752d44 1538 GET_STR_DATA_LEN(args[2], new, new_len);
pythontech 0:5868e8752d44 1539
pythontech 0:5868e8752d44 1540 // old won't exist in str if it's longer, so nothing to replace
pythontech 0:5868e8752d44 1541 if (old_len > str_len) {
pythontech 0:5868e8752d44 1542 return args[0];
pythontech 0:5868e8752d44 1543 }
pythontech 0:5868e8752d44 1544
pythontech 0:5868e8752d44 1545 // data for the replaced string
pythontech 0:5868e8752d44 1546 byte *data = NULL;
pythontech 0:5868e8752d44 1547 vstr_t vstr;
pythontech 0:5868e8752d44 1548
pythontech 0:5868e8752d44 1549 // do 2 passes over the string:
pythontech 0:5868e8752d44 1550 // first pass computes the required length of the replaced string
pythontech 0:5868e8752d44 1551 // second pass does the replacements
pythontech 0:5868e8752d44 1552 for (;;) {
pythontech 0:5868e8752d44 1553 mp_uint_t replaced_str_index = 0;
pythontech 0:5868e8752d44 1554 mp_uint_t num_replacements_done = 0;
pythontech 0:5868e8752d44 1555 const byte *old_occurrence;
pythontech 0:5868e8752d44 1556 const byte *offset_ptr = str;
pythontech 0:5868e8752d44 1557 mp_uint_t str_len_remain = str_len;
pythontech 0:5868e8752d44 1558 if (old_len == 0) {
pythontech 0:5868e8752d44 1559 // if old_str is empty, copy new_str to start of replaced string
pythontech 0:5868e8752d44 1560 // copy the replacement string
pythontech 0:5868e8752d44 1561 if (data != NULL) {
pythontech 0:5868e8752d44 1562 memcpy(data, new, new_len);
pythontech 0:5868e8752d44 1563 }
pythontech 0:5868e8752d44 1564 replaced_str_index += new_len;
pythontech 0:5868e8752d44 1565 num_replacements_done++;
pythontech 0:5868e8752d44 1566 }
pythontech 0:5868e8752d44 1567 while (num_replacements_done != (mp_uint_t)max_rep && str_len_remain > 0 && (old_occurrence = find_subbytes(offset_ptr, str_len_remain, old, old_len, 1)) != NULL) {
pythontech 0:5868e8752d44 1568 if (old_len == 0) {
pythontech 0:5868e8752d44 1569 old_occurrence += 1;
pythontech 0:5868e8752d44 1570 }
pythontech 0:5868e8752d44 1571 // copy from just after end of last occurrence of to-be-replaced string to right before start of next occurrence
pythontech 0:5868e8752d44 1572 if (data != NULL) {
pythontech 0:5868e8752d44 1573 memcpy(data + replaced_str_index, offset_ptr, old_occurrence - offset_ptr);
pythontech 0:5868e8752d44 1574 }
pythontech 0:5868e8752d44 1575 replaced_str_index += old_occurrence - offset_ptr;
pythontech 0:5868e8752d44 1576 // copy the replacement string
pythontech 0:5868e8752d44 1577 if (data != NULL) {
pythontech 0:5868e8752d44 1578 memcpy(data + replaced_str_index, new, new_len);
pythontech 0:5868e8752d44 1579 }
pythontech 0:5868e8752d44 1580 replaced_str_index += new_len;
pythontech 0:5868e8752d44 1581 offset_ptr = old_occurrence + old_len;
pythontech 0:5868e8752d44 1582 str_len_remain = str + str_len - offset_ptr;
pythontech 0:5868e8752d44 1583 num_replacements_done++;
pythontech 0:5868e8752d44 1584 }
pythontech 0:5868e8752d44 1585
pythontech 0:5868e8752d44 1586 // copy from just after end of last occurrence of to-be-replaced string to end of old string
pythontech 0:5868e8752d44 1587 if (data != NULL) {
pythontech 0:5868e8752d44 1588 memcpy(data + replaced_str_index, offset_ptr, str_len_remain);
pythontech 0:5868e8752d44 1589 }
pythontech 0:5868e8752d44 1590 replaced_str_index += str_len_remain;
pythontech 0:5868e8752d44 1591
pythontech 0:5868e8752d44 1592 if (data == NULL) {
pythontech 0:5868e8752d44 1593 // first pass
pythontech 0:5868e8752d44 1594 if (num_replacements_done == 0) {
pythontech 0:5868e8752d44 1595 // no substr found, return original string
pythontech 0:5868e8752d44 1596 return args[0];
pythontech 0:5868e8752d44 1597 } else {
pythontech 0:5868e8752d44 1598 // substr found, allocate new string
pythontech 0:5868e8752d44 1599 vstr_init_len(&vstr, replaced_str_index);
pythontech 0:5868e8752d44 1600 data = (byte*)vstr.buf;
pythontech 0:5868e8752d44 1601 assert(data != NULL);
pythontech 0:5868e8752d44 1602 }
pythontech 0:5868e8752d44 1603 } else {
pythontech 0:5868e8752d44 1604 // second pass, we are done
pythontech 0:5868e8752d44 1605 break;
pythontech 0:5868e8752d44 1606 }
pythontech 0:5868e8752d44 1607 }
pythontech 0:5868e8752d44 1608
pythontech 0:5868e8752d44 1609 return mp_obj_new_str_from_vstr(self_type, &vstr);
pythontech 0:5868e8752d44 1610 }
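
The two-pass structure of str_replace above (first pass measures, second pass copies) can be sketched in plain C as follows. This is an illustration under simplifying assumptions, not the MicroPython code: `replace_two_pass` is a hypothetical name, it works on NUL-terminated strings, requires a non-empty search string, has no replacement limit, and always allocates a result rather than returning the original when nothing matches.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

// Returns a newly malloc'd string with every non-overlapping occurrence of
// old_s in s replaced by new_s, or NULL on allocation failure. The caller
// frees the result. old_s must be non-empty.
static char *replace_two_pass(const char *s, const char *old_s, const char *new_s) {
    size_t old_len = strlen(old_s);
    size_t new_len = strlen(new_s);
    assert(old_len > 0);
    char *data = NULL;  // NULL during the measuring pass
    for (;;) {
        size_t out = 0;       // write index into the result
        const char *p = s;    // start of the not-yet-copied input
        const char *hit;
        while ((hit = strstr(p, old_s)) != NULL) {
            size_t gap = (size_t)(hit - p);
            if (data != NULL) {
                memcpy(data + out, p, gap);               // text before the match
                memcpy(data + out + gap, new_s, new_len); // the replacement
            }
            out += gap + new_len;
            p = hit + old_len;
        }
        size_t tail = strlen(p);
        if (data != NULL) {
            // Second pass: copy the remainder and finish.
            memcpy(data + out, p, tail);
            data[out + tail] = '\0';
            return data;
        }
        // First pass done: the exact output length is now known.
        data = malloc(out + tail + 1);
        if (data == NULL) {
            return NULL;
        }
    }
}
```

Running the same loop twice trades a second scan of the input for a single exact-size allocation, which avoids repeatedly growing a buffer on a memory-constrained target.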
pythontech 0:5868e8752d44 1611
pythontech 0:5868e8752d44 1612 STATIC mp_obj_t str_count(size_t n_args, const mp_obj_t *args) {
pythontech 0:5868e8752d44 1613 const mp_obj_type_t *self_type = mp_obj_get_type(args[0]);
pythontech 0:5868e8752d44 1614 assert(2 <= n_args && n_args <= 4);
pythontech 0:5868e8752d44 1615 assert(MP_OBJ_IS_STR_OR_BYTES(args[0]));
pythontech 0:5868e8752d44 1616
pythontech 0:5868e8752d44 1617 // check argument type
pythontech 0:5868e8752d44 1618 if (mp_obj_get_type(args[1]) != self_type) {
pythontech 0:5868e8752d44 1619 bad_implicit_conversion(args[1]);
pythontech 0:5868e8752d44 1620 }
pythontech 0:5868e8752d44 1621
pythontech 0:5868e8752d44 1622 GET_STR_DATA_LEN(args[0], haystack, haystack_len);
pythontech 0:5868e8752d44 1623 GET_STR_DATA_LEN(args[1], needle, needle_len);
pythontech 0:5868e8752d44 1624
pythontech 0:5868e8752d44 1625 const byte *start = haystack;
pythontech 0:5868e8752d44 1626 const byte *end = haystack + haystack_len;
pythontech 0:5868e8752d44 1627 if (n_args >= 3 && args[2] != mp_const_none) {
pythontech 0:5868e8752d44 1628 start = str_index_to_ptr(self_type, haystack, haystack_len, args[2], true);
pythontech 0:5868e8752d44 1629 }
pythontech 0:5868e8752d44 1630 if (n_args >= 4 && args[3] != mp_const_none) {
pythontech 0:5868e8752d44 1631 end = str_index_to_ptr(self_type, haystack, haystack_len, args[3], true);
pythontech 0:5868e8752d44 1632 }
pythontech 0:5868e8752d44 1633
pythontech 0:5868e8752d44 1634 // if needle_len is zero then we count each gap between characters as an occurrence
pythontech 0:5868e8752d44 1635 if (needle_len == 0) {
pythontech 0:5868e8752d44 1636 return MP_OBJ_NEW_SMALL_INT(unichar_charlen((const char*)start, end - start) + 1);
pythontech 0:5868e8752d44 1637 }
pythontech 0:5868e8752d44 1638
pythontech 0:5868e8752d44 1639 // count the occurrences
pythontech 0:5868e8752d44 1640 mp_int_t num_occurrences = 0;
pythontech 0:5868e8752d44 1641 for (const byte *haystack_ptr = start; haystack_ptr + needle_len <= end;) {
pythontech 0:5868e8752d44 1642 if (memcmp(haystack_ptr, needle, needle_len) == 0) {
pythontech 0:5868e8752d44 1643 num_occurrences++;
pythontech 0:5868e8752d44 1644 haystack_ptr += needle_len;
pythontech 0:5868e8752d44 1645 } else {
pythontech 0:5868e8752d44 1646 haystack_ptr = utf8_next_char(haystack_ptr);
pythontech 0:5868e8752d44 1647 }
pythontech 0:5868e8752d44 1648 }
pythontech 0:5868e8752d44 1649
pythontech 0:5868e8752d44 1650 return MP_OBJ_NEW_SMALL_INT(num_occurrences);
pythontech 0:5868e8752d44 1651 }
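
The counting loop in str_count above skips past each match, so occurrences never overlap. A byte-oriented sketch of that loop follows; `count_nonoverlapping` is a hypothetical name, and it assumes single-byte characters, whereas the code above advances with utf8_next_char on a mismatch and counts characters rather than bytes for an empty needle.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Count non-overlapping occurrences of needle in hay. An empty needle
// matches at every gap between bytes, including both ends.
static size_t count_nonoverlapping(const char *hay, size_t hay_len,
                                   const char *needle, size_t needle_len) {
    if (needle_len == 0) {
        return hay_len + 1;
    }
    size_t count = 0;
    for (size_t i = 0; i + needle_len <= hay_len;) {
        if (memcmp(hay + i, needle, needle_len) == 0) {
            count++;
            i += needle_len;  // jump past the match so matches cannot overlap
        } else {
            i++;              // byte step; the real code steps one UTF-8 char
        }
    }
    return count;
}
```

With this rule `count_nonoverlapping("aaaa", 4, "aa", 2)` is 2 rather than 3, matching Python's `'aaaa'.count('aa')`.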
pythontech 0:5868e8752d44 1652
pythontech 0:5868e8752d44 1653 STATIC mp_obj_t str_partitioner(mp_obj_t self_in, mp_obj_t arg, mp_int_t direction) {
pythontech 0:5868e8752d44 1654 assert(MP_OBJ_IS_STR_OR_BYTES(self_in));
pythontech 0:5868e8752d44 1655 mp_obj_type_t *self_type = mp_obj_get_type(self_in);
pythontech 0:5868e8752d44 1656 if (self_type != mp_obj_get_type(arg)) {
pythontech 0:5868e8752d44 1657 bad_implicit_conversion(arg);
pythontech 0:5868e8752d44 1658 }
pythontech 0:5868e8752d44 1659
pythontech 0:5868e8752d44 1660 GET_STR_DATA_LEN(self_in, str, str_len);
pythontech 0:5868e8752d44 1661 GET_STR_DATA_LEN(arg, sep, sep_len);
pythontech 0:5868e8752d44 1662
pythontech 0:5868e8752d44 1663 if (sep_len == 0) {
pythontech 0:5868e8752d44 1664 nlr_raise(mp_obj_new_exception_msg(&mp_type_ValueError, "empty separator"));
pythontech 0:5868e8752d44 1665 }
pythontech 0:5868e8752d44 1666
pythontech 0:5868e8752d44 1667 mp_obj_t result[3];
pythontech 0:5868e8752d44 1668 if (self_type == &mp_type_str) {
pythontech 0:5868e8752d44 1669 result[0] = MP_OBJ_NEW_QSTR(MP_QSTR_);
pythontech 0:5868e8752d44 1670 result[1] = MP_OBJ_NEW_QSTR(MP_QSTR_);
pythontech 0:5868e8752d44 1671 result[2] = MP_OBJ_NEW_QSTR(MP_QSTR_);
pythontech 0:5868e8752d44 1672 } else {
pythontech 0:5868e8752d44 1673 result[0] = mp_const_empty_bytes;
pythontech 0:5868e8752d44 1674 result[1] = mp_const_empty_bytes;
pythontech 0:5868e8752d44 1675 result[2] = mp_const_empty_bytes;
pythontech 0:5868e8752d44 1676 }
pythontech 0:5868e8752d44 1677
pythontech 0:5868e8752d44 1678 if (direction > 0) {
pythontech 0:5868e8752d44 1679 result[0] = self_in;
pythontech 0:5868e8752d44 1680 } else {
pythontech 0:5868e8752d44 1681 result[2] = self_in;
pythontech 0:5868e8752d44 1682 }
pythontech 0:5868e8752d44 1683
pythontech 0:5868e8752d44 1684 const byte *position_ptr = find_subbytes(str, str_len, sep, sep_len, direction);
pythontech 0:5868e8752d44 1685 if (position_ptr != NULL) {
pythontech 0:5868e8752d44 1686 mp_uint_t position = position_ptr - str;
pythontech 0:5868e8752d44 1687 result[0] = mp_obj_new_str_of_type(self_type, str, position);
pythontech 0:5868e8752d44 1688 result[1] = arg;
pythontech 0:5868e8752d44 1689 result[2] = mp_obj_new_str_of_type(self_type, str + position + sep_len, str_len - position - sep_len);
pythontech 0:5868e8752d44 1690 }
pythontech 0:5868e8752d44 1691
pythontech 0:5868e8752d44 1692 return mp_obj_new_tuple(3, result);
pythontech 0:5868e8752d44 1693 }
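The forward case of the partitioner above can be sketched on plain buffers. The names `demo_partition_t` and `demo_partition` are hypothetical and exist only for this illustration. The sketch returns offsets instead of allocating new objects, and it asserts a non-empty separator where the listing raises ValueError.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// Hypothetical result type: describes the three parts by offset and length.
typedef struct {
    bool found;        // whether the separator occurs in the input
    size_t head_len;   // length of the part before the separator
    size_t tail_start; // offset of the part after the separator
    size_t tail_len;   // length of the part after the separator
} demo_partition_t;

// Split str at the first occurrence of sep (forward direction only).
static demo_partition_t demo_partition(const char *str, size_t str_len,
                                       const char *sep, size_t sep_len) {
    assert(str != NULL && sep != NULL && sep_len > 0);

    // Default: separator not found, so the whole input is the head
    // and the tail is empty.
    demo_partition_t r = { false, str_len, str_len, 0 };

    for (size_t i = 0; i + sep_len <= str_len; i++) {
        if (memcmp(str + i, sep, sep_len) == 0) {
            r.found = true;
            r.head_len = i;
            r.tail_start = i + sep_len;
            r.tail_len = str_len - i - sep_len;
            break;
        }
    }
    return r;
}
```

The default result mirrors the listing: when the separator is absent, the forward partition keeps the whole input in the first slot and leaves the other two empty.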
pythontech 0:5868e8752d44 1694
pythontech 0:5868e8752d44 1695 STATIC mp_obj_t str_partition(mp_obj_t self_in, mp_obj_t arg) {
pythontech 0:5868e8752d44 1696 return str_partitioner(self_in, arg, 1);
pythontech 0:5868e8752d44 1697 }
pythontech 0:5868e8752d44 1698
pythontech 0:5868e8752d44 1699 STATIC mp_obj_t str_rpartition(mp_obj_t self_in, mp_obj_t arg) {
pythontech 0:5868e8752d44 1700 return str_partitioner(self_in, arg, -1);
pythontech 0:5868e8752d44 1701 }
pythontech 0:5868e8752d44 1702
pythontech 0:5868e8752d44 1703 // These operations are presumably not performance-critical, so optimize for code size
pythontech 0:5868e8752d44 1704 STATIC mp_obj_t str_caseconv(unichar (*op)(unichar), mp_obj_t self_in) {
pythontech 0:5868e8752d44 1705 GET_STR_DATA_LEN(self_in, self_data, self_len);
pythontech 0:5868e8752d44 1706 vstr_t vstr;
pythontech 0:5868e8752d44 1707 vstr_init_len(&vstr, self_len);
pythontech 0:5868e8752d44 1708 byte *data = (byte*)vstr.buf;
pythontech 0:5868e8752d44 1709 for (mp_uint_t i = 0; i < self_len; i++) {
pythontech 0:5868e8752d44 1710 *data++ = op(*self_data++);
pythontech 0:5868e8752d44 1711 }
pythontech 0:5868e8752d44 1712 return mp_obj_new_str_from_vstr(mp_obj_get_type(self_in), &vstr);
pythontech 0:5868e8752d44 1713 }
pythontech 0:5868e8752d44 1714
pythontech 0:5868e8752d44 1715 STATIC mp_obj_t str_lower(mp_obj_t self_in) {
pythontech 0:5868e8752d44 1716 return str_caseconv(unichar_tolower, self_in);
pythontech 0:5868e8752d44 1717 }
pythontech 0:5868e8752d44 1718
pythontech 0:5868e8752d44 1719 STATIC mp_obj_t str_upper(mp_obj_t self_in) {
pythontech 0:5868e8752d44 1720 return str_caseconv(unichar_toupper, self_in);
pythontech 0:5868e8752d44 1721 }
pythontech 0:5868e8752d44 1722
pythontech 0:5868e8752d44 1723 STATIC mp_obj_t str_uni_istype(bool (*f)(unichar), mp_obj_t self_in) {
pythontech 0:5868e8752d44 1724 GET_STR_DATA_LEN(self_in, self_data, self_len);
pythontech 0:5868e8752d44 1725
pythontech 0:5868e8752d44 1726 if (self_len == 0) {
pythontech 0:5868e8752d44 1727 return mp_const_false; // default to False for empty str
pythontech 0:5868e8752d44 1728 }
pythontech 0:5868e8752d44 1729
pythontech 0:5868e8752d44 1730 if (f != unichar_isupper && f != unichar_islower) {
pythontech 0:5868e8752d44 1731 for (mp_uint_t i = 0; i < self_len; i++) {
pythontech 0:5868e8752d44 1732 if (!f(*self_data++)) {
pythontech 0:5868e8752d44 1733 return mp_const_false;
pythontech 0:5868e8752d44 1734 }
pythontech 0:5868e8752d44 1735 }
pythontech 0:5868e8752d44 1736 } else {
pythontech 0:5868e8752d44 1737 bool contains_alpha = false;
pythontech 0:5868e8752d44 1738
pythontech 0:5868e8752d44 1739 for (mp_uint_t i = 0; i < self_len; i++) { // only check alphabetic characters
pythontech 0:5868e8752d44 1740 if (unichar_isalpha(*self_data++)) {
pythontech 0:5868e8752d44 1741 contains_alpha = true;
pythontech 0:5868e8752d44 1742 if (!f(*(self_data - 1))) { // -1 because we already incremented above
pythontech 0:5868e8752d44 1743 return mp_const_false;
pythontech 0:5868e8752d44 1744 }
pythontech 0:5868e8752d44 1745 }
pythontech 0:5868e8752d44 1746 }
pythontech 0:5868e8752d44 1747
pythontech 0:5868e8752d44 1748 if (!contains_alpha) {
pythontech 0:5868e8752d44 1749 return mp_const_false;
pythontech 0:5868e8752d44 1750 }
pythontech 0:5868e8752d44 1751 }
pythontech 0:5868e8752d44 1752
pythontech 0:5868e8752d44 1753 return mp_const_true;
pythontech 0:5868e8752d44 1754 }
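The isupper/islower branch above ignores non-letters but requires at least one letter. A minimal ASCII-only sketch of that rule, using `<ctype.h>` in place of the `unichar_*` helpers, might look like this. The name `ascii_isupper_str` is hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

// Hypothetical helper: true if s contains at least one letter and every
// letter in it is uppercase. Non-letters are ignored. ASCII only.
static bool ascii_isupper_str(const char *s, size_t len) {
    assert(s != NULL);

    bool contains_alpha = false;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)s[i]; // avoid negative values in <ctype.h> calls
        if (isalpha(c)) {
            contains_alpha = true;
            if (!isupper(c)) {
                return false;
            }
        }
    }
    // A string with no letters at all (including the empty string) is not upper.
    return contains_alpha;
}
```

With this rule "ABC1" counts as uppercase, while "123" and the empty string do not.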
pythontech 0:5868e8752d44 1755
pythontech 0:5868e8752d44 1756 STATIC mp_obj_t str_isspace(mp_obj_t self_in) {
pythontech 0:5868e8752d44 1757 return str_uni_istype(unichar_isspace, self_in);
pythontech 0:5868e8752d44 1758 }
pythontech 0:5868e8752d44 1759
pythontech 0:5868e8752d44 1760 STATIC mp_obj_t str_isalpha(mp_obj_t self_in) {
pythontech 0:5868e8752d44 1761 return str_uni_istype(unichar_isalpha, self_in);
pythontech 0:5868e8752d44 1762 }
pythontech 0:5868e8752d44 1763
pythontech 0:5868e8752d44 1764 STATIC mp_obj_t str_isdigit(mp_obj_t self_in) {
pythontech 0:5868e8752d44 1765 return str_uni_istype(unichar_isdigit, self_in);
pythontech 0:5868e8752d44 1766 }
pythontech 0:5868e8752d44 1767
pythontech 0:5868e8752d44 1768 STATIC mp_obj_t str_isupper(mp_obj_t self_in) {
pythontech 0:5868e8752d44 1769 return str_uni_istype(unichar_isupper, self_in);
pythontech 0:5868e8752d44 1770 }
pythontech 0:5868e8752d44 1771
pythontech 0:5868e8752d44 1772 STATIC mp_obj_t str_islower(mp_obj_t self_in) {
pythontech 0:5868e8752d44 1773 return str_uni_istype(unichar_islower, self_in);
pythontech 0:5868e8752d44 1774 }
pythontech 0:5868e8752d44 1775
pythontech 0:5868e8752d44 1776 #if MICROPY_CPYTHON_COMPAT
pythontech 0:5868e8752d44 1777 // These methods are superfluous in the presence of str() and bytes()
pythontech 0:5868e8752d44 1778 // constructors.
pythontech 0:5868e8752d44 1779 // TODO: should accept kwargs too
pythontech 0:5868e8752d44 1780 STATIC mp_obj_t bytes_decode(size_t n_args, const mp_obj_t *args) {
pythontech 0:5868e8752d44 1781 mp_obj_t new_args[2];
pythontech 0:5868e8752d44 1782 if (n_args == 1) {
pythontech 0:5868e8752d44 1783 new_args[0] = args[0];
pythontech 0:5868e8752d44 1784 new_args[1] = MP_OBJ_NEW_QSTR(MP_QSTR_utf_hyphen_8);
pythontech 0:5868e8752d44 1785 args = new_args;
pythontech 0:5868e8752d44 1786 n_args++;
pythontech 0:5868e8752d44 1787 }
pythontech 0:5868e8752d44 1788 return mp_obj_str_make_new(&mp_type_str, n_args, 0, args);
pythontech 0:5868e8752d44 1789 }
pythontech 0:5868e8752d44 1790
pythontech 0:5868e8752d44 1791 // TODO: should accept kwargs too
pythontech 0:5868e8752d44 1792 STATIC mp_obj_t str_encode(size_t n_args, const mp_obj_t *args) {
pythontech 0:5868e8752d44 1793 mp_obj_t new_args[2];
pythontech 0:5868e8752d44 1794 if (n_args == 1) {
pythontech 0:5868e8752d44 1795 new_args[0] = args[0];
pythontech 0:5868e8752d44 1796 new_args[1] = MP_OBJ_NEW_QSTR(MP_QSTR_utf_hyphen_8);
pythontech 0:5868e8752d44 1797 args = new_args;
pythontech 0:5868e8752d44 1798 n_args++;
pythontech 0:5868e8752d44 1799 }
pythontech 0:5868e8752d44 1800 return bytes_make_new(NULL, n_args, 0, args);
pythontech 0:5868e8752d44 1801 }
pythontech 0:5868e8752d44 1802 #endif
pythontech 0:5868e8752d44 1803
pythontech 0:5868e8752d44 1804 mp_int_t mp_obj_str_get_buffer(mp_obj_t self_in, mp_buffer_info_t *bufinfo, mp_uint_t flags) {
pythontech 0:5868e8752d44 1805 if (flags == MP_BUFFER_READ) {
pythontech 0:5868e8752d44 1806 GET_STR_DATA_LEN(self_in, str_data, str_len);
pythontech 0:5868e8752d44 1807 bufinfo->buf = (void*)str_data;
pythontech 0:5868e8752d44 1808 bufinfo->len = str_len;
pythontech 0:5868e8752d44 1809 bufinfo->typecode = 'b';
pythontech 0:5868e8752d44 1810 return 0;
pythontech 0:5868e8752d44 1811 } else {
pythontech 0:5868e8752d44 1812 // can't write to a string
pythontech 0:5868e8752d44 1813 bufinfo->buf = NULL;
pythontech 0:5868e8752d44 1814 bufinfo->len = 0;
pythontech 0:5868e8752d44 1815 bufinfo->typecode = -1;
pythontech 0:5868e8752d44 1816 return 1;
pythontech 0:5868e8752d44 1817 }
pythontech 0:5868e8752d44 1818 }
pythontech 0:5868e8752d44 1819
pythontech 0:5868e8752d44 1820 #if MICROPY_CPYTHON_COMPAT
pythontech 0:5868e8752d44 1821 MP_DEFINE_CONST_FUN_OBJ_VAR_BETWEEN(bytes_decode_obj, 1, 3, bytes_decode);
pythontech 0:5868e8752d44 1822 MP_DEFINE_CONST_FUN_OBJ_VAR_BETWEEN(str_encode_obj, 1, 3, str_encode);
pythontech 0:5868e8752d44 1823 #endif
pythontech 0:5868e8752d44 1824 MP_DEFINE_CONST_FUN_OBJ_VAR_BETWEEN(str_find_obj, 2, 4, str_find);
pythontech 0:5868e8752d44 1825 MP_DEFINE_CONST_FUN_OBJ_VAR_BETWEEN(str_rfind_obj, 2, 4, str_rfind);
pythontech 0:5868e8752d44 1826 MP_DEFINE_CONST_FUN_OBJ_VAR_BETWEEN(str_index_obj, 2, 4, str_index);
pythontech 0:5868e8752d44 1827 MP_DEFINE_CONST_FUN_OBJ_VAR_BETWEEN(str_rindex_obj, 2, 4, str_rindex);
pythontech 0:5868e8752d44 1828 MP_DEFINE_CONST_FUN_OBJ_2(str_join_obj, str_join);
pythontech 0:5868e8752d44 1829 MP_DEFINE_CONST_FUN_OBJ_VAR_BETWEEN(str_split_obj, 1, 3, mp_obj_str_split);
pythontech 0:5868e8752d44 1830 #if MICROPY_PY_BUILTINS_STR_SPLITLINES
pythontech 0:5868e8752d44 1831 MP_DEFINE_CONST_FUN_OBJ_KW(str_splitlines_obj, 1, str_splitlines);
pythontech 0:5868e8752d44 1832 #endif
pythontech 0:5868e8752d44 1833 MP_DEFINE_CONST_FUN_OBJ_VAR_BETWEEN(str_rsplit_obj, 1, 3, str_rsplit);
pythontech 0:5868e8752d44 1834 MP_DEFINE_CONST_FUN_OBJ_VAR_BETWEEN(str_startswith_obj, 2, 3, str_startswith);
pythontech 0:5868e8752d44 1835 MP_DEFINE_CONST_FUN_OBJ_VAR_BETWEEN(str_endswith_obj, 2, 3, str_endswith);
pythontech 0:5868e8752d44 1836 MP_DEFINE_CONST_FUN_OBJ_VAR_BETWEEN(str_strip_obj, 1, 2, str_strip);
pythontech 0:5868e8752d44 1837 MP_DEFINE_CONST_FUN_OBJ_VAR_BETWEEN(str_lstrip_obj, 1, 2, str_lstrip);
pythontech 0:5868e8752d44 1838 MP_DEFINE_CONST_FUN_OBJ_VAR_BETWEEN(str_rstrip_obj, 1, 2, str_rstrip);
pythontech 0:5868e8752d44 1839 MP_DEFINE_CONST_FUN_OBJ_KW(str_format_obj, 1, mp_obj_str_format);
pythontech 0:5868e8752d44 1840 MP_DEFINE_CONST_FUN_OBJ_VAR_BETWEEN(str_replace_obj, 3, 4, str_replace);
pythontech 0:5868e8752d44 1841 MP_DEFINE_CONST_FUN_OBJ_VAR_BETWEEN(str_count_obj, 2, 4, str_count);
pythontech 0:5868e8752d44 1842 MP_DEFINE_CONST_FUN_OBJ_2(str_partition_obj, str_partition);
pythontech 0:5868e8752d44 1843 MP_DEFINE_CONST_FUN_OBJ_2(str_rpartition_obj, str_rpartition);
pythontech 0:5868e8752d44 1844 MP_DEFINE_CONST_FUN_OBJ_1(str_lower_obj, str_lower);
pythontech 0:5868e8752d44 1845 MP_DEFINE_CONST_FUN_OBJ_1(str_upper_obj, str_upper);
pythontech 0:5868e8752d44 1846 MP_DEFINE_CONST_FUN_OBJ_1(str_isspace_obj, str_isspace);
pythontech 0:5868e8752d44 1847 MP_DEFINE_CONST_FUN_OBJ_1(str_isalpha_obj, str_isalpha);
pythontech 0:5868e8752d44 1848 MP_DEFINE_CONST_FUN_OBJ_1(str_isdigit_obj, str_isdigit);
pythontech 0:5868e8752d44 1849 MP_DEFINE_CONST_FUN_OBJ_1(str_isupper_obj, str_isupper);
pythontech 0:5868e8752d44 1850 MP_DEFINE_CONST_FUN_OBJ_1(str_islower_obj, str_islower);
pythontech 0:5868e8752d44 1851
pythontech 0:5868e8752d44 1852 STATIC const mp_rom_map_elem_t str8_locals_dict_table[] = {
pythontech 0:5868e8752d44 1853 #if MICROPY_CPYTHON_COMPAT
pythontech 0:5868e8752d44 1854 { MP_ROM_QSTR(MP_QSTR_decode), MP_ROM_PTR(&bytes_decode_obj) },
pythontech 0:5868e8752d44 1855 #if !MICROPY_PY_BUILTINS_STR_UNICODE
pythontech 0:5868e8752d44 1856 // If there is a separate unicode str type, this table holds methods for
pythontech 0:5868e8752d44 1857 // the bytes type only, which should not have an encode() method. Otherwise
pythontech 0:5868e8752d44 1858 // str is a non-compliant-but-practical bytestring type that shares this
pythontech 0:5868e8752d44 1859 // method table with bytes, so both have encode() and decode() methods
pythontech 0:5868e8752d44 1860 // (which should do type checking at runtime).
pythontech 0:5868e8752d44 1861 { MP_ROM_QSTR(MP_QSTR_encode), MP_ROM_PTR(&str_encode_obj) },
pythontech 0:5868e8752d44 1862 #endif
pythontech 0:5868e8752d44 1863 #endif
pythontech 0:5868e8752d44 1864 { MP_ROM_QSTR(MP_QSTR_find), MP_ROM_PTR(&str_find_obj) },
pythontech 0:5868e8752d44 1865 { MP_ROM_QSTR(MP_QSTR_rfind), MP_ROM_PTR(&str_rfind_obj) },
pythontech 0:5868e8752d44 1866 { MP_ROM_QSTR(MP_QSTR_index), MP_ROM_PTR(&str_index_obj) },
pythontech 0:5868e8752d44 1867 { MP_ROM_QSTR(MP_QSTR_rindex), MP_ROM_PTR(&str_rindex_obj) },
pythontech 0:5868e8752d44 1868 { MP_ROM_QSTR(MP_QSTR_join), MP_ROM_PTR(&str_join_obj) },
pythontech 0:5868e8752d44 1869 { MP_ROM_QSTR(MP_QSTR_split), MP_ROM_PTR(&str_split_obj) },
pythontech 0:5868e8752d44 1870 #if MICROPY_PY_BUILTINS_STR_SPLITLINES
pythontech 0:5868e8752d44 1871 { MP_ROM_QSTR(MP_QSTR_splitlines), MP_ROM_PTR(&str_splitlines_obj) },
pythontech 0:5868e8752d44 1872 #endif
pythontech 0:5868e8752d44 1873 { MP_ROM_QSTR(MP_QSTR_rsplit), MP_ROM_PTR(&str_rsplit_obj) },
pythontech 0:5868e8752d44 1874 { MP_ROM_QSTR(MP_QSTR_startswith), MP_ROM_PTR(&str_startswith_obj) },
pythontech 0:5868e8752d44 1875 { MP_ROM_QSTR(MP_QSTR_endswith), MP_ROM_PTR(&str_endswith_obj) },
pythontech 0:5868e8752d44 1876 { MP_ROM_QSTR(MP_QSTR_strip), MP_ROM_PTR(&str_strip_obj) },
pythontech 0:5868e8752d44 1877 { MP_ROM_QSTR(MP_QSTR_lstrip), MP_ROM_PTR(&str_lstrip_obj) },
pythontech 0:5868e8752d44 1878 { MP_ROM_QSTR(MP_QSTR_rstrip), MP_ROM_PTR(&str_rstrip_obj) },
pythontech 0:5868e8752d44 1879 { MP_ROM_QSTR(MP_QSTR_format), MP_ROM_PTR(&str_format_obj) },
pythontech 0:5868e8752d44 1880 { MP_ROM_QSTR(MP_QSTR_replace), MP_ROM_PTR(&str_replace_obj) },
pythontech 0:5868e8752d44 1881 { MP_ROM_QSTR(MP_QSTR_count), MP_ROM_PTR(&str_count_obj) },
pythontech 0:5868e8752d44 1882 { MP_ROM_QSTR(MP_QSTR_partition), MP_ROM_PTR(&str_partition_obj) },
pythontech 0:5868e8752d44 1883 { MP_ROM_QSTR(MP_QSTR_rpartition), MP_ROM_PTR(&str_rpartition_obj) },
pythontech 0:5868e8752d44 1884 { MP_ROM_QSTR(MP_QSTR_lower), MP_ROM_PTR(&str_lower_obj) },
pythontech 0:5868e8752d44 1885 { MP_ROM_QSTR(MP_QSTR_upper), MP_ROM_PTR(&str_upper_obj) },
pythontech 0:5868e8752d44 1886 { MP_ROM_QSTR(MP_QSTR_isspace), MP_ROM_PTR(&str_isspace_obj) },
pythontech 0:5868e8752d44 1887 { MP_ROM_QSTR(MP_QSTR_isalpha), MP_ROM_PTR(&str_isalpha_obj) },
pythontech 0:5868e8752d44 1888 { MP_ROM_QSTR(MP_QSTR_isdigit), MP_ROM_PTR(&str_isdigit_obj) },
pythontech 0:5868e8752d44 1889 { MP_ROM_QSTR(MP_QSTR_isupper), MP_ROM_PTR(&str_isupper_obj) },
pythontech 0:5868e8752d44 1890 { MP_ROM_QSTR(MP_QSTR_islower), MP_ROM_PTR(&str_islower_obj) },
pythontech 0:5868e8752d44 1891 };
pythontech 0:5868e8752d44 1892
pythontech 0:5868e8752d44 1893 STATIC MP_DEFINE_CONST_DICT(str8_locals_dict, str8_locals_dict_table);
pythontech 0:5868e8752d44 1894
pythontech 0:5868e8752d44 1895 #if !MICROPY_PY_BUILTINS_STR_UNICODE
pythontech 0:5868e8752d44 1896 STATIC mp_obj_t mp_obj_new_str_iterator(mp_obj_t str);
pythontech 0:5868e8752d44 1897
pythontech 0:5868e8752d44 1898 const mp_obj_type_t mp_type_str = {
pythontech 0:5868e8752d44 1899 { &mp_type_type },
pythontech 0:5868e8752d44 1900 .name = MP_QSTR_str,
pythontech 0:5868e8752d44 1901 .print = str_print,
pythontech 0:5868e8752d44 1902 .make_new = mp_obj_str_make_new,
pythontech 0:5868e8752d44 1903 .binary_op = mp_obj_str_binary_op,
pythontech 0:5868e8752d44 1904 .subscr = bytes_subscr,
pythontech 0:5868e8752d44 1905 .getiter = mp_obj_new_str_iterator,
pythontech 0:5868e8752d44 1906 .buffer_p = { .get_buffer = mp_obj_str_get_buffer },
pythontech 0:5868e8752d44 1907 .locals_dict = (mp_obj_dict_t*)&str8_locals_dict,
pythontech 0:5868e8752d44 1908 };
pythontech 0:5868e8752d44 1909 #endif
pythontech 0:5868e8752d44 1910
pythontech 0:5868e8752d44 1911 // Reuses most of the methods from str
pythontech 0:5868e8752d44 1912 const mp_obj_type_t mp_type_bytes = {
pythontech 0:5868e8752d44 1913 { &mp_type_type },
pythontech 0:5868e8752d44 1914 .name = MP_QSTR_bytes,
pythontech 0:5868e8752d44 1915 .print = str_print,
pythontech 0:5868e8752d44 1916 .make_new = bytes_make_new,
pythontech 0:5868e8752d44 1917 .binary_op = mp_obj_str_binary_op,
pythontech 0:5868e8752d44 1918 .subscr = bytes_subscr,
pythontech 0:5868e8752d44 1919 .getiter = mp_obj_new_bytes_iterator,
pythontech 0:5868e8752d44 1920 .buffer_p = { .get_buffer = mp_obj_str_get_buffer },
pythontech 0:5868e8752d44 1921 .locals_dict = (mp_obj_dict_t*)&str8_locals_dict,
pythontech 0:5868e8752d44 1922 };
pythontech 0:5868e8752d44 1923
pythontech 0:5868e8752d44 1924 // the zero-length bytes
pythontech 0:5868e8752d44 1925 const mp_obj_str_t mp_const_empty_bytes_obj = {{&mp_type_bytes}, 0, 0, NULL};
pythontech 0:5868e8752d44 1926
pythontech 0:5868e8752d44 1927 // Create a str/bytes object using the given data. New memory is allocated and
pythontech 0:5868e8752d44 1928 // the data is copied across.
pythontech 0:5868e8752d44 1929 mp_obj_t mp_obj_new_str_of_type(const mp_obj_type_t *type, const byte* data, size_t len) {
pythontech 0:5868e8752d44 1930 mp_obj_str_t *o = m_new_obj(mp_obj_str_t);
pythontech 0:5868e8752d44 1931 o->base.type = type;
pythontech 0:5868e8752d44 1932 o->len = len;
pythontech 0:5868e8752d44 1933 if (data) {
pythontech 0:5868e8752d44 1934 o->hash = qstr_compute_hash(data, len);
pythontech 0:5868e8752d44 1935 byte *p = m_new(byte, len + 1);
pythontech 0:5868e8752d44 1936 o->data = p;
pythontech 0:5868e8752d44 1937 memcpy(p, data, len * sizeof(byte));
pythontech 0:5868e8752d44 1938 p[len] = '\0'; // for now we add null for compatibility with C ASCIIZ strings
pythontech 0:5868e8752d44 1939 }
pythontech 0:5868e8752d44 1940 return MP_OBJ_FROM_PTR(o);
pythontech 0:5868e8752d44 1941 }
pythontech 0:5868e8752d44 1942
pythontech 0:5868e8752d44 1943 // Create a str/bytes object from the given vstr. The vstr buffer is resized to
pythontech 0:5868e8752d44 1944 // the exact length required and then reused for the str/bytes object. The vstr
pythontech 0:5868e8752d44 1945 // is cleared and can safely be passed to vstr_free if it was heap allocated.
pythontech 0:5868e8752d44 1946 mp_obj_t mp_obj_new_str_from_vstr(const mp_obj_type_t *type, vstr_t *vstr) {
pythontech 0:5868e8752d44 1947 // if not a bytes object, look if a qstr with this data already exists
pythontech 0:5868e8752d44 1948 if (type == &mp_type_str) {
pythontech 0:5868e8752d44 1949 qstr q = qstr_find_strn(vstr->buf, vstr->len);
pythontech 0:5868e8752d44 1950 if (q != MP_QSTR_NULL) {
pythontech 0:5868e8752d44 1951 vstr_clear(vstr);
pythontech 0:5868e8752d44 1952 vstr->alloc = 0;
pythontech 0:5868e8752d44 1953 return MP_OBJ_NEW_QSTR(q);
pythontech 0:5868e8752d44 1954 }
pythontech 0:5868e8752d44 1955 }
pythontech 0:5868e8752d44 1956
pythontech 0:5868e8752d44 1957 // make a new str/bytes object
pythontech 0:5868e8752d44 1958 mp_obj_str_t *o = m_new_obj(mp_obj_str_t);
pythontech 0:5868e8752d44 1959 o->base.type = type;
pythontech 0:5868e8752d44 1960 o->len = vstr->len;
pythontech 0:5868e8752d44 1961 o->hash = qstr_compute_hash((byte*)vstr->buf, vstr->len);
pythontech 0:5868e8752d44 1962 if (vstr->len + 1 == vstr->alloc) {
pythontech 0:5868e8752d44 1963 o->data = (byte*)vstr->buf;
pythontech 0:5868e8752d44 1964 } else {
pythontech 0:5868e8752d44 1965 o->data = (byte*)m_renew(char, vstr->buf, vstr->alloc, vstr->len + 1);
pythontech 0:5868e8752d44 1966 }
pythontech 0:5868e8752d44 1967 ((byte*)o->data)[o->len] = '\0'; // add null byte
pythontech 0:5868e8752d44 1968 vstr->buf = NULL;
pythontech 0:5868e8752d44 1969 vstr->alloc = 0;
pythontech 0:5868e8752d44 1970 return MP_OBJ_FROM_PTR(o);
pythontech 0:5868e8752d44 1971 }
pythontech 0:5868e8752d44 1972
pythontech 0:5868e8752d44 1973 mp_obj_t mp_obj_new_str(const char* data, mp_uint_t len, bool make_qstr_if_not_already) {
pythontech 0:5868e8752d44 1974 if (make_qstr_if_not_already) {
pythontech 0:5868e8752d44 1975 // use existing, or make a new qstr
pythontech 0:5868e8752d44 1976 return MP_OBJ_NEW_QSTR(qstr_from_strn(data, len));
pythontech 0:5868e8752d44 1977 } else {
pythontech 0:5868e8752d44 1978 qstr q = qstr_find_strn(data, len);
pythontech 0:5868e8752d44 1979 if (q != MP_QSTR_NULL) {
pythontech 0:5868e8752d44 1980 // qstr with this data already exists
pythontech 0:5868e8752d44 1981 return MP_OBJ_NEW_QSTR(q);
pythontech 0:5868e8752d44 1982 } else {
pythontech 0:5868e8752d44 1983 // no existing qstr, don't make one
pythontech 0:5868e8752d44 1984 return mp_obj_new_str_of_type(&mp_type_str, (const byte*)data, len);
pythontech 0:5868e8752d44 1985 }
pythontech 0:5868e8752d44 1986 }
pythontech 0:5868e8752d44 1987 }
pythontech 0:5868e8752d44 1988
pythontech 0:5868e8752d44 1989 mp_obj_t mp_obj_str_intern(mp_obj_t str) {
pythontech 0:5868e8752d44 1990 GET_STR_DATA_LEN(str, data, len);
pythontech 0:5868e8752d44 1991 return MP_OBJ_NEW_QSTR(qstr_from_strn((const char*)data, len));
pythontech 0:5868e8752d44 1992 }
pythontech 0:5868e8752d44 1993
pythontech 0:5868e8752d44 1994 mp_obj_t mp_obj_new_bytes(const byte* data, mp_uint_t len) {
pythontech 0:5868e8752d44 1995 return mp_obj_new_str_of_type(&mp_type_bytes, data, len);
pythontech 0:5868e8752d44 1996 }
pythontech 0:5868e8752d44 1997
pythontech 0:5868e8752d44 1998 bool mp_obj_str_equal(mp_obj_t s1, mp_obj_t s2) {
pythontech 0:5868e8752d44 1999 if (MP_OBJ_IS_QSTR(s1) && MP_OBJ_IS_QSTR(s2)) {
pythontech 0:5868e8752d44 2000 return s1 == s2;
pythontech 0:5868e8752d44 2001 } else {
pythontech 0:5868e8752d44 2002 GET_STR_HASH(s1, h1);
pythontech 0:5868e8752d44 2003 GET_STR_HASH(s2, h2);
pythontech 0:5868e8752d44 2004 // A hash of 0 is not valid, so only compare hashes when both are non-zero
pythontech 0:5868e8752d44 2005 if (h1 != 0 && h2 != 0 && h1 != h2) {
pythontech 0:5868e8752d44 2006 return false;
pythontech 0:5868e8752d44 2007 }
pythontech 0:5868e8752d44 2008 GET_STR_DATA_LEN(s1, d1, l1);
pythontech 0:5868e8752d44 2009 GET_STR_DATA_LEN(s2, d2, l2);
pythontech 0:5868e8752d44 2010 if (l1 != l2) {
pythontech 0:5868e8752d44 2011 return false;
pythontech 0:5868e8752d44 2012 }
pythontech 0:5868e8752d44 2013 return memcmp(d1, d2, l1) == 0;
pythontech 0:5868e8752d44 2014 }
pythontech 0:5868e8752d44 2015 }
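The comparison order used above (hash shortcut, then length, then bytes) can be sketched with a stand-in struct. `demo_str_t` and `demo_str_equal` are hypothetical and only loosely modelled on the fields used in the listing. As there, a hash of 0 is treated as "not valid".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// Hypothetical stand-in for a string object; hash == 0 means "not valid".
typedef struct {
    size_t hash;
    size_t len;
    const unsigned char *data;
} demo_str_t;

static bool demo_str_equal(const demo_str_t *a, const demo_str_t *b) {
    assert(a != NULL && b != NULL);

    // Two valid hashes that differ prove inequality without reading the data.
    if (a->hash != 0 && b->hash != 0 && a->hash != b->hash) {
        return false;
    }
    // Different lengths can never be equal.
    if (a->len != b->len) {
        return false;
    }
    // Equal hashes (or a missing hash) prove nothing, so compare the bytes.
    return memcmp(a->data, b->data, a->len) == 0;
}
```

The hash check can only reject a pair early. Matching hashes still fall through to the byte comparison, since different strings may share a hash.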
pythontech 0:5868e8752d44 2016
pythontech 0:5868e8752d44 2017 STATIC void bad_implicit_conversion(mp_obj_t self_in) {
pythontech 0:5868e8752d44 2018 if (MICROPY_ERROR_REPORTING == MICROPY_ERROR_REPORTING_TERSE) {
pythontech 0:5868e8752d44 2019 nlr_raise(mp_obj_new_exception_msg(&mp_type_TypeError,
pythontech 0:5868e8752d44 2020 "can't convert to str implicitly"));
pythontech 0:5868e8752d44 2021 } else {
pythontech 0:5868e8752d44 2022 nlr_raise(mp_obj_new_exception_msg_varg(&mp_type_TypeError,
pythontech 0:5868e8752d44 2023 "can't convert '%s' object to str implicitly",
pythontech 0:5868e8752d44 2024 mp_obj_get_type_str(self_in)));
pythontech 0:5868e8752d44 2025 }
pythontech 0:5868e8752d44 2026 }
pythontech 0:5868e8752d44 2027
pythontech 0:5868e8752d44 2028 // Use this if the string will be converted to a qstr anyway;
pythontech 0:5868e8752d44 2029 // it is more efficient when the string is already a qstr.
pythontech 0:5868e8752d44 2030 qstr mp_obj_str_get_qstr(mp_obj_t self_in) {
pythontech 0:5868e8752d44 2031 if (MP_OBJ_IS_QSTR(self_in)) {
pythontech 0:5868e8752d44 2032 return MP_OBJ_QSTR_VALUE(self_in);
pythontech 0:5868e8752d44 2033 } else if (MP_OBJ_IS_TYPE(self_in, &mp_type_str)) {
pythontech 0:5868e8752d44 2034 mp_obj_str_t *self = MP_OBJ_TO_PTR(self_in);
pythontech 0:5868e8752d44 2035 return qstr_from_strn((char*)self->data, self->len);
pythontech 0:5868e8752d44 2036 } else {
pythontech 0:5868e8752d44 2037 bad_implicit_conversion(self_in);
pythontech 0:5868e8752d44 2038 }
pythontech 0:5868e8752d44 2039 }
pythontech 0:5868e8752d44 2040
pythontech 0:5868e8752d44 2041 // only use this function if you need the str data to be zero terminated
pythontech 0:5868e8752d44 2042 // at the moment all strings are zero terminated to help with C ASCIIZ compatibility
pythontech 0:5868e8752d44 2043 const char *mp_obj_str_get_str(mp_obj_t self_in) {
pythontech 0:5868e8752d44 2044 if (MP_OBJ_IS_STR_OR_BYTES(self_in)) {
pythontech 0:5868e8752d44 2045 GET_STR_DATA_LEN(self_in, s, l);
pythontech 0:5868e8752d44 2046 (void)l; // len unused
pythontech 0:5868e8752d44 2047 return (const char*)s;
pythontech 0:5868e8752d44 2048 } else {
pythontech 0:5868e8752d44 2049 bad_implicit_conversion(self_in);
pythontech 0:5868e8752d44 2050 }
pythontech 0:5868e8752d44 2051 }
pythontech 0:5868e8752d44 2052
pythontech 0:5868e8752d44 2053 const char *mp_obj_str_get_data(mp_obj_t self_in, mp_uint_t *len) {
pythontech 0:5868e8752d44 2054 if (MP_OBJ_IS_STR_OR_BYTES(self_in)) {
pythontech 0:5868e8752d44 2055 GET_STR_DATA_LEN(self_in, s, l);
pythontech 0:5868e8752d44 2056 *len = l;
pythontech 0:5868e8752d44 2057 return (const char*)s;
pythontech 0:5868e8752d44 2058 } else {
pythontech 0:5868e8752d44 2059 bad_implicit_conversion(self_in);
pythontech 0:5868e8752d44 2060 }
pythontech 0:5868e8752d44 2061 }
pythontech 0:5868e8752d44 2062
pythontech 0:5868e8752d44 2063 #if MICROPY_OBJ_REPR == MICROPY_OBJ_REPR_C
pythontech 0:5868e8752d44 2064 const byte *mp_obj_str_get_data_no_check(mp_obj_t self_in, size_t *len) {
pythontech 0:5868e8752d44 2065 if (MP_OBJ_IS_QSTR(self_in)) {
pythontech 0:5868e8752d44 2066 return qstr_data(MP_OBJ_QSTR_VALUE(self_in), len);
pythontech 0:5868e8752d44 2067 } else {
pythontech 0:5868e8752d44 2068 *len = ((mp_obj_str_t*)self_in)->len;
pythontech 0:5868e8752d44 2069 return ((mp_obj_str_t*)self_in)->data;
pythontech 0:5868e8752d44 2070 }
pythontech 0:5868e8752d44 2071 }
pythontech 0:5868e8752d44 2072 #endif
pythontech 0:5868e8752d44 2073
pythontech 0:5868e8752d44 2074 /******************************************************************************/
pythontech 0:5868e8752d44 2075 /* str iterator */
pythontech 0:5868e8752d44 2076
pythontech 0:5868e8752d44 2077 typedef struct _mp_obj_str8_it_t {
pythontech 0:5868e8752d44 2078 mp_obj_base_t base;
pythontech 0:5868e8752d44 2079 mp_fun_1_t iternext;
pythontech 0:5868e8752d44 2080 mp_obj_t str;
pythontech 0:5868e8752d44 2081 mp_uint_t cur;
pythontech 0:5868e8752d44 2082 } mp_obj_str8_it_t;
pythontech 0:5868e8752d44 2083
pythontech 0:5868e8752d44 2084 #if !MICROPY_PY_BUILTINS_STR_UNICODE
pythontech 0:5868e8752d44 2085 STATIC mp_obj_t str_it_iternext(mp_obj_t self_in) {
pythontech 0:5868e8752d44 2086 mp_obj_str8_it_t *self = MP_OBJ_TO_PTR(self_in);
pythontech 0:5868e8752d44 2087 GET_STR_DATA_LEN(self->str, str, len);
pythontech 0:5868e8752d44 2088 if (self->cur < len) {
pythontech 0:5868e8752d44 2089 mp_obj_t o_out = mp_obj_new_str((const char*)str + self->cur, 1, true);
pythontech 0:5868e8752d44 2090 self->cur += 1;
pythontech 0:5868e8752d44 2091 return o_out;
pythontech 0:5868e8752d44 2092 } else {
pythontech 0:5868e8752d44 2093 return MP_OBJ_STOP_ITERATION;
pythontech 0:5868e8752d44 2094 }
pythontech 0:5868e8752d44 2095 }
pythontech 0:5868e8752d44 2096
pythontech 0:5868e8752d44 2097 STATIC mp_obj_t mp_obj_new_str_iterator(mp_obj_t str) {
pythontech 0:5868e8752d44 2098 mp_obj_str8_it_t *o = m_new_obj(mp_obj_str8_it_t);
pythontech 0:5868e8752d44 2099 o->base.type = &mp_type_polymorph_iter;
pythontech 0:5868e8752d44 2100 o->iternext = str_it_iternext;
pythontech 0:5868e8752d44 2101 o->str = str;
pythontech 0:5868e8752d44 2102 o->cur = 0;
pythontech 0:5868e8752d44 2103 return MP_OBJ_FROM_PTR(o);
pythontech 0:5868e8752d44 2104 }
pythontech 0:5868e8752d44 2105 #endif
pythontech 0:5868e8752d44 2106
pythontech 0:5868e8752d44 2107 STATIC mp_obj_t bytes_it_iternext(mp_obj_t self_in) {
pythontech 0:5868e8752d44 2108 mp_obj_str8_it_t *self = MP_OBJ_TO_PTR(self_in);
pythontech 0:5868e8752d44 2109 GET_STR_DATA_LEN(self->str, str, len);
pythontech 0:5868e8752d44 2110 if (self->cur < len) {
pythontech 0:5868e8752d44 2111 mp_obj_t o_out = MP_OBJ_NEW_SMALL_INT(str[self->cur]);
pythontech 0:5868e8752d44 2112 self->cur += 1;
pythontech 0:5868e8752d44 2113 return o_out;
pythontech 0:5868e8752d44 2114 } else {
pythontech 0:5868e8752d44 2115 return MP_OBJ_STOP_ITERATION;
pythontech 0:5868e8752d44 2116 }
pythontech 0:5868e8752d44 2117 }
pythontech 0:5868e8752d44 2118
pythontech 0:5868e8752d44 2119 mp_obj_t mp_obj_new_bytes_iterator(mp_obj_t str) {
pythontech 0:5868e8752d44 2120 mp_obj_str8_it_t *o = m_new_obj(mp_obj_str8_it_t);
pythontech 0:5868e8752d44 2121 o->base.type = &mp_type_polymorph_iter;
pythontech 0:5868e8752d44 2122 o->iternext = bytes_it_iternext;
pythontech 0:5868e8752d44 2123 o->str = str;
pythontech 0:5868e8752d44 2124 o->cur = 0;
pythontech 0:5868e8752d44 2125 return MP_OBJ_FROM_PTR(o);
pythontech 0:5868e8752d44 2126 }
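The iterator pattern above (a cursor that yields one element per call until the length is reached) can be sketched without the object model. `demo_bytes_it_t` and `demo_bytes_it_next` are hypothetical names. The sketch returns -1 where the listing returns `MP_OBJ_STOP_ITERATION`.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical iterator state: the buffer being walked and a cursor into it.
typedef struct {
    const unsigned char *data;
    size_t len;
    size_t cur;
} demo_bytes_it_t;

// Return the next byte as an int in 0..255, or -1 once the buffer is exhausted.
static int demo_bytes_it_next(demo_bytes_it_t *it) {
    assert(it != NULL);

    if (it->cur < it->len) {
        int value = it->data[it->cur];
        it->cur += 1;
        return value;
    }
    return -1; // stays exhausted on every later call
}
```

The cursor is never advanced past the end, so an exhausted iterator can be called again safely and keeps reporting the end.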