C Library strtok

20 Mar 2012

Does anyone know how to return from strtok into an array rather than a pointer?

int main( )
{
    char string[32] = {"This, is. a_ string\n"};
    char result1[16];
    char* result2;
    
    // use an array rather than pointer - error "expression must be a modifiable lvalue"
    (char*)&result1[0] = strtok( string, "," );
    // This is the appropriate return type
    result2 = strtok( NULL, "." );
    printf( "%s\n", (char*)&result1[0] );
    printf( "%s\n", result2);

    return 0;
}

Help is appreciated!!!

20 Mar 2012

What are you trying to achieve?

21 Mar 2012

A single global array statically allocated and used by many different functions. I understand the example doesn't show the array as being global.

21 Mar 2012

That's not really what I asked, but I think I see your goal. You want to split a string into parts and use them later, right? Then this approach won't work. strtok() modifies the original string in place (it overwrites each delimiter it finds with a null byte), so the pointers it returns become invalid once the original string goes out of scope. It's better to allocate copies of what strtok returns. For example:

#include <string.h>  // strtok, strdup (strdup is POSIX, not ISO C)

// store pointers to five tokens
char *my_tokens[5] = { 0 };  // zero-fill on startup

int main( )
{
  char to_split[] = "this,is,a,test";

  // get first token
  char *tok = strtok(to_split, ",");
  int i = 0;

  // loop until no more tokens or out of space in array
  while ( tok != NULL && i < 5 )
  {
    // strdup allocates a copy of the source string on the heap and returns a new pointer
    my_tokens[i] = strdup(tok);
    // get next token
    tok = strtok(NULL, ",");
    // move to next array element
    i++;
  }

  return 0;
}

21 Mar 2012

The goal is to eliminate the need for heap or the stack data when using strtok's return.

In your example you are using the stack for the return from strtok. This is exactly what I'm trying to avoid.

My question is really more about the syntax. Can you instruct the compiler to treat an array as a pointer for return types, specifically for strtok?

21 Mar 2012

Quote:

The goal is to eliminate the need for heap or the stack data when using strtok's return.

What would be the point of that? Any variable needs stack or heap space; you can't just store data in the ether. Actually, char result1[16]; in your example is placed on the stack. What exactly has been "eliminated" here?

Quote:

In your example you are using the stack for the return from strtok. This is exactly what I'm trying to avoid.

Actually, strdup() allocates memory on the heap. And my_tokens array is a global variable, placed in RAM.

Quote:

My question is really more about the syntax. Can you instruct the compiler to treat an array as a pointer for return types, specifically for strtok?

Even if you did that, you'd just have pointers in your array instead of the character data you probably expect, and treating them as characters would produce garbage.

Once again, can you please explain the overall problem? Why don't you want to use heap or stack? I think you're approaching it from the wrong angle.

21 Mar 2012

Hi

In my CMDB library I use strtok to parse commandline parameters into an array, but allow mixed datatypes.

My CMDB library contains code like:

        char *tok; 
        argcnt=0;
        tok = strtok(prmstr_buf, " ");
        while (tok != NULL) {
            //Store Pointers
            prms[argcnt++] = tok;
            tok = strtok(NULL, " ");
        }

where prms is defined as

char *prms[MAX_ARGS];

and (I think this is the syntax you're looking for)..

strncpy(parms[i].val.s,(char*)toks[i], strlen((char*)toks[i]));

to copy the value into an array of unions. That array lets me define sscanf patterns that specify which type of parameter to expect.

I defined parms as:

    /** Storage for Parsed Parameters
    */
    struct  parm parms[MAX_ARGS];

and parm as a struct that wraps a union:

    /** 
     * Used for parsing parameters.
     */
    enum parmtype {
        PARM_UNUSED,            //0

        PARM_FLOAT,             //1     (f)

        PARM_LONG,              //2     (l/ul)
        PARM_INT,               //3     (i/ui)
        PARM_SHORT,             //4     (w/uw)

        PARM_CHAR,              //5     (c/uc)
        PARM_STRING             //6     (s)
    };

    /** 
     * Used for storing parsed parameters.
     */
    union value {
        float               f;

        unsigned long       ul;
        long                 l;

        int                  i;
        unsigned int        ui;

        short                w;
        unsigned short      uw;

        char                 c;
        unsigned char       uc;

        char                 s[MAX_PARM_LEN];
    };

    /** Used for parsing parameters.
    */
    struct parm {
        enum parmtype    type;
        union value     val;
    };

To get things out of the parms array (after matching patterns and doing checks) I use a number of utility methods like:

    char* STRINGPARM(int ndx) {
        return parms[ndx].val.s;
    }

21 Mar 2012

Igor Skochinsky wrote:

Once again, can you please explain the overall problem? Why don't you want to use heap or stack? I think you're approaching it from the wrong angle.

I think you are right about this, and sorry for not clearly explaining what it is I'm trying to avoid.

I don't want the data pointed to by the return of strtok to be corrupted if other functions are called before the return is stored in the appropriate data structure:

// safe place for parsed data
typedef struct _data
{
  char stored[32] = {0};
}data;

void func(char* data)
{
  char* res = strtok(data, ","); 
  // eliminate the possibility of corruption to memory pointed at by res. Other 
  //  events and callbacks may be invoked between strtok and memcpy that use the
  //  stack and heap
  memcpy( data.stored, res, strlen(res) );
}

21 Mar 2012

Hi,

Using memcpy is not the preferred method here (better to use strncpy). Without additional checks and code you certainly do not eliminate corruption but encourage it.

If I'm correct, your code will work correctly only the first time (as the array starts out zeroed, and assuming strlen is less than 32). The second time, if the string is shorter than the first, it will fail.

strncpy is already somewhat better, as it pads short strings with zeros. You would only have to check that the strlen is smaller than 32.

wvd_vegt

21 Mar 2012

Umm, how would any other calls corrupt the memory? The returned pointer points into the original string (data), and if you see corruption there, you have bugs in your program. I think the only thing you need to worry about is if you have other places that use strtok - it can handle parsing of only one string at a time. If it can be called potentially in several places, you should use strtok_r instead (though I'm not sure if mbed's libc has it).

By the way, you should check that strlen(res) is less than size of data.stored. And you should probably copy the terminating zero too. strdup() might be the easier way.

21 Mar 2012

Hi

If you do not zero the data array before you run strtok and memcpy again, the new token will only (partially) overwrite the previous string, as memcpy is not string-aware and does not add the trailing \0 string terminator. From what I read at

http://www.cplusplus.com/reference/clibrary/cstring/strncpy/

Quote:

strncpy

char * strncpy ( char * destination, const char * source, size_t num );

Copy characters from string. Copies the first num characters of source to destination. If the end of the source C string (which is signaled by a null-character) is found before num characters have been copied, destination is padded with zeros until a total of num characters have been written to it.

No null-character is implicitly appended to the end of destination, so destination will only be null-terminated if the length of the C string in source is less than num.

strncpy can be passed 32 as the length and will happily pad the remainder of the buffer, with no need to zero it first. So it is not the source that gets corrupted, but the 'uncorrupted' copy you wanted to make of it in the first place. So Igor, IMHO, if you add all your safety suggestions (and do not forget anything), you end up with strncpy's behavior in more lines of code (and the more code, the more bugs).

21 Mar 2012

void func(char* data)
{
  char* res = strtok(data, ","); 
  // eliminate the possibility of corruption to memory pointed at by res. Other 
  //  events and callbacks may be invoked between strtok and memcpy that use the
  //  stack and heap
  memcpy( data.stored, res, strlen(res) );
}

void func()
{
  char* new_ptr = NULL;
}

Let's talk about the memory pointed to by the temp variable res, whose size is not dynamically allocated. Say strtok returns 16 bytes and another function is called that uses a temp variable new_ptr, which is a pointer. Does the address pointed to by new_ptr seek for the \0 before giving the address to the new pointer (hopefully that's not too confusing)?

21 Mar 2012

Every function's stack frame is separate from others. Declaring local variables in one function will not overwrite them in another (barring buffer overruns). There's no "seeking" going on, the compiler knows how much space each variable takes at compile time and allocates stack offsets accordingly.

A char * variable itself always takes four bytes (the size of a pointer on a 32-bit target such as the mbed) on the stack, regardless of how long the string it points to is. strtok does not "return 16 bytes"; it returns a pointer into the original string (data), which has been cut off at the end of the first token (the delimiter has been replaced by a \0).

21 Mar 2012

Igor Skochinsky wrote:

... it returns a pointer into the original string (data) ...

That clears it up. The only thing that could get corrupted before copying the token would be the actual string being parsed (memory leak, buffer overrun, etc.).