
Other Alias

strtok

SYNOPSIS

#include <string.h>

char *strtok(char *str, const char *delim);
char *strtok_r(char *str, const char *delim, char **saveptr);

Feature test macro requirements for glibc (see feature_test_macros(7)):

strtok_r(): _SVID_SOURCE || _BSD_SOURCE || _POSIX_C_SOURCE >= 1 || _XOPEN_SOURCE || _POSIX_SOURCE

DESCRIPTION

The strtok() function breaks a string into a sequence of zero or more non-empty tokens. On the first call to strtok(), the string to be parsed must be specified in the argument str. In each subsequent call that parses the same string, str must be NULL.

The delim argument specifies a set of bytes that are treated as token delimiters in the parsed string. The caller may specify different strings in delim in successive calls that parse the same string.

Each call to strtok() returns a pointer to a null-terminated string containing the next token. This string does not include the delimiter byte. If there are no more tokens, strtok() returns NULL.

A sequence of calls to strtok() that operate on the same string maintains a pointer that determines the point from which to start searching for the next token. The first call to strtok() sets this pointer to the first byte of the string. The start of the next token is determined by scanning forward in str for the next byte that is not a delimiter. If such a byte is found, it is taken as the start of the next token. If no such byte is found, there are no more tokens and strtok() returns NULL. (A string that is empty or that contains only delimiters therefore causes strtok() to return NULL on the first call.)

The end of each token is found by scanning forward until either a delimiter byte or the terminating null byte ('\0') is found. If a delimiter byte is found, it is overwritten with a null byte to terminate the current token, and strtok() saves a pointer to the following byte; that pointer is used as the starting point when searching for the next token. In this case, strtok() returns a pointer to the start of the found token.

It follows from the description above that a sequence of two or more contiguous delimiter bytes in the parsed string is treated as a single delimiter, and that delimiter bytes at the start or end of the string are ignored. In other words, the tokens returned by strtok() are always non-empty strings. For example, given the string "aaa;;bbb,", successive calls to strtok() that specify the delimiter string ";," would return the strings "aaa" and "bbb", and then a null pointer.
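The rule that tokens are never empty can be checked with a small sketch. The helper name count_tokens() is hypothetical and not part of any library; it only wraps the usual strtok() loop and asserts that each token has at least one byte.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts the tokens strtok() finds in buf.
 * buf is modified in place, so it must be a writable array. */
size_t count_tokens(char *buf, const char *delims)
{
    assert(buf != NULL && delims != NULL);

    size_t n = 0;
    for (char *tok = strtok(buf, delims); tok != NULL;
         tok = strtok(NULL, delims)) {
        assert(tok[0] != '\0');   /* strtok() never yields an empty token */
        n++;
    }
    return n;
}
```

Called on a writable copy of "aaa;;bbb," with the delimiters ";,", this should count two tokens; a string made only of delimiters should count zero.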

The strtok_r() function is a reentrant version of strtok(). The saveptr argument is a pointer to a char * variable that strtok_r() uses internally to maintain context between successive calls that parse the same string.

On the first call to strtok_r(), str must point to the string to be parsed, and the value of saveptr is ignored. In subsequent calls, str must be NULL, and saveptr must be unchanged since the previous call.

Different strings may be parsed concurrently by sequences of calls to strtok_r() that specify different saveptr arguments.
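As an illustration of parsing two strings at once, the sketch below walks two buffers in lockstep, each with its own saveptr. The function name common_leading_tokens() is an assumption made for this example only.

```c
#define _POSIX_C_SOURCE 200809L  /* exposes strtok_r() under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns how many leading tokens of a and b are
 * equal. Both buffers are modified in place. */
size_t common_leading_tokens(char *a, char *b, const char *delims)
{
    assert(a != NULL && b != NULL && delims != NULL);

    char *save_a, *save_b;            /* one saved context per string */
    char *ta = strtok_r(a, delims, &save_a);
    char *tb = strtok_r(b, delims, &save_b);
    size_t n = 0;

    while (ta != NULL && tb != NULL && strcmp(ta, tb) == 0) {
        n++;
        ta = strtok_r(NULL, delims, &save_a);
        tb = strtok_r(NULL, delims, &save_b);
    }
    return n;
}
```

The same interleaving with plain strtok() would not work, because the second initial call would discard the position saved for the first string.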

RETURN VALUE

The strtok() and strtok_r() functions return a pointer to the next token, or NULL if there are no more tokens.

ATTRIBUTES

For a description of the terms in this section, see attributes(7).
Interface     Attribute        Value
strtok()      Thread safety    MT-Unsafe race:strtok
strtok_r()    Thread safety    MT-Safe

CONFORMING TO

strtok(): POSIX.1-2001, POSIX.1-2008, C89, C99, SVr4, 4.3BSD.
strtok_r(): POSIX.1-2001, POSIX.1-2008.

BUGS

Use these functions with caution. Note that:

* These functions modify their first argument.
* These functions cannot be used on constant strings.
* The identity of the delimiter byte is lost.
* The strtok() function uses a static buffer while parsing, so it is not thread-safe. Use strtok_r() if this matters to you.

EXAMPLE

The program below uses nested loops that call strtok_r() to break a string into a two-level hierarchy of tokens. The first command-line argument specifies the string to be parsed. The second argument specifies the delimiter byte(s) used to separate that string into "major" tokens. The third argument specifies the delimiter byte(s) used to separate the "major" tokens into subtokens.

An example of the output of the program:

    $ ./a.out "a/bbb///cc;xxx:yyy:" ":;" "/"
    1: a/bbb///cc
     --> a
     --> bbb
     --> cc
    2: xxx
     --> xxx
    3: yyy
     --> yyy

Program source code

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int
    main(int argc, char *argv[])
    {
        char *str1, *str2, *token, *subtoken;
        char *saveptr1, *saveptr2;
        int j;

        if (argc != 4) {
            fprintf(stderr, "Usage: %s string delim subdelim\n", argv[0]);
            exit(EXIT_FAILURE);
        }

        /* Outer loop: split argv[1] into major tokens using argv[2] */
        for (j = 1, str1 = argv[1]; ; j++, str1 = NULL) {
            token = strtok_r(str1, argv[2], &saveptr1);
            if (token == NULL)
                break;
            printf("%d: %s\n", j, token);

            /* Inner loop: split each major token using argv[3] */
            for (str2 = token; ; str2 = NULL) {
                subtoken = strtok_r(str2, argv[3], &saveptr2);
                if (subtoken == NULL)
                    break;
                printf(" --> %s\n", subtoken);
            }
        }

        exit(EXIT_SUCCESS);
    }

Another example of a program using strtok() can be found in getaddrinfo_a(3).

4 answers

There are two things to know about strtok. As mentioned, it "maintains internal state". In addition, it modifies the string you pass to it. Essentially, it writes a '\0' where it finds the delimiter you supplied and returns a pointer to the start of the token. Internally, it keeps the location of the last token, and the next time you call it, it starts from there.

An important consequence is that you cannot use strtok on a string literal such as const char *s = "hello world";, since modifying the contents of a string literal is undefined behavior and typically ends in an access violation.
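One common way around this restriction is to tokenize a writable copy. The sketch below is only an illustration; longest_token() is a hypothetical name, and it copies the text with malloc() and memcpy() before calling strtok().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns the length of the longest token in
 * read-only text, or (size_t)-1 if the copy cannot be allocated. */
size_t longest_token(const char *text, const char *delims)
{
    assert(text != NULL && delims != NULL);

    size_t size = strlen(text) + 1;
    char *copy = malloc(size);        /* strtok() needs writable memory */
    if (copy == NULL)
        return (size_t)-1;
    memcpy(copy, text, size);

    size_t best = 0;
    for (char *tok = strtok(copy, delims); tok != NULL;
         tok = strtok(NULL, delims)) {
        size_t len = strlen(tok);
        if (len > best)
            best = len;
    }

    free(copy);                       /* the original text was never touched */
    return best;
}
```

Because only the copy is modified, the function can safely be called with a string literal.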

The "good thing" about strtok is that it doesn't actually copy strings, so you don't have to manage extra memory allocation, etc. But if you don't understand the above, you will have trouble using it.

Example. If you have "this,is,a,string", successive calls to strtok will generate pointers as follows (the ^ marks the return value). Note that '\0' is written where the delimiters are found; this means that the original string is changed:

    t  h  i  s  ,  i  s  ,  a  ,  s  t  r  i  n  g  \0         this,is,a,string

    t  h  i  s  \0 i  s  ,  a  ,  s  t  r  i  n  g  \0         this
    ^
    t  h  i  s  \0 i  s  \0 a  ,  s  t  r  i  n  g  \0         is
                   ^
    t  h  i  s  \0 i  s  \0 a  \0 s  t  r  i  n  g  \0         a
                            ^
    t  h  i  s  \0 i  s  \0 a  \0 s  t  r  i  n  g  \0         string
                                  ^

Hope this makes sense.
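The same walk-through can be reproduced in code. This is a minimal sketch with a hypothetical helper, token_offsets(), which records where each returned pointer lands relative to the start of the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: stores the offset of each token within buf and
 * returns how many were stored (at most max). buf is modified in place. */
size_t token_offsets(char *buf, const char *delims,
                     size_t *offsets, size_t max)
{
    assert(buf != NULL && delims != NULL && offsets != NULL);

    size_t n = 0;
    for (char *tok = strtok(buf, delims); tok != NULL && n < max;
         tok = strtok(NULL, delims))
        offsets[n++] = (size_t)(tok - buf);   /* tokens point into buf itself */
    return n;
}
```

For a writable copy of "this,is,a,string" and the delimiter ",", the offsets should be 0, 5, 8 and 10, and every comma in the buffer is replaced by a null byte afterwards.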

The strtok() function stores data between calls. It uses this data when you call it with a NULL pointer.

The point at which the last token was found is kept internally by the function, to be used on the next call (library implementations are not required to avoid data races here).

strtok maintains internal state. When you call it with a non-NULL pointer, it reinitializes itself to use the string you supply. When you call it with NULL, it uses that string and any other state it currently has to return the next token.

Because of the way strtok works, you need to make sure you are linking to the multi-threaded version of the C runtime if you are writing a multi-threaded application. This ensures that each thread gets its own internal state for strtok.

The strtok function stores data in an internal static variable that is shared among all threads.

For thread safety you should use strtok_r.

Take a look at static char *last;

    char *
    strtok(s, delim)
        register char *s;
        register const char *delim;
    {
        register char *spanp;
        register int c, sc;
        char *tok;
        static char *last;

        if (s == NULL && (s = last) == NULL)
            return (NULL);

        /*
         * Skip (span) leading delimiters (s += strspn(s, delim), sort of).
         */
    cont:
        c = *s++;
        for (spanp = (char *)delim; (sc = *spanp++) != 0;) {
            if (c == sc)
                goto cont;
        }

        if (c == 0) {        /* no non-delimiter characters */
            last = NULL;
            return (NULL);
        }
        tok = s - 1;

        /*
         * Scan token (scan for delimiters: s += strcspn(s, delim), sort of).
         * Note that delim must have one NUL; we stop if we see that, too.
         */
        for (;;) {
            c = *s++;
            spanp = (char *)delim;
            do {
                if ((sc = *spanp++) == c) {
                    if (c == 0)
                        s = NULL;
                    else
                        s[-1] = 0;
                    last = s;
                    return (tok);
                }
            } while (sc != 0);
        }
        /* NOTREACHED */
    }
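The comments in that source hint that the two scans are "sort of" strspn() and strcspn(). A minimal sketch of the same idea written directly with those two functions is shown below. The name my_strtok_r() is hypothetical; this is an illustration, not any library's actual implementation, and the state is kept in a caller-supplied pointer rather than in a static variable.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical re-implementation for illustration only. */
char *my_strtok_r(char *s, const char *delim, char **save)
{
    assert(delim != NULL && save != NULL);

    if (s == NULL)
        s = *save;              /* continue where the last call stopped */
    if (s == NULL)
        return NULL;            /* an earlier call already reached the end */

    s += strspn(s, delim);      /* skip leading delimiters */
    if (*s == '\0') {           /* nothing but delimiters left */
        *save = NULL;
        return NULL;
    }

    char *tok = s;
    s += strcspn(s, delim);     /* advance to the end of the token */
    if (*s != '\0') {
        *s = '\0';              /* terminate the token in place */
        *save = s + 1;          /* next search starts after the delimiter */
    } else {
        *save = NULL;           /* token ran to the end of the string */
    }
    return tok;
}
```

Replacing the static variable with the save parameter is the only structural difference from the strtok() shown above, and it is what allows several strings to be parsed independently.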


The strtok() function returns a pointer to the next token in the string pointed to by parameter str1. The characters that make up the string addressed by parameter str2 are the delimiters that define the token. If there is no token to return, a null pointer is returned.

In the C99 version, the restrict qualifier is applied to the parameters str1 and str2.

To split a string into tokens, the first time the strtok() function is called, the str1 parameter must point to the beginning of that string. In subsequent function calls, a null pointer must be used as the str1 parameter. Note that the original string is modified on each function call.

Each call to the strtok() function can use a different set of token separators.

The strtok() function provides a means to reduce a string to its constituent parts. For example, the following program tokenizes the string "One, two, and three".

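    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        /* strtok() writes into the string, so use an array, not a literal */
        char str[] = "One, two, and three.";
        char *p;

        p = strtok(str, ",");
        printf("%s", p);
        do {
            p = strtok(NULL, ",. ");
            if (p) printf("|%s", p);
        } while (p);

        return 0;
    }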

The output of this program is as follows.

One|two|and|three

Notice how the strtok() function is first called with the original string, but subsequent calls use NULL as the first argument. The strtok() function maintains an internal pointer to the string being processed. If the first argument to strtok() points to a string, the internal pointer is set to the beginning of that string. If the first argument is NULL, strtok() continues processing the previous string, starting from the position saved in the previous step, and advances the internal pointer as each token is extracted. Thus, the strtok() function "traverses" the entire string. Also notice how the delimiter string differs between the first and subsequent calls to the function. Delimiters can be defined differently on each call.
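Changing the delimiter set between calls is handy for small "key=value,value" formats. The sketch below assumes such an input format purely for illustration; parse_assignment() is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits "key=v1,v2,..." in place. Stores the key
 * in *key and up to max_values value pointers in values[]; returns the
 * number of values stored. */
size_t parse_assignment(char *line, char **key,
                        char **values, size_t max_values)
{
    assert(line != NULL && key != NULL && values != NULL);

    *key = strtok(line, "=");           /* first call: split at '=' */
    if (*key == NULL)
        return 0;

    size_t n = 0;
    char *v;
    while (n < max_values && (v = strtok(NULL, ",")) != NULL)
        values[n++] = v;                /* later calls: split at ',' */
    return n;
}
```

Given a writable copy of "path=/bin,/usr/bin", the key should be "path" and the two values "/bin" and "/usr/bin".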

char far * far _fstrtok(const char far *str1, const char far *str2)

Description:

The strtok() function returns a pointer to the next token in the string pointed to by str1. The characters from the string pointed to by str2 are used as delimiters that define the token. If the token is not found, NULL is returned.

The first call to strtok() actually uses str1 as the pointer. Subsequent calls use NULL as the first argument. Thus the entire string can be tokenized.

It is important to understand that the strtok() function modifies the string pointed to by str1. Each time a token is found, a null character is placed at the location where the delimiter was found. So strtok() advances along the string.

With each call to strtok(), you can vary the set of delimiters.

The _fstrtok() function is the FAR version of the function in question.

The following program tokenizes the string "The summer soldier, the sunshine patriot" using spaces and commas as delimiters. The program prints the following line: "The|summer|soldier|the|sunshine|patriot".
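#include <stdio.h>
#include <string.h>
int main(void)
{
    /* strtok() writes into the string, so use an array, not a literal */
    char str[] = "The summer soldier, the sunshine patriot";
    char *p;
    p = strtok(str, " ");
    printf("%s", p);
    do {
        p = strtok(NULL, ", ");   /* NULL continues with the same string */
        if (p) printf("|%s", p);
    } while (p);
    return 0;
}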

Programming languages may include special functions for working with strings, saving the programmer from having to write their own string processing functions. For example, it is often necessary to determine the length of a string, so languages provide a function to measure it.

In the C programming language, functions for working with strings are declared in the string.h header file, which you must remember to include in your source code. There are about twenty functions for working with strings. Among them are functions that search for characters in a string, comparison functions, string copying functions, and more specialized ones. A list and description of most of the string functions currently available in C can be found in the appendix of the book by B. Kernighan and D. Ritchie, "The C Programming Language. Second Edition".

All functions declared in string.h may or may not modify one of the strings passed by the pointer in the course of their work. It depends on the purpose of the function. However, most of them return something: either a pointer to a character or an integer. Moreover, if the function changes one of its parameters and was called for this, then what it returns can be ignored (that is, not assigned to anything in the calling function).

For example, the strcpy() function has the following declaration: char *strcpy (char *, const char*) . It copies the string pointed to by the second parameter to the string pointed to by the first parameter. So the first parameter is changed. In addition, the function returns a pointer to the first character of the string:

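    char s1[10] = "", s2[10];
    char *s3;

    s3 = s2;
    fgets(s1, sizeof s1, stdin);     /* replaces gets(), which was removed in C11 */
    s1[strcspn(s1, "\n")] = '\0';    /* drop the newline that fgets() keeps */
    s3 = strcpy(s2, s1);
    puts(s2);
    puts(s3);
    printf("%p, %p\n", (void *)s2, (void *)s3);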

Here s2 and s3 point to the same character (printf() outputs the same address). However, what strcpy() returns cannot be assigned to an array. The result of this function is usually not assigned to anything; sometimes it is enough that it simply changes one of the strings passed by the pointer.

Functions such as strlen() or strcmp() are a different matter: they do not change their parameters and are called for the sake of the result. The strcmp() function compares its two argument strings character by character (lexicographically) and returns zero if they are equal, a negative value if the first is less than the second, and a positive value if it is greater. For example, strcmp("boy", "body") returns a positive value because the character code of "y" is greater than that of "d". Calling strcmp("body", "boy") returns a negative value, because the first argument is lexicographically less than the second.
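The C standard specifies only the sign of the value returned by strcmp(), not its magnitude, so code should test the result against zero instead of comparing it with 1 or -1. If exactly -1, 0 or 1 is needed, the result can be normalized, as in this small sketch (compare_sign() is a hypothetical helper name):

```c
#include <string.h>

/* Hypothetical helper: maps the result of strcmp() onto -1, 0 or 1. */
int compare_sign(const char *a, const char *b)
{
    int r = strcmp(a, b);
    return (r > 0) - (r < 0);   /* 1 if positive, -1 if negative, else 0 */
}
```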

strtok() function

Using the strtok() function, you can split a string into separate parts (tokens). The declaration of this function looks like this: char *strtok (char *, const char *). The first time the function is called, the first parameter is the string to split. The second parameter specifies the delimiter string. On subsequent calls for the same string, the first parameter must be NULL, because the function has already "remembered" what it is working with. Consider an example:

    char str[] = "one, two, three, four, five";
    char *sp;

    sp = strtok(str, ", ");
    while (sp) {
        puts(sp);
        sp = strtok(NULL, ", ");
    }

As a result of executing this code, the following words are displayed in a column:

    one
    two
    three
    four
    five

The first time strtok() is called, the function is passed a pointer to the first character of the array and a delimiter string. After this call, the array str is changed: the delimiter after "one" is overwritten with a null byte, so str, read as a string, now holds only the word "one". The function returns a pointer to this word, which is assigned to sp.

Although the calling function can no longer reach the rest of the array by reading str as a string, a pointer to the remainder is retained inside strtok(). When NULL is passed, the function "knows" to work with that tail.

Copying parts of strings

When you just need to join two strings, the problem is easily solved by calling the strcat() function, which appends the second argument to the end of the first. A similar function, strncat(), appends at most n characters from the second string to the first; n is specified as the third parameter.

What if the situation is more complex? For example, there are two non-empty strings and you need to join the beginning of the first to the end of the second. You can do this using the strcpy() function, if you pass pointers that do not point to the first characters of the strings:

    char s1[20] = "Peter Smith", s2[] = "Julia Roberts";

    strcpy(s1 + 5, s2 + 5);
    puts(s1);

In this case "Peter Roberts" will be displayed on the screen. Why did this happen? A pointer to the sixth character of the first string was passed to the strcpy() function. As a result, the characters of this string are overwritten only from the sixth one onward, because strcpy() "knows" nothing about the preceding characters. Likewise, only part of the second string is passed as the second argument, and that part is what gets copied into the first.

How do you insert one string in the middle of another? You can solve this problem with a third "buffer" string: first copy the first string into it, then the second, overwriting the end of the first, and then append the end of the first. But you can also do this:

    char s1[20] = "one three", s2[20] = "two";

    strcpy(s2 + 3, s1 + 3);
    strcpy(s1 + 4, s2);
    puts(s1);

Here the end of the first string is first copied onto the end of the second, which gives "two three". Then the second string is copied into the first, skipping its beginning.
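A more general alternative is to shift the tail of the destination with memmove() and then copy the new text into the gap. The sketch below is one possible approach and not the method used above; insert_at() and its parameters are hypothetical names, and the caller is assumed to pass the real size of the destination array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: inserts ins into the string held in dst at index
 * pos. cap is the size of the dst array in bytes. Returns 0 on success,
 * or -1 if pos is past the end or the result would not fit. */
int insert_at(char *dst, size_t cap, size_t pos, const char *ins)
{
    assert(dst != NULL && ins != NULL);

    size_t dlen = strlen(dst);
    size_t ilen = strlen(ins);
    if (pos > dlen || dlen + ilen + 1 > cap)
        return -1;

    /* Shift the tail (including its '\0') right; the regions overlap,
     * so memmove() is required rather than memcpy() or strcpy(). */
    memmove(dst + pos + ilen, dst + pos, dlen - pos + 1);
    memcpy(dst + pos, ins, ilen);
    return 0;
}
```

With a 20-byte array holding "one three", inserting "two " at index 4 should produce "one two three". Unlike the two strcpy() calls, this version checks the buffer size and needs no second array.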

Description of some functions for working with strings

Exercise
Below are descriptions of some functions that perform operations on strings. Design and write small programs that illustrate how these functions work.

  • char *strchr (const char *, int c) . Returns a pointer to the first occurrence of the character c in the string. Returns NULL if there is no such character in the string.
  • char *strstr (const char *s2, const char *s1) . Returns a pointer to the first occurrence of string s1 in string s2. If there are no matches, returns NULL.
  • char *strncpy (char *, const char *, size_t n) . Copies at most n characters of the second string to the first. If the second string is n characters or longer, the result is not null-terminated.
  • size_t strspn (const char *, const char *) . Returns the length of the initial part of the first string that consists only of characters found in the second string.
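As a starting point for the exercise, here is a minimal sketch for one of these functions. It shows the usual way of handling the missing terminator when strncpy() truncates; copy_truncated() is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies src into dst (an array of cap bytes),
 * truncating if necessary and always null-terminating. Returns dst. */
char *copy_truncated(char *dst, size_t cap, const char *src)
{
    assert(dst != NULL && src != NULL && cap > 0);

    strncpy(dst, src, cap - 1);   /* may leave dst without a '\0' */
    dst[cap - 1] = '\0';          /* so terminate it explicitly */
    return dst;
}
```

Copying "abcdef" into a 4-byte array this way should leave "abc" in it.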
