40.2.3.6 Commands over Strings

Tcl has several commands over strings. There are commands for searching for patterns in strings, formatting and parsing strings (much the same as printf and scanf in the C language), and general string manipulation commands.

Firstly we will deal with formatting and parsing of strings. The commands for this are format and scan respectively.

     format formatString ?value value ...?

which works in a similar to C's printf; given a format string with placeholders for values and a series of values, return the appropriate string.

Here is an example of printing out a table for base 10 logarithms for the numbers 1 to 10:

     for {set n 1} {$n <= 10} {incr n} {
         puts [format "log10(%d) = %.4f" $n [expr log10($n)]]
     }

which produces the output

     ln(1) = 0.0000
     ln(2) = 0.3010
     ln(3) = 0.4771
     ln(4) = 0.6021
     ln(5) = 0.6990
     ln(6) = 0.7782
     ln(7) = 0.8451
     ln(8) = 0.9031
     ln(9) = 0.9542
     ln(10) = 1.0000

The reverse function of format is scan:

     scan string formatString varName ?varName ...?

which parses the string according to the format string and assigns the appropriate values to the variables. it returns the number of fields successfully parsed.

An example,

     scan "qty 10, unit cost 1.5, total 15.0" \
          "qty %d, unit cost %f, total %f"    \
          quantity cost_per_unit total

would assign the value 10 to the variable quantity, 1.5 to the variable cost_per_unit and the value 15.0 to the variable total.

There are commands for performing two kinds of pattern matching on strings: one for matching using regular expressions, and one for matching using UNIX-style wildcard pattern matching (globbing).

The command for regular expressions matching is as follows:

     regexp ?-indices? ?-nocase? exp string ?matchVar? ?subVar subVar ...?

where exp is the regular expression and string is the string on which the matching is performed. The regexp command returns 1 if the expression matches the string, 0 otherwise. The optional -nocase switch does matching without regard to the case of letters in the string. The optional matchVar and subVar variables, if present, are set to the values of string matches. In the regular expression, a match that is to be saved into a variable is enclosed in round braces. An example is

     regexp {([0-9]+)} "I have 3 oranges" a

will assign the value 3 to the variable a.

If the optional switch -indices is present then instead of storing the matching substrings in the variables, the indices of the substrings are stored; that is a list with a pair of numbers denoting the start and end position of the substring in the string. Using the same example:

     regexp -indices {([0-9]+)} "I have 3 oranges" a

will assign the value "7 7", because the matched numeral 3 is in the eighth position in the string, and indices count from 0.

String matching using the UNIX-style wildcard pattern matching technique is done through the string match command:

     string match pattern string

where pattern is a wildcard pattern and string is the string to match. If the match succeeds, the command returns 1; otherwise, it returns 0. An example is

     string match {[a-z]*[0-9]} {a_$%^_3}

which matches because the command says match any string that starts with a lower case letter and ends with a number, regardless of anything in between.

There is a command for performing string substitutions using regular expressions:

     regsub ?-all? ?-nocase? exp string subSpec varName

where exp is the regular expression and string is the input string on which the substitution is made, subSpec is the string that is substituted for the part of the string matched by the regular expression, and varName is the variable on which the resulting string is copied into. With the -nocase switch, then the matching is done without regard to the case of letters in the input string. The -all switch causes repeated matching and substitution to happen on the input string. The result of a regsub command is the number of substitutions made.

An example of string substitution is:

     regsub {#name#} {My name is #name#} Rob result

which sets the variable result to the value "My name is Rob". An example of using the -all switch:

     regsub -all {#name#} {#name#'s name is #name#} Rob result

sets the variable result to the value "Rob's name is Rob" and it returns the value 2 because two substitutions were made.

The are a host of other ways to manipulate strings through variants of the string command. Here we will go through them.

To select a character from a string given the character position, use the string index command. An example is:

     string index "Hello world" 6

which returns w, the 7th character of the string. (Strings are indexed from 0).

To select a substring of a string, given a range of indices use the string range command. An example is:

     string range "Hello world" 3 7

which returns the string "lo wo". There is a special index marker named end, which is used to denote the the end of a string, so the code

     string range "Hello world" 6 end

will return the string "world".

There are two ways to do simple search for a substring on a string, using the string first and string last commands. An example of string first is:

     string first "dog" "My dog is a big dog"

find the first position in string "My dog is a big dog" that matches "dog". It will return the position in the string in which the substring was found, in this case 3. If the substring cannot be found then the value -1 is returned.

Similarly,

     string last "dog" "My dog is a big dog"

will return the value 16 because it returns the index of the last place in the string that the substring matches. Again, if there is no match, -1 is returned.

To find the length of a string use string length, which given a string simply returns its length.

     string length "123456"

returns the value 6.

To convert a string completely to upper case use string toupper:

     string toupper "this is in upper case"

returns the string "THIS IS IN UPPER CASE".

Similarly,

     string tolower "THIS IS IN LOWER CASE"

returns the string "this is in lower case".

There are commands for removing characters from strings: string trim, string trimright, and string trimleft.

     string trim string ?chars?

which removes the characters in the string chars from the string string and returns the trimmed string. If chars is not present, then whitespace characters are removed. An example is:

     string string "The dog ate the exercise book" "doe"

which would return the string "Th g at th xrcis bk".

string trimleft is the same as string trim except only leading characters are removed. Similarly string trimright removes only trailing characters. For example:

     string trimright $my_input

would return a copy of the string contained in $my_input but with all the trailing whitespace characters removed.