Tcl has several commands that operate on strings. There are commands for searching
for patterns in strings, formatting and parsing strings (much the same
as printf and scanf in the C language), and general string
manipulation commands.
First we will deal with formatting and parsing of strings.
The commands for this are format and scan respectively.
format formatString ?value value ...?
which works in a similar way to C's printf: given a format string with
placeholders for values and a series of values, it returns the appropriate string.
Here is an example of printing out a table for base 10 logarithms for the numbers 1 to 10:
for {set n 1} {$n <= 10} {incr n} {
    puts [format "log10(%d) = %.4f" $n [expr {log10($n)}]]
}
which produces the output
log10(1) = 0.0000
log10(2) = 0.3010
log10(3) = 0.4771
log10(4) = 0.6021
log10(5) = 0.6990
log10(6) = 0.7782
log10(7) = 0.8451
log10(8) = 0.9031
log10(9) = 0.9542
log10(10) = 1.0000
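As a further illustration of format, the sketch below gives each field a fixed width so that the columns of such a table line up. The proc name log_row and the chosen widths are assumptions made for this example only.

```tcl
# Hypothetical helper: format one table row with fixed-width columns.
#   %3d   -> integer, right-aligned in 3 characters
#   %8.4f -> float with 4 decimals, right-aligned in 8 characters
proc log_row {n} {
    return [format "%3d | %8.4f" $n [expr {log10($n)}]]
}

foreach n {1 2 10} {
    puts [log_row $n]
}
```

As in C, a number between the % and the conversion letter sets the minimum field width, and shorter values are padded with spaces on the left.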
The reverse function of format is scan:
scan string formatString varName ?varName ...?
which parses the string according to the format string and assigns the appropriate values to the variables. It returns the number of fields successfully parsed.
An example,
scan "qty 10, unit cost 1.5, total 15.0" \
     "qty %d, unit cost %f, total %f" \
     quantity cost_per_unit total
would assign the value 10 to the variable quantity, 1.5 to the variable cost_per_unit and the value 15.0 to the variable total.
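Because scan returns the number of fields it parsed, that count can be used to reject input that does not fit the format. The following is a minimal sketch; the proc name parse_order and the choice of an empty list as the failure value are assumptions made for illustration.

```tcl
# Hypothetical helper: parse an order line and return a list of the three
# values, or an empty list when the line does not fit the expected format.
proc parse_order {line} {
    set fmt "qty %d, unit cost %f, total %f"
    set count [scan $line $fmt quantity cost_per_unit total]
    if {$count != 3} {
        return {}
    }
    return [list $quantity $cost_per_unit $total]
}

# A well-formed line yields three values.
puts [parse_order "qty 10, unit cost 1.5, total 15.0"]

# "ten" is not a valid %d field, so fewer than three fields are parsed.
puts [llength [parse_order "qty ten, unit cost 1.5, total 15.0"]]
```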
There are commands for performing two kinds of pattern matching on strings: one for matching using regular expressions, and one for matching using UNIX-style wildcard pattern matching (globbing).
The command for regular expression matching is as follows:
regexp ?-indices? ?-nocase? exp string ?matchVar? ?subVar subVar ...?
where exp is the regular expression and string is the string on which the matching is performed. The regexp command returns 1 if the expression matches the string, 0 otherwise. The optional -nocase switch does matching without regard to the case of letters in the string. The optional matchVar and subVar variables, if present, are set to the matched text: matchVar receives the part of the string matched by the whole expression, and each subVar receives the part matched by the corresponding parenthesized subexpression, counting from left to right. For example,
regexp {([0-9]+)} "I have 3 oranges" a
will assign the value 3 to the variable a.
If the optional switch -indices is present then, instead of storing the matching substrings in the variables, the indices of the substrings are stored; that is, each variable receives a list of two numbers denoting the start and end positions of the substring in the string. Using the same example:
regexp -indices {([0-9]+)} "I have 3 oranges" a
will assign the value "7 7" to the variable a, because the matched numeral 3 is in the eighth position in the string, and indices count from 0.
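To show how matchVar and the subVar variables are filled when the expression contains more than one parenthesized part, here is a small sketch. The variable names (whole, count, fruit and the *Idx variants) are chosen for this example only.

```tcl
set text "I have 3 oranges"

# whole gets the full match; count and fruit get the two parenthesized parts.
if {[regexp {([0-9]+) ([a-z]+)} $text whole count fruit]} {
    puts "whole='$whole' count='$count' fruit='$fruit'"
}

# With -indices each variable holds a start/end pair instead of the text.
regexp -indices {([0-9]+) ([a-z]+)} $text wholeIdx countIdx fruitIdx
puts "whole=$wholeIdx count=$countIdx fruit=$fruitIdx"

# -nocase lets an upper-case pattern match lower-case input.
puts [regexp -nocase {ORANGES} $text]
```

Here whole is set to "3 oranges", count to "3" and fruit to "oranges"; with -indices the corresponding pairs are "7 15", "7 7" and "9 15".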
String matching using the UNIX-style wildcard pattern matching technique
is done through the string match command:
string match pattern string
where pattern is a wildcard pattern and string is the string to match. If the match succeeds, the command returns 1; otherwise, it returns 0. An example is
string match {[a-z]*[0-9]} {a_$%^_3}
which matches because the pattern says: match any string that starts with a lower case letter and ends with a digit, regardless of what lies in between.
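Trying the same pattern against a few more strings makes its behaviour easier to see. The candidate strings below are made up for this sketch.

```tcl
set pattern {[a-z]*[0-9]}

# Print 1 or 0 for each candidate, depending on whether it fits the pattern.
foreach candidate {a_$%^_3 report7 Report7 x9 9x} {
    puts "$candidate -> [string match $pattern $candidate]"
}
```

Only Report7 and 9x fail: the first starts with an upper case letter (matching is case-sensitive), and the second starts with a digit and ends with a letter. The candidate x9 matches because * may match nothing at all.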
There is a command for performing string substitutions using regular expressions:
regsub ?-all? ?-nocase? exp string subSpec varName
where exp is the regular expression, string is the input
string on which the substitution is made, subSpec is the string
that is substituted for the part of the string matched by the regular
expression, and varName is the variable into which the resulting
string is copied. With the -nocase switch, the
matching is done without regard to the case of letters in the input
string. Without -all, only the first match is replaced; the -all switch
causes every match in the input string to be replaced. The result of a
regsub command is the number of substitutions made.
An example of string substitution is:
regsub {#name#} {My name is #name#} Rob result
which sets the variable result to the value "My name is Rob".
An example of using the -all switch:
regsub -all {#name#} {#name#'s name is #name#} Rob result
sets the variable result to the value "Rob's name is Rob"
and returns the value 2 because two substitutions were made.
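The effect of the two switches can be compared side by side. This is a sketch with a made-up template; the variable names first_only and all_ci are assumptions for the example.

```tcl
set template {#name#'s name is #NAME#}

# Without -all, only the first match is replaced.
set n1 [regsub {#name#} $template Rob first_only]
puts "${n1}: $first_only"

# -all replaces every match; -nocase lets #name# also match #NAME#.
set n2 [regsub -all -nocase {#name#} $template Rob all_ci]
puts "${n2}: $all_ci"
```

The first call makes one substitution and leaves "Rob's name is #NAME#"; the second makes two and produces "Rob's name is Rob".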
There are a host of other ways to manipulate strings through subcommands
of the string command. Here we will go through them.
To select a character from a string given the character position,
use the string index command. An example is:
string index "Hello world" 6
which returns w, the 7th character of the string
(strings are indexed from 0).
To select a substring of a string, given a range of indices, use the
string range command. An example is:
string range "Hello world" 3 7
which returns the string "lo wo".
There is a special index marker named end, which is used to denote
the end of a string, so the code
string range "Hello world" 6 end
will return the string "world".
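The sketch below gathers a few index and range calls on the same string. It also uses the end-N form, which counts back from the last character; that form is standard Tcl index syntax but is not covered in the text above, so treat it as an addition.

```tcl
set s "Hello world"

puts [string index $s 0]          ;# first character
puts [string index $s end]        ;# last character
puts [string range $s 0 4]        ;# first five characters
puts [string range $s end-4 end]  ;# last five characters

# An index past the end of the string yields an empty string, not an error.
puts [string length [string index $s 99]]
```

These calls print H, d, Hello, world and 0 respectively.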
There are two ways to do a simple search for a substring within a string:
the string first and string last commands.
An example of string first is:
string first "dog" "My dog is a big dog"
which finds the first position in the string "My dog is a big dog" that matches "dog". It returns the index at which the substring was found, in this case 3. If the substring cannot be found then the value -1 is returned.
Similarly,
string last "dog" "My dog is a big dog"
will return the value 16 because it returns the index of the last place in the string that the substring matches. Again, if there is no match, -1 is returned.
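Since both commands signal failure with -1, callers usually test for a negative result before using the index. A minimal sketch follows; the proc name describe_match and its return strings are hypothetical.

```tcl
# Hypothetical helper: report where needle first and last occurs in haystack.
proc describe_match {needle haystack} {
    set first [string first $needle $haystack]
    if {$first < 0} {
        # -1 means the substring does not occur at all.
        return "no match"
    }
    set last [string last $needle $haystack]
    return "first=$first last=$last"
}

puts [describe_match "dog" "My dog is a big dog"]
puts [describe_match "cat" "My dog is a big dog"]
```

The first call prints "first=3 last=16" and the second prints "no match".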
To find the length of a string use string length, which given a
string simply returns its length.
string length "123456"
returns the value 6.
To convert a string completely to upper case use string toupper:
string toupper "this is in upper case"
returns the string "THIS IS IN UPPER CASE".
Similarly,
string tolower "THIS IS IN LOWER CASE"
returns the string "this is in lower case".
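One common use of the case conversion commands is comparing two strings without regard to case. The sketch below is one possible way to do that; the proc name same_ignoring_case is an assumption for this example.

```tcl
# Hypothetical helper: return 1 if a and b are equal once both are
# converted to lower case, 0 otherwise.
proc same_ignoring_case {a b} {
    return [expr {[string tolower $a] eq [string tolower $b]}]
}

puts [same_ignoring_case "Hello World" "HELLO world"]
puts [same_ignoring_case "Hello" "Help"]

# Case conversion does not change the length of these strings.
puts [string length [string toupper "abc"]]
```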
There are commands for removing characters from the ends of strings:
string trim, string trimright, and string trimleft.
string trim string ?chars?
removes any leading and trailing characters that appear in the string chars from the string string and returns the trimmed string. Characters in the interior of the string are left alone, even if they appear in chars. If chars is not present, then whitespace characters are removed. An example is:
string trim "xxThe dog ate the exercise bookxx" "x"
which would return the string "The dog ate the exercise book".
string trimleft is the same as string trim except only leading
characters are removed. Similarly string trimright removes only
trailing characters.
For example:
string trimright $my_input
would return a copy of the string contained in $my_input but with all
the trailing whitespace characters removed.
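The three trimming commands can be compared on one input. In this sketch the sample string and the angle brackets around each result (used only to make the remaining whitespace visible) are assumptions for illustration.

```tcl
set raw "  --Hello - world--  "

# Default: strip whitespace from both ends.
puts "<[string trim $raw]>"

# Strip spaces and hyphens from both ends; the interior " - " is untouched.
puts "<[string trim $raw " -"]>"

# Strip only from the left, or only from the right.
puts "<[string trimleft $raw " -"]>"
puts "<[string trimright $raw " -"]>"
```

Trimming stops at the first character, counted from each end, that is not in the set, which is why the hyphen between the two words survives.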