|
Softpanorama
(slightly skeptical)
Open Source Software Educational Society |
May the
source be with you,
but remember the KISS principle ;-)
|
Unix tr command
Unix tr command copies the standard input to the standard output
with substitution or deletion of selected characters.
Input characters in set1 are mapped
to corresponding characters in set2.
If length is unequal then set2 is extended to the length of set1
by repeating its last character as necessary. Excess characters in set2
are ignored.
The utility performs classic alphabet1
to alphabet2 translation,
sometimes called 1:1 transliteration, and as such is suitable for implementing
a
Caesar cipher. Unix inherited tr
from Multics as a derivative of the PL/I translate
built-in, which in turn was a generalization of the TR instruction in the System/360
architecture (see
IBM System-360 Green Card).
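As a minimal sketch of the Caesar cipher use mentioned above: the shift of 3 and the helper names caesar_encrypt and caesar_decrypt are assumptions chosen for illustration, and only lower-case letters are handled.

```shell
# Caesar cipher with a fixed shift of 3, lower-case letters only.
# caesar_encrypt and caesar_decrypt are hypothetical helper names.
# set2 is set1 rotated by three places: d..z followed by a..c.
caesar_encrypt() { tr 'a-z' 'd-za-c'; }
# Decryption simply swaps the two sets.
caesar_decrypt() { tr 'd-za-c' 'a-z'; }

echo 'attack at dawn' | caesar_encrypt                   # prints: dwwdfn dw gdzq
echo 'attack at dawn' | caesar_encrypt | caesar_decrypt  # prints: attack at dawn
```

Because the mapping is strictly one character to one character, swapping the two sets is all that is needed to invert it.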
The format of the tr
command is somewhat strange -- this is one of the few Unix commands that
accepts input only from standard input.
tr
[ options ] [
set1 [
set2 ] ]
Sets can be specified by enumerating characters, as in
tr '{}' '()' < infile > outfile,
or by using ranges, as in tr A-Z
a-z < infile > outfile. Instead of individual characters, special
POSIX character classes can be used. Among them:
alnum: alphanumeric characters
alpha: alphabetic characters
cntrl: control (non-printing) characters
digit: numeric characters
graph: graphic characters
lower: lower-case alphabetic characters
print: printable characters
punct: punctuation characters
space: whitespace characters
upper: upper-case characters
xdigit: hexadecimal characters
Typical usage of classes involves changing the case from upper
to lower or vice versa, as in the following example:
cat names | tr '[:upper:]' '[:lower:]' > lc_names
Classes can be combined to form a more complex set, for example
'[:lower:][:upper:]'
The tr utility accepts several options:
-c
- work on the complement of the listed characters, i.e., operations
apply to characters not in the given set
-d
- delete characters in the first set from the output
-s
- squeeze repeated characters in the output into just one character.
Many Unix administrators are unaware of these options,
which are quite useful and greatly extend the usability of this generally
very simple command. Here is a fuller description of those options:
- -c, --complement Complement
set1 with respect to the universe of characters whose ASCII codes
are 01 through 0377 octal. For example:
- To replace every nonprinting character, other than valid control
characters, with a ? (question mark), enter:
tr -c '[:print:][:cntrl:]' '[?*]' < textfile > newfile
- Here is a more complex and rather elegant example in which the
goal is to create a list of words in a file:
tr -cs '[:lower:][:upper:]' '[\n*]' < text > words
This translates each sequence of characters other than lowercase
letters and uppercase letters into a single newline character. The
* (asterisk) causes the tr command to repeat the
new line character enough times to make the second string as long
as the first string.
- -d, --delete Delete the characters
listed in set1; no translation is performed.
One important use of this tr option is input sanitization:
stripping shell metacharacters from arguments so that a malicious user cannot submit
commands as arguments to a script. Symbols such as backticks,
all kinds of brackets ( ()[]{} ), colon and semicolon, as well as =#$&!@,
should be removed from the argument values, if they cannot legitimately occur
there, before you start processing those values. If a script
is used by a considerable population, there is always one black sheep
who will try to mangle the input arguments to see what happens ;-)
For example:
- tr --delete '=;:`"<>,./?!@#$%^&(){}[]'
- tr can be used to remove
the carriage return at the end of each line, leaving only the newline UNIX
expects. tr allows you to specify characters
as octal values by preceding the value with a backslash, so the
command:
tr -d '\015' < pc.file > unix.file
OR
tr -d '\r' < pc.file > unix.file
will remove the carriage return from the carriage
return/newline pair used by Microsoft OSes as a line terminator.
Please note that this can also be done by dos2unix utility.
- To delete all NUL characters from a file:
tr -d '\0' < textfile > newfile
- -s, --squeeze-repeats
Replace sequences of the same character with one. -s uses
set1 if neither translating nor deleting
specified, otherwise squeeze uses set2 and occurs
after translation or deletion. For example:
- To replace every sequence of characters in the <space> character
class with a single : (colon) character, enter:
tr -s '[:space:]' '[\:*]'
- To replace every sequence of one or more new lines with a single
new line:
tr -s '\n' < textfile > newfile
OR
tr -s '\012' < textfile > newfile
- -t, --truncate-set1 Truncate
set1 to the length of set2. By default, when set2
is shorter than set1, it is padded with its last character. This option reverses that
default behavior. It is available only in the GNU implementation of tr.
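The options described above can be tried on small inputs. The following is a sketch only; the sample strings are made up for illustration, and the output comments assume a POSIX-conforming tr.

```shell
# -d: delete every character listed in set1
printf 'a1b2c3\n' | tr -d '0-9'              # prints: abc

# -s: squeeze each run of the listed character into a single one
printf 'a   b  c\n' | tr -s ' '              # prints: a b c

# -c: operate on the complement of set1; newline is listed so it is left alone
printf 'a1b2c3\n' | tr -c 'a-z\n' '?'        # prints: a?b?c?

# -c combined with -d: keep only the listed characters
printf 'tel: 555-0100\n' | tr -cd '0-9\n'    # prints: 5550100
```

Note that the complement examples list \n explicitly; otherwise the trailing newline would be translated or deleted along with everything else outside the set.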
Sets are specified as strings of characters. Most represent
themselves. Interpreted sequences are:
- \nnn -- character with octal value
nnn
- \xnn -- character with hexadecimal
value nn
- \\ -- backslash
- \a -- alert
- \b -- backspace
- \f -- form feed
- \n -- new line
- \r -- return
- \t -- horizontal tab
- \v -- vertical tab
- \E -- escape
- c1-c2 -- all characters from c1 to c2 in
ascending order. The character specified by c1 must
collate before the character specified by c2.
- [c1-c2] -- same as c1-c2 if both sets use
this form
- [c*] -- set2 extended to the length of set1
with the symbol c. In other words fills out the set2 with the character
specified by c. This option can be used only at the end
of the set2. Any characters specified after the *
(asterisk) are ignored.
- [c*N] -- N copies of symbol c. N is considered
a decimal integer unless the first digit is a 0; then it is considered
an octal integer.
- [:alnum:] -- all letters and digits
- [:alpha:] -- all letters
- [:blank:] -- all horizontal whitespace
- [:cntrl:] -- all control characters
- [:digit:] -- all digits
- [:graph:] -- all printable characters, not including
space
- [:lower:] -- all lower case letters
- [:print:] -- all printable characters, including space
- [:punct:] -- all punctuation characters
- [:space:] -- all horizontal or vertical whitespace
- [:upper:] -- all upper case letters
- [:xdigit:] -- all hexadecimal digits
- [=c=] -- Specifies all of the characters with the same
equivalence class as the character specified by c.
Notes:
- Translation occurs if -d is not given and both set1
and set2 appear
- -t may be used only when translating.
- set2 is extended to the length of set1 by repeating
its last character as necessary. Excess characters in set2 are
ignored.
- Only [:lower:] and
[:upper:] are guaranteed to expand in ascending
order. They can be used in pairs to specify case conversion.
- -s (Squeeze all strings of repeated output characters
to single characters) uses set1 if
neither translating nor deleting specified, otherwise
squeeze uses set2 and occurs after translation or deletion.
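A few of the set notations listed above, shown on made-up input. This is a sketch that assumes a tr supporting the [c*N] and [c*] repeat forms in set2 (GNU tr does).

```shell
# [c*N] gives N copies of c; [c*] pads set2 out to the length of set1.
# set1 has six characters, so set2 becomes xxx followed by ###.
printf 'abc123\n' | tr 'a-c1-3' '[x*3][#*]'     # prints: xxx###

# Octal escape: \011 is the horizontal tab
printf 'a\tb\n' | tr '\011' ' '                 # prints: a b

# [:lower:] and [:upper:] used as a pair for case conversion
printf 'Hello\n' | tr '[:lower:]' '[:upper:]'   # prints: HELLO
```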
Example of how to use tr to convert text into one word per line. Too
simplistic; should be at least [:alnum:]
To create a list of the words in /path/to/file, one per line, enter:
$ tr -cs "[:alpha:]" "\n" < /path/to/file
Where,
- -c : Complement the set of characters in string1
- -s : Replace each input sequence of a repeated character that
is listed in SET1 with a single occurrence of that character
01 Aug 2006 | developerWorks
Translating text
Now that you know at least five different ways of generating some
text, let's look at doing some simple translations on it.
The tr command lets you translate characters in one
set to the corresponding characters in a second set. Let's take a look
at a few examples (Listing
4) to see how it works.
Listing 4. Using tr to translate characters
echo "a test" | tr t p
echo "a test" | tr aest 1234
echo "a test" | tr -d t
echo "a test" | tr '[:lower:]' '[:upper:]'
Looking at the output of these commands (see
Listing 5) gives you a
clue about how tr works (here's a hint: it's a direct replacement
of characters in the first set with the corresponding characters from
the second set).
Listing 5. What has tr done?
chrish@dhcp3 [199]$ echo "a test" | tr t p
a pesp
chrish@dhcp3 [200]$ echo "a test" | tr aest 1234
1 4234
chrish@dhcp3 [201]$ echo "a test" | tr -d t
a es
chrish@dhcp3 [202]$ echo "a test" | tr '[:lower:]' '[:upper:]'
A TEST
The first and second examples are simple enough, replacing one character
for another. The third example, with the -d option (delete),
removes the specified characters completely from the output. This is
often used to remove carriage returns from DOS text files to turn them
into UNIX text files (see Listing
6). Finally, the last example uses character classes (those names
inside of [: :]) to convert all lower-case letters into upper-case letters.
Portable Operating System Interface-standard (POSIX-standard) character
classes include:
alnum: alphanumeric characters
alpha: alphabetic characters
cntrl: control (non-printing) characters
digit: numeric characters
graph: graphic characters
lower: lower-case alphabetic characters
print: printable characters
punct: punctuation characters
space: whitespace characters
upper: upper-case characters
xdigit: hexadecimal characters
Listing 6. Converting DOS text files into UNIX
text files
tr -d '\r' < input_dos_file.txt > output_unix_file.txt
Although the tr command respects C locale environment
variables (try man locale for more information about these), don't expect
it to do anything sensible with UTF-8 documents, such as being able
to replace lower-case accented characters with appropriate upper-case
characters. The tr command works best with ASCII and the
other standard C locales.
The following example is a complete awk program, which
prints the number of occurrences of each word in its input. It illustrates
the associative nature of awk arrays by using strings as
subscripts. It also demonstrates the `for x in
array' construction. Finally, it shows how awk
can be used in conjunction with other utility programs to do a useful
task of some complexity with a minimum of effort. Some explanations
follow the program listing.
awk '
# Print list of word frequencies
{
    for (i = 1; i <= NF; i++)
        freq[$i]++
}
END {
    for (word in freq)
        printf "%s\t%d\n", word, freq[word]
}'
The first thing to notice about this program is that it has two rules.
The first rule, because it has an empty pattern, is executed on every
line of the input. It uses awk's field-accessing mechanism
(see section
Examining Fields) to pick out the individual words from the line,
and the built-in variable NF (see section
Built-in Variables) to know how many fields are available.
For each input word, an element of the array freq is
incremented to reflect that the word has been seen an additional time.
The second rule, because it has the pattern END, is
not executed until the input has been exhausted. It prints out the contents
of the freq table that has been built up inside the first
action.
Note that this program has several problems that would prevent it
from being useful by itself on real text files:
- Words are detected using the
awk convention that
fields are separated by whitespace and that other characters in
the input (except newlines) don't have any special meaning to
awk. This means that punctuation characters count as
part of words.
- The
awk language considers upper and lower case
characters to be distinct. Therefore, `foo' and
`Foo' are not treated by this program as the same word.
This is undesirable since in normal text, words are capitalized
if they begin sentences, and a frequency analyzer should not be
sensitive to that.
- The output does not come out in any useful order. You're more
likely to be interested in which words occur most frequently, or
having an alphabetized table of how frequently each word occurs.
The way to solve these problems is to use other system utilities
to process the input and output of the awk script. Suppose
the script shown above is saved in the file `frequency.awk'.
Then the shell command:
tr A-Z a-z < file1 | tr -cd 'a-z \012' \
| awk -f frequency.awk \
| sort -k 2nr
produces a table of the words appearing in `file1' in order
of decreasing frequency.
The first tr command in this pipeline translates all
the upper case characters in `file1' to lower case. The second
tr command deletes all the characters in the input except
lower case characters, spaces, and newlines (the spaces must survive,
because awk uses them to separate words). The second argument to the second
tr is quoted to protect the backslash in it from being
interpreted by the shell. The awk program reads this suitably
massaged data and produces a word frequency table, which is not ordered.
The awk script's output is now sorted by the sort
command and printed on the terminal. The options given to sort
in this example specify to sort by the second field of each input line,
that the sort keys should be treated as numeric
quantities (otherwise `15' would come before `5'),
and that the sorting should be done in descending (reverse) order.
See the general operating system documentation for more information
on how to use the tr and sort commands.
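The whole pipeline can also be wrapped in a single shell function. The sketch below is not the manual's own script: word_freq is a hypothetical name, the awk program is inlined instead of being read from frequency.awk, character classes are used in place of explicit ranges, and a second sort key is added so that words with equal counts come out alphabetically.

```shell
# word_freq is a hypothetical wrapper; it reads text on standard input
# and prints "word<TAB>count", most frequent first.
word_freq() {
    tr '[:upper:]' '[:lower:]' |            # fold case
    tr -cd '[:lower:][:space:]' |           # drop punctuation and digits, keep separators
    awk '{ for (i = 1; i <= NF; i++) freq[$i]++ }
         END { for (word in freq) printf "%s\t%d\n", word, freq[word] }' |
    sort -k 2,2nr -k 1,1                    # by count descending, then by word
}

printf 'The cat saw the dog.\nThe dog ran.\n' | word_freq
```

For the sample input the first line of output is the word "the" with a count of 3, followed by "dog" with a count of 2.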
Shell scripting example
In the following example
you will get confirmation before deleting the file. If the
user responds in lower case, the tr command will do nothing,
but if the user responds in upper case, the characters will
be changed to lower case. This ensures that even if the
user responds with YES, YeS, YEs, etc., the script removes the
file:
#!/bin/bash
echo -n "Enter file name : "
read -r myfile
echo -n "Are you sure ( yes or no ) ? "
read -r confirmation
# Fold the answer to lower case so that YES, Yes and yEs all match
confirmation="$(echo "${confirmation}" | tr 'A-Z' 'a-z')"
if [ "$confirmation" == "yes" ]; then
    # Quote the file name so that names containing spaces work
    [ -f "$myfile" ] && /bin/rm "$myfile" || echo "Error - file $myfile not found"
else
    : # do nothing
fi
Remove all non-printable characters from myfile.txt (note that this also
removes newlines, which are not in the [:print:] class):
$ tr -cd "[:print:]" < myfile.txt
Replace every run of two or more successive spaces with a single space in a copy
of the text in a file called input.txt and save the output to
a new file called output.txt
tr -s ' ' ' ' < input.txt > output.txt
The -d option deletes every occurrence of each character
listed in set1; it does not match strings (i.e., sequences of characters).
For example, the following does not remove just
the word nameserver from a copy of the text in a file called
/etc/resolv.conf. It removes every a, e, m, n, r, s, and v wherever they
occur and writes the output to a file called ns.ipaddress.txt (to remove a
whole word, use a tool such as sed instead):
tr -d 'nameserver' < /etc/resolv.conf > ns.ipaddress.txt
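The difference between deleting characters and deleting a word is easy to see on a two-line sample. The input lines below are made up for illustration, and sed is shown only as one possible tool for the word-level job.

```shell
# tr -d removes the individual characters a, e, m, n, r, s, v everywhere,
# so the second line is damaged as well.
printf 'nameserver 10.0.0.1\nsearch example.com\n' | tr -d 'nameserver'
# prints:
#  10.0.0.1
# ch xpl.co

# sed removes the word itself and leaves other text alone.
printf 'nameserver 10.0.0.1\nsearch example.com\n' | sed 's/nameserver//g'
# prints:
#  10.0.0.1
# search example.com
```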
From AIX man pages
Examples
- To translate braces into parentheses, enter:
tr '{}' '()' < textfile > newfile
This translates each { (left brace) to ( (left
parenthesis) and each } (right brace) to ) (right
parenthesis). All other characters remain unchanged.
- To translate braces into brackets, enter:
tr '{}' '\[]' < textfile > newfile
This translates each { (left brace) to [ (left
bracket) and each } (right brace) to ] (right
bracket). The left bracket must be entered with a \ (backslash)
escape character.
- To translate lowercase characters to uppercase, enter:
tr 'a-z' 'A-Z' < textfile > newfile
- To create a list of words in a file, enter:
tr -cs '[:lower:][:upper:]' '[\n*]' < textfile > newfile
This translates each sequence of characters other than lowercase
letters and uppercase letters into a single newline character. The
* (asterisk) causes the tr command to repeat the
new line character enough times to make the second string as long
as the first string.
- To delete all NUL characters from a file, enter:
tr -d '\0' < textfile > newfile
- To replace every sequence of one or more new lines with a single
new line, enter:
tr -s '\n' < textfile > newfile
OR
tr -s '\012' < textfile > newfile
- To replace every nonprinting character, other than valid control
characters, with a ? (question mark), enter:
tr -c '[:print:][:cntrl:]' '[?*]' < textfile > newfile
This scans a file created in a different locale to find characters
that are not printable characters in the current locale.
- To replace every sequence of characters in the <space> character
class with a single # (pound sign) character, enter:
tr -s '[:space:]' '[#*]'
In the shell program we use to remove all non-printable ASCII characters
from a text file, we tell the tr command to delete every character
in the translation process except for the specific characters we specify.
In essence, we filter out the undesirable characters. The
tr command we use in our program is shown below:
tr -cd '\11\12\40-\176' < "$INPUT_FILE" > "$OUTPUT_FILE"
In this command, the variable INPUT_FILE must contain the
name of the Solaris file you'll be reading from, and OUTPUT_FILE
must contain the name of the output file you'll be writing to.
When the -c and -d options of the
tr command are used in combination like this, the only characters
tr writes to the standard output stream are the characters
we've specified on the command line.
Although it may not look very attractive, we're using octal characters
in our tr command to make our programming job easier and more
efficient. Our command tells tr to retain only the octal
characters 11, 12, and 40 through 176
when writing to standard output. Octal character 11 corresponds
to the [TAB] character, and octal 12 corresponds to
the [LINEFEED] character. The octal characters 40
through 176 correspond to the standard visible keyboard characters,
beginning with the [Space] character (octal 40) through the
~ character (octal 176). These are the only characters
retained by tr -- the rest are filtered out, leaving us with
a clean ASCII file.
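The filter described above can be tried on a short string that contains one control byte and one 8-bit byte. This is a sketch: strip_nonprintable is a hypothetical name, and LC_ALL=C is an added assumption so that tr treats its input strictly as bytes.

```shell
# strip_nonprintable is a hypothetical name for the filter described above.
# It keeps tab (011), newline (012) and octal 040 through 176, and deletes the rest.
strip_nonprintable() { LC_ALL=C tr -cd '\11\12\40-\176'; }

# The sample contains \001 (a control character) and \200 (an 8-bit byte);
# both are removed, while the tab and the newline survive.
printf 'ok\001\tfine\200\n' | strip_nonprintable | od -c
```

Piping the result through od -c makes it easy to confirm that only the tab, the newline and the visible characters remain.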
Example 1: Change lowercase to uppercase in a file:
D:\temp>more score.txt
john 81 91
mark 82 93
tina 88 92
D:\temp>tr '[a-z]' '[A-Z]' < score.txt > score1.txt
D:\temp>more score1.txt
JOHN 81 91
MARK 82 93
TINA 88 92
LP: Would you talk a little more about the tr utility?
Ah, tr. Well, first thing that comes to mind is that it is the answer
to the trivia question, "Name a Linux utility that accepts input only
from standard input and never from a file named as an argument on the
command line." It is an odd beast that is useful only sometimes--but
when it is useful it is very useful. Here is an excerpt that
talks about tr:
"The tr utility reads standard input and, for each input character,
maps it to an alternate character, deletes the character, or leaves
the character alone. This utility reads from standard input and
writes to standard output.
"The tr utility is typically used with two arguments, string1
and string2. The position of each character in the two strings is
important: Each time tr finds a character from string1 in its input,
it replaces that character with the corresponding character from
string2.
"With one argument, string1, and the --delete option, tr deletes
the characters specified in string1. The option --squeeze-repeats
replaces multiple sequential occurrences of characters in string1
with single occurrences (for example, abbc becomes abc).
"You can use a hyphen to represent a range of characters in string1
or string2. The two command lines in the following example produce
the same result:
$ echo abcdef | tr 'abcdef' 'xyzabc'
xyzabc
$ echo abcdef | tr 'a-f' 'x-za-c'
xyzabc
"The next example demonstrates a popular method for disguising
text, often called ROT13 (rotate 13) because it shifts each letter 13
places: it replaces the first letter of the alphabet with the fourteenth,
the second with the fifteenth, and so forth.
$ echo The punchline of the joke is ... |
> tr 'A-M N-Z a-m n-z' 'N-Z A-M n-z a-m'
Gur chapuyvar bs gur wbxr vf ...
"To make the text intelligible again, reverse the order of the
arguments to tr:
$ echo Gur chapuyvar bs gur wbxr vf ... |
> tr 'N-Z A-M n-z a-m' 'A-M N-Z a-m n-z'
The punchline of the joke is ...
"The --delete option causes
tr to delete selected characters:
$ echo If you can read this, you can spot the missing vowels! |
> tr --delete 'aeiou'
If y cn rd ths, y cn spt th mssng vwls!
"In the following example, tr replaces characters and reduces
pairs of identical characters to single characters:
$ echo tennessee | tr --squeeze-repeats 'tnse' 'srne'
serene
"The next example replaces each sequence of nonalphabetic characters
(the complement of all the alphabetic characters as specified by
the character class alpha) in the file draft1 with a single NEWLINE
character. The output is a list of words, one per line.
$ tr --complement --squeeze-repeats '[:alpha:]' '\n' < draft1
"The final example uses character classes to upshift the string
hi there:
$ echo hi there | tr '[:lower:]' '[:upper:]'
HI THERE
Luckily, we can also use ranges of characters
to specify the characters more efficiently:
tr a-z A-Z
Ever had those horrible upper case
DOS file names? Here's a Bourne shell script to take care of them:
for f in *; do mv "$f" "$(echo "$f" | tr A-Z a-z)"; done
Many
UNIX editors allow some text to be processed by the shell. For example,
to replace all upper case characters of the next paragraph with lower
case while in vi, type:
!}tr A-Z a-z
As another example, the command:
!jtr a-z A-Z
capitalizes the current and next line
(the character after the ! is a movement character).
If you read
the International Obfuscated
C Code Contest (ftp://ftp.uu.net./pub/ioccc/), you frequently see
that part of the hints are coded by a method called
rot13. rot13
is a Caesar cypher, i.e., a cypher in which all letters are shifted
some number of places. For example, with a shift of one, a becomes b, b becomes c, ..., y
becomes z, and z becomes a. In rot13 each letter is shifted 13 places.
It is a weak cypher, and to decipher it, you can use rot13 again. You
can also use tr to read the text in this way:
tr a-zA-Z n-za-mN-ZA-M
Another interesting way to use tr is to
change files from
Macintosh format to UNIX format. For returns, the Macintosh uses
\r while UNIX uses \n. GNU tr allows you to use the C
special characters, so type:
tr '\r' '\n'
If you don't have GNU's version of tr,
you can always use the corresponding octal numbers as shown here:
tr '\015' '\012'
You might wonder what would happen if
the second string is shorter than the first string. POSIX says this
is not allowed. System V says that only that portion of the first string
is used that has a matching character in the second string. BSD and
GNU pad the second string with its final character in order to match
the length of the first string. The reason this last method is handy
becomes clearer when we take complements into account. Assume you wish
to make a list of all words and keywords in your listing. When you use
-c, tr complements the first string. In C, all identifiers and
keywords consist of a-zA-Z0-9_, so those are the characters we
want to keep. Thus, we can do the following:
tr -c a-zA-Z0-9_ '\n'
If we pipe the tr output through
sort -u, we get our desired list. If we follow POSIX, the second
string would have to describe 193 newline characters (written as
[\n*193] or [\n*]). If we use System V, only the zero byte
is translated to a newline, since the complement of a-zA-Z0-9_
starts with the zero byte.
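A sketch of the identifier list described above, written with the explicit [\n*] repeat so that it does not depend on how a short second string is padded. The helper name list_identifiers and the one-line C sample are assumptions for illustration; -s is added so that runs of separators produce a single newline, and LC_ALL=C fixes the sort order.

```shell
# list_identifiers is a hypothetical helper name.
# -c complements the identifier characters, [\n*] repeats the newline as
# often as needed, and -s squeezes consecutive newlines into one.
list_identifiers() { tr -cs 'a-zA-Z0-9_' '[\n*]' | LC_ALL=C sort -u; }

printf 'int main(void) { return x_1 + 42; }\n' | list_identifiers
# prints:
# 42
# int
# main
# return
# void
# x_1
```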
The second important use of tr is to
remove characters. For this option, you use the flag -d with
one string as an argument. To fix up those nasty MS-DOS text files with
a ^M at the end of the line and a trailing ^Z, specify
tr in this way:
tr -d '\015\032'
Many people have written a program in C
to do this same operation. Well, a C program isn't necessary--you only
need to know the right program, tr, with the right flags. The -d
flag isn't used often, but is nice to have when needed. You can combine
it with the -c flag to delete everything except characters from
the string you supplied as an argument.
Repeated characters can be squeezed into
a single one using the -s option with one string as an argument.
It can also be used to squeeze white space. To remove empty lines, type:
tr -s '\n'
The -s option can be used with two
strings as arguments. In that case, tr first translates the text as
if -s were not given and then tries to squeeze the characters
in the second string. For instance, we can squeeze all standard white
space to a single space by specifying:
tr -s '[:space:]' '[ *]'
The -d flag can also be used with
two strings: the characters in the first string will be removed and
the characters in the second string will be squeezed.
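Both two-string forms can be seen on short inputs. The sample strings are made up, and the [ *] repeat is used so the result does not depend on how a short second string is padded.

```shell
# -s with two strings: translate first, then squeeze the characters of
# the second string. Commas and semicolons become spaces, runs collapse.
echo 'a,,b;;;c' | tr -s ',;' '[ *]'      # prints: a b c

# -d with -s: delete the characters of the first string, then squeeze
# the characters of the second string in what remains.
echo 'aa..bb..cc' | tr -ds '.' 'abc'     # prints: abc
```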
tr may not be a great program; however, it gets the job done. It is particularly
useful in scripts using pipes and command substitutions (i.e., inside
the back quotes). If you use tr often, you'll learn to appreciate its
capabilities. Small is beautiful.
tr is a simple pattern translator. Its practical application overlaps a
bit with other, more complex tools, such as sed and awk [with larger
binary footprints]. tr is quite useful for simple textual replacements,
deletions and additions. Its behavior is dictated by "from" and "to"
character sets provided as the first and second argument. The general
usage syntax of tr is as follows:
# (12) tr usage
tr [options] "set1" ["set2"] < input > output
Note that tr does not accept file arguments; it reads from standard
input and writes to standard output. When two character sets are provided,
tr operates on the characters contained in "set1" and performs some
amount of substitution based on "set2". Listing 1 demonstrates some
of the more common tasks performed with tr.
# (13) Transform lower case alphas to their
# equivalent upper case.
$ echo "Hello World." | tr "[a-z]" "[A-Z]"
HELLO WORLD.
# (14) Same lower to upper transformation -
# uses character class names :lower:
# and :upper:. (tr recognizes 12
# character class names).
$ tr "[:lower:]" "[:upper:]" < README > UPPER_README
# (15) Make $PATH a bit more readable/searchable -
# substitute ':' with a line feed
$ echo $PATH | tr ":" "\n"
/usr/bin
/bin
/usr/local/bin
.....
$ echo $PATH | tr ":" "\n" | grep -i "local"
/usr/local/bin
/usr/home/curly/Local_bin
# (16) Remove all white space from a file.
$ tr -d "[:space:]" < README > NO_WHITE_SPACE
# (17) Substitute all single or sequence of ;
# with a single :
$ echo ";;;;This;;is;a;;;;simple;;;example." \
| tr -s ";" ":"
:This:is:a:simple:example.
echo "12345678 9247" | tr 123456789 computerh - this example takes the echoed string
'12345678 9247' and pipes it through tr, replacing each digit
with the corresponding letter. In this example it returns computer
hope.
tr -cd '\11\12\40-\176' < myfile1 > myfile2 - this example takes the file myfile1,
strips all non-printable characters, and writes the result to myfile2.
Any combination of the options -c, -d,
or -s may be used:
- -c Complement the set of characters in
string1 with respect to the universe of characters whose
ASCII codes are 01 through 0377 octal.
- -d Delete all input characters in string1.
- -s Squeeze all strings of repeated output
characters that are in string2 to single characters.
Example 1 Creating a list of all the
words in a file
The following example creates a list of all the words in filename1,
one per line, in filename2, where a word is taken to be a
maximal string of alphabetics. The second string is quoted to protect
`\' from the shell. 012 is the ASCII code for NEWLINE.
example% tr -cs A-Za-z '\012' < filename1 > filename2
Flags
| -A | Performs all operations on a byte-by-byte basis using the ASCII collation order for ranges and character classes, instead of the collation order for the current locale. |
| -c | Specifies that the value of String1 be replaced by the complement of the string specified by String1. The complement of String1 is all of the characters in the character set of the current locale, except the characters specified by String1. If the -A and -c flags are both specified, characters are complemented with respect to the set of all 8-bit character codes. If the -c and -s flags are both specified, the -s flag applies to characters in the complement of String1. |
| -d | Deletes each character from standard input that is contained in the string specified by String1. |
| -s | Removes all but the first in a sequence of repeated characters. Character sequences specified by String1 are removed from standard input before translation, and character sequences specified by String2 are removed from standard output. |
| String1 | Specifies a string of characters. |
| String2 | Specifies a string of characters. |
- To translate braces into parentheses, enter:
tr '{}' '()' < textfile > newfile
This translates each { (left brace) to (
(left parenthesis) and each } (right brace) to
) (right parenthesis). All other characters remain
unchanged.
- To translate braces into brackets, enter:
tr '{}' '\[]' < textfile > newfile
This translates each { (left brace) to [
(left bracket) and each } (right brace) to ]
(right bracket). The left bracket must be entered with a \ (backslash)
escape character.
- To translate lowercase characters to uppercase, enter:
tr 'a-z' 'A-Z' < textfile > newfile
- To create a list of words in a file, enter:
tr -cs '[:lower:][:upper:]' '[\n*]' < textfile > newfile
This translates each sequence of characters other than lowercase
letters and uppercase letters into a single newline character. The
* (asterisk) causes the tr command to repeat
the new line character enough times to make the second string as
long as the first string.
- To delete all NULL characters from a file, enter:
tr -d '\0' < textfile > newfile
- To replace every sequence of one or more new lines with a single
new line, enter:
tr -s '\n' < textfile > newfile
OR
tr -s '\012' < textfile > newfile
- To replace every nonprinting character, other than valid control
characters, with a ? (question mark), enter:
tr -c '[:print:][:cntrl:]' '[?*]' < textfile > newfile
This scans a file created in a different locale to find characters
that are not printable characters in the current locale.
- To replace every sequence of characters in the <space> character
class with a single # (pound sign) character, enter:
tr -s '[:space:]' '[#*]'
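Several of the examples above can be tried without creating textfile first, by feeding tr a short string from printf. This is only a sketch: the input strings and variable names are invented for the demonstration, and the results are captured in variables so they are easy to inspect.

```shell
# Braces to parentheses.
parens=$(printf 'f{x}\n' | tr '{}' '()')

# Lowercase to uppercase, using character classes instead of ranges.
upper=$(printf 'Hello\n' | tr '[:lower:]' '[:upper:]')

# Delete NUL bytes (printf emits one between 'a' and 'b').
nul_free=$(printf 'a\000b\n' | tr -d '\0')

# Collapse runs of newlines into a single newline.
single_nl=$(printf 'one\n\n\ntwo\n' | tr -s '\n')

# Replace each run of whitespace (the final newline included) with one '#'.
hashed=$(printf 'a  b\t\tc\n' | tr -s '[:space:]' '[#*]')

printf '%s\n' "$parens"     # f(x)
printf '%s\n' "$upper"      # HELLO
printf '%s\n' "$nul_free"   # ab
printf '%s\n' "$single_nl"  # one, then two, with no blank lines between
printf '%s\n' "$hashed"     # a#b#c#
```

Note that in the last case the trailing newline belongs to the space class too, so it also becomes a #.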
Cat-ting our file (columns.txt) and piping
the output of the cat command to the input of the translate command
causes all lowercase names to be translated to uppercase names.
cat columns.txt | tr '[a-z]' '[A-Z]'
Remember, we have not modified the file columns.txt,
so how do we save the output? Simple: redirect the output
of the translate command with '>' to a file called UpCaseColumns.txt
with:
cat columns.txt | tr '[a-z]' '[A-Z]' > UpCaseColumns.txt
Since the tr command does
not take a filename like sed did, we could have changed the above
example to:
tr '[a-z]' '[A-Z]' < columns.txt > UpCaseColumns.txt
As you can see, the input to the translate command
still arrives on stdin, but stdin is now redirected from columns.txt
instead of being fed by cat through a pipe. So either
way we do it, we can achieve what we've set out to do, using tr
as part of a stream, or taking the input from the stdin ('<').
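To convince yourself that the two forms are equivalent, you can run both on the same file and compare the results. This is a sketch only: a temporary file with made-up names stands in for columns.txt, and the ranges are written without the surrounding square brackets, which modern tr implementations do not need (with the brackets, tr simply maps [ to [ and ] to ] as well).

```shell
# A temporary stand-in for columns.txt (contents are made up).
tmp=$(mktemp)
printf 'alice\nbob\n' > "$tmp"

# Form 1: cat feeds tr through a pipe.
via_pipe=$(cat "$tmp" | tr 'a-z' 'A-Z')

# Form 2: the shell redirects tr's stdin from the file.
via_redirect=$(tr 'a-z' 'A-Z' < "$tmp")

# Both forms hand the same bytes to tr on stdin, so the output matches.
[ "$via_pipe" = "$via_redirect" ] && echo "same output"

rm -f "$tmp"
```

The redirect form saves one process, which is the usual reason for preferring it in scripts.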
We can also use translate in another way: to distinguish
between spaces and tabs. Spaces and tabs can be a pain when using
scripts to compile system reports. What we need is a way of translating
these characters. Now, there are many ways to skin a cat in Linux
and shell scripting. I'm going to show you one way, although I'm
sure you could now write a sed expression to do the same thing.
Assume that I have a file with a number of columns
in it, but I am not sure about the number of spaces or tabs between
the different columns. I need some way of changing these runs
into a single space. Why? Because having a space (one or more) or
a tab (one or more) between the columns will produce significantly
different output if we extract information from the file with
a shell script. How do we convert many spaces or tabs into a
single space? Well, translate is our right-hand man (or woman) for
this particular task. In order not to waste our time modifying our
columns.txt, let's work on the free command, which shows you free
memory on your system. Type:
free
If you look at the output you will see that there are
lots of spaces between each one of these fields. How do we reduce
multiple spaces between fields to a single space? We can use
tr to squeeze characters (you can squeeze any characters, but in
this case we want to squeeze a space):
free | tr -s ' '
The -s switch tells the translate command to squeeze.
(Read the info page on tr to find out all the other switches of
tr).
We could squeeze zeroes with:
free | tr -s '0'
Which would obviously make zero sense!
Going back to our previous command of squeezing
spaces, you'll see immediately that our memory usage table (which
is what the free command produces) becomes much more usable because
we've removed superfluous spaces.
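Since free is not available on every Unix, the squeeze can also be tried on a canned line. The sketch below uses a made-up line shaped like the Mem row of free output; the numbers are invented. It also shows one possible way to handle the mixed spaces-and-tabs case mentioned earlier, by mapping the blank class to a space while squeezing.

```shell
# A made-up line in the style of free's "Mem:" row.
line='Mem:      16000        9000        7000'

# Squeeze every run of spaces down to one space.
squeezed=$(printf '%s\n' "$line" | tr -s ' ')
printf '%s\n' "$squeezed"   # Mem: 16000 9000 7000

# Mixed spaces and tabs: translate each blank to a space,
# then squeeze the resulting run into a single space.
mixed=$(printf 'a \t\t b\n' | tr -s '[:blank:]' ' ')
printf '%s\n' "$mixed"      # a b
```

With two strings, -s squeezes after translation, so a run that mixes tabs and spaces still ends up as exactly one space.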
Perhaps we want only some fields from the output.
We could redirect the squeezed output into a file with:
free | tr -s ' ' > file.txt
Traditional systems would have you use a text
editor to cut and paste the fields you are interested in into a
new file. Do we want to do that? Absolutely not! We're lazy, we
want to find a better way of doing this.
What I'm interested in is the line that contains
'Mem'. As part of your project, you should be building a set of
scripts to monitor your system. Memory sounds like a good one that
you may want to save. Instead of just redirecting the
tr command to a file, let's first pass it
through sed where we extract only the lines beginning with the word
"Mem":
free | tr -s ' ' | sed '/^Mem/!d'
This returns only the line that we're interested
in. We could run this over and over again to watch the values
change.
Let's take this one step further. We're only interested
in the second, third and fourth fields of the line (representing
total memory, used memory and free memory respectively). How do
we retrieve only these fields?
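One possible answer, offered here only as a sketch, is to add cut to the end of the pipeline: once tr has squeezed the spaces, a single space is a reliable field delimiter. The fake_free function below is a hypothetical stand-in that prints made-up numbers in the layout of free output, so that the pipeline can be tried on systems without free.

```shell
# Hypothetical stand-in for free; the numbers are invented.
fake_free() {
  printf '%s\n' \
    '              total        used        free' \
    'Mem:          16000        9000        7000' \
    'Swap:          2000           0        2000'
}

# Squeeze spaces, keep only the Mem line, then cut fields 2 to 4
# using a single space as the delimiter.
fields=$(fake_free | tr -s ' ' | sed '/^Mem/!d' | cut -d ' ' -f 2-4)

printf '%s\n' "$fields"   # 16000 9000 7000
```

On a real system, replacing fake_free with free should give the total, used and free values, assuming the columns appear in that order.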
Copyright © 1996-2008 by Dr. Nikolai Bezroukov.
www.softpanorama.org was
created as a service to the UN Sustainable Development Networking Programme (SDNP)
in the author's free time.
This document is an industrial compilation designed and created
exclusively for educational use and is placed under the copyright of the
Open Content License(OPL).
Original materials copyright belong to respective owners. Quotes are made
for educational purposes only in compliance with the fair use doctrine.
Standard disclaimer: The statements, views and opinions presented on
this web page are those of the author and are not endorsed by, nor do they necessarily
reflect, the opinions of the author's present and former employers, SDNP or any other
organization the author may be associated with. We do not warrant the correctness
of the information provided or its fitness for any purpose.
Last modified:
September 15, 2008