Softpanorama (slightly skeptical) Open Source Software Educational Society
May the source be with you, but remember the KISS principle ;-)
Softpanorama University
Classic (pipable) Unix Tools
There are many people who use UNIX or Linux but who IMHO do not understand
UNIX. UNIX is not just an operating system, it is a way of doing things,
and the shell plays a key role by providing the glue that makes it work.
The UNIX methodology relies heavily on reuse of a set of tools rather
than on building monolithic applications. Even perl programmers
often miss the point, writing the heart and soul of the application as perl
script without making use of the UNIX toolkit.
David Korn (bold italic is mine -- BNN)
"The tools we use have a profound (and devious!)
influence on our thinking habits, and, therefore, on our thinking abilities."
-- Edsger Dijkstra
IMHO there are three Unix tools that can spell the difference
between a really good programmer or sysadmin and a merely above-average one (even if the
latter has solid knowledge of shell and Perl; that knowledge is necessary
but not sufficient):
Unix has an impressive array of command-line utilities that, glued together
with pipes, constitute a powerful file-processing language. The shell serves as the glue
for both pipable and non-pipable utilities. A
main innovation of Unix was that these commands could communicate via pipes,
arguably the first component architecture in existence. The following commands are most often
used in pipes:
- cat <filename>
"cat" can be used as a quick way to display the contents of a file. It stands
for "concatenate", and has a variety of other uses. If used with no arguments,
it reads standard input, so with output redirected to a file (cat > file) you can also use it as a quick and dirty editor,
pressing Ctrl-D when you are finished typing.
- cut is often used to extract certain parts
of each line.
- find is the utility used for finding files
with certain properties. It is also useful for finding directories and can
greatly simplify filesystem navigation.
- less <filename> "less" is what is known as a "pager".
When using a terminal, sometimes you may wish to view a file that is longer
than one screen. "less" allows you to view one page at a time, move back and
forth, and even search for specific text. To move ahead one page, use the space
bar. To move back, use the "b" key. To search for something, type "/" followed
by the string you are searching for. If a command produces too much output,
you can also pipe it to less. For example, "tar --help | less".
- gzip [options] <file-list> "gzip" compresses and decompresses
files. The extension ".gz" often indicates a gzipped file. It is often combined
with the "tar" command to produce a file with the extension ".tar.gz",
known as a "tarball". Options:
  -d: Decompress (same as using the gunzip command)
  -r: Compress/decompress files recursively (includes subdirectories)
- head and tail are symmetrical commands that extract the beginning or
the end of a file.
- head [options] <file-list>
"head" allows you to look at the beginning of the file. This is useful if
you have a long file, but you only need information from the very beginning.
Option: -n N, where N is the number of lines that you wish
to display. The default is ten.
- tail [options] <file-list>
"tail" allows you to look at the end of the file. This is useful if you
have a long file, but you only need information from the very end, such
as a log file. Option: -n N, where N is the number of lines
that you wish to display. The default is ten.
- sort
is used for sorting records.
- tar option [modifiers] <file-list>
"tar" can create and extract archive files. Archiving is a way of combining
many files into one large file, and is often used for backups and software distribution.
These archives are often compressed to save transfer time.
Options (use only one):
  -r: Append the files to the end of the archive
  -c: Create a new archive, destroying an old archive of the same name, if it exists
  -x: Extract the contents of an archive
  -t: List the contents of an archive (does not require a file-list)
  -u: Update the archive (only add the files if they do not exist, or have been modified)
Modifiers:
  -z: Use gzip to perform compression (if creating an archive) or decompression (if extracting from an archive)
  -f: Name of the archive file to read from or write to
  -v: Be verbose (list each file as it is read/written to/from the archive)
Examples: In these examples, we'll create
a tarball, then move it to a test directory and untar it to make sure it worked.
Not just for tape archives, tar can overcome
several of the pitfalls of using cp -r. Find out
how in this excerpt from Unix Power Tools, 2nd Edition.
- uniq is used for eliminating adjacent identical records, which is why it usually follows sort
- xargs builds and executes a command line from its standard input,
typically at the last stage of a pipe. This is a very
powerful tool.
GNU's
xargs patches up a sticky problem in the
original: it choked on filenames with spaces or newlines. Find
out how to take advantage of that patch in this excerpt from
Unix Power Tools,
2nd Edition.
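To make the list above concrete, here is a minimal sketch of three typical combinations: a frequency count built from cut, sort, uniq and head; the tarball round trip described under tar; and a find | xargs pipeline that survives file names with spaces. The function names (top_shells, tar_roundtrip, count_lines_in_tree) and all file names are hypothetical. The sketch assumes a tar that accepts -z and -C and a find/xargs pair that accepts -print0 and -0, which GNU and BSD versions do but some older systems may not.

```shell
#!/bin/sh
# Hypothetical helpers; names and paths are illustrative only.

# Print the N most common values of field 7 (the login shell in a
# passwd-style file), most frequent first.
# $1 = passwd-style file, $2 = number of rows to print
top_shells() {
    cut -d: -f7 "$1" | sort | uniq -c | sort -rn | head -n "$2"
}

# Create a gzipped tarball of directory $1, unpack it under $2,
# and compare the unpacked copy with the original.
# Returns 0 only if every step worked and the trees are identical.
tar_roundtrip() {
    src=$1
    dest=$2
    archive=${TMPDIR:-/tmp}/roundtrip.$$.tar.gz
    tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" &&
        mkdir -p "$dest" &&
        tar -xzf "$archive" -C "$dest" &&
        diff -r "$src" "$dest/$(basename "$src")"
    status=$?
    rm -f "$archive"
    return $status
}

# Count the lines in all regular files under $1. The names travel
# NUL-terminated, so spaces and newlines in them are harmless.
count_lines_in_tree() {
    find "$1" -type f -print0 | xargs -0 cat | wc -l
}
```

For example, `top_shells /etc/passwd 3` would list the three most common login shells, and `tar_roundtrip ./project /tmp/test` is silent on success because diff -r prints only differences.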
Dr. Nikolai Bezroukov
eval: When You Need Another Chance
Ever want to use a variable to get a variable in a shell script
or to construct a command on the fly? Find out how in this
excerpt from Unix
Power Tools, 2nd Edition.
Build Strings with { }
Save typing by expanding strings at the shell prompt. Learn how
to use the {} pattern-expansion characters in this excerpt from
Unix Power Tools,
2nd Edition.
Handle Too-Long Command Lines with xargs
That command line getting too long? Conquer it with one of the
tools that makes Unix "weird and wonderful" in this excerpt from
Unix Power Tools,
2nd Edition.
xargs: Problems with Spaces and Newlines
GNU's xargs patches up a sticky problem in the
original - it choked on filenames with spaces or newlines. Find
out how to take advantage of that patch in this excerpt from
Unix Power Tools,
2nd Edition.
Using Standard Input and Output
A quick review of basic redirection techniques used by every
Unix guru.
The () Subshell Operators
Learn why using parentheses to group commands is a useful shell
trick in this modified excerpt from
Unix Power Tools,
2nd Edition.
What Can You Do with an Empty File?
There are more uses for /dev/null than you might have imagined.
Learn four of them in this excerpt from
Unix Power Tools,
2nd Edition.
Copying Directory Trees with cp -r
Want to recursively copy everything under a given directory?
Don't get caught by the gotchas. Learn about cp -r
in this excerpt from
Unix Power Tools,
2nd Edition.
Copying Directory Trees with (tar | tar)
Not just for tape archives, tar can overcome
several of the pitfalls of using cp -r. Find out
how in this excerpt from
Unix Power Tools,
2nd Edition.
Telling tar Which Files to Exclude or Include
Sometimes you don't want to tar just everything in
a directory. Or maybe you want to include some subdirectories
and exclude others. Find out how in this excerpt from
Unix Power Tools,
2nd Edition.
Protecting Files with the Sticky Bit
Want to keep others from altering or deleting your files even if
they have write permissions to your directory? Learn about the
sticky bit in this excerpt from
Unix Power Tools,
2nd Edition.
Checking Differences with diff
Quickly examine differences between similar files.
Comparing Three Different Versions with diff3
Got three similar files to compare? Use diff3!
Context diffs
Context diffs show the lines around changes in similar files.
ex Scripts Built by diff
diff can build automatic editing scripts you can use to change
multiple files or to store a revision history.
Looking for Closure
A gawk script that can be used to make sure items that need to
occur in pairs actually do so.
Change Many Files by Editing Just One
Use ed and diff to edit multiple files.
patch: Generalized Updating of Files that Differ
There's an easy way to make changes based on diffs: use Larry
Wall's patch utility.
Hacking on Characters with tr
Want to quickly strip special characters from a file or change a
mac text file into a Unix text file? Learn how in this excerpt
from Unix Power
Tools, 2nd Edition.
Trapping Exits Caused by Interrupts
If your shell script is terminated prematurely it could get
messy. Learn how to trap those unruly interrupts in this excerpt
from Unix Power
Tools, 2nd Edition.
Handling Command-Line Arguments with a for Loop
Need a shell script that can step through its command line
arguments one by one? Read how to do it with a for loop in this
excerpt from Unix
Power Tools, 2nd Edition.
The exec Command
There is more than one use for exec; learn a couple of
new ones in this excerpt from
Unix Power Tools,
2nd Edition.
Standard Input to a for Loop
A for loop can be used to step through a list of arguments from
standard input. Find out how in this excerpt from
Unix Power Tools,
2nd Edition.
Making a for Loop with Multiple Variables
Got more than one variable you want to use in your for loop?
Find out how in this excerpt from
Unix Power Tools,
2nd Edition.
Useful Solaris Commands
truss -c (Solaris
>= 8): This astounding option to truss provides a profile summary of the command
being trussed:
$ truss -c grep asdf work.doc
syscall      seconds   calls  errors
_exit            .00       1
read             .01      24
open             .00       8       4
close            .00       5
brk              .00      15
stat             .00       1
fstat            .00       4
execve           .00       1
mmap             .00      10
munmap           .01       3
memcntl          .00       2
llseek           .00       1
open64           .00       1
                ----     ---     ---
sys totals:      .02      76       4
usr time:        .00
elapsed:         .05
It can also show profile data on a running process.
In this case, the data shows what the process did between when truss
was started and when truss execution was terminated with a control-c.
It’s ideal for determining why a process is hung without having to wade through
the pages of truss output.
truss -d and
truss -D (Solaris >= 8): These truss options show the time associated
with each system call being shown by truss and are excellent for finding performance
problems in custom or commercial code. For example:
$ truss -d who
Base time stamp: 1035385727.3460 [ Wed Oct 23 11:08:47 EDT 2002 ]
 0.0000 execve("/usr/bin/who", 0xFFBEFD5C, 0xFFBEFD64) argc = 1
 0.0032 stat("/usr/bin/who", 0xFFBEFA98) = 0
 0.0037 open("/var/ld/ld.config", O_RDONLY) Err#2 ENOENT
 0.0042 open("/usr/local/lib/libc.so.1", O_RDONLY) Err#2 ENOENT
 0.0047 open("/usr/lib/libc.so.1", O_RDONLY) = 3
 0.0051 fstat(3, 0xFFBEF42C) = 0
. . .
truss -D is
even more useful, showing the time delta between system calls:
Dilbert> truss -D who
 0.0000 execve("/usr/bin/who", 0xFFBEFD5C, 0xFFBEFD64) argc = 1
 0.0028 stat("/usr/bin/who", 0xFFBEFA98) = 0
 0.0005 open("/var/ld/ld.config", O_RDONLY) Err#2 ENOENT
 0.0006 open("/usr/local/lib/libc.so.1", O_RDONLY) Err#2 ENOENT
 0.0005 open("/usr/lib/libc.so.1", O_RDONLY) = 3
 0.0004 fstat(3, 0xFFBEF42C) = 0
In this example, the stat system call
took a lot longer than the others.
truss -T:
This is a great debugging help. It will stop a process at the execution of a
specified system call. (“-U” does the same, but with user-level function calls.)
A core could then be taken for further analysis, or any of the /proc tools could
be used to determine many aspects of the status of the process.
truss -l (improved
in Solaris 9): Shows the thread number of each call in a multi-threaded process.
Solaris 9 truss -l finally makes it possible to watch the execution of
a multi-threaded application.
Truss is truly a powerful tool. It can be used
on core files to analyze what caused the problem, for example. It can also show
details on user-level library calls (either system libraries or programmer libraries)
via the “-u” option.
pkg-get: This
is a nice tool (http://www.bolthole.com/solaris) for automatically getting
freeware packages. It is configured via /etc/pkg-get.conf. Once it’s
up and running, execute pkg-get -a to get a list of available packages,
and pkg-get -i to get and install a given package.
plimit (Solaris
>= 8): This command displays and sets the per-process limits on a running process.
This is handy if a long-running process is running up against a limit (for example,
number of open files). Rather than using limit and restarting the command,
plimit can modify the running process.
coreadm (Solaris
>= 8): In the “old” days (before coreadm), core dumps were placed in
the process’s working directory. Core files would also overwrite each other.
All this and more has been addressed by coreadm, a tool to manage core
file creation. With it, you can specify whether to save cores, where cores should
be stored, how many versions should be retained, and more. Settings can be retained
between reboots by coreadm modifying /etc/coreadm.conf.
pgrep (Solaris
>= 8): pgrep searches through /proc for processes matching the given
criteria, and returns their process-ids. A great option is “-n”, which returns
the newest process that matches.
preap (Solaris
>= 9): Reaps zombie processes. Any process stuck in the “z” state (as shown
by ps) can be removed from the system with this command.
pargs (Solaris
>= 9): Shows the arguments and environment variables of a process.
nohup -p (Solaris
>= 9): The nohup command can be used to start a process, so that if the
shell that started the process closes (i.e., the process gets a “SIGHUP” signal),
the process will keep running. This is useful for backgrounding a task that
should continue running no matter what happens around it. But what happens if
you start a process and later want to HUP-proof it? With Solaris 9, nohup
-p takes a process-id and causes SIGHUP to be ignored.
prstat (Solaris
>= 8): prstat is top and a lot more. Both commands provide a screen’s
worth of process and other information and update it frequently, for a nice
window on system performance. prstat has much better accuracy than
top. It also has some nice options. “-a” shows process and user information
concurrently (sorted by CPU hog, by default). “-c” causes it to act like
vmstat (new reports printed below old ones). “-C” shows processes in a processor
set. “-j” shows processes in a “project”. “-L” shows per-thread information
as well as per-process. “-m” and “-v” show quite a bit of per-process performance
detail (including pages, traps, lock wait, and CPU wait). The output data can
also be sorted by resident-set (real memory) size, virtual memory size, execute
time, and so on. prstat is very useful on systems without top,
and should probably be used instead of top because of its accuracy (and
some sites care that it is a supported program).
trapstat (Solaris
>= 9): trapstat joins lockstat and kstat as the most inscrutable
commands on Solaris. Each shows gory details about the innards of the running
operating system. Each is indispensable in solving strange happenings on a Solaris
system. Best of all, their output is good to send along with bug reports, but
further study can reveal useful information for general use as well.
vmstat -p
(Solaris >= 8): Until this option became available, it was almost impossible
(see the “se toolkit”) to determine what kind of memory demand was causing a
system to page. vmstat -p is key because it not only shows whether your
system is under memory stress (via the “sr” column), it also shows whether that
stress is from application code, application data, or I/O. “-p” can really help
pinpoint the cause of any mysterious memory issues on Solaris.
pmap -x (Solaris
>= 8, bugs fixed in Solaris >= 9): If the process with memory problems is known,
and more details on its memory use are needed, check out pmap -x. The
target process-id has its memory map fully explained, as in:
# pmap -x 1779
1779:   -ksh
 Address  Kbytes     RSS    Anon  Locked Mode   Mapped File
00010000     192     192       -       - r-x--  ksh
00040000       8       8       8       - rwx--  ksh
00042000      32      32       8       - rwx--    [ heap ]
FF180000     680     664       -       - r-x--  libc.so.1
FF23A000      24      24       -       - rwx--  libc.so.1
FF240000       8       8       -       - rwx--  libc.so.1
FF280000     568     472       -       - r-x--  libnsl.so.1
FF31E000      32      32       -       - rwx--  libnsl.so.1
FF326000      32      24       -       - rwx--  libnsl.so.1
FF340000      16      16       -       - r-x--  libc_psr.so.1
FF350000      16      16       -       - r-x--  libmp.so.2
FF364000       8       8       -       - rwx--  libmp.so.2
FF380000      40      40       -       - r-x--  libsocket.so.1
FF39A000       8       8       -       - rwx--  libsocket.so.1
FF3A0000       8       8       -       - r-x--  libdl.so.1
FF3B0000       8       8       8       - rwx--    [ anon ]
FF3C0000     152     152       -       - r-x--  ld.so.1
FF3F6000       8       8       8       - rwx--  ld.so.1
FFBFE000       8       8       8       - rw---    [ stack ]
-------- ------- ------- ------- -------
total Kb    1848    1728      40       -
Here we see each chunk of memory, what it is
being used for, how much space it is taking (virtual and real), and mode information.
df -h (Solaris
>= 9): This command is popular on Linux, and just made its way into Solaris.
df -h displays summary information about file systems in human-readable
form:
$ df -h
Filesystem             size   used  avail capacity  Mounted on
/dev/dsk/c0t0d0s0      4.8G   1.7G   3.0G    37%    /
/proc                    0K     0K     0K     0%    /proc
mnttab                   0K     0K     0K     0%    /etc/mnttab
fd                       0K     0K     0K     0%    /dev/fd
swap                   848M    40K   848M     1%    /var/run
swap                   849M   1.0M   848M     1%    /tmp
/dev/dsk/c0t0d0s7       13G    78K    13G     1%    /export/home
Conclusion
Each administrator has a set of tools used daily,
and another set of tools to help in a pinch. This column included a wide variety
of commands and options that are lesser known, but can be very useful. Do you
have favorite tools that have saved you in a bind? If so, please send them to
me so I can expand my tool set as well. Alternately, send along any tools that
you hate or that you feel are dangerous, which could also turn into a useful
column!
What makes a good utility?
There is a wonderful discussion of this question in
The UNIX Programming Environment, by Kernighan & Pike. A good utility is
one that does its job as well as possible. It has to play well with others;
it has to be amenable to being combined with other utilities. A program
that doesn't combine with others isn't a utility; it's an application.
Utilities are supposed to let you build one-off
applications cheaply and easily from the materials at hand. A lot of people
think of them as being like tools in a toolbox. The goal is not to have a single
widget that does everything, but to have a handful of tools, each of which does
one thing as well as possible.
Some utilities are reasonably useful on
their own, whereas others imply cooperation in pipelines of utilities. Examples
of the former include sort
and grep. On
the other hand, xargs
is rarely used except with other utilities, most often
find.
What language to write in?
Most of the UNIX system utilities are written in C. The examples here
are in Perl and sh. Use the right tool for the right job. If you use
a utility heavily enough, the cost of writing it in a compiled language
might be justified by the performance gain. On the other hand, for the
fairly common case where a program's workload is light, a scripting
language may offer faster development.
If you aren't sure, you should use the
language you know best. At least when you're prototyping a utility,
or figuring out how useful it is, favor programmer efficiency over performance
tuning. Most of the UNIX system utilities are in C, simply because they're
heavily used enough to justify the development cost. Perl and sh (or
ksh) can be good languages for a quick prototype. Utilities that tie
other programs together may be easier to write in a shell than in a
more conventional programming language. On the other hand, any time
you want to interact with raw bytes, C is probably looming on your horizon.
Designing a utility
A good rule of thumb is to start thinking about the design of a utility the
second time you have to solve a problem. Don't mourn the one-off hack you write
the first time; think of it as a prototype. The second time, compare what you
need to do with what you needed to do the first time. Around the third time,
you should start thinking about taking the time to write a general utility.
Even a merely repetitive task might merit the development of a utility; for
instance, many generalized file-renaming programs have been written based on
the frustration of trying to rename files in a generalized way.
Here are some design goals of utilities; each
gets its own section, below.
- Do one thing well.
- Be a filter.
- Generalize.
- Be robust.
- Be new.
Do one thing well
Do one thing well; don't do multiple things badly. The best example of
doing one thing well is probably sort.
Few utilities
other than sort
have a general sort feature. The idea is simple; if you only solve a problem once, you
can take the time to do it well.
Imagine how frustrating it would be if most programs
sorted data, but some supported only lexicographic sorts, while others supported
only numeric sorts, and a few even supported selection of keys rather than sorting
by whole lines. It would be annoying at best.
When you find a problem to solve, try to break
the problem up into parts, and don't duplicate the parts for which utilities
already exist. The more you can focus on a tool that lets you work with existing
tools, the better the chances that your utility will stay useful.
You may need to write more than one program.
The best way to solve a specialized task is often to write one or two utilities
and a bit of glue to tie them together, rather than writing a single program
to solve the whole thing. It's fine to use a 20-line shell script to tie your
new utility together with existing tools. If you try to solve the whole problem
at once, the first change that comes along might require you to rethink everything.
I have occasionally needed to produce two-column
or three-column output from a database. It is generally more efficient to write
a program to build the output in a single column and then glue it to a program
that puts things in columns. The shell script that combines these two utilities
is itself a throwaway; the separate utilities have outlived it.
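The split described above can be sketched with standard tools. This is a minimal illustration, not the author's programs: list_items is a hypothetical stand-in for the database query, in_columns is a hypothetical name, and paste is used here in place of whatever column-formatting program was actually used.

```shell
#!/bin/sh
# Stage 1 (hypothetical stand-in for the database query):
# emit one item per line and know nothing about layout.
list_items() {
    printf '%s\n' alpha beta gamma delta epsilon zeta
}

# Stage 2: arrange any one-item-per-line input into three
# tab-separated columns. Each "-" tells paste to take one more
# line from standard input for the current output row.
in_columns() {
    paste - - -
}

# The throwaway glue is a single pipeline.
list_items | in_columns
```

Because neither stage knows about the other, the layout stage keeps working when the data source changes, and a two-column variant is just `paste - -`.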
Some utilities serve very specialized needs.
If the output of ls
in a crowded directory scrolls off the screen very quickly, it might be because
there's a file with a very long name, forcing
ls to use only a single
column for output. Paging through it using more
takes time. Why not just sort lines by length, and pipe the result through
tail, as follows?
Listing 1. One of the smallest utilities anywhere, sl
#!/usr/bin/perl -w
# sl: print input lines sorted by length, shortest first
print sort { length $a <=> length $b } <>;
The script in Listing 1 does exactly one
thing. It takes no options, because it needs no options; it only cares about
the length of lines. Thanks to Perl's convenient
<> idiom, this automatically
works either on standard input or on files named on the command line.
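For comparison, much the same behavior can be approximated with a pipeline of standard tools. This is a minimal sketch, assuming a POSIX awk, sort and cut; the function name sl_pipe is hypothetical and this is not the author's implementation. It prefixes each line with its length, sorts numerically on that prefix, then strips the prefix again.

```shell
#!/bin/sh
# sl_pipe: print lines from the named files (or from standard
# input when no files are given) ordered by length, shortest first.
sl_pipe() {
    # Decorate each line with "<length><TAB>", sort on the numeric
    # prefix, then cut the prefix away, leaving the original line.
    awk '{ print length($0) "\t" $0 }' "$@" |
        sort -n |
        cut -f2-
}

# Typical use, as in the text: show the longest names last.
# ls | sl_pipe | tail -n 3
```

One difference from the Perl version is worth noting: lines of equal length come out in whatever order sort gives them as a tie-break, not necessarily in input order.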
Be a filter
Almost all utilities are best conceived of as filters, although a few very useful
utilities don't fit this model. (For instance, a program that counts might be
very useful, even though it doesn't work well as a filter. Programs that take
only command-line arguments as input, and produce potentially complicated output,
can be very useful.) Most utilities, though, should work as filters. By convention,
filters work on lines of text. Most filters should have some support for running
on multiple input files.
Remember that a utility needs to work on
the command line and in scripts. Sometimes, the ideal behavior varies a little.
For instance, most versions of ls
automatically arrange output into columns when writing to a terminal. The default
behavior of grep
is to print the file name in which a match was found only if multiple files
were specified. Such differences should have to do with how users will want
the utility to work, not with other agendas. For instance, old versions of GNU
bc displayed
an intrusive copyright notice when started. Please don't do that. Make your
utility stick to doing its job.
Utilities like to live in pipelines. A
pipeline lets a utility focus on doing its job, and nothing else. To live in
a pipeline, a utility needs to read data from standard input and write data
to standard output. If you want to deal with records, it's best if you can make
each line be a "record." Existing programs such as
sort and
join are already thinking
that way. They'll thank you for it.
One utility I occasionally use is a program that
calls other programs iteratively over a tree of files. This makes very good
use of the standard UNIX utility filter model, but it only works with utilities
that read input and write output; you can't use it with utilities that operate
in place, or take input and output file names.
Most programs that can run from standard
input can also reasonably be run on a single file, or possibly on a group of
files. Note that this arguably violates the rule against duplicating effort;
obviously, this could be managed by feeding
cat into the next program in the series. However,
in practice, it seems to be justified.
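One common way to get both behaviors without duplicating code is to let cat do the file handling inside the utility itself. A minimal sketch, assuming a POSIX shell; the function name squeeze_blank and its job (collapsing runs of empty lines) are hypothetical, chosen only so that there is something to filter.

```shell
#!/bin/sh
# squeeze_blank: collapse each run of empty lines into one.
# With no arguments, "cat --" copies standard input, so the
# function is a plain filter; with arguments, cat concatenates
# the named files in order. The awk stage never needs to know
# which case it is in.
squeeze_blank() {
    cat -- "$@" | awk '
        /^$/ { if (!blank) print; blank = 1; next }  # first empty line of a run
             { blank = 0; print }                     # any non-empty line
    '
}
```

Both `squeeze_blank < notes.txt` and `squeeze_blank notes.txt more.txt` then work, where the file names are placeholders.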
Some programs may legitimately read records in
one format but produce something entirely different. An example would be a utility
to put material into columnar form. Such a utility might equate lines to records
on input, but produce multiple records per line on output.
Not every utility fits entirely into this
model. For instance, xargs
takes not records but names of files as input, and all of the actual processing
is done by some other program.
Generalize
Try to think of tasks similar to the one you're actually performing; if you
can find a general description of these tasks, it may be best to try to write
a utility that fits that description. For instance, if you find yourself sorting
text lexicographically one day and numerically another day, it might make sense
to consider attempting a general sort utility.
Generalizing functionality sometimes leads to
the discovery that what seemed like a single utility is really two utilities
used in concert. That's fine. Two well-defined utilities can be easier to write
than one ugly or complicated one.
Doing one thing well doesn't mean doing
exactly one thing. It means handling a consistent but useful problem space.
Lots of people use grep.
However, a great deal of its utility comes from the ability to perform related
tasks. The various options to grep
do the work of a handful of small utilities that would have ended up sharing,
or duplicating, a lot of code.
This rule, and the rule to do one thing,
are both corollaries of an underlying principle: avoid duplication of code whenever
possible. If you write a half-dozen programs, each of which sorts lines, you
can end up having to fix similar bugs half a dozen times instead of having one
better-maintained sort
program to work on.
This is the part of writing a utility that adds
the most work to the process of getting it completed. You may not have time
to generalize something fully at first, but it pays off when you get to keep
using the utility.
Sometimes, it's very useful to add related functionality
to a program, even when it's not quite the same task. For instance, a program
to pretty-print raw binary data might be more useful if, when run on a terminal
device, it threw the terminal into raw mode. This makes it a lot easier to test
questions involving keymaps, new keyboards, and the like. Not sure why you're
getting tildes when you hit the delete key? This is an easy way to find out
what's really getting sent. It's not exactly the same task, but it's similar
enough to be a likely addition.
The errno
utility in
Listing 2 below is a good example of generalizing, as it supports both numeric
and symbolic names.
Be robust
It's important that a utility be durable. A utility that crashes easily or can't
handle real data is not a useful utility. Utilities should handle arbitrarily
long lines, huge files, and so on. It is perhaps tolerable for a utility to
fail on a data set larger than it can hold in memory, but some utilities don't
do this; for instance, sort,
by using temporary files, can generally sort data sets much larger than it can
hold in memory.
Try to make sure you've figured out what data
your utility can possibly run on. Don't just ignore the possibility of data
you can't handle. Check for it and diagnose it. The more specific your error
messages, the more helpful you are being to your users. Try to give the user
enough information to know what happened and how to fix it. When processing
data files, try to identify exactly what the malformed data was. When trying
to parse a number, don't just give up; tell the user what you got, and if possible,
what line of the input stream the data was on.
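As an illustration of that last point, here is a minimal sketch of a filter that adds up one number per line and, on malformed input, reports the offending text and its line number instead of silently giving up. The function name sum_lines and the message format are hypothetical.

```shell
#!/bin/sh
# sum_lines: add up input that holds one number per line
# (files named as arguments, or standard input).
# On the first malformed line it writes a diagnostic to standard
# error, naming the line number and the text found, prints no
# total, and exits with a non-zero status.
sum_lines() {
    awk '
        # Accept an optional sign, digits, and an optional fraction.
        /^[-+]?[0-9]+(\.[0-9]+)?$/ { total += $0; next }
        {
            # Piping to "cat 1>&2" is a portable way to reach stderr.
            printf("sum_lines: line %d: expected a number, got \"%s\"\n", NR, $0) | "cat 1>&2"
            bad = 1
            exit 1
        }
        END { if (!bad) print total + 0 }
    ' "$@"
}
```

A caller can then tell success from failure by exit status alone, and a user who sees the message knows which line to fix.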
As a good example, consider the difference
between two implementations of dc.
If you run dc /home,
one of them says "Cannot use directory as input!" The other just returns silently;
no error message, no unusual exit code. Which of these would you rather have
in your path when you make a typo on a cd
command? Similarly, the former will give verbose error messages if you feed
it the stream of data from a directory, perhaps by doing
dc < /home. On the other
hand, it might be nice for it to give up early on when getting invalid data.
Security holes are often rooted in a program
that isn't robust in the face of unexpected data. Keep in mind that a good utility
might find its way into a shell script run as root. A buffer overflow in a program
such as find
is likely to be a risk to a great number of systems.
The better a program deals with unexpected data,
the more likely it is to adapt well to varied circumstances. Often, trying to
make a program more robust leads to a better understanding of its role, and
better generalizations of it.
Be new
One of the worst kinds of utility to write is the one you already have. I wrote
a wonderful utility called count.
It allowed me to perform just about any counting task. It's a great utility,
but there's a standard BSD utility called jot
that does the same thing. Likewise, my very clever program for turning data
into columns duplicates an existing utility,
rs, likewise found on BSD
systems, except that rs
is much more flexible and better designed. See
Resources below for more information on
jot and rs.
If you're about to start writing a utility, take
a bit of time to browse around a few systems to see if there might be one already.
Don't be afraid to steal Linux utilities for use on BSD, or BSD utilities for
use on Linux; one of the joys of utility code is that almost all utilities are
quite portable.
Don't forget to look at the possibility of combining
existing applications to make a utility. It is possible, in theory, that you'll
find stringing existing programs together is not fast enough, but it's very
rare that writing a new utility is faster than waiting for a slightly slow pipeline.
An example utility
In a sense this program is a counterexample, in that it is never useful as a
filter. It works very well as a command-line utility, however.
This program does one thing only. It prints out
errno lines from /usr/include/sys/errno.h in a slightly pretty-printed format.
For instance:
$ errno 22
EINVAL [22]: Invalid argument
Listing 2. Errno finder
#!/bin/sh
# errno: look up errno values by number or by symbolic name.
usage() {
    echo >&2 "usage: errno [numbers or error names]"
    exit 1
}
for i
do
    case "$i" in
    [0-9]*)
        # Numeric argument: match on the third field of the #define line.
        awk '/^#define/ && $3 == '"$i"' {
            for (i = 5; i < NF; ++i) {
                foo = foo " " $i;
            }
            printf("%-22s%s\n", $2 " [" $3 "]:", foo);
            foo = ""
        }' < /usr/include/sys/errno.h
        ;;
    E*)
        # Symbolic argument: match on the second field.
        awk '/^#define/ && $2 == "'"$i"'" {
            for (i = 5; i < NF; ++i) {
                foo = foo " " $i;
            }
            printf("%-22s%s\n", $2 " [" $3 "]:", foo);
            foo = ""
        }' < /usr/include/sys/errno.h
        ;;
    *)
        echo >&2 "errno: can't figure out whether '$i' is a name or a number."
        usage
        ;;
    esac
done
Does it generalize? Yes, nicely. It supports
both numeric and symbolic names. On the other hand, it doesn't know about other
files, such as /usr/include/sys/signal.h, that are likely in the same format.
It could easily be extended to do that, but for a convenience utility like this,
it's easier to just make a copy called "signal" that reads signal.h, and uses
"SIG*" as the pattern to match a name.
This is just a tad more convenient than
using grep
on system header files, but it's less error-prone. It doesn't produce garbled
results from ill-considered arguments. On the other hand, it produces no diagnostic
if a given name or number is not found in the header. It also doesn't bother
to correct some invalid inputs. Still, as a command-line utility never intended
to be used in an automated context, it's okay.
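The missing "not found" diagnostic mentioned above could be added along these lines. This is only a sketch, not the author's code: the function name lookup_errno and the ERRNO_HEADER override are hypothetical, and the header is assumed to use the "#define NAME NUMBER /* text */" layout that the listing relies on.

```shell
#!/bin/sh
# Sketch: look up one errno number and complain if it is absent.
# ERRNO_HEADER is a hypothetical override so the lookup can be pointed at
# any file in the "#define NAME NUMBER /* text */" format.
ERRNO_HEADER=${ERRNO_HEADER:-/usr/include/sys/errno.h}

lookup_errno() {
    # -v passes the number as data instead of splicing it into the awk program.
    awk -v want="$1" '
        /^#define/ && $3 == want {
            text = ""
            for (i = 5; i < NF; ++i)
                text = text " " $i
            printf("%-22s%s\n", $2 " [" $3 "]:", text)
            found = 1
        }
        END { exit(found ? 0 : 1) }
    ' "$ERRNO_HEADER" || {
        echo "errno: '$1' not found in $ERRNO_HEADER" >&2
        return 1
    }
}
```

Calling lookup_errno 22 prints the matching line as before; an unknown number now produces a message on standard error and a non-zero exit status, which also makes the function usable in scripts.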
Another example might be a program to unsort
input (see
Resources for a link to this utility). This is simple enough; read in input
files, store them in some way, then generate a random order in which to print
out the lines. This is a utility of nearly infinite applications. It's also
a lot easier to write than a sorting program; for instance, you don't need to
specify which keys you're not sorting on, or whether you want things in a random
order alphabetically, lexicographically, or numerically. The tricky part comes
in reading in potentially very long lines. In fact, the provided version cheats;
it assumes there will be no null bytes in the lines it reads. It's a lot harder
to get that right, and I was lazy when I wrote it.
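In the spirit of combining existing programs before writing a new one, the unsort idea described above can be approximated with a short pipeline. This is a sketch under stated assumptions, not the author's unsort program: the function name is reused for illustration only, and the random source is awk's rand(), which is seeded from the clock, so two runs within the same second may produce the same order.

```shell
#!/bin/sh
# Sketch of an "unsort" built from standard tools: tag each line with a
# random key, sort on the key, then strip the key again.
unsort() {
    awk 'BEGIN { srand() } { printf("%.9f\t%s\n", rand(), $0) }' "$@" |
        sort -n |
        cut -f 2-
}
```

Used as a filter (for example, unsort < rgb.txt) it prints every input line exactly once in a shuffled order. Like the version described in the text, it makes no promise about lines containing null bytes.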
Summary
If you find yourself performing a task repeatedly, consider writing a program
to do it. If the program turns out to be reasonable to generalize a bit, generalize
it, and you will have written a utility.
Don't design the utility the first time you need
it. Wait until you have some experience. Feel free to write a prototype or two;
a good utility is sufficiently better than a bad utility to justify a bit of
time and effort on researching it. Don't feel bad if what you thought would
be a great utility ends up gathering dust after you wrote it. If you find yourself
frustrated by your new program's shortcomings, you just had another prototyping
phase. If it turns out to be useless, well, that happens sometimes.
The thing you're looking for is a program
that finds general application outside your initial usage patterns. I wrote
unsort because
I wanted an easy way to get a random series of colors out of an old X11 "rgb.txt"
file. Since then, I've used it for an incredible number of tasks, not the least
of which was producing test data for debugging and benchmarking sort routines.
One good utility can pay back the time you spent
on all the near misses. The next thing to do is make it available for others,
so they can experiment. Make your failed attempts available, too; other people
may have a use for a utility you didn't need. More importantly, your failed
utility may be someone else's prototype, and lead to a wonderful utility program
for everyone.
Resources
- The UNIX Programming Environment by Brian W. Kernighan and Rob Pike
(Prentice Hall, Inc., 1984) is an essential part of any programmer's bookshelf.
You can also download all the example code for this book on the Bell Labs'
CSR page.
- Download core, file, shell, and text utilities from the GNU Web site
software pages.
- You can download the source code for the "unsort" utility described in this
article.
- More information about the BSD utilities jot and rs is available on freshmeat.
- Read more best practices and hints on getting started coding in "Developing
a Linux command-line utility" (developerWorks, June 2002).
- Learn how to write secure applications, validate and check inputs, prevent
buffer overflows, and more with David Wheeler's Secure programmer column on
developerWorks.
- The "Bash by example" series on developerWorks will help you get started
writing shell scripts.
- The tutorial "Building a cross-platform C library" shows how to convert an
existing C program or module -- or utility -- into a shared library
(developerWorks, June 2001).
- Peter Seebach has previously written a series on how to treat UNIX utilities
as a component architecture on developerWorks.
- Get the developerWorks Subscription (formerly the Toolbox subscription) to get
CDs and downloads of the latest software from IBM to build, test, evaluate,
and demonstrate applications on the IBM Software Development Platform.
- For Linux tools on AIX, check out the AIX toolbox for Linux applications.
- To develop an application on Linux using trial versions of the latest IBM
tools and products, visit the Speed-start your Linux app site.
- Find more resources for Linux developers in the developerWorks Linux section.
"What is it about m4 that makes it so useful,
and yet so overlooked? m4 -- a macro processor -- unfortunately has a dry name
that disguises a great utility. A macro processor is basically a program that
scans text and looks for defined symbols, which it replaces with other text
or other symbols."
[Apr 17, 2003] Exploring processes with Truss:
Part 1 By Sandra Henry-Stocker
The ps command can tell you quite a few things about each process running
on your system. These include the process owner, memory use, accumulated time,
the process status (e.g., waiting on resources) and many other things as well.
But one thing that ps cannot tell you is what a process is doing - what files
it is using, what ports it has opened, what libraries it is using and what system
calls it is making. If you can't look at source code to determine how a program
works, you can tell a lot about it by using a procedure called "tracing". When
you trace a process (e.g., truss date), you get verbose commentary on the process'
actions. For example, you will see a line like this each time the program opens
a file:
open("/usr/lib/libc.so.1", O_RDONLY) = 4
The text on the left side of the equals sign clearly indicates what is happening.
The program is trying to open the file /usr/lib/libc.so.1 and it's trying to
open it in read-only mode (as you would expect, given that this is a system
library). The right side is not nearly as self-evident. We have just the number
4. Open is not a Unix command, of course, but a system call. That means that
you can only use the command within a program. Due to the nature of Unix, however,
system calls are documented in man pages just like ls and pwd.
To determine what this number represents, you can skip down in this column
or you can read the man page. If you elect to read the man page, you will undoubtedly
read a line that tells you that the open() function returns a file descriptor
for the named file. In other words, the number, 4 in our example, is the number
of the file descriptor referred to in this open call. If the process that you
are tracing opens a number of files, you will see a sequence of open calls.
With other activity removed, the list might look something like this:
open("/dev/zero", O_RDONLY) = 3
open("/var/ld/ld.config", O_RDONLY) Err#2 ENOENT
open("/usr/lib/libc.so.1", O_RDONLY) = 4
open("/usr/lib/libdl.so.1", O_RDONLY) = 4
open64("./../", O_RDONLY|O_NDELAY) = 3
open64("./../../", O_RDONLY|O_NDELAY) = 3
open("/etc/mnttab", O_RDONLY) = 4
Notice that the first file handle is 3 and that file handles 3 and 4 are
used repeatedly. The initial file handle is always 3. This indicates that it
is the first file handle following those that are the same for every process
that you will run - 0, 1 and 2. These represent standard in, standard out and
standard error.
The file handles shown in the example truss output above are repeated only
because the associated files are subsequently closed. When a file is closed,
the file handle that was used to access it can be used again.
The close calls include only the file handle, since the location of the
file is known. A close call would, therefore, be something like close(3).
One of the lines shown above displays a different response - Err#2
ENOENT. This "error" (the word is put in quotes because this does not necessarily
indicate that the process is defective in any way) indicates that the file the
open call is attempting to open does not exist. Read "ENOENT" as "No such file".
Some open calls place multiple restrictions on the way that a file is opened.
The open64 calls in the example output above, for example, specify both O_RDONLY
and O_NDELAY. Again, reading the man page will help you to understand what each
of these specifications means and will present you with a list of other options
as well.
As you might expect, open is only one of many system calls that you will
see when you run the truss command. Next week we will look at some additional
system calls and determine what they are doing.
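Once trace output has been saved, the failed open calls discussed above can be picked out with a one-line filter. The following is a sketch that assumes the output looks exactly like the sample lines in this column (call name at the start of the line, path in double quotes, Err# on failure); the function name failed_opens is hypothetical.

```shell
#!/bin/sh
# Sketch: list the paths that a traced program failed to open.
# Reads truss-style lines such as
#   open("/var/ld/ld.config", O_RDONLY) Err#2 ENOENT
# on standard input and prints the quoted path of each failed call.
failed_opens() {
    awk -F'"' '/^open(64)?\(/ && /Err#[0-9]+/ { print $2 }'
}
```

Fed the sample list above, the filter prints only /var/ld/ld.config, the one file that did not exist.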
Exploring processes with Truss: part 2 By Sandra Henry-Stocker
While truss and its cousins on non-Solaris systems (e.g., strace on Linux
and ktrace on many BSD systems) provide a lot of data on what a running process
is doing, this information is only useful if you know what it means. Last week,
we looked at the open call and the file handles that are returned by the call
to open(). This week, we look at some other system calls and analyze what these
system calls are doing. You've probably noticed that the nomenclature for system
functions is to follow the name of the call with a set of empty parentheses
for example, open(). You will see this nomenclature in use whenever system calls
are discussed.
The fstat() and fstat64() calls obtain information about open files - "fstat"
refers to "file status". As you might expect, this information is retrieved
from the files' inodes, including whether or not you are allowed to read the
files' contents. If you trace the ls command (i.e., truss ls), for example,
your trace will start with lines that resemble these:
1 execve("/usr/bin/ls", 0x08047BCC, 0x08047BD4) argc = 1
2 open("/dev/zero", O_RDONLY) = 3
3 mmap(0x00000000, 4096, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0xDFBFA000
4 xstat(2, "/usr/bin/ls", 0x08047934) = 0
5 open("/var/ld/ld.config", O_RDONLY) Err#2 ENOENT
6 sysconfig(_CONFIG_PAGESIZE) = 4096
7 open("/usr/lib/libc.so.1", O_RDONLY) = 4
8 fxstat(2, 4, 0x08047310) = 0
...
28 lstat64(".", 0x080478B4) = 0
29 open64(".", O_RDONLY|O_NDELAY) = 3
30 fcntl(3, F_SETFD, 0x00000001) = 0
31 fstat64(3, 0x0804787C) = 0
32 brk(0x08057208) = 0
33 brk(0x08059208) = 0
34 getdents64(3, 0x08056F40, 1048) = 424
35 getdents64(3, 0x08056F40, 1048) = 0
36 close(3) = 0
In line 31, we see a call to fstat64, but what file is it checking? The man
page for fstat() and your intuition are probably both telling you that this
fstat call is obtaining information on the file opened two lines before - "."
or the current directory - and that it is referring to this file by the file
handle (3) returned by the open64() call in line 29. Keep in mind that a
directory is simply a file, though a different variety of file, so the same
system calls are used as would be used to check a text file.
You will probably also notice that the first file being opened is called /dev/zero
(see line 2). Most Unix sysadmins will immediately know that /dev/zero
is a special kind of file - primarily because it is stored in /dev. And, if
moved to look more closely at the file, they
will confirm that the file that /dev/zero points to (it is itself a symbolic
link) is a special character file. What /dev/zero provides to system programmers,
and to sysadmins if they care to use it, is an endless stream of zeroes. This
is more useful than might first appear.
To see how /dev/zero works, you can create a 10M-byte file full of zeroes
with a command like this:
/bin/dd < /dev/zero > zerofile bs=1024 seek=10240 count=1
This command works well because it creates the needed file with only a few
read and write operations; in other words, it is very efficient.
You can verify that the file is zero-filled with od.
# od -x zerofile
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
50002000
Each string of four zeros (0000) represents two bytes of data. The * on the
second line of output indicates that all of the remaining lines are identical
to the first.
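The final offset printed by od (50002000, in octal) is the file size: 10486784 bytes, that is, the 10240 skipped blocks plus the single block actually written. A quick way to confirm this is sketched below; it uses the if= and of= operands instead of redirection and writes to a scratch file created with mktemp, both of which are choices made for this illustration.

```shell
#!/bin/sh
# Sketch: build the same kind of file in a scratch location and check its size.
zerofile=$(mktemp)
dd if=/dev/zero of="$zerofile" bs=1024 seek=10240 count=1 2>/dev/null

# 10240 skipped blocks + 1 written block, 1024 bytes each
expected=$(( (10240 + 1) * 1024 ))
actual=$(wc -c < "$zerofile" | tr -d ' ')
echo "expected $expected bytes, found $actual"
```

Only one block is read and written; the rest of the file is created by seeking past it, which is why the command needs so few I/O operations.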
Looking back at the truss output above, we cannot help but notice that the
first line of the truss output includes the name of the command that we are
tracing. The execve() system call executes a process. The first argument to
execve() is the name of the file from which the new process
image is to be loaded. The mmap() call which follows maps the process image
into memory. In
other words, it directly incorporates file data into the process address
space. The getdents64() calls on lines 34 and 35 are extracting information
from the directory file - "dents" refers to "directory entries".
The sequence of steps that we see at the beginning of the truss output - executing
the entered command, opening /dev/zero, mapping memory and so on - looks the
same whether you are tracing ls, pwd, date or restarting Apache. In fact, the
first dozen or so lines in your truss output will be nearly identical regardless
of the command you are running. You should, however, expect to see some differences
between different Unix systems and different versions of Solaris.
Viewing the output of truss, you can get a solid sense of how the operating
system works. The same insights are available if you are tracing your own applications
or troubleshooting third party executables.
-------------------
Sandra Henry-Stocker
3.2. Displaying all processes owned by a specific user
$ ps ux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
heyne 691 0.0 2.4 19272 9576 ? S 13:35 0:00 kdeinit: kded
heyne 700 0.1 1.0 5880 3944 ? S 13:35 0:01 artsd -F 10 -S 40
... ... ...
You can also use the syntax "ps U username".
As you can see, the ps command can give you a
lot of interesting information. If, for example, you want to know what a friend
is running, replace your login name with hers or his and you will see all
processes belonging to that user.
3.3. Own output format
If you are bored by the regular output, you could
simply change the format. To do so use the formatting characters which are supported
by the ps command.
If you execute the ps command with the 'o' parameter you can tell the ps command
what you want to see:
e.g.
Odd display with AIX field descriptors:
$ ps -o "%u : %U : %p : %a"
RUSER : USER : PID : COMMAND
heyne : heyne : 3363 : bash
heyne : heyne : 3367 : ps -o %u : %U : %p : %a
Could the command-line tools you've forgotten or never knew save time and some frustration?
One incarnation of the so called 80/20 rule has been associated with software
systems. It has been observed that 80% of a user population regularly uses only
20% of a system's features. Without backing this up with hard statistics, my
20+ years of building and using software systems tells me that this hypothesis
is probably true. The collection of Linux command-line programs is no exception
to this generalization. Of the dozens of shell-level commands offered by Linux,
perhaps only ten commands are commonly understood and utilized, and the remaining
majority are virtually ignored.
Which of these dogs of the Linux shell have the most value to offer? I'll
briefly describe ten of the less popular but useful Linux shell commands, those
which I have gotten some mileage from over the years. Specifically, I've chosen
to focus on commands that parse and format textual content.
The working examples presented here assume a basic familiarity with command-line
syntax, simple shell constructs and some of the not-so-uncommon Linux commands.
Even so, the command-line examples are fairly well commented and straightforward.
Whenever practical, the output of usage examples is presented under each command-line
execution.
The following eight commands parse, format and display textual content. Although
not all provided examples demonstrate this, be aware that the following commands
will read from standard input if file arguments are not presented.
Table 1. Summary of Commands
Head/Tail
As their names imply, head and tail are used to display some
amount of the top or bottom of a text block. head presents beginning
of a file to standard output while tail does the same with the end of a file.
Review the following commented examples:
## (1) displays the first 6 lines of a file
head -6 readme.txt
## (2) displays the last 25 lines of a file
tail -25 mail.txt
Here's an example of using head and tail in concert to display the 11th through
20th line of a file.
# (3)
head -20 file | tail -10
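The same head and tail combination generalizes to any range of lines. A minimal sketch follows; the function name line_range is hypothetical, and the -n spelling of the options is used because it is the form POSIX specifies.

```shell
#!/bin/sh
# Sketch: print lines FIRST..LAST of standard input.
line_range() {
    first=$1
    last=$2
    # keep the first LAST lines, then the final (LAST - FIRST + 1) of those
    head -n "$last" | tail -n "$(( last - first + 1 ))"
}
```

For example, line_range 11 20 < file selects the same lines as example (3); sed -n '11,20p' file is another common way to write it.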
Manual pages show that the tail command has more command-line options than
head. One of the more useful tail options is -f. When it is used, tail does not
return when end-of-file is detected, unless it is explicitly interrupted. Instead,
tail sleeps for a period and checks for new lines of data that may have been
appended since the last read.
## (4) display ongoing updates to the given
## log file
tail -f /usr/tmp/logs/daemon_log.txt
Imagine that a dæmon process was continually appending activity logs to the
/usr/tmp/logs/daemon_log.txt file. Using tail -f at a console window,
for example, will more or less track all updates to the file in real time. (The
-f option is applicable only when tail's input is a file).
If you give multiple arguments to tail, you can track several log files in
the same window.
## track the mail log and the server error log
## at the same time.
tail -f /var/log/mail.log /var/log/apache/error_log
tac--Concatenate in Reverse
What is cat spelled backwards? Well, that's what tac's functionality is all
about. It concatenates files like cat, but writes the lines of each file in
reverse order, last line first. So what's its
usefulness? It can be used on any task that requires ordering elements in a
last-in, first-out (LIFO) manner. Consider the following command line to list
the three most recently established user accounts from the most recent through
the least recent.
# (5) last 3 /etc/passwd records - in reverse
$ tail -3 /etc/passwd | tac
curly:x:1003:100:3rd Stooge:/homes/curly:/bin/ksh
larry:x:1002:100:2nd Stooge:/homes/larry:/bin/ksh
moe:x:1001:100:1st Stooge:/homes/moe:/bin/ksh
nl--Numbered Line Output
nl is a simple but useful numbering filter. It displays input with
each line numbered in the left margin, in a format dictated by command-line
options. nl provides a plethora of options that specify every detail
of its numbered output. The following commented examples demonstrate some of
those options:
# (6) Display the first 4 entries of the password
# file - numbers to be three columns wide and
# padded by zeros.
$ head -4 /etc/passwd | nl -nrz -w3
001 root:x:0:1:Super-User:/:/bin/ksh
002 daemon:x:1:1::/:
003 bin:x:2:2::/usr/bin:
004 sys:x:3:3::/:
#
# (7) Prepend ordered line numbers followed by an
# '=' sign to each line -- start at 101.
$ nl -s= -v101 Data.txt
101=1st Line ...
102=2nd Line ...
103=3rd Line ...
104=4th Line ...
105=5th Line ...
.......
fmt--Format
The fmt command is a simple text formatter that focuses on making textual
data conform to a maximum line width. It accomplishes this by joining and breaking
lines around white space. Imagine that you need to maintain textual content
that was generated with a word processor. The exported text may contain lines
whose lengths vary from very short to much longer than a standard screen length.
If such text is to be maintained in a text editor (like vi), fmt is the command
of choice to transform the original text into a more maintainable format. The
first example below shows fmt being asked to reformat file contents as text
lines no greater than 60 characters long.
# (8) No more than 60 char lines
$ fmt -w 60 README.txt > NEW_README.txt
#
# (9) Force uniform spacing:
# 1 space between words, 2 between sentences
$ echo "Hello World. Hello Universe." | fmt -u -w80
Hello World.  Hello Universe.
fold--Break Up Input
fold is similar to fmt but is used typically to format data
that will be used by other programs, rather than to make the text more readable
to the human eye. The commented examples below are fairly easy to follow:
# (10) Format text in 3 column width lines
$ echo oxoxoxoxo | fold -w3
oxo
xox
oxo
# (11) Parse by triplet-char strings -
# search for 'xox'
$ echo oxoxoxoxo | fold -w3 | grep "xox"
xox
# (12) One way to iterate through a string of chars
$ for i in $(echo 12345 | fold -w1)
> do
> ### perform some task ...
> print $i
> done
1
2
3
4
5
pr
pr shares features with simpler commands like nl and fmt, but its
command-line options make it ideal for converting text files into a format that's
suitable for printing. pr offers options that allow you to specify page
length, column width, margins, headers/footers, double line spacing and more.
Aside from being the best suited formatter for printing tasks, pr also offers
other useful features. These features include allowing you to view multiple
files vertically in adjacent columns or columnizing a list in a fixed number
of columns (see Listing 2).
Listing 2. Using pr
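The original listing is not reproduced here. As a stand-in, the following sketch shows the two uses just described: viewing files in adjacent columns and columnizing a list. The sample file names and contents are invented for the illustration.

```shell
#!/bin/sh
# Sketch only: sample data is made up, and the scratch directory is removed
# at the end.
workdir=$(mktemp -d)
printf '%s\n' a1 a2 a3 > "$workdir/left.txt"
printf '%s\n' b1 b2 b3 > "$workdir/right.txt"

# View two files side by side, one file per column
# (-m merges the files, -t suppresses page headers and trailers).
side_by_side=$(pr -m -t "$workdir/left.txt" "$workdir/right.txt")

# Columnize a list into 3 columns, filling across each row (-a).
across=$(printf '%s\n' 1 2 3 4 5 6 | pr -3 -a -t)

printf '%s\n' "$side_by_side" "$across"
rm -rf "$workdir"
```

Without -a, pr fills the columns downward instead of across; the -w option sets the total page width when the default is too wide for the data.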
Miscellaneous
The following two commands are specialized parsers used to pick apart file
path pieces.
Basename/Dirname
The basename and dirname commands are useful for presenting portions of a
given file path. Quite often in scripting situations, it's convenient to be
able to parse and capture a file name or the containing-directory name portions
of a file path. These commands reduce this task to a simple one-line command.
(There are other ways to approach this using the Korn shell or sed "magic",
but basename and dirname are more portable and straightforward).
basename is used to strip off the directory, and optionally, the file
suffix parts of a file path. Consider the following trivial examples:
# (21) Parse out the Java Class name
$ basename /usr/local/src/java/TheClass.java .java
TheClass
# (22) Parse out the file name.
$ basename srcs/C/main.c
main.c
dirname is used to display the containing directory path, as much
of the path as is provided. Consider the following examples:
# (23) absolute and relative directory examples
$ dirname /homes/curly/.profile
/homes/curly
$ dirname curly/.profile
curly
#
# (24) From any korn-shell script, the following
# line will assign the directory from where
# the script was launched
SCRIPT_HOME="$(dirname $(whence $0))"
#
# (25)
# Okay, how about a non-trivial practical example?
# List all directories (under $PWD) that contain a
# file called 'core'.
$ for i in $(find $PWD -name core)
> do
> dirname $i
> done | sort -u
bin
rje/gcc
src/C
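The shell-only alternative to basename and dirname mentioned earlier is usually written with parameter expansion, which the Korn shell and POSIX shells support. A brief sketch, reusing the path from example (21); the variable names are arbitrary:

```shell
#!/bin/sh
# Sketch: split a path without running basename or dirname.
path=/usr/local/src/java/TheClass.java

file=${path##*/}       # drop the longest prefix ending in "/"
dir=${path%/*}         # drop the shortest suffix starting with "/"
class=${file%.java}    # drop the ".java" suffix

echo "$dir $file $class"
```

The expansions avoid starting a process, but they are not drop-in replacements: for a name with no slash in it, ${path%/*} returns the name unchanged, whereas dirname prints a dot.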
ttyrec is a tty recorder. Recorded data can be played back with the included
ttyplay command. ttyrec is a derivative of the script command that also records
timing information with microsecond accuracy. It can record emacs -nw,
vi, lynx, or any program running on a tty.
In the next few articles, I'd like to take
a look at backups and archiving utilities. If you're anything like I was when
I started using Unix, you are intimidated by the words tar, cpio and dump, and a
quick peek at their respective man pages does not alleviate your fears.
Links to the manuals for the Gnu tools most commonly used in embedded development:
Using and Porting GNU CC * Using as, The GNU Assembler * GASP, an assembly preprocessor
* Using ld, the GNU linker
http://www.objsw.com/docs/
Matt Braithwaite writes "Answering RMS's
call for
free documentation, Karl
Fogel has written a book on CVS that is free (GPLed) and
available online. (The
paper
version has additional non-free material.) " Also, edinator wrote to say
that ORA has put the
Using Samba text online. The entire text of the Oreilly Docbook is downloadable
www.docbook.org
(Oct 21, 2000, 18:38 UTC) (Posted by john)
"This book takes a different approach in
that it steps through the development of a fictional application. The application
you will build is an interface for a DVD rental store."
(Oct 21, 2000, 18:03 UTC) (Posted by john)
"The Red Hat Package Manager (RPM) has established
itself as one of the most popular distribution formats for Linux software today.
A first time user may feel overwhelmed by the vast number of options available and
this article will help a newbie to get familiar with usage of this tool."
Signal
Ground: Stupid dd Tricks (or, Why We Didn't buy Norton Ghost)
"The company that employs Tom and me builds big
pieces of food processing machinery that cost upwards of $400K. Each machine
includes an embedded PC running -- and I cringe -- NT 4. While the company's
legacy currently dictates NT, those of us at the lower levels of the totem pole
work to wedge Linux in wherever we can. What follows is a short story of a successful
insertion that turned out to be (gasp!) financially beneficial to the company,
too."
"...Ghost works well; it does exactly what we
wanted it to. You boot off of a floppy (while the image medium is in another
drive), and Ghost does the rest. The problem lies in Ghost's licensing. If you
want to install in a situation like ours, you have to purchase a Value-Added
Reseller (VAR) license from Symantec. And, every time you create a drive, you
have to pay them about 17 dollars. When you also figure in the time needed to
keep track of those licenses, that adds up in a hurry."
"It finally occurred to me that we could use
Linux and a couple of simple tools (dd, gzip, and a shell script) to do the
same thing as Ghost -- at least as far as our purposes go. ... The Results?
We showed our little program to management, and they were impressed. We were
able to create disk images almost as quickly as Norton Ghost, and we did it
all in an afternoon using entirely free software. The rest is history."
Issue #87 Common Shell Tools - Focus On Linux - 05-25-00
"sort and uniq
The sort command is used to sort the lines in an input
stream in alphanumeric or telephone book order. The simplest ways to use
sort are to provide it with a filename to sort or an input stream whose
data should be output in sorted form:
sort myfile.txt
cat myfile.txt | sort
This tool can be told to sort based on alternate fields and
in several different orders. The uniq command is often used in conjunction
with sort because it removes consecutive duplicate lines from an input
stream before writing them to standard output. This provides a quick, easy way
to sort a pool of data and then remove duplicate entries.
A more in-depth discussion of sort can be found in
the past QuickTip called
Sort and Uniq.
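Because uniq only collapses adjacent duplicates, the usual idiom is to sort first. The sketch below counts how often each line occurs; the function name count_lines is hypothetical.

```shell
#!/bin/sh
# Sketch: count how often each line occurs, most frequent first.
count_lines() {
    sort "$@" |     # bring identical lines together
        uniq -c |   # collapse each run and prefix it with its count
        sort -rn    # order by count, highest first
}
```

It can be given file names (count_lines myfile.txt) or used at the end of a pipeline. When only the de-duplicated list is wanted, sort -u does the same job as sort | uniq.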
tr
The tr command in its simplest form can be thought
of as a simpler case of the sed command discussed earlier. It is used
to replace all occurrences of a single character in an input stream with an alternate
character before writing to the output stream. For example, to change all percent
(%) characters to spaces, you might use:
tr '%' ' ' < myfile.txt > newfile.txt
Though sed can be used to accomplish the same task,
it is often simpler to use tr when replacing a single character because
the syntax is easy to remember and many special characters which must be escaped
for sed can be supplied to tr without escaping.
wc
The wc, or "word count" command does just what its
name implies: it counts words. As an added feature, wc also counts lines
and bytes. The formats for counting words, lines, or bytes in a file or input
stream are:
$ wc -w myfile.txt
897 myfile.txt
$ wc -l myfile.txt
193 myfile.txt
$ wc -c myfile.txt
5927 myfile.txt
$
Notice that the output for wc normally includes the
filename (when reading from a file) and always includes a number of spaces as
well. Often, this behavior is undesirable, usually when a number is required
without leading or trailing whitespace. In such cases, sed and cut
can be used to eliminate them:
$ wc -l myfile.txt | sed 's!^ *!!' | cut -d ' ' -f 1
193
$
Note that other methods for removing spaces or filenames include
using a more complex sed command alone or even using awk, which
we won't discuss in this issue.
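One of the simpler alternatives is to let wc read standard input, so that it never prints a filename, and delete any padding with tr. A minimal sketch; the function name count_file_lines is hypothetical.

```shell
#!/bin/sh
# Sketch: print the number of lines in a file as a bare number.
count_file_lines() {
    # Redirecting the file means wc has no filename to print;
    # tr removes the leading spaces that some wc versions add.
    wc -l < "$1" | tr -d ' '
}
```

The result can be assigned directly, for example n=$(count_file_lines myfile.txt), and used in arithmetic or comparisons without further cleanup.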
xargs
The xargs utility is used to break long input streams
into groups of lines so that the shell isn't overloaded by command substitution.
For example, the following command may fail if too many files are present in
the current directory tree for BASH to substitute correctly:
lpr $(find .)
However, using xargs, the desired effect can be obtained:
find . | xargs lpr
More information on using xargs can be found in the
QuickTip called
Long Argument Lists and on the xargs manual page.
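One caveat with the plain find | xargs form is that xargs splits its input on whitespace, so a filename containing a space arrives as two arguments. Where find supports -print0 and xargs supports -0 (GNU and BSD versions do; these options are an assumption about the local system), the names can be passed NUL-delimited instead. The scratch directory and file names below are invented for the demonstration.

```shell
#!/bin/sh
# Sketch: compare whitespace-delimited and NUL-delimited hand-off to xargs.
demo=$(mktemp -d)
: > "$demo/plain.txt"
: > "$demo/two words.txt"

# Whitespace-delimited: "two words.txt" is split into two arguments.
naive=$(find "$demo" -type f | xargs -n 1 echo | wc -l | tr -d ' ')

# NUL-delimited: each path stays a single argument.
safe=$(find "$demo" -type f -print0 | xargs -0 -n 1 echo | wc -l | tr -d ' ')

echo "arguments seen: naive=$naive safe=$safe"
rm -rf "$demo"
```

With two files in the directory, the NUL-delimited pipeline passes exactly two arguments, while the plain one passes more because of the embedded space.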
Linux Today PRNewswire
SCO Contributes to the Open Source Community; Kicks Off Open Source Initiatives
"SCO is contributing source
code for two developer tools -- "cscope" and "fur." The code is released under
the terms of the BSD License and will be maintained by SCO.
The first technology, cscope, is available to download at www.sco.com/opensource.
Software developers can use cscope to help design and debug programs coded with
the C programming language. The second technology, Fur, will be available to
download in several weeks. Fur is a real-time analysis program used to optimize
application and system binaries for more effective run time execution. Dramatic
results have been seen in high-level applications and database systems using
fur."
One of the greatest strengths of
the Open Source movement is the availability of source code for almost every
program. This article will discuss in general terms, with some examples, how
to install a program from source code rather than a precompiled binary package.
The primary audience for this article is the user who has some familiarity with
installing programs from binaries, but isn't familiar with installing from source
code. Some knowledge of compiling software is helpful, but not required.
Open Source Software Chronicles
-- July-September, 1998
Less sucks less more than more.
That's why I use more less, and less more.
Copyright © 1996-2008 by Dr. Nikolai Bezroukov.
www.softpanorama.org was
created as a service to the UN Sustainable Development Networking Programme (SDNP)
in the author's free time.
This document is an industrial compilation designed and created
exclusively for educational use and is placed under the copyright of the
Open Content License (OPL).
Original materials copyright belong to respective owners. Quotes are made
for educational purposes only in compliance with the fair use doctrine.
Standard disclaimer: The statements, views and opinions presented on
this web page are those of the author and are not endorsed by, nor do they necessarily
reflect, the opinions of the author present and former employers, SDNP or any other
organization the author may be associated with. We do not warrant the correctness
of the information provided or its fitness for any purpose.
Last modified:
June 05, 2008