Softpanorama
(slightly skeptical) Open Source Software Educational Society

May the source be with you, but remember the KISS principle ;-)



Unix split command


The external split command divides a file into smaller files, each holding a specified number of lines. Every output file contains the same number of lines (not necessarily the same number of bytes, since lines differ in length), except the last one created, which holds whatever remains of the original. split does not change your original file. You may find split helpful for dividing large data files into smaller, more manageable files. Because the suffixes split appends form a sequence that sorts in creation order, you can process the new files in order with shell looping structures.
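Here is a minimal sketch of that looping idea. It assumes a POSIX shell plus the `mktemp` and `seq` utilities found on GNU and BSD systems, and a split that accepts the standard `-l` option. The input file `data.txt` and the prefix `part.` are hypothetical names chosen for the example.

```shell
#!/bin/sh
# Work in a scratch directory so the example leaves nothing behind.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical input: the numbers 1 to 10, one per line.
seq 1 10 > data.txt

# Split into 3-line pieces: part.aa, part.ab, part.ac and part.ad.
split -l 3 data.txt part.

# The glob expands in sorted order, which is the order split created the pieces.
for f in part.*; do
  printf '%s: %s line(s)\n' "$f" "$(wc -l < "$f" | tr -d ' ')"
done

# The original file is still intact (prints 10).
wc -l < data.txt | tr -d ' '
```

The loop reports three lines for each of the first three pieces and one line for `part.ad`, which holds the remainder.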

Some commands cannot handle extremely large files, so you may have to split their input into more manageable blocks.

The split command reads input from a file or from standard input and creates multiple output files, so it is suitable as the last stage of a pipeline. Each output file contains n lines of the original. If you do not provide n, split uses 1000. Setting n to 1 splits the input into chunks of one line per file.
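The sketch below puts split at the end of a pipeline twice: once with no arguments, so it reads standard input and falls back to 1000 lines per file, and once with one line per file. It assumes `mktemp`, `seq` and a split that accepts the POSIX `-l` option. The prefix `one.` is a hypothetical name.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# No file argument: read standard input. No line count: 1000 lines per file.
seq 1 2500 | split

# Expect xaa and xab with 1000 lines each, and xac with the remaining 500.
for f in x*; do
  printf '%s %s\n' "$f" "$(wc -l < "$f" | tr -d ' ')"
done

# One line per output file (traditional syntax: split -1).
printf 'alpha\nbeta\ngamma\n' | split -l 1 - one.
cat one.ab   # prints: beta
```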

If you provide the newname argument, the output files are named newnameXX, where XX is aa for the first file, ab for the second, and so on up to zz. That allows at most 26 × 26 = 676 output files, so choose a line count large enough for the input to fit. Because split appends two characters, newname must be at least two characters shorter than the maximum filename length the system allows. On a system whose maximum filename length is 100 characters, for example, newname can be at most 98 characters long; many modern Unix filesystems allow 255. If you do not provide a newname argument, split uses x as the prefix, so the output files are named xaa, xab, and so on.
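The naming rules can be checked with a short sketch. It assumes a POSIX shell, a split that accepts `-l`, and `getconf NAME_MAX`, which reports the maximum filename length for the filesystem holding the given path. The prefix `chunk` is hypothetical.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Five input lines, two per file, with the hypothetical prefix "chunk".
seq 1 5 | split -l 2 - chunk
ls chunk*          # chunkaa, chunkab and chunkac

# Two-letter suffixes run from aa to zz.
echo $((26 * 26))  # 676

# The prefix must leave room for the two suffix letters.
name_max=$(getconf NAME_MAX .)
echo "longest usable prefix here: $((name_max - 2)) characters"
```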

The general format of the split command follows.

     split [ -n ] [ { file | - } [ newname ] ]

Options

The following option may be used to control how split functions.

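-n Causes each output file to contain n lines. Here -n stands for a hyphen followed by the number itself, as in -20. POSIX spells the same request -l n (for example, -l 20). GNU split still accepts the old -20 form, but a literal -n option means something different there (split into a given number of chunks).
If n is not specified, split uses 1000, so the input is split into files of 1000 lines each.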
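The next sketch uses the POSIX `-l` spelling of this option and prints both line and byte counts. The byte counts show that the pieces share a line count, not a size. It assumes `mktemp`, `seq` and a split that accepts `-l`. The file `lines.txt` is a hypothetical input.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

seq 1 45 > lines.txt      # hypothetical 45-line input

# Traditional syntax: split -20 lines.txt
# POSIX syntax for the same request:
split -l 20 lines.txt

for f in x*; do
  printf '%s %s lines %s bytes\n' "$f" \
    "$(wc -l < "$f" | tr -d ' ')" "$(wc -c < "$f" | tr -d ' ')"
done
# xaa 20 lines 51 bytes
# xab 20 lines 60 bytes
# xac 5 lines 15 bytes
```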

Arguments

The following list describes the arguments that may be passed to the split command.

- Causes split to read from the standard input.
file The name of the file that split reads and divides into files of n lines (1000 by default).
If no file is given on the command line, split will read from the standard input.
newname The base part of the name used for all output files. split appends a two-letter suffix to newname for each file it creates. The first file gets "aa," the next "ab," and so on until the input is completely divided.
If newname is not specified, the output is written to a file with a base part of "x" and the normal extensions. Thus the default output filenames are xaa, xab, and so on.

split places its output in files with an extension of two characters. The characters begin with "aa," the next file is "ab," and so on until the entire input has been split and stored in multiple files.

Examples

In this example, split divides its standard input, a listing of /bin, into files of 20 lines each named binaa, binab, and so on. Enter the command at the shell prompt.

ls /bin | split -20 - bin
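To check the result of the example above, you can count the lines in each piece and confirm that concatenating the pieces rebuilds the input. This sketch assumes a POSIX shell with `mktemp`, `tee` and `cmp`, and uses the POSIX `-l 20` spelling in place of `-20`. The file `listing.txt` is a hypothetical copy kept for comparison.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Keep a copy of the listing so the pieces can be checked against it.
ls /bin | tee listing.txt | split -l 20 - bin

# 20 lines per piece, except the last, which holds the remainder.
wc -l bin??

# Concatenating the pieces in glob order rebuilds the original input.
cat bin?? | cmp -s - listing.txt && echo "pieces match the original listing"
```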

Copyright © 1996-2008 by Dr. Nikolai Bezroukov. www.softpanorama.org was created as a service to the UN Sustainable Development Networking Programme (SDNP) in the author's free time. This document is an industrial compilation designed and created exclusively for educational use and is placed under the copyright of the Open Content License (OPL). Copyright of original materials belongs to their respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.

Standard disclaimer: The statements, views and opinions presented on this web page are those of the author and are not endorsed by, nor do they necessarily reflect, the opinions of the author's present and former employers, SDNP or any other organization the author may be associated with. We do not warrant the correctness of the information provided or its fitness for any purpose.

Last modified: September 15, 2008