The underrated split command

May 8, 2007

Many of us deal with large files from time to time, such as Apache logs and system logs like the wtmp and utmpx files, and occasionally those files stretch the limits of our tools. File systems today allow single files of 2 terabytes or more. While files of that size are rarely seen, files of several gigabytes are not uncommon.

One option for dealing with huge files is to break them into more manageable chunks and move or process those chunks independently. The command for this is split, and it works on both text and binary files. It can divide a file into chunks of a given number of lines (-l) or a given number of bytes (-b).

Examples

The following will create a series of 200,000-line files, giving them the default names for split files - xaa, xab, xac and so on:
$ split -l 200000 largelogfile

The following example gives each chunk a more meaningful name by supplying a prefix, producing a series of files called log_aa, log_ab, log_ac and so on:

$ split -l 200000 largelogfile log_
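If you want to try line-based splitting without touching a real log, here is a minimal sketch on a small scale. The file name sample.log, the chunk_ prefix and the 10-line chunk size are made up for illustration; everything happens in a scratch directory.

```shell
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a hypothetical 25-line sample file (a stand-in for a large log).
seq 1 25 > sample.log

# Split into 10-line chunks named chunk_aa, chunk_ab, chunk_ac.
split -l 10 sample.log chunk_

# Count the pieces and the lines in the last, shorter piece.
pieces=$(ls chunk_* | wc -l | tr -d ' ')
last_lines=$(wc -l < chunk_ac | tr -d ' ')
echo "pieces: $pieces, lines in last piece: $last_lines"
```

The last piece simply holds whatever lines are left over, so 25 lines in 10-line chunks gives two full pieces and one 5-line piece.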

Splitting a binary file, such as a movie or an MP3, is just as easy as splitting a text file. The following command splits the WMV file movie1.wmv into a series of 10-kilobyte (10,240-byte) chunks named wmv_aa, wmv_ab, wmv_ac and so on:

$ split -b 10k movie1.wmv wmv_
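To estimate how many pieces a byte-based split will produce, divide the file size by the chunk size and round up. The sketch below checks that arithmetic against what split actually creates. The file sample.bin (25,000 zero bytes) and the part_ prefix are hypothetical stand-ins, not files from this post.

```shell
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical 25,000-byte binary file standing in for a movie.
head -c 25000 /dev/zero > sample.bin

size=$(wc -c < sample.bin | tr -d ' ')
chunk=$((10 * 1024))                        # "10k" means 10 * 1024 bytes
expected=$(( (size + chunk - 1) / chunk ))  # integer division, rounded up

split -b 10k sample.bin part_
actual=$(ls part_* | wc -l | tr -d ' ')
last_size=$(wc -c < part_ac | tr -d ' ')
echo "expected $expected pieces, got $actual; last piece is $last_size bytes"
```

Running the same calculation on the real file size before splitting tells you how many pieces to expect, which helps when the suffix space (aa through zz by default) or a directory's file count might become a concern.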

Once you’ve split a file into smaller pieces, you can use the cat command to join the pieces back into the original file, whether text or binary. Here is a quick example of breaking a file into smaller pieces and restoring it:

$ ls -l design.jpg
-rw-r--r-- 1 root root 47704 May 8 15:11 design.jpg

$ split -b 10000 design.jpg logo_

split itself prints nothing; listing the directory shows the resulting pieces:

-rw-rw-r-- 1 root root 10000 May 8 15:38 logo_aa
-rw-rw-r-- 1 root root 10000 May 8 15:38 logo_ab
-rw-rw-r-- 1 root root 10000 May 8 15:38 logo_ac
-rw-rw-r-- 1 root root 10000 May 8 15:38 logo_ad
-rw-rw-r-- 1 root root  7704 May 8 15:38 logo_ae

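To piece the file back together, use the cat command with a wildcard. The shell expands the wildcard in sorted order, so the pieces are joined in the right sequence. Including the underscore in the pattern keeps unrelated files whose names start with "logo" out of the result:

$ cat logo_* > design_new.jpg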

We now have two JPEG files, the original and the reconstituted file:

-rw-r--r-- 1 root root 47704 May 8 15:11 design.jpg
-rw-rw-r-- 1 root root 47704 May 8 15:43 design_new.jpg
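Matching sizes are a good sign, but they don't prove the two files are identical. A minimal way to check, sketched below, is to compare them byte for byte with cmp. The design.jpg created here is just 47,704 random bytes used as a stand-in for the real image.

```shell
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical 47,704-byte file standing in for design.jpg.
head -c 47704 /dev/urandom > design.jpg

# Split into 10,000-byte pieces, then join them again.
split -b 10000 design.jpg logo_
cat logo_* > design_new.jpg

# cmp -s is silent and exits 0 only when the files are byte-identical.
if cmp -s design.jpg design_new.jpg; then
    result="identical"
else
    result="different"
fi
echo "$result"
```

If the pieces were copied to another machine before being joined, comparing a checksum of the original with a checksum of the rebuilt file serves the same purpose.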

Why use split? Sometimes you’ll want to split a file simply because it’s faster to run a command or script against the smaller pieces. You may also need to move a file from one system to another, and breaking a 20 GB file into forty 500 MB pieces can make the data much easier to move around. You can also break a file into chunks, write them to CDs, and restore the original file when needed.
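As a rough sketch of processing pieces independently, the example below splits a log on line boundaries, runs the same grep against each piece, and adds up the counts. The file big.log, its contents and the piece_ prefix are invented for the demonstration. Splitting by lines (-l) rather than bytes matters here, since a byte-based split could cut a line in half.

```shell
# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical log: 1,000 lines, every 10th one containing "ERROR".
for i in $(seq 1 1000); do
    if [ $((i % 10)) -eq 0 ]; then
        echo "line $i ERROR"
    else
        echo "line $i ok"
    fi
done > big.log

# Four 250-line pieces: piece_aa .. piece_ad.
split -l 250 big.log piece_

# Run the same command against each piece and add up the results.
total=0
for f in piece_*; do
    n=$(grep -c ERROR "$f")
    total=$((total + n))
done

whole=$(grep -c ERROR big.log)
echo "per-piece total: $total, whole file: $whole"
```

The per-piece total matches the count for the whole file, and each iteration of the loop could just as well run on a different machine or at a different time.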



Comments

3 Comments so far

  1. Peter on June 29, 2007 1:52 pm

    very useful information (clear and to the point).

  2. Barry Kelly on February 6, 2008 6:22 am

    The reason it’s been underrated, to the best of my knowledge, is that it’s new. I don’t think it existed before 2007, to judge by my memory and the copyright notice.

    – Barry

  3. to nohup or not to nohup : lxpages.com blog on February 21, 2008 12:29 pm

    […] you can use screen to do the same thing by detaching your terminal before logging out but that still ties your command […]
