
diff Performance Tradeoffs

GNU diff runs quite efficiently; however, in some circumstances you can cause it to run faster or produce a more compact set of changes.

One way to improve diff performance is to use hard or symbolic links to files instead of copies. This improves performance because diff normally does not need to read two hard or symbolic links to the same file, since their contents must be identical. For example, suppose you copy a large directory hierarchy, make a few changes to the copy, and then frequently run diff -r to compare the original to the copy. If the original files are read-only, you can greatly improve performance by creating the copy using hard or symbolic links (e.g., with GNU cp -lR or cp -sR). Before editing a file in the copy for the first time, you should break the link and replace it with a regular copy; otherwise the edit would also change the original.
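As a minimal sketch of this workflow, assuming GNU cp and a POSIX shell, the following commands build a small tree, copy it with hard links, break one link before editing, and then compare the trees. The directory and file names (orig, copy, a.txt, b.txt) are hypothetical and exist only for illustration.

```shell
# Scratch area with hypothetical names; nothing here touches real data.
work=$(mktemp -d)
mkdir -p "$work/orig/sub"
printf 'alpha\nbeta\n' > "$work/orig/sub/a.txt"
printf 'gamma\n' > "$work/orig/b.txt"

# Create the "copy" as a tree of hard links (-l is a GNU cp option).
cp -lR "$work/orig" "$work/copy"

# Break the link before the first edit: make a regular copy and
# move it over the linked name, so the original keeps its contents.
cp "$work/copy/b.txt" "$work/copy/b.txt.tmp"
mv "$work/copy/b.txt.tmp" "$work/copy/b.txt"
printf 'delta\n' >> "$work/copy/b.txt"

# Compare the trees. sub/a.txt is still the same file in both trees,
# so only b.txt should be reported. diff exits with status 1 when
# it finds differences.
diff_status=0
diff_out=$(diff -r "$work/orig" "$work/copy") || diff_status=$?
orig_b=$(cat "$work/orig/b.txt")
printf '%s\n' "$diff_out"

rm -rf "$work"
```

Moving a fresh copy over the linked name, rather than editing in place, is what keeps the original file untouched.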

You can also affect the performance of GNU diff by giving it options that change the way it compares files. Performance has more than one dimension. These options improve one aspect of performance at the cost of another, or they improve performance in some cases while hurting it in others.

The algorithm that GNU diff uses to determine which lines have changed always produces a near-minimal set of differences, which is usually good enough for practical purposes. If the diff output is large, you might want diff to use a modified algorithm that sometimes produces a smaller set of differences. The -d or --minimal option does this; however, it can also cause diff to run more slowly than usual, so it is not the default behavior.
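One rough way to see whether -d helps for a given pair of files is to count the changed lines in each run. The sketch below assumes GNU diff and uses two tiny hypothetical files (old.txt and new.txt); on inputs this small both runs will most likely report the same set of changes, so treat it as a template for your own files rather than a demonstration of a size difference.

```shell
work=$(mktemp -d)
printf '%s\n' a b c a b b a > "$work/old.txt"
printf '%s\n' c b a b a c > "$work/new.txt"

# Default algorithm.
default_status=0
default_out=$(diff "$work/old.txt" "$work/new.txt") || default_status=$?

# Same comparison with -d (--minimal).
minimal_status=0
minimal_out=$(diff -d "$work/old.txt" "$work/new.txt") || minimal_status=$?

# In normal-format output, deleted lines start with "<" and
# inserted lines start with ">".
default_changed=$(printf '%s\n' "$default_out" | grep -c '^[<>]')
minimal_changed=$(printf '%s\n' "$minimal_out" | grep -c '^[<>]')
echo "default: $default_changed changed lines; -d: $minimal_changed changed lines"

rm -rf "$work"
```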

When the files you are comparing are large and have small groups of changes scattered throughout them, you can use the --speed-large-files option to make a different modification to the algorithm that diff uses. If the input files have a constant small density of changes, this option speeds up the comparisons without changing the output. If not, diff might produce a larger set of differences; however, the output will still be correct.
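The following sketch, assuming GNU diff together with the standard seq and awk utilities, builds a pair of files with small changes scattered at a constant density and compares them with --speed-large-files. The file names and the sizes (20000 lines, one change every 1000 lines) are arbitrary choices for illustration; real gains are only likely to be noticeable on much larger inputs.

```shell
work=$(mktemp -d)

# old.txt: the numbers 1..20000, one per line.
seq 1 20000 > "$work/old.txt"

# new.txt: the same, but every 1000th line is altered.
awk 'NR % 1000 == 0 { print "changed " $0; next } { print }' \
    "$work/old.txt" > "$work/new.txt"

# Compare using the heuristic intended for large files with
# small, scattered changes.
fast_status=0
fast_out=$(diff --speed-large-files "$work/old.txt" "$work/new.txt") || fast_status=$?

# Count deleted ("<") and inserted (">") lines in the output.
fast_changed=$(printf '%s\n' "$fast_out" | grep -c '^[<>]')
echo "changed lines reported: $fast_changed"

rm -rf "$work"
```

To judge the effect on your own data, run the same comparison with and without the option under the shell's time keyword and compare both the timings and the size of the output.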

Normally diff discards the prefix and suffix that are common to both files before it attempts to find a minimal set of differences. This makes diff run faster, but occasionally it may produce non-minimal output. The --horizon-lines=LINES option, where LINES is a number, prevents diff from discarding the last LINES lines of the prefix and the first LINES lines of the suffix. This gives diff further opportunities to find a minimal output.
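As an illustration of the syntax, the sketch below compares two hypothetical files that share a 50-line prefix and a 50-line suffix, keeping 10 lines of each available to the comparison. It assumes GNU diff and seq; the value 10 is an arbitrary choice, and for an input this simple the output is not expected to differ from a plain diff.

```shell
work=$(mktemp -d)

# Both files share lines 1..50 at the start and 51..100 at the end;
# only the three lines in the middle differ.
{ seq 1 50; printf '%s\n' x y z; seq 51 100; } > "$work/old.txt"
{ seq 1 50; printf '%s\n' y z w; seq 51 100; } > "$work/new.txt"

# Keep the last 10 lines of the common prefix and the first 10 lines
# of the common suffix available to the comparison.
horizon_status=0
horizon_out=$(diff --horizon-lines=10 "$work/old.txt" "$work/new.txt") || horizon_status=$?
printf '%s\n' "$horizon_out"

rm -rf "$work"
```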

Suppose a run of changed lines includes a sequence of lines at one end and there is an identical sequence of lines just outside the other end. The diff command is free to choose which identical sequence is included in the hunk. In this case, diff normally shifts the hunk's boundaries when this merges adjacent hunks, or shifts a hunk's lines towards the end of the file. Merging hunks can make the output look nicer in some cases.
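A small case makes the ambiguity concrete. In the sketch below (hypothetical file names, GNU diff assumed), the new file gains two lines, and the change can be described equally well as inserting "b, x" after line 1 or as inserting "x, b" after line 2. Either description adds exactly two lines; which one diff reports depends on how it shifts the hunk.

```shell
work=$(mktemp -d)
printf '%s\n' a b c > "$work/old.txt"
printf '%s\n' a b x b c > "$work/new.txt"

# The inserted run can be reported as lines 2-3 ("b", "x") or as
# lines 3-4 ("x", "b") of the new file; both are valid hunks.
shift_status=0
shift_out=$(diff "$work/old.txt" "$work/new.txt") || shift_status=$?
printf '%s\n' "$shift_out"

# Count the inserted lines, which start with ">" in normal format.
added=$(printf '%s\n' "$shift_out" | grep -c '^>')
echo "lines added: $added"

rm -rf "$work"
```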

Published under the terms of the GNU General Public License