Here are some ideas for improving GNU @command{diff} and @command{patch}. The GNU project has identified some improvements as potential programming projects for volunteers. You can also help by reporting any bugs that you find.
If you are a programmer and would like to contribute something to the GNU project, please consider volunteering for one of these projects. If you are seriously contemplating work, please write to gnu@gnu.org to coordinate with other volunteers.
One should be able to use GNU @command{diff} to generate a patch from any pair of directory trees, and given the patch and a copy of one such tree, use @command{patch} to generate a faithful copy of the other. Unfortunately, some changes to directory trees cannot be expressed using current patch formats; also, @command{patch} does not handle some of the existing formats. These shortcomings motivate the following suggested projects.
@command{diff}, @command{diff3} and @command{sdiff} treat each line of input as a string of unibyte characters. This can mishandle multibyte characters in some cases. For example, when asked to ignore spaces, @command{diff} does not properly ignore a multibyte space character.
Also, @command{diff} currently assumes that each byte is one column wide, and this assumption is incorrect in some locales, e.g., locales that use UTF-8 encoding. This causes problems with the @option{-y} or @option{--side-by-side} option of @command{diff}.
These problems need to be fixed without unduly affecting the performance of the utilities in unibyte environments.
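The gap between bytes and characters can be seen with a small sketch. It assumes a POSIX shell with `printf` and `wc`; the character `é` is only an illustrative choice and does not come from the text above.

```shell
# "é" encoded in UTF-8 is the two bytes 0xC3 0xA9 (octal 303 251).
bytes=$(printf '\303\251' | wc -c | tr -d ' ')
echo "bytes: $bytes"

# wc -m counts characters rather than bytes; its result depends on the
# locale in effect, so no particular value is claimed here.
chars=$(printf '\303\251' | wc -m | tr -d ' ')
echo "characters in the current locale: $chars"
```

A tool that counts one column per byte would reserve two columns for this character, although a terminal in a UTF-8 locale normally displays it in one.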
The IBM GNU/Linux Technology Center Internationalization Team has proposed some patches to support internationalized @command{diff}, available at http://oss.software.ibm.com/developer/opensource/linux/patches/i18n/diffutils-2.7.2-i18n-0.1.patch.gz. Unfortunately, these patches are incomplete and were written for an older version of @command{diff}, so more work needs to be done in this area.
@command{diff} and @command{patch} do not handle some changes to directory structure. For example, suppose one directory tree contains a directory named `D' with some subsidiary files, and another contains a file with the same name `D'. `diff -r' does not output enough information for @command{patch} to transform the directory subtree into the file.
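The directory-versus-file case can be reproduced with a short sketch. The tree names `old` and `new` and the file contents are hypothetical; the exact wording of the report may vary between @command{diff} versions.

```shell
tmp=$(mktemp -d) && trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

mkdir -p old/D new
echo 'inside' > old/D/file   # in the old tree, D is a directory
echo 'plain'  > new/D        # in the new tree, D is a regular file

# diff only notes that the two file types differ; it emits no hunk
# that patch could apply to turn one into the other.
out=$(diff -r old new)
printf '%s\n' "$out"
```

The output names both `old/D` and `new/D` but carries neither the contents of `old/D/file` nor those of `new/D`.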
There should be a way to specify that a file has been removed without having to include its entire contents in the patch file. There should also be a way to tell @command{patch} that a file was renamed, even if there is no way for @command{diff} to generate such information. There should be a way to tell @command{patch} that a file's time stamp has changed, even if its contents have not changed.
These problems can be fixed by extending the @command{diff} output format to represent changes in directory structure, and extending @command{patch} to understand these extensions.
Some files are neither directories nor regular files: they are unusual files like symbolic links, device special files, named pipes, and sockets. Currently, @command{diff} treats symbolic links like regular files; it treats other special files like regular files if they are specified at the top level, but simply reports their presence when comparing directories. This means that @command{patch} cannot represent changes to such files. For example, if you change which file a symbolic link points to, @command{diff} outputs the difference between the two files, instead of the change to the symbolic link.
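The symbolic link case can be sketched as follows. The directory and file names are hypothetical, and the sketch assumes the default behavior described above, in which @command{diff} follows symbolic links.

```shell
tmp=$(mktemp -d) && trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

mkdir targets old new
echo one > targets/one
echo two > targets/two

# The same link name points at a different target in each tree.
ln -s ../targets/one old/link
ln -s ../targets/two new/link

# diff follows both links and compares the contents of the targets.
out=$(diff -r old new)
printf '%s\n' "$out"
```

The resulting hunk replaces the line `one` with `two`, so applying it would edit file contents rather than repoint the link.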
@command{diff} should optionally report changes to special files specially, and @command{patch} should be extended to understand these extensions.
When a file name contains an unusual character like a newline or white space, `diff -r' generates a patch that @command{patch} cannot parse. The problem is with the format of @command{diff} output, not just with @command{patch}, because with odd enough file names one can cause @command{diff} to generate a patch that is syntactically correct but patches the wrong files. The format of @command{diff} output should be extended to handle all possible file names.
Applying @command{patch} to a multiple-file diff can result in files whose time stamps are out of order. GNU @command{patch} has options to restore the time stamps of the updated files (see section Updating Time Stamps on Patched Files), but sometimes it is useful to generate a patch that works even if the recipient does not have GNU @command{patch}, or does not use these options. One way to do this would be to implement a @command{diff} option to output diffs in time stamp order.
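Until such an option exists, the ordering can be approximated outside @command{diff}. The following is a rough sketch, not a proposed implementation: the function name `diff_in_mtime_order` is hypothetical, it assumes GNU @command{find} (for `-printf`), file names without newlines, and two trees that contain the same set of file names.

```shell
# Emit one unified diff per file, oldest modification time first.
# $1 is the old tree, $2 is the new tree.
diff_in_mtime_order() {
    # %T@ is the modification time in seconds, %P the path relative to $2.
    find "$2" -type f -printf '%T@ %P\n' | sort -n | cut -d' ' -f2- |
    while IFS= read -r name; do
        diff -u "$1/$name" "$2/$name"
    done
}

tmp=$(mktemp -d) && trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
mkdir old new
echo 'a1' > old/a.txt; echo 'a2' > new/a.txt
echo 'b1' > old/b.txt; echo 'b2' > new/b.txt
touch -t 200101010000 new/b.txt   # b.txt was modified first
touch -t 200201010000 new/a.txt   # a.txt was modified a year later

out=$(diff_in_mtime_order old new)
printf '%s\n' "$out"
```

Because the hunks for `b.txt` come before those for `a.txt`, a recipient who applies the patch in one pass updates the files in the same relative order in which they were originally modified.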
It would be nice to have a feature for specifying two strings, one in from-file and one in to-file, which should be considered to match. Thus, if the two strings are `foo' and `bar', then if two lines differ only in that `foo' in file 1 corresponds to `bar' in file 2, the lines are treated as identical.
It is not clear how general this feature can or should be, or what syntax should be used for it.
A partial substitute is to filter one or both files before comparing, e.g.:
sed 's/foo/bar/g' file1 | diff - file2
However, this outputs the filtered text, not the original.
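A small run makes the limitation concrete. The file contents below are hypothetical and chosen only for illustration.

```shell
tmp=$(mktemp -d) && trap 'rm -rf "$tmp"' EXIT
printf 'foo 1\nfoo 2\n' > "$tmp/file1"
printf 'bar 1\nbar 3\n' > "$tmp/file2"

# Filter file1 so that `foo' matches `bar', then compare with file2.
out=$(sed 's/foo/bar/g' "$tmp/file1" | diff - "$tmp/file2")
printf '%s\n' "$out"
```

The first lines are treated as identical, as intended, but the remaining hunk reports `< bar 2` although the line in file1 actually reads `foo 2`.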
When comparing two large directory structures, one of which was originally copied from the other with time stamps preserved (e.g., with `cp -pR'), it would greatly improve performance if an option told @command{diff} to assume that two files with the same size and time stamps have the same content. See section @command{diff} Performance Tradeoffs.
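The idea can be approximated today with a wrapper around @command{diff}. This is only a sketch under stated assumptions: the helper names `same_size_and_mtime` and `quick_diff` are hypothetical, and the code relies on GNU @command{stat}, where `%s' is the size in bytes and `%Y' is the modification time in seconds since the Epoch.

```shell
# Succeed when two files have the same size and modification time.
same_size_and_mtime() {
    [ "$(stat -c '%s %Y' -- "$1")" = "$(stat -c '%s %Y' -- "$2")" ]
}

# Read file contents only when the cheap metadata check fails.
quick_diff() {
    if same_size_and_mtime "$1" "$2"; then
        return 0    # assumed identical; contents are not read
    fi
    diff -- "$1" "$2"
}

tmp=$(mktemp -d) && trap 'rm -rf "$tmp"' EXIT
echo 'hello' > "$tmp/a"
cp -p "$tmp/a" "$tmp/b"          # same size, time stamp preserved
echo 'hello, world' > "$tmp/c"   # different size

quick_diff "$tmp/a" "$tmp/b" && echo 'a and b assumed identical'
quick_diff "$tmp/a" "$tmp/c" || echo 'a and c differ'
```

The shortcut trades accuracy for speed: two files that happen to share size and time stamp but differ in content would be reported as identical, which is why it should be an explicit option rather than the default.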
If you think you have found a bug in GNU @command{cmp}, @command{diff}, @command{diff3}, or @command{sdiff}, please report it by electronic mail to the GNU utilities bug report mailing list bug-gnu-utils@gnu.org. Please send bug reports for GNU @command{patch} to bug-patch@gnu.org. Send as precise a description of the problem as you can, including the output of the @option{--version} option and sample input files that produce the bug, if applicable. If you have a nontrivial fix or a patch for the bug, please send it as well. It may simplify the maintainer's job if the patch is relative to a recent test release, which you can find in the directory ftp://alpha.gnu.org/gnu/diffutils/.