Thursday, January 19, 2012

Patches Between Files

Introduction: Using diff and patch (tutorial)

Author rechosen Category Tutorials Tags diff, introduction, patch, usage

The commands diff and patch form a powerful combination. They are widely used to get differences between original files and updated files in such a way that other people who only have the original files can turn them into the updated files with just a single patch file that contains only the differences. This tutorial explains the basics of how to use these great commands.

Difficulty: Medium

This tutorial assumes some basic Linux and command line knowledge, like changing directories, copying files and editing text files.

Using diff to create a simple patch

The simplest way of using diff is to get the differences between two files, an original file and an updated file. You could, for example, write a few words in a normal text file, make some modifications, and then save the modified content to a second file. Then, you could compare these files with diff, like this:

[rechosen@localhost ~]$ diff originalfile updatedfile

Of course, replace originalfile and updatedfile with the appropriate filenames for your case. You will most probably get an output like this:

1c1
< These are a few words.
\ No newline at end of file
---
> These still are just a few words.
\ No newline at end of file

Note: to demonstrate the creation of a simple patch, I used the file originalfile with the content "These are a few words." and the file updatedfile with the content "These still are just a few words.". You can create these files yourself if you want to run the commands in the tutorial and get about the same output.

The 1c1 is a way of indicating line numbers and specifying what should be done. Note that those line numbers can also be line ranges (12,15 means line 12 to line 15). The "c" tells patch to replace the content of the lines. Two other characters with a meaning exist: "a" and "d", with "a" meaning "add" or "append" and "d" meaning "delete". The syntax is (line number or range)(c, a or d)(line number or range), although when using "a" or "d", one of the (line number or range) parts may only contain a single line number.

  • When using "c", the line numbers left of it are the lines in the original file that should be replaced with text contained in the patch, and the line numbers right of it are the lines the content should be in in the patched version of the file.
  • When using "a", the line number on the left may only be a single number: the line in the original file after which the new lines should be added. The line numbers right of it are the lines the content should be in in the patched version of the file.
  • When using "d", the line numbers left of it are the lines that should be deleted to create the patched version of the file, and the line number on the right may only be a single number, telling where the lines would have been in the patched version of the file if they had not been deleted. You might think that that last number is redundant, but remember that patches can also be applied in reverse. I'll explain more about that later on in this tutorial.

The "<" means that patch should remove the characters after this sign, and the ">" means that the characters after this sign should be added. When replacing content (a "c" between the line numbers), you will see both the < and the > sign. When adding content (an "a" between the line numbers), you'll only see the > sign, and when deleting content (a "d" between the line numbers), only the < sign.
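To see the "a" and "d" commands next to each other, here is a small sketch you can paste into a shell. The file names old.txt and new.txt and their contents are made up for this illustration; they are not part of the example files used elsewhere in this tutorial.

```shell
# Work in a throwaway directory so nothing of yours gets overwritten.
workdir=$(mktemp -d)
cd "$workdir"

# old.txt and new.txt are hypothetical demo files.
printf 'one\ntwo\nthree\n' > old.txt
printf 'one\nthree\nfour\n' > new.txt

# diff exits with status 1 when the files differ, hence the "|| true".
normal_diff=$(diff old.txt new.txt || true)
printf '%s\n' "$normal_diff"
# Output:
# 2d1
# < two
# 3a3
# > four
```

Here 2d1 deletes line 2 of old.txt, and 3a3 adds line 3 of new.txt after line 3 of old.txt.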

The "\", followed by "No newline at end of file", is only there because I didn't press enter after typing the words. Generally, it is good practice to add a final newline to every text file you create. Certain pieces of software can't do without them. That is why diff reports the absence of a final newline so explicitly. Adding final newlines to the files makes the output a lot shorter:

1c1
< These are a few words.
---
> These still are just a few words.

As you may have noticed, I omitted explaining what the 3 -'s are for. They indicate the end of the lines that should be replaced and the beginning of the lines that should replace them. They separate the old and the new lines. You will only see these when replacing content (a "c" between the line numbers).

If we want to create a patch, we should put the output of diff into a file. Of course, you could do this by copying the output from your console and, after pasting it in your favourite text editor, saving the file, but there is a shorter way. We can let bash write diff's output to a file for us this way:

[rechosen@localhost ~]$ diff originalfile updatedfile > patchfile.patch

Again, replace the filenames with the ones appropriate in your case. You might like to know that telling bash to write a command's output to a file using > works with every command. This can be very useful to save the output of a command to a (log) file.

Applying the simple patch we created

Well then, did we just create a patch? The short answer is: yes, we did. We can use the patchfile to change a copy of originalfile to a copy of updatedfile. Of course, it wouldn't make that much sense to apply the patch on the files we created the patch from. Therefore, copy the original file and the patchfile to another place, and go to that place. Then, try applying the patch this way:

[rechosen@localhost ~]$ patch originalfile -i patchfile.patch -o updatedfile

Again, replace the filenames where necessary. If all went well, the file updatedfile just created by patch should be identical to the one you had at first, when creating the patch with diff. You can check this using diff's -s option:

[rechosen@localhost ~]$ diff -s updatedfile [/path/to/the/original/updatedfile]/updatedfile

Replace the part between [ and ] with the path to the original updatedfile. For example, if the updatedfile you used when creating the patch is located in the parent directory of your current directory, replace "[/path/to/the/original/updatedfile]" with ".." (bash understands this as the parent directory of the current working directory). And of course, also replace the filenames again where appropriate.
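The whole round trip can be sketched in one go. This is only an illustration in a temporary directory; the directory name elsewhere is made up, and I put the options before the file name, which should be equivalent to the patch command shown above.

```shell
workdir=$(mktemp -d)
cd "$workdir"

printf 'These are a few words.\n' > originalfile
printf 'These still are just a few words.\n' > updatedfile

# Create the patch; diff exits with status 1 because the files differ.
diff originalfile updatedfile > patchfile.patch || true

# Copy the original file and the patch to another place and go there.
# "elsewhere" is a hypothetical directory name.
mkdir elsewhere
cp originalfile patchfile.patch elsewhere/
cd elsewhere

# Apply the patch, writing the result to a new file.
patch -i patchfile.patch -o updatedfile originalfile

# Compare the result with the updatedfile the patch was created from.
diff -s updatedfile ../updatedfile
```

Because of the -o option, originalfile itself is left untouched; only the new updatedfile is written.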

Congratulations! If diff reported the files to be equal, you just successfully created and used a patch! However, the patch format we just used is not the only one. In the next chapter, I will explain another patch format.

Contextual patching

In the first chapter, we created a patch using diff's normal format. This format, however, doesn't provide any of the lines of context around the ones to be replaced, and therefore, a change in the line numbers (one or more extra newlines somewhere, or some deleted lines) would make it very difficult for the patch program to determine which lines to change instead. Also, if a different file that is being patched by accident contains the same lines as the original file at the right places, patch will happily apply the patchfile's changes to this file. This could result in broken code and other unwanted side-effects. Fortunately, diff supports other formats than the normal one. Let's create a patch for the same files, but this time using the context output format:

[rechosen@localhost ~]$ diff -c originalfile updatedfile

By now, it should be clear that you should replace the filenames where necessary =). You should get an output like this:

*** originalfile 2007-02-03 22:15:48.000000000 +0100
--- updatedfile 2007-02-03 22:15:56.000000000 +0100
***************
*** 1 ****
! These are a few words.
--- 1 ----
! These still are just a few words.

As you can see, the filenames are included. This will save us some typing when applying the patch. The timestamps you can see next to the filenames are the date and time of the last modification of the file. The line with 15 *'s indicates the starting of a hunk. A hunk describes which changes, like replacements, additions and deletions, should be made to a certain block of text. The two numbers 1 are line numbers (again, these can also be line ranges (12,15 means line 12 to line 15)), and ! means that the line should be replaced. The line with a ! before the three -'s (hey, where did we see those before?) should be replaced by the second line with a !, after the three -'s (of course, the ! itself will not be included; it's context format syntax).

As you can see, there aren't any c's, a's and d's here. The action to perform is determined by the character in front of the line. The !, as explained, means that the line should be replaced. The other available characters are +, - and " " (a space). The + means add (or append), the - means delete, and the " " means nothing: patch will only use it as context to be sure it's modifying the right part of the file.
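If you would like to see all four line markers in a single hunk, the sketch below may help. The files old.txt and new.txt are hypothetical and only exist for this demonstration.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Line "b" is changed, line "d" is deleted and line "f" is added.
printf 'a\nb\nc\nd\ne\n' > old.txt
printf 'a\nB\nc\ne\nf\n' > new.txt

# diff exits with status 1 when the files differ, hence the "|| true".
context_diff=$(diff -c old.txt new.txt || true)
printf '%s\n' "$context_diff"
```

In the output, the changed line shows up with a ! in both halves of the hunk, the deleted line with a - in the first half, the added line with a + in the second half, and the untouched lines are indented with two spaces.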

Applying this patch is a bit easier: under the same circumstances as before (let bash write the diff output to a file again, then copy the patchfile and the original file to another location), you'll need to run:

[rechosen@localhost ~]$ patch -i patchfile.patch -o updatedfile

You'll probably think now: why do we still have to specify the new filename? Well, that's because patch was made to update existing files, not to create new updated files. This usually comes in handy when patching source trees of programs, which is pretty much the main use of patch. And that brings us to our next subject: to patch a whole source tree, multiple files should be included in the patchfile. The next chapter will tell how to do this.

Getting the differences between multiple files

The easiest way to get the differences between multiple files is to put them all in a directory and to let diff compare the whole directories. You can just specify directories instead of files; diff will autodetect whether you're giving it a file or a directory:

[rechosen@localhost ~]$ diff originaldirectory/ updateddirectory/

Note: if the directories you're comparing also include subdirectories, you should add the -r option to make diff compare the files in subdirectories, too.

This should give an output like this:

diff originaldirectory/file1 updateddirectory/file1
1c1
< This is the first original file.
---
> This is the first updated file.
diff originaldirectory/file2 updateddirectory/file2
1c1
< This is the second original file.
---
> This is the second updated file.
14d13
< We're going to add something in this file and to delete this line.
26a26
> This is line has been added to this updated file.

Note: for this example, I created some example files. You can download an archive containing these files here: http://www.linuxtutorialblog.com/post/introduction-using-diff-and-patch-tutorial/diffpatchexamplefiles.tar.gz.

As you can see, the normal output format only specifies filenames when comparing multiple files. You can also see examples of the addition and deletion of lines.

Now, let's have a look at the output of the same comparison in the context format:

diff -c originaldirectory/file1 updateddirectory/file1
*** originaldirectory/file1 2007-02-04 16:17:57.000000000 +0100
--- updateddirectory/file1 2007-02-04 16:18:33.000000000 +0100
***************
*** 1 ****
! This is the first original file.
--- 1 ----
! This is the first updated file.
diff -c originaldirectory/file2 updateddirectory/file2
*** originaldirectory/file2 2007-02-04 16:19:37.000000000 +0100
--- updateddirectory/file2 2007-02-04 16:20:08.000000000 +0100
***************
*** 1,4 ****
! This is the second original file.

S
O
--- 1,4 ----
! This is the second updated file.

S
O
***************
*** 11,17 ****
C
E

- We're going to add something in this file and to delete this line.

S
O
--- 11,16 ----
***************
*** 24,28 ****
--- 23,28 ----
C
E

+ This is line has been added to this updated file.

Something will be added above this line.

The first thing you should notice is the increase in length; the context format provides more information than the normal format. This wasn't that visible in the first example, as there wasn't any context to include. However, this time there was context, and that surely lengthens the patch a lot. You might also have noticed that the filenames are mentioned twice every time. This is probably done either to make it easier for patch to recognize when to start patching the next file, or to provide better backwards-compatibility (or both).

The other way to let diff compare multiple files is writing a shell script that runs diff multiple times and correctly adds all output to one file, including the lines with the diff commands. I will not tell you how to do this as the other way (putting the files in a directory) is a lot easier and is used widely.

Creating this patch with diff was fairly easy, but the use of directories introduces a new problem: will patch just patch the mentioned files in the current working directory and forget about the directory they were in when creating the patch, or will it patch the files inside the directories specified in the patch? Have a look at the next chapter to find out!

Patching multiple files

In the chapter before this one, we created a patch that can be used to patch multiple files. If you haven't done so already, save diff's output to an actual patchfile in a way like this:

[rechosen@localhost ~]$ diff -c originaldirectory/ updateddirectory/ > patchfile.patch

Note: we'll be using the context format patch here as it generally is good practice to use a format that provides context.

It's time to try using our patchfile. Copy the original directory and the patchfile to another location, go to that other location, and apply the patch with this command:

[rechosen@localhost ~]$ patch -i patchfile.patch

Huh? It reports that it cannot find the file to patch! Yep, that's right. It is trying to find the file file1 in the current directory (by default, patch strips away all directories in front of the filename). Of course, this file isn't there because we're trying to update the file in the directory originaldirectory. For this reason, we should tell patch not to strip away any directories in the filenames. That can be done this way:

[rechosen@localhost ~]$ patch -p0 -i patchfile.patch

Note: you might think you could also just move into originaldirectory and run the patch command there. Don't! This is bad practice: if the patchfile includes any files to patch in subdirectories, patch will look for them in the working directory, and, obviously, not find them or find the wrong ones. Use the -p option to make patch look in subdirectories as it should.
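Here is a sketch of the complete procedure, including a file in a subdirectory. The two small trees are hypothetical stand-ins for the example archive (the names sub, file3 and elsewhere are made up), so the contents differ from the output shown earlier.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Two small hypothetical trees standing in for the example archive.
mkdir -p originaldirectory/sub updateddirectory/sub
printf 'This is the first original file.\n' > originaldirectory/file1
printf 'This is the first updated file.\n'  > updateddirectory/file1
printf 'old nested line\n' > originaldirectory/sub/file3
printf 'new nested line\n' > updateddirectory/sub/file3

# -r makes diff descend into subdirectories; -c selects the context format.
diff -rc originaldirectory/ updateddirectory/ > patchfile.patch || true

# Copy the original tree and the patch elsewhere, then patch from the
# directory that CONTAINS originaldirectory, not from inside it.
mkdir elsewhere
cp -r originaldirectory patchfile.patch elsewhere/
cd elsewhere
patch -p0 -i patchfile.patch

# No output from diff -r means the trees are now identical.
diff -r originaldirectory ../updateddirectory
```

Note that only originaldirectory is copied to the new location, so patch finds exactly one candidate for every file named in the patch.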

The -p option tells patch how many slashes (including what's before them, usually directories) it should strip away before the filename (note that, when using the option -p0, patch looks for the files to patch in both originaldirectory and updateddirectory, in our case). In this case, we set it to 0 (do not strip away any slash), but you can also set it to 1 (to strip away the first slash including anything before it), or 2 (to strip away the first two slashes including everything before them), or any other amount. This can be very useful if you've got a patch which uses a different directory structure than you. For example, if you have a patch that uses a directory structure like this:

(...)
*** /home/username/sources/program/originaldirectory/file1 2007-02-04 16:17:57.000000000 +0100
--- /home/username/sources/program/updateddirectory/file1 2007-02-04 16:18:33.000000000 +0100
(...)

You could just count the slashes (/ (1) home/ (2) username/ (3) sources/ (4) program/ (5)) and give that value with the -p option. If you use -p5, patch will look for both originaldirectory/file1 and updateddirectory/file1. Please do note that patch considers two slashes next to each other (like in /home/username//sources) as a single slash. This is because scripts sometimes (accidentally or not) put an extra slash between directories.
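If counting slashes by hand feels error-prone, the following sketch mimics the stripping rule described above. The function name strip_components is made up for this illustration, and this is not how patch itself is implemented; it only reproduces the counting, including treating doubled slashes as one.

```shell
# Hypothetical helper that imitates what -pN does to a path in a patch.
strip_components() {
    # $1 = path as it appears in the patch
    # $2 = the number you would pass to -p
    path=$(printf '%s' "$1" | tr -s '/')   # squeeze "//" into "/"
    n=$2
    while [ "$n" -gt 0 ]; do
        path=${path#*/}                    # drop everything up to the first slash
        n=$((n - 1))
    done
    printf '%s\n' "$path"
}

strip_components /home/username/sources/program/originaldirectory/file1 5
# Output:
# originaldirectory/file1
```

Trying a few values of the second argument on a path from your own patchfile shows quickly which -p value leaves a path that exists on your system.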

Reversing an applied patch

Sometimes a patch is applied while it shouldn't have been. For example: a patch introduces a new bug in some code, and a fixed patch is released. However, you already applied the old, buggy patch, and you can't think of a quick way to get the original files again (maybe they were already patched dozens of times). You can then apply the buggy patch in reverse. The patch command will try to undo all changes it made by swapping the hunks. You can tell patch to try reversing by passing it the -R option:

[rechosen@localhost ~]$ patch -p0 -R -i patchfile.patch

Usually, this operation will succeed, and you'll get back the original files you had. By the way, there is another reason why you'd want to reverse a patch: sometimes (especially when sleepy), people release a patch with the files swapped. There is a good chance that patch will detect this automatically and ask you whether it should try applying the patch in reverse. Sometimes, however, patch will not detect it and wonder why the files don't seem to match. You can then try applying the patch in reverse manually, by passing the -R option to patch. It is good practice to make a backup before you try this, as it is possible that patch messes up and leaves you with irrecoverably spoiled files.
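A small sketch of applying a patch and undoing it again, with a backup made first. The directory names sandbox and backup are made up for this example, and the one-file tree is just a stand-in for a real source tree.

```shell
workdir=$(mktemp -d)
cd "$workdir"

mkdir originaldirectory updateddirectory
printf 'This is the first original file.\n' > originaldirectory/file1
printf 'This is the first updated file.\n'  > updateddirectory/file1
diff -c originaldirectory/ updateddirectory/ > patchfile.patch || true

# Work on a copy that contains only the original tree and the patch.
mkdir sandbox
cp -r originaldirectory patchfile.patch sandbox/
cd sandbox

# Make a backup before experimenting.
cp -r originaldirectory backup

patch -p0 -i patchfile.patch          # apply the patch
after_apply=$(cat originaldirectory/file1)

patch -p0 -R -i patchfile.patch       # undo it again
after_reverse=$(cat originaldirectory/file1)

printf '%s\n%s\n' "$after_apply" "$after_reverse"
```

If the reversal worked, the second line printed is the original text again and originaldirectory/file1 matches the copy in backup.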

The unified format

The diff command can also output the differences in another format: the unified format. This format is more compact, as it omits redundant context lines and groups things like line number instructions. However, this format is currently only supported by GNU diff and patch. If you're releasing a patch in this format, you should be sure that it will only be applied by GNU patch users. Pretty much every Linux flavour features GNU patch.

The unified format is similar to the context format, but it's far from exactly the same. You can create a patch in the unified format this way:

[rechosen@localhost ~]$ diff -u originaldirectory/ updateddirectory/

The output should be something like this:

diff -u originaldirectory/file1 updateddirectory/file1
--- originaldirectory/file1 2007-02-04 16:17:57.000000000 +0100
+++ updateddirectory/file1 2007-02-04 16:18:33.000000000 +0100
@@ -1 +1 @@
-This is the first original file.
+This is the first updated file.
diff -u originaldirectory/file2 updateddirectory/file2
--- originaldirectory/file2 2007-02-04 16:19:37.000000000 +0100
+++ updateddirectory/file2 2007-02-04 16:20:08.000000000 +0100
@@ -1,4 +1,4 @@
-This is the second original file.
+This is the second updated file.

S
O
@@ -11,7 +11,6 @@
C
E

-We're going to add something in this file and to delete this line.

S
O
@@ -24,5 +23,6 @@
C
E

+This is line has been added to this updated file.

Something will be added above this line.

As you can see, the line numbers/ranges are grouped and placed between @'s. Also, there is no extra space after + or -. This saves some bytes. Another difference: the unified format does not feature a special replacement sign. It simply deletes (the - sign) the old line and adds (the + sign) the altered line instead. The only difference between adding/deleting and replacing can be found in the line numbers/ranges: when replacing a line, these are the same, and when adding or deleting, they differ.
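To look at a unified hunk without the surrounding noise, you could try the sketch below. The files old.txt and new.txt and the name demo.patch are hypothetical; the two header lines are cut off because they contain timestamps that differ on every run.

```shell
workdir=$(mktemp -d)
cd "$workdir"

printf 'a\nb\nc\n' > old.txt
printf 'a\nB\nc\n' > new.txt

# diff exits with status 1 when the files differ, hence the "|| true".
diff -u old.txt new.txt > demo.patch || true

# Drop the "---" and "+++" header lines and keep only the hunk.
unified_hunk=$(sed '1,2d' demo.patch)
printf '%s\n' "$unified_hunk"
# Output:
# @@ -1,3 +1,3 @@
#  a
# -b
# +B
#  c
```

The replaced line appears once with a - and once with a +, and both ranges between the @'s have the same length because one line was swapped for one line.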

Format comparison

Having read about three formats, you probably wonder which one to choose. Here's a small comparison:

  • The normal format features the best compatibility: pretty much every diff/patch-like command should recognize it. The lack of context is a big disadvantage, though.
  • The context format is widely supported, though not every diff/patch-like command knows it. However, the advantage of being able to include context makes up for that.
  • The unified format features context, too, and is more compact than the context format, but is only supported by a single brand of diff/patch-like commands.

If you're sure that the patch will be used by GNU diff/patch users only, unified is the best choice, as it keeps your patch as compact as possible. In most other cases, however, the context format is the best choice. The normal format should only be used if you're sure there's a user without context format support.

Varying the amount of context lines

It is possible to make diff include fewer lines of context around the lines that should be changed. Especially in big patchfiles, this can strip away a lot of bytes and make your patchfile more portable. However, if you include too few lines of context, patch might not work correctly. Quoting the GNU diff man page: "For proper operation, patch typically needs at least two lines of context."

Specifying the amount of context lines can be done in multiple ways:

  • If you want to use the context format, you can combine it into one option, the -C option. Example:

    [rechosen@localhost ~]$ diff -C 2 originaldirectory/ updateddirectory/

    The above command would use the context format with 2 context lines.

  • If you want to use the unified format, you can combine it into one option, the -U option. Example:

    [rechosen@localhost ~]$ diff -U 2 originaldirectory/ updateddirectory/

    The above command would use the unified format with 2 context lines.

  • Regardless of which format you choose, you can specify the number of lines like this:

    [rechosen@localhost ~]$ diff -2 originaldirectory/ updateddirectory/

    However, this will only work if you also specify a context-supporting format. You'd have to combine this option either with -c or -u.
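The effect of the context setting is easiest to see in the hunk header. The sketch below compares the default of three context lines with a single context line; the file names are made up, and seq is assumed to be available (it is on most Linux systems).

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Nine numbered lines, with only line 5 changed in the new file.
seq 1 9 > old.txt
sed 's/^5$/five/' old.txt > new.txt

diff -u   old.txt new.txt > default.patch || true   # 3 lines of context
diff -U 1 old.txt new.txt > narrow.patch  || true   # 1 line of context

# The third line of a unified patch is the first hunk header.
default_header=$(sed -n '3p' default.patch)
narrow_header=$(sed -n '3p' narrow.patch)
printf '%s\n%s\n' "$default_header" "$narrow_header"
# Output:
# @@ -2,7 +2,7 @@
# @@ -4,3 +4,3 @@
```

With one context line the hunk covers only lines 4 to 6 instead of lines 2 to 8, which is where the savings in big patchfiles come from.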

Final words

Although this tutorial describes a lot of features and workings of diff and patch, it by no means describes everything you can do with these powerful tools. It is an introduction in the form of a tutorial. If you want to know more about these commands, you can read, for example, their manpages and GNU's documentation about diff and patch.

Well then, I hope this tutorial helped you. Thank you for reading! If you liked this tutorial, browse around this blog and see if there are more you like. Please help this blog to grow by leaving a link here and there, and let other people benefit from the growing amount of knowledge on this site. Thanks in advance and happy patching!


Patches from SVN

Creating and Applying Patches for Subversion


Subversion is a source code versioning system that allows developers to concurrently make changes to source code and reconcile any differences before a release is deployed.

To check out a copy of Habari into the current directory with Subversion from the command line:
svn checkout http://svn.habariproject.org/habari/trunk/htdocs .

If you want to check out tests as well, leave the htdocs off the directory path above.

To update an existing working copy to the latest code in the repository, execute this from the command line in the directory containing the working copy:

svn update

There are a number of graphical interfaces to interact with subversion for each operating system. These may be easier to use if you are not as familiar with running commands from the command line.

Common SVN Tasks

Merge Trunk Into a Branch

When working on a branch, the branch may become out of sync with trunk. To get your branch up to date with trunk, you must merge trunk into the branch. Doing so is a four-step process.

  1. First, you must find the last revision that trunk was merged into the branch. To do so, hunt back through the commit logs.
    svn log --stop-on-copy
  2. Next, you merge all the changes that have happened on trunk since that time into your branch. To do so, execute the following command, substituting "A" with the revision from step 1.
    svn merge -rA:HEAD http://svn.habariproject.org/habari/trunk/ .
  3. When merging, some conflicts may be created. These must be resolved before you can commit again. Information on resolving conflicts is available from the svn documentation.
  4. Once all conflicts have been resolved, just commit your changes to the branch.
    svn ci -m "merge trunk changes from rA:HEAD into branch"

Testing patches that are not yet released

What are diff files?

A diff file is the difference between a version of the source code in the repository and a working copy. It represents any changes, such as additional features or bug fixes, that have been made by a developer in a working copy. A diff file is generated using the svn diff command.

svn diff > descriptive_name_of_patch.diff

Applying diff files to your copy of habari

You can use the patch command to apply the diff file.

For Windows systems, you can obtain a copy of patch from the gnuwin32 project.

The common execution is patch -p0 < /path/to/file. This command will apply the differences that have been recorded in file to the appropriate file(s) in the current directory. As one diff file can contain changes for many files, and it preserves relative paths to sub-directories, the format of the diff file may affect where you need to invoke patch.
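The following sketch shows why the directory you invoke patch from matters. Everything in it is made up for illustration: the miniature checkout directory, the file contents, and the hand-written unified diff, which only stands in for real svn diff output and omits the extra header lines svn adds.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# A hypothetical miniature working copy with one file in a subdirectory.
mkdir -p checkout/system/classes
printf 'old line\n' > checkout/system/classes/site.php

# A hand-written unified diff standing in for real "svn diff" output.
# The path is relative to the top of the working copy.
cat > site.diff <<'EOF'
--- system/classes/site.php
+++ system/classes/site.php
@@ -1 +1 @@
-old line
+new line
EOF

# Run patch from the directory the paths in the diff are relative to.
cd checkout
patch -p0 < ../site.diff
patched_content=$(cat system/classes/site.php)
printf '%s\n' "$patched_content"
```

Running the same patch command one directory higher or lower would leave patch unable to find system/classes/site.php, because -p0 keeps the recorded relative path as it is.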

A practical example of diff and patch

predominantly from Scott Merrill's answer to Michael Bishop's question....

I modified a whole bunch of files for the Site class. When I was done editing, I changed to the htdocs directory of my checkout:

$ cd /home/skippy/code/habari/trunk/htdocs

I then executed svn status to see (and review) a list of all the files in my local copy that I have changed:

$ svn status

After double-checking the list of files, I executed svn diff to output a list of all the changes to all the files. I piped this output into a file:

$ svn diff > /tmp/site.diff

I attached the diff file to an email, or uploaded it to the Google issue tracker.

You read my email, and save my diff file. You transfer this file to your web host, which is running an SVN checkout of Habari. Because my diff was made from /trunk/htdocs, you should change to that directory in your local copy. (You may need to discern this from the contents of the diff file, if you don't know otherwise.)

Once in the target directory, you execute:

$ patch -p0 < /path/to/site.diff

The output of patch should report what it's doing:

skippy@skippy:~/code/habari/trunk/htdocs$ patch -p0 < /tmp/site.diff
patching file system/admin/footer.php
patching file system/admin/content.php
patching file system/admin/dashboard.php
patching file system/admin/header.php
patching file system/classes/url.php
patching file system/classes/site.php
patching file system/classes/plugins.php
patching file system/classes/feedbackhandler.php
patching file system/classes/post.php
patching file system/classes/options.php
patching file system/classes/installhandler.php
patching file system/classes/user.php
patching file system/classes/controller.php
patching file system/installer/db_setup.php
patching file index.php
patching file user/themes/k2/comments.php
patching file user/themes/k2/header.php

If there are any problems, patch will report which file(s) failed. Figuring out what failed for each file, and why, can sometimes be challenging.

When you've determined that the patch works (or not!), and you want to go back to your vanilla Habari, you use the Subversion revert command. Using the "-R" (recursive) flag lets you easily revert your entire checkout to a clean state:

skippy@skippy:~/code/habari/trunk/htdocs$ svn -R revert *
Skipped 'config.php'
Reverted 'index.php'
Reverted 'system/admin/footer.php'
Reverted 'system/admin/content.php'
Reverted 'system/admin/dashboard.php'
Reverted 'system/admin/header.php'
Reverted 'system/classes/url.php'
Reverted 'system/classes/site.php'
Reverted 'system/classes/plugins.php'
Reverted 'system/classes/feedbackhandler.php'
Reverted 'system/classes/post.php'
Reverted 'system/classes/options.php'
Reverted 'system/classes/installhandler.php'
Reverted 'system/classes/user.php'
Reverted 'system/classes/controller.php'
Reverted 'system/installer/db_setup.php'
Reverted 'user/themes/k2/comments.php'
Reverted 'user/themes/k2/header.php'

Scripts to Semi-Automate SVN usage

#1173 contains a few scripts to semi-automate patch generation and testing. They were written for use on a Mac. Savvy Linux users will know how to change hdiff as needed.

hreset

hreset

is shorthand for:

svn revert -R *

and proceeding with a mass clean-up of .orig, .rej, .diff and .patch files that might be around.
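The clean-up part could look roughly like the sketch below. This is not the actual hreset script from the ticket; the function name cleanup_patch_leftovers and the demo files are invented here, and the only thing taken from the description above is the list of extensions.

```shell
# Hypothetical helper; NOT the actual hreset script.
cleanup_patch_leftovers() {
    # $1 = directory to clean (defaults to the current directory)
    find "${1:-.}" -type f \( -name '*.orig' -o -name '*.rej' \
        -o -name '*.diff' -o -name '*.patch' \) -exec rm -f {} +
}

# Demo in a throwaway directory with made-up file names.
workdir=$(mktemp -d)
touch "$workdir/index.php" "$workdir/index.php.orig" \
      "$workdir/index.php.rej" "$workdir/123.diff"

cleanup_patch_leftovers "$workdir"

remaining=$(ls "$workdir")
printf '%s\n' "$remaining"
# Output:
# index.php
```

Be careful when pointing something like this at a real checkout: it deletes every file with one of these extensions below the given directory, including ones you may have wanted to keep.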

hdiff

hdiff
hdiff 123

are shorthand for:

svn diff
svn diff > ~/Desktop/123.diff

hpatch and htest

htest is shorthand for calling hreset and hpatch immediately after it.

hpatch https://trac.habariproject.org/habari/attachment/ticket/132/132.diff
hpatch 123
hpatch 123.2.diff
hpatch 123 456.diff

are shorthand for:

curl -s --connect-timeout 3 https://trac.habariproject.org/habari/raw-attachment/ticket/123/123.diff -o 123.diff && patch -p0 -l -i 123.diff && rm 123.diff
curl -s --connect-timeout 3 https://trac.habariproject.org/habari/raw-attachment/ticket/123/123.diff -o 123.diff && patch -p0 -l -i 123.diff && rm 123.diff
curl -s --connect-timeout 3 https://trac.habariproject.org/habari/raw-attachment/ticket/123/123.2.diff -o 123.2.diff && patch -p0 -l -i 123.2.diff && rm 123.2.diff
curl -s --connect-timeout 3 https://trac.habariproject.org/habari/raw-attachment/ticket/123/456.diff -o 456.diff && patch -p0 -l -i 456.diff && rm 456.diff