When I started this blog I gave my categories the very descriptive IDs of 0, 1, 2 etc. (all categories have a name and an internal ID). In hindsight, not a very wise decision. The problem with changing them has all along been that I didn’t want to go through all the files manually and change the category IDs (Pebble, the blogging software I use, stores the posts in XML files).
I know it doesn’t sound like a big deal: simply stop Tomcat, do a search and replace in the XML files and I would be laughing. The problem was that the blog runs on a terminal-only Linux box, so there was no easy GUI for the job, and I had never done anything like this in a terminal. On Windows I normally use UltraEdit for jobs like this.
I could probably have used the sed/awk programs on Linux but I have never really played around with these and they didn’t look very approachable to me. After the split of the blog and my recent dive into RegEx I thought I’d look into Perl and see whether there was an easy way to do it. Perl should be very easy to use for jobs like this since it understands RegEx natively.
Well – it was easy and combining the results from the find command with Perl was the solution.
Using find I could write a simple command to give me a recursive list of all the XML files (posts are stored in a <YYYY>/<MM>/<DD> directory structure). The pattern is quoted so that the shell hands it to find unchanged instead of trying to expand it itself.
#> find . -name '[0-9]*.xml'
./2004/11/21/12732398232.xml
./2004/11/23/39283232390.xml
...
...
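Whether the -name pattern is quoted can change what find returns. The sketch below builds a throwaway directory with made-up file names (1.xml and 2.xml are purely illustrative) and runs the same search both ways. It assumes a POSIX-style shell such as bash.

```shell
# Throwaway directory tree; the file names are invented for the demo.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/2004/11/21"
touch "$demo_dir/1.xml" "$demo_dir/2004/11/21/2.xml"
cd "$demo_dir" || exit 1

# Unquoted: the shell expands the glob to 1.xml before find starts,
# so find only ever looks for files called exactly 1.xml.
unquoted=$(find . -name [0-9]*.xml | sort)

# Quoted: find receives the pattern itself and applies it in every directory.
quoted=$(find . -name '[0-9]*.xml' | sort)

printf 'unquoted:\n%s\nquoted:\n%s\n' "$unquoted" "$quoted"
```

The unquoted form only behaves as intended when no file in the current directory happens to match the pattern, so quoting is the safer habit.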
Using Perl I could write a RegEx to change the text <category>3</category> to <category>mythtv</category> in the file foo.xml in the current directory. It may look a little confusing because of the backslashes: the / in each closing tag has to be escaped, since / is also the character that separates the parts of the substitution. (The < and > characters need no escaping in Perl.)
perl -pi -e 's/<category>3<\/category>/<category>mythtv<\/category>/' foo.xml
Nice. Combining the two with backticks (the ` character, which makes the shell run the enclosed command and insert its output) I could pass the list of files from the find command to Perl as arguments.
perl -pi -e 's/<category>3<\/category>/<category>mythtv<\/category>/' `find . -name '[0-9]*.xml'`
If I had wanted to save the original files as backup copies with the .bak extension I could have changed the command slightly (the addition is the .bak right after -pi).
perl -pi.bak -e 's/<category>3<\/category>/<category>mythtv<\/category>/' `find . -name '[0-9]*.xml'`
The syntax of the actual Perl RegEx is quite simple and consists of a command, the pattern to use for finding stuff, the pattern for the replacement followed by optional processing instructions.
s/<find pattern>/<replacement pattern - probably using back-references>/g
The ‘s’ at the start is for substitute (i.e. replace) and the optional ‘g’ at the end (I didn’t use it in my command since there is only one category-tag) can be used to do global replacements if you want to replace all occurrences rather than just the first one on each line.
If you want to get going using RegEx I would really suggest the book “Mastering Regular Expressions” from O’Reilly by Jeffrey E. F. Friedl.