Redirects for the rest of us

I recently migrated my blog from TypePad to WordPress, following these directions and enlisting the help of a friend. This post is for others who make the switch (worth it!) and are looking for a solution to the “trailing .html” permalink redirect issue.

The permalinks for all of my old TypePad posts looked like this in the eyes of Google:

After the migration to WordPress (following the directions above), the new URLs looked identical but lacked the trailing “.html” extension:

Search results were sending people to the old links with the trailing .html extension, and they were getting my 404 page. This is a common issue for WordPress pilgrims. Most solutions involve writing regular expressions and working with mod_rewrite, and I didn’t have time to learn the nuances of mod_rewrite. I could also manually enter redirects in my .htaccess file. They look like this:

Redirect permanent /2007/01/title_of_post.html

Manually entering these redirects for hundreds of posts is just crazy; nobody has time for this. Luckily, the way I made the migration left me with two very useful XML files that were exported from WordPress at different stages of the migration: one with links in the original URL structure, with trailing .html extensions, and a second without the trailing extension.

Now, I could write a simple Python script to extract all of the old and new permalinks from these two files. But there is an even easier solution: DabbleDB.

I signed up for a free trial month and imported the first XML file. DabbleDB automatically parsed it into separate tables based on about 20 different fields in the file, e.g. title, author, comments, etc. One of the fields is called “link”. These are the permalinks I need. I exported these links as a comma-separated values (CSV) file, then repeated the process for the other XML file.
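For anyone who would rather script this step, here is a rough Python sketch of the same link extraction. It assumes the export is an RSS-style XML file in which each post is an `<item>` element with a `<link>` child; the function name, the `SAMPLE_EXPORT` string, and the example.com URLs are made up for illustration and are not from my actual files.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature export; a real file has many more fields per item.
SAMPLE_EXPORT = """<rss version="2.0">
  <channel>
    <link>http://example.com</link>
    <item>
      <title>Second post</title>
      <link>http://example.com/2007/02/second_post.html</link>
    </item>
    <item>
      <title>Title of post</title>
      <link>http://example.com/2007/01/title_of_post.html</link>
    </item>
  </channel>
</rss>"""


def extract_links(xml_text):
    """Return the permalink of every <item>, sorted alphabetically."""
    root = ET.fromstring(xml_text)
    links = []
    # Only look inside <item> elements so the site-wide <link> is skipped.
    for item in root.iter("item"):
        link = item.find("link")
        if link is not None and link.text:
            links.append(link.text.strip())
    return sorted(links)


if __name__ == "__main__":
    for url in extract_links(SAMPLE_EXPORT):
        print(url)
```

Running it once per export file would give the same two lists of links that I exported as CSV.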

Now I’ve got two alphabetized lists of links, one with the old URL structure and one with the new URL structure. I’m inches away from generating a long list of redirects with no effort. No manual entry needed. No regular expressions and no mod_rewrite. Phew!

For the old links, I’m trying to go from this…

…to this:

Redirect permanent /2007/01/title_of_post.html

So, I opened my list of old links in a text editor (TextWrangler) and did a search (””) and replace (”Redirect permanent /”). Now I’ve got a list of all my old links looking like this:

Redirect permanent /2007/01/title_of_post.html

Now I’ve got both sides of the redirect equation in separate files. I could write a simple script to parse these two files into the list of redirects. But there’s no need for such extravagance; Excel will do just fine.

I pasted the list of reformatted old links in the first column and the new list of links in the second column. In a third column, I entered this formula:

=A1&" "&B1
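If you don’t have a spreadsheet handy, the same search-and-replace plus concatenation can be sketched in a few lines of Python. This is only an illustration: `build_redirects` is a hypothetical name, the example.com URLs are placeholders, and it assumes the two lists are already in matching order, just like the two spreadsheet columns.

```python
from urllib.parse import urlsplit


def build_redirects(old_links, new_links):
    """Pair each old link with the new link in the same position.

    Assumes both lists are in matching order (row 1 goes with row 1).
    """
    if len(old_links) != len(new_links):
        raise ValueError("old and new link lists must be the same length")
    lines = []
    for old, new in zip(old_links, new_links):
        # Keep only the path of the old link, e.g. /2007/01/title_of_post.html
        old_path = urlsplit(old).path
        lines.append(f"Redirect permanent {old_path} {new}")
    return lines


if __name__ == "__main__":
    old = ["http://example.com/2007/01/title_of_post.html"]
    new = ["http://example.com/2007/01/title_of_post"]
    print("\n".join(build_redirects(old, new)))
```

One thing to watch with either approach: sorting each list separately does not always keep the rows aligned, because the “.html” suffix can change where a link sorts, so it is worth spot-checking a few pairs before pasting.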

Now I’ve got my list of redirects. I pasted this column into my .htaccess file. Then I went to Google and did this search: -index.html

Now I could check that my redirects were working by clicking on my links (the “-index.html” removed category links, which I haven’t fixed yet).

A few 404 errors remain. These are posts that I gave a title and saved as a draft (which created the permalink), then retitled before publishing in TypePad.
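These stragglers could probably be spotted up front rather than by clicking around. Here is a minimal sketch, assuming an old link should match a new one once the trailing “.html” is dropped; `find_unmatched` and the example.com URLs are hypothetical.

```python
from urllib.parse import urlsplit


def find_unmatched(old_links, new_links):
    """Return old links whose path, minus '.html', matches no new link."""
    # Normalise new paths by dropping any trailing slash.
    new_paths = {urlsplit(link).path.rstrip("/") for link in new_links}
    unmatched = []
    for old in old_links:
        path = urlsplit(old).path
        if path.endswith(".html"):
            path = path[: -len(".html")]
        if path not in new_paths:
            unmatched.append(old)
    return unmatched


if __name__ == "__main__":
    old = [
        "http://example.com/2007/01/title_of_post.html",
        "http://example.com/2007/03/draft_title.html",
    ]
    new = [
        "http://example.com/2007/01/title_of_post",
        "http://example.com/2007/03/final_title",
    ]
    for url in find_unmatched(old, new):
        print(url)
```

Anything it lists is a candidate for a hand-written redirect.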

Even if I have to manually enter a few links in my .htaccess file, this whole process took 20 minutes. I saved a ton of time. No regular expressions or mod_rewrite. No headaches. No mess, no fuss. Not bad. Plus I finally got a chance to play with DabbleDB. What a great tool.


2 Responses to “Redirects for the rest of us”

  1. Rick on June 24th, 2007 12:49 pm

    I didn’t know about DabbleDB. It looks quite interesting.

    Glad you got that sorted out.

  2. Deepak on June 24th, 2007 9:25 pm

    I like the new site (hadn’t visited outside RSS for a while). DabbleDB is cool isn’t it :)
