Recover mistyped traffic: Redirect .htm to .html

Have you ever copied a URL and left out the last character? This seems to happen a lot to the “l” (lowercase L) in “.html”, possibly because it looks like the text insertion cursor “|”. The problem probably comes down to copy-paste mistakes. Regardless, I and others do it all the time, so let us fix that.

I noticed that about 8 % of the requests ending up on my sites’ HTTP 404 Not Found error pages were for variants of addresses missing the last character. I found many requests ending in “.htm” instead of “.html”, and exactly zero requests ending in “.ht”.
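If you want a similar figure for your own site, a rough sketch like the one below may help. It assumes access logs in the common or combined log format; the function name and the sample log lines are made up for illustration, so adjust the pattern to whatever your server actually writes.

```python
import re
from collections import Counter

# Extracts the request path and status code from a common/combined log line.
# Assumption: your server logs in that format; adjust the pattern otherwise.
LOG_LINE = re.compile(r'"(?:GET|HEAD) (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3})')

def count_truncated_404s(lines):
    """Count 404 responses overall and those ending in .htm or .ht."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.search(line)
        if not m or m.group("status") != "404":
            continue
        counts["total_404"] += 1
        # Ignore any query string and compare case-insensitively.
        path = m.group("path").split("?", 1)[0].lower()
        if path.endswith(".htm"):
            counts[".htm"] += 1
        elif path.endswith(".ht"):
            counts[".ht"] += 1
    return counts

# Hypothetical sample lines; read your real access log instead.
sample = [
    '203.0.113.5 - - [01/Jan/2020:00:00:00 +0000] "GET /post.htm HTTP/1.1" 404 153',
    '203.0.113.5 - - [01/Jan/2020:00:00:01 +0000] "GET /post.html HTTP/1.1" 200 5120',
    '203.0.113.9 - - [01/Jan/2020:00:00:02 +0000] "GET /missing HTTP/1.1" 404 153',
]
print(count_truncated_404s(sample))
```

Dividing the “.htm” count by the total number of 404 responses gives the share of error traffic that a redirect could recover.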

Servers can be smart given proper configuration. Below are instructions for Nginx and Apache servers on how to redirect all requests missing the last character of the extension to their proper destination.

If you for any reason have one or more documents ending in .htm instead of .html, these will become inaccessible. Rename them .html before installing the below configurations.
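To check whether any such documents exist before you start, something along these lines could work. It is only a sketch: the function name is hypothetical, and the throwaway directory in the demonstration stands in for your real document root.

```python
import tempfile
from pathlib import Path

def find_htm_files(root):
    """Return files under root whose extension is .htm (case-insensitive)."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() == ".htm"
    )

# Demonstration on a throwaway directory; point it at your own document root instead.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "sub").mkdir()
    for name in ("index.html", "old.htm", "sub/LEGACY.HTM"):
        (Path(tmp) / name).write_text("")
    for path in find_htm_files(tmp):
        print(path.relative_to(tmp))
```

The comparison ignores case because the Nginx examples below use a case-insensitive match (~*), so a file named LEGACY.HTM would be affected too.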

Configuration example for Nginx

Redirect all requests ending in .htm to .html:

location ~* "\.(htm)$" {
  return 307 "${scheme}://${server_name}${uri}l";
}

Same as above but integrate with Google Analytics:

location ~* "\.(htm)$" {
  set $args "$args&utm_source=${uri}&utm_medium=traffic_recovery&utm_campaign=redirect_add_trailing_l";
  return 307 "${scheme}://${server_name}${uri}l?${args}";
}

Configuration example for Apache

Redirect all requests ending in .htm to .html:

RewriteEngine On
RewriteRule "\.(htm)$" "%{REQUEST_SCHEME}://%{SERVER_NAME}%{REQUEST_URI}l" [QSA,R=307,L]

Same as above but integrate with Google Analytics:

RewriteEngine On
RewriteRule "\.(htm)$" "%{REQUEST_SCHEME}://%{SERVER_NAME}%{REQUEST_URI}l?utm_source=%{REQUEST_URI}&utm_medium=traffic_recovery&utm_campaign=redirect_add_trailing_l" [QSA,R=307,L]

I’m using HTTP 307 Temporary Redirects (which preserve the request method) instead of HTTP 301 Permanent Redirects in the examples. Browsers may cache a 301 indefinitely, whereas a 307 makes them try the original address again each time before following the redirect. This is a good choice if you ever want to place a file at a .htm address instead of a .html address.

The Google Analytics integration, as shown above, sets the source to the original (L-less) URI, medium to “traffic_recovery”, and campaign to “redirect_add_trailing_l”. If you want to do similar recovery operations (like redirecting .ht to .html), simply change the campaign name so you can keep track of which redirect the traffic is coming from. You can adjust the examples to work with other analytics products.
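To see what a tagged redirect target looks like before touching the server, here is a small sketch that mirrors the logic of the configurations. The function name and the example.com address are hypothetical. Unlike the server configurations, this version percent-encodes the parameter values, so the original path shows up as %2Fpost.htm in the query string.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def tagged_redirect(url, campaign="redirect_add_trailing_l"):
    """Return the .html target with tracking parameters, or None if url does not end in .htm."""
    parts = urlsplit(url)
    if not parts.path.lower().endswith(".htm"):
        return None
    # Keep any existing query parameters, then append the tracking parameters.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query += [
        ("utm_source", parts.path),            # the original, L-less path
        ("utm_medium", "traffic_recovery"),
        ("utm_campaign", campaign),
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path + "l", urlencode(query), ""))

print(tagged_redirect("https://example.com/post.htm?x=1"))
```

Passing a different campaign value, for example for a .ht to .html recovery, keeps the redirects distinguishable in your reports.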

Update: .htmll to .html

A slight variation: Control + L focuses the address field in most browsers. It’s easy to imagine a Control + L, L, Control + A, Control + C sequence resulting in an extra trailing “l”. At least, that’s my guess based on what I’m seeing in my logs. I may also have shared one of those broken links myself. Here is a similar solution that redirects .htmll to .html:

Configuration example for Nginx

Redirect all requests ending in .htmll to .html:

location ~* "^(/.*\.html)l$" {
  set $newuri "$1";
  return 307 "${scheme}://${server_name}${newuri}";
}

Same as above but integrate with Google Analytics:

location ~* "^(/.*\.html)l$" {
  set $args "$args&utm_source=${uri}&utm_medium=traffic_recovery&utm_campaign=redirect_remove_trailing_l";
  set $newuri "$1";
  return 307 "${scheme}://${server_name}${newuri}?${args}";
}

Configuration example for Apache

Redirect all requests ending in .htmll to .html:

RewriteEngine On
RewriteRule "^(/.*\.html)l$" "%{REQUEST_SCHEME}://%{SERVER_NAME}$1" [QSA,R=307,L]

Same as above but integrate with Google Analytics:

RewriteEngine On
RewriteRule "^(/.*\.html)l$" "%{REQUEST_SCHEME}://%{SERVER_NAME}$1?utm_source=%{REQUEST_URI}&utm_medium=traffic_recovery&utm_campaign=redirect_remove_trailing_l" [QSA,R=307,L]
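Before deploying either rule, it can be worth trying the patterns against a handful of paths. The sketch below does that in Python rather than on the server; the function name is hypothetical, and Python’s regular expressions are only an approximation of the engines Nginx and Apache use, although these simple patterns should behave the same. The patterns here escape the dot so that it matches a literal period and not any character.

```python
import re

# Case-insensitive, like the ~* location match in the Nginx examples.
ADD_L = re.compile(r"\.htm$", re.IGNORECASE)
REMOVE_L = re.compile(r"^(/.*\.html)l$", re.IGNORECASE)

def recover(path):
    """Return the corrected path for a mistyped address, or None if no rule applies."""
    m = REMOVE_L.match(path)
    if m:
        return m.group(1)      # drop the extra trailing "l"
    if ADD_L.search(path):
        return path + "l"      # add the missing trailing "l"
    return None

for candidate in ("/post.htm", "/post.htmll", "/post.html", "/ahtm"):
    print(candidate, "->", recover(candidate))
```

The last candidate, /ahtm, is there to show that a path merely ending in the letters “htm” is left alone when the dot is escaped.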