Using canonical tag to avoid duplicate content penalties

Having multiple URLs that host nearly identical content carries several disadvantages. First, the search ranking score is split between the URLs. Second, if the duplication is widespread, search engines may elect to penalize the pages as duplicate content, which is exactly how they will interpret the situation. These are only the most notable drawbacks.

The canonical URL tag may be helpful in these situations:

  • A website uses a slightly different GET variable to display the same content with minor variations. For example, http://ww2db.com/person_bio.php?person_id=1 displays Admiral Isoroku Yamamoto’s biography, and http://ww2db.com/person_bio.php?person_id=c1 displays the same biography, presented specifically as that of a Japanese officer. The latter may point to the former as its canonical. This is also helpful where session IDs, login IDs, and the like are carried in the URL as GET variables.
  • A website uses a different URL for its mobile content or print-only content. For example, if http://www.example.com/mobile shows the same content as http://www.example.com except for different page layout, then the mobile page may point to the main page as its canonical.

Below is an example of how to establish a canonical. Note that the tag resides in the HEAD section of the web page.


<html>
<head>
	<link rel="canonical" href="http://ww2db.com/person_bio.php?person_id=1" />
</head>
<body>
	<p>This is the biography for the Japanese officer Isoroku Yamamoto.</p>
</body>
</html>

As the above example shows, the biography of Admiral Yamamoto as a Japanese officer points to his regular biography page as its canonical, so all of the search-ranking information for the variant version of the page flows back to the original version.

It works very similarly to a 301 redirect, although it is neither as comprehensive nor as absolute. Nevertheless, it can be helpful because it causes fewer headaches for web developers, especially on a large website.

It is important to note that the canonical URL must reside on the same root domain as the variant. For example, the Japanese officer version of the biography listed above cannot point to http://www.combinedfleet.com/officers/Isoroku_Yamamoto as its canonical, because it is on a different domain.

The canonical URL tag is a hint for search engine crawlers, not a directive, so there is no guarantee that search engines will follow it. However, major search engines such as Google, Yahoo!, and MSN Live Search have stated that they will respect this hint.
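If you want to verify which canonical URL a page actually declares, a short script can extract the tag. Below is a minimal sketch using Python's standard html.parser module; the sample markup mirrors the example above, and the class name is my own invention.

```python
from html.parser import HTMLParser

class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag seen."""
    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # html.parser also routes self-closing tags (<link ... />) here
        if tag == "link":
            d = dict(attrs)
            if (d.get("rel") or "").lower() == "canonical" and self.canonical is None:
                self.canonical = d.get("href")

page = '''<html><head>
<link rel="canonical" href="http://ww2db.com/person_bio.php?person_id=1" />
</head><body><p>Biography</p></body></html>'''

finder = CanonicalFinder()
finder.feed(page)
print(finder.canonical)  # http://ww2db.com/person_bio.php?person_id=1
```

The same idea works for auditing a whole site: fetch each variant page and confirm that its declared canonical points where you expect.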

HTML meta tags guide with focus on SEO

Meta tags are HTML elements used to provide metadata (data about data) about a web page. They are overwhelmingly used by search engine crawlers rather than by users, and thus are often viewed as a vital component of search engine optimization (SEO). Meta tags should reside in the HEAD portion of your HTML code. Below are some of the meta tags, particularly those important in SEO work.

Keywords

This is one of the most heavily used meta tags for SEO purposes. You should provide a few keywords relevant to the particular page, and only the relevant ones. Do not include keywords merely because they are popular; the long-term negative effects of keyword stuffing far outweigh any short-term benefits you may gain.


<meta name="keywords" content="cooking, boiling, egg, recipe" />

Robots

As the name suggests, this tag is not for humans. Instead, it provides some instructions for search engine crawlers. Some possible values are as follows.

  • index: Recommends that the search engine include this page in its index. This is somewhat redundant, since the crawler normally sets out to index your page anyway, but it does not hurt to include it.
  • noindex: Recommends that the search engine exclude this page from its index.
  • follow: Recommends that the search engine crawler follow all links found on this page. Like index, this is somewhat redundant, since following links is a normal function of a crawler regardless of this instruction.
  • nofollow: Recommends that the search engine crawler not follow any links found on this page.
  • noarchive: Recommends that the search engine not cache a copy of the page in its archive.
  • nosnippet: Recommends that the search engine not cache a copy of the page in its archive (the same effect as noarchive) and also not display any snippet of the page’s content on the search results page. This may be useful if you are under strict intellectual property limitations that forbid displaying any of the page’s content anywhere other than your site.
  • noodp: Recommends that the search engine not place the Open Directory Project description of your page next to the search results.
  • none: Equivalent to “noindex, nofollow”.

Should you wish to use more than one value, separate them with commas. Spaces are ignored, and the values are processed case-insensitively (e.g., “noindex” and “NoIndex” are considered the same). Some examples are listed below.


<meta name="robots" content="none" />
<meta name="robots" content="index, nofollow" />
<meta name="robots" content="nofollow, nosnippet, noodp" />
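To make the parsing rules above concrete (comma-separated, whitespace-insensitive, case-insensitive, with none expanding to noindex and nofollow), here is a small hypothetical Python helper. It is only a sketch of how a crawler might normalize the tag's value, not any real crawler's code.

```python
def parse_robots_meta(content):
    """Normalize a robots meta content string into a set of directives."""
    directives = set()
    for token in content.split(","):
        token = token.strip().lower()  # spaces and letter case are ignored
        if token == "none":            # "none" is shorthand for two directives
            directives.update({"noindex", "nofollow"})
        elif token:
            directives.add(token)
    return directives

print(parse_robots_meta("NoIndex,  NOFOLLOW") == {"noindex", "nofollow"})  # True
print(parse_robots_meta("none") == parse_robots_meta("noindex, nofollow"))  # True
```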

Revisit-After

This allows you to recommend that the crawler return to the same page for another round of indexing after a given interval. Like most of the other tags here, this is only a recommendation, so you should not expect it to be followed exactly. In fact, most major search engines ignore this meta tag entirely; they re-index your pages whenever they deem it necessary.


<meta name="revisit-after" content="15 days" />

Description

The description meta tag provides a space for a very brief description of the current page. Unlike the tags mentioned thus far, this is not a recommendation to the crawlers; instead, some search engines display this text as the brief description on the search results page, so it is important to provide something concise and to the point so that users will know whether your page offers the information they want. If you are unsure what to put here, reuse whatever you have in the title tag (not discussed in this article) for that page, as shown in the example below.


<meta name="description" content="How to boil an egg" />

Content-Type

This informational entry describes the content type of the page. There are too many possible values to list them all, but the examples below illustrate the usage.

Example 1: Notes that the page is an HTML page using the iso-8859-1 character set.


<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />

Example 2: Notes that the page is actually an .xls file (Microsoft Excel spreadsheet), so the browser should hand it to the appropriate program instead of trying to interpret it as HTML.


<meta http-equiv="Content-Type" content="application/vnd.ms-excel" />

Content-Language

This provides information on the language your page is written in. The following three examples are for American English, Traditional Chinese, and German, respectively.


<meta http-equiv="content-language" content="en-us" />
<meta http-equiv="content-language" content="zh-tw" />
<meta http-equiv="content-language" content="de" />

The two meta tags below are not really related to SEO, but it might be useful to know them in any case.

Generator

On the surface, this tag records the software package used to generate the page, which might matter if the package emits any proprietary markup. However, since HTML is largely standardized, the tag is not very useful, even though it remains popular because many software packages still insert it. One might cynically say this tag is little more than advertisement for those packages.


<meta name="generator" content="HTML Software v0.1.2.3" />

Pragma

The only value I know of for pragma is no-cache, which recommends that visitors’ browsers not cache the page locally. This is useful if you have a page whose content changes very frequently (i.e., more than once per visitor within the same visit) while the URL stays the same. Try this meta tag if your users complain that they always see stale data but you cannot reproduce the problem on other machines.


<meta http-equiv="pragma" content="no-cache" />

Additional Information

Author, contact_addr, and copyright meta tags allow you to provide some information about the page. They are purely informational and do not provide any recommendations to the crawlers.


<meta name="copyright" content="Copyright © 2008 C. Peter Chen" />
<meta name="author" content="C. Peter Chen" />
<meta name="contact_addr" content="email@me.here" />

Performing 301 Redirect with Apache, IIS, PHP, and others

Apache

If you use Apache with the mod_alias module enabled (the Redirect directives below belong to mod_alias), a 301 redirect may be set up by editing the .htaccess file found at the root directory of your website. The following is a sample of what you might add to your .htaccess file; in this example, we redirect any visits to /blah/pictures.html to the page’s new location, http://ww2db.com/photo.php.

Redirect 301 /blah/pictures.html http://ww2db.com/photo.php

Sometimes you may change your platform’s technology, for example from static HTML pages to dynamic PHP pages. With the sample rule below, you can redirect all HTML pages to PHP pages of the same name (note that the dot must be escaped so it matches a literal period).

RedirectMatch 301 (.*)\.html$ http://ww2db.com$1.php
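The RedirectMatch rule is an ordinary regular expression: (.*) captures everything before the .html extension, and $1 re-inserts it ahead of .php. You can sanity-check such a pattern with any regex engine; here is a rough sketch of the same substitution in Python (where the back-reference is written \1 instead of $1):

```python
import re

# Same idea as the RedirectMatch rule: capture the path, swap the extension.
pattern = r"(.*)\.html$"   # dot escaped so it matches a literal period
target = r"http://ww2db.com\1.php"

old_url = "/blah/pictures.html"
new_url = re.sub(pattern, target, old_url)
print(new_url)  # http://ww2db.com/blah/pictures.php
```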

Looking to use .htaccess to redirect an entire site because you have moved from one domain to another? This is what you want to do:

Redirect 301 / http://ww2db.com/

PHP

If your site’s pages are PHP files, redirection is fairly easy. If you moved the /old/pictures.php page to the new location http://ww2db.com/photo.php, modify /old/pictures.php by adding the following PHP snippet at the top of the code, i.e., before any HTML is written to the screen.

header("HTTP/1.1 301 Moved Permanently");
header("Location: http://ww2db.com/photo.php");
exit; // stop execution so no HTML is sent after the redirect headers

IIS

If you are using Microsoft’s Internet Information Services (IIS), perform the following steps to set up a 301 redirect.

  • Right click on “My Computer”, go to “Manage”, “Services and Applications”, and then “Internet Information Services (IIS) Manager”.
  • Under “Web Sites”, look for the particular folder or file you wish to redirect. Right click and select “Properties”.
  • Click the “A redirection to a URL” radio button. In the section that appears below, enter the new URL in the “Redirect to:” text box, and make sure to check the “A permanent redirection for this resource” checkbox.
  • Click OK to complete the process.

ASP and ASP.NET

Classic Active Server Pages (ASP) web pages can be redirected in a manner very similar to the PHP example above.

Response.Status="301 Moved Permanently"
Response.AddHeader "Location","http://ww2db.com/photo.php"

ASP.NET redirect code is very similar to its ASP predecessor’s.

private void Page_Load(object sender, System.EventArgs e) {
  Response.Status="301 Moved Permanently";
  Response.AddHeader("Location","http://ww2db.com/photo.php");
}

JSP

Redirection for Java Server Pages (JSP), again, is similar to the other scripting languages we have seen above.

response.setStatus(301);
response.setHeader("Location","http://ww2db.com/photo.php");
response.setHeader("Connection", "close");

CGI/Perl

If you are using Perl-based CGI code with the CGI.pm module, this is the code you may wish to deploy to perform a 301 redirect.

use CGI;
my $q = new CGI;
# CGI.pm's redirect() defaults to a 302, so the 301 status must be given explicitly
print $q->redirect(-uri => "http://ww2db.com/photo.php", -status => 301);

Cold Fusion

ColdFusion pages can also be redirected by working with the headers.

<cfheader statuscode="301" statustext="Moved Permanently">
<cfheader name="Location" value="http://ww2db.com/photo.php">

Ruby on Rails

Not surprisingly, Ruby on Rails pages are redirected in a very similar manner.

def old_action
  headers["Status"] = "301 Moved Permanently"
  redirect_to "http://ww2db.com/photo.php"
end
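Whatever platform you use, the mechanics are the same: set the 301 status and a Location header before any body output. As one more language-neutral illustration, here is a minimal WSGI application in Python (my own sketch, not tied to any of the frameworks above) that performs the same redirect:

```python
def redirect_app(environ, start_response):
    """Answer every request with a 301 pointing at the page's new home."""
    start_response("301 Moved Permanently",
                   [("Location", "http://ww2db.com/photo.php")])
    return [b""]  # empty body; browsers follow the Location header

# Exercise the app with a fake start_response to inspect what it would send.
captured = {}
def fake_start_response(status, headers):
    captured["status"] = status
    captured["headers"] = dict(headers)

body = redirect_app({}, fake_start_response)
print(captured["status"])               # 301 Moved Permanently
print(captured["headers"]["Location"])  # http://ww2db.com/photo.php
```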

How to get more external links

External links are one of the most important factors when considering a site’s rank within search engines, particularly with the major engines (Google and Yahoo!). A site with a large number of quality links makes a statement to the search engines that the site is an authoritative resource on its subject area and is usually rewarded with stronger rankings on its targeted keywords. Links from sites like CNN.com, NBC.com, AOL.com, or MSN would create this type of presence.

So, how do you get a link from the BBC? It would be nice if they simply linked to anyone who asked, but it isn’t that easy. You need to offer a product or say something newsworthy so that they have reason to write an article about you. Doing so would almost certainly earn your site hundreds of other links, since blogs and smaller sites also link to whatever you’re offering. But getting the attention of a behemoth like the BBC is very difficult, so I recommend seeking out other opportunities to increase your link popularity:

  • Check out who’s linking to your competitors – Usually you will find that sites that link to your competition also offer an opportunity for similar sites. Of the big engines, Yahoo! displays the largest list of external links and lets you export them into a spreadsheet for further analysis. So, hit Yahoo! and type in link:http://ww2db.com, then click “Inlinks” for the full list. If you find any sites that appear to offer a strong linking opportunity (judging by that site’s link quantity or PageRank), feel free to contact them. Links from sites with “.edu” or “.gov” extensions are almost always valuable, so seek those first.
  • Create a blog – If you are having trouble creating unique and clever content – some products don’t lend themselves that way – then consider a blog. Creating a separate domain that is cross-linked with your main site offers a great deal of SEO value, and a blog is an excellent platform for either commenting on present news or repurposing other content you find surfing around. A few places to look for the latter include article repository sites like EzineArticles.com and GoArticles.com. For an interface, my preference is WordPress since it provides the most customization and offers the excellent All-In-One SEO Pack and aLinks plugins.
  • Look at online directories – There are a few directories which provide value for SEO. Those you should look at include the Yahoo! Directory ($299/year), DMOZ.org (free, but tough to get into), BOTW.org ($90/year), and Business.com ($299/year).
  • Buy links directly – Some of the very competitive online spaces take link acquisition a step further by purchasing links from blogs, newspaper sites, and other resources. There are services that act as a third party in these situations, such as TextLinkAds.com and TextLinkBrokers.com, but the search engines are aware of these services and can sometimes penalize sites that participate in link brokering schemes. My suggestion is to contact the top sites that rank for the terms that you want to, and see if they have a Links page where you can purchase a link. This indirect way of gaining links usually falls below Google and Yahoo!’s radar.

Attaining links is the key to the top of the SERPs (search engine result pages). Once you’ve got some solid content and the rest of your site set up, get going on getting them. It can be a daunting process, but the more sites you contact, the more links you’ll end up with.

And speaking of which, I’d like to take this time and snag a freebie for my online games site.

This article was written by Russ Ain, Search Analyst for Overdrive. He’s an avid participant of email management and may consider his Boston PR agency as a write-in candidate for the next election.

Keyword Research – Why and How

Prior to composing content for a site or writing meta data, you need to determine the keywords that your copy will revolve around. Sure, you know that your site is an e-commerce store for plasma televisions, but which keywords should you focus on? Are people searching by brand? And, if so, are they typing “pioneer plasma tv”, “pioneer plasma television”, or simply “pioneer plasma”? This is what keyword research will help you establish – a fairly healthy list of keywords that you can make sure your content and external links include.

So, where do you start? Well, I’ll go through four top keyword research tools that I use – in order of preference. They aren’t all free, but if you are serious about SEO, it may be worthwhile to expand your budget for this process.

Let’s begin:

  • Google Keyword Research Tool (free) – This is an excellent source of keyword popularity. Why? Because the data comes directly from Google! It doesn’t provide exact annual search counts, but it gives you a range (i.e., 0.06, 0.13, 0.2, etc.). The way it works: you enter a list of keywords (max. 40), and it spits out stats for those as well as other terms that Google thinks are related. The dropdown boxes provide a host of options, but I usually use the “Avg Search Volume” metric and then choose “Exact” from the right-hand menu – otherwise, it gives you traffic for a phrase via Broad or Phrase match*.
  • Keyword Discovery ($600/year) – This is the better of the two paid tools, despite its higher expense. Their tagline, in terms of where they pull their stats from, is “a variety of search indices and log files” – whatever that means. However, it is a well-regarded service, and it also provides concrete numbers for search volume. They do offer a free service, but it only provides search data for the top 10 most popular keywords they deem similar to the entered term. When you pay, it provides data for hundreds of phrases, and you can export this data into Excel. Another benefit of paying is that you can import up to 500 words and check their numbers. I use this service’s metrics in conjunction with the aforementioned Google tool for a second opinion (plus it offers hard numbers, which the boss will prefer!).
  • Google Trends (free) – Another useful Google tool which provides a graphical comparison of search traffic for multiple keywords. It also designates their popularity based on country. Provided that you are entering fairly popular terms, the results are very useful for comparing potential traffic. There is, however, nothing you can export and also no actual numbers associated with the traffic.
  • Wordtracker ($329/year) – Another tool with both a free and paid version. This one specifically tells you that it gets its data from 90 days worth of Metacrawler and Dogpile searches. Since those two engines have such tiny market shares (seriously, who names their search engine ‘dogpile’? I mean, c’mon!), the numbers spit back by searches aren’t very large. Their free version allows you to see the top 100 search terms per your input, while the paid lets you see the top 1,000 and also parse through other types of data. Just based on its sources, I prefer the larger dataset of Keyword Discovery.
  • SEO Book Keyword Tool (free) – This tool was created by Aaron Wall, who is considered one of the luminaries of SEO. He has a very highly-regarded blog, so take a look at www.seobook.com. His keyword tool pulls data from WordTracker and then estimates daily searches for Google, Yahoo! and MSN based on each engine’s market share. It’s a fairly decent comparison tool and you can export the data to Excel. My issues are that it uses the semi-weak WordTracker database and only gives you five or so alternate keywords.

Other sites may suggest that they have the ability to perform keyword research, but if it’s only a few hundred dollars (or less), I can almost guarantee that they are simply pulling data from one of the above services. Some large SEO or advertising firms may have crafted their own proprietary keyword research tool, but then you’ll probably have a multi-thousand dollar contract in order to take part in their services.

A final tip to leave you with – when you’re researching keywords, keep in mind each phrase’s legitimacy. When people use Google, they don’t always search in proper English, so be careful that each phrase on your list can be used appropriately in site copy. Do keep those oddballs, though, but use them strictly for your PPC campaigns.

* These terms are used within Google Adwords’ PPC program. Say you want to target the term “seo boston”. If you set Adwords to Broad match, then you will be targeting any phrase that includes any of those words – seo, seo boston, boston – and the bids will be very high for terms like that. Phrase match is much more restrictive and requires that the keywords you are bidding on actually appear in the searched phrase, so an example phrase would be “great web design boston” or “web design boston ma.”

This article was written by Russ Ain. Russ is a search analyst for Overdrive Interactive. He also writes for his work blog. In his free time, he enjoys some online games.

SEO 101

Search engine optimization, also known as “SEO”, is a subject that emerged in the early days of Yahoo! and has progressed quite far since then. Google has since become the engine of choice for many (probably 99% of our readers). Everyone knows how to get to it (google.com, duh), but many don’t know how to make their site appear at the top of the natural results (i.e., what’s below “Sponsored Search”). That’s where SEO comes in – the optimization of your website for search engines.

Let’s take a look at some tactics which do (and don’t really) assist with attaining that elusive #1 ranking.

Important

  • Content – Google is a fairly intelligent search engine – because Page and Brin are smart guys. Not only that, but their intention with Google was to make an engine that could not only read HTML and parse out the keywords, but also try to figure out what a page is about. Thus, the textual content on your page is really key. The more you can gab about your subject area – nursing, computer games, BMWs, day trading – the happier the search engines will be with your site.
    • Have as many pages that make sense for your subject.
    • Focus each one around a specific theme.
    • Include roughly 200-300 words per page.
    • Search engines can’t read JavaScript, Flash, video, or pictures. You are welcome to use these, but make sure there is some HTML text that describes what’s happening.
  • External/Back links – Another idea Brin and Page thought would create a great engine was an authority system – they called it PageRank, which reflects the quality (and quantity) of links that lead to a site/page. Got a link from CNN, ABC.com, and AOL’s home pages? You’re in an enviable position. .EDU and .GOV sites are also powerful places to get links from. One college scholarship site I used to work on began ranking in the top 5 for the terms “scholarship” and “scholarships” simply because I was able to acquire a few links from some very strong .GOV sites.

    Having trouble getting links? Make sure you’re linking to each one of your site’s pages from all the others. And include the keyword you are targeting for the subsequent page in the link text (also called anchor text). We’ll discuss some tactics for acquiring links like these in a future piece.

  • Title tags – These are the words contained in the link you click on in a search result, and they also appear in your browser’s title bar. Since the title is at the very beginning of your code, Google uses it to get a sense of what your page is about.

    There are many theories on the optimal way of writing title tags. Mine is to include a few keywords about your site/page and then your company or site name. Keep in mind that the title is not only for search engines; it also needs to encourage users to click on it within the SERPs (search engine result pages). Even if you are ranked #1 – if people don’t click, you don’t eat.

And here are a few areas that aren’t so important:

  • Meta keywords – Back in 1998, all you really had to do was have keywords in your meta tags and content. It didn’t matter if it made sense; it just needed to have the keywords you wanted to rank on. The engines are (thankfully) more sophisticated now, and meta keywords matter little to the major engines. Should you include them? Yes, but I’d rather spend the time composing a new article focused on another keyword you’d like to rank on.

    That said, the meta description does offer some value – not so much from a keyword perspective, but because it is usually the text that people see in a search result. So, make yours inviting and descriptive. Make it so the user is encouraged to click on your result. That’s the main benefit.

  • URLs with keywords – This is another tactic that has lost value over the years. If your domain has keywords, it can gain some value from people typing it directly into the browser bar, which can result in some easy links. But creating a URL structure that includes keywords is a fairly simple thing to do – and Google’s goal is to reward efforts that take real work, such as acquiring good links and writing informative content. It doesn’t hurt to include keywords in your URL structure; just don’t expect your rankings to jump from that alone.

The above is a start on where you should focus your time. In my next few pieces I’ll review some more specific areas to look at when creating your site, along with linking tactics and keyword research.

This article was written by Russ Ain. Russ is a search analyst at a Boston SEO agency. He keeps it real with some service oriented architecture and rocks out with email archiving.

Simple HTML tips for better search engine placement

We should make good use of the H1 and H2 tags. H1 carries greater weight, but it should be used only once on each page. We suggest using the H1 tag either for a list of meaningful keywords for that particular page or for the website’s title. The H2 tag can be used several times on a page, though there probably should not be too many occurrences. We should use H2 tags for our page titles and important headers and sub-headers on the page.

Although Google gives little or no ranking weight to the META KEYWORDS and META DESCRIPTION tags, fill them in. Google may be the big player at the time of this writing, but they are not the only player in town, so make sure you use these two fields. We should not overload them; use a short list of quality keywords. Another obvious one is the TITLE tag. We should make sure it is used, and each page should have a title that is meaningful for that particular page; also, should we regularly have long titles, make sure the first 10 words or so contain words and phrases that will grab visitors’ attention. Below is an example of the META tags we just discussed, taken from the World War II Database website.

<head>
<title>World War II Database: Your WW2 History Reference Destination</title>
<meta name="description" content="World War II Database: Your WW2 History Reference Destination">
<meta name="keywords" content="ww2db, history, military, world war, ww2, photo, photograph, panzer, yamato, pearl harbor, stalingrad, okinawa, iwo jima, normandy, d-day">

...
</head>

Finally, we should make sure that most, if not all, of our IMG tags have descriptive ALT attributes. Here is a quick example:

<img src="images/picture.jpg" alt="US Marines and US Navy Corpsman raising the American flag on Mount Suribachi at Iwo Jima">
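Since every IMG should carry a descriptive ALT, it is easy to audit a page for offenders automatically. Below is a rough sketch using Python's standard html.parser module that flags images with a missing or empty alt attribute; the class name and sample markup are hypothetical.

```python
from html.parser import HTMLParser

class AltAuditor(HTMLParser):
    """Collects the src of every <img> lacking a non-empty alt attribute."""
    def __init__(self):
        super().__init__()
        self.missing_alt = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            d = dict(attrs)
            if not (d.get("alt") or "").strip():
                self.missing_alt.append(d.get("src", "(no src)"))

auditor = AltAuditor()
auditor.feed('<p><img src="flag.jpg" alt="Flag raising on Iwo Jima">'
             '<img src="decoration.gif"></p>')
print(auditor.missing_alt)  # ['decoration.gif']
```

Run over every page of a site, such a check quickly surfaces the images that still need meaningful ALT text.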