I often contribute to Stack Exchange websites, mainly Webmasters Stack Exchange, since SEO is the main topic there.

I plan to go over all of my (successful) answers on the Stack Exchange website and dive deeper into each of them on my blog.

Table of Contents

  1. Should I use a canonical tag when only 50% of the content is duplicated?
  2. Understanding the W3C spec and blockquote
  3. The decision making process in using canonical urls
  4. Unique content =/= uniquely written content

ChristianSch asked the following question on the Webmasters StackExchange:

Should I use a canonical tag when only 50% of the content is duplicated?

Christian's problem is a common one on large-scale, data-driven websites: what to do with partially duplicated content.

Sometimes a block of content has a 'home base': the main source that provides the information to a user. However, the same content can also be useful elsewhere on the website when it is aggregated into a more functional form.

In Christian's case, this is a city guide website covering bars, restaurants and clubs. Christian uses the description from the venue's home website.

Because the venue page contains only partially duplicated content, he wonders whether a cross-domain canonical tag would be appropriate.
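
For reference, the cross-domain canonical Christian is asking about would be a single link element in the head of his venue page, pointing at the venue's own site. A minimal sketch, with a placeholder URL standing in for the venue's description page:

```html
<head>
  <!-- Cross-domain canonical: signals that the preferred version of this
       content lives on another domain (the URL below is a placeholder) -->
  <link rel="canonical" href="http://venue-website/venue-description-page">
</head>
```

Search engines generally treat this as a hint that the target URL is the one to index, which is why the choice affects which page can attract organic traffic.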

Understanding the W3C spec and blockquote

Fortunately, this is not the first time in the history of the universe that attribution has occurred.

The W3C spec for HTML covers exactly this use case with the <blockquote> tag.

For extracted content from a source, use this HTML:

<blockquote cite="http://venue-website/venue-description-page" title="Published title, author, date">
Your extracted content
</blockquote>

The cite attribute on a blockquote was designed specifically for this use case: it holds the URL of the source the quoted content was taken from.

Google wouldn't be doing their job properly if they didn't respect the intent of W3C.

There is a very similar answer that highlights the use of blockquote for SEO purposes: Does Blockquote help or harm SEO

Of course, using other people's content and ranking a page with borrowed content is very subjective and will be discussed for hours on end within SEO circles.

The decision making process in using canonical urls

The choice of whether or not to use a canonical URL depends on whether you want to attract organic traffic for example.com/target-url and, more importantly, what kind of traffic.

If Christian were trying to compete for the brand name of the bar or venue, he should re-evaluate his content and, out of courtesy, place a canonical link to the content source.

But he isn't.

"bar name review" is a perfectly valid search intent to compete for, and it validates the existence of the page as unique content designed to serve the user.

There are many pages on the internet that do indeed attract organic traffic that use almost exclusively borrowed content. Lists, resource pages and reviews are great examples of this because they primarily serve the search intent.

Unique content does not necessarily mean uniquely written content.

The mindset that equates the two comes from SEO practices that attempt to scale by reusing the same resources while trying to convince search engines that each piece of content was uniquely written.

Offering additional value on top of existing content, such as criticism, analysis, insights or context, is the best way to justify partially duplicated content.

The only canonical links Christian should have on his web pages are the ones pointing to his own permanent URLs.
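
Putting the pieces together, a venue page could look roughly like the sketch below. It assumes a hypothetical page living at http://example.com/target-url; the venue name, the review text and the quoted source URL are placeholders, not Christian's actual markup.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Bar Name review</title>
  <!-- Self-referencing canonical: points at this page's own permanent URL (placeholder) -->
  <link rel="canonical" href="http://example.com/target-url">
</head>
<body>
  <h1>Bar Name review</h1>

  <!-- Borrowed description, attributed to its source through the cite attribute -->
  <blockquote cite="http://venue-website/venue-description-page">
    Your extracted content
  </blockquote>

  <!-- Original value added on top: criticism, analysis, insights or context -->
  <p>Your own review of the venue</p>
</body>
</html>
```

The canonical stays on Christian's own domain, while the blockquote carries the attribution to the venue's website.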