John Mueller has been quoted as saying HREFLANG is the most complicated issue to get right in SEO. A study by SEMRush found 15% of multilingual websites to contain HREFLANG errors. Having personally audited many international websites, I suspect this figure is even higher.
For any website operating across multiple regions or languages, the SEO benefits of implementing HREFLANG markup are substantial. Firstly, HREFLANG markup ensures users are served the most appropriate content for their language and / or geographic location. Secondly, HREFLANG markup prevents potential issues arising from duplicate content, which commonly occurs when pages differ only slightly between regions. Google provides a comprehensive overview of HREFLANG markup within its Search Central documentation, available at the link below:
https://developers.google.com/search/docs/advanced/crawling/localized-versions
In order to achieve the above benefits it is essential that HREFLANG markup is implemented correctly. In this post I’m going to list the most common errors I see when auditing international sites with HREFLANG markup and, more crucially, how to avoid them. While all HREFLANG errors should be avoided wherever possible, I’m also going to detail the ones which Google will automatically correct for you, helping you to prioritise which errors to fix first.
Common HREFLANG Mistakes
Return Tag Errors
Return tag errors result from HREFLANG annotations that don’t cross-reference each other. Every HREFLANG annotation on a page must be matched by a corresponding annotation on each page it references: if Page A references Page B, then Page B must link back to Page A. Additionally, while it is not strictly required, best practice is for every page to include a self-referential HREFLANG link.
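The return tag rule can be checked mechanically once you have each page's HREFLANG targets, for example from a crawl export. The snippet below is a minimal sketch rather than a full audit tool: the function name, the dictionary layout and the example.com URLs are my own illustrative assumptions.

```python
from typing import Dict, List, Set, Tuple


def find_missing_return_links(annotations: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """Return (source, target) pairs where target does not link back to source.

    `annotations` maps each page URL to the set of URLs its hreflang tags point at.
    """
    missing = []
    for source, targets in annotations.items():
        for target in targets:
            if target == source:
                continue  # a self-reference needs no return link
            if source not in annotations.get(target, set()):
                missing.append((source, target))
    return sorted(missing)


# Hypothetical URLs: the German page forgets to reference the UK page.
annotations = {
    "https://example.com/en-gb/": {
        "https://example.com/en-gb/",
        "https://example.com/de-de/",
    },
    "https://example.com/de-de/": {"https://example.com/de-de/"},
}

print(find_missing_return_links(annotations))
# [('https://example.com/en-gb/', 'https://example.com/de-de/')]
```

Each pair in the output reads as "the first page references the second, but the second does not link back".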
Incorrect Language / Region Codes
In order for Google to correctly interpret HREFLANG annotations, both language and region codes must be presented in the correct ISO format. All language codes must use the ISO 639-1 format and, where more granular regional targeting of content is required, region codes must use the ISO 3166-1 Alpha 2 format.
A common example of incorrect code usage is the inclusion of ‘UK’ as a region code as opposed to the correct ‘GB’ code for content targeted at users within the United Kingdom.
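A quick shape check catches most of these mistakes before they go live. The sketch below assumes a two-letter language code with an optional two-letter region; it does not validate against the full ISO lists, it ignores script subtags, and the table of common mistakes is an illustrative sample of my own containing only the ‘UK’ case described above.

```python
import re

# Region codes often used by mistake, mapped to the ISO 3166-1 Alpha 2 code.
# Illustrative sample only, not an exhaustive list.
COMMON_REGION_MISTAKES = {"UK": "GB"}

# Two-letter language, optionally followed by a hyphen and a two-letter region.
HREFLANG_SHAPE = re.compile(r"^[a-z]{2}(-[a-z]{2})?$", re.IGNORECASE)


def check_hreflang_value(value: str) -> str:
    """Return 'ok', 'malformed', or a hint describing an invalid region."""
    if value.lower() == "x-default":
        return "ok"
    if not HREFLANG_SHAPE.match(value):
        return "malformed"
    language, _, region = value.partition("-")
    if region.upper() in COMMON_REGION_MISTAKES:
        fix = COMMON_REGION_MISTAKES[region.upper()]
        return f"invalid region: use {language.lower()}-{fix}"
    return "ok"


print(check_hreflang_value("en-UK"))  # invalid region: use en-GB
print(check_hreflang_value("en-GB"))  # ok
print(check_hreflang_value("uk"))     # ok ('uk' is the language code for Ukrainian)
```

Note that ‘uk’ on its own passes, because it is a valid language code; it is only wrong when used in the region position.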
Absence of Language Code
While the absence of a paired country code within HREFLANG markup is perfectly acceptable, and in many cases desirable where content targeting spans multiple regions, the same is not true for language codes. Every HREFLANG value must contain a language code to comply with the HREFLANG specification.
Incorrect Order of HREFLANG Values, Country Before Language
In order for HREFLANG markup to validate correctly it is crucial that values are included in the correct order. The language code should always come first, followed by the optional region code, for example ‘en-GB’ rather than ‘GB-en’.
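HREFLANG values take the form language first, then an optional region (for example ‘en-GB’). Both a missing language code and a swapped order can be flagged with a simple lookup. The following is a rough heuristic sketch: the two sample sets are deliberately tiny stand-ins of my own for the full ISO 639-1 and ISO 3166-1 lists, and the function name is hypothetical.

```python
# Tiny illustrative samples; a real check would load the full ISO lists.
SAMPLE_LANGUAGES = {"en", "fr", "de", "es"}
SAMPLE_REGIONS = {"GB", "US", "FR", "DE", "ES"}


def diagnose_hreflang(value: str) -> str:
    """Classify a hreflang value as ok, missing its language, or wrongly ordered."""
    parts = value.split("-")
    first = parts[0].lower()
    if len(parts) == 1:
        if first in SAMPLE_LANGUAGES:
            return "ok"
        if first.upper() in SAMPLE_REGIONS:
            return "missing language code"
        return "unknown code"
    second = parts[1]
    if first in SAMPLE_LANGUAGES and second.upper() in SAMPLE_REGIONS:
        return "ok"
    if first.upper() in SAMPLE_REGIONS and second.lower() in SAMPLE_LANGUAGES:
        return f"region before language: use {second.lower()}-{first.upper()}"
    return "unknown code"


print(diagnose_hreflang("en-GB"))  # ok
print(diagnose_hreflang("GB-en"))  # region before language: use en-GB
print(diagnose_hreflang("GB"))     # missing language code
```

Codes such as ‘de’ or ‘es’ are valid as both a language and a region, so a lookup like this cannot tell which was intended when they appear alone; it simply treats them as languages.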
Incorrect Canonical Tags
Self-referring HREFLANG attributes on a page should match the page’s self-referring canonical link. I often see the canonical tag on a regional / language specific page set to the domain root. This is incorrect: the two pieces of markup need to be in sync, as they work hand in hand in the indexation and content targeting signals they provide to Google.
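As a rough illustration of this check, the sketch below parses a page's link tags with Python's standard library and confirms that the canonical URL appears among the HREFLANG URLs. The class name, function name and sample markup are assumptions for the example; a production check would also need to normalise URLs (trailing slashes, casing and so on).

```python
from html.parser import HTMLParser


class HeadLinkParser(HTMLParser):
    """Collects the canonical URL and all hreflang alternates from link tags."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.alternates = {}  # hreflang value -> href

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        if "canonical" in rel:
            self.canonical = attr.get("href")
        elif "alternate" in rel and attr.get("hreflang"):
            self.alternates[attr["hreflang"]] = attr.get("href")


def canonical_matches_hreflang(html: str) -> bool:
    """True when the canonical URL is one of the page's hreflang URLs."""
    parser = HeadLinkParser()
    parser.feed(html)
    return parser.canonical is not None and parser.canonical in parser.alternates.values()


# Hypothetical UK page whose canonical wrongly points at the domain root.
page = """
<head>
<link rel="canonical" href="https://example.com/">
<link rel="alternate" hreflang="en-gb" href="https://example.com/en-gb/">
<link rel="alternate" hreflang="de-de" href="https://example.com/de-de/">
</head>
"""

print(canonical_matches_hreflang(page))  # False
```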
Adding HREFLANG on Error Pages
The simple rule for HREFLANG on error pages (404, 500 etc) is to exclude all such markup. HREFLANG markup should only be included on pages which you wish to drive users to and which you want to appear within Google’s index.
Adding HREFLANG to Noindexed Pages
As with the previous common error, if there is a requirement for a page to not appear within Google’s index then it should not feature any HREFLANG markup, be that on the page in question or within references from any other page.
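Pages that combine a noindex directive with HREFLANG markup can be spotted with a small parser. This is a minimal sketch covering only the robots meta tag; it assumes the directive is not being sent via an X-Robots-Tag HTTP header instead, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class RobotsAndHreflangParser(HTMLParser):
    """Records whether a page is noindexed and how many hreflang links it has."""

    def __init__(self):
        super().__init__()
        self.noindex = False
        self.hreflang_count = 0

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            directives = [d.strip().lower() for d in content.split(",")]
            if "noindex" in directives:
                self.noindex = True
        elif tag == "link" and attr.get("hreflang"):
            self.hreflang_count += 1


def noindexed_with_hreflang(html: str) -> bool:
    """True when a page carries both a noindex meta tag and hreflang links."""
    parser = RobotsAndHreflangParser()
    parser.feed(html)
    return parser.noindex and parser.hreflang_count > 0


# Hypothetical page that should not carry hreflang markup.
page = """
<head>
<meta name="robots" content="noindex, follow">
<link rel="alternate" hreflang="en-gb" href="https://example.com/en-gb/">
</head>
"""

print(noindexed_with_hreflang(page))  # True
```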
Inclusion of Redirected Links with HREFLANG Markup
HREFLANG must not include URLs which trigger a redirect when directly accessed. All HREFLANG links should return a live page with a 200 header status. Such errors most commonly occur following the retirement of content when subsequent redirects have been implemented. It is important that as part of any content retirement strategy, the updating of HREFLANG markup is also reviewed.
HREFLANG Points to a Page Which Returns an Error (404, 500 etc)
As alluded to previously, all HREFLANG attributes must reference a URL which returns a 200 header status. Any link returning an error status such as a 404 or 500 will invalidate the HREFLANG markup. Such errors most commonly occur when a content retirement strategy has not been followed correctly.
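Both this check and the redirect check above come down to the same test: every HREFLANG target should answer with a 200. The sketch below keeps the HTTP request out of the picture by accepting a `get_status` callable, which is an assumption of mine; in practice that would be an HTTP client configured not to follow redirects, so that a 301 is reported rather than silently resolved.

```python
from typing import Callable, Dict, Iterable


def classify_status(status: int) -> str:
    """Map an HTTP status code to a coarse verdict for hreflang auditing."""
    if status == 200:
        return "ok"
    if 300 <= status < 400:
        return "redirect"
    if 400 <= status < 500:
        return "client error"
    if 500 <= status < 600:
        return "server error"
    return "unexpected"


def audit_hreflang_targets(
    urls: Iterable[str], get_status: Callable[[str], int]
) -> Dict[str, str]:
    """Return only the URLs whose status makes them unsuitable hreflang targets."""
    problems = {}
    for url in urls:
        verdict = classify_status(get_status(url))
        if verdict != "ok":
            problems[url] = verdict
    return problems


# Hypothetical crawl results standing in for live requests.
statuses = {
    "https://example.com/en-gb/": 200,
    "https://example.com/fr-fr/": 301,
    "https://example.com/de-de/": 404,
}

print(audit_hreflang_targets(statuses, statuses.__getitem__))
# {'https://example.com/fr-fr/': 'redirect', 'https://example.com/de-de/': 'client error'}
```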
What HREFLANG Errors Will Google Allow
Historically there was a belief that even a minor HREFLANG error could result in all markup being ignored. While this is still the case for many of the errors described in this post, there are a limited number of exceptions where Google will still correctly interpret and process HREFLANG markup even though it contains minor errors or does not follow best practice guidelines. Correcting HREFLANG errors can often be time consuming due to the dev resource required, so knowing which errors these are helps you to prioritise what to fix first.
Self-referencing HREFLANG Links Are Not Required
One of the most common absences from HREFLANG markup is a self-referencing link. While best practice guidelines still stipulate that one should be included, if one is not present then the other HREFLANG attributes will still be read and processed. From personal experience I always find it slightly unusual that self-referential tags are so often excluded, as when I am coding I find it less effort to actually include them.
Use of Underscores for Language / Region Targeting
A common formatting mistake is the use of underscores instead of hyphens to separate language and country codes, for example ‘en_GB’ rather than ‘en-GB’. Google’s parser now corrects this inaccuracy.
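Even though Google tolerates underscores, it costs very little to clean the values up at source. Below is a minimal sketch of such a clean-up; the function name is hypothetical, and the lowercase language / uppercase region casing is just the common convention rather than a requirement.

```python
def normalise_hreflang(value: str) -> str:
    """Replace underscores with hyphens and apply conventional casing."""
    cleaned = value.replace("_", "-")
    if cleaned.lower() == "x-default":
        return "x-default"
    language, _, region = cleaned.partition("-")
    if region:
        return f"{language.lower()}-{region.upper()}"
    return language.lower()


print(normalise_hreflang("en_GB"))  # en-GB
print(normalise_hreflang("EN_gb"))  # en-GB
print(normalise_hreflang("fr"))     # fr
```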
Relative URLs Are Acceptable but Absolute URLs Are Preferred
To follow best practice guidelines, HREFLANG attributes should utilise absolute URL references. However, the use of relative URL references will not generate errors.
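If your templates currently output relative references, they can be converted to absolute URLs by resolving each one against the URL of the page it appears on. A short sketch using Python's standard `urljoin`, with hypothetical example.com URLs and a wrapper name of my own:

```python
from urllib.parse import urljoin


def absolutise(page_url: str, href: str) -> str:
    """Resolve an hreflang href against the URL of the page it appears on."""
    return urljoin(page_url, href)


page_url = "https://example.com/en-gb/products/"

print(absolutise(page_url, "/de-de/products/"))
# https://example.com/de-de/products/

print(absolutise(page_url, "https://example.com/fr-fr/products/"))
# https://example.com/fr-fr/products/ (already absolute, so unchanged)
```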
Utilising Multiple HREFLANG Tagging Methods
HREFLANG markup can be applied to a site via multiple methods. These include:
- Page level HTML tags
- XML sitemap
- HTTP headers
While the utilisation of multiple methods to apply HREFLANG markup is not inherently wrong, it is discouraged by Google. Furthermore, the usage of multiple methods will add to code bloat, increase maintenance time and often increase the complexity of debugging tasks.
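For reference, here is roughly what each of the three methods looks like when generated from a single mapping of HREFLANG values to URLs. This is an illustrative sketch only: the function names and example.com URLs are my own, the sitemap fragment assumes the enclosing urlset element declares the xhtml namespace, and in practice you would pick one method rather than emit all three.

```python
from html import escape
from typing import Dict


def html_tags(alternates: Dict[str, str]) -> str:
    """Page level HTML tags for the head of the document."""
    return "\n".join(
        f'<link rel="alternate" hreflang="{lang}" href="{escape(url)}">'
        for lang, url in alternates.items()
    )


def link_header(alternates: Dict[str, str]) -> str:
    """HTTP Link header, useful for non-HTML files such as PDFs."""
    return "Link: " + ", ".join(
        f'<{url}>; rel="alternate"; hreflang="{lang}"'
        for lang, url in alternates.items()
    )


def sitemap_entry(loc: str, alternates: Dict[str, str]) -> str:
    """One url element of an XML sitemap with xhtml:link alternates."""
    links = "\n".join(
        f'  <xhtml:link rel="alternate" hreflang="{lang}" href="{escape(url)}"/>'
        for lang, url in alternates.items()
    )
    return f"<url>\n  <loc>{escape(loc)}</loc>\n{links}\n</url>"


# Hypothetical pair of alternates for one page.
alternates = {
    "en-GB": "https://example.com/en-gb/",
    "de-DE": "https://example.com/de-de/",
}

print(html_tags(alternates))
print(link_header(alternates))
print(sitemap_entry("https://example.com/en-gb/", alternates))
```

Generating the markup from one source of truth like this also makes it easier to keep return tags in sync, whichever single method you settle on.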
How to Test HREFLANG Markup
To help avoid the common HREFLANG errors which I have described in this post it is essential testing is completed. HREFLANG testing strategies can be segmented into two methods.
Online HREFLANG Validator
The first of these testing methods is to use an online validator. Examples include: https://technicalseo.com/tools/hreflang/ and https://hreflangchecker.com/. An online validator is useful for testing a single page quickly to confirm whether any potential errors are present. Such testing is useful when ad hoc testing is required following a recent technical change to a page.
Website Crawl Tool
In contrast, a website crawl tool such as Screaming Frog or DeepCrawl can be utilised to test HREFLANG markup at scale. This method can be slow to run, but it provides a site wide view of potential errors and the generated output will be extremely comprehensive.
Have you got experience of implementing HREFLANG markup? Did you achieve a successful implementation? What were the most common issues and errors you encountered? Let me know in the comments below.