Google recently updated its URL structure guidelines to specify which characters Google Search supports in URLs. In this article, we will look at what changed.
Google says that it “supports URLs as defined by RFC 3986. Characters defined by the standard as reserved must be percent-encoded. Unreserved ASCII characters may be left in the non-encoded form. Additionally, characters in the non-ASCII range should be UTF-8 encoded.”
Google's documentation includes several examples of URLs using UTF-8 encoding, along with examples of what it doesn't recommend.
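To see what these encoding rules mean in practice, here is a minimal sketch in Python using the standard library's `urllib.parse.quote`, which percent-encodes a path the way the guidelines describe (the domain and path are made-up examples, not from Google's documentation):

```python
from urllib.parse import quote

# Non-ASCII characters are encoded as their UTF-8 bytes, each byte
# percent-encoded. RFC 3986 unreserved ASCII characters (letters,
# digits, "-", "_", ".", "~") are left as-is; "/" is kept as a
# path separator by quote()'s default safe characters.
path = "/über-uns"
encoded = quote(path)
print("https://example.com" + encoded)
# "ü" (UTF-8 bytes 0xC3 0xBC) becomes %C3%BC:
# https://example.com/%C3%BCber-uns
```

The same function can be applied to individual path segments or query values before assembling a URL, so reserved characters inside the data never collide with the characters that delimit the URL's structure.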
Google has also explained how to resolve common problems related to URLs:
Resolve problems related to URLs
To avoid potential problems with URL structure, we recommend the following:
- Create a simple URL structure. Consider organizing your content so that URLs are constructed logically and in a manner that is most intelligible to humans.
- Consider using a robots.txt file to block Googlebot’s access to problematic URLs. Typically, consider blocking dynamic URLs, such as URLs that generate search results, or URLs that can create infinite spaces, such as calendars. Using regular expressions in your robots.txt file can allow you to easily block large numbers of URLs.
- Wherever possible, avoid the use of session IDs in URLs. Consider using cookies instead.
- If upper and lower case text in a URL is treated the same by the web server, convert all text to the same case so it is easier for Google to determine that URLs reference the same page.
- Whenever possible, shorten URLs by trimming unnecessary parameters.
- If your site has an infinite calendar, add a nofollow attribute to links to dynamically created future calendar pages.
- Check your site for broken relative links.
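The robots.txt advice above might look like the following hypothetical file, which blocks Googlebot from internal search result pages and an infinite calendar path (the paths and parameter names are invented for illustration; Googlebot supports the `*` wildcard in Disallow rules):

```
User-agent: Googlebot
# Block dynamic URLs that generate search results
Disallow: /search
Disallow: /*?q=
# Block an infinite space of dynamically generated calendar pages
Disallow: /calendar/
```

Keep in mind that robots.txt blocks crawling, not indexing, so it is best suited to keeping Googlebot out of URL spaces that waste crawl budget rather than hiding pages that are already indexed.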
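The case-normalization and parameter-trimming suggestions can be sketched together in Python with the standard library's `urllib.parse`. This is a hypothetical helper, assuming your web server really does treat upper- and lower-case paths as the same page; the URL and parameter names below are made up:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def normalize(url, keep_params):
    """Lowercase the URL and drop query parameters not in keep_params.

    Assumes the server serves identical content regardless of path case,
    so lowercasing is safe. Session-style parameters are trimmed away.
    """
    parts = urlsplit(url)
    # Keep only the parameters that actually select content
    query = [(k, v) for k, v in parse_qsl(parts.query) if k in keep_params]
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       parts.path.lower(), urlencode(query), ""))

print(normalize("https://Example.com/Shop/Widgets?id=42&sessionid=abc123",
                keep_params={"id"}))
# https://example.com/shop/widgets?id=42
```

Emitting one canonical form like this makes it easier for Google to recognize that several URL variants reference the same page, which is the point of the guideline.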