Page & Content Metadata #GoogleSEOGuide #s1ep48

Valid Page Metadata

Using valid HTML for page metadata ensures that Google can use the metadata as documented. Google tries to understand HTML even when it is invalid or inconsistent with the HTML standard, but errors in the markup can cause problems with how your metadata is used in Google Search. The primary element for specifying metadata about a page is the <head> element of an HTML document.

If you use an invalid element in the <head> element, Google ignores any elements that appear after the invalid element.

Valid elements in the <head> element

As per the HTML standard, the <head> element must contain only the following elements:

  • title
  • meta
  • link
  • script
  • style
  • base
  • noscript
  • template

Don't use invalid elements in the <head> element

The HTML standard permits no elements in the <head> element other than those listed above.

Common elements that appear in the <head> element and render it invalid are:

  • iframe
  • img

Google strongly recommends that you don't use these invalid elements in the <head> element. If you must, place them after the elements you want Google to see. Once Google detects one of these invalid elements, it assumes the end of the <head> element and stops reading any further elements in it.
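To make this cut-off concrete, here is a minimal Python sketch that scans a <head> and reports which elements would be skipped under the rule described above. It uses the standard-library html.parser module. The HeadAuditor class and audit_head function are hypothetical names made up for this illustration, and the sketch only models the described behaviour for a flat <head>; it is not Google's parser.

```python
from html.parser import HTMLParser

# Elements the article lists as valid inside <head>.
VALID_HEAD_ELEMENTS = {
    "title", "meta", "link", "script", "style", "base", "noscript", "template",
}


class HeadAuditor(HTMLParser):
    """Records which <head> children come before and after the first invalid element."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.first_invalid = None  # name of the first invalid element, if any
        self.seen = []             # valid elements read before the cut-off
        self.ignored = []          # elements that follow the first invalid one

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
            return
        if not self.in_head:
            return
        if self.first_invalid is not None:
            self.ignored.append(tag)
        elif tag in VALID_HEAD_ELEMENTS:
            self.seen.append(tag)
        else:
            self.first_invalid = tag

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def audit_head(html):
    """Parse an HTML string and return the populated HeadAuditor."""
    auditor = HeadAuditor()
    auditor.feed(html)
    auditor.close()
    return auditor
```

For a head that contains a title, a meta, an img, and then a robots meta tag, the sketch reports img as the first invalid element and lists the final meta tag as ignored, which is why the tags you care about should come first.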

Meta Tags

meta tags are HTML tags used to provide additional information about a page to search engines and other clients. Clients process the meta tags and ignore those they don't support. meta tags are added to the <head> section of your HTML page and generally look like this:

<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
    <meta name="description" content="Author: A.N. Author, Illustrator: P. Picture, Category: Books, Price: £9.24, Length: 784 pages">
    <meta name="google-site-verification" content="+nxGUDJ4QpAZ5l9Bsjdi102tLVC21AIh5d1Nl23908vVuFHs34=">
    <title>Example Books - high-quality used books for children</title>
    <meta name="robots" content="noindex,nofollow">
  </head>
</html>

Google supports the following meta tags:

<meta name="description" content="A description of the page"> Use this tag to provide a short description of the page. In some situations, this description is used in the snippet shown in search results.

<meta name="robots" content="..., ..."> and/or <meta name="googlebot" content="..., ..."> These meta tags control the behavior of search engine crawling and indexing. The <meta name="robots" ...> tag applies to all search engines, while the <meta name="googlebot" ...> tag is specific to Google.

In the case of conflicting robots (or googlebot) meta tags, the more restrictive tag applies. For example, if a page has both the max-snippet:50 and nosnippet tags, the nosnippet tag will apply.

The default values are index, follow and don't need to be specified. For a full list of values that Google supports, see the list of valid directives.

You can also specify this information in the header of your pages using the "X-Robots-Tag" HTTP header directive. This is particularly useful if you wish to limit indexing of non-HTML files like graphics or other kinds of documents.

<meta name="google" content="nositelinkssearchbox"> When users search for your site, Google Search results sometimes display a search box specific to your site, along with other direct links to your site. This tag tells Google not to show the sitelinks search box. Learn more about sitelinks search box.

<meta name="googlebot" content="notranslate"> When Google recognizes that the contents of a page aren't in the language that the user likely wants to read, Google may provide a translated title link and snippet in search results. If the user clicks the translated title link, all further user interaction with the page is through Google Translate, which will automatically translate any links followed. In general, this gives you the chance to provide your unique and compelling content to a much larger group of users. However, there may be situations where this is not desired. This meta tag tells Google that you don't want it to provide a translation for this page.

<meta name="google" content="nopagereadaloud"> Prevents various Google text-to-speech services from reading aloud web pages using text-to-speech (TTS).

<meta name="google-site-verification" content="..."> You can use this tag on the top-level page of your site to verify ownership for Search Console. Note that while the values of the name and content attributes must match exactly what is provided to you (including upper and lower case), it doesn't matter if you change the tag from XHTML to HTML or if the format of the tag matches the format of your page.

<meta http-equiv="Content-Type" content="...; charset=..."> or <meta charset="..."> This defines the page's content type and character set. Make sure that you surround the value of the content attribute with quotes; otherwise the charset attribute may be interpreted incorrectly. Google recommends using Unicode/UTF-8 where possible. More information.

<meta name="viewport" content="..."> This tag tells the browser how to render a page on a mobile device. Presence of this tag indicates to Google that the page is mobile friendly. Read more about how to configure the viewport meta tag.

<meta name="rating" content="adult"> or <meta name="rating" content="RTA-5042-1996-1400-1577-RTA"> Labels a page as containing adult content, to signal that it be filtered by SafeSearch results. Learn more about labeling SafeSearch pages.

HTML tag attributes

HTML tag attributes are additional values of HTML tags that configure the parent tag. For example, the href attribute of the <a> tag configures the resource the anchor tag points to: <a href="...">.

Google Search supports a limited number of HTML attributes for indexing purposes. Attributes like src and href are used for discovering resources such as images and URLs. Google also supports various rel attributes that allow site owners to qualify outbound links.

The data-nosnippet attribute of div, span, and section tags allows you to exclude parts of an HTML page from snippets.

robots meta tag

The robots meta tag gives you granular, page-specific control over how an individual page is indexed and served to users in Google Search results. Place the robots meta tag in the <head> section of a given page, like this:


<!DOCTYPE html>
<html><head>
<meta name="robots" content="noindex">
(…)
</head>
<body>(…)</body>
</html>        

In this example, the robots meta tag instructs search engines not to show the page in search results. The value of the name attribute (robots) specifies that the directive applies to all crawlers. To address a specific crawler, replace the robots value of the name attribute with the name of the crawler that you are addressing. Specific crawlers are also known as user agents (a crawler uses its user agent to request a page). Google's standard web crawler has the user agent name Googlebot.

To prevent only Google from indexing your page, update the tag as follows:


<meta name="googlebot" content="noindex">        

This tag now instructs Google specifically not to show this page in its search results. Both the name and the content attributes are case-insensitive.

Search engines may have different crawlers for different purposes. See the complete list of Google's crawlers. For example, to show a page in Google's web search results, but not in Google News, use the following meta tag:


<meta name="googlebot-news" content="noindex">        

To specify multiple crawlers individually, use multiple robots meta tags:


<meta name="googlebot" content="noindex">
<meta name="googlebot-news" content="nosnippet">        
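When several crawler-specific tags sit on one page, it can help to list which rules address which crawler. The following Python sketch does that with the standard-library html.parser module. RobotsMetaCollector and robots_rules are hypothetical names, and the matching rule (a name of "robots" or any name starting with "googlebot") is an assumption made for this example rather than a complete list of Google's crawlers. Names and values are lowercased because both attributes are matched without regard to case.

```python
from html.parser import HTMLParser


class RobotsMetaCollector(HTMLParser):
    """Collects robots rules per crawler name from <meta> tags."""

    def __init__(self):
        super().__init__()
        self.rules = {}  # crawler name -> set of rules

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        name = attr.get("name", "").strip().lower()
        # Assumption: only "robots" and googlebot-style names are of interest.
        if name == "robots" or name.startswith("googlebot"):
            rules = {
                rule.strip().lower()
                for rule in attr.get("content", "").split(",")
                if rule.strip()
            }
            self.rules.setdefault(name, set()).update(rules)


def robots_rules(html):
    """Return a dict mapping crawler names to the rules addressed to them."""
    collector = RobotsMetaCollector()
    collector.feed(html)
    collector.close()
    return collector.rules
```

Applied to the two tags above, the sketch maps googlebot to noindex and googlebot-news to nosnippet, and it leaves unrelated meta tags such as description out of the result.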

To block indexing of non-HTML resources, such as PDF files, video files, or image files, use the X-Robots-Tag response header instead.

Using the data-nosnippet HTML attribute

You can designate textual parts of an HTML page not to be used as a snippet.

This can be done on an HTML-element level with the data-nosnippet HTML attribute on span, div, and section elements.

The data-nosnippet attribute is considered a boolean attribute. As with all boolean attributes, any value specified is ignored. To ensure machine-readability, the HTML section must be valid HTML and all appropriate tags must be closed accordingly.

Examples:


<p>This text can be shown in a snippet
<span data-nosnippet>and this part would not be shown</span>.</p>

<div data-nosnippet>not in snippet</div>
<div data-nosnippet="true">also not in snippet</div>
<div data-nosnippet="false">also not in snippet</div>
<!-- all values are ignored -->

<div data-nosnippet>some text</html>
<!-- unclosed "div" will include all content afterwards -->

<mytag data-nosnippet>some text</mytag>
<!-- NOT VALID: not a span, div, or section -->        

Google typically renders pages in order to index them; however, rendering is not guaranteed. Because of this, extraction of data-nosnippet may happen both before and after rendering. To avoid uncertainty from rendering, do not add or remove the data-nosnippet attribute of existing nodes through JavaScript. When adding DOM elements through JavaScript, include the data-nosnippet attribute as necessary when initially adding the element to the page's DOM. If custom elements are used, wrap or render them with div, span, or section elements if you need to use data-nosnippet.
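The rules for data-nosnippet (only span, div, and section count, any value is ignored, and an unclosed element swallows everything after it) can be sketched in Python as follows. SnippetTextExtractor and snippet_text are hypothetical names, and the sketch works on static HTML only; it illustrates the rules as described here and is not how Google extracts snippets.

```python
from html.parser import HTMLParser

# Elements on which data-nosnippet is honoured, per the article.
NOSNIPPET_TAGS = {"span", "div", "section"}


class SnippetTextExtractor(HTMLParser):
    """Collects text that is not inside a data-nosnippet span, div, or section."""

    def __init__(self):
        super().__init__()
        self.stack = []  # (tag, has_nosnippet) for each open span/div/section
        self.parts = []  # text fragments that may appear in a snippet

    def handle_starttag(self, tag, attrs):
        if tag in NOSNIPPET_TAGS:
            # Boolean attribute: its presence matters, its value does not.
            flagged = any(name == "data-nosnippet" for name, _ in attrs)
            self.stack.append((tag, flagged))

    def handle_endtag(self, tag):
        if tag not in NOSNIPPET_TAGS:
            return
        # Close the nearest matching open element and anything nested in it.
        for index in range(len(self.stack) - 1, -1, -1):
            if self.stack[index][0] == tag:
                del self.stack[index:]
                break

    def handle_data(self, data):
        if not any(flagged for _, flagged in self.stack):
            self.parts.append(data)


def snippet_text(html):
    """Return the text eligible for a snippet, with whitespace collapsed."""
    parser = SnippetTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())
```

Because the sketch keys on the presence of the attribute, data-nosnippet="false" still excludes the text, while the same attribute on a custom element such as mytag has no effect.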

Using structured data

Robots meta tags govern the amount of content that Google extracts automatically from web pages for display as search results. But many publishers also use schema.org structured data to make specific information available for search presentation.

Robots meta tag limitations don't affect the use of that structured data, with the exception of article.description and the description values for structured data specified for other creative works. To specify the maximum length of a preview based on these description values, use the max-snippet robots meta tag.

For example, recipe structured data on a page is eligible for inclusion in the recipe carousel, even if the text preview would otherwise be limited. You can limit the length of a text preview with max-snippet, but that robots meta tag doesn't apply when the information is provided using structured data for rich results.

To manage the use of structured data for your web pages, modify the structured data types and values themselves, adding or removing information in order to provide only the data you want to make available. Also note that structured data remains usable for search results when declared within a data-nosnippet element.

Block Search indexing with noindex

noindex is a rule set with either a <meta> tag or HTTP response header and is used to prevent indexing content by search engines that support the noindex rule, such as Google.

When Googlebot crawls that page and extracts the tag or header, Google will drop that page entirely from Google Search results, regardless of whether other sites link to it.

Important: For the noindex rule to be effective, the page or resource must not be blocked by a robots.txt file, and it has to be otherwise accessible to the crawler. If the page is blocked by a robots.txt file or the crawler can't access the page, the crawler will never see the noindex rule, and the page can still appear in search results, for example if other pages link to it.

Using noindex is useful if you don't have root access to your server, as it allows you to control access to your site on a page-by-page basis.

Implementing noindex

There are two ways to implement noindex: as a <meta> tag and as an HTTP response header. They have the same effect; choose the method that is more convenient for your site and appropriate for the content type. Specifying the noindex rule in the robots.txt file is not supported by Google.

You can also combine the noindex rule with other rules that control indexing. For example, you can join a nofollow hint with a noindex rule: <meta name="robots" content="noindex, nofollow" />.

<meta> tag

To prevent all search engines that support the noindex rule from indexing a page on your site, place the following <meta> tag into the <head> section of your page:


<meta name="robots" content="noindex">        

To prevent only Google web crawlers from indexing a page:


<meta name="googlebot" content="noindex">        

Be aware that some search engines might interpret the noindex rule differently. As a result, it is possible that your page might still appear in results from other search engines.

HTTP response header

Instead of a <meta> tag, you can return an X-Robots-Tag HTTP header with a value of either noindex or none in your response. A response header can be used for non-HTML resources, such as PDFs, video files, and image files. Here's an example of an HTTP response with an X-Robots-Tag header instructing search engines not to index a page:


HTTP/1.1 200 OK
(...)
X-Robots-Tag: noindex
(...)        
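How you add the header depends on your server. As one possible illustration, the Python sketch below wraps a WSGI application so that every response for a path ending in .pdf carries X-Robots-Tag: noindex. The names noindex_pdfs and demo_app are hypothetical, and choosing PDFs by file extension is an assumption made for the example; a real site may prefer to set the header in its web server configuration.

```python
def noindex_pdfs(app):
    """Wrap a WSGI app so responses for .pdf paths carry X-Robots-Tag: noindex."""

    def wrapped(environ, start_response):
        # Assumption: PDFs are identified by their path extension.
        is_pdf = environ.get("PATH_INFO", "").lower().endswith(".pdf")

        def start_with_header(status, headers, exc_info=None):
            if is_pdf:
                headers = list(headers) + [("X-Robots-Tag", "noindex")]
            return start_response(status, headers, exc_info)

        return app(environ, start_with_header)

    return wrapped


def demo_app(environ, start_response):
    """Hypothetical placeholder application that returns an empty 200 response."""
    start_response("200 OK", [("Content-Type", "application/octet-stream")])
    return [b""]


# The wrapped application can be served by any WSGI server.
application = noindex_pdfs(demo_app)
```

The header is added only to matching paths, so HTML pages keep relying on their own <meta> tags.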

Debugging noindex issues

Google has to crawl your page in order to see <meta> tags and HTTP headers. If a page is still appearing in results, it's probably because Google hasn't crawled the page since you added the noindex rule. Depending on the importance of the page on the internet, it may take months for Googlebot to revisit a page. You can request that Google recrawl a page using the URL Inspection tool.

If you need to remove a page of your site quickly from Google's search results, see Google's documentation about removals.

Another reason could be that the robots.txt file is blocking the URL from Google web crawlers, so they can't see the tag. To unblock your page from Google, you must edit your robots.txt file. You can edit and test your robots.txt using the robots.txt Tester tool.

Finally, make sure that the noindex rule is visible to Googlebot. To test if your noindex implementation is correct, use the URL Inspection tool to see the HTML that Googlebot received while crawling the page. You can also use the Index Coverage report in Search Console to monitor the pages on your site from which Googlebot extracted a noindex rule.

Make your links crawlable

Google can follow your links only if they use proper <a> tags with resolvable URLs:

Use proper <a> tags

Google can follow links only if they are an <a> tag with an href attribute. Links that use other formats won't be followed by Google's crawlers. Google cannot follow <a> links without an href attribute, or other tags that act as links because of script events. Here are examples of links that Google can and can't follow:

Can follow:

  • <a href="https://example.com">
  • <a href="/relative/path/file">

Note that links are also crawlable when you use JavaScript to insert them into a page dynamically as long as it uses the markup shown above.

Can't follow:

  • <a routerLink="some/path">
  • <span href="https://example.com">
  • <a onclick="goto('https://example.com')">

Link to resolvable URLs

Ensure that the URL linked to by your <a> tag is an actual web address that Googlebot can send requests to, for example:

Can resolve:

  • https://example.com/stuff
  • /products
  • /products.php?id=123

Can't resolve:

  • javascript:goTo('products')
  • javascript:window.location.href='/products'
  • #
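The two conditions above (a real <a> tag with an href, and an href that resolves to a web address) can be checked with a short Python sketch. CrawlableLinkCollector, is_resolvable, and crawlable_links are hypothetical names, and treating every fragment-only href (anything starting with #) as unresolvable is an assumption that generalizes the single # example in the list. The sketch inspects static HTML only.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


def is_resolvable(href):
    """Return True if href looks like an address a crawler could request."""
    href = href.strip()
    if not href or href.startswith("#"):
        return False
    return not href.lower().startswith("javascript:")


class CrawlableLinkCollector(HTMLParser):
    """Collects absolute URLs from <a href="..."> tags that pass is_resolvable."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return  # a span or other tag acting as a link is not followed
        href = dict(attrs).get("href")
        if href and is_resolvable(href):
            self.links.append(urljoin(self.base_url, href.strip()))


def crawlable_links(html, base_url):
    """Return the crawlable link targets found in html, resolved against base_url."""
    collector = CrawlableLinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links
```

Relative paths such as /products are resolved against the base URL, while routerLink attributes, onclick handlers, and javascript: targets are left out of the result.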

rel attributes

For certain links on your site, you might want to tell Google your relationship with the linked page. In order to do that, use one of the following rel attribute values in the <a> tag.

For regular links that you expect Google to follow without any qualifications, you don't need to add a rel attribute. For example:


<p>My favorite horse is the <a href="...">palomino</a>.</p>

For other links, use one or more of the following values:

rel="sponsored" Mark links that are advertisements or paid placements (commonly called paid links) with the sponsored value. More information on Google's stance on paid links.


<a rel="sponsored" href="...">Appenzeller</a>

Note: The nofollow attribute was previously recommended for these types of links and is still an acceptable way to flag them, though sponsored is preferred.

rel="ugc" Google recommends marking user-generated content (UGC) links, such as comments and forum posts, with the ugc value.

<a rel="ugc" href="...">Appenzeller</a>

If you want to recognize and reward trustworthy contributors, you might remove this attribute from links posted by members or users who have consistently made high-quality contributions over time. Read more about avoiding comment spam.

rel="nofollow" Use the nofollow value when other values don't apply, and you'd rather Google not associate your site with, or crawl the linked page from, your site. For links within your own site, use the robots.txt disallow rule.

<a rel="nofollow" href="...">Appenzeller</a>

Multiple values You may specify multiple rel values as a space- or comma-separated list. Examples:

<p>I love <a rel="ugc nofollow" href="...">Appenzeller</a> cheese.</p>
<p>I hate <a rel="ugc,nofollow" href="...">Blue</a> cheese.</p>
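If you audit links in your own tooling, both list styles need to parse to the same result. Here is a small Python sketch of that, where parse_rel and link_qualifiers are hypothetical helper names and KNOWN_REL_VALUES covers only the three values discussed in this section.

```python
import re

# The three link-qualifying values discussed in this section.
KNOWN_REL_VALUES = {"sponsored", "ugc", "nofollow"}


def parse_rel(rel):
    """Split a rel attribute value on whitespace and commas into a set of tokens."""
    tokens = re.split(r"[\s,]+", rel.strip())
    return {token.lower() for token in tokens if token}


def link_qualifiers(rel):
    """Return only the tokens that qualify the link for search engines."""
    return parse_rel(rel) & KNOWN_REL_VALUES
```

With this sketch, "ugc nofollow" and "ugc,nofollow" yield the same set, and unrelated tokens such as noopener are dropped by link_qualifiers.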


To be continued...

Thank you for learning with us. This episode is based on the Fundamentals of SEO Starter Guide by Google. Remember, you can always follow all of the episodes for "Google SEO Guide" from the document below:

How Will We Do It?

  • We will learn SEO
  • We will learn about Digital Marketing
  • We will not give up
  • We will not stop till we get results

We won’t care what anyone says, and SEO & Digital Marketing will change our lives!!!



More articles by Nazım Kopuz