Sitemaps for Dummies

I was recently "planning" (again) to build a blog and portfolio website. While doing so, I came across the term sitemap and recalled having seen it attached to XML somewhere. So, as always, I started with a YouTube search. I learned quite a bit from there, so let me condense it for you in case you are as curious about it as I was.

So, first of all: have you done any online shopping recently? If so, you can probably relate to the feeling of getting lost in an ocean of products and reviews. Still, we bravely wade through those uncharted territories, trusting that the top-level navbar has our back. Even if we land in some far corner of the site, the navbar at the top will help us bounce back. (In case that was not clear, I am talking about the options in the navigation menu, as well as the helpers that indicate which part of the site we are in.)

Now put yourself in the shoes of a web crawler that needs to index your website. It has to go through every part of your site, but it can only do so by following links and reading other metadata, and it has only finite resources for the job. We want to help crawlers do their job well, because pages that are discovered and indexed properly have a better chance of showing up in search results. So what can we, as developers or creators of the site, do so that crawlers can work more efficiently?

Well, we can create a file called sitemap.xml and upload it to our site so that it is publicly reachable, for example at www.abc.com/sitemap.xml. (As a side note, the file does not have to be named sitemap.xml. We can name it something else, even without the .xml extension, as long as we let search engines know where to find it, for example through a Sitemap line in robots.txt. I am not an expert in this, but based on my findings we can also run automation scripts that collect the parts of the site we want indexed and publish the sitemap on our site without any manual intervention.)
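One common way to tell crawlers where the sitemap lives is a Sitemap line in robots.txt. Here is a minimal sketch in Python that produces such a file. The function name robots_txt and the custom location /my-custom-map are made up for illustration, and the "allow everything" rule is just an assumption about what the site wants.

```python
def robots_txt(sitemap_urls):
    """Return robots.txt text that allows all crawlers and lists each sitemap."""
    # An empty Disallow means nothing is blocked for any crawler.
    lines = ["User-agent: *", "Disallow:", ""]
    # One Sitemap line per sitemap file; each must be a full URL.
    lines += [f"Sitemap: {url}" for url in sitemap_urls]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical sitemap location that is not named sitemap.xml.
    print(robots_txt(["https://www.abc.com/my-custom-map"]))
```

The output would be saved as robots.txt at the root of the site, so the sitemap itself can have any name.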

In the XML file, we list all the URLs of our website that we would like to have indexed, optionally with a priority for each URL (a hint between 0.0 and 1.0 that search engines are free to ignore). But there is a caveat to using a single XML file if you have a lot of links to expose: a single sitemap file can list at most 50,000 URLs and be at most 50 MB in size (uncompressed). So what if we need to go beyond these limits?
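To make this concrete, here is a rough sketch of how such a file could be generated with Python's standard library. The function name build_sitemap and the example pages are my own assumptions; the urlset, url, loc and priority tags and the namespace come from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace required by the sitemap protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML from an iterable of (url, priority) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, priority in pages:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        # Priority is optional in the protocol; it is a hint from 0.0 to 1.0.
        ET.SubElement(entry, "priority").text = f"{priority:.1f}"
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Hypothetical pages of an example site.
    print(build_sitemap([
        ("https://www.abc.com/", 1.0),
        ("https://www.abc.com/blog", 0.8),
    ]))
```

The returned string is what would be saved as sitemap.xml (or whatever name you chose) and uploaded to the site.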

Let me help you arrive at the answer. Our main problem was that we wanted to help the crawler do its job well. For that, we created an index of all the URLs of our website, so that the crawler can navigate the site more efficiently and index it properly for search engines.

Since we can create an index of all our URLs and put it in a file on our website, can't we also create an index that points to the locations of these files and let the crawler do its magic? Well, as you have guessed, yes, we can. We can create an index of the URLs of these different XML files (called a sitemap index) and let the crawler read this file first and then go through all the other XML files as it sees fit.
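The idea of splitting the URLs and pointing to the pieces from one index file can be sketched like this. The names chunk and build_sitemap_index, the product URLs and the sitemap-1.xml style file names are all made up for illustration; the sitemapindex, sitemap and loc tags follow the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000  # per-file URL limit from the sitemap protocol


def chunk(urls, size=MAX_URLS_PER_FILE):
    """Split a list of URLs into lists of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(sitemap_urls):
    """Build a sitemap index that points to each child sitemap file."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        entry = ET.SubElement(index, "sitemap")
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(index, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical site with more URLs than fit in one sitemap file.
all_urls = [f"https://www.abc.com/product/{n}" for n in range(120_000)]
parts = chunk(all_urls)

# One child sitemap per chunk; these file names are an assumption.
child_files = [
    f"https://www.abc.com/sitemap-{i}.xml" for i in range(1, len(parts) + 1)
]
index_xml = build_sitemap_index(child_files)
```

Each chunk would be written out as its own sitemap file, and only the index file needs to be announced to search engines.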

By the way, as a side note, there are two kinds of sitemaps. One is the kind we have been talking about, the XML-based sitemap. The other is the HTML-based sitemap. It is somewhat related to my introduction at the beginning of this article: an HTML sitemap is simply an HTML page that lists the site's links, much like an XML sitemap, but it is meant for users rather than crawlers, mainly to improve the user experience. Personally, I have mostly found them linked in the footer section, where most companies put a lot of links. An example would be www.abc.com/sitemap.html.
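For comparison, an HTML sitemap is just an ordinary page with a list of links. A minimal sketch, assuming a hypothetical build_html_sitemap function and made-up page titles, could look like this:

```python
from html import escape


def build_html_sitemap(links):
    """Build a simple HTML page from an iterable of (title, url) pairs."""
    items = "\n".join(
        # Escape titles and URLs so special characters do not break the markup.
        f'    <li><a href="{escape(url, quote=True)}">{escape(title)}</a></li>'
        for title, url in links
    )
    return (
        "<!DOCTYPE html>\n<html>\n<body>\n"
        "  <h1>Sitemap</h1>\n  <ul>\n"
        f"{items}\n"
        "  </ul>\n</body>\n</html>\n"
    )


if __name__ == "__main__":
    # Hypothetical sections of an example site.
    print(build_html_sitemap([
        ("Home", "https://www.abc.com/"),
        ("Blog & Notes", "https://www.abc.com/blog"),
    ]))
```

Unlike the XML version, there is no fixed format here; the page only needs to be readable for a human visitor.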

This was a very short introduction to sitemaps in general. If you would like to explore more, I suggest you refer to this link: sitemaps. It will give you a better understanding of sitemaps, as well as of the related terms that I only briefly introduced here.

Lastly, if anyone would like to add anything I might have missed, do comment below. I am always open to constructive criticism.
