[Case Study] - Does Google Index Angular Websites?
As a technology company, we often face difficult decisions about which technology to use for a given project. For most web applications, one of the deciding factors is organic search: making sure the application you build is SEO-friendly and will be indexed by Google. That matters most for content-based websites such as a digital magazine. Because I am not trying to promote this particular website, I won't mention its name. The site is heavily used, publishes around 10 articles a day from professional journalists, historically ran on WordPress, and had been ranking well for its articles for almost a decade. We were engaged to build a new platform for them that would be highly scalable and offer a unique experience for its users.
We researched several options in depth, including upgrading the existing WordPress multisite implementation, using React, and using Angular. We were concerned about going with a single page application (SPA), but we read many articles saying that Google is now able to index SPAs, with specific references to both Angular and React (we were willing to accept that other search engines might not index them well). Plus, we figured that since Google created Angular, surely they of all companies would be able to work with an Angular application. Based on this research and reasoning, we took the plunge and began a year-long effort to build a new kind of content management platform, starting with Angular 4 and launching with Angular 6. Shortly after launch, we ran into a series of significant problems related to social media and organic search (including Google). Let's walk through those problems quickly:
- Link previews didn't work. Sharing articles on social media sites like Facebook, Twitter, and LinkedIn became a problem. As you might know, most of these sites read the page's metadata to extract the title, description, and featured image. If they cannot find or read that metadata, you are left with a plain link at best.
- Much to our surprise and concern, Google wasn't finding our articles or even crawling the site. In fact, it detected only a couple of links on the Angular site, and those were just external includes in index.html. In other words, none of our valuable content could be found on Google.
Okay, the first issue with link previews was annoying, but the second was downright catastrophic. So I reached out to Rand Fishkin and said, no, screamed, "HELP!" He was kind enough to respond and to connect me with several really great SEO companies that helped us get to the bottom of the issues quickly. They confirmed that the reports we had read about Google indexing single page applications were inconsistent with their real-world experience as well. We talked through multiple options for server-side rendering the JavaScript code, both to make the website more crawlable and, at minimum, to fix the link preview problem. Our options were:
- Use prerender.io to quickly put server-side rendering in front of our Angular app, at least for Google and other search engines. However, we were told that caching and spider timing issues might cause errors in Google Webmaster Tools that could be beyond our control. (I'm sure there are workarounds, but I didn't want to take any chances. It's probably a fine service, so check it out yourself to see if it is right for you.)
- Run our own headless Chrome engine and make sure that any bots/spiders were served a pre-rendered version of the website while browser traffic got the regular client-side rendered code. This is essentially the same as the first option, but we would have more control over the cache.
- Implement Angular Universal, which would mean a relatively significant code rewrite and would require us to run our application inside Node.js instead of serving it the way we were on Azure. However, since Angular Universal is the Angular team's official solution to exactly these problems, we felt strongly encouraged to suck it up and go down this path.
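The second option (often called dynamic rendering) hinges on telling crawlers apart from browsers. A minimal sketch of that routing decision in TypeScript follows. The user-agent patterns and the `chooseRenderTarget` name are illustrative assumptions, not an exhaustive or authoritative bot list, and a real setup would sit in front of a headless Chrome renderer and a cache.

```typescript
// Illustrative (not exhaustive) patterns found in crawler and link-preview user agents.
const BOT_UA_PATTERNS: readonly RegExp[] = [
  /googlebot/i,
  /bingbot/i,
  /facebookexternalhit/i,
  /twitterbot/i,
  /linkedinbot/i,
];

type RenderTarget = "prerendered" | "spa";

// Hypothetical helper: decide whether a request should receive the
// pre-rendered HTML (bots) or the regular client-side app (browsers).
function chooseRenderTarget(userAgent: string | undefined): RenderTarget {
  if (!userAgent) {
    // No user agent at all: fall back to the normal SPA.
    return "spa";
  }
  return BOT_UA_PATTERNS.some((pattern) => pattern.test(userAgent))
    ? "prerendered"
    : "spa";
}
```

For example, a request carrying Googlebot's user agent would be routed to the pre-rendered copy, while a desktop Chrome user agent would get the SPA. Matching on user agents is simple but brittle, which is part of why keeping that cache and bot list under your own control matters with this approach.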
After a couple of months of additional development, painfully wrapping a regular Angular site with Angular Universal, we were ready for a second launch. At some point I'll document the trials and tribulations we went through getting our Angular app working well inside Node.js and Express with Angular Universal. It wasn't easy, mostly because we had a lot of issues with RxJS observables and changes made there that were incompatible with Angular Universal. In addition, we had to carefully wrap code that should run only on the server or only in the browser. Anyway, let's leave that for another day and discuss the results of the second launch.
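In Angular, the idiomatic tools for that wrapping are the `PLATFORM_ID` token (from `@angular/core`) together with `isPlatformBrowser`/`isPlatformServer` (from `@angular/common`). To keep the idea framework-free, the sketch below uses a plain check for browser globals instead. The names `isBrowser` and `createSafeStorage` are hypothetical, and the in-memory fallback is just one way to stop server-side rendering from crashing on browser-only APIs such as `localStorage`.

```typescript
// Minimal key/value interface shared by the browser and server implementations.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Browser globals we care about, typed loosely so this compiles without the DOM lib.
type MaybeBrowserGlobals = {
  window?: unknown;
  document?: unknown;
  localStorage?: KeyValueStore;
};

const runtimeGlobals = globalThis as unknown as MaybeBrowserGlobals;

// True only when browser globals exist (false under Node.js during server-side rendering).
function isBrowser(): boolean {
  return runtimeGlobals.window !== undefined && runtimeGlobals.document !== undefined;
}

// Hypothetical wrapper: real localStorage in the browser, an in-memory Map on the server.
function createSafeStorage(): KeyValueStore {
  if (isBrowser()) {
    // Only touch localStorage once we know we are in a browser.
    const browserStorage = runtimeGlobals.localStorage;
    if (browserStorage !== undefined) {
      return browserStorage;
    }
  }
  const memory = new Map<string, string>();
  return {
    getItem: (key: string): string | null => memory.get(key) ?? null,
    setItem: (key: string, value: string): void => {
      memory.set(key, value);
    },
  };
}
```

Components then depend on the `KeyValueStore` interface rather than on `localStorage` directly, so the same code path renders on the server and hydrates in the browser.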
The good news is that Angular Universal readily solved our link preview problem. Our Facebook, Twitter, and other shares looked beautiful again because the meta tags (title, description, featured image, etc.) were all dynamic and served up by Node perfectly. This is completely logical: a single page app literally lives in a single page, and Angular Universal works with Angular's Title and Meta services, which modify that page's metadata on the fly. For each Angular route you can set a new title, description, Open Graph tags, and any other tags you want in the head of the SPA. As the application changes routes, the tags in the document head are updated, giving the appearance of a new page (although really only index.html is being modified). With Universal, this also happens on the server, so crawlers and link-preview bots receive HTML that already contains the right tags.
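The services involved are Angular's `Title` (`setTitle`) and `Meta` (`updateTag`) from `@angular/platform-browser`. A framework-free sketch of the per-route data might build the tag definitions in the same shape as Angular's `MetaDefinition`, so each one can be passed to `meta.updateTag(...)`. The `ArticleSeo` shape and `buildArticleMetaTags` name are hypothetical.

```typescript
// Subset of the fields in Angular's MetaDefinition that matter for link previews.
interface MetaTag {
  name?: string;
  property?: string;
  content: string;
}

// Hypothetical per-article data, e.g. returned by a route resolver.
interface ArticleSeo {
  title: string;
  description: string;
  imageUrl: string;
  canonicalUrl: string;
}

// Build the description, Open Graph, and Twitter Card tags for one article route.
function buildArticleMetaTags(article: ArticleSeo): MetaTag[] {
  return [
    { name: "description", content: article.description },
    { property: "og:type", content: "article" },
    { property: "og:title", content: article.title },
    { property: "og:description", content: article.description },
    { property: "og:image", content: article.imageUrl },
    { property: "og:url", content: article.canonicalUrl },
    { name: "twitter:card", content: "summary_large_image" },
    { name: "twitter:title", content: article.title },
    { name: "twitter:description", content: article.description },
    { name: "twitter:image", content: article.imageUrl },
  ];
}
```

In a component or resolver you would then call `this.title.setTitle(article.title)` and, for each tag, `this.meta.updateTag(tag)`. Because Angular Universal runs those same calls on the server, the HTML that Facebook or Twitter fetches already contains the tags.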
The bad news (you knew it was coming) was that Google still did not crawl or index our articles well. Why? Our theory was that we never used anchor tags: we relied heavily on routerLink directives placed on divs rather than on anchor tags with hrefs. So no matter how we tweaked Angular Universal, Google still would not crawl the website well. We therefore created a set of new site directory pages, useful to some users as well as to search engines, that used anchor tags and hrefs everywhere (among other things), and built a collection of sitemap components to dynamically create and maintain sitemaps for our Angular app. The result from Google Webmaster Tools follows:
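The underlying issue is that crawlers follow `<a href>` links. In Angular, `routerLink` on an `<a>` element renders a real `href`, while `routerLink` on a `<div>` only handles clicks in the browser, so the server-rendered HTML carries no followable URL. The rough sketch below mimics a crawler's link discovery with a regular expression (a deliberate simplification; real crawlers use a full HTML parser) to show the difference. `extractCrawlableLinks` and the sample markup are hypothetical.

```typescript
// Pull href values from <a> tags, roughly the way a crawler discovers links.
// A regex is a simplification; production crawlers parse the full HTML.
function extractCrawlableLinks(html: string): string[] {
  const links: string[] = [];
  const anchorHref = /<a\b[^>]*?\bhref\s*=\s*["']([^"']+)["']/gi;
  let match: RegExpExecArray | null;
  while ((match = anchorHref.exec(html)) !== null) {
    const href = match[1];
    if (href) {
      links.push(href);
    }
  }
  return links;
}

// Roughly what server-rendered output looks like for the two patterns:
// the div keeps only a routerlink attribute, the anchor has a real href.
const renderedHtml = `
  <div routerlink="/articles/how-we-migrated" tabindex="0">How we migrated</div>
  <a href="/articles/sitemaps-still-work">Sitemaps still work</a>
`;
```

Running `extractCrawlableLinks(renderedHtml)` finds only the anchor's URL; the div-based link is invisible to anything that does not execute the app's click handlers.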
You can see the progress: we went from no pages indexed, to some pages with Angular Universal, to many more pages in July after we added dynamic sitemaps to our Angular Universal application. I can also say more about pages with many routerLinks versus pages with many href links. We fed Google around 5,000 pages (2,500 using routerLinks and 2,500 using hrefs). With very few exceptions, Google indexed the pages using href links and ignored (did not index) the pages using routerLinks. Since Google Webmaster Tools doesn't give me all the details, I can't calculate an exact percentage, but the results are strikingly weighted toward the pages without routerLinks.
So, how is any of this useful to other developers? Based on the evidence I have so far, and on what I took away from talking to Rand Fishkin and especially the consultants he connected me with, this is a very common result. That doesn't mean you shouldn't use Angular or React. It just means you need to plan and architect your solution so that your SPA will be crawlable. To help a little, I also want to give you a feel for how we implemented the dynamic sitemaps, which I think you should definitely incorporate into your Angular Universal applications.
First, we created a model for sitemaps (based on the sitemap schema Google supports, the sitemaps.org protocol), shaped so it would work well with our JSON-to-XML conversion library. The model looks like this:
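As an illustration only (this is not our production model, and every type and function name below is hypothetical), a TypeScript model following the sitemaps.org protocol (`urlset`, `url`, `loc`, `lastmod`, `changefreq`, `priority`) and shaped for xml-js's compact format (`_declaration`, `_attributes`, and `_text` keys) might look like this:

```typescript
// Allowed <changefreq> values from the sitemaps.org protocol.
type ChangeFreq =
  | "always" | "hourly" | "daily" | "weekly" | "monthly" | "yearly" | "never";

// One <url> entry; only loc is required by the protocol.
interface SitemapEntry {
  loc: string;             // absolute URL
  lastmod?: string;        // W3C datetime, e.g. "2018-07-15"
  changefreq?: ChangeFreq;
  priority?: number;       // 0.0 to 1.0
}

// xml-js "compact" shape: element names are keys, text lives in _text,
// attributes live in _attributes.
interface XmlText { _text: string }
interface CompactUrl {
  loc: XmlText;
  lastmod?: XmlText;
  changefreq?: XmlText;
  priority?: XmlText;
}
interface CompactSitemap {
  _declaration: { _attributes: { version: string; encoding: string } };
  urlset: { _attributes: { xmlns: string }; url: CompactUrl[] };
}

// Convert plain entries into the compact object that xml-js serializes.
function toCompactSitemap(entries: SitemapEntry[]): CompactSitemap {
  return {
    _declaration: { _attributes: { version: "1.0", encoding: "UTF-8" } },
    urlset: {
      _attributes: { xmlns: "http://www.sitemaps.org/schemas/sitemap/0.9" },
      url: entries.map((entry) => {
        const url: CompactUrl = { loc: { _text: entry.loc } };
        if (entry.lastmod) url.lastmod = { _text: entry.lastmod };
        if (entry.changefreq) url.changefreq = { _text: entry.changefreq };
        if (entry.priority !== undefined) url.priority = { _text: entry.priority.toFixed(1) };
        return url;
      }),
    },
  };
}
```

Passing the result to xml-js's `js2xml(model, { compact: true, spaces: 2 })` should then produce a `<urlset>` document with one `<url>` element per entry.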
We created a route for each sitemap and added a resolver to each route that iterated through the pages/routes of our website and filled our model with the absolute URLs of the "pages" we wanted Google to index. We then used the wonderful xml-js package to convert the model to XML and simply wrote it to our dist folder.
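A dependency-free sketch of that resolve step, assuming a hypothetical list of route paths and a base URL, is below. It turns relative paths into absolute URLs and serializes them directly, swapping xml-js for a few lines of string building (with XML escaping) so the example runs on its own. `PageRoute`, `toAbsoluteUrl`, and `buildSitemapXml` are illustrative names.

```typescript
// Escape the five XML special characters so URLs with query strings stay valid.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Hypothetical route info, e.g. collected from the article API inside a resolver.
interface PageRoute {
  path: string;      // relative route such as "/articles/42"
  lastmod?: string;  // W3C datetime
}

// Join base URL and path with exactly one slash between them.
function toAbsoluteUrl(baseUrl: string, path: string): string {
  return `${baseUrl.replace(/\/+$/, "")}/${path.replace(/^\/+/, "")}`;
}

// Build a sitemaps.org <urlset> document with absolute URLs.
function buildSitemapXml(baseUrl: string, routes: PageRoute[]): string {
  const urls = routes.map((route) => {
    const loc = escapeXml(toAbsoluteUrl(baseUrl, route.path));
    const lastmod = route.lastmod ? `<lastmod>${escapeXml(route.lastmod)}</lastmod>` : "";
    return `  <url><loc>${loc}</loc>${lastmod}</url>`;
  });
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...urls,
    "</urlset>",
  ].join("\n");
}
```

Writing the resulting string into the build output (for example with Node's `fs.writeFileSync("dist/sitemap.xml", xml)`, where the path is only an example) would mirror the final step described above.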
Finally, we created a component that redirects users to the newly created or updated sitemap. For Google, of course, we simply submit sitemap.xml directly in Google Webmaster Tools.
We will continue to monitor how Google crawls and indexes this site, but for now we feel we have a workable plan. If any of this doesn't make sense, or you find yourself struggling with your own Angular or React website, please feel free to reach out to me here on LinkedIn. I do my best to respond to any non-spammers who reach out, because I certainly appreciate the help of good folks like Rand Fishkin.
To summarize, there's no need to shy away from SPAs, but it is important to make sure that older but still significant technologies (like search engine spiders) can work with your site. You can build the coolest site in the world, but if social sites and search engines can't deal with it, it's likely to become an irrelevant cool site.
Tech Executive - Improving Lives through Technology
So there is one other important thing to note... YES, sitemaps still work. That has definitely been proven through this case study (at least for us... we will keep creating them).