Functional Cache - high performance on web portals

This article was originally published on 07.10.2011 on the LinkedPHPers blog. Due to problems with the Blogspot platform, it has been moved to the LinkedIn articles area.

I would like to share the idea behind a new caching architecture implemented in the COBAEX CMS system, prepared by the company I manage - COBA Solutions. It has already proven that it really rocks - and I think that sharing an idea of how to speed up page generation might be interesting.


But getting to the point - the very first question is:

Why are the standard caching mechanisms not enough?

Generally, standard caching mechanisms work in such a way that the page served by a server is copied either to the client or to caching (proxy) servers, and stays there until it changes. To determine whether the page has changed, the server needs to rebuild the page or use some information about when it was changed. If the page has not changed, the previously prepared page or page element is served; otherwise, the new version of the page is served (and overwrites the cached copy).
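To make the cost of this kind of validation concrete, here is a minimal sketch of one common variant: the server hashes the freshly built page and compares the hash with a validator sent by the client. The names `render_page` and `handle_request` are hypothetical, and real servers may just as well rely on modification timestamps instead of a hash.

```python
import hashlib
from typing import Optional

render_count = 0  # counts how often the expensive page build runs


def render_page(content: str) -> str:
    """Stand-in for an expensive, database-backed page build."""
    global render_count
    render_count += 1
    return f"<html><body>{content}</body></html>"


def handle_request(content: str, if_none_match: Optional[str] = None):
    """Return (status, etag, body) for one conditional request."""
    body = render_page(content)  # the server still rebuilds the page
    etag = hashlib.sha256(body.encode("utf-8")).hexdigest()
    if if_none_match == etag:
        return 304, etag, ""  # nothing is transferred, but the work was done
    return 200, etag, body


status, etag, body = handle_request("Hello")
status_again, _, body_again = handle_request("Hello", if_none_match=etag)
```

The second request saves the transfer, but in this variant the page is still generated on every request, which is exactly the cost discussed next.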

You can read much more about this in a nice article by Larry Garfield published on his blog - so I think there is no need to describe the whole mechanism in detail.

Instead, I will focus on why this mechanism is not enough right now. Generally, this mechanism focuses only on page transfer issues. It does not address the speed of page generation on the server.

Additionally, this mechanism needs the server to prepare a page many times - even though the page might be recreated again and again in exactly the same, unmodified form and content.

The question I asked myself was: why does the server need to recreate the same page again and again? Why can't the server keep copies of this page and serve only an HTML page?

The answer was that this is not the way most CMSes work :) So we decided to change it - and prepare a mechanism, which we called

Functional Cache

The word "functional" in this name is quite important. In simple words - we prepared something like a cache, kept on the server, that is rebuilt (pages or page fragments are regenerated) based not on page views, but on actual system functionality. What I mean is that the cache is rebuilt only if a system functionality (e.g. page editing or some dynamic portal functionality) triggers it - and only the pages that actually changed are rebuilt (so the rebuild is very partial).

A simple example of such regeneration (based on an actual COBAEX CMS implementation) might be the page https://www.cobasolutions.com/business_software/en_cms - this page will be regenerated only when somebody modifies the page content, or when the top or bottom menu is modified. In the case of the top or left menu, the page will be regenerated only if a specific level of this menu has changed (on this page the top and left menus are actually the same menu object with different levels - but this is a different topic, and I think it is not worth getting into those details in this article).
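The idea can be sketched roughly as follows. This is not the COBAEX CMS code; the class `FunctionalCache`, its methods, and the dependency keys such as `("menu", "top", 1)` are assumptions made up for illustration. Each page declares what it depends on, and an administrative action rebuilds only the pages that depend on the changed item.

```python
from collections import defaultdict


class FunctionalCache:
    """Stores rendered pages and rebuilds them only on change events."""

    def __init__(self, render):
        self.render = render                # callable: page_id -> HTML string
        self.pages = {}                     # page_id -> stored HTML
        self.dependents = defaultdict(set)  # dependency key -> page ids

    def register(self, page_id, depends_on):
        for key in depends_on:
            self.dependents[key].add(page_id)
        self.pages[page_id] = self.render(page_id)

    def notify(self, key):
        """Called by an admin action; rebuilds only pages that use `key`."""
        rebuilt = sorted(self.dependents.get(key, ()))
        for page_id in rebuilt:
            self.pages[page_id] = self.render(page_id)
        return rebuilt

    def serve(self, page_id):
        return self.pages[page_id]  # page views never trigger rendering


builds = []  # records every render call, to show when rebuilding happens


def render(page_id):
    builds.append(page_id)
    return f"<html>{page_id} v{builds.count(page_id)}</html>"


cache = FunctionalCache(render)
cache.register("en_cms", [("content", "en_cms"), ("menu", "top", 1)])
cache.register("contact", [("content", "contact"), ("menu", "top", 2)])

cache.serve("en_cms")
cache.serve("en_cms")                       # views do not rebuild anything
rebuilt = cache.notify(("menu", "top", 1))  # only level 1 of the top menu changed
```

Here only `en_cms` is rebuilt, because `contact` depends on a different level of the same menu.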

Architecture Approach

Implementing such an approach required a complete revision and redesign of the whole architecture. The "standard" architecture of CMS systems consists of 3 elements:

  • an administration panel that allows us to modify the pages, manage some specific functionality etc., and that stores all the changes in the database mentioned below
  • a database server that stores what the whole site looks like
  • a page generation system that is responsible for creating the pages using some templates (or not :) ) and the database mentioned above

In most solutions all these elements stay on a single physical server.

Generally this means the database is a kind of link between the administration panel and the actual page.

In the architecture that uses the functional cache approach we basically have very similar elements, but used in a different way:

  • an administration panel that allows us to modify the pages, manage some specific functionality etc., and that stores all the changes in the database
  • a database server that stores what the whole site looks like
  • an additional element - the full page generation system that "compiles" all the pages or page elements based on the database and templates, and saves them as HTML files. This mechanism also publishes the pages on the presentation server mentioned below
  • a presentation server storing and serving the prepared HTML pages that were "compiled" and published by the page generation system

In most implementations (the larger ones) the first 3 elements work on one physical server (accessed only by site admins, not site users), and the last one is a different server (or even standard hosting, as it does not use any sophisticated functionality) accessed by all site users.

The link between the presentation and administration layers is the publishing mechanism - a mechanism implemented on the secured administration server that puts the static HTML files (using FTP, sFTP or SCP) on the presentation server.
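A minimal sketch of the "compile, then publish" step described above might look like this. The names `compile_site` and `publish`, the template, and the page records are hypothetical, and a local file copy stands in for the FTP/sFTP/SCP transfer so that the example stays self-contained.

```python
import shutil
import tempfile
from pathlib import Path
from string import Template

PAGE_TEMPLATE = Template(
    "<html><head><title>$title</title></head><body>$body</body></html>"
)


def compile_site(pages: dict, build_dir: Path) -> list:
    """Render every page record to a static HTML file in build_dir."""
    build_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for slug, record in pages.items():
        target = build_dir / f"{slug}.html"
        target.write_text(PAGE_TEMPLATE.substitute(record), encoding="utf-8")
        written.append(target)
    return written


def publish(files: list, presentation_root: Path) -> None:
    """Copy compiled files to the presentation server's document root.

    A local copy is used here in place of a real remote transfer.
    """
    presentation_root.mkdir(parents=True, exist_ok=True)
    for path in files:
        shutil.copy2(path, presentation_root / path.name)


workdir = Path(tempfile.mkdtemp())
pages = {"index": {"title": "Home", "body": "Welcome"}}  # stand-in for database rows
publish(compile_site(pages, workdir / "build"), workdir / "public")
published = (workdir / "public" / "index.html").read_text(encoding="utf-8")
```

The presentation side only ever sees the files in `public`; it needs neither the templates nor the database.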

The main advantages of this architecture are:

  • increased security - site users access only a copy of the data - the actual data is kept secret on servers not accessible to unauthorized people. For standard sites, communication between the administration and presentation servers is one-directional - the administration server can connect to the presentation server (for publishing purposes), while the presentation server has no access to the administration one. For dynamic sites (e.g. community sites), site users still do not access the administration server directly - they can reach it only via the presentation server, which runs specific functionality that asks the administration server to do something.
  • scalability - the system is fully scalable, because we can have any number of presentation servers. What I mean is that we can add new presentation servers and publish the sites to them practically without changes in the code.
  • "working copies" of the page content - as the pages are created on the administration servers, we can use the "working copies" mechanism to publish multiple pages at one time. For example, we can create a whole new section of the site that covers many pages, and work on it using something we call a "temporary server". This server holds an additional copy of our site, not accessible to internet users, on which we can work on a page section, and only after it is fully designed and approved do we order the administration server to publish it on a presentation server. This means we will not end up in a situation where some pages on the production site are updated and some are not.
  • performance - last but definitely not least. I am not sure I should repeat the whole architecture description here, so I will just say that plain HTML pages are served - so the (presentation) server accessed by site users is focused only on file serving, not on any sophisticated functionality.
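The "working copies" and scalability points can be illustrated together with a small sketch. The function `promote` and the directory names are assumptions for illustration only: a whole approved section is prepared in a staging directory and then copied in one step to any number of presentation roots.

```python
import shutil
import tempfile
from pathlib import Path


def promote(staging_root: Path, presentation_roots: list) -> int:
    """Copy every approved file from staging to each presentation root."""
    files = [p for p in staging_root.rglob("*") if p.is_file()]
    for root in presentation_roots:
        for src in files:
            dest = root / src.relative_to(staging_root)
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dest)
    return len(files)


base = Path(tempfile.mkdtemp())
staging = base / "temporary_server"  # working copy, not publicly reachable
(staging / "products").mkdir(parents=True)
(staging / "products" / "index.html").write_text("<h1>Products</h1>", encoding="utf-8")
(staging / "products" / "cms.html").write_text("<h1>CMS</h1>", encoding="utf-8")

# Adding another presentation server is just one more entry in this list.
servers = [base / "presentation_1", base / "presentation_2"]
copied = promote(staging, servers)
```

Note that this simple copy loop is not atomic; a production setup would need something stronger (for example, publishing into a fresh directory and switching to it) to guarantee that visitors never see a half-updated section.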

Performance tests

Even though I believe the architecture description shows that such systems simply must be faster than "standard" CMSes, we decided to run several tests comparing pages created on Joomla and on our system.

The test procedure was to install, on the same virtual machine, a standard Joomla installation and our system serving the same pages that were prepared by the Joomla team. We used JMeter, restarting the VM before each test, and the results were as follows:

  • 50 users / 1 second, 10 repeats - both systems failed to serve requests in real time - average request time: Joomla - 10136ms; COBAEX CMS - 1317ms (7.7x faster); request time for 90% of requests: Joomla - 13653ms; COBAEX CMS - 1730ms (7.89x faster)
  • 50 users / 60 seconds, 10 repeats - COBAEX CMS serves pages in real time, Joomla still fails to serve requests in real time - average request time: Joomla - 3740ms; COBAEX CMS - 26ms (143.85x faster); request time for 90% of requests: Joomla - 5914ms; COBAEX CMS - 38ms (155.63x faster)
  • 50 users / 120 seconds, 10 repeats - both systems serve pages in real time - average request time: Joomla - 170ms; COBAEX CMS - 29ms (5.86x faster); request time for 90% of requests: Joomla - 171ms; COBAEX CMS - 26ms (6.77x faster)

These tests show that even when the compared system, designed in the standard architecture, serves pages in real time, the architecture proposed by COBAEX CMS is still faster - 5.86 times faster even in the worst-case scenario.
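For readers who want to check the "times faster" factors, they are simply the Joomla time divided by the COBAEX CMS time. The short sketch below recomputes them for the average request times listed above; the dictionary layout and scenario labels are just a convenient assumption.

```python
# Average request times in milliseconds, copied from the results above.
average_ms = {
    "50 users / 1 s": (10136, 1317),
    "50 users / 60 s": (3740, 26),
    "50 users / 120 s": (170, 29),
}

# Speedup factor = Joomla time / COBAEX CMS time, rounded to two decimals.
speedups = {
    scenario: round(joomla / cobaex, 2)
    for scenario, (joomla, cobaex) in average_ms.items()
}

for scenario, factor in speedups.items():
    print(f"{scenario}: {factor}x faster")
```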

Implementations

This system is not something brand new - the architecture has already been proven in several implementations. I think it might be useful to mention them, together with a short description of the advantages for each specific implementation.

Akademickie Biuro Karier Uczelni Heleny Chodkowskiej (High School Academic Career Portal) - the COBAEX CMS system allowed us to create a site that is very secure in terms of user authorization and CV storage. The actual CVs are stored on the administration server, and the presentation server requests a single CV only when it is needed. This means that if somebody wanted to get e.g. all the CVs from the system, it would hardly be possible, as they would have to break into the administration system to get them. Breaking into the presentation system would give them no more than one CV.

As for authorization - it is done using LDAP servers working directly in the School network, and access to those servers is allowed only from the administration server. This means that the actual user authorization process goes through 3 servers, which makes it quite secure :)

Uczelnia Heleny Chodkowskiej w Warszawie (High School Homepage) - the COBAEX CMS system is used to administer a website with 500+ subpages, using subdomains, different news sections and menus, and administered by 20+ users with full user rights management. The 2-server architecture additionally allowed us to put the administration servers in the closed high school network, while the page itself is served using standard hosting services.

The management system is centralized not only for this site, but also for several other sites - 2 additional schools and the academic career portal mentioned before. All of this is administered through a single interface and a single sign-on system (while the pages are placed on several different hosting sites).

twoje-zdanie.pl opinion portal - actually the first implementation of the COBAEX CMS system that really makes good use of the functional cache mechanisms in terms of performance. The portal allows users to add and publish opinions about different companies. This means it stores a large number of companies, each having a details page, opinion lists, opinion details and comment details pages. Thanks to the "compilation" mechanisms the site is really very fast, even with quite limited resources.

This portal was based on a quite old version of the system - the compilation mechanisms we have right now are incomparable to the ones used on twoje-zdanie.pl - however, the portal still rocks in terms of speed and performance :)

Domoklik.pl real estate offers portal - this is the newest implementation of the COBAEX CMS system. The newest version allowed us to create a portal that is actually the fastest one in Poland - even though it has the largest number of offers in the database.

The implementations mentioned here are of course not all of the COBAEX CMS implementations. I thought, however, that there is no need to mention all of them (especially as for some I cannot even say they are ours :) ), and the ones mentioned already show that the architecture is proving itself.

Some final words

OK - the article got quite long - but I hope it was still interesting.

One very important thing about the described solution is that it can work in parallel with existing, commonly used technologies such as caching proxies, request compression and so on. The functional cache architecture works only on the server side - so there are no problems with adding other optimization methods.

Also, this article describes only the general idea - how it is used varies a lot depending on the specific implementation. For example, in the case of Domoklik we compile parts of the page, not whole pages. The engine, however, allows us to add full page compilation, which might make the portal much faster. At the moment we have decided not to implement that - as even without it we have a portal that loads in 1.4 sec :)
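Compiling parts of a page rather than whole pages can be pictured with the following sketch. It is only an illustration under assumed names (`compile_fragment`, `assemble`, and the fragment contents are made up), not the Domoklik implementation: fragments are stored precompiled, and a request only joins them together.

```python
fragments = {}  # fragment name -> precompiled HTML


def compile_fragment(name, html):
    """Run by the compilation step whenever a fragment's data changes."""
    fragments[name] = html


def assemble(layout):
    """Run per request: join precompiled fragments, no rendering."""
    return "".join(fragments[name] for name in layout)


compile_fragment("header", "<header>Offers portal</header>")
compile_fragment("offer_list", "<ul><li>Flat, 50 m2</li></ul>")
compile_fragment("footer", "<footer>Contact</footer>")

layout = ["header", "offer_list", "footer"]
page = assemble(layout)

# Only the changed fragment is recompiled; the others are reused as stored.
compile_fragment("offer_list", "<ul><li>Flat, 50 m2</li><li>House, 120 m2</li></ul>")
updated_page = assemble(layout)
```

Full page compilation would move even the `assemble` step out of the request path, at the price of rebuilding every page that contains a changed fragment.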
