Writing cloud native software
How to design and create software truly prepared to work in a #serverless environment in the cloud, without losing all your capital.
#cloudcomputing environments are proliferating, making it cheaper to host data and applications. Nowadays, #startups can appear out of nowhere, and all it takes to configure your computing resources is a tablet.
#cloudproviders, like #AmazonWebServices, #GoogleCloud or #MicrosoftAzure, offer extremely competitive prices and benefits for #startups.
However, never forget one thing: They do it for profit!
If you have already studied the "offers" of some #cloudproviders, either for certification or just for knowledge, you must have noticed a strange pattern in their services. Didn't notice anything strange? OK... The services offered are generally very sliced and segmented, and they charge separately for things that should be provided together!
For example, instead of charging for web hosting, which would already include instances, storage, load balancing and a firewall (the bare minimum for any website), they slice these services and charge for each one separately. Worse, the billing rules are not always clear.
Okay, every #cloudprovider has a page or app to estimate the monthly cost of the services you need. Even so, many services have blurred interdependencies, which makes it (very) difficult to estimate their cost correctly. In fact, you are almost always surprised by the charge on your credit card at the end of the month.
And there's no use complaining! All #cloudproviders are like that.
Typical cloud hosting services
Hosting an application on a #cloudprovider means running your software code and storing the data it needs.
All providers charge for network traffic. With some providers, only outgoing traffic (leaving the provider for the Internet) is charged; with others, both directions are. It is usually billed per gigabyte transferred, but there may be minimum charges (amounts billed regardless of traffic volume) and volume discounts.
Whether for storage or computing, there are provisioned services, i.e., IaaS, and managed services (PaaS). With provisioned services, you pay for the allocated infrastructure and for the time it is active. Managed services, on the other hand, have a different billing model, based on executed transactions, data actually stored, processing, and request concurrency.
Let's start with data storage. For unstructured data there are usually a few options, for example: virtual disks, shared file systems or object storage.
For these mechanisms, you are charged by the size actually stored or provisioned, depending on the mechanism. There may be a minimum billable size, and there will almost certainly be access charges: the more clients and the greater the volume accessed, the more you will pay.
And there are computing services, that is, the means by which you host and execute your software. Likewise, they can be provisioned or managed.
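As a rough illustration of how traffic billing adds up, the sketch below applies a tiered per-GB price to a month of outbound traffic, with a free allowance. The tier boundaries and prices are hypothetical numbers chosen for the example, not any provider's real price list, and the function ignores inbound traffic (which some providers also bill).

```python
# Hypothetical tiered egress pricing: (upper bound of the tier in GB, USD per GB).
# Illustrative numbers only; they are not any provider's real prices.
EGRESS_TIERS = [
    (100, 0.00),           # first 100 GB per month free (assumed allowance)
    (10_240, 0.09),        # up to 10 TB
    (51_200, 0.085),       # from 10 TB to 50 TB
    (float("inf"), 0.07),  # everything above 50 TB
]


def egress_cost(gb_out: float, tiers=EGRESS_TIERS) -> float:
    """Monthly cost of outbound traffic, charging each GB at the rate of its tier."""
    cost = 0.0
    lower = 0.0
    for upper, price_per_gb in tiers:
        if gb_out <= lower:
            break
        billable = min(gb_out, upper) - lower  # GB that fall inside this tier
        cost += billable * price_per_gb
        lower = upper
    return round(cost, 2)


if __name__ == "__main__":
    # 2 TB out in a month: 100 GB free + 1,948 GB at $0.09
    print(egress_cost(2_048))  # 175.32
```

The point is not the exact figure but the shape: the bill grows with every gigabyte your application sends out, so chatty designs cost more.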
Currently, we have managed options like #serverless, in which you host your code as a function (#faas), and the platform takes care of invoking it. In this case, you pay for the number of transactions (invocations of your function) and for the actual execution time of each transaction. There may also be a charge for the amount of data transferred.
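The two #serverless billing dimensions mentioned above (number of invocations and execution time) can be combined into a simple monthly estimate. In this sketch, execution time is measured in GB-seconds (memory allocated multiplied by duration), a unit several providers use; the prices in `FaasPricing` are placeholder assumptions, so plug in your provider's current values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FaasPricing:
    """Placeholder FaaS prices (assumptions; check your provider's price list)."""
    per_million_requests: float = 0.20   # USD per 1M invocations
    per_gb_second: float = 0.0000166667  # USD per GB-second of execution


def faas_monthly_cost(invocations: int, avg_duration_ms: float, memory_mb: int,
                      pricing: FaasPricing = FaasPricing()) -> float:
    """Estimate a month of FaaS usage: request charge + execution-time charge."""
    request_cost = invocations / 1_000_000 * pricing.per_million_requests
    # Execution is billed as allocated memory (GB) x duration (s), summed over all calls.
    gb_seconds = invocations * (avg_duration_ms / 1000) * (memory_mb / 1024)
    compute_cost = gb_seconds * pricing.per_gb_second
    return round(request_cost + compute_cost, 2)


if __name__ == "__main__":
    # 3M invocations/month, 200 ms each, 512 MB of memory
    print(faas_monthly_cost(3_000_000, 200, 512))  # 5.6
```

Data-transfer charges are not included here. Note how doubling the average duration roughly doubles the compute part of the bill: in #serverless, slow code is expensive code.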
The traditional option is to use IaaS and provision a virtual infrastructure to host your source code and data. You will pay for the time that this infrastructure is active (running) and for the amount of network traffic, in addition to storage.
You can handle concurrency by providing elastic scalability, with load balancers and autoscaling (automatic instance provisioning) mechanisms. Your infrastructure can then grow or shrink with traffic or the number of users.
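Autoscaling services usually let you declare a rule rather than code it. The sketch below shows, in plain Python, the kind of target-tracking arithmetic such a rule performs: given the total load and a target load per instance, compute how many instances should be running, clamped to a minimum and a maximum. The metric (requests per second), the target value and the limits are assumptions for illustration; in practice you configure them in the provider's autoscaling service instead of writing this yourself.

```python
import math


def desired_instances(total_rps: float, target_rps_per_instance: float,
                      min_instances: int = 1, max_instances: int = 10) -> int:
    """Target-tracking style rule: run enough instances to keep each one near the
    target load, never going below min_instances or above max_instances."""
    if target_rps_per_instance <= 0:
        raise ValueError("target_rps_per_instance must be positive")
    needed = math.ceil(total_rps / target_rps_per_instance)
    return max(min_instances, min(max_instances, needed))


if __name__ == "__main__":
    for load in (0, 120, 480, 5_000):
        print(load, "req/s ->", desired_instances(load, target_rps_per_instance=50))
```

The maximum is what protects your credit card: without it, a traffic spike (or an attack) scales your bill along with your fleet.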
Contrasting the provisioned and managed computing modes, #serverless clearly offers greater cost flexibility: you only pay for transactions, not for keeping the code hosted.
But there are some pitfalls! Managed mode offers ways to provision concurrency in advance, giving your function the scalability it needs, and this can increase the cost significantly, since provisioned capacity is typically billed for as long as it is configured, even while it sits idle.
Traditional programming
If you are a corporate developer, you don't care about this scalability issue. After all, you're going to host your app in the company's data center and damn the rest.
That's why companies are moving to the cloud! They are tired of investing large sums in IT infrastructure, which keeps growing year after year. And this investment is what is called CAPEX: Capital Expenditures or capital investments.
CAPEX means investing money in expanding data center capacities, without being sure that there will be a return on that.
Cloud environments represent OPEX: Operational Expenditures, or operating costs, covered by your company's revenue. That is, you spend only to produce your services.
As the cost of on-premises hosting (hosting your app in your own datacenter) has already been paid, developers do not worry about optimization. They use and abuse memory and CPU as if there were no tomorrow; after all, response time is the most important metric.
Complex class structures, monolithic applications, concurrent programming mechanisms inside the code and gigantic loops are common in most apps developed in house.
#cloudnative Applications
Of course, you can take the application model that runs today in your datacenter to the cloud. It's called #liftAndShift migration.
Honestly, lift and shift is something you should never do, as it reduces the benefits of using the cloud, and can bring new complications for you.
Remember what I wrote about "traditional programming"? Yes ... You are bringing code produced without any concern for cost optimization to the cloud.
It is best to refactor your application to be #cloudnative.
But what is a #cloudnative application?
According to Wikipedia:
Cloud native computing is an approach in software development that utilizes cloud computing to "build and run scalable applications in modern, dynamic environments such as public, private, and hybrid clouds". Technologies such as containers, microservices, serverless functions and immutable infrastructure, deployed via declarative code are common elements of this architectural style.
One of the fundamental aspects to optimize costs in #cloudnative applications is to "flatten" and "dilute" your code. No complex hierarchies, no monolithic modules, no playing around creating #threads and no megalomaniac loops.
Your code has to be divided into small #stateless functions, which collaborate with each other using queues. All scalability, security and transport issues must be left to the infrastructure.
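A minimal sketch of the idea, assuming two hypothetical functions (`validate_order` and `create_invoice`) and using Python's in-process `queue.Queue` as a stand-in for a managed queue service. Each function receives everything it needs in the message and keeps no state between calls, so the platform could run any number of copies; `drain` only imitates the platform delivering messages.

```python
import json
import queue
from typing import Callable, List

# In-process stand-ins for managed queues (an assumption made only for this sketch).
orders_queue: "queue.Queue[str]" = queue.Queue()
invoices_queue: "queue.Queue[str]" = queue.Queue()


def validate_order(message: str) -> None:
    """Stateless step 1: reads one message, forwards valid orders, keeps nothing."""
    order = json.loads(message)
    if order["quantity"] > 0:
        invoices_queue.put(json.dumps(order))  # hand off to the next function


def create_invoice(message: str) -> dict:
    """Stateless step 2: everything it needs travels inside the message."""
    order = json.loads(message)
    return {"order_id": order["id"], "total": order["quantity"] * order["unit_price"]}


def drain(q: "queue.Queue[str]", handler: Callable[[str], object]) -> List[object]:
    """Imitates the platform: deliver each queued message to a function invocation."""
    results = []
    while not q.empty():
        results.append(handler(q.get()))
    return results


if __name__ == "__main__":
    orders_queue.put(json.dumps({"id": 1, "quantity": 2, "unit_price": 10.0}))
    orders_queue.put(json.dumps({"id": 2, "quantity": 0, "unit_price": 5.0}))
    drain(orders_queue, validate_order)
    print(drain(invoices_queue, create_invoice))  # [{'order_id': 1, 'total': 20.0}]
```

Because neither function holds state, scaling is the platform's problem: it simply starts more invocations when the queues grow.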
Of course, you can use all this and allocate a VM in the cloud, what we call a VPS (Virtual Private Server). However, the larger your software's footprint, the larger the VPS you will have to allocate, and the more you pay for it. If you choose to make it #serverless, the cost will be per transaction, but there are many things you will have to give up in your code.
Here I will address three aspects that you should externalize, that is, leave up to the infrastructure.
Security
Security is an overloaded term in IT. Here, I'm talking about authentication, authorization and encryption mechanisms.
Every cloud provider has declarative and transparent services covering the various aspects of security. For example, instead of creating a user account database with authorization levels, through which you authenticate and authorize your users, use a federation mechanism such as SAML (Active Directory, Oracle etc.), OIDC (Google) or OAuth (Facebook). The provider maps the user's external identity to an identity and a set of permissions in your cloud account, which can be controlled declaratively through the cloud provider's API.
Remove all aspects of authentication and authorization from your code, passing the responsibility to the cloud provider.
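As one concrete case, on AWS an HTTP API in API Gateway can be configured with a JWT authorizer pointing at your identity provider; the gateway validates the token and passes the decoded claims to the function in `event["requestContext"]["authorizer"]["jwt"]["claims"]`. The sketch assumes that setup (other providers expose the caller's identity differently), and the greeting response is purely illustrative. Note what is missing: no login, password, user table or token-verification code.

```python
import json


def handler(event: dict, context: object = None) -> dict:
    """Function behind an HTTP API whose JWT authorizer (assumed setup) has already
    validated the token and enforced the route's scopes. No passwords, user tables
    or token checks live here: the code only reads who the caller is."""
    claims = (
        event.get("requestContext", {})
        .get("authorizer", {})
        .get("jwt", {})
        .get("claims", {})
    )
    user_id = claims.get("sub")  # "sub" is the standard subject (user id) claim
    if user_id is None:
        # Only reachable if the route was left unprotected; fail closed.
        return {"statusCode": 401, "body": json.dumps({"error": "unauthenticated"})}
    return {"statusCode": 200, "body": json.dumps({"message": f"hello, {user_id}"})}


if __name__ == "__main__":
    # Trimmed-down example of the request context the gateway would pass in.
    sample_event = {"requestContext": {"authorizer": {"jwt": {
        "claims": {"sub": "user-123"}, "scopes": ["orders.read"]}}}}
    print(handler(sample_event))
```

Which routes require which scopes is declared in the gateway configuration, so changing permissions does not require redeploying the function.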
Take the work of encrypting and decrypting data out of your code, relying instead on the cloud provider's encryption at rest and in transit.
Scalability
Dude, #scalability is an infrastructure task, so: take that responsibility out of your code!
Many corporate applications launch new threads or processes to scale out, or to process things in parallel. This is harmful and reduces the opportunities for cost optimization in the cloud environment.
In a cloud environment, scalability is horizontal, implemented with load balancers and autoscaling (automatic instance provisioning). You can and should use the cloud provider's native mechanisms and services for this. Spawning processes only works in fairy tales. It's a 1970s thing and must be avoided.
The same goes for parallelization (which is a form of scalability): when we have time-consuming tasks, we open threads (or processes) to split the processing. Remember that scalability in the cloud must be horizontal!
Make the request asynchronous, with a status route the client can poll to know when the work is done. The function that receives the request must put it on a queue (every cloud provider has queue services), so that other worker functions can process it in parallel.
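The asynchronous pattern can be sketched as three small functions: one that accepts the request and answers immediately with a job id and a status URL (HTTP 202 Accepted), one behind the status route, and a worker that consumes the queue. A dict and a `queue.Queue` stand in for the managed database and queue service, and the route paths, job fields and the "sum the numbers" task are hypothetical.

```python
import queue
import uuid
from typing import Dict

# Stand-ins for managed services (assumptions for this sketch):
jobs: Dict[str, dict] = {}                      # job-status table (a managed database)
work_queue: "queue.Queue[str]" = queue.Queue()  # work queue (a managed queue service)


def submit(payload: dict) -> dict:
    """POST /jobs: record the job, enqueue it and answer at once with 202 Accepted."""
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"status": "PENDING", "payload": payload, "result": None}
    work_queue.put(job_id)
    return {"statusCode": 202, "job_id": job_id, "status_url": f"/jobs/{job_id}"}


def status(job_id: str) -> dict:
    """GET /jobs/{job_id}: the status route the client polls until the job is done."""
    job = jobs.get(job_id)
    if job is None:
        return {"statusCode": 404}
    return {"statusCode": 200, "status": job["status"], "result": job["result"]}


def worker() -> bool:
    """One worker invocation: take a single job off the queue and process it."""
    try:
        job_id = work_queue.get_nowait()
    except queue.Empty:
        return False
    job = jobs[job_id]
    job["status"] = "RUNNING"
    job["result"] = sum(job["payload"]["numbers"])  # stand-in for the slow task
    job["status"] = "DONE"
    return True


if __name__ == "__main__":
    resp = submit({"numbers": [1, 2, 3]})
    print(status(resp["job_id"])["status"])  # PENDING
    worker()
    print(status(resp["job_id"]))  # {'statusCode': 200, 'status': 'DONE', 'result': 6}
```

In the cloud, `submit` and `status` would sit behind an API gateway and the platform would invoke many `worker` instances in parallel as messages arrive; no thread is ever created in your own code.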
Data loops
Another persistent habit of corporate developers is the data loop, in which a given dataset is processed from start to finish. It can be a database query or a data file.
This type of processing, known as batch processing, is time-consuming and can affect concurrency in your database, in addition to increasing processing costs.
Leaving a VPS running to process batch jobs is a real waste of money in the cloud, and scheduling managed functions to do it is a waste of money too.
It would be better to replace this with streams and events. Every cloud provider has storage services capable of generating streams and events, and can invoke functions on demand.
Let's say you want to process the day's new orders to generate a report. Instead of writing a program that scans the tables once a day, generate an event as each new order arrives. Feed these events into a stream that appends the new data to an output file, which can then be used by reporting software like JasperReports, for example.
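A sketch of the per-order alternative to a nightly scan: a function invoked once for each new order that appends a single row to a CSV report. It assumes the trigger (for example, a database stream or storage event) has already been unwrapped into a plain order dict with hypothetical fields `id`, `customer` and `total`; the local CSV file is only a stand-in for wherever your provider lets you accumulate the stream's output.

```python
import csv
from pathlib import Path

REPORT = Path("daily_orders.csv")     # hypothetical output read by the reporting tool
FIELDS = ["id", "customer", "total"]  # hypothetical order fields


def on_new_order(order: dict, report_path: Path = REPORT) -> None:
    """Invoked once per new order (e.g. from a database stream or storage event).

    Appends one row to the report instead of re-scanning the orders table later,
    so each invocation does a small, constant amount of work.
    """
    write_header = not report_path.exists()
    with report_path.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS, extrasaction="ignore")
        if write_header:
            writer.writeheader()
        writer.writerow(order)
```

The cost now follows the business: a quiet day triggers few invocations, and there is no batch job holding database capacity while it walks through the whole table.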
Why avoid it? Data loops can consume your provisioned capacity, as well as being very expensive. Better to anticipate as much as possible, generating events when they happen and avoiding further batch processing.
Conclusion
I have several examples, from my own experience and that of others, of how this type of thing (scalability and data loops) increases cloud costs. However, I didn't want to go into detail for each provider in this article; my aim was just to give general advice on how to optimize costs and prepare your application to be #cloudnative.
Cleuton Sampaio, M.Sc., Cloud Architect