On Building A Badass Modern Internet-scale App...
TL;DR: Use ServiceFabric
Every generation has its big moments, and each has the sense of being lucky for the times they live in and the leaps they've made in improving human life. Think about being around when the electric bulb was invented, when the Wright Brothers flew the first airplane in 1903, or when Beethoven composed the Choral Symphony in 1824. The list is endless, and we have our moments too. Have you tried the new McBigData burger nearby? Or the new SnapChat filter? :D Kidding, this isn't about any of that. One of the defining magnificences of our time is the internet and the democratization of information that occurred as a result. This also leaves the tech industry with disproportionate power to influence almost every walk of life, due to its close proximity and familiarity with newer developments. One such development is what we call the cloud. While cloud vendors can run you through a gazillion slides on what it is and how it benefits you, the most remarkable thing IMO is its ability to democratize computing at a marginal cost. Almost anybody with an internet connection can learn almost anything they choose off the internet, and massive computing capability is available to them; some might say a kid in a dorm room today has access to more computing power than George H. W. Bush did as president in 1990. With enough resourcefulness, there are enough technological resources at anybody's command these days!
This article is my two cents on how you might go about building an application on the cloud that can scale to millions of users, no matter where they are and on whatever device they choose. The idea of building something that's truly web scale and meets users where they are, without giving them a sense of the complex engineering underneath, sounds like fun. Also, as architects it is sometimes easy to get lost in a higher level of useless abstract correctness and use words like microservices, containers, or whatever-is-new-and-fashionable, and eventually end up sounding like marketing; marketing that promises the moon but fails to deliver. This is a break from that; this is about how to deliver the moon.
With that too-long an intro, we'll set out to build a prototype of a mini social media analytics app. Imagine an app that a marketer would find useful; say somebody wants to measure the reaction or sentiment of people towards a certain event, product, campaign, political initiative, etc. Our app would do that by searching Twitter for the latest and most popular tweets on the subject, running sentiment analysis on each tweet, and aggregating the results so an individual can gauge public sentiment and trends.
So, what do we need to make this app with the most modern tech, using an architecture model found at some of the cutting-edge tech companies with millions of users? We'll learn just that: what is different about these new-age microservices-based app design methods compared to the conventional monolithic architecture. Then we'll delve into developing and deploying an application built with a microservices-based architecture.
Before we dive deep into the creation of a microservices app, let's understand what existed before it, or rather an alternative architectural approach: the monolith. This usually means you have a front end where all of the customer interaction happens, a middle tier with all the business logic, and behind that a database that stores persistable data. This works just fine for most applications, and in no way is it a bad design. In most cases there's no reason to redo your application just because microservices-based architecture is around. However, if you need low-latency communication between components of an application, or you need absolute freedom and control for various development groups to work on the features they are responsible for, or you need scalability that can be finely controlled, microservices is something you could consider. Consider the ability to scale the advertising component of your web application without scaling all of the other components that make up the front end. Or, let your users spend time on the website while the shopping cart component is down; in a monolith scenario, the equivalent is that your entire web front end is down. The pros and cons of this approach are here.
In our discussion, the Twitter Analytics app will have its heavy lifting done by a few microservices, as I'll outline below. We'll have four components making up the entire analytics engine, hosted on Service Fabric. Hold on! What is Service Fabric? It is Microsoft's distributed microservices platform. It takes care of the underlying infrastructure management and allows for easy packaging, deployment, and management of application components, or microservices. We'll get into the finer details of Service Fabric, with some context, as we explore each component of our application.
The application will have a stateless OWIN self-hosted API gateway that takes in a search query; the query is passed along to a stateful ReadTweets service. The ReadTweets service queries Twitter for the most relevant tweets in the last two weeks. For the ReadTweets service to gather tweets, it needs authentication; this auth token is provided by a stateless microservice called TwitterOAuthService. Finally, the ReadTweets service calls a stateless SentimentAnalysis service that calls the Google Cloud Natural Language API to get a sentiment score for each tweet. The score aggregation and averaging are done within the ReadTweets service until the tweet stream ends. State is important for this service because the running average and the counter dictionaries live here; losing them on a failure would mean polling Twitter all over again to redo the query, which would be a bad design. I'll clarify this as we delve into the details of each service below.
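For service remoting between the gateway and ReadTweets to work, the two services need a shared contract. A minimal sketch of what that interface could look like (beyond the TriggerReadAsync method used later, the exact shape is my assumption, not the verbatim source):

```csharp
using System.Threading.Tasks;
using Microsoft.ServiceFabric.Services.Remoting;

// Shared contract between the API gateway and the ReadTweets service.
// Deriving from IService marks it as callable over Service Fabric service remoting.
public interface IReadTweets : IService
{
    // Kicks off a Twitter search for the given query; the stateful
    // service aggregates the sentiment results internally.
    Task TriggerReadAsync(string query, string queryid, string usersid);
}
```

Both the gateway project and the ReadTweets project reference this interface, which is what lets ServiceProxy generate a typed client on the gateway side.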
GatewayService - [You could alternately read the MS docs on Web API in SF here.] The purpose of this service is to accept a search query into the system from the internet. We can learn some important quirks of Service Fabric with this service. Visual Studio gets you started with a project template for a stateless Web API communication listener. This is very similar to a regular Web API controller; a search query, a unique query id, and a user identifier are captured here in my boilerplate ValuesController:
public string Get(string queryid, string query, string usersid)
{
    string path = "fabric:/TwitterAnalytics/ReadTweetsService";
    IReadTweets ReadTweetsClient = ServiceProxy.Create<IReadTweets>(new Uri(path), new ServicePartitionKey(1));
    // fire-and-forget; the stateful service does the heavy lifting
    Task.Run(() => ReadTweetsClient.TriggerReadAsync(query, queryid, usersid));
    return "value";
}
The method above takes in the values via the query string, and it's fairly easy to test from a browser. I'll draw your attention to the path and the ReadTweets client reference made in the above snippet. Service Fabric has multiple ways of doing inter-service communication; the method used here is called service remoting. Inter-service communication is helped along by a naming service: every service hosted in Service Fabric is recognizable and addressable through it, much like DNS acts for server infrastructure. Using the naming service and service remoting, a call is made to the ReadTweets service through an interface that ReadTweets exposes; the ServiceProxy functionality provides the bridge between the two services. Another part of this service is the OWIN communication listener that forms our self-hosted webserver; boilerplate code for this implementation is available here.
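For completeness, the Startup.ConfigureApp that the communication listener invokes is the standard OWIN Web API bootstrap. A minimal sketch (the route template is my assumption):

```csharp
using Owin;
using System.Web.Http;

public static class Startup
{
    // Wires convention-based Web API routing into the OWIN pipeline
    // that the self-hosted communication listener runs.
    public static void ConfigureApp(IAppBuilder appBuilder)
    {
        var config = new HttpConfiguration();
        config.Routes.MapHttpRoute(
            name: "DefaultApi",
            routeTemplate: "api/{controller}/{id}",
            defaults: new { id = RouteParameter.Optional });
        appBuilder.UseWebApi(config);
    }
}
```

With this in place, a GET against api/Values on the exposed port routes to the ValuesController shown above.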
Every service has a ServiceManifest.xml configuration file; this is where you decide which port is exposed for communication, as below:
<Resources>
  <Endpoints>
    <Endpoint Name="ServiceEndpoint" Type="Input" Protocol="http" Port="8280" />
  </Endpoints>
</Resources>
Now the OWIN self-host listener, the exposed port, and the service have to be linked together with the naming service:
protected override IEnumerable<ServiceInstanceListener> CreateServiceInstanceListeners()
{
    var endpoints = Context.CodePackageActivationContext.GetEndpoints()
        .Where(endpoint => endpoint.Protocol == EndpointProtocol.Http || endpoint.Protocol == EndpointProtocol.Https)
        .Select(endpoint => endpoint.Name);

    return endpoints.Select(endpoint => new ServiceInstanceListener(
        serviceContext => new OwinCommunicationListener(Startup.ConfigureApp, serviceContext, ServiceEventSource.Current, endpoint), endpoint));
}
With that, the Web API is accessible from a browser and can sit behind a load balancer.
ReadTweets Service - This is a stateful service responsible for reading tweets from Twitter. We've never had much trouble scaling stateless services; if there was any session state to be maintained, we would keep it in something like Memcached or an external database. Stateful services, though, are not so easy to scale. It's complicated. Service Fabric solves this problem with what's called a stateful reliable service, which allows creating multiple instances of the same service that share in-memory state. This gives the service high availability if an instance goes down. It also adds the capability to partition a stateful service for scalability: imagine sending queries starting with 'A' to a certain instance, with each letter of the alphabet going to its own instance of the ReadTweets service.
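The gateway snippet earlier hard-coded ServicePartitionKey(1); to actually spread queries across partitions as just described, the gateway could derive the partition key from the query itself. A hypothetical sketch (the 26-partition, letter-per-partition scheme and the class name are my assumptions to match the idea above):

```csharp
using System;
using Microsoft.ServiceFabric.Services.Client;
using Microsoft.ServiceFabric.Services.Remoting.Client;

public static class ReadTweetsProxyFactory
{
    // Maps a query to one of 26 ranged partitions (keys 0-25) by its
    // first letter, so "azure" and "analytics" land on the same partition.
    public static long PartitionFor(string query)
    {
        char first = char.ToLowerInvariant(query.Trim()[0]);
        return (first >= 'a' && first <= 'z') ? first - 'a' : 0;
    }

    public static IReadTweets Create(string query)
    {
        var uri = new Uri("fabric:/TwitterAnalytics/ReadTweetsService");
        return ServiceProxy.Create<IReadTweets>(uri, new ServicePartitionKey(PartitionFor(query)));
    }
}
```

This assumes the ReadTweetsService is created with a ranged (Int64) partition scheme covering keys 0 through 25.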
How does all that work? It's implemented using a new kind of collection available in the SDK, called Reliable Collections. These provide implementations of dictionaries and queues (IReliableDictionary, IReliableQueue) that get replicated across multiple instances of a service, guaranteeing data integrity. Additionally, they are transaction-aware, much like databases are.
using (var tx = this.StateManager.CreateTransaction()) //creates a transaction object
{
    //looks up a dictionary named after the queryid obtained from the API gateway;
    //every query has its own query id and hence its own separate counter
    var negativescorecounter = await this.StateManager.GetOrAddAsync<IReliableDictionary<string, float>>(queryid + "negativescorecounter");
    //increments the counter, or adds 1 if this is the first update for the query
    await negativescorecounter.AddOrUpdateAsync(tx, queryid + "negativescorecounter", 1, (key, pvalue) => ++pvalue);
    //commits the transaction and replicates the change to the other replicas
    await tx.CommitAsync();
}
The above snippet from the ReadTweets service demonstrates how a simple transaction is performed. It looks fairly simple, but the capability and the difference it makes are immense. Conventionally, robustness and high availability are backed by infrastructure, using clustering and things of that sort. With Reliable Collections you can write apps built for failure; losing an instance is absolutely OK, and communication is low latency because the values are available in memory rather than in an external store.
I'll skip the details of how it reads Twitter, but the service uses application-only authentication with the Search API to retrieve the tweets. All tweets are then sent to a dictionary keyed by the search query id, and each one is passed to the SentimentAnalysis service, which in turn calls the Google Cloud Natural Language API to get the score.
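The TwitterOAuthService mentioned earlier boils down to Twitter's application-only OAuth2 flow: exchange the app's consumer key/secret for a bearer token at the oauth2/token endpoint. A rough sketch of that exchange (error handling and token caching omitted; the class name is mine):

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Threading.Tasks;

public class TwitterOAuthClient
{
    // Exchanges the consumer key/secret for an application-only bearer
    // token, per Twitter's OAuth2 client_credentials flow.
    public static async Task<string> GetBearerTokenAsync(string consumerKey, string consumerSecret)
    {
        var credentials = Convert.ToBase64String(Encoding.UTF8.GetBytes(
            Uri.EscapeDataString(consumerKey) + ":" + Uri.EscapeDataString(consumerSecret)));

        using (var client = new HttpClient())
        {
            client.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Basic", credentials);
            var body = new StringContent("grant_type=client_credentials", Encoding.UTF8, "application/x-www-form-urlencoded");
            var response = await client.PostAsync("https://api.twitter.com/oauth2/token", body);
            response.EnsureSuccessStatusCode();
            // Response body is JSON: {"token_type":"bearer","access_token":"..."}
            return await response.Content.ReadAsStringAsync();
        }
    }
}
```

The returned bearer token is then sent in the Authorization header of Search API requests made by ReadTweets.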
public Task<Sentiment> GetSentiment(string text)
{
    //a service-account key file is deployed alongside the assembly; it is
    //generated from the Google Cloud console and authorizes the Natural Language API calls
    //"\\google-key.json" is a placeholder - substitute your own key file's name
    string resourcename = "\\google-key.json";
    //the Google SDK requires an environment variable pointing at the key file;
    //the assembly's folder is obtained via reflection
    Assembly assembly = Assembly.GetExecutingAssembly();
    string assemblyloc = Path.GetDirectoryName(assembly.Location);
    Environment.SetEnvironmentVariable("GOOGLE_APPLICATION_CREDENTIALS", assemblyloc + resourcename);

    Sentiment sentiment = new Sentiment();
    var client = LanguageServiceClient.Create();
    try
    {
        //the text arrives here via service remoting; the document-level
        //sentiment is returned to the caller
        var response = client.AnalyzeSentiment(new Document()
        {
            Content = text,
            Type = Document.Types.Type.PlainText
        });
        sentiment = response.DocumentSentiment;
        return Task.FromResult(sentiment);
    }
    catch (Exception)
    {
        //on any failure, return the default (zero) sentiment
        return Task.FromResult(sentiment);
    }
}
Finally, the score is gathered at the ReadTweets service once the tweet stream ends for a certain search query.
public async Task SentimentScale(float score, float magnitude, string queryid = "test")
{
    var totalcount = await this.StateManager.GetOrAddAsync<IReliableDictionary<string, float>>(queryid + "totalcount");
    var magnitudecounter = await this.StateManager.GetOrAddAsync<IReliableDictionary<string, float>>(queryid + "magnitude");
    using (var stx = this.StateManager.CreateTransaction())
    {
        if (score < 0)
        {
            var negativescorecounter = await this.StateManager.GetOrAddAsync<IReliableDictionary<string, float>>(queryid + "negativescorecounter");
            await negativescorecounter.AddOrUpdateAsync(stx, queryid + "negativescorecounter", 1, (key, pvalue) => ++pvalue);
        }
        else if (score > 0)
        {
            var positivescorecounter = await this.StateManager.GetOrAddAsync<IReliableDictionary<string, float>>(queryid + "positivescorecounter");
            await positivescorecounter.AddOrUpdateAsync(stx, queryid + "positivescorecounter", 1, (key, nvalue) => ++nvalue);
        }
        else
        {
            var mixedscorecounter = await this.StateManager.GetOrAddAsync<IReliableDictionary<string, float>>(queryid + "mixedscorecounter");
            await mixedscorecounter.AddOrUpdateAsync(stx, queryid + "mixedscorecounter", 1, (key, mvalue) => ++mvalue);
        }
        //accumulate the magnitude and increment the total tweet count
        await magnitudecounter.AddOrUpdateAsync(stx, queryid + "magnitude", magnitude, (key, value) => value + magnitude);
        await totalcount.AddOrUpdateAsync(stx, queryid + "totalcount", 1, (key, value) => ++value);
        await stx.CommitAsync();
    }
}
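When the stream for a query ends, the aggregates can be read back in a single transaction to derive the final figures before publishing. A hypothetical sketch of that last step inside the ReadTweets service (the method name is my own):

```csharp
// Reads the per-query aggregates accumulated by SentimentScale and
// derives the average magnitude for the query.
public async Task<float> GetAverageMagnitudeAsync(string queryid)
{
    var totalcount = await this.StateManager.GetOrAddAsync<IReliableDictionary<string, float>>(queryid + "totalcount");
    var magnitudecounter = await this.StateManager.GetOrAddAsync<IReliableDictionary<string, float>>(queryid + "magnitude");

    using (var tx = this.StateManager.CreateTransaction())
    {
        var count = await totalcount.TryGetValueAsync(tx, queryid + "totalcount");
        var magnitude = await magnitudecounter.TryGetValueAsync(tx, queryid + "magnitude");
        await tx.CommitAsync();

        return count.HasValue && count.Value > 0 ? magnitude.Value / count.Value : 0f;
    }
}
```

The positive, negative, and mixed counters can be read the same way and handed off to the PublishResult service described next.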
The result is published using another service called PublishResult, which was built with database-first Entity Framework. The reason it is database-first is that the queries come from a mobile app, that mobile app already has a database, and I allow the PublishResult service to write to it directly.
public Task<bool> PushResult(string queryid, long count, float magnitude, float positive, float negative, float mixed)
{
    using (var db = new XonectMob())
    {
        bool updateresult;
        try
        {
            var resultitem = db.SearchQueries.Where(s => s.Id == queryid).FirstOrDefault<SearchQuery>();
            resultitem.Positive = positive;
            resultitem.Negative = negative;
            resultitem.Mixed = mixed;
            resultitem.Count = count;
            resultitem.Magnitude = magnitude;
            db.SaveChanges();
            updateresult = true;
        }
        catch (Exception)
        {
            updateresult = false;
        }
        return Task.FromResult(updateresult);
    }
}
Once the result is published, it is ready for consumption from a mobile app or via the API. The sample screenshot below shows the sentiment score for the search query globalazure during the last Global Azure Bootcamp.
A Xamarin.Forms app to consume this data on iOS, Android, UWP, and Windows Phone is to follow; that's when this blog stays true to the "modern" aspect of the app I promised above.
An alternate approach to Service Fabric would be to build the same microservices app using containers; you'd just have to implement equivalent capabilities yourself: something like service remoting, authenticating API calls between containers, caching stateful data outside the containers, and so on.
You may share your thoughts and views below.