My 2nd golang program
Mukundaraman V
Technologist, Sr. Vice President - Cloud Data, 2x AWS, Data Platform , AWS, Cloud , Terraform, Big Data, NoSQL, ML, AI
I have another draft post titled "My 1st golang program, not a hello world". It still has a few pending items, so I am posting this one first, and the next post will become a prequel to this one :-)
Architecture
I was recently involved in some basic log processing on the VPC flow logs generated inside Amazon AWS. The application life cycle is divided into 3 steps.
- VPC Flow Log Data --> Amazon CloudWatch
- Amazon CloudWatch --> Amazon Kinesis
- Amazon Kinesis --> AWS Lambda --> NoSQL/Any other persistent storage
Objective
The primary objective of the program is to transfer the logs for processing. The secondary objective is to keep the architecture/platform completely serverless wherever possible/applicable.
We wanted to test with a minimum of 100 GB of data generated by the VPC flow logs. Unfortunately, the dev accounts that we have do not generate that much traffic. Hence the idea of simulating the VPC flow logs. I have been intrigued by Golang of late, and the result is the following repo, which generates VPC flow log data and pushes it to a Kinesis stream.
Source
Components
There are 3 components in this repo:
- flowlogs/vpcflowlogs.go --> Generates the actual flow log data for a given batch size N.
- flmain/kinesisproducer.go --> Generates batches using goroutines and ingests the records into Kinesis.
- flmain/kinesisconsumer.go --> Reads data from the Kinesis stream --> still buggy, yet to be fixed.
The schema that CloudWatch generates differs slightly from what is generated here, but the core VPC flow log structure is maintained. The structure is as follows:
Structure
type Vpcflowlog struct {
	Id          string
	Version     string
	Account     string
	Eni         string
	Source      string
	Destination string
	Srcport     string
	Destport    string
	Protocol    string
	Packets     string
	Bytes       string
	Windowstart string
	Windowend   string
	Action      string
	Status      string
}
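To make the generator idea concrete, here is a minimal sketch of how a batch of N synthetic records could be built from this struct. It is not the code from the repo: GenerateBatch and randomIP are hypothetical helpers, and the account number, ENI format, port ranges and other field values are placeholder assumptions chosen only for illustration.

```go
package main

import (
	"fmt"
	"math/rand"
	"strconv"
	"time"
)

// Vpcflowlog mirrors the struct shown in the post.
type Vpcflowlog struct {
	Id, Version, Account, Eni   string
	Source, Destination         string
	Srcport, Destport, Protocol string
	Packets, Bytes              string
	Windowstart, Windowend      string
	Action, Status              string
}

// randomIP returns a random address in the private 10.0.0.0/8 range.
func randomIP(r *rand.Rand) string {
	return fmt.Sprintf("10.%d.%d.%d", r.Intn(256), r.Intn(256), r.Intn(256))
}

// GenerateBatch (hypothetical helper) builds n synthetic flow log records.
// All field values are made-up placeholders, not real traffic.
func GenerateBatch(seed int64, n int) []Vpcflowlog {
	r := rand.New(rand.NewSource(seed))
	actions := []string{"ACCEPT", "REJECT"}
	start := time.Now().Unix()
	batch := make([]Vpcflowlog, 0, n)
	for i := 0; i < n; i++ {
		batch = append(batch, Vpcflowlog{
			Id:          strconv.Itoa(i),
			Version:     "2",
			Account:     "123456789012", // placeholder account id
			Eni:         fmt.Sprintf("eni-%08x", r.Uint32()),
			Source:      randomIP(r),
			Destination: randomIP(r),
			Srcport:     strconv.Itoa(1024 + r.Intn(64512)),
			Destport:    strconv.Itoa(1 + r.Intn(1023)),
			Protocol:    "6", // TCP
			Packets:     strconv.Itoa(1 + r.Intn(100)),
			Bytes:       strconv.Itoa(40 + r.Intn(65000)),
			Windowstart: strconv.FormatInt(start, 10),
			Windowend:   strconv.FormatInt(start+60, 10),
			Action:      actions[r.Intn(len(actions))],
			Status:      "OK",
		})
	}
	return batch
}

func main() {
	for _, rec := range GenerateBatch(42, 3) {
		fmt.Printf("%+v\n", rec)
	}
}
```

Passing the seed in makes a run repeatable, which is handy when comparing ingestion timings across runs.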
Execution!!!
Check out the repo
Copy the folders flmain & flowlogs into the GOPATH
go get the dependencies, such as the AWS SDK
go run flmain kinesisproducer <streamname> <No. of Threads T> <Batchsize N> <Iterations I>
If compilation succeeds, the above command generates an array of VPC flow log data of size <Batchsize N> on each iteration, for a total of <Iterations I> iterations in each of the <T> threads. For example, go run flmain kinesisproducer kstream 100 300 200 will spawn 100 threads; each thread will run for 200 iterations, and on each iteration 300 new records will be created, so a total of 100*200*300 = 6M records will be ingested.
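The thread/iteration/batch arithmetic can be sketched with goroutines and a sync.WaitGroup. This is only an illustration of the fan-out pattern, not the producer from the repo: RunProducers is a hypothetical function, the records are plain strings, and the send callback stands in for the actual Kinesis call.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// RunProducers (hypothetical) starts `workers` goroutines. Each goroutine runs
// `iterations` rounds and passes a fresh batch of `batchSize` records to send
// on every round. It returns the total number of records passed to send.
func RunProducers(workers, iterations, batchSize int, send func(batch []string)) int64 {
	var total int64
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for it := 0; it < iterations; it++ {
				batch := make([]string, batchSize)
				for i := range batch {
					batch[i] = fmt.Sprintf("worker-%d-iter-%d-rec-%d", id, it, i)
				}
				send(batch) // stand-in for the real ingestion call
				atomic.AddInt64(&total, int64(len(batch)))
			}
		}(w)
	}
	wg.Wait() // block until every goroutine has finished
	return total
}

func main() {
	// Same parameters as the example: 100 threads, 200 iterations, batches of 300.
	total := RunProducers(100, 200, 300, func(batch []string) {})
	fmt.Println(total) // 6000000
}
```

The counter is updated with atomic.AddInt64 because all goroutines write to it concurrently.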
Before opting for batching, I was ingesting 1 record at a time into the stream.
For 1M records with a batch size of 1000, 100 iterations and 10 threads, it took nearly 18 minutes to complete the ingestion into a Kinesis stream with 2 shards.
After switching to batch ingestion, the same 1M-record ingestion with a batch size of 500 (a Kinesis limitation: PutRecords accepts at most 500 records per request), 100 iterations and 20 threads took less than 10 seconds to ingest into Kinesis.
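Because a single batch request to Kinesis is capped at 500 records, a larger generated batch has to be split before it is sent. A minimal sketch of that splitting step follows; Chunk and maxRecordsPerPut are hypothetical names, the records are plain strings, and the actual SDK call is left out.

```go
package main

import "fmt"

// maxRecordsPerPut reflects the Kinesis limit of 500 records per batch request.
const maxRecordsPerPut = 500

// Chunk (hypothetical helper) splits records into consecutive slices of at
// most size elements. The chunks share the backing array of the input.
func Chunk(records []string, size int) [][]string {
	if size <= 0 {
		return nil
	}
	chunks := make([][]string, 0, (len(records)+size-1)/size)
	for start := 0; start < len(records); start += size {
		end := start + size
		if end > len(records) {
			end = len(records)
		}
		chunks = append(chunks, records[start:end])
	}
	return chunks
}

func main() {
	records := make([]string, 1200)
	for i, c := range Chunk(records, maxRecordsPerPut) {
		// Each chunk would become one batch request to the stream.
		fmt.Printf("request %d: %d records\n", i, len(c))
	}
}
```

With 1200 records this yields three requests of 500, 500 and 200 records.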
Hats Off
A series of links really helped me to optimise at various stages and to learn Golang. I have added a few, and unfortunately did not note down the rest. There is still scope for a lot of optimization, and I would like to hear more from the public forum.
This gentleman has written a lot about Golang. Though I didn't understand some of it, his blog posts were helpful in many places.
Golang nuts helped me to understand and resolve a few issues where I was clueless.
Nightmares:
Multithreading in Java is my biggest nightmare. Python & Go provide very simple multithreading frameworks, easy for anyone to kick off with after a few reads. Channels and pointers, though, needed repeated reading before I understood them well!! Otherwise it is happy GOing
Misc:
In addition to what was stated above, the Kinesis producer has a few additional pieces. I would like to write more posts on those as well!!
- statsd-graphite/grafana integration for metrics collection
- golang instrumentation/profiling
- Compression/Decompression (pending) to Kinesis
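Since the compression piece is still pending, the following is only a sketch of one way it could look, using gzip from the Go standard library to shrink a payload before ingestion and restore it on the consumer side. Compress and Decompress are hypothetical helpers, and the choice of gzip is my assumption, not something already in the repo.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// Compress (hypothetical helper) gzips a payload before it is put on the stream.
func Compress(data []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(data); err != nil {
		return nil, err
	}
	// Close flushes the remaining compressed data and writes the gzip footer.
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// Decompress (hypothetical helper) reverses Compress on the consumer side.
func Decompress(data []byte) ([]byte, error) {
	zr, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

func main() {
	// A made-up, repetitive payload standing in for a batch of flow log lines.
	payload := bytes.Repeat([]byte("2 123456789012 eni-0a1b2c3d ACCEPT OK\n"), 100)
	packed, err := Compress(payload)
	if err != nil {
		panic(err)
	}
	unpacked, err := Decompress(packed)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(payload), len(packed), bytes.Equal(payload, unpacked))
}
```

Flow log lines are highly repetitive, so they should compress well, but the actual ratio depends on the data.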
Hope you like it, or that it is useful for someone somewhere. Looking forward to writing more such ad hoc posts!!! Excuse the typos!!
Thanks for visiting!!!