How much code is duplicated in your code registry?


By the way, this read requires a little basic knowledge of Docker and Git.

Okay, so today I am going to talk about a problem that shows up in small as well as bigger organizations. We are all moving towards microservices, and everyone talks about solving scalability, durability, and so on. But behind the scenes, we often struggle with one issue that gets overlooked.

What is that?

Code duplication

Well, that's code duplication. In my experience, it is often overlooked. I will talk about how to work around it. Let's imagine we already have one repo.



So here is the repo that, let's suppose, we already had. Now let's imagine we are starting a new repo. What is the shortest way to do that?

Copy paste?

No, no.. wait.





What if we do copy paste?

Here you see the existing repo and the new repo. Both look very similar, with only minor changes. The common components are not optional, though: they are mandatory to drive an app.


Usually about 30 to 60% of the code is common across all similar repositories. Now imagine you have 1000 microservices, each with its own repo: how much code will be common?

And at large scale, do you know what problems you might be facing? How much productivity do you lose?

Issues with code duplication

  • As you guessed, it is hard to manage. More lines of code is simply more mess. Shouldn't we keep things simple and short?
  • There is more: what about build time? Say you have some dependency that needs to be built. The same thing gets built 100 times. Even if that happens in parallel, it creates other issues: it is harder to manage, and the builds are not guaranteed to be identical across repos. And on top of that, it takes a huge amount of time.
  • What if one of the earliest repos had a vulnerability in the common code? Then all of the new repos will have that vulnerability too. No one wants that.
  • What if you want to change the version of some dependency? You do not want to change it in hundreds of places.
  • How do you enforce compliance, such as a lintrc file or a standard directory structure? Every repo ends up with a different structure, and most of them are bad.
  • One more: what if you are using a dependency that requires licensing? You wouldn't want end developers to have to care about that.

So I hope you got some idea of the problems you are already facing because of code duplication.

Okay, you might be thinking: stop the talk, show me the solution.

Solution

To find the solution, I asked myself: is it compulsory to keep libraries in the code repo? Do we really need to commit those?

Well, the solution turned out to be hidden in Docker images and Git branches. How? By using base images. Basically, these branches act as factories for the base images of the actual app images. A base image has all the dependencies, plus the common code required to run the actual application. It provides the dev and build environment for the actual app image. I just need to add business logic on top of the base image to get the complete codebase of the actual application.

Branching Strategy


So I will be talking about the base image branches.

  • Master branch. It will be minimal. It will just have a hello world program: no dependencies, no libraries. The image built out of it just makes sure that hello world gets printed. We will create new branches out of it.

We will have server and cli branches taken out of master.

  • Cli branch. This branch is the factory that produces base images for CLI applications. It will have all CLI dependencies and libraries for building a CLI executable. It may also have testing libraries and so on.
  • Server branch. As you guessed, we take new branches out of master so that our tooling can go in different directions. Every branch inherits dependencies from its parent branch, so the parent branch should be minimal. Server will have things like a logger and a DAO layer. We can further divide the server branch into more specific kinds of servers.

Let's further create more branches out of server now.

  • Http branch. This one is for HTTP or HTTPS based servers. It might have libraries related to routing logic, e.g. mux in Go or Flask in Python. It can also have API specification tools, like a Swagger specification to code generator. What could be another specific branch out of server?
  • Grpc branch. Yes, gRPC based servers. It could have gRPC libraries, and it can also have protobuf specifications and other things.
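
To make the inheritance between factory branches concrete, here is a minimal sketch in Go. It only models the idea that each branch adds libraries on top of its parent. The `Branch` type, the `EffectiveLibs` method, and the library names are my own illustrative assumptions, not something taken from the actual repo.

```go
package main

import "fmt"

// Branch models one base-image "factory" branch.
// The type and the library lists are hypothetical, for illustration only.
type Branch struct {
	Name   string
	Parent *Branch
	Libs   []string // libraries this branch adds on top of its parent
}

// EffectiveLibs returns everything an image built from b would contain:
// the parent's libraries first, then the branch's own additions.
func (b *Branch) EffectiveLibs() []string {
	if b == nil {
		return nil
	}
	libs := b.Parent.EffectiveLibs()
	return append(libs, b.Libs...)
}

func main() {
	master := &Branch{Name: "master"} // hello world only, no libraries
	cliBranch := &Branch{Name: "cli", Parent: master, Libs: []string{"cli-lib", "testing-lib"}}
	server := &Branch{Name: "server", Parent: master, Libs: []string{"logger", "dao"}}
	httpBranch := &Branch{Name: "http", Parent: server, Libs: []string{"mux", "swagger-codegen"}}
	grpcBranch := &Branch{Name: "grpc", Parent: server, Libs: []string{"grpc", "protobuf"}}

	for _, b := range []*Branch{master, cliBranch, server, httpBranch, grpcBranch} {
		fmt.Printf("%-6s -> %v\n", b.Name, b.EffectiveLibs())
	}
}
```

The point of the sketch is that anything added to master shows up in every image below it, which is why the parent should stay minimal.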

Now we are done branching, so I will start adding some business logic. Adding business logic means creating a new repo cloned from one of these base images. As you understood, a base image gives you the development and build environment; we add business logic to it and fire some command to get an executable or some build.

  • Webapp and rest api repos. I cloned http twice to create two new repos, and added business logic to get a full-fledged codebase to develop and build the applications. Here webapp means a server specific to web application APIs, and rest api would be for third-party integrations. They might be the same; I am just giving examples of when they require different sets of libraries.

Let's create one out of grpc as well.

  • Posts repo. A repo that produces an internal microservice, say `posts`.

Let's create one out of cli to complete the picture.

  • Cli client repo. Think of things like kubectl: CLI clients. You could classify further, like gRPC based or HTTP based clients, or maybe neither, just something that interacts with system commands. I am just giving examples.


You know what? These leaves refer to their parents not through Git but through Docker base images. The branches are the factories of base images, and those base images are used in the actual application repos. A repo can refer to a different tag of the base image whenever there is an update in the base image. I hope you got some idea. Let's see how.
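
As a rough sketch of how an application repo could point at a base image tag, here is a small Go program that renders the Dockerfile for such a repo. The registry name, the `<branch>-<version>` tag convention, and the `BaseImage` and `Dockerfile` names are all assumptions I made up for the example; the real repo may tag its images differently.

```go
package main

import "fmt"

// BaseImage identifies one tagged image produced by a factory branch.
// All field values used below are hypothetical.
type BaseImage struct {
	Repository string // e.g. "registry.example.com/base" (made-up registry)
	Branch     string // factory branch: "http", "grpc", "cli", ...
	Version    string // bumped whenever the factory branch changes
}

// Ref returns the image reference, using "<branch>-<version>" as the tag.
// This tag convention is an assumption of the sketch.
func (b BaseImage) Ref() string {
	return fmt.Sprintf("%s:%s-%s", b.Repository, b.Branch, b.Version)
}

// Dockerfile renders a minimal Dockerfile for an application repo:
// everything except the business logic in pkg/ comes from the base image.
func Dockerfile(base BaseImage) string {
	return fmt.Sprintf("FROM %s\nCOPY pkg/ ./pkg/\n", base.Ref())
}

func main() {
	base := BaseImage{Repository: "registry.example.com/base", Branch: "grpc", Version: "1.0.0"}
	fmt.Print(Dockerfile(base))

	// Picking up an updated base image is only a tag change in the app repo.
	base.Version = "1.1.0"
	fmt.Print(Dockerfile(base))
}
```

With this shape, moving a repo to a newer base image is a one-line change to the `FROM` line, and the repo stays pinned to the old tag until someone makes that change.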


Actual code repo


And yes, the newly cloned repo might have its own feature branches and so on, as we usually follow day to day. The new code only has business logic. All dependencies and common code come from the base image from which it was cloned. As you see, the code is now clean, with no clutter. By the way, developers can always see which libraries they are using by exec-ing into a container running from the base image and exploring it.

New code

As you saw, there is no go.sum or go.mod. For Python, there would be no requirements.txt, and for Node.js there would be no package.json. Those come from the base image only.

Mounting the code into the container running from the base image

I will mount the pkg, output, and config directories into the running container started from the base image. Once mounted, the running container has the complete codebase. We can change code locally and that will be reflected in the container.


And if a developer wants to add or change a library, they can make the change and raise a PR to the factory branch with a new base image tag. This way we separate the business logic from the libraries and common code.

The one I have just shown is for a Golang based stack, but the same can be done for other languages following similar principles.

Now, let's see how this helps solve the problems I discussed above.

Solutions of the problems

  • Of course, there is very little code to manage, as I showed above.
  • The directory structure will be consistent across peer repos.
  • Build time is very fast, as libraries and dependencies are already downloaded and built in the base image.
  • Because every peer repo has a similar directory structure, understanding is smooth. KT (knowledge transfer) and other things will be really smooth.
  • You can manage all compliance, security scans, vulnerability checks and so on in a single place, in the base image itself.

Now you might be thinking: does it really work this way?



Demo time

Okay, so I will show you a small running prototype of how it would work.

Thank you

So yeah, that's it. If you want to explore more, here is the link to the Git repo.

https://github.com/codeofnode/mb-go/tree/server

I just wanted to give an idea. There could be a number of possibilities for how you can use this idea to solve many similar problems.

Pavel G.

Cloud Platform Manager at Resideo

4 years ago

While this approach saves on build time, it also brings out several issues:
1. Upgrading to new versions of a language or SDKs may lead to unexpected consequences, therefore it is necessary to provide a list of available base images to developers.
2. If libraries change infrequently, a through-and-through rebuild of all base images and all dependent images may not be an issue, as long as all artefacts are accounted for and list their predecessors, and these trees are stored somewhere in a production-grade system. When a library included in the base image is not stable, rebuilding all services will be a bigger pain than spending an extra 30 seconds waiting for the build to complete for each microservice.
3. Developers do not have control over their own dependencies, which means they can neither lock into a version of a library for compatibility reasons, nor seamlessly use an upstream change pushed a few moments ago.
4. Image footprint will grow, even if some libraries are not used. While image size may not be an issue, any extras included in an image increase the potential attack surface.
Not saying that this approach is invalid, but one has to weigh all the pros and cons before adopting it.

Nicely captured !!


