How much code is duplicated in your code registry?

Ramesh K.

Cloud and container security

Published: June 24, 2020

By the way, this read requires a little basic knowledge of Docker and Git.

Okay, so today I am going to talk about a problem that shows up in small as well as bigger organizations. We are all moving towards microservices and a lot of other things. Everyone talks about solving scalability, durability, blah blah. But somewhere behind the scenes, we often struggle with one issue that gets overlooked.

What is that?

Code duplication

Well, that’s code duplication. In my experience, it is often overlooked. I will talk about how to work around it. Let’s imagine we already have one repo.

So here is the repo that, let’s suppose, we already had. Now let’s imagine we are starting a new repo. What is the shortest way to get there?

copy paste?

no no.. wait.

What if we do copy paste?

Here you see the existing repo and the new repo. Both look very similar, with only minor changes. The components they share are common code, yet they are mandatory to run an app.

And usually about 30 to 60% of the code is common across all similar repositories. Now imagine you have 1,000 microservices, each with its own repo. How much code will be common?
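To put a rough number on that, here is a small back-of-the-envelope sketch. The 5,000 lines per repo is purely an assumed figure for illustration; only the 30 to 60% range and the 1,000 repos come from the text above.

```python
def duplicated_lines(repo_count: int, lines_per_repo: int, shared_fraction: float) -> int:
    """Estimate how many lines exist only as copies of the common code."""
    if not 0.0 <= shared_fraction <= 1.0:
        raise ValueError("shared_fraction must be between 0 and 1")
    shared_per_repo = round(lines_per_repo * shared_fraction)
    # One repo holds the "original" common code; every other repo holds a copy.
    return shared_per_repo * max(repo_count - 1, 0)


# Assumption: 5,000 lines per repo (hypothetical, not from the article).
low = duplicated_lines(1000, 5000, 0.30)
high = duplicated_lines(1000, 5000, 0.60)
print(f"duplicated lines: {low:,} to {high:,}")
```

Under those assumptions, somewhere between about 1.5 and 3 million lines would be copies that someone still has to patch, scan, and build.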

And at large scale, do you know what problems you might be facing? How much productivity do you lose?

Issues with code duplication

Well, as you guessed, duplicated code is hard to manage. More lines of code simply means more mess. Shouldn’t we keep things simple and short?
Oh, there is more. What about build time? Say you have some dependency that needs to be built. If the same thing is built 100 times, even in parallel, it creates other issues: the builds are harder to manage, and each repo may not end up with exactly the same build. And of course, on top of that, it takes a huge amount of time.
What if one of the earliest repos had a vulnerability in the common code? Then every new repo copied from it will have that vulnerability too. No one wants that, correct?
And yeah, what if you want to change the version of some dependency? You do not want to change it in hundreds of places, right?
How do you enforce compliance, like a lintrc or a standard directory structure? Otherwise every repo comes up with its own different structure, and most of them are bad.
And yeah, one more. What if you are using a dependency that requires a license? You wouldn’t want end developers to have to care about the licensing of those, right?

So I hope you got some idea of the problems you are already facing because of code duplication.

Okay, you might be thinking: stop the talk, show me the solution.

Solution

To find the solution, I asked myself: is it compulsory to keep libraries in the code repo? Do we really need to commit those?

Well, the solution turned out to be hidden in Docker images and Git branches. How? By using base images. Basically, these branches act as a factory for the base images of the actual app images. What a base image means here is that it has all the dependencies, plus the common code required to run the actual application. It provides the dev and build environment for the actual app image. So I just need to add business logic on top of the base image to get the complete codebase of the actual application.
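As a minimal sketch of "business logic over the base image", the helper below renders the Dockerfile an application repo might use. Everything concrete in it is an assumption for illustration: the image name `registry.example.com/base-http:1.3.0`, the `pkg` and `config` directories being copied, and `make build` as the build command. It also assumes the base image already sets a working directory.

```python
def render_app_dockerfile(base_image: str, source_dirs: list[str], build_cmd: str) -> str:
    """Render a Dockerfile that layers business logic on top of a base image."""
    # Dependencies and common code are assumed to live in the base image already.
    lines = [f"FROM {base_image}"]
    for directory in source_dirs:
        # Only the application's own directories are copied in.
        lines.append(f"COPY {directory}/ ./{directory}/")
    lines.append(f"RUN {build_cmd}")
    return "\n".join(lines) + "\n"


# Hypothetical image name, directories, and build command.
print(render_app_dockerfile(
    "registry.example.com/base-http:1.3.0",
    ["pkg", "config"],
    "make build",
))
```

The point of the sketch is what is missing: no dependency download step and no copy of shared libraries, because those are expected to arrive through the `FROM` line.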

Branching Strategy

So I will be talking about the base image branches.

Master branch. It will be minimal. It will just have a hello world program, with no dependencies and no libraries. So the image built from it just makes sure that hello world gets printed. We will create new branches out of it.

We will have the server and cli branches taken out of master.

Cli branch. This branch is the factory that produces base images for CLI applications. It will have all the CLI dependencies and libraries for building a CLI executable. It may also have testing libraries and so on.
Server branch. As you can guess, we take new branches out of the master branch to give each kind of tool we build its own direction. Every branch inherits dependencies from its parent branch, so the parent branch should be minimal. Server will have things like a logger and a DAO layer. We can further divide the server branch into more specific kinds of servers.

Let's further create more branches out of server now.

Http branch. This one is for HTTP or HTTPS based servers. It might have libraries related to routing logic, e.g. mux in Go or Flask in Python. It can also have API specification tools, like a Swagger specification to code generator. What could be another specific branch out of server?
Grpc branch. Yeah, a gRPC based server. It could have gRPC based libraries, and it can also have protobuf specifications and other things.
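The inheritance between these branches can be modelled in a few lines. This is only a sketch of the idea: the branch names follow the text, but the dependency labels (`logger`, `router`, and so on) are placeholder names, not real packages.

```python
# Each factory branch records its parent and what it adds on top of it.
# The entries under "adds" are illustrative labels only.
BRANCHES = {
    "master": {"parent": None, "adds": []},
    "cli": {"parent": "master", "adds": ["cli-libs", "testing-libs"]},
    "server": {"parent": "master", "adds": ["logger", "dao-layer"]},
    "http": {"parent": "server", "adds": ["router", "swagger-codegen"]},
    "grpc": {"parent": "server", "adds": ["grpc-libs", "protobuf-specs"]},
}


def inherited_dependencies(branch: str) -> list[str]:
    """Return everything a branch carries, oldest ancestor first."""
    chain = []
    current = branch
    while current is not None:
        chain.append(current)
        current = BRANCHES[current]["parent"]
    deps = []
    # Walk from master down to the requested branch.
    for name in reversed(chain):
        deps.extend(BRANCHES[name]["adds"])
    return deps


print(inherited_dependencies("http"))
print(inherited_dependencies("cli"))
```

Because master adds nothing, each leaf branch carries only what its own line of ancestors introduced, which is why keeping the parent minimal matters.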

Now we are done branching. I will start adding some business logic now. Adding business logic means that we now create a new repo cloned from one of these base images. As you understood, a base image is something that gives you the development and build environment, where we add business logic and fire some command to get an executable or some build.

Webapp and rest api repos. So I cloned http twice to create two new repos. I added business logic to get a full-fledged, complete codebase to develop and build the applications. Here, webapp means a repo specific to serving web application APIs, and rest api would be for a third-party integration. They might be the same; I am just giving examples of when they require different sets of libraries.

Let’s create one out of grpc as well.

Posts repo. A repo that produces an internal microservice, say `posts`.

Let’s create one out of cli to complete the picture.

Cli client repo. Think of tools like kubectl, so CLI clients. You could do more classification, like gRPC based or HTTP based clients, or maybe neither, just something that interacts with system commands. I just gave examples.

Well, you know what? These leaves refer to their parent not through Git but through Docker base images. The branches are the factory of base images, and those base images are used in the actual application repos. A repo can move to a different tag of its base image whenever the base image is updated. I hope you got some idea. Let’s see how.
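One way to picture "leaves refer to base image tags" is a small table of pins. The sketch below assumes a `base-<branch>` image naming scheme and dotted numeric tags such as `1.3.0`. Both are assumptions made for the example, as are the registry name and the version numbers.

```python
def parse_tag(tag: str) -> tuple[int, ...]:
    """Turn a dotted numeric tag like '0.10.0' into (0, 10, 0) for comparison."""
    return tuple(int(part) for part in tag.split("."))


def image_ref(registry: str, branch: str, tag: str) -> str:
    """Build the reference a leaf repo would put in its FROM line (naming is assumed)."""
    return f"{registry}/base-{branch}:{tag}"


def outdated_repos(pins: dict[str, tuple[str, str]], latest: dict[str, str]) -> list[str]:
    """List repos whose pinned base image tag is behind the latest tag of that branch."""
    stale = []
    for repo, (branch, tag) in pins.items():
        if parse_tag(tag) < parse_tag(latest[branch]):
            stale.append(repo)
    return sorted(stale)


# Hypothetical pins: repo -> (factory branch, pinned tag).
pins = {
    "webapp": ("http", "1.2.0"),
    "restapi": ("http", "1.3.0"),
    "posts": ("grpc", "0.9.1"),
    "cli-client": ("cli", "2.0.0"),
}
latest = {"http": "1.3.0", "grpc": "0.10.0", "cli": "2.0.0"}

print(image_ref("registry.example.com", "http", pins["webapp"][1]))
print(outdated_repos(pins, latest))
```

Tags are compared as numbers rather than strings, so `0.9.1` correctly counts as older than `0.10.0`.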

Actual code repo

And yeah, the newly cloned repo might have its own feature branches and so on, as we usually follow day to day. The new code only has business logic. All dependencies and common code come from the base image from which it was cloned. As you see, the code is now clean, with no clutter. By the way, developers can always see which libraries they are using by exec-ing into a container running from the base image and exploring it.

New code

As you saw, there is no go.sum or go.mod. When it comes to Python, there would be no requirements.txt, and in the case of Node.js, there would be no package.json. Those come from the base image only.
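If you want to keep that rule from eroding, a small check could run in CI for every application repo. This is a sketch under the assumption that these four file names are the ones you care about; extend the tuple for other stacks.

```python
from pathlib import Path

# Dependency manifests that are expected to live in the base image, not in app repos.
MANIFESTS = ("go.mod", "go.sum", "requirements.txt", "package.json")


def stray_manifests(repo_root) -> list[str]:
    """Return repo-relative paths of dependency manifests found in an app repo."""
    root = Path(repo_root)
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("*")
        if path.is_file() and path.name in MANIFESTS
    )


# Usage: an empty list means the repo holds only business logic.
print(stray_manifests("."))
```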

Mounting the code into a container running from the base image

I will mount the pkg, output, and config directories into a running container started from the base image. Once mounted, the running container has the complete codebase. We can change code locally and it will be reflected in the container.
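The mounting step can be sketched as a helper that assembles the `docker run` command line. The directory names come from the text above; the image name and the `/app` working directory inside the container are assumptions, since the real base image may use a different layout.

```python
from pathlib import Path


def docker_run_args(base_image: str, repo_root, mounts: list[str], workdir: str = "/app") -> list[str]:
    """Build a `docker run` argument list that bind-mounts local code into the base image."""
    args = ["docker", "run", "--rm", "-it"]
    root = Path(repo_root).resolve()  # bind mounts need absolute host paths
    for directory in mounts:
        # -v host_path:container_path makes local edits visible inside the container.
        args += ["-v", f"{root / directory}:{workdir}/{directory}"]
    args += ["-w", workdir, base_image]
    return args


# Hypothetical image name; "/app" is an assumed working directory.
cmd = docker_run_args("registry.example.com/base-grpc:0.10.0", ".", ["pkg", "output", "config"])
print(" ".join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd)` from a terminal session, or simply printed and pasted into a shell.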

And yeah, if a developer wants to add or change a library, they can make the change and raise a PR to the factory branch, which results in a new base image tag. This way we separate the business logic from the libraries and common code.

And yeah, the one I have just shown is for a Golang based stack, but the same can be done for other languages following similar principles.

Now, let’s see how this helps to solve the problems I discussed above.

Solutions of the problems

Of course, there is very minimal code to manage, as I showed above.
The directory structure will be consistent across peer repos.
Build time is very fast, as libraries and dependencies are already downloaded and built in the base image.
Because every peer repo has a similar directory structure, understanding a new repo is smooth. Knowledge transfer (KT) and other things will be really smooth.
And yeah, you can manage all compliance, security scans, vulnerability checks, and so on in a single place: the base image itself.

Now you might be thinking: does it really work this way?

Demo time

Okay, so I will show you a small running prototype of how it would work.

Thank you

So yeah, that’s it. If you want to explore more, here is the link to the Git repo.

https://github.com/codeofnode/mb-go/tree/server

Well, I just wanted to give an idea. There could be a number of possibilities for how you can use this idea to solve many of your similar problems.

Pavel G.

Cloud Platform Manager at Resideo

4 years ago

While this approach saves on build time, it also brings out several issues:
1. Upgrades to new versions of a language or SDKs may lead to unexpected consequences, therefore it is necessary to provide a list of available base images to developers.
2. If libraries change infrequently, a through-and-through rebuild of all base images and all dependent images may not be an issue, as long as all artefacts are accounted for and list their predecessors, and these trees are stored somewhere in a production-grade system. When a library included in the base image is not stable, rebuilding all services will be a bigger pain than spending an extra 30 seconds waiting for the build to complete for each microservice.
3. Developers do not have control over their own dependencies, which means they can neither lock into a version of a library for compatibility reasons, nor seamlessly use an upstream release pushed a few moments ago.
4. The image footprint will grow, even if some libraries are not used. While image size may not be an issue, any extras included in an image increase the potential attack surface.
Not saying that this approach is invalid, but one has to weigh all the pros and cons before adopting it.

Ankit Jhunjhunwala

Software Engineer

4 years ago

Nicely captured !!
