Groovy Fun with Git - Part 3 of 3

Design and Coding of the Groovy Script

When I started considering Groovy as an alternative to Bash scripting, my goals were simple: keep the simplicity of Bash and the Bash scripting life cycle, but gain the power of Groovy (lists, maps, closures, and more).

In particular, I wanted a script, like a Bash script, that:

  1. Runs standalone, with shebang first line
  2. Can be distributed in one script file, and can run anywhere Groovy is installed.
  3. Does not need to be built as a full-fledged Groovy/Java application, to be distributed as a jar.

Running as an executable jar may introduce classpath issues, and with them the need for a wrapper shell script that sets the classpath. This is unwanted complexity.

To accomplish that "simplicity goal", I needed to prohibit the creation of new public classes and subclasses, which can bring in the complexity of inheritance hierarchies, polymorphism, design patterns, etc., that are needed when creating full Java applications, frameworks, or libraries. We are doing none of that.

I decided that the only classes I would introduce had to be embedded classes, preferably restricted to static inner classes, and only if I really must have classes.

My original design included four classes to represent the data structure of Git objects: Node, Blob, Tree, and Commit, with Blob, Tree, and Commit being subclasses of Node that handle variations in the content of the node by type. The entire .git/objects directory can then be represented as a List<Node>. The rest of the design is a loop over this list, producing lines of text. We need a few more classes to represent the output of this directory scan: a Report class that represents the output listing of objects, and a GvScript class that produces the GV file. The GV file contains commands in the DOT graph description language, which the <dot> utility uses to create the graphic file.

That design does not meet the simplicity goal, although, by making all the classes inner classes, I can avoid build and package. 

So I tightened my constraints further: no use of any classes or OO design. The cost is giving up polymorphic calls and writing if statements based on the type of node instead. That seemed a reasonable compromise. So all the classes were out. With this new constraint, the simplicity goal becomes simpler: "No classes. No OO".
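As a rough sketch of what that compromise looks like, type-specific behavior becomes a switch on the type field rather than a polymorphic call. The describe method and the sample nodes here are hypothetical illustrations, not code from the script.

```groovy
// Hypothetical illustration: one method with a switch replaces three subclasses.
// Each node is a plain list: [objectName, type, content].
def describe(node) {
    def (name, type, content) = node
    switch (type) {
        case 'blob':
            return "blob ${name}: ${content.length()} chars"
        case 'tree':
            return "tree ${name}: ${content.readLines().size()} entries"
        case 'commit':
            return "commit ${name}: ${content.readLines().first()}"
        default:
            return "unknown ${name}"
    }
}

// Made-up sample nodes, just to exercise the method
def nodes = [
    ['aaa111', 'blob', 'hello world'],
    ['bbb222', 'tree', '100644 blob aaa111\thello.txt'],
]
nodes.each { println describe(it) }
```

Adding a new node type means adding a case here, where an OO design would add a subclass.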

<note>

We can easily refactor the script into classes, in an almost mechanical way. Group cohesive methods together, and put each group in an inner class. Make all the methods static. Once you have a set of classes you're happy with, you can refactor further, and introduce more OO concepts: type hierarchies, polymorphism, packages, and design patterns. But remember, you may have to start building and packaging in a jar. The complexity of the script will mushroom. I don't think it is worth the effort.

</note>
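For illustration only, the first step of such a refactoring might look like the sketch below: a class declared in the same script file, holding static utility methods. The class name Utils and the method bodies are assumptions. The script does call a numberOfLines method, but its real implementation is not shown in this article.

```groovy
// Hypothetical grouping: utility methods gathered into one class in the script file.
class Utils {
    // Counts the lines in a block of text; empty or null text has zero lines.
    static int numberOfLines(String text) {
        text ? text.readLines().size() : 0
    }

    // Shortens text to at most maxLength characters.
    static String truncate(String text, int maxLength) {
        text.length() <= maxLength ? text : text.substring(0, maxLength)
    }
}

// Top-level script statements call the static methods through the class name
println Utils.numberOfLines("tree abc\nauthor me")
println Utils.truncate("0123456789abcdef", 10)
```

The script still runs as a single file with no build step, since Groovy compiles the class along with the script statements.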

Adhering to the simplicity rule, the resulting script template now looks like this:

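#!/usr/bin/groovy

// No global variables - only constants derived from args
final A = getAFromArgs(args)
final B = getBFromArgs(args)

// Script-local variables (not visible inside the methods below)
def x
def y

def getAFromArgs(args) {
}

def getBFromArgs(args) {
}

def methodX() {
}

def methodY() {
}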

This is the general outline of a Bash script. It should make it possible to translate a given Bash script to an equivalent Groovy script, perhaps through the use of a translation tool. That should be a fun project.

So, let's look at the working and tested code, to make sure the simplicity goal is not merely theory.

Basically, we are looping through the .git/objects directory, which contains subdirectories whose names are two hex characters. Each subdirectory holds one or more files whose names are 38 hex characters (we don't need to know that these names came from the SHA1 of the file contents). The directory name and the file name together form the 40-character object name. We need to collect any data we need from each file into a tuple and add the tuple to a list.
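To make the naming rule concrete, here is a small sketch of how an object name can be rebuilt from a directory name and a file name. The helper objectNameFor is hypothetical and is not part of the script.

```groovy
// Hypothetical helper: rebuilds a 40-character object name from the
// two-character directory name and the 38-character file name.
def objectNameFor(String dirName, String fileName) {
    def name = dirName + fileName
    // Returns null for anything that is not 40 lowercase hex characters,
    // which filters out entries such as the "pack" and "info" directories
    (name ==~ /[0-9a-f]{40}/) ? name : null
}

println objectNameFor('e6', '9de29bb2d1d6434b8b29ae775ad8c2e48c5391')
println objectNameFor('pack', 'pack-1234.idx')
```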

Once we have the objects available in a list, we can produce different outputs by looping through the list. We need two output lists (List<String>): one with the content of the node described by each tuple, the other with lines of graphic description language for that tuple.
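A minimal sketch of that idea follows, using made-up sample tuples. The line formats here are assumptions chosen for illustration. The script's own getReport() and getGvScript() methods produce richer output.

```groovy
// Hypothetical sample data: [objectName, type, content, size]
def objects = [
    ['1111111aaa', 'commit', 'tree 2222222bbb\nauthor A', 120],
    ['2222222bbb', 'tree', '100644 blob 3333333ccc\tREADME', 50],
]

// Output 1: one human-readable line per object
def reportLines = objects.collect { o ->
    "${o[0]} ${o[1]} (${o[3]} bytes compressed)".toString()
}

// Output 2: one DOT node statement per object, labeled with type and a short name
def gvLines = objects.collect { o ->
    def shortName = o[0].take(7)
    "\"${o[0]}\" [label=\"${o[1]} ${shortName}\"];".toString()
}

// Wrap the node statements in a digraph block to form a complete DOT file
def gvScript = (['digraph git {'] + gvLines + ['}']).join('\n')

println reportLines.join('\n')
println gvScript
```

Both outputs come from the same list, so adding a third output format would only need another loop.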

That's the design. The rest is Groovy coding details.

Below is a listing of the main pieces of the code. You can pull the code from my GitHub gitobjects repository.

In a Bash script, the commands we need to produce printing of Git objects are:

git cat-file -t <objectname>

git cat-file -p <objectname>

git branch -v

To get the compressed file size of a Git object file, we need the OS command:

stat -c %s <pathname>
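The -c option is specific to GNU stat. As a hedged alternative, the same number can be read from within Groovy through java.io.File. The sketch below borrows the name and arguments of the script's fileSize call, but the body is an assumption, not the author's implementation.

```groovy
// Hypothetical sketch of a fileSize-style helper that reads the size directly
// from the file system instead of calling stat.
def fileSize(String objectName, String objectsDirName) {
    // First two characters name the directory, the remaining ones name the file
    def dir = new File(objectsDirName, objectName[0..1])
    def file = new File(dir, objectName[2..-1])
    file.length()   // size in bytes on disk; 0 if the file does not exist
}

// Demo against a throwaway directory standing in for .git/objects
def demoDir = java.nio.file.Files.createTempDirectory('objects').toFile()
new File(demoDir, 'ab').mkdirs()
new File(demoDir, 'ab/cdef').text = 'hello'
println fileSize('abcdef', demoDir.path)
```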

The script needs to make calls to the OS. So the first utility function to code is:

def callOS (command) {

   // text is a property of the Process, not a method
   command.execute().text

}

We need to check the pre-conditions before the execute(), and check for errors after. But essentially that is the Groovy call to OS function.
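One possible shape for those checks is sketched below. The error policy (throwing on a non-zero exit code and trimming the output) is an assumption, not necessarily what the script does.

```groovy
// Sketch of callOS with basic checks; the error handling policy is an assumption.
def callOS(String command) {
    // Pre-condition: refuse an empty command
    assert command?.trim() : 'command must not be empty'

    def proc = command.execute()
    def out = new StringBuilder()
    def err = new StringBuilder()
    // Consume both streams so the process cannot block on a full buffer
    proc.waitForProcessOutput(out, err)

    // Post-condition: a non-zero exit code is treated as a failure
    if (proc.exitValue() != 0) {
        throw new RuntimeException("'${command}' failed (${proc.exitValue()}): ${err}")
    }
    out.toString().trim()
}

println callOS('echo hello')
```

Note that execute() splits the command string on whitespace and does not go through a shell, so quoting, pipes, and redirection are not interpreted.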

The top-level statements in the script do their work by setting up the environment and calling methods. The script parses the script args, then calls loadObjects(), which scans the .git/objects directory, collects data from the files in it, and returns a list of tuples. Each tuple describes one file: (objectname, type, content, size). The size is the actual compressed size on disk. The uncompressed size can be derived from the length of the content field.

Once we have this object list of tuples, we can produce the two required outputs: the GitReport, through getReport(), and the GvFile, through getGvScript().

These three methods (loadObjects, getReport, and getGvScript) are the meat of this script. In addition, there are a number of utility methods grouped together into a utils section that can be turned into a Utils class, if we are so inclined.

These utility methods are self-explanatory. You can see them in the code.

You can pull the full source code from my GitHub at https://github.com/nabilh/groovy-scripts.git

Below is a list of all the methods in the script, the top-level script statements (outside of all the methods), and the three main methods.

Hope you will find this script useful in your Git travels, and a fun script in your Groovy scripting travels.


final LINE_LIMIT = 1
final BLOB_MAX_LINE_LENGTH = 40
final GV_MAX_LABEL_LENGTH = 10

def (REPO_DIR_NAME, GV_FILE) = parseArgs(args)

println "\nGit Report for $REPO_DIR_NAME"
println "Graphviz GV file ${GV_FILE}\n"

def objects = loadObjects(REPO_DIR_NAME)
def report = getReport(objects, LINE_LIMIT, BLOB_MAX_LINE_LENGTH)
def gvScript = getGvScript(objects, GV_FILE, GV_MAX_LABEL_LENGTH)

println report
writeGvFile(gvScript, GV_FILE)


def loadObjects(repoDirName) {

    final hexChars = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'b', 'c', 'd', 'e', 'f']

    def objectList = []

    // Assumes repoDirName ends with a path separator
    final objectsDirName = repoDirName + ".git/objects"

    final objectsDir = new File(objectsDirName)

    // Most recently modified directories first
    def objectsDirSorted = objectsDir.listFiles().sort {file ->
        -file.lastModified()
    } as List<File>

    objectsDirSorted.each {dir ->
        // Skip entries such as "pack" and "info" that do not start with a hex character
        if (dir.name[0] in hexChars) {
            dir.eachFile(groovy.io.FileType.FILES) {file ->
                def objectName = dir.name + file.name
                // git ends its output with a newline, so trim the type for later comparisons
                def type = callOS("git cat-file -t $objectName").trim()
                def content = callOS("git cat-file -p $objectName")
                def size = fileSize(objectName, objectsDirName)
                def tuple = new Tuple(objectName, type, content, size)
                objectList.add(tuple)
            }
        }
    }
    objectList
}


def getReport(objects, lineLimit, maxLineLength) {

    def lines = []

    objects.each {tuple ->

        // Unpack the tuple built by loadObjects()
        def (objectName, objectType, content, size) = tuple

        lines.add(sprintf("%s %s", objectName, objectType))

        def noOfLines = numberOfLines(content)
        def length = content.length()

        // Only blob content is passed through checkContent
        if (objectType == 'blob') {
            content = checkContent(content, lineLimit, maxLineLength)
        }
        content = addLineNumbers(content)

        // Singular or plural label for the line count
        def lineWord = (noOfLines == 1) ? 'line' : 'lines'
        lines.add(sprintf("content %d %s, %d chars, %s compressed:\n%s\n", noOfLines, lineWord, length, size, content))
    }
    return lines.join("\n")
}






               


 
