Groovy Fun with Git - Part 3 of 3

Design and Coding of the Groovy Script

When I started considering Groovy as an alternative to Bash scripting, my goals were simple: keep the simplicity of Bash and the Bash scripting life cycle, but gain the power of Groovy (lists, maps, closures, and more).

In particular, I wanted a script, like a Bash script, that:

  1. Runs standalone, with shebang first line
  2. Can be distributed in one script file, and can run anywhere Groovy is installed.
  3. Does not need to be built as a full-fledged Groovy/Java application, to be distributed as a jar.

Running as an executable jar may introduce classpath issues, along with the need for a wrapper shell script that sets the classpath. This is unwanted complexity.

To accomplish that "simplicity goal", I needed to prohibit the creation of new public classes and subclasses, which bring with them the complexity of inheritance hierarchies, polymorphism, design patterns, and so on. That machinery is needed when creating full Java applications, frameworks, or libraries. We are doing none of that.

I decided the only classes I would introduce have to be embedded classes, preferably restricted to static inner classes, if I really must have classes. 

My original design included four classes to represent the data structure of Git objects: Node, Blob, Tree, and Commit, with Blob, Tree, and Commit being subclasses of Node to handle the variations in node content by type. The entire .git/objects directory can then be represented as a List<Node>. The rest of the design is a loop over this list, producing lines of text. We need a few more classes to represent the output of this directory scan: a Report class that represents the output listing of objects, and a GvScript class that produces the GV file. The GV file contains commands in the DOT graph description language, which the <dot> utility command uses to create the graphic file.

That design does not meet the simplicity goal, although by making all the classes inner classes I can avoid the build and package steps.

So I tightened my constraints further: no use of any classes or OO design. The cost is giving up polymorphic calls and writing if statements based on the type of node instead. That seemed like a reasonable compromise. So all the classes were out. With this new constraint, the simplicity goal becomes simpler: "No classes. No OO".
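To make that trade-off concrete, here is a minimal sketch of branching on the node type held in a tuple, rather than calling an overridden method on a subclass. The method name describe and the wording of the returned text are hypothetical, chosen only for illustration. The tuple layout (objectName, type, content, size) is the one used later in this article.

```groovy
// Hypothetical helper: branch on the type stored in the tuple instead of
// relying on Blob/Tree/Commit subclasses overriding a common method.
def describe(tuple) {
    def objectName = tuple[0]
    def type = tuple[1]
    def shortName = objectName.take(7)   // abbreviated name, for display only
    switch (type) {
        case 'blob':   return "file content ${shortName}".toString()
        case 'tree':   return "directory listing ${shortName}".toString()
        case 'commit': return "commit ${shortName}".toString()
        default:       return "unknown object ${shortName}".toString()
    }
}

// Made-up sample tuple: (objectName, type, content, size)
def node = new Tuple('83baae61804e65cc73a7201a7252750c76066a30', 'blob', 'hello', 21)
println describe(node)
```

Each new node type costs one more case in every such switch, which is the price of staying class-free.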

<note>

We can easily refactor the script into classes, in an almost mechanical way. Group the methods that belong together and put them in an inner class. Make all the methods static. Once you have a set of classes you're happy with, you can refactor further and introduce more OO concepts: type hierarchies, polymorphism, packages, and design patterns. But remember, you may have to start building and packaging in a jar. The complexity of the script will mushroom. I don't think it is worth the effort.

</note>
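As a rough illustration of the mechanical refactoring described in the note, the sketch below groups two text helpers into one class with static methods, declared in the same script file. The bodies are assumptions written for this example, not the code from the repository. The name truncate is hypothetical, and numberOfLines borrows the name of a utility used in the listing further down.

```groovy
// Hypothetical grouping of two text helpers into a class inside the script file.
class Utils {
    // Count the lines in a block of text (empty or null text has zero lines).
    static int numberOfLines(String text) {
        text ? text.readLines().size() : 0
    }

    // Cut a line down to maxLength characters, marking the cut with "...".
    static String truncate(String line, int maxLength) {
        line.length() <= maxLength ? line : line.take(maxLength) + '...'
    }
}

println Utils.numberOfLines("tree 1234\nauthor me\n")
println Utils.truncate('a fairly long line of blob content', 10)
```

Because the class lives in the script file, running the script with the groovy command still needs no separate build or packaging step.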

Adhering to the simplicity rule, the resulting script template now looks like this:

#!/usr/bin/groovy

// No global variables - only constants derived from args

final A = getAFromArgs(args)
final B = getBFromArgs(args)

def x
def y

def methodX() {
}

def methodY() {
}

This is the general outline of a Bash script. It should therefore be possible to translate a given Bash script to an equivalent Groovy script, perhaps through the use of a translation tool. That should be a fun project.

So, let's look at the working and tested code, to make sure the simplicity goal is not merely theory.

Basically, we are looping through the .git/objects directory, which contains directories named with two hex characters, each holding one or more files with names of 38 characters (we don't need to know that these names came from the SHA1 of the file contents). We need to collect any data we need from each file into a tuple and add the tuple to a list.

Once we have the objects available in a list, we can produce different outputs by looping through the list. We need two output lists (List<String>): one with the content of the node described by each tuple, the other with lines of graphic description language for that tuple.

That's the design. The rest is Groovy coding details.

Below is a listing of the main pieces of the code. You can pull the code from my GitHub gitobjects repository.

In a Bash script, the commands we need to produce printing of Git objects are:

git cat-file -t <objectname>

git cat-file -p <objectname>

git branch -v

To get the compressed file size of a Git object file, we need the OS command:

stat -c %s <pathname>
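The same number is also available without leaving the JVM, since java.io.File.length() returns the size of a file in bytes. The sketch below is one possible body for the fileSize(objectName, objectsDirName) helper that loadObjects calls further down. It is an assumption, not the code from the repository. It relies on the directory layout described earlier: the first two characters of the object name are the directory name, and the remaining 38 are the file name.

```groovy
// Assumed implementation of fileSize: map the 40-character object name to
// <objectsDir>/<first 2 chars>/<remaining 38 chars> and ask the JVM for its size.
def fileSize(String objectName, String objectsDirName) {
    def objectDir = new File(objectsDirName, objectName[0..1])
    def objectFile = new File(objectDir, objectName[2..-1])
    objectFile.length()   // compressed bytes on disk; 0 if the file does not exist
}
```

This avoids one process launch per object, which adds up on a repository with many objects.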

The script needs to make calls to the OS. So the first utility function to code is:

def callOS(command) {
    // execute() starts the process; the text property reads its standard output
    command.execute().text
}

We need to check the pre-conditions before the execute(), and check for errors after. But essentially that is the Groovy call to OS function.
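A minimal sketch of what those pre- and post-checks could look like follows. The name callOSChecked, the workDir parameter, and the choice of exceptions are all assumptions made for this example. The author's actual checks may differ.

```groovy
// Hypothetical, more defensive variant of callOS.
def callOSChecked(String command, File workDir = new File('.')) {
    // Pre-conditions: a non-empty command and an existing working directory.
    if (!command?.trim()) {
        throw new IllegalArgumentException('Empty command')
    }
    if (!workDir.isDirectory()) {
        throw new IllegalArgumentException("Not a directory: ${workDir}")
    }
    // Split on whitespace; this does not handle quoted arguments.
    def process = new ProcessBuilder(command.tokenize()).directory(workDir).start()
    def out = new StringBuilder()
    def err = new StringBuilder()
    // Drain both streams so the process cannot block on a full pipe,
    // then wait for it to finish.
    process.waitForProcessOutput(out, err)
    // Post-condition: a non-zero exit code means the command failed.
    if (process.exitValue() != 0) {
        throw new IllegalStateException("'${command}' failed: ${err.toString().trim()}")
    }
    out.toString().trim()
}
```

Trimming the output matters for commands such as git cat-file -t, whose result ends with a newline that would otherwise defeat a comparison like type == 'blob'.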

The top-level statements in the script do their work by setting up the environment and calling methods. The script parses the script args, then calls loadObjects(), which scans the .git/objects directory, collects data from the files in it, and returns a list of tuples. Each tuple describes one file: (objectname, type, content, size). The size is the actual compressed size on disk. The size of the uncompressed file is contained in the content field.

Once we have this object list of tuples, we can produce the two required outputs: the GitReport, through getReport(), and the GvFile, through getGvScript().
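Since the GV output is the less obvious of the two, here is a rough sketch of how a list of tuples could be turned into DOT text. It is not the author's getGvScript. The names sketchGvScript and referencedNames are hypothetical, and the edge rules are an assumption based on the output of git cat-file -p: a commit lists "tree <name>" and "parent <name>" header lines, and each tree entry reads "<mode> <type> <name>" followed by a tab and the path.

```groovy
// Hypothetical helper: object names referenced by a commit or a tree.
def referencedNames(String type, String content) {
    def refs = []
    if (type == 'commit') {
        // Header lines end at the first blank line; the commit message follows.
        for (line in content.readLines()) {
            if (!line.trim()) break
            def words = line.split(' ')
            if (words[0] in ['tree', 'parent']) refs << words[1]
        }
    } else if (type == 'tree') {
        // Each entry: <mode> <type> <name>TAB<path>
        content.readLines().each { line ->
            def words = line.split(/\s+/)
            if (words.size() >= 3) refs << words[2]
        }
    }
    refs
}

// Hypothetical sketch: one DOT node per object, one edge per reference.
def sketchGvScript(objects, int maxLabelLength) {
    def lines = ['digraph git {']
    objects.each { tuple ->
        def name = tuple[0]
        def type = tuple[1].trim()   // tolerate a trailing newline from git
        lines << "  \"${name}\" [label=\"${type} ${name.take(maxLabelLength)}\"];".toString()
        referencedNames(type, tuple[2]).each { target ->
            lines << "  \"${name}\" -> \"${target}\";".toString()
        }
    }
    lines << '}'
    lines.join('\n')
}
```

The resulting text can be written to a file and rendered with the dot command, for example dot -Tpng objects.gv -o objects.png (the file names here are placeholders).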

These three methods (loadObjects, getReport, and getGvScript) are the meat of this script. In addition, there is a bunch of utility methods grouped together into a utils section that can be turned into a Utils class - if we are so inclined.

These utility methods are self-explanatory. You can see them in the code.

You can pull the full source code from my GitHub at https://github.com/nabilh/groovy-scripts.git

Below is a list of all the methods in the script, the top-level script statements (outside of all the methods), and the three main methods.

Hope you will find this script useful in your Git travels, and a fun script in your Groovy scripting travels.


final LINE_LIMIT = 1
final BLOB_MAX_LINE_LENGTH = 40
final GV_MAX_LABEL_LENGTH = 10

def (REPO_DIR_NAME, GV_FILE) = parseArgs(args)

println "\nGit Report for $REPO_DIR_NAME"
println "Graphviz GV file ${GV_FILE}\n"

def objects = loadObjects(REPO_DIR_NAME)
def report = getReport(objects, LINE_LIMIT, BLOB_MAX_LINE_LENGTH)
def gvScript = getGvScript(objects, GV_FILE, GV_MAX_LABEL_LENGTH)

println report
writeGvFile(gvScript, GV_FILE)


def loadObjects(repoDirName) {

    final hexChars = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'b', 'c', 'd', 'e', 'f']

    def objectList = []

    final objectsDirName = repoDirName + ".git/objects"

    final objectsDir = new File(objectsDirName)

    def objectsDirSorted = objectsDir.listFiles().sort {file ->
        -file.lastModified()
    } as List<File>

    objectsDirSorted.each {dir ->
        if (dir.name[0] in hexChars) {
            dir.eachFile(groovy.io.FileType.FILES) {file ->
                // Object name = 2-character directory name + 38-character file name
                def objectName = dir.name + file.name
                // Ask Git for the object's type and pretty-printed content
                def type = callOS("git cat-file -t $objectName")
                def content = callOS("git cat-file -p $objectName")
                def size = fileSize(objectName, objectsDirName)
                def tuple = new Tuple(objectName, type, content, size)
                objectList.add(tuple)
            }
        }
    }
    objectList
}


def getReport(objects, lineLimit, maxLineLength) {

    def lines = []

    objects.each {tuple ->

        def objectName = tuple[0]
        def objectType = tuple[1]
        def content = tuple[2]
        def size = tuple[3]

        def line = sprintf("%s %s", objectName, objectType)
        lines.add(line)

        def noOfLines = numberOfLines(content)
        def length = content.length()

        if (objectType.equals('blob')) {
            content = checkContent(content, lineLimit, maxLineLength)
        }
        content = addLineNumbers(content)
        if (noOfLines == 1) {
            line = sprintf("content %d line, %d chars, %s compressed:\n%s\n", noOfLines, length, size, content)
        } else {
            line = sprintf("content %d lines, %d chars, %s compressed:\n%s\n", noOfLines, length, size, content)
        }
        lines.add(line)
    }
    return lines.join("\n")
}






               


 
