Heuristics
Classification and heuristics in mainframe modernization
Mainframe migration projects are an accumulation of sub-projects: data migration, scheduling, security, backup management, setup of the landing zone and the development environment, interfaces with external systems, and more. Taken separately, all these issues are manageable. None of them rises to the level of rocket science.
Combined in a multi-dimensional project, however, they can make for a formidable task.
Any of these aspects can cause the overall project to fail, and whenever you can mark such an issue as solved for good, you’ve got a win. It is one problem you no longer need to care about. It is one risk factor that you no longer need to worry about. Your project is one step closer to completion.
Heuristics to the rescue
Tasks that can easily be done manually on a few samples cause headaches when extrapolated to the full scale of a real-world project.
Take, for instance, the fact that COBOL programs can either use fixed columns (usually from 6 to 72, with any text outside this zone ignored) or be free form, like most modern languages, where white space and new line characters can be used to make the code visually more appealing. To function properly, compilers must know whether a source file is free form or not, as each case requires special treatment.
Any human reader can identify whether a source file uses fixed columns or free form and set up compilation options accordingly. But what if you have a portfolio of thousands of files, some free form and others based on fixed columns, with no obvious way to tell them apart from the outset? You cannot just go through these one by one!
By default, before compilation, the Raincode COBOL compiler scans the source code and applies heuristics to determine whether a program uses fixed columns (and what these columns are, as they are not always aligned to the 6-72 standard). While one can always force the compiler to use free form or fixed columns, there is seldom a need to do so. This automatic classification facility is so effective that one can pretty much ignore the issue altogether and let the compiler do its magic, even when applied to thousands of source files.
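To give a feel for what such a heuristic can look like, here is a minimal Python sketch. It is not Raincode's actual algorithm, which the article does not describe. It assumes the conventional layout in which the first six columns form a sequence area (blank or numeric) and the seventh holds an indicator character, and it lets the lines vote. The function names and the 0.9 threshold are purely illustrative.

```python
# Hypothetical sketch: guess whether a COBOL source uses fixed columns.
# Assumption: columns 1-6 are a blank/numeric sequence area and column 7
# is an indicator (space, *, -, /, D). Real portfolios are messier.

INDICATORS = set(" *-/Dd")


def looks_fixed(line: str) -> bool:
    """Return True if one line is compatible with the fixed-column layout."""
    if len(line) < 7:
        return False
    sequence_area = line[:6]
    sequence_ok = all(ch == " " or ch.isdigit() for ch in sequence_area)
    return sequence_ok and line[6] in INDICATORS


def guess_cobol_format(source: str, threshold: float = 0.9) -> str:
    """Let the non-blank lines vote: 'fixed', 'free', or 'unknown' if empty."""
    lines = [line for line in source.splitlines() if line.strip()]
    if not lines:
        return "unknown"
    fixed_share = sum(looks_fixed(line) for line in lines) / len(lines)
    return "fixed" if fixed_share >= threshold else "free"
```

A vote over all lines, rather than a test on the first line only, keeps a single odd line from tipping the decision. Detecting non-standard column boundaries, as the compiler reportedly does, would need more work than this sketch attempts.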
It is not rocket science, but it is one of those cases where automating something, even something so simple that a six-year-old could do it, allows you to check the box. It makes for one less issue to worry about.
And that’s a win.
What is this source file?
Speaking of classification, before even considering columns or free form, we are sometimes given thousands of source files, including COBOL programs and copy books, JCLs, the odd assembler source, PL/I programs, etc., without a hint, extension or directory structure that would help to classify them.
This classification is just as trivial a task as deciding whether a COBOL source program is in free form or aligned on predefined columns. All it takes is a glance, and you know whether any given source file is a COBOL program, a JCL or any other useful artefact. But again, going through thousands of files one by one and classifying them accordingly is totally unreasonable.
To address these cases, Raincode provides a file classification tool that uses heuristics to classify source code, forcing extensions onto file names, and automating this task with an accuracy of over 98%. It is the kind of utility you don’t need to use more than once or twice per year, but you are ecstatic when you do!
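The flavor of such a classifier can be sketched in a few lines of Python. This is an illustration only, not the Raincode tool: the marker patterns, the labels and the extensions below are assumptions chosen for the example. Each line of a file is tested against a handful of telltale patterns, and the language with the most hits wins.

```python
import re

# Hypothetical marker patterns, one per kind of source file.
RULES = {
    "jcl": re.compile(r"^//\S*\s+(JOB|EXEC|DD)\b"),
    "cobol": re.compile(
        r"\b(IDENTIFICATION|ID|ENVIRONMENT|DATA|PROCEDURE)\s+DIVISION\b"
        r"|\bPROGRAM-ID\b",
        re.I,
    ),
    "pli": re.compile(r"\bPROC(EDURE)?\s+OPTIONS\s*\(|\b(DCL|DECLARE)\s", re.I),
    "asm": re.compile(r"^\S*\s+(CSECT|DSECT|USING|LTORG)\b"),
    "copybook": re.compile(
        r"^\s*(\d{6}\s+)?\d{2}\s+[\w-]+.*\bPIC(TURE)?\b", re.I
    ),
}

# Illustrative extensions only; any naming convention would do.
EXTENSIONS = {"jcl": ".jcl", "cobol": ".cbl", "pli": ".pli",
              "asm": ".asm", "copybook": ".cpy"}


def classify(text: str) -> str:
    """Count marker hits per kind and return the best one, or 'unknown'."""
    counts = {label: 0 for label in RULES}
    for line in text.splitlines():
        for label, pattern in RULES.items():
            if pattern.search(line):
                counts[label] += 1
    # Full programs also contain PIC clauses, so a division header
    # overrides the copybook guess.
    if counts["cobol"] > 0:
        counts["copybook"] = 0
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else "unknown"


def with_extension(name: str, text: str) -> str:
    """Append the extension matching the guessed kind, if any."""
    return name + EXTENSIONS.get(classify(text), "")
```

For example, `with_extension("PAYROLL", source_text)` would return `PAYROLL.cbl` for a file containing COBOL division headers. A real tool would need many more markers and tie-breaking rules to approach the accuracy quoted above.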
EBCDIC and ASCII
Sometimes, heuristics can be used to determine not just the nature of a given file, but also its encoding.
By default, Raincode JCL treats JCLs and PROCs as source files, and they are therefore converted to ASCII when moving the system from the mainframe to the target platform of choice.
For convenience, Raincode JCL can also run JCLs that are still encoded in the mainframe’s native EBCDIC encoding, so that one can keep the JCLs as they are, if there is any reason to do so.
Things get trickier when a system uses a mix of ASCII and EBCDIC to encode JCLs. That sounds like a bizarre idea at first sight, but it absolutely can happen, for instance when programs generate PROCs (or full JCLs for that matter) before executing them.
To support such cases and deal with the encoding issue for good, Raincode JCL provides a mode where it detects the encoding of each source file, JCL or PROC alike, when opening it, and converts it on the fly if necessary. This even allows a JCL encoded in ASCII to use a PROC encoded in EBCDIC (or the other way around). JCL source files are very stereotyped in terms of their contents and structure, making this detection of the encoding accurate in 100% of the cases.
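The stereotyped structure is what makes this easy: JCL statements start with two slashes, and a slash is byte 0x2F in ASCII but 0x61 in EBCDIC. The Python sketch below exploits just that. It is an illustration, not Raincode's implementation, and it assumes EBCDIC code page 037 (Python's `cp037` codec); the function names are made up for the example.

```python
# Hypothetical sketch: tell ASCII JCL from EBCDIC JCL by looking for "//".
# Assumption: EBCDIC means code page 037; other code pages may be in use.

ASCII_SLASHES = "//".encode("ascii")    # b"\x2f\x2f"
EBCDIC_SLASHES = "//".encode("cp037")   # b"\x61\x61"


def detect_jcl_encoding(data: bytes) -> str:
    """Return 'ascii' or 'cp037' depending on which '//' pattern dominates."""
    ascii_hits = data.count(ASCII_SLASHES)
    ebcdic_hits = data.count(EBCDIC_SLASHES)
    if ascii_hits == ebcdic_hits:
        raise ValueError("cannot determine encoding: no clear '//' pattern")
    return "ascii" if ascii_hits > ebcdic_hits else "cp037"


def read_jcl(data: bytes) -> str:
    """Decode JCL bytes to text, whatever their original encoding."""
    return data.decode(detect_jcl_encoding(data))
```

Because the decision is made per file at the moment it is opened, a JCL in one encoding can pull in a PROC in the other, and both end up as ordinary text.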
Again, none of this is rocket science. None of this would earn you a PhD with honors.
But boy does it help!