The Case of the Suspicious Code
In the bleak back end of November 2022 I was holed up in my Chicago apartment, hiding from the mix of Midwest sleet, snow, and sun. I blocked out the rattling thunder of the nearby Red Line with noise-cancelling headphones and put my nose to the grindstone of grading student submissions for an Introduction to Object-Oriented Programming course, for which I served as a teaching assistant.
While the assignment was simple, the course included five class sections, and I was wearily trudging through the submissions of nearly 200 students when a strange pattern began to emerge.
Consider a freshman or sophomore in college, new to the world of computer programming, grappling with unfamiliar terrain and a new way to solve problems. What comes naturally to the new learner? What takes practice?
In my countless hours of grading, the predominant characteristic displayed among these new programmers was a lack of efficiency: long, overly complicated variable names; multiple statements to complete a single task when one line may have sufficed; extraneous statements floating in the ether of the white space.
Then there came a solution to the assignment that was so neat, so tidy, it could not have been any more efficient if I had coded it myself - variable names were simple, statements had been condensed to their most efficient form, and not a single line of extraneous white space was to be seen.
The first assignment I received with these characteristics I passed - after all, there were some students who weren't entirely new to the realm of object-oriented programming, and they might well have been equipped with the skills of clean code already. But then I ran into an assignment with the exact same code as the first, down to the variable names and number of lines. This raised my suspicion. Still, I reserved judgement - the assignment was simple enough in nature that two students of a similar temperament might have arrived at the same pattern for completing the requirements. By the third identical assignment, I pulled back and reevaluated what I had before me. So many students having identical, or nearly identical, code was cause for alarm.
My mind, weary with the repetition of lines of Java, was now alert. I went back and began paging through the assignments, pulling aside those that waved forth flags of doubt.
These flags included:
Now to my own sensibilities, I knew the code I found in front of me was not genuine. However, I needed evidence before I could present my suspicions to the professors I worked for.
I used several different search queries to find the font of programming wisdom these students had copied from. I first searched for answers to this specific textbook problem, as the program name was also the name of the assignment. I found several GitHub repositories containing solutions to every assignment in our textbook, and by going into their files and finding the specific assignment, I found the code that so many of my students had used.
There were still some assignments that I had flagged for potential plagiarism that did not match this code, so I continued searching. Search engines like Google sometimes have a hard time with snippets of code as search terms, so I had to be selective about which parts of the code I looked up. The key ended up being putting parts of statements or lines into quotation marks, which brought up exact matches in places like Chegg and Stack Overflow.
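The quoting trick can be sketched in Java. This is only an illustration, not the process I actually followed: the class `QuotedQueryBuilder`, the method `buildQueries`, and the idea of preferring the longest lines are all assumptions made for the example. The reasoning is that short lines such as `}` or `i++;` appear in nearly every program, while longer statements are more likely to be distinctive.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: turns the most distinctive-looking lines of a
// submission into exact-phrase search queries. "Distinctive" is approximated
// here by line length, which is an assumption made for this sketch.
class QuotedQueryBuilder {

    static List<String> buildQueries(String source, int maxQueries, int minLength) {
        List<String> candidates = new ArrayList<>();
        for (String line : source.split("\\R")) {
            // Replace inner double quotes with spaces so the outer pair
            // delimits the phrase, then collapse runs of whitespace.
            String cleaned = line.replace('"', ' ').trim().replaceAll("\\s+", " ");
            // Skip short lines; they match far too many unrelated programs.
            if (cleaned.length() >= minLength) {
                candidates.add(cleaned);
            }
        }
        // Longest lines first; the sort is stable, so ties keep source order.
        candidates.sort((a, b) -> Integer.compare(b.length(), a.length()));

        List<String> queries = new ArrayList<>();
        int limit = Math.min(maxQueries, candidates.size());
        for (String candidate : candidates.subList(0, limit)) {
            queries.add("\"" + candidate + "\"");
        }
        return queries;
    }

    public static void main(String[] args) {
        String submission =
                "int x = 1;\n"
              + "public static double computeAverage(int[] values) {\n"
              + "}\n"
              + "System.out.println(\"Average: \" + avg);\n";
        // Prints one quoted query per line, ready to paste into a search box.
        for (String query : buildQueries(submission, 2, 15)) {
            System.out.println(query);
        }
    }
}
```

Each query still has to be pasted into a search engine by hand; the sketch only automates choosing and quoting the lines.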
Below, the code on the right shows a solution from a GitHub repository containing all the solutions to the assignments in this particular textbook; the code on the left shows an example of potential plagiarism in a fictional student's code (inspired by real student submissions).
Plagiarism flags found in the suspicious code:
All of these markers collectively, combined with looking at the student's previous work, led me to believe the submitted assignment was plagiarized.
After analyzing all the data, I felt I did not have the proper authority to issue zeros to these plagiarized assignments, nor deliver the serious academic verdict of plagiarism. So I compiled a detailed report of my findings, including images of copied code and where I found it online, and escalated the issue to the professors in charge of the course (three professors spanning five sections).
After I had concluded the Case of the Suspicious Code, I began to ponder the development of a program that could flag suspicious code and relay that information to a grader. It would be straightforward enough to code something workable in Java, though with potential expansion and the need to access and process hundreds of files, Java may not be the most efficient language for the job. However, since Java is the language I know best, I began the pseudo-code process with Java in mind.
If I were to design a program that could be fed an assignment to check for plagiarism, this is how I could go about it:
Any final determination of plagiarism, especially at a college level, should be reviewed by a professor, so this program should not give a verdict on the likelihood of copied code. Instead, there could be an overall percent match score. The program should give the grader using it the assignment flagged for plagiarism, its plagiarism score, and the library file that it identified as being copied.
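A minimal Java sketch of that report might look like the following. Everything named here (`PlagiarismFlagger`, `MatchReport`, `percentMatch`, `bestMatch`) is hypothetical, and the scoring rule is an assumption chosen for simplicity: the share of a submission's non-blank lines that also appear, after whitespace is normalized, in a library file. The library is passed in as a map from file name to source text, so no file handling is shown.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: reports a score and the closest library file,
// and leaves any verdict to a human reviewer.
class PlagiarismFlagger {

    /** What the grader sees: the assignment, its score, and the closest library file. */
    static final class MatchReport {
        final String assignment;
        final String libraryFile;   // null when the library is empty
        final double percentMatch;  // 0.0 to 100.0

        MatchReport(String assignment, String libraryFile, double percentMatch) {
            this.assignment = assignment;
            this.libraryFile = libraryFile;
            this.percentMatch = percentMatch;
        }

        @Override
        public String toString() {
            return String.format("%s: %.1f%% match with %s",
                    assignment, percentMatch, libraryFile);
        }
    }

    /** Trims lines, collapses whitespace, and drops blank lines. */
    static List<String> normalize(String source) {
        List<String> lines = new ArrayList<>();
        for (String line : source.split("\\R")) {
            String cleaned = line.trim().replaceAll("\\s+", " ");
            if (!cleaned.isEmpty()) {
                lines.add(cleaned);
            }
        }
        return lines;
    }

    /** Percentage of the submission's non-blank lines also found in the library file. */
    static double percentMatch(String submission, String librarySource) {
        List<String> submissionLines = normalize(submission);
        if (submissionLines.isEmpty()) {
            return 0.0;
        }
        Set<String> libraryLines = new HashSet<>(normalize(librarySource));
        int shared = 0;
        for (String line : submissionLines) {
            if (libraryLines.contains(line)) {
                shared++;
            }
        }
        return 100.0 * shared / submissionLines.size();
    }

    /** Finds the library file with the highest score for this submission. */
    static MatchReport bestMatch(String assignment, String submission,
                                 Map<String, String> library) {
        String bestFile = null;
        double bestScore = 0.0;
        for (Map.Entry<String, String> entry : library.entrySet()) {
            double score = percentMatch(submission, entry.getValue());
            if (bestFile == null || score > bestScore) {
                bestFile = entry.getKey();
                bestScore = score;
            }
        }
        return new MatchReport(assignment, bestFile, bestScore);
    }

    public static void main(String[] args) {
        Map<String, String> library = new LinkedHashMap<>();
        library.put("repo/Exercise1.java",
                "int a = 1;\nint b = 2;\nSystem.out.println(a + b);");
        library.put("repo/Exercise2.java",
                "String s = \"hi\";\nSystem.out.println(s);");
        String submission =
                "int a = 1;\n\n  int b = 2;\nint total = a + b;\nSystem.out.println(total);";
        System.out.println(bestMatch("student42/Exercise1.java", submission, library));
    }
}
```

Exact line matching is deliberately crude. Lines such as a lone `}` would inflate the score on real code, and renamed variables would deflate it, so the number is best read as a prompt for a closer look rather than as evidence in itself.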
Areas for expansion: