How the CIA Writes Python

The CIA uses Python for hacking tools as well as utility scripts. Python 2.7 and 3.4 seem to be the favorite versions [version]. This post gathers some interesting, practical insights that you can apply.

I wanted to codify some nice rules for an organization, so I decided to refresh my memory and look beyond the typical software sources. The information in this post comes mainly from WikiLeaks.

I, of course, managed to insert an excalidraw drawing in the post. Hope you enjoy it!

Table of contents

  • Style guide
  • PyPI
  • IDEs
  • Installing pyenv
  • Installing Python
  • Testing
  • Software specs document template
  • Template for cli scripts
  • Random
  • Distribution
  • Some Python insights from someone working at the CIA

Style guide

Though style guides abound, the CIA has its own take, based on the Google Python Style Guide. It was written for people coming from a C background.

Written for people working at a specific location, it justifies its rules by legibility (the ease with which code can be read and understood by others).

As with all style guides, it points out that these are not hard rules: when faced with a decision, apply common sense.

Here are some interesting parts [style].

Exceptions

Exceptions must be derived from Exception. raise MyException('Error message') is preferred to the two-argument form (raise MyException, 'Error message') or string-based exceptions (raise 'Error message').

The body of try/except blocks is to be kept short.
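
As a minimal sketch of both rules, the snippet below defines an exception derived from Exception and keeps the try body down to the single call that can fail. ConfigError and parse_port are hypothetical names chosen for illustration; they do not come from the CIA guide.

```python
class ConfigError(Exception):
    """Hypothetical application error, derived from Exception."""


def parse_port(raw):
    """Convert a raw string to a TCP port number."""
    try:
        # Only the call that can raise lives inside the try body.
        port = int(raw)
    except ValueError:
        raise ConfigError('Port must be an integer: %r' % (raw,))
    if not 0 < port < 65536:
        raise ConfigError('Port out of range: %d' % port)
    return port
```

Keeping the range check outside the try block means an unrelated bug in that check cannot be swallowed by the except clause.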

Globals

Globals should be avoided in favor of class variables.

If they are needed, they should be defined at module level and accessed through functions.
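
One way to read this rule is sketched below: the state lives in a single module-level name, and callers go through functions instead of touching it directly. The names _cache_enabled, cache_enabled and set_cache_enabled are hypothetical.

```python
# Module-level state; the leading underscore marks it as internal.
_cache_enabled = True


def cache_enabled():
    """Return the current value of the module-level flag."""
    return _cache_enabled


def set_cache_enabled(value):
    """Rebind the module-level flag; 'global' is required to assign to it."""
    global _cache_enabled
    _cache_enabled = bool(value)
```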

Nested classes and functions

They are OK to use.

List comprehensions

They are OK for simple cases; switch to a loop when things get more complicated.

Generators

Use as needed, and document them with "Yields:" rather than "Returns:" in the docstring.
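
A short sketch of a generator documented that way follows. The function read_chunks and its docstring layout (Google-style Args/Yields sections) are illustrative assumptions, not text from the CIA guide.

```python
def read_chunks(data, size):
    """Split a sequence into fixed-size pieces.

    Args:
        data: a sliceable sequence such as bytes or a list.
        size: maximum length of each piece.

    Yields:
        Consecutive slices of data, each at most size items long.
    """
    for start in range(0, len(data), size):
        yield data[start:start + size]
```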

Default iterator methods are encouraged

# Yes
for key in adict: ...
if key not in adict: ...
if obj in alist: ...
for line in afile: ...
for k, v in dict.iteritems(): ...  # Python 2; use dict.items() in Python 3

# No
for key in adict.keys(): ...
if not adict.has_key(key): ...
for line in afile.readlines(): ...

Lambda functions

OK for one-liners no longer than 60 to 70 characters. For common operations such as multiplication, prefer the functions from the operator module.
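
To illustrate the operator-module preference, here is a small sketch that multiplies a list of numbers with operator.mul instead of lambda x, y: x * y. The helper name product is hypothetical.

```python
import functools
import operator


def product(numbers):
    """Multiply all numbers together; an empty input gives 1."""
    # operator.mul replaces the equivalent lambda x, y: x * y
    return functools.reduce(operator.mul, numbers, 1)
```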

Truth evaluation

The implicit false is to be avoided. Use is or is not when comparing with None.

Do not write if x: when you mean if x is not None:.

When comparing a boolean variable to False, don’t use ==; use if not x:.

For sequences, use len to check for emptiness; don’t rely on the fact that empty sequences evaluate to False.

# This one is for readability
# Yes
if foo is not None: ...

# No
if not foo is None: ...

Comparison

startswith is preferred to slicing.

if foo.startswith('bar'):  # Yes
if foo[:3] == 'bar':  # No

Object type comparison

if isinstance(obj, int):  # Yes
if type(obj) is type(1):  # No

Lexical scoping

An inner function can read variables from the enclosing scope, and can rebind them only if global or nonlocal is used. Inner and outer refer to how the functions are nested in the source code.

def get_adder(summand1):
    """Returns a function that adds numbers to a given number."""
    def adder(summand2):
        return summand1 + summand2 
    return adder
     
print(get_adder(1)(2))        
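
The example above only reads the outer variable. A minimal sketch of the modifying case, using nonlocal (Python 3 only), could look like this; make_counter is a hypothetical name.

```python
def make_counter():
    """Return a function that counts how many times it has been called."""
    count = 0

    def increment():
        # Without nonlocal, 'count += 1' would raise UnboundLocalError,
        # because assignment makes 'count' local to increment().
        nonlocal count
        count += 1
        return count

    return increment
```

Each call to make_counter() gets its own independent count, since the variable lives in that call's enclosing scope.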

Decorators

Use them cautiously, and write good docs and tests for them. Dependencies are to be avoided inside decorators.

A decorator that is called with valid parameters should (as much as possible) be guaranteed to succeed in all cases.

Threading

Do not rely on the atomicity of built-in types. Use Queue to pass data between threads; otherwise, use the threading module’s primitives and locks.
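
A minimal sketch of the Queue approach, written for Python 3 (where the module is named queue), is shown below. The worker and square_all functions and the None sentinel are illustrative choices, not something prescribed by the guide.

```python
import queue
import threading


def worker(tasks, results):
    """Square each number from tasks until the None sentinel arrives."""
    while True:
        item = tasks.get()
        if item is None:  # sentinel: no more work
            break
        results.put(item * item)


def square_all(numbers):
    """Square a list of numbers on a single worker thread."""
    tasks = queue.Queue()
    results = queue.Queue()
    thread = threading.Thread(target=worker, args=(tasks, results))
    thread.start()
    for number in numbers:
        tasks.put(number)
    tasks.put(None)
    thread.join()
    # One worker and FIFO queues, so results come back in input order.
    return [results.get() for _ in numbers]
```

No shared list or dict is touched by both threads; all data crosses the thread boundary through the two queues.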

Strings

Avoid using the + and += operators to accumulate a string within a loop. Since strings are immutable, this creates unnecessary temporary objects and results in quadratic rather than linear running time. Instead, add each substring to a list and ''.join the list after the loop terminates (or write each substring to an io.StringIO buffer for text; io.BytesIO, which the guide names, only accepts bytes in Python 3).
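
The list-and-join pattern can be sketched as follows; build_report and its row format are hypothetical.

```python
def build_report(rows):
    """Render (name, value) pairs as 'name=value' lines."""
    parts = []
    for name, value in rows:
        # Appending to a list is cheap; no intermediate strings are built.
        parts.append('%s=%s\n' % (name, value))
    # A single join at the end produces the final string in one pass.
    return ''.join(parts)
```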

Imports

#Yes
import os
import sys
 
#No
import os, sys        

Misc.

  • Avoid fancy features like import hacks, internal modifications or metaclasses. They make the code shorter but more difficult to read later, as opposed to code that is longer but straightforward.
  • Lines should be at most 100 characters, except for sensible cases like URLs and imports.
  • Long texts should appear at the top of files, except in tests.
  • Use parentheses instead of a backslash to continue long lines.

if (width == 0 and height == 0 and
        color == 'red' and emphasis == 'strong'):

x = ('This will build a very long long '
     'long long long long long long string')

  • Avoid parentheses when they are not needed, e.g. if(x):

Spacing

  • Indent using 4 spaces.
  • Two blank lines between top-level definitions, be they function or class definitions.
  • One blank line between method definitions and between the class line and the first method. Use single blank lines as you judge appropriate within functions or methods.
  • Generally only one statement per line.
  • Access control: if access is more complex, or the cost of accessing the variable is significant, use function calls instead of direct access.
  • Naming: module_name, package_name, ClassName, method_name, ExceptionName, function_name, GLOBAL_CONSTANT_NAME, global_var_name, instance_var_name, function_parameter_name, local_var_name.
  • Guard script code with if __name__ == '__main__': so that it does not execute on import.

Comments

  • The final place to have comments is in tricky parts of the code.
  • Complicated operations get a few lines of comments before the operations commence.
  • Non-obvious ones get comments at the end of the line.
  • To improve legibility, these comments should be at least 2 spaces away from the code.
  • On the other hand, never describe the code. Assume the person reading the code knows Python (though not what you're trying to do) better than you do.

TODOs

  • Use TODO comments for code that is temporary, a short-term solution, or good-enough but not perfect.
  • TODOs should include the string TODO in all caps.
  • If the TODO is for a future task, it may belong in a ticket instead.

PyPI

pip2tgz is used to download packages as tarballs [pypi].

A Samba share (Samba enables file sharing across different operating systems over a network) is used to share packages.

A local PyPI server is also used, with the ~/.pip/pip.conf file configured as follows:

[global]
index-url = https://10.2.3.96:8080/simple
trusted-host = 10.2.3.96        

They also have a drop folder at \fs-01\share\Python\packages: packages placed there appear on the PyPI index within 5 minutes [pypi2].

Packages can then be installed from the local index:

pip install --index-url=https://10.3.2.212/simple/ foopackage        

IDEs

PyCharm is used, and there is an explanation of how to debug with it [pycharm].

The auto-complete feature is also well appreciated [pycharm2].

Installing pyenv

The pyenv install instructions are pretty generic and use the GitHub version [pyenv].

Installing Python

Python is installed from local sources, using pyenv [pyenv].

$ cat > 2.7.9 << EOF
#require_gcc
install_package "Python-2.7.9" "https://10.3.2.212/python/Python-2.7.9.tgz" ldflags_dirs standard verify_py27 ensurepip
EOF

$ pyenv install ./2.7.9
Downloading Python-2.7.9.tgz...
-> https://10.3.2.212/python/Python-2.7.9.tgz
Installing Python-2.7.9...
Installed Python-2.7.9 to /home/User #71475/.pyenv/versions/2.7.9

$ pyenv rehash        

Testing

This may have changed since, but here’s the setup they were using [testing].

The Python installation is replicated on the remote server each time tests are executed. Installed packages are to be included and zipped.

This seems to predate the era of CI/CD systems.

Software specs document template

Software design docs typically follow this pattern [software specs]

## Goals

- Run on Linux without the Collide overhead

- Simplify the user experience

## Background and strategic fit

Why are you doing this? How does this relate to your overall product strategy?

## Assumptions

List the assumptions you have such as user, technical or other business assumptions. (e.g. users will primarily access this feature from a tablet).

## Requirements

| # | Title | User Story | Importance | Notes |
| 14.1.2.19 |  | The tool shall have the ability to modify all configuration variables (unless explicitly marked otherwise) at runtime and persist those changes | Must Have | Additional considerations or noteworthy references (links, issues) |
|  | Derived | input directory; output directory; working directory; intervals (beacon, jitter, uninstall); max allowable beacon failures; Internet connectivity URL; chunk size; Target ID; blacklist/whitelist executables |  |  |


## User interaction and design

Using argparse to format and parse acceptable command syntax

Use flags to tokenize values

SEARCH command will have the following capabilities:

-- swalkdir - recursively search for a string in a directory path

-- sdirlist - search filenames for a string in a directory path

-- slike - filename pattern match

-- scontains - file name or file content pattern match

-- sfreetext - file content pattern match

-- sliteral - any valid WSS search command

## Questions

Below is a list of questions to be addressed as a result of this requirements document:

| Question | Outcome |
| (e.g. How do we make users more aware of this feature?) | Communicate the decision reached |

## Not Doing

List the features discussed which are out of scope or might be revisited in a later release.        

Template for cli scripts

They provide a template for CLI scripts that follows this pattern. Pretty cool if you ask me, as these are common CLI operations [cli].

import sys  # needed by the __main__ block at the bottom


class Application:
    """
    This class defines the functionality for the script. It is instantiated in
    the global processing handler for __main__
    """

    def __init__(self):
        """
        Sets up the member variables for the application object.
        """
        pass

    def logger(self):
        """
        Sets up the Python logging for the application. By default it logs to both
        the console window and to a log file in the current directory.
        """
        pass

    def platform(self):
        """
        Determines the platform the script is running on.
        """
        pass

    def version(self):
        """
        Formats the version number of the script as a string.
        """
        pass

    def environ(self, name, default=None):
        """
        Helper method for looking up environment variables. Writes a warning to
        the application log if the environment variable cannot be found.
        """
        pass

    def shellspawn(self, binPath, binArgs=None):
        """
        Runs a shell command and does not wait for the command to complete.
        """
        pass

    def shellexec(self, binPath, binArgs=None):
        """
        Runs a shell command and waits for the command to complete before returning.
        """
        pass

    def copyExistingFile(self, srcFile, dstFile):
        """
        Copies a file from the source to the destination. Logs warnings or errors
        if it cannot find the file as expected.
        """
        pass

    def unittest(self):
        """
        This is the default action for the application. It should generally be
        a self-test.
        """
        pass

    def usage(self):
        """
        Prints the help text for the application.
        """
        pass

    def main(self, argv):
        """
        Entry point for the application. Command line processing should be done
        here.
        """
        pass


if __name__ == "__main__":
    application = Application()
    sys.exit(application.main(sys.argv))        
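
The template leaves every method as a stub. As a rough sketch of how two of the helpers might be filled in, written here as plain functions, consider the following. The bodies, the logger name 'application' and the snake_case parameter names are assumptions; the original implementation is not shown in the source.

```python
import logging
import os
import subprocess

# Hypothetical logger name; the template sets up logging in logger().
log = logging.getLogger('application')


def environ(name, default=None):
    """Look up an environment variable, warning if it is missing."""
    value = os.environ.get(name)
    if value is None:
        log.warning('Environment variable %s is not set', name)
        return default
    return value


def shellexec(bin_path, bin_args=None):
    """Run a command, wait for it to finish, and return its exit code."""
    command = [bin_path] + list(bin_args or [])
    # An argument list with the default shell=False avoids shell injection.
    return subprocess.call(command)
```

A non-waiting shellspawn could be built the same way on subprocess.Popen, which returns as soon as the process has started.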

Random

  • Cython is used to obfuscate code [bobby].
  • Comments are oftentimes in the format [hive]

with enlarged versions looking like this

		#
		#
		#      Runs the installScript ...    Note that the hive trigger should now timeout
		#                                      since the install script should remove 
		#                                      all currently running hive processes including
		#                                      our currently triggered implant and replace the
		#                                      existing hive with the new hive implant...
		#
		#        

Distribution

.pyz (a zipped Python application archive that the interpreter can run directly) is a known distribution format for the tools [gyrfalcon].
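
The source does not say how these archives were built. As one possible sketch, the standard-library zipapp module (available from Python 3.5, so newer than the versions mentioned earlier) can package a directory containing a __main__.py into a .pyz file. The build_pyz helper and the interpreter line are assumptions.

```python
import pathlib
import zipapp


def build_pyz(source_dir, target):
    """Package source_dir (which must contain __main__.py) as a .pyz file."""
    zipapp.create_archive(
        source_dir,
        target=target,
        # Written as a shebang line so the archive can be run directly.
        interpreter='/usr/bin/env python3',
    )
    return pathlib.Path(target)
```

The resulting file can be started with python tool.pyz, or executed directly on systems that honor the shebang line.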

Parting words

Analyzing repos like Hive shows that the CIA’s naming conventions follow camelCase, but the code is well documented.

As is usual with hacking scripts, the code and the solutions used give off a hacked-together vibe rather than your typical Python engineering experience.

Some Python insights from someone working at the CIA

Someone on Reddit apparently works for the CIA. They say that there are no globally enforced rules. They also say that since you sometimes have no internet access or phone, you have to go fetch the results, print them and come back. You also cannot install from the internet as there is … no internet.

They use Vim or VS Code, depending on individual preference. There is a pre-approved list of software and versions. Very few libraries are approved, so they end up writing a lot of things that already exist. Code is heavily vetted. Since new software takes time to be approved, they are always behind.

Otherwise, the level of the people varies, some still google everything, and it’s not that different from the public sector, except for maintenance windows at really inconvenient times.

They like Vim because wherever there is Linux, Vi or Vim is sure to be around, so they can start coding right away.

References
