Much ado about .gitignore


Git is essential now. As you create software, you will create a number of Git repos. Soon you will run into directories and files that local tools such as editors and compilers need but that don't belong in the repo. That is when you meet the .gitignore file in the root directory of the repo. This file tells Git which untracked files and directories to leave alone; files that Git already tracks are not affected by it.

I suffer from OCD. After two weeks of serious looking, I can say that the gitignore rules ARE simple, but they can look arbitrary and confusing.

I hope this very long, probably unnecessary, article lays out all I came to know about .gitignore. In turn, I hope it spares you a few mental cramps, although mine were voluntary.

Note: This is a lengthy article built on my own research. It is quite possible that some of the options discussed here contain mistakes, so test them before you rely on them. If you find one, please let me know on LinkedIn or Twitter.

Goal

  1. So you can comfortably approach a .gitignore file and specify the files or directories you want Git not to worry about
  2. So you can read any existing .gitignore file and understand what it is saying, with very little doubt left

Towards that goal this article covers

  • Real simple things to start with, such as placement of .gitignore files
  • Fairly obvious rules (relative files and directories to ignore)
  • Slightly less obvious rules (the ending "/" for files vs directories)
  • The most often used but confusing "shortcut rules" (anywhere rules vs relative)
  • The really confusing "**" rules
  • References to the Git documentation, a few articles, and Git repos where various .gitignore files are gathered
  • A reference to a public Git repo that you can clone to experiment with a variety of gitignore options
  • Finally, an analysis of an existing .gitignore from a sample pyspark repo

Real Simple things First

The filename for git ignore is ".gitignore". No extension. It is a plain text file. Each line is one pattern. Lines starting with "#" are comments, and blank lines are allowed. Look for or create this file at the root of the Git repo. This file can also exist in each sub directory.

Placement of .gitignore file

By convention, the .gitignore file goes at the root of the repo. A single file there can ignore a great many things anywhere in the hierarchy, so most of the behavior of the whole repo is controlled from that one file. One rarely needs a specialized .gitignore at the sub directory level.
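If you want to see placement at work, here is a minimal sketch using git check-ignore, which reports whether Git would ignore a path. The scratch repo and every name in it (notes.tmp, docs/draft.txt, the status_of helper) are made up for illustration. The sub directory .gitignore is there only to show that its patterns apply inside that sub directory and nowhere else.

```shell
# Scratch repository; every file and directory name here is illustrative.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q

mkdir -p docs src
touch notes.tmp docs/draft.txt docs/final.txt src/draft.txt

# Root .gitignore: one rule for the whole repo.
printf '%s\n' '*.tmp' > .gitignore
# Sub directory .gitignore: its patterns apply only inside docs/.
printf '%s\n' 'draft.txt' > docs/.gitignore

# -v prints the .gitignore file, line number, and pattern that matched.
git check-ignore -v notes.tmp docs/draft.txt

status_of() {
  # git check-ignore exits with 0 when the path is ignored, 1 when it is not.
  if git check-ignore -q "$1"; then echo ignored; else echo kept; fi
}

status_of src/draft.txt    # kept: docs/.gitignore does not reach src/
status_of docs/final.txt   # kept: no pattern matches it
```

The -v form is also handy on a real repo when you cannot tell which line of which .gitignore is hiding a file.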

Confusion about the rules of .gitignore

Rules of .gitignore can get confusing. The good news is that simple things are kept simple. The rules are documented by Git; the URL is in the References section.

Some rules are shortcuts and are used often. Some other rules are not really obvious. I will cover these towards the end.

Let me start with the easiest rules first

Easiest rules to understand first

These rules are to do with relative paths of directories and files.

How to Ignore a specific file either at the root or deeper in the hierarchy

# This is a comment line

#ignore a file a.txt at the root
/a.txt

#ignore a deeper file
/dir1/dir2/a.obj

#you can obviously have empty lines

The "/" at the beginning of a pattern anchors it to the directory that holds the .gitignore file, so the pattern applies only to that specific relative path. As you will soon see, the "/" has additional meanings depending on where it appears in the pattern. For now it is enough to know that a relative path starts with a "/".

How to ignore a specific relative directory and all its sub directories

# ignore this directory and all its sub directories
/dir1/

# deeper tree
/dir1/dir2/
/dir1/dir3/

etc...

The first "/" says this is a relative path. The last "/" indicates that this path applies only to directories and not files.

So if the trailing "/" in "/dir1/" is dropped, "dir1" refers to either a directory or a file of that name. Adding the trailing "/" restricts the pattern to ignoring only directories and not files.

# Ignore only if dir1 is a directory
/dir1/

# Ignore dir1 whether a file or a dir
/dir1
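Here is a small sketch of the leading and trailing "/" rules, again using git check-ignore in a scratch repo. All names (a.txt, out, logs, status_of) are my own illustrations. Note that "out" is created as a directory and "logs" as a plain file, so the two directory-only patterns behave differently.

```shell
# Scratch repository; every file and directory name here is illustrative.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q

mkdir -p out sub
touch a.txt sub/a.txt out/x.o logs   # "out" is a directory, "logs" is a file

printf '%s\n' '/a.txt' '/out/' '/logs/' > .gitignore

status_of() {
  # git check-ignore exits with 0 when the path is ignored, 1 when it is not.
  if git check-ignore -q "$1"; then echo ignored; else echo kept; fi
}

status_of a.txt      # ignored: /a.txt matches the file at the root
status_of sub/a.txt  # kept: the leading "/" ties the pattern to the root
status_of out/x.o    # ignored: its parent directory "out" matches /out/
status_of logs       # kept: /logs/ matches only a directory, and this is a file
```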

Behavior of "*" in relative paths

Time to talk about "*". It works the way it does in a path spec on any operating system: it stands for any run of characters within a single file or directory name, so one pattern can match several files or directories. It has no other special meaning in the context of gitignore.

Ignoring a specific set of files (plural) with "*"

# ignore all root level files starting with name "test"
/test*

# or one level deep under /dir1
/dir1/test*.txt

Ignoring a specific set of directories (multiple) using "*"

# ignore all directories at the root that start with testdir-
/testdir-*/

# Notice the trailing "/"
# It says that this applies only to directories and not files

# Another example
/a/b/dir*xys*/

Example of allowing files while ignoring directories

# Ignore directories under /dir1 whose names start with "test"
/dir1/test*/

# Files like these are still allowed (not ignored)
# /dir1/test.txt
# /dir1/test4.txt

# Directories like these are ignored
# /dir1/testdir
# /dir1/test4dir

Summarizing these "relative path" basic rules

  1. Ignoring relative paths is pretty straightforward, with few surprises.
  2. They do have to start with "/" even though they are relative
  3. You can use "*" for pattern matching
  4. The "*" doesn't cross the directory separator "/"
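The last point in this summary is the easiest to get wrong, so here is a hedged sketch of it. The directory layout (dir1/test-sub and friends) and the status_of helper are invented for the example; the patterns are the two used earlier in this section.

```shell
# Scratch repository; every file and directory name here is illustrative.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q

mkdir -p dir1/test-sub
touch test-notes.md dir1/test1.txt dir1/readme.txt \
      dir1/test-sub/x.txt dir1/test-notes.md

printf '%s\n' '/test*' '/dir1/test*.txt' > .gitignore

status_of() {
  # git check-ignore exits with 0 when the path is ignored, 1 when it is not.
  if git check-ignore -q "$1"; then echo ignored; else echo kept; fi
}

status_of test-notes.md        # ignored: /test* matches at the root
status_of dir1/test1.txt       # ignored: /dir1/test*.txt matches
status_of dir1/readme.txt      # kept: name does not start with "test"
status_of dir1/test-sub/x.txt  # kept: "*" would have to cross a "/"
status_of dir1/test-notes.md   # kept: /test* applies only at the root
```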

Slightly less obvious rules

Some slightly less obvious rules affect the relative paths we have talked about so far. They concern

  1. Placement of "/" in a path: at the beginning, in the middle, or at the end
  2. Files vs directories based on a path

Placement of "/" at the beginning or middle makes a path relative

# The following is NOT relative
# This pattern has a special meaning that I will cover later
dir1

# This is NOT relative either
dir1/

#make this relative: / at the beginning
/dir1

#relative as well
/dir1/dir2

#the following is also relative: / at the middle
dir1/dir2
dir1/a.txt

Placement of "/" at the end indicates a directory and not a file

/dir1

# dir1 above can be a file or a directory to ignore

# Force to ignore a directory with a trailing "/"

# So the following only applies to a directory

/dir1/

The "/" at the end of a path almost always says the preceding spec is a directory spec.

Summarizing slightly less obvious rules

  1. For a path to be relative there must be a "/" at the beginning or in the middle of the path
  2. The "/" at the end indicates that the spec applies only to directories and not files
  3. Removing the trailing "/" widens the spec: any file or directory with that name at that relative path is ignored.
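To make the first rule of this summary concrete, the sketch below puts one pattern with a "/" in the middle and one with a "/" at the beginning into a scratch repo. The names (dir1/dir2, logs, x, status_of) are hypothetical. Both patterns should apply only at the root and not to the same names deeper down.

```shell
# Scratch repository; every file and directory name here is illustrative.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q

mkdir -p dir1/dir2 x/dir1/dir2 logs x/logs
touch dir1/dir2/f.txt x/dir1/dir2/f.txt logs/a.log x/logs/a.log

# "dir1/dir2" has a "/" in the middle, "/logs" has one at the beginning.
printf '%s\n' 'dir1/dir2' '/logs' > .gitignore

status_of() {
  # git check-ignore exits with 0 when the path is ignored, 1 when it is not.
  if git check-ignore -q "$1"; then echo ignored; else echo kept; fi
}

status_of dir1/dir2/f.txt    # ignored: the path matches from the root
status_of x/dir1/dir2/f.txt  # kept: same names, but not at the root
status_of logs/a.log         # ignored: /logs matches the root directory
status_of x/logs/a.log       # kept: /logs is tied to the root
```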

Shortcuts: Most used rules in .gitignore

Although relative paths are useful and easy to use for one-off cases, they are not the most often used. I will now move on to the most used kind of rules in the .gitignore file. These rules apply to a directory name or a filename anywhere in the hierarchy.

A name like "profile-dir" with no interrupting (beginning or middle) "/" has a special meaning. Such a name or spec applies to any directory or file with that name anywhere in the sub directory hierarchy.

The name can use the "*" symbol to generalize. A trailing "/" on the name is allowed and retains its meaning of directories only vs files and directories.

How to ignore any file or directory anywhere in the repo

# Any file or directory named "xyz", anywhere, is ignored
xyz

With the above instruction in the .gitignore file, any directory or file with the name "xyz" is ignored. Notice the spec does not contain a leading "/". If it did, as in "/xyz", it would be considered a relative path and not an "anywhere" path.

Because of the above line in .gitignore, all of the following files and directories are ignored. Notice how "xyz" can be a directory or a file and it can be in multiple places in the hierarchy.

/dir1/xyz # ignored: where xyz is a file

/dir2/dir3/xyz # ignored: where xyz is a directory

/xyz # ignored: where xyz is a directory

Pattern for ignoring directories (only) anywhere

Because an ending "/" on a path name indicates a directory, the "anywhere" spec can be limited to ignoring only directories. Here is an example

# The following ignores directories anywhere
# files with this name are allowed 
xyz/

# Notice the LACK OF A BEGINNING "/"

With the above spec of "xyz/", here is what is allowed and what is ignored

/dir1/xyz # allowed: xyz is a file here

/dir2/dir3/xyz # ignored: xyz is a directory here

/xyz # ignored: xyz is a directory here

The "anywhere" pattern can use the symbol "*" to generalize it. Some examples below

# Ignore files or directories anywhere whose name
# starts with "xyz"
xyz*

# Ignore all files anywhere with this extension *.obj
*.obj

# Ignore all directories (not files) whose name is ".profile"
# anywhere in the hierarchy
.profile/

To restate, the "anywhere" pattern is the one you will see most often when you open .gitignore files, because it ignores files or directories anywhere in the hierarchy based on a name (and not a relative path).

Let's revisit the "/" and realize that a spec like "abc/testdir/" is not an anywhere spec because the "/" is in the middle of the pattern. Such a pattern is interpreted as "relative" and not "anywhere".

#Some more examples are

abc - anywhere pattern
abc/ - anywhere pattern
/abc - relative
/abc/ - relative
abc/def - relative
abc/def/ - relative

Summary of the shortcut anywhere rules

  1. A spec with no interrupting (beginning or middle) "/" applies to any name anywhere in the hierarchy
  2. The trailing "/" is allowed and retains its meaning of directories only vs files and directories
  3. This anywhere spec also allows "*" to generalize the spec
  4. This pattern is the most used in .gitignore files
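A short sketch of the anywhere rules follows, using the "xyz/" and "*.obj" patterns from this section. The layout around them (directories a and b, main.obj, keep.txt) and the status_of helper are assumptions I made up for the demonstration. Note that b/xyz is deliberately a plain file.

```shell
# Scratch repository; every file and directory name here is illustrative.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q

mkdir -p xyz a/xyz b
touch xyz/f.txt a/xyz/g.txt b/xyz a/main.obj main.obj a/keep.txt

# No "/" at the beginning or in the middle: both are "anywhere" patterns.
printf '%s\n' 'xyz/' '*.obj' > .gitignore

status_of() {
  # git check-ignore exits with 0 when the path is ignored, 1 when it is not.
  if git check-ignore -q "$1"; then echo ignored; else echo kept; fi
}

status_of xyz/f.txt    # ignored: directory xyz at the root
status_of a/xyz/g.txt  # ignored: directory xyz deeper in the tree
status_of b/xyz        # kept: xyz is a file here, and "xyz/" wants a directory
status_of main.obj     # ignored: *.obj at the root
status_of a/main.obj   # ignored: *.obj one level down
status_of a/keep.txt   # kept: nothing matches
```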

A few implications of the ending "/" in .gitignore files

  1. You can ignore files and directories (no trailing "/")
  2. You can ignore directories only but allow files (trailing "/")
  3. A single pattern cannot ignore files of a given name while allowing directories of that name. A pattern without a trailing "/" matches both.

Really specialized rules: Not obvious

There are some rules involving "**" that are really not obvious and have overlapping effects with the previous rules. These are

  1. ** at the beginning of a line
  2. ** at the end
  3. ** in the middle

How to avoid a sub tree that looks like "/foo/bar" anywhere?

If /foo/bar is at the root you can say using the relative path rule

/foo/bar/

# Or

/foo/bar

# Or

foo/bar

But what if you want to ignore the tree foo/bar wherever it shows up, as in the paths below?

/foo/bar
/dir1/dir2/foo/bar
/dir1/foo/bar

While still allowing a directory named bar that is not directly under foo

# allow the directory "bar" when it appears by itself,
# without "foo" directly above it

/bar
/dir1/dir2/bar
/dir1/bar

To do that you have to use the following spec in the .gitignore file

# Ignore the tree foo/bar anywhere in the repo
**/foo/bar

# The "**" must be at the beginning
# Note that there is no "/" in front of the "**"
Using "**" in the middle

The ** symbol can be used in the middle of a relative directory pattern to mean "any number of sub directories, including none, at any depth". See an example below

/dir1/**/foo/bar/

# This means: any number of sub directories (zero or more) under "dir1"

# So this ignores

/dir1/foo/bar
/dir1/a/foo/bar
/dir1/b/foo/bar
/dir1/a/b/foo/bar
/dir1/a/b/c/foo/bar

Using "**" at the end of a relative directory pattern

The ** symbol can be used at the end of a relative directory pattern to mean ignore all directories and files below this directory. Here is an example

/dir1/**

# Means: ignore everything under the relative /dir1 to any depth

So /** means ignore everything under the directory that holds the .gitignore file

/**

# means ignore everything underneath this directory

Summary of the really specialized rules involving "**"

  1. ** at the beginning of a line ignores the directory hierarchy that follows it anywhere in the repo. So "**/foo/bar" ignores "foo/bar" anywhere.
  2. ** at the end means ignore everything inside a directory, to any depth. So "/foo/**" has practically the same effect as "/foo/".
  3. ** in the middle like "/dir1/**/foo/bar" means any sub directory depth (including none) after "/dir1", followed by "/foo/bar"
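Points 2 and 3 of this summary can be checked with a sketch like the one below. The "cache" directory, the other paths, and the status_of helper are hypothetical names chosen for the example; only the "/dir1/**/foo/bar/" pattern comes from the text.

```shell
# Scratch repository; every file and directory name here is illustrative.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q

mkdir -p dir1/foo/bar dir1/a/b/foo/bar dir2/foo/bar cache/x other/cache
touch dir1/foo/bar/f.txt dir1/a/b/foo/bar/f.txt dir2/foo/bar/f.txt \
      cache/top.bin cache/x/y.bin other/cache/y.bin

# "**" in the middle of one pattern, "**" at the end of another.
printf '%s\n' '/dir1/**/foo/bar/' '/cache/**' > .gitignore

status_of() {
  # git check-ignore exits with 0 when the path is ignored, 1 when it is not.
  if git check-ignore -q "$1"; then echo ignored; else echo kept; fi
}

status_of dir1/foo/bar/f.txt      # ignored: "**" may stand for no directories
status_of dir1/a/b/foo/bar/f.txt  # ignored: "**" stands for a/b
status_of dir2/foo/bar/f.txt      # kept: the pattern starts at /dir1
status_of cache/top.bin           # ignored: directly inside /cache
status_of cache/x/y.bin           # ignored: deeper inside /cache
status_of other/cache/y.bin       # kept: /cache/** is tied to the root
```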

Now understanding similarities in *, */, /*, /**

The shortcut "anywhere" rules (no interrupting "/"), when used with "*", can achieve results very similar to "/**". The analysis below shows why that happens; it is a direct application of "*" and the "anywhere" rule.

# ******************************************
# Understand: * (any file or directory at any depth)
# Result: Hides all
# ******************************************
# Let's see what happens when a * is in the root .gitignore
# It seems to block all files and directories, including .gitignore
# Because its instruction is
#  "ignore anything matching this pattern anywhere, file or dir"

*

# ******************************************
# Understand: */ (any directory at any depth)
# Result: Only root files show
# ******************************************
# Because all directories are ignored, only root files show
# When a directory is ignored, all its children are ignored

*/

# ******************************************
# Understand: /** (actual directive to ignore everything below this directory)
# Result: Hides all as expected
# ******************************************
# Meaning: hide everything underneath this directory, to any depth
# Similar behavior as *

/**

# ******************************************
# Understand: /* (ignore immediate files or directories under the root)
# Result: Hides all again
# ******************************************
# 1. First level directories are hidden
# 2. First level files are hidden
# 3. Because of 1, all sub directories and their files are hidden as well
#
# Similar behavior as: *, /**
#
# *******************************************

/*
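To compare the four patterns side by side, the sketch below builds one scratch repo per pattern and asks Git about a root file and a deeply nested file. The try_pattern helper and the file names root.txt and d/e/f.txt are my own inventions for this comparison.

```shell
try_pattern() {
  # Fresh scratch repo whose .gitignore holds only the given pattern.
  repo=$(mktemp -d)
  cd "$repo" || return 1
  git init -q
  mkdir -p d/e
  touch root.txt d/e/f.txt
  printf '%s\n' "$1" > .gitignore
  for p in root.txt d/e/f.txt; do
    # git check-ignore exits with 0 when the path is ignored.
    if git check-ignore -q "$p"; then s=ignored; else s=kept; fi
    printf '%s %s %s\n' "$1" "$p" "$s"
  done
}

# Quote the patterns so the shell does not expand them as globs.
for pat in '*' '*/' '/**' '/*'; do
  try_pattern "$pat"
done
```

In this sketch only "*/" should report root.txt as kept; the other three patterns should report both paths as ignored, which matches the analysis above.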

Key References

As an exercise let's analyze a pyspark sample .gitignore

# Any sub directory named this way
# anywhere pattern. Trailing "/" for a directory
__pycache__/ 

# Ignore these file extensions anywhere: .pyc, .pyo, .pyd
# No special meaning to "."
# You could have guessed that [] means any one of those chars
*.py[cod]

# Any file with this pattern. $ has no special meaning
# Again an anywhere pattern (no "/" at the start or middle)
*$py.class

# File or dir anywhere that is named ".Python"
.Python

# Any sub directories named below
# anywhere pattern. No "/" start or middle
build/
develop-eggs/
dist/

# Any sub directory that ends in ".egg-info"
# again anywhere pattern
*.egg-info/

#Any file or sub directory named like this
.installed.cfg

#Any file or sub dir ending in ".egg"
*.egg

#Any file or subdirectory anywhere
MANIFEST

#relative directory /docs/_build/
docs/_build/

# any sub dir named target
target/

# Any file or sub dir: for Jupyter Notebook
# usually it is a sub dir
# I suspect placing a / at the end would work as well
.ipynb_checkpoints

# file or directory anywhere
.python-version
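A few lines of this sample can be checked the same way. The sketch below uses three of its patterns; the package layout (pkg, other, mod.pyx and so on) and the status_of helper are made up for illustration and are not part of the sample repo.

```shell
# Scratch repository; every file and directory name here is illustrative.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q

mkdir -p pkg/__pycache__ docs/_build other/docs/_build
touch pkg/mod.py pkg/mod.pyc pkg/mod.pyx pkg/__pycache__/mod.cpython-311.pyc \
      docs/_build/index.html other/docs/_build/index.html

printf '%s\n' '__pycache__/' '*.py[cod]' 'docs/_build/' > .gitignore

status_of() {
  # git check-ignore exits with 0 when the path is ignored, 1 when it is not.
  if git check-ignore -q "$1"; then echo ignored; else echo kept; fi
}

status_of pkg/mod.py                           # kept: no char after ".py"
status_of pkg/mod.pyc                          # ignored: "c" is in [cod]
status_of pkg/mod.pyx                          # kept: "x" is not in [cod]
status_of pkg/__pycache__/mod.cpython-311.pyc  # ignored
status_of docs/_build/index.html               # ignored: relative to the root
status_of other/docs/_build/index.html         # kept: "/" in the middle
```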

end of article


More articles by Satya Komatineni
