Open Source Cheat Sheet
Brian Chau
Partner (Intellectual Property, Electrical Engineering) at Norton Rose Fulbright Canada LLP, IAM Patent 1000, Legal 500
The below is provided as a general commentary to help de-mystify the process of reviewing open source licenses for third party code / data.
It should not be used as a substitute for reading the actual terms of the licenses and having a lawyer or other professional review any ambiguities, but I wanted to share the framework that I use, ideally before incorporating third party materials. Double-check which license is in use (download the package if you can and find the license) and don't rely on the listings on GitHub, Kaggle, etc., as they are sometimes incorrect, inconsistent, or unclear. In the past, we have had to reach out to contributors to get clarification.
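To double-check what a downloaded package actually declares, a short script can list the license-like files it ships and any SPDX tags in its file headers. This is a minimal sketch: the file-name prefixes and the SPDX-License-Identifier header convention it looks for are assumptions, it will miss licenses stated in other ways (e.g., only in a README), the "notice" prefix can produce false positives, and a hit is a prompt to read the actual terms, not a conclusion.

```python
import re
from pathlib import Path

# File-name prefixes that commonly hold license text (an assumption, not exhaustive).
LICENSE_PREFIXES = ("license", "licence", "copying", "notice")

# Matches tags such as "SPDX-License-Identifier: MIT OR Apache-2.0" in file headers.
SPDX_RE = re.compile(
    r"SPDX-License-Identifier:\s*([^\s*/]+(?:\s+(?:AND|OR|WITH)\s+[^\s*/]+)*)"
)

def find_license_files(package_root):
    """List files whose names suggest they contain license text."""
    return sorted(
        p for p in Path(package_root).rglob("*")
        if p.is_file() and p.name.lower().startswith(LICENSE_PREFIXES)
    )

def spdx_tags(package_root, max_bytes=4096):
    """Map each SPDX expression found in file headers to the files declaring it."""
    tags = {}
    for p in sorted(Path(package_root).rglob("*")):
        if not p.is_file():
            continue
        # Only the first few KB are read, since SPDX tags normally sit in the header.
        with p.open("rb") as fh:
            head = fh.read(max_bytes).decode("utf-8", errors="ignore")
        for match in SPDX_RE.finditer(head):
            tags.setdefault(match.group(1), []).append(p)
    return tags
```

Conflicting results (e.g., a LICENSE file saying one thing and per-file headers saying another) are exactly the kind of inconsistency worth raising with the contributors.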
Sometimes the creators also modify the licenses to add new sections. An important example is the Commons Clause, a rider added to the end of a license such as Apache 2.0 so that, essentially, you as a downstream user can't sell the software. Sometimes creators also add further disclaimers (often seen with open health data). Note that despite clauses in some licenses stating that they cannot be modified, creators may try to modify them anyway, leading to conflicting sections (see the ongoing debate around the Neo4j Sweden Software License - AGPL-3.0 + Commons Clause).
Open source is a fantastic way of disseminating knowledge and captures the idealistic spirit from the early days of computing (even pre-dating the Internet). Essentially, it provides a way for functional "building blocks" to be collaboratively developed in the community, with benefits distributed amongst the broader public (often but not necessarily for free). I think that this is very admirable and I am both an appreciative beneficiary and supporter of the community.
In modern code development, the use of open source is ubiquitous, as open source packages, especially well-maintained ones, can provide more stable and more secure functionality than would otherwise be possible with in-house resources. Open source products allow a company to focus R&D efforts on its core functionality, and even small companies can thus leverage open source to create enterprise-level suites of professional-looking products.
Because the source code or raw data is made available, you can incorporate the functionality into your product, modify it, extend it, add functionality ("remix it"). For example, let's say there is open source software that does function X, but you want to modify it to do X' or you want it to work in C#. You can modify the code to make it work as you see fit.
However, there may be obligations to make your products open source or available too, and if inadvertently triggered, these can open up your company to critical risks. Strong copyleft licenses, in particular, have a reputation for an expansive scope that needs to be carefully considered before use. These types of provisions are used to create a "share alike" obligation where the creator wants you to make your improvements / projects similarly available. This is not acceptable for many commercial companies, so they need to know how to avoid these obligations.
The below cheat sheet is provided to help sort through license considerations. While not covered here, it is important to also consider the risks involved in using third party materials (code with questionable IP ownership, ransomware, malware, unaddressed bugs, zero day exploits - this can be a very serious issue due to the extreme popularity of certain open source software platforms). We have a strong privacy / cyber team at NRF that can help with breach issues.
When you are reading the actual license, note that the operative phrases are not always in the most logical sections from a readability perspective. For example, the "definitions" section of many agreements does much of the heavy lifting, and there is a large amount of cross-referencing between sections, which is sometimes circular. The FAQs published for many licenses are very useful.
If you do not like the obligations associated with the materials (e.g., too onerous, don't want to make source available), they are not set in stone. In many cases, we simply contacted the owner / creator of the materials and obtained a commercial license (even though it wasn't originally a listed option).
A major issue can arise where great value is placed on keeping a piece of proprietary code secret, but that code is actually built on open source code with "make available" obligations that have not been satisfied.
1. Open Source Obligations
The main types of obligations are usually: (a) acknowledgement / notice requirements, (b) disclaimers, (c) allowing others to make derivative works based on your source code / modifications, and (d) making available your source code / modifications.
The (a) acknowledgement and (d) making available obligations are typically the more challenging obligations to adhere to.
For corporate entities that do not wish to have their proprietary code released, it is important to avoid inadvertently having making available obligations attach to that code. As noted below, this is sometimes possible by segregating the code / libraries / files, but sometimes it is not. In my opinion, this has been the most pertinent issue in practice.
Open source licenses are found on third party code (e.g., the Linux Kernel) and data sets (e.g., CIFAR-10), but can be used flexibly for various other types of materials (open source board games!!).
The core idea is that the third party made their work available subject to certain obligations for downstream users / modifiers.
As noted below, sometimes open source licenses are used for types of materials other than that which they were intended for (e.g., creative commons licenses for code, or GPL licenses for data sets).
2. Triggering Language
The threshold question is whether the open source obligations are triggered.
The specific triggering language changes from agreement to agreement, but most of them are triggered upon a distribution / conveyance / modification.
The most common trigger occurs when you incorporate the code / package into a product or code that you are making available or giving to the public.
Making software available via SaaS or SaaSS may or may not be a trigger, depending on the agreement (it is not for GPL-3.0 [explicit carve-out], but it is for AGPL [explicit inclusion], SSPL, etc.).
Generally speaking, merely using open source as an executable does not trigger source code disclosure requirements. For example – using VLC to play videos, using GNU Octave to draw graphs, etc.
Generally speaking, merely using open source internally and not for customer distribution does not trigger source code disclosure requirements. For example – using open source for non-production uses.
The FAQ pages for the major licenses provide very useful guidance.
*Be wary of compilers, as they can insert their own code into the outputs. See the exceptions for GCC, Bison, etc.
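The threshold question can be captured as a rough first-pass screen for a component inventory. This is a hypothetical sketch and deliberately conservative: the license sets only cover a few SPDX identifiers mentioned in this article, the use categories simplify the triggers described above, and anything unrecognized is routed to "review" rather than cleared. It does not model compiler-inserted code, modified licenses, or the GPL-2.0 SaaS ambiguities (those simply fall through to review).

```python
from enum import Enum

class Use(Enum):
    INTERNAL_ONLY = "internal"        # non-production / internal use, no customer distribution
    RUN_AS_EXECUTABLE = "executable"  # e.g., using VLC to play videos
    NETWORK_SERVICE = "saas"          # users interact over a network; no copy is transferred
    DISTRIBUTED = "distributed"       # code shipped in a product given to customers / the public

# Illustrative SPDX identifiers only; not an exhaustive classification.
PERMISSIVE = {"MIT", "MIT-0", "0BSD", "Apache-2.0", "ISC", "BSD-2-Clause", "BSD-3-Clause"}
# Licenses treated here as not triggered by SaaS alone (GPL-3.0 has an explicit carve-out).
NETWORK_CARVE_OUT = PERMISSIVE | {"GPL-3.0-only", "GPL-3.0-or-later"}

def disclosure_screen(spdx_id: str, use: Use) -> str:
    """Return 'generally not triggered', 'notices only', or 'review'."""
    if use in (Use.INTERNAL_ONLY, Use.RUN_AS_EXECUTABLE):
        # Generally no source-disclosure trigger; still confirm against the license FAQ.
        return "generally not triggered"
    if use is Use.NETWORK_SERVICE:
        # AGPL, SSPL, GPL-2.0 and any unknown license fall through to review.
        return "generally not triggered" if spdx_id in NETWORK_CARVE_OUT else "review"
    # Distribution: permissive licenses still carry notice duties; everything else needs review.
    return "notices only" if spdx_id in PERMISSIVE else "review"
```

Defaulting unknown licenses to "review" is the design choice that matters: a bespoke or modified license should never be silently cleared by a lookup table.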
3. Segregation, and the Difference between Dynamic and Static Linking
Consider carefully how you plan to incorporate third party code.
Some licenses have different requirements depending on whether their package is dynamically or statically linked (e.g., LGPL-2.0). Sometimes segregation can be used to avoid the “making available” obligation (see the Classpath Exception). MPL-2.0 allows file-level segregation.
The static linking requirements are usually much more onerous and dynamic linking is preferable (but might impact run-time performance). These obligations are often covered in a section entitled "linking", or in definitions of the term "combined work", etc. It's not always in the same section, so you might have to jump back and forth to get the big picture.
4. Complying with Obligations
You typically put the acknowledgements and notices in a conspicuous location, such as on a specific webpage, notices within an app, splash pages, source code READMEs and code headers, etc.[1]
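A notices page can be assembled from an inventory of components. A minimal sketch, assuming you already track each component's name, version, SPDX identifier, copyright line and license text (the Component fields and render_notices are hypothetical names); it only formats text and does not check that a given license's specific notice requirements (e.g., reproducing an Apache-2.0 NOTICE file) are met.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Component:
    name: str
    version: str
    spdx_id: str
    copyright: str
    license_text: str

def render_notices(components):
    """Build a plain-text third-party notices document, one section per component."""
    sections = []
    # Sort case-insensitively so the page is stable and easy to scan.
    for comp in sorted(components, key=lambda comp: comp.name.lower()):
        sections.append("\n".join([
            f"{comp.name} {comp.version}",
            f"License: {comp.spdx_id}",
            comp.copyright,
            "",
            comp.license_text.strip(),
        ]))
    separator = "\n\n" + "-" * 72 + "\n\n"
    return "THIRD-PARTY SOFTWARE NOTICES\n\n" + separator.join(sections) + "\n"
```

The same output can be placed on a notices webpage, in an in-app screen, or in a README, whichever location the license treats as conspicuous.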
The obligation to make source code available can vary depending on the specific license, as certain licenses have very specific ways of making the code available (e.g., a written offer is sometimes sufficient). These obligations can be outdated because they were drafted in a previous era (e.g., mailing a physical copy).
For compliance, some companies have a special open source webpage where you can download their code, others make their repositories publicly available and give a link to the repository.
5. Enforcement / Violation
Because most cases settle, there isn’t much court guidance on enforcement. Typically, either the rights owner or a rights foundation in combination with the rights owner reaches out and informs you of a violation (the most common is the making available obligation).
Depending on the negotiations and the license terms, sometimes positive steps can be used to cure the violation.
Some licenses have explicit curative provisions. Practically speaking, this is very important because it means the license explicitly provides that you might be able to cure the violation.
The big risk with a violation is that, most of the time, the license is terminated as a result, which exposes you or your organization to a copyright infringement case. Without a curative provision, this risk is much more imminent.
Practical steps for addressing a violation might include paying for a license, taking requested steps, or in some cases, conducting a rip and replace.
An open question is whether an unrelated third party is able to enforce a making available right (e.g., A improves package B, which has a making available obligation. C wants to use A’s improved code. Can C ask A to disclose without naming B in the lawsuit?).
6. Common Lingo to be Aware of
Permissive license – licenses with few obligations that are relatively easy to comply with (e.g., MIT, Apache-2.0, most BSD variations).
Permissive license, no attribution – licenses with no attribution obligations whatsoever (e.g., 0BSD, MIT-0).
Creative Commons license – CC-**-** licenses. Very common licenses for things such as data sets and photographs. The letters indicate the obligations (e.g., BY for attribution, NC for non-commercial). SA (share-alike) is the most important one to watch out for.
Copyleft (weak) – licenses with limited "making available" obligations that are designed to be avoidable, for example, by not modifying the third party code directly and by segregating code (e.g., MPL-2.0 – file level segregation, LGPL – library level segregation, EPL – avoid modifications).
Copyleft (strong) – licenses that have strong "making available" obligations that are difficult to avoid (e.g., GPL-3.0, SSPL).
Dual / multi license – products that have a number of license terms to choose from. For example – Ghostscript can either be under AGPL or a commercial license (without a making available obligation). Some packages allow different flavors of open source licenses. The copyright owner has full discretion as to which licenses they release under and sometimes releases under different licenses for downstream compatibility reasons.
Optional parts or optimizations – note that sometimes certain open source projects include different components under different licenses.
License compatibility / incompatibility – certain licenses impose limitations on how downstream projects can be licensed (e.g., if you make your modifications available, it must be under certain license terms only). For example, you can’t modify a GPL-3.0 package and then release your modified version under MIT.
License modification – certain third party licenses are modified versions of existing standard licenses (e.g., Commons Clause, Classpath Exception).
TPM / Tivo-ization – using certain technical protection measures to effectively restrict downstream modification / uses. Controversial.
Bespoke / non-standard license – custom licenses that are sometimes unprofessionally drafted. Some of these are funny, but they can introduce unintentional ambiguity (e.g., Do no evil, Beerware).
7. Common Red Flags and Issues
Incorporating / modifying any copyleft (strong) license.
Incorporating / modifying any Linux Kernel or BusyBox packages - this is where a large number of complaints / violations have arisen.
Using outdated or poorly maintained packages (security / interoperability risk).
Incorporating code from one license and releasing in another license (inadvertent license incompatibility issues).
Remember that despite your best efforts, patent / copyright issues can still arise (e.g., Netfilter contributor issues, Hadoop-related patent litigation, LAION-5B issues).
Be careful with SaaS. Remember that some licenses, such as GPL-2.0, were drafted long ago, and unfortunately there are unresolved ambiguities.
8. Common Risk Matrix
Please do not use the below as-is; remember to adapt it to your specific situation and context. I have provided it as a starting point for your analysis.
Green; Low Risk – Permissive license / Permissive license, no attribution – remember to make the required notices (if any) and that you’re taking the usual disclaimed risks (e.g., MIT, Apache 2.0, ISC, BSD-2, BSD-3).
Yellow; Medium Risk – Copyleft (weak) – usually licenses that have certain making available obligations that can be avoided in certain situations (e.g., MPL-2.0, EPL-2.0, CDDL).
Red; High Risk – Copyleft (strong), and LGPL – usually licenses with significant restrictions – need to be very careful on how it is integrated, and in many situations, you may be required to make your proprietary software available under the same or a similar license, and a competitor or the public might be able to use your proprietary software royalty-free. LGPL is included even though it has an exception because compliance is complicated and has many pitfalls.
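The traffic-light matrix can be encoded as a lookup for a first-pass triage of a dependency list. A sketch only: the SPDX identifiers are my mapping of the examples in the matrix (not an exhaustive list), compound expressions such as "MIT OR Apache-2.0" are not parsed, and any unlisted, bespoke or modified license (e.g., one carrying a Commons Clause rider) defaults to red so that a person looks at it.

```python
from enum import Enum

class Risk(Enum):
    GREEN = "low"
    YELLOW = "medium"
    RED = "high"

# Starting-point tiers based on the examples in the matrix; adapt to your context.
TIERS = {
    Risk.GREEN: {"MIT", "MIT-0", "0BSD", "Apache-2.0", "ISC", "BSD-2-Clause", "BSD-3-Clause"},
    Risk.YELLOW: {"MPL-2.0", "EPL-2.0", "CDDL-1.0"},
    Risk.RED: {"GPL-2.0-only", "GPL-3.0-only", "AGPL-3.0-only", "SSPL-1.0",
               "LGPL-2.1-only", "LGPL-3.0-only"},
}
ORDER = [Risk.GREEN, Risk.YELLOW, Risk.RED]

def classify(spdx_id):
    """Return the tier for a single SPDX identifier."""
    for risk, ids in TIERS.items():
        if spdx_id in ids:
            return risk
    # Unknown, bespoke, or modified licenses are escalated rather than cleared.
    return Risk.RED

def overall(spdx_ids):
    """The highest tier across all dependencies drives the product-level rating."""
    return max((classify(i) for i in spdx_ids), key=ORDER.index, default=Risk.GREEN)
```

Taking the highest tier across all dependencies mirrors the point that a single strong copyleft component can drive the analysis for the whole product.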
Dataset Specific Commentary
Popular data set licenses include licenses written specifically for data sets. The most common of these are the Open Data Commons licenses (e.g., ODbL) and the Community Data License Agreement (CDLA). There are different flavours of CDLA, similar to Creative Commons licenses, and these licenses include useful database-specific language, which could provide more clarity when they are enforced.
Some of the licenses have "copyleft" / share-alike type provisions, and these need to be assessed for suitability. For example, if you plan to make any additions, transformations, changes, etc., you may have to share your updated dataset.
CDLA-Sharing-1.0, for example, has a data set specific section stating that the terms do not impose obligations or restrictions on results from users' "computational use" of the data. See CDLA-Sharing-1.0 at Definitions 1.2, 1.11, 1.13, and most importantly, Section 3.5. ODbL is also a copyleft license that has a share-alike requirement.
[1] Examples: https://webostv.developer.lge.com/assets/netcast/NetCast-OpenSourceSWNotice.pdf, https://opensource.lge.com/project.
License List and Common Identifiers
Additional Reading
https://heathermeeker.com/open-source-for-business/ - an excellent (and free) resource. Exceptionally well written and clear.
An example chart is provided below (sorry LinkedIn doesn't support tables natively).
Commons Clause
Beerware
/*
* ----------------------------------------------------------------------------
* "THE BEER-WARE LICENSE" (Revision 42):
 * <phk@FreeBSD.ORG> wrote this file. As long as you retain this notice you
* can do whatever you want with this stuff. If we meet some day, and you think
* this stuff is worth it, you can buy me a beer in return. Poul-Henning Kamp
* ----------------------------------------------------------------------------
*/
Synopsys courses are also very informative. https://synopsys.skilljar.com/page/all-courses-black-duck
GPL FAQ - I refer to this all the time - unlike the GPL itself, it is easy to read. https://www.gnu.org/licenses/gpl-faq.en.html
Example open source software - the Stockfish chess engine (GPL-3.0)
Stockfish is free and distributed under the GNU General Public License version 3 (GPL v3). Essentially, this means you are free to do almost exactly what you want with the program, including distributing it among your friends, making it available for download from your website, selling it (either by itself or as part of some bigger software package), or using it as the starting point for a software project of your own.
The only real limitation is that whenever you distribute Stockfish in some way, you MUST always include the license and the full source code (or a pointer to where the source code can be found) to generate the exact binary you are distributing. If you make any changes to the source code, these changes must also be made available under GPL v3.