Make the tool usable without modification.

1. When trying to generate licenses.db from a fresh code checkout, the
   embed directive "//go:embed *.db *.txt" fails because there are no files
   matching *.db. As a workaround, add an empty file "empty.db" so
   there is a match.

2. licenses.db should not be checked in, so add it to a new .gitignore
   file.

3. Reorganize README.md with clearer and more prominent instructions on
   how to use the tool.
3 files changed
tree: 73845159b0e95aff1e414b32991911f6d6e640a1
  1. commentparser/
  2. internal/
  3. licenses/
  4. serializer/
  5. stringclassifier/
  6. tools/
  7. v2/
  8. .gitignore
  9. .travis.yml
  10. CHANGELOG
  11. classifier.go
  12. classifier_test.go
  13. CONTRIBUTING.md
  14. file_system_resources.go
  15. forbidden.go
  16. go.mod
  17. go.sum
  18. LICENSE
  19. license_type.go
  20. README.md
README.md

License Classifier

Introduction

The license classifier is a library and set of tools that can analyze text to determine what type of license it contains. It searches for license texts in a file and compares them to an archive of known licenses. These files could be, e.g., LICENSE files containing one or more licenses, or source code files with the license text in a comment.

A “confidence level” is associated with each result, indicating how close the match was. A confidence level of 1.0 indicates an exact match, while a confidence level of 0.0 indicates that no license matched the text.
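
To make the two endpoints of the scale concrete, here is a minimal sketch of one common way to produce a score in that range: a normalized edit distance. This README does not say how the classifier computes its score, so the technique and the names levenshtein and confidence are assumptions made for illustration, not the library's API.

```go
package main

import "fmt"

// min3 returns the smallest of three ints.
func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// levenshtein returns the rune-wise edit distance between a and b.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	curr := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ra); i++ {
		curr[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev, curr = curr, prev
	}
	return prev[len(rb)]
}

// confidence maps the edit distance onto [0, 1]:
// 1.0 for identical texts, 0.0 when every position differs.
// Hypothetical helper; the real scoring may differ.
func confidence(known, unknown string) float64 {
	n := len([]rune(known))
	if m := len([]rune(unknown)); m > n {
		n = m
	}
	if n == 0 {
		return 1.0
	}
	return 1.0 - float64(levenshtein(known, unknown))/float64(n)
}

func main() {
	fmt.Println(confidence("MIT License", "MIT License"))
	fmt.Println(confidence("abc", "xyz"))
}
```

With a score like this, a caller would typically accept only matches above a chosen threshold rather than require an exact 1.0.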

Usage

One-time setup

Use the license_serializer tool to regenerate the licenses.db archive. The archive contains preprocessed license texts for quicker comparisons against unknown texts.

$ go run tools/license_serializer/license_serializer.go -output licenses

Identifying licenses

Use the identify_license command-line tool to identify the license(s) within a file.

$ go run tools/identify_license/identify_license.go /path/to/LICENSE
LICENSE: GPL-2.0 (confidence: 1, offset: 0, extent: 14794)
LICENSE: LGPL-2.1 (confidence: 1, offset: 18366, extent: 23829)
LICENSE: MIT (confidence: 1, offset: 17255, extent: 1059)
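
If the results need to be consumed by another program, the output lines above can be parsed with a small helper. The sketch below assumes every result line follows exactly the format shown in the sample output; the match type and parseLine function are hypothetical names, not part of the tool.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// match holds the fields of one identify_license result line.
// Hypothetical type, defined here only for illustration.
type match struct {
	File       string
	License    string
	Confidence float64
	Offset     int
	Extent     int
}

// lineRE mirrors the sample output format:
//   FILE: LICENSE-NAME (confidence: C, offset: O, extent: E)
var lineRE = regexp.MustCompile(
	`^(.+): (\S+) \(confidence: ([0-9.eE+-]+), offset: (\d+), extent: (\d+)\)$`)

// parseLine converts one output line into a match.
func parseLine(line string) (match, error) {
	m := lineRE.FindStringSubmatch(line)
	if m == nil {
		return match{}, fmt.Errorf("unrecognized line: %q", line)
	}
	conf, err := strconv.ParseFloat(m[3], 64)
	if err != nil {
		return match{}, err
	}
	offset, err := strconv.Atoi(m[4])
	if err != nil {
		return match{}, err
	}
	extent, err := strconv.Atoi(m[5])
	if err != nil {
		return match{}, err
	}
	return match{File: m[1], License: m[2], Confidence: conf, Offset: offset, Extent: extent}, nil
}

func main() {
	lines := []string{
		"LICENSE: GPL-2.0 (confidence: 1, offset: 0, extent: 14794)",
		"LICENSE: MIT (confidence: 1, offset: 17255, extent: 1059)",
	}
	for _, l := range lines {
		m, err := parseLine(l)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%+v\n", m)
	}
}
```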

Adding a new license

Adding a new license is straightforward:

  1. Create a file in licenses/.

    • The filename should be the name of the license or its abbreviation. If the license is an Open Source license, use the appropriate identifier specified at https://spdx.org/licenses/.
    • If the file contains the “header” version of the license, append the suffix “.header” to the filename. See licenses/README.md for more details.
  2. Add the license name to the list in license_type.go.

  3. Regenerate the licenses.db file by running the license serializer:

    $ license_serializer -output licenseclassifier/licenses
    
  4. Create and run appropriate tests to verify that the license is indeed present.


This is not an official Google product (experimental or otherwise), it is just code that happens to be owned by Google.