Adds support to open local repositories and to use file-based object storage (#55)

* remove some comments

* idx writer/reader

* Shut up ssh tests, they are annoying

* Add file scheme test to clients

* Add dummy file client

* Add test for file client

* Make tests use fixture endpoint

* add parser for packed-refs format

* add parser for packed-refs format

* WIP adding dir.Refs() tests

* Add test for fixture refs

* refs parser for the refs directory

* Documentation

* Add Capabilities to file client

* tgz.Extract now accepts a path instead of a Reader

* fix bug in idxfile fanout calculation

* remove dead code

* packfile documentation

* clean packfile parser code

* add core.Object.Content() and returns errors for core.ObjectStorage.Iter()

* add seekable storage

* add dir repos to NewRepository

* clean prints

* Add dir client documentation to README

* Organize the README

* README

* Clean tgz package

* Clean temp dirs after tgz tests

* Gometalinter on gitdir

* Clean pattern function

* metalinter tgz

* metalinter gitdir

* gitdir coverage and remove seekable packfile filedescriptor leak

* gitdir Idxfile tests and remove file descriptor leak

* gitdir Idxfile tests when no idx is found

* clean storage/seekable/internal/index and some formats/idxfile API issues

* clean storage/seekable

* clean formats/idx

* turn packfile/doc.go into packfile/doc.txt

* move formats/packfile/reader to decoder

* fix packfile decoder error names

* improve documentation

* comment packfile decoder errors

* comment public API (format/packfile)

* remove duplicated code in packfile decoder test

* move tracking_reader into an internal package and clean it

* use iota for packfile format

* rename packfile parse.go to packfile object_at.go

* clean packfile deltas

* fix delta header size bug

* improve delta documentation

* clean packfile deltas

* clean packfiles deltas

* clean repository.go

* Remove go 1.5 from Travis CI

Because go 1.5 does not support internal packages.

* change local repo scheme to local://

* change "local://" to "file://" as the local scheme

* fix broken indentation

* shortens names of variables in short scopes

* more shortening of variable names

* more shortening of variable names

* Rename git dir client to "file", as the scheme used for it

* Fix file format ctor name, now that the package name has changed

* Shortcut local repo constructor to not use remotes

The object storage is built directly in the repository ctor, instead
of creating a remote and waiting for the user to pull it.

* update README and fix some errors in it

* remove file scheme client

* Local repositories now have a new ctor

That is, they are no longer identified by the scheme of the URL, but are
created differently from inception.

* remove unused URL field from Repository

* move all git dir logic to seekable storage ctor

* fix documentation

* Make formats/file/dir an internal package to storage/seekable

* change package storage/seekable to storage/fs

* clean storage/fs

* overall storage/fs clean

* more cleaning

* some metalinter fixes

* upgrade cshared to last changes

* remove dead code

* fix test error info

* remove file scheme check from clients

* fix test error message

* fix test error message

* fix error messages

* style changes

* fix comments everywhere

* style changes

* style changes

* scaffolding and tests for local packfiles without idx files

* outsource index building from packfile to the packfile decoder

* refactor packfile header reading into a new function

* move code to generate index from packfile back to index package

* add header parsing

* fix documentation errata

* add undeltified and OFS delta support for index building from the packfile

* add tests for packfile with ref-deltas

* support for packfiles with ref-deltas and no idx

* refactor packfile format parser to reuse code

* refactor packfile format parser to reuse code

* refactor packfile format parser to reuse code

* refactor packfile format parser to reuse code

* refactor packfile format parser to reuse code

* WIP refactor packfile format parser to reuse code

* refactor packfile format parser to reuse code

* remove prints from tests

* remove prints from tests

* refactor packfile.core into packfile.parser

* rename packfile reader to something that shows it is a recaller

* rename cannot recall error

* rename packfile.Reader to packfile.ReadRecaller and document

* speed up test by using StreamReader instead of SeekableReader when possible

* clean packfile StreamReader

* stream_reader tests

* refactor packfile.StreamReader into packfile.StreamReadRecaller

* refactor packfile.SeekableReader into packfile.SeekableReadRecaller and document it

* generalize packfile.StreamReadRecaller test to all packfile.ReadRecaller implementations

* speed up storage/fs tests

* speed up tests in . by loading packfiles in memory

* speed up repository tests by using a smaller fixture

* restore doc.go files

* rename packfile.ReadRecaller implementations to shorter names

* update comments to type changes

* packfile.Parser test (WIP)

* packfile.Parser tests and add ForgetAll() to packfile.ReadRecaller

* add test for packfile.ReadRecaller.ForgetAll()

* clarify seekable being able to recallByOffset forgotten objects

* use better names for internal maps

* metalinter packfile package

* speed up some tests

* documentation fixes

* change storage.fs package name to storage.proxy to avoid confusion with new filesystem support

* New fs package and os transparent implementation

Now NewRepositoryFromFS receives a fs and a path and tests are
modified accordingly, but it is still not used for anything.

* add fs to gitdir and proxy.store

* reduce fs interface for easier implementation

* remove garbage dirs from tgz tests

* change file name gitdir/dir.go to gitdir/gitdir.go

* fs.OS tests

* metalinter utils/fs

* add NewRepositoryFromFS documentation to README

* Readability fixes to README

* move tgz to an external dependency

* move filesystem impl. example to example dir

* rename proxy/store.go to proxy/storage.go for coherence with memory/storage.go

* rename proxy package to seekable
58 files changed
tree: c0e7eb355c9b8633d99bab9295cb72b6c3a9c0e1
  1. clients/
  2. core/
  3. cshared/
  4. diff/
  5. examples/
  6. formats/
  7. storage/
  8. utils/
  9. .travis.yml
  10. blame.go
  11. blame_test.go
  12. commit.go
  13. commit_test.go
  14. common.go
  15. common_test.go
  16. doc.go
  17. file.go
  18. file_test.go
  19. LICENSE
  20. Makefile
  21. objects.go
  22. objects_test.go
  23. README.md
  24. references.go
  25. references_test.go
  26. remote.go
  27. remote_test.go
  28. repository.go
  29. repository_test.go
  30. tag.go
  31. tag_test.go
  32. tree.go
  33. tree_test.go
  34. tree_walker.go
  35. tree_walker_test.go
README.md

go-git

A low level and highly extensible git client library for reading repositories from git servers. It is written in Go from scratch, without any C dependencies.

We have been following the open/closed principle in its design to facilitate extensions.

go-git does not claim to be a replacement for git2go, as its approach and functionality are quite different.

ok, but why? ...

At source{d} we analyze almost all the public open source contributions made to git repositories in the world.

We want to extract detailed information from each GitHub repository, which requires downloading repository packfiles and analyzing them: extracting their code, authors, dates and the languages and ecosystems they use. We are also interested in knowing who contributes to what, so we can tell top contributors from the more casual ones.

You can obtain all this information using the standard git command running over a local clone of a repository, but this simple solution does not scale well over millions of repositories: we want to avoid having local copies of the unpacked repositories in a regular file system; go-git allows us to work with an in-memory representation of repositories instead.

I see... but is this production ready?

Yes! We have been using go-git at source{d} since August 2015 to analyze all public GitHub repositories (i.e. 16M repositories).

Coming Soon

Blame support: right now we are using a forward version of a line-tracking algorithm and we are having some problems handling merges. The plan is to get merges right and change to a backward line-tracking algorithm soon.

Installation

The recommended way to install go-git is:

go get -u gopkg.in/src-d/go-git.v3/...

Examples

Retrieving the commits for a given repository:

r, err := git.NewRepository("https://github.com/src-d/go-git", nil)
if err != nil {
	panic(err)
}

if err := r.PullDefault(); err != nil {
	panic(err)
}

iter, err := r.Commits()
if err != nil {
	panic(err)
}
defer iter.Close()

for {
	// the commits are not sorted in any particular order
	commit, err := iter.Next()
	if err != nil {
		if err == io.EOF {
			break
		}

		panic(err)
	}

	fmt.Println(commit)
}

Outputs:

commit 2275fa7d0c75d20103f90b0e1616937d5a9fc5e6
Author: Máximo Cuadros <mcuadros@gmail.com>
Date:   2015-10-23 00:44:33 +0200 +0200

commit 35b585759cbf29f8ec428ef89da20705d59f99ec
Author: Carlos Cobo <toqueteos@gmail.com>
Date:   2015-05-20 15:21:37 +0200 +0200

commit 7e3259c191a9de23d88b6077dcb1cd427e925432
Author: Alberto Cortés <alberto@sourced.tech>
Date:   2016-01-21 03:29:57 +0100 +0100

commit 24b8ae50db91f3909b11304014564bffc6fdee79
Author: Alberto Cortés <alberto@sourced.tech>
Date:   2015-12-11 17:57:10 +0100 +0100
...

Retrieving the latest commit for a given repository:

r, err := git.NewRepository("https://github.com/src-d/go-git", nil)
if err != nil {
	panic(err)
}

if err := r.PullDefault(); err != nil {
	panic(err)
}

hash, err := r.Remotes[git.DefaultRemoteName].Head()
if err != nil {
	panic(err)
}

commit, err := r.Commit(hash)
if err != nil {
	panic(err)
}

fmt.Println(commit)

Creating a repository from an ordinary local git directory (that has been previously prepared by running git gc on it):

// Download any git repository and prepare it as follows:
//
//   $ git clone https://github.com/src-d/go-git /tmp/go-git
//   $ pushd /tmp/go-git ; git gc ; popd
//
// Then, create a go-git repository from the local content
// and print its commits as follows:

package main

import (
	"fmt"
	"io"

	"gopkg.in/src-d/go-git.v3"
	"gopkg.in/src-d/go-git.v3/utils/fs"
)

func main() {
	fs := fs.NewOS() // a simple proxy for the local host filesystem
	path := "/tmp/go-git/.git"

	repo, err := git.NewRepositoryFromFS(fs, path)
	if err != nil {
		panic(err)
	}

	iter, err := repo.Commits()
	if err != nil {
		panic(err)
	}
	defer iter.Close()

	for {
		commit, err := iter.Next()
		if err != nil {
			if err == io.EOF {
				break
			}
			panic(err)
		}

		fmt.Println(commit)
	}
}

Implementing your own filesystem will let you access repositories stored on remote services (e.g. Amazon S3). See the examples directory for a simple filesystem implementation and its usage.

Wrapping

go-git can be wrapped into any language which supports shared library interop. A Python wrapper already exists. This is provided by the “cshared” cgo files, which can be built with go build -o libgogit.so -buildmode=c-shared github.com/src-d/go-git/cshared.

Acknowledgements

The earlier versions of the packfile reader are based on git-chain, a project by @yrashk.

License

MIT, see LICENSE