CommitObjects is slow compared to equivalent git rev-list --all #1294
Description
I am trying to write a custom git grep wrapper using go-git.
Essentially, I am trying to replicate:
git rev-list --all | xargs git --no-pager grep -i 'search_text'
CommitObjects() is slow compared to the command git rev-list --all.
To benchmark it, I used a big repo (https://github.com/odoo/odoo/) with a large number of commits.
I understand there will be some overhead from creating the custom objects that support various operations, but the current implementation of CommitObjects is 16 times slower than the raw command.
The strange thing I originally noticed was that the Go implementation would freeze for a few seconds after reaching the following commit, 004a0b996ff8f269451e07346f71a129a1f3fbaf, and then list the remaining ~18-20 commits.
main.go
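package main

import (
	"fmt"
	"os"

	"gopkg.in/src-d/go-git.v4"
	"gopkg.in/src-d/go-git.v4/plumbing/object"
)

func main() {
	// Open the local clone of the odoo repository.
	r, err := git.PlainOpen("odoo")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}

	// Get an iterator over the commit objects in the repository.
	commits, err := r.CommitObjects()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}

	// Print every commit hash, one per line.
	err = commits.ForEach(func(c *object.Commit) error {
		fmt.Println(c.Hash)
		return nil
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}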
# go-git wrapper
./main 16.24s user 11.88s system 103% cpu 27.224 total
# raw command
git rev-list --all 1.67s user 0.32s system 81% cpu 2.456 total
I also used hyperfine (https://github.com/sharkdp/hyperfine) to run a more rigorous benchmark than the time command, and the result is similar.
hyperfine --min-runs 5 './main' 'git rev-list --all'
Benchmark #1: ./main
Time (mean ± σ): 28.729 s ± 2.729 s [User: 15.574 s, System: 12.378 s]
Range (min … max): 25.745 s … 32.868 s 5 runs
Benchmark #2: git rev-list --all
Time (mean ± σ): 1.413 s ± 0.163 s [User: 1.174 s, System: 0.171 s]
Range (min … max): 1.331 s … 1.704 s 5 runs
Warning: The first benchmarking run for this command was significantly slower than the rest (1.704 s). This could be caused by (filesystem) caches that were not filled until after the first run. You should consider using the '--warmup' option to fill those caches before the actual benchmark. Alternatively, use the '--prepare' option to clear the caches before each timing run.
Summary
'git rev-list --all' ran
20.33 ± 3.04 times faster than './main'
Profiling code
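package main

import (
	"fmt"
	"os"

	"github.com/pkg/profile"
	"gopkg.in/src-d/go-git.v4"
	"gopkg.in/src-d/go-git.v4/plumbing/object"
)

func main() {
	// Start CPU profiling; the deferred Stop writes the profile when main returns.
	defer profile.Start().Stop()

	// Open the local clone of the odoo repository.
	r, err := git.PlainOpen("odoo")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}

	// Get an iterator over the commit objects in the repository.
	commits, err := r.CommitObjects()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}

	// Print every commit hash, one per line.
	err = commits.ForEach(func(c *object.Commit) error {
		fmt.Println(c.Hash)
		return nil
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}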
Profile output
Am I missing something?
Is there a more performant way of iterating commits?
P.S. The benchmark was performed on a 2017 MBP:
Model Name: MacBook Pro
Model Identifier: MacBookPro14,1
Processor Name: Dual-Core Intel Core i5
Processor Speed: 2.3 GHz
Number of Processors: 1
Total Number of Cores: 2
L2 Cache (per Core): 256 KB
L3 Cache: 4 MB
Hyper-Threading Technology: Enabled
Memory: 8 GB