Hi, I am currently trying to add a fuzz target to the test suite. I wrote the following test (based on TestInfStream)...
package tokenizer

import (
	"bytes"
	"testing"
)

func FuzzStream(f *testing.F) {
	testcases := []string{
		`{id: 1, key: "object number 1"}`,
		"hello\n \n\tworld",
	}
	for _, tc := range testcases {
		f.Add(tc) // Use f.Add to provide a seed corpus
	}
	f.Fuzz(func(t *testing.T, orig string) {
		origBytes := []byte(orig)
		buffer := bytes.NewBuffer(origBytes)
		tokenizer := New()
		commaKey := TokenKey(10)
		colonKey := TokenKey(11)
		openKey := TokenKey(12)
		closeKey := TokenKey(13)
		dquoteKey := TokenKey(14)
		tokenizer.DefineTokens(commaKey, []string{","})
		tokenizer.DefineTokens(colonKey, []string{":"})
		tokenizer.DefineTokens(openKey, []string{"{"})
		tokenizer.DefineTokens(closeKey, []string{"}"})
		tokenizer.DefineStringToken(dquoteKey, `"`, `"`).SetEscapeSymbol('\\')
		stream := tokenizer.ParseStream(buffer, 100)
		var actual []byte
		for stream.IsValid() {
			current := stream.CurrentToken()
			// t.Logf("%#v", current)
			actual = append(actual, current.Indent()...)
			actual = append(actual, current.Value()...)
			stream.GoNext()
		}
		// t.Logf("%#v", stream.CurrentToken())
		// As we only concatenate the indents of each token, the trailing
		// whitespace and token separators are lost, so we trim these
		// characters on the right of both the actual and expected slices.
		trimset := ". \t\r\n\x00"
		expected := bytes.TrimRight(origBytes, trimset)
		actual = bytes.TrimRight(actual, trimset)
		if !bytes.Equal(expected, actual) {
			t.Errorf("input:\n%q\nexpected:\n%q\nactual:\n%q", orig, expected, actual)
		}
	})
}
...and ran it using go test -fuzz FuzzStream, which gave me the following output:
fuzz: elapsed: 0s, gathering baseline coverage: 0/120 completed
failure while testing seed corpus entry: FuzzStream/22babe8c5ca7133b
fuzz: elapsed: 0s, gathering baseline coverage: 0/120 completed
--- FAIL: FuzzStream (0.03s)
    --- FAIL: FuzzStream (0.00s)
        stream_test.go:491: input:
            "0E"
            expected:
            "0E"
            actual:
            "0"
FAIL
exit status 1
FAIL    github.com/bzick/tokenizer    0.042s
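For reference, the failure can presumably be reproduced without the fuzzer by feeding "0E" to a tokenizer directly. This is a minimal sketch, not part of my test suite: it assumes the custom tokens defined above are irrelevant for this input (none of them occur in "0E") and uses ParseString, Token.Key, and Token.Value from the library's public API; the expected output is inferred from the fuzz failure above.

package main

import (
	"fmt"

	"github.com/bzick/tokenizer"
)

func main() {
	parser := tokenizer.New()
	stream := parser.ParseString("0E")
	defer stream.Close()
	// Walk the stream and print every token the tokenizer produced.
	for stream.IsValid() {
		current := stream.CurrentToken()
		fmt.Printf("key=%d value=%q\n", current.Key(), current.Value())
		stream.GoNext()
	}
	// Based on the fuzz failure above, this should print a single token
	// with value "0"; the trailing "E" is dropped.
}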
It seems to me that this is indeed a bug in tokenizer, as the E character is completely ignored, but I am not exactly sure how the input should be handled. Should it be a single token {"0E"} or two tokens {"0", "E"}?